Thread

  1. Re: Concrete proposal for large objects and MVCC

    Tatsuo Ishii <t-ishii@sra.co.jp> — 2005-06-10T16:56:25Z

    > I spent a little bit of time thinking about what it would mean exactly
    > for large-object operations to obey MVCC, and decided that there are
    > more worms in that can than I had realized.  Part of the problem is
    > that we have no concept of a lock on an individual LO, and thus
    > operations that really should be serialized, such as DROP, aren't going
    > to work very well.  We could implement DROP as the equivalent of
    > 	DELETE FROM pg_largeobject WHERE loid = nnn;
    > with an MVCC snapshot --- but there is no guarantee that we won't miss
    > a page that someone else is concurrently inserting into that same large
    > object.
    > 
    > So what I'm thinking is that the prudent course is to leave writing
    > semantics as they are, namely SnapshotNow rules.  (What this means
    > in practice is you get "tuple concurrently updated" errors if two
    > transactions try to write the same page of the same LO concurrently.
    > We have seen few if any complaints about that error in connection with
    > LO operations, so ISTM there's not a problem there that needs solving.)
    > 
    > The problem we do need to solve is letting pg_dump have a stable view
    > of the database's large objects.  I propose that we can fix this in
    > a suitably narrow way by making the following definition:
    > 
    > 	* A large object descriptor opened for read-only access saves
    > 	  the current ActiveSnapshot and uses that snapshot to read
    > 	  pg_largeobject for the duration of its existence.
    > 
    > 	* A large object descriptor opened for write-only or read-write
    > 	  access uses SnapshotNow, same as before.
    > 
    > This avoids the risk of creating any serious backwards-compatibility
    > issues: if there's anyone out there who does need SnapshotNow reads,
    > they just have to be sure to open the LO in read-write mode to have
    > fully backward compatible operation.
    > 
    > Comments, objections?
    
    Besides the MVCC issue, I am not sure it's a good idea LO being binded
    to OID. In my understanding OID is solely used to distinguish each LO
    in a database. In another word, it's just a key to LO. I think giving
    explicit key when creating a LO has some benefits:
    
    1) not need to worry about OID wrap around problem
    2) easier to find orpahn LO
    3) for replication systems it's easier to replicate LOs
    
    What do you think?
    --
    Tatsuo Ishii