Thread

reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-03T13:17:08Z

I've now spent enough time working on this issue to be convinced
that the approach has merit, if we can work out the kinks.  I'll start
with some performance numbers.

The case where the current system for taking table locks is really
hurting us is where we have a large number of backends attempting to
access a small number of relations.  They all fight over the lock
manager lock on whichever partition (or partitions) that relation (or
those relations) fall in.  Increasing the number of partitions doesn't
help, because they are all trying to access the same object, and that
object is only ever going to be in one partition.  To exercise this
case, I chose the following benchmark: pgbench -n -S -T 300 -c 36 -j
36.  I first tested this on my MacBook Pro, with scale factor 10 and
shared_buffers=400MB.  Here are the results of alternating runs
without and with the patch:

tps = 23997.120971 (including connections establishing)
tps = 25003.186860 (including connections establishing)
tps = 23499.257892 (including connections establishing)
tps = 24435.793773 (including connections establishing)
tps = 23579.624360 (including connections establishing)
tps = 24791.974810 (including connections establishing)

As you can see, this works out to a bit more than a 4% improvement on
this two-core box.  I also got access (thanks to Nate Boley) to a
24-core box and ran the same test with scale factor 100 and
shared_buffers=8GB.  Here are the results of alternating runs without
and with the patch on that machine:

tps = 36291.996228 (including connections establishing)
tps = 129242.054578 (including connections establishing)
tps = 36704.393055 (including connections establishing)
tps = 128998.648106 (including connections establishing)
tps = 36531.208898 (including connections establishing)
tps = 131341.367344 (including connections establishing)

That's an improvement of about 3.5x.  According to the vmstat output,
when running without the patch, the CPU state was about 40% idle.
With the patch, it dropped down to around 6%.
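Both headline figures can be rechecked from the alternating runs above. The snippet below only averages the three runs per configuration and divides. It does no variance analysis, so treat it as a back-of-the-envelope check, not a benchmarking methodology.

```python
from statistics import mean

# tps from the alternating runs above, as (without patch, with patch)
RUNS = {
    "2-core MacBook Pro": (
        [23997.120971, 23499.257892, 23579.624360],
        [25003.186860, 24435.793773, 24791.974810],
    ),
    "24-core box": (
        [36291.996228, 36704.393055, 36531.208898],
        [129242.054578, 128998.648106, 131341.367344],
    ),
}


def speedup(without_patch, with_patch):
    """Ratio of mean patched throughput to mean unpatched throughput."""
    return mean(with_patch) / mean(without_patch)


for box, (without_patch, with_patch) in RUNS.items():
    print(f"{box}: {speedup(without_patch, with_patch):.3f}x")
```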

There are numerous problems with the code as it stands at this point.
It crashes if you try to use 2PC, which means the regression tests
fail; it probably does horrible things if you run out of shared
memory; pg_locks knows nothing about the new mechanism (arguably, we
could leave it that way: only locks that can't possibly be conflicting
with anything can be taken using this mechanism, but it would be nice
to fix, I think); and there are likely some other gotchas as well.
Still, the basic mechanism appears to work.
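The idea that only locks which cannot conflict with anything take the new path can be illustrated with a rough, single-process Python model. This is one way such a scheme can work, not a description of this patch's code. Weak relation locks go into a per-backend set guarded by that backend's own lock, unless a per-bucket counter shows that a strong lock on a relation in that bucket is held or being taken. A strong locker bumps the counter first. It then moves any matching fast-path entries into the shared table, where conflicts become visible. All names (`Backend`, `LockManager`, `strong_counts`, `lock_weak`, `lock_strong`, `NUM_STRONG_BUCKETS`) are hypothetical. `threading.Lock` stands in for LWLocks, and lock release and actual waiting are omitted.

```python
import threading
from collections import defaultdict

NUM_STRONG_BUCKETS = 16  # hypothetical number of "strong lock pending" counters


class Backend:
    """Per-backend fast-path state (hypothetical)."""

    def __init__(self, backend_id):
        self.backend_id = backend_id
        self.fp_lock = threading.Lock()  # stands in for a per-backend LWLock
        self.fast_path = set()           # relation OIDs weakly locked via the fast path


class LockManager:
    def __init__(self, nbackends):
        self.backends = [Backend(i) for i in range(nbackends)]
        self.strong_counts = [0] * NUM_STRONG_BUCKETS
        self.main_lock = threading.Lock()    # stands in for the shared lock-table locks
        self.main_table = defaultdict(list)  # relid -> [(backend_id, mode), ...]
        self.main_table_accesses = 0

    def _bucket(self, relid):
        return relid % NUM_STRONG_BUCKETS

    def _record(self, relid, backend_id, mode):
        # Caller must hold self.main_lock.
        self.main_table_accesses += 1
        self.main_table[relid].append((backend_id, mode))

    def lock_weak(self, backend, relid):
        """Take a weak lock (think AccessShareLock); report which path was used."""
        with backend.fp_lock:
            # Safe only if no strong lock on this bucket is held or being taken.
            if self.strong_counts[self._bucket(relid)] == 0:
                backend.fast_path.add(relid)  # no shared structure touched
                return "fast"
        with self.main_lock:
            self._record(relid, backend.backend_id, "weak")
        return "main"

    def lock_strong(self, backend, relid):
        """Take a strong lock; return the other backends holding relid.

        First stop new fast-path grants for this bucket, then move existing
        fast-path entries for relid into the main table so they are visible.
        This visits every backend, which is the price strong lockers pay.
        """
        with self.main_lock:
            self.strong_counts[self._bucket(relid)] += 1
        for other in self.backends:
            with other.fp_lock:
                if relid in other.fast_path:
                    other.fast_path.discard(relid)
                    with self.main_lock:
                        self._record(relid, other.backend_id, "weak")
        with self.main_lock:
            holders = [b for b, _ in self.main_table[relid] if b != backend.backend_id]
            self._record(relid, backend.backend_id, "strong")
        return holders  # a real lock manager would wait for these to go away


mgr = LockManager(nbackends=4)
print([mgr.lock_weak(b, 16384) for b in mgr.backends])  # all four use the fast path
print(mgr.lock_strong(mgr.backends[0], 16384))          # backends 1-3 now visible
```

The trade-off is deliberate: the common weak lock touches only its own backend's lock, while the rare strong lock pays O(number of backends).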

The code is attached, for anyone who may be curious.  Known idiocies
are marked with "ZZZ".  The design was discussed on the previous
thread ("reducing the overhead of frequent table locks"), q.v.  There
are some comments in the patch as well, but more is likely needed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-06-03T14:13:45Z

Robert Haas <robertmhaas@gmail.com> wrote:
 
> That's an improvement of about 3.5x.
 
Outstanding!
 
I don't want to even peek at this until I've posted the two WIP SSI
patches (now both listed on the "Open Items" page), but will
definitely take a look after that.
 
-Kevin

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-03T16:40:41Z

On Fri, Jun 3, 2011 at 10:13 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> That's an improvement of about 3.5x.
>
> Outstanding!
>
> I don't want to even peek at this until I've posted the two WIP SSI
> patches (now both listed on the "Open Items" page), but will
> definitely take a look after that.

Yeah, those SSI items are important to get nailed down RSN.  But
thanks for your interest in this patch.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Noah Misch <noah@leadboat.com> — 2011-06-03T18:10:03Z

On Fri, Jun 03, 2011 at 09:17:08AM -0400, Robert Haas wrote:
> As you can see, this works out to a bit more than a 4% improvement on
> this two-core box.  I also got access (thanks to Nate Boley) to a
> 24-core box and ran the same test with scale factor 100 and
> shared_buffers=8GB.  Here are the results of alternating runs without
> and with the patch on that machine:
> 
> tps = 36291.996228 (including connections establishing)
> tps = 129242.054578 (including connections establishing)
> tps = 36704.393055 (including connections establishing)
> tps = 128998.648106 (including connections establishing)
> tps = 36531.208898 (including connections establishing)
> tps = 131341.367344 (including connections establishing)

Nice!

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-04T13:59:40Z

On Fri, Jun 3, 2011 at 2:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I've now spent enough time working on this issue to be convinced
> that the approach has merit, if we can work out the kinks.

Yes, the approach has merits and I'm sure we can work out the kinks.

> As you can see, this works out to a bit more than a 4% improvement on
> this two-core box.  I also got access (thanks to Nate Boley) to a
> 24-core box and ran the same test with scale factor 100 and
> shared_buffers=8GB.  Here are the results of alternating runs without
> and with the patch on that machine:
>
> tps = 36291.996228 (including connections establishing)
> tps = 129242.054578 (including connections establishing)
> tps = 36704.393055 (including connections establishing)
> tps = 128998.648106 (including connections establishing)
> tps = 36531.208898 (including connections establishing)
> tps = 131341.367344 (including connections establishing)
>
> That's an improvement of about 3.5x.  According to the vmstat output,
> when running without the patch, the CPU state was about 40% idle.
> With the patch, it dropped down to around 6%.

Congratulations. I believe that is realistic based upon my investigations.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-04T15:01:08Z

On Sat, Jun 4, 2011 at 2:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

>> [...]
>
> Congratulations. I believe that is realistic based upon my investigations.

Tom,

You should look at this. It's good.

The approach looks sound to me. It's a fairly isolated patch and we
should be considering this for inclusion in 9.1, not wait another
year.

I will happily add that it's a completely different approach from the one I'd
been working on, and, even more happily, it is so different from the Oracle
approach that we are definitely unencumbered by patent issues here.
Well done Robert, Noah.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-06-04T15:44:07Z

On 04.06.2011 18:01, Simon Riggs wrote:
> It's a fairly isolated patch and we
> should be considering this for inclusion in 9.1, not wait another
> year.

-1

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-04T16:55:45Z

Simon Riggs <simon@2ndquadrant.com> writes:
> The approach looks sound to me. It's a fairly isolated patch and we
> should be considering this for inclusion in 9.1, not wait another
> year.

That suggestion is completely insane.  The patch is only WIP and full of
bugs, even according to its author.  Even if it were solid, it is way
too late to be pushing such stuff into 9.1.  We're trying to ship a
release, not find ways to cause it to slip more.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> — 2011-06-05T19:04:23Z

On 06/03/2011 03:17 PM, Robert Haas wrote:
[...]
> 
> As you can see, this works out to a bit more than a 4% improvement on
> this two-core box.  I also got access (thanks to Nate Boley) to a
> 24-core box and ran the same test with scale factor 100 and
> shared_buffers=8GB.  Here are the results of alternating runs without
> and with the patch on that machine:
> 
> tps = 36291.996228 (including connections establishing)
> tps = 129242.054578 (including connections establishing)
> tps = 36704.393055 (including connections establishing)
> tps = 128998.648106 (including connections establishing)
> tps = 36531.208898 (including connections establishing)
> tps = 131341.367344 (including connections establishing)
> 
> That's an improvement of about 3.5x.  According to the vmstat output,
> when running without the patch, the CPU state was about 40% idle.
> With the patch, it dropped down to around 6%.

Nice - but let's see it on real hardware...

Testing this on a brand-new E7-4850 box with 4 sockets, 10 cores per socket
and HT - so 80 hardware threads:

First, some numbers with -HEAD (-T 120; runtimes at lower -c counts have
fairly high variation in the results; the first number is the number of
connections/threads):


-j1:	tps = 7928.965493 (including connections establishing)
-j8:	tps = 53610.572347 (including connections establishing)
-j16:	tps = 80835.446118 (including connections establishing)
-j32:	tps = 75666.731883 (including connections establishing)
-j40:	tps = 74628.568388 (including connections establishing)
-j64:	tps = 68268.081973 (including connections establishing)
-j80:	tps = 66704.216166 (including connections establishing)

PostgreSQL is completely lock-limited in this test: anything beyond
around -j10 basically cannot push the box below 80% idle(!)


and now with the patch applied:

-j1:	tps = 7783.295587 (including connections establishing)	
-j8:	tps = 44361.661947 (including connections establishing)
-j16:	tps = 92270.464541 (including connections establishing)
-j24:	tps = 108259.524782 (including connections establishing)
-j32:	tps = 183337.422612 (including connections establishing)
-j40:	tps = 209616.052430 (including connections establishing)
-j48:	tps = 229621.292382 (including connections establishing)
-j56:	tps = 218690.391603 (including connections establishing)
-j64:	tps = 188028.348501 (including connections establishing)
-j80:	tps = 118814.741609 (including connections establishing)


So, much better - but I still think there is some headroom left,
although pgbench itself is a CPU hog in these benchmarks, eating up
to 10 cores in the worst-case scenario. I will retest with sysbench, which
in the past showed more reasonable CPU usage for me.
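To see where each build peaks and how the patched/HEAD ratio changes with client count, the two series can be compared directly. This is a minimal sketch over only the figures posted above; the 80-client runs of both builds are treated as the same client count. Given the noted variation at low client counts, the -j1 and -j8 ratios deserve little weight.

```python
# Throughput by client/thread count from the -T 120 runs above.
# Only the client counts that were actually posted are included.
HEAD_TPS = {1: 7928.965493, 8: 53610.572347, 16: 80835.446118, 32: 75666.731883,
            40: 74628.568388, 64: 68268.081973, 80: 66704.216166}
PATCHED_TPS = {1: 7783.295587, 8: 44361.661947, 16: 92270.464541, 24: 108259.524782,
               32: 183337.422612, 40: 209616.052430, 48: 229621.292382,
               56: 218690.391603, 64: 188028.348501, 80: 118814.741609}


def peak(results):
    """Client count that produced the highest tps."""
    return max(results, key=results.get)


def ratios(before, after):
    """after/before tps for every client count measured in both runs."""
    return {n: after[n] / before[n] for n in sorted(before.keys() & after.keys())}


print("HEAD peaks at -j%d, patched at -j%d" % (peak(HEAD_TPS), peak(PATCHED_TPS)))
for n, r in ratios(HEAD_TPS, PATCHED_TPS).items():
    print(f"-j{n}: patched/HEAD = {r:.2f}")
```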



And a profile (patched code) for the -j48 (aka fastest) case:

731535   11.8408  postgres                 s_lock
291878    4.7244  postgres                 LWLockAcquire
242373    3.9231  postgres                 AllocSetAlloc
239083    3.8698  postgres                 LWLockRelease
202341    3.2751  postgres                 SearchCatCache
190055    3.0763  postgres                 hash_search_with_hash_value
187148    3.0292  postgres                 base_yyparse
173265    2.8045  postgres                 GetSnapshotData
75700     1.2253  postgres                 core_yylex
74974     1.2135  postgres                 MemoryContextAllocZeroAligned
61404     0.9939  postgres                 _bt_compare
57529     0.9312  postgres                 MemoryContextAlloc


And one for the -j80 case (also patched).


485798   48.9667  postgres                 s_lock
60327     6.0808  postgres                 LWLockAcquire
57049     5.7503  postgres                 LWLockRelease
18357     1.8503  postgres                 hash_search_with_hash_value
17033     1.7169  postgres                 GetSnapshotData
14763     1.4881  postgres                 base_yyparse
14460     1.4575  postgres                 SearchCatCache
13975     1.4086  postgres                 AllocSetAlloc
6416      0.6467  postgres                 PinBuffer
5024      0.5064  postgres                 SIGetDataEntries
4704      0.4741  postgres                 core_yylex
4625      0.4662  postgres                 _bt_compare
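A quick way to compare the two profiles is to total the spinlock and LWLock entries. The sketch below only sums the percentages listed above. Attributing the s_lock samples to LWLocks' internal spinlocks is an assumption that the flat profile alone cannot confirm.

```python
# Percent-of-samples for the lock-related symbols in the two profiles above.
PROFILE_J48 = {"s_lock": 11.8408, "LWLockAcquire": 4.7244, "LWLockRelease": 3.8698}
PROFILE_J80 = {"s_lock": 48.9667, "LWLockAcquire": 6.0808, "LWLockRelease": 5.7503}
LOCK_SYMBOLS = ("s_lock", "LWLockAcquire", "LWLockRelease")


def lock_share(profile, symbols=LOCK_SYMBOLS):
    """Total percentage of samples attributed to the given symbols."""
    return sum(profile.get(s, 0.0) for s in symbols)


print(f"-j48: {lock_share(PROFILE_J48):.1f}%   -j80: {lock_share(PROFILE_J80):.1f}%")
```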



Stefan

Re: reducing the overhead of frequent table locks - now, with WIP patch

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-06-05T19:12:43Z

On 05.06.2011 22:04, Stefan Kaltenbrunner wrote:
> and one for the -j80 case(also patched).
>
>
> 485798   48.9667  postgres                 s_lock
> 60327     6.0808  postgres                 LWLockAcquire
> 57049     5.7503  postgres                 LWLockRelease
> 18357     1.8503  postgres                 hash_search_with_hash_value
> 17033     1.7169  postgres                 GetSnapshotData
> 14763     1.4881  postgres                 base_yyparse
> 14460     1.4575  postgres                 SearchCatCache
> 13975     1.4086  postgres                 AllocSetAlloc
> 6416      0.6467  postgres                 PinBuffer
> 5024      0.5064  postgres                 SIGetDataEntries
> 4704      0.4741  postgres                 core_yylex
> 4625      0.4662  postgres                 _bt_compare

Hmm, does that mean that it's spending 50% of the time spinning on a 
spinlock? That's bad. It's one thing to be contended on a lock, and have 
a lot of idle time because of that, but it's even worse to spend a lot 
of time spinning because that CPU time won't be spent on doing more 
useful work, even if there is some other process on the system that 
could make use of that CPU time.

I like the overall improvement on the throughput, of course, but we have 
to find a way to avoid the busy-wait.
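The concern here is CPU burned spinning rather than waiting. A common mitigation is to spin for a bounded number of tries and then sleep with randomized, increasing delays. This is roughly what PostgreSQL's s_lock() already does once a spinlock has been spun on for a while. The Python sketch below shows only that general pattern. `acquire_with_backoff` and its constants are made up for illustration, and a non-blocking `threading.Lock.acquire` stands in for a hardware test-and-set.

```python
import random
import threading
import time

# Illustrative constants only; not PostgreSQL's tuned values.
SPINS_PER_DELAY = 100   # failed tries before the first sleep
MIN_DELAY = 0.001       # seconds
MAX_DELAY = 1.0         # seconds


def acquire_with_backoff(lock):
    """Acquire `lock` by spinning briefly, then sleeping with growing,
    randomized delays. Returns (failed_tries, sleeps) for inspection."""
    delay = MIN_DELAY
    tries = sleeps = spins = 0
    # A non-blocking acquire plays the role of an atomic test-and-set.
    while not lock.acquire(blocking=False):
        tries += 1
        spins += 1
        if spins >= SPINS_PER_DELAY:
            time.sleep(delay)  # yield the CPU instead of burning it
            sleeps += 1
            # Grow the delay by a random factor in [1, 2), capped at MAX_DELAY.
            delay = min(delay * (1.0 + random.random()), MAX_DELAY)
            spins = 0
    return tries, sleeps


demo = threading.Lock()
demo.acquire()
threading.Timer(0.05, demo.release).start()  # the holder lets go after 50 ms
print(acquire_with_backoff(demo))            # many tries, but only a few sleeps
demo.release()
```

Backoff caps the CPU each waiter wastes, but it does not reduce how often the hot lock is requested. That still has to come from taking the lock less often.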

Re: reducing the overhead of frequent table locks - now, with WIP patch

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> — 2011-06-05T20:01:32Z

On 06/05/2011 09:12 PM, Heikki Linnakangas wrote:
> On 05.06.2011 22:04, Stefan Kaltenbrunner wrote:
>> [...]
> 
> Hmm, does that mean that it's spending 50% of the time spinning on a
> spinlock? That's bad. It's one thing to be contended on a lock, and have
> a lot of idle time because of that, but it's even worse to spend a lot
> of time spinning because that CPU time won't be spent on doing more
> useful work, even if there is some other process on the system that
> could make use of that CPU time.

Well, yeah - we are broken right now, being able to use only ~20% of the
CPU on a modern mid-range box; but using 80% of the CPU (4x, as in the
above case) and getting less than 2x the performance seems wrong as
well. I also wonder if we are still missing something fundamental -
because even with the current patch we are quite far away from linear
scaling and light-years from some of our competitors...


Stefan

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-05T21:46:32Z

On Sun, Jun 5, 2011 at 4:01 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
> On 06/05/2011 09:12 PM, Heikki Linnakangas wrote:
>> On 05.06.2011 22:04, Stefan Kaltenbrunner wrote:
>>> [...]
>>
>> Hmm, does that mean that it's spending 50% of the time spinning on a
>> spinlock? That's bad. It's one thing to be contended on a lock, and have
>> a lot of idle time because of that, but it's even worse to spend a lot
>> of time spinning because that CPU time won't be spent on doing more
>> useful work, even if there is some other process on the system that
>> could make use of that CPU time.
>
> well yeah - we are broken right now with only being able to use ~20% of
> CPU on a modern mid-range box, but using 80% CPU (or 4x like in the
> above case) and only getting less than 2x the performance seems wrong as
> well. I also wonder if we are still missing something fundamental -
> because even with the current patch we are quite far away from linear
> scaling and light-years from some of our competitors...

Could you compile with LWLOCK_STATS, rerun these tests, total up the
"blk" numbers by LWLockId, and post the results?  (Actually, totalling
up the shacq and exacq numbers would be useful as well, if you
wouldn't mind.)
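Totalling the LWLOCK_STATS output by lock id is mechanical. Here is a minimal Python sketch, assuming the per-lock line format shown in the listings posted later in this thread; any prefix before `lwlock`, such as a PID, is skipped. If a build prints something different, only the regular expression needs to change. `total_lwlock_stats` and `blocked` are illustrative helpers, not PostgreSQL tools.

```python
import re
from collections import defaultdict

# Matches "lwlock 42: shacq 47512112 exacq 11844 blk 4503"; anything before
# "lwlock" on the line (such as a PID prefix) is ignored.
LINE_RE = re.compile(r"lwlock (\d+): shacq (\d+) exacq (\d+) blk (\d+)")


def total_lwlock_stats(lines):
    """Sum (shacq, exacq, blk) per LWLock id over every backend's report."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an LWLOCK_STATS line
        lock_id, *counts = (int(g) for g in m.groups())
        for i, count in enumerate(counts):
            totals[lock_id][i] += count
    return {lock_id: tuple(v) for lock_id, v in sorted(totals.items())}


def blocked(totals):
    """Keep only the locks that ever blocked."""
    return {k: v for k, v in totals.items() if v[2] > 0}

# Usage (log file name hypothetical):
#   with open("server.log") as f:
#       print(blocked(total_lwlock_stats(f)))
```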

Unless I very much miss my guess, we're going to see zero contention
on the new structures introduced by this patch.  Rather, I suspect
what we're going to find is that, with the hideous contention on one
particular lock manager partition lock removed, there's a more
spread-out contention problem, likely involving the lock manager
partition lock, the buffer mapping locks, and possibly other LWLocks
as well.  The fact that the system is busy-waiting rather than just
not using the CPU at all probably means that the remaining contention
is more spread out than that which is removed by this patch.  We don't
actually have everything pile up on a single LWLock (as happens in git
master), but we do spend a lot of time fighting cache lines away from
other CPUs.  Or at any rate, that's my guess: we need some real
numbers to know for sure.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-06T02:16:32Z

On Sun, Jun 5, 2011 at 5:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Could you compile with LWLOCK_STATS, rerun these tests, total up the
> "blk" numbers by LWLockId, and post the results?  (Actually, totalling
> up the shacq and exacq numbers would be useful as well, if you
> wouldn't mind.)

I did this on the loaner 24-core box from Nate Boley and got the
following results.  This is just the LWLocks that had blk>0.

lwlock 0: shacq 0 exacq 200625 blk 24044
lwlock 4: shacq 80101430 exacq 196 blk 28
lwlock 33: shacq 8333673 exacq 11977 blk 864
lwlock 34: shacq 7092293 exacq 11890 blk 803
lwlock 35: shacq 7893875 exacq 11909 blk 848
lwlock 36: shacq 7567514 exacq 11912 blk 830
lwlock 37: shacq 7427774 exacq 11930 blk 745
lwlock 38: shacq 7120108 exacq 11989 blk 853
lwlock 39: shacq 7584952 exacq 11982 blk 782
lwlock 40: shacq 7949867 exacq 12056 blk 821
lwlock 41: shacq 6612240 exacq 11929 blk 746
lwlock 42: shacq 47512112 exacq 11844 blk 4503
lwlock 43: shacq 7943511 exacq 11871 blk 878
lwlock 44: shacq 7534558 exacq 12033 blk 800
lwlock 45: shacq 7128256 exacq 12045 blk 856
lwlock 46: shacq 7575339 exacq 12015 blk 818
lwlock 47: shacq 6745173 exacq 12094 blk 806
lwlock 48: shacq 8410348 exacq 12104 blk 977
lwlock 49: shacq 0 exacq 5007594 blk 172533
lwlock 50: shacq 0 exacq 5011704 blk 172282
lwlock 51: shacq 0 exacq 5003356 blk 172802
lwlock 52: shacq 0 exacq 5009020 blk 174648
lwlock 53: shacq 0 exacq 5010808 blk 172080
lwlock 54: shacq 0 exacq 5004908 blk 169934
lwlock 55: shacq 0 exacq 5009324 blk 170281
lwlock 56: shacq 0 exacq 5005904 blk 171001
lwlock 57: shacq 0 exacq 5006984 blk 169942
lwlock 58: shacq 0 exacq 5000346 blk 170001
lwlock 59: shacq 0 exacq 5004884 blk 170484
lwlock 60: shacq 0 exacq 5006304 blk 171325
lwlock 61: shacq 0 exacq 5008421 blk 170866
lwlock 62: shacq 0 exacq 5008162 blk 170868
lwlock 63: shacq 0 exacq 5002238 blk 170291
lwlock 64: shacq 0 exacq 5005348 blk 169764
lwlock 307: shacq 0 exacq 2 blk 1
lwlock 315: shacq 0 exacq 3 blk 2
lwlock 337: shacq 0 exacq 4 blk 3
lwlock 345: shacq 0 exacq 2 blk 1
lwlock 349: shacq 0 exacq 2 blk 1
lwlock 231251: shacq 0 exacq 2 blk 1
lwlock 253831: shacq 0 exacq 2 blk 1

So basically, even with the patch, at 24 cores the lock manager locks
are still under tremendous pressure.  But note that there's a big
difference between what's happening here and what's happening without
the patch.  Here's without the patch:

lwlock 0: shacq 0 exacq 191613 blk 17591
lwlock 4: shacq 21543085 exacq 102 blk 20
lwlock 33: shacq 2237938 exacq 11976 blk 463
lwlock 34: shacq 1907344 exacq 11890 blk 458
lwlock 35: shacq 2125308 exacq 11908 blk 442
lwlock 36: shacq 2038220 exacq 11912 blk 430
lwlock 37: shacq 1998059 exacq 11927 blk 449
lwlock 38: shacq 1916179 exacq 11953 blk 409
lwlock 39: shacq 2042173 exacq 12019 blk 479
lwlock 40: shacq 2140002 exacq 12056 blk 448
lwlock 41: shacq 1776772 exacq 11928 blk 392
lwlock 42: shacq 12777368 exacq 11842 blk 2451
lwlock 43: shacq 2132240 exacq 11869 blk 478
lwlock 44: shacq 2026845 exacq 12031 blk 446
lwlock 45: shacq 1918618 exacq 12045 blk 449
lwlock 46: shacq 2038437 exacq 12011 blk 472
lwlock 47: shacq 1814660 exacq 12089 blk 401
lwlock 48: shacq 2261208 exacq 12105 blk 478
lwlock 49: shacq 0 exacq 1347524 blk 17020
lwlock 50: shacq 0 exacq 1350678 blk 16888
lwlock 51: shacq 0 exacq 1346260 blk 16744
lwlock 52: shacq 0 exacq 1348432 blk 16864
lwlock 53: shacq 0 exacq 22216779 blk 4914363
lwlock 54: shacq 0 exacq 22217309 blk 4525381
lwlock 55: shacq 0 exacq 1348406 blk 13438
lwlock 56: shacq 0 exacq 1345996 blk 13299
lwlock 57: shacq 0 exacq 1347890 blk 13654
lwlock 58: shacq 0 exacq 1343486 blk 13349
lwlock 59: shacq 0 exacq 1346198 blk 13471
lwlock 60: shacq 0 exacq 1346236 blk 13532
lwlock 61: shacq 0 exacq 1343688 blk 13547
lwlock 62: shacq 0 exacq 1350068 blk 13614
lwlock 63: shacq 0 exacq 1345302 blk 13420
lwlock 64: shacq 0 exacq 1348858 blk 13635
lwlock 321: shacq 0 exacq 2 blk 1
lwlock 329: shacq 0 exacq 4 blk 3
lwlock 337: shacq 0 exacq 6 blk 4
lwlock 347: shacq 0 exacq 5 blk 4
lwlock 357: shacq 0 exacq 3 blk 2
lwlock 363: shacq 0 exacq 3 blk 2
lwlock 369: shacq 0 exacq 4 blk 3
lwlock 379: shacq 0 exacq 2 blk 1
lwlock 383: shacq 0 exacq 2 blk 1
lwlock 445: shacq 0 exacq 2 blk 1
lwlock 449: shacq 0 exacq 2 blk 1
lwlock 451: shacq 0 exacq 2 blk 1
lwlock 1023: shacq 0 exacq 2 blk 1
lwlock 11401: shacq 0 exacq 2 blk 1
lwlock 115591: shacq 0 exacq 2 blk 1
lwlock 117177: shacq 0 exacq 2 blk 1
lwlock 362839: shacq 0 exacq 2 blk 1

In the unpatched case, two lock manager locks are getting beaten to
death, and the others are all about equally contended.  By eliminating the
portion of the lock manager contention that pertains specifically to
the two heavily trafficked locks, system throughput improves by about
3.5x - and, not surprisingly, traffic on the lock manager locks
increases by approximately the same multiple.  Those locks now become
the contention bottleneck, with about 12x the blocking they had
pre-patch.  I'm definitely interested in investigating what to do
about that, but I don't think it's this patch's problem to fix all of
our lock manager bottlenecks.  Another thing to note is that
pre-patch, the two really badly contended LWLocks were blocking about
20-22% of the time; post-patch, all of the lock manager locks are
blocking about 3.4% of the time.  That's certainly not great, but it's
progress.
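The blocking percentages can be reproduced from the listings. For the lock manager locks shacq is 0, so blk/exacq is the fraction of acquisitions that blocked. The id-to-group mapping in `lwlock_group` is an assumption: it follows the 9.1-era LWLock numbering, with 16 buffer mapping partition locks starting at 33 followed by 16 lock manager partition locks. That matches the grouping visible in the listings, but check it against lwlock.h for any other build.

```python
# (exacq, blk) for a few lock manager LWLocks, copied from the two listings above.
# These locks show shacq 0, so exacq is the total number of acquisitions.
UNPATCHED_LMGR = {49: (1347524, 17020), 53: (22216779, 4914363), 54: (22217309, 4525381)}
PATCHED_LMGR = {49: (5007594, 172533), 53: (5010808, 172080), 54: (5004908, 169934)}


def lwlock_group(lock_id):
    """Assumed 9.1-era numbering: 16 buffer mapping partition locks from 33,
    then 16 lock manager partition locks."""
    if 33 <= lock_id <= 48:
        return "buffer mapping"
    if 49 <= lock_id <= 64:
        return "lock manager"
    return "other"


def blocked_fraction(exacq, blk):
    """Fraction of acquisitions that had to block."""
    return blk / exacq


for label, table in (("unpatched", UNPATCHED_LMGR), ("patched", PATCHED_LMGR)):
    for lock_id, (exacq, blk) in sorted(table.items()):
        print(f"{label} lwlock {lock_id} ({lwlock_group(lock_id)}): "
              f"{blocked_fraction(exacq, blk):.1%} blocked")
```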

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-06T04:12:32Z

On Sun, Jun 5, 2011 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm definitely interested in investigating what to do
> about that, but I don't think it's this patch's problem to fix all of
> our lock manager bottlenecks.

I did some further investigation of this.  It appears that more than
99% of the lock manager lwlock traffic that remains with this patch
applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION.  Every SELECT
statement runs in a separate transaction, and for each new transaction
we run VirtualXactLockTableInsert(), which takes a lock on the vxid of
that transaction, so that other processes can wait for it.  That
requires acquiring and releasing a lock manager partition lock, and we
have to do the same thing a moment later at transaction end to dump
the lock.

A quick grep seems to indicate that the only places where we actually
make use of those VXID locks are in DefineIndex(), when CREATE INDEX
CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
expires.  Considering that these are not commonplace events, it seems
tremendously wasteful to incur the overhead for every transaction.  It
might be possible to make the lock entry spring into existence "on
demand" - i.e. if a backend wants to wait on a vxid entry, it creates
the LOCK and PROCLOCK objects for that vxid.  That presents a few
synchronization challenges, and we also have to make sure that the
backend that's just been "given" a lock knows that it needs to release
it, but those seem like they might be manageable problems, especially
given the new infrastructure introduced by the current patch, which
already has to deal with some of those issues.  I'll look into this
further.
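The "on demand" idea can be modeled roughly in Python. This is an illustration of the idea described above, not code from any patch. All names (`VxidSlot`, `VxidLockManager`, `start_xact`, `end_xact`, `materialize_for_wait`) are hypothetical. Python locks stand in for LWLocks, and no sleeping or wakeup is modeled. Because a VXID includes the owning backend's ID, a would-be waiter needs to check only that one backend's slot. The waiter creates the lock entry while holding the owner's per-backend lock. The owner clears its slot under that same lock, so either it sees the flag and cleans up the entry, or it had already finished before the waiter looked.

```python
import threading


class VxidSlot:
    """Hypothetical per-backend record of the backend's current virtual transaction."""

    def __init__(self):
        self.lock = threading.Lock()   # stands in for a per-backend LWLock
        self.vxid = None               # (backend_id, local_xid) while running, else None
        self.in_main_table = False     # did a waiter create a lock entry on our behalf?


class VxidLockManager:
    def __init__(self, nbackends):
        self.slots = [VxidSlot() for _ in range(nbackends)]
        self.partition_lock = threading.Lock()  # stands in for a lock manager partition lock
        self.main_table = {}                    # vxid -> owning backend id
        self.partition_acquisitions = 0

    def start_xact(self, backend_id, local_xid):
        """Advertise the new vxid; touches only this backend's own slot."""
        slot = self.slots[backend_id]
        with slot.lock:
            slot.vxid = (backend_id, local_xid)
            slot.in_main_table = False

    def end_xact(self, backend_id):
        """Retire the vxid; visit the shared table only if a waiter put us there."""
        slot = self.slots[backend_id]
        with slot.lock:
            vxid, materialized = slot.vxid, slot.in_main_table
            slot.vxid = None
            slot.in_main_table = False
        if materialized:
            with self.partition_lock:
                self.partition_acquisitions += 1
                del self.main_table[vxid]
            # A real implementation would wake the waiters at this point.

    def materialize_for_wait(self, vxid):
        """Called by a would-be waiter.  Returns False if vxid has already ended;
        otherwise ensures a main-table entry exists and returns True."""
        backend_id, _ = vxid
        slot = self.slots[backend_id]  # the vxid names its backend: one slot to check
        with slot.lock:
            if slot.vxid != vxid:
                return False
            if not slot.in_main_table:
                with self.partition_lock:
                    self.partition_acquisitions += 1
                    self.main_table[vxid] = backend_id
                slot.in_main_table = True
            return True
```

In this model a transaction that nobody waits for never touches the partition lock. Only the rare wait pays for the shared lock table, once when the entry is created and once when it is removed.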

It's likely that if we lick this problem, the BufFreelistLock and
BufMappingLocks are going to be the next hot spot.  Of course, we're
ignoring the ten-thousand pound gorilla in the corner, which is that
on write workloads we have a pretty bad contention problem with
WALInsertLock, which I fear will not be so easily addressed.  But one
problem at a time, I guess.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-06-06T06:54:48Z

On 06.06.2011 07:12, Robert Haas wrote:
> I did some further investigation of this.  It appears that more than
> 99% of the lock manager lwlock traffic that remains with this patch
> applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION.  Every SELECT
> statement runs in a separate transaction, and for each new transaction
> we run VirtualXactLockTableInsert(), which takes a lock on the vxid of
> that transaction, so that other processes can wait for it.  That
> requires acquiring and releasing a lock manager partition lock, and we
> have to do the same thing a moment later at transaction end to dump
> the lock.
>
> A quick grep seems to indicate that the only places where we actually
> make use of those VXID locks are in DefineIndex(), when CREATE INDEX
> CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
> expires.  Considering that these are not commonplace events, it seems
> tremendously wasteful to incur the overhead for every transaction.  It
> might be possible to make the lock entry spring into existence "on
> demand" - i.e. if a backend wants to wait on a vxid entry, it creates
> the LOCK and PROCLOCK objects for that vxid.  That presents a few
> synchronization challenges, and we also have to make sure that the
> backend that's just been "given" a lock knows that it needs to release
> it, but those seem like they might be manageable problems, especially
> given the new infrastructure introduced by the current patch, which
> already has to deal with some of those issues.  I'll look into this
> further.

Ah, I remember I saw that vxid lock pop up quite high in an oprofile 
profile recently. I think it was the case of executing a lot of very 
simple prepared queries. So it would be nice to address that, even from 
a single CPU point of view.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-06T09:40:40Z

On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> The approach looks sound to me. It's a fairly isolated patch and we
>> should be considering this for inclusion in 9.1, not wait another
>> year.
>
> That suggestion is completely insane.  The patch is only WIP and full of
> bugs, even according to its author.  Even if it were solid, it is way
> too late to be pushing such stuff into 9.1.  We're trying to ship a
> release, not find ways to cause it to slip more.

In 8.3, you implemented virtual transaction IDs days before we produced
a Release Candidate, against my recommendation.

At that time, I didn't start questioning your sanity. In fact we all
applauded that because it was a great performance gain.

The fact that you disagree with me does not make me insane. Inaction
on this point, resulting in a year's delay, will be considered to be a
gross waste by the majority of objective observers.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-06-06T10:19:54Z

On 06.06.2011 12:40, Simon Riggs wrote:
> On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndquadrant.com> writes:
>>> The approach looks sound to me. It's a fairly isolated patch and we
>>> should be considering this for inclusion in 9.1, not wait another
>>> year.
>>
>> That suggestion is completely insane.  The patch is only WIP and full of
>> bugs, even according to its author.  Even if it were solid, it is way
>> too late to be pushing such stuff into 9.1.  We're trying to ship a
>> release, not find ways to cause it to slip more.
>
> In 8.3, you implemented virtual transaction IDs days before we produced
> a Release Candidate, against my recommendation.

FWIW, this bottleneck was not created by the introduction of virtual 
transaction ids. Before that patch, we just took the lock on the real 
transaction id instead.

> The fact that you disagree with me does not make me insane.

You are not insane, even if your suggestion is.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-06T11:59:58Z

On Mon, Jun 6, 2011 at 2:54 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Ah, I remember I saw that vxid lock pop up quite high in an oprofile profile
> recently. I think it was the case of executing a lot of very simple prepared
> queries. So it would be nice to address that, even from a single CPU point
> of view.

It doesn't seem too hard to do, although I have to think about the
details.  Even though the VXID locks involved are Exclusive locks,
they are actually very much like the "weak" locks that the current
patch accelerates, because the Exclusive lock is taken only by the
VXID owner, and it can therefore be safely assumed that the initial
lock acquisition won't block anything.  Therefore, it's really
unnecessary to touch the primary lock table at transaction start, and
we need only touch it at transaction end if someone is waiting.  However, there's a
fly in the ointment: when someone tries to ShareLock a VXID, we need
to determine whether that VXID is still around and, if so, make an
Exclusive lock entry for it in the primary lock table.  And, unlike
what I'm doing for strong relation locks, it's probably NOT acceptable
for that to acquire and release every per-backend LWLock, because
every place that waits for VXID locks waits for a list of locks in
sequence, so we could end up with O(n^2) behavior.  Now, in theory
that's not a huge problem: the VXID includes the backend ID, so we
ought to be able to figure out which single per-backend LWLock is of
interest and just acquire/release that one.  Unfortunately, it appears
that there's no easy way to go from a backend ID to a PGPROC.  The
backend IDs are offsets into the "ProcState" array, so they give us a
pointer to the backend's sinval state, not its PGPROC.  And while the
PGPROC has a pointer to the sinval info, there's no pointer in the
opposite direction.  Even if there were, we'd probably need to hold
SInvalWriteLock in shared mode to follow it.

That might not be the end of the world, since VXID locks are fairly
infrequently used, but it's certainly a little grotty.  I do rather
wonder if we should be trying to reduce the number of separate places
where we list the running processes.  We have arrays of PGPROC
structures, and then we have one set of pointers to PGPROCs in the
ProcArray, and then we have the ProcState structures for sinval.  I
wonder if there's some way to rearrange all this to simplify the
bookkeeping.

BTW, how do you identify from oprofile that *vxid* locks were the
problem?  I didn't think it could produce that level of detail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-06-06T12:02:07Z

On 06.06.2011 07:12, Robert Haas wrote:
> I did some further investigation of this.  It appears that more than
> 99% of the lock manager lwlock traffic that remains with this patch
> applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION.  Every SELECT
> statement runs in a separate transaction, and for each new transaction
> we run VirtualXactLockTableInsert(), which takes a lock on the vxid of
> that transaction, so that other processes can wait for it.  That
> requires acquiring and releasing a lock manager partition lock, and we
> have to do the same thing a moment later at transaction end to dump
> the lock.
>
> A quick grep seems to indicate that the only places where we actually
> make use of those VXID locks are in DefineIndex(), when CREATE INDEX
> CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
> expires.  Considering that these are not commonplace events, it seems
> tremendously wasteful to incur the overhead for every transaction.  It
> might be possible to make the lock entry spring into existence "on
> demand" - i.e. if a backend wants to wait on a vxid entry, it creates
> the LOCK and PROCLOCK objects for that vxid.  That presents a few
> synchronization challenges, and plus we have to make sure that the
> backend that's just been "given" a lock knows that it needs to release
> it, but those seem like they might be manageable problems, especially
> given the new infrastructure introduced by the current patch, which
> already has to deal with some of those issues.  I'll look into this
> further.

At the moment, the transaction with given vxid acquires an ExclusiveLock 
on the vxid, and anyone who wants to wait for it to finish acquires a 
ShareLock. If we simply reverse that, so that the transaction itself 
takes ShareLock, and anyone wanting to wait on it takes an ExclusiveLock, 
will this fastlock patch bust this bottleneck too?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-06T12:08:04Z

On Mon, Jun 6, 2011 at 8:02 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 06.06.2011 07:12, Robert Haas wrote:
>>
>> I did some further investigation of this.  It appears that more than
>> 99% of the lock manager lwlock traffic that remains with this patch
>> applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION.  Every SELECT
>> statement runs in a separate transaction, and for each new transaction
>> we run VirtualXactLockTableInsert(), which takes a lock on the vxid of
>> that transaction, so that other processes can wait for it.  That
>> requires acquiring and releasing a lock manager partition lock, and we
>> have to do the same thing a moment later at transaction end to dump
>> the lock.
>>
>> A quick grep seems to indicate that the only places where we actually
>> make use of those VXID locks are in DefineIndex(), when CREATE INDEX
>> CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
>> expires.  Considering that these are not commonplace events, it seems
>> tremendously wasteful to incur the overhead for every transaction.  It
>> might be possible to make the lock entry spring into existence "on
>> demand" - i.e. if a backend wants to wait on a vxid entry, it creates
>> the LOCK and PROCLOCK objects for that vxid.  That presents a few
>> synchronization challenges, and plus we have to make sure that the
>> backend that's just been "given" a lock knows that it needs to release
>> it, but those seem like they might be manageable problems, especially
>> given the new infrastructure introduced by the current patch, which
>> already has to deal with some of those issues.  I'll look into this
>> further.
>
> At the moment, the transaction with given vxid acquires an ExclusiveLock on
> the vxid, and anyone who wants to wait for it to finish acquires a
> ShareLock. If we simply reverse that, so that the transaction itself takes
> ShareLock, and anyone wanting to wait on it takes an ExclusiveLock, will this
> fastlock patch bust this bottleneck too?

Not without some further twaddling.  Right now, the fast path only
applies when you are taking a lock < ShareUpdateExclusiveLock on an
unshared relation.  See also the email I just sent on why using the
exact same mechanism might not be such a hot idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-06-06T12:18:20Z

On 06.06.2011 14:59, Robert Haas wrote:
> BTW, how do you identify from oprofile that *vxid* locks were the
> problem?  I didn't think it could produce that level of detail.

It can show the call stack of each call with the --callgraph=n option, 
where you can see what percentage of the calls to LockAcquire come from 
VirtualXactLockTableInsert.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-06T13:16:41Z

On Fri, Jun 3, 2011 at 9:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> There are numerous problems with the code as it stands at this point.
> It crashes if you try to use 2PC, which means the regression tests
> fail; it probably does horrible things if you run out of shared
> memory; pg_locks knows nothing about the new mechanism (arguably, we
> could leave it that way: only locks that can't possibly be conflicting
> with anything can be taken using this mechanism, but it would be nice
> to fix, I think); and there are likely some other gotchas as well.
> Still, the basic mechanism appears to work.
>
> The code is attached, for anyone who may be curious.  Known idiocies
> are marked with "ZZZ".  The design was discussed on the previous
> thread ("reducing the overhead of frequent table locks"), q.v.  There
> are some comments in the patch as well, but more is likely needed.

Updated patch attached.  This one passes the regression tests, and all
known bugs are fixed.  There are still a few debugging leftovers in
the patch, and probably a few other rough edges, but I think this is
now ready for serious review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-06T14:49:25Z

On Mon, Jun 6, 2011 at 11:19 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 06.06.2011 12:40, Simon Riggs wrote:
>>
>> On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>>
>>> Simon Riggs<simon@2ndquadrant.com>  writes:
>>>>
>>>> The approach looks sound to me. It's a fairly isolated patch and we
>>>> should be considering this for inclusion in 9.1, not wait another
>>>> year.
>>>
>>> That suggestion is completely insane.  The patch is only WIP and full of
>>> bugs, even according to its author.  Even if it were solid, it is way
>>> too late to be pushing such stuff into 9.1.  We're trying to ship a
>>> release, not find ways to cause it to slip more.
>>
>> In 8.3, you implemented virtual transactionids days before we produced
>> a Release Candidate, against my recommendation.
>
> FWIW, this bottleneck was not created by the introduction of virtual
> transaction ids. Before that patch, we just took the lock on the real
> transaction id instead.

Of course it wasn't. You've misunderstood completely.

My point was that we have in the past implemented performance changes
to increase scalability at the last minute, and also that our personal
risk perspectives are not always set in stone.

Robert has highlighted the value of this change and it's clearly not
beyond our wit to include it, even if it is beyond our will to do so.


>> The fact that you disagree with me does not make me insane.
>
> You are not insane, even if your suggestion is.

LOL. Your logic is still poor though. :-)

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-06T15:50:34Z

On Mon, Jun 6, 2011 at 10:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> My point was that we have in the past implemented performance changes
> to increase scalability at the last minute, and also that our personal
> risk perspectives are not always set in stone.
>
> Robert has highlighted the value of this change and it's clearly not
> beyond our wit to include it, even if it is beyond our will to do so.

So, at the risk of totally derailing this thread -- what this boils
down to is a philosophical disagreement.

It seems to me (and, I think, to Tom and Heikki and others as well)
that it's not possible to keep on making changes to the release right
up until the last minute and then expect the release to be of high
quality.  If we keep committing new features, then we'll keep
introducing new bugs.  The only hope of making the bug count go down
at some point is to stop making changes that aren't bug fixes.  We
could come up with some complex procedure for determining whether a
patch is important enough and non-invasive enough to bypass the normal
deadline, but that would probably lead to a lot more arguing about
procedure, and realistically, it's still going to increase the bug
count at least somewhat.  IMHO, it's better to just have a deadline,
and stuff either makes it or it doesn't.  I realize we haven't always
adhered to the principle in the past, but at least IMV that's not a
mistake we want to continue repeating.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-06-06T16:14:54Z

Robert Haas <robertmhaas@gmail.com> wrote:

> IMHO, it's better to just have a deadline, and stuff either makes
> it or it doesn't.  I realize we haven't always adhered to the
> principle in the past, but at least IMV that's not a mistake we
> want to continue repeating.

+1

I've said it before, but I think it bears repeating, that deferring
this to 9.2 doesn't mean that it comes out in a production release
12 months later -- unless we continue to repeat this mistake
endlessly.  It means that this release comes out closer to when we
said it would -- for the sake of argument let's hypothesize one
month.  So by holding the line on such inclusions all the current
9.1 features come out one month sooner, and this feature comes out
11 months later than it would have if we'd put it into 9.1.  With
some features we consider squeezing in, it would be more like
delaying everything which is done by three months so that one
feature gets out nine months earlier.
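The arithmetic can be written out under the assumptions stated above: a 12-month cycle, a slip of d months caused by squeezing a feature in, and a following release date that stays fixed. The symbols T (the 9.1 date if we hold the line) and d are introduced here only for illustration.

```latex
\begin{aligned}
\text{9.1 ships at } &T \text{ (hold the line)} \quad\text{or}\quad T + d \text{ (squeeze the feature in)}\\
\text{existing 9.1 features: } &(T + d) - T = d \text{ months sooner by holding the line}\\
\text{squeezed feature: } &(T + 12) - (T + d) = 12 - d \text{ months later by holding the line}\\
d = 1:\ &12 - 1 = 11, \qquad d = 3:\ 12 - 3 = 9
\end{aligned}
```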

Perhaps the best way to describe the suggestion that this be
included in 9.1 isn't that it's an insane suggestion; but that it's
a suggestion which, if adopted, would be likely to drive those who
are striving for a more organized development and release process
insane.

Or one could look at it in a cost/benefit format -- major features
delivered per year go up by holding the line, administrative costs
are reduced, and people who are focusing on release stability get
more months per year to do development.

-Kevin

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-06T17:13:48Z

On Mon, Jun 6, 2011 at 5:14 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

> Perhaps the best way to describe the suggestion that this be
> included in 9.1 isn't that it's an insane suggestion; but that it's
> a suggestion which, if adopted, would be likely to drive those who
> are striving for a more organized development and release process
> insane.

Kevin, I respect your opinion and thank you for stating your case
without insults.

In this discussion it should be recognised that I have personally
driven the development of a more organized dev and release process. I
requested and argued for stated release dates to assist resource
planning and suggested commitfests as a mechanism to reduce the
feedback times for developers. I also provided the first guide to
patch reviews we published. So I am a proponent of planning and
organization, though some would like to claim I see things
differently.

The major problems of the dev process are now solved, yet "more
organization" is still being discussed, as if "more" == "better". What
I hear is "changed organization", and I am not certain that every
"change" == "better" for a project I already see as a leading example
of how to produce great software.

Releasing regularly is important, but it is not more important than
everything else. Ever. Period. Trying to force that will definitely make you
mad, I can see. I request that people stop trying to enforce a process
so strictly that sensible and important change cannot take place when
needed.

> Or one could look at it in a cost/benefit format -- major features
> delivered per year go up by holding the line, administrative costs
> are reduced, and people who are focusing on release stability get
> more months per year to do development.

I do look at it in a cost/benefit format. The problem is the above
statement has nothing user-centric about it.

The cost to us is a few days work and the benefit is a whole year's
worth of increased performance for our user base, which has a hardware
equivalent well into the millions of dollars.

And that's ignoring the users that would've switched to using Postgres
earlier, and those who might leave because of competitive comparison.

I won't say any more about this because I am in no way a beneficiary
from this and even my opinion is given unpaid.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Josh Berkus <josh@agliodbs.com> — 2011-06-06T18:49:29Z

> That's an improvement of about ~3.5x.  According to the vmstat output,
> when running without the patch, the CPU state was about 40% idle.
> With the patch, it dropped down to around 6%.

Wow!  That's fantastic.

Jignesh, are you in a position to test any of Robert's work using DBT or
other benchmarks?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: reducing the overhead of frequent table locks - now, with WIP patch

Dimitri Fontaine <dimitri@2ndquadrant.fr> — 2011-06-06T19:12:54Z

Robert Haas <robertmhaas@gmail.com> writes:
>   IMHO, it's better to just have a deadline,

Well, that's the fine point we're now talking about.

I still think that we should try to make the best release possible.
And if that means including changes at beta time because that's when
someone got around to doing them, so be it, though they should really
be worth it.

So, to the question “do we want hard deadlines?” I think the answer is
“no”, to “do we need hard deadlines?”, my answer is still “no”, and to
the question “should this very change be considered this late?” my
answer is yes.

Because it really changes the game for PostgreSQL users.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: reducing the overhead of frequent table locks - now, with WIP patch

Dave Page <dpage@pgadmin.org> — 2011-06-06T19:24:33Z

On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> So, to the question “do we want hard deadlines?” I think the answer is
> “no”, to “do we need hard deadlines?”, my answer is still “no”, and to
> the question “should this very change be considered this late?” my
> answer is yes.
>
> Because it really changes the game for PostgreSQL users.

Much as I hate to say it (I too want to keep our schedule as
predictable and organised as possible), I have to agree. Assuming the
patch is good, I think this is something we should push into 9.1. It
really could be a game changer.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> — 2011-06-06T19:40:14Z

On 06/06/2011 09:24 PM, Dave Page wrote:
> On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>> So, to the question “do we want hard deadlines?” I think the answer is
>> “no”, to “do we need hard deadlines?”, my answer is still “no”, and to
>> the question “should this very change be considered this late?” my
>> answer is yes.
>>
>> Because it really changes the game for PostgreSQL users.
> 
> Much as I hate to say it (I too want to keep our schedule as
> predictable and organised as possible), I have to agree. Assuming the
> patch is good, I think this is something we should push into 9.1. It
> really could be a game changer.

I disagree - the proposed patch may provide a very significant
improvement for a certain workload type (nothing less but nothing more),
but it was posted way after -BETA and I'm not sure we yet understand all
implications of the changes.
We also have to consider that the underlying issues are known problems
for multiple years/releases so I don't think there is a particular rush
to force them into a particular release (as in 9.1).

Stefan

Re: reducing the overhead of frequent table locks - now, with WIP patch

Stephen Frost <sfrost@snowman.net> — 2011-06-06T19:44:41Z

* Dave Page (dpage@pgadmin.org) wrote:
> Much as I hate to say it (I too want to keep our schedule as
> predictable and organised as possible), I have to agree. Assuming the
> patch is good, I think this is something we should push into 9.1. It
> really could be a game changer.

So, with folks putting up that we should hammer this patch out and
force it into 9.1..  What should our new release date for 9.1 be?  What
about other patches that didn't make it into 9.1?  What about the
upcoming CommitFest that we've asked people to start working on?

If we're going to start putting in changes like this, I'd suggest that
we try and target something like September for 9.1 to actually be
released.  Playing with the lock management isn't something we want to
be doing lightly and I think we definitely need to have serious testing
of this, similar to what has been done for the SSI changes, before we're
going to be able to release it.

I don't agree that we should delay 9.1, but if people really want this
in, then we need to figure out what the new schedule is going to be.

	Thanks,

		Stephen

Re: reducing the overhead of frequent table locks - now, with WIP patch

Dave Page <dpage@pgadmin.org> — 2011-06-06T19:50:01Z

On Mon, Jun 6, 2011 at 8:40 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
> On 06/06/2011 09:24 PM, Dave Page wrote:
>> On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>>> So, to the question “do we want hard deadlines?” I think the answer is
>>> “no”, to “do we need hard deadlines?”, my answer is still “no”, and to
>>> the question “should this very change be considered this late?” my
>>> answer is yes.
>>>
>>> Because it really changes the game for PostgreSQL users.
>>
>> Much as I hate to say it (I too want to keep our schedule as
>> predictable and organised as possible), I have to agree. Assuming the
>> patch is good, I think this is something we should push into 9.1. It
>> really could be a game changer.
>
> I disagree - the proposed patch may provide a very significant
> improvement for a certain workload type (nothing less but nothing more),
> but it was posted way after -BETA and I'm not sure we yet understand all
> implications of the changes.

We certainly need to be happy with the implications if we were to make
such a decision.

> We also have to consider that the underlying issues are known problems
> for multiple years/releases so I don't think there is a particular rush
> to force them into a particular release (as in 9.1).

No, there's no *technical* reason we need to do this, as there would
be if it were a bug fix for example. I would just like to see us
narrow the gap with our competitors sooner rather than later, *if*
we're a) happy with the change, and b) we're talking about a minimal
delay (which we may be - Robert says he thinks the patch is good, so
with another review and beta testing....).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Josh Berkus <josh@agliodbs.com> — 2011-06-06T19:50:29Z

On 6/6/11 12:12 PM, Dimitri Fontaine wrote:
> So, to the question “do we want hard deadlines?” I think the answer is
> “no”, to “do we need hard deadlines?”, my answer is still “no”, and to
> the question “should this very change be considered this late?” my
> answer is yes.

I could not disagree more strongly.  We're in *beta* now.  It's not like
the last CF closed a couple weeks ago.  Heck, I'm about to open the
first CF for 9.2 in just over a week.

Also, a patch like this needs several months of development, discussion
and testing in order to fix the issues Robert already identified and
make sure it doesn't break something fundamental to concurrency.  That
would mean delaying the release until at least November, screwing over
all the users who don't care about this patch.

There will *always* be another really cool patch.  If we keep delaying
release to get in one more patch, then we never release.  At some point
you just have to take what you have and call it a release, and we are
months past that point.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: reducing the overhead of frequent table locks - now, with WIP patch

Andrew Dunstan <andrew@dunslane.net> — 2011-06-06T19:51:20Z

On 06/06/2011 03:24 PM, Dave Page wrote:
> On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine<dimitri@2ndquadrant.fr>  wrote:
>> So, to the question “do we want hard deadlines?” I think the answer is
>> “no”, to “do we need hard deadlines?”, my answer is still “no”, and to
>> the question “should this very change be considered this late?” my
>> answer is yes.
>>
>> Because it really changes the game for PostgreSQL users.
> Much as I hate to say it (I too want to keep our schedule as
> predictable and organised as possible), I have to agree. Assuming the
> patch is good, I think this is something we should push into 9.1. It
> really could be a game changer.

I'm not a fan of hard and fast deadlines for releases - it puts too much 
pressure on us to release before we might be ready. But I'm also not a 
fan of totally abandoning our established processes, which accepting 
this would. I don't mind bending the rules a bit occasionally; I do mind 
throwing them out the door.

cheers

andrew

Re: reducing the overhead of frequent table locks - now, with WIP patch

Dave Page <dpage@pgadmin.org> — 2011-06-06T19:52:43Z

On Mon, Jun 6, 2011 at 8:44 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Dave Page (dpage@pgadmin.org) wrote:
>> Much as I hate to say it (I too want to keep our schedule as
>> predictable and organised as possible), I have to agree. Assuming the
>> patch is good, I think this is something we should push into 9.1. It
>> really could be a game changer.
>
> So, with folks putting up that we should hammer this patch out and
> force it into 9.1..  What should our new release date for 9.1 be?  What
> about other patches that didn't make it into 9.1?  What about the
> upcoming CommitFest that we've asked people to start working on?
>
> If we're going to start putting in changes like this, I'd suggest that
> we try and target something like September for 9.1 to actually be
> released.  Playing with the lock management isn't something we want to
> be doing lightly and I think we definitely need to have serious testing
> of this, similar to what has been done for the SSI changes, before we're
> going to be able to release it.

Completely aside from the issue at hand, aren't we looking at a
September release by now anyway (assuming we have to avoid late
July/August as we usually do)?


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Jignesh Shah <jkshah@gmail.com> — 2011-06-06T19:54:07Z

On Mon, Jun 6, 2011 at 2:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> That's an improvement of about ~3.5x.  According to the vmstat output,
>> when running without the patch, the CPU state was about 40% idle.
>> With the patch, it dropped down to around 6%.
>
> Wow!  That's fantastic.
>
> Jignesh, are you in a position to test any of Robert's work using DBT or
> other benchmarks?

I missed the discussion. Can you send me the patch (will that work
with 9.1 beta?)? I can do a before and after with DBT2 and let you
know. I can also test it with the sysbench read test, which also has
a relation locking bottleneck.

Thanks.

Regards,
Jignesh

Re: reducing the overhead of frequent table locks - now, with WIP patch

Christopher Browne <cbbrowne@gmail.com> — 2011-06-06T19:59:47Z

On Mon, Jun 6, 2011 at 5:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The cost to us is a few days work and the benefit is a whole year's
> worth of increased performance for our user base, which has a hardware
> equivalent well into the millions of dollars.

I doubt that this is an accurate reflection of the cost.

What was presented by Robert Haas was a "proof of concept," and he
pointed out that it had numerous problems.  To requote:

"There are numerous problems with the code as it stands at this point.
It crashes if you try to use 2PC, which means the regression tests
fail; it probably does horrible things if you run out of shared
memory; pg_locks knows nothing about the new mechanism (arguably, we
could leave it that way: only locks that can't possibly be conflicting
with anything can be taken using this mechanism, but it would be nice
to fix, I think); and there are likely some other gotchas as well."

Turning this into something ready for production deployment in 9.1
would require a non-trivial amount of additional effort, and would
likely have the adverse effect of deferring the release of 9.1, as
well as of further deferring all the effects of the patches submitted
for the latest commitfest
(<https://commitfest.postgresql.org/action/commitfest_view?id=10>),
since this defers release of 9.2, as well.

While the patch is a fine one, in that it has interesting effects, it
seems like a way wiser idea to me to let it go through the 9.2
process, so that it has 6 months' worth of buildfarm runs before it
gets deployed "for real" just like all the other items in the 2011-06
CommitFest.

Note that it may lead to further discoveries, so that perhaps, in the
9.2 series, we'd see further improvements due to things that are
discovered as a further consequence of testing
<https://commitfest.postgresql.org/action/patch_view?id=572>.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-06T20:04:08Z

On Mon, Jun 6, 2011 at 3:59 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
> On Mon, Jun 6, 2011 at 5:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The cost to us is a few days work and the benefit is a whole year's
>> worth of increased performance for our user base, which has a hardware
>> equivalent well into the millions of dollars.
>
> I doubt that this is an accurate reflection of the cost.
>
> What was presented by Robert Haas was a "proof of concept," and he
> pointed out that it had numerous problems.  To requote:
>
> "There are numerous problems with the code as it stands at this point.
> It crashes if you try to use 2PC, which means the regression tests
> fail; it probably does horrible things if you run out of shared
> memory; pg_locks knows nothing about the new mechanism (arguably, we
> could leave it that way: only locks that can't possibly be conflicting
> with anything can be taken using this mechanism, but it would be nice
> to fix, I think); and there are likely some other gotchas as well."

The latest version of the patch is in much better shape:

http://archives.postgresql.org/pgsql-hackers/2011-06/msg00403.php

But this is not intended as disparagement of the rest of your argument.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-06-06T20:14:25Z

Stephen Frost <sfrost@snowman.net> wrote:
 
> if people really want this in, then we need to figure out what the
> new schedule is going to be.
 
I suggest June, 2012.  That way we can get a whole bunch more really
cool patches in, and the users won't have to wait for 9.2 to get
them.
 
-Kevin

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-06T22:15:12Z

On Mon, Jun 6, 2011 at 8:52 PM, Dave Page <dpage@pgadmin.org> wrote:
> On Mon, Jun 6, 2011 at 8:44 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> * Dave Page (dpage@pgadmin.org) wrote:
>>> Much as I hate to say it (I too want to keep our schedule as
>>> predictable and organised as possible), I have to agree. Assuming the
>>> patch is good, I think this is something we should push into 9.1. It
>>> really could be a game changer.
>>
>> So, with folks arguing that we should hammer this patch out and
>> force it into 9.1: what should our new release date for 9.1 be?  What
>> about other patches that didn't make it into 9.1?  What about the
>> upcoming CommitFest that we've asked people to start working on?
>>
>> If we're going to start putting in changes like this, I'd suggest that
>> we try and target something like September for 9.1 to actually be
>> released.  Playing with the lock management isn't something we want to
>> be doing lightly and I think we definitely need to have serious testing
>> of this, similar to what has been done for the SSI changes, before we're
>> going to be able to release it.
>
> Completely aside from the issue at hand, aren't we looking at a
> September release by now anyway (assuming we have to avoid late
> July/August as we usually do)?

I see no reason to delay from a July release as has long been planned.

What open items are genuine blockers?

If we need deadlines anywhere, it's in beta and final release; otherwise
we all just sit around shrugging and saying "another week I guess".

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Alvaro Herrera <alvherre@commandprompt.com> — 2011-06-06T22:29:06Z

Excerpts from Dimitri Fontaine's message of Mon Jun 06 15:12:54 -0400 2011:

> So, to the question “do we want hard deadlines?” I think the answer is
> “no”, to “do we need hard deadlines?”, my answer is still “no”, and to
> the question “should this very change be considered this late?” my
> answer is yes.
> 
> Because it really changes the game for PostgreSQL users.

Maybe so, but the problem is that the patch is really WIP at this point
and it obviously still needs a lot of work, judging from the patch
author's comments.

I note that if 2nd Quadrant is interested in having a game-changing
platform without having to wait a full year for 9.2, they can obviously
distribute a modified version of Postgres that integrates Robert's
patch.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: reducing the overhead of frequent table locks - now, with WIP patch

Alvaro Herrera <alvherre@commandprompt.com> — 2011-06-06T22:53:38Z

Excerpts from Robert Haas's message of Fri Jun 03 09:17:08 -0400 2011:
> I've now spent enough time working on this issue now to be convinced
> that the approach has merit, if we can work out the kinks.  I'll start
> with some performance numbers.

I hereby recommend that people with patches such as this one refrain
from posting them during the last weeks before a release, and wait
until the release has actually taken place.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-06T23:29:00Z

Dave Page <dpage@pgadmin.org> writes:
> On Mon, Jun 6, 2011 at 8:44 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> If we're going to start putting in changes like this, I'd suggest that
>> we try and target something like September for 9.1 to actually be
>> released. Playing with the lock management isn't something we want to
>> be doing lightly and I think we definitely need to have serious testing
>> of this, similar to what has been done for the SSI changes, before we're
>> going to be able to release it.

> Completely aside from the issue at hand, aren't we looking at a
> September release by now anyway (assuming we have to avoid late
> July/August as we usually do)?

Very possibly.  So if we add this in, we're talking November or December
instead of September.  You can't argue that July/August will be lost
time for one development path but not another.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-06T23:43:30Z

On Mon, Jun 6, 2011 at 6:53 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of Fri Jun 03 09:17:08 -0400 2011:
>> I've now spent enough time working on this issue now to be convinced
>> that the approach has merit, if we can work out the kinks.  I'll start
>> with some performance numbers.
>
> I hereby recommend that people with patches such as this one refrain
> from posting them during the last weeks before a release, and wait
> until the release has actually taken place.

%@#!

Next time I'll be sure to only post my patches during beta if they suck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Stephen Frost <sfrost@snowman.net> — 2011-06-07T00:55:24Z

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> I see no reason to delay from a July release as has long been planned.
> 
> What open items are genuine blockers?
> 
> If we need deadlines anywhere, it's in beta and final release; otherwise
> we all just sit around shrugging and saying "another week I guess".

I'm a bit confused by your response here.  Clearly, if we're going to
try and get this patch cleaned up and committable, then it's an open
item and a genuine blocker with a couple of months of work associated
with it.  If we don't try to shove this patch in then perhaps we can
get a release out in the next month or so.  It was my understanding that
we're in beta and final release right now, and we're trying to hit
deadlines now which are associated with that.  Adding this patch into
the queue of "things to be done before release" moves us back out of
the beta testing and final release stage.

In other words, if you're arguing to stick to a release soon, then it
doesn't make sense, to me anyway, to advocate applying a mostly
untested patch which changes a great deal of very important core logic.

	Thanks,

		Stephen

Re: reducing the overhead of frequent table locks - now, with WIP patch

Jignesh Shah <jkshah@gmail.com> — 2011-06-07T03:20:23Z

On Mon, Jun 6, 2011 at 2:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> That's an improvement of about ~3.5x.  According to the vmstat output,
>> when running without the patch, the CPU state was about 40% idle.
>> With the patch, it dropped down to around 6%.
>
> Wow!  That's fantastic.
>
> Jignesh, are you in a position to test any of Robert's work using DBT or
> other benchmarks?
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
>

Okay, I tried it out with the sysbench read-scaling test.
Note that I had tried that test earlier on 9.0:
http://jkshah.blogspot.com/2010/11/postgresql-90-simple-select-scaling.html

On that test I found that running on anything bigger than
4 cores led to decreased performance.
Redoing the same test with 100 users on a 4-vCPU virtual machine with
8GB RAM and 1M rows, I get:
   transactions:                        17870082 (59566.46 per sec.)
which is in line with the best number on 9.0.
This test hardly had any idle CPU.

However, where it made a huge impact was running the same test on my
8-vCPU VM with 8GB RAM, where I get:
    transactions:                        33274594 (110914.85 per sec.)

which is a whopping 1.8x gain in throughput for 2x the CPUs (from 4 to
8 vCPUs). Idle CPU was under 7%, which, considering that the "useful"
work is in line with my expectations, is really impressive.
(Also, the last time I tested MySQL it was at around 95K or so on the
same test.)

Also note that in my earlier tests 60K was the maximum irrespective of
the hardware I threw at it. With this fastlock patch that no longer
seems to be a problem :-)

This gain is impressive.
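As a rough sanity check on the two sysbench results above, the throughput ratio and the implied fraction of linear scaling can be computed directly. This is a minimal sketch: the numbers are copied from the output lines above, and "efficiency" here is simply speedup divided by the 2x increase in vCPUs, which is an assumption about how to read "1.8x scaling for 2x".

```python
# Figures copied from the sysbench output quoted above.
tps_4_vcpu = 59566.46    # 4-vCPU VM, transactions per second
tps_8_vcpu = 110914.85   # 8-vCPU VM, transactions per second
txns_4_vcpu = 17870082   # total transactions, 4-vCPU run
txns_8_vcpu = 33274594   # total transactions, 8-vCPU run

speedup = tps_8_vcpu / tps_4_vcpu   # throughput gain from doubling vCPUs
efficiency = speedup / (8 / 4)      # fraction of ideal linear scaling

# Implied run length: total transactions divided by the per-second rate.
secs_4 = txns_4_vcpu / tps_4_vcpu
secs_8 = txns_8_vcpu / tps_8_vcpu

print(f"speedup    {speedup:.2f}x")
print(f"efficiency {efficiency:.0%}")
print(f"run length {secs_4:.0f}s and {secs_8:.0f}s")
```

This works out to a speedup of about 1.86x, roughly 93% of linear scaling, and both runs come to about 300 seconds, so the two measurements appear to be directly comparable.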

Next step: DBT-2.

Regards,
Jignesh

Re: reducing the overhead of frequent table locks - now, with WIP patch

Dave Page <dpage@pgadmin.org> — 2011-06-07T07:09:13Z

On Tue, Jun 7, 2011 at 12:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dave Page <dpage@pgadmin.org> writes:
>> On Mon, Jun 6, 2011 at 8:44 PM, Stephen Frost <sfrost@snowman.net> wrote:
>>> If we're going to start putting in changes like this, I'd suggest that
>>> we try and target something like September for 9.1 to actually be
>>> released.  Playing with the lock management isn't something we want to
>>> be doing lightly and I think we definitely need to have serious testing
>>> of this, similar to what has been done for the SSI changes, before we're
>>> going to be able to release it.
>
>> Completely aside from the issue at hand, aren't we looking at a
>> September release by now anyway (assuming we have to avoid late
>> July/August as we usually do)?
>
> Very possibly.  So if we add this in, we're talking November or December
> instead of September.  You can't argue that July/August will be lost
> time for one development path but not another.

That would depend on 2 things - a) whether testing and review of this
single patch would really add 2 - 3 months to the schedule (I'm no
expert on our locking, but I suspect it would not), and b) whether
there are people around over the summer who could test/review. The
reason we usually skip the summer isn't actually a wholesale lack of
people - it's because it's not so good from a publicity perspective,
and it's hard to get all the packagers around at the same time.


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Stephen Frost <sfrost@snowman.net> — 2011-06-07T13:24:11Z

* Alvaro Herrera (alvherre@commandprompt.com) wrote:
> I note that if 2nd Quadrant is interested in having a game-changing
> platform without having to wait a full year for 9.2, they can obviously
> distribute a modified version of Postgres that integrates Robert's
> patch.

Having thought about this, I've got to agree with Alvaro on this one.
The people who need this patch are likely to pull it down and patch it
in and use it, regardless of whether it's in a release or not.  My money
says Treat's already got it running on some massive prod system that he
supports ( ;) ).

If we get it into the first CF of 9.2 then people are going to be even
more likely to pull it down and back-patch it into 9.1.  As soon as we
wrap up CF1 and put out our first alpha, the performance testers will
have something to point at and say "look!  PG scales *even better* now!"
and they're not going to particularly care that it's an alpha and the
blog-o-sphere isn't going to either, especially if we can say "and it'll
be in the next release which is scheduled for May".

So, all-in-all, -1 from me on trying to get this into 9.1.  Let's get
9.1 done and out the door already, hopefully before summer saps away
*too* many resources.

	Thanks,

		Stephen

Re: reducing the overhead of frequent table locks - now, with WIP patch

Joshua D. Drake <jd@commandprompt.com> — 2011-06-07T15:56:45Z

On 06/06/2011 04:43 PM, Robert Haas wrote:
> On Mon, Jun 6, 2011 at 6:53 PM, Alvaro Herrera
> <alvherre@commandprompt.com>  wrote:
>> Excerpts from Robert Haas's message of Fri Jun 03 09:17:08 -0400 2011:
>>> I've now spent enough time working on this issue now to be convinced
>>> that the approach has merit, if we can work out the kinks.  I'll start
>>> with some performance numbers.
>>
>> I hereby recommend that people with patches such as this one refrain
>> from posting them during the last weeks before a release, and wait
>> until the release has actually taken place.
>
> %@#!
>
> Next time I'll be sure to only post my patches during beta if they suck.
>

I think Alvaro's point isn't directed at you Robert but at the idea that 
this should be applied to 9.1.

Sincerely,

Joshua D. Drake

-- 
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-07T16:51:28Z

On Mon, Jun 6, 2011 at 8:50 PM, Dave Page <dpage@pgadmin.org> wrote:
> On Mon, Jun 6, 2011 at 8:40 PM, Stefan Kaltenbrunner
> <stefan@kaltenbrunner.cc> wrote:
>> On 06/06/2011 09:24 PM, Dave Page wrote:
>>> On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>>>> So, to the question “do we want hard deadlines?” I think the answer is
>>>> “no”, to “do we need hard deadlines?”, my answer is still “no”, and to
>>>> the question “should this very change be considered this late?” my
>>>> answer is yes.
>>>>
>>>> Because it really changes the game for PostgreSQL users.
>>>
>>> Much as I hate to say it (I too want to keep our schedule as
>>> predictable and organised as possible), I have to agree. Assuming the
>>> patch is good, I think this is something we should push into 9.1. It
>>> really could be a game changer.
>>
>> I disagree - the proposed patch may provide a very significant
>> improvement for a certain workload type (nothing less but nothing more),
>> but it was posted way after -BETA and I'm not sure we yet understand all
>> implications of the changes.
>
> We certainly need to be happy with the implications if we were to make
> such a decision.
>
>> We also have to consider that the underlying issues are known problems
>> for multiple years and releases, so I don't think there is a particular rush
>> to force them into a particular release (as in 9.1).
>
> No, there's no *technical* reason we need to do this, as there would
> be if it were a bug fix for example. I would just like to see us
> narrow the gap with our competitors sooner rather than later, *if*
> we're a) happy with the change, and b) we're talking about a minimal
> delay (which we may be - Robert says he thinks the patch is good, so
> with another review and beta testing....).

Stefan/Robert's observation that we perform a
VirtualXactLockTableInsert() to no real benefit is a good one.

It leads to the following simple patch to remove one lock table hit
per transaction. Its impact on the LockMgr locks is a lot smaller, but
it will still be substantial. Performance tests, please?

This patch is much less invasive and has impact only on CREATE INDEX
CONCURRENTLY and Hot Standby. It's taken me about 2 hours to write and
test and there's no way it will cause any delay at all to the release
schedule. (Though I'm sure Robert can improve it).

If we combine this patch with Koichi-san's recommended changes to the
number of lock partitions, we will have considerable impact for 9.1.
Robert will still get his day in the sun, just with 9.2.
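Why more lock partitions help less than one might hope for this particular benchmark can be shown with a toy model. The sketch assumes, as described in the first message of this thread, that each lock is assigned to a partition by hashing its lock tag modulo the partition count; the string tags and the SHA-256 stand-in hash are illustrative assumptions, not PostgreSQL's actual LOCKTAG hashing. When every backend locks the same relation, all requests land in one partition no matter how many partitions exist; only a workload spread across many relations benefits.

```python
import hashlib
from collections import Counter
from typing import List


def partition_of(lock_tag: str, num_partitions: int) -> int:
    """Map a lock tag to a partition (illustrative stand-in hash)."""
    digest = hashlib.sha256(lock_tag.encode()).digest()
    return int.from_bytes(digest[:8], "little") % num_partitions


def busiest_share(requests: List[str], num_partitions: int) -> float:
    """Fraction of all lock requests that hit the most-loaded partition."""
    counts = Counter(partition_of(tag, num_partitions) for tag in requests)
    return max(counts.values()) / len(requests)


# 36 backends all locking the same hot relation (hypothetical tag name).
hot = ["relation:pgbench_accounts"] * 36
# 1000 requests spread across 1000 distinct (hypothetical) relations.
spread = [f"relation:{i}" for i in range(1000)]

for n in (16, 64, 256):
    print(n, busiest_share(hot, n), round(busiest_share(spread, n), 3))
```

In this model the hot-relation share stays at 1.0 for every partition count, which matches the observation at the start of the thread that adding partitions does not help when all backends are contending for the same object.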

This way we get something now *and* something later, while the risk
minimisers will have succeeded in protecting the code. A compromise
for everyone.

Please consider this as a serious proposal for tuning in 9.1.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T16:53:11Z

On Tue, Jun 7, 2011 at 12:51 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jun 6, 2011 at 8:50 PM, Dave Page <dpage@pgadmin.org> wrote:
>> On Mon, Jun 6, 2011 at 8:40 PM, Stefan Kaltenbrunner
>> <stefan@kaltenbrunner.cc> wrote:
>>> On 06/06/2011 09:24 PM, Dave Page wrote:
>>>> On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>>>>> So, to the question “do we want hard deadlines?” I think the answer is
>>>>> “no”, to “do we need hard deadlines?”, my answer is still “no”, and to
>>>>> the question “should this very change be considered this late?” my
>>>>> answer is yes.
>>>>>
>>>>> Because it really changes the game for PostgreSQL users.
>>>>
>>>> Much as I hate to say it (I too want to keep our schedule as
>>>> predictable and organised as possible), I have to agree. Assuming the
>>>> patch is good, I think this is something we should push into 9.1. It
>>>> really could be a game changer.
>>>
>>> I disagree - the proposed patch may provide a very significant
>>> improvement for a certain workload type (nothing less but nothing more),
>>> but it was posted way after -BETA and I'm not sure we yet understand all
>>> implications of the changes.
>>
>> We certainly need to be happy with the implications if we were to make
>> such a decision.
>>
>>> We also have to consider that the underlying issues are known problems
>>> for multiple years and releases, so I don't think there is a particular rush
>>> to force them into a particular release (as in 9.1).
>>
>> No, there's no *technical* reason we need to do this, as there would
>> be if it were a bug fix for example. I would just like to see us
>> narrow the gap with our competitors sooner rather than later, *if*
>> we're a) happy with the change, and b) we're talking about a minimal
>> delay (which we may be - Robert says he thinks the patch is good, so
>> with another review and beta testing....).
>
> Stefan/Robert's observation that we perform a
> VirtualXactLockTableInsert() to no real benefit is a good one.
>
> It leads to the following simple patch to remove one lock table hit
> per transaction. Its impact on the LockMgr locks is a lot smaller, but
> it will still be substantial. Performance tests, please?
>
> This patch is much less invasive and has impact only on CREATE INDEX
> CONCURRENTLY and Hot Standby. It's taken me about 2 hours to write and
> test and there's no way it will cause any delay at all to the release
> schedule. (Though I'm sure Robert can improve it).
>
> If we combine this patch with Koichi-san's recommended changes to the
> number of lock partitions, we will have considerable impact for 9.1.
> Robert will still get his day in the sun, just with 9.2.
>
> This way we get something now *and* something later, while the risk
> minimisers will have succeeded in protecting the code. A compromise
> for everyone.
>
> Please consider this as a serious proposal for tuning in 9.1.

You seem to have completely ignored the reason why it works that way
in the first place, which is that there is otherwise a risk of
undetected deadlock.
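The deadlock risk Robert mentions can be illustrated with a toy waits-for graph. A deadlock detector can only find cycles made of waits it can see. This sketch assumes, for illustration, that if one transaction waits on another's virtual transaction by some means other than a lock-table entry, the corresponding edge never appears in the detector's graph. The transaction names and the scenario are hypothetical, and the code is a generic depth-first cycle check, not PostgreSQL's deadlock detector.

```python
from typing import Dict, List

# Toy waits-for graph: an edge A -> B means "A is waiting for something B holds".
WaitsFor = Dict[str, List[str]]


def has_cycle(graph: WaitsFor) -> bool:
    """Depth-first search for a cycle, i.e. a deadlock, in the waits-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge to the current path: cycle
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


# Hypothetical scenario: T1 waits for T2's virtual transaction to finish,
# while T2 waits for a table lock that T1 holds.
all_waits: WaitsFor = {"T1": ["T2"], "T2": ["T1"]}

# Same situation, but T1's wait on T2 was never recorded in the lock table,
# so the detector's view is missing that edge.
visible_waits: WaitsFor = {"T2": ["T1"]}

print(has_cycle(all_waits))      # True: deadlock detected
print(has_cycle(visible_waits))  # False: same deadlock, undetected
```

Both transactions are stuck in either case; the only difference is whether the wait that closes the cycle is visible to the detector.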

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T16:58:49Z

On Tue, Jun 7, 2011 at 11:56 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On 06/06/2011 04:43 PM, Robert Haas wrote:
>>
>> On Mon, Jun 6, 2011 at 6:53 PM, Alvaro Herrera
>> <alvherre@commandprompt.com>  wrote:
>>>
>>> Excerpts from Robert Haas's message of Fri Jun 03 09:17:08 -0400 2011:
>>>>
>>>> I've now spent enough time working on this issue now to be convinced
>>>> that the approach has merit, if we can work out the kinks.  I'll start
>>>> with some performance numbers.
>>>
>>> I hereby recommend that people with patches such as this one refrain
>>> from posting them during the last weeks before a release, and wait
>>> until the release has actually taken place.
>>
>> %@#!
>>
>> Next time I'll be sure to only post my patches during beta if they suck.
>>
>
> I think Alvaro's point isn't directed at you Robert but at the idea that
> this should be applied to 9.1.

Oh, I get that.  I'm just dismayed that we can't have a discussion
about the patch without getting sidetracked into a conversation about
whether we should throw feature freeze out the window.  If posting
patches that do interesting things during beta results in everyone
ignoring both the work that needs to be done to get from beta to final
release, and the patch itself, in favor of talking about the release
schedule, then I think at the next developer meeting we're going to
get to hear Tom argue that overlapping the end of beta with the
beginning of the next release cycle is a mistake and we should go back
to the old system where we yell at everyone to shut up unless they're
helping test or fix bugs.  Since that overlap is going to (hopefully)
allow this patch to get into the tree ~2-3 months SOONER than it would
have under the old system, I would be unhappy to see it abolished.

Everyone who is arguing for the inclusion of this patch in 9.1 should
take a minute to think about the following fact: If the PostgreSQL
development process does not work for Tom, it does not work.  Full
stop.  We all know that Tom is conservative with respect to release
management, but we also know that his output is enormous, that he
fixes virtually all of the bugs that *get* fixed, and that our
well-deserved reputation for high quality releases is in large part
attributable to him.  We will not be better off if we design a process
that leaves him cold.  The fact that Alvaro, Heikki, Andrew, Kevin,
and myself don't like the proposed process either is just icing on the
cake.  And I use the term "process" loosely, because what's really
being proposed is the complete absence of any process.  The idea of
having a feature freeze some time prior to release is hardly a novel
roadblock that we've invented here at the PostgreSQL Global
Development Group.  It's a basic software engineering principle that
has been universally adopted by just about every open and closed
source development project in existence, and with good reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-07T17:10:48Z

Simon Riggs <simon@2ndQuadrant.com> writes:
> Please consider this as a serious proposal for tuning in 9.1.

Look: it is at least four months too late for anything of the sort in 9.1.
We should be fixing bugs, and nothing else, if we ever want to get 9.1
out the door.  Performance improvements don't qualify, especially not
ones that tinker with fundamental parts of the system and seem highly
likely to introduce new bugs.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Josh Berkus <josh@agliodbs.com> — 2011-06-07T17:21:26Z

> The
> reason we usually skip the summer isn't actually a wholesale lack of
> people - it's because it's not so good from a publicity perspective,
> and it's hard to get all the packagers around at the same time.

Actually, the summer is *excellent* from a publicity perspective ... at least, June and July are.  Both of those months are full of US conferences whose PR we can piggyback on to make a splash.

August is really the only "bad" month from a PR perspective, because we lose a lot of our European RCs, and there are no bandwagons to jump on.  But even August has the advantage of having no major US or Christian holidays to interfere with release dates.

However, we're more likely to have an issue with *packager* availability in August.  Besides, isn't this a little premature?  Last I looked, we still have some big nasty open items.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-07T17:21:44Z

Robert Haas <robertmhaas@gmail.com> writes:
> ... I think at the next developer meeting we're going to
> get to hear Tom argue that overlapping the end of beta with the
> beginning of the next release cycle is a mistake and we should go back
> to the old system where we yell at everyone to shut up unless they're
> helping test or fix bugs.

I think we have already got quite enough evidence to conclude that this
approach is broken.  Not only does it appear that hardly anybody but me
is actively working on stabilizing 9.1, but I'm wasting quite a bit of
my time trying to keep Simon from destabilizing it; to say nothing of
reacting to design proposals for 9.2 work (or else feeling guilty
because I'm ignoring them, which is in fact what I've mostly been
doing).

As a measure of how completely this is not working: I've had "read the
SSI code" as a number one priority item for about two months now, and
still haven't found time to read one line of it.

> Everyone who is arguing for the inclusion of this patch in 9.1 should
> take a minute to think about the following fact: If the PostgreSQL
> development process does not work for Tom, it does not work.

I'd like to think that I'm not the sole driver of this process.
However, if everybody else is going to start playing in their 9.2
sandbox and ignore getting a release out, then yeah it comes down
to how much bandwidth I've got.  And that's finite.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Josh Berkus <josh@agliodbs.com> — 2011-06-07T17:27:54Z

Robert,

> Oh, I get that. I'm just dismayed that we can't have a discussion
> about the patch without getting sidetracked into a conversation about
> whether we should throw feature freeze out the window. 

That's not something you can change.  Whatever the patch is, even if it's a psql improvement, *someone* will argue that it's super-critical to shoehorn it into the release at the last minute.  It's a truism of human nature to rationalize exceptions where your own interest is concerned.

As long as we have solidarity of the committers that this is not allowed, however, this is not a real problem.  And it appears that we do.  In the future, it shouldn't even be necessary to discuss it.

For my part, I'm excited that we seem to be getting some big hairy important patches into CF1, which means that those patches will be well-tested by the time 9.2 reaches beta.  In particular, getting Robert's patch and Simon's WALInsertLock work into CF1 means that we'll have 7 months to find serious bugs before beta starts.  So I'd really like to carry on with the current development schedule.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

9.1 release scheduling (was Re: reducing the overhead of frequent table locks - now, with WIP patch)

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-07T17:32:16Z

Joshua Berkus <josh@agliodbs.com> writes:
> Actually, the summer is *excellent* from a publicity perspective ... at least, June and July are.  Both of those months are full of US conferences whose PR we can piggyback on to make a splash.

> August is really the only "bad" month from a PR perspective, because we lose a lot of our European RCs, and there are no bandwagons to jump on.  But even August has the advantage of having no major US or Christian holidays to interfere with release dates.

> However, we're more likely to have an issue with *packager* availability in August.  Besides, isn't this a little premature?  Last I looked, we still have some big nasty open items.

Well, we're trying to fix them --- I'm still hoping that the known beta
blockers will be cleared by Thursday so we can ship beta2.  However,
what happens after that is uncertain.  I'm concerned that once the CF
starts, the number of developer cycles devoted to 9.1 testing will go to
zero, meaning that four weeks or so from now when the CF is over, we'll
have made no real progress beyond beta2.  It's hard to see how we have a
release before August if that's how things stand in early July.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T17:33:37Z

On Tue, Jun 7, 2011 at 1:27 PM, Joshua Berkus <josh@agliodbs.com> wrote:
> As long as we have solidarity of the committers that this is not allowed, however, this is not a real problem.  And it appears that we do.  In the future, it shouldn't even be necessary to discuss it.

Solidarity?

Simon - who was a committer last time I checked - seems to think that
the current process is entirely bunko.  And that is resulting in the
waste of a lot of time that could be better spent.  Our ability to
sustain this development process rests on the idea that we have some
kind of shared idea of what is and is not acceptable in general and at
particular points in the release cycle.  It *shouldn't* be necessary
to discuss it, but it apparently is.  Over and over and over again, in
fact.  It is critically important for the future success of this
project that we learn to walk and chew gum at the same time.  We are
failing outright.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: 9.1 release scheduling (was Re: reducing the overhead of frequent table locks - now, with WIP patch)

Thom Brown <thom@linux.com> — 2011-06-07T17:45:13Z

On 7 June 2011 19:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Joshua Berkus <josh@agliodbs.com> writes:
>> Actually, the summer is *excellent* from a publicity perspective ... at least, June and July are.  Both of those months are full of US conferences whose PR we can piggyback on to make a splash.
>
>> August is really the only "bad" month from a PR perspective, because we lose a lot of our European RCs, and there are no bandwagons to jump on.  But even August has the advantage of having no major US or Christian holidays to interfere with release dates.
>
>> However, we're more likely to have an issue with *packager* availability in August.  Besides, isn't this a little premature?  Last I looked, we still have some big nasty open items.
>
> Well, we're trying to fix them --- I'm still hoping that the known beta
> blockers will be cleared by Thursday so we can ship beta2.  However,
> what happens after that is uncertain.  I'm concerned that once the CF
> starts, the number of developer cycles devoted to 9.1 testing will go to
> zero, meaning that four weeks or so from now when the CF is over, we'll
> have made no real progress beyond beta2.  It's hard to see how we have a
> release before August if that's how things stand in early July.

Speaking of which, is it now safe to remove the "NOT VALID constraints
don't dump properly" issue from the blocker list since the fix has
been committed?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T17:52:16Z

On Tue, Jun 7, 2011 at 1:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> ... I think at the next developer meeting we're going to
>> get to hear Tom argue that overlapping the end of beta with the
>> beginning of the next release cycle is a mistake and we should go back
>> to the old system where we yell at everyone to shut up unless they're
>> helping test or fix bugs.
>
> I think we have already got quite enough evidence to conclude that this
> approach is broken.  Not only does it appear that hardly anybody but me
> is actively working on stabilizing 9.1, but I'm wasting quite a bit of
> my time trying to keep Simon from destabilizing it; to say nothing of
> reacting to design proposals for 9.2 work (or else feeling guilty
> because I'm ignoring them, which is in fact what I've mostly been
> doing).
>
> As a measure of how completely this is not working: I've had "read the
> SSI code" as a number one priority item for about two months now, and
> still haven't found time to read one line of it.
>
>> Everyone who is arguing for the inclusion of this patch in 9.1 should
>> take a minute to think about the following fact: If the PostgreSQL
>> development process does not work for Tom, it does not work.
>
> I'd like to think that I'm not the sole driver of this process.
> However, if everybody else is going to start playing in their 9.2
> sandbox and ignore getting a release out, then yeah it comes down
> to how much bandwidth I've got.  And that's finite.

I plead guilty to taking my eye off the ball post-beta1.  I busted my
ass for two months stabilizing other people's code after CF4 was over,
and then I moved on to other things.  I will try to get my eye back on
the ball - but actually I'm not sure there's all that much to do.   A
quick review of the open items list suggests that we have fixed a
total of six issues since beta1, as opposed to 47 prior to beta1.  And
all of those are being handled (two by you).  I also don't see much in
the way of unanswered 9.1 bug reports on pgsql-bugs, either.  There
may well be other open items, and I'm not unwilling to work on them,
but I don't read minds.  What needs doing?

Re: 9.1 release scheduling (was Re: reducing the overhead of frequent table locks - now, with WIP patch)

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T17:53:23Z

On Tue, Jun 7, 2011 at 1:45 PM, Thom Brown <thom@linux.com> wrote:
> Speaking of which, is it now safe to remove the "NOT VALID constraints
> don't dump properly" issue from the blocker list since the fix has
> been committed?

I hope so, because I just did that (before noticing this email from you).

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-07T17:56:05Z

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 7, 2011 at 1:27 PM, Joshua Berkus <josh@agliodbs.com> wrote:
>> As long as we have solidarity of the committers that this is not allowed, however, this is not a real problem. And it appears that we do. In the future, it shouldn't even be necessary to discuss it.

> Solidarity?

> Simon - who was a committer last time I checked - seems to think that
> the current process is entirely bunko.  And that is resulting in the
> waste of a lot of time that could be better spent.

Yes.  If it were anybody but Simon, we wouldn't be spending a lot of
time on it; we'd just say "sorry, this has to wait for 9.2" and that
would be the end of it.  As things stand, we have to convince him not to
commit these things ... or else be prepared to fight a war over whether
to revert them, which will be even more time-consuming and
trust-destroying.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-07T18:06:32Z

On Tue, Jun 7, 2011 at 6:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 7, 2011 at 1:27 PM, Joshua Berkus <josh@agliodbs.com> wrote:
>> As long as we have solidarity of the committers that this is not allowed, however, this is not a real problem.  And it appears that we do.  In the future, it shouldn't even be necessary to discuss it.
>
> Solidarity?
>
> Simon - who was a committer last time I checked - seems to think that
> the current process is entirely bunko.

I'm not sure why anyone that disagrees with you should be accused of
wanting to junk the whole process. I've not said that and I don't
think this.

Before you arrived, it was quite normal to suggest tuning patches
after feature freeze.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Stephen Frost <sfrost@snowman.net> — 2011-06-07T18:22:17Z

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> Before you arrived, it was quite normal to suggest tuning patches
> after feature freeze.

I haven't been around as long as some, but I think I've been around
longer than Robert, and I can say that I don't recall serious
performance patches, particularly ones around lock management and which
change a fair bit of code, generally being white-listed from feature
freeze or being pushed in after beta1.

Perhaps I've missed them, or perhaps there have been a few exceptions
I'm not remembering that make it look routine rather than exceptional.
We might have tweaked a config variable or changed a #define
somewhere close to the end of a cycle, but I really don't put those into
the same category as this change.

	Thanks,

		Stephen

Re: reducing the overhead of frequent table locks - now, with WIP patch

Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-06-07T18:40:44Z

Simon Riggs <simon@2ndQuadrant.com> wrote:

> Before you arrived, it was quite normal to suggest tuning patches
> after feature freeze.

I've worn a lot of hats in the practical end of this industry, and
regardless of which perspective I look at this from, I can't think
of anything so destructive to productivity, developer morale,
meeting deadlines or release quality as "slipping in just one more
item after feature freeze".  It's *always* something that someone
feels is so important that it's worth the delay and/or risk, and it
never works out well.

There are a lot of aspects of the development and release processes
on which I can see valid trade-offs and a lot of room for
negotiations and compromise, but having a feature freeze which is
treated seriously isn't one of them.  If nobody else was making an
issue of this, I still would be.

There's absolutely nothing personal or political in this -- I just
know what I've seen work and what I've seen cause problems.

-Kevin

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T18:45:41Z

On Tue, Jun 7, 2011 at 2:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jun 7, 2011 at 6:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Jun 7, 2011 at 1:27 PM, Joshua Berkus <josh@agliodbs.com> wrote:
>>> As long as we have solidarity of the committers that this is not allowed, however, this is not a real problem.  And it appears that we do.  In the future, it shouldn't even be necessary to discuss it.
>>
>> Solidarity?
>>
>> Simon - who was a committer last time I checked - seems to think that
>> the current process is entirely bunko.
>
> I'm not sure why anyone that disagrees with you should be accused of
> wanting to junk the whole process. I've not said that and I don't
> think this.
>
> Before you arrived, it was quite normal to suggest tuning patches
> after feature freeze.

I, of course, am not in a position to comment on what happened before
I arrived.  But of the six committers who have weighed in on this
thread, you're the only one who thinks this can plausibly be called a
tuning patch.  Nor would the outcome of this discussion have been any
different if I hadn't participated in it, which is why I steered clear
of the whole topic of how the patch should be handled procedurally for
the first three days.  By the time I weighed in with my opinion, Tom
and Heikki had already expressed theirs.

Now it's possible that my influence is so widespread and pernicious
that I've managed to change Tom's and Heikki's opinions on
the topic of feature freeze.  Perhaps, three years ago, they would
have been willing to accept the patch at the last minute, but now,
because of my advocacy for a disciplined feature freeze, they are not.
To accept this argument, you would have to believe that I have the
power to make Tom Lane more conservative.  I don't believe I have
either the power or the inclination to do any such thing.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-07T18:55:49Z

Simon Riggs <simon@2ndQuadrant.com> writes:
> Before you arrived, it was quite normal to suggest tuning patches
> after feature freeze.

*Low risk* tuning patches make sense at this stage, yes.  Fooling with
the lock mechanisms doesn't qualify as low risk in my book.  The
probability of undetected subtle problems is just too great.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-07T19:31:48Z

On Tue, Jun 7, 2011 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Before you arrived, it was quite normal to suggest tuning patches
>> after feature freeze.
>
> *Low risk* tuning patches make sense at this stage, yes.  Fooling with
> the lock mechanisms doesn't qualify as low risk in my book.  The
> probability of undetected subtle problems is just too great.

Good, then we do agree. Some things are allowed, with suitable
justification. That has not been a point accepted by everybody here
though.

Upthread, I proposed that we leave Robert's patch until 9.2. That was
*after* I had reviewed it for impact and risk. I agree, it's High Risk,
and so must be put off until normal dev opens because of the
sensitivity and criticality of getting the locking interactions right.

Moving on from that, I have proposed other solutions. Koichi, Jignesh
and then Robert have shown measurements of the huge contention in
this area of our software. Robert's patch addresses the problems, as
do Koichi's and my latest patch.  I would like to see us do
*something* about these problems for 9.1. Not all of them are risky or
time consuming. I'm clearly not alone in this thought; Dave, Dimitri
and Koichi-san have also spoken in favour of action for this release.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Jignesh Shah <jkshah@gmail.com> — 2011-06-07T19:44:02Z

On Mon, Jun 6, 2011 at 11:20 PM, Jignesh Shah <jkshah@gmail.com> wrote:

>
> Okay, I tried it out with the sysbench read-scaling test.
> Note that I had tried that test earlier on 9.0:
> http://jkshah.blogspot.com/2010/11/postgresql-90-simple-select-scaling.html
>
> On that test I found that running on anything bigger than
> 4 cores led to decreased performance.
> Redoing the same test with 100 users on a 4 vCPU virtual machine with
> 8GB RAM and 1M rows, I get
>   transactions:                        17870082 (59566.46 per sec.)
> which is in line with the best number on 9.0.
> This test had hardly any idle CPU.
>
> However, where it made a huge impact was the same test on my 8
> vCPU VM with 8GB RAM, where I get
>    transactions:                        33274594 (110914.85 per sec.)
>
> which is a whopping 1.8x throughput gain for 2x the vCPUs (from 4 to 8).
> My idle CPU was less than 7%, which, considering that the "useful"
> work is in line with my expectations, is really impressive.
> (And the last time I tested MySQL, it was around 95K or so for the
> same test.)
>

> Next step DBT-2..
>

I tried with a warehouse size of 50, all cached in memory, and my
initial tests with DBT-2 using 8 vCPUs do not show any major changes
for a quick 10-minute run. I eliminated write bottlenecks for this
test so as to stress the locks (using full_page_writes=off,
synchronous_commit=off, etc.). I also have a large enough buffer pool to
fit the entire 50-warehouse DB in memory.

Without patch score: 29088 NOTPM
With patch score:    30161 NOTPM

It could be that I have other problems in the setup. One of the things
I noticed is that too many "idle in transaction" connections are being
reported, which tells me something else is becoming a bottleneck here
:-) I also tested with multiple clients, with similar results: on the
PostgreSQL side there are multiple sessions idle in transaction or in
"FETCH waiting", while the clients show waiting in pqSocketCheck, as in
the backtrace below.

#0  0x00007fc4e83a43c6 in poll () from /lib64/libc.so.6
#1  0x00007fc4e8abd61a in pqSocketCheck ()
#2  0x00007fc4e8abd730 in pqWaitTimed ()
#3  0x00007fc4e8abc215 in PQgetResult ()
#4  0x00007fc4e8abc398 in PQexecFinish ()
#5  0x00000000004050e1 in execute_new_order ()
#6  0x000000000040374f in process_transaction ()
#7  0x0000000000403519 in db_worker ()

So for DBT-2 I think this is inconclusive, since there could still be
other bottlenecks in play (networking included).
But overall, yes, I like the sysbench read-scaling numbers quite a bit.
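The ratios behind these figures are easy to check. A minimal Python sketch; the numbers are copied from the message above and nothing here is re-measured:

```python
# Figures copied from the message; nothing is re-measured here.
sysbench_tps = {4: 59566.46, 8: 110914.85}      # vCPUs -> transactions/sec
dbt2_notpm = {"without": 29088, "with": 30161}  # DBT-2 score, 50 warehouses

speedup = sysbench_tps[8] / sysbench_tps[4]     # throughput ratio, 4 -> 8 vCPUs
efficiency = speedup / (8 / 4)                  # fraction of ideal linear scaling
dbt2_gain = dbt2_notpm["with"] / dbt2_notpm["without"] - 1

print(f"sysbench 4 -> 8 vCPU: {speedup:.2f}x, {efficiency:.0%} of linear")
# sysbench 4 -> 8 vCPU: 1.86x, 93% of linear
print(f"DBT-2 with patch: {dbt2_gain:+.1%}")
# DBT-2 with patch: +3.7%
```

The roughly 1.86x sysbench speedup for twice the vCPUs is where the "1.8x" comes from, while the DBT-2 difference of under 4% is small, consistent with Jignesh's reading that that test is inconclusive.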

Regards,
Jignesh

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-07T19:45:34Z

Robert Haas <robertmhaas@gmail.com> writes:
> I plead guilty to taking my eye off the ball post-beta1.  I busted my
> ass for two months stabilizing other people's code after CF4 was over,
> and then I moved on to other things.  I will try to get my eye back on
> the ball - but actually I'm not sure there's all that much to do.   A
> quick review of the open items list suggests that we have fixed a
> total of six issues since beta1, as opposed to 47 prior to beta1.  And
> all of those are being handled (two by you).  I also don't see much in
> the way of unanswered 9.1 bug reports on pgsql-bugs, either.  There
> may well be other open items, and I'm not unwilling to work on them,
> but I don't read minds.  What needs doing?

Well, right at the moment there's not that much (if there were, I'd not
have proposed wrapping beta2 in two days).  You could look at some of
the "not blocker" items on the open-items list --- we really ought to
either do those things, or punt them off to TODO or the next CF as
appropriate, sometime before 9.1 final.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-07T20:00:36Z

Simon Riggs <simon@2ndQuadrant.com> writes:
> Moving on from that, I have proposed other solutions. Koichi, Jignesh
> and then Robert have shown measurements of the huge contention in
> this area of our software. Robert's patch addresses the problems, as
> do Koichi's and my latest patch.  I would like to see us do
> *something* about these problems for 9.1. Not all of them are risky or
> time consuming.

In the first place, all of these issues predate 9.1 by years.  They are
not regressions or new bugs, and they have not suddenly gotten more
urgent.  In the second place, I haven't seen any proposals in the area
that appear low risk.  I seriously doubt that I would consider *any*
meaningful change in the locking area to be low risk.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T20:03:26Z

On Tue, Jun 7, 2011 at 3:44 PM, Jignesh Shah <jkshah@gmail.com> wrote:
> I tried with a warehouse size of 50 all cached in memory and my
> initial tests with DBT-2 using 8 vCPU does not show any major changes
> for a quick 10 minute run. I did eliminate write bottlenecks for this
> test so as to stress on locks (using full_page_writes=off,
> synchronous_commit=off, etc). I also have a large enough bufferpool to
> fit the all 50 warehouse DB in memory
>
> Without patch  score:      29088 NOTPM
> With patch patch score:  30161 NOTPM
>
> It could be that I have other problems in the setup..One of the things
> I noticed is that there are too many "Idle in Connections" being
> reported which tells me something else is becoming a bottleneck here
> :-) I also tested with multiple clients but similar results..  both
> postgresql shows multiple idle in transaction and fetch in waiting
> while the clients show waiting in SocketCheck.. like shown below for
> example.
>
> #0  0x00007fc4e83a43c6 in poll () from /lib64/libc.so.6
> #1  0x00007fc4e8abd61a in pqSocketCheck ()
> #2  0x00007fc4e8abd730 in pqWaitTimed ()
> #3  0x00007fc4e8abc215 in PQgetResult ()
> #4  0x00007fc4e8abc398 in PQexecFinish ()
> #5  0x00000000004050e1 in execute_new_order ()
> #6  0x000000000040374f in process_transaction ()
> #7  0x0000000000403519 in db_worker ()
>
>
> So yes for DBT2 I think this is inconclusive since there still could
> be other bottlenecks in play..  (Networking included)
> But overall yes I like the sysbench read scaling numbers quite a bit..

I think you will find that for write workloads WALInsertLock is so
badly contended that nothing else matters.  We really need to spend
some time working on that during the 9.2 cycle, but I don't have
anything that resembles a plan at this point.  If you have the cycles,
try compiling with LWLOCK_STATS defined and looking at the "blk"
numbers just to confirm that's where the bottleneck is.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-07T20:11:46Z

On Tue, Jun 7, 2011 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Moving on from that, I have proposed other solutions. Koichi, Jignesh
>> and then Robert have shown measurements of the huge contention in
>> this area of our software. Robert's patch addresses the problems, as
>> do Koichi's and my latest patch.  I would like to see us do
>> *something* about these problems for 9.1. Not all of them are risky or
>> time consuming.
>
> In the first place, all of these issues predate 9.1 by years.  They are
> not regressions or new bugs, and they have not suddenly gotten more
> urgent.  In the second place, I haven't seen any proposals in the area
> that appear low risk.  I seriously doubt that I would consider *any*
> meaningful change in the locking area to be low risk.

That's a shame. We'll fix it in 9.2 then.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T20:52:43Z

On Tue, Jun 7, 2011 at 12:51 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Stefan/Robert's observation that we perform a
> VirtualXactLockTableInsert() to no real benefit is a good one.
>
> It leads to the following simple patch to remove one lock table hit
> per transaction. It's a lot smaller impact on the LockMgr locks, but
> it will still be substantial. Performance tests please?
>
> This patch is much less invasive and has impact only on CREATE INDEX
> CONCURRENTLY and Hot Standby. It's taken me about 2 hours to write and
> test and there's no way it will cause any delay at all to the release
> schedule. (Though I'm sure Robert can improve it).

Incidentally, I spent the morning (before we got off on this tangent)
writing a patch to make VXID locks spring into existence on demand
instead of creating them for every transaction.  This applies on top
of my fastlock patch and fits in quite nicely with the existing
infrastructure that patch creates, and it helps modestly.  Well,
according to one metric, at least, it helps dramatically: traffic on
each lock manager partition lock drops from hundreds of thousands of
lock requests in a five minute period to just a few hundred.  But the
actual user-visible performance benefit is fairly modest - it goes
from ~36K TPS unpatched to ~129K TPS with the fast relation locks
alone to ~138K TPS with the fast relation locks plus a similar hack
for fast VXID locks (all results with pgbench -c 36 -j 36 -n -S -T 300
on a Nate-Boley-provided 24-core box).  Now, I'm not going to knock a
7% performance improvement and the benefit may be larger on Stefan's
80-core box and I think it's definitely worth going to the trouble to
implement that optimization for 9.2, but it appears at least based on
the testing that I've done so far that the fast relation locks are the
big win and after that it gets much harder to make an improvement.  If
we were to fix ONLY the vxid issue in 9.1 as you were advocating, the
benefit would probably be much less, because at least in my tests, the
fast relation lock patch increases overall system throughput
sufficiently to cause a 12x increase in contention due to vxid
traffic.
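The idea of VXID locks that "spring into existence on demand" can be illustrated with a toy, single-threaded model that only counts trips to the shared lock table. This is not the fastlock or fast-VXID patch: the class and function names are hypothetical, and the model deliberately ignores the synchronization a real holder and waiter need. It assumes, as the messages imply, that only operations such as CREATE INDEX CONCURRENTLY and Hot Standby conflict resolution ever wait on a VXID.

```python
from dataclasses import dataclass, field


@dataclass
class LockTable:
    """Stand-in for the shared lock table.  `ops` counts insert/remove
    calls, i.e. trips that would need a lock manager partition lock."""
    entries: set = field(default_factory=set)
    ops: int = 0

    def insert(self, tag):
        self.ops += 1
        self.entries.add(tag)

    def remove(self, tag):
        self.ops += 1
        self.entries.discard(tag)


class Backend:
    """One backend running transactions back to back (hypothetical)."""

    def __init__(self, table, on_demand):
        self.table = table
        self.on_demand = on_demand
        self.vxid = None       # VXID of the running transaction, if any
        self.in_table = False  # does the shared table hold our VXID lock?

    def begin(self, vxid):
        self.vxid = vxid
        self.in_table = False
        if not self.on_demand:
            # Eager scheme: every transaction advertises its VXID lock.
            self.table.insert(vxid)
            self.in_table = True

    def commit(self):
        # Only touch the shared table if an entry was actually created.
        if self.in_table:
            self.table.remove(self.vxid)
            self.in_table = False
        self.vxid = None


def wait_for(holder):
    """Another backend must wait for holder's transaction.  In the
    on-demand scheme the waiter creates the entry on the holder's behalf."""
    if holder.vxid is not None and not holder.in_table:
        holder.table.insert(holder.vxid)
        holder.in_table = True
    # ...the waiter would now sleep on that entry until holder commits...


def lock_table_ops(n_txns, waited_on, on_demand):
    """Run n_txns transactions; waited_on holds the ones someone waits for."""
    table = LockTable()
    backend = Backend(table, on_demand)
    for vxid in range(n_txns):
        backend.begin(vxid)
        if vxid in waited_on:
            wait_for(backend)
        backend.commit()
    return table.ops


print(lock_table_ops(100_000, {17, 42_000}, on_demand=False))  # 200000
print(lock_table_ops(100_000, {17, 42_000}, on_demand=True))   # 4
```

The eager scheme pays two lock-table calls per transaction whether or not anyone ever waits; the on-demand scheme pays only for the rare waits, which is the same shape as the drop in partition-lock traffic described above. The model says nothing about user-visible throughput, which Robert measured as a much smaller gain (about 129K to 138K TPS, roughly 7%).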

With both the fast-relation locks and the fast-vxid locks in place, as
I mentioned, the lock manager partition lock contention is completely
gone; in fact the lock manager partition traffic is pretty much gone.
The remaining contention comes mostly from the free list locks (blk
~13%) and the buffer mapping locks (which were roughly: 800k shacq,
12000 exacq, 850 blk).  Interestingly, I saw that one buffer mapping
lock got about 5x hotter than the others, which is odd, but possibly
harmless, since the absolute amount of blocking is really rather small
(~0.1%).  At least for read performance, we may need to start looking
less at reducing lock contention and more at making the actual
underlying operations faster.
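One reading of the "~0.1%" figure is blocked acquisitions divided by all acquisitions, shared plus exclusive. That definition is an assumption, not something stated in the message; with the rough buffer mapping totals quoted, it works out as follows:

```python
def blocked_fraction(shacq, exacq, blk):
    """Share of LWLock acquisitions (shared + exclusive) that had to sleep."""
    return blk / (shacq + exacq)


# Rough buffer mapping totals quoted in the message.
frac = blocked_fraction(shacq=800_000, exacq=12_000, blk=850)
print(f"{frac:.2%}")  # 0.10%
```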

In the process of doing all of this, I discovered that I had neglected
to update GetLockConflicts() and, consequently, fastlock-v2 is broken
insofar as CREATE INDEX CONCURRENTLY and Hot Standby are concerned.  I
will fix that and post an updated version; and I'll also post the
follow-on patch to accelerate the VXID locks at that time.  In the
meantime, I would appreciate any review or testing of the remainder of
the patch.

> If we combine this patch with Koichi-san's recommended changes to the
> number of lock partitions, we will have considerable impact for 9.1.
> Robert will still get his day in the sun, just with 9.2.

I am at this point of the view that there is little point in
raising the number of lock partitions.  If you are doing very simple
SELECT statements across a large number of tables, then increasing the
number of lock partitions will help.  On read-write workloads, there's
really no benefit, because WALInsertLock contention is the bottleneck.
And on read-only workloads that only touch one or a handful of
tables, the individual lock manager partitions where the locks fall
get very hot regardless of how many partitions you have.  Now that
does still leave some space for improvement - specifically, lots of
tables, read-only or read-mostly - but the fast-relation-lock and
fast-vxid-lock stuff will address those bottlenecks far more
thoroughly.  And increasing the number of lock partitions also has a
downside: it will slow down end-of-transaction cleanup, which is
already an area where we know we have problems.
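Why more partitions cannot help a hot single table can be shown with a small sketch. It assumes, as in PostgreSQL, that a lock's partition is its hash code modulo the partition count, but the hash here is a stand-in (blake2b over the OID); the real code hashes the full LOCKTAG with its own function. The relation OIDs and function names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_of(relation_oid, num_partitions):
    """Map a relation to a lock partition.  Stand-in hash: PostgreSQL hashes
    the full LOCKTAG with its own hash function, not the OID alone."""
    digest = hashlib.blake2b(relation_oid.to_bytes(4, "little"),
                             digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_partitions


def busiest_share(relation_oids, num_partitions):
    """Fraction of lock requests landing on the busiest partition, assuming
    every relation receives the same number of requests."""
    load = Counter(partition_of(oid, num_partitions) for oid in relation_oids)
    return max(load.values()) / sum(load.values())


one_hot_table = [16384]
many_tables = range(16384, 16384 + 1000)

for parts in (16, 64, 256):
    print(parts,
          busiest_share(one_hot_table, parts),  # always 1.0
          round(busiest_share(many_tables, parts), 3))
```

A single hot relation always lands in one partition no matter how many partitions exist, which is the read-only, few-tables case; only workloads spread over many tables see the load divide. The sketch does not model the end-of-transaction cleanup cost mentioned as the downside of adding partitions.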

There might be some point in raising the number of buffer mapping
partitions, but I don't know how to create a test case where it's
actually material, especially without the fastlock stuff.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-07T21:43:39Z

On Tue, Jun 7, 2011 at 9:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> If we were to fix ONLY the vxid issue in 9.1 as you were advocating

Sensible debate is impossible when you don't read what I've written.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-07T21:58:49Z

On Tue, Jun 7, 2011 at 5:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jun 7, 2011 at 9:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> If we were to fix ONLY the vxid issue in 9.1 as you were advocating
>
> Sensible debate is impossible when you don't read what I've written.

I've read every word you've written on this thread.  Much of it,
multiple times.  I am unclear what we are arguing about.  I don't want
to have a debate.  I want to figure out what works, and do it.

Re: reducing the overhead of frequent table locks - now, with WIP patch

Josh Berkus <josh@agliodbs.com> — 2011-06-07T22:14:04Z

On 6/7/11 1:11 PM, Simon Riggs wrote:
>> that appear low risk.  I seriously doubt that I would consider *any*
>> meaningful change in the locking area to be low risk.
> That's a shame. We'll fix it in 9.2 then.

I will point out that we bounced Alvaro's FK patch, which *was*
submitted in time for CF4, because of unknown locking impact.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: 9.1 release scheduling (was Re: reducing the overhead of frequent table locks - now, with WIP patch)

Alvaro Herrera <alvherre@commandprompt.com> — 2011-06-08T01:25:30Z

Excerpts from Robert Haas's message of Tue Jun 07 13:53:23 -0400 2011:
> On Tue, Jun 7, 2011 at 1:45 PM, Thom Brown <thom@linux.com> wrote:
> > Speaking of which, is it now safe to remove the "NOT VALID constraints
> > don't dump properly" issue from the blocker list since the fix has
> > been committed?
> 
> I hope so, because I just did that (before noticing this email from you).

Yeah, pg_dump works in HEAD ... the bug now is that psql prints "NOT
VALID" twice.  Will fix.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: reducing the overhead of frequent table locks - now, with WIP patch

Bruce Momjian <bruce@momjian.us> — 2011-06-08T04:19:21Z

Robert Haas wrote:
> On Mon, Jun 6, 2011 at 10:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > My point was that we have in the past implemented performance changes
> > to increase scalability at the last minute, and also that our personal
> > risk perspectives are not always set in stone.
> >
> > Robert has highlighted the value of this change and its clearly not
> > beyond our wit to include it, even if it is beyond our will to do so.
> 
> So, at the risk of totally derailing this thread -- what this boils
> down to is a philosophical disagreement.
> 
> It seems to me (and, I think, to Tom and Heikki and others as well)
> that it's not possible to keep on making changes to the release right
> up until the last minute and then expect the release to be of high
> quality.  If we keep committing new features, then we'll keep
> introducing new bugs.  The only hope of making the bug count go down
> at some point is to stop making changes that aren't bug fixes.  We
> could come up with some complex procedure for determining whether a
> patch is important enough and non-invasive enough to bypass the normal
> deadline, but that would probably lead to a lot more arguing about
> procedure, and realistically, it's still going to increase the bug
> count at least somewhat.  IMHO, it's better to just have a deadline,
> and stuff either makes it or it doesn't.  I realize we haven't always
> adhered to the principle in the past, but at least IMV that's not a
> mistake we want to continue repeating.

Simon is right that we slipped the vxid patch into 8.3 when a Postgres
user I talked to at Linuxworld mentioned high vacuum freeze activity and
simple calculations showed that many read-only queries could cause high
xid usage.  Fortunately we already had a patch available and Tom applied
it during beta.  It was an existing patch that took on new urgency
during beta.

Robert's point above is that it isn't so much making the decision of
whether something should slip past the deadline, but the time-sapping
discussion of whether something should slip, and the frankly disturbing
behavior of some in this group to not accept a clear consensus,
therefore prolonging the discussion of slippage far longer than
necessary.

Basically, if you propose something, and it gets shot down due to
procedure, accept that unless you have some very good _new_ reason for
continuing the discussion.  If you don't like that, then you are not
going to do well in our group and maybe this isn't the group for you.  

I think we are going to need to be much more forceful about this, and if
the threat is that someone has commit rights and therefore we can't ignore
them, we will have to reconsider who can commit to this project.  Do I
need to be any clearer?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: reducing the overhead of frequent table locks - now, with WIP patch

Bruce Momjian <bruce@momjian.us> — 2011-06-08T04:33:26Z

One more thing --- when Tom applied that patch during 8.3 beta it was
with everyone's agreement, so the policy should be that if we are going
to break the rules, everyone has to agree --- if anyone disagrees, the
rules stand.

In this case, several people felt early on that we should stick with the
rules --- at that point, there should have been no further discussion of
slipping things into 9.1.

Discussion takes energy, and discussing slipping things into 9.1 after
anyone objects is just wasting our valuable time.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-08T05:02:42Z

Bruce Momjian <bruce@momjian.us> writes:
> Simon is right that we slipped the vxid patch into 8.3 when a Postgres
> user I talked to at Linuxworld mentioned high vacuum freeze activity and
> simple calculations showed the many read-only queries could cause high
> xid usage.  Fortunately we already had a patch available and Tom applied
> it during beta.  It was an existing patch that took on new urgency
> during beta.

Just to set the record straight on this ... the vxid patch went in on
2007-09-05:
http://archives.postgresql.org/pgsql-committers/2007-09/msg00026.php
which was a day shy of a month before we wrapped 8.3beta1:
http://archives.postgresql.org/pgsql-committers/2007-10/msg00089.php
so it was during alpha phase not beta.  And 8.3RC1 was stamped on
2008-01-03.  So Simon's assertion that this was "days before we produced
a release candidate" is correct, if you take "days" as "4 months".

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Jim Nasby <jim@nasby.net> — 2011-06-08T15:39:06Z

On Jun 7, 2011, at 8:24 AM, Stephen Frost wrote:
> * Alvaro Herrera (alvherre@commandprompt.com) wrote:
>> I note that if 2nd Quadrant is interested in having a game-changing
>> platform without having to wait a full year for 9.2, they can obviously
>> distribute a modified version of Postgres that integrates Robert's
>> patch.
> 
> Having thought about this, I've got to agree with Alvaro on this one.
> The people who need this patch are likely to pull it down and patch it
> in and use it, regardless of if it's in a release or not.  My money is
> that Treat's already got it running on some massive prod system that he
> supports ( ;) ).
> 
> If we get it into the first CF of 9.2 then people are going to be even
> more likely to pull it down and back-patch it into 9.1.  As soon as we
> wrap up CF1 and put out our first alpha, the performance testers will
> have something to point at and say "look!  PG scales *even better* now!"
> and they're not going to particularly care that it's an alpha and the
> blog-o-sphere isn't going to either, especially if we can say "and it'll
> be in the next release which is scheduled for May".

From the Thinking Outside The Box dept.:

Also, if the performance gains prove to be as earth-shattering as initial results indicate, there's nothing that says we *have* to wait until the middle of next year to get this out. We could push to get 9.2 out with fewer other features, or possibly even break with tradition and backport this to 9.1 (or perhaps have a fork of 9.1 that we only support until 9.2 is out).

Obviously, those options all involve serious time commitments and the community will have to weigh those carefully. And we'd have to have very strong evidence of the benefits before even having that discussion, because the discussion itself will likely be resource intensive. But the option *is* there, should we decide to pursue it.

This means that "this patch is too important to wait another 12 months" isn't really a valid point: it only has to wait 12 months if that's what the community thinks is best; otherwise it could miss 9.1 *and* be out significantly before 12 months from now.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-08T16:25:42Z

On Wed, Jun 8, 2011 at 5:19 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Mon, Jun 6, 2011 at 10:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > My point was that we have in the past implemented performance changes
>> > to increase scalability at the last minute, and also that our personal
>> > risk perspectives are not always set in stone.
>> >
>> > Robert has highlighted the value of this change and its clearly not
>> > beyond our wit to include it, even if it is beyond our will to do so.
>>
>> So, at the risk of totally derailing this thread -- what this boils
>> down to is a philosophical disagreement.
>>
>> It seems to me (and, I think, to Tom and Heikki and others as well)
>> that it's not possible to keep on making changes to the release right
>> up until the last minute and then expect the release to be of high
>> quality.  If we keep committing new features, then we'll keep
>> introducing new bugs.  The only hope of making the bug count go down
>> at some point is to stop making changes that aren't bug fixes.  We
>> could come up with some complex procedure for determining whether a
>> patch is important enough and non-invasive enough to bypass the normal
>> deadline, but that would probably lead to a lot more arguing about
>> procedure, and realistically, it's still going to increase the bug
>> count at least somewhat.  IMHO, it's better to just have a deadline,
>> and stuff either makes it or it doesn't.  I realize we haven't always
>> adhered to the principle in the past, but at least IMV that's not a
>> mistake we want to continue repeating.
>
> Simon is right that we slipped the vxid patch into 8.3 when a Postgres
> user I talked to at Linuxworld mentioned high vacuum freeze activity and
> simple calculations showed the many read-only queries could cause high
> xid usage.  Fortunately we already had a patch available and Tom applied
> it during beta.  It was an existing patch that took on new urgency
> during beta.
>
> Robert's point above is that it isn't so much making the decision of
> whether something should slip past the deadline, but the time-sapping
> discussion of whether something should slip, and the frankly disturbing
> behavior of some in this group to not accept a clear consensus,
> therefore prolonging the discussion of slippage far longer than
> necessary.
>
> Basically, if you propose something, and it gets shot down due to
> procedure, accept that unless you have some very good _new_ reason for
> continuing the discussion.  If you don't like that, then you are not
> going to do well in our group and maybe this isn't the group for you.
>
> I think we are going to need to be much more forceful about this, and if
> the threat that someone has commit rights and therefore we can't ignore
> them, we will have to reconsider who can commit to this project.  Do I
> need to be any clearer?

You are very clear, but as to why, I am not sure.

On Monday, realising that Robert had discovered something of massive
potential benefit to the community, I asked Tom to take a look at the
patch to see if I could get his interest in including it in this
release. I did that out of pure altruism; how could I possibly benefit
from highlighting the work of another person, another company?

Tom has agreed with me that making tuning proposals during beta is
acceptable. In this case, he thinks it is too risky to apply. In fact,
I agreed, having reviewed the patch myself, suggesting a much simpler,
non-invasive patch instead (a new reason, as you say). I then
immediately accepted his decision to exclude any patch involving
locking from further consideration.

Given the level of potential benefit, I don't have a problem tapping
Tom on the shoulder to review it and see if it is tweakable. At no
point have I discussed applying the patch myself, nor have I ever even
considered it. The main point is that in his hands a task can be done
in days, not the months others have quoted. You can read that as
respect and optimism, or you can see chaos and disrespect, but that is
all in the eye of the beholder.

As a result of this, I've been insulted, told I have no respect for
process and even suggested there was a threat of patch war. None of
that is reasonable or anywhere close to truth. If there has been a
time sapping discussion, it is because people have jumped to
conclusions and responded irrationally. To be honest, I'm completely
surprised by all of that. I had no idea that me asking Tom a question
was perceived as a denial of service attack on the community, nor that
it would result in the comments made to me and about me.

As long as I am allowed the freedom to speak in this forum then I will
speak up for PostgreSQL users, committer or not. As long as I'm a
committer, I will take responsibility for the code and seek to improve
it and fix it according to the community process.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-08T16:32:52Z

On Wed, Jun 8, 2011 at 11:39 AM, Jim Nasby <jim@nasby.net> wrote:
> On Jun 7, 2011, at 8:24 AM, Stephen Frost wrote:
>> * Alvaro Herrera (alvherre@commandprompt.com) wrote:
>>> I note that if 2nd Quadrant is interested in having a game-changing
>>> platform without having to wait a full year for 9.2, they can obviously
>>> distribute a modified version of Postgres that integrates Robert's
>>> patch.
>>
>> Having thought about this, I've got to agree with Alvaro on this one.
>> The people who need this patch are likely to pull it down and patch it
>> in and use it, regardless of if it's in a release or not.  My money is
>> that Treat's already got it running on some massive prod system that he
>> supports ( ;) ).
>>
>> If we get it into the first CF of 9.2 then people are going to be even
>> more likely to pull it down and back-patch it into 9.1.  As soon as we
>> wrap up CF1 and put out our first alpha, the performance testers will
>> have something to point at and say "look!  PG scales *even better* now!"
>> and they're not going to particularly care that it's an alpha and the
>> blog-o-sphere isn't going to either, especially if we can say "and it'll
>> be in the next release which is scheduled for May".
>
> From the Thinking Outside The Box dept.:
>
> Also, if the performance gains prove to be as earth-shattering as initial results indicate, there's nothing that says we *have* to wait until the middle of next year to get this out. We could push to get 9.2 out with fewer other features, or possibly even break with tradition and backport this to 9.1 (or perhaps have a fork of 9.1 that we only support until 9.2 is out).
>
> Obviously, those options all involve serious time commitments and the community will have to weigh those carefully. And we'd have to have very strong evidence of the benefits before even having that discussion, because the discussion itself will likely be resource intensive. But the option *is* there, should we decide to pursue it.
>
> This means that "this patch is too important to wait another 12 months" isn't really a valid point: it only has to wait 12 months if that's what the community thinks is best; otherwise it could miss 9.1 *and* be out significantly before 12 months from now.

Right.  The community gets to decide when the community wants to
release, and with what features.  Right now, the consensus is that we
want to finish up 9.1 and release it.  It doesn't seem impossible that
we could manage to do that before this patch is ready for commit,
which is why I don't want to try to slip this into 9.1 no matter how
valuable it is.

I also feel that the fundamental thing we need in order to have better
releases is more developers spending more time developing cool stuff.
That is why I am somewhat dismayed to see this discussion veer off on
what I consider to be a tangent about release scheduling.  It took me
about 3 days to write the patch.  I've now spent the better part of a
day on this scheduling discussion.  I would rather have spent that
time improving the patch.  Or working on some other patch.  Or getting
9.1 out the door.  Now, mind you, I think release scheduling is
important.  I believe in the value of good project management.  But if
we make every cool patch that comes along into an opportunity to fight
about the release schedule, that's not productive.  Already, I feel
that any hope I might have had of getting useful technical feedback on
this patch anytime in the near future has been basically obliterated.
What a bummer.

As for the 9.2 schedule, I'm actually hoping that 9.2 will be a big
release for performance, sorta like 8.3 was.  I think that to make
that happen, we're going to need more than one good patch.  This patch
can be part of that picture, but there are many users who derive no
benefit or only a small benefit from it.  Of course, there are some
who will get a big benefit, and I'm as excited about that as everyone
else, but if we can broaden the aperture a bit and come up with a
variety of improvements that hit on a variety of use cases, then we'll
really have something to brag about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-08T16:40:06Z

On Wed, Jun 8, 2011 at 6:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Simon is right that we slipped the vxid patch into 8.3 when a Postgres
>> user I talked to at Linuxworld mentioned high vacuum freeze activity and
>> simple calculations showed the many read-only queries could cause high
>> xid usage.  Fortunately we already had a patch available and Tom applied
>> it during beta.  It was an existing patch that took on new urgency
>> during beta.
>
> Just to set the record straight on this ... the vxid patch went in on
> 2007-09-05:
> http://archives.postgresql.org/pgsql-committers/2007-09/msg00026.php
> which was a day shy of a month before we wrapped 8.3beta1:
> http://archives.postgresql.org/pgsql-committers/2007-10/msg00089.php
> so it was during alpha phase not beta.  And 8.3RC1 was stamped on
> 2008-01-03.  So Simon's assertion that this was "days before we produced
> a release candidate" is correct, if you take "days" as "4 months".

The patch went in slightly more than 6 months after feature freeze,
even though it was written by a summer student and did not even pass
review by the student's mentor (me).

The patch is invasive, involving core changes to the transaction
infrastructure and touching more than 30 files.

It was a brilliant contribution from Florian.

I take it as an example of
* what you can do when you set your mind to it, given sufficient cause
and a good starting point
* how people can propose things of value to the community even at a late stage
* how I have respected the process at other times

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-08T16:43:08Z

On Wed, Jun 8, 2011 at 5:33 AM, Bruce Momjian <bruce@momjian.us> wrote:

> One more thing --- when Tom applied that patch during 8.3 beta it was
> with everyone's agreement, so the policy should be that if we are going
> to break the rules, everyone has to agree --- if anyone disagrees, the
> rules stand.

I spoke against applying the patch, and to my knowledge was the only
person to have reviewed it at that stage.

I was happy that Tom applied it, but I would not have done so myself
then, nor would I do so now. I would trust only Tom to do that, which
is why I proposed to Tom that he look at Robert's patch.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-08T16:44:48Z

On Wed, Jun 8, 2011 at 12:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> As a result of this, I've been insulted, told I have no respect for
> process and even suggested there was a threat of patch war.

Well, you've pretty much said flat out you don't like the process, and
you don't agree with having a firm feature freeze.  I think it's a
perfectly legitimate question to ask whether we're going to have to
continually relitigate that point.  This is at least the second major
dust-up on this point since the end of 9.1CF4, and there were some
smaller ones, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-08T16:54:34Z

On Wed, Jun 8, 2011 at 5:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> It took me
> about 3 days to write the patch.  I've now spent the better part of a
> day on this scheduling discussion.  I would rather have spent that
> time improving the patch.  Or working on some other patch.  Or getting
> 9.1 out the door.

Sync Rep took 6 days to write initially and about 6 months to discuss
it, so you have a long way to go before your experience matches mine.

Sometimes people side track you onto things you think are pointless,
and sometimes you voice the opinion that they shouldn't have done so.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-08T17:10:28Z

On Wed, Jun 8, 2011 at 5:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 8, 2011 at 12:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> As a result of this, I've been insulted, told I have no respect for
>> process and even suggested there was a threat of patch war.
>
> Well, you've pretty much said flat out you don't like the process, and
> you don't agree with having a firm feature freeze.  I think it's a
> perfectly legitimate question to ask whether we're going to have to
> continually relitigate that point.  This is at least the second major
> dust-up on this point since the end of 9.1CF4, and there were some
> smaller ones, too.

Why do you address this to me? Many others have been committing
patches against raised issues well after feature freeze.

You do not wish to stop all patches, only those you disagree with. How
would I know you disagree with a patch without discussing it?

I note that you've claimed *everything* I have discussed is a new
feature, whereas everything you or others have done is an "open item".
You can claim that everything I suggest is a dust-up if you wish, but
who makes it a dust up and why?

The point I have made is that I disagree with a feature freeze date
fixed ahead of time without regard to the content of the forthcoming
release. I've not said I disagree with feature freezes altogether,
which would be utterly ridiculous. Fixed dates are IMHO much less
important than a sensible and useful feature set for our users. MySQL
repeatedly delivered releases with half-finished features and earned
much disrespect. We have never done that previously and I am against
doing so in the future.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Josh Berkus <josh@agliodbs.com> — 2011-06-08T17:43:29Z

Simon,

> The point I have made is that I disagree with a feature freeze date
> fixed ahead of time without regard to the content of the forthcoming
> release. I've not said I disagree with feature freezes altogether,
> which would be utterly ridiculous. Fixed dates are IMHO much less
> important than a sensible and useful feature set for our users.

This is such a non-argument it's silly.  We have so many new major features for 9.1 that I'm having trouble writing sensible press releases which don't sound like a laundry list.

> MySQL
> repeatedly delivered releases with half-finished features and earned
> much disrespect. We have never done that previously and I am against
> doing so in the future.

This is also total BS.  I worked on the MySQL team.  Before Sun/Oracle, MySQL specifically had feature-driven releases, where Marketing decided what features 5.0, 5.1 and 5.2 would have.  They also accepted new features during beta if Marketing liked them enough.  This resulted in the 5.1 release being *three years late*, and 5.3 being cancelled altogether.  And let's talk about the legendary instability of 5.0, because they decided that they couldn't cancel partitioning and stored procedures, whether they were ready for prime time or not and because they kept changing the API during beta.

MySQL never had time-based releases before Oracle took them over.  And Oracle has been having feature-free releases because they're trying to work through MySQL's list of thousands of unfixed bugs which dates back to 2003.

An argument for feature-driven releases is in fact an argument for the MySQL AB development model.  And that's not a company I want to emulate.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

Re: reducing the overhead of frequent table locks - now, with WIP patch

Joshua D. Drake <jd@commandprompt.com> — 2011-06-08T17:53:12Z

On 06/07/2011 11:55 AM, Tom Lane wrote:
> Simon Riggs<simon@2ndQuadrant.com>  writes:
>> Before you arrived, it was quite normal to suggest tuning patches
>> after feature freeze.
>
> *Low risk* tuning patches make sense at this stage, yes.  Fooling with
> the lock mechanisms doesn't qualify as low risk in my book.  The
> probability of undetected subtle problems is just too great.
>
> 			regards, tom lane

I would like to see us continue on the path of release, not
destabilization. Any patch that breaks into core feature mechanisms
(like locking) is bound to have something unsuspected lurking in the wings.

+1 for submitting for 9.2.
+1 for not committing to 9.1.

Sincerely,

Joshua D. Drake



-- 
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-08T18:19:09Z

On Wed, Jun 8, 2011 at 1:10 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Why do you address this to me? Many others have been committing
> patches against raised issues well after feature freeze.

No one other than you has proposed committing anything nearly as
invasive as this, and the great majority of what we've committed has
been targeted at new regressions in 9.1.

There is a difference between a feature and a bug fix.  Sometimes the
distinction is arguable, but this isn't one of those cases.  A feature
freeze does not mean an absolute code freeze; it means a freeze on
*features*.

> You do not wish to stop all patches, only those you disagree with. How
> would I know you disagree with a patch without discussing it?
>
> I note that you've claimed *everything* I have discussed is a new
> feature, whereas everything you or others have done is an "open item".
> You can claim that everything I suggest is a dust-up if you wish, but
> who makes it a dust up and why?

I think the people, including me, who feel that it's not a good idea
to commit new features have been very clear about the reasons for
their position - namely, (1) the desire to get the release out the
door in a timely fashion, and (2) the desire to treat everyone's
patches in a fair and even-handed way rather than privileging some
over others.  I'm just as much against committing my own features, or
Tom's features, or Alvaro's features as I am against committing your
features - not because I don't like the features (I do) but because I
want to release 9.1 in about a month.

> The point I have made is that I disagree with a feature freeze date
> fixed ahead of time without regard to the content of the forthcoming
> release. I've not said I disagree with feature freezes altogether,
> which would be utterly ridiculous. Fixed dates are IMHO much less
> important than a sensible and useful feature set for our users. MySQL
> repeatedly delivered releases with half-finished features and earned
> much disrespect. We have never done that previously and I am against
> doing so in the future.

So am I.  But apparently, we have very different ideas of what that
means.  I thought that "making the server shut down properly, even
if you are using sync rep" was a clear-cut case of correcting a
half-finished feature, but you argued against that change.  And I
think that "revamping the locking mechanism so it's faster" is clearly
a new feature, not a repair to something half-finished.  I don't
expect it's very realistic to think that everyone is going to agree on
every patch, but if we can't agree that bug fixes and features should be
treated differently, or if we can't agree at least in most cases on
what the difference is between one and the other, then we will spend a
lot of time talking past each other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-08T22:05:22Z

Simon Riggs <simon@2ndQuadrant.com> writes:
> On Wed, Jun 8, 2011 at 6:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Just to set the record straight on this ... the vxid patch went in on
>> 2007-09-05:
>> http://archives.postgresql.org/pgsql-committers/2007-09/msg00026.php
>> which was a day shy of a month before we wrapped 8.3beta1:
>> http://archives.postgresql.org/pgsql-committers/2007-10/msg00089.php
>> so it was during alpha phase not beta. And 8.3RC1 was stamped on
>> 2008-01-03. So Simon's assertion that this was "days before we produced
>> a release candidate" is correct, if you take "days" as "4 months".

> The patch went in slightly more than 6 months after feature freeze,
> even though it was written by a summer student and did not even pass
> review by the student's mentor (me).

I'm not sure why you're having such a hard time distinguishing "before
beta" from "after beta", but in any case please notice that you're
describing a cycle where we spent nine months in feature freeze.
Nobody else here is going to hold that up as an example of sound project
management that we ought to repeat.  And the way to not repeat it is to
not accept risky new patches late in the cycle.

(This may be something of an apples-to-oranges comparison, though, since
as best I can tell from a quick look in the archives, we were not then
using the term "feature freeze" the same as we are now --- 2007-04-01
seems to have been the point that we would now call "beginning of the
last CF", ie, all feature patches for 8.3 were supposed to have been
*submitted*, not necessarily committed.  And we had a lot of them
pending at that point, because of lack of the CF process to get things
in earlier.)

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-08T22:10:10Z

Joshua Berkus <josh@agliodbs.com> writes:
> Simon,
>> The point I have made is that I disagree with a feature freeze date
>> fixed ahead of time without regard to the content of the forthcoming
>> release. I've not said I disagree with feature freezes altogether,
>> which would be utterly ridiculous. Fixed dates are IMHO much less
>> important than a sensible and useful feature set for our users.

> This is such a non-argument it's silly.

Perhaps more to the point, we've tried that approach in the past,
repeatedly, and it's been a scheduling disaster every single time.
Slipping the release date in order to get in newly-written features,
no matter *how* attractive they are, does not work.  Maybe there are
people who can make it work, but not us.

			regards, tom lane

Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> — 2011-06-09T09:09:23Z

On Wed, Jun 8, 2011 at 6:43 PM, Joshua Berkus <josh@agliodbs.com> wrote:
> Simon,
>
>> The point I have made is that I disagree with a feature freeze date
>> fixed ahead of time without regard to the content of the forthcoming
>> release. I've not said I disagree with feature freezes altogether,
>> which would be utterly ridiculous. Fixed dates are IMHO much less
>> important than a sensible and useful feature set for our users.
>
> This is such a non-argument it's silly.  We have so many new major features for 9.1 that I'm having trouble writing sensible press releases which don't sound like a laundry list.

You're right this is a non-argument.

I am not continuing this debate using the above point. I am merely
correcting people's assertions about what I think, which is a little
tiresome for all of us and it would be much better if people didn't
foolishly put words in my mouth, as multiple people have done on this
thread.

I'm also quite happy with the feature set for 9.1.

>> MySQL
>> repeatedly delivered releases with half-finished features and earned
>> much disrespect. We have never done that previously and I am against
>> doing so in the future.
>
> This is also total BS.  I worked on the MySQL team.

> Before Sun/Oracle, MySQL specifically had feature-driven releases, where Marketing decided what features 5.0, 5.1 and 5.2 would have.  They also accepted new features during beta if Marketing liked them enough.  This resulted in the 5.1 release being *three years late*, and 5.3 being cancelled altogether.  And let's talk about the legendary instability of 5.0, because they decided that they couldn't cancel partitioning and stored procedures, whether they were ready for prime time or not and because they kept changing the API during beta.
>
> MySQL never had time-based releases before Oracle took them over.  And Oracle has been having feature-free releases because they're trying to work through MySQL's list of thousands of unfixed bugs which dates back to 2003.

I claimed they delivered half-finished features. You clearly agree
with me on that. I'm not sure which part you see as BS?

> An argument for feature-driven releases is in fact an argument for the MySQL AB development model.  And that's not a company I want to emulate.

Yes, I've also experienced totally marketing-driven software
development, and that's why I'm *here*. I've spoken at length about
how good our process is and have considerable respect for it and the
people that have made it work. I am not advocating any changes to it
at all, especially not to the model used by MySQL AB.

I have asked that we maintain the Reasonableness we have always had
about how the feature freeze date was applied. An example of such
reasonableness is that if a feature is a few days late and it is
important, then it would still go into the release. An example of
unreasonableness would be to close the feature freeze on a
predetermined date, without regard to the state of the feature set in
the release. To date, we have always been reasonable and I don't want
to change the process in the way Robert has suggested we should
change. I was one of a number of developers making that point at the
developer meeting and I would say I was part of the majority view.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> — 2011-06-09T13:13:16Z

On Thu, Jun 9, 2011 at 5:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I have asked that we maintain the Reasonableness we have always had
> about how the feature freeze date was applied. An example of such
> reasonableness is that if a feature is a few days late and it is
> important, then it would still go into the release. An example of
> unreasonableness would be to close the feature freeze on a
> predetermined date, without regard to the state of the feature set in
> the release. To date, we have always been reasonable and I don't want
> to change the process in the way Robert has suggested we should
> change.

Now you're putting words in my mouth.  I wouldn't want to put out a
release without a good feature set, either, but we don't have that
problem.  Getting them out on a fairly regular schedule without a
really long feature freeze has traditionally been a bit harder.  I
believe that over the last few releases we've actually gotten better
at integrating larger patches while also sticking closer to the
schedule; and I'd like to continue to get better at both of those
things.  I don't advocate blind adherence to the feature freeze date
either, but I do prefer to see deviations measured in days or at most
weeks rather than months; and I have a lot more sympathy for the
"patch submitted and no one got around to reviewing it" situation than
I do for the "patch just plain got here late" case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Dave Page <dpage@pgadmin.org> — 2011-06-09T13:22:36Z

On Thu, Jun 9, 2011 at 2:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 9, 2011 at 5:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I have asked that we maintain the Reasonableness we have always had
>> about how the feature freeze date was applied. An example of such
>> reasonableness is that if a feature is a few days late and it is
>> important, then it would still go into the release. An example of
>> unreasonableness would be to close the feature freeze on a
>> predetermined date, without regard to the state of the feature set in
>> the release. To date, we have always been reasonable and I don't want
>> to change the process in the way Robert has suggested we should
>> change.
>
> Now you're putting words in my mouth.  I wouldn't want to put out a
> release without a good feature set, either, but we don't have that
> problem.  Getting them out on a fairly regular schedule without a
> really long feature freeze has traditionally been a bit harder.  I
> believe that over the last few releases we've actually gotten better
> at integrating larger patches while also sticking closer to the
> schedule; and I'd like to continue to get better at both of those
> things.  I don't advocate blind adherence to the feature freeze date
> either, but I do prefer to see deviations measured in days or at most
> weeks rather than months; and I have a lot more sympathy for the
> "patch submitted and no one got around to reviewing it" situation than
> I do for the "patch just plain got here late" case.

Can we make this the last post on this topic please?

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: reducing the overhead of frequent table locks - now, with WIP patch

Pavan Deolasee <pavan.deolasee@gmail.com> — 2011-06-09T13:30:17Z
> 
> Can we make this the last post on this topic please?
> 

+1 :)

Thanks,
Pavan