Thread

  1. patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-05-31T21:24:48Z

    Hello
    
    here is a partial review of your patch, better than keeping it
    sleeping in the commitfest queue I hope.
    
     Submission review
    ================
    
        * The patch is not in context diff format.
        * The patch apply, but contains some extra whitespace.
        * Documentation is here but not explicit about 'temp tables',
    maybe worth adding that this won't limit temporary table size ?
        * There is no test provided. One can be expected to check that the
    feature work.
    
    Code review
    =========
    
    * in fd.c, I think that "temporary_files_size -=
    (double)vfdP->fileSize;" should be done later in the function once we
    have successfully unlink the file, not before.
    
    * I am not sure it is better to add a fileSize like you did or use
    relationgetnumberofblock() when file is about to be truncated or
    unlinked, this way the seekPos should be enough to increase the global
    counter.
    
    * temporary_files_size, I think it is better to have a number of pages
    à la postgresql than a kilobyte size
    
    * max_temp_files_size, I'll prefer an approach like shared_buffers
    GUC: you can use pages, or KB, MB, ...
    
    
    Simple Feature test
    ==============
    
    either explain buffers is wrong or the patch is wrong:
    cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
                                                       QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------
     Sort  (cost=10260.02..10495.82 rows=94320 width=4) (actual
    time=364.373..518.940 rows=100000 loops=1)
       Sort Key: generate_series
       Sort Method: external merge  Disk: 1352kB
       Buffers: local hit=393, temp read=249 written=249
       ->  Seq Scan on foo  (cost=0.00..1336.20 rows=94320 width=4)
    (actual time=0.025..138.754 rows=100000 loops=1)
             Buffers: local hit=393
     Total runtime: 642.874 ms
    (7 rows)
    
    cedric=# set max_temp_files_size to 1900;
    SET
    cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
    ERROR:  aborting due to exceeding max temp files size
    STATEMENT:  explain (analyze,buffers) select * from foo  order by 1 desc ;
    ERROR:  aborting due to exceeding max temp files size
    
    Do you have some testing method I can apply to track that without
    explain (analyze, buffers) before going to low-level monitoring ?
    
    Architecture review
    ==============
    
    max_temp_files_size is used for the global space used per backend.
    Based on how work_mem work I expect something like "work_disk" to
    limit per file and maybe a backend_work_disk (and yes maybe a
    backend_work_mem ?!) per backend.
    So I propose to rename the current GUC to something like backend_work_disk.
    
    Patch is not large and easy to read.
    I like the idea and it sounds useful.
    
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
    
  2. Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-01T00:27:46Z

    On 01/06/11 09:24, Cédric Villemain wrote:
    > Hello
    >
    > here is a partial review of your patch, better than keeping it
    > sleeping in the commitfest queue I hope.
    >
    >   Submission review
    > ================
    >
    >      * The patch is not in context diff format.
    >      * The patch apply, but contains some extra whitespace.
    >      * Documentation is here but not explicit about 'temp tables',
    > maybe worth adding that this won't limit temporary table size ?
    >      * There is no test provided. One can be expected to check that the
    > feature work.
    >
    > Code review
    > =========
    >
    > * in fd.c, I think that "temporary_files_size -=
    > (double)vfdP->fileSize;" should be done later in the function once we
    > have successfully unlink the file, not before.
    >
    > * I am not sure it is better to add a fileSize like you did or use
    > relationgetnumberofblock() when file is about to be truncated or
    > unlinked, this way the seekPos should be enough to increase the global
    > counter.
    >
    > * temporary_files_size, I think it is better to have a number of pages
    > à la postgresql than a kilobyte size
    >
    > * max_temp_files_size, I'll prefer an approach like shared_buffers
    > GUC: you can use pages, or KB, MB, ...
    >
    >
    > Simple Feature test
    > ==============
    >
    > either explain buffers is wrong or the patch is wrong:
    > cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
    >                                                     QUERY PLAN
    > -----------------------------------------------------------------------------------------------------------------
    >   Sort  (cost=10260.02..10495.82 rows=94320 width=4) (actual
    > time=364.373..518.940 rows=100000 loops=1)
    >     Sort Key: generate_series
    >     Sort Method: external merge  Disk: 1352kB
    >     Buffers: local hit=393, temp read=249 written=249
    >     ->   Seq Scan on foo  (cost=0.00..1336.20 rows=94320 width=4)
    > (actual time=0.025..138.754 rows=100000 loops=1)
    >           Buffers: local hit=393
    >   Total runtime: 642.874 ms
    > (7 rows)
    >
    > cedric=# set max_temp_files_size to 1900;
    > SET
    > cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
    > ERROR:  aborting due to exceeding max temp files size
    > STATEMENT:  explain (analyze,buffers) select * from foo  order by 1 desc ;
    > ERROR:  aborting due to exceeding max temp files size
    >
    > Do you have some testing method I can apply to track that without
    > explain (analyze, buffers) before going to low-level monitoring ?
    >
    > Architecture review
    > ==============
    >
    > max_temp_files_size is used for the global space used per backend.
    > Based on how work_mem work I expect something like "work_disk" to
    > limit per file and maybe a backend_work_disk (and yes maybe a
    > backend_work_mem ?!) per backend.
    > So I propose to rename the current GUC to something like backend_work_disk.
    >
    > Patch is not large and easy to read.
    > I like the idea and it sounds useful.
    >
    
    Hi Cédric,
    
    Thanks for reviewing!
    
    The diff format is odd - I'm sure I told git to do a context diff, 
    however running 'file' on the v2 patch results it it saying 'HTML 
    document'... hmm, I'll check more carefully for the next one I do.
    
    Yeah, I agree about explicitly mentioning how it does not constraint 
    temp tables.
    
    Hmm, test - good idea - I'll look into it.
    
    Re the code comments - I agree with most of them. However with respect 
    to the Guc units, I copied the setup for work_mem as that seemed the 
    most related. Also I used bytes for the internal variable in fd.c to 
    limit the number of arithmetic operations while comparing.
    
    For testing I set 'log_temp_files = 0' in postgresql.conf and the 
    numbers seemed to agree exactly - I didn't think to use EXPLAIN + 
    buffers... not sure why they would disagree. Have a go with 
    log_temp_files set and see what you find!
    
    I like yhour suggesting for the name. Given that 'work_mem' is per 
    backend, I'm leaning towards the idea of 'work_disk' being sufficiently 
    descriptive.
    
    Thanks again, and I'll come up with a v3 patch.
    
    Mark
    
    
    
    
  3. Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-01T00:32:31Z

    On 01/06/11 12:27, Mark Kirkwood wrote:
    >
    > Re the code comments - I agree with most of them. However with respect 
    > to the Guc units, I copied the setup for work_mem as that seemed the 
    > most related.
    
    Also - forget to mention - I *thought* you could specify the temp files 
    size GUC as KB, MB, GB or plain old pages, but I'll check this, in case 
    I've busted it somehow.
    
    Cheers
    
    Mark
    
    
  4. Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-01T00:46:18Z

    On 01/06/11 12:32, Mark Kirkwood wrote:
    > On 01/06/11 12:27, Mark Kirkwood wrote:
    >>
    >> Re the code comments - I agree with most of them. However with 
    >> respect to the Guc units, I copied the setup for work_mem as that 
    >> seemed the most related.
    >
    > Also - forget to mention - I *thought* you could specify the temp 
    > files size GUC as KB, MB, GB or plain old pages, but I'll check this, 
    > in case I've busted it somehow.
    >
    
    Doh, ignore me here :-( You are correct - I've specified GUC_UNITS_KB 
    (copying work_mem) - it is easy enough to allow pages like shared_buffers.
    
    Cheers
    
    Mark
    
    
  5. Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-01T23:35:55Z

    On 01/06/11 09:24, Cédric Villemain wrote:
    >
    >   Submission review
    > ================
    >
    >      * The patch is not in context diff format.
    >      * The patch apply, but contains some extra whitespace.
    >      * Documentation is here but not explicit about 'temp tables',
    > maybe worth adding that this won't limit temporary table size ?
    >      * There is no test provided. One can be expected to check that the
    > feature work.
    >
    
    I've created a new patch (attached) with 'git diff -c' so hopefully the 
    format is ok now. I've added a paragraph about how temporary *table* 
    storage is not constrained, only temp work files. I made the point that 
    the *query* that creates a temp table will have its work files 
    constrained too. I've added a test too.
    
    The major visible change is that the guc has been renamed to 
    'work_disk', to gel better with the idea that it is the disk spill for 
    'work_mem'.
    
    > Code review
    > =========
    >
    > * in fd.c, I think that "temporary_files_size -=
    > (double)vfdP->fileSize;" should be done later in the function once we
    > have successfully unlink the file, not before.
    >
    
    Agreed, I've moved it.
    
    > * I am not sure it is better to add a fileSize like you did or use
    > relationgetnumberofblock() when file is about to be truncated or
    > unlinked, this way the seekPos should be enough to increase the global
    > counter.
    >
    
    The temp files are not relations so I'd have to call stat I guess. Now 
    truncate/unlink can happen quite a lot (e.g hash with many files) and I 
    wanted to avoid adding too many library calls to this code for 
    performance reasons, so on balance I'm thinking it is gonna be more 
    efficient to remember the size in the Vfd.
    
    > * temporary_files_size, I think it is better to have a number of pages
    > à la postgresql than a kilobyte size
    >
    
    The temp file related stuff is worked on in bytes or kbytes in the fd.c 
    and other code (e.g log_temp_files), so it seems more natural to stay in 
    kbytes. Also work_disk is really only a spillover from work_mem (in 
    kbytes) so seems logical to match its units.
    
    > * max_temp_files_size, I'll prefer an approach like shared_buffers
    > GUC: you can use pages, or KB, MB, ...
    >
    
    We can use KB, MB, GB - just not pages, again like work_mem. These files 
    are not relations so I'm not sure pages is an entirely appropriate unit 
    for them.
    
    
    > Simple Feature test
    > ==============
    >
    > either explain buffers is wrong or the patch is wrong:
    > cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
    >                                                     QUERY PLAN
    > -----------------------------------------------------------------------------------------------------------------
    >   Sort  (cost=10260.02..10495.82 rows=94320 width=4) (actual
    > time=364.373..518.940 rows=100000 loops=1)
    >     Sort Key: generate_series
    >     Sort Method: external merge  Disk: 1352kB
    >     Buffers: local hit=393, temp read=249 written=249
    >     ->   Seq Scan on foo  (cost=0.00..1336.20 rows=94320 width=4)
    > (actual time=0.025..138.754 rows=100000 loops=1)
    >           Buffers: local hit=393
    >   Total runtime: 642.874 ms
    > (7 rows)
    >
    > cedric=# set max_temp_files_size to 1900;
    > SET
    > cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
    > ERROR:  aborting due to exceeding max temp files size
    > STATEMENT:  explain (analyze,buffers) select * from foo  order by 1 desc ;
    > ERROR:  aborting due to exceeding max temp files size
    >
    > Do you have some testing method I can apply to track that without
    > explain (analyze, buffers) before going to low-level monitoring ?
    >
    
    We're looking at this...
    
    
    > Architecture review
    > ==============
    >
    > max_temp_files_size is used for the global space used per backend.
    > Based on how work_mem work I expect something like "work_disk" to
    > limit per file and maybe a backend_work_disk (and yes maybe a
    > backend_work_mem ?!) per backend.
    > So I propose to rename the current GUC to something like backend_work_disk.
    >
    
    Done - 'work_disk' it is to match 'work_mem'.
    
    > Patch is not large and easy to read.
    > I like the idea and it sounds useful.
    >
    
    Great! Cheers
    
    Mark
    
    
  6. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-02T02:52:18Z

    On 02/06/11 11:35, Mark Kirkwood wrote:
    > On 01/06/11 09:24, Cédric Villemain wrote: Simple Feature test
    >> ==============
    >>
    >> either explain buffers is wrong or the patch is wrong:
    >> cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
    >>                                                     QUERY PLAN
    >> ----------------------------------------------------------------------------------------------------------------- 
    >>
    >>   Sort  (cost=10260.02..10495.82 rows=94320 width=4) (actual
    >> time=364.373..518.940 rows=100000 loops=1)
    >>     Sort Key: generate_series
    >>     Sort Method: external merge  Disk: 1352kB
    >>     Buffers: local hit=393, temp read=249 written=249
    >>     ->   Seq Scan on foo  (cost=0.00..1336.20 rows=94320 width=4)
    >> (actual time=0.025..138.754 rows=100000 loops=1)
    >>           Buffers: local hit=393
    >>   Total runtime: 642.874 ms
    >> (7 rows)
    >>
    >> cedric=# set max_temp_files_size to 1900;
    >> SET
    >> cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
    >> ERROR:  aborting due to exceeding max temp files size
    >> STATEMENT:  explain (analyze,buffers) select * from foo  order by 1 
    >> desc ;
    >> ERROR:  aborting due to exceeding max temp files size
    >>
    >> Do you have some testing method I can apply to track that without
    >> explain (analyze, buffers) before going to low-level monitoring ?
    >>
    >
    > We're looking at this...
    >
    
    Arg - I was being dense. FileWrite can write *anywhere* in the file, so 
    need to be smart about whether to add to the total filesize or not.
    
    I'll update the Commitfest page as soon as it sends me my password reset 
    mail...
    
    Cheers
    
    Mark
    
    
  7. Re: Re: patch review : Add ability to constrain backend temporary file space

    Jaime Casanova <jaime@2ndquadrant.com> — 2011-06-02T06:34:40Z

    On Wed, Jun 1, 2011 at 6:35 PM, Mark Kirkwood
    <mark.kirkwood@catalyst.net.nz> wrote:
    > On 01/06/11 09:24, Cédric Villemain wrote:
    >>
    >>  Submission review
    >> ================
    >>
    >>     * The patch is not in context diff format.
    >>     * The patch apply, but contains some extra whitespace.
    >>     * Documentation is here but not explicit about 'temp tables',
    >> maybe worth adding that this won't limit temporary table size ?
    >>     * There is no test provided. One can be expected to check that the
    >> feature work.
    >>
    >
    > I've created a new patch (attached)
    
    Hi Mark,
    
    A few comments:
    
    - why only superusers can set this? if this is a per-backend setting,
    i don't see the problem in allowing normal users to restrict their own
    queries
    
    - why the calculations are done as double?
    +               if (temporary_files_size / 1024.0 > (double)work_disk)
    
    
    
    - the patch adds this to serial_schedule but no test has been added...
    
    diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
    index bb654f9..325cb3d 100644
    --- a/src/test/regress/serial_schedule
    +++ b/src/test/regress/serial_schedule
    @@ -127,3 +127,4 @@ test: largeobject
     test: with
     test: xml
     test: stats
    +test: resource
    
    -- 
    Jaime Casanova         www.2ndQuadrant.com
    Professional PostgreSQL: Soporte y capacitación de PostgreSQL
    
    
  8. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-02T09:06:19Z

    On 02/06/11 18:34, Jaime Casanova wrote:
    > On Wed, Jun 1, 2011 at 6:35 PM, Mark Kirkwood
    > <mark.kirkwood@catalyst.net.nz>  wrote:
    >>
    >> I've created a new patch (attached)
    > Hi Mark,
    >
    > A few comments:
    >
    > - why only superusers can set this? if this is a per-backend setting,
    > i don't see the problem in allowing normal users to restrict their own
    > queries
    >
    
    Yeah, that's a good point, I was thinking that as a resource control 
    parameter it might be good to prevent casual users increasing their 
    limit. However the most likely use case would be ad-hoc query tools that 
    don't have the ability to do SET anyway. Hmm - what do you think?
    
    
    > - why the calculations are done as double?
    > +               if (temporary_files_size / 1024.0>  (double)work_disk)
    >
    >
    >
    
    I originally coded this with the idea that the guc would (or could) be a 
    double - to allow for seriously big limits in data warehousing 
    situations etc. But subsequent discussion led to that being discarded. 
    However work_disk can go to INT_MAX * 1024 bytes so I need to make sure 
    that whatever datatype I use can handle that - double seemed sufficient.
    
    > - the patch adds this to serial_schedule but no test has been added...
    >
    > diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
    > index bb654f9..325cb3d 100644
    > --- a/src/test/regress/serial_schedule
    > +++ b/src/test/regress/serial_schedule
    > @@ -127,3 +127,4 @@ test: largeobject
    >   test: with
    >   test: xml
    >   test: stats
    > +test: resource
    >
    
    Bugger - I think I forgot to 'git add'  em before doing the diff.
    
    I can sense a v5 coming on.
    
    
    
  9. Re: Re: patch review : Add ability to constrain backend temporary file space

    Robert Haas <robertmhaas@gmail.com> — 2011-06-02T13:46:46Z

    On Wed, Jun 1, 2011 at 7:35 PM, Mark Kirkwood
    <mark.kirkwood@catalyst.net.nz> wrote:
    > Done - 'work_disk' it is to match 'work_mem'.
    
    I guess I'm bikeshedding here, but I'm not sure I really buy this
    parallel.  work_mem is primarily a query planner parameter; it says,
    if you're going to need more memory than this, then you have to
    execute the plan some other way.  This new parameter is not a query
    planner paramater AIUI - its job is to KILL things if they exceed the
    limit.  In that sense it's more like statement_timeout.  I can imagine
    us wanting more parameters like this too.  Kill the query if it...
    
    ...takes too long (statement_timeout)
    ...uses too much temporary file space (the current patch)
    ...uses too much CPU time
    ...uses too much RAM
    ...generates too much disk I/O
    ...has too high an estimated cost
    ...others?
    
    So I'm not sure work_disk is a great name.  Actually, work_mem is
    already not a great name even for what it is, but at any rate I think
    this is something different.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  10. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-06-02T14:36:10Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > So I'm not sure work_disk is a great name.
    
    I agree.  Maybe something along the lines of "temp_file_limit"?
    
    Also, once you free yourself from the analogy to work_mem, you could
    adopt some more natural unit than KB.  I'd think MB would be a practical
    unit size, and would avoid (at least for the near term) the need to make
    the parameter a float.
    
    			regards, tom lane
    
    
  11. Re: Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-06-02T14:58:24Z

    2011/6/2 Robert Haas <robertmhaas@gmail.com>:
    > On Wed, Jun 1, 2011 at 7:35 PM, Mark Kirkwood
    > <mark.kirkwood@catalyst.net.nz> wrote:
    >> Done - 'work_disk' it is to match 'work_mem'.
    >
    > I guess I'm bikeshedding here, but I'm not sure I really buy this
    > parallel.  work_mem is primarily a query planner parameter; it says,
    > if you're going to need more memory than this, then you have to
    > execute the plan some other way.  This new parameter is not a query
    > planner paramater AIUI - its job is to KILL things if they exceed the
    > limit.  In that sense it's more like statement_timeout.  I can imagine
    > us wanting more parameters like this too.  Kill the query if it...
    >
    > ...takes too long (statement_timeout)
    > ...uses too much temporary file space (the current patch)
    > ...uses too much CPU time
    > ...uses too much RAM
    > ...generates too much disk I/O
    > ...has too high an estimated cost
    > ...others?
    
    you're sorting limits for 'executor' and limits for 'planner': uses
    too much CPU time VS has too high an estimated cost.
    
    (backend)_work_(disk|mem) looks good also for the 'has too high an
    estimated cost' series: limiter at the planner level should allow
    planner to change its strategy, I think... But probably not something
    to consider too much right now.
    
    >
    > So I'm not sure work_disk is a great name.  Actually, work_mem is
    > already not a great name even for what it is, but at any rate I think
    > this is something different.
    
    I am not specially attached to a name, idea was not to use work_disk
    but backend_work_disk. I agree with you anyway, and suggestion from
    Tom is fine for me (temp_file_limit).
    
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    
    
    
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
    
  12. Re: Re: patch review : Add ability to constrain backend temporary file space

    Robert Haas <robertmhaas@gmail.com> — 2011-06-02T16:53:15Z

    On Thu, Jun 2, 2011 at 10:58 AM, Cédric Villemain
    <cedric.villemain.debian@gmail.com> wrote:
    > I am not specially attached to a name, idea was not to use work_disk
    > but backend_work_disk. I agree with you anyway, and suggestion from
    > Tom is fine for me (temp_file_limit).
    
    Yeah, I like that too.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  13. Re: patch review : Add ability to constrain backend temporary file space

    Greg Stark <gsstark@mit.edu> — 2011-06-02T18:43:15Z

    On Thu, Jun 2, 2011 at 7:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Also, once you free yourself from the analogy to work_mem, you could
    > adopt some more natural unit than KB.  I'd think MB would be a practical
    > unit size, and would avoid (at least for the near term) the need to make
    > the parameter a float.
    
    As long as users can specify any unit when they input the parameter it
    doesn't really matter what unit the variable is stored in. I'm not
    sure the GUC infrastructure can currently handle megabytes as the
    native units for a guc though.
    
    -- 
    greg
    
    
  14. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-02T22:25:57Z

    On 03/06/11 02:36, Tom Lane wrote:
    > Robert Haas<robertmhaas@gmail.com>  writes:
    >> So I'm not sure work_disk is a great name.
    > I agree.  Maybe something along the lines of "temp_file_limit"?
    >
    > Also, once you free yourself from the analogy to work_mem, you could
    > adopt some more natural unit than KB.  I'd think MB would be a practical
    > unit size, and would avoid (at least for the near term) the need to make
    > the parameter a float.
    >
    
    I agree, and  I like your name suggestion (and also agree with Robert 
    that 'work_mem' was not such a great name).
    
    I'd be quite happy to use MB (checks guc.h) but looks like we don't have 
    a GUC_UNIT_MB, not sure if adding it would be an issue (suggests on a 
    suitable mask if people think it is a great idea).
    
    Cheers
    
    Mark
    
    
  15. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-02T23:01:43Z

    On 02/06/11 18:34, Jaime Casanova wrote:
    >
    >
    > - the patch adds this to serial_schedule but no test has been added...
    >
    > diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
    > index bb654f9..325cb3d 100644
    > --- a/src/test/regress/serial_schedule
    > +++ b/src/test/regress/serial_schedule
    > @@ -127,3 +127,4 @@ test: largeobject
    >   test: with
    >   test: xml
    >   test: stats
    > +test: resource
    >
    
    Corrected v4 patch with the test files, for completeness. Note that 
    discussion has moved on and there will be a v5 :-)
    
  16. Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-06-03T00:33:30Z

    2011/6/2 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
    > On 01/06/11 09:24, Cédric Villemain wrote:
    >> * I am not sure it is better to add a fileSize like you did or use
    >> relationgetnumberofblock() when file is about to be truncated or
    >> unlinked, this way the seekPos should be enough to increase the global
    >> counter.
    >>
    >
    > The temp files are not relations so I'd have to call stat I guess. Now
    > truncate/unlink can happen quite a lot (e.g hash with many files) and I
    > wanted to avoid adding too many library calls to this code for performance
    > reasons, so on balance I'm thinking it is gonna be more efficient to
    > remember the size in the Vfd.
    
    I am not sure temporary relation are truncated. I have not checked
    right now, but IIRC, we don't need to truncate it. And I believe it
    would defeat the log_temp feature because log_temp is done on
    FileClose() only.
    
    If we are to add a field in struct vfd to keep the filesize then some
    optimization may happens for logging too....
    
    
  17. Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-03T01:12:04Z

    On 03/06/11 12:33, Cédric Villemain wrote:
    > 2011/6/2 Mark Kirkwood<mark.kirkwood@catalyst.net.nz>:
    >> On 01/06/11 09:24, Cédric Villemain wrote:
    >>> * I am not sure it is better to add a fileSize like you did or use
    >>> relationgetnumberofblock() when file is about to be truncated or
    >>> unlinked, this way the seekPos should be enough to increase the global
    >>> counter.
    >>>
    >> The temp files are not relations so I'd have to call stat I guess. Now
    >> truncate/unlink can happen quite a lot (e.g hash with many files) and I
    >> wanted to avoid adding too many library calls to this code for performance
    >> reasons, so on balance I'm thinking it is gonna be more efficient to
    >> remember the size in the Vfd.
    > I am not sure temporary relation are truncated. I have not checked
    > right now, but IIRC, we don't need to truncate it. And I believe it
    > would defeat the log_temp feature because log_temp is done on
    > FileClose() only.
    >
    > If we are to add a field in struct vfd to keep the filesize then some
    > optimization may happens for logging too....
    
    We pretty much need to store the file size I think, due to needing to 
    work out if FileWrite is re-writing inside the file or extending it. So 
    maybe we could avoid a stat on unlink too (mind you it is a nice 
    validity check that the ongoing size accounting is correct). I was/am 
    keen to avoid changing too much here (heh - maybe I'm being too timid...).
    
    Cheers
    
    Mark
    
    
    
  18. Re: Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-06-14T14:52:45Z

    2011/6/3 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
    > On 02/06/11 18:34, Jaime Casanova wrote:
    >>
    >>
    >> - the patch adds this to serial_schedule but no test has been added...
    >>
    >> diff --git a/src/test/regress/serial_schedule
    >> b/src/test/regress/serial_schedule
    >> index bb654f9..325cb3d 100644
    >> --- a/src/test/regress/serial_schedule
    >> +++ b/src/test/regress/serial_schedule
    >> @@ -127,3 +127,4 @@ test: largeobject
    >>  test: with
    >>  test: xml
    >>  test: stats
    >> +test: resource
    >>
    >
    > Corrected v4 patch with the test files, for completeness. Note that
    > discussion has moved on and there will be a v5 :-)
    >
    
    Mark, can you submit your updated patch ?
    
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
    
  19. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-15T00:35:32Z

    On 15/06/11 02:52, Cédric Villemain wrote:
    > 2011/6/3 Mark Kirkwood<mark.kirkwood@catalyst.net.nz>:
    >>
    >> Corrected v4 patch with the test files, for completeness. Note that
    >> discussion has moved on and there will be a v5 :-)
    >>
    > Mark, can you submit your updated patch ?
    >
    
    Thanks for the reminder! Here is v5. The changes are:
    
    - guc is called temp_file_limit, which seems like the best choice to 
    date :-)
    - removed code to do with truncating files, as after testing I agree 
    with you that temp work files don't seem to get truncated.
    
    I have not done anything about the business on units - so we are in KB 
    still - there is no MB unit avaiable in the code as yet - I'm not sure 
    we need one at this point, as most folk who use this feature will find 
    4096GB a big enough *per backend* limit. Obviously it might be a good 
    investment to plan to have MB, and GB as possible guc units too. Maybe 
    this could be a separate piece of work since any *other* resource 
    limiters we add might find it convenient to have these available?
    
    Cheers
    
    Mark
    
  20. Re: Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-06-16T21:49:12Z

    2011/6/15 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
    > On 15/06/11 02:52, Cédric Villemain wrote:
    >>
    >> 2011/6/3 Mark Kirkwood<mark.kirkwood@catalyst.net.nz>:
    >>>
    >>> Corrected v4 patch with the test files, for completeness. Note that
    >>> discussion has moved on and there will be a v5 :-)
    >>>
    >> Mark, can you submit your updated patch ?
    >>
    >
    > Thanks for the reminder! Here is v5. The changes are:
    
    I have issues applying it.
    Please can you remove trailing space?
    Also, you can generate a cool patch like this :
    
    get git-external-diff from postgres/src/tools to /usr/lib/git-core/
    chmod +x it
    export GIT_EXTERNAL_DIFF=git-external-diff
    git format-patch --ext-diff origin
    
    
    my log:
    $ git apply  temp-files-v5.patch
    temp-files-v5.patch:22: trailing whitespace.
            defaults to unlimited (<literal>-1</>). Values larger than zero
    temp-files-v5.patch:23: trailing whitespace.
            constraint the temporary file space usage to be that number of
    temp-files-v5.patch:28: trailing whitespace.
            the total space used by all the files produced by one backend is
    temp-files-v5.patch:35: trailing whitespace.
            constrain disk space used for temporary table storage. However if
    temp-files-v5.patch:105: trailing whitespace.
                            /*
    error: patch failed: src/backend/storage/file/fd.c:132
    error: src/backend/storage/file/fd.c: patch does not apply
    error: src/test/regress/expected/resource.out: No such file or directory
    error: src/test/regress/sql/resource.sql: No such file or directory
    
    
    >
    > - guc is called temp_file_limit, which seems like the best choice to date
    > :-)
    > - removed code to do with truncating files, as after testing I agree with
    > you that temp work files don't seem to get truncated.
    >
    > I have not done anything about the business on units - so we are in KB still
    > - there is no MB unit avaiable in the code as yet - I'm not sure we need one
    > at this point, as most folk who use this feature will find 4096GB a big
    > enough *per backend* limit. Obviously it might be a good investment to plan
    > to have MB, and GB as possible guc units too. Maybe this could be a separate
    > piece of work since any *other* resource limiters we add might find it
    > convenient to have these available?
    >
    > Cheers
    >
    > Mark
    >
    
    
    
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
    
  21. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-17T01:08:58Z

    On 17/06/11 09:49, Cédric Villemain wrote:
    >
    > I have issues applying it.
    > Please can you remove trailing space?
    > Also, you can generate a cool patch like this :
    >
    > get git-external-diff from postgres/src/tools to /usr/lib/git-core/
    > chmod +x it
    > export GIT_EXTERNAL_DIFF=git-external-diff
    > git format-patch --ext-diff origin
    >
    >
    > my log:
    > $ git apply  temp-files-v5.patch
    > temp-files-v5.patch:22: trailing whitespace.
    >          defaults to unlimited (<literal>-1</>). Values larger than zero
    > temp-files-v5.patch:23: trailing whitespace.
    >          constraint the temporary file space usage to be that number of
    > temp-files-v5.patch:28: trailing whitespace.
    >          the total space used by all the files produced by one backend is
    > temp-files-v5.patch:35: trailing whitespace.
    >          constrain disk space used for temporary table storage. However if
    > temp-files-v5.patch:105: trailing whitespace.
    >                          /*
    > error: patch failed: src/backend/storage/file/fd.c:132
    > error: src/backend/storage/file/fd.c: patch does not apply
    > error: src/test/regress/expected/resource.out: No such file or directory
    > error: src/test/regress/sql/resource.sql: No such file or directory
    >
    
    I can generate a patch that way no problem - however it does not apply 
    using the above method:
    
    $ git apply  /data0/download/postgres/patches/temp/temp-files-v5.1.patch
    error: doc/src/sgml/config.sgml: already exists in working directory
    error: src/backend/storage/file/fd.c: already exists in working directory
    error: src/backend/utils/misc/guc.c: already exists in working directory
    error: src/backend/utils/misc/postgresql.conf.sample: already exists in 
    working directory
    error: src/include/utils/guc.h: already exists in working directory
    error: src/include/utils/guc_tables.h: already exists in working directory
    error: src/test/regress/serial_schedule: already exists in working directory
    
    However it applies just fine using the "normal" method:
    
    $ patch --dry-run -p1 < 
    /data0/download/postgres/patches/temp/temp-files-v5.1.patch
    patching file doc/src/sgml/config.sgml
    patching file src/backend/storage/file/fd.c
    patching file src/backend/utils/misc/guc.c
    patching file src/backend/utils/misc/postgresql.conf.sample
    patching file src/include/utils/guc.h
    patching file src/include/utils/guc_tables.h
    patching file src/test/regress/expected/resource.out
    patching file src/test/regress/serial_schedule
    patching file src/test/regress/sql/resource.sql
    
    Any idea?
    
    regards
    
    Mark
    
    
    
    
    
    
  22. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-17T01:51:37Z

    On 17/06/11 13:08, Mark Kirkwood wrote:
    > On 17/06/11 09:49, Cédric Villemain wrote:
    >>
    >> I have issues applying it.
    >> Please can you remove trailing space?
    >> Also, you can generate a cool patch like this :
    >>
    >> get git-external-diff from postgres/src/tools to /usr/lib/git-core/
    >> chmod +x it
    >> export GIT_EXTERNAL_DIFF=git-external-diff
    >> git format-patch --ext-diff origin
    
    I think I have the trailing spaces removed, and patch is updated for the 
    variable renaming recently done in fd.c
    
    I have no idea why I can't get the git apply to work (obviously I have 
    exceeded by git foo by quite a ways), but it should apply for you I hope 
    (as it patches fine).
    
    regards
    
    Mark
    
  23. Re: Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-06-17T08:57:53Z

    2011/6/17 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
    > On 17/06/11 13:08, Mark Kirkwood wrote:
    >>
    >> On 17/06/11 09:49, Cédric Villemain wrote:
    >>>
    >>> I have issues applying it.
    >>> Please can you remove trailing space?
    >>> Also, you can generate a cool patch like this :
    >>>
    >>> get git-external-diff from postgres/src/tools to /usr/lib/git-core/
    >>> chmod +x it
    >>> export GIT_EXTERNAL_DIFF=git-external-diff
    >>> git format-patch --ext-diff origin
    >
    > I think I have the trailing spaces removed, and patch is updated for the
    > variable renaming recently done in fd.c
    >
    > I have no idea why I can't get the git apply to work (obviously I have
    > exceeded by git foo by quite a ways), but it should apply for you I hope (as
    > it patches fine).
    >
    
    If I didn't made mistake the attached patch does not have trailling
    space anymore and I did a minor cosmetic in FileClose. It is not in
    the expected format required by postgresql commiters but can be
    applyed with git apply...
    It looks like the issue is that patch generated with the git-ext-diff
    can not be git applyed (they need to use patch).
    
    Either I did something wrong or git-ext-diff format is not so great.
    
    
    I didn't test and all yet. From reading, the patch looks sane. I'll
    review it later this day or this week-end.
    
    
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
  24. Re: Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-06-20T13:15:40Z

    2011/6/17 Cédric Villemain <cedric.villemain.debian@gmail.com>:
    > 2011/6/17 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
    >> On 17/06/11 13:08, Mark Kirkwood wrote:
    >>>
    >>> On 17/06/11 09:49, Cédric Villemain wrote:
    >>>>
    >>>> I have issues applying it.
    >>>> Please can you remove trailing space?
    >>>> Also, you can generate a cool patch like this :
    >>>>
    >>>> get git-external-diff from postgres/src/tools to /usr/lib/git-core/
    >>>> chmod +x it
    >>>> export GIT_EXTERNAL_DIFF=git-external-diff
    >>>> git format-patch --ext-diff origin
    >>
    >> I think I have the trailing spaces removed, and patch is updated for the
    >> variable renaming recently done in fd.c
    >>
    >> I have no idea why I can't get the git apply to work (obviously I have
    >> exceeded by git foo by quite a ways), but it should apply for you I hope (as
    >> it patches fine).
    >>
    >
    > If I didn't made mistake the attached patch does not have trailling
    > space anymore and I did a minor cosmetic in FileClose. It is not in
    > the expected format required by postgresql commiters but can be
    > applyed with git apply...
    > It looks like the issue is that patch generated with the git-ext-diff
    > can not be git applyed (they need to use patch).
    >
    > Either I did something wrong or git-ext-diff format is not so great.
    >
    >
    > I didn't test and all yet. From reading, the patch looks sane. I'll
    > review it later this day or this week-end.
    >
    >
    
    Submission review
    ===============
    
    I have updated the patch for cosmetic.
    (I attach 2 files: 1 patch for 'git-apply' and 1 for 'patch')
    
    I have also updated the guc.c entry (lacks of a NULL for the last
    Hooks handler and make the pg_settings comment more like other GUC)
    
    The patch includes doc and test.
    
    Usability review
    =================
    
    The patch implements what it claims modulo the BLK size used by
    PostgreSQL: the filewrite operation happens before the test on file
    limit and PostgreSQL write per 8kB. The result is that a
    temp_file_limit=0 does not prevent temp_file writing or creation, but
    stop the writing at 8192 bytes.
    I've already argue to keep the value as a BLKSZ, but I don't care to
    have kB by default.
    The doc may need to be updated to reflect this behavior.
    Maybe we want to disable the temp_file writing completely with the
    value set to 0 ? (it looks useless to me atm but someone may have a
    usecase for that ?!)
    
    I don't believe it follows any SQL specs, and so far the community did
    not refuse the idea, so looks in the good way.
    I believe we want it, my only grippe is that a large value don't
    prevent server to use a lot of IO before the operation is aborted, I
    don't see real alternative at this level though.
    
    After initial testing some corner cases have been outline (in previous
    versions of the patch), so I believe the current code fix them well as
    I didn't have issue again.
    
     Feature test
    ============
    
    The feature does not work exactly as expected because the write limit
    is rounded per 8kB because we write before checking. I believe if one
    write a file of 1GB in one pass (instead of repetitive 8kB increment),
    and the temp_file_limit is 0, then the server will write the 1GB
    before aborting.
    Also, the patch does not handle a !(returnCode >=0) in FileWrite. I
    don't know if it is needed or not.
    As the counter is per session and we can have very long session, it
    may hurt....but the problem will be larger than just this counter in
    such case so...is it worth the trouble to handle it ?
    
    Performance review
    ===================
    
    not done.
    
    Coding review
    =============
    
    I have done some update to the patch to make it follow the coding guidelines.
    I don't think it can exist a portability issue.
    
    Tom did say: I'd think MB would be a practical unit size, and would
    avoid (at least for the near term) the need to make the parameter a
    float.
    
    Architecture review
    ================
    
    IMO, the code used to log_temp_file is partially deprecated by this
    feature: the system call to stat() looks now useles and can be removed
    as well as the multiple if () around temp_file_limit or log_temp_file
    can be merged.
    
    Review review
    =============
    
    I didn't test for performances, I have not tryed to overflow
    temporary_files_size but looks hard to raise. (this might be possible
    if the temp_file_limit is not used but the temporary_files_size
    incremented, but current code does not allow to increment one without
    using the feature, so it is good I think)
    
    Mark, please double check that my little updates of your patch did not
    remove anything or modify the behavior in a wrong way.
    
    Thanks for your good job,
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
  25. Re: Re: patch review : Add ability to constrain backend temporary file space

    Robert Haas <robertmhaas@gmail.com> — 2011-06-20T13:21:24Z

    On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
    <cedric.villemain.debian@gmail.com> wrote:
    > The feature does not work exactly as expected because the write limit
    > is rounded per 8kB because we write before checking. I believe if one
    > write a file of 1GB in one pass (instead of repetitive 8kB increment),
    > and the temp_file_limit is 0, then the server will write the 1GB
    > before aborting.
    
    Can we rearrange thing so we check first, and then write?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  26. Re: Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-06-20T14:39:40Z

    2011/6/20 Robert Haas <robertmhaas@gmail.com>:
    > On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
    > <cedric.villemain.debian@gmail.com> wrote:
    >> The feature does not work exactly as expected because the write limit
    >> is rounded per 8kB because we write before checking. I believe if one
    >> write a file of 1GB in one pass (instead of repetitive 8kB increment),
    >> and the temp_file_limit is 0, then the server will write the 1GB
    >> before aborting.
    >
    > Can we rearrange thing so we check first, and then write?
    
    probably but it needs more work to catch corner cases. We may be safe
    to just document that (and also in the code). The only way I see so
    far to have a larger value than 8kB here is to have a plugin doing the
    sort instead of the postgresql core sort algo.
    
    
    
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    
    
    
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
    
  27. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-21T23:13:12Z

    On 21/06/11 02:39, Cédric Villemain wrote:
    > 2011/6/20 Robert Haas<robertmhaas@gmail.com>:
    >> On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
    >> <cedric.villemain.debian@gmail.com>  wrote:
    >>> The feature does not work exactly as expected because the write limit
    >>> is rounded per 8kB because we write before checking. I believe if one
    >>> write a file of 1GB in one pass (instead of repetitive 8kB increment),
    >>> and the temp_file_limit is 0, then the server will write the 1GB
    >>> before aborting.
    >> Can we rearrange thing so we check first, and then write?
    > probably but it needs more work to catch corner cases. We may be safe
    > to just document that (and also in the code). The only way I see so
    > far to have a larger value than 8kB here is to have a plugin doing the
    > sort instead of the postgresql core sort algo.
    >
    >
    
    Thanks guys - will look at moving the check, and adding some 
    documentation about the possible impacts of plugins (or new executor 
    methods) that might write in chunks bigger than blocksz.
    
    Maybe a few days - I'm home sick ATM, plus looking after these 
    http://www.maftet.co.nz/kittens.html
    
    Cheers
    
    Mark
    
    
  28. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-06-24T06:01:50Z

    On 22/06/11 11:13, Mark Kirkwood wrote:
    > On 21/06/11 02:39, Cédric Villemain wrote:
    >> 2011/6/20 Robert Haas<robertmhaas@gmail.com>:
    >>> On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
    >>> <cedric.villemain.debian@gmail.com>  wrote:
    >>>> The feature does not work exactly as expected because the write limit
    >>>> is rounded per 8kB because we write before checking. I believe if one
    >>>> write a file of 1GB in one pass (instead of repetitive 8kB increment),
    >>>> and the temp_file_limit is 0, then the server will write the 1GB
    >>>> before aborting.
    >>> Can we rearrange thing so we check first, and then write?
    >> probably but it needs more work to catch corner cases. We may be safe
    >> to just document that (and also in the code). The only way I see so
    >> far to have a larger value than 8kB here is to have a plugin doing the
    >> sort instead of the postgresql core sort algo.
    >>
    >>
    >
    > Thanks guys - will look at moving the check, and adding some 
    > documentation about the possible impacts of plugins (or new executor 
    > methods) that might write in chunks bigger than blocksz.
    >
    > Maybe a few days - I'm home sick ATM, plus looking after these 
    > http://www.maftet.co.nz/kittens.html
    >
    >
    
    This  version moves the check *before* we write the new buffer, so 
    should take care of issues about really large write buffers, plugins 
    etc. Also I *think* that means there is no need to amend the documentation.
    
    Cheers
    
    Mark
    
    P.s: Hopefully I've got a context diff this time, just realized that git 
    diff -c does *not* produce a context diff (doh).
    
    
  29. Re: Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-07-07T21:26:31Z

    2011/6/24 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
    >
    > This  version moves the check *before* we write the new buffer, so should
    > take care of issues about really large write buffers, plugins etc. Also I
    > *think* that means there is no need to amend the documentation.
    >
    > Cheers
    >
    > Mark
    >
    > P.s: Hopefully I've got a context diff this time, just realized that git
    > diff -c does *not* produce a context diff (doh).
    >
    
    
    Mark, I have this (hopefully) final review in my TODO list, I won't be
    able to check until CHAR(11) is done, and probably not before the 19th
    of July.
    
    So... if one want to make progress in the meantime, please do.
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
    
  30. Re: Re: patch review : Add ability to constrain backend temporary file space

    Josh Berkus <josh@agliodbs.com> — 2011-07-10T18:44:28Z

    Hackers,
    
    This patch needs a new reviewer, per Cedric.  Please help!
    
    -- 
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
    
    
  31. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tatsuo Ishii <ishii@postgresql.org> — 2011-07-14T06:48:24Z

    > Hackers,
    > 
    > This patch needs a new reviewer, per Cedric.  Please help!
    
    Hi I am the new reviewer:-)
    
    I have looked into the v6 patches. One thing I would like to suggest
    is, enhancing the error message when temp_file_limit will be exceeded.
    
    ERROR:  aborting due to exceeding temp file limit
    
    Is it possible to add the current temp file limit to the message? For
    example,
    
    ERROR:  aborting due to exceeding temp file limit 10000kB
    
    I know the current setting of temp_file_limit can be viewd in other
    ways, but I think this will make admin's or application developer's
    life a little bit easier.
    --
    Tatsuo Ishii
    SRA OSS, Inc. Japan
    English: http://www.sraoss.co.jp/index_en.php
    Japanese: http://www.sraoss.co.jp
    
    
  32. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-07-14T10:34:30Z

    On 14/07/11 18:48, Tatsuo Ishii wrote:
    >
    > Hi I am the new reviewer:-)
    >
    > I have looked into the v6 patches. One thing I would like to suggest
    > is, enhancing the error message when temp_file_limit will be exceeded.
    >
    > ERROR:  aborting due to exceeding temp file limit
    >
    > Is it possible to add the current temp file limit to the message? For
    > example,
    >
    > ERROR:  aborting due to exceeding temp file limit 10000kB
    >
    > I know the current setting of temp_file_limit can be viewd in other
    > ways, but I think this will make admin's or application developer's
    > life a little bit easier.
    
    Hi Tatsuo,
    
    Yeah, good suggestion - I agree that it would be useful to supply extra 
    detail, I'll amend and resubmit a new patch (along with whatever review 
    modifications we come up with in the meantime)!
    
    Cheers
    
    Mark
    
    
  33. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tatsuo Ishii <ishii@postgresql.org> — 2011-07-15T02:57:09Z

    >> I have looked into the v6 patches. One thing I would like to suggest
    >> is, enhancing the error message when temp_file_limit will be exceeded.
    >>
    >> ERROR:  aborting due to exceeding temp file limit
    >>
    >> Is it possible to add the current temp file limit to the message? For
    >> example,
    >>
    >> ERROR:  aborting due to exceeding temp file limit 10000kB
    >>
    >> I know the current setting of temp_file_limit can be viewd in other
    >> ways, but I think this will make admin's or application developer's
    >> life a little bit easier.
    > 
    > Hi Tatsuo,
    > 
    > Yeah, good suggestion - I agree that it would be useful to supply
    > extra detail, I'll amend and resubmit a new patch (along with whatever
    > review modifications we come up with in the meantime)!
    > 
    > Cheers
    > 
    > Mark
    
    Maybe we could add more info regarding current usage and requested
    amount in addition to the temp file limit value. I mean something
    like:
    
    ERROR:  aborting due to exceeding temp file limit. Current usage 9000kB, requested size 1024kB, thus it will exceed temp file limit 10000kB.
    
    What do you think?
    --
    Tatsuo Ishii
    SRA OSS, Inc. Japan
    English: http://www.sraoss.co.jp/index_en.php
    Japanese: http://www.sraoss.co.jp
    
    
  34. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-07-15T10:15:09Z

    On 15/07/11 14:57, Tatsuo Ishii wrote:
    >
    > Maybe we could add more info regarding current usage and requested
    > amount in addition to the temp file limit value. I mean something
    > like:
    >
    > ERROR:  aborting due to exceeding temp file limit. Current usage 9000kB, requested size 1024kB, thus it will exceed temp file limit 10000kB.
    >
    
    I modeled the original message on what happens when statement timeout is 
    exceeded, which doesn't state its limit in the error message at all - 
    actually I did wonder if there is was informal standard for *not* 
    stating the value of the limit that is being exceeded! However, I agree 
    with you and think it makes sense to include it here. I wonder if the 
    additional detail you are suggesting above might be better added to a 
    HINT - what do you think? If it is a better idea to just add it in the 
    message as above I can certainly do that.
    
    Cheers
    
    Mark
    
    
  35. Re: Re: patch review : Add ability to constrain backend temporary file space

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-07-15T10:55:01Z

    2011/7/15 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
    > On 15/07/11 14:57, Tatsuo Ishii wrote:
    >>
    >> Maybe we could add more info regarding current usage and requested
    >> amount in addition to the temp file limit value. I mean something
    >> like:
    >>
    >> ERROR:  aborting due to exceeding temp file limit. Current usage 9000kB,
    >> requested size 1024kB, thus it will exceed temp file limit 10000kB.
    >>
    >
    > I modeled the original message on what happens when statement timeout is
    > exceeded, which doesn't state its limit in the error message at all -
    > actually I did wonder if there is was informal standard for *not* stating
    > the value of the limit that is being exceeded! However, I agree with you and
    > think it makes sense to include it here. I wonder if the additional detail
    > you are suggesting above might be better added to a HINT - what do you
    > think? If it is a better idea to just add it in the message as above I can
    > certainly do that.
    
    Remember that what will happens is probably:
    
    ERROR:  aborting due to exceeding temp file limit. Current usage 8000kB,
    requested size 8008kB, thus it will exceed temp file limit 8kB.
    
    because temp file are increased by 8kb at once, rarely more (and by
    rare I mean that it can happens via an extension or in the future, not
    with current core postgresql).
    
    This kind of message can also trouble the DBA by suggesting that the
    query doesn't need more than what was asking at this moment.
    
    That's being said I have no strong opinion for the HINT message if the
    documentation is clear enough to not confuse DBA.
    
    -- 
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
    
    
  36. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-07-15T14:30:19Z

    =?ISO-8859-1?Q?C=E9dric_Villemain?= <cedric.villemain.debian@gmail.com> writes:
    >> On 15/07/11 14:57, Tatsuo Ishii wrote:
    >>> Maybe we could add more info regarding current usage and requested
    >>> amount in addition to the temp file limit value. I mean something
    >>> like:
    >>> 
    >>> ERROR: aborting due to exceeding temp file limit. Current usage 9000kB,
    >>> requested size 1024kB, thus it will exceed temp file limit 10000kB.
    
    > Remember that what will happens is probably:
    
    > ERROR:  aborting due to exceeding temp file limit. Current usage 8000kB,
    > requested size 8008kB, thus it will exceed temp file limit 8kB.
    
    > because temp file are increased by 8kb at once, rarely more
    
    Yes.  I think the extra detail Tatsuo proposes is useless and possibly
    confusing.  I don't object to stating the current limit, but the other
    numbers are not likely to be helpful to anyone.  (If we had some way of
    knowing what the file would ultimately grow to if we allowed the query
    to continue, that would be useful; but of course we don't know that.)
    
    			regards, tom lane
    
    
  37. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tatsuo Ishii <ishii@postgresql.org> — 2011-07-16T23:45:46Z

    >> I modeled the original message on what happens when statement timeout is
    >> exceeded, which doesn't state its limit in the error message at all -
    >> actually I did wonder if there is was informal standard for *not* stating
    >> the value of the limit that is being exceeded! However, I agree with you and
    >> think it makes sense to include it here. I wonder if the additional detail
    >> you are suggesting above might be better added to a HINT - what do you
    >> think? If it is a better idea to just add it in the message as above I can
    >> certainly do that.
    > 
    > Remember that what will happens is probably:
    > 
    > ERROR:  aborting due to exceeding temp file limit. Current usage 8000kB,
    > requested size 8008kB, thus it will exceed temp file limit 8kB.
    > 
    > because temp file are increased by 8kb at once, rarely more (and by
    > rare I mean that it can happens via an extension or in the future, not
    > with current core postgresql).
    
    Could you please elaborate why "Current usage 8000kB" can bigger than
    "temp file limit 8kB"? I undertstand the point that temp files are
    allocated by 8kB at once, but I don't understand why those numbers you
    suggested could happen. Actually I tried with the modified patches and
    got:
    
    test=# CREATE TEMPORARY TABLE resourcetemp1 AS SELECT generate_series(1,100000);
    SELECT 100000
    test=# SET temp_file_limit = 578;
    SET
    test=# SELECT count(*) FROM (select * FROM resourcetemp1  ORDER BY 1) AS a;
    ERROR:  aborting due to exceeding temp file limit, current usage 576kB, requested size 8192kB, thus it will exceed temp file limit 578kB
    
    Here temp_file_limit is not specified by 8kB unit, so "current usage"
    becomes 576kB, which is 8kB unit. I don't think those numbers
    will terribly confuse DBAs..
    --
    Tatsuo Ishii
    SRA OSS, Inc. Japan
    English: http://www.sraoss.co.jp/index_en.php
    Japanese: http://www.sraoss.co.jp
    
    
  38. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-07-17T03:00:30Z

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
    > This  version moves the check *before* we write the new buffer, so 
    > should take care of issues about really large write buffers, plugins 
    > etc.
    
    This logic seems pretty obviously wrong:
    
    +     if (temp_file_limit >= 0 && VfdCache[file].fdstate & FD_TEMPORARY)
    +     {
    +         if (VfdCache[file].seekPos + amount >= VfdCache[file].fileSize)
    +         {
    +             increaseSize = true;
    +             if ((temporary_files_size + (double)amount) / 1024.0 > (double)temp_file_limit)
    +                 ereport(ERROR,
    +                         (errcode(ERRCODE_QUERY_CANCELED),
    +                         errmsg("aborting due to exceeding temp file limit")));
    +         }
    +     }
    
    It's supposing that the write length ("amount") is exactly the amount by
    which the file length will grow, which would only be true if seekPos is
    exactly equal to fileSize before the write.  I think that would often be
    the case in our usage patterns, but surely code at this low level should
    not be assuming that.
    
    BTW, the definition of the GUC seems to constrain the total temp file
    size to be no more than 2GB on 32-bit machines ... do we really want
    it to act like that?
    
    			regards, tom lane
    
    
  39. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-07-17T03:07:51Z

    Tatsuo Ishii <ishii@postgresql.org> writes:
    >> Remember that what will happens is probably:
    >> 
    >> ERROR:  aborting due to exceeding temp file limit. Current usage 8000kB,
    >> requested size 8008kB, thus it will exceed temp file limit 8kB.
    
    > Could you please elaborate why "Current usage 8000kB" can bigger than
    > "temp file limit 8kB"? I undertstand the point that temp files are
    > allocated by 8kB at once, but I don't understand why those numbers you
    > suggested could happen. Actually I tried with the modified patches and
    > got:
    
    > test=# CREATE TEMPORARY TABLE resourcetemp1 AS SELECT generate_series(1,100000);
    > SELECT 100000
    > test=# SET temp_file_limit = 578;
    > SET
    > test=# SELECT count(*) FROM (select * FROM resourcetemp1  ORDER BY 1) AS a;
    > ERROR:  aborting due to exceeding temp file limit, current usage 576kB, requested size 8192kB, thus it will exceed temp file limit 578kB
    
    You didn't show us how you computed those numbers, but I'm really
    dubious that FileWrite() has got any ability to produce numbers that
    are helpful.  Like Cedric, I think the write amount in any one call
    is usually going to be one block.
    
    			regards, tom lane
    
    
  40. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-07-17T18:25:03Z

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
    > [ temp-files-v6.patch.gz ]
    
    I've applied this patch with some editorialization, notably:
    
    I changed the datatype of temporary_files_size to uint64 so as to avoid
    worries about accumulating roundoff error over a long transaction.
    This is probably just paranoia, since on most machines double can store
    integers exactly up to 2^52 which is more than the largest possible
    temp_file_limit, but it seemed safer.
    
    fileSize of a temp file, and temporary_files_size, must be updated
    correctly whether or not temp_file_limit is currently being enforced.
    Otherwise things do not behave sanely if temp_file_limit is turned on
    mid-transaction.
    
    Fixed bogus calculation in enforcement check, per my note yesterday.
    
    Added a new SQLSTATE code, ERRCODE_CONFIGURATION_LIMIT_EXCEEDED, because
    I felt that ERRCODE_QUERY_CANCELED wasn't at all appropriate for this.
    (Maybe we should think about changing the statement_timeout code to use
    that too, although I'm not entirely sure that we can easily tell
    statement_timeout apart from a query cancel.)
    
    Rephrased the error message to "temporary file size exceeds
    temp_file_limit (%dkB)", as per Tatsuo's first suggestion but not his
    second.  There's still room for bikeshedding here, of course.
    
    Removed the reset of temporary_files_size in CleanupTempFiles.  It was
    inappropriate because some temp files can survive CleanupTempFiles, and
    unnecessary anyway because the counter is correctly decremented when
    unlinking a temp file.  (This was another reason for wanting
    temporary_files_size to be exact, because we're now doing dead reckoning
    over the life of the session.)
    
    Set the maximum value of temp_file_limit to INT_MAX not MAX_KILOBYTES.
    There's no reason for the limit to be less on 32-bit machines than
    64-bit, since we are doing arithmetic in uint64 not size_t.
    
    Minor rewrite of documentation.
    
    Also, I didn't add the proposed regression test, because it seems shaky
    to me.  In an installation with larger than default work_mem, the sort
    might not spill to disk.  I don't think this feature is large enough to
    deserve its very own, rather expensive, regression test anyway.
    
    			regards, tom lane
    
    
  41. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tatsuo Ishii <ishii@postgresql.org> — 2011-07-17T20:42:39Z

    >> Could you please elaborate why "Current usage 8000kB" can bigger than
    >> "temp file limit 8kB"? I undertstand the point that temp files are
    >> allocated by 8kB at once, but I don't understand why those numbers you
    >> suggested could happen. Actually I tried with the modified patches and
    >> got:
    > 
    >> test=# CREATE TEMPORARY TABLE resourcetemp1 AS SELECT generate_series(1,100000);
    >> SELECT 100000
    >> test=# SET temp_file_limit = 578;
    >> SET
    >> test=# SELECT count(*) FROM (select * FROM resourcetemp1  ORDER BY 1) AS a;
    >> ERROR:  aborting due to exceeding temp file limit, current usage 576kB, requested size 8192kB, thus it will exceed temp file limit 578kB
    > 
    > You didn't show us how you computed those numbers, but I'm really
    > dubious that FileWrite() has got any ability to produce numbers that
    > are helpful.  Like Cedric, I think the write amount in any one call
    > is usually going to be one block.
    
    Here it is(fd.c).
    
    	/*
    	 * If performing this write will increase the file size, then abort if it will
    	 * exceed temp_file_limit
    	 */
    	if (temp_file_limit >= 0 && VfdCache[file].fdstate & FD_TEMPORARY)
    	{
    		if (VfdCache[file].seekPos + amount >= VfdCache[file].fileSize)
    		{
    			increaseSize = true;
    			if ((temporary_files_size + (double)amount) / 1024.0 > (double)temp_file_limit)
     				ereport(ERROR,
     						(errcode(ERRCODE_QUERY_CANCELED),
    						 errmsg("aborting due to exceeding temp file limit, current usage %dkB, requested size %dkB, thus it will exceed temp file limit %dkB",
    								(int)(temporary_files_size/1024),
    								amount,
    								temp_file_limit)));
    		}
    	}
    
    I also attached the whole patch against fd.c at the point when Mark
    proposed the changes.
    --
    Tatsuo Ishii
    SRA OSS, Inc. Japan
    English: http://www.sraoss.co.jp/index_en.php
    Japanese: http://www.sraoss.co.jp
    
  42. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-07-17T21:39:02Z

    Tatsuo Ishii <ishii@postgresql.org> writes:
    >> You didn't show us how you computed those numbers, but I'm really
    >> dubious that FileWrite() has got any ability to produce numbers that
    >> are helpful.  Like Cedric, I think the write amount in any one call
    >> is usually going to be one block.
    
    > Here it is(fd.c).
    
    >                  ereport(ERROR,
    >                          (errcode(ERRCODE_QUERY_CANCELED),
    >                          errmsg("aborting due to exceeding temp file limit, current usage %dkB, requested size %dkB, thus it will exceed temp file limit %dkB",
    >                                 (int)(temporary_files_size/1024),
    >                                 amount,
    >                                 temp_file_limit)));
    
    The thing is that unless "amount" is really large, you're just going to
    have two numbers that are very close to each other.  I think this is
    just useless complication, because "amount" is almost always going to
    be 8kB or less.
    
    			regards, tom lane
    
    
  43. Re: Re: patch review : Add ability to constrain backend temporary file space

    Tatsuo Ishii <ishii@postgresql.org> — 2011-07-17T21:53:29Z

    >>                  ereport(ERROR,
    >>                          (errcode(ERRCODE_QUERY_CANCELED),
    >>                          errmsg("aborting due to exceeding temp file limit, current usage %dkB, requested size %dkB, thus it will exceed temp file limit %dkB",
    >>                                 (int)(temporary_files_size/1024),
    >>                                 amount,
    >>                                 temp_file_limit)));
    > 
    > The thing is that unless "amount" is really large, you're just going to
    > have two numbers that are very close to each other.  I think this is
    > just useless complication, because "amount" is almost always going to
    > be 8kB or less.
    
    Oops. My bad. I should have divide "amount" by 1024. Now I understand
    your point.
    --
    Tatsuo Ishii
    SRA OSS, Inc. Japan
    English: http://www.sraoss.co.jp/index_en.php
    Japanese: http://www.sraoss.co.jp
    
    
  44. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-07-17T22:11:46Z

    On 18/07/11 06:25, Tom Lane wrote:
    > Mark Kirkwood<mark.kirkwood@catalyst.net.nz>  writes:
    >> [ temp-files-v6.patch.gz ]
    > I've applied this patch with some editorialization, notably:
    
    Awesome, the changes look great!
    
    Cheers
    
    Mark
    
    
  45. Re: Re: patch review : Add ability to constrain backend temporary file space

    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> — 2011-07-20T09:21:48Z

    And I thought I should mention: a big thank you to the the reviewers and 
    interested parties, Cedric, Tatsuo, Robert and Tom who did a review + 
    fixes as he committed the patch - nice work guys!
    
    regards
    
    Mark