Thread

  1. Re: Disabling Heap-Only Tuples

    Thom Brown <thom@linux.com> — 2023-07-07T11:21:03Z

    On Thu, 6 Jul 2023 at 21:18, Matthias van de Meent
    <boekewurm+postgres@gmail.com> wrote:
    >
    > On Wed, 5 Jul 2023 at 19:55, Thom Brown <thom@linux.com> wrote:
    > >
    > > On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
    > > <boekewurm+postgres@gmail.com> wrote:
    > > > So what were you thinking of? A session GUC? A table option?
    > >
    > > Both.
    >
    > Here's a small patch implementing a new table option max_local_update
    > (name very much bikesheddable). Value is -1 (default, disabled) or the
    > size of the table in MiB that you still want to allow to update on the
    > same page. I didn't yet go for a GUC as I think that has too little
    > control on the impact on the system.
    >
    > I decided that max_local_update would be in MB because there is no
    > reloption value that can contain MaxBlockNumber and -1/disabled; and 1
    > MiB seems like enough granularity for essentially all use cases.
    >
    > The added regression tests show how this feature works, that the new
    > feature works, and validate that lock levels are acceptable
    > (ShareUpdateExclusiveLock, same as for updating fillfactor).
    
    Wow, thanks for working on this.
    
    I've given it a test, and it does what I would expect it to do.
    
    I'm aware of the concerns about the potential for the relocation to
    land in an undesirable location, so perhaps that needs addressing.
    But this is already considerably better than the current need to
    update a row until it gets pushed off its current page.  Ideally there
    would be tooling built around this where the user wouldn't need to
    figure out how much of the table to UPDATE, or deal with VACUUMing
    concerns.
    
    But here's my quick test:
    
    CREATE OR REPLACE FUNCTION compact_table(table_name IN TEXT)
    RETURNS VOID AS $$
    DECLARE
        current_row RECORD;
        old_ctid TID;
        new_ctid TID;
        keys TEXT;
        update_query TEXT;
        row_counter INTEGER := 0;
    BEGIN
        SELECT string_agg(a.attname || ' = ' || a.attname, ', ')
        INTO keys
        FROM
            pg_index i
        JOIN
            pg_attribute a ON a.attnum = ANY(i.indkey)
        WHERE
            i.indrelid = table_name::regclass
            AND a.attrelid = table_name::regclass
            AND i.indisprimary;
    
        IF keys IS NULL THEN
            RAISE EXCEPTION 'Table % does not have a primary key.', table_name;
        END IF;
    
        FOR current_row IN
            EXECUTE FORMAT('SELECT ctid, * FROM %I ORDER BY ctid DESC', table_name)
        LOOP
            old_ctid := current_row.ctid;
    
            update_query := FORMAT('UPDATE %I SET %s WHERE ctid = $1
    RETURNING ctid', table_name, keys);
            EXECUTE update_query USING old_ctid INTO new_ctid;
    
            row_counter := row_counter + 1;
    
            IF row_counter % 1000 = 0 THEN
                RAISE NOTICE '% rows relocated.', row_counter;
            END IF;
    
            IF new_ctid <= old_ctid THEN
                CONTINUE;
            ELSE
                RAISE NOTICE 'All non-contiguous rows relocated.';
                EXIT;
            END IF;
        END LOOP;
    END; $$
    LANGUAGE plpgsql;
    
    
    postgres=# CREATE TABLE bigtable (id int, content text);
    CREATE TABLE
    postgres=# INSERT INTO bigtable SELECT x, 'This is just a way to fill
    up space.' FROM generate_series(1,10000000) a(x);
    INSERT 0 10000000
    postgres=# DELETE FROM bigtable WHERE id % 7 = 0;
    DELETE 1428571
    postgres=# VACUUM bigtable;
    VACUUM
    postgres=# ALTER TABLE bigtable SET (max_local_update = 0);
    ALTER TABLE
    postgres=# ALTER TABLE bigtable ADD PRIMARY KEY (id);
    ALTER TABLE
    postgres=# \dt+ bigtable
                                       List of relations
     Schema |   Name   | Type  | Owner | Persistence | Access method |
    Size  | Description
    --------+----------+-------+-------+-------------+---------------+--------+-------------
     public | bigtable | table | thom  | permanent   | heap          | 730 MB |
    (1 row)
    
    postgres=# SELECT * FROM pgstattuple('bigtable');
     table_len | tuple_count | tuple_len | tuple_percent |
    dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space |
    free_percent
    -----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
     765607936 |     8571429 | 557142885 |         72.77 |
    0 |              0 |                  0 |  105901628 |        13.83
    (1 row)
    
    postgres=# SELECT compact_table('bigtable');
    NOTICE:  1000 rows relocated.
    NOTICE:  2000 rows relocated.
    NOTICE:  3000 rows relocated.
    NOTICE:  4000 rows relocated.
    ...
    NOTICE:  1221000 rows relocated.
    NOTICE:  1222000 rows relocated.
    NOTICE:  1223000 rows relocated.
    NOTICE:  1224000 rows relocated.
    NOTICE:  All non-contiguous rows relocated.
     compact_table
    ---------------
    
    (1 row)
    
    postgres=# VACUUM bigtable;
    VACUUM
    postgres=# \dt+ bigtable;
                                       List of relations
     Schema |   Name   | Type  | Owner | Persistence | Access method |
    Size  | Description
    --------+----------+-------+-------+-------------+---------------+--------+-------------
     public | bigtable | table | thom  | permanent   | heap          | 626 MB |
    (1 row)
    
    postgres=# SELECT * FROM pgstattuple('bigtable');
     table_len | tuple_count | tuple_len | tuple_percent |
    dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space |
    free_percent
    -----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
     656236544 |     8571429 | 557142885 |          84.9 |
    0 |              0 |                  0 |    2564888 |         0.39
    (1 row)
    
    Works for me.
    
    Thom