Thread

  1. UPDATE modifies more rows that it should

    hubert depesz lubaczewski <depesz@depesz.com> — 2024-02-16T09:14:46Z

    Hi,
    I have a case where update modifies more rows than it should.
    Unfortunately, I can't seem to make smaller example. So let me show you
    what I can.
    
    First, tested it on Pg 16.2, and 17devel (on Linux), and the person that reported
    it to me was using (most likely) pg 15.5 on darwin.
    
    The problem is that the update:
    
    #v+
    UPDATE public.enrollments AS X
    SET
        id = x.id
    WHERE
        x.type = 'ObserverEnrollment' AND
        x.id IN (
            SELECT
                e.id
            FROM
                public.enrollments AS e
                JOIN public.users AS u ON u.id = e.associated_user_id
            WHERE
                e.type = 'ObserverEnrollment' AND
                u.workflow_state = 'deleted' AND
                e.role_id = 20097 AND
                e.course_section_id = 1159 AND
                e.user_id = 2463
            ORDER BY
                e.id ASC
            LIMIT 1
            FOR UPDATE OF e
        )
    RETURNING *;
    #v-
    
    updates two rows, despite the fact that it should only one.
    
    I verified that indexes are not broken.
    
    Subselect returns:
    #v+
      id
    ──────
     1799
    (1 row)
    #v-
    
    but update with returning id shows:
    #v+
      id
    ──────
     1799
     1800
    (2 rows)
    #v-
    
    Relevant data about the rows:
    
    #v+
    $ select id, associated_user_id, type, role_id, course_section_id, user_id from enrollments where id in (1799, 1800) or type = 'ObserverEnrollment';
      id  │ associated_user_id │        type        │ role_id │ course_section_id │ user_id
    ──────┼────────────────────┼────────────────────┼─────────┼───────────────────┼─────────
     1798 │               2464 │ ObserverEnrollment │   20097 │              1159 │    2463
     1799 │               2465 │ ObserverEnrollment │   20097 │              1159 │    2463
     1800 │               2466 │ ObserverEnrollment │   20097 │              1159 │    2463
    (3 rows)
    #v-
    
    Relevant data about users:
    
    #v+
    $ select id, workflow_state from users where id in (2464, 2465, 2466);
      id  │ workflow_state
    ──────┼────────────────
     2464 │ pre_registered
     2465 │ deleted
     2466 │ deleted
    (3 rows)
    #v-
    
    Tables are MUCH wider, and have many indexes, and foreign key, will show what I can at the end.
    
    Anyway - explain analyze for just the subselect shows:
    
    #v+
                                                                            QUERY PLAN
    ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     Limit  (cost=2.17..2.18 rows=1 width=20) (actual time=0.030..0.031 rows=1 loops=1)
       ->  LockRows  (cost=2.17..2.18 rows=1 width=20) (actual time=0.029..0.030 rows=1 loops=1)
             ->  Sort  (cost=2.17..2.17 rows=1 width=20) (actual time=0.023..0.024 rows=1 loops=1)
                   Sort Key: e.id
                   Sort Method: quicksort  Memory: 25kB
                   ->  Nested Loop  (cost=0.00..2.16 rows=1 width=20) (actual time=0.017..0.019 rows=2 loops=1)
                         Join Filter: (u.id = e.associated_user_id)
                         Rows Removed by Join Filter: 3
                         ->  Seq Scan on enrollments e  (cost=0.00..1.08 rows=1 width=22) (actual time=0.010..0.011 rows=3 loops=1)
                               Filter: (((type)::text = 'ObserverEnrollment'::text) AND (role_id = 20097) AND (course_section_id = 1159) AND (user_id = 2463))
                               Rows Removed by Filter: 1
                         ->  Seq Scan on users u  (cost=0.00..1.06 rows=1 width=14) (actual time=0.001..0.002 rows=2 loops=3)
                               Filter: ((workflow_state)::text = 'deleted'::text)
                               Rows Removed by Filter: 3
     Planning Time: 0.235 ms
     Execution Time: 0.053 ms
    (16 rows)
    #v-
    
    Which is OK, and, as I shown earlier returns just enrollments id 1799.
    
    But when I put it with update, explain becomes:
    
    #v+
                                                                                  QUERY PLAN
    ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     Update on enrollments x  (cost=2.17..3.25 rows=1 width=46) (actual time=0.056..0.063 rows=2 loops=1)
       ->  Nested Loop Semi Join  (cost=2.17..3.25 rows=1 width=46) (actual time=0.035..0.038 rows=2 loops=1)
             Join Filter: ("ANY_subquery".id = x.id)
             Rows Removed by Join Filter: 2
             ->  Seq Scan on enrollments x  (cost=0.00..1.05 rows=1 width=14) (actual time=0.009..0.010 rows=3 loops=1)
                   Filter: ((type)::text = 'ObserverEnrollment'::text)
                   Rows Removed by Filter: 1
             ->  Subquery Scan on "ANY_subquery"  (cost=2.17..2.19 rows=1 width=40) (actual time=0.008..0.009 rows=1 loops=3)
                   ->  LockRows  (cost=2.17..2.18 rows=1 width=20) (actual time=0.007..0.008 rows=1 loops=3)
                         ->  Sort  (cost=2.17..2.17 rows=1 width=20) (actual time=0.005..0.005 rows=2 loops=3)
                               Sort Key: e.id
                               Sort Method: quicksort  Memory: 25kB
                               ->  Nested Loop  (cost=0.00..2.16 rows=1 width=20) (actual time=0.007..0.009 rows=2 loops=1)
                                     Join Filter: (u.id = e.associated_user_id)
                                     Rows Removed by Join Filter: 3
                                     ->  Seq Scan on enrollments e  (cost=0.00..1.08 rows=1 width=22) (actual time=0.002..0.003 rows=3 loops=1)
                                           Filter: (((type)::text = 'ObserverEnrollment'::text) AND (role_id = 20097) AND (course_section_id = 1159) AND (user_id = 2463))
                                           Rows Removed by Filter: 1
                                     ->  Seq Scan on users u  (cost=0.00..1.06 rows=1 width=14) (actual time=0.001..0.001 rows=2 loops=3)
                                           Filter: ((workflow_state)::text = 'deleted'::text)
                                           Rows Removed by Filter: 3
     Planning Time: 0.270 ms
     Execution Time: 0.096 ms
    (23 rows)
    #v-
    
    Tested, that if I'd change `... IN ( select ... )` to `... = ( select ... )` -
    then it behaves as expected, but I think that updating two rows in this case is
    a bug.
    
    Also, removing "for update" removed the 2nd row update.
    
    Tried also update enrollments set .. where id in (1799), but it updated
    correctly one row only.
    
    Hope someone can figure out what the problem might be. Of course if you'd need
    more data, I can provide it.
    
    Best regards,
    
    depesz
    
    And the promised fuller \d of the tables - i changed some words, but kept \d as
    untouched as possible.
    
                                                     Table "public.users"
                Column            │            Type             │ Collation │ Nullable │              Default              
    ──────────────────────────────┼─────────────────────────────┼───────────┼──────────┼───────────────────────────────────
     id                           │ bigint                      │           │ not null │ nextval('users_id_seq'::regclass)
     name                         │ character varying(255)      │           │          │ 
     sortable_name                │ character varying(255)      │           │          │ 
     workflow_state               │ character varying(255)      │           │ not null │ 
     time_zone                    │ character varying(255)      │           │          │ 
     uuid                         │ character varying(255)      │           │          │ 
     created_at                   │ timestamp without time zone │           │ not null │ 
     updated_at                   │ timestamp without time zone │           │ not null │ 
     avatar_image_url             │ character varying(255)      │           │          │ 
     avatar_image_source          │ character varying(255)      │           │          │ 
     avatar_image_updated_at      │ timestamp without time zone │           │          │ 
     phone                        │ character varying(255)      │           │          │ 
     column_name                  │ character varying(255)      │           │          │ 
     column_position              │ character varying(255)      │           │          │ 
     short_name                   │ character varying(255)      │           │          │ 
     deleted_at                   │ timestamp without time zone │           │          │ 
     show_user_services           │ boolean                     │           │          │ true
     page_views_count             │ integer                     │           │          │ 0
     reminder_time_for_due_dates  │ integer                     │           │          │ 172800
     reminder_time_for_tantrum    │ integer                     │           │          │ 0
     storage_quota                │ bigint                      │           │          │ 
     visible_inbox_types          │ character varying(255)      │           │          │ 
     last_user_note               │ timestamp without time zone │           │          │ 
     subscribe_to_emails          │ boolean                     │           │          │ 
     features_used                │ text                        │           │          │ 
     preferences                  │ text                        │           │          │ 
     avatar_state                 │ character varying(255)      │           │          │ 
     locale                       │ character varying(255)      │           │          │ 
     browser_locale               │ character varying(255)      │           │          │ 
     unread_conversations_count   │ integer                     │           │          │ 0
     stuck_abc_fields             │ text                        │           │          │ 
     public                       │ boolean                     │           │          │ 
     aha_lethal_key_enc           │ character varying(255)      │           │          │ 
     aha_lethal_key_salt          │ character varying(255)      │           │          │ 
     aha_communication_channel_id │ bigint                      │           │          │ 
     initial_enrollment_type      │ character varying(255)      │           │          │ 
     whatever_id                  │ integer                     │           │          │ 
     last_logged_out              │ timestamp without time zone │           │          │ 
     def_context_id               │ character varying(255)      │           │          │ 
     globules_id                  │ bigint                      │           │          │ 
     def_id                       │ text                        │           │          │ 
     pronouns                     │ character varying           │           │          │ 
     root_account_ids             │ bigint[]                    │           │ not null │ '{}'::bigint[]
     delete_me_frd                │ boolean                     │           │ not null │ false
     merged_into_user_id          │ bigint                      │           │          │ 
    Indexes:
        "users_pkey" PRIMARY KEY, btree (id)
        "index_active_users_on_id" btree (id) WHERE workflow_state::text <> 'deleted'::text
        "index_users_on_avatar_state_and_avatar_image_updated_at" btree (avatar_state, avatar_image_updated_at)
        "index_users_on_delete_me_frd" btree (delete_me_frd) WHERE delete_me_frd = true AND workflow_state::text = 'deleted'::text
        "index_users_on_def_context_id" UNIQUE, btree (def_context_id)
        "index_users_on_merged_into_user_id" btree (merged_into_user_id) WHERE merged_into_user_id IS NOT NULL
        "index_users_on_sortable_name" btree ((lower(replace(sortable_name::text, '\'::text, '\\'::text))::bytea), id)
        "index_users_on_globules_id" UNIQUE, btree (globules_id) WHERE globules_id IS NOT NULL
        "index_users_on_unique_def_id" UNIQUE, btree (def_id)
        "index_users_on_unique_uuid" UNIQUE, btree (uuid)
        "index_users_on_workflow_state" btree (workflow_state)
        "index_users_replica_identity" UNIQUE, btree (root_account_ids, id) REPLICA IDENTITY
    Foreign-key constraints:
        "fk_rails_a47cd9d666" FOREIGN KEY (merged_into_user_id) REFERENCES users(id)
    Referenced by:
    (183 tables referencing this one)
    
                                                         Table "public.enrollments"
                   Column                │            Type             │ Collation │ Nullable │                 Default                 
    ─────────────────────────────────────┼─────────────────────────────┼───────────┼──────────┼─────────────────────────────────────────
     id                                  │ bigint                      │           │ not null │ nextval('enrollments_id_seq'::regclass)
     user_id                             │ bigint                      │           │ not null │ 
     course_id                           │ bigint                      │           │ not null │ 
     type                                │ character varying(255)      │           │ not null │ 
     uuid                                │ character varying(255)      │           │          │ 
     workflow_state                      │ character varying(255)      │           │ not null │ 
     created_at                          │ timestamp without time zone │           │ not null │ 
     updated_at                          │ timestamp without time zone │           │ not null │ 
     associated_user_id                  │ bigint                      │           │          │ 
     abc_batch_id                        │ bigint                      │           │          │ 
     start_at                            │ timestamp without time zone │           │          │ 
     end_at                              │ timestamp without time zone │           │          │ 
     course_section_id                   │ bigint                      │           │ not null │ 
     root_account_id                     │ bigint                      │           │ not null │ 
     completed_at                        │ timestamp without time zone │           │          │ 
     self_enrolled                       │ boolean                     │           │          │ 
     kiddo_publishing_status             │ character varying(255)      │           │          │ 'unpublished'::character varying
     last_publish_attempt_at             │ timestamp without time zone │           │          │ 
     stuck_abc_fields                    │ text                        │           │          │ 
     kiddo_publishing_message            │ text                        │           │          │ 
     limit_privileges_to_course_section  │ boolean                     │           │ not null │ false
     last_activity_at                    │ timestamp without time zone │           │          │ 
     total_activity_time                 │ integer                     │           │          │ 
     role_id                             │ bigint                      │           │ not null │ 
     kiddod_at                           │ timestamp without time zone │           │          │ 
     abc_pseudonym_id                    │ bigint                      │           │          │ 
     last_attended_at                    │ timestamp without time zone │           │          │ 
     temporary_enrollment_source_user_id │ bigint                      │           │          │ 
     temporary_enrollment_pairing_id     │ bigint                      │           │          │ 
    Indexes:
        "enrollments_pkey" PRIMARY KEY, btree (id)
        "index_enrollments_on_associated_user_id" btree (associated_user_id) WHERE associated_user_id IS NOT NULL
        "index_enrollments_on_course_id_and_id" btree (course_id, id)
        "index_enrollments_on_course_id_and_user_id" btree (course_id, user_id)
        "index_enrollments_on_course_id_and_workflow_state" btree (course_id, workflow_state)
        "index_enrollments_on_course_section_id_and_id" btree (course_section_id, id)
        "index_enrollments_on_course_when_active" btree (course_id) WHERE workflow_state::text = 'active'::text
        "index_enrollments_on_role_id_and_user_id" btree (role_id, user_id)
        "index_enrollments_on_root_account_id_and_course_id" btree (root_account_id, course_id)
        "index_enrollments_on_abc_batch_id" btree (abc_batch_id) WHERE abc_batch_id IS NOT NULL
        "index_enrollments_on_abc_pseudonym_id" btree (abc_pseudonym_id)
        "index_enrollments_on_temp_enrollment_user_type_role_section" UNIQUE, btree (temporary_enrollment_source_user_id, user_id, type, role_id, course_section_id) WHERE temporary_enrollment_source_user_id IS NOT NULL
        "index_enrollments_on_temporary_enrollment_pairing_id" btree (temporary_enrollment_pairing_id) WHERE temporary_enrollment_pairing_id IS NOT NULL
        "index_enrollments_on_user_id" btree (user_id)
        "index_enrollments_on_user_type_role_section" UNIQUE, btree (user_id, type, role_id, course_section_id) WHERE associated_user_id IS NULL
        "index_enrollments_on_user_type_role_section_associated_user" UNIQUE, btree (user_id, type, role_id, course_section_id, associated_user_id) WHERE associated_user_id IS NOT NULL
        "index_enrollments_on_uuid" btree (uuid)
        "index_enrollments_on_workflow_state" btree (workflow_state)
        "index_enrollments_replica_identity" UNIQUE, btree (root_account_id, id) REPLICA IDENTITY
    Foreign-key constraints:
        "fk_rails_6359366b63" FOREIGN KEY (associated_user_id) REFERENCES users(id)
    (8 other tables)
    Referenced by:
    (2 tables)
    
    
    
    
    
    
  2. Re: UPDATE modifies more rows that it should

    David Rowley <dgrowleyml@gmail.com> — 2024-02-16T11:16:33Z

    On Fri, 16 Feb 2024 at 22:15, hubert depesz lubaczewski
    <depesz@depesz.com> wrote:
    > I have a case where update modifies more rows than it should.
    > Unfortunately, I can't seem to make smaller example. So let me show you
    > what I can.
    
    Here's a smaller example.
    
    drop table if exists t1;
    create table t1 (a int primary key, b int);
    insert into t1 values(1,0),(2,0),(3,0);
    set enable_hashagg=0;
    set enable_hashjoin=0;
    set enable_mergejoin=0;
    set enable_material=0;
    set enable_sort=0;
    
    update t1 set b = b+1 where a in(select a from t1 order by a for
    update of t1 limit 1);
    -- UPDATE 3
    
    -- same again
    update t1 set b = b+1 where a in(select a from t1 order by a for
    update of t1 limit 1);
    -- UPDATE 3
    
    -- see plan
    explain analyze update t1 set b = b+1 where a in(select a from t1
    order by a for update of t1 limit 1);
    
    -- this time without FOR UPDATE
    update t1 set b = b+1 where a in(select a from t1 order by a limit 1);
    -- update 1 (!)
    
    -- with FOR UPDATE again
    update t1 set b = b+1 where a in(select a from t1 order by a for
    update of t1 limit 1);
    -- update 1 (!)
    
    cluster t1 using t1_pkey;
    update t1 set b = b+1 where a in(select a from t1 order by a for
    update of t1 limit 1);
    -- update 3 (!)
    
    It seems to be caused by the FOR UPDATE.  The first time through the
    Nested Loop finds the a=1 row, but on subsequent looks, the a=1 row is
    locked by the FOR UPDATE and hits the TM_SelfModified case in
    nodeLockRows.c which causes the goto lnext to trigger.
    
    There is a comment there that offers an explanation of why we skip:
    
    /*
    * The target tuple was already updated or deleted by the
    * current command, or by a later command in the current
    * transaction.  We *must* ignore the tuple in the former
    * case, so as to avoid the "Halloween problem" of repeated
    * update attempts.  In the latter case it might be sensible
    * to fetch the updated tuple instead, but doing so would
    * require changing heap_update and heap_delete to not
    * complain about updating "invisible" tuples, which seems
    * pretty scary (table_tuple_lock will not complain, but few
    * callers expect TM_Invisible, and we're not one of them). So
    * for now, treat the tuple as deleted and do not process.
    */
    
    but I do admit it is quite strange that the results are so dependent
    on the chosen query plan.
    
    I'd recommend not using FOR UPDATE.  It might be useful to use that if
    you were executing multiple statements in a transaction to ensure they
    remain locked between a SELECT and an UPDATE.  I don't see the need in
    a single statement.
    
    I can't offer you an explanation of *if* this should happen, but I do
    agree that it seems strange.
    
    David
    
    
    
    
  3. Re: UPDATE modifies more rows that it should

    David Rowley <dgrowleyml@gmail.com> — 2024-02-16T11:25:28Z

    On Sat, 17 Feb 2024 at 00:16, David Rowley <dgrowleyml@gmail.com> wrote:
    > It seems to be caused by the FOR UPDATE.  The first time through the
    > Nested Loop finds the a=1 row, but on subsequent looks, the a=1 row is
    > locked by the FOR UPDATE and hits the TM_SelfModified case in
    > nodeLockRows.c which causes the goto lnext to trigger.
    
    This probably needs more explanation than I've given...  Because lock
    rows does goto lnext on the 2nd execution after finding a=1 is
    TM_SelfModified, we then ExecProcNode and the index gives us the a=2
    row.  This row happens to match the in the Seq scan, so the Nest Loop
    condition passes.  On the 3rd execution, the Lock rows skips a=1 and
    a=2 and gets a=3 from the index, which again matches the current row
    in the Seq Scan.  That's why the CLUSTER order matters.   You'll
    notice I had to CLUSTER the table again to get it to "UPDATE 3" after
    I ran the UPDATE without the FOR UPDATE in the subquery.
    
    David