Discussion: BUG #16967: Extremely slow update statement in trigger


BUG #16967: Extremely slow update statement in trigger

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16967
Logged by:          Nicolas Burri
Email address:      postgres@burri.li
PostgreSQL version: 13.2
Operating system:   Linux & Windows
Description:

Hi,
During a data migration our production system spent several hours in a
simple update statement of a table with very few columns and about 200k
records. We managed to reproduce the problem and a minimal setup for
reproduction can be found here:
https://www.dropbox.com/s/mxjavpl43s48hdg/bug_report.sql. The bug seems to
be related to query plan caching and can be reproduced on Postgres 11.11
and 13.2 (I have not run tests with other versions).

Description:
We use a statement level trigger to log changes to records in a certain
table. On every update, the trigger calls a function comparing “old table”
and “new table”. For every record changed by the statement, the function
inserts a new record in a “log” table. In the following scenario, execution
times explode:
First, execute an update statement that affects no rows. This query is fast
and completes within milliseconds. Then, execute a second update statement
that affects a lot of records. At 200k records, this query runs for more
than 4h on my workstation. If we call “discard plans” before executing the
second update statement, or if we do not execute the first statement at all,
the update of all rows completes within about 1 second.
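
In short, the sequence looks like this (condensed from the linked script):

    -- fast: matches no rows, but presumably caches a plan for the trigger's query
    update demo set type = 'black' where type = 'red';
    -- extremely slow afterwards; fast on its own or right after "discard plans"
    update demo set type = 'black' where type = 'blue';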

Thanks and best regards
Nicolas


Re: BUG #16967: Extremely slow update statement in trigger

From: David Fetter
Date:
On Fri, Apr 16, 2021 at 04:52:15PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      16967
> Logged by:          Nicolas Burri
> Email address:      postgres@burri.li
> PostgreSQL version: 13.2
> Operating system:   Linux & Windows
> Description:        
> 
> Hi,
> During a data migration our production system spent several hours in a
> simple update statement of a table with very few columns and about 200k
> records. We managed to reproduce the problem and a minimal setup for
> reproduction can be found here:
> https://www.dropbox.com/s/mxjavpl43s48hdg/bug_report.sql. The bug seems to
> be related to query plan caching and can be reproduced on Postgres 11.11
> and 13.2 (I have not run tests with other versions).
> 
> Description:
> We use a statement level trigger to log changes to records in a certain
> table. On every update, the trigger calls a function comparing “old table”
> and “new table”. For every record changed by the statement, the function
> inserts a new record in a “log” table. In the following scenario, execution
> times explode:
> First, execute an update statement that affects no rows. This query is fast
> and completes within milliseconds. Then, execute a second update statement
> that affects a lot of records. At 200k records, this query runs for more
> than 4h on my workstation. If we call “discard plans” before executing the
> second update statement, or if we do not execute the first statement at all,
> the update of all rows completes within about 1 second.
> 
> Thanks and best regards
> Nicolas
> 

I'm including what I found on dropbox below. In future, please include
the entire content in emails so it stays independent of sites that
might not still be there when people look back.

I've managed to reproduce the extremely long execution, which I
haven't yet tried to wait out.

I did find something interesting. Here's the EXPLAIN for the update
that's intended to change everything in the table. Note that the
estimated row count is 1:

    shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # explain update demo set type = 'black' where type='blue';
                              QUERY PLAN                          
    ══════════════════════════════════════════════════════════════
     Update on demo  (cost=0.00..3976.35 rows=0 width=0)
       ->  Seq Scan on demo  (cost=0.00..3976.35 rows=1 width=34)
             Filter: ((type)::text = 'blue'::text)
    (3 rows)

    Time: 0.707 ms

Next, I run ANALYZE on that table to collect statistics, as people
generally do just after a bulk load to ensure that the planner is
acting on reasonably accurate statistics.

    shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # analyze demo;
    ANALYZE
    Time: 35.511 ms

...and run the EXPLAIN again:

    shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # explain update demo set type = 'black' where type='blue';
                                QUERY PLAN                             
    ═══════════════════════════════════════════════════════════════════
     Update on demo  (cost=0.00..3582.00 rows=0 width=0)
       ->  Seq Scan on demo  (cost=0.00..3582.00 rows=200000 width=34)
             Filter: ((type)::text = 'blue'::text)
    (3 rows)

    Time: 0.772 ms

Now the row estimate is 200,000, which is to say exactly what's in there.
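
As a side check, pg_stat_user_tables records when a table was last
analyzed, so it's a quick way to confirm the statistics are really there:

    select relname, last_analyze, last_autoanalyze
    from pg_stat_user_tables
    where relname = 'demo';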

Now, I run the whole-table UPDATE, and:

    shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # update demo set type = 'black' where type='blue';
    UPDATE 200000
    Time: 2208.754 ms (00:02.209)

Perhaps my machine is a little slower than the one the original test
was run on, but updating a little under a hundred thousand rows per
second (200,000 rows in 2.21s), complete with an INSERT of another
200,000 rows into the "log" table in the same statement, doesn't seem
terrible.


tl;dr: running ANALYZE after a bulk load is a good idea, and I don't
see a bug here.

Original self-contained repro:

-----8<-------------------------------------------------------------------------
-- on this table we will execute a simple update statement
create table demo (
    id int not null primary key,
    type varchar(5)
);

-- this is our log table; one record is written to it for every updated record in the "demo" table.
create table log (
    data_after text,
    data_before text
);

-- This function is executed in an update trigger (defined below) attached to the "demo" table. The function joins
-- "before" and "after" data provided by a statement-level trigger.
-- For all records containing a changed value, both "before" and "after" data are written to the "log" table.
CREATE OR REPLACE FUNCTION log_function()
    RETURNS TRIGGER
AS
$$
BEGIN
    INSERT INTO log
    SELECT new_data.row AS data_after,
           old_data.row AS data_before
    -- pair up the "new" and "old" transition-table rows by position
    FROM (SELECT row_number() OVER () AS id,
                 ROW (new_table.*)
          FROM new_table) new_data
         JOIN (SELECT row_number() OVER () AS id,
                      ROW (old_table.*)
               FROM old_table) old_data
             ON (new_data.id = old_data.id)
    -- log only the rows whose contents actually changed
    WHERE new_data.row IS DISTINCT FROM old_data.row;
    RETURN NULL;
END;
$$
    LANGUAGE plpgsql;

-- This attaches an update trigger to the "demo" table. On "update", log_function() is called.
CREATE TRIGGER log_trigger AFTER UPDATE ON demo
    REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
    FOR EACH STATEMENT EXECUTE PROCEDURE log_function();

truncate demo; -- just to make sure all tables are empty
truncate log; -- just to make sure all tables are empty

-- Generate 200k sample records in the "demo" table. The log table is still empty after the execution.
insert into demo (id, type) select i, 'blue' from generate_series(1,200000) s(i);

-- No rows are updated and the query executes in 16ms. After the execution of this query, "log" is still empty,
-- as no records have changed.
update demo set type = 'black' where type='red';

-- If run after the query above, this query does not stop within a reasonable amount of time (hours). However, if
-- the first update statement (the one filtering on type='red') is not executed beforehand, execution of this query
-- takes less than 1.2s. It updates all 200k records in the demo table and creates 200k new records in the "log" table.
update demo set type = 'black' where type='blue';

-- Additional information: if we call "discard plans" between the execution of the two update statements above,
-- the second one also completes in about 1.2s:
truncate demo; -- just to make sure all tables are empty
truncate log; -- just to make sure all tables are empty
-- Generate 200k sample records in the "demo" table. The log table is still empty after the execution.
insert into demo (id, type) select i, 'blue' from generate_series(1,200000) s(i);
-- No rows are updated and the query executes in 16ms. After the execution of this query, "log" is still empty,
-- as no records have changed.
update demo set type = 'black' where type='red';
discard plans;
update demo set type = 'black' where type='blue';

-- Just for checking results. Not part of the reproduction.
select * from demo;
select * from log;


-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: BUG #16967: Extremely slow update statement in trigger

From: Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Fri, Apr 16, 2021 at 04:52:15PM +0000, PG Bug reporting form wrote:
>> Bug reference:      16967
>> Logged by:          Nicolas Burri
>> Email address:      postgres@burri.li
>> PostgreSQL version: 13.2
>> Operating system:   Linux & Windows
>> ...

> I'm including what I found on dropbox below. In future, please include
> the entire content in emails so it stays independent of sites that
> might not still be there when people look back.

Thanks for putting that into the archives.

> I did find something interesting. Here's the EXPLAIN for the update
> that's intended to change everything in the table. Note that the
> estimated row count is 1:
>     shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # explain update demo set type = 'black' where type='blue';
>                               QUERY PLAN
>     ══════════════════════════════════════════════════════════════
>      Update on demo  (cost=0.00..3976.35 rows=0 width=0)
>        ->  Seq Scan on demo  (cost=0.00..3976.35 rows=1 width=34)
>              Filter: ((type)::text = 'blue'::text)
>     (3 rows)

Yeah.  Of course, the plan for the UPDATE itself looks just the
same post-ANALYZE.  What we can infer is that the complex query
within the trigger function got a really stupid plan when there
were no stats, and then a better plan once there were some.
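
For anyone who wants to see that inner plan directly, auto_explain
with nested-statement logging will print it to the server log, along
the lines of:

    load 'auto_explain';
    set auto_explain.log_min_duration = 0;
    set auto_explain.log_nested_statements = on;
    -- re-run the UPDATE and check the log

That's purely a diagnostic aid, of course.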

> tl;dr: running ANALYZE after a bulk load is a good idea, and I don't
> see a bug here.

Agreed, there's little evidence here of anything but the known
issue that auto-ANALYZE doesn't act instantaneously.  If we had
an easy way to make that better, we would, but ...
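
The usual mitigations are an explicit ANALYZE right after the bulk
load, or making autovacuum more eager about analyzing the affected
table, for instance (numbers illustrative only):

    alter table demo set (autovacuum_analyze_scale_factor = 0.02,
                          autovacuum_analyze_threshold = 500);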

            regards, tom lane



Re: BUG #16967: Extremely slow update statement in trigger

From: Nicolas Burri
Date:
Hi all,

Thanks for looking into this, and sorry about the Dropbox link. I will inline all further examples.

I tried running "analyze demo" after the bulk load, but it does not have the desired effect. It only works if the analyze call is executed between the two update statements. Is it possible that the speedup in this case is a side effect and not actually the result of the analyze call?

Here is what I tried: I reduced the number of bulk-inserted records to 10k, as this still results in a significant slowdown of the update statement, but the calls complete within 40s on my machine.

Slow run: Execute "analyze demo" after the insert and before the update calls:
insert into demo (id, type) select i, 'blue' from generate_series(1, 10000) s(i);
analyze demo;
update demo set type = 'black' where type='red'; -- No rows are updated and the query executes in 2ms.
update demo set type = 'black' where type='blue'; -- Execution time: 37 seconds

Fast run: Execute "analyze demo" after the first update statement:
insert into demo (id, type) select i, 'blue' from generate_series(1, 10000) s(i);
update demo set type = 'black' where type='red'; -- No rows are updated and the query executes in 2ms.
analyze demo;
update demo set type = 'black' where type='blue'; -- Execution time: 51ms

Also slow run: Execute "analyze demo" after the bulk load and between the update statements:
insert into demo (id, type) select i, 'blue' from generate_series(1, 10000) s(i);
analyze demo;
update demo set type = 'black' where type='red'; -- No rows are updated and the query executes in 2ms.
analyze demo;
update demo set type = 'black' where type='blue'; -- Execution time: ~37 seconds

Execution times are basically the same whether I run all statements in one go or execute them manually one by one. I even tried waiting for a while after the "analyze demo" calls, in case there were still some background processes running, but this also had no effect on the execution times.
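
For now we can work around it with an explicit "discard plans" between the two statements. If plan caching is really the culprit, then forcing custom plans should also avoid the problem; a sketch, assuming plan_cache_mode (available since version 12, so not on 11.11):

    set plan_cache_mode = force_custom_plan; -- replan on every execution
    update demo set type = 'black' where type='blue';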

Thanks again and best regards
Nicolas


Am Fr., 23. Apr. 2021 um 04:58 Uhr schrieb Tom Lane <tgl@sss.pgh.pa.us>:
David Fetter <david@fetter.org> writes:
> On Fri, Apr 16, 2021 at 04:52:15PM +0000, PG Bug reporting form wrote:
>> Bug reference:      16967
>> Logged by:          Nicolas Burri
>> Email address:      postgres@burri.li
>> PostgreSQL version: 13.2
>> Operating system: Linux & Windows
>> ...

> I'm including what I found on dropbox below. In future, please include
> the entire content in emails so it stays independent of sites that
> might not still be there when people look back.

Thanks for putting that into the archives.

> I did find something interesting. Here's the EXPLAIN for the update
> that's intended to change everything in the table. Note that the
> estimated row count is 1:
>     shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # explain update demo set type = 'black' where type='blue';
>                               QUERY PLAN                         
>     ══════════════════════════════════════════════════════════════
>      Update on demo  (cost=0.00..3976.35 rows=0 width=0)
>        ->  Seq Scan on demo  (cost=0.00..3976.35 rows=1 width=34)
>              Filter: ((type)::text = 'blue'::text)
>     (3 rows)

Yeah.  Of course, the plan for the UPDATE itself looks just the
same post-ANALYZE.  What we can infer is that the complex query
within the trigger function got a really stupid plan when there
were no stats, and then a better plan once there were some.

> tl;dr: running ANALYZE after a bulk load is a good idea, and I don't
> see a bug here.

Agreed, there's little evidence here of anything but the known
issue that auto-ANALYZE doesn't act instantaneously.  If we had
an easy way to make that better, we would, but ...

                        regards, tom lane