Re: BUG #17590: [OPTIMIZER] DELETE never ends on a small table, found a workaround which makes it instant
От | Tomas Vondra |
---|---|
Тема | Re: BUG #17590: [OPTIMIZER] DELETE never ends on a small table, found a workaround which makes it instant |
Дата | |
Msg-id | 5b0e5bab-08ac-0942-3bad-9b122b513f94@enterprisedb.com обсуждение исходный текст |
Ответ на | BUG #17590: [OPTIMIZER] DELETE never ends on a small table, found a workaround which makes it instant (PG Bug reporting form <noreply@postgresql.org>) |
Список | pgsql-bugs |
On 8/19/22 17:13, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 17590 > Logged by: Guillaume FOUET > Email address: g.fouet@gmail.com > PostgreSQL version: 14.5 > Operating system: Windows 10 - 64x > Description: > > Tested with Postgres 13.5, 13.8 and 14.5 (I updated to see if that was a > bug). > > I have four tables: ADDRESS, COMPANY, CONTACT, TRIP > ADDRESS has ~130000 rows > COMPANY has a FK toward ADDRESS and ~16000 rows > CONTACT has a FK toward ADDRESS and ~12000 rows > TRIP has a FK toward ADDRESS and ~137500 rows > > We wanted to purge the address table from old, unused addresses: > DELETE FROM address WHERE address_id NOT IN ( > SELECT DISTINCT address_id FROM company > UNION > SELECT DISTINCT address_id FROM contact > UNION > SELECT DISTINCT address_id FROM trip > ); > > This query above never ends (I waited 15 minutes and it was still going, HDD > doing nothing, one core CPU used). > The EXPLAIN says it materializes the address_ids aggregates then scans > ADDRESS for deletion. It's probably better to include the query plan. Anyway, the union essentially creates a new relation, making indexes (on the base relations unusable). > > After many tries, I made this query instead: > CREATE TEMPORARY TABLE used_address_id AS ( > SELECT DISTINCT address_id FROM company > UNION > SELECT DISTINCT address_id FROM contact > UNION > SELECT DISTINCT address_id FROM trip > ); > CREATE UNIQUE INDEX ON used_address_id (address_id); > DELETE FROM used_address_id WHERE address_id IS NULL; > DELETE FROM address WHERE address_id NOT IN (SELECT address_id FROM > used_address_id); > > This was resolved in 500ms (basically instantly). > I have the feeling there's an O(n²) somewhere in the first query. Yeah. The temporary table means we can use the indexes again. I'm not sure I'd call this a bug, it's simply how we deal with this sort of queries. Maybe try splitting the one "NOT IN" condition into a separate condition per table. I mean, something like DELETE FROM address WHERE address_id NOT IN (SELECT address_id FROM company) AND address_id NOT IN (SELECT address_id FROM contact) AND address_id NOT IN (SELECT address_id FROM trip) or something like that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-bugs по дате отправления: