RE: Updating large tables without dead tuples
От | ldh@laurent-hasson.com |
---|---|
Тема | RE: Updating large tables without dead tuples |
Дата | |
Msg-id | BY2PR15MB0872D2208E95046125EC577385C30@BY2PR15MB0872.namprd15.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Updating large tables without dead tuples (Stephen Frost <sfrost@snowman.net>) |
Ответы |
Re: Updating large tables without dead tuples
|
Список | pgsql-performance |
> -----Original Message----- > From: Stephen Frost [mailto:sfrost@snowman.net] > Sent: Friday, February 23, 2018 19:10 > To: ldh@laurent-hasson.com > Cc: pgsql-performance@lists.postgresql.org > Subject: Re: Updating large tables without dead tuples > > Greetings, > > * ldh@laurent-hasson.com (ldh@laurent-hasson.com) wrote: > > This was done during a maintenance window, and that table is read-only > except when we ETL data to it on a weekly basis, and so I was just wondering > why I should pay the "bloat" penalty for this type of transaction. Is there a trick > that could be use here? > > Yes, create a new table and INSERT the data into that table, then swap the new > table into place as the old table. Another option, if you don't mind the > exclusive lock taken on the table, is to dump the data to another table, then > TRUNCATE the current one and then INSERT into it. > > There's other options too, involving triggers and such to allow updates and > other changes to be captured during this process, avoiding the need to lock the > table, but that gets a bit complicated. > > > More generally, I suspect that the MVCC architecture is so deep that > something like LOCK TABLE, which would guarantee that there won't be > contentions, couldn't be used as a heuristic to not create dead tuples? That > would make quite a performance improvement for this type of work though. > > I'm afraid it wouldn't be quite that simple, particularly you have to think about > what happens when you issue a rollback... > > Thanks! > > Stephen [Laurent Hasson] [Laurent Hasson] This table several other tables with foreign keys into it... So any physical replacement of the table wouldn't work I believe.I'd have to disable/remove the foreign keys across the other tables, do this work, and then re-set the foreign keys.Overall time in aggregate may not be much shorter than the current implementation. This table represents Hospital visits, off of which hang a lot of other information. The updated column in that Visits tableis not part of the key. As for the rollback, I didn't think about it because in our case, short of a db/hardware failure, this operation wouldn'tfail... But the risk is there and I understand the engine must be prepared for anything and fulfill the ACID principles. With respect to that, I read in many places that an UPDATE is effectively a DELETE + INSERT. Does that mean in the rollbacklogs, there are 2 entries for each row updated as a result? Thank you, Laurent.
В списке pgsql-performance по дате отправления: