Re: How to do faster DML
| From | Francisco Olarte |
|---|---|
| Subject | Re: How to do faster DML |
| Date | |
| Msg-id | CA+bJJbx0ObdkWJ-Q5qvsM+v1jqLA2bQi+Atqc675Ds5z5AjyEg@mail.gmail.com |
| In reply to | Re: How to do faster DML (Greg Sabino Mullane <htamfids@gmail.com>) |
| Responses | Re: How to do faster DML |
| List | pgsql-general |
On Sat, 3 Feb 2024 at 19:29, Greg Sabino Mullane <htamfids@gmail.com> wrote:
...
> Given the size of your table, you probably want to divide that up.
> As long as nothing is changing the original table, you could do:
>
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 0;
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 10_000_000;
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 20_000_000;

Is it me, or does this have the faint smell of quadratic behaviour? I mean, to honour OFFSET 10M you need to read and discard the first 10M rows (also, I believe ctid order is implied by a sequential scan, but I'm not so sure; if not, each pass would need a full table sort).

When doing things like this, I normally have some kind of unique index and select with a limit for values above the last one read (stored when doing it via code, as I normally load first and index later, so I cannot select max() quickly on the target).

Or, with some kind of "sparse" index (meaning groups much smaller than the batch size) and a little code, you can select where index_col > last order by index_col limit 10M, remember the last index_col received, and reselect discarding the missing rows (or just reselect it when doing your ON CONFLICT DO NOTHING, which also gives a fast select max(id) thanks to the PK; it will work if the original has an index on the id column even if it is not unique) to avoid that.

Also, I'm not sure whether ctid is ordered and you can select where ctid > last ordered; if that works, it is probably faster for immutable origins.

Francisco Olarte.
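For readers following along, the keyset approach described above can be sketched roughly like this. This is only an illustration, not code from the thread: it uses Python's sqlite3 module as a stand-in for PostgreSQL, and the table names, column names, and batch size are all made up; the point is only the `WHERE id > last ORDER BY id LIMIT n` loop, which lets the index jump to the start of each batch instead of re-reading and discarding rows the way OFFSET does.

```python
import sqlite3

# Hypothetical tables standing in for mytable1/mytable2 from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable1 (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE mytable2 (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO mytable1 VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 101)],
)

BATCH = 30
last_id = 0  # last key copied so far; start below the smallest id
while True:
    # Keyset pagination: the unique index on id locates the start of
    # each batch directly, so cost per batch stays roughly constant,
    # unlike OFFSET, which scans and discards all preceding rows.
    rows = conn.execute(
        "SELECT id, payload FROM mytable1 WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH),
    ).fetchall()
    if not rows:
        break  # no rows left above last_id: copy is complete
    conn.executemany("INSERT INTO mytable2 VALUES (?, ?)", rows)
    last_id = rows[-1][0]  # remember the last key for the next batch

copied = conn.execute("SELECT count(*) FROM mytable2").fetchone()[0]
print(copied)  # 100
```

With a real PostgreSQL table the loop body would be the same shape, with `last_id` held in the driving script between batches, as the post suggests.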