Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Дата
Msg-id CAA4eK1JfOBtdObF420-jnCYt8nhfWCWKq-7pX4KM3RQsjis+6Q@mail.gmail.com
обсуждение исходный текст
Ответ на [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Ответы Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Marco Slot <marco.slot@gmail.com>)
Список pgsql-hackers
On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi hackers,
>
>
> It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to full table scan
>
> per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impracticable -- probably other
>
> than some small number of use cases.
>

IIUC, this proposal is to optimize cases where users can't have a
unique/primary key for a relation on the subscriber and those
relations receive lots of updates or deletes?

> With this patch, I'm proposing the following change: If there is an index on the subscriber, use the index
>
> as long as the planner sub-modules picks any index over sequential scan.
>
> Majority of the logic on the subscriber side has already existed in the code. The subscriber is already
>
> capable of doing (unique) index scans. With this patch, we are allowing the index to iterate over the
>
> tuples fetched and only act when tuples are equal. The ones familiar with this part of the code could
>
> realize that the sequential scan code on the subscriber already implements the `tuples_equal()` function.
>
> In short, the changes on the subscriber are mostly combining parts of (unique) index scan and
>
> sequential scan codes.
>
> The decision on whether to use an index (or which index) is mostly derived from planner infrastructure.
>
> The idea is that on the subscriber we have all the columns. So, construct all the `Path`s with the
>
> restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND col_n = $N`. Finally, let
>
> the planner sub-module -- `create_index_paths()` -- to give us the relevant  index `Path`s. On top of
>
> that adds the sequential scan `Path` as well. Finally, pick the cheapest `Path` among.
>
> From the performance point of view, there are few things to note. First, the patch aims not to
> change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when REPLICA IDENTITY
> IS FULL on the publisher and an index is used on the subscriber, the difference mostly comes down
> to `index scan` vs `sequential scan`. That's why it is hard to claim a certain number of improvements.
> It mostly depends on the data size, index and the data distribution.
>

It seems that in favorable cases it will improve performance but we
should consider unfavorable cases as well. Two things that come to
mind in that regard are (a) while choosing index/seq. scan paths, the
patch doesn't account for cost for tuples_equal() which needs to be
performed for index scans, (b) it appears to me that the patch decides
which index to use the first time it opens the rel (or if the rel gets
invalidated) on subscriber and then for all consecutive operations it
uses the same index. It is quite possible that after some more
operations on the table, using the same index will actually be
costlier than a sequence scan or some other index scan.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Richard Guo
Дата:
Сообщение: Re: Costing elided SubqueryScans more nearly correctly
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Use -fvisibility=hidden for shared libraries