Обсуждение: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

Поиск

Список

Период

Сортировка

doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

03 июля 2023 г., 05:15:05

Hi,

Commit 89e46da5e5 allowed us to use indexes for searching on REPLICA
IDENTITY FULL tables. The documentation explains:

When replica identity <quote>full</quote> is specified,
indexes can be used on the subscriber side for searching the rows.  Candidate
indexes must be btree, non-partial, and have at least one column reference
(i.e. cannot consist of only expressions).

To be exact, IIUC the column reference must be on the leftmost column
of indexes. Does it make sense to mention that? I've attached the
patch.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

doc_update.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

04 июля 2023 г., 12:51:35

On Mon, Jul 3, 2023 at 7:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Commit 89e46da5e5 allowed us to use indexes for searching on REPLICA
> IDENTITY FULL tables. The documentation explains:
>
> When replica identity <quote>full</quote> is specified,
> indexes can be used on the subscriber side for searching the rows.  Candidate
> indexes must be btree, non-partial, and have at least one column reference
> (i.e. cannot consist of only expressions).
>
> To be exact, IIUC the column reference must be on the leftmost column
> of indexes. Does it make sense to mention that?
>

Yeah, I think it is good to mention that. Accordingly, the comments
atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted.

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

05 июля 2023 г., 06:30:41

Hi. Here are some review comments for this patch.

+1 for the patch idea.

------

I wasn't sure about the code comment adjustments suggested by Amit [1]:
"Accordingly, the comments atop build_replindex_scan_key(),
FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
should also be adjusted."

Actually, I thought the FindUsableIndexForReplicaIdentityFull()
function comment is *already* describing the limitation about the
leftmost column (see fragment below), so IIUC the Sawada-san patch is
only trying to expose that same information in the PG docs.

[FindUsableIndexForReplicaIdentityFull comment fragment]
* We also skip indexes if the remote relation does not contain the leftmost
* column of the index. This is because in most such cases sequential scan is
* favorable over index scan.

OTOH, it may be better if these limitation rule details were not
scattered in the code. e.g. build_replindex_scan_key() function
comment can be simplified:

CURRENT:
* This is not generic routine, it expects the idxrel to be a btree, non-partial
* and have at least one column reference (i.e. cannot consist of only
* expressions).

SUGGESTION:
This is not a generic routine. It expects the 'idxrel' to be an index
deemed "usable" by the function
FindUsableIndexForReplicaIdentityFull().

------
doc/src/sgml/logical-replication.sgml

1.
the key. When replica identity <literal>FULL</literal> is specified,
indexes can be used on the subscriber side for searching the rows.
Candidate
indexes must be btree, non-partial, and have at least one column reference
- (i.e. cannot consist of only expressions). These restrictions
- on the non-unique index properties adhere to some of the restrictions that
- are enforced for primary keys. If there are no such suitable indexes,
+ at the leftmost column indexes (i.e. cannot consist of only
expressions). These
+ restrictions on the non-unique index properties adhere to some of
the restrictions
+ that are enforced for primary keys. If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity <literal>FULL</literal> should only be used as a
fallback if no other solution is possible. If a replica identity other

Isn't this using the word "indexes" with different meanings in the
same sentence? e.g. IIUC "leftmost column indexes" is referring to the
ordinal number of the index fields. TBH, I am not sure the patch
wording is even describing the limitation in quite the same way as
what the code is actually doing.

HEAD (code comment):
* We also skip indexes if the remote relation does not contain the leftmost
* column of the index. This is because in most such cases sequential scan is
* favorable over index scan.

HEAD (rendered docs)
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e. cannot consist of only expressions). These
restrictions on the non-unique index properties adhere to some of the
restrictions that are enforced for primary keys.

PATCHED (rendered docs)
Candidate indexes must be btree, non-partial, and have at least one
column reference at the leftmost column indexes (i.e. cannot consist
of only expressions). These restrictions on the non-unique index
properties adhere to some of the restrictions that are enforced for
primary keys.

MY SUGGESTION:
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e. cannot consist of only expressions).
Furthermore, the leftmost field of the candidate index must be a
column of the published table. These restrictions on the non-unique
index properties adhere to some of the restrictions that are enforced
for primary keys.

------
[1] Amit suggestions -
https://www.postgresql.org/message-id/CAA4eK1LZ_A-UmC_P%2B_hLi%2BPbwyqak%2BvRKemZ7imzk2puVTpHOA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

05 июля 2023 г., 08:46:42

On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi. Here are some review comments for this patch.
>
> +1 for the patch idea.
>
> ------
>
> I wasn't sure about the code comment adjustments suggested by Amit [1]:
> "Accordingly, the comments atop build_replindex_scan_key(),
> FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> should also be adjusted."
>
> Actually, I thought the FindUsableIndexForReplicaIdentityFull()
> function comment is *already* describing the limitation about the
> leftmost column (see fragment below), so IIUC the Sawada-san patch is
> only trying to expose that same information in the PG docs.
>
> [FindUsableIndexForReplicaIdentityFull comment fragment]
>  * We also skip indexes if the remote relation does not contain the leftmost
>  * column of the index. This is because in most such cases sequential scan is
>  * favorable over index scan.
>

This implies that the leftmost column of the index must be
non-expression but I feel what the patch intends to say in docs is
more straightforward and it doesn't match what the proposed docs says.

> ~
>
> OTOH, it may be better if these limitation rule details were not
> scattered in the code. e.g. build_replindex_scan_key() function
> comment can be simplified:
>
> CURRENT:
>  * This is not generic routine, it expects the idxrel to be a btree, non-partial
>  * and have at least one column reference (i.e. cannot consist of only
>  * expressions).
>
> SUGGESTION:
> This is not a generic routine. It expects the 'idxrel' to be an index
> deemed "usable" by the function
> FindUsableIndexForReplicaIdentityFull().
>

Note that for PK/ReplicaIdentity, we don't even call
FindUsableIndexForReplicaIdentityFull() but build_replindex_scan_key()
would still be called for such index. So, I am not sure your proposed
wording is an improvement.

> ------
> doc/src/sgml/logical-replication.sgml
>
> 1.
>     the key.  When replica identity <literal>FULL</literal> is specified,
>     indexes can be used on the subscriber side for searching the rows.
> Candidate
>     indexes must be btree, non-partial, and have at least one column reference
> -   (i.e. cannot consist of only expressions).  These restrictions
> -   on the non-unique index properties adhere to some of the restrictions that
> -   are enforced for primary keys.  If there are no such suitable indexes,
> +   at the leftmost column indexes (i.e. cannot consist of only
> expressions).  These
> +   restrictions on the non-unique index properties adhere to some of
> the restrictions
> +   that are enforced for primary keys.  If there are no such suitable indexes,
>     the search on the subscriber side can be very inefficient, therefore
>     replica identity <literal>FULL</literal> should only be used as a
>     fallback if no other solution is possible.  If a replica identity other
>
> Isn't this using the word "indexes" with different meanings in the
> same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> ordinal number of the index fields. TBH, I am not sure the patch
> wording is even describing the limitation in quite the same way as
> what the code is actually doing.
>
> HEAD (code comment):
>  * We also skip indexes if the remote relation does not contain the leftmost
>  * column of the index. This is because in most such cases sequential scan is
>  * favorable over index scan.
>
> HEAD (rendered docs)
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e. cannot consist of only expressions). These
> restrictions on the non-unique index properties adhere to some of the
> restrictions that are enforced for primary keys.
>
> PATCHED (rendered docs)
> Candidate indexes must be btree, non-partial, and have at least one
> column reference at the leftmost column indexes (i.e. cannot consist
> of only expressions). These restrictions on the non-unique index
> properties adhere to some of the restrictions that are enforced for
> primary keys.
>
> MY SUGGESTION:
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e. cannot consist of only expressions).
> Furthermore, the leftmost field of the candidate index must be a
> column of the published table. These restrictions on the non-unique
> index properties adhere to some of the restrictions that are enforced
> for primary keys.
>

I don't know if this suggestion is what the code is actually doing. In
function RemoteRelContainsLeftMostColumnOnIdx(), we have the following
checks:
==========
keycol = indexInfo->ii_IndexAttrNumbers[0];
if (!AttributeNumberIsValid(keycol))
return false;

if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
return false;

return attrmap->attnums[AttrNumberGetAttrOffset(keycol)] >= 0;
==========

The first of these checks indicates that the leftmost column of the
index should be non-expression, second and third indicates what you
suggest in your wording. We can also think that what you wrote in a
way is a superset of "leftmost index column is a non-expression" and
"leftmost index column should be present in remote rel" but I feel it
would be better to explicit about the first part as it is easy to
understand for users at least in docs.

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

05 июля 2023 г., 09:31:33

On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi. Here are some review comments for this patch.
> >
> > +1 for the patch idea.
> >
> > ------
> >
> > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > "Accordingly, the comments atop build_replindex_scan_key(),
> > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > should also be adjusted."

As for IsIndexOnlyOnExpression(), what part do you think we need to
adjust? It says:

/*
 * Returns true if the given index consists only of expressions such as:
 *  CREATE INDEX idx ON table(foo(col));
 *
 * Returns false even if there is one column reference:
 *   CREATE INDEX idx ON table(foo(col), col_2);
 */

and it seems to me that the function doesn't check if the leftmost
index column is a non-expression.

> > doc/src/sgml/logical-replication.sgml
> >
> > 1.
> >     the key.  When replica identity <literal>FULL</literal> is specified,
> >     indexes can be used on the subscriber side for searching the rows.
> > Candidate
> >     indexes must be btree, non-partial, and have at least one column reference
> > -   (i.e. cannot consist of only expressions).  These restrictions
> > -   on the non-unique index properties adhere to some of the restrictions that
> > -   are enforced for primary keys.  If there are no such suitable indexes,
> > +   at the leftmost column indexes (i.e. cannot consist of only
> > expressions).  These
> > +   restrictions on the non-unique index properties adhere to some of
> > the restrictions
> > +   that are enforced for primary keys.  If there are no such suitable indexes,
> >     the search on the subscriber side can be very inefficient, therefore
> >     replica identity <literal>FULL</literal> should only be used as a
> >     fallback if no other solution is possible.  If a replica identity other
> >
> > Isn't this using the word "indexes" with different meanings in the
> > same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> > ordinal number of the index fields.

That was my mistake, it should be " at the leftmost column".

>
> I don't know if this suggestion is what the code is actually doing. In
> function RemoteRelContainsLeftMostColumnOnIdx(), we have the following
> checks:
> ==========
> keycol = indexInfo->ii_IndexAttrNumbers[0];
> if (!AttributeNumberIsValid(keycol))
> return false;
>
> if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
> return false;
>
> return attrmap->attnums[AttrNumberGetAttrOffset(keycol)] >= 0;
> ==========
>
> The first of these checks indicates that the leftmost column of the
> index should be non-expression, second and third indicates what you
> suggest in your wording. We can also think that what you wrote in a
> way is a superset of "leftmost index column is a non-expression" and
> "leftmost index column should be present in remote rel" but I feel it
> would be better to explicit about the first part as it is easy to
> understand for users at least in docs.

+1

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

05 июля 2023 г., 13:36:31

On Wed, Jul 5, 2023 at 12:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Hi. Here are some review comments for this patch.
> > >
> > > +1 for the patch idea.
> > >
> > > ------
> > >
> > > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > > "Accordingly, the comments atop build_replindex_scan_key(),
> > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > > should also be adjusted."
>
> As for IsIndexOnlyOnExpression(), what part do you think we need to
> adjust? It says:
>
> /*
>  * Returns true if the given index consists only of expressions such as:
>  *  CREATE INDEX idx ON table(foo(col));
>  *
>  * Returns false even if there is one column reference:
>  *   CREATE INDEX idx ON table(foo(col), col_2);
>  */
>
> and it seems to me that the function doesn't check if the leftmost
> index column is a non-expression.
>

Right, so, we can leave this as is.

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

06 июля 2023 г., 01:58:01

On Wed, Jul 5, 2023 at 4:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Hi. Here are some review comments for this patch.
> > >
> > > +1 for the patch idea.
> > >
> > > ------
> > >
> > > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > > "Accordingly, the comments atop build_replindex_scan_key(),
> > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > > should also be adjusted."
>
> As for IsIndexOnlyOnExpression(), what part do you think we need to
> adjust? It says:
>
> /*
>  * Returns true if the given index consists only of expressions such as:
>  *  CREATE INDEX idx ON table(foo(col));
>  *
>  * Returns false even if there is one column reference:
>  *   CREATE INDEX idx ON table(foo(col), col_2);
>  */
>
> and it seems to me that the function doesn't check if the leftmost
> index column is a non-expression.
>

TBH, this IsIndexOnlyOnExpression() function seemed redundant to me,
otherwise, there can be some indexes that are firstly considered
"useable" but then fail the subsequent leftmost check. It does not
seem right.

> > > doc/src/sgml/logical-replication.sgml
> > >
> > > 1.
> > >     the key.  When replica identity <literal>FULL</literal> is specified,
> > >     indexes can be used on the subscriber side for searching the rows.
> > > Candidate
> > >     indexes must be btree, non-partial, and have at least one column reference
> > > -   (i.e. cannot consist of only expressions).  These restrictions
> > > -   on the non-unique index properties adhere to some of the restrictions that
> > > -   are enforced for primary keys.  If there are no such suitable indexes,
> > > +   at the leftmost column indexes (i.e. cannot consist of only
> > > expressions).  These
> > > +   restrictions on the non-unique index properties adhere to some of
> > > the restrictions
> > > +   that are enforced for primary keys.  If there are no such suitable indexes,
> > >     the search on the subscriber side can be very inefficient, therefore
> > >     replica identity <literal>FULL</literal> should only be used as a
> > >     fallback if no other solution is possible.  If a replica identity other
> > >
> > > Isn't this using the word "indexes" with different meanings in the
> > > same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> > > ordinal number of the index fields.
>
> That was my mistake, it should be " at the leftmost column".

IIUC the subscriber-side table can have more columns than the
publisher-side table, so just describing in the docs that the leftmost
INDEX field must be a column may not be quite enough; it also needs to
say that column has to exist on the publisher-table, doesn't it?

Also, after you document this 'leftmost field restriction' that
already implies there *must* be a non-expression in the INDEX. So I
thought we can just omit the "(i.e. cannot consist of only
expressions)" part.

Anyway, I will wait to see the wording of the updated patch before
commenting further.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

06 июля 2023 г., 04:41:48

On Thu, Jul 6, 2023 at 7:58 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Jul 5, 2023 at 4:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > Hi. Here are some review comments for this patch.
> > > >
> > > > +1 for the patch idea.
> > > >
> > > > ------
> > > >
> > > > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > > > "Accordingly, the comments atop build_replindex_scan_key(),
> > > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > > > should also be adjusted."
> >
> > As for IsIndexOnlyOnExpression(), what part do you think we need to
> > adjust? It says:
> >
> > /*
> >  * Returns true if the given index consists only of expressions such as:
> >  *  CREATE INDEX idx ON table(foo(col));
> >  *
> >  * Returns false even if there is one column reference:
> >  *   CREATE INDEX idx ON table(foo(col), col_2);
> >  */
> >
> > and it seems to me that the function doesn't check if the leftmost
> > index column is a non-expression.
> >
>
> TBH, this IsIndexOnlyOnExpression() function seemed redundant to me,
> otherwise, there can be some indexes that are firstly considered
> "useable" but then fail the subsequent leftmost check. It does not
> seem right.

I see your point. IsIndexUsableForReplicaIdentityFull(), the sole user
of IsIndexOnlyOnExpression(), is also called by
RelationFindReplTupleByIndex() in an assertion build. I thought this
is the reason why we have separate IsIndexOnlyOnExpression() ( and
IsIndexUsableForReplicaIdentityFull()). But this assertion doesn't
check if the leftmost index column exists on the remote relation. What
are we doing this check for? If it's not necessary, we can remove this
assertion and merge both IsIndexOnlyOnExpression() and
IsIndexUsableForReplicaIdentityFull() into
FindUsableIndexForReplicaIdentityFull().

>
> > > > doc/src/sgml/logical-replication.sgml
> > > >
> > > > 1.
> > > >     the key.  When replica identity <literal>FULL</literal> is specified,
> > > >     indexes can be used on the subscriber side for searching the rows.
> > > > Candidate
> > > >     indexes must be btree, non-partial, and have at least one column reference
> > > > -   (i.e. cannot consist of only expressions).  These restrictions
> > > > -   on the non-unique index properties adhere to some of the restrictions that
> > > > -   are enforced for primary keys.  If there are no such suitable indexes,
> > > > +   at the leftmost column indexes (i.e. cannot consist of only
> > > > expressions).  These
> > > > +   restrictions on the non-unique index properties adhere to some of
> > > > the restrictions
> > > > +   that are enforced for primary keys.  If there are no such suitable indexes,
> > > >     the search on the subscriber side can be very inefficient, therefore
> > > >     replica identity <literal>FULL</literal> should only be used as a
> > > >     fallback if no other solution is possible.  If a replica identity other
> > > >
> > > > Isn't this using the word "indexes" with different meanings in the
> > > > same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> > > > ordinal number of the index fields.
> >
> > That was my mistake, it should be " at the leftmost column".
>
> IIUC the subscriber-side table can have more columns than the
> publisher-side table, so just describing in the docs that the leftmost
> INDEX field must be a column may not be quite enough; it also needs to
> say that column has to exist on the publisher-table, doesn't it?

Right. I've updated the patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

doc_update_v2.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

07 июля 2023 г., 04:54:57

Hi, Here are my review comments for patch v2.

======
1. doc/src/sgml/logical-replication.sgml

Candidate indexes must be btree, non-partial, and have at least one
column reference to the published table column at the leftmost column
(i.e. cannot consist of only expressions).

~

There is only one column which can be the leftmost one, so the wording
"At least one ... at the leftmost"  seemed a bit strange to me.
Personally, I would phrase it something like below:

SUGGESTION #1
Candidate indexes must be btree, non-partial, and the leftmost index
column must reference a published table column (i.e. the index cannot
consist of only expressions).

SUGGESTION #2 (same as above, but omitting the "only expressions"
part, which I think is implied by the "leftmost" rule anyway)
Candidate indexes must be btree, non-partial, and the leftmost index
column must reference a published table column.

======
2. src/backend/replication/logical/relation.c

  * Returns the oid of an index that can be used by the apply worker to scan
  * the relation. The index must be btree, non-partial, and have at least
- * one column reference (i.e. cannot consist of only expressions). These
- * limitations help to keep the index scan similar to PK/RI index scans.
+ * one column reference to the remote relation's column at the leftmost column
+ * (i.e. cannot consist of only expressions). These limitations help
to keep the
+ * index scan similar to PK/RI index scans.

This comment text is similar to the docs change, so refer to the same
suggestions as #1 above.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

07 июля 2023 г., 11:05:40

On Fri, Jul 7, 2023 at 10:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi, Here are my review comments for patch v2.
>
> ======
> 1. doc/src/sgml/logical-replication.sgml
>
> Candidate indexes must be btree, non-partial, and have at least one
> column reference to the published table column at the leftmost column
> (i.e. cannot consist of only expressions).
>
> ~
>
> There is only one column which can be the leftmost one, so the wording
> "At least one ... at the leftmost"  seemed a bit strange to me.
> Personally, I would phrase it something like below:
>
> SUGGESTION #1
> Candidate indexes must be btree, non-partial, and the leftmost index
> column must reference a published table column (i.e. the index cannot
> consist of only expressions).

I prefer the first suggestion. I've attached the updated patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v3-0001-Doc-clarify-the-conditions-of-usable-indexes-for-.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

08 июля 2023 г., 06:49:40

On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I prefer the first suggestion. I've attached the updated patch.
>

This looks mostly good to me but I think it would be better if we can
also add the information that the leftmost index column must be a
non-expression. So, how about: "Candidate indexes must be btree,
non-partial, and the leftmost index column must be a non-expression
and reference to a published table column (i.e. cannot consist of only
expressions)."?

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

10 июля 2023 г., 05:24:43

On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I prefer the first suggestion. I've attached the updated patch.
> >
>
> This looks mostly good to me but I think it would be better if we can
> also add the information that the leftmost index column must be a
> non-expression. So, how about: "Candidate indexes must be btree,
> non-partial, and the leftmost index column must be a non-expression
> and reference to a published table column (i.e. cannot consist of only
> expressions)."?

That part in parentheses ought to say "the index ..." because it is
referring to the full INDEX, not to the leftmost column. I think this
was missed when Sawada-san took my previous suggestion [1].

IMO it doesn't sound right to say the "index column must be a
non-expression". It is already a non-expression because it is a
column. So I think it would be better to refer to this as an INDEX
"field" instead of an INDEX column. Note that "field" is the same
terminology used in the docs for CREATE INDEX [2].

SUGGESTION
Candidate indexes must be btree, non-partial, and the leftmost index
field must be a column that references a published table column (i.e.
the index cannot consist of only expressions).

~~~~

What happened to the earlier idea of removing/merging the redundant
(?) function IsIndexOnlyOnExpression()?
- Something wrong with that?
- Chose not to do it?
- Will do it raised in another thread?

------
[1] my review v2 -
https://www.postgresql.org/message-id/CAHut%2BPsFdTZJ7DG1jyu7BpA_1d4hwEd-Q%2BmQAPWcj1ZLD_X5Dw%40mail.gmail.com
[2] create index - https://www.postgresql.org/docs/current/sql-createindex.html

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

10 июля 2023 г., 07:21:07

On Mon, Jul 10, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I prefer the first suggestion. I've attached the updated patch.
> > >
> >
> > This looks mostly good to me but I think it would be better if we can
> > also add the information that the leftmost index column must be a
> > non-expression. So, how about: "Candidate indexes must be btree,
> > non-partial, and the leftmost index column must be a non-expression
> > and reference to a published table column (i.e. cannot consist of only
> > expressions)."?
>
> That part in parentheses ought to say "the index ..." because it is
> referring to the full INDEX, not to the leftmost column. I think this
> was missed when Sawada-san took my previous suggestion [1].
>
> IMO it doesn't sound right to say the "index column must be a
> non-expression". It is already a non-expression because it is a
> column. So I think it would be better to refer to this as an INDEX
> "field" instead of an INDEX column. Note that "field" is the same
> terminology used in the docs for CREATE INDEX [2].
>

I thought it would be better to be explicit for this case but I am
fine if Sawada-San and you prefer some other way to document it.

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

11 июля 2023 г., 02:23:36

On Mon, Jul 10, 2023 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 10, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I prefer the first suggestion. I've attached the updated patch.
> > > >
> > >
> > > This looks mostly good to me but I think it would be better if we can
> > > also add the information that the leftmost index column must be a
> > > non-expression. So, how about: "Candidate indexes must be btree,
> > > non-partial, and the leftmost index column must be a non-expression
> > > and reference to a published table column (i.e. cannot consist of only
> > > expressions)."?
> >
> > That part in parentheses ought to say "the index ..." because it is
> > referring to the full INDEX, not to the leftmost column. I think this
> > was missed when Sawada-san took my previous suggestion [1].
> >
> > IMO it doesn't sound right to say the "index column must be a
> > non-expression". It is already a non-expression because it is a
> > column. So I think it would be better to refer to this as an INDEX
> > "field" instead of an INDEX column. Note that "field" is the same
> > terminology used in the docs for CREATE INDEX [2].
> >
>
> I thought it would be better to be explicit for this case but I am
> fine if Sawada-San and you prefer some other way to document it.
>

I see. How about just moving the parenthesized part to explicitly
refer only to the leftmost field?

SUGGESTION
Candidate indexes must be btree, non-partial, and the leftmost index
field must be a column (not an expression) that references a published
table column.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

11 июля 2023 г., 07:05:41

On Tue, Jul 11, 2023 at 4:54 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Jul 10, 2023 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jul 10, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > I prefer the first suggestion. I've attached the updated patch.
> > > > >
> > > >
> > > > This looks mostly good to me but I think it would be better if we can
> > > > also add the information that the leftmost index column must be a
> > > > non-expression. So, how about: "Candidate indexes must be btree,
> > > > non-partial, and the leftmost index column must be a non-expression
> > > > and reference to a published table column (i.e. cannot consist of only
> > > > expressions)."?
> > >
> > > That part in parentheses ought to say "the index ..." because it is
> > > referring to the full INDEX, not to the leftmost column. I think this
> > > was missed when Sawada-san took my previous suggestion [1].
> > >
> > > IMO it doesn't sound right to say the "index column must be a
> > > non-expression". It is already a non-expression because it is a
> > > column. So I think it would be better to refer to this as an INDEX
> > > "field" instead of an INDEX column. Note that "field" is the same
> > > terminology used in the docs for CREATE INDEX [2].
> > >
> >
> > I thought it would be better to be explicit for this case but I am
> > fine if Sawada-San and you prefer some other way to document it.
> >
>
> I see. How about just moving the parenthesized part to explicitly
> refer only to the leftmost field?
>
> SUGGESTION
> Candidate indexes must be btree, non-partial, and the leftmost index
> field must be a column (not an expression) that references a published
> table column.
>

Yeah, something like that works for me.

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

11 июля 2023 г., 08:47:45

On Tue, Jul 11, 2023 at 1:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 11, 2023 at 4:54 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Mon, Jul 10, 2023 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Jul 10, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > On Sat, Jul 8, 2023 at 1:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jul 7, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > I prefer the first suggestion. I've attached the updated patch.
> > > > > >
> > > > >
> > > > > This looks mostly good to me but I think it would be better if we can
> > > > > also add the information that the leftmost index column must be a
> > > > > non-expression. So, how about: "Candidate indexes must be btree,
> > > > > non-partial, and the leftmost index column must be a non-expression
> > > > > and reference to a published table column (i.e. cannot consist of only
> > > > > expressions)."?
> > > >
> > > > That part in parentheses ought to say "the index ..." because it is
> > > > referring to the full INDEX, not to the leftmost column. I think this
> > > > was missed when Sawada-san took my previous suggestion [1].
> > > >
> > > > IMO it doesn't sound right to say the "index column must be a
> > > > non-expression". It is already a non-expression because it is a
> > > > column. So I think it would be better to refer to this as an INDEX
> > > > "field" instead of an INDEX column. Note that "field" is the same
> > > > terminology used in the docs for CREATE INDEX [2].
> > > >
> > >
> > > I thought it would be better to be explicit for this case but I am
> > > fine if Sawada-San and you prefer some other way to document it.
> > >
> >
> > I see. How about just moving the parenthesized part to explicitly
> > refer only to the leftmost field?
> >
> > SUGGESTION
> > Candidate indexes must be btree, non-partial, and the leftmost index
> > field must be a column (not an expression) that references a published
> > table column.
> >
>
> Yeah, something like that works for me.

Looks good to me. I've attached the updated patch. In the comment in
RemoteRelContainsLeftMostColumnOnIdx(), I used "remote relation"
instead of "published table" as it's more consistent with surrounding
comments. Also, I've removed the comment starting with "We also skip
indedes..." as the new comment now covers it. Please review it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v4-0001-Doc-clarify-the-conditions-of-usable-indexes-for-.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

11 июля 2023 г., 11:31:30

Here are my comments for v4.

======

Docs/Comments:

All the docs and updated comments LTGM, except I felt one sentence
might be written differently to avoid nested parentheses.

BEFORE
...used for REPLICA IDENTITY FULL table (see
FindUsableIndexForReplicaIdentityFull() for details).

AFTER
...used for REPLICA IDENTITY FULL table. See
FindUsableIndexForReplicaIdentityFull() for details.

====

Logic:

What was the decision about the earlier question [1] of
removing/merging the function IsIndexOnlyOnExpression()?

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuGhGHp9Uq8-Wk7uBiirAHF5quDY_1Z6WDoUKRZqkn%2Brg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

12 июля 2023 г., 10:00:25

On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my comments for v4.
>
> ======
>
> Docs/Comments:
>
> All the docs and updated comments LTGM, except I felt one sentence
> might be written differently to avoid nested parentheses.
>
> BEFORE
> ...used for REPLICA IDENTITY FULL table (see
> FindUsableIndexForReplicaIdentityFull() for details).
>
> AFTER
> ...used for REPLICA IDENTITY FULL table. See
> FindUsableIndexForReplicaIdentityFull() for details.
>
> ====

Agreed. I've attached the updated patch. I'll push it barring any objections.

>
> Logic:
>
> What was the decision about the earlier question [1] of
> removing/merging the function IsIndexOnlyOnExpression()?
>

I don't think we have concluded any action for it. I agree that
IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
index fields actually. I've attached a draft patch. It removes
IsIndexOnlyOnExpression() and merges
RemoteRelContainsLeftMostColumnOnIdx() to
FindUsableIndexForReplicaIdentityFull. One concern is that we no
longer do the assertion check with
IsIndexUsableForReplicaIdentityFull(). What do you think?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

12 июля 2023 г., 13:08:43

On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
>
> I don't think we have concluded any action for it. I agree that
> IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
> index fields actually. I've attached a draft patch. It removes
> IsIndexOnlyOnExpression() and merges
> RemoteRelContainsLeftMostColumnOnIdx() to
> FindUsableIndexForReplicaIdentityFull. One concern is that we no
> longer do the assertion check with
> IsIndexUsableForReplicaIdentityFull(). What do you think?
>

I think this is a valid concern. Can't we move all the checks
(including the remote attrs check) inside
IsIndexUsableForReplicaIdentityFull() and then call it from both
places? Won't we have attrmap information available in the callers of
FindReplTupleInLocalRel() via ApplyExecutionData?

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

12 июля 2023 г., 17:15:01

On Wed, Jul 12, 2023 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> >
> > I don't think we have concluded any action for it. I agree that
> > IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
> > index fields actually. I've attached a draft patch. It removes
> > IsIndexOnlyOnExpression() and merges
> > RemoteRelContainsLeftMostColumnOnIdx() to
> > FindUsableIndexForReplicaIdentityFull. One concern is that we no
> > longer do the assertion check with
> > IsIndexUsableForReplicaIdentityFull(). What do you think?
> >
>
> I think this is a valid concern. Can't we move all the checks
> (including the remote attrs check) inside
> IsIndexUsableForReplicaIdentityFull() and then call it from both
> places? Won't we have attrmap information available in the callers of
> FindReplTupleInLocalRel() via ApplyExecutionData?

You mean to pass ApplyExecutionData or attrmap down to
RelationFindReplTupleByIndex()? I think it would be better to call it
from FindReplTupleInLocalRel() instead.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Önder Kalacı

Дата:

12 июля 2023 г., 17:15:17

Hi Amit, all

Amit Kapila <amit.kapila16@gmail.com>, 12 Tem 2023 Çar, 13:09 tarihinde şunu yazdı:

On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
>
> I don't think we have concluded any action for it. I agree that
> IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
> index fields actually. I've attached a draft patch. It removes
> IsIndexOnlyOnExpression() and merges
> RemoteRelContainsLeftMostColumnOnIdx() to
> FindUsableIndexForReplicaIdentityFull. One concern is that we no
> longer do the assertion check with
> IsIndexUsableForReplicaIdentityFull(). What do you think?
>

I think this is a valid concern. Can't we move all the checks
(including the remote attrs check) inside
IsIndexUsableForReplicaIdentityFull() and then call it from both
places? Won't we have attrmap information available in the callers of
FindReplTupleInLocalRel() via ApplyExecutionData?

I think such an approach is slightly better than the proposed changes on

remove_redundant_check.patch

I think one reason we ended up with IsIndexUsableForReplicaIdentityFull() is that it

is a nice way for documenting the requirements in the code.

However, as you also alluded to in the thread, RemoteRelContainsLeftMostColumnOnIdx()

breaks this documentation.

I agree that it is nice to have all the logic to be in the same place. I think remove_redundant_check.patch

does that by inlining IsIndexUsableForReplicaIdentityFull and RemoteRelContainsLeftMostColumnOnIdx into FindUsableIndexForReplicaIdentityFull().

As Amit noted, the other way around might be more interesting. We expand

IsIndexUsableForReplicaIdentityFull() such that it also includes

RemoteRelContainsLeftMostColumnOnIdx(). With that, readers of

IsIndexUsableForReplicaIdentityFull() can follow the requirements slightly easier.

Though, not sure yet if we can get all the necessary information for the Assert

via ApplyExecutionData in FindReplTupleInLocalRel. Perhaps yes.

Thanks,

Onder

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

13 июля 2023 г., 02:03:27

On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are my comments for v4.
> >
> > ======
> >
> > Docs/Comments:
> >

> > ====
>
> Agreed. I've attached the updated patch. I'll push it barring any objections.
>
> >

I checked v5-0001 and noticed the following:

======
doc/src/sgml/logical-replication.sgml

BEFORE
... and the leftmost index field must be a  column (not an expression)
that reference a published table column.

SUGGESTION ("references the", instead of "reference a")
... and the leftmost index field must be a  column (not an expression)
that references the published table column.

(maybe that last word "column" is also unnecessary?)

======
src/backend/replication/logical/relation.c

BEFORE
The index must be btree, non-partial, and the leftmost field must be a
column (not an expression) that reference the remote relation.

SUGGESTION ("references", instead of "reference")
The index must be btree, non-partial, and the leftmost field must be a
column (not an expression) that references the remote relation.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

13 июля 2023 г., 04:27:41

On Thu, Jul 13, 2023 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Here are my comments for v4.
> > >
> > > ======
> > >
> > > Docs/Comments:
> > >
>
> > > ====
> >
> > Agreed. I've attached the updated patch. I'll push it barring any objections.
> >
> > >
>
> I checked v5-0001 and noticed the following:
>
> ======
> doc/src/sgml/logical-replication.sgml
>
> BEFORE
> ... and the leftmost index field must be a  column (not an expression)
> that reference a published table column.
>
> SUGGESTION ("references the", instead of "reference a")
> ... and the leftmost index field must be a  column (not an expression)
> that references the published table column.

Thanks, will fix.

>
> (maybe that last word "column" is also unnecessary?)

But an index column doesn't reference the published table, but the
published table's column, no?

>
> ======
> src/backend/replication/logical/relation.c
>
> BEFORE
> The index must be btree, non-partial, and the leftmost field must be a
> column (not an expression) that reference the remote relation.
>
> SUGGESTION ("references", instead of "reference")
> The index must be btree, non-partial, and the leftmost field must be a
> column (not an expression) that references the remote relation.
>

Will fix.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

13 июля 2023 г., 05:12:27

On Thu, Jul 13, 2023 at 11:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jul 13, 2023 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
...
> >
> > I checked v5-0001 and noticed the following:
> >
> > ======
> > doc/src/sgml/logical-replication.sgml
> >
> > BEFORE
> > ... and the leftmost index field must be a  column (not an expression)
> > that reference a published table column.
> >
> > SUGGESTION ("references the", instead of "reference a")
> > ... and the leftmost index field must be a  column (not an expression)
> > that references the published table column.
>
> Thanks, will fix.
>
> >
> > (maybe that last word "column" is also unnecessary?)
>
> But an index column doesn't reference the published table, but the
> published table's column, no?
>

Yeah, but there is some inconsistency with the other code comment that
just says "... that references the remote relation.", so I thought one
of them needs to change. If not this one, then the other one.

> >
> > ======
> > src/backend/replication/logical/relation.c
> >
> > BEFORE
> > The index must be btree, non-partial, and the leftmost field must be a
> > column (not an expression) that reference the remote relation.
> >
> > SUGGESTION ("references", instead of "reference")
> > The index must be btree, non-partial, and the leftmost field must be a
> > column (not an expression) that references the remote relation.
> >
>
> Will fix.
>

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

13 июля 2023 г., 05:21:34

On Thu, Jul 13, 2023 at 11:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Jul 13, 2023 at 11:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jul 13, 2023 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> ...
> > >
> > > I checked v5-0001 and noticed the following:
> > >
> > > ======
> > > doc/src/sgml/logical-replication.sgml
> > >
> > > BEFORE
> > > ... and the leftmost index field must be a  column (not an expression)
> > > that reference a published table column.
> > >
> > > SUGGESTION ("references the", instead of "reference a")
> > > ... and the leftmost index field must be a  column (not an expression)
> > > that references the published table column.
> >
> > Thanks, will fix.
> >
> > >
> > > (maybe that last word "column" is also unnecessary?)
> >
> > But an index column doesn't reference the published table, but the
> > published table's column, no?
> >
>
> Yeah, but there is some inconsistency with the other code comment that
> just says "... that references the remote relation.", so I thought one
> of them needs to change. If not this one, then the other one.

Right. So let's add "column" in both places. Attached the updated patch

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v6-0001-Doc-clarify-the-conditions-of-usable-indexes-for-.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Peter Smith

Дата:

13 июля 2023 г., 06:09:05

On Thu, Jul 13, 2023 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jul 13, 2023 at 11:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Jul 13, 2023 at 11:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Thu, Jul 13, 2023 at 8:03 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 12, 2023 at 5:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > > >
> > ...
> > > >
> > > > I checked v5-0001 and noticed the following:
> > > >
> > > > ======
> > > > doc/src/sgml/logical-replication.sgml
> > > >
> > > > BEFORE
> > > > ... and the leftmost index field must be a  column (not an expression)
> > > > that reference a published table column.
> > > >
> > > > SUGGESTION ("references the", instead of "reference a")
> > > > ... and the leftmost index field must be a  column (not an expression)
> > > > that references the published table column.
> > >
> > > Thanks, will fix.
> > >
> > > >
> > > > (maybe that last word "column" is also unnecessary?)
> > >
> > > But an index column doesn't reference the published table, but the
> > > published table's column, no?
> > >
> >
> > Yeah, but there is some inconsistency with the other code comment that
> > just says "... that references the remote relation.", so I thought one
> > of them needs to change. If not this one, then the other one.
>
> Right. So let's add "column" in both places. Attached the updated patch
>

v6-0001 LGTM.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

13 июля 2023 г., 08:25:15

On Wed, Jul 12, 2023 at 11:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 12, 2023 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 12, 2023 at 12:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Jul 11, 2023 at 5:31 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > >
> > > I don't think we have concluded any action for it. I agree that
> > > IsIndexOnlyOnExpression() is redundant. We don't need to check *all*
> > > index fields actually. I've attached a draft patch. It removes
> > > IsIndexOnlyOnExpression() and merges
> > > RemoteRelContainsLeftMostColumnOnIdx() to
> > > FindUsableIndexForReplicaIdentityFull. One concern is that we no
> > > longer do the assertion check with
> > > IsIndexUsableForReplicaIdentityFull(). What do you think?
> > >
> >
> > I think this is a valid concern. Can't we move all the checks
> > (including the remote attrs check) inside
> > IsIndexUsableForReplicaIdentityFull() and then call it from both
> > places? Won't we have attrmap information available in the callers of
> > FindReplTupleInLocalRel() via ApplyExecutionData?
>
> You mean to pass ApplyExecutionData or attrmap down to
> RelationFindReplTupleByIndex()? I think it would be better to call it
> from FindReplTupleInLocalRel() instead.

I've attached a draft patch for this idea.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

remove_redundant_check_v2.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

15 июля 2023 г., 08:11:03

On Thu, Jul 13, 2023 at 10:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 12, 2023 at 11:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > I think this is a valid concern. Can't we move all the checks
> > > (including the remote attrs check) inside
> > > IsIndexUsableForReplicaIdentityFull() and then call it from both
> > > places? Won't we have attrmap information available in the callers of
> > > FindReplTupleInLocalRel() via ApplyExecutionData?
> >
> > You mean to pass ApplyExecutionData or attrmap down to
> > RelationFindReplTupleByIndex()? I think it would be better to call it
> > from FindReplTupleInLocalRel() instead.
>
> I've attached a draft patch for this idea.
>

Looks reasonable to me. However, I am not very sure if we need to
change the prototype of RelationFindReplTupleByIndex(). Few other
minor comments:

1.
- * has been implemented as a tri-state with values DISABLED, PENDING, and
+n * has been implemented as a tri-state with values DISABLED, PENDING, and
  * ENABLED.
  *
The above seems like a spurious change.

2.
+ /* And must reference the remote relation column */
+ if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol) ||
+ attrmap->attnums[AttrNumberGetAttrOffset(keycol)] < 0)
+ return false;
+

I think we should specify the reason for this. I see that in the
commit fd48a86c62 [1], the reason for this is removed. Shouldn't we
retain that in some form?

[1] -
- * We also skip indexes if the remote relation does not contain the leftmost
- * column of the index. This is because in most such cases sequential scan is
- * favorable over index scan.

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Önder Kalacı

Дата:

17 июля 2023 г., 10:24:27

Hi,

I've attached a draft patch for this idea.

I think it needs a rebase after edca3424342da323499a1998d18a888283e52ac7.

Also, as discussed in [1], I think one improvement we had was to keep IsIndexUsableForReplicaIdentityFull() in a way that it is easier to read & better documented in the code. So, it would be nice to stick to that.

Overall, the proposed changes make sense to me as well. Once the patch is ready, I'm happy to review & test in detail.

Thanks,

Onder

[1] https://www.postgresql.org/message-id/CAA4eK1Jcyrxt_84wt2%3DQnOcwwJEC2et%2BtCLjAuTXzE6N3FXqQw%40mail.gmail.com

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

18 июля 2023 г., 09:39:32

On Sat, Jul 15, 2023 at 2:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 13, 2023 at 10:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 12, 2023 at 11:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > I think this is a valid concern. Can't we move all the checks
> > > > (including the remote attrs check) inside
> > > > IsIndexUsableForReplicaIdentityFull() and then call it from both
> > > > places? Won't we have attrmap information available in the callers of
> > > > FindReplTupleInLocalRel() via ApplyExecutionData?
> > >
> > > You mean to pass ApplyExecutionData or attrmap down to
> > > RelationFindReplTupleByIndex()? I think it would be better to call it
> > > from FindReplTupleInLocalRel() instead.
> >
> > I've attached a draft patch for this idea.
> >
>
> Looks reasonable to me. However, I am not very sure if we need to
> change the prototype of RelationFindReplTupleByIndex(). Few other
> minor comments:

Agreed. I reverted the change.

>
> 1.
> - * has been implemented as a tri-state with values DISABLED, PENDING, and
> +n * has been implemented as a tri-state with values DISABLED, PENDING, and
>   * ENABLED.
>   *
> The above seems like a spurious change.

Fixed.

>
> 2.
> + /* And must reference the remote relation column */
> + if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol) ||
> + attrmap->attnums[AttrNumberGetAttrOffset(keycol)] < 0)
> + return false;
> +
>
> I think we should specify the reason for this. I see that in the
> commit fd48a86c62 [1], the reason for this is removed. Shouldn't we
> retain that in some form?

Agreed.

I've updated the patch. Regarding one change in the patch:

  * Returns the oid of an index that can be used by the apply worker to scan
- * the relation. The index must be btree or hash, non-partial, and the leftmost
- * field must be a column (not an expression) that references the remote
- * relation column. These limitations help to keep the index scan similar
- * to PK/RI index scans.
+ * the relation.

I moved the second sentence to IsIndexUsableForReplicaIdentityFull()
because this function is now responsible for checking if the given
index is usable for REPLICA IDENTITY FULL tables. I think it would be
better to mention all conditions for such usable indexes in one place.
Currently, the conditions are explained in
FindUsableIndexForReplicaIdentityFull() but the checks are performed
and the details are explained in
IsIndexUsableForReplicaIdentityFull().

BTW, IsIndexOnlyExpression() is not necessary but the current code
still works fine. So do we need to backpatch it to PG16? I'm thinking
we can apply it to only HEAD.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v3-0001-Remove-unnecessary-checks-for-indexes-for-REPLICA.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

18 июля 2023 г., 12:34:35

On Tue, Jul 18, 2023 at 12:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> BTW, IsIndexOnlyExpression() is not necessary but the current code
> still works fine. So do we need to backpatch it to PG16? I'm thinking
> we can apply it to only HEAD.
>

Either way is fine but I think if we backpatch it then the code
remains consistent and the backpatching would be easier.

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Önder Kalacı

Дата:

19 июля 2023 г., 11:09:41

Hi Masahiko, Amit, all

I've updated the patch.

I think the flow is much nicer now compared to the HEAD. I really don't have any

comments regarding the accuracy of the code changes, all looks good to me.

Overall, I cannot see any behavioral changes as you already alluded to.

Maybe few minor notes regarding the comments:

/*
+ * And must reference the remote relation column. This is because if it
+ * doesn't, the sequential scan is favorable over index scan in most
+ * cases..
+ */

I think the reader might have lost the context (or say in the future due to

another refactor etc). So maybe start with:

/* And the leftmost index field must refer to the ...

Also, now in IsIndexUsableForReplicaIdentityFull() some of the conditions have comments

some don't. Should we comment on the missing ones as well, maybe such as:

/* partial indexes are not support *
if (indexInfo->ii_Predicate != NIL)

and,

/* all indexes must have at least one attribute */
Assert(indexInfo->ii_NumIndexAttrs >= 1);

BTW, IsIndexOnlyExpression() is not necessary but the current code
still works fine. So do we need to backpatch it to PG16? I'm thinking
we can apply it to only HEAD.

Either way is fine but I think if we backpatch it then the code
remains consistent and the backpatching would be easier.

Yeah, I also have a slight preference for backporting. It could make it easier to maintain the code

in the future in case of another backport(s). With the cost of making it slightly harder for you now :)

Thanks,

Onder

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

21 июля 2023 г., 04:25:13

On Wed, Jul 19, 2023 at 5:09 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi Masahiko, Amit, all
>
>> I've updated the patch.
>
>
> I think the flow is much nicer now compared to the HEAD. I really don't have any
> comments regarding the accuracy of the code changes, all looks good to me.
> Overall, I cannot see any behavioral changes as you already alluded to.

Thank you for reviewing the patch.

>
> Maybe few minor notes regarding the comments:
>
>>  /*
>> + * And must reference the remote relation column. This is because if it
>> + * doesn't, the sequential scan is favorable over index scan in most
>> + * cases..
>> + */
>
>
> I think the reader might have lost the context (or say in the future due to
> another refactor etc). So maybe start with:
>
>> /* And the leftmost index field must refer to the  ...

Fixed.

>
>
> Also, now in IsIndexUsableForReplicaIdentityFull() some of the conditions have comments
> some don't. Should we comment on the missing ones as well, maybe such as:
>
>> /* partial indexes are not support *
>> if (indexInfo->ii_Predicate != NIL)
>
> and,
>>
>> /* all indexes must have at least one attribute */
>> Assert(indexInfo->ii_NumIndexAttrs >= 1);

Agreed. But I don't think the latter comment is necessary as it's obvious.

>
>
>
>>
>>>
>>>
>>> BTW, IsIndexOnlyExpression() is not necessary but the current code
>>> still works fine. So do we need to backpatch it to PG16? I'm thinking
>>> we can apply it to only HEAD.
>>
>> Either way is fine but I think if we backpatch it then the code
>> remains consistent and the backpatching would be easier.
>
>
> Yeah, I also have a slight preference for backporting. It could make it easier to maintain the code
> in the future in case of another backport(s). With the cost of making it slightly harder for you now :)

Agreed.

I've attached the updated patch. I'll push it early next week, barring
any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v4-0001-Remove-unnecessary-checks-for-indexes-for-REPLICA.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

22 июля 2023 г., 13:32:45

On Fri, Jul 21, 2023 at 6:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached the updated patch. I'll push it early next week, barring
> any objections.
>

You have moved most of the comments related to the restriction of
which index can be picked atop IsIndexUsableForReplicaIdentityFull().
Now, the comments related to limitation atop
FindUsableIndexForReplicaIdentityFull() look slightly odd as it refers
to limitations but those limitation were not stated. The comments I am
referring to are: "Note that the limitations of index scans for
replica identity full only .... might not be a good idea in some
cases". Shall we move these as well atop
IsIndexUsableForReplicaIdentityFull()?

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

24 июля 2023 г., 04:09:13

On Sat, Jul 22, 2023 at 7:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 21, 2023 at 6:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached the updated patch. I'll push it early next week, barring
> > any objections.
> >
>
> You have moved most of the comments related to the restriction of
> which index can be picked atop IsIndexUsableForReplicaIdentityFull().
> Now, the comments related to limitation atop
> FindUsableIndexForReplicaIdentityFull() look slightly odd as it refers
> to limitations but those limitation were not stated. The comments I am
> referring to are: "Note that the limitations of index scans for
> replica identity full only .... might not be a good idea in some
> cases". Shall we move these as well atop
> IsIndexUsableForReplicaIdentityFull()?

Good point.

Looking at neighbor comments, the following comment looks slightly odd to me:

 * XXX: See IsIndexUsableForReplicaIdentityFull() to know the challenges in
 * supporting indexes other than btree and hash. For partial indexes, the
 * required changes are likely to be larger. If none of the tuples satisfy
 * the expression for the index scan, we fall-back to sequential execution,
 * which might not be a good idea in some cases.

Are the first and second sentences related actually?

I think we can move it as well to
IsIndexUsableForReplicaIdentityFull() with some adjustments. I've
attached the updated patch that incorporated your comment and included
my idea to update the comment.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

v5-0001-Remove-unnecessary-checks-for-indexes-for-REPLICA.patch

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Amit Kapila

Дата:

24 июля 2023 г., 06:04:55

On Mon, Jul 24, 2023 at 6:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Jul 22, 2023 at 7:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > You have moved most of the comments related to the restriction of
> > which index can be picked atop IsIndexUsableForReplicaIdentityFull().
> > Now, the comments related to limitation atop
> > FindUsableIndexForReplicaIdentityFull() look slightly odd as it refers
> > to limitations but those limitation were not stated. The comments I am
> > referring to are: "Note that the limitations of index scans for
> > replica identity full only .... might not be a good idea in some
> > cases". Shall we move these as well atop
> > IsIndexUsableForReplicaIdentityFull()?
>
> Good point.
>
> Looking at neighbor comments, the following comment looks slightly odd to me:
>
>  * XXX: See IsIndexUsableForReplicaIdentityFull() to know the challenges in
>  * supporting indexes other than btree and hash. For partial indexes, the
>  * required changes are likely to be larger. If none of the tuples satisfy
>  * the expression for the index scan, we fall-back to sequential execution,
>  * which might not be a good idea in some cases.
>
> Are the first and second sentences related actually?
>

Not really.

> I think we can move it as well to
> IsIndexUsableForReplicaIdentityFull() with some adjustments. I've
> attached the updated patch that incorporated your comment and included
> my idea to update the comment.
>

LGTM.

--
With Regards,
Amit Kapila.

Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

От

Masahiko Sawada

Дата:

25 июля 2023 г., 10:44:34

On Mon, Jul 24, 2023 at 12:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 24, 2023 at 6:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Jul 22, 2023 at 7:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > You have moved most of the comments related to the restriction of
> > > which index can be picked atop IsIndexUsableForReplicaIdentityFull().
> > > Now, the comments related to limitation atop
> > > FindUsableIndexForReplicaIdentityFull() look slightly odd as it refers
> > > to limitations but those limitation were not stated. The comments I am
> > > referring to are: "Note that the limitations of index scans for
> > > replica identity full only .... might not be a good idea in some
> > > cases". Shall we move these as well atop
> > > IsIndexUsableForReplicaIdentityFull()?
> >
> > Good point.
> >
> > Looking at neighbor comments, the following comment looks slightly odd to me:
> >
> >  * XXX: See IsIndexUsableForReplicaIdentityFull() to know the challenges in
> >  * supporting indexes other than btree and hash. For partial indexes, the
> >  * required changes are likely to be larger. If none of the tuples satisfy
> >  * the expression for the index scan, we fall-back to sequential execution,
> >  * which might not be a good idea in some cases.
> >
> > Are the first and second sentences related actually?
> >
>
> Not really.
>
> > I think we can move it as well to
> > IsIndexUsableForReplicaIdentityFull() with some adjustments. I've
> > attached the updated patch that incorporated your comment and included
> > my idea to update the comment.
> >
>
> LGTM.

Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения