Обсуждение: Table AM Interface Enhancements

Поиск
Список
Период
Сортировка

Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hello PostgreSQL Hackers,

I am pleased to submit a series of patches related to the Table Access
Method (AM) interface, which I initially announced during my talk at
PGCon 2023 [1]. These patches are primarily designed to support the
OrioleDB engine, but I believe they could be beneficial for other
table AM implementations as well.

The focus of these patches is to introduce more flexibility and
capabilities into the Table AM interface. This is particularly
relevant for advanced use cases like index-organized tables,
alternative MVCC implementations, etc.

Here's a brief overview of the patches included in this set:

0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch

Optimizes the process of locking concurrently updated tuples during
update and delete operations. Helpful for table AMs where refinding
existing tuples is expensive.

0002-Add-EvalPlanQual-delete-returning-isolation-test-v1.patch

The new isolation test is related to the previous patch.  These two
patches were previously discussed in [2].

0003-Allow-table-AM-to-store-complex-data-structures-i-v1.patch

Allows table AM to store complex data structure in rd_amcache rather
than a single chunk of memory.

0004-Add-table-AM-tuple_is_current-method-v1.patch

This allows us to abstract how/whether table AM uses transaction identifiers.

0005-Generalize-relation-analyze-in-table-AM-interface-v1.patch

Provides a more flexible API for sampling tuples, beneficial for
non-standard table types like index-organized tables.

0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch

Provides a new table AM API method to encapsulate the whole INSERT ...
ON CONFLICT ... algorithm rather than just implementation of
speculative tokens.

0007-Allow-table-AM-tuple_insert-method-to-return-the--v1.patch

This allows table AM to return a native tuple slot, which is aware of
table AM-specific system attributes.

0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch

Allows table AM to skip index insertions in the executor and handle
those insertions itself.

0009-Custom-reloptions-for-table-AM-v1.patch

Enables table AMs to define and override reloptions for tables and indexes.

0010-Notify-table-AM-about-index-creation-v1.patch

Allows table AMs to prepare or update specific meta-information during
index creation.

011-Introduce-RowRefType-which-describes-the-table-ro-v1.patch

Separates the row identifier type from the lock mode in RowMarkType,
providing clearer semantics and more flexibility.

0012-Introduce-RowID-bytea-tuple-identifier-v1.patch

`This patch introduces 'RowID', a new bytea tuple identifier, to
overcome the limitations of the current 32-bit block number and 16-bit
offset-based tuple identifier. This is particularly useful for
index-organized tables and other advanced use cases.

Each commit message contains a detailed explanation of the changes and
their rationale. I believe these enhancements will significantly
improve the flexibility and capabilities of the PostgreSQL Table AM
interface.

I am looking forward to your feedback and suggestions on these patches.

Links

1. https://www.pgcon.org/events/pgcon_2023/schedule/session/470-future-of-table-access-methods/
2. https://www.postgresql.org/message-id/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Matthias van de Meent
Дата:
Hi,

On Thu, 23 Nov 2023 at 13:43, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hello PostgreSQL Hackers,
>
> I am pleased to submit a series of patches related to the Table Access
> Method (AM) interface, which I initially announced during my talk at
> PGCon 2023 [1]. These patches are primarily designed to support the
> OrioleDB engine, but I believe they could be beneficial for other
> table AM implementations as well.
>
> The focus of these patches is to introduce more flexibility and
> capabilities into the Table AM interface. This is particularly
> relevant for advanced use cases like index-organized tables,
> alternative MVCC implementations, etc.
>
> Here's a brief overview of the patches included in this set:

Note: no significant review of the patches, just a first response on
the cover letters and oddities I noticed:

Overall, this patchset adds significant API area to TableAmRoutine,
without adding the relevant documentation on how it's expected to be
used. With the overall size of the patchset also being very
significant, I don't think this patch is reviewable as is; the goal
isn't clear enough, the APIs aren't well explained, and the
interactions with the index API are left up in the air.

> 0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch
>
> Optimizes the process of locking concurrently updated tuples during
> update and delete operations. Helpful for table AMs where refinding
> existing tuples is expensive.

Is this essentially an optimized implementation of the "DELETE FROM
... RETURNING *" per-tuple primitive?

> 0003-Allow-table-AM-to-store-complex-data-structures-i-v1.patch
>
> Allows table AM to store complex data structure in rd_amcache rather
> than a single chunk of memory.

I don't think we should allow arbitrarily large and arbitrarily many
chunks of data in the relcache or table caches. Why isn't one chunk
enough?

> 0004-Add-table-AM-tuple_is_current-method-v1.patch
>
> This allows us to abstract how/whether table AM uses transaction identifiers.

I'm not a fan of the indirection here. Also, assuming that table slots
don't outlive transactions, wouldn't this be a more appropriate fit
with the table tuple slot API?

> 0005-Generalize-relation-analyze-in-table-AM-interface-v1.patch
>
> Provides a more flexible API for sampling tuples, beneficial for
> non-standard table types like index-organized tables.
>
> 0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch
>
> Provides a new table AM API method to encapsulate the whole INSERT ...
> ON CONFLICT ... algorithm rather than just implementation of
> speculative tokens.

Does this not still require speculative inserts, with speculative
tokens, for secondary indexes? Why make AMs implement that all over
again?

> 0007-Allow-table-AM-tuple_insert-method-to-return-the--v1.patch
>
> This allows table AM to return a native tuple slot, which is aware of
> table AM-specific system attributes.

This seems reasonable.

> 0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch
>
> Allows table AM to skip index insertions in the executor and handle
> those insertions itself.

Who handles index tuple removal then? I don't see a patch that
describes index AM changes for this...

> 0009-Custom-reloptions-for-table-AM-v1.patch
>
> Enables table AMs to define and override reloptions for tables and indexes.
>
> 0010-Notify-table-AM-about-index-creation-v1.patch
>
> Allows table AMs to prepare or update specific meta-information during
> index creation.

I don't think the described use case of this API is OK - a table AM
cannot know about the internals of index AMs, and is definitely not
allowed to overwrite the information of that index.
If I ask for an index that uses the "btree" index, then that needs to
be the AM actually used, or an error needs to be raised if it is
somehow incompatible with the table AM used. It can't be that we
silently update information and create an index that is explicitly not
what the user asked to create.

I also don't see updates in documentation, which I think is quite a
shame as I have trouble understanding some parts.

> 0012-Introduce-RowID-bytea-tuple-identifier-v1.patch
>
> `This patch introduces 'RowID', a new bytea tuple identifier, to
> overcome the limitations of the current 32-bit block number and 16-bit
> offset-based tuple identifier. This is particularly useful for
> index-organized tables and other advanced use cases.

We don't have any index methods that can handle anything but
block+offset TIDs, and I don't see any changes to the IndexAM APIs to
support these RowID tuples, so what's the plan here? I don't see any
of that in the commit message, nor in the rest of this patchset.

> Each commit message contains a detailed explanation of the changes and
> their rationale. I believe these enhancements will significantly
> improve the flexibility and capabilities of the PostgreSQL Table AM
> interface.

I've noticed there is not a lot of rationale for several of the
changes as to why PostgreSQL needs these changes implemented like
this, amongst which the index-related tableAM changes.

I understand that index-organized tables can be quite useful, but I
don't see design solutions to the more complex questions that would
still be required before we could host such table AMs like OreoleDB's
as a first-party citizen: For index-organized tables, you also need
index AM APIs that support TIDS with more than 48 bits of data
(assuming we actually want primary keys with >48 bits of usable
space), and for undo-based logging you would probably need index APIs
for retail index tuple deletion. Neither is supplied here, nor is
described why these APIs were omitted.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Table AM Interface Enhancements

От
Mark Dilger
Дата:

> On Nov 23, 2023, at 4:42 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:


> 0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch
>
> Provides a new table AM API method to encapsulate the whole INSERT ...
> ON CONFLICT ... algorithm rather than just implementation of
> speculative tokens.

I *think* I understand that you are taking the part of INSERT..ON CONFLICT that lives outside the table AM and pulling
itinside so that table AM authors are free to come up with whatever implementation is more suited for them.  The most
straightforwardway of doing so results in an EState parameter in the interface definition.  That seems not so good, as
theEState is a fairly complicated structure, and future changes in the executor might want to rearrange what EState
tracks,which would change which values tuple_insert_with_arbiter() can depend on.  Should the patch at least document
whichparts of the EState are expected to be in which states, and which parts should be viewed as undefined?  If the
implementorsof table AMs rely on any/all aspects of EState, doesn't that prevent future changes to how that structure
isused? 

> 0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch
>
> Allows table AM to skip index insertions in the executor and handle
> those insertions itself.

The new parameter could use more documentation.

> 0009-Custom-reloptions-for-table-AM-v1.patch
>
> Enables table AMs to define and override reloptions for tables and indexes.

This could use some regression tests to exercise the custom reloptions.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Fri, Nov 24, 2023 at 5:18 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> > On Nov 23, 2023, at 4:42 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
>
> > 0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch
> >
> > Provides a new table AM API method to encapsulate the whole INSERT ...
> > ON CONFLICT ... algorithm rather than just implementation of
> > speculative tokens.
>
> I *think* I understand that you are taking the part of INSERT..ON CONFLICT that lives outside the table AM and
pullingit inside so that table AM authors are free to come up with whatever implementation is more suited for them.
Themost straightforward way of doing so results in an EState parameter in the interface definition.  That seems not so
good,as the EState is a fairly complicated structure, and future changes in the executor might want to rearrange what
EStatetracks, which would change which values tuple_insert_with_arbiter() can depend on. 

I think this is the correct understanding.

> Should the patch at least document which parts of the EState are expected to be in which states, and which parts
shouldbe viewed as undefined?  If the implementors of table AMs rely on any/all aspects of EState, doesn't that prevent
futurechanges to how that structure is used? 

New tuple tuple_insert_with_arbiter() table AM API method needs EState
argument to call executor functions: ExecCheckIndexConstraints(),
ExecUpdateLockMode(), and ExecInsertIndexTuples().  I think we
probably need to invent some opaque way to call this executor function
without revealing EState to table AM.  Do you think this could work?

> > 0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch
> >
> > Allows table AM to skip index insertions in the executor and handle
> > those insertions itself.
>
> The new parameter could use more documentation.
>
> > 0009-Custom-reloptions-for-table-AM-v1.patch
> >
> > Enables table AMs to define and override reloptions for tables and indexes.
>
> This could use some regression tests to exercise the custom reloptions.

Thank you for these notes.  I'll take this into account for the next
patchset version.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Mark Dilger
Дата:

> On Nov 25, 2023, at 9:47 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
>> Should the patch at least document which parts of the EState are expected to be in which states, and which parts
shouldbe viewed as undefined?  If the implementors of table AMs rely on any/all aspects of EState, doesn't that prevent
futurechanges to how that structure is used? 
>
> New tuple tuple_insert_with_arbiter() table AM API method needs EState
> argument to call executor functions: ExecCheckIndexConstraints(),
> ExecUpdateLockMode(), and ExecInsertIndexTuples().  I think we
> probably need to invent some opaque way to call this executor function
> without revealing EState to table AM.  Do you think this could work?

We're clearly not accessing all of the EState, just some specific fields, such as es_per_tuple_exprcontext.  I think
youcould at least refactor to pass the minimum amount of state information through the table AM API. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

I think table AM extensibility is a very good idea generally, not only in the scope of APIs that are needed in OrioleDB. Thanks for your proposals!

For patches
0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch
0002-Add-EvalPlanQual-delete-returning-isolation-test-v1.patch

The new isolation test is related to the previous patch.  These two
patches were previously discussed in [2].

As discussion in [2] seems close to the patches being committed and the only thing it is not in v16 yet is that it was too close to feature freeze, I've copied these most recent versions of patches 0001 and 0002 from this thread in [2] to finish and commit them there. 

I'm planning to review some of the other patches from the current patchset soon.


Kind regards,
Pavel Borisov

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!
I'm planning to review some of the other patches from the current patchset soon.

I've looked into the patch 0003.
The patch looks in good shape and is uncontroversial to me. Making memory structures to be dynamically allocated is simple enough and it allows to store complex data like lists etc. I consider places like this that expect memory structures to be flat and allocated at once are because the was no need in more complex ones previously. If there is a need for them, I think they could be added without much doubt, provided the simplicity of the change.

For the code:
+static inline void
+table_free_rd_amcache(Relation rel)
+{
+ if (rel->rd_tableam && rel->rd_tableam->free_rd_amcache)
+ {
+ rel->rd_tableam->free_rd_amcache(rel);
+ }
+ else
+ {
+ if (rel->rd_amcache)
+ pfree(rel->rd_amcache);
+ rel->rd_amcache = NULL;
+ }

here I suggest adding Assert(rel->rd_amcache == NULL) (or maybe better an error report) after calling free_rd_amcache to be sure the custom implementation has done what it should do. 

Also, I think some brief documentation about writing this custom method is quite relevant maybe based on already existing comments in the code. 

Kind regards,
Pavel

Re: Table AM Interface Enhancements

От
Nikita Malakhov
Дата:
Hi,

Pavel, as far as I understand Alexander's idea assertion and especially ereport
here does not make any sense - this method is not considered to report error, it
silently calls if there is underlying [free] function and simply falls through otherwise,
also, take into account that it could be located in the uninterruptible part of the code.

On the whole topic I have to 

On Wed, Nov 29, 2023 at 4:56 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
Hi, Alexander!
I'm planning to review some of the other patches from the current patchset soon.

I've looked into the patch 0003.
The patch looks in good shape and is uncontroversial to me. Making memory structures to be dynamically allocated is simple enough and it allows to store complex data like lists etc. I consider places like this that expect memory structures to be flat and allocated at once are because the was no need in more complex ones previously. If there is a need for them, I think they could be added without much doubt, provided the simplicity of the change.

For the code:
+static inline void
+table_free_rd_amcache(Relation rel)
+{
+ if (rel->rd_tableam && rel->rd_tableam->free_rd_amcache)
+ {
+ rel->rd_tableam->free_rd_amcache(rel);
+ }
+ else
+ {
+ if (rel->rd_amcache)
+ pfree(rel->rd_amcache);
+ rel->rd_amcache = NULL;
+ }

here I suggest adding Assert(rel->rd_amcache == NULL) (or maybe better an error report) after calling free_rd_amcache to be sure the custom implementation has done what it should do. 

Also, I think some brief documentation about writing this custom method is quite relevant maybe based on already existing comments in the code. 

Kind regards,
Pavel


--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Nikita!

On Wed, 29 Nov 2023 at 18:27, Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi,

Pavel, as far as I understand Alexander's idea assertion and especially ereport
here does not make any sense - this method is not considered to report error, it
silently calls if there is underlying [free] function and simply falls through otherwise,
also, take into account that it could be located in the uninterruptible part of the code.

On the whole topic I have to 

On Wed, Nov 29, 2023 at 4:56 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
Hi, Alexander!
I'm planning to review some of the other patches from the current patchset soon.

I've looked into the patch 0003.
The patch looks in good shape and is uncontroversial to me. Making memory structures to be dynamically allocated is simple enough and it allows to store complex data like lists etc. I consider places like this that expect memory structures to be flat and allocated at once are because the was no need in more complex ones previously. If there is a need for them, I think they could be added without much doubt, provided the simplicity of the change.

For the code:
+static inline void
+table_free_rd_amcache(Relation rel)
+{
+ if (rel->rd_tableam && rel->rd_tableam->free_rd_amcache)
+ {
+ rel->rd_tableam->free_rd_amcache(rel);
+ }
+ else
+ {
+ if (rel->rd_amcache)
+ pfree(rel->rd_amcache);
+ rel->rd_amcache = NULL;
+ }

here I suggest adding Assert(rel->rd_amcache == NULL) (or maybe better an error report) after calling free_rd_amcache to be sure the custom implementation has done what it should do. 

Also, I think some brief documentation about writing this custom method is quite relevant maybe based on already existing comments in the code. 

Kind regards,
Pavel

When we do default single chunk routine we invalidate rd_amcache pointer, 
+ if (rel->rd_amcache)
+ pfree(rel->rd_amcache);
+ rel->rd_amcache = NULL;

If we delegate this to method, my idea is to check the method implementation don't leave this pointer valid.
If it's not needed, I'm ok with it, but to me it seems that the check I proposed makes sense.

Regards,
Pavel

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

I've reviewed patch 0004. It's clear enough and I think does what it's supposed.
One thing, in function signature 
+bool (*tuple_is_current) (Relation rel, TupleTableSlot *slot);
there is a Relation agrument, which is unused in both existing heapam method. Also it's unused in OrioleDb implementation of tuple_is_current. For what goal it is needed in the interface?

No other objections around this patch.

I've also looked at 0005-0007. Although it is not a thorough review, they seem to depend on previous patch 0004.
Additionally changes in 0007 looks dependent from 0005. Does replacement of slot inside ExecInsert, that is already used in the code below the call of 

>/* insert the tuple normally */
>- table_tuple_insert(resultRelationDesc, slot,
>-   estate->es_output_cid,
>-   0, NULL); 

could be done without side effects?

Kind regards,
Pavel.

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:

Additionally changes in 0007 looks dependent from 0005. Does replacement of slot inside ExecInsert, that is already used in the code below the call of 

>/* insert the tuple normally */
>- table_tuple_insert(resultRelationDesc, slot,
>-   estate->es_output_cid,
>-   0, NULL); 

could be done without side effects?

I'm sorry that I inserter not all relevant code in the previous message:

    /* insert the tuple normally */
- table_tuple_insert(resultRelationDesc, slot,
-   estate->es_output_cid,
-   0, NULL);
+ slot = table_tuple_insert(resultRelationDesc, slot,
+  estate->es_output_cid,
(Previously slot variable that exists in the ExecInsert() and could be used later was not modified at the quoted code block)

Pavel.

Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi, Matthias!

On Fri, Nov 24, 2023 at 1:07 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Thu, 23 Nov 2023 at 13:43, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Hello PostgreSQL Hackers,
> >
> > I am pleased to submit a series of patches related to the Table Access
> > Method (AM) interface, which I initially announced during my talk at
> > PGCon 2023 [1]. These patches are primarily designed to support the
> > OrioleDB engine, but I believe they could be beneficial for other
> > table AM implementations as well.
> >
> > The focus of these patches is to introduce more flexibility and
> > capabilities into the Table AM interface. This is particularly
> > relevant for advanced use cases like index-organized tables,
> > alternative MVCC implementations, etc.
> >
> > Here's a brief overview of the patches included in this set:
>
> Note: no significant review of the patches, just a first response on
> the cover letters and oddities I noticed:
>
> Overall, this patchset adds significant API area to TableAmRoutine,
> without adding the relevant documentation on how it's expected to be
> used.

I have to note that, unlike documentation for index access methods,
our documentation for table access methods doesn't have an explanation
of API functions.  Instead, it just refers to tableam.h for details.
The patches touching tableam.h also revise relevant comments.  These
comments are for sure a target for improvements.

> With the overall size of the patchset also being very
> significant

I wouldn't say that volume is very significant.  It's just 2K lines,
not the great size of a patchset.  But it for sure represents changes
of great importance.

> I don't think this patch is reviewable as is; the goal
> isn't clear enough,

The goal is to revise table AM API so that new full-featured
implementations could exist.  AFAICS, the current API was designed
keeping zheap in mind, but even zheap was always shipped with the core
patch.  All other implementations of table AM, which I've seen, are
very limited.  Basically, there is still no real alternative and
functional OLTP table AM.  I believe API limitation is one of the
reasons for that.

> the APIs aren't well explained, and

As I mentioned before, the table AM API is documented by the comments
in tableam.h.  The comments in the patchset aren't perfect for sure,
but a subject for the incremental improvements.

> the interactions with the index API are left up in the air.

Right.  These patches bring more control on interactions with indexes
to table AMs without touching the index API.  In my PGCon 2016 talk I
proposed that table AM could have its own implementation of index AM.

As you mentioned before, this patchset isn't very small already.
Considering it all together with a patchset for index AM redesign
would make it a mess.  I propose we can consider here the patches,
which are usable by themselves even without index AM changes.  And the
patches tightly coupled with index AM API changes could be considered
later together with those changes.

> > 0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch
> >
> > Optimizes the process of locking concurrently updated tuples during
> > update and delete operations. Helpful for table AMs where refinding
> > existing tuples is expensive.
>
> Is this essentially an optimized implementation of the "DELETE FROM
> ... RETURNING *" per-tuple primitive?

Not really.  The test for "DELETE FROM ... RETURNING *" was used just
to reproduce one of the bugs stopped in [2].  The general idea is to
avoid repeated calls for tuple lock.

> > 0003-Allow-table-AM-to-store-complex-data-structures-i-v1.patch
> >
> > Allows table AM to store complex data structure in rd_amcache rather
> > than a single chunk of memory.
>
> I don't think we should allow arbitrarily large and arbitrarily many
> chunks of data in the relcache or table caches.

Hmm.. It seems to be far out of control of API what and how large
PostgreSQL extensions could actually cache.

> Why isn't one chunk
> enough?

It's generally possible to fit everything into one chunk, but that's
extremely unhandy when your cache contains something at least as
complex as tuple slots and descriptors.  I think the reason that we
still have one chunk restriction is that we don't have a full-featured
implementation fitting API yet.  If we had it, I can't imagine there
would be one chunk for a cache.

> > 0004-Add-table-AM-tuple_is_current-method-v1.patch
> >
> > This allows us to abstract how/whether table AM uses transaction identifiers.
>
> I'm not a fan of the indirection here. Also, assuming that table slots
> don't outlive transactions, wouldn't this be a more appropriate fit
> with the table tuple slot API?

This is a good idea.  I will update the patch accordingly.

> > 0005-Generalize-relation-analyze-in-table-AM-interface-v1.patch
> >
> > Provides a more flexible API for sampling tuples, beneficial for
> > non-standard table types like index-organized tables.
> >
> > 0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch
> >
> > Provides a new table AM API method to encapsulate the whole INSERT ...
> > ON CONFLICT ... algorithm rather than just implementation of
> > speculative tokens.
>
> Does this not still require speculative inserts, with speculative
> tokens, for secondary indexes? Why make AMs implement that all over
> again?

The idea here is to generalize upsert and leave speculative tokens as
details of one particular implementation.  Imagine an index-organized
table and upsert on primary key.  For that you need to just locate the
relevant page in a tree and do insert or update.  Speculative tokens
would rather be an unreasonable complication for this case.

> > 0007-Allow-table-AM-tuple_insert-method-to-return-the--v1.patch
> >
> > This allows table AM to return a native tuple slot, which is aware of
> > table AM-specific system attributes.
>
> This seems reasonable.
>
> > 0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch
> >
> > Allows table AM to skip index insertions in the executor and handle
> > those insertions itself.
>
> Who handles index tuple removal then?

Table AM implementation decides what actions to perform on tuple
update/delete.  The reason why it can't really care about updating
indexes is that the executor already does it.
The situation is different with deletes, because the executor doesn't
do something immediately about the corresponding index tuples.  They
are deleted later by vacuum, which is also controlled by table AM
implementation.

> I don't see a patch that describes index AM changes for this...

Yes, index AM should be revised for that.  See my comment about that earlier.

> > 0009-Custom-reloptions-for-table-AM-v1.patch
> >
> > Enables table AMs to define and override reloptions for tables and indexes.
> >
> > 0010-Notify-table-AM-about-index-creation-v1.patch
> >
> > Allows table AMs to prepare or update specific meta-information during
> > index creation.
>
> I don't think the described use case of this API is OK - a table AM
> cannot know about the internals of index AMs, and is definitely not
> allowed to overwrite the information of that index.
> If I ask for an index that uses the "btree" index, then that needs to
> be the AM actually used, or an error needs to be raised if it is
> somehow incompatible with the table AM used. It can't be that we
> silently update information and create an index that is explicitly not
> what the user asked to create.

I agree that this currently looks more like workarounds rather than
proper API changes.  I propose these two should be considered later
together with relevant index API changes.

> I also don't see updates in documentation, which I think is quite a
> shame as I have trouble understanding some parts.

Sorry for this.  I hope I gave some answers in this message and I'll
update the patchset comments and commit messages accordingly.  And I'm
open to answer any further questions.

> > 0012-Introduce-RowID-bytea-tuple-identifier-v1.patch
> >
> > `This patch introduces 'RowID', a new bytea tuple identifier, to
> > overcome the limitations of the current 32-bit block number and 16-bit
> > offset-based tuple identifier. This is particularly useful for
> > index-organized tables and other advanced use cases.
>
> We don't have any index methods that can handle anything but
> block+offset TIDs, and I don't see any changes to the IndexAM APIs to
> support these RowID tuples, so what's the plan here? I don't see any
> of that in the commit message, nor in the rest of this patchset.
>
> > Each commit message contains a detailed explanation of the changes and
> > their rationale. I believe these enhancements will significantly
> > improve the flexibility and capabilities of the PostgreSQL Table AM
> > interface.
>
> I've noticed there is not a lot of rationale for several of the
> changes as to why PostgreSQL needs these changes implemented like
> this, amongst which the index-related tableAM changes.
>
> I understand that index-organized tables can be quite useful, but I
> don't see design solutions to the more complex questions that would
> still be required before we could host such table AMs like OreoleDB's
> as a first-party citizen: For index-organized tables, you also need
> index AM APIs that support TIDS with more than 48 bits of data
> (assuming we actually want primary keys with >48 bits of usable
> space), and for undo-based logging you would probably need index APIs
> for retail index tuple deletion. Neither is supplied here, nor is
> described why these APIs were omitted.

As I mentioned before, I agree that index AM changes haven't been
presented yet.  And yes, for bytea rowID there is currently no way to
use the current index API.  However, I think this exact patch could be
useful even without index AM implements.  This allows table AMs to
identify rows by custom bytea, even though these tables couldn't be
indexed yet.  So, if we allow a custom table AM to implement an
index-organized table, that would have use cases even if secondary
indexes are not supported yet.

Links
1. https://pgconf.ru/media/2016/02/19/06_Korotkov%20Extendability.pdf




2. https://www.postgresql.org/message-id/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Nov 27, 2023 at 10:18 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
>
> > On Nov 25, 2023, at 9:47 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> >> Should the patch at least document which parts of the EState are expected to be in which states, and which parts
shouldbe viewed as undefined?  If the implementors of table AMs rely on any/all aspects of EState, doesn't that prevent
futurechanges to how that structure is used? 
> >
> > New tuple tuple_insert_with_arbiter() table AM API method needs EState
> > argument to call executor functions: ExecCheckIndexConstraints(),
> > ExecUpdateLockMode(), and ExecInsertIndexTuples().  I think we
> > probably need to invent some opaque way to call this executor function
> > without revealing EState to table AM.  Do you think this could work?
>
> We're clearly not accessing all of the EState, just some specific fields, such as es_per_tuple_exprcontext.  I think
youcould at least refactor to pass the minimum amount of state information through the table AM API. 

Yes, the table AM doesn't need the full EState, just the ability to do
specific manipulation with tuples.  I'll refactor the patch to make a
better isolation for this.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Sun, Mar 3, 2024 at 1:50 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Mon, Nov 27, 2023 at 10:18 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
> >
> > > On Nov 25, 2023, at 9:47 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > >> Should the patch at least document which parts of the EState are expected to be in which states, and which parts
shouldbe viewed as undefined?  If the implementors of table AMs rely on any/all aspects of EState, doesn't that prevent
futurechanges to how that structure is used? 
> > >
> > > New tuple tuple_insert_with_arbiter() table AM API method needs EState
> > > argument to call executor functions: ExecCheckIndexConstraints(),
> > > ExecUpdateLockMode(), and ExecInsertIndexTuples().  I think we
> > > probably need to invent some opaque way to call this executor function
> > > without revealing EState to table AM.  Do you think this could work?
> >
> > We're clearly not accessing all of the EState, just some specific fields, such as es_per_tuple_exprcontext.  I
thinkyou could at least refactor to pass the minimum amount of state information through the table AM API. 
>
> Yes, the table AM doesn't need the full EState, just the ability to do
> specific manipulation with tuples.  I'll refactor the patch to make a
> better isolation for this.

Please find the revised patchset attached.  The changes are following:
1. Patchset is rebase. to the current master.
2. Patchset is reordered.  I tried to put less debatable patches to the top.
3. tuple_is_current() method is moved from the Table AM API to the
slot as proposed by Matthias van de Meent.
4. Assert added to the table_free_rd_amcache() as proposed by Pavel Borisov.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

On Tue, 19 Mar 2024 at 03:34, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Mar 3, 2024 at 1:50 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Mon, Nov 27, 2023 at 10:18 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
> >
> > > On Nov 25, 2023, at 9:47 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > >> Should the patch at least document which parts of the EState are expected to be in which states, and which parts should be viewed as undefined?  If the implementors of table AMs rely on any/all aspects of EState, doesn't that prevent future changes to how that structure is used?
> > >
> > > New tuple tuple_insert_with_arbiter() table AM API method needs EState
> > > argument to call executor functions: ExecCheckIndexConstraints(),
> > > ExecUpdateLockMode(), and ExecInsertIndexTuples().  I think we
> > > probably need to invent some opaque way to call this executor function
> > > without revealing EState to table AM.  Do you think this could work?
> >
> > We're clearly not accessing all of the EState, just some specific fields, such as es_per_tuple_exprcontext.  I think you could at least refactor to pass the minimum amount of state information through the table AM API.
>
> Yes, the table AM doesn't need the full EState, just the ability to do
> specific manipulation with tuples.  I'll refactor the patch to make a
> better isolation for this.

Please find the revised patchset attached.  The changes are following:
1. Patchset is rebase. to the current master.
2. Patchset is reordered.  I tried to put less debatable patches to the top.
3. tuple_is_current() method is moved from the Table AM API to the
slot as proposed by Matthias van de Meent.
4. Assert added to the table_free_rd_amcache() as proposed by Pavel Borisov.

Patches 0001-0002 are unchanged compared to the last version in thread [1]. In my opinion, it's still ready to be committed, which was not done for time were too close to feature freeze one year ago.

0003 - Assert added from previous version. I still have a strong opinion that allowing multi-chunked data structures instead of single chunks is completely safe and makes natural process of Postgres improvement that is self-justified. The patch is simple enough and ready to be pushed.

0004  (previously 0007) - Have not changed, and there is consensus that this is reasonable. I've re-checked the current code. Looks safe considering returning a different slot, which I doubted before. So consider this patch also ready.

0005 (previously 0004) - Unused argument in the is_current_xact_tuple() signature is removed. Also comparing to v1 the code shifted from tableam methods to TTS's level.

I'd propose to remove  Assert(!TTS_EMPTY(slot)) for tts_minimal_is_current_xact_tuple() and  tts_virtual_is_current_xact_tuple() as these are only error reporting functions that don't use slot actually.

Comment similar to:
+/*
+ * VirtualTupleTableSlots never have a storage tuples.  We generally
+ * shouldn't get here, but provide a user-friendly message if we do.
+ */
also applies to tts_minimal_is_current_xact_tuple()

I'd propose changes for clarity of this comment:
%s/a storage tuples/storage tuples/g
%s/generally//g

Otherwise patch 0005 also looks good to me. 

I'm planning to review the remaining patches. Meanwhile think pushing what is now ready and uncontroversial is a good intention.
Thank you for the work done on this patchset!

Regards, 
Pavel Borisov,
Supabase.






Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi, Pavel!

On Tue, Mar 19, 2024 at 11:34 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Tue, 19 Mar 2024 at 03:34, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> On Sun, Mar 3, 2024 at 1:50 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> > On Mon, Nov 27, 2023 at 10:18 PM Mark Dilger
>> > <mark.dilger@enterprisedb.com> wrote:
>> > >
>> > > > On Nov 25, 2023, at 9:47 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> > > >
>> > > >> Should the patch at least document which parts of the EState are expected to be in which states, and which
partsshould be viewed as undefined?  If the implementors of table AMs rely on any/all aspects of EState, doesn't that
preventfuture changes to how that structure is used? 
>> > > >
>> > > > New tuple tuple_insert_with_arbiter() table AM API method needs EState
>> > > > argument to call executor functions: ExecCheckIndexConstraints(),
>> > > > ExecUpdateLockMode(), and ExecInsertIndexTuples().  I think we
>> > > > probably need to invent some opaque way to call this executor function
>> > > > without revealing EState to table AM.  Do you think this could work?
>> > >
>> > > We're clearly not accessing all of the EState, just some specific fields, such as es_per_tuple_exprcontext.  I
thinkyou could at least refactor to pass the minimum amount of state information through the table AM API. 
>> >
>> > Yes, the table AM doesn't need the full EState, just the ability to do
>> > specific manipulation with tuples.  I'll refactor the patch to make a
>> > better isolation for this.
>>
>> Please find the revised patchset attached.  The changes are following:
>> 1. Patchset is rebase. to the current master.
>> 2. Patchset is reordered.  I tried to put less debatable patches to the top.
>> 3. tuple_is_current() method is moved from the Table AM API to the
>> slot as proposed by Matthias van de Meent.
>> 4. Assert added to the table_free_rd_amcache() as proposed by Pavel Borisov.
>
>
> Patches 0001-0002 are unchanged compared to the last version in thread [1]. In my opinion, it's still ready to be
committed,which was not done for time were too close to feature freeze one year ago. 
>
> 0003 - Assert added from previous version. I still have a strong opinion that allowing multi-chunked data structures
insteadof single chunks is completely safe and makes natural process of Postgres improvement that is self-justified.
Thepatch is simple enough and ready to be pushed. 
>
> 0004  (previously 0007) - Have not changed, and there is consensus that this is reasonable. I've re-checked the
currentcode. Looks safe considering returning a different slot, which I doubted before. So consider this patch also
ready.
>
> 0005 (previously 0004) - Unused argument in the is_current_xact_tuple() signature is removed. Also comparing to v1
thecode shifted from tableam methods to TTS's level. 
>
> I'd propose to remove  Assert(!TTS_EMPTY(slot)) for tts_minimal_is_current_xact_tuple() and
tts_virtual_is_current_xact_tuple()as these are only error reporting functions that don't use slot actually. 
>
> Comment similar to:
> +/*
> + * VirtualTupleTableSlots never have a storage tuples.  We generally
> + * shouldn't get here, but provide a user-friendly message if we do.
> + */
> also applies to tts_minimal_is_current_xact_tuple()
>
> I'd propose changes for clarity of this comment:
> %s/a storage tuples/storage tuples/g
> %s/generally//g
>
> Otherwise patch 0005 also looks good to me.
>
> I'm planning to review the remaining patches. Meanwhile think pushing what is now ready and uncontroversial is a good
intention.
> Thank you for the work done on this patchset!

Thank you, Pavel!

Regarding 0005, I did apply "a storage tuples" grammar fix.  Regarding
the rest of the things, I'd like to keep methods
tts_*_is_current_xact_tuple() to be similar to nearby
tts_*_getsysattr().  This is why I'm keeping the rest unchanged.  I
think we could refactor that later, but together with
tts_*_getsysattr() methods.

I'm going to push 0003, 0004 and 0005 if there are no objections.

And I'll update 0001 and 0002 in their dedicated thread.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Japin Li
Дата:
On Tue, 19 Mar 2024 at 21:05, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Hi, Pavel!
>
> On Tue, Mar 19, 2024 at 11:34 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> On Tue, 19 Mar 2024 at 03:34, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>>
>>> On Sun, Mar 3, 2024 at 1:50 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>> > On Mon, Nov 27, 2023 at 10:18 PM Mark Dilger
>>> > <mark.dilger@enterprisedb.com> wrote:
>>> > >
>>> > > > On Nov 25, 2023, at 9:47 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>> > > >
>>> > > >> Should the patch at least document which parts of the EState are expected to be in which states, and which
partsshould be viewed as undefined?  If the implementors of table AMs rely on any/all aspects of EState, doesn't that
preventfuture changes to how that structure is used? 
>>> > > >
>>> > > > New tuple tuple_insert_with_arbiter() table AM API method needs EState
>>> > > > argument to call executor functions: ExecCheckIndexConstraints(),
>>> > > > ExecUpdateLockMode(), and ExecInsertIndexTuples().  I think we
>>> > > > probably need to invent some opaque way to call this executor function
>>> > > > without revealing EState to table AM.  Do you think this could work?
>>> > >
>>> > > We're clearly not accessing all of the EState, just some specific fields, such as es_per_tuple_exprcontext.  I
thinkyou could at least refactor to pass the minimum amount of state information through the table AM API. 
>>> >
>>> > Yes, the table AM doesn't need the full EState, just the ability to do
>>> > specific manipulation with tuples.  I'll refactor the patch to make a
>>> > better isolation for this.
>>>
>>> Please find the revised patchset attached.  The changes are following:
>>> 1. Patchset is rebase. to the current master.
>>> 2. Patchset is reordered.  I tried to put less debatable patches to the top.
>>> 3. tuple_is_current() method is moved from the Table AM API to the
>>> slot as proposed by Matthias van de Meent.
>>> 4. Assert added to the table_free_rd_amcache() as proposed by Pavel Borisov.
>>
>>
>> Patches 0001-0002 are unchanged compared to the last version in thread [1]. In my opinion, it's still ready to be
committed,which was not done for time were too close to feature freeze one year ago. 
>>
>> 0003 - Assert added from previous version. I still have a strong opinion that allowing multi-chunked data structures
insteadof single chunks is completely safe and makes natural process of Postgres improvement that is self-justified.
Thepatch is simple enough and ready to be pushed. 
>>
>> 0004  (previously 0007) - Have not changed, and there is consensus that this is reasonable. I've re-checked the
currentcode. Looks safe considering returning a different slot, which I doubted before. So consider this patch also
ready.
>>
>> 0005 (previously 0004) - Unused argument in the is_current_xact_tuple() signature is removed. Also comparing to v1
thecode shifted from tableam methods to TTS's level. 
>>
>> I'd propose to remove  Assert(!TTS_EMPTY(slot)) for tts_minimal_is_current_xact_tuple() and
tts_virtual_is_current_xact_tuple()as these are only error reporting functions that don't use slot actually. 
>>
>> Comment similar to:
>> +/*
>> + * VirtualTupleTableSlots never have a storage tuples.  We generally
>> + * shouldn't get here, but provide a user-friendly message if we do.
>> + */
>> also applies to tts_minimal_is_current_xact_tuple()
>>
>> I'd propose changes for clarity of this comment:
>> %s/a storage tuples/storage tuples/g
>> %s/generally//g
>>
>> Otherwise patch 0005 also looks good to me.
>>
>> I'm planning to review the remaining patches. Meanwhile think pushing what is now ready and uncontroversial is a
goodintention. 
>> Thank you for the work done on this patchset!
>
> Thank you, Pavel!
>
> Regarding 0005, I did apply "a storage tuples" grammar fix.  Regarding
> the rest of the things, I'd like to keep methods
> tts_*_is_current_xact_tuple() to be similar to nearby
> tts_*_getsysattr().  This is why I'm keeping the rest unchanged.  I
> think we could refactor that later, but together with
> tts_*_getsysattr() methods.
>
> I'm going to push 0003, 0004 and 0005 if there are no objections.
>
> And I'll update 0001 and 0002 in their dedicated thread.
>

When I try to test the patch on Ubuntu 22.04 with GCC 11.4.0.  There are some
warnings as following:

/home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c: In function ‘heapam_acquire_sample_rows’:
/home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c:1603:28: warning: implicit declaration of
function‘get_tablespace_maintenance_io_concurrency’ [-Wimplicit-function-declaration] 
 1603 |         prefetch_maximum = get_tablespace_maintenance_io_concurrency(onerel->rd_rel->reltablespace);
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c:1757:30: warning: implicit declaration of
function‘floor’ [-Wimplicit-function-declaration] 
 1757 |                 *totalrows = floor((liverows / bs.m) * totalblocks + 0.5);
      |                              ^~~~~
/home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c:49:1: note: include ‘<math.h>’ or provide
adeclaration of ‘floor’ 
   48 | #include "utils/sampling.h"
  +++ |+#include <math.h>
   49 |
/home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c:1757:30: warning: incompatible implicit
declarationof built-in function ‘floor’ [-Wbuiltin-declaration-mismatch] 
 1757 |                 *totalrows = floor((liverows / bs.m) * totalblocks + 0.5);
      |                              ^~~~~
/home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c:1757:30: note: include ‘<math.h>’ or
providea declaration of ‘floor’ 
/home/japin/Codes/postgres/build/../src/backend/access/heap/heapam_handler.c:1603:21: warning: implicit declaration of
function'get_tablespace_maintenance_io_concurrency' is invalid in C99 [-Wimplicit-function-declaration] 
        prefetch_maximum = get_tablespace_maintenance_io_concurrency(onerel->rd_rel->reltablespace);
                           ^

It seems you forgot to include math.h and utils/spccache.h header files
in heapam_handler.c.

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ac24691bd2..04365394f1 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -19,6 +19,8 @@
  */
 #include "postgres.h"

+#include <math.h>
+
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/heaptoast.h"
@@ -46,6 +48,7 @@
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/sampling.h"
+#include "utils/spccache.h"

 static TM_Result heapam_tuple_lock(Relation relation, Datum tupleid,
                                    Snapshot snapshot, TupleTableSlot *slot,



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Tue, Mar 19, 2024 at 4:26 PM Japin Li <japinli@hotmail.com> wrote:
> On Tue, 19 Mar 2024 at 21:05, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Regarding 0005, I did apply "a storage tuples" grammar fix.  Regarding
> > the rest of the things, I'd like to keep methods
> > tts_*_is_current_xact_tuple() to be similar to nearby
> > tts_*_getsysattr().  This is why I'm keeping the rest unchanged.  I
> > think we could refactor that later, but together with
> > tts_*_getsysattr() methods.
> >
> > I'm going to push 0003, 0004 and 0005 if there are no objections.
> >
> > And I'll update 0001 and 0002 in their dedicated thread.
> >
>
> When I try to test the patch on Ubuntu 22.04 with GCC 11.4.0.  There are some
> warnings as following:

Thank you for catching this!
Please, find the revised patchset attached.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

For 0007:

Code inside

+heapam_reloptions(char relkind, Datum reloptions, bool validate) 
+{
+   if (relkind == RELKIND_RELATION ||
+       relkind == RELKIND_TOASTVALUE ||
+       relkind == RELKIND_MATVIEW)
+       return heap_reloptions(relkind, reloptions, validate);
+
+   return NULL;

looks redundant to what is done inside heap_reloptions(). Was this on purpose? Is it possible to leave only "return heap_reloptions()"  ?

This looks like a duplicate:
src/include/access/reloptions.h:extern bytea *index_reloptions(amoptions_function amoptions, Datum reloptions,
src/include/access/tableam.h:extern bytea *index_reloptions(amoptions_function amoptions, Datum reloptions,

Otherwise the patch looks good and doing what it's proposed to do.

Regards,
Pavel Borisov.

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!


On Wed, 20 Mar 2024 at 09:22, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
Hi, Alexander!

For 0007:

Code inside

+heapam_reloptions(char relkind, Datum reloptions, bool validate) 
+{
+   if (relkind == RELKIND_RELATION ||
+       relkind == RELKIND_TOASTVALUE ||
+       relkind == RELKIND_MATVIEW)
+       return heap_reloptions(relkind, reloptions, validate);
+
+   return NULL;

looks redundant to what is done inside heap_reloptions(). Was this on purpose? Is it possible to leave only "return heap_reloptions()"  ?

This looks like a duplicate:
src/include/access/reloptions.h:extern bytea *index_reloptions(amoptions_function amoptions, Datum reloptions,
src/include/access/tableam.h:extern bytea *index_reloptions(amoptions_function amoptions, Datum reloptions,

Otherwise the patch looks good and doing what it's proposed to do.

For patch 0006:

The change for analyze is in the same style as for previous table am extensibility patches.

table_scan_analyze_next_tuple/table_scan_analyze_next_block existing extensibility is dropped in favour of more general method table_relation_analyze. I haven't found existing extensions on a GitHub that use these table am's, so probably it's quite ok to remove the extensibility that didn't get any traction for many years.

The patch contains a big block of code copy-paste. I've checked that the code is the same with only function name replacement in favor to using table am instead of heap am. I'd propose restoring the static functions declaration in the head of the file, which was removed in the patch and place heapam_acquire_sample_rows() above compare_rows() to make functions copied as the whole code block. This is for better patch look only, not a principal change.

-static int acquire_sample_rows(Relation onerel, int elevel,
-                               HeapTuple *rows, int targrows,
-                               double *totalrows, double *totaldeadrows);
-static int compare_rows(const void *a, const void *b, void *arg)

May it also be a better place than vacuum.h for 
typedef int (*AcquireSampleRowsFunc) ? Maybe sampling.h ?


The other patch that I'd like to review is 0012:

For a 
typedef enum RowRefType
 I think some comments would be useful to avoid confusion about the changes like
-               newrc->allMarkTypes = (1 << newrc->markType);
+              newrc->allRefTypes = (1 << refType);

Also I think the semantical difference between ROW_REF_COPY and ROW_MARK_COPY is better to be mentioned in the comments and/or commit message. This may include a description of assigning different reftypes in parse_relation.c 

In a comment there is a small confusion between markType and refType: 

 * The parent's allRefTypes field gets the OR of (1<<refType) across all
 * its children (this definition allows children to use different markTypes).

Both patches look good to me and are ready, though they may need minimal comments/cosmetic work.

Regards,
Pavel Borisov

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

Thank you for working on this patchset and pushing some of these patches!

I tried to write comments for tts_minimal_is_current_xact_tuple() and tts_minimal_getsomeattrs() for them to be the same as for the same functions for heap and virtual tuple slots, as I proposed above in the thread. (tts_minimal_getsysattr is not introduced by the current patchset, but anyway)

Meanwhile I found that (never appearing) error message for tts_minimal_is_current_xact_tuple needs to be corrected. Please see the patch in the attachment.

Regards, 
Pavel Borisov
Вложения

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:


On Fri, 22 Mar 2024 at 08:51, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
Hi, Alexander!

Thank you for working on this patchset and pushing some of these patches!

I tried to write comments for tts_minimal_is_current_xact_tuple() and tts_minimal_getsomeattrs() for them to be the same as for the same functions for heap and virtual tuple slots, as I proposed above in the thread. (tts_minimal_getsysattr is not introduced by the current patchset, but anyway)

Meanwhile I found that (never appearing) error message for tts_minimal_is_current_xact_tuple needs to be corrected. Please see the patch in the attachment.

I need to correct myself: it's for  tts_minimal_getsysattr() not tts_minimal_getsomeattrs()  

Pavel.

Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Fri, Mar 22, 2024 at 6:56 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> On Fri, 22 Mar 2024 at 08:51, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>>
>> Hi, Alexander!
>>
>> Thank you for working on this patchset and pushing some of these patches!
>>
>> I tried to write comments for tts_minimal_is_current_xact_tuple() and tts_minimal_getsomeattrs() for them to be the
sameas for the same functions for heap and virtual tuple slots, as I proposed above in the thread.
(tts_minimal_getsysattris not introduced by the current patchset, but anyway) 
>>
>> Meanwhile I found that (never appearing) error message for tts_minimal_is_current_xact_tuple needs to be corrected.
Pleasesee the patch in the attachment. 
>>
> I need to correct myself: it's for  tts_minimal_getsysattr() not tts_minimal_getsomeattrs()

Pushed.

The revised rest of the patchset is attached.
0001 (was 0006) – I prefer the definition of AcquireSampleRowsFunc to
stay in vacuum.h.  If we move it to sampling.h then we would have to
add there includes to define Relation, HeapTuple etc.  I'd like to
avoid this kind of change.  Also, I've deleted
table_beginscan_analyze(), because it's only called from
tableam-specific AcquireSampleRowsFunc.  Also I put some comments to
heapam_scan_analyze_next_block() and heapam_scan_analyze_next_tuple()
given that there are now no relevant comments for them in tableam.h.
I've removed some redundancies from acquire_sample_rows().  And added
comments to AcquireSampleRowsFunc based on what we have in FDW docs
for this function.  Did some small edits as well.  As you suggested,
turned back declarations for acquire_sample_rows() and compare_rows().

0002 (was 0007) – I've turned the redundant "if", which you've pointed
out, into an assert.  Also, added some comments, most notably comment
for TableAmRoutine.reloptions based on the indexam docs.

0007 (was 0012) – This patch doesn't make much sense if not removing
ROW_MARK_COPY.  What an oversight by me!  I managed to remove
ROW_MARK_COPY so that tests passed.  Added a lot of comments and made
other improvements.  But the problem is that I didn't manage to
research all the consequences of this patch to FDW.  And I think there
are open design questions.  In particular how should ROW_REF_COPY work
with row marks other than ROW_MARK_REFERENCE and should it work at
all?  This would require some consensus, and it doesn't seem feasible
to achieve before FF.  So, I think this is not a subject for v17.

Other patches are without changes.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

The revised rest of the patchset is attached.
0001 (was 0006) – I prefer the definition of AcquireSampleRowsFunc to
stay in vacuum.h.  If we move it to sampling.h then we would have to
add there includes to define Relation, HeapTuple etc.  I'd like to
avoid this kind of change.  Also, I've deleted
table_beginscan_analyze(), because it's only called from
tableam-specific AcquireSampleRowsFunc.  Also I put some comments to
heapam_scan_analyze_next_block() and heapam_scan_analyze_next_tuple()
given that there are now no relevant comments for them in tableam.h.
I've removed some redundancies from acquire_sample_rows().  And added
comments to AcquireSampleRowsFunc based on what we have in FDW docs
for this function.  Did some small edits as well.  As you suggested,
turned back declarations for acquire_sample_rows() and compare_rows().

In my comment in the thread I was not thinking about returning the old name acquire_sample_rows(), it was only about the declarations and the order of functions to be one code block. To me heapam_acquire_sample_rows() looks better for a name of heap implementation of *AcquireSampleRowsFunc(). I suggest returning the name heapam_acquire_sample_rows() from v4. Sorry for the confusion in this.

The changed type of static function that always returned true for heap looks good to me: 
static void heapam_scan_analyze_next_block

The same is for removing the comparison of always true "block_accepted" in (heapam_)acquire_sample_rows()

Removing table_beginscan_analyze and call scan_begin() is not in the same style as other table_beginscan_* functions. Though this is not a change in functionality, I'd leave this part as it was in v4. Also, a comment about it was introduced in v5:

src/backend/access/heap/heapam_handler.c: * with table_beginscan_analyze()

For comments I'd propose:
%s/In addition, store estimates/In addition, a function should store estimates/g
%s/zerp/zero/g
 
0002 (was 0007) – I've turned the redundant "if", which you've pointed
out, into an assert.  Also, added some comments, most notably comment
for TableAmRoutine.reloptions based on the indexam docs.

%s/validate sthe/validates the/g

This seems not needed, it's already inited to InvalidOid before.
+else
+accessMethod = default_table_access_method;                                                                 
+       accessMethodId = InvalidOid; 

This code came from 374c7a22904. I don't insist on this simplification in a patch 0002.

Overall both patches look good to me.

Regards,
Pavel Borisov.

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
This seems not needed, it's already inited to InvalidOid before.
+else
+accessMethod = default_table_access_method;                                                                 
+       accessMethodId = InvalidOid; 

This code came from 374c7a22904. I don't insist on this simplification in a patch 0002.

A correction of the code quote for the previous message:

+else                                                               
+       accessMethodId = InvalidOid; 

Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi, Pavel!

Thank you for your feedback.  The revised patch set is attached.

I found that vacuum.c has a lot of heap-specific code.  Thus, it
should be OK for analyze.c to keep heap-specific code.  Therefore, I
rolled back the movement of functions between files.  That leads to a
smaller patch, easier to review.

On Wed, Mar 27, 2024 at 2:52 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> The revised rest of the patchset is attached.
>> 0001 (was 0006) – I prefer the definition of AcquireSampleRowsFunc to
>> stay in vacuum.h.  If we move it to sampling.h then we would have to
>> add there includes to define Relation, HeapTuple etc.  I'd like to
>> avoid this kind of change.  Also, I've deleted
>> table_beginscan_analyze(), because it's only called from
>> tableam-specific AcquireSampleRowsFunc.  Also I put some comments to
>> heapam_scan_analyze_next_block() and heapam_scan_analyze_next_tuple()
>> given that there are now no relevant comments for them in tableam.h.
>> I've removed some redundancies from acquire_sample_rows().  And added
>> comments to AcquireSampleRowsFunc based on what we have in FDW docs
>> for this function.  Did some small edits as well.  As you suggested,
>> turned back declarations for acquire_sample_rows() and compare_rows().
>
>
> In my comment in the thread I was not thinking about returning the old name acquire_sample_rows(), it was only about
thedeclarations and the order of functions to be one code block. To me heapam_acquire_sample_rows() looks better for a
nameof heap implementation of *AcquireSampleRowsFunc(). I suggest returning the name heapam_acquire_sample_rows() from
v4.Sorry for the confusion in this. 

I found that the function name acquire_sample_rows is referenced in
quite several places in the source code.  So, I would prefer to save
the old name to keep the changes minimal.

> The changed type of static function that always returned true for heap looks good to me:
> static void heapam_scan_analyze_next_block
>
> The same is for removing the comparison of always true "block_accepted" in (heapam_)acquire_sample_rows()

Ok!

> Removing table_beginscan_analyze and call scan_begin() is not in the same style as other table_beginscan_* functions.
Thoughthis is not a change in functionality, I'd leave this part as it was in v4. 

With the patch, this method doesn't have usages outside of table am.
I don't think we should keep a method, which doesn't have clear
external usage patterns.  But I agree that starting a scan with
heap_beginscan() and ending with table_endscan() is not correct.  Now
ending this scan with heap_endscan().

> Also, a comment about it was introduced in v5:
>
> src/backend/access/heap/heapam_handler.c: * with table_beginscan_analyze()

Corrected.

> For comments I'd propose:
> %s/In addition, store estimates/In addition, a function should store estimates/g
> %s/zerp/zero/g

Fixed.

>> 0002 (was 0007) – I've turned the redundant "if", which you've pointed
>> out, into an assert.  Also, added some comments, most notably comment
>> for TableAmRoutine.reloptions based on the indexam docs.
>
> %s/validate sthe/validates the/g

Fixed.

> This seems not needed, it's already inited to InvalidOid before.
> +else
> +accessMethod = default_table_access_method;
> +       accessMethodId = InvalidOid;
>
> This code came from 374c7a22904. I don't insist on this simplification in a patch 0002.

This is minor redundancy.  I prefer to keep it.  This makes it obvious
that patch just moved this code.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!
Thank you for working on these patches.
On Thu, 28 Mar 2024 at 02:14, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Pavel!

Thank you for your feedback.  The revised patch set is attached.

I found that vacuum.c has a lot of heap-specific code.  Thus, it
should be OK for analyze.c to keep heap-specific code.  Therefore, I
rolled back the movement of functions between files.  That leads to a
smaller patch, easier to review.
I agree with you.
And with the changes remaining in heapam_handler.c I suppose we can also remove the includes introduced:

#include <math.h>
#include "utils/sampling.h"
#include "utils/spccache.h"

On Wed, Mar 27, 2024 at 2:52 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> The revised rest of the patchset is attached.
>> 0001 (was 0006) – I prefer the definition of AcquireSampleRowsFunc to
>> stay in vacuum.h.  If we move it to sampling.h then we would have to
>> add there includes to define Relation, HeapTuple etc.  I'd like to
>> avoid this kind of change.  Also, I've deleted
>> table_beginscan_analyze(), because it's only called from
>> tableam-specific AcquireSampleRowsFunc.  Also I put some comments to
>> heapam_scan_analyze_next_block() and heapam_scan_analyze_next_tuple()
>> given that there are now no relevant comments for them in tableam.h.
>> I've removed some redundancies from acquire_sample_rows().  And added
>> comments to AcquireSampleRowsFunc based on what we have in FDW docs
>> for this function.  Did some small edits as well.  As you suggested,
>> turned back declarations for acquire_sample_rows() and compare_rows().
>
>
> In my comment in the thread I was not thinking about returning the old name acquire_sample_rows(), it was only about the declarations and the order of functions to be one code block. To me heapam_acquire_sample_rows() looks better for a name of heap implementation of *AcquireSampleRowsFunc(). I suggest returning the name heapam_acquire_sample_rows() from v4. Sorry for the confusion in this.

I found that the function name acquire_sample_rows is referenced in
quite several places in the source code.  So, I would prefer to save
the old name to keep the changes minimal.
The full list shows only a couple of changes in analyze.c and several comments elsewhere.

contrib/postgres_fdw/postgres_fdw.c:             * of the relation.  Same algorithm as in acquire_sample_rows in
src/backend/access/heap/vacuumlazy.c:            * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
src/backend/access/heap/vacuumlazy.c:            * The logic here is a bit simpler than acquire_sample_rows(), as
src/backend/access/heap/vacuumlazy.c:                            * what acquire_sample_rows() does.
src/backend/access/heap/vacuumlazy.c:                            * acquire_sample_rows() does, so be consistent.
src/backend/access/heap/vacuumlazy.c:            * acquire_sample_rows() will recognize the same LP_DEAD items as dead
src/backend/commands/analyze.c:static int       acquire_sample_rows(Relation onerel, int elevel,
src/backend/commands/analyze.c:         acquirefunc = acquire_sample_rows;
src/backend/commands/analyze.c: * acquire_sample_rows -- acquire a random sample of rows from the table
src/backend/commands/analyze.c:acquire_sample_rows(Relation onerel, int elevel,
src/backend/commands/analyze.c: * This has the same API as acquire_sample_rows, except that rows are
src/backend/commands/analyze.c:                 acquirefunc = acquire_sample_rows;
 
My point for renaming is to make clear that it's a heap implementation of acquire_sample_rows which could be useful for possible reworking heap implementations of table am methods into a separate place later. But I'm also ok with the existing naming.
 
> The changed type of static function that always returned true for heap looks good to me:
> static void heapam_scan_analyze_next_block
>
> The same is for removing the comparison of always true "block_accepted" in (heapam_)acquire_sample_rows()

Ok!

> Removing table_beginscan_analyze and call scan_begin() is not in the same style as other table_beginscan_* functions. Though this is not a change in functionality, I'd leave this part as it was in v4.

With the patch, this method doesn't have usages outside of table am.
I don't think we should keep a method, which doesn't have clear
external usage patterns.  But I agree that starting a scan with
heap_beginscan() and ending with table_endscan() is not correct.  Now
ending this scan with heap_endscan().
Good!
 
> Also, a comment about it was introduced in v5:
>
> src/backend/access/heap/heapam_handler.c: * with table_beginscan_analyze()

Corrected.
> For comments I'd propose:
> %s/In addition, store estimates/In addition, a function should store estimates/g
> %s/zerp/zero/g

Fixed.

>> 0002 (was 0007) – I've turned the redundant "if", which you've pointed
>> out, into an assert.  Also, added some comments, most notably comment
>> for TableAmRoutine.reloptions based on the indexam docs.
>
> %s/validate sthe/validates the/g

Fixed.

> This seems not needed, it's already inited to InvalidOid before.
> +else
> +accessMethod = default_table_access_method;
> +       accessMethodId = InvalidOid;
>
> This code came from 374c7a22904. I don't insist on this simplification in a patch 0002.

This is minor redundancy.  I prefer to keep it.  This makes it obvious
that patch just moved this code.
I agree with the remaining.

Regards, 
Pavel Borisov

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

The other extensibility that seems quite clear and uncontroversial to me is 0006.

It simply shifts the decision on whether tuple inserts should invoke inserts to the related indices to the table am level. It doesn't change the current heap insert behavior so it's safe for the existing heap access method. But new table access methods could redefine this (only for tables created with these am's) and make index inserts independently of ExecInsertIndexTuples inside their own implementations of tuple_insert/tuple_multi_insert methods.  

I'd propose changing the comment:

1405  * This function sets `*insert_indexes` to true if expects caller to return
1406  * the relevant index tuples.  If `*insert_indexes` is set to false, then
1407  * this function cares about indexes itself.

in the following way

Tableam implementation of tuple_insert should set `*insert_indexes` to true
if it expects the caller to insert the relevant index tuples (as in heap
 implementation). It should set `*insert_indexes` to false if it cares 
about index inserts itself and doesn't want the caller to do index inserts.

Maybe, a commit message is also better to reformulate to describe better who should do what.

I think, with rebase and correction in the comments/commit message patch 0006 is ready to be committed.

Regards,
Pavel Borisov.

Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi Pavel!

Revised patchset is attached.

On Thu, Mar 28, 2024 at 3:12 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> The other extensibility that seems quite clear and uncontroversial to me is 0006.
>
> It simply shifts the decision on whether tuple inserts should invoke inserts to the related indices to the table am
level.It doesn't change the current heap insert behavior so it's safe for the existing heap access method. But new
tableaccess methods could redefine this (only for tables created with these am's) and make index inserts independently
ofExecInsertIndexTuples inside their own implementations of tuple_insert/tuple_multi_insert methods. 
>
> I'd propose changing the comment:
>
> 1405  * This function sets `*insert_indexes` to true if expects caller to return
> 1406  * the relevant index tuples.  If `*insert_indexes` is set to false, then
> 1407  * this function cares about indexes itself.
>
> in the following way
>
> Tableam implementation of tuple_insert should set `*insert_indexes` to true
> if it expects the caller to insert the relevant index tuples (as in heap
>  implementation). It should set `*insert_indexes` to false if it cares
> about index inserts itself and doesn't want the caller to do index inserts.

Changed as you proposed.

> Maybe, a commit message is also better to reformulate to describe better who should do what.

Done.

Also, I removed extra includes in 0001 as you proposed and edited the
commit message in 0002.

> I think, with rebase and correction in the comments/commit message patch 0006 is ready to be committed.

I'm going to push 0001, 0002 and 0006 if no objections.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Japin Li
Дата:
On Thu, 28 Mar 2024 at 21:26, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Hi Pavel!
>
> Revised patchset is attached.
>
> On Thu, Mar 28, 2024 at 3:12 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> The other extensibility that seems quite clear and uncontroversial to me is 0006.
>>
>> It simply shifts the decision on whether tuple inserts should invoke inserts to the related indices to the table am
level.It doesn't change the current heap insert behavior so it's safe for the existing heap access method. But new
tableaccess methods could redefine this (only for tables created with these am's) and make index inserts independently
ofExecInsertIndexTuples inside their own implementations of tuple_insert/tuple_multi_insert methods. 
>>
>> I'd propose changing the comment:
>>
>> 1405  * This function sets `*insert_indexes` to true if expects caller to return
>> 1406  * the relevant index tuples.  If `*insert_indexes` is set to false, then
>> 1407  * this function cares about indexes itself.
>>
>> in the following way
>>
>> Tableam implementation of tuple_insert should set `*insert_indexes` to true
>> if it expects the caller to insert the relevant index tuples (as in heap
>>  implementation). It should set `*insert_indexes` to false if it cares
>> about index inserts itself and doesn't want the caller to do index inserts.
>
> Changed as you proposed.
>
>> Maybe, a commit message is also better to reformulate to describe better who should do what.
>
> Done.
>
> Also, I removed extra includes in 0001 as you proposed and edited the
> commit message in 0002.
>
>> I think, with rebase and correction in the comments/commit message patch 0006 is ready to be committed.
>
> I'm going to push 0001, 0002 and 0006 if no objections.

Thanks for updating the patches.  Here are some minor sugesstion.

0003

+static inline TupleTableSlot *
+heapam_tuple_insert_with_arbiter(ResultRelInfo *resultRelInfo,

I'm not entirely certain whether the "inline" keyword has any effect.

0004

+static bytea *
+heapam_indexoptions(amoptions_function amoptions, char relkind,
+                                       Datum reloptions, bool validate)
+{
+       return index_reloptions(amoptions, reloptions, validate);
+}

Could you please explain why we are not verifying the relkind like
heapam_reloptions()?


-               case RELKIND_VIEW:
                case RELKIND_MATVIEW:
+               case RELKIND_VIEW:
                case RELKIND_PARTITIONED_TABLE:

I think this change is unnecessary.

+                       {
+                               Form_pg_class classForm;
+                               HeapTuple       classTup;
+
+                               /* fetch the relation's relcache entry */
+                               if (relation->rd_index->indrelid >= FirstNormalObjectId)
+                               {
+                                       classTup = SearchSysCacheCopy1(RELOID,
ObjectIdGetDatum(relation->rd_index->indrelid));
+                                       classForm = (Form_pg_class) GETSTRUCT(classTup);
+                                       if (classForm->relam >= FirstNormalObjectId)
+                                               tableam = GetTableAmRoutineByAmOid(classForm->relam);
+                                       else
+                                               tableam = GetHeapamTableAmRoutine();
+                                       heap_freetuple(classTup);
+                               }
+                               else
+                               {
+                                       tableam = GetHeapamTableAmRoutine();
+                               }
+                               amoptsfn = relation->rd_indam->amoptions;
+                       }

- We can reduce the indentation by moving the classFrom and classTup into
  the if branch.
- Perhaps we could remove the brace of else branch to maintain consistency
  in the code style.

--
Regards,
Japin Li



Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
I've looked at patch 0003.

Generally, it does a similar thing as 0001 - it exposes a more generalized method tuple_insert_with_arbiter that encapsulates tuple_insert_speculative/tuple_complete_speculative and at the same time allows extensibility of this i.e. different implementation for custom table other than heap by the extensions. Though the code rearrangement is little bit more complicated, the patch is clear. It doesn't change the behavior for heap tables.

tuple_insert_speculative/tuple_complete_speculative are removed from table AM methods. I think it would not be hard for existing users of this to adapt to the changes.

Code block ExecCheckTupleVisible -- ExecCheckTIDVisible moved to heapam_handler.c I've checked, the code block unchanged except that ExecCheckTIDVisible now gets Relation from the caller instead of constructing it from ResultRelInfo.

Also two big code blocks are moved from ExecOnConflictUpdate and ExecInsert to a new method heapam_tuple_insert_with_arbiter. They correspond the old code with several minor modifications.

For ExecOnConflictUpdate comment need to be revised. This one is for shifted code:
> * Try to lock tuple for update as part of speculative insertion.
Probably it is worth to be moved to a comment for heapam_tuple_insert_with_arbiter.

For heapam_tuple_insert_with_arbiter both "return NULL" could be shifted level up into the end of the block:
>if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate, &conflictTid,
>+                                                                          arbiterIndexes))

Also I'd add comment for heapam_tuple_insert_with_arbiter:
/* See comments for table_tuple_insert_with_arbiter() */

A comment to be corrected: 
src/backend/access/heap/heapam.c: * implement table_tuple_insert_speculative()

As Jaipin said, I'd also propose removing "inline" from heapam_tuple_insert_with_arbiter.

More corrections for comments:
%s/If tuple doesn't violates/If tuple doesn't violate/g
%s/which comprises the list of/list, which comprises/g
%s/conflicting tuple gets locked/conflicting tuple should be locked/g

I think for better code look this could be removed:
>vlock:
 >                       CHECK_FOR_INTERRUPTS();
together with CHECK_FOR_INTERRUPTS(); in heapam_tuple_insert_with_arbiter placed in the beginning of while loop.

Overall the patch looks good enough to me.

Regards,
Pavel

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
I think for better code look this could be removed:
>vlock:
 >                       CHECK_FOR_INTERRUPTS();
together with CHECK_FOR_INTERRUPTS(); in heapam_tuple_insert_with_arbiter placed in the beginning of while loop.
To clarify things, this I wrote only about CHECK_FOR_INTERRUPTS(); rearrangement.

Regards,
Pavel

Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi, Pavel!

I've pushed 0001, 0002 and 0006.

On Fri, Mar 29, 2024 at 5:23 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>>
>> I think for better code look this could be removed:
>> >vlock:
>>  >                       CHECK_FOR_INTERRUPTS();
>> together with CHECK_FOR_INTERRUPTS(); in heapam_tuple_insert_with_arbiter placed in the beginning of while loop.
>
> To clarify things, this I wrote only about CHECK_FOR_INTERRUPTS(); rearrangement.

Thank you for your review of this patch.  But I still think there is a
problem that this patch moves part of the executor to table AM which
directly uses executor data structures and functions.  This works, but
not committable since it breaks the encapsulation.

I think the way forward might be to introduce the new API, which would
isolate executor details from table AM.  We may introduce a new data
structure InsertWithArbiterContext which would contain EState and a
set of callbacks which would avoid table AM from calling the executor
directly.  That should be our goal for pg18.  Now, this is too close
to FF to design a new API.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Andrei Lepikhov
Дата:
On 31/3/2024 00:33, Alexander Korotkov wrote:
> I think the way forward might be to introduce the new API, which would
> isolate executor details from table AM.  We may introduce a new data
> structure InsertWithArbiterContext which would contain EState and a
> set of callbacks which would avoid table AM from calling the executor
> directly.  That should be our goal for pg18.  Now, this is too close
> to FF to design a new API.
I'm a bit late, but have you ever considered adding some sort of index 
probing routine to the AM interface for estimation purposes?
I am working out the problem when we have dubious estimations. For 
example, we don't have MCV or do not fit MCV statistics for equality of 
multiple clauses, or we detected that the constant value is out of the 
histogram range. In such cases (especially for [parameterized] JOINs), 
the optimizer could have a chance to probe the index and avoid huge 
underestimation. This makes sense, especially for multicolumn 
filters/clauses.
Having a probing AM method, we may invent something for this challenge.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: Table AM Interface Enhancements

От
Tom Lane
Дата:
Coverity complained about what you did in RelationParseRelOptions
in c95c25f9a:

*** CID 1595992:  Null pointer dereferences  (FORWARD_NULL)
/srv/coverity/git/pgsql-git/postgresql/src/backend/utils/cache/relcache.c: 499 in RelationParseRelOptions()
493
494         /*
495          * Fetch reloptions from tuple; have to use a hardwired descriptor because
496          * we might not have any other for pg_class yet (consider executing this
497          * code for pg_class itself)
498          */
>>>     CID 1595992:  Null pointer dereferences  (FORWARD_NULL)
>>>     Passing null pointer "tableam" to "extractRelOptions", which dereferences it.
499         options = extractRelOptions(tuple, GetPgClassDescriptor(),
500                                     tableam, amoptsfn);
501

I see that extractRelOptions only uses the tableam argument for some
relkinds, and RelationParseRelOptions does set it up for those
relkinds --- but Coverity's complaint isn't without merit, because
those two switch statements are looking at *different copies of the
relkind*, which in theory could be different.  This all seems quite
messy and poorly factored.  Can't we do better?  Why do we need to
involve two copies of allegedly the same pg_class tuple, anyhow?

            regards, tom lane



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 1, 2024 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Coverity complained about what you did in RelationParseRelOptions
> in c95c25f9a:
>
> *** CID 1595992:  Null pointer dereferences  (FORWARD_NULL)
> /srv/coverity/git/pgsql-git/postgresql/src/backend/utils/cache/relcache.c: 499 in RelationParseRelOptions()
> 493
> 494             /*
> 495              * Fetch reloptions from tuple; have to use a hardwired descriptor because
> 496              * we might not have any other for pg_class yet (consider executing this
> 497              * code for pg_class itself)
> 498              */
> >>>     CID 1595992:  Null pointer dereferences  (FORWARD_NULL)
> >>>     Passing null pointer "tableam" to "extractRelOptions", which dereferences it.
> 499             options = extractRelOptions(tuple, GetPgClassDescriptor(),
> 500                                                                     tableam, amoptsfn);
> 501
>
> I see that extractRelOptions only uses the tableam argument for some
> relkinds, and RelationParseRelOptions does set it up for those
> relkinds --- but Coverity's complaint isn't without merit, because
> those two switch statements are looking at *different copies of the
> relkind*, which in theory could be different.  This all seems quite
> messy and poorly factored.  Can't we do better?  Why do we need to
> involve two copies of allegedly the same pg_class tuple, anyhow?

Thank you for reporting this, Tom.
I'm planning to investigate this later today.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 1, 2024 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Coverity complained about what you did in RelationParseRelOptions
> in c95c25f9a:
>
> *** CID 1595992:  Null pointer dereferences  (FORWARD_NULL)
> /srv/coverity/git/pgsql-git/postgresql/src/backend/utils/cache/relcache.c: 499 in RelationParseRelOptions()
> 493
> 494             /*
> 495              * Fetch reloptions from tuple; have to use a hardwired descriptor because
> 496              * we might not have any other for pg_class yet (consider executing this
> 497              * code for pg_class itself)
> 498              */
> >>>     CID 1595992:  Null pointer dereferences  (FORWARD_NULL)
> >>>     Passing null pointer "tableam" to "extractRelOptions", which dereferences it.
> 499             options = extractRelOptions(tuple, GetPgClassDescriptor(),
> 500                                                                     tableam, amoptsfn);
> 501
>
> I see that extractRelOptions only uses the tableam argument for some
> relkinds, and RelationParseRelOptions does set it up for those
> relkinds --- but Coverity's complaint isn't without merit, because
> those two switch statements are looking at *different copies of the
> relkind*, which in theory could be different.  This all seems quite
> messy and poorly factored.  Can't we do better?  Why do we need to
> involve two copies of allegedly the same pg_class tuple, anyhow?

I wasn't registered at Coverity yet.  Now I've registered and am
waiting for approval to access the PostgreSQL analysis data.

I wonder why Coverity complains about tableam, but not amoptsfn.
Their usage patterns are very similar.

It appears that relation->rd_rel isn't the full copy of pg_class tuple
(see AllocateRelationDesc).  RelationParseRelOptions() is going to
update relation->rd_options, and thus needs a full pg_class tuple to
fetch options out of it.  However, it is really unnecessary to access
both tuples at the same time.  We can use a full tuple, not
relation->rd_rel, in both cases.  See the attached patch.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Japin Li
Дата:
On Tue, 02 Apr 2024 at 07:59, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Mon, Apr 1, 2024 at 7:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Coverity complained about what you did in RelationParseRelOptions
>> in c95c25f9a:
>>
>> *** CID 1595992:  Null pointer dereferences  (FORWARD_NULL)
>> /srv/coverity/git/pgsql-git/postgresql/src/backend/utils/cache/relcache.c: 499 in RelationParseRelOptions()
>> 493
>> 494             /*
>> 495              * Fetch reloptions from tuple; have to use a hardwired descriptor because
>> 496              * we might not have any other for pg_class yet (consider executing this
>> 497              * code for pg_class itself)
>> 498              */
>> >>>     CID 1595992:  Null pointer dereferences  (FORWARD_NULL)
>> >>>     Passing null pointer "tableam" to "extractRelOptions", which dereferences it.
>> 499             options = extractRelOptions(tuple, GetPgClassDescriptor(),
>> 500                                                                     tableam, amoptsfn);
>> 501
>>
>> I see that extractRelOptions only uses the tableam argument for some
>> relkinds, and RelationParseRelOptions does set it up for those
>> relkinds --- but Coverity's complaint isn't without merit, because
>> those two switch statements are looking at *different copies of the
>> relkind*, which in theory could be different.  This all seems quite
>> messy and poorly factored.  Can't we do better?  Why do we need to
>> involve two copies of allegedly the same pg_class tuple, anyhow?
>
> I wasn't registered at Coverity yet.  Now I've registered and am
> waiting for approval to access the PostgreSQL analysis data.
>
> I wonder why Coverity complains about tableam, but not amoptsfn.
> Their usage patterns are very similar.
>
> It appears that relation->rd_rel isn't the full copy of pg_class tuple
> (see AllocateRelationDesc).  RelationParseRelOptions() is going to
> update relation->rd_options, and thus needs a full pg_class tuple to
> fetch options out of it.  However, it is really unnecessary to access
> both tuples at the same time.  We can use a full tuple, not
> relation->rd_rel, in both cases.  See the attached patch.
>
> ------
> Regards,
> Alexander Korotkov


+       Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
+;

There is an additional semicolon in the code.

--
Regards,
Japin Li



Re: Table AM Interface Enhancements

От
Jeff Davis
Дата:
On Sat, 2024-03-30 at 23:33 +0200, Alexander Korotkov wrote:
> I've pushed 0001, 0002 and 0006.

Sorry to jump in to this discussion so late. I had worked on something
like the custom reloptions (0002) in the past, and there were some
complications that don't seem to be addressed in commit c95c25f9af.

* At minimum I think it needs some direction (comments, docs, tests)
that show how it's supposed to be used.

* The bytea returned by the reloptions() method is not in a trivial
format. It's a StdRelOptions struct with string values stored after the
end of the struct. To build the bytea internally, there's some
infrastructure like allocateRelOptStruct() and fillRelOptions(), and
it's not very easy to extend those to support a few custom options.

* If we ever decide to add a string option to StdRdOptions, I think the
design breaks, because the code that looks for those string values
wouldn't know how to skip over the custom options. Perhaps we can just
promise to never do that, but we should make it explicit somehow.

* Most existing heap reloptions (other than fillfactor) are used by
other parts of the system (like autovacuum) so should be considered
valid for any AM. Most AMs will just want to add a handful of their own
options on top, so it would be good to demonstrate how this should be
done.

* There are still places that are inappropriately calling
heap_reloptions directly. For instance, in ProcessUtilitySlow(), it
seems to assume that a toast table is a heap?

Regards,
    Jeff Davis




Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi, Jeff!

On Tue, Apr 2, 2024 at 8:19 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Sat, 2024-03-30 at 23:33 +0200, Alexander Korotkov wrote:
> > I've pushed 0001, 0002 and 0006.
>
> Sorry to jump in to this discussion so late. I had worked on something
> like the custom reloptions (0002) in the past, and there were some
> complications that don't seem to be addressed in commit c95c25f9af.
>
> * At minimum I think it needs some direction (comments, docs, tests)
> that show how it's supposed to be used.
>
> * The bytea returned by the reloptions() method is not in a trivial
> format. It's a StdRelOptions struct with string values stored after the
> end of the struct. To build the bytea internally, there's some
> infrastructure like allocateRelOptStruct() and fillRelOptions(), and
> it's not very easy to extend those to support a few custom options.
>
> * If we ever decide to add a string option to StdRdOptions, I think the
> design breaks, because the code that looks for those string values
> wouldn't know how to skip over the custom options. Perhaps we can just
> promise to never do that, but we should make it explicit somehow.
>
> * Most existing heap reloptions (other than fillfactor) are used by
> other parts of the system (like autovacuum) so should be considered
> valid for any AM. Most AMs will just want to add a handful of their own
> options on top, so it would be good to demonstrate how this should be
> done.
>
> * There are still places that are inappropriately calling
> heap_reloptions directly. For instance, in ProcessUtilitySlow(), it
> seems to assume that a toast table is a heap?

Thank you for the detailed explanation.  This piece definitely needs
more work.  I've just reverted the c95c25f9af.

I don't like the idea that every custom table AM reltoptions should
begin with StdRdOptions.  I would rather introduce the new data
structure with table options, which need to be accessed outside of
table AM.  Then reloptions will be a backbox only directly used in
table AM, while table AM has a freedom on what to store in reloptions
and how to calculate externally-visible options.  What do you think?

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Jeff Davis
Дата:
On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote:
> I don't like the idea that every custom table AM reltoptions should
> begin with StdRdOptions.  I would rather introduce the new data
> structure with table options, which need to be accessed outside of
> table AM.  Then reloptions will be a backbox only directly used in
> table AM, while table AM has a freedom on what to store in reloptions
> and how to calculate externally-visible options.  What do you think?

Hi Alexander!

I agree with all of that. It will take some refactoring to get there,
though.

One idea is to store StdRdOptions like normal, but if an unrecognized
option is found, ask the table AM if it understands the option. In that
case I think we'd just use a different field in pg_class so that it can
use whatever format it wants to represent its options.

Regards,
    Jeff Davis




Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, hackers!

On Tue, 2 Apr 2024 at 19:17, Jeff Davis <pgsql@j-davis.com> wrote:
On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote:
> I don't like the idea that every custom table AM reltoptions should
> begin with StdRdOptions.  I would rather introduce the new data
> structure with table options, which need to be accessed outside of
> table AM.  Then reloptions will be a backbox only directly used in
> table AM, while table AM has a freedom on what to store in reloptions
> and how to calculate externally-visible options.  What do you think?

Hi Alexander!

I agree with all of that. It will take some refactoring to get there,
though.

One idea is to store StdRdOptions like normal, but if an unrecognized
option is found, ask the table AM if it understands the option. In that
case I think we'd just use a different field in pg_class so that it can
use whatever format it wants to represent its options.

Regards,
        Jeff Davis
I tried to rework a patch regarding table am according to the input from Alexander and Jeff.

It splits table reloptions into two categories: 
- common for all tables (stored in a fixed size structure and could be accessed from outside) 
- table-am specific (variable size, parsed and accessed by access method only)

Please find a patch attached.
Вложения

Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi, Pavel!

On Fri, Apr 5, 2024 at 6:58 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Tue, 2 Apr 2024 at 19:17, Jeff Davis <pgsql@j-davis.com> wrote:
>>
>> On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote:
>> > I don't like the idea that every custom table AM reltoptions should
>> > begin with StdRdOptions.  I would rather introduce the new data
>> > structure with table options, which need to be accessed outside of
>> > table AM.  Then reloptions will be a backbox only directly used in
>> > table AM, while table AM has a freedom on what to store in reloptions
>> > and how to calculate externally-visible options.  What do you think?
>>
>> Hi Alexander!
>>
>> I agree with all of that. It will take some refactoring to get there,
>> though.
>>
>> One idea is to store StdRdOptions like normal, but if an unrecognized
>> option is found, ask the table AM if it understands the option. In that
>> case I think we'd just use a different field in pg_class so that it can
>> use whatever format it wants to represent its options.
>>
>> Regards,
>>         Jeff Davis
>
> I tried to rework a patch regarding table am according to the input from Alexander and Jeff.
>
> It splits table reloptions into two categories:
> - common for all tables (stored in a fixed size structure and could be accessed from outside)
> - table-am specific (variable size, parsed and accessed by access method only)

Thank you for your work.  Please, check the revised patch.

It makes CommonRdOptions a separate data structure, not directly
involved in parsing the reloption.  Instead table AM can fill it on
the base of its reloptions or calculate the other way.  Patch comes
with a test module, which comes with heap-based table AM.  This table
AM has "enable_parallel" reloption, which is used as the base to set
the value of CommonRdOptions.parallel_workers.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

On Sun, 7 Apr 2024 at 07:33, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Pavel!

On Fri, Apr 5, 2024 at 6:58 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Tue, 2 Apr 2024 at 19:17, Jeff Davis <pgsql@j-davis.com> wrote:
>>
>> On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote:
>> > I don't like the idea that every custom table AM reltoptions should
>> > begin with StdRdOptions.  I would rather introduce the new data
>> > structure with table options, which need to be accessed outside of
>> > table AM.  Then reloptions will be a backbox only directly used in
>> > table AM, while table AM has a freedom on what to store in reloptions
>> > and how to calculate externally-visible options.  What do you think?
>>
>> Hi Alexander!
>>
>> I agree with all of that. It will take some refactoring to get there,
>> though.
>>
>> One idea is to store StdRdOptions like normal, but if an unrecognized
>> option is found, ask the table AM if it understands the option. In that
>> case I think we'd just use a different field in pg_class so that it can
>> use whatever format it wants to represent its options.
>>
>> Regards,
>>         Jeff Davis
>
> I tried to rework a patch regarding table am according to the input from Alexander and Jeff.
>
> It splits table reloptions into two categories:
> - common for all tables (stored in a fixed size structure and could be accessed from outside)
> - table-am specific (variable size, parsed and accessed by access method only)

Thank you for your work.  Please, check the revised patch.

It makes CommonRdOptions a separate data structure, not directly
involved in parsing the reloption.  Instead table AM can fill it on
the base of its reloptions or calculate the other way.  Patch comes
with a test module, which comes with heap-based table AM.  This table
AM has "enable_parallel" reloption, which is used as the base to set
the value of CommonRdOptions.parallel_workers.
To me, a patch v10 looks good.

I think the comment for RelationData now applies only to rd_options, not to rd_common_options.
>NULLs means "use defaults".

Regards,
Pavel

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

On Sun, 7 Apr 2024 at 12:34, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
Hi, Alexander!

On Sun, 7 Apr 2024 at 07:33, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Pavel!

On Fri, Apr 5, 2024 at 6:58 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Tue, 2 Apr 2024 at 19:17, Jeff Davis <pgsql@j-davis.com> wrote:
>>
>> On Tue, 2024-04-02 at 11:49 +0300, Alexander Korotkov wrote:
>> > I don't like the idea that every custom table AM reltoptions should
>> > begin with StdRdOptions.  I would rather introduce the new data
>> > structure with table options, which need to be accessed outside of
>> > table AM.  Then reloptions will be a backbox only directly used in
>> > table AM, while table AM has a freedom on what to store in reloptions
>> > and how to calculate externally-visible options.  What do you think?
>>
>> Hi Alexander!
>>
>> I agree with all of that. It will take some refactoring to get there,
>> though.
>>
>> One idea is to store StdRdOptions like normal, but if an unrecognized
>> option is found, ask the table AM if it understands the option. In that
>> case I think we'd just use a different field in pg_class so that it can
>> use whatever format it wants to represent its options.
>>
>> Regards,
>>         Jeff Davis
>
> I tried to rework a patch regarding table am according to the input from Alexander and Jeff.
>
> It splits table reloptions into two categories:
> - common for all tables (stored in a fixed size structure and could be accessed from outside)
> - table-am specific (variable size, parsed and accessed by access method only)

Thank you for your work.  Please, check the revised patch.

It makes CommonRdOptions a separate data structure, not directly
involved in parsing the reloption.  Instead table AM can fill it on
the base of its reloptions or calculate the other way.  Patch comes
with a test module, which comes with heap-based table AM.  This table
AM has "enable_parallel" reloption, which is used as the base to set
the value of CommonRdOptions.parallel_workers.
To me, a patch v10 looks good.

I think the comment for RelationData now applies only to rd_options, not to rd_common_options.
>NULLs means "use defaults".

Regards,
Pavel

I made minor changes to the patch. Please find v11 attached.

Regards,
Pavel.
Вложения

Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote:
> I've pushed 0001, 0002 and 0006.

I briefly looked at 27bc1772fc81 and I don't think the state post this commit
makes sense. Before this commit another block based AM could implement analyze
without much code duplication. Now a large portion of analyze.c has to be
copied, because they can't stop acquire_sample_rows() from calling
heapam_scan_analyze_next_block().

I'm quite certain this will break a few out-of-core AMs in a way that can't
easily be fixed.


And even for non-block based AMs, the new interface basically requires
reimplementing all of analyze.c.

What am I missing here?

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi,

On Mon, Apr 8, 2024 at 12:40 AM Andres Freund <andres@anarazel.de> wrote:
> On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote:
> > I've pushed 0001, 0002 and 0006.
>
> I briefly looked at 27bc1772fc81 and I don't think the state post this commit
> makes sense. Before this commit another block based AM could implement analyze
> without much code duplication. Now a large portion of analyze.c has to be
> copied, because they can't stop acquire_sample_rows() from calling
> heapam_scan_analyze_next_block().
>
> I'm quite certain this will break a few out-of-core AMs in a way that can't
> easily be fixed.

I was under the impression there are not so many out-of-core table
AMs, which have non-dummy analysis implementations.  And even if there
are some, duplicating acquire_sample_rows() isn't a big deal.

But given your feedback, I'd like to propose to keep both options
open.  Turn back the block-level API for analyze, but let table-AM
implement its own analyze function.  Then existing out-of-core AMs
wouldn't need to do anything (or probably just set the new API method
to NULL).

> And even for non-block based AMs, the new interface basically requires
> reimplementing all of analyze.c.
.
Non-lock base AM needs to just provide an alternative implementation
for what acquire_sample_rows() does.  This seems like reasonable
effort for me, and surely not reimplementing all of analyze.c.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 8, 2024 at 2:25 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Mon, Apr 8, 2024 at 12:40 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote:
> > > I've pushed 0001, 0002 and 0006.
> >
> > I briefly looked at 27bc1772fc81 and I don't think the state post this commit
> > makes sense. Before this commit another block based AM could implement analyze
> > without much code duplication. Now a large portion of analyze.c has to be
> > copied, because they can't stop acquire_sample_rows() from calling
> > heapam_scan_analyze_next_block().
> >
> > I'm quite certain this will break a few out-of-core AMs in a way that can't
> > easily be fixed.
>
> I was under the impression there are not so many out-of-core table
> AMs, which have non-dummy analysis implementations.  And even if there
> are some, duplicating acquire_sample_rows() isn't a big deal.
>
> But given your feedback, I'd like to propose to keep both options
> open.  Turn back the block-level API for analyze, but let table-AM
> implement its own analyze function.  Then existing out-of-core AMs
> wouldn't need to do anything (or probably just set the new API method
> to NULL).

The attached patch was to illustrate the approach.  It surely needs
some polishing.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-08 02:25:17 +0300, Alexander Korotkov wrote:
> On Mon, Apr 8, 2024 at 12:40 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote:
> > > I've pushed 0001, 0002 and 0006.
> >
> > I briefly looked at 27bc1772fc81 and I don't think the state post this commit
> > makes sense. Before this commit another block based AM could implement analyze
> > without much code duplication. Now a large portion of analyze.c has to be
> > copied, because they can't stop acquire_sample_rows() from calling
> > heapam_scan_analyze_next_block().
> >
> > I'm quite certain this will break a few out-of-core AMs in a way that can't
> > easily be fixed.
>
> I was under the impression there are not so many out-of-core table
> AMs, which have non-dummy analysis implementations.

I know of at least 4 that have some production usage.


> And even if there are some, duplicating acquire_sample_rows() isn't a big
> deal.

I don't agree. The code has evolved a bunch over time, duplicating it into
various AMs is a bad idea.


> But given your feedback, I'd like to propose to keep both options
> open.  Turn back the block-level API for analyze, but let table-AM
> implement its own analyze function.  Then existing out-of-core AMs
> wouldn't need to do anything (or probably just set the new API method
> to NULL).

I think this patch simply hasn't been reviewed even close to careful enough
and should be reverted. It's IMO to late for a redesign.  Sorry for not
looking earlier, I was mostly out sick for the last few months.

I think a dedicated tableam callback for sample acquisition probably makes
sense, but if we want that, we need to provide an easy way for AMs that are
sufficiently block-like to reuse the code, not have two different ways to
implement analyze.

ISTM that ->relation_analyze is quite misleading as a name. For one, it it
just sets some callbacks, no? But more importantly, it sounds like it'd
actually allow to wrap the whole analyze process, rather than just the
acquisition of samples.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander and Andres!

On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi,

On Mon, Apr 8, 2024 at 12:40 AM Andres Freund <andres@anarazel.de> wrote:
> On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote:
> > I've pushed 0001, 0002 and 0006.
>
> I briefly looked at 27bc1772fc81 and I don't think the state post this commit
> makes sense. Before this commit another block based AM could implement analyze
> without much code duplication. Now a large portion of analyze.c has to be
> copied, because they can't stop acquire_sample_rows() from calling
> heapam_scan_analyze_next_block().
>
> I'm quite certain this will break a few out-of-core AMs in a way that can't
> easily be fixed.

I was under the impression there are not so many out-of-core table
AMs, which have non-dummy analysis implementations.  And even if there
are some, duplicating acquire_sample_rows() isn't a big deal.

But given your feedback, I'd like to propose to keep both options
open.  Turn back the block-level API for analyze, but let table-AM
implement its own analyze function.  Then existing out-of-core AMs
wouldn't need to do anything (or probably just set the new API method
to NULL).
I think that providing both new and old interface functions for block-based and non-block based custom am is an excellent compromise. 

The patch v1-0001-Turn-back.. is mainly an undo of part of the 27bc1772fc81 that had turned off _analyze_next_tuple..analyze_next_block for external callers. If some extensions are already adapted to the old interface functions, they are free to still use it.

> And even for non-block based AMs, the new interface basically requires
> reimplementing all of analyze.c.
.
Non-lock base AM needs to just provide an alternative implementation
for what acquire_sample_rows() does.  This seems like reasonable
effort for me, and surely not reimplementing all of analyze.c.
I agree.

Regards, 
Pavel Borisov
Supabase

Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 8, 2024 at 10:18 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> Hi,
>>
>> On Mon, Apr 8, 2024 at 12:40 AM Andres Freund <andres@anarazel.de> wrote:
>> > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote:
>> > > I've pushed 0001, 0002 and 0006.
>> >
>> > I briefly looked at 27bc1772fc81 and I don't think the state post this commit
>> > makes sense. Before this commit another block based AM could implement analyze
>> > without much code duplication. Now a large portion of analyze.c has to be
>> > copied, because they can't stop acquire_sample_rows() from calling
>> > heapam_scan_analyze_next_block().
>> >
>> > I'm quite certain this will break a few out-of-core AMs in a way that can't
>> > easily be fixed.
>>
>> I was under the impression there are not so many out-of-core table
>> AMs, which have non-dummy analysis implementations.  And even if there
>> are some, duplicating acquire_sample_rows() isn't a big deal.
>>
>> But given your feedback, I'd like to propose to keep both options
>> open.  Turn back the block-level API for analyze, but let table-AM
>> implement its own analyze function.  Then existing out-of-core AMs
>> wouldn't need to do anything (or probably just set the new API method
>> to NULL).
>
> I think that providing both new and old interface functions for block-based and non-block based custom am is an
excellentcompromise. 
>
> The patch v1-0001-Turn-back.. is mainly an undo of part of the 27bc1772fc81 that had turned off
_analyze_next_tuple..analyze_next_blockfor external callers. If some extensions are already adapted to the old
interfacefunctions, they are free to still use it. 

Please, check this.  Instead of keeping two APIs, it generalizes
acquire_sample_rows().  The downside is change of
AcquireSampleRowsFunc signature, which would need some changes in FDWs
too.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:


On Mon, 8 Apr 2024 at 13:34, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Apr 8, 2024 at 10:18 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> Hi,
>>
>> On Mon, Apr 8, 2024 at 12:40 AM Andres Freund <andres@anarazel.de> wrote:
>> > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote:
>> > > I've pushed 0001, 0002 and 0006.
>> >
>> > I briefly looked at 27bc1772fc81 and I don't think the state post this commit
>> > makes sense. Before this commit another block based AM could implement analyze
>> > without much code duplication. Now a large portion of analyze.c has to be
>> > copied, because they can't stop acquire_sample_rows() from calling
>> > heapam_scan_analyze_next_block().
>> >
>> > I'm quite certain this will break a few out-of-core AMs in a way that can't
>> > easily be fixed.
>>
>> I was under the impression there are not so many out-of-core table
>> AMs, which have non-dummy analysis implementations.  And even if there
>> are some, duplicating acquire_sample_rows() isn't a big deal.
>>
>> But given your feedback, I'd like to propose to keep both options
>> open.  Turn back the block-level API for analyze, but let table-AM
>> implement its own analyze function.  Then existing out-of-core AMs
>> wouldn't need to do anything (or probably just set the new API method
>> to NULL).
>
> I think that providing both new and old interface functions for block-based and non-block based custom am is an excellent compromise.
>
> The patch v1-0001-Turn-back.. is mainly an undo of part of the 27bc1772fc81 that had turned off _analyze_next_tuple..analyze_next_block for external callers. If some extensions are already adapted to the old interface functions, they are free to still use it.

Please, check this.  Instead of keeping two APIs, it generalizes
acquire_sample_rows().  The downside is change of
AcquireSampleRowsFunc signature, which would need some changes in FDWs
too.
To me, both approaches v1-0001-Turn-back... and v2-0001-Generalize... and patch v2 look good. 

Pavel.

Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander

On Mon, 8 Apr 2024 at 13:59, Pavel Borisov <pashkin.elfe@gmail.com> wrote:


On Mon, 8 Apr 2024 at 13:34, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Apr 8, 2024 at 10:18 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> Hi,
>>
>> On Mon, Apr 8, 2024 at 12:40 AM Andres Freund <andres@anarazel.de> wrote:
>> > On 2024-03-30 23:33:04 +0200, Alexander Korotkov wrote:
>> > > I've pushed 0001, 0002 and 0006.
>> >
>> > I briefly looked at 27bc1772fc81 and I don't think the state post this commit
>> > makes sense. Before this commit another block based AM could implement analyze
>> > without much code duplication. Now a large portion of analyze.c has to be
>> > copied, because they can't stop acquire_sample_rows() from calling
>> > heapam_scan_analyze_next_block().
>> >
>> > I'm quite certain this will break a few out-of-core AMs in a way that can't
>> > easily be fixed.
>>
>> I was under the impression there are not so many out-of-core table
>> AMs, which have non-dummy analysis implementations.  And even if there
>> are some, duplicating acquire_sample_rows() isn't a big deal.
>>
>> But given your feedback, I'd like to propose to keep both options
>> open.  Turn back the block-level API for analyze, but let table-AM
>> implement its own analyze function.  Then existing out-of-core AMs
>> wouldn't need to do anything (or probably just set the new API method
>> to NULL).
>
> I think that providing both new and old interface functions for block-based and non-block based custom am is an excellent compromise.
>
> The patch v1-0001-Turn-back.. is mainly an undo of part of the 27bc1772fc81 that had turned off _analyze_next_tuple..analyze_next_block for external callers. If some extensions are already adapted to the old interface functions, they are free to still use it.

Please, check this.  Instead of keeping two APIs, it generalizes
acquire_sample_rows().  The downside is change of
AcquireSampleRowsFunc signature, which would need some changes in FDWs
too.
To me, both approaches v1-0001-Turn-back... and v2-0001-Generalize... and patch v2 look good. 

Pavel.

I added some changes in comments to better reflect changes in patch v2. See a patch v3 (code unchanged from v2)

Regards,
Pavel 
Вложения

Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-08 11:17:51 +0400, Pavel Borisov wrote:
> On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov <aekorotkov@gmail.com>
> > I was under the impression there are not so many out-of-core table
> > AMs, which have non-dummy analysis implementations.  And even if there
> > are some, duplicating acquire_sample_rows() isn't a big deal.
> >
> > But given your feedback, I'd like to propose to keep both options
> > open.  Turn back the block-level API for analyze, but let table-AM
> > implement its own analyze function.  Then existing out-of-core AMs
> > wouldn't need to do anything (or probably just set the new API method
> > to NULL).
> >
> I think that providing both new and old interface functions for block-based
> and non-block based custom am is an excellent compromise.

I don't agree, that way lies an unmanageable API. To me the new API doesn't
look well polished either, so it's not a question of a smoother transition or
something like that.

I don't think redesigning extension APIs at this stage of the release cycle
makes sense.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
On 2024-04-08 08:37:44 -0700, Andres Freund wrote:
> On 2024-04-08 11:17:51 +0400, Pavel Borisov wrote:
> > On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov <aekorotkov@gmail.com>
> > > I was under the impression there are not so many out-of-core table
> > > AMs, which have non-dummy analysis implementations.  And even if there
> > > are some, duplicating acquire_sample_rows() isn't a big deal.
> > >
> > > But given your feedback, I'd like to propose to keep both options
> > > open.  Turn back the block-level API for analyze, but let table-AM
> > > implement its own analyze function.  Then existing out-of-core AMs
> > > wouldn't need to do anything (or probably just set the new API method
> > > to NULL).
> > >
> > I think that providing both new and old interface functions for block-based
> > and non-block based custom am is an excellent compromise.
>
> I don't agree, that way lies an unmanageable API. To me the new API doesn't
> look well polished either, so it's not a question of a smoother transition or
> something like that.
>
> I don't think redesigning extension APIs at this stage of the release cycle
> makes sense.

Wait, you already pushed an API redesign? With a design that hasn't even seen
the list from what I can tell? Without even mentioning that on the list? You
got to be kidding me.



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 8, 2024, 19:08 Andres Freund <andres@anarazel.de> wrote:
On 2024-04-08 08:37:44 -0700, Andres Freund wrote:
> On 2024-04-08 11:17:51 +0400, Pavel Borisov wrote:
> > On Mon, 8 Apr 2024 at 03:25, Alexander Korotkov <aekorotkov@gmail.com>
> > > I was under the impression there are not so many out-of-core table
> > > AMs, which have non-dummy analysis implementations.  And even if there
> > > are some, duplicating acquire_sample_rows() isn't a big deal.
> > >
> > > But given your feedback, I'd like to propose to keep both options
> > > open.  Turn back the block-level API for analyze, but let table-AM
> > > implement its own analyze function.  Then existing out-of-core AMs
> > > wouldn't need to do anything (or probably just set the new API method
> > > to NULL).
> > >
> > I think that providing both new and old interface functions for block-based
> > and non-block based custom am is an excellent compromise.
>
> I don't agree, that way lies an unmanageable API. To me the new API doesn't
> look well polished either, so it's not a question of a smoother transition or
> something like that.
>
> I don't think redesigning extension APIs at this stage of the release cycle
> makes sense.

Wait, you already pushed an API redesign? With a design that hasn't even seen
the list from what I can tell? Without even mentioning that on the list? You
got to be kidding me.

Yes, it was my mistake. I got rushing trying to fit this to FF, even doing significant changes just before commit.
I'll revert this later today.

------
Regards,
Alexander Korotkov

Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Yes, it was my mistake. I got rushing trying to fit this to FF, even doing significant changes just before commit.
> I'll revert this later today.

Alexander,

Exactly how much is getting reverted here? I see these, all since March 23rd:

dd1f6b0c17 Provide a way block-level table AMs could re-use
acquire_sample_rows()
9bd99f4c26 Custom reloptions for table AM
97ce821e3e Fix the parameters order for
TableAmRoutine.relation_copy_for_cluster()
867cc7b6dd Revert "Custom reloptions for table AM"
b1484a3f19 Let table AM insertion methods control index insertion
c95c25f9af Custom reloptions for table AM
27bc1772fc Generalize relation analyze in table AM interface
87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete()
c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot
02eb07ea89 Allow table AM to store complex data structures in rd_amcache

I'm not really feeling very good about all of this, because:

- 87985cc925 was previously committed as 11470f544e on March 23, 2023,
and almost immediately reverted. Now you tried again on March 26,
2024. I know there was a bunch of rework in the middle, but there are
times in the year that things can be committed other than right before
the feature freeze. Like, don't wait a whole year for the next attempt
and then again do it right before the cutoff.

- The Discussion links in the commit messages do not seem to stand for
the proposition that these particular patches ought to be committed in
this form. Some of them are just links to the messages where the patch
was originally posted, which is probably not against policy or
anything, but it'd be nicer to see links to versions of the patch with
which people are, in nearby emails, agreeing. Even worse, some of
these are links to emails where somebody said, "hey, some earlier
commit does not look good." In particular,
dd1f6b0c172a643a73d6b71259fa2d10378b39eb has a discussion link where
Andres complains about 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa, but
it's not clear how that justifies the new commit.

- The commit message for 867cc7b6dd says "This reverts commit
c95c25f9af4bc77f2f66a587735c50da08c12b37 due to multiple design issues
spotted after commit." That's not a very good justification for then
trying again 6 days later with 9bd99f4c26, and it's *definitely* not a
good justification for there being no meaningful discussion links in
the commit message for 9bd99f4c26. They're just the same links you had
in the previous attempt, so it's pretty hard for anybody to understand
what got fixed and whether all of the concerns were really addressed.
Just looking over the commit, it's pretty hard to understand what is
being changed and why: there's not a lot of comment updates, there's
no documentation changes, and there's not a lot of explanation in the
commit message either. Even if this feature is great and all the code
is perfect now, it's going to be hard for anyone to figure out how to
use it.

97ce821e3e looks like a clear bug fix to me, but I wonder if the rest
of this should all just be reverted, with a ban on ever trying it
again after March 1 of any year. I'd like to believe that there are
only bookkeeping problems here, and that there was in fact clear
agreement that all of these changes should be made in this form, and
that the commit messages simply failed to reference the most relevant
emails. But what I fear, especially in view of Andres's remarks, is
that these commits were done in haste without adequate consensus, and
I think that's a serious problem.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 8, 2024 at 9:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Yes, it was my mistake. I got rushing trying to fit this to FF, even doing significant changes just before commit.
> > I'll revert this later today.

It appears to be a non-trivial revert, because 041b96802e already
revised the relation analyze after 27bc1772fc.  That is, I would need
to "backport" 041b96802e.  Sorry, I'm too tired to do this today.
I'll come back to this tomorrow.

> Alexander,
>
> Exactly how much is getting reverted here? I see these, all since March 23rd:
>
> dd1f6b0c17 Provide a way block-level table AMs could re-use
> acquire_sample_rows()
> 9bd99f4c26 Custom reloptions for table AM
> 97ce821e3e Fix the parameters order for
> TableAmRoutine.relation_copy_for_cluster()
> 867cc7b6dd Revert "Custom reloptions for table AM"
> b1484a3f19 Let table AM insertion methods control index insertion
> c95c25f9af Custom reloptions for table AM
> 27bc1772fc Generalize relation analyze in table AM interface
> 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete()
> c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot
> 02eb07ea89 Allow table AM to store complex data structures in rd_amcache

It would be discouraging to revert all of this.  Some items are very
simple, some items get a lot of work.  I'll come back tomorrow and
answer all your points.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 8, 2024 at 9:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Yes, it was my mistake. I got rushing trying to fit this to FF, even doing significant changes just before commit.
> > I'll revert this later today.

The patch to revert is attached.  Given that revert touches the work
done in 041b96802e, I think it needs some feedback before push.

> Alexander,
>
> Exactly how much is getting reverted here? I see these, all since March 23rd:
>
> dd1f6b0c17 Provide a way block-level table AMs could re-use
> acquire_sample_rows()
> 9bd99f4c26 Custom reloptions for table AM
> 97ce821e3e Fix the parameters order for
> TableAmRoutine.relation_copy_for_cluster()
> 867cc7b6dd Revert "Custom reloptions for table AM"
> b1484a3f19 Let table AM insertion methods control index insertion
> c95c25f9af Custom reloptions for table AM
> 27bc1772fc Generalize relation analyze in table AM interface
> 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete()
> c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot
> 02eb07ea89 Allow table AM to store complex data structures in rd_amcache
>
> I'm not really feeling very good about all of this, because:
>
> - 87985cc925 was previously committed as 11470f544e on March 23, 2023,
> and almost immediately reverted. Now you tried again on March 26,
> 2024. I know there was a bunch of rework in the middle, but there are
> times in the year that things can be committed other than right before
> the feature freeze. Like, don't wait a whole year for the next attempt
> and then again do it right before the cutoff.

I agree with the facts.  But I have a different interpretation on
this.  The patch was committed as 11470f544e on March 23, 2023, then
reverted on April 3.  I've proposed the revised version, but Andres
complained that this is the new API design days before FF.  Then the
patch with this design was published in the thread for the year with
periodical rebases.  So, I think I expressed my intention with that
design before 2023 FF, nobody prevented me from expressing objections
or other feedback during the year.  Then I realized that 2024 FF is
approaching and decided to give this another try for pg18.

But I don't yet see it's wrong with this patch.  I waited a year for
feedback.  I waited 2 days after saying "I will push this if no
objections". Given your feedback now, I get that it would be better to
do another attempt to commit this earlier.

I admit my mistake with dd1f6b0c17.  I get rushed trying to fix the
things actually making things worse.  I apologise for this.  But if
I'm forced to revert 87985cc925 without even hearing any reasonable
critics besides imperfection of timing, I feel like this is the
punishment for my mistake with dd1f6b0c17.  Pretty unreasonable
punishment in my view.

> - The Discussion links in the commit messages do not seem to stand for
> the proposition that these particular patches ought to be committed in
> this form. Some of them are just links to the messages where the patch
> was originally posted, which is probably not against policy or
> anything, but it'd be nicer to see links to versions of the patch with
> which people are, in nearby emails, agreeing. Even worse, some of
> these are links to emails where somebody said, "hey, some earlier
> commit does not look good." In particular,
> dd1f6b0c172a643a73d6b71259fa2d10378b39eb has a discussion link where
> Andres complains about 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa, but
> it's not clear how that justifies the new commit.

I have to repeat again, that I admit my mistake with dd1f6b0c17,
apologize for that, and make my own conclusions to not repeat this.
But dd1f6b0c17 seems to be the only one that has a link to the message
with complains.  I went through the list of commits above, it seems
that others have just linked to the first message of the thread.
Probably, there is a lack of consensus for some of them.  But I never
heard about a policy to link not just the discussion start, but also
exact messages expressing agreeing.  And I didn't see others doing
that.

> - The commit message for 867cc7b6dd says "This reverts commit
> c95c25f9af4bc77f2f66a587735c50da08c12b37 due to multiple design issues
> spotted after commit." That's not a very good justification for then
> trying again 6 days later with 9bd99f4c26, and it's *definitely* not a
> good justification for there being no meaningful discussion links in
> the commit message for 9bd99f4c26. They're just the same links you had
> in the previous attempt, so it's pretty hard for anybody to understand
> what got fixed and whether all of the concerns were really addressed.
> Just looking over the commit, it's pretty hard to understand what is
> being changed and why: there's not a lot of comment updates, there's
> no documentation changes, and there's not a lot of explanation in the
> commit message either. Even if this feature is great and all the code
> is perfect now, it's going to be hard for anyone to figure out how to
> use it.

1) 9bd99f4c26 comprises the reworked patch after working with notes
from Jeff Davis.  I agree it would be better to wait for him to
express explicit agreement.  Before reverting this, I would prefer to
hear his opinion.
2) One of the issues here is that table AM API doesn't have
documentation, it has just a very brief page which doesn't go deep
explaining particular API methods.  I have heard a lot of complains
about that from users attempting to write table access methods.  It's
now too late to complain about that (but if I had a wisdom of now back
during pg12 development I would definitely object against table AM API
being committed at that shape).  I understand I could be more
proactive and propose a patch with that documentation.

> 97ce821e3e looks like a clear bug fix to me, but I wonder if the rest
> of this should all just be reverted, with a ban on ever trying it
> again after March 1 of any year.

Do you propose a ban from March 1 to the end of any year?  I think the
first doesn't make sense, because it leaves only 2 months a year for
the work.  That would create a potential rush during these 2 month and
could serve exactly opposite to the intention.  So, I guess this means
a ban from March 1 to the FF of any year.  The situation now is quite
unpleasant for me.  So I'm far from repeating this next year.
However, if there should be a formal ban, it should be specified.
Does it relate to the patches I've pushed, all patches in this thread,
all similar patches, all table AM patches, or other API patches?

Surely, I'm an interested party and can't be impartial.  But I think
it would be nice if we introduce some general rules based on this
experience.  Could we have some API freeze date some time before the
feature freeze?

> I'd like to believe that there are
> only bookkeeping problems here, and that there was in fact clear
> agreement that all of these changes should be made in this form, and
> that the commit messages simply failed to reference the most relevant
> emails. But what I fear, especially in view of Andres's remarks, is
> that these commits were done in haste without adequate consensus, and
> I think that's a serious problem.

This thread had a lot of patches for table AM API.  My intention for
pg17 was to commit the easiest and least contradictory of them.  I
understand there should be more consensus for some of them and
committing dd1f6b0c17 instead of reverting 27bc1772fc was a mistake.
But I don't feel good about reverting everything in a row without
clear feedback.

------
Regards,
Alexander Korotkov

Вложения

Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Wed, Apr 10, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> I agree with the facts.  But I have a different interpretation on
> this.  The patch was committed as 11470f544e on March 23, 2023, then
> reverted on April 3.  I've proposed the revised version, but Andres
> complained that this is the new API design days before FF.

Well, his first complaint that your committed patch was full of bugs:

https://www.postgresql.org/message-id/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de

When you commit a patch and another committer writes a post-commit
review saying that your patch has so many serious problems that he
gave up on reviewing before enumerating all of them, that's a really
bad sign. That should be a strong signal to you to step back and take
a close look at whether you really understand the area of the code
that you're touching well enough to be doing whatever it is that
you're doing. If I got a review like that, I would have reverted the
patch instantly, given up for the release cycle, possibly given up on
the patch permanently, and most definitely not tried again to commit
unless I was absolutely certain that I'd learned a lot in the meantime
*and* had the agreement of the committer who wrote that review (or
maybe some other committer who was acknowledged as an expert in that
area of the code).

What you did instead is try to do a bunch of post-commit fixup in a
desperate rush right before feature freeze, to which Andres
understandably objected. But that was your second mistake, not your
first one.

> Then the
> patch with this design was published in the thread for the year with
> periodical rebases.  So, I think I expressed my intention with that
> design before 2023 FF, nobody prevented me from expressing objections
> or other feedback during the year.  Then I realized that 2024 FF is
> approaching and decided to give this another try for pg18.

This doesn't seem to match the facts as I understand them. It appears
to me that there was no activity on the thread from April until
November. The message in November was not written by you. Your first
post to the thread after April of 2023 was on March 19, 2024. Five
days later you said you wanted to commit. That doesn't look to me like
you worked diligently on the patch set throughout the year and other
people had reasonable notice that you planned to get the work done
this cycle. It looks like you ignored the patch for 11 months and then
committed it without any real further feedback from anyone. True,
Pavel did post and say that he thought the patches were in good shape.
But you could hardly take that as evidence that Andres was now content
that the problems he'd raised earlier had been fixed, because (a)
Pavel had also been involved beforehand and had not raised the
concerns that Andres later raised and (b) Pavel wrote nothing in his
email specifically about why he thought your changes or his had
resolved those concerns. I certainly agree that Andres doesn't always
give as much review feedback as I'd like to have from him in, and it's
also true that he doesn't always give that feedback as quickly as I'd
like to have it ... but you know what?

It's not Andres's job to make sure my patches are not broken. It's my
job. That applies to the patches I write, and the patches written by
other people that I commit. If I commit something and it turns out
that it is broken, that's my bad. If I commit something and it turns
out that it does not have consensus, that is also my bad. It is not
the fault of the other people for not helping me get my patches to a
state where they are up to project standard. It is my fault, and my
fault alone, for committing something that was not ready. Now that
does not mean that it isn't frustrating when I can't get the help I
need. It is extremely frustrating. But the solution is not to commit
anyway and then blame the other people for not providing feedback.

I mean, committing without explicit agreement from someone else is OK
if you're pretty sure that you've got everything sorted out correctly.
But I don't think that the paper trail here supports the narrative
that you worked on this diligently throughout the year and had every
reason to believe it would be acceptable to the community. If I'd
looked at this thread, I would have concluded that you'd abandoned the
project. I would have expected that, when you picked it up again,
there would be a series of emails over a period of time carefully
working through the various issues that had been raised, inviting
specific commentary on specific discussion points, and generally
refining the work, and then maybe a suggestion of a commit at the end.
I would not have expected an email or two basically saying "well,
seems like it's all fixed now," followed by a commit.

> Do you propose a ban from March 1 to the end of any year?  I think the
> first doesn't make sense, because it leaves only 2 months a year for
> the work.  That would create a potential rush during these 2 month and
> could serve exactly opposite to the intention.  So, I guess this means
> a ban from March 1 to the FF of any year.  The situation now is quite
> unpleasant for me.  So I'm far from repeating this next year.
> However, if there should be a formal ban, it should be specified.
> Does it relate to the patches I've pushed, all patches in this thread,
> all similar patches, all table AM patches, or other API patches?

I meant from March 1 to feature freeze, but maybe I should have
proposed that you shouldn't ever commit these patches. The more I look
at this, the less happy I am with how you did it.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:
Hi, Alexander!

On Wed, 10 Apr 2024 at 16:20, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Apr 8, 2024 at 9:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Yes, it was my mistake. I got rushing trying to fit this to FF, even doing significant changes just before commit.
> > I'll revert this later today.

The patch to revert is attached.  Given that revert touches the work
done in 041b96802e, I think it needs some feedback before push.

> Alexander,
>
> Exactly how much is getting reverted here? I see these, all since March 23rd:
>
> dd1f6b0c17 Provide a way block-level table AMs could re-use
> acquire_sample_rows()
> 9bd99f4c26 Custom reloptions for table AM
> 97ce821e3e Fix the parameters order for
> TableAmRoutine.relation_copy_for_cluster()
> 867cc7b6dd Revert "Custom reloptions for table AM"
> b1484a3f19 Let table AM insertion methods control index insertion
> c95c25f9af Custom reloptions for table AM
> 27bc1772fc Generalize relation analyze in table AM interface
> 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete()
> c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot
> 02eb07ea89 Allow table AM to store complex data structures in rd_amcache
>
> I'm not really feeling very good about all of this, because:
>
> - 87985cc925 was previously committed as 11470f544e on March 23, 2023,
> and almost immediately reverted. Now you tried again on March 26,
> 2024. I know there was a bunch of rework in the middle, but there are
> times in the year that things can be committed other than right before
> the feature freeze. Like, don't wait a whole year for the next attempt
> and then again do it right before the cutoff.

I agree with the facts.  But I have a different interpretation on
this.  The patch was committed as 11470f544e on March 23, 2023, then
reverted on April 3.  I've proposed the revised version, but Andres
complained that this is the new API design days before FF.  Then the
patch with this design was published in the thread for the year with
periodical rebases.  So, I think I expressed my intention with that
design before 2023 FF, nobody prevented me from expressing objections
or other feedback during the year.  Then I realized that 2024 FF is
approaching and decided to give this another try for pg18.

But I don't yet see it's wrong with this patch.  I waited a year for
feedback.  I waited 2 days after saying "I will push this if no
objections". Given your feedback now, I get that it would be better to
do another attempt to commit this earlier.

I admit my mistake with dd1f6b0c17.  I get rushed trying to fix the
things actually making things worse.  I apologise for this.  But if
I'm forced to revert 87985cc925 without even hearing any reasonable
critics besides imperfection of timing, I feel like this is the
punishment for my mistake with dd1f6b0c17.  Pretty unreasonable
punishment in my view.

> - The Discussion links in the commit messages do not seem to stand for
> the proposition that these particular patches ought to be committed in
> this form. Some of them are just links to the messages where the patch
> was originally posted, which is probably not against policy or
> anything, but it'd be nicer to see links to versions of the patch with
> which people are, in nearby emails, agreeing. Even worse, some of
> these are links to emails where somebody said, "hey, some earlier
> commit does not look good." In particular,
> dd1f6b0c172a643a73d6b71259fa2d10378b39eb has a discussion link where
> Andres complains about 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa, but
> it's not clear how that justifies the new commit.

I have to repeat again, that I admit my mistake with dd1f6b0c17,
apologize for that, and make my own conclusions to not repeat this.
But dd1f6b0c17 seems to be the only one that has a link to the message
with complains.  I went through the list of commits above, it seems
that others have just linked to the first message of the thread.
Probably, there is a lack of consensus for some of them.  But I never
heard about a policy to link not just the discussion start, but also
exact messages expressing agreeing.  And I didn't see others doing
that.

> - The commit message for 867cc7b6dd says "This reverts commit
> c95c25f9af4bc77f2f66a587735c50da08c12b37 due to multiple design issues
> spotted after commit." That's not a very good justification for then
> trying again 6 days later with 9bd99f4c26, and it's *definitely* not a
> good justification for there being no meaningful discussion links in
> the commit message for 9bd99f4c26. They're just the same links you had
> in the previous attempt, so it's pretty hard for anybody to understand
> what got fixed and whether all of the concerns were really addressed.
> Just looking over the commit, it's pretty hard to understand what is
> being changed and why: there's not a lot of comment updates, there's
> no documentation changes, and there's not a lot of explanation in the
> commit message either. Even if this feature is great and all the code
> is perfect now, it's going to be hard for anyone to figure out how to
> use it.

1) 9bd99f4c26 comprises the reworked patch after working with notes
from Jeff Davis.  I agree it would be better to wait for him to
express explicit agreement.  Before reverting this, I would prefer to
hear his opinion.
2) One of the issues here is that table AM API doesn't have
documentation, it has just a very brief page which doesn't go deep
explaining particular API methods.  I have heard a lot of complains
about that from users attempting to write table access methods.  It's
now too late to complain about that (but if I had a wisdom of now back
during pg12 development I would definitely object against table AM API
being committed at that shape).  I understand I could be more
proactive and propose a patch with that documentation.

> 97ce821e3e looks like a clear bug fix to me, but I wonder if the rest
> of this should all just be reverted, with a ban on ever trying it
> again after March 1 of any year.

Do you propose a ban from March 1 to the end of any year?  I think the
first doesn't make sense, because it leaves only 2 months a year for
the work.  That would create a potential rush during these 2 month and
could serve exactly opposite to the intention.  So, I guess this means
a ban from March 1 to the FF of any year.  The situation now is quite
unpleasant for me.  So I'm far from repeating this next year.
However, if there should be a formal ban, it should be specified.
Does it relate to the patches I've pushed, all patches in this thread,
all similar patches, all table AM patches, or other API patches?

Surely, I'm an interested party and can't be impartial.  But I think
it would be nice if we introduce some general rules based on this
experience.  Could we have some API freeze date some time before the
feature freeze?

> I'd like to believe that there are
> only bookkeeping problems here, and that there was in fact clear
> agreement that all of these changes should be made in this form, and
> that the commit messages simply failed to reference the most relevant
> emails. But what I fear, especially in view of Andres's remarks, is
> that these commits were done in haste without adequate consensus, and
> I think that's a serious problem.

This thread had a lot of patches for table AM API.  My intention for
pg17 was to commit the easiest and least contradictory of them.  I
understand there should be more consensus for some of them and
committing dd1f6b0c17 instead of reverting 27bc1772fc was a mistake.
But I don't feel good about reverting everything in a row without
clear feedback.

------
Regards,
Alexander Korotkov

In my view, the actual list of what has raised discussion is:
dd1f6b0c17 Provide a way block-level table AMs could re-use acquire_sample_rows()
27bc1772fc Generalize relation analyze in table AM interface

Proposals to revert the other patches in a wholesale way look to me like an ill-performed continuation of a discussion [1]. I can't believe that "Let's select which commits close to FF looks worse than the others" based on whereabouts, not patch contents is a good and productive way for the community to use.

At the same time if Andres, who is the most experienced person in the scope of access methods is willing to give his post-commit re-review of any of the committed patches and will recommend some of them reverted, it would be a good sensible input to act accordingly.
patch 



Re: Table AM Interface Enhancements

От
Joe Conway
Дата:
On 4/10/24 09:19, Robert Haas wrote:
> When you commit a patch and another committer writes a post-commit
> review saying that your patch has so many serious problems that he
> gave up on reviewing before enumerating all of them, that's a really
> bad sign. That should be a strong signal to you to step back and take
> a close look at whether you really understand the area of the code
> that you're touching well enough to be doing whatever it is that
> you're doing. If I got a review like that, I would have reverted the
> patch instantly, given up for the release cycle, possibly given up on
> the patch permanently, and most definitely not tried again to commit
> unless I was absolutely certain that I'd learned a lot in the meantime
> *and* had the agreement of the committer who wrote that review (or
> maybe some other committer who was acknowledged as an expert in that
> area of the code).

<snip>

> It's not Andres's job to make sure my patches are not broken. It's my
> job. That applies to the patches I write, and the patches written by
> other people that I commit. If I commit something and it turns out
> that it is broken, that's my bad. If I commit something and it turns
> out that it does not have consensus, that is also my bad. It is not
> the fault of the other people for not helping me get my patches to a
> state where they are up to project standard. It is my fault, and my
> fault alone, for committing something that was not ready. Now that
> does not mean that it isn't frustrating when I can't get the help I
> need. It is extremely frustrating. But the solution is not to commit
> anyway and then blame the other people for not providing feedback.

+many

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Wed, Apr 10, 2024 at 4:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Apr 10, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > I agree with the facts.  But I have a different interpretation on
> > this.  The patch was committed as 11470f544e on March 23, 2023, then
> > reverted on April 3.  I've proposed the revised version, but Andres
> > complained that this is the new API design days before FF.
>
> Well, his first complaint that your committed patch was full of bugs:
>
> https://www.postgresql.org/message-id/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de
>
> When you commit a patch and another committer writes a post-commit
> review saying that your patch has so many serious problems that he
> gave up on reviewing before enumerating all of them, that's a really
> bad sign. That should be a strong signal to you to step back and take
> a close look at whether you really understand the area of the code
> that you're touching well enough to be doing whatever it is that
> you're doing. If I got a review like that, I would have reverted the
> patch instantly, given up for the release cycle, possibly given up on
> the patch permanently, and most definitely not tried again to commit
> unless I was absolutely certain that I'd learned a lot in the meantime
> *and* had the agreement of the committer who wrote that review (or
> maybe some other committer who was acknowledged as an expert in that
> area of the code).
>
> What you did instead is try to do a bunch of post-commit fixup in a
> desperate rush right before feature freeze, to which Andres
> understandably objected. But that was your second mistake, not your
> first one.
>
> > Then the
> > patch with this design was published in the thread for the year with
> > periodical rebases.  So, I think I expressed my intention with that
> > design before 2023 FF, nobody prevented me from expressing objections
> > or other feedback during the year.  Then I realized that 2024 FF is
> > approaching and decided to give this another try for pg18.
>
> This doesn't seem to match the facts as I understand them. It appears
> to me that there was no activity on the thread from April until
> November. The message in November was not written by you. Your first
> post to the thread after April of 2023 was on March 19, 2024. Five
> days later you said you wanted to commit. That doesn't look to me like
> you worked diligently on the patch set throughout the year and other
> people had reasonable notice that you planned to get the work done
> this cycle. It looks like you ignored the patch for 11 months and then
> committed it without any real further feedback from anyone. True,
> Pavel did post and say that he thought the patches were in good shape.
> But you could hardly take that as evidence that Andres was now content
> that the problems he'd raised earlier had been fixed, because (a)
> Pavel had also been involved beforehand and had not raised the
> concerns that Andres later raised and (b) Pavel wrote nothing in his
> email specifically about why he thought your changes or his had
> resolved those concerns. I certainly agree that Andres doesn't always
> give as much review feedback as I'd like to have from him in, and it's
> also true that he doesn't always give that feedback as quickly as I'd
> like to have it ... but you know what?
>
> It's not Andres's job to make sure my patches are not broken. It's my
> job. That applies to the patches I write, and the patches written by
> other people that I commit. If I commit something and it turns out
> that it is broken, that's my bad. If I commit something and it turns
> out that it does not have consensus, that is also my bad. It is not
> the fault of the other people for not helping me get my patches to a
> state where they are up to project standard. It is my fault, and my
> fault alone, for committing something that was not ready. Now that
> does not mean that it isn't frustrating when I can't get the help I
> need. It is extremely frustrating. But the solution is not to commit
> anyway and then blame the other people for not providing feedback.
>
> I mean, committing without explicit agreement from someone else is OK
> if you're pretty sure that you've got everything sorted out correctly.
> But I don't think that the paper trail here supports the narrative
> that you worked on this diligently throughout the year and had every
> reason to believe it would be acceptable to the community. If I'd
> looked at this thread, I would have concluded that you'd abandoned the
> project. I would have expected that, when you picked it up again,
> there would be a series of emails over a period of time carefully
> working through the various issues that had been raised, inviting
> specific commentary on specific discussion points, and generally
> refining the work, and then maybe a suggestion of a commit at the end.
> I would not have expected an email or two basically saying "well,
> seems like it's all fixed now," followed by a commit.

Robert, I appreciate your feedback.  I don't say I agree with
everything.  For example, I definitely wasn't going to place the blame
on others for not giving feedback.  My point was to show that it
wasn't so that I've committed that patch without taking feedback into
account.  But arguing on every point doesn't feel reasonable for now.
I would better share particular conclusions I made:
1) I shouldn't argue too much about reverting patches especially with
committers more experienced with relevant part of codebase.
2) The fact that previous feedback is taken into account should be
expressed more explicitly everywhere: in comments, commit messages,
mailing list messages etc.

But I have to mention that even that I've committed table AM stuff
close to the FF, there has been quite amount of depended work
committed.  So, revert of these patches is promising to be not
something immediate and easy, which requires just the decision.  It
would touch others work.  And and revert patches might also need
review.  I get the point that patches got lack of consensus.  But in
terms of efforts (not my efforts) it's probably makes sense to get
them some post-commit review.

> > Do you propose a ban from March 1 to the end of any year?  I think the
> > first doesn't make sense, because it leaves only 2 months a year for
> > the work.  That would create a potential rush during these 2 month and
> > could serve exactly opposite to the intention.  So, I guess this means
> > a ban from March 1 to the FF of any year.  The situation now is quite
> > unpleasant for me.  So I'm far from repeating this next year.
> > However, if there should be a formal ban, it should be specified.
> > Does it relate to the patches I've pushed, all patches in this thread,
> > all similar patches, all table AM patches, or other API patches?
>
> I meant from March 1 to feature freeze, but maybe I should have
> proposed that you shouldn't ever commit these patches. The more I look
> at this, the less happy I am with how you did it.

Robert, look.  Last year I went through the arrest for expressing my
opinion.  I that was not what normal arrest should look like, but a
period of survival.  My family went through a period of fear, struggle
and uncertainty.  Now, we're healthy and safe, but there is still
uncertainty given asylum seeker status.  During all this period, I
have to just obey, agree with everything, lie that I apologize about
things I don't apologize.  I had to do this, because the price of
expressing myself was not just my life, but also health, freedom and
well-being of my family.

I owe you great respect for all your work for PostgreSQL, and
especially for your efforts on getting things organized.  But it
wouldn't work the way you increase my potential punishment and I just
say that I'm obey and you're right about everything.  You may even
initiate the procedure of my exclusion from committers (no idea what
the procedure is), ban from the list etc.  I see you express many
valuable points, but my view is not exactly same as yours.  And like a
conclusion to some as result of discussion not threats.

I feel the sense of blame and fear in latest discussions, and I don't
like it.  That's OK to place the blame from time to time.  But I would
like to add here more joy and respect (and I'm sorry I personally
didn't do enough in this matter).  It's important get things right
etc.  But in long term relationships may mean more.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-08 14:54:46 -0400, Robert Haas wrote:
> Exactly how much is getting reverted here? I see these, all since March 23rd:

IMO:


> dd1f6b0c17 Provide a way block-level table AMs could re-use
> acquire_sample_rows()

Should be reverted.


> 9bd99f4c26 Custom reloptions for table AM

Hm. There are some oddities here:

- It doesn't seem great that relcache.c now needs to know about the default
  values for all kinds of reloptions.

- why is there table_reloptions() and tableam_reloptions()?

- Why does extractRelOptions() need a TableAmRoutine parameter, extracted by a
  caller, instead of doing that work itself?



> 97ce821e3e Fix the parameters order for
> TableAmRoutine.relation_copy_for_cluster()

Shouldn't be, this is a clear fix.


> b1484a3f19 Let table AM insertion methods control index insertion

I'm not sure. I'm not convinced this is right, nor the opposite. If the
tableam takes control of index insertion, shouldn't nodeModifyTuple know this
earlier, so it doesn't prepare a bunch of index insertion state?  Also,
there's pretty much no motivating explanation in the commit.


> 27bc1772fc Generalize relation analyze in table AM interface

Should be reverted.


> 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete()

Strongly suspect this should be reverted. The last time this was committed it
was far from ready. It's very easy to cause corruption due to subtle bugs in
this area.


> c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot

If the AM returns a different slot, who is responsible for cleaning it up? And
how is creating a new slot for every insert not going to be a measurable
overhead?


> 02eb07ea89 Allow table AM to store complex data structures in rd_amcache

I am doubtful this is right.  Is it really sufficient to have a callback for
freeing? What happens when relcache entries are swapped as part of a rebuild?
That works for "flat" caches, but I don't immediately see how it works for
more complicated datastructures.  At least from the commit message it's hard
to evaluate how this actually intended to be used.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Wed, Apr 10, 2024 at 12:36 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> But I have to mention that even that I've committed table AM stuff
> close to the FF, there has been quite amount of depended work
> committed.  So, revert of these patches is promising to be not
> something immediate and easy, which requires just the decision.  It
> would touch others work.  And and revert patches might also need
> review.  I get the point that patches got lack of consensus.  But in
> terms of efforts (not my efforts) it's probably makes sense to get
> them some post-commit review.

That is somewhat fair, but it is also a lot of work. There are
multiple people asking for you to revert things on multiple threads,
and figuring out all of the revert requests and trying to come to some
consensus about what should be done in each case is going to take an
enormous amount of time. I know you've done lots of good work on
PostgreSQL in the past and I respect that, but I think you also have
to realize that you're asking other people to spend a LOT of time
figuring out what to do about the current situation. I see Andres has
posted more specifically about what he thinks should happen to each of
the table AM patches and I am willing to defer to his opinion, but we
need to make some quick decisions here to either keep things or take
them out. Extensive reworks after feature freeze should not be an
option that is on the table; that's what makes it a freeze.

I also do not think I really believe that there's been so much stuff
committed that a blanket revert would be all that hard to carry off,
if that were the option that the community ended up preferring.

> Robert, look.  Last year I went through the arrest for expressing my
> opinion.  I that was not what normal arrest should look like, but a
> period of survival.  My family went through a period of fear, struggle
> and uncertainty.  Now, we're healthy and safe, but there is still
> uncertainty given asylum seeker status.  During all this period, I
> have to just obey, agree with everything, lie that I apologize about
> things I don't apologize.  I had to do this, because the price of
> expressing myself was not just my life, but also health, freedom and
> well-being of my family.
>
> I owe you great respect for all your work for PostgreSQL, and
> especially for your efforts on getting things organized.  But it
> wouldn't work the way you increase my potential punishment and I just
> say that I'm obey and you're right about everything.  You may even
> initiate the procedure of my exclusion from committers (no idea what
> the procedure is), ban from the list etc.  I see you express many
> valuable points, but my view is not exactly same as yours.  And like a
> conclusion to some as result of discussion not threats.
>
> I feel the sense of blame and fear in latest discussions, and I don't
> like it.  That's OK to place the blame from time to time.  But I would
> like to add here more joy and respect (and I'm sorry I personally
> didn't do enough in this matter).  It's important get things right
> etc.  But in long term relationships may mean more.

I am not sure how to respond to this. On a personal level, I am sorry
to hear that you were arrested and, if I can be of some help to you,
we can discuss that off-list. However, if you're suggesting that there
is some kind of equivalence between me criticizing your decisions
about what to commit and someone in a position of authority putting
you in jail, well, I don't think it's remotely fair to compare those
things.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Peter Geoghegan
Дата:
On Wed, Apr 10, 2024 at 1:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
> That is somewhat fair, but it is also a lot of work. There are
> multiple people asking for you to revert things on multiple threads,
> and figuring out all of the revert requests and trying to come to some
> consensus about what should be done in each case is going to take an
> enormous amount of time. I know you've done lots of good work on
> PostgreSQL in the past and I respect that, but I think you also have
> to realize that you're asking other people to spend a LOT of time
> figuring out what to do about the current situation. I see Andres has
> posted more specifically about what he thinks should happen to each of
> the table AM patches and I am willing to defer to his opinion, but we
> need to make some quick decisions here to either keep things or take
> them out. Extensive reworks after feature freeze should not be an
> option that is on the table; that's what makes it a freeze.

Alexander has been sharply criticized for acting in haste, pushing
work in multiple areas when it was clearly not ready. And that seems
proportionate to me. I agree that he showed poor judgement in the past
few months, and especially in the past few weeks. Not just on one
occasion, but on several. That must have consequences.

> I also do not think I really believe that there's been so much stuff
> committed that a blanket revert would be all that hard to carry off,
> if that were the option that the community ended up preferring.

It seems to me that emotions are running high right now. I think that
it would be a mistake to act in haste when determining next steps.
It's very important, but it's not very urgent.

I've known Alexander for about 15 years. I think that he deserves some
consideration here. Say a week or two, to work through some of the
more complicated issues -- and to take a breather. I just don't see
any upside to rushing through this process, given where we are now.

--
Peter Geoghegan



Re: Table AM Interface Enhancements

От
Bruce Momjian
Дата:
On Wed, Apr 10, 2024 at 05:42:51PM +0400, Pavel Borisov wrote:
> Hi, Alexander!
> In my view, the actual list of what has raised discussion is:
> dd1f6b0c17 Provide a way block-level table AMs could re-use acquire_sample_rows
> ()
> 27bc1772fc Generalize relation analyze in table AM interface
> 
> Proposals to revert the other patches in a wholesale way look to me like an
> ill-performed continuation of a discussion [1]. I can't believe that "Let's

For reference this disussion was:

    I don't dispute that we could do better, and this is just a
    simplistic look based on "number of commits per day", but the
    attached does put it in perspective to some extent.

> select which commits close to FF looks worse than the others" based on
> whereabouts, not patch contents is a good and productive way for the community
> to use.

I don't know how you can say these patches are being questioned just
because they are near the feature freeze (FF).  There are clear
concerns, and post-feature freeze is not the time to be evaluating which
patches which were pushed in near feature freeze need help.

What is the huge rush for these patches, and if they were so important,
why was this not done earlier?  This can all wait until PG 18.  If
Supabase or someone else needs these patches for PG 17, they will need
to create a patched verison of PG 17 with these patches.

> At the same time if Andres, who is the most experienced person in the scope of
> access methods is willing to give his post-commit re-review of any of the
> committed patches and will recommend some of them reverted, it would be a good
> sensible input to act accordingly.
> patch 

So the patches were rushed, have problems, and now we are requiring
Andres to stop what he is doing to give immediate feedback --- that is
not fair to him.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-10 15:19:47 +0300, Alexander Korotkov wrote:
> On Mon, Apr 8, 2024 at 9:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > Yes, it was my mistake. I got rushing trying to fit this to FF, even doing significant changes just before
commit.
> > > I'll revert this later today.
>
> The patch to revert is attached.  Given that revert touches the work
> done in 041b96802e, I think it needs some feedback before push.

Hm. It's a bit annoying to revert it, you're right. I think on its own the
revert looks reasonable from what I've seen so far, will continue looking for
a bit.

I think we'll need to do some cleanup of 041b96802e separately afterwards -
possibly in 17, possibly in 18.  Particularly post-27bc1772fc8
acquire_sample_rows() was tied hard to heapam, so it made sense for 041b96802e
to create the stream in acquire_sample_rows() and have
block_sampling_read_stream_next() be in analyze.c. But eventually that should
be in access/heap/. Compared to 16, the state post the revert does tie
analyze.c a bit closer to the internals of the AM than before, but I'm not
sure the increase matters.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Melanie Plageman
Дата:
On Wed, Apr 10, 2024 at 4:03 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-04-10 15:19:47 +0300, Alexander Korotkov wrote:
> > On Mon, Apr 8, 2024 at 9:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > On Mon, Apr 8, 2024 at 12:33 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > Yes, it was my mistake. I got rushing trying to fit this to FF, even doing significant changes just before
commit.
> > > > I'll revert this later today.
> >
> > The patch to revert is attached.  Given that revert touches the work
> > done in 041b96802e, I think it needs some feedback before push.
>
> Hm. It's a bit annoying to revert it, you're right. I think on its own the
> revert looks reasonable from what I've seen so far, will continue looking for
> a bit.
>
> I think we'll need to do some cleanup of 041b96802e separately afterwards -
> possibly in 17, possibly in 18.  Particularly post-27bc1772fc8
> acquire_sample_rows() was tied hard to heapam, so it made sense for 041b96802e
> to create the stream in acquire_sample_rows() and have
> block_sampling_read_stream_next() be in analyze.c. But eventually that should
> be in access/heap/. Compared to 16, the state post the revert does tie
> analyze.c a bit closer to the internals of the AM than before, but I'm not
> sure the increase matters.

Yes in an earlier version of 041b96802e, I gave the review feedback
that the read stream should be pushed down into heap-specific code,
but then after 27bc1772fc8, Bilal took the approach of putting the
read stream code in acquire_sample_rows() since that was no longer
table AM-agnostic.

This thread has been moving pretty fast, so could someone point out
which version of the patch has the modifications to
acquire_sample_rows() that would be relevant for Bilal (and others
involved in analyze streaming read) to review? Is it
v1-0001-revert-Generalize-relation-analyze-in-table-AM-in.patch?

- Melanie



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-10 16:24:40 -0400, Melanie Plageman wrote:
> This thread has been moving pretty fast, so could someone point out
> which version of the patch has the modifications to
> acquire_sample_rows() that would be relevant for Bilal (and others
> involved in analyze streaming read) to review? Is it
> v1-0001-revert-Generalize-relation-analyze-in-table-AM-in.patch?

I think so. It's at least what I've been looking at.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Melanie Plageman
Дата:
On Wed, Apr 10, 2024 at 4:33 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-04-10 16:24:40 -0400, Melanie Plageman wrote:
> > This thread has been moving pretty fast, so could someone point out
> > which version of the patch has the modifications to
> > acquire_sample_rows() that would be relevant for Bilal (and others
> > involved in analyze streaming read) to review? Is it
> > v1-0001-revert-Generalize-relation-analyze-in-table-AM-in.patch?
>
> I think so. It's at least what I've been looking at.

I took a look at this patch, and you're right we will need to do
follow-on work with streaming ANALYZE. The streaming read code will
have to be moved now that acquire_sample_rows() is table-AM agnostic
again.

I don't think there was ever a version that Bilal wrote
where the streaming read code was outside of acquire_sample_rows(). By
the time he got that review feedback, 27bc1772fc8 had gone in.

This brings up a question about the prefetching. We never had to have
this discussion for sequential scan streaming read because it didn't
(and still doesn't) do prefetching. But, if we push the streaming read
code down into the heap AM layer, it will be doing the prefetching.
So, do we remove the prefetching from acquire_sample_rows() and expect
other table AMs to implement it themselves or use the streaming read
API?

- Melanie



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote:
> This brings up a question about the prefetching. We never had to have
> this discussion for sequential scan streaming read because it didn't
> (and still doesn't) do prefetching. But, if we push the streaming read
> code down into the heap AM layer, it will be doing the prefetching.
> So, do we remove the prefetching from acquire_sample_rows() and expect
> other table AMs to implement it themselves or use the streaming read
> API?

The prefetching added to acquire_sample_rows was quite narrowly tailored to
something heap-like - it pretty much required that block numbers to be 1:1
with the actual physical on-disk location for the specific AM.  So I think
it's pretty much required for this to be pushed down.

Using a read stream is a few lines for something like this, so I'm not worried
about it. I guess we could have a default implementation for block based AMs,
similar what we have around table_block_parallelscan_*, but not sure it's
worth doing that, the complexity is much lower than in the
table_block_parallelscan_ case.

Greetings,

Andres



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi Andres,

On Wed, Apr 10, 2024 at 7:52 PM Andres Freund <andres@anarazel.de> wrote:
> On 2024-04-08 14:54:46 -0400, Robert Haas wrote:
> > Exactly how much is getting reverted here? I see these, all since March 23rd:
>
> IMO:
>
>
> > dd1f6b0c17 Provide a way block-level table AMs could re-use
> > acquire_sample_rows()
>
> Should be reverted.
>
>
> > 9bd99f4c26 Custom reloptions for table AM
>
> Hm. There are some oddities here:
>
> - It doesn't seem great that relcache.c now needs to know about the default
>   values for all kinds of reloptions.
>
> - why is there table_reloptions() and tableam_reloptions()?
>
> - Why does extractRelOptions() need a TableAmRoutine parameter, extracted by a
>   caller, instead of doing that work itself?
>
>
>
> > 97ce821e3e Fix the parameters order for
> > TableAmRoutine.relation_copy_for_cluster()
>
> Shouldn't be, this is a clear fix.
>
>
> > b1484a3f19 Let table AM insertion methods control index insertion
>
> I'm not sure. I'm not convinced this is right, nor the opposite. If the
> tableam takes control of index insertion, shouldn't nodeModifyTuple know this
> earlier, so it doesn't prepare a bunch of index insertion state?  Also,
> there's pretty much no motivating explanation in the commit.
>
>
> > 27bc1772fc Generalize relation analyze in table AM interface
>
> Should be reverted.
>
>
> > 87985cc925 Allow locking updated tuples in tuple_update() and tuple_delete()
>
> Strongly suspect this should be reverted. The last time this was committed it
> was far from ready. It's very easy to cause corruption due to subtle bugs in
> this area.
>
>
> > c35a3fb5e0 Allow table AM tuple_insert() method to return the different slot
>
> If the AM returns a different slot, who is responsible for cleaning it up? And
> how is creating a new slot for every insert not going to be a measurable
> overhead?
>
>
> > 02eb07ea89 Allow table AM to store complex data structures in rd_amcache
>
> I am doubtful this is right.  Is it really sufficient to have a callback for
> freeing? What happens when relcache entries are swapped as part of a rebuild?
> That works for "flat" caches, but I don't immediately see how it works for
> more complicated datastructures.  At least from the commit message it's hard
> to evaluate how this actually intended to be used.

Thank you for your feedback.  I've reverted all of above.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Melanie Plageman
Дата:
On Wed, Apr 10, 2024 at 5:21 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote:
> > This brings up a question about the prefetching. We never had to have
> > this discussion for sequential scan streaming read because it didn't
> > (and still doesn't) do prefetching. But, if we push the streaming read
> > code down into the heap AM layer, it will be doing the prefetching.
> > So, do we remove the prefetching from acquire_sample_rows() and expect
> > other table AMs to implement it themselves or use the streaming read
> > API?
>
> The prefetching added to acquire_sample_rows was quite narrowly tailored to
> something heap-like - it pretty much required that block numbers to be 1:1
> with the actual physical on-disk location for the specific AM.  So I think
> it's pretty much required for this to be pushed down.
>
> Using a read stream is a few lines for something like this, so I'm not worried
> about it. I guess we could have a default implementation for block based AMs,
> similar what we have around table_block_parallelscan_*, but not sure it's
> worth doing that, the complexity is much lower than in the
> table_block_parallelscan_ case.

This makes sense.

I am working on pushing streaming ANALYZE into heap AM code, and I ran
into a few roadblocks.

If we want ANALYZE to make the ReadStream object in heap_beginscan()
(like the read stream implementation of heap sequential and TID range
scans do), I don't see any way around changing the scan_begin table AM
callback to take a BufferAccessStrategy at the least (and perhaps also
the BlockSamplerData).

read_stream_begin_relation() doesn't just save the
BufferAccessStrategy in the ReadStream, it uses it to set various
other things in the ReadStream object. callback_private_data (which in
ANALYZE's case is the BlockSamplerData) is simply saved in the
ReadStream, so it could be set later, but that doesn't sound very
clean to me.

As such, it seems like a cleaner alternative would be to add a table
AM callback for creating a read stream object that takes the
parameters of read_stream_begin_relation(). But, perhaps it is a bit
late for such additions.

It also opens us up to the question of whether or not sequential scan
should use such a callback instead of making the read stream object in
heap_beginscan().

I am happy to write a patch that does any of the above. But, I want to
raise these questions, because perhaps I am simply missing an obvious
alternative solution.

- Melanie



Re: Table AM Interface Enhancements

От
Jeff Davis
Дата:
On Wed, 2024-04-10 at 15:19 +0300, Alexander Korotkov wrote:
> 1) 9bd99f4c26 comprises the reworked patch after working with notes
> from Jeff Davis.  I agree it would be better to wait for him to
> express explicit agreement.  Before reverting this, I would prefer to
> hear his opinion.

On this particular feature, I had tried it in the past myself, and
there were a number of minor frustrations and I left it unfinished. I
quickly recognized that commit c95c25f9af was too simple to work.

Commit 9bd99f4c26 looked substantially better, but I was surprised to
see it committed so soon after the redesign. I thought a revert was
likely outcome, but I put it on my list of things to review more deeply
in the next couple weeks so I could give productive feedback.

It would benefit from more discussion in v18, and I apologize for not
getting involved earlier when the patch still could have made it into
v17.

Regards,
    Jeff Davis




Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Thu, Apr 11, 2024 at 8:11 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2024-04-10 at 15:19 +0300, Alexander Korotkov wrote:
> > 1) 9bd99f4c26 comprises the reworked patch after working with notes
> > from Jeff Davis.  I agree it would be better to wait for him to
> > express explicit agreement.  Before reverting this, I would prefer to
> > hear his opinion.
>
> On this particular feature, I had tried it in the past myself, and
> there were a number of minor frustrations and I left it unfinished. I
> quickly recognized that commit c95c25f9af was too simple to work.
>
> Commit 9bd99f4c26 looked substantially better, but I was surprised to
> see it committed so soon after the redesign. I thought a revert was
> likely outcome, but I put it on my list of things to review more deeply
> in the next couple weeks so I could give productive feedback.

Thank you for your feedback, Jeff.

> It would benefit from more discussion in v18, and I apologize for not
> getting involved earlier when the patch still could have made it into
> v17.

I believe you don't have to apologize.  It's definitely not your fault
that I've committed this patch in this shape.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi!

On Thu, Apr 11, 2024 at 7:19 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Wed, Apr 10, 2024 at 5:21 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote:
> > > This brings up a question about the prefetching. We never had to have
> > > this discussion for sequential scan streaming read because it didn't
> > > (and still doesn't) do prefetching. But, if we push the streaming read
> > > code down into the heap AM layer, it will be doing the prefetching.
> > > So, do we remove the prefetching from acquire_sample_rows() and expect
> > > other table AMs to implement it themselves or use the streaming read
> > > API?
> >
> > The prefetching added to acquire_sample_rows was quite narrowly tailored to
> > something heap-like - it pretty much required that block numbers to be 1:1
> > with the actual physical on-disk location for the specific AM.  So I think
> > it's pretty much required for this to be pushed down.
> >
> > Using a read stream is a few lines for something like this, so I'm not worried
> > about it. I guess we could have a default implementation for block based AMs,
> > similar what we have around table_block_parallelscan_*, but not sure it's
> > worth doing that, the complexity is much lower than in the
> > table_block_parallelscan_ case.
>
> This makes sense.
>
> I am working on pushing streaming ANALYZE into heap AM code, and I ran
> into a few roadblocks.
>
> If we want ANALYZE to make the ReadStream object in heap_beginscan()
> (like the read stream implementation of heap sequential and TID range
> scans do), I don't see any way around changing the scan_begin table AM
> callback to take a BufferAccessStrategy at the least (and perhaps also
> the BlockSamplerData).
>
> read_stream_begin_relation() doesn't just save the
> BufferAccessStrategy in the ReadStream, it uses it to set various
> other things in the ReadStream object. callback_private_data (which in
> ANALYZE's case is the BlockSamplerData) is simply saved in the
> ReadStream, so it could be set later, but that doesn't sound very
> clean to me.
>
> As such, it seems like a cleaner alternative would be to add a table
> AM callback for creating a read stream object that takes the
> parameters of read_stream_begin_relation(). But, perhaps it is a bit
> late for such additions.
>
> It also opens us up to the question of whether or not sequential scan
> should use such a callback instead of making the read stream object in
> heap_beginscan().
>
> I am happy to write a patch that does any of the above. But, I want to
> raise these questions, because perhaps I am simply missing an obvious
> alternative solution.

I understand that I'm the bad guy of this release, not sure if my
opinion counts.

But what is going on here?  I hope this work is targeting pg18.
Otherwise, do I get this right that this post feature-freeze works on
designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
committed on Mar 30.  So that couldn't justify why the proper API
wasn't designed in time.  Are we judging different commits with the
same criteria?

IMHO, 041b96802e should be just reverted.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Melanie Plageman
Дата:
On Thu, Apr 11, 2024 at 12:19 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Wed, Apr 10, 2024 at 5:21 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote:
> > > This brings up a question about the prefetching. We never had to have
> > > this discussion for sequential scan streaming read because it didn't
> > > (and still doesn't) do prefetching. But, if we push the streaming read
> > > code down into the heap AM layer, it will be doing the prefetching.
> > > So, do we remove the prefetching from acquire_sample_rows() and expect
> > > other table AMs to implement it themselves or use the streaming read
> > > API?
> >
> > The prefetching added to acquire_sample_rows was quite narrowly tailored to
> > something heap-like - it pretty much required that block numbers to be 1:1
> > with the actual physical on-disk location for the specific AM.  So I think
> > it's pretty much required for this to be pushed down.
> >
> > Using a read stream is a few lines for something like this, so I'm not worried
> > about it. I guess we could have a default implementation for block based AMs,
> > similar what we have around table_block_parallelscan_*, but not sure it's
> > worth doing that, the complexity is much lower than in the
> > table_block_parallelscan_ case.
>
> This makes sense.
>
> I am working on pushing streaming ANALYZE into heap AM code, and I ran
> into a few roadblocks.
>
> If we want ANALYZE to make the ReadStream object in heap_beginscan()
> (like the read stream implementation of heap sequential and TID range
> scans do), I don't see any way around changing the scan_begin table AM
> callback to take a BufferAccessStrategy at the least (and perhaps also
> the BlockSamplerData).

I will also say that, had this been 6 months ago, I would probably
suggest we restructure ANALYZE's table AM interface to accommodate
read stream setup and to address a few other things I find odd about
the current code. For example, I think creating a scan descriptor for
the analyze scan in acquire_sample_rows() is quite odd. It seems like
it would be better done in the relation_analyze callback. The
relation_analyze callback saves some state like the callbacks for
acquire_sample_rows() and the Buffer Access Strategy. But at least in
the heap implementation, it just saves them in static variables in
analyze.c. It seems like it would be better to save them in a useful
data structure that could be accessed later. We have access to pretty
much everything we need at that point (in the relation_analyze
callback). I also think heap's implementation of
table_beginscan_analyze() doesn't need most of
heap_beginscan()/initscan(), so doing this instead of something
ANALYZE specific seems more confusing than helpful.

- Melanie



Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Thu, Apr 11, 2024 at 1:46 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> I understand that I'm the bad guy of this release, not sure if my
> opinion counts.
>
> But what is going on here?  I hope this work is targeting pg18.
> Otherwise, do I get this right that this post feature-freeze works on
> designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
> committed on Mar 30.  So that couldn't justify why the proper API
> wasn't designed in time.  Are we judging different commits with the
> same criteria?

I mean, Andres already said that the cleanup was needed possibly in
17, and possibly in 18.

As far as fairness is concerned, you'll get no argument from me if you
say the streaming read stuff was all committed far later than it
should have been. I said that in the very first email I wrote on the
"post-feature freeze cleanup" thread. But if you're going to argue
that there's no opportunity for anyone to adjust patches that were
sideswiped by the reverts of your patches, and that if any such
adjustments seem advisable we should just revert the sideswiped
patches entirely, I don't agree with that, and I don't see why anyone
would agree with that. I think it's fine to have the discussion, and
if the result of that discussion is that somebody says "hey, we want
to do X in 17 for reason Y," then we can discuss that proposal on its
merits, taking into account the answers to questions like "why wasn't
this done before the freeze?" and "is that adjustment more or less
risky than just reverting?" and "how about we just leave it alone for
now and deal with it next release?".

> IMHO, 041b96802e should be just reverted.

IMHO, it's too early to decide that, because we don't know what change
concretely is going to be proposed, and there has been no discussion
of why that change, whatever it is, belongs in this release or next
release.

I understand that you're probably not feeling great about being asked
to revert a bunch of stuff here, and I do think it is a fair point to
make that we need to be even-handed and not overreact. Just because
you had some patches that had some problems doesn't mean that
everything that got touched by the reverts can or should be whacked
around a whole bunch more post-freeze, especially since that stuff was
*also* committed very late, in haste, way closer to feature freeze
than it should have been. At the same time, it's also important to
keep in mind that our goal here is not to punish people for being bad,
or to reward them for being good, or really to make any moral
judgements at all, but to produce a quality release. I'm sure that,
where possible, you'd prefer to fix bugs in a patch you committed
rather than revert the whole thing as soon as anyone finds any
problem. I would also prefer that, both for your patches, and for
mine. And everyone else deserves that same consideration.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote:
> I hope this work is targeting pg18.

I think anything of the scope discussed by Melanie would be very clearly
targeting 18. For 17, I don't know yet whether we should revert the the
ANALYZE streaming read user (041b96802ef), just do a bit of comment polishing,
or some other small change.

One oddity is that before 041b96802ef, the opportunities for making the
interface cleaner were less apparent, because c6fc50cb4028 increased the
coupling between analyze.c and the way the table storage works.


> Otherwise, do I get this right that this post feature-freeze works on
> designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
> committed on Mar 30.

Note that there were versions of the patch that were targeting the
pre-27bc1772fc interface.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Fri, Apr 12, 2024 at 12:04 AM Andres Freund <andres@anarazel.de> wrote:
> On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote:
> > I hope this work is targeting pg18.
>
> I think anything of the scope discussed by Melanie would be very clearly
> targeting 18. For 17, I don't know yet whether we should revert the the
> ANALYZE streaming read user (041b96802ef), just do a bit of comment polishing,
> or some other small change.
>
> One oddity is that before 041b96802ef, the opportunities for making the
> interface cleaner were less apparent, because c6fc50cb4028 increased the
> coupling between analyze.c and the way the table storage works.

Thank you for pointing this out about c6fc50cb4028, I've missed this.

> > Otherwise, do I get this right that this post feature-freeze works on
> > designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
> > committed on Mar 30.
>
> Note that there were versions of the patch that were targeting the
> pre-27bc1772fc interface.

Sure, I've checked this before writing.  It looks quite similar to the
result of applying my revert patch [1] to the head.

Let me describe my view over the current situation.

1) If we just apply my revert patch and leave c6fc50cb4028 and
041b96802ef in the tree, then we get our table AM API narrowed.  As
you expressed the current API requires block numbers to be 1:1 with
the actual physical on-disk location [2].  Not a secret I think the
current API is quite restrictive.  And we're getting the ANALYZE
interface narrower than it was since 737a292b5de.  Frankly speaking, I
don't think this is acceptable.

2) Pushing down the read stream and prefetch to heap am is related to
difficulties [3], [4].  That's quite a significant piece of work to be
done post FF.

In token of all of the above, is the in-tree state that bad? (if we
abstract the way 27bc1772fc and dd1f6b0c17 were committed).

The in-tree state provides quite a general API for analyze, supporting
even non-block storages.  There is a way to reuse existing
acquire_sample_rows() for table AMs, which have block numbers 1:1 with
the actual physical on-disk location.  It requires some cleanup for
comments and docs, but does not require us to redesing the API post
FF.

Links.
1. https://www.postgresql.org/message-id/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com
2. https://www.postgresql.org/message-id/20240410212117.mxsldz2w6htrl36v%40awork3.anarazel.de
3. https://www.postgresql.org/message-id/CAAKRu_ZxU6hucckrT1SOJxKfyN7q-K4KU1y62GhDwLBZWG%2BROg%40mail.gmail.com
4. https://www.postgresql.org/message-id/CAAKRu_YkphAPNbBR2jcLqnxGhDEWTKhYfLFY%3D0R_oG5LHBH7Gw%40mail.gmail.com

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Thu, Apr 11, 2024 at 11:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 11, 2024 at 1:46 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > I understand that I'm the bad guy of this release, not sure if my
> > opinion counts.
> >
> > But what is going on here?  I hope this work is targeting pg18.
> > Otherwise, do I get this right that this post feature-freeze works on
> > designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
> > committed on Mar 30.  So that couldn't justify why the proper API
> > wasn't designed in time.  Are we judging different commits with the
> > same criteria?
>
> I mean, Andres already said that the cleanup was needed possibly in
> 17, and possibly in 18.
>
> As far as fairness is concerned, you'll get no argument from me if you
> say the streaming read stuff was all committed far later than it
> should have been. I said that in the very first email I wrote on the
> "post-feature freeze cleanup" thread. But if you're going to argue
> that there's no opportunity for anyone to adjust patches that were
> sideswiped by the reverts of your patches, and that if any such
> adjustments seem advisable we should just revert the sideswiped
> patches entirely, I don't agree with that, and I don't see why anyone
> would agree with that. I think it's fine to have the discussion, and
> if the result of that discussion is that somebody says "hey, we want
> to do X in 17 for reason Y," then we can discuss that proposal on its
> merits, taking into account the answers to questions like "why wasn't
> this done before the freeze?" and "is that adjustment more or less
> risky than just reverting?" and "how about we just leave it alone for
> now and deal with it next release?".

I don't think 041b96802e could be sideswiped by 27bc1772fc.  The "Use
streaming I/O in ANALYZE" patch has the same issue before 27bc1772fc,
which was committed on Mar 30.  So, in the worst case 27bc1772fc
steals a week of work.  I can imagine without 27bc1772fc , a new API
could be proposed days before FF.  This means I saved patch authors
from what you name in my case "desperate rush".  Huh!

> > IMHO, 041b96802e should be just reverted.
>
> IMHO, it's too early to decide that, because we don't know what change
> concretely is going to be proposed, and there has been no discussion
> of why that change, whatever it is, belongs in this release or next
> release.
>
> I understand that you're probably not feeling great about being asked
> to revert a bunch of stuff here, and I do think it is a fair point to
> make that we need to be even-handed and not overreact. Just because
> you had some patches that had some problems doesn't mean that
> everything that got touched by the reverts can or should be whacked
> around a whole bunch more post-freeze, especially since that stuff was
> *also* committed very late, in haste, way closer to feature freeze
> than it should have been. At the same time, it's also important to
> keep in mind that our goal here is not to punish people for being bad,
> or to reward them for being good, or really to make any moral
> judgements at all, but to produce a quality release. I'm sure that,
> where possible, you'd prefer to fix bugs in a patch you committed
> rather than revert the whole thing as soon as anyone finds any
> problem. I would also prefer that, both for your patches, and for
> mine. And everyone else deserves that same consideration.

I expressed my thoughts about producing a better release without a
desperate rush post-FF in my reply to Andres [2].

Links.
1. https://www.postgresql.org/message-id/CA%2BTgmobZUnJQaaGkuoeo22Sydf9%3DmX864W11yZKd6sv-53-aEQ%40mail.gmail.com
2. https://www.postgresql.org/message-id/CAPpHfdt%2BcCj6j6cR5AyBThP6SyDf6wxAz4dU-0NdXjfpiFca7Q%40mail.gmail.com

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Melanie Plageman
Дата:
On Thu, Apr 11, 2024 at 6:04 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Fri, Apr 12, 2024 at 12:04 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote:
> > > I hope this work is targeting pg18.
> >
> > I think anything of the scope discussed by Melanie would be very clearly
> > targeting 18. For 17, I don't know yet whether we should revert the the
> > ANALYZE streaming read user (041b96802ef), just do a bit of comment polishing,
> > or some other small change.
> >
> > One oddity is that before 041b96802ef, the opportunities for making the
> > interface cleaner were less apparent, because c6fc50cb4028 increased the
> > coupling between analyze.c and the way the table storage works.
>
> Thank you for pointing this out about c6fc50cb4028, I've missed this.
>
> > > Otherwise, do I get this right that this post feature-freeze works on
> > > designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
> > > committed on Mar 30.
> >
> > Note that there were versions of the patch that were targeting the
> > pre-27bc1772fc interface.
>
> Sure, I've checked this before writing.  It looks quite similar to the
> result of applying my revert patch [1] to the head.
>
> Let me describe my view over the current situation.
>
> 1) If we just apply my revert patch and leave c6fc50cb4028 and
> 041b96802ef in the tree, then we get our table AM API narrowed.  As
> you expressed the current API requires block numbers to be 1:1 with
> the actual physical on-disk location [2].  Not a secret I think the
> current API is quite restrictive.  And we're getting the ANALYZE
> interface narrower than it was since 737a292b5de.  Frankly speaking, I
> don't think this is acceptable.
>
> 2) Pushing down the read stream and prefetch to heap am is related to
> difficulties [3], [4].  That's quite a significant piece of work to be
> done post FF.

I had operated under the assumption that we needed to push the
streaming read code into heap AM because that is what we did for
sequential scan, but now that I think about it, I don't see why we
would have to. Bilal's patch pre-27bc1772fc did not do this. But I
think the code in acquire_sample_rows() isn't more tied to heap AM
after 041b96802ef than it was before it. Are you of the opinion that
the code with 041b96802ef ties acquire_sample_rows() more closely to
heap format?

- Melanie



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
Hi, Melanie!

On Fri, Apr 12, 2024 at 8:48 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Thu, Apr 11, 2024 at 6:04 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Fri, Apr 12, 2024 at 12:04 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote:
> > > > I hope this work is targeting pg18.
> > >
> > > I think anything of the scope discussed by Melanie would be very clearly
> > > targeting 18. For 17, I don't know yet whether we should revert the the
> > > ANALYZE streaming read user (041b96802ef), just do a bit of comment polishing,
> > > or some other small change.
> > >
> > > One oddity is that before 041b96802ef, the opportunities for making the
> > > interface cleaner were less apparent, because c6fc50cb4028 increased the
> > > coupling between analyze.c and the way the table storage works.
> >
> > Thank you for pointing this out about c6fc50cb4028, I've missed this.
> >
> > > > Otherwise, do I get this right that this post feature-freeze works on
> > > > designing a new API?  Yes, 27bc1772fc masked the problem.  But it was
> > > > committed on Mar 30.
> > >
> > > Note that there were versions of the patch that were targeting the
> > > pre-27bc1772fc interface.
> >
> > Sure, I've checked this before writing.  It looks quite similar to the
> > result of applying my revert patch [1] to the head.
> >
> > Let me describe my view over the current situation.
> >
> > 1) If we just apply my revert patch and leave c6fc50cb4028 and
> > 041b96802ef in the tree, then we get our table AM API narrowed.  As
> > you expressed the current API requires block numbers to be 1:1 with
> > the actual physical on-disk location [2].  Not a secret I think the
> > current API is quite restrictive.  And we're getting the ANALYZE
> > interface narrower than it was since 737a292b5de.  Frankly speaking, I
> > don't think this is acceptable.
> >
> > 2) Pushing down the read stream and prefetch to heap am is related to
> > difficulties [3], [4].  That's quite a significant piece of work to be
> > done post FF.
>
> I had operated under the assumption that we needed to push the
> streaming read code into heap AM because that is what we did for
> sequential scan, but now that I think about it, I don't see why we
> would have to. Bilal's patch pre-27bc1772fc did not do this. But I
> think the code in acquire_sample_rows() isn't more tied to heap AM
> after 041b96802ef than it was before it. Are you of the opinion that
> the code with 041b96802ef ties acquire_sample_rows() more closely to
> heap format?

Yes, I think so.  Table AM API deals with TIDs and block numbers, but
doesn't force on what they actually mean.  For example, in ZedStore
[1], data is stored on per-column B-trees, where TID used in table AM
is just a logical key of that B-trees.  Similarly, blockNumber is a
range for B-trees.

c6fc50cb4028 and 041b96802ef are putting to acquire_sample_rows() an
assumption that we are sampling physical blocks as they are stored in
data files.  That couldn't anymore be some "logical" block numbers
with meaning only table AM implementation knows.  That was pointed out
by Andres [2].  I'm not sure if ZedStore is alive, but there could be
other table AM implementations like this, or other implementations in
development, etc.  Anyway, I don't feel good about narrowing the API,
which is there from pg12.

Links.
1. https://www.pgcon.org/events/pgcon_2020/sessions/session/44/slides/13/Zedstore-PGCon2020-Virtual.pdf
2. https://www.postgresql.org/message-id/20240410212117.mxsldz2w6htrl36v%40awork3.anarazel.de

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Sat, Apr 13, 2024 at 5:28 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Yes, I think so.  Table AM API deals with TIDs and block numbers, but
> doesn't force on what they actually mean.  For example, in ZedStore
> [1], data is stored on per-column B-trees, where TID used in table AM
> is just a logical key of that B-trees.  Similarly, blockNumber is a
> range for B-trees.
>
> c6fc50cb4028 and 041b96802ef are putting to acquire_sample_rows() an
> assumption that we are sampling physical blocks as they are stored in
> data files.  That couldn't anymore be some "logical" block numbers
> with meaning only table AM implementation knows.  That was pointed out
> by Andres [2].  I'm not sure if ZedStore is alive, but there could be
> other table AM implementations like this, or other implementations in
> development, etc.  Anyway, I don't feel good about narrowing the API,
> which is there from pg12.

I spent some time looking at this. I think it's valid to complain
about the tighter coupling, but c6fc50cb4028 is there starting in v14,
so I don't think I understand why the situation after 041b96802ef is
materially worse than what we've had for the last few releases. I
think it is worse in the sense that, before, you could dodge the
problem without defining USE_PREFETCH, and now you can't, but I don't
think we can regard nonphysical block numbers as a supported scenario
on that basis.

But maybe I'm not correctly understanding the situation?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:


On Mon, 15 Apr 2024 at 19:36, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Apr 13, 2024 at 5:28 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Yes, I think so.  Table AM API deals with TIDs and block numbers, but
> doesn't force on what they actually mean.  For example, in ZedStore
> [1], data is stored on per-column B-trees, where TID used in table AM
> is just a logical key of that B-trees.  Similarly, blockNumber is a
> range for B-trees.
>
> c6fc50cb4028 and 041b96802ef are putting to acquire_sample_rows() an
> assumption that we are sampling physical blocks as they are stored in
> data files.  That couldn't anymore be some "logical" block numbers
> with meaning only table AM implementation knows.  That was pointed out
> by Andres [2].  I'm not sure if ZedStore is alive, but there could be
> other table AM implementations like this, or other implementations in
> development, etc.  Anyway, I don't feel good about narrowing the API,
> which is there from pg12.

I spent some time looking at this. I think it's valid to complain
about the tighter coupling, but c6fc50cb4028 is there starting in v14,
so I don't think I understand why the situation after 041b96802ef is
materially worse than what we've had for the last few releases. I
think it is worse in the sense that, before, you could dodge the
problem without defining USE_PREFETCH, and now you can't, but I don't
think we can regard nonphysical block numbers as a supported scenario
on that basis.

But maybe I'm not correctly understanding the situation?
Hi, Robert!

In my understanding, the downside of 041b96802ef is bringing read_stream* things from being heap-only-related up to the level of acquire_sample_rows() that is not supposed to be tied to heap. And changing *_analyze_next_block() function signature to use ReadStream explicitly in the signature. 

Regards,
Pavel.

Re: Table AM Interface Enhancements

От
Nazir Bilal Yavuz
Дата:
Hi,

On Mon, 15 Apr 2024 at 18:36, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Apr 13, 2024 at 5:28 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Yes, I think so.  Table AM API deals with TIDs and block numbers, but
> > doesn't force on what they actually mean.  For example, in ZedStore
> > [1], data is stored on per-column B-trees, where TID used in table AM
> > is just a logical key of that B-trees.  Similarly, blockNumber is a
> > range for B-trees.
> >
> > c6fc50cb4028 and 041b96802ef are putting to acquire_sample_rows() an
> > assumption that we are sampling physical blocks as they are stored in
> > data files.  That couldn't anymore be some "logical" block numbers
> > with meaning only table AM implementation knows.  That was pointed out
> > by Andres [2].  I'm not sure if ZedStore is alive, but there could be
> > other table AM implementations like this, or other implementations in
> > development, etc.  Anyway, I don't feel good about narrowing the API,
> > which is there from pg12.
>
> I spent some time looking at this. I think it's valid to complain
> about the tighter coupling, but c6fc50cb4028 is there starting in v14,
> so I don't think I understand why the situation after 041b96802ef is
> materially worse than what we've had for the last few releases. I
> think it is worse in the sense that, before, you could dodge the
> problem without defining USE_PREFETCH, and now you can't, but I don't
> think we can regard nonphysical block numbers as a supported scenario
> on that basis.

I agree with you but I did not understand one thing. If out-of-core
AMs are used, does not all block sampling logic (BlockSampler_Init(),
BlockSampler_Next() etc.) need to be edited as well since these
functions assume block numbers are actual physical on-disk location,
right? I mean if the block number is something different than the
actual physical on-disk location, the acquire_sample_rows() function
looks wrong to me before c6fc50cb4028 as well.

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Mon, Apr 15, 2024 at 12:37 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> In my understanding, the downside of 041b96802ef is bringing read_stream* things from being heap-only-related up to
thelevel of acquire_sample_rows() that is not supposed to be tied to heap. And changing *_analyze_next_block() function
signatureto use ReadStream explicitly in the signature. 

I don't think that really clarifies anything. The ReadStream is
basically just acting as a wrapper for a stream of block numbers, and
the API took a BlockNumber before. So why does it make any difference?

If I understand correctly, Alexander thinks that, before 041b96802ef,
the block number didn't necessarily have to be the physical block
number on disk, but could instead be any 32-bit quantity that the
table AM wanted to pack into the block number. But I don't think
that's true, because acquire_sample_rows() was already passing those
block numbers to PrefetchBuffer(), which already requires physical
block numbers.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Mon, Apr 15, 2024 at 12:41 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> I agree with you but I did not understand one thing. If out-of-core
> AMs are used, does not all block sampling logic (BlockSampler_Init(),
> BlockSampler_Next() etc.) need to be edited as well since these
> functions assume block numbers are actual physical on-disk location,
> right? I mean if the block number is something different than the
> actual physical on-disk location, the acquire_sample_rows() function
> looks wrong to me before c6fc50cb4028 as well.

Yes, this is also a problem with trying to use non-physical block
numbers. We can hypothesize an AM where it works out OK in practice,
say because there are always exactly the same number of logical block
numbers as there are physical block numbers. Or, because there are
always more logical block numbers than physical block numbers, but for
some reason the table AM author doesn't care because they believe that
in the target use case for their AM the data distribution will be
sufficiently uniform that sampling only low-numbered blocks won't
really hurt anything.

But that does seem a bit strained. In practice, I suspect that table
AMs that use logical block numbers might want to replace this line
from acquire_sample_rows() with a call to a tableam method that
returns the number of logical blocks:

        totalblocks = RelationGetNumberOfBlocks(onerel);

But even that does not seem like enough, because my guess would be
that a lot of table AMs would end up with a sparse logical block
space. For instance, you might create a logical block number sequence
that starts at 0 and just counts up towards 2^32 and eventually either
wraps around or errors out. Each new tuple gets the next TID that
isn't yet used. Well, what's going to happen eventually in a lot of
workloads is that the low-numbered logical blocks are going to be
mostly or entirely empty, and the data is going to be clustered in the
ones that are nearer to the highest logical block number that's so far
been assigned. So, then, as you say, you'd want to replace the whole
BlockSampler thing entirely.

That said, I find it a little bit hard to know what people are already
doing or realistically might try to do with table AMs. If somebody
says they have a table AM where the number of logical block numbers
equals the number of physical block numbers (or is somewhat larger but
in a way that doesn't really matter) and the existing block sampling
logic works well enough, I can't really disprove that. It puts awfully
tight limits on what the AM can be doing, but, OK, sometimes people
want to develop AMs for very specific purposes. However, because of
the prefetching thing, I think even that fairly narrow use case was
already broken before 041b96802efa33d2bc9456f2ad946976b92b5ae1. So I
just don't really see how that commit made anything worse in any way
that really matters.

But maybe it did. People often find extremely creative ways of working
around the limitations of the core interfaces. I think it could be the
case that someone found a clever way of dodging all of these problems
and had something that was working well enough that they were happy
with it, and now they can't make it work after the changes for some
reason. If that someone is reading this thread and wants to spell that
out, we can consider whether there's some relief that we could give to
that person, *especially* if they can demonstrate that they raised the
alarm before the commit went in. But in the absence of that, my
current belief is that nonphysical block numbers were never a
supported scenario; hence, the idea that
041b96802efa33d2bc9456f2ad946976b92b5ae1 should be reverted for
de-supporting them ought to be rejected.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:


On Mon, 15 Apr 2024 at 22:09, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 15, 2024 at 12:37 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> In my understanding, the downside of 041b96802ef is bringing read_stream* things from being heap-only-related up to the level of acquire_sample_rows() that is not supposed to be tied to heap. And changing *_analyze_next_block() function signature to use ReadStream explicitly in the signature.

I don't think that really clarifies anything. The ReadStream is
basically just acting as a wrapper for a stream of block numbers, and
the API took a BlockNumber before. So why does it make any difference?

If I understand correctly, Alexander thinks that, before 041b96802ef,
the block number didn't necessarily have to be the physical block
number on disk, but could instead be any 32-bit quantity that the
table AM wanted to pack into the block number. But I don't think
that's true, because acquire_sample_rows() was already passing those
block numbers to PrefetchBuffer(), which already requires physical
block numbers.
 
Hi, Robert!

Why it makes a difference looks a little bit unclear to me, I can't comment on this. I noticed that before 041b96802ef we had a block number and block sampler state that tied acquire_sample_rows() to the actual block structure. After we have the whole struct ReadStream which doesn't comprise just a wrapper for the same variables, but the state that ties acquire_sample_rows() to the streaming read algorithm (and heap). Yes, we don't have other access methods other than heap implemented for analyze routine, so the patch works anyway, but from the view on acquire_sample_rows() as a general method that is intended to have different implementations in the future it doesn't look good. 

It's my impression on 041b96802ef, please forgive me if I haven't understood something. 

Regards,
Pavel Borisov
Supabase

Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-15 23:14:01 +0400, Pavel Borisov wrote:
> Why it makes a difference looks a little bit unclear to me, I can't comment
> on this. I noticed that before 041b96802ef we had a block number and block
> sampler state that tied acquire_sample_rows() to the actual block
> structure.

That, and the prefetch calls actually translating the block numbers 1:1 to
physical locations within the underlying file.

And before 041b96802ef they were tied much more closely by the direct calls to
heapam added in 27bc1772fc81.


> After we have the whole struct ReadStream which doesn't comprise just a
> wrapper for the same variables, but the state that ties
> acquire_sample_rows() to the streaming read algorithm (and heap).

Yes ... ? I don't see how that is a meaningful difference to the state as of
27bc1772fc81.  Nor fundamentally worse than the state 27bc1772fc81^, given
that we already issued requests for specific blocks in the file.

That said, I don't like the state after applying
https://postgr.es/m/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com
because there's too much coupling. Hence talking about needing to iterate on
the interface in some form, earlier in the thread.


What are you actually arguing for here?

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Mon, Apr 15, 2024 at 3:47 PM Andres Freund <andres@anarazel.de> wrote:
> That said, I don't like the state after applying
> https://postgr.es/m/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com
> because there's too much coupling. Hence talking about needing to iterate on
> the interface in some form, earlier in the thread.

Mmph, I can't follow what the actual state of things is here. Are we
waiting for Alexander to push that patch? Is he waiting for somebody
to sign off on that patch? Do you want that patch applied, not
applied, or applied with some set of modifications?

I find the discussion of "too much coupling" too abstract. I want to
get down to specific proposals for what we should change, or not
change.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-12 01:04:03 +0300, Alexander Korotkov wrote:
> 1) If we just apply my revert patch and leave c6fc50cb4028 and
> 041b96802ef in the tree, then we get our table AM API narrowed.  As
> you expressed the current API requires block numbers to be 1:1 with
> the actual physical on-disk location [2].  Not a secret I think the
> current API is quite restrictive.  And we're getting the ANALYZE
> interface narrower than it was since 737a292b5de.  Frankly speaking, I
> don't think this is acceptable.

As others already pointed out, c6fc50cb4028 was committed quite a while
ago. I'm fairly unhappy about c6fc50cb4028, fwiw, but didn't realize that
until it was too late.


> In token of all of the above, is the in-tree state that bad? (if we
> abstract the way 27bc1772fc and dd1f6b0c17 were committed).

To me the 27bc1772fc doesn't make much sense on its own. You added calls
directly to heapam internals to a file in src/backend/commands/, that just
doesn't make sense.

Leaving that aside, I think the interface isn't good on its own:
table_relation_analyze() doesn't actually do anything, it just sets callbacks,
that then later are called from analyze.c, which doesn't at all fit to the
name of the callback/function.  I realize that this is kinda cribbed from the
FDW code, but I don't think that is a particularly good excuse.

I don't think dd1f6b0c17 improves the situation, at all. It sets global
variables to redirect how an individual acquire_sample_rows invocation
works:
void
block_level_table_analyze(Relation relation,
                          AcquireSampleRowsFunc *func,
                          BlockNumber *totalpages,
                          BufferAccessStrategy bstrategy,
                          ScanAnalyzeNextBlockFunc scan_analyze_next_block_cb,
                          ScanAnalyzeNextTupleFunc scan_analyze_next_tuple_cb)
{
    *func = acquire_sample_rows;
    *totalpages = RelationGetNumberOfBlocks(relation);
    vac_strategy = bstrategy;
    scan_analyze_next_block = scan_analyze_next_block_cb;
    scan_analyze_next_tuple = scan_analyze_next_tuple_cb;
}

Notably it does so within the ->relation_analyze tableam callback, which does
*NOT* not actually do anything other than returning a callback.  So if
->relation_analyze() for another relation is called, the acquire_sample_rows()
for the earlier relation will do something different.  Note that this isn't a
theoretical risk, acquire_inherited_sample_rows() actually collects the
acquirefunc for all the inherited relations before calling acquirefunc.

This is honestly leaving me somewhat speechless.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-15 16:02:00 -0400, Robert Haas wrote:
> On Mon, Apr 15, 2024 at 3:47 PM Andres Freund <andres@anarazel.de> wrote:
> > That said, I don't like the state after applying
> > https://postgr.es/m/CAPpHfdvuT6DnguzaV-M1UQ2whYGDojaNU%3D-%3DiHc0A7qo9HBEJw%40mail.gmail.com
> > because there's too much coupling. Hence talking about needing to iterate on
> > the interface in some form, earlier in the thread.
> 
> Mmph, I can't follow what the actual state of things is here. Are we
> waiting for Alexander to push that patch? Is he waiting for somebody
> to sign off on that patch?

I think Alexander is arguing that we shouldn't revert 27bc1772fc & dd1f6b0c17
in 17.  I already didn't think that was an option, because I didn't like the
added interfaces, but now am even more certain, given how broken dd1f6b0c17
seems to be:
https://postgr.es/m/20240415201057.khoyxbwwxfgzomeo%40awork3.anarazel.de


> Do you want that patch applied, not applied, or applied with some set of
> modifications?

I think we should apply Alexander's proposed revert and then separately
discuss what we should do about 041b96802ef.


> I find the discussion of "too much coupling" too abstract. I want to
> get down to specific proposals for what we should change, or not
> change.

I think it's a bit hard to propose something concrete until we've decided
whether we'll revert 27bc1772fc & dd1f6b0c17.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 15, 2024 at 11:11 PM Andres Freund <andres@anarazel.de> wrote:
> On 2024-04-12 01:04:03 +0300, Alexander Korotkov wrote:
> > 1) If we just apply my revert patch and leave c6fc50cb4028 and
> > 041b96802ef in the tree, then we get our table AM API narrowed.  As
> > you expressed the current API requires block numbers to be 1:1 with
> > the actual physical on-disk location [2].  Not a secret I think the
> > current API is quite restrictive.  And we're getting the ANALYZE
> > interface narrower than it was since 737a292b5de.  Frankly speaking, I
> > don't think this is acceptable.
>
> As others already pointed out, c6fc50cb4028 was committed quite a while
> ago. I'm fairly unhappy about c6fc50cb4028, fwiw, but didn't realize that
> until it was too late.

+1

> > In token of all of the above, is the in-tree state that bad? (if we
> > abstract the way 27bc1772fc and dd1f6b0c17 were committed).
>
> To me the 27bc1772fc doesn't make much sense on its own. You added calls
> directly to heapam internals to a file in src/backend/commands/, that just
> doesn't make sense.
>
> Leaving that aside, I think the interface isn't good on its own:
> table_relation_analyze() doesn't actually do anything, it just sets callbacks,
> that then later are called from analyze.c, which doesn't at all fit to the
> name of the callback/function.  I realize that this is kinda cribbed from the
> FDW code, but I don't think that is a particularly good excuse.
>
> I don't think dd1f6b0c17 improves the situation, at all. It sets global
> variables to redirect how an individual acquire_sample_rows invocation
> works:
> void
> block_level_table_analyze(Relation relation,
>                                                   AcquireSampleRowsFunc *func,
>                                                   BlockNumber *totalpages,
>                                                   BufferAccessStrategy bstrategy,
>                                                   ScanAnalyzeNextBlockFunc scan_analyze_next_block_cb,
>                                                   ScanAnalyzeNextTupleFunc scan_analyze_next_tuple_cb)
> {
>         *func = acquire_sample_rows;
>         *totalpages = RelationGetNumberOfBlocks(relation);
>         vac_strategy = bstrategy;
>         scan_analyze_next_block = scan_analyze_next_block_cb;
>         scan_analyze_next_tuple = scan_analyze_next_tuple_cb;
> }
>
> Notably it does so within the ->relation_analyze tableam callback, which does
> *NOT* not actually do anything other than returning a callback.  So if
> ->relation_analyze() for another relation is called, the acquire_sample_rows()
> for the earlier relation will do something different.  Note that this isn't a
> theoretical risk, acquire_inherited_sample_rows() actually collects the
> acquirefunc for all the inherited relations before calling acquirefunc.

You're right.  No sense trying to fix this.  Reverted.

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Alexander Korotkov
Дата:
On Mon, Apr 15, 2024 at 11:17 PM Andres Freund <andres@anarazel.de> wrote:
> On 2024-04-15 16:02:00 -0400, Robert Haas wrote:
> > Do you want that patch applied, not applied, or applied with some set of
> > modifications?
>
> I think we should apply Alexander's proposed revert and then separately
> discuss what we should do about 041b96802ef.

Taking a closer look at acquire_sample_rows(), I think it would be
good if table AM implementation would care about block-level (or
whatever-level) sampling.  So that acquire_sample_rows() just fetches
tuples one-by-one from table AM implementation without any care about
blocks.  Possible table_beginscan_analyze() could take an argument of
target number of tuples, then those tuples are just fetches with
table_scan_analyze_next_tuple().  What do you think?

------
Regards,
Alexander Korotkov



Re: Table AM Interface Enhancements

От
Pavel Borisov
Дата:


On Tue, 16 Apr 2024 at 14:52, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Apr 15, 2024 at 11:17 PM Andres Freund <andres@anarazel.de> wrote:
> On 2024-04-15 16:02:00 -0400, Robert Haas wrote:
> > Do you want that patch applied, not applied, or applied with some set of
> > modifications?
>
> I think we should apply Alexander's proposed revert and then separately
> discuss what we should do about 041b96802ef.

Taking a closer look at acquire_sample_rows(), I think it would be
good if table AM implementation would care about block-level (or
whatever-level) sampling.  So that acquire_sample_rows() just fetches
tuples one-by-one from table AM implementation without any care about
blocks.  Possible table_beginscan_analyze() could take an argument of
target number of tuples, then those tuples are just fetches with
table_scan_analyze_next_tuple().  What do you think?
Hi, Alexander!

I like the idea of splitting abstraction levels for: 
1. acquirefuncs (FDW or physical table) 
2. new specific row fetch functions (alike to existing _scan_analyze_next_tuple()), that could be AM-specific. 

Then scan_analyze_next_block() or another iteration algorithm would be contained inside table AM implementation of _scan_analyze_next_tuple().

So, init of scan state would be inside table AM implementation of _beginscan_analyze(). Scan state (like BlockSamplerData or other state that could be custom for AM) could be transferred from _beginscan_analyze() to _scan_analyze_next_tuple() by some opaque AM-specific data structure. If so we'll also may need AM-specific table_endscan_analyze to clean it.

Regards,
Pavel




Re: Table AM Interface Enhancements

От
Robert Haas
Дата:
On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Taking a closer look at acquire_sample_rows(), I think it would be
> good if table AM implementation would care about block-level (or
> whatever-level) sampling.  So that acquire_sample_rows() just fetches
> tuples one-by-one from table AM implementation without any care about
> blocks.  Possible table_beginscan_analyze() could take an argument of
> target number of tuples, then those tuples are just fetches with
> table_scan_analyze_next_tuple().  What do you think?

Andres is the expert here, but FWIW, that plan seems reasonable to me.
One downside is that every block-based tableam is going to end up with
a very similar implementation, which is kind of something I don't like
about the tableam API in general: if you want to make something that
is basically heap plus a little bit of special sauce, you have to copy
a mountain of code. Right now we don't really care about that problem,
because we don't have any other tableams in core, but if we ever do, I
think we're going to find ourselves very unhappy with that aspect of
things. But maybe now is not the time to start worrying. That problem
isn't unique to analyze, and giving out-of-core tableams the
flexibility to do what they want is better than not.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
On 2024-04-16 13:33:53 +0300, Alexander Korotkov wrote:
> Reverted.

Thanks!



Re: Table AM Interface Enhancements

От
Andres Freund
Дата:
Hi,

On 2024-04-16 08:31:24 -0400, Robert Haas wrote:
> On Tue, Apr 16, 2024 at 6:52 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Taking a closer look at acquire_sample_rows(), I think it would be
> > good if table AM implementation would care about block-level (or
> > whatever-level) sampling.  So that acquire_sample_rows() just fetches
> > tuples one-by-one from table AM implementation without any care about
> > blocks.  Possible table_beginscan_analyze() could take an argument of
> > target number of tuples, then those tuples are just fetches with
> > table_scan_analyze_next_tuple().  What do you think?
> 
> Andres is the expert here, but FWIW, that plan seems reasonable to me.
> One downside is that every block-based tableam is going to end up with
> a very similar implementation, which is kind of something I don't like
> about the tableam API in general: if you want to make something that
> is basically heap plus a little bit of special sauce, you have to copy
> a mountain of code. Right now we don't really care about that problem,
> because we don't have any other tableams in core, but if we ever do, I
> think we're going to find ourselves very unhappy with that aspect of
> things. But maybe now is not the time to start worrying. That problem
> isn't unique to analyze, and giving out-of-core tableams the
> flexibility to do what they want is better than not.

I think that can partially be addressed by having more "block oriented AM"
helpers in core, like we have for table_block_parallelscan*. Doesn't work for
everything, but should for something like analyze.

Greetings,

Andres Freund



Re: Table AM Interface Enhancements

От
Mats Kindahl
Дата:
On Thu, Nov 23, 2023 at 1:43 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hello PostgreSQL Hackers,

I am pleased to submit a series of patches related to the Table Access
Method (AM) interface, which I initially announced during my talk at
PGCon 2023 [1]. These patches are primarily designed to support the
OrioleDB engine, but I believe they could be beneficial for other
table AM implementations as well.

The focus of these patches is to introduce more flexibility and
capabilities into the Table AM interface. This is particularly
relevant for advanced use cases like index-organized tables,
alternative MVCC implementations, etc.

Hi Alexander and great to see some action around in the table access method interface.

Sorry for being late to the game, but wondering a few things about the patches, but I'll start with the first one that caught my eye.

0007-Allow-table-AM-tuple_insert-method-to-return-the--v1.patch

This allows table AM to return a native tuple slot, which is aware of
table AM-specific system attributes.

This patch seems straightforward enough, but from reading the surrounding code and trying to understand the context I am wondering a few things. Reading the thread, I am unsure if this will go in or not, but just wanted to point out a concern I had. My apologies if I am raising an issue that is already resolved.

AFAICT, the general contract for working with table tuple slots is creating them for a particular purpose, filling it in, and then passing around a pointer to it. Since the slot is created using a "source" implementation, the "source" is responsible for memory allocation and also other updates to the state. Please correct me if I have misunderstood how this is intended to work, but this seems like a good API since it avoids unnecessary allocation and, in particular, unbounded creation of new slots affecting memory usage while a query is executing. For a plan you want to execute, you just make sure that you have slots of the right kind in each plan node and there is no need to dynamically allocate more slots. If you want one for the table access method, just make sure to fetch the slot callbacks from the table access method use those correctly. As a result, the number of slots used during execution is bounded

Assuming that I've understood it correct, if a TTS is then created and passed to tuple_insert, and it needs to return a different slot, this raises two questions:
  • As Andres pointed out: who is responsible for taking care of and dealing with the cleanup of the returned slot here? Note that this is not just a matter of releasing memory, there are other stateful things that they might need to deal with that the TAM have created for in the slot. For this, some sort of callback is needed and the tuple_insert implementation needs to call that correctly.
  • The dual is the cleanup of the "original" slot passed in: a slot of a particular kind is passed in and you need to deal with this correctly to release the resources allocated by the original slot, using some sort of callback.
For both these cases, the question is what cleanup function to call.

In most cases, the slot comes from a subplan and is not dynamically allocated, i.e., it cannot just use release() since it is reused later. For example, for ExecScanFetch the slot ss_ScanTupleSlot is returned, which is then used with tuple_insert (unless I've misread the code), which is typically cleared, not released.

If clear() is used instead, and you clear this slot as part of inserting a tuple, you can instead clear a premature intermediate result (ss_ScanTupleSlot, in the example above), which can cause strange issues if this result is needed later. 

So, given that the dynamic allocation of new slots is unbounded within a query and it is complicated to make sure that slots are cleared/reset/released correctly depending on context, this seems to be hard to get to work correctly and not risk introducing bugs. IMHO, it would be preferable to have a very simple contract where you init, set, clear, and release the slot to avoid bugs creeping into the code, which is what the PostgreSQL code mostly has now.

So, the question here is why changing the slot implementation is needed. I do not know the details of OrioleDB, but this slot is immediately used with ExecInsertIndexTuples() after the call in nodeModifyTable. If the need is to pass information from the TAM to the IAM then it might be better to store this information in the execution state. Is there a case where the correct slot is not created, then fixing that location might be better. (I've noticed that the copyFrom code has a somewhat naïve assumption of what slot implementation should be used, but that is a separate discussion.)

Best wishes,
Mats Kindahl