Discussion: Avoid overhead open-close indexes (catalog updates)
Hi,
The commit
https://github.com/postgres/postgres/commit/b17ff07aa3eb142d2cde2ea00e4a4e8f63686f96
introduced the CopyStatistics function.
To do the work, CopyStatistics uses a less efficient function
to update/insert tuples in the system catalogs.
The comment at indexing.c says:
"Avoid using it for multiple tuples, since opening the indexes
* and building the index info structures is moderately expensive.
* (Use CatalogTupleInsertWithInfo in such cases.)"
So, inspired by the comment, I changed in a few places
the CatalogTupleInsert/CatalogTupleUpdate calls to the more efficient
CatalogTupleInsertWithInfo/CatalogTupleUpdateWithInfo.
Quick tests show a small performance gain.
head:
1. REINDEX TABLE CONCURRENTLY pgbench_accounts;
Time: 77,805 ms
Time: 74,836 ms
Time: 73,480 ms
2. REINDEX TABLE CONCURRENTLY pgbench_tellers;
Time: 22,260 ms
Time: 22,205 ms
Time: 21,008 ms
patched:
1. REINDEX TABLE CONCURRENTLY pgbench_accounts;
Time: 65,048 ms
Time: 61,853 ms
Time: 61,119 ms
2. REINDEX TABLE CONCURRENTLY pgbench_tellers;
Time: 15,999 ms
Time: 15,961 ms
Time: 13,264 ms
There are other places where this could be useful,
but a careful analysis is necessary.
regards,
Ranier Vilela
Attachments
At Wed, 31 Aug 2022 08:16:55 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in
> Hi,
>
> The commit
> https://github.com/postgres/postgres/commit/b17ff07aa3eb142d2cde2ea00e4a4e8f63686f96
> introduced the CopyStatistics function.
>
> To do the work, CopyStatistics uses a less efficient function
> to update/insert tuples in the system catalogs.
>
> The comment at indexing.c says:
> "Avoid using it for multiple tuples, since opening the indexes
> * and building the index info structures is moderately expensive.
> * (Use CatalogTupleInsertWithInfo in such cases.)"
>
> So, inspired by the comment, I changed in a few places
> the CatalogTupleInsert/CatalogTupleUpdate calls to the more efficient
> CatalogTupleInsertWithInfo/CatalogTupleUpdateWithInfo.
>
> Quick tests show a small performance gain.

Considering that the whole operation usually takes a far longer time, I'm
not sure whether that amount of performance gain is useful, but I like
the change as a matter of tidiness and as an example for later code.

> There are other places where this could be useful,
> but a careful analysis is necessary.

What kind of concern do you have in mind?

By the way, there is another similar function,
CatalogTuplesMultiInsertWithInfo(), which would be more time-efficient
(but not space-efficient) and is used in InsertPgAttributeTuples. I
don't see a clear criterion for choosing between the two, though.

I think the overhead of the catalog index open is significant when no
other time-consuming tasks are involved in the whole operation. In
that sense, in terms of performance, storeOperations and
storeProcedures (called under DefineOpClass) might get more benefit
from this, disregarding the rareness of the command being used.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Em qua., 31 de ago. de 2022 às 22:12, Kyotaro Horiguchi <horikyota.ntt@gmail.com> escreveu:
At Wed, 31 Aug 2022 08:16:55 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in
> Hi,
>
> The commit
> https://github.com/postgres/postgres/commit/b17ff07aa3eb142d2cde2ea00e4a4e8f63686f96
> introduced the CopyStatistics function.
>
> To do the work, CopyStatistics uses a less efficient function
> to update/insert tuples in the system catalogs.
>
> The comment at indexing.c says:
> "Avoid using it for multiple tuples, since opening the indexes
> * and building the index info structures is moderately expensive.
> * (Use CatalogTupleInsertWithInfo in such cases.)"
>
> So, inspired by the comment, I changed in a few places
> the CatalogTupleInsert/CatalogTupleUpdate calls to the more efficient
> CatalogTupleInsertWithInfo/CatalogTupleUpdateWithInfo.
>
> Quick tests show a small performance gain.
Hi,
Thanks for taking a look at this.
Considering the whole operation usually takes far longer time, I'm not
sure that amount of performance gain is useful or not, but I like the
change as a matter of tidiness or as example for later codes.
Yeah, this serves as an example for future codes.
> There are other places that this could be useful,
> but a careful analysis is necessary.
What kind of concern do you have in mind?
Code Bloat.
Three more lines are required per call site (CatalogTupleInsert/CatalogTupleUpdate).
However, not all code paths are reachable.
The ideal typical case would be CopyStatistics, I think,
with none, or at least one, filter in the tuple loop.
The cost of calling CatalogOpenIndexes unconditionally should be considered.
By the way, there is another similar function
CatalogTuplesMultiInsertWithInfo(), which would be more time-efficient
(but not space-efficient), which is used in InsertPgAttributeTuples. I
don't see a clear criteria of choosing which one of the two, though.
I don't think CatalogTuplesMultiInsertWithInfo would be useful in the cases reported here.
I think the cost of building the slots would make it infeasible and would add unnecessary complexity.
I think the overhead of catalog index open is significant when any
other time-consuming tasks are not involved in the whole operation.
In that sense, in terms of performance, storeOperations and
storeProcedures (called under DefineOpClass) might get more benefit
from that, disregarding the rareness of the command being used.
Yeah, storeOperations and storeProcedures are good candidates.
Let's wait for the patch to be accepted and committed, so we can try to change it.
I will create a CF entry.
regards,
Ranier Vilela
On Thu, Sep 01, 2022 at 08:42:15AM -0300, Ranier Vilela wrote:
> Let's wait for the patch to be accepted and committed, so we can try to
> change it.

FWIW, I think that this switch is a good idea for cases where we
potentially update a bunch of tuples, especially based on what
CatalogTupleInsert() tells in its top comment. Each code path updated
here needs a performance check to see if that's noticeable enough, but
I can get behind the one of CopyStatistics(), at least.

EnumValuesCreate() would matter less as this would require a large set
of values in an enum, but perhaps ORMs would care and that should be
measurable. update_attstats() should lead to a measurable difference
with a relation that has a bunch of attributes with few tuples.
DefineTSConfiguration() is less of an issue, still fine to change.
AddRoleMems() should be equally measurable with a large DDL.

As a whole, this looks pretty sane to me and a good idea to move on
with. I still need to check properly the code paths changed here, of
course..
--
Michael
Attachments
Em qui., 10 de nov. de 2022 às 05:16, Michael Paquier <michael@paquier.xyz> escreveu:
On Thu, Sep 01, 2022 at 08:42:15AM -0300, Ranier Vilela wrote:
> Let's wait for the patch to be accepted and committed, so we can try to
> change it.
FWIW, I think that this switch is a good idea for cases where we
potentially update a bunch of tuples, especially based on what
CatalogTupleInsert() tells in its top comment.
That's the idea.
Each code path updated
here needs a performance check to see if that's noticeable enough, but
I can get behind the one of CopyStatistics(), at least.
For CopyStatistics() there are performance checks.
EnumValuesCreate() would matter less as this would require a large set
of values in an enum, but perhaps ORMs would care and that should be
measurable.
There is a list_length call giving the number of vals.
For 2 or more vals it is already worth it, since otherwise CatalogOpenIndexes/CatalogCloseIndexes would be called for each val.
update_attstats() should lead to a measurable difference
with a relation that has a bunch of attributes with few tuples.
Same here.
For 2 or more attributes it is already worth it, since otherwise CatalogOpenIndexes/CatalogCloseIndexes would be called for each one.
DefineTSConfiguration() is less of an issue, still fine to change.
Ok.
AddRoleMems() should be equally measurable with a large DDL. As a
whole, this looks pretty sane to me and a good idea to move on with.
Only one filter.
For all these functions, the only case with no gain would be changing a single tuple, where there would be a single CatalogOpenIndexes/CatalogCloseIndexes call on both paths.
I still need to check properly the code paths changed here, of
course..
At least, the patch still applies properly.
regards,
Ranier Vilela
On Thu, Nov 10, 2022 at 08:56:25AM -0300, Ranier Vilela wrote:
> For CopyStatistics() have performance checks.

You are not giving all the details of your tests, though, so I had a
look with some of my stuff using the attached set of SQL functions
(create_function.sql) to create a bunch of indexes with a maximum
number of expressions, as of:

select create_table_cols('tab', 32);
select create_index_multi_exprs('ind', 400, 'tab', 32);
insert into tab values (1);
analyze tab; -- 12.8k~ pg_statistic records

On HEAD, a REINDEX CONCURRENTLY for the table 'tab' takes 1550ms on my
laptop with an average of 10 runs. The patch impacts the runtime with
a single session, making the execution down to 1480ms as per an effect
of the maximum number of attributes on an index being 32. There may
be some noise, but there is a trend, and some perf profiles confirm
the same with CopyStatistics(). My case is a bit extreme, of course,
still that's something.

Anyway, while reviewing this code, it occurred to me that we could do
even better than this proposal once we switch to
CatalogTuplesMultiInsertWithInfo() for the data insertion. This would
reduce the operation overhead further by switching to multi INSERTs
rather than 1 INSERT for each index attribute, with tuples stored in a
set of TupleTableSlots, meaning 1 WAL record rather than N records.
The approach would be similar to what you do for dependencies; see
for example recordMultipleDependencies() when it comes to the number
of slots used etc.
--
Michael
Attachments
Em sex., 11 de nov. de 2022 às 01:54, Michael Paquier <michael@paquier.xyz> escreveu:
On Thu, Nov 10, 2022 at 08:56:25AM -0300, Ranier Vilela wrote:
> For CopyStatistics() have performance checks.
You are not giving all the details of your tests, though,
Windows 10 64 bits
SSD 256 GB
pgbench -i
pgbench_accounts;
pgbench_tellers;
Simple test, based on tables created by pgbench.
so I had a
look with some of my stuff using the attached set of SQL functions
(create_function.sql) to create a bunch of indexes with a maximum
number of expressions, as of:
select create_table_cols('tab', 32);
select create_index_multi_exprs('ind', 400, 'tab', 32);
insert into tab values (1);
analyze tab; -- 12.8k~ pg_statistic records
On HEAD, a REINDEX CONCURRENTLY for the table 'tab' takes 1550ms on my
laptop with an average of 10 runs. The patch impacts the runtime with
a single session, making the execution down to 1480ms as per an effect
of the maximum number of attributes on an index being 32. There may
be some noise, but there is a trend, and some perf profiles confirm
the same with CopyStatistics(). My case is a bit extreme, of course,
still that's something.
Anyway, while reviewing this code, it occured to me that we could do
even better than this proposal once we switch to
CatalogTuplesMultiInsertWithInfo() for the data insertion.
This would reduce more the operation overhead by switching to multi
INSERTs rather than 1 INSERT for each index attribute with tuples
stored in a set of TupleTableSlots, meaning 1 WAL record rather than N
records. The approach would be similar to what you do for
dependencies, see for example recordMultipleDependencies() when it
comes to the number of slots used etc.
I think the complexity doesn't pay off.
For example, CopyStatistics does not know in advance how many tuples will be processed.
IMHO, this step is right for now.
CatalogTupleInsertWithInfo offers a considerable improvement without introducing bugs or maintenance issues.
Ranier Vilela
On Sat, Nov 12, 2022 at 11:03:46AM -0300, Ranier Vilela wrote:
> I think complexity doesn't pay off.
> For example, CopyStatistics not knowing how many tuples will be processed.
> IMHO, this step is right now.
> CatalogTupleInsertWithInfo offers considerable improvement without
> introducing bugs and maintenance issues.

Considerable may be a bit of an overstatement? I can see a difference
in profiles when switching from one to the other in some extreme
cases, but for the REINDEX CONCURRENTLY case most of the runtime is
going to be eaten in the wait phases, the index build and its
validation. Anyway, multi-inserts are going to be a better solution
than CatalogTupleInsertWithInfo() in some cases, because we would just
generate one WAL record of N inserts rather than N records with one
INSERT each.

Looking closely, EnumValuesCreate() is a DDL path, but I'd like to
think that at least two enum values are present at creation in most
cases. AddRoleMems() becomes relevant when using more than one role,
which is a less common pattern, so I'd be fine with switching to a
single index-opening approach with CatalogTupleUpdateWithInfo() as you
suggest, without the tuple slot management. CopyStatistics() does not
know in advance the number of tuples it would insert, and it would be
a gain when there are more than 2 expressions with entries in
pg_statistic as of HEAD. Perhaps you're right with your simple
suggestion to stick with CatalogTupleUpdateWithInfo() in this case.
Maybe there is some external code calling this routine for tables, who
knows. update_attstats() is actually an area that cannot be changed
now that I look at it, as we could finish by updating only some
entries, so the slot approach will not be relevant, but using
CatalogTupleUpdateWithInfo() is. (As a matter of fact, the regression
test suite is reporting that update_attstats() is called for one
attribute 10% of the time; did not check the insert/update rate,
though.)

Would you like to give a try with the tuple slot management in
EnumValuesCreate()?
--
Michael
Attachments
On Tue, Nov 15, 2022 at 09:57:26AM +0900, Michael Paquier wrote:
> Anyway, multi-inserts are going to be solution better than
> CatalogTupleInsertWithInfo() in some cases, because we would just
> generate one WAL record of N inserts rather than N records with one
> INSERT each.

Something that you did not consider in the initial patch is that we
may finish by opening catalog indexes even in cases where this would
not have happened on HEAD, as we may finish by doing nothing when
copying the stats or updating them during an analyze, and that's not
fine IMO. However it is easy enough to minimize the cost: just do a
CatalogOpenIndexes() when absolutely required, and close things only
if the indexes have been opened.

Then, there are the cases where it is worth switching to a
multi-insert logic as these are going to manipulate more than 2
entries all the time: enum list addition and two code paths of
tsearchcmds.c (where up to 16 entries can be lined up). This is a
case-by-case analysis. For example, in the case of the enums, the
number of elements is known in advance so it is possible to know the
number of slots that would be used and initialize them. But that's
not something you would do for the first tsearch bits where the data
is built upon a scan, so the slot init should be delayed. The second
tsearch one can use a predictable approach, like the enums, based on
the number of known elements to insert.

So I've given a try at all that, and finished with the attached. This
patch finishes with a list of bullet points, so this had better be
split into different commits, I guess.

Thoughts?
--
Michael
Attachments
Em ter., 15 de nov. de 2022 às 04:02, Michael Paquier <michael@paquier.xyz> escreveu:
On Tue, Nov 15, 2022 at 09:57:26AM +0900, Michael Paquier wrote:
> Anyway, multi-inserts are going to be solution better than
> CatalogTupleInsertWithInfo() in some cases, because we would just
> generate one WAL record of N inserts rather than N records with one
> INSERT each.
Something that you did not consider in the initial patch is that we
may finish by opening catalog indexes even in cases where this would
not have happened on HEAD, as we may finish by doing nothing when
copying the stats or updating them during an analyze, and that's not
fine IMO. However it is easy enough to minimize the cost: just do a
CatalogOpenIndexes() when absolutely required, and close things only
if the indexes have been opened.
I find it very difficult for no tuple to be updated
once inside CopyStatistics, and the branch cost can get in the way,
but I don't object to your solution.
Then, there are the cases where it is worth switching to a
multi-insert logic as these are going to manipulate more than 2
entries all the time: enum list addition and two code paths of
tsearchcmds.c (where up to 16 entries can be lined up). This is a
case-by-case analysis. For example, in the case of the enums, the
number of elements is known in advance so it is possible to know the
number of slots that would be used and initialize them. But that's
not something you would do for the first tsearch bits where the data
is built upon a scan so the slot init should be delayed. The second
tsearch one can use a predictable approach, like the enums, based on
the number of known elements to insert.
Makes sense.
So I've given a try at all that, and finished with the attached. This
patch finishes with a list of bullet points, so this had better be
split into different commits, I guess.
Thoughts?
Missed AddRoleMems?
Could you continue with CatalogTupleInsertWithInfo there? What do you think?
regards,
Ranier Vilela
On Tue, Nov 15, 2022 at 11:42:34AM -0300, Ranier Vilela wrote:
> I find it very difficult not to have some tuple to be updated,
> once inside CopyStatistics and the branch cost can get in the way,
> but I don't object with your solution.

The code assumes that it is a possibility.

> Missed AddRoleMems?
> Could you continue with CatalogTupleInsertWithInfo, what do you think?

This one has been left out on purpose. I was tempted to use
WithInfo() with a CatalogIndexState opened optionally, but I got the
impression that it makes the code a bit harder to follow, and
AddRoleMems() is already complex on its own. Most DDL patterns
working on roles would involve one role. More roles could of course
be added in one shot, but the extra logic complexity did not look
that appealing to me, especially as some role updates are skipped.
--
Michael
Attachments
On Wed, Nov 16, 2022 at 06:58:01AM +0900, Michael Paquier wrote:
> This one has been left out on purpose. I was tempting to use
> WithInfo() with a CatalogIndexState opened optionally but I got the
> impression that it makes the code a bit harder to follow and
> AddRoleMems() is already complex on its own. Most DDL patterns
> working on role would involve one role. More roles could be added of
> course in one shot, but the extra logic complexity did not look that
> appealing to me especially as some role updates are skipped.

I have worked more on that today, and applied all that after splitting
the whole into three commits as different areas were touched. It
looks like we are good for this thread, then.

I have spotted more optimizations possible, particularly for operator
classes, but that could happen later.
--
Michael
Attachments
Em qua., 16 de nov. de 2022 às 04:23, Michael Paquier <michael@paquier.xyz> escreveu:
On Wed, Nov 16, 2022 at 06:58:01AM +0900, Michael Paquier wrote:
> This one has been left out on purpose. I was tempting to use
> WithInfo() with a CatalogIndexState opened optionally but I got the
> impression that it makes the code a bit harder to follow and
> AddRoleMems() is already complex on its own. Most DDL patterns
> working on role would involve one role. More roles could be added of
> course in one shot, but the extra logic complexity did not look that
> appealing to me especially as some role updates are skipped.
I have worked more on that today, and applied all that after splitting
the whole in three commits in total as different areas were touched.
It looks like we are good for this thread, then.
Thanks Michael.
I have spotted more optimizations possible, particularly for operator
classes, but that could happen later.
Good to know.
regards,
Ranier Vilela