Обсуждение: Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
Hi, On Fri, Jan 03, 2025 at 12:30:14PM -0600, Sami Imseih wrote: > While reviewing patch [1], I realized that the DEBUG1 message > for CREATE INDEX could do better in providing information > about parallel workers launched. Currently, the message just > shows how many workers are planned, but a user may want > to ensure that they have the appropriate number of workers > launched as well when they enable DEBUG1 logging. Yeah, one could see how many workers are currently running but would not be able to figure out once the index is created. So adding the info in the log makes sense to me. > I prepared a simple patch, attached, for this. The log message matches > the format > used in VACUUM VERBOSE ( for consistency sake ). A few random comments: === 1 s/parallel vacuum workers for index vacuuming/parallel workers for index creation/? (2 times) === 2 - (errmsg_internal("building index \"%s\" on table \"%s\" with request for %d parallel workers", + (errmsg_internal("building index \"%s\" on table \"%s\"", I'd add "in parallel" to match its counterpart "serially" above. That would make it more clear in case one just look for "building index" in the log. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi,
Le mer. 8 janv. 2025 à 18:37, Sami Imseih <samimseih@gmail.com> a écrit :
> s/parallel vacuum workers for index vacuuming/parallel workers for index creation/?
oops, that's my oversight from copying the message from vacuum. fixed.
> I'd add "in parallel" to match its counterpart "serially" above. That would
> make it more clear in case one just look for "building index" in the log.
good point.
Below is the new output with the attached v2.
postgres=# create index foo_idx1 on foo(id);
DEBUG: building index "foo_idx1" on table "foo" with parallel workers
DEBUG: launched 1 parallel workers for index creation (planned: 1)
DEBUG: index "foo_idx1" can safely use deduplication
CREATE INDEX
I tend to agree it might be better than Benoit's patch on the index messages, though I'm afraid that DEBUG1 level won't work for many users. DEBUGx are for developers, not users. A better log level for me would be LOG. For client messages, LOG messages won't be displayed by default. So there's still a need for a "SET client_min_messages to LOG" but it won't be necessary to go all the way to DEBUG1.
Regards.
Guillaume.
Guillaume Lelarge <guillaume@lelarge.info> writes: > I tend to agree it might be better than Benoit's patch on the index > messages, though I'm afraid that DEBUG1 level won't work for many users. > DEBUGx are for developers, not users. A better log level for me would be > LOG. Please, no. That would result in spamming the postmaster log for perfectly ordinary actions, with no usable way to prevent it. I think DEBUG is perfectly appropriate for this sort of message. regards, tom lane
> I think DEBUG is perfectly appropriate for this sort of message. I am curious what are the thoughts on introducing a CREATE INDEX VERBOSE which can provide this info? similar to users scripting VACUUM VERBOSE to log more details about the vacuum operation including parallel usage. I have not convinced myself this is a good idea, but maybe someone has an opinion. If not, we can just stick with a better DEBUG1 message. Regards, Sami
Le mer. 8 janv. 2025 à 21:35, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> I tend to agree it might be better than Benoit's patch on the index
> messages, though I'm afraid that DEBUG1 level won't work for many users.
> DEBUGx are for developers, not users. A better log level for me would be
> LOG.
Please, no. That would result in spamming the postmaster log
for perfectly ordinary actions, with no usable way to prevent it.
Not sure why I was thinking it would only go to the client. You're right, it will also go to the logs, and it will be stored by default, which would be bad. Sorry about this...
I think DEBUG is perfectly appropriate for this sort of message.
Still think DEBUG level is pretty bad for a user experience. Anyway, I much prefer a DEBUG message than no message at all :)
Guillaume.
Le mer. 8 janv. 2025 à 21:44, Sami Imseih <samimseih@gmail.com> a écrit :
> I think DEBUG is perfectly appropriate for this sort of message.
I am curious what are the thoughts on introducing a
CREATE INDEX VERBOSE which can provide this info?
similar to users scripting VACUUM VERBOSE to log
more details about the vacuum operation including parallel
usage.
I have not convinced myself this is a good idea, but maybe
someone has an opinion. If not, we can just stick with a better
DEBUG1 message.
Sounds better to me than relying on DEBUG messages. So, yeah, VERBOSE has my vote.
--
Guillaume.
Sami Imseih <samimseih@gmail.com> writes: > I am curious what are the thoughts on introducing a > CREATE INDEX VERBOSE which can provide this info? > similar to users scripting VACUUM VERBOSE to log > more details about the vacuum operation including parallel > usage. What I can recall being discussed in the past is to extend EXPLAIN and/or EXPLAIN ANALYZE to cover utility statements that have nontrivial execution complexity --- for example, ALTER TABLE has a lot of machinery underneath, and people often wish to know things like whether a particular ALTER will cause a table rewrite or not. Of course, a patch for that would be a few orders of magnitude larger than what you've got here :-(. But if you're looking for a framework for reporting these sorts of details, I'd much rather go in that direction than follow the model of VACUUM VERBOSE. VACUUM VERBOSE is a kluge with little to recommend it other than having been easy to implement. regards, tom lane
> Of course, a patch for that would be a few orders of magnitude > larger than what you've got here :-(. But if you're looking > for a framework for reporting these sorts of details, I'd > much rather go in that direction than follow the model of > VACUUM VERBOSE. VACUUM VERBOSE is a kluge with little to > recommend it other than having been easy to implement. To my surprise, REINDEX does have a VERBOSE option. should have check this earlier :) postgres=# reindex (verbose) index t_idx1; INFO: index "t_idx1" was reindexed DETAIL: CPU: user: 5.33 s, system: 0.48 s, elapsed: 6.26 s REINDEX Is there a reason not to do the same for CREATE INDEX? Also, we can improve the REINDEX verbose message by also providing the parallel usage. Regards, Sami
Le jeu. 9 janv. 2025 à 04:24, Sami Imseih <samimseih@gmail.com> a écrit :
> Of course, a patch for that would be a few orders of magnitude
> larger than what you've got here :-(. But if you're looking
> for a framework for reporting these sorts of details, I'd
> much rather go in that direction than follow the model of
> VACUUM VERBOSE. VACUUM VERBOSE is a kluge with little to
> recommend it other than having been easy to implement.
To my surprise, REINDEX does have a VERBOSE option.
should have check this earlier :)
postgres=# reindex (verbose) index t_idx1;
INFO: index "t_idx1" was reindexed
DETAIL: CPU: user: 5.33 s, system: 0.48 s, elapsed: 6.26 s
REINDEX
Is there a reason not to do the same for CREATE INDEX?
Sounds a good idea to me.
Also, we can improve the REINDEX verbose message by
also providing the parallel usage.
+1
Guillaume.
> Hmm. I am reading Tom's opinion that goes toward not going in this > direction for more commands, with the point to extend EXPLAIN to show > this kind of information: > https://www.postgresql.org/message-id/1692530.1736369905@sss.pgh.pa.us That sounds like the ability to do something like EXPLAIN CREATE INDEX ... is that correct? > So do we really want to do what's proposed here? I'm +-0 about the > VERBOSE option attached to more commands, as it is a bit harder for > clients to catch the information wanted. So here comes my question: > how do we want to consume this information at the end from the > perspective of the client? For interactive usage in psql or pgadmin, it's trivial to capture this information. The information can also be written to the server log with log_min_messages=INFO A bit more work is required to redirect messages to an out file from psql, as you need to ensure that stderr is redirected to a file. It's also a bit more work to capture this information from something like a JDBC application. IMO the interactive use-case is where this is the most useful as you can start a CREATE INDEX (VERBOSE) and ensure that it's going to launch all the parallel workers that it planned before letting it it continue; or control-c and figure out why not all planned workers launched. Regards, Sami
Hi,
Le dim. 19 janv. 2025 à 06:40, Sami Imseih <samimseih@gmail.com> a écrit :
> Hmm. I am reading Tom's opinion that goes toward not going in this
> direction for more commands, with the point to extend EXPLAIN to show
> this kind of information:
> https://www.postgresql.org/message-id/1692530.1736369905@sss.pgh.pa.us
That sounds like the ability to do something like EXPLAIN CREATE INDEX ...
is that correct?
Yes, and I suppose this would be quite some work to do. Though I see how interesting it would be, and a nice project to hack on.
> So do we really want to do what's proposed here? I'm +-0 about the
> VERBOSE option attached to more commands, as it is a bit harder for
> clients to catch the information wanted.
It sounds to me a lot easier to know about a VERBOSE option that gives you more information, than using a DEBUG level to get the same information. DEBUG level messages aren't explained in the CREATE INDEX manpage, whereas a VERBOSE option would be.
So here comes my question:
> how do we want to consume this information at the end from the
> perspective of the client?
For interactive usage in psql or pgadmin, it's trivial to capture this
information.
+1
The information can also be written to the server log with log_min_messages=INFO
A bit more work is required to redirect messages to an out file from psql, as
you need to ensure that stderr is redirected to a file.
It's also a bit more work to capture this information from something like a
JDBC application.
IMO the interactive use-case is where this is the most useful as you can start a
CREATE INDEX (VERBOSE) and ensure that it's going to launch all the parallel
workers that it planned before letting it it continue; or control-c and figure
out why not all planned workers launched.
Guillaume.
> > > Hmm. I am reading Tom's opinion that goes toward not going in this > > > direction for more commands, with the point to extend EXPLAIN to show > > > this kind of information: > > > https://www.postgresql.org/message-id/1692530.1736369905@sss.pgh.pa.us > > > > That sounds like the ability to do something like EXPLAIN CREATE INDEX ... > > is that correct? > > > Yes, and I suppose this would be quite some work to do. Though I see how > interesting it would be, and a nice project to hack on. I have been contemplating this and do see how something like EXPLAIN could be useful, as it would give users the ability to examine what a command will do, i.e. EXPLAIN CREATE INDEX will tell you how many parallel workers are planned and EXPLAIN (ANALYZE) CREATE INDEX will actually execute the command and provide the actual parallel workers launched. Tom also mentioned the example of a user wanting to know if a table rewrite will occur. There are other cases where this could be useful. FWIW, Oracle does support EXPLAIN PLAN FOR CREATE INDEX and EXPLAIN PLAN FOR REBUILD (for index rebuilds) [1][2], but this functionality is not officially documented [3]. Oracle also provides a documented function to estimate the size of an index [4]. I do not have access to an Oracle instance to verify the state of current versions of Oracle, but I do vaguely remember this from when I worked on Oracle many years ago :) Now, I also realized that v1 does not include logging for ALTER TABLE commands that add an index, such as "ALTER TABLE foo ADD CONSTRAINT foo_uq UNIQUE (c1)" or attaching a partition and ensuring the index is built. So this needs to be added to the current proposal. So far these are 2 options being discussed: 1/ The current proposal of: CREATE INDEX (VERBOSE) and ALTER TABLE (VERBOSE) and output debugging at INFO level. 2/ EXPLAIN This is a lot more work. Currently QueryDesc has all the info we need to generate the execution plan. But, we will likely need a new struct that can track the debugging info to relay it back up to explain. And this has to be generic enough for all cases. Also, implementing an EXPLAIN and EXPLAIN ANALYZE for such commands may not even be feasible. Plans are different in that they are generated up-front and we can choose to execute them or not. This is not the same for the other types of commands being discussed. I am not convinced it's worth the effort. thoughts? other approaches? [1] https://oracle-randolf.blogspot.com/2009/02/explain-plan-on-ddls.html [2] https://stackoverflow.com/questions/827123/how-can-i-estimate-the-size-of-an-oracle-index [3] https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/EXPLAIN-PLAN.html [4] https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_SPACE.html#GUID-B6EB7527-BC67-4394-9E7A-01F2790C409A Regards, Sami
Le lun. 20 janv. 2025 à 04:28, Sami Imseih <samimseih@gmail.com> a écrit :
> > > Hmm. I am reading Tom's opinion that goes toward not going in this
> > > direction for more commands, with the point to extend EXPLAIN to show
> > > this kind of information:
> > > https://www.postgresql.org/message-id/1692530.1736369905@sss.pgh.pa.us
> >
> > That sounds like the ability to do something like EXPLAIN CREATE INDEX ...
> > is that correct?
> >
> Yes, and I suppose this would be quite some work to do. Though I see how
> interesting it would be, and a nice project to hack on.
I have been contemplating this and do see how something like EXPLAIN
could be useful, as it would give users the ability to examine
what a command will do, i.e. EXPLAIN CREATE INDEX will tell you
how many parallel workers are planned and EXPLAIN (ANALYZE) CREATE
INDEX will actually execute the command and provide the actual parallel
workers launched. Tom also mentioned the example of a user wanting to know if
a table rewrite will occur. There are other cases where this could be useful.
FWIW, Oracle does support EXPLAIN PLAN FOR CREATE INDEX and
EXPLAIN PLAN FOR REBUILD (for index rebuilds) [1][2], but this functionality
is not officially documented [3]. Oracle also provides a documented function to
estimate the size of an index [4]. I do not have access to an Oracle instance to
verify the state of current versions of Oracle, but I do vaguely remember this
from when I worked on Oracle many years ago :)
Now, I also realized that v1 does not include logging for ALTER TABLE commands
that add an index, such as "ALTER TABLE foo ADD CONSTRAINT foo_uq UNIQUE (c1)"
or attaching a partition and ensuring the index is built. So this
needs to be added to
the current proposal.
So far these are 2 options being discussed:
1/
The current proposal of:
CREATE INDEX (VERBOSE) and ALTER TABLE (VERBOSE)
and output debugging at INFO level.
I guess the major issue is that it won't scale fine.
2/
EXPLAIN
This is a lot more work. Currently QueryDesc has all the info we
need to generate the execution plan. But, we will likely need a
new struct that can track the debugging info to relay it back
up to explain. And this has to be generic enough for all cases.
Also, implementing an EXPLAIN and EXPLAIN ANALYZE for such commands
may not even be feasible. Plans are different in that they are generated
up-front and we can choose to execute them or not. This is not the same
for the other types of commands being discussed.
I am not convinced it's worth the effort.
thoughts? other approaches?
You might be interested by this thread "Thinking about EXPLAIN ALTER TABLE":
I didn't had the time to read the whole thread, but it might be quite interesting.
[1] https://oracle-randolf.blogspot.com/2009/02/explain-plan-on-ddls.html
[2] https://stackoverflow.com/questions/827123/how-can-i-estimate-the-size-of-an-oracle-index
[3] https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/EXPLAIN-PLAN.html
[4] https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_SPACE.html#GUID-B6EB7527-BC67-4394-9E7A-01F2790C409A
Regards,
Sami
--
Guillaume.
> You might be interested by this thread "Thinking about EXPLAIN ALTER TABLE": > https://www.postgresql.org/message-id/CAM-w4HNm1M5J-ow8UjTcqRe3JPxkVCrGe56tRpPUSePSdGcZ_w%40mail.gmail.com I reviewed this thread, and the primary issue with the EXPLAIN command lies in the inability to predict all the steps the ALTER TABLE will take as some are made in phase 2 or 3. It is unlikely that all significant decisions can be made in phase 1. However, I don't think that EXPLAIN addresses the same problem as the proposed VERBOSE option. Consider, for instance, a user who intends to perform a schema change that includes a long sequence of ALTERs. These steps will depend on each other. Tom raises this point of dependent ALTERs [2] but for a different reason. I think however, this is an important point. How will EXPLAIN help here? It may not tell you the truth, because it does not actually do the work and can't know the future state of the schema. VERBOSE on the other hand will perform the steps, and a user can test these changes in a test environment or a schema-only restore and know exactly what to expect in production. As this thread has expanded beyond its original scope, I believe this broader discussion is valuable. Rather than merely addressing the DEBUG1 issue, we can work toward a more complete and beneficial solution. Thoughts? [0] https://www.postgresql.org/message-id/26597.1544460770%40sss.pgh.pa.us Regards, Sami
At this point I am planning on withdrawing this patch from the March commitfest. I don't think fixing the REINDEX debug1 output makes a whole lot of sense. I still think more logging for (CREATE|ALTER) (INDEX|TABLE) will be a good to have but there needs to be more discussion about the best approach. Regards, Sami