Обсуждение: Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

Поиск

Список

Период

Сортировка

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Bertrand Drouvot

Дата:

08 января 2025 г., 18:21:07

Hi,

On Fri, Jan 03, 2025 at 12:30:14PM -0600, Sami Imseih wrote:
> While reviewing patch [1], I realized that the DEBUG1 message
> for CREATE INDEX could do better in providing information
> about parallel workers launched. Currently, the message just
> shows how many workers are planned, but a user may want
> to ensure that they have the appropriate number of workers
> launched as well when they enable DEBUG1 logging.

Yeah, one could see how many workers are currently running but would not be
able to figure out once the index is created. So adding the info in the log
makes sense to me.

> I prepared a simple patch, attached, for this. The log message matches
> the format
> used in VACUUM VERBOSE ( for consistency sake ).

A few random comments:

=== 1

s/parallel vacuum workers for index vacuuming/parallel workers for index creation/?

(2 times)

=== 2

-  (errmsg_internal("building index \"%s\" on table \"%s\" with request for %d parallel workers",
+  (errmsg_internal("building index \"%s\" on table \"%s\"",

I'd add "in parallel" to match its counterpart "serially" above. That would
make it more clear in case one just look for "building index" in the log.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Guillaume Lelarge

Дата:

08 января 2025 г., 23:21:46

Hi,

Le mer. 8 janv. 2025 à 18:37, Sami Imseih <samimseih@gmail.com> a écrit :

> s/parallel vacuum workers for index vacuuming/parallel workers for index creation/?

oops, that's my oversight from copying the message from vacuum. fixed.

> I'd add "in parallel" to match its counterpart "serially" above. That would
> make it more clear in case one just look for "building index" in the log.

good point.

Below is the new output with the attached v2.

postgres=# create index foo_idx1 on foo(id);
DEBUG: building index "foo_idx1" on table "foo" with parallel workers
DEBUG: launched 1 parallel workers for index creation (planned: 1)
DEBUG: index "foo_idx1" can safely use deduplication
CREATE INDEX

I tend to agree it might be better than Benoit's patch on the index messages, though I'm afraid that DEBUG1 level won't work for many users. DEBUGx are for developers, not users. A better log level for me would be LOG. For client messages, LOG messages won't be displayed by default. So there's still a need for a "SET client_min_messages to LOG" but it won't be necessary to go all the way to DEBUG1.

Regards.

Guillaume.

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Tom Lane

Дата:

08 января 2025 г., 23:35:38

Guillaume Lelarge <guillaume@lelarge.info> writes:
> I tend to agree it might be better than Benoit's patch on the index
> messages, though I'm afraid that DEBUG1 level won't work for many users.
> DEBUGx are for developers, not users. A better log level for me would be
> LOG.

Please, no.  That would result in spamming the postmaster log
for perfectly ordinary actions, with no usable way to prevent it.

I think DEBUG is perfectly appropriate for this sort of message.

            regards, tom lane

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Sami Imseih

Дата:

08 января 2025 г., 23:44:00

> I think DEBUG is perfectly appropriate for this sort of message.

I am curious what are the thoughts on introducing a
CREATE INDEX VERBOSE which can provide this info?
similar to users scripting VACUUM VERBOSE to log
more details about the vacuum operation including parallel
usage.

I have not convinced myself this is a good idea, but maybe
someone has an opinion. If not, we can just stick with a better
DEBUG1 message.

Regards,

Sami

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Guillaume Lelarge

Дата:

08 января 2025 г., 23:46:39

Le mer. 8 janv. 2025 à 21:35, Tom Lane <tgl@sss.pgh.pa.us> a écrit :

Guillaume Lelarge <guillaume@lelarge.info> writes:
> I tend to agree it might be better than Benoit's patch on the index
> messages, though I'm afraid that DEBUG1 level won't work for many users.
> DEBUGx are for developers, not users. A better log level for me would be
> LOG.

Please, no. That would result in spamming the postmaster log
for perfectly ordinary actions, with no usable way to prevent it.

Not sure why I was thinking it would only go to the client. You're right, it will also go to the logs, and it will be stored by default, which would be bad. Sorry about this...

I think DEBUG is perfectly appropriate for this sort of message.

Still think DEBUG level is pretty bad for a user experience. Anyway, I much prefer a DEBUG message than no message at all :)

Guillaume.

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Guillaume Lelarge

Дата:

08 января 2025 г., 23:48:17

Le mer. 8 janv. 2025 à 21:44, Sami Imseih <samimseih@gmail.com> a écrit :

> I think DEBUG is perfectly appropriate for this sort of message.

I am curious what are the thoughts on introducing a
CREATE INDEX VERBOSE which can provide this info?
similar to users scripting VACUUM VERBOSE to log
more details about the vacuum operation including parallel
usage.

I have not convinced myself this is a good idea, but maybe
someone has an opinion. If not, we can just stick with a better
DEBUG1 message.

Sounds better to me than relying on DEBUG messages. So, yeah, VERBOSE has my vote.

Guillaume.

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Tom Lane

Дата:

08 января 2025 г., 23:58:25

Sami Imseih <samimseih@gmail.com> writes:
> I am curious what are the thoughts on introducing a
> CREATE INDEX VERBOSE which can provide this info?
> similar to users scripting VACUUM VERBOSE to log
> more details about the vacuum operation including parallel
> usage.

What I can recall being discussed in the past is to extend
EXPLAIN and/or EXPLAIN ANALYZE to cover utility statements
that have nontrivial execution complexity --- for example,
ALTER TABLE has a lot of machinery underneath, and people
often wish to know things like whether a particular ALTER
will cause a table rewrite or not.

Of course, a patch for that would be a few orders of magnitude
larger than what you've got here :-(.  But if you're looking
for a framework for reporting these sorts of details, I'd
much rather go in that direction than follow the model of
VACUUM VERBOSE.  VACUUM VERBOSE is a kluge with little to
recommend it other than having been easy to implement.

            regards, tom lane

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Sami Imseih

Дата:

09 января 2025 г., 06:24:27

> Of course, a patch for that would be a few orders of magnitude
> larger than what you've got here :-(.  But if you're looking
> for a framework for reporting these sorts of details, I'd
> much rather go in that direction than follow the model of
> VACUUM VERBOSE.  VACUUM VERBOSE is a kluge with little to
> recommend it other than having been easy to implement.

To my surprise, REINDEX does have a VERBOSE option.
should have check this earlier :)

postgres=# reindex (verbose) index t_idx1;
INFO:  index "t_idx1" was reindexed
DETAIL:  CPU: user: 5.33 s, system: 0.48 s, elapsed: 6.26 s
REINDEX

Is there a reason not to do the same for CREATE INDEX?

Also, we can improve the REINDEX verbose message by
also providing the parallel usage.

Regards,

Sami

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Guillaume Lelarge

Дата:

11 января 2025 г., 13:29:57

Le jeu. 9 janv. 2025 à 04:24, Sami Imseih <samimseih@gmail.com> a écrit :

> Of course, a patch for that would be a few orders of magnitude
> larger than what you've got here :-(. But if you're looking
> for a framework for reporting these sorts of details, I'd
> much rather go in that direction than follow the model of
> VACUUM VERBOSE. VACUUM VERBOSE is a kluge with little to
> recommend it other than having been easy to implement.

To my surprise, REINDEX does have a VERBOSE option.
should have check this earlier :)

postgres=# reindex (verbose) index t_idx1;
INFO: index "t_idx1" was reindexed
DETAIL: CPU: user: 5.33 s, system: 0.48 s, elapsed: 6.26 s
REINDEX

Is there a reason not to do the same for CREATE INDEX?

Sounds a good idea to me.

Also, we can improve the REINDEX verbose message by
also providing the parallel usage.

Guillaume.

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Sami Imseih

Дата:

19 января 2025 г., 08:40:32

> Hmm.  I am reading Tom's opinion that goes toward not going in this
> direction for more commands, with the point to extend EXPLAIN to show
> this kind of information:
> https://www.postgresql.org/message-id/1692530.1736369905@sss.pgh.pa.us

That sounds like the ability to do something like EXPLAIN CREATE INDEX ...
is that correct?

> So do we really want to do what's proposed here?  I'm +-0 about the
> VERBOSE option attached to more commands, as it is a bit harder for
> clients to catch the information wanted. So here comes my question:
> how do we want to consume this information at the end from the
> perspective of the client?

For interactive usage in psql or pgadmin, it's trivial to capture this
information. The
information can also be written to the server log with log_min_messages=INFO

A bit more work is required to redirect messages to an out file from psql, as
you need to ensure that stderr is redirected to a file.
It's also a bit more work to capture this information from something like a
JDBC application.

IMO the interactive use-case is where this is the most useful as you can start a
CREATE INDEX (VERBOSE) and ensure that it's going to launch all the parallel
workers that it planned before letting it it continue; or control-c and figure
out why not all planned workers launched.

Regards,

Sami

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Guillaume Lelarge

Дата:

19 января 2025 г., 13:30:13

Hi,

Le dim. 19 janv. 2025 à 06:40, Sami Imseih <samimseih@gmail.com> a écrit :

> Hmm. I am reading Tom's opinion that goes toward not going in this
> direction for more commands, with the point to extend EXPLAIN to show
> this kind of information:
> https://www.postgresql.org/message-id/1692530.1736369905@sss.pgh.pa.us

That sounds like the ability to do something like EXPLAIN CREATE INDEX ...
is that correct?

Yes, and I suppose this would be quite some work to do. Though I see how interesting it would be, and a nice project to hack on.

> So do we really want to do what's proposed here? I'm +-0 about the
> VERBOSE option attached to more commands, as it is a bit harder for
> clients to catch the information wanted.

It sounds to me a lot easier to know about a VERBOSE option that gives you more information, than using a DEBUG level to get the same information. DEBUG level messages aren't explained in the CREATE INDEX manpage, whereas a VERBOSE option would be.

So here comes my question:
> how do we want to consume this information at the end from the
> perspective of the client?

For interactive usage in psql or pgadmin, it's trivial to capture this
information.

The information can also be written to the server log with log_min_messages=INFO

A bit more work is required to redirect messages to an out file from psql, as
you need to ensure that stderr is redirected to a file.
It's also a bit more work to capture this information from something like a
JDBC application.

IMO the interactive use-case is where this is the most useful as you can start a
CREATE INDEX (VERBOSE) and ensure that it's going to launch all the parallel
workers that it planned before letting it it continue; or control-c and figure
out why not all planned workers launched.

Guillaume.

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Sami Imseih

Дата:

20 января 2025 г., 06:27:57

> > > Hmm.  I am reading Tom's opinion that goes toward not going in this
> > > direction for more commands, with the point to extend EXPLAIN to show
> > > this kind of information:
> > > https://www.postgresql.org/message-id/1692530.1736369905@sss.pgh.pa.us
> >
> > That sounds like the ability to do something like EXPLAIN CREATE INDEX ...
> > is that correct?
> >

> Yes, and I suppose this would be quite some work to do. Though I see how
> interesting it would be, and a nice project to hack on.

I have been contemplating this and do see how something like EXPLAIN
could be useful, as it would give users the ability to examine
what a command will do, i.e. EXPLAIN CREATE INDEX will tell you
how many parallel workers are planned and EXPLAIN (ANALYZE) CREATE
INDEX will actually execute the command and provide the actual parallel
workers launched. Tom also mentioned the example of a user wanting to know if
a table rewrite will occur. There are other cases where this could be useful.

FWIW, Oracle does support EXPLAIN PLAN FOR CREATE INDEX and
EXPLAIN PLAN FOR REBUILD (for index rebuilds) [1][2], but this functionality
is not officially documented [3]. Oracle also provides a documented function to
estimate the size of an index [4]. I do not have access to an Oracle instance to
verify the state of current versions of Oracle, but I do vaguely remember this
from when I worked on Oracle many years ago :)

Now, I also realized that v1 does not include logging for ALTER TABLE commands
that add an index, such as "ALTER TABLE foo ADD CONSTRAINT foo_uq UNIQUE (c1)"
or attaching a partition and ensuring the index is built. So this
needs to be added to
the current proposal.

So far these are 2 options being discussed:

1/
The current proposal of:

CREATE INDEX (VERBOSE) and ALTER TABLE (VERBOSE)
and output debugging at INFO level.

2/
EXPLAIN

This is a lot more work. Currently QueryDesc has all the info we
need to generate the execution plan. But, we will likely need a
new struct that can track the debugging info to relay it back
up to explain. And this has to be generic enough for all cases.

Also, implementing an EXPLAIN and EXPLAIN ANALYZE for such commands
may not even be feasible. Plans are different in that they are generated
up-front and we can choose to execute them or not. This is not the same
for the other types of commands being discussed.

I am not convinced it's worth the effort.


thoughts? other approaches?


[1] https://oracle-randolf.blogspot.com/2009/02/explain-plan-on-ddls.html
[2] https://stackoverflow.com/questions/827123/how-can-i-estimate-the-size-of-an-oracle-index
[3] https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/EXPLAIN-PLAN.html
[4]
https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_SPACE.html#GUID-B6EB7527-BC67-4394-9E7A-01F2790C409A

Regards,

Sami

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Guillaume Lelarge

Дата:

20 января 2025 г., 10:24:54

Le lun. 20 janv. 2025 à 04:28, Sami Imseih <samimseih@gmail.com> a écrit :

> > > Hmm. I am reading Tom's opinion that goes toward not going in this
> > > direction for more commands, with the point to extend EXPLAIN to show
> > > this kind of information:
> > > https://www.postgresql.org/message-id/1692530.1736369905@sss.pgh.pa.us
> >
> > That sounds like the ability to do something like EXPLAIN CREATE INDEX ...
> > is that correct?
> >

> Yes, and I suppose this would be quite some work to do. Though I see how
> interesting it would be, and a nice project to hack on.

I have been contemplating this and do see how something like EXPLAIN
could be useful, as it would give users the ability to examine
what a command will do, i.e. EXPLAIN CREATE INDEX will tell you
how many parallel workers are planned and EXPLAIN (ANALYZE) CREATE
INDEX will actually execute the command and provide the actual parallel
workers launched. Tom also mentioned the example of a user wanting to know if
a table rewrite will occur. There are other cases where this could be useful.

FWIW, Oracle does support EXPLAIN PLAN FOR CREATE INDEX and
EXPLAIN PLAN FOR REBUILD (for index rebuilds) [1][2], but this functionality
is not officially documented [3]. Oracle also provides a documented function to
estimate the size of an index [4]. I do not have access to an Oracle instance to
verify the state of current versions of Oracle, but I do vaguely remember this
from when I worked on Oracle many years ago :)

Now, I also realized that v1 does not include logging for ALTER TABLE commands
that add an index, such as "ALTER TABLE foo ADD CONSTRAINT foo_uq UNIQUE (c1)"
or attaching a partition and ensuring the index is built. So this
needs to be added to
the current proposal.

So far these are 2 options being discussed:

1/
The current proposal of:

CREATE INDEX (VERBOSE) and ALTER TABLE (VERBOSE)
and output debugging at INFO level.

I guess the major issue is that it won't scale fine.

2/
EXPLAIN

This is a lot more work. Currently QueryDesc has all the info we
need to generate the execution plan. But, we will likely need a
new struct that can track the debugging info to relay it back
up to explain. And this has to be generic enough for all cases.

Also, implementing an EXPLAIN and EXPLAIN ANALYZE for such commands
may not even be feasible. Plans are different in that they are generated
up-front and we can choose to execute them or not. This is not the same
for the other types of commands being discussed.

I am not convinced it's worth the effort.

thoughts? other approaches?

You might be interested by this thread "Thinking about EXPLAIN ALTER TABLE":

https://www.postgresql.org/message-id/CAM-w4HNm1M5J-ow8UjTcqRe3JPxkVCrGe56tRpPUSePSdGcZ_w%40mail.gmail.com

I didn't had the time to read the whole thread, but it might be quite interesting.

[1] https://oracle-randolf.blogspot.com/2009/02/explain-plan-on-ddls.html
[2] https://stackoverflow.com/questions/827123/how-can-i-estimate-the-size-of-an-oracle-index
[3] https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/EXPLAIN-PLAN.html
[4] https://docs.oracle.com/en/database/oracle/oracle-database/21/arpls/DBMS_SPACE.html#GUID-B6EB7527-BC67-4394-9E7A-01F2790C409A

Regards,

Sami

Guillaume.

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Sami Imseih

Дата:

22 января 2025 г., 03:33:00

> You might be interested by this thread "Thinking about EXPLAIN ALTER TABLE":

> https://www.postgresql.org/message-id/CAM-w4HNm1M5J-ow8UjTcqRe3JPxkVCrGe56tRpPUSePSdGcZ_w%40mail.gmail.com

I reviewed this thread, and the primary issue with the EXPLAIN command lies
in the inability to predict all the steps the ALTER TABLE will take as some
are made in phase 2 or 3. It is unlikely that all significant
decisions can be made
in phase 1.

However, I don't think that EXPLAIN addresses the same problem as the proposed
VERBOSE option. Consider, for instance, a user who intends to perform a schema
change that includes a long sequence of ALTERs. These steps will depend on each
other. Tom raises this point of dependent ALTERs [2]  but for a
different reason.
I think however, this is an important point. How will EXPLAIN help
here? It may not tell
you the truth, because it does not actually do the work and can't know
the future
state of the schema.

VERBOSE on the other hand will perform the steps, and a user
can test these changes in a test environment or a schema-only restore
and know exactly what to expect in production.

As this thread has expanded beyond its original scope,
I believe this broader discussion is valuable. Rather than merely
addressing the DEBUG1 issue, we can work toward a more complete
and beneficial solution.

Thoughts?

[0] https://www.postgresql.org/message-id/26597.1544460770%40sss.pgh.pa.us

Regards,

Sami

Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?

От

Sami Imseih

Дата:

03 февраля 2025 г., 23:29:03

At this point I am planning on withdrawing this patch
from the March commitfest. I don't think fixing the REINDEX
debug1 output makes a whole lot of sense. I still think more logging
for (CREATE|ALTER) (INDEX|TABLE) will be a good to have but there
needs to be more discussion about the best approach.

Regards,

Sami

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?