Обсуждение: Add new error_action COPY ON_ERROR "log"

Поиск
Список
Период
Сортировка

Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
Hi,

As described in 9e2d870119, COPY ON_EEOR is expected to have more 
"error_action".
(Note that option name was changed by b725b7eec)

I'd like to have a new option "log", which skips soft errors and logs 
information that should have resulted in errors to PostgreSQL log.

I think this option has some advantages like below:

1) We can know which number of line input data was not loaded and 
reason.

   Example:

   =# copy t1 from stdin with (on_error log);
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself, or an EOF 
signal.
   >> 1
   >> 2
   >> 3
   >> z
   >> \.
   LOG:  invalid input syntax for type integer: "z"
   NOTICE:  1 row was skipped due to data type incompatibility
   COPY 3

   =# \! tail data/log/postgresql*.log
   LOG:  22P02: invalid input syntax for type integer: "z"
   CONTEXT:  COPY t1, line 4, column i: "z"
   LOCATION:  pg_strtoint32_safe, numutils.c:620
   STATEMENT:  copy t1 from stdin with (on_error log);


2) Easier maintenance than storing error information in tables or 
proprietary log files.
For example, in case a large number of soft errors occur, some 
mechanisms are needed to prevent an infinite increase in the size of the 
destination data, but we can left it to PostgreSQL's log rotation.


Attached a patch.
This basically comes from previous discussion[1] which did both "ignore" 
and "log" soft error.

As shown in the example above, the log output to the client does not 
contain CONTEXT, so I'm a little concerned that client cannot see what 
line of the input data had a problem without looking at the server log.


What do you think?

[1] 
https://www.postgresql.org/message-id/c0fb57b82b150953f26a5c7e340412e8%40oss.nttdata.com

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
jian he
Дата:
On Fri, Jan 26, 2024 at 12:42 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi,
>
> As described in 9e2d870119, COPY ON_EEOR is expected to have more
> "error_action".
> (Note that option name was changed by b725b7eec)
>
> I'd like to have a new option "log", which skips soft errors and logs
> information that should have resulted in errors to PostgreSQL log.
>
> I think this option has some advantages like below:
>
> 1) We can know which number of line input data was not loaded and
> reason.
>
>    Example:
>
>    =# copy t1 from stdin with (on_error log);
>    Enter data to be copied followed by a newline.
>    End with a backslash and a period on a line by itself, or an EOF
> signal.
>    >> 1
>    >> 2
>    >> 3
>    >> z
>    >> \.
>    LOG:  invalid input syntax for type integer: "z"
>    NOTICE:  1 row was skipped due to data type incompatibility
>    COPY 3
>
>    =# \! tail data/log/postgresql*.log
>    LOG:  22P02: invalid input syntax for type integer: "z"
>    CONTEXT:  COPY t1, line 4, column i: "z"
>    LOCATION:  pg_strtoint32_safe, numutils.c:620
>    STATEMENT:  copy t1 from stdin with (on_error log);
>
>
> 2) Easier maintenance than storing error information in tables or
> proprietary log files.
> For example, in case a large number of soft errors occur, some
> mechanisms are needed to prevent an infinite increase in the size of the
> destination data, but we can left it to PostgreSQL's log rotation.
>
>
> Attached a patch.
> This basically comes from previous discussion[1] which did both "ignore"
> and "log" soft error.
>
> As shown in the example above, the log output to the client does not
> contain CONTEXT, so I'm a little concerned that client cannot see what
> line of the input data had a problem without looking at the server log.
>
>
> What do you think?
>

I doubt the following part:
      If the <literal>log</literal> value is specified,
      <command>COPY</command> behaves the same as
<literal>ignore</literal>, exept that
      it logs information that should have resulted in errors to
PostgreSQL log at
      <literal>INFO</literal> level.

I think it does something like:
When an error happens, cstate->escontext->error_data->elevel will be ERROR
you manually change the cstate->escontext->error_data->elevel to LOG,
then you call ThrowErrorData.

but it's not related to `<literal>INFO</literal> level`?
my log_min_messages is default, warning.



Re: Add new error_action COPY ON_ERROR "log"

От
"David G. Johnston"
Дата:
On Thu, Jan 25, 2024 at 9:42 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Hi,

As described in 9e2d870119, COPY ON_EEOR is expected to have more
"error_action".
(Note that option name was changed by b725b7eec)

I'd like to have a new option "log", which skips soft errors and logs
information that should have resulted in errors to PostgreSQL log.


Seems like an easy win but largely unhelpful in the typical case.  I suppose ETL routines using this feature may be running on their machine under root or "postgres" but in a system where they are not this very useful information is inaccessible to them.  I suppose the DBA could set up an extractor to send these specific log lines elsewhere but that seems like enough hassle to disfavor this approach and favor one that can place the soft error data and feedback into user-specified tables in the same database.  Setting up temporary tables or unlogged tables probably is going to be a more acceptable methodology than trying to get to the log files.

David J.

Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On Fri, Jan 26, 2024 at 10:44 PM jian he <jian.universality@gmail.com> 
wrote:

> I doubt the following part:
>       If the <literal>log</literal> value is specified,
>       <command>COPY</command> behaves the same as
> <literal>ignore</literal>, exept that
>       it logs information that should have resulted in errors to
> PostgreSQL log at
>       <literal>INFO</literal> level.
> 
> I think it does something like:
> When an error happens, cstate->escontext->error_data->elevel will be 
> ERROR
> you manually change the cstate->escontext->error_data->elevel to LOG,
> then you call ThrowErrorData.
> 
> but it's not related to `<literal>INFO</literal> level`?
> my log_min_messages is default, warning.

Thanks!

Modified them to NOTICE in accordance with the following summary 
message:
> NOTICE:  x row was skipped due to data type incompatibility


On 2024-01-27 00:43, David G. Johnston wrote:
> On Thu, Jan 25, 2024 at 9:42 AM torikoshia
> <torikoshia@oss.nttdata.com> wrote:
> 
>> Hi,
>> 
>> As described in 9e2d870119, COPY ON_EEOR is expected to have more
>> "error_action".
>> (Note that option name was changed by b725b7eec)
>> 
>> I'd like to have a new option "log", which skips soft errors and
>> logs
>> information that should have resulted in errors to PostgreSQL log.
> 
> Seems like an easy win but largely unhelpful in the typical case.  I
> suppose ETL routines using this feature may be running on their
> machine under root or "postgres" but in a system where they are not
> this very useful information is inaccessible to them.  I suppose the
> DBA could set up an extractor to send these specific log lines
> elsewhere but that seems like enough hassle to disfavor this approach
> and favor one that can place the soft error data and feedback into
> user-specified tables in the same database.  Setting up temporary
> tables or unlogged tables probably is going to be a more acceptable
> methodology than trying to get to the log files.
> 
> David J.

I agree that not a few people would prefer to store error information in 
tables and there have already been suggestions[1].

OTOH not everyone thinks saving table information is the best idea[2].

I think it would be desirable for ON_ERROR to be in a form that allows 
the user to choose where to store error information from among some 
options, such as table, log and file.

"ON_ERROR log" would be useful at least in the case of 'running on their 
machine under root or "postgres"' as you pointed out.


[1] 
https://www.postgresql.org/message-id/CACJufxEkkqnozdnvNMGxVAA94KZaCPkYw_Cx4JKG9ueNaZma_A%40mail.gmail.com

[2] 
https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgmbjm@awork3.anarazel.de

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2024-01-27 00:43, David G. Johnston wrote:
>
> >> I'd like to have a new option "log", which skips soft errors and
> >> logs
> >> information that should have resulted in errors to PostgreSQL log.
> >
> > user-specified tables in the same database.  Setting up temporary
> > tables or unlogged tables probably is going to be a more acceptable
> > methodology than trying to get to the log files.
>
> OTOH not everyone thinks saving table information is the best idea[2].

The added NOTICE message gives a summary of how many rows were
skipped, but not the line numbers. It's hard for the users to find the
malformed data, especially when they are bulk-inserting from data
files of multiple GBs in size (hard to open such files in any editor
too).

I understand the value of writing the info to server log or tables of
users' choice as being discussed in a couple of other threads.
However, I'd prefer we do the simplest thing first before we go debate
server log or table - let the users know what line numbers are
containing the errors that COPY ignored something like [1] with a
simple change like [2]. It not only helps users figure out which rows
and attributes were malformed, but also helps them redirect them to
server logs with setting log_min_messages = notice [3]. In the worst
case scenario, a problem with this one NOTICE per malformed row is
that it can overload the psql session if all the rows are malformed.
I'm not sure if this is a big problem, but IMO better than a single
summary NOTICE message and simpler than writing to tables of users'
choice.

Thoughts?

FWIW, I presented the new COPY ... ON_ERROR option feature at
Hyderabad PostgreSQL User Group meetup and it was well-received by the
audience. The audience felt it's going to be extremely helpful for
bulk-loading tasks. Thank you all for working on this feature.

[1]
postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);
CREATE TABLE
postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1    {1}     1
a       {2}     2
3       {3}     3333333333
4       {a, 4}  4

5       {5}>> >> >> >> >>       5
\.>>
NOTICE:  detected data type incompatibility at line number 2 for
column check_ign_err, COPY n
NOTICE:  detected data type incompatibility at line number 3 for
column check_ign_err, COPY k
NOTICE:  detected data type incompatibility at line number 4 for
column check_ign_err, COPY m
NOTICE:  detected data type incompatibility at line number 5 for
column check_ign_err, COPY n
NOTICE:  4 rows were skipped due to data type incompatibility
COPY 2

[2]
diff --git a/src/backend/commands/copyfromparse.c
b/src/backend/commands/copyfromparse.c
index 906756362e..93ab5d4ebb 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,

                 &values[m]))
                        {
                                cstate->num_errors++;
-                               return true;
+
+                               if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
+                               {
+                                       ereport(NOTICE,
+
errmsg("detected data type incompatibility at line number %llu for
column %s, COPY %s",
+
(unsigned long long) cstate->cur_lineno,
+
cstate->cur_relname,
+
cstate->cur_attname));
+                                       return true;
+                               }
                        }

                        cstate->cur_attname = NULL;

[3]
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 2 for column check_ign_err, COPY n
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 2, column n: "a"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 3 for column check_ign_err, COPY k
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 3, column k: "3333333333"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 4 for column check_ign_err, COPY m
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 4, column m: "{a, 4}"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 5 for column check_ign_err, COPY n
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 5, column n: ""
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  4 rows were skipped due
to data type incompatibility

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On 2024-02-12 15:46, Bharath Rupireddy wrote:

Thanks for your comments.

> On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On 2024-01-27 00:43, David G. Johnston wrote:
>> 
>> >> I'd like to have a new option "log", which skips soft errors and
>> >> logs
>> >> information that should have resulted in errors to PostgreSQL log.
>> >
>> > user-specified tables in the same database.  Setting up temporary
>> > tables or unlogged tables probably is going to be a more acceptable
>> > methodology than trying to get to the log files.
>> 
>> OTOH not everyone thinks saving table information is the best idea[2].
> 
> The added NOTICE message gives a summary of how many rows were
> skipped, but not the line numbers. It's hard for the users to find the
> malformed data, especially when they are bulk-inserting from data
> files of multiple GBs in size (hard to open such files in any editor
> too).
> 
> I understand the value of writing the info to server log or tables of
> users' choice as being discussed in a couple of other threads.
> However, I'd prefer we do the simplest thing first before we go debate
> server log or table - let the users know what line numbers are
> containing the errors that COPY ignored something like [1] with a
> simple change like [2].

Agreed.
Unlike my patch, it hides the error information(i.e. 22P02: invalid 
input syntax for type integer: ), but I feel that it's usually 
sufficient to know the row number and column where the error occurred.

> It not only helps users figure out which rows
> and attributes were malformed, but also helps them redirect them to
> server logs with setting log_min_messages = notice [3]. In the worst
> case scenario, a problem with this one NOTICE per malformed row is
> that it can overload the psql session if all the rows are malformed.
> I'm not sure if this is a big problem, but IMO better than a single
> summary NOTICE message and simpler than writing to tables of users'
> choice.

Maybe could we do what you suggested for the behavior when 'log' is set 
to on_error?

> Thoughts?
> 
> FWIW, I presented the new COPY ... ON_ERROR option feature at
> Hyderabad PostgreSQL User Group meetup and it was well-received by the
> audience. The audience felt it's going to be extremely helpful for
> bulk-loading tasks. Thank you all for working on this feature.

Thanks for informing it!
I hope it will not be reverted as the audience:)

> [1]
> postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);
> CREATE TABLE
> postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF 
> signal.
>>> 1    {1}     1
> a       {2}     2
> 3       {3}     3333333333
> 4       {a, 4}  4
> 
> 5       {5}>> >> >> >> >>       5
> \.>>
> NOTICE:  detected data type incompatibility at line number 2 for
> column check_ign_err, COPY n
> NOTICE:  detected data type incompatibility at line number 3 for
> column check_ign_err, COPY k
> NOTICE:  detected data type incompatibility at line number 4 for
> column check_ign_err, COPY m
> NOTICE:  detected data type incompatibility at line number 5 for
> column check_ign_err, COPY n
> NOTICE:  4 rows were skipped due to data type incompatibility
> COPY 2
> 
> [2]
> diff --git a/src/backend/commands/copyfromparse.c
> b/src/backend/commands/copyfromparse.c
> index 906756362e..93ab5d4ebb 100644
> --- a/src/backend/commands/copyfromparse.c
> +++ b/src/backend/commands/copyfromparse.c
> @@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext 
> *econtext,
> 
>                  &values[m]))
>                         {
>                                 cstate->num_errors++;
> -                               return true;
> +
> +                               if (cstate->opts.on_error != 
> COPY_ON_ERROR_STOP)
> +                               {
> +                                       ereport(NOTICE,
> +
> errmsg("detected data type incompatibility at line number %llu for
> column %s, COPY %s",
> +
> (unsigned long long) cstate->cur_lineno,
> +
> cstate->cur_relname,
> +
> cstate->cur_attname));
> +                                       return true;
> +                               }
>                         }
> 
>                         cstate->cur_attname = NULL;
> 
> [3]
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 2 for column check_ign_err, COPY n
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 2, column n: "a"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 3 for column check_ign_err, COPY k
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 3, column k: "3333333333"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 4 for column check_ign_err, COPY m
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 4, column m: "{a, 4}"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 5 for column check_ign_err, COPY n
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 5, column n: ""
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  4 rows were skipped due
> to data type incompatibility
> 
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Wed, Feb 14, 2024 at 5:04 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> [....] let the users know what line numbers are
> > containing the errors that COPY ignored something like [1] with a
> > simple change like [2].
>
> Agreed.
> Unlike my patch, it hides the error information(i.e. 22P02: invalid
> input syntax for type integer: ), but I feel that it's usually
> sufficient to know the row number and column where the error occurred.

Right.

> > It not only helps users figure out which rows
> > and attributes were malformed, but also helps them redirect them to
> > server logs with setting log_min_messages = notice [3]. In the worst
> > case scenario, a problem with this one NOTICE per malformed row is
> > that it can overload the psql session if all the rows are malformed.
> > I'm not sure if this is a big problem, but IMO better than a single
> > summary NOTICE message and simpler than writing to tables of users'
> > choice.
>
> Maybe could we do what you suggested for the behavior when 'log' is set
> to on_error?

 My point is that why someone wants just the summary of failures
without row and column info especially for bulk loading tasks. I'd
suggest doing it independently of 'log' or 'table'.  I think we can
keep things simple just like the attached patch, and see how this
feature will be adopted. I'm sure we can come back and do things like
saving to 'log' or 'table' or 'separate_error_file' etc., if we
receive any firsthand feedback.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On 2024-02-16 17:15, Bharath Rupireddy wrote:
> On Wed, Feb 14, 2024 at 5:04 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> [....] let the users know what line numbers are
>> > containing the errors that COPY ignored something like [1] with a
>> > simple change like [2].
>> 
>> Agreed.
>> Unlike my patch, it hides the error information(i.e. 22P02: invalid
>> input syntax for type integer: ), but I feel that it's usually
>> sufficient to know the row number and column where the error occurred.
> 
> Right.
> 
>> > It not only helps users figure out which rows
>> > and attributes were malformed, but also helps them redirect them to
>> > server logs with setting log_min_messages = notice [3]. In the worst
>> > case scenario, a problem with this one NOTICE per malformed row is
>> > that it can overload the psql session if all the rows are malformed.
>> > I'm not sure if this is a big problem, but IMO better than a single
>> > summary NOTICE message and simpler than writing to tables of users'
>> > choice.
>> 
>> Maybe could we do what you suggested for the behavior when 'log' is 
>> set
>> to on_error?
> 
>  My point is that why someone wants just the summary of failures
> without row and column info especially for bulk loading tasks. I'd
> suggest doing it independently of 'log' or 'table'.  I think we can
> keep things simple just like the attached patch, and see how this
> feature will be adopted. I'm sure we can come back and do things like
> saving to 'log' or 'table' or 'separate_error_file' etc., if we
> receive any firsthand feedback.
> 
> Thoughts?

I may be wrong since I seldom do data loading tasks, but I greed with 
you.

I also a little concerned about the case where there are many malformed 
data and it causes lots of messages, but the information is usually 
valuable and if users don't need it, they can suppress it by changing 
client_min_messages.

Currently both summary of failures and individual information is logged 
in NOTICE level.
If we should assume that there are cases where only summary information 
is required, it'd be useful to set lower log level, i.e. LOG to the 
individual information.


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Fri, Feb 16, 2024 at 8:17 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> I may be wrong since I seldom do data loading tasks, but I greed with
> you.
>
> I also a little concerned about the case where there are many malformed
> data and it causes lots of messages, but the information is usually
> valuable and if users don't need it, they can suppress it by changing
> client_min_messages.
>
> Currently both summary of failures and individual information is logged
> in NOTICE level.
> If we should assume that there are cases where only summary information
> is required, it'd be useful to set lower log level, i.e. LOG to the
> individual information.

How about we emit the summary at INFO level and individual information
at NOTICE level? With this, the summary is given a different priority
than the individual info. With SET client_min_messages = WARNING; one
can still get the summary but not the individual info. Also, to get
all of these into server log, one can SET log_min_messages = INFO; or
SET log_min_messages = NOTICE;.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On 2024-02-17 15:00, Bharath Rupireddy wrote:
> On Fri, Feb 16, 2024 at 8:17 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> I may be wrong since I seldom do data loading tasks, but I greed with
>> you.
>> 
>> I also a little concerned about the case where there are many 
>> malformed
>> data and it causes lots of messages, but the information is usually
>> valuable and if users don't need it, they can suppress it by changing
>> client_min_messages.
>> 
>> Currently both summary of failures and individual information is 
>> logged
>> in NOTICE level.
>> If we should assume that there are cases where only summary 
>> information
>> is required, it'd be useful to set lower log level, i.e. LOG to the
>> individual information.
> 
> How about we emit the summary at INFO level and individual information
> at NOTICE level? With this, the summary is given a different priority
> than the individual info. With SET client_min_messages = WARNING; one
> can still get the summary but not the individual info. Also, to get
> all of these into server log, one can SET log_min_messages = INFO; or
> SET log_min_messages = NOTICE;.
> 
> Thoughts?

It looks good to me.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On 2024-02-20 17:22, torikoshia wrote:
> On 2024-02-17 15:00, Bharath Rupireddy wrote:
>> On Fri, Feb 16, 2024 at 8:17 PM torikoshia 
>> <torikoshia@oss.nttdata.com> wrote:
>>> 
>>> I may be wrong since I seldom do data loading tasks, but I greed with
>>> you.
>>> 
>>> I also a little concerned about the case where there are many 
>>> malformed
>>> data and it causes lots of messages, but the information is usually
>>> valuable and if users don't need it, they can suppress it by changing
>>> client_min_messages.
>>> 
>>> Currently both summary of failures and individual information is 
>>> logged
>>> in NOTICE level.
>>> If we should assume that there are cases where only summary 
>>> information
>>> is required, it'd be useful to set lower log level, i.e. LOG to the
>>> individual information.
>> 
>> How about we emit the summary at INFO level and individual information
>> at NOTICE level? With this, the summary is given a different priority
>> than the individual info. With SET client_min_messages = WARNING; one
>> can still get the summary but not the individual info. Also, to get
>> all of these into server log, one can SET log_min_messages = INFO; or
>> SET log_min_messages = NOTICE;.
>> 
>> Thoughts?
> 
> It looks good to me.

Here are comments on the v2 patch.

+               if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
+               {
+                   ereport(NOTICE,

I think cstate->opts.on_error is not COPY_ON_ERROR_STOP here, since if 
it is COPY_ON_ERROR_STOP, InputFunctionCallSafe() should already have 
errored out.

Should it be something like "Assert(cstate->opts.on_error != 
COPY_ON_ERROR_STOP)"?


Should below manual also be updated?

> A NOTICE message containing the ignored row count is emitted at the end 
> of the COPY FROM if at least one row was discarded.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Mon, Feb 26, 2024 at 5:47 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> >
> > It looks good to me.
>
> Here are comments on the v2 patch.

Thanks for looking at it.

> +               if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
> +               {
> +                   ereport(NOTICE,
>
> I think cstate->opts.on_error is not COPY_ON_ERROR_STOP here, since if
> it is COPY_ON_ERROR_STOP, InputFunctionCallSafe() should already have
> errored out.
>
> Should it be something like "Assert(cstate->opts.on_error !=
> COPY_ON_ERROR_STOP)"?

Nice catch. When COPY_ON_ERROR_STOP is specified, we use ereport's
soft error mechanism. An assertion seems a good choice to validate the
state is what we expect. Done that way.

> Should below manual also be updated?
>
> > A NOTICE message containing the ignored row count is emitted at the end
> > of the COPY FROM if at least one row was discarded.

Changed.

PSA v3 patch with the above review comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Wed, Feb 28, 2024 at 12:10:00PM +0530, Bharath Rupireddy wrote:
> On Mon, Feb 26, 2024 at 5:47 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>> +               if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
>> +               {
>> +                   ereport(NOTICE,
>>
>> I think cstate->opts.on_error is not COPY_ON_ERROR_STOP here, since if
>> it is COPY_ON_ERROR_STOP, InputFunctionCallSafe() should already have
>> errored out.
>>
>> Should it be something like "Assert(cstate->opts.on_error !=
>> COPY_ON_ERROR_STOP)"?
>
> Nice catch. When COPY_ON_ERROR_STOP is specified, we use ereport's
> soft error mechanism. An assertion seems a good choice to validate the
> state is what we expect. Done that way.

Hmm.  I am not really on board with this patch, that would generate
one NOTICE message each time a row is incompatible in the soft error
mode.  If you have a couple of billion rows to bulk-load into the
backend and even 0.01% of them are corrupted, you could finish with a
more than 100k log entries, and all systems should be careful about
the log quantity generated, especially if we use the syslogger which
could become easily a bottleneck.

The existing ON_ERROR controls what to do on error.  I think that we'd
better control the amount of information reported with a completely
separate option, an option even different than where to redirect
errors (if required, which would be either the logs, the client, a
pipe, a combination of these or even all of them).
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Fri, Mar 1, 2024 at 10:22 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> > Nice catch. When COPY_ON_ERROR_STOP is specified, we use ereport's
> > soft error mechanism. An assertion seems a good choice to validate the
> > state is what we expect. Done that way.
>
> Hmm.  I am not really on board with this patch, that would generate
> one NOTICE message each time a row is incompatible in the soft error
> mode.  If you have a couple of billion rows to bulk-load into the
> backend and even 0.01% of them are corrupted, you could finish with a
> more than 100k log entries, and all systems should be careful about
> the log quantity generated, especially if we use the syslogger which
> could become easily a bottleneck.

Hm. I was having some concerns about it as mentioned upthread. But,
thanks a lot for illustrating it.

> The existing ON_ERROR controls what to do on error.  I think that we'd
> better control the amount of information reported with a completely
> separate option, an option even different than where to redirect
> errors (if required, which would be either the logs, the client, a
> pipe, a combination of these or even all of them).

How about an extra option to error_action ignore-with-verbose which is
similar to ignore but when specified emits one NOTICE per malformed
row? With this, one can say COPY x FROM stdin (ON_ERROR
ignore-with-verbose);.

Alternatively, we can think of adding a new option verbose altogether
which can be used for not only this but for reporting some other COPY
related info/errors etc. With this, one can say COPY x FROM stdin
(VERBOSE on, ON_ERROR ignore);.

There's also another way of having a separate GUC, but -100 from me
for it. Because, it not only increases the total number of GUCs by 1,
but also might set a wrong precedent to have a new GUC for controlling
command level outputs.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Mon, Mar 04, 2024 at 05:00:00AM +0530, Bharath Rupireddy wrote:
> How about an extra option to error_action ignore-with-verbose which is
> similar to ignore but when specified emits one NOTICE per malformed
> row? With this, one can say COPY x FROM stdin (ON_ERROR
> ignore-with-verbose);.
>
> Alternatively, we can think of adding a new option verbose altogether
> which can be used for not only this but for reporting some other COPY
> related info/errors etc. With this, one can say COPY x FROM stdin
> (VERBOSE on, ON_ERROR ignore);.

I would suggest a completely separate option, because that offers more
flexibility as each option has a separate meaning.  My main concern in
using one option to control them all is that one may want in the
future to be able to specify more combinations of actions at query
level, especially if more modes are added to the ON_ERROR mode.  One
option would prevent that.

Perhaps ERROR_VERBOSE or ERROR_VERBOSITY would be better names, but
I'm never wedded to my naming suggestions.  Bad history with the
matter.

> There's also another way of having a separate GUC, but -100 from me
> for it. Because, it not only increases the total number of GUCs by 1,
> but also might set a wrong precedent to have a new GUC for controlling
> command level outputs.

What does this have to do with GUCs?  The ON_ERROR option does not
have one.
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Tue, Mar 5, 2024 at 4:48 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Mar 04, 2024 at 05:00:00AM +0530, Bharath Rupireddy wrote:
> > How about an extra option to error_action ignore-with-verbose which is
> > similar to ignore but when specified emits one NOTICE per malformed
> > row? With this, one can say COPY x FROM stdin (ON_ERROR
> > ignore-with-verbose);.
> >
> > Alternatively, we can think of adding a new option verbose altogether
> > which can be used for not only this but for reporting some other COPY
> > related info/errors etc. With this, one can say COPY x FROM stdin
> > (VERBOSE on, ON_ERROR ignore);.
>
> I would suggest a completely separate option, because that offers more
> flexibility as each option has a separate meaning.  My main concern in
> using one option to control them all is that one may want in the
> future to be able to specify more combinations of actions at query
> level, especially if more modes are added to the ON_ERROR mode.  One
> option would prevent that.
>
> Perhaps ERROR_VERBOSE or ERROR_VERBOSITY would be better names, but
> I'm never wedded to my naming suggestions.  Bad history with the
> matter.

+1 for a separate option and LOG_VERBOSITY seemed a better and generic
naming choice. Because, the ON_ERROR ignore isn't actually an error
per se IMO.

> > There's also another way of having a separate GUC, but -100 from me
> > for it. Because, it not only increases the total number of GUCs by 1,
> > but also might set a wrong precedent to have a new GUC for controlling
> > command level outputs.
>
> What does this have to do with GUCs?  The ON_ERROR option does not
> have one.

My thought was to have a separate GUC for deciding log level for COPY
command messages/errors similar to log_replication_commands. But
that's a no-go for sure when compared with a separate option.

Please see the attached v4 patch. If it looks good, I can pull
LOG_VERBOSITY changes out into 0001 and with 0002 containing the
detailed messages for discarded rows.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Wed, Mar 06, 2024 at 07:32:28PM +0530, Bharath Rupireddy wrote:
> Please see the attached v4 patch. If it looks good, I can pull
> LOG_VERBOSITY changes out into 0001 and with 0002 containing the
> detailed messages for discarded rows.

The approach looks sensible seen from here.

+    LOG_VERBOSITY [ <replaceable class="parameter">boolean</replaceable> ]
[...]
+    <term><literal>LOG_VERBOSITY</literal></term>
+    <listitem>
+     <para>
+      Sets the verbosity of logged messages by <command>COPY</command>
+      command. As an example, see its usage for
+      <command>COPY FROM</command> command's <literal>ON_ERROR</literal>
+      clause with <literal>ignore</literal> option.
+     </para>
+    </listitem>

Is a boolean the best interface for the end-user, though?  Maybe
something like a "mode" value would speak more than a yes/no from the
start, say a "default" mode to emit only the last LOG and a "verbose"
for the whole set in the case of ON_ERROR?  That could use an enum
from the start internally, but that's an implementation detail.

Describing what gets logged in the paragraph of ON_ERROR sounds fine,
especially if in the future more logs are added depending on other
options.  That's an assumption at this stage, of course.

I am adding Alexander Korotkov in CC, as the original committer of
9e2d8701194f, as I assume that he may want to chime in this
discussion.

Torikoshi-san or others, if you have any comments about the interface,
feel free.
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On 2024-03-07 13:00, Michael Paquier wrote:
> On Wed, Mar 06, 2024 at 07:32:28PM +0530, Bharath Rupireddy wrote:
>> Please see the attached v4 patch. If it looks good, I can pull
>> LOG_VERBOSITY changes out into 0001 and with 0002 containing the
>> detailed messages for discarded rows.
> 
> The approach looks sensible seen from here.
> 
> +    LOG_VERBOSITY [ <replaceable 
> class="parameter">boolean</replaceable> ]
> [...]
> +    <term><literal>LOG_VERBOSITY</literal></term>
> +    <listitem>
> +     <para>
> +      Sets the verbosity of logged messages by <command>COPY</command>
> +      command. As an example, see its usage for
> +      <command>COPY FROM</command> command's 
> <literal>ON_ERROR</literal>
> +      clause with <literal>ignore</literal> option.
> +     </para>
> +    </listitem>
> 
> Is a boolean the best interface for the end-user, though?  Maybe
> something like a "mode" value would speak more than a yes/no from the
> start, say a "default" mode to emit only the last LOG and a "verbose"
> for the whole set in the case of ON_ERROR?  That could use an enum
> from the start internally, but that's an implementation detail.
> 
> Describing what gets logged in the paragraph of ON_ERROR sounds fine,
> especially if in the future more logs are added depending on other
> options.  That's an assumption at this stage, of course.
> 
> I am adding Alexander Korotkov in CC, as the original committer of
> 9e2d8701194f, as I assume that he may want to chime in this
> discussion.
> 
> Torikoshi-san or others, if you have any comments about the interface,
> feel free.

Thanks.

Maybe I'm overly concerned, but I'm a little concerned about whether the 
contents of this output can really be called verbose, since it does not 
output the actual soft error message that occurred, but only the row and 
column where the error occurred.

Since the soft error mechanism can at least output the contents of soft 
errors in the server log [1], it might be a good idea to use something 
like a 'mode' value instead of boolean as Michael-san suggested, so that 
the log output contents can be adjusted at multiple levels.

[1] 
https://www.postgresql.org/message-id/20230322175000.qbdctk7bnmifh5an%40awork3.anarazel.de


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Thu, Mar 7, 2024 at 1:00 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Mar 06, 2024 at 07:32:28PM +0530, Bharath Rupireddy wrote:
> > Please see the attached v4 patch. If it looks good, I can pull
> > LOG_VERBOSITY changes out into 0001 and with 0002 containing the
> > detailed messages for discarded rows.
>
> The approach looks sensible seen from here.

Looks like a good approach to me.

>
> +    LOG_VERBOSITY [ <replaceable class="parameter">boolean</replaceable> ]
> [...]
> +    <term><literal>LOG_VERBOSITY</literal></term>
> +    <listitem>
> +     <para>
> +      Sets the verbosity of logged messages by <command>COPY</command>
> +      command. As an example, see its usage for
> +      <command>COPY FROM</command> command's <literal>ON_ERROR</literal>
> +      clause with <literal>ignore</literal> option.
> +     </para>
> +    </listitem>
>
> Is a boolean the best interface for the end-user, though?  Maybe
> something like a "mode" value would speak more than a yes/no from the
> start, say a "default" mode to emit only the last LOG and a "verbose"
> for the whole set in the case of ON_ERROR?  That could use an enum
> from the start internally, but that's an implementation detail.

+1 for making it an enum, so that we will be able to have multiple
levels for example to get actual soft error contents.

One question I have is; do we want to write multiple NOTICE messages
for one row if the row has malformed data on some columns?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Thu, Mar 07, 2024 at 03:52:41PM +0900, Masahiko Sawada wrote:
> One question I have is; do we want to write multiple NOTICE messages
> for one row if the row has malformed data on some columns?

Good idea.  We can do that as the field strings are already parsed.
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Thu, Mar 7, 2024 at 9:30 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Is a boolean the best interface for the end-user, though?  Maybe
> something like a "mode" value would speak more than a yes/no from the
> start, say a "default" mode to emit only the last LOG and a "verbose"
> for the whole set in the case of ON_ERROR?  That could use an enum
> from the start internally, but that's an implementation detail.

I'm okay with it. But, help me understand it better. We want the
'log_verbosity' clause to have options 'default' and 'verbose', right?
And, later it can also be extended to contain all the LOG levels like
'notice', 'error', 'info' , 'debugX' etc. depending on the need,
right?

One more thing, how does it sound using both verbosity and verbose in
log_verbosity verbose something like below? Is this okay?

COPY check_ign_err FROM STDIN WITH (on_error ignore, log_verbosity verbose);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Thu, Mar 7, 2024 at 12:37 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Mar 07, 2024 at 03:52:41PM +0900, Masahiko Sawada wrote:
> > One question I have is; do we want to write multiple NOTICE messages
> > for one row if the row has malformed data on some columns?
>
> Good idea.  We can do that as the field strings are already parsed.

Nice catch. So, are you suggesting to log one NOTICE message per row
even if multiple columns in the single row fail to parse or are
malformed?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Thu, Mar 07, 2024 at 12:48:12PM +0530, Bharath Rupireddy wrote:
> I'm okay with it. But, help me understand it better. We want the
> 'log_verbosity' clause to have options 'default' and 'verbose', right?
> And, later it can also be extended to contain all the LOG levels like
> 'notice', 'error', 'info' , 'debugX' etc. depending on the need,
> right?

You could, or names that have some status like row_details, etc.

> One more thing, how does it sound using both verbosity and verbose in
> log_verbosity verbose something like below? Is this okay?

There's some history with this pattern in psql at least with \set
VERBOSITY verbose.  For the patch, I would tend to choose these two,
but that's as far as my opinion goes and I am OK other ideas gather
more votes.
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Thu, Mar 07, 2024 at 12:50:33PM +0530, Bharath Rupireddy wrote:
> Nice catch. So, are you suggesting to log one NOTICE message per row
> even if multiple columns in the single row fail to parse or are
> malformed?

One NOTICE per malformed value, I guess, which would be easier to
parse particularly if the values are long (like, JSON-long).
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Thu, Mar 7, 2024 at 12:54 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Mar 07, 2024 at 12:48:12PM +0530, Bharath Rupireddy wrote:
> > I'm okay with it. But, help me understand it better. We want the
> > 'log_verbosity' clause to have options 'default' and 'verbose', right?
> > And, later it can also be extended to contain all the LOG levels like
> > 'notice', 'error', 'info' , 'debugX' etc. depending on the need,
> > right?
>
> You could, or names that have some status like row_details, etc.
>
> > One more thing, how does it sound using both verbosity and verbose in
> > log_verbosity verbose something like below? Is this okay?
>
> There's some history with this pattern in psql at least with \set
> VERBOSITY verbose.  For the patch, I would tend to choose these two,
> but that's as far as my opinion goes and I am OK other ideas gather
> more votes.

Please see the attached v5-0001 patch implementing LOG_VERBOSITY with
options 'default' and 'verbose'. v5-0002 adds the detailed info to
ON_ERROR 'ignore' option.

We have a CF entry https://commitfest.postgresql.org/47/4798/ for the
original idea proposed in this thread, that is, to have the ON_ERROR
'log' option. I'll probably start a new thread and add a new CF entry
in the next commitfest if there's no objection from anyone.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Fri, Mar 08, 2024 at 03:36:30PM +0530, Bharath Rupireddy wrote:
> Please see the attached v5-0001 patch implementing LOG_VERBOSITY with
> options 'default' and 'verbose'. v5-0002 adds the detailed info to
> ON_ERROR 'ignore' option.

I may be reading this patch set incorrectly, but why doesn't this
patch generate one NOTICE per attribute, as suggested by Sawada-san,
incrementing num_errors once per row when the last attribute has been
processed?  Also, why not have a test that checks that multiple rows
spawn more than more messages in some distributed fashion?  Did you
look at this idea?

> We have a CF entry https://commitfest.postgresql.org/47/4798/ for the
> original idea proposed in this thread, that is, to have the ON_ERROR
> 'log' option. I'll probably start a new thread and add a new CF entry
> in the next commitfest if there's no objection from anyone.

Hmm.  You are referring to the part where you'd want to control where
the errors are sent, right?  At least, what you have here would
address point 1) mentioned in the first message of this thread.
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Fri, Mar 8, 2024 at 4:42 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 08, 2024 at 03:36:30PM +0530, Bharath Rupireddy wrote:
> > Please see the attached v5-0001 patch implementing LOG_VERBOSITY with
> > options 'default' and 'verbose'. v5-0002 adds the detailed info to
> > ON_ERROR 'ignore' option.
>
> I may be reading this patch set incorrectly, but why doesn't this
> patch generate one NOTICE per attribute, as suggested by Sawada-san,
> incrementing num_errors once per row when the last attribute has been
> processed?  Also, why not have a test that checks that multiple rows
> spawn more than more messages in some distributed fashion?  Did you
> look at this idea?

If NOTICE per attribute and incrementing num_errors per row is
implemented, it ends up erroring out with ERROR:  missing data for
column "m"  for all-column-empty-row. Shall we treat this ERROR softly
too if on_error ignore is specified? Or shall we discuss this idea
separately?

-- tests for options on_error and log_verbosity
COPY check_ign_err FROM STDIN WITH (on_error ignore, log_verbosity 'verbose');
1   {1} 1
a   {2} 2
3   {3} 3333333333
4   {a, 4}  4

5   {5} 5
\.

NOTICE:  detected data type incompatibility at line number 2 for
column n; COPY check_ign_err
NOTICE:  detected data type incompatibility at line number 2 for
column m; COPY check_ign_err
NOTICE:  detected data type incompatibility at line number 2 for
column k; COPY check_ign_err
NOTICE:  detected data type incompatibility at line number 3 for
column k; COPY check_ign_err
NOTICE:  detected data type incompatibility at line number 4 for
column m; COPY check_ign_err
NOTICE:  detected data type incompatibility at line number 4 for
column k; COPY check_ign_err
NOTICE:  detected data type incompatibility at line number 5 for
column n; COPY check_ign_err
ERROR:  missing data for column "m"
CONTEXT:  COPY check_ign_err, line 5: ""

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Sat, Mar 09, 2024 at 12:01:49AM +0530, Bharath Rupireddy wrote:
> If NOTICE per attribute and incrementing num_errors per row is
> implemented, it ends up erroring out with ERROR:  missing data for
> column "m"  for all-column-empty-row. Shall we treat this ERROR softly
> too if on_error ignore is specified? Or shall we discuss this idea
> separately?
>
> ERROR:  missing data for column "m"
> CONTEXT:  COPY check_ign_err, line 5: ""

Hmm.  I have spent some time looking at the bevahior of ON_ERROR, and
there are two tests in copy2.sql, one for the case where there is more
data than attributes and a second where there is not enough data in a
row that checks for errors.

For example, take this table:
=# create table tab (a int, b int, c int);
CREATE TABLE

This case works, even if the row has clearly not enough attributes.
The first attribute can be parsed, not the second one and this causes
the remaining data of the row to be skipped:
=# copy tab from stdin (delimiter ',', on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,
>> \.
NOTICE:  00000: 1 row was skipped due to data type incompatibility
LOCATION:  CopyFrom, copyfrom.c:1314
COPY 0

This case fails.  The first and the second attributes can be parsed,
and the line fails because we are missing the last attribute as of a
lack of delimiter:
=# copy tab from stdin (delimiter ',', on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,1
>> \.
ERROR:  22P04: missing data for column "c"
CONTEXT:  COPY tab, line 1: "1,1"

This brings a weird edge case for the all-column-empty-row case you
are mentioning once if we try to get information about all the rows we
should expect, but this has as side effect to break a case that's
intended ro work with ON_ERROR, as far as I understand, which is to
skip entirely a raw on the first conversion error we find, even if
there is no data afterwards.  I was a bit confused by that first, but
I can also see why it is useful as-is on HEAD.

At the end of the day, this comes down to what is more helpful to the
user.  And I'd agree on the side what ON_ERROR does currently, which
is what your patch relies on: on the first conversion failure, give up
and skip the rest of the row because we cannot trust its contents.
That's my way of saying that I am fine with the proposal of your
patch, and that we cannot provide the full state of a row without
making the error stack of COPY more invasive.

Perhaps we could discuss this idea of ensuring that all the attributes
are on a row in a different thread, as you say, but I am not really
convinced that there's a strong need for it at this stage as ON_ERROR
is new to v17.  So it does not sound like a bad thing to let it brew
more before implementing more options and make the COPY paths more
complicated than they already are.  I suspect that this may trigger
some discussion during the beta/stability phase depending on the
initial feedback.  Perhaps I'm wrong, though.

+      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      containing the line number and column name for each discarded row is
+      emitted.

This should clarify that the column name refers to the attribute where
the input conversion has failed, I guess.  Specifying only "column
name" without more context is a bit confusing.
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Mon, Mar 11, 2024 at 11:16 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> At the end of the day, this comes down to what is more helpful to the
> user.  And I'd agree on the side what ON_ERROR does currently, which
> is what your patch relies on: on the first conversion failure, give up
> and skip the rest of the row because we cannot trust its contents.
> That's my way of saying that I am fine with the proposal of your
> patch, and that we cannot provide the full state of a row without
> making the error stack of COPY more invasive.

+1.

> +      <literal>verbose</literal>, a <literal>NOTICE</literal> message
> +      containing the line number and column name for each discarded row is
> +      emitted.
>
> This should clarify that the column name refers to the attribute where
> the input conversion has failed, I guess.  Specifying only "column
> name" without more context is a bit confusing.

Done.

Please see the attached v6 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Mon, Mar 11, 2024 at 06:00:00PM +0530, Bharath Rupireddy wrote:
> Please see the attached v6 patch set.

I am tempted to tweak a few things in the docs, the comments and the
tests (particularly adding more patterns for tuples that fail on
conversion while it's clear that there are not enough attributes after
the incorrect values), but that looks roughly OK.

Wouldn't it be better to squash the patches together, by the way?
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Tue, Mar 12, 2024 at 12:22 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Mar 11, 2024 at 06:00:00PM +0530, Bharath Rupireddy wrote:
> > Please see the attached v6 patch set.
>
> I am tempted to tweak a few things in the docs, the comments and the
> tests (particularly adding more patterns for tuples that fail on
> conversion while it's clear that there are not enough attributes after
> the incorrect values), but that looks roughly OK.

+1. But, do you want to add them now or later as a separate
patch/discussion altogether?

> Wouldn't it be better to squash the patches together, by the way?

I guess not. They are two different things IMV.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Tue, Mar 12, 2024 at 12:54:29PM +0530, Bharath Rupireddy wrote:
> +1. But, do you want to add them now or later as a separate
> patch/discussion altogether?

The attached 0003 is what I had in mind:
- Simplification of the LOG generated with quotes applied around the
column name, don't see much need to add the relation name, either, for
consistency and because the knowledge is known in the query.
- A few more tests.
- Some doc changes.

>> Wouldn't it be better to squash the patches together, by the way?
>
> I guess not. They are two different things IMV.

Well, 0001 is sitting doing nothing because the COPY code does not
make use of it internally.
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Wed, Mar 13, 2024 at 5:16 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Mar 12, 2024 at 12:54:29PM +0530, Bharath Rupireddy wrote:
> > +1. But, do you want to add them now or later as a separate
> > patch/discussion altogether?
>
> The attached 0003 is what I had in mind:
> - Simplification of the LOG generated with quotes applied around the
> column name, don't see much need to add the relation name, either, for
> consistency and because the knowledge is known in the query.
> - A few more tests.
> - Some doc changes.

LGMT. So, I've merged those changes into 0001 and 0002.

> >> Wouldn't it be better to squash the patches together, by the way?
> >
> > I guess not. They are two different things IMV.
>
> Well, 0001 is sitting doing nothing because the COPY code does not
> make use of it internally.

Yes. That's why I left a note in the commit message that a future
commit will use it.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Wed, Mar 13, 2024 at 08:08:42AM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 13, 2024 at 5:16 AM Michael Paquier <michael@paquier.xyz> wrote:
>> The attached 0003 is what I had in mind:
>> - Simplification of the LOG generated with quotes applied around the
>> column name, don't see much need to add the relation name, either, for
>> consistency and because the knowledge is known in the query.
>> - A few more tests.
>> - Some doc changes.
>
> LGMT. So, I've merged those changes into 0001 and 0002.

I've applied the extra tests for now, as this was really confusing.

Hmm.  This NOTICE is really bugging me.  It is true that the clients
would get more information, but the information is duplicated on the
server side because the error context provides the same information as
the NOTICE itself:
NOTICE:  data type incompatibility at line 1 for column "a"
CONTEXT:  COPY aa, line 1, column a: "a"
STATEMENT:  copy aa from stdin with (on_error ignore, log_verbosity verbose);
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Wed, Mar 13, 2024 at 11:09 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hmm.  This NOTICE is really bugging me.  It is true that the clients
> would get more information, but the information is duplicated on the
> server side because the error context provides the same information as
> the NOTICE itself:
> NOTICE:  data type incompatibility at line 1 for column "a"
> CONTEXT:  COPY aa, line 1, column a: "a"
> STATEMENT:  copy aa from stdin with (on_error ignore, log_verbosity verbose);

Yes, if wanted, clients can also get the CONTEXT - for instance, using
'\set SHOW_CONTEXT always' in psql.

I think we can enhance the NOTICE message to include the column value
(just like CONTEXT message is showing) and leverage relname_only to
emit only the relation name in the CONTEXT message.

        /*
         * We suppress error context information other than the relation name,
         * if one of the operations below fails.
         */
        Assert(!cstate->relname_only);
        cstate->relname_only = true;

I'm attaching the v8 patch set implementing the above idea. With this,
[1] is sent to the client, [2] is sent to the server log. This
approach not only reduces the duplicate info in the NOTICE and CONTEXT
messages, but also makes it easy for users to get all the necessary
info in the NOTICE message without having to set extra parameters to
get CONTEXT message.

Another idea is to move even the table name to NOTICE message and hide
the context with errhidecontext when we emit the new NOTICE messages.

Thoughts?

[1]
NOTICE:  data type incompatibility at line 2 for column n: "a"
NOTICE:  data type incompatibility at line 3 for column k: "3333333333"
NOTICE:  data type incompatibility at line 4 for column m: "{a, 4}"
NOTICE:  data type incompatibility at line 5 for column n: ""
NOTICE:  data type incompatibility at line 7 for column m: "a"
NOTICE:  data type incompatibility at line 8 for column k: "a"
NOTICE:  6 rows were skipped due to data type incompatibility
COPY 3

[2]
2024-03-13 13:49:14.138 UTC [1330270] NOTICE:  data type
incompatibility at line 2 for column n: "a"
2024-03-13 13:49:14.138 UTC [1330270] CONTEXT:  COPY check_ign_err
2024-03-13 13:49:14.138 UTC [1330270] NOTICE:  data type
incompatibility at line 3 for column k: "3333333333"
2024-03-13 13:49:14.138 UTC [1330270] CONTEXT:  COPY check_ign_err
2024-03-13 13:49:14.138 UTC [1330270] NOTICE:  data type
incompatibility at line 4 for column m: "{a, 4}"
2024-03-13 13:49:14.138 UTC [1330270] CONTEXT:  COPY check_ign_err
2024-03-13 13:49:14.138 UTC [1330270] NOTICE:  data type
incompatibility at line 5 for column n: ""
2024-03-13 13:49:14.138 UTC [1330270] CONTEXT:  COPY check_ign_err
2024-03-13 13:49:14.138 UTC [1330270] NOTICE:  data type
incompatibility at line 7 for column m: "a"
2024-03-13 13:49:14.138 UTC [1330270] CONTEXT:  COPY check_ign_err
2024-03-13 13:49:14.138 UTC [1330270] NOTICE:  data type
incompatibility at line 8 for column k: "a"
2024-03-13 13:49:14.138 UTC [1330270] CONTEXT:  COPY check_ign_err
2024-03-13 13:49:14.138 UTC [1330270] NOTICE:  6 rows were skipped due
to data type incompatibility

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Wed, Mar 13, 2024 at 07:32:37PM +0530, Bharath Rupireddy wrote:
> I'm attaching the v8 patch set implementing the above idea. With this,
> [1] is sent to the client, [2] is sent to the server log. This
> approach not only reduces the duplicate info in the NOTICE and CONTEXT
> messages, but also makes it easy for users to get all the necessary
> info in the NOTICE message without having to set extra parameters to
> get CONTEXT message.

In terms of eliminating the information duplication while allowing
clients to know the value, the attribute and the line involved in the
conversion failure, the approach of tweaking the context information
has merits, I guess.

How do others feel about this kind of tuning?
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Kyotaro Horiguchi
Дата:
At Fri, 15 Mar 2024 16:57:25 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Wed, Mar 13, 2024 at 07:32:37PM +0530, Bharath Rupireddy wrote:
> > I'm attaching the v8 patch set implementing the above idea. With this,
> > [1] is sent to the client, [2] is sent to the server log. This
> > approach not only reduces the duplicate info in the NOTICE and CONTEXT
> > messages, but also makes it easy for users to get all the necessary
> > info in the NOTICE message without having to set extra parameters to
> > get CONTEXT message.
> 
> In terms of eliminating the information duplication while allowing
> clients to know the value, the attribute and the line involved in the
> conversion failure, the approach of tweaking the context information
> has merits, I guess.
> 
> How do others feel about this kind of tuning?

If required, I think that we have already included the minimum
information necessary for the primary diagnosis, including locations,
within the primary messages. Here is an example:

> LOG:  00000: invalid record length at 0/18049F8: expected at least 24, got 0

And I believe that CONTEXT, if it exists, is augmentation information
to the primary messages. The objective of the precedent for the use of
relname_only was somewhat different, but this use also seems legit.

In short, I think the distribution between message types (primary and
context) is fine as it is in the latest patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Mon, Mar 18, 2024 at 12:05:17PM +0900, Kyotaro Horiguchi wrote:
> And I believe that CONTEXT, if it exists, is augmentation information
> to the primary messages. The objective of the precedent for the use of
> relname_only was somewhat different, but this use also seems legit.
>
> In short, I think the distribution between message types (primary and
> context) is fine as it is in the latest patch.

Okay, thanks for the feedback.
--
Michael

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Wed, Mar 13, 2024 at 11:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 13, 2024 at 11:09 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > Hmm.  This NOTICE is really bugging me.  It is true that the clients
> > would get more information, but the information is duplicated on the
> > server side because the error context provides the same information as
> > the NOTICE itself:
> > NOTICE:  data type incompatibility at line 1 for column "a"
> > CONTEXT:  COPY aa, line 1, column a: "a"
> > STATEMENT:  copy aa from stdin with (on_error ignore, log_verbosity verbose);
>
> Yes, if wanted, clients can also get the CONTEXT - for instance, using
> '\set SHOW_CONTEXT always' in psql.
>
> I think we can enhance the NOTICE message to include the column value
> (just like CONTEXT message is showing) and leverage relname_only to
> emit only the relation name in the CONTEXT message.
>
>         /*
>          * We suppress error context information other than the relation name,
>          * if one of the operations below fails.
>          */
>         Assert(!cstate->relname_only);
>         cstate->relname_only = true;
>
> I'm attaching the v8 patch set implementing the above idea. With this,
> [1] is sent to the client, [2] is sent to the server log. This
> approach not only reduces the duplicate info in the NOTICE and CONTEXT
> messages, but also makes it easy for users to get all the necessary
> info in the NOTICE message without having to set extra parameters to
> get CONTEXT message.
>
> Another idea is to move even the table name to NOTICE message and hide
> the context with errhidecontext when we emit the new NOTICE messages.
>
> Thoughts?
>

The current approach, eliminating the duplicated information in
CONTEXT, seems good to me.

One question about the latest (v8) patch:

+                   else
+                       ereport(NOTICE,
+                               errmsg("data type incompatibility at
line %llu for column %s: null input",
+                                      (unsigned long long) cstate->cur_lineno,
+                                      cstate->cur_attname));
+

How can we reach this path? It seems we don't cover this path by the tests.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Mon, Mar 25, 2024 at 10:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> The current approach, eliminating the duplicated information in
> CONTEXT, seems good to me.

Thanks for looking into it.

> One question about the latest (v8) patch:
>
> +                   else
> +                       ereport(NOTICE,
> +                               errmsg("data type incompatibility at
> line %llu for column %s: null input",
> +                                      (unsigned long long) cstate->cur_lineno,
> +                                      cstate->cur_attname));
> +
>
> How can we reach this path? It seems we don't cover this path by the tests.

Tests don't cover that part, but it can be hit with something like
[1]. I've added a test for this.

Note the use of domain to provide an indirect way of providing null
constraint check. Otherwise, COPY FROM fails early in
CopyFrom->ExecConstraints if the NOT NULL constraint is directly
provided next to the column in the table [2].

Please see the attached v9 patch set.

[1]
create domain dcheck_ign_err2    varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
1    {1}    1    'foo'
2    {2}    2    \N
\.

[2]
CREATE TABLE check_ign_err3 (n int, m int[], k int, l varchar(15) NOT NULL);
postgres=# COPY check_ign_err3 FROM STDIN WITH (on_error ignore,
log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1    {1}     1       'foo'
2       {2}     2       \N>>
>> \.
ERROR:  null value in column "l" of relation "check_ign_err3" violates
not-null constraint
DETAIL:  Failing row contains (2, {2}, 2, null).
CONTEXT:  COPY check_ign_err3, line 2: "2       {2}     2       \N"

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Mon, Mar 25, 2024 at 8:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Mar 25, 2024 at 10:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > The current approach, eliminating the duplicated information in
> > CONTEXT, seems good to me.
>
> Thanks for looking into it.
>
> > One question about the latest (v8) patch:
> >
> > +                   else
> > +                       ereport(NOTICE,
> > +                               errmsg("data type incompatibility at
> > line %llu for column %s: null input",
> > +                                      (unsigned long long) cstate->cur_lineno,
> > +                                      cstate->cur_attname));
> > +
> >
> > How can we reach this path? It seems we don't cover this path by the tests.
>
> Tests don't cover that part, but it can be hit with something like
> [1]. I've added a test for this.
>
> Note the use of domain to provide an indirect way of providing null
> constraint check. Otherwise, COPY FROM fails early in
> CopyFrom->ExecConstraints if the NOT NULL constraint is directly
> provided next to the column in the table [2].
>
> Please see the attached v9 patch set.
>

Thank you for updating the patch! The patch mostly looks good to me.
Here are some minor comments:

---
 /* non-export function prototypes */
-static char *limit_printout_length(const char *str);
-
static void ClosePipeFromProgram(CopyFromState cstate);

Now that we have only one function we should replace "prototypes" with
"prototype".

---
+                                                ereport(NOTICE,
+
errmsg("data type incompatibility at line %llu for column %s: \"%s\"",
+
     (unsigned long long) cstate->cur_lineno,
+
     cstate->cur_attname,
+
     attval));

I guess it would be better to make the log message clearer to convey
what we did for the malformed row. For example, how about something
like "skipping row due to data type incompatibility at line %llu for
column %s: \"s\""?

---
 extern void CopyFromErrorCallback(void *arg);
+extern char *limit_printout_length(const char *str);

I don't disagree with exposing the limit_printout_length() function
but I think it's better to rename it for consistency with other
exposed COPY command functions. Only this function is snake-case. How
about CopyLimitPrintoutLength() or similar?

FWIW I'm going to merge two patches before the push.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Tue, Mar 26, 2024 at 7:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Please see the attached v9 patch set.
>
> Thank you for updating the patch! The patch mostly looks good to me.
> Here are some minor comments:

Thanks for looking into this.

> ---
>  /* non-export function prototypes */
> -static char *limit_printout_length(const char *str);
> -
> static void ClosePipeFromProgram(CopyFromState cstate);
>
> Now that we have only one function we should replace "prototypes" with
> "prototype".

Well no. We might add a few more (never know). A quick look around the
GUCs under /* GUCs */ tells me that plural form there is being used
even just one GUC is defined (xlogprefetcher.c for instance).

> ---
> +                                                ereport(NOTICE,
> +
> errmsg("data type incompatibility at line %llu for column %s: \"%s\"",
> +
>      (unsigned long long) cstate->cur_lineno,
> +
>      cstate->cur_attname,
> +
>      attval));
>
> I guess it would be better to make the log message clearer to convey
> what we did for the malformed row. For example, how about something
> like "skipping row due to data type incompatibility at line %llu for
> column %s: \"s\""?

The summary message which gets printed at the end says that "NOTICE:
6 rows were skipped due to data type incompatibility". Isn't this
enough? If someone is using ON_ERROR 'ignore', it's quite natural that
such rows get skipped softly and the summary message can help them,
no?

> ---
>  extern void CopyFromErrorCallback(void *arg);
> +extern char *limit_printout_length(const char *str);
>
> I don't disagree with exposing the limit_printout_length() function
> but I think it's better to rename it for consistency with other
> exposed COPY command functions. Only this function is snake-case. How
> about CopyLimitPrintoutLength() or similar?

WFM. Although its implementation is not related to COPY code, COPY is
the sole user of it right now, so I'm fine with it. Done that.

> FWIW I'm going to merge two patches before the push.

Done that.

Please see the attached v10 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Tue, Mar 26, 2024 at 12:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 7:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > Please see the attached v9 patch set.
> >
> > Thank you for updating the patch! The patch mostly looks good to me.
> > Here are some minor comments:
>
> Thanks for looking into this.
>
> > ---
> >  /* non-export function prototypes */
> > -static char *limit_printout_length(const char *str);
> > -
> > static void ClosePipeFromProgram(CopyFromState cstate);
> >
> > Now that we have only one function we should replace "prototypes" with
> > "prototype".
>
> Well no. We might add a few more (never know). A quick look around the
> GUCs under /* GUCs */ tells me that plural form there is being used
> even just one GUC is defined (xlogprefetcher.c for instance).

Understood.

>
> > ---
> > +                                                ereport(NOTICE,
> > +
> > errmsg("data type incompatibility at line %llu for column %s: \"%s\"",
> > +
> >      (unsigned long long) cstate->cur_lineno,
> > +
> >      cstate->cur_attname,
> > +
> >      attval));
> >
> > I guess it would be better to make the log message clearer to convey
> > what we did for the malformed row. For example, how about something
> > like "skipping row due to data type incompatibility at line %llu for
> > column %s: \"s\""?
>
> The summary message which gets printed at the end says that "NOTICE:
> 6 rows were skipped due to data type incompatibility". Isn't this
> enough? If someone is using ON_ERROR 'ignore', it's quite natural that
> such rows get skipped softly and the summary message can help them,
> no?

I think that in the main log message we should mention what happened
(or is happening) or what we did (or are doing). If the message "data
type incompatibility ..." was in the DETAIL message with the main
message saying something like "skipping row at line %llu for column
%s: ...", it would make sense to me. But the current message seems not
to be clear to me and consistent with other NOTICE messages. Also, the
last summary line would not be written if the user cancelled, and
someone other than person who used ON_ERROR 'ignore' might check the
server logs later.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Tue, Mar 26, 2024 at 9:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > > errmsg("data type incompatibility at line %llu for column %s: \"%s\"",
>
> > > I guess it would be better to make the log message clearer to convey
> > > what we did for the malformed row. For example, how about something
> > > like "skipping row due to data type incompatibility at line %llu for
> > > column %s: \"s\""?
> >
> > The summary message which gets printed at the end says that "NOTICE:
> > 6 rows were skipped due to data type incompatibility". Isn't this
> > enough? If someone is using ON_ERROR 'ignore', it's quite natural that
> > such rows get skipped softly and the summary message can help them,
> > no?
>
> I think that in the main log message we should mention what happened
> (or is happening) or what we did (or are doing). If the message "data
> type incompatibility ..." was in the DETAIL message with the main
> message saying something like "skipping row at line %llu for column
> %s: ...", it would make sense to me. But the current message seems not
> to be clear to me and consistent with other NOTICE messages. Also, the
> last summary line would not be written if the user cancelled, and
> someone other than person who used ON_ERROR 'ignore' might check the
> server logs later.

Agree. I changed the NOTICE message to what you've suggested. Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Tue, Mar 26, 2024 at 3:04 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 9:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > > errmsg("data type incompatibility at line %llu for column %s: \"%s\"",
> >
> > > > I guess it would be better to make the log message clearer to convey
> > > > what we did for the malformed row. For example, how about something
> > > > like "skipping row due to data type incompatibility at line %llu for
> > > > column %s: \"s\""?
> > >
> > > The summary message which gets printed at the end says that "NOTICE:
> > > 6 rows were skipped due to data type incompatibility". Isn't this
> > > enough? If someone is using ON_ERROR 'ignore', it's quite natural that
> > > such rows get skipped softly and the summary message can help them,
> > > no?
> >
> > I think that in the main log message we should mention what happened
> > (or is happening) or what we did (or are doing). If the message "data
> > type incompatibility ..." was in the DETAIL message with the main
> > message saying something like "skipping row at line %llu for column
> > %s: ...", it would make sense to me. But the current message seems not
> > to be clear to me and consistent with other NOTICE messages. Also, the
> > last summary line would not be written if the user cancelled, and
> > someone other than person who used ON_ERROR 'ignore' might check the
> > server logs later.
>
> Agree. I changed the NOTICE message to what you've suggested. Thanks.
>

Thank you for updating the patch! Looks good to me.

Please find the attached patch. I've made some changes for the
documentation and the commit message. I'll push it, barring any
objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Tue, Mar 26, 2024 at 1:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Thank you for updating the patch! Looks good to me.
>
> Please find the attached patch. I've made some changes for the
> documentation and the commit message. I'll push it, barring any
> objections.

Thanks. v12 patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Tue, Mar 26, 2024 at 6:36 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 1:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Thank you for updating the patch! Looks good to me.
> >
> > Please find the attached patch. I've made some changes for the
> > documentation and the commit message. I'll push it, barring any
> > objections.
>
> Thanks. v12 patch LGTM.
>

While testing the new option, I realized that the tab-completion
complements DEFAULT value, but it doesn't work without single-quotes:

postgres(1:2179134)=# copy test from '/tmp/dump.data' with
(log_verbosity default );
ERROR:  syntax error at or near "default"
LINE 1: ...py test from '/tmp/dump.data' with (log_verbosity default );
                                                             ^
postgres(1:2179134)=# copy test from '/tmp/dump.data' with
(log_verbosity 'default' );
COPY 0

Whereas VERBOSE works even without single-quotes:

postgres(1:2179134)=# copy test from '/tmp/dump.data' with
(log_verbosity verbose );
COPY 0

postgres(1:2179134)=# copy test from '/tmp/dump.data' with
(log_verbosity 'verbose' );
COPY 0

Which could confuse users. This is because DEFAULT is a reserved
keyword and the COPY option doesn't accept them as an option value.

I think that there are two options to handle it:

1. change COPY grammar to accept DEFAULT as an option value.
2. change tab-completion to complement 'DEFAULT' instead of DEFAULT,
and update the doc too.

As for the documentation, we can add single-quotes as follows:

     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    LOG_VERBOSITY [ '<replaceable class="parameter">mode</replaceable>' ]

I thought the option (2) is better but there seems no precedent of
complementing a single-quoted string other than file names. So the
option (1) could be clearer.

What do you think?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Wed, Mar 27, 2024 at 7:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I think that there are two options to handle it:
>
> 1. change COPY grammar to accept DEFAULT as an option value.
> 2. change tab-completion to complement 'DEFAULT' instead of DEFAULT,
> and update the doc too.
>
> As for the documentation, we can add single-quotes as follows:
>
>      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
> +    LOG_VERBOSITY [ '<replaceable class="parameter">mode</replaceable>' ]
>
> I thought the option (2) is better but there seems no precedent of
> complementing a single-quoted string other than file names. So the
> option (1) could be clearer.
>
> What do you think?

There is another option to change log_verbosity to {none, verbose} or
{none, skip_row_info} (discuseed here
https://www.postgresql.org/message-id/Zelrqq-pkfkvsjPn%40paquier.xyz
that we might extend this option to other use-cases in future). I tend
to agree with you to support log_verbose to be set to default without
quotes just to be consistent with other commands that allow that [1].
And, thanks for quickly identifying where to change in the gram.y.
With that change, now I have changed all the new tests added to use
log_verbosity default without quotes.

FWIW, a recent commit [2] did the same. Therefore, I don't see a
problem supporting it that way for COPY log_verbosity.

Please find the attached v13 patch with the change.

[1]
column_compression:
            COMPRESSION ColId                        { $$ = $2; }
            | COMPRESSION DEFAULT                    { $$ =
pstrdup("default"); }
        ;

column_storage:
            STORAGE ColId                            { $$ = $2; }
            | STORAGE DEFAULT                        { $$ =
pstrdup("default"); }
        ;

[2]
commit b9424d014e195386a83b0f1fe9f5a8e5727e46ea
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Nov 10 18:20:49 2022 -0500

    Support writing "CREATE/ALTER TABLE ... SET STORAGE DEFAULT".

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Thu, Mar 28, 2024 at 2:49 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 27, 2024 at 7:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I think that there are two options to handle it:
> >
> > 1. change COPY grammar to accept DEFAULT as an option value.
> > 2. change tab-completion to complement 'DEFAULT' instead of DEFAULT,
> > and update the doc too.
> >
> > As for the documentation, we can add single-quotes as follows:
> >
> >      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
> > +    LOG_VERBOSITY [ '<replaceable class="parameter">mode</replaceable>' ]
> >
> > I thought the option (2) is better but there seems no precedent of
> > complementing a single-quoted string other than file names. So the
> > option (1) could be clearer.
> >
> > What do you think?
>
> There is another option to change log_verbosity to {none, verbose} or
> {none, skip_row_info} (discuseed here
> https://www.postgresql.org/message-id/Zelrqq-pkfkvsjPn%40paquier.xyz
> that we might extend this option to other use-cases in future). I tend
> to agree with you to support log_verbose to be set to default without
> quotes just to be consistent with other commands that allow that [1].
> And, thanks for quickly identifying where to change in the gram.y.
> With that change, now I have changed all the new tests added to use
> log_verbosity default without quotes.
>
> FWIW, a recent commit [2] did the same. Therefore, I don't see a
> problem supporting it that way for COPY log_verbosity.
>
> Please find the attached v13 patch with the change.

Thank you for updating the patch quickly, and sharing the reference.

I think {default, verbose} is a good start and more consistent with
other similar features. We can add other modes later.

Regarding the syntax change, since copy_generic_opt_arg rule is only
for COPY option syntax, the change doesn't affect other syntaxes. I've
confirmed the tab-completion works fine.

I'll push the patch, barring any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On 2024-03-26 17:16, Masahiko Sawada wrote:
> On Tue, Mar 26, 2024 at 3:04 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> 
>> On Tue, Mar 26, 2024 at 9:56 AM Masahiko Sawada 
>> <sawada.mshk@gmail.com> wrote:
>> >
>> > > > errmsg("data type incompatibility at line %llu for column %s: \"%s\"",
>> >
>> > > > I guess it would be better to make the log message clearer to convey
>> > > > what we did for the malformed row. For example, how about something
>> > > > like "skipping row due to data type incompatibility at line %llu for
>> > > > column %s: \"s\""?
>> > >
>> > > The summary message which gets printed at the end says that "NOTICE:
>> > > 6 rows were skipped due to data type incompatibility". Isn't this
>> > > enough? If someone is using ON_ERROR 'ignore', it's quite natural that
>> > > such rows get skipped softly and the summary message can help them,
>> > > no?
>> >
>> > I think that in the main log message we should mention what happened
>> > (or is happening) or what we did (or are doing). If the message "data
>> > type incompatibility ..." was in the DETAIL message with the main
>> > message saying something like "skipping row at line %llu for column
>> > %s: ...", it would make sense to me. But the current message seems not
>> > to be clear to me and consistent with other NOTICE messages. Also, the
>> > last summary line would not be written if the user cancelled, and
>> > someone other than person who used ON_ERROR 'ignore' might check the
>> > server logs later.
>> 
>> Agree. I changed the NOTICE message to what you've suggested. Thanks.
>> 
> 
> Thank you for updating the patch! Looks good to me.
> 
> Please find the attached patch. I've made some changes for the
> documentation and the commit message. I'll push it, barring any
> objections.
> 
> Regards,
> 
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

Thanks!

Attached patch fixes the doc, but I'm wondering perhaps it might be 
better to modify the codes to prohibit abbreviation of the value.

When seeing the query which abbreviates ON_ERROR value, I feel it's not 
obvious what happens compared to other options which tolerates 
abbreviation of the value such as FREEZE or HEADER.

   COPY t1 FROM stdin WITH (ON_ERROR);

What do you think?

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Thu, Mar 28, 2024 at 1:43 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Attached patch fixes the doc,

May I know which patch you are referring to? And, what do you mean by
"fixes the doc"?

> but I'm wondering perhaps it might be
> better to modify the codes to prohibit abbreviation of the value.

Please help me understand the meaning here.

> When seeing the query which abbreviates ON_ERROR value, I feel it's not
> obvious what happens compared to other options which tolerates
> abbreviation of the value such as FREEZE or HEADER.
>
>    COPY t1 FROM stdin WITH (ON_ERROR);
>
> What do you think?

So, do you mean to prohibit ON_ERROR being specified without any value
like in COPY t1 FROM stdin WITH (ON_ERROR);? If yes, I think all the
other options do allow that [1].

Even if we were to do something like this, shall we discuss this separately?

Having said that, what do you think of the v13 patch posted upthread?

[1]
postgres=# COPY t1 FROM stdin WITH (
DEFAULT         ESCAPE          FORCE_QUOTE     HEADER          QUOTE
DELIMITER       FORCE_NOT_NULL  FORMAT          NULL
ENCODING        FORCE_NULL      FREEZE          ON_ERROR

postgres=# COPY t1 FROM stdin WITH ( QUOTE );
ERROR:  relation "t1" does not exist
postgres=# COPY t1 FROM stdin WITH ( DEFAULT );
ERROR:  relation "t1" does not exist
postgres=# COPY t1 FROM stdin WITH ( ENCODING );
ERROR:  relation "t1" does not exist

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On 2024-03-28 17:27, Bharath Rupireddy wrote:
> On Thu, Mar 28, 2024 at 1:43 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> Attached patch fixes the doc,
> 
> May I know which patch you are referring to? And, what do you mean by
> "fixes the doc"?

Ugh, I replied to the wrong thread.
Sorry for making you confused and please ignore it.

>> but I'm wondering perhaps it might be
>> better to modify the codes to prohibit abbreviation of the value.
> 
> Please help me understand the meaning here.
> 
>> When seeing the query which abbreviates ON_ERROR value, I feel it's 
>> not
>> obvious what happens compared to other options which tolerates
>> abbreviation of the value such as FREEZE or HEADER.
>> 
>>    COPY t1 FROM stdin WITH (ON_ERROR);
>> 
>> What do you think?
> 
> So, do you mean to prohibit ON_ERROR being specified without any value
> like in COPY t1 FROM stdin WITH (ON_ERROR);? If yes, I think all the
> other options do allow that [1].
> 
> Even if we were to do something like this, shall we discuss this 
> separately?
> 
> Having said that, what do you think of the v13 patch posted upthread?
> 
> [1]
> postgres=# COPY t1 FROM stdin WITH (
> DEFAULT         ESCAPE          FORCE_QUOTE     HEADER          QUOTE
> DELIMITER       FORCE_NOT_NULL  FORMAT          NULL
> ENCODING        FORCE_NULL      FREEZE          ON_ERROR
> 
> postgres=# COPY t1 FROM stdin WITH ( QUOTE );
> ERROR:  relation "t1" does not exist
> postgres=# COPY t1 FROM stdin WITH ( DEFAULT );
> ERROR:  relation "t1" does not exist
> postgres=# COPY t1 FROM stdin WITH ( ENCODING );
> ERROR:  relation "t1" does not exist


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Thu, Mar 28, 2024 at 5:28 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Mar 28, 2024 at 1:43 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >
> > Attached patch fixes the doc,
>
> May I know which patch you are referring to? And, what do you mean by
> "fixes the doc"?
>
> > but I'm wondering perhaps it might be
> > better to modify the codes to prohibit abbreviation of the value.
>
> Please help me understand the meaning here.
>
> > When seeing the query which abbreviates ON_ERROR value, I feel it's not
> > obvious what happens compared to other options which tolerates
> > abbreviation of the value such as FREEZE or HEADER.
> >
> >    COPY t1 FROM stdin WITH (ON_ERROR);
> >
> > What do you think?
>
> So, do you mean to prohibit ON_ERROR being specified without any value
> like in COPY t1 FROM stdin WITH (ON_ERROR);? If yes, I think all the
> other options do allow that [1].
>
> Even if we were to do something like this, shall we discuss this separately?
>
> Having said that, what do you think of the v13 patch posted upthread?
>

This topic accidentally jumped in this thread, but it made me think
that the same might be true for the LOG_VERBOSITY option. That is,
since the LOG_VERBOSITY option is an enum parameter, it might make
more sense to require the value, instead of making the value optional.
For example, the following command could not be obvious for users:

COPY test FROM stdin (ON_ERROR ignore, LOG_VERBOSITY);

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
torikoshia
Дата:
On 2024-03-28 21:36, torikoshia wrote:
> On 2024-03-28 17:27, Bharath Rupireddy wrote:
>> On Thu, Mar 28, 2024 at 1:43 PM torikoshia 
>> <torikoshia@oss.nttdata.com> wrote:
>>> 
>>> Attached patch fixes the doc,
>> 
>> May I know which patch you are referring to? And, what do you mean by
>> "fixes the doc"?
> 
> Ugh, I replied to the wrong thread.
> Sorry for making you confused and please ignore it.
> 
>>> but I'm wondering perhaps it might be
>>> better to modify the codes to prohibit abbreviation of the value.
>> 
>> Please help me understand the meaning here.
>> 
>>> When seeing the query which abbreviates ON_ERROR value, I feel it's 
>>> not
>>> obvious what happens compared to other options which tolerates
>>> abbreviation of the value such as FREEZE or HEADER.
>>> 
>>>    COPY t1 FROM stdin WITH (ON_ERROR);
>>> 
>>> What do you think?
>> 
>> So, do you mean to prohibit ON_ERROR being specified without any value
>> like in COPY t1 FROM stdin WITH (ON_ERROR);? If yes, I think all the
>> other options do allow that [1].
>> 
>> Even if we were to do something like this, shall we discuss this 
>> separately?
>> 
>> Having said that, what do you think of the v13 patch posted upthread?

It looks good to me other than what Sawada-san lastly pointed out.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add new error_action COPY ON_ERROR "log"

От
Bharath Rupireddy
Дата:
On Thu, Mar 28, 2024 at 6:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> That is,
> since the LOG_VERBOSITY option is an enum parameter, it might make
> more sense to require the value, instead of making the value optional.
> For example, the following command could not be obvious for users:
>
> COPY test FROM stdin (ON_ERROR ignore, LOG_VERBOSITY);

Agreed. Please see the attached v14 patch. The LOG_VERBOSITY now needs
a value to be specified. Note that I've not added any test for this
case as there seemed to be no such tests so far generating "ERROR:
<<option>> requires a parameter". I don't mind adding one for
LOG_VERBOSITY though.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Sat, Mar 30, 2024 at 11:05 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Mar 28, 2024 at 6:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > That is,
> > since the LOG_VERBOSITY option is an enum parameter, it might make
> > more sense to require the value, instead of making the value optional.
> > For example, the following command could not be obvious for users:
> >
> > COPY test FROM stdin (ON_ERROR ignore, LOG_VERBOSITY);
>
> Agreed. Please see the attached v14 patch.

Thank you for updating the patch!

>  The LOG_VERBOSITY now needs
> a value to be specified. Note that I've not added any test for this
> case as there seemed to be no such tests so far generating "ERROR:
> <<option>> requires a parameter". I don't mind adding one for
> LOG_VERBOSITY though.

+1

One minor point:

     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+    LOG_VERBOSITY [ <replaceable class="parameter">mode</replaceable> ]
 </synopsis>

'[' and ']' are not necessary because the value is no longer optional.

I've attached the updated patch. I'll push it, barring any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Вложения

Re: Add new error_action COPY ON_ERROR "log"

От
Masahiko Sawada
Дата:
On Mon, Apr 1, 2024 at 10:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Mar 30, 2024 at 11:05 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Mar 28, 2024 at 6:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > That is,
> > > since the LOG_VERBOSITY option is an enum parameter, it might make
> > > more sense to require the value, instead of making the value optional.
> > > For example, the following command could not be obvious for users:
> > >
> > > COPY test FROM stdin (ON_ERROR ignore, LOG_VERBOSITY);
> >
> > Agreed. Please see the attached v14 patch.
>
> Thank you for updating the patch!
>
> >  The LOG_VERBOSITY now needs
> > a value to be specified. Note that I've not added any test for this
> > case as there seemed to be no such tests so far generating "ERROR:
> > <<option>> requires a parameter". I don't mind adding one for
> > LOG_VERBOSITY though.
>
> +1
>
> One minor point:
>
>      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
> +    LOG_VERBOSITY [ <replaceable class="parameter">mode</replaceable> ]
>  </synopsis>
>
> '[' and ']' are not necessary because the value is no longer optional.
>
> I've attached the updated patch. I'll push it, barring any objections.
>

Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add new error_action COPY ON_ERROR "log"

От
Michael Paquier
Дата:
On Tue, Apr 02, 2024 at 09:53:57AM +0900, Masahiko Sawada wrote:
> Pushed.

Thanks for following up with this thread.
--
Michael

Вложения