Обсуждение: Small fix on COPY ON_ERROR document

Поиск
Список
Период
Сортировка

Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
Hi,

I found that the documentation of COPY ON_ERROR said
COPY stops operation at the first error when 
"ON_ERROR is not specified.", but it also stop when
ON_ERROR is specified to the default value, "stop".

I attached a very small patch to fix this just for
making the description more accurate.

Regards,
Yugo Nagata 

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Small fix on COPY ON_ERROR document

От
Masahiko Sawada
Дата:
On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> Hi,
>
> I found that the documentation of COPY ON_ERROR said
> COPY stops operation at the first error when
> "ON_ERROR is not specified.", but it also stop when
> ON_ERROR is specified to the default value, "stop".
>
> I attached a very small patch to fix this just for
> making the description more accurate.

Thank you for the patch!

+1 to fix it.

-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
+    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
+    This should not lead to problems in the event of a <command>COPY

How about the followings for consistency with the description of the
ON_ERROR option?

COPY stops operation at the first error if the stop value is specified
to the ON_ERROR option. This should not lead to ...

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Fri, 26 Jan 2024 13:59:09 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > Hi,
> >
> > I found that the documentation of COPY ON_ERROR said
> > COPY stops operation at the first error when
> > "ON_ERROR is not specified.", but it also stop when
> > ON_ERROR is specified to the default value, "stop".
> >
> > I attached a very small patch to fix this just for
> > making the description more accurate.
> 
> Thank you for the patch!
> 
> +1 to fix it.
> 
> -    <literal>ON_ERROR</literal> is not specified. This
> -    should not lead to problems in the event of a <command>COPY
> +    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
> +    This should not lead to problems in the event of a <command>COPY
> 
> How about the followings for consistency with the description of the
> ON_ERROR option?
> 
> COPY stops operation at the first error if the stop value is specified
> to the ON_ERROR option. This should not lead to ...

Thank you for you review. However, after posting the previous patch, 
I noticed that I overlooked ON_ERROR works only for errors due to data
type incompatibility (that is mentioned as "malformed data" in the 
ON_ERROR description, though). 

So, how about rewriting this like:

 COPY stops operation at the first error unless the error is due to data
 type incompatibility and a value other than stop is specified to the
 ON_ERROR option.

I attached the updated patch.

Regards,
Yugo Nagata

> 
> Regards,
> 
> -- 
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Small fix on COPY ON_ERROR document

От
Masahiko Sawada
Дата:
On Fri, Jan 26, 2024 at 2:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> On Fri, 26 Jan 2024 13:59:09 +0900
> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > >
> > > Hi,
> > >
> > > I found that the documentation of COPY ON_ERROR said
> > > COPY stops operation at the first error when
> > > "ON_ERROR is not specified.", but it also stop when
> > > ON_ERROR is specified to the default value, "stop".
> > >
> > > I attached a very small patch to fix this just for
> > > making the description more accurate.
> >
> > Thank you for the patch!
> >
> > +1 to fix it.
> >
> > -    <literal>ON_ERROR</literal> is not specified. This
> > -    should not lead to problems in the event of a <command>COPY
> > +    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
> > +    This should not lead to problems in the event of a <command>COPY
> >
> > How about the followings for consistency with the description of the
> > ON_ERROR option?
> >
> > COPY stops operation at the first error if the stop value is specified
> > to the ON_ERROR option. This should not lead to ...
>
> Thank you for you review. However, after posting the previous patch,
> I noticed that I overlooked ON_ERROR works only for errors due to data
> type incompatibility (that is mentioned as "malformed data" in the
> ON_ERROR description, though).

Right.

>
> So, how about rewriting this like:
>
>  COPY stops operation at the first error unless the error is due to data
>  type incompatibility and a value other than stop is specified to the
>  ON_ERROR option.

Hmm, this sentence seems not very readable to me, especially the "COPY
stops ... unless ... a value other than stop is specified ..." part. I
think we can simplify it:

COPY stops operation at the first data type incompatibility error if
the stop value is specified to the ON_ERROR option.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Small fix on COPY ON_ERROR document

От
"David G. Johnston"
Дата:
On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Fri, 26 Jan 2024 13:59:09 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > Hi,
> >
> > I found that the documentation of COPY ON_ERROR said
> > COPY stops operation at the first error when
> > "ON_ERROR is not specified.", but it also stop when
> > ON_ERROR is specified to the default value, "stop".
> >


I'm finding the current wording surrounding ON_ERROR to not fit in with the rest of the page.  I've tried to improve things by being a bit more invasive.

This first hunk updates the description of the COPY command to include describing the purpose of the ON_ERROR clause.

I too am concerned about the word "parsed" here, and "malformed" in the more detailed descriptions; this would need to be modified to better reflect the circumstances under which ignore happens.

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..5fe4c9f747 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     in the <structname>pg_stat_progress_copy</structname> view. See
     <xref linkend="copy-progress-reporting"/> for details.
   </para>
+
+  <para>
+    By default, <command>COPY</command> will abort if it encounters errors.
+    For non-binary format <command>COPY FROM</command> the user can specify
+    to instead ignore input rows that cannot be parsed by specifying
+    the <literal>ignore</literal> option to the <literal>ON_ERROR</literal>
+    clause.
+  </para>
  </refsect1>

  <refsect1>

The use of the word "Currently" stood out to me when reading the next hunk.  We always document current reality and don't call out that fact.  We do not imply some future feature may change how this all works.  On a related note, right now we have stop and ignore, which are basically enums just like we have for format (csv, text, binary).  Unlike format, though, this option requires single quotes.  Why is that?  I did keep with not specifying the enum in the main syntax block since format doesn't as well; though writing { stop | ignore } seemed quite appealing.

The rest of the page indicates what the default is in a sentence by itself, not as a parenthetical next to the option.

The command limitations seemed worthy of a separate paragraph, though it repeats the description which isn't great.  I'm going to sleep on this one.

@@ -379,17 +387,16 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <listitem>
      <para>
       Specifies which <replaceable class="parameter">
-      error_action</replaceable> to perform when there is malformed data in the input.
-      Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
-      values are supported.
-      If the <literal>stop</literal> value is specified,
-      <command>COPY</command> stops operation at the first error.
-      If the <literal>ignore</literal> value is specified,
-      <command>COPY</command> skips malformed data and continues copying data.
-      The option is allowed only in <command>COPY FROM</command>.
-      Only <literal>stop</literal> value is allowed when
-      using <literal>binary</literal> format.
+      error_action</replaceable> to perform when encountering a malformed input row:
+      <literal>stop</literal> or <literal>ignore</literal>.
+      A value of <literal>stop</literal> means fail the command, while
+      <literal>ignore</literal> means discard the input row and continue with the next one.
+      The default is <literal>stop</literal>
      </para>
+     <para>
+      The ignore option is applicable only for <command>COPY FROM</command>
+      when the <literal>FORMAT</literal> is <literal>text</literal>
+      or <literal>csv</literal>.
+    </para>
     </listitem>
    </varlistentry>

The following was, IMO, too much text about something not to worry about, COPY TO and ignore mode.  The point is dead tuples on error and running vacuum.  It doesn't really even matter what caused the error, though if the error was even before rows started to be imported then obviously there would be no dead tuples.

Oh, and the word "first" really isn't adding anything here that we cannot reasonably leave to common sense, IMO.  We either ignore all errors or stop on the first one.  There isn't a stop mode that is capable of bypassing errors and the ignore mode doesn't have a "first n" option so one assumes all errors are ignored.

@@ -576,15 +583,12 @@ COPY <replaceable class="parameter">count</replaceable>
    </para>

    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
-    amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    A failed <command>COPY FROM</command> command will have performed
+    physical insertion of all rows prior to the malformed one.
+    While these rows will not be visible or accessible, they still occupy
+    disk space. This might amount to a considerable amount of wasted disk
+    space if the failure happened well into a large copy operation.
+    <command>VACUUM</command> should be used to recover the wasted space.
    </para>

    <para>

David J.

Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Fri, 26 Jan 2024 15:26:55 +0900
Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> On Fri, Jan 26, 2024 at 2:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > On Fri, 26 Jan 2024 13:59:09 +0900
> > Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I found that the documentation of COPY ON_ERROR said
> > > > COPY stops operation at the first error when
> > > > "ON_ERROR is not specified.", but it also stop when
> > > > ON_ERROR is specified to the default value, "stop".
> > > >
> > > > I attached a very small patch to fix this just for
> > > > making the description more accurate.
> > >
> > > Thank you for the patch!
> > >
> > > +1 to fix it.
> > >
> > > -    <literal>ON_ERROR</literal> is not specified. This
> > > -    should not lead to problems in the event of a <command>COPY
> > > +    <literal>ON_ERROR</literal> is not specified or <literal>stop</literal>.
> > > +    This should not lead to problems in the event of a <command>COPY
> > >
> > > How about the followings for consistency with the description of the
> > > ON_ERROR option?
> > >
> > > COPY stops operation at the first error if the stop value is specified
> > > to the ON_ERROR option. This should not lead to ...
> >
> > Thank you for you review. However, after posting the previous patch,
> > I noticed that I overlooked ON_ERROR works only for errors due to data
> > type incompatibility (that is mentioned as "malformed data" in the
> > ON_ERROR description, though).
> 
> Right.
> 
> >
> > So, how about rewriting this like:
> >
> >  COPY stops operation at the first error unless the error is due to data
> >  type incompatibility and a value other than stop is specified to the
> >  ON_ERROR option.
> 
> Hmm, this sentence seems not very readable to me, especially the "COPY
> stops ... unless ... a value other than stop is specified ..." part. I
> think we can simplify it:

I can agree with your opinion on the readability, but...

> COPY stops operation at the first data type incompatibility error if
> the stop value is specified to the ON_ERROR option.

This statement doesn't explain COPY also stops when an error other than
data type incompatibility (e.g. constrain violations) occurs.

Maybe, we can separate the sentese to two, for example:

  COPY stops operation at the first error. (The exception is if the error
  is due to data type incompatibility and a value other than stop is specified
  to the ON_ERROR option.)

Regards,
Yugo Nagata

> Regards,
> 
> -- 
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com


-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: Small fix on COPY ON_ERROR document

От
"David G. Johnston"
Дата:
On Thursday, January 25, 2024, Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Maybe, we can separate the sentese to two, for example:

  COPY stops operation at the first error. (The exception is if the error
  is due to data type incompatibility and a value other than stop is specified
  to the ON_ERROR option.)

I’d lean more toward saying:

Copy from can be instructed to ignore errors that arise from casting input data to the data types of the target columns by setting the on_error option to ignore.  Skipping the entire input row in the process.

The last part is because another way to ignore would be to set null values for those columns.

That a command stops on an error is the assumed behavior throughout the system, writing “copy stops operation at the first error.” just seems really unnecessary.

I will need to make this tweak and probably a couple others to my own suggestions in 12 hours or so.

David J.

Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Thu, 25 Jan 2024 23:33:22 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > On Fri, 26 Jan 2024 13:59:09 +0900
> > Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata@sraoss.co.jp>
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > I found that the documentation of COPY ON_ERROR said
> > > > COPY stops operation at the first error when
> > > > "ON_ERROR is not specified.", but it also stop when
> > > > ON_ERROR is specified to the default value, "stop".
> > > >
> >
> >
> I'm finding the current wording surrounding ON_ERROR to not fit in with the
> rest of the page.  I've tried to improve things by being a bit more
> invasive.

Introducing the ON_ERROR option changed the behavior of COPY about error
handling to some extent, so it might be good to rewrite a bit more parts
of the documentation as you suggest.

> This first hunk updates the description of the COPY command to include
> describing the purpose of the ON_ERROR clause.
> 
> I too am concerned about the word "parsed" here, and "malformed" in the
> more detailed descriptions; this would need to be modified to better
> reflect the circumstances under which ignore happens.

I think "errors due to data type incompatibility" would be nice to describe
the errors to be ignored by ON_ERROR, as used in the NOTICE message output.
 
> diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
> index 21a5c4a052..5fe4c9f747 100644
> --- a/doc/src/sgml/ref/copy.sgml
> +++ b/doc/src/sgml/ref/copy.sgml
> @@ -90,6 +90,14 @@ COPY { <replaceable
> class="parameter">table_name</replaceable> [ ( <replaceable
>      in the <structname>pg_stat_progress_copy</structname> view. See
>      <xref linkend="copy-progress-reporting"/> for details.
>    </para>
> +
> +  <para>
> +    By default, <command>COPY</command> will abort if it encounters errors.
> +    For non-binary format <command>COPY FROM</command> the user can specify
> +    to instead ignore input rows that cannot be parsed by specifying
> +    the <literal>ignore</literal> option to the <literal>ON_ERROR</literal>
> +    clause.
> +  </para>
>   </refsect1>
> 

> 
> The following was, IMO, too much text about something not to worry about,
> COPY TO and ignore mode.  The point is dead tuples on error and running
> vacuum.  It doesn't really even matter what caused the error, though if the
> error was even before rows started to be imported then obviously there
> would be no dead tuples.
> 
> Oh, and the word "first" really isn't adding anything here that we cannot
> reasonably leave to common sense, IMO.  We either ignore all errors or stop
> on the first one.  There isn't a stop mode that is capable of bypassing
> errors and the ignore mode doesn't have a "first n" option so one assumes
> all errors are ignored.
> 
> @@ -576,15 +583,12 @@ COPY <replaceable
> class="parameter">count</replaceable>
>     </para>
> 
>     <para>
> -    <command>COPY</command> stops operation at the first error when
> -    <literal>ON_ERROR</literal> is not specified. This
> -    should not lead to problems in the event of a <command>COPY
> -    TO</command>, but the target table will already have received
> -    earlier rows in a <command>COPY FROM</command>. These rows will not
> -    be visible or accessible, but they still occupy disk space. This might
> -    amount to a considerable amount of wasted disk space if the failure
> -    happened well into a large copy operation. You might wish to invoke
> -    <command>VACUUM</command> to recover the wasted space.
> +    A failed <command>COPY FROM</command> command will have performed
> +    physical insertion of all rows prior to the malformed one.

As you said that it does not matter what caused the error, I don't think
we have to use "malformed" here. Instead, we could say "we will have
performed physical insertion of rows before the failure."


> +    While these rows will not be visible or accessible, they still occupy
> +    disk space. This might amount to a considerable amount of wasted disk
> +    space if the failure happened well into a large copy operation.
> +    <command>VACUUM</command> should be used to recover the wasted space.
>     </para>
> 
>     <para>
> 
> David J.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Fri, 26 Jan 2024 00:00:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> On Thursday, January 25, 2024, Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> >
> > Maybe, we can separate the sentese to two, for example:
> >
> >   COPY stops operation at the first error. (The exception is if the error
> >   is due to data type incompatibility and a value other than stop is
> > specified
> >   to the ON_ERROR option.)
> >
> 
> I’d lean more toward saying:
> 
> Copy from can be instructed to ignore errors that arise from casting input
> data to the data types of the target columns by setting the on_error option
> to ignore.  Skipping the entire input row in the process.
> 
> The last part is because another way to ignore would be to set null values
> for those columns.

That makes sense. Is is a bit ambiguous to just say "skips malformed data";
it might be unclear for users if the data in the problematic column is skipped
(NULL is set) or the entire row is skipped. Also, setting null values for those
columns could be a future feature of ON_ERROR option.
> 
> That a command stops on an error is the assumed behavior throughout the
> system, writing “copy stops operation at the first error.” just seems
> really unnecessary.

I think we need to describe this default behavior explicitly somewhere,
as you suggested in the other post [1].

[1] https://www.postgresql.org/message-id/CAKFQuwZJZ6uaS2B7qpL2FJzWBsnDdzgtbsW3pH9zuT6vC3fH%2Bg%40mail.gmail.com

Regards,
Yugo Nagata

> I will need to make this tweak and probably a couple others to my own
> suggestions in 12 hours or so.
> 
> David J.


-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: Small fix on COPY ON_ERROR document

От
"David G. Johnston"
Дата:
On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Fri, 26 Jan 2024 00:00:57 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> I will need to make this tweak and probably a couple others to my own
> suggestions in 12 hours or so.
>


And here is my v2.

Notably I choose to introduce the verbiage "soft error" and then define in the ON_ERROR clause the specific soft error that matters here - "invalid input syntax".

I also note the log message behavior when ignore mode is chosen.  I haven't confirmed that it is accurate but that is readily tweaked if approved of.

David J.

Вложения

Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Fri, 26 Jan 2024 08:04:45 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > On Fri, 26 Jan 2024 00:00:57 -0700
> > "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> >
> > > I will need to make this tweak and probably a couple others to my own
> > > suggestions in 12 hours or so.
> > >
> >
> >
> And here is my v2.
> 
> Notably I choose to introduce the verbiage "soft error" and then define in
> the ON_ERROR clause the specific soft error that matters here - "invalid
> input syntax".

I am not sure we should use "soft error" without any explanation
because it seems to me that the meaning of words is unclear for users. 

This is explained in src/backend/utils/fmgr/README;

 * Some callers of datatype input functions (and in future perhaps
 other classes of functions) pass an instance of ErrorSaveContext.
 This indicates that the caller wishes to handle "soft" errors without
 a transaction-terminating exception being thrown: instead, the callee
 should store information about the error cause in the ErrorSaveContext
 struct and return a dummy result value. 

However, this is not mentioned in the documentation anywhere. So, it
would be hard for users to understand what "soft error" is because
they would not read README.

Also, I think "invalid input syntax" is a bit ambiguous. For example, 
COPY FROM raises an error when the number of input column does not match
to the table schema, but this error is not ignored by ON_ERROR while
this seems to fall into the category of "invalid input syntax".

I found the description as followings in the documentation in COPY:

 The column values themselves are strings (...) acceptable to the input
 function, of each attribute's data type.

So, keeping consistency with the existing description, we can say:

"Specifies which how to behave when encountering an error due to
 column values unacceptable to the input function of each attribute's
 data type."

Currently, ON_ERROR doesn't support other soft errors, so it can explain
it more simply without introducing the new concept, "soft error" to users.

> I also note the log message behavior when ignore mode is chosen.  I haven't
> confirmed that it is accurate but that is readily tweaked if approved of.
> 

+      An <literal>INFO</literal> level context message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.


The log level is NOTICE not INFO.

    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
+    The <command>COPY FROM</command> command physically inserts input rows
+    into the table as it progresses.  If the command fails these rows are
+    left in a deleted state, still occupying disk space. This might
     amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    happened well into a large copy operation. <command>VACUUM</command>
+    should be used to recover the wasted space.
    </para>

I think "left in a deleted state" is also unclear for users because this
explains the internal state but not how looks from user's view.How about
leaving the explanation "These rows will not be visible or accessible" in
the existing statement?

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: Small fix on COPY ON_ERROR document

От
torikoshia
Дата:
On 2024-01-27 00:04, David G. Johnston wrote:
> On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA <nagata@sraoss.co.jp>
> wrote:
> 
>> On Fri, 26 Jan 2024 00:00:57 -0700
>> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>> 
>>> I will need to make this tweak and probably a couple others to my
>> own
>>> suggestions in 12 hours or so.
>>> 
> 
> And here is my v2.
> 
> Notably I choose to introduce the verbiage "soft error" and then
> define in the ON_ERROR clause the specific soft error that matters
> here - "invalid input syntax".
> 
> I also note the log message behavior when ignore mode is chosen.  I
> haven't confirmed that it is accurate but that is readily tweaked if
> approved of.
> 
> David J.

Thanks for refining the doc.


> +      Specifies which how to behave when encountering a soft error.

To be consistent with other parts in the manual[1][2], should be “soft” 
error?

> +      An <replaceable class="parameter">error_action</replaceable> 
> value of
> +      <literal>stop</literal> means fail the command, while
> +      <literal>ignore</literal> means discard the input row and 
> continue with the next one.
> +      The default is <literal>stop</literal>

Is "." required at the end of the line?

+     <para>
+      The only relevant soft error is "invalid input syntax", which 
manifests when attempting
+      to create a column value from the text input.
+     </para>

I think it is not restricted to "invalid input syntax".
We can handle out of range error:

   =# create table t1(i int);
   CREATE TABLE

   =# copy t1  from stdin with(ON_ERROR ignore);
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself, or an EOF
   signal.
   >> 111111111111111111111
   >> \.
   NOTICE:  1 row was skipped due to data type incompatibility
   COPY 0


Also, I'm a little concerned that users might wonder what soft error is.

Certainly there are already references to "soft" errors in the manual, 
but they seem to be for developer, such as creating new TYPE for 
PostgreSQL.

It might be better to describe what soft error is like below:

> -- src/backend/utils/fmgr/README
> An error reported "softly" must be safe, in the sense that there is
> no question about our ability to continue normal processing of the
> transaction.

[1] https://www.postgresql.org/docs/devel/sql-createtype.html
[2] https://www.postgresql.org/docs/devel/functions-info.html

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Small fix on COPY ON_ERROR document

От
"David G. Johnston"
Дата:


On Sunday, January 28, 2024, Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Fri, 26 Jan 2024 08:04:45 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> > On Fri, 26 Jan 2024 00:00:57 -0700
> > "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> >
> > > I will need to make this tweak and probably a couple others to my own
> > > suggestions in 12 hours or so.
> > >
> >
> >
> And here is my v2.
>
> Notably I choose to introduce the verbiage "soft error" and then define in
> the ON_ERROR clause the specific soft error that matters here - "invalid
> input syntax".

I am not sure we should use "soft error" without any explanation
because it seems to me that the meaning of words is unclear for users. 

Agreed. It needs to be added to the glossary.

 

Also, I think "invalid input syntax" is a bit ambiguous. For example,
COPY FROM raises an error when the number of input column does not match
to the table schema, but this error is not ignored by ON_ERROR while
this seems to fall into the category of "invalid input syntax".


It is literally the error text that appears if one were not to ignore it.  It isn’t a category of errors.  But I’m open to ideas here.  But being explicit with what on actually sees in the system seemed preferable to inventing new classification terms not otherwise used.
 

So, keeping consistency with the existing description, we can say:

"Specifies which how to behave when encountering an error due to
 column values unacceptable to the input function of each attribute's
 data type."

Yeah, I was considering something along those lines as an option as well.  But I’d rather add that wording to the glossary.



Currently, ON_ERROR doesn't support other soft errors, so it can explain
it more simply without introducing the new concept, "soft error" to users.


Good point.  Seems we should define what user-facing errors are ignored anywhere in the system and if we aren’t consistently leveraging these in all areas/commands make the necessary qualifications in those specific places.

 
> I also note the log message behavior when ignore mode is chosen.  I haven't
> confirmed that it is accurate but that is readily tweaked if approved of.
>

+      An <literal>INFO</literal> level context message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.


The log level is NOTICE not INFO.

Makes sense, I hadn’t experimented. 


I think "left in a deleted state" is also unclear for users because this
explains the internal state but not how looks from user's view.How about
leaving the explanation "These rows will not be visible or accessible" in
the existing statement?

Just visible then, I don’t like an “or” there and as tuples at least they are accessible to the system, in vacuum especially.  But I expected the user to understand “as if you deleted it” as their operational concept more readily than visible.  I think this will be read by people who haven’t read MVCC to fully understand what visible means but know enough to run vacuum to clean up updated and deleted data as a rule.

David J.
 

Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Sun, 28 Jan 2024 19:14:58 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:

> > Also, I think "invalid input syntax" is a bit ambiguous. For example,
> > COPY FROM raises an error when the number of input column does not match
> > to the table schema, but this error is not ignored by ON_ERROR while
> > this seems to fall into the category of "invalid input syntax".
> 
> 
> 
> It is literally the error text that appears if one were not to ignore it.
> It isn’t a category of errors.  But I’m open to ideas here.  But being
> explicit with what on actually sees in the system seemed preferable to
> inventing new classification terms not otherwise used.

Thank you for explanation! I understood the words was from the error messages
that users actually see. However, as Torikoshi-san said in [1], errors other
than valid input syntax (e.g. range error) can be also ignored, therefore it
would be better to describe to be ignored errors more specifically.

[1] https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com

> 
> >
> > So, keeping consistency with the existing description, we can say:
> >
> > "Specifies which how to behave when encountering an error due to
> >  column values unacceptable to the input function of each attribute's
> >  data type."
> 
> 
> Yeah, I was considering something along those lines as an option as well.
> But I’d rather add that wording to the glossary.

Although I am still be not convinced if we have to introduce the words
"soft error" to the documentation, I don't care it if there are no other
opposite opinions. 

> 
> > Currently, ON_ERROR doesn't support other soft errors, so it can explain
> > it more simply without introducing the new concept, "soft error" to users.
> >
> >
> Good point.  Seems we should define what user-facing errors are ignored
> anywhere in the system and if we aren’t consistently leveraging these in
> all areas/commands make the necessary qualifications in those specific
> places.
> 

> > I think "left in a deleted state" is also unclear for users because this
> > explains the internal state but not how looks from user's view.How about
> > leaving the explanation "These rows will not be visible or accessible" in
> > the existing statement?
> >
> 
> Just visible then, I don’t like an “or” there and as tuples at least they
> are accessible to the system, in vacuum especially.  But I expected the
> user to understand “as if you deleted it” as their operational concept more
> readily than visible.  I think this will be read by people who haven’t read
> MVCC to fully understand what visible means but know enough to run vacuum
> to clean up updated and deleted data as a rule.

Ok, I agree we can omit "or accessible". How do you like the followings?
Still redundant?

 "If the command fails, these rows are left in a deleted state;
  these rows will not be visible, but they still occupy disk space. "

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Mon, 29 Jan 2024 15:47:25 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Sun, 28 Jan 2024 19:14:58 -0700
> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> 
> > > Also, I think "invalid input syntax" is a bit ambiguous. For example,
> > > COPY FROM raises an error when the number of input column does not match
> > > to the table schema, but this error is not ignored by ON_ERROR while
> > > this seems to fall into the category of "invalid input syntax".
> > 
> > 
> > 
> > It is literally the error text that appears if one were not to ignore it.
> > It isn’t a category of errors.  But I’m open to ideas here.  But being
> > explicit with what on actually sees in the system seemed preferable to
> > inventing new classification terms not otherwise used.
> 
> Thank you for explanation! I understood the words was from the error messages
> that users actually see. However, as Torikoshi-san said in [1], errors other
> than valid input syntax (e.g. range error) can be also ignored, therefore it
> would be better to describe to be ignored errors more specifically.
> 
> [1] https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com
> 
> > 
> > >
> > > So, keeping consistency with the existing description, we can say:
> > >
> > > "Specifies which how to behave when encountering an error due to
> > >  column values unacceptable to the input function of each attribute's
> > >  data type."
> > 
> > 
> > Yeah, I was considering something along those lines as an option as well.
> > But I’d rather add that wording to the glossary.
> 
> Although I am still be not convinced if we have to introduce the words
> "soft error" to the documentation, I don't care it if there are no other
> opposite opinions. 

Attached is a updated patch v3, which is a version that uses the above
wording instead of "soft error".

> > 
> > > Currently, ON_ERROR doesn't support other soft errors, so it can explain
> > > it more simply without introducing the new concept, "soft error" to users.
> > >
> > >
> > Good point.  Seems we should define what user-facing errors are ignored
> > anywhere in the system and if we aren’t consistently leveraging these in
> > all areas/commands make the necessary qualifications in those specific
> > places.
> > 
> 
> > > I think "left in a deleted state" is also unclear for users because this
> > > explains the internal state but not how looks from user's view.How about
> > > leaving the explanation "These rows will not be visible or accessible" in
> > > the existing statement?
> > >
> > 
> > Just visible then, I don’t like an “or” there and as tuples at least they
> > are accessible to the system, in vacuum especially.  But I expected the
> > user to understand “as if you deleted it” as their operational concept more
> > readily than visible.  I think this will be read by people who haven’t read
> > MVCC to fully understand what visible means but know enough to run vacuum
> > to clean up updated and deleted data as a rule.
> 
> Ok, I agree we can omit "or accessible". How do you like the followings?
> Still redundant?
> 
>  "If the command fails, these rows are left in a deleted state;
>   these rows will not be visible, but they still occupy disk space. "

Also, the above statement is used in the patch.

Regards,
Yugo Nagata

 
> Regards,
> Yugo Nagata
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Small fix on COPY ON_ERROR document

От
torikoshia
Дата:
On 2024-02-01 15:16, Yugo NAGATA wrote:
> On Mon, 29 Jan 2024 15:47:25 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
>> On Sun, 28 Jan 2024 19:14:58 -0700
>> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>> 
>> > > Also, I think "invalid input syntax" is a bit ambiguous. For example,
>> > > COPY FROM raises an error when the number of input column does not match
>> > > to the table schema, but this error is not ignored by ON_ERROR while
>> > > this seems to fall into the category of "invalid input syntax".
>> >
>> >
>> >
>> > It is literally the error text that appears if one were not to ignore it.
>> > It isn’t a category of errors.  But I’m open to ideas here.  But being
>> > explicit with what on actually sees in the system seemed preferable to
>> > inventing new classification terms not otherwise used.
>> 
>> Thank you for explanation! I understood the words was from the error 
>> messages
>> that users actually see. However, as Torikoshi-san said in [1], errors 
>> other
>> than valid input syntax (e.g. range error) can be also ignored, 
>> therefore it
>> would be better to describe to be ignored errors more specifically.
>> 
>> [1] 
>> https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com
>> 
>> >
>> > >
>> > > So, keeping consistency with the existing description, we can say:
>> > >
>> > > "Specifies which how to behave when encountering an error due to
>> > >  column values unacceptable to the input function of each attribute's
>> > >  data type."
>> >
>> >
>> > Yeah, I was considering something along those lines as an option as well.
>> > But I’d rather add that wording to the glossary.
>> 
>> Although I am still be not convinced if we have to introduce the words
>> "soft error" to the documentation, I don't care it if there are no 
>> other
>> opposite opinions.
> 
> Attached is a updated patch v3, which is a version that uses the above
> wording instead of "soft error".
> 
>> >
>> > > Currently, ON_ERROR doesn't support other soft errors, so it can explain
>> > > it more simply without introducing the new concept, "soft error" to users.
>> > >
>> > >
>> > Good point.  Seems we should define what user-facing errors are ignored
>> > anywhere in the system and if we aren’t consistently leveraging these in
>> > all areas/commands make the necessary qualifications in those specific
>> > places.
>> >
>> 
>> > > I think "left in a deleted state" is also unclear for users because this
>> > > explains the internal state but not how looks from user's view.How about
>> > > leaving the explanation "These rows will not be visible or accessible" in
>> > > the existing statement?
>> > >
>> >
>> > Just visible then, I don’t like an “or” there and as tuples at least they
>> > are accessible to the system, in vacuum especially.  But I expected the
>> > user to understand “as if you deleted it” as their operational concept more
>> > readily than visible.  I think this will be read by people who haven’t read
>> > MVCC to fully understand what visible means but know enough to run vacuum
>> > to clean up updated and deleted data as a rule.
>> 
>> Ok, I agree we can omit "or accessible". How do you like the 
>> followings?
>> Still redundant?
>> 
>>  "If the command fails, these rows are left in a deleted state;
>>   these rows will not be visible, but they still occupy disk space. "
> 
> Also, the above statement is used in the patch.

Thanks for updating the patch!

I like your description which doesn't use the word soft error.


Here are minor comments:

> +      <literal>ignore</literal> means discard the input row and 
> continue with the next one.
> +      The default is <literal>stop</literal>

Is "." required at the end of the line?

>      An <literal>NOTICE</literal> level context message containing the 
> ignored row count is

Should 'An' be 'A'?

Also, I wasn't sure the necessity of 'context'.
It might be possible to just say "A NOTICE message containing the 
ignored row count.."
considering below existing descriptions:

   doc/src/sgml/pltcl.sgml:     a <literal>NOTICE</literal> message each 
time a supported command is
   doc/src/sgml/pltcl.sgml-     executed:

   doc/src/sgml/plpgsql.sgml:     This example trigger simply raises a 
<literal>NOTICE</literal> message
   doc/src/sgml/plpgsql.sgml-     each time a supported command is 
executed.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Fri, 02 Feb 2024 11:29:41 +0900
torikoshia <torikoshia@oss.nttdata.com> wrote:

> On 2024-02-01 15:16, Yugo NAGATA wrote:
> > On Mon, 29 Jan 2024 15:47:25 +0900
> > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > 
> >> On Sun, 28 Jan 2024 19:14:58 -0700
> >> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> >> 
> >> > > Also, I think "invalid input syntax" is a bit ambiguous. For example,
> >> > > COPY FROM raises an error when the number of input column does not match
> >> > > to the table schema, but this error is not ignored by ON_ERROR while
> >> > > this seems to fall into the category of "invalid input syntax".
> >> >
> >> >
> >> >
> >> > It is literally the error text that appears if one were not to ignore it.
> >> > It isn’t a category of errors.  But I’m open to ideas here.  But being
> >> > explicit with what on actually sees in the system seemed preferable to
> >> > inventing new classification terms not otherwise used.
> >> 
> >> Thank you for explanation! I understood the words was from the error 
> >> messages
> >> that users actually see. However, as Torikoshi-san said in [1], errors 
> >> other
> >> than valid input syntax (e.g. range error) can be also ignored, 
> >> therefore it
> >> would be better to describe to be ignored errors more specifically.
> >> 
> >> [1] 
> >> https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com
> >> 
> >> >
> >> > >
> >> > > So, keeping consistency with the existing description, we can say:
> >> > >
> >> > > "Specifies which how to behave when encountering an error due to
> >> > >  column values unacceptable to the input function of each attribute's
> >> > >  data type."
> >> >
> >> >
> >> > Yeah, I was considering something along those lines as an option as well.
> >> > But I’d rather add that wording to the glossary.
> >> 
> >> Although I am still be not convinced if we have to introduce the words
> >> "soft error" to the documentation, I don't care it if there are no 
> >> other
> >> opposite opinions.
> > 
> > Attached is a updated patch v3, which is a version that uses the above
> > wording instead of "soft error".
> > 
> >> >
> >> > > Currently, ON_ERROR doesn't support other soft errors, so it can explain
> >> > > it more simply without introducing the new concept, "soft error" to users.
> >> > >
> >> > >
> >> > Good point.  Seems we should define what user-facing errors are ignored
> >> > anywhere in the system and if we aren’t consistently leveraging these in
> >> > all areas/commands make the necessary qualifications in those specific
> >> > places.
> >> >
> >> 
> >> > > I think "left in a deleted state" is also unclear for users because this
> >> > > explains the internal state but not how looks from user's view.How about
> >> > > leaving the explanation "These rows will not be visible or accessible" in
> >> > > the existing statement?
> >> > >
> >> >
> >> > Just visible then, I don’t like an “or” there and as tuples at least they
> >> > are accessible to the system, in vacuum especially.  But I expected the
> >> > user to understand “as if you deleted it” as their operational concept more
> >> > readily than visible.  I think this will be read by people who haven’t read
> >> > MVCC to fully understand what visible means but know enough to run vacuum
> >> > to clean up updated and deleted data as a rule.
> >> 
> >> Ok, I agree we can omit "or accessible". How do you like the 
> >> followings?
> >> Still redundant?
> >> 
> >>  "If the command fails, these rows are left in a deleted state;
> >>   these rows will not be visible, but they still occupy disk space. "
> > 
> > Also, the above statement is used in the patch.
> 
> Thanks for updating the patch!
> 
> I like your description which doesn't use the word soft error.

Thank you for your comments!

> 
> Here are minor comments:
> 
> > +      <literal>ignore</literal> means discard the input row and 
> > continue with the next one.
> > +      The default is <literal>stop</literal>
> 
> Is "." required at the end of the line?
>
> >      An <literal>NOTICE</literal> level context message containing the 
> > ignored row count is
> 
> Should 'An' be 'A'?
> 
> Also, I wasn't sure the necessity of 'context'.
> It might be possible to just say "A NOTICE message containing the 
> ignored row count.."
> considering below existing descriptions:
> 
>    doc/src/sgml/pltcl.sgml:     a <literal>NOTICE</literal> message each 
> time a supported command is
>    doc/src/sgml/pltcl.sgml-     executed:
> 
>    doc/src/sgml/plpgsql.sgml:     This example trigger simply raises a 
> <literal>NOTICE</literal> message
>    doc/src/sgml/plpgsql.sgml-     each time a supported command is 
> executed.

I attached a updated patch including fixes you pointed out above.

Regards,
Yugo Nagata

> -- 
> Regards,
> 
> --
> Atsushi Torikoshi
> NTT DATA Group Corporation


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Small fix on COPY ON_ERROR document

От
"David G. Johnston"
Дата:
On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

I attached a updated patch including fixes you pointed out above.


Removed "which"; changed "occupying" to "occupy"
Removed on of the two "amounts"
Changed "unacceptable to the input function" to just "converting" as that is what the average reader is more likely to be thinking.
The rest was good.

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 3c2feaa11a..55764fc1f2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -385,8 +385,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <term><literal>ON_ERROR</literal></term>
     <listitem>
      <para>
-      Specifies which how to behave when encountering an error due to column values
-      unacceptable to the input function of each attribute's data type.
+      Specifies how to behave when encountering an error converting a column's
+      input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
       <literal>stop</literal> means fail the command, while
       <literal>ignore</literal> means discard the input row and continue with the next one.
@@ -589,7 +589,7 @@ COPY <replaceable class="parameter">count</replaceable>
     The <command>COPY FROM</command> command physically inserts input rows
     into the table as it progresses.  If the command fails, these rows are
     left in a deleted state; these rows will not be visible, but still
-    occupying disk space. This might amount to a considerable amount of
+    occupy disk space. This might amount to considerable
     wasted disk space if the failure happened well into a large copy
     operation. <command>VACUUM</command> should be used to recover the
     wasted space.

David J.


Re: Small fix on COPY ON_ERROR document

От
Alexander Korotkov
Дата:
On Fri, Feb 2, 2024 at 10:07 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>>
>>
>> I attached a updated patch including fixes you pointed out above.
>>
>
> Removed "which"; changed "occupying" to "occupy"
> Removed on of the two "amounts"
> Changed "unacceptable to the input function" to just "converting" as that is what the average reader is more likely
tobe thinking. 
> The rest was good.

Thanks to everybody in the thread.
Pushed.

------
Regards,
Alexander Korotkov



Re: Small fix on COPY ON_ERROR document

От
Yugo NAGATA
Дата:
On Sat, 3 Feb 2024 01:51:56 +0200
Alexander Korotkov <aekorotkov@gmail.com> wrote:

> On Fri, Feb 2, 2024 at 10:07 PM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> > On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >>
> >>
> >> I attached a updated patch including fixes you pointed out above.
> >>
> >
> > Removed "which"; changed "occupying" to "occupy"
> > Removed on of the two "amounts"
> > Changed "unacceptable to the input function" to just "converting" as that is what the average reader is more likely
tobe thinking.
 
> > The rest was good.
> 
> Thanks to everybody in the thread.
> Pushed.

Thanks!
> 
> ------
> Regards,
> Alexander Korotkov


-- 
Yugo NAGATA <nagata@sraoss.co.jp>