Re: Add new error_action COPY ON_ERROR "log"

Поиск
Список
Период
Сортировка
От torikoshia
Тема Re: Add new error_action COPY ON_ERROR "log"
Дата
Msg-id 3bad6fd5570c5b63e0b68f79e20253f2@oss.nttdata.com
обсуждение исходный текст
Ответ на Re: Add new error_action COPY ON_ERROR "log"  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: Add new error_action COPY ON_ERROR "log"  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список pgsql-hackers
On 2024-02-12 15:46, Bharath Rupireddy wrote:

Thanks for your comments.

> On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On 2024-01-27 00:43, David G. Johnston wrote:
>> 
>> >> I'd like to have a new option "log", which skips soft errors and
>> >> logs
>> >> information that should have resulted in errors to PostgreSQL log.
>> >
>> > user-specified tables in the same database.  Setting up temporary
>> > tables or unlogged tables probably is going to be a more acceptable
>> > methodology than trying to get to the log files.
>> 
>> OTOH not everyone thinks saving table information is the best idea[2].
> 
> The added NOTICE message gives a summary of how many rows were
> skipped, but not the line numbers. It's hard for the users to find the
> malformed data, especially when they are bulk-inserting from data
> files of multiple GBs in size (hard to open such files in any editor
> too).
> 
> I understand the value of writing the info to server log or tables of
> users' choice as being discussed in a couple of other threads.
> However, I'd prefer we do the simplest thing first before we go debate
> server log or table - let the users know what line numbers are
> containing the errors that COPY ignored something like [1] with a
> simple change like [2].

Agreed.
Unlike my patch, it hides the error information(i.e. 22P02: invalid 
input syntax for type integer: ), but I feel that it's usually 
sufficient to know the row number and column where the error occurred.

> It not only helps users figure out which rows
> and attributes were malformed, but also helps them redirect them to
> server logs with setting log_min_messages = notice [3]. In the worst
> case scenario, a problem with this one NOTICE per malformed row is
> that it can overload the psql session if all the rows are malformed.
> I'm not sure if this is a big problem, but IMO better than a single
> summary NOTICE message and simpler than writing to tables of users'
> choice.

Maybe could we do what you suggested for the behavior when 'log' is set 
to on_error?

> Thoughts?
> 
> FWIW, I presented the new COPY ... ON_ERROR option feature at
> Hyderabad PostgreSQL User Group meetup and it was well-received by the
> audience. The audience felt it's going to be extremely helpful for
> bulk-loading tasks. Thank you all for working on this feature.

Thanks for informing it!
I hope it will not be reverted as the audience:)

> [1]
> postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);
> CREATE TABLE
> postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF 
> signal.
>>> 1    {1}     1
> a       {2}     2
> 3       {3}     3333333333
> 4       {a, 4}  4
> 
> 5       {5}>> >> >> >> >>       5
> \.>>
> NOTICE:  detected data type incompatibility at line number 2 for
> column check_ign_err, COPY n
> NOTICE:  detected data type incompatibility at line number 3 for
> column check_ign_err, COPY k
> NOTICE:  detected data type incompatibility at line number 4 for
> column check_ign_err, COPY m
> NOTICE:  detected data type incompatibility at line number 5 for
> column check_ign_err, COPY n
> NOTICE:  4 rows were skipped due to data type incompatibility
> COPY 2
> 
> [2]
> diff --git a/src/backend/commands/copyfromparse.c
> b/src/backend/commands/copyfromparse.c
> index 906756362e..93ab5d4ebb 100644
> --- a/src/backend/commands/copyfromparse.c
> +++ b/src/backend/commands/copyfromparse.c
> @@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext 
> *econtext,
> 
>                  &values[m]))
>                         {
>                                 cstate->num_errors++;
> -                               return true;
> +
> +                               if (cstate->opts.on_error != 
> COPY_ON_ERROR_STOP)
> +                               {
> +                                       ereport(NOTICE,
> +
> errmsg("detected data type incompatibility at line number %llu for
> column %s, COPY %s",
> +
> (unsigned long long) cstate->cur_lineno,
> +
> cstate->cur_relname,
> +
> cstate->cur_attname));
> +                                       return true;
> +                               }
>                         }
> 
>                         cstate->cur_attname = NULL;
> 
> [3]
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 2 for column check_ign_err, COPY n
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 2, column n: "a"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 3 for column check_ign_err, COPY k
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 3, column k: "3333333333"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 4 for column check_ign_err, COPY m
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 4, column m: "{a, 4}"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 5 for column check_ign_err, COPY n
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 5, column n: ""
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  4 rows were skipped due
> to data type incompatibility
> 
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: About a recently-added message
Следующее
От: Jim Jones
Дата:
Сообщение: Re: Psql meta-command conninfo+