Re: Add tuples_skipped to pg_stat_progress_copy

From torikoshia
Subject Re: Add tuples_skipped to pg_stat_progress_copy
Date
Msg-id 6f037b4201e4515c858f1a6eac18b2d2@oss.nttdata.com
In response to Re: Add tuples_skipped to pg_stat_progress_copy  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Add tuples_skipped to pg_stat_progress_copy  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On 2024-01-17 14:47, Masahiko Sawada wrote:
> On Wed, Jan 17, 2024 at 2:22 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> Hi,
>> 
>> 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to
>> skip malformed data, but there is no way to watch the number of 
>> skipped
>> rows during COPY.
>> 
>> Attached patch adds tuples_skipped to pg_stat_progress_copy, which
>> counts the number of skipped tuples because source data is malformed.
>> If SAVE_ERROR_TO is not specified, this column remains zero.
>> 
>> The advantage would be that users can quickly notice and stop COPYing
>> when there is a larger amount of skipped data than expected, for
>> example.
>> 
>> As described in commit log, it is expected to add more choices for
>> SAVE_ERROR_TO like 'log' and using such options may enable us to know
>> the number of skipped tuples during COPY, but exposed in
>> pg_stat_progress_copy would be easier to monitor.
>> 
>> 
>> What do you think?
> 
> +1
> 
> The patch is pretty simple. Here is a comment:
> 
> +       (if <literal>SAVE_ERROR_TO</literal> is specified, otherwise 
> zero).
> +      </para></entry>
> +     </row>
> 
> To be precise, this counter only advances when a value other than
> 'ERROR' is specified to SAVE_ERROR_TO option.

Thanks for your comment and review!

I've updated the patch to reflect your comment and the option name 
change made in b725b7eec.


BTW, building on this patch, I think we can add another option that 
specifies the maximum tolerable number of malformed rows.
I remember this was discussed in [1], and I feel it would be useful when 
loading 'dirty' data where there is a limit to how dirty it can be.
The attached 0002 is a WIP patch for this (I haven't added docs yet).
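
To make the intended behavior concrete, here is a rough sketch in Python of what such a limit could do. It is not the code in 0002, and the names reject_limit, copy_rows and RejectLimitExceeded are hypothetical ones made up for this illustration. It also assumes that a malformed row shows up as a ValueError from the parser.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class RejectLimitExceeded(Exception):
    """Raised when more rows were skipped than the caller tolerates."""


def copy_rows(
    lines: Iterable[str],
    parse: Callable[[str], object],
    reject_limit: Optional[int] = None,  # hypothetical option name
) -> Tuple[List[object], int]:
    """Load rows, skipping malformed ones up to reject_limit.

    Returns the loaded rows and the number of skipped rows. With
    reject_limit=None, any number of malformed rows is tolerated.
    """
    loaded: List[object] = []
    tuples_skipped = 0  # mirrors the counter proposed for the progress view
    for line in lines:
        try:
            row = parse(line)
        except ValueError:
            # Assumption: malformed input surfaces as ValueError.
            tuples_skipped += 1
            if reject_limit is not None and tuples_skipped > reject_limit:
                raise RejectLimitExceeded(
                    f"skipped {tuples_skipped} rows, limit is {reject_limit}"
                )
            continue
        loaded.append(row)
    return loaded, tuples_skipped
```

For example, copy_rows(["1", "x", "2"], int, reject_limit=1) loads two rows and reports one skipped row, while a second malformed row would abort the load.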

This may be better discussed in another thread, but any comments (e.g. 
on the necessity of this option or its name) are welcome.


[1] 
https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation