Re: Add new COPY option REJECT_LIMIT
From | torikoshia |
---|---|
Subject | Re: Add new COPY option REJECT_LIMIT |
Date | |
Msg-id | de0551d0f9d9e9072e324e51ed5c426d@oss.nttdata.com |
In reply to | Re: Add new COPY option REJECT_LIMIT (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses | Re: Add new COPY option REJECT_LIMIT |
List | pgsql-hackers |
On 2024-07-03 02:07, Fujii Masao wrote:

Thanks for your comments!

> On 2024/01/26 18:49, torikoshia wrote:
>> Hi,
>>
>> 9e2d870 enabled the COPY command to skip soft errors, and I think we
>> can add another option which specifies the maximum tolerable number of
>> soft errors.
>>
>> I remember this was discussed in [1], and feel it would be useful when
>> loading 'dirty' data but there is a limit to how dirty it can be.
>>
>> Attached a patch for this.
>>
>> What do you think?
>
> The patch no longer applies cleanly to HEAD. Could you update it?

I'm going to update it after we discuss the option format, as described below.

>
> I think the REJECT_LIMIT feature is useful. Allowing it to be set as
> either the absolute number of skipped rows or a percentage of the
> total input rows is a good idea.
>
> However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
> option is still necessary. REJECT_LIMIT seems to cover the same cases.
> For instance, REJECT_LIMIT=infinity can act like ON_ERROR=ignore, and
> REJECT_LIMIT=0 can act like ON_ERROR=stop.

I agree that it's possible to use only REJECT_LIMIT without ON_ERROR.
I also think it's easy to understand that REJECT_LIMIT=0 is equivalent
to ON_ERROR=stop.
However, expressing REJECT_LIMIT='infinity' would need some convention
such as "setting REJECT_LIMIT to -1 means 'infinity'", wouldn't it?
If so, I think this might not be very intuitive.

Also, since Snowflake and Redshift both seem to have options equivalent
to REJECT_LIMIT and ON_ERROR, having both of them in PostgreSQL's COPY
might not be surprising:

- Snowflake's ON_ERROR accepts "CONTINUE | SKIP_FILE | SKIP_FILE_num | 'SKIP_FILE_num%' | ABORT_STATEMENT"[1]
- Redshift has the MAXERROR and IGNOREALLERRORS options[2]

BTW, seeing that Snowflake makes SKIP_FILE_num one of the values of
ON_ERROR, I'm wondering a bit whether REJECT_LIMIT should likewise be
one of the values of ON_ERROR.
[1] https://docs.snowflake.com/en/sql-reference/sql/copy-into-table#copy-options-copyoptions
[2] https://docs.aws.amazon.com/en_en/redshift/latest/dg/copy-parameters-data-load.html

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation