Thread: Fdw batch insert error out when set batch_size > 65535
Hi,

When reading the code, I noticed a possible issue with FDW batch insert. When I set batch_size > 65535 and insert more than 65535 rows into a foreign table, it throws an error. For example:

------------------
CREATE FOREIGN TABLE vzdalena_tabulka2(a int, b varchar)
SERVER omega_db
OPTIONS (table_name 'tabulka', batch_size '65536');

INSERT INTO vzdalena_tabulka2 SELECT i, 'AHOJ' || i FROM generate_series(1,65536) g(i);

ERROR:  number of parameters must be between 0 and 65535
CONTEXT:  remote SQL command: INSERT INTO public.tabulka(a, b) VALUES ($1, $2), ($3, $4)...
------------------

Actually, I think we can get this error whenever (number of columns) * (number of rows) > 65535. But since almost no one will set such a large value, adjusting the batch_size automatically when doing an INSERT seems an acceptable solution: in postgresGetForeignModifyBatchSize(), if (num of params) * batch_size is greater than the limit (65535), then we set it to 65535 / (num of params).

Thoughts?

Best regards,
houzj
Attachments
On Fri, May 21, 2021 at 8:18 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> When reading the code, I noticed some possible issue about fdw batch insert. When I set batch_size > 65535 and insert more than 65535 rows into foreign table, it will throw an error:
>
> For example:
>
> ------------------
> CREATE FOREIGN TABLE vzdalena_tabulka2(a int, b varchar)
> SERVER omega_db
> OPTIONS (table_name 'tabulka', batch_size '65536');
>
> INSERT INTO vzdalena_tabulka2 SELECT i, 'AHOJ' || i FROM generate_series(1,65536) g(i);
>
> ERROR:  number of parameters must be between 0 and 65535
> CONTEXT:  remote SQL command: INSERT INTO public.tabulka(a, b) VALUES ($1, $2), ($3, $4)...
> ------------------
>
> Actually, I think if the (number of columns) * (number of rows) > 65535, then we can get this error. But, I think almost no one will set such a large value, so I think adjusting the batch_size automatically when doing INSERT seems an acceptable solution.
>
> In the postgresGetForeignModifyBatchSize(), if we found the (num of param) * batch_size is greater than the limit (65535), then we set it to 65535 / (num of param).
>
> Thoughts ?

+1 to limiting batch_size for postgres_fdw, and it's a good idea to have a macro for the max params. A few comments:

1) How about using the macro in the error message, something like below?

appendPQExpBuffer(errorMessage,
                  libpq_gettext("number of parameters must be between 0 and %d\n"),
                  PQ_MAX_PARAM_NUMBER);

2) I think we can choose not to mention the 65535 because it's hard to maintain if that's changed in the libpq protocol sometime in the future. How about "The final number of rows postgres_fdw inserts in a batch actually depends on the number of columns and the provided batch_size value. This is because of the limit the libpq protocol (which postgres_fdw uses to connect to a remote server) has on the number of query parameters that can be specified per query. For instance, if the number of columns * batch_size is more than the limit, then the libpq emits an error. But postgres_fdw adjusts the batch_size to avoid this error." instead of

+ overrides an option specified for the server. Note if the batch size
+ exceed the protocol limit (number of columns * batch_size > 65535),
+ then the actual batch size will be less than the specified batch_size.

3) I think "postgres_fdw should insert in each insert operation" doesn't hold after this patch. We can reword it to "postgres_fdw inserts in each insert operation".

This option specifies the number of rows <filename>postgres_fdw</filename>
should insert in each insert operation. It can be specified for a

4) How about naming the macro PQ_QUERY_PARAM_MAX_LIMIT?

5) We can use the macro in code comments as well.

+ * 65535, so set the batch_size to not exceed limit in a batch insert.

6) I think both the code and docs patches can be combined into a single patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Sent: Friday, May 21, 2021 1:42 PM > On Fri, May 21, 2021 at 8:18 AM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > Actually, I think if the (number of columns) * (number of rows) > > > 65535, then we can get this error. But, I think almost no one will set > > such a large value, so I think adjust the batch_size automatically when doing > INSERT seems an acceptable solution. > > > > In the postgresGetForeignModifyBatchSize(), if we found the (num of > > param) * batch_size Is greater than the limit(65535), then we set it to 65535 / > (num of param). > > > > Thoughts ? > > +1 to limit batch_size for postgres_fdw and it's a good idea to have a > macro for the max params. > > Few comments: Thanks for the comments. > 1) How about using macro in the error message, something like below? > appendPQExpBuffer(errorMessage, > libpq_gettext("number of parameters must be > between 0 and %d\n"), > PQ_MAX_PARAM_NUMBER); Changed. > 2) I think we can choose not mention the 65535 because it's hard to maintain if > that's changed in libpq protocol sometime in future. How about "The final > number of rows postgres_fdw inserts in a batch actually depends on the > number of columns and the provided batch_size value. > This is because of the limit the libpq protocol (which postgres_fdw uses to > connect to a remote server) has on the number of query parameters that can > be specified per query. For instance, if the number of columns * batch_size is > more than the limit, then the libpq emits an error. But postgres_fdw adjusts the > batch_size to avoid this error." > instead of > + overrides an option specified for the server. Note if the batch size > + exceed the protocol limit (number of columns * batch_size > 65535), > + then the actual batch size will be less than the specified batch_size. Thanks, your description looks better. Changed. 
> 3) I think "postgres_fdw should insert in each insert operation" > doesn't hold after this patch. We can reword it to "postgres_fdw inserts in > each insert operation". > This option specifies the number of rows > <filename>postgres_fdw</filename> > should insert in each insert operation. It can be specified for a Changed. > 4) How about naming the macro as PQ_QUERY_PARAM_MAX_LIMIT? Changed. > 5) We can use macro in code comments as well. Thanks, I reorganized the code comments. > + * 65535, so set the batch_size to not exceed limit in a batch insert. > 6) I think both code and docs patches can be combined into a single patch. Combined. Attaching V2 patch. Please consider it for further review. Best regards, houzj
Attachments
On Fri, May 21, 2021 at 1:19 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > Attaching V2 patch. Please consider it for further review. Thanks for the patch. Some more comments: 1) Can fmstate->p_nums ever be 0 in postgresGetForeignModifyBatchSize? By any chance, if it can, I think instead of an assert, we can have something like below, since we are using it in the division. if (fmstate->p_nums > 0 && (batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT)) { batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums; } Also, please remove the { and } for the above if condition, because for 1 line statements we don't normally use { and } 2) An empty line after the macro definition will be good. +#define PQ_QUERY_PARAM_MAX_LIMIT 65535 extern int PQsendQuery(PGconn *conn, const char *query); 3) I think we use: <filename>postgres_fdw</filename> not postgres_fdw <literal>batch_size</literal> not batch_size the number of columns * <literal>batch_size</literal> not the number of columns * batch_size + overrides an option specified for the server. Note the final number + of rows postgres_fdw inserts in a batch actually depends on the + number of columns and the provided batch_size value. This is because + of the limit the libpq protocol (which postgres_fdw uses to connect + to a remote server) has on the number of query parameters that can + be specified per query. For instance, if the number of columns * batch_size + is more than the limit, then the libpq emits an error. But postgres_fdw + adjusts the batch_size to avoid this error. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On 5/21/21 5:03 AM, Bharath Rupireddy wrote: > On Fri, May 21, 2021 at 1:19 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: >> Attaching V2 patch. Please consider it for further review. > Thanks for the patch. Some more comments: > > 1) Can fmstate->p_nums ever be 0 in postgresGetForeignModifyBatchSize? > By any chance, if it can, I think instead of an assert, we can have > something like below, since we are using it in the division. > if (fmstate->p_nums > 0 && > (batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT)) > { > batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums; > } > Also, please remove the { and } for the above if condition, because > for 1 line statements we don't normally use { and } We used to filter that out in pgindent IIRC but we don't any more. IMNSHO there are cases when it makes the code more readable, especially if (as here) the condition spans more than one line. I also personally dislike having one branch of an "if" statement with braces and another without - it looks far better to my eyes to have all or none with braces. But I realize that's a matter of taste, and there are plenty of examples in the code running counter to my taste here. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Sent: Friday, May 21, 2021 5:03 PM > On Fri, May 21, 2021 at 1:19 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > Attaching V2 patch. Please consider it for further review. > > Thanks for the patch. Some more comments: > > 1) Can fmstate->p_nums ever be 0 in postgresGetForeignModifyBatchSize? > By any chance, if it can, I think instead of an assert, we can have something like > below, since we are using it in the division. > if (fmstate->p_nums > 0 && > (batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT)) > { > batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums; > } > Also, please remove the { and } for the above if condition, because for 1 line > statements we don't normally use { and } > 2) An empty line after the macro definition will be good. > +#define PQ_QUERY_PARAM_MAX_LIMIT 65535 > extern int PQsendQuery(PGconn *conn, const char *query); > 3) I think we use: > <filename>postgres_fdw</filename> not postgres_fdw > <literal>batch_size</literal> not batch_size the number of columns * > <literal>batch_size</literal> not the number of columns * batch_size > + overrides an option specified for the server. Note the final number > + of rows postgres_fdw inserts in a batch actually depends on the > + number of columns and the provided batch_size value. This is because > + of the limit the libpq protocol (which postgres_fdw uses to connect > + to a remote server) has on the number of query parameters that can > + be specified per query. For instance, if the number of columns > * batch_size > + is more than the limit, then the libpq emits an error. But postgres_fdw > + adjusts the batch_size to avoid this error. Thanks for the comments. I have addressed all comments to the v3 patch. BTW, Is the batch_size issue here an Open Item of PG14 ? Best regards, houzj
Attachments
On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > Thanks for the comments. I have addressed all comments to the v3 patch. Thanks! The patch basically looks good to me except that it is missing a commit message. I think it can be added now. > BTW, Is the batch_size issue here an Open Item of PG14 ? IMO, the issue you found when setting batch_size to a too high value is an extreme case testing of the feature added by commit b663a4136 that introduced the batch_size parameter. So, it's a bug to me. I think you can add it as a bug in the commitfest and let the committers take the call. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Tue, May 25, 2021 at 2:47 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > Thanks for the comments. I have addressed all comments to the v3 patch. > > Thanks! The patch basically looks good to me except that it is missing > a commit message. I think it can be added now. With v3 patch, I observed failure in postgres_fdw test cases with insert query in prepared statements. Root cause is that in postgresGetForeignModifyBatchSize, fmstate can be null (see the existing code which handles for fmstate null cases). I fixed this, and added a commit message. PSA v4 patch. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Attachments
On 5/26/21 8:57 AM, Bharath Rupireddy wrote:
> On Tue, May 25, 2021 at 2:47 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>> On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
>>> Thanks for the comments. I have addressed all comments to the v3 patch.
>>
>> Thanks! The patch basically looks good to me except that it is missing a commit message. I think it can be added now.
>
> With v3 patch, I observed failure in postgres_fdw test cases with insert query in prepared statements. Root cause is that in postgresGetForeignModifyBatchSize, fmstate can be null (see the existing code which handles the fmstate null cases). I fixed this, and added a commit message. PSA v4 patch.

Thanks. In what situation is the fmstate NULL? If it is NULL, the current code simply skips the line adjusting it. Doesn't that mean we may not actually fix the bug in that case?

Also, I think it'd be good to keep the existing comment, probably as the first line of the new comment block.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 26, 2021 at 6:36 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 5/26/21 8:57 AM, Bharath Rupireddy wrote: > > On Tue, May 25, 2021 at 2:47 PM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > >> > >> On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com > >> <houzj.fnst@fujitsu.com> wrote: > >>> Thanks for the comments. I have addressed all comments to the v3 patch. > >> > >> Thanks! The patch basically looks good to me except that it is missing > >> a commit message. I think it can be added now. > > > > With v3 patch, I observed failure in postgres_fdw test cases with > > insert query in prepared statements. Root cause is that in > > postgresGetForeignModifyBatchSize, fmstate can be null (see the > > existing code which handles for fmstate null cases). I fixed this, and > > added a commit message. PSA v4 patch. > > > > Thanks. In what situation is the fmstate NULL? If it is NULL, the > current code simply skips the line adjusting it. Doesn't that mean we > may not actually fix the bug in that case? fmstate i.e. resultRelInfo->ri_FdwState is NULL for EXPLAIN without ANALYZE cases, below comment says it and we can't get the bug because we don't actually execute the insert statement. The bug occurs on the remote server when the insert query with those many query parameters is submitted to the remote server. I'm not sure if there are any other cases where it can be NULL. /* * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup * the option directly in server/table options. Otherwise just use the * value we determined earlier. */ if (fmstate) batch_size = fmstate->batch_size; else batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc); > Also, I think it'd be keep the existing comment, probably as the first > line of the new comment block. Do you mean to say we need to retain "/* Otherwise use the batch size specified for server/table. */"? If so, PSA v5. With Regards, Bharath Rupireddy. 
EnterpriseDB: http://www.enterprisedb.com
Attachments
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Sent: Wednesday, May 26, 2021 9:56 PM > On Wed, May 26, 2021 at 6:36 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: > > > > On 5/26/21 8:57 AM, Bharath Rupireddy wrote: > > > On Tue, May 25, 2021 at 2:47 PM Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > >> > > >> On Tue, May 25, 2021 at 1:08 PM houzj.fnst@fujitsu.com > > >> <houzj.fnst@fujitsu.com> wrote: > > >>> Thanks for the comments. I have addressed all comments to the v3 > patch. > > >> > > >> Thanks! The patch basically looks good to me except that it is > > >> missing a commit message. I think it can be added now. > > > > > > With v3 patch, I observed failure in postgres_fdw test cases with > > > insert query in prepared statements. Root cause is that in > > > postgresGetForeignModifyBatchSize, fmstate can be null (see the > > > existing code which handles for fmstate null cases). I fixed this, > > > and added a commit message. PSA v4 patch. > > > > > > > Thanks. In what situation is the fmstate NULL? If it is NULL, the > > current code simply skips the line adjusting it. Doesn't that mean we > > may not actually fix the bug in that case? > > fmstate i.e. resultRelInfo->ri_FdwState is NULL for EXPLAIN without ANALYZE > cases, below comment says it and we can't get the bug because we don't > actually execute the insert statement. The bug occurs on the remote server > when the insert query with those many query parameters is submitted to the > remote server. Agreed. The "ri_FdwState" is initialized in postgresBeginForeignInsert or postgresBeginForeignModify. I think the above functions are always invoked before getting the batch_size. Only in EXPLAIN mode, it will not initialize the ri_FdwState. /* * Do nothing in EXPLAIN (no ANALYZE) case. resultRelInfo->ri_FdwState * stays NULL. */ if (eflags & EXEC_FLAG_EXPLAIN_ONLY) return; Best regards, houzj
Hi,

I took a look at this patch today. I did some minor changes, mostly:

1) change the code limiting batch_size from

if (fmstate->p_nums > 0 &&
    (batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT))
{
    batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums;
}

to

if (fmstate && fmstate->p_nums > 0)
    batch_size = Min(batch_size, PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums);

which I think is somewhat clearer / a more common pattern.

2) I've reworded the docs a bit, splitting the single para into two. I think this makes it clearer.

Attached is a patch doing this. Please check the commit message etc. Barring objections I'll get it committed in a couple of days.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
On Mon, May 31, 2021 at 1:21 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I took at this patch today. I did some minor changes, mostly:
>
> 1) change the code limiting batch_size from
>
> if (fmstate->p_nums > 0 &&
>     (batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT))
> {
>     batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums;
> }
>
> to
>
> if (fmstate && fmstate->p_nums > 0)
>     batch_size = Min(batch_size, PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums);
>
> which I think is somewhat clearer / more common patter.

Agree, that looks pretty good.

> 2) I've reworded the docs a bit, splitting the single para into two. I think this makes it clearer.

LGTM, except one thing: the batch_size description says "should insert in", but the value entered for batch_size is not always honoured, right? Because this patch might change it.

This option specifies the number of rows <filename>postgres_fdw</filename>
should insert in each insert operation. It can be specified for a

So, I suggest removing "should" and changing it to:

This option specifies the number of rows <filename>postgres_fdw</filename>
inserts in each insert operation. It can be specified for a

> Attached is a patch doing this. Please check the commit message etc. Barring objections I'll get it committed in a couple days.

One minor comment: in the commit message, Int16 is used

The FE/BE protocol identifies parameters with an Int16 index, which
limits the maximum number of parameters per query to 65535. With

and in the code comments uint16 is used.

+ * parameters in a batch is limited to 64k (uint16), so make sure we don't

Isn't it uint16 in the commit message too? Also, can we use 64k in the commit message instead of 65535?

With Regards,
Bharath Rupireddy.
On 5/31/21 6:01 AM, Bharath Rupireddy wrote: > On Mon, May 31, 2021 at 1:21 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> Hi, >> >> I took at this patch today. I did some minor changes, mostly: >> >> 1) change the code limiting batch_size from >> >> if (fmstate->p_nums > 0 && >> (batch_size * fmstate->p_nums > PQ_QUERY_PARAM_MAX_LIMIT)) >> { >> batch_size = PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums; >> } >> >> to >> >> if (fmstate && fmstate->p_nums > 0) >> batch_size = Min(batch_size, >> PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums); >> >> which I think is somewhat clearer / more common patter. > > Agree, that looks pretty good. > >> 2) I've reworded the docs a bit, splitting the single para into two. I >> think this makes it clearer. > > LGTM, except one thing that the batch_size description says "should > insert in", but it's not that the value entered for batch_size is > always honoured right? Because this patch might it. > > This option specifies the number of rows > <filename>postgres_fdw</filename> > should insert in each insert operation. It can be specified for a > > So, I suggest to remove "should" and change it to: > > This option specifies the number of rows > <filename>postgres_fdw</filename> > inserts in each insert operation. It can be specified for a > I think the "should" indicates exactly that postgres_fdw may adjust the batch size. Without it it sounds much more definitive, so I kept it. >> Attached is a patch doing this. Please check the commit message etc. >> Barring objections I'll get it committed in a couple days. > > One minor comment: > In the commit message, Int16 is used > The FE/BE protocol identifies parameters with an Int16 index, which > limits the maximum number of parameters per query to 65535. With > > and in the code comments uint16 is used. > + * parameters in a batch is limited to 64k (uint16), so make sure we don't > > Isn't it uint16 in the commit message too? 
Also, can we use 64k in the > commit message instead of 65535? > No, the "Int16" refers to the FE/BE documentation, where we use Int16. But in the C code we interpret it as uint16. I've added a simple regression test to postgres_fdw, testing that batch sizes > 65535 work fine, and pushed the fix. I've considered checking the value in postgres_fdw_validator and just rejecting anything over 65535, but I've decided against that. We'd still need to adjust depending on number of columns anyway. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 9, 2021 at 12:04 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > No, the "Int16" refers to the FE/BE documentation, where we use Int16. > But in the C code we interpret it as uint16. Hm. I see that in protocol.sgml Int16 is being used. > I've added a simple regression test to postgres_fdw, testing that batch > sizes > 65535 work fine, and pushed the fix. I was earlier thinking of adding one, but stopped because it might increase the regression test execution time. It looks like that's true - with and without the test case it takes 17 sec and 4 sec respectively on my dev system which is 4X slower. I'm not sure if this is okay. With Regards, Bharath Rupireddy.
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: >> I've added a simple regression test to postgres_fdw, testing that batch >> sizes > 65535 work fine, and pushed the fix. > I was earlier thinking of adding one, but stopped because it might > increase the regression test execution time. It looks like that's true > - with and without the test case it takes 17 sec and 4 sec > respectively on my dev system which is 4X slower. I'm not sure if this > is okay. The cost, versus the odds of ever detecting a problem, doesn't seem like a good tradeoff. regards, tom lane
I wrote: > Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: >>> I've added a simple regression test to postgres_fdw, testing that batch >>> sizes > 65535 work fine, and pushed the fix. >> I was earlier thinking of adding one, but stopped because it might >> increase the regression test execution time. It looks like that's true >> - with and without the test case it takes 17 sec and 4 sec >> respectively on my dev system which is 4X slower. I'm not sure if this >> is okay. > The cost, versus the odds of ever detecting a problem, doesn't > seem like a good tradeoff. I took a quick look and noted that on buildfarm member longfin (to take a random example that's sitting a few feet from me), the time for contrib-install-check went from 34 seconds before this patch to 40 seconds after. I find that completely unacceptable compared to the likely value of this test case. regards, tom lane
On 6/9/21 8:28 AM, Tom Lane wrote: > I wrote: >> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: >>>> I've added a simple regression test to postgres_fdw, testing that batch >>>> sizes > 65535 work fine, and pushed the fix. > >>> I was earlier thinking of adding one, but stopped because it might >>> increase the regression test execution time. It looks like that's true >>> - with and without the test case it takes 17 sec and 4 sec >>> respectively on my dev system which is 4X slower. I'm not sure if this >>> is okay. > >> The cost, versus the odds of ever detecting a problem, doesn't >> seem like a good tradeoff. > > I took a quick look and noted that on buildfarm member longfin > (to take a random example that's sitting a few feet from me), > the time for contrib-install-check went from 34 seconds before > this patch to 40 seconds after. I find that completely > unacceptable compared to the likely value of this test case. > Note that the problem here is [1] - we're creating a lot of slots referencing the same tuple descriptor, which inflates the duration. There's a fix in the other thread, which eliminates ~99% of the overhead. I plan to push that fix soon (a day or two). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6/9/21 12:22 PM, Tomas Vondra wrote: > > > On 6/9/21 8:28 AM, Tom Lane wrote: >> I wrote: >>> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: >>>>> I've added a simple regression test to postgres_fdw, testing that >>>>> batch >>>>> sizes > 65535 work fine, and pushed the fix. >> >>>> I was earlier thinking of adding one, but stopped because it might >>>> increase the regression test execution time. It looks like that's true >>>> - with and without the test case it takes 17 sec and 4 sec >>>> respectively on my dev system which is 4X slower. I'm not sure if this >>>> is okay. >> >>> The cost, versus the odds of ever detecting a problem, doesn't >>> seem like a good tradeoff. >> >> I took a quick look and noted that on buildfarm member longfin >> (to take a random example that's sitting a few feet from me), >> the time for contrib-install-check went from 34 seconds before >> this patch to 40 seconds after. I find that completely >> unacceptable compared to the likely value of this test case. >> > > Note that the problem here is [1] - we're creating a lot of slots > referencing the same tuple descriptor, which inflates the duration. > There's a fix in the other thread, which eliminates ~99% of the > overhead. I plan to push that fix soon (a day or two). > Forgot to add the link: [1] https://www.postgresql.org/message-id/ebbbcc7d-4286-8c28-0272-61b4753af761%40enterprisedb.com regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > Note that the problem here is [1] - we're creating a lot of slots > referencing the same tuple descriptor, which inflates the duration. > There's a fix in the other thread, which eliminates ~99% of the > overhead. I plan to push that fix soon (a day or two). Oh, okay, as long as there's a plan to bring the time back down. regards, tom lane
On 6/9/21 3:28 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> Note that the problem here is [1] - we're creating a lot of slots >> referencing the same tuple descriptor, which inflates the duration. >> There's a fix in the other thread, which eliminates ~99% of the >> overhead. I plan to push that fix soon (a day or two). > > Oh, okay, as long as there's a plan to bring the time back down. > Yeah. Sorry for not mentioning this in the original message about the new regression test. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6/9/21 4:05 PM, Tomas Vondra wrote: > On 6/9/21 3:28 PM, Tom Lane wrote: >> Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >>> Note that the problem here is [1] - we're creating a lot of slots >>> referencing the same tuple descriptor, which inflates the duration. >>> There's a fix in the other thread, which eliminates ~99% of the >>> overhead. I plan to push that fix soon (a day or two). >> >> Oh, okay, as long as there's a plan to bring the time back down. >> > > Yeah. Sorry for not mentioning this in the original message about the > new regression test. > I've pushed a fix addressing the performance issue. There's one caveat, though - for regular builds the slowdown is pretty much eliminated. But with valgrind it's still considerably slower. For postgres_fdw the "make check" used to take ~5 minutes for me, now it takes >1h. And yes, this is entirely due to the new test case which is generating / inserting 70k rows. So maybe the test case is not worth it after all, and we should get rid of it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > There's one caveat, though - for regular builds the slowdown is pretty > much eliminated. But with valgrind it's still considerably slower. For > postgres_fdw the "make check" used to take ~5 minutes for me, now it > takes >1h. And yes, this is entirely due to the new test case which is > generating / inserting 70k rows. So maybe the test case is not worth it > after all, and we should get rid of it. I bet the CLOBBER_CACHE animals won't like it much either. I suggest what we do is leave it in place for long enough to get a round of reports from those slow animals, and then (assuming those reports are positive) drop the test. regards, tom lane
On 2021-Jun-12, Tomas Vondra wrote: > There's one caveat, though - for regular builds the slowdown is pretty > much eliminated. But with valgrind it's still considerably slower. For > postgres_fdw the "make check" used to take ~5 minutes for me, now it > takes >1h. And yes, this is entirely due to the new test case which is > generating / inserting 70k rows. So maybe the test case is not worth it > after all, and we should get rid of it. Hmm, what if the table is made 1600 columns wide -- would inserting 41 rows be sufficient to trigger the problem case? If it does, maybe it would reduce the runtime for valgrind/cache-clobber animals enough that it's no longer a concern. -- Álvaro Herrera Valdivia, Chile "At least to kernel hackers, who really are human, despite occasional rumors to the contrary" (LWN.net)
On 6/13/21 2:40 AM, Alvaro Herrera wrote: > On 2021-Jun-12, Tomas Vondra wrote: > >> There's one caveat, though - for regular builds the slowdown is pretty >> much eliminated. But with valgrind it's still considerably slower. For >> postgres_fdw the "make check" used to take ~5 minutes for me, now it >> takes >1h. And yes, this is entirely due to the new test case which is >> generating / inserting 70k rows. So maybe the test case is not worth it >> after all, and we should get rid of it. > > Hmm, what if the table is made 1600 columns wide -- would inserting 41 > rows be sufficient to trigger the problem case? If it does, maybe it > would reduce the runtime for valgrind/cache-clobber animals enough that > it's no longer a concern. > Good idea. I gave that a try, creating a table with 1500 columns and inserting 50 rows (so 75k parameters). See the attached patch. While this cuts the runtime about in half (to ~30 minutes on my laptop), that's probably not enough - it's still about ~6x longer than it used to take. All these timings are with valgrind. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachments
On Sun, Jun 13, 2021 at 6:10 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2021-Jun-12, Tomas Vondra wrote: > > > There's one caveat, though - for regular builds the slowdown is pretty > > much eliminated. But with valgrind it's still considerably slower. For > > postgres_fdw the "make check" used to take ~5 minutes for me, now it > > takes >1h. And yes, this is entirely due to the new test case which is > > generating / inserting 70k rows. So maybe the test case is not worth it > > after all, and we should get rid of it. > > Hmm, what if the table is made 1600 columns wide -- would inserting 41 > rows be sufficient to trigger the problem case? If it does, maybe it > would reduce the runtime for valgrind/cache-clobber animals enough that > it's no longer a concern. Yeah, that's a good idea. PSA patch that creates the table of 1600 columns and inserts 41 rows into the foreign table. If the batch_size adjustment fix isn't there, we will hit the error. On my dev system, postgres_fdw contrib regression tests execution time: with and without the attached patch 4.5 sec and 5.7 sec respectively. On Sun, Jun 13, 2021 at 7:25 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Good idea. I gave that a try, creating a table with 1500 columns and > inserting 50 rows (so 75k parameters). See the attached patch. Thanks for the patch. I also prepared a patch, just sharing. I'm okay if it's ignored. With Regards, Bharath Rupireddy.
Attachments
On 6/13/21 5:25 PM, Bharath Rupireddy wrote: > On Sun, Jun 13, 2021 at 6:10 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> >> On 2021-Jun-12, Tomas Vondra wrote: >> >>> There's one caveat, though - for regular builds the slowdown is pretty >>> much eliminated. But with valgrind it's still considerably slower. For >>> postgres_fdw the "make check" used to take ~5 minutes for me, now it >>> takes >1h. And yes, this is entirely due to the new test case which is >>> generating / inserting 70k rows. So maybe the test case is not worth it >>> after all, and we should get rid of it. >> >> Hmm, what if the table is made 1600 columns wide -- would inserting 41 >> rows be sufficient to trigger the problem case? If it does, maybe it >> would reduce the runtime for valgrind/cache-clobber animals enough that >> it's no longer a concern. > > Yeah, that's a good idea. PSA patch that creates the table of 1600 > columns and inserts 41 rows into the foreign table. If the batch_size > adjustment fix isn't there, we will hit the error. On my dev system, > postgres_fdw contrib regression tests execution time: with and without > the attached patch 4.5 sec and 5.7 sec respectively. > But we're discussing cases with valgrind and/or CLOBBER_CACHE_ALWAYS. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Jun 13, 2021 at 9:28 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > On 6/13/21 5:25 PM, Bharath Rupireddy wrote: > > On Sun, Jun 13, 2021 at 6:10 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> > >> On 2021-Jun-12, Tomas Vondra wrote: > >> > >>> There's one caveat, though - for regular builds the slowdown is pretty > >>> much eliminated. But with valgrind it's still considerably slower. For > >>> postgres_fdw the "make check" used to take ~5 minutes for me, now it > >>> takes >1h. And yes, this is entirely due to the new test case which is > >>> generating / inserting 70k rows. So maybe the test case is not worth it > >>> after all, and we should get rid of it. > >> > >> Hmm, what if the table is made 1600 columns wide -- would inserting 41 > >> rows be sufficient to trigger the problem case? If it does, maybe it > >> would reduce the runtime for valgrind/cache-clobber animals enough that > >> it's no longer a concern. > > > > Yeah, that's a good idea. PSA patch that creates the table of 1600 > > columns and inserts 41 rows into the foreign table. If the batch_size > > adjustment fix isn't there, we will hit the error. On my dev system, > > postgres_fdw contrib regression tests execution time: with and without > > the attached patch 4.5 sec and 5.7 sec respectively. > > > > But we're discussing cases with valgrind and/or CLOBBER_CACHE_ALWAYS. Okay. Here are the readings on my dev system: 1) on master with the existing test case inserting 70K rows: 4263200 ms (71.05 min) 2) with Tomas's patch, with the test case modified to 1500 table columns and 50 rows: 6649312 ms (110.8 min). The majority of the time (~30 min) was spent in the SELECT create_batch_tables(1500); statement; I measured this manually from the statement's start and end times. 3) with my patch, with the test case modified to 1600 table columns and 41 rows: 4003007 ms (66.71 min) 4) on master without the test case at all: 3770722 ms (62.84 min) With Regards, Bharath Rupireddy.
On Mon, Jun 14, 2021, 5:33 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> Okay. Here are the readings on my dev system:
> 1) on master with the existing test case inserting 70K rows:
> 4263200 ms (71.05 min)
> 2) with Tomas's patch, with the test case modified to 1500 table
> columns and 50 rows: 6649312 ms (110.8 min). The majority of the time
> (~30 min) was spent in the SELECT create_batch_tables(1500); statement;
> I measured this manually from the statement's start and end times.
> 3) with my patch, with the test case modified to 1600 table columns and
> 41 rows: 4003007 ms (66.71 min)
> 4) on master without the test case at all: 3770722 ms (62.84 min)
I forgot to mention one thing: I ran the above tests with CLOBBER_CACHE_ALWAYS.
Regards,
Bharath Rupireddy.
Hi, On 2021-06-11 18:44:28 -0400, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > > There's one caveat, though - for regular builds the slowdown is pretty > > much eliminated. But with valgrind it's still considerably slower. For > > postgres_fdw the "make check" used to take ~5 minutes for me, now it > > takes >1h. And yes, this is entirely due to the new test case which is > > generating / inserting 70k rows. So maybe the test case is not worth it > > after all, and we should get rid of it. > > I bet the CLOBBER_CACHE animals won't like it much either. > > I suggest what we do is leave it in place for long enough to get > a round of reports from those slow animals, and then (assuming > those reports are positive) drop the test. I just encountered this test because it doesn't succeed on a 32bit system with address sanitizer enabled - it runs out of memory. At that point there are "just" 29895 parameters parsed... It's also the slowest step on skink (valgrind animal), taking nearly an hour. I think two years later is long enough to have some confidence in this being fixed? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2021-06-11 18:44:28 -0400, Tom Lane wrote: >> I suggest what we do is leave it in place for long enough to get >> a round of reports from those slow animals, and then (assuming >> those reports are positive) drop the test. > I think two years later is long enough to have some confidence in this being > fixed? +1, time to drop it (in the back branches too). regards, tom lane
On 7/2/23 15:23, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: >> On 2021-06-11 18:44:28 -0400, Tom Lane wrote: >>> I suggest what we do is leave it in place for long enough to get >>> a round of reports from those slow animals, and then (assuming >>> those reports are positive) drop the test. > >> I think two years later is long enough to have some confidence in this being >> fixed? > > +1, time to drop it (in the back branches too). > OK, will do (unless someone else wants to handle this) on Monday. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 7/2/23 15:50, Tomas Vondra wrote: > On 7/2/23 15:23, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> On 2021-06-11 18:44:28 -0400, Tom Lane wrote: >>>> I suggest what we do is leave it in place for long enough to get >>>> a round of reports from those slow animals, and then (assuming >>>> those reports are positive) drop the test. >> >>> I think two years later is long enough to have some confidence in this being >>> fixed? >> >> +1, time to drop it (in the back branches too). >> > > OK, will do (unless someone else wants to handle this) on Monday. > FWIW I've removed the test from all branches where it was present. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company