Discussion: logrep stuck with 'ERROR: int2vector has too many elements'


logrep stuck with 'ERROR: int2vector has too many elements'

From
Erik Rijkers
Date:
Hello,

Logical replication sometimes gets stuck with
   ERROR:  int2vector has too many elements

I can't find the exact circumstances that cause it but it has something 
to do with many columns (or adding many columns) in combination with 
perhaps generated columns.

This replication test, in a slightly different form, used to work. This 
is also suggested by the fact that the attached runs without errors in 
REL_15_STABLE but gets stuck in HEAD.

What it does: it initdbs and runs two instances, primary and replica. In 
the primary 'pgbench -is1' is done, and many columns, including 1 
generated column, are added to all 4 pgbench tables. This is then 
pg_dump/pg_restored to the replica, and a short pgbench is run. The 
result tables on primary and replica are compared for the final result. 
(To run it, it will need some tweaks to directory and connection parms)

I ran it on both v15 and v16 for 25 runs: with the parameters as given, 
15 has no problem while 16 always gets stuck with the int2vector error. 
(15 can actually be pushed up to the max of 1600 columns per table 
without errors.)

Both REL_15_STABLE and 16devel built from recent master on Debian 10, 
gcc 12.2.0.

I hope someone understands what's going wrong.

Thanks,

Erik Rijkers
Attachments

Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Alvaro Herrera
Date:
On 2023-Jan-15, Erik Rijkers wrote:

> Hello,
> 
> Logical replication sometimes gets stuck with
>   ERROR:  int2vector has too many elements

Weird.  This error comes from int2vectorin, which amusingly only wants to
read up to FUNC_MAX_ARGS values in the array (100 in the default config,
but it can be changed in pg_config_manual.h).  I wonder how come we
haven't noticed this before ... surely we use int2vectors for other
things than function argument lists nowadays.

At the same time, I don't understand why it fails in 16 but not in 15.
Maybe something changed in the way we process the column lists in 16?


-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Update: super-fast reaction on the Postgres bugs mailing list. The report
was acknowledged [...], and a fix is under discussion.
The wonders of open-source !"
             https://twitter.com/gunnarmorling/status/1596080409259003906



Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Erik Rijkers
Date:
On 1/15/23 12:33, Alvaro Herrera wrote:
> On 2023-Jan-15, Erik Rijkers wrote:
> 
>> Hello,
>>
>> Logical replication sometimes gets stuck with
>>    ERROR:  int2vector has too many elements
> 
> Weird.  This error comes from int2vectorin which amusingly only wants to
> read up to FUNC_MAX_ARGS values in the array (100 in the default config,
> but it can be changed in pg_config_manual.h).  I wonder how come we
> haven't noticed this before ... surely we use int2vector's for other
> things than function argument lists nowadays.
> 
> At the same time, I don't understand why it fails in 16 but not in 15.
> Maybe something changed in the way we process the column lists in 16?

I wrote this as a comment in the script, but that's maybe vague, so let 
me be more explicit: 16 also accepts many columns, up to 1600, without 
error, as long as that is not combined with generated column(s) such as 
in the script. It seems the combination quickly becomes problematic. 
Although adding just 50 columns + a generated column is still ok, 100 is 
already too high (see the ADD_COLUMNS variable in my script).

Weird indeed.


Erik



RE: logrep stuck with 'ERROR: int2vector has too many elements'

From
"houzj.fnst@fujitsu.com"
Date:
On Sunday, January 15, 2023 5:35 PM Erik Rijkers <er@xs4all.nl> wrote:
> 
> I can't find the exact circumstances that cause it but it has something to do with
> many columns (or adding many columns) in combination with perhaps
> generated columns.
> 
> This replication test, in a slightly different form, used to work. This is also
> suggested by the fact that the attached runs without errors in REL_15_STABLE but
> gets stuck in HEAD.
> 
> What it does: it initdbs and runs two instances, primary and replica. In the
> primary 'pgbench -is1' done, and many columns, including 1 generated column,
> are added to all 4 pgbench tables. This is then pg_dump/pg_restored to the
> replica, and a short pgbench is run. The result tables on primary and replica are
> compared for the final result.
> (To run it will need some tweaks to directory and connection parms)
> 
> I ran it on both v15 and v16 for 25 runs: with the parameters as given
> 15 has no problem while 16 always got stuck with the int2vector error.
> (15 can actually be pushed up to the max of 1600 columns per table without
> errors)
> 
> Both REL_15_STABLE and 16devel built from recent master on Debian 10, gcc
> 12.2.0.
> 
> I hope someone understands what's going wrong.

Thanks for reporting.

I think the basic problem is that we try to fetch the column list as an
int2vector when doing table sync, and if the number of columns is larger than
100, we will get an ERROR like the $subject.

We can also hit this ERROR by manually specifying a long (>100) column
list in the publication, like:

create publication pub for table test(a1,a2,a3... a200);
create subscription xxx.

The script didn't reproduce this in PG15, because we didn't filter out
generated columns when fetching the column list, so it assumes all columns are
replicated and returns NULL for the column list (int2vector) value. But in
PG16 (b7ae039), we started to filter out generated columns (because generated
columns are not replicated in logical replication), so we get a valid
int2vector and hit the ERROR.

I will think and work on a fix for this.

Best regards,
Hou zj

Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2023-Jan-15, Erik Rijkers wrote:
>> Logical replication sometimes gets stuck with
>> ERROR:  int2vector has too many elements

> Weird.  This error comes from int2vectorin which amusingly only wants to
> read up to FUNC_MAX_ARGS values in the array (100 in the default config,
> but it can be changed in pg_config_manual.h).  I wonder how come we
> haven't noticed this before ... surely we use int2vector's for other
> things than function argument lists nowadays.

Yeah.  So remove the limit in int2vectorin (probably also oidvectorin),
or change logrep to use int2[] not int2vector, or both.

> At the same time, I don't understand why it fails in 16 but not in 15.
> Maybe something changed in the way we process the column lists in 16?

Probably didn't have a dependency on int2vector before.

            regards, tom lane



Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Tom Lane
Date:
I wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> At the same time, I don't understand why it fails in 16 but not in 15.
>> Maybe something changed in the way we process the column lists in 16?

> Probably didn't have a dependency on int2vector before.

It looks like the proximate cause is that fd0b9dceb started fetching
the remote's pg_get_publication_tables() result as-is rather than
unnesting it, so that the on-the-wire representation is now int2vector
not a series of int2.  However, that just begs the question of who
thought that making pg_publication_rel.prattrs be int2vector instead
of int2[] was a good idea.  Quite aside from this issue, int2vector
isn't toastable, which'll lead to bloat in pg_publication_rel.

But I suppose we are stuck with that, seeing that this datatype choice
is effectively part of the logrep protocol now.  I think the only
reasonable solution is to get rid of the FUNC_MAX_ARGS restriction
in int2vectorin.  We probably ought to back-patch that as far as
pg_publication_rel.prattrs exists, too.

BTW, fd0b9dceb is in v15, so are you sure this doesn't fail in 15?
It looks like the code path is only taken if the remote is also >= 15,
so maybe your test case didn't expose it?

            regards, tom lane



Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Andres Freund
Date:
Hi,

On 2023-01-15 14:39:41 -0500, Tom Lane wrote:
> It looks like the proximate cause is that fd0b9dceb started fetching
> the remote's pg_get_publication_tables() result as-is rather than
> unnesting it, so that the on-the-wire representation is now int2vector
> not a series of int2.  However, that just begs the question of who
> thought that making pg_publication_rel.prattrs be int2vector instead
> of int2[] was a good idea.  Quite aside from this issue, int2vector
> isn't toastable, which'll lead to bloat in pg_publication_rel.

There are no easily visible comments about these restrictions of int2vector.
And there are plenty of other places using it where it's not immediately
obvious that the number of entries is very constrained, even if they are
(e.g. pg_trigger.tgattr).


> But I suppose we are stuck with that, seeing that this datatype choice
> is effectively part of the logrep protocol now.  I think the only
> reasonable solution is to get rid of the FUNC_MAX_ARGS restriction
> in int2vectorin.  We probably ought to back-patch that as far as
> pg_publication_rel.prattrs exists, too.

Are you thinking of introducing another limit, or just "relying" on too-long
arrays to trigger errors when forming tuples?

I guess we'll have to process the input twice? Pre-allocating an int2vector
for 100 elements is one thing, for 1600 another.

Greetings,

Andres Freund



Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2023-01-15 14:39:41 -0500, Tom Lane wrote:
>> But I suppose we are stuck with that, seeing that this datatype choice
>> is effectively part of the logrep protocol now.  I think the only
>> reasonable solution is to get rid of the FUNC_MAX_ARGS restriction
>> in int2vectorin.  We probably ought to back-patch that as far as
>> pg_publication_rel.prattrs exists, too.

> Are you thinking of introducing another, or just "rely" on too long arrays to
> trigger errors when forming tuples?

There's enough protections already, eg repalloc will complain if you
try to go past 1GB.  I'm thinking of the attached for HEAD (it'll
take minor mods to back-patch).

            regards, tom lane

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index e47c15a54f..44d1c7ad0c 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -143,11 +143,13 @@ int2vectorin(PG_FUNCTION_ARGS)
     char       *intString = PG_GETARG_CSTRING(0);
     Node       *escontext = fcinfo->context;
     int2vector *result;
+    int            nalloc;
     int            n;
 
-    result = (int2vector *) palloc0(Int2VectorSize(FUNC_MAX_ARGS));
+    nalloc = 32;                /* arbitrary initial size guess */
+    result = (int2vector *) palloc0(Int2VectorSize(nalloc));
 
-    for (n = 0; n < FUNC_MAX_ARGS; n++)
+    for (n = 0;; n++)
     {
         long        l;
         char       *endp;
@@ -157,6 +159,12 @@ int2vectorin(PG_FUNCTION_ARGS)
         if (*intString == '\0')
             break;
 
+        if (n >= nalloc)
+        {
+            nalloc *= 2;
+            result = (int2vector *) repalloc(result, Int2VectorSize(nalloc));
+        }
+
         errno = 0;
         l = strtol(intString, &endp, 10);
 
@@ -176,17 +184,11 @@ int2vectorin(PG_FUNCTION_ARGS)
             ereturn(escontext, (Datum) 0,
                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                      errmsg("invalid input syntax for type %s: \"%s\"",
-                            "integer", intString)));
+                            "smallint", intString)));
 
         result->values[n] = l;
         intString = endp;
     }
-    while (*intString && isspace((unsigned char) *intString))
-        intString++;
-    if (*intString)
-        ereturn(escontext, (Datum) 0,
-                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("int2vector has too many elements")));
 
     SET_VARSIZE(result, Int2VectorSize(n));
     result->ndim = 1;
@@ -261,12 +263,6 @@ int2vectorrecv(PG_FUNCTION_ARGS)
                 (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                  errmsg("invalid int2vector data")));
 
-    /* check length for consistency with int2vectorin() */
-    if (ARR_DIMS(result)[0] > FUNC_MAX_ARGS)
-        ereport(ERROR,
-                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("oidvector has too many elements")));
-
     PG_RETURN_POINTER(result);
 }
 
diff --git a/src/backend/utils/adt/oid.c b/src/backend/utils/adt/oid.c
index 697588313d..3f7af5b3a0 100644
--- a/src/backend/utils/adt/oid.c
+++ b/src/backend/utils/adt/oid.c
@@ -115,27 +115,30 @@ oidvectorin(PG_FUNCTION_ARGS)
     char       *oidString = PG_GETARG_CSTRING(0);
     Node       *escontext = fcinfo->context;
     oidvector  *result;
+    int            nalloc;
     int            n;
 
-    result = (oidvector *) palloc0(OidVectorSize(FUNC_MAX_ARGS));
+    nalloc = 32;                /* arbitrary initial size guess */
+    result = (oidvector *) palloc0(OidVectorSize(nalloc));
 
-    for (n = 0; n < FUNC_MAX_ARGS; n++)
+    for (n = 0;; n++)
     {
         while (*oidString && isspace((unsigned char) *oidString))
             oidString++;
         if (*oidString == '\0')
             break;
+
+        if (n >= nalloc)
+        {
+            nalloc *= 2;
+            result = (oidvector *) repalloc(result, OidVectorSize(nalloc));
+        }
+
         result->values[n] = uint32in_subr(oidString, &oidString,
                                           "oid", escontext);
         if (SOFT_ERROR_OCCURRED(escontext))
             PG_RETURN_NULL();
     }
-    while (*oidString && isspace((unsigned char) *oidString))
-        oidString++;
-    if (*oidString)
-        ereturn(escontext, (Datum) 0,
-                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("oidvector has too many elements")));
 
     SET_VARSIZE(result, OidVectorSize(n));
     result->ndim = 1;
@@ -212,12 +215,6 @@ oidvectorrecv(PG_FUNCTION_ARGS)
                 (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                  errmsg("invalid oidvector data")));
 
-    /* check length for consistency with oidvectorin() */
-    if (ARR_DIMS(result)[0] > FUNC_MAX_ARGS)
-        ereport(ERROR,
-                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("oidvector has too many elements")));
-
     PG_RETURN_POINTER(result);
 }


Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Andres Freund
Date:
Hi,

On 2023-01-15 15:17:16 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2023-01-15 14:39:41 -0500, Tom Lane wrote:
> >> But I suppose we are stuck with that, seeing that this datatype choice
> >> is effectively part of the logrep protocol now.  I think the only
> >> reasonable solution is to get rid of the FUNC_MAX_ARGS restriction
> >> in int2vectorin.  We probably ought to back-patch that as far as
> >> pg_publication_rel.prattrs exists, too.
> 
> > Are you thinking of introducing another, or just "rely" on too long arrays to
> > trigger errors when forming tuples?
> 
> There's enough protections already, eg repalloc will complain if you
> try to go past 1GB.

We'll practically error out at a much lower limit than that, due to
reaching the max length of a row and int2vector not being toastable. So I'm
just wondering if we want to set the limit to something that'll commonly avoid
erroring out with something like
  ERROR:  54000: row is too big: size 10048, maximum size 8160

For the purpose here a limit of MaxTupleAttributeNumber or such instead of
FUNC_MAX_ARGS would do the trick, I think?


> diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
> index e47c15a54f..44d1c7ad0c 100644
> --- a/src/backend/utils/adt/int.c
> +++ b/src/backend/utils/adt/int.c
> @@ -143,11 +143,13 @@ int2vectorin(PG_FUNCTION_ARGS)
>      char       *intString = PG_GETARG_CSTRING(0);
>      Node       *escontext = fcinfo->context;
>      int2vector *result;
> +    int            nalloc;
>      int            n;
>  
> -    result = (int2vector *) palloc0(Int2VectorSize(FUNC_MAX_ARGS));
> +    nalloc = 32;                /* arbitrary initial size guess */
> +    result = (int2vector *) palloc0(Int2VectorSize(nalloc));
>  
> -    for (n = 0; n < FUNC_MAX_ARGS; n++)
> +    for (n = 0;; n++)
>      {
>          long        l;
>          char       *endp;
> @@ -157,6 +159,12 @@ int2vectorin(PG_FUNCTION_ARGS)
>          if (*intString == '\0')
>              break;
>  
> +        if (n >= nalloc)
> +        {
> +            nalloc *= 2;
> +            result = (int2vector *) repalloc(result, Int2VectorSize(nalloc));
> +        }

Should this be repalloc0? I don't know if the palloc0 above was just used with
the goal of initializing the "header" fields, or also to avoid trailing
uninitialized bytes?

Greetings,

Andres Freund



Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Tom Lane
Date:
I wrote:
> BTW, fd0b9dceb is in v15, so are you sure this doesn't fail in 15?

Ah-hah: simple test cases only fail since b7ae03953.  Before
that, the default situation was that pg_publication_rel.prattrs
was null and that would be passed on as the transmitted value.
b7ae03953 decided it'd be cool if pg_get_publication_tables()
expanded that to show all the actually-transmitted columns,
and as of that point we can get an overflow in int2vectorin
given the default case where all columns are published.

To break it in v15, you need a test case that publishes more than
100 columns, but not all of them (else the optimization in
fetch_remote_table_info's query prevents the issue).

            regards, tom lane



Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> For the purpose here a limit of MaxTupleAttributeNumber or such instead of
> FUNC_MAX_ARGS would do the trick, I think?

As long as we have to change the code, we might as well remove the
arbitrary restriction.

> Should this be repalloc0? I don't know if the palloc0 above was just used with
> the goal of initializing the "header" fields, or also to avoid trailing
> uninitialized bytes?

I think probably the palloc0 was mostly about belt-and-suspenders
programming.  But yeah, its only real value is to ensure that all
the header fields are zero, so I don't think we need repalloc0
when enlarging.  After we set the array size at the end of the
loop, it'd be a programming bug to touch any bytes beyond that.

            regards, tom lane



Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Andres Freund
Date:
Hi,

On 2023-01-15 15:53:09 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > For the purpose here a limit of MaxTupleAttributeNumber or such instead of
> > FUNC_MAX_ARGS would do the trick, I think?
> 
> As long as we have to change the code, we might as well remove the
> arbitrary restriction.

WFM, just wanted to be sure we thought about the errors it could cause. I'm
not sure we've exhaustively exercised cases of tuples being too wide due to
variable-width plain-storage types. There are only a small number of these
types: int2vector, oidvector, gtsvector, tsquery

What's behind using plain for these types? Is it just because we want to use
it in tables that don't have a toast table (namely pg_index)? Obviously we
can't change the storage in existing releases...


> > Should this be repalloc0? I don't know if the palloc0 above was just used with
> > the goal of initializing the "header" fields, or also to avoid trailing
> > uninitialized bytes?
> 
> I think probably the palloc0 was mostly about belt-and-suspenders
> programming.  But yeah, its only real value is to ensure that all
> the header fields are zero, so I don't think we need repalloc0
> when enlarging.  After we set the array size at the end of the
> loop, it'd be a programming bug to touch any bytes beyond that.

Agreed.

Greetings,

Andres Freund



Re: logrep stuck with 'ERROR: int2vector has too many elements'

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> WFM, just wanted to be sure we thought about the errors it could cause. I'm
> not sure we've exercised cases of tuples being too wide due to variable-width
> plain storage types exhaustively. There's only a small number of these types:
> int2vector, oidvector, gtsvector, tsquery

> What's behind using plain for these types? Is it just because we want to use
> it in tables that don't have a toast table (namely pg_index)? Obviously we
> can't change the storage in existing releases...

For int2vector and oidvector, I think it boils down to wanting to access
columns like pg_proc.proargtypes without detoasting.  We could fix that,
but it'd likely be invasive and not a net positive.

It seems a bit broken that tsquery is marked that way, though; I doubt
we are getting any notational benefit from it.

Dunno about gtsvector.

            regards, tom lane