Обсуждение: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

Поиск
Список
Период
Сортировка

pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Jakub Wartak
Дата:
Hi -hackers,

We've got customer report that high max_connections (3k) with high
pgstat_track_activity_query_size (1MB) ends up with:

postgres=# select * from pg_stat_get_activity(NULL);
ERROR:  invalid memory alloc request size 18446744072590721024
postgres=# select version();
                                               version
------------------------------------------------------------------------------------------------------
 PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian
10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

it's in:
#0  errstart (elevel=elevel@entry=21, domain=domain@entry=0x0) at elog.c:358
#1  0x000055971cafc9a8 in errstart_cold (elevel=elevel@entry=21,
domain=domain@entry=0x0) at elog.c:333
#2  0x000055971cb018b7 in MemoryContextAllocHuge (context=<optimized
out>, size=18446744072590721024) at mcxt.c:1594
#3  0x000055971ce2fd59 in pgstat_read_current_status () at backend_status.c:767
#4  0x000055971ce30ab1 in pgstat_read_current_status () at backend_status.c:1167
#5  pgstat_fetch_stat_numbackends () at backend_status.c:1168
#6  0x000055971ceccc7f in pg_stat_get_activity (fcinfo=0x55971e239ff8)
at pgstatfuncs.c:308

It seems to be integer overflow due to (int) * (int), while
MemoryContextAllocHuge() allows taking Size(size_t) as parameter. I
get similar behaviour with:
size_t val = (int)1048576 * (int)3022;

Attached patch adjusts pgstat_track_activity_query_size to be of
size_t from int and fixes the issue.

Regards,
-Jakub Wartak.

Вложения

Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Michael Paquier
Дата:
On Wed, Sep 27, 2023 at 08:41:55AM +0200, Jakub Wartak wrote:
> Attached patch adjusts pgstat_track_activity_query_size to be of
> size_t from int and fixes the issue.

This cannot be backpatched, and using size_t is not really needed as
track_activity_query_size is capped at 1MB.  Why don't you just tweak
the calculation done in pgstat_read_current_status() instead, say with
a cast?
--
Michael

Вложения

Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Jakub Wartak
Дата:
On Wed, Sep 27, 2023 at 10:08 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Sep 27, 2023 at 08:41:55AM +0200, Jakub Wartak wrote:
> > Attached patch adjusts pgstat_track_activity_query_size to be of
> > size_t from int and fixes the issue.
>
> This cannot be backpatched, and using size_t is not really needed as
> track_activity_query_size is capped at 1MB.  Why don't you just tweak
> the calculation done in pgstat_read_current_status() instead, say with
> a cast?

Thanks Michael, sure, that is probably a better alternative. I've attached v2.

BTW: CF entry is https://commitfest.postgresql.org/45/4592/

Regards,
-Jakub Wartak.

Вложения

Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Peter Eisentraut
Дата:
On 27.09.23 09:08, Michael Paquier wrote:
> On Wed, Sep 27, 2023 at 08:41:55AM +0200, Jakub Wartak wrote:
>> Attached patch adjusts pgstat_track_activity_query_size to be of
>> size_t from int and fixes the issue.
> 
> This cannot be backpatched, and using size_t is not really needed as
> track_activity_query_size is capped at 1MB.  Why don't you just tweak
> the calculation done in pgstat_read_current_status() instead, say with
> a cast?

I think it's preferable to use the right type to begin with, rather than 
fixing it up afterwards with casts.




Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Andres Freund
Дата:
Hi,

On 2023-09-27 15:42:15 +0100, Peter Eisentraut wrote:
> On 27.09.23 09:08, Michael Paquier wrote:
> > On Wed, Sep 27, 2023 at 08:41:55AM +0200, Jakub Wartak wrote:
> > > Attached patch adjusts pgstat_track_activity_query_size to be of
> > > size_t from int and fixes the issue.
> > 
> > This cannot be backpatched, and using size_t is not really needed as
> > track_activity_query_size is capped at 1MB.  Why don't you just tweak
> > the calculation done in pgstat_read_current_status() instead, say with
> > a cast?
> 
> I think it's preferable to use the right type to begin with, rather than
> fixing it up afterwards with casts.

I don't think going for size_t is a viable path for fixing this. I'm pretty
sure the initial patch would trigger a type mismatch from guc_tables.c - we
don't have infrastructure for size_t GUCs.

Nor do I think either of the patches here is a complete fix - there will still
be overflows on 32bit systems.

Perhaps we ought to error out (in BackendStatusShmemSize() or such) if
pgstat_track_activity_query_size * MaxBackends >= 4GB?

Frankly, it seems like a quite bad idea to have such a high limit for
pgstat_track_activity_query_size. The overhead such a high value has will
surprise people...

Greetings,

Andres Freund



Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Michael Paquier
Дата:
On Wed, Sep 27, 2023 at 10:29:25AM -0700, Andres Freund wrote:
> I don't think going for size_t is a viable path for fixing this. I'm pretty
> sure the initial patch would trigger a type mismatch from guc_tables.c - we
> don't have infrastructure for size_t GUCs.

Nothing marked as PGDLLIMPORT uses size_t in the tree currently, FWIW.

> Perhaps we ought to error out (in BackendStatusShmemSize() or such) if
> pgstat_track_activity_query_size * MaxBackends >= 4GB?

Yeah, agreed that putting a check like that could catch errors more
quickly.

> Frankly, it seems like a quite bad idea to have such a high limit for
> pgstat_track_activity_query_size. The overhead such a high value has will
> surprise people...

Still it could have some value for some users with large analytical
queries where the syslogger is not going to be a bottleneck?  It seems
too late to me to change that, but perhaps the docs could be improved
to tell that using a too high value can have performance consequences,
while mentioning the maximum value.
--
Michael

Вложения

Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Andres Freund
Дата:
Hi,

On 2023-09-28 07:53:45 +0900, Michael Paquier wrote:
> On Wed, Sep 27, 2023 at 10:29:25AM -0700, Andres Freund wrote:
> > Frankly, it seems like a quite bad idea to have such a high limit for
> > pgstat_track_activity_query_size. The overhead such a high value has will
> > surprise people...
> 
> Still it could have some value for some users with large analytical
> queries where the syslogger is not going to be a bottleneck?  It seems
> too late to me to change that, but perhaps the docs could be improved
> to tell that using a too high value can have performance consequences,
> while mentioning the maximum value.

I don't think the issue is syslogger, the problem is that suddenly accessing
pg_stat_activity requires gigabytes of memory. That's insane.

Greetings,

Andres Freund



Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Jakub Wartak
Дата:
On Thu, Sep 28, 2023 at 12:53 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Sep 27, 2023 at 10:29:25AM -0700, Andres Freund wrote:
> > I don't think going for size_t is a viable path for fixing this. I'm pretty
> > sure the initial patch would trigger a type mismatch from guc_tables.c - we
> > don't have infrastructure for size_t GUCs.
>
> Nothing marked as PGDLLIMPORT uses size_t in the tree currently, FWIW.
>
> > Perhaps we ought to error out (in BackendStatusShmemSize() or such) if
> > pgstat_track_activity_query_size * MaxBackends >= 4GB?
>
> Yeah, agreed that putting a check like that could catch errors more
> quickly.

Hi,

v3 attached. I had a problem coming out with a better error message,
so suggestions are welcome.  The cast still needs to be present as per
above suggestion as 3GB is still valid buf size and still was causing
integer overflow. We just throw an error on >= 4GB with v3.

-J.

Вложения

Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Michael Paquier
Дата:
On Thu, Sep 28, 2023 at 11:01:14AM +0200, Jakub Wartak wrote:
> v3 attached. I had a problem coming out with a better error message,
> so suggestions are welcome.  The cast still needs to be present as per
> above suggestion as 3GB is still valid buf size and still was causing
> integer overflow. We just throw an error on >= 4GB with v3.

+/* Safety net to prevent requesting huge memory by each query to pg_stat_activity */
+#define PGSTAT_MAX_ACTIVITY_BUF_SIZE 4 * 1024 * 1024 * 1024L

-    size = add_size(size,
-                    mul_size(pgstat_track_activity_query_size, NumBackendStatSlots));
+    pgstat_track_size = mul_size(pgstat_track_activity_query_size,
+                    NumBackendStatSlots);
+    if(pgstat_track_size >= PGSTAT_MAX_ACTIVITY_BUF_SIZE)
+        elog(FATAL, "too big Backend Activity Buffer allocation of %zu bytes", pgstat_track_size);
+    size = add_size(size, pgstat_track_size);

That should be enough to put in a hardcoded 4GB safety limit, while
mul_size() detects it at a higher range.  Note, however, that elog()
is only used for internal errors that users should never face, but
this one can come from a misconfiguration.  This would be better as an
ereport(), with ERRCODE_CONFIG_FILE_ERROR as errcode, I guess.

"Backend Activity Buffer" is the name of the internal struct.  Sure,
it shows up on the system views for shmem, but I wouldn't use this
term in a user-facing error message.  Perhaps something like "size
requested for backend status is out of range" would be cleaner.  Other
ideas are welcome.
--
Michael

Вложения

Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Jakub Wartak
Дата:
On Fri, Sep 29, 2023 at 4:00 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 28, 2023 at 11:01:14AM +0200, Jakub Wartak wrote:
> > v3 attached. I had a problem coming out with a better error message,
> > so suggestions are welcome.  The cast still needs to be present as per
> > above suggestion as 3GB is still valid buf size and still was causing
> > integer overflow. We just throw an error on >= 4GB with v3.
>
> +/* Safety net to prevent requesting huge memory by each query to pg_stat_activity */
> +#define PGSTAT_MAX_ACTIVITY_BUF_SIZE 4 * 1024 * 1024 * 1024L
>
> -       size = add_size(size,
> -                                       mul_size(pgstat_track_activity_query_size, NumBackendStatSlots));
> +       pgstat_track_size = mul_size(pgstat_track_activity_query_size,
> +                                       NumBackendStatSlots);
> +       if(pgstat_track_size >= PGSTAT_MAX_ACTIVITY_BUF_SIZE)
> +               elog(FATAL, "too big Backend Activity Buffer allocation of %zu bytes", pgstat_track_size);
> +       size = add_size(size, pgstat_track_size);
>
> That should be enough to put in a hardcoded 4GB safety limit, while
> mul_size() detects it at a higher range.  Note, however, that elog()
> is only used for internal errors that users should never face, but
> this one can come from a misconfiguration.  This would be better as an
> ereport(), with ERRCODE_CONFIG_FILE_ERROR as errcode, I guess.
>
> "Backend Activity Buffer" is the name of the internal struct.  Sure,
> it shows up on the system views for shmem, but I wouldn't use this
> term in a user-facing error message.  Perhaps something like "size
> requested for backend status is out of range" would be cleaner.  Other
> ideas are welcome.

Hi Michael,

I've attached v4 that covers your suggestions.

-J.

Вложения

Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()

От
Michael Paquier
Дата:
On Mon, Oct 02, 2023 at 10:22:06AM +0200, Jakub Wartak wrote:
> I've attached v4 that covers your suggestions.

Hmm.  I was looking at all that and pondered quite a bit about the
addition of the restriction when starting up the server, particularly
why there would be any need to include it in the same commit as the
one fixing the arguments given to AllocHuge().  At the end, as the
restriction goes a bit against 8d0ddccec636, I have removed it and
went to the simplest solution of adding the casts to Size, which is
the same thing as we do in all the other callers that deal with signed
variables (like mbutils.c).

Regarding any restrictions, perhaps we should improve the docs, at
least.  It would be better to discuss that separately.
--
Michael

Вложения