Обсуждение: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()
Hi -hackers, We've got customer report that high max_connections (3k) with high pgstat_track_activity_query_size (1MB) ends up with: postgres=# select * from pg_stat_get_activity(NULL); ERROR: invalid memory alloc request size 18446744072590721024 postgres=# select version(); version ------------------------------------------------------------------------------------------------------ PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit (1 row) it's in: #0 errstart (elevel=elevel@entry=21, domain=domain@entry=0x0) at elog.c:358 #1 0x000055971cafc9a8 in errstart_cold (elevel=elevel@entry=21, domain=domain@entry=0x0) at elog.c:333 #2 0x000055971cb018b7 in MemoryContextAllocHuge (context=<optimized out>, size=18446744072590721024) at mcxt.c:1594 #3 0x000055971ce2fd59 in pgstat_read_current_status () at backend_status.c:767 #4 0x000055971ce30ab1 in pgstat_read_current_status () at backend_status.c:1167 #5 pgstat_fetch_stat_numbackends () at backend_status.c:1168 #6 0x000055971ceccc7f in pg_stat_get_activity (fcinfo=0x55971e239ff8) at pgstatfuncs.c:308 It seems to be integer overflow due to (int) * (int), while MemoryContextAllocHuge() allows taking Size(size_t) as parameter. I get similar behaviour with: size_t val = (int)1048576 * (int)3022; Attached patch adjusts pgstat_track_activity_query_size to be of size_t from int and fixes the issue. Regards, -Jakub Wartak.
Вложения
Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()
От
Michael Paquier
Дата:
On Wed, Sep 27, 2023 at 08:41:55AM +0200, Jakub Wartak wrote: > Attached patch adjusts pgstat_track_activity_query_size to be of > size_t from int and fixes the issue. This cannot be backpatched, and using size_t is not really needed as track_activity_query_size is capped at 1MB. Why don't you just tweak the calculation done in pgstat_read_current_status() instead, say with a cast? -- Michael
Вложения
On Wed, Sep 27, 2023 at 10:08 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Sep 27, 2023 at 08:41:55AM +0200, Jakub Wartak wrote: > > Attached patch adjusts pgstat_track_activity_query_size to be of > > size_t from int and fixes the issue. > > This cannot be backpatched, and using size_t is not really needed as > track_activity_query_size is capped at 1MB. Why don't you just tweak > the calculation done in pgstat_read_current_status() instead, say with > a cast? Thanks Michael, sure, that is probably a better alternative. I've attached v2. BTW: CF entry is https://commitfest.postgresql.org/45/4592/ Regards, -Jakub Wartak.
Вложения
Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()
От
Peter Eisentraut
Дата:
On 27.09.23 09:08, Michael Paquier wrote: > On Wed, Sep 27, 2023 at 08:41:55AM +0200, Jakub Wartak wrote: >> Attached patch adjusts pgstat_track_activity_query_size to be of >> size_t from int and fixes the issue. > > This cannot be backpatched, and using size_t is not really needed as > track_activity_query_size is capped at 1MB. Why don't you just tweak > the calculation done in pgstat_read_current_status() instead, say with > a cast? I think it's preferable to use the right type to begin with, rather than fixing it up afterwards with casts.
Hi, On 2023-09-27 15:42:15 +0100, Peter Eisentraut wrote: > On 27.09.23 09:08, Michael Paquier wrote: > > On Wed, Sep 27, 2023 at 08:41:55AM +0200, Jakub Wartak wrote: > > > Attached patch adjusts pgstat_track_activity_query_size to be of > > > size_t from int and fixes the issue. > > > > This cannot be backpatched, and using size_t is not really needed as > > track_activity_query_size is capped at 1MB. Why don't you just tweak > > the calculation done in pgstat_read_current_status() instead, say with > > a cast? > > I think it's preferable to use the right type to begin with, rather than > fixing it up afterwards with casts. I don't think going for size_t is a viable path for fixing this. I'm pretty sure the initial patch would trigger a type mismatch from guc_tables.c - we don't have infrastructure for size_t GUCs. Nor do I think either of the patches here is a complete fix - there will still be overflows on 32bit systems. Perhaps we ought to error out (in BackendStatusShmemSize() or such) if pgstat_track_activity_query_size * MaxBackends >= 4GB? Frankly, it seems like a quite bad idea to have such a high limit for pgstat_track_activity_query_size. The overhead such a high value has will surprise people... Greetings, Andres Freund
Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()
От
Michael Paquier
Дата:
On Wed, Sep 27, 2023 at 10:29:25AM -0700, Andres Freund wrote: > I don't think going for size_t is a viable path for fixing this. I'm pretty > sure the initial patch would trigger a type mismatch from guc_tables.c - we > don't have infrastructure for size_t GUCs. Nothing marked as PGDLLIMPORT uses size_t in the tree currently, FWIW. > Perhaps we ought to error out (in BackendStatusShmemSize() or such) if > pgstat_track_activity_query_size * MaxBackends >= 4GB? Yeah, agreed that putting a check like that could catch errors more quickly. > Frankly, it seems like a quite bad idea to have such a high limit for > pgstat_track_activity_query_size. The overhead such a high value has will > surprise people... Still it could have some value for some users with large analytical queries where the syslogger is not going to be a bottleneck? It seems too late to me to change that, but perhaps the docs could be improved to tell that using a too high value can have performance consequences, while mentioning the maximum value. -- Michael
Вложения
Hi, On 2023-09-28 07:53:45 +0900, Michael Paquier wrote: > On Wed, Sep 27, 2023 at 10:29:25AM -0700, Andres Freund wrote: > > Frankly, it seems like a quite bad idea to have such a high limit for > > pgstat_track_activity_query_size. The overhead such a high value has will > > surprise people... > > Still it could have some value for some users with large analytical > queries where the syslogger is not going to be a bottleneck? It seems > too late to me to change that, but perhaps the docs could be improved > to tell that using a too high value can have performance consequences, > while mentioning the maximum value. I don't think the issue is syslogger, the problem is that suddenly accessing pg_stat_activity requires gigabytes of memory. That's insane. Greetings, Andres Freund
On Thu, Sep 28, 2023 at 12:53 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Sep 27, 2023 at 10:29:25AM -0700, Andres Freund wrote: > > I don't think going for size_t is a viable path for fixing this. I'm pretty > > sure the initial patch would trigger a type mismatch from guc_tables.c - we > > don't have infrastructure for size_t GUCs. > > Nothing marked as PGDLLIMPORT uses size_t in the tree currently, FWIW. > > > Perhaps we ought to error out (in BackendStatusShmemSize() or such) if > > pgstat_track_activity_query_size * MaxBackends >= 4GB? > > Yeah, agreed that putting a check like that could catch errors more > quickly. Hi, v3 attached. I had a problem coming out with a better error message, so suggestions are welcome. The cast still needs to be present as per above suggestion as 3GB is still valid buf size and still was causing integer overflow. We just throw an error on >= 4GB with v3. -J.
Вложения
Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()
От
Michael Paquier
Дата:
On Thu, Sep 28, 2023 at 11:01:14AM +0200, Jakub Wartak wrote: > v3 attached. I had a problem coming out with a better error message, > so suggestions are welcome. The cast still needs to be present as per > above suggestion as 3GB is still valid buf size and still was causing > integer overflow. We just throw an error on >= 4GB with v3. +/* Safety net to prevent requesting huge memory by each query to pg_stat_activity */ +#define PGSTAT_MAX_ACTIVITY_BUF_SIZE 4 * 1024 * 1024 * 1024L - size = add_size(size, - mul_size(pgstat_track_activity_query_size, NumBackendStatSlots)); + pgstat_track_size = mul_size(pgstat_track_activity_query_size, + NumBackendStatSlots); + if(pgstat_track_size >= PGSTAT_MAX_ACTIVITY_BUF_SIZE) + elog(FATAL, "too big Backend Activity Buffer allocation of %zu bytes", pgstat_track_size); + size = add_size(size, pgstat_track_size); That should be enough to put in a hardcoded 4GB safety limit, while mul_size() detects it at a higher range. Note, however, that elog() is only used for internal errors that users should never face, but this one can come from a misconfiguration. This would be better as an ereport(), with ERRCODE_CONFIG_FILE_ERROR as errcode, I guess. "Backend Activity Buffer" is the name of the internal struct. Sure, it shows up on the system views for shmem, but I wouldn't use this term in a user-facing error message. Perhaps something like "size requested for backend status is out of range" would be cleaner. Other ideas are welcome. -- Michael
Вложения
On Fri, Sep 29, 2023 at 4:00 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Sep 28, 2023 at 11:01:14AM +0200, Jakub Wartak wrote: > > v3 attached. I had a problem coming out with a better error message, > > so suggestions are welcome. The cast still needs to be present as per > > above suggestion as 3GB is still valid buf size and still was causing > > integer overflow. We just throw an error on >= 4GB with v3. > > +/* Safety net to prevent requesting huge memory by each query to pg_stat_activity */ > +#define PGSTAT_MAX_ACTIVITY_BUF_SIZE 4 * 1024 * 1024 * 1024L > > - size = add_size(size, > - mul_size(pgstat_track_activity_query_size, NumBackendStatSlots)); > + pgstat_track_size = mul_size(pgstat_track_activity_query_size, > + NumBackendStatSlots); > + if(pgstat_track_size >= PGSTAT_MAX_ACTIVITY_BUF_SIZE) > + elog(FATAL, "too big Backend Activity Buffer allocation of %zu bytes", pgstat_track_size); > + size = add_size(size, pgstat_track_size); > > That should be enough to put in a hardcoded 4GB safety limit, while > mul_size() detects it at a higher range. Note, however, that elog() > is only used for internal errors that users should never face, but > this one can come from a misconfiguration. This would be better as an > ereport(), with ERRCODE_CONFIG_FILE_ERROR as errcode, I guess. > > "Backend Activity Buffer" is the name of the internal struct. Sure, > it shows up on the system views for shmem, but I wouldn't use this > term in a user-facing error message. Perhaps something like "size > requested for backend status is out of range" would be cleaner. Other > ideas are welcome. Hi Michael, I've attached v4 that covers your suggestions. -J.
Вложения
Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()
От
Michael Paquier
Дата:
On Mon, Oct 02, 2023 at 10:22:06AM +0200, Jakub Wartak wrote: > I've attached v4 that covers your suggestions. Hmm. I was looking at all that and pondered quite a bit about the addition of the restriction when starting up the server, particularly why there would be any need to include it in the same commit as the one fixing the arguments given to AllocHuge(). At the end, as the restriction goes a bit against 8d0ddccec636, I have removed it and went to the simplest solution of adding the casts to Size, which is the same thing as we do in all the other callers that deal with signed variables (like mbutils.c). Regarding any restrictions, perhaps we should improve the docs, at least. It would be better to discuss that separately. -- Michael