Discussion: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
Hello,
attached you find a patch that adds a new GUC:
prepared_statement_limit:

    Specifies the maximum amount of memory used in each session to
    cache parsed-and-rewritten queries and execution plans. This
    affects the maximum memory a backend will reserve when many
    prepared statements are used. The default value of 0 disables this
    setting, but it is recommended to set this value a bit lower than
    the maximum memory a backend should be allowed to reserve
    permanently.
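For illustration only (the GUC exists only in this proposed patch, the value below is an arbitrary example, and it is assumed to accept the usual memory units), a postgresql.conf entry might look like:

# hypothetical example; prepared_statement_limit is not an official setting
prepared_statement_limit = 64MB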
If the GUC is configured, the function EnforcePreparedStatementLimit is
now called after each save of a CachedPlanSource and after creating a
CachedPlan from it. It checks the memory usage of the existing saved
CachedPlanSources and invalidates the query_list and the gplan, where
available, until the memory limit is met again.
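For illustration only, the eviction pass described above might look roughly like the following sketch. It is not taken from the patch: TotalCachedPlanMemory and ReleaseQueryList are assumed helper names, CachedPlanMemoryUsage is assumed to return the current size of one entry, and the GUC variable is assumed to already hold the limit in bytes.

/* Sketch only: walk the LRU list from the oldest entry and release the
 * reclaimable parts of saved plans until the limit is met again. */
static void
EnforcePreparedStatementLimit(CachedPlanSource *current)
{
    Size        total = TotalCachedPlanMemory();   /* assumed helper */
    dlist_iter  iter;

    dlist_foreach(iter, &saved_plan_list)
    {
        CachedPlanSource *ps = dlist_container(CachedPlanSource, node, iter.cur);

        if (total <= prepared_statement_limit)
            break;              /* limit met again, stop evicting */
        if (ps == current)
            continue;           /* never evict the entry currently in use */

        total -= CachedPlanMemoryUsage(ps);
        ps->is_valid = false;           /* force re-plan on next use */
        ReleaseGenericPlan(ps);         /* drops ps->gplan (refcounted) */
        ReleaseQueryList(ps);           /* assumed: frees the rewritten query_list */
        total += CachedPlanMemoryUsage(ps);     /* remeasure what is left */
    }
}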
CachedPlanSources are removed and re-added at the tail of
saved_plan_list every time GetCachedPlan is called on them, so the list
can be used as an LRU list.
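The move-to-tail step could be a small helper built on the existing dlist API in lib/ilist.h; the function name here is invented, as a sketch of the idea only:

/* Sketch: keep saved_plan_list in LRU order, most recently used at the tail. */
static void
MovePlanSourceToTail(CachedPlanSource *plansource)
{
    if (!plansource->is_saved)
        return;                 /* only saved plans live in saved_plan_list */

    dlist_delete(&plansource->node);
    dlist_push_tail(&saved_plan_list, &plansource->node);
}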
I also reworked ResetPlanCache, PlanCacheRelCallback and
PlanCacheObjectCallback a bit, so that when a CachedPlanSource is
invalidated the query_list is not only marked as invalid but is also
fully released to free memory.
Regards,
Daniel Migowski
PS@Konstantin: This patch also includes the CachedPlanMemoryUsage
function you like; maybe you would like to review the patch for me?
Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
Am 17.08.2019 um 19:10 schrieb Ibrar Ahmed:
> On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski <dmigowski@ikoffice.de> wrote:
>
>> attached you find a patch that adds a new GUC:
>
> Quick questions before looking at the patch.
>
>> prepared_statement_limit:
>
> - Do we have a consensus about the name of the GUC? I don't think it
> is the right name for that.

No, it is a proposal. It could also be named plancache_mem or
cachedplansource_maxmem or anything else. It was intended to make
prepared statements not use up all my memory, but development has shown
that it could also be used for other CachedPlans, as long as it is a
saved plan.

> - Is this a WIP patch or the final patch? Because I can see TODO and
> non-standard comments in the patch.

Definitely work in progress! The current implementation seems to work
for me, but it might be improved; I wanted some input from the mailing
list before I optimize things.

The most important question is whether such a feature would find some
love here. Personally it is essential for me, because a single prepared
statement uses up to 45MB in my application, and there were cases where
ORM-generated prepared statements would crash my server after some
time.

Then I would like to know whether the current implementation would at
least not crash (even if it might be a bit slow), or whether I have to
take more care with locking in some places. I believe a backend is a
single thread of execution, but there were notes about invalidation
messages that seem to run asynchronously to the main thread. Could
someone explain the threading model of a single backend to me? Does it
react to signals, and what would those signals change? Can I assume
that only ResetPlanCache, PlanCacheRelCallback and
PlanCacheObjectCallback will be called asynchronously?

>> Specifies the maximum amount of memory used in each session to cache
>> parsed-and-rewritten queries and execution plans. This affects the
>> maximum memory a backend will reserve when many prepared statements
>> are used. The default value of 0 disables this setting, but it is
>> recommended to set this value a bit lower than the maximum memory a
>> backend should be allowed to reserve permanently.
>>
>> If the GUC is configured, the function EnforcePreparedStatementLimit
>> is now called after each save of a CachedPlanSource and after
>> creating a CachedPlan from it. It checks the memory usage of the
>> existing saved CachedPlanSources and invalidates the query_list and
>> the gplan, where available, until the memory limit is met again.
>>
>> CachedPlanSources are removed and re-added at the tail of
>> saved_plan_list every time GetCachedPlan is called on them, so the
>> list can be used as an LRU list.

Could be a single move-to-tail function I would add.

>> I also reworked ResetPlanCache, PlanCacheRelCallback and
>> PlanCacheObjectCallback a bit, so that when a CachedPlanSource is
>> invalidated the query_list is not only marked as invalid but is also
>> fully released to free memory.

Because this seems to work I was able to reuse the new ReleaseQueryList
in my implementation.

Regards,
Daniel Migowski
Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
Hello,

At Sun, 18 Aug 2019 09:43:09 +0200, Daniel Migowski <dmigowski@ikoffice.de> wrote in <6e25ca12-9484-8994-a1ee-40fdbe6afa8b@ikoffice.de>
> Am 17.08.2019 um 19:10 schrieb Ibrar Ahmed:
> > On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski <dmigowski@ikoffice.de> wrote:
> >
> > > attached you find a patch that adds a new GUC:
> > >
> > > prepared_statement_limit:
> >
> > Quick questions before looking at the patch.
> >
> > - Do we have a consensus about the name of the GUC? I don't think it
> > is the right name for that.

Almost the same was proposed [1] as a part of the syscache-pruning
patch [2], but it was removed in order to concentrate on defining how
to do that for the first instance - the syscache. We have several
mechanisms that share the same characteristics - they can bloat and
there is no means to keep them at a certain size. It is better that
they are treated the same way, or at least on the same principle.

[1] https://www.postgresql.org/message-id/20180315.141246.130742928.horiguchi.kyotaro%40lab.ntt.co.jp
[2] https://commitfest.postgresql.org/23/931/

Pruning plancaches by any means is valuable, but we haven't reached a
consensus on how to do that. My old patch does it based on the number
of entries, because precise memory accounting of memory contexts is too
expensive. I didn't look at this patch closely, but it seems to use
MemoryContext->methods->stats to count memory usage, which would be too
expensive for the purpose. We currently use it only for debug output on
critical errors like OOM.

> No, it is a proposal. It could also be named plancache_mem or
> cachedplansource_maxmem or anything else. It was intended to make
> prepared statements not use up all my memory, but development has
> shown that it could also be used for other CachedPlans, as long as it
> is a saved plan.
>
> > - Is this a WIP patch or the final patch? Because I can see TODO and
> > non-standard comments in the patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
Am 19.08.2019 um 05:57 schrieb Kyotaro Horiguchi:
> Almost the same was proposed [1] as a part of the syscache-pruning
> patch [2], but it was removed in order to concentrate on defining how
> to do that for the first instance - the syscache. We have several
> mechanisms that share the same characteristics - they can bloat and
> there is no means to keep them at a certain size. It is better that
> they are treated the same way, or at least on the same principle.

Correct. However, I don't know the backend well enough to see how to
unify this. Also, time-based eviction isn't that important for me; I'd
rather work with the memory used. I agree that memory leaks are all
bad, and a time-based eviction for some small cache entries might
suffice, but CachedPlanSources take up to 45MB EACH here, and looking
at the memory directly seems desirable in that case.

> [1] https://www.postgresql.org/message-id/20180315.141246.130742928.horiguchi.kyotaro%40lab.ntt.co.jp
> [2] https://commitfest.postgresql.org/23/931/
>
> Pruning plancaches by any means is valuable, but we haven't reached a
> consensus on how to do that. My old patch does it based on the number
> of entries, because precise memory accounting of memory contexts is
> too expensive. I didn't look at this patch closely, but it seems to
> use MemoryContext->methods->stats to count memory usage, which would
> be too expensive for the purpose. We currently use it only for debug
> output on critical errors like OOM.

Yes, this looks like a place to improve. I hadn't looked at the stats
function before and wasn't aware it iterates over all chunks of the
context. We really don't need all the memory stats, just the total
usage, and that can be calculated solely from the size of the blocks.
Maybe we should add this counter (totalmemusage) to the MemoryContexts
themselves, so that decisions based on memory usage can be made
quickly. It shouldn't be too slow, because blocks seem to be acquired
much less often than chunks, and when they are acquired an additional
add to a variable in memory wouldn't hurt. On the other hand, memory
contexts should be as fast as possible given how often they are used
in the database, so that might be bad.

Also, one could precompute the memory usage of a CachedPlanSource when
it is saved, when the query_list gets calculated (for plans about to be
saved), and when the generic plan is stored in it. In combination with
a fast stats function that only calculates the total usage this looks
good to me. One could also store the sum in a session-global variable
to make checking for the need of a prune run faster.

While time-based eviction also makes sense in a context where the
database is not alone on a server, constraining the memory used
directly attacks the problem one tries to solve with time-based
eviction.
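For illustration only, the incremental accounting idea could be mocked up as a standalone sketch like the one below. This is not PostgreSQL code; the struct and names are invented, and a real implementation would have to update the counter in every block allocation and release path of the memory context allocators.

#include <stddef.h>

/*
 * Standalone mock-up: keep a running total of block space per context so
 * that "how big is this context tree?" is cheap to answer, instead of
 * walking every chunk the way the stats callback does.
 */
typedef struct DemoContext
{
    struct DemoContext *firstchild;
    struct DemoContext *nextchild;
    size_t      total_block_space; /* updated whenever a block is acquired/released */
} DemoContext;

void
demo_block_acquired(DemoContext *ctx, size_t blksize)
{
    ctx->total_block_space += blksize;
}

void
demo_block_released(DemoContext *ctx, size_t blksize)
{
    ctx->total_block_space -= blksize;
}

/* Total usage of a context and all of its children. */
size_t
demo_total_space(const DemoContext *ctx)
{
    size_t      total = ctx->total_block_space;
    const DemoContext *child;

    for (child = ctx->firstchild; child != NULL; child = child->nextchild)
        total += demo_total_space(child);
    return total;
}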
Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
On 2019-Aug-18, Daniel Migowski wrote:

> > - Is this a WIP patch or the final patch? Because I can see TODO
> > and non-standard comments in the patch.
>
> Definitely work in progress! The current implementation seems to work
> for me, but it might be improved; I wanted some input from the mailing
> list before I optimize things.
>
> The most important question is whether such a feature would find some
> love here. Personally it is essential for me, because a single
> prepared statement uses up to 45MB in my application, and there were
> cases where ORM-generated prepared statements would crash my server
> after some time.
>
> Then I would like to know whether the current implementation would at
> least not crash (even if it might be a bit slow), or whether I have to
> take more care with locking in some places.

On this patch, beyond the fact that it's causing a crash in the
regression tests as evidenced by the CFbot, we seem to be waiting on
the input of the larger community on whether it's a desired feature or
not. We have Kyotaro's vote for it, but it would be good to get more.
I'm switching it to Needs Review, so that others chime in.

In the meantime, please do fix the code style: brace location and
whitespacing are not per style, as well as the usage of //-comments.
Also please research and fix the crash.

Thanks

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On this patch, beyond the fact that it's causing a crash in the
> regression tests as evidenced by the CFbot, we seem to be waiting on
> the input of the larger community on whether it's a desired feature or
> not. We have Kyotaro's vote for it, but it would be good to get more.

I'm pretty dubious about this (just as I am with the somewhat-related
topic of syscache/relcache size limits). It's hard to tell where
performance is going to fall off a cliff if you limit the cache size.

Another point is that, because this only gets rid of the regeneratable
parts of a plancache entry, it isn't really going to move the needle
all that far in terms of total space consumption. As an experiment, I
instrumented GetCachedPlan right after where it fills in the gplan
field to see the relative sizes of the cache entry's contexts. Running
that over the core + PL regression tests, I get

     14 GetCachedPlan: 1024 context, 1024 query_context, 1024 gplan
      4 GetCachedPlan: 1024 context, 2048 query_context, 1024 gplan
      1 GetCachedPlan: 1024 context, 2048 query_context, 2048 gplan
   2109 GetCachedPlan: 2048 context, 2048 query_context, 2048 gplan
     29 GetCachedPlan: 2048 context, 2048 query_context, 4096 gplan
      6 GetCachedPlan: 2048 context, 4096 query_context, 2048 gplan
     33 GetCachedPlan: 2048 context, 4096 query_context, 4096 gplan
      2 GetCachedPlan: 2048 context, 4096 query_context, 8192 gplan
      1 GetCachedPlan: 2048 context, 8192 query_context, 16384 gplan
      4 GetCachedPlan: 2048 context, 8192 query_context, 4096 gplan
      2 GetCachedPlan: 2048 context, 8192 query_context, 8192 gplan
      8 GetCachedPlan: 3480 context, 8192 query_context, 8192 gplan
    250 GetCachedPlan: 4096 context, 2048 query_context, 2048 gplan
    107 GetCachedPlan: 4096 context, 2048 query_context, 4096 gplan
      3 GetCachedPlan: 4096 context, 4096 query_context, 16384 gplan
      1 GetCachedPlan: 4096 context, 4096 query_context, 2048 gplan
      7 GetCachedPlan: 4096 context, 4096 query_context, 32768 gplan
    190 GetCachedPlan: 4096 context, 4096 query_context, 4096 gplan
     61 GetCachedPlan: 4096 context, 4096 query_context, 8192 gplan
     11 GetCachedPlan: 4096 context, 8192 query_context, 4096 gplan
    587 GetCachedPlan: 4096 context, 8192 query_context, 8192 gplan
      1 GetCachedPlan: 4096 context, 16384 query_context, 8192 gplan
      5 GetCachedPlan: 4096 context, 32768 query_context, 32768 gplan
      1 GetCachedPlan: 4096 context, 65536 query_context, 65536 gplan
     12 GetCachedPlan: 8192 context, 4096 query_context, 4096 gplan
      2 GetCachedPlan: 8192 context, 4096 query_context, 8192 gplan
     49 GetCachedPlan: 8192 context, 8192 query_context, 16384 gplan
     46 GetCachedPlan: 8192 context, 8192 query_context, 8192 gplan
     10 GetCachedPlan: 8192 context, 16384 query_context, 16384 gplan
      1 GetCachedPlan: 8192 context, 16384 query_context, 32768 gplan
      1 GetCachedPlan: 8192 context, 16384 query_context, 8192 gplan
      1 GetCachedPlan: 8192 context, 32768 query_context, 32768 gplan
      2 GetCachedPlan: 8192 context, 131072 query_context, 131072 gplan
      3 GetCachedPlan: 16384 context, 8192 query_context, 16384 gplan
      1 GetCachedPlan: 16384 context, 16384 query_context, 16384 gplan
      2 GetCachedPlan: 16384 context, 16384 query_context, 17408 gplan
      1 GetCachedPlan: 16384 context, 16384 query_context, 32768 gplan
      1 GetCachedPlan: 16384 context, 16384 query_context, 65536 gplan

(The first column is the number of occurrences of the log entry; I got
this list from "grep|sort|uniq -c" on the postmaster log.) Patch for
this is attached, just in the interests of full disclosure.
So yeah, there are cases where you can save a whole lot by deleting the
query_context and/or gplan, but they're pretty few and far between;
more commonly, the nonreclaimable data in "context" accounts for a
third of the usage. (BTW, it looks to me like the code tries to keep
the *total* usage under the GUC limit, not the freeable usage. Which
means it won't be hard at all to drive it to the worst case where it
tries to free everything all the time, if the nonreclaimable data is
already over the limit.)

Admittedly, the regression tests might well not be representative of
common usage, but if you don't want to take them as a benchmark then we
need some other benchmark we can look at.

I also notice that this doesn't seem to be doing anything with
CachedExpressions, which are likely to be a pretty big factor in the
usage of e.g. plpgsql.

Now, I'd be the first to say that my thoughts about this are probably
biased by my time at Salesforce, where their cache-consumption problems
were driven by lots and lots and lots and lots (and lots) of plpgsql
code. Maybe with another usage pattern you'd come to a different
conclusion, but if I were trying to deal with that situation, what I'd
look at doing is reclaiming stuff starting at the plpgsql function
cache level, and then cascading down to the plancache entries
referenced by a plpgsql function body you've chosen to free. One major
advantage of doing that is that plpgsql has a pretty clear idea of when
a given function cache instance has gone to zero refcount, whereas
plancache.c simply doesn't know that.

As far as the crash issue is concerned, I notice that right now the
cfbot is showing green for this patch, but that seems to just be
because the behavior is unstable. I see crashes in "make
installcheck-parallel" about 50% of the time with this patch applied.
Since, in fact, the patch is not supposed to be doing anything at all
with prepared_statement_limit set to zero, that suggests sloppiness in
the refactoring that was done to separate out the resource-freeing
code.

On the other hand, if I do ALTER SYSTEM SET prepared_statement_limit = 1
and then run "make installcheck-parallel", I see a different set of
failures. It rather looks like the patch is deleting plans out from
under plpgsql, which connects back to my point about plancache.c not
really knowing whether a plan is in use or not. Certainly,
EnforcePreparedStatementLimit as coded here has no idea about that.
(Speaking of ALTER SYSTEM, why the devil is the variable
PGC_POSTMASTER? That seems entirely silly.)

Aside from Alvaro's style points, I'm fairly concerned about the
overhead the patch will add when active. Running through the entire
plancache and collecting memory stats is likely to be quite an
expensive proposition when there are lots of entries, yet this is
willing to do that over again at the drop of a hat. It won't be hard to
get this to expend O(N^2) time with N entries. I think that for
acceptable performance, it'll be necessary to track total usage
incrementally instead of doing it this way.
			regards, tom lane

diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index abc3062..210c049 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -431,6 +431,37 @@ CompleteCachedPlan(CachedPlanSource *plansource,
 	plansource->is_complete = true;
 	plansource->is_valid = true;
+
+	{
+		MemoryContextCounters counters;
+		MemoryContext context;
+		Size		csize,
+					qcsize = 0,
+					gpsize = 0;
+
+		memset(&counters, 0, sizeof(counters));
+		context = plansource->context;
+		context->methods->stats(context, NULL, NULL, &counters);
+		csize = counters.totalspace;
+
+		if (plansource->query_context)
+		{
+			memset(&counters, 0, sizeof(counters));
+			context = plansource->query_context;
+			context->methods->stats(context, NULL, NULL, &counters);
+			qcsize = counters.totalspace;
+		}
+
+		if (plansource->gplan)
+		{
+			memset(&counters, 0, sizeof(counters));
+			context = plansource->gplan->context;
+			context->methods->stats(context, NULL, NULL, &counters);
+			gpsize = counters.totalspace;
+		}
+		elog(LOG, "CompleteCachedPlan: %zu context, %zu query_context, %zu gplan",
+			 csize, qcsize, gpsize);
+	}
 }
 
 /*
@@ -1188,6 +1219,38 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
 			/* Update generic_cost whenever we make a new generic plan */
 			plansource->generic_cost = cached_plan_cost(plan, false);
+
+			{
+				MemoryContextCounters counters;
+				MemoryContext context;
+				Size		csize,
+							qcsize = 0,
+							gpsize = 0;
+
+				memset(&counters, 0, sizeof(counters));
+				context = plansource->context;
+				context->methods->stats(context, NULL, NULL, &counters);
+				csize = counters.totalspace;
+
+				if (plansource->query_context)
+				{
+					memset(&counters, 0, sizeof(counters));
+					context = plansource->query_context;
+					context->methods->stats(context, NULL, NULL, &counters);
+					qcsize = counters.totalspace;
+				}
+
+				if (plansource->gplan)
+				{
+					memset(&counters, 0, sizeof(counters));
+					context = plansource->gplan->context;
+					context->methods->stats(context, NULL, NULL, &counters);
+					gpsize = counters.totalspace;
+				}
+				elog(LOG, "GetCachedPlan: %zu context, %zu query_context, %zu gplan",
+					 csize, qcsize, gpsize);
+			}
+
 			/*
 			 * If, based on the now-known value of generic_cost, we'd not have
 			 * chosen to use a generic plan, then forget it and make a custom
I wrote:
> As far as the crash issue is concerned, I notice that right now the
> cfbot is showing green for this patch, but that seems to just be
> because the behavior is unstable. I see crashes in "make
> installcheck-parallel" about 50% of the time with this patch applied.
> Since, in fact, the patch is not supposed to be doing anything at all
> with prepared_statement_limit set to zero, that suggests sloppiness in
> the refactoring that was done to separate out the resource-freeing
> code.

Oh ... actually, I bet the problem is that the patch thinks it's okay
to immediately free space in PlanCacheRelCallback and friends, rather
than just marking invalid entries as invalid. Nope, you cannot do that.
You can't tell whether that data is being examined in an outer call
level.

In principle I think you could decrement-the-refcount-and-possibly-free
right away on the gplan, since any outside uses of that ought to have
their own refcounts. But the query_context data isn't refcounted.
Also, some of the failures I saw looked like premature deletion of a
plan, not query_context data, so the patch seems to be too aggressive
on that side too.

			regards, tom lane
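For context, the decrement-the-refcount-and-possibly-free pattern Tom alludes to for the gplan would look roughly like the sketch below. It is paraphrased rather than copied from plancache.c and the function name is invented; it only illustrates why the refcounted gplan can be dropped eagerly while the non-refcounted query_context cannot.

/*
 * Sketch of the "decrement refcount, free only at zero" pattern for the
 * generic plan; paraphrased, not copied from plancache.c.
 */
static void
DropGenericPlanIfUnused(CachedPlanSource *plansource)
{
    CachedPlan *plan = plansource->gplan;

    if (plan == NULL)
        return;

    /* Detach first, so the plansource no longer points at it. */
    plansource->gplan = NULL;

    /*
     * Outside users (e.g. a portal currently executing the plan) hold
     * their own references, so the memory goes away only when the last
     * reference is released.  query_context has no such refcount, which
     * is why it must not be freed from within an invalidation callback.
     */
    Assert(plan->refcount > 0);
    if (--plan->refcount == 0)
        MemoryContextDelete(plan->context);
}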
Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
On Tue, Sep 03, 2019 at 06:30:32PM -0400, Tom Lane wrote:
> Oh ... actually, I bet the problem is that the patch thinks it's okay
> to immediately free space in PlanCacheRelCallback and friends, rather
> than just marking invalid entries as invalid. Nope, you cannot do that.
> You can't tell whether that data is being examined in an outer call
> level.
>
> In principle I think you could decrement-the-refcount-and-possibly-free
> right away on the gplan, since any outside uses of that ought to have
> their own refcounts. But the query_context data isn't refcounted.
> Also, some of the failures I saw looked like premature deletion of a
> plan, not query_context data, so the patch seems to be too aggressive
> on that side too.

We are a couple of months after this update, and the patch has not been
updated, so I am marking it as returned with feedback.
--
Michael