Обсуждение: Assert in pg_stat_statements

Поиск
Список
Период
Сортировка

Assert in pg_stat_statements

От
Satoshi Nagayasu
Дата:
Hi,

I just found that pg_stat_statements causes assert when queryId is
set by other module, which is loaded prior to pg_stat_statements in
the shared_preload_libraries parameter.

Theoretically, queryId in the shared memory could be set by other
module by design.

So, IMHO, pg_stat_statements should not cause assert even if queryId
already has non-zero value --- my module has this problem now.

Is my understanding correct? Any comments?

Regards,

--
NAGAYASU Satoshi <snaga@uptime.jp>

Вложения

Re: Assert in pg_stat_statements

От
Satoshi Nagayasu
Дата:
On 2015/08/08 22:32, Robert Haas wrote:
> On Sat, Aug 8, 2015 at 5:00 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> I just found that pg_stat_statements causes assert when queryId is
>> set by other module, which is loaded prior to pg_stat_statements in
>> the shared_preload_libraries parameter.
>>
>> Theoretically, queryId in the shared memory could be set by other
>> module by design.
>>
>> So, IMHO, pg_stat_statements should not cause assert even if queryId
>> already has non-zero value --- my module has this problem now.
>>
>> Is my understanding correct? Any comments?
>
> Seems like an ERROR would be more appropriate than an Assert.

Well, it's not that simple, and I'm afraid that it may not fix
the issue.

Here, let's assume that we have two modules (foo and pg_stat_statements)
which calculate, use and store Query->queryId independently.

What we may see when two are enabled is following.

(1) Enable two modules in shared_preload_libraries.
    shared_preload_libraries = 'foo,pg_stat_statements'

(2) Once a query comes in, a callback function in module "foo"    associated with post_parse_analyze_hook calculates
andstore    queryId in Query->queryId.
 

(3) After that, a callback function (pgss_post_parse_analyze) in    "pg_stat_statements" associated with
post_parse_analyze_hook   detects that Query->queryId has non-zero value, and asserts it.
 

As a result, we can not have two or more modules that use queryId
at the same time.

But I think it should be possible because one of the advantages of
PostgreSQL is its extensibility.

So, I think the problem here is the fact that pg_stat_statements
deals with having non-zero value in queryId as "error" even if
it has exact the same value with other module.

Regards,
-- 
NAGAYASU Satoshi <snaga@uptime.jp>



Re: Assert in pg_stat_statements

От
Robert Haas
Дата:
On Sun, Aug 9, 2015 at 1:36 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> On 2015/08/08 22:32, Robert Haas wrote:
>> On Sat, Aug 8, 2015 at 5:00 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>>>
>>> I just found that pg_stat_statements causes assert when queryId is
>>> set by other module, which is loaded prior to pg_stat_statements in
>>> the shared_preload_libraries parameter.
>>>
>>> Theoretically, queryId in the shared memory could be set by other
>>> module by design.
>>>
>>> So, IMHO, pg_stat_statements should not cause assert even if queryId
>>> already has non-zero value --- my module has this problem now.
>>>
>>> Is my understanding correct? Any comments?
>>
>>
>> Seems like an ERROR would be more appropriate than an Assert.
>
>
> Well, it's not that simple, and I'm afraid that it may not fix
> the issue.
>
> Here, let's assume that we have two modules (foo and pg_stat_statements)
> which calculate, use and store Query->queryId independently.
>
> What we may see when two are enabled is following.
>
> (1) Enable two modules in shared_preload_libraries.
>
>     shared_preload_libraries = 'foo,pg_stat_statements'
>
> (2) Once a query comes in, a callback function in module "foo"
>     associated with post_parse_analyze_hook calculates and store
>     queryId in Query->queryId.
>
> (3) After that, a callback function (pgss_post_parse_analyze) in
>     "pg_stat_statements" associated with post_parse_analyze_hook
>     detects that Query->queryId has non-zero value, and asserts it.
>
> As a result, we can not have two or more modules that use queryId
> at the same time.
>
> But I think it should be possible because one of the advantages of
> PostgreSQL is its extensibility.
>
> So, I think the problem here is the fact that pg_stat_statements
> deals with having non-zero value in queryId as "error" even if
> it has exact the same value with other module.

I'm not too excited about supporting the use case where there are two
people using queryId but it just so happens that they always set
exactly the same value.  That seems like a weird setup.  Wouldn't that
mean both modules were applying the same jumble algorithm, and
therefore you're doing the work of computing the jumble twice per
query?  It would be better to have the second module have an option
not to compute the query ID and just do whatever else it does.  Then,
if you want to use that other module with pg_stat_statements, flip the
GUC.

The reason I think an Assert is wrong is that if there are other
modules using queryId, we should catch attempts to use the queryId for
more than one purpose even in non-assert enabled builds.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Assert in pg_stat_statements

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm not too excited about supporting the use case where there are two
> people using queryId but it just so happens that they always set
> exactly the same value.  That seems like a weird setup.  Wouldn't that
> mean both modules were applying the same jumble algorithm, and
> therefore you're doing the work of computing the jumble twice per
> query?

Not only that, but you'd need to keep the two modules in sync, which
would be one of the greatest recipes for bugs we've thought of yet.

If there's actually a use case for that sort of thing, I would vote
for moving the jumble-calculation code into core and making both of
the interested extensions do
/* Calculate query hash if it's not been done yet */if (query->queryId == 0)    calculate_query_hash(query);

I note also that pg_stat_statements is using query->queryId == 0 as a
state flag, which means that if some other module has already calculated
the hash that would break it, even if the value were identical.
(In particular, it's using the event of calculating the queryId as an
opportunity to possibly make an entry in its own hashtable.)  So you'd
need to do something to replace that logic if you'd like to use
pg_stat_statements concurrently with some other use of queryId.

> The reason I think an Assert is wrong is that if there are other
> modules using queryId, we should catch attempts to use the queryId for
> more than one purpose even in non-assert enabled builds.

Yeah, an explicit runtime check and elog() would be safer.
        regards, tom lane



Re: Assert in pg_stat_statements

От
Peter Geoghegan
Дата:
On Sun, Aug 9, 2015 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If there's actually a use case for that sort of thing, I would vote
> for moving the jumble-calculation code into core

I think that there'd be a good case for doing that for several other
reasons. It would be great to have a per-queryId
log_min_duration_statement, for example.

Most likely, this would work based on a system of grouping queries.
One grouping might be "SLA queries", that the user hopes to prevent
ever hobbling loading the application's home page. Outliers in this
group of queries are far more interesting to the user than any other,
or expectations are in some other way quite different from the
average. At the same time, maybe for reporting queries 60 seconds is
considered an unreasonably long time to wait.

-- 
Peter Geoghegan



Re: Assert in pg_stat_statements

От
Satoshi Nagayasu
Дата:
2015-08-10 0:04 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> On Sun, Aug 9, 2015 at 1:36 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> On 2015/08/08 22:32, Robert Haas wrote:
>>> On Sat, Aug 8, 2015 at 5:00 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>>>>
>>>> I just found that pg_stat_statements causes assert when queryId is
>>>> set by other module, which is loaded prior to pg_stat_statements in
>>>> the shared_preload_libraries parameter.
>>>>
>>>> Theoretically, queryId in the shared memory could be set by other
>>>> module by design.
>>>>
>>>> So, IMHO, pg_stat_statements should not cause assert even if queryId
>>>> already has non-zero value --- my module has this problem now.
>>>>
>>>> Is my understanding correct? Any comments?
>>>
>>>
>>> Seems like an ERROR would be more appropriate than an Assert.
>>
>>
>> Well, it's not that simple, and I'm afraid that it may not fix
>> the issue.
>>
>> Here, let's assume that we have two modules (foo and pg_stat_statements)
>> which calculate, use and store Query->queryId independently.
>>
>> What we may see when two are enabled is following.
>>
>> (1) Enable two modules in shared_preload_libraries.
>>
>>     shared_preload_libraries = 'foo,pg_stat_statements'
>>
>> (2) Once a query comes in, a callback function in module "foo"
>>     associated with post_parse_analyze_hook calculates and store
>>     queryId in Query->queryId.
>>
>> (3) After that, a callback function (pgss_post_parse_analyze) in
>>     "pg_stat_statements" associated with post_parse_analyze_hook
>>     detects that Query->queryId has non-zero value, and asserts it.
>>
>> As a result, we can not have two or more modules that use queryId
>> at the same time.
>>
>> But I think it should be possible because one of the advantages of
>> PostgreSQL is its extensibility.
>>
>> So, I think the problem here is the fact that pg_stat_statements
>> deals with having non-zero value in queryId as "error" even if
>> it has exact the same value with other module.
>
> I'm not too excited about supporting the use case where there are two
> people using queryId but it just so happens that they always set
> exactly the same value.  That seems like a weird setup.

I think so too, but unfortunately, I have to do that to work
with 9.4 and 9.5.

> Wouldn't that
> mean both modules were applying the same jumble algorithm, and
> therefore you're doing the work of computing the jumble twice per
> query?

Basically yes, because both modules should work not only together,
but also solely. So, my module need to have capability to calculate
queryId itself.

More precisely, if we have following configuration,
 shared_preload_libraries = 'foo,pg_stat_statements'

my module "foo" calculates queryId itself, and pg_stat_statements
would do the same. (and gets crashed when it has --enable-cassert)

If we have following configuration,
 shared_preload_libraries = 'pg_stat_statements,foo'

my module "foo" can skip queryId calculation because
pg_stat_statements has already done that.

Of course, my module "foo" should be able to work solely as following.
 shared_preload_libraries = 'foo'

> It would be better to have the second module have an option
> not to compute the query ID and just do whatever else it does.  Then,
> if you want to use that other module with pg_stat_statements, flip the
> GUC.

Yes, that's exactly what I'm doing in my module, and what I intended
in my first patch for pg_stat_statements on this thread. :)

Regards,
-- 
NAGAYASU Satoshi <snaga@uptime.jp>



Re: Assert in pg_stat_statements

От
Satoshi Nagayasu
Дата:
2015-08-10 2:23 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm not too excited about supporting the use case where there are two
>> people using queryId but it just so happens that they always set
>> exactly the same value.  That seems like a weird setup.  Wouldn't that
>> mean both modules were applying the same jumble algorithm, and
>> therefore you're doing the work of computing the jumble twice per
>> query?
>
> Not only that, but you'd need to keep the two modules in sync, which
> would be one of the greatest recipes for bugs we've thought of yet.
>
> If there's actually a use case for that sort of thing, I would vote
> for moving the jumble-calculation code into core and making both of
> the interested extensions do
>
>         /* Calculate query hash if it's not been done yet */
>         if (query->queryId == 0)
>                 calculate_query_hash(query);

I vote for this too.

Jumble-calculation is the smartest way to identify query groups,
so several modules can take advantages of it if in the core.

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>