Re: Hash id in pg_stat_statements
От | Magnus Hagander |
---|---|
Тема | Re: Hash id in pg_stat_statements |
Дата | |
Msg-id | CABUevEwW0iQs1kY8qfh6h=01AHWGeemEn=Cr_Wc6+PEKf6bS5Q@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Hash id in pg_stat_statements (Peter Geoghegan <peter@2ndquadrant.com>) |
Ответы |
Re: Hash id in pg_stat_statements
|
Список | pgsql-hackers |
On Tue, Oct 2, 2012 at 8:22 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 2 October 2012 18:16, Tom Lane <tgl@sss.pgh.pa.us> wrote >> 1. Why isn't something like md5() on the reported query text an equally >> good solution for users who want a query hash? > > Because that does not uniquely identify the entry. The very first > thing that the docs say on search_path is "Qualified names are tedious > to write, and it's often best not to wire a particular schema name > into applications anyway". Presumably, the reason it's best not to > wire schema names into apps is because it might be useful to modify > search_path in a way that dynamically made the same queries in some > application reference what are technically distinct relations. If > anyone does this, and it seems likely that many do for various > reasons, they will be out of luck when using some kind of > pg_stat_statements aggregation. > > This was the behaviour that I intended for pg_stat_statements all > along, and I think it's better than a solution that matches query > strings. > >> 2. If people are going to accumulate stats on queries over a long period >> of time, is a 32-bit hash really good enough for the purpose? If I'm >> doing the math right, the chance of collision is already greater than 1% >> at 10000 queries, and rises to about 70% for 100000 queries; see >> http://en.wikipedia.org/wiki/Birthday_paradox >> We discussed this issue and decided it was okay for pg_stat_statements's >> internal hash table, but it's not at all clear to me that it's sensible >> to use 32-bit hashes for external accumulation of query stats. > > Well, forgive me for pointing this out, but I did propose that the > hash be a 64-bit value (which would have necessitated adopting > hash_any() to produce 64-bit values), but you rejected the proposal. I > arrived at the same probability for a collision as you did and posted > in to the list, in discussion shortly after the normalisation stuff > was committed. What was the argument for rejecting it? Just that it required hash_any() to be adapted? Now that we have a very clear use case where this would help, perhaps it's time to re-visit this proposal? --Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
В списке pgsql-hackers по дате отправления: