Re: Hash id in pg_stat_statements
From | Tom Lane |
---|---|
Subject | Re: Hash id in pg_stat_statements |
Date | |
Msg-id | 9844.1349198176@sss.pgh.pa.us |
In reply to | Re: Hash id in pg_stat_statements (Stephen Frost <sfrost@snowman.net>) |
Responses | Re: Hash id in pg_stat_statements |
List | pgsql-hackers |
Stephen Frost <sfrost@snowman.net> writes:
> * Peter Geoghegan (peter@2ndquadrant.com) wrote:
>> I simply do not understand objections to the proposal. Have I missed something?
>
> It was my impression that the concern is the stability of the hash value
> and ensuring that tools which operate on it don't mistakenly lump two
> different queries into one because they had the same hash value (caused
> by a change in our hashing algorithm or input into it over time, eg a
> point release). I was hoping to address that to allow this proposal to
> move forward..

I think there are at least two questions that ought to be answered:

1. Why isn't something like md5() on the reported query text an equally good solution for users who want a query hash?

2. If people are going to accumulate stats on queries over a long period of time, is a 32-bit hash really good enough for the purpose? If I'm doing the math right, the chance of collision is already greater than 1% at 10000 queries, and rises to about 70% for 100000 queries; see http://en.wikipedia.org/wiki/Birthday_paradox

We discussed this issue and decided it was okay for pg_stat_statements's internal hash table, but it's not at all clear to me that it's sensible to use 32-bit hashes for external accumulation of query stats.

regards, tom lane
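The collision figures in point 2 can be checked with the standard birthday-problem approximation, p ≈ 1 − exp(−n(n−1) / (2·2^b)) for n distinct queries and a b-bit hash. The Python sketch below is only an illustration under that assumption (hash values treated as uniformly distributed); the helper names `collision_probability` and `query_hash` are hypothetical and not part of pg_stat_statements. `query_hash` mirrors the md5() idea from point 1 using Python's `hashlib`, giving a 128-bit digest of the query text.

```python
import hashlib
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that at least two of n distinct queries share a hash.

    Hypothetical helper using the birthday-problem approximation
        p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits)),
    which assumes hash values are uniformly distributed.
    """
    space = 2 ** bits
    exponent = -n * (n - 1) / (2 * space)
    # expm1 keeps precision when the probability is tiny (e.g. 128-bit hashes).
    return -math.expm1(exponent)


def query_hash(query_text: str) -> str:
    """Hypothetical helper: 128-bit MD5 digest of the reported query text."""
    return hashlib.md5(query_text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for n in (10_000, 100_000):
        for bits in (32, 128):
            p = collision_probability(n, bits)
            print(f"{n:>7} queries, {bits:>3}-bit hash: p(collision) ~ {p:.3g}")
    print(query_hash("SELECT * FROM t WHERE id = ?"))
```

With a 32-bit hash this gives roughly 1.2% at 10000 queries and roughly 69% at 100000, in line with the figures above, while a 128-bit digest stays negligibly small at both sizes.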