Re: Hash id in pg_stat_statements
From: Daniel Farina
Subject: Re: Hash id in pg_stat_statements
Date:
Msg-id: CAAZKuFYmWEpGZjMMEcSjyB3_y--eCWZFJLo=sABz6cZ3gBtVmQ@mail.gmail.com
In reply to: Re: Hash id in pg_stat_statements (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Wed, Oct 3, 2012 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Farina <daniel@heroku.com> writes:
>> Instead, I think it makes sense to assign a number -- arbitrarily, but
>> uniquely -- to the generation of a new row in pg_stat_statements, and,
>> on the flip side, whenever a row is retired its number should be
>> eliminated, practically, for-ever. This way re-introductions between
>> two samplings of pg_stat_statements cannot be confused for a
>> contiguously maintained statistic on a query.
>
> This argument seems sensible to me. Is there any use-case where the
> proposed counter wouldn't do what people wished to do with an exposed
> hash value?

Yes: unless schemas are included in the canonicalized query text, two identical texts can actually mean different things. This is the only corner case (besides collision) I can think of.

I also referenced an idea in a similar vein to the counter/arbitrary number: instead, one can perform a kind of error propagation from evicted entries in the hash table. This might avoid having to even bother saving a counter, which feels rather nice to me. I have a sketch of an implementation (I now know of two stupid bugs in it), and it comes out very neatly so far.

I got this idea from a paper that I saw implemented once, with strikingly good effect:

http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf

There they have a very specific rendition that is, for a couple of reasons, not quite what we want, I think. But the part I found very interesting was the simple propagation of error that made the technique possible.

Inspired by this, I gave every hash entry a maximum under-estimation error. When a member of the hash table is evicted, the minimum of its gathered metric plus its already-accumulated under-estimation error is propagated to the new entry as that entry's error. The general property is that hash table members that are frequently evicted will accumulate huge amounts of error. Those that are never evicted (thanks to constant use) never accumulate any error. A tool reading the table can take the error accumulation into account to determine whether there was an eviction between two samplings, as well as to bound the error accrued between two snapshots. I think there is a tiny bit of room for some sort of ambiguity in a corner case, but I'd have to think more about it.

--
fdr
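The eviction-with-error-propagation idea described above can be sketched roughly as follows. This is a hedged illustration only, not the actual pg_stat_statements patch: the class name, the fixed capacity, and the choice to evict the entry with the smallest (count + error) are all assumptions made for the example.

```python
class StatsTable:
    """Bounded hash table that carries under-estimation error across evictions.

    Each entry is a [count, error] pair. `error` is the maximum amount
    by which `count` may under-estimate the true total, inherited from
    whatever entry was evicted to make room for this one.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # key -> [count, max_underestimation_error]

    def observe(self, key, amount=1):
        if key in self.entries:
            self.entries[key][0] += amount
        elif len(self.entries) < self.capacity:
            # Room available: no eviction, so no inherited error.
            self.entries[key] = [amount, 0]
        else:
            # Evict the entry whose (count + error) is minimal, and
            # propagate that sum to the new entry as its error bound.
            victim = min(self.entries, key=lambda k: sum(self.entries[k]))
            carried = sum(self.entries[victim])
            del self.entries[victim]
            self.entries[key] = [amount, carried]
```

A reader sampling this table twice can conclude that an entry was evicted and re-introduced between the two samplings whenever its error field changed, and the error field bounds how much of the metric may have been lost; entries that are never evicted keep an error of zero, matching the property described above.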