Re: Hash function for numeric (WIP)
От | Neil Conway |
---|---|
Тема | Re: Hash function for numeric (WIP) |
Дата | |
Msg-id | 1178247839.18303.53.camel@goldbach обсуждение исходный текст |
Ответ на | Re: Hash function for numeric (WIP) (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Hash function for numeric (WIP)
|
Список | pgsql-patches |
On Mon, 2007-30-04 at 00:04 -0400, Tom Lane wrote: > I'm still not very comfortable with that. You're proposing to add a > pretty obvious failure mechanism --- any numeric-returning function > that failed to "normalize" its output would now create a subtle, > hard-to-find bug. What about teaching hash_numeric() to explicitly ignore leading and trailing zero digits? > Perhaps a suitable test would be to compare the number of > hash collisions in a large set of randomly-chosen-but-distinct > numeric values. Okay, I did a little testing. I created a table with ~2 million numerics, generated with equal probability by one of the two expressions: random()::numeric * ((random() * 100) ^ 20) * 100 random()::numeric * -100; There were 251 duplicate inputs that I didn't bother eliminating; of course, they should have effected both hash functions identically. Results: neilc=# create temp table r1 as select hash_numeric(a), count(*) from x group by hash_numeric(a); neilc=# create temp table r2 as select old_hash_numeric(a), count(*) from x group by old_hash_numeric(a); neilc=# select count(*) from r1 where count > 1; count -------- 123342 (1 row) neilc=# select count(*) from r2 where count > 1; count ------- 756 (1 row) old_hash_numeric() is the hash_any()-based hash, hash_numeric() is my coding of your suggested hash function (see attached patch). So it seems we need a better hash if we're not going to use hash_any(). The analysis required to write a robust hash function from scratch is precisely why I'd prefer to use hash_any() if possible. -Neil
Вложения
В списке pgsql-patches по дате отправления: