Re: Hash Function: MD5 or other?
| От | Alex Stapleton |
|---|---|
| Тема | Re: Hash Function: MD5 or other? |
| Дата | |
| Msg-id | 34DEB1AE-95EE-4462-8E51-9FEE07A9C565@advfn.com обсуждение исходный текст |
| Ответ на | Hash Function: MD5 or other? (Peter Fein <pfein@pobox.com>) |
| Ответы |
Re: Hash Function: MD5 or other?
|
| Список | pgsql-general |
On 13 Jun 2005, at 23:49, Peter Fein wrote: > Hi- > > I wanted to use a partially unique index (dependent on a flag) on a > TEXT > column, but the index row size was too big for btrees. See the thread > "index row size 2728 exceeds btree maximum, 2713" from the > beginning of > this month for someone with a similar problem. In it, someone > suggests > indexing on a hash of the text. I'm fine with this, as the texts in > question are similar enough to each other to make collisions unlikely > and a collision won't really cause any serious problems. > > My question is: is the builtin MD5 appropriate for this use or > should I > be using a function from pl/something? Figures on collision rates > would > be nice as well - the typical chunk of text is probably 1k-8k. > > Thanks! > As others have said MD5 isn't the fastest one out there. However no cryptographically secure hashes are really that fast. However you can probably get away with using a CRC hash which is long enough to reduce your chances of collision a lot. However, PostgreSQL doesn't have a built in CRC function, which is a bit of a pain unless your prepared to implement one, or use pl/* to do it, which sounds like overkill. I suggest you run some benchmarks on MD5 and see if it's fast enough to meet your current (and perhaps future) needs. You could of course, just use a hash index on your text field! I think that would probably cope with larger data sets OK. It has the advantage of handling collisions for you as well :) However it means you have to transfer large amounts of data around, so if network speed ever becomes a limitation, MD5 hashing (or enabling compression on your PgSQL connection) may help. > -- > Peter Fein pfein@pobox.com > 773-575-0694 > > Basically, if you're not a utopianist, you're a schmuck. -J. Feldman > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > > >
В списке pgsql-general по дате отправления: