Re: B-Tree support function number 3 (strxfrm() optimization)
From: Claudio Freire
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date:
Msg-id: CAGTBQpY4Qbunj+kYc5hRin3jWP4uovmTbgcKW-VY0LKxH9Ggxg@mail.gmail.com
In reply to: Re: B-Tree support function number 3 (strxfrm() optimization) (Peter Geoghegan <pg@heroku.com>)
Responses: Re: B-Tree support function number 3 (strxfrm() optimization)
List: pgsql-hackers
On Mon, Jul 14, 2014 at 2:53 PM, Peter Geoghegan <pg@heroku.com> wrote:
> My concern is that it won't be worth it to do the extra work,
> particularly given that I already have 8 bytes to work with. Supposing
> I only had 4 bytes to work with (as researchers writing [2] may have
> only had in 1994), that would leave me with a relatively small number
> of distinct normalized keys in many representative cases. For example,
> I'd have a mere 40,665 distinct normalized keys in the case of my
> "cities" database, rather than 243,782 (out of a set of 317,102 rows)
> for 8 bytes of storage. But if I double that to 16 bytes (which might
> be taken as a proxy for what a good compression scheme could get me),
> I only get a modest improvement - 273,795 distinct keys. To be fair,
> that's in no small part because there are only 275,330 distinct city
> names overall (and so most dups get away with a cheap memcmp() on
> their tie-breaker), but this is a reasonably organic, representative
> dataset.

Are those numbers measured with Mac's strxfrm? That was the one with suboptimal entropy in the first 8 bytes.