Re: Enum proposal / design
From | Tom Dunstan |
---|---|
Subject | Re: Enum proposal / design |
Date | |
Msg-id | 44E3531E.4090103@tomd.cc |
In reply to | Re: Enum proposal / design (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Enum proposal / design |
List | pgsql-hackers |

Tom Lane wrote:
> Tom Dunstan <pgsql@tomd.cc> writes:
>> On disk, enums will occupy 4 bytes: the high 22 bits will be an enum
>> identifier, with the bottom 10 bits being the enum value. This allows
>> 1024 values for a given enum, and 2^22 different enum types, both of
>> which should be heaps. The exact distribution of bits doesn't matter all
>> that much, we just picked some that we were comfortable with.
>
> I think this is excessive concern for bit-shaving. Make the on-disk
> representation be 8 bytes instead of 4, then you can store the OID
> directly and have no need for the separate identifier concept. This
> in turn eliminates one index, one syscache, and one set of lookup/cache
> routines. And you can have as many values of an enum as you darn please.

That's all true. It's a bit depressing to think that IMO 99% of users of this will have enum values whose range would fit into 1 byte, but we'll be using 8 to store it on disk. I had convinced myself that 4 was OK on the basis that alignment issues in surrounding columns would pad out the remaining bits anyway much of the time. Was I correct in that assumption? Would e.g. an int after a char require 3 bytes of padding?

OK, I'll run one more idea up the flagpole before giving up on a 4-byte on-disk representation. :)

How about assigning a unique 4-byte id to each enum value, and storing that on disk? This would be unique across the database, not per enum type. The structure of pg_enum would be a bit different, as the per-type enum id would be gone, and there would be multiple rows for each enum type. The columns would be: the type oid, the associated unique id and the textual representation.

That would probably simplify the caching mechanism as well, since input function lookups could do a straight syscache lookup on type oid and text representation, and the output function could do a straight lookup on the unique id.
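
To make the proposed scheme concrete, here is a minimal sketch in Python (not PostgreSQL's C, and not actual backend code). It models pg_enum as rows of (type oid, unique id, label) and the two lookups described above as plain dictionaries. The names `EnumRow`, `EnumCatalog`, `enum_in` and `enum_out`, the counter used to hand out ids, and the example oids are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class EnumRow:
    """One hypothetical pg_enum row: (type oid, unique id, label)."""
    type_oid: int
    value_id: int
    label: str


class EnumCatalog:
    """Toy stand-in for pg_enum plus the two lookups described above."""

    def __init__(self):
        self._next_id = count(1)  # database-wide id source (assumption)
        self._by_label = {}       # (type_oid, label) -> row, for input
        self._by_id = {}          # value_id -> row, for output

    def create_enum(self, type_oid, labels):
        """Add one row per label; ids are unique across all enum types."""
        for label in labels:
            row = EnumRow(type_oid, next(self._next_id), label)
            self._by_label[(type_oid, label)] = row
            self._by_id[row.value_id] = row

    def enum_in(self, type_oid, label):
        """Input direction: (type oid, label) -> id that would go on disk."""
        return self._by_label[(type_oid, label)].value_id

    def enum_out(self, value_id):
        """Output direction: stored id -> label, no type oid needed."""
        return self._by_id[value_id].label


if __name__ == "__main__":
    catalog = EnumCatalog()
    catalog.create_enum(16384, ["red", "green"])    # example oid, made up
    catalog.create_enum(16385, ["small", "green"])  # same label, other type
    stored = catalog.enum_in(16385, "green")
    print(stored, catalog.enum_out(stored))
```

Because the id alone identifies the row, the same label in two different enum types gets two different stored values, and the output lookup does not need to know which type it is dealing with.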
No need to muck around creating a little dynahash or whatever to attach to the fn_extra pointer. It does still require the extra syscache, but it removes the limitations on number of enum types and number of values per type while keeping the on-disk size smallish. I like that better than the original idea, actually.

> If you didn't notice already: typcache is the place to put any
> type-related caching you need to add.

I hadn't. I'll investigate. Thanks.

Cheers

Tom
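
For reference, the size arithmetic discussed in this message can be sketched as follows, again in Python and only as an illustration. The first two functions model the 22/10-bit split quoted at the top. The last one asks Python's `struct` module how much padding the native C ABI puts between a `char` and an `int` in a struct, which is only an analogy for the column-padding question: PostgreSQL lays out heap tuples according to each type's declared alignment, not according to C struct rules. All function and constant names here are hypothetical.

```python
import struct

VALUE_BITS = 10                     # low bits: value within one enum type
TYPE_BITS = 22                      # high bits: per-type enum identifier
VALUE_MASK = (1 << VALUE_BITS) - 1  # 0x3FF, i.e. 1024 possible values


def pack_enum(enum_id, value):
    """Combine a 22-bit enum identifier and a 10-bit value into 32 bits."""
    if not 0 <= enum_id < (1 << TYPE_BITS):
        raise ValueError("enum identifier does not fit in 22 bits")
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("enum value does not fit in 10 bits")
    return (enum_id << VALUE_BITS) | value


def unpack_enum(packed):
    """Split a packed 32-bit value back into (enum identifier, value)."""
    return packed >> VALUE_BITS, packed & VALUE_MASK


def char_then_int_padding():
    """Padding bytes the native C ABI inserts between a char and an int."""
    padded = struct.calcsize("@ci")    # native alignment
    unpadded = struct.calcsize("=ci")  # standard sizes, no alignment
    return padded - unpadded


if __name__ == "__main__":
    packed = pack_enum(5, 7)
    print(hex(packed), unpack_enum(packed))
    print("padding after char:", char_then_int_padding())
```

The range checks make the limits of the original layout explicit: a 1025th value or a 2^22-th enum type has nowhere to go, which is the restriction the per-value unique id removes.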