Re: Enum proposal / design

From	Tom Dunstan
Subject	Re: Enum proposal / design
Date	16 August 2006, 14:17:57
Msg-id	44E3531E.4090103@tomd.cc
In reply to	Re: Enum proposal / design (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Enum proposal / design
List	pgsql-hackers

Tom Lane wrote:
> Tom Dunstan <pgsql@tomd.cc> writes:
>>On disk, enums will occupy 4 bytes: the high 22 bits will be an enum
>>identifier, with the bottom 10 bits being the enum value. This allows
>>1024 values for a given enum, and 2^22 different enum types, both of
>>which should be heaps. The exact distribution of bits doesn't matter all 
>>that much, we just picked some that we were comfortable with.
> 
> 
> I think this is excessive concern for bit-shaving.  Make the on-disk
> representation be 8 bytes instead of 4, then you can store the OID
> directly and have no need for the separate identifier concept.  This
> in turn eliminates one index, one syscache, and one set of lookup/cache
> routines.  And you can have as many values of an enum as you darn please.

That's all true. It's a bit depressing to think that IMO 99% of users of 
this will have enum values whose range would fit into 1 byte, but we'll 
be using 8 to store it on disk. I had convinced myself that 4 was ok on 
the basis that alignment issues in surrounding columns would pad out the 
remaining bits anyway much of the time. Was I correct in that 
assumption? Would e.g. an int after a char require 3 bytes of padding?
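
As a rough illustration of the padding question, here is a minimal C sketch. It only shows how a C compiler lays out a 1-byte field followed by a 4-byte field; it is an analogy for on-disk column alignment, not PostgreSQL's actual tuple-forming code, and the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 1-byte field followed by a 4-byte field. */
struct char_then_int {
    char    flag;   /* occupies 1 byte */
    int32_t value;  /* must start on an int32_t alignment boundary */
};

/* The compiler guarantees that value is aligned for int32_t. */
static_assert(offsetof(struct char_then_int, value) % _Alignof(int32_t) == 0,
              "value must be aligned for int32_t");

/* Bytes of padding inserted between flag and value. */
static size_t padding_before_value(void)
{
    return offsetof(struct char_then_int, value) - sizeof(char);
}
```

On a platform where `int32_t` has 4-byte alignment, `padding_before_value()` returns 3, which is the case the question above describes.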

Ok, I'll run one more idea up the flagpole before giving up on a 4 byte 
on disk representation. :) How about assigning a unique 4 byte id to 
each enum value, and storing that on disk? This would be unique across 
the database, not per enum type. The structure of pg_enum would be a bit 
different, as the per-type enum id would be gone, and there would be 
multiple rows for each enum type. The columns would be: the type oid, 
the associated unique id and the textual representation. That would 
probably simplify the caching mechanism as well, since input function 
lookups could do a straight syscache lookup on type oid and text 
representation, and the output function could do a straight lookup on 
the unique id. No need to muck around creating a little dynahash or 
whatever to attach to the fn_extra pointer.

It does still require the extra syscache, but it removes the limitations 
on number of enum types and number of values per type while keeping the 
on disk size smallish. I like that better than the original idea, actually.
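
To make the proposed layout concrete, here is a small C sketch of the two lookups. Everything in it is hypothetical: the struct, the function names, the sample OIDs and ids are invented for illustration, and a linear scan over a static array stands in for what would really be syscache lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row per enum value (hypothetical shape of a pg_enum row). */
typedef struct EnumRow {
    uint32_t    type_oid;  /* OID of the enum type the value belongs to */
    uint32_t    value_id;  /* 4-byte id, unique across the database; stored on disk */
    const char *label;     /* textual representation */
} EnumRow;

/* Sample data: two enum types, several rows each. */
static const EnumRow enum_rows[] = {
    {16384, 1001, "red"},
    {16384, 1002, "green"},
    {16384, 1003, "blue"},
    {16390, 1004, "small"},
    {16390, 1005, "red"},  /* same label, different type, so a different id */
};
#define N_ENUM_ROWS (sizeof(enum_rows) / sizeof(enum_rows[0]))

/* Input direction: (type oid, label) -> value id. Returns 0 if no row matches. */
static uint32_t enum_lookup_id(uint32_t type_oid, const char *label)
{
    assert(label != NULL);
    for (size_t i = 0; i < N_ENUM_ROWS; i++) {
        if (enum_rows[i].type_oid == type_oid &&
            strcmp(enum_rows[i].label, label) == 0)
            return enum_rows[i].value_id;
    }
    return 0;
}

/* Output direction: value id -> label. Returns NULL if the id is unknown. */
static const char *enum_lookup_label(uint32_t value_id)
{
    for (size_t i = 0; i < N_ENUM_ROWS; i++) {
        if (enum_rows[i].value_id == value_id)
            return enum_rows[i].label;
    }
    return NULL;
}
```

The output lookup needs only the stored id, because the id alone identifies both the type and the value. The input lookup needs the type oid as well, since two enum types may share a label.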

> If you didn't notice already: typcache is the place to put any
> type-related caching you need to add.

I hadn't. I'll investigate. Thanks.

Cheers

Tom
