Re: Do we want a hashset type?
От | Tomas Vondra |
---|---|
Тема | Re: Do we want a hashset type? |
Дата | |
Msg-id | c8cb89a0-3cb8-7b70-c01d-4548128d9be3@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Do we want a hashset type? (Andrew Dunstan <andrew@dunslane.net>) |
Список | pgsql-hackers |
On 6/23/23 13:47, Andrew Dunstan wrote: > > On 2023-06-23 Fr 04:23, Joel Jacobson wrote: >> On Fri, Jun 23, 2023, at 08:40, jian he wrote: >>> I played around array_func.c >>> many of the code can be used for multiset data type. >>> now I imagine multiset as something like one dimension array. (nested >>> is somehow beyond the imagination...). >> Are you suggesting it might be a better idea to start over completely >> and work on a new code base that is based on arrayfuncs.c, >> and aim for MULTISET/SET or anyhashset from start, that would not >> only support int4/int8/uuid but any type? >> > > Before we run too far down this rabbit hole, let's discuss the storage > implications of using multisets. ISTM that for small base datums like > integers it will be a substantial increase in size, since you'll need an > addition int for the item count, unless some very clever tricks are played. > I honestly don't quite understand what exactly is meant by the proposal to "reuse array_func.c for multisets". We're implementing sets, not multisets (those were mentioned only to illustrate behavior). And the whole point is that sets are not arrays - no duplicates, ordering does not matter (so no index). I mentioned that maybe we can model sets based on arrays (say, gram.y would do similar stuff for SET[] and ARRAY[], polymorphism), not that we should store sets as arrays. Would it be possible - maybe, if we extend arrays to also maintain some hash hash table. But I'd bet that'll just make arrays more complex, and will make sets slower. Or maybe I just don't understand the proposal. Perhaps it'd be best if jian wrote a patch illustrating the idea, and showing how it performs compared to the current approach. As for the storage size, I don't think an extra "count" field would make any measurable difference. If we're storing a hash table, we're bound to have a couple percent of wasted space due to load factor (likely between 0.75 and 0.9). > As for this older discussion referred to upthread, if the SQL Standards > Committee hasn't acted on it by now it seem reasonable to think they are > unlikely to. > AFAIK multisets are included in SQL 2023, pretty much matching the draft we discussed earlier. Yeah, it's unlikely to change in the future. > Just for reference, Here's some description of Oracle's suport for > Multisets from > <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/Oracle-Support-for-Optional-Features-of-SQLFoundation2011.html#GUID-3BA98AEC-FAAD-4F21-A6AD-F696B5D36D56>: > good to know regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: