Re: tweaking NTUP_PER_BUCKET
From: Robert Haas
Subject: Re: tweaking NTUP_PER_BUCKET
Date:
Msg-id: CA+TgmoZRWuAxAKzodDEjRUJyrcFVGhuE9_LAvexzx8ffWrUJwA@mail.gmail.com
In reply to: Re: tweaking NTUP_PER_BUCKET (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: tweaking NTUP_PER_BUCKET
           Re: tweaking NTUP_PER_BUCKET
List: pgsql-hackers
On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> Thinking about this a bit more, do we really need to build the hash
> table on the first pass? Why not to do this:
>
> (1) batching
>     - read the tuples, stuff them into a simple list
>     - don't build the hash table yet
>
> (2) building the hash table
>     - we have all the tuples in a simple list, batching is done
>     - we know exact row count, can size the table properly
>     - build the table

We could do this, and in fact we could save quite a bit of memory if
we allocated say 1MB chunks and packed the tuples in tightly instead
of palloc-ing each one separately.  But I worry that rescanning the
data to build the hash table would slow things down too much.

> Also, maybe we could use a regular linear hash table [1], instead of
> using the current implementation with NTUP_PER_BUCKET=1. (Although,
> that'd be absolutely awful with duplicates.)

Linear probing is pretty awful unless your load factor is << 1.
You'd probably want NTUP_PER_BUCKET=0.25, or something like that,
which would eat up a lot of memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
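
To make the 1MB-chunk idea above concrete, here is a minimal sketch of
packing tuples back to back into large chunks instead of palloc-ing each
one separately.  It is purely illustrative: the names TupleChunk and
chunk_store_tuple are hypothetical and this is not PostgreSQL's actual
hash-join code.  The saving comes from avoiding a per-allocation chunk
header and palloc's rounding of small requests up to a power of two.

    #include <stdlib.h>
    #include <string.h>

    /*
     * Hypothetical sketch: pack tuples densely into 1MB chunks kept in a
     * simple list, instead of palloc-ing each tuple separately.  Assumes
     * no single tuple is larger than CHUNK_DATA_SIZE.
     */
    #define CHUNK_DATA_SIZE (1024 * 1024)   /* 1MB of payload per chunk */

    typedef struct TupleChunk
    {
        struct TupleChunk *next;    /* chunks form a singly-linked list */
        size_t      used;           /* bytes of data[] already consumed */
        char        data[CHUNK_DATA_SIZE];
    } TupleChunk;

    /* Copy one tuple into the current chunk, starting a new chunk if needed. */
    static char *
    chunk_store_tuple(TupleChunk **head, const void *tuple, size_t len)
    {
        TupleChunk *chunk = *head;
        char       *dest;

        if (chunk == NULL || chunk->used + len > CHUNK_DATA_SIZE)
        {
            chunk = malloc(sizeof(TupleChunk));
            if (chunk == NULL)
                return NULL;
            chunk->next = *head;
            chunk->used = 0;
            *head = chunk;
        }

        dest = chunk->data + chunk->used;
        memcpy(dest, tuple, len);
        chunk->used += len;
        return dest;
    }

The second pass that builds the hash table would then just walk the chunk
list in order, which is the rescanning cost being weighed above.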
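
On the load-factor point: the classical textbook estimates for linear
probing (Knuth, TAOCP vol. 3, assuming reasonably uniform hash values)
show why the table has to be kept mostly empty.  These are the standard
approximations for the expected number of probes, not anything specific
to PostgreSQL:

    \[
      C_{\mathrm{hit}}(\alpha)  \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
      \qquad
      C_{\mathrm{miss}}(\alpha) \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right)
    \]

At a load factor of alpha = 0.25 (roughly the NTUP_PER_BUCKET=0.25 figure
above) an unsuccessful lookup averages about 1.4 probes, while at
alpha = 0.8 it already averages about 13, so keeping probe counts low
means allocating around four bucket slots per tuple.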