Re: tweaking NTUP_PER_BUCKET
From:        Tomas Vondra
Subject:     Re: tweaking NTUP_PER_BUCKET
Date:
Msg-id:      53C2DECC.2010408@fuzzy.cz
In reply to: Re: tweaking NTUP_PER_BUCKET (Tomas Vondra <tv@fuzzy.cz>)
Responses:   Re: tweaking NTUP_PER_BUCKET
List:        pgsql-hackers
On 11.7.2014 19:25, Tomas Vondra wrote:
> 2) walking through the tuples sequentially
> ------------------------------------------
>
> The other option is not to walk the tuples through buckets, but by
> walking through the chunks - we know the tuples are stored as
> HashJoinTuple/MinimalTuple, so it shouldn't be that difficult. And by
> walking the tuples in the order they're stored, we may just rearrange
> the tuples in already allocated memory. And maybe it could be even
> faster than the current code, because of the sequential access to the
> memory (as opposed to the random access through buckets) and CPU
> caches. So I like this approach - it's simple, clean and shouldn't
> add any overhead (memory or CPU).

So, attached is a patch that implements this. It's not very complicated
and the resulting code is surprisingly readable (IMHO comparable to the
previous code). Basics of the patch:

(1) adds a HashChunkData structure - a buffer used for dense allocation
    of tuples, 16kB by default

(2) adds 'chunks_batch' to HashJoinTableData, a linked list of
    HashChunkData chunks

(3) the allocation itself is done in chunk_alloc - in short, it either
    allocates the tuple in the current chunk, or adds a new chunk when
    the current one is full (see the first sketch below)

(4) ExecHashTableInsert calls chunk_alloc (instead of the regular
    MemoryContextAlloc)

(5) the main change is in ExecHashIncreaseNumBatches, which can no
    longer pfree individual tuples (that doesn't work with dense
    allocation), so it reads the tuples from the chunks directly and
    simply rebuilds the buckets from scratch (see the second sketch
    below) - thanks to this we only need one extra chunk of memory,
    not a complete copy, and we don't need to duplicate the buckets
    at all

From the tests I've done so far this seems to work perfectly - it still
saves the memory overhead, and I've seen no difference in performance.

The current patch only implements this for tuples in the main hash
table, not for the skew buckets. I plan to do that too, but it will
require separate chunks for each skew bucket (so that one bucket can be
removed without messing with all the others). The chunks for skew
buckets should be smaller (I'm thinking ~4kB), because there will be
more of them, but OTOH those buckets hold the frequent values, so the
chunks should not remain empty. But let's see if the current patch
misses something important first.

regards
Tomas
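
To make points (1)-(3) concrete, here is a minimal self-contained
sketch of the dense allocation scheme. It is not the patch itself:
only the names HashChunkData, chunk_alloc and the 16kB default come
from the description above; the field names, the plain malloc (instead
of an allocation in the hash table's memory context) and the omitted
alignment rounding and error handling are simplifications made for
this example.

    #include <stddef.h>
    #include <stdlib.h>

    #define HASH_CHUNK_SIZE (16 * 1024)     /* 16kB default chunk size */

    typedef struct HashChunkData
    {
        struct HashChunkData *next; /* next chunk in the chunks_batch list */
        size_t maxlen;              /* size of the data[] buffer */
        size_t used;                /* bytes already handed out */
        char   data[];              /* tuples are copied densely in here */
    } HashChunkData;

    /*
     * Return a 'size'-byte slot from the current chunk, or start a new
     * chunk when the current one doesn't have enough free space left.
     * Requests larger than HASH_CHUNK_SIZE get a chunk of their own.
     */
    static void *
    chunk_alloc(HashChunkData **chunks, size_t size)
    {
        HashChunkData *chunk = *chunks;
        size_t chunklen = (size > HASH_CHUNK_SIZE) ? size : HASH_CHUNK_SIZE;

        if (chunk == NULL || chunk->used + size > chunk->maxlen)
        {
            /* current chunk full (or none yet) - push a fresh one */
            chunk = malloc(offsetof(HashChunkData, data) + chunklen);
            chunk->maxlen = chunklen;
            chunk->used = 0;
            chunk->next = *chunks;
            *chunks = chunk;
        }

        chunk->used += size;
        return chunk->data + chunk->used - size;
    }

The point of the list is that releasing all the tuples means freeing a
handful of whole chunks instead of pfree-ing each tuple individually.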
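
And a sketch of the batch-split loop from (5), in the same toy model
(it reuses HashChunkData/chunk_alloc from the first sketch). Again,
this is an illustration, not the patch: the Tuple layout with an
embedded length and hash, the modulo-based bucket/batch computation,
and the save_to_batch_file stub are stand-ins for the actual
HashJoinTuple/MinimalTuple layout, ExecHashGetBucketAndBatch and the
batch temp files; alignment rounding is again omitted.

    #include <string.h>

    typedef struct Tuple
    {
        struct Tuple *next;     /* bucket chain link */
        unsigned int  hash;     /* join hash value */
        size_t        len;      /* total entry size, header included */
        /* tuple payload follows */
    } Tuple;

    typedef struct HashTable
    {
        HashChunkData *chunks;  /* dense storage ('chunks_batch' above) */
        Tuple        **buckets;
        int            nbuckets;
        int            nbatch;
        int            curbatch;
    } HashTable;

    /* stub: the real code writes the tuple to a batch temp file */
    static void save_to_batch_file(HashTable *ht, Tuple *t, int batchno);

    /* Double the number of batches, rebuilding the buckets by walking
     * the chunks sequentially instead of walking the bucket chains. */
    static void
    increase_num_batches(HashTable *ht)
    {
        HashChunkData *old = ht->chunks;

        ht->chunks = NULL;      /* kept tuples go into fresh chunks */
        ht->nbatch *= 2;
        memset(ht->buckets, 0, ht->nbuckets * sizeof(Tuple *));

        while (old != NULL)
        {
            HashChunkData *next = old->next;

            for (size_t pos = 0; pos < old->used; )
            {
                Tuple  *t = (Tuple *) (old->data + pos);
                size_t  len = t->len;
                int     bucketno = t->hash % ht->nbuckets;
                int     batchno = (t->hash / ht->nbuckets) % ht->nbatch;

                if (batchno == ht->curbatch)
                {
                    /* tuple stays: copy into new chunks, relink bucket */
                    Tuple *copy = chunk_alloc(&ht->chunks, len);

                    memcpy(copy, t, len);
                    copy->next = ht->buckets[bucketno];
                    ht->buckets[bucketno] = copy;
                }
                else
                    save_to_batch_file(ht, t, batchno);

                pos += len;
            }

            /* the whole chunk is released at once - no per-tuple pfree */
            free(old);
            old = next;
        }
    }

Because each old chunk is freed as soon as its tuples have been copied
or spilled, the rebuild needs at most one extra chunk of memory at any
moment, which is what makes this cheaper than duplicating the table.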
Attachments