Re: Number of buckets in a hash join
From | Tom Lane
---|---
Subject | Re: Number of buckets in a hash join
Date | 
Msg-id | 17849.1359392285@sss.pgh.pa.us
Reply to | Number of buckets in a hash join (Heikki Linnakangas <hlinnakangas@vmware.com>)
List | pgsql-hackers
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> The first question is, why do we aim at 10 tuples per bucket?

I see nothing particularly wrong with that. The problem here is with
having 1000 tuples per bucket.

> Ideally, the planner would always make a good guess the number of rows,
> but for the situations that it doesn't, it would be good if the hash
> table was enlarged if it becomes too full.

Yeah, possibly. The proposed test case actually doesn't behave very
badly if work_mem is small, because there is logic in there to adjust
the number of batches. You didn't say what work_mem you're testing at,
but it's clearly more than the default 1MB. I think the issue arises
if the initial estimate of hashtable size is a good bit less than
work_mem, so the number of buckets is set to something a good bit less
than what would be optimal if we're using more of work_mem.

This seems a little reminiscent of what we did recently in tuplesort
to make better use of work_mem --- in both cases we have to choose a
pointer-array size that will make best use of work_mem after the
tuples themselves are added.

			regards, tom lane
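[Editor's note: to make the failure mode concrete, here is a minimal standalone C sketch of the sizing behavior under discussion. The bucket count is derived from the planner's row estimate at a fixed target of tuples per bucket, so a bad underestimate leaves the bucket chains far longer than intended. The constant name, the power-of-two rounding, and the helper functions are illustrative assumptions, not the actual ExecChooseHashTableSize code.]

```c
#include <stdio.h>

/* Illustrative target of tuples per bucket, echoing the "10 tuples per
 * bucket" figure from the thread; the name is an assumption, not the
 * real PostgreSQL definition. */
#define NTUP_PER_BUCKET 10

/* Round up to the next power of 2, as hash tables commonly require. */
static unsigned int
next_pow2(unsigned int n)
{
    unsigned int p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Sketch of choosing a bucket count from the planner's row estimate.
 * If est_rows is badly underestimated, nbuckets comes out far too
 * small and the average chain length balloons at execution time. */
static unsigned int
choose_nbuckets(double est_rows)
{
    double target = est_rows / NTUP_PER_BUCKET;

    if (target < 1)
        target = 1;
    return next_pow2((unsigned int) target);
}

int
main(void)
{
    double est_rows = 1000;       /* what the planner guessed */
    double actual_rows = 100000;  /* what actually showed up */
    unsigned int nbuckets = choose_nbuckets(est_rows);

    printf("buckets chosen: %u\n", nbuckets);
    printf("chain length at plan time: %.1f tuples/bucket\n",
           est_rows / nbuckets);
    printf("chain length at run time:  %.1f tuples/bucket\n",
           actual_rows / nbuckets);
    return 0;
}
```

[With the estimate of 1000 rows this picks 128 buckets, about 8 tuples per bucket; if 100000 rows actually arrive, each chain averages roughly 780 tuples, which is the pathological case the message describes.]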