Re: Solving hash table overrun problems
От | Bruno Wolff III |
---|---|
Тема | Re: Solving hash table overrun problems |
Дата | |
Msg-id | 20050304162201.GA22428@wolff.to обсуждение исходный текст |
Ответ на | Re: Solving hash table overrun problems (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Solving hash table overrun problems
|
Список | pgsql-hackers |
On Fri, Mar 04, 2005 at 10:42:08 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruno Wolff III <bruno@wolff.to> writes: > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> * Estimate the number of batches N using the planner's estimate. > >> We will always choose N a power of 2. A tuple's batch number is > >> ((H div K) mod N). > > > If K is way low this could be very slow. > > How so? You're not concerned about the time to do the division itself > are you? No, rather having lots of entries in the same hash buckets. I was thinking about recent discussions were there was a large number of rows with almost all of the keys having just a few values, but there are a lot of unique keys, but analyze doesn't see enough of the unique ones to make a good estimate for K. > >> * Now begin scanning the outer join input. Tuples of batch number > >> zero (according to the current calculation) can be matched to the > >> current hashtable contents. Tuples of higher batch numbers are dropped > >> into holding files for the outer input, one per batch. > > > For new keys at this step do you know their final disposition so that making > > new hash entries won't be necessary? > > Well, we probably have a pretty fair idea of N at this point, so it'll > usually be right --- but we reserve the right to increase N again later > in case we have to do so because one of the later inner batches is much > bigger than the zero batch we are currently processing. I just noticed that it wasn't mentioned that an overflow could occur at this step. I didn't think it would be hard to do one if needed, but was wondering if knowing a key couldn't match (because it was in the current batch 0 and didn't match and existing key in that batch) was enough to emit or discard the row.
В списке pgsql-hackers по дате отправления: