Re: queries with lots of UNIONed relations
From        | Jon Nelson
Subject     | Re: queries with lots of UNIONed relations
Date        |
Msg-id      | AANLkTinubA5_CFAdERCyzDy1H3v409pyVTzdvPjTaPpr@mail.gmail.com
In reply to | Re: queries with lots of UNIONed relations (Tom Lane <tgl@sss.pgh.pa.us>)
Responses   | Re: queries with lots of UNIONed relations
List        | pgsql-performance
On Thu, Jan 13, 2011 at 5:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jan 13, 2011 at 5:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I don't believe there is any case where hashing each individual relation
>>> is a win compared to hashing them all together.  If the optimizer were
>>> smart enough to be considering the situation as a whole, it would always
>>> do the latter.
>
>> You might be right, but I'm not sure.  Suppose that there are 100
>> inheritance children, and each has 10,000 distinct values, but none of
>> them are common between the tables.  In that situation, de-duplicating
>> each individual table requires a hash table that can hold 10,000
>> entries.  But deduplicating everything at once requires a hash table
>> that can hold 1,000,000 entries.
>
>> Or am I all wet?
>
> If you have enough memory to de-dup them individually, you surely have
> enough to de-dup all at once.  It is not possible for a single hashtable
> to have worse memory consumption than N hashtables followed by a union
> hashtable, and in fact if there are no common values then the latter eats
> twice as much space because every value appears twice in two different
> hashtables.

If everything were available up front, sure. However, and please correct
me if I'm wrong, doesn't PostgreSQL work in a fairly linear fashion,
moving from table to table and performing a series of operations on each?
That seems to be what the plan indicates. Compare:

    for each table LOOP
        scan table for result rows, append to results
    END LOOP
    hash / sort + unique results

versus:

    for each table LOOP
        scan table for result rows, append to table-results
        hash / sort + unique table-results, append to results
    END LOOP
    hash / sort + unique results

In the former case, all of the result rows from all tables are appended
together before the de-duplication process can start.

In the latter case, only enough memory for each table's result set is
necessary for de-duplication, and it would only be necessary to allocate
it for that table. Is that not how this works?

--
Jon
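The two loop shapes above can be sketched outside the database. This is a hedged illustration in Python (not PostgreSQL internals), with made-up table data; it measures only the size of the intermediate buffer each strategy keeps in flight, which is the quantity the argument turns on:

```python
def dedup_all_at_once(tables):
    """The 'former' case: append all rows first, then de-duplicate once."""
    results = []
    for table in tables:
        results.extend(table)        # every table's rows buffered together
    peak_rows = len(results)         # intermediate memory high-water mark
    return sorted(set(results)), peak_rows

def dedup_per_table(tables):
    """The 'latter' case: de-duplicate each table's rows before appending."""
    results = set()
    peak_rows = 0
    for table in tables:
        table_results = set(table)   # only this table's rows in flight
        peak_rows = max(peak_rows, len(table_results))
        results |= table_results     # union into the final result
    return sorted(results), peak_rows

# Three "tables" with disjoint values, as in Robert Haas's example.
tables = [range(0, 100), range(100, 200), range(200, 300)]
rows_a, peak_a = dedup_all_at_once(tables)
rows_b, peak_b = dedup_per_table(tables)
assert rows_a == rows_b              # same final answer either way
print(peak_a, peak_b)                # intermediate peaks: 300 vs. 100
```

Note that the per-table variant still accumulates the full de-duplicated union at the end, which is Tom Lane's counterpoint: the final hashtable alone already has to hold every distinct value.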