Re: Group by more efficient than distinct?
From | Mark Mielke |
---|---|
Subject | Re: Group by more efficient than distinct? |
Date | |
Msg-id | 480DE25E.4080507@mark.mielke.cc |
In reply to | Re: Group by more efficient than distinct? (Matthew Wakeling <matthew@flymine.org>) |
List | pgsql-performance |
Matthew Wakeling wrote:
> On Tue, 22 Apr 2008, Mark Mielke wrote:
>> The poster I responded to said that the memory required for a hash
>> join was relative to the number of distinct values, not the number of
>> rows. They gave an example of millions of rows, but only a few
>> distinct values. Above, you agree with me that it would include
>> the rows (or at least references to the rows) as well. If it stores
>> rows, or references to rows, then memory *is* relative to the number
>> of rows, and millions of records would require millions of rows (or
>> row references).
>
> Yeah, I think we're talking at cross-purposes, due to hash tables
> being used in two completely different places in Postgres. Firstly,
> you have hash joins, where Postgres loads the references to the actual
> rows, and puts those in the hash table. For that situation, you want a
> small number of rows. Secondly, you have hash aggregates, where
> Postgres stores an entry for each "group" in the hash table, and does
> not store the actual rows. For that situation, you can have a
> bazillion individual rows, but only a small number of distinct groups.

That makes sense with my reality. :-)

Thanks,
mark

--
Mark Mielke <mark@mielke.cc>
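The distinction Matthew draws can be sketched outside of Postgres. The following is an illustrative Python model (not Postgres internals, and the group names are invented for the example): a hash join's build side keeps one hash entry per input row, while a hash aggregate keeps one entry per distinct group, no matter how many rows feed it.

```python
from collections import defaultdict

# 100,000 rows, but only 3 distinct group values.
rows = [("color_%d" % (i % 3), i) for i in range(100_000)]

# Hash aggregate: one entry per distinct group; only the aggregate
# state (here, a count) is kept, never the rows themselves.
agg = defaultdict(int)
for group, _value in rows:
    agg[group] += 1
print(len(agg))            # 3 entries, regardless of row count

# Hash join build side: every row (or a reference to it) must land
# in the hash table, so memory grows with the number of rows.
join_table = defaultdict(list)
for group, value in rows:
    join_table[group].append(value)
print(sum(len(v) for v in join_table.values()))   # 100000 entries
```

This is why a hash aggregate can handle "a bazillion individual rows" with modest memory when the group count is small, while a hash join's memory need tracks the row count of its build side.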