Re: Hash join and picking which result set to build the hash table with.
От | Tom Lane |
---|---|
Тема | Re: Hash join and picking which result set to build the hash table with. |
Дата | |
Msg-id | 14387.1716411233@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Hash join and picking which result set to build the hash table with. ("Dirschel, Steve" <steve.dirschel@thomsonreuters.com>) |
Список | pgsql-general |
"Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes: > The query and execution plan are shown below. My question is > related to the result set the optimizer is choosing to build the > hash table from. My understanding is for a hash join you want to > build the hash table out of the smaller result set. That's *a* consideration, but not the only one. We also consider whether the hash key has a flat distribution; if it is too skewed, we might find specific hash chains getting too long. > When running some tests I forgot to create the PK on table > docloc_test. When the PK was not on the table the optimizer decided > to create the hash table off the 1000 rows from collection. But as > soon as I put the PK on that table it then decides to use > docloc_test to build the hash table. I think that the presence of a unique index overrides the statistics (or the lack of any) so that the planner knows that the column is unique and thus safe to use as a hash key. Now, it should have known that anyway, unless maybe this is a freshly-built table that auto-analyze hasn't gotten to yet? regards, tom lane
В списке pgsql-general по дате отправления: