Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
From        | Robert Haas
Subject     | Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
Date        |
Msg-id      | CA+Tgmoa3UdJ0cOU_vVyWFZM9mPkLvLdqZWxNow-2wPkk2xtL3g@mail.gmail.com
In reply to | [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses   | Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
            | Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
List        | pgsql-hackers
On Thu, Feb 16, 2017 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The planner doesn't currently worry about work_mem restrictions when
> planning a hash join, figuring that the executor should be able to
> subdivide the data arbitrarily finely by splitting buckets at runtime.
> However there's a thread here:
> https://www.postgresql.org/message-id/flat/CACw4T0p4Lzd6VpwptxgPgoTMh2dEKTQBGu7NTaJ1%2BA0PRx1BGg%40mail.gmail.com
> exhibiting a case where a hash join was chosen even though a single
> value accounts for three-quarters of the inner relation.  Bucket
> splitting obviously can never separate multiple instances of the
> same value, so this choice forced the executor to try to load
> three-quarters of the (very large) inner relation into memory at once;
> unsurprisingly, it failed.
>
> To fix this, I think we need to discourage use of hash joins whenever
> a single bucket is predicted to exceed work_mem, as in the attached
> draft patch.  The patch results in changing from hash to merge join
> in one regression test case, which is fine; that case only cares about
> the join order, not the types of the joins.
>
> This might be overly aggressive, because it will pretty much shut off
> any attempt to use hash joining on a large inner relation unless we
> have statistics for it (and those stats are favorable).  But having
> seen this example, I think we need to be worried.

I do think that's worrying, but on the other hand it seems like this
solution could disable many hash joins that would actually be fine.  I
don't think the largest ndistinct estimates we ever generate are very
large, and therefore this seems highly prone to worry even when
worrying isn't really justified.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
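[Editorial note: the following is a minimal sketch, not the actual draft patch (which modifies the C planner in PostgreSQL). It illustrates the check Tom describes: predict the size of the largest hash bucket from the most-common-value frequency, and flag the hash join as unsafe when that single bucket alone would exceed work_mem, since bucket splitting can never separate duplicates of one key. All names and the example numbers are hypothetical.]

```python
def hash_join_is_safe(inner_rows, avg_tuple_bytes,
                      max_mcv_frequency, work_mem_bytes):
    """Hypothetical sketch of the proposed planner check.

    max_mcv_frequency is the estimated fraction of inner tuples sharing
    the single most common join-key value (from column statistics);
    in the case reported in the linked thread it was about 0.75.
    Returns False when the predicted largest bucket cannot fit in
    work_mem, i.e. when the hash join should be discouraged.
    """
    # Duplicates of one key always land in one bucket, so this is a
    # hard floor on the memory the executor must hold at once.
    largest_bucket_bytes = inner_rows * max_mcv_frequency * avg_tuple_bytes
    return largest_bucket_bytes <= work_mem_bytes

# Roughly the reported scenario: one value covers ~75% of a large inner
# relation, far beyond a 4 MB work_mem.
print(hash_join_is_safe(inner_rows=10_000_000, avg_tuple_bytes=100,
                        max_mcv_frequency=0.75,
                        work_mem_bytes=4 * 1024 * 1024))  # False
```

Robert's objection maps directly onto this sketch: without statistics, max_mcv_frequency must be estimated pessimistically (e.g. from a small ndistinct estimate), so the check would also reject many hash joins that would in fact have fit in memory.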