Re: accounting for memory used for BufFile during hash joins
From | Tom Lane |
---|---|
Subject | Re: accounting for memory used for BufFile during hash joins |
Date | |
Msg-id | 27778.1557240156@sss.pgh.pa.us |
In reply to | Re: accounting for memory used for BufFile during hash joins (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: accounting for memory used for BufFile during hash joins |
List | pgsql-hackers |
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Mon, May 06, 2019 at 11:18:28PM -0400, Tom Lane wrote:
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> Do we actually check how many duplicates are there during planning?

>> Certainly that's part of the planner's cost estimates ... but it's
>> only as good as the planner's statistical knowledge.

> I'm looking at the code, and the only place where I see code dealing with
> MCVs (probably the best place for info about duplicate values) is
> estimate_hash_bucketsize in final_cost_hashjoin.

What I'm thinking of is this bit in final_cost_hashjoin:

    /*
     * If the bucket holding the inner MCV would exceed work_mem, we don't
     * want to hash unless there is really no other alternative, so apply
     * disable_cost.  (The executor normally copes with excessive memory
     * usage by splitting batches, but obviously it cannot separate equal
     * values that way, so it will be unable to drive the batch size below
     * work_mem when this is true.)
     */
    if (relation_byte_size(clamp_row_est(inner_path_rows * innermcvfreq),
                           inner_path->pathtarget->width) > (work_mem * 1024L))
        startup_cost += disable_cost;

It's certainly likely that that logic needs improvement in view of this
discussion --- I was just pushing back on the claim that we weren't
considering the issue at all.

			regards, tom lane
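To make the arithmetic of that check concrete, here is a minimal standalone
sketch (not from the thread): all constants below are made-up assumptions, and
relation_byte_size() is only approximated with a rough per-tuple header charge,
so it illustrates the shape of the test rather than the planner's exact numbers.

    /*
     * Hypothetical illustration of the final_cost_hashjoin check quoted
     * above.  Row count, MCV frequency, row width, and work_mem are assumed
     * values; relation_byte_size() here is a simplified stand-in.
     */
    #include <stdio.h>

    #define MAXALIGN(LEN)  (((LEN) + 7) & ~7)   /* assume 8-byte alignment */
    #define TUPLE_HEADER   MAXALIGN(23)         /* rough heap tuple header */

    static double
    relation_byte_size(double tuples, int width)
    {
        /* bytes needed to hold this many tuples of the given width */
        return tuples * (MAXALIGN(width) + TUPLE_HEADER);
    }

    static double
    clamp_row_est(double nrows)
    {
        /* never estimate fewer than one row */
        return (nrows < 1.0) ? 1.0 : nrows;
    }

    int
    main(void)
    {
        double inner_path_rows = 10000000;  /* 10M inner rows (assumed) */
        double innermcvfreq    = 0.02;      /* most common value: 2% of rows */
        int    width           = 64;        /* average inner row width, bytes */
        long   work_mem        = 4096;      /* work_mem in kB (default 4MB) */

        double mcv_bucket_bytes =
            relation_byte_size(clamp_row_est(inner_path_rows * innermcvfreq),
                               width);

        printf("estimated MCV bucket: %.0f bytes, work_mem: %ld bytes\n",
               mcv_bucket_bytes, work_mem * 1024L);

        if (mcv_bucket_bytes > work_mem * 1024L)
            printf("-> planner would add disable_cost to this hash join\n");
        else
            printf("-> hash join costed normally\n");

        return 0;
    }

With these assumed numbers the single most common value alone accounts for
roughly 200,000 inner tuples (~17 MB), well past a 4 MB work_mem, so the
check fires and the hash join path is effectively disabled: batch splitting
cannot spread equal keys across batches, so no amount of splitting would get
that bucket under work_mem.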