Re: strange slow query - lost lot of time somewhere
From | David Rowley |
---|---|
Subject | Re: strange slow query - lost lot of time somewhere |
Date | |
Msg-id | CAApHDvp5yS=-yoXpjwQzJ-QobaAXJqG99RSepnK9bHE0trbOGA@mail.gmail.com |
In reply to | Re: strange slow query - lost lot of time somewhere ("David G. Johnston" <david.g.johnston@gmail.com>) |
Responses | Re: strange slow query - lost lot of time somewhere |
List | pgsql-hackers |
On Tue, 3 May 2022 at 13:43, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> hit_ratio = (est_entries / ndistinct) - (ndistinct / calls) || clamp to 0.0
>
> I don't understand the adjustment factor ndistinct/calls

I've attached a spreadsheet showing the impact of subtracting
(ndistinct / calls). What this is correcting for is the fact that the
first scan for each unique value is a cache miss. The more calls we
have, the more hits we'll get. If there were only 1 call per distinct
value then there'd never be any hits.

Without subtracting (ndistinct / calls), and assuming there's space in
the cache for every ndistinct value, we'd assume a 100% cache hit ratio
when calls == ndistinct. What we should assume in that case is a 0% hit
ratio, as the first scan for each distinct parameter must always be a
miss since we've never had a chance to cache any tuples for it yet.

> This is a "rescan", so aside from cache management isn't the cost of
> originally populating the cache already accounted for elsewhere?

The cost of the first scan is calculated in create_memoize_path().
Since the first scan will always be a cache miss, the code there just
adds some cache management surcharges. Namely:

    /*
     * Add a small additional charge for caching the first entry. All the
     * harder calculations for rescans are performed in cost_memoize_rescan().
     */
    pathnode->path.startup_cost = subpath->startup_cost + cpu_tuple_cost;
    pathnode->path.total_cost = subpath->total_cost + cpu_tuple_cost;

David
Attachments