Thread: pg_statistic "forced" values

pg_statistic "forced" values

From
Jacques Caron
Date:
Hi,

It is well known that in some instances PostgreSQL will make
estimates of the number of distinct values in a column that can be
quite far from reality. This tends to make the planner
lean towards unsavory plans (read: seqscans) because it estimates the
number of rows returned by a part of the query as being quite a
lot higher than it really is.

The "good" solution would be to fix the estimator, but there have
already been long discussions on this topic over the past years and
apparently no consensus was found: the alternatives proposed either
"fix" some cases where the current estimator is wrong but get
into trouble in others, or require quite a bit more CPU/memory/disk
I/O to achieve their results (correct me if I'm wrong).

There is a "simple" way to override this, which is to change the
value present in pg_statistic, but it will be overwritten the
next time ANALYZE (or VACUUM ANALYZE) is run. The update therefore
has to be re-applied every time a query that might be fooled by the
estimate is executed, which is cumbersome, and it makes the value
hard to keep current (especially with positive values of
stadistinct, which are absolute counts and do not scale as the
table grows).
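
To make the role of stadistinct concrete, here is a minimal Python sketch of how the stored value is interpreted: a positive value is an absolute number of distinct values, a negative value is minus a fraction of the row count, and zero means unknown. The function names are hypothetical, and the row estimate is deliberately simplified (uniform distribution, no NULLs, no most-common-values list), so it is only an illustration and not the planner's actual code.

```python
def resolve_ndistinct(stadistinct, reltuples):
    """Turn a stored stadistinct into an absolute distinct-value count.

    Hypothetical helper for illustration only.
    """
    if stadistinct > 0:
        # Positive: an absolute count, independent of table size.
        return stadistinct
    if stadistinct < 0:
        # Negative: minus a fraction of the row count, so it scales.
        return -stadistinct * reltuples
    raise ValueError("stadistinct = 0 means the count is unknown")


def estimated_rows_eq(stadistinct, reltuples):
    """Simplified row estimate for `col = constant`.

    Assumes a uniform distribution and ignores NULLs and the
    most-common-values list.
    """
    return reltuples / resolve_ndistinct(stadistinct, reltuples)


rows = 10_000_000
print(estimated_rows_eq(1_000_000, rows))  # with the true count: 10.0
print(estimated_rows_eq(50_000, rows))     # count underestimated: 200.0
```

With the distinct count underestimated by a factor of 20, the estimated number of matching rows is 20 times too high, which is the kind of error that pushes the planner towards a seqscan. The sketch also shows why a hand-set positive value goes stale: it stays fixed when the table grows, while a negative value follows the row count.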

It seems to me it would be a good idea to be able to store a forced
value for stadistinct in pg_attribute (optionally with some clauses
to set/change/reset it in CREATE TABLE, ALTER TABLE ADD COLUMN and
ALTER TABLE ALTER COLUMN, in a way similar to the STATISTICS clauses).

Alternatively, it could be a simple boolean to just say "don't update 
stadistinct".
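
The two alternatives can be sketched as a single decision made when ANALYZE is about to store a new value. This is a hypothetical illustration of the proposal only: the function, the `forced` value and the `frozen` flag are names made up for this sketch and do not correspond to existing catalog columns or PostgreSQL code.

```python
from typing import Optional


def stadistinct_after_analyze(
    previous: float,
    computed: float,
    forced: Optional[float] = None,
    frozen: bool = False,
) -> float:
    """Pick the stadistinct value to store after an ANALYZE run.

    previous: value currently stored in pg_statistic
    computed: value the estimator just produced
    forced:   hypothetical per-column override (first alternative)
    frozen:   hypothetical "don't update stadistinct" flag (second one)
    """
    if forced is not None:
        # A forced value always wins over the estimator.
        return forced
    if frozen:
        # Keep whatever was stored, e.g. a hand-edited value.
        return previous
    return computed
```

The forced value keeps the override in one place that survives ANALYZE, while the boolean still relies on someone having edited pg_statistic by hand beforehand.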

Or did I miss something and this already exists somewhere?

If not, are there any comments or suggestions regarding implementing this?

Thanks,

Jacques.



Re: pg_statistic "forced" values

From
Simon Riggs
Date:
On Wed, 2007-11-07 at 17:18 +0100, Jacques Caron wrote:

> It is well known that in some instances PostgreSQL will make
> estimates of the number of distinct values in a column that can be
> quite far from reality. This tends to make the planner
> lean towards unsavory plans (read: seqscans) because it estimates the
> number of rows returned by a part of the query as being quite a
> lot higher than it really is.
>
> The "good" solution would be to fix the estimator, but there have
> already been long discussions on this topic over the past years and
> apparently no consensus was found: the alternatives proposed either
> "fix" some cases where the current estimator is wrong but get
> into trouble in others, or require quite a bit more CPU/memory/disk
> I/O to achieve their results (correct me if I'm wrong).
>
> There is a "simple" way to override this, which is to change the
> value present in pg_statistic, but it will be overwritten the
> next time ANALYZE (or VACUUM ANALYZE) is run. The update therefore
> has to be re-applied every time a query that might be fooled by the
> estimate is executed, which is cumbersome, and it makes the value
> hard to keep current (especially with positive values of
> stadistinct, which are absolute counts and do not scale as the
> table grows).
>
> It seems to me it would be a good idea to be able to store a forced
> value for stadistinct in pg_attribute (optionally with some clauses
> to set/change/reset it in CREATE TABLE, ALTER TABLE ADD COLUMN and
> ALTER TABLE ALTER COLUMN, in a way similar to the STATISTICS clauses).

I'm looking at exactly this issue at the moment, though as one issue
amongst many similar ones.

My inclination is to decide what needs to be stored, then debate
separately where it should be stored.

I expect to post a wider proposal in around two weeks.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com