Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
From | Mischa Sandberg |
---|---|
Subject | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
Date | |
Msg-id | 1115155990.4277ee16aba34@webmail.telus.net |
In reply to | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? (Markus Schaber <schabi@logix-tt.com>) |
Responses | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
List | pgsql-hackers |
Quoting Markus Schaber <schabi@logix-tt.com>:

> Hi, Josh,
>
> Josh Berkus wrote:
>
> > Yes, actually. We need 3 different estimation methods:
> > 1 for tables where we can sample a large % of pages (say, >= 0.1)
> > 1 for tables where we sample a small % of pages but are "easily
> > estimated"
> > 1 for tables which are not easily estimated but we can't afford to
> > sample a large % of pages.
> >
> > If we're doing sampling-based estimation, I really don't want people
> > to lose sight of the fact that page-based random sampling is much less
> > expensive than row-based random sampling. We should really be focusing
> > on methods which are page-based.

Okay, although given the track record of page-based sampling for
n-distinct, it's a bit like looking for your keys under the streetlight,
rather than in the alley where you dropped them :-)

How about applying the distinct-sampling filter on a small extra data
stream to the stats collector?

-- 
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.
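For readers unfamiliar with the "distinct-sampling filter" mentioned in the message, here is a minimal Python sketch of the general idea: a hash-based sample over distinct values rather than over rows or pages. It is an illustration under stated assumptions, not the poster's proposal and not PostgreSQL code. The class name `DistinctSampler`, the `max_values` bound, and the choice of BLAKE2b as the hash are all hypothetical, and the sketch leaves out per-value row counts and any integration with the stats collector.

```python
import hashlib


def _hash64(value):
    """Stable 64-bit hash of a value's repr (hash choice is an assumption)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def _trailing_zeros(n, width=64):
    """Number of trailing zero bits in n; a zero hash counts as `width`."""
    if n == 0:
        return width
    return (n & -n).bit_length() - 1


class DistinctSampler:
    """Bounded sample of distinct values (hypothetical illustration)."""

    def __init__(self, max_values=1024):
        self.max_values = max_values  # memory bound on the sample
        self.level = 0                # current sampling level
        self.sample = set()           # distinct values that pass the filter

    def add(self, value):
        # A value passes level L with probability 2**-L, and the decision
        # depends only on the value, so duplicates never inflate the sample.
        if _trailing_zeros(_hash64(value)) < self.level:
            return
        self.sample.add(value)
        # When the sample overflows, raise the level and evict the values
        # that no longer pass; roughly half survive each step.
        while len(self.sample) > self.max_values:
            self.level += 1
            self.sample = {
                v for v in self.sample
                if _trailing_zeros(_hash64(v)) >= self.level
            }

    def estimate(self):
        # Each surviving value stands in for about 2**level distinct values.
        return len(self.sample) * (2 ** self.level)
```

Feeding every value of a column through `add()` and then calling `estimate()` gives an n-distinct estimate whose accuracy depends on the number of distinct values seen rather than on which rows or pages happened to be sampled. The cost is that the filter has to see the whole data stream, which is presumably why the message suggests running it on a small extra stream rather than inside page-based sampling.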