Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
От | Gurmeet Manku |
---|---|
Тема | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
Дата | |
Msg-id | Pine.LNX.4.44.0504261443340.12051-100000@xenon.Stanford.EDU обсуждение исходный текст |
Ответ на | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? (Simon Riggs <simon@2ndquadrant.com>) |
Список | pgsql-hackers |
Hi everybody! Perhaps the following papers are relevant to the discussion here (their contact authors have been cc'd): 1. The following proposes effective algorithms for using block-level sampling for n_distinct estimation: "Effective use of block-level sampling in statistics estimation" by Chaudhuri, Das and Srivastava, SIGMOD 2004. http://www-db.stanford.edu/~usriv/papers/block-sampling.pdf 2. In a single scan, it is possible to estimate n_distinct by using a very simple algorithm: "Distinct sampling for highly-accurate answers to distinct value queries and event reports" by Gibbons, VLDB 2001. http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf 3. In fact, Gibbon's basic idea has been extended to "sliding windows" (this extension is useful in streaming systems like Aurora / Stream): "Distributed streams algorithms for sliding windows" by Gibbons and Tirthapura, SPAA 2002. http://home.eng.iastate.edu/~snt/research/tocs.pdf Thanks, Gurmeet ---------------------------------------------------- Gurmeet Singh Manku Google Inc. http://www.cs.stanford.edu/~manku (650) 967 1890 ----------------------------------------------------
В списке pgsql-hackers по дате отправления: