Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
От | Dave Held |
---|---|
Тема | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |
Дата | |
Msg-id | 49E94D0CFCD4DB43AFBA928DDD20C8F9026184DE@asg002.asg.local обсуждение исходный текст |
Список | pgsql-performance |
> -----Original Message----- > From: Gurmeet Manku [mailto:manku@CS.Stanford.EDU] > Sent: Tuesday, April 26, 2005 5:01 PM > To: Simon Riggs > Cc: Tom Lane; josh@agliodbs.com; Greg Stark; Marko Ristola; > pgsql-perform; pgsql-hackers@postgresql.org; Utkarsh Srivastava; > snt@iastate.edu > Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks > suggested? > > [...] > 2. In a single scan, it is possible to estimate n_distinct by using > a very simple algorithm: > > "Distinct sampling for highly-accurate answers to distinct value > queries and event reports" by Gibbons, VLDB 2001. > > http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf > > [...] This paper looks the most promising, and isn't too different from what I suggested about collecting stats over the whole table continuously. What Gibbons does is give a hard upper bound on the sample size by using a logarithmic technique for storing sample information. His technique appears to offer very good error bounds and confidence intervals as shown by tests on synthetic and real data. I think it deserves a hard look from people hacking the estimator. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
В списке pgsql-performance по дате отправления: