Re: auto_explain WAS: RFC: Timing Events
From | Robert Haas |
---|---|
Subject | Re: auto_explain WAS: RFC: Timing Events |
Date | |
Msg-id | CA+TgmoYE8_VGV2GC41ZHxkupmHcOO3X6F+haEQZ0uZFn_4Nfig@mail.gmail.com |
In response to | Re: auto_explain WAS: RFC: Timing Events (Greg Stark <stark@mit.edu>) |
Responses |
Re: auto_explain WAS: RFC: Timing Events
|
List | pgsql-hackers |
On Mon, Feb 25, 2013 at 10:22 PM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Feb 25, 2013 at 8:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sun, Feb 24, 2013 at 7:27 PM, Jim Nasby <jim@nasby.net> wrote:
>>> We actually do that in our application and have discovered that random
>>> sampling can end up significantly skewing your data.
>>
>> /me blinks.
>>
>> How so?
>
> Sampling is a pretty big area of statistics. There are dozens of
> sampling methods to deal with various problems that occur with
> different types of data distributions.
>
> One problem is if you have some very rare events then random sampling
> can produce odd results since those rare events will drop out entirely
> unless your sample is very large whereas less rare events are
> represented proportionally. There are sampling methods that ensure
> that x% of the rare events are included even if those rare events are
> less than x% of your total data set. One of those might be appropriate
> to use for profiling data when you're looking for rare slow queries
> amongst many faster queries.

I'll grant all that, but it still seems to me like x% of all queries
plus all queries running longer than x milliseconds would cover most
of the interesting cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
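
For illustration only, the policy suggested in the reply above (a random x% of all queries, plus every query running longer than some threshold) could be sketched roughly as follows. This is not auto_explain code; the names `should_log`, `simulate`, `sample_rate`, and `threshold_ms` are hypothetical, and the workload is synthetic.

```python
import random


def should_log(duration_ms, sample_rate, threshold_ms, rng=random):
    """Decide whether to log one query under the combined policy.

    All names here are hypothetical, chosen for this sketch only.
    """
    # Slow queries are always kept, so rare outliers cannot drop out.
    if duration_ms > threshold_ms:
        return True
    # Everything else is kept with uniform probability sample_rate.
    return rng.random() < sample_rate


def simulate(n_fast=10_000, n_slow=5, sample_rate=0.01,
             threshold_ms=1000.0, seed=0):
    """Run the policy over a synthetic workload (assumed durations)."""
    rng = random.Random(seed)
    # Many fast queries plus a handful of rare slow ones.
    durations = [rng.uniform(1.0, 50.0) for _ in range(n_fast)]
    durations += [rng.uniform(2000.0, 5000.0) for _ in range(n_slow)]
    rng.shuffle(durations)
    logged = [d for d in durations
              if should_log(d, sample_rate, threshold_ms, rng)]
    slow_logged = sum(1 for d in logged if d > threshold_ms)
    return len(logged), slow_logged


if __name__ == "__main__":
    total, slow = simulate()
    print(f"logged {total} queries, {slow} of them slow")
```

Under these assumptions every slow query is logged regardless of the sample rate, while the fast queries are represented only proportionally. That is the sense in which the threshold covers the rare-event case raised in the quoted text.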