Re: Add min and max execute statement time in pg_stat_statement
From | Arne Scheffer |
---|---|
Subject | Re: Add min and max execute statement time in pg_stat_statement |
Date | |
Msg-id | permail-20150121121804fe5316b600007a2b-scheffa@message-id.uni-muenster.de |
In reply to | Re: Add min and max execute statement time in pg_stat_statement (David G Johnston <david.g.johnston@gmail.com>) |
List | pgsql-hackers |
David G Johnston wrote on 2015-01-21:
> Andrew Dunstan wrote
> > On 01/20/2015 01:26 PM, Arne Scheffer wrote:

> >> And a very minor aspect:
> >> The term "standard deviation" in your code stands for
> >> (corrected) sample standard deviation, I think,
> >> because you divide by n-1 instead of n to keep the
> >> estimator unbiased.
> >> How about mentioning the prefix "sample"
> >> to indicate this being the estimator?

> > I don't understand. I'm following pretty exactly the calculations
> > stated at <http://www.johndcook.com/blog/standard_deviation/>

> > I'm not a statistician. Perhaps others who are more literate in
> > statistics can comment on this paragraph.

> I'm largely in the same boat as Andrew but...

> I take it that Arne is referring to:

> http://en.wikipedia.org/wiki/Bessel's_correction

Yes, it is.

> but the mere presence of an (n-1) divisor does not mean that is what is
> happening. In this particular situation I believe the (n-1) simply is a
> necessary part of the recurrence formula and not any attempt to correct for
> sampling bias when estimating a population's variance.

That's wrong: it is applied at the end to the sum of squared differences,
and is therefore, by definition, the corrected sample standard deviation
estimator.

> In fact, as far as
> the database knows, the values provided to this function do represent an
> entire population and such a correction would be unnecessary. I

That would probably be an exotic assumption in a working database, and it
is not what is computed here!

> guess it
> boils down to whether "future" queries are considered part of the population
> or whether the population changes upon each query being run and thus we are
> calculating the ever-changing population variance.

Yes, that is indeed correct. And exactly to avoid that misunderstanding,
I suggested using the term "sample".

To put it in PostgreSQL terms: what Andrew's/Welford's algorithm applies
is stddev_samp(le), not stddev_pop(ulation).
That is why stddev in Postgres is only kept for historical reasons; see
http://www.postgresql.org/docs/9.4/static/functions-aggregate.html
Table 9-43.

VlG-Arne

> Note point 3 in the linked Wikipedia article.

> David J.
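The distinction discussed in this message can be illustrated with a small sketch. It is not the pg_stat_statements code: the function names `welford`, `stddev_samp` and `stddev_pop` and the sample timings are assumptions chosen for illustration, with the last two named after the PostgreSQL aggregates mentioned above. The recurrence only accumulates the sum of squared differences; whether the result is a sample or a population estimate is decided by the final divisor.

```python
import math
import statistics


def welford(values):
    """Accumulate count, mean and sum of squared differences in one pass.

    This is Welford's recurrence; no (n - 1) correction happens here.
    """
    n = 0
    mean = 0.0
    sum_sq_diff = 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        sum_sq_diff += delta * (x - mean)
    return n, mean, sum_sq_diff


def stddev_samp(values):
    """Corrected sample standard deviation: divide by n - 1 (Bessel's correction)."""
    n, _, sum_sq_diff = welford(values)
    if n < 2:
        return None  # undefined for fewer than two values
    return math.sqrt(sum_sq_diff / (n - 1))


def stddev_pop(values):
    """Population standard deviation: divide by n."""
    n, _, sum_sq_diff = welford(values)
    if n < 1:
        return None
    return math.sqrt(sum_sq_diff / n)


# Hypothetical execution times, used only for illustration.
times = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("stddev_samp:", stddev_samp(times))
print("stddev_pop: ", stddev_pop(times))
# Cross-check against the standard library's sample and population estimators.
print("statistics.stdev: ", statistics.stdev(times))
print("statistics.pstdev:", statistics.pstdev(times))
```

Both functions share the same accumulated state, so a single pass over the timings is enough to report either value; only the label and the final division differ.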