Re: how to identify outliers
От | Scott Bailey |
---|---|
Тема | Re: how to identify outliers |
Дата | |
Msg-id | 4AE77C8F.1000900@comcast.net обсуждение исходный текст |
Ответ на | Re: how to identify outliers ("Rhys A.D. Stewart" <rhys.stewart@gmail.com>) |
Список | pgsql-general |
Rhys A.D. Stewart wrote: > Im asking how to get the ones that dont fall near the avg.... so for > example lets say i have the following distances: > 10,11,12,11,10,9,9,10,11,12,10,11,99 > > then 99 would be an outlier. the avg would be like 16 or 17 i reckon > with the 99. so i want a way to find aan outlier, remove it and then > recalcuate the avg...and then i'd get a 'better' avg..... > > i did some seraching about outliers and most of hits are about R or > spss or some other statistical package.....so looking for a way to do > it wholly in pgsql. > > > Rhys > > On Tue, Oct 27, 2009 at 4:53 PM, Ben Chobot <bench@silentmedia.com> wrote: >> Are you asking how to find the average and standard deviation? Or how to >> compare the your data against some set values? Perhaps an example would be >> appropriate; it's not very clear to me what you're asking. >> >> Rhys A.D. Stewart wrote: >>> Hey all, >>> I have the following table: data(pnum text, distance float8, route text). >>> I would like to remove the outliers in distance, i.e. lets say i get >>> the avg dist of pnum for each route and the std deviation of the >>> distance what is the best way to identify the outliers? >>> >>> >>> Rhys. Oh, so you want to "cook" your data? I don't agree with that conceptually, but: WITH base AS ( SELECT random(1, 100)::int AS i FROM generate_series(1, 100) i ), stats AS ( SELECT avg(i) AS dist_avg, stddev(i) AS dist_dev FROM base ) SELECT count(i), avg(i) AS new_avg, MIN(stats.dist_avg) AS old_avg, stddev(i) AS new_dev, MIN(stats.dist_dev) AS old_dev FROM base, stats WHERE base.i BETWEEN stats.dist_avg - dist_dev AND stats.dist_avg + dist_dev Scott Bailey
В списке pgsql-general по дате отправления: