RE: Statistical Analysis
From | Nathan Barnett |
---|---|
Subject | RE: Statistical Analysis |
Date | |
Msg-id | 71975481CD04D4118E57004033A2596E0DF948@ip205.82.136.216.in-addr.arpa |
In response to | Statistical Analysis ("Nathan Barnett" <nbarnett@cellularphones.com>) |
List | pgsql-general |
Stephan,

The sort is what I'm trying to avoid. I was using a GROUP BY to grab all the data in the groups that I needed, but grouping requires a sort, and this bottlenecked the query. I really just wanted to grab a sample of all the rows in the table and then perform the GROUP BY on the subset, to avoid the overhead of sorting the whole table. My query has no WHERE clauses and thus must sort through all of the data being analyzed. It then aggregates the data into a table that is used by the realtime queries. The analysis must be able to run every hour.

----------------
Nathan Barnett

-----Original Message-----
From: pgsql-general-owner@hub.org [mailto:pgsql-general-owner@hub.org] On Behalf Of Stephan Szabo
Sent: Monday, July 10, 2000 3:49 PM
To: Nathan Barnett; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Statistical Analysis

Are you grabbing a set of rows to work on in an outside app? You may be able to get a smaller random set with:

    select <whatever> from <table> order by random() limit <number>

But this will pretty much force a sort step [and if you're not limiting the rows with a where clause, probably a full sequential scan] and could be very expensive depending on the number of matching rows for any limiting clauses you have. You'd have to play with it in practice to see if it's any faster.

----- Original Message -----
From: "Nathan Barnett" <nbarnett@cellularphones.com>
To: <pgsql-general@postgresql.org>
Sent: Monday, July 24, 2000 12:20 PM
Subject: [GENERAL] Statistical Analysis

> I am having to perform a large data analysis query fairly frequently and the
> execution time is not acceptable, so I was looking at doing a statistical
> sample of the data to get fairly accurate results. Is there a way to
> perform a query on a set number of random rows instead of the whole dataset?
> I have looked through the documentation for a function that would do this,
> but I have not seen any. If this is an RTFM type question, then feel free to
> tell me so and point me in the right direction because I just haven't been
> able to find any info on it.
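One possible way to sketch the "sample first, then group" idea from this thread is to filter rows at random during a single scan and run the GROUP BY only on the survivors, scaling the counts back up afterwards. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs anywhere, and the `calls` table, its `region` and `minutes` columns, and both function names are hypothetical. SQLite's `random()` returns a 64-bit integer, hence the modulo test; in PostgreSQL, where `random()` returns a value between 0 and 1, the comparable filter would presumably be `WHERE random() < 0.1`.

```python
import random
import sqlite3


def build_demo_table(conn, n_rows=20000, seed=0):
    """Create a hypothetical `calls` table with a skewed `region` column."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE calls (region TEXT, minutes REAL)")
    rows = [
        (rng.choice(["north", "north", "south", "east"]), rng.uniform(1, 30))
        for _ in range(n_rows)
    ]
    conn.executemany("INSERT INTO calls VALUES (?, ?)", rows)


def sampled_counts(conn, keep_one_in=10):
    """Group a random subset of rows and scale the counts back up.

    The WHERE filter is evaluated once per row while scanning, so only
    roughly 1/keep_one_in of the rows reach the GROUP BY step.
    Returns {region: (estimated_row_count, average_minutes_in_sample)}.
    """
    sql = """
        SELECT region,
               COUNT(*) * ? AS estimated_rows,
               AVG(minutes) AS avg_minutes
        FROM calls
        WHERE random() % ? = 0   -- SQLite-specific: random() is an integer
        GROUP BY region
    """
    params = (keep_one_in, keep_one_in)
    return {region: (n, avg) for region, n, avg in conn.execute(sql, params)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo_table(conn)
    for region, (estimate, avg) in sorted(sampled_counts(conn).items()):
        print(f"{region}: ~{estimate} rows, avg {avg:.1f} minutes")
```

The results are estimates and will differ from run to run. Whether this is actually faster than grouping the whole table would have to be measured on the real data, since the full table is still scanned once.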