Re: Tricky math in PSQL
От | Kip Warner |
---|---|
Тема | Re: Tricky math in PSQL |
Дата | |
Msg-id | 1461740984.28754.14.camel@thevertigo.com обсуждение исходный текст |
Ответ на | Re: Tricky math in PSQL (Merlin Moncure <mmoncure@gmail.com>) |
Список | pgsql-novice |
On Tue, 2016-04-26 at 09:13 -0500, Merlin Moncure wrote: > I'm not going to work out the whole problem for you, but you probably > want to start looking in the direction custom aggregate function. > Most people are not aware of this but custom aggregates can be > applied over a window. The state can be a composite type and thus > hold arbitrarily complex information that you can maintain as you > walk through the set (this works great for time series and things > like that). > > Barring that, if you are still looking for solutions to this type of > problem in the database, I'd point you in the direction of pl/r. The > learning curve is somewhat steep but rewards mastery. > > CREATE AGGREGATE RateChange(NUMERIC, NUMERIC, INT) > ( > SFUNC = AggRateChange, > STYPE = NUMERIC > ); > > CREATE OR REPLACE FUNCTION AggRateChange( > State NUMERIC, > Seed NUMERIC, > RateChange NUMERIC, > NDigits INT) RETURNS NUMERIC AS > $$ > SELECT ROUND(COALESCE($1, $2) * (1.0 + $3), NDigits); > $$ LANGUAGE SQL IMMUTABLE; > > select ratechange(100, (random() * 0.05)::numeric, 3) over(order by > s) > from generate_series(1,10) s; > ratechange > ──────────── > 101.524 > 104.570 > 104.674 > 106.599 > 111.685 > 114.583 > 120.174 > 125.561 > 130.291 Hey Merlin, Thank you for your thoughtful suggestions. I spent the better part of the day wrestling with some refactored schema and queries with a great deal of help from the kind folks in #postgresql. This is what I've got now, albeit without custom aggregate functions: http://pastebin.com/zwWmYMd0 As you can see, it does appear to work, but it's a problem of doing so efficiently now. The solution I have around line 192 to 208 performs total_diff() for a given id against all others, but with some constants removed. It does work, but for a statistic table of 1,000,000 rows and a statistics_child table of 101,948,612, the query takes approximately 4m 11s on my machine. I'd like to improve this. I wonder if providing indexes on some of the columns in the two views would be helpful? -- Kip Warner -- Senior Software Engineer OpenPGP encrypted/signed mail preferred http://www.thevertigo.com
Вложения
В списке pgsql-novice по дате отправления: