Re: Tricky math in PSQL

Поиск

Список

Период

Сортировка

От	Kip Warner
Тема	Re: Tricky math in PSQL
Дата	27 апреля 2016 г. 07:09:51
Msg-id	1461740984.28754.14.camel@thevertigo.com обсуждение исходный текст
Ответ на	Re: Tricky math in PSQL (Merlin Moncure <mmoncure@gmail.com>)
Список	pgsql-novice

Дерево обсуждения

On Tue, 2016-04-26 at 09:13 -0500, Merlin Moncure wrote:
> I'm not going to work out the whole problem for you, but you probably
> want to start looking in the direction custom aggregate function.
> Most people are not aware of this but custom aggregates can be
> applied over a window.  The state can be a composite type and thus
> hold arbitrarily complex information that you can maintain as you
> walk through the set (this works great for time series and things
> like that).
>
> Barring that, if you are still looking for solutions to this type of
> problem in the database, I'd point you in the direction of pl/r.  The
> learning curve is somewhat steep but rewards mastery.
>
> CREATE AGGREGATE RateChange(NUMERIC, NUMERIC, INT)
> (
>   SFUNC = AggRateChange,
>   STYPE = NUMERIC
> );
>
> CREATE OR REPLACE FUNCTION AggRateChange(
>   State NUMERIC,
>   Seed NUMERIC,
>   RateChange NUMERIC,
>   NDigits INT) RETURNS NUMERIC AS
> $$
>   SELECT ROUND(COALESCE($1, $2) * (1.0 + $3), NDigits);
> $$ LANGUAGE SQL IMMUTABLE;
>
> select ratechange(100, (random() * 0.05)::numeric, 3) over(order by
> s)
> from generate_series(1,10) s;
>  ratechange
> ────────────
>     101.524
>     104.570
>     104.674
>     106.599
>     111.685
>     114.583
>     120.174
>     125.561
>     130.291

Hey Merlin,

Thank you for your thoughtful suggestions. I spent the better part of
the day wrestling with some refactored schema and queries with a great
deal of help from the kind folks in #postgresql. This is what I've got
now, albeit without custom aggregate functions:

    http://pastebin.com/zwWmYMd0

As you can see, it does appear to work, but it's a problem of doing so
efficiently now.  The solution I have around line 192 to 208 performs
total_diff() for a given id against all others, but with some constants
removed. It does work, but for a statistic table of 1,000,000 rows and
a statistics_child table of 101,948,612, the query takes approximately
4m 11s on my machine.

I'd like to improve this. I wonder if providing indexes on some of the
columns in the two views would be helpful?

--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com

Вложения

signature.asc

В списке pgsql-novice по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Tricky math in PSQL

Вложения