RE: Joining 1-minute data with 5-minute data
От | Stephen Froehlich |
---|---|
Тема | RE: Joining 1-minute data with 5-minute data |
Дата | |
Msg-id | MWHPR0601MB3657055AB134ADCFB46CD0EFE50A0@MWHPR0601MB3657.namprd06.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Joining 1-minute data with 5-minute data (Laurenz Albe <laurenz.albe@cybertec.at>) |
Список | pgsql-novice |
That's a solid plan ... thanks. -----Original Message----- From: Laurenz Albe <laurenz.albe@cybertec.at> Sent: Tuesday, January 28, 2020 1:44 AM To: Stephen Froehlich <s.froehlich@cablelabs.com>; pgsql-novice@lists.postgresql.org Subject: Re: Joining 1-minute data with 5-minute data On Mon, 2020-01-27 at 16:35 +0000, Stephen Froehlich wrote: > I have a couple of relatively large tables, each with 100-500 million lines (at least in each monthly partition). > > One has data every 1 minute, and the other has data every 5 minutes, > and I’d like to be able to join them (i.e. with each minute in the 5-minute span rounded down to the beginning of that5-minute interval). > > I’m currently running PostgreSQL 11. An upgrade to 12 (for calculated fields) is possible but annoying at the moment. > (i.e. I’ll do it if its worth it, but I’m otherwise planning on > holding off until the Ubuntu 20.40LTS release for that upgrade > process.) > > What is the most efficient (i.e. performant) way to do that join? > - Create an index for the 1-min table something like (trunc(time_stamp::int / 300) * 300)::timestamp with time zone > - Is there a more efficient way to round to 5 minutes? > - Encode the time stamp for the 5-min table as a tstzrange and create a gist index on that column? > - Manually add a 5-minute rounded column to the 1-minute table and index that? > - Something I have missed entirely? Depending on the number of rows required from each table, an index may not be useful at all: with a hash join, indexes don'thelp. You should make sure that the join condition looks like this: (expression with columns of the 1-minute table) = (expression with columns of the 5-minute table) Otherwise, PostgreSQL can only use a nested loop join, which may not be the best strategy. Then experiment with indexes on the expressions in the join condition: nested loop joins and merge joins can profit from them. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
В списке pgsql-novice по дате отправления: