Re: efficient way to do "fuzzy" join
От | Andy Colson |
---|---|
Тема | Re: efficient way to do "fuzzy" join |
Дата | |
Msg-id | 5348235C.2060306@squeakycode.net обсуждение исходный текст |
Ответ на | efficient way to do "fuzzy" join (Rémi Cura <remi.cura@gmail.com>) |
Ответы |
Re: efficient way to do "fuzzy" join
Re: efficient way to do "fuzzy" join |
Список | pgsql-general |
On 4/11/2014 7:50 AM, Rémi Cura wrote: > Hey dear List, > > I'm looking for some advice about the best way to perform a "fuzzy" > join, that is joining two table based on approximate matching. > > It is about temporal matching > given a table A with rows containing data and a control_time (for > instance 1 ; 5; 6; .. sec, not necessarly rounded of evenly-spaced) > > given another table B with lines on no precise timing (eg control_time = > 2.3 ; 5.8 ; 6.2 for example) > > How to join every row of B to A based on > min(@(A.control_time-B.control_time)) > (that is, for every row of B, get the row of A that is temporaly the > closest), > in an efficient way? > (to be explicit, 2.3 would match to 1, 5.8 to 6, 6.2 to 6) > > Optionnaly, how to get interpolation efficiently (meaning one has to get > the previous time and next time for 1 st order interpolation, 2 before > and 2 after for 2nd order interpolation, and so on)? > (to be explicit 5.8 would match to 5 and 6, the weight being 0.2 and 0.8 > respectively) > > > Currently my data is spatial so I use Postgis function to interpolate a > point on a line, but is is far from efficient or general, and I don't > have control on interpolation (only the spatial values are interpolated). > > > Cheers, > Rémi-C Ok, here is a just sql way. No ranges. No idea if its right. A first pass, so to speak. create table a(t float, data text); create table b(t float, data text); insert into a values (1), (5), (6); insert into b values (2.3), (5.8), (6.2); select a.t, b.t from ( select t, least( least(t, mint), least(t, maxt)) as t2 from ( select t, (select t from a where a.t >= b.t order by a.t limit 1) as mint, (select t from a where a.t < b.t order by a.t desc limit 1) as maxt from b ) as tmp ) as tmp2 inner join a on (tmp2.t2 = a.t) inner join b on (tmp2.t = b.t) The middle part is the magic: select t, (select t from a where a.t >= b.t order by a.t limit 1) as mint, (select t from a where a.t < b.t order by a.t desc limit 1) as maxt from b The rest is just to make it usable. If t is indexed, it'll probably be fast too. -Andy
В списке pgsql-general по дате отправления: