Re: Recursive merging of overlapping arrays in a column
От | dave |
---|---|
Тема | Re: Recursive merging of overlapping arrays in a column |
Дата | |
Msg-id | 1442764637710-5866579.post@n5.nabble.com обсуждение исходный текст |
Ответ на | Re: Recursive merging of overlapping arrays in a column (Rob Sargent <robjsargent@gmail.com>) |
Ответы |
Re: Recursive merging of overlapping arrays in a column
|
Список | pgsql-sql |
Sorry, here is the post again in plain text... i have the following Table: CREATE TABLE arrays (id SERIAL, arr INT[]); INSERT INTO arrays (arr) VALUES (ARRAY[1,3,6,9]); INSERT INTO arrays (arr) VALUES (ARRAY[2,4]); INSERT INTO arrays (arr) VALUES (ARRAY[3,10,40]); INSERT INTO arrays (arr) VALUES (ARRAY[3,18,44]); INSERT INTO arrays (arr) VALUES (ARRAY[63,140,420]); INSERT INTO arrays (arr) VALUES (ARRAY[42,102,420]); INSERT INTO arrays (arr) VALUES (ARRAY[2,7]); INSERT INTO arrays (arr) VALUES (ARRAY[1,3,11]); INSERT INTO arrays (arr) VALUES (ARRAY[8,12,19]); I want to merge the arrays which have overlapping elements, so that I get the result which doesn't contain overlapping arrays anymore: arr --------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{8,12,19}{42,63,102,140,420} I am not an expert in SQL and it took me a long time to come up with this solution: WITH RECURSIVE clusters AS (select DISTINCT uniq(sort_asc(array_cat(a1.arr, a2.arr))) AS arrfrom arrays a1 cross join arraysa2 where a1.arr && a2.arr ANDleast(a1.id,a2.id) != greatest(a1.id, a2.id)UNIONselect DISTINCT uniq(sort_asc(array_cat(a1.arr,a2.arr))) AS arrfrom arrays a1 cross join clusters a2 where a1.arr && a2.arr AND a1.arr !=a2.arr ) SELECT arr FROM (SELECT * FROM ( SELECT DISTINCT ON (arr[1]) arr FROM clusters ORDER BY arr[1], array_length(arr,1) DESC) AS c UNION SELECT arr FROM arrays WHERE id NOT IN ( SELECT DISTINCT a1.id FROM arrays a1 CROSS JOIN arrays a2 WHERE a1.arr&& a2.arr AND least(a1.id,a2.id) != greatest(a1.id, a2.id)) ) AS clustertable ORDER BY arr; Which gives me the result: arr --------------------------{1,3,6,9,10,11,18,40,44}{2,4,7}{3,10,18,40,44}{8,12,19}{42,63,102,140,420} (5 rows) Result number 3 is contained in number one and shouldn't be in the output anymore, because I only want non overlapping arrays in the result. Another problem I encountered is that the performance of this query seems to be very bad. I tried running it on a larger table (~400000 arrays) and it is still running after ~10h. I would appreciate any input on this problems, so it would be nice if anyone could give me a hint how to get only the merged arrays without overlaps in the resultset and maybe how to build a more elegant and efficient query. Thanks in advance, Dave -- View this message in context: http://postgresql.nabble.com/Recursive-merging-of-overlapping-arrays-in-a-column-tp5866560p5866579.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
В списке pgsql-sql по дате отправления: