Re: Removing redundant itemsets
От | Allan Kamau |
---|---|
Тема | Re: Removing redundant itemsets |
Дата | |
Msg-id | 47F0B5DA.7020101@sanbi.ac.za обсуждение исходный текст |
Ответ на | Re: Removing redundant itemsets (Craig Ringer <craig@postnewspapers.com.au>) |
Список | pgsql-sql |
Craig, Thank you so much for the solution. I have spent many hours since Thursday last week including the weekend (and it took you just a few minutes) trying to figure out a solution not involving procedural programming and looping (as the size of the items and even the number of "purchases" in the datasets I may be working with may be large), I was looking for a solution that may take (almost) polynomial time (and resources) and also make use of Postgresql refined and efficient engine. Your solution satisfies these requirements. Thanks. Allan. Craig Ringer wrote: > Allan Kamau wrote: > >> Hi all, >> I have a list of purchases (market basket) and I would like to select >> non redundant longest possible patterns by eliminating >> (creating/populating other table to contain only non redandant itemsets) >> purchases having item lists which are fully included in at least one >> other purchase. >> > > Here's a possibly slow and surely ugly solution (I think it's right, > though I haven't done more than passing testing): > > > > CREATE VIEW togo_as_arr AS > SELECT a.tid, > ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item) > AS items > FROM togo a GROUP BY tid; > > SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by > FROM togo_as_arr arr_a CROSS JOIN togo_as_arr arr_b > WHERE arr_a.tid <> arr_b.tid AND arr_a.items <@ arr_b.items; > > > > (the view isn't necessary, but does improve the readability of the query). > > It groups the purchases up with item lists as arrays, then finds any > purchases with items arrays wholly contained by other item arrays from > other purchases. > > I'm *sure* there's a smarter way to do this that avoids the use of > arrays, but I don't seem to be able to come up with one right now. It's > interesting, though, so I might keep fiddling. > > -- > Craig Ringer >
В списке pgsql-sql по дате отправления: