Re: Merge rows based on Levenshtein distance
От | David G Johnston |
---|---|
Тема | Re: Merge rows based on Levenshtein distance |
Дата | |
Msg-id | CAKFQuwYR0bAcgaMc-UthUSQ9UTX_TH43Dsq-xz1ZGHhhruEPTA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Merge rows based on Levenshtein distance (pinker <pinker@onet.eu>) |
Ответы |
Re: Merge rows based on Levenshtein distance
|
Список | pgsql-general |
There is nice extension in postgres: fuzzystrmatch I have used to calculate the distance. From documetation:
SELECT levenshtein_less_equal('extensive', 'exhaustive',2);
You can use it then with your group by query.
Something like this - replace the substring(...) comparison with legenshtein_less_equal(...) or whatever comparison you find applicable.
In the case below new groups are started whenever the first letter of the value changes.
The first group would be NULL so I add a COALESCE() call to make it 0 - subsequent groups start with 1 and increment properly.
WITH src (val) AS (
VALUES ('A1'::varchar),('A2'),('B1'),('B2'),('B3'),('C1'),('D1')
)
, grp AS (
SELECT val
, CASE WHEN
substring(val,1,1) <> substring(lag(val) OVER (ORDER BY val),1,1)
THEN 1
ELSE NULL
END AS changed
, ROW_NUMBER() OVER (ORDER BY val) AS val_idx
FROM src
)
SELECT val, COALESCE(sum(changed) OVER (ORDER BY val_idx), 0) AS group_id
FROM grp
;
David J.
View this message in context: Re: Merge rows based on Levenshtein distance
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
В списке pgsql-general по дате отправления: