Re: Counting the number of repeated phrases in a column

Поиск

Список

Период

Сортировка

От	Merlin Moncure
Тема	Re: Counting the number of repeated phrases in a column
Дата	27 января 2022 г. 23:10:42
Msg-id	CAHyXU0w2-jbeQ5HPmJH87gJHwkYCcG=Q0xTOUEuLO1e25wVh-Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Counting the number of repeated phrases in a column (benj.dev@laposte.net)
Список	pgsql-general

Дерево обсуждения

On Thu, Jan 27, 2022 at 11:56 AM <benj.dev@laposte.net> wrote:
> Le 27/01/2022 à 18:35, Merlin Moncure a écrit :
> > select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences
> > from
> > (
> >    select array_agg(t) v
> >    from
> >    (
> >      select trim(replace(unnest(v), E'\n', '')) t
> >      from regexp_split_to_array(<sentence>, ' ') v
> >    ) q
> >    where length(t) > 1
> > ) q
> > cross join lateral generate_series(1, array_upper(v, 1)) a
> > cross join lateral generate_series(a + 1, array_upper(v, 1)) b
> > group by 1
> > having count(*) > 1;
> >
> > We are definitely in N^2 space here, so look for things to start
> > breaking down for sentences > 1000 words.
> >
> > merlin
> >
>
> (for better complexity) you may search about "Ukkonen suffix tree"
> Similar problem as yours :
> https://www.geeksforgeeks.org/suffix-tree-application-3-longest-repeated-substring/?ref=lbp

Yep.  Many problems like this are well solved in imperative languages
and will fit poorly into SQL quase-functional space.  That
implementation could probably be converted to pl/pgsql pretty easily,
or a 'sql + tables' variant as a fun challenge.  It also slightly
exploits the fact that only the most repeated needle is returned,
rather than all of them.

Having the need to have single statement stateless SQL solutions to
interesting problems comes up all the time in common development
practice though for simplicity's sake even if there are better
approaches out there.  It's also fun.

merlin

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Counting the number of repeated phrases in a column