Re: Counting the number of repeated phrases in a column
From | Merlin Moncure |
---|---|
Subject | Re: Counting the number of repeated phrases in a column |
Date | |
Msg-id | CAHyXU0x501igQ2x_83wTfsL1pg0e9aRYwr0ScxSR+w95r8CJPg@mail.gmail.com |
In reply to | Counting the number of repeated phrases in a column (Shaozhong SHI <shishaozhong@gmail.com>) |
Replies | Re: Counting the number of repeated phrases in a column |
List | pgsql-general |
On Tue, Jan 25, 2022 at 11:10 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
>
> Standard Postgres lacks a function to do the following:
>
> It is easy to count the number of occurrences of words, but it is rather difficult to count the number of occurrences of phrases.
>
> For instance:
>
> A cell of value: 'Hello World' means 1 occurrence of a phrase.
>
> A cell of value: 'Hello World World Hello' means no occurrence of any repeated phrase.
>
> But a cell of value: 'Hello World World Hello Hello World' means 2 occurrences of 'Hello World'.
>
> 'The City of London, London' also has no occurrences of any repeated phrase.
>
> Has anyone got such a function to count the number of occurrences of any repeated phrases?

Let's define a phrase as a sequence of two or more words, delimited by spaces. You could find them with something like:

    with s as (select 'Hello World Hello World' as sentence)
    select
      phrase,
      array_upper(string_to_array((select sentence from s), phrase), 1) - 1 as occurrences
    from
    (
      select array_to_string(x, ' ') as phrase
      from
      (
        select distinct v[a:b] x
        from regexp_split_to_array((select sentence from s), ' ') v
        cross join lateral generate_series(1, array_upper(v, 1)) a
        cross join lateral generate_series(a + 1, array_upper(v, 1)) b
      ) q
    ) q;

This would obviously be slow for large sentences, and you'd probably want to prepare the string first by stripping some characters and such.

merlin
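The query above lists every phrase, including those that occur only once, while the original question asks only about phrases that repeat. A minimal sketch, assuming PostgreSQL 9.3 or later (for LATERAL and named parameters in SQL functions), wraps the same technique in a function and keeps only phrases with more than one occurrence. The name `repeated_phrases` and its output columns are hypothetical, not built-ins.

```sql
-- Hypothetical helper (not a built-in): returns each phrase of two or more
-- space-delimited words that occurs more than once in `sentence`.
-- Assumes PostgreSQL 9.3+ (LATERAL; named parameters in SQL functions).
CREATE OR REPLACE FUNCTION repeated_phrases(sentence text)
RETURNS TABLE (phrase text, occurrences integer)
LANGUAGE sql
STABLE
AS $$
    SELECT c.phrase, c.occurrences
    FROM (
        SELECT p.phrase,
               -- Splitting on the phrase yields (non-overlapping matches + 1) pieces.
               array_upper(string_to_array(sentence, p.phrase), 1) - 1 AS occurrences
        FROM (
            -- Every distinct run of two or more consecutive words.
            SELECT DISTINCT array_to_string(w.words[a:b], ' ') AS phrase
            FROM regexp_split_to_array(sentence, ' ') AS w(words)
            CROSS JOIN LATERAL generate_series(1, array_upper(w.words, 1)) AS a
            CROSS JOIN LATERAL generate_series(a + 1, array_upper(w.words, 1)) AS b
        ) AS p
    ) AS c
    -- Keep only phrases that actually repeat.
    WHERE c.occurrences > 1;
$$;
```

To apply it to a column, something like `SELECT t.id, r.phrase, r.occurrences FROM my_table t CROSS JOIN LATERAL repeated_phrases(t.cell) AS r;` should work, where `my_table`, `id`, and `cell` are placeholder names. Like the original query, it counts non-overlapping substring matches rather than word-aligned ones, so a phrase can also match inside a longer word (for example, 'ab c' is counted twice in 'ab c ab cd').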