Thread: prefix search in tsearch
[docs from cvs HEAD]

I found the text-search documentation a little unclear about 'prefix search'; specifically, the
examples do not show that the so-called 'prefix' is first stemmed, before it is used as prefix.

For instance, the following can be a little surprising:

SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
 ?column?
----------
 t
(1 row)

Because prefix search is such an important functionality I think this should be better explained,
which I hope the attached doc-patch does.

(In textsearch.sgml is another mention + example of prefix search, perhaps it should be extended a
little there too - which I'm happy to do as well, but I first wanted to see if you agree that it
is a little too obscure as it stands)

Erik Rijkers
Attachments
Erik,
I think it would be clearer to say not 'stemmed' but 'processed
according to the configuration'. Here is an example:
$SHAREDIR/tsearch_data/my_synonyms.syn contains one line:
one 1
CREATE TEXT SEARCH DICTIONARY my_synonym (
TEMPLATE = synonym,
SYNONYMS = my_synonyms
);
ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR asciiword
WITH my_synonym, english_stem;
test=# select 'one'::tsvector @@ to_tsquery('english','one:*');
?column?
----------
f
(1 row)
because 'one' was processed by the my_synonym dictionary, which replaces it
with '1' before the prefix comparison is made:
test=# select ts_debug('english','one');
ts_debug
------------------------------------------------------------------------------
(asciiword,"Word, all ASCII",one,"{my_synonym,english_stem}",my_synonym,{1})
(1 row)
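As a rough sketch of how one might confirm this, assuming the my_synonym
dictionary and the altered english mapping from the example above are in
place: to_tsquery shows the prefix after configuration processing, while a
direct cast to tsquery skips that processing entirely.

```sql
-- Inspect the query after the configuration has processed the prefix.
-- With the my_synonym mapping above, the prefix should come back as '1':*
-- rather than 'one':*.
SELECT to_tsquery('english', 'one:*');

-- A plain cast to tsquery does no dictionary processing, so the prefix
-- stays 'one' and is compared against the raw 'one' lexeme.
SELECT 'one'::tsvector @@ 'one:*'::tsquery;
```

The second form can be useful when the prefix has to be matched literally,
but it also means the caller is responsible for any lowercasing or other
normalization.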
On Tue, 31 Aug 2010, Erik Rijkers wrote:
> [docs from cvs HEAD]
>
> I found the text-search documentation a little unclear about 'prefix search'; specifically, the
> examples do not show that the so-called 'prefix' is first stemmed, before it is used as prefix.
>
> For instance, the following can be a little surprising:
>
> SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
> ?column?
> ----------
> t
> (1 row)
>
> Because prefix search is such an important functionality I think this should be better explained,
> which I hope the attached doc-patch does.
>
> (In textsearch.sgml is another mention + example of prefix search, perhaps it should be extended a
> little there too - which I'm happy to do as well, but I first wanted to see if you agree that it
> is a little too obscure as it stands)
>
>
> Erik Rijkers
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
I applied a modified documentation patch (attached) that includes Oleg's
suggestions.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 2bf411d..10f0e59 100644
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** SELECT 'super:*'::tsquery;
*** 3847,3853 ****
'super':*
</programlisting>
This query will match any word in a <type>tsvector</> that begins
! with <quote>super</>.
</para>
<para>
--- 3847,3874 ----
'super':*
</programlisting>
This query will match any word in a <type>tsvector</> that begins
! with <quote>super</>.
! </para>
!
! <para>
! Note that text search configuration processing happens before
! comparisons, which means this comparison returns <literal>true</>:
! <programlisting>
! SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
! ?column?
! ----------
! t
! (1 row)
! </programlisting>
! because <literal>postgres</> gets stemmed to <literal>postgr</>:
! <programlisting>
! SELECT to_tsquery('postgres:*');
! to_tsquery
! ------------
! 'postgr':*
! (1 row)
! </programlisting>
! which then matches <literal>postgraduate</>.
</para>
<para>
I came up with some better wording, which I have applied:
This query will match any word in a <type>tsvector</> that begins
with <quote>super</>. Note that prefixes are first processed by
text search configurations, which means this comparison returns
true:
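(The sentence above introduces the 'postgraduate' query from the start of
this thread.) One way to see that the match comes from configuration
processing rather than from prefix matching itself is to repeat the query
with the built-in simple configuration, which does no stemming. This is only
a sketch; it assumes the original example ran with english as the default
text search configuration.

```sql
-- 'simple' only lowercases its input, so the prefix is kept as 'postgres'.
SELECT to_tsquery('simple', 'postgres:*');

-- 'postgres' is not a prefix of 'postgraduate', so this should not match,
-- unlike the same query run through the english configuration.
SELECT to_tsvector('simple', 'postgraduate') @@ to_tsquery('simple', 'postgres:*');
```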