Обсуждение: text type has no default operator class for GIN?
CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
("ufile_name");
ERROR: data type text has no default operator class for access method
"gin"
HINT: You must specify an operator class for the index or define a
default operator class for the data type.
This is on a new 8.3 install.
Why is this happening? Isn't creating a GIN index on a text type a
common thing to do?
I'm not seeing any mention of this in 8.3.7 docs cha 12 (Full Text
Search).
Do I really have to CREATE OPERATOR CLASS to make this work?
I see http://www.postgresql.org/docs/8.3/interactive/sql-createopclass.html
many thanks,
bob
On Aug 18, 2009, at 3:46 PM, Gobeille, Robert wrote:
> CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
> ("ufile_name");
>
> ERROR: data type text has no default operator class for access method
> "gin"
> HINT: You must specify an operator class for the index or define a
> default operator class for the data type.
>
> This is on a new 8.3 install.
> Why is this happening? Isn't creating a GIN index on a text type a
> common thing to do?
>
> I'm not seeing any mention of this in 8.3.7 docs cha 12 (Full Text
> Search).
>
> Do I really have to CREATE OPERATOR CLASS to make this work?
> I see http://www.postgresql.org/docs/8.3/interactive/sql-createopclass.html
I still don't understand the above, but the following works:
create index ufile_name_ginidx on uploadtree using
gin(to_tsvector('english', ufile_name));
Bob
On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:
> On Aug 18, 2009, at 3:46 PM, Gobeille, Robert wrote:
> >CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
> >("ufile_name");
> >
> >ERROR: data type text has no default operator class for access method
> >"gin"
> >HINT: You must specify an operator class for the index or define a
> >default operator class for the data type.
[..]
> I still don't understand the above, but the following works:
>
> create index ufile_name_ginidx on uploadtree using
> gin(to_tsvector('english', ufile_name));
Not sure if understand very well myself, but GIN indexes can only speed
up specific access patterns and these are exposed through various
different operators.
When PG refuses to create a GIN index on a plain TEXT column it's saying
that it doesn't how to use those operators with a values of TEXT type.
As soon as you pull this value apart (with the to_tsvector) you end up
with something that PG can get some traction on and all is good.
Maybe a useful question to ask is, what are you expecting PG do to when
you create a GIN index on this TEXT column?
--
Sam http://samason.me.uk/
Sam Mason <sam@samason.me.uk> writes:
> On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:
>>> CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
>>> ("ufile_name");
>>> ERROR: data type text has no default operator class for access method
>>> "gin"
> Not sure if understand very well myself, but GIN indexes can only speed
> up specific access patterns and these are exposed through various
> different operators.
What GIN indexes are good for is indexing equality queries on the
components of something the database otherwise thinks of as a single
object. For instance you can GIN-index searches for arrays containing
a particular value as a member.
Now type text doesn't have any built-in notion of a component, other
than individual characters, which aren't normally that interesting
to search for. What I suppose the OP has in mind is full-text
searching, which is looking for component *words*. But "word" is a
very language- and context-dependent concept. And defining which words
are to be considered equal for searching purposes is even more so.
If we'd hard-wired one notion of "word" into datatype text, it wouldn't
be very flexible. The point of the tsvector layer is to have a
configurable way to extract searchable words from a chunk of text.
There are also some implementation advantages like not having to repeat
that processing constantly during a search --- but the main point is
having a place to define what a word is and what search equality means.
regards, tom lane
On Tue, Aug 18, 2009 at 08:21:49PM -0400, Tom Lane wrote:
> Sam Mason <sam@samason.me.uk> writes:
> > On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:
> >>> CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
> >>> ("ufile_name");
> >>> ERROR: data type text has no default operator class for access method
> >>> "gin"
>
> > Not sure if understand very well myself, but GIN indexes can only speed
> > up specific access patterns and these are exposed through various
> > different operators.
>
> What GIN indexes are good for is indexing equality queries on the
> components of something the database otherwise thinks of as a single
> object. For instance you can GIN-index searches for arrays containing
> a particular value as a member.
Yup, that's a much better description than I could muster!
> What I suppose the OP has in mind is full-text
> searching, which is looking for component *words*. But "word" is a
> very language- and context-dependent concept.
That's what I was trying to get the OP to think about when I said "what
are you expecting PG do to when you create a GIN index on this TEXT
column" but reading it back now I was being my normal oblique self.
Writing nice emails is depressingly difficult!
--
Sam http://samason.me.uk/
On Aug 18, 2009, at 6:21 PM, Tom Lane wrote: > What GIN indexes are good for is indexing equality queries on the > components of something the database otherwise thinks of as a single > object. For instance you can GIN-index searches for arrays containing > a particular value as a member. > > Now type text doesn't have any built-in notion of a component, other > than individual characters, which aren't normally that interesting > to search for. What I suppose the OP has in mind is full-text > searching, which is looking for component *words*. But "word" is a > very language- and context-dependent concept. And defining which > words > are to be considered equal for searching purposes is even more so. > If we'd hard-wired one notion of "word" into datatype text, it > wouldn't > be very flexible. The point of the tsvector layer is to have a > configurable way to extract searchable words from a chunk of text. > There are also some implementation advantages like not having to > repeat > that processing constantly during a search --- but the main point is > having a place to define what a word is and what search equality > means. Yes, I was looking for full text searching in english. Since my postgresql.conf contained: default_text_search_config = 'pg_catalog.english' doesn't this specify the parser, dictionary, and template to use for full text searching in english? I should have mentioned the above in my post but since it was in the original conf file (debian install) I didn't think of it. Bob Gobeille bobg@fossology.org
Bob Gobeille <bob.gobeille@hp.com> writes:
> Yes, I was looking for full text searching in english. Since my
> postgresql.conf contained:
> default_text_search_config = 'pg_catalog.english'
> doesn't this specify the parser, dictionary, and template to use for
> full text searching in english?
It does, but that's a default for tsvector, not bare text.
regards, tom lane