Thread: tsvector/tsearch equality and/or portability issue issue ?

tsvector/tsearch equality and/or portability issue issue ?

From
Stefan Kaltenbrunner
Date:
We just had a complaint on IRC that:

devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
 ?column?
----------
 f
(1 row)

and that searches for certain values would not return all matches under
some circumstances.

a little bit of testing shows the following:

postgres=# create table foo (bla tsvector);
CREATE TABLE
postgres=# insert into foo values ('bla bla');
INSERT 0 1
postgres=# insert into foo values ('bla bla');
INSERT 0 1
postgres=# select bla from foo group by bla;
  bla
-------
 'bla'
(1 row)

postgres=# create index foo_idx on foo(bla);
CREATE INDEX
postgres=# set enable_seqscan to off;
SET
postgres=# select bla from foo group by bla;
  bla
-------
 'bla'
 'bla'
(2 rows)

postgres=# set enable_seqscan to on;
SET
postgres=# select bla from foo group by bla;
  bla
-------
 'bla'
(1 row)

ouch :-(

I can reproduce that at least on OpenBSD/i386 and Debian Etch/x86_64.

It is also noteworthy that the existing regression tests for tsearch2 do
not seem to do any equality testing ...


Stefan


Re: tsvector/tsearch equality and/or portability issue issue ?

From
"Andrew J. Kopciuch"
Date:
On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
> We just had a complaint on IRC that:
>
> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>  ?column?
> ----------
>  f
> (1 row)
>


This could be an endianness issue?

This was probably the same person who posted this on the OpenFTS list.

He's compiled from source :

<snip>
dew=# select version();
PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250)
 
</snip>

I don't have any access to an OSX box to verify things ATM.  I am trying to 
get access to one though.  :S   Can someone else verify this right now?



Andy


Re: tsvector/tsearch equality and/or portability issue

From
Teodor Sigaev
Date:
> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>  ?column?
> ----------
>  f
> (1 row)

Fixed in 8.1 and HEAD. Thank you

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: tsvector/tsearch equality and/or portability issue issue ?

From
AgentM
Date:
On Aug 24, 2006, at 12:58 , Andrew J. Kopciuch wrote:

> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>> We just had a complaint on IRC that:
>>
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>>  ?column?
>> ----------
>>  f
>> (1 row)
>>
>
>
> This could be an endianness issue?
>
> This was probably the same person who posted this on the OpenFTS list.
>
> He's compiled from source :
>
> <snip>
> dew=# select version();
> PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC
>  powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.  
> build
>  5250)
> </snip>
>
> I don't have any access to an OSX box to verify things ATM.  I am  
> trying to
> get access to one though.  :S   Can someone else verify this right  
> now?

Stefan said he reproduced on OpenBSD/i386 so it is unlikely to be an  
endianness issue. Anyway, here's the comparison code- I guess it  
doesn't use strcmp to avoid encoding silliness. (?)

static int
silly_cmp_tsvector(const tsvector * a, const tsvector * b)
{
        if (a->len < b->len)
                return -1;
        else if (a->len > b->len)
                return 1;
        else if (a->size < b->size)
                return -1;
        else if (a->size > b->size)
                return 1;
        else
        {
                unsigned char *aptr = (unsigned char *) (a->data) + DATAHDRSIZE;
                unsigned char *bptr = (unsigned char *) (b->data) + DATAHDRSIZE;

                while (aptr - ((unsigned char *) (a->data)) < a->len)
                {
                        if (*aptr != *bptr)
                                return (*aptr < *bptr) ? -1 : 1;
                        aptr++;
                        bptr++;
                }
        }
        return 0;
}



Re: tsvector/tsearch equality and/or portability issue issue ?

From
Tom Lane
Date:
"Andrew J. Kopciuch" <akopciuch@bddf.ca> writes:
> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>> ?column?
>> ----------
>> f
>> (1 row)

> This could be an endianness issue?

Apparently not, it works for me on HPPA (big endian) and on Darwin/PPC
(ditto).  I'm testing CVS HEAD though, not 8.1 branch.

However ... I also see that tsearch2's regression test is dumping
core on my OS X machine.  I haven't cvs update'd for awhile on this
machine though --- will bring it to HEAD and report back.

Can some other people try this?  We need to get a handle on which
machines show the problem.
        regards, tom lane


Re: tsvector/tsearch equality and/or portability issue

From
Stefan Kaltenbrunner
Date:
Teodor Sigaev wrote:
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>>  ?column?
>> ----------
>>  f
>> (1 row)
> 
> Fixed in 8.1 and HEAD. Thank you

thanks for the fast response - would it maybe be worthwhile to add
regression tests for this kind of thing though ?


Stefan


Re: tsvector/tsearch equality and/or portability issue

From
Teodor Sigaev
Date:
> Stefan said he reproduced on OpenBSD/i386 so it is unlikely to be an 
> endianness issue. Anyway, here's the comparison code- I guess it doesn't 
> use strcmp to avoid encoding silliness. (?)

I suppose the ordering for the tsvector type is somewhat strange and doesn't
really matter. For me, it's a mystery why it's needed :)
The cause of the bug: some internal parts of a tsvector must be short-aligned,
so there were unused (padding) bytes. The previous comparison function compared
them too...
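
The effect of comparing alignment padding can be shown with a minimal C sketch. This is not the tsearch2 code: the 6-byte layout and the names build_entry, naive_equal and field_equal are hypothetical, chosen only for illustration. It assumes a 3-byte lexeme followed by one unused byte so that a 2-byte position lands on an even offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout (NOT the real tsvector format):
 * bytes 0..2  lexeme, byte 3  alignment padding, bytes 4..5  uint16_t position */
#define LEXEME_LEN 3
#define POS_OFFSET 4
#define BUF_LEN    6

/* Fill the buffer with "junk" first to simulate memory that was never
 * cleared, then write only the meaningful fields. Byte 3 keeps the junk. */
void
build_entry(unsigned char *buf, unsigned char junk,
            const char *lexeme, uint16_t pos)
{
    memset(buf, junk, BUF_LEN);
    memcpy(buf, lexeme, LEXEME_LEN);
    memcpy(buf + POS_OFFSET, &pos, sizeof(pos));
}

/* Byte-by-byte comparison over the whole buffer, padding included. */
int
naive_equal(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, BUF_LEN) == 0;
}

/* Comparison that looks only at the meaningful fields and skips the padding. */
int
field_equal(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, LEXEME_LEN) == 0 &&
           memcmp(a + POS_OFFSET, b + POS_OFFSET, sizeof(uint16_t)) == 0;
}
```

Building the same logical entry twice with different junk values (for example 0x00 and 0xAA) makes naive_equal report a difference while field_equal reports equality. An alternative design, under the same assumptions, is to zero the buffer when the value is constructed so that a plain byte comparison stays valid.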



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: tsvector/tsearch equality and/or portability issue

From
"Joshua D. Drake"
Date:
Tom Lane wrote:
> "Andrew J. Kopciuch" <akopciuch@bddf.ca> writes:
>> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>>> ?column?
>>> ----------
>>> f
>>> (1 row)
> 
>> This could be an endianness issue?
> 
> Apparently not, it works for me on HPPA (big endian) and on Darwin/PPC
> (ditto).  I'm testing CVS HEAD though, not 8.1 branch.
> 
> However ... I also see that tsearch2's regression test is dumping
> core on my OS X machine.  I haven't cvs update'd for awhile on this
> machine though --- will bring it to HEAD and report back.
> 
> Can some other people try this?  We need to get a handle on which
> machines show the problem.

I am trying on current copy of HEAD.. however:

jd@scratch:~/pgsqldev$ bin/psql -U postgres postgres < 
share/contrib/tsearch2.sql
SET
BEGIN
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 
"pg_ts_dict_pkey" for table "pg_ts_dict"
CREATE TABLE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
INSERT 57434167 1
CREATE FUNCTION
CREATE FUNCTION
INSERT 57434170 1
ERROR:  could not find function "snb_ru_init_koi8" in file 
"/usr/local/pgsql/lib/tsearch2.so"
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block

I will try on 8.1 in a moment.

Joshua D. Drake



> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faq
> 


-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/
 




Re: tsvector/tsearch equality and/or portability issue

From
"Joshua D. Drake"
Date:
> Can some other people try this?  We need to get a handle on which
> machines show the problem.

d@scratch:~/pgsqldev$ /usr/local/pgsql/bin/psql -U postgres postgres
Welcome to psql 8.1.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit
 

postgres=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
 ?column?
----------
 t
(1 row)

postgres=#


AMD 64 X2, Ubuntu Dapper LTS.

Sincerely,

Joshua D. Drake







Re: tsvector/tsearch equality and/or portability issue

From
"Joshua D. Drake"
Date:
>> Can some other people try this?  We need to get a handle on which
>> machines show the problem.
> 
> I am trying on current copy of HEAD.. however:

Ignore the below... This is an error with my linker/ld.so.conf

Joshua D. Drake

> 
> jd@scratch:~/pgsqldev$ bin/psql -U postgres postgres < 
> share/contrib/tsearch2.sql
> SET
> BEGIN
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 
> "pg_ts_dict_pkey" for table "pg_ts_dict"
> CREATE TABLE
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> INSERT 57434167 1
> CREATE FUNCTION
> CREATE FUNCTION
> INSERT 57434170 1
> ERROR:  could not find function "snb_ru_init_koi8" in file 
> "/usr/local/pgsql/lib/tsearch2.so"
> ERROR:  current transaction is aborted, commands ignored until end of 
> transaction block
> ERROR:  current transaction is aborted, commands ignored until end of 
> transaction block
> 
> I will try on 8.1 in a moment.
> 
> Joshua D. Drake

Re: tsvector/tsearch equality and/or portability issue

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> Fixed in 8.1 and HEAD. Thank you

This appears to have created a regression test failure:

*** ./expected/tsearch2.out    Sun Jun 18 12:55:28 2006
--- ./results/tsearch2.out    Thu Aug 24 14:30:02 2006
***************
*** 2496,2503 ****
   f        | 
   f        | '345':1 'qwerti':2 'copyright':3
   f        | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
-  f        | 'a':1A,2,3C 'b':5A,6B,7C,8B
   f        | 'a':1A,2,3B 'b':5A,6A,7C,8
   f        | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
   f        | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
   f        | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
--- 2496,2503 ----
   f        | 
   f        | '345':1 'qwerti':2 'copyright':3
   f        | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
   f        | 'a':1A,2,3B 'b':5A,6A,7C,8
+  f        | 'a':1A,2,3C 'b':5A,6B,7C,8B
   f        | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
   f        | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
   f        | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'

======================================================================

        regards, tom lane


Re: tsvector/tsearch equality and/or portability issue

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
>>> Can some other people try this?  We need to get a handle on which
>>> machines show the problem.
>> 
>> I am trying on current copy of HEAD.. however:

Looks like Teodor already solved the problem, so no need for a fire
drill anymore.
        regards, tom lane


Re: tsvector/tsearch equality and/or portability issue

From
Teodor Sigaev
Date:
Oops. Fixed.

Tom Lane wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
>> Fixed in 8.1 and HEAD. Thank you
> 
> This appears to have created a regression test failure:
> 
> *** ./expected/tsearch2.out    Sun Jun 18 12:55:28 2006
> --- ./results/tsearch2.out    Thu Aug 24 14:30:02 2006
> ***************
> *** 2496,2503 ****
>    f        | 
>    f        | '345':1 'qwerti':2 'copyright':3
>    f        | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
> -  f        | 'a':1A,2,3C 'b':5A,6B,7C,8B
>    f        | 'a':1A,2,3B 'b':5A,6A,7C,8
>    f        | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
>    f        | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
>    f        | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
> --- 2496,2503 ----
>    f        | 
>    f        | '345':1 'qwerti':2 'copyright':3
>    f        | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
>    f        | 'a':1A,2,3B 'b':5A,6A,7C,8
> +  f        | 'a':1A,2,3C 'b':5A,6B,7C,8B
>    f        | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
>    f        | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
>    f        | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
> 
> ======================================================================
> 
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: tsvector/tsearch equality and/or portability issue

From
Phil Frost
Date:
On Thu, Aug 24, 2006 at 09:40:13PM +0400, Teodor Sigaev wrote:
> >devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
> > ?column?
> >----------
> > f
> >(1 row)
> 
> Fixed in 8.1 and HEAD. Thank you

Things still seem to be broken for me. Among other things, the script at
<http://unununium.org/~indigo/testvectors.sql.bz2> fails. It performs two
tests, comparing 1000 random vectors with positions and random weights, and
comparing the same vectors, but stripped. Oddly, the unstripped comparisons all
pass, which is not consistent with what I am seeing in my database. However,
I'm yet unable to reproduce those problems.

It's worth noting that in running this script I have seen the number of
failures change, which seems to indicate that some uninitialized memory
is still being compared.

test=# \i testvectors.sql 
BEGIN
CREATE FUNCTION
CREATE TABLE
 total vectors in test set 
---------------------------
                      1000
(1 row)

 failing unstripped equality 
-----------------------------
                           0
(1 row)

 failing stripped equality 
---------------------------
                       389
(1 row)

ROLLBACK
test=# 


Re: tsvector/tsearch equality and/or portability issue

From
Tom Lane
Date:
Phil Frost <indigo@bitglue.com> writes:
> Things still seem to be broken for me. Among other things, the script at
> <http://unununium.org/~indigo/testvectors.sql.bz2> fails. It performs two
> tests, comparing 1000 random vectors with positions and random weights, and
> comparing the same vectors, but stripped. Oddly, the unstripped comparisons all
> pass, which is not consistent with what I am seeing in my database. However,
> I'm yet unable to reproduce those problems.

It looks to me like tsvector comparison may be too strong.  The strip()
function evidently thinks that it's OK to rearrange the string chunks
into the same order as the WordEntry items, which suggests to me that
the "pos" fields are not really semantically significant.  But 
silly_cmp_tsvector() considers that a difference in pos values is
important.  I don't understand the data structure well enough to know
which one to believe, but something's not consistent here.
        regards, tom lane


Re: tsvector/tsearch equality and/or portability issue

From
Teodor Sigaev
Date:
>> comparing the same vectors, but stripped. Oddly, the unstripped comparisons all
>> pass, which is not consistent with what I am seeing in my database. However,
>> I'm yet unable to reproduce those problems.

Fixed: strncmp was called with the wrong length parameter.

> 
> It looks to me like tsvector comparison may be too strong.  The strip()
> function evidently thinks that it's OK to rearrange the string chunks
> into the same order as the WordEntry items, which suggests to me that
> the "pos" fields are not really semantically significant.  But 
> silly_cmp_tsvector() considers that a difference in pos values is
> important.  I don't understand the data structure well enough to know
> which one to believe, but something's not consistent here.

You are right: pos really just means the position of the lexeme itself in the tail
of the tsvector structure. So it has been removed from the comparison.
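
A comparison along these lines might look like the following C sketch. It is an illustration under stated assumptions, not the committed fix: Entry, Vec and cmp_vec are hypothetical, simplified stand-ins for the real WordEntry and tsvector definitions. Each entry records where its lexeme sits in a string area (pos) and how long it is (len). The comparison uses pos only to locate the bytes and compares exactly len bytes of each lexeme.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a WordEntry. */
typedef struct
{
    unsigned pos;   /* offset of the lexeme in the string area */
    unsigned len;   /* lexeme length in bytes */
} Entry;

/* Hypothetical, simplified stand-in for a tsvector. */
typedef struct
{
    int          size;      /* number of entries */
    const Entry *entries;   /* entries, assumed sorted by lexeme */
    const char  *strings;   /* lexeme bytes, not NUL-terminated */
} Vec;

/* Returns -1, 0 or 1. The pos values themselves never influence the result;
 * they only say where each lexeme is stored. */
int
cmp_vec(const Vec *a, const Vec *b)
{
    if (a->size != b->size)
        return a->size < b->size ? -1 : 1;

    for (int i = 0; i < a->size; i++)
    {
        const Entry *ea = &a->entries[i];
        const Entry *eb = &b->entries[i];

        if (ea->len != eb->len)
            return ea->len < eb->len ? -1 : 1;

        /* Compare exactly len bytes; a longer length would read into the
         * neighbouring lexeme or past the end of the string area. */
        int r = memcmp(a->strings + ea->pos, b->strings + eb->pos, ea->len);

        if (r != 0)
            return r < 0 ? -1 : 1;
    }
    return 0;
}
```

With this sketch, two vectors holding the lexemes "bar" and "foo" compare equal whether the string area is laid out as "barfoo" or as "foobar", which is the situation Tom describes for stripped vectors.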

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/