BUG #6455: Wrong match of ipsell dict.
From | vincent.desmares@inovia-team.com |
---|---|
Subject | BUG #6455: Wrong match of ipsell dict. |
Date | |
Msg-id | E1RwvqG-0000fE-Oj@wrigleys.postgresql.org |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 6455
Logged by: Desmares Vincent
Email address: vincent.desmares@inovia-team.com
PostgreSQL version: 9.1.0
Operating system: Ubuntu
Description:

Hello everyone,

We recently discovered something that could be a bug in the full text search of Postgres, more precisely in the ispell dictionary.

It appears that words made of a single repeated character (like “a”, “aa”, “aaa”, ...) trigger all the prefix and suffix rules even though nothing has been specified for them in the dictionary.

We hit the bug with the word “e”, which was returned as the lexeme for the word “deer”.

Here is a short way to reproduce the bug from scratch:

# 1) Create a test.dict with only “e” inside
    echo "e" > test.dict

# 2) Create an empty test.stop file
    touch test.stop

# 3) Create a test.affix file with rules:
    echo -e 'PFX C Y 1\nPFX C 0 de .\n\nSFX R Y 1\nSFX R 0 r e\n' > test.affix

# 4) Execute these statements:
    DROP TEXT SEARCH DICTIONARY IF EXISTS testispell CASCADE;
    CREATE TEXT SEARCH DICTIONARY testispell (
        TEMPLATE = ispell,
        DictFile = test,
        AffFile = test,
        StopWords = test
    );
    CREATE TEXT SEARCH CONFIGURATION test_ispell (
        PARSER = "default"
    );
    ALTER TEXT SEARCH CONFIGURATION test_ispell ADD MAPPING FOR asciihword WITH testispell;
    ALTER TEXT SEARCH CONFIGURATION test_ispell ADD MAPPING FOR asciiword WITH testispell;
    ALTER TEXT SEARCH CONFIGURATION test_ispell ADD MAPPING FOR uint WITH testispell;
    ALTER TEXT SEARCH CONFIGURATION test_ispell ADD MAPPING FOR word WITH testispell;
    SELECT * FROM ts_debug('test_ispell', 'deer');

# 5) You should get a table with this result:
    alias        : "asciiword"
    description  : "Word, all ASCII"
    token        : "deer"
    dictionaries : "{testispell}"
    dictionary   : "testispell"
    lexemes      : "{e}"

It appears to be reproducible with more characters of the same letter:
- .dict with [ee], searching for [deeer], gives [ee], but
- .dict with [ee], searching for [eer|deee], gives nothing

Did we miss a configuration or a default behavior, or is there really a bug?

Regards,
Vincent Desmares
Developer @ Inovia-team
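
For anyone retrying the report, steps 1 to 3 can be sketched as one small script. This is only a convenience sketch, not part of the original report: it uses `printf` instead of `echo -e` (whose behavior differs between shells) and writes the files into the current directory. The final hint assumes a standard installation in which the server reads dictionary files from the `tsearch_data` directory under the path printed by `pg_config --sharedir`; that location is an assumption and may differ on a given system.

```shell
#!/bin/sh
# Sketch: create the three input files from the report in the current directory.
set -eu

# Dictionary: the single word "e", with no affix flags attached to it.
printf 'e\n' > test.dict

# Empty stop-word list.
: > test.stop

# Affix rules from the report:
#   prefix flag C prepends "de" to any word,
#   suffix flag R appends "r" to words ending in "e".
printf 'PFX C Y 1\nPFX C 0 de .\n\nSFX R Y 1\nSFX R 0 r e\n' > test.affix

# Print the affix file so the rules can be checked by eye.
cat test.affix

# Assumption: the server loads these files from <sharedir>/tsearch_data.
# Copying usually needs elevated privileges, so the command is only printed.
echo 'Next step (not run here): cp test.dict test.stop test.affix "$(pg_config --sharedir)/tsearch_data/"'
```

Once the files are in place, the SQL statements from step 4 can be run unchanged.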