Re: BUG #16337: Finnish Ispell dictionary cannot be created
From | Kyotaro Horiguchi |
---|---|
Subject | Re: BUG #16337: Finnish Ispell dictionary cannot be created |
Date | |
Msg-id | 20200413.173610.1847967467851370073.horikyota.ntt@gmail.com |
In reply to | Re: BUG #16337: Finnish Ispell dictionary cannot be created (Artur Zakirov <zaartur@gmail.com>) |
Replies | Re: BUG #16337: Finnish Ispell dictionary cannot be created (Artur Zakirov <zaartur@gmail.com>) |
List | pgsql-bugs |
Hello, Artur.

At Sun, 12 Apr 2020 23:13:26 +0900, Artur Zakirov <zaartur@gmail.com> wrote in
> On Fri, Apr 3, 2020 at 5:55 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > I'm not sure if it's a valid ispell format (it might be, but I'm not
> > very good in reading the ispell manpage). But if it is, we should fix
> > the code to be able to read it.
>
> I attached the simple patch which fixes PAE_INREPL state.

Looking at man 5 ispell: "Any character with special meaning to the parser can be changed to an uninterpreted token by backslashing it". It depends on how strict we should be about that, but I think it is safer to treat any character prefixed by a backslash as a word character. (I don't understand how '-' can be part of a word under the definition in the .affix file, though.)

Since an escaped character is intended to be part of a word, I think there's no point in special-casing the minus sign. So, as a result, parse_affentry would look something like the following:

    while (*str)
    {
        if (t_iseq(str, '\\') && !escaped)
        {
            str += pg_mblen(str);
            escaped = true;
            continue;
        }

        if (state == ..)
        {
            if (t_iseq(str, <special>) && !escaped)
                <handle special>
            else if (t_isalpha(str) || escaped)
                <handle non-special (or word) character>
            else if (!t_isspace(str))
                ereport(ERROR, ...);
        }
        ...
        str += pg_mblen(str);
        escaped = false;
    }

Are there any thoughts or opinions?

> I don't fully understand the ispell manpage either. I've looked at the
> ispell source code. They use yacc for parsing. I'm not good at yacc, but
> it seems that the escape symbol is used for all fields. But the patch
> fixes only PAE_INREPL state.
>
> Also I did some tests with ispell utility. For simplicity I fixed the
> .aff file in the following way:
>
> flag *E:
>     .   > YLI
>     .   > YLI\-
>
> And I got the following results:
>
> word: ylijohdon
> ok (derives from root JOHDON)
>
> word: yli-johdon
> ok (derives from root JOHDON)
>
> word: yly-johdon
> how about: yli-johdon
>
> So hyphen escaping works. And results for PostgreSQL with the patch
> and the .aff file fix:
>
> =# select ts_lexize('finnish_ispell', 'yli-johdon');
>      ts_lexize
> -------------------
>  {johdon,johdossa}
> =# select ts_lexize('finnish_ispell', 'ylijohdon');
>      ts_lexize
> -------------------
>  {johdon,johdossa}

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center