Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses
| From | Tom Lane |
|---|---|
| Subject | Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses |
| Date | |
| Msg-id | 2130969.1718316260@sss.pgh.pa.us |
| In response to | BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses (PG Bug reporting form <noreply@postgresql.org>) |
| Responses | Re: BUG #18479: websearch_to_tsquery inconsistent behavior for german when using parentheses |
| List | pgsql-bugs |
PG Bug reporting form <noreply@postgresql.org> writes:
> Although the docs
> https://www.postgresql.org/docs/current/textsearch-controls.html say nothing
> about websearch_to_tsquery supporting parentheses in queries, I noticed some
> inconsistent behaviour when using multiple 'or' keywords with parentheses in
> postgres 15.4

The definition of websearch_to_tsquery says pretty plainly that
"Other punctuation is ignored". So I'd expect parens to do nothing.
That makes this problematic:

> select websearch_to_tsquery('german', 'foo or baz bar or (ding dong)');
>           websearch_to_tsquery
> -----------------------------------------
>  'foo' | 'baz' & 'bar' | 'ding' & 'dong'

> select websearch_to_tsquery('german', 'foo or (baz bar) or (ding dong)');
>               websearch_to_tsquery
> ------------------------------------------------
>  'foo' | 'baz' & 'bar' & 'or' & 'ding' & 'dong'

I found what seems to be the issue in gettoken_query_websearch:
it ignores ISOPERATOR chars (including parens) in WAITOPERAND
state, but not in WAITOPERATOR state. That results in switching
back to WAITOPERAND state which will consume the "or" as a
regular word. So a minimal fix could look like the attached.

It's fairly confusing that this code manages to ignore
not-ISOPERATOR punctuation. It seems like that gets eaten by
gettoken_tsvector() and then later we decide there's not really
a word there.

I'm also confused how come the same thing doesn't happen in
the english tsconfig. Not sure it's worth poking at more,
though.

			regards, tom lane

diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index 690a80d774..eb08e912ea 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -492,6 +492,12 @@ gettoken_query_websearch(TSQueryParserState state, int8 *operator,
 					*operator = OP_OR;
 					return PT_OPR;
 				}
+				else if (ISOPERATOR(state->buf))
+				{
+					/* ignore other operators here too */
+					state->buf++;
+					continue;
+				}
 				else if (*state->buf == '\0')
 				{
 					return PT_END;
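
For readers who want to see the state-machine problem in isolation, here is a rough standalone sketch of it. It is not the PostgreSQL code: `toy_websearch`, `is_operator_char`, `at_or_keyword`, and the `skip_ops_in_waitoperator` flag are all made-up names, the operator character set `!&|()` is only assumed to approximate ISOPERATOR, and the model ignores quoting, negation, multibyte input, and text search dictionaries entirely. It only tries to reproduce how a paren seen in WAITOPERATOR state can cause the following "or" to be read as a word.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: a two-state tokenizer, not the PostgreSQL source. */
enum toy_state { WAITOPERAND, WAITOPERATOR };

/* Assumed stand-in for ISOPERATOR: single-byte operator characters. */
static int is_operator_char(char c)
{
    return c != '\0' && strchr("!&|()", c) != NULL;
}

/* True if p starts with the keyword "or" followed by a non-word character. */
static int at_or_keyword(const char *p)
{
    return tolower((unsigned char) p[0]) == 'o' &&
           tolower((unsigned char) p[1]) == 'r' &&
           !isalnum((unsigned char) p[2]);
}

/*
 * Tokenize `in` and write a flattened query such as "foo | bar & baz" to
 * `out` (outlen must be > 0). skip_ops_in_waitoperator = 0 models the
 * reported behavior; 1 models the extra branch added by the patch.
 */
void toy_websearch(const char *in, int skip_ops_in_waitoperator,
                   char *out, size_t outlen)
{
    enum toy_state state = WAITOPERAND;
    const char *pending = NULL;     /* operator to emit before the next word */
    size_t used = 0;

    out[0] = '\0';
    while (*in != '\0')
    {
        if (isspace((unsigned char) *in))
        {
            in++;                   /* whitespace is skipped in both states */
        }
        else if (state == WAITOPERAND)
        {
            if (is_operator_char(*in))
            {
                in++;               /* operator chars are ignored here */
                continue;
            }
            /* Read one word, up to whitespace or an operator character. */
            const char *start = in;
            while (*in && !isspace((unsigned char) *in) && !is_operator_char(*in))
                in++;
            int n = snprintf(out + used, outlen - used, "%s%.*s",
                             pending ? pending : "", (int) (in - start), start);
            if (n < 0 || (size_t) n >= outlen - used)
                return;             /* output truncated; stop */
            used += (size_t) n;
            state = WAITOPERATOR;
        }
        else if (at_or_keyword(in))
        {
            in += 2;                /* explicit OR between operands */
            pending = " | ";
            state = WAITOPERAND;
        }
        else if (skip_ops_in_waitoperator && is_operator_char(*in))
        {
            in++;                   /* the patched branch: ignore and stay */
        }
        else
        {
            pending = " & ";        /* implicit AND; input is not consumed */
            state = WAITOPERAND;
        }
    }
}
```

Calling `toy_websearch("foo or (baz bar) or (ding dong)", 0, buf, sizeof buf)` leaves `foo | baz & bar & or & ding & dong` in `buf`, mirroring the shape of the reported result, while passing `1` for the flag gives `foo | baz & bar | ding & dong`. The operator is held in `pending` and only written when the next word arrives, so a trailing paren does not leave a dangling operator in the output.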