Обсуждение: pgsql: Fix ts_headline() edge cases for empty query and empty search te
Fix ts_headline() edge cases for empty query and empty search text. tsquery's GETQUERY() macro is only safe to apply to a tsquery that is known non-empty; otherwise it gives a pointer to garbage. Before commit 5a617d75d, ts_headline() avoided this pitfall, but only in a very indirect, nonobvious way. (hlCover could not reach its TS_execute call, because if the query contains no lexemes then hlFirstIndex would surely return -1.) After that commit, it fell into the trap, resulting in weird errors such as "unrecognized operator" and/or valgrind complaints. In HEAD, fix this by not calling TS_execute_locations() at all for an empty query. In the back branches, add a defensive check to hlCover() --- that's not fixing any live bug, but I judge the code a bit too fragile as-is. Also, both mark_hl_fragments() and mark_hl_words() were careless about the possibility of empty search text: in the cases where no match has been found, they'd end up telling mark_fragment() to mark from word indexes 0 to 0 inclusive, even when there is no word 0. This is harmless since we over-allocated the prs->words array, but it does annoy valgrind. Fix so that the end index is -1 and thus mark_fragment() will do nothing in such cases. Bottom line is that this fixes a live bug in HEAD, but in the back branches it's only getting rid of a valgrind nitpick. Back-patch anyway. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/029dea882a7aa34f46732473eed7c917505e6481 Modified Files -------------- src/backend/tsearch/wparser_def.c | 21 ++++++++++++++------- src/test/regress/expected/tsearch.out | 21 +++++++++++++++++++++ src/test/regress/sql/tsearch.sql | 6 ++++++ 3 files changed, 41 insertions(+), 7 deletions(-)
Hi, On 2023-04-06 19:52:55 +0000, Tom Lane wrote: > Fix ts_headline() edge cases for empty query and empty search text. Unfortunately there appears to be some portability issue with this. Causes a test output difference on macos. Noticed it when testing something I was about to push, and I was very confused for a second how my change could have a portability aspect :) https://cirrus-ci.com/build/6629557554380800 https://api.cirrus-ci.com/v1/artifact/task/4764833480966144/testrun/build/testrun/regress/regress/regression.diffs diff -U3 /Users/admin/pgsql/src/test/regress/expected/tsearch.out /Users/admin/pgsql/build/testrun/regress/regress/results/tsearch.out --- /Users/admin/pgsql/src/test/regress/expected/tsearch.out 2023-04-06 20:03:11 +++ /Users/admin/pgsql/build/testrun/regress/regress/results/tsearch.out 2023-04-06 20:05:56 @@ -2133,6 +2133,9 @@ NOTICE: text-search query doesn't contain lexemes: "" LINE 2: '', ''::tsquery); ^ +NOTICE: text-search query doesn't contain lexemes: "" +LINE 2: '', ''::tsquery); + ^ ts_headline ------------- Sure looks like somehow there's duplicate NOTICEs? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2023-04-06 19:52:55 +0000, Tom Lane wrote: >> Fix ts_headline() edge cases for empty query and empty search text. > Unfortunately there appears to be some portability issue with this. Causes a > test output difference on macos. Huh ... looking. regards, tom lane
I wrote: > Andres Freund <andres@anarazel.de> writes: >> On 2023-04-06 19:52:55 +0000, Tom Lane wrote: >>> Fix ts_headline() edge cases for empty query and empty search text. >> Unfortunately there appears to be some portability issue with this. Causes a >> test output difference on macos. > Huh ... looking. It's not about macOS, it's about that build script using -DRANDOMIZE_ALLOCATED_MEMORY, which causes coerce_type() to call tsqueryin() twice. So you get the bleat twice. Ugh. I can work around it in these test cases by using to_tsquery() instead of just "''::tsquery", but maybe we should rethink something. regards, tom lane