Re: tsearch parser inefficiency if text includes urls or emails - new version

From Kevin Grittner
Subject Re: tsearch parser inefficiency if text includes urls or emails - new version
Date
Msg-id 4B20E530020000250002D305@gw.wicourts.gov
In reply to Re: tsearch parser inefficiency if text includes urls or emails - new version  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: tsearch parser inefficiency if text includes urls or emails - new version  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: tsearch parser inefficiency if text includes urls or emails - new version  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I wrote:
> Thanks for the sample which shows the difference.
Ah, once I got on the right track, I had no trouble seeing
dramatic improvements with the patch.  It changes some nasty O(N^2)
cases to O(N).  In particular, the fixes affect parsing of large
strings encoded with multi-byte character encodings and containing
email addresses or URLs with a non-IP-address host component.  It
strikes me as odd that URLs without a slash following the host
portion, or with an IP address, are treated so differently in the
parser, but if we want to address that, it's a matter for another
patch.
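
To illustrate the kind of O(N^2) versus O(N) difference described
above, here is a minimal sketch.  It is not the PostgreSQL source:
SketchParser, sketch_init, sketch_copy_init and conversion_work are
hypothetical names, the conversion is simplified to one byte per wide
character, and the assumption that a sub-parser is started at every
offset is a worst case chosen for illustration.  It contrasts
re-converting the remaining input for each sub-parser with sharing
the parent's already-converted buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for parser state; not the real TParser. */
typedef struct SketchParser
{
    const char *str;        /* input bytes */
    size_t      len;        /* number of bytes in str */
    wchar_t    *wstr;       /* wide-character copy of str */
    int         owns_wstr;  /* nonzero if this parser must free wstr */
} SketchParser;

/* Counts units converted so the two strategies can be compared. */
static size_t units_converted = 0;

/* Simplified conversion: each byte becomes one wide character. */
static wchar_t *
sketch_convert(const char *str, size_t len)
{
    wchar_t    *w = malloc((len + 1) * sizeof(wchar_t));
    size_t      i;

    assert(w != NULL);
    for (i = 0; i < len; i++)
        w[i] = (wchar_t) (unsigned char) str[i];
    w[len] = L'\0';
    units_converted += len;
    return w;
}

/* Fresh parser: converts its whole input. */
static SketchParser
sketch_init(const char *str, size_t len)
{
    SketchParser p = { str, len, sketch_convert(str, len), 1 };

    return p;
}

/* Sub-parser at offset pos that shares the parent's converted buffer. */
static SketchParser
sketch_copy_init(const SketchParser *parent, size_t pos)
{
    SketchParser p;

    assert(pos <= parent->len);
    p.str = parent->str + pos;
    p.len = parent->len - pos;
    p.wstr = parent->wstr + pos;
    p.owns_wstr = 0;            /* the parent frees the buffer */
    return p;
}

static void
sketch_close(SketchParser *p)
{
    if (p->owns_wstr)
        free(p->wstr);
    p->wstr = NULL;
}

/* Total conversion work when a sub-parser is started at every offset. */
size_t
conversion_work(const char *str, size_t len, int share)
{
    SketchParser top;
    size_t      pos;

    units_converted = 0;
    top = sketch_init(str, len);
    for (pos = 0; pos < len; pos++)
    {
        SketchParser sub = share ? sketch_copy_init(&top, pos)
                                 : sketch_init(str + pos, len - pos);

        sketch_close(&sub);
    }
    sketch_close(&top);
    return units_converted;
}
```

With share set to 0 the work grows roughly as N^2/2; with share set
to 1 it stays at N, since only the top-level parser converts anything.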
I'm inclined to think that the minimal differences found in some of
my tests probably have more to do with happenstance of code
alignment than the particulars of the patch.
I did find one significant (although easily solved) problem.  In the
patch, the recursive setup of usewide, pgwstr, and wstr is not
conditioned by #ifdef USE_WIDE_UPPER_LOWER, as it is in the
non-patched version.  Unless there's a good reason for that, the
#ifdef should be added.
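
As a rough sketch of the kind of guard meant here (again not the
PostgreSQL source; GuardedParser, guarded_init, guarded_close and
SKETCH_WIDE_AVAILABLE are hypothetical names, and the one-byte-per-
character conversion is a simplification), the wide-character setup
can be compiled in only when USE_WIDE_UPPER_LOWER is defined, with the
parser falling back to the byte-oriented path otherwise:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for parser state; not the real TParser. */
typedef struct GuardedParser
{
    const char *str;         /* input bytes */
    size_t      len;         /* number of bytes in str */
    int         charmaxlen;  /* max bytes per character in the encoding */
    int         usewide;     /* nonzero if wstr is set up and in use */
    wchar_t    *wstr;        /* wide copy, or NULL on the byte path */
} GuardedParser;

/* Lets callers check which path was compiled in. */
#ifdef USE_WIDE_UPPER_LOWER
#define SKETCH_WIDE_AVAILABLE 1
#else
#define SKETCH_WIDE_AVAILABLE 0
#endif

GuardedParser
guarded_init(const char *str, size_t len, int charmaxlen)
{
    GuardedParser p;

    p.str = str;
    p.len = len;
    p.charmaxlen = charmaxlen;
    p.usewide = 0;
    p.wstr = NULL;

#ifdef USE_WIDE_UPPER_LOWER
    /* Wide setup only for multi-byte encodings, and only if compiled in. */
    if (charmaxlen > 1)
    {
        size_t      i;

        p.wstr = malloc((len + 1) * sizeof(wchar_t));
        assert(p.wstr != NULL);
        for (i = 0; i < len; i++)
            p.wstr[i] = (wchar_t) (unsigned char) str[i];
        p.wstr[len] = L'\0';
        p.usewide = 1;
    }
#endif
    return p;
}

void
guarded_close(GuardedParser *p)
{
    free(p->wstr);              /* free(NULL) is a no-op */
    p->wstr = NULL;
    p->usewide = 0;
}
```

The point of the guard is that every code path which sets up the
wide-character fields, including any recursive or copy-style
initializer, sits under the same #ifdef, so a build without the
symbol never touches them.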
Less critical, but worth fixing one way or the other: TParserClose
does not drop breadcrumbs conditioned on #ifdef WPARSER_TRACE, but
TParserCopyClose does.  I think this should be consistent.
Finally, there's that spelling error in the comment for
TParserCopyInit.  Please fix.
If a patch is produced with fixes for these three things, I'd say
it'll be ready for committer.  Until then, I'm marking it as Waiting
on Author.
Sorry for the delay in review.  I hope there's still time to get
this committed in this CF.
-Kevin

