Re: tsearch parser inefficiency if text includes urls or emails - new version
От | Andres Freund |
---|---|
Тема | Re: tsearch parser inefficiency if text includes urls or emails - new version |
Дата | |
Msg-id | 200912102005.16560.andres@anarazel.de обсуждение исходный текст |
Ответ на | Re: tsearch parser inefficiency if text includes urls or emails - new version ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Ответы |
Re: tsearch parser inefficiency if text includes
urls or emails - new version
|
Список | pgsql-hackers |
On Thursday 10 December 2009 19:10:24 Kevin Grittner wrote: > I wrote: > > Thanks for the sample which shows the difference. > > Ah, once I got on the right track, there is no problem seeing > dramatic improvements with the patch. It changes some nasty O(N^2) > cases to O(N). In particular, the fixes affect parsing of large > strings encoded with multi-byte character encodings and containing > email addresses or URLs with a non-IP-address host component. It > strikes me as odd that URLs without a slash following the host > portion, or with an IP address, are treated so differently in the > parser, but if we want to address that, it's a matter for another > patch. Same here. Generally I do have to say that I dont find that parser very nice - and actually I think its not very efficient as well because it searches a big part of the search space all the time. I think a generated parser would be more efficient and way much easier to read... > I'm inclined to think that the minimal differences found in some of > my tests probably have more to do with happenstance of code > alignment than the particulars of the patch. Yes, I think that as well. > I did find one significant (although easily solved) problem. In the > patch, the recursive setup of usewide, pgwstr, and wstr are not > conditioned by #ifdef USE_WIDE_UPPER_LOWER in the non-patched > version. Unless there's a good reason for that, the #ifdef should > be added. Uh. Sorry. Fixed. > Less critical, but worth fixing one way or the other, TParserClose > does not drop breadcrumbs conditioned on #ifdef WPARSER_TRACE, but > TParserCopyClose does. I think this should be consistent. Actually there was *some* thinking behind that: The end of parsing is quite obvious (the call returns, and its so verbose you will never do more than one call anyway) - but knowing where the copy ends is quite important to understand the parse flow. I added a log there because in the end that line is not going to make any difference ;-) > Sorry for the delay in review. I hope there's still time to get > this committed in this CF. Thanks for your reviewing! Actually I dont mind very much if it gets delayed or not. Its trivial enough that it shouldnt cause much work/conflicts/whatever next round and I am running patched versions anyway, so ... Andres
В списке pgsql-hackers по дате отправления: