Re: text_position worst case runtime
От | Mark Dilger |
---|---|
Тема | Re: text_position worst case runtime |
Дата | |
Msg-id | 446D2432.3030201@markdilger.com обсуждение исходный текст |
Ответ на | Re: text_position worst case runtime (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: text_position worst case runtime
|
Список | pgsql-hackers |
Tom Lane wrote: > Mark Dilger <pgsql@markdilger.com> writes: > >>The function >> static int32 text_position(text *t1, text *t2, int matchnum) >>defined in src/backend/utils/adt/varlena.c uses repeated calls to >>strncmp (or pg_wchar_strncmp) to find the location of the pattern in >>the text. The worst case runtime for such an approach is O(n*m) where >>n and m are the lengths of the pattern and text. The best case would >>be O(n), I guess, because it only takes n comparisons to find the >>pattern at the very start of the text. I'm not sure how to determine >>the average case runtime, because it depends what your data looks like >>on average. > > > I would think that the worst-case times would be fairly improbable. > I'm disinclined to push something as complicated as Boyer-Moore matching > into this function without considerable evidence that it's a performance > bottleneck for real applications. A common approach in biological data applications is to store nucleic and amino acid sequences as text in a relational database. The smaller alphabet sizes and the tendency for redundancy in these sequences increases the likelihood of a performance problem. I have solved this problem by writing my own data types with their own functions for sequence comparison and alignment, and I used boyer-moore for some of that work. Whether the same technique should be used for the text and varchar types was unclear to me, hence the question. > The question that comes to mind for me is why we're not using simple > strncmp in all cases in that code. The conversion to pg_wchar format > looks expensive and unnecessary ...
В списке pgsql-hackers по дате отправления: