Re: strpos() && KMP
От | Pavel Ajtkulov |
---|---|
Тема | Re: strpos() && KMP |
Дата | |
Msg-id | 16410532421.20070811012749@acm.org обсуждение исходный текст |
Ответ на | Re: strpos() && KMP (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: strpos() && KMP
|
Список | pgsql-patches |
Tom Lane writes: >> hash table? > I'd think the cost of hashing would render it impractical. Most of the > papers I've seen on this topic worry about getting single instructions > out of the search loop --- a hash lookup will cost lots more than that. > Moreover, you'd lose the guarantee of not-worse-than-linear time, > because hash lookup can be pathologically bad if you get a lot of hash > collisions. compute max_wchar, min_wchar. If (d = max_wchar - min_wchar) < k (for example, k = 1000), then we use index table (wchar -> wchar - min_wchar). Else we use hash table. Number of collisions would be a few (because hash table needs for pattern characters only. Characters located serially, hash function = whchar % const). >> The main difficulty with BM is verification and understanding "good >> suffix shift" (the second part of BM) (I don't understand it entirely). > Yeah, there seem to be a bunch of variants of BM (many of them not > guaranteed linear, which I'm sure we don't want) and the earliest > papers had bugs. But a good implementation would be a lot easier > sell because it would show benefits for a much wider set of use-cases > than KMP. Is there requirement for some string mathching algorithms/data structure(suffix array/tree) in PG? or "We've had no complaints about the speed of those functions". ---- Ajtkulov Pavel ajtkulov@acm.org
В списке pgsql-patches по дате отправления: