Re: WIP: index support for regexp search
От | Heikki Linnakangas |
---|---|
Тема | Re: WIP: index support for regexp search |
Дата | |
Msg-id | 50BC795C.5000509@vmware.com обсуждение исходный текст |
Ответ на | Re: WIP: index support for regexp search (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: WIP: index support for regexp search
|
Список | pgsql-hackers |
On 02.12.2012 20:19, Tom Lane wrote: > Alexander Korotkov<aekorotkov@gmail.com> writes: >> Nice idea to delay expanding colors to characters! Obviously, we should >> delay expanding inly alphanumerical characters. Because non-alphanumberical >> characters influence graph structure. Trying to implement... > > Uh, why would that be? Colors are colors. The regexp machinery doesn't > care whether they represent alphanumerics or not. (Or to be more > precise, if there is a situation where it makes a difference, separate > colors will have been created for each set of characters that need to be > distinguished.) The regexp machinery doesn't care, but the trigrams that pg_trgm extracts only contain alphanumeric characters. So if by looking at the CNFA graph produced by the regexp machinery you conclude that any matching strings must contain three-letter sequences "%oo" and "#oo", you can just club them together into " oo" trigram. I think you can run a pre-processing step to the colors, and merge colors that are equivalent as far as trigrams are considered. For example, if you have a color that contains only character '%', and another that contains character '#', you can treat them as the same hcolor. You might then be able to simplify the CNFA. Actually, it would be even better if you could apply the pre-processing to the regexp before the regexp machinery turns it into a CNFA. Not sure how easy it would be to do such pre-processing. BTW, why create the path matrix? You could check the "check" array of trigrams in the consistent function directly against the graph. Consistent should return true, iff there is a path through the graph following only arcs that contain trigrams present in the check array. Finding a path through a complex graph could be expensive, O(|E|), but if the path is complex, the path matrix would be large as well, and checking against a large matrix isn't exactly free either. It would allow you to avoid "overflows" caused by having too many paths through the graph. - Heikki
В списке pgsql-hackers по дате отправления: