Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
| From | Joerg Sonnenberger |
|---|---|
| Subject | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
| Date | |
| Msg-id | 20190103163340.GA15803@britannica.bec.de |
| In reply to | reducing the footprint of ScanKeyword (was Re: Large writable variables) (John Naylor <jcnaylor@gmail.com>) |
| Responses | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
| List | pgsql-hackers |
On Sun, Dec 16, 2018 at 11:50:15AM -0500, John Naylor wrote:
> A few months ago I was looking into faster search algorithms for
> ScanKeywordLookup(), so this is interesting to me. While an optimal
> full replacement would be a lot of work, the above ideas are much less
> invasive and would still have some benefit. Unless anyone intends to
> work on this, I'd like to flesh out the offset-into-giant-string
> approach a bit further:

Hello John,
I was pointed at your patch on IRC and decided to look into adding my
own pieces. What I can provide you is a fast perfect hash function
generator. I've attached a sample hash function based on the current
main keyword list. hash() essentially gives you the number of the only
possible match; a final strcmp/memcmp is still necessary to verify that
it is an actual keyword, though. The |0x20 can be dropped if all callers
have already lower-cased the input.

This would replace the binary search in the lookup functions. Returning
offsets directly would be easy as well. That would allow writing a single
string in which each entry is prefixed with a type mask, the token id,
the length of the keyword, and the keyword text itself.

Does that sound useful to you?

Joerg
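[Editor's note: the scheme described above can be sketched as follows. This is a toy illustration with hypothetical names and a hand-picked hash over a 4-word set, not the attached generator's output; a real generator searches for collision-free parameters over the full keyword list automatically.]

```c
/*
 * Toy sketch: all keyword text lives in one packed string, a "perfect"
 * hash maps the input to at most one candidate slot, and a final
 * case-folded comparison verifies the match.
 */
#include <stddef.h>
#include <string.h>

static const char kw_string[] = "and\0not\0or\0select";
static const unsigned short kw_offset[] = {0, 4, 8, 11}; /* into kw_string */
static const int kw_token[] = {100, 101, 102, 103};      /* toy token ids */

/* Returns the token id on match, -1 otherwise. */
static int kw_lookup(const char *word, size_t len)
{
    /*
     * Maps the 8 hash slots to a keyword index, or -1 for "no keyword".
     * (first char + last char, case-folded) mod 8 happens to be
     * collision-free for this toy set.
     */
    static const signed char slot_to_kw[8] = {-1, 2, 1, -1, -1, 0, -1, 3};
    unsigned int h;
    int kw;
    const char *cand;
    size_t i;

    if (len == 0)
        return -1;
    h = (unsigned int) (((word[0] | 0x20) + (word[len - 1] | 0x20)) % 8);
    kw = slot_to_kw[h];
    if (kw < 0)
        return -1;
    cand = kw_string + kw_offset[kw];
    if (strlen(cand) != len)    /* storing lengths would avoid the strlen */
        return -1;
    for (i = 0; i < len; i++)   /* |0x20 folds ASCII upper to lower case */
        if ((word[i] | 0x20) != cand[i])
            return -1;
    return kw_token[kw];
}
```

The verification loop is the strcmp/memcmp step mentioned above: the hash alone only narrows the input to one possible keyword, so a non-keyword that happens to land on an occupied slot is rejected by the comparison.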
Attachments