Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From:        Andrew Dunstan
Subject:     Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date:        
Msg-id:      ceb23715-78dd-81b0-4760-da00a8b08ffd@2ndQuadrant.com
In reply to: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:   Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
List:        pgsql-hackers
On 12/27/18 12:12 PM, Tom Lane wrote:
>> I don't buy that we're unable to write a descent parser that way.
>
> I do not think that we could write one for the current state of the
> PG grammar without an investment of effort so large that it's not
> going to happen.  Even if such a parser were to spring fully armed
> from somebody's forehead, we absolutely cannot expect that it would
> continue to work correctly after non-wizard contributors modify it.

I just did a quick survey of generator tools. Unfortunately, the best
candidate alternative (ANTLR) no longer supports generating plain C code.
I don't know of another tool that is well maintained, supports C, and
generates top-down parsers. Twenty-five years ago or so I wrote a top-down
table-driven parser generator, but that was in another country, and
besides, the wench is dead.

There are well-known techniques (see section 4.4 of the Dragon Book, if
you have a copy) for formal analysis of grammars to determine predictive
parser actions. They aren't hard, and the tables they produce are
typically much smaller than those used for LALR parsers. Still, it's
probably not for the faint of heart.

The tools that have moved to using hand-cut recursive descent (RD)
parsers have done so precisely because they get a significant performance
benefit from doing so. RD parsers are not terribly hard to write. Yes,
the JSON grammar is tiny, but I wrote the basics of the RD parser we use
for JSON in about an hour. I think arguing that our hacker base is not
competent to maintain such a thing for the SQL grammar is wrong; we
successfully maintain vastly more complex pieces of code.

Having said all that, I don't intend to spend any time on implementing an
alternative parser. As you say, it would involve a heck of a lot of time,
which I don't have. It would be a fine academic research project for some
student.
A smaller project might be to see if we can replace the binary keyword
search in ScanKeyword with a perfect hash function generated by gperf, or
something similar. I had a quick look at that, too. Unfortunately, the
smallest hash table I could generate for our 440 symbols had 1815
entries, so I'm not sure how well that would work. Worth investigating,
though.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services