Обсуждение: Summer of Code idea
Hi. I have been thinking about this for a while and now that Google Summer of Code is coming I thought I would share this idea. The GCC people have traded their bison/flex parser with a hand written recursive-descent parser for a nice speed up. So it would be interesting to see if PostgreSQL would benefit from the same 'switch'. By the looks of it *) the job could be completed within the time frame and maybe pgbench could serve as the testing framework for the performance measurements. I think it has an academic angle to it -- something fun for the student and maybe a speed up for PostgreSQL :) Just an idea... Keep up the great work you are doing ! Best regards,Jesper *) http://gcc.gnu.org/ml/gcc-patches/2004-10/msg01969.html
On Tue, Apr 25, 2006 at 10:30:26PM +0200, Jesper Pedersen wrote: > Hi. > > I have been thinking about this for a while and now that Google Summer of Code > is coming I thought I would share this idea. > > The GCC people have traded their bison/flex parser with a hand written > recursive-descent parser for a nice speed up. Nice? The figures I'm seeing are 2%. Is that even noticable? > So it would be interesting to see if PostgreSQL would benefit from the same > 'switch'. > > By the looks of it *) the job could be completed within the time frame and > maybe pgbench could serve as the testing framework for the performance > measurements. I think it's worth a try, but you have to consider that unlike C, SQL as a language keeps changing in ways we have no idea about yet. Whatever the result is, it has to be more maintainable than what we have now... > I think it has an academic angle to it -- something fun for the student and > maybe a speed up for PostgreSQL :) Abosolutly. It'd be a fun experiment, if one were so inclined. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Jesper Pedersen wrote: > I have been thinking about this for a while and now that Google Summer of Code > is coming I thought I would share this idea. > > The GCC people have traded their bison/flex parser with a hand written > recursive-descent parser for a nice speed up. > > So it would be interesting to see if PostgreSQL would benefit from the same > 'switch'. We talked about it when GCC announced their switch. The conclusion was that our grammar is still too much a moving target, so it would be too difficult to mantain such a grammar. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 4/27/06, Alvaro Herrera <alvherre@commandprompt.com> wrote: > We talked about it when GCC announced their switch. The conclusion was > that our grammar is still too much a moving target, so it would be too > difficult to mantain such a grammar. For the sake of saying again, I already have a recursive-descent parser for PostgreSQL written in a PCCTS grammar. It's something I started writing years ago, but I'd be willing to consider open sourcing it if the PostgreSQL community will really entertain the thought of switching. Unfortunately, this discussion usually ends up with, "why would we want to change what we have now when it already works?" -- Jonah H. Harris, Database Internals Architect EnterpriseDB Corporation 732.331.1324
* Jonah H. Harris (jonah.harris@gmail.com) wrote: > Unfortunately, this discussion usually ends up with, "why would we > want to change what we have now when it already works?" The answer to that can certainly be "performance" provided other factors (such as maintainability) don't change much. If you could show that then I think such a switch would be very seriously considered. Of course, the performance improvment would have to be some substantial amount (I would guess at least 5%, maybe 10%) to warrent the learning curve associated with changing it. If you're not interested or not able to do the performance comparison then I guess you'd need to find someone who is and then work out if they'd be willing to do the testing under some NDA in case it doesn't pan out. Thanks, Stephen
"Jonah H. Harris" <jonah.harris@gmail.com> writes: > Unfortunately, this discussion usually ends up with, "why would we > want to change what we have now when it already works?" ... and is far more maintainable than an RD parser, and is not a performance bottleneck. I've never seen yyparse occupy as much as 2% of a backend profile ... regards, tom lane
On 4/27/06, Stephen Frost <sfrost@snowman.net> wrote: > The answer to that can certainly be "performance" provided other factors > (such as maintainability) don't change much. If you could show that > then I think such a switch would be very seriously considered. IMHO, switching parser-types (and parser generators) is more about maintainability than performance itself. SQL is much nicer in recursive descent where you don't have yacc/bison limitations such as 1 token of lookahead and non-ebnf grammars. The sort-of odd thing is that PCCTS (like its much younger brother ANTLR) is intended on generating ASTs whereas yacc/bison requires you to build the parse tree manually (as we do). Don't get me wrong, this was taken into consideration with PCCTS, but it's not as optimal or beautiful as it would be to have PCCTS itself generate the parse tree. Still, it's nicer to maintain than a yacc/bison grammar. -- Jonah H. Harris, Database Internals Architect EnterpriseDB Corporation 732.331.1324
On 4/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > ... and is far more maintainable than an RD parser, and is not a > performance bottleneck. I've never seen yyparse occupy as much as 2% > of a backend profile ... Not more maintainable by any stretch of the imagination. For example, try and remove the AS alias for columns in our bison grammar without having to remove keywords from unreserved_keywords :) In PCCTS (or a hand-written RD parser), it's VERY easy because you can control the lookahead as necessary. Now, in terms of performance, it's hard to beat a LALR parser, but LL parsers are comparable if done correctly; especially if written by hand. Besides, parsing itself isn't what kills us, it's the lack of caching unprepared statements. Yes, this is another topic in and of itself, but I know there was discussion about it between you and Neil; did anything ever come of it? -- Jonah H. Harris, Database Internals Architect EnterpriseDB Corporation 732.331.1324
> For the sake of saying again, I already have a recursive-descent > parser for PostgreSQL written in a PCCTS grammar. It's something I > started writing years ago, but I'd be willing to consider open > sourcing it if the PostgreSQL community will really entertain the > thought of switching. > > Unfortunately, this discussion usually ends up with, "why would we > want to change what we have now when it already works?" Is it faster? How much faster?
On 4/27/06, Christopher Kings-Lynne <chris.kings-lynne@calorieking.com> wrote: > Is it faster? How much faster? I'm not sure, I haven't done direct timings on it vs. the bison version. When I wrote it, I wasn't really concerned with the time it took to parse. -- Jonah H. Harris, Database Internals Architect EnterpriseDB Corporation 732.331.1324
Jonah H. Harris wrote: > On 4/27/06, Christopher Kings-Lynne <chris.kings-lynne@calorieking.com> wrote: > > Is it faster? How much faster? > > I'm not sure, I haven't done direct timings on it vs. the bison > version. When I wrote it, I wasn't really concerned with the time it > took to parse. Is the source easier to maintain? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 4/27/06, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Is the source easier to maintain? Yes, aside from extra lookahead, that was my main motivation. -- Jonah H. Harris, Database Internals Architect EnterpriseDB Corporation 732.331.1324