Re: scanner/parser minimization
От | Heikki Linnakangas |
---|---|
Тема | Re: scanner/parser minimization |
Дата | |
Msg-id | 51324954.9060004@vmware.com обсуждение исходный текст |
Ответ на | Re: scanner/parser minimization (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: scanner/parser minimization
|
Список | pgsql-hackers |
On 02.03.2013 17:09, Tom Lane wrote: > Greg Stark<stark@mit.edu> writes: >> Regarding yytransition I think the problem is we're using flex to >> implement keyword recognition which is usually not what it's used for. >> Usually people use flex to handle syntax things like quoting and >> numeric formats. All identifiers are handled by flex as equivalent. >> Then the last step in the scanner for identifiers is to look up the >> identifier in a hash table and return the keyword token if it's a >> keyword. That would massively simplify the scanner tables. > > Uh ... no. I haven't looked into why the flex tables are so large, > but this theory is just wrong. See ScanKeywordLookup(). Interestingly, the yy_transition array generated by flex used to be much smaller: 8.3: 22072 elements 8.4: 62623 elements master: 64535 elements The big jump between 8.3 and 8.4 was caused by introduction of the unicode escapes: U&'foo' [UESCAPE 'x'] . And in particular, the "error rule" for the UESCAPE, which we use to avoid backtracking. I experimented with a patch that uses two extra flex states to shorten the error rules, see attached. The idea is that after lexing a unicode literal like "U&'foo'", you enter a new state, in which you check whether an "UESCAPE 'x'" follows. This slashes the size of the array to 36581 elements. - Heikki
Вложения
В списке pgsql-hackers по дате отправления: