Re: benchmarking Flex practices

From	Tom Lane
Subject	Re: benchmarking Flex practices
Date	November 26, 2019, 15:32:29
Msg-id	30156.1574782349@sss.pgh.pa.us
In response to	Re: benchmarking Flex practices (John Naylor <john.naylor@2ndquadrant.com>)
Responses	Re: benchmarking Flex practices
List	pgsql-hackers

John Naylor <john.naylor@2ndquadrant.com> writes:
> It seems something is not quite right in v9 with the error position reporting:

>  SELECT U&'wrong: +0061' UESCAPE '+';
>  ERROR:  invalid Unicode escape character at or near "'+'"
>  LINE 1: SELECT U&'wrong: +0061' UESCAPE '+';
> -                                        ^
> +                               ^

> The caret is not pointing to the third token, or the second for that
> matter.

Interesting.  For me it points at the third token with or without
your fix ... some flex version discrepancy maybe?  Anyway, I have
no objection to your fix; it's probably cleaner than what I had.
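
For readers wondering why the test case above errors out at all: the
documented rule is that the UESCAPE character may be any single character
other than a hexadecimal digit, the plus sign, a single or double quote, or
whitespace. A minimal Python sketch of that check follows. The function name
is hypothetical, this is not the scanner's code, and `str.isspace()` accepts
more whitespace characters than the scanner's own definition does.

```python
import string


def is_valid_uescape_char(ch: str) -> bool:
    """Illustrative check of the documented UESCAPE rule (hypothetical helper)."""
    # The escape character must be exactly one character long.
    if len(ch) != 1:
        return False
    # Disallowed: hex digits, plus sign, single quote, double quote, whitespace.
    # Note: isspace() is broader than the scanner's whitespace set (assumption).
    return not (ch in string.hexdigits or ch in "+'\"" or ch.isspace())


plus_ok = is_valid_uescape_char("+")   # False, hence the error in the test above
bang_ok = is_valid_uescape_char("!")   # True
```

The '+' is excluded because it already has a meaning inside the escape
syntax: it introduces the six-digit form.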

>> * I did not do more with ecpg than get it to compile, using the
>> same hacks as in your v7.  It still fails its regression tests,
>> but now the reason is that what we've done in parser/parser.c
>> needs to be transposed into the identical functionality in
>> ecpg/preproc/parser.c.  Or at least some kind of functionality
>> there.  A problem with this approach is that it presumes we can
>> reduce a UIDENT sequence to a plain IDENT, but to do so we need
>> assumptions about the target encoding, and I'm not sure that
>> ecpg should make any such assumptions.  Maybe ecpg should just
>> reject all cases that produce non-ASCII identifiers?  (Probably
>> it could be made to do something smarter with more work, but
>> it's not clear to me that it's worth the trouble.)

> Hmm, I thought we only allowed Unicode escapes in the first place if
> the server encoding was UTF-8. Or did you mean something else?

Well, yeah, but the problem here is that ecpg would have to assume
that the client encoding that its output program will be executed
with is UTF-8.  That seems pretty action-at-a-distance-y.
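
To make the encoding dependence concrete, here is a rough Python sketch of
de-escaping the body of a U&'...' or U&"..." token. It is only an
illustration under stated assumptions, not what parser.c or ecpg does: the
function name is hypothetical, malformed escapes are left untouched rather
than rejected, and surrogate pairs are not handled. The point is that the
de-escaped identifier is a sequence of code points, and the bytes that end up
in the output depend on which encoding is assumed.

```python
import re


def decode_unicode_escapes(body: str, esc: str = "\\") -> str:
    """De-escape the body of a U& token (hypothetical helper, sketch only)."""
    e = re.escape(esc)
    # Three forms: doubled escape char, esc + '+' + 6 hex digits, esc + 4 hex digits.
    pattern = re.compile(
        f"{e}{e}|{e}\\+([0-9A-Fa-f]{{6}})|{e}([0-9A-Fa-f]{{4}})"
    )

    def repl(m: re.Match) -> str:
        if m.group(1):                      # six-digit form
            return chr(int(m.group(1), 16))
        if m.group(2):                      # four-digit form
            return chr(int(m.group(2), 16))
        return esc                          # doubled escape char means a literal one

    return pattern.sub(repl, body)


name = decode_unicode_escapes(r"caf\00E9")
utf8_bytes = name.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = name.encode("latin-1")  # b'caf\xe9'
```

The same identifier comes out as different bytes under UTF-8 and LATIN1, and
a code point such as U+0434 has no LATIN1 representation at all, so a
preprocessor that de-escapes has to commit to an encoding up front.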

I haven't looked closely at what ecpg does with the processed
identifiers.  If it just spits them out as-is, a possible solution
is to not do anything about de-escaping, but pass the sequence
U&"..." (plus UESCAPE ... if any), just like that, on to the grammar
as the value of the IDENT token.

BTW, in the back of my mind here is Chapman's point that it'd be
a large step forward in usability if we allowed Unicode escapes
when the backend encoding is *not* UTF-8.  I think I see how to
get there once this patch is done, so I definitely would not like
to introduce some comparable restriction in ecpg.

            regards, tom lane
