Обсуждение: Syntax error reporting (was Re: [PATCHES] syntax error position "CREATE FUNCTION" bug fix)

Поиск
Список
Период
Сортировка

Syntax error reporting (was Re: [PATCHES] syntax error position "CREATE FUNCTION" bug fix)

От
Tom Lane
Дата:
[ moving thread to hackers ]

Fabien COELHO <coelho@cri.ensmp.fr> writes:
> However, I still stick with my "bad" simple idea because the simpler the
> better, and also because of the following example:
> ...
> psql> SELECT count_tup('pg_shadow');
> ERROR:  syntax error at or near "FRM" at character 22
> CONTEXT:  PL/pgSQL function "count_tup" line 4 at for over execute statement

> As you can notice, the extract is not in the submitted query, so there
> is no point to show it there.

Yeah.  However, I dislike your solution because it confuses the cases of
a syntax error in the actually submitted query, and a syntax error in an
internally generated query.  We should keep these cases clearly separate
because clients may want to do different things.  For a syntax error in
the submitted input, what you probably want to do is edit and resubmit
the original query --- that's the case I was thinking about in saying
that a GUI client like pgadmin would want to set the editing cursor in
the original input window.  But this action is nonsensical if the syntax
error is from a generated query.  Perhaps the GUI client could be smart
enough to pop up a new window in which one could edit and resubmit the
erroneous function definition.

Even in psql's simplistic error handling, you want to distinguish the
two cases.  There's no point in showing the entire original query; one
line worth of context is plenty.  But you very probably do want to see
all of a generated query.  So I don't want the backend sending back
error reports that look the same in both cases.

The original design concept for the 'P' (position) error field is that
it would be used to locate syntax errors in the *original query*, and
so its presence is a cue to the client code to go in the direction of
setting the editing cursor.  (Note the protocol specification says
"index into the original query string".)  We have in fact misimplemented
it, because it is being set for syntax errors in internally generated
queries too.

I was already planning to modify plpgsql to send back the full text of
generated queries when there is an error.  My intention was to supply
this just as part of the CONTEXT stack, that is instead of your example
of

ERROR:  syntax error at or near "FRM" at character 22
CONTEXT:  PL/pgSQL function "count_tup" line 4 at for over execute statement

you'd get something like

ERROR:  syntax error at or near "FRM" at character 22
CONTEXT:  Executing command "SELECT COUNT(*) AS c FRM pg_shadow"
PL/pgSQL function "count_tup" line 4 at for over execute statement

However it might be better to invent a new error-message field that
carries just the text of the SQL command, rather than stuffing it into
CONTEXT.  (This is similar to your original patch, but different in
detail because I'm envisioning sending back generated queries, never the
submitted query.  Regurgitating the submitted query is just a waste of
bandwidth.)  The plus side of that would be that it'd be easy to extract
for syntax-error highlighting.  The minuses are that existing clients
would fail to print such a field (the protocol spec says to ignore
unknown fields), and that there is no good way to cope with nested
queries.

A possible compromise is to put the text of the generated SQL command
into a new field only if the error is a syntax error, and put it into
the CONTEXT stack otherwise.  Syntax errors couldn't be nested so at
least that problem goes away.  This seems a bit messy though.

The other thing to think about is whether we should invent a new field
to carry syntax error position for generated queries, rather than making
'P' do double duty as it does now.  If we don't do that then we have to
change the protocol specification to reflect reality.  In any case I
think it has to be possible to tell very easily from the error message
whether the 'P' position refers to the submitted query or a generated
query.

Comments anyone?
        regards, tom lane


Re: Syntax error reporting (was Re: [PATCHES] syntax error position

От
Fabien COELHO
Дата:
Dear Tom,

> > However, I still stick with my "bad" simple idea because the simpler the
> > better, and also because of the following example:
> > ...
> > psql> SELECT count_tup('pg_shadow');
> > ERROR:  syntax error at or near "FRM" at character 22
> > CONTEXT:  PL/pgSQL function "count_tup" line 4 at for over execute statement
>
> > As you can notice, the extract is not in the submitted query, so there
> > is no point to show it there.
>
> Yeah.  However, I dislike your solution because it confuses the cases of
> a syntax error in the actually submitted query, and a syntax error in an
> internally generated query.  We should keep these cases clearly separate
> because clients may want to do different things.
> [...]

I agree with you that both reports should not look the same.

The good news is that they already do not look the same, thanks
to the CONTEXT information. However, the context information is
informal (it is just plain English), to it is not easy for a client
to take that into account.

> The original design concept for the 'P' (position) error field is that
> it would be used to locate syntax errors in the *original query*, and
> so its presence is a cue to the client code to go in the direction of
> setting the editing cursor.  (Note the protocol specification says
> "index into the original query string".)

Yes, I noted that.

In my proposed patch, I changed it to the "processed" query, which
may or may not be the initial query.

> We have in fact misimplemented it, because it is being set for syntax
> errors in internally generated queries too.

Well, from the parser point of view, it is a little bit messy to have
to do different things for error reporting in different context. That
why I would try a one-fit-all solution. Maybe I'm a little bit lazy, but
sometimes it is a quality in programming, if it help keep things simple.

> I was already planning to modify plpgsql to send back the full text of
> generated queries when there is an error.  My intention was to supply
> this just as part of the CONTEXT stack, that is instead of your example
> of
>
> ERROR:  syntax error at or near "FRM" at character 22
> CONTEXT:  PL/pgSQL function "count_tup" line 4 at for over execute statement
>
> you'd get something like
>
> ERROR:  syntax error at or near "FRM" at character 22
> CONTEXT:  Executing command "SELECT COUNT(*) AS c FRM pg_shadow"
> PL/pgSQL function "count_tup" line 4 at for over execute statement
>
> However it might be better to invent a new error-message field that
> carries just the text of the SQL command, rather than stuffing it into
> CONTEXT.

I'm not sure I would like the CONTEXT field for that? as it may be
usefull for a lot of things. In your above example, the CONTEXT is filled
with 2 different informations that are just messed up for the client.
If localisation is used, there would be no way for a client to seperate
them. Different information should require different fields.

More over, I have other ideas for CONTEXT, which should really be a stack.
But that is another issue.

> (This is similar to your original patch, but different in detail because
> I'm envisioning sending back generated queries, never the submitted
> query.  Regurgitating the submitted query is just a waste of bandwidth.)

Well, if it is the same I agree. On the other hand, it is a rare case,
during an exception. I would expect submitted queries to work most of the
time, because the client is just some program which uses the database.

> The plus side of that would be that it'd be easy to extract
> for syntax-error highlighting.  The minuses are that existing clients
> would fail to print such a field (the protocol spec says to ignore
> unknown fields

That's a really good design idea, because it allows to include new fields
and still to have old client compatible with the new protocol.

I think it is no big deal if existing clients don't make use if the new
information. It was not there before anyway, so it is just as is was
before.

Moreover, one should distinguish between fields for the human and fields
for the client. reporting the query and the position is more for the
client, giving the context is more for the human. One is use for
some automatic processing, the other is a high level semantical
information to be interpreted by the user.

>), and that there is no good way to cope with nested queries.

> A possible compromise is to put the text of the generated SQL command
> into a new field only if the error is a syntax error, and put it into
> the CONTEXT stack otherwise.  Syntax errors couldn't be nested so at,
> least that problem goes away.  This seems a bit messy though.

I don't like it, because above comments.

> The other thing to think about is whether we should invent a new field
> to carry syntax error position for generated queries, rather than making
> 'P' do double duty as it does now.  If we don't do that then we have to
> change the protocol specification to reflect reality.  In any case I
> think it has to be possible to tell very easily from the error message
> whether the 'P' position refers to the submitted query or a generated
> query.

I think a new field is alas necessary. I thought about a

P 22 SELECT something FRM table

but psql simply uses the raw "P" field content in the message, so you
would get:

... at or near character 22 SELECT something FRM table

with old clients:-(

-- 
Fabien Coelho - coelho@cri.ensmp.fr


Fabien COELHO <coelho@cri.ensmp.fr> writes:
> I agree with you that both reports should not look the same.

> The good news is that they already do not look the same, thanks
> to the CONTEXT information.

Right, but you quite properly didn't like my quick-hack to psql that
assumes that the presence of any CONTEXT means it's not a top-level
syntax error.  I would like to replace that hack with something cleaner.

>> We have in fact misimplemented it, because it is being set for syntax
>> errors in internally generated queries too.

> Well, from the parser point of view, it is a little bit messy to have
> to do different things for error reporting in different context. That
> why I would try a one-fit-all solution.

The parser needn't do anything different.  What I'm imagining is that
for internally submitted queries, there will always be an
error_context_stack routine that can transform the error report into
whatever we decide the appropriate format is.

>> However it might be better to invent a new error-message field that
>> carries just the text of the SQL command, rather than stuffing it into
>> CONTEXT.

> I'm not sure I would like the CONTEXT field for that? as it may be
> usefull for a lot of things. In your above example, the CONTEXT is filled
> with 2 different informations that are just messed up for the client.
> If localisation is used, there would be no way for a client to seperate
> them. Different information should require different fields.

The point is that CONTEXT is essentially a record of "how we got here".
In a situation where the actual error occurs inside a couple of levels
of nesting, you want to be able to report the outer queries as well as
the one that directly caused the error.  I agree that there's probably
little hope of clients automatically making sense of the CONTEXT info.
But we need to provide it to help people debug complex functions.

> More over, I have other ideas for CONTEXT, which should really be a stack.

It already is a stack.

>> The other thing to think about is whether we should invent a new field
>> to carry syntax error position for generated queries, rather than making
>> 'P' do double duty as it does now.

> I think a new field is alas necessary.

I'm leaning in that direction also.  How about

'P': used only for syntax error position in the client-submitted query.

'p': syntax error position in an internally-generated query.

'q': text of an internally-generated query ('p' and 'q' would always    appear together).

In the case of a non-syntax error in an internally generated query, we
should stick the query text into the CONTEXT stack, since it might not
be the most closely nested context item anyway.

An existing client will ignore the 'p' and 'q' fields, thus providing
behavior that's no worse than what you get with 7.4 now.
        regards, tom lane


Re: Syntax error reporting (was Re: [PATCHES] syntax error position

От
Fabien COELHO
Дата:
Dear Tom,

> The point is that CONTEXT is essentially a record of "how we got here".

Yes, but for human eyes.

> In a situation where the actual error occurs inside a couple of levels
> of nesting, you want to be able to report the outer queries as well as
> the one that directly caused the error.  I agree that there's probably
> little hope of clients automatically making sense of the CONTEXT info.
> But we need to provide it to help people debug complex functions.

Yes.

> > More over, I have other ideas for CONTEXT, which should really be a stack.
> It already is a stack.

Ok, I agree that there is a "push", but I'm still looking fot the "pop".

Maybe I missed something, but it seemed to me that strings are appended
on to the other, and there is no way back.

That's what I mean by a "real" stack. If a student of mine comes to me
with his or her "stack" without a "pop", he or she will not have a very
good grade;-)

So when more information are to be fed into the context, they are to be
fed when they are actually needed on the error, maybe with callbacks?
They cannot be fed prevently.

What I would have looked for, is a stack on which functions could push
and pop informations as they want, so that the stack would be always
available for any error or warning or debug trace down the callgraph.

If it is the case already, as I said above, I haven't seen the "pop"
feature yet.

> > I think a new field is alas necessary.
> I'm leaning in that direction also.  How about
> 'P': used only for syntax error position in the client-submitted query.
> 'p': syntax error position in an internally-generated query.
> 'q': text of an internally-generated query ('p' and 'q' would always
>      appear together).

Why not.

Taking your very idea a little step further, I would suggest the
following, which is pretty similar, but even more general:

1/ field conventions:
 - upper case letter fields are for humans.   they are meant to be shown to the user, even new ones?
 - lower case letter fields are for machines / clients   they are never meant to be shown to the user!   but they
shouldgive some information to the client on how to   process an error report.
 

2/ new fields:
- p: character [length] [reserved for future use]  where the "parser error" occured, with the length of the  offending
tokenif available (for better hilighting?)  this is basically the current P, but not meant to the user,  and with
possibleadditions.
 
- q: select foo frm bla; the raw sql query that was being processed, whatever the case. I think syntax error are rare
occurences,so it is no big deal to return the query anyway. this may make some clients code easier because there is no
differencein the processing for different kind of errors.
 
- c: context-description some context for the client, not the human, so no localisation. something like:  c: lang=sql
query c: lang=sql query-part  c: lang=sql function 'foo' 4
 
 it could be adapted to other syntax errors, something like.
  c: lang=plpgsql function 'foo' 4  c: lang=perl function 'bla' 43
 or separate fields: l:sql for the language, f: for the function... in such cases, the q: field would not necessarily
containssql.
 
 the client may open an editor for plpgsql function of it sees it fits... or whatever. The information is available,
thatis the point.
 

Have a nice day,

-- 
Fabien Coelho - coelho@cri.ensmp.fr


Fabien COELHO <coelho@cri.ensmp.fr> writes:
>>> More over, I have other ideas for CONTEXT, which should really be a stack.
>> It already is a stack.

> Ok, I agree that there is a "push", but I'm still looking fot the "pop".

> Maybe I missed something, but it seemed to me that strings are appended
> on to the other, and there is no way back.

But the string list is not constructed until the error actually occurs.
You don't need a pop at that point --- the call stack is what it is.

I think you are imagining that outer-level context hooks should be able
to editorialize on what inner-level ones said (or perhaps vice versa?)
but I honestly cannot think of a valid use-case for that.

> What I would have looked for, is a stack on which functions could push
> and pop informations as they want, so that the stack would be always
> available for any error or warning or debug trace down the callgraph.

Look at the existing examples of adjusting the error_context_stack.
They already do all that, they just don't bother to compute the error
strings unless actually needed.  I'm not willing to push very much cost
into the non-error path of control when there's no visible payoff.
        regards, tom lane


Re: Syntax error reporting (was Re: [PATCHES] syntax error

От
Fabien COELHO
Дата:
> > Maybe I missed something, but it seemed to me that strings are appended
> > on to the other, and there is no way back.
>
> But the string list is not constructed until the error actually occurs.
> You don't need a pop at that point --- the call stack is what it is.
>
> I think you are imagining that outer-level context hooks should be able
> to editorialize on what inner-level ones said (or perhaps vice versa?)

No.

I just missed the "error_context_stack" by focussing on "errcontext()",
which does not managed a stack at all, so I was quite puzzled. Now
it is clearer.

Thanks,

-- 
Fabien Coelho - coelho@cri.ensmp.fr