Обсуждение: Syntax error reporting (was Re: [PATCHES] syntax error position "CREATE FUNCTION" bug fix)
[ moving thread to hackers ] Fabien COELHO <coelho@cri.ensmp.fr> writes: > However, I still stick with my "bad" simple idea because the simpler the > better, and also because of the following example: > ... > psql> SELECT count_tup('pg_shadow'); > ERROR: syntax error at or near "FRM" at character 22 > CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement > As you can notice, the extract is not in the submitted query, so there > is no point to show it there. Yeah. However, I dislike your solution because it confuses the cases of a syntax error in the actually submitted query, and a syntax error in an internally generated query. We should keep these cases clearly separate because clients may want to do different things. For a syntax error in the submitted input, what you probably want to do is edit and resubmit the original query --- that's the case I was thinking about in saying that a GUI client like pgadmin would want to set the editing cursor in the original input window. But this action is nonsensical if the syntax error is from a generated query. Perhaps the GUI client could be smart enough to pop up a new window in which one could edit and resubmit the erroneous function definition. Even in psql's simplistic error handling, you want to distinguish the two cases. There's no point in showing the entire original query; one line worth of context is plenty. But you very probably do want to see all of a generated query. So I don't want the backend sending back error reports that look the same in both cases. The original design concept for the 'P' (position) error field is that it would be used to locate syntax errors in the *original query*, and so its presence is a cue to the client code to go in the direction of setting the editing cursor. (Note the protocol specification says "index into the original query string".) We have in fact misimplemented it, because it is being set for syntax errors in internally generated queries too. I was already planning to modify plpgsql to send back the full text of generated queries when there is an error. My intention was to supply this just as part of the CONTEXT stack, that is instead of your example of ERROR: syntax error at or near "FRM" at character 22 CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement you'd get something like ERROR: syntax error at or near "FRM" at character 22 CONTEXT: Executing command "SELECT COUNT(*) AS c FRM pg_shadow" PL/pgSQL function "count_tup" line 4 at for over execute statement However it might be better to invent a new error-message field that carries just the text of the SQL command, rather than stuffing it into CONTEXT. (This is similar to your original patch, but different in detail because I'm envisioning sending back generated queries, never the submitted query. Regurgitating the submitted query is just a waste of bandwidth.) The plus side of that would be that it'd be easy to extract for syntax-error highlighting. The minuses are that existing clients would fail to print such a field (the protocol spec says to ignore unknown fields), and that there is no good way to cope with nested queries. A possible compromise is to put the text of the generated SQL command into a new field only if the error is a syntax error, and put it into the CONTEXT stack otherwise. Syntax errors couldn't be nested so at least that problem goes away. This seems a bit messy though. The other thing to think about is whether we should invent a new field to carry syntax error position for generated queries, rather than making 'P' do double duty as it does now. If we don't do that then we have to change the protocol specification to reflect reality. In any case I think it has to be possible to tell very easily from the error message whether the 'P' position refers to the submitted query or a generated query. Comments anyone? regards, tom lane
Dear Tom, > > However, I still stick with my "bad" simple idea because the simpler the > > better, and also because of the following example: > > ... > > psql> SELECT count_tup('pg_shadow'); > > ERROR: syntax error at or near "FRM" at character 22 > > CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement > > > As you can notice, the extract is not in the submitted query, so there > > is no point to show it there. > > Yeah. However, I dislike your solution because it confuses the cases of > a syntax error in the actually submitted query, and a syntax error in an > internally generated query. We should keep these cases clearly separate > because clients may want to do different things. > [...] I agree with you that both reports should not look the same. The good news is that they already do not look the same, thanks to the CONTEXT information. However, the context information is informal (it is just plain English), to it is not easy for a client to take that into account. > The original design concept for the 'P' (position) error field is that > it would be used to locate syntax errors in the *original query*, and > so its presence is a cue to the client code to go in the direction of > setting the editing cursor. (Note the protocol specification says > "index into the original query string".) Yes, I noted that. In my proposed patch, I changed it to the "processed" query, which may or may not be the initial query. > We have in fact misimplemented it, because it is being set for syntax > errors in internally generated queries too. Well, from the parser point of view, it is a little bit messy to have to do different things for error reporting in different context. That why I would try a one-fit-all solution. Maybe I'm a little bit lazy, but sometimes it is a quality in programming, if it help keep things simple. > I was already planning to modify plpgsql to send back the full text of > generated queries when there is an error. My intention was to supply > this just as part of the CONTEXT stack, that is instead of your example > of > > ERROR: syntax error at or near "FRM" at character 22 > CONTEXT: PL/pgSQL function "count_tup" line 4 at for over execute statement > > you'd get something like > > ERROR: syntax error at or near "FRM" at character 22 > CONTEXT: Executing command "SELECT COUNT(*) AS c FRM pg_shadow" > PL/pgSQL function "count_tup" line 4 at for over execute statement > > However it might be better to invent a new error-message field that > carries just the text of the SQL command, rather than stuffing it into > CONTEXT. I'm not sure I would like the CONTEXT field for that? as it may be usefull for a lot of things. In your above example, the CONTEXT is filled with 2 different informations that are just messed up for the client. If localisation is used, there would be no way for a client to seperate them. Different information should require different fields. More over, I have other ideas for CONTEXT, which should really be a stack. But that is another issue. > (This is similar to your original patch, but different in detail because > I'm envisioning sending back generated queries, never the submitted > query. Regurgitating the submitted query is just a waste of bandwidth.) Well, if it is the same I agree. On the other hand, it is a rare case, during an exception. I would expect submitted queries to work most of the time, because the client is just some program which uses the database. > The plus side of that would be that it'd be easy to extract > for syntax-error highlighting. The minuses are that existing clients > would fail to print such a field (the protocol spec says to ignore > unknown fields That's a really good design idea, because it allows to include new fields and still to have old client compatible with the new protocol. I think it is no big deal if existing clients don't make use if the new information. It was not there before anyway, so it is just as is was before. Moreover, one should distinguish between fields for the human and fields for the client. reporting the query and the position is more for the client, giving the context is more for the human. One is use for some automatic processing, the other is a high level semantical information to be interpreted by the user. >), and that there is no good way to cope with nested queries. > A possible compromise is to put the text of the generated SQL command > into a new field only if the error is a syntax error, and put it into > the CONTEXT stack otherwise. Syntax errors couldn't be nested so at, > least that problem goes away. This seems a bit messy though. I don't like it, because above comments. > The other thing to think about is whether we should invent a new field > to carry syntax error position for generated queries, rather than making > 'P' do double duty as it does now. If we don't do that then we have to > change the protocol specification to reflect reality. In any case I > think it has to be possible to tell very easily from the error message > whether the 'P' position refers to the submitted query or a generated > query. I think a new field is alas necessary. I thought about a P 22 SELECT something FRM table but psql simply uses the raw "P" field content in the message, so you would get: ... at or near character 22 SELECT something FRM table with old clients:-( -- Fabien Coelho - coelho@cri.ensmp.fr
Fabien COELHO <coelho@cri.ensmp.fr> writes: > I agree with you that both reports should not look the same. > The good news is that they already do not look the same, thanks > to the CONTEXT information. Right, but you quite properly didn't like my quick-hack to psql that assumes that the presence of any CONTEXT means it's not a top-level syntax error. I would like to replace that hack with something cleaner. >> We have in fact misimplemented it, because it is being set for syntax >> errors in internally generated queries too. > Well, from the parser point of view, it is a little bit messy to have > to do different things for error reporting in different context. That > why I would try a one-fit-all solution. The parser needn't do anything different. What I'm imagining is that for internally submitted queries, there will always be an error_context_stack routine that can transform the error report into whatever we decide the appropriate format is. >> However it might be better to invent a new error-message field that >> carries just the text of the SQL command, rather than stuffing it into >> CONTEXT. > I'm not sure I would like the CONTEXT field for that? as it may be > usefull for a lot of things. In your above example, the CONTEXT is filled > with 2 different informations that are just messed up for the client. > If localisation is used, there would be no way for a client to seperate > them. Different information should require different fields. The point is that CONTEXT is essentially a record of "how we got here". In a situation where the actual error occurs inside a couple of levels of nesting, you want to be able to report the outer queries as well as the one that directly caused the error. I agree that there's probably little hope of clients automatically making sense of the CONTEXT info. But we need to provide it to help people debug complex functions. > More over, I have other ideas for CONTEXT, which should really be a stack. It already is a stack. >> The other thing to think about is whether we should invent a new field >> to carry syntax error position for generated queries, rather than making >> 'P' do double duty as it does now. > I think a new field is alas necessary. I'm leaning in that direction also. How about 'P': used only for syntax error position in the client-submitted query. 'p': syntax error position in an internally-generated query. 'q': text of an internally-generated query ('p' and 'q' would always appear together). In the case of a non-syntax error in an internally generated query, we should stick the query text into the CONTEXT stack, since it might not be the most closely nested context item anyway. An existing client will ignore the 'p' and 'q' fields, thus providing behavior that's no worse than what you get with 7.4 now. regards, tom lane
Dear Tom, > The point is that CONTEXT is essentially a record of "how we got here". Yes, but for human eyes. > In a situation where the actual error occurs inside a couple of levels > of nesting, you want to be able to report the outer queries as well as > the one that directly caused the error. I agree that there's probably > little hope of clients automatically making sense of the CONTEXT info. > But we need to provide it to help people debug complex functions. Yes. > > More over, I have other ideas for CONTEXT, which should really be a stack. > It already is a stack. Ok, I agree that there is a "push", but I'm still looking fot the "pop". Maybe I missed something, but it seemed to me that strings are appended on to the other, and there is no way back. That's what I mean by a "real" stack. If a student of mine comes to me with his or her "stack" without a "pop", he or she will not have a very good grade;-) So when more information are to be fed into the context, they are to be fed when they are actually needed on the error, maybe with callbacks? They cannot be fed prevently. What I would have looked for, is a stack on which functions could push and pop informations as they want, so that the stack would be always available for any error or warning or debug trace down the callgraph. If it is the case already, as I said above, I haven't seen the "pop" feature yet. > > I think a new field is alas necessary. > I'm leaning in that direction also. How about > 'P': used only for syntax error position in the client-submitted query. > 'p': syntax error position in an internally-generated query. > 'q': text of an internally-generated query ('p' and 'q' would always > appear together). Why not. Taking your very idea a little step further, I would suggest the following, which is pretty similar, but even more general: 1/ field conventions: - upper case letter fields are for humans. they are meant to be shown to the user, even new ones? - lower case letter fields are for machines / clients they are never meant to be shown to the user! but they shouldgive some information to the client on how to process an error report. 2/ new fields: - p: character [length] [reserved for future use] where the "parser error" occured, with the length of the offending tokenif available (for better hilighting?) this is basically the current P, but not meant to the user, and with possibleadditions. - q: select foo frm bla; the raw sql query that was being processed, whatever the case. I think syntax error are rare occurences,so it is no big deal to return the query anyway. this may make some clients code easier because there is no differencein the processing for different kind of errors. - c: context-description some context for the client, not the human, so no localisation. something like: c: lang=sql query c: lang=sql query-part c: lang=sql function 'foo' 4 it could be adapted to other syntax errors, something like. c: lang=plpgsql function 'foo' 4 c: lang=perl function 'bla' 43 or separate fields: l:sql for the language, f: for the function... in such cases, the q: field would not necessarily containssql. the client may open an editor for plpgsql function of it sees it fits... or whatever. The information is available, thatis the point. Have a nice day, -- Fabien Coelho - coelho@cri.ensmp.fr
Fabien COELHO <coelho@cri.ensmp.fr> writes: >>> More over, I have other ideas for CONTEXT, which should really be a stack. >> It already is a stack. > Ok, I agree that there is a "push", but I'm still looking fot the "pop". > Maybe I missed something, but it seemed to me that strings are appended > on to the other, and there is no way back. But the string list is not constructed until the error actually occurs. You don't need a pop at that point --- the call stack is what it is. I think you are imagining that outer-level context hooks should be able to editorialize on what inner-level ones said (or perhaps vice versa?) but I honestly cannot think of a valid use-case for that. > What I would have looked for, is a stack on which functions could push > and pop informations as they want, so that the stack would be always > available for any error or warning or debug trace down the callgraph. Look at the existing examples of adjusting the error_context_stack. They already do all that, they just don't bother to compute the error strings unless actually needed. I'm not willing to push very much cost into the non-error path of control when there's no visible payoff. regards, tom lane
> > Maybe I missed something, but it seemed to me that strings are appended > > on to the other, and there is no way back. > > But the string list is not constructed until the error actually occurs. > You don't need a pop at that point --- the call stack is what it is. > > I think you are imagining that outer-level context hooks should be able > to editorialize on what inner-level ones said (or perhaps vice versa?) No. I just missed the "error_context_stack" by focussing on "errcontext()", which does not managed a stack at all, so I was quite puzzled. Now it is clearer. Thanks, -- Fabien Coelho - coelho@cri.ensmp.fr