Обсуждение: New to_timestamp implementation is pretty strict
I like strict in general, but this doesn't seem logical: postgres=# SELECT to_timestamp('29-12-2005 01:2:03', 'DD-MM-YYYY HH24:MI:SS'); -- works to_timestamp ------------------------ 2005-12-29 01:02:03+02 (1 row) postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY HH24:MI:SS'); -- doesn't work ERROR: source string too short for "SS" formatting field DETAIL: Field requires 2 characters, but only 1 remain. HINT: If your source string is not fixed-width, try using the "FM" modifier. I think the end of string should be treated like a field separator, colon in this example, and we should accept both of the above. Opinions? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote: > postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY > HH24:MI:SS'); -- doesn't work > ERROR: source string too short for "SS" formatting field > DETAIL: Field requires 2 characters, but only 1 remain. > HINT: If your source string is not fixed-width, try using the "FM" > modifier. > > I think the end of string should be treated like a field separator, > colon in this example, and we should accept both of the above. > Opinions? I'm generally in favor of being generous in the input one can accept, but in this case it seems ambiguous to me. Is that supposed to be :30 or :03? There's no way to tell. My $0.02. Best, David
"David E. Wheeler" <david@kineticode.com> writes: > On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote: >> I think the end of string should be treated like a field separator, >> colon in this example, and we should accept both of the above. >> Opinions? > I'm generally in favor of being generous in the input one can accept, > but in this case it seems ambiguous to me. Is that supposed to be :30 > or :03? There's no way to tell. But notice that we are allowing a single digit for the hour and minute fields. It's inconsistent that the last field works differently. (And it is that it's the last field, not that it's SS --- try minutes as the last field.) regards, tom lane
On Mon, Dec 1, 2008 at 2:45 PM, David E. Wheeler <david@kineticode.com> wrote: > On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote: > >> postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY >> HH24:MI:SS'); -- doesn't work >> ERROR: source string too short for "SS" formatting field >> DETAIL: Field requires 2 characters, but only 1 remain. >> HINT: If your source string is not fixed-width, try using the "FM" >> modifier. >> >> I think the end of string should be treated like a field separator, colon >> in this example, and we should accept both of the above. Opinions? > > I'm generally in favor of being generous in the input one can accept, but in > this case it seems ambiguous to me. Is that supposed to be :30 or :03? > There's no way to tell. How is it ambiguous? The leading zero is technically redundant. A trailing on most certainly isn't. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
On Dec 1, 2008, at 3:52 PM, Tom Lane wrote: >> I'm generally in favor of being generous in the input one can accept, >> but in this case it seems ambiguous to me. Is that supposed to be :30 >> or :03? There's no way to tell. > > But notice that we are allowing a single digit for the hour and minute > fields. It's inconsistent that the last field works differently. > (And it is that it's the last field, not that it's SS --- try minutes > as the last field.) Oh, well yeah, it should be consistent. But I'm still not sure that :3 should be allowed. OTOH, who does that, anyway? Best, David
On Dec 1, 2008, at 3:55 PM, Dave Page wrote: >> I'm generally in favor of being generous in the input one can >> accept, but in >> this case it seems ambiguous to me. Is that supposed to be :30 or : >> 03? >> There's no way to tell. > > How is it ambiguous? The leading zero is technically redundant. A > trailing on most certainly isn't. it depends on how you look at it, I suppose. If you look at ":xy" as "x" being the 10s position and "y" being the 1s position, it makes no sense. If you look at it as an integer, it does. Best, David
David E. Wheeler wrote: > Oh, well yeah, it should be consistent. But I'm still not sure that :3 > should be allowed. OTOH, who does that, anyway? Anyone who prints times as %d:%d:%d. You can find those in the wild. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler <david@kineticode.com> wrote: > it depends on how you look at it, I suppose. If you look at ":xy" as "x" > being the 10s position and "y" being the 1s position, it makes no sense. Suffice it to say, I don't look at it that way :-). I'd wager most people wouldn't either, but I have no data to back that up of course. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
On Dec 1, 2008, at 4:07 PM, Alvaro Herrera wrote: > David E. Wheeler wrote: > >> Oh, well yeah, it should be consistent. But I'm still not sure >> that :3 >> should be allowed. OTOH, who does that, anyway? > > Anyone who prints times as %d:%d:%d. You can find those in the wild. I guess I should have expected that. Sheesh. Best, David
On Dec 1, 2008, at 4:09 PM, Dave Page wrote: > On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler > <david@kineticode.com> wrote: > >> it depends on how you look at it, I suppose. If you look at ":xy" >> as "x" >> being the 10s position and "y" being the 1s position, it makes no >> sense. > > Suffice it to say, I don't look at it that way :-). I'd wager most > people wouldn't either, but I have no data to back that up of course. Yeah, I could see that. It makes no sense to me (":3" just looks weird), but maybe I just think too much like a computer. ;-) Best, David
Dave Page wrote: > On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler <david@kineticode.com> wrote: > > >> it depends on how you look at it, I suppose. If you look at ":xy" as "x" >> being the 10s position and "y" being the 1s position, it makes no sense. >> > > Suffice it to say, I don't look at it that way :-). I'd wager most > people wouldn't either, but I have no data to back that up of course. > > > Isn't the point that ambiguity is undesirable, as is inconsistency? So counts of people who see this one way or the other should be irrelevant. Alvaro noted the use in the wild of formats like "%d:%d:%d" for times. IMNSHO we should not cater to such bad code. cheers andrew cheers
How would you parse an input format of just 'SS' ? is there something ambiguous about '3' ? I don't see anything "bad" about using %d to output an integer number of seconds. -- greg
On Mon, Dec 1, 2008 at 10:33 AM, Greg Stark <stark@enterprisedb.com> wrote: > How would you parse an input format of just 'SS' ? is there something > ambiguous about '3' ? I don't see anything "bad" about using %d to > output an integer number of seconds. +1. It seems to me that it's pretty silly to say that we "know" that the 2 in "01:2:03" is intended to mean 02, but we are somehow confused about whether the 3 in "01:02:3" is intended to mean 03 or 30. Sure, the latter could be the result of a truncation, but if the user is randomly truncating their strings, they're going to have problems with a lot more than to_timestamp(). ...Robert
Greg Stark wrote: > How would you parse an input format of just 'SS' ? is there something > ambiguous about '3' ? I don't see anything "bad" about using %d to > output an integer number of seconds. > > The docs say that SS corresponds to "second (00-59)", so clearly it should expect a two digit zero padded number. What's so hard about using "%0.2d" ? cheers andrew
Another point here is that we have always accepted single digits in dates: portal=> select '2008-11-1'::date; date ------------2008-11-01 (1 row) portal=> select '2008-1-11'::date; date ------------2008-01-11 (1 row) If we're going to handle dates and timestamps inconsistently, there should be a good reason for it. ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > Another point here is that we have always accepted single digits in dates: Yeah, but that's the general datetime input code, which has rather different goals than to_timestamp(). After thinking about it I'm inclined to feel that SS and friends should insist on exactly 2 digits. If you want to allow 1-or-2-digits then use FMSS, just like the error message tells you. (However, I have a vague feeling that Oracle doesn't insist on this, and in the end we ought to follow Oracle's behavior. Can anyone check?) In any case, it's certainly broken that the last field behaves differently from not-last fields. I'm not all that set on whether we insist on two digits or not, but I do think the inconsistency needs to be fixed. regards, tom lane
Robert Haas wrote: > On Mon, Dec 1, 2008 at 10:33 AM, Greg Stark <stark@enterprisedb.com> wrote: >> How would you parse an input format of just 'SS' ? is there something >> ambiguous about '3' ? I don't see anything "bad" about using %d to >> output an integer number of seconds. > > +1. > > It seems to me that it's pretty silly to say that we "know" that the 2 > in "01:2:03" is intended to mean 02, but we are somehow confused about > whether the 3 in "01:02:3" is intended to mean 03 or 30. Yep. It's a fair argument that we shouldn't accept either, but the inconsistency is just wrong. I've committed a patch fixing the inconsistency, by allowing "01:02:3". Now whether we should forbid both, my opinion is that we shouldn't; that would just unnecessarily brake old applications, and I don't think there's much danger of ambiguity in what "01:2:03" means. For better or worse, we also allow these more questionable inputs: postgres=# SELECT to_timestamp('2008/-3/01', 'YYYY/MM/DD'); to_timestamp ------------------------ 2007-09-01 00:00:00+03 (1 row) postgres=# SELECT to_timestamp('2008--3-01', 'YYYY-MM-DD'); to_timestamp ------------------------ 2007-09-01 00:00:00+03 (1 row) postgres=# SELECT to_timestamp('2008-03', 'YYYY-MM-DD'); to_timestamp ------------------------ 2008-03-01 00:00:00+02 (1 row) postgres=# SELECT to_timestamp('2008-03-04-foobar', 'YYYY-MM-DD'); to_timestamp ------------------------ 2008-03-04 00:00:00+02 (1 row) The argument for rejecting these is stronger, IMHO, but given that we allowed these in previous releases as well, I don't think we try to forbid them either. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
> For better or worse, we also allow these more questionable inputs: Wow. Those are all pretty atrocious. Even so, it's not clear to me that there's a lot of merit to changing the behavior. If to_timestamp() isn't rigorous enough, you can always stick some additional error checking in front of it; it's easy to write a regular expression that will only match EXACTLY YYYY-MM-DD if that's what you want to do. If to_timestamp() is excessively pedantic, it forces you into rewriting to_timestamp(), which is a lot more work. I probably still wouldn't make it accept anything quite as... creative... as these examples if starting over, but now that the existing version is out there, I think breaking backward compatibility isn't warranted. ...Robert
On Mon, Dec 1, 2008 at 11:08 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY > HH24:MI:SS'); -- doesn't work ... > I think the end of string should be treated like a field separator, colon in > this example, and we should accept both of the above. Opinions? > (With apologies for being late to the thread) I can agree that it is inconsistent to treat separator characters, like colons and hyphens, differently to the end of the string. If we're allowing separators to imply variable-width nodes, then the end of the string should do so as well, so I think Heikki did the right thing here. However, I actually think that, ultimately, Heikki's example above *should* be rejected, and so should any other attempt to provide a value of the wrong width, even if there are separators present, unless the user has specified the 'FM' fill mode flag. Heikki's example shows a too-short string for SS, but what about one that is too long? Should we accept # to_timestamp('29122005 0102333', 'DDMMYYYY HH24MISS') As meaning three hundred and thirty-three seconds? I would argue we shouldn't; it's most likely that the user made an error, so the right thing to do is throw an exception and give them an opportunity to fix it. Making guesses about the user's intention when the input is heavily ambiguous isn't a fun game to be playing, for us or for the user. Given the contrary arguments (backwards- and Oracle- compatibility, mostly) I decided that was too much for me to bite off in my patch. That was the same reason I didn't fiddle with the treatment of end-of-string; 8.3 didn't treat it as a separator either. The different is that, although it always treated the final field as fixed-width, prior to my patch it didn't actually throw an error when fixed-width fields were too short. Cheers, BJ
Tom Lane wrote: > After thinking about it I'm inclined to feel that SS and friends should > insist on exactly 2 digits. If you want to allow 1-or-2-digits then use > FMSS, just like the error message tells you. (However, I have a vague > feeling that Oracle doesn't insist on this, and in the end we ought to > follow Oracle's behavior. Can anyone check?) Oracle doesn't insist on it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com