Обсуждение: New to_timestamp implementation is pretty strict

Поиск
Список
Период
Сортировка

New to_timestamp implementation is pretty strict

От
Heikki Linnakangas
Дата:
I like strict in general, but this doesn't seem logical:

postgres=# SELECT to_timestamp('29-12-2005 01:2:03', 'DD-MM-YYYY 
HH24:MI:SS'); -- works      to_timestamp
------------------------ 2005-12-29 01:02:03+02
(1 row)

postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY 
HH24:MI:SS'); -- doesn't work
ERROR:  source string too short for "SS" formatting field
DETAIL:  Field requires 2 characters, but only 1 remain.
HINT:  If your source string is not fixed-width, try using the "FM" 
modifier.

I think the end of string should be treated like a field separator, 
colon in this example, and we should accept both of the above. Opinions?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New to_timestamp implementation is pretty strict

От
"David E. Wheeler"
Дата:
On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote:

> postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY  
> HH24:MI:SS'); -- doesn't work
> ERROR:  source string too short for "SS" formatting field
> DETAIL:  Field requires 2 characters, but only 1 remain.
> HINT:  If your source string is not fixed-width, try using the "FM"  
> modifier.
>
> I think the end of string should be treated like a field separator,  
> colon in this example, and we should accept both of the above.  
> Opinions?

I'm generally in favor of being generous in the input one can accept,  
but in this case it seems ambiguous to me. Is that supposed to be :30  
or :03? There's no way to tell.

My $0.02.

Best,

David


Re: New to_timestamp implementation is pretty strict

От
Tom Lane
Дата:
"David E. Wheeler" <david@kineticode.com> writes:
> On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote:
>> I think the end of string should be treated like a field separator,  
>> colon in this example, and we should accept both of the above.  
>> Opinions?

> I'm generally in favor of being generous in the input one can accept,  
> but in this case it seems ambiguous to me. Is that supposed to be :30  
> or :03? There's no way to tell.

But notice that we are allowing a single digit for the hour and minute
fields.  It's inconsistent that the last field works differently.
(And it is that it's the last field, not that it's SS --- try minutes
as the last field.)
        regards, tom lane


Re: New to_timestamp implementation is pretty strict

От
"Dave Page"
Дата:
On Mon, Dec 1, 2008 at 2:45 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote:
>
>> postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY
>> HH24:MI:SS'); -- doesn't work
>> ERROR:  source string too short for "SS" formatting field
>> DETAIL:  Field requires 2 characters, but only 1 remain.
>> HINT:  If your source string is not fixed-width, try using the "FM"
>> modifier.
>>
>> I think the end of string should be treated like a field separator, colon
>> in this example, and we should accept both of the above. Opinions?
>
> I'm generally in favor of being generous in the input one can accept, but in
> this case it seems ambiguous to me. Is that supposed to be :30 or :03?
> There's no way to tell.

How is it ambiguous? The leading zero is technically redundant. A
trailing on most certainly isn't.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com


Re: New to_timestamp implementation is pretty strict

От
"David E. Wheeler"
Дата:
On Dec 1, 2008, at 3:52 PM, Tom Lane wrote:

>> I'm generally in favor of being generous in the input one can accept,
>> but in this case it seems ambiguous to me. Is that supposed to be :30
>> or :03? There's no way to tell.
>
> But notice that we are allowing a single digit for the hour and minute
> fields.  It's inconsistent that the last field works differently.
> (And it is that it's the last field, not that it's SS --- try minutes
> as the last field.)

Oh, well yeah, it should be consistent. But I'm still not sure that :3  
should be allowed. OTOH, who does that, anyway?

Best,

David


Re: New to_timestamp implementation is pretty strict

От
"David E. Wheeler"
Дата:
On Dec 1, 2008, at 3:55 PM, Dave Page wrote:

>> I'm generally in favor of being generous in the input one can  
>> accept, but in
>> this case it seems ambiguous to me. Is that supposed to be :30 or : 
>> 03?
>> There's no way to tell.
>
> How is it ambiguous? The leading zero is technically redundant. A
> trailing on most certainly isn't.

it depends on how you look at it, I suppose. If you look at ":xy" as  
"x" being the 10s position and "y" being the 1s position, it makes no  
sense. If you look at it as an integer, it does.

Best,

David


Re: New to_timestamp implementation is pretty strict

От
Alvaro Herrera
Дата:
David E. Wheeler wrote:

> Oh, well yeah, it should be consistent. But I'm still not sure that :3  
> should be allowed. OTOH, who does that, anyway?

Anyone who prints times as %d:%d:%d.  You can find those in the wild.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: New to_timestamp implementation is pretty strict

От
"Dave Page"
Дата:
On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler <david@kineticode.com> wrote:

> it depends on how you look at it, I suppose. If you look at ":xy" as "x"
> being the 10s position and "y" being the 1s position, it makes no sense.

Suffice it to say, I don't look at it that way :-). I'd wager most
people wouldn't either, but I have no data to back that up of course.


-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com


Re: New to_timestamp implementation is pretty strict

От
"David E. Wheeler"
Дата:
On Dec 1, 2008, at 4:07 PM, Alvaro Herrera wrote:

> David E. Wheeler wrote:
>
>> Oh, well yeah, it should be consistent. But I'm still not sure  
>> that :3
>> should be allowed. OTOH, who does that, anyway?
>
> Anyone who prints times as %d:%d:%d.  You can find those in the wild.

I guess I should have expected that. Sheesh.

Best,

David



Re: New to_timestamp implementation is pretty strict

От
"David E. Wheeler"
Дата:
On Dec 1, 2008, at 4:09 PM, Dave Page wrote:

> On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler  
> <david@kineticode.com> wrote:
>
>> it depends on how you look at it, I suppose. If you look at ":xy"  
>> as "x"
>> being the 10s position and "y" being the 1s position, it makes no  
>> sense.
>
> Suffice it to say, I don't look at it that way :-). I'd wager most
> people wouldn't either, but I have no data to back that up of course.

Yeah, I could see that. It makes no sense to me (":3" just looks  
weird), but maybe I just think too much like a computer. ;-)

Best,

David



Re: New to_timestamp implementation is pretty strict

От
Andrew Dunstan
Дата:

Dave Page wrote:
> On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler <david@kineticode.com> wrote:
>
>   
>> it depends on how you look at it, I suppose. If you look at ":xy" as "x"
>> being the 10s position and "y" being the 1s position, it makes no sense.
>>     
>
> Suffice it to say, I don't look at it that way :-). I'd wager most
> people wouldn't either, but I have no data to back that up of course.
>
>
>   

Isn't the point that ambiguity is undesirable, as is inconsistency? So 
counts of people who see this one way or the other should be irrelevant.

Alvaro noted the use in the wild of formats like "%d:%d:%d" for times. 
IMNSHO we should not cater to such bad code.

cheers

andrew

cheers


Re: New to_timestamp implementation is pretty strict

От
"Greg Stark"
Дата:
How would you parse an input format of just 'SS' ? is there something
ambiguous about '3' ? I don't see anything "bad" about using %d to
output an integer number of seconds.

-- 
greg


Re: New to_timestamp implementation is pretty strict

От
"Robert Haas"
Дата:
On Mon, Dec 1, 2008 at 10:33 AM, Greg Stark <stark@enterprisedb.com> wrote:
> How would you parse an input format of just 'SS' ? is there something
> ambiguous about '3' ? I don't see anything "bad" about using %d to
> output an integer number of seconds.

+1.

It seems to me that it's pretty silly to say that we "know" that the 2
in "01:2:03" is intended to mean 02, but we are somehow confused about
whether the 3 in "01:02:3" is intended to mean 03 or 30.  Sure, the
latter could be the result of a truncation, but if the user is
randomly truncating their strings, they're going to have problems with
a lot more than to_timestamp().

...Robert


Re: New to_timestamp implementation is pretty strict

От
Andrew Dunstan
Дата:

Greg Stark wrote:
> How would you parse an input format of just 'SS' ? is there something
> ambiguous about '3' ? I don't see anything "bad" about using %d to
> output an integer number of seconds.
>
>   

The docs say that SS corresponds to "second (00-59)", so clearly it 
should expect a two digit zero padded number.

What's so hard about using "%0.2d" ?

cheers

andrew


Re: New to_timestamp implementation is pretty strict

От
"Robert Haas"
Дата:
Another point here is that we have always accepted single digits in dates:

portal=> select '2008-11-1'::date;   date
------------2008-11-01
(1 row)

portal=> select '2008-1-11'::date;   date
------------2008-01-11
(1 row)

If we're going to handle dates and timestamps inconsistently, there
should be a good reason for it.

...Robert


Re: New to_timestamp implementation is pretty strict

От
Tom Lane
Дата:
"Robert Haas" <robertmhaas@gmail.com> writes:
> Another point here is that we have always accepted single digits in dates:

Yeah, but that's the general datetime input code, which has rather
different goals than to_timestamp().

After thinking about it I'm inclined to feel that SS and friends should
insist on exactly 2 digits.  If you want to allow 1-or-2-digits then use
FMSS, just like the error message tells you.  (However, I have a vague
feeling that Oracle doesn't insist on this, and in the end we ought to
follow Oracle's behavior.  Can anyone check?)

In any case, it's certainly broken that the last field behaves
differently from not-last fields.  I'm not all that set on whether we
insist on two digits or not, but I do think the inconsistency needs
to be fixed.
        regards, tom lane


Re: New to_timestamp implementation is pretty strict

От
Heikki Linnakangas
Дата:
Robert Haas wrote:
> On Mon, Dec 1, 2008 at 10:33 AM, Greg Stark <stark@enterprisedb.com> wrote:
>> How would you parse an input format of just 'SS' ? is there something
>> ambiguous about '3' ? I don't see anything "bad" about using %d to
>> output an integer number of seconds.
> 
> +1.
> 
> It seems to me that it's pretty silly to say that we "know" that the 2
> in "01:2:03" is intended to mean 02, but we are somehow confused about
> whether the 3 in "01:02:3" is intended to mean 03 or 30.

Yep. It's a fair argument that we shouldn't accept either, but the 
inconsistency is just wrong. I've committed a patch fixing the 
inconsistency, by allowing "01:02:3".

Now whether we should forbid both, my opinion is that we shouldn't; that 
would just unnecessarily brake old applications, and I don't think 
there's much danger of ambiguity in what "01:2:03" means.

For better or worse, we also allow these more questionable inputs:

postgres=# SELECT to_timestamp('2008/-3/01', 'YYYY/MM/DD');      to_timestamp
------------------------ 2007-09-01 00:00:00+03
(1 row)

postgres=# SELECT to_timestamp('2008--3-01', 'YYYY-MM-DD');      to_timestamp
------------------------ 2007-09-01 00:00:00+03
(1 row)

postgres=# SELECT to_timestamp('2008-03', 'YYYY-MM-DD');      to_timestamp
------------------------ 2008-03-01 00:00:00+02
(1 row)

postgres=# SELECT to_timestamp('2008-03-04-foobar', 'YYYY-MM-DD');      to_timestamp
------------------------ 2008-03-04 00:00:00+02
(1 row)

The argument for rejecting these is stronger, IMHO, but given that we 
allowed these in previous releases as well, I don't think we try to 
forbid them either.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New to_timestamp implementation is pretty strict

От
"Robert Haas"
Дата:
> For better or worse, we also allow these more questionable inputs:

Wow.  Those are all pretty atrocious.

Even so, it's not clear to me that there's a lot of merit to changing
the behavior.  If to_timestamp() isn't rigorous enough, you can always
stick some additional error checking in front of it; it's easy to
write a regular expression that will only match EXACTLY YYYY-MM-DD if
that's what you want to do.  If to_timestamp() is excessively
pedantic, it forces you into rewriting to_timestamp(), which is a lot
more work.  I probably still wouldn't make it accept anything quite
as... creative... as these examples if starting over, but now that the
existing version is out there, I think breaking backward compatibility
isn't warranted.

...Robert


Re: New to_timestamp implementation is pretty strict

От
"Brendan Jurd"
Дата:
On Mon, Dec 1, 2008 at 11:08 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY
> HH24:MI:SS'); -- doesn't work
...
> I think the end of string should be treated like a field separator, colon in
> this example, and we should accept both of the above. Opinions?
>

(With apologies for being late to the thread)

I can agree that it is inconsistent to treat separator characters,
like colons and hyphens, differently to the end of the string.  If
we're allowing separators to imply variable-width nodes, then the end
of the string should do so as well, so I think Heikki did the right
thing here.

However, I actually think that, ultimately, Heikki's example above
*should* be rejected, and so should any other attempt to provide a
value of the wrong width, even if there are separators present, unless
the user has specified the 'FM' fill mode flag.

Heikki's example shows a too-short string for SS, but what about one
that is too long?  Should we accept

# to_timestamp('29122005 0102333', 'DDMMYYYY HH24MISS')

As meaning three hundred and thirty-three seconds?  I would argue we
shouldn't; it's most likely that the user made an error, so the right
thing to do is throw an exception and give them an opportunity to fix
it.  Making guesses about the user's intention when the input is
heavily ambiguous isn't a fun game to be playing, for us or for the
user.

Given the contrary arguments (backwards- and Oracle- compatibility,
mostly) I decided that was too much for me to bite off in my patch.
That was the same reason I didn't fiddle with the treatment of
end-of-string; 8.3 didn't treat it as a separator either.  The
different is that, although it always treated the final field as
fixed-width, prior to my patch it didn't actually throw an error when
fixed-width fields were too short.

Cheers,
BJ


Re: New to_timestamp implementation is pretty strict

От
Heikki Linnakangas
Дата:
Tom Lane wrote:
> After thinking about it I'm inclined to feel that SS and friends should
> insist on exactly 2 digits.  If you want to allow 1-or-2-digits then use
> FMSS, just like the error message tells you.  (However, I have a vague
> feeling that Oracle doesn't insist on this, and in the end we ought to
> follow Oracle's behavior.  Can anyone check?)

Oracle doesn't insist on it.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com