Обсуждение: bug in date_part() function in 6.5.2, 7.0.2

Поиск
Список
Период
Сортировка

bug in date_part() function in 6.5.2, 7.0.2

От
pgsql-bugs@postgresql.org
Дата:
Alex Karpov (analyst@sibinet.ru) reports a bug with a severity of 1
The lower the number the more severe it is.

Short Description
bug in  date_part() function in 6.5.2, 7.0.2

Long Description
Checked on Postgres version 6.5.2 and 7.0.2:
Incorrect DOW and DAY parts of date calculation. See exanple code.
I`ve not tested this function for another time interval, only for march 2000.

  ?column?  | date_part | date_part
------------+-----------+-----------
 24.03.2000 |        24 |         5
 25.03.2000 |        25 |         6
 26.03.2000 |        25 |         6
 27.03.2000 |        27 |         1
(4 rows)


Sample Code
create table oops (date date);

insert into oops (date) values (to_date('24.03.2000','dd.mm.yyyy'));
insert into oops (date) values (to_date('25.03.2000','dd.mm.yyyy'));
insert into oops (date) values (to_date('26.03.2000','dd.mm.yyyy'));
insert into oops (date) values (to_date('27.03.2000','dd.mm.yyyy'));

select
 date::date
,date_part('day',date::date)
,date_part('dow',date::date)
from oops
order by date;

  ?column?  | date_part | date_part
------------+-----------+-----------
 24.03.2000 |        24 |         5
 25.03.2000 |        25 |         6
 26.03.2000 |        25 |         6
 27.03.2000 |        27 |         1
(4 rows)



No file was uploaded with this report

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Karel Zak
Дата:
> Sample Code
> create table oops (date date);
>
> insert into oops (date) values (to_date('24.03.2000','dd.mm.yyyy'));
> insert into oops (date) values (to_date('25.03.2000','dd.mm.yyyy'));
> insert into oops (date) values (to_date('26.03.2000','dd.mm.yyyy'));
> insert into oops (date) values (to_date('27.03.2000','dd.mm.yyyy'));
>
> select
>  date::date
> ,date_part('day',date::date)
> ,date_part('dow',date::date)
> from oops
> order by date;
>
>   ?column?  | date_part | date_part
> ------------+-----------+-----------
>  24.03.2000 |        24 |         5
>  25.03.2000 |        25 |         6
>  26.03.2000 |        25 |         6
>  27.03.2000 |        27 |         1
> (4 rows)

 Interesting...

test=# select date::timestamp, date_part('day', date), to_char(date, 'DD')
from oops;
        ?column?        | date_part | to_char
------------------------+-----------+---------
 2000-03-24 00:00:00+01 |        24 | 24
 2000-03-25 00:00:00+01 |        25 | 25
 2000-03-25 23:00:00+01 |        25 | 25
 2000-03-27 00:00:00+02 |        27 | 27
(4 rows)

 to_char() and date_part() has different code but some result, a problem
can't be in date_part()...

test=# select date::timestamp from oops;
        ?column?
------------------------
 2000-03-24 00:00:00+01
 2000-03-25 00:00:00+01
 2000-03-25 23:00:00+01
           ^^^^
           ????
 2000-03-27 00:00:00+02
(4 rows)

 ....it's not date_part() bug, it's to_date() bug:

test=# select to_date('26.03.2000','dd.mm.yyyy')::timestamp;
        ?column?
------------------------
 2000-03-25 23:00:00+01
            ^^^
           ! Bug !

test=# select to_timestamp('26.03.2000','dd.mm.yyyy');
      to_timestamp
------------------------
 2000-03-26 00:00:00+01
            ^^^
          ! correct !


 But to_date() call to_timestamp() only:

Datum
to_date(PG_FUNCTION_ARGS)
{
        /* Quick hack: since our inputs are just like to_timestamp,
         * hand over the whole input info struct...
         */
        return DirectFunctionCall1(timestamp_date, to_timestamp(fcinfo));
}

 Comments, some idea?

                Karel

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Tom Lane
Дата:
Karel Zak <zakkr@zf.jcu.cz> writes:
>  ....it's not date_part() bug, it's to_date() bug:

> test=# select to_date('26.03.2000','dd.mm.yyyy')::timestamp;
>         ?column?
> ------------------------
>  2000-03-25 23:00:00+01
>             ^^^
>            ! Bug !

> test=# select to_timestamp('26.03.2000','dd.mm.yyyy');
>       to_timestamp
> ------------------------
>  2000-03-26 00:00:00+01
>             ^^^
>           ! correct !

Looks like it's a DST-transition issue.  Here in EST5EDT, where this
year's EST->EDT transition occurred at 2000-04-02 02:00, I get

regression=# select to_date('26.03.2000','dd.mm.yyyy')::timestamp;
        ?column?
------------------------
 2000-03-26 00:00:00-05
(1 row)

regression=# select to_timestamp('26.03.2000','dd.mm.yyyy');
      to_timestamp
------------------------
 2000-03-26 00:00:00-05
(1 row)

regression=# select to_timestamp('02.04.2000','dd.mm.yyyy');
      to_timestamp
------------------------
 2000-04-02 00:00:00-05
(1 row)

regression=# select to_date('02.04.2000','dd.mm.yyyy')::timestamp;
        ?column?
------------------------
 2000-04-01 23:00:00-05
(1 row)

I presume 26 March was the transition date where you live?

>  But to_date() call to_timestamp() only:

> Datum
> to_date(PG_FUNCTION_ARGS)
> {
>         /* Quick hack: since our inputs are just like to_timestamp,
>          * hand over the whole input info struct...
>          */
>         return DirectFunctionCall1(timestamp_date, to_timestamp(fcinfo));
> }

>  Comments, some idea?

There's nothing wrong in the above code.  What's broken is the date-to-
timestamp type conversion for a DST transition date:

regression=# select to_date('02.04.2000','dd.mm.yyyy');
  to_date
------------
 2000-04-02
(1 row)

regression=# select '2000-04-02'::date::timestamp;
        ?column?
------------------------
 2000-04-01 23:00:00-05
(1 row)

regression=# select '2000-10-29'::date::timestamp;
        ?column?
------------------------
 2000-10-29 01:00:00-04
(1 row)

Looks to me like an off-by-one kind of problem in deciding which
timezone applies to midnight of a transition day.

            regards, tom lane

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Karel Zak
Дата:
>
> Looks to me like an off-by-one kind of problem in deciding which
> timezone applies to midnight of a transition day.
>
>             regards, tom lane


 Hmm ... really cool solution. You are right! I was sure that in the my
to_xxxx() code can't be bug :-))

 Thanks.
                    Karel

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Thomas Lockhart
Дата:
> >  ....it's not date_part() bug, it's to_date() bug:
> Looks to me like an off-by-one kind of problem in deciding which
> timezone applies to midnight of a transition day.

Probably a bit worse (but no problem to solve ;): you need to make sure
that you rotate the date type to the correct time zone when you do the
conversion to timestamp. If you just blast the yy/mm/dd into the time
structure, leaving the time zone fields zero'd out, then you will be
doing the conversion in UTC. When the timestamp is read back out it is
converted to your local time zone.

The date->timestamp conversion code gets this right, so you might want
to look at that. From my testing, the only annoying case is for 02:00 on
the day of a DST transition, when you either skip or get an extra hour.

If you still have trouble, I'd be happy to look at your code...

                    - Thomas

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Tom Lane
Дата:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Looks to me like an off-by-one kind of problem in deciding which
>> timezone applies to midnight of a transition day.

> The date->timestamp conversion code gets this right, so you might want
> to look at that.

Au contraire: the cited examples appear to prove that the
date->timestamp conversion code gets this *wrong*.  Or did
you miss the point of

regression=# select '2000-04-02'::date::timestamp;
        ?column?
------------------------
 2000-04-01 23:00:00-05
(1 row)

            regards, tom lane

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Thomas Lockhart
Дата:
> >> Looks to me like an off-by-one kind of problem in deciding which
> >> timezone applies to midnight of a transition day.
> > The date->timestamp conversion code gets this right, so you might want
> > to look at that.
> Au contraire: the cited examples appear to prove that the
> date->timestamp conversion code gets this *wrong*.  Or did
> you miss the point of
> regression=# select '2000-04-02'::date::timestamp;
>         ?column?
> ------------------------
>  2000-04-01 23:00:00-05
> (1 row)

*sigh* Well, of course I missed the point. Thanks for clarifying that ;)

OK, the situation is coming back to me now. The conversion is done in
steps:

1) Check that the date is likely to be within the Unix system time
range. Otherwise, do all conversions/calculations in UTC.

2) Convert to an integer "Unix system time".

3) Rotate by 12 hours (to UTC noon!). This is supposed to ensure that we
stay in the correct day after conversion to local time *no matter what
time zone we are actually in*, but is likely the problem in this edge
case.

4) Call localtime() to fill in the fields of a tm structure. This is how
I get ahold of the time zone (which is not known before this step). For
this DST edge case, the time zone is off by one hour :(

5) Copy the fields from the result of the call to localtime() into a new
tm structure, with zeros for time fields.

6) Call tm2timestamp(), which recalculates an internal UTC-based
timestamp value based on the current contents of the tm structure.

It *may* be possible to run through some of these steps *twice*, to
verify that the conversion to/from the tm structure including time zone
info gives a result on an even day boundary. But it sounds expensive.

Comments and suggestions are appreciated...

                       - Thomas

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Tom Lane
Дата:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>>>>> Looks to me like an off-by-one kind of problem in deciding which
>>>>> timezone applies to midnight of a transition day.

> 2) Convert to an integer "Unix system time".

> 3) Rotate by 12 hours (to UTC noon!). This is supposed to ensure that we
> stay in the correct day after conversion to local time *no matter what
> time zone we are actually in*, but is likely the problem in this edge
> case.

> 4) Call localtime() to fill in the fields of a tm structure. This is how
> I get ahold of the time zone (which is not known before this step). For
> this DST edge case, the time zone is off by one hour :(

> 5) Copy the fields from the result of the call to localtime() into a new
> tm structure, with zeros for time fields.

Seems like you could just skip step 3 and call localtime() with fields
indicating midnight of the specified date.  Then use the complete
localtime result (don't discard any fields) and you should be OK, no?

            regards, tom lane

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Thomas Lockhart
Дата:
> Seems like you could just skip step 3 and call localtime() with fields
> indicating midnight of the specified date.  Then use the complete
> localtime result (don't discard any fields) and you should be OK, no?

Pretty sure this won't work, since the complete localtime result will
not be local midnight, which is the expected result. The problem is that
I don't know what the actual time zone (and DST status) is until *after*
converting from UTC to local. And when the input type is "date" I have
no time zone context to use as a hint.

Rotating to UTC noon solves some problems since converting from UTC to
local time will always stay within the same local calendar day. But it
falls down on the DST transition days, as we've noticed.

I started playing with mktime(), which would seem to be the right thing,
but it isn't clear from my docs that this routine will work without the
time zone fields filled in. Well, no matter what the docs say it seems
to interact badly with my code and I get segfaults a few lines farther
along. I'm poking at it to track down the problem.

                     - Thomas

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Tom Lane
Дата:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Seems like you could just skip step 3 and call localtime() with fields
>> indicating midnight of the specified date.  Then use the complete
>> localtime result (don't discard any fields) and you should be OK, no?

> Pretty sure this won't work, since the complete localtime result will
> not be local midnight, which is the expected result.

Actually, now that I go back and reread the man pages, the correct
recipe is
    * fill a struct tm with y/m/d, h = m = s = 0, isdst = -1
    * call mktime() to produce a time_t
    * either use the updated struct tm (mktime() side effect),
          or convert the time_t directly to timestamp.

mktime() is an ANSI C requirement, so even though it's not as old as
localtime(), it's probably safe enough.

            regards, tom lane

Re: bug in date_part() function in 6.5.2, 7.0.2

От
Thomas Lockhart
Дата:
> Seems like you could just skip step 3 and call localtime() with fields
> indicating midnight of the specified date.  Then use the complete
> localtime result (don't discard any fields) and you should be OK, no?

OK, I have a solution which involves mktime(). As a side effect, I've
stripped some code, so date and date,time to timestamp conversions no
longer have to run through a complete conversion via a tm structure.
Things should be faster (unless mktime() is substantially slower than
the other support routines). btw, the date/timetz to timestamp
conversion routine has shrunk down to one line since it turns out all
required fields were already available :)

Regression tests pass (which I guess tells me that they need some added
cases, since they passed before too); will commit sometime soon.

                     - Thomas