Обсуждение: Support TZ format code in to_timestamp()
It's annoyed me for some time that to_timestamp() doesn't implement the TZ format code that to_char() has. I finally got motivated to do something about that after the complaint at [1] that jsonpath's datetime() method can't read typical JSON.stringify() output like "2023-05-22T03:09:37.825Z". We do already understand "Z" as a time zone abbreviation for UTC; we just need to get formatting.c to support this. Hence, the attached patch teaches to_timestamp() to read time zone abbreviations as well as HH and HH:MM numeric zone offsets when TZ is specified. (We need to accept HH and HH:MM to be sure that we can round-trip the output of to_char(), since its TZ output code will fall back to one of those if it does not know any abbreviation for the current zone.) You might reasonably say that we should make it read time zone names not only abbreviations. I tried to make that work, and realized that it'd create a complete mess because tzload() is so lax about what it will interpret as a POSIX-style timezone spec. I included an example in the test cases below: I think that to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS') should work and read just "EST" as the TZ value, allowing the "24" to be read as the SS value. But tzload() will happily eat all of "ESTFOO24" as a POSIX zone spec. We could conceivably refactor tzload() enough to match only tzdb zone names in this context. But I'm very hesitant to do that, for a few reasons: * it would make localtime.c diverge significantly from the upstream IANA source code; * we only need to support zone abbreviations to ensure we can round-trip the output of to_char(); * it's not clear to me that average users would understand why to_timestamp() accepts some but not all zone names that are accepted by the TimeZone GUC and timestamptz input. If we document it as taking only timezone abbreviations, that does correspond to a concept that's in the manual already. So I think that the attached represents a reasonable and useful compromise. I'll park this in the July commitfest. regards, tom lane [1] https://www.postgresql.org/message-id/flat/014A028B-5CE6-4FDF-AC24-426CA6FC9CEE%40mohiohio.com From 9ad1a8f707c601829a9292600f7b2e6fb1230ee8 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Tue, 13 Jun 2023 11:50:39 -0400 Subject: [PATCH v1] Support timezone abbreviations in to_timestamp(). This patch allows the TZ format code to be used in to_timestamp(). It will accept the same possibilities that to_char() can produce with the TZ format code, namely a zone abbreviation or a numeric zone offset in the form [+-]HH or [+-]HH:MM. While at it we may as well implement the OF format code too, since it corresponds exactly to the numeric zone offset case and indeed can share code. A conceivable extension to this would be to accept timezone names not just abbreviations. However, to_char() never outputs those, and there'd be a pretty serious garbage-input hazard because pg_tzset() will accept just about anything as a POSIX zone name. A useful side effect is that jsonpath's datetime() method will now accept the common-in-JSON format "yyyy-mm-ddThh:mm:ssZ", correctly interpreting the "Z" as signifying UTC time. We can reduce the number of format patterns that executeDateTimeMethod has to try, too. --- doc/src/sgml/func.sgml | 10 +- src/backend/utils/adt/datetime.c | 76 ++++++++++ src/backend/utils/adt/formatting.c | 151 ++++++++++++------- src/backend/utils/adt/jsonpath_exec.c | 18 +-- src/include/utils/datetime.h | 3 + src/test/regress/expected/horology.out | 70 ++++++++- src/test/regress/expected/jsonb_jsonpath.out | 12 ++ src/test/regress/sql/horology.sql | 18 ++- src/test/regress/sql/jsonb_jsonpath.sql | 2 + 9 files changed, 287 insertions(+), 73 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5a47ce4343..c5a966d92b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -8046,13 +8046,11 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); </row> <row> <entry><literal>TZ</literal></entry> - <entry>upper case time-zone abbreviation - (only supported in <function>to_char</function>)</entry> + <entry>upper case time-zone abbreviation</entry> </row> <row> <entry><literal>tz</literal></entry> - <entry>lower case time-zone abbreviation - (only supported in <function>to_char</function>)</entry> + <entry>lower case time-zone abbreviation</entry> </row> <row> <entry><literal>TZH</literal></entry> @@ -8064,8 +8062,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); </row> <row> <entry><literal>OF</literal></entry> - <entry>time-zone offset from UTC - (only supported in <function>to_char</function>)</entry> + <entry>time-zone offset from UTC (<replaceable>HH</replaceable> + or <replaceable>HH</replaceable><literal>:</literal><replaceable>MM</replaceable>)</entry> </row> </tbody> </tgroup> diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index 5d8d583ddc..19db97efc4 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -3246,6 +3246,82 @@ DecodeTimezoneNameToTz(const char *tzname) return result; } +/* DecodeTimezoneAbbrevPrefix() + * Interpret prefix of string as a timezone abbreviation, if possible. + * + * This has roughly the same functionality as DecodeTimezoneAbbrev(), + * but the API is adapted to the needs of formatting.c. Notably, + * we will match the longest possible prefix of the given string + * rather than insisting on a complete match, and downcasing is applied + * here rather than in the caller. + * + * Returns the length of the timezone abbreviation, or -1 if not recognized. + * On success, sets *offset to the GMT offset for the abbreviation if it + * is a fixed-offset abbreviation, or sets *tz to the pg_tz struct for + * a dynamic abbreviation. + */ +int +DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz) +{ + char lowtoken[TOKMAXLEN + 1]; + int len; + + *offset = 0; /* avoid uninitialized vars on failure */ + *tz = NULL; + + if (!zoneabbrevtbl) + return -1; /* no abbrevs known, so fail immediately */ + + /* Downcase as much of the string as we could need */ + for (len = 0; len < TOKMAXLEN; len++) + { + if (*str == '\0' || !isalpha((unsigned char) *str)) + break; + lowtoken[len] = pg_tolower((unsigned char) *str++); + } + lowtoken[len] = '\0'; + + /* + * We could avoid doing repeated binary searches if we cared to duplicate + * datebsearch here, but it's not clear that such an optimization would be + * worth the trouble. In common cases there's probably not anything after + * the zone abbrev anyway. So just search with successively truncated + * strings. + */ + while (len > 0) + { + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, + zoneabbrevtbl->numabbrevs); + + if (tp != NULL) + { + if (tp->type == DYNTZ) + { + DateTimeErrorExtra extra; + pg_tz *tzp = FetchDynamicTimeZone(zoneabbrevtbl, tp, + &extra); + + if (tzp != NULL) + { + /* Caller must resolve the abbrev's current meaning */ + *tz = tzp; + return len; + } + } + else + { + /* Fixed-offset zone abbrev, so it's easy */ + *offset = tp->value; + return len; + } + } + lowtoken[--len] = '\0'; + } + + /* Did not find a match */ + return -1; +} + /* ClearPgItmIn * diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index e6246dc44b..7c75e9fb87 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -418,14 +418,24 @@ typedef struct us, yysz, /* is it YY or YYYY ? */ clock, /* 12 or 24 hour clock? */ - tzsign, /* +1, -1 or 0 if timezone info is absent */ + tzsign, /* +1, -1, or 0 if no TZH/TZM fields */ tzh, tzm, ff; /* fractional precision */ + bool has_tz; /* was there a TZ field? */ + int gmtoffset; /* GMT offset of fixed-offset zone abbrev */ + pg_tz *tzp; /* pg_tz for dynamic abbrev */ + char *abbrev; /* dynamic abbrev */ } TmFromChar; #define ZERO_tmfc(_X) memset(_X, 0, sizeof(TmFromChar)) +struct fmt_tz /* do_to_timestamp's timezone info output */ +{ + bool has_tz; /* was there any TZ/TZH/TZM field? */ + int gmtoffset; /* GMT offset in seconds */ +}; + /* ---------- * Debug * ---------- @@ -1058,8 +1068,8 @@ static bool from_char_seq_search(int *dest, const char **src, char **localized_array, Oid collid, FormatNode *node, Node *escontext); static bool do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, - struct pg_tm *tm, fsec_t *fsec, int *fprec, - uint32 *flags, Node *escontext); + struct pg_tm *tm, fsec_t *fsec, struct fmt_tz *tz, + int *fprec, uint32 *flags, Node *escontext); static char *fill_str(char *str, int c, int max); static FormatNode *NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree); static char *int_to_roman(int number); @@ -3485,11 +3495,49 @@ DCH_from_char(FormatNode *node, const char *in, TmFromChar *out, break; case DCH_tz: case DCH_TZ: + { + int tzlen; + + tzlen = DecodeTimezoneAbbrevPrefix(s, + &out->gmtoffset, + &out->tzp); + if (tzlen > 0) + { + out->has_tz = true; + /* we only need the zone abbrev for DYNTZ case */ + if (out->tzp) + out->abbrev = pnstrdup(s, tzlen); + out->tzsign = 0; /* drop any earlier TZH/TZM info */ + s += tzlen; + break; + } + } + /* it doesn't match any abbrev, so parse it like OF */ + /* FALL THRU */ case DCH_OF: - ereturn(escontext,, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("formatting field \"%s\" is only supported in to_char", - n->key->name))); + /* OF is equivalent to TZH or TZH:TZM */ + /* see TZH comments below */ + if (*s == '+' || *s == '-' || *s == ' ') + { + out->tzsign = *s == '-' ? -1 : +1; + s++; + } + else + { + if (extra_skip > 0 && *(s - 1) == '-') + out->tzsign = -1; + else + out->tzsign = +1; + } + if (from_char_parse_int_len(&out->tzh, &s, 2, n, escontext) < 0) + return; + if (*s == ':') + { + s++; + if (from_char_parse_int_len(&out->tzm, &s, 2, n, + escontext) < 0) + return; + } break; case DCH_TZH: @@ -4185,22 +4233,16 @@ to_timestamp(PG_FUNCTION_ARGS) Timestamp result; int tz; struct pg_tm tm; + struct fmt_tz ftz; fsec_t fsec; int fprec; do_to_timestamp(date_txt, fmt, collid, false, - &tm, &fsec, &fprec, NULL, NULL); + &tm, &fsec, &ftz, &fprec, NULL, NULL); /* Use the specified time zone, if any. */ - if (tm.tm_zone) - { - DateTimeErrorExtra extra; - int dterr = DecodeTimezone(tm.tm_zone, &tz); - - if (dterr) - DateTimeParseError(dterr, &extra, text_to_cstring(date_txt), - "timestamptz", NULL); - } + if (ftz.has_tz) + tz = ftz.gmtoffset; else tz = DetermineTimeZoneOffset(&tm, session_timezone); @@ -4229,10 +4271,11 @@ to_date(PG_FUNCTION_ARGS) Oid collid = PG_GET_COLLATION(); DateADT result; struct pg_tm tm; + struct fmt_tz ftz; fsec_t fsec; do_to_timestamp(date_txt, fmt, collid, false, - &tm, &fsec, NULL, NULL, NULL); + &tm, &fsec, &ftz, NULL, NULL, NULL); /* Prevent overflow in Julian-day routines */ if (!IS_VALID_JULIAN(tm.tm_year, tm.tm_mon, tm.tm_mday)) @@ -4274,12 +4317,13 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, Node *escontext) { struct pg_tm tm; + struct fmt_tz ftz; fsec_t fsec; int fprec; uint32 flags; if (!do_to_timestamp(date_txt, fmt, collid, strict, - &tm, &fsec, &fprec, &flags, escontext)) + &tm, &fsec, &ftz, &fprec, &flags, escontext)) return (Datum) 0; *typmod = fprec ? fprec : -1; /* fractional part precision */ @@ -4292,18 +4336,9 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, { TimestampTz result; - if (tm.tm_zone) + if (ftz.has_tz) { - DateTimeErrorExtra extra; - int dterr = DecodeTimezone(tm.tm_zone, tz); - - if (dterr) - { - DateTimeParseError(dterr, &extra, - text_to_cstring(date_txt), - "timestamptz", escontext); - return (Datum) 0; - } + *tz = ftz.gmtoffset; } else { @@ -4384,18 +4419,9 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, { TimeTzADT *result = palloc(sizeof(TimeTzADT)); - if (tm.tm_zone) + if (ftz.has_tz) { - DateTimeErrorExtra extra; - int dterr = DecodeTimezone(tm.tm_zone, tz); - - if (dterr) - { - DateTimeParseError(dterr, &extra, - text_to_cstring(date_txt), - "timetz", escontext); - return (Datum) 0; - } + *tz = ftz.gmtoffset; } else { @@ -4448,7 +4474,7 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, * do_to_timestamp: shared code for to_timestamp and to_date * * Parse the 'date_txt' according to 'fmt', return results as a struct pg_tm, - * fractional seconds, and fractional precision. + * fractional seconds, struct fmt_tz, and fractional precision. * * 'collid' identifies the collation to use, if needed. * 'std' specifies standard parsing mode. @@ -4465,12 +4491,12 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, * 'date_txt'. * * The TmFromChar is then analysed and converted into the final results in - * struct 'tm', 'fsec', and 'fprec'. + * struct 'tm', 'fsec', struct 'tz', and 'fprec'. */ static bool do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, - struct pg_tm *tm, fsec_t *fsec, int *fprec, - uint32 *flags, Node *escontext) + struct pg_tm *tm, fsec_t *fsec, struct fmt_tz *tz, + int *fprec, uint32 *flags, Node *escontext) { FormatNode *format = NULL; TmFromChar tmfc; @@ -4487,6 +4513,7 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, ZERO_tmfc(&tmfc); ZERO_tm(tm); *fsec = 0; + tz->has_tz = false; if (fprec) *fprec = 0; if (flags) @@ -4762,11 +4789,14 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, goto fail; } - /* Save parsed time-zone into tm->tm_zone if it was specified */ + /* + * If timezone info was present, reduce it to a GMT offset. (We cannot do + * this until we've filled all of the tm struct, since the zone's offset + * might be time-varying.) + */ if (tmfc.tzsign) { - char *tz; - + /* TZH and/or TZM fields */ if (tmfc.tzh < 0 || tmfc.tzh > MAX_TZDISP_HOUR || tmfc.tzm < 0 || tmfc.tzm >= MINS_PER_HOUR) { @@ -4775,10 +4805,27 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, goto fail; } - tz = psprintf("%c%02d:%02d", - tmfc.tzsign > 0 ? '+' : '-', tmfc.tzh, tmfc.tzm); - - tm->tm_zone = tz; + tz->has_tz = true; + tz->gmtoffset = (tmfc.tzh * MINS_PER_HOUR + tmfc.tzm) * SECS_PER_MINUTE; + /* note we are flipping the sign convention here */ + if (tmfc.tzsign > 0) + tz->gmtoffset = -tz->gmtoffset; + } + else if (tmfc.has_tz) + { + /* TZ field */ + tz->has_tz = true; + if (tmfc.tzp == NULL) + { + /* fixed-offset abbreviation; flip the sign convention */ + tz->gmtoffset = -tmfc.gmtoffset; + } + else + { + /* dynamic-offset abbreviation, resolve using specified time */ + tz->gmtoffset = DetermineTimeZoneAbbrevOffset(tm, tmfc.abbrev, + tmfc.tzp); + } } DEBUG_TM(tm); diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c index 2d0599b4aa..963012fc49 100644 --- a/src/backend/utils/adt/jsonpath_exec.c +++ b/src/backend/utils/adt/jsonpath_exec.c @@ -1846,20 +1846,14 @@ executeDateTimeMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, static const char *fmt_str[] = { "yyyy-mm-dd", /* date */ - "HH24:MI:SS.USTZH:TZM", /* timetz */ - "HH24:MI:SS.USTZH", - "HH24:MI:SSTZH:TZM", - "HH24:MI:SSTZH", + "HH24:MI:SS.USTZ", /* timetz */ + "HH24:MI:SSTZ", "HH24:MI:SS.US", /* time without tz */ "HH24:MI:SS", - "yyyy-mm-dd HH24:MI:SS.USTZH:TZM", /* timestamptz */ - "yyyy-mm-dd HH24:MI:SS.USTZH", - "yyyy-mm-dd HH24:MI:SSTZH:TZM", - "yyyy-mm-dd HH24:MI:SSTZH", - "yyyy-mm-dd\"T\"HH24:MI:SS.USTZH:TZM", - "yyyy-mm-dd\"T\"HH24:MI:SS.USTZH", - "yyyy-mm-dd\"T\"HH24:MI:SSTZH:TZM", - "yyyy-mm-dd\"T\"HH24:MI:SSTZH", + "yyyy-mm-dd HH24:MI:SS.USTZ", /* timestamptz */ + "yyyy-mm-dd HH24:MI:SSTZ", + "yyyy-mm-dd\"T\"HH24:MI:SS.USTZ", + "yyyy-mm-dd\"T\"HH24:MI:SSTZ", "yyyy-mm-dd HH24:MI:SS.US", /* timestamp without tz */ "yyyy-mm-dd HH24:MI:SS", "yyyy-mm-dd\"T\"HH24:MI:SS.US", diff --git a/src/include/utils/datetime.h b/src/include/utils/datetime.h index a871e3223d..fa95bcded5 100644 --- a/src/include/utils/datetime.h +++ b/src/include/utils/datetime.h @@ -348,6 +348,9 @@ extern int DecodeUnits(int field, const char *lowtoken, int *val); extern int DecodeTimezoneName(const char *tzname, int *offset, pg_tz **tz); extern pg_tz *DecodeTimezoneNameToTz(const char *tzname); +extern int DecodeTimezoneAbbrevPrefix(const char *str, + int *offset, pg_tz **tz); + extern int j2day(int date); extern struct Node *TemporalSimplify(int32 max_precis, struct Node *node); diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out index e63e5b30fe..d97e6ee39e 100644 --- a/src/test/regress/expected/horology.out +++ b/src/test/regress/expected/horology.out @@ -3063,8 +3063,54 @@ SELECT to_timestamp('2011-12-18 11:38 20', 'YYYY-MM-DD HH12:MI TZM'); Sun Dec 18 03:18:00 2011 PST (1 row) -SELECT to_timestamp('2011-12-18 11:38 PST', 'YYYY-MM-DD HH12:MI TZ'); -- NYI -ERROR: formatting field "TZ" is only supported in to_char +SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ'); + to_timestamp +------------------------------ + Sun Dec 18 02:08:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ'); -- dyntz + to_timestamp +------------------------------ + Sat Dec 17 23:38:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:24 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:24 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI OF'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI OF'); + to_timestamp +------------------------------ + Sun Dec 18 02:08:00 2011 PST +(1 row) + SELECT to_timestamp('2018-11-02 12:34:56.025', 'YYYY-MM-DD HH24:MI:SS.MS'); to_timestamp ---------------------------------- @@ -3480,6 +3526,19 @@ SELECT to_date('0000-02-01','YYYY-MM-DD'); -- allowed, though it shouldn't be 02-01-0001 BC (1 row) +-- to_char's TZ format code produces zone abbrev if known +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); + to_char +------------------------- + 2012-12-12 12:00:00 PST +(1 row) + +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS tz'); + to_char +------------------------- + 2012-12-12 12:00:00 pst +(1 row) + -- -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572) -- @@ -3521,4 +3580,11 @@ SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD SSSSS'); 2012-12-12 43200 (1 row) +SET TIME ZONE '+2'; +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); + to_char +------------------------- + 2012-12-12 12:00:00 +02 +(1 row) + RESET TIME ZONE; diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out index 6659bc9091..783a6f51bf 100644 --- a/src/test/regress/expected/jsonb_jsonpath.out +++ b/src/test/regress/expected/jsonb_jsonpath.out @@ -1935,6 +1935,18 @@ select jsonb_path_query('"2017-03-10T12:34:56.789+3:10"', '$.datetime()'); select jsonb_path_query('"2017-03-10t12:34:56.789+3:10"', '$.datetime()'); ERROR: datetime format is not recognized: "2017-03-10t12:34:56.789+3:10" HINT: Use a datetime template argument to specify the input data format. +select jsonb_path_query('"2017-03-10T12:34:56.789EST"', '$.datetime()'); + jsonb_path_query +--------------------------------- + "2017-03-10T12:34:56.789-05:00" +(1 row) + +select jsonb_path_query('"2017-03-10T12:34:56.789Z"', '$.datetime()'); + jsonb_path_query +--------------------------------- + "2017-03-10T12:34:56.789+00:00" +(1 row) + select jsonb_path_query('"12:34:56"', '$.datetime().type()'); jsonb_path_query -------------------------- diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index f7f8c8d2dd..32fe9475fa 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -498,7 +498,15 @@ SELECT to_timestamp('2011-12-18 11:38 +05:20', 'YYYY-MM-DD HH12:MI TZH:TZM'); SELECT to_timestamp('2011-12-18 11:38 -05:20', 'YYYY-MM-DD HH12:MI TZH:TZM'); SELECT to_timestamp('2011-12-18 11:38 20', 'YYYY-MM-DD HH12:MI TZM'); -SELECT to_timestamp('2011-12-18 11:38 PST', 'YYYY-MM-DD HH12:MI TZ'); -- NYI +SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ'); +SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ'); +SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ'); +SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ'); -- dyntz +SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); +SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); + +SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI OF'); +SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI OF'); SELECT to_timestamp('2018-11-02 12:34:56.025', 'YYYY-MM-DD HH24:MI:SS.MS'); @@ -613,6 +621,10 @@ SELECT to_date('2016 366', 'YYYY DDD'); -- ok SELECT to_date('2016 367', 'YYYY DDD'); SELECT to_date('0000-02-01','YYYY-MM-DD'); -- allowed, though it shouldn't be +-- to_char's TZ format code produces zone abbrev if known +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS tz'); + -- -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572) -- @@ -629,4 +641,8 @@ SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD SSSS'); SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD SSSSS'); +SET TIME ZONE '+2'; + +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); + RESET TIME ZONE; diff --git a/src/test/regress/sql/jsonb_jsonpath.sql b/src/test/regress/sql/jsonb_jsonpath.sql index e0ce509264..e9bd94b034 100644 --- a/src/test/regress/sql/jsonb_jsonpath.sql +++ b/src/test/regress/sql/jsonb_jsonpath.sql @@ -417,6 +417,8 @@ select jsonb_path_query('"2017-03-10t12:34:56+3:10"', '$.datetime()'); select jsonb_path_query('"2017-03-10 12:34:56.789+3:10"', '$.datetime()'); select jsonb_path_query('"2017-03-10T12:34:56.789+3:10"', '$.datetime()'); select jsonb_path_query('"2017-03-10t12:34:56.789+3:10"', '$.datetime()'); +select jsonb_path_query('"2017-03-10T12:34:56.789EST"', '$.datetime()'); +select jsonb_path_query('"2017-03-10T12:34:56.789Z"', '$.datetime()'); select jsonb_path_query('"12:34:56"', '$.datetime().type()'); select jsonb_path_query('"12:34:56"', '$.datetime()'); select jsonb_path_query('"12:34:56+3"', '$.datetime().type()'); -- 2.39.3
On Tue, Jun 13, 2023 at 12:20:42PM -0400, Tom Lane wrote: > It's annoyed me for some time that to_timestamp() doesn't implement > the TZ format code that to_char() has. I finally got motivated to > do something about that after the complaint at [1] that jsonpath's > datetime() method can't read typical JSON.stringify() output like > "2023-05-22T03:09:37.825Z". We do already understand "Z" as a > time zone abbreviation for UTC; we just need to get formatting.c > to support this. I have to admit I tend to prefer actual time zone names like "America/New_York" over acronyms or offset values. However, I can see the dump/restore problem with such names. Parenthetically, I often use airport codes that map to time zones in my own calendar. I would argue that on a global scale airport codes are actually more useful than abbreviations like EST, assuming you don't need to designate whether daylight saving time was active, e.g. EST vs. EDT. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On 6/21/23 20:07, Bruce Momjian wrote: > On Tue, Jun 13, 2023 at 12:20:42PM -0400, Tom Lane wrote: >> It's annoyed me for some time that to_timestamp() doesn't implement >> the TZ format code that to_char() has. I finally got motivated to >> do something about that after the complaint at [1] that jsonpath's >> datetime() method can't read typical JSON.stringify() output like >> "2023-05-22T03:09:37.825Z". We do already understand "Z" as a >> time zone abbreviation for UTC; we just need to get formatting.c >> to support this. > > I have to admit I tend to prefer actual time zone names like > "America/New_York" over acronyms or offset values. However, I can see > the dump/restore problem with such names. I think the abbreviations are worse than useless -- dangerously misleading even. I was converting a timestamp I had pulled from the internet the other day in IST (India Standard Time) using Postres to test some new code I was working on. I got a rather surprising result so changed it to Asia/Kolkata and got what I expected. Turns out IST is *also* Israel Standard Time and Irish Standard Time. I think Postres gave me the conversion in Irish time. At any rate, it was not offset by 30 minutes which was the dead giveaway. Offsets are fine when you just need an absolute date to feed into something like recovery and it doesn't much matter what timezone you were in. Z and UTC also seem fine since they are unambiguous as far as I know. Regards, -David
2024-01 Commitfest. Hi, this patch was marked in CF as "Needs Review" [1], but there has been no activity on this thread for 7+ months. If nothing more is planned for this thread then it will be closed ("Returned with feedback") at the end of this CF. ====== [1] https://commitfest.postgresql.org/46/4362/ Kind Regards, Peter Smith.
Peter Smith <smithpb2250@gmail.com> writes: > Hi, this patch was marked in CF as "Needs Review" [1], but there has > been no activity on this thread for 7+ months. > If nothing more is planned for this thread then it will be closed > ("Returned with feedback") at the end of this CF. I still think it would be a good idea, but I can't deny the lack of other interest in it. Unless someone steps up to review, let's close it. (FTR, I don't agree with David's objections to the entire concept of zone abbreviations. We're not going to remove support for them everywhere else, so why shouldn't to_timestamp() handle them?) regards, tom lane
Hi, > > Hi, this patch was marked in CF as "Needs Review" [1], but there has > > been no activity on this thread for 7+ months. > > If nothing more is planned for this thread then it will be closed > > ("Returned with feedback") at the end of this CF. > > I still think it would be a good idea, but I can't deny the lack > of other interest in it. Unless someone steps up to review, > let's close it. I agree that it would be a good idea, and again I would like to condemn the approach "since no one reviews it we are going to reject it". A friendly reminder like "hey, this patch was waiting long enough, maybe someone could take a look" would be more appropriate IMO. I remember during previous commitfests some CF managers created a list of patches that could use more attention. That was useful. I will review the patch, but probably only tomorrow. -- Best regards, Aleksander Alekseev
> On 22 Jan 2024, at 03:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I still think it would be a good idea, but I can't deny the lack > of other interest in it. Unless someone steps up to review, > let's close it. Since I had this on my (ever-growing) TODO I re-prioritized today and took a look at it since I think it's something we should support. Nothing really sticks out and I was unable to poke any holes so I don't have too much more to offer than a LGTM. + while (len > 0) + { + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, + zoneabbrevtbl->numabbrevs); My immediate reaction was that we should stop at prefix lengths of two since I could only think of abbreviations of two or more. Googling and reading found that there are indeed one-letter timezones (Alpha, Bravo etc..). Not sure if it's worth mentioning that in the comment to help other readers who aren't neck deep in timezones? + /* FALL THRU */ Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall through" a few cases up in the same switch. While we are quite inconsistent across the tree, consistency within a file is preferrable (regardless of which). -- Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes: > On 22 Jan 2024, at 03:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: > + while (len > 0) > + { > + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, > + zoneabbrevtbl->numabbrevs); > My immediate reaction was that we should stop at prefix lengths of two since I > could only think of abbreviations of two or more. Googling and reading found > that there are indeed one-letter timezones (Alpha, Bravo etc..). Not sure if > it's worth mentioning that in the comment to help other readers who aren't neck > deep in timezones? The one I usually think of is "Z" for UTC; I wasn't actually aware that there were any other single-letter abbrevs. But in any case I don't see a reason for this code to be making such assumptions. > + /* FALL THRU */ > Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall > through" a few cases up in the same switch. While we are quite inconsistent > across the tree, consistency within a file is preferrable (regardless of > which). Fair. I tend to shorten it, but I failed to notice that there was nearby precedent for the other way. regards, tom lane
Hi, > Since I had this on my (ever-growing) TODO I re-prioritized today and took a > look at it since I think it's something we should support. > > Nothing really sticks out and I was unable to poke any holes so I don't have > too much more to offer than a LGTM. > > + while (len > 0) > + { > + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, > + zoneabbrevtbl->numabbrevs); > > My immediate reaction was that we should stop at prefix lengths of two since I > could only think of abbreviations of two or more. Googling and reading found > that there are indeed one-letter timezones (Alpha, Bravo etc..). Not sure if > it's worth mentioning that in the comment to help other readers who aren't neck > deep in timezones? > > + /* FALL THRU */ > > Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall > through" a few cases up in the same switch. While we are quite inconsistent > across the tree, consistency within a file is preferrable (regardless of > which). I reviewed the patch and tested it on MacOS and generally concur with stated above. The only nitpick I have is the apparent lack of negative tests for to_timestamp(), e.g. when the string doesn't match the specified format. -- Best regards, Aleksander Alekseev
Aleksander Alekseev <aleksander@timescale.com> writes: > I reviewed the patch and tested it on MacOS and generally concur with > stated above. The only nitpick I have is the apparent lack of negative > tests for to_timestamp(), e.g. when the string doesn't match the > specified format. That's an excellent suggestion indeed, because when I tried SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ'); -- error I got ERROR: invalid value "JU" for "TZ" DETAIL: Value must be an integer. which seems pretty off-point. In the attached I made it give an error message about a bad zone abbreviation if the input starts with a letter, but perhaps the dividing line between "probably meant as a zone name" and "probably meant as numeric" should be drawn differently? Anyway, v2-0001 below is the previous patch rebased up to current (only line numbers change), and then v2-0002 responds to your and Daniel's review comments. regards, tom lane From 8361441cda3343e0f9286bb3b51ccd3041b46807 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Tue, 23 Jan 2024 16:58:19 -0500 Subject: [PATCH v2 1/2] Support timezone abbreviations in to_timestamp(). This patch allows the TZ format code to be used in to_timestamp(). It will accept the same possibilities that to_char() can produce with the TZ format code, namely a zone abbreviation or a numeric zone offset in the form [+-]HH or [+-]HH:MM. While at it we may as well implement the OF format code too, since it corresponds exactly to the numeric zone offset case and indeed can share code. A conceivable extension to this would be to accept timezone names not just abbreviations. However, to_char() never outputs those, and there'd be a pretty serious garbage-input hazard because pg_tzset() will accept just about anything as a POSIX zone name. A useful side effect is that jsonpath's datetime() method will now accept the common-in-JSON format "yyyy-mm-ddThh:mm:ssZ", correctly interpreting the "Z" as signifying UTC time. We can reduce the number of format patterns that executeDateTimeMethod has to try, too. --- doc/src/sgml/func.sgml | 10 +- src/backend/utils/adt/datetime.c | 76 ++++++++++ src/backend/utils/adt/formatting.c | 151 ++++++++++++------- src/backend/utils/adt/jsonpath_exec.c | 18 +-- src/include/utils/datetime.h | 3 + src/test/regress/expected/horology.out | 70 ++++++++- src/test/regress/expected/jsonb_jsonpath.out | 12 ++ src/test/regress/sql/horology.sql | 18 ++- src/test/regress/sql/jsonb_jsonpath.sql | 2 + 9 files changed, 287 insertions(+), 73 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 210c7c0b02..ad965432f3 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -8131,13 +8131,11 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); </row> <row> <entry><literal>TZ</literal></entry> - <entry>upper case time-zone abbreviation - (only supported in <function>to_char</function>)</entry> + <entry>upper case time-zone abbreviation</entry> </row> <row> <entry><literal>tz</literal></entry> - <entry>lower case time-zone abbreviation - (only supported in <function>to_char</function>)</entry> + <entry>lower case time-zone abbreviation</entry> </row> <row> <entry><literal>TZH</literal></entry> @@ -8149,8 +8147,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); </row> <row> <entry><literal>OF</literal></entry> - <entry>time-zone offset from UTC - (only supported in <function>to_char</function>)</entry> + <entry>time-zone offset from UTC (<replaceable>HH</replaceable> + or <replaceable>HH</replaceable><literal>:</literal><replaceable>MM</replaceable>)</entry> </row> </tbody> </tgroup> diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index 17b0248bf7..cccabb0c2a 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -3246,6 +3246,82 @@ DecodeTimezoneNameToTz(const char *tzname) return result; } +/* DecodeTimezoneAbbrevPrefix() + * Interpret prefix of string as a timezone abbreviation, if possible. + * + * This has roughly the same functionality as DecodeTimezoneAbbrev(), + * but the API is adapted to the needs of formatting.c. Notably, + * we will match the longest possible prefix of the given string + * rather than insisting on a complete match, and downcasing is applied + * here rather than in the caller. + * + * Returns the length of the timezone abbreviation, or -1 if not recognized. + * On success, sets *offset to the GMT offset for the abbreviation if it + * is a fixed-offset abbreviation, or sets *tz to the pg_tz struct for + * a dynamic abbreviation. + */ +int +DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz) +{ + char lowtoken[TOKMAXLEN + 1]; + int len; + + *offset = 0; /* avoid uninitialized vars on failure */ + *tz = NULL; + + if (!zoneabbrevtbl) + return -1; /* no abbrevs known, so fail immediately */ + + /* Downcase as much of the string as we could need */ + for (len = 0; len < TOKMAXLEN; len++) + { + if (*str == '\0' || !isalpha((unsigned char) *str)) + break; + lowtoken[len] = pg_tolower((unsigned char) *str++); + } + lowtoken[len] = '\0'; + + /* + * We could avoid doing repeated binary searches if we cared to duplicate + * datebsearch here, but it's not clear that such an optimization would be + * worth the trouble. In common cases there's probably not anything after + * the zone abbrev anyway. So just search with successively truncated + * strings. + */ + while (len > 0) + { + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, + zoneabbrevtbl->numabbrevs); + + if (tp != NULL) + { + if (tp->type == DYNTZ) + { + DateTimeErrorExtra extra; + pg_tz *tzp = FetchDynamicTimeZone(zoneabbrevtbl, tp, + &extra); + + if (tzp != NULL) + { + /* Caller must resolve the abbrev's current meaning */ + *tz = tzp; + return len; + } + } + else + { + /* Fixed-offset zone abbrev, so it's easy */ + *offset = tp->value; + return len; + } + } + lowtoken[--len] = '\0'; + } + + /* Did not find a match */ + return -1; +} + /* ClearPgItmIn * diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index 83e1f1265c..8859d174c2 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -418,14 +418,24 @@ typedef struct us, yysz, /* is it YY or YYYY ? */ clock, /* 12 or 24 hour clock? */ - tzsign, /* +1, -1 or 0 if timezone info is absent */ + tzsign, /* +1, -1, or 0 if no TZH/TZM fields */ tzh, tzm, ff; /* fractional precision */ + bool has_tz; /* was there a TZ field? */ + int gmtoffset; /* GMT offset of fixed-offset zone abbrev */ + pg_tz *tzp; /* pg_tz for dynamic abbrev */ + char *abbrev; /* dynamic abbrev */ } TmFromChar; #define ZERO_tmfc(_X) memset(_X, 0, sizeof(TmFromChar)) +struct fmt_tz /* do_to_timestamp's timezone info output */ +{ + bool has_tz; /* was there any TZ/TZH/TZM field? */ + int gmtoffset; /* GMT offset in seconds */ +}; + /* ---------- * Debug * ---------- @@ -1058,8 +1068,8 @@ static bool from_char_seq_search(int *dest, const char **src, char **localized_array, Oid collid, FormatNode *node, Node *escontext); static bool do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, - struct pg_tm *tm, fsec_t *fsec, int *fprec, - uint32 *flags, Node *escontext); + struct pg_tm *tm, fsec_t *fsec, struct fmt_tz *tz, + int *fprec, uint32 *flags, Node *escontext); static char *fill_str(char *str, int c, int max); static FormatNode *NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree); static char *int_to_roman(int number); @@ -3467,11 +3477,49 @@ DCH_from_char(FormatNode *node, const char *in, TmFromChar *out, break; case DCH_tz: case DCH_TZ: + { + int tzlen; + + tzlen = DecodeTimezoneAbbrevPrefix(s, + &out->gmtoffset, + &out->tzp); + if (tzlen > 0) + { + out->has_tz = true; + /* we only need the zone abbrev for DYNTZ case */ + if (out->tzp) + out->abbrev = pnstrdup(s, tzlen); + out->tzsign = 0; /* drop any earlier TZH/TZM info */ + s += tzlen; + break; + } + } + /* it doesn't match any abbrev, so parse it like OF */ + /* FALL THRU */ case DCH_OF: - ereturn(escontext,, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("formatting field \"%s\" is only supported in to_char", - n->key->name))); + /* OF is equivalent to TZH or TZH:TZM */ + /* see TZH comments below */ + if (*s == '+' || *s == '-' || *s == ' ') + { + out->tzsign = *s == '-' ? -1 : +1; + s++; + } + else + { + if (extra_skip > 0 && *(s - 1) == '-') + out->tzsign = -1; + else + out->tzsign = +1; + } + if (from_char_parse_int_len(&out->tzh, &s, 2, n, escontext) < 0) + return; + if (*s == ':') + { + s++; + if (from_char_parse_int_len(&out->tzm, &s, 2, n, + escontext) < 0) + return; + } break; case DCH_TZH: @@ -4167,22 +4215,16 @@ to_timestamp(PG_FUNCTION_ARGS) Timestamp result; int tz; struct pg_tm tm; + struct fmt_tz ftz; fsec_t fsec; int fprec; do_to_timestamp(date_txt, fmt, collid, false, - &tm, &fsec, &fprec, NULL, NULL); + &tm, &fsec, &ftz, &fprec, NULL, NULL); /* Use the specified time zone, if any. */ - if (tm.tm_zone) - { - DateTimeErrorExtra extra; - int dterr = DecodeTimezone(tm.tm_zone, &tz); - - if (dterr) - DateTimeParseError(dterr, &extra, text_to_cstring(date_txt), - "timestamptz", NULL); - } + if (ftz.has_tz) + tz = ftz.gmtoffset; else tz = DetermineTimeZoneOffset(&tm, session_timezone); @@ -4211,10 +4253,11 @@ to_date(PG_FUNCTION_ARGS) Oid collid = PG_GET_COLLATION(); DateADT result; struct pg_tm tm; + struct fmt_tz ftz; fsec_t fsec; do_to_timestamp(date_txt, fmt, collid, false, - &tm, &fsec, NULL, NULL, NULL); + &tm, &fsec, &ftz, NULL, NULL, NULL); /* Prevent overflow in Julian-day routines */ if (!IS_VALID_JULIAN(tm.tm_year, tm.tm_mon, tm.tm_mday)) @@ -4256,12 +4299,13 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, Node *escontext) { struct pg_tm tm; + struct fmt_tz ftz; fsec_t fsec; int fprec; uint32 flags; if (!do_to_timestamp(date_txt, fmt, collid, strict, - &tm, &fsec, &fprec, &flags, escontext)) + &tm, &fsec, &ftz, &fprec, &flags, escontext)) return (Datum) 0; *typmod = fprec ? fprec : -1; /* fractional part precision */ @@ -4274,18 +4318,9 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, { TimestampTz result; - if (tm.tm_zone) + if (ftz.has_tz) { - DateTimeErrorExtra extra; - int dterr = DecodeTimezone(tm.tm_zone, tz); - - if (dterr) - { - DateTimeParseError(dterr, &extra, - text_to_cstring(date_txt), - "timestamptz", escontext); - return (Datum) 0; - } + *tz = ftz.gmtoffset; } else { @@ -4366,18 +4401,9 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, { TimeTzADT *result = palloc(sizeof(TimeTzADT)); - if (tm.tm_zone) + if (ftz.has_tz) { - DateTimeErrorExtra extra; - int dterr = DecodeTimezone(tm.tm_zone, tz); - - if (dterr) - { - DateTimeParseError(dterr, &extra, - text_to_cstring(date_txt), - "timetz", escontext); - return (Datum) 0; - } + *tz = ftz.gmtoffset; } else { @@ -4430,7 +4456,7 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, * do_to_timestamp: shared code for to_timestamp and to_date * * Parse the 'date_txt' according to 'fmt', return results as a struct pg_tm, - * fractional seconds, and fractional precision. + * fractional seconds, struct fmt_tz, and fractional precision. * * 'collid' identifies the collation to use, if needed. * 'std' specifies standard parsing mode. @@ -4447,12 +4473,12 @@ parse_datetime(text *date_txt, text *fmt, Oid collid, bool strict, * 'date_txt'. * * The TmFromChar is then analysed and converted into the final results in - * struct 'tm', 'fsec', and 'fprec'. + * struct 'tm', 'fsec', struct 'tz', and 'fprec'. */ static bool do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, - struct pg_tm *tm, fsec_t *fsec, int *fprec, - uint32 *flags, Node *escontext) + struct pg_tm *tm, fsec_t *fsec, struct fmt_tz *tz, + int *fprec, uint32 *flags, Node *escontext) { FormatNode *format = NULL; TmFromChar tmfc; @@ -4469,6 +4495,7 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, ZERO_tmfc(&tmfc); ZERO_tm(tm); *fsec = 0; + tz->has_tz = false; if (fprec) *fprec = 0; if (flags) @@ -4744,11 +4771,14 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, goto fail; } - /* Save parsed time-zone into tm->tm_zone if it was specified */ + /* + * If timezone info was present, reduce it to a GMT offset. (We cannot do + * this until we've filled all of the tm struct, since the zone's offset + * might be time-varying.) + */ if (tmfc.tzsign) { - char *tz; - + /* TZH and/or TZM fields */ if (tmfc.tzh < 0 || tmfc.tzh > MAX_TZDISP_HOUR || tmfc.tzm < 0 || tmfc.tzm >= MINS_PER_HOUR) { @@ -4757,10 +4787,27 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, goto fail; } - tz = psprintf("%c%02d:%02d", - tmfc.tzsign > 0 ? '+' : '-', tmfc.tzh, tmfc.tzm); - - tm->tm_zone = tz; + tz->has_tz = true; + tz->gmtoffset = (tmfc.tzh * MINS_PER_HOUR + tmfc.tzm) * SECS_PER_MINUTE; + /* note we are flipping the sign convention here */ + if (tmfc.tzsign > 0) + tz->gmtoffset = -tz->gmtoffset; + } + else if (tmfc.has_tz) + { + /* TZ field */ + tz->has_tz = true; + if (tmfc.tzp == NULL) + { + /* fixed-offset abbreviation; flip the sign convention */ + tz->gmtoffset = -tmfc.gmtoffset; + } + else + { + /* dynamic-offset abbreviation, resolve using specified time */ + tz->gmtoffset = DetermineTimeZoneAbbrevOffset(tm, tmfc.abbrev, + tmfc.tzp); + } } DEBUG_TM(tm); diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c index ac16f5c85d..1db1ffe3c8 100644 --- a/src/backend/utils/adt/jsonpath_exec.c +++ b/src/backend/utils/adt/jsonpath_exec.c @@ -1846,20 +1846,14 @@ executeDateTimeMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, static const char *fmt_str[] = { "yyyy-mm-dd", /* date */ - "HH24:MI:SS.USTZH:TZM", /* timetz */ - "HH24:MI:SS.USTZH", - "HH24:MI:SSTZH:TZM", - "HH24:MI:SSTZH", + "HH24:MI:SS.USTZ", /* timetz */ + "HH24:MI:SSTZ", "HH24:MI:SS.US", /* time without tz */ "HH24:MI:SS", - "yyyy-mm-dd HH24:MI:SS.USTZH:TZM", /* timestamptz */ - "yyyy-mm-dd HH24:MI:SS.USTZH", - "yyyy-mm-dd HH24:MI:SSTZH:TZM", - "yyyy-mm-dd HH24:MI:SSTZH", - "yyyy-mm-dd\"T\"HH24:MI:SS.USTZH:TZM", - "yyyy-mm-dd\"T\"HH24:MI:SS.USTZH", - "yyyy-mm-dd\"T\"HH24:MI:SSTZH:TZM", - "yyyy-mm-dd\"T\"HH24:MI:SSTZH", + "yyyy-mm-dd HH24:MI:SS.USTZ", /* timestamptz */ + "yyyy-mm-dd HH24:MI:SSTZ", + "yyyy-mm-dd\"T\"HH24:MI:SS.USTZ", + "yyyy-mm-dd\"T\"HH24:MI:SSTZ", "yyyy-mm-dd HH24:MI:SS.US", /* timestamp without tz */ "yyyy-mm-dd HH24:MI:SS", "yyyy-mm-dd\"T\"HH24:MI:SS.US", diff --git a/src/include/utils/datetime.h b/src/include/utils/datetime.h index 460c75cfdd..e4ac2b8e7f 100644 --- a/src/include/utils/datetime.h +++ b/src/include/utils/datetime.h @@ -348,6 +348,9 @@ extern int DecodeUnits(int field, const char *lowtoken, int *val); extern int DecodeTimezoneName(const char *tzname, int *offset, pg_tz **tz); extern pg_tz *DecodeTimezoneNameToTz(const char *tzname); +extern int DecodeTimezoneAbbrevPrefix(const char *str, + int *offset, pg_tz **tz); + extern int j2day(int date); extern struct Node *TemporalSimplify(int32 max_precis, struct Node *node); diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out index cfb4b205e4..142a2109ba 100644 --- a/src/test/regress/expected/horology.out +++ b/src/test/regress/expected/horology.out @@ -3140,8 +3140,54 @@ SELECT to_timestamp('2011-12-18 11:38 20', 'YYYY-MM-DD HH12:MI TZM'); Sun Dec 18 03:18:00 2011 PST (1 row) -SELECT to_timestamp('2011-12-18 11:38 PST', 'YYYY-MM-DD HH12:MI TZ'); -- NYI -ERROR: formatting field "TZ" is only supported in to_char +SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ'); + to_timestamp +------------------------------ + Sun Dec 18 02:08:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ'); -- dyntz + to_timestamp +------------------------------ + Sat Dec 17 23:38:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:24 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:24 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI OF'); + to_timestamp +------------------------------ + Sun Dec 18 08:38:00 2011 PST +(1 row) + +SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI OF'); + to_timestamp +------------------------------ + Sun Dec 18 02:08:00 2011 PST +(1 row) + SELECT to_timestamp('2018-11-02 12:34:56.025', 'YYYY-MM-DD HH24:MI:SS.MS'); to_timestamp ---------------------------------- @@ -3557,6 +3603,19 @@ SELECT to_date('0000-02-01','YYYY-MM-DD'); -- allowed, though it shouldn't be 02-01-0001 BC (1 row) +-- to_char's TZ format code produces zone abbrev if known +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); + to_char +------------------------- + 2012-12-12 12:00:00 PST +(1 row) + +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS tz'); + to_char +------------------------- + 2012-12-12 12:00:00 pst +(1 row) + -- -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572) -- @@ -3598,4 +3657,11 @@ SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD SSSSS'); 2012-12-12 43200 (1 row) +SET TIME ZONE '+2'; +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); + to_char +------------------------- + 2012-12-12 12:00:00 +02 +(1 row) + RESET TIME ZONE; diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out index 6659bc9091..783a6f51bf 100644 --- a/src/test/regress/expected/jsonb_jsonpath.out +++ b/src/test/regress/expected/jsonb_jsonpath.out @@ -1935,6 +1935,18 @@ select jsonb_path_query('"2017-03-10T12:34:56.789+3:10"', '$.datetime()'); select jsonb_path_query('"2017-03-10t12:34:56.789+3:10"', '$.datetime()'); ERROR: datetime format is not recognized: "2017-03-10t12:34:56.789+3:10" HINT: Use a datetime template argument to specify the input data format. +select jsonb_path_query('"2017-03-10T12:34:56.789EST"', '$.datetime()'); + jsonb_path_query +--------------------------------- + "2017-03-10T12:34:56.789-05:00" +(1 row) + +select jsonb_path_query('"2017-03-10T12:34:56.789Z"', '$.datetime()'); + jsonb_path_query +--------------------------------- + "2017-03-10T12:34:56.789+00:00" +(1 row) + select jsonb_path_query('"12:34:56"', '$.datetime().type()'); jsonb_path_query -------------------------- diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index 252bce4b1c..fe647af976 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -501,7 +501,15 @@ SELECT to_timestamp('2011-12-18 11:38 +05:20', 'YYYY-MM-DD HH12:MI TZH:TZM'); SELECT to_timestamp('2011-12-18 11:38 -05:20', 'YYYY-MM-DD HH12:MI TZH:TZM'); SELECT to_timestamp('2011-12-18 11:38 20', 'YYYY-MM-DD HH12:MI TZM'); -SELECT to_timestamp('2011-12-18 11:38 PST', 'YYYY-MM-DD HH12:MI TZ'); -- NYI +SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ'); +SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ'); +SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ'); +SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ'); -- dyntz +SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); +SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); + +SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI OF'); +SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI OF'); SELECT to_timestamp('2018-11-02 12:34:56.025', 'YYYY-MM-DD HH24:MI:SS.MS'); @@ -616,6 +624,10 @@ SELECT to_date('2016 366', 'YYYY DDD'); -- ok SELECT to_date('2016 367', 'YYYY DDD'); SELECT to_date('0000-02-01','YYYY-MM-DD'); -- allowed, though it shouldn't be +-- to_char's TZ format code produces zone abbrev if known +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS tz'); + -- -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572) -- @@ -632,4 +644,8 @@ SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD SSSS'); SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD SSSSS'); +SET TIME ZONE '+2'; + +SELECT to_char('2012-12-12 12:00'::timestamptz, 'YYYY-MM-DD HH:MI:SS TZ'); + RESET TIME ZONE; diff --git a/src/test/regress/sql/jsonb_jsonpath.sql b/src/test/regress/sql/jsonb_jsonpath.sql index e0ce509264..e9bd94b034 100644 --- a/src/test/regress/sql/jsonb_jsonpath.sql +++ b/src/test/regress/sql/jsonb_jsonpath.sql @@ -417,6 +417,8 @@ select jsonb_path_query('"2017-03-10t12:34:56+3:10"', '$.datetime()'); select jsonb_path_query('"2017-03-10 12:34:56.789+3:10"', '$.datetime()'); select jsonb_path_query('"2017-03-10T12:34:56.789+3:10"', '$.datetime()'); select jsonb_path_query('"2017-03-10t12:34:56.789+3:10"', '$.datetime()'); +select jsonb_path_query('"2017-03-10T12:34:56.789EST"', '$.datetime()'); +select jsonb_path_query('"2017-03-10T12:34:56.789Z"', '$.datetime()'); select jsonb_path_query('"12:34:56"', '$.datetime().type()'); select jsonb_path_query('"12:34:56"', '$.datetime()'); select jsonb_path_query('"12:34:56+3"', '$.datetime().type()'); -- 2.39.3 From e4403f185c4cbd0d5035002a2fe6073acb6c048d Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Tue, 23 Jan 2024 17:25:19 -0500 Subject: [PATCH v2 2/2] Respond to review comments. Standardize on "FALLTHROUGH" as the way to spell such comments within formatting.c (there were two of these already). Improve error message given for a bogus zone abbreviation, and add some tests to exercise these error cases. --- src/backend/utils/adt/formatting.c | 20 +++++++++++++++++--- src/test/regress/expected/horology.out | 6 ++++++ src/test/regress/sql/horology.sql | 2 ++ 3 files changed, 25 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index 8859d174c2..829aaa8d0e 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -3454,7 +3454,7 @@ DCH_from_char(FormatNode *node, const char *in, TmFromChar *out, case DCH_FF5: case DCH_FF6: out->ff = n->key->id - DCH_FF1 + 1; - /* fall through */ + /* FALLTHROUGH */ case DCH_US: /* microsecond */ len = from_char_parse_int_len(&out->us, &s, n->key->id == DCH_US ? 6 : @@ -3493,9 +3493,23 @@ DCH_from_char(FormatNode *node, const char *in, TmFromChar *out, s += tzlen; break; } + else if (isalpha((unsigned char) *s)) + { + /* + * It doesn't match any abbreviation, but it starts + * with a letter. OF format certainly won't succeed; + * assume it's a misspelled abbreviation and complain + * accordingly. + */ + ereturn(escontext,, + (errcode(ERRCODE_INVALID_DATETIME_FORMAT), + errmsg("invalid value \"%s\" for \"%s\"", + s, n->key->name), + errdetail("Time zone abbreviation is not recognized."))); + } + /* otherwise parse it like OF */ } - /* it doesn't match any abbrev, so parse it like OF */ - /* FALL THRU */ + /* FALLTHROUGH */ case DCH_OF: /* OF is equivalent to TZH or TZH:TZM */ /* see TZH comments below */ diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out index 142a2109ba..c810c4fc91 100644 --- a/src/test/regress/expected/horology.out +++ b/src/test/regress/expected/horology.out @@ -3176,6 +3176,12 @@ SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); Sun Dec 18 08:38:24 2011 PST (1 row) +SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ'); -- error +ERROR: invalid value "JUNK" for "TZ" +DETAIL: Time zone abbreviation is not recognized. +SELECT to_timestamp('2011-12-18 11:38 ...', 'YYYY-MM-DD HH12:MI TZ'); -- error +ERROR: invalid value ".." for "TZ" +DETAIL: Value must be an integer. SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI OF'); to_timestamp ------------------------------ diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index fe647af976..7edd65facb 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -507,6 +507,8 @@ SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ'); SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ'); -- dyntz SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); +SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ'); -- error +SELECT to_timestamp('2011-12-18 11:38 ...', 'YYYY-MM-DD HH12:MI TZ'); -- error SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI OF'); SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI OF'); -- 2.39.3
> On 23 Jan 2024, at 23:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Anyway, v2-0001 below is the previous patch rebased up to current > (only line numbers change), and then v2-0002 responds to your > and Daniel's review comments. LGTM. -- Daniel Gustafsson
Hi, > > Anyway, v2-0001 below is the previous patch rebased up to current > > (only line numbers change), and then v2-0002 responds to your > > and Daniel's review comments. > > LGTM. ``` +SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ'); -- error +ERROR: invalid value "JUNK" for "TZ" +DETAIL: Time zone abbreviation is not recognized. +SELECT to_timestamp('2011-12-18 11:38 ...', 'YYYY-MM-DD HH12:MI TZ'); -- error +ERROR: invalid value ".." for "TZ" ``` Shouldn't the second error display the full value "..." (three dots) similarly to the previous one? Also I think we need at least one negative test for OF. Other than that v2 looks OK. -- Best regards, Aleksander Alekseev
Aleksander Alekseev <aleksander@timescale.com> writes: > +SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ'); -- error > +ERROR: invalid value "JUNK" for "TZ" > +DETAIL: Time zone abbreviation is not recognized. > +SELECT to_timestamp('2011-12-18 11:38 ...', 'YYYY-MM-DD HH12:MI TZ'); -- error > +ERROR: invalid value ".." for "TZ" > Shouldn't the second error display the full value "..." (three dots) > similarly to the previous one? That's coming from the pre-existing OF code, which is looking for a integer of at most two digits. I'm not especially inclined to mess with that, and even if I were I'd think it should be a separate patch. > Also I think we need at least one > negative test for OF. OK. regards, tom lane