Обсуждение: IANA timezone abbreviations versus timezone_abbreviations

Поиск
Список
Период
Сортировка

IANA timezone abbreviations versus timezone_abbreviations

От
Tom Lane
Дата:
Although we've never documented this, it's been true for ages
that timestamptz_out shows timezone abbreviations that are taken
from the IANA tzdb data (in datestyles that use non-numeric
timezone fields, which is all but ISO).  Meanwhile, timestamptz_in
recognizes timezone abbreviations if they match an entry in the
timezone_abbreviations file.  Those two sources of truth are not
entirely consistent, leading to fun results like these:

regression=# set datestyle = postgres;
SET
regression=# set timezone = 'America/Montevideo';
SET
regression=# select '1900-01-01 00:00'::timestamptz;
         timestamptz
------------------------------
 Mon Jan 01 00:00:00 1900 LMT
(1 row)

regression=# select '1900-01-01 00:00'::timestamptz::text::timestamptz;
ERROR:  invalid input syntax for type timestamp with time zone: "Mon Jan 01 00:00:00 1900 LMT"

(because LMT for "local mean time" is not in timezone_abbreviations)

regression=# select '1912-01-01 00:00'::timestamptz;
         timestamptz
------------------------------
 Mon Jan 01 00:00:00 1912 MMT
(1 row)

regression=# select '1912-01-01 00:00'::timestamptz::text::timestamptz;
         timestamptz
------------------------------
 Sun Dec 31 13:45:09 1911 MMT
(1 row)

(because this zone uses MMT for "Montevideo Mean Time" while
timezone_abbreviations thinks it means "Myanmar Time").

You can get unfortunate results even for current timestamps,
because we've not hesitated to put some North-American-centric
interpretations into the default abbreviations list:

regression=# set timezone = 'Asia/Shanghai';
SET
regression=# select '2024-12-13 00:00'::timestamptz;
         timestamptz
------------------------------
 Fri Dec 13 00:00:00 2024 CST
(1 row)

regression=# select '2024-12-13 00:00'::timestamptz::text::timestamptz;
         timestamptz
------------------------------
 Fri Dec 13 14:00:00 2024 CST

(because this IANA zone uses CST for "China Standard Time"
while timezone_abbreviations thinks it means (USA) "Central
Standard Time").

This mess was brought up in pgsql-bugs [1], but the solution
I propose here is invasive enough that I think it needs
discussion on -hackers.

What I think we should do about this is to teach timestamp
input to look into the current IANA time zone to see if it
knows the given abbreviation, and if so use that meaning
regardless of what timezone_abbreviations might say.  This
isn't particularly hard, and it doesn't appear to cost
anything speed-wise, but is there anybody out there who
is relying on the current behavior?

I can imagine that somebody might be using an interpretation
that is contrary to IANA's ideas; but it seems fairly unlikely
with current IANA data, because they largely got rid of the
made-up abbreviations their data used to be full of.  Anyplace
where we find a non-numeric abbreviation in the IANA data is
probably someplace where that abbreviation is widely current,
and people wouldn't expect it to mean something different.

On the positive side, this gives us a far better story for
abbreviations that conflict in different parts of the world.
timestamptz_in will now automatically do the right thing
given a correct timezone setting, without having to manually
adjust the abbreviation list.  So between that and getting
rid of the round-trip hazards seen above, I think there is
sufficient reason to do this.

The only other way I can envision to remove the round-trip hazard
is to stop using alphabetic abbreviations at all in timestamp
output, and use numeric GMT offsets regardless of datestyle.
I doubt that would make many people happy.  It would certainly
break a bunch of our own regression tests, and I expect it would
break other people's applications too.

(To be clear, I'm only proposing this for v18 not for back-patch.
While it's certainly fixing live bugs, there have not been that
many complaints, and the behavioral change is surely more than
we want for a back branch.)

Draft patches attached.  Any thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TOATjJqvhnYsui0%3DCO5XFMF4dvTGH%2BskzB--jNhqSQu5g%40mail.gmail.com

From cd36b9f4916abdcc2594ed7529941c787ca10fa4 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 12 Dec 2024 17:39:44 -0500
Subject: [PATCH v1 1/2] Seek zone abbreviations in the IANA data before
 timezone_abbreviations.

If a time zone abbreviation used in datetime input is defined in
the currently active timezone, use that definition in preference
to looking in the timezone_abbreviations list.  That allows us to
correctly handle abbreviations that have different meanings in
different timezones.  Also, it eliminates an inconsistency between
datetime input and datetime output: the non-ISO datestyles for
timestamptz have always printed abbreviations taken from the IANA
data not from timezone_abbreviations.  Before this fix, it was
possible to demonstrate cases where casting a timestamp to text
and back fails or changes the value significantly because of that
inconsistency.

While this change removes the ability to override the IANA data about
an abbreviation known in the current zone, it's not clear that there's
any real use-case for doing so.  But it is clear that this makes life
a lot easier for dealing with abbreviations that have conflicts.

There are a couple of loose ends still to deal with:

* As this patch stands, it causes a noticeable degradation of the
runtime of timestamptz_in (about 20% in a microbenchmark of just
that function).  This is from DecodeTimezoneAbbrev not caching
the results of its lookup in the new path.  I split out the
improvement of that part for a follow-up patch.

* The pg_timezone_abbrevs view shows only abbreviations from
the timezone_abbreviations list.  That should probably be
adjusted to account for abbreviations taken from the timezone.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 doc/src/sgml/datatype.sgml                |  4 +
 doc/src/sgml/datetime.sgml                | 14 +++-
 src/backend/utils/adt/datetime.c          | 89 +++++++++++++++++++++--
 src/include/pgtime.h                      |  5 ++
 src/test/regress/expected/horology.out    |  6 ++
 src/test/regress/expected/timestamptz.out | 59 +++++++++++++++
 src/test/regress/sql/horology.sql         |  1 +
 src/test/regress/sql/timestamptz.sql      | 17 +++++
 src/timezone/localtime.c                  | 76 +++++++++++++++++++
 9 files changed, 265 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 3e6751d64c..1d9127e94e 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2534,6 +2534,10 @@ TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
      abbreviation if one is in common use in the current zone.  Otherwise
      it appears as a signed numeric offset in ISO 8601 basic format
      (<replaceable>hh</replaceable> or <replaceable>hhmm</replaceable>).
+     The alphabetic abbreviations shown in these styles are taken from the
+     IANA time zone database entry currently selected by the
+     <xref linkend="guc-timezone"/> run-time parameter; they are not
+     affected by the <xref linkend="guc-timezone-abbreviations"/> setting.
     </para>

     <para>
diff --git a/doc/src/sgml/datetime.sgml b/doc/src/sgml/datetime.sgml
index e7035c7806..eb8760c1f0 100644
--- a/doc/src/sgml/datetime.sgml
+++ b/doc/src/sgml/datetime.sgml
@@ -424,7 +424,7 @@
    <para>
     Since timezone abbreviations are not well standardized,
     <productname>PostgreSQL</productname> provides a means to customize
-    the set of abbreviations accepted by the server.  The
+    the set of abbreviations accepted in datetime input.  The
     <xref linkend="guc-timezone-abbreviations"/> run-time parameter
     determines the active set of abbreviations.  While this parameter
     can be altered by any database user, the possible values for it
@@ -444,6 +444,18 @@
     backup files and other extraneous files.)
    </para>

+   <para>
+    Before consulting the <varname>timezone_abbreviations</varname> file,
+    <productname>PostgreSQL</productname> checks to see whether an
+    abbreviation used in datetime input is defined in the IANA time zone
+    database entry currently selected by the
+    <xref linkend="guc-timezone"/> run-time parameter.  If so the time
+    zone's meaning is used, for consistency with datetime output.  The
+    <varname>timezone_abbreviations</varname> file is mainly useful for
+    allowing datetime input to recognize abbreviations for time zones
+    other than the active zone.
+   </para>
+
    <para>
     A timezone abbreviation file can contain blank lines and comments
     beginning with <literal>#</literal>.  Non-comment lines must have one of
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 0b19cddf54..e6b3ac134d 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1845,6 +1845,40 @@ DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t, const char *abbr, pg_tz *tzp,
 }


+/* TimeZoneAbbrevIsKnown()
+ *
+ * Detect whether the given string is a time zone abbreviation that's known
+ * in the specified TZDB timezone, and if so whether it's fixed or varying
+ * meaning.  The match is not case-sensitive.
+ */
+static bool
+TimeZoneAbbrevIsKnown(const char *abbr, pg_tz *tzp,
+                      bool *isfixed, int *offset, int *isdst)
+{
+    char        upabbr[TZ_STRLEN_MAX + 1];
+    unsigned char *p;
+    long int    gmtoff;
+
+    /* We need to force the abbrev to upper case */
+    strlcpy(upabbr, abbr, sizeof(upabbr));
+    for (p = (unsigned char *) upabbr; *p; p++)
+        *p = pg_toupper(*p);
+
+    /* Look up the abbrev's meaning in this zone */
+    if (pg_timezone_abbrev_is_known(upabbr,
+                                    isfixed,
+                                    &gmtoff,
+                                    isdst,
+                                    tzp))
+    {
+        /* Change sign to agree with DetermineTimeZoneOffset() */
+        *offset = (int) -gmtoff;
+        return true;
+    }
+    return false;
+}
+
+
 /* DecodeTimeOnly()
  * Interpret parsed string as time fields only.
  * Returns 0 if successful, DTERR code if bogus input detected.
@@ -3092,8 +3126,28 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
                      int *ftype, int *offset, pg_tz **tz,
                      DateTimeErrorExtra *extra)
 {
+    bool        isfixed;
+    int            isdst;
     const datetkn *tp;

+    /*
+     * See if the current session_timezone recognizes it.  Checking this
+     * before zoneabbrevtbl allows us to correctly handle abbreviations whose
+     * meaning varies across zones, such as "LMT".  (Caching this lookup is
+     * left for later.)
+     */
+    if (session_timezone &&
+        TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
+                              &isfixed, offset, &isdst))
+    {
+        *ftype = (isfixed ? (isdst ? DTZ : TZ) : DYNTZ);
+        *tz = (isfixed ? NULL : session_timezone);
+        /* flip sign to agree with the convention used in zoneabbrevtbl */
+        *offset = -(*offset);
+        return 0;
+    }
+
+    /* Nope, so look in zoneabbrevtbl */
     tp = abbrevcache[field];
     /* use strncmp so that we match truncated tokens */
     if (tp == NULL || strncmp(lowtoken, tp->token, TOKMAXLEN) != 0)
@@ -3109,6 +3163,7 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
         *ftype = UNKNOWN_FIELD;
         *offset = 0;
         *tz = NULL;
+        /* failure results are not cached */
     }
     else
     {
@@ -3278,9 +3333,6 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
     *offset = 0;                /* avoid uninitialized vars on failure */
     *tz = NULL;

-    if (!zoneabbrevtbl)
-        return -1;                /* no abbrevs known, so fail immediately */
-
     /* Downcase as much of the string as we could need */
     for (len = 0; len < TOKMAXLEN; len++)
     {
@@ -3299,9 +3351,34 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
      */
     while (len > 0)
     {
-        const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
-                                        zoneabbrevtbl->numabbrevs);
+        bool        isfixed;
+        int            isdst;
+        const datetkn *tp;
+
+        /* See if the current session_timezone recognizes it. */
+        if (session_timezone &&
+            TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
+                                  &isfixed, offset, &isdst))
+        {
+            if (isfixed)
+            {
+                /* flip sign to agree with the convention in zoneabbrevtbl */
+                *offset = -(*offset);
+            }
+            else
+            {
+                /* Caller must resolve the abbrev's current meaning */
+                *tz = session_timezone;
+            }
+            return len;
+        }

+        /* Known in zoneabbrevtbl? */
+        if (zoneabbrevtbl)
+            tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
+                             zoneabbrevtbl->numabbrevs);
+        else
+            tp = NULL;
         if (tp != NULL)
         {
             if (tp->type == DYNTZ)
@@ -3324,6 +3401,8 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
                 return len;
             }
         }
+
+        /* Nope, try the next shorter string. */
         lowtoken[--len] = '\0';
     }

diff --git a/src/include/pgtime.h b/src/include/pgtime.h
index 2742086711..a20b5019d8 100644
--- a/src/include/pgtime.h
+++ b/src/include/pgtime.h
@@ -69,6 +69,11 @@ extern bool pg_interpret_timezone_abbrev(const char *abbrev,
                                          long int *gmtoff,
                                          int *isdst,
                                          const pg_tz *tz);
+extern bool pg_timezone_abbrev_is_known(const char *abbrev,
+                                        bool *isfixed,
+                                        long int *gmtoff,
+                                        int *isdst,
+                                        const pg_tz *tz);
 extern bool pg_get_timezone_offset(const pg_tz *tz, long int *gmtoff);
 extern const char *pg_get_timezone_name(pg_tz *tz);
 extern bool pg_tz_acceptable(pg_tz *tz);
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index cb28dfbaee..b90bfcd794 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -3332,6 +3332,12 @@ SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ');  -- dyntz
  Sat Dec 17 23:38:00 2011 PST
 (1 row)

+SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ');  -- dyntz
+         to_timestamp
+------------------------------
+ Sat Dec 17 23:52:58 2011 PST
+(1 row)
+
 SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
          to_timestamp
 ------------------------------
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index b437613ac8..be01cfc76f 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -176,6 +176,65 @@ SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST
  Fri Jan 10 07:32:01 205000 PST
 (1 row)

+-- Recognize "LMT" as whatever it means in the current zone
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Sun Dec 31 23:52:58 2023 PST
+(1 row)
+
+SET timezone = 'Europe/London';
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:01:15 2024 GMT
+(1 row)
+
+-- which might be nothing
+SET timezone = 'UTC';
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;  -- fail
+ERROR:  invalid input syntax for type timestamp with time zone: "Jan 01 00:00:00 2024 LMT"
+LINE 1: SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+               ^
+-- Another example of an abbrev that varies across zones
+SELECT '1912-01-01 00:00 MMT'::timestamptz;  -- from timezone_abbreviations
+         timestamptz
+------------------------------
+ Sun Dec 31 17:30:00 1911 UTC
+(1 row)
+
+SET timezone = 'America/Montevideo';
+SELECT '1912-01-01 00:00'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:00:00 1912 MMT
+(1 row)
+
+SELECT '1912-01-01 00:00 MMT'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:00:00 1912 MMT
+(1 row)
+
+SELECT '1912-01-01 00:00 MMT'::timestamptz AT TIME ZONE 'UTC';
+         timezone
+--------------------------
+ Mon Jan 01 03:44:51 1912
+(1 row)
+
+RESET timezone;
 -- Test non-error-throwing API
 SELECT pg_input_is_valid('now', 'timestamptz');
  pg_input_is_valid
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 4aa88b4ba9..1310b43277 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -538,6 +538,7 @@ SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ');  -- dyntz
+SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ');  -- dyntz
 SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
 SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
 SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ');  -- error
diff --git a/src/test/regress/sql/timestamptz.sql b/src/test/regress/sql/timestamptz.sql
index 6b91e7eddc..7f38a23d2d 100644
--- a/src/test/regress/sql/timestamptz.sql
+++ b/src/test/regress/sql/timestamptz.sql
@@ -109,6 +109,23 @@ SELECT '20500110 173201 Europe/Helsinki'::timestamptz; -- non-DST
 SELECT '205000-07-10 17:32:01 Europe/Helsinki'::timestamptz; -- DST
 SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST

+-- Recognize "LMT" as whatever it means in the current zone
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+SET timezone = 'Europe/London';
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+-- which might be nothing
+SET timezone = 'UTC';
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;  -- fail
+-- Another example of an abbrev that varies across zones
+SELECT '1912-01-01 00:00 MMT'::timestamptz;  -- from timezone_abbreviations
+SET timezone = 'America/Montevideo';
+SELECT '1912-01-01 00:00'::timestamptz;
+SELECT '1912-01-01 00:00 MMT'::timestamptz;
+SELECT '1912-01-01 00:00 MMT'::timestamptz AT TIME ZONE 'UTC';
+RESET timezone;
+
 -- Test non-error-throwing API
 SELECT pg_input_is_valid('now', 'timestamptz');
 SELECT pg_input_is_valid('garbage', 'timestamptz');
diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 0bc160ea7d..65511ae8be 100644
--- a/src/timezone/localtime.c
+++ b/src/timezone/localtime.c
@@ -1843,6 +1843,82 @@ pg_interpret_timezone_abbrev(const char *abbrev,
     return false;                /* hm, not actually used in any interval? */
 }

+/*
+ * Detect whether a timezone abbreviation is defined within the given zone.
+ *
+ * This is similar to pg_interpret_timezone_abbrev() but is not concerned
+ * with a specific point in time.  We want to know if the abbreviation is
+ * known at all, and if so whether it has one meaning or several.
+ *
+ * Returns true if the abbreviation is known, false if not.
+ * If the abbreviation is known and has a single meaning (only one value
+ * of gmtoff/isdst), sets *isfixed = true and sets *gmtoff and *isdst.
+ * If there are multiple meanings, sets *isfixed = false.
+ *
+ * Note: abbrev is matched case-sensitively; it should be all-upper-case.
+ */
+bool
+pg_timezone_abbrev_is_known(const char *abbrev,
+                            bool *isfixed,
+                            long int *gmtoff,
+                            int *isdst,
+                            const pg_tz *tz)
+{
+    bool        result = false;
+    const struct state *sp = &tz->state;
+    const char *abbrs;
+    int            abbrind;
+
+    /*
+     * Locate the abbreviation in the zone's abbreviation list.  We assume
+     * there are not duplicates in the list.
+     */
+    abbrs = sp->chars;
+    abbrind = 0;
+    while (abbrind < sp->charcnt)
+    {
+        if (strcmp(abbrev, abbrs + abbrind) == 0)
+            break;
+        while (abbrs[abbrind] != '\0')
+            abbrind++;
+        abbrind++;
+    }
+    if (abbrind >= sp->charcnt)
+        return false;            /* definitely not there */
+
+    /*
+     * Scan the ttinfo array to find uses of the abbreviation.
+     */
+    for (int i = 0; i < sp->typecnt; i++)
+    {
+        const struct ttinfo *ttisp = &sp->ttis[i];
+
+        if (ttisp->tt_desigidx == abbrind)
+        {
+            if (!result)
+            {
+                /* First usage */
+                *isfixed = true;    /* for the moment */
+                *gmtoff = ttisp->tt_utoff;
+                *isdst = ttisp->tt_isdst;
+                result = true;
+            }
+            else
+            {
+                /* Second or later usage, does it match? */
+                if (*gmtoff != ttisp->tt_utoff ||
+                    *isdst != ttisp->tt_isdst)
+                {
+                    *isfixed = false;
+                    break;        /* no point in looking further */
+                }
+            }
+        }
+    }
+
+    return result;
+}
+
 /*
  * If the given timezone uses only one GMT offset, store that offset
  * into *gmtoff and return true, else return false.
--
2.43.5

From 3d50373d5332ae89b6321886ddf2cf74db6e72cd Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 12 Dec 2024 17:54:15 -0500
Subject: [PATCH v1 2/2] Improve DecodeTimezoneAbbrev's caching logic.

The previous implementation could only cope with caching a
result found in the zoneabbrevtbl.  It is worth expending
a little more space to be able to cache results obtained from
the IANA timezone data, especially since that's likely to be
the majority use-case going forward.

To do this, we have to reset the cache after a change in
the timezone GUC not only the timezone_abbrev GUC, but that's
not hard.

In my testing, this puts the speed of repeated timestamptz_in
calls back on par with what it was before the previous patch.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 src/backend/commands/variable.c  |  2 +
 src/backend/utils/adt/datetime.c | 69 ++++++++++++++++++++++++--------
 src/include/utils/datetime.h     |  2 +
 src/tools/pgindent/typedefs.list |  1 +
 4 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index d2db0c45b2..794d2a0fce 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -381,6 +381,8 @@ void
 assign_timezone(const char *newval, void *extra)
 {
     session_timezone = *((pg_tz **) extra);
+    /* datetime.c's cache of timezone abbrevs may now be obsolete */
+    ClearTimeZoneAbbrevCache();
 }

 /*
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index e6b3ac134d..4e6515865b 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -259,7 +259,17 @@ static const datetkn *datecache[MAXDATEFIELDS] = {NULL};

 static const datetkn *deltacache[MAXDATEFIELDS] = {NULL};

-static const datetkn *abbrevcache[MAXDATEFIELDS] = {NULL};
+/* Cache for results of timezone abbreviation lookups */
+
+typedef struct TzAbbrevCache
+{
+    char        abbrev[TOKMAXLEN + 1];    /* always NUL-terminated */
+    char        ftype;            /* TZ, DTZ, or DYNTZ */
+    int            offset;            /* GMT offset, if fixed-offset */
+    pg_tz       *tz;                /* relevant zone, if variable-offset */
+} TzAbbrevCache;
+
+static TzAbbrevCache tzabbrevcache[MAXDATEFIELDS];


 /*
@@ -3126,15 +3136,28 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
                      int *ftype, int *offset, pg_tz **tz,
                      DateTimeErrorExtra *extra)
 {
+    TzAbbrevCache *tzc = &tzabbrevcache[field];
     bool        isfixed;
     int            isdst;
     const datetkn *tp;

+    /*
+     * Do we have a cached result?  Use strncmp so that we match truncated
+     * names, although we shouldn't really see that happen with normal
+     * abbreviations.
+     */
+    if (strncmp(lowtoken, tzc->abbrev, TOKMAXLEN) == 0)
+    {
+        *ftype = tzc->ftype;
+        *offset = tzc->offset;
+        *tz = tzc->tz;
+        return 0;
+    }
+
     /*
      * See if the current session_timezone recognizes it.  Checking this
      * before zoneabbrevtbl allows us to correctly handle abbreviations whose
-     * meaning varies across zones, such as "LMT".  (Caching this lookup is
-     * left for later.)
+     * meaning varies across zones, such as "LMT".
      */
     if (session_timezone &&
         TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
@@ -3144,20 +3167,20 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
         *tz = (isfixed ? NULL : session_timezone);
         /* flip sign to agree with the convention used in zoneabbrevtbl */
         *offset = -(*offset);
+        /* cache result; use strlcpy to truncate name if necessary */
+        strlcpy(tzc->abbrev, lowtoken, TOKMAXLEN + 1);
+        tzc->ftype = *ftype;
+        tzc->offset = *offset;
+        tzc->tz = *tz;
         return 0;
     }

     /* Nope, so look in zoneabbrevtbl */
-    tp = abbrevcache[field];
-    /* use strncmp so that we match truncated tokens */
-    if (tp == NULL || strncmp(lowtoken, tp->token, TOKMAXLEN) != 0)
-    {
-        if (zoneabbrevtbl)
-            tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
-                             zoneabbrevtbl->numabbrevs);
-        else
-            tp = NULL;
-    }
+    if (zoneabbrevtbl)
+        tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
+                         zoneabbrevtbl->numabbrevs);
+    else
+        tp = NULL;
     if (tp == NULL)
     {
         *ftype = UNKNOWN_FIELD;
@@ -3167,7 +3190,6 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
     }
     else
     {
-        abbrevcache[field] = tp;
         *ftype = tp->type;
         if (tp->type == DYNTZ)
         {
@@ -3181,11 +3203,26 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
             *offset = tp->value;
             *tz = NULL;
         }
+
+        /* cache result; use strlcpy to truncate name if necessary */
+        strlcpy(tzc->abbrev, lowtoken, TOKMAXLEN + 1);
+        tzc->ftype = *ftype;
+        tzc->offset = *offset;
+        tzc->tz = *tz;
     }

     return 0;
 }

+/*
+ * Reset tzabbrevcache after a change in session_timezone.
+ */
+void
+ClearTimeZoneAbbrevCache(void)
+{
+    memset(tzabbrevcache, 0, sizeof(tzabbrevcache));
+}
+

 /* DecodeSpecial()
  * Decode text string using lookup table.
@@ -5036,8 +5073,8 @@ void
 InstallTimeZoneAbbrevs(TimeZoneAbbrevTable *tbl)
 {
     zoneabbrevtbl = tbl;
-    /* reset abbrevcache, which may contain pointers into old table */
-    memset(abbrevcache, 0, sizeof(abbrevcache));
+    /* reset tzabbrevcache, which may contain results from old table */
+    memset(tzabbrevcache, 0, sizeof(tzabbrevcache));
 }

 /*
diff --git a/src/include/utils/datetime.h b/src/include/utils/datetime.h
index e4ac2b8e7f..79382b9ecd 100644
--- a/src/include/utils/datetime.h
+++ b/src/include/utils/datetime.h
@@ -351,6 +351,8 @@ extern pg_tz *DecodeTimezoneNameToTz(const char *tzname);
 extern int    DecodeTimezoneAbbrevPrefix(const char *str,
                                        int *offset, pg_tz **tz);

+extern void ClearTimeZoneAbbrevCache(void);
+
 extern int    j2day(int date);

 extern struct Node *TemporalSimplify(int32 max_precis, struct Node *node);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..a84f26218c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3022,6 +3022,7 @@ TypeCat
 TypeFuncClass
 TypeInfo
 TypeName
+TzAbbrevCache
 U32
 U8
 UChar
--
2.43.5


Re: IANA timezone abbreviations versus timezone_abbreviations

От
Andreas Karlsson
Дата:
On 12/13/24 12:33 AM, Tom Lane wrote:
> What I think we should do about this is to teach timestamp
> input to look into the current IANA time zone to see if it
> knows the given abbreviation, and if so use that meaning
> regardless of what timezone_abbreviations might say.  This
> isn't particularly hard, and it doesn't appear to cost
> anything speed-wise, but is there anybody out there who
> is relying on the current behavior?

I am not convinced this is an improvement. While this patch removes the 
round-trip hazard it also makes it confusing to use the 
timezone_abbreviations GUC since it can be overridden by IANA data based 
on your current timezone. So you need to know all the, sometimes weird, 
names for your current timezone. Seems unnecessarily hard to reason 
about and wouldn't most people who use timezone_abbreviations rely on 
the current behavior?

But that said I personally only use ISO timestamps with numerical 
offsets. Partially to avoid all this mess.

Andreas




Re: IANA timezone abbreviations versus timezone_abbreviations

От
Tom Lane
Дата:
Andreas Karlsson <andreas@proxel.se> writes:
> On 12/13/24 12:33 AM, Tom Lane wrote:
>> What I think we should do about this is to teach timestamp
>> input to look into the current IANA time zone to see if it
>> knows the given abbreviation, and if so use that meaning
>> regardless of what timezone_abbreviations might say.

> I am not convinced this is an improvement. While this patch removes the 
> round-trip hazard it also makes it confusing to use the 
> timezone_abbreviations GUC since it can be overridden by IANA data based 
> on your current timezone. So you need to know all the, sometimes weird, 
> names for your current timezone. Seems unnecessarily hard to reason 
> about and wouldn't most people who use timezone_abbreviations rely on 
> the current behavior?

Presumably they're not that weird to the locals?

I am not sure what you mean by "people who use
timezone_abbreviations".  I think that's about everyone --- it's
not like the default setting doesn't contain any abbreviations.
(If it didn't then we'd not have such a problem...)

> But that said I personally only use ISO timestamps with numerical 
> offsets. Partially to avoid all this mess.

If you only use ISO notation then this doesn't matter to you
either way.

            regards, tom lane



Re: IANA timezone abbreviations versus timezone_abbreviations

От
Bruce Momjian
Дата:
On Mon, Dec 16, 2024 at 02:57:59PM -0500, Tom Lane wrote:
> Andreas Karlsson <andreas@proxel.se> writes:
> > On 12/13/24 12:33 AM, Tom Lane wrote:
> >> What I think we should do about this is to teach timestamp
> >> input to look into the current IANA time zone to see if it
> >> knows the given abbreviation, and if so use that meaning
> >> regardless of what timezone_abbreviations might say.
> 
> > I am not convinced this is an improvement. While this patch removes the 
> > round-trip hazard it also makes it confusing to use the 
> > timezone_abbreviations GUC since it can be overridden by IANA data based 
> > on your current timezone. So you need to know all the, sometimes weird, 
> > names for your current timezone. Seems unnecessarily hard to reason 
> > about and wouldn't most people who use timezone_abbreviations rely on 
> > the current behavior?
> 
> Presumably they're not that weird to the locals?
> 
> I am not sure what you mean by "people who use
> timezone_abbreviations".  I think that's about everyone --- it's
> not like the default setting doesn't contain any abbreviations.
> (If it didn't then we'd not have such a problem...)
> 
> > But that said I personally only use ISO timestamps with numerical 
> > offsets. Partially to avoid all this mess.
> 
> If you only use ISO notation then this doesn't matter to you
> either way.

Yes, your patch seems like a big improvement.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.





Re: IANA timezone abbreviations versus timezone_abbreviations

От
"Jelte Fennema-Nio"
Дата:
On Sun Dec 29, 2024 at 12:47 AM CET, Bruce Momjian wrote:
> On Mon, Dec 16, 2024 at 02:57:59PM -0500, Tom Lane wrote:
> Yes, your patch seems like a big improvement.

+1

+    Before consulting the <varname>timezone_abbreviations</varname> file,
+    <productname>PostgreSQL</productname> checks to see whether an
+    abbreviation used in datetime input is defined in the IANA time zone
+    database entry currently selected by the
+    <xref linkend="guc-timezone"/> run-time parameter.  If so the time
+    zone's meaning is used, for consistency with datetime output.  The
+    <varname>timezone_abbreviations</varname> file is mainly useful for
+    allowing datetime input to recognize abbreviations for time zones
+    other than the active zone.

I think it would be good to add some additional clarify here. It was
fairly confusing to me. Especially the last sentence, due to the use of
"active zone", even though it's really talking about the currently
active abbreviations list. Probably my confusion mostly came from the
fact that I wasn't aware that timezone abbreviations were localized, but
I doubt I'm alone in not knowing this. Maybe something like this (feel
free to improve further):


Before consulting the <varname>timezone_abbreviations</varname>
file, <productname>PostgreSQL</productname> checks to see whether an
abbreviation used in datetime input is defined in the currently active
IANA time zone database.  The abbreviations for these entries are
localized based on the <xref linkend="guc-timezone"/>
run-time parameter, so depending on the configured <xref
linkend="guc-timezone"/> abbreviations will differ. If it is found the
IANA time zone database, then that meaning is used for consistency with
datetime output.  The <varname>timezone_abbreviations</varname> file is
only useful for allowing datetime input to recognize
abbreviations for time zones that are not defined in the currently
active IANA time zone database.



Re: IANA timezone abbreviations versus timezone_abbreviations

От
"Jelte Fennema-Nio"
Дата:
On Mon Dec 16, 2024 at 8:57 PM CET, Tom Lane wrote:
> Andreas Karlsson <andreas@proxel.se> writes:
>> On 12/13/24 12:33 AM, Tom Lane wrote:
>> I am not convinced this is an improvement. While this patch removes the
>> round-trip hazard it also makes it confusing to use the
>> timezone_abbreviations GUC since it can be overridden by IANA data based
>> on your current timezone. So you need to know all the, sometimes weird,
>> names for your current timezone. Seems unnecessarily hard to reason
>> about and wouldn't most people who use timezone_abbreviations rely on
>> the current behavior?
>
> Presumably they're not that weird to the locals?
>
> I am not sure what you mean by "people who use
> timezone_abbreviations".  I think that's about everyone --- it's
> not like the default setting doesn't contain any abbreviations.
> (If it didn't then we'd not have such a problem...)

Maybe changing the default value of timezone_abbreviations is a better
solution to the problem, or in addition to the proposed patch.



Re: IANA timezone abbreviations versus timezone_abbreviations

От
Tom Lane
Дата:
"Jelte Fennema-Nio" <postgres@jeltef.nl> writes:
> I think it would be good to add some additional clarify here. It was
> fairly confusing to me. Especially the last sentence, due to the use of
> "active zone", even though it's really talking about the currently
> active abbreviations list. Probably my confusion mostly came from the
> fact that I wasn't aware that timezone abbreviations were localized, but
> I doubt I'm alone in not knowing this. Maybe something like this (feel
> free to improve further):

Hmm, I don't like your phrasing using "IANA time zone database",
because that makes it sound like we'll take any abbreviation that's
found anywhere in that whole data set.  What the proposal actually
does is to recognize any abbreviation that is used, or has been
used in the past, in the IANA time zone selected by our current
timezone setting.  (And, perhaps more to the point, to give it the
meaning it has or had in that zone.)  Not sure about concise wording
for that.

            regards, tom lane



Re: IANA timezone abbreviations versus timezone_abbreviations

От
"Jelte Fennema-Nio"
Дата:
On Sun Dec 29, 2024 at 11:49 PM CET, Jelte Fennema-Nio wrote:
> Maybe changing the default value of timezone_abbreviations is a better
> solution to the problem, or in addition to the proposed patch.

To clarify, I meant maybe changing the default of timezone_abbreviations
to be empty. It sounds like the timezone_abbreviations setting is a lot
less useful now than it was in the past.



Re: IANA timezone abbreviations versus timezone_abbreviations

От
"Jelte Fennema-Nio"
Дата:
On Sun Dec 29, 2024 at 11:56 PM CET, Tom Lane wrote:
> Hmm, I don't like your phrasing using "IANA time zone database",
> because that makes it sound like we'll take any abbreviation that's
> found anywhere in that whole data set.  What the proposal actually
> does is to recognize any abbreviation that is used, or has been
> used in the past, in the IANA time zone selected by our current
> timezone setting.  (And, perhaps more to the point, to give it the
> meaning it has or had in that zone.)  Not sure about concise wording
> for that.

Okay, yeah I definitely misunderstood what was happening here. So
scratch my previous attempt at clarifying.

The current situation seems utterly messed up though. One thing that
shocks me is that we're, by default and without warning, parsing IST as
Israel Standard Time instead of the timezone that 17% of the world's
population uses: Indian Standard Time. And even with this patch that
behaviour only changes if you set your timezone to Asia/India. Which
seems suboptimal, because even as a European myself, IST means Indian
Standard Time.

I definitely think this is a step in the right direction, but I'm not
sure that it goes far enough. How about in addition we change the
following:

1. Change the default of timezone_abbreviations to an empty list.
2. When parsing search for the abbreviation string in the IANA timezone
   database.
   a. If it's a unique match, use that timezone.
   b. If it's not unique, but it matches an abbreviation of the current
      timezone, use that timezone.
   c. If it's part of the timezone_abbreviations list, use that timezone.
   d. Else, throw an error.

I think that would result in a lot more sensible behaviour.

Another option would be to put "c" at the top of the list, which would
allow overriding the IANA timezone database with that file. As long as
we don't do that by default I think that would be fine.

And I guess a third option would be to remove conflicts from the Default
timezone_abbreviations list.

To be clear, for backwards compatibility we should probably keep the old
Default list in any of these cases so people can easily switch back in
case this change breaks their application.



Re: IANA timezone abbreviations versus timezone_abbreviations

От
Tom Lane
Дата:
"Jelte Fennema-Nio" <postgres@jeltef.nl> writes:
> The current situation seems utterly messed up though. One thing that
> shocks me is that we're, by default and without warning, parsing IST as
> Israel Standard Time instead of the timezone that 17% of the world's
> population uses: Indian Standard Time. And even with this patch that
> behaviour only changes if you set your timezone to Asia/India. Which
> seems suboptimal, because even as a European myself, IST means Indian
> Standard Time.

This argument would hold more water if we'd gotten more than
approximately zero field complaints about the current setup.
I think most folk who want that just use the documented solution
of setting timezone_abbreviations = 'India'.

Certainly there's an argument to be made that we should have gone
for a minimal rather than maximal default list of abbreviations.
But it's a couple of decades too late to be making that argument ---
at this point backwards compatibility is a huge consideration IMV.
(The current design dates from d8b5c95ca of 2006-07-25, just to
clarify how long this has stood.)

> 1. Change the default of timezone_abbreviations to an empty list.

The villagers would be on our doorstep with pitchforks if we did
that.  It's remarkable how many of these abbreviations have live
constituencies --- try searching the commit log for "abbreviation"
to find a lot of commits that added zone abbrevs based on user
complaints.  Now, most of those complaints were ten or more years
back, so maybe people no longer care as much ... but I doubt that's
the way to bet.

> 2. When parsing search for the abbreviation string in the IANA timezone
>    database. 

Searching the whole IANA database isn't very practical I fear.
We do something approximately that costly to identify the default
timezone setting --- and that is code that got moved into initdb
precisely because it was too slow to be tolerable as part of
postmaster start.  It might work to calculate the set of abbrevs
known in the IANA data once and cache it; but where, and how
would we know when to update the cache?

Another thing that complicates all this is that our
timezone_abbreviations data reflects a very old state of the IANA
database, from when they were of a mindset that every zone should have
alphabetic timezone abbrevs even if they had to make them up.  A few
of those choices achieved real-world currency, but most didn't, and
starting in about 2015 the IANA crew removed (most of?) the ones
they'd made up in favor of numerical UTC offsets.  We kept them all,
for fear of user complaints if we removed them.  So we're now in a
situation where timezone_abbreviations knows a lot of abbrevs that
are no longer to be found in IANA, and it's very unclear how many
people might be depending on those entries.  It's probably more than
none though.  (Conversely, IANA does still have some historical
abbrevs such as "LMT" that aren't in timezone_abbreviations.)

Anyway, I'm of the opinion that we should be after a minimal change
not a grand rewrite that removes all ambiguity.  Timezones are such
a political mess that we probably couldn't reach 100% clarity anyway.
We could definitely do major damage to the system's usability
though, if we don't tread carefully.

            regards, tom lane



Re: IANA timezone abbreviations versus timezone_abbreviations

От
Aleksander Alekseev
Дата:
Hi Tom,

> This mess was brought up in pgsql-bugs [1], but the solution
> I propose here is invasive enough that I think it needs
> discussion on -hackers.
>
> [...]

I tested and reviewed the patch. It fixes the originally reported bug
and looks good to me.

> The only other way I can envision to remove the round-trip hazard
> is to stop using alphabetic abbreviations at all in timestamp
> output, and use numeric GMT offsets regardless of datestyle.
> I doubt that would make many people happy.  It would certainly
> break a bunch of our own regression tests, and I expect it would
> break other people's applications too.

I think we need to introduce alternative data types e.g. datetime /
datetimetz pair if we want to do this.

-- 
Best regards,
Aleksander Alekseev



Re: IANA timezone abbreviations versus timezone_abbreviations

От
Tom Lane
Дата:
Aleksander Alekseev <aleksander@timescale.com> writes:
> I tested and reviewed the patch. It fixes the originally reported bug
> and looks good to me.

Thanks for looking at it!  I'm not sure we have full consensus on
this yet, but at least there seem to be a majority in favor.
So here's a v2 with some loose ends cleaned up.

In 0001, I edited the docs more heavily in hopes of addressing
Jelte's concern about the docs being confusing.  The code and
test changes are the same as before.

0002 is also the same as before.

0003 is new work that fixes the pg_timezone_abbrevs view so
that its output matches the new behavior.

            regards, tom lane

From 4ce9ad893861b1a9e49e3c8567aa4468c17465fc Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 9 Jan 2025 17:08:56 -0500
Subject: [PATCH v2 1/3] Seek zone abbreviations in the IANA data before
 timezone_abbreviations.

If a time zone abbreviation used in datetime input is defined in
the currently active timezone, use that definition in preference
to looking in the timezone_abbreviations list.  That allows us to
correctly handle abbreviations that have different meanings in
different timezones.  Also, it eliminates an inconsistency between
datetime input and datetime output: the non-ISO datestyles for
timestamptz have always printed abbreviations taken from the IANA
data not from timezone_abbreviations.  Before this fix, it was
possible to demonstrate cases where casting a timestamp to text
and back fails or changes the value significantly because of that
inconsistency.

While this change removes the ability to override the IANA data about
an abbreviation known in the current zone, it's not clear that there's
any real use-case for doing so.  But it is clear that this makes life
a lot easier for dealing with abbreviations that have conflicts.

There are a couple of loose ends still to deal with:

* As this patch stands, it causes a noticeable degradation of the
runtime of timestamptz_in (about 20% in a microbenchmark of just
that function).  This is from DecodeTimezoneAbbrev not caching
the results of its lookup in the new path.  I split out the
improvement of that part for a follow-up patch.

* The pg_timezone_abbrevs view shows only abbreviations from
the timezone_abbreviations list.  That should probably be
adjusted to account for abbreviations taken from the timezone.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 doc/src/sgml/config.sgml                  |  6 +-
 doc/src/sgml/datatype.sgml                |  4 +
 doc/src/sgml/datetime.sgml                | 42 ++++++++++-
 src/backend/utils/adt/datetime.c          | 89 +++++++++++++++++++++--
 src/include/pgtime.h                      |  5 ++
 src/test/regress/expected/horology.out    |  6 ++
 src/test/regress/expected/timestamptz.out | 59 +++++++++++++++
 src/test/regress/sql/horology.sql         |  1 +
 src/test/regress/sql/timestamptz.sql      | 17 +++++
 src/timezone/localtime.c                  | 76 +++++++++++++++++++
 10 files changed, 294 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f1ab614575..453d213966 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10022,8 +10022,10 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </term>
       <listitem>
        <para>
-        Sets the collection of time zone abbreviations that will be accepted
-        by the server for datetime input.  The default is <literal>'Default'</literal>,
+        Sets the collection of additional time zone abbreviations that
+        will be accepted by the server for datetime input (beyond any
+        abbreviations defined by the current <varname>TimeZone</varname>
+        setting).  The default is <literal>'Default'</literal>,
         which is a collection that works in most of the world; there are
         also <literal>'Australia'</literal> and <literal>'India'</literal>,
         and other collections can be defined for a particular installation.
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 3e6751d64c..1d9127e94e 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2534,6 +2534,10 @@ TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
      abbreviation if one is in common use in the current zone.  Otherwise
      it appears as a signed numeric offset in ISO 8601 basic format
      (<replaceable>hh</replaceable> or <replaceable>hhmm</replaceable>).
+     The alphabetic abbreviations shown in these styles are taken from the
+     IANA time zone database entry currently selected by the
+     <xref linkend="guc-timezone"/> run-time parameter; they are not
+     affected by the <xref linkend="guc-timezone-abbreviations"/> setting.
     </para>

     <para>
diff --git a/doc/src/sgml/datetime.sgml b/doc/src/sgml/datetime.sgml
index e7035c7806..3e24170acb 100644
--- a/doc/src/sgml/datetime.sgml
+++ b/doc/src/sgml/datetime.sgml
@@ -80,7 +80,7 @@
       <step>
        <para>
         See if the token matches any known time zone abbreviation.
-        These abbreviations are supplied by the configuration file
+        These abbreviations are determined by the configuration settings
         described in <xref linkend="datetime-config-files"/>.
        </para>
       </step>
@@ -424,9 +424,43 @@
    <para>
     Since timezone abbreviations are not well standardized,
     <productname>PostgreSQL</productname> provides a means to customize
-    the set of abbreviations accepted by the server.  The
-    <xref linkend="guc-timezone-abbreviations"/> run-time parameter
-    determines the active set of abbreviations.  While this parameter
+    the set of abbreviations accepted in datetime input.
+    There are two sources for these abbreviations:
+
+    <orderedlist>
+     <listitem>
+      <para>
+       The <xref linkend="guc-timezone"/> run-time parameter is usually
+       set to the name of an entry in the IANA time zone database.
+       If that zone has widely-used zone abbreviations, they will appear
+       in the IANA data, and <productname>PostgreSQL</productname> will
+       preferentially recognize those abbreviations with the meanings
+       given in the IANA data.
+       For example, if <varname>timezone</varname> is set
+       to <literal>America/New_York</literal> then <literal>EST</literal>
+       will be understood as UTC-5 and <literal>EDT</literal> will be
+       understood as UTC-4.  (These IANA abbreviations will also be used
+       in datetime output, if <xref linkend="guc-datestyle"/> is set to a
+       style that prefers non-numeric zone abbreviations.)
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If an abbreviation is not found in the current IANA time zone,
+       it is sought in the list specified by the
+       <xref linkend="guc-timezone-abbreviations"/> run-time parameter.
+       The <varname>timezone_abbreviations</varname> list is primarily
+       useful for allowing datetime input to recognize abbreviations for
+       time zones other than the current zone.  (These abbreviations will
+       not be used in datetime output.)
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    While the <varname>timezone_abbreviations</varname> parameter
     can be altered by any database user, the possible values for it
     are under the control of the database administrator — they
     are in fact names of configuration files stored in
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index d8af3591d1..cb028d3934 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1845,6 +1845,40 @@ DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t, const char *abbr, pg_tz *tzp,
 }


+/* TimeZoneAbbrevIsKnown()
+ *
+ * Detect whether the given string is a time zone abbreviation that's known
+ * in the specified TZDB timezone, and if so whether it's fixed or varying
+ * meaning.  The match is not case-sensitive.
+ */
+static bool
+TimeZoneAbbrevIsKnown(const char *abbr, pg_tz *tzp,
+                      bool *isfixed, int *offset, int *isdst)
+{
+    char        upabbr[TZ_STRLEN_MAX + 1];
+    unsigned char *p;
+    long int    gmtoff;
+
+    /* We need to force the abbrev to upper case */
+    strlcpy(upabbr, abbr, sizeof(upabbr));
+    for (p = (unsigned char *) upabbr; *p; p++)
+        *p = pg_toupper(*p);
+
+    /* Look up the abbrev's meaning in this zone */
+    if (pg_timezone_abbrev_is_known(upabbr,
+                                    isfixed,
+                                    &gmtoff,
+                                    isdst,
+                                    tzp))
+    {
+        /* Change sign to agree with DetermineTimeZoneOffset() */
+        *offset = (int) -gmtoff;
+        return true;
+    }
+    return false;
+}
+
+
 /* DecodeTimeOnly()
  * Interpret parsed string as time fields only.
  * Returns 0 if successful, DTERR code if bogus input detected.
@@ -3092,8 +3126,28 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
                      int *ftype, int *offset, pg_tz **tz,
                      DateTimeErrorExtra *extra)
 {
+    bool        isfixed;
+    int            isdst;
     const datetkn *tp;

+    /*
+     * See if the current session_timezone recognizes it.  Checking this
+     * before zoneabbrevtbl allows us to correctly handle abbreviations whose
+     * meaning varies across zones, such as "LMT".  (Caching this lookup is
+     * left for later.)
+     */
+    if (session_timezone &&
+        TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
+                              &isfixed, offset, &isdst))
+    {
+        *ftype = (isfixed ? (isdst ? DTZ : TZ) : DYNTZ);
+        *tz = (isfixed ? NULL : session_timezone);
+        /* flip sign to agree with the convention used in zoneabbrevtbl */
+        *offset = -(*offset);
+        return 0;
+    }
+
+    /* Nope, so look in zoneabbrevtbl */
     tp = abbrevcache[field];
     /* use strncmp so that we match truncated tokens */
     if (tp == NULL || strncmp(lowtoken, tp->token, TOKMAXLEN) != 0)
@@ -3109,6 +3163,7 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
         *ftype = UNKNOWN_FIELD;
         *offset = 0;
         *tz = NULL;
+        /* failure results are not cached */
     }
     else
     {
@@ -3278,9 +3333,6 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
     *offset = 0;                /* avoid uninitialized vars on failure */
     *tz = NULL;

-    if (!zoneabbrevtbl)
-        return -1;                /* no abbrevs known, so fail immediately */
-
     /* Downcase as much of the string as we could need */
     for (len = 0; len < TOKMAXLEN; len++)
     {
@@ -3299,9 +3351,34 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
      */
     while (len > 0)
     {
-        const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
-                                        zoneabbrevtbl->numabbrevs);
+        bool        isfixed;
+        int            isdst;
+        const datetkn *tp;
+
+        /* See if the current session_timezone recognizes it. */
+        if (session_timezone &&
+            TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
+                                  &isfixed, offset, &isdst))
+        {
+            if (isfixed)
+            {
+                /* flip sign to agree with the convention in zoneabbrevtbl */
+                *offset = -(*offset);
+            }
+            else
+            {
+                /* Caller must resolve the abbrev's current meaning */
+                *tz = session_timezone;
+            }
+            return len;
+        }

+        /* Known in zoneabbrevtbl? */
+        if (zoneabbrevtbl)
+            tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
+                             zoneabbrevtbl->numabbrevs);
+        else
+            tp = NULL;
         if (tp != NULL)
         {
             if (tp->type == DYNTZ)
@@ -3324,6 +3401,8 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
                 return len;
             }
         }
+
+        /* Nope, try the next shorter string. */
         lowtoken[--len] = '\0';
     }

diff --git a/src/include/pgtime.h b/src/include/pgtime.h
index 37171f1737..b8b898a69c 100644
--- a/src/include/pgtime.h
+++ b/src/include/pgtime.h
@@ -69,6 +69,11 @@ extern bool pg_interpret_timezone_abbrev(const char *abbrev,
                                          long int *gmtoff,
                                          int *isdst,
                                          const pg_tz *tz);
+extern bool pg_timezone_abbrev_is_known(const char *abbrev,
+                                        bool *isfixed,
+                                        long int *gmtoff,
+                                        int *isdst,
+                                        const pg_tz *tz);
 extern bool pg_get_timezone_offset(const pg_tz *tz, long int *gmtoff);
 extern const char *pg_get_timezone_name(pg_tz *tz);
 extern bool pg_tz_acceptable(pg_tz *tz);
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index cb28dfbaee..b90bfcd794 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -3332,6 +3332,12 @@ SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ');  -- dyntz
  Sat Dec 17 23:38:00 2011 PST
 (1 row)

+SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ');  -- dyntz
+         to_timestamp
+------------------------------
+ Sat Dec 17 23:52:58 2011 PST
+(1 row)
+
 SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
          to_timestamp
 ------------------------------
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index a6dd45626c..36349e363f 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -176,6 +176,65 @@ SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST
  Fri Jan 10 07:32:01 205000 PST
 (1 row)

+-- Recognize "LMT" as whatever it means in the current zone
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Sun Dec 31 23:52:58 2023 PST
+(1 row)
+
+SET timezone = 'Europe/London';
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:01:15 2024 GMT
+(1 row)
+
+-- which might be nothing
+SET timezone = 'UTC';
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;  -- fail
+ERROR:  invalid input syntax for type timestamp with time zone: "Jan 01 00:00:00 2024 LMT"
+LINE 1: SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+               ^
+-- Another example of an abbrev that varies across zones
+SELECT '1912-01-01 00:00 MMT'::timestamptz;  -- from timezone_abbreviations
+         timestamptz
+------------------------------
+ Sun Dec 31 17:30:00 1911 UTC
+(1 row)
+
+SET timezone = 'America/Montevideo';
+SELECT '1912-01-01 00:00'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:00:00 1912 MMT
+(1 row)
+
+SELECT '1912-01-01 00:00 MMT'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:00:00 1912 MMT
+(1 row)
+
+SELECT '1912-01-01 00:00 MMT'::timestamptz AT TIME ZONE 'UTC';
+         timezone
+--------------------------
+ Mon Jan 01 03:44:51 1912
+(1 row)
+
+RESET timezone;
 -- Test non-error-throwing API
 SELECT pg_input_is_valid('now', 'timestamptz');
  pg_input_is_valid
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 4aa88b4ba9..1310b43277 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -538,6 +538,7 @@ SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ');  -- dyntz
+SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ');  -- dyntz
 SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
 SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
 SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ');  -- error
diff --git a/src/test/regress/sql/timestamptz.sql b/src/test/regress/sql/timestamptz.sql
index a92586c363..2fa5378a57 100644
--- a/src/test/regress/sql/timestamptz.sql
+++ b/src/test/regress/sql/timestamptz.sql
@@ -109,6 +109,23 @@ SELECT '20500110 173201 Europe/Helsinki'::timestamptz; -- non-DST
 SELECT '205000-07-10 17:32:01 Europe/Helsinki'::timestamptz; -- DST
 SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST

+-- Recognize "LMT" as whatever it means in the current zone
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+SET timezone = 'Europe/London';
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+-- which might be nothing
+SET timezone = 'UTC';
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;  -- fail
+-- Another example of an abbrev that varies across zones
+SELECT '1912-01-01 00:00 MMT'::timestamptz;  -- from timezone_abbreviations
+SET timezone = 'America/Montevideo';
+SELECT '1912-01-01 00:00'::timestamptz;
+SELECT '1912-01-01 00:00 MMT'::timestamptz;
+SELECT '1912-01-01 00:00 MMT'::timestamptz AT TIME ZONE 'UTC';
+RESET timezone;
+
 -- Test non-error-throwing API
 SELECT pg_input_is_valid('now', 'timestamptz');
 SELECT pg_input_is_valid('garbage', 'timestamptz');
diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 0bc160ea7d..65511ae8be 100644
--- a/src/timezone/localtime.c
+++ b/src/timezone/localtime.c
@@ -1843,6 +1843,82 @@ pg_interpret_timezone_abbrev(const char *abbrev,
     return false;                /* hm, not actually used in any interval? */
 }

+/*
+ * Detect whether a timezone abbreviation is defined within the given zone.
+ *
+ * This is similar to pg_interpret_timezone_abbrev() but is not concerned
+ * with a specific point in time.  We want to know if the abbreviation is
+ * known at all, and if so whether it has one meaning or several.
+ *
+ * Returns true if the abbreviation is known, false if not.
+ * If the abbreviation is known and has a single meaning (only one value
+ * of gmtoff/isdst), sets *isfixed = true and sets *gmtoff and *isdst.
+ * If there are multiple meanings, sets *isfixed = false.
+ *
+ * Note: abbrev is matched case-sensitively; it should be all-upper-case.
+ */
+bool
+pg_timezone_abbrev_is_known(const char *abbrev,
+                            bool *isfixed,
+                            long int *gmtoff,
+                            int *isdst,
+                            const pg_tz *tz)
+{
+    bool        result = false;
+    const struct state *sp = &tz->state;
+    const char *abbrs;
+    int            abbrind;
+
+    /*
+     * Locate the abbreviation in the zone's abbreviation list.  We assume
+     * there are not duplicates in the list.
+     */
+    abbrs = sp->chars;
+    abbrind = 0;
+    while (abbrind < sp->charcnt)
+    {
+        if (strcmp(abbrev, abbrs + abbrind) == 0)
+            break;
+        while (abbrs[abbrind] != '\0')
+            abbrind++;
+        abbrind++;
+    }
+    if (abbrind >= sp->charcnt)
+        return false;            /* definitely not there */
+
+    /*
+     * Scan the ttinfo array to find uses of the abbreviation.
+     */
+    for (int i = 0; i < sp->typecnt; i++)
+    {
+        const struct ttinfo *ttisp = &sp->ttis[i];
+
+        if (ttisp->tt_desigidx == abbrind)
+        {
+            if (!result)
+            {
+                /* First usage */
+                *isfixed = true;    /* for the moment */
+                *gmtoff = ttisp->tt_utoff;
+                *isdst = ttisp->tt_isdst;
+                result = true;
+            }
+            else
+            {
+                /* Second or later usage, does it match? */
+                if (*gmtoff != ttisp->tt_utoff ||
+                    *isdst != ttisp->tt_isdst)
+                {
+                    *isfixed = false;
+                    break;        /* no point in looking further */
+                }
+            }
+        }
+    }
+
+    return result;
+}
+
 /*
  * If the given timezone uses only one GMT offset, store that offset
  * into *gmtoff and return true, else return false.
--
2.43.5

From 44a4d51a1295264b778a533a0a9615762803603d Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 9 Jan 2025 17:14:19 -0500
Subject: [PATCH v2 2/3] Improve DecodeTimezoneAbbrev's caching logic.

The previous implementation could only cope with caching a
result found in the zoneabbrevtbl.  It is worth expending
a little more space to be able to cache results obtained from
the IANA timezone data, especially since that's likely to be
the majority use-case going forward.

To do this, we have to reset the cache after a change in
the timezone GUC not only the timezone_abbrev GUC, but that's
not hard.

In my testing, this puts the speed of repeated timestamptz_in
calls back on par with what it was before the previous patch.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 src/backend/commands/variable.c  |  2 +
 src/backend/utils/adt/datetime.c | 69 ++++++++++++++++++++++++--------
 src/include/utils/datetime.h     |  2 +
 src/tools/pgindent/typedefs.list |  1 +
 4 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 44796bf15a..4ad6e236d6 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -381,6 +381,8 @@ void
 assign_timezone(const char *newval, void *extra)
 {
     session_timezone = *((pg_tz **) extra);
+    /* datetime.c's cache of timezone abbrevs may now be obsolete */
+    ClearTimeZoneAbbrevCache();
 }

 /*
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index cb028d3934..ef04112602 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -259,7 +259,17 @@ static const datetkn *datecache[MAXDATEFIELDS] = {NULL};

 static const datetkn *deltacache[MAXDATEFIELDS] = {NULL};

-static const datetkn *abbrevcache[MAXDATEFIELDS] = {NULL};
+/* Cache for results of timezone abbreviation lookups */
+
+typedef struct TzAbbrevCache
+{
+    char        abbrev[TOKMAXLEN + 1];    /* always NUL-terminated */
+    char        ftype;            /* TZ, DTZ, or DYNTZ */
+    int            offset;            /* GMT offset, if fixed-offset */
+    pg_tz       *tz;                /* relevant zone, if variable-offset */
+} TzAbbrevCache;
+
+static TzAbbrevCache tzabbrevcache[MAXDATEFIELDS];


 /*
@@ -3126,15 +3136,28 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
                      int *ftype, int *offset, pg_tz **tz,
                      DateTimeErrorExtra *extra)
 {
+    TzAbbrevCache *tzc = &tzabbrevcache[field];
     bool        isfixed;
     int            isdst;
     const datetkn *tp;

+    /*
+     * Do we have a cached result?  Use strncmp so that we match truncated
+     * names, although we shouldn't really see that happen with normal
+     * abbreviations.
+     */
+    if (strncmp(lowtoken, tzc->abbrev, TOKMAXLEN) == 0)
+    {
+        *ftype = tzc->ftype;
+        *offset = tzc->offset;
+        *tz = tzc->tz;
+        return 0;
+    }
+
     /*
      * See if the current session_timezone recognizes it.  Checking this
      * before zoneabbrevtbl allows us to correctly handle abbreviations whose
-     * meaning varies across zones, such as "LMT".  (Caching this lookup is
-     * left for later.)
+     * meaning varies across zones, such as "LMT".
      */
     if (session_timezone &&
         TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
@@ -3144,20 +3167,20 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
         *tz = (isfixed ? NULL : session_timezone);
         /* flip sign to agree with the convention used in zoneabbrevtbl */
         *offset = -(*offset);
+        /* cache result; use strlcpy to truncate name if necessary */
+        strlcpy(tzc->abbrev, lowtoken, TOKMAXLEN + 1);
+        tzc->ftype = *ftype;
+        tzc->offset = *offset;
+        tzc->tz = *tz;
         return 0;
     }

     /* Nope, so look in zoneabbrevtbl */
-    tp = abbrevcache[field];
-    /* use strncmp so that we match truncated tokens */
-    if (tp == NULL || strncmp(lowtoken, tp->token, TOKMAXLEN) != 0)
-    {
-        if (zoneabbrevtbl)
-            tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
-                             zoneabbrevtbl->numabbrevs);
-        else
-            tp = NULL;
-    }
+    if (zoneabbrevtbl)
+        tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
+                         zoneabbrevtbl->numabbrevs);
+    else
+        tp = NULL;
     if (tp == NULL)
     {
         *ftype = UNKNOWN_FIELD;
@@ -3167,7 +3190,6 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
     }
     else
     {
-        abbrevcache[field] = tp;
         *ftype = tp->type;
         if (tp->type == DYNTZ)
         {
@@ -3181,11 +3203,26 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
             *offset = tp->value;
             *tz = NULL;
         }
+
+        /* cache result; use strlcpy to truncate name if necessary */
+        strlcpy(tzc->abbrev, lowtoken, TOKMAXLEN + 1);
+        tzc->ftype = *ftype;
+        tzc->offset = *offset;
+        tzc->tz = *tz;
     }

     return 0;
 }

+/*
+ * Reset tzabbrevcache after a change in session_timezone.
+ */
+void
+ClearTimeZoneAbbrevCache(void)
+{
+    memset(tzabbrevcache, 0, sizeof(tzabbrevcache));
+}
+

 /* DecodeSpecial()
  * Decode text string using lookup table.
@@ -5036,8 +5073,8 @@ void
 InstallTimeZoneAbbrevs(TimeZoneAbbrevTable *tbl)
 {
     zoneabbrevtbl = tbl;
-    /* reset abbrevcache, which may contain pointers into old table */
-    memset(abbrevcache, 0, sizeof(abbrevcache));
+    /* reset tzabbrevcache, which may contain results from old table */
+    memset(tzabbrevcache, 0, sizeof(tzabbrevcache));
 }

 /*
diff --git a/src/include/utils/datetime.h b/src/include/utils/datetime.h
index 7fe12a4ea7..53a1c69eda 100644
--- a/src/include/utils/datetime.h
+++ b/src/include/utils/datetime.h
@@ -351,6 +351,8 @@ extern pg_tz *DecodeTimezoneNameToTz(const char *tzname);
 extern int    DecodeTimezoneAbbrevPrefix(const char *str,
                                        int *offset, pg_tz **tz);

+extern void ClearTimeZoneAbbrevCache(void);
+
 extern int    j2day(int date);

 extern struct Node *TemporalSimplify(int32 max_precis, struct Node *node);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f83ecf181..e11f8e781b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3030,6 +3030,7 @@ TypeCat
 TypeFuncClass
 TypeInfo
 TypeName
+TzAbbrevCache
 U32
 U8
 UChar
--
2.43.5

From 932935ecdbcffda7824d1decb8d8ff8a2ad8a28e Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 9 Jan 2025 18:51:45 -0500
Subject: [PATCH v2 3/3] Add zone-derived abbreviations to the
 pg_timezone_abbrevs view.

This ensures that pg_timezone_abbrevs will report abbreviations
that are recognized via the IANA data, and *not* report any
timezone_abbreviations entries that are thereby overridden.

Under the hood, there are now two SRFs, one that pulls the IANA
data and one that pulls timezone_abbreviations entries.  They're
combined by logic in the view.  This approach was useful for
debugging (since the functions can be called on their own).
While I don't propose to document the functions explicitly,
they might be useful to call directly.

XXX: don't forget catversion bump.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 doc/src/sgml/system-views.sgml         |  4 +-
 src/backend/catalog/system_views.sql   |  7 +-
 src/backend/utils/adt/datetime.c       | 92 +++++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat        | 12 +++-
 src/include/pgtime.h                   |  2 +
 src/test/regress/expected/rules.out    | 17 +++--
 src/test/regress/expected/sysviews.out |  8 +++
 src/test/regress/sql/sysviews.sql      |  3 +
 src/timezone/localtime.c               | 38 +++++++++++
 9 files changed, 172 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index a586156614..8e2b0a7927 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -4566,7 +4566,9 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
    The view <structname>pg_timezone_abbrevs</structname> provides a list
    of time zone abbreviations that are currently recognized by the datetime
    input routines.  The contents of this view change when the
-   <xref linkend="guc-timezone-abbreviations"/> run-time parameter is modified.
+   <xref linkend="guc-timezone"/> or
+   <xref linkend="guc-timezone-abbreviations"/> run-time parameters are
+   modified.
   </para>

   <table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7a595c84db..8e98bf847d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -634,7 +634,12 @@ REVOKE ALL ON pg_ident_file_mappings FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_ident_file_mappings() FROM PUBLIC;

 CREATE VIEW pg_timezone_abbrevs AS
-    SELECT * FROM pg_timezone_abbrevs();
+    SELECT * FROM pg_timezone_abbrevs_zone() z
+    UNION ALL
+    (SELECT * FROM pg_timezone_abbrevs_abbrevs() a
+     WHERE NOT EXISTS (SELECT 1 FROM pg_timezone_abbrevs_zone() z2
+                       WHERE z2.abbrev = a.abbrev))
+    ORDER BY abbrev;

 CREATE VIEW pg_timezone_names AS
     SELECT * FROM pg_timezone_names();
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index ef04112602..5d893cff50 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -5110,11 +5110,99 @@ FetchDynamicTimeZone(TimeZoneAbbrevTable *tbl, const datetkn *tp,


 /*
- * This set-returning function reads all the available time zone abbreviations
+ * This set-returning function reads all the time zone abbreviations
+ * defined by the IANA data for the current timezone setting,
  * and returns a set of (abbrev, utc_offset, is_dst).
  */
 Datum
-pg_timezone_abbrevs(PG_FUNCTION_ARGS)
+pg_timezone_abbrevs_zone(PG_FUNCTION_ARGS)
+{
+    FuncCallContext *funcctx;
+    int           *pindex;
+    Datum        result;
+    HeapTuple    tuple;
+    Datum        values[3];
+    bool        nulls[3] = {0};
+    TimestampTz now = GetCurrentTransactionStartTimestamp();
+    pg_time_t    t = timestamptz_to_time_t(now);
+    const char *abbrev;
+    long int    gmtoff;
+    int            isdst;
+    struct pg_itm_in itm_in;
+    Interval   *resInterval;
+
+    /* stuff done only on the first call of the function */
+    if (SRF_IS_FIRSTCALL())
+    {
+        TupleDesc    tupdesc;
+        MemoryContext oldcontext;
+
+        /* create a function context for cross-call persistence */
+        funcctx = SRF_FIRSTCALL_INIT();
+
+        /*
+         * switch to memory context appropriate for multiple function calls
+         */
+        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+        /* allocate memory for user context */
+        pindex = (int *) palloc(sizeof(int));
+        *pindex = 0;
+        funcctx->user_fctx = pindex;
+
+        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+            elog(ERROR, "return type must be a row type");
+        funcctx->tuple_desc = tupdesc;
+
+        MemoryContextSwitchTo(oldcontext);
+    }
+
+    /* stuff done on every call of the function */
+    funcctx = SRF_PERCALL_SETUP();
+    pindex = (int *) funcctx->user_fctx;
+
+    while ((abbrev = pg_get_next_timezone_abbrev(pindex,
+                                                 session_timezone)) != NULL)
+    {
+        /* Ignore abbreviations that aren't all-alphabetic */
+        if (strspn(abbrev, "ABCDEFGHIJKLMNOPQRSTUVWXYZ") != strlen(abbrev))
+            continue;
+
+        /* Determine the current meaning of the abbrev */
+        if (!pg_interpret_timezone_abbrev(abbrev,
+                                          &t,
+                                          &gmtoff,
+                                          &isdst,
+                                          session_timezone))
+            continue;            /* hm, not actually used in this zone? */
+
+        values[0] = CStringGetTextDatum(abbrev);
+
+        /* Convert offset (in seconds) to an interval; can't overflow */
+        MemSet(&itm_in, 0, sizeof(struct pg_itm_in));
+        itm_in.tm_usec = (int64) gmtoff * USECS_PER_SEC;
+        resInterval = (Interval *) palloc(sizeof(Interval));
+        (void) itmin2interval(&itm_in, resInterval);
+        values[1] = IntervalPGetDatum(resInterval);
+
+        values[2] = BoolGetDatum(isdst);
+
+        tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+        result = HeapTupleGetDatum(tuple);
+
+        SRF_RETURN_NEXT(funcctx, result);
+    }
+
+    SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * This set-returning function reads all the time zone abbreviations
+ * defined by the timezone_abbreviations setting,
+ * and returns a set of (abbrev, utc_offset, is_dst).
+ */
+Datum
+pg_timezone_abbrevs_abbrevs(PG_FUNCTION_ARGS)
 {
     FuncCallContext *funcctx;
     int           *pindex;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..eed78a8a1e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8392,12 +8392,18 @@
   proargmodes => '{o,o,o,o,o,o}',
   proargnames => '{name,statement,is_holdable,is_binary,is_scrollable,creation_time}',
   prosrc => 'pg_cursor' },
-{ oid => '2599', descr => 'get the available time zone abbreviations',
-  proname => 'pg_timezone_abbrevs', prorows => '1000', proretset => 't',
+{ oid => '9221', descr => 'get abbreviations from current timezone',
+  proname => 'pg_timezone_abbrevs_zone', prorows => '10', proretset => 't',
   provolatile => 's', prorettype => 'record', proargtypes => '',
   proallargtypes => '{text,interval,bool}', proargmodes => '{o,o,o}',
   proargnames => '{abbrev,utc_offset,is_dst}',
-  prosrc => 'pg_timezone_abbrevs' },
+  prosrc => 'pg_timezone_abbrevs_zone' },
+{ oid => '2599', descr => 'get abbreviations from timezone_abbreviations',
+  proname => 'pg_timezone_abbrevs_abbrevs', prorows => '1000', proretset => 't',
+  provolatile => 's', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,interval,bool}', proargmodes => '{o,o,o}',
+  proargnames => '{abbrev,utc_offset,is_dst}',
+  prosrc => 'pg_timezone_abbrevs_abbrevs' },
 { oid => '2856', descr => 'get the available time zone names',
   proname => 'pg_timezone_names', prorows => '1000', proretset => 't',
   provolatile => 's', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgtime.h b/src/include/pgtime.h
index b8b898a69c..5fc9f223de 100644
--- a/src/include/pgtime.h
+++ b/src/include/pgtime.h
@@ -74,6 +74,8 @@ extern bool pg_timezone_abbrev_is_known(const char *abbrev,
                                         long int *gmtoff,
                                         int *isdst,
                                         const pg_tz *tz);
+extern const char *pg_get_next_timezone_abbrev(int *indx,
+                                               const pg_tz *tz);
 extern bool pg_get_timezone_offset(const pg_tz *tz, long int *gmtoff);
 extern const char *pg_get_timezone_name(pg_tz *tz);
 extern bool pg_tz_acceptable(pg_tz *tz);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3014d047fe..91d316e3c4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2627,10 +2627,19 @@ pg_tables| SELECT n.nspname AS schemaname,
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
   WHERE (c.relkind = ANY (ARRAY['r'::"char", 'p'::"char"]));
-pg_timezone_abbrevs| SELECT abbrev,
-    utc_offset,
-    is_dst
-   FROM pg_timezone_abbrevs() pg_timezone_abbrevs(abbrev, utc_offset, is_dst);
+pg_timezone_abbrevs| SELECT z.abbrev,
+    z.utc_offset,
+    z.is_dst
+   FROM pg_timezone_abbrevs_zone() z(abbrev, utc_offset, is_dst)
+UNION ALL
+ SELECT a.abbrev,
+    a.utc_offset,
+    a.is_dst
+   FROM pg_timezone_abbrevs_abbrevs() a(abbrev, utc_offset, is_dst)
+  WHERE (NOT (EXISTS ( SELECT 1
+           FROM pg_timezone_abbrevs_zone() z2(abbrev, utc_offset, is_dst)
+          WHERE (z2.abbrev = a.abbrev))))
+  ORDER BY 1;
 pg_timezone_names| SELECT name,
     abbrev,
     utc_offset,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..352abc0bd4 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -223,3 +223,11 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
  t
 (1 row)

+-- One specific case we can check without much fear of breakage
+-- is the historical local-mean-time value used for America/Los_Angeles.
+select * from pg_timezone_abbrevs where abbrev = 'LMT';
+ abbrev |          utc_offset           | is_dst
+--------+-------------------------------+--------
+ LMT    | @ 7 hours 52 mins 58 secs ago | f
+(1 row)
+
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..66179f026b 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,6 @@ set timezone_abbreviations = 'Australia';
 select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
 set timezone_abbreviations = 'India';
 select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+-- One specific case we can check without much fear of breakage
+-- is the historical local-mean-time value used for America/Los_Angeles.
+select * from pg_timezone_abbrevs where abbrev = 'LMT';
diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 65511ae8be..9f76212b7b 100644
--- a/src/timezone/localtime.c
+++ b/src/timezone/localtime.c
@@ -1919,6 +1919,44 @@ pg_timezone_abbrev_is_known(const char *abbrev,
     return result;
 }

+/*
+ * Iteratively fetch all the abbreviations used in the given time zone.
+ *
+ * *indx is a state counter that the caller must initialize to zero
+ * before the first call, and not touch between calls.
+ *
+ * Returns the next known abbreviation, or NULL if there are no more.
+ *
+ * Note: the caller typically applies pg_interpret_timezone_abbrev()
+ * to each result.  While that nominally results in O(N^2) time spent
+ * searching the sp->chars[] array, we don't expect any zone to have
+ * enough abbreviations to make that meaningful.
+ */
+const char *
+pg_get_next_timezone_abbrev(int *indx,
+                            const pg_tz *tz)
+{
+    const char *result;
+    const struct state *sp = &tz->state;
+    const char *abbrs;
+    int            abbrind;
+
+    /* If we're still in range, the result is the current abbrev. */
+    abbrs = sp->chars;
+    abbrind = *indx;
+    if (abbrind < 0 || abbrind >= sp->charcnt)
+        return NULL;
+    result = abbrs + abbrind;
+
+    /* Advance *indx past this abbrev and its trailing null. */
+    while (abbrs[abbrind] != '\0')
+        abbrind++;
+    abbrind++;
+    *indx = abbrind;
+
+    return result;
+}
+
 /*
  * If the given timezone uses only one GMT offset, store that offset
  * into *gmtoff and return true, else return false.
--
2.43.5


Re: IANA timezone abbreviations versus timezone_abbreviations

От
Tom Lane
Дата:
I wrote:
> 0003 is new work that fixes the pg_timezone_abbrevs view so
> that its output matches the new behavior.

Hmph ... the cfbot doesn't like that one bit.  It appears that the
"LMT" abbrev is known in my machine's system-provided tzdata tree,
but not when building our own timezone files.  I wonder if this is
exposing an existing bug (perhaps our copy of zic is too far out of
date?).  Will look into it tomorrow.

            regards, tom lane



Re: IANA timezone abbreviations versus timezone_abbreviations

От
Tom Lane
Дата:
I wrote:
> Hmph ... the cfbot doesn't like that one bit.  It appears that the
> "LMT" abbrev is known in my machine's system-provided tzdata tree,
> but not when building our own timezone files.  I wonder if this is
> exposing an existing bug (perhaps our copy of zic is too far out of
> date?).  Will look into it tomorrow.

Found it: there is indeed pre-existing breakage in
pg_interpret_timezone_abbrev(), but it doesn't manifest when using
--with-system-tzdata on RHEL8, so I'd missed it in my local testing.

0001-0003 are the same as last time (so I kept the patch version
as v2), and 0004 adds a fix for the pre-existing problem.

            regards, tom lane

From 4ce9ad893861b1a9e49e3c8567aa4468c17465fc Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 9 Jan 2025 17:08:56 -0500
Subject: [PATCH v2 1/4] Seek zone abbreviations in the IANA data before
 timezone_abbreviations.

If a time zone abbreviation used in datetime input is defined in
the currently active timezone, use that definition in preference
to looking in the timezone_abbreviations list.  That allows us to
correctly handle abbreviations that have different meanings in
different timezones.  Also, it eliminates an inconsistency between
datetime input and datetime output: the non-ISO datestyles for
timestamptz have always printed abbreviations taken from the IANA
data not from timezone_abbreviations.  Before this fix, it was
possible to demonstrate cases where casting a timestamp to text
and back fails or changes the value significantly because of that
inconsistency.

While this change removes the ability to override the IANA data about
an abbreviation known in the current zone, it's not clear that there's
any real use-case for doing so.  But it is clear that this makes life
a lot easier for dealing with abbreviations that have conflicts.

There are a couple of loose ends still to deal with:

* As this patch stands, it causes a noticeable degradation of the
runtime of timestamptz_in (about 20% in a microbenchmark of just
that function).  This is from DecodeTimezoneAbbrev not caching
the results of its lookup in the new path.  I split out the
improvement of that part for a follow-up patch.

* The pg_timezone_abbrevs view shows only abbreviations from
the timezone_abbreviations list.  That should probably be
adjusted to account for abbreviations taken from the timezone.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 doc/src/sgml/config.sgml                  |  6 +-
 doc/src/sgml/datatype.sgml                |  4 +
 doc/src/sgml/datetime.sgml                | 42 ++++++++++-
 src/backend/utils/adt/datetime.c          | 89 +++++++++++++++++++++--
 src/include/pgtime.h                      |  5 ++
 src/test/regress/expected/horology.out    |  6 ++
 src/test/regress/expected/timestamptz.out | 59 +++++++++++++++
 src/test/regress/sql/horology.sql         |  1 +
 src/test/regress/sql/timestamptz.sql      | 17 +++++
 src/timezone/localtime.c                  | 76 +++++++++++++++++++
 10 files changed, 294 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f1ab614575..453d213966 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10022,8 +10022,10 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </term>
       <listitem>
        <para>
-        Sets the collection of time zone abbreviations that will be accepted
-        by the server for datetime input.  The default is <literal>'Default'</literal>,
+        Sets the collection of additional time zone abbreviations that
+        will be accepted by the server for datetime input (beyond any
+        abbreviations defined by the current <varname>TimeZone</varname>
+        setting).  The default is <literal>'Default'</literal>,
         which is a collection that works in most of the world; there are
         also <literal>'Australia'</literal> and <literal>'India'</literal>,
         and other collections can be defined for a particular installation.
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 3e6751d64c..1d9127e94e 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2534,6 +2534,10 @@ TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
      abbreviation if one is in common use in the current zone.  Otherwise
      it appears as a signed numeric offset in ISO 8601 basic format
      (<replaceable>hh</replaceable> or <replaceable>hhmm</replaceable>).
+     The alphabetic abbreviations shown in these styles are taken from the
+     IANA time zone database entry currently selected by the
+     <xref linkend="guc-timezone"/> run-time parameter; they are not
+     affected by the <xref linkend="guc-timezone-abbreviations"/> setting.
     </para>

     <para>
diff --git a/doc/src/sgml/datetime.sgml b/doc/src/sgml/datetime.sgml
index e7035c7806..3e24170acb 100644
--- a/doc/src/sgml/datetime.sgml
+++ b/doc/src/sgml/datetime.sgml
@@ -80,7 +80,7 @@
       <step>
        <para>
         See if the token matches any known time zone abbreviation.
-        These abbreviations are supplied by the configuration file
+        These abbreviations are determined by the configuration settings
         described in <xref linkend="datetime-config-files"/>.
        </para>
       </step>
@@ -424,9 +424,43 @@
    <para>
     Since timezone abbreviations are not well standardized,
     <productname>PostgreSQL</productname> provides a means to customize
-    the set of abbreviations accepted by the server.  The
-    <xref linkend="guc-timezone-abbreviations"/> run-time parameter
-    determines the active set of abbreviations.  While this parameter
+    the set of abbreviations accepted in datetime input.
+    There are two sources for these abbreviations:
+
+    <orderedlist>
+     <listitem>
+      <para>
+       The <xref linkend="guc-timezone"/> run-time parameter is usually
+       set to the name of an entry in the IANA time zone database.
+       If that zone has widely-used zone abbreviations, they will appear
+       in the IANA data, and <productname>PostgreSQL</productname> will
+       preferentially recognize those abbreviations with the meanings
+       given in the IANA data.
+       For example, if <varname>timezone</varname> is set
+       to <literal>America/New_York</literal> then <literal>EST</literal>
+       will be understood as UTC-5 and <literal>EDT</literal> will be
+       understood as UTC-4.  (These IANA abbreviations will also be used
+       in datetime output, if <xref linkend="guc-datestyle"/> is set to a
+       style that prefers non-numeric zone abbreviations.)
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If an abbreviation is not found in the current IANA time zone,
+       it is sought in the list specified by the
+       <xref linkend="guc-timezone-abbreviations"/> run-time parameter.
+       The <varname>timezone_abbreviations</varname> list is primarily
+       useful for allowing datetime input to recognize abbreviations for
+       time zones other than the current zone.  (These abbreviations will
+       not be used in datetime output.)
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    While the <varname>timezone_abbreviations</varname> parameter
     can be altered by any database user, the possible values for it
     are under the control of the database administrator — they
     are in fact names of configuration files stored in
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index d8af3591d1..cb028d3934 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1845,6 +1845,40 @@ DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t, const char *abbr, pg_tz *tzp,
 }


+/* TimeZoneAbbrevIsKnown()
+ *
+ * Detect whether the given string is a time zone abbreviation that's known
+ * in the specified TZDB timezone, and if so whether it's fixed or varying
+ * meaning.  The match is not case-sensitive.
+ */
+static bool
+TimeZoneAbbrevIsKnown(const char *abbr, pg_tz *tzp,
+                      bool *isfixed, int *offset, int *isdst)
+{
+    char        upabbr[TZ_STRLEN_MAX + 1];
+    unsigned char *p;
+    long int    gmtoff;
+
+    /* We need to force the abbrev to upper case */
+    strlcpy(upabbr, abbr, sizeof(upabbr));
+    for (p = (unsigned char *) upabbr; *p; p++)
+        *p = pg_toupper(*p);
+
+    /* Look up the abbrev's meaning in this zone */
+    if (pg_timezone_abbrev_is_known(upabbr,
+                                    isfixed,
+                                    &gmtoff,
+                                    isdst,
+                                    tzp))
+    {
+        /* Change sign to agree with DetermineTimeZoneOffset() */
+        *offset = (int) -gmtoff;
+        return true;
+    }
+    return false;
+}
+
+
 /* DecodeTimeOnly()
  * Interpret parsed string as time fields only.
  * Returns 0 if successful, DTERR code if bogus input detected.
@@ -3092,8 +3126,28 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
                      int *ftype, int *offset, pg_tz **tz,
                      DateTimeErrorExtra *extra)
 {
+    bool        isfixed;
+    int            isdst;
     const datetkn *tp;

+    /*
+     * See if the current session_timezone recognizes it.  Checking this
+     * before zoneabbrevtbl allows us to correctly handle abbreviations whose
+     * meaning varies across zones, such as "LMT".  (Caching this lookup is
+     * left for later.)
+     */
+    if (session_timezone &&
+        TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
+                              &isfixed, offset, &isdst))
+    {
+        *ftype = (isfixed ? (isdst ? DTZ : TZ) : DYNTZ);
+        *tz = (isfixed ? NULL : session_timezone);
+        /* flip sign to agree with the convention used in zoneabbrevtbl */
+        *offset = -(*offset);
+        return 0;
+    }
+
+    /* Nope, so look in zoneabbrevtbl */
     tp = abbrevcache[field];
     /* use strncmp so that we match truncated tokens */
     if (tp == NULL || strncmp(lowtoken, tp->token, TOKMAXLEN) != 0)
@@ -3109,6 +3163,7 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
         *ftype = UNKNOWN_FIELD;
         *offset = 0;
         *tz = NULL;
+        /* failure results are not cached */
     }
     else
     {
@@ -3278,9 +3333,6 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
     *offset = 0;                /* avoid uninitialized vars on failure */
     *tz = NULL;

-    if (!zoneabbrevtbl)
-        return -1;                /* no abbrevs known, so fail immediately */
-
     /* Downcase as much of the string as we could need */
     for (len = 0; len < TOKMAXLEN; len++)
     {
@@ -3299,9 +3351,34 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
      */
     while (len > 0)
     {
-        const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
-                                        zoneabbrevtbl->numabbrevs);
+        bool        isfixed;
+        int            isdst;
+        const datetkn *tp;
+
+        /* See if the current session_timezone recognizes it. */
+        if (session_timezone &&
+            TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
+                                  &isfixed, offset, &isdst))
+        {
+            if (isfixed)
+            {
+                /* flip sign to agree with the convention in zoneabbrevtbl */
+                *offset = -(*offset);
+            }
+            else
+            {
+                /* Caller must resolve the abbrev's current meaning */
+                *tz = session_timezone;
+            }
+            return len;
+        }

+        /* Known in zoneabbrevtbl? */
+        if (zoneabbrevtbl)
+            tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
+                             zoneabbrevtbl->numabbrevs);
+        else
+            tp = NULL;
         if (tp != NULL)
         {
             if (tp->type == DYNTZ)
@@ -3324,6 +3401,8 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
                 return len;
             }
         }
+
+        /* Nope, try the next shorter string. */
         lowtoken[--len] = '\0';
     }

diff --git a/src/include/pgtime.h b/src/include/pgtime.h
index 37171f1737..b8b898a69c 100644
--- a/src/include/pgtime.h
+++ b/src/include/pgtime.h
@@ -69,6 +69,11 @@ extern bool pg_interpret_timezone_abbrev(const char *abbrev,
                                          long int *gmtoff,
                                          int *isdst,
                                          const pg_tz *tz);
+extern bool pg_timezone_abbrev_is_known(const char *abbrev,
+                                        bool *isfixed,
+                                        long int *gmtoff,
+                                        int *isdst,
+                                        const pg_tz *tz);
 extern bool pg_get_timezone_offset(const pg_tz *tz, long int *gmtoff);
 extern const char *pg_get_timezone_name(pg_tz *tz);
 extern bool pg_tz_acceptable(pg_tz *tz);
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index cb28dfbaee..b90bfcd794 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -3332,6 +3332,12 @@ SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ');  -- dyntz
  Sat Dec 17 23:38:00 2011 PST
 (1 row)

+SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ');  -- dyntz
+         to_timestamp
+------------------------------
+ Sat Dec 17 23:52:58 2011 PST
+(1 row)
+
 SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
          to_timestamp
 ------------------------------
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index a6dd45626c..36349e363f 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -176,6 +176,65 @@ SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST
  Fri Jan 10 07:32:01 205000 PST
 (1 row)

+-- Recognize "LMT" as whatever it means in the current zone
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Sun Dec 31 23:52:58 2023 PST
+(1 row)
+
+SET timezone = 'Europe/London';
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:01:15 2024 GMT
+(1 row)
+
+-- which might be nothing
+SET timezone = 'UTC';
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;  -- fail
+ERROR:  invalid input syntax for type timestamp with time zone: "Jan 01 00:00:00 2024 LMT"
+LINE 1: SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+               ^
+-- Another example of an abbrev that varies across zones
+SELECT '1912-01-01 00:00 MMT'::timestamptz;  -- from timezone_abbreviations
+         timestamptz
+------------------------------
+ Sun Dec 31 17:30:00 1911 UTC
+(1 row)
+
+SET timezone = 'America/Montevideo';
+SELECT '1912-01-01 00:00'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:00:00 1912 MMT
+(1 row)
+
+SELECT '1912-01-01 00:00 MMT'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:00:00 1912 MMT
+(1 row)
+
+SELECT '1912-01-01 00:00 MMT'::timestamptz AT TIME ZONE 'UTC';
+         timezone
+--------------------------
+ Mon Jan 01 03:44:51 1912
+(1 row)
+
+RESET timezone;
 -- Test non-error-throwing API
 SELECT pg_input_is_valid('now', 'timestamptz');
  pg_input_is_valid
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 4aa88b4ba9..1310b43277 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -538,6 +538,7 @@ SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ');  -- dyntz
+SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ');  -- dyntz
 SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
 SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
 SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ');  -- error
diff --git a/src/test/regress/sql/timestamptz.sql b/src/test/regress/sql/timestamptz.sql
index a92586c363..2fa5378a57 100644
--- a/src/test/regress/sql/timestamptz.sql
+++ b/src/test/regress/sql/timestamptz.sql
@@ -109,6 +109,23 @@ SELECT '20500110 173201 Europe/Helsinki'::timestamptz; -- non-DST
 SELECT '205000-07-10 17:32:01 Europe/Helsinki'::timestamptz; -- DST
 SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST

+-- Recognize "LMT" as whatever it means in the current zone
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+SET timezone = 'Europe/London';
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+-- which might be nothing
+SET timezone = 'UTC';
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;  -- fail
+-- Another example of an abbrev that varies across zones
+SELECT '1912-01-01 00:00 MMT'::timestamptz;  -- from timezone_abbreviations
+SET timezone = 'America/Montevideo';
+SELECT '1912-01-01 00:00'::timestamptz;
+SELECT '1912-01-01 00:00 MMT'::timestamptz;
+SELECT '1912-01-01 00:00 MMT'::timestamptz AT TIME ZONE 'UTC';
+RESET timezone;
+
 -- Test non-error-throwing API
 SELECT pg_input_is_valid('now', 'timestamptz');
 SELECT pg_input_is_valid('garbage', 'timestamptz');
diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 0bc160ea7d..65511ae8be 100644
--- a/src/timezone/localtime.c
+++ b/src/timezone/localtime.c
@@ -1843,6 +1843,82 @@ pg_interpret_timezone_abbrev(const char *abbrev,
     return false;                /* hm, not actually used in any interval? */
 }

+/*
+ * Detect whether a timezone abbreviation is defined within the given zone.
+ *
+ * This is similar to pg_interpret_timezone_abbrev() but is not concerned
+ * with a specific point in time.  We want to know if the abbreviation is
+ * known at all, and if so whether it has one meaning or several.
+ *
+ * Returns true if the abbreviation is known, false if not.
+ * If the abbreviation is known and has a single meaning (only one value
+ * of gmtoff/isdst), sets *isfixed = true and sets *gmtoff and *isdst.
+ * If there are multiple meanings, sets *isfixed = false.
+ *
+ * Note: abbrev is matched case-sensitively; it should be all-upper-case.
+ */
+bool
+pg_timezone_abbrev_is_known(const char *abbrev,
+                            bool *isfixed,
+                            long int *gmtoff,
+                            int *isdst,
+                            const pg_tz *tz)
+{
+    bool        result = false;
+    const struct state *sp = &tz->state;
+    const char *abbrs;
+    int            abbrind;
+
+    /*
+     * Locate the abbreviation in the zone's abbreviation list.  We assume
+     * there are not duplicates in the list.
+     */
+    abbrs = sp->chars;
+    abbrind = 0;
+    while (abbrind < sp->charcnt)
+    {
+        if (strcmp(abbrev, abbrs + abbrind) == 0)
+            break;
+        while (abbrs[abbrind] != '\0')
+            abbrind++;
+        abbrind++;
+    }
+    if (abbrind >= sp->charcnt)
+        return false;            /* definitely not there */
+
+    /*
+     * Scan the ttinfo array to find uses of the abbreviation.
+     */
+    for (int i = 0; i < sp->typecnt; i++)
+    {
+        const struct ttinfo *ttisp = &sp->ttis[i];
+
+        if (ttisp->tt_desigidx == abbrind)
+        {
+            if (!result)
+            {
+                /* First usage */
+                *isfixed = true;    /* for the moment */
+                *gmtoff = ttisp->tt_utoff;
+                *isdst = ttisp->tt_isdst;
+                result = true;
+            }
+            else
+            {
+                /* Second or later usage, does it match? */
+                if (*gmtoff != ttisp->tt_utoff ||
+                    *isdst != ttisp->tt_isdst)
+                {
+                    *isfixed = false;
+                    break;        /* no point in looking further */
+                }
+            }
+        }
+    }
+
+    return result;
+}
+
 /*
  * If the given timezone uses only one GMT offset, store that offset
  * into *gmtoff and return true, else return false.
--
2.43.5

From 44a4d51a1295264b778a533a0a9615762803603d Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 9 Jan 2025 17:14:19 -0500
Subject: [PATCH v2 2/4] Improve DecodeTimezoneAbbrev's caching logic.

The previous implementation could only cope with caching a
result found in the zoneabbrevtbl.  It is worth expending
a little more space to be able to cache results obtained from
the IANA timezone data, especially since that's likely to be
the majority use-case going forward.

To do this, we have to reset the cache after a change in
the timezone GUC not only the timezone_abbrev GUC, but that's
not hard.

In my testing, this puts the speed of repeated timestamptz_in
calls back on par with what it was before the previous patch.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 src/backend/commands/variable.c  |  2 +
 src/backend/utils/adt/datetime.c | 69 ++++++++++++++++++++++++--------
 src/include/utils/datetime.h     |  2 +
 src/tools/pgindent/typedefs.list |  1 +
 4 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 44796bf15a..4ad6e236d6 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -381,6 +381,8 @@ void
 assign_timezone(const char *newval, void *extra)
 {
     session_timezone = *((pg_tz **) extra);
+    /* datetime.c's cache of timezone abbrevs may now be obsolete */
+    ClearTimeZoneAbbrevCache();
 }

 /*
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index cb028d3934..ef04112602 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -259,7 +259,17 @@ static const datetkn *datecache[MAXDATEFIELDS] = {NULL};

 static const datetkn *deltacache[MAXDATEFIELDS] = {NULL};

-static const datetkn *abbrevcache[MAXDATEFIELDS] = {NULL};
+/* Cache for results of timezone abbreviation lookups */
+
+typedef struct TzAbbrevCache
+{
+    char        abbrev[TOKMAXLEN + 1];    /* always NUL-terminated */
+    char        ftype;            /* TZ, DTZ, or DYNTZ */
+    int            offset;            /* GMT offset, if fixed-offset */
+    pg_tz       *tz;                /* relevant zone, if variable-offset */
+} TzAbbrevCache;
+
+static TzAbbrevCache tzabbrevcache[MAXDATEFIELDS];


 /*
@@ -3126,15 +3136,28 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
                      int *ftype, int *offset, pg_tz **tz,
                      DateTimeErrorExtra *extra)
 {
+    TzAbbrevCache *tzc = &tzabbrevcache[field];
     bool        isfixed;
     int            isdst;
     const datetkn *tp;

+    /*
+     * Do we have a cached result?  Use strncmp so that we match truncated
+     * names, although we shouldn't really see that happen with normal
+     * abbreviations.
+     */
+    if (strncmp(lowtoken, tzc->abbrev, TOKMAXLEN) == 0)
+    {
+        *ftype = tzc->ftype;
+        *offset = tzc->offset;
+        *tz = tzc->tz;
+        return 0;
+    }
+
     /*
      * See if the current session_timezone recognizes it.  Checking this
      * before zoneabbrevtbl allows us to correctly handle abbreviations whose
-     * meaning varies across zones, such as "LMT".  (Caching this lookup is
-     * left for later.)
+     * meaning varies across zones, such as "LMT".
      */
     if (session_timezone &&
         TimeZoneAbbrevIsKnown(lowtoken, session_timezone,
@@ -3144,20 +3167,20 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
         *tz = (isfixed ? NULL : session_timezone);
         /* flip sign to agree with the convention used in zoneabbrevtbl */
         *offset = -(*offset);
+        /* cache result; use strlcpy to truncate name if necessary */
+        strlcpy(tzc->abbrev, lowtoken, TOKMAXLEN + 1);
+        tzc->ftype = *ftype;
+        tzc->offset = *offset;
+        tzc->tz = *tz;
         return 0;
     }

     /* Nope, so look in zoneabbrevtbl */
-    tp = abbrevcache[field];
-    /* use strncmp so that we match truncated tokens */
-    if (tp == NULL || strncmp(lowtoken, tp->token, TOKMAXLEN) != 0)
-    {
-        if (zoneabbrevtbl)
-            tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
-                             zoneabbrevtbl->numabbrevs);
-        else
-            tp = NULL;
-    }
+    if (zoneabbrevtbl)
+        tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
+                         zoneabbrevtbl->numabbrevs);
+    else
+        tp = NULL;
     if (tp == NULL)
     {
         *ftype = UNKNOWN_FIELD;
@@ -3167,7 +3190,6 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
     }
     else
     {
-        abbrevcache[field] = tp;
         *ftype = tp->type;
         if (tp->type == DYNTZ)
         {
@@ -3181,11 +3203,26 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
             *offset = tp->value;
             *tz = NULL;
         }
+
+        /* cache result; use strlcpy to truncate name if necessary */
+        strlcpy(tzc->abbrev, lowtoken, TOKMAXLEN + 1);
+        tzc->ftype = *ftype;
+        tzc->offset = *offset;
+        tzc->tz = *tz;
     }

     return 0;
 }

+/*
+ * Reset tzabbrevcache after a change in session_timezone.
+ */
+void
+ClearTimeZoneAbbrevCache(void)
+{
+    memset(tzabbrevcache, 0, sizeof(tzabbrevcache));
+}
+

 /* DecodeSpecial()
  * Decode text string using lookup table.
@@ -5036,8 +5073,8 @@ void
 InstallTimeZoneAbbrevs(TimeZoneAbbrevTable *tbl)
 {
     zoneabbrevtbl = tbl;
-    /* reset abbrevcache, which may contain pointers into old table */
-    memset(abbrevcache, 0, sizeof(abbrevcache));
+    /* reset tzabbrevcache, which may contain results from old table */
+    memset(tzabbrevcache, 0, sizeof(tzabbrevcache));
 }

 /*
diff --git a/src/include/utils/datetime.h b/src/include/utils/datetime.h
index 7fe12a4ea7..53a1c69eda 100644
--- a/src/include/utils/datetime.h
+++ b/src/include/utils/datetime.h
@@ -351,6 +351,8 @@ extern pg_tz *DecodeTimezoneNameToTz(const char *tzname);
 extern int    DecodeTimezoneAbbrevPrefix(const char *str,
                                        int *offset, pg_tz **tz);

+extern void ClearTimeZoneAbbrevCache(void);
+
 extern int    j2day(int date);

 extern struct Node *TemporalSimplify(int32 max_precis, struct Node *node);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f83ecf181..e11f8e781b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3030,6 +3030,7 @@ TypeCat
 TypeFuncClass
 TypeInfo
 TypeName
+TzAbbrevCache
 U32
 U8
 UChar
--
2.43.5

From 932935ecdbcffda7824d1decb8d8ff8a2ad8a28e Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 9 Jan 2025 18:51:45 -0500
Subject: [PATCH v2 3/4] Add zone-derived abbreviations to the
 pg_timezone_abbrevs view.

This ensures that pg_timezone_abbrevs will report abbreviations
that are recognized via the IANA data, and *not* report any
timezone_abbreviations entries that are thereby overridden.

Under the hood, there are now two SRFs, one that pulls the IANA
data and one that pulls timezone_abbreviations entries.  They're
combined by logic in the view.  This approach was useful for
debugging (since the functions can be called on their own).
While I don't propose to document the functions explicitly,
they might be useful to call directly.

XXX: don't forget catversion bump.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 doc/src/sgml/system-views.sgml         |  4 +-
 src/backend/catalog/system_views.sql   |  7 +-
 src/backend/utils/adt/datetime.c       | 92 +++++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat        | 12 +++-
 src/include/pgtime.h                   |  2 +
 src/test/regress/expected/rules.out    | 17 +++--
 src/test/regress/expected/sysviews.out |  8 +++
 src/test/regress/sql/sysviews.sql      |  3 +
 src/timezone/localtime.c               | 38 +++++++++++
 9 files changed, 172 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index a586156614..8e2b0a7927 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -4566,7 +4566,9 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
    The view <structname>pg_timezone_abbrevs</structname> provides a list
    of time zone abbreviations that are currently recognized by the datetime
    input routines.  The contents of this view change when the
-   <xref linkend="guc-timezone-abbreviations"/> run-time parameter is modified.
+   <xref linkend="guc-timezone"/> or
+   <xref linkend="guc-timezone-abbreviations"/> run-time parameters are
+   modified.
   </para>

   <table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7a595c84db..8e98bf847d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -634,7 +634,12 @@ REVOKE ALL ON pg_ident_file_mappings FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_ident_file_mappings() FROM PUBLIC;

 CREATE VIEW pg_timezone_abbrevs AS
-    SELECT * FROM pg_timezone_abbrevs();
+    SELECT * FROM pg_timezone_abbrevs_zone() z
+    UNION ALL
+    (SELECT * FROM pg_timezone_abbrevs_abbrevs() a
+     WHERE NOT EXISTS (SELECT 1 FROM pg_timezone_abbrevs_zone() z2
+                       WHERE z2.abbrev = a.abbrev))
+    ORDER BY abbrev;

 CREATE VIEW pg_timezone_names AS
     SELECT * FROM pg_timezone_names();
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index ef04112602..5d893cff50 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -5110,11 +5110,99 @@ FetchDynamicTimeZone(TimeZoneAbbrevTable *tbl, const datetkn *tp,


 /*
- * This set-returning function reads all the available time zone abbreviations
+ * This set-returning function reads all the time zone abbreviations
+ * defined by the IANA data for the current timezone setting,
  * and returns a set of (abbrev, utc_offset, is_dst).
  */
 Datum
-pg_timezone_abbrevs(PG_FUNCTION_ARGS)
+pg_timezone_abbrevs_zone(PG_FUNCTION_ARGS)
+{
+    FuncCallContext *funcctx;
+    int           *pindex;
+    Datum        result;
+    HeapTuple    tuple;
+    Datum        values[3];
+    bool        nulls[3] = {0};
+    TimestampTz now = GetCurrentTransactionStartTimestamp();
+    pg_time_t    t = timestamptz_to_time_t(now);
+    const char *abbrev;
+    long int    gmtoff;
+    int            isdst;
+    struct pg_itm_in itm_in;
+    Interval   *resInterval;
+
+    /* stuff done only on the first call of the function */
+    if (SRF_IS_FIRSTCALL())
+    {
+        TupleDesc    tupdesc;
+        MemoryContext oldcontext;
+
+        /* create a function context for cross-call persistence */
+        funcctx = SRF_FIRSTCALL_INIT();
+
+        /*
+         * switch to memory context appropriate for multiple function calls
+         */
+        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+        /* allocate memory for user context */
+        pindex = (int *) palloc(sizeof(int));
+        *pindex = 0;
+        funcctx->user_fctx = pindex;
+
+        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+            elog(ERROR, "return type must be a row type");
+        funcctx->tuple_desc = tupdesc;
+
+        MemoryContextSwitchTo(oldcontext);
+    }
+
+    /* stuff done on every call of the function */
+    funcctx = SRF_PERCALL_SETUP();
+    pindex = (int *) funcctx->user_fctx;
+
+    while ((abbrev = pg_get_next_timezone_abbrev(pindex,
+                                                 session_timezone)) != NULL)
+    {
+        /* Ignore abbreviations that aren't all-alphabetic */
+        if (strspn(abbrev, "ABCDEFGHIJKLMNOPQRSTUVWXYZ") != strlen(abbrev))
+            continue;
+
+        /* Determine the current meaning of the abbrev */
+        if (!pg_interpret_timezone_abbrev(abbrev,
+                                          &t,
+                                          &gmtoff,
+                                          &isdst,
+                                          session_timezone))
+            continue;            /* hm, not actually used in this zone? */
+
+        values[0] = CStringGetTextDatum(abbrev);
+
+        /* Convert offset (in seconds) to an interval; can't overflow */
+        MemSet(&itm_in, 0, sizeof(struct pg_itm_in));
+        itm_in.tm_usec = (int64) gmtoff * USECS_PER_SEC;
+        resInterval = (Interval *) palloc(sizeof(Interval));
+        (void) itmin2interval(&itm_in, resInterval);
+        values[1] = IntervalPGetDatum(resInterval);
+
+        values[2] = BoolGetDatum(isdst);
+
+        tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+        result = HeapTupleGetDatum(tuple);
+
+        SRF_RETURN_NEXT(funcctx, result);
+    }
+
+    SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * This set-returning function reads all the time zone abbreviations
+ * defined by the timezone_abbreviations setting,
+ * and returns a set of (abbrev, utc_offset, is_dst).
+ */
+Datum
+pg_timezone_abbrevs_abbrevs(PG_FUNCTION_ARGS)
 {
     FuncCallContext *funcctx;
     int           *pindex;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..eed78a8a1e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8392,12 +8392,18 @@
   proargmodes => '{o,o,o,o,o,o}',
   proargnames => '{name,statement,is_holdable,is_binary,is_scrollable,creation_time}',
   prosrc => 'pg_cursor' },
-{ oid => '2599', descr => 'get the available time zone abbreviations',
-  proname => 'pg_timezone_abbrevs', prorows => '1000', proretset => 't',
+{ oid => '9221', descr => 'get abbreviations from current timezone',
+  proname => 'pg_timezone_abbrevs_zone', prorows => '10', proretset => 't',
   provolatile => 's', prorettype => 'record', proargtypes => '',
   proallargtypes => '{text,interval,bool}', proargmodes => '{o,o,o}',
   proargnames => '{abbrev,utc_offset,is_dst}',
-  prosrc => 'pg_timezone_abbrevs' },
+  prosrc => 'pg_timezone_abbrevs_zone' },
+{ oid => '2599', descr => 'get abbreviations from timezone_abbreviations',
+  proname => 'pg_timezone_abbrevs_abbrevs', prorows => '1000', proretset => 't',
+  provolatile => 's', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,interval,bool}', proargmodes => '{o,o,o}',
+  proargnames => '{abbrev,utc_offset,is_dst}',
+  prosrc => 'pg_timezone_abbrevs_abbrevs' },
 { oid => '2856', descr => 'get the available time zone names',
   proname => 'pg_timezone_names', prorows => '1000', proretset => 't',
   provolatile => 's', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgtime.h b/src/include/pgtime.h
index b8b898a69c..5fc9f223de 100644
--- a/src/include/pgtime.h
+++ b/src/include/pgtime.h
@@ -74,6 +74,8 @@ extern bool pg_timezone_abbrev_is_known(const char *abbrev,
                                         long int *gmtoff,
                                         int *isdst,
                                         const pg_tz *tz);
+extern const char *pg_get_next_timezone_abbrev(int *indx,
+                                               const pg_tz *tz);
 extern bool pg_get_timezone_offset(const pg_tz *tz, long int *gmtoff);
 extern const char *pg_get_timezone_name(pg_tz *tz);
 extern bool pg_tz_acceptable(pg_tz *tz);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3014d047fe..91d316e3c4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2627,10 +2627,19 @@ pg_tables| SELECT n.nspname AS schemaname,
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
   WHERE (c.relkind = ANY (ARRAY['r'::"char", 'p'::"char"]));
-pg_timezone_abbrevs| SELECT abbrev,
-    utc_offset,
-    is_dst
-   FROM pg_timezone_abbrevs() pg_timezone_abbrevs(abbrev, utc_offset, is_dst);
+pg_timezone_abbrevs| SELECT z.abbrev,
+    z.utc_offset,
+    z.is_dst
+   FROM pg_timezone_abbrevs_zone() z(abbrev, utc_offset, is_dst)
+UNION ALL
+ SELECT a.abbrev,
+    a.utc_offset,
+    a.is_dst
+   FROM pg_timezone_abbrevs_abbrevs() a(abbrev, utc_offset, is_dst)
+  WHERE (NOT (EXISTS ( SELECT 1
+           FROM pg_timezone_abbrevs_zone() z2(abbrev, utc_offset, is_dst)
+          WHERE (z2.abbrev = a.abbrev))))
+  ORDER BY 1;
 pg_timezone_names| SELECT name,
     abbrev,
     utc_offset,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..352abc0bd4 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -223,3 +223,11 @@ select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
  t
 (1 row)

+-- One specific case we can check without much fear of breakage
+-- is the historical local-mean-time value used for America/Los_Angeles.
+select * from pg_timezone_abbrevs where abbrev = 'LMT';
+ abbrev |          utc_offset           | is_dst
+--------+-------------------------------+--------
+ LMT    | @ 7 hours 52 mins 58 secs ago | f
+(1 row)
+
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b2a7923754..66179f026b 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -98,3 +98,6 @@ set timezone_abbreviations = 'Australia';
 select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
 set timezone_abbreviations = 'India';
 select count(distinct utc_offset) >= 24 as ok from pg_timezone_abbrevs;
+-- One specific case we can check without much fear of breakage
+-- is the historical local-mean-time value used for America/Los_Angeles.
+select * from pg_timezone_abbrevs where abbrev = 'LMT';
diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 65511ae8be..9f76212b7b 100644
--- a/src/timezone/localtime.c
+++ b/src/timezone/localtime.c
@@ -1919,6 +1919,44 @@ pg_timezone_abbrev_is_known(const char *abbrev,
     return result;
 }

+/*
+ * Iteratively fetch all the abbreviations used in the given time zone.
+ *
+ * *indx is a state counter that the caller must initialize to zero
+ * before the first call, and not touch between calls.
+ *
+ * Returns the next known abbreviation, or NULL if there are no more.
+ *
+ * Note: the caller typically applies pg_interpret_timezone_abbrev()
+ * to each result.  While that nominally results in O(N^2) time spent
+ * searching the sp->chars[] array, we don't expect any zone to have
+ * enough abbreviations to make that meaningful.
+ */
+const char *
+pg_get_next_timezone_abbrev(int *indx,
+                            const pg_tz *tz)
+{
+    const char *result;
+    const struct state *sp = &tz->state;
+    const char *abbrs;
+    int            abbrind;
+
+    /* If we're still in range, the result is the current abbrev. */
+    abbrs = sp->chars;
+    abbrind = *indx;
+    if (abbrind < 0 || abbrind >= sp->charcnt)
+        return NULL;
+    result = abbrs + abbrind;
+
+    /* Advance *indx past this abbrev and its trailing null. */
+    while (abbrs[abbrind] != '\0')
+        abbrind++;
+    abbrind++;
+    *indx = abbrind;
+
+    return result;
+}
+
 /*
  * If the given timezone uses only one GMT offset, store that offset
  * into *gmtoff and return true, else return false.
--
2.43.5

From e24d28fd2f3da4779d8ac40cd1faf3d09d30d8ba Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 10 Jan 2025 15:13:30 -0500
Subject: [PATCH v2 4/4] Make pg_interpret_timezone_abbrev() check
 sp->defaulttype too.

This omission caused it to miss the furthest-back zone abbreviation
when working with timezone data compiled with relatively recent zic
(2018e or newer, I think).  Older versions of zic produced a phony
DST transition at the Big Bang, so that the oldest abbreviation
could always be found in the sp->types[] array; but newer versions
don't do that, so that we must examine defaulttype as well as
the types[] array to be sure of seeing all the abbreviations.

While this has been broken for six or so years, we'd managed not
to notice for two reasons: (1) many platforms are still using
ancient zic for compatibility reasons, so that the issue did not
manifest in builds using --with-system-tzdata; (2) the oldest
zone abbreviation is almost always "LMT", which we weren't
supporting anyway until the current patch series.

While here, update pg_next_dst_boundary() to use sp->defaulttype
as the time type for non-DST zones and times before the oldest
DST transition.  The existing code here predates upstream's
invention of the sp->defaulttype field, and its heuristic for
finding the oldest time type has now been subsumed into the
code that fills in sp->defaulttype.

Possibly this should be back-patched, but I'm unsure whether
there are any visible consequences in released branches.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
---
 src/timezone/localtime.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 9f76212b7b..8eb02ef146 100644
--- a/src/timezone/localtime.c
+++ b/src/timezone/localtime.c
@@ -1624,15 +1624,8 @@ pg_next_dst_boundary(const pg_time_t *timep,
     sp = &tz->state;
     if (sp->timecnt == 0)
     {
-        /* non-DST zone, use lowest-numbered standard type */
-        i = 0;
-        while (sp->ttis[i].tt_isdst)
-            if (++i >= sp->typecnt)
-            {
-                i = 0;
-                break;
-            }
-        ttisp = &sp->ttis[i];
+        /* non-DST zone, use the defaulttype */
+        ttisp = &sp->ttis[sp->defaulttype];
         *before_gmtoff = ttisp->tt_utoff;
         *before_isdst = ttisp->tt_isdst;
         return 0;
@@ -1692,15 +1685,8 @@ pg_next_dst_boundary(const pg_time_t *timep,
     }
     if (t < sp->ats[0])
     {
-        /* For "before", use lowest-numbered standard type */
-        i = 0;
-        while (sp->ttis[i].tt_isdst)
-            if (++i >= sp->typecnt)
-            {
-                i = 0;
-                break;
-            }
-        ttisp = &sp->ttis[i];
+        /* For "before", use the defaulttype */
+        ttisp = &sp->ttis[sp->defaulttype];
         *before_gmtoff = ttisp->tt_utoff;
         *before_isdst = ttisp->tt_isdst;
         *boundary = sp->ats[0];
@@ -1793,7 +1779,9 @@ pg_interpret_timezone_abbrev(const char *abbrev,
      * abbreviation should get us what we want, since extrapolation would just
      * be repeating the newest or oldest meanings.
      *
-     * Use binary search to locate the first transition > cutoff time.
+     * Use binary search to locate the first transition > cutoff time.  (Note
+     * that sp->timecnt could be zero, in which case this loop does nothing
+     * and only the defaulttype entry will be checked.)
      */
     {
         int            lo = 0;
@@ -1827,7 +1815,19 @@ pg_interpret_timezone_abbrev(const char *abbrev,
     }

     /*
-     * Not there, so scan forwards to find the first one after.
+     * Not found yet; check the defaulttype, which is notionally the era
+     * before any of the entries in sp->types[].
+     */
+    ttisp = &sp->ttis[sp->defaulttype];
+    if (ttisp->tt_desigidx == abbrind)
+    {
+        *gmtoff = ttisp->tt_utoff;
+        *isdst = ttisp->tt_isdst;
+        return true;
+    }
+
+    /*
+     * Not there, so scan forwards to find the first one after the cutoff.
      */
     for (i = cutoff; i < sp->timecnt; i++)
     {
--
2.43.5