Обсуждение: Locale support
I am Yonathan Misgan from Ethiopia, want to add some functionalities on PostgreSQL to support Ethiopian locales. I want your advice where I start to hack the PostgresSQL source code. I have attached some synopsis about the existing problems of PostgresSQL related with Ethiopian locale specially related with calendar, date and time format. Please don't mind about date and time written with Amharic because I used only to show the problems.
Calendar: A calendar is a system of organizing days for social, religious, commercial or administrative purposes. This is done by giving names to periods of time, typically days, weeks, months and years. A date is the designation of a single, specific day within such a system. The Gregorian calendar is the most widely used calendar in the world today specially for database and other computer system. It is the calendar used in the international standard for representation of dates and times: ISO 8601:2004. It is a solar calendar based on a 365 days common year divideinto 12 months of irregular lengths 11 of the months have either 30 or 31 days, while the second month, February, has only 28 days during the common year. However, nearly every four years is a leap year, when one extra or intercalary day is added on February. Making the leap year in the Gregorian calendar 366 days long. The days of the year in the calendar are divided into 7 days weeks. The international standard is to start the week on Monday. However, several countries, including the Ethiopia, US and Canada, count Sunday as the first day of the week .
The Ethiopian calendar is the principal calendar used in Ethiopia and serves as the liturgical year for Christians in Eritrea and Ethiopia belonging to the Eritrean Orthodox Tewahedo Church, Ethiopian Orthodox Tewahedo Church, Eastern Catholic Churches and Coptic Orthodox Church of Alexandria. The Ethiopian calendar has difference with the Gregorian calendar in terms of day, month and year. Like the Coptic calendar, the Ethiopic calendar has 12 months of 30 days plus 5 or 6 epagomenal days, which comprise a thirteenth month. Maybe the clearest example of the different ways cultures view the world around them is demonstrated by our perceptions of the nontangible entity of time. The seven days week is nearly universal but we disagree on what we consider the first day of the week to be. This is the case even under the same calendar system.
Date and Time: Ethiopia shares the 24 hour day convention with the rest of the world but differs on when the day begins. The two part day division around mid-day (AM for anti-meridian and PM for post-meridian) is also a foreign notion taken for universal in localization systems. Where the “<am>” and “<pm>” day divisions of Amharic translation is an approximation that is no more than serviceable:
<am>ጠዋት</am> and <pm>ከሰዓት</pm>
While these translations could be understood under the context of the foreign conventions that they map, they are not ideal for Ethiopia. Naturally, Ethiopia will want to apply its own conventions that are already millennium old. ጠዋት, ረፋድ, እኩለ ቀን, እኩለ ሌሊት some of the examples of day division in Ethiopia. Ethiopia does not have a well-established preference for digital time formats in database system and other computer systems, but we want to establish computerized systems into a society. An example digital time format under United States English conventions appears as: Mon 27 Feb 2018 12:00:00 PM EAT
The equivalent date and time under the Ethiopian Amharic convention as available on Linux systems today appears as:
ማክሰ ፌብሩ 26 ቀን 12: 00: 00 ከሰዓት EAT 2018 ዓ/ም
This represents a loose mapping of some Amharic conventions onto an external reckoning of time. This is only translation and not localization in its truest sense. A hypothetical Ethiopic date and time presentation might looks as:
ማክሰኞ፣ የካቲት 19 ቀን 6: 00: 00 እኩለ ቀን 2010 ዓ/ም or
ማክሰኞ፣ የካቲት 19 ቀን 6: 00: 00 እኩለ ቀን፳ ፻ ፲ዓ/ም
Let see the drawbacks that exist in PostgreSQL with examples
Most of the database systems use time stamp to store data is in Gregorian calendar system for specific time zone.
However, most the data in Ethiopia are available with Ethiopian calendar system and users in Ethiopia are not comfortable with the Gregorian calendar as they use Ethiopian calendar in their day-to-day activities. This usually create inconvenience when the user want to have a reference to Ethiopic date as they had Gregorian calendar at database system. An example query to demonstrate this is given below.
Q2: Select current_date;
Q2 returns ‘2019-08-08’ but currently in Ethiopia calendar the year is ‘2011’,
Q3: Select to_char(to_timestamp(to_char(4, '999'), 'MM'), 'Month');
Q3 returns ‘April’ whereas the 4th month is ታህሳስ(December) in Ethiopian calendar system.
Q4: Select to_char(to_timestamp (to_char(13, '999'), 'MM'), 'Month');
Q4 returns an error message since the GC have only ‘12’ month per a year.
Where Q2, Q3 and Q4 are queries.
On Thu, Aug 8, 2019 at 7:31 PM Yonatan Misgan <yonamis@dtu.edu.et> wrote: > I am Yonathan Misgan from Ethiopia, want to add some functionalities on PostgreSQL to support Ethiopian locales. I wantyour advice where I start to hack the PostgresSQL source code. I have attached some synopsis about the existing problemsof PostgresSQL related with Ethiopian locale specially related with calendar, date and time format. Hi Yonatan, I'm not sure if this requires hacking the PostgreSQL source code. It sounds more like an extension. My first impression is that you might not need new types like "date". Instead you might be able to develop a suite of functions that can convert the existing types to and from the display formats (ie strings) and perhaps also components (year, month, day etc) that you use in your calendar system. For example: SELECT to_char_ethiopian(CURRENT_DATE, 'YYYY-MM-DD'), or whatever kind of format control string would be more appropriate. However, I see from https://en.wikipedia.org/wiki/Time_in_Ethiopia that new days start at 1 o'clock, not midnight, so that makes CURRENT_DATE a bit more confusing -- you might need to write a function current_eth_date() to deal with that small difference. Other than that detail, which is really a detail of CURRENT_DATE and not of the date type, dates are internally represented as a number of days since some arbitrary "epoch" day (I think Gregorian 2000-01-01), not as the components you see when you look at the output of SELECT CURRENT_DATE. That is, the Gregorian calendar concepts exist mostly in the display/input functions, and the operators that can add intervals etc. You could supply a different set of functions, but use the same types, and I suspect that'd be convenient because then you'll be able to use Gregorian and Ethiopian conventions with the same data, whenever you need to. It's much the same for timestamps, but with more complicated details. I see that there are libraries and bits of example code around to do the various kinds of calendar maths required for Ethiopian dates in Perl, Python etc. If I were you I think I'd experiment with a prototype implementation using PL/Perl, PL/Python etc (a way to define new PostgreSQL functions written in those languages), and if that goes well, try writing an extension in C to do it more efficiently. The end goal of that woudn't need to be part of PostgreSQL itself, but just an extension that anyone can download and install to use Ethiopian dates conveniently. -- Thomas Munro https://enterprisedb.com
Thank you for your quick response. I am also impressed to develop Ethiopian calendar as an extension on PostgreSQL and I I have already developed the function that convert Gregorian calendar time to Ethiopian calendar time. But the difficulty is on how to use this function on PostgreSQL as well on PostgreSQL month names are key words when I am developing Ethiopian calendar the date data type is doesn't accept Ethiopian month name as a date data type value only the numeric representation of the months are accepted by compiler.
-------- Original message --------
From: Thomas Munro <thomas.munro@gmail.com>
Date: 8/8/19 11:34 AM (GMT+03:00)
To: Yonatan Misgan <yonamis@dtu.edu.et>
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Locale support
So my question is after developing the converter function where I put it for accessing it on PostgreSQL.
-------- Original message --------
From: Thomas Munro <thomas.munro@gmail.com>
Date: 8/8/19 11:34 AM (GMT+03:00)
To: Yonatan Misgan <yonamis@dtu.edu.et>
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Locale support
On Thu, Aug 8, 2019 at 7:31 PM Yonatan Misgan <yonamis@dtu.edu.et> wrote:
> I am Yonathan Misgan from Ethiopia, want to add some functionalities on PostgreSQL to support Ethiopian locales. I want your advice where I start to hack the PostgresSQL source code. I have attached some synopsis about the existing problems of PostgresSQL related with Ethiopian locale specially related with calendar, date and time format.
Hi Yonatan,
I'm not sure if this requires hacking the PostgreSQL source code. It
sounds more like an extension. My first impression is that you might
not need new types like "date". Instead you might be able to develop
a suite of functions that can convert the existing types to and from
the display formats (ie strings) and perhaps also components (year,
month, day etc) that you use in your calendar system. For example:
SELECT to_char_ethiopian(CURRENT_DATE, 'YYYY-MM-DD'), or whatever kind
of format control string would be more appropriate.
However, I see from https://en.wikipedia.org/wiki/Time_in_Ethiopia
that new days start at 1 o'clock, not midnight, so that makes
CURRENT_DATE a bit more confusing -- you might need to write a
function current_eth_date() to deal with that small difference. Other
than that detail, which is really a detail of CURRENT_DATE and not of
the date type, dates are internally represented as a number of days
since some arbitrary "epoch" day (I think Gregorian 2000-01-01), not
as the components you see when you look at the output of SELECT
CURRENT_DATE. That is, the Gregorian calendar concepts exist mostly
in the display/input functions, and the operators that can add
intervals etc. You could supply a different set of functions, but use
the same types, and I suspect that'd be convenient because then you'll
be able to use Gregorian and Ethiopian conventions with the same data,
whenever you need to. It's much the same for timestamps, but with
more complicated details.
I see that there are libraries and bits of example code around to do
the various kinds of calendar maths required for Ethiopian dates in
Perl, Python etc. If I were you I think I'd experiment with a
prototype implementation using PL/Perl, PL/Python etc (a way to
define new PostgreSQL functions written in those languages), and if
that goes well, try writing an extension in C to do it more
efficiently.
The end goal of that woudn't need to be part of PostgreSQL itself, but
just an extension that anyone can download and install to use
Ethiopian dates conveniently.
--
Thomas Munro
https://enterprisedb.com
> I am Yonathan Misgan from Ethiopia, want to add some functionalities on PostgreSQL to support Ethiopian locales. I want your advice where I start to hack the PostgresSQL source code. I have attached some synopsis about the existing problems of PostgresSQL related with Ethiopian locale specially related with calendar, date and time format.
Hi Yonatan,
I'm not sure if this requires hacking the PostgreSQL source code. It
sounds more like an extension. My first impression is that you might
not need new types like "date". Instead you might be able to develop
a suite of functions that can convert the existing types to and from
the display formats (ie strings) and perhaps also components (year,
month, day etc) that you use in your calendar system. For example:
SELECT to_char_ethiopian(CURRENT_DATE, 'YYYY-MM-DD'), or whatever kind
of format control string would be more appropriate.
However, I see from https://en.wikipedia.org/wiki/Time_in_Ethiopia
that new days start at 1 o'clock, not midnight, so that makes
CURRENT_DATE a bit more confusing -- you might need to write a
function current_eth_date() to deal with that small difference. Other
than that detail, which is really a detail of CURRENT_DATE and not of
the date type, dates are internally represented as a number of days
since some arbitrary "epoch" day (I think Gregorian 2000-01-01), not
as the components you see when you look at the output of SELECT
CURRENT_DATE. That is, the Gregorian calendar concepts exist mostly
in the display/input functions, and the operators that can add
intervals etc. You could supply a different set of functions, but use
the same types, and I suspect that'd be convenient because then you'll
be able to use Gregorian and Ethiopian conventions with the same data,
whenever you need to. It's much the same for timestamps, but with
more complicated details.
I see that there are libraries and bits of example code around to do
the various kinds of calendar maths required for Ethiopian dates in
Perl, Python etc. If I were you I think I'd experiment with a
prototype implementation using PL/Perl, PL/Python etc (a way to
define new PostgreSQL functions written in those languages), and if
that goes well, try writing an extension in C to do it more
efficiently.
The end goal of that woudn't need to be part of PostgreSQL itself, but
just an extension that anyone can download and install to use
Ethiopian dates conveniently.
--
Thomas Munro
https://enterprisedb.com
On 8/8/19 9:29 AM, Yonatan Misgan wrote: > From: Thomas Munro <thomas.munro@gmail.com> >> Perl, Python etc. If I were you I think I'd experiment with a >> prototype implementation using PL/Perl, PL/Python etc (a way to As a bit of subtlety that might matter, the internal representation in PostgreSQL, as in ISO 8601, applies the Gregorian calendar 'proleptically', that is, forever into the future, and forever into the past, before it was even invented or in use anywhere. That matches the documented behavior of the standard 'datetime' class included with Python (though in Python you need an add-on module to support time zones). In Perl, an add-on module may be required. (In Java, the classes in the java.time package match the PostgreSQL / ISO 8601 behavior, while the classes in the java.sql package do not.) The effects of any mismatch are most likely to show up in dates earlier than 15 October 1582 Gregorian. This was freshly on my mind from a recent thread over here[1]. Regards, -Chap [1] https://www.postgresql.org/message-id/5D3AF944.6020900%40anastigmatix.net
On Thu, Aug 8, 2019 at 6:29 AM Yonatan Misgan <yonamis@dtu.edu.et> wrote: > So my question is after developing the converter function where I put it for accessing it on PostgreSQL. Maybe you can take some inspiration from the postgresql-unit extension: https://github.com/df7cb/postgresql-unit Note that it is built on top of GNU units, which is itself highly extensible. I'm not sure if this will be useful, since I am not an expert on calendar systems. -- Peter Geoghegan
On Fri, Aug 9, 2019 at 6:19 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Aug 8, 2019 at 6:29 AM Yonatan Misgan <yonamis@dtu.edu.et> wrote: > > So my question is after developing the converter function where I put it for accessing it on PostgreSQL. > > Maybe you can take some inspiration from the postgresql-unit extension: > > https://github.com/df7cb/postgresql-unit Here's a 5 minute bare bones extension with place holders functions showing what I had in mind. That is, assuming that "date" is a reasonable type, and we're just talking about different ways of converting to/from text. https://github.com/macdice/calendars -- Thomas Munro https://enterprisedb.com
Can I implement it as a locale support? When the user want to change the lc _time = am_ET(Amharic Ethiopia ) the date and time representation of the database systems be in Ethiopian calendar.
-------- Original message --------
From: Thomas Munro <thomas.munro@gmail.com>
Date: 8/9/19 1:17 AM (GMT+03:00)
To: Peter Geoghegan <pg@bowt.ie>
Cc: Yonatan Misgan <yonamis@dtu.edu.et>, pgsql-hackers@lists.postgresql.org
Subject: Re: Locale support
-------- Original message --------
From: Thomas Munro <thomas.munro@gmail.com>
Date: 8/9/19 1:17 AM (GMT+03:00)
To: Peter Geoghegan <pg@bowt.ie>
Cc: Yonatan Misgan <yonamis@dtu.edu.et>, pgsql-hackers@lists.postgresql.org
Subject: Re: Locale support
On Fri, Aug 9, 2019 at 6:19 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Aug 8, 2019 at 6:29 AM Yonatan Misgan <yonamis@dtu.edu.et> wrote:
> > So my question is after developing the converter function where I put it for accessing it on PostgreSQL.
>
> Maybe you can take some inspiration from the postgresql-unit extension:
>
> https://github.com/df7cb/postgresql-unit
Here's a 5 minute bare bones extension with place holders functions
showing what I had in mind. That is, assuming that "date" is a
reasonable type, and we're just talking about different ways of
converting to/from text.
https://github.com/macdice/calendars
--
Thomas Munro
https://enterprisedb.com
> On Thu, Aug 8, 2019 at 6:29 AM Yonatan Misgan <yonamis@dtu.edu.et> wrote:
> > So my question is after developing the converter function where I put it for accessing it on PostgreSQL.
>
> Maybe you can take some inspiration from the postgresql-unit extension:
>
> https://github.com/df7cb/postgresql-unit
Here's a 5 minute bare bones extension with place holders functions
showing what I had in mind. That is, assuming that "date" is a
reasonable type, and we're just talking about different ways of
converting to/from text.
https://github.com/macdice/calendars
--
Thomas Munro
https://enterprisedb.com
On Fri, Aug 9, 2019 at 6:15 PM Yonatan Misgan <yonamis@dtu.edu.et> wrote: > Can I implement it as a locale support? When the user want to change the lc _time = am_ET(Amharic Ethiopia ) the dateand time representation of the database systems be in Ethiopian calendar. Hi Yonatan, I'm not an expert in this stuff, but it seem to me that both the operating system and the date/time localisation code in PostgreSQL use Gregorian calendar logic, even though they can use local language names for them. That is, they allow you to display the French, Greek, Ethiopian ... names for the Gregorian months, but not a a completely different calendar system. For example, according to my operating system: $ LC_TIME=en_GB.UTF-8 date Sat 10 Aug 2019 10:58:42 NZST $ LC_TIME=fr_FR.UTF-8 date sam. 10 août 2019 10:58:48 NZST $ LC_TIME=el_GR.UTF-8 date Σάβ 10 Αυγ 2019 10:58:51 NZST $ LC_TIME=am_ET.UTF-8 date ቅዳሜ ኦገስ 10 10:58:55 NZST 2019 These all say it's Saturday (ቅዳሜ) the 10th of August (ኦገስ) on the Gregorian calendar. Looking at POSIX date[1] you can see they contemplated the existence of non-Gregorian calendars in a very limited way, but no operating system I have access to does anything other than Gregorian with %x and %c: "The date string formatting capabilities are intended for use in Gregorian-style calendars, possibly with a different starting year (or years). The %x and %c conversion specifications, however, are intended for local representation; these may be based on a different, non-Gregorian calendar." PostgreSQL behaves the same way when you ask for the localised month in am_ET: tmunro=> set lc_time to 'am_ET.UTF-8'; SET tmunro=> select to_char(now(), 'TMMon'); to_char ----------- ኦገስ (1 row) This is hard coded into the system, as you can see from src/backend/utils/adt/pg_locale.c where the *twelve* month names are loaded into localized_abbrev_months and other similar arrays. That's the first clue that this system can't handle the thirteen Ethiopian months, not to mention the maths required to work with them. That's why I think you need a new, different to_char() function (and probably more functions). In that skeleton code I posted, you can see I defined a function to_char(date, format, calendar) that takes a third argument, for the calendar name. You might also wonder if that new function should respect the locale settings, but I'm not sure if it could in general; you'd have to be able to get (say) the Greek names for the Ethiopian calendar's months, which the OS won't be able to give you. Though perhaps you'd want some way to select between the Ethiopian script and the transliterations into Latin script, which would presumably be hard coded into the extension, I have no idea if that's useful to anyone... BTW there have been earlier discussions of this: https://www.postgresql.org/message-id/flat/CAM7dW9iBXDJuwZrEXW%2Bdsa_%3Dew%3D%2BFdv7mcF51nQLGSkTkQp2MQ%40mail.gmail.com It shows that Apple has an Objective-C NSCalendar class that understands Ehtiopian, Persian, Hebrew, ... calendars, which made me wonder if LC_TIME=am_ET.UTF-8 would trigger something special on a Mac, but nope, its libc still just gives Ethiopian names for Gregorian months as I expected. [1] http://pubs.opengroup.org/onlinepubs/009695399/utilities/date.html -- Thomas Munro https://enterprisedb.com
On Sat, Aug 10, 2019 at 11:50 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Fri, Aug 9, 2019 at 6:15 PM Yonatan Misgan <yonamis@dtu.edu.et> wrote: > > Can I implement it as a locale support? When the user want to change the lc _time = am_ET(Amharic Ethiopia ) the dateand time representation of the database systems be in Ethiopian calendar. > > I'm not an expert in this stuff, but it seem to me that both the > operating system and the date/time localisation code in PostgreSQL use > Gregorian calendar logic, even though they can use local language > names for them. That is, they allow you to display the French, Greek, > Ethiopian ... names for the Gregorian months, but not a a completely > different calendar system. For example, according to my operating > system: > ... Reading about that led me to the ICU model of calendars. Here's the C API (there are C++ and Java APIs too, but to maximise the possibility of producing something that could ever be part of core PostgreSQL, I'd stick to pure C): http://userguide.icu-project.org/datetime/calendar http://icu-project.org/apiref/icu4c/ucal_8h.html http://icu-project.org/apiref/icu4c/udat_8h.html It does in fact work using locales to select calendars as you were suggesting (unlike POSIX or at least the POSIX systems I tried), and there it knows that am_ET is associated with the Ethiopic calendar, as you were suggesting (and likewise for nearly a dozen other calendars). When you call ucal_open() you have to say whether you want UCAL_TRADITIONAL (the traditional calendar associated with the locale) or UCAL_GREGORIAN. In the usual ICU fashion it lets you mix and match more explicitly, so you can use locale names like "fr_FR@calendar=buddhist". Then you can us the udat.h stuff to format dates and so forth. So now I'm wondering if the best idea would be to write an extension that provides a handful of carefully crafted functions to expose the ICU calendar date/time formatting/parsing/manipulation stuff to SQL. I think that should be doable in a way that is not specific to Ethiopic, so that users of Indian, Persian, Islamic etc calendars can also benefit from this. I'm not sure if it should use LC_TIME -- possibly not, because that would interfere with libc-based stuff and the locale names don't match up; that might only make sense if this facility completely replaced the built in date/time stuff, which seems unlikely. Perhaps you'd just want to pass in the complete ICU locale string to each of the functions, to_char_icu(current_date, 'Month', 'fr_FR@calendar=buddhist'), or perhaps you'd want a GUC to control it (extensions can have their own GUCs). I'm not sure. There has certainly been interest in exposing other bits of ICU to PostgreSQL, either in core or in extensions: collations, Unicode normalisation, and now this. Hmm. Another thing you might want to look into is whether the SQL standard has anything to say about non-Gregorian calendars, and whether DB2 has anything to support them (it certainly has ICU inside it, as does PostgreSQL (optionally), so I wonder if they've exposed this part of it to SQL and if so how). -- Thomas Munro https://enterprisedb.com