Обсуждение: to_timestamp docs
I don't think the changes made in PG 12 are documented accurately. It currently says: to_timestamp and to_date matches any single separator in the input string or is skipped However, I think it is more accurate to say _multiple_ whitespace can also be matched by a single separator: SELECT to_timestamp('%1976','_YYYY'); to_timestamp ------------------------ 1976-01-01 00:00:00-05 SELECT to_timestamp('%%1976','_YYYY'); ERROR: invalid value "%197" for "YYYY" DETAIL: Value must be an integer. -- two spaces --> SELECT to_timestamp(' 1976','_YYYY'); to_timestamp ------------------------ 1976-01-01 00:00:00-05 --> SELECT to_timestamp(E'\t\t\t1976','_YYYY'); to_timestamp ------------------------ 1976-01-01 00:00:00-05 Proposed patch attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Вложения
Bruce Momjian <bruce@momjian.us> writes: > I don't think the changes made in PG 12 are documented accurately. That code is swapped out of my head at the moment, but it looks to me like the para before the one you changed is where we discuss the behavior for whitespace. I'm not sure that this change is right, or an improvement, in the context of both paras. regards, tom lane
On Wed, May 1, 2019 at 10:01:50AM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I don't think the changes made in PG 12 are documented accurately. > > That code is swapped out of my head at the moment, but it looks > to me like the para before the one you changed is where we discuss > the behavior for whitespace. I'm not sure that this change is > right, or an improvement, in the context of both paras. Thanks. I think I see the sentence you are thinking of: <function>to_timestamp</function> and <function>to_date</function> skip multiple blank spaces at the beginning of the input string and around date and time values unless the <literal>FX</literal> option is used. However, first, it is unclear what 'skip' means here, i.e., does it mean multiple blank spaces become a single space, or they are ignored. That should be clarified, though I am unclear if that matters based on how separators are handled. Also, I think "blank spaces" should be "whitespace". Second, I see inconsistent behaviour around the use of FX for various patterns, e.g.: SELECT to_timestamp('5 1976','FXDD_FXYYYY'); to_timestamp ------------------------ 1976-01-05 00:00:00-05 SELECT to_timestamp('JUL JUL','FXMON_FXMON'); to_timestamp --------------------------------- 0001-07-01 00:00:00-04:56:02 BC SELECT to_timestamp('JUL JUL','FXMON_FXMON'); ERROR: invalid value " " for "MON" DETAIL: The given value did not match any of the allowed values for this field. It seems DD and YYYY (as numerics?) in FX mode eat trailing whitespace, while MON does not? Also, I used these queries to determine it is "trailing" whitespace that "FXMON" controls: SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); to_timestamp --------------------------------- 0001-07-01 00:00:00-04:56:02 BC SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); ERROR: invalid value " J" for "MON" DETAIL: The given value did not match any of the allowed values for this field. Once we figure out how it is behaving I think we can pull together the FX text above to reference the separator text below. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Hello, On Wed, May 1, 2019 at 6:05 PM Bruce Momjian <bruce@momjian.us> wrote: > Thanks. I think I see the sentence you are thinking of: > > <function>to_timestamp</function> and <function>to_date</function> > skip multiple blank spaces at the beginning of the input string > and around date and time values unless the <literal>FX</literal> > option is used. > > However, first, it is unclear what 'skip' means here, i.e., does it mean > multiple blank spaces become a single space, or they are ignored. I worked at to_timestamp some time ago. In this case multiple bank spaces at the beginning should be ignored. > Second, I see inconsistent behaviour around the use of FX for various > patterns, e.g.: > > SELECT to_timestamp('5 1976','FXDD_FXYYYY'); > to_timestamp > ------------------------ > 1976-01-05 00:00:00-05 Hm, I think strspace_len() is partly to blame here, which is called by from_char_parse_int_len(): /* * Skip any whitespace before parsing the integer. */ *src += strspace_len(*src); But even if you remove this line of code then strtol() will eat survived whitespaces: result = strtol(init, src, 10); Not sure if we need some additional checks here if FX is set. > It seems DD and YYYY (as numerics?) in FX mode eat trailing whitespace, > while MON does not? Also, I used these queries to determine it is > "trailing" whitespace that "FXMON" controls: > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > to_timestamp > --------------------------------- > 0001-07-01 00:00:00-04:56:02 BC > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > ERROR: invalid value " J" for "MON" > DETAIL: The given value did not match any of the allowed values for this field. The problem here is that you need to specify FX only once and at beginning of the format string. It is stated in the documentation: "FX must be specified as the first item in the template." It works globally (but only for remaining string if you don't put it at the beginning) and you can set it only once. For example: =# SELECT to_timestamp('JUL JUL JUL','FXMON_MON_MON'); ERROR: invalid value " J" for "MON" DETAIL: The given value did not match any of the allowed values for this field. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Wed, May 1, 2019 at 11:20:05PM +0300, Arthur Zakirov wrote: > Hello, > > On Wed, May 1, 2019 at 6:05 PM Bruce Momjian <bruce@momjian.us> wrote: > > Thanks. I think I see the sentence you are thinking of: > > > > <function>to_timestamp</function> and <function>to_date</function> > > skip multiple blank spaces at the beginning of the input string > > and around date and time values unless the <literal>FX</literal> > > option is used. > > > > However, first, it is unclear what 'skip' means here, i.e., does it mean > > multiple blank spaces become a single space, or they are ignored. > > I worked at to_timestamp some time ago. In this case multiple bank spaces at > the beginning should be ignored. OK. > > Second, I see inconsistent behaviour around the use of FX for various > > patterns, e.g.: > > > > SELECT to_timestamp('5 1976','FXDD_FXYYYY'); > > to_timestamp > > ------------------------ > > 1976-01-05 00:00:00-05 > > Hm, I think strspace_len() is partly to blame here, which is called by > from_char_parse_int_len(): > > /* > * Skip any whitespace before parsing the integer. > */ > *src += strspace_len(*src); > > But even if you remove this line of code then strtol() will eat > survived whitespaces: > > result = strtol(init, src, 10); > > Not sure if we need some additional checks here if FX is set. Yes, I suspected it was part of the input function, but it seems it is done in two places. It seems we need the opposite of strspace_len() in that place to throw an error if we are in FX mode. > The problem here is that you need to specify FX only once and at beginning of > the format string. It is stated in the documentation: > > "FX must be specified as the first item in the template." Uh, FX certainly changes behavior if it isn't the first thing in the format string. > It works globally (but only for remaining string if you don't put it > at the beginning) Uh, then the documentation is wrong? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Wed, May 1, 2019 at 11:20 PM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote: > Hello, > > On Wed, May 1, 2019 at 6:05 PM Bruce Momjian <bruce@momjian.us> wrote: > > Thanks. I think I see the sentence you are thinking of: > > > > <function>to_timestamp</function> and <function>to_date</function> > > skip multiple blank spaces at the beginning of the input string > > and around date and time values unless the <literal>FX</literal> > > option is used. > > > > However, first, it is unclear what 'skip' means here, i.e., does it mean > > multiple blank spaces become a single space, or they are ignored. > > I worked at to_timestamp some time ago. In this case multiple bank spaces at > the beginning should be ignored. > > > Second, I see inconsistent behaviour around the use of FX for various > > patterns, e.g.: > > > > SELECT to_timestamp('5 1976','FXDD_FXYYYY'); > > to_timestamp > > ------------------------ > > 1976-01-05 00:00:00-05 > > Hm, I think strspace_len() is partly to blame here, which is called by > from_char_parse_int_len(): > > /* > * Skip any whitespace before parsing the integer. > */ > *src += strspace_len(*src); > > But even if you remove this line of code then strtol() will eat > survived whitespaces: > > result = strtol(init, src, 10); > > Not sure if we need some additional checks here if FX is set. I'd like to add that this behavior is not new in 12. It was the same before. > > It seems DD and YYYY (as numerics?) in FX mode eat trailing whitespace, > > while MON does not? Also, I used these queries to determine it is > > "trailing" whitespace that "FXMON" controls: > > > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > > to_timestamp > > --------------------------------- > > 0001-07-01 00:00:00-04:56:02 BC > > > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > > ERROR: invalid value " J" for "MON" > > DETAIL: The given value did not match any of the allowed values for this field. > > The problem here is that you need to specify FX only once and at beginning of > the format string. It is stated in the documentation: > > "FX must be specified as the first item in the template." > > It works globally (but only for remaining string if you don't put it > at the beginning) > and you can set it only once. For example: > > =# SELECT to_timestamp('JUL JUL JUL','FXMON_MON_MON'); > ERROR: invalid value " J" for "MON" > DETAIL: The given value did not match any of the allowed values for this field. Actually, FX takes effect on subsequent format patterns. This is not documented, but it copycats Oracle behavior. Sure, normally FX should be specified as the first item. We could document current behavior or restrict specifying FX not as first item. This is also not new in 12, so documenting current behavior is better for compatibility. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Thu, May 2, 2019 at 12:49:23AM +0300, Alexander Korotkov wrote: > On Wed, May 1, 2019 at 11:20 PM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote: > > Hello, > > Not sure if we need some additional checks here if FX is set. > > I'd like to add that this behavior is not new in 12. It was the same before. Agreed, but since we are looking at it, let's document it. > > > It seems DD and YYYY (as numerics?) in FX mode eat trailing whitespace, > > > while MON does not? Also, I used these queries to determine it is > > > "trailing" whitespace that "FXMON" controls: > > > > > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > > > to_timestamp > > > --------------------------------- > > > 0001-07-01 00:00:00-04:56:02 BC > > > > > > SELECT to_timestamp('JUL JUL JUL','MON_FXMON_MON'); > > > ERROR: invalid value " J" for "MON" > > > DETAIL: The given value did not match any of the allowed values for this field. > > > > The problem here is that you need to specify FX only once and at beginning of > > the format string. It is stated in the documentation: > > > > "FX must be specified as the first item in the template." > > > > It works globally (but only for remaining string if you don't put it > > at the beginning) > > and you can set it only once. For example: > > > > =# SELECT to_timestamp('JUL JUL JUL','FXMON_MON_MON'); > > ERROR: invalid value " J" for "MON" > > DETAIL: The given value did not match any of the allowed values for this field. > > Actually, FX takes effect on subsequent format patterns. This is not > documented, but it copycats Oracle behavior. Sure, normally FX should > be specified as the first item. We could document current behavior or > restrict specifying FX not as first item. This is also not new in 12, > so documenting current behavior is better for compatibility. Agreed. Since is it pre-12 behavior, I suggest we just document it and not change it. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Thu, May 2, 2019 at 12:49 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > It works globally (but only for remaining string if you don't put it > > at the beginning) > > and you can set it only once. For example: > > > > =# SELECT to_timestamp('JUL JUL JUL','FXMON_MON_MON'); > > ERROR: invalid value " J" for "MON" > > DETAIL: The given value did not match any of the allowed values for this field. > > Actually, FX takes effect on subsequent format patterns. This is not > documented, but it copycats Oracle behavior. Sure, normally FX should > be specified as the first item. We could document current behavior or > restrict specifying FX not as first item. This is also not new in 12, > so documenting current behavior is better for compatibility. I went to Oracle's documentation. It seems that the behavior is slightly different. Their documentation says: "A modifier can appear in a format model more than once. In such a case, each subsequent occurrence toggles the effects of the modifier. Its effects are enabled for the portion of the model following its first occurrence, and then disabled for the portion following its second, and then reenabled for the portion following its third, and so on." In PostgreSQL one cannot disable exact mode using second FX. I think we shouldn't add some restriction for FX. Instead PostgreSQL's documentation can be fixed. And current explanation in the documentation might be wrong as Bruce pointed. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Thu, May 2, 2019 at 1:03 AM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote: > On Thu, May 2, 2019 at 12:49 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Actually, FX takes effect on subsequent format patterns. This is not > > documented, but it copycats Oracle behavior. Sure, normally FX should > > be specified as the first item. We could document current behavior or > > restrict specifying FX not as first item. This is also not new in 12, > > so documenting current behavior is better for compatibility. > > I went to Oracle's documentation. It seems that the behavior is > slightly different. > Their documentation says: > > "A modifier can appear in a format model more than once. In such a case, > each subsequent occurrence toggles the effects of the modifier. Its effects are > enabled for the portion of the model following its first occurrence, and then > disabled for the portion following its second, and then reenabled for > the portion > following its third, and so on." What about the patch I attached? It fixes the explanation of FX option a little. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company