Обсуждение: BUG #10972: string_agg function incorrectly concatenating varying delimiter

Поиск
Список
Период
Сортировка

BUG #10972: string_agg function incorrectly concatenating varying delimiter

От
jeff@goaldriven.com
Дата:
The following bug has been logged on the website:

Bug reference:      10972
Logged by:          Jeff Fischer
Email address:      jeff@goaldriven.com
PostgreSQL version: 9.3.1
Operating system:   CentOS
Description:

Running the query below will show how the delimiter for the current row is
actually the subsequent rows delimiter.  The sequence goes like this:

TestBlah    1
ITTest    2
testfail    3

Yet, the concatened output is this:

"testblah 2 ITtest 3 testfail"

You can see that the delimiter for the current row is actually getting the
subsequent rows delimiter.  It's as though the string concat moves to the
next row prior to concatenting the delimiter, which is likely the cause of
this bug.

SELECT
    string_agg(
        Field1,
        ' ' || RowIndex::text || ' '
    ORDER BY RowIndex) as VeryLongConcatenatedResultsFieldName
FROM (
    SELECT
        'testblah' as Field1,
        1 as RowIndex
    UNION
    SELECT
        'ITtest' as Field1,
        2 as RowIndex
    UNION
    SELECT
        'testfail' as Field1,
        3 as RowIndex
    ) t;

Re: BUG #10972: string_agg function incorrectly concatenating varying delimiter

От
Tom Lane
Дата:
jeff@goaldriven.com writes:
> Running the query below will show how the delimiter for the current row is
> actually the subsequent rows delimiter.

Hmm, well, the documentation for string_agg doesn't say what happens when
the "delimiter" argument varies across rows; but a quick look at the code
finds that the first-call delimiter isn't actually used at all, and on
subsequent calls the delimiter is appended to the running result before
the associated value is.  Which seems to me to be at least as reasonable,
and certainly a great deal easier to implement, as what you seem to have
in mind.  Can you offer a principled argument why it should be the other
way around?

            regards, tom lane

Re: BUG #10972: string_agg function incorrectly concatenating varying delimiter

От
Jeff Fischer
Дата:
Hi Tom,

I just read the Wikipedia article on you, fun.  Glad to make your acquainta=
nce.  My business partner, Dave Yarnall, went to Carnegie Mellon (CCed).

I suppose principles can be relative, but I'll assume you mean good princip=
les and give it a shot.

Primarily, I'd consider whether another function uses non-deterministic row=
s for its evaluation.  I could be wrong, but I don't think any other functi=
on uses two different rows results within a single function evaluation.  Ev=
en aggregates, such as string_agg, evaluate one row at a time which is a we=
ll-known behavior.

A similar paradigm might be in general programming if a compiled program ra=
ndomly chose values off of the stack to place as a parameter into a method =
call (function 3's parameters are passed into function2).  An odd and unexp=
ected behavior for the SQL language and really any language, I think.  Alth=
ough, it is quite creative.

It sounds like you've quickly isolated the line within the source.  In an i=
nterest in learning more about the code, would you mind pointing my partner=
 and I to the line for this bug?

Thanks,
Jeff

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Sent: Wednesday, July 16, 2014 7:20 AM
To: Jeff Fischer
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #10972: string_agg function incorrectly concatenati=
ng varying delimiter

jeff@goaldriven.com writes:
> Running the query below will show how the delimiter for the current=20
> row is actually the subsequent rows delimiter.

Hmm, well, the documentation for string_agg doesn't say what happens when t=
he "delimiter" argument varies across rows; but a quick look at the code fi=
nds that the first-call delimiter isn't actually used at all, and on subseq=
uent calls the delimiter is appended to the running result before the assoc=
iated value is.  Which seems to me to be at least as reasonable, and certai=
nly a great deal easier to implement, as what you seem to have in mind.  Ca=
n you offer a principled argument why it should be the other way around?

            regards, tom lane

Re: BUG #10972: string_agg function incorrectly concatenating varying delimiter

От
Pavel Stehule
Дата:
Hello


2014-07-21 22:34 GMT+02:00 Jeff Fischer <jeff@goaldriven.com>:

> Hi Tom,
>
> I just read the Wikipedia article on you, fun.  Glad to make your
> acquaintance.  My business partner, Dave Yarnall, went to Carnegie Mellon
> (CCed).
>
> I suppose principles can be relative, but I'll assume you mean good
> principles and give it a shot.
>
> Primarily, I'd consider whether another function uses non-deterministic
> rows for its evaluation.  I could be wrong, but I don't think any other
> function uses two different rows results within a single function
> evaluation.  Even aggregates, such as string_agg, evaluate one row at a
> time which is a well-known behavior.
>
> A similar paradigm might be in general programming if a compiled program
> randomly chose values off of the stack to place as a parameter into a
> method call (function 3's parameters are passed into function2).  An odd
> and unexpected behavior for the SQL language and really any language, I
> think.  Although, it is quite creative.
>
> It sounds like you've quickly isolated the line within the source.  In an
> interest in learning more about the code, would you mind pointing my
> partner and I to the line for this bug?
>

https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/varlena.c

search string_agg

Regards

Pavel


>
> Thanks,
> Jeff
>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, July 16, 2014 7:20 AM
> To: Jeff Fischer
> Cc: pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #10972: string_agg function incorrectly
> concatenating varying delimiter
>
> jeff@goaldriven.com writes:
> > Running the query below will show how the delimiter for the current
> > row is actually the subsequent rows delimiter.
>
> Hmm, well, the documentation for string_agg doesn't say what happens when
> the "delimiter" argument varies across rows; but a quick look at the code
> finds that the first-call delimiter isn't actually used at all, and on
> subsequent calls the delimiter is appended to the running result before the
> associated value is.  Which seems to me to be at least as reasonable, and
> certainly a great deal easier to implement, as what you seem to have in
> mind.  Can you offer a principled argument why it should be the other way
> around?
>
>                         regards, tom lane
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>

Re: BUG #10972: string_agg function incorrectly concatenating varying delimiter

От
Jeff Fischer
Дата:
VGhhbmsgeW91IHZlcnkgbXVjaCBQYXZlbC4NCg0KVGhhbmtzLA0KSmVmZg0KDQpGcm9tOiBQYXZl
bCBTdGVodWxlIFttYWlsdG86cGF2ZWwuc3RlaHVsZUBnbWFpbC5jb21dDQpTZW50OiBNb25kYXks
IEp1bHkgMjEsIDIwMTQgMTA6NDIgUE0NClRvOiBKZWZmIEZpc2NoZXINCkNjOiBUb20gTGFuZTsg
cGdzcWwtYnVnc0Bwb3N0Z3Jlc3FsLm9yZzsgRGF2ZSBZYXJuYWxsDQpTdWJqZWN0OiBSZTogW0JV
R1NdIEJVRyAjMTA5NzI6IHN0cmluZ19hZ2cgZnVuY3Rpb24gaW5jb3JyZWN0bHkgY29uY2F0ZW5h
dGluZyB2YXJ5aW5nIGRlbGltaXRlcg0KDQpIZWxsbw0KDQoyMDE0LTA3LTIxIDIyOjM0IEdNVCsw
MjowMCBKZWZmIEZpc2NoZXIgPGplZmZAZ29hbGRyaXZlbi5jb208bWFpbHRvOmplZmZAZ29hbGRy
aXZlbi5jb20+PjoNCkhpIFRvbSwNCg0KSSBqdXN0IHJlYWQgdGhlIFdpa2lwZWRpYSBhcnRpY2xl
IG9uIHlvdSwgZnVuLiAgR2xhZCB0byBtYWtlIHlvdXIgYWNxdWFpbnRhbmNlLiAgTXkgYnVzaW5l
c3MgcGFydG5lciwgRGF2ZSBZYXJuYWxsLCB3ZW50IHRvIENhcm5lZ2llIE1lbGxvbiAoQ0NlZCku
DQoNCkkgc3VwcG9zZSBwcmluY2lwbGVzIGNhbiBiZSByZWxhdGl2ZSwgYnV0IEknbGwgYXNzdW1l
IHlvdSBtZWFuIGdvb2QgcHJpbmNpcGxlcyBhbmQgZ2l2ZSBpdCBhIHNob3QuDQoNClByaW1hcmls
eSwgSSdkIGNvbnNpZGVyIHdoZXRoZXIgYW5vdGhlciBmdW5jdGlvbiB1c2VzIG5vbi1kZXRlcm1p
bmlzdGljIHJvd3MgZm9yIGl0cyBldmFsdWF0aW9uLiAgSSBjb3VsZCBiZSB3cm9uZywgYnV0IEkg
ZG9uJ3QgdGhpbmsgYW55IG90aGVyIGZ1bmN0aW9uIHVzZXMgdHdvIGRpZmZlcmVudCByb3dzIHJl
c3VsdHMgd2l0aGluIGEgc2luZ2xlIGZ1bmN0aW9uIGV2YWx1YXRpb24uICBFdmVuIGFnZ3JlZ2F0
ZXMsIHN1Y2ggYXMgc3RyaW5nX2FnZywgZXZhbHVhdGUgb25lIHJvdyBhdCBhIHRpbWUgd2hpY2gg
aXMgYSB3ZWxsLWtub3duIGJlaGF2aW9yLg0KDQpBIHNpbWlsYXIgcGFyYWRpZ20gbWlnaHQgYmUg
aW4gZ2VuZXJhbCBwcm9ncmFtbWluZyBpZiBhIGNvbXBpbGVkIHByb2dyYW0gcmFuZG9tbHkgY2hv
c2UgdmFsdWVzIG9mZiBvZiB0aGUgc3RhY2sgdG8gcGxhY2UgYXMgYSBwYXJhbWV0ZXIgaW50byBh
IG1ldGhvZCBjYWxsIChmdW5jdGlvbiAzJ3MgcGFyYW1ldGVycyBhcmUgcGFzc2VkIGludG8gZnVu
Y3Rpb24yKS4gIEFuIG9kZCBhbmQgdW5leHBlY3RlZCBiZWhhdmlvciBmb3IgdGhlIFNRTCBsYW5n
dWFnZSBhbmQgcmVhbGx5IGFueSBsYW5ndWFnZSwgSSB0aGluay4gIEFsdGhvdWdoLCBpdCBpcyBx
dWl0ZSBjcmVhdGl2ZS4NCg0KSXQgc291bmRzIGxpa2UgeW91J3ZlIHF1aWNrbHkgaXNvbGF0ZWQg
dGhlIGxpbmUgd2l0aGluIHRoZSBzb3VyY2UuICBJbiBhbiBpbnRlcmVzdCBpbiBsZWFybmluZyBt
b3JlIGFib3V0IHRoZSBjb2RlLCB3b3VsZCB5b3UgbWluZCBwb2ludGluZyBteSBwYXJ0bmVyIGFu
ZCBJIHRvIHRoZSBsaW5lIGZvciB0aGlzIGJ1Zz8NCg0KaHR0cHM6Ly9naXRodWIuY29tL3Bvc3Rn
cmVzL3Bvc3RncmVzL2Jsb2IvbWFzdGVyL3NyYy9iYWNrZW5kL3V0aWxzL2FkdC92YXJsZW5hLmMN
CnNlYXJjaCBzdHJpbmdfYWdnDQoNClJlZ2FyZHMNClBhdmVsDQoNCg0KVGhhbmtzLA0KSmVmZg0K
DQotLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KRnJvbTogVG9tIExhbmUgW21haWx0bzp0Z2xA
c3NzLnBnaC5wYS51czxtYWlsdG86dGdsQHNzcy5wZ2gucGEudXM+XQ0KU2VudDogV2VkbmVzZGF5
LCBKdWx5IDE2LCAyMDE0IDc6MjAgQU0NClRvOiBKZWZmIEZpc2NoZXINCkNjOiBwZ3NxbC1idWdz
QHBvc3RncmVzcWwub3JnPG1haWx0bzpwZ3NxbC1idWdzQHBvc3RncmVzcWwub3JnPg0KU3ViamVj
dDogUmU6IFtCVUdTXSBCVUcgIzEwOTcyOiBzdHJpbmdfYWdnIGZ1bmN0aW9uIGluY29ycmVjdGx5
IGNvbmNhdGVuYXRpbmcgdmFyeWluZyBkZWxpbWl0ZXINCg0KamVmZkBnb2FsZHJpdmVuLmNvbTxt
YWlsdG86amVmZkBnb2FsZHJpdmVuLmNvbT4gd3JpdGVzOg0KPiBSdW5uaW5nIHRoZSBxdWVyeSBi
ZWxvdyB3aWxsIHNob3cgaG93IHRoZSBkZWxpbWl0ZXIgZm9yIHRoZSBjdXJyZW50DQo+IHJvdyBp
cyBhY3R1YWxseSB0aGUgc3Vic2VxdWVudCByb3dzIGRlbGltaXRlci4NCg0KSG1tLCB3ZWxsLCB0
aGUgZG9jdW1lbnRhdGlvbiBmb3Igc3RyaW5nX2FnZyBkb2Vzbid0IHNheSB3aGF0IGhhcHBlbnMg
d2hlbiB0aGUgImRlbGltaXRlciIgYXJndW1lbnQgdmFyaWVzIGFjcm9zcyByb3dzOyBidXQgYSBx
dWljayBsb29rIGF0IHRoZSBjb2RlIGZpbmRzIHRoYXQgdGhlIGZpcnN0LWNhbGwgZGVsaW1pdGVy
IGlzbid0IGFjdHVhbGx5IHVzZWQgYXQgYWxsLCBhbmQgb24gc3Vic2VxdWVudCBjYWxscyB0aGUg
ZGVsaW1pdGVyIGlzIGFwcGVuZGVkIHRvIHRoZSBydW5uaW5nIHJlc3VsdCBiZWZvcmUgdGhlIGFz
c29jaWF0ZWQgdmFsdWUgaXMuICBXaGljaCBzZWVtcyB0byBtZSB0byBiZSBhdCBsZWFzdCBhcyBy
ZWFzb25hYmxlLCBhbmQgY2VydGFpbmx5IGEgZ3JlYXQgZGVhbCBlYXNpZXIgdG8gaW1wbGVtZW50
LCBhcyB3aGF0IHlvdSBzZWVtIHRvIGhhdmUgaW4gbWluZC4gIENhbiB5b3Ugb2ZmZXIgYSBwcmlu
Y2lwbGVkIGFyZ3VtZW50IHdoeSBpdCBzaG91bGQgYmUgdGhlIG90aGVyIHdheSBhcm91bmQ/DQoN
CiAgICAgICAgICAgICAgICAgICAgICAgIHJlZ2FyZHMsIHRvbSBsYW5lDQoNCg0KLS0NClNlbnQg
dmlhIHBnc3FsLWJ1Z3MgbWFpbGluZyBsaXN0IChwZ3NxbC1idWdzQHBvc3RncmVzcWwub3JnPG1h
aWx0bzpwZ3NxbC1idWdzQHBvc3RncmVzcWwub3JnPikNClRvIG1ha2UgY2hhbmdlcyB0byB5b3Vy
IHN1YnNjcmlwdGlvbjoNCmh0dHA6Ly93d3cucG9zdGdyZXNxbC5vcmcvbWFpbHByZWYvcGdzcWwt
YnVncw0KDQo=

Re: BUG #10972: string_agg function incorrectly concatenating varying delimiter

От
Marti Raudsepp
Дата:
On Wed, Jul 16, 2014 at 3:16 AM,  <jeff@goaldriven.com> wrote:
> Running the query below will show how the delimiter for the current row is
> actually the subsequent rows delimiter.  The sequence goes like this:
>
> TestBlah        1
> ITTest  2
> testfail        3
>
> Yet, the concatened output is this:
>
> "testblah 2 ITtest 3 testfail"

I think you're misunderstanding the point of the 2nd argument. It's
not there for just concatenating two arguments together, it's there to
be a delimiter between the strings being concatenated. Almost always
it should be a constant. For example if you want a result like
"TestBlah,ITTest,testfail", that would be much uglier to do without
the delimiter argument.

The behavior you want is available as simply:
  string_agg(Field1 || ' ' || RowIndex::text, ' ')

Regards,
Marti