Обсуждение: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results

Поиск
Список
Период
Сортировка

BUG #3958: Self-Join Group-By Clause Produces Incorrect Results

От
"Marcus Torres"
Дата:
The following bug has been logged online:

Bug reference:      3958
Logged by:          Marcus Torres
Email address:      marcsf23@yahoo.com
PostgreSQL version: 8.19
Operating system:   Linux - Ubuntu
Description:        Self-Join Group-By Clause Produces Incorrect Results
Details:

This is a repost of my original bug with self-contained sql to reproduce the
problem:

I wrote a simple self-join query to sum the transaction count of different
types of records in a audit table and the result set for the different sum
totals was the same which is incorrect.

SQL:
------
DROP TABLE T_AUDIT;
DROP TABLE T_POLICY;

CREATE TABLE T_AUDIT
(
   ID integer NOT NULL,
   POLICY_ID integer NOT NULL,
   AUDIT_DATE date NOT NULL,
   AUDIT_TYPE_CODE character varying(50) NOT NULL,
   TXN_COUNT integer NOT NULL DEFAULT 1
) WITHOUT OIDS
TABLESPACE pg_default;
ALTER TABLE t_audit OWNER TO postgres;
GRANT ALL ON TABLE t_audit TO public;

CREATE TABLE T_POLICY
(
   ID integer NOT NULL,
   CONTENT_POLICY_NAME character varying(50) NOT NULL
) WITHOUT OIDS
TABLESPACE pg_default;
ALTER TABLE t_audit OWNER TO postgres;
GRANT ALL ON TABLE t_policy TO public;

INSERT INTO T_POLICY VALUES (100, 'TEST POLICY');

INSERT INTO T_AUDIT VALUES (1000, 100, '2008-01-01','CONTENT_1',1);
INSERT INTO T_AUDIT VALUES (1001, 100, '2008-01-01','CONTENT_1',1);
INSERT INTO T_AUDIT VALUES (1002, 100, '2008-01-01','CONTENT_1',1);
INSERT INTO T_AUDIT VALUES (1003, 100, '2008-01-01','CONTENT_1',1);
INSERT INTO T_AUDIT VALUES (1004, 100, '2008-01-01','CONTENT_1',1);

INSERT INTO T_AUDIT VALUES (1005, 100, '2008-01-01','CONTENT_2',1);
INSERT INTO T_AUDIT VALUES (1006, 100, '2008-01-01','CONTENT_2',1);

SELECT A1.AUDIT_DATE,
       P.CONTENT_POLICY_NAME,
       SUM(A1.TXN_COUNT) AS SUM_1,
       SUM(A2.TXN_COUNT) AS SUM_2
FROM T_AUDIT A1,
     T_AUDIT A2,
     T_POLICY P
WHERE P.ID = A1.POLICY_ID
  AND P.ID = A2.POLICY_ID
  AND A1.POLICY_ID = A2.POLICY_ID
  AND A1.AUDIT_DATE = A2.AUDIT_DATE
  AND A1.AUDIT_TYPE_CODE = 'CONTENT_1'
  AND A2.AUDIT_TYPE_CODE = 'CONTENT_2'
GROUP BY A1.AUDIT_DATE, P.CONTENT_POLICY_NAME;

Re: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results

От
"Heikki Linnakangas"
Дата:
Marcus Torres wrote:
> The following bug has been logged online:
>
> Bug reference:      3958
> Logged by:          Marcus Torres
> Email address:      marcsf23@yahoo.com
> PostgreSQL version: 8.19
> Operating system:   Linux - Ubuntu
> Description:        Self-Join Group-By Clause Produces Incorrect Results
> Details:
>
> This is a repost of my original bug with self-contained sql to reproduce the
> problem:
>
> I wrote a simple self-join query to sum the transaction count of different
> types of records in a audit table and the result set for the different sum
> totals was the same which is incorrect.

Looks perfectly correct to me.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results

От
Tom Lane
Дата:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Marcus Torres wrote:
>> I wrote a simple self-join query to sum the transaction count of different
>> types of records in a audit table and the result set for the different sum
>> totals was the same which is incorrect.

> Looks perfectly correct to me.

Me too.  The underlying data before grouping/aggregation is

regression=# select
     A1.AUDIT_DATE, P.CONTENT_POLICY_NAME, A1.TXN_COUNT, A2.TXN_COUNT
FROM T_AUDIT A1,
     T_AUDIT A2,
     T_POLICY P
WHERE P.ID = A1.POLICY_ID
  AND P.ID = A2.POLICY_ID
  AND A1.POLICY_ID = A2.POLICY_ID
  AND A1.AUDIT_DATE = A2.AUDIT_DATE
  AND A1.AUDIT_TYPE_CODE = 'CONTENT_1'
  AND A2.AUDIT_TYPE_CODE = 'CONTENT_2';
 audit_date | content_policy_name | txn_count | txn_count
------------+---------------------+-----------+-----------
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
 2008-01-01 | TEST POLICY         |         1 |         1
(10 rows)

from which it's clear that given all ones in txn_count, the sums *must*
be the same because they're taken over the same number of rows.

I suspect what the OP needs is two separate queries (perhaps union'ed
together) not a self-join.

            regards, tom lane

Re: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results

От
Marc SanF
Дата:
Please disregard the previous email.  After rereading what you sent, I realized that I need an outer join to A2 and not
simplya self join...thanks and my apologies! 

----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Heikki Linnakangas <heikki@enterprisedb.com>
Cc: Marcus Torres <marcsf23@yahoo.com>; pgsql-bugs@postgresql.org
Sent: Wednesday, February 13, 2008 7:40:32 AM
Subject: Re: [BUGS] BUG #3958: Self-Join Group-By Clause Produces Incorrect Results


"Heikki
Linnakangas"
<heikki@enterprisedb.com>
writes:
>
Marcus
Torres
wrote:
>>
I
wrote
a
simple
self-join
query
to
sum
the
transaction
count
of
different
>>
types
of
records
in
a
audit
table
and
the
result
set
for
the
different
sum
>>
totals
was
the
same
which
is
incorrect.

>
Looks
perfectly
correct
to
me.

Me
too.
The
underlying
data
before
grouping/aggregation
is

regression=#
select



A1.AUDIT_DATE,
P.CONTENT_POLICY_NAME,
A1.TXN_COUNT,
A2.TXN_COUNT
FROM
T_AUDIT
A1,



T_AUDIT
A2,



T_POLICY
P
WHERE
P.ID
=
A1.POLICY_ID

AND
P.ID
=
A2.POLICY_ID

AND
A1.POLICY_ID
=
A2.POLICY_ID

AND
A1.AUDIT_DATE
=
A2.AUDIT_DATE

AND
A1.AUDIT_TYPE_CODE
=
'CONTENT_1'

AND
A2.AUDIT_TYPE_CODE
=
'CONTENT_2';

audit_date
|
content_policy_name
|
txn_count
|
txn_count
------------+---------------------+-----------+-----------

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1
(10
rows)

from
which
it's
clear
that
given
all
ones
in
txn_count,
the
sums
*must*
be
the
same
because
they're
taken
over
the
same
number
of
rows.

I
suspect
what
the
OP
needs
is
two
separate
queries
(perhaps
union'ed
together)
not
a
self-join.




regards,
tom
lane






      ____________________________________________________________________________________
Never miss a thing.  Make Yahoo your home page.
http://www.yahoo.com/r/hs

Re: BUG #3958: Self-Join Group-By Clause Produces Incorrect Results

От
Marc SanF
Дата:
I respectfully challenge that the aggregation is correct.

In the where clause, I specify A2.AUDIT_TYPE_CODE = CONTENT_2, thus returning only 2 rows for A2 and not all of the
rowsin A2 which happen to have a TXN_COUNT of 1 / row but could in fact be any positive number.  I used 1 for
simplicity. Similarly, if you take out A1 from the query you receive the following result: 

audit_date    |    content_policy_name    |    sum_2
2008-01-01    |    TEST POLICY    |    2

I do not see how/why a self-join changes the condition specified in the where clause and thus returns a sum of 8 rows
thatdo not meet the specified condition in the query? 

Thanks in advance,
Marcus Torres

----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Heikki Linnakangas <heikki@enterprisedb.com>
Cc: Marcus Torres <marcsf23@yahoo.com>; pgsql-bugs@postgresql.org
Sent: Wednesday, February 13, 2008 7:40:32 AM
Subject: Re: [BUGS] BUG #3958: Self-Join Group-By Clause Produces Incorrect Results


"Heikki
Linnakangas"
<heikki@enterprisedb.com>
writes:
>
Marcus
Torres
wrote:
>>
I
wrote
a
simple
self-join
query
to
sum
the
transaction
count
of
different
>>
types
of
records
in
a
audit
table
and
the
result
set
for
the
different
sum
>>
totals
was
the
same
which
is
incorrect.

>
Looks
perfectly
correct
to
me.

Me
too.
The
underlying
data
before
grouping/aggregation
is

regression=#
select



A1.AUDIT_DATE,
P.CONTENT_POLICY_NAME,
A1.TXN_COUNT,
A2.TXN_COUNT
FROM
T_AUDIT
A1,



T_AUDIT
A2,



T_POLICY
P
WHERE
P.ID
=
A1.POLICY_ID

AND
P.ID
=
A2.POLICY_ID

AND
A1.POLICY_ID
=
A2.POLICY_ID

AND
A1.AUDIT_DATE
=
A2.AUDIT_DATE

AND
A1.AUDIT_TYPE_CODE
=
'CONTENT_1'

AND
A2.AUDIT_TYPE_CODE
=
'CONTENT_2';

audit_date
|
content_policy_name
|
txn_count
|
txn_count
------------+---------------------+-----------+-----------

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1

2008-01-01
|
TEST
POLICY




|




1
|




1
(10
rows)

from
which
it's
clear
that
given
all
ones
in
txn_count,
the
sums
*must*
be
the
same
because
they're
taken
over
the
same
number
of
rows.

I
suspect
what
the
OP
needs
is
two
separate
queries
(perhaps
union'ed
together)
not
a
self-join.




regards,
tom
lane






      ____________________________________________________________________________________
Never miss a thing.  Make Yahoo your home page.
http://www.yahoo.com/r/hs