Обсуждение: Bug or incorrect usage?

Поиск
Список
Период
Сортировка

Bug or incorrect usage?

От
Jordan Gigov
Дата:
So, I was trying to use left joins to get bot the total number of
corresponding rows and the number in a specific subset, but it seems I
can't do that in 9.4.6 (the changelogs after that don't suggest this has
changed).
This my small-scale test for it:

CREATE TABLE somedata (id bigserial, something text, PRIMARY KEY (id));
CREATE TABLE moredata (id bigserial, otherid bigint, status varchar,
PRIMARY KEY (id), FOREIGN KEY (otherid) REFERENCES somedata(id));
INSERT INTO somedata(something) VALUES ('Example 1'),('Example
2'),('Example 3');
INSERT INTO moredata(otherid,status)
VALUES(1,'NEW'),(3,'NEW'),(3,'OLD'),(3,'DEPRECATED');

SELECT somedata.id, somedata.something, count(md1.id), count(md2.id)
FROM somedata
LEFT JOIN moredata md1 ON (md1.otherid = somedata.id)
LEFT JOIN moredata md2 ON (md2.otherid = somedata.id AND md2.status NOT IN
('OLD', 'DEPRECATED'))
GROUP BY somedata.id, somedata.something;

What I expected the output to be is:
 id | something | count | count
----+-----------+-------+-------
  2 | Example 2 |     0 |     0
  3 | Example 3 |     3 |     1
  1 | Example 1 |     1 |     1
(3 rows)

Instead I got:
 id | something | count | count
----+-----------+-------+-------
  2 | Example 2 |     0 |     0
  3 | Example 3 |     3 |     3
  1 | Example 1 |     1 |     1
(3 rows)

Running the searches with individual joins:

SELECT somedata.id, somedata.something, count(md2.id)
FROM somedata
LEFT JOIN moredata md2 ON (md2.otherid = somedata.id AND md2.status NOT IN
('OLD', 'DEPRECATED'))
GROUP BY somedata.id, somedata.something;
 id | something | count
----+-----------+-------
  2 | Example 2 |     0
  3 | Example 3 |     1
  1 | Example 1 |     1

SELECT somedata.id, somedata.something, count(md1.id)
FROM somedata
LEFT JOIN moredata md1 ON (md1.otherid = somedata.id)
GROUP BY somedata.id, somedata.something;
 id | something | count
----+-----------+-------
  2 | Example 2 |     0
  3 | Example 3 |     3
  1 | Example 1 |     1


Am I misunderstanding something, or is some optimization messing-up my data?

Re: Bug or incorrect usage?

От
Vitaly Burovoy
Дата:
On 3/1/16, Jordan Gigov <coladict@gmail.com> wrote:
> So, I was trying to use left joins to get bot the total number of
> corresponding rows and the number in a specific subset, but it seems I
> can't do that in 9.4.6 (the changelogs after that don't suggest this has
> changed).
> This my small-scale test for it:
>
> CREATE TABLE somedata (id bigserial, something text, PRIMARY KEY (id));
> CREATE TABLE moredata (id bigserial, otherid bigint, status varchar,
> PRIMARY KEY (id), FOREIGN KEY (otherid) REFERENCES somedata(id));
> INSERT INTO somedata(something) VALUES ('Example 1'),('Example
> 2'),('Example 3');
> INSERT INTO moredata(otherid,status)
> VALUES(1,'NEW'),(3,'NEW'),(3,'OLD'),(3,'DEPRECATED');
>
> SELECT somedata.id, somedata.something, count(md1.id), count(md2.id)
> FROM somedata
> LEFT JOIN moredata md1 ON (md1.otherid = somedata.id)
> LEFT JOIN moredata md2 ON (md2.otherid = somedata.id AND md2.status NOT IN
> ('OLD', 'DEPRECATED'))
> GROUP BY somedata.id, somedata.something;
>
> What I expected the output to be is:
>  id | something | count | count
> ----+-----------+-------+-------
>   2 | Example 2 |     0 |     0
>   3 | Example 3 |     3 |     1
>   1 | Example 1 |     1 |     1
> (3 rows)
>
> Instead I got:
>  id | something | count | count
> ----+-----------+-------+-------
>   2 | Example 2 |     0 |     0
>   3 | Example 3 |     3 |     3
>   1 | Example 1 |     1 |     1
> (3 rows)

Ok, try to get result before grouping:
postgres=# SELECT
postgres-#     somedata.id
postgres-#     ,somedata.something
postgres-#         ,md1.id
postgres-#         ,md2.id
postgres-#
postgres-# --     ,count(md1.id)
postgres-# --     ,count(md2.id)
postgres-# FROM somedata
postgres-# LEFT JOIN moredata md1 ON (md1.otherid = somedata.id)
postgres-# LEFT JOIN moredata md2 ON (md2.otherid = somedata.id AND
md2.status NOT IN('OLD', 'DEPRECATED'))
postgres-# -- GROUP BY somedata.id, somedata.something;
postgres-# ;
 id | something | id | id
----+-----------+----+----
  1 | Example 1 |  1 |  1
  3 | Example 3 |  4 |  2
  3 | Example 3 |  3 |  2
  3 | Example 3 |  2 |  2
  2 | Example  +|    |
    | 2         |    |
(5 rows)

So it is very clear why count(md1.id) for somedata.id=3 gives result of 3...


> Running the searches with individual joins:
>
> SELECT somedata.id, somedata.something, count(md2.id)
> FROM somedata
> LEFT JOIN moredata md2 ON (md2.otherid = somedata.id AND md2.status NOT IN
> ('OLD', 'DEPRECATED'))
> GROUP BY somedata.id, somedata.something;
>  id | something | count
> ----+-----------+-------
>   2 | Example 2 |     0
>   3 | Example 3 |     1
>   1 | Example 1 |     1

Here is a non-grouped result (without the first joining there are just
few rows for grouping):

postgres=# SELECT somedata.id, somedata.something, md2.id --count(md2.id)
postgres-# FROM somedata
postgres-# LEFT JOIN moredata md2 ON (md2.otherid = somedata.id AND
md2.status NOT IN ('OLD', 'DEPRECATED'))
postgres-# -- GROUP BY somedata.id, somedata.something;
postgres-# ;
 id | something | id
----+-----------+----
  1 | Example 1 |  1
  3 | Example 3 |  2
  2 | Example  +|
    | 2         |
(3 rows)


> SELECT somedata.id, somedata.something, count(md1.id)
> FROM somedata
> LEFT JOIN moredata md1 ON (md1.otherid = somedata.id)
> GROUP BY somedata.id, somedata.something;
>  id | something | count
> ----+-----------+-------
>   2 | Example 2 |     0
>   3 | Example 3 |     3
>   1 | Example 1 |     1
>
>
> Am I misunderstanding something, or is some optimization messing-up my
> data?

So it works as expected.

--
Best regards,
Vitaly Burovoy