Обсуждение: Can null values be sorted low?
The problem...
From the table below I want to select distinct clus_id and the data in
3 db columns, even if that data is null.
When there are multiple contigs for a clus_id I want to select the row
where 1st, the most db columns have a value, 2nd the total value of the
db columns is highest.
So for
LRC00006 I want contig 3,
LRC00010 I want contig 4, because it has 3 db values (even though contig
2 would total more)
LRC00001 I want contig 2, because it would total more if the null value
was treated as zero in the calculation
and for the rest I want contig 1.
Any hints on writing such a query would be much appreciated.
lumbribase=# select clus_id,contig,db0,db1,db4,(db0+db1+db4) as total
from venn order by clus_id,total DESC;
clus_id | contig | db0 | db1 | db4 | total
----------+--------+------+------+------+-------
LRC00001 | 1 | 77.4 | | 63.9 |
LRC00001 | 2 | 77.4 | | 82.7 |
LRC00002 | 1 | 325 | 343 | 313 | 981
LRC00003 | 1 | | | |
LRC00004 | 1 | | | |
LRC00005 | 1 | 294 | 294 | 116 | 704
LRC00006 | 1 | 100 | 72.8 | |
LRC00006 | 3 | 120 | 122 | 63.9 | 305.9
LRC00006 | 2 | 117 | 112 | 58.5 | 287.5
LRC00007 | 1 | 178 | | |
LRC00008 | 1 | | | |
LRC00009 | 1 | 416 | | |
LRC00010 | 2 | 324 | 167 | |
LRC00010 | 4 | 146 | 168 | 172 | 486
LRC00010 | 1 | 146 | 166 | 171 | 483
LRC00010 | 3 | 145 | 160 | 159 | 464
LRC00011 | 1 | 179 | 100 | 95.1 | 374.1
LRC00012 | 1 | 639 | 639 | 633 | 1911
LRC00012 | 3 | 505 | 509 | 508 | 1522
LRC00012 | 2 | 390 | 391 | 392 | 1173
--
Ann
"In a world without walls and fences - who needs Windows and Gates ?"
(unknown)
--- ann hedley <ann.hedley@ed.ac.uk> wrote: > The problem... > From the table below I want to select distinct > clus_id and the data in > 3 db columns, even if that data is null. > > When there are multiple contigs for a clus_id I want > to select the row > where 1st, the most db columns have a value, 2nd the > total value of the > db columns is highest. > > So for > LRC00006 I want contig 3, > LRC00010 I want contig 4, because it has 3 db values > (even though contig > 2 would total more) > LRC00001 I want contig 2, because it would total > more if the null value > was treated as zero in the calculation > and for the rest I want contig 1. > > Any hints on writing such a query would be much If you don't mind using a PostgreSQL-only extension, "SELECT DISTINCT ON" may do what you want: check the "SELECT" page in the "SQL Commands" section of the manual. You will also want to use "coalesce" to replace null values, e.g. "select coalesce(db0,0)" > appreciated. > > lumbribase=# select > clus_id,contig,db0,db1,db4,(db0+db1+db4) as total > from venn order by clus_id,total DESC; > clus_id | contig | db0 | db1 | db4 | total > ----------+--------+------+------+------+------- > LRC00001 | 1 | 77.4 | | 63.9 | > LRC00001 | 2 | 77.4 | | 82.7 | > LRC00002 | 1 | 325 | 343 | 313 | 981 > LRC00003 | 1 | | | | > LRC00004 | 1 | | | | > LRC00005 | 1 | 294 | 294 | 116 | 704 > LRC00006 | 1 | 100 | 72.8 | | > LRC00006 | 3 | 120 | 122 | 63.9 | 305.9 > LRC00006 | 2 | 117 | 112 | 58.5 | 287.5 > LRC00007 | 1 | 178 | | | > LRC00008 | 1 | | | | > LRC00009 | 1 | 416 | | | > LRC00010 | 2 | 324 | 167 | | > LRC00010 | 4 | 146 | 168 | 172 | 486 > LRC00010 | 1 | 146 | 166 | 171 | 483 > LRC00010 | 3 | 145 | 160 | 159 | 464 > LRC00011 | 1 | 179 | 100 | 95.1 | 374.1 > LRC00012 | 1 | 639 | 639 | 633 | 1911 > LRC00012 | 3 | 505 | 509 | 508 | 1522 > LRC00012 | 2 | 390 | 391 | 392 | 1173 > > > > -- > Ann > > "In a world without walls and fences - who needs > Windows and Gates ?" > > (unknown) > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to > majordomo@postgresql.org > __________________________________ Yahoo! Mail Mobile Take Yahoo! Mail with you! Check email on your mobile phone. http://mobile.yahoo.com/learn/mail
OK
If your interested this is my solution to the problem. I'm sure there
are more efficient ways but this way works and that's what's important
at the moment.
While looking up Jeff's tips I discovered the CASE clause and used it to
generate two columns, a count of the dbs and a total across the dbs
avoiding the null values. I could then order these to get the required
row first.
SELECT distinct on (clus_id) clus_id,contig,db0,db1,db4,
CASE
WHEN db0 is not null and
db1 is not null and
db4 is not null
THEN 3
WHEN db0 is not null and
db1 is not null
THEN 2
WHEN db0 is not null and
db4 is not null
THEN 2
WHEN db1 is not null and
db4 is not null
THEN 2
END as db_count,
CASE
WHEN db0 is not null and
db1 is not null and
db4 is not null
THEN (db0+db1+db4)
WHEN db0 is not null and
db1 is not null
THEN (db0+db1)
WHEN db0 is not null and
db4 is not null
THEN (db0+db4)
WHEN db1 is not null and
db4 is not null
THEN (db0+db1)
WHEN db0 is not null
THEN db0
WHEN db1 is not null
THEN db1
WHEN db4 is not null
THEN db4
END as total
from venn order by clus_id,db_count DESC,total DESC;
clus_id | contig | db0 | db1 | db4 | db_count | total
----------+--------+---------+------+------+----------+-------
LRC00001 | 2 | 77.4 | | 82.7 | 2 | 160.1
LRC00002 | 1 | 325 | 343 | 313 | 3 | 981
LRC00003 | 1 | | | | |
LRC00004 | 1 | | | | |
LRC00005 | 1 | 294 | 294 | 116 | 3 | 704
LRC00006 | 3 | 120 | 122 | 63.9 | 3 | 305.9
LRC00007 | 1 | 178 | | | | 178
LRC00008 | 1 | | | | |
LRC00009 | 1 | 416 | | | | 416
LRC00010 | 4 | 146 | 168 | 172 | 3 | 486
LRC00011 | 1 | 179 | 100 | 95.1 | 3 | 374.1
LRC00012 | 1 | 639 | 639 | 633 | 3 | 1911
Jeff Eckermann wrote:
>--- ann hedley <ann.hedley@ed.ac.uk> wrote:
>
>
>>The problem...
>> From the table below I want to select distinct
>>clus_id and the data in
>>3 db columns, even if that data is null.
>>
>>When there are multiple contigs for a clus_id I want
>>to select the row
>>where 1st, the most db columns have a value, 2nd the
>>total value of the
>>db columns is highest.
>>
>>So for
>>LRC00006 I want contig 3,
>>LRC00010 I want contig 4, because it has 3 db values
>>(even though contig
>>2 would total more)
>>LRC00001 I want contig 2, because it would total
>>more if the null value
>>was treated as zero in the calculation
>>and for the rest I want contig 1.
>>
>>Any hints on writing such a query would be much
>>
>>
>
>If you don't mind using a PostgreSQL-only extension,
>"SELECT DISTINCT ON" may do what you want: check the
>"SELECT" page in the "SQL Commands" section of the
>manual.
>
>You will also want to use "coalesce" to replace null
>values, e.g. "select coalesce(db0,0)"
>
>
>
>>appreciated.
>>
>>lumbribase=# select
>>clus_id,contig,db0,db1,db4,(db0+db1+db4) as total
>>from venn order by clus_id,total DESC;
>> clus_id | contig | db0 | db1 | db4 | total
>>----------+--------+------+------+------+-------
>> LRC00001 | 1 | 77.4 | | 63.9 |
>> LRC00001 | 2 | 77.4 | | 82.7 |
>> LRC00002 | 1 | 325 | 343 | 313 | 981
>> LRC00003 | 1 | | | |
>> LRC00004 | 1 | | | |
>> LRC00005 | 1 | 294 | 294 | 116 | 704
>> LRC00006 | 1 | 100 | 72.8 | |
>> LRC00006 | 3 | 120 | 122 | 63.9 | 305.9
>> LRC00006 | 2 | 117 | 112 | 58.5 | 287.5
>> LRC00007 | 1 | 178 | | |
>> LRC00008 | 1 | | | |
>> LRC00009 | 1 | 416 | | |
>> LRC00010 | 2 | 324 | 167 | |
>> LRC00010 | 4 | 146 | 168 | 172 | 486
>> LRC00010 | 1 | 146 | 166 | 171 | 483
>> LRC00010 | 3 | 145 | 160 | 159 | 464
>> LRC00011 | 1 | 179 | 100 | 95.1 | 374.1
>> LRC00012 | 1 | 639 | 639 | 633 | 1911
>> LRC00012 | 3 | 505 | 509 | 508 | 1522
>> LRC00012 | 2 | 390 | 391 | 392 | 1173
>>
>>
>>
>>--
>>Ann
>>
>>"In a world without walls and fences - who needs
>>Windows and Gates ?"
>>
>> (unknown)
>>
>>
>>
>>---------------------------(end of
>>broadcast)---------------------------
>>TIP 1: subscribe and unsubscribe commands go to
>>majordomo@postgresql.org
>>
>>
>>
>
>
>
>__________________________________
>Yahoo! Mail Mobile
>Take Yahoo! Mail with you! Check email on your mobile phone.
>http://mobile.yahoo.com/learn/mail
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
>
--
Ann
"In a world without walls and fences - who needs Windows and Gates ?"
(unknown)