Re: again on index usage
From | Daniel Kalchev |
---|---|
Subject | Re: again on index usage |
Date | |
Msg-id | 200201080922.LAA02480@dcave.digsys.bg |
In reply to | Re: again on index usage (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: again on index usage |
List | pgsql-hackers |
>>> Tom Lane said:
> It's difficult to tell from this what it thinks the selectivity of the
> ipdate index would be, since the rows estimate includes the effect of
> the ipaddr and router restrictions. What do you get from just
>
> explain
> SELECT sum(input) FROM iplog_gate200112
> WHERE
> '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02 00:00:00+02' AND
> '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02';

Same result (sorry, should have included this originally):

Aggregate  (cost=47721.72..47721.72 rows=1 width=8)
  ->  Seq Scan on iplog_gate200112  (cost=0.00..47579.54 rows=56873 width=8)

> If you say "set enable_seqscan to off", does that change the plan?

Yes. As expected (I no longer have the problem of NaN estimates :)

Aggregate  (cost=100359.71..100359.71 rows=1 width=8)
  ->  Index Scan using iplog_gate200112_ipdate_idx on iplog_gate200112  (cost=0.00..100217.52 rows=56873 width=8)

My belief is that the planner does not want to use the index because of the low
dispersion of values in the indexed attribute. When the table is split into
several smaller tables, the index is used. This bites me, because each such
query takes at least 3 minutes and the script that generates them needs to
execute a few thousand queries.

> BTW, the planner does not associate function calls with indexes. If you
> want to have the ipaddr index considered for this query, you need to write
> ipaddr <<= '193.68.240.0/20' not network_subeq(ipaddr, '193.68.240.0/20').
> (But IIRC, that only works in 7.2 anyway, not earlier releases :-()

This is what I thought too, but using the ipdate index will be sufficient.

I understand my complaint is not a bug, but rather a question of proper planner
optimization (it worked 'as expected' in 7.0). Perhaps the planner should
consider the total number of rows as well as the dispersion factor. With the
dispersion being around 1.5% and 1.7 million rows in total, that gives about
25k rows with the same value - large enough to trigger a sequential scan, as
far as I understand it, but the cost of scanning 1.7 million rows sequentially
is just too high.

By the way, the same query takes approx. 10 seconds with set enable_seqscan
set to off.

Daniel
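
For reference, a minimal sketch of the two workarounds discussed above:
disabling sequential scans for the session, and writing the network test with
the <<= operator so the ipaddr index can be considered (7.2 and later). The
table, column, and index names are taken from the plans quoted in this
message; the exact WHERE clause shown here is an assumption.

    -- Sketch only; names come from the plans above, the predicate is assumed.
    SET enable_seqscan TO off;   -- discourage the sequential scan for this session

    EXPLAIN
    SELECT sum(input)
      FROM iplog_gate200112
     WHERE '2001-12-01 00:00:00+02' <= ipdate
       AND ipdate < '2002-01-01 00:00:00+02'
       AND ipaddr <<= '193.68.240.0/20';  -- operator form is indexable, unlike
                                          -- network_subeq(ipaddr, '193.68.240.0/20')

    SET enable_seqscan TO on;    -- restore the default afterwards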