Re: Indexing on JSONB field not working
From:           Tomas Vondra
Subject:        Re: Indexing on JSONB field not working
Date:
Msg-id:         20191221114003.yo56lv3o6thdvxtn@development
In response to: Indexing on JSONB field not working ("Zhihong Zhang" <zhihong@gmail.com>)
Responses:      Re: Indexing on JSONB field not working
List:           pgsql-bugs
On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!
>
>
>SET enable_seqscan = off;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
>"Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>"  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
>"        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"Planning Time: 1.617 ms"
>"Execution Time: 0.276 ms"
>
>SET enable_seqscan = on;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
>"Limit  (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
>"  ->  Seq Scan on assets  (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)"
>"        Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"        Rows Removed by Filter: 2602824"
>"Planning Time: 0.283 ms"
>"Execution Time: 313501.777 ms"
>

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper than the index scan (107 vs. 313), but that
does not match the actual performance. The question is why ...

For LIMIT queries, I can think of two common issues - the cost estimate is
computed as a simple linear approximation over the input relation. For
example, we know the seqscan is expected to produce 867607 rows with a
total cost of 936605, so the cost of producing just 100 rows is

    100 * 936605 / 867607 = 107.95

But that assumes a number of things: (a) that the seqscan row estimate is
correct, and (b) that the matching rows are uniformly distributed in the
table. If the estimate is off, or if the matching rows are towards the end
of the relation (i.e. only reached after doing most of the costed work),
this estimate may be quite off.

Can you do explain analyze of the query without the LIMIT?

BTW a LIMIT without an ORDER BY is a bit strange. Also, maybe you could
try using a partial index (if the where condition does not change).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
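For reference, a minimal sketch of the two suggestions above, assuming the
same table and expression as in the plans quoted earlier; the index name
below is illustrative and does not appear in the original report:

    -- EXPLAIN ANALYZE of the same query without the LIMIT, to check whether
    -- the planner's row estimate (867607) is anywhere near the 7 rows that
    -- actually qualify:
    EXPLAIN ANALYZE
    SELECT id, _doc->>'floatValue'
    FROM assets
    WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;

    -- A partial index covering only rows that satisfy the predicate; this
    -- only helps if the WHERE condition is fixed across queries:
    CREATE INDEX assets_floatvalue_lt3_idx
        ON assets (((_doc #> '{floatValue}'::text[])::double precision))
        WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;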