Обсуждение: workaround for expensive KNN?
hello all ...
given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price.
i did some tests:
test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
--------------------------------------Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590
rows=10loops=1) Buffers: shared hit=9 read=5004 -> Index Scan using idx_product_t_product_titleprice on t_product
(cost=0.00..13251.91rows=3224 width=16) (actual time=
36391.715..45542.573 rows=10 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@
'''iphon'''::tsquery) Order By: (int_price <-> 0::bigint) Buffers: shared hit=9 read=5004Total runtime:
45542.676ms
(7 rows)
test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
-------------------------------------Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227
rows=10loops=1) Buffers: shared hit=3 read=2316 -> Index Scan using idx_product_t_product_titleprice on t_product
(cost=0.00..29762.61rows=7255 width=16) (actual time=
7243.524..10935.217 rows=10 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@
'''handy'''::tsquery) Order By: (int_price <-> 0::bigint) Buffers: shared hit=3 read=2316Total runtime:
10935.265ms
(7 rows)
test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
-------------------------------Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
Buffers:shared hit=1 read=1577 -> Index Scan using idx_product_t_product_titleprice on t_product
(cost=0.00..29762.61rows=7255 width=16) (actual time=
28.525..28.525 rows=1 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
Order By: (int_price <-> 0::bigint) Buffers: shared hit=1 read=1577Total runtime: 28.558 ms
(7 rows)
under any circumstances - there is no way to reduce the number of buffers needed for a query like that.
if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a
painfulrandom I/O death.
is there any alternative which does not simply die when i try to achieve what i want?
the use case is quite simple: all products with a certain word (10 cheapest or so).
is there any alternative approach to this?
i was putting some hope into KNN but it seems it needs too much random I/O :(.
many thanks,
hans
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
Probably, you miss two-columnt index. From my early post: http://www.sai.msu.su/~megera/wiki/knngist =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address)); =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; id | address | dist ---------+-------------------------------------------------------------+--------------------- 366096 | 1st Floor Tour Eiffel| Champs de Mars, Paris 75007, France | 2.32488941293945e-05 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406 5200167 | Champ De Mars 75007 Paris | 0.004535645625872889301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288 2152213 |16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896 1923818 | Champ de Mars Paris, France | 0.00838214733539654 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 7395870 | 39 Rue Champ De Mars Paris, France | 0.008744102345695294358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339 1923742 |12 rue du Champ de Mars Paris, France | 0.00876764731845995 (10 rows) Time: 7.859 ms =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit -> Index Scan using spots_idx on spots Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point)AND (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery)) (3 rows) On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: > hello all ... > > given oleg's posting before i also wanted to fire up some KNN related question. > let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. > i did some tests: > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > -------------------------------------- > Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1) > Buffers: shared hit=9 read=5004 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16) (actualtime= > 36391.715..45542.573 rows=10 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=9 read=5004 > Total runtime: 45542.676 ms > (7 rows) > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > ------------------------------------- > Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1) > Buffers: shared hit=3 read=2316 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= > 7243.524..10935.217 rows=10 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=3 read=2316 > Total runtime: 10935.265 ms > (7 rows) > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > ------------------------------- > Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) > Buffers: shared hit=1 read=1577 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= > 28.525..28.525 rows=1 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=1 read=1577 > Total runtime: 28.558 ms > (7 rows) > > > under any circumstances - there is no way to reduce the number of buffers needed for a query like that. > if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. > is there any alternative which does not simply die when i try to achieve what i want? > > the use case is quite simple: all products with a certain word (10 cheapest or so). > > is there any alternative approach to this? > i was putting some hope into KNN but it seems it needs too much random I/O :(. > > many thanks, > > hans > > -- > Cybertec Sch?nig & Sch?nig GmbH > Gr?hrm?hlgasse 26 > A-2700 Wiener Neustadt, Austria > Web: http://www.postgresql-support.de > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Hans,
what if you create index (price,title) ?
On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
> hello ...
>
> i got that one ...
>
> "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price)
>
> so, i have a combined index on text + number.
> to me the plan seems fine ... it looks like a prober KNN traversal.
> the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or
soin it (1 mio out of 11 mio or so). you are moving out from one specific place.
>
> my maths is like that:
> 11 mio in total
> 1 mio matching "iphone"
> cheapest / most expensive 10 out of this mio needed.
>
> operator classes are all nice and in place:
>
> SELECT 10 <-> 4 as distance;
> distance
> ----------
> 6
> (1 row)
>
> what does "buffers true" in your case say?
>
> many thanks,
>
> hans
>
>
> On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
>
>> Probably, you miss two-columnt index. From my early post:
>> http://www.sai.msu.su/~megera/wiki/knngist
>>
>> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address));
>> =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE
coordinates>< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@
to_tsquery('french','mars') LIMIT 10;
>> id | address | dist
---------+-------------------------------------------------------------+---------------------
>> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05
>> 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406
>> 5200167 | Champ De Mars 75007 Paris | 0.00453564562587288
>> 9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288
>> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896
>> 1923818 | Champ de Mars Paris, France | 0.00838214733539654
>> 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
>> 7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
>> 4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339
>> 1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995
>> (10 rows)
>>
>> Time: 7.859 ms
>>
>> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates ><
'(2.29470491409302,48.858263472125)'::point
>> AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10;
>>
>> QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------
>> Limit
>> -> Index Scan using spots_idx on spots
>> Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND
(to_tsvector('french'::regconfig,address) @@ '''mar'''::tsquery))
>> (3 rows)
>>
>>
>> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
>>
>>> hello all ...
>>>
>>> given oleg's posting before i also wanted to fire up some KNN related question.
>>> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price.
>>> i did some tests:
>>>
>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
>>> QUERY PLAN
>>>
>>>
-----------------------------------------------------------------------------------------------------------------------------
>>> --------------------------------------
>>> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
>>> Buffers: shared hit=9 read=5004
>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16)
(actualtime=
>>> 36391.715..45542.573 rows=10 loops=1)
>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
>>> Order By: (int_price <-> 0::bigint)
>>> Buffers: shared hit=9 read=5004
>>> Total runtime: 45542.676 ms
>>> (7 rows)
>>>
>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
>>> QUERY PLAN
>>>
>>>
-----------------------------------------------------------------------------------------------------------------------------
>>> -------------------------------------
>>> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1)
>>> Buffers: shared hit=3 read=2316
>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16)
(actualtime=
>>> 7243.524..10935.217 rows=10 loops=1)
>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
>>> Order By: (int_price <-> 0::bigint)
>>> Buffers: shared hit=3 read=2316
>>> Total runtime: 10935.265 ms
>>> (7 rows)
>>>
>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
>>> QUERY PLAN
>>>
>>>
-----------------------------------------------------------------------------------------------------------------------------
>>> -------------------------------
>>> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
>>> Buffers: shared hit=1 read=1577
>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16)
(actualtime=
>>> 28.525..28.525 rows=1 loops=1)
>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
>>> Order By: (int_price <-> 0::bigint)
>>> Buffers: shared hit=1 read=1577
>>> Total runtime: 28.558 ms
>>> (7 rows)
>>>
>>>
>>> under any circumstances - there is no way to reduce the number of buffers needed for a query like that.
>>> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a
painfulrandom I/O death.
>>> is there any alternative which does not simply die when i try to achieve what i want?
>>>
>>> the use case is quite simple: all products with a certain word (10 cheapest or so).
>>>
>>> is there any alternative approach to this?
>>> i was putting some hope into KNN but it seems it needs too much random I/O :(.
>>>
>>> many thanks,
>>>
>>> hans
>>>
>>> --
>>> Cybertec Sch?nig & Sch?nig GmbH
>>> Gr?hrm?hlgasse 26
>>> A-2700 Wiener Neustadt, Austria
>>> Web: http://www.postgresql-support.de
>>>
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
> --
> Cybertec Sch?nig & Sch?nig GmbH
> Gr?hrm?hlgasse 26
> A-2700 Wiener Neustadt, Austria
> Web: http://www.postgresql-support.de
>
Regards, Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes:
> what if you create index (price,title) ?
I think that SELECT ... WHERE ... ORDER BY ... LIMIT is basically an
intractable problem. We've recognized the difficulty in connection with
btree indexes for a long time, and there is no reason at all to think
that KNNGist will somehow magically dodge it. You can either visit
*all* of the rows satisfying WHERE (and then sort them), or you can
visit the rows in ORDER BY order and hope that you find enough of them
satisfying the WHERE in a reasonable amount of time. Either of these
strategies loses badly in many real-world cases.
Maybe with some sort of fuzzy notion of ordering it'd be possible to go
faster, but as long as you insist on an exact ORDER BY result, I don't
see any way out of it.
One way to be fuzzy is to introduce a maximum search distance:
SELECT ... WHERE x < limit AND other-conditions ORDER BY x LIMIT n
which essentially works by limiting the damage in the visit-all-the-rows
approach. Hans didn't do that in his example, but I wonder how much
it'd help (and whether the existing GIST support is adequate for it).
regards, tom lane
hello ...
i got that one ...
"idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price)
so, i have a combined index on text + number.
to me the plan seems fine ... it looks like a prober KNN traversal.
the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or so
init (1 mio out of 11 mio or so). you are moving out from one specific place.
my maths is like that:11 mio in total1 mio matching "iphone"cheapest / most expensive 10 out of this mio needed.
operator classes are all nice and in place:
SELECT 10 <-> 4 as distance;distance
---------- 6
(1 row)
what does "buffers true" in your case say?
many thanks,
hans
On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
> Probably, you miss two-columnt index. From my early post:
> http://www.sai.msu.su/~megera/wiki/knngist
>
> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address));
> =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE
coordinates>< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@
to_tsquery('french','mars') LIMIT 10;
> id | address | dist
---------+-------------------------------------------------------------+---------------------
> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05
> 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406
> 5200167 | Champ De Mars 75007 Paris | 0.00453564562587288
> 9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288
> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896
> 1923818 | Champ de Mars Paris, France | 0.00838214733539654
> 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
> 7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
> 4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339
> 1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995
> (10 rows)
>
> Time: 7.859 ms
>
> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point
> AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10;
>
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------
> Limit
> -> Index Scan using spots_idx on spots
> Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND
(to_tsvector('french'::regconfig,address) @@ '''mar'''::tsquery))
> (3 rows)
>
>
> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
>
>> hello all ...
>>
>> given oleg's posting before i also wanted to fire up some KNN related question.
>> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price.
>> i did some tests:
>>
>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
>> QUERY PLAN
>>
>>
-----------------------------------------------------------------------------------------------------------------------------
>> --------------------------------------
>> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
>> Buffers: shared hit=9 read=5004
>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16)
(actualtime=
>> 36391.715..45542.573 rows=10 loops=1)
>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
>> Order By: (int_price <-> 0::bigint)
>> Buffers: shared hit=9 read=5004
>> Total runtime: 45542.676 ms
>> (7 rows)
>>
>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
>> QUERY PLAN
>>
>>
-----------------------------------------------------------------------------------------------------------------------------
>> -------------------------------------
>> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1)
>> Buffers: shared hit=3 read=2316
>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16)
(actualtime=
>> 7243.524..10935.217 rows=10 loops=1)
>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
>> Order By: (int_price <-> 0::bigint)
>> Buffers: shared hit=3 read=2316
>> Total runtime: 10935.265 ms
>> (7 rows)
>>
>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
>> QUERY PLAN
>>
>>
-----------------------------------------------------------------------------------------------------------------------------
>> -------------------------------
>> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
>> Buffers: shared hit=1 read=1577
>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16)
(actualtime=
>> 28.525..28.525 rows=1 loops=1)
>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
>> Order By: (int_price <-> 0::bigint)
>> Buffers: shared hit=1 read=1577
>> Total runtime: 28.558 ms
>> (7 rows)
>>
>>
>> under any circumstances - there is no way to reduce the number of buffers needed for a query like that.
>> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a
painfulrandom I/O death.
>> is there any alternative which does not simply die when i try to achieve what i want?
>>
>> the use case is quite simple: all products with a certain word (10 cheapest or so).
>>
>> is there any alternative approach to this?
>> i was putting some hope into KNN but it seems it needs too much random I/O :(.
>>
>> many thanks,
>>
>> hans
>>
>> --
>> Cybertec Sch?nig & Sch?nig GmbH
>> Gr?hrm?hlgasse 26
>> A-2700 Wiener Neustadt, Austria
>> Web: http://www.postgresql-support.de
>>
>>
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
Oops, my previous example was fromm early prototype :) I just
recreated test environment for 9.1:
knn=# select count(*) from spots; count
-------- 908846
(1 row)
knn=# explain (analyze true, buffers true) SELECT id, address, (coordinates <->
'(2.29470491409302,48.858263472125)'::point)AS dist FROM spots
WHERE to_tsvector('french',address) @@ to_tsquery('french','mars')
ORDER BY coordinates <-> '(2.29470491409302,48.858263472125)'::point
LIMIT 10; QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..33.63 rows=10 width=58) (actual time=1.541..1.875 rows=10 loops=1) Buffers: shared hit=251 ->
IndexScan using spots_idx on spots (cost=0.00..15279.12 rows=4544 width=58) (actual time=1.540..1.874 rows=10 loops=1)
Index Cond: (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery) Order By: (coordinates <->
'(2.29470491409302,48.858263472125)'::point) Buffers: shared hit=251 Total runtime: 1.905 ms
(7 rows)
Time: 2.372 ms
On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
> hello ...
>
> i got that one ...
>
> "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price)
>
> so, i have a combined index on text + number.
> to me the plan seems fine ... it looks like a prober KNN traversal.
> the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or
soin it (1 mio out of 11 mio or so). you are moving out from one specific place.
>
> my maths is like that:
> 11 mio in total
> 1 mio matching "iphone"
> cheapest / most expensive 10 out of this mio needed.
>
> operator classes are all nice and in place:
>
> SELECT 10 <-> 4 as distance;
> distance
> ----------
> 6
> (1 row)
>
> what does "buffers true" in your case say?
>
> many thanks,
>
> hans
>
>
> On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
>
>> Probably, you miss two-columnt index. From my early post:
>> http://www.sai.msu.su/~megera/wiki/knngist
>>
>> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address));
>> =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE
coordinates>< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@
to_tsquery('french','mars') LIMIT 10;
>> id | address | dist
---------+-------------------------------------------------------------+---------------------
>> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05
>> 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406
>> 5200167 | Champ De Mars 75007 Paris | 0.00453564562587288
>> 9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288
>> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896
>> 1923818 | Champ de Mars Paris, France | 0.00838214733539654
>> 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
>> 7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
>> 4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339
>> 1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995
>> (10 rows)
>>
>> Time: 7.859 ms
>>
>> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates ><
'(2.29470491409302,48.858263472125)'::point
>> AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10;
>>
>> QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------
>> Limit
>> -> Index Scan using spots_idx on spots
>> Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND
(to_tsvector('french'::regconfig,address) @@ '''mar'''::tsquery))
>> (3 rows)
>>
>>
>> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
>>
>>> hello all ...
>>>
>>> given oleg's posting before i also wanted to fire up some KNN related question.
>>> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price.
>>> i did some tests:
>>>
>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
>>> QUERY PLAN
>>>
>>>
-----------------------------------------------------------------------------------------------------------------------------
>>> --------------------------------------
>>> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
>>> Buffers: shared hit=9 read=5004
>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16)
(actualtime=
>>> 36391.715..45542.573 rows=10 loops=1)
>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
>>> Order By: (int_price <-> 0::bigint)
>>> Buffers: shared hit=9 read=5004
>>> Total runtime: 45542.676 ms
>>> (7 rows)
>>>
>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
>>> QUERY PLAN
>>>
>>>
-----------------------------------------------------------------------------------------------------------------------------
>>> -------------------------------------
>>> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1)
>>> Buffers: shared hit=3 read=2316
>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16)
(actualtime=
>>> 7243.524..10935.217 rows=10 loops=1)
>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
>>> Order By: (int_price <-> 0::bigint)
>>> Buffers: shared hit=3 read=2316
>>> Total runtime: 10935.265 ms
>>> (7 rows)
>>>
>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
>>> QUERY PLAN
>>>
>>>
-----------------------------------------------------------------------------------------------------------------------------
>>> -------------------------------
>>> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
>>> Buffers: shared hit=1 read=1577
>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16)
(actualtime=
>>> 28.525..28.525 rows=1 loops=1)
>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
>>> Order By: (int_price <-> 0::bigint)
>>> Buffers: shared hit=1 read=1577
>>> Total runtime: 28.558 ms
>>> (7 rows)
>>>
>>>
>>> under any circumstances - there is no way to reduce the number of buffers needed for a query like that.
>>> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a
painfulrandom I/O death.
>>> is there any alternative which does not simply die when i try to achieve what i want?
>>>
>>> the use case is quite simple: all products with a certain word (10 cheapest or so).
>>>
>>> is there any alternative approach to this?
>>> i was putting some hope into KNN but it seems it needs too much random I/O :(.
>>>
>>> many thanks,
>>>
>>> hans
>>>
>>> --
>>> Cybertec Sch?nig & Sch?nig GmbH
>>> Gr?hrm?hlgasse 26
>>> A-2700 Wiener Neustadt, Austria
>>> Web: http://www.postgresql-support.de
>>>
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
> --
> Cybertec Sch?nig & Sch?nig GmbH
> Gr?hrm?hlgasse 26
> A-2700 Wiener Neustadt, Austria
> Web: http://www.postgresql-support.de
>
Regards, Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
price has a problem :(.
"iphone" can be a 20 cents bag or a sticker or a 900 euro thing signed by whoever ...
so, words and the sort-number / price are not related in anyway. price is in this case no way to narrow down the
problem(e.g. evaluate first or so).
many thanks,
hans
On Apr 8, 2011, at 5:25 PM, Oleg Bartunov wrote:
> Hans,
>
> what if you create index (price,title) ?
>
>
> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
>
>> hello ...
>>
>> i got that one ...
>>
>> "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, title), int_price)
>>
>> so, i have a combined index on text + number.
>> to me the plan seems fine ... it looks like a prober KNN traversal.
>> the difference between my plan and your plan seems to be the fact that i have, say, 1 mio rows which have "handy" or
soin it (1 mio out of 11 mio or so). you are moving out from one specific place.
>>
>> my maths is like that:
>> 11 mio in total
>> 1 mio matching "iphone"
>> cheapest / most expensive 10 out of this mio needed.
>>
>> operator classes are all nice and in place:
>>
>> SELECT 10 <-> 4 as distance;
>> distance
>> ----------
>> 6
>> (1 row)
>>
>> what does "buffers true" in your case say?
>>
>> many thanks,
>>
>> hans
>>
>>
>> On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
>>
>>> Probably, you miss two-columnt index. From my early post:
>>> http://www.sai.msu.su/~megera/wiki/knngist
>>>
>>> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address));
>>> =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE
coordinates>< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@
to_tsquery('french','mars') LIMIT 10;
>>> id | address | dist
---------+-------------------------------------------------------------+---------------------
>>> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05
>>> 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406
>>> 5200167 | Champ De Mars 75007 Paris | 0.00453564562587288
>>> 9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288
>>> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896
>>> 1923818 | Champ de Mars Paris, France | 0.00838214733539654
>>> 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
>>> 7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
>>> 4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339
>>> 1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995
>>> (10 rows)
>>>
>>> Time: 7.859 ms
>>>
>>> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates ><
'(2.29470491409302,48.858263472125)'::point
>>> AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10;
>>>
>>> QUERY PLAN
>>>
--------------------------------------------------------------------------------------------------------------------------------------
>>> Limit
>>> -> Index Scan using spots_idx on spots
>>> Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND
(to_tsvector('french'::regconfig,address) @@ '''mar'''::tsquery))
>>> (3 rows)
>>>
>>>
>>> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
>>>
>>>> hello all ...
>>>>
>>>> given oleg's posting before i also wanted to fire up some KNN related question.
>>>> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price.
>>>> i did some tests:
>>>>
>>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
>>>> QUERY PLAN
>>>>
>>>>
-----------------------------------------------------------------------------------------------------------------------------
>>>> --------------------------------------
>>>> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
>>>> Buffers: shared hit=9 read=5004
>>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16)
(actualtime=
>>>> 36391.715..45542.573 rows=10 loops=1)
>>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
>>>> Order By: (int_price <-> 0::bigint)
>>>> Buffers: shared hit=9 read=5004
>>>> Total runtime: 45542.676 ms
>>>> (7 rows)
>>>>
>>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
>>>> QUERY PLAN
>>>>
>>>>
-----------------------------------------------------------------------------------------------------------------------------
>>>> -------------------------------------
>>>> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1)
>>>> Buffers: shared hit=3 read=2316
>>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16)
(actualtime=
>>>> 7243.524..10935.217 rows=10 loops=1)
>>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
>>>> Order By: (int_price <-> 0::bigint)
>>>> Buffers: shared hit=3 read=2316
>>>> Total runtime: 10935.265 ms
>>>> (7 rows)
>>>>
>>>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE
to_tsvector('german',title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
>>>> QUERY PLAN
>>>>
>>>>
-----------------------------------------------------------------------------------------------------------------------------
>>>> -------------------------------
>>>> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
>>>> Buffers: shared hit=1 read=1577
>>>> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16)
(actualtime=
>>>> 28.525..28.525 rows=1 loops=1)
>>>> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
>>>> Order By: (int_price <-> 0::bigint)
>>>> Buffers: shared hit=1 read=1577
>>>> Total runtime: 28.558 ms
>>>> (7 rows)
>>>>
>>>>
>>>> under any circumstances - there is no way to reduce the number of buffers needed for a query like that.
>>>> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a
painfulrandom I/O death.
>>>> is there any alternative which does not simply die when i try to achieve what i want?
>>>>
>>>> the use case is quite simple: all products with a certain word (10 cheapest or so).
>>>>
>>>> is there any alternative approach to this?
>>>> i was putting some hope into KNN but it seems it needs too much random I/O :(.
>>>>
>>>> many thanks,
>>>>
>>>> hans
>>>>
>>>> --
>>>> Cybertec Sch?nig & Sch?nig GmbH
>>>> Gr?hrm?hlgasse 26
>>>> A-2700 Wiener Neustadt, Austria
>>>> Web: http://www.postgresql-support.de
>>>>
>>>>
>>>>
>>>
>>> Regards,
>>> Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>> Sternberg Astronomical Institute, Moscow University, Russia
>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>
>>> --
>>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>>
>>
>> --
>> Cybertec Sch?nig & Sch?nig GmbH
>> Gr?hrm?hlgasse 26
>> A-2700 Wiener Neustadt, Austria
>> Web: http://www.postgresql-support.de
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de