Re: Index-only scans for GIST

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Index-only scans for GIST
Дата
Msg-id 53E2906E.8080707@vmware.com
обсуждение исходный текст
Ответ на Index-only scans for GIST  (Anastasia Lubennikova <lubennikovaav@gmail.com>)
Ответы Re: Index-only scans for GIST
Список pgsql-hackers
On 08/01/2014 10:58 AM, Anastasia Lubennikova wrote:
> Hi, hackers!
> I work on a GSoC project "Index-only scans for GIST"
> https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
>
> Repository is
> https://github.com/lubennikovaav/postgres/tree/indexonlygist2
> Patch is in attachments.

Thanks!

Some comments:

* I got this compiler warning:

gistget.c:556:5: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]     ListCell *tmpPageData = so->curPageData;     ^

* I'm getting two regression failures with this (opr_sanity and join).

* After merging with master, build fails because of duplicate OIDs.

* The regression test queries that use LIMIT are not guaranteed to 
always return the same rows, hence they're not very good regression test 
cases. I'd suggest using more restricting WHERE clauses, so that each 
query only returns a handful of rows.

* What's the reason for turning GISTScanOpaqueData.pageData from an 
array to a List?

* I think it's leaking memory, in GIST scan context. I tested this with 
a variant of the regression tests:

insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 
0.05*i)),                         point(0.05*i, 0.05*i) FROM generate_series(0, 
10000000) as i;
CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze  select p from gist_tbl where p <@ box(point(0,0), 
point(9999999,9999999)) and length(p::text) < 10;

while the final query runs, 'top' shows constantly increasing memory usage.

> It includes index-only scans for multicolumn GIST and new regression test.
> Fetch() method is realized for box and point opclasses.

Can we have Fetch functions for all the datatypes in btree_gist contrib 
module, please? Do other contrib modules contain GiST opclasses that 
could have Fetch functions?

> Documentation is not updated yet, but I'm going to do it till the end of
> GSoC.
>
> I've got one question about query with OR condition. It is the last query
> in regression test "gist_indexonly". It doesn't fail but it doensn't use
> index-only scans. Could someone explain to me how it works?
> It seems to depend on build_paths_for_OR
> <http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2>
> function.
> But I couldn't understand how.

The query is:

select * from gist_tbl
where b <@ box(point(5,5),  point(6,6))
or p <@ box(point(0,0),  point(100,100)) limit 10;

It cannot use an index(-only) scan for this, because a single index scan 
can only return rows based on one key. In this case, you need to do two 
scans, and then return the rows returned by either scan, removing 
duplicates. A bitmap scan is possible, because it can remove the 
duplicates, but the planner can't produce a plain index scan plan that 
would do the same.

A common trick when that happens in a real-world application is to 
re-write the query using UNION:

select * from gist_tbl
where b <@ box(point(5,5),  point(6,6))
UNION
select * from gist_tbl
where p <@ box(point(0,0),  point(100,100))
limit 10;

Although that doesn't seem to actually work:

ERROR:  could not identify an equality operator for type box
LINE 1: select * from gist_tbl               ^

but that's not your patch's fault, the same happens with unpatched master.

IOW, you don't need to worry about that case.

- Heikki




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: select_common_type()'s behavior doesn't match the documentation