Re: [HACKERS] What about LIMIT in SELECT ?

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: [HACKERS] What about LIMIT in SELECT ?
Дата
Msg-id 199810141827.OAA29639@candle.pha.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] What about LIMIT in SELECT ?  (Bruce Momjian <maillist@candle.pha.pa.us>)
Ответы Re: [HACKERS] What about LIMIT in SELECT ?  (jwieck@debis.com (Jan Wieck))
Список pgsql-hackers
> Thomas is correct on this.  Vadim has run some tests, and with our
> optimized psort() code, the in-memory sort is often faster than using
> the index to get the tuple, because you are jumping all over the drive.
> I don't remember, but obviously there is a break-even point where
> getting X rows using the index on a table of Y rows is faster , but
> getting X+1 rows on a table of Y rows is faster getting all the rows
> sequentailly, and doing the sort.
>
> You would have to pick only certain queries(no joins, index matches
> ORDER BY), take the number of rows requested, and the number of rows
> selected, and figure out if it is faster to use the index, or a
> sequential scan and do the ORDER BY yourself.
>
> Add to this the OFFSET capability.  I am not sure how you are going to
> get into the index and start at the n-th entry, unless perhaps you just
> sequential scan the index.
>
> In fact, many queries just get column already indexed, and we could just
> pull the data right out of the index.
>
> I have added this to the TODO list:
>
>     * Pull requested data directly from indexes, bypassing heap data
>
> I think this has to be post-6.4 work, but I think we need to work in
> this direction.  I am holding off any cnfify fixes for post-6.4, but a
> 6.4.1 performance release certainly is possible.
>
>
> But, you are correct that certain cases where in index is already being
> used on a query, you could just skip the sort IF you used the index to
> get the rows from the base table.

I have had more time to think about this.  Basically, for pre-sorted
data, our psort code is very fast, because it does not need to sort
anything.  It just moves the rows in and out of the sort memory.  Yes,
it could be removed in some cases, and probably should be, but it is not
going to produce great speedups.

The more general case I will describe below.

First, let's look at a normal query:

    SELECT *
    FROM tab
    ORDER BY col1

This is not going to use an index, and probably should not because it is
faster for large tables to sort them in memory, rather than moving all
over the disk.  For small tables, if the entire table fits in the buffer
cache, it may be faster to use the index, but on a small table the sort
doesn't take very long either, and the buffer cache effectiveness is
affected by other backends using it, so it may be better not to count on
it for a speedup.

However, if you only want the first 10 rows, that is a different story.
We pull all the rows into the backend, sort them, then return 10 rows.
The query, if we could do it, should be written as:

    SELECT *
    FROM tab
    WHERE col1 < some_unknown_value
    ORDER BY col1

In this case, the optimizer looks at the column statistics, and properly
uses an index to pull only a small subset of the table.  This is the
type of behavior people want for queries returning only a few values.

But, unfortunately, we don't know that mystery value.

Now, everyone agrees we need an index matching the ORDER BY to make this
query quick, but we don't know that mystery value, so currently we
execute the whole query, and do a fetch.

What I am now thinking is that maybe we need a way to walk around that
index.  Someone months ago asked how to do that, and we told him he
couldn't, because this not a C-ISAM/dbm type database.  However, if we
could somehow pass into the query the index location we want to start
at, and how many rows we need, that would solve our problem, and perhaps
even allow joined queries to work, assuming the table in the ORDER BY is
in an outer join loop.

    SELECT *
    FROM tab
    WHERE col1 < some_unknown_value
    ORDER BY col1
    USING INDEX tab_idx(452) COUNT 100

where 452 is an 452th index entry, and COUNT is the number of index rows
you want to process.  The query may return more or less than 100 rows if
there is a join and it joins to zero or more than one row in the joined
table, but this seems like perhaps a good way to go at it.  We need to
do it this way because if a single index row returns 4 result rows, and
only two of the four rows fit in the number of rows returnd as set by the
user, it is hard to re-start the query at the proper point, because you
would have to process the index rows a second time, and return just part
of the result, and that is hard.

If the index changes, or rows are added, the results are going to be
unreliable, but that is probably going to be true of any state-less
implementation we can devise.

I think this may be fairly easy to implement.  We could sequential scan
the index to get to the 452th row.  That is going to be quick.  We can
pass the 452 into the btree index code, so only a certain range of index
tuples are returned, and the system believes it has processed the entire
query, while we know it hasn't.  Doesn't really work with hash, so we
will not allow it for those indexes.

To make it really easy, we could implement it as a 'SET' command, so we
don't actually have it as part of the query, and have to pass it around
through all the modules.  You would do the proper 'SET' before running
the query.  Optimizer would look at 'SET' value to force index use.

    SET INDEX TO tab_idx START 452 COUNT 100

or

    SET INDEX TO tab_idx FROM 452 COUNT 451

There would have to be some way to signal that the end of the index had
been reached, because returning zero rows is not enough of a guarantee
in a joined SELECT.

Comments?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Thomas G. Lockhart"
Дата:
Сообщение: Re: [DOCS] Re: [HACKERS] Alternative to LIMIT in SELECT ?
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [HACKERS] What about LIMIT in SELECT ?