Re: DRAFT: Pass sk_attno to consistent function

Поиск
Список
Период
Сортировка
От Michał Kłeczek
Тема Re: DRAFT: Pass sk_attno to consistent function
Дата
Msg-id E17B60DE-42A3-4AAD-81C4-56CFF1D4CDD5@kleczek.org
обсуждение исходный текст
Ответ на Re: DRAFT: Pass sk_attno to consistent function  (Michał Kłeczek <michal@kleczek.org>)
Ответы Re: DRAFT: Pass sk_attno to consistent function  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список pgsql-hackers
Hi All,

Since it looks like there is not much interest in the patch I will try to provide some background to explain why I
thinkit is needed. 

We are in the process of migration from an old db platform to PostgreSQL.
Our database is around 10TB big and contains around 10 billion financial transactions in a single table.
Each transaction is assigned to an account (column acc_number).

We have partitioned the table BY HASH (acc_number).

A client can query transactions belonging to his accounts using several criteria - among them is te xt search.
Queries are of type TOP N (ie ORDER BY … LIMIT ).

The list of accounts that we are querying is provided as a parameter to the query.

We have decided to use a single Gist index supporting all queries (reasons described in [1]).

There are several problems with Gist usage (but I still think we have no other choice) but the most important is
that we cannot use SAOP in our queries - since Gist does not support it the planner decides to perform Bitmap Scan
which in turn does not support ORDER BY … LIMIT well because requires Sort.

So when we use “= ANY (array of account numbers) … ORDER BY ...” the plan requires reading all records meeting
search criteria and then sort.

As a workaround we have to perform LATERAL joins:

unnest(list of account numbers) AS a(n) LATERAL JOIN (SELECT * FROM … WHERE account_number = a.n ORDER BY … LIMIT …)
ORDERBY … LIMIT … 

It is still bad because requires multiple scans of the same partition if account number hashes are the same.

What we really need is for Gist to support “= ANY (…)”.
As Gist index is extensible in terms of queries it supports it is quite easy to implement an operator class/family with
operator:

||= (text, text[])

that has semantics the same as “= ANY (…)”

With this operator we can write our queries like:

account_number ||= [list of account numbers] AND
account_number = ANY ([list of account numbers]) — redundant for partition pruning as it does not understand ||=

and have optimal plans:

Limit
— Merge Append
—— Index scan of relevant partitions

The problem is that now each partition scan is for the same list of accounts.
The “consistent” function cannot assume anything about contents of the table so it has to check all elements of the
list
and that in turn causes reading unnecessary index pages for accounts not in this partition.

What we need is a way for the “consistent” function to be able to pre-process input query array and remove elements
that are not relevant for this scan. To do that it is necessary to have enough information to read necessary metadata
fromthe catalog. 

The proposed patch addresses this need and seems (to me) largely uncontroversial as it does not break any existing
extensions.

Attached is a patch with consolidated changes (I am pretty new to managing patches so previous two were partial and not
somethingshareable, I am afraid). 

—
Michal

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jelte Fennema-Nio
Дата:
Сообщение: Re: Possibility to disable `ALTER SYSTEM`
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Possibility to disable `ALTER SYSTEM`