Обсуждение: Index for range queries on JSON (user defined fields)

Поиск
Список
Период
Сортировка

Index for range queries on JSON (user defined fields)

От
Marco Colli
Дата:
Hello!

We have a multi-tenant service where each customer has millions of users (total: ~150M rows). Now we would like to let each customer define some custom columns for his users and then let the customer search his users efficiently based on these columns.

This problem seems really hard to solve with PostgreSQL:

In particular the easiest way would be to add a JSON field on the users table (e.g. user metadata). However the PostgreSQL GIN index only supports exact matches and not range queries. This means that a query on a range (e.g. age > 30) would be extremely inefficient and would result in a table scan.

Algorithmically it seems possible to use a GIN index (based on btree) for a range query. Also MongoDB seems to support something similar (https://docs.mongodb.com/manual/core/index-wildcard/).

Are there any plans to add support for range queries to GIN indexes (on JSON) in the future versions of PostgreSQL?


Marco Colli
Pushpad

Re: Index for range queries on JSON (user defined fields)

От
Nick Cleaton
Дата:
On Fri, 4 Dec 2020 at 15:39, Marco Colli <collimarco91@gmail.com> wrote:
Hello!

We have a multi-tenant service where each customer has millions of users (total: ~150M rows). Now we would like to let each customer define some custom columns for his users and then let the customer search his users efficiently based on these columns.

This problem seems really hard to solve with PostgreSQL:

In particular the easiest way would be to add a JSON field on the users table (e.g. user metadata). However the PostgreSQL GIN index only supports exact matches and not range queries. This means that a query on a range (e.g. age > 30) would be extremely inefficient and would result in a table scan.

You could have a table of (tenant, customer, setting_name, setting_value) so that a btree index on (tenant, setting_name, setting_value) would work for "select customer from my_table where tenant=$1 and setting_name='age' and setting_value > 30"

That doesn't deal with setting values having a variety of types, but you could have a distinct user defined settings table for each setting value type that you want to support.

Re: Index for range queries on JSON (user defined fields)

От
Marco Colli
Дата:
Thanks for the suggestion: I had already considered that solution (first link), but the fear is having to JOIN large tables with hundreds of millions of records. 

For my understanding **using JOIN when dealing with big data is bad and a nightmare for performance**: can you confirm? Or am I missing something?

That tables would be frequently read and updated and are the core of the application: that also means that every update on a user would produce **many dead rows** - not just 1 user row, as in the case of JSON, but many rows in the user metadata table.





On Fri, Dec 4, 2020 at 11:40 PM Nick Cleaton <nick@cleaton.net> wrote:
On Fri, 4 Dec 2020 at 15:39, Marco Colli <collimarco91@gmail.com> wrote:
Hello!

We have a multi-tenant service where each customer has millions of users (total: ~150M rows). Now we would like to let each customer define some custom columns for his users and then let the customer search his users efficiently based on these columns.

This problem seems really hard to solve with PostgreSQL:

In particular the easiest way would be to add a JSON field on the users table (e.g. user metadata). However the PostgreSQL GIN index only supports exact matches and not range queries. This means that a query on a range (e.g. age > 30) would be extremely inefficient and would result in a table scan.

You could have a table of (tenant, customer, setting_name, setting_value) so that a btree index on (tenant, setting_name, setting_value) would work for "select customer from my_table where tenant=$1 and setting_name='age' and setting_value > 30"

That doesn't deal with setting values having a variety of types, but you could have a distinct user defined settings table for each setting value type that you want to support.