Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume

Поиск
Список
Период
Сортировка
От wheels
Тема Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume
Дата
Msg-id CAEF_WQR1r=h+kvqZn4Tks7Qz9vo8J8po9nQ1oNW-Ky0nTkOLKg@mail.gmail.com
обсуждение исходный текст
Ответы Re: Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume  (Lorusso Domenico <domenico.l76@gmail.com>)
Список pgsql-general
Hi Team,

I'm working with a PostgreSQL table containing terabytes of data, and
it grows by millions of rows weekly. Each row is identified by a
[KSUID][1], and my primary read patterns are:

1. Retrieve a row by its KSUID.
2. List rows by `user_id` in descending order, pagination acceptable.

Currently, the table is unpartitioned and read performance is
sluggish. I'm contemplating partitioning the table by month using the
KSUID column, [leveraging its embedded uint32 timestamp][2], something
like this:

```sql
CREATE TABLE table_y2023m09 PARTITION OF ksuid
FOR VALUES FROM ('[current_month_ts][128 zeros]') TO
('[next_month_ts][128 zeros]')
```

This allows each 'Get row by KSUID' query to be isolated to a single partition.

For listing rows by `user_id`, I'm considering keyset pagination:

```sql
SELECT *
FROM table_name
WHERE user_id = ?
  AND ksuid > last_seen_ksuid
ORDER BY ksuid
LIMIT 10;
```

However, this method still would need to search through multiple
partitions depending on `last_seen_ksuid`, but I guess that with an
index by `user_id` might be enough.

### Questions:

1. Is using KSUID as a partitioning key viable, especially given that
the column can be represented as text or bytes?
2. Is there a more efficient way to implement listing by `user_id`
other than keyset pagination?
3. Are there any pitfalls or performance issues I should be aware of
with this approach?
4. Would it be better to just partition based on `created_at` and
extract the timestamp from the ksuid on application layer and add it
explicitly to the query?

Thank you very much,
Best regards.

  [1]: https://github.com/segmentio/ksuid
  [2]: https://github.com/segmentio/ksuid#:~:text=a%2032%2Dbit%20unsigned%20integer%20UTC%20timestamp



В списке pgsql-general по дате отправления:

Предыдущее
От: Lorusso Domenico
Дата:
Сообщение: rollback to savepoint issue
Следующее
От: Erik Wienhold
Дата:
Сообщение: Re: rollback to savepoint issue