Re: Feature proposal: immutable/sealed partitions (and maybe tables, too)

Поиск
Список
Период
Сортировка
От Ron
Тема Re: Feature proposal: immutable/sealed partitions (and maybe tables, too)
Дата
Msg-id 6a0551f5-3930-5cff-2886-80396f39f803@gmail.com
обсуждение исходный текст
Ответ на Feature proposal: immutable/sealed partitions (and maybe tables, too)  (Levi Aul <levi@covalenthq.com>)
Список pgsql-general
By "SEALED", do you mean "READ ONLY"?

On 9/6/22 14:39, Levi Aul wrote:
> My company runs some large OLAP data warehouses with append-only, 
> time-partitioned datasets. Our workloads involve aggregations and joins, 
> and query the data in ways not amenable to constraint-exclusion; and we 
> serve a high concurrent number of these queries at once from a single DB.
>
> In other words, our  workload is inherently one that acquires "way too 
> many locks." Our largest performance bottleneck, according to 
> pg_wait_sampling, is the LockManager itself. Despite most of our queries 
> spending only milliseconds actually executing, they often spend seconds 
> during planning waiting to acquire hundreds of access-shared locks.
>
> Given that our datasets are append-only, all our partitions for each table 
> save for the one "active" one (the one for the current time period) are 
> effectively immutable. No DML-triggered writes will occur to these. I 
> think this is pretty common in data-warehouse use-cases of PG.
>
> If PG could avoid the need to acquire the locks for these 
> effectively-immutable partitions, then the remaining number of tables 
> would be low enough to fit into the per-backend LWLock slots set, and so 
> avoid LockManager contention. I believe this could be a large optimization 
> not just for our use-case, but in a number of other high-concurrency OLAP 
> use-cases.
>
> My proposal for how this "lock elision under large numbers of immutable 
> partitions" could be accomplished:
>
> 1. Add some DDL statement to mark partitions as sealed/unsealed. (ALTER 
> TABLE ... SEAL PARTITION foo)
> 2. When query-planning DML against a partition or a partitioned table, 
> treat a sealed partition as if it had an always-false check constraint.
> 2. Define a "locking group" abstraction, where many entities can register 
> themselves under the same lock, such that access to all members of the 
> locking group requires only acquiring the single locking-group lock. All 
> sealed partitions of the same table would share a locking group.
>
> Under such a setup, querying a time-based partitioned table with one 
> active (unsealed) partition would only ever require acquiring, at most, 
> two locks — the one for the active partition, and the one for the 
> sealed-partitions locking group.
>
> The trade-off for this is that acquiring an exclusive-access lock on the 
> sealed-partitions locking-group for a table becomes much more expensive 
> than it would have been to acquire for a single partition. But this isn't 
> a problem in practice, because hot-path operations that take an 
> exclusive-access lock (DML writes) are disallowed against sealed 
> partitions. The only time the lock-group would need to be exclusive-access 
> acquired, would be to change its membership — an administrative DDL operation.
>
> Besides being useful operationally, such a mechanism would also be helpful 
> on the business-logic level, as you can rely on partition sealing to turn 
> accidental insertions of new data into any but the active partition(s) 
> into a constraint violation. (Currently, to achieve this, separate 
> triggers need to be maintained on each sealed partition.)
>
> And, with knowledge of the administrative intent for a table to be 
> immutable, further operational optimizations could be performed. A few off 
> the top of my head:
>
> 1. Running CLUSTER or VACUUM (FULL, FREEZE) after the partition is marked 
> as immutable, could rewrite the table using an implicit "heap_immutable" 
> access method (still reported as "heap"), which would drop the min_xid 
> column (as everything in a sealed table is guaranteed to be 
> always-visible), and thus remove the table for consideration for 
> xid-wraparound-protection rewriting. Such partitions would then require a 
> rewrite back to "heap" if unsealed.
>
> 2. Alternatively, such storage-rewriting DDL statements could switch the 
> table — and its indices — over to using an entirely different 
> access-methods, which would store the data+indices in "perfect" packed 
> forms, to maximize read performance while also minimizing disk usage.
>
> 3. ANALYZE could have an (otherwise-impractical) "EXACT" argument, to 
> populate statistics with exact aggregate values, requiring reading all 
> rows rather than sampling rows. This could pre-bake table-level aggregates 
> for most columns, like having a single, table-sized BRIN block-range.
>
> If this concept of "marking as sealed" were extended to tables rather than 
> only partitions, then further work could be done related to optimization 
> of bulk loads — e.g. having CREATE TABLE AS ... SEALED not generate WAL 
> segments for the table as it is populated, but rather treat the table as 
> UNLOGGED during population, and then, after creation, take the entire 
> finalized/sealed table's backing files and either pass them directly to 
> archive_command / send them directly to WAL receivers; or split+stream 
> them into retrospective WAL segments (each segment containing a single 
> "put this 16MB of data into this file at this position" op), and send those.

-- 
Angular momentum makes the world go 'round.



В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [EXT] Re: log_min_messages = warning
Следующее
От: "Dirschel, Steve"
Дата:
Сообщение: RE: [EXT] Re: log_min_messages = warning