Re: Feature proposal: immutable/sealed partitions (and maybe tables, too)
From | Ron |
---|---|
Subject | Re: Feature proposal: immutable/sealed partitions (and maybe tables, too) |
Date | |
Msg-id | 6a0551f5-3930-5cff-2886-80396f39f803@gmail.com |
In response to | Feature proposal: immutable/sealed partitions (and maybe tables, too) (Levi Aul <levi@covalenthq.com>) |
List | pgsql-general |
By "SEALED", do you mean "READ ONLY"?

On 9/6/22 14:39, Levi Aul wrote:
> My company runs some large OLAP data warehouses with append-only,
> time-partitioned datasets. Our workloads involve aggregations and
> joins, and query the data in ways not amenable to constraint
> exclusion; and we serve a high concurrent number of these queries at
> once from a single DB.
>
> In other words, our workload is inherently one that acquires "way too
> many locks." Our largest performance bottleneck, according to
> pg_wait_sampling, is the LockManager itself. Despite most of our
> queries spending only milliseconds actually executing, they often
> spend seconds during planning waiting to acquire hundreds of
> access-share locks.
>
> Given that our datasets are append-only, all our partitions for each
> table save for the one "active" one (the one for the current time
> period) are effectively immutable. No DML-triggered writes will occur
> to these. I think this is pretty common in data-warehouse use-cases
> of PG.
>
> If PG could avoid the need to acquire the locks for these
> effectively-immutable partitions, then the remaining number of tables
> would be low enough to fit into the per-backend LWLock slots set, and
> so avoid LockManager contention. I believe this could be a large
> optimization not just for our use-case, but in a number of other
> high-concurrency OLAP use-cases.
>
> My proposal for how this "lock elision under large numbers of
> immutable partitions" could be accomplished:
>
> 1. Add some DDL statement to mark partitions as sealed/unsealed.
>    (ALTER TABLE ... SEAL PARTITION foo)
> 2. When query-planning DML against a partition or a partitioned
>    table, treat a sealed partition as if it had an always-false check
>    constraint.
> 3. Define a "locking group" abstraction, where many entities can
>    register themselves under the same lock, such that access to all
>    members of the locking group requires only acquiring the single
>    locking-group lock. All sealed partitions of the same table would
>    share a locking group.
>
> Under such a setup, querying a time-based partitioned table with one
> active (unsealed) partition would only ever require acquiring, at
> most, two locks — the one for the active partition, and the one for
> the sealed-partitions locking group.
>
> The trade-off is that acquiring an exclusive-access lock on the
> sealed-partitions locking group for a table becomes much more
> expensive than acquiring one for a single partition would have been.
> But this isn't a problem in practice, because hot-path operations
> that take an exclusive-access lock (DML writes) are disallowed
> against sealed partitions. The only time the lock group would need to
> be acquired exclusive-access would be to change its membership — an
> administrative DDL operation.
>
> Besides being useful operationally, such a mechanism would also be
> helpful on the business-logic level, as you can rely on partition
> sealing to turn accidental insertions of new data into any but the
> active partition(s) into a constraint violation. (Currently, to
> achieve this, separate triggers need to be maintained on each sealed
> partition.)
>
> And, with knowledge of the administrative intent for a table to be
> immutable, further operational optimizations could be performed. A
> few off the top of my head:
>
> 1. Running CLUSTER or VACUUM (FULL, FREEZE) after the partition is
>    marked as immutable could rewrite the table using an implicit
>    "heap_immutable" access method (still reported as "heap"), which
>    would drop the xmin column (as everything in a sealed table is
>    guaranteed to be always-visible), and thus remove the table from
>    consideration for xid-wraparound-protection rewriting. Such
>    partitions would then require a rewrite back to "heap" if
>    unsealed.
>
> 2. Alternatively, such storage-rewriting DDL statements could switch
>    the table — and its indices — over to entirely different access
>    methods, which would store the data and indices in "perfect"
>    packed forms, to maximize read performance while also minimizing
>    disk usage.
>
> 3. ANALYZE could have an (otherwise-impractical) "EXACT" argument, to
>    populate statistics with exact aggregate values, requiring reading
>    all rows rather than sampling them. This could pre-bake
>    table-level aggregates for most columns, like having a single,
>    table-sized BRIN block range.
>
> If this concept of "marking as sealed" were extended to tables rather
> than only partitions, then further work could be done related to
> optimization of bulk loads — e.g. having CREATE TABLE AS ... SEALED
> not generate WAL segments for the table as it is populated, but
> rather treat the table as UNLOGGED during population, and then, after
> creation, take the entire finalized/sealed table's backing files and
> either pass them directly to archive_command / send them directly to
> WAL receivers, or split+stream them into retrospective WAL segments
> (each segment containing a single "put this 16MB of data into this
> file at this position" op), and send those.

--
Angular momentum makes the world go 'round.
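For concreteness, the DDL sketched in the quoted proposal might look like the following. None of this syntax exists in PostgreSQL; `SEAL PARTITION` / `UNSEAL PARTITION` are hypothetical forms implied by the proposal's "sealed/unsealed" wording:

```sql
-- Hypothetical syntax from the proposal; not implemented in PostgreSQL.

-- Seal last week's partition once it can no longer receive writes:
ALTER TABLE events SEAL PARTITION events_2022_w35;

-- DML against a sealed partition would then fail as if the partition
-- had an always-false CHECK constraint:
INSERT INTO events_2022_w35 VALUES (...);  -- constraint violation

-- Unseal (rewriting back to plain "heap" storage, if an immutable
-- access method had been applied) before any maintenance writes:
ALTER TABLE events UNSEAL PARTITION events_2022_w35;
```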
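The locking-group abstraction in step 3 of the proposal can be illustrated outside of PostgreSQL. A minimal Python sketch, assuming a hypothetical `LockGroup` type (not a PostgreSQL internal): a query touching any number of group members acquires the single group lock once, instead of one lock per member.

```python
import threading

class LockGroup:
    """One shared lock covering many members (e.g. sealed partitions).

    Hypothetical sketch of the proposal's "locking group": readers
    acquire the single group lock once, no matter how many members
    the group has.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._members = set()
        self.acquisitions = 0  # count lock acquisitions, for illustration

    def add_member(self, name):
        # Changing membership is the rare, "administrative DDL" path,
        # which is the only path needing exclusive access to the group.
        with self._lock:
            self._members.add(name)

    def acquire_for_read(self, names):
        # All requested members are covered by one acquisition,
        # provided every one of them belongs to this group.
        assert all(n in self._members for n in names)
        self._lock.acquire()
        self.acquisitions += 1

    def release(self):
        self._lock.release()

# 100 sealed weekly partitions, but a query scanning all of them
# takes one lock instead of 100.
group = LockGroup()
for i in range(100):
    group.add_member(f"events_2022_w{i:02d}")

group.acquire_for_read([f"events_2022_w{i:02d}" for i in range(100)])
print(group.acquisitions)  # one acquisition covers all 100 partitions
group.release()
```

In a real implementation the interesting part is the trade-off the proposal names: the group lock is cheap to share-acquire but expensive to take exclusively, which is acceptable only because exclusive access is confined to rare membership changes.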