Re: Incremental View Maintenance, take 2

Поиск
Список
Период
Сортировка
От jian he
Тема Re: Incremental View Maintenance, take 2
Дата
Msg-id CACJufxE8Kct5=c1KKi_JgP09WApCk0Myk6=a5xo05_J_mB7hRQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Incremental View Maintenance, take 2  (Yugo NAGATA <nagata@sraoss.co.jp>)
Ответы Re: Incremental View Maintenance, take 2  (Yugo NAGATA <nagata@sraoss.co.jp>)
Список pgsql-hackers
On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> On Thu, 1 Jun 2023 23:59:09 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> > Hello hackers,
> >
> > Here's a rebased version of the patch-set adding Incremental View
> > Maintenance support for PostgreSQL. That was discussed in [1].
>
> > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp
>
> ---------------------------------------------------------------------------------------
> * Overview
>
> Incremental View Maintenance (IVM) is a way to make materialized views
> up-to-date by computing only incremental changes and applying them on
> views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
> only small parts of the view are changed.
>
> ** Feature
>
> The attached patchset provides a feature that allows materialized views
> to be updated automatically and incrementally just after a underlying
> table is modified.
>
> You can create an incementally maintainable materialized view (IMMV)
> by using CREATE INCREMENTAL MATERIALIZED VIEW command.
>
> The followings are supported in view definition queries:
> - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
> - some built-in aggregate functions (count, sum, avg, min, max)
> - GROUP BY clause
> - DISTINCT clause
>
> Views can contain multiple tuples with the same content (duplicate tuples).
>
> ** Restriction
>
> The following are not supported in a view definition:
> - Outer joins
> - Aggregates otehr than above, window functions, HAVING
> - Sub-queries, CTEs
> - Set operations (UNION, INTERSECT, EXCEPT)
> - DISTINCT ON, ORDER BY, LIMIT, OFFSET
>
> Also, a view definition query cannot contain other views, materialized views,
> foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
> system columns, or expressions that contains aggregates.
>
> ---------------------------------------------------------------------------------------
> * Design
>
> An IMMV is maintained using statement-level AFTER triggers.
> When an IMMV is created, triggers are automatically created on all base
> tables contained in the view definition query.
>
> When a table is modified, changes that occurred in the table are extracted
> as transition tables in the AFTER triggers. Then, changes that will occur in
> the view are calculated by a rewritten view dequery in which the modified table
> is replaced with the transition table.
>
> For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
> into R are stored in a transiton table dR, the tuples that will be inserted into
> the view are calculated as the result of "SELECT * FROM dR, S".
>
> ** Multiple Tables Modification
>
> Multiple tables can be modified in a statement when using triggers, foreign key
> constraint, or modifying CTEs. When multiple tables are modified, we need
> the state of tables before the modification.
>
> For example, when some tuples, dR and dS, are inserted into R and S respectively,
> the tuples that will be inserted into the view are calculated by the following
> two queries:
>
>  "SELECT * FROM dR, S_pre"
>  "SELECT * FROM R, dS"
>
> where S_pre is the table before the modification, R is the current state of
> table, that is, after the modification. This pre-update states of table
> is calculated by filtering inserted tuples and appending deleted tuples.
> The subquery that represents pre-update state is generated in get_prestate_rte().
> Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate()
> in WHERE clause. This function checks the visibility of tuples by using
> the snapshot taken before table modification. The deleted tuples are contained
> in the old transition table, and this table is appended using UNION ALL.
>
> Transition tables for each modification are collected in each AFTER trigger
> function call. Then, the view maintenance is performed in the last call of
> the trigger.
>
> In the original PostgreSQL, tuplestores of transition tables are freed at the
> end of each nested query. However, their lifespan needs to be prolonged to
> the end of the out-most query in order to maintain the view in the last AFTER
> trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.
>
> ** Duplicate Tulpes
>
> When calculating changes that will occur in the view (= delta tables),
> multiplicity of tuples are calculated by using count(*).
>
> When deleting tuples from the view, tuples to be deleted are identified by
> joining the delta table with the view, and tuples are deleted as many as
> specified multiplicity by numbered using row_number() function.
> This is implemented in apply_old_delta().
>
> When inserting tuples into the view, each tuple is duplicated to the
> specified multiplicity using generate_series() function. This is implemented
> in apply_new_delta().
>
> ** DISTINCT clause
>
> When DISTINCT is used, the view has a hidden column __ivm_count__ that
> stores multiplicity for tuples. When tuples are deleted from or inserted into
> the view, the values of __ivm_count__ column is decreased or increased as many
> as specified multiplicity. Eventually, when the values becomes zero, the
> corresponding tuple is deleted from the view.  This is implemented in
> apply_old_delta_with_count() and apply_new_delta_with_count().
>
> ** Aggregates
>
> Built-in count sum, avg, min, and max are supported. Whether a given
> aggregate function can be used or not is checked by using its OID in
> check_aggregate_supports_ivm().
>
> When creating a materialized view containing aggregates, in addition
> to __ivm_count__, more than one hidden columns for each aggregate are
> added to the target list. For example, columns for storing sum(x),
> count(x) are added if we have avg(x). When the view is maintained,
> aggregated values are updated using these hidden columns, also hidden
> columns are updated at the same time.
>
> The maintenance of aggregated view is performed in
> apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
> clauses for updating columns are generated by append_set_clause_*().
>
> If the view has min(x) or max(x) and the minimum or maximal value is
> deleted from a table, we need to update the value to the new min/max
> recalculated from the tables rather than incremental computation. This
> is performed in recalc_and_set_values().
>
> ---------------------------------------------------------------------------------------
> * Details of the patch-set (v28)
>
> > The patch-set consists of the following eleven patches.
>
> In the previous version, the number of patches were nine.
> In the latest patch-set, the patches are divided more finely
> aiming to make the review easier.
>
> > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views
>
> The prposed syntax to create an incrementally maintainable materialized
> view (IMMV) is;
>
>  CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
>
> However, this syntax is tentative, so any suggestions are welcomed.
>
> > - 0002: Add relisivm column to pg_class system catalog
>
> We add a new field in pg_class to indicate a relation is IMMV.
> Another alternative is to add a new catalog for managing materialized
> views including IMMV, but I am not sure if we want this.
>
> > - 0003: Allow to prolong life span of transition tables until transaction end
>
> This patch fixes the trigger system to allow to prolong lifespan of
> tuple stores for transition tables until the transaction end. We need
> this because multiple transition tables have to be preserved until the
> end of the out-most query when multiple tables are modified by nested
> triggers. (as explained above in Design - Multiple Tables Modification)
>
> If we don't want to change the trigger system in such way, the alternative
> is to copy the contents of transition tables to other tuplestores, although
> it needs more time and memory.
>
> > - 0004: Add Incremental View Maintenance support to pg_dump
>
> This patch enables pg_dump to output IMMV using the new syntax.
>
> > - 0005: Add Incremental View Maintenance support to psql
>
> This patch implements tab-completion for the new syntax and adds
> information of IMMV to \d meta-command results.
>
> > - 0006: Add Incremental View Maintenance support
>
> This patch implements the basic IVM feature.
> DISTINCT and aggregate are not supported here.
>
> When an IMMV is created, the view query is checked, and if any
> non-supported feature is used, it raises an error. If it is ok,
> triggers are created on base tables and an unique index is
> created on the view if possible.
>
> In BEFORE trigger, an entry is created for each IMMV and the number
> of trigger firing is counted. Also, the snapshot just before the
> table modification is stored.
>
> In AFTER triggers, each transition tables are preserved. The number
> of trigger firing is counted also here, and when the firing number of
> BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
> trigger call.
>
> In the final AFTER trigger, the IMMV is maintained. Rewritten view
> query is executed to generate delta tables, and deltas are applied
> to the view. If multiple tables are modified simultaneously, this
> process is iterated for each modified table. Tables before processed
> are represented in "pre-update-state", processed tables are
> "post-update-state" in the rewritten query.
>
> > - 0007: Add DISTINCT support for IVM
>
> This patch adds DISTINCT clause support.
>
> When an IMMV including DISTINCT is created, a hidden column
> "__ivm_count__" is added to the target list. This column has the
> number of duplicity of the same tuples. The duplicity is calculated
> by adding "count(*)" and GROUP BY to the view query.
>
> When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
> and a tuples whose duplicity becomes zero can be deleted from the view.
> This logic is implemented by SQL in apply_old_delta_with_count and
> apply_new_delta_with_count.
>
> Columns starting with "__ivm_" are deemed hidden columns that doesn't
> appear when a view is accessed by "SELECT * FROM ....".  This is
> implemented by fixing parse_relation.c.
>
> > - 0008: Add aggregates support in IVM
>
> This patch provides codes for aggregates support, specifically
> for builtin count, sum, and avg.
>
> When an IMMV containing an aggregate is created, it is checked if this
> aggregate function is supported, and if it is ok, some hidden columns
> are added to the target list.
>
> When the IMMV is maintained, the aggregated value is updated as well as
> related hidden columns. The way of update depends the type of aggregate
> functions, and SET clause string is generated for each aggregate.
>
> > - 0009: Add support for min/max aggregates for IVM
>
> This patch adds min/max aggregates support.
>
> This is separated from #0008 because min/max needs more complicated
> work than count, sum, and avg.
>
> If the view has min(x) or max(x) and the minimum or maximal value is
> deleted from a table, we need to update the value to the new min/max
> recalculated from the tables rather than incremental computation.
> This is performed in recalc_and_set_values().
>
> TIDs and keys of tuples that need re-calculation are returned as a
> result of the query that deleted min/max values from the view using
> RETURNING clause. The plan to recalculate and set the new min/max value
> are stored and reused.
>
> > - 0010: regression tests
>
> This patch provides regression tests for IVM.
>
> > - 0011: documentation
>
> This patch provides documantation for IVM.
>
> ---------------------------------------------------------------------------------------
> * Changes from the Previous Version (v27)
>
> - Allow TRUNCATE on base tables
>
> When a base table is truncated, the view content will be empty if the
> view definition query does not contain an aggregate without a GROUP clause.
> Therefore, such views can be truncated.
>
> Aggregate views without a GROUP clause always have one row. Therefore,
> if a base table is truncated, the view will not be empty and will contain
> a row with NULL value (or 0 for count()). So, in this case, we refresh the
> view instead of truncating it.
>
> - Fix bugs reported by huyajun [1]
>
> [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com
>
> ---------------------------------------------------------------------------------------
> * Discussion
>
> ** Aggregate support
>
> There were a few suggestions that general aggregate functions should be
> supported [2][3], which may be possible by extending pg_aggregate catalog.
> However, we decided to leave supporting general aggregates to the future work [4]
> because it would need substantial works and make the patch more complex and
> bigger.
>
> There has been no opposite opinion on this. However, if we need more discussion
> on the design of aggregate support, we can omit aggregate support for the first
> release of IVM.
>
> [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql
> [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com
> [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp
>
> ** Hidden columns
>
> In order to support DISTINCT or aggregates, our implementation uses hidden columns.
>
> Columns starting with "__ivm_" are hidden columns that doesn't appear when a
> view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
> fixed. There was a proposal to enable hidden columns by adding a new flag to
> pg_attribute [5], but this thread is no longer active, so we decided to check
> the hidden column by its name [6].
>
> [5]
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com
> [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp
>
> ** Concurrent Transactions
>
> When the view definition has more than one table, we acquire an exclusive
> lock before the view maintenance in order to avoid inconsistent results.
> This behavior was explained in [7]. The lock was improved to use weaker lock
> when the view has only one table based on a suggestion from Konstantin Knizhnik [8].
> However, due to the implementation that uses ctid for identifying target tuples,
> we still have to use an exclusive lock for DELETE and UPDATE.
>
> [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
> [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru
>
> ** Automatic Index Creation
>
> When a view is created, a unique index is automatically created if
> possible, that is, if the view definition query has a GROUP BY or
> DISTINCT, or if the view contains all primary key attributes of
> its base tables in the target list. It is necessary for efficient
> view maintenance. This feature is based on a suggestion from
> Konstantin Knizhnik [9].
>
> [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru
>
>
> ** Trigger and Transition Tables
>
> We implemented IVM based on triggers. This is because we want to use
> transition tables to extract changes on base tables. Also, there are
> other constraint that are using triggers in its implementation, like
> foreign references. However, if we can use transition table like feature
> without relying triggers, we don't have to insist to use triggers and we
> might implement IVM in the executor directly as similar as declarative
> partitioning.
>
> ** Feature to be Supported in the First Release
>
> The current patch-set supports DISTINCT and aggregates for built-in count,
> sum, avg, min and max. Do we need all these feature for the first IVM release?
> Supporting DISTINCT and aggregates needs discussion on hidden columns, and
> for supporting min/max we need to discuss on  re-calculation method. Before
> handling such relatively advanced feature, maybe, should we focus to design
> and implement of the basic feature of IVM?
>
>
> Any suggestion and discussion are welcomed!
>
> Regards,
> Yugo Nagata
>
> --
> Yugo NAGATA <nagata@sraoss.co.jp>
>
>


> The followings are supported in view definition queries:
> - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)


> Also, a view definition query cannot contain other views, materialized views,
> foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
> system columns, or expressions that contains aggregates.

Does this also apply to tableoid?  but tableoid is a constant, so it
should be fine?
can following two queries apply to this feature.
select tableoid, unique1 from tenk1;
select 1 as constant, unique1 from tenk1;

I didn't apply the patch.(will do later, for someone to test, it would
be a better idea to dump a whole file separately....).



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dagfinn Ilmari Mannsåker
Дата:
Сообщение: Re: [PATCH] Honor PG_TEST_NOCLEAN for tempdirs
Следующее
От: Ants Aasma
Дата:
Сообщение: Re: ReadRecentBuffer() doesn't scale well