Обсуждение: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
On Tue, 29 Jul 2025 at 06:57, Peter Smith <smithpb2250@gmail.com> wrote: > Here are the latest v15 patches. > > Changes include: > > PATCH 0002. > - README now says user should not tamper with VCI internal relations > - fixes/test the VACUUM bug -- fix provided by Japin [1] > - fixes/tests the reported segv for attempted REFRESH of VCI internal > relation -- see [2 comment#1] > - fixes/tests VCI internal relation dependency on the indexed table > - simplifies code for PG_TEMP_FILES_DIR -- see [2 comment#2] > Hi Peter, Thanks for updating the patches. 1. I've identified another TRAP failure. Here are the reproduction steps: rm -rf demo initdb -D demo cat <<EOF >>demo/postgresql.auto.conf shared_preload_libraries = 'vci' max_worker_processes = '20' EOF pg_ctl -D demo start cat <<EOF | psql postgres CREATE EXTENSION vci; CREATE TABLE t (id int, info text); CREATE INDEX ON t USING vci (id); INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id; REINDEX TABLE t; REINDEX TABLE t; EOF The current VCI design doesn't support REINDEX, which is expected. But I've discovered an unexpected issue: running REINDEX on a table a second time causes an assertion failure. 2. +Internal Relation Types: +- -1: TID relation (maps CRID to original TID) +- -2: NULL vector (bit array for NULL values) +- -3: Delete vector (bit array for deleted records) +- -5: TID-CRID mapping table +- -9: Data WOS (buffered row data) +- -10: Whiteout WOS (deletion markers) +- 0-N: ROS column data relations (one per indexed column) + +Example: +For a VCI index on sales(customer_id, amount, date): + +Generated relations include: +vci_0000001234_00000_d → Column 0 data (customer_id) +vci_0000001234_00001_d → Column 1 data (amount) +vci_0000001234_00002_d → Column 2 data (date) +vci_0000001234_65535_d → TID relation +vci_0000001234_65534_d → NULL vector +vci_0000001234_65533_d → Delete vector +vci_0000001234_65531_m → TID-CRID mapping +vci_0000001234_65527_d → Data WOS +vci_0000001234_65526_d → Whiteout WOS The README states that it generates the above relations, but there are additional internal relations that are not mentioned. SELECT relname, relkind FROM pg_class WHERE relname ~ 'vci*' ORDER BY relname; relname | relkind ------------------------+--------- vci_0000016578_00000_d | m vci_0000016578_00000_m | m vci_0000016578_00001_d | m vci_0000016578_00001_m | m vci_0000016578_65526_d | m vci_0000016578_65527_d | m vci_0000016578_65530_0 | m vci_0000016578_65530_1 | m vci_0000016578_65531_d | m vci_0000016578_65531_m | m vci_0000016578_65533_d | m vci_0000016578_65533_m | m vci_0000016578_65534_d | m vci_0000016578_65534_m | m vci_0000016578_65535_d | m vci_0000016578_65535_m | m Based on the above, are the following materialized views unused, or is their use just undocumented? - vci_0000016578_00000_m - vci_0000016578_00001_m - vci_0000016491_65530_0 - vci_0000016578_65530_1 - vci_0000016578_65531_d - vci_0000016578_65534_m - vci_0000016578_65535_m What is the purpose of the '0' and '1' suffixes? 3. I've also found that the VCI index is not working. Is this the expected behavior? [local]:3209161 postgres=# \d+ t Table "public.t" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+----------+-------------+--------------+------------- id | integer | | | | plain | | | info | text | | | | extended | | | Indexes: "t_id_idx" vci (id) Access method: heap [local]:3209161 postgres=# SET enable_seqscan TO off; SET [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; QUERY PLAN ----------------------------------------------------- Seq Scan on t (cost=0.00..2084.00 rows=1 width=37) Disabled: true Filter: (id = 1000) (3 rows) -- Regards, Japin Li
On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote: > ... > 2. > +Internal Relation Types: > +- -1: TID relation (maps CRID to original TID) > +- -2: NULL vector (bit array for NULL values) > +- -3: Delete vector (bit array for deleted records) > +- -5: TID-CRID mapping table > +- -9: Data WOS (buffered row data) > +- -10: Whiteout WOS (deletion markers) > +- 0-N: ROS column data relations (one per indexed column) > + > +Example: > +For a VCI index on sales(customer_id, amount, date): > + > +Generated relations include: > +vci_0000001234_00000_d → Column 0 data (customer_id) > +vci_0000001234_00001_d → Column 1 data (amount) > +vci_0000001234_00002_d → Column 2 data (date) > +vci_0000001234_65535_d → TID relation > +vci_0000001234_65534_d → NULL vector > +vci_0000001234_65533_d → Delete vector > +vci_0000001234_65531_m → TID-CRID mapping > +vci_0000001234_65527_d → Data WOS > +vci_0000001234_65526_d → Whiteout WOS > > The README states that it generates the above relations, but there are > additional internal relations that are not mentioned. > > SELECT relname, relkind FROM pg_class WHERE relname ~ 'vci*' ORDER BY relname; > relname | relkind > ------------------------+--------- > vci_0000016578_00000_d | m > vci_0000016578_00000_m | m > vci_0000016578_00001_d | m > vci_0000016578_00001_m | m > vci_0000016578_65526_d | m > vci_0000016578_65527_d | m > vci_0000016578_65530_0 | m > vci_0000016578_65530_1 | m > vci_0000016578_65531_d | m > vci_0000016578_65531_m | m > vci_0000016578_65533_d | m > vci_0000016578_65533_m | m > vci_0000016578_65534_d | m > vci_0000016578_65534_m | m > vci_0000016578_65535_d | m > vci_0000016578_65535_m | m > > Based on the above, are the following materialized views unused, or is their > use just undocumented? > > - vci_0000016578_00000_m > - vci_0000016578_00001_m > - vci_0000016491_65530_0 > - vci_0000016578_65530_1 > - vci_0000016578_65531_d > - vci_0000016578_65534_m > - vci_0000016578_65535_m > > What is the purpose of the '0' and '1' suffixes? > Yeah, it was undocumented. I didn't intend for the README to give a complete list, but in hindsight it may have been clearer if it did. Those '0' and '1' relations are two more files associated with the TID-CRID mappings -- those ones are for keeping track of mapping updates. + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID_CRID_UPDATE, '0'), indexRel, indexInfo, VCI_RELTYPE_TIDCRID); + vci_SetMainRelVar(vmr_info, vcimrv_tid_crid_update_oid_0, 0, oid); + + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID_CRID_UPDATE, '1'), indexRel, indexInfo, VCI_RELTYPE_TIDCRID); + vci_SetMainRelVar(vmr_info, vcimrv_tid_crid_update_oid_1, 0, oid); The README was updated like below; it will be available next time new patches are posted ------ Internal Relation Types: - -1: TID relation (maps CRID to original TID) - -2: NULL vector (bit array for NULL values) - -3: Delete vector (bit array for deleted records) - -5: TID-CRID mappings - -6: TID-CRID mappings (update list) - -9: Data WOS (buffered row data) - -10: Whiteout WOS (deletion markers) - 0-N: ROS column data relations (one per indexed column) Example: For a VCI index on sales(customer_id, amount, date): Generated relations include: vci_0000012345_00000_d → Column 0 data (customer_id) vci_0000012345_00000_m ... and metadata vci_0000012345_00001_d → Column 1 data (amount) vci_0000012345_00001_m ... and metadata vci_0000012345_00002_d → Column 2 data (date) vci_0000012345_00002_m ... and metadata vci_0000012345_65526_d → Whiteout WOS vci_0000012345_65527_d → Data WOS vci_0000012345_65531_d → TID-CRID mappings vci_0000012345_65531_m ... and metadata vci_0000012345_65530_0 ... and update list #0 vci_0000012345_65530_1 ... and update list #1 vci_0000012345_65533_d → Delete vector vci_0000012345_65533_m ... and metadata vci_0000012345_65534_d → NULL vector vci_0000012345_65534_m ... and metadata vci_0000012345_65535_d → TID relation vci_0000012345_65535_m ... and metadata ------ ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote: > ... > 3. > I've also found that the VCI index is not working. Is this the expected > behavior? > > [local]:3209161 postgres=# \d+ t > Table "public.t" > Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description > --------+---------+-----------+----------+---------+----------+-------------+--------------+------------- > id | integer | | | | plain | | | > info | text | | | | extended | | | > Indexes: > "t_id_idx" vci (id) > Access method: heap > > [local]:3209161 postgres=# SET enable_seqscan TO off; > SET > [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; > QUERY PLAN > ----------------------------------------------------- > Seq Scan on t (cost=0.00..2084.00 rows=1 width=37) > Disabled: true > Filter: (id = 1000) > (3 rows) > Hi Japin. Yes, that's expected behaviour. VCI is used only when the vci index is defined for all the columns of your query. In your example there was a table with 2 columns ('id' and 'info') but you only have an index on the 'id' column. If you change the query then you can see VCI getting used. E.g. postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000; QUERY PLAN ---------------------------------------------------------------------------- Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4) Filter: (id = 1000) (2 rows) postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; QUERY PLAN ---------------------------------------------------- Seq Scan on t (cost=0.00..209.00 rows=1 width=37) Filter: (id = 1000) (2 rows) postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000; QUERY PLAN ---------------------------------------------------- Seq Scan on t (cost=0.00..209.00 rows=1 width=37) Filter: (id = 1000) (2 rows) ~~~ You can see this also in the DEBUG logs, from vci_can_rewrite_custom_scan(), where it checks to see if the attrs are in the vci index or not. e.g. 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table "t"(oid=16384) tuples(rows=10000,extents=0) 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match index "tidx"(oid=16469) 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2 ====== Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote: > On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote: > > > ... > > 3. > > I've also found that the VCI index is not working. Is this the expected > > behavior? > > > > [local]:3209161 postgres=# \d+ t > > Table "public.t" > > Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description > > --------+---------+-----------+----------+---------+----------+-------------+--------------+------------- > > id | integer | | | | plain | | | > > info | text | | | | extended | | | > > Indexes: > > "t_id_idx" vci (id) > > Access method: heap > > > > [local]:3209161 postgres=# SET enable_seqscan TO off; > > SET > > [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; > > QUERY PLAN > > ----------------------------------------------------- > > Seq Scan on t (cost=0.00..2084.00 rows=1 width=37) > > Disabled: true > > Filter: (id = 1000) > > (3 rows) > > > > Hi Japin. Yes, that's expected behaviour. > > VCI is used only when the vci index is defined for all the columns of > your query. In your example there was a table with 2 columns ('id' and > 'info') but you only have an index on the 'id' column. If you change > the query then you can see VCI getting used. > > E.g. > > postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000; > QUERY PLAN > ---------------------------------------------------------------------------- > Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4) > Filter: (id = 1000) > (2 rows) > > postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; > QUERY PLAN > ---------------------------------------------------- > Seq Scan on t (cost=0.00..209.00 rows=1 width=37) > Filter: (id = 1000) > (2 rows) > > postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000; > QUERY PLAN > ---------------------------------------------------- > Seq Scan on t (cost=0.00..209.00 rows=1 width=37) > Filter: (id = 1000) > (2 rows) > > ~~~ > > You can see this also in the DEBUG logs, from > vci_can_rewrite_custom_scan(), where it checks to see if the attrs are > in the vci index or not. > > e.g. > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table > "t"(oid=16384) tuples(rows=10000,extents=0) > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match > index "tidx"(oid=16469) > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2 > Thanks for your explantion! Got it. Are there any plans to remove this restriction in the future? -- Best regards, Japin Li ChengDu WenWu Information Technology Co., LTD.
On Fri, Aug 1, 2025 at 5:43 PM Japin Li <japinli@hotmail.com> wrote: > > On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote: > > On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote: > > > > > ... > > > 3. > > > I've also found that the VCI index is not working. Is this the expected > > > behavior? > > > > > > [local]:3209161 postgres=# \d+ t > > > Table "public.t" > > > Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description > > > --------+---------+-----------+----------+---------+----------+-------------+--------------+------------- > > > id | integer | | | | plain | | | > > > info | text | | | | extended | | | > > > Indexes: > > > "t_id_idx" vci (id) > > > Access method: heap > > > > > > [local]:3209161 postgres=# SET enable_seqscan TO off; > > > SET > > > [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; > > > QUERY PLAN > > > ----------------------------------------------------- > > > Seq Scan on t (cost=0.00..2084.00 rows=1 width=37) > > > Disabled: true > > > Filter: (id = 1000) > > > (3 rows) > > > > > > > Hi Japin. Yes, that's expected behaviour. > > > > VCI is used only when the vci index is defined for all the columns of > > your query. In your example there was a table with 2 columns ('id' and > > 'info') but you only have an index on the 'id' column. If you change > > the query then you can see VCI getting used. > > > > E.g. > > > > postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000; > > QUERY PLAN > > ---------------------------------------------------------------------------- > > Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4) > > Filter: (id = 1000) > > (2 rows) > > > > postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; > > QUERY PLAN > > ---------------------------------------------------- > > Seq Scan on t (cost=0.00..209.00 rows=1 width=37) > > Filter: (id = 1000) > > (2 rows) > > > > postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000; > > QUERY PLAN > > ---------------------------------------------------- > > Seq Scan on t (cost=0.00..209.00 rows=1 width=37) > > Filter: (id = 1000) > > (2 rows) > > > > ~~~ > > > > You can see this also in the DEBUG logs, from > > vci_can_rewrite_custom_scan(), where it checks to see if the attrs are > > in the vci index or not. > > > > e.g. > > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table > > "t"(oid=16384) tuples(rows=10000,extents=0) > > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match > > index "tidx"(oid=16469) > > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x > > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2 > > > > Thanks for your explantion! Got it. > > Are there any plans to remove this restriction in the future? > No. There aren't any plans to remove this restriction because it is not considered to be a "restriction" in the first place; e.g. VCI is intended more like an accelerator only for those *specified* columns for which you intend to do your analysis. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Aug 14, 2025 at 12:04:12PM +1000, Peter Smith wrote: > On Fri, Aug 1, 2025 at 5:43 PM Japin Li <japinli@hotmail.com> wrote: > > > > On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote: > > > On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote: > > > > > > > ... > > > > 3. > > > > I've also found that the VCI index is not working. Is this the expected > > > > behavior? > > > > > > > > [local]:3209161 postgres=# \d+ t > > > > Table "public.t" > > > > Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description > > > > --------+---------+-----------+----------+---------+----------+-------------+--------------+------------- > > > > id | integer | | | | plain | | | > > > > info | text | | | | extended | | | > > > > Indexes: > > > > "t_id_idx" vci (id) > > > > Access method: heap > > > > > > > > [local]:3209161 postgres=# SET enable_seqscan TO off; > > > > SET > > > > [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; > > > > QUERY PLAN > > > > ----------------------------------------------------- > > > > Seq Scan on t (cost=0.00..2084.00 rows=1 width=37) > > > > Disabled: true > > > > Filter: (id = 1000) > > > > (3 rows) > > > > > > > > > > Hi Japin. Yes, that's expected behaviour. > > > > > > VCI is used only when the vci index is defined for all the columns of > > > your query. In your example there was a table with 2 columns ('id' and > > > 'info') but you only have an index on the 'id' column. If you change > > > the query then you can see VCI getting used. > > > > > > E.g. > > > > > > postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000; > > > QUERY PLAN > > > ---------------------------------------------------------------------------- > > > Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4) > > > Filter: (id = 1000) > > > (2 rows) > > > > > > postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; > > > QUERY PLAN > > > ---------------------------------------------------- > > > Seq Scan on t (cost=0.00..209.00 rows=1 width=37) > > > Filter: (id = 1000) > > > (2 rows) > > > > > > postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000; > > > QUERY PLAN > > > ---------------------------------------------------- > > > Seq Scan on t (cost=0.00..209.00 rows=1 width=37) > > > Filter: (id = 1000) > > > (2 rows) > > > > > > ~~~ > > > > > > You can see this also in the DEBUG logs, from > > > vci_can_rewrite_custom_scan(), where it checks to see if the attrs are > > > in the vci index or not. > > > > > > e.g. > > > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table > > > "t"(oid=16384) tuples(rows=10000,extents=0) > > > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match > > > index "tidx"(oid=16469) > > > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x > > > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2 > > > > > > > Thanks for your explantion! Got it. > > > > Are there any plans to remove this restriction in the future? > > > > No. There aren't any plans to remove this restriction because it is > not considered to be a "restriction" in the first place; e.g. VCI is > intended more like an accelerator only for those *specified* columns > for which you intend to do your analysis. > Thanks for your explantion! -- Best regards, Japin Li ChengDu WenWu Information Technology Co., LTD.