Обсуждение: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
On Tue, 29 Jul 2025 at 06:57, Peter Smith <smithpb2250@gmail.com> wrote:
> Here are the latest v15 patches.
>
> Changes include:
>
> PATCH 0002.
> - README now says user should not tamper with VCI internal relations
> - fixes/test the VACUUM bug -- fix provided by Japin [1]
> - fixes/tests the reported segv for attempted REFRESH of VCI internal
> relation -- see [2 comment#1]
> - fixes/tests VCI internal relation dependency on the indexed table
> - simplifies code for PG_TEMP_FILES_DIR -- see [2 comment#2]
>
Hi Peter,
Thanks for updating the patches.
1.
I've identified another TRAP failure. Here are the reproduction steps:
rm -rf demo
initdb -D demo
cat <<EOF >>demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOF
pg_ctl -D demo start
cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
REINDEX TABLE t;
REINDEX TABLE t;
EOF
The current VCI design doesn't support REINDEX, which is expected. But I've
discovered an unexpected issue: running REINDEX on a table a second time causes
an assertion failure.
2.
+Internal Relation Types:
+- -1: TID relation (maps CRID to original TID)
+- -2: NULL vector (bit array for NULL values)
+- -3: Delete vector (bit array for deleted records)
+- -5: TID-CRID mapping table
+- -9: Data WOS (buffered row data)
+- -10: Whiteout WOS (deletion markers)
+- 0-N: ROS column data relations (one per indexed column)
+
+Example:
+For a VCI index on sales(customer_id, amount, date):
+
+Generated relations include:
+vci_0000001234_00000_d → Column 0 data (customer_id)
+vci_0000001234_00001_d → Column 1 data (amount)
+vci_0000001234_00002_d → Column 2 data (date)
+vci_0000001234_65535_d → TID relation
+vci_0000001234_65534_d → NULL vector
+vci_0000001234_65533_d → Delete vector
+vci_0000001234_65531_m → TID-CRID mapping
+vci_0000001234_65527_d → Data WOS
+vci_0000001234_65526_d → Whiteout WOS
The README states that it generates the above relations, but there are
additional internal relations that are not mentioned.
SELECT relname, relkind FROM pg_class WHERE relname ~ 'vci*' ORDER BY relname;
relname | relkind
------------------------+---------
vci_0000016578_00000_d | m
vci_0000016578_00000_m | m
vci_0000016578_00001_d | m
vci_0000016578_00001_m | m
vci_0000016578_65526_d | m
vci_0000016578_65527_d | m
vci_0000016578_65530_0 | m
vci_0000016578_65530_1 | m
vci_0000016578_65531_d | m
vci_0000016578_65531_m | m
vci_0000016578_65533_d | m
vci_0000016578_65533_m | m
vci_0000016578_65534_d | m
vci_0000016578_65534_m | m
vci_0000016578_65535_d | m
vci_0000016578_65535_m | m
Based on the above, are the following materialized views unused, or is their
use just undocumented?
- vci_0000016578_00000_m
- vci_0000016578_00001_m
- vci_0000016491_65530_0
- vci_0000016578_65530_1
- vci_0000016578_65531_d
- vci_0000016578_65534_m
- vci_0000016578_65535_m
What is the purpose of the '0' and '1' suffixes?
3.
I've also found that the VCI index is not working. Is this the expected
behavior?
[local]:3209161 postgres=# \d+ t
Table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
info | text | | | | extended | | |
Indexes:
"t_id_idx" vci (id)
Access method: heap
[local]:3209161 postgres=# SET enable_seqscan TO off;
SET
[local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
Disabled: true
Filter: (id = 1000)
(3 rows)
--
Regards,
Japin Li
On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote: > ... > 2. > +Internal Relation Types: > +- -1: TID relation (maps CRID to original TID) > +- -2: NULL vector (bit array for NULL values) > +- -3: Delete vector (bit array for deleted records) > +- -5: TID-CRID mapping table > +- -9: Data WOS (buffered row data) > +- -10: Whiteout WOS (deletion markers) > +- 0-N: ROS column data relations (one per indexed column) > + > +Example: > +For a VCI index on sales(customer_id, amount, date): > + > +Generated relations include: > +vci_0000001234_00000_d → Column 0 data (customer_id) > +vci_0000001234_00001_d → Column 1 data (amount) > +vci_0000001234_00002_d → Column 2 data (date) > +vci_0000001234_65535_d → TID relation > +vci_0000001234_65534_d → NULL vector > +vci_0000001234_65533_d → Delete vector > +vci_0000001234_65531_m → TID-CRID mapping > +vci_0000001234_65527_d → Data WOS > +vci_0000001234_65526_d → Whiteout WOS > > The README states that it generates the above relations, but there are > additional internal relations that are not mentioned. > > SELECT relname, relkind FROM pg_class WHERE relname ~ 'vci*' ORDER BY relname; > relname | relkind > ------------------------+--------- > vci_0000016578_00000_d | m > vci_0000016578_00000_m | m > vci_0000016578_00001_d | m > vci_0000016578_00001_m | m > vci_0000016578_65526_d | m > vci_0000016578_65527_d | m > vci_0000016578_65530_0 | m > vci_0000016578_65530_1 | m > vci_0000016578_65531_d | m > vci_0000016578_65531_m | m > vci_0000016578_65533_d | m > vci_0000016578_65533_m | m > vci_0000016578_65534_d | m > vci_0000016578_65534_m | m > vci_0000016578_65535_d | m > vci_0000016578_65535_m | m > > Based on the above, are the following materialized views unused, or is their > use just undocumented? > > - vci_0000016578_00000_m > - vci_0000016578_00001_m > - vci_0000016491_65530_0 > - vci_0000016578_65530_1 > - vci_0000016578_65531_d > - vci_0000016578_65534_m > - vci_0000016578_65535_m > > What is the purpose of the '0' and '1' suffixes? > Yeah, it was undocumented. I didn't intend for the README to give a complete list, but in hindsight it may have been clearer if it did. Those '0' and '1' relations are two more files associated with the TID-CRID mappings -- those ones are for keeping track of mapping updates. + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID_CRID_UPDATE, '0'), indexRel, indexInfo, VCI_RELTYPE_TIDCRID); + vci_SetMainRelVar(vmr_info, vcimrv_tid_crid_update_oid_0, 0, oid); + + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID_CRID_UPDATE, '1'), indexRel, indexInfo, VCI_RELTYPE_TIDCRID); + vci_SetMainRelVar(vmr_info, vcimrv_tid_crid_update_oid_1, 0, oid); The README was updated like below; it will be available next time new patches are posted ------ Internal Relation Types: - -1: TID relation (maps CRID to original TID) - -2: NULL vector (bit array for NULL values) - -3: Delete vector (bit array for deleted records) - -5: TID-CRID mappings - -6: TID-CRID mappings (update list) - -9: Data WOS (buffered row data) - -10: Whiteout WOS (deletion markers) - 0-N: ROS column data relations (one per indexed column) Example: For a VCI index on sales(customer_id, amount, date): Generated relations include: vci_0000012345_00000_d → Column 0 data (customer_id) vci_0000012345_00000_m ... and metadata vci_0000012345_00001_d → Column 1 data (amount) vci_0000012345_00001_m ... and metadata vci_0000012345_00002_d → Column 2 data (date) vci_0000012345_00002_m ... and metadata vci_0000012345_65526_d → Whiteout WOS vci_0000012345_65527_d → Data WOS vci_0000012345_65531_d → TID-CRID mappings vci_0000012345_65531_m ... and metadata vci_0000012345_65530_0 ... and update list #0 vci_0000012345_65530_1 ... and update list #1 vci_0000012345_65533_d → Delete vector vci_0000012345_65533_m ... and metadata vci_0000012345_65534_d → NULL vector vci_0000012345_65534_m ... and metadata vci_0000012345_65535_d → TID relation vci_0000012345_65535_m ... and metadata ------ ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote:
>
...
> 3.
> I've also found that the VCI index is not working. Is this the expected
> behavior?
>
> [local]:3209161 postgres=# \d+ t
> Table "public.t"
> Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> id | integer | | | | plain | | |
> info | text | | | | extended | | |
> Indexes:
> "t_id_idx" vci (id)
> Access method: heap
>
> [local]:3209161 postgres=# SET enable_seqscan TO off;
> SET
> [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
> QUERY PLAN
> -----------------------------------------------------
> Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
> Disabled: true
> Filter: (id = 1000)
> (3 rows)
>
Hi Japin. Yes, that's expected behaviour.
VCI is used only when the vci index is defined for all the columns of
your query. In your example there was a table with 2 columns ('id' and
'info') but you only have an index on the 'id' column. If you change
the query then you can see VCI getting used.
E.g.
postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------------------------------
Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4)
Filter: (id = 1000)
(2 rows)
postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)
postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000;
QUERY PLAN
----------------------------------------------------
Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
Filter: (id = 1000)
(2 rows)
~~~
You can see this also in the DEBUG logs, from
vci_can_rewrite_custom_scan(), where it checks to see if the attrs are
in the vci index or not.
e.g.
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table
"t"(oid=16384) tuples(rows=10000,extents=0)
2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match
index "tidx"(oid=16469)
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x
2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote:
> On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote:
> >
> ...
> > 3.
> > I've also found that the VCI index is not working. Is this the expected
> > behavior?
> >
> > [local]:3209161 postgres=# \d+ t
> > Table "public.t"
> > Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
> > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> > id | integer | | | | plain | | |
> > info | text | | | | extended | | |
> > Indexes:
> > "t_id_idx" vci (id)
> > Access method: heap
> >
> > [local]:3209161 postgres=# SET enable_seqscan TO off;
> > SET
> > [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
> > QUERY PLAN
> > -----------------------------------------------------
> > Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
> > Disabled: true
> > Filter: (id = 1000)
> > (3 rows)
> >
>
> Hi Japin. Yes, that's expected behaviour.
>
> VCI is used only when the vci index is defined for all the columns of
> your query. In your example there was a table with 2 columns ('id' and
> 'info') but you only have an index on the 'id' column. If you change
> the query then you can see VCI getting used.
>
> E.g.
>
> postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000;
> QUERY PLAN
> ----------------------------------------------------------------------------
> Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4)
> Filter: (id = 1000)
> (2 rows)
>
> postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
> QUERY PLAN
> ----------------------------------------------------
> Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
> Filter: (id = 1000)
> (2 rows)
>
> postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000;
> QUERY PLAN
> ----------------------------------------------------
> Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
> Filter: (id = 1000)
> (2 rows)
>
> ~~~
>
> You can see this also in the DEBUG logs, from
> vci_can_rewrite_custom_scan(), where it checks to see if the attrs are
> in the vci index or not.
>
> e.g.
> 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table
> "t"(oid=16384) tuples(rows=10000,extents=0)
> 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match
> index "tidx"(oid=16469)
> 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x
> 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2
>
Thanks for your explantion! Got it.
Are there any plans to remove this restriction in the future?
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.
On Fri, Aug 1, 2025 at 5:43 PM Japin Li <japinli@hotmail.com> wrote:
>
> On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote:
> > On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote:
> > >
> > ...
> > > 3.
> > > I've also found that the VCI index is not working. Is this the expected
> > > behavior?
> > >
> > > [local]:3209161 postgres=# \d+ t
> > > Table "public.t"
> > > Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
> > > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> > > id | integer | | | | plain | | |
> > > info | text | | | | extended | | |
> > > Indexes:
> > > "t_id_idx" vci (id)
> > > Access method: heap
> > >
> > > [local]:3209161 postgres=# SET enable_seqscan TO off;
> > > SET
> > > [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
> > > QUERY PLAN
> > > -----------------------------------------------------
> > > Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
> > > Disabled: true
> > > Filter: (id = 1000)
> > > (3 rows)
> > >
> >
> > Hi Japin. Yes, that's expected behaviour.
> >
> > VCI is used only when the vci index is defined for all the columns of
> > your query. In your example there was a table with 2 columns ('id' and
> > 'info') but you only have an index on the 'id' column. If you change
> > the query then you can see VCI getting used.
> >
> > E.g.
> >
> > postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000;
> > QUERY PLAN
> > ----------------------------------------------------------------------------
> > Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4)
> > Filter: (id = 1000)
> > (2 rows)
> >
> > postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
> > QUERY PLAN
> > ----------------------------------------------------
> > Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
> > Filter: (id = 1000)
> > (2 rows)
> >
> > postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000;
> > QUERY PLAN
> > ----------------------------------------------------
> > Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
> > Filter: (id = 1000)
> > (2 rows)
> >
> > ~~~
> >
> > You can see this also in the DEBUG logs, from
> > vci_can_rewrite_custom_scan(), where it checks to see if the attrs are
> > in the vci index or not.
> >
> > e.g.
> > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table
> > "t"(oid=16384) tuples(rows=10000,extents=0)
> > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match
> > index "tidx"(oid=16469)
> > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x
> > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2
> >
>
> Thanks for your explantion! Got it.
>
> Are there any plans to remove this restriction in the future?
>
No. There aren't any plans to remove this restriction because it is
not considered to be a "restriction" in the first place; e.g. VCI is
intended more like an accelerator only for those *specified* columns
for which you intend to do your analysis.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Aug 14, 2025 at 12:04:12PM +1000, Peter Smith wrote:
> On Fri, Aug 1, 2025 at 5:43 PM Japin Li <japinli@hotmail.com> wrote:
> >
> > On Fri, Aug 01, 2025 at 05:18:11PM +1000, Peter Smith wrote:
> > > On Wed, Jul 30, 2025 at 9:07 PM Japin Li <japinli@hotmail.com> wrote:
> > > >
> > > ...
> > > > 3.
> > > > I've also found that the VCI index is not working. Is this the expected
> > > > behavior?
> > > >
> > > > [local]:3209161 postgres=# \d+ t
> > > > Table "public.t"
> > > > Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
> > > > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> > > > id | integer | | | | plain | | |
> > > > info | text | | | | extended | | |
> > > > Indexes:
> > > > "t_id_idx" vci (id)
> > > > Access method: heap
> > > >
> > > > [local]:3209161 postgres=# SET enable_seqscan TO off;
> > > > SET
> > > > [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
> > > > QUERY PLAN
> > > > -----------------------------------------------------
> > > > Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
> > > > Disabled: true
> > > > Filter: (id = 1000)
> > > > (3 rows)
> > > >
> > >
> > > Hi Japin. Yes, that's expected behaviour.
> > >
> > > VCI is used only when the vci index is defined for all the columns of
> > > your query. In your example there was a table with 2 columns ('id' and
> > > 'info') but you only have an index on the 'id' column. If you change
> > > the query then you can see VCI getting used.
> > >
> > > E.g.
> > >
> > > postgres=# EXPLAIN SELECT id FROM t WHERE id = 1000;
> > > QUERY PLAN
> > > ----------------------------------------------------------------------------
> > > Custom Scan (VCI Scan) using tidx on t (cost=0.00..209.00 rows=1 width=4)
> > > Filter: (id = 1000)
> > > (2 rows)
> > >
> > > postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
> > > QUERY PLAN
> > > ----------------------------------------------------
> > > Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
> > > Filter: (id = 1000)
> > > (2 rows)
> > >
> > > postgres=# EXPLAIN SELECT id,info FROM t WHERE id = 1000;
> > > QUERY PLAN
> > > ----------------------------------------------------
> > > Seq Scan on t (cost=0.00..209.00 rows=1 width=37)
> > > Filter: (id = 1000)
> > > (2 rows)
> > >
> > > ~~~
> > >
> > > You can see this also in the DEBUG logs, from
> > > vci_can_rewrite_custom_scan(), where it checks to see if the attrs are
> > > in the vci index or not.
> > >
> > > e.g.
> > > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: target table
> > > "t"(oid=16384) tuples(rows=10000,extents=0)
> > > 2025-08-01 16:58:20.939 AEST [26528] DEBUG: vci index: don't match
> > > index "tidx"(oid=16469)
> > > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 1 x
> > > 2025-08-01 16:58:20.940 AEST [26528] DEBUG: attrnum = 2
> > >
> >
> > Thanks for your explantion! Got it.
> >
> > Are there any plans to remove this restriction in the future?
> >
>
> No. There aren't any plans to remove this restriction because it is
> not considered to be a "restriction" in the first place; e.g. VCI is
> intended more like an accelerator only for those *specified* columns
> for which you intend to do your analysis.
>
Thanks for your explantion!
--
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.