Discussion: stored short varlena in array

stored short varlena in array

From
Quan Zongliang
Date:
Hi

Currently, varlena values are stored in arrays as-is, with no
consideration for short varlena headers. If arrays used short varlenas the
way fill_val() does, that would save both memory and disk space.
The TODO list already has an item for this:
  Allow single-byte header storage for array elements
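
For readers unfamiliar with short varlenas, here is a minimal sketch of the size rule involved. The constants mirror PostgreSQL's VARHDRSZ, VARHDRSZ_SHORT and VARATT_SHORT_MAX; the Python function names are made up for this illustration and are not PostgreSQL APIs. Alignment padding is ignored here.

```python
VARHDRSZ = 4             # regular varlena header, in bytes
VARHDRSZ_SHORT = 1       # short varlena header, in bytes
VARATT_SHORT_MAX = 0x7F  # largest total size a short varlena can encode


def can_make_short(data_len: int) -> bool:
    """True if a value with data_len payload bytes fits the 1-byte header form."""
    return data_len + VARHDRSZ_SHORT <= VARATT_SHORT_MAX


def stored_size(data_len: int, allow_short: bool) -> int:
    """Header plus payload size in bytes, ignoring alignment padding."""
    if allow_short and can_make_short(data_len):
        return VARHDRSZ_SHORT + data_len
    return VARHDRSZ + data_len


if __name__ == "__main__":
    # Compare both header forms for a few payload lengths.
    for n in (1, 10, 126, 127):
        print(n, stored_size(n, allow_short=False), stored_size(n, allow_short=True))
```

Under this rule a payload of up to 126 bytes can use the 1-byte header; anything longer keeps the 4-byte header.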

This patch modifies many files.
It is based on commit 38da053463bef32adf563ddee5277d16d2b6c5af
and passes the regression tests.
However, it can affect many contrib modules, such as hstore and ltree,
whose code would need to be adjusted.


Disk space usage test

create table t1 (c1 varchar[]);
insert into t1
    select '{a,b,c,d,e,f,g,h,i,j,k,l,m,n}'
       from generate_series(1,100000);
select pg_relation_size('t1')/8192;

before
postgres=# select pg_relation_size('t1')/8192;
  ?column?
----------
      2041
(1 row)

after
postgres=# select pg_relation_size('t1')/8192;
  ?column?
----------
      1334
(1 row)
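
The two page counts can be cross-checked with a rough heap-layout model. This is only a sketch under stated assumptions: a default build (8 kB pages, 8-byte MAXALIGN), a 24-byte page header, a 24-byte tuple header with no null bitmap, a 4-byte line pointer per tuple, fillfactor 100, and array datum sizes of 136 and 80 bytes as reported under "Memory usage" in this message. The function names are invented for the example.

```python
import math

BLCKSZ = 8192            # default page size (assumption: default build)
PAGE_HEADER = 24         # page header size in bytes
LINE_POINTER = 4         # one line pointer per tuple
TUPLE_HEADER = 24        # 23-byte heap tuple header padded to MAXALIGN
MAXALIGN = 8
VARATT_SHORT_MAX = 0x7F  # largest total size a short varlena can encode


def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def datum_size_in_tuple(size_with_4b_header: int) -> int:
    """The array datum itself gets a 1-byte header in the tuple if it is small enough."""
    short_size = size_with_4b_header - 4 + 1
    return short_size if short_size <= VARATT_SHORT_MAX else size_with_4b_header


def pages_needed(rows: int, datum_size: int) -> int:
    """Estimated heap pages for a single-column table of equal-sized rows."""
    tuple_len = maxalign(TUPLE_HEADER + datum_size_in_tuple(datum_size))
    tuples_per_page = (BLCKSZ - PAGE_HEADER) // (tuple_len + LINE_POINTER)
    return math.ceil(rows / tuples_per_page)


if __name__ == "__main__":
    before = pages_needed(100000, 136)
    after = pages_needed(100000, 80)
    print(before, after, f"{100 * (before - after) / before:.1f}% fewer pages")
```

With these assumptions the model gives 2041 and 1334 pages, matching the measurements above, a reduction of about 35%. Part of the gain comes from the 80-byte array becoming small enough to get a 1-byte header of its own inside the tuple.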

Memory usage test
'{a,b,c,d,e,f,g,h,i,j,k,l,m,n}'::varchar[]

before
136 bytes
after
80 bytes
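
One simple size model reproduces both figures. It is a sketch, not a description of the patch: it assumes a 24-byte header for a one-dimensional array without a null bitmap, and it assumes every element is padded to a 4-byte boundary in both cases. The second assumption is inferred from the 80-byte figure and is not stated in this message.

```python
ARRAY_HEADER_1D = 24  # 16-byte fixed part + one dimension + one lower bound


def align4(n: int) -> int:
    """Round n up to the next multiple of 4 (int alignment)."""
    return (n + 3) // 4 * 4


def array_size(elem_lens, elem_header: int) -> int:
    """Total array size when each element has elem_header bytes of header
    and is padded to int alignment (an assumption of this model)."""
    data = sum(align4(elem_header + n) for n in elem_lens)
    return ARRAY_HEADER_1D + data


if __name__ == "__main__":
    elems = [1] * 14  # fourteen one-character strings, 'a' through 'n'
    print(array_size(elems, elem_header=4))  # 4-byte element headers
    print(array_size(elems, elem_header=1))  # 1-byte element headers
```

With 4-byte headers each one-character element occupies 8 bytes, giving 24 + 14 * 8 = 136. With 1-byte headers padded to 4 bytes, the total is 24 + 14 * 4 = 80.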

--
Zongliang Quan




Re: stored short varlena in array

From
Tom Lane
Date:
Quan Zongliang <quanzongliang@yeah.net> writes:
> Currently, varlena values are stored in arrays as-is, with no
> consideration for short varlena headers. If arrays used short varlenas the
> way fill_val() does, that would save both memory and disk space.

TBH, I think this is a bad idea and we should reject it.  As you have
already discovered, the code footprint of such a change is enormous
(and I have little confidence that you found all the places to fix).
The consequences would be equally dire in extensions, which'd likely
be dealing with ensuing bugs for years to come.

The reason we didn't do this when we originally invented short varlena
headers is that we presumed that array-level compression would remove
most of the benefit.  Of course that only happens if the array is big
enough to get the attention of the tuple toaster, which is why your
example with very small arrays shows a benefit.  But I'm doubtful that
such use-cases justify the pain we'd endure getting to the point where
this'd work reliably.  The percentage savings drops off drastically as
the length of the individual strings grows, so this example with
one-byte strings is very much a best-case scenario.
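
To put rough numbers on how the savings fall off, here is a small per-element sketch. It is an optimistic model, not a measurement: it assumes an element with a 4-byte header is padded to int alignment, while a short-header element takes 1 byte of header plus payload with no padding (as short varlenas do in tuples). The helper names are made up for this example.

```python
VARATT_SHORT_MAX = 0x7F  # largest total size a short varlena can encode


def align4(n: int) -> int:
    """Round n up to the next multiple of 4 (int alignment)."""
    return (n + 3) // 4 * 4


def elem_size_long(data_len: int) -> int:
    """Element size with a 4-byte header, padded to int alignment."""
    return align4(4 + data_len)


def elem_size_short(data_len: int) -> int:
    """Element size with a 1-byte header if it fits, else the 4-byte form."""
    if data_len + 1 <= VARATT_SHORT_MAX:
        return 1 + data_len
    return elem_size_long(data_len)


def saving_pct(data_len: int) -> float:
    """Per-element saving of the short form, as a percentage of the long form."""
    before = elem_size_long(data_len)
    return 100.0 * (before - elem_size_short(data_len)) / before


if __name__ == "__main__":
    for n in (1, 3, 10, 50, 100, 200):
        print(f"{n:4d} bytes: {saving_pct(n):5.1f}% saved per element")
```

In this model a one-byte string saves 75% per element, a 10-byte string about 31%, a 100-byte string under 3%, and strings too long for a short header save nothing.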

In short, I'm afraid this ship sailed a long time ago.  Perhaps it
was a poor decision but I think we're stuck with it.

            regards, tom lane