Обсуждение: Efficient use of space in large table?
I was just wondering - I've got two large tables and I was wondering
if there is anyway to shrink them somewhat. I imagined compression for
non-indexed columns or something. Is varchar or char more efficient than
text? I'm sort of just grasping at straws for something to get the data
smaller.
Josh
A size summary
relname | size | type
------------------------+-------+----------
sospeople | 599MB | table
sospeople_fn | 71MB | index
sospeople_ln | 73MB | index
sospeople_zip | 73MB | index
votes | 937MB | table
Table "sospeople"
Attribute | Type | Modifier
------------------+---------+---------------------
sosid | text | not null default ''
countyname | text | not null
firstname | text |
middlename | text |
lastname | text |
suffix | text |
homephone | text |
registrationdate | text |
birthyear | integer |
status | text |
precinctcode | text |
housenumber | text |
streetname | text |
unittype | text |
unit | text |
address2 | text |
city | text |
state | text |
zipcode | text |
Indices: sospeople_fn,
sospeople_ln,
sospeople_zip
Table "votes"
Attribute | Type | Modifier
------------+---------------+----------
sosid | character(10) |
electionid | integer |
votetype | character(1) |
Joshua b. Jore ; http://www.greentechnologist.org
On Thu, 4 Jul 2002 21:43:10 -0500 (CDT), Josh Jore
<josh@greentechnologist.org> wrote:
>I was just wondering - I've got two large tables and I was wondering
>if there is anyway to shrink them somewhat. I imagined compression for
>non-indexed columns or something. Is varchar or char more efficient than
>text?
>
Josh,
first of all, text is ok. You might want to store NULL instead of ''
to squeeze out a few bytes here and there.
Now I have even more questions instead of answers :-)
PG version? OS?
Do you have lots of UPDATEs/DELETEs?
Do you ANALYZE regularly?
Please show us the outputs of
VACUUM VERBOSE sospeople;
VACUUM VERBOSE votes;
>
>A size summary
> relname | size | type | reltuples
>------------------------+-------+---------- +----------
> sospeople | 599MB | table | 2M
> sospeople_fn | 71MB | index | 2M
> sospeople_ln | 73MB | index | 2M
> sospeople_zip | 73MB | index | 2M
> votes | 937MB | table | 15M
Could you show us the tuple counts in your relations, just like the
wild guess I have inserted?
>
>
> Table "sospeople"
> Attribute | Type | Modifier
>------------------+---------+---------------------
> sosid | text | not null default ''
> countyname | text | not null
> firstname | text |
> middlename | text |
> lastname | text |
> suffix | text |
> homephone | text |
> registrationdate | text |
> birthyear | integer |
> status | text |
> precinctcode | text |
> housenumber | text |
> streetname | text |
> unittype | text |
> unit | text |
> address2 | text |
> city | text |
> state | text |
> zipcode | text |
What are the average sizes of these text columns? If there are long
repeated values (e.g. countyname), it might help do pull these out
into a separate table
CREATE TABLE county (id serial, name text);
and replace
> countyname | text | not null
by
county_id NOT NULL REFERENCES county
>Indices: sospeople_fn,
> sospeople_ln,
> sospeople_zip
>
Also post the results of
\d sospeople_fn
\d sospeople_ln
\d sospeople_zip
> Table "votes"
> Attribute | Type | Modifier
>------------+---------------+----------
> sosid | character(10) |
> electionid | integer |
> votetype | character(1) |
No index here?
Using sosno INT instead of sosid CHAR(10) could save you (vaguely
estimated) up to 10% of space for this table. OTOH you would have to
insert sosno into sospeople, so this would only be a win, if votes has
far more rows than sospeople.
>I'm sort of just grasping at straws for something to get the data
>smaller.
Is your problem really related to space, or to speed?
Servus
Manfred
On Fri, 5 Jul 2002, Manfred Koizar wrote: > first of all, text is ok. You might want to store NULL instead of '' > to squeeze out a few bytes here and there. Correct me if I'm wrong, but doesn't NULL take up *more* room than ''? After all, as soon as you have a NULL, you have to have a null bit array in the tuple. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson dijo: > On Fri, 5 Jul 2002, Manfred Koizar wrote: > > > first of all, text is ok. You might want to store NULL instead of '' > > to squeeze out a few bytes here and there. > > Correct me if I'm wrong, but doesn't NULL take up *more* room than ''? > After all, as soon as you have a NULL, you have to have a null bit array > in the tuple. I think that if you have 32 attributes or less in the table, the bitmap will use the same space as '' (because '' has the 4-byte length). And if you have at least one other NULL attribute, the rest come for free. So no, it won't take more space, AFAIU anyway. -- Alvaro Herrera (<alvherre[a]atentus.com>) Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
On Fri, 5 Jul 2002, Manfred Koizar wrote: > On Thu, 4 Jul 2002 21:43:10 -0500 (CDT), Josh Jore > <josh@greentechnologist.org> wrote: > >I was just wondering - I've got two large tables and I was wondering > >if there is anyway to shrink them somewhat. I imagined compression for > >non-indexed columns or something. Is varchar or char more efficient than > >text? > > > Josh, > > first of all, text is ok. You might want to store NULL instead of '' > to squeeze out a few bytes here and there. I just thought I'd follow up - it turns out that most of my space was going to tuple headers (some 40ish bytes header, 16 bytes data). I just took the data out of PostgreSQL and stuck it into partitioned ASCII files and BerkeleyDB for indexes. That happens to work excellently and doesn't require as fancy a machine as PostgreSQL did. So the answer is to sometimesquestion your choice of tool ;-)