Обсуждение: varchar/char size

Поиск
Список
Период
Сортировка

varchar/char size

От
Bruce Momjian
Дата:
Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?

I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] varchar/char size

От
The Hermit Hacker
Дата:
On Wed, 7 Jan 1998, Bruce Momjian wrote:

> In my experience, char() is full size as defined by create, and
> varchar() is the the size of the actual data in the field, like text,
> but with a pre-defined limit.

    Can you remind me what the difference is between text and varchar?  Why
would you use varchar over text?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: [HACKERS] varchar/char size

От
Bruce Momjian
Дата:
>
> On Wed, 7 Jan 1998, Bruce Momjian wrote:
>
> > In my experience, char() is full size as defined by create, and
> > varchar() is the the size of the actual data in the field, like text,
> > but with a pre-defined limit.
>
>     Can you remind me what the difference is between text and varchar?  Why
> would you use varchar over text?

Only because SQL people are used to varchar, and not text, and sometimes
people want to have a maximum size if they are displaying this data in a
form that is only of limited size.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] varchar/char size

От
"Thomas G. Lockhart"
Дата:
> Does someone want to remind me why we allocate the full size for char()
> and varchar(), when we really can just allocate the size of the given
> string?
> I relize char() has to be padded, but why varchar()?

> In my experience, char() is full size as defined by create, and
> varchar() is the the size of the actual data in the field, like text,
> but with a pre-defined limit.

Well, in many relational databases access can be optimized by having
fixed-length tuple storage structures. Also, it allows re-use of deleted
space in storage pages. It may be that neither of these points have any
bearing on Postgres, and never will, but unless that clearly the case then
I would be inclined to keep the storage scheme as it is currently.

                                                         - Tom


Re: [HACKERS] varchar/char size

От
Bruce Momjian
Дата:
>
> > Does someone want to remind me why we allocate the full size for char()
> > and varchar(), when we really can just allocate the size of the given
> > string?
> > I relize char() has to be padded, but why varchar()?
>
> > In my experience, char() is full size as defined by create, and
> > varchar() is the the size of the actual data in the field, like text,
> > but with a pre-defined limit.
>
> Well, in many relational databases access can be optimized by having
> fixed-length tuple storage structures. Also, it allows re-use of deleted
> space in storage pages. It may be that neither of these points have any
> bearing on Postgres, and never will, but unless that clearly the case then
> I would be inclined to keep the storage scheme as it is currently.

With Ingres and Informix char() is fixed size, while varchar() is
VARiable size.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] varchar/char size

От
"Thomas G. Lockhart"
Дата:
Bruce Momjian wrote:

> >
> > > Does someone want to remind me why we allocate the full size for char()
> > > and varchar(), when we really can just allocate the size of the given
> > > string?
> > > I relize char() has to be padded, but why varchar()?
> >
> > > In my experience, char() is full size as defined by create, and
> > > varchar() is the the size of the actual data in the field, like text,
> > > but with a pre-defined limit.
> >
> > Well, in many relational databases access can be optimized by having
> > fixed-length tuple storage structures. Also, it allows re-use of deleted
> > space in storage pages. It may be that neither of these points have any
> > bearing on Postgres, and never will, but unless that clearly the case then
> > I would be inclined to keep the storage scheme as it is currently.
>
> With Ingres and Informix char() is fixed size, while varchar() is
> VARiable size.

Go for it. Let me know if I can help with testing or anything...

                                        - Tom


Re: [HACKERS] varchar/char size

От
Bruce Momjian
Дата:
>
> Bruce Momjian wrote:
>
> > >
> > > > Does someone want to remind me why we allocate the full size for char()
> > > > and varchar(), when we really can just allocate the size of the given
> > > > string?
> > > > I relize char() has to be padded, but why varchar()?
> > >
> > > > In my experience, char() is full size as defined by create, and
> > > > varchar() is the the size of the actual data in the field, like text,
> > > > but with a pre-defined limit.
> > >
> > > Well, in many relational databases access can be optimized by having
> > > fixed-length tuple storage structures. Also, it allows re-use of deleted
> > > space in storage pages. It may be that neither of these points have any
> > > bearing on Postgres, and never will, but unless that clearly the case then
> > > I would be inclined to keep the storage scheme as it is currently.
> >
> > With Ingres and Informix char() is fixed size, while varchar() is
> > VARiable size.
>
> Go for it. Let me know if I can help with testing or anything...

I know we have text, and that it is better than the others, but if our
varchar() were both variable sized storage, and you could place a max on
it, it would be useful for certain applications.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] varchar/char size

От
Mattias Kregert
Дата:
Bruce Momjian wrote:
>
> Does someone want to remind me why we allocate the full size for char()
> and varchar(), when we really can just allocate the size of the given
> string?
>
> I relize char() has to be padded, but why varchar()?
>
> In my experience, char() is full size as defined by create, and
> varchar() is the the size of the actual data in the field, like text,
> but with a pre-defined limit.

Is CHAR padded on disk? Of course it should be padded for
representation, but for storage, couldn't it be stored just like
TEXT or VARCHAR? Before storing, it could be trimmed, and when
read from storage, it could be padded with spaces on the right.

Btw, why is VARCHAR not simply an alias for TEXT, with maxlen added?
Shouldn't these types be the same internally, but with maxlen checked
for VARCHAR in the parser and maxlen set to "infinite"(-1?) for TEXT?
Or perhaps CHAR could be put into the same type also?

If we have a type called VARTEXT(int maxLen, bool doPaddingProcessing):

VARCHAR(10) becomes VARTEXT(10, false)    // 10 chars, no padding
TEXT becomes VARTEXT(0, false)        // infinite length, no padding
CHAR(10) becomes VARTEXT(10, true)    // 10 chars, padded

Would not this be easier to handle than three different types? This
type stuff would be handled in the parser. There would be only one
storage function, which could do any kind of coding to make the VARTEXT
take as little space as possible on disk.
Perhaps it would (in some cases) be good to have the possibility to
specify compression of the text. That could be another bool attribute
to VARTEXT, used by "COMPRESSED VARCHAR()" or "COMPRESSED TEXT" so that
people can squeeze the maximum out of their disk space.

A related question: Is it possible to store tuples over more than one
block? Would it be possible to split a big TEXT into multiple blocks?

/* m */

Re: [HACKERS] varchar/char size

От
"Thomas G. Lockhart"
Дата:
> > Does someone want to remind me why we allocate the full size for char()
> > and varchar(), when we really can just allocate the size of the given
> > string?
> >
> > I relize char() has to be padded, but why varchar()?
> >
> > In my experience, char() is full size as defined by create, and
> > varchar() is the the size of the actual data in the field, like text,
> > but with a pre-defined limit.
>
> Is CHAR padded on disk? Of course it should be padded for
> representation, but for storage, couldn't it be stored just like
> TEXT or VARCHAR? Before storing, it could be trimmed, and when
> read from storage, it could be padded with spaces on the right.

My CA/Ingres Admin manual points out that there is a tradeoff between
compressing tuples to save disk storage and the extra processing work
required to uncompress for use. They suggest that the only case where you
would consider compressing on disk is when your system is very I/O bound,
and you have CPU to burn.

The default for Ingres is to not compress anything, but you can specify
compression on a table-by-table basis.

btw, char() is a bit trickier to handle correctly if you do compress it on
disk, since trailing blanks must be handled correctly all the way through.
For example, you would want 'hi' = 'hi   ' to be true, which is not a
requirement for varchar().

                                                        - Tom


Re: [HACKERS] varchar/char size

От
Bruce Momjian
Дата:
>
> Bruce Momjian wrote:
> >
> > Does someone want to remind me why we allocate the full size for char()
> > and varchar(), when we really can just allocate the size of the given
> > string?
> >
> > I relize char() has to be padded, but why varchar()?
> >
> > In my experience, char() is full size as defined by create, and
> > varchar() is the the size of the actual data in the field, like text,
> > but with a pre-defined limit.
>
> Is CHAR padded on disk? Of course it should be padded for
> representation, but for storage, couldn't it be stored just like
> TEXT or VARCHAR? Before storing, it could be trimmed, and when
> read from storage, it could be padded with spaces on the right.

Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable.
This is how Ingres and Informix handle it.

There is very little difference in the types because internally they are
handled the same.  The only difference is when we need to specify a max
length, we do that with those types.

>
> Btw, why is VARCHAR not simply an alias for TEXT, with maxlen added?
> Shouldn't these types be the same internally, but with maxlen checked
> for VARCHAR in the parser and maxlen set to "infinite"(-1?) for TEXT?
> Or perhaps CHAR could be put into the same type also?

Right now we do some of the special processing using the OID of VARCHAR
and BPCHAR, which is char().  We would have to generalize the length
idea for each type, which is not hard to do.

>
> If we have a type called VARTEXT(int maxLen, bool doPaddingProcessing):
>
> VARCHAR(10) becomes VARTEXT(10, false)    // 10 chars, no padding
> TEXT becomes VARTEXT(0, false)        // infinite length, no padding
> CHAR(10) becomes VARTEXT(10, true)    // 10 chars, padded
>
> Would not this be easier to handle than three different types? This
> type stuff would be handled in the parser. There would be only one
> storage function, which could do any kind of coding to make the VARTEXT
> take as little space as possible on disk.
> Perhaps it would (in some cases) be good to have the possibility to
> specify compression of the text. That could be another bool attribute
> to VARTEXT, used by "COMPRESSED VARCHAR()" or "COMPRESSED TEXT" so that
> people can squeeze the maximum out of their disk space.
>
> A related question: Is it possible to store tuples over more than one
> block? Would it be possible to split a big TEXT into multiple blocks?

I don't know why it is not possible, but I suppose it goes to the
internal workings of PostgreSQL and how rows are added and modified.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] varchar/char size

От
The Hermit Hacker
Дата:
On Fri, 9 Jan 1998, Bruce Momjian wrote:

> > Is CHAR padded on disk? Of course it should be padded for
> > representation, but for storage, couldn't it be stored just like
> > TEXT or VARCHAR? Before storing, it could be trimmed, and when
> > read from storage, it could be padded with spaces on the right.
>
> Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable.
> This is how Ingres and Informix handle it.

    But how do we store this to the file system?  If I setup a table
with a char(20), and one of the records has a value of "a", does it then
write 1 byte to the file system, or does it write 1 byte ("a") + 19 bytes
("")?

    If the second, is there a reason why, as far as writing to the
file system is concerned, char() can't be treated like varchar()?  I'd
imagine you could save one helluva lot of "disk space" by doing that, no?

    Then again, thinkiing of it that way, I may as well just use
varchar() instead, right?

    See, this is what *really* gets me lost...I use text for
everything, since I really haven't got a clue as to *why* I'd want to use
either char() or varchar() instead...

    Now, from what I *think* I recall you stating, char() and
varchar() are more for backwards compatibility?  Compatibility with other
SQL engines?  If so...as long as we have a type char(), does our backend
representation have to be any different between char() and text?



Re: [HACKERS] varchar/char size

От
Bruce Momjian
Дата:
>
> On Fri, 9 Jan 1998, Bruce Momjian wrote:
>
> > > Is CHAR padded on disk? Of course it should be padded for
> > > representation, but for storage, couldn't it be stored just like
> > > TEXT or VARCHAR? Before storing, it could be trimmed, and when
> > > read from storage, it could be padded with spaces on the right.
> >
> > Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable.
> > This is how Ingres and Informix handle it.
>
>     But how do we store this to the file system?  If I setup a table
> with a char(20), and one of the records has a value of "a", does it then
> write 1 byte to the file system, or does it write 1 byte ("a") + 19 bytes
> ("")?

20+VARHDRSZ bytes for char(20), 1+VARHDRSZ for varchar(20)

>
>     If the second, is there a reason why, as far as writing to the
> file system is concerned, char() can't be treated like varchar()?  I'd
> imagine you could save one helluva lot of "disk space" by doing that, no?

But then you have variable length records where char(x) forces a fixed
length.  Currently, the code treats all varlena structures as variable,
so we readly don't take advantage of this, but we may some day.

>
>     Then again, thinkiing of it that way, I may as well just use
> varchar() instead, right?

Yep.

>
>     See, this is what *really* gets me lost...I use text for
> everything, since I really haven't got a clue as to *why* I'd want to use
> either char() or varchar() instead...
>
>     Now, from what I *think* I recall you stating, char() and
> varchar() are more for backwards compatibility?  Compatibility with other
> SQL engines?  If so...as long as we have a type char(), does our backend
> representation have to be any different between char() and text?

We need the fixed length trim cabability of char(), and I think we need
the padding of char() too.

--
Bruce Momjian
maillist@candle.pha.pa.us