Tom Lane wrote:
> Don Baccus <dhogaza@pacifier.com> writes:
>
> > Also...interbase's "text" type is apparently compressed, and that's
> > an interesting idea for "text" itself (as opposed to "varchar()" of
> > a given size). Someone who just says "text" probably wants to be
> > able to stuff as much text into the column as possible, I know
> > I do!
>
> Just quietly make text compressed-under-the-hood, you mean? Hmm.
> Interesting idea, all right, and it wouldn't create any long-term
> compatibility problem since users couldn't see it directly. ...
If we weren't in BETA code freeze right now, I'd surely call for another month's delay.
> > The price of compression/decompression is to some extent
> > balanced by not having to drag as many bytes around during joins
> > and sorts and the like.
>
> Also, there could be a threshold: don't bother trying to compress
> fields that are less than, say, 1K bytes.
>
> Jan, what do you think? I might be able to find some time to try this,
> if you approve of the idea but just don't have cycles to spare.
It's a very tempting solution: silently turn "text" into "lztext", and revert those internal changes again
in the next release while implementing TOAST. Remember that the lztext I implemented had the mentioned
threshold parameter - say 256 - from the very beginning. And you know 256->1K is a one-liner in my coding
style. Moreover, it was a value driven by a global parameter set, and thus potentially prepared to be a
runtime configurable one (the other values of the parameter set were: minimum compression ratio to gain,
maximum result size to force compression even if the ratio is below that, GOOD size to stop history lookup,
and finally the factor by which GOOD is lowered during history lookups).
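
For illustration, a minimal C sketch of such a parameter set and of how the threshold and minimum ratio could
be applied. All type, field and function names here are hypothetical, and every default except the 256 is a
placeholder; this is not the lztext source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parameter set mirroring the values listed above. */
typedef struct {
    size_t min_input_size;    /* threshold: shorter inputs are stored uncompressed */
    int    min_ratio_pct;     /* minimum saving to gain, in percent of input size */
    size_t force_result_size; /* size at which compression is forced even if the
                               * ratio is below the minimum (not exercised here) */
    size_t good_match;        /* match length "GOOD" enough to stop history lookup */
    int    good_drop_pct;     /* how much GOOD is lowered during lookups
                               * (not exercised here) */
} CompressParams;

/* Only the 256 comes from the text above; the other values are placeholders. */
const CompressParams default_params = { 256, 20, 6144, 128, 10 };

/* Threshold check: do not even try to compress short values. */
bool should_try_compression(const CompressParams *p, size_t input_len)
{
    assert(p != NULL);
    return input_len >= p->min_input_size;
}

/* Keep the compressed result only if it saves at least min_ratio_pct percent. */
bool keep_compressed(const CompressParams *p, size_t input_len, size_t result_len)
{
    assert(p != NULL);
    if (result_len >= input_len)
        return false;
    return (input_len - result_len) * 100 >= input_len * (size_t) p->min_ratio_pct;
}
```

With a layout like this, the 256->1K change is a single assignment on a copy of the parameter set, for example
`params.min_input_size = 1024;`.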
The algorithm I used for compression is one that gives up possible compression ratio to gain speed. It uses
a poor XOR combination of the next 4 input bytes to look up a history table - and that's anything but perfect
from a hashing algorithm's point of view. But it was enough to make a 50+ column view fit easily into
pg_rewrite. And that's what it was made for.
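
To make the hashing point concrete, here is a small C sketch of an XOR combination of the next 4 input bytes
used as a history table index. The table size, the shift amounts and the function name are assumptions chosen
for illustration, not the values from the actual lztext code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HISTORY_SIZE 4096u   /* assumed table size; must be a power of two */

/*
 * XOR-combine the next 4 input bytes into an index into the history table.
 * The caller must guarantee that at least 4 bytes remain.
 * The shift amounts are illustrative assumptions.
 */
size_t history_index(const unsigned char *in, size_t remaining)
{
    uint32_t h;

    assert(in != NULL && remaining >= 4);
    h = ((uint32_t) in[0] << 6) ^
        ((uint32_t) in[1] << 4) ^
        ((uint32_t) in[2] << 2) ^
         (uint32_t) in[3];
    return (size_t) (h & (HISTORY_SIZE - 1u));   /* mask down to the table size */
}
```

A hash like this is very cheap, but different inputs collide easily: in this sketch the byte sequences
{0, 0, 0, 4} and {0, 0, 1, 0} land in the same slot. That is the kind of trade of ratio for speed described
above.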
Anyway, there are far too many direct references to VARDATA on "text" plus all the assumptions on binary
compatibility between text, varchar etc. in the code, to start on it during BETA.
Thus, I see a good chance for a 7.1 release really soon after 7.0. Then we can have a longer delay for the
next one, featuring TOAST.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #