Re: Determining size of a database before dumping
From: Jeff Davis
Subject: Re: Determining size of a database before dumping
Date:
Msg-id: 1159830749.25557.41.camel@dogma.v10.wvs
In reply to: Re: Determining size of a database before dumping (Alexander Staubo <alex@purefiction.net>)
Responses: Re: Determining size of a database before dumping
List: pgsql-general
On Tue, 2006-10-03 at 00:42 +0200, Alexander Staubo wrote:
> Why does pg_dump serialize data less efficiently than PostgreSQL when
> using the "custom" format? (Pg_dump arguably has greater freedom in
> being able to apply space-saving optimizations to the output format.
> For example, one could use table statistics to selectively apply
> something like Rice coding for numeric data, or vertically decompose
> the tuples and emit sorted vectors using delta compression.) As for
> TOAST, should not pg_dump's compression compress just as well, or
> better?

It would be a strange set of data that had a larger representation as a
compressed pg_dump than the data directory itself. However, one could
imagine a contrived case where that might happen. Let's say you had a
single table with 10,000 columns of type INT4, 100M records, all with
random numbers in the columns. I don't think standard gzip compression
will compress random INT4s, written out as text, down to the 32 bits
apiece they occupy on disk (a rough sketch below illustrates this).

Another example is NULLs. What if only a few of those records had
non-NULL values? If I understand correctly, PostgreSQL will represent
those NULLs with just one bit each.

What you're saying is more theoretical. If pg_dump used specialized
compression based on the data type of the columns, and everything was
optimal, you're correct: there is no situation in which the dump *must*
be bigger. However, since there is no practical demand for such
compression, and it would be a lot of work, there is no *guarantee*
that the data directory will be bigger. Still, it probably is.

Regards,
	Jeff Davis
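[Editor's sketch, not part of the original message.] A small Python experiment can make the INT4 point concrete: gzip a sample of random INT4 values serialized as decimal text, roughly the way COPY-format dump data carries them, and compare the result with the fixed 4 bytes per value the heap stores. The sample size and compression level below are arbitrary choices for illustration.

import gzip
import random

# Sample of random INT4 values (illustrative size, not the 100M from the example).
N = 100_000
values = [random.randint(-2**31, 2**31 - 1) for _ in range(N)]

# pg_dump carries table data as COPY-style text, so serialize the sample
# as decimal text, one value per line, before compressing.
as_text = "\n".join(str(v) for v in values).encode("ascii")
compressed = gzip.compress(as_text, compresslevel=6)

# On disk each INT4 occupies a fixed 4 bytes (ignoring tuple headers and padding).
print("heap, 4 bytes/value :", 4 * N, "bytes")
print("gzip of text values :", len(compressed), "bytes",
      "(%.2f bytes/value)" % (len(compressed) / N))

On random data the compressed text typically stays well above 4 bytes per value, which is the gap being pointed at; for real-world, less random columns the comparison can easily go the other way, in line with the point that the dump is usually the smaller of the two.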