Re: [GENERAL] "Shared strings"-style table
От | Rob Sargent |
---|---|
Тема | Re: [GENERAL] "Shared strings"-style table |
Дата | |
Msg-id | dd0a0ae0-c28c-92d7-0650-559a4a43dd7b@gmail.com обсуждение исходный текст |
Ответ на | [GENERAL] "Shared strings"-style table (Seamus Abshere <seamus@abshere.net>) |
Список | pgsql-general |
On 10/13/2017 09:49 AM, Seamus Abshere wrote: > hey, > > In the spreadsheet world, there is this concept of "shared strings," a > simple way of compressing spreadsheets when the data is duplicated in > many cells. > > In my database, I have a table with >200 million rows and >300 columns > (all the households in the United States). For clarity of development > and debugging, I have not made any effort to normalize its contents, so > millions of rows have, for example, "SINGLE FAMILY RESIDENCE / > TOWNHOUSE" (yes, that whole string!) instead of some code representing > it. > > Theoretically / blue sky, could there be a table or column type that > transparently handles "shared strings" like this, reducing size on disk > at the cost of lookup overhead for all queries? > > (I guess maybe it's like TOAST, but content-hashed and de-duped and not > only for large objects?) > > Thanks, > Seamus > > -- > Seamus Abshere, SCEA > https://www.faraday.io > https://github.com/seamusabshere > https://linkedin.com/in/seamusabshere > > What data type are these columns now? I would be tempted to map the full strings to an abbreviation just so I didn't have to alter all the columns to an "id"; Optional to place any RI on the columns to the abbreviation dictionary table. Just use the translation as a last step in user facing reports. If you can map/abbreviate to 4 characters, you've approximated the disk size of an integer. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
В списке pgsql-general по дате отправления: