Table and View Planning for Census Data
От | Lee Hachadoorian |
---|---|
Тема | Table and View Planning for Census Data |
Дата | |
Msg-id | 5ab13580805281128j60a29102vaf023288be8d184d@mail.gmail.com обсуждение исходный текст |
Список | pgsql-novice |
I have a question about how to efficiently set up tables and views to work with census data. The census releases data partitioned (rows) by state and segmented (columns) by arbitrary groupings of matrices, or "subject tables". For example matrices P1 (total population), P2 (urban and rural), P3 (population by race), etc are all grouped in segment 1. Because the census data set is huge, I am downloading and importing data as needed, rather than all at once. Currently I am the only user, although it could possibly be used by several coworkers in the future. It's highly unlikely that more than a few people would be hitting the database at the same time. 1) Often I will be querying for a specific geographic level (e.g. counties) across state lines (e.g New York metro area, which includes data from three states). How much would I gain from trying to set up partitioning, as opposed to combining the states into one big table? Or, since I would rarely be returning multiple geographic levels (like census tracts *and* counties) at the same time, should I combine the states but partition by geographic level? 2) The segments are arbitrary groupings, so I don't really want to have to write a query for data from matrix P3 where I have to also know that matrix P3 is in database table segment1. I can think of three ways to deal with this. Please let me know which one is best, or suggest something else I haven't thought of. a) Create a view which combines all columns, then query for desired columns. (But does a view have the same row size limitation as a table? If so this wouldn't work, because the number of columns will be in the thousands.) b) Create a view for each matrix. Then I can query by matrix and column (which is easier for my brain). c) Break each matrix out into its own database table. (Most work, how much of a performance gain will there be?) Any input would be be appreciated. Lee Hachadoorian PhD Student, Geography Program in Earth & Environmental Sciences CUNY Graduate Center
В списке pgsql-novice по дате отправления: