Discussion: US Census database (Tiger 2004FE)
I just finished converting and loading the US census data into PostgreSQL. Would anyone be interested in it for testing purposes? It's a *LOT* of data (about 40+ Gig in PostgreSQL)
On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote: > I just finished converting and loading the US census data into PostgreSQL > would anyone be interested in it for testing purposes? > > It's a *LOT* of data (about 40+ Gig in PostgreSQL) Sure. Got a torrent? Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
David Fetter wrote: >On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote: > > >>I just finished converting and loading the US census data into PostgreSQL >>would anyone be interested in it for testing purposes? >> >>It's a *LOT* of data (about 40+ Gig in PostgreSQL) >> >> > >Sure. Got a torrent? > > > How big is it when dumped and compressed? cheers andrew
Wow! a lot of people seem to want it! I am dumping out with pg_dump right now, it may take a few hours. It is in PostgreSQL 8.0.3 Does anyone have access to a high bandwidth server? I could mail it on a DVD to someone who would host it. > > David Fetter wrote: > >>On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote: >> >> >>>I just finished converting and loading the US census data into >>> PostgreSQL >>>would anyone be interested in it for testing purposes? >>> >>>It's a *LOT* of data (about 40+ Gig in PostgreSQL) >>> >>> >> >>Sure. Got a torrent? >> >> >> > > > How big is it when dumped and compressed? > > cheers > > andrew >
* Mark Woodward (pgsql@mohawksoft.com) wrote: > I just finished converting and loading the US census data into PostgreSQL > would anyone be interested in it for testing purposes? > > It's a *LOT* of data (about 40+ Gig in PostgreSQL) How big dumped & compressed? I may be able to host it depending on how big it ends up being... Stephen
> * Mark Woodward (pgsql@mohawksoft.com) wrote: >> I just finished converting and loading the US census data into >> PostgreSQL >> would anyone be interested in it for testing purposes? >> >> It's a *LOT* of data (about 40+ Gig in PostgreSQL) > > How big dumped & compressed? I may be able to host it depending on how > big it ends up being... It's been running for about an hour now, and it is up to 3.3G. pg_dump tiger | gzip > tiger.pgz I'll let you know. Hopefully, it will fit on DVD. You know, ... maybe pg_dump needs a progress bar? (How would it do that, I wonder?)
* Mark Woodward (pgsql@mohawksoft.com) wrote: > > How big dumped & compressed? I may be able to host it depending on how > > big it ends up being... > > It's been running for about an hour now, and it is up to 3.3G. Not too bad. I had 2003 (iirc) loaded into 7.4 at one point. > pg_dump tiger | gzip > tiger.pgz What db version are you using, how did you load it (ogr2ogr?), is it in postgis form? Fun questions, all of them. :) > I'll let you know. Hopefully, it will fit on DVD. I guess your upload pipe isn't very big? snail-mail is slow... :) > You know, ... maybe pg_dump needs a progress bar? (How would it do that, I > wonder?) Using the new functions in 8.1 which provide size-on-disk of things, hopefully there's also a function to give a tuple-size or similar as well. It'd be a high estimate due to dead tuples but should be sufficient for a progress bar. Thanks, Stephen
> * Mark Woodward (pgsql@mohawksoft.com) wrote: >> > How big dumped & compressed? I may be able to host it depending on >> how >> > big it ends up being... >> >> It's been running for about an hour now, and it is up to 3.3G. > > Not too bad. I had 2003 (iirc) loaded into 7.4 at one point. Cool. > >> pg_dump tiger | gzip > tiger.pgz > > What db version are you using, how did you load it (ogr2ogr?), is it in > postgis form? Fun questions, all of them. :) 8.0.3, in simple pg_dump form. I loaded it with a utility I wrote a long time ago for tigerua. It is a fixed-width-text-file-to-PG utility. It takes a "control" file that describes the fields: field widths, types, and field names. It creates a SQL "create table" statement, and also reads all the records from the data file into a PostgreSQL COPY command. A control file looks something like: # Zip+4 codes # Tiger 2003 Record Conversion File # Copyright (c) 2004 Mark L. Woodward, Mohawk Software TABLE RTZ 1:I RT 4:I VERSION 10:T TLID 3:S RTSQ 4:Z ZIP4L 4:Z ZIP4R The first number is the field width in chars; second is an optional type (there are a few, 'I' means ignore, 'Z' means zipcode, etc.); if no type is given, then varchar is assumed. Last is the column name. > >> I'll let you know. Hopefully, it will fit on DVD. > > I guess your upload pipe isn't very big? snail-mail is slow... :) Never underestimate the bandwidth of a few DVDs and FedEx. Do the math, it is embarrassing. > >> You know, ... maybe pg_dump needs a progress bar? (How would it do that, >> I >> wonder?) > > Using the new functions in 8.1 which provide size-on-disk of things, > hopefully there's also a function to give a tuple-size or similar as > well. It'd be a high estimate due to dead tuples but should be > sufficient for a progress bar. > > Thanks, > > Stephen >
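[The fixed-width extraction that control file describes can be sketched as a one-off awk filter. This hard-codes the RTZ widths quoted above rather than generalizing from a control file the way Mark's utility does, skips the 'I'-typed fields, and emits tab-separated rows suitable for COPY; the input file name is hypothetical.]

```shell
# Pull TLID, RTSQ, ZIP4L, ZIP4R out of fixed-width RTZ records.
# Offsets are 1-based and follow the widths in the control file above:
# RT (1, ignored), VERSION (4, ignored), TLID (10), RTSQ (3), ZIP4L (4), ZIP4R (4).
awk '{
    printf "%s\t%s\t%s\t%s\n",
        substr($0,  6, 10),   # TLID
        substr($0, 16,  3),   # RTSQ
        substr($0, 19,  4),   # ZIP4L
        substr($0, 23,  4)    # ZIP4R
}' TGR25017.RTZ > rtz.tsv
```

The resulting .tsv then loads with psql's \copy into the table the utility's "create table" statement defines.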
On Thursday, 04.08.2005 at 08:40 -0400, Mark Woodward wrote: > > * Mark Woodward (pgsql@mohawksoft.com) wrote: > >> I just finished converting and loading the US census data into > >> PostgreSQL > >> would anyone be interested in it for testing purposes? > >> > >> It's a *LOT* of data (about 40+ Gig in PostgreSQL) > > > > How big dumped & compressed? I may be able to host it depending on how > > big it ends up being... > > It's been running for about an hour now, and it is up to 3.3G. > > pg_dump tiger | gzip > tiger.pgz > > I'll let you know. Hopefully, it will fit on DVD. > > You know, ... maybe pg_dump needs a progress bar? (How would it do that, I > wonder?) pg_dump -v maybe? ;) *hint hint* -- Tino Wildenhain <tino@wildenhain.de>
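[pg_dump -v only narrates object names as it goes; for a byte-level progress readout one can splice pv(1) into the existing pipe. A sketch, assuming pv is installed and using the ~40 GB on-disk size from upthread as the estimate:]

```shell
# pv reports throughput, and with -s a percentage against the estimate.
# 40g is the rough on-disk size quoted earlier; the dump's actual size
# will differ, so treat the percentage as approximate.
pg_dump tiger | pv -s 40g | gzip > tiger.pgz
```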
> It's been running for about an hour now, and it is up to 3.3G. > > pg_dump tiger | gzip > tiger.pgz | bzip2 > tiger.sql.bz2 :) Chris
>> It's been running for about an hour now, and it is up to 3.3G. >> >> pg_dump tiger | gzip > tiger.pgz > > | bzip2 > tiger.sql.bz2 :) > I find bzip2 FAR slower than gzip; the extra compression isn't worth the time.
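[That trade-off is easy to measure on a small slice before committing to a 40+ GB dump; a sketch, with hypothetical file names:]

```shell
# Compare wall-clock time and output size on a representative sample
# before picking a compressor for the full dump.
head -c 100000000 tiger.sql > sample.sql      # ~100 MB slice
time gzip  -c sample.sql > sample.sql.gz
time bzip2 -c sample.sql > sample.sql.bz2
ls -l sample.sql.gz sample.sql.bz2            # size vs. time trade-off
```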
It is 4.4G in space in a gzip package. I'll mail a DVD to two people who promise to host it for Hackers.
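[For whoever ends up downloading it: restoring a plain-format gzip'd dump like this is a one-liner. A sketch, with the database name assumed from the commands upthread:]

```shell
# Create an empty database and stream the compressed dump into it.
createdb tiger
gunzip -c tiger.pgz | psql tiger
```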
You can send it to me, and ehpg will host it. I'll send you a private email with my info. Gavin On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote: > It is 4.4G in space in a gzip package. > > I'll mail a DVD to two people who promise to host it for Hackers. > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > Gavin M. Roy 800 Pound Gorilla gmr@ehpg.net
Mark Woodward wrote: > It is 4.4G in space in a gzip package. > > I'll mail a DVD to two people who promise to host it for Hackers. Would it be easier to release the program you wrote to do this conversion? I use this pretty short (274 line) C program: http://www.forensiclogic.com/tmp/tgr2sql.c to convert the raw tiger files from http://www.census.gov/geo/www/tiger/index.html into SQL statements that can be loaded by PostgreSQL. The #define SQL line controls whether it emits the data as INSERT statements or as COPY input.
I thought about it; it isn't the best program around, but it does work. My program also reformats numbers, i.e. long/lat become properly decimal-ed numerics, zips become integers, etc. The question is... Do you download the raw data and convert it into a database, or do you download the pre-formatted database? I would say the preformatted database is easier to manage. There are hundreds of individual zip files, each containing 10 or so data files. > Mark Woodward wrote: >> It is 4.4G in space in a gzip package. >> >> I'll mail a DVD to two people who promise to host it for Hackers. > > Would it be easier to release the program you did to do > this conversion? > > > I use this pretty short (274 line) C program: > http://www.forensiclogic.com/tmp/tgr2sql.c > to convert the raw tiger files > from http://www.census.gov/geo/www/tiger/index.html > into SQL statements that can be loaded by postgresql. > > The #define SQL line controls if it makes data > with INSERT statements or for COPY statements. >
On Thursday 04 August 2005 09:37, Gavin M. Roy wrote: > You can send it to me, and ehpg will host it. I'll send you a > private email with my info. > > Gavin > > On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote: > > It is 4.4G in space in a gzip package. > > > > I'll mail a DVD to two people who promise to host it for Hackers. I'm wondering if this is now available for consumption by the rest of us?? (ie what's the link) -- Darcy Buskermolen Wavefire Technologies Corp. http://www.wavefire.com ph: 250.717.0200 fx: 250.763.1759
Darcy Buskermolen wrote: > On Thursday 04 August 2005 09:37, Gavin M. Roy wrote: > >>You can send it to me, and ehpg will host it. I'll send you a >>private email with my info. >> >>Gavin >> >>On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote: >> >>>It is 4.4G in space in a gzip package. >>> >>>I'll mail a DVD to two people who promise to host it for Hackers. Command Prompt would be willing to host it. Sincerely, Joshua D. Drake > > > I'm wondering if this is now available for consumption by the rest of us?? > > (ie what's the link) > > -- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240 PostgreSQL Replication, Consulting, Custom Programming, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/