Обсуждение: US Census database (Tiger 2004FE)

Поиск
Список
Период
Сортировка

US Census database (Tiger 2004FE)

От
"Mark Woodward"
Дата:
I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)


Re: US Census database (Tiger 2004FE)

От
David Fetter
Дата:
On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
> I just finished converting and loading the US census data into PostgreSQL
> would anyone be interested in it for testing purposes?
> 
> It's a *LOT* of data (about 40+ Gig in PostgreSQL)

Sure.  Got a torrent?

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


Re: US Census database (Tiger 2004FE)

От
Andrew Dunstan
Дата:
David Fetter wrote:

>On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
>  
>
>>I just finished converting and loading the US census data into PostgreSQL
>>would anyone be interested in it for testing purposes?
>>
>>It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>>    
>>
>
>Sure.  Got a torrent?
>
>  
>


How big is it when dumped and compressed?

cheers

andrew


Re: US Census database (Tiger 2004FE)

От
"Mark Woodward"
Дата:
Wow! a lot of people seem to want it!
I am dumping out with pg_dump right now, it may take a few hours.
It is in PostgreSQL 8.0.3
Does anyone have access to a high bandwidth server? I could mail it on a
DVD to someone who would host it.

>
> David Fetter wrote:
>
>>On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
>>
>>
>>>I just finished converting and loading the US census data into
>>> PostgreSQL
>>>would anyone be interested in it for testing purposes?
>>>
>>>It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>>>
>>>
>>
>>Sure.  Got a torrent?
>>
>>
>>
>
>
> How big is it when dumped and compressed?
>
> cheers
>
> andrew
>


Re: US Census database (Tiger 2004FE)

От
Stephen Frost
Дата:
* Mark Woodward (pgsql@mohawksoft.com) wrote:
> I just finished converting and loading the US census data into PostgreSQL
> would anyone be interested in it for testing purposes?
>
> It's a *LOT* of data (about 40+ Gig in PostgreSQL)

How big dumped & compressed?  I may be able to host it depending on how
big it ends up being...
Stephen

Re: US Census database (Tiger 2004FE)

От
"Mark Woodward"
Дата:
> * Mark Woodward (pgsql@mohawksoft.com) wrote:
>> I just finished converting and loading the US census data into
>> PostgreSQL
>> would anyone be interested in it for testing purposes?
>>
>> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>
> How big dumped & compressed?  I may be able to host it depending on how
> big it ends up being...

It's been running for about an hour now, and it is up to 3.3G.

pg_dump tiger | gzip > tiger.pgz

I'll let you know. Hopefully, it will fit on  DVD.

You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)


Re: US Census database (Tiger 2004FE)

От
Stephen Frost
Дата:
* Mark Woodward (pgsql@mohawksoft.com) wrote:
> > How big dumped & compressed?  I may be able to host it depending on how
> > big it ends up being...
>
> It's been running for about an hour now, and it is up to 3.3G.

Not too bad.  I had 2003 (iirc) loaded into 7.4 at one point.

> pg_dump tiger | gzip > tiger.pgz

What db version are you using, how did you load it (ogr2ogr?), is it in
postgis form?  Fun questions, all of them. :)

> I'll let you know. Hopefully, it will fit on  DVD.

I guess your upload pipe isn't very big?  snail-mail is slow... :)

> You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
> wonder?)

Using the new functions in 8.1 which provide size-on-disk of things,
hopefully there's also a function to give a tuple-size or similar as
well.  It'd be a high estimate due to dead tuples but should be
sufficient for a progress bar.
Thanks,
    Stephen

Re: US Census database (Tiger 2004FE)

От
"Mark Woodward"
Дата:
> * Mark Woodward (pgsql@mohawksoft.com) wrote:
>> > How big dumped & compressed?  I may be able to host it depending on
>> how
>> > big it ends up being...
>>
>> It's been running for about an hour now, and it is up to 3.3G.
>
> Not too bad.  I had 2003 (iirc) loaded into 7.4 at one point.

Cool.

>
>> pg_dump tiger | gzip > tiger.pgz
>
> What db version are you using, how did you load it (ogr2ogr?), is it in
> postgis form?  Fun questions, all of them. :)

8.0.3, in simple pg_dump form.

I loaded it with a utility I wrote a long time ago for tigerua. It is a
fixed width text file to PG utility. It takes a "control" file that
describes the fields, field widths, and field name. It creates a SQL
"create table" statement, and also reads all the records from a control
file into a PostgreSQL copy command. A control file looks something like:

# Zip+4 codes
# Tiger 2003 Record Conversion File
# Copyright (c) 2004 Mark L. Woodward, Mohawk Software
TABLE RTZ
1:I     RT
4:I     VERSION
10:T    TLID
3:S     RTSQ
4:Z     ZIP4L
4:Z     ZIP4R


The first number is the field width in chars, second is an optional type
(there are a few, 'I' means ignore, 'Z' means zipcode, etc.) if no type is
given, then varchar is assumed. Last is the column name.


>
>> I'll let you know. Hopefully, it will fit on  DVD.
>
> I guess your upload pipe isn't very big?  snail-mail is slow... :)

Never underestimate the bandwidth of a few DVDs and FedEx. Do the math, it
is embarrasing.

>
>> You know, ... maybe pg_dump needs a progress bar? (How would it do that,
>> I
>> wonder?)
>
> Using the new functions in 8.1 which provide size-on-disk of things,
> hopefully there's also a function to give a tuple-size or similar as
> well.  It'd be a high estimate due to dead tuples but should be
> sufficient for a progress bar.
>
>     Thanks,
>
>         Stephen
>



Re: US Census database (Tiger 2004FE)

От
Tino Wildenhain
Дата:
Am Donnerstag, den 04.08.2005, 08:40 -0400 schrieb Mark Woodward:
> > * Mark Woodward (pgsql@mohawksoft.com) wrote:
> >> I just finished converting and loading the US census data into
> >> PostgreSQL
> >> would anyone be interested in it for testing purposes?
> >>
> >> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
> >
> > How big dumped & compressed?  I may be able to host it depending on how
> > big it ends up being...
> 
> It's been running for about an hour now, and it is up to 3.3G.
> 
> pg_dump tiger | gzip > tiger.pgz
> 
> I'll let you know. Hopefully, it will fit on  DVD.
> 
> You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
> wonder?)
pg_dump -v maybe? ;) *hint hint*

-- 
Tino Wildenhain <tino@wildenhain.de>



Re: US Census database (Tiger 2004FE)

От
Christopher Kings-Lynne
Дата:
> It's been running for about an hour now, and it is up to 3.3G.
> 
> pg_dump tiger | gzip > tiger.pgz

| bzip2 > tiger.sql.bz2 :)

Chris


Re: US Census database (Tiger 2004FE)

От
"Mark Woodward"
Дата:
>> It's been running for about an hour now, and it is up to 3.3G.
>>
>> pg_dump tiger | gzip > tiger.pgz
>
> | bzip2 > tiger.sql.bz2 :)
>

I find bzip2 FAR SLOWER than the gain in compression.


US Census database (Tiger 2004FE) - 4.4G

От
"Mark Woodward"
Дата:
It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.


Re: US Census database (Tiger 2004FE) - 4.4G

От
"Gavin M. Roy"
Дата:
You can send it to me, and ehpg will host it.  I'll send you a  
private email with my info.

Gavin

On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:

> It is 4.4G in space in a gzip package.
>
> I'll mail a DVD to two people who promise to host it for Hackers.
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

Gavin M. Roy
800 Pound Gorilla
gmr@ehpg.net




Re: US Census database (Tiger 2004FE) - 4.4G

От
Ron Mayer
Дата:
Mark Woodward wrote:
> It is 4.4G in space in a gzip package.
> 
> I'll mail a DVD to two people who promise to host it for Hackers.

Would it be easier to release the program you did to do
this conversion?


I use this pretty short (274 line) C program:   http://www.forensiclogic.com/tmp/tgr2sql.c
to convert the raw tiger files
from http://www.census.gov/geo/www/tiger/index.html
into SQL statements that can be loaded by postgresql.

The #define SQL line controls if it makes data
with INSERT statements or for COPY statements.


Re: US Census database (Tiger 2004FE) - 4.4G

От
"Mark Woodward"
Дата:
I thought bout it, but it isn't the best program around, but it does work.
My program also reformats numbers, i.e. long/lat become properly
decimal-ed numerics, zips become integers, etc.

The question is...

Do you download the raw data and convert it into a database, or do you
download the pre-formatted database?

I would say the preformated database is easier to manage. There are
hundreds of individual zips files, in each of those files 10 or so data
files.



> Mark Woodward wrote:
>> It is 4.4G in space in a gzip package.
>>
>> I'll mail a DVD to two people who promise to host it for Hackers.
>
> Would it be easier to release the program you did to do
> this conversion?
>
>
> I use this pretty short (274 line) C program:
>     http://www.forensiclogic.com/tmp/tgr2sql.c
> to convert the raw tiger files
> from http://www.census.gov/geo/www/tiger/index.html
> into SQL statements that can be loaded by postgresql.
>
> The #define SQL line controls if it makes data
> with INSERT statements or for COPY statements.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>



Re: US Census database (Tiger 2004FE) - 4.4G

От
Darcy Buskermolen
Дата:
On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
> You can send it to me, and ehpg will host it.  I'll send you a
> private email with my info.
>
> Gavin
>
> On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
> > It is 4.4G in space in a gzip package.
> >
> > I'll mail a DVD to two people who promise to host it for Hackers.

I'm wondering if this is now available for consumption by the rest of us??  

(ie what's the link) 


-- 
Darcy Buskermolen
Wavefire Technologies Corp.

http://www.wavefire.com
ph: 250.717.0200
fx: 250.763.1759


Re: US Census database (Tiger 2004FE) - 4.4G

От
"Joshua D. Drake"
Дата:
Darcy Buskermolen wrote:
> On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
> 
>>You can send it to me, and ehpg will host it.  I'll send you a
>>private email with my info.
>>
>>Gavin
>>
>>On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
>>
>>>It is 4.4G in space in a gzip package.
>>>
>>>I'll mail a DVD to two people who promise to host it for Hackers.

Command Prompt would be willing to host it.

Sincerely,

Joshua D. Drake


> 
> 
> I'm wondering if this is now available for consumption by the rest of us??  
> 
> (ie what's the link) 
> 
> 


-- 
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/