Discussion: Postgres and really huge tables


Postgres and really huge tables

From: Brian Hurt
Date:
Is there any experience with Postgresql and really huge tables?  I'm
talking about terabytes (plural) here in a single table.  Obviously the
table will be partitioned, and probably spread among several different
file systems.  Any other tricks I should know about?

We have a problem of that form here.  When I asked why postgres wasn't
being used, the opinion that postgres would "just <expletive> die" was
given.  Personally, I'd bet money postgres could handle the problem (and
better than the ad-hoc solution we're currently using).  But I'd like a
couple of replies of the form "yeah, we do that here, no problem" to
wave around.

Brian



Re: Postgres and really huge tables

From: "Joshua D. Drake"
Date:
Brian Hurt wrote:
> Is there any experience with Postgresql and really huge tables?  I'm
> talking about terabytes (plural) here in a single table.  Obviously the
> table will be partitioned, and probably spread among several different
> file systems.  Any other tricks I should know about?
>
> We have a problem of that form here.  When I asked why postgres wasn't
> being used, the opinion that postgres would "just <expletive> die" was
> given.  Personally, I'd bet money postgres could handle the problem (and
> better than the ad-hoc solution we're currently using).  But I'd like a
> couple of replies of the form "yeah, we do that here, no problem" to
> wave around.

It entirely depends on the machine and how things are accessed. In
theory you could have a multi-terabyte table, but my question of course
is: why in the world would you do that? That is what partitioning is for.

Regardless, appropriate use of things like partial indexes should make
it possible.
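
For example, something along these lines (table and column names are just
made up for illustration):

    -- A partial index only covers the rows your queries actually touch,
    -- which keeps the index small even when the table itself is huge.
    CREATE INDEX orders_open_idx ON orders (customer_id)
        WHERE status = 'open';

    -- Inheritance-based partitioning (the approach available in 8.1/8.2):
    -- each child table carries a CHECK constraint so the planner can
    -- skip partitions that cannot match a query.
    SET constraint_exclusion = on;
    CREATE TABLE orders_2007_01 (
        CHECK (order_date >= DATE '2007-01-01'
           AND order_date <  DATE '2007-02-01')
    ) INHERITS (orders);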

Joshua D. Drake


>
> Brian
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: Postgres and really huge tables

From: Chris Mair
Date:
> Is there any experience with Postgresql and really huge tables?  I'm
> talking about terabytes (plural) here in a single table.  Obviously the
> table will be partitioned, and probably spread among several different
> file systems.  Any other tricks I should know about?
>
> We have a problem of that form here.  When I asked why postgres wasn't
> being used, the opinion that postgres would "just <expletive> die" was
> given.  Personally, I'd bet money postgres could handle the problem (and
> better than the ad-hoc solution we're currently using).  But I'd like a
> couple of replies of the form "yeah, we do that here, no problem" to
> wave around.

I've done a project using 8.1 on Solaris that had a table that was
close to 2 TB.  The funny thing is that it just worked fine, even without
partitioning.

But then again, the size of a single record was huge too: ~50 KB.
So there were not insanely many records: "just" something
on the order of tens of millions.

The queries were done just on some int fields, so the index for the
whole thing fit into RAM.
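
If you want to sanity-check whether an index fits in RAM, on 8.1 and later
something like this works (the index name here is just an example):

    -- On-disk size of a single index; compare it against available memory.
    SELECT pg_size_pretty(pg_relation_size('my_big_table_id_idx'));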

A lot of data, but not a lot of records... I don't know if that's
valid. I guess the people at Greenplum and/or Sun have more exciting
stories ;)


Bye, Chris.




Re: [PERFORM] Postgres and really huge tables

From: Tom Lane
Date:
Brian Hurt <bhurt@janestcapital.com> writes:
> Is there any experience with Postgresql and really huge tables?  I'm
> talking about terabytes (plural) here in a single table.

The 2MASS sky survey point-source catalog
http://www.ipac.caltech.edu/2mass/releases/allsky/doc/sec2_2a.html
is 470 million rows by 60 columns; I don't have it loaded up but
a very conservative estimate would be a quarter terabyte.  (I've
got a copy of the data ... 5 double-sided DVDs, gzipped ...)
I haven't heard from Rae Stiening recently but I know he's been using
Postgres to whack that data around since about 2001 (PG 7.1 or so,
which is positively medieval compared to current releases).  So at
least for static data, it's certainly possible to get useful results.
What are your processing requirements?

            regards, tom lane

Re: Postgres and really huge tables

From: "Luke Lonergan"
Date:
Chris,

On 1/18/07 1:42 PM, "Chris Mair" <chris@1006.org> wrote:

> A lot of data, but not a lot of records... I don't know if that's
> valid. I guess the people at Greenplum and/or Sun have more exciting
> stories ;)

You guess correctly :-)

Given that we're Postgres 8.2 (etc.) compatible, that might answer Brian's
coworker's question.  Soon we will be able to show that Greenplum/Postgres
is handling the world's largest databases, both in record count and size.

While the parallel scaling technology we employ is closed source, we are
still contributing scaling technology to the community (partitioning, bitmap
indexes, sort improvements, resource management, and more to come), so
Postgres as a "bet" is likely safer and better than a completely closed
source commercial product.

- Luke



Re: [PERFORM] Postgres and really huge tables

From: Oleg Bartunov
Date:
On Thu, 18 Jan 2007, Tom Lane wrote:

> Brian Hurt <bhurt@janestcapital.com> writes:
>> Is there any experience with Postgresql and really huge tables?  I'm
>> talking about terabytes (plural) here in a single table.
>
> The 2MASS sky survey point-source catalog
> http://www.ipac.caltech.edu/2mass/releases/allsky/doc/sec2_2a.html
> is 470 million rows by 60 columns; I don't have it loaded up but
> a very conservative estimate would be a quarter terabyte.  (I've
> got a copy of the data ... 5 double-sided DVDs, gzipped ...)
> I haven't heard from Rae Stiening recently but I know he's been using
> Postgres to whack that data around since about 2001 (PG 7.1 or so,
> which is positively medieval compared to current releases).  So at
> least for static data, it's certainly possible to get useful results.
> What are your processing requirements?

We are working in production with 2MASS and other catalogues, and
2MASS is not the biggest: the NOMAD catalog has more than a billion records.
You can query them online at
http://vo.astronet.ru/cas/conesearch.php
Everything is in PostgreSQL 8.1.5; at present we are migrating to 8.2.1,
which is going very slowly because COPY is slow.
The hardware we use is an HP rx1620, dual Itanium2, MSA 20, currently
4.5 TB.



>
>             regards, tom lane

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Postgres and really huge tables

From: Josh Berkus
Date:
> A lot of data, but not a lot of records... I don't know if that's
> valid. I guess the people at Greenplum and/or Sun have more exciting
> stories ;)

Not really.  Multi-terabyte tables are pretty much fine on vanilla
PostgreSQL if you can stick to partitioned and/or indexed access.  If you
need to do unindexed fishing expeditions on 5 TB of data, then talk to
Greenplum.

http://www.powerpostgresql.com/Downloads/terabytes_osc2005.pdf

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: [PERFORM] Postgres and really huge tables

From: Scott Marlowe
Date:
On Thu, 2007-01-18 at 14:31, Brian Hurt wrote:
> Is there any experience with Postgresql and really huge tables?  I'm
> talking about terabytes (plural) here in a single table.  Obviously the
> table will be partitioned, and probably spread among several different
> file systems.  Any other tricks I should know about?
>
> We have a problem of that form here.  When I asked why postgres wasn't
> being used, the opinion that postgres would "just <expletive> die" was
> given.  Personally, I'd bet money postgres could handle the problem (and
> better than the ad-hoc solution we're currently using).  But I'd like a
> couple of replies of the form "yeah, we do that here, no problem" to
> wave around.

It really depends on what you're doing.

Are you updating every row by a single user every hour, or are you
updating dozens of rows by hundreds of users at the same time?

PostgreSQL probably wouldn't die, but it may well be that for certain
batch processing operations it's a poorer choice than awk/sed or perl.

If you do want to tackle it with PostgreSQL, you'll likely want to build
a truly fast drive subsystem.  Something like dozens to hundreds of
drives in a RAID-10 setup with battery-backed cache, and a main server
with lots of memory on board.
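
As a very rough illustration, the postgresql.conf knobs you'd be sizing for
that kind of box look something like this (8.2-style unit syntax; the numbers
are placeholders, not recommendations):

    shared_buffers = 2GB           # a healthy slice of RAM for the buffer cache
    work_mem = 64MB                # per-sort/hash memory; watch concurrent users
    maintenance_work_mem = 1GB     # speeds up index builds and vacuum
    effective_cache_size = 24GB    # tell the planner how much the OS will cache
    checkpoint_segments = 64       # spread out checkpoint I/O on a busy system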

But, really, it depends on what you're doing to the data.

Re: [PERFORM] Postgres and really huge tables

From: "Merlin Moncure"
Date:
On 1/18/07, Brian Hurt <bhurt@janestcapital.com> wrote:
> Is there any experience with Postgresql and really huge tables?  I'm
> talking about terabytes (plural) here in a single table.  Obviously the
> table will be partitioned, and probably spread among several different
> file systems.  Any other tricks I should know about?

A pretty effective partitioning strategy that works in some cases is
to identify a criterion in your dataset that isolates your data on a
session basis.  For example, if you have a company_id that divides up
your company data and a session only needs to deal with one company_id,
you can separate out all your tables based on company_id into
different schemas and have the session set the search_path variable
when it logs in.  Data that does not partition on your criterion sits
in public schemas that all the companies can see.
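
A bare-bones sketch of what I mean (names invented for illustration):

    -- one schema per company, each with its own copy of the per-company tables
    CREATE SCHEMA company_1001;
    CREATE TABLE company_1001.orders (order_id int PRIMARY KEY, total numeric);

    CREATE SCHEMA company_1002;
    CREATE TABLE company_1002.orders (order_id int PRIMARY KEY, total numeric);

    -- shared reference data that every company can see stays in public
    CREATE TABLE public.products (product_id int PRIMARY KEY, name text);

    -- at login, the session points itself at its own company's schema
    SET search_path TO company_1001, public;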

This takes advantage of a special property of stored procedures:
they do not bind to tables until the first time they are
executed in a session, which keeps you from having to make a function
for each schema.  (Note: views do not have this property.)  You can
still cross-query using views and the like, or hand-rolled SQL.
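
For example, a single plpgsql function like this (hypothetical) resolves
"orders" against whatever schema is on the session's search_path the first
time it runs, so one copy serves every company schema:

    CREATE FUNCTION order_count() RETURNS bigint AS $$
    BEGIN
        -- "orders" is looked up via search_path on first execution in a session
        RETURN (SELECT count(*) FROM orders);
    END;
    $$ LANGUAGE plpgsql;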

I would call this type of partitioning logical partitioning, since you
are leveraging logical divisions in your data.  It obviously doesn't
work in all cases, but when it does it works great.

> We have a problem of that form here.  When I asked why postgres wasn't
> being used, the opinion that postgres would "just <expletive> die" was
> given.  Personally, I'd bet money postgres could handle the problem (and
> better than the ad-hoc solution we're currently using).  But I'd like a
> couple of replies of the form "yeah, we do that here, no problem" to
> wave around.

pg will of course not die when your dataset hits a certain
threshold.  It will become slower, following well-known mathematical
patterns that grow with your working set size.  One of the few things
that gets to be a pain with large tables is vacuum: since you can't
vacuum a piece of a table, and there are certain annoyances with having
a long-running vacuum, this is something to think about.
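
This is one place where splitting the data up pays off: with separate child
tables you can vacuum just the piece that actually churns, e.g. (table names
invented):

    -- vacuum only the partition that receives updates...
    VACUUM ANALYZE orders_current;
    -- ...and leave the large, static historical partitions alone.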

Speaking broadly about table partitioning, it optimizes one case at
the expense of another.  Your focus (IMO) should be on reducing your
working set size under certain conditions -- not the physical file
size.  If you have a properly laid out and logical dataset and can
identify special cases where you need some information and not other
information, the partitioning strategy should fall into place, whether
it is to do nothing, isolate data into separate schemas/tables/files,
or use the built-in table partitioning feature (which, to be honest, I
am not crazy about).

merlin