Discussion: Tables on multiple disk drives

Tables on multiple disk drives

From: Konstantin Tokar
Date:
Hi!
Does PostgreSQL allow creating tables and indices of a single
database on multiple disk drives with the purpose of increasing
performance, as the Oracle database does? If a symbolic link is the
only method, then the next question is: how can it be determined
which file corresponds to which table and index?




Re: Tables on multiple disk drives

From: Richard Huxton
Date:
On Tuesday 17 February 2004 12:54, Konstantin Tokar wrote:
> Hi!
> Does PostgreSQL allow creating tables and indices of a single
> database on multiple disk drives with the purpose of increasing
> performance, as the Oracle database does? If a symbolic link is the
> only method, then the next question is: how can it be determined
> which file corresponds to which table and index?

Yep, symlinks are the way at present (though I think someone is working on
tablespace support). The files are named after the OID of the object they
represent - there is a useful oid2name utility in the contrib/ folder.
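
To make that mapping concrete, here is a minimal sketch of how to find the
file behind a given table; the database name "mydb" and table name
"big_table" are examples only, and the exact oid2name options vary between
releases:

    # Look up the on-disk file name (relfilenode) for a table:
    psql -d mydb -c "SELECT relname, relfilenode FROM pg_class WHERE relname = 'big_table';"
    # Find the database's subdirectory under the data directory:
    psql -d mydb -c "SELECT oid FROM pg_database WHERE datname = 'mydb';"
    # The table's data then lives in $PGDATA/base/<database oid>/<relfilenode>.
    # The contrib utility performs the same mapping:
    oid2name -d mydb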

You might want to check the archives though, and see what RAID setups people
prefer - less trouble to maintain than symlinking.

--
  Richard Huxton
  Archonet Ltd

Re: Tables on multiple disk drives

From: "scott.marlowe"
Date:
On Tue, 17 Feb 2004, Konstantin Tokar wrote:

> Hi!
> Does PostgreSQL allow creating tables and indices of a single
> database on multiple disk drives with the purpose of increasing
> performance, as the Oracle database does? If a symbolic link is the
> only method, then the next question is: how can it be determined
> which file corresponds to which table and index?

Your life will be simpler, and your setup will be faster without having
to muck about with it, if you just buy a good RAID controller with
battery-backed cache.  LSI/MegaRAID and Adaptec both make serviceable
controllers for reasonable prices, and as you add drives, the speed just
goes up, with no muddling around with symlinks.


Re: Tables on multiple disk drives

From: Craig Thomas
Date:
> On Tue, 17 Feb 2004, Konstantin Tokar wrote:
>
>> Hi!
>> Does PostgreSQL allow creating tables and indices of a single
>> database on multiple disk drives with the purpose of increasing
>> performance, as the Oracle database does? If a symbolic link is the
>> only method, then the next question is: how can it be determined
>> which file corresponds to which table and index?
>
> Your life will be simpler, and your setup will be faster without
> having to muck about with it, if you just buy a good RAID controller
> with battery-backed cache.  LSI/MegaRAID and Adaptec both make
> serviceable controllers for reasonable prices, and as you add drives,
> the speed just goes up, with no muddling around with symlinks.

This works to a limited extent.  For very large databases, maximum
I/O throughput is the paramount factor for database performance.  With
RAID controllers, your LUN is still limited to a small number of disks.
PostgreSQL can only write to a file system, but Oracle, SAP DB, DB2, etc.
can write directly to disk (raw I/O).  With large databases it is
advantageous to spread a table across hundreds of disks, if the table is
quite large.  I don't know of any manufacturer that makes a 100-disk
RAID array yet.

Some of the problem can be addressed by using a volume manager (such as
LVM in Linux, or Veritas on Unix-like systems).  This allows one to
create a volume using partitions from many disks.  One can then create
a file system and mount it on the volume.
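
As a rough illustration of that approach, here is a sketch using the Linux
LVM tools; the device names, volume size, and mount point are examples and
would need to match the actual system:

    pvcreate /dev/sdb1 /dev/sdc1               # register the partitions with LVM
    vgcreate pgvg /dev/sdb1 /dev/sdc1          # collect them into a volume group
    lvcreate -i 2 -I 64 -L 100G -n pgvol pgvg  # logical volume striped across both disks
    mkfs -t ext3 /dev/pgvg/pgvol               # create a file system on the volume
    mount /dev/pgvg/pgvol /mnt/pgdata          # mount it; initdb or move the data directory here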

However, to get the best performance, Raw I/O capability is the best
way to go.




Re: Tables on multiple disk drives

From: "scott.marlowe"
Date:
On Tue, 17 Feb 2004, Craig Thomas wrote:

> > On Tue, 17 Feb 2004, Konstantin Tokar wrote:
> >
> >> Hi!
> >> Does PostgreSQL allow creating tables and indices of a single
> >> database on multiple disk drives with the purpose of increasing
> >> performance, as the Oracle database does? If a symbolic link is the
> >> only method, then the next question is: how can it be determined
> >> which file corresponds to which table and index?
> >
> > Your life will be simpler, and your setup will be faster without
> > having to muck about with it, if you just buy a good RAID controller
> > with battery-backed cache.  LSI/MegaRAID and Adaptec both make
> > serviceable controllers for reasonable prices, and as you add drives,
> > the speed just goes up, with no muddling around with symlinks.
>
> This works to a limited extent.  For very large databases, maximum
> I/O throughput is the paramount factor for database performance.  With
> RAID controllers, your LUN is still limited to a small number of disks.
> PostgreSQL can only write to a file system, but Oracle, SAP DB, DB2, etc.
> can write directly to disk (raw I/O).  With large databases it is
> advantageous to spread a table across hundreds of disks, if the table is
> quite large.  I don't know of any manufacturer that makes a 100-disk
> RAID array yet.

You can run up to four LSI / Megaraids in one box, each with 3 UW SCSI
interfaces, and they act as one unit.  That's 3*4*15 = 180 disks max.

With FC AL connections and four cards, it would be possible to approach
1000 drives.

Of course, I'm not sure how fast any RAID card setup is gonna be with that
many drives, but ya never know.  My guess is that before you go there you
buy a big external RAID box built for speed.  We have a couple of 200+
drive external RAID5 storage boxes at work that are quite impressive.

> Some of the problem can be addressed by using a volume manager (such as
> LVM in Linux, or Veritas on Unix-like systems).  This allows one to
> create a volume using partitions from many disks.  One can then create
> a file system and mount it on the volume.

Pretty much RAID arrays in software, which means no battery-backed cache,
which means it'll be fast at reading, but probably pretty slow at writes,
especially if there's a lot of parallel access waiting to write to the
database.

> However, to get the best performance, Raw I/O capability is the best
> way to go.

Unsupported statement made as fact.  I'm not saying it can't or isn't
true, but my experience has been that large RAID5 arrays are a great
compromise between maximum performance and reliability, giving a good
measure of each.  It doesn't take 100 drives to do well, even a dozen to
two dozen will get you in the same basic range as splitting out files by
hand with sym links without all the headache of chasing down the files,
shutting down the database, linking it over etc...



Re: Tables on multiple disk drives

From: Craig Thomas
Date:
> On Tue, 17 Feb 2004, Craig Thomas wrote:
>
>> > On Tue, 17 Feb 2004, Konstantin Tokar wrote:
>> >
>> >> Hi!
>> >> Does PostgreSQL allow creating tables and indices of a single
>> >> database on multiple disk drives with the purpose of increasing
>> >> performance, as the Oracle database does? If a symbolic link is the
>> >> only method, then the next question is: how can it be determined
>> >> which file corresponds to which table and index?
>> >
>> > Your life will be simpler, and your setup will be faster without
>> > having to muck about with it, if you just buy a good RAID
>> > controller with battery-backed cache.  LSI/MegaRAID and Adaptec
>> > both make serviceable controllers for reasonable prices, and as you
>> > add drives, the speed just goes up, with no muddling around with
>> > symlinks.
>>
>> This works to a limited extent.  For very large databases, maximum
>> I/O throughput is the paramount factor for database performance.
>> With RAID controllers, your LUN is still limited to a small number of
>> disks.  PostgreSQL can only write to a file system, but Oracle, SAP DB,
>> DB2, etc. can write directly to disk (raw I/O).  With large databases
>> it is advantageous to spread a table across hundreds of disks, if the
>> table is quite large.  I don't know of any manufacturer that makes a
>> 100-disk RAID array yet.
>
> You can run up to four LSI / Megaraids in one box, each with 3 UW SCSI
> interfaces, and they act as one unit.  That's 3*4*15 = 180 disks max.
>
> With FC AL connections and four cards, it would be possible to approach
> 1000 drives.
>
> Of course, I'm not sure how fast any RAID card setup is gonna be with
> that  many drives, but ya never know.  My guess is that before you go
> there you  buy a big external RAID box built for speed.  We have a
> couple of 200+  drive external RAID5 storage boxes at work that are
> quite impressive.

That's a good point.  But it seems that the databases leading the TPC
numbers are the Oracles of the world.  I know that a former company I
worked for published TPC numbers using Oracle with raw I/O to get the
performance up.

However, it would be interesting for us to conduct a small-scale
test using a couple of HW RAID systems configured so that a single
file system can be mounted, then run the OSDL dbt workloads.  The
results could then be compared with current results that have been
captured.
>
>> Some of the problem can be addressed by using a volume manager (such
>> as LVM in Linux, or Veritas on Unix-like systems).  This allows one to
>> create a volume using partitions from many disks.  One can then create
>> a file system and mount it on the volume.
>
> Pretty much RAID arrays in software, which means no battery-backed
> cache, which means it'll be fast at reading, but probably pretty slow
> at writes, especially if there's a lot of parallel access waiting to
> write to the database.
>
>> However, to get the best performance, Raw I/O capability is the best
>> way to go.
>
> Unsupported statement made as fact.  I'm not saying it can't or isn't
> true, but my experience has been that large RAID5 arrays are a great
> compromise between maximum performance and reliability, giving a good
> measure of each.  It doesn't take 100 drives to do well, even a dozen to
>  two dozen will get you in the same basic range as splitting out files
> by  hand with sym links without all the headache of chasing down the
> files,  shutting down the database, linking it over etc...

Whoops, you're right.  I was typing faster than I was thinking.  I was
assuming a JBOD setup rather than a RAID storage subsystem.  SAN units
such as an EMC or Shark usually have 4-16 GB of cache, and thus the I/Os
go pretty quickly for really large databases.




Re: Tables on multiple disk drives

From: "scott.marlowe"
Date:
On Tue, 17 Feb 2004, Craig Thomas wrote:

> > On Tue, 17 Feb 2004, Craig Thomas wrote:
> >
> >> > On Tue, 17 Feb 2004, Konstantin Tokar wrote:
> >> >
> >> >> Hi!
> >> >> Does PostgreSQL allow creating tables and indices of a single
> >> >> database on multiple disk drives with the purpose of increasing
> >> >> performance, as the Oracle database does? If a symbolic link is the
> >> >> only method, then the next question is: how can it be determined
> >> >> which file corresponds to which table and index?
> >> >
> >> > Your life will be simpler, and your setup will be faster without
> >> > having to muck about with it, if you just buy a good RAID
> >> > controller with battery-backed cache.  LSI/MegaRAID and Adaptec
> >> > both make serviceable controllers for reasonable prices, and as
> >> > you add drives, the speed just goes up, with no muddling around
> >> > with symlinks.
> >>
> >> This works to a limited extent.  For very large databases, maximum
> >> I/O throughput is the paramount factor for database performance.
> >> With RAID controllers, your LUN is still limited to a small number of
> >> disks.  PostgreSQL can only write to a file system, but Oracle, SAP DB,
> >> DB2, etc. can write directly to disk (raw I/O).  With large databases
> >> it is advantageous to spread a table across hundreds of disks, if the
> >> table is quite large.  I don't know of any manufacturer that makes a
> >> 100-disk RAID array yet.
> >
> > You can run up to four LSI / Megaraids in one box, each with 3 UW SCSI
> > interfaces, and they act as one unit.  That's 3*4*15 = 180 disks max.
> >
> > With FC AL connections and four cards, it would be possible to approach
> > 1000 drives.
> >
> > Of course, I'm not sure how fast any RAID card setup is gonna be with
> > that  many drives, but ya never know.  My guess is that before you go
> > there you  buy a big external RAID box built for speed.  We have a
> > couple of 200+  drive external RAID5 storage boxes at work that are
> > quite impressive.
>
> That's a good point.  But it seems that the databases leading the TPC
> numbers are the Oracles of the world.  I know that a former company I
> worked for published TPC numbers using Oracle with raw I/O to get the
> performance up.

But keep in mind that in the TPC benchmarks, doing things that require
lots of DBA work doesn't tend to make the cost in the test go up (you can
hide a lot of admin work in those things), while in real life it does
drive up the real cost of maintenance.

I'd imagine that with PostgreSQL coming along nicely, it may well be that
in a year or two, in the real world, you can just take the money you'd
have spent on Oracle licenses and Oracle DBAs and just throw more drives
at a problem to solve it.

And still spend less money than you would on Oracle.  :-)


Re: Tables on multiple disk drives

From: Christopher Browne
Date:
lists2@tokar.ru (Konstantin Tokar) wrote:
> Hi!
> Does PostgreSQL allow creating tables and indices of a single
> database on multiple disk drives with the purpose of increasing
> performance, as the Oracle database does? If a symbolic link is the
> only method, then the next question is: how can it be determined
> which file corresponds to which table and index?

It is possible to do this, albeit not trivially, by shutting
down the database, moving the index to another filesystem, and using a
symbolic link to connect it back in.  The system table pg_class
contains the relevant linkages.
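
A minimal sketch of that procedure follows; the database name, relation
name, OID-based file names, and target path are all examples:

    # Find the file name (relfilenode) of the relation to be moved:
    psql -d mydb -c "SELECT relfilenode FROM pg_class WHERE relname = 'big_index';"
    pg_ctl -D $PGDATA stop                            # shut the database down first
    mv $PGDATA/base/16384/17001 /disk2/pg/17001       # move the file to the other drive
    ln -s /disk2/pg/17001 $PGDATA/base/16384/17001    # symlink it back into place
    pg_ctl -D $PGDATA start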

But it seems likely to me that using a smart RAID controller (e.g.,
LSI Logic MegaRAID) to link a whole lot of disks together to generate
one enormous striped filesystem would be a more effective strategy in
the long run.

Doing that, with a substantial array of disk drives, allows your disk
subsystem to provide an analogous sort of performance increase without
there being any need for a DBA to fiddle around with anything.

If you have the DBA do the work, this means consuming some
not-insubstantial amount of time for analysis as well as down-time for
maintenance.  And it will be necessary to have a somewhat-fragile
"registry" of configuration information indicating what customizations
were done.

In contrast, throwing a smarter RAID controller at the problem costs
only a few hundred dollars, and requires little or no analysis effort.
And the RAID controller will affect _all_ cases where there could be
I/O benefits from striping tables across multiple drives.
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://cbbrowne.com/info/x.html
The way to a man's heart is through the left ventricle.

Re: Tables on multiple disk drives

From: Josh Berkus
Date:
Konstantin,

> > >> Does PostgreSQL allow creating tables and indices of a single
> > >> database on multiple disk drives with the purpose of increasing
> > >> performance, as the Oracle database does? If a symbolic link is the
> > >> only method, then the next question is: how can it be determined
> > >> which file corresponds to which table and index?

Howdy!   I bet you're a bit taken aback by the discussion that ensued, and
even more confused than before.

You are actually asking about two related features:

Tablespaces, which allow designating different directories/volumes for
specific tables and indexes at creation time; and

Partitioned tables, which allow dividing large tables and/or indexes
horizontally along pre-defined criteria.

The first, tablespaces, is under development and may make it into 7.5, or
maybe not, but will certainly be in the version after that.

The second, partitioned tables, is NOT under development because this feature
lacks both a programmer and a clear specification.
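
For reference, the tablespace DDL that eventually shipped in PostgreSQL 8.0
looks roughly like the following; it was not yet available when this thread
was written, and the tablespace name, table, and path are examples only:

    psql -d mydb -c "CREATE TABLESPACE fastdisk LOCATION '/mnt/disk2/pgdata';"
    psql -d mydb -c "CREATE TABLE big_table (id int, payload text) TABLESPACE fastdisk;"
    psql -d mydb -c "CREATE INDEX big_table_id_idx ON big_table (id) TABLESPACE fastdisk;"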

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco