Thread: Hadoop backend?


Hadoop backend?

From:
Paul Sheer
Date:
Hadoop backend for PostgreSQL...

A problem that my client has, and one that I come across often,
is that a database seems to always be associated with a particular
physical machine, a physical machine that has to be upgraded,
replaced, or otherwise maintained.

Even if the database is replicated, it just means there are two or
more machines. Replication is also a difficult thing to properly
manage.

With a distributed data store, the data would become a logical
object - adding or removing machines would not affect the data.
This is an ideal that would remove a tremendous maintenance
burden from many sites -- well, at least the ones I have worked
at, as far as I can see.

Does anyone know of plans to implement PostgreSQL over Hadoop?

Yahoo seems to be doing this:     http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html

But they store tables column-wise for their particular performance needs.
If one is doing a lot of inserts, I don't think that is the most efficient layout, is it?

Has Yahoo put the source code for their work online?

Many thanks for any pointers.

-paul


Re: Hadoop backend?

From:
pi song
Date:
1) The Hadoop file system is heavily optimized for mostly-read workloads.
2) As of a few months ago, HDFS didn't support appending to files.

There might be a bit of impedance mismatch in making the two work together.

However, I think it would be a very good initiative to come up with ideas for running postgres on a distributed file system (it doesn't have to be Hadoop specifically).

Pi Song


Re: Hadoop backend?

From:
Hans-Jürgen Schönig
Date:
hi ...

i think the easiest way to do this is to simply add a mechanism which allows a function to "stream" data through.
it would basically mean losing join support, as you cannot "read the data again" in a way which is good enough for joining against the function providing the data from hadoop.

hannu (i think) brought up a similar concept some time ago.

i think a straightforward implementation would not be too hard.
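
just to sketch the shape of it (illustrative only - the function and its
behavior are made up, the SRF_* machinery is the standard funcapi.h
value-per-call pattern; a real version would fetch the next row from
hadoop instead of counting integers):

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(stream_rows);

    /* sketch: emits the integers 0..9 one call at a time; the per-call
     * body is where a row would be pulled from the external store */
    Datum
    stream_rows(PG_FUNCTION_ARGS)
    {
        FuncCallContext *funcctx;

        if (SRF_IS_FIRSTCALL())
        {
            MemoryContext oldcontext;

            funcctx = SRF_FIRSTCALL_INIT();
            oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
            funcctx->max_calls = 10;   /* stand-in for "rows in the external store" */
            MemoryContextSwitchTo(oldcontext);
        }

        funcctx = SRF_PERCALL_SETUP();

        if (funcctx->call_cntr < funcctx->max_calls)
            SRF_RETURN_NEXT(funcctx, Int32GetDatum((int32) funcctx->call_cntr));
        else
            SRF_RETURN_DONE(funcctx);
    }

the missing piece would be a way to tell the planner that such a function
can only be read once (hence no rescans, no joins against it).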

best regards,

hans






--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt

Re: Hadoop backend?

From:
Robert Haas
Date:
On Sat, Feb 21, 2009 at 9:37 PM, pi song <pi.songs@gmail.com> wrote:
> 1) The Hadoop file system is heavily optimized for mostly-read workloads.
> 2) As of a few months ago, HDFS didn't support appending to files.
> There might be a bit of impedance mismatch in making the two work together.
> However, I think it would be a very good initiative to come up with ideas
> for running postgres on a distributed file system (it doesn't have to be
> Hadoop specifically).

In theory, I think you could make postgres work on any type of
underlying storage you like by writing a second smgr implementation
that would exist alongside md.c.  The fly in the ointment is that
you'd need a more sophisticated implementation of this line of code,
from smgropen:
   reln->smgr_which = 0;   /* we only have md.c at present */

Logically, it seems like the choice of smgr should track with the
notion of a tablespace.  IOW, you might want to have one tablespace that is
stored on a magnetic disk (md.c) and another that is stored on your
hypothetical distributed filesystem (hypodfs.c).  I'm not sure how
hard this would be to implement, but I don't think smgropen() is in a
position to do syscache lookups, so probably not that easy.
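
Just to illustrate the shape of such a change (a sketch only: the real
f_smgr struct in src/backend/storage/smgr/smgr.c has more callbacks and
different signatures, and every disk_* / hypodfs_* name below is made up):

    /* sketch of a per-implementation callback table, not the real smgr API */
    typedef struct f_smgr
    {
        void    (*smgr_init) (void);
        void    (*smgr_read) (void *reln, unsigned blocknum, char *buffer);
        void    (*smgr_write) (void *reln, unsigned blocknum, const char *buffer);
    } f_smgr;

    /* hypothetical implementations; disk_* stands in for the md.c routines */
    extern void disk_init(void);
    extern void disk_read(void *reln, unsigned blocknum, char *buffer);
    extern void disk_write(void *reln, unsigned blocknum, const char *buffer);
    extern void hypodfs_init(void);
    extern void hypodfs_read(void *reln, unsigned blocknum, char *buffer);
    extern void hypodfs_write(void *reln, unsigned blocknum, const char *buffer);

    static const f_smgr smgrsw[] = {
        {disk_init, disk_read, disk_write},          /* 0: magnetic disk (md.c) */
        {hypodfs_init, hypodfs_read, hypodfs_write}  /* 1: distributed FS (hypodfs.c) */
    };

    /* smgropen() would then pick the index per relation, e.g. from its
     * tablespace, instead of hard-coding zero:
     *     reln->smgr_which = tablespace_is_distributed(spcOid) ? 1 : 0;
     * and that lookup is exactly where the syscache problem above bites. */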

...Robert


Re: Hadoop backend?

From:
pi song
Date:
One more problem is that data placement on HDFS is decided internally by HDFS, meaning you have no explicit control over it. Thus, you cannot place two sets of data that are likely to be joined together on the same node, which means uncontrollable latency during query processing.

Pi Song


Re: Hadoop backend?

From:
Robert Haas
Date:
On Sun, Feb 22, 2009 at 5:18 PM, pi song <pi.songs@gmail.com> wrote:
> One more problem is that data placement on HDFS is decided internally by HDFS,
> meaning you have no explicit control over it. Thus, you cannot place two sets of
> data that are likely to be joined together on the same node, which means
> uncontrollable latency during query processing.
> Pi Song

It would only be possible to have the actual PostgreSQL backends
running on a single node anyway, because they use shared memory to
hold lock tables and things.  The advantage of a distributed file
system would be that you could access more storage (and more system
buffer cache) than would be possible on a single system (or perhaps
the same amount but at less cost).  Assuming some sort of
per-tablespace control over the storage manager, you could put your
most frequently accessed data locally and the less frequently accessed
data into the DFS.

But you'd still have to pull all the data back to the master node to
do anything with it.  Being able to actually distribute the
computation would be a much harder problem.  Currently, we don't even
have the ability to bring multiple CPUs to bear on (for example) a
large sequential scan (even though all the data is on a single node).

...Robert


Re: Hadoop backend?

From:
pi song
Date:


I think the point that you can access more system cache is right, but that doesn't mean it will be more efficient than reading from your local disk. Take Hadoop, for example: your request for file content first has to go to the NameNode (the file chunk indexing service), and then you ask the DataNode, which provides the data. Assuming you're working on a large dataset, the probability of the chunk you need already being in system cache is very low, so most of the time you end up reading from a remote disk.

I've got a better idea. How about we make the buffer pool multi-level? The first level is the current one. The second level represents memory from remote machines. Things that are used less often should stay on the second level. Has anyone ever thought about something like this before?
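
A purely hypothetical sketch of the lookup order I have in mind (none of
these helpers exist in Postgres; it only shows a second level sitting
between local shared buffers and disk):

    #include <stddef.h>

    /* hypothetical page identifier */
    typedef struct PageTag { unsigned rel; unsigned blockno; } PageTag;

    /* stubs standing in for the three levels: a real implementation would
     * look up local shared buffers, ask a remote node for the page, or
     * finally read the block from disk */
    static char *local_pool_lookup(PageTag tag)  { (void) tag; return NULL; }
    static char *remote_pool_lookup(PageTag tag) { (void) tag; return NULL; }
    static char *disk_read_page(PageTag tag)     { (void) tag; static char page[8192]; return page; }

    /* level 1: local buffer pool, level 2: remote machines' memory, then disk */
    static char *
    fetch_page(PageTag tag)
    {
        char *page = local_pool_lookup(tag);
        if (page == NULL)
            page = remote_pool_lookup(tag);
        if (page == NULL)
            page = disk_read_page(tag);
        return page;
    }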

Pi Song



Re: Hadoop backend?

From:
Paul Sheer
Date:
> It would only be possible to have the actual PostgreSQL backends
> running on a single node anyway, because they use shared memory to

This is not a problem: Performance is a secondary consideration (at least
as far as the problem I was referring to).

The primary usefulness is to have the data be a logical entity rather
than a physical entity so that one can maintain physical machines
without having to worry too much about where-is-the-data.

At the moment, most databases suffer from the problem of occasionally
having to move the data from one place to another. This is a major
nightmare that happens once every few years for most DBAs.
It happens because a system needs a soft/hard upgrade, or a disk
enlarged, or because a piece of hardware fails.

I have also found it's no use having RAID or ZFS. Each of these ties
the data to an OS installation. If the OS needs to be reinstalled, all
the data has to be manually moved in a way that is, well... dangerous.

If there is only one machine running postgres that is fine too: I can have
a second identical machine on standby in case of a hardware failure.
That means a short amount of downtime - most people can live
with that.

I read somewhere that replication was one of the goals of postgres's
coming development efforts. Personally I think hadoop might be
a better solution - *shrug*.

Thoughts/comments ??

-paul

Re: Hadoop backend?

From:
Robert Haas
Date:
On Mon, Feb 23, 2009 at 9:08 AM, Paul Sheer <paulsheer@gmail.com> wrote:
>> It would only be possible to have the actual PostgreSQL backends
>> running on a single node anyway, because they use shared memory to
>
> This is not a problem: Performance is a secondary consideration (at least
> as far as the problem I was referring to).
>
> The primary usefulness is to have the data be a logical entity rather
> than a physical entity so that one can maintain physical machines
> without having to worry too much about where-is-the-data.
>
> At the moment, most databases suffer from the problem of occasionally
> having to move the data from one place to another. This is a major
> nightmare that happens once every few years for most DBAs.
> It happens because a system needs a soft/hard upgrade, or a disk
> enlarged, or because a piece of hardware fails.
>
> I have also found it's no use having RAID or ZFS. Each of these ties
> the data to an OS installation. If the OS needs to be reinstalled, all
> the data has to be manually moved in a way that is, well... dangerous.
>
> If there is only one machine running postgres that is fine too: I can have
> a second identical machine on standby in case of a hardware failure.
> That means a short amount of downtime - most people can live
> with that.

I think the performance aspect of things has to be considered.  If the
system introduces too much overhead it won't be useful in practice.
But apart from that I agree with all of this.

> I read somewhere that replication was one of the goals of postgres's
> coming development efforts. Personally I think hadoop might be
> a better solution - *shrug*.
>
> Thoughts/comments ??

I think the two are solving different problems.

...Robert


Re: Hadoop backend?

From:
Andrew Chernow
Date:
Paul Sheer wrote:
> I have also found it's no use having RAID or ZFS. Each of these ties
> the data to an OS installation. If the OS needs to be reinstalled, all
> the data has to be manually moved in a way that is, well... dangerous.
> 
> 

How about network storage, fiber-attached?  If you move the db you only
need to redirect the LUN(s) to a new WWN.

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


Re: Hadoop backend?

From:
Markus Wanner
Date:
Hi,

Paul Sheer wrote:
> This is not a problem: Performance is a secondary consideration (at least
> as far as the problem I was referring to).

Well, if you don't mind your database running .. ehm.. creeping several
orders of magnitude slower, you might also be interested in
Single-System Image Clustering Systems [1], like Beowulf, Kerrighed [2],
OpenSSI [3], etc..  Besides distributed filesystems, those also provide
transparent shared memory across nodes.

> The primary usefulness is to have the data be a logical entity rather
> than a physical entity so that one can maintain physical machines
> without having to worry too much about where-is-the-data.

There are lots of solutions offering that already. In what way should
Hadoop be better than any of those existing ones? For a slightly
different example, you can get equivalent functionality on the block
devices layer with DRBD [4], which is in successful use for Postgres as
well.

The main challenge with distributed filesystems remains reliable failure
detection and ensuring that only exactly one node is alive at any time.

> At the moment, most databases suffer from the problem of occasionally
> having to move the data from one place to another. This is a major
> nightmare that happens once every few years for most DBAs.
> It happens because a system needs a soft/hard upgrade, or a disk
> enlarged, or because a piece of hardware fails.

You are comparing to standalone nodes here, which doesn't make much
sense, IMO.

> I have also found it's no use having RAID or ZFS. Each of these ties
> the data to an OS installation. If the OS needs to be reinstalled, all
> the data has to be manually moved in a way that is, well... dangerous.

I'm thinking more of Lustre, GFS, OCFS, AFS or some such. Compare with
those!

> If there is only one machine running postgres that is fine too: I can have
> a second identical machine on standby in case of a hardware failure.
> That means a short amount of downtime - most people can live
> with that.

What most people have trouble with is a master that revives and suddenly
confuses the new master (old slave).

> I read somewhere that replication was one of the goals of postgres's
> coming development efforts. Personally I think hadoop might be
> a better solution - *shrug*.

I'm not convinced at all. The trouble is not the replication of the data
on disk, it's rather the data in shared memory which poses the hard
problems (locks, caches, etc.). The former is solved already, the latter
is a tad harder to solve. See [5] for my approach (showing my bias).

What I do find interesting about Hadoop is the MapReduce approach, but
lots more than writing another "storage backend" is required, if you
want to make use of that for Postgres. Greenplum claims to have
implemented MapReduce for their Database [6], however, to me it looks
like that is working a couple of layers above the filesystem.

Regards

Markus Wanner


[1]: Wikipedia: Single-System Image Clustering
http://en.wikipedia.org/wiki/Single-system_image

[2]: http://www.kerrighed.org/

[3]: http://www.openssi.org/

[4]: http://www.drbd.org/

[5]: Postgres-R:
http://www.postgres-r.org/

[6]: Greenplum MapReduce
http://www.greenplum.com/resources/mapreduce/


Re: Hadoop backend?

From:
"Jonah H. Harris"
Date:
On Sun, Feb 22, 2009 at 3:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
In theory, I think you could make postgres work on any type of
underlying storage you like by writing a second smgr implementation
that would exist alongside md.c.  The fly in the ointment is that
you'd need a more sophisticated implementation of this line of code,
from smgropen:

   reln->smgr_which = 0;   /* we only have md.c at present */

I believe there is more than that which would need to be done nowadays.  I seem to recall that the storage manager abstraction has slowly been dedicated/optimized for md over the past 6 years or so.  It may even be easier or preferable to write a Hadoop-specific access method, depending on what you're looking for from Hadoop.

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: Hadoop backend?

From:
Tom Lane
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> I believe there is more than that which would need to be done nowadays.  I
> seem to recall that the storage manager abstraction has slowly been
> dedicated/optimized for md over the past 6 years or so.

As far as I can tell, the PG storage manager API is at the wrong level
of abstraction for pretty much everything.  These days, everything we do
is atop the Unix filesystem API, and anything that smgr might have been
able to do for us is getting handled in kernel filesystem code or device
drivers.  (Back in the eighties, when it was more plausible for PG to do
direct device access, maybe smgr was good for something; but no more.)

It's interesting to speculate about where we could draw an abstraction
boundary that would be more useful.  I don't think the MySQL guys got it
right either...
        regards, tom lane


Re: Hadoop backend?

From:
pi song
Date:
> I believe there is more than that which would need to be done nowadays.  I seem to recall that the storage manager
> abstraction has slowly been dedicated/optimized for md over the past 6 years or so.  It may even be easier or preferable
> to write a Hadoop-specific access method, depending on what you're looking for from Hadoop.

I think you're very right. What Postgres needs is access method abstraction. One should be able to plug in an access method for SSDs or network file systems where appropriate. I'm not talking about the MapReduce part of Hadoop because I think that's a different story. What you need for MapReduce is 1) a data store which feeds you data, and 2) MapReduce itself, which does the query processing. That has nothing in common with the Postgres query processor. If you just want data from Postgres, it should be easier to build a Postgres data feeder for Hadoop (which might even already exist).

Pi Song


Re: Hadoop backend?

From:
Paul Sheer
Date:
> As far as I can tell, the PG storage manager API is at the wrong level
> of abstraction for pretty much everything.  These days, everything we do
> is atop the Unix filesystem API, and anything that smgr might have been

Is there a complete list of filesystem API calls somewhere that I can get my head around?
At least this will give me an idea of the scope of the effort.

> it should be easier to build a Postgres data feeder for Hadoop (which might even already exist).

do you know of a url?

-paul

Re: Hadoop backend?

From:
Hans-Jürgen Schönig
Date:
why not just stream it in via set-returning functions and make sure that we can mark a set-returning function as "STREAMABLE" or so (to prevent joins, whatever)?
it is the easiest way to get it right and it helps in many other cases.
i think that the storage manager is definitely the wrong place to do this.

it also makes it easy to use more than just one backend if you get the interface code right.

regards,

hans





--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt

Re: Hadoop backend?

From:
Peter Eisentraut
Date:
Tom Lane wrote:
> It's interesting to speculate about where we could draw an abstraction
> boundary that would be more useful.  I don't think the MySQL guys got it
> right either...

The supposed smgr abstraction of PostgreSQL, which tells more or less 
how to get a byte to the disk, is quite far away from what MySQL calls a 
storage engine, which has things like open table, scan table, drop table 
on a logical level (meaning open table, not open heap).

To my judgement, neither of these approaches is terribly useful from a 
current, practical point of view.

In any case, in order to solve the "where to abstract" question, you'd 
probably want to have one or two other storage APIs that you seriously 
want to integrate, and then you can analyze how to unify them.


Re: Hadoop backend?

From:
Josh Berkus
Date:
> With a distributed data store, the data would become a logical
> object - adding or removing machines would not affect the data.
> This is an ideal that would remove a tremendous maintenance
> burden from many sites -- well, at least the ones I have worked
> at, as far as I can see.

Two things:

1) Hadoop is the wrong technology.  It's not designed to support 
transactional operations.

2) Transactional operations are, in general, your Big Obstacle for doing 
anything in the way of a distributed storage manager.

It's possible you could make both of the above "go away" if you were 
planning for a DW platform in which transactions weren't important. 
However, that would have to become an incompatible fork of PostgreSQL.

AFAIK, the Yahoo platform does not involve Hadoop at all.

--Josh



Re: Hadoop backend?

From:
Ron Mayer
Date:
Paul Sheer wrote:
> Hadoop backend for PostgreSQL...

Resurrecting an old thread, it seems some guys at Yale implemented
something very similar to what this thread was discussing.

http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html
> It's an open source stack that includes PostgreSQL, Hadoop, and Hive, along
> with some glue between PostgreSQL and Hadoop, a catalog, a data loader, and
> an interface that accepts queries in MapReduce or SQL and generates query
> plans that are processed partly in Hadoop and partly in different PostgreSQL
> instances spread across many nodes in a shared-nothing cluster of machines.

Their detailed paper is here:
 http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf

According to the paper, it scales very well.


> A problem that my client has, and one that I come across often,
> is that a database seems to always be associated with a particular
> physical machine, a physical machine that has to be upgraded,
> replaced, or otherwise maintained.
> 
> Even if the database is replicated, it just means there are two or
> more machines. Replication is also a difficult thing to properly
> manage.
> 
> With a distributed data store, the data would become a logical
> object - adding or removing machines would not affect the data.
> This is an ideal that would remove a tremendous maintenance
> burden from many sites -- well, at least the ones I have worked
> at, as far as I can see.
> 
> Does anyone know of plans to implement PostgreSQL over Hadoop?
> 
> Yahoo seems to be doing this:
>       http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html
> 
> But they store tables column-wise for their particular performance needs.
> If one is doing a lot of inserts, I don't think that is the most efficient layout, is it?
> 
> Has Yahoo put the source code for their work online?
> 
> Many thanks for any pointers.
> 
> -paul
>