Discussion: CFH: Mariposa, distributed DB


CFH: Mariposa, distributed DB

From
"Ross J. Reedstrom"
Date:
This is a Call For Hackers:

Some time ago, I floated a little discussion on this list about doing
some distributed database work with PostgreSQL. The project got
back-burnered at work, but now has a timeline for needing a solution "this
summer."  Recent discussions on this list about Postgres's historical
object roots got me back to the Berkeley db sites, and reminded me about
Mariposa, which is Stonebraker's take on distributed DBs.

http://s2k-ftp.cs.berkeley.edu:8000/mariposa/

Stonebraker has gone on to commercialize Mariposa as Cohera, which seems
to be one of those Enterprise Scale products where if you need to ask
how much a license costs, you can't afford it ;-)

Sounds like now would be a good time to re-visit Mariposa, and see what
good ideas can be folded over into PostgreSQL.  Mariposa was funded by
ARPA and ARO, and was used by NASA as the database part of the Sequoia
Project, which became Big Sur, looking to unify the various kinds of
geophysical data collected by earth observing missions.

The code is an offshoot of Postgres95, with lots of nasty '#ifdef P95's
scattered around. The split predates lots of good work by the PostgreSQL
team to clean up years of academic cruft that had accumulated, so merging
is not trivial.

Anyway, anyone interested in taking a look at this with me? I think the
place to start (i.e., where I'm starting) is to get the June-1996 alpha
release of Mariposa to compile on a current system (I'm running Linux
myself.) I've been doing a compare-and-contrast, staring at source code,
but I think I need a running system to decide how the parts fit together.

Then, plan what features to 'fold' into pgsql, and run a proposal past
this list some time later in the 7.x series, perhaps in a couple of
months (you guys will probably be on 8.x by then!). Hopefully, this won't
take up too much of the core developers' time until we're talking integration.

Anyone else interested, I'm using the tarball from:

ftp://epoch.cs.berkeley.edu/pub/mariposa/src/alpha-1/mariposa-alpha-1.tar.gz

If this really takes off, I can host CVS of the mariposa and pgsql
sources, as well as web pages, mailing list, whatever. If it's just a
couple of us (or me all by myself ;-) we'll keep it simple.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: [HACKERS] CFH: Mariposa, distributed DB

From
Bruce Momjian
Date:
> This is a Call For Hackers:
> 
> Some time ago, I floated a little discussion on this list about doing
> some distributed database work with PostgreSQL. The project got back
> burnered at work, but now has a timeline for needing a solution "this
> summer."  Recent discussions on this list about Postgres's historical
> object roots got me back to the Berkeley db sites, and reminded me about
> Mariposa, which is Stonebraker's take on distributed DBs.
> 
> http://s2k-ftp.cs.berkeley.edu:8000/mariposa/
> 

I have looked at the code.  I have files that show all the diffs they
made to it and they have some new files.  It was hard for me to see what
they were doing.  Looks like they hacked up the executor and put in some
translation layer to talk to some databroker.  It seems like an awfully
complicated way to do it.  I would not bother getting it to run, but
figure out what they were trying to do, and why, and see how we can
implement it.  My guess is that they had one central server for each
table, and you went to that server to get information.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] CFH: Mariposa, distributed DB

From
Hannu Krosing
Date:
"Ross J. Reedstrom" wrote:
> 
> 
> Anyone else interested, I'm using the tarball from:
> 
> ftp://epoch.cs.berkeley.edu/pub/mariposa/src/alpha-1/mariposa-alpha-1.tar.gz
> 

Is the Mariposa licence compatible with ours?

------------------
Hannu


Re: [HACKERS] CFH: Mariposa, distributed DB

From
"Ross J. Reedstrom"
Date:
On Mon, Feb 07, 2000 at 04:23:06PM -0500, Bruce Momjian wrote:
> > This is a Call For Hackers:
> > 
> > Some time ago, I floated a little discussion on this list about doing
> > some distributed database work with PostgreSQL. The project got back
> > burnered at work, but now has a timeline for needing a solution "this
> > summer."  Recent discussions on this list about Postgres's historical
> > object roots got me back to the Berkeley db sites, and reminded me about
> > Mariposa, which is Stonebraker's take on distributed DBs.
> > 
> > http://s2k-ftp.cs.berkeley.edu:8000/mariposa/
> > 
> 
> I have looked at the code.  I have files that show all the diffs they
> made to it and they have some new files.  It was hard for me to see what
> they were doing.  Looks like they hacked up the executor and put in some
> translation layer to talk to some databroker.  It seems like an awfully
> complicated way to do it.  I would not bother getting it to run, but
> figure out what they were trying to do, and why, and see how we can
> implement it.  My guess is that they had one central server for each
> table, and you went to that server to get information.
> 

Actually, this being an academic project, there's lots of design
documentation about how it's _supposed_ to work. Stonebraker calls it an
'agoric' distributed database, as in agora, market. The various db
servers offer tables (or even specific views on tables) 'for sale', and
bid against/with each other to provide the data to clients requesting
it. The idea behind it is to use a micro-economic market model to do
your distributed optimizations for you, rather than have the DBAs decide
what tables go where, what tables need to be shadowed, etc. The win is
supposedly massive scalability: the Cohera site talks about 10000s
of servers.
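To make the bidding idea concrete, here's a toy sketch in Python (purely my own illustration; the site names and the (price, delay) numbers are made up, and the real broker and bidder are Tcl scripts):

```python
# Toy sketch of the agoric idea (hypothetical, not Mariposa's API):
# each site "bids" a (site, price, delay) offer for serving a table,
# and the client-side broker picks the cheapest bid that still meets
# the client's deadline.

def pick_bid(bids, deadline):
    """bids: list of (site, price, delay); return cheapest bid within deadline."""
    feasible = [b for b in bids if b[2] <= deadline]
    return min(feasible, key=lambda b: b[1]) if feasible else None

bids = [("siteA", 10.0, 2.0), ("siteB", 4.0, 5.0), ("siteC", 6.0, 3.0)]
print(pick_bid(bids, deadline=4.0))  # siteB is too slow, so siteC wins
```

The point being: no DBA decides placement up front; the bid policy does it per query.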

As I said, I've been doing the compare-the-existing-source-code thing,
but thought working code might be more revealing, and give my project
manager something to see progress on ;-) You're right, though, that the
most productive way to go, in the long run, might be to reimplement what
they've described, in the current pgsql tree, using the Mariposa source
as an example implementation.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: [HACKERS] CFH: Mariposa, distributed DB

From
"Ross J. Reedstrom"
Date:
On Mon, Feb 07, 2000 at 11:44:14PM +0200, Hannu Krosing wrote:
> "Ross J. Reedstrom" wrote:
> > 
> > 
> > Anyone else interested, I'm using the tarball from:
> > 
> > ftp://epoch.cs.berkeley.edu/pub/mariposa/src/alpha-1/mariposa-alpha-1.tar.gz
> > 
> 
> Is mariposa licence compatible with ours ?

It better be, it's the same license ;-) That is, Mariposa is a branch off
the Postgres95 tree. Actually, it's a good question: the PG95 license 
would have let them put just about any license on Mariposa they wanted.

After running both COPYRIGHT files through fmt, here's the diff output:

wallace$ diff COPYRIGHT COPYRIGHT.pgsql 
1c1,2
< Mariposa Distributed Data Base Management System
---
> PostgreSQL Data Base Management System (formerly known as Postgres,
> then as Postgres95).
3c4
< Copyright (c) 1994-6 Regents of the University of California
---
> Copyright (c) 1994-7 Regents of the University of California
21d21
< 
wallace$ 

So, it is word for word the PostgreSQL license.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: [HACKERS] CFH: Mariposa, distributed DB

From
Hannu Krosing
Date:
Bruce Momjian wrote:
> 
> > This is a Call For Hackers:
> >
> > Some time ago, I floated a little discussion on this list about doing
> > some distributed database work with PostgreSQL. The project got back
> > burnered at work, but now has a timeline for needing a solution "this
> > summer."  Recent discussions on this list about Postgres's historical
> > object roots got me back to the Berkeley db sites, and reminded me about
> > Mariposa, which is Stonebraker's take on distributed DBs.
> >
> > http://s2k-ftp.cs.berkeley.edu:8000/mariposa/

It has a nice concept of simulating a free market for distributed query
optimisation. Auctions, brokers and all ...

> 
> I have looked at the code.  I have files that show all the diffs they
> made to it and they have some new files.  It was hard for me to see what
> they were doing.  Looks like they hacked up the executor and put in some
> translation layer to talk to some databroker. 

The broker was for determining where to get the data from - as each table
could be queried from several sites, there had to be a mechanism for the
planner to figure out the cheapest (or fastest, if "money" was not a problem) site.

> It seems like an awfully
> complicated way to do it.  I would not bother getting it to run, but
> figure out what they were trying to do, and why, and see how we can
> implement it.  My guess is that they had one central server for each
> table, and you went to that server to get information.

They would not have needed the broker for such a simple scheme.

IIRC they had no central table, but they doubled the length of the oid and
made it include the site id of the site that created the tuple.

It could be that they restricted changing a tuple to that site?

The site to go to for information was determined by an auction where each
site offered speed and cost for looking up the data. Usually they also
didn't guarantee the latest data, just a "best effort".
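The oid packing would be something like this (just my illustration of the idea, not the actual Mariposa layout):

```python
# Illustration of a "doubled oid" carrying the creating site's id
# (hypothetical layout: high 32 bits = site id, low 32 bits = local oid).
SITE_BITS = 32

def make_global_oid(site_id, local_oid):
    return (site_id << SITE_BITS) | local_oid

def split_global_oid(goid):
    return goid >> SITE_BITS, goid & ((1 << SITE_BITS) - 1)

goid = make_global_oid(7, 123456)
print(split_global_oid(goid))  # (7, 123456)
```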

-------------------
Hannu


Re: [HACKERS] CFH: Mariposa, distributed DB

From
Don Baccus
Date:
At 12:04 AM 2/8/00 +0200, Hannu Krosing wrote:

>The site to go for information was determined by an auction where each site 
>offered speed and cost for looking up the data. Usually the didn't also 
>quarantee the latest data, just the "best effort".

I just glanced at the website.  They explicitly mention that they don't
require global synchronization, because it would slow down response time
for many things (with thousands of servers, that sounds like an
understatement).  

So, yes, it would appear they don't guarantee the latest data.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.


Re: [HACKERS] CFH: Mariposa, distributed DB

From
"Ross J. Reedstrom"
Date:
Seems there was more than just going back to the Berkeley site that
reminded me of Mariposa. A principal new piece of functionality in Mariposa
is the ability to 'fragment' a class, based on a user-defined partitioning
function. The example used is a widgets class, which is partitioned on
the 'location' field (i.e., the warehouse the widget is stored in).

CREATE TABLE widgets (
    part_no     int4,
    location    char16,
    on_hand     int4,
    on_order    int4,
    commited    int4
) PARTITION ON LOCATION USING btchar16cmp;

Then, the table is filled with tuples, all containing locations of either
'Miami' or 'New York'.

SELECT * from widgets; 

works as expected.

Later, this table is fragmented:

SPLIT FRAGMENT widgets INTO widgets_mi, widgets_ny AT 'Miami';

Now, the original table widgets is _empty_: all the tuples with location <=
'Miami' go to widgets_mi, location > 'Miami' go to widgets_ny.

SELECT * from widgets; 

Still returns all the tuples! So, this works sort of the way Chris Bitmead
has implemented subclasses: widgets_mi and widgets_ny are subclasses of
the widgets class, so selects return everything below. They differ in
that only PARTITIONed classes can be FRAGMENTed.
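My reading of what the SPLIT does to the tuples, as a toy sketch (hypothetical helper, nothing from the actual source):

```python
# Sketch of SPLIT FRAGMENT routing as I read the manual: rows at or
# below the split key go to the first fragment, the rest to the second.

def split_fragment(rows, key, split_at):
    low = [r for r in rows if r[key] <= split_at]
    high = [r for r in rows if r[key] > split_at]
    return low, high

rows = [{"part_no": 1, "location": "Miami"},
        {"part_no": 2, "location": "New York"}]
widgets_mi, widgets_ny = split_fragment(rows, "location", "Miami")
print(len(widgets_mi), len(widgets_ny))  # 1 1
```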

The distributed part comes in with the MOVE FRAGMENT command. This
transfers the 'master' copy of a table to the designated host, so future
access to that FRAGMENT will go over the network.

There's also a COPY FRAGMENT command, that sets up a local cache of a
fragment, with a periodic update time.  These copies may be either 
READONLY, or (default) READ/WRITE. Seems updates are timed only (a simple
extension would be to implement write-through behavior).

All this is coming from the Mariposa User's Manual, which is an extended
version of the Postgres95 User's Manual.

As to latest vs. best effort: One defines a BidCurve, whose dimensions are
Cost and Time. A flat curve should get you the latest data. And, since
the DataBroker and Bidder are both implemented as Tcl scripts, it
would be possible to define a bid policy that only buys the latest data,
regardless of how long it's going to take.
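Just to illustrate the policy choice (a made-up sketch with invented numbers; the real bidder is a Tcl script):

```python
# Hypothetical sketch of a BidCurve: a function from answer time to the
# maximum price the client will pay. A flat curve doesn't penalize slow
# answers, so a slow-but-fresh bid stays acceptable; a sloped curve
# trades freshness for speed.

def acceptable(bid, curve):
    """bid: (price, time); accept if the price fits under the curve."""
    price, t = bid
    return price <= curve(t)

flat = lambda t: 10.0           # flat BidCurve: same budget at any time
sloped = lambda t: 10.0 - 2*t   # budget falls off as time grows

slow_fresh = (8.0, 4.0)         # pricey and slow, but an up-to-date copy
print(acceptable(slow_fresh, flat))    # True: flat curve will wait
print(acceptable(slow_fresh, sloped))  # False: 10 - 2*4 = 2 < 8
```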

Oh, BTW, yes that does put _two_ interpreted Tcl scripts on the execution
path for every query. Wonder what _that'll_ do for execution time. However,
it's like planning/optimization time, in that it's spent per query, rather
than per tuple.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


On Mon, Feb 07, 2000 at 02:19:56PM -0800, Don Baccus wrote:
> At 12:04 AM 2/8/00 +0200, Hannu Krosing wrote:
> 
> >The site to go for information was determined by an auction where each site 
> >offered speed and cost for looking up the data. Usually the didn't also 
> >quarantee the latest data, just the "best effort".
> 
> I just glanced at the website.  They explicitly mention that they don't
> require global synchronization, because it would slow down response time
> for many things (with thousands of server, that sounds like an
> understatement).  
> 
> So, yes, it would appear they don't guarantee the latest data.
> 


Re: [HACKERS] CFH: Mariposa, distributed DB

From
Don Baccus
Date:
At 04:57 PM 2/7/00 -0600, Ross J. Reedstrom wrote:

>CREATE TABLE widgets (
>    part_no        int4,
>    location    char16,
>    on_hand        int4,
>    on_order    int4,
>    commited    int4
>) PARTITION ON LOCATION USING btchar16cmp;

Oracle's partitioning is fixed; in other words, once you choose a
condition to split on, you can't change it.  So, in
your example:

>Then, the table is filled with tuples, all containing locations of either
>'Miami' or 'New York'.

After splitting the table into ">'Miami'" and "<='Miami'" fragments,
I've been told that you can't (say) change it to ">'Boston'" and
have the proper rows move automatically.

In practice, partitioning is often used to split tables on dates.  You
might want to partition off your old tax data at the 7-yr mark, and
each year as you do your taxes, move the oldest data in your
"recent taxes" table off to your "older taxes" table.

Apparently, Informix is smart enough to do this for you.

Since a couple of the people associated with the project are Informix
people, do you have any idea if Mariposa is able to do this?

>
>SELECT * from widgets; 
>
>works as expected.
>
>Later, this table is fragmented:
>
>SPLIT FRAGMENT widgets INTO widgets_mi, widgets_ny AT 'Miami';

In other words, some sort of "update the two tables AT <some new criteria>" operation.

Whatever the answer to my question, Mariposa certainly looks interesting.
It's functionality that folks who do data warehousing really need.

>Oh, BTW, yes that does put _two_ interpreted Tcl scripts on the execution
>path for every query. Wonder what _that'll_ do for execution time. However,
>it's like planning/optimization time, in that it's spent per query, rather
>than per tuple.

Probably not as bad as you think, if they're simple and short.  Once
someone has this up and running, integrated with PostgreSQL, and
robust and reliable, we can measure it and change to something else if
necessary :)



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.


Re: [HACKERS] CFH: Mariposa, distributed DB

From
Karel Zak - Zakkr
Date:
Hi,
The Mariposa db distribution is interesting, but it is very specific. If I
understand it correctly, it is not real-time, globally synchronized DB
replication. But for a lot of users (and me), what is probably interesting
is on-line DB replication and synchronization. How many users have 10K
servers?

I have explored the current PG source, and it is probably possible to
create support for on-line replication. My idea is to replicate data at
the heap_ layout level. The parser, planner and executor run on the local
backend and replicate the resulting tuples straight out to the other
servers (nodes). It needs to synchronize PG's locks too.

In the near future I want to start a project for PG on-line replication.
Or is anyone working on this now? Comments?
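A rough sketch of what I mean (illustrative only, not working PG code): the local backend does the parse/plan/execute work once and ships the finished tuple changes to the other nodes.

```python
# Toy model of heap-level replication (hypothetical): replicas receive
# the already-executed tuple change, not the SQL, so no parser/planner
# runs on them.

def apply_and_replicate(change, local_heap, replica_heaps):
    local_heap.append(change)        # heap-level write on this node
    for heap in replica_heaps:
        heap.append(change)          # ship the finished tuple change

primary, node1, node2 = [], [], []
apply_and_replicate(("insert", {"id": 1}), primary, [node1, node2])
print(primary == node1 == node2)  # True
```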
                        Karel