Discussion: Are we the first OSS database with parallel query?
I suspect not, but I can't think of another example right now.

--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)
What about things like Apache Drill and EventQL?
On Fri, Aug 26, 2016 at 8:41 PM, Josh Berkus <josh@agliodbs.com> wrote:
> I suspect not, but I can't think of another example right now.
--
Sent via pgsql-advocacy mailing list (pgsql-advocacy@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-advocacy
On Fri, Aug 26, 2016 at 08:41:15PM -0400, Josh Berkus wrote:
> I suspect not, but I can't think of another example right now.

There are a fair number that beat us to that punch (GPDB, HadoopDB,
etc.)

We're the first with a liberal license, though. :)

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 08/29/2016 11:39 AM, David Fetter wrote:
> On Fri, Aug 26, 2016 at 08:41:15PM -0400, Josh Berkus wrote:
>> I suspect not, but I can't think of another example right now.
>
> There are a fair number that beat us to that punch (GPDB, HadoopDB,
> etc.)

Do they have parallel query on a single node? I suppose you can have
multiple shards-per-node, but that's still a different feature.

Also, are there other *SQL* implementations?

--
Josh Berkus
On Mon, Aug 29, 2016 at 11:41:48AM -0700, Josh Berkus wrote:
> On 08/29/2016 11:39 AM, David Fetter wrote:
> > On Fri, Aug 26, 2016 at 08:41:15PM -0400, Josh Berkus wrote:
> >> I suspect not, but I can't think of another example right now.
> >
> > There are a fair number that beat us to that punch (GPDB, HadoopDB,
> > etc.)
>
> Do they have parallel query on a single node? I suppose you can have
> multiple shards-per-node, but that's still a different feature.

I don't know of one offhand, but that distinction seems like a *VERY*
thin slice to be claiming.

> Also, are there other *SQL* implementations?

Yep. You can run SQL in parallel atop Hadoop. Also:

https://shardquery.com/2014/02/25/shard-query-supports-background-jobs-query-parallelism-and-all-select-syntax/

Best,
David.
On 08/29/2016 11:46 AM, David Fetter wrote:
> On Mon, Aug 29, 2016 at 11:41:48AM -0700, Josh Berkus wrote:
>> On 08/29/2016 11:39 AM, David Fetter wrote:
>>> On Fri, Aug 26, 2016 at 08:41:15PM -0400, Josh Berkus wrote:
>>>> I suspect not, but I can't think of another example right now.
>>>
>>> There are a fair number that beat us to that punch (GPDB, HadoopDB,
>>> etc.)
>>
>> Do they have parallel query on a single node? I suppose you can have
>> multiple shards-per-node, but that's still a different feature.
>
> I don't know of one offhand, but that distinction seems like a *VERY*
> thin slice to be claiming.

Yeah, it's certainly not terribly marketable. "First non-sharded
parallel query".

--
Josh Berkus
On Mon, Aug 29, 2016 at 12:06:43PM -0700, Josh Berkus wrote:
> On 08/29/2016 11:46 AM, David Fetter wrote:
> > On Mon, Aug 29, 2016 at 11:41:48AM -0700, Josh Berkus wrote:
> >> On 08/29/2016 11:39 AM, David Fetter wrote:
> >>> On Fri, Aug 26, 2016 at 08:41:15PM -0400, Josh Berkus wrote:
> >>>> I suspect not, but I can't think of another example right now.
> >>>
> >>> There are a fair number that beat us to that punch (GPDB, HadoopDB,
> >>> etc.)
> >>
> >> Do they have parallel query on a single node? I suppose you can have
> >> multiple shards-per-node, but that's still a different feature.
> >
> > I don't know of one offhand, but that distinction seems like a *VERY*
> > thin slice to be claiming.
>
> Yeah, it's certainly not terribly marketable. "First non-sharded
> parallel query".

PostgreSQL: we don't charge you a network hop to use another core ;)

Best,
David.
I thought the term "Parallel Query" properly refers to a query executed within a single host, not across multiple hosts.
In a multi-host (cluster) context, of course, each host runs its own single query in parallel with the others. So I think it is correct to use the term Parallel Query and to say we are the first RDBMS to implement it.
Julyanto SUTANDANG
Equnix Business Solutions, PT
(An Open Source and Open Mind Company)
www.equnix.co.id
Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta Pusat
T: +6221 22866662 F: +62216315281 M: +628164858028
Caution: The information enclosed in this email (and any attachments) may be legally privileged and/or confidential and is intended only for the use of the addressee(s). No addressee should forward, print, copy, or otherwise reproduce this message in any manner that would allow it to be viewed by any individual not originally listed as a recipient. If the reader of this message is not the intended recipient, you are hereby notified that any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is strictly prohibited. If you have received this communication in error, please immediately notify the sender and delete this message. Unless it is made by the authorized person, any views expressed in this message are those of the individual sender and may not necessarily reflect the views of PT Equnix Business Solutions.
On Tue, Aug 30, 2016 at 02:45:22AM +0700, julyanto SUTANDANG wrote:
> I thought that the real term of Parallel Query should talk about Query in a
> Host, Not multi host.
> When multihost (clusters) is the context, of course it is parallel execution of
> query in every single host with single query each. So, applying term Parallel
> Query as the first RDBMS implement it, i think it is correct.

I think we added a "parallel CPU query" feature.

--
  Bruce Momjian <bruce@momjian.us> http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On 08/29/2016 08:39 PM, David Fetter wrote:
> On Fri, Aug 26, 2016 at 08:41:15PM -0400, Josh Berkus wrote:
>> I suspect not, but I can't think of another example right now.
>
> There are a fair number that beat us to that punch (GPDB, HadoopDB,
> etc.)
>
> We're the first with a liberal license, though. :)
>

No, we're not. This is a fairly tricky question, because there's
definitely a bunch of databases that are not widely used in production
environments, but are technically open source and implement some sort of
parallel query functionality.

For example there's MonetDB (which is using basically a Mozilla
license), which has supported parallel queries since ~2012 or so.

There's also C-Store (Vertica is a commercial fork), and H-Store (VoltDB
is a commercial fork) - AFAIK both support query parallelism. Although
they are experimental / research projects (and most companies use the
commercial forks in production), they use a BSD license, so technically
they are open source. Also, Stonebraker cooperated on both those
projects, so neglecting them would be particularly annoying.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 08/29/2016 01:08 PM, Tomas Vondra wrote:
> On 08/29/2016 08:39 PM, David Fetter wrote:
>> On Fri, Aug 26, 2016 at 08:41:15PM -0400, Josh Berkus wrote:
>>> I suspect not, but I can't think of another example right now.
>>
>> There are a fair number that beat us to that punch (GPDB, HadoopDB,
>> etc.)
>>
>> We're the first with a liberal license, though. :)

First relational open source....?

JD

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.
On Mon, Aug 29, 2016 at 9:45 PM, julyanto SUTANDANG <julyanto@equnix.co.id> wrote:
> I thought that the real term of Parallel Query should talk about Query in a
> Host, Not multi host.
> When multihost (clusters) is the context, of course it is parallel execution of
> query in every single host with single query each. So, applying term Parallel
> Query as the first RDBMS implement it, i think it is correct.
Right. With HadoopDB what you basically have is a strange form of SQL* run as a map reduce job.
* Hadoop assumes schema on read instead of schema on write, which means if your data doesn't match your expectations, you may get garbage back or may get nulls back. This is actually a feature in big data because usually you are looking for heuristic analysis of data based on statistical guesswork rather than provably correct answers. But I would *not* consider them a competitor.
--
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.