Thread: SQL Property Graph Queries (SQL/PGQ)

SQL Property Graph Queries (SQL/PGQ)

From
Peter Eisentraut
Date:
Here is a prototype implementation of SQL property graph queries
(SQL/PGQ), following SQL:2023.  This was talked about briefly at the
FOSDEM developer meeting, and a few people were interested, so I
wrapped up what I had in progress into a presentable form.

There is some documentation to get started in doc/src/sgml/ddl.sgml
and doc/src/sgml/queries.sgml.

To learn more about this facility, here are some external resources:

* An article about a competing product:
   https://oracle-base.com/articles/23c/sql-property-graphs-and-sql-pgq-23c
   (All the queries in the article work, except the ones using
   vertex_id() and edge_id(), which are non-standard, and the JSON
   examples at the end, which require some of the in-progress JSON
   functionality for PostgreSQL.)

* An academic paper related to another competing product:
   https://www.cidrdb.org/cidr2023/papers/p66-wolde.pdf (The main part
   of this paper discusses advanced functionality that my patch doesn't
   have.)

* A 2019 presentation about graph databases:
   https://www.pgcon.org/2019/schedule/events/1300.en.html (There is
   also a video.)

* (Vik has a recent presentation "Property Graphs: When the Relational
   Model Is Not Enough", but I haven't found the content posted
   online.)

The patch is quite fragile, and treading outside the tested paths will
likely lead to grave misbehavior.  Use with caution.  But I feel that
the general structure is ok, and we just need to fill in the
proverbial few thousand lines of code in the designated areas.
Attachments

Re: SQL Property Graph Queries (SQL/PGQ)

From
Andres Freund
Date:
Hi,

On 2024-02-16 15:53:11 +0100, Peter Eisentraut wrote:
> The patch is quite fragile, and treading outside the tested paths will
> likely lead to grave misbehavior.  Use with caution.  But I feel that
> the general structure is ok, and we just need to fill in the
> proverbial few thousand lines of code in the designated areas.

One aspect that I'm concerned with structurally is that the transformation,
from property graph queries to something postgres understands, is done via the
rewrite system. I doubt that that is a good idea. For one it bars the planner
from making plans that benefit from the graph query formulation. But more
importantly, we IMO should reduce usage of the rewrite system, not increase
it.

Greetings,

Andres Freund



Re: SQL Property Graph Queries (SQL/PGQ)

From
Peter Eisentraut
Date:
On 16.02.24 20:23, Andres Freund wrote:
> One aspect that I'm concerned with structurally is that the transformation,
> from property graph queries to something postgres understands, is done via the
> rewrite system. I doubt that that is a good idea. For one it bars the planner
> from making plans that benefit from the graph query formulation. But more
> importantly, we IMO should reduce usage of the rewrite system, not increase
> it.

PGQ is meant to be implemented like that, like views expanding to joins 
and unions.  This is what I have gathered during the specification 
process, and from other implementations, and from academics.  There are 
certainly other ways to combine relational and graph database stuff, 
like with native graph storage and specialized execution support, but 
this is not that, and to some extent PGQ was created to supplant those 
other approaches.
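
To make the "views expanding to joins" idea concrete, here is a minimal sketch. It is not taken from the patch: the tables `persons` and `knows`, the sample rows, and the pattern shown in the comment are all hypothetical, and SQLite stands in for PostgreSQL only so that the example is self-contained. It shows a single-edge graph pattern rewritten by hand into the plain join it is conceptually equivalent to.

```python
import sqlite3

# Hypothetical schema: one vertex table (persons) and one edge table (knows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (src INTEGER REFERENCES persons(id),
                        dst INTEGER REFERENCES persons(id));
    INSERT INTO persons VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO knows VALUES (1, 2), (2, 3);
""")

# A graph pattern in the SQL:2023 style might read roughly like this
# (illustrative only, not verified against the patch):
#   MATCH (a IS person)-[IS knows]->(b IS person)
#   COLUMNS (a.name AS a_name, b.name AS b_name)
#
# Hand-written relational equivalent: the vertex pattern becomes a scan of
# the vertex table, and the edge pattern becomes a join through the edge table.
EXPANDED = """
    SELECT a.name AS a_name, b.name AS b_name
    FROM persons AS a
    JOIN knows AS e ON e.src = a.id
    JOIN persons AS b ON b.id = e.dst
    ORDER BY a_name, b_name
"""

rows = conn.execute(EXPANDED).fetchall()
print(rows)
```

If a label were backed by more than one table, the same expansion would presumably produce one such join per combination of tables, combined with UNION ALL. That would be the "unions" half of the description above.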

Many people will agree that the rewriter is sort of weird and archaic at 
this point.  But I'm not aware of any plans or proposals to do anything 
about it.  As long as the view expansion takes place there, it makes 
sense to align with that.  For example, all the view security stuff 
(privileges, security barriers, etc.) will eventually need to be 
considered, and it would make sense to do that in a consistent way.  So 
for now, I'm working with what we have, but let's see where it goes.

(Note to self: Check that graph inside view inside graph inside view ... 
works.)




Re: SQL Property Graph Queries (SQL/PGQ)

From
Tomas Vondra
Date:
On 2/23/24 17:15, Peter Eisentraut wrote:
> On 16.02.24 20:23, Andres Freund wrote:
>> One aspect that I'm concerned with structurally is that the
>> transformation,
>> from property graph queries to something postgres understands, is done
>> via the
>> rewrite system. I doubt that that is a good idea. For one it bars the
>> planner
>> from making plans that benefit from the graph query formulation. But more
>> importantly, we IMO should reduce usage of the rewrite system, not
>> increase
>> it.
> 
> PGQ is meant to be implemented like that, like views expanding to joins
> and unions.  This is what I have gathered during the specification
> process, and from other implementations, and from academics.  There are
> certainly other ways to combine relational and graph database stuff,
> like with native graph storage and specialized execution support, but
> this is not that, and to some extent PGQ was created to supplant those
> other approaches.
> 

I understand PGQ was meant to be implemented as a bit of a "syntactic
sugar" on top of relations, instead of inventing some completely new
ways to store/query graph data.

But does that really mean it needs to be translated to relations this
early / in rewriter? I haven't thought about it very deeply, but won't
that discard useful information about semantics of the query, which
might be useful when planning/executing the query?

I've somehow imagined we'd be able to invent some new index types, or
utilize some other type of auxiliary structure, maybe some special
executor node, but it seems harder without this extra info ...

> Many people will agree that the rewriter is sort of weird and archaic at
> this point.  But I'm not aware of any plans or proposals to do anything
> about it.  As long as the view expansion takes place there, it makes
> sense to align with that.  For example, all the view security stuff
> (privileges, security barriers, etc.) will eventually need to be
> considered, and it would make sense to do that in a consistent way.  So
> for now, I'm working with what we have, but let's see where it goes.
> 
> (Note to self: Check that graph inside view inside graph inside view ...
> works.)
> 

AFAIK the "policy" regarding rewriter was that we don't want to use it
for user stuff (e.g. people using it for partitioning), but I'm not sure
about internal stuff.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: SQL Property Graph Queries (SQL/PGQ)

From
Ashutosh Bapat
Date:
On Fri, Feb 23, 2024 at 11:08 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 2/23/24 17:15, Peter Eisentraut wrote:
> > On 16.02.24 20:23, Andres Freund wrote:
> >> One aspect that I'm concerned with structurally is that the
> >> transformation,
> >> from property graph queries to something postgres understands, is done
> >> via the
> >> rewrite system. I doubt that that is a good idea. For one it bars the
> >> planner
> >> from making plans that benefit from the graph query formulation. But more
> >> importantly, we IMO should reduce usage of the rewrite system, not
> >> increase
> >> it.
> >
> > PGQ is meant to be implemented like that, like views expanding to joins
> > and unions.  This is what I have gathered during the specification
> > process, and from other implementations, and from academics.  There are
> > certainly other ways to combine relational and graph database stuff,
> > like with native graph storage and specialized execution support, but
> > this is not that, and to some extent PGQ was created to supplant those
> > other approaches.
> >
>
> I understand PGQ was meant to be implemented as a bit of a "syntactic
> sugar" on top of relations, instead of inventing some completely new
> ways to store/query graph data.
>
> But does that really mean it needs to be translated to relations this
> early / in rewriter? I haven't thought about it very deeply, but won't
> that discard useful information about semantics of the query, which
> might be useful when planning/executing the query?
>
> I've somehow imagined we'd be able to invent some new index types, or
> utilize some other type of auxiliary structure, maybe some special
> executor node, but it seems harder without this extra info ...

I have yet to look at the implementation, but ...
1. If there are optimizations that improve performance of some path
patterns, they are likely to improve the performance of joins used to
implement those. In such cases, losing some information might be ok.
2. Explicit graph annotation might help to automate some things, like
creating indexes automatically on columns that appear in specific
patterns, or creating extended statistics automatically on the columns
participating in specific patterns, or interpreting statistics/costing
differently than in normal query execution. Those kinds of things will
require retaining annotations in views, planner/execution trees etc.
3. There are some things like aggregates/operations on paths which
might require stuff like new execution nodes. But I am not sure we
have reached that stage yet.

There might be things we may not see right now in the standard, e.g.
indexes on graph properties. For those, mapping the graph objects onto
database objects might prove useful. That goes back to Peter's comment
--- quote
As long as the view expansion takes place there, it makes
sense to align with that.  For example, all the view security stuff
(privileges, security barriers, etc.) will eventually need to be
considered, and it would make sense to do that in a consistent way.
--- unquote

--
Best Wishes,
Ashutosh Bapat



Re: SQL Property Graph Queries (SQL/PGQ)

From
Ashutosh Bapat
Date:
The patch conflicted with changes in ef5e2e90859a39efdd3a78e528c544b585295a78. Attached is the patch with the conflict resolved.

--
Best Wishes,
Ashutosh Bapat
Attachments