Re: Do we want a hashset type?

From Tomas Vondra
Subject Re: Do we want a hashset type?
Date
Msg-id 86f8a027-e6d4-65c0-7349-faa3301c8b8b@enterprisedb.com
In response to Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers
On 6/8/23 11:41, Joel Jacobson wrote:
> On Wed, Jun 7, 2023, at 19:37, Tomas Vondra wrote:
>> Interesting, considering how dumb the hash table implementation is.
> 
> That's promising.
> 

Yeah, not bad for sleep-deprived on-plane hacking ...

There's a bunch of stuff that needs to be improved to make this properly
usable, like:

1) better hash table implementation

2) input/output functions

3) support for other types (now it only works with int32)

4) I wonder if this might be done as an array-like polymorphic type.

5) more efficient storage format, with versioning etc.

6) regression tests

Would you be interested in helping with / working on some of that? I
don't have an immediate need for this stuff, so it's not very high on my
TODO list.
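
To make item 1 a bit more concrete, here is a rough sketch of the kind of
thing I have in mind: an int32 set using open addressing with linear
probing, growing once it gets more than 75% full. All the names
(Int32Set, int32set_add, ...) are made up for illustration and this is
not the code in the current patch; it also uses plain malloc/calloc
without error checks, where an extension would use palloc in a suitable
memory context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; not the extension's actual code. */
typedef struct Int32Set
{
    uint32_t    capacity;   /* number of slots, always a power of two */
    uint32_t    nelements;  /* number of stored values */
    int32_t    *values;     /* slot contents, valid only where used[i] */
    bool       *used;       /* per-slot occupancy flag */
} Int32Set;

/* Cheap integer mixer (the 32-bit finalizer from MurmurHash3). */
static uint32_t
int32set_hash(int32_t value)
{
    uint32_t    h = (uint32_t) value;

    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/* Allocation failures are not handled in this sketch. */
static Int32Set *
int32set_create(uint32_t capacity)
{
    Int32Set   *set;

    /* power-of-two capacity lets us mask instead of using modulo */
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);

    set = malloc(sizeof(Int32Set));
    set->capacity = capacity;
    set->nelements = 0;
    set->values = calloc(capacity, sizeof(int32_t));
    set->used = calloc(capacity, sizeof(bool));
    return set;
}

static void
int32set_free(Int32Set *set)
{
    free(set->values);
    free(set->used);
    free(set);
}

static bool int32set_add(Int32Set *set, int32_t value);

/* Double the table and re-insert every stored value. */
static void
int32set_grow(Int32Set *set)
{
    Int32Set   *bigger = int32set_create(set->capacity * 2);

    for (uint32_t i = 0; i < set->capacity; i++)
        if (set->used[i])
            int32set_add(bigger, set->values[i]);

    free(set->values);
    free(set->used);
    *set = *bigger;
    free(bigger);
}

/* Returns true if the value was newly added, false if already present. */
static bool
int32set_add(Int32Set *set, int32_t value)
{
    uint32_t    mask;
    uint32_t    pos;

    /* grow before the load factor would exceed 75% */
    if ((set->nelements + 1) * 4 > set->capacity * 3)
        int32set_grow(set);

    mask = set->capacity - 1;
    pos = int32set_hash(value) & mask;

    /* linear probing: walk forward until a match or an empty slot */
    while (set->used[pos])
    {
        if (set->values[pos] == value)
            return false;
        pos = (pos + 1) & mask;
    }

    set->used[pos] = true;
    set->values[pos] = value;
    set->nelements++;
    return true;
}

static bool
int32set_contains(const Int32Set *set, int32_t value)
{
    uint32_t    mask = set->capacity - 1;
    uint32_t    pos = int32set_hash(value) & mask;

    while (set->used[pos])
    {
        if (set->values[pos] == value)
            return true;
        pos = (pos + 1) & mask;
    }
    return false;
}
```

The load-factor cap guarantees there is always an empty slot, so the
probe loops terminate. Deletion is deliberately left out; with linear
probing it needs tombstones or backward shifting, which is part of what
"better" would have to cover.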

>>> I tested Neo4j and the results are surprising; it appears to be significantly *slower*.
>>> However, I've probably misunderstood something, maybe I need to add some index or something.
>>> Even so, it's interesting it's apparently not fast "by default".
>>>
>>
>> No idea how to fix that, but it's rather suspicious.
> 
> I've had a graph-db expert review my benchmark, and he suggested adding an index:
> 
> CREATE INDEX FOR (n:User) ON (n.id)
> 
> This did improve the execution time for Neo4j a bit, down from 819 ms to 528 ms, but PostgreSQL 299 ms is still a win.
> 
> Benchmark here: https://github.com/joelonsql/graph-query-benchmarks
> Note, in this benchmark, I only test the naive RECURSIVE CTE approach using array_agg(DISTINCT ...).
> And I couldn't even test the most connected user with Neo4j, the query never finished for some reason,
> so I had to test with a less connected user.
> 

Interesting. I'd have expected the graph db to be much faster.

> The graph expert also said that other more realistic graph use-cases might be "multi-relational",
> and pointed me to a link: https://github.com/totogo/awesome-knowledge-graph#knowledge-graph-dataset
> No idea how such multi-relational datasets would affect the benchmarks.
> 

Not sure either, but I don't have the ambition to improve everything at
once. If the hashset improves one practical use case, fine with me.

> I think we already have strong indicators that PostgreSQL with a hashset type will, from a relative
> performance perspective, do just fine processing basic graph queries, even with large datasets.
> 
> Then there will always be users who primarily write very different graph queries all day long,
> and who might prefer a graph query *language*, like SQL/PGQ in SQL:2023, Cypher or Gremlin.
> 

Right. IMHO the query language is a separate thing, you still need to
evaluate the query somehow - which is where hashset applies.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


