Re: Do we want a hashset type?

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Re: Do we want a hashset type?
Дата
Msg-id 1c59f2c3-a70f-4727-b7e1-b7aea62d818a@app.fastmail.com
обсуждение исходный текст
Ответ на Re: Do we want a hashset type?  (Ants Aasma <ants@cybertec.at>)
Список pgsql-hackers
On Fri, Jun 2, 2023, at 10:01, Ants Aasma wrote:
> Have you looked at roaring bitmaps? There is a pg_roaringbitmap
> extension [1] already available that offers very fast unions,
> intersections and membership tests over integer sets. I used it to get
> some pretty impressive performance results for faceting search on
> large document sets. [2]

Many thanks for the tip!

New benchmark:

We already had since before:
> wget https://snap.stanford.edu/data/soc-pokec-relationships.txt.gz
> gunzip soc-pokec-relationships.txt.gz
> CREATE TABLE edges (from_node INT, to_node INT);
> \copy edges from soc-pokec-relationships.txt;
> ALTER TABLE edges ADD PRIMARY KEY (from_node, to_node);

I've created a new `users` table from the `edges` table,
with a new `friends` roaringbitmap column:

CREATE TABLE users AS
SELECT from_node AS id, rb_build_agg(to_node) AS friends FROM edges GROUP BY 1;
ALTER TABLE users ADD PRIMARY KEY (id);

Old query from before:

CREATE OR REPLACE VIEW friends_of_friends AS
WITH RECURSIVE friends_of_friends AS (
    SELECT
        ARRAY[5867::bigint] AS current,
        0 AS depth
    UNION ALL
    SELECT
        new_current,
        friends_of_friends.depth + 1
    FROM
        friends_of_friends
    CROSS JOIN LATERAL (
        SELECT
            array_agg(DISTINCT edges.to_node) AS new_current
        FROM
            edges
        WHERE
            from_node = ANY(friends_of_friends.current)
    ) q
    WHERE
        friends_of_friends.depth < 3
)
SELECT
    coalesce(array_length(current, 1), 0) AS count_friends_at_depth_3
FROM
    friends_of_friends
WHERE
    depth = 3;
;

New roaringbitmap-based query using users instead:

CREATE OR REPLACE VIEW friends_of_friends_roaringbitmap AS
WITH RECURSIVE friends_of_friends AS
(
    SELECT
        friends,
        1 AS depth
    FROM users WHERE id = 5867
    UNION ALL
    SELECT
        new_friends,
        friends_of_friends.depth + 1
    FROM
        friends_of_friends
    CROSS JOIN LATERAL (
        SELECT
            rb_or_agg(users.friends) AS new_friends
        FROM
            users
        WHERE
            users.id = ANY(rb_to_array(friends_of_friends.friends))
    ) q
    WHERE
        friends_of_friends.depth < 3
)
SELECT
    rb_cardinality(friends) AS count_friends_at_depth_3
FROM
    friends_of_friends
WHERE
    depth = 3
;

Note, depth is 1 at first level since we already have user 5867's friends in the users column.

Maybe there is a better way to make use of the btree index on users.id,
than to convert the roaringbitmap to an array in order to use = ANY(...),
that is, this part: `users.id = ANY(rb_to_array(friends_of_friends.friends))`?

Or maybe there is some entirely different but equivalent way of writing the query
in a more efficient way?


SELECT * FROM friends_of_friends;
 count_friends_at_depth_3
--------------------------
                  1035293
(1 row)

SELECT * FROM friends_of_friends_roaringbitmap;
 count_friends_at_depth_3
--------------------------
                  1035293
(1 row)

EXPLAIN ANALYZE SELECT * FROM friends_of_friends;
                                                                               QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on friends_of_friends  (cost=5722.03..5722.73 rows=1 width=4) (actual time=2232.896..2233.289 rows=1
loops=1)
   Filter: (depth = 3)
   Rows Removed by Filter: 3
   CTE friends_of_friends
     ->  Recursive Union  (cost=0.00..5722.03 rows=31 width=36) (actual time=0.003..2228.707 rows=4 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=36) (actual time=0.001..0.001 rows=1 loops=1)
           ->  Subquery Scan on "*SELECT* 2"  (cost=190.59..572.17 rows=3 width=36) (actual time=556.806..556.837
rows=1loops=4)
 
                 ->  Nested Loop  (cost=190.59..572.06 rows=3 width=36) (actual time=553.748..553.748 rows=1 loops=4)
                       ->  WorkTable Scan on friends_of_friends friends_of_friends_1  (cost=0.00..0.22 rows=3 width=36)
(actualtime=0.000..0.001 rows=1 loops=4)
 
                             Filter: (depth < 3)
                             Rows Removed by Filter: 0
                       ->  Aggregate  (cost=190.59..190.60 rows=1 width=32) (actual time=737.427..737.427 rows=1
loops=3)
                             ->  Sort  (cost=179.45..185.02 rows=2227 width=4) (actual time=577.192..649.812
rows=3486910loops=3)
 
                                   Sort Key: edges.to_node
                                   Sort Method: quicksort  Memory: 393217kB
                                   ->  Index Only Scan using edges_pkey on edges  (cost=0.56..55.62 rows=2227 width=4)
(actualtime=0.027..225.609 rows=3486910 loops=3)
 
                                         Index Cond: (from_node = ANY (friends_of_friends_1.current))
                                         Heap Fetches: 0
 Planning Time: 0.294 ms
 Execution Time: 2240.446 ms

EXPLAIN ANALYZE SELECT * FROM friends_of_friends_roaringbitmap;
                                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on friends_of_friends  (cost=799.30..800.00 rows=1 width=8) (actual time=492.925..492.930 rows=1 loops=1)
   Filter: (depth = 3)
   Rows Removed by Filter: 2
   CTE friends_of_friends
     ->  Recursive Union  (cost=0.43..799.30 rows=31 width=118) (actual time=0.061..492.842 rows=3 loops=1)
           ->  Index Scan using users_pkey on users  (cost=0.43..2.65 rows=1 width=118) (actual time=0.060..0.062
rows=1loops=1)
 
                 Index Cond: (id = 5867)
           ->  Nested Loop  (cost=26.45..79.63 rows=3 width=36) (actual time=164.244..164.244 rows=1 loops=3)
                 ->  WorkTable Scan on friends_of_friends friends_of_friends_1  (cost=0.00..0.22 rows=3 width=36)
(actualtime=0.000..0.001 rows=1 loops=3)
 
                       Filter: (depth < 3)
                       Rows Removed by Filter: 0
                 ->  Aggregate  (cost=26.45..26.46 rows=1 width=32) (actual time=246.359..246.359 rows=1 loops=2)
                       ->  Index Scan using users_pkey on users users_1  (cost=0.43..26.42 rows=10 width=114) (actual
time=0.074..132.318rows=116336 loops=2)
 
                             Index Cond: (id = ANY (rb_to_array(friends_of_friends_1.friends)))
 Planning Time: 0.257 ms
 Execution Time: 493.134 ms



В списке pgsql-hackers по дате отправления:

Предыдущее
От: vignesh C
Дата:
Сообщение: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Следующее
От: shveta malik
Дата:
Сообщение: Re: Support logical replication of DDLs