Обсуждение: [PATCH] ltree hash functions

Поиск
Список
Период
Сортировка

[PATCH] ltree hash functions

От
Tommy Pavlicek
Дата:
Hi All,

I've written a patch to add hash functions for the ltree extension. It adds support for hash indexes and hash aggregation. I've reused the existing logic that's used to hash arrays and added tests that mirror elsewhere (i.e. hstore and hash_func regression tests).

The patch doesn't currently support hash joins as the ltree = operator was created without support for it. The ALTER OPERATOR command doesn't support changing the hash join support, so I'm not sure what the best strategy to change it is. Is it ok to update the operator's row in the pg_operator system catalog or is there a better way to change this that someone could recommend?

Any comments on the overall approach or other feedback would be appreciated.

Thanks,
Tommy
Вложения

Re: [PATCH] ltree hash functions

От
Tomas Vondra
Дата:
Hi,

I've created a CF entry for the patch:

  https://commitfest.postgresql.org/43/4375/

I only briefly skimmed the code, so a couple comments.

On 6/17/23 17:45, Tommy Pavlicek wrote:
> Hi All,
> 
> I've written a patch to add hash functions for the ltree extension. It
> adds support for hash indexes and hash aggregation. I've reused the
> existing logic that's used to hash arrays and added tests that mirror
> elsewhere (i.e. hstore and hash_func regression tests).
> 

Reusing code/logic is the right approach, IMHO.

> The patch doesn't currently support hash joins as the ltree = operator
> was created without support for it. The ALTER OPERATOR command doesn't
> support changing the hash join support, so I'm not sure what the best
> strategy to change it is. Is it ok to update the operator's row in the
> pg_operator system catalog or is there a better way to change this that
> someone could recommend?
> 

I guess the "correct" solution would be to extend ALTER OPERATOR. I
wonder why it's not supported - it's clearly an intentional decision
(per comment in AlterOperator). So what might break if this changes for
an existing operator?

FWIW the CREATE OPERATOR documentation only talks about hash joins for
HASHES, maybe it should be updated to also mention hash aggregates?

> Any comments on the overall approach or other feedback would be appreciated.
> 

I wonder what's the use case for this. I wonder how often people join on
ltree, for example. Did you just notice ltree can't hash and decided to
fix that, or do you have a practical use case / need for this feature?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] ltree hash functions

От
Tom Lane
Дата:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> I guess the "correct" solution would be to extend ALTER OPERATOR. I
> wonder why it's not supported - it's clearly an intentional decision
> (per comment in AlterOperator). So what might break if this changes for
> an existing operator?

This code was added by commit 321eed5f0.  The thread leading up to
that commit is here:

https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO%40dinodell

There are some nontrivial concerns in there about breaking the
semantics of existing exclusion constraints, for instance.  I think
we mostly rejected the concern about invalidation of cached plans
as already-covered, but that wasn't the only problem.

However, I think we could largely ignore the issues if we restricted
ALTER OPERATOR to only add commutator, negator, hashes, or merges
properties to operators that lacked them before --- which'd be the
primary if not only use-case anyway.  That direction can't break
anything.

            regards, tom lane



Re: [PATCH] ltree hash functions

От
Tomas Vondra
Дата:

On 6/17/23 20:19, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> I guess the "correct" solution would be to extend ALTER OPERATOR. I
>> wonder why it's not supported - it's clearly an intentional decision
>> (per comment in AlterOperator). So what might break if this changes for
>> an existing operator?
> 
> This code was added by commit 321eed5f0.  The thread leading up to
> that commit is here:
> 
> https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO%40dinodell
> 
> There are some nontrivial concerns in there about breaking the
> semantics of existing exclusion constraints, for instance.  I think
> we mostly rejected the concern about invalidation of cached plans
> as already-covered, but that wasn't the only problem.
> 
> However, I think we could largely ignore the issues if we restricted
> ALTER OPERATOR to only add commutator, negator, hashes, or merges
> properties to operators that lacked them before --- which'd be the
> primary if not only use-case anyway.  That direction can't break
> anything.
> 

Sound reasonable.

Tommy, are you interested in extending ALTER OPERATOR to allow this,
which would also allow fixing the ltree operator?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] ltree hash functions

От
Tommy Pavlicek
Дата:
FWIW the CREATE OPERATOR documentation only talks about hash joins for
HASHES, maybe it should be updated to also mention hash aggregates?

I think I might have been a bit unclear here, the hash aggregate does work without altering the operator so it's just the join that's blocked. Sorry about the confusion.

I wonder what's the use case for this. I wonder how often people join on
ltree, for example. Did you just notice ltree can't hash and decided to
fix that, or do you have a practical use case / need for this feature?

I mostly want to add hash indexes. Beyond selecting specific values, you can use them to get ancestors (trim the path and do an exact select) and descendents (using a functional index calculating the parent path for each row). For example, I've found it can be faster to calculate the path of every ancestor and use select ltree path = ANY([ancestor paths]) compared to using a gist index. It's not ideal, but unfortunately I've found that with enough rows, gist indexes get very large and slow. Btree indexes are better, but for ltree they can still be up to around 10x bigger than a hash index. I've also seen ltree hash indexes outperform btree indexes in very large tables, but I suspect in most cases they'll be similar.

Tommy, are you interested in extending ALTER OPERATOR to allow this,
which would also allow fixing the ltree operator?

Yes, I can do that. I took a look over the code and email thread and it seems like it should be relatively straight forward. I'll put a patch together for that and then update this patch to alter the operator.

On Sat, Jun 17, 2023 at 9:57 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:


On 6/17/23 20:19, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> I guess the "correct" solution would be to extend ALTER OPERATOR. I
>> wonder why it's not supported - it's clearly an intentional decision
>> (per comment in AlterOperator). So what might break if this changes for
>> an existing operator?
>
> This code was added by commit 321eed5f0.  The thread leading up to
> that commit is here:
>
> https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO%40dinodell
>
> There are some nontrivial concerns in there about breaking the
> semantics of existing exclusion constraints, for instance.  I think
> we mostly rejected the concern about invalidation of cached plans
> as already-covered, but that wasn't the only problem.
>
> However, I think we could largely ignore the issues if we restricted
> ALTER OPERATOR to only add commutator, negator, hashes, or merges
> properties to operators that lacked them before --- which'd be the
> primary if not only use-case anyway.  That direction can't break
> anything.
>

Sound reasonable.

Tommy, are you interested in extending ALTER OPERATOR to allow this,
which would also allow fixing the ltree operator?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] ltree hash functions

От
Daniel Gustafsson
Дата:
> On 19 Jun 2023, at 11:18, Tommy Pavlicek <tommypav122@gmail.com> wrote:

> Tommy, are you interested in extending ALTER OPERATOR to allow this,
> which would also allow fixing the ltree operator?
>
> Yes, I can do that. I took a look over the code and email thread and it seems like it should be relatively straight
forward.I'll put a patch together for that and then update this patch to alter the operator. 

Did you have a chance to look at this for an updated patch for this commitfest?

--
Daniel Gustafsson




Re: [PATCH] ltree hash functions

От
Tommy Pavlicek
Дата:
On Thu, Jul 6, 2023 at 2:18 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 19 Jun 2023, at 11:18, Tommy Pavlicek <tommypav122@gmail.com> wrote:
>
> > Tommy, are you interested in extending ALTER OPERATOR to allow this,
> > which would also allow fixing the ltree operator?
> >
> > Yes, I can do that. I took a look over the code and email thread and it seems like it should be relatively straight
forward.I'll put a patch together for that and then update this patch to alter the operator. 
>
> Did you have a chance to look at this for an updated patch for this commitfest?

I finally had a chance to look at this and I've updated the patch to
alter the = operator to enable hash joins.

This is ready to be looked at now.

Is there anything I need to do to move this forward?

Cheers,
Tommy

Вложения

Re: [PATCH] ltree hash functions

От
jian he
Дата:
On Wed, Nov 29, 2023 at 6:09 AM Tommy Pavlicek <tommypav122@gmail.com> wrote:
>
> On Thu, Jul 6, 2023 at 2:18 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> >
> > > On 19 Jun 2023, at 11:18, Tommy Pavlicek <tommypav122@gmail.com> wrote:
> >
> > > Tommy, are you interested in extending ALTER OPERATOR to allow this,
> > > which would also allow fixing the ltree operator?
> > >
> > > Yes, I can do that. I took a look over the code and email thread and it seems like it should be relatively
straightforward. I'll put a patch together for that and then update this patch to alter the operator. 
> >
> > Did you have a chance to look at this for an updated patch for this commitfest?
>
> I finally had a chance to look at this and I've updated the patch to
> alter the = operator to enable hash joins.
>
> This is ready to be looked at now.
>
> Is there anything I need to do to move this forward?
>

you only change Makefile, you also need to change contrib/ltree/meson.build?

+drop index tstidx;
+create index tstidx on ltreetest using hash (t);
+set enable_seqscan=off;
+
+SELECT * FROM ltreetest WHERE t =  '12.3' order by t asc;

Do you need to use EXPLAIN to demo the index usage?



Re: [PATCH] ltree hash functions

От
Tommy Pavlicek
Дата:
On Tue, Nov 28, 2023 at 7:38 PM jian he <jian.universality@gmail.com> wrote:
> you only change Makefile, you also need to change contrib/ltree/meson.build?
> Do you need to use EXPLAIN to demo the index usage?

Thanks! Yes, I missed the Meson build file. I added additional
commands with EXPLAIN (COSTS OFF) as I found in other places.

Patch updated for those comments (and a touch of cleanup in the tests) attached.

Вложения

Re: [PATCH] ltree hash functions

От
jian he
Дата:
On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek <tommypav122@gmail.com> wrote:
>
>
> Patch updated for those comments (and a touch of cleanup in the tests) attached.

it would be a better name as hash_ltree than ltree_hash, similar logic
applies to ltree_hash_extended.
that would be the convention. see: https://stackoverflow.com/a/69650940/15603477


Other than that, it looks good.



Re: [PATCH] ltree hash functions

От
Tommy Pavlicek
Дата:
Thanks.

I've attached the latest version that updates the naming in line with
the convention.

On Mon, Dec 4, 2023 at 12:46 AM jian he <jian.universality@gmail.com> wrote:
>
> On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek <tommypav122@gmail.com> wrote:
> >
> >
> > Patch updated for those comments (and a touch of cleanup in the tests) attached.
>
> it would be a better name as hash_ltree than ltree_hash, similar logic
> applies to ltree_hash_extended.
> that would be the convention. see: https://stackoverflow.com/a/69650940/15603477
>
>
> Other than that, it looks good.

Вложения

Re: [PATCH] ltree hash functions

От
vignesh C
Дата:
On Wed, 6 Dec 2023 at 04:08, Tommy Pavlicek <tommypav122@gmail.com> wrote:
>
> Thanks.
>
> I've attached the latest version that updates the naming in line with
> the convention.
>
> On Mon, Dec 4, 2023 at 12:46 AM jian he <jian.universality@gmail.com> wrote:
> >
> > On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek <tommypav122@gmail.com> wrote:
> > >
> > >
> > > Patch updated for those comments (and a touch of cleanup in the tests) attached.
> >
> > it would be a better name as hash_ltree than ltree_hash, similar logic
> > applies to ltree_hash_extended.
> > that would be the convention. see: https://stackoverflow.com/a/69650940/15603477
> >
> >
> > Other than that, it looks good.

CFBot shows one of the test is failing as in [1]:
diff -U3 /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out
/tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
--- /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out 2024-01-31
15:18:42.893039599 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
2024-01-31 15:23:25.309028749 +0000
@@ -1442,9 +1442,14 @@
        ('0.1.2'::ltree), ('0'::ltree), ('0_asd.1_ASD'::ltree)) x(v)
 WHERE  hash_ltree(v)::bit(32) != hash_ltree_extended(v, 0)::bit(32)
        OR hash_ltree(v)::bit(32) = hash_ltree_extended(v, 1)::bit(32);
- value | standard | extended0 | extended1
--------+----------+-----------+-----------
-(0 rows)
+    value    |             standard             |
extended0             |            extended1
+-------------+----------------------------------+----------------------------------+----------------------------------
+ 0           | 10001010100010010000000000001011 |
01011001111001000100011001011011 | 01011001111001000100011010011111
+ 0.1         | 10100000111110001010110001001110 |
00111100100010001100110111010101 | 00111100100010001101100011010101
+ 0.1.2       | 01111000011100000101111101110100 |
10101110011101011000000011010111 | 10101110011101110010001111000011
+ 0           | 10001010100010010000000000001011 |
01011001111001000100011001011011 | 01011001111001000100011010011111
+ 0_asd.1_ASD | 01000010001010000000101001001101 |
00111100100010001100110111010101 | 00111100100010001101100011010101
+(5 rows)

Please post an updated version for the same.

[1] -
https://api.cirrus-ci.com/v1/artifact/task/5572544858685440/testrun/build-32/testrun/ltree/regress/regression.diffs

Regards,
Vignesh



Re: [PATCH] ltree hash functions

От
jian he
Дата:
On Thu, Feb 1, 2024 at 11:11 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 6 Dec 2023 at 04:08, Tommy Pavlicek <tommypav122@gmail.com> wrote:
> >
> > Thanks.
> >
> > I've attached the latest version that updates the naming in line with
> > the convention.
> >
> > On Mon, Dec 4, 2023 at 12:46 AM jian he <jian.universality@gmail.com> wrote:
> > >
> > > On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek <tommypav122@gmail.com> wrote:
> > > >
> > > >
> > > > Patch updated for those comments (and a touch of cleanup in the tests) attached.
> > >
> > > it would be a better name as hash_ltree than ltree_hash, similar logic
> > > applies to ltree_hash_extended.
> > > that would be the convention. see: https://stackoverflow.com/a/69650940/15603477
> > >
> > >
> > > Other than that, it looks good.
>
> CFBot shows one of the test is failing as in [1]:
> diff -U3 /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out
> /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
> --- /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out 2024-01-31
> 15:18:42.893039599 +0000
> +++ /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
> 2024-01-31 15:23:25.309028749 +0000
> @@ -1442,9 +1442,14 @@
>         ('0.1.2'::ltree), ('0'::ltree), ('0_asd.1_ASD'::ltree)) x(v)
>  WHERE  hash_ltree(v)::bit(32) != hash_ltree_extended(v, 0)::bit(32)
>         OR hash_ltree(v)::bit(32) = hash_ltree_extended(v, 1)::bit(32);
> - value | standard | extended0 | extended1
> --------+----------+-----------+-----------
> -(0 rows)
> +    value    |             standard             |
> extended0             |            extended1
>
+-------------+----------------------------------+----------------------------------+----------------------------------
> + 0           | 10001010100010010000000000001011 |
> 01011001111001000100011001011011 | 01011001111001000100011010011111
> + 0.1         | 10100000111110001010110001001110 |
> 00111100100010001100110111010101 | 00111100100010001101100011010101
> + 0.1.2       | 01111000011100000101111101110100 |
> 10101110011101011000000011010111 | 10101110011101110010001111000011
> + 0           | 10001010100010010000000000001011 |
> 01011001111001000100011001011011 | 01011001111001000100011010011111
> + 0_asd.1_ASD | 01000010001010000000101001001101 |
> 00111100100010001100110111010101 | 00111100100010001101100011010101
> +(5 rows)
>
> Please post an updated version for the same.
>
> [1] -
https://api.cirrus-ci.com/v1/artifact/task/5572544858685440/testrun/build-32/testrun/ltree/regress/regression.diffs
>

It only fails on Linux - Debian Bullseye - Meson.
I fixed the white space, named it v5.
I also made the following changes:
from

uint64 levelHash = hash_any_extended((unsigned char *) al->name, al->len, seed);
uint32 levelHash = hash_any((unsigned char *) al->name, al->len);

to
uint64 levelHash = DatumGetUInt64(hash_any_extended((unsigned char *)
al->name, al->len, seed));
uint32 levelHash = DatumGetUInt32(hash_any((unsigned char *) al->name,
al->len));

(these two line live in different functions)

I have some problems testing it locally, so I post the patch.

Вложения

Re: [PATCH] ltree hash functions

От
Tom Lane
Дата:
jian he <jian.universality@gmail.com> writes:
> I also made the following changes:
> from

> uint64 levelHash = hash_any_extended((unsigned char *) al->name, al->len, seed);
> uint32 levelHash = hash_any((unsigned char *) al->name, al->len);

> to
> uint64 levelHash = DatumGetUInt64(hash_any_extended((unsigned char *)
> al->name, al->len, seed));
> uint32 levelHash = DatumGetUInt32(hash_any((unsigned char *) al->name,
> al->len));

Yeah, that'd fail on 32-bit machines.

Pushed v5 with some minor cosmetic tweaking.

            regards, tom lane