Обсуждение: [PATCH] Refactor bytea_sortsupport(), take two

Поиск
Список
Период
Сортировка

[PATCH] Refactor bytea_sortsupport(), take two

От
Aleksander Alekseev
Дата:
Hi,

This is a follow-up to b45242fd30ff [1]. Previously we separated
varlena.c into varlena.c and bytea.c. This patch makes
bytea_sortsupport() independent from varlena.c code as it was proposed
before [2][3]. The benefits of this change are summarized in the
commit message that I included to the patch.

As always, your feedback is most appreciated.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b45242fd30ff
[2]: https://postgr.es/m/1502394.1725398354@sss.pgh.pa.us
[3]: https://postgr.es/m/CAJ7c6TO3X88dGd8C4Tb-Eq2ZDPz%2B9mP%2BKOwdzK_82BEz_cMPZg%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: [PATCH] Refactor bytea_sortsupport(), take two

От
Aleksander Alekseev
Дата:
Hi,

> This is a follow-up to b45242fd30ff [1]. Previously we separated
> varlena.c into varlena.c and bytea.c. This patch makes
> bytea_sortsupport() independent from varlena.c code as it was proposed
> before [2][3]. The benefits of this change are summarized in the
> commit message that I included to the patch.
>
> As always, your feedback is most appreciated.

cfbot indicates that v1 needs a rebase. Here is v2.

Вложения

Re: [PATCH] Refactor bytea_sortsupport(), take two

От
John Naylor
Дата:
On Thu, Aug 7, 2025 at 6:44 PM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:
>
> Hi,
>
> > This is a follow-up to b45242fd30ff [1]. Previously we separated
> > varlena.c into varlena.c and bytea.c. This patch makes
> > bytea_sortsupport() independent from varlena.c code as it was proposed
> > before [2][3]. The benefits of this change are summarized in the
> > commit message that I included to the patch.
> >
> > As always, your feedback is most appreciated.
>
> cfbot indicates that v1 needs a rebase. Here is v2.

- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation. Callers that use the C collation may have NUL bytes
+ * in their strings; this will not work with any other collation, though.

- * More generally, it's okay that bytea callers can have NUL bytes in
- * strings because abbreviated cmp need not make a distinction between
+ * Generally speaking, it's okay that C locale callers can have NUL bytes
+ * in strings because abbreviated cmp need not make a distinction between

Don't these types disallow NUL bytes regardless of locale / character set?

--
John Naylor
Amazon Web Services



Re: [PATCH] Refactor bytea_sortsupport(), take two

От
Aleksander Alekseev
Дата:
Hi John,

Thanks for the review.

> Don't these types disallow NUL bytes regardless of locale / character set?

You are right, they do. Here is the patch v3 with corrected comments.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: [PATCH] Refactor bytea_sortsupport(), take two

От
John Naylor
Дата:
On Mon, Sep 15, 2025 at 3:40 PM Aleksander Alekseev
<aleksander@tigerdata.com> wrote:
>
> > Don't these types disallow NUL bytes regardless of locale / character set?
>
> You are right, they do. Here is the patch v3 with corrected comments.

Hi Aleksander,

- * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
- * same representation.  Callers that always use the C collation (e.g.
- * non-collatable type callers like bytea) may have NUL bytes in their strings;
- * this will not work with any other collation, though.
+ * Relies on the assumption that text, VarChar, and BpChar all have the
+ * same representation. These text types cannot contain NUL bytes.

AFAICS, the only reaon to mention NUL bytes here before was because of
bytea -- sirnce bytea is being removed from this path, I don't see a
need to mention mention NUL bytes here. It'd be relevant if any
simplification is possible in functions that previously may have had
to worry about NUL bytes. I don't immediately see any such
opportunities, though. Are there?

+ * Note: text types (text, varchar, bpchar) cannot contain NUL bytes,
+ * so we don't need to worry about NUL byte handling here.

Same here. Also, "Note" is stronger than a normal comment, maybe to
call attention to complex edge cases or weird behaviors.

+#if SIZEOF_DATUM == 8

We recently made all datums 8 bytes.

Some comments are randomly different than the equivalents in
varlena.c. It's probably better if same things remain the same, but
there's nothing wrong either.

"Lastly, the performance and memory consumption could be optimized a little for
the bytea case."

Is this a side effect of the patch, or a possibility for future work?
It's not clear.

The rest seems fine at a glance.
--
John Naylor
Amazon Web Services



Re: [PATCH] Refactor bytea_sortsupport(), take two

От
Chao Li
Дата:
Hi Aleksander,

Overall LGTM. Just a few small comments:

On Sep 15, 2025, at 16:40, Aleksander Alekseev <aleksander@tigerdata.com> wrote:

--
Best regards,
Aleksander Alekseev
<v3-0001-Refactor-bytea_sortsupport.patch>


1.
····
+ /* We can't afford to leak memory here. */
+ if (PointerGetDatum(arg1) != x)
+ pfree(arg1);
+ if (PointerGetDatum(arg2) != y)
+ pfree(arg2);
···

Should we do:
```
    PG_FREE_IF_COPY(arg1, 0);
    PG_FREE_IF_COPY(arg2, 1)
```

Similar to other places.

2.
···
+/*
+ * Conversion routine for sortsupport.
+ */
+static Datum
+bytea_abbrev_convert(Datum original, SortSupport ssup)
···

The function comment is less descriptive. I would suggest something like:
```
/*
 * Abbreviated key conversion for bytea sortsupport.
 *
 * Returns the abbreviated key as a Datum.  If a detoasted copy was made,
 * it is freed before returning.
 */
```

3.
```
+ if (abbrev_distinct <= 1.0)
+ abbrev_distinct = 1.0;
+
+ if (key_distinct <= 1.0)
+ key_distinct = 1.0;
```

Why <= 1.0 then set to 1.0? Shouldn't “if" takes just <1.0?

4.
```
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
  SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
  MemoryContext oldcontext;
+ ByteaSortSupport *bss;
```

“Bss” can be defined in the “if” clause where it is used.

5.
```
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
  SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
  MemoryContext oldcontext;
+ ByteaSortSupport *bss;
+ bool abbreviate = ssup->abbreviate;
```

The local variable “abbreviate” is only used once, do we really need to cache ssup->abbreviate into a local variable?



--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




Re: [PATCH] Refactor bytea_sortsupport(), take two

От
Aleksander Alekseev
Дата:
Hi Chao,

> 1.
> ····
> + /* We can't afford to leak memory here. */
> + if (PointerGetDatum(arg1) != x)
> + pfree(arg1);
> + if (PointerGetDatum(arg2) != y)
> + pfree(arg2);
> ···
>
> Should we do:
> ```
>     PG_FREE_IF_COPY(arg1, 0);
>     PG_FREE_IF_COPY(arg2, 1)
> ```
>
> Similar to other places.

Hmmm... to me it doesn't look more readable to be honest. I choose the
same pattern used in varlena.c and suggest keeping it for consistency.
IMO this refactoring can be discussed later in a separate thread. Good
observation though.

> 2.
> ···
> +/*
> + * Conversion routine for sortsupport.
> + */
> +static Datum
> +bytea_abbrev_convert(Datum original, SortSupport ssup)
> ···
>
> The function comment is less descriptive. I would suggest something like:
> ```
> /*
>  * Abbreviated key conversion for bytea sortsupport.
>  *
>  * Returns the abbreviated key as a Datum.  If a detoasted copy was made,
>  * it is freed before returning.
>  */
> ```

Sorry, but I don't think I agree with this either. The comment you are
proposing exposes the implementation details. I don't think that the
caller needs to know them.

This is basically a simplified varstr_abbrev_convert(). I was not
certain if I should mention this in a comment and/or how much text I
should copy from the comment for varstr_abbrev_convert(). I'm open to
suggestions.

> 3.
> ```
> + if (abbrev_distinct <= 1.0)
> + abbrev_distinct = 1.0;
> +
> + if (key_distinct <= 1.0)
> + key_distinct = 1.0;
> ```
>
> Why <= 1.0 then set to 1.0? Shouldn't “if" takes just <1.0?

Good point. I'll fix this in v4. Also I'll modify
varstr_abbrev_abort() for consistency.

One could argue that the change of varstr_abbrev_abort() is irrelevant
in the context of this patch. If there are objections we can easily
change '<' back to '<='. It's not that important but '<' looks cleaner
IMO.

> 4.
> ```
>  Datum
>  bytea_sortsupport(PG_FUNCTION_ARGS)
>  {
>   SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
>   MemoryContext oldcontext;
> + ByteaSortSupport *bss;
> ```
>
> “Bss” can be defined in the “if” clause where it is used.

Agree. I'll fix this in v4.

> 5.
> ```
>  Datum
>  bytea_sortsupport(PG_FUNCTION_ARGS)
>  {
>   SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
>   MemoryContext oldcontext;
> + ByteaSortSupport *bss;
> + bool abbreviate = ssup->abbreviate;
> ```
>
> The local variable “abbreviate” is only used once, do we really need to cache ssup->abbreviate into a local variable?

Agree, thanks.

--
Best regards,
Aleksander Alekseev



Re: [PATCH] Refactor bytea_sortsupport(), take two

От
Aleksander Alekseev
Дата:
Hi John,

> - * Relies on the assumption that text, VarChar, BpChar, and bytea all have the
> - * same representation.  Callers that always use the C collation (e.g.
> - * non-collatable type callers like bytea) may have NUL bytes in their strings;
> - * this will not work with any other collation, though.
> + * Relies on the assumption that text, VarChar, and BpChar all have the
> + * same representation. These text types cannot contain NUL bytes.
>
> AFAICS, the only reaon to mention NUL bytes here before was because of
> bytea -- sirnce bytea is being removed from this path, I don't see a
> need to mention mention NUL bytes here. It'd be relevant if any
> simplification is possible in functions that previously may have had
> to worry about NUL bytes. I don't immediately see any such
> opportunities, though. Are there?

Doesn't seem so. I removed the mention of NUL bytes.

> + * Note: text types (text, varchar, bpchar) cannot contain NUL bytes,
> + * so we don't need to worry about NUL byte handling here.
>
> Same here. Also, "Note" is stronger than a normal comment, maybe to
> call attention to complex edge cases or weird behaviors.

Agree. Fixed.

> +#if SIZEOF_DATUM == 8
>
> We recently made all datums 8 bytes.

Right, I forgot to change the patch accordingly. Fixed.

> Some comments are randomly different than the equivalents in
> varlena.c. It's probably better if same things remain the same, but
> there's nothing wrong either.

I did my best to keep the comments consistent between varlena.c and
bytea.c. I don't think they are going to be that consistent in the
long term anyway, so not sure how much effort we should invest into
this.

> "Lastly, the performance and memory consumption could be optimized a little for
> the bytea case."
>
> Is this a side effect of the patch, or a possibility for future work?
> It's not clear.

The idea was to reflect the fact that bytea-specific code becomes
simpler (less if's, etc) and uses structures of smaller size. This is
probably not that important for the commit message. I removed it in
order to avoid confusion.

PFA patch v4.

-- 
Best regards,
Aleksander Alekseev

Вложения