Обсуждение: UUID v7

Поиск
Список
Период
Сортировка

UUID v7

От
Andrey Borodin
Дата:
Hello pgsql-hackers!

As you may know there's a new version of UUID being standardized [0].
These new algorithms of UUID generation are very promising for
database performance. It keeps data locality for time-ordered values.
From my POV v7 is especially needed for users. Current standard status
is "draft". And I'm not sure it will be accepted before our feature
freeze for PG16. Maybe we could provide a draft implementation in 16
and adjust it to the accepted version if the standard is changed? PFA
patch with implementation.

What do you think?

cc Brad Peabody and Kyzer R. Davis, authors of the standard
cc Kirk Wolak and Nik Samokhvalov who requested the feature

Thanks!

Best regards, Andrey Borodin.

[0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04

Вложения

Re: UUID v7

От
Andres Freund
Дата:
Hi,

On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
> As you may know there's a new version of UUID being standardized [0].
> These new algorithms of UUID generation are very promising for
> database performance.

I agree it's very useful to have.


> [0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04

That looks to not be the current version anymore, it's superseded by:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis


> It keeps data locality for time-ordered values.
> From my POV v7 is especially needed for users. Current standard status
> is "draft". And I'm not sure it will be accepted before our feature
> freeze for PG16. Maybe we could provide a draft implementation in 16
> and adjust it to the accepted version if the standard is changed? PFA
> patch with implementation.

Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
turn out to not be one.


Perhaps we should name the function something like
gen_time_ordered_random_uuid() instead?  That gives us a bit more flexibility
about what uuid version we generate. And it might be easier for users, anyway.

Still not sure what version we'd best use for now. Perhaps v8?

Greetings,

Andres Freund



Re: UUID v7

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
>> From my POV v7 is especially needed for users. Current standard status
>> is "draft". And I'm not sure it will be accepted before our feature
>> freeze for PG16. Maybe we could provide a draft implementation in 16
>> and adjust it to the accepted version if the standard is changed? PFA
>> patch with implementation.

> Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
> turn out to not be one.

I think there is no need to rush this into v16.  Let's wait for the
standardization process to play out.

            regards, tom lane



Re: UUID v7

От
Andrey Borodin
Дата:
On Fri, Feb 10, 2023 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
>
> Perhaps we should name the function something like
> gen_time_ordered_random_uuid() instead?  That gives us a bit more flexibility
> about what uuid version we generate. And it might be easier for users, anyway.
I think users would be happy with any name.

> Still not sure what version we'd best use for now. Perhaps v8?
V8 is just a "custom data" format. Like "place whatever you want".
Though I agree that its sample implementation looks to be better.



On Fri, Feb 10, 2023 at 5:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
> > turn out to not be one.
>
> I think there is no need to rush this into v16.  Let's wait for the
> standardization process to play out.
>

Standardization per se does not bring value to users. However, I agree
that eager users can just have it today as an extension and be happy
with it [0].
Maybe it's fine to wait a year for others...


Best regards, Andrey Borodin.


[0] https://github.com/x4m/pg_uuid_next



Re: UUID v7

От
Peter Eisentraut
Дата:
On 11.02.23 02:14, Andres Freund wrote:
> On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
>> As you may know there's a new version of UUID being standardized [0].
>> These new algorithms of UUID generation are very promising for
>> database performance.
> 
> I agree it's very useful to have.
> 
> 
>> [0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04
> 
> That looks to not be the current version anymore, it's superseded by:
> https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis

Yes, this means that the draft that an individual had uploaded has now 
been taken on by a working group for formal review.  If there is a 
prototype implementation, this is a good time to provide feedback.  But 
it's too early to ship a production version.




RE: UUID v7

От
"Kyzer Davis (kydavis)"
Дата:
Hello Group,

I am happy to see others interested in the improvements provided by UUIDv7!

I caught up on the thread and you all are correct.

Work has moved on GitHub from uuid6/uuid6-ietf-draft to 
ietf-wg-uuidrev/rfc4122bis
- Draft 00 merged RFC4122 with Draft 04 and fixed as many problems as possible 
with RFC4122.
- Draft 01 continued to iterate on RFC4122 problems: 
https://author-tools.ietf.org/iddiff?url2=draft-ietf-uuidrev-rfc4122bis-01
- Draft 02 items being changed are summarized in the latest PR for review in 
the upcoming interim meeting (Feb 16th): 
https://github.com/ietf-wg-uuidrev/rfc4122bis/pull/60
Note: Draft 02 should be published by the end of the week and long term we 
have one more meeting at IETF 116 to iron out the replacement of RFC4122, 
perform last call and submit to the IESG for official review and consideration 
for replacement of RFC4122 (actual timeline for that varies based on what IESG 
wants me to fix.)

That all being said:
The point is 99% of the work since adoption by the IETF has been ironing out 
RFC4122's problems and nothing major related to UUIDv6/7/8 which are all in a 
very good state.

If anybody has any feedback found during draft reviewing or prototyping; 
please either email uuidrev@ietf.org or drop an issue on the tracker:
https://github.com/ietf-wg-uuidrev/rfc4122bis/issues

Lastly, I have added the C/SQL implementation to the prototypes page below:
https://github.com/uuid6/prototypes

Thanks!

-----Original Message-----
From: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Sent: Saturday, February 11, 2023 10:51 AM
To: Andres Freund <andres@anarazel.de>; Andrey Borodin <amborodin86@gmail.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; brad@peabody.io; 
wolakk@gmail.com; Kyzer Davis (kydavis) <kydavis@cisco.com>; Nikolay 
Samokhvalov <samokhvalov@gmail.com>
Subject: Re: UUID v7

On 11.02.23 02:14, Andres Freund wrote:
> On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
>> As you may know there's a new version of UUID being standardized [0].
>> These new algorithms of UUID generation are very promising for
>> database performance.
>
> I agree it's very useful to have.
>
>
>> [0]
>> https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid
>> -format-04
>
> That looks to not be the current version anymore, it's superseded by:
> https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis

Yes, this means that the draft that an individual had uploaded has now been 
taken on by a working group for formal review.  If there is a prototype 
implementation, this is a good time to provide feedback.  But it's too early 
to ship a production version.


Вложения

Re: UUID v7

От
Nikolay Samokhvalov
Дата:
On Tue, Feb 14, 2023 at 6:13 AM Kyzer Davis (kydavis) <kydavis@cisco.com> wrote:
> I am happy to see others interested in the improvements provided by UUIDv7!

Thank you for providing the details!

Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
https://commitfest.postgresql.org/43/4388/
- recent post by Ants Aasma, Cybertec about the downsides of
traditional UUID raised a big discussion today on HN:
https://news.ycombinator.com/item?id=36429986



Re: UUID v7

От
Daniel Gustafsson
Дата:
> On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:

> Some small updates as I see them:
> - there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
> - noticing that there is no commitfest record, I created one:

I will actually go ahead and close this entry in the current CF, not because we
don't want the feature but because it's unlikely that it will go in now given
that standardization is still underway.  Comitting anything right now seems
premature, we might as well wait for standardization given that we have lots of
time before the v17 freeze.

--
Daniel Gustafsson




Re: UUID v7

От
Matthias van de Meent
Дата:
On Thu, 6 Jul 2023 at 14:24, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
>
> > Some small updates as I see them:
> > - there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
> > - noticing that there is no commitfest record, I created one:
>
> I will actually go ahead and close this entry in the current CF, not because we
> don't want the feature but because it's unlikely that it will go in now given
> that standardization is still underway.  Comitting anything right now seems
> premature, we might as well wait for standardization given that we have lots of
> time before the v17 freeze.

I'd like to note that this draft has recently had its last call
period, and has been proposed for publishing early last month. I don't
know how long this publishing process usually takes, but it seems like
the WG considers the text final, so unless this would take months I
wouldn't mind keeping this patch around as "waiting for external
process to complete". Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: UUID v7

От
Daniel Gustafsson
Дата:
> On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> On Thu, 6 Jul 2023 at 14:24, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
>>
>>> Some small updates as I see them:
>>> - there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
>>> - noticing that there is no commitfest record, I created one:
>>
>> I will actually go ahead and close this entry in the current CF, not because we
>> don't want the feature but because it's unlikely that it will go in now given
>> that standardization is still underway.  Comitting anything right now seems
>> premature, we might as well wait for standardization given that we have lots of
>> time before the v17 freeze.
>
> I'd like to note that this draft has recently had its last call
> period, and has been proposed for publishing early last month.

Sure, but this document is in AD Evaluation and there are many stages left in
the IESG process, it may still take a fair bit of time before this is done.

> Sure, it's earlier than the actual release of
> the standard, but that wasn't a blocker for SQL features that were
> considered finalized either.

I can't speak for any SQL standard features we've committed before being
standardized, it's for sure not the norm for the project.  I'm only commenting
on this particular Internet standard which we have plenty of time to commit
before v17 without rushing to beat a standards committee.

Also, if you look you can see that I moved it to the next CF in a vague hope
that standardization will be swift (which is admittedly never is).

--
Daniel Gustafsson




Re: UUID v7

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>> Sure, it's earlier than the actual release of
>> the standard, but that wasn't a blocker for SQL features that were
>> considered finalized either.

> I can't speak for any SQL standard features we've committed before being
> standardized, it's for sure not the norm for the project.

We have done a couple of things that way recently.  An important
reason why we felt we could get away with that is that nowadays
we have people who actually sit on the SQL committee and have
reliable information on what's likely to make it into the final text
of the next version.  I don't think we have equivalent visibility or
should have equivalent confidence about how UUID v7 standardization
will play out.

            regards, tom lane



Re: UUID v7

От
Peter Eisentraut
Дата:
On 06.07.23 16:02, Tom Lane wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
>> On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>> Sure, it's earlier than the actual release of
>>> the standard, but that wasn't a blocker for SQL features that were
>>> considered finalized either.
> 
>> I can't speak for any SQL standard features we've committed before being
>> standardized, it's for sure not the norm for the project.
> 
> We have done a couple of things that way recently.  An important
> reason why we felt we could get away with that is that nowadays
> we have people who actually sit on the SQL committee and have
> reliable information on what's likely to make it into the final text
> of the next version.  I don't think we have equivalent visibility or
> should have equivalent confidence about how UUID v7 standardization
> will play out.

(I have been attending some meetings and I'm on the mailing list.)

Anyway, I think it would be reasonable to review this patch now.  We 
might leave it hanging in "Ready for Committer" for a while when we get 
there.  But surely review can start now.




Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 6 Jul 2023, at 21:38, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> I think it would be reasonable to review this patch now.
+1.

Also, I think we should discuss UUID v8. UUID version 8 provides an RFC-compatible format for experimental or
vendor-specificuse cases. Revision 1 of IETF draft contained interesting code for v8: almost similar to v7, but with
fieldsfor "node ID" and "rolling sequence number". 
I think this is reasonable approach, thus I attach implementation of UUID v8 per [0]. But from my point of view this
implementationhas some flaws. 
These two new fields "node ID" and "sequence" are there not for uniqueness, but rather for data locality.
But they are placed at the end, in bytes 14 and 15, after randomly generated numbers.

I think that "sequence" is there to help generate local ascending identifiers when the real time clock do not provide
enoughresolution. So "sequence" field must be placed after 6 bytes of time-generated identifier. 

On a contrary "node ID" must differentiate identifiers generated on different nodes. So it makes sense to place "node
ID"before timing. So identifiers generated on different nodes will tend to be in different ranges. 
Although, section "6.4. Distributed UUID Generation" states that "node ID" is there to decrease the likelihood of a
collision.So my intuition might be wrong here. 


Do we want to provide this "vendor-specific" UUID with tweaks for databases? Or should we limit the scope with well
definedUUID v7? 


Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-01


Вложения

RE: UUID v7

От
"Kyzer Davis (kydavis)"
Дата:
Great discussions group,

> I think it would be reasonable to review this patch now.
I am happy to review the format and logic for any proposed v7 and/or v8
UUID. Just point me to a PR or some code review location.

> Distributed UUID Generation" states that "node ID" is there to decrease
> the likelihood of a collision.
Correct, node identifiers help provide some bit space that ensures no
collision in the event the stars align where two nodes create the exact
UUID.

From what I have seen UUIDv7 should meet the requirements outlined thus far
In this thread.

Also to add, there are two UUID prototypes for postgres from my checks.
Although they are outdated from the latest draft sent up for official
Publication so review them from an academic perspective.)
- https://github.com/uuid6/prototypes
- pg_uuid_next (see this thread which nicely summarizes some UUIDv7
"checkboxes" https://github.com/x4m/pg_uuid_next/issues/1)
- UUID_v7_for_Postgres.sql

Don't forget, if we have UUIDv1 already implemented in the postgres code you
may want to examine UUIDv6.
UUIDv6 is simply a fork of that code and swap of the timestamp bits.
In terms of effort UUIDv6 easy to implement and gives you a time ordered
UUID re-using 99% of the code you may already have.

Lastly, my advice on v8 is that I would examine/implement v6 or v7 first
before jumping to v8
because whatever you do for implementing v6 or v7 will help you implement a
better v8.
There are also a number of v8 prototype implementations (at the previous
link) if somebody wants to give them a scroll.

Happy to answer any other questions where I can provide input.

Thanks,

-----Original Message-----
From: Andrey M. Borodin <x4mmm@yandex-team.ru> 
Sent: Friday, July 7, 2023 8:06 AM
To: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Daniel Gustafsson <daniel@yesql.se>;
Matthias van de Meent <boekewurm+postgres@gmail.com>; Nikolay Samokhvalov
<samokhvalov@gmail.com>; Kyzer Davis (kydavis) <kydavis@cisco.com>; Andres
Freund <andres@anarazel.de>; Andrey Borodin <amborodin86@gmail.com>;
PostgreSQL Hackers <pgsql-hackers@postgresql.org>; brad@peabody.io;
wolakk@gmail.com
Subject: Re: UUID v7



> On 6 Jul 2023, at 21:38, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> 
> I think it would be reasonable to review this patch now.
+1.

Also, I think we should discuss UUID v8. UUID version 8 provides an
RFC-compatible format for experimental or vendor-specific use cases.
Revision 1 of IETF draft contained interesting code for v8: almost similar
to v7, but with fields for "node ID" and "rolling sequence number".
I think this is reasonable approach, thus I attach implementation of UUID v8
per [0]. But from my point of view this implementation has some flaws.
These two new fields "node ID" and "sequence" are there not for uniqueness,
but rather for data locality.
But they are placed at the end, in bytes 14 and 15, after randomly generated
numbers.

I think that "sequence" is there to help generate local ascending
identifiers when the real time clock do not provide enough resolution. So
"sequence" field must be placed after 6 bytes of time-generated identifier.

On a contrary "node ID" must differentiate identifiers generated on
different nodes. So it makes sense to place "node ID" before timing. So
identifiers generated on different nodes will tend to be in different
ranges.
Although, section "6.4. Distributed UUID Generation" states that "node ID"
is there to decrease the likelihood of a collision. So my intuition might be
wrong here.


Do we want to provide this "vendor-specific" UUID with tweaks for databases?
Or should we limit the scope with well defined UUID v7?


Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-01


Вложения

Re: UUID v7

От
Peter Eisentraut
Дата:
On 07.07.23 14:06, Andrey M. Borodin wrote:
> Also, I think we should discuss UUID v8. UUID version 8 provides an RFC-compatible format for experimental or
vendor-specificuse cases. Revision 1 of IETF draft contained interesting code for v8: almost similar to v7, but with
fieldsfor "node ID" and "rolling sequence number".
 
> I think this is reasonable approach, thus I attach implementation of UUID v8 per [0].

I suggest we keep this thread to v7, which has pretty straightforward 
semantics for PostgreSQL.  v8 by definition has many possible 
implementations, so you're going to have to make pretty strong arguments 
that yours is the best and only one, if you are going to claim the 
gen_uuid_v8 function name.




Re: UUID v7

От
"Andrey M. Borodin"
Дата:


On 10 Jul 2023, at 21:50, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

I suggest we keep this thread to v7, which has pretty straightforward semantics for PostgreSQL.  v8 by definition has many possible implementations, so you're going to have to make pretty strong arguments that yours is the best and only one, if you are going to claim the gen_uuid_v8 function name.

Thanks Peter, I'll follow this course of action.

After discussion on GitHub with Sergey Prokhorenko [0] I understood that counter is optional, but useful part of UUID v7. It actually promotes sortability of data generated at high speed.
The standard does not specify how big counter should be. PFA patch with 16 bit counter. Maybe it worth doing 18bit counter - it will save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.


Best regards, Andrey Borodin.


Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 30 Jul 2023, at 13:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> After discussion on GitHub with Sergey Prokhorenko [0] I understood that counter is optional, but useful part of UUID
v7.It actually promotes sortability of data generated at high speed. 
> The standard does not specify how big counter should be. PFA patch with 16 bit counter. Maybe it worth doing 18bit
counter- it will save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte. 
>

Here's a new patch version. Now counter is initialised with strong random on every time change (each ms). However, one
firstbit of the counter is preserved to zero. This is done to extend counter capacity (I left comments with reference
toRFC with explanations). 

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 20 Aug 2023, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v4-0001-Implement-UUID-v7-as-per-IETF-draft.patch>

I've observed, that pre-generating and buffering random numbers makes UUID generation 10 times faster.

Without buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select count(*) from x;
Time: 5286.572 ms (00:05.287)

With buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select count(*) from x;
Time: 390.091 ms

This can speed up gen_random_uuid() on the same scale too. PFA implementation of this technique.


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 21 Aug 2023, at 13:42, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
<v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch><v5-0002-Buffer-random-numbers.patch><v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch>

FPA attached next version.
Changes:
- implemented protection from time leap backwards when series is generated on the same backend
- counter overflow is now translated into ms step forward


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Mat Arye
Дата:
Andrey,

Thanks for all your work on this. I think this will be really useful. 

From a user perspective, it would be great to add 2 things:
- A function to extract the timestamp from a V7 UUID (very useful for defining constraints if partitioning by the uuid-embedded timestamps, for instance).
- Can we add an optional timestamptz argument to gen_uuid_v7 so that you can explicitly specify a time instead of always generating for the current time? If the argument is NULL, then use current time. This could be useful for backfilling and other applications.

Thanks,
Matvey Arye
Timescale software developer.


On Wed, Aug 30, 2023 at 3:05 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:


> On 21 Aug 2023, at 13:42, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch><v5-0002-Buffer-random-numbers.patch><v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch>

FPA attached next version.
Changes:
- implemented protection from time leap backwards when series is generated on the same backend
- counter overflow is now translated into ms step forward


Best regards, Andrey Borodin.

Re: UUID v7

От
"Andrey M. Borodin"
Дата:
Thanks for interesting ideas, Mat!

> On 31 Aug 2023, at 20:32, Mat Arye <mat@timescaledb.com> wrote:
>
> From a user perspective, it would be great to add 2 things:
> - A function to extract the timestamp from a V7 UUID (very useful for defining constraints if partitioning by the
uuid-embeddedtimestamps, for instance). 

Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe
asan extension? 

> - Can we add an optional timestamptz argument to gen_uuid_v7 so that you can explicitly specify a time instead of
alwaysgenerating for the current time? If the argument is NULL, then use current time. This could be useful for
backfillingand other applications. 

I think this makes sense. We could also have a counter as an argument. I'll try to implement that.
However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any
pointershere. 


Best regards, Andrey Borodin.


Re: UUID v7

От
Chris Travers
Дата:
So I am in the process of reviewing the patch and hopefully can provide something there soon.

However I want to address in the mean time the question of timestamp functions.  I know that is outside the scope of
thispatch but I would be in favor of adding them generally, not just as an extension but eventually into core.  I
understand(and generally agree with) the logic of not generally extracting timestamps from UUIDs or other such field,s
butthere are cases where it is really, really helpful to be able to do.  In particular when you are troubleshooting
misbehavior,all information you can get is helpful.  And so extracting all of the subfields can be helpful.
 

The problem with putting this in an extension is that this is mostly useful when debugging systems (particularly larger
distributedsystems) and so the chances of it hitting a critical mass enough to be supported by all major cloud vendors
iseffectively zero.
 

So I am not asking for this to be included in this patch but I am saying I would love to see these sort of things
contributedat some point to core. 

Re: UUID v7

От
Nick Babadzhanian
Дата:
On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such
functions...maybeas an extension?
 

Do you know of any reason for that?

> However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any
pointershere.
 

I'd argue that the time argument shouldn't be optional. Asking the
user to supply time would force them to think whether they want to go
with `now()` or `clock_timestamp()` or something else.

Also, a shameless plug with my extension for UUID v1 that implements
extract and create from (and an opclass):
https://github.com/pgnickb/uuid_v1_ops



Re: UUID v7

От
Jelte Fennema
Дата:
On Mon, 9 Oct 2023 at 18:46, Nick Babadzhanian <pgnickb@gmail.com> wrote:
>
> On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such
functions...maybeas an extension? 
>
> Do you know of any reason for that?

No reasons are given but the RFC states this:

> UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent
ispossible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID
versionand variant. 

> > However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any
pointershere. 
>
> I'd argue that the time argument shouldn't be optional. Asking the
> user to supply time would force them to think whether they want to go
> with `now()` or `clock_timestamp()` or something else.

I think using `now()` is quite prone to sequence rollover. With the
current patch inserting more than 2^18~=0.26M rows into a table with
`gen_uuid_v7()` as the default in a single transaction would already
cause sequence rollover. I think using a monotonic clock source is the
only reasonable thing to do. From the RFC:

> Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and
continuallyincreasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system
arehandled in a way that is consistent with implementation requirements. For example, if it is possible for the system
clockto move backward due to either manual adjustment or corrections from a time synchronization protocol,
implementationsmust decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.) 



Re: UUID v7

От
Brad Peabody
Дата:
> > Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?
> Do you know of any reason for that?

I guess some of the detail may have been edited out over time with all of the changes, but it’s basically this: https://github.com/ietf-wg-uuidrev/rfc4122bis/blob/main/draft-ietf-uuidrev-rfc4122bis.md#opacity-opacity. The rationale is that when you introspect a UUID you essentially add interoperability concerns.  E.g. if we say that applications can rely on being able to parse the timestamp from the UUID then it means that other implementations must provide guarantees about what that timestamp is.  And since the point of a UUID is to provide a unique value, not to transmit additional metadata, the decision was made early on that it’s more realistic and representative of the reality of the situation to say that applications should generate values, try not to parse them if they don’t have to, but if they do it’s only going to be as accurate as the original data put into it.  So systems with no NTP enabled, or that fuzz part of the time so as not to leak the exact moment in time something was done, etc - those are things that are going to happen and so buyer beware when parsing.

If the question is whether or not a function should exist to parse a timestamp from a UUID, I would say sure go ahead, just mention that the timestamp is only accurate as the input, and the spec doesn’t guarantee anything if your UUID came from another source.  I imagine a common case would be UUIDs generated in within the same database, and someone wants to extract the timestamp, which would be as reliable as the timestamp on the database machine - seems like a perfectly good case where supporting timestamp extraction as practical value.


On Oct 9, 2023, at 11:11 AM, Jelte Fennema <postgres@jeltef.nl> wrote:

On Mon, 9 Oct 2023 at 18:46, Nick Babadzhanian <pgnickb@gmail.com> wrote:

On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?

Do you know of any reason for that?

No reasons are given but the RFC states this:

UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.

However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any pointers here.

I'd argue that the time argument shouldn't be optional. Asking the
user to supply time would force them to think whether they want to go
with `now()` or `clock_timestamp()` or something else.

I think using `now()` is quite prone to sequence rollover. With the
current patch inserting more than 2^18~=0.26M rows into a table with
`gen_uuid_v7()` as the default in a single transaction would already
cause sequence rollover. I think using a monotonic clock source is the
only reasonable thing to do. From the RFC:

Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)

Re: UUID v7

От
Andrey Borodin
Дата:
On Mon, Oct 9, 2023 at 11:11 PM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> I think using `now()` is quite prone to sequence rollover. With the
> current patch inserting more than 2^18~=0.26M rows into a table with
> `gen_uuid_v7()` as the default in a single transaction would already
> cause sequence rollover.

Well, the current patch will just use now()+1ms when 2^18 is
exhausted. Even if now() would be passed as an argument (however
current patch does not support an argument).

Best regards, Andrey Borodin.



Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 9 Oct 2023, at 23:46, Andrey Borodin <amborodin86@gmail.com> wrote:

Here's next iteration of the patch. I've added get_uuid_v7_time().
This function extracts timestamp from uuid, iff it is v7. Timestamp correctness only guaranteed if the timestamp was
generatedby the same implementation (6 bytes for milliseconds obtained by gettimeofday()). 
Tests verify that get_uuid_v7_time(gen_uuid_v7()) differs no more than 1ms from now(). Maybe we should allow more
tolerantvalues for slow test machines. 


Best regards, Andrey Borodin.



Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:


On 2 Jan 2024, at 14:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Tests verify that get_uuid_v7_time(gen_uuid_v7()) differs no more than 1ms from now(). Maybe we should allow more tolerant values for slow test machines.

Indeed, CFbot complained about flaky tests. I've increased test tolerance to 100ms. (this does not affect test time)


Best regards, Andrey Borodin.
Вложения

Re: Re: UUID v7

От
Przemysław Sztoch
Дата:
Dear Andrey,

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful.
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.

2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.

3. Wouldn't it be worth including UUID version 6 as well?

4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).

Nevertheless, the need for uuid v6/7/8 is very high and I'm glad it's coming to PostgreSQL. It should be a PG17 version.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
"Andrey M. Borodin"
Дата:
Hello Przemysław,

thanks for your interest in this patch!

> On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:
>
> 1. Is it possible to add a function that returns the version of the generated uuid?
> It will be very useful.
> I don't know if it's possible, but I think there are bits in the UUID that inform about the version.
What do you think if we have functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract bit fields according
to[0] ? Or, perhaps, this should be one function with two return parameters? 
It's not in a patch yet, I'm just considering how this functionality should look like.

> 2. If there is any doubt about adding the function to the main sources (standard development in progress), in my
opinionyou can definitely add this function to the uuid-ossp extension. 
From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash:
04-Jul-2008:Released OSSP uuid 1.6.2". Or do I look into wrong place? 

> 3. Wouldn't it be worth including UUID version 6 as well?
The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's
apoint in developing v6 - I'm OK to do so. 

> 4. Sometimes you will need to generate a uuid for historical time. There should be an additional function
gen_uuid_v7(timestamp).
Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user
whoknows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient? 

Thanks!

Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1] http://www.ossp.org/

Вложения

Re: UUID v7

От
Jelte Fennema-Nio
Дата:
First of all, I'm a huge fan of UUID v7. So I'm very excited that this
is progressing. I'm definitely going to look closer at this patch
soon. Some tiny initial feedback:

(bikeshed) I'd prefer renaming `get_uuid_v7_time` to the shorter
`uuid_v7_time`, the `get_` prefix seems rarely used in Postgres
functions (e.g. `date_part` is not called `get_date_part`). Also it's
visually very similar to the gen_ prefix.

On Thu, 4 Jan 2024 at 19:20, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:
> > 1. Is it possible to add a function that returns the version of the generated uuid?
> > It will be very useful.
> > I don't know if it's possible, but I think there are bits in the UUID that inform about the version.
> What do you think if we have functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract bit fields
accordingto [0] ? Or, perhaps, this should be one function with two return parameters? 
> It's not in a patch yet, I'm just considering how this functionality should look like.

I do agree that those functions would be useful, especially now that
we're introducing a function that errors when it's passed a UUID
that's not of version 7. With the version extraction function you
could return something else for other uuids if you have many and not
all of them are version 7.

I do think though that these functions should not have v7 in their
name, since they would apply to all uuids of all versions (so if also
removing the get_ prefix they would be called uuid_ver and uuid_var)

> > 4. Sometimes you will need to generate a uuid for historical time. There should be an additional function
gen_uuid_v7(timestamp).
> Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user
whoknows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient? 

I think timestamp would be quite useful. timestamp would encode the
time in the same way as gen_uuid_v7() would, but based on the given
time instead of the current time.



Re: UUID v7

От
Przemysław Sztoch
Дата:
Andrey M. Borodin wrote on 1/4/2024 7:20 PM:
Hello Przemysław,

thanks for your interest in this patch!

On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful. 
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.
What do you think if we have functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract bit fields according to [0] ? Or, perhaps, this should be one function with two return parameters?
It's not in a patch yet, I'm just considering how this functionality should look like.
uuid_ver(uuid) -> smallint/integer 1/3/4/5/6/7/8

Of course there is RFC 4122 Variant "bits: 10x". If it is other variant then uuid_ver should return -1 OR NULL.
For UUIDs generated by your patch this function should always return 7.
2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.
From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash: 04-Jul-2008: Released OSSP uuid 1.6.2". Or do I look into wrong place?
After two days of thinking about UUID v7, I consider it a very important functionality that should be included in PG17.
3. Wouldn't it be worth including UUID version 6 as well?
The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's a point in developing v6 - I'm OK to do so.
IETF standard should provide information about possibility of conversion from v1 to v6.
Then the usefulness of v6 is much greater and it would be worth implementing this version as well.
4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).
Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user who knows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient?
I talked to my colleagues and everyone chooses the timestamp version.
If timestamp is outside the allowed range, the function must return an error.

We also talked about uuid-ossp. Still, v5 is a great solution in some applications.
It is worth moving this function from extension to PG17. Many people don't use it because they don't know it and this uuid schema.

We think it would be quite reasonable to add:
uuid_generate_v5 (namespace uuid, name text) -> uuid
uuid_generate_v6 () -> uuid
uuid_generate_v6 (timestamptz) -> uuid
uuid_generate_v7() -> uuid
uuid_generate_v7(timestamptz) -> uuid
uuid_ver(uuid) -> smallint -1/1/2/3/4/5/6/7/8
uuid_ts(uuid) -> timestamptz (for 1/6/7 version, for other should return NULL, error is too heavy in our opinion)
uuid_v1_to_v6 (uuid) -> uuid

The naming of this family of functions needs to be rethought.
Do we adopt the naming standard from Postgres and the uuid-ossp extension?
Or should we continue with a slightly less accurate name for PG: get_random_uuid (get_random_uuid, get_uuid_v7)?

5. Please add in docs reference to RFC4122 (https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuid)
People should read standards. :-)
Thanks!

Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1] http://www.ossp.org/

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
Sergey Prokhorenko
Дата:
Hello Przemysław and Andrey,

When naming functions, I would advise using the shorter abbreviation uuidv7 from the new version of the RFC instead of uuid_v7. When people search Google for new versions of UUIDs, they enter the abbreviation uuidv7 into the search bar. The name generate_uuidv7() looks good, as well as uuidv1_to_uuidv6() and timestamp_to_uuidv7().

Best regards,

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Friday, 5 January 2024 at 11:53:04 am GMT+3, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:


Andrey M. Borodin wrote on 1/4/2024 7:20 PM:
Hello Przemysław,

thanks for your interest in this patch!

On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful. 
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.
What do you think if we have functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract bit fields according to [0] ? Or, perhaps, this should be one function with two return parameters?
It's not in a patch yet, I'm just considering how this functionality should look like.
uuid_ver(uuid) -> smallint/integer 1/3/4/5/6/7/8

Of course there is RFC 4122 Variant "bits: 10x". If it is other variant then uuid_ver should return -1 OR NULL.
For UUIDs generated by your patch this function should always return 7.
2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.
From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash: 04-Jul-2008: Released OSSP uuid 1.6.2". Or do I look into wrong place?
After two days of thinking about UUID v7, I consider it a very important functionality that should be included in PG17.
3. Wouldn't it be worth including UUID version 6 as well?
The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's a point in developing v6 - I'm OK to do so.
IETF standard should provide information about possibility of conversion from v1 to v6.
Then the usefulness of v6 is much greater and it would be worth implementing this version as well.
4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).
Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user who knows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient?
I talked to my colleagues and everyone chooses the timestamp version.
If timestamp is outside the allowed range, the function must return an error.

We also talked about uuid-ossp. Still, v5 is a great solution in some applications.
It is worth moving this function from extension to PG17. Many people don't use it because they don't know it and this uuid schema.

We think it would be quite reasonable to add:
uuid_generate_v5 (namespace uuid, name text) -> uuid
uuid_generate_v6 () -> uuid
uuid_generate_v6 (timestamptz) -> uuid
uuid_generate_v7() -> uuid
uuid_generate_v7(timestamptz) -> uuid
uuid_ver(uuid) -> smallint -1/1/2/3/4/5/6/7/8
uuid_ts(uuid) -> timestamptz (for 1/6/7 version, for other should return NULL, error is too heavy in our opinion)
uuid_v1_to_v6 (uuid) -> uuid

The naming of this family of functions needs to be rethought.
Do we adopt the naming standard from Postgres and the uuid-ossp extension?
Or should we continue with a slightly less accurate name for PG: get_random_uuid (get_random_uuid, get_uuid_v7)?

5. Please add in docs reference to RFC4122 (https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuid)
People should read standards. :-)
Thanks!

Best regards, Andrey Borodin.


[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1] http://www.ossp.org/

--
Przemysław Sztoch
| Mobile +48 509 99 00 66

Re: UUID v7

От
Andrey Borodin
Дата:

> On 5 Jan 2024, at 15:57, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Sergey, Przemysław, Jelte, thanks for your feedback.
Here's v9. Changes:
1. Swapped type of the argument to timestamptz in gen_uuid_v7()
2. Renamed get_uuid_v7_time() to uuid_v7_time()
3. Added  uuid_ver() and uuid_var().

What do you think?

Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi Andrey,

> Sergey, Przemysław, Jelte, thanks for your feedback.
> Here's v9. Changes:
> 1. Swapped type of the argument to timestamptz in gen_uuid_v7()
> 2. Renamed get_uuid_v7_time() to uuid_v7_time()
> 3. Added  uuid_ver() and uuid_var().
>
> What do you think?

Many thanks for the updated patch. It's an important work and I very
much hope we will see this in the upcoming PG release.

```
+Datum
+pg_node_tree_in(PG_FUNCTION_ARGS)
+{
+    if (!IsBootstrapProcessingMode())
+        elog(ERROR, "cannot accept a value of type pg_node_tree_in");
+    return textin(fcinfo);
+}
```

Not 100% sure what this is for. Any chance this could be part of another patch?

One thing I don't particularly like about the tests is the fact that
they don't check if a correct UUID was actually generated. I realize
that's not quite trivial due to the random nature of the function, but
maybe we could use some substring/regex magic here? Something like:

```
select gen_uuid_v7() :: text ~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$';
 ?column?
----------
 t

select regexp_replace(gen_uuid_v7('2024-01-16 15:45:33 MSK') :: text,
'[0-9a-f]{4}-[0-9a-f]{12}$', 'XXXX-' || repeat('X', 12));
            regexp_replace
--------------------------------------
 018d124e-39c8-74c7-XXXX-XXXXXXXXXXXX
```


```
+  proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
```

I don't think we conventionally specify IMMUTABLE volatility, it's the
default. Other values also are worth checking.

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

Also I would like to point out that part of the documentation is
missing, but I guess at this stage of the game it's OK.

Last but not least: maybe we should support casting Timestamp[Tz] to
UUIDv7 and vice versa? Shouldn't be difficult to implement and I
suspect somebody will request this eventually. During the cast to UUID
we will always get the same value for the given Timestamp[Tz], which
probably can be useful in certain applications. It can't be done with
gen_uuid_v7() and its volatility doesn't permit it.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

От
"Andrey M. Borodin"
Дата:
Thanks for your review, Aleksander!

> On 16 Jan 2024, at 18:00, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
>
> ```
> +Datum
> +pg_node_tree_in(PG_FUNCTION_ARGS)
> +{
> +    if (!IsBootstrapProcessingMode())
> +        elog(ERROR, "cannot accept a value of type pg_node_tree_in");
> +    return textin(fcinfo);
> +}
> ```
>
> Not 100% sure what this is for. Any chance this could be part of another patch?
Nope, it’s necessary there. Without these changes catalog functions cannot have defaults for arguments. These defaults
havetype pg_node_tree which has no-op in function. 

> One thing I don't particularly like about the tests is the fact that
> they don't check if a correct UUID was actually generated. I realize
> that's not quite trivial due to the random nature of the function, but
> maybe we could use some substring/regex magic here? Something like:
>
> ```
> select gen_uuid_v7() :: text ~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$';
> ?column?
> ----------
> t
>
> select regexp_replace(gen_uuid_v7('2024-01-16 15:45:33 MSK') :: text,
> '[0-9a-f]{4}-[0-9a-f]{12}$', 'XXXX-' || repeat('X', 12));
>            regexp_replace
> --------------------------------------
> 018d124e-39c8-74c7-XXXX-XXXXXXXXXXXX
> ```
Any 8 bytes which have ver and var bits (6 bits total) are correct UUID.
This is checked by tests when uuid_var() and uuid_ver() functions are exercised.

> ```
> +  proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
> ```
>
> I don't think we conventionally specify IMMUTABLE volatility, it's the
> default. Other values also are worth checking.
Makes sense, I’ll drop this values in next version.
BTW I’m in doubt if provided functions are leakproof. They ERROR-out with messages that can give a clue about several
bitsof UUID. Does this break leakproofness? I think yest, but I’m not sure. 
gen_uuid_v7() seems leakproof to me.

> Another question: how did you choose between using TimestampTz and
> Timestamp types? I realize that internally it's all the same. Maybe
> Timestamp will be slightly better since the way it is displayed
> doesn't depend on the session settings. Many people I talked to find
> this part of TimestampTz confusing.

I mean, this argument is expected to be used to implement K-way sorted identifiers. In this context, it seems to me,
it’sgood to remember to developer that time shift also depend on timezones. 
But this is too vague.
Do you have any reasons that apply to UUID generation?

> Also I would like to point out that part of the documentation is
> missing, but I guess at this stage of the game it's OK.
>
> Last but not least: maybe we should support casting Timestamp[Tz] to
> UUIDv7 and vice versa? Shouldn't be difficult to implement and I
> suspect somebody will request this eventually. During the cast to UUID
> we will always get the same value for the given Timestamp[Tz], which
> probably can be useful in certain applications. It can't be done with
> gen_uuid_v7() and its volatility doesn't permit it.
I’m strongly opposed to doing this cast. I was not adding this function to extract timestamp from UUID, because
standarddoes not recommend it. But a lot of people asked for this. 
But supporting easy way to do unrecommended thing seem bad.


Best regards, Andrey Borodin.


Re: UUID v7

От
Sergey Prokhorenko
Дата:
Andrey,

It is not clear how to interpret uuid_v7_time():
  • uuid_v7 to time() (extracting the timestamp)
  • time() to uuid_v7  (generation of the uuid_v7)
It is worth improving the naming, for example, adding prepositions.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Tuesday, 16 January 2024 at 05:44:51 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:


Thanks for your review, Aleksander!

> On 16 Jan 2024, at 18:00, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
>
> ```
> +Datum
> +pg_node_tree_in(PG_FUNCTION_ARGS)
> +{
> +    if (!IsBootstrapProcessingMode())
> +        elog(ERROR, "cannot accept a value of type pg_node_tree_in");
> +    return textin(fcinfo);
> +}
> ```
>
> Not 100% sure what this is for. Any chance this could be part of another patch?
Nope, it’s necessary there. Without these changes catalog functions cannot have defaults for arguments. These defaults have type pg_node_tree which has no-op in function.

> One thing I don't particularly like about the tests is the fact that
> they don't check if a correct UUID was actually generated. I realize
> that's not quite trivial due to the random nature of the function, but
> maybe we could use some substring/regex magic here? Something like:
>
> ```
> select gen_uuid_v7() :: text ~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$';
> ?column?
> ----------
> t
>
> select regexp_replace(gen_uuid_v7('2024-01-16 15:45:33 MSK') :: text,
> '[0-9a-f]{4}-[0-9a-f]{12}$', 'XXXX-' || repeat('X', 12));
>            regexp_replace
> --------------------------------------
> 018d124e-39c8-74c7-XXXX-XXXXXXXXXXXX
> ```
Any 8 bytes which have ver and var bits (6 bits total) are correct UUID.
This is checked by tests when uuid_var() and uuid_ver() functions are exercised.

> ```
> +  proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
> ```
>
> I don't think we conventionally specify IMMUTABLE volatility, it's the
> default. Other values also are worth checking.
Makes sense, I’ll drop this values in next version.
BTW I’m in doubt if provided functions are leakproof. They ERROR-out with messages that can give a clue about several bits of UUID. Does this break leakproofness? I think yest, but I’m not sure.
gen_uuid_v7() seems leakproof to me.

> Another question: how did you choose between using TimestampTz and
> Timestamp types? I realize that internally it's all the same. Maybe
> Timestamp will be slightly better since the way it is displayed
> doesn't depend on the session settings. Many people I talked to find
> this part of TimestampTz confusing.

I mean, this argument is expected to be used to implement K-way sorted identifiers. In this context, it seems to me, it’s good to remember to developer that time shift also depend on timezones.
But this is too vague.
Do you have any reasons that apply to UUID generation?


> Also I would like to point out that part of the documentation is
> missing, but I guess at this stage of the game it's OK.
>
> Last but not least: maybe we should support casting Timestamp[Tz] to
> UUIDv7 and vice versa? Shouldn't be difficult to implement and I
> suspect somebody will request this eventually. During the cast to UUID
> we will always get the same value for the given Timestamp[Tz], which
> probably can be useful in certain applications. It can't be done with
> gen_uuid_v7() and its volatility doesn't permit it.

I’m strongly opposed to doing this cast. I was not adding this function to extract timestamp from UUID, because standard does not recommend it. But a lot of people asked for this.
But supporting easy way to do unrecommended thing seem bad.


Best regards, Andrey Borodin.

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 16 Jan 2024, at 21:49, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> It is not clear how to interpret uuid_v7_time():
>     • uuid_v7 to time() (extracting the timestamp)
>     • time() to uuid_v7  (generation of the uuid_v7)
> It is worth improving the naming, for example, adding prepositions.

Previously, Jelte had some thoughts on idiomatic function names.

Jelte, what is your opinion on naming the function which extracts timestamp from UUID v7?
Of cause, it would be great to hear opinion from anyone else.


Best regards, Andrey Borodin.


Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Tue, 16 Jan 2024 at 15:44, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > On 16 Jan 2024, at 18:00, Aleksander Alekseev <aleksander@timescale.com> wrote:
> > Not 100% sure what this is for. Any chance this could be part of another patch?
> Nope, it’s necessary there. Without these changes catalog functions cannot have defaults for arguments. These
defaultshave type pg_node_tree which has no-op in function. 

That seems like the wrong way to make that work then. How about
instead we define the same function name twice, once with and once
without a timestamp argument. That's how this is done for other
functions that are overloaded in pg_catalog.



Re: UUID v7

От
Przemysław Sztoch
Дата:
Andrey Borodin wrote on 1/16/2024 1:15 PM:
Sergey, Przemysław, Jelte, thanks for your feedback.
Here's v9. Changes:
1. Swapped type of the argument to timestamptz in gen_uuid_v7()
Please update docs part about optional timestamp argument.
2. Renamed get_uuid_v7_time() to uuid_v7_time()
Please rename uuid_v7_time to uuid_time() and add support for v1 and v6.
If version is incompatible then return NULL.
3. Added  uuid_ver() and uuid_var().
Looks good.
But for me, throwing an error is problematic. Wouldn't it be better to return -1.
What do you think?
Best regards, Andrey Borodin.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Tue, 16 Jan 2024 at 19:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Jelte, what is your opinion on naming the function which extracts timestamp from UUID v7?

I looked at a few more datatypes: json, jsonb & hstore. The get_
prefix is not used there at all, so I'm still opposed to that. But
they seem to use either an _to_ or an _extract_ infix. _to_ is then
used for conversion of the whole object, and _extract_ is used to
extract a subset. So I think _extract_ would fit well here.

On Fri, 5 Jan 2024 at 11:57, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
> When naming functions, I would advise using the shorter abbreviation uuidv7 from the new version of the RFC instead
ofuuid_v7.
 

I also agree with that, uuid_v7 looks weird to my eyes. The RFC also
abbreviates them as UUIDv7 (without a space).

The more I look at it the more I also think the gen_ prefix is quite
strange, and I already thought the gen_random_uuid name was quite
weird. But now that we will also have a uuidv7 I think it's even
stranger that one uses the name from the RFC.

The name of gen_random_uuid was taken verbatim from pgcrypto, without
any discussion on the list[0]:

> Here is a proposed patch for this. I did a fair bit of looking around
> in other systems for a naming pattern but didn't find anything
> consistent. So I ended up just taking the function name and code from
> pgcrypto.


So currently my preference for the function names would be:

- uuidv4() -> alias for gen_random_uuid()
- uuidv7()
- uuidv7(timestamptz)
- uuid_extract_ver(uuid)
- uuid_extract_var(uuid)
- uuidv7_extract_time(uuid)

[0]:
https://www.postgresql.org/message-id/flat/6a65610c-46fc-2323-6b78-e8086340a325%402ndquadrant.com#76e40e950a44aa8b6844297e8d2efe2c



Re: UUID v7

От
Przemysław Sztoch
Дата:
Jelte Fennema-Nio wrote on 1/16/2024 9:25 PM:
On Tue, 16 Jan 2024 at 19:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
So currently my preference for the function names would be:

- uuidv4() -> alias for gen_random_uuid()
- uuidv7()
- uuidv7(timestamptz)
- uuid_extract_ver(uuid)
- uuid_extract_var(uuid)
- uuidv7_extract_time(uuid)
+1
But replace uuidv7_extract_time(uuid) with uuid_extract_time(uuid) - function should be able extract timestamp from v1/v6/v7

I would highly recommend to add:
uuidv5(namespace uuid, name text) -> uuid
using uuid_generate_v5 from uuid-ossp extension (https://www.postgresql.org/docs/current/uuid-ossp.html)
There is an important version and it should be included into the main PG code.

Jelte: Please propose the name of the function that will convert uuid from version 1 to 6.
v6 is almost as good as v7 for indexes. And v6 allows you to convert from v1 which some people use.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
Przemysław Sztoch
Дата:
Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.
timstamptz internally always store UTC.
I believe that in SQL, when operating with time in UTC, you should always use timestamptz.
timestamp is theoretically the same thing. But internally it does not convert time to UTC and will lead to incorrect use.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Tue, 16 Jan 2024 at 22:02, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:
> But replace uuidv7_extract_time(uuid) with uuid_extract_time(uuid) - function should be able extract timestamp from
v1/v6/v7

I'm fine with this.

> I would highly recommend to add:
> uuidv5(namespace uuid, name text) -> uuid
> using uuid_generate_v5 from uuid-ossp extension (https://www.postgresql.org/docs/current/uuid-ossp.html)
> There is an important version and it should be included into the main PG code.

I think adding more uuid versions would probably be desirable. But I
don't think it makes sense to clutter this patchset with that. I feel
like on this uuidv7 patchset we've had enough discussion that it could
reasonably get into PG17, but I think adding even more uuid versions
to this patchset would severely reduce the chances of that happening.



Re: UUID v7

От
Andrey Borodin
Дата:

> On 17 Jan 2024, at 02:19, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

I want to ask Kyzer or Brad, I hope they will see this message. I'm working on the patch for time extraction for v1, v6
andv7. 

Do I understand correctly, that UUIDs contain local time, not UTC time? For examples in [0] I see that "A.6. Example of
aUUIDv7 Value" I see that February 22, 2022 2:22:22.00 PM GMT-05:00 results in unix_ts_ms = 0x017F22E279B0, which is
notUTC, but local time. 
Is it intentional? Section "5.1. UUID Version 1" states otherwise.

If so, I should swap signatures of functions from TimestampTz to Timestamp.
I'm hard-coding examples from this standard to tests, so I want to be precise...

If I follow the standard I see this in tests:
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 'GMT-05';
+         timezone
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)

Current patch version attached. I've addressed all other requests: function renames, aliases, multiple functions
insteadof optional params, cleaner catalog definitions, not throwing error when [var,ver,time] value is unknown. 
What is left: deal with timezones, improve documentation.


Best regards, Andrey Borodin.

[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-example-of-a-uuidv1-value

Вложения

Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> Another question: how did you choose between using TimestampTz and
> Timestamp types? I realize that internally it's all the same. Maybe
> Timestamp will be slightly better since the way it is displayed
> doesn't depend on the session settings. Many people I talked to find
> this part of TimestampTz confusing.
>
> timstamptz internally always store UTC.
> I believe that in SQL, when operating with time in UTC, you should always use timestamptz.
> timestamp is theoretically the same thing. But internally it does not convert time to UTC and will lead to incorrect
use.

No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

От
Andrey Borodin
Дата:

> On 18 Jan 2024, at 19:20, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Timestamp and TimestampTz are absolutely the same thing.
My question is not about Postgres data types. I'm asking about examples in the standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022
2:22:22.00PM GMT-05:00". 
It's exaplained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.

But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in
UTC-05.


Best regards, Andrey Borodin.


Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi Andrey,

> > Timestamp and TimestampTz are absolutely the same thing.
> My question is not about Postgres data types. I'm asking about examples in the standard.
>
> There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22,
20222:22:22.00 PM GMT-05:00".
 
> It's exaplained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.
>
> But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22
inUTC-05.
 

Not 100% sure which text you are referring to exactly, but I'm
guessing it's section B.2 of [1]

"""
This example UUIDv7 test vector utilizes a well-known 32 bit Unix
epoch with additional millisecond precision to fill the first 48 bits
[...]
The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00
represented as 0x17F22E279B0 or 1645557742000
"""

If this is the case, I think the example is indeed wrong:

```
=# select extract(epoch from 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT-05:00' :: timestamptz)*1000;
       ?column?
----------------------
 1645521742000.000000
(1 row)
```

And the difference between the value in the text and the actual value
is 10 hours as you pointed out.

Also you named the date 1582-10-15 00:00:00 UTC. Maybe you actually
meant 1970-01-01 00:00:00 UTC?

[1]: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

От
Sergey Prokhorenko
Дата:
Hi Andrey,

Aleksander Alekseev wrote: "If this is the case, I think the example is indeed wrong". 

This is one of the reasons why I was categorically against any examples of implementation in the new RFC. The examples have been very poorly studied and discussed, and therefore it is better not to use them at all. But the text of the RFC itself clearly refers to UTC, and not at all about local time: "UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded". The main reason for using UTC is so that UUIDv7's, generated approximately simultaneously in different time zones, are correctly ordered in time when they get into one database.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 18 January 2024 at 07:22:05 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:


Hi Andrey,

> > Timestamp and TimestampTz are absolutely the same thing.
> My question is not about Postgres data types. I'm asking about examples in the standard.
>
> There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
> It's exaplained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.
>
> But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Not 100% sure which text you are referring to exactly, but I'm
guessing it's section B.2 of [1]

"""
This example UUIDv7 test vector utilizes a well-known 32 bit Unix
epoch with additional millisecond precision to fill the first 48 bits
[...]
The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00
represented as 0x17F22E279B0 or 1645557742000
"""

If this is the case, I think the example is indeed wrong:

```
=# select extract(epoch from 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT-05:00' :: timestamptz)*1000;
      ?column?
----------------------
1645521742000.000000
(1 row)
```

And the difference between the value in the text and the actual value
is 10 hours as you pointed out.

Also you named the date 1582-10-15 00:00:00 UTC. Maybe you actually
meant 1970-01-01 00:00:00 UTC?

[1]: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html

--
Best regards,

Aleksander Alekseev

Re: UUID v7

От
Andrey Borodin
Дата:

> On 18 Jan 2024, at 20:39, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22
inUTC-05. 


'2022-02-22 19:22:22 UTC' is exactly that moment which was encoded into example UUIDs. It's not '2022-02-23 00:22:22 in
UTC-05'as I thought. 
I got confused by "at timezone" changes which in fact removes timezone information. And that's per SQL standard...

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22,
20222:22:22.00 PM GMT-05:00'; 
 ?column?
----------
 10:00:00
(1 row)

postgres=# select TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
      timestamptz
------------------------
 2022-02-22 04:22:22-05
(1 row)

I cannot wrap my mind around it... Any pointers would be appreciated.
I'm certain that code extracted UTC time correctly, I just want a reliable test that verifies timestamp constant (+ I
understandwhat is going on). 


Best regards, Andrey Borodin.


Re: UUID v7

От
Sergey Prokhorenko
Дата:
Hi Andrey,

You'd better generate a test UUIDv7 for midnight 1 Jan 1970 UTC. In this case, the timestamp in UUIDv7 according to the new RFC must be filled with zeros. By extracting the timestamp from this test UUIDv7, you should get exactly midnight 1 Jan 1970 UTC.

I also recommend this article: https://habr.com/ru/articles/772954/



Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 18 January 2024 at 09:31:16 pm GMT+3, Andrey Borodin <x4mmm@yandex-team.ru> wrote:




> On 18 Jan 2024, at 20:39, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.


'2022-02-22 19:22:22 UTC' is exactly that moment which was encoded into example UUIDs. It's not '2022-02-23 00:22:22 in UTC-05' as I thought.
I got confused by "at timezone" changes which in fact removes timezone information. And that's per SQL standard...

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
?column?
----------
10:00:00
(1 row)

postgres=# select TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
      timestamptz     
------------------------
2022-02-22 04:22:22-05
(1 row)

I cannot wrap my mind around it... Any pointers would be appreciated.
I'm certain that code extracted UTC time correctly, I just want a reliable test that verifies timestamp constant (+ I understand what is going on).



Best regards, Andrey Borodin.

Re: UUID v7

От
Przemysław Sztoch
Дата:
Using localtime would be absurd. Especially since time goes back during summer time change.
I believe our implementation should use UTC. No one forbids us from assuming that our local time for generating uuid is UTC.

Andrey Borodin wrote on 1/18/2024 2:17 PM:

On 17 Jan 2024, at 02:19, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
I want to ask Kyzer or Brad, I hope they will see this message. I'm working on the patch for time extraction for v1, v6 and v7.

Do I understand correctly, that UUIDs contain local time, not UTC time? For examples in [0] I see that "A.6. Example of a UUIDv7 Value" I see that February 22, 2022 2:22:22.00 PM GMT-05:00 results in unix_ts_ms = 0x017F22E279B0, which is not UTC, but local time.
Is it intentional? Section "5.1. UUID Version 1" states otherwise.

If so, I should swap signatures of functions from TimestampTz to Timestamp.
I'm hard-coding examples from this standard to tests, so I want to be precise...

If I follow the standard I see this in tests:
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 'GMT-05';
+         timezone         
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)

Current patch version attached. I've addressed all other requests: function renames, aliases, multiple functions instead of optional params, cleaner catalog definitions, not throwing error when [var,ver,time] value is unknown.
What is left: deal with timezones, improve documentation.


Best regards, Andrey Borodin.

[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-example-of-a-uuidv1-value

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
Przemysław Sztoch
Дата:
Aleksander Alekseev wrote on 1/18/2024 3:20 PM:
Hi,

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

timstamptz internally always store UTC.
I believe that in SQL, when operating with time in UTC, you should always use timestamptz.
timestamp is theoretically the same thing. But internally it does not convert time to UTC and will lead to incorrect use.
No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.


Theoretically, you're right. But look at this example:

SET timezone TO 'Europe/Warsaw';
SELECT extract(epoch from '2024-01-18 9:27:30'::timestamp), extract(epoch from '2024-01-18 9:27:30'::timestamptz);

 date_part  | date_part
------------+------------
 1705570050 | 1705566450
(1 row)

In my opinion, timestamptz gives greater guarantees that the time internally is in UTC and the user gets the time in his/her time zone.

In the case of timestamp, it is never certain whether it keeps time in UTC or in the local zone.

In the case of argument's type, there would be no problem because we could create two functions.
Of course timestamp would be treated the same as timestamptz.
But here we have a problem with the function return type, which can only be one. And since the time returned is in UTC, it should be timestamptz.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
Przemysław Sztoch
Дата:
We are not allowed to consider any time other than UTC.

You need to write to the authors of the standard. I suppose this is a mistake.

I know from experience that errors in such standards most often appear in examples.
Nobody detects them at first.
Everyone reads and checks ideas, not calculations.
Then developers during implementation tears out their hair.

Andrey Borodin wrote on 1/18/2024 4:39 PM:

On 18 Jan 2024, at 19:20, Aleksander Alekseev <aleksander@timescale.com> wrote:

Timestamp and TimestampTz are absolutely the same thing.
My question is not about Postgres data types. I'm asking about examples in the standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's exaplained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.

But 164555774200000ns after 1582-10-15 00:00:00 UTC  was  2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.


Best regards, Andrey Borodin.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
Lukas Fittl
Дата:
On Thu, Jan 18, 2024 at 5:18 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Current patch version attached. I've addressed all other requests: function renames, aliases, multiple functions instead of optional params, cleaner catalog definitions, not throwing error when [var,ver,time] value is unknown.
What is left: deal with timezones, improve documentation.

I've done a test of the v10 patch, and ran into an interesting behavior when passing in a timestamp to the function (which, as a side note, is actually very useful to have as a feature, to support creating time-based range partitions on UUIDv7 fields):

postgres=# SELECT uuid_extract_time(uuidv7());
     uuid_extract_time    
---------------------------
 2024-01-18 18:49:00.01-08
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7('2024-04-01'));
   uuid_extract_time    
------------------------
 2024-04-01 00:00:00-07
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7());
   uuid_extract_time    
------------------------
 2024-04-01 00:00:00-07
(1 row)

Note how calling the uuidv7 function again after having called it with a fixed future timestamp, returns the future timestamp, even though it should return the current time.

I believe this is caused by incorrectly re-using the cached previous_timestamp. In the second call here (with a fixed future timestamp), we end up setting ts and tms to 2024-04-01, with increment_counter = false, which leads us to set previous_timestamp to the passed in timestamp (else branch of the second if in uuidv7). When we then call the function again without an argument, we end up getting a new timestamp from gettimeofday, but because we try to detect backwards leaps, we set increment_counter to true, and thus end up reusing the previous (future) timestamp here:

/* protection from leap backward */
tms = previous_timestamp;

Not sure how to fix this, but clearly something is amiss here.

Thanks,
Lukas

--
Lukas Fittl

Re: UUID v7

От
"David G. Johnston"
Дата:
On Thu, Jan 18, 2024 at 11:31 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
 ?column?
----------
 10:00:00
(1 row)


You are mixing POSIX and ISO-8601 conventions and, as noted in our appendix, they disagree on the direction that is positive.

 
The offset fields specify the hours, and optionally minutes and seconds, difference from UTC. They have the format hh[:mm[:ss]] optionally with a leading sign (+ or -). The positive sign is used for zones west of Greenwich. (Note that this is the opposite of the ISO-8601 sign convention used elsewhere in PostgreSQL.)

David J.


Re: UUID v7

От
Andrey Borodin
Дата:

> On 19 Jan 2024, at 08:24, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
>
> You are mixing POSIX and ISO-8601 conventions and, as noted in our appendix, they disagree on the direction that is
positive.

Thanks! Now everything seems on its place.

I want to include in the patch following tests:
-- extract UUID v1, v6 and v7 timestamp
SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT+05:00';
SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT+05:00';
SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM
GMT+05:00';

How do you think, will it be stable all across buildfarm? Or should we change anything to avoid false positives
inferredfrom different timestamp parsing? 


> On 19 Jan 2024, at 07:58, Lukas Fittl <lukas@fittl.com> wrote:
>
> Note how calling the uuidv7 function again after having called it with a fixed future timestamp, returns the future
timestamp,even though it should return the current time. 

Thanks for the review.
Well, that was intentional. But now I see it's kind of confusing behaviour. I've changed it to more expected version.

Also, I've added some documentation on all functions.


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> No.
>
> Timestamp and TimestampTz are absolutely the same thing. The only
> difference is how they are shown to the user. TimestampTz uses session
> context in order to be displayed in the TZ chosen by the user. Thus
> typically it is somewhat more confusing to the users and thus I asked
> whether there was a good reason to choose TimestampTz over Timestamp.
>
>
> Theoretically, you're right. But look at this example:
>
> SET timezone TO 'Europe/Warsaw';
> SELECT extract(epoch from '2024-01-18 9:27:30'::timestamp), extract(epoch from '2024-01-18 9:27:30'::timestamptz);
>
>  date_part  | date_part
> ------------+------------
>  1705570050 | 1705566450
> (1 row)
>
> In my opinion, timestamptz gives greater guarantees that the time internally is in UTC and the user gets the time in
his/hertime zone.
 

I believe you didn't notice, but this example just proves my point.

In this case you have two timestamps that are different _internally_,
but the way they are _shown_ is the same because the first one is in
UTC and the second one in your local session timezone, Europe/Warsaw.
extract(epoch ...) extract UNIX epoch, i.e. relies on the _internal_
representation. This is why you got different results.

This demonstrates that TimestampTz is a permanent source of confusion
for the users and the reason why personally I would prefer if UUIDv7
always used Timestamp (no Tz). TimestampTz can be converted to
TimestampTz by users who need them and have experience using them.

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

От
Andrey Borodin
Дата:

> On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> 
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

Best regards, Andrey Borodin.


Вложения

Re: UUID v7

От
Nikolay Samokhvalov
Дата:
On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like that fact that we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many cases we will be able to decide not to have a creation column timestamp (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized yet, but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik

Re: UUID v7

От
Nikolay Samokhvalov
Дата:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

Manually tested uuidv7(), uuid_extract_time() – they work as expected. The basic docs provided look clear.

I haven't checked the tests though and possible edge cases, so leaving it as "needs review" waiting for more reviewers

Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> But now (after big timeseries project with multiple time zones and DST problems) I think differently.
> Even though timestamp and timestamptz are practically the same, timestamptz should be used to store the time in UTC.
> Choosing timestamp is more likely to lead to problems and misunderstandings than timestamptz.

As somebody who contributed TZ support to TimescaleDB I'm more or less
aware about the pros and cons of Timestamp and TimestampTz :)
Engineering is all about compromises. I can imagine a project where it
makes sense to use only TimestampTz for the entire database, and the
opposite - when it's better to use only UTC and Timestamp. In this
particular case I was merely concerned that the particular choice
could be confusing for the users but I think I changed my mind by now,
see below.

>> Here's v12. Changes:
>> 1. Documentation improvements
>> 2. Code comments
>> 3. Better commit message and reviews list
>
>
> Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like that
factthat we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many
caseswe will be able to decide not to have a creation column timestamp (e.g., "created_at") at all, saving 8 bytes. 
>
> The docs and comments look great too.
>
> Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized
yet,but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples: 
>
> - https://www.npmjs.com/package/uuidv7
> - https://crates.io/crates/uuidv7
> - https://github.com/google/uuid/pull/139

Thanks!

After playing with v12 I'm inclined to agree that it's RfC.

I only have a couple of silly nitpicks:

- It could make sense to decompose the C implementation of uuidv7() in
two functions, for readability.
- It could make sense to get rid of curly braces in SQL tests when
calling uuid_extract_ver() and uuid_extract_ver(), for consistency.

I'm not going to insist on these changes though and prefer leaving it
to the author and the committer to decide.

Also I take back what I said above about using Timestamp instead of
TimestampTz. I forgot that Timestamps are implicitly casted to
TimestampTz's, so users preferring Timestamps can do this:

```
=# select uuidv7('2024-01-22 12:34:56' :: timestamp);
                uuidv7
--------------------------------------
 018d3085-de00-77c1-9e7b-7b04ddb9ebb9
```

Cfbot also seems to be happy with the patch so I'm changing the CF
entry status to RfC.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> Cfbot also seems to be happy with the patch so I'm changing the CF
> entry status to RfC.

I've found a bug:

```
=# select now() - interval '5000 years';
                ?column?
----------------------------------------
 2977-01-24 15:29:01.779462+02:30:17 BC

Time: 0.957 ms

=# select uuidv7(now() - interval '5000 years');
                uuidv7
--------------------------------------
 720c1868-0764-7677-99cd-265b84ea08b9

=# select uuid_extract_time('720c1868-0764-7677-99cd-265b84ea08b9');
     uuid_extract_time
----------------------------
 5943-08-26 21:30:44.836+03
```

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 24 Jan 2024, at 17:31, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Hi,
>
>> Cfbot also seems to be happy with the patch so I'm changing the CF
>> entry status to RfC.
>
> I've found a bug:
>
> ```
> =# select now() - interval '5000 years';
>                ?column?
> ----------------------------------------
> 2977-01-24 15:29:01.779462+02:30:17 BC
>
> Time: 0.957 ms
>
> =# select uuidv7(now() - interval '5000 years');
>                uuidv7
> --------------------------------------
> 720c1868-0764-7677-99cd-265b84ea08b9
>
> =# select uuid_extract_time('720c1868-0764-7677-99cd-265b84ea08b9');
>     uuid_extract_time
> ----------------------------
> 5943-08-26 21:30:44.836+03
> ```

UUIDv7 range does not correspond to timestamp range. But it’s purpose is not in storing timestamp, but in being unique
identifier.So I don’t think it worth throwing an error when overflowing value is given. BTW if you will subtract some
nanoseconds- you will not get back timestamp you put into UUID too. 
UUID does not store timpestamp, it only uses it to generate an identifier. Some value can be extracted back, but with
limitedprecision, limited range and only if UUID was generated precisely by the specification in standard (and standard
allowsdeviation! Most of implementation try to tradeoff something). 


Best regards, Andrey Borodin.


Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> UUIDv7 range does not correspond to timestamp range. But it’s purpose is not in storing timestamp, but in being
uniqueidentifier. So I don’t think it worth throwing an error when overflowing value is given. BTW if you will subtract
somenanoseconds - you will not get back timestamp you put into UUID too. 
> UUID does not store timpestamp, it only uses it to generate an identifier. Some value can be extracted back, but with
limitedprecision, limited range and only if UUID was generated precisely by the specification in standard (and standard
allowsdeviation! Most of implementation try to tradeoff something). 

I don't claim that UUIDv7 purpose is storing timestamps, but I think
the invariant:

```
uuid_extract_time(uidv7(X)) == X
```

and (!) even more importantly:

```
if X > Y then uuidv7(X) > uuidv7(Y)
```

... should hold. Otherwise you can calculate crc64(X) or sha256(X)
internally in order to generate an unique ID and claim that it's fine.

Values that violate named invariants should be rejected with an error.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 24 Jan 2024, at 18:02, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Hi,
>
>> UUIDv7 range does not correspond to timestamp range. But it’s purpose is not in storing timestamp, but in being
uniqueidentifier. So I don’t think it worth throwing an error when overflowing value is given. BTW if you will subtract
somenanoseconds - you will not get back timestamp you put into UUID too. 
>> UUID does not store timpestamp, it only uses it to generate an identifier. Some value can be extracted back, but
withlimited precision, limited range and only if UUID was generated precisely by the specification in standard (and
standardallows deviation! Most of implementation try to tradeoff something). 
>
> I don't claim that UUIDv7 purpose is storing timestamps, but I think
> the invariant:
>
> ```
> uuid_extract_time(uidv7(X)) == X
> ```
>
> and (!) even more importantly:
>
> ```
> if X > Y then uuidv7(X) > uuidv7(Y)
> ```
>
> ... should hold.
Function to extract timestamp does not provide any guarantees at all. Standard states this, see Kyzer answers upthread.
Moreover, standard urges against relying on that if uuidX was generated before uuidY, then uuidX<uuid. The standard is
doinga lot to make this happen, but does not guaranty that. 
All what is guaranteed is the uniqueness at certain conditions.

> Otherwise you can calculate crc64(X) or sha256(X)
> internally in order to generate an unique ID and claim that it's fine.
>
> Values that violate named invariants should be rejected with an error.

Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and
promoteordering (but not guarantee). 


Best regards, Andrey Borodin.


Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> Values that violate named invariants should be rejected with an error.

To clarify, I don't think we should bother about the precision part.
"Equals" in the example above means "equal within UUIDv7 precision",
same for "more" and "less". However, years 2977 BC and 5943 AC are
clearly not equal, thus 2977 BC should be rejected as an invalid value
for UUIDv7.

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> Function to extract timestamp does not provide any guarantees at all. Standard states this, see Kyzer answers
upthread.
> Moreover, standard urges against relying on that if uuidX was generated before uuidY, then uuidX<uuid. The standard
isdoing a lot to make this happen, but does not guaranty that. 
> All what is guaranteed is the uniqueness at certain conditions.
>
> > Otherwise you can calculate crc64(X) or sha256(X)
> > internally in order to generate an unique ID and claim that it's fine.
> >
> > Values that violate named invariants should be rejected with an error.
>
> Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and
promoteordering (but not guarantee). 

If the standard doesn't guarantee something it doesn't mean it forbids
us to give stronger guarantees. I'm convinced that these guarantees
will be useful in real-world applications, at least the ones acting
exclusively within Postgres.

This being said, I understand your point of view too. Let's see what
other people think.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 24 Jan 2024, at 18:29, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Hi,
>
>> Function to extract timestamp does not provide any guarantees at all. Standard states this, see Kyzer answers
upthread.
>> Moreover, standard urges against relying on that if uuidX was generated before uuidY, then uuidX<uuid. The standard
isdoing a lot to make this happen, but does not guaranty that. 
>> All what is guaranteed is the uniqueness at certain conditions.
>>
>>> Otherwise you can calculate crc64(X) or sha256(X)
>>> internally in order to generate an unique ID and claim that it's fine.
>>>
>>> Values that violate named invariants should be rejected with an error.
>>
>> Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and
promoteordering (but not guarantee). 
>
> If the standard doesn't guarantee something it doesn't mean it forbids
> us to give stronger guarantees.
No, the standard makes these guarantees impossible.
If we insist that uuid_extract_time(uuidv7(time))==time, we won't be able to generate uuidv7 most of the time.
uuidv7(now())will always ERROR-out. 
Standard implies more coarse-grained timestamp that we have.

Also, please not that uuidv7(time+1us) and uuidv7(time) will have the same internal timestamp, so despite time+1us >
time,still second uuid will be greater. 

Both invariants you proposed cannot be reasonably guaranteed. Upholding any of them greatly reduces usability of UUID
v7.


Best regards, Andrey Borodin.


Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> Also, please not that uuidv7(time+1us) and uuidv7(time) will have the same internal timestamp, so despite time+1us >
time,still second uuid will be greater.
 
>
> Both invariants you proposed cannot be reasonably guaranteed. Upholding any of them greatly reduces usability of UUID
v7.

Again, personally I don't insist on the 1us precision [1]. Only the
fact that timestamp from the far past generates UUID from the future
bothers me.

[1]: https://postgr.es/m/CAJ7c6TPCSprWwVNdOB%3D%3DpgKZPqO5q%3DHRgmU7zmYqz9Dz5ffVYw%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 24 Jan 2024, at 20:46, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Only the
> fact that timestamp from the far past generates UUID from the future
> bothers me.

PFA implementation of guard checks, but I'm afraid that this can cause failures in ID generation unexpected to the
user...
See tests

+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected

Range is from 1970-01-01 00:00:00 to 10889-08-02 05:31:50.655. I'm not sure we should give this information in error
message...
Thanks!


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Marcos Pegoraro
Дата:
Is enough from 1970 ?
How about if user wants to have an UUID of his birth date ?

regards
Marcos

Em qua., 24 de jan. de 2024 às 13:54, Andrey M. Borodin <x4mmm@yandex-team.ru> escreveu:


> On 24 Jan 2024, at 20:46, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Only the
> fact that timestamp from the far past generates UUID from the future
> bothers me.

PFA implementation of guard checks, but I'm afraid that this can cause failures in ID generation unexpected to the user...
See tests

+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected

Range is from 1970-01-01 00:00:00 to 10889-08-02 05:31:50.655. I'm not sure we should give this information in error message...
Thanks!


Best regards, Andrey Borodin.

Re: UUID v7

От
Andrey Borodin
Дата:

> On 24 Jan 2024, at 22:00, Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> Is enough from 1970 ?
Per standard unix_ts_ms field is a number of milliseconds from UNIX start date 1970-01-01.

> How about if user wants to have an UUID of his birth date ?

I've claimed my
0078c135-bd00-70b1-865a-63c3741922a5

But again, UUIDs are not designed to store timestamp. They are unique and v7 promote data locality via time-ordering.


Best regards, Andrey Borodin.


Re: UUID v7

От
Marcos Pegoraro
Дата:
I understand your point, but 
'2000-01-01' :: timestamp and '1900-01-01' :: timestamp are both valid timestamps.

So looks strange if user can do
select uuidv7(TIMESTAMP '2000-01-01')
but cannot do
select uuidv7(TIMESTAMP '1900-01-01')

Regards
Marcos


Em qua., 24 de jan. de 2024 às 14:51, Andrey Borodin <x4mmm@yandex-team.ru> escreveu:


> On 24 Jan 2024, at 22:00, Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> Is enough from 1970 ?
Per standard unix_ts_ms field is a number of milliseconds from UNIX start date 1970-01-01.

> How about if user wants to have an UUID of his birth date ?

I've claimed my
0078c135-bd00-70b1-865a-63c3741922a5

But again, UUIDs are not designed to store timestamp. They are unique and v7 promote data locality via time-ordering.


Best regards, Andrey Borodin.

Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Wed, 24 Jan 2024 at 21:47, Marcos Pegoraro <marcos@f10.com.br> wrote:
>
> I understand your point, but
> '2000-01-01' :: timestamp and '1900-01-01' :: timestamp are both valid timestamps.
>
> So looks strange if user can do
> select uuidv7(TIMESTAMP '2000-01-01')
> but cannot do
> select uuidv7(TIMESTAMP '1900-01-01')



I think that would be okay honestly. I don't think there's any
reasonable value for the uuid when a timestamp is given outside of the
date range that the uuid7 "algorithm" supports.

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).



Re: UUID v7

От
Sergey Prokhorenko
Дата:
"Other people" think that extracting the timestamp from UUIDv7 in violation of the new RFC, and generating UUIDv7 from the timestamp were both terrible and poorly thought out ideas. The authors of the new RFC had very good reasons to prohibit this. And the problems you face are the best confirmation of the correctness of the new RFC. It’s better to throw all this gag out of the official patch. Don't tempt developers to break the new RFC with these error-producing functions.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Wednesday, 24 January 2024 at 04:30:02 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:


Hi,

> Function to extract timestamp does not provide any guarantees at all. Standard states this, see Kyzer answers upthread.
> Moreover, standard urges against relying on that if uuidX was generated before uuidY, then uuidX<uuid. The standard is doing a lot to make this happen, but does not guaranty that.
> All what is guaranteed is the uniqueness at certain conditions.
>
> > Otherwise you can calculate crc64(X) or sha256(X)
> > internally in order to generate an unique ID and claim that it's fine.
> >
> > Values that violate named invariants should be rejected with an error.
>
> Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and promote ordering (but not guarantee).

If the standard doesn't guarantee something it doesn't mean it forbids
us to give stronger guarantees. I'm convinced that these guarantees
will be useful in real-world applications, at least the ones acting
exclusively within Postgres.

This being said, I understand your point of view too. Let's see what
other people think.

--
Best regards,

Aleksander Alekseev

Re: UUID v7

От
Sergey Prokhorenko
Дата:
That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Nothing new will definitely appear in the new RFC.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Monday, 22 January 2024 at 07:22:32 am GMT+3, Nikolay Samokhvalov <nik@postgres.ai> wrote:


On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list


Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like that fact that we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many cases we will be able to decide not to have a creation column timestamp (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized yet, but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik

Re: UUID v7

От
Nikolay Samokhvalov
Дата:
On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Nothing new will definitely appear in the new RFC.

From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

Without them, this functionality will be of a very limited use in databases. We need to think about large tables – which means partitioning.

Nik

Re: UUID v7

От
Nikolay Samokhvalov
Дата:

On Wed, Jan 24, 2024 at 8:40 PM Nikolay Samokhvalov <nik@postgres.ai> wrote:
On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Nothing new will definitely appear in the new RFC.

From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

Without them, this functionality will be of a very limited use in databases. We need to think about large tables – which means partitioning.

apologies -- this was a response to another email from you:

> "Other people" think that extracting the timestamp from UUIDv7 in violation of the new RFC, and generating UUIDv7 from the timestamp were both terrible and poorly thought out ideas. The authors of the new RFC had very good reasons to prohibit this. And the problems you face are the best confirmation of the correctness of the new RFC. It’s better to throw all this gag out of the official patch. Don't tempt developers to break the new RFC with these error-producing functions.
 

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>
> From a practical point of view, these two things are extremely important to have to support partitioning. It is
betterto implement limitations than throw them away. 

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that
isnot really immutable, turn off autovacuum or fsync. Why bother with safety guards here? 
My opinion is that we should have this function to extract timestamp. Even if it can return strange values for
impreciseRFC implementation. 


> On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?


Best regards, Andrey Borodin.


Re: UUID v7

От
Sergey Prokhorenko
Дата:
I am against turning the DBMS into another C++, in which they do not so much design something new as fix bugs in production after a crash.

As for partitioning, I already wrote to Andrey Borodin that we need a special function to generate a partition id using the UUIDv7 timestamp or even simultaneously with the generation of the timestamp. For example, every month (or so, since precision is not needed here) a new partition is created. Here's a good example: https://elixirforum.com/t/partitioning-postgres-tables-by-timestamp-based-uuids/60916

But without a separate function for extracting the entire timestamp from the UUID! Let's solve this specific problem, and not give the developers a grenade with the safety removed. Many developers have already decided to store the timestamp in UUIDv7, so as not to create a separate created_at field. Then they will delete table records with the old timestamp, etc. Horrible mistakes are simply guaranteed.


Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Thursday, 25 January 2024 at 09:51:58 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>
> From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that is not really immutable, turn off autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if it can return strange values for imprecise RFC implementation.


> On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?


Best regards, Andrey Borodin.

Re: UUID v7

От
Przemysław Sztoch
Дата:
Andrey M. Borodin wrote on 25.01.2024 07:51:

On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:

>From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.
Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that is not really immutable, turn off autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if it can return strange values for imprecise RFC implementation.


On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).

OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?
+1 for erroring when ts is outside range.

v13 looks good for me. I think we have reached a optimal compromise.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that
isnot really immutable, turn off autovacuum or fsync. Why bother with safety guards here? 
> My opinion is that we should have this function to extract timestamp. Even if it can return strange values for
impreciseRFC implementation. 

Completely agree.

Users that don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them however, users that need them will end up writing their own
probably buggy and not compatible implementations. That would be much
worse.

> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing..
>
> Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?
>
> +1 for erroring when ts is outside range.
>
> v13 looks good for me. I think we have reached a optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

--
Best regards,
Aleksander Alekseev



Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> Andrey, many thanks for the updated patch.
>
> LGTM, cfbot is happy and I don't think we have any open items left. So
> changing CF entry status back to RfC.

PFA v14. I changed:

```
elog(ERROR, "Time argument of UUID v7 cannot exceed 6 bytes");
```

... to:

```
ereport(ERROR,
    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
    errmsg("Time argument of UUID v7 is outside of the valid range")));
```

Which IMO tells a bit more to the average user and is translatable.

> At a quick glance, the patch needs improving English, IMO.

Agree. We could use some help from a native English speaker for this.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: UUID v7

От
Sergey Prokhorenko
Дата:
Aleksander,

In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 25 January 2024 at 03:06:50 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:


Hi,

> Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that is not really immutable, turn off autovacuum or fsync. Why bother with safety guards here?
> My opinion is that we should have this function to extract timestamp. Even if it can return strange values for imprecise RFC implementation.

Completely agree.

Users that don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them however, users that need them will end up writing their own
probably buggy and not compatible implementations. That would be much
worse.


> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing..
>
> Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?
>
> +1 for erroring when ts is outside range.
>
> v13 looks good for me. I think we have reached a optimal compromise.


Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

--
Best regards,
Aleksander Alekseev

Re: UUID v7

От
Sergey Prokhorenko
Дата:
By the way, the Go language has also already implemented a function for UUIDv7: https://pkg.go.dev/github.com/gofrs/uuid#NewV7




Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Thursday, 25 January 2024 at 12:49:46 am GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:


That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Nothing new will definitely appear in the new RFC.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Monday, 22 January 2024 at 07:22:32 am GMT+3, Nikolay Samokhvalov <nik@postgres.ai> wrote:


On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list


Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like that fact that we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many cases we will be able to decide not to have a creation column timestamp (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized yet, but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik

Re: UUID v7

От
Jelte Fennema-Nio
Дата:
tl;dr I believe we should remove the uuidv7(timestamp) function from
this patchset.

On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC
requirements,and that developers may use these functions with caution at their own risk, and these functions are not
recommendedfor production environment. 
>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date
intothis function: document date, registration date, payment date, reporting date, start date of the current month,
datadownload date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences. 

After re-reading the RFC more diligently, I'm inclined to agree with
Sergey that uuidv7(timestamp) is quite problematic. And I would even
say that we should not provide uuidv7(timestamp) at all, and instead
should only provide uuidv7(). Providing an explicit timestamp for
UUIDv7 is explicitly against the spec (in my reading):

> Implementations acquire the current timestamp from a reliable
> source to provide values that are time-ordered and continually
> increasing.  Care must be taken to ensure that timestamp changes
> from the environment or operating system are handled in a way that
> is consistent with implementation requirements.  For example, if
> it is possible for the system clock to move backward due to either
> manual adjustment or corrections from a time synchronization
> protocol, implementations need to determine how to handle such
> cases.  (See Altering, Fuzzing, or Smearing below.)
>
> ...
>
> UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
> sources or a custom timestamp epoch are required, UUIDv8 MUST be
> used.
>
> ...
>
> Monotonicity (each subsequent value being greater than the last) is
> the backbone of time-based sortable UUIDs.

By allowing users to provide a timestamp we're not using a continually
increasing timestamp for our UUIDv7 generation, and thus it would not
be a valid UUIDv7 implementation.

I do agree with others however, that being able to pass in an
arbitrary timestamp for UUID generation would be very useful. For
example to be able to partition by the timestamp in the UUID and then
being able to later load data for an older timestamp and have it be
added to to the older partition. But it's possible to do that while
still following the spec, by using a UUIDv8 instead of UUIDv7. So for
this usecase we could make a helper function that generates a UUIDv8
using the same format as a UUIDv7, but allows storing arbitrary
timestamps. You might say, why not sligthly change UUIDv7 then? Well
mainly because of this critical sentence in the RFC:

> UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

That would allow us to say that using this UUIDv8 helper requires
careful usage and checks if uniqueness is required.

So I believe we should remove the uuidv7(timestamp) function from this patchset.

I don't see a problem with including uuid_extract_time though. Afaict
the only thing the RFC says about extracting timestamps is that the
RFC does not give a requirement or guarantee about how close the
stored timestamp is to the actual time:

> Implementations MAY alter the actual timestamp.  Some examples
> include security considerations around providing a real clock
> value within a UUID, to correct inaccurate clocks, to handle leap
> seconds, or instead of dividing a number of microseconds by 1000
> to obtain a millisecond value; dividing by 1024 (or some other
> value) for performance reasons.  This specification makes no
> requirement or guarantee about how close the clock value needs to
> be to the actual time.

I see no reason why we cannot make stronger guarantees about the
timestamps that we use to generate UUIDs with our uuidv7() function.
And then we can update the documentation for
uuid_extract_time to something like this:

> This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> versions and variants this function returns NULL. The extracted timestamp
> does not necessarily equate to the time of UUID generation. How close it is
> to the actual time depends on the implementation that generated to UUID.
> The uuidv7() function provided PostgreSQL will normally store the actual time of
> generation to in the UUID, but if large batches of UUIDs are generated at the
> same time it's possible that some UUIDs will store a time that is slightly later
> than their actual generation time.



Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Thu, 25 Jan 2024 at 13:31, Aleksander Alekseev
<aleksander@timescale.com> wrote:
> PFA v14.

+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   Both functions return a version 4 (random) UUID. This is the most commonly
+   used type of UUID and is appropriate when random distribution of keys does
+   not affect performance of an application.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID. This UUID
+   version should be used when application prefers locality of identifiers.
+<synopsis>

I think it would be good to explain the tradeoffs between uuidv4 and
uuidv7 a bit better. How about changing the docs to something like
this:

<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
</synopsis>
Both functions return a version 4 (random) UUID. UUIDv4 is one of the
most commonly used types of UUID. It is appropriate when random
distribution of keys does not affect performance of an application or
when exposing the generation time of a UUID has unacceptable security
or business intelligence implications.
<synopsis>
<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
</synopsis>
This function returns a version 7 (time-ordered + random) UUID. It
provides much better data locality than UUIDv4, which can greatly
improve performance when UUID is used in a BTREE index (the default
index type in PostgreSQL). To achieve this data locality, UUIDv7
embeds its own generation time into the UUID. If exposing such a
timestamp has unacceptable security or business intelligence
implications, then uuidv4() should be used instead.
<synopsis>



Re: UUID v7

От
Junwang Zhao
Дата:
On Mon, Jan 29, 2024 at 7:38 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> tl;dr I believe we should remove the uuidv7(timestamp) function from
> this patchset.
>
> On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
> <sergeyprokhorenko@yahoo.com.au> wrote:
> > In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC
requirements,and that developers may use these functions with caution at their own risk, and these functions are not
recommendedfor production environment. 
> >
> > The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date
intothis function: document date, registration date, payment date, reporting date, start date of the current month,
datadownload date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences. 
>
> After re-reading the RFC more diligently, I'm inclined to agree with
> Sergey that uuidv7(timestamp) is quite problematic. And I would even
> say that we should not provide uuidv7(timestamp) at all, and instead
> should only provide uuidv7(). Providing an explicit timestamp for
> UUIDv7 is explicitly against the spec (in my reading):
>
> > Implementations acquire the current timestamp from a reliable
> > source to provide values that are time-ordered and continually
> > increasing.  Care must be taken to ensure that timestamp changes
> > from the environment or operating system are handled in a way that
> > is consistent with implementation requirements.  For example, if
> > it is possible for the system clock to move backward due to either
> > manual adjustment or corrections from a time synchronization
> > protocol, implementations need to determine how to handle such
> > cases.  (See Altering, Fuzzing, or Smearing below.)
> >
> > ...
> >
> > UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> > while UUIDv7 utilizes a Unix Epoch timestamp.  If other timestamp
> > sources or a custom timestamp epoch are required, UUIDv8 MUST be
> > used.
> >
> > ...
> >
> > Monotonicity (each subsequent value being greater than the last) is
> > the backbone of time-based sortable UUIDs.
>
> By allowing users to provide a timestamp we're not using a continually
> increasing timestamp for our UUIDv7 generation, and thus it would not
> be a valid UUIDv7 implementation.
>
> I do agree with others however, that being able to pass in an
> arbitrary timestamp for UUID generation would be very useful. For
> example to be able to partition by the timestamp in the UUID and then
> being able to later load data for an older timestamp and have it be
> added to to the older partition. But it's possible to do that while
> still following the spec, by using a UUIDv8 instead of UUIDv7. So for
> this usecase we could make a helper function that generates a UUIDv8
> using the same format as a UUIDv7, but allows storing arbitrary
> timestamps. You might say, why not sligthly change UUIDv7 then? Well
> mainly because of this critical sentence in the RFC:
>
> > UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.
>
> That would allow us to say that using this UUIDv8 helper requires
> careful usage and checks if uniqueness is required.
>
> So I believe we should remove the uuidv7(timestamp) function from this patchset.

Agreed, the RFC section 6.1[1] has the following statements:

```
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while
UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.
```

In contrib/uuid-ossp, uuidv1 does not allow the user to supply a
custom timestamp,
so I think it should be the same for uuidv6 and uuidv7.

And I have the same feeling that we should not consider v6 and v8 in
this patch.


[1]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#section-6.1-2.4.1

>
> I don't see a problem with including uuid_extract_time though. Afaict
> the only thing the RFC says about extracting timestamps is that the
> RFC does not give a requirement or guarantee about how close the
> stored timestamp is to the actual time:
>
> > Implementations MAY alter the actual timestamp.  Some examples
> > include security considerations around providing a real clock
> > value within a UUID, to correct inaccurate clocks, to handle leap
> > seconds, or instead of dividing a number of microseconds by 1000
> > to obtain a millisecond value; dividing by 1024 (or some other
> > value) for performance reasons.  This specification makes no
> > requirement or guarantee about how close the clock value needs to
> > be to the actual time.
>
> I see no reason why we cannot make stronger guarantees about the
> timestamps that we use to generate UUIDs with our uuidv7() function.
> And then we can update the documentation for
> uuid_extract_time to something like this:
>
> > This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> > versions and variants this function returns NULL. The extracted timestamp
> > does not necessarily equate to the time of UUID generation. How close it is
> > to the actual time depends on the implementation that generated to UUID.
> > The uuidv7() function provided PostgreSQL will normally store the actual time of
> > generation to in the UUID, but if large batches of UUIDs are generated at the
> > same time it's possible that some UUIDs will store a time that is slightly later
> > than their actual generation time.
>
>


--
Regards
Junwang Zhao



Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 25 Jan 2024, at 22:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> Aleksander,
>
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC
requirements,and that developers may use these functions with caution at their own risk, and these functions are not
recommendedfor production environment. 

Refining documentation is good. However, saying that these functions are not recommended for production must be based
onsome real threats. 

>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date
intothis function: document date, registration date, payment date, reporting date, start date of the current month,
datadownload date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences. 

Even if the developer pass constant time to uuidv7(T) they will get what they asked for - unique identifier. Moreover -
itstill will be keeping locality. There will be no negative consequences at all. 
On the contrary, experienced developer can leverage parameter when data locality should be reduced. If you have
serveralstreams of data, you might want to introduce some shift in reduce contention. 
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will split 1 contention point to 10 and
increaseingestion performance 10x-fold. 

> On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> If other timestamp sources or
> a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a big deficiency that k-way sortable
identifiersare not implementable on top of UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement
overprevious versions. 


Jelte, your documentation corrections looks good to me, I'll include them in next version.

Thanks!


Best regards, Andrey Borodin.


Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Mon, 29 Jan 2024 at 19:32, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Even if the developer pass constant time to uuidv7(T) they will get what they asked for - unique identifier. Moreover
-it still will be keeping locality. There will be no negative consequences at all.
 

It will be significantly "less unique" than if they wouldn't pass a
constant time. Basically it would become a UUIDv4, but with 74 bits of
random data instead of 122. That might not be enough anymore to
"guarantee" uniqueness. I guess that's why it is required to use
UUIDv8 in these cases, because correct usage is now a requirement for
assuming uniqueness. And for UUIDv8 the spec says this:

> UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

> > On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > If other timestamp sources or
> > a custom timestamp epoch are required, UUIDv8 MUST be used.
>
> Well, yeah. RFC says this... in 4 capital letters :)

As an FYI, there is an RFC that defines these keywords that's why they
are capital letters: https://www.ietf.org/rfc/rfc2119.txt

> I believe it's kind of a big deficiency that k-way sortable identifiers are not implementable on top of UUIDv7. Well,
let'sgo without this function. UUIDv7 is still an improvement over previous versions.
 

Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
But following the spec seems more important than a nice feature to me.



Re: UUID v7

От
Sergey Prokhorenko
Дата:
Andrey,

I understand and agree with your goals. But instead of dangerous universal functions, it is better to develop safe highly specialized functions that implement only these goals.

There should not be a function uuidv7(T) from an arbitrary timestamp, but there should be a special function that implements your algorithm: uuidv8(now() + '1 century' * random(0,10)).

I replaced 1 day with 1 century because the spread of 1 day is too small. Over time, records will be inserted between existing records, which is undesirable.

Similarly, if we need to calculate the partition id, then we do not need to use the uuid_extract_time() function to provide the extracted timestamp, the accuracy of which cannot be guaranteed. Instead, we need to give exactly the partition id, calculated using the uuidv7 timestamp. For example, partitions may have approximately a month interval between each other.

As for the documentation, it must be indicated that the UUIDv7 structure is not timestamp + random, but timestamp + randomly seeded counter + random, like in all advanced implementations.


Sergey Prokhorenko

sergeyprokhorenko@yahoo.com.au

______________________________________________________________


On Monday, 29 January 2024 at 09:32:54 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 25 Jan 2024, at 22:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> Aleksander,
>
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

Refining documentation is good. However, saying that these functions are not recommended for production must be based on some real threats.

>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

Even if the developer pass constant time to uuidv7(T) they will get what they asked for - unique identifier. Moreover - it still will be keeping locality. There will be no negative consequences at all.
On the contrary, experienced developer can leverage parameter when data locality should be reduced. If you have serveral streams of data, you might want to introduce some shift in reduce contention.
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will split 1 contention point to 10 and increase ingestion performance 10x-fold.

> On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> If other timestamp sources or
> a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a big deficiency that k-way sortable identifiers are not implementable on top of UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement over previous versions.


Jelte, your documentation corrections looks good to me, I'll include them in next version.

Thanks!


Best regards, Andrey Borodin.

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 30 Jan 2024, at 01:38, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
> But following the spec seems more important than a nice feature to me.

PFA v15. Changes: removed timestamp argument, incorporated Jelte’s documentation addons.

Thanks!


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Sergey Prokhorenko
Дата:
Andrey,

I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time.
This allows implementation of k-way sotable identifiers."

This phrase is wrong: "Both functions return a version 4 (random) UUID."

For this phrase the reason is unclear and the phrase is most likely incorrect:
if large batches of UUIDs are generated at the
+   same time it's possible that some UUIDs will store a time that is slightly later
+   than their actual generation time


Sergey Prokhorenko

sergeyprokhorenko@yahoo.com.au


On Tuesday, 30 January 2024 at 09:55:04 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 30 Jan 2024, at 01:38, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
> But following the spec seems more important than a nice feature to me.


PFA v15. Changes: removed timestamp argument, incorporated Jelte’s documentation addons.

Thanks!


Best regards, Andrey Borodin.

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 30 Jan 2024, at 12:28, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
>
> I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time.
> This allows implementation of k-way sotable identifiers.”
Fixed.

> This phrase is wrong: "Both functions return a version 4 (random) UUID.”
This applies to functions gen_random_uuid() and uuidv4().
>
> For this phrase the reason is unclear and the phrase is most likely incorrect:
> if large batches of UUIDs are generated at the
> +   same time it's possible that some UUIDs will store a time that is slightly later
> +   than their actual generation time

I’ve rewritten this phrase, hope it’s more clear now.


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Junwang Zhao
Дата:
Hi Andrey,

On Tue, Jan 30, 2024 at 5:56 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 30 Jan 2024, at 12:28, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
> >
> >
> > I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time.
> > This allows implementation of k-way sotable identifiers.”
> Fixed.
>
> > This phrase is wrong: "Both functions return a version 4 (random) UUID.”
> This applies to functions gen_random_uuid() and uuidv4().
> >
> > For this phrase the reason is unclear and the phrase is most likely incorrect:
> > if large batches of UUIDs are generated at the
> > +   same time it's possible that some UUIDs will store a time that is slightly later
> > +   than their actual generation time
>
> I’ve rewritten this phrase, hope it’s more clear now.
>
>
> Best regards, Andrey Borodin.

+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+ uint16_t result;
+ result = uuid->data[8] >> 6;
+
+ PG_RETURN_UINT16(result);
+}
\ No newline at end of file

It's always good to add a newline at the end of a  source file, though
this might be nitpicky.

--
Regards
Junwang Zhao



Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:
> 
> It's always good to add a newline at the end of a  source file, though
> this might be nitpicky.

Thanks, also fixed warning found by CFBot.


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Sergey Prokhorenko
Дата:
typo:
being carried to time step

should be:
being carried to timestemp

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Tuesday, 30 January 2024 at 04:35:45 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> It's always good to add a newline at the end of a  source file, though
> this might be nitpicky.

Thanks, also fixed warning found by CFBot.



Best regards, Andrey Borodin.

Re: UUID v7

От
Peter Eisentraut
Дата:
On 30.01.24 14:35, Andrey M. Borodin wrote:
>> On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:
>>
>> It's always good to add a newline at the end of a  source file, though
>> this might be nitpicky.
> 
> Thanks, also fixed warning found by CFBot.

I have various comments on this patch:


- doc/src/sgml/func.sgml

The documentation of the new functions should be broken up a bit.
It's all one paragraph now.  At least make it several paragraphs, or
possibly tables or something else.

Avoid listing the functions twice: Once before the description and
then again in the description.  That's just going to get out of date.
The first listing is not necessary, I think.

The return values in the documentation should use the public-facing
type names, like "timestamp with time zone" and "smallint".

The descriptions of the UUID generation functions use handwavy
language in their descriptions, like "It provides much better data
locality" or "unacceptable security or business intelligence
implications", which isn't useful.  Either we cut that all out and
just say, it creates a UUIDv7, done, look elsewhere for more
information, or we provide some more concretely useful details.

We shouldn't label a link as "IETF standard" when it's actually a
draft.


- src/include/catalog/pg_proc.dat

The description of uuidv4 should be "generate UUID version 4", so that
it parallels uuidv7.

The description of uuid_extract_time says 'extract timestamp from UUID
version 7', the implementation is not limited to version 7.

I think uuid_extract_time should be named uuid_extract_timestamp,
because it extracts a timestamp, not a time.

The functions uuid_extract_ver and uuid_extract_var could be named
uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
to tell them apart, with only one letter different.


- src/test/regress/sql/uuid.sql

Why are the tests using the input format '{...}', which is not the
standard one?


- src/backend/utils/adt/uuid.c

All this new code should have more comments.  There is a lot of bit
twiddling going on, and I suppose one is expected to follow along in
the RFC?  At least each function should have a header comment, so one
doesn't have to check in pg_proc.dat what it's supposed to do.

I'm suspicious that these functions all appear to return null for
erroneous input, rather than raising errors.  I think at least some
explanation for this should be recorded somewhere.

I think the behavior of uuid_extract_var(iant) is wrong.  The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

The uuidv7 function could really use a header comment that explains
the choices that were made.  The RFC draft provides various options
that implementations could use; we should describe which ones we
chose.

I would have expected that, since gettimeofday() provides microsecond
precision, we'd put the extra precision into "rand_a" as per Section 6.2 
method 3.

You use some kind of counter, but could you explain which method that
counter implements?

I don't see any acknowledgment of issues relating to concurrency or
restarts.  Like, how do we prevent duplicates being generated by
concurrent sessions or between restarts?  Maybe the counter or random
stuff does that, but it's not explained.




Re: UUID v7

От
"Andrey M. Borodin"
Дата:
Hi Peter,

thank you for so thoughtful review.

> On 6 Mar 2024, at 12:13, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have various comments on this patch:
>
>
> - doc/src/sgml/func.sgml
>
> The documentation of the new functions should be broken up a bit.
> It's all one paragraph now.  At least make it several paragraphs, or
> possibly tables or something else.
I've split functions to generate UUIDs from functions to extract stuff.

>
> Avoid listing the functions twice: Once before the description and
> then again in the description.  That's just going to get out of date.
> The first listing is not necessary, I think.

Fixed.

> The return values in the documentation should use the public-facing
> type names, like "timestamp with time zone" and "smallint".

Fixed.

> The descriptions of the UUID generation functions use handwavy
> language in their descriptions, like "It provides much better data
> locality" or "unacceptable security or business intelligence
> implications", which isn't useful.  Either we cut that all out and
> just say, it creates a UUIDv7, done, look elsewhere for more
> information, or we provide some more concretely useful details.

I've removed all that stuff entirely.

> We shouldn't label a link as "IETF standard" when it's actually a
> draft.

Fixed.

Well, all my modifications of documentation are kind of blind... I tried to "make docs", but it gives me gazilion of
errors...Is there an easy way to see resulting HTML? 


> - src/include/catalog/pg_proc.dat
>
> The description of uuidv4 should be "generate UUID version 4", so that
> it parallels uuidv7.

Fixed.

> The description of uuid_extract_time says 'extract timestamp from UUID
> version 7', the implementation is not limited to version 7.

Fixed.

> I think uuid_extract_time should be named uuid_extract_timestamp,
> because it extracts a timestamp, not a time.

Renamed.

> The functions uuid_extract_ver and uuid_extract_var could be named
> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
> to tell them apart, with only one letter different.

Renamed.

> - src/test/regress/sql/uuid.sql
>
> Why are the tests using the input format '{...}', which is not the
> standard one?

Fixed.

> - src/backend/utils/adt/uuid.c
>
> All this new code should have more comments.  There is a lot of bit
> twiddling going on, and I suppose one is expected to follow along in
> the RFC?  At least each function should have a header comment, so one
> doesn't have to check in pg_proc.dat what it's supposed to do.

I've added some header comment. One big comment is attached to v7, I tried to take parts mostly from RFC. Yet there are
alot of my additions that now need review... 

> I'm suspicious that these functions all appear to return null for
> erroneous input, rather than raising errors.  I think at least some
> explanation for this should be recorded somewhere.

The input is not erroneous per se.
But the fact that
# select 1/0;
ERROR:  division by zero
makes me consider throwing an error. There was some argumentation upthread for not throwing error though, but now I
cannotfind it... maybe I accepted this behaviour as more user-friendly. 

> I think the behavior of uuid_extract_var(iant) is wrong.  The code
> takes just two bits to return, but the draft document is quite clear
> that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

> The uuidv7 function could really use a header comment that explains
> the choices that were made.  The RFC draft provides various options
> that implementations could use; we should describe which ones we
> chose.

Done.

>
> I would have expected that, since gettimeofday() provides microsecond
> precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.

I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are
thereacross different OSes? Even if we put extra 10 bits of timestamp, we cannot extract safely them. 
These bits could promote inter-backend stortability. E.i. when many backends generate data fast - this data is still
somewhatordered even within 1ms. But I think benefits of this sortability are outweighed by portability(unknown real
resolution),simplicity(we don't store microseconds, thus do not try to extract them). 
All this arguments are weak, but if one method would be strictly better than another - there would be only one method.

>
> You use some kind of counter, but could you explain which method that
> counter implements?
I described counter in uuidv7() header.

>
> I don't see any acknowledgment of issues relating to concurrency or
> restarts.  Like, how do we prevent duplicates being generated by
> concurrent sessions or between restarts?  Maybe the counter or random
> stuff does that, but it's not explained.

I think restart takes more than 1ms, so this is covered with time tick.
I've added paragraph about frequency of generation in uuidv7() header.


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 10 Mar 2024, at 17:59, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> I tried to "make docs", but it gives me gazilion of errors... Is there an easy way to see resulting HTML?

Oops, CFbot expectedly found a problem...
Sorry for the noise, this version, I hope, will pass all the tests.
Thanks!


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> Oops, CFbot expectedly found a problem...
> Sorry for the noise, this version, I hope, will pass all the tests.
> Thanks!
>
> Best regards, Andrey Borodin.

I had some issues applying v19 against the current `master` branch.
PFA the rebased and minorly tweaked v20.

The patch LGTM. I think it could be merged unless there are any open
issues left. I don't think so, but maybe I missed something.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: UUID v7

От
Jelte Fennema-Nio
Дата:
Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Now with the added comments, one thing pops out to me: The comments
mention that we use "Monotonic Random", but when I read the spec that
explicitly recommends against using an increment of 1 when using
monotonic random. I feel like if we use an increment of 1, we're
better off going for the "Fixed-Length Dedicated Counter Bits" method
(i.e. change the code to start the counter at 0). See patch 0005 for
an example of that change.

I'm also wondering if we really want to use the extra rand_b bits for
this. The spec says we MAY, but it does remove the amount of
randomness in our UUIDs.

Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 11 Mar 2024, at 20:56, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Thanks!

> Now with the added comments, one thing pops out to me: The comments
> mention that we use "Monotonic Random", but when I read the spec that
> explicitly recommends against using an increment of 1 when using
> monotonic random. I feel like if we use an increment of 1, we're
> better off going for the "Fixed-Length Dedicated Counter Bits" method
> (i.e. change the code to start the counter at 0). See patch 0005 for
> an example of that change.
>
> I'm also wondering if we really want to use the extra rand_b bits for
> this. The spec says we MAY, but it does remove the amount of
> randomness in our UUIDs.

Method 1 is a just a Method 2 with specifically picked constants.
But I'll have to use some hand-wavy wordings...

UUID consists of these 128 bits:
a. Mandatory 2 var and 4 ver bits.
b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute to global sortability of values
generatedat frequency less than 1KHz. 
c. Counter bits:
c1. Initialised with 0 on any time tick.
c2. Initialised with randomness.
c3*. bit width of a counter step (*not counted in 128 bit capacity, can be non-integral)
d. Randomness bits.

Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17

Consider all UUIDs generated at any given milliseconds. Probability of a collision of two UUIDs generated at frequency
lessthan 1KHz is p = 2^-(c2+d) 
Capacity of a counter has expected value of c = 2^(c1)*2^(c2-1)/2^c3
To guess next UUID you can correctly pick one of u = 2^(d+c3)

First, observe that c3 contributes unguessability at exactly same scale as decreases counter capacity. There is no
differencebetween using bits in d directly, or in c3. There is no point in non-zero c3. Every bit that could be given
toc3 can equally be given to d. 

Second, observe that c2 bits contribute to both collision protection and counter capacity! And when the time ticks, c2
alsocontribute to unguessability! So, technically, we should consider using all available bits as c2 bits. 

How many c1 bits do we need? I've chosen one - to prevent occasional counter capacity reduction.

If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and d = 56 as an arbitrary compromise
betweencapacity of one backend per ms and prevention of global collision. 
This compromise is mostly dictated by maximum frequency of UUID generation by one backend, I've chosen 200MHz as a sane
value.


This compromise is much easier when you do not have 74 spare bits, this crazy amount of information forgives almost any
mistake.Imagine you have to distribute 10 bits between c2 and d. And you try to prevent collision between 10
independentdevices which need capacity to generate IDs with frequency of 10KHz each and keep sortability. You would
havesomething like c1=1, c2=3,d=6. 

Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I
don'tthink this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us
toomuch freedom of choice. 

Thanks!


Best regards, Andrey Borodin.


Re: UUID v7

От
Michael Paquier
Дата:
On Mon, Mar 11, 2024 at 11:27:43PM +0500, Andrey M. Borodin wrote:
> Sorry for this long and vague explanation, if it still seems too
> uncertain we can have a chat or something like that. I don't think
> this number picking stuff deserve to be commented, because it still
> is quite close to random. RFC gives us too much freedom of choice.

Speaking about the RFC, I can see that there is a draft but nothing
formal yet.  The last one I can see is v14 from last November:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14

It does not strike me as a good idea to rush an implementation without
a specification officially approved because there is always a risk of
shipping something that's non-compliant into core.  But perhaps I am
missing something on the RFC side?
--
Michael

Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 12 Mar 2024, at 10:53, Michael Paquier <michael@paquier.xyz> wrote:
>
> It does not strike me as a good idea to rush an implementation without
> a specification officially approved because there is always a risk of
> shipping something that's non-compliant into core.  But perhaps I am
> missing something on the RFC side?

Upthread one of document’s authors commented:

> On 14 Feb 2023, at 19:13, Kyzer Davis (kydavis) <kydavis@cisco.com> wrote:
>
> The point is 99% of the work since adoption by the IETF has been ironing out
> RFC4122's problems and nothing major related to UUIDv6/7/8 which are all in a
> very good state.

And also


> On 22 Jan 2024, at 09:22, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>
> And many libraries are already including implementation of UUIDv7 – here are some examples:
>
> - https://www.npmjs.com/package/uuidv7
> - https://crates.io/crates/uuidv7
> - https://github.com/google/uuid/pull/139

So at least reviewing patch and agreeing on chosen methods and constants makes sense.


Best regards, Andrey Borodin.


Re: UUID v7

От
Michael Paquier
Дата:
On Tue, Mar 12, 2024 at 11:10:37AM +0500, Andrey M. Borodin wrote:
> On 12 Mar 2024, at 10:53, Michael Paquier <michael@paquier.xyz> wrote:
>> On 22 Jan 2024, at 09:22, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>>
>> And many libraries are already including implementation of UUIDv7 – here are some examples:
>>
>> - https://www.npmjs.com/package/uuidv7
>> - https://crates.io/crates/uuidv7
>> - https://github.com/google/uuid/pull/139
>
> So at least reviewing patch and agreeing on chosen methods and constants makes sense.

Sure, there is no problem in discussing a patch to implement a
behavior.  But I disagree about taking a risk in merging something
that could become non-compliant with the approved RFC, if the draft is
approved at the end, of course.  This just strikes me as a bad idea.
--
Michael

Вложения

Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that.
Idon't think this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us
toomuch freedom of choice. 

I thought your explanation was quite clear and I agree that this
approach makes the most sense. I sent an email to the RFC authors to
ask for their feedback with you (Andrey) in the CC, because even
though it makes the most sense it does not comply with the either of
method 1 or 2 as described in the RFC.



Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Tue, 12 Mar 2024 at 07:32, Michael Paquier <michael@paquier.xyz> wrote:
> Sure, there is no problem in discussing a patch to implement a
> behavior.  But I disagree about taking a risk in merging something
> that could become non-compliant with the approved RFC, if the draft is
> approved at the end, of course.  This just strikes me as a bad idea.

I agree that we shouldn't release UUIDv7 support if the RFC describing
that is not yet approved. But I do think it would be a shame if e.g.
the RFC got approved 2 weeks after Postgres its feature freeze. Which
would then mean we'd have to wait another 1.5 years before actually
using uuidv7. Would it be a reasonable compromise to still merge the
patch for PG17 (assuming the code is good to merge with regards to the
current draft RFC), but revert the commit if the RFC is not approved
before some deadline before the release date (e.g. before the first
release candidate)?



Re: UUID v7

От
Sergey Prokhorenko
Дата:
Hi Jelte,


Andrey's patch corresponds exactly to Fixed-Length Dedicated Counter Bits (Method 1).

Andrey and you simply did not read the RFC a little further down in the text:
__________________________________________________________________

The following sub-topics cover topics related solely with creating reliable fixed-length dedicated counters:

Fixed-Length Dedicated Counter Seeding:

Implementations utilizing the fixed-length counter method randomly initialize the counter with each new timestamp tick. However, when the timestamp has not increased, the counter is instead incremented by the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random value MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized. The remaining most significant, left-most counter bit is initialized as zero for the sole purpose of guarding against counter rollovers.

Fixed-Length Dedicated Counter Length:
Select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision generally requires a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care must be taken to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch.
The following sub-topics cover rollover handling with either type of counter method:
...
Counter Rollover Handling:

Counter rollovers MUST be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability should freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken. Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter.


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Tuesday, 12 March 2024 at 06:36:13 pm GMT+3, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:


On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

> Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I don't think this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us too much freedom of choice.


I thought your explanation was quite clear and I agree that this
approach makes the most sense. I sent an email to the RFC authors to
ask for their feedback with you (Andrey) in the CC, because even
though it makes the most sense it does not comply with the either of
method 1 or 2 as described in the RFC.

Re: UUID v7

От
Jelte Fennema-Nio
Дата:
On Tue, 12 Mar 2024 at 18:18, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
> Andrey and you simply did not read the RFC a little further down in the text:

You're totally right, sorry about that. Maybe it would be good to move
those subsections around a bit in the RFC though, so that anything
related to only one method is included in the section for that method.



Re: UUID v7

От
Peter Eisentraut
Дата:
On 10.03.24 13:59, Andrey M. Borodin wrote:
>> The functions uuid_extract_ver and uuid_extract_var could be named
>> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
>> to tell them apart, with only one letter different.
> 
> Renamed.

Another related comment: Throughout your patch, swap the order of 
uuid_extract_variant and uuid_extract_version.  First, this makes more 
sense because version is subordinate to variant, and also it makes it 
alphabetical.

>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>> takes just two bits to return, but the draft document is quite clear
>> that the variant is 4 bits (see Table 1).
> 
> Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

I think we are still interpreting this differently.  I think 
uuid_extract_variant should just return whatever is in those four bits. 
Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", 
which I don't think it is correct.  It should return 0 through 15.

>> I would have expected that, since gettimeofday() provides microsecond
>> precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.
> 
> I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are
thereacross different OSes?
 

I think this should have been researched.  If we don't know how many 
bits we have, how do we know we have enough for milliseconds?  I think 
we should at least have some kind of idea, if we are going to have this 
conversation.




Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 14 Mar 2024, at 16:07, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 10.03.24 13:59, Andrey M. Borodin wrote:
>>> The functions uuid_extract_ver and uuid_extract_var could be named
>>> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
>>> to tell them apart, with only one letter different.
>> Renamed.
>
> Another related comment: Throughout your patch, swap the order of uuid_extract_variant and uuid_extract_version.
First,this makes more sense because version is subordinate to variant, and also it makes it alphabetical. 
I will do it soon.
>
>>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>>> takes just two bits to return, but the draft document is quite clear
>>> that the variant is 4 bits (see Table 1).
>> Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.
>
> I think we are still interpreting this differently.  I think uuid_extract_variant should just return whatever is in
thosefour bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think it is
correct. It should return 0 through 15. 
We will return "do not care" bits. This bits can confuse someone. E.g. for varaint 0b10 we can return 8, 9, 10 and 11
randomly.Is it OK? BTW for some reason document lists number 1-15, but your are correct that range is 0-15. 

>
>>> I would have expected that, since gettimeofday() provides microsecond
>>> precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.
>> I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are
thereacross different OSes? 
>
> I think this should have been researched.  If we don't know how many bits we have, how do we know we have enough for
milliseconds? I think we should at least have some kind of idea, if we are going to have this conversation. 

Bits for milliseconds are strictly defined by the document: there are always 48 bits, independently from clock
resolution.
But I don't think it's main problem for Method 3. Method 1 actually guarantees strictly increasing order of UUIDs
generatedby single backend. Method 3 can generate a lot of unsorted data in case of time leaping backward. 

BTW Kyzer (in an off-list discussion) and Sergey confirmed that implemented method from the patch actually is Method 1.


Best regards, Andrey Borodin.


Re: UUID v7

От
Peter Eisentraut
Дата:
On 14.03.24 12:25, Andrey M. Borodin wrote:
>>>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>>>> takes just two bits to return, but the draft document is quite clear
>>>> that the variant is 4 bits (see Table 1).
>>> Well, it was correct only for implemented variant. I've made version that implements full table 1 from section
4.1.
>> I think we are still interpreting this differently.  I think uuid_extract_variant should just return whatever is in
thosefour bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think it is
correct. It should return 0 through 15.
 
> We will return "do not care" bits. This bits can confuse someone. E.g. for varaint 0b10 we can return 8, 9, 10 and 11
randomly.Is it OK? BTW for some reason document lists number 1-15, but your are correct that range is 0-15.
 

I agree it's confusing.  Before I studied the RFC 4122bis project, I 
didn't even know about variant vs. version.  I think overall people will 
find this more confusing than useful.  If you just want to know, "is 
this UUID of the kind specified in RFC 4122", you can query it with 
uuid_extract_version(x) IS NOT NULL.  So maybe we don't need the 
_extract_variant function?




Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 14 Mar 2024, at 20:10, Peter Eisentraut <peter@eisentraut.org> wrote:
>
>>>>> I think the behavior of uuid_extract_var(iant) is wrong.  The code
>>>>> takes just two bits to return, but the draft document is quite clear
>>>>> that the variant is 4 bits (see Table 1).
>>>> Well, it was correct only for implemented variant. I've made version that implements full table 1 from section
4.1.
>>> I think we are still interpreting this differently.  I think uuid_extract_variant should just return whatever is in
thosefour bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think it is
correct. It should return 0 through 15. 
>> We will return "do not care" bits. This bits can confuse someone. E.g. for varaint 0b10 we can return 8, 9, 10 and
11randomly. Is it OK? BTW for some reason document lists number 1-15, but your are correct that range is 0-15. 
>
> I agree it's confusing.  Before I studied the RFC 4122bis project, I didn't even know about variant vs. version.  I
thinkoverall people will find this more confusing than useful.  If you just want to know, "is this UUID of the kind
specifiedin RFC 4122", you can query it with uuid_extract_version(x) IS NOT NULL.  So maybe we don't need the
_extract_variantfunction? 

I think it's the best possible solution. The variant has no value besides detecting if a version can be extracted.


Best regards, Andrey Borodin.


Re: UUID v7

От
Aleksander Alekseev
Дата:
Hi,

> > So maybe we don't need the _extract_variant function?
>
> I think it's the best possible solution. The variant has no value besides detecting if a version can be extracted.

+1 to the idea. I doubt that anyone will miss it.

-- 
Best regards,
Aleksander Alekseev



Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> +1 to the idea. I doubt that anyone will miss it.

PFA v22.

Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Remove all traces of uuid_extract_variant()

Thanks!


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Peter Eisentraut
Дата:
On 16.03.24 18:43, Andrey M. Borodin wrote:
>> On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:
>>
>> +1 to the idea. I doubt that anyone will miss it.
> 
> PFA v22.
> 
> Changes:
> 1. Squashed all editorialisation by Jelte
> 2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
> 3. Remove all traces of uuid_extract_variant()

I have committed a subset of this for now, namely the additions of 
uuid_extract_timestamp() and uuid_extract_version().  These seemed 
mature and agreed upon.  You can rebase the rest of your patch on top of 
that.

I have started a separate discussion to learn about the precision we can 
expect from gettimeofday().




Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 19 Mar 2024, at 13:55, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 16.03.24 18:43, Andrey M. Borodin wrote:
>>> On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:
>>>
>>> +1 to the idea. I doubt that anyone will miss it.
>> PFA v22.
>> Changes:
>> 1. Squashed all editorialisation by Jelte
>> 2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
>> 3. Remove all traces of uuid_extract_variant()
>
> I have committed a subset of this for now, namely the additions of uuid_extract_timestamp() and
uuid_extract_version(). These seemed mature and agreed upon.  You can rebase the rest of your patch on top of that. 

Great! Thank you! PFA v23 with rebase on HEAD.

> I have started a separate discussion to learn about the precision we can expect from gettimeofday().

Even in presence of real microsecond-enabled and portable timer using microseconds does not seem to me an optimal way
ofutilising UUID bits. 

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We
canhope for ~few ms variation in one datacenter or in presence of atomic clocks. 

Time-based bits contribute to global uniqueness, but certainly they are not that effective as counter bits.

Time-based bits do not provide local sortability guarantees: some UUIDs might get same microseconds or be affected by
leapbackwards. 

I think that microseconds are good only for hardware-specific solutions, not for something that runs on variety of
platforms,OSes, devices. 


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 21 Mar 2024, at 20:21, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We
canhope for ~few ms variation in one datacenter or in presence of atomic clocks. 
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

Oh, that’s an interesting practical feature!
Se, essentially counter is a theoretical guaranty of sortability in one backend, while microseconds are practical
sortabilitybetween backends. 

> However, I don't really think it is incredibly important to get the
> "perfect" approach to filling in rand_a/rand_b right now. As long as
> we don't document what we do, we can choose to change the method
> without breaking backwards compatibility. Because either approach
> results in valid UUIDv7s.

Makes sense to me. I think both methods would be much better than UUIDv4 for practical reasons. And even not using
extrabits at all (fill them with random numbers) would work for 0.999 cases. 


Best regards, Andrey Borodin.


Re: UUID v7

От
Sergey Prokhorenko
Дата:
I think it's better to leave Andrey's patch as is, and add another function in the future with a customizable UUIDv7 structure for special use cases. The structure description can be in JSON format. See this discussion.



Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 09:54:07 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 21 Mar 2024, at 20:21, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

Oh, that’s an interesting practical feature!
Se, essentially counter is a theoretical guaranty of sortability in one backend, while microseconds are practical sortability between backends.


> However, I don't really think it is incredibly important to get the
> "perfect" approach to filling in rand_a/rand_b right now. As long as
> we don't document what we do, we can choose to change the method
> without breaking backwards compatibility. Because either approach
> results in valid UUIDv7s.


Makes sense to me. I think both methods would be much better than UUIDv4 for practical reasons. And even not using extra bits at all (fill them with random numbers) would work for 0.999 cases.


Best regards, Andrey Borodin.

Re: UUID v7

От
Peter Eisentraut
Дата:
On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We
canhope for ~few ms variation in one datacenter or in presence of atomic clocks.
 
> 
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.




Re: UUID v7

От
Sergey Prokhorenko
Дата:
Why not use a single UUID generator for the database table in this case, similar to autoincrement?


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:


On 21.03.24 16:21, Jelte Fennema-Nio wrote:

> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.


There is that, and there are also multiple backend workers for one session.


Re: UUID v7

От
Sergey Prokhorenko
Дата:
BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance






Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:


Why not use a single UUID generator for the database table in this case, similar to autoincrement?


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:


On 21.03.24 16:21, Jelte Fennema-Nio wrote:

> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.


There is that, and there are also multiple backend workers for one session.


Re: UUID v7

От
Sergey Prokhorenko
Дата:
On Friday, 22 March 2024 at 04:58:59 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:


BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance









Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:


Why not use a single UUID generator for the database table in this case, similar to autoincrement?


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:


On 21.03.24 16:21, Jelte Fennema-Nio wrote:

> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>> Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.


There is that, and there are also multiple backend workers for one session.


Re: UUID v7

От
Peter Eisentraut
Дата:
On 20.03.24 19:08, Andrey M. Borodin wrote:
>> On 19 Mar 2024, at 13:55, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 16.03.24 18:43, Andrey M. Borodin wrote:
>>>> On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:
>>>>
>>>> +1 to the idea. I doubt that anyone will miss it.
>>> PFA v22.
>>> Changes:
>>> 1. Squashed all editorialisation by Jelte
>>> 2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
>>> 3. Remove all traces of uuid_extract_variant()
>>
>> I have committed a subset of this for now, namely the additions of uuid_extract_timestamp() and
uuid_extract_version(). These seemed mature and agreed upon.  You can rebase the rest of your patch on top of that.
 
> 
> Great! Thank you! PFA v23 with rebase on HEAD.

I have been studying the uuidv() function.

I find this code extremely hard to follow.

We don't need to copy all that documentation from the RFC 4122bis 
document.  People can read that themselves.  What I would like to see is 
easy to find information what from there we are implementing.  Like,

- UUID version 7
- fixed-length dedicated counter
- counter is 18 bits
- 4 bits are initialized as zero

That's more or less all I would need to know what is going on.

That said, I don't understand why you say it's an 18 bit counter, when 
you overwrite 6 bits with variant and version.  Then it's just a 12 bit 
counter?  Which is the size of the rand_a field, so that kind of makes 
sense.  But 12 bits is the recommended minimum, and (in this patch) we 
don't use sub-millisecond timestamp precision, so we should probably use 
more than the minimum?

Also, you are initializing 4 bits (I think?) to zero to guard against 
counter rollovers (so it's really just an 8 bit counter?).  But nothing 
checks against such rollovers, so I don't understand the use of that.

The code code be organized better.  In the not-increment_counter case, 
you could use two separate pg_strong_random calls: One to initialize 
rand_b, starting at &uuid->data[8], and one to initialize the counter. 
Then the former could be shared between the two branches, and the code 
to assign the sequence_counter to the uuid fields could also be shared.

I would also prefer if the normal case (not-increment_counter) were the 
first if branch.


Some other notes on your patch:

- Your rebase duplicated the documentation of uuid_extract_timestamp and 
uuid_extract_version.

- PostgreSQL code uses uint64 etc. instead of uint64_t etc.

- It seems the added includes

#include "access/xlog.h"
#include "utils/builtins.h"
#include "utils/datetime.h"

are not needed.

- The static variables sequence_counter and previous_timestamp could be 
kept inside the uuidv7() function.




Re: UUID v7

От
"Andrey M. Borodin"
Дата:
Sorry for this long reply. I was looking on refactoring around pg_strong_random() and could not decide what to do.
Finally,I decided to post at least something. 

> On 22 Mar 2024, at 19:15, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I have been studying the uuidv() function.
>
> I find this code extremely hard to follow.
>
> We don't need to copy all that documentation from the RFC 4122bis document.  People can read that themselves.  What I
wouldlike to see is easy to find information what from there we are implementing.  Like, 
>
> - UUID version 7
> - fixed-length dedicated counter
> - counter is 18 bits
> - 4 bits are initialized as zero

I've removed table taken from RFC.

> That's more or less all I would need to know what is going on.
>
> That said, I don't understand why you say it's an 18 bit counter, when you overwrite 6 bits with variant and version.
Then it's just a 12 bit counter?  Which is the size of the rand_a field, so that kind of makes sense.  But 12 bits is
therecommended minimum, and (in this patch) we don't use sub-millisecond timestamp precision, so we should probably use
morethan the minimum? 


No, we use 4 bits in data[6], 8 bits in data[7], and 6 bits data[8]. It's 18 total. Essentially, we use both partial
bytesand one whole byte between. 
There was a bug - we used 1 extra byte of random numbers that was not necessary, I think that's what lead you to think
thatwe use 12-bit counter. 

> Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8 bit
counter?). But nothing checks against such rollovers, so I don't understand the use of that. 

No, there's only one guard rollover bit.
Here: uuid->data[6] = (uuid->data[6] & 0xf7);
Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.
Rollover is carried into time tick here:
    ++sequence_counter;
        if (sequence_counter > 0x3ffff)
        {
            /* We only have 18-bit counter */
            sequence_counter = 0;
            previous_timestamp++;
        }

I think we might use 10 bits of microseconds and have 8 bits of a counter. Effect of a counter won't change much. But
I'mnot sure if this is allowed per RFC. 
If time source is coarse-grained it still acts like a random initializer. And when it is precise - time is "natural"
sourceof entropy. 


> The code code be organized better.  In the not-increment_counter case, you could use two separate pg_strong_random
calls:One to initialize rand_b, starting at &uuid->data[8], and one to initialize the counter. Then the former could be
sharedbetween the two branches, and the code to assign the sequence_counter to the uuid fields could also be shared. 

Call to pg_strong_random() is very expensive in builds without ssl (and even with ssl too). If we could ammortize
randomnumbers in small buffers - that would save a lot of time (see v8-0002-Buffer-random-numbers.patch upthread). Or,
perhaps,we can ignore cost of two pg_string_random() calls. 

>
> I would also prefer if the normal case (not-increment_counter) were the first if branch.

Done.

> Some other notes on your patch:
>
> - Your rebase duplicated the documentation of uuid_extract_timestamp and uuid_extract_version.
>
> - PostgreSQL code uses uint64 etc. instead of uint64_t etc.
>
> - It seems the added includes
>
> #include "access/xlog.h"
> #include "utils/builtins.h"
> #include "utils/datetime.h"
>
> are not needed.
>
> - The static variables sequence_counter and previous_timestamp could be kept inside the uuidv7() function.

Fixed.

Thanks!


Best regards, Andrey Borodin.

Вложения

Re: UUID v7

От
Peter Eisentraut
Дата:
On 26.03.24 18:26, Andrey M. Borodin wrote:
>> Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8
bitcounter?).  But nothing checks against such rollovers, so I don't understand the use of that.
 
> No, there's only one guard rollover bit.
> Here: uuid->data[6] = (uuid->data[6] & 0xf7);
> Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.

Uh, I guess I don't understand this at all.  I tried to dig up some 
information about this, but didn't find anything.  What exactly is the 
mechanism of these "counter rollover guards"?  If they don't guard 
anything, what are they supposed to accomplish?




Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 4 Apr 2024, at 18:45, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 26.03.24 18:26, Andrey M. Borodin wrote:
>>> Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8
bitcounter?).  But nothing checks against such rollovers, so I don't understand the use of that. 
>> No, there's only one guard rollover bit.
>> Here: uuid->data[6] = (uuid->data[6] & 0xf7);
>> Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.
>
> Uh, I guess I don't understand this at all.  I tried to dig up some information about this, but didn't find anything.
What exactly is the mechanism of these "counter rollover guards"?  If they don't guard anything, what are they supposed
toaccomplish? 
>

My understanding of guard bits is the following: on every UUID generation, when time is advancing, counter bits are
initializedwith random numbers, except guard bits. Guard bits are always initialized with zeroes. 

Let's consider we have a 1-byte counter with 4 guard bits and 4 normal bits.
If we generate some UUIDs at the very same millisecond we might have following counter values:

0C <--- lower nibble is initialized with random 4 bits C.
0D
0E
0F
10
11
12

If we have no these guard bits we might get random numbers that are immifiately at the end of a range of allowed
values:

FE <--- first UUID at given millisecond
FF
00 <--- rollover to next millisecond
01


If we have 1 guard bit and 7 normal bits we get at worst 128 values before rollover to next millisecond.
If we have 2 guard bits and 6 normal bits this guaranty is extended to 192.
3/5 will guaranty capacity of 224.
But usefulness of every next guard bits decreases, so I think there is a point in only one.

That's my understanding of guard bits in the counter. Please correct me if I'm wrong.


At this point we can skip the counter\microseconds entirely, just fill everything after unix_ts_ms with randomness.
It'sstill a valid UUIDv7, exhibiting much more data locality than UUIDv4. We can adjust this sortability measures
later.


Best regards, Andrey Borodin.


Re: UUID v7

От
Sergey Prokhorenko
Дата:
For every complex problem there is an answer that is clear, simple, and wrong. Since the RFC allows microsecond timestamp granularity, the first thing that comes to everyone's mind is to insert microsecond granularity into UUIDv7. And if the RFC allowed nanosecond timestamp granularity, then they would try to insert nanosecond granularity into UUIDv7.

But I am categorically against abandoning the counter under pressure from the unfounded proposal to replace the counter with microsecond granularity.

1) The RFC specifies millisecond timestamp granularity by default.

2) All advanced UUIDv7 implementations include a counter:
• for JavaScript https://www.npmjs.com/package/uuidv7
• for Rust https://crates.io/crates/uuid7
• for Go (Golang) https://pkg.go.dev/github.com/gofrs/uuid#NewV7
• for Python https://github.com/oittaa/uuid6-python

3) The theoretical performance of generating UUIDv7 without loss of monotonicity for microsecond granularity is only 1000 UUIDv7 per millisecond. This is very low and insufficient generation performance! But the actual generation performance is even worse, since the generation demand is unevenly distributed within a millisecond. Therefore, a UUIDv7 will not be generated every microsecond.

For a counter 18 bits long, with the most significant bit initialized to zero and the remaining bits initialized to a random number, the actual performance of generating a UUIDv7 without loss of monotonicity is between 2 to the power of 17 = 131072 UUIDv7 per millisecond (if the random number happens to be all ones) to 2 to the power of 18 = 262144 UUIDv7 per millisecond (if the random number happens to be all zeros). This is more than enough.

4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.



The only reasonable use of microsecond granularity is when writing to a database table in parallel. However, monotonicity in this case can be ensured in another way, namely a single UUIDv7 generator per database table, similar to SERIAL (https://postgrespro.com/docs/postgresql/16/datatype-numeric#DATATYPE-SERIAL) in PostgreSQL.

Best regards,
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Thursday, 4 April 2024 at 09:12:17 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:



...


At this point we can skip the counter\microseconds entirely, just fill everything after unix_ts_ms with randomness. It's still a valid UUIDv7, exhibiting much more data locality than UUIDv4. We can adjust this sortability measures later.


Best regards, Andrey Borodin.

Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 12 Mar 2024, at 20:41, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> if e.g.
> the RFC got approved 2 weeks after Postgres its feature freeze

Jelte, you seem to be the visionary! I would consider participating in lotteries or betting.
New UUID is assigned RFC number 9562, it was aproved by RFC editors and is now in AUTH48 state. This means after final
approvalby authors RFC will be imminently publicised. Most probably, this will happen circa 2 weeks after feature
freeze:) 


Best regards, Andrey Borodin.

[0] https://www.rfc-editor.org/auth48/rfc9562


Re: UUID v7

От
Sergey Prokhorenko
Дата:
I think that for the sake of such an epoch-making thing as UUIDv7 it would be worth slightly unfreezing this feature freeze.

Best regards, 


Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au


On Saturday, 13 April 2024 at 09:58:29 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 12 Mar 2024, at 20:41, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> if e.g.
> the RFC got approved 2 weeks after Postgres its feature freeze

Jelte, you seem to be the visionary! I would consider participating in lotteries or betting.
New UUID is assigned RFC number 9562, it was aproved by RFC editors and is now in AUTH48 state. This means after final approval by authors RFC will be imminently publicised. Most probably, this will happen circa 2 weeks after feature freeze :)


Best regards, Andrey Borodin.


[0]
https://www.rfc-editor.org/auth48/rfc9562

Re: UUID v7

От
Michael Paquier
Дата:
On Sat, Apr 13, 2024 at 07:07:34PM +0000, Sergey Prokhorenko wrote:
> I think that for the sake of such an epoch-making thing as UUIDv7 it
> would be worth slightly unfreezing this feature freeze.

A feature freeze is here to freeze things in place.  This comes up
every year, and that won't happen.

> New UUID is assigned RFC number 9562, it was aproved by RFC editors
> and is now in AUTH48 state. This means after final approval by
> authors RFC will be imminently publicised. Most probably, this will
> happen circa 2 weeks after feature freeze :)
>
> [0] https://www.rfc-editor.org/auth48/rfc9562

Well, that's life.  It looks like this is waiting for some final
approval, which may take some more time.  I have no idea how long this
usually takes.
--
Michael

Вложения

Re: UUID v7

От
"Andrey M. Borodin"
Дата:
> On 13 Apr 2024, at 11:58, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> New UUID is assigned RFC number 9562, it was aproved by RFC editors and is now in AUTH48 state.

RFC 9562 is not in AUTH48-Done state, it was approved by authors and editor, and now should be published.


Best regards, Andrey Borodin.


Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 3 May 2024, at 11:18, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>

Here's the documentation from ClickHouse [0] for their implementation. It's identical to provided patch in this thread,
withfew notable exceptions: 

1. Counter is 42 bits, not 18. The counter have no guard bits, every bit is initialized with random number on time
ticks.
2. By default counter is shared between threads. Alternative function generateUUIDv7ThreadMonotonic() provides
thread-localcounter. 

Thanks!


Best regards, Andrey Borodin.

[0] https://clickhouse.com/docs/en/sql-reference/functions/uuid-functions#generateUUIDv7


Re: UUID v7

От
"Andrey M. Borodin"
Дата:

> On 3 May 2024, at 11:18, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> RFC 9562 is not in AUTH48-Done state, it was approved by authors and editor, and now should be published.

It's RFC now.
https://datatracker.ietf.org/doc/rfc9562/


Best regards, Andrey Borodin.