Re: UUID v7

From Sergey Prokhorenko
Subject Re: UUID v7
Date
Msg-id 1945125834.2044089.1706574441164@mail.yahoo.com
In response to Re: UUID v7  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
Andrey,

I understand and agree with your goals. But instead of dangerous universal functions, it is better to develop safe, highly specialized functions that implement only these goals.

There should not be a function uuidv7(T) from an arbitrary timestamp, but there should be a special function that implements your algorithm: uuidv8(now() + '1 century' * random(0,10)).

I replaced 1 day with 1 century because the spread of 1 day is too small. Over time, records will be inserted between existing records, which is undesirable.
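
The spread argument can be checked with a little arithmetic. The sketch below is only an illustration: the helper names `key_ranges` and `ranges_overlap` are hypothetical, and it assumes that each stream uses a fixed integer shift k in 0..10 and has been writing for the same elapsed time.

```python
from datetime import datetime, timedelta, timezone

def key_ranges(start, elapsed, step, shifts):
    """Timestamp range covered by each shifted stream after `elapsed` time.

    Stream k writes identifiers whose timestamps run from
    start + k*step to start + k*step + elapsed (hypothetical model).
    """
    return [(start + k * step, start + k * step + elapsed) for k in range(shifts)]

def ranges_overlap(ranges):
    """True if any stream has reached the start of the next stream's range."""
    return any(a_end > b_start
               for (_, a_end), (b_start, _) in zip(ranges, ranges[1:]))

start = datetime(2024, 1, 29, tzinfo=timezone.utc)
two_days = timedelta(days=2)

# With a 1-day step, stream 0 catches up with stream 1 after one day,
# so new records land between existing ones.
print(ranges_overlap(key_ranges(start, two_days, timedelta(days=1), 11)))

# With a step of about a century (36525 days), the ranges stay disjoint.
print(ranges_overlap(key_ranges(start, two_days, timedelta(days=36525), 11)))
```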

Similarly, if we need to calculate the partition id, then we do not need to use the uuid_extract_time() function to provide the extracted timestamp, the accuracy of which cannot be guaranteed. Instead, we need to give exactly the partition id, calculated using the uuidv7 timestamp. For example, partitions may have approximately a month interval between each other.
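
A minimal sketch of such a specialized function, in Python rather than in the patch's C, might look like this. It assumes the RFC layout in which the top 48 bits hold the Unix timestamp in milliseconds; the names `uuidv7_unix_ms` and `month_partition_id` and the "months since 1970-01" numbering are hypothetical, chosen only for illustration.

```python
import uuid
from datetime import datetime, timezone

def uuidv7_unix_ms(u: uuid.UUID) -> int:
    """Return the 48-bit unix_ts_ms field (the top 48 bits of the value)."""
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    return u.int >> 80

def month_partition_id(u: uuid.UUID) -> int:
    """Hypothetical partition id: whole months since 1970-01, in UTC.

    Only the partition id is exposed, never the timestamp itself.
    """
    ts = datetime.fromtimestamp(uuidv7_unix_ms(u) // 1000, tz=timezone.utc)
    return (ts.year - 1970) * 12 + (ts.month - 1)

# Build a UUIDv7-shaped value by hand: timestamp, version 7, variant 0b10.
example = uuid.UUID(int=(1706574441164 << 80) | (7 << 76) | (0b10 << 62))
print(month_partition_id(example))
```

The caller gets a coarse, stable bucket number and has no way to depend on the precision of the embedded timestamp.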

As for the documentation, it must be indicated that the UUIDv7 structure is not timestamp + random, but timestamp + randomly seeded counter + random, like in all advanced implementations.
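
To make the "timestamp + randomly seeded counter + random" structure concrete, here is a rough sketch of one possible layout: a 12-bit counter placed in the rand_a field and 62 random bits in rand_b. This is an assumption made for illustration, not necessarily the layout of the proposed patch; the class name `UuidV7Generator` is hypothetical, and the sketch is not thread-safe.

```python
import os
import time
import uuid

class UuidV7Generator:
    """Sketch: unix_ts_ms (48 bits) + 12-bit counter + 62 random bits."""

    def __init__(self):
        self._last_ms = -1
        self._counter = 0

    def _seed_counter(self):
        # Random seed with the top counter bit clear, leaving headroom
        # for increments within the same millisecond.
        return int.from_bytes(os.urandom(2), "big") & 0x7FF

    def next(self, now_ms=None):
        ms = time.time_ns() // 1_000_000 if now_ms is None else now_ms
        if ms > self._last_ms:
            # New millisecond: take the new timestamp and reseed the counter.
            self._last_ms = ms
            self._counter = self._seed_counter()
        else:
            # Same millisecond (or the clock went backwards): keep the last
            # timestamp and increment the counter to stay monotonic.
            self._counter += 1
            if self._counter > 0xFFF:
                # Counter overflow: move the timestamp forward by 1 ms.
                self._last_ms += 1
                self._counter = self._seed_counter()
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
        value = ((self._last_ms << 80) | (7 << 76) | (self._counter << 64)
                 | (0b10 << 62) | rand_b)
        return uuid.UUID(int=value)

gen = UuidV7Generator()
a, b = gen.next(), gen.next()
print(a < b)
```

Identifiers produced within the same millisecond differ in the counter, so they sort in generation order even though the trailing bits are random.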


Sergey Prokhorenko

sergeyprokhorenko@yahoo.com.au

______________________________________________________________


On Monday, 29 January 2024 at 09:32:54 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:




> On 25 Jan 2024, at 22:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:
>
> Aleksander,
>
> In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

Refining documentation is good. However, saying that these functions are not recommended for production must be based on some real threats.

>
> The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

Even if the developer passes a constant time to uuidv7(T), they will get what they asked for: a unique identifier. Moreover, it will still keep locality. There will be no negative consequences at all.
On the contrary, an experienced developer can leverage the parameter when data locality should be reduced. If you have several streams of data, you might want to introduce some shift to reduce contention.
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will split 1 contention point into 10 and increase ingestion performance 10-fold.

> On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> If other timestamp sources or
> a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a big deficiency that k-way sortable identifiers are not implementable on top of UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement over previous versions.


Jelte, your documentation corrections look good to me; I'll include them in the next version.

Thanks!


Best regards, Andrey Borodin.
