Re: is there a way to automate deduplication of strings?
| От | Greg Sabino Mullane |
|---|---|
| Тема | Re: is there a way to automate deduplication of strings? |
| Дата | |
| Msg-id | CAKAnmmJ5A-5wvdqxyZCAigp5_Tpo=w=3oR0r8aP4LH2JqRHYsQ@mail.gmail.com обсуждение исходный текст |
| Ответ на | RE: is there a way to automate deduplication of strings? (Chris Papademetrious <Christopher.Papademetrious@synopsys.com>) |
| Список | pgsql-novice |
Ah, okay, so what you want is to normalize the data. Turns out, that's something database purists strive for, so you are in good company. Yes, there is a little more overhead, but that's kind of the cost of using a relational model. You can smooth things out a great bit with views, triggers, and functions. For example, you could create a view that joins the two tables together under the hood but an application or user would simply select from it as if it were the original non-duplicated form:
create view txn as select t.id /* plus other fields from "t" here */ , u.user_agent from transaction t join user_agent u on (u.id = t.user_agent_id);
You can also make the view ("txn" in this case) updateable by adding INSTEAD OF triggers on the view, which then call some functions that do the necessary maintenance work behind the scenes. Then your application simply calls insert/update/delete on the view as if it were the original table.
https://www.postgresql.org/docs/current/sql-createtrigger.html
We can help with those on this list, of course, but there are some good examples on that page to get you started.
Minor notes:
* Since the user_agent contains a small and finite number of variants, I would not bother with a uuid and just use an int:
create table user_agent (id int primary key generated always as identity, user_agent text not null unique);
* You ought to look at the new uuidv7() function, as it improves a lot on regular uuids. See for example:
https://www.thenile.dev/blog/uuidv7
* Rather than ON DELETE CASCADE, you probably want ON DELETE RESTRICT for this use case
* You mentioned "cleanup of no-longer-referenced values over time"
I would not sweat this, at least for this particular example. If those user agents were used once, they may get used again. For different types of data and larger tables, you could write a small sql function that cleans up stray entries and make sure to call that function after a bulk delete, or just cron it to run weekly.
create view txn as select t.id /* plus other fields from "t" here */ , u.user_agent from transaction t join user_agent u on (u.id = t.user_agent_id);
You can also make the view ("txn" in this case) updateable by adding INSTEAD OF triggers on the view, which then call some functions that do the necessary maintenance work behind the scenes. Then your application simply calls insert/update/delete on the view as if it were the original table.
https://www.postgresql.org/docs/current/sql-createtrigger.html
We can help with those on this list, of course, but there are some good examples on that page to get you started.
Minor notes:
* Since the user_agent contains a small and finite number of variants, I would not bother with a uuid and just use an int:
create table user_agent (id int primary key generated always as identity, user_agent text not null unique);
* You ought to look at the new uuidv7() function, as it improves a lot on regular uuids. See for example:
https://www.thenile.dev/blog/uuidv7
* Rather than ON DELETE CASCADE, you probably want ON DELETE RESTRICT for this use case
* You mentioned "cleanup of no-longer-referenced values over time"
I would not sweat this, at least for this particular example. If those user agents were used once, they may get used again. For different types of data and larger tables, you could write a small sql function that cleans up stray entries and make sure to call that function after a bulk delete, or just cron it to run weekly.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
В списке pgsql-novice по дате отправления: