Re: historical log of data records
From | Alban Hertroys
---|---
Subject | Re: historical log of data records
Date | |
Msg-id | A0884D1E-66BD-4E0C-A9FC-8CFDFB41B922@gmail.com
In reply to | Re: historical log of data records (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses | Re: historical log of data records
List | pgsql-general
> On 16 Nov 2021, at 10:20, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2021-11-16 at 13:56 +0530, Sanjay Minni wrote:
>> I need to keep a copy of old data as the rows are changed.
>>
>> For a general RDBMS I could think of keeping all the data in the same table with a flag
>> to indicate older copies of updated / deleted rows, or keep a parallel table and copy
>> these rows into the parallel table under program / trigger control. Each has its pros and cons.
>>
>> In Postgres would I have to follow the same methods, or are there any features / packages available?
>
> Yes, I would use one of these methods.
>
> The only feature I can think of that may help is partitioning: if you have one partition
> for the current data and one for the deleted data, then updating the flag would
> automatically move the row between partitions, so you don't need a trigger.

Are you building (something like) a data-vault? If so, keep in mind that you will have a row for every update, not just a single deleted row. Enriching the data can be really useful in such cases.

For a data-vault at a previous employer, we determined how to treat new rows by comparing an (md5) hash of the new and old rows, adding the hash and a validity interval to the stored rows. Historic data went to a separate table for each respective current table.

The current tables "inherited" the PKs from the tables on the source systems (this was a data-warehouse DB). Obviously that same PK cannot be applied to the historic tables, where there _will_ be duplicates, although they should be at non-overlapping validity intervals.

Alternatively, since this is time-series data, it would probably be a good idea to store it in a way optimised for that. TimescaleDB comes to mind, or arrays as per Pavel's suggestion at https://stackoverflow.com/questions/68440130/time-series-data-on-postgresql.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
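[Editor's note: Laurenz's partitioning suggestion can be sketched roughly as below. This is an untested illustration with made-up table and column names, not code from the thread; it relies on PostgreSQL's declarative list partitioning, where an UPDATE that changes the partition key moves the row to the matching partition.]

```sql
-- Hypothetical sketch: one partition for current rows, one for
-- deleted rows, keyed on a boolean flag.  Updating the flag moves
-- the row between partitions with no trigger involved.
CREATE TABLE records (
    id      bigint  NOT NULL,
    payload text,
    deleted boolean NOT NULL DEFAULT false
) PARTITION BY LIST (deleted);

CREATE TABLE records_current PARTITION OF records FOR VALUES IN (false);
CREATE TABLE records_deleted PARTITION OF records FOR VALUES IN (true);

-- "Deleting" a row relocates it into records_deleted:
--   UPDATE records SET deleted = true WHERE id = 42;
```

Note that a primary key on the partitioned table would have to include the partition key (`deleted`), which is why none is declared here.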
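[Editor's note: the hash-plus-validity-interval approach Alban describes could look something like the following trigger sketch. All names (`customer`, `customer_hist`, `payload`) are invented for illustration, and hashing a single `payload` column stands in for hashing the full row; this is untested.]

```sql
-- Hypothetical history table: source PK may repeat, but validity
-- intervals per id should not overlap.
CREATE TABLE customer_hist (
    id         bigint      NOT NULL,  -- PK of the current table; duplicates allowed here
    payload    text,
    row_hash   text        NOT NULL,
    valid_from timestamptz NOT NULL,
    valid_to   timestamptz            -- NULL = still the current version
);

CREATE OR REPLACE FUNCTION customer_log_hist() RETURNS trigger AS $$
DECLARE
    new_hash text := md5(NEW.payload);
BEGIN
    -- Skip no-op updates: identical hash means nothing really changed.
    IF TG_OP = 'UPDATE' AND new_hash = md5(OLD.payload) THEN
        RETURN NEW;
    END IF;
    -- Close the previous version's validity interval.
    UPDATE customer_hist
       SET valid_to = now()
     WHERE id = NEW.id AND valid_to IS NULL;
    -- Record the new version with an open-ended interval.
    INSERT INTO customer_hist (id, payload, row_hash, valid_from)
    VALUES (NEW.id, NEW.payload, new_hash, now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_hist_trg
    AFTER INSERT OR UPDATE ON customer
    FOR EACH ROW EXECUTE FUNCTION customer_log_hist();
```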