Ostrovsky Eugene wrote:
> Hi.
> I need to export data from the database to external file. The difficulty
> is that only data modified or added since previous export should be
> written to the file.
> I consider adding "modification_time" timestamp field to all the tables
> that should be exported. Then I can set this field to now() within ON
> UPDATE OR INSERT trigger.
> During export I can select modified data with 'WHERE modification_time >
> last_export_time' clause.
>
> It seems to be the solution but...
> What if a concurrent (and not yet committed) transaction modified some
> data before the export transaction begins? Those modifications would not
> be visible to the export transaction, so the modified data would not be
> included in the export file. It won't be included in the next export
> either, because its modification time is less than the current export's
> start time (the new value of last_export_time).
>
> Thus some data could be lost from export files sequence. And that is not
> good at all.
About the only solid solution I can think of right now is to LOCK TABLE
the table you want to dump. You can use a lock mode (SHARE mode, for
example) that permits SELECT but blocks UPDATE/INSERT/DELETE from other
sessions. That way your modification-time approach works.
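A minimal sketch of that export transaction, assuming the hypothetical `orders` table and a one-row `export_state` table holding the watermark:

```sql
BEGIN;
-- SHARE mode allows concurrent SELECTs but blocks INSERT/UPDATE/DELETE,
-- so no in-flight writer can commit a row whose modification_time falls
-- behind the watermark we are about to record.
LOCK TABLE orders IN SHARE MODE;

-- Dump everything changed since the last export.
SELECT *
FROM orders
WHERE modification_time > (SELECT last_export_time FROM export_state);

-- Advance the watermark in the same transaction.
UPDATE export_state SET last_export_time = now();
COMMIT;
```

The lock is released at COMMIT, so writers are only blocked for the duration of the dump itself.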
(I strongly suggest leaving the modification_time field unindexed, so
that HOT updates can replace the rows in place and you avoid a whole lot
of bloat.)
There might be another possible approach that uses the system
"xmin/xmax" fields of each tuple. That'd permit your incremental dumps
to be done read-only, saving you a whole lot of expensive I/O and bloat.
I'm just not sure yet whether what I have in mind will work. I'll check
back in once I've had a play and written a test script or two.
( If it will, then surely there'd be "pg_dump --incremental-as-of" by
now ...)
--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/