Обсуждение: Re: Big project, please help

Поиск
Список
Период
Сортировка

Re: Big project, please help

От
Paul Condon
Дата:
Two way replication on a single "table" is availabe in Lotus Notes. In
Notes, every record has a time-stamp, which contains the time of the
last update. (It also has a creation timestamp.) During replication,
timestamps are compared at the row/record level, and compared with the
timestamp of the last replication. If, for corresponding rows in two
replicas, the timestamp of one row is newer than the last replication,
the contents of this newer row is copied to the other replica. But if
both of the corresponding rows have newer timestamps, there is a
problem. The Lotus Notes solution is to: 1. send a replication conflict message to the Notes Administrator,
which message contains full copies of both rows. 2. copy the newest row over the less new row in the replicas. 3. there
isa mechanism for the Administrator to reverse the default
 
decision in 2, if the semantics of the message history, or off-line
investigation indicates that the wrong decision was made.

In practice, the Administrator is not overwhelmed with replication
conflict messages because updates usually only originate at the site
that originally created the row. Or updates fill only fields that were
originally 'TBD'. The full logic is perhaps more complicated than I have
described here, but it is already complicated enough to give you an idea
of what you're really being asked to do. I am not aware of a supplier of
relational database who really supports two way replication at the level
that Notes supports it, but Notes isn't a relational database.

The difficulty of the position that you appear to be in is that
management might believe that the full problem is solved in brand X
RDBMS, and you will have trouble convincing management that this is not
really true.



Re: Re: Big replication project, please help

От
Andrew McMillan
Дата:
Paul Condon wrote:
> 
> Two way replication on a single "table" is availabe in Lotus Notes. In
> Notes, every record has a time-stamp, which contains the time of the
> last update. (It also has a creation timestamp.) During replication,
> timestamps are compared at the row/record level, and compared with the
> timestamp of the last replication. If, for corresponding rows in two

I've implemented a similar two-way replication scheme for an application
some years ago and it works well.  This was written using Progress 4GL,
which has specific triggers for handling replication functionality
(REPLICATION-CREATE, REPLICATION-WRITE and REPLICATION-DELETE) which we
coded to write replication data into a table (Progress also allows you
to grab the whole record as a "RAW" field which we stuffed into a field
in the replication table).

Then we wrote processes to periodically dump the replication tables at
each site, swap them, and apply the updates.

The reason/advantage of having separate replication triggers was that
they would be disabled on replicate-in, but the _other_ triggers could
be left firing if desired.  In fact I think we found it wasn't desired
in most cases (because the changes effected by the triggers were _also_
being replicated), but we did use it in some cases (like where a summary
table was not being replicated, and was being maintained entirely by
triggers at both sites).

Of course we were writing our own replication information with this
one.  We kept before-image records for replication changes, rather than
having modification timestamps for every record, and considered that a
different before-image was a replication conflict.  Since we were
writing the replication ourselves this before-image / after-image
approach worked better than having to add timestamps to every table on
the database.

The _really_ necessary function for achieving this sort of replication
would be a way of getting a raw record before and after the changes -
I'm no PostgreSQL guru, but I think that should be possible in a 3GL
trigger.  The replication itself could be implemented with normal logic
and some flags to indicate whether a process is running normally, or is
replicating data in.  That detection could be handled (e.g.) by having
the replication-in process operate as a special 'replication' user that
would be detected within normal triggers enabling/disabling
functionality as appropriate.

A fairly small 'C'-language routine to operate as a generic replication
trigger should be achievable quite readily within these constraints, I
think.  I imagine it would be best to have such a routine write output
directly to log files, rather than to PostgreSQL tables, given the 8k
record size limitation and current problems holding binary data directly
in PostgreSQL columns.  This is unfortunate, as it would also introduce
concurrency complications for a busy database.  If you can guarantee
table record sizes under 2k you can probably get away with using
PostgreSQL tables for the replication data if you did some sort of
encoding of the raw record images.

This scheme worked really well (continues to work well) for about 4
years now. Conflicts are rare because although there are around 40
people using the application at various locations, they are all all
accessing fairly narrow record sets, especially for update.

In my replication we have left the actual transfer mechanism as far out
of the equation as possible.  In fact we used e-mail for the replication
messages.  When the sites were on dial-up modems we just did it once a
day, but now they are on DSL internet connections we do it much more
regularly.  That much reduces the numbers of collisions too, of course.

Feel free to pick my brains more on this - I should even be able to dig
out all of our design documentation on it from somewhere!

Regards,                Andrew McMillan
-- 
_____________________________________________________________________           Andrew McMillan, e-mail:
Andrew@cat-it.co.nz
Catalyst IT Ltd, PO Box 10-225, Level 22, 105 The Terrace, Wellington
Me: +64 (21) 635 694, Fax: +64 (4) 499 5596, Office: +64 (4) 499 2267