Thread: WIP partial replication patch

WIP partial replication patch

From:
Boszormenyi Zoltan
Date:
Hi,

attached is a WIP patch that will eventually implement
partial replication, with the following syntax:

CREATE REPLICA CLASS classname
    [ EXCLUDING RELATION ( relname [ , ... ] ) ]
    [ EXCLUDING DATABASE ( dbname [ , ... ] ) ]

ALTER REPLICA CLASS classname
    [ { INCLUDING | EXCLUDING } RELATION ( relname [ , ... ] ) ]
    [ { INCLUDING | EXCLUDING } DATABASE ( dbname [ , ... ] ) ]
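For illustration, the proposed syntax might be used like this (the class, relation, and database names are made up, and none of this was ever committed):

```sql
-- Replicate everything except a bulky log table and a scratch database:
CREATE REPLICA CLASS reporting_standby
    EXCLUDING RELATION ( audit_log )
    EXCLUDING DATABASE ( scratchdb );

-- Later, pull audit_log back into the replicated set:
ALTER REPLICA CLASS reporting_standby
    INCLUDING RELATION ( audit_log );
```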

The use case is a secondary server where read-only access is allowed
but (perhaps for space reasons) some tables and databases are excluded
from replication. The standby server keeps those tables at the state of
the last full backup, but no further modifications are applied to them.

The current patch adds two new global system tables, pg_replica and
pg_replicaitem, and three new indexes to maintain the classes and their
contents.

The startup process in standby mode connects to a new database called
"replication", which is created at initdb time. This is needed because a
transaction context is required to access the syscache for the new tables.

There is one nasty detail in the patch as it stands. The RelFileNode
triplet is currently treated as if it carried the relation OID, but that
is not actually true: RelFileNode contains the relfilenode ID. Initially,
before any table-rewriting DDL, the OID equals the relfilenode, which was
enough for a proof-of-concept patch. I will need to extend the relmapper
so it can carry more than one database's "database-local" mapping, so the
filter can work on all databases at once. To do this, every database's
pg_class would have to be read initially and re-read during relmapper
cache invalidation. As a side note, this work may also serve as a basis
for full cross-database relation access.
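The OID-vs-relfilenode divergence described above is easy to observe on a live server (the table and index names here are hypothetical, and the numeric values are illustrative):

```sql
-- A freshly created table starts out with oid = relfilenode:
SELECT oid, relfilenode FROM pg_class WHERE relname = 'audit_log';
--   oid  | relfilenode
--  16384 |       16384

-- Any rewriting DDL (CLUSTER, VACUUM FULL, TRUNCATE, ...) assigns
-- a new storage file:
CLUSTER audit_log USING audit_log_pkey;

SELECT oid, relfilenode FROM pg_class WHERE relname = 'audit_log';
--   oid  | relfilenode
--  16384 |       16501   -- oid unchanged, relfilenode now differs
```

After the first such rewrite, a filter that compares RelFileNode.relNode directly against table OIDs stops matching, which is why the relmapper extension is needed.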

Best regards,
Zoltán Böszörményi


Attachments

Re: WIP partial replication patch

From:
Tom Lane
Date:
Boszormenyi Zoltan <zb@cybertec.at> writes:
> attached is a WIP patch that will eventually implement
> partial replication, with the following syntax:

This fundamentally cannot work, as it relies on system catalogs to be
valid during recovery.  Another rather basic problem is that you've
got to pass system catalog updates downstream (in case they affect
the tables being replicated) but if you want partial replication then
many of those updates will be incorrect for the slave machine.

More generally, though, we are going to have our hands full for the
foreseeable future trying to get the existing style of replication
bug-free and performant.  I don't think we want to undertake any large
expansion of the replication feature set, at least not for some time
to come.  So you can count on me to vote against committing anything
like this into core.
        regards, tom lane


Re: WIP partial replication patch

From:
Boszormenyi Zoltan
Date:
Tom Lane wrote:
> Boszormenyi Zoltan <zb@cybertec.at> writes:
>   
>> attached is a WIP patch that will eventually implement
>> partial replication, with the following syntax:
>>     
>
> This fundamentally cannot work, as it relies on system catalogs to be
> valid during recovery.

Just like Hot Standby, no? What is the difference here?
Sorry for being ignorant.

>   Another rather basic problem is that you've
> got to pass system catalog updates downstream (in case they affect
> the tables being replicated) but if you want partial replication then
> many of those updates will be incorrect for the slave machine.
>   

Yes, that's true. But there is an easy solution: querying such tables
can be forbidden; we were talking about truncating the excluded relations
internally. Currently, querying excluded tables is allowed just so one
can see that DML indeed doesn't modify them. As I said, at the moment
it's only a proof-of-concept patch.

> More generally, though, we are going to have our hands full for the
> foreseeable future trying to get the existing style of replication
> bug-free and performant.  I don't think we want to undertake any large
> expansion of the replication feature set, at least not for some time
> to come.  So you can count on me to vote against committing anything
> like this into core.
>   

Understood.

Best regards,
Zoltán Böszörményi



Re: WIP partial replication patch

From:
Andres Freund
Date:
On Fri, Aug 13, 2010 at 09:36:00PM +0200, Boszormenyi Zoltan wrote:
> Tom Lane wrote:
> > Boszormenyi Zoltan <zb@cybertec.at> writes:
> >
> >> attached is a WIP patch that will eventually implement
> >> partial replication, with the following syntax:
> > This fundamentally cannot work, as it relies on system catalogs to be
> > valid during recovery.
> Just like Hot Standby, no? What is the difference here?
> Sorry for being ignorant.
In HS you can only connect after you've found a restartpoint - only
after that do you know that you have reached a consistent point for the
system.

I think this is fixable by keeping more WAL on the standbys, but I
need to think more about it.

Andres


Re: WIP partial replication patch

From:
Josh Berkus
Date:
>  Another rather basic problem is that you've
> got to pass system catalog updates downstream (in case they affect
> the tables being replicated) but if you want partial replication then
> many of those updates will be incorrect for the slave machine.

Couldn't this be taken care of by replicating the objects but not any
data for them?  That is, the tables and indexes would exist, but be empty?

> More generally, though, we are going to have our hands full for the
> foreseeable future trying to get the existing style of replication
> bug-free and performant.  I don't think we want to undertake any large
> expansion of the replication feature set, at least not for some time
> to come.  So you can count on me to vote against committing anything
> like this into core.

I imagine it'll take more than a year to get this to work, if we ever
do.  Probably good to put it on a git branch so that those who want to
can continue long-term work on it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Re: WIP partial replication patch

From:
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> Another rather basic problem is that you've
>> got to pass system catalog updates downstream (in case they affect
>> the tables being replicated) but if you want partial replication then
>> many of those updates will be incorrect for the slave machine.

> Couldn't this be taken care of by replicating the objects but not any
> data for them?  That is, the tables and indexes would exist, but be empty?

Seems a bit pointless.  What exactly is the use-case for a slave whose
system catalogs match the master exactly (as they must) but whose data
does not?

Notice also that you have to shove the entire WAL downstream anyway ---
the proposed patch filters at the point of application, and would have a
hard time doing better because LSNs have to remain consistent.

It would also be rather tricky to identify which objects have to have
updates applied, eg, if you replicate a table you'd damn well better
replicate the data for each and every one of its indexes (which is a
non-constant set in general), because queries on the slave will expect
them all to be valid.  Maybe it's possible to keep track of that, though
I bet things will be tricky when there are uncommitted DDL changes
(consider data changes associated with a CREATE INDEX on a replicated
table).  In any case xlog replay functions are not the place to have
that kind of logic.
        regards, tom lane


Re: WIP partial replication patch

From:
Boszormenyi Zoltan
Date:
Andres Freund wrote:
> On Fri, Aug 13, 2010 at 09:36:00PM +0200, Boszormenyi Zoltan wrote:
>   
>> Tom Lane wrote:
>>     
>>> Boszormenyi Zoltan <zb@cybertec.at> writes:
>>>
>>>       
>>>> attached is a WIP patch that will eventually implement
>>>> partial replication, with the following syntax:
>>>>         
>>> This fundamentally cannot work, as it relies on system catalogs to be
>>> valid during recovery.
>>>       
>> Just like Hot Standby, no? What is the difference here?
>> Sorry for being ignorant.
>>     
> In HS you can only connect after you've found a restartpoint - only
> after that you know that you have reached a consistent point for the
> system.
>   

And in this patch, the startup process only tries to connect
after signalling the postmaster that a consistent state is reached.
And the connection has a reasonable timeout built in.

> I think this is fixable by keeping more wal on the standby's but I
> need to think more about it.
>
> Andres
>
>   

Best regards,
Zoltán Böszörményi



Re: WIP partial replication patch

From:
Andres Freund
Date:
On Sat, Aug 14, 2010 at 08:40:24AM +0200, Boszormenyi Zoltan wrote:
> Andres Freund wrote:
> > On Fri, Aug 13, 2010 at 09:36:00PM +0200, Boszormenyi Zoltan wrote:
> >
> >> Tom Lane wrote:
> >>
> >>> Boszormenyi Zoltan <zb@cybertec.at> writes:
> >>>
> >>>
> >>>> attached is a WIP patch that will eventually implement
> >>>> partial replication, with the following syntax:
> >>>>
> >>> This fundamentally cannot work, as it relies on system catalogs to be
> >>> valid during recovery.
> >>>
> >> Just like Hot Standby, no? What is the difference here?
> >> Sorry for being ignorant.
> >>
> > In HS you can only connect after you've found a restartpoint - only
> > after that you know that you have reached a consistent point for the
> > system.
> >
> And in this patch, the startup process only tries to connect
> after signalling the postmaster that a consistent state is reached.
> And the connection has a reasonable timeout built in.
I don't think you can currently guarantee that you always have enough
local WAL to even reach a consistent point. Which is not a problem with
your patch, don't get me wrong...

Andres


Re: WIP partial replication patch

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On Sat, Aug 14, 2010 at 08:40:24AM +0200, Boszormenyi Zoltan wrote:
>> And in this patch, the startup process only tries to connect
>> after signalling the postmaster that a consistent state is reached.
>> And the connection has a reasonable timeout built in.

> I don't think you can currently guarantee that you always have enough
> local WAL to even reach a consistent point.

Even if you do, the patch will malfunction (and perhaps corrupt the
database) while reading that WAL.  Yes, it'd work once you reach a
consistent database state, but bootstrapping a slave into that
condition will be far more painful than it is with the current
replication code.
        regards, tom lane