Discussion: RE: [GENERAL] Transaction logging
What would it take to have transaction logging added to Postgres? I am a
c/c++ programmer and will consider contributing to the Postgres development
effort. I really like everything I see and read about Postgres. As a
result, I am strongly considering Postgres as the database engine for my
Membership database application. My customer is willing to invest in a
commercial database, but most of the commercial databases I have briefly
looked at fall a little short in one way or another. I have several
concerns/needs that I am willing to implement and/or support:
- Outer join support in views
- Transaction logging
- Some form of mirroring, shadowing, or replication
- The current locking mechanism is of some concern. I need to make
sure that one user can read a record and then a second can read and update
that same record.
- If the first user attempts to update that record, what happens?
I know some of these requests are currently being worked on; it would be
helpful to get some idea of when these items are expected to be released.
Thanks, Michael
-----Original Message-----
From: sdupille@i-france.com [SMTP:sdupille@i-france.com]
Sent: Tuesday, February 23, 1999 6:08 AM
To: pgsql-general@postgreSQL.org
Subject: Re: [GENERAL] Transaction logging
Hi !
Peter T Mount <peter@retep.org.uk> writes:
> > Has anyone implemented transaction logging in Postgres? Any
> > suggestions on how to easily implement transaction logging? Storing
> > the log file in a text file seems best, but I am not sure how to open
> > and write to a text file from a trigger. It would also be nice to
> > post this transaction log against a backup server.
> Just a quick thought, but how about using syslog? That can be used to
> post queries to a remote server, and it can be told to store the
> "postgres" stuff to a separate file on that server.
> Just an idea...
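For concreteness, Peter's suggestion would amount to something like this
minimal sketch (the facility, the function name and the call site are
assumptions, not existing Postgres code):

    #include <syslog.h>

    /* Hypothetical hook, called once per query string.  syslog.conf on
     * the log host could then route the LOG_LOCAL0 facility to its own
     * file and/or forward it to a remote server. */
    static void
    log_query_to_syslog(const char *query)
    {
        openlog("postgres", LOG_PID, LOG_LOCAL0);
        syslog(LOG_INFO, "%s", query);
        closelog();
    }
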
Why not, but I think it's a bad idea. Syslog is used to log
events coming from the system. It stores every connection to the
system, and any event which can affect the system (such as a power
shutdown).

Transaction logging is a different kind of log: it must store
every transaction made to the database, and in case of deletion of
records, or changes to data, it must save the old values. So it
generates a lot of traffic, and is closely dependent on the database
system.

Syslog is not robust enough to deal with so much data, and the
use of an external process for transaction logging would generate
too much traffic (the cost in time would be too high). The logging
facility should, I think, be provided by the database itself.

One more thing: the logging facility is used (mainly) to recover
the database after a crash. This kind of log _must_ be easy to use
in case of a crash. Syslog is fine when you want to know what
happened, but not for dealing with the problem. Don't forget that
syslog adds some data to the events we send to it (such as the
sender and the date of the message). In a recovery driven by the
transaction log, this data is noise, which makes the recovery (a
little bit) harder.

I don't think that we could get a logging facility with the use
of triggers. I think it would be better to hack the Postgres
backend and supersede the access to the SQL primitives (like insert
or update). It would be a little harder to implement, but faster
and totally transparent to the user.
regards.
--
___
{~._.~} Stephane - DUST - Dupille
( Y ) You were dust and you shall turn into dust
()~*~() email : sdupille@i-france.com
(_)-(_)
> What would it take to have transaction logging added to Postgres? I am a
> c/c++ programmer and will consider contributing to the Postgres development
> effort. I really like everything I see and read about Postgres. As a
> result, I am strongly considering Postgres as the database engine for my
> Membership database application. My customer is willing to invest in a
> commercial database, but most of the commercial databases I have briefly
> looked at fall a little short in one way or another. I have several
> concerns/needs that I am willing to implement and/or support:
>
> - Outer join support in views

In the works. Perhaps for 6.5, probably not.

> - Transaction logging
>
> - Some form of mirroring, shadowing, or replication
>
> - The current locking mechanism is of some concern. I need to make
> sure that one user can read a record and then a second can read and update
> that same record.

MVCC locking in 6.5. Will do what you need.

> - If the first user attempts to update that record, what happens?

Hard to explain. It will wait, or update a copy, while reads use an older
copy of the row.

--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Michael Davis wrote:
>
> What would it take to have transaction logging added to Postgres. I am a
> c/c++ programmer and will consider contributing to the Postgres development
> effort. I really like everything I see and read about Postgres. As a
I spent some time on transaction logging since it's a feature
I'm missing too. There are mainly two different transaction
log mechanisms out there:

1. Log queries sent to the backend.

2. Log images of inserted/updated rows and the row IDs of
   deleted ones.

The query-level logging will write less information if
queries usually affect a large number of rows. Unfortunately,
the extensibility of Postgres works against this approach.
There could be any number of user-written functions whose
results aren't reproducible during recovery. And core
features of Postgres itself would introduce the same problem.
Consider a sequence which is used to create default values for
multiple tables, so that one ID is unique across them. Now
two backends insert (with INSERT ... SELECT) concurrently
into different tables using the same sequence. It's a
classic race condition, and it depends on context switching
and page faults which backend will get which sequence
numbers. You cannot foresee it and you cannot reproduce it, unless
you hook into the sequence generator and log this too. Later,
when recovering, another hook into the sequence generator
must reproduce the logged results on a per
backend/transaction/command basis, and the same must be done
for each function that usually returns unreproducible
results (anything dealing with time, PIDs, etc.).

As said, this must also cover user functions. So at least
there must be a general log API that provides such
functionality for user-written functions.
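A rough sketch of what such a hook pair around the sequence generator
could look like (the helper names, the log record layout and the recovery
flag are invented for illustration, not existing Postgres code):

    /* Sketch only: wrap nextval() so the handed-out values can be
     * replayed during recovery.  record_in_log(), fetch_from_log(),
     * real_nextval() and InRecoveryMode are hypothetical; the
     * TransactionId/CommandId/Oid typedefs are the backend's own. */
    typedef struct SeqLogRecord
    {
        TransactionId   xid;        /* transaction that asked           */
        CommandId       cid;        /* command within that transaction  */
        Oid             seq_relid;  /* which sequence                   */
        long            value;      /* the value that was handed out    */
    } SeqLogRecord;

    long
    logged_nextval(Oid seq_relid)
    {
        SeqLogRecord    rec;

        if (InRecoveryMode)
            return fetch_from_log(seq_relid);   /* replay the old result */

        rec.xid = GetCurrentTransactionId();
        rec.cid = GetCurrentCommandId();
        rec.seq_relid = seq_relid;
        rec.value = real_nextval(seq_relid);    /* the normal code path  */
        record_in_log(&rec);
        return rec.value;
    }
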
The image logging approach also has problems. First, the only
thing given to the heap access methods to outdate a tuple on
update/delete is the current tuple ID (information that tells
which tuple in which block is meant). So you need to save
the database files in binary format, because during the
currently existing dump/restore this could change and the
logged CTIDs would hit the wrong tuples.

Second, you must remember in the log which transaction ID
this information came from and, later, whether the transaction
committed or not, so the recovery can set this commit/abort
information in pg_log too. pg_log is a shared system file and
the transaction IDs are unique only for one server. Using
this information for online replication of a single database
to another Postgres installation will not work.
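To make that concrete, an image-level log record would have to carry at
least something like the following (this struct is purely illustrative):

    /* Illustrative only: minimum contents of a row-image log record. */
    typedef struct ImageLogRecord
    {
        TransactionId   xid;     /* local xid -- meaningless on another server */
        Oid             dbid;    /* database the change belongs to             */
        Oid             relid;   /* table the change belongs to                */
        char            action;  /* 'I'nsert, 'U'pdate or 'D'elete             */
        ItemPointerData ctid;    /* physical tuple id -- only valid against a  */
                                 /* binary copy of the original database files */
        /* ... followed by the new tuple image for insert/update ... */
    } ImageLogRecord;
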
Third, there are still some shared system catalogs across all
databases (pg_database, pg_group, pg_log!!!, pg_shadow and
pg_variable). Due to that it would be impossible (or at least
very, very tricky) to restore/recover (maybe point in time)
one single database. If you destroy one database and restore
it from the binary backup, these shared catalogs cannot be
restored too, so they're out of sync with the backup time.
How should the recovery then hit the right things (which
probably must not be there at all)?

All this is really a mess. I think the architecture of
Postgres will only allow something on the query level, with some
general API for things that must reproduce the same result
during recovery. Take time(), for example. Inside the backend,
time() should never be called directly. Instead, another
function is to be called that, during normal operation, logs
which time gets returned by this particular function call
and, if the backend is in recovery mode, returns the value
from the log.
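A minimal sketch of such a wrapper (the log helpers and the recovery flag
are assumptions, not existing code):

    #include <time.h>

    /* Sketch: every call site in the backend would use logged_time()
     * instead of time().  write_time_to_log(), read_time_from_log()
     * and InRecoveryMode are hypothetical. */
    static time_t
    logged_time(void)
    {
        time_t  t;

        if (InRecoveryMode)
            return read_time_from_log();    /* replay the original value */

        t = time(NULL);
        write_time_to_log(t);               /* remember it for a redo    */
        return t;
    }
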
And again, this all means trouble. Usually, most queries sent
to the database don't change any data because they are
SELECTs. It would dramatically blow up the log volume if you
logged ALL queries instead of only those that modify things. But
when the query begins, you don't know this, because a SELECT
might call a function that uses SPI to UPDATE something else.
So the decision whether the query must be logged or not can only
be made when the query is done (by having some global
variable where the heap access methods set a flag that
something got written). Now you have to log function calls
like time() even if the query will not modify a single row
in the database, because the query is a

    SELECT 'now'::datetime - updtime FROM ...

Doing this on a table with thousands of rows will definitely
waste a lot of logging space and slow the whole thing down with
unnecessary logging.
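Roughly, that end-of-query decision could look like this (the flag and
the helper functions are invented for illustration):

    #include <stdbool.h>    /* or the backend's own bool type */

    /* Set by the heap access methods whenever a tuple is written. */
    static bool query_modified_data = false;

    /* Called when the query is done: only now do we know whether the
     * collected query text and time()/sequence values must be kept.
     * flush_pending_log_entries() and discard_pending_log_entries()
     * are hypothetical. */
    static void
    finish_query_logging(const char *query_text)
    {
        if (query_modified_data)
            flush_pending_log_entries(query_text);
        else
            discard_pending_log_entries();

        query_modified_data = false;
    }
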
Maybe it's a compromise if, at each query start, the current
time and other such information is remembered by the backend,
all time() calls return this remembered value instead of the
real one (which wouldn't be bad anyway IMHO), and this information
is logged only if the query is to be logged.

Finally, I think I must have missed some more problems, but
aren't these enough already to frustrate you :-?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
Added to TODO:
* Transaction log, so re-do log can be on a separate disk by
logging SQL queries, or before/after row images
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026