Обсуждение: Lost updates vs resumable connections/transactions

Поиск
Список
Период
Сортировка

Lost updates vs resumable connections/transactions

От
Jens Lechtenbörger
Дата:
Dear reader,

I've got the following simple wish: Please tell me how to program a
standard web application that allows to update tuples without lost
updates.

My guess is that this cannot be done easily right now, but that this
could be done very easily with just a slight API (libpq) extension,
which allows to tie a user/browser session to a database
transaction.  The remainder of this mail explains my view of the
situation in detail.  If I'm missing something fundamental then
please let me know...

I'd like to have a web application (PHP in my case) that starts a
transaction to display some tuples in a browser.  (This happens in
one script in an isolated transaction/connection.)
The user should then be able to update some values of a chosen tuple
and write the updated tuple back into the database.  (This happens
in a second script, which - due to the stateless nature of HTTP -
has to open a second database connection, so there is no link
between the database operations of script one and script two.)

I'd like to see this application under transaction control to avoid
race conditions and, in particular, the following kind of lost
update (which, according to Murphy, *will* happen with two
independent transactions/connections/scripts):
Two users select the same tuple t at about the same point in time,
user u1 updates t into t1 and writes it back, while user u2 updates
t into t2 and writes that version back, just a little after t1 has
been written.  This way, the update of user u1 gets lost.

In my view the above scenario is that of a standard multi-user web
application.  However, I couldn't find a simple solution to program
transactions spanning multiple scripts/HTTP requests.
(PHP persistent connections don't help, see, e.g., here:
http://www-fr.mysql.com/news-and-events/newsletter/2002-11/a0000000086.html
)

So, my first question: Is there a simple solution to program
transactions that span multiple scripts?
I'm aware of the following "heavyweight" solution:
1. Do some kind of explicit user level locking.  In my view, this is
redundant work and the wrong way as databases support transactions.
2. Program a proxy-process that actually handles the database
connections (instead of the HTTP server) and that allows to suspend a
transaction at the end of one script, which can be resumed at the
beginning of another script.  E.g. SQL Relay can do this:
http://sqlrelay.sourceforge.net/

So, my second question: What do the PostgreSQL maintainers think
about enriching the API to make it usable in a web environment,
where stateless HTTP servers stand between (and in the way of)
statefull applications and database servers?
This could be done with just two additional API calls, e.g.,
for libpq:
int PQsuspendConn(PGconn *conn) and
PGconn *PQresumeConn(const char *conninfo)

Here, I assume that PQsuspendConn gets called at the end of one
script, which tells the database that this connection and its
associated transaction will be idle for some time and returns an
identifier, say x.  Later on, in some following script the database
connection is opened via PQresumeConn, passing the result of
PQsuspendConn as an additional parameter, say "transactionID = x",
which tells the database to continue the previous transaction on the
new connection.  This way, one could tie an HTTP session to a single
transaction.  (Function PQresumeConn could even be avoided by adding
the new parameter transactionID to PQconnectdb.)

What do you think?

Jens

P.S.
1. I'm aware that a suspended transaction could block other
transactions indefinitely.  To get around that, timeout mechanisms
could be used.
2. There would be exactly one database connection per user session.
Since the vast majority of user connections are likely to be
suspended at any point in time (they are only active while a script
is being executed, not while the user is thinking), suspended
connections have to be handled "efficiently".


Re: Lost updates vs resumable connections/transactions

От
Jan Wieck
Дата:
On 12/9/2004 10:43 AM, Jens Lechtenbörger wrote:

> Dear reader,
> 
> I've got the following simple wish: Please tell me how to program a
> standard web application that allows to update tuples without lost
> updates.
> 
> My guess is that this cannot be done easily right now, but that this
> could be done very easily with just a slight API (libpq) extension,
> which allows to tie a user/browser session to a database
> transaction.  The remainder of this mail explains my view of the
> situation in detail.  If I'm missing something fundamental then
> please let me know...
> 
> I'd like to have a web application (PHP in my case) that starts a
> transaction to display some tuples in a browser.  (This happens in
> one script in an isolated transaction/connection.)
> The user should then be able to update some values of a chosen tuple
> and write the updated tuple back into the database.  (This happens
> in a second script, which - due to the stateless nature of HTTP -
> has to open a second database connection, so there is no link
> between the database operations of script one and script two.)
> 
> I'd like to see this application under transaction control to avoid
> race conditions and, in particular, the following kind of lost
> update (which, according to Murphy, *will* happen with two
> independent transactions/connections/scripts):
> Two users select the same tuple t at about the same point in time,
> user u1 updates t into t1 and writes it back, while user u2 updates
> t into t2 and writes that version back, just a little after t1 has
> been written.  This way, the update of user u1 gets lost.
> 
> In my view the above scenario is that of a standard multi-user web
> application.  However, I couldn't find a simple solution to program
> transactions spanning multiple scripts/HTTP requests.
> (PHP persistent connections don't help, see, e.g., here:
> http://www-fr.mysql.com/news-and-events/newsletter/2002-11/a0000000086.html
> )
> 
> So, my first question: Is there a simple solution to program
> transactions that span multiple scripts?
> I'm aware of the following "heavyweight" solution:
> 1. Do some kind of explicit user level locking.  In my view, this is
> redundant work and the wrong way as databases support transactions.

You underestimate the impact of your "solution". Not only does it need 
lock timeouts, you also get scaling problems. Your approach assumes that 
each and every browser session will have its own, unique database session.

The same reason why your Apache web server can handle 10000 simultaneous 
sessions with just 100 or so server processes is manifold true in the 
case of a database. 10000 DB sessions means 10000 PostgreSQL backends. 
Probably only 100 or so active at any given time, so there will be a lot 
of swapping going on, unless you put some ridiculous amount of RAM into 
place just to counter this design mistake.

Even applications that have statefull enduser terminals (like SAP R/3 
for example) never allow an open transaction over user interaction. They 
all use the "heavy" way of application level advisory locks (held in 
database tables of course). In all these systems, no particular 
operating system resource other than a socket and some memory is used by 
any idle/thinking user session.


> 2. Program a proxy-process that actually handles the database
> connections (instead of the HTTP server) and that allows to suspend a
> transaction at the end of one script, which can be resumed at the
> beginning of another script.  E.g. SQL Relay can do this:
> http://sqlrelay.sourceforge.net/
> 
> So, my second question: What do the PostgreSQL maintainers think
> about enriching the API to make it usable in a web environment,
> where stateless HTTP servers stand between (and in the way of)
> statefull applications and database servers?
> This could be done with just two additional API calls, e.g.,
> for libpq:
> int PQsuspendConn(PGconn *conn) and
> PGconn *PQresumeConn(const char *conninfo)

Since in the Apache case, it isn't even guaranteed that the next request 
in a session will be served by the same web server process, how exactly 
do you intend to park and later hand over the open socket for that 
connection. And please, in a portable way ;-)




Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Lost updates vs resumable connections/transactions

От
Greg Stark
Дата:
Jan Wieck <JanWieck@Yahoo.com> writes:

> Even applications that have statefull enduser terminals (like SAP R/3 for
> example) never allow an open transaction over user interaction. 

I'm not sure using SAP as your paragon of design excellence is a wise choice
here. From what I understand SAP implemented its own locking system because
the database it was based on didn't offer any locking at all.

But your basic point is sound. For a web site I would definitely avoid using
anything like database locks and even avoid doing anything with application
locks if possible.

If you really really want to expose the database session state I think he's on
the right track using SQLRelay. This would let him handle reconnecting a user
with her session even if she's connecting to a different Apache process.

I suspect the database wouldn't really be able to suspend a database
connection using any less memory than just keeping the entire backend process
with its session around anyways.

-- 
greg



Re: Lost updates vs resumable connections/transactions

От
Greg Stark
Дата:
Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> But why should I have SQLRelay between me and the database?
> I don't plan to use any of its "real" features.  It would just be a
> proxy with a known address that maintains a database connection.
> Obviously, the database server itself has a known address and
> maintains database connections...

The problem is that a database session keeps more state than just a
transaction id. It also keeps locks and other resources that would need to be
put aside in RAM. And for something like Apache where the new request could be
coming in to a different backend, these resources would have to be migrated to
the new backend when the session is resumed.

It could be done, but the most direct way to do it would be to keep around the
entire backend using something like SQLRelay. The resources used by SQLRelay
itself aren't really the concern here. They're dwarfed by the resources pinned
down by the database backends being kept around waiting in case the
application user comes back.

-- 
greg



Re: Lost updates vs resumable connections/transactions

От
Jan Wieck
Дата:
On 12/16/2004 8:52 AM, Jens Lechtenboerger wrote:
> Greg Stark <gsstark@mit.edu> writes:
> 
>> Jan Wieck <JanWieck@Yahoo.com> writes:
>>
>>> Even applications that have statefull enduser terminals (like SAP R/3 for
>>> example) never allow an open transaction over user interaction. 
>>
>> I'm not sure using SAP as your paragon of design excellence is a wise choice
>> here. From what I understand SAP implemented its own locking system because
>> the database it was based on didn't offer any locking at all.
>>
>> But your basic point is sound. For a web site I would definitely avoid using
>> anything like database locks and even avoid doing anything with application
>> locks if possible.
> 
> Well, I don't necessarily have to use locks.  I want any form of
> concurrency control that ensures serializability.  Optimistic
> approaches would be fine as well.
> 
>> If you really really want to expose the database session state I think he's on
>> the right track using SQLRelay. This would let him handle reconnecting a user
>> with her session even if she's connecting to a different Apache process.
> 
> But why should I have SQLRelay between me and the database?
> I don't plan to use any of its "real" features.  It would just be a
> proxy with a known address that maintains a database connection.
> Obviously, the database server itself has a known address and
> maintains database connections...

Because ...

actually knowing how the connection and session works on the PostgreSQL 
server side will help understanding it.

When your client application (in your case an Apache work process 
executing a PHP script) connects to the DB via libpq PQconnect(), it 
internally does socket(2), connect(2). On the server side, the 
postmaster, which had done socket(2), bind(2), listen(3) and currently 
wating on a select(2) will see that the server socket is ready. It will 
now call accept(2) to get the file descriptor of the server side end of 
the connection. After that it will fork(2) off a new process that will 
become your database server backend process handling your session.

Now the one (and only that one) Apache process, that is running your 
script, and that newly created backend (and only that one) have a socket 
each that are the connection for this session. No other process (well, 
we're not getting into IP spoofing or other hacker stuff please) can 
send or receive messages transmitted over this bidirectional channel. 
And there is no portable way to hand either side of the socket 
connection over to another process.

Now your PHP script ends. It can close the connection, which would 
result in an EOF condition on the server side, or leave it around for 
later reuse (persistent connection in PHP).

It doesn't really matter what it does, because now your user slams on 
the next submit button and "another" Apache work process will handle 
this new script request. Note that during your developer-is-single-user 
testing, you sometimes get the same Apache work process over and over 
again. But this isn't the case on a real production server under load.

And now what? How can this other Apache work process establish a new or 
take over the existing (persistent) connection that was/is used by the 
first Apache work process, which might not even exist any more because 
Apache dynamically adjusts the number of work processes to the current 
workload. Or that Apache process might be busy serving a 20MB PDF 
download for some time ... or whatever it does.

It doesn't help if you tell how the API call would preferrably look 
like. We need to know what you think this new libpq functions would do 
internally. What are the system calls that actually make this wonder happen?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Lost updates vs resumable connections/transactions

От
Jan Wieck
Дата:
On 12/17/2004 8:45 AM, Jens Lechtenboerger wrote:

> [...]
> When the connection is resumed, the postmaster creates a new socket
> with the (new) Apache process (just as you described above), gets
> the ID, informs the backend, and passes the new socket with
> sendmsg(2) to the backend which gets it with recvmsg(2).  (I have
> never passed sockets like this; I just read that it should work.)
> Now the backend can continue the existing transaction with the new
> Apache process.
> 
> Or not?

Yes, that would work ... on some but not all Unix derivates ... what 
about those where it does not? I know that Windows has a similar 
functionality available, but that would AFAIK require to use windows 
messages, which in turn requires to have a window handle for each and 
every backend, a rather dramatic change.

The PostgreSQL team members (me included) are big fans of portability. 
Introducing code that solves a problem for one specific web server, in 
the special case of a small number of application users, in a non 
portable way for only a couple operating systems and where the resulting 
functional difference is visible to the database client ... I don't 
think this idea has much of a chance to make it into the source tree.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Lost updates vs resumable connections/transactions

От
Jan Wieck
Дата:
On 12/17/2004 12:04 PM, Jens Lechtenboerger wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
> 
>> [...]
>> The PostgreSQL team members (me included) are big fans of
>> portability. Introducing code that solves a problem for one specific web
>> server,
> 
> I don't see how this is web server specific?
> 
>> in the special case of a small number of application users,
> 
> Actually, I'm not sure that this should bother you.  As I wrote
> previously: I can put just the same load on the database server
> using Java applets right now.  In fact, the load with applets would
> even be higher, as the database server does not receive any hint
> when the connection is idle for some time...
> 
>> in a non portable way for only a couple operating systems
> 
> I get this point.
> 
>> and where the resulting functional difference is visible to the
>> database client ...
> 
> If it wasn't visible, then it would be useless for me.

I meant that there would be a functional difference between databases 
running on different operating systems. Like "if you want connection 
suspend/resume to work your DB server must run on OS foo or bar ...".

> 
>> I don't think this idea has much of a chance to make it into the
>> source tree.
> 
> I'm disappointed, though, and summarize:
> PostgreSQL transactions cannot be used naturally with CGI/PHP, and
> virtually every web application out there is prone to lost updates.

As said, open transactions with DB locks during user interaction are a 
known bad idea for every sort of application. That together with the 
scaling problems is IMHO reason enough not to implement something that 
is designed to avoid proper application side advisory locks.

Get used to put reasonable amounts of your business logic into stored 
procedures on the database side and you will find that dealing with 
advisory locks is not as painfull as it looks like. Doing it all with 
PHP coding alone, where a single business process is scattered over a 
input form flow dictated number of source files is neither as easy, nor 
as maintainable.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Lost updates vs resumable connections/transactions

От
Greg Stark
Дата:
Jan Wieck <JanWieck@Yahoo.com> writes:

> As said, open transactions with DB locks during user interaction are a known
> bad idea for every sort of application. That together with the scaling problems
> is IMHO reason enough not to implement something that is designed to avoid
> proper application side advisory locks.

I think it's a higher level divergence of opinion here. What he's talking
about is more like the Java/J2EE approach of building lots of infrastructure
to make everything work magically. Our instincts are to keep things simple and
avoid big hammers for features that would be hard to manage.

His way lies things like session caches and the rest of the J2EE type world.
Things work the way the textbooks say they should, full ACID semantics are
maintained for sessions across multiple processes even multiple servers, and a
good implementation would let you set policies for when sessions expire and
transactions roll back.

Our way lies things like PHP/Perl with stateless individual requests. Every
request is limited to what it can do with limited state and maintaining all
the rep invariants of the database. 

Keeping things simple limits you a lot but it makes things so much easier to
manage. It's kind of like a Database version of the same principle described
here:

http://www.schlossnagle.org/~george/blog/archives/269_Why_PHP_Scales__A_Cranky_Snarky_Answer.html

> Get used to put reasonable amounts of your business logic into stored
> procedures on the database side and you will find that dealing with advisory
> locks is not as painfull as it looks like. Doing it all with PHP coding alone,
> where a single business process is scattered over a input form flow dictated
> number of source files is neither as easy, nor as maintainable.

Eh. You can do that just as easily in the application. You write functions
that issue all the database queries you want to call. Often you can get all of
this intelligence into just a few objects with well defined interfaces. If
you're lucky you can get nearly all of it into a single object that represents
an arbitrary database record, which subclasses for each table that add only a
little intelligence.

I find trying to stuff things into stored procedures actually makes it
*harder* to abstract all this stuff away and harder to maintain. Languages
like PL/SQL or plpgsql are so limited and awkward that it's hard to express
anything more than straightforward procedural functions. They end up being
simple wrappers around the same SQL calls you would have made anyways.

Either way, the one thing you're trying to avoid is having SQL strewn about
every file. You want to push all the SQL down below an abstraction barrier so
that everything dictated by the front-end UI is well separated from the stuff
dictated by the database. The more abstract the interface between the two
layers the better. But how abstract it can be depends heavily on the
application.

-- 
greg



Re: Lost updates vs resumable connections/transactions

От
Jens Lechtenboerger
Дата:
Jan Wieck <JanWieck@Yahoo.com> writes:

> On 12/17/2004 12:04 PM, Jens Lechtenboerger wrote:
>
>> Jan Wieck <JanWieck@Yahoo.com> writes:
>>
>>> [...]
>>> I don't think this idea has much of a chance to make it into the
>>> source tree.
>> I'm disappointed, though, and summarize:
>> PostgreSQL transactions cannot be used naturally with CGI/PHP, and
>> virtually every web application out there is prone to lost updates.
>
> As said, open transactions with DB locks during user interaction are a known
> bad idea for every sort of application. That together with the scaling
> problems is IMHO reason enough not to implement something that is designed to
> avoid proper application side advisory locks.

After writing my last mail, I was riding my bicycle through the rain
and thought about the following additional application areas for an
API extension:

Grid computing: I might want to transfer a transaction from an
overloaded client node to another node.  Currently, this is not
possible.

Web services: I might want to compose an atomic service out of
component services over a single database.  In contrast to my
previous web example, in this scenario there needn't be user
interaction between the individual service invocations.

> Get used to put reasonable amounts of your business logic into stored
> procedures on the database side and you will find that dealing with advisory
> locks is not as painfull as it looks like. Doing it all with PHP coding alone,
> where a single business process is scattered over a input form flow dictated
> number of source files is neither as easy, nor as maintainable.

There is a fundamental problem.  It's not about "scattered" business
processes but about the very simple "look and edit" process that is
not supported.

Jens



Re: Lost updates vs resumable connections/transactions

От
Jens Lechtenboerger
Дата:
Jan Wieck <JanWieck@Yahoo.com> wrote:

> On 12/9/2004 10:43 AM, Jens Lechtenbörger wrote:
>
>> Dear reader,
>> I've got the following simple wish: Please tell me how to program a
>> standard web application that allows to update tuples without lost
>> updates.
>> My guess is that this cannot be done easily right now, but that this
>> could be done very easily with just a slight API (libpq) extension,
>> which allows to tie a user/browser session to a database
>> transaction.  The remainder of this mail explains my view of the
>> situation in detail.  If I'm missing something fundamental then
>> please let me know...
>>
>> [Some details from my original mail deleted...]
>>
>> So, my first question: Is there a simple solution to program
>> transactions that span multiple scripts?
>> I'm aware of the following "heavyweight" solution:
>> 1. Do some kind of explicit user level locking.  In my view, this is
>> redundant work and the wrong way as databases support transactions.
>
> You underestimate the impact of your "solution". Not only does it need lock
> timeouts, you also get scaling problems. Your approach assumes that each and
> every browser session will have its own, unique database session.

Yes.

> The same reason why your Apache web server can handle 10000 simultaneous
> sessions with just 100 or so server processes is manifold true in the case of
> a database. 10000 DB sessions means 10000 PostgreSQL backends. Probably only
> 100 or so active at any given time, so there will be a lot of swapping going
> on, unless you put some ridiculous amount of RAM into place just to counter
> this design mistake.

Actually, I assume just a few concurrent users, not 10000.

> Even applications that have statefull enduser terminals (like SAP R/3 for
> example) never allow an open transaction over user interaction. They all use
> the "heavy" way of application level advisory locks (held in database tables
> of course). In all these systems, no particular operating system resource
> other than a socket and some memory is used by any idle/thinking user session.

I got your point.  Nevertheless, my point is the following: If I
write my application, e.g., as Java applets, then I can have one
database connection per user (with a controlled number of concurrent
users).  I wonder about the same thing with CGI/PHP.

>> 2. Program a proxy-process that actually handles the database
>> connections (instead of the HTTP server) and that allows to suspend a
>> transaction at the end of one script, which can be resumed at the
>> beginning of another script.  E.g. SQL Relay can do this:
>> http://sqlrelay.sourceforge.net/
>> So, my second question: What do the PostgreSQL maintainers think
>> about enriching the API to make it usable in a web environment,
>> where stateless HTTP servers stand between (and in the way of)
>> statefull applications and database servers?
>> This could be done with just two additional API calls, e.g.,
>> for libpq:
>> int PQsuspendConn(PGconn *conn) and
>> PGconn *PQresumeConn(const char *conninfo)
>
> Since in the Apache case, it isn't even guaranteed that the next request in a
> session will be served by the same web server process, how exactly do you
> intend to park and later hand over the open socket for that connection. And
> please, in a portable way ;-)

I simply assume that I'm in an environment where session tracking is
supported.  Then the int value returned by PQsuspendConn will just
be part of session data.  E.g., in case of PHP there won't be any
problem.

I guess that many people will be happy with this assumption.

Jens



Re: Lost updates vs resumable connections/transactions

От
Christopher Browne
Дата:
gsstark@mit.edu (Greg Stark) writes:
> Jan Wieck <JanWieck@Yahoo.com> writes:
>> Even applications that have statefull enduser terminals (like SAP R/3 for
>> example) never allow an open transaction over user interaction. 
>
> I'm not sure using SAP as your paragon of design excellence is a
> wise choice here. From what I understand SAP implemented its own
> locking system because the database it was based on didn't offer any
> locking at all.

I don't think "design excellence" is the point; I'd point at Sabre/STS
as another example of "never an open transaction").

You (as a travel agent, out at the "client level") don't get to open
up user level transactions to the airline reservation system.

Instead, you fill a 3270 screen or some part thereof with data, and
then submit an "update."

As a result, each request is a fully-formed transaction, if a small
one.

> But your basic point is sound. For a web site I would definitely
> avoid using anything like database locks and even avoid doing
> anything with application locks if possible.

> If you really really want to expose the database session state I
> think he's on the right track using SQLRelay. This would let him
> handle reconnecting a user with her session even if she's connecting
> to a different Apache process.

We've got some web apps where it would be kind of nice to keep a
cursor around for a little while.

But there have been plenty of times when I have had to rue the fact of
transactions being held open for a long time by one thing or
another...
-- 
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)


Re: Lost updates vs resumable connections/transactions

От
Jens Lechtenboerger
Дата:
Dear Jan and Greg,

thank you very much for your explanations so far!
I'll just continue ;)

Jan Wieck <JanWieck@Yahoo.com> writes:

> On 12/16/2004 8:52 AM, Jens Lechtenboerger wrote:
>> Greg Stark <gsstark@mit.edu> writes:
>>> Jan Wieck <JanWieck@Yahoo.com> writes:
>>>
>>> If you really really want to expose the database session state I think he's on
>>> the right track using SQLRelay. This would let him handle reconnecting a user
>>> with her session even if she's connecting to a different Apache process.
>> But why should I have SQLRelay between me and the database?
>> I don't plan to use any of its "real" features.  It would just be a
>> proxy with a known address that maintains a database connection.
>> Obviously, the database server itself has a known address and
>> maintains database connections...
>
> Because ...
>
> actually knowing how the connection and session works on the PostgreSQL server
> side will help understanding it.
>
> When your client application (in your case an Apache work process executing a
> PHP script) connects to the DB via libpq PQconnect(), it internally does
> socket(2), connect(2). On the server side, the postmaster, which had done
> socket(2), bind(2), listen(3) and currently wating on a select(2) will see
> that the server socket is ready. It will now call accept(2) to get the file
> descriptor of the server side end of the connection. After that it will
> fork(2) off a new process that will become your database server backend
> process handling your session.
>
> [...Explanation about subsequent Apaches processes communicating over
> different sockets deleted...]
>
> It doesn't help if you tell how the API call would preferrably look like. We
> need to know what you think this new libpq functions would do internally. What
> are the system calls that actually make this wonder happen?

What about the following?

When a connection is established for the first time, a unique ID is
generated by the postmaster, which is passed on to the database
backend process.  The postmaster keeps a map about IDs and
associated backends.

When the connection is suspended, the ID is returned from the
backend to the Apache process, which stores it as part of the
session.  The socket is destroyed.  Then, (I don't know if/how
postmaster and backends communicate) the backend tells the
postmaster that its connection is suspended which is recorded in the
map kept at the postmaster.  Then the backend blocks until informed
by the postmaster that the connection should be resumed.
(Now, as Greg pointed out, this requires keeping resources such as
locks in memory.  Again: I don't assume that this mechanism is used
with thousands of users.  Programmers can be warned.)

When the connection is resumed, the postmaster creates a new socket
with the (new) Apache process (just as you described above), gets
the ID, informs the backend, and passes the new socket with
sendmsg(2) to the backend which gets it with recvmsg(2).  (I have
never passed sockets like this; I just read that it should work.)
Now the backend can continue the existing transaction with the new
Apache process.

Or not?

Jens



Re: Lost updates vs resumable connections/transactions

От
Jens Lechtenboerger
Дата:
Greg Stark <gsstark@mit.edu> writes:

> Jan Wieck <JanWieck@Yahoo.com> writes:
>
>> Even applications that have statefull enduser terminals (like SAP R/3 for
>> example) never allow an open transaction over user interaction. 
>
> I'm not sure using SAP as your paragon of design excellence is a wise choice
> here. From what I understand SAP implemented its own locking system because
> the database it was based on didn't offer any locking at all.
>
> But your basic point is sound. For a web site I would definitely avoid using
> anything like database locks and even avoid doing anything with application
> locks if possible.

Well, I don't necessarily have to use locks.  I want any form of
concurrency control that ensures serializability.  Optimistic
approaches would be fine as well.

> If you really really want to expose the database session state I think he's on
> the right track using SQLRelay. This would let him handle reconnecting a user
> with her session even if she's connecting to a different Apache process.

But why should I have SQLRelay between me and the database?
I don't plan to use any of its "real" features.  It would just be a
proxy with a known address that maintains a database connection.
Obviously, the database server itself has a known address and
maintains database connections...

Jens



Re: Lost updates vs resumable connections/transactions

От
Jens Lechtenboerger
Дата:
Jan Wieck <JanWieck@Yahoo.com> writes:

> [...]
> The PostgreSQL team members (me included) are big fans of
> portability. Introducing code that solves a problem for one specific web
> server,

I don't see how this is web server specific?

> in the special case of a small number of application users,

Actually, I'm not sure that this should bother you.  As I wrote
previously: I can put just the same load on the database server
using Java applets right now.  In fact, the load with applets would
even be higher, as the database server does not receive any hint
when the connection is idle for some time...

> in a non portable way for only a couple operating systems

I get this point.

> and where the resulting functional difference is visible to the
> database client ...

If it wasn't visible, then it would be useless for me.

> I don't think this idea has much of a chance to make it into the
> source tree.

I'm disappointed, though, and summarize:
PostgreSQL transactions cannot be used naturally with CGI/PHP, and
virtually every web application out there is prone to lost updates.

Thank you very much for your feedback anyways.

Jens



Re: Lost updates vs resumable connections/transactions

От
Jan Wieck
Дата:
On 12/17/2004 1:10 PM, Greg Stark wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
> 
>> As said, open transactions with DB locks during user interaction are a known
>> bad idea for every sort of application. That together with the scaling problems
>> is IMHO reason enough not to implement something that is designed to avoid
>> proper application side advisory locks.
> 
> I think it's a higher level divergence of opinion here. What he's talking
> about is more like the Java/J2EE approach of building lots of infrastructure
> to make everything work magically. Our instincts are to keep things simple and
> avoid big hammers for features that would be hard to manage.

ANSI SQL doesn't specify any lock timeouts or possibility to cancel 
someone elses transaction, heck it doesn't even define any way to find 
out what particular lock is held by what session. Maybe a database 
system that is oriented towards standard conformance isn't the right 
choice for this environment.

Standard conformant SQL databases do not work well with open 
transactions holding locks during user interactions. That is a fact. I 
would object to adding proprietary features to PostgreSQL so that the 
application developer can use some sort of SQL looking gibberish instead 
of creating the appropriate framework. I am however only one developer, 
and if other key members of the developer community are in favor for it, 
I can certainly ignore such feature - given it wouldn't affect the rest 
of PostgreSQL's features or performance as long as not used.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #