Thread: Re: PostgresDataSource Question

Re: PostgresDataSource Question

From
Kovács Péter
Date:
Ned,

I am afraid that (with my effective contribution :-)) things have got mixed
up a bit here. So I will try to sort it out.

First of all: the XADataSourceImpl does some kind of pooling, but it is
probably not intended to implement the functionality you want. I will
explain what this sort of pooling does, but before doing so I'd like to make
another statement that has not yet been made: support for connection pooling
and support for distributed transactions can be implemented separately. So
you do not need to bother with the xa package at all if you want to support
connection pooling, or even if you want to _implement_ connection pooling
(in my view there is a difference between supporting connection pooling and
implementing connection pooling). The reason why I was kind of pushing the
xa package is that people on this list keep talking about the incompleteness
of Exoffice's xa package -- and they are right from their perspective. But
if you look at it another way, this is good stuff -- and it is clearly my
fault that I did not fully explain my perspective.

I think that first I should explain my view. What I really need is a way to
handle transactions in an environment where there are only local
transactions. Furthermore, I would like to do it so that the transaction
handling mechanism is abstract enough for me to be able to replace the
database server at will. I want it to be abstract enough to let me use not
only JDBC but also other interfaces (OODBMSs have completely different
client interfaces). In a world where you have only JDBC with local
transactions, you can happily pass around Connection instances, each
representing exactly one local transaction. But if you want your code to be
more generic, you will want something more abstract.

What I said above may be obvious, but a brief explanation may be helpful. I
am using a simple model here: I partition my application into two layers, a
business logic layer (BLL) and a data access layer (DAL). When I was talking
about "more generic code" in the previous paragraph I meant the BLL. Let me
explain further. With any real-life application I will want to start and end
transactions in the BLL. [I could implement interfaces in the DAL that span
a whole transaction (where atomicity is needed), thus obviating the need for
the BLL to use transaction demarcation. But in this case part of the
business logic would inevitably have to be implemented in the DAL, which is
not desirable since it reduces modularity, with all the disadvantages that I
will not detail here.] My point is to use an abstract notion/type/concept
for transaction demarcation in the BLL, so that if I want to replace an
OODBMS (for example Versant) with an RDBMS (for example PostgreSQL), I will
have to change only the data access layer and do not have to touch the BLL.
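
The layering described above can be sketched roughly as follows. This is only
an illustration: TxDemarcation, AccountStore, TransferService and
RecordingBackend are hypothetical names that do not come from the thread or
from any driver. The point is that the BLL class starts and ends the
transaction through an abstract type and never sees JDBC or an OODBMS API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction: the only transaction notion the BLL ever sees.
interface TxDemarcation {
    void begin();
    void commit();
    void rollback();
}

// Hypothetical DAL contract; a JDBC or an OODBMS backend would implement it.
interface AccountStore {
    void debit(String account, long amount);
    void credit(String account, long amount);
}

// BLL code: demarcates the transaction but knows nothing about the backend.
final class TransferService {
    private final TxDemarcation tx;
    private final AccountStore store;

    TransferService(TxDemarcation tx, AccountStore store) {
        this.tx = tx;
        this.store = store;
    }

    void transfer(String from, String to, long amount) {
        tx.begin();
        try {
            store.debit(from, amount);
            store.credit(to, amount);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback(); // undo the partial work, then report the failure
            throw e;
        }
    }
}

// In-memory stand-in for a real backend; it only records what was called.
final class RecordingBackend implements TxDemarcation, AccountStore {
    final List<String> log = new ArrayList<>();

    public void begin() { log.add("begin"); }
    public void commit() { log.add("commit"); }
    public void rollback() { log.add("rollback"); }

    public void debit(String account, long amount) {
        if (amount < 0) throw new IllegalArgumentException("negative amount");
        log.add("debit " + account + " " + amount);
    }

    public void credit(String account, long amount) {
        log.add("credit " + account + " " + amount);
    }
}
```

Swapping the backend then means supplying different implementations of the
two interfaces; TransferService stays untouched.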

Even though the JTA has been designed to handle distributed transactions, it
can handle local transactions as well. And if you look at the interface
exposed to the application server (TransactionManager), this interface does
all that I need, and is completely agnostic of whether the underlying
transaction is distributed or local. (Besides meeting the requirement I
described above, it also offers the benefit that I do not have to pass
around any transaction objects between function calls, because a transaction
can be attached to and detached from a thread.) So why should I not use the
JTA, if it does the job (well)?

So I set out to create my own implementation of JTA. But how to do it? The
JTA defines its own version of the XA interface to integrate the resource
manager(s). Why should I invent another interface? Most of the RDBMSs
already provide implementations of these interfaces anyway. Does it do any
harm if the architecture is also capable of handling distributed
transactions? (My strong belief is that it does not, and it is even good
that I can use the same infrastructure for distributed transactions as my
needs and tools advance -- but you may disagree.)

((I actually implemented a mechanism for handling local transactions in an
abstract way using a custom API. It was not easy, but you learn a lot from
doing this kind of stuff, because you have to go through and find solutions
to problems which arise in such an environment. After I thought the
implementation was complete and worked nicely, another potential problem
came to my mind. I used ThreadLocal to attach transaction objects to
threads. Also, I used CORBA for IPC. Now, every decent CORBA implementation
uses thread pools to process incoming requests. What happens -- I asked
myself -- if the user-programmer forgets to end [commit or roll back] the
transaction? It may take some time before the timer for the transaction
expires and the transaction is cleaned up from the thread it has been
attached to. During this time, the thread can be reused by the ORB to
process another incoming CORBA request, and the implementation that executes
in the reused thread will be confused, because it will find that it is part
of an ongoing transaction. The clean solution to this is to use a mechanism
which is integrated in the CORBA infrastructure. CORBA provides the local
interface Current to move around thread-specific information. But, I asked
myself, if I am already bogged down so deeply in this mess, why should I not
use OMG's Object Transaction Service -- and why should I not go the standard
way across the board? And that settled it. [Just one small distinction
between standards: some standards are open in the sense that you can freely
make AND distribute complying implementations, and some are open in the
sense that you can make clean-room implementations, but you cannot
distribute a complying implementation without further arrangement with the
standard's owner/author/...the lawyers know better what. My understanding is
that JNDI, JDBC and JTA fall in the former category and EJB and Servlets in
the latter, but I may be completely wrong.]))
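
The ThreadLocal hazard described above can be reproduced in a few lines. This
is a sketch under stated assumptions: a single-thread executor stands in for
the ORB's thread pool, and ThreadBoundTx and StaleTxDemo are hypothetical
names, not part of any CORBA or JTA implementation. The second "request"
runs on the same pooled thread and finds whatever the first one left
attached to it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a transaction context attached to the current thread.
final class ThreadBoundTx {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void begin(String txId) { CURRENT.set(txId); }
    static void end() { CURRENT.remove(); }
    static String current() { return CURRENT.get(); }
}

final class StaleTxDemo {
    // Runs two "requests" on one pooled thread and returns the transaction id
    // that the second request finds attached to its thread (null if none).
    static String seenBySecondRequest(boolean firstRequestEndsTx) {
        ExecutorService orbThreads = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                ThreadBoundTx.begin("TX1");
                if (firstRequestEndsTx) {
                    ThreadBoundTx.end(); // commit or rollback would happen here
                }
            };
            Callable<String> second = ThreadBoundTx::current;
            orbThreads.submit(first).get();
            return orbThreads.submit(second).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            orbThreads.shutdown();
        }
    }
}
```

When the first request forgets to end its transaction, the second request
sees "TX1" as its current transaction; when the first request cleans up, the
second sees none. A mechanism owned by the request infrastructure avoids
relying on every caller to clean up.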

My current implementation of the JTA uses the OMG's OTS Version 1.1 (OpenORB
Transaction Service 1.2.0). So the transactions are global, but since I make
sure that a PostgreSQL connection participates only in transactions in
which all participating connections come from the same datasource, the
transaction will practically remain local in the sense that there will be no
2PC.

Summary: I need Exoffice's xa package, because it can be used to integrate
PostgreSQL into a JTA implementation. I am not interested in how it
implements 2PC, whether it fakes it or not, or whether it implements 2PC at
all. You cannot use PostgreSQL for 2PC anyway (as we have already repeated
too many times).

So what kind of pooling is done in XADataSourceImpl? The best way to
describe it is to go through a scenario.

We have the following components:
-- DataSource: implemented in the middleware;
-- Pool: a pool of connections implemented in the middleware;
-- PostgresqlXADataSource: implemented by the jdbc driver. Itself implements
org.postgresql.xa.XADataSourceImpl.

The application requests a connection from the DataSource. Assume that we're
right after startup, so there's nothing in the Pool, and the DataSource
forwards the request to PostgresqlXADataSource by calling
PostgresqlXADataSource.getXAConnection(). This returns an "empty"
XAConnectionImpl instance. It is empty in the sense that it has no physical
connection assigned to it. The XAConnectionImpl instance is returned to the
DataSource. Now there are two possibilities: (1) we're in a global
transaction or (2) we're NOT in a global transaction. In case (1) the
DataSource calls XAConnectionImpl.start() with the XID of the transaction.
The result is that XADataSourceImpl a) creates a new physical connection, b)
maps it internally to the XID, c) creates a ClientConnection and returns it
to the application. When the application calls methods on the
ClientConnection, the physical connection is always retrieved (ultimately
through the XID) and is used to do the real job.

When the application calls ClientConnection.close(), the DataSource gets
notified, calls XAConnectionImpl.end(xid, TMSUCCESS) and puts the
XAConnectionImpl into the Pool. Calling XAConnectionImpl.end(xid, TMSUCCESS)
"empties" the XAConnectionImpl, i.e. detaches it from the physical
connection (which remains internally mapped to the XID in XADataSourceImpl).
At this point the DataSource might think that it has a free connection in
the pool, whereas what it has is only a shell that will be attached to a
physical connection next time as needed. There also exists, at this point in
time, a physical connection in the system, but it has not been committed, so
it is not free; it is tied (internally mapped) to the ongoing transaction.

Let's assume that the application does not commit the transaction (TX) and
reuses the XAConnectionImpl in the pool to do work in the same TX. The
DataSource will enlist it via XAConnectionImpl.start(xid, TMRESUME), which
attaches the physical connection with the open local transaction back to the
(single) XAConnectionImpl instance. Assume that the app again calls
ClientConnection.close() and the TX is still open. The XAConnectionImpl
instance will be put back into the Pool. Also assume that another app thread
in another global TX (TX2) requests a connection from the DataSource. [If
the other thread had requested a connection from the DataSource before the
first thread called ClientConnection.close(), the DataSource (the Pool being
empty) would have had to request a new connection from XADataSourceImpl,
which would have resulted in the construction of another instance of
XAConnectionImpl. This is also a possible scenario, but it is not the case
now.] The DataSource takes the XAConnectionImpl instance from the pool and
enlists it, which results in XAConnectionImpl.start(xid2, TMRESUME). The
XADataSourceImpl will find no physical connection with this XID in its
internal map, so it will create a new one (PHC2) and attach it to our (only)
XAConnectionImpl instance.

Now we have only one XAConnectionImpl instance (so far it was always
available in the Pool when the DataSource needed one), but there are two
physical connections: one which is in use by the second thread as part of
TX2, and one which is mapped to the first transaction and is awaiting commit
or further use. This state represents the adverse effect of the decoupling
of the physical connections from the PooledConnections (XAConnectionImpl) I
talked about in one of my previous mails: the DataSource is pooling/handling
XAConnectionImpl instances that are only loosely coupled to physical
connections. We can probably agree that the main purpose of connection
pooling is (a) reusing existing connections and (b) limiting the number of
connections open at a point in time. Now requirement (a) will always be met
by the above mechanism, but requirement (b) will be met only over time (on
average, if you wish).
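
The decoupling in this scenario can be modelled with a toy sketch. All names
here (PhysicalConn, XaSourceSketch, ShellConn) are hypothetical stand-ins and
not the classes in org.postgresql.xa; XIDs are plain strings and the flags
are omitted. The sketch only shows why one pooled shell can end up with two
open physical connections behind it.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a physical backend connection (hypothetical, not a driver class).
final class PhysicalConn {
    final int id;
    PhysicalConn(int id) { this.id = id; }
}

// Stand-in for the driver-side data source that owns the XID map.
final class XaSourceSketch {
    private final Map<String, PhysicalConn> byXid = new HashMap<>();
    private int opened = 0;

    // Returns the connection already mapped to xid, or opens a new one.
    PhysicalConn attach(String xid) {
        PhysicalConn conn = byXid.get(xid);
        if (conn == null) {
            conn = new PhysicalConn(++opened);
            byXid.put(xid, conn);
        }
        return conn;
    }

    int openPhysicalConnections() { return opened; }
}

// Stand-in for the pooled shell, the role XAConnectionImpl plays in the text.
final class ShellConn {
    private final XaSourceSketch source;
    private PhysicalConn attached; // null while the shell is "empty"

    ShellConn(XaSourceSketch source) { this.source = source; }

    void start(String xid) { attached = source.attach(xid); }
    void end() { attached = null; } // detaches the shell; the XID mapping stays
    PhysicalConn attached() { return attached; }
}
```

Starting the same shell for "TX1", ending it, and starting it again for "TX1"
yields the same PhysicalConn; starting it for "TX2" while TX1 is still
uncommitted opens a second one, so the middleware pool sees one connection
while two are actually open.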

Now let's say the app in TX2 calls ClientConnection.close() [the DataSource
puts the XAConnectionImpl instance back in the Pool] and commits. PHC2 will
be committed and put (releaseTxConnection) in the internal pool of
XADataSourceImpl. Note that this is the first time that a physical
connection has been put into the internal pool of XADataSourceImpl. Our
first physical connection is still mapped to the first TX and will be put
into the internal pool only after the transaction it is mapped to has been
committed (and commit() has successfully been called on the physical
connection). It is clear that when a connection is requested from
XADataSourceImpl, it will first look for a free one in its internal pool
before creating a new one, but this pooling mechanism does not (and in fact,
based on the spec, is not supposed to) do anything along the lines of
meeting pooling requirement (b). I can imagine, for example, an RDBMS-JDBC
driver combination where physical connections can be effectively detached
from and attached to transactions. In such a case, the JDBC driver does not
need to implement any internal pooling. The XADataSourceImpl in our case
needs to maintain a pool of physical connections (if you wish) perforce,
because the PostgreSQL implementation does not allow physical connections to
be detached from transactions. (I do not know the internals of the backend,
but I do not think it is impossible [or even very complicated] to implement
such a feature, though I am not sure how useful it would be anyway.)
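
The internal pool behaviour described above (reuse a committed connection if
one is free, otherwise create a new one, with no upper limit) could look
roughly like this. FreeListSketch is a hypothetical name, connections are
represented by plain integers, and nothing here is taken from the actual
XADataSourceImpl source.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of a driver-internal pool of physical connections keyed by XID.
final class FreeListSketch {
    private final Map<String, Integer> busyByXid = new HashMap<>();
    private final Deque<Integer> free = new ArrayDeque<>();
    private int created = 0;

    // Returns the connection mapped to xid; otherwise takes a free one,
    // and only creates a new connection when the free list is empty.
    int acquire(String xid) {
        Integer conn = busyByXid.get(xid);
        if (conn == null) {
            conn = free.isEmpty() ? ++created : free.pop();
            busyByXid.put(xid, conn);
        }
        return conn;
    }

    // After a successful commit the connection becomes free for other XIDs.
    void commit(String xid) {
        Integer conn = busyByXid.remove(xid);
        if (conn != null) {
            free.push(conn);
        }
    }

    int created() { return created; }
}
```

Note that acquire() never refuses or waits: requirement (a), reuse, is
covered, while requirement (b), a cap on open connections, is not.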

Peter


> -----Original Message-----
> From: Ned Wolpert [mailto:wolpert@yahoo.com]
> Sent: Thursday, January 03, 2002 2:48 AM
> To: Ned Wolpert; Kovács Péter; pgsql-jdbc@postgresql.org
> Subject: PostgresDataSource Question
>
>
> Folks-
>
>   I'm re-examining the PostgresDataSource class, and it seems that I
> missed a few things.  I need someone to verify what it is I'm looking at.
> This is based on the pooled stuff I submitted earlier, and the current
> conversation that has been going on about my submittal.
>
>   Basically, it seems that the XADataSourceImpl is a working pooling
> manager.  It is an abstract class, only extended by PostgresqlDataSource.
> The XADataSourceImpl provides access to the pool through its methods
> newConnection() and releaseConnection(), neither of which is called
> elsewhere.
>
>   It looks like the code was expecting the org.postgresql.jdbc2.Connection
> object to 'release' it if it was called by the datasource, when the
> connection was closed, but the Connection class was never modified. In
> short, the pool is almost there already, just not complete. The class
> PostgresqlDataSource _can_ pool, it just doesn't.  Does this look like
> a proper analysis to others?
>
>   I can do one of two things at this point, and I would like people's
> opinion as to what I should do. One, I can continue working on my pool
> manager, which will extend XADataSourceImpl and will still have to wrap
> the connection classes to notify my pooling manager of changes that
> occur.  Or two, create a set of patches that will impact the jdbc2
> package and PostgresDataSource class to finish what was started.
>
>   What do you think, folks? I'm starting to lean toward option two, but
> would like to hear other people's opinions.  If we pick two, that means
> that my pooling manager is _part_ of the PostgresDataSource, not a
> separate class.  Could some of the CVS committers comment on this?
> (Also, I'll be having patches for basically all the classes in the jdbc2
> and xa package.)
>
> =====
> Virtually,        |                   "Must you shout too?"
> Ned Wolpert       |                                  -Dante
> wolpert@yahoo.com |
> _________________/              "Who watches the watchmen?"
> 4e75                                       -Juvenal, 120 AD
>
> -- Place your commercial here --                      fnord
>
> __________________________________________________
> Do You Yahoo!?
> Send your FREE holiday greetings online!
> http://greetings.yahoo.com
>

Re: PostgresDataSource Question

From
Ned Wolpert
Date:
Peter-

  I'll reply more off the mailing list (to save other people's bandwidth
;-) after I review more of what you wrote.  I think we have two different
goals, though they are not orthogonal.  Here's what I'm working on, and
will submit to the group to decide if they want it.

1) A pooling implementation that works 'out-of-the-box', using my
original approach, but placing the pooling code in the
org.postgresql.pool package.  Your suggestion about that is good.

2) Work on a rowset implementation.

3) Start trying to get two-phase commits available. (Helping on the
backend where I can, and providing support in the jdbc driver)

Obviously, the complexity grows by a factor of 10 with each step from 1
to 2 to 3, but it's what I'm working on.  Sounds like you have some
positive ideas on how the XA package needs to be upgraded.  I'm not going
to work on the xa code at this time, mostly because I'm not comfortable
enough with the needs/requirements of the package.  Perhaps you have time
to work on what the xa package needs?

=====
Virtually,        |                   "Must you shout too?"
Ned Wolpert       |                                  -Dante
wolpert@yahoo.com |
_________________/              "Who watches the watchmen?"
4e75                                       -Juvenal, 120 AD

-- Place your commercial here --                      fnord


Re: PostgresDataSource Question

From
Ned Wolpert
Date:
(I responded to the mailing list after joe asked to keep this thread
here.)

--- Kovács Péter <peter.kovacs@sysdata.siemens.hu> wrote:
> I am afraid that (with my effective contribution :-)) things have got
> mixed up a bit here. So I will try to sort it out.

I feel the same way as you.  Also, as I go over your letter, keep in mind
that like most people here, I work on the driver in my spare time.
I'm currently working on two open source projects in my spare time:
submitting code for the pgsql JDBC driver and serving as a committer for
the castor JDO project.  I also have to make a living with my stuff at
work. :-) (I'm just lucky enough that my work also uses both Castor
and PostgreSQL, but I cannot work on either open-source project during
work.)

> First of all: the XADataSourceImpl does some kind of pooling, but it is
> probably not intended to implement the functionality you want. I will

Actually, it just about does.  The only thing it doesn't do is limit
and block waiting for connections to be free. (But I'm just arguing
semantics. Your assessment is correct.)
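
The missing piece mentioned here, limiting the number of connections and
blocking until one is free, is commonly built on a counting semaphore. The
following is only a sketch under that assumption: BoundedPool is a
hypothetical class, not the pool that was submitted to the list, and the
connection type is left generic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical bounded pool: at most 'max' objects are handed out at once.
final class BoundedPool<C> {
    private final Semaphore permits;
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;

    BoundedPool(int max, Supplier<C> factory) {
        this.permits = new Semaphore(max, true); // fair: waiters served in order
        this.factory = factory;
    }

    // Waits up to timeoutMillis for a free slot; returns null on timeout
    // or if the waiting thread is interrupted.
    C borrow(long timeoutMillis) {
        try {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            synchronized (idle) {
                C conn = idle.poll();
                return conn != null ? conn : factory.get();
            }
        } catch (RuntimeException e) {
            permits.release(); // creation failed, so give the slot back
            throw e;
        }
    }

    // Returns an object to the pool and frees its slot for a waiting caller.
    void giveBack(C conn) {
        synchronized (idle) {
            idle.push(conn);
        }
        permits.release();
    }
}
```

The semaphore enforces the limit and provides the blocking; the idle list
provides the reuse.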

> explain what this sort of pooling does, but before doing so I'd like to
> make another statement that has not yet been made: support for connection
> pooling and support for distributed transactions can be implemented
> separately. So you do not need to bother with the xa package at all if
> you want to support connection pooling, or even if you want to
> _implement_ connection pooling (in my

I never disagreed with this.  I had thought from your previous
postings that you were afraid I was re-doing work that is already
available in the XA package.  I see now that this is not what you
meant.

> view there is a difference between supporting connection pooling and
> implementing connection pooling). The reason why I was kind of pushing
> the xa package is that people on this list keep talking about the
> incompleteness of Exoffice's xa package -- and they are right from their
> perspective. But if you look at it another way, this is good stuff --
> and it is clearly my fault that I did not fully explain my perspective.

The code base in the xa package is good, just not complete.  We may
differ on what we think is incomplete though... (And I'm not saying
that the lack of a pooling implementation makes the XA package incomplete)

> I think that first I should explain my view. What I really need is a way
> to handle transactions in an environment where there are only local

I understand the JTA layer, and I can see how you would find it useful
for 'local' transactions even when you can't do distributed
transactions.

[..]

> What I said above may be obvious, but a brief explanation may be
> helpful. I am using a simple model here: I partition my application into
> two layers, a business logic layer (BLL) and a data access layer (DAL).
> When I was talking

Just so you know where I'm coming from, I use either Castor, TopLink
or EJB CMP entity beans in my code base to achieve this.  JBoss's EJB
layer works nicely with the existing JDBC driver, in my view, even
without distributed transaction availability.  However, I prefer Castor
since a) it's easy to switch databases as needed, without changes
to the BLL and b) everything works as more of a Java object. (Same is
true with TopLink, CocoBase, and other non-open-source mappers.)

With that said, I see the biggest 'bang-for-the-buck' usage of JTA is
inside of tools like these, rather than directly.  So when we work on the
XA package, it benefits the users of WebLogic, TopLink, etc. the most.
Do you agree with this? (It benefits you here, since you are writing
directly to the jdbc2.0 optional spec.)

[ .. ]

> attached to and detached from a thread.) So why should I not use the
> JTA, if it does the job (well)?

In truth, you are correct.  I would only argue that it's overkill.

> So I set out to create my own implementation of JTA. But how to do it?
> The JTA defines its own version of the XA interface to integrate the
> resource manager(s). Why should I invent another interface? Most of the
> RDBMSs already provide implementations of these interfaces anyway. Does
> it do any harm if the architecture is also capable of handling
> distributed transactions? (My strong belief is that it does not, and it
> is even good that I can use the same infrastructure for distributed
> transactions as my needs and tools advance -- but you may disagree.)

Actually, no. I do not disagree.  Don't re-build what is already
available.  (That's why I point you to JBoss, since it already does
the stuff you want.)

> ((I actually implemented a mechanism for handling local transactions in
> an abstract way using a custom API. It was not easy, but you learn a lot
> from doing this kind of stuff, because you have to go through and find
> solutions to problems which arise in such an environment. After I thought
> the implementation was complete and worked nicely, another potential
> problem came to my mind. I used ThreadLocal to attach transaction objects
> to threads. Also, I used CORBA for IPC. Now, every decent CORBA
> implementation uses thread pools to process incoming requests. What
> happens -- I asked myself -- if the user-programmer forgets to end
> [commit or roll back] the transaction? It may take some time before the
> timer for the transaction expires and the transaction is cleaned up from
> the thread it has been attached to. During this time, the thread can be
> reused by the ORB to process another incoming CORBA request, and the
> implementation that executes in the reused thread will be confused,
> because it will find that it is part of an ongoing transaction. The clean
> solution to this is to use a mechanism which is integrated in the CORBA
> infrastructure. CORBA provides the local interface Current to move around
> thread-specific information. But, I asked myself, if I am already bogged
> down so deeply in this mess, why should I not use OMG's Object
> Transaction Service -- and why should I not go the standard way across
> the board? And that settled it. [Just one small distinction between
> standards: some standards are open in the sense that you can freely make
> AND distribute complying implementations, and some are open in the sense
> that you can make clean-room implementations, but you cannot distribute a
> complying implementation without further arrangement with the standard's
> owner/author/...the lawyers know better what. My understanding is that
> JNDI, JDBC and JTA fall in the former category and EJB and Servlets in
> the latter, but I may be completely wrong.]))

[I left the above paragraph intact for clarity]

To be honest, it sounds like you're going through a lot of work for very
little benefit.  You're saying that if your stuff encounters bad code
(where the connection isn't committed, closed or rolled back) and you
don't want to wait for the timer, you still want to use the connection?  I
point to your statement "During this time, the thread can be reused by the
ORB to process another request". How much benefit is there in this?
(I'm not trying to say don't work on this problem, I'm trying to
understand the real issue you have with it.)

I don't think EJB fails in the regard you mention.  Servlets do,
simply because servlets aren't particular about what you do in them. EJB
is very particular.

(Side note: how familiar are you with EJB applications?  I'm asking
because I think EJB containers solve a lot of what you're trying to write
on the application server level.  I view our support of JTA, etc. using
the XA classes as vital for gaining acceptance with more EJB containers.
I want WebLogic to offer out-of-the-box a CMP persistence layer that
maps (at 100%) with postgresql one day.  This is my ultimate goal.
Of course, if JBoss has their wish, WebLogic will be a non-product by
then anyway. ;-)

[ .. ]

> Summary: I need Exoffice's xa package, because it can be used to
> integrate PostgreSQL into a JTA implementation. I am not interested in
> how it implements 2PC, whether it fakes it or not, or whether it
> implements 2PC at all. You cannot use PostgreSQL for 2PC anyway (as we
> have already repeated too many times).

This is why I say our goals may be different, but they are not
orthogonal.  I too want to see us fully enabled in a JTA implementation,
though I am interested in two-phase commit since we get the most
'bang-for-the-buck' that way.

> So what kind of pooling is done in XADataSourceImpl? The best way to
> describe it is to go through a scenario.
>
> We have the following components:
> -- DataSource: implemented in the middleware;
> -- Pool: a pool of connections implemented in the middleware;
> -- PostgresqlXADataSource: implemented by the jdbc driver. Itself
> implements org.postgresql.xa.XADataSourceImpl.

Here is the initial issue... When you say 'we have the following', you
put the DataSource in the middleware.  I disagree. The datasource is part
of the JDBC driver.  The pool is 'extra' to the needs of the jdbc
driver.  You have clearly convinced me of that.  We do not have a
functioning org.postgresql.xa.XADataSourceImpl only because it's not
complete.  Some of the issues are the two-phase commits, which you
clearly do not need.  But for me, that is the primary issue.

[ .. I won't reproduce the following section here to save space, but let
  others review it in the archive..]

You have a good set of information about the XA package, and thanks
for helping to clarify it for me.  It helped me decide to continue
working on the pool in my original format, really for one reason; the
tools that the XA package is meant to provide are not necessary for my
pool.  They solve two different problems, though share the same base
concept.

[ .. ]

I realize that I'm quoting out of context, but please bear with me...

> committed (and commit() has successfully been called on the physical
> connection). It is clear that when a connection is requested from
> XADataSourceImpl, it will first look for a free one in its internal pool
> before creating a new one, but this pooling mechanism does not (and in
> fact, based on the spec, is not supposed to) do anything along the lines
> of meeting pooling requirement (b). I can imagine, for example, an
> RDBMS-JDBC driver

Well, it comes across in the source code that it's trying to do this,
regardless of whether that is to spec.  This seems like one area that
may be wrong in the XA package.

> combination where physical connections can be effectively detached from
> and attached to transactions. In such a case, the JDBC driver does not
> need to implement any internal pooling. The XADataSourceImpl in our case
> needs to

I agree 100%.  The requirements of the jdbc2.0 optional api spec do
not require that the jdbc driver implement any internal pooling.
However, I want to provide it for folks so that they can use it
out-of-the-box anyways.

I can still claim that I'm working on getting the driver to compliance
with jdbc2.0 optional when I work on rowsets and two-phase commits.
Also, note that I'm trying to see what I can and cannot reuse of the
current xa implementation.  Example: I can still use
XAConnectionImpl.java instead of my wrapped connection because of how
it uses event listeners. (My submittal had my own version, but I'm
working on a new submittal that reuses that implementation.)  Looking
at the spec, my implementation can make use of XAConnections and
XAResource objects easily.  (As long as these XA classes and the
PostgresDataSource are up to spec.)

Before we go any further, I've laid out my plans for what I'm working
on in the project.  Again, they are
 --pooling implementation
 --rowset implementation
 --two-phase commits

From our discussions, I don't think you believe I shouldn't be working
on these items, just that they are not what you would think is the most
important to work on.  Perhaps you can lay out (in this simple format)
what you are going to work on (or a request of what you would like),
so that a) others can see if they want to help and b) people can see
if they want it in the TODO file.

=====
Virtually,        |                   "Must you shout too?"
Ned Wolpert       |                                  -Dante
wolpert@yahoo.com |
_________________/              "Who watches the watchmen?"
4e75                                       -Juvenal, 120 AD

-- Place your commercial here --                      fnord
