Thread: Big Test Environment Feature


Big Test Environment Feature

From: Matthew Tedder
Date:
Question:
   How feasible would it be to create this functionality in PostgreSQL:

One creates a test version of a database that initially consists of 
read-links to the production version of the same database.  Any code he/she 
then writes that reads from a table reads from the production database but 
any code that modifies data copies that table to the test database.
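In present-day PostgreSQL, something close to these read-links can be sketched with foreign tables: the test database imports the production tables read-only, and a table is copied down only when a tester needs to modify it. This is only an illustration, not the requested engine feature; postgres_fdw postdates this discussion, and all host, database, schema, and table names below are hypothetical.

```sql
-- In the test database: link production tables read-only.
CREATE EXTENSION postgres_fdw;
CREATE SERVER prod FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'prod-host', dbname 'production');
CREATE USER MAPPING FOR CURRENT_USER SERVER prod
    OPTIONS (user 'readonly', password 'secret');
CREATE SCHEMA prod_link;
IMPORT FOREIGN SCHEMA public FROM SERVER prod INTO prod_link;

-- Reads go straight to production...
SELECT count(*) FROM prod_link.orders;

-- ...and a table is copied locally only when a test must modify it.
CREATE TABLE orders AS SELECT * FROM prod_link.orders;
UPDATE orders SET status = 'TEST' WHERE order_id = 42;
```

The manual step in the last two statements is exactly what the proposed feature would automate inside the engine.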

The benefits of this are obviously huge for IT shops that need to constantly 
work on data in test environments as similar as possible to the production 
environment.  

Usually, this is a very difficult aspect of one's work and represents a great 
deal of risk.  We always try hard to ensure that what we migrate into 
production is going to work there the same as it did in test.  And we should 
not do testing in a production environment.

Such a feature would give PostgreSQL a major advantage over Oracle or DB2.

And some day when PostgreSQL is also distributable, it'll be ideal for the 
enterprise.  

Matthew

-- 
Anything that can be logically explained, can be programmed.


Re: Big Test Environment Feature

From: Bill Cunningham
Date:
Matthew Tedder wrote:

>Question:
>
>    How feasible would it be to create this functionality in PostgreSQL:
>
>One creates a test version of a database that initially consists of 
>read-links to the production version of the same database.  Any code he/she 
>then writes that reads from a table reads from the production database but 
>any code that modifies data copies that table to the test database.
>
>The benefits of this are obviously huge for IT shops that need to constantly 
>work on data in test environments as similar as possible to the production 
>environment.  
>
>Usually, this is a very difficult aspect of one's work and represents a great 
>deal of risk.  We always try hard to ensure that what we migrate into 
>production is going to work there the same as it did in test.  And we should 
>not do testing in a production environment.
>
>Such a feature would give PostgreSQL a major advantage over Oracle or DB2.
>
>And some day when PostgreSQL is also distributable, it'll be ideal for the 
>enterprise.  
>
>Matthew
>
>  
>

Why wouldn't you use a pg_dump of the production database? Perhaps just 
a sampling every so often?

This sounds like a lot of unnecessary work for the engine. How about a 
separate program which has notify links to the source database and 
places updated data in the test db?
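A rough sketch of that separate-program idea, using present-day PostgreSQL syntax: a statement-level trigger NOTIFYs whenever production data changes, and an external sync program LISTENing on the channel re-copies just the affected table into the test database. Table and channel names are hypothetical.

```sql
-- In the production database: flag changes for the external sync program.
CREATE FUNCTION notify_change() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('table_changed', TG_TABLE_NAME);
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_changed
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH STATEMENT EXECUTE FUNCTION notify_change();

-- The sync program runs LISTEN table_changed; on each notification it
-- refreshes the named table in the test database.
```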

- Bill




Re: Big Test Environment Feature

From: Matthew Tedder
Date:
Comments at appropriate places below..

On Friday 14 June 2002 04:41 pm, Bill Cunningham wrote:
> Matthew Tedder wrote:
> >Question:
> >
> >    How feasible would it be to create this functionality in PostgreSQL:
> >
> >One creates a test version of a database that initially consists of
> >read-links to the production version of the same database.  Any code
> > he/she then writes that reads from a table reads from the production
> > database but any code that modifies data copies that table to the test
> > database.
> >
> >The benefits of this are obviously huge for IT shops that need to
> > constantly work on data in test environments as similar as possible to
> > the production environment.
> >
> >Usually, this is a very difficult aspect of one's work and represents a
> > great deal of risk.  We always try hard to ensure that what we
> > migrate into production is going to work there the same as it did in
> > test.  And we should not do testing in a production environment.
> >
> >Such a feature would give PostgreSQL a major advantage over Oracle or DB2.
> >
> >And some day when PostgreSQL is also distributable, it'll be ideal for the
> >enterprise.
> >
> >Matthew
>
> Why wouldn't you use a pg_dump of the production database? Perhaps just
> a sampling every so often?

That won't work nearly as well.  Obviously we can and often do dumps.  But 
when testing something that has to work in a production environment, we need 
to see what happens over the course of several days' time.  This is needed 
not only to test the specific code changed or added to a process, but also 
to test how it integrates with a larger and more complex information flow 
system.

>
> This sounds like a lot of unnecessary work for the engine. How about a
> separate program which has
> notify links to the source database and places updated data in the test db?

Big unnecessary dumps and recreation of the data structures also 
unnecessarily use I/O resources.  The idea is to minimize that and to 
create testing environments easily and seamlessly.

Often, many programmer/analysts are working on different parts of the 
information system simultaneously, each and every day.

Matthew

>
> - Bill
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

-- 
Anything that can be logically explained, can be programmed.


Re: Big Test Environment Feature

From: Matthew Tedder
Date:
Comments below to keep context intact...

On Saturday 15 June 2002 04:13 pm, Alvaro Herrera wrote:
> Matthew Tedder dijo:
> > On Friday 14 June 2002 04:41 pm, Bill Cunningham wrote:
> > > Matthew Tedder wrote:
> > > >    How feasible would it be to create this functionality in
> > > > PostgreSQL:
> > > >
> > > >One creates a test version of a database that initially consists of
> > > >read-links to the production version of the same database.  Any code
> > > > he/she then writes that reads from a table reads from the production
> > > > database but any code that modifies data copies that table to the
> > > > test database.
> > >
> > > [pg_dump into the development machines]
> >
> > That won't work nearly as well.  Obviously we can and often do dumps. 
> > But when testing something that has to work in a production environment,
> > we need to see what happens over the course of several days' time.  This
> > is needed not only to test the specific code changed or added to a
> > process, but also to test how it integrates with a larger and more
> > complex information flow system.
>
> Seems like single master multi slave replication would do the trick,
> wouldn't it? You can replicate the master's data to the slaves and do
> the tests there.  Depending on how frequent the updates are (assuming
> they are asynchronous), the DB load will be different, but I wonder
> whether this may be an issue.

First, there are two issues to be cognizant of: (1) that the test table(s) 
remain identical in every way to the production ones, including everything 
that happens to them, except for whatever part of the processing is being 
tested; (2) that we conserve disk space and I/O resources.

Here's an example problem:

CONTEXT:
A group of eight hospitals merged and integrated a variety of systems, 
including disparate Order Entry subsystems.  Nightly, the data from each 
subsystem is FTP'd to a central data processing server for the enterprise.  
As part of the nightly batch flows, a separate process for each translates 
it to a common format and inserts it into the Orders table.  Following 
this, processing begins for other subsystems that use this data, such as 
the Billing subsystem(s), Inventory subsystems, Decision Support Systems, 
Archiving subsystems, etc.

Unimportant note: I personally believe strongly in using flags and status 
indicator codes on top of normalized data, but many conservative shops move 
data from bucket to bucket along its nightly course as each process touches 
it.  (Although this causes data-consistency problems, it also has the 
advantage of providing a detailed audit trail.)

PROBLEM:
When a change is made to the output of one of the Orders subsystems and the 
programmer/analyst has to redesign the translation code, should he dump the 
entire database into a test environment?  Everything that his data affects 
downstream may be only 15% of the remaining nightly processes.

SOLUTION:
If the database instead kept only some kind of read-link to the production 
tables and copied a table only when something in it is modified, wouldn't 
that significantly reduce the pull on resources, both in terms of disk 
space and I/O utilization?
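One way to picture that read-link-plus-copy-on-modification behavior in SQL, purely as an illustration: the test database reads through a view over a linked production table, and the first write replaces the view with a local copy. Here prod_link.orders stands for a hypothetical read-only link to the production table.

```sql
-- Read-link: the test database reads production data through a view.
CREATE VIEW orders AS SELECT * FROM prod_link.orders;

-- First write from a test process: replace the link with a local copy.
BEGIN;
DROP VIEW orders;
CREATE TABLE orders AS SELECT * FROM prod_link.orders;
COMMIT;
-- From here on, test reads and writes see only the local copy;
-- every other table is still read straight from production.
```

Until a table is actually modified, it costs the test environment no disk space and no dump I/O, which is the resource saving argued for above.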

OTHER CONCERNS:
Often an IT shop has one big production, one big test, and one big 
development environment.   In that case, a big database dump for each makes a 
great deal of sense.  However, the date for applying a change from 
development to test and production will be sooner for some projects than for 
others.  My idea basically enables those with different due dates to have 
separate test or development environments so that the unwanted effects of 
projects that take a longer time do not negatively impact those that need to 
be perfected and put into production sooner.  The ones that go in sooner 
would, however, impact the ones going in later once the sooner ones are in 
production.  But that is not nearly as bad as the alternative.

Maybe I am reading into this a little too deeply.  I don't know... you be 
the judge.  It seemed like something like this could be very helpful at my 
former workplace.  People were constantly bumping into each other in our 
test environment.

Matthew
-- 
Anything that can be logically explained, can be programmed.