Re: Using the database to validate data
От | Adrian Klaver |
---|---|
Тема | Re: Using the database to validate data |
Дата | |
Msg-id | 55B12C1E.9080500@aklaver.com обсуждение исходный текст |
Ответ на | Using the database to validate data (JPLapham <lapham@jandr.org>) |
Ответы |
Re: Using the database to validate data
|
Список | pgsql-general |
On 07/23/2015 05:55 AM, JPLapham wrote: > Hello, > > I have an application that occasionally performs large batch inserts of user > hand-generated data. Input is a tab delimited file with typically hundreds > to a thousand lines of data. > > Because the data is generated by hand, there are always many > transaction-stopping errors in a typical input run. For example, missing > datum in a NOT NULL column, a duplicate value in a UNIQUE column, data type > mismatch, FOREIGN KEY reference to something non-existing, etc. Of course, > we chose PostgreSQL exactly because of these problems, because of the robust > transactional control, rollback on errors, etc. > > My question is the following. I would like to *test* the data input for > integrity in such a way that I can create a report to the user informing > them of exactly where in their input file to correct the problems. > > IDEA 1: My first attempt at this was to simply slurp the data into the > database, collect the errors, and then rollback. Of course (as I now know), > this doesn't work because after the first problem, the database reports, > "current transaction is aborted, commands ignored until end of transaction > block". This means that I can only report to the user the location of the > first problem, and then they run the data again, and keep looping through > the process until the data is good, a huge waste of time. > > IDEA 2: My second idea on how to do this was to ROLLBACK after each INSERT. > This allows me to check for things like NOT NULL and data type issues, but > not violations of UNIQUE within the new data. > > IDEA 3: Write my own data pre-conditioner. Ugh, what a nightmare, I feel > like I'm writing my own database! Checking for FKEY constraints, UNIQUE, etc > is not trivial. It seems ridiculous to do this when I have the *actual* > database available to test against! That has already been done: http://pgloader.io/ > > Has anyone dealt with this kind of issue before? What are your opinions on > best practice for this? Of course I do not want to actually COMMIT until the > data is perfect! > > Thanks for your time! > -Jon > > > > -- > View this message in context: http://postgresql.nabble.com/Using-the-database-to-validate-data-tp5859046.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > -- Adrian Klaver adrian.klaver@aklaver.com
В списке pgsql-general по дате отправления: