How can I prevent duplicate inserts by multiple concurrent threads?
| From | Hursh Jain |
|---|---|
| Subject | How can I prevent duplicate inserts by multiple concurrent threads? |
| Date | |
| Msg-id | 54AD891D.9000508@beesell.com |
| List | pgsql-novice |
I am running Postgres 9.3.x and am feeling a little confused about transactions, isolation levels and locks as they pertain to preventing duplicate concurrent *inserts* of the same data. No updates, only inserts. I am using JDBC with the following code pattern, invoked from multiple front-end web server threads:

getData()
1. get JDBC connection (using the default transaction isolation level)
2. START TRANSACTION
3. ...some preliminary processing...
4. for a bunch of data chunks (say 100 different chunks):
   4a. check whether the data chunk already exists in the DB
       QUERY: select count(*) from table where data_chunk_id = 123
   4b. if it does not exist (count is 0), create the data and insert it into the table
   4c. else if it already exists (count > 0), retrieve the existing data from the table
5. add up all the chunks and create a JSON string for the client
6. COMMIT TRANSACTION
7. write the JSON to the client

We really only need to create/save these chunks the first time they are requested, and thereafter send only the already created ones. The issue is that concurrent threads run inside getData() the very first time the request is made (the initial request, like subsequent requests, can be made by multiple clients at the same time). So, then:

- Thread #1 is in step 4a, sees no data, and inserts data into the table (step 4b).
- Before Thread #1 commits (step 6), Thread #2 reaches step 4a as well.
- Since Thread #1 has not committed yet, Thread #2 does not see any data in the table and also inserts data into the table.
- Both threads insert the data and I get duplicate rows (sometimes N duplicates for N threads).

Any suggestions or tips? Am I at least thinking along the right lines? Should I be taking a table lock instead? What is the recommended way to prevent duplicate inserts in a concurrent situation like this? (Again, there are no updates, only one-time inserts if the data does not already exist.)

I'm thinking I could do this:

Option (A)

getData()
1. get JDBC connection (using the default transaction isolation level)
2. START TRANSACTION
3. ==> SAVEPOINT A
4. for a bunch of data chunks (say 100 different chunks):
   4a. check whether this data chunk already exists in the DB, taking a row lock
       QUERY: select data_chunk_id from table where data_chunk_id = 123 FOR UPDATE
   4b. if no row is returned, create the data and insert it into the table
   4c. else use the already retrieved row from the table
   4d. RELEASE SAVEPOINT A
5. add up all the chunks and create a JSON string for the client
6. COMMIT TRANSACTION
7. write the JSON to the client

Another option:

Option (B)

getData()
1. get JDBC connection
2. START TRANSACTION ISOLATION LEVEL SERIALIZABLE
3. ==> SAVEPOINT A
4. for a bunch of data chunks (say 100 different chunks):
   4a. always create the data and insert it into the table
   4b. if error, ROLLBACK TO SAVEPOINT A; else RELEASE SAVEPOINT A
5. add up all the chunks and create a JSON string for the client
6. COMMIT TRANSACTION
7. write the JSON to the client

This does imply I need to add a uniqueness constraint to the data (which I can, although right now there are no constraints).

Any feedback appreciated...

Best,
--j
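A minimal JDBC sketch of the option (B) idea, assuming a unique constraint on data_chunk_id and the default READ COMMITTED isolation level. The table name chunk_table, the payload column, and the buildPayload() helper are made up for illustration; only data_chunk_id comes from the post. Each chunk is attempted as an insert; a unique-key violation (SQLState 23505) means another session has already committed that chunk, so the code rolls back to a per-chunk savepoint and reads the existing row instead.

```java
import java.sql.*;

public class ChunkLoader {

    // Assumes a unique constraint exists, e.g.:
    //   ALTER TABLE chunk_table ADD CONSTRAINT chunk_table_chunk_id_key UNIQUE (data_chunk_id);
    // Table/column names ("chunk_table", "payload") and buildPayload() are illustrative only.

    static String getOrCreateChunk(Connection conn, int chunkId) throws SQLException {
        Savepoint sp = conn.setSavepoint();          // per-chunk savepoint (steps 3/4b of option B)
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO chunk_table (data_chunk_id, payload) VALUES (?, ?)")) {
            String payload = buildPayload(chunkId);  // "create data"
            ins.setInt(1, chunkId);
            ins.setString(2, payload);
            ins.executeUpdate();                     // step 4a: always try the insert
            conn.releaseSavepoint(sp);
            return payload;
        } catch (SQLException e) {
            if (!"23505".equals(e.getSQLState())) {  // 23505 = unique_violation
                throw e;                             // anything else is a real error
            }
            conn.rollback(sp);                       // undo only the failed insert, keep the transaction
            try (PreparedStatement sel = conn.prepareStatement(
                    "SELECT payload FROM chunk_table WHERE data_chunk_id = ?")) {
                sel.setInt(1, chunkId);
                try (ResultSet rs = sel.executeQuery()) {
                    if (!rs.next()) {
                        throw new IllegalStateException("unique violation but no visible row for chunk " + chunkId);
                    }
                    return rs.getString(1);          // the row the other session committed
                }
            }
        }
    }

    private static String buildPayload(int chunkId) {
        return "chunk-" + chunkId;                   // placeholder for the real chunk-building work
    }
}
```

The caller would setAutoCommit(false) on the connection, call getOrCreateChunk() for each of the ~100 chunks, and commit once at the end. The per-chunk savepoint is what keeps a unique-key violation from aborting the whole surrounding transaction. With the unique constraint in place this works at the default READ COMMITTED level; under SERIALIZABLE the follow-up SELECT would not see the other session's row, so the whole transaction would generally have to be retried instead. INSERT ... ON CONFLICT would make this simpler still, but it only arrived in PostgreSQL 9.5 and so is not available on 9.3.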