Re: [HACKERS] how to deal with sparse/to-be populated tables
From: Alfred Perlstein
Subject: Re: [HACKERS] how to deal with sparse/to-be populated tables
Date:
Msg-id: 20000203213232.Y25520@fw.wintelcom.net
In response to: Re: [HACKERS] how to deal with sparse/to-be populated tables (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] how to deal with sparse/to-be populated tables
List: pgsql-hackers
* Tom Lane <tgl@sss.pgh.pa.us> [000203 20:58] wrote:
> Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> > Hmm. Doesn't PostgreSQL have a big list of error codes? I don't think
> > it does, I've never seen one. There should be a way to get error
> > codes without comparing strings. Should this be on the TODO?
>
> It doesn't, there should, and it already is ;-)
>
> In the meantime, looking at the error message string is Alfred's
> only option for distinguishing duplicate-record from other errors,
> I'm afraid.
>
> A partial answer to his performance concern is to use a rule
> (or possibly a trigger) on the database side to reinterpret
> "insert into table X" as "either insert or update in table Y,
> depending on whether the key is already there". This wouldn't
> buy anything in terms of database cycles, but it would avoid two
> rounds of client-to-backend communication and query parsing.
>
> I've never done that myself, but perhaps someone else on the
> list has a working example.

Actually, we have some plpgsql code lying around that does this. The issue isn't ease of implementation, it's the speed of the implementation. Even parsing the error return isn't as fast as a single insert_new|update_existing_with_args operation would be.

One of the more frustrating aspects is that we could use the field that we merge rows on as a primary index, which would let us do an insert, or an update when the insert fails... however, if we fail to locate the row on the initial query (to see if it exists), we pay a large penalty because the insert must then be validated as unique. This effectively doubles the search. The same problem exists if we do "update, or insert on fail": a double scan is still required. (Yes, I thought about only indexing the field and trying the update first, doing an insert only on failure, but we really can't determine whether the initial update failed because no record matched (ok) or because of some other error (ouch).) That's why we can't use this field as a primary index, even though it is supposed to be unique.

Basically the database seems to force a _double_ lookup. The only way I see around this is to switch over to a bulk copy, getting around the double lookup. However, that only works for our special case where there is a single reader/writer updating the table at any time; otherwise we need special locking to avoid races.

Even if this isn't a TODO item, if there's a wish list out there it'd be nice to see this feature request listed.

I think once the dust settles over here and the need to scale goes from very scalable to insanely scalable, I'm going to have an even greater interest in learning the PostgreSQL internals. :)

thanks,
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
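[For readers looking for a concrete starting point, here is a minimal sketch of the update-then-insert pattern discussed above, written in plpgsql. The table name (counters), its columns (key, hits), and the function name (merge_counter) are hypothetical, and the dollar-quoted function syntax is the modern form rather than what was available at the time of this message.]

-- Hypothetical table: one row per key, with a running hit counter.
CREATE TABLE counters (key text PRIMARY KEY, hits integer NOT NULL);

CREATE FUNCTION merge_counter(k text, delta integer) RETURNS void AS $$
BEGIN
    -- Try the update first; FOUND is true if any row matched.
    UPDATE counters SET hits = hits + delta WHERE key = k;
    IF NOT FOUND THEN
        -- No existing row: fall back to an insert.  With concurrent
        -- writers this can still collide on the unique index, which
        -- is exactly the race described in the message above.
        INSERT INTO counters (key, hits) VALUES (k, delta);
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Usage: SELECT merge_counter('example-key', 1);

[Note that the server still performs the duplicate check when the insert path is taken, so this only saves the extra client round trip and re-parse, not the double lookup the message complains about.]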