Alter/update large tables - VERRRY annoying behaviour!
From | Dmitry Tkach
---|---
Subject | Alter/update large tables - VERRRY annoying behaviour!
Date |
Msg-id | 3CBB08C8.7000705@openratings.com
List | pgsql-bugs
Hi, everybody!

I was wondering if this is a known "feature" to begin with, and, if so, whether there are any plans to fix it in the future?

I had two very large tables in my database (about 30 million rows each), connected by a foreign key, and wanted to merge them together... Something like this:

    create table a
    (
        id int primary key,
        some_data int
    );

    create table b
    (
        id int unique references a,
        other_data int
    );

So, what I did was:

    alter table a add other_data int;
    update a set other_data = b.other_data from b where b.id = a.id;

This took awfully long, but worked (I guess). I say 'I guess' because I wasn't able so far to verify that - when I tried to do

    select * from a limit 1;

it just hangs on me... At least, it looks like it does.

Lucky me, I have compiled the backend from sources with full debug info, because if I hadn't done that (as most users), I would certainly have thought that my database was hopelessly corrupted, and would have had to recreate it from scratch :-(

Instead, I loaded the whole thing into a debugger, because that seems to be the only way to figure out what the hell it is thinking about... What I found out was that it seems to have recreated my entire table when I updated it, and left all the old tuples in it as well, so my 'select * ... limit 1' query was cycling through 30 million deleted tuples, trying to find the first one that was still valid, and that's what was taking so long...

First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert? Or is my problem specific to the fact that I altered the table and added a new column?

Now, I understand that, if I vacuumed it, the problem would have been resolved... The problems I have with it, though, are:

- As I said, the behaviour was so unexpected that I would have trashed the whole database if I hadn't been able to debug it... If there is no other possible solution, I think, at the very least, it should give the user some indication that it's not hopelessly hung when running that query...
- Vacuum isn't the speediest thing in the world either (it's been running for an hour now, and still has not finished). I was hoping to finish modifying my schema first, and then just vacuum everything once.

So, it would be REALLY, REALLY helpful for situations like this if PG was smart enough to keep track somehow of those deleted tuples, to avoid having to scan through them all every time... In my particular situation, the solution would be trivial (just remembering the address of the first valid tuple would suffice - because the entire table was updated)... I am not familiar enough with the internals to suggest anything more general than this, but EVEN fixing only this particular scenario would, I believe, be extremely useful...

Do you agree?

Thanks a lot!

Dima.
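[For reference, a minimal sketch of the workaround discussed in this message, assuming a reasonably recent PostgreSQL release rather than the one in the original thread: run VACUUM on the updated table so later scans no longer have to step over the dead row versions, and optionally check the live vs. dead tuple counts. The pg_stat_user_tables columns shown here (n_live_tup, n_dead_tup) exist only in newer releases and are not something the original poster had available.]

    -- Reclaim the dead row versions left behind by the full-table UPDATE,
    -- so that "select * from a limit 1" no longer has to skip ~30 million
    -- expired tuples before finding a live one.
    VACUUM VERBOSE a;

    -- Optional sanity check (newer releases only): compare live vs. dead
    -- tuple counts reported by the statistics collector for table "a".
    SELECT relname, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'a';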