Re: DATA corruption after promoting slave to master
От | Karthik Iyer |
---|---|
Тема | Re: DATA corruption after promoting slave to master |
Дата | |
Msg-id | 53AC31DE.7030902@directi.com обсуждение исходный текст |
Ответ на | Re: DATA corruption after promoting slave to master (Shaun Thomas <sthomas@optionshouse.com>) |
Список | pgsql-general |
Thanks Shaun. We reindexed all the primary and unique keys of all the tables, But we did not reindex the tables. You think we should do that also ? Also, do you think we should do a clean dump restore to eliminate all data inconsistencies. One more query : We managed to get the old server up. We are trying to play the difference in data by checking the log files(statement logs). You think there is any other easy alternatives ? Thanks in advance, - Karthik On Thursday 26 June 2014 01:09 AM, Shaun Thomas wrote: > On 06/25/2014 06:29 AM, Karthik Iyer wrote: > >> [2] We also have a daemon process which copies the latest partial WAL >> log file (which postgres is currently writing to, under pg_xlog/) every >> 3 secs to a different location. > > No. No, no, no, no no. No. > > Also, no. Partial WAL files are not valid for recovery. In fact, I'm > surprised the standby node even applied it at all. > >> We are seeing these problems in the newly promoted master now: >> >> 1. when we run queries on primary key, we don't get the rows even >> if it exist in db. However if we force query not to use index, we get >> those entries. >> 2. there are duplicate values for primary keys > > This is no surprise. Your slave has partial data commits, which means > your table, index, or possibly both, are corrupt. > > The first thing you need to do is back up any tables you've noticed > are having this behavior. Second, try to reindex the tables that are > having problems. The errors you are seeing are due to the data and > indexes being out of sync. If you get an error that says the reindex > fails due to duplicate values, you can do this: > > SET enable_indexscan TO false; > SET enable_bitmapscan TO false; > > SELECT primary_key, count(1) > FROM broken_table > GROUP BY 1 > HAVING count(1) > 1; > > For any ID that comes back, do this: > > SELECT ctid, * > FROM broken_table > WHERE primary_key = [value(s) from above]; > > Then you need to delete one of the bad rows after deciding which. Use > the CTID of the row you want to delete: > > DELETE FROM broken_table > WHERE ctid = 'whatever'; > > Then reindex the table so the correct values are properly covered. > Doing this for all of your corrupt tables may take a while depending > on how many there are. >
В списке pgsql-general по дате отправления: