Re: Reviewing freeze map code

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Reviewing freeze map code
Дата
Msg-id CAD21AoCFy56srzXnbjGPXQXOAji97W0rX-eEuju+PjGjmJK2jw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Reviewing freeze map code  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Fri, Jun 10, 2016 at 1:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 9, 2016 at 5:48 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Attached patch implements the above 2 functions.  I have addressed the
>> comments by Sawada San and you in latest patch and updated the documentation
>> as well.
>
> I made a number of changes to this patch.  Here is the new version.
>
> 1. The algorithm you were using for growing the array size is unsafe
> and can easily overrun the array.  Suppose that each of the first two
> pages have some corrupt tuples, more than 50% of MaxHeapTuplesPerPage
> but less than the full value of MaxTuplesPerPage.  Your code will
> conclude that the array does need to be enlarged after processing the
> first page.  I switched this to what I consider the normal coding
> pattern for such problems.
>
> 2. The all-visible checks seemed to me to be incorrect and incomplete.
> I made the check match the logic in lazy_scan_heap.
>
> 3. Your 1.0 -> 1.1 upgrade script was missing copies of the REVOKE
> statements you added to the 1.1 script.  I added them.
>
> 4. The tests as written were not safe under concurrency; they could
> return spurious results if the page changed between the time you
> checked the visibility map and the time you actually examined the
> tuples.  I think people will try running these functions on live
> systems, so I changed the code to recheck the VM bits after locking
> the page.  Unfortunately, there's either still a concurrency-related
> problem here or there's a bug in the all-frozen code itself because I
> once managed to get pg_check_frozen('pgbench_accounts') to return a
> TID while pgbench was running concurrently.  That's a bit alarming,
> but since I can't reproduce it I don't really have a clue how to track
> down the problem.
>
> 5. I made various cosmetic improvements.
>
> If there are not objections, I will go ahead and commit this tomorrow,
> because even if there is a bug (see point #4 above) I think it's
> better to have this in the tree than not.  However, code review and/or
> testing with these new functions seems like it would be an extremely
> good idea.
>

Thank you for working on this.
Here are some minor comments.

---
+/*
+ * Return the TIDs of not-all-visible tuples in pages marked all-visible

If there is even one non-visible tuple in pages marked all-visible,
the database might be corrupted.
Is it better "not-visible" or "non-visible" instead of "not-all-visible"?
---
Do we need to check page header flag?
I think that database also might be corrupt in case where there is
non-visible tuple in page set PD_ALL_VISIBLE.
We could emit the WARNING log in such case.

Also, using attached tool which allows us to set spurious visibility
map status without actual modifying the tuple , I manually made the
some situations where database is corrupted and tested it, but ISTM
that this tool works fine.
It doesn't mean proposing as a new feature of course, but please use
it as appropriate.

Regards,

--
Masahiko Sawada

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [BUG] pg_basebackup from disconnected standby fails
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: Online DW