Re: Hash Indexes

From: Amit Kapila
Subject: Re: Hash Indexes
Msg-id: CAA4eK1L1K9CokG6OjZwxoXdR9AgKty7J1mOgJa9uV7ghmebyQQ@mail.gmail.com
In reply to: Re: Hash Indexes (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: Hash Indexes (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Jun 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 10, 2016 at 8:09 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Once the split operation has set the split-in-progress flag, it will begin scanning bucket (N+1)/2.  Every time it finds a tuple that properly belongs in bucket N+1, it will insert the tuple into bucket N+1 with the moved-by-split flag set.  Tuples inserted by anything other than a split operation will leave this flag clear, and tuples inserted while the split is in progress will target the same bucket that they would hit if the split were already complete.  Thus, bucket N+1 will end up with a mix of moved-by-split tuples, coming from bucket (N+1)/2, and unflagged tuples coming from parallel insertion activity.  When the scan of bucket (N+1)/2 is complete, we know that bucket N+1 now contains all the tuples that are supposed to be there, so we clear the split-in-progress flag on both buckets.  Future scans of both buckets can proceed normally.  The split operation needs to take a cleanup lock on the primary bucket to ensure that it doesn't start if there is any insertion happening in the bucket.  It will leave the lock on the primary bucket, but not the pin, as it proceeds to the next overflow page.  Retaining the pin on the primary bucket will ensure that vacuum doesn't start on this bucket until the split is finished.
>
> In the second-to-last sentence, I believe you have reversed the words
> "lock" and "pin".
>

Yes. What I meant to say is: release the lock, but retain the pin on the primary bucket until the end of the operation.
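To make the scheme concrete, here is a minimal Python sketch of the split as described above, under several assumptions: `Bucket`, `IndexTuple`, `split_steps` and `insert` are hypothetical names; the tuple-to-bucket mapping copies the mask logic of PostgreSQL's `_hash_hashkey2bucket()`, under which the bucket being split is `new_bucket & lowmask` (this need not equal (N+1)/2 numerically); and locking is reduced to a pin counter on the old bucket. The split is written as a generator so that a concurrent insertion can be interleaved between pages. Moved tuples are copied, not deleted, from the old bucket; removing them is the cleanup step discussed further down.

```python
from dataclasses import dataclass, field

@dataclass
class IndexTuple:
    hashkey: int
    moved_by_split: bool = False

@dataclass
class Bucket:
    # pages[0] is the primary page; later entries are overflow pages (hypothetical model).
    pages: list = field(default_factory=lambda: [[]])
    split_in_progress: bool = False
    pin_count: int = 0

def hashkey_to_bucket(hashkey, maxbucket, highmask, lowmask):
    # Same mapping as PostgreSQL's _hash_hashkey2bucket().
    bucket = hashkey & highmask
    if bucket > maxbucket:
        bucket &= lowmask
    return bucket

def split_steps(index, old_no, new_no, masks):
    """Copy tuples that now belong in new_no, yielding after each old page."""
    old, new = index[old_no], index[new_no]
    old.split_in_progress = new.split_in_progress = True
    old.pin_count += 1  # per-page locks are released, but this pin is kept throughout
    for page in list(old.pages):
        for tup in list(page):
            if hashkey_to_bucket(tup.hashkey, *masks) == new_no:
                new.pages[0].append(IndexTuple(tup.hashkey, moved_by_split=True))
        yield  # a concurrent insert may run here
    old.split_in_progress = new.split_in_progress = False
    old.pin_count -= 1

def insert(index, hashkey, masks):
    # Inserts target the bucket they would hit if the split were complete,
    # and never set moved_by_split.
    target = index[hashkey_to_bucket(hashkey, *masks)]
    target.pages[-1].append(IndexTuple(hashkey))
```

With masks `(2, 3, 1)` (maxbucket 2 after adding bucket 2), splitting bucket 0 copies keys 6 and 10 into bucket 2 with `moved_by_split` set, while key 14, inserted between pages, lands in bucket 2 unflagged.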

> > Insertion will happen by scanning the appropriate bucket and needs to retain a pin on the primary bucket to ensure that a concurrent split doesn't happen; otherwise the split might leave this tuple unaccounted.
>
> What do you mean by "unaccounted"?
>

It means that the split might leave this tuple in the old bucket even though it should be moved to the new bucket.  Consider a case where an insertion has to add a tuple to some intermediate overflow page in the bucket chain: if we allow a split while the insertion is in progress, the split might have already scanned that page and so will not move the newly inserted tuple.
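A toy illustration of that race, assuming a fixed three-page chain and a made-up rule for which keys move (`belongs_in_new_bucket`); `PrimaryBuffer` and `simulate` are hypothetical names. Without the inserter's pin, key 10 is added to overflow page 1 after the split has already scanned it and is left behind; with the pin, the split cannot obtain its cleanup lock and does not start.

```python
from typing import List, Optional

class PrimaryBuffer:
    """Hypothetical model of the old bucket's primary page: only pin counts matter."""

    def __init__(self) -> None:
        self.other_pins = 0  # pins held by backends other than the splitter

    def can_take_cleanup_lock(self) -> bool:
        # A cleanup lock is granted only when nobody else has the page pinned.
        return self.other_pins == 0

def belongs_in_new_bucket(hashkey: int) -> bool:
    # Assumed rule for this example: keys whose low two bits are 0b10 move.
    return hashkey & 3 == 2

def simulate(insert_holds_pin: bool) -> Optional[List[int]]:
    """Return keys that should have moved but stayed behind,
    or None if the split could not start."""
    chain = [[4], [6, 8], [12]]  # primary page plus two overflow pages
    primary = PrimaryBuffer()
    if insert_holds_pin:
        primary.other_pins += 1  # inserter pinned the primary bucket first
    if not primary.can_take_cleanup_lock():
        return None  # split is deferred until the inserter drops its pin

    moved = []
    for page_no, page in enumerate(chain):
        moved += [h for h in page if belongs_in_new_bucket(h)]
        if page_no == 1:
            # The inserter located overflow page 1 before the split began and
            # adds key 10 there only after the split has scanned that page.
            chain[1].append(10)
    return [h for page in chain for h in page
            if belongs_in_new_bucket(h) and h not in moved]
```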

> > Now, for deletion of tuples from bucket (N+1)/2, we need to wait for the completion of any scans that began before we finished populating bucket N+1, because otherwise we might remove tuples that they're still expecting to find in bucket (N+1)/2. The scan will always maintain a pin on the primary bucket, and vacuum can take a buffer cleanup lock (a cleanup lock means an exclusive lock on the bucket plus waiting until all the pins on the buffer become zero) on the primary bucket.  I think we can relax the requirement for vacuum to take a cleanup lock (instead taking an exclusive lock on buckets where no split has happened) with an additional flag, has_garbage, which will be set on the primary bucket if any tuples have been moved from that bucket.  However, I think the squeeze phase of vacuum (in this phase, we try to move tuples from later overflow pages to earlier overflow pages in the bucket, and then move any empty overflow pages to a kind of free pool) needs a cleanup lock, otherwise scan results might get effected.
>
> affected, not effected.
>
> I think this is basically correct, although I don't find it to be as
> clear as I think it could be.  It seems very clear that any operation
> which potentially changes the order of tuples in the bucket chain,
> such as the squeeze phase as currently implemented, also needs to
> exclude all concurrent scans.  However, I think that it's OK for
> vacuum to remove tuples from a given page with only an exclusive lock
> on that particular page.
>
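The ordering problem with the squeeze phase can be sketched the same way. This is a simplified, hypothetical `squeeze` (tuples move one at a time from the last non-empty page into the earliest page with free space, then empty overflow pages are dropped to a free pool); PostgreSQL's real implementation differs in detail. A scan that has already read page 0 when the squeeze runs never sees the tuple that is moved into page 0.

```python
from typing import List, Tuple

Page = List[int]

def squeeze(chain: List[Page], capacity: int) -> Tuple[List[Page], List[int]]:
    """Compact the chain in place; return (remaining pages, freed page numbers)."""
    write, read = 0, len(chain) - 1
    while write < read:
        if len(chain[write]) >= capacity:
            write += 1  # this page is full, fill the next one
        elif not chain[read]:
            read -= 1  # tail page is empty, take tuples from the one before it
        else:
            chain[write].append(chain[read].pop())
    freed = [i for i in range(1, len(chain)) if not chain[i]]
    kept = [p for i, p in enumerate(chain) if i == 0 or p]  # primary page always stays
    return kept, freed

def scan_with_concurrent_squeeze(chain: List[Page], capacity: int,
                                 pages_read_before_squeeze: int) -> List[int]:
    # The scan copies what it has read so far, then the squeeze runs, then
    # the scan resumes at the same page position in the compacted chain.
    seen = [t for page in chain[:pages_read_before_squeeze] for t in page]
    chain, _ = squeeze(chain, capacity)
    seen += [t for page in chain[pages_read_before_squeeze:] for t in page]
    return seen
```

With `[[1, 2], [3], [4, 5]]` and capacity 3, the squeeze yields `[[1, 2, 5], [3, 4]]` and frees page 2, so a scan that had finished page 0 returns only `[1, 2, 3, 4]`.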

How can we guarantee that it doesn't remove a tuple that is required by a scan that started after the split-in-progress flag was set?
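One way to see what the cleanup lock buys is a small threading model: scans pin the primary bucket page, and vacuum's cleanup lock is granted only once no other pin remains. `BufferPins` is a hypothetical class, not PostgreSQL's buffer manager; in particular, it makes new pins wait while the cleanup lock is held, which stands in for scans blocking on the page's content lock, and it counts only pins held by others (PostgreSQL lets the locker keep its own pin).

```python
import threading

class BufferPins:
    """Hypothetical model of a shared buffer's pin count plus a cleanup lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pins = 0  # pins held by scans (other backends)
        self._exclusive = False

    def pin(self):
        with self._cond:
            # Simplification: a new scan waits while the cleanup lock is held.
            while self._exclusive:
                self._cond.wait()
            self._pins += 1

    def unpin(self):
        with self._cond:
            self._pins -= 1
            self._cond.notify_all()

    def acquire_cleanup_lock(self, timeout=None):
        """Exclusive lock granted only when no scan holds a pin."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self._pins == 0 and not self._exclusive, timeout)
            if ok:
                self._exclusive = True
            return ok

    def release_cleanup_lock(self):
        with self._cond:
            self._exclusive = False
            self._cond.notify_all()
```

Waiting on a condition variable instead of polling means vacuum wakes as soon as the last scan unpins the page.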

>  Also, I think that when cleaning up after a
> split, an exclusive lock is likewise sufficient to remove tuples from
> a particular page provided that we know that every scan currently in
> progress started after split-in-progress was set.
>

I think this could have the same issue as above, unless we have something that prevents concurrent scans.
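The conservative position from the quoted proposal (a cleanup lock for the squeeze phase and for buckets with has_garbage set, an exclusive lock for buckets where no split has happened) can be summarized as a small decision function. The phase names and `LockKind` are hypothetical labels for this sketch; the alternative of using only an exclusive page lock for post-split cleanup is deliberately not encoded.

```python
from enum import Enum

class LockKind(Enum):
    EXCLUSIVE = "exclusive lock"
    CLEANUP = "cleanup lock on the primary bucket (no concurrent pins)"

def vacuum_lock_needed(phase: str, has_garbage: bool) -> LockKind:
    """Lock vacuum would take on a bucket under the quoted proposal (sketch)."""
    if phase == "squeeze":
        # Moving tuples between pages can reorder the chain under a scan.
        return LockKind.CLEANUP
    if phase == "remove_dead":
        # Leftovers from a split may still be needed by an in-progress scan,
        # so wait out all pins; otherwise an exclusive lock is proposed.
        return LockKind.CLEANUP if has_garbage else LockKind.EXCLUSIVE
    raise ValueError(f"unknown vacuum phase: {phase!r}")
```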

>
> (Plain text email is preferred to HTML on this mailing list.)
>

If I switch to plain text [1], the signature of my e-mail also changes to plain text, which I don't want.  Is there a way I can keep the signature in rich text and the mail content in plain text?
