Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

From: Justin Pryzby
Subject: Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.
Date:
Msg-id: 20190225035438.GO28750@telsasoft.com
In reply to: Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.  (Gunther <raj@gusw.net>)
List: pgsql-performance
On Sun, Feb 24, 2019 at 10:06:10PM -0500, Gunther wrote:
> The index isn't required at all if all my pending jobs are in a partition of
> only pending jobs. In that case the plan can just be a sequential scan.
..
> because an index really isn't required. The actual pending partition will
> always remain quite small, and being a queue, it doesn't even matter how big
> it might grow, as long as new rows are inserted at the end and not in the
> middle of the data file, and there is still some way to skip quickly over
> the dead rows at the beginning that have already been processed and moved
> away.
..
> One question I have though: I imagine our pending partition heap file to now
> be essentially sequentially organized as a queue. New jobs are appended at
> the end, old jobs are at the beginning. As pending jobs become completed
> (pending = false) these initial rows will be marked as dead. So, while the
> number of live rows will remain small in that pending partition, sequential
> scans will have to skip over the dead rows in the beginning.
> 
> Does PostgreSQL structure its files such that skipping over dead rows is
> fast? Or do the dead rows have to be read and discarded during a table scan?
..
> Of course vacuum eliminates dead rows, but unless I do vacuum full, it will
> not re-pack the live rows, and that requires an exclusive table lock. So,
> what is the benefit of vacuuming that pending partition? What I _/don't/_
> want is insertion of new jobs to go into open slots at the beginning of the
> file. I want them to be appended (in Oracle there is an INSERT /*+APPEND*/
> hint for that). How does that work in PostgreSQL?
> 
> Ultimately that partition will amass too many dead rows, then what do I do?

Why don't you want to reuse free space in the table?

When you UPDATE the tuples and set pending='f', the row will be moved to
another partition, and the "dead" tuple in the pending partition can be reused
for a future INSERT.  The table will remain small, as you said, only as large
as the number of items in the pending queue, plus tuples not yet vacuumed and
not yet available for reuse.
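
For instance, a minimal sketch of that flow (untested; the table and column
names are invented, and UPDATEs that move rows between partitions need v11):

CREATE TABLE queue (
    job_id  bigint NOT NULL,
    pending boolean NOT NULL DEFAULT true
) PARTITION BY LIST (pending);

CREATE TABLE queue_pending PARTITION OF queue FOR VALUES IN (true);
CREATE TABLE queue_done    PARTITION OF queue FOR VALUES IN (false);

INSERT INTO queue (job_id) VALUES (1);
-- The UPDATE moves the row into queue_done and leaves a dead tuple behind
-- in queue_pending; once vacuumed, that space is reused by future INSERTs.
UPDATE queue SET pending = false WHERE job_id = 1;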

(BTW, index scans do intelligently skip over dead rows if the table/index are
vacuumed sufficiently often).
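
To get a hot queue partition vacuumed that often, you could lower its
per-table autovacuum thresholds; a rough, untested sketch (the table name and
the values are made up, tune for your workload):

ALTER TABLE queue_pending SET (
    autovacuum_vacuum_scale_factor = 0.0,   -- ignore table size
    autovacuum_vacuum_threshold    = 1000,  -- vacuum after ~1000 dead rows
    autovacuum_vacuum_cost_delay   = 0      -- don't throttle the vacuum
);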

> 1. Create the Queue table partitioned on the pending column; this creates
> the partition with the pending jobs (on which I set the fillfactor kind of
> low, maybe 50) and the default partition with all the rest. Of course that
> allows people with a constant transaction volume to also partition on jobId
> or completionTime and move chunks out to cold archive storage. But that's
> beside the current point.

I suggest you might want to partition on something other than (or in addition
to) the boolean column.  For example, if you have a timestamp column for
"date_processed", then the active queue would be for "processed IS NULL" (which
I think would actually have to be the DEFAULT partition).  Or you could use
sub-partitioning, or partition on multiple columns (pending, date_processed) or
similar.
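
A rough sketch of that idea (untested; names are invented). Rows with a NULL
range-partition key can only be routed to the DEFAULT partition, which is why
the pending queue ends up there:

CREATE TABLE queue (
    job_id         bigint NOT NULL,
    date_processed timestamptz
) PARTITION BY RANGE (date_processed);

-- Unprocessed jobs (date_processed IS NULL) fall into the DEFAULT
-- partition; a low fillfactor leaves room for updates within the page.
CREATE TABLE queue_pending PARTITION OF queue DEFAULT
    WITH (fillfactor = 50);

-- Completed jobs land in time-based partitions that can later be
-- detached and moved to cold archive storage.
CREATE TABLE queue_201902 PARTITION OF queue
    FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');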

Justin

