Re: Relation extension scalability

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: Relation extension scalability
Дата
Msg-id CAFiTN-tkX6gs-jL8VrPxg6OG9VUAKnObUq7r7pWQqASzdF5OwA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Relation extension scalability  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Relation extension scalability  (Dilip Kumar <dilipbalaut@gmail.com>)
Re: Relation extension scalability  (Dilip Kumar <dilipbalaut@gmail.com>)
Список pgsql-hackers

On Sat, Mar 26, 2016 at 8:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I think we need to start testing these patches not only in terms of
how *fast* they are but how *large* the relation ends up being when
we're done.  A patch that inserts the rows slower but the final
relation is smaller may be better overall.  Can you retest v13, v14,
and master, and post not only the timings but the relation size
afterwards?  And maybe post the exact script you are using?


I have tested the size and performance, scripts are attached in the mail.

COPY 1-10 bytes tuple from 32 Clients
Base V13 V14
                               --------               ---------              ---------
TPS 123 874 446
No. Of Tuples 148270000  1049980000 536370000
Relpages 656089 4652593 2485482
INSERT 1028 bytes Tuples From 16 Clients
Base V13 V14
                              --------               --------                ---------
TPS 42 211 120
No. Of Tuples 5149000 25343000 14524000
Rel Pages 735577 3620765 2140612
 
As per above results If we calculate the tuple number of tuples and respective relpages, then neither in v13 nor v14 there are extra unused pages.

As per my calculation for INSERT (1028 byte tuple) each page contain 7 tuples so
number of pages required
Base: 5149000/7 = 735571  (from relpages we can see 6 pages are extra)
V13 :  25343000/7= 3620428 (from relpages we can see ~300 pages are extra).
V14 :  14524000/7= 2074857 (from relpages we can see ~70000 pages are extra).

With V14 we have found max pages number of extra pages, I expected V13 to have max unused pages, but it's v14 and I tested it in multiple runs and v13 is always the winner. I tested with multiple client count also like 8, 32 and v13 always have only ~60-300 extra pages out of total ~2-4 Million Pages.


Attached files:
-------------------
test_size_ins.sh  --> automated script to run insert test and calculate tuple and relpages.
test_size_copy   --> automated script to run copy test and calculate tuple and relpages.
copy_script  -> copy pg_bench script used by test_size_copy.sh
insert_script --> insert pg_bench script used by test_size_ins.sh

 
Maybe something like this would help:

    if (needLock)
    {
        if (!use_fsm)
            LockRelationForExtension(relation, ExclusiveLock);
        else if (!ConditionLockRelationForExtension(relation, ExclusiveLock))
        {
            BlockNumber     last_blkno = RelationGetNumberOfBlocks(relation);

            targetBlock = GetPageWithFreeSpaceExtended(relation,
last_blkno, len + saveFreeSpace);
            if (targetBlock != InvalidBlockNumber)
                goto loop;

            LockRelationForExtension(relation, ExclusiveLock);
            targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
            if (targetBlock != InvalidBlockNumber)
            {
                UnlockRelationForExtension(relation, ExclusiveLock);
                goto loop;
            }
            RelationAddExtraBlocks(relation, bistate);
        }
    }

I think this is better than what you had before with lastValidBlock,
because we're actually interested in searching the free space map at
the *very end* of the relation, not wherever the last target block
happens to have been.

We could go further still and have GetPageWithFreeSpace() always
search the last, say, two pages of the FSM in all cases.  But that
might be expensive. The extra call to RelationGetNumberOfBlocks seems
cheap enough here because the alternative is to wait for a contended
heavyweight lock.

I will try the test with this also and post the results. 

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tatsuo Ishii
Дата:
Сообщение: Re: multivariate statistics v14
Следующее
От: Piotr Stefaniak
Дата:
Сообщение: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c