Re: Large number of open(2) calls with bulk INSERT into empty table

From Robert Haas
Subject Re: Large number of open(2) calls with bulk INSERT into empty table
Date
Msg-id CA+TgmoYRGO5Wu4_2yzP0-cmx4GBZHqOO4WKiCxT-R3gV6qrt6A@mail.gmail.com
In response to Large number of open(2) calls with bulk INSERT into empty table  (Florian Weimer <fweimer@bfk.de>)
Responses Re: Large number of open(2) calls with bulk INSERT into empty table  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Large number of open(2) calls with bulk INSERT into empty table  (Florian Weimer <fweimer@bfk.de>)
List pgsql-hackers
On Sun, Nov 27, 2011 at 10:24 AM, Florian Weimer <fweimer@bfk.de> wrote:
> I noticed that a bulk INSERT into an empty table (which has been
> TRUNCATEd in the same transaction, for good measure) results in a
> curious number of open(2) calls for the FSM resource fork:

That's kind of unfortunate.  It looks like every time we extend the
relation, we try to read the free space map to see whether there's a
block available with free space in it.  But since we never actually
make any entries in the free space map, the fork never gets created,
so every attempt to read it involves a system call to see whether it's
there.
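
To make the cost concrete, here is a minimal sketch of that pattern.
It is not PostgreSQL code; count_missing_probes() and the path passed
to it are hypothetical.  It probes for a file once per simulated
relation extension and counts how many probes failed with ENOENT.
Nothing carries the "not there" answer from one probe to the next, so
every probe is a separate open(2) call.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical illustration, not PostgreSQL code: probe for a file that
 * may not exist, once per simulated relation extension.  Returns the
 * number of probes that failed with ENOENT.  Each of those cost one
 * open(2) system call.
 */
int
count_missing_probes(const char *path, int extensions)
{
    int missing = 0;

    for (int i = 0; i < extensions; i++)
    {
        int fd = open(path, O_RDONLY);

        if (fd >= 0)
        {
            close(fd);          /* the file exists, so the probe was useful */
            continue;
        }
        if (errno == ENOENT)
            missing++;          /* nothing remembers this for the next probe */
    }
    return missing;
}
```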

I set up the following test case to try to measure the overhead on my
MacBook Pro:

create table bob (a integer, b text);

Then I ran pgbench -f foo -t 100, with the following contents for foo:

begin;
truncate bob;
insert into bob select g,
random()::text||random()::text||random()::text||random()::text from
generate_series(1,10000) g;
commit;

I ran this both with the call to GetPageWithFreeSpace() in
RelationGetBufferForTuple() removed and with the unpatched code, but
the run-to-run variation was far larger than any difference the change
made.  Is there a better test case?

I've had the thought before that maybe we should cache the size of
some limited number of relation forks in shared memory.  That would
potentially eliminate not only the open() calls but also the lseek()
calls.  The trouble is, to get any benefit from such a change, we'd
need to have a userspace cache which was at least as concurrent as
what the kernel implements.  We're currently well behind the Linux
kernel in terms of synchronization techniques, so that would represent
a considerable investment of time and energy.
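
As a rough sketch of the data structure only, a fork-size cache could
be a small direct-mapped table keyed by relation and fork.  Everything
here is an assumption made for illustration: the names, the slot
count, the hash, and the eviction policy (a colliding store simply
overwrites).  It deliberately has no locking at all, and locking is
exactly the part described above as hard, so this shows the easy half
of the idea and nothing more.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SLOTS 64          /* assumed size, picked arbitrarily */

/* Hypothetical cache entry: size of one fork of one relation. */
typedef struct ForkSizeEntry
{
    bool        valid;
    uint32_t    rel_id;         /* hypothetical relation identifier */
    int         fork;           /* hypothetical fork number */
    uint32_t    nblocks;        /* cached fork size, in blocks */
} ForkSizeEntry;

static ForkSizeEntry fork_size_cache[CACHE_SLOTS];

/* Map a (relation, fork) pair to one slot; collisions are allowed. */
static unsigned
slot_for(uint32_t rel_id, int fork)
{
    return (rel_id * 31u + (uint32_t) fork) % CACHE_SLOTS;
}

/* Remember a fork's size, evicting whatever occupied the slot. */
void
fork_size_store(uint32_t rel_id, int fork, uint32_t nblocks)
{
    ForkSizeEntry *e = &fork_size_cache[slot_for(rel_id, fork)];

    e->valid = true;
    e->rel_id = rel_id;
    e->fork = fork;
    e->nblocks = nblocks;
}

/* Return true and set *nblocks on a hit; false means "ask the kernel". */
bool
fork_size_lookup(uint32_t rel_id, int fork, uint32_t *nblocks)
{
    const ForkSizeEntry *e = &fork_size_cache[slot_for(rel_id, fork)];

    if (!e->valid || e->rel_id != rel_id || e->fork != fork)
        return false;
    *nblocks = e->nblocks;
    return true;
}
```

A real version would live in shared memory and would need a
synchronization scheme cheap enough that a hit costs less than the
system calls it replaces.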

In this particular case, it seems like there's probably some way to be
smarter.  If we knew that the relation was created or truncated in the
current transaction, and we knew that we hadn't created the free space
map for it, we could presumably deduce that it still doesn't exist.
Not sure exactly how to make that work, though, and it doesn't solve
the more general problem where you create in one transaction and then
insert in the next.
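
The deduction could be sketched roughly as follows.  The struct and
function names are hypothetical and do not correspond to existing
PostgreSQL fields; the sketch only assumes that a backend can track,
per relation, whether it created or truncated the relation in the
current transaction and whether it has created the free space map
since then.

```c
#include <stdbool.h>

/* Hypothetical backend-local bookkeeping for one relation. */
typedef struct RelLocalState
{
    bool    new_in_xact;    /* created or truncated in this transaction */
    bool    fsm_created;    /* this backend created the FSM fork since then */
} RelLocalState;

/*
 * Decide whether we must ask the filesystem whether the FSM fork exists.
 * If the relation is new in this transaction and we never created the
 * fork, we assume it still does not exist and skip the probe.
 */
bool
fsm_probe_needed(const RelLocalState *rel)
{
    if (rel->new_in_xact && !rel->fsm_created)
        return false;
    return true;
}

/* At transaction end the knowledge is gone; later inserts probe again. */
void
at_transaction_end(RelLocalState *rel)
{
    rel->new_in_xact = false;
    rel->fsm_created = false;
}
```

The reset in at_transaction_end() is where the limitation shows up: a
table created in one transaction and filled in the next gets no
benefit.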

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

