Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

From	Sergey Koposov
Subject	Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date	30 May 2012, 22:10:54
Msg-id	alpine.LRH.2.02.1205310148440.6351@calx046.ast.cam.ac.uk
In response to	Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile (Jeff Janes <jeff.janes@gmail.com>)
Responses	Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
List	pgsql-hackers

On Wed, 30 May 2012, Jeff Janes wrote:

>> But the question now is whether there is a *PG* problem here or not, or is
>> it Intel's or Linux's problem ? Because still the slowdown was caused by
>> locking. If there wouldn't be locking there wouldn't be any problems (as
>> demonstrated a while ago by just cat'ting the files in multiple threads).
>
> You cannot have a traditional RDBMS without locking.  From your

I understand the need for significant locking when there are concurrent 
writes, but not when there are only reads. But I'm not an RDBMS expert, so 
maybe that's a misunderstanding on my side.

> description of the problem,  I probably wouldn't be using a traditional
> database system at all for this, but rather flat files and Perl.

Flat files and Perl for 25-50 TB of data over a few years is a bit extreme 
;)

> Or
> at least, I would partition the data before loading it to the DB,
> rather than trying to do it after.

I intentionally did otherwise, because I thought that PG would be much 
smarter than me at juggling the data I'm ingesting (~ tens of gigabytes 
each day), joining the appropriate bits of data and then splitting the 
result into partitions. Unfortunately I see that there are some scalability 
issues along the way, which I didn't expect. Those aren't fatal, but 
slightly disappointing.

> But anyway, is idt_match a fairly static table?  If so, I'd partition
> that into 16 tables, and then have each one of your tasks join against
> a different one of those tables.  That should relieve the contention
> on the index root block, and might have some other benefits as well.

No, idt_match is getting filled by multi-threaded copy() and then joined 
with 4 other big tables like idt_phot. The result is then split into 
partitions. And I was trying different approaches to fully utilize the 
CPUs and/or I/O and somehow parallelize the queries. That's the 
reasoning behind the somewhat contrived queries in my test.
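
For reference, the split suggested in the quoted text (16 tables, with each 
task joining against a different one) could be sketched roughly as below. 
This is only an illustration under assumptions: the key column "objid", the 
modulo split and the idt_match_pNN names are hypothetical and not taken from 
the thread; only idt_match and idt_phot are real table names. The script 
just prints the SQL each task would run.

```python
# Sketch only. Hypothetical: the key column "objid" (assumed to be a
# non-negative integer), the modulo split, and the idt_match_pNN naming.
# Only the table names idt_match and idt_phot come from the thread.
N_PARTS = 16


def child_table(i: int) -> str:
    """Name of the i-th child table (hypothetical naming scheme)."""
    return f"idt_match_p{i:02d}"


def create_child_sql(i: int, key: str = "objid") -> str:
    """Statement that copies one modulo bucket of idt_match into a child table."""
    return (
        f"CREATE TABLE {child_table(i)} AS "
        f"SELECT * FROM idt_match WHERE {key} % {N_PARTS} = {i};"
    )


def join_sql(i: int, key: str = "objid") -> str:
    """Join for task i: it reads only its own child table, not idt_match."""
    return (
        f"SELECT m.*, p.* FROM {child_table(i)} m "
        f"JOIN idt_phot p USING ({key});"
    )


if __name__ == "__main__":
    for i in range(N_PARTS):
        print(create_child_sql(i))
        print(join_sql(i))
```

Each task then works against its own table and index, which is the point of 
the suggestion: the tasks no longer share a single index root block.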

Cheers,    S

*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
