parallelizing slow queries for multiple cores (PostgreSQL + Gearman)
От | henk de wit |
---|---|
Тема | parallelizing slow queries for multiple cores (PostgreSQL + Gearman) |
Дата | |
Msg-id | COL104-W6C6DC49AD7187E84F0E43F5990@phx.gbl обсуждение исходный текст |
Список | pgsql-performance |
Hi, >Has anyone done similar work in the light of upcoming many-core CPUs/systems? Any better results than 2x improvement? Yes, in fact I've done a very similar thing on a quad CPU box a while back. In my case the table in question had about 26million rows. I did nothing special to the table (no cluster, no partitioning, nothing, of course the table did had theappropriate indexes). Queries on this table are analytic/reporting kind of queries. Basically they are just aggregationsover a large number of rows. E.g. "the sum of column1 and the sum of column2 where time is some time and columnAhas some value and columnB has some other value", that kind of thing. From analysis the queries appeared to be nearly100% CPU bound. In my (Java) application I divided a reporting query for say the last 60 days in 2 equal portions: day 1 to 30 and day 31to 60 and assigned these to two worker threads. The results of these worker threads was merged using a simple resultsetmerge (the end result is simply the total of all rows returned by thread1 and thread2). The speed up I measuredon the quad box was a near perfect factor 2. I then divided the workload in 4 equal portions: day 1 to 15, 16 to30, 31 to 45 and 46 till 60. The speed up I measured was only a little less then a factor 4. I my situation too, the timeI measured included dispatching the jobs to a thread pool and merging their results. Of course, such a scheme can only be easily used when all workers return individual rows that are directly part of the endresult. If some further calculation has to be done on those rows, which happens to be the same calculation that is alsodone in the query you are parallelizing, then in effect you are duplicating logic. If you do that a lot in your codeyou can easily create a maintenance nightmare. Also, you have to be aware that without additional measures, every workerlives in its own transaction. Depending on the nature of the data this could potentially result in inconsistent databeing returned. In your case, on tables generated once per day this wouldn't be the case, but as a general techniqueyou have to be aware of this. Anyway, it's very clear that computers are moving to many-core architectures. Simple entry level servers already come thesedays with 8 cores. I've asked a couple of times on this list whether PG is going to support using multiple cores fora single query anytime soon, but this appears to be very unlikely. Until then it seems the only way to utilize multiplecores for a single query is doing it at the application level or by using something like pgpool-II. _________________________________________________________________ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
В списке pgsql-performance по дате отправления: