Delay in completion of aggregation inserts when run in a single commit - PG 9.1.2
Hi All,
We have a reporting application that consumes the transaction log created by a billing system. The flow of data in the ETL framework is TRANSACTION_LOG -> STAGING_TABLES -> FACT_TABLES -> AGGREGATE_TABLES.
There are around 15 aggregate tables to be populated from these staging tables. A single run involves about 1.2 million records on average, and we expect close to 10 such runs per day.
Problem:
- Staging table population completes in the expected time, anywhere from 10-15 minutes per run.
- The aggregation process, a set of 15 insert queries, almost never completes. The entire process (population of staging tables plus aggregation) happens in a single commit.
- When I run these 15 queries individually, they complete in no time.
Could you be kind enough to explain how queries behave differently when run in a "single commit" as opposed to running each of them individually?
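For illustration, the two run modes look roughly like the sketch below; every table, column and batch name here is a placeholder, not an actual ETL object:

-- Single-commit approach: staging load and all aggregate inserts in one transaction.
BEGIN;

INSERT INTO staging_usage (account_id, usage_date, amount)
SELECT account_id, usage_date, amount
FROM transaction_log
WHERE batch_id = 42;

INSERT INTO agg_daily_usage (usage_date, total_amount)
SELECT usage_date, sum(amount)
FROM staging_usage
GROUP BY usage_date;

-- ... the remaining 14 aggregate inserts ...

COMMIT;

-- Individual-commit approach: in autocommit mode each statement is its own
-- transaction, so its locks and resources are released as soon as it finishes.
INSERT INTO agg_daily_usage (usage_date, total_amount)
SELECT usage_date, sum(amount)
FROM staging_usage
GROUP BY usage_date;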
----
The system is CentOS 5.5, Postgres installed is 9.1.2.
There are three Postgres servers installed, listening on ports 5432, 5433 and 5434. The machine has 16GB of RAM and three NetApp storage boxes of 500GB each mounted.
postgresql.conf remains untouched, except for logging, which has been set to "all".
----
Any info greatly appreciated.
Thanks in advance,
Akash.
Re: Delay in completion of aggregation inserts when run in a single commit - PG 9.1.2
On 07/05/2012 02:30 AM, Akash Kodibail wrote:
> - Staging table population completes in the expected time, anywhere from 10-15 minutes per run.
>
> - The aggregation process, a set of 15 insert queries, almost never completes. The entire process (population of staging tables plus aggregation) happens in a single commit.

When you say "never completes" - can you be more specific? Is the query not progressing - say, waiting on a lock? Is it doing work, but performing very poorly? Does it terminate with a failure? If so, what is the error message?

If the query just keeps running but you're not sure what it is or isn't doing, you should examine pg_stat_activity and pg_locks. From pg_stat_activity you can get the backend process ID; examine that backend using system tools (ps, top, etc.) to see if it's using CPU and whether it's doing disk I/O. From pg_locks you can get an idea of whether a query is stuck waiting on a lock.

> Could you be kind enough to explain how queries behave differently when run in a "single commit" as opposed to running each of them individually?

There shouldn't be tons of difference. The first queries continue to hold locks while the others run, but since you're in the same transaction that won't affect subsequent queries, only concurrent ones. If you have other concurrent work going on, it's possible you're running into locking issues that you don't encounter when you release locks with individual commits.

Also, some resources can only be released when a transaction commits. If you're doing something like using huge numbers of savepoints or PL/PgSQL BEGIN ... EXCEPTION blocks, it's possible you're running into resource issues. The SAVEPOINT code has improved massively since 8.3, when I had similar issues, but savepoints still aren't free.

Start by looking at the locking situation, and by determining whether your query is making slow forward progress or is simply waiting.

> postgresql.conf remains untouched, except for logging, which has been set to "all".

That's going to be VERY sub-optimal if you're bulk-loading data. You'll need to push checkpoint_segments right up. Check your logs; you're probably seeing warnings about checkpoints being too frequent. Hell, my checkpoint_segments is at 50 on my *laptop* - albeit a very fast SSD-equipped laptop that does lots of bulk-data unit testing.

You'll want to push effective_cache_size up to 4 or 5 GB given you have three instances. It's hard to say regarding shared_buffers; I have little experience with running multiple instances on one machine.

--
Craig Ringer
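A minimal way to do those checks, using the 9.1 catalog column names (procpid / current_query / waiting), might be:

-- What each backend is doing, and whether it is blocked waiting on a lock
SELECT procpid, waiting, query_start, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY query_start;

-- Lock requests that have been made but not yet granted
SELECT locktype, relation::regclass AS relation, mode, pid
FROM pg_locks
WHERE NOT granted;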
Re: Delay in completion of aggregation inserts when run in a single commit - PG 9.1.2
Akash Kodibail wrote:
> Problem:
[...]
> - The aggregation process, a set of 15 insert queries, almost never completes. The entire process (population of staging tables plus aggregation) happens in a single commit.
>
> - When I run these 15 queries individually, they complete in no time.
>
> Could you be kind enough to explain how queries behave differently when run in a "single commit" as opposed to running each of them individually?

Apart from minor differences (current_timestamp, ...) the main difference is that there is no COMMIT after each query. COMMIT is expensive since it requires a write to hard disk.

Try to get the execution plans of the queries in both cases and see if you can spot a difference. The auto_explain contrib module might come in handy for that.

Yours,
Laurenz Albe
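As a rough sketch of how auto_explain could be used here (the parameter values are examples only, and changing them requires superuser):

-- In the session that runs the load, or via postgresql.conf:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '30s';  -- log plans of statements slower than 30 seconds
SET auto_explain.log_analyze = off;         -- 'on' adds actual row counts but also run-time overhead

The logged plans from the single-commit run can then be compared with the plans from the standalone runs.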
Re: Delay in completion of aggregation inserts when run in a single commit - PG 9.1.2
On 07/05/2012 04:00 PM, Albe Laurenz wrote:
> Akash Kodibail wrote:
>> Problem:
> [...]
>> - The aggregation process, a set of 15 insert queries, almost never completes. The entire process (population of staging tables plus aggregation) happens in a single commit.
>>
>> - When I run these 15 queries individually, they complete in no time.
>>
>> Could you be kind enough to explain how queries behave differently when run in a "single commit" as opposed to running each of them individually?
>
> Apart from minor differences (current_timestamp, ...) the main difference is that there is no COMMIT after each query. COMMIT is expensive since it requires a write to hard disk.
>
> Try to get the execution plans of the queries in both cases and see if you can spot a difference.

Actually, that's a thought. Autovacuum can't see uncommitted work, and thus can't analyze it. Maybe they need an explicit ANALYZE or two after a bulk insert or update during their data load.

--
Craig Ringer
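A sketch of what that could look like inside the existing single transaction; the table names are placeholders:

BEGIN;

-- bulk-load a staging table
INSERT INTO staging_usage (account_id, usage_date, amount)
SELECT account_id, usage_date, amount
FROM transaction_log
WHERE batch_id = 42;

-- ANALYZE runs fine inside a transaction block (unlike VACUUM) and sees the
-- rows just inserted, so the later aggregate inserts in this transaction plan
-- against fresh statistics that autovacuum cannot provide for uncommitted data.
ANALYZE staging_usage;

-- ... the 15 aggregate inserts follow here ...

COMMIT;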
@Craig

I could only wait for 6 hours; I cancelled the process after that. There were no locks on any of the tables while it was running.

Yes, it was progressing: of the 15 tables, when I cancelled after 6 hours, the 3rd table was being populated (as seen in pg_stat_activity).

Below are some changes I have made in postgresql.conf, some based on your suggestions:

1) constraint_exclusion = partition (which I had already set), supported by a trigger on every insert with around 100 checks for each table (date constraints).
2) log_destination = 'stderr'
3) logging_collector = on
4) autovacuum = off; we are doing a daily VACUUM ANALYZE on all tables.
5) shared_buffers = 32MB
6) max_prepared_transactions = 100
7) work_mem = 16MB
8) maintenance_work_mem = 64MB
9) wal_writer_delay = 1000ms
10) checkpoint_segments = 16

After these changes the process is moving forward, but I feel it is only a matter of time before I reach the next choking point, since process completion is taking a little longer each time I run it.

Another thing I noticed: when I run the query

select relname, age(relfrozenxid) from pg_class order by age(relfrozenxid) desc;

                     relname                      |    age
--------------------------------------------------+------------
 pg_toast_28344_index                             | 2147483647
 pg_toast_28351_index                             | 2147483647
 pg_toast_33106_index                             | 2147483647
 pg_toast_33099_index                             | 2147483647
 pg_toast_32128_index                             | 2147483647
 pg_toast_28232_index                             | 2147483647
 pg_toast_33092_index                             | 2147483647
 promo_seq                                        | 2147483647
 pg_toast_33085_index                             | 2147483647
 pg_toast_32135_index                             | 2147483647
 pg_toast_33120_index                             | 2147483647
 ...

There are many more like these.

Regards,
Akash.
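A side note on that query: indexes and sequences do not carry a relfrozenxid of their own, so age(relfrozenxid) reports the maximum value (2147483647) for them. Restricting the check to plain and TOAST tables gives a more meaningful picture, for example:

SELECT relname, relkind, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind IN ('r', 't')   -- only relations that actually have a relfrozenxid
ORDER BY age(relfrozenxid) DESC
LIMIT 20;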
> @Craig
> I could only wait for 6 hours; I cancelled the process after that. There were no locks on any of the tables while it was running.

That's a very ambiguous statement. I'm assuming you are saying "our code does not take any explicit locks on those tables using LOCK TABLE or SELECT ... FOR SHARE / FOR UPDATE" - because there most certainly are locks on those tables whenever you're running a query against them. Even a simple SELECT takes out a lock to prevent the table from being dropped while the query runs.
It could well be, though, that no transactions other than the one doing the run are taking locks on those tables.
> Yes, it was progressing: of the 15 tables, when I cancelled after 6 hours, the 3rd table was being populated (as seen in pg_stat_activity).

OK. If you turn auto_explain on and use it to get a plan for the slow queries, do those plans differ from the plans produced when running the same queries standalone?
Did you examine iostat? Did you look at what work the postgres process was doing to see if it was busy with cpu or I/O (iowait) or if it was idle waiting for something else to happen?
> 4) autovacuum = off; we are doing a daily VACUUM ANALYZE on all tables.

No!
Unless your database has no write activity other than these batch loads, you should have autovacuum on. The more frequently autovacuum runs, the better a job it will do of preventing bloat, maintaining table stats, etc.
If you have tables that are only ever modified by a bulk-load script that then immediately vacuums them, set the per-table autovacuum parameters so those tables are excluded; don't turn autovacuum off entirely.
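A minimal sketch of that per-table approach (the table name is a placeholder):

-- Re-enable autovacuum globally (autovacuum = on in postgresql.conf),
-- then disable it only for the bulk-loaded tables:
ALTER TABLE staging_usage SET (autovacuum_enabled = false,
                               toast.autovacuum_enabled = false);
-- The load script stays responsible for VACUUM ANALYZE on those tables,
-- while autovacuum keeps maintaining everything else.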
> 5) shared_buffers = 32MB

That's tiny and will severely constrain Pg's resources.
I notice you didn't set effective_cache_size either.
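Pulling those suggestions together, the relevant postgresql.conf entries for this machine might look something like the lines below; the shared_buffers figure is only a guess, since the 16GB of RAM has to be shared by three instances:

shared_buffers = 1GB          # 32MB is far too small; the right value depends on the other instances
effective_cache_size = 4GB    # per the 4-5GB suggestion above
checkpoint_segments = 50      # well above the 9.1 default of 3, for bulk loads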
> After these changes the process is moving forward, but I feel it is only a matter of time before I reach the next choking point, since process completion is taking a little longer each time I run it.

Well, your tables and indexes are probably bloating horribly because autovacuum is turned off, so that's not surprising.
If you drop and recreate, or TRUNCATE, the tables between load runs you might be OK with autovac off for those tables, but what you're describing makes me think otherwise.
--
Craig Ringer
Hi,

In my Postgres log this message is occurring:

1 4ffad9f4.67db 2012-07-09 10:17:40 BRT LOG: connection received: host=10.20.0.1 port=2368
2 4ffad9f4.67db 2012-07-09 10:17:40 BRT LOG: connection authorized: user=usu_cad database=prd_cad
3 4ffad9f4.67db 2012-07-09 10:17:42 BRT LOG: could not send data to client: Connection reset by peer
4 4ffad9f4.67db 2012-07-09 10:17:42 BRT LOG: unexpected EOF on client connection

Users can not run the system.
If this is a Windows system with a 3rd party software firewall (Windows Firewall is OK, anything else isn't) then remove or disable the firewall and see if the problem goes away.
Otherwise, more info please. PostgreSQL version, OS and version, client program, whether the client program is using psqlODBC, pgJDBC, libpq directly, etc.
--
Craig Ringer