Обсуждение: postgresql-server-8.4.4-1PGDG.el5 - ERROR: could not write block 503414 of temporary file: No space le,ft on device
System info: Lancelot 1874-T dual Intel Xeon(Nehalem-DP) SATAII 24GB RAM, CentOS release 5.4 (Final), postgresql-server-8.4.4-1PGDG hi, We installed a fresh version of postgresql-server-8.4.4-1 using RPMs from PGDG. We are seeing the following errors, if the system trys to do 'anything' sort of update, insert, etc.: > 2010-07-26 11:06:59 MST LOG: database system was shut down at 2010-07-26 10:51:41 MST > 2010-07-26 11:06:59 MST LOG: autovacuum launcher started > 2010-07-26 11:06:59 MST LOG: database system is ready to accept connections > 2010-07-26 11:08:18 MST science_admin metadataFATAL: password authentication failed for user "science_admin" > 2010-07-26 11:09:33 MST postgres postgresFATAL: password authentication failed for user "postgres" > 2010-07-26 11:09:55 MST arcsoft arcsoftFATAL: password authentication failed for user "arcsoft" > 2010-07-26 11:10:00 MST arcsoft arcsoftFATAL: database "arcsoft" does not exist > 2010-07-26 11:10:43 MST arcsoft postgresLOG: statement: ALTER USER postgres WITH PASSWORD 'nsa_core'; > 2010-07-26 11:11:06 MST postgres postgresLOG: statement: ALTER USER operations WITH PASSWORD 'Mosaic_DHS'; > 2010-07-26 11:11:15 MST postgres postgresLOG: statement: ALTER USER pipeline WITH PASSWORD 'Mosaic_DHS'; > 2010-07-26 11:11:23 MST postgres postgresLOG: statement: ALTER USER science_admin WITH PASSWORD 'nsa_devel'; > 2010-07-26 11:11:31 MST postgres postgresLOG: statement: ALTER USER security_admin WITH PASSWORD 'nsa_devel'; > 2010-07-26 11:11:48 MST postgres postgresLOG: statement: ALTER USER voi WITH PASSWORD 'nsa_devel'; > 2010-07-26 11:15:24 MST system_admin metadataFATAL: password authentication failed for user "system_admin" > 2010-07-26 11:19:36 MST postgres postgresLOG: statement: ALTER USER system_admin WITH PASSWORD 'nsa_core'; > 2010-07-26 11:21:03 MST postgres postgresLOG: statement: ALTER USER system_admin WITH PASSWORD 'nsa_devel'; > 2010-07-26 11:21:19 MST system_admin system_adminFATAL: database "system_admin" does not exist > 2010-07-26 11:33:33 MST system_admin metadataERROR: could not write block 503414 of temporary file: No space le > ft on device > 2010-07-26 11:33:33 MST system_admin metadataHINT: Perhaps out of disk space? The postgresql is installed on /var/lib/pgsql, but it has plenty of space:: > -bash-3.2$ df -h > Filesystem Size Used Avail Use% Mounted on > /dev/sda3 16G 13G 1.7G 89% / > /dev/sda8 635G 979M 601G 1% /data0 > /dev/sda7 2.0G 241M 1.7G 13% /home > /dev/sda5 7.8G 1.5G 5.9G 20% /u1 > /dev/sda2 16G 184M 15G 2% /var/log > /dev/sda1 487M 22M 440M 5% /boot > tmpfs 12G 0 12G 0% /dev/shm > /dev/md0 2.7T 202M 2.6T 1% /data1 The pg_stat_tmp is hardly growing:: > -bash-3.2$ date > Mon Jul 26 14:37:38 MST 2010 > -bash-3.2$ pwd > /var/lib/pgsql/data > -bash-3.2$ du -ks pg_stat_tmp/* > 56 pg_stat_tmp/pgstat.stat > -bash-3.2$ du -ks pg_stat_tmp/* > 56 pg_stat_tmp/pgstat.stat In summary: 1) installed new version of postgresql-server-8.4.4-1. 2) restored a binary dump of a database from another system running same version. The restore succeeded without error. 3) started using the system We saw this same symptom in an CentOS5.2 machine, but we thought it had to do with the version of CentOS, so we repeated the steps to a system with more resources and running CentOS5.4. We have postgresql-server-8.2.14-1PGDG.rhel5 on other CentOS5.3 systems and we do not see this error. I read this post: > http://archives.postgresql.org/pgsql-performance/2010-04/msg00192.php I understand that autovacuum is enabled by default, I have not modified them: > -bash-3.2$ grep vacuum postgresql.conf > #vacuum_cost_delay = 0ms # 0-100 milliseconds > #vacuum_cost_page_hit = 1 # 0-10000 credits > #vacuum_cost_page_miss = 10 # 0-10000 credits > #vacuum_cost_page_dirty = 20 # 0-10000 credits > #vacuum_cost_limit = 200 # 1-10000 credits > #autovacuum = on # Enable autovacuum subprocess? 'on' > #log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and > #autovacuum_max_workers = 3 # max number of autovacuum subprocesses > #autovacuum_naptime = 1min # time between autovacuum runs > #autovacuum_vacuum_threshold = 50 # min number of row updates before > # vacuum > #autovacuum_analyze_threshold = 50 # min number of row updates before > #autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum > #autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze > #autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum > #autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for > # autovacuum, in milliseconds; > # -1 means use vacuum_cost_delay > #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for > # autovacuum, -1 means use > # vacuum_cost_limit > #vacuum_freeze_min_age = 50000000 > #vacuum_freeze_table_age = 150000000 Your comments/suggestions are appreciated. --irene --------------------------------------------------------------------- Irene Barg Email: ibarg@noao.edu NOAO/AURA Inc. http://www.noao.edu/noao/staff/ibarg 950 N. Cherry Ave. Voice: 520-318-8273 Tucson, AZ 85726 USA FAX: 520-318-8360 ---------------------------------------------------------------------
On Mon, Jul 26, 2010 at 16:01, Irene Barg <ibarg@noao.edu> wrote: >> 2010-07-26 11:33:33 MST system_admin metadataERROR: could not write block >> 503414 of temporary file: No space le >> ft on device >> 2010-07-26 11:33:33 MST system_admin metadataHINT: Perhaps out of disk >> space? >> -bash-3.2$ df -h >> Filesystem Size Used Avail Use% Mounted on >> /dev/sda3 16G 13G 1.7G 89% / >> /dev/sda8 635G 979M 601G 1% /data0 >> /dev/sda7 2.0G 241M 1.7G 13% /home >> /dev/sda5 7.8G 1.5G 5.9G 20% /u1 >> /dev/sda2 16G 184M 15G 2% /var/log >> /dev/sda1 487M 22M 440M 5% /boot >> tmpfs 12G 0 12G 0% /dev/shm >> /dev/md0 2.7T 202M 2.6T 1% /data1 > > > The pg_stat_tmp is hardly growing:: > >> -bash-3.2$ date >> Mon Jul 26 14:37:38 MST 2010 >> -bash-3.2$ pwd >> /var/lib/pgsql/data Ok so it looks like this is on your / (sda3) partition that only has 1.7G free. Its entirely possible that you have some query that needs to write more than that in tmp files. >> -bash-3.2$ du -ks pg_stat_tmp/* No surprise that is not growing, thats not for temp files, its for the stats collector. I would try turning on log_temp_files, that might help pin point the problem query.
Alex Hunsaker <badalex@gmail.com> writes: > On Mon, Jul 26, 2010 at 16:01, Irene Barg <ibarg@noao.edu> wrote: > 2010-07-26 11:33:33 MST system_admin metadataERROR: could not write block > 503414 of temporary file: No space le > ft on device > 2010-07-26 11:33:33 MST system_admin metadataHINT: Perhaps out of disk > space? > Ok so it looks like this is on your / (sda3) partition that only has > 1.7G free. Its entirely possible that you have some query that needs > to write more than that in tmp files. "Block 503414" is a red flag here --- that means that this temp file alone required 503414 * 8K = nearly 4Gb. I suspect /var/lib/pgsql has been symlinked someplace else, as the root device has clearly not got anything close to that much free space. It's quite possible that the query involved multiple temp files, though, so maybe one of the other partitions is getting filled. regards, tom lane
Alex, It's funny how one cannot see what is right in front of them. I did NOT notice that 'var' was mounted as '/var/log', so postgres was getting installed in root '/'. So I moved pgsql to /data1, and created a symlink in /var/lib/pgsql to /data1/pgsql. Now I have a pgsql_tmp file that has grown to 1.9Tb! > -bash-3.2$ du -ks * > 5556 1 > 5364 11563 > 5436 11564 > 5529348 28523 > 1962946784 pgsql_tmp I'm afraid I don't understand how turning on 'log_temp_files' will help? Right now it is set to '-1' which should mean it's disabled, yet it is creating 1GB files and NOT cleaning them up: > -bash-3.2$ du -hs pgsql_tmp/* > 15M pgsql_tmp/pgsql_tmp22192.3070 > 1.1G pgsql_tmp/pgsql_tmp22192.4916 > 168M pgsql_tmp/pgsql_tmp22192.4918 > 1.1G pgsql_tmp/pgsql_tmp22192.4919 .... Thanks you. --irene Alex Hunsaker wrote: > On Mon, Jul 26, 2010 at 16:01, Irene Barg <ibarg@noao.edu> wrote: >>> 2010-07-26 11:33:33 MST system_admin metadataERROR: could not write block >>> 503414 of temporary file: No space le >>> ft on device >>> 2010-07-26 11:33:33 MST system_admin metadataHINT: Perhaps out of disk >>> space? > >>> -bash-3.2$ df -h >>> Filesystem Size Used Avail Use% Mounted on >>> /dev/sda3 16G 13G 1.7G 89% / >>> /dev/sda8 635G 979M 601G 1% /data0 >>> /dev/sda7 2.0G 241M 1.7G 13% /home >>> /dev/sda5 7.8G 1.5G 5.9G 20% /u1 >>> /dev/sda2 16G 184M 15G 2% /var/log >>> /dev/sda1 487M 22M 440M 5% /boot >>> tmpfs 12G 0 12G 0% /dev/shm >>> /dev/md0 2.7T 202M 2.6T 1% /data1 >> >> The pg_stat_tmp is hardly growing:: >> >>> -bash-3.2$ date >>> Mon Jul 26 14:37:38 MST 2010 >>> -bash-3.2$ pwd >>> /var/lib/pgsql/data > > Ok so it looks like this is on your / (sda3) partition that only has > 1.7G free. Its entirely possible that you have some query that needs > to write more than that in tmp files. > >>> -bash-3.2$ du -ks pg_stat_tmp/* > > No surprise that is not growing, thats not for temp files, its for the > stats collector. > > I would try turning on log_temp_files, that might help pin point the > problem query. > -- --------------------------------------------------------------------- Irene Barg Email: ibarg@noao.edu NOAO/AURA Inc. http://www.noao.edu/noao/staff/ibarg 950 N. Cherry Ave. Voice: 520-318-8273 Tucson, AZ 85726 USA FAX: 520-318-8360 ---------------------------------------------------------------------
On Tue, Jul 27, 2010 at 15:58, Irene Barg <ibarg@noao.edu> wrote: > I'm afraid I don't understand how turning on 'log_temp_files' will help? Well, postgres is probably doing exactly what its told. You just need to figure out what that is and why. ( and if it is some kind of bug, we wont be able to do anything about it unless you give us a test case :-) ) The idea behind log_temp_files is that you should be able to see whats causing the creation of all those temp files (and so potentially 'fix' the query/problem). The main problem with log_temp_files is it only logs when the command is done executing. Of course if it really is creating tons of files it should take long enough that you should be able to select * from pg_stat_activity; and pinpoint the problem query that way. If all else fails you might have better luck with turning on log_statements and then lurking through the logs when you start to get lots of tmp files. > Right now it is set to '-1' which should mean it's disabled, yet it is > creating 1GB files and NOT cleaning them up: It wont clean them up until its done using them. If a backend crashes, it looks like it might orphan some tmp files, at least until postgres is restarted.
Alex, Thanks for the explanation. That helps. After looking at some of the pgsql_tmp files, It was clear we needed to increase work_mem. So I switched back to the pgtune suggestions with one change: > -bash-3.2$ diff postgresql.conf postgresql.conf.local | grep "< " > < log_temp_files = 0 # log temporary files equal or larger > < default_statistics_target = 50 # pgtune wizard 2010-07-26 > < maintenance_work_mem = 1GB # pgtune wizard 2010-07-26 > < constraint_exclusion = on # pgtune wizard 2010-07-26 > < checkpoint_completion_target = 0.9 # pgtune wizard 2010-07-26 > < effective_cache_size = 16GB # pgtune wizard 2010-07-26 > < work_mem = 144MB # pgtune wizard 2010-07-26 > < wal_buffers = 8MB # pgtune wizard 2010-07-26 > < checkpoint_segments = 16 # pgtune wizard 2010-07-26 > < #shared_buffers = 5632MB # pgtune wizard 2010-07-26 > < shared_buffers = 2048MB > < max_connections = 80 # pgtune wizard 2010-07-26 The postmaster wouldn't start with shared_buffers = 5632MB, so we reduced it to 2GB. The system's shared memory is: > [dppops@pipen18 ~]$ sudo /sbin/sysctl -a | grep kernel | grep shmmax > kernel.shmmax = 4182863872 We have 24GB RAM total. The system is behaving normal now, it does create intermittant 'pgsql_tmp' files, but removes them after the SQL statement completes. We will monitor the logs and try to make informed tweaks to the postgresql settings as needed. Thanks to you and Tom Lane for your responses. This group is one of the best! --irene barg Alex Hunsaker wrote: > On Tue, Jul 27, 2010 at 15:58, Irene Barg <ibarg@noao.edu> wrote: >> I'm afraid I don't understand how turning on 'log_temp_files' will help? > > Well, postgres is probably doing exactly what its told. You just need > to figure out what that is and why. ( and if it is some kind of bug, > we wont be able to do anything about it unless you give us a test case > :-) ) > > The idea behind log_temp_files is that you should be able to see whats > causing the creation of all those temp files (and so potentially 'fix' > the query/problem). The main problem with log_temp_files is it only > logs when the command is done executing. Of course if it really is > creating tons of files it should take long enough that you should be > able to select * from pg_stat_activity; and pinpoint the problem query > that way. If all else fails you might have better luck with turning > on log_statements and then lurking through the logs when you start to > get lots of tmp files. > >> Right now it is set to '-1' which should mean it's disabled, yet it is >> creating 1GB files and NOT cleaning them up: > > It wont clean them up until its done using them. If a backend > crashes, it looks like it might orphan some tmp files, at least until > postgres is restarted. -- --------------------------------------------------------------------- Irene Barg Email: ibarg@noao.edu NOAO/AURA Inc. http://www.noao.edu/noao/staff/ibarg 950 N. Cherry Ave. Voice: 520-318-8273 Tucson, AZ 85726 USA FAX: 520-318-8360 ---------------------------------------------------------------------
On Wed, Jul 28, 2010 at 15:43, Irene Barg <ibarg@noao.edu> wrote: > Alex, > The postmaster wouldn't start with shared_buffers = 5632MB, so we reduced it > to 2GB. The system's shared memory is: > >> [dppops@pipen18 ~]$ sudo /sbin/sysctl -a | grep kernel | grep shmmax >> kernel.shmmax = 4182863872 You can bump that up so it works with the larger setting (assuming you are running a 64bit OS): http://developer.postgresql.org/pgdocs/postgres/kernel-resources.html > The system is behaving normal now, it does create intermittant 'pgsql_tmp' > files, but removes them after the SQL statement completes. We will monitor > the logs and try to make informed tweaks to the postgresql settings as > needed. If its doing a sort or join a btree index on the appropriate column(s) may help.