Обсуждение: best practices for separating data and logs
Hi all, I'm planning a lot of changes for migrating to PostgreSQL 8.3, among them being a better way of separating data and logs (transaction logs, that is). Currently, the OS and log data are on one disk system, and the data (including configs) are on the other disk system. After creating the database cluster, I copy the pg_xlog directory to the OS system and symlink it from the database. So, I'm wondering... - Are there any best practices, or better practices, than symlinking? - How do other people have this set up, or recommend setting this up (e.g. also moving pg_clog or other things as well)? I searched through the archives and found a few threads regarding separating data and WAL, but nothing regarding best practices or specifics. Thanks, Peter P.S. Here are exact details on how I have things set up. [root@mitchell testing-8.2]# pwd /scratch.1/postgres/testing-8.2 [root@mitchell testing-8.2]# ls -l total 48 drwx------ 8 postgres postgres 83 Oct 8 16:57 base drwx------ 2 postgres postgres 4096 Oct 12 05:07 global drwx------ 2 postgres postgres 94 Oct 9 14:28 pg_clog -rw------- 1 postgres postgres 3841 Aug 28 14:16 pg_hba.conf -rw------- 1 postgres postgres 1460 Aug 8 14:06 pg_ident.conf drwx------ 4 postgres postgres 34 Aug 8 14:06 pg_multixact drwx------ 2 postgres postgres 17 Oct 12 00:15 pg_subtrans drwx------ 2 postgres postgres 6 Aug 8 14:06 pg_tblspc drwx------ 2 postgres postgres 6 Aug 8 14:06 pg_twophase -rw------- 1 postgres postgres 4 Aug 8 14:06 PG_VERSION lrwxrwxrwx 1 root root 37 Oct 8 16:18 pg_xlog -> /scratch/postgres/testing-8.2/pg_xlog -rwxr-xr-x 1 postgres postgres 15212 Oct 11 15:05 postgresql.conf -rw------- 1 postgres postgres 96 Oct 11 15:55 postmaster.opts -rw------- 1 postgres postgres 57 Oct 11 15:55 postmaster.pid -rw-r--r-- 1 postgres postgres 1805 Aug 8 14:06 server.crt -rw------- 1 postgres postgres 1675 Aug 8 14:06 server.key [root@mitchell testing-8.2]# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 1019208 434264 532336 45% / /dev/sda8 56394768 348344 53135448 1% /scratch /dev/sda6 1019208 34664 931936 4% /tmp /dev/sda3 4061572 202080 3649844 6% /var /dev/sda7 1019208 701972 264628 73% /var/vice/cache /dev/sda2 10154020 5616776 4013124 59% /usr tmpfs 451556 0 451556 0% /dev/shm /dev/sdb1 78110004 9589792 68520212 13% /scratch.1 AFS 9000000 0 9000000 0% /afs
"Peter Koczan" <pjkoczan@gmail.com> writes: > I'm planning a lot of changes for migrating to PostgreSQL 8.3, among > them being a better way of separating data and logs (transaction logs, > that is). > Currently, the OS and log data are on one disk system, and the data > (including configs) are on the other disk system. After creating the > database cluster, I copy the pg_xlog directory to the OS system and > symlink it from the database. > So, I'm wondering... > - Are there any best practices, or better practices, than symlinking? I believe 8.3's initdb has an explicit option for making pg_xlog be a symlink to someplace. The results aren't different from doing it manually, but it saves a step (and a chance for mistake). > - How do other people have this set up, or recommend setting this up > (e.g. also moving pg_clog or other things as well)? I think consensus is that pg_clog is best treated as part of the data. The point of moving xlog is that the heads on that drive will never have to move away from the current xlog tip; as soon as you put something else on that drive, you ruin the performance benefit. regards, tom lane
tgl@sss.pgh.pa.us (Tom Lane) writes: > "Peter Koczan" <pjkoczan@gmail.com> writes: >> I'm planning a lot of changes for migrating to PostgreSQL 8.3, among >> them being a better way of separating data and logs (transaction logs, >> that is). > >> Currently, the OS and log data are on one disk system, and the data >> (including configs) are on the other disk system. After creating the >> database cluster, I copy the pg_xlog directory to the OS system and >> symlink it from the database. > >> So, I'm wondering... > >> - Are there any best practices, or better practices, than symlinking? > > I believe 8.3's initdb has an explicit option for making pg_xlog be a > symlink to someplace. The results aren't different from doing it > manually, but it saves a step (and a chance for mistake). Yes, indeed. -X, --xlogdir=XLOGDIR location for the transaction log directory I had not been aware of this one; this seems like a nice feature to add support for in an init script... We've been using an init script that offers a whole bunch of options, originally due to Drew Hammond. These days, the actions offered are thus: [start|reload|stop|env|mkdir|initdb|logtail] start/stop/reload are pretty traditional. The other options are pretty interesting, particularly for cases where you might want to: - Frequently create databases from scratch, as when testing CVS HEAD - Alternatively, to help establish common policies, for the "less frequent" cases. env: Sets up PATH, MAN_PATH, PGPORT with the values used by the backend in "this" init file mkdir: Sets up all the directories required both for DB backend and for logging logtail: runs "tail -f" on the last log file for the cluster initdb: Runs initdb, pointing at particular directories, and with particular configuration policy. I have recently augmented this by making it smart enough to rewrite the postgresql.conf file (using sed) to establish default values for a dozen or so options that tend to get customized with fairly common values. In effect, the entire cluster configuration gets set up in about a 10-line section near the top of the script. Adding in an option to redirect pg_xlog seems like a mighty fine idea; I know that on various occasions, I have had the irritation of building clusters and having to go to some fragile manual effort to shift pg_xlog somewhere else. Automating THAT seems like it's a "no-brainer" as far as being an excellent idea... I probably ought to ask around for permission to release this; it seems like it's probably useful enough (I have been using it a lot) that it ought to be in a CVS repository somewhere, rather than languishing on my desktop. -- (format nil "~S@~S" "cbbrowne" "linuxfinances.info") http://linuxdatabases.info/info/linuxxian.html I am not a Church numeral! I am a free variable!
Chris Browne wrote: > - Alternatively, to help establish common policies, for the "less > frequent" cases. > > env: Sets up PATH, MAN_PATH, PGPORT with the values used by > the backend in "this" init file How does this work? I have my own script to do things, and one of the painful bits is setting up the environment. What my script does is emit the var definitions to stdout, so I have to run the script on `` to let the current shell get the definitions. > logtail: runs "tail -f" on the last log file for the cluster Hmm, interesting. Perhaps I oughta implement this. Currently my script starts the server by launching the postmaster in foreground, so there is no need for this -- but I have frequently wished for an option to start it in background, and be able to get at the log separately. I use this script so frequently that I have also implemented bash completion for it. It is truly helpful. (Of course, it deals with multiple installations by getting the one to use as an argument, and autocompletes that as well). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.