Re: Reducing buildfarm disk usage: remove temp installs when done
From | Tom Lane |
---|---|
Subject | Re: Reducing buildfarm disk usage: remove temp installs when done |
Date | |
Msg-id | 28310.1421645334@sss.pgh.pa.us |
In response to | Re: Reducing buildfarm disk usage: remove temp installs when done (Andrew Dunstan <andrew@dunslane.net>) |
Responses | Re: Reducing buildfarm disk usage: remove temp installs when done |
List | pgsql-hackers |
Andrew Dunstan <andrew@dunslane.net> writes:
> On 01/18/2015 09:20 PM, Tom Lane wrote:
>> What I see on dromedary, which has been around a bit less than a year,
>> is that the at-rest space consumption for all 6 active branches is
>> 2.4G even though a single copy of the git repo is just over 400MB:
>> $ du -hsc pgmirror.git HEAD REL*
>> 416M    pgmirror.git
>> 363M    HEAD
>> 345M    REL9_0_STABLE
>> 351M    REL9_1_STABLE
>> 354M    REL9_2_STABLE
>> 358M    REL9_3_STABLE
>> 274M    REL9_4_STABLE
>> 2.4G    total

> This isn't happening for me. Here's crake:

>     [andrew@emma root]$ du -shc pgmirror.git/ [RH]*/pgsql
>     218M    pgmirror.git/
>     149M    HEAD/pgsql
>     134M    REL9_0_STABLE/pgsql
>     138M    REL9_1_STABLE/pgsql
>     140M    REL9_2_STABLE/pgsql
>     143M    REL9_3_STABLE/pgsql
>     146M    REL9_4_STABLE/pgsql
>     1.1G    total

> Maybe you need some git garbage collection?

Weird ... for me, dromedary and prairiedog are both showing very similar
numbers. Shouldn't GC be automatic? These machines are not running
latest and greatest git (looks like 1.7.3.1 and 1.7.9.6 respectively),
maybe that has something to do with it?

A fresh clone from git://git.postgresql.org/git/postgresql.git right now
is 167MB (using dromedary's git version), so we're both showing some bloat
over the minimum possible repo size, but it's curious that mine is so much
worse.

But the larger point is that git fetch does not, AFAICT, have the same
kind of optimization that git clone does to do hard-linking when copying
an object from a local source repo. With or without GC, the resulting
duplicative storage is going to be the dominant effect after awhile on a
machine tracking a full set of branches.

> An alternative would be to remove the pgsql directory at the end of the
> run and thus do a complete fresh checkout each run. As you say it would
> cost some time but save some space. At least it would be doable as an
> option, not sure I'd want to make it non-optional.

What I was thinking is that a complete-fresh-checkout approach would
remove the need for the copy_source step that happens now, thus buying
back at least most of the I/O cost.

But that's only considering the working tree. The real issue here seems
to be about having duplicative git repos ... seems like we ought to be
able to avoid that.

regards, tom lane
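As a rough check on the figures quoted in the message, the two `du` listings can be summed and compared against the size of the mirror alone. The sketch below is only illustrative: the sizes are copied from the quoted output (already rounded by `du`), while the dictionary names and the `summarize` helper are hypothetical and not part of the buildfarm code. Note also that the two listings do not measure the same thing: dromedary's covers whole branch directories, crake's only the `pgsql` subdirectory of each branch.

```python
# Sizes in MB, copied from the `du` output quoted in the message.
# du rounds each figure, so the sums are approximate.
dromedary = {
    "pgmirror.git": 416,
    "HEAD": 363,
    "REL9_0_STABLE": 345,
    "REL9_1_STABLE": 351,
    "REL9_2_STABLE": 354,
    "REL9_3_STABLE": 358,
    "REL9_4_STABLE": 274,
}

crake = {
    "pgmirror.git": 218,
    "HEAD": 149,
    "REL9_0_STABLE": 134,
    "REL9_1_STABLE": 138,
    "REL9_2_STABLE": 140,
    "REL9_3_STABLE": 143,
    "REL9_4_STABLE": 146,
}


def summarize(sizes, mirror_key="pgmirror.git"):
    """Return (total MB, MB used by branch directories, total / mirror size)."""
    total = sum(sizes.values())
    mirror = sizes[mirror_key]
    branches = total - mirror
    return total, branches, total / mirror


if __name__ == "__main__":
    for name, sizes in (("dromedary", dromedary), ("crake", crake)):
        total, branches, ratio = summarize(sizes)
        print(f"{name}: total {total} MB, branches {branches} MB, "
              f"{ratio:.1f}x the mirror alone")
```

The listed sizes sum to 2461 MB for dromedary and 1068 MB for crake, roughly in line with the reported 2.4G and 1.1G totals. In both cases the per-branch directories account for several times the size of the single mirror, which is the duplication the message is concerned with.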