Re: robots.txt on git.postgresql.org
From | Dave Page
---|---
Subject | Re: robots.txt on git.postgresql.org
Date | |
Msg-id | CA+OCxoyOiOLbk8PM_HJCfnNj=uxgmOYz+cA4s40CUm9vWYSOeA@mail.gmail.com
In response to | Re: robots.txt on git.postgresql.org (Craig Ringer <craig@2ndquadrant.com>)
List | pgsql-hackers
On Wed, Jul 10, 2013 at 9:25 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 07/09/2013 11:30 PM, Andres Freund wrote:
>> On 2013-07-09 16:24:42 +0100, Greg Stark wrote:
>>> I note that git.postgresql.org's robot.txt refuses permission to crawl
>>> the git repository:
>>>
>>> http://git.postgresql.org/robots.txt
>>>
>>> User-agent: *
>>> Disallow: /
>>>
>>> I'm curious what motivates this. It's certainly useful to be able to
>>> search for commits.
>>
>> Gitweb is horribly slow. I don't think anybody with a bigger git repo
>> using gitweb can afford to let all the crawlers go through it.
>
> Wouldn't whacking a reverse proxy in front be a pretty reasonable
> option? There's a disk space cost, but using Apache's mod_proxy or
> similar would do quite nicely.

It's already sitting behind Varnish, but the vast majority of pages on
that site would only ever be hit by crawlers anyway, so I doubt that'd
help a great deal, as those pages would likely expire from the cache
before it really saved us anything.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
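For reference, a middle-ground robots.txt along the lines raised in the thread might look something like the sketch below. It is only an illustration, not what git.postgresql.org serves: the Allow directive, `*` wildcards, and Crawl-delay are extensions honored by major crawlers rather than part of the original robots.txt standard, and the URL patterns assume gitweb's usual `?p=...;a=...` action parameters, with blame, snapshot, and search as the heaviest views.

    User-agent: *
    # Hypothetical middle ground: let crawlers index commit and summary
    # pages while keeping them away from the most expensive gitweb actions.
    # Wildcard matching is a non-standard extension, so crawlers that only
    # do prefix matching will still treat everything as "Disallow: /".
    Allow: /*a=commit
    Allow: /*a=summary
    Disallow: /*a=blame
    Disallow: /*a=snapshot
    Disallow: /*a=search
    Disallow: /
    # Non-standard; honored by some crawlers (e.g. Bing, Yandex) but not Google.
    Crawl-delay: 10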