Discussion: Having postgresql.org link to cgit instead of gitweb
Hi,

While prepping the website for the PG18 GA, I stumbled on the inability to access parts of commits through the gitweb links, specifically hitting 429 status code errors (this seems to be intermittent). After some briefing on why it's disabled and how this isn't an issue with cgit, I prepped a patch for postgresql.org (the main website) that would update the git.postgresql.org reference to use cgit instead of gitweb.

However, as this could impact some hacker workflows (e.g. the commit search page), I wanted to run this patch by -hackers before committing. Basically, the patch:

* Moves any web links to git.postgresql.org repos to use the cgit interface instead of gitweb (e.g. [1])
* Updates the commit search[2] to use cgit instead of gitweb

Please note that this doesn't impact the availability of gitweb; rather, the main parts of the postgresql.org website will link to cgit first, and people will have a more consistent experience overall (e.g. no 429 errors).

Thoughts?

Thanks,

Jonathan

[1] https://www.postgresql.org/developer/related-projects/
[2] https://www.postgresql.org/developer/coding/
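As a rough illustration of the link change described above, the sketch below translates a gitweb-style URL into a cgit-style one. The URL layouts on both sides, the `GITWEB_TO_CGIT_PAGE` table, and the function name are assumptions made for this example; none of them is taken from the actual pgweb patch.

```python
from urllib.parse import urlsplit

# Assumed mapping from gitweb actions ("a=" parameter) to cgit page names.
# Only commit-oriented pages are covered in this sketch.
GITWEB_TO_CGIT_PAGE = {
    "summary": "",
    "commit": "commit",
    "commitdiff": "commit",
}


def gitweb_to_cgit(url: str) -> str:
    """Translate a gitweb-style URL into an assumed cgit-style URL."""
    parts = urlsplit(url)
    # gitweb separates its query parameters with ';' rather than '&'.
    params = dict(
        item.split("=", 1) for item in parts.query.split(";") if "=" in item
    )
    repo = params["p"]
    page = GITWEB_TO_CGIT_PAGE[params.get("a", "summary")]
    new_url = f"{parts.scheme}://{parts.netloc}/cgit/{repo}/"
    if page:
        new_url += f"{page}/"
    if "h" in params:
        # Assumption: "h" holds a commit hash, which cgit takes as "id".
        new_url += f"?id={params['h']}"
    return new_url


if __name__ == "__main__":
    print(gitweb_to_cgit(
        "https://git.postgresql.org/gitweb/"
        "?p=postgresql.git;a=commitdiff;"
        "h=ed1aad15e09d7d523f4ef413e3c4d410497c8065"
    ))
```

An unknown gitweb action raises `KeyError` here on purpose, so that unmapped link types are noticed rather than silently rewritten.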
On Fri, 19 Sept 2025 at 13:12, Jonathan S. Katz <jkatz@postgresql.org> wrote:
> While prepping the website for the PG18 GA, I stumbled on the inability
> to access parts of commits through the gitweb links, specifically
> hitting 429 status code errors (this seems to be intermittent). After
> some briefing on why it's disabled and how this isn't an issue with
> cgit, I prepped a patch for postgresql.org (the main website) that would
> update the git.postgresql.org reference to use cgit instead of gitweb.
> Please note that this doesn't impact the availability of gitweb, rather
> the main parts of the postgresql.org website will link to cgit first,
> and people will have a more consistent experience overall (e.g. no 429
> errors).

You didn't mention the cause of the specific issues, but it has been mentioned on www lists before, so I don't think the bot traffic is a secret. Have you considered whether switching these links to cgit would just cause the traffic to migrate to cgit over time? If so, would you just be moving the problem from one place to another? I mean, the bots are getting the links from somewhere. I'd imagine release notes and the like to be a popular source of links.

Perhaps someone with more knowledge of the problem than I have can comment on whether the same issue could occur with cgit.

David
On 2025-Sep-19, David Rowley wrote:
> You didn't mention the cause of the specific issues, but it has been
> mentioned on www lists before, so I don't think it's a secret with the
> bot traffic. Have you considered if switching these links to cgit
> wouldn't just cause the traffic to migrate to cgit, over time?

I think this will happen, yes. There are actually two problems here. The first is that the old gitweb program, implemented in Perl, is awfully slow itself. Git itself is fast enough for most things, and I don't think serving its output efficiently, as cgit does, is going to be a performance problem. So for the `blob` objects, which is what this is mostly used for, we should be fine with cgit.

The other problem is `git blame`, which can also be slow with pure git, so if (when) the bots move to running blame with cgit, we'll be in trouble just the same, and we're going to need some gating to prevent it. However, `blame` hasn't been as much of a problem as `blob` has, so we can take this more leisurely.

There are two things we could do. One is to simply restrict `git blame` to authenticated users; this shouldn't be _too_ bad. But if we don't want that, we could put the bot checker javascript tricks in front of `blame`. In fact maybe we could have the best of both worlds: you get the javascript check if you're not authenticated, but nothing if you are. I'm not sure how easy it is to implement this though.

--
Álvaro Herrera PostgreSQL Developer - https://www.EnterpriseDB.com/
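The "best of both worlds" gating idea can be sketched as a small decision function: unauthenticated requests for expensive pages get a bot challenge, everything else is served directly. All names here (`Decision`, `EXPENSIVE_PAGES`, `gate`) and the assumed `/cgit/<repo>/<page>/...` path layout are hypothetical; this is not how the PostgreSQL infrastructure implements anything.

```python
from enum import Enum


class Decision(Enum):
    SERVE = "serve"          # hand the request straight to cgit
    CHALLENGE = "challenge"  # show the javascript bot check first


# Hypothetical set of cgit page names considered expensive enough to gate.
EXPENSIVE_PAGES = {"blame"}


def page_name(path: str) -> str:
    """Extract the page name from an assumed /cgit/<repo>/<page>/... path."""
    segments = [s for s in path.split("/") if s]
    # Assumed layout: segments[0] is "cgit", segments[1] the repository,
    # segments[2] the page; a bare repository path is the summary page.
    return segments[2] if len(segments) > 2 else "summary"


def gate(path: str, authenticated: bool) -> Decision:
    """Challenge only unauthenticated requests for expensive pages."""
    if page_name(path) in EXPENSIVE_PAGES and not authenticated:
        return Decision.CHALLENGE
    return Decision.SERVE
```

Restricting `blame` to authenticated users only, the first option mentioned, would amount to replacing the challenge with a refusal or a login redirect in the same branch.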
On 19.09.25 03:12, Jonathan S. Katz wrote:
> * Moves any web links to git.postgresql.org repos to use the cgit
> interface instead of gitweb (e.g. [1])
> * Update the commit search[2] to use cgit instead of gitweb

If we're doing that -- which seems reasonable -- then perhaps also update the forwarder for the links sent to pgsql-committers, like

https://git.postgresql.org/pg/commitdiff/ed1aad15e09d7d523f4ef413e3c4d410497c8065

This might be related to the second item, not sure.
On 19.09.25 10:22, Álvaro Herrera wrote:
> There are two things we could do. One is to simply restrict `git blame`
> to authenticated users; this shouldn't be _too_ bad. But if we don't
> want that, we could put the bot checker javascript tricks in front of
> `blame`. In fact maybe we could have the best of both worlds: you get
> the javascript check if you're not authenticated, but nothing if you
> are. I'm not sure how easy it is to implement this though.

Or just disable git blame. Who needs to run that through the website?
> On 19 Sep 2025, at 13:05, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 19.09.25 10:22, Álvaro Herrera wrote:
>> There are two things we could do. One is to simply restrict `git blame`
>> to authenticated users; this shouldn't be _too_ bad. But if we don't
>> want that, we could put the bot checker javascript tricks in front of
>> `blame`. In fact maybe we could have the best of both worlds: you get
>> the javascript check if you're not authenticated, but nothing if you
>> are. I'm not sure how easy it is to implement this though.
>
> Or just disable git blame. Who needs to run that through the website?

We could just link to the postgres mirror on GitHub for that.

--
Daniel Gustafsson
On Fri, 19 Sept 2025 at 23:05, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 19.09.25 10:22, Álvaro Herrera wrote:
> > There are two things we could do. One is to simply restrict `git blame`
> > to authenticated users; this shouldn't be _too_ bad. But if we don't
> > want that, we could put the bot checker javascript tricks in front of
> > `blame`. In fact maybe we could have the best of both worlds: you get
> > the javascript check if you're not authenticated, but nothing if you
> > are. I'm not sure how easy it is to implement this though.
>
> Or just disable git blame. Who needs to run that through the website?

I'd vote for getting rid of blame if it could buy us back enough CPU cycles to have diff working again. I personally miss having diff. I found it convenient, when following links from the pgsql-committers list, for seeing what's been changed.

David
On 9/19/25 7:42 AM, David Rowley wrote:
> On Fri, 19 Sept 2025 at 23:05, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 19.09.25 10:22, Álvaro Herrera wrote:
>>> There are two things we could do. One is to simply restrict `git blame`
>>> to authenticated users; this shouldn't be _too_ bad. But if we don't
>>> want that, we could put the bot checker javascript tricks in front of
>>> `blame`. In fact maybe we could have the best of both worlds: you get
>>> the javascript check if you're not authenticated, but nothing if you
>>> are. I'm not sure how easy it is to implement this though.
>>
>> Or just disable git blame. Who needs to run that through the website?
>
> I'd vote for getting rid of the blame if it could buy us back enough
> CPU cycles to have diff working again. I personally miss not having
> diff. I found it convenient when following links to see what's been
> changed from the pgsql-committers list.

With the disclaimer that I'm not the target audience for this work, I've previously used the "git blame" web feature on git.postgresql.org to figure some stuff out, but these days I just use the GitHub one, as Daniel mentioned. I do think the absence of diff is less than ideal; it's definitely something I use fairly frequently, even if I'm not hacking often.

For the website/patch itself (gitweb vs. cgit), again I'm not the target audience, so I'll defer to what you all want; in particular, I want to ensure your lives are easier. However, with the upcoming traffic spike at GA, I do want to ensure that the things we link to are still working, which is what prompted the discussion.

Jonathan
On Thu, Sep 18, 2025 at 9:12 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> While prepping the website for the PG18 GA, I stumbled on the inability
> to access parts of commits through the gitweb links, specifically
> hitting 429 status code errors (this seems to be intermittent). After
> some briefing on why it's disabled and how this isn't an issue with
> cgit, I prepped a patch for postgresql.org (the main website) that would
> update the git.postgresql.org reference to use cgit instead of gitweb.

cgit messes up indentation by showing 8 space tabs (not 4 space tabs) -- that's certainly not ideal. I understand that the same problem was fixed within gitweb by patching the source code.

--
Peter Geoghegan
On 2025-Sep-19, Peter Eisentraut wrote:
> On 19.09.25 03:12, Jonathan S. Katz wrote:
> > * Moves any web links to git.postgresql.org repos to use the cgit
> > interface instead of gitweb (e.g. [1])
> > * Update the commit search[2] to use cgit instead of gitweb
>
> If we're doing that -- which seems reasonable -- then perhaps also update
> the forwarder for the links sent to pgsql-committers, like
>
> https://git.postgresql.org/pg/commitdiff/ed1aad15e09d7d523f4ef413e3c4d410497c8065
>
> This might be related to the second item, not sure.

No, I think Jonathan wasn't thinking of these links when he mentioned that second item. I do have the /pg/commitdiff/ URLs in mind, but that's a pginfra configuration file that needs to be changed. I'll see about changing that as well, because I've been bitten by this problem there too.

BTW regarding Jon's second item, I was again reminded that we have this "backend flowchart" page there,
https://www.postgresql.org/developer/backend/
I think this is a prime example of something that we could do much better by adding one more item to our numerous collection of diagrams in the docbook core docs.

--
Álvaro Herrera 48°01'N 7°57'E - https://www.EnterpriseDB.com/
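What the /pg/commitdiff/ forwarder does can be pictured with the sketch below, which maps a short link to a redirect target on a selectable backend. The actual pginfra configuration is not shown in this thread, so the `TARGETS` templates, the `forward` function, and the choice of a 40-hex-digit match are all assumptions for illustration.

```python
import re
from typing import Optional

# Short links as sent to pgsql-committers: /pg/commitdiff/<40-hex-digit hash>.
SHORT_LINK = re.compile(r"^/pg/commitdiff/(?P<sha>[0-9a-f]{40})$")

# Assumed redirect templates; the real forwarder configuration may differ.
TARGETS = {
    "gitweb": "/gitweb/?p=postgresql.git;a=commitdiff;h={sha}",
    "cgit": "/cgit/postgresql.git/commit/?id={sha}",
}


def forward(path: str, backend: str = "cgit") -> Optional[str]:
    """Return the redirect target for a short link, or None if it is not one."""
    match = SHORT_LINK.match(path)
    if match is None:
        return None
    return TARGETS[backend].format(sha=match.group("sha"))
```

Keeping the backend as a parameter reflects the point made above: switching the short links from gitweb to cgit is a configuration change on the forwarder, independent of the pgweb patch.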
On 9/19/25 10:47 AM, Álvaro Herrera wrote:
> On 2025-Sep-19, Peter Eisentraut wrote:
>
>> On 19.09.25 03:12, Jonathan S. Katz wrote:
>>> * Moves any web links to git.postgresql.org repos to use the cgit
>>> interface instead of gitweb (e.g. [1])
>>> * Update the commit search[2] to use cgit instead of gitweb
>>
>> If we're doing that -- which seems reasonable -- then perhaps also update
>> the forwarder for the links sent to pgsql-committers, like
>>
>> https://git.postgresql.org/pg/commitdiff/ed1aad15e09d7d523f4ef413e3c4d410497c8065
>>
>> This might be related to the second item, not sure.
>
> No, I think Jonathan wasn't thinking of these links when he mentioned
> that second item.

I can confirm that I wasn't thinking about them in the second item. I was thinking about them, though, but was unsure whether they needed to be in this discussion, as they aren't directly in the pgweb scope. But holistically, I guess they are.

> I do have the /pg/commitdiff/ URLs in mind, but
> that's a pginfra configuration file that needs to be changed. I'll
> see about changing that as well, because I've been bitten by this
> problem there too.
>
> BTW regarding Jon's second item, I was again reminded that we have
> this "backend flowchart" page there,
> https://www.postgresql.org/developer/backend/
> I think this is a prime example of something that we could do much
> better by adding one more item to our numerous collection of diagrams in
> the docbook core docs.

And we support images now (and have for a few releases)!

Jonathan
Peter Geoghegan <pg@bowt.ie> writes:
> cgit messes up indentation by showing 8 space tabs (not 4 space tabs)
> -- that's certainly not ideal.

To me that seems like a complete blocker for this proposal, if we can't find a fix.

regards, tom lane
On 9/19/25 12:17 PM, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
>> cgit messes up indentation by showing 8 space tabs (not 4 space tabs)
>> -- that's certainly not ideal.
>
> To me that seems like a complete blocker for this proposal,
> if we can't find a fix.

On a quick read, I believe this is easily settable in the cgit.css file by setting "tab-size" to "4". I did a quick test hacking this inline, and it worked.

Further, it appears we already attempt to do this in a "4space.css" file we serve, but it needs to be edited with the updated cgit HTML/CSS.

Thanks,

Jonathan
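To see why the tab width matters for tab-indented source, the effect of a viewer's tab stop setting (what the CSS `tab-size` property controls in a browser) can be imitated with Python's `str.expandtabs`. This is only an illustration of the rendering difference, not the cgit change itself; the `render` helper and the sample snippet are made up for this example.

```python
def render(source: str, tab_size: int) -> str:
    """Expand tabs the way a viewer with the given tab stop width would."""
    # str.expandtabs resets its column counter at every newline, so it can
    # be applied to a multi-line string directly.
    return source.expandtabs(tab_size)


# A tab-indented fragment in the style of C source, with a tab-aligned comment.
SNIPPET = "if (x)\n\treturn;\t\t/* done */\n"

if __name__ == "__main__":
    for width in (4, 8):
        print(f"--- tab size {width} ---")
        print(render(SNIPPET, width), end="")
```

With a width of 8, every indentation level is twice as deep as the code was formatted for, and tab-aligned trailing comments no longer line up as intended.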
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > On a quick read, I believe this is easily settable in the cgit.css file > by setting "tab-size" to "4". I did a quick test hacking this inline, > and it worked. Cool, thanks for looking into it. regards, tom lane
On 9/19/25 4:14 PM, Tom Lane wrote:
> "Jonathan S. Katz" <jkatz@postgresql.org> writes:
>> On a quick read, I believe this is easily settable in the cgit.css file
>> by setting "tab-size" to "4". I did a quick test hacking this inline,
>> and it worked.
>
> Cool, thanks for looking into it.

I tested this inline, but it is untested as a whole (I don't have access to gitweb, nor do I really want to have access). This is effectively the modification: the second line of the CSS rule.

Jonathan
On 9/19/25 4:54 PM, Jonathan S. Katz wrote:
> On 9/19/25 4:14 PM, Tom Lane wrote:
>> "Jonathan S. Katz" <jkatz@postgresql.org> writes:
>>> On a quick read, I believe this is easily settable in the cgit.css file
>>> by setting "tab-size" to "4". I did a quick test hacking this inline,
>>> and it worked.
>>
>> Cool, thanks for looking into it.
>
> Tested inline, but untested as a whole (as I don't have access to
> gitweb, nor do I really want to have access), but this is effectively
> the modification, the second line of the CSS rule.

If the main concern is the lack of diff, which cgit gives us back, and the main objection is addressed by the tab-size patch (in the previous email)[1], is there any objection to moving forward with updating the URLs after that patch is applied (which I can't do, as I don't have privileges on that server)?

If there are objections, I'm fine to wait until after the release to re-open the discussion.

Jonathan

[1] https://www.postgresql.org/message-id/38cfb119-a150-4899-8879-73e3ace66a6a%40postgresql.org
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > If the main concern is lack of diff - which cgit gives us back, and the > main objection is the tab-size patch (in previous email)[1], is there > any objection to moving forward with updating the URLs after this patch > is applied (which I can't do, as I don't have privileges to that server)? Not here. > If there are objections, I'm fine to wait until after the release to > re-open discussion. My first thought about scheduling was "best not in the middle of the 18.0 release cycle". However, I don't know of any actual connection between gitweb/cgit and the release-making tasks. My second thought was "the point here is to cut server load, and maybe we need that to happen before the anticipated traffic spike on Thursday". There might not be any connection there either, but if there is, agreed to get it done sooner not later. regards, tom lane
On 2025-Sep-22, Tom Lane wrote:
> My first thought about scheduling was "best not in the middle of the
> 18.0 release cycle". However, I don't know of any actual connection
> between gitweb/cgit and the release-making tasks. My second thought
> was "the point here is to cut server load, and maybe we need that to
> happen before the anticipated traffic spike on Thursday". There
> might not be any connection there either, but if there is, agreed
> to get it done sooner not later.

I think the traffic overloads are mostly caused by LLM scrapers, whose traffic, as far as I know, does not correlate with spikes caused by human behavior, or even with those caused by mirroring traffic during a new release or such.

I would rather wait until next week, just in case something breaks.

--
Álvaro Herrera 48°01'N 7°57'E - https://www.EnterpriseDB.com/
"This is what I like so much about PostgreSQL. Most of the surprises are of the "oh wow! That's cool" Not the "oh shit!" kind. :)"
Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
On 9/22/25 11:27 AM, Álvaro Herrera wrote:
> On 2025-Sep-22, Tom Lane wrote:
>
>> My first thought about scheduling was "best not in the middle of the
>> 18.0 release cycle". However, I don't know of any actual connection
>> between gitweb/cgit and the release-making tasks. My second thought
>> was "the point here is to cut server load, and maybe we need that to
>> happen before the anticipated traffic spike on Thursday". There
>> might not be any connection there either, but if there is, agreed
>> to get it done sooner not later.
>
> I think the traffic overloads are mostly caused by LLM scrapers, which
> as far as I know does not correlate with spikes caused by human behavior
> or even those caused by mirroring traffic during a new release or such.
>
> I would rather wait until next week, just in case something breaks.

I'm fine with this approach, for the above reasons. The web patch won't bit shift too much between now and then.

Jonathan