On 2019-Aug-29, Magnus Hagander wrote:
> Maybe Google used to load the pages under /list/ and crawl them for links
> but just not include the actual pages in the index or something
>
> I wonder if we can inject these into Google using a sitemap. I think that
> should work -- will need some investigation on exactly how to do it, as
> sitemaps also have individual restrictions on the number of urls per file,
> and we do have quite a few messages.
>
> > Why is that /list/ exclusion there in the first place?
>
> Because there is an essentially infinite number of pages in that space:
> you can pick an arbitrary point in time to view from.

Maybe we can create a new set of pages specifically for crawlers, listing
each email exactly once. Say (unimaginatively) /list_crawlers/2019-08/,
containing links to all messages on all public lists from August 2019.
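
For what it's worth, the per-file restriction Magnus mentions is 50,000 URLs
per sitemap file in the sitemaps.org protocol, with a sitemap index file
tying the pieces together. A minimal sketch of splitting the message URLs
accordingly (the function name, base URL, and file naming here are made up,
not anything in pgweb):

```python
# Sketch: split archive URLs into sitemap files under the sitemaps.org
# 50,000-URLs-per-file limit, plus a sitemap index pointing at them.
# build_sitemaps(), the base URL, and the sitemap-N.xml naming are
# hypothetical, for illustration only.
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # sitemaps.org per-file limit

def build_sitemaps(urls, base="https://www.postgresql.org/sitemaps"):
    """Return (index_xml, [sitemap_xml, ...]) for the given URLs."""
    files = []
    for i in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        chunk = urls[i:i + MAX_URLS_PER_SITEMAP]
        body = "\n".join(
            f"  <url><loc>{escape(u)}</loc></url>" for u in chunk
        )
        files.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n"
            "</urlset>\n"
        )
    index_body = "\n".join(
        f"  <sitemap><loc>{base}/sitemap-{n}.xml</loc></sitemap>"
        for n in range(len(files))
    )
    index = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{index_body}\n"
        "</sitemapindex>\n"
    )
    return index, files
```

The monthly /list_crawlers/ pages could be fed from the same per-month URL
lists, so either route works off the same query.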
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services