> Marc again dropped last time modification header, so it's
> impossible to sort results by date (in general case ) without
> specific parser.
Yes, that is unfortunate, but the code required to make this happen puts
stress on the archives to some degree.
> Also, he changed template for message. These changes cause
> recrawling the whole archive each time and overloading
> archives.postgresql.org More specific search engine could use
> another source of information which messages to crawl, but
> one we use at pgsql.ru is a general search engine and it
> can't get modification date without proper header.
There should be no need to reindex the entire archive because of a
template change: if you honor the embedded
<!--noindex-->..<!--/noindex--> tags, the body text never changes.
Unless, of course, you want to keep an up-to-date cached copy.
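
To illustrate, here is a minimal sketch in Python of how a crawler
might tell a template change from a content change. It is not what the
archives or any particular search engine actually run: the function
names and sample pages are made up, and it assumes the markers appear
exactly as <!--noindex--> and <!--/noindex--> around the template parts.

```python
import hashlib
import re

# Matches everything between the markers, including the markers themselves.
NOINDEX_RE = re.compile(
    r"<!--noindex-->.*?<!--/noindex-->", re.DOTALL | re.IGNORECASE
)


def indexable_text(html: str) -> str:
    """Return the page with all noindex regions removed."""
    return NOINDEX_RE.sub("", html)


def content_fingerprint(html: str) -> str:
    """Hash only the indexable part, with whitespace normalized."""
    text = " ".join(indexable_text(html).split())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical pages: same message, different template around it.
old_page = "<!--noindex--><div>old nav</div><!--/noindex--><pre>Message body</pre>"
new_page = "<!--noindex--><table>new nav</table><!--/noindex-->\n<pre>Message body</pre>"

if content_fingerprint(old_page) == content_fingerprint(new_page):
    print("template changed only, no reindex needed")
```

The page still has to be fetched to compute the fingerprint, so this
only saves the reindexing step, not the request itself.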
>
> I suggest:
>
> 1. Use 3-server architecture (image server, frontend, backend) which
> could be reduced to 2 servers (image+frontend, backend) -
> frontend could be plain apache+mod_accel and serve/cache
> all backends outputs, backend is a modperl or/and php enabled apache.
> 2. return last modification header - be friendly to crawlers
> and browsers
Though an accelerator would only work if the backend returns a
Last-Modified header, this might be worth looking into.
> 3. stop changing message template
>
Template changes are inevitable, they're part of progress :)
... John