Thread: Mirror.php performance

Mirror.php performance

From
"Dave Page"
Date:
Well, mirror.php's performance is *far* better than it was, though there
is clearly still room for improvement. However, something is not right:
there are over 7000 docs pages, counting both static and interactive, but
the run below saved only 1027:

Nov 04 08:53:38 mirror [info] Mirroring started
Nov 04 08:57:09 mirror [error] HTTP error 404 at page http://wwwdevel.postgresql.org/images/editorschoice2003.jpg
Nov 04 09:01:46 mirror [error] HTTP error 404 at page http://wwwdevel.postgresql.org/presskit/en/presskit74.html
Nov 04 09:02:47 mirror [error] HTTP error 404 at page http://wwwdevel.postgresql.org/pgsql-bugs@postgresql.org
Nov 04 09:16:04 mirror [info] Mirroring finished. 1027 page(s) saved, 1346 second(s) spent

It appears to have saved everything in the root directory afaict, and
the 7.4 static docs, but nothing else.

Any ideas?

Regards, Dave.

Re: Mirror.php performance

From
Alexey Borzov
Date:
Hi,

Dave Page wrote:
> Nov 04 09:16:04 mirror [info] Mirroring finished. 1027 page(s) saved,
> 1346 second(s) spent
>
> It appears to have saved everything in the root directory afaict, and
> the 7.4 static docs, but nothing else.
>
> Any ideas?

Ouch. It did the same for me, will look into this: seems as if some
links are dropped / not followed.

Re: Mirror.php performance

From
Alexey Borzov
Date:
Hi,

Alexey Borzov wrote:
>> Nov 04 09:16:04 mirror [info] Mirroring finished. 1027 page(s) saved,
>> 1346 second(s) spent
>>
>> It appears to have saved everything in the root directory afaict, and
>> the 7.4 static docs, but nothing else.
>>
>> Any ideas?
>
> Ouch. It did the same for me, will look into this: seems as if some
> links are dropped / not followed.

Fixed. Turned out the regexes to extract links from pages were broken
and some of the links (including the main menu, unfortunately) were thus
not crawled.
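
The actual regexes are not shown in this thread. As a purely illustrative sketch (in Python rather than PHP, with hypothetical names and made-up sample markup), this is one way an overly strict link pattern can silently skip links, compared with a parser-based extractor:

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup; not taken from the real site.
SAMPLE_HTML = """
<a href="/docs/7.4/static/index.html">Docs</a>
<a class="menu" href='/about/'>About</a>
<A HREF="/download/">Download</A>
"""

# An overly strict pattern (assumed for illustration): it only matches a
# lowercase tag whose first attribute is a double-quoted href.
STRICT_LINK_RE = re.compile(r'<a href="([^"]+)"')


def extract_links_strict(html):
    """Return hrefs found by the strict regex; misses many valid links."""
    return STRICT_LINK_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links_parser(html):
    """Return hrefs found by the HTML parser, regardless of quoting or case."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

On the sample above, the strict pattern finds only the first link, while the parser-based version finds all three. A crawler that relies on the strict pattern would never reach pages linked only from the other two anchors, which matches the symptom described here.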


Re: Mirror.php performance

From
"Dave Page"
Date:

> -----Original Message-----
> From: Alexey Borzov [mailto:borz_off@cs.msu.su]
> Sent: 04 November 2004 13:03
> To: Dave Page
> Cc: pgsql-www@postgresql.org
> Subject: Re: [pgsql-www] Mirror.php performance
>
> Hi,
>
> Alexey Borzov wrote:
> >> Nov 04 09:16:04 mirror [info] Mirroring finished. 1027 page(s) saved,
> >> 1346 second(s) spent
> >>
> >> It appears to have saved everything in the root directory afaict, and
> >> the 7.4 static docs, but nothing else.
> >>
> >> Any ideas?
> >
> > Ouch. It did the same for me, will look into this: seems as if some
> > links are dropped / not followed.
>
> Fixed. Turned out the regexes to extract links from pages
> were broken and some of the links (including the main menu,
> unfortunately) were thus not crawled.

Thanks, I'll give it a try.

Regards, Dave.