Crawling Tab - NO RUN button
« on: June 19, 2010, 03:04:33 AM »
I have a crawl_dump.log that is rather big (579MB). Everything was going great: it had 1,480,000 pages indexed and had 200K more... I checked back in the morning after 2 weeks of crawling and it was not running anymore, and all I get when I click the Crawling tab is

Run in background
Do not interrupt the script even after closing the browser window until the crawling is complete

There is no option to run the existing script. I have plenty of execution time and memory in php.ini.

How can I fix this?

Also, I tried to run it via SSH and this is what I get...

php....  /generator/runcrawl.php

<html>
<head>
<title>XML Sitemaps - Generation</title>
<meta http-equiv="Content-type" content="text/html;charset=iso-8859-15" />
<link rel=stylesheet type="text/css" href="pages/style.css">
</head>
<body>


PLEASE HELP!!!!

Thanks in advance
Re: Crawling Tab - NO RUN button
« Reply #1 on: June 19, 2010, 06:23:41 AM »
In addition, here are the final few lines of my crawl_dump.log:

  array (
  ),
  'nt' => 79,
  'tsize' => 49232311,
  'pn' => 1523380,
  'links_level' => 3,
  'ctime' => 636.72208,
  'time' => 1276853058,
)

Re: Crawling Tab - NO RUN button
« Reply #2 on: June 19, 2010, 10:10:41 AM »
Hello,

If there is no Run button, memory_limit should be increased in the PHP configuration, since the generator is unable to read the large dump file.
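
As a rough sketch of that change, the setting in question is the memory_limit directive in php.ini. The value below is only an assumption for illustration; it is not a figure from the generator's documentation, and the right number depends on what the server can spare and on how large crawl_dump.log has grown.

```ini
; php.ini -- illustrative value only, not a recommendation from the generator's docs.
; The dump file in this thread is 579MB, so the limit needs to sit well above
; whatever PHP requires to load it; adjust to what the server can spare.
memory_limit = 1024M
```

Note that the web server and the command-line PHP often load different php.ini files, so a limit raised in one may not apply to the other. Running `php --ini` over SSH lists the files the command-line version reads, and the web server usually needs a restart before a changed value takes effect.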

With a website of this size, the best option is to create a limited sitemap, with the "Maximum depth" or "Maximum URLs" option set so that it gathers about 200,000-300,000 URLs. These would be the main pages, forming a "roadmap" sitemap for search engines.