Cron does NOT start the crawl
« on: January 14, 2008, 03:40:53 PM »
I have cron set up on a remote server to call the runcrawl.php script on the site that has the sitemap generator installed.

It works fine if the command is "wget [ External links are visible to forum administrators only ].*******.co.uk/generator/runcrawl.php"
But I don't want to create a new file containing the progress report on the remote server, so I changed the command to:
"wget --spider [ External links are visible to forum administrators only ].*******.co.uk/generator/runcrawl.php" so that no file is downloaded, but now the crawl does not start even though wget indicates it got an HTTP status of "200 OK".

Why?????
Re: Cron does NOT start the crawl
« Reply #2 on: January 19, 2008, 07:11:34 PM »
I have the same problem too. I run that command and then just numbers start to show up and count.

I don't get the result! There is no setting I can use to, for example, make a new sitemap every day at a given time.
Did I do something wrong somewhere?

Please let me know, thanks.
Re: Cron does NOT start the crawl
« Reply #3 on: January 19, 2008, 08:44:00 PM »
Hello,

The cron job is configured in your hosting control panel, where you define the command line and the scheduled time/date.
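
As a rough illustration of what such a scheduled entry looks like in plain crontab syntax (control panels usually present the same fields as a form): the daily 03:00 schedule below is an arbitrary example, the script path is a placeholder for the real install location, and the variable name is made up for this sketch.

```shell
#!/bin/sh
# Crontab fields: minute hour day-of-month month day-of-week command
# Illustrative schedule: every day at 03:00.
# /path/to/runcrawl.php is a placeholder, not a real path.
CRON_ENTRY='0 3 * * * /usr/bin/php /path/to/runcrawl.php'

# Print the entry so it can be pasted into the control panel or `crontab -e`.
printf '%s\n' "$CRON_ENTRY"
```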
Re: Cron does NOT start the crawl
« Reply #4 on: January 21, 2008, 05:46:37 PM »
I am not running cron on the server that the runcrawl.php script is on. I do not have access to cron on that server, so the "/usr/bin/php /path/to/runcrawl.php" is not applicable to this situation.

The crawl script starts fine when I run wget WITHOUT the "--spider" option. However, I do not want cron to "wait" for the crawl to finish (on this domain it takes almost an hour), so I am using the --spider option, which SHOULD trigger the crawl script to start without waiting for a return. For some reason it does not. I can't really troubleshoot it or look for places where it might be testing HTTP_USER_AGENT, since the script is encoded.
Re: Cron does NOT start the crawl
« Reply #5 on: January 21, 2008, 10:42:10 PM »
The script doesn't test the user agent of incoming requests, so there should be no difference here. It is possible, though, that the server where the generator is installed doesn't allow scripts to keep running in the background when they are started by a web request, so the script only runs while the network connection is still open (regardless of whether the script requests to stay in the background or not).
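
A possible workaround, not confirmed anywhere in this thread: wget's --spider mode normally sends a HEAD request rather than a GET, and PHP ends a script at its first output when the request method is HEAD, which would be consistent with seeing "200 OK" but no crawl. The sketch below sends an ordinary GET and discards the response instead of saving it, and it keeps the connection open for the length of the crawl, in line with the explanation above. The URL is a placeholder (the real one is hidden in the original post), and the function name and the 3900-second timeout are illustrative assumptions.

```shell
#!/bin/sh
# Placeholder URL: substitute the real address of runcrawl.php.
CRAWL_URL="http://example.com/generator/runcrawl.php"

# Start the crawl with a normal GET request and throw the response away.
#   -q                   suppress wget's own progress output
#   -O /dev/null         write the response body to /dev/null, so no file is created
#   --tries=1            do not retry, so a timeout cannot start a second crawl
#   --read-timeout=3900  allow up to 65 idle minutes (assumed value for a ~1 hour crawl)
trigger_crawl() {
    wget -q -O /dev/null --tries=1 --read-timeout=3900 "$1"
}

# Usage, e.g. from the script that cron calls on the remote server:
#   trigger_crawl "$CRAWL_URL"
```

With this approach the cron job does stay connected until the crawl ends, but cron starts each job as a separate process, so other scheduled jobs are not held up by it.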