Getting Error while crawling
« on: December 13, 2007, 02:56:36 PM »
Hello,
We have configured the following settings:

Starting URL:
[ External links are visible to forum administrators only ]
Save sitemap to:
/home/jcarder/mojopages.com/htdocs/sitemap.xml
Current path to Sitemap generator is: /home/jcarder/mojopages.com/htdocs/generator/
Your Sitemap URL:
[ External links are visible to forum administrators only ]
HTML Sitemap filename:
/home/jcarder/mojopages.com/htdocs/generator/data/sitemap.html

When I save this configuration, it is saved successfully, but when I run the crawler it simply redirects to the configuration page again. When I then try to view the sitemap, nothing is shown.

SITE URL: [ External links are visible to forum administrators only ]

Can you please help me out?
Re: Getting Error while crawling
« Reply #1 on: December 14, 2007, 01:31:28 AM »
Please try to manually define the server's IP address in the config.inc.php file:
'xs_ipconnection'=>'xx.xx.xx.xx'
Re: Getting Error while crawling
« Reply #2 on: December 14, 2007, 06:34:00 AM »
Thanks for your quick reply.

I have tried your suggestion. I manually set 'xs_ipconnection'=>'xx.xx.xx.xx', but the same thing still happens. Please help.

It gives the following error:

[ External links are visible to forum administrators only ] was an error while retrieving the URL specified: [ External links are visible to forum administrators only ]
Error message: Error opening socket to [ External links are visible to forum administrators only ]
HTTP headers follow:

HTTP output:

Thanks.
« Last Edit: December 14, 2007, 06:36:30 AM by joncarder »
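The "Error opening socket" message suggests that the server could not open a TCP connection to the site from the machine where the generator runs. A minimal sketch for checking this independently of the generator, assuming Python is available on the server; the function name `can_open_socket` and the host in the usage comment are placeholders, not part of the generator:

```python
import socket


def can_open_socket(host, port=80, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name and connects in one step
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures
        return False


# Usage on the server (placeholder host name, substitute the site's own):
#   can_open_socket("www.example.com", 80)
```

If this returns False for the site's host name but True for its IP address, name resolution on the server is the likely culprit, which is the case the xs_ipconnection setting is meant to work around.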
Re: Getting Error while crawling
« Reply #4 on: December 17, 2007, 11:23:23 AM »
Now I am facing another problem. When I started crawling, it ran for some time and then stopped. Now when I go to the Crawling tab, it shows two options:
Run in background, and
Resume last session.
Continue the interrupted session (2007-12-14 14:35:50, URLs added: 6240, estimated URLs left: 114073).

I have checked both options and started again and again, but it did not work at all. How can I crawl the complete site? In the data folder, the "crawl_dump.log" file is 43.21 MB.

Please help. Thank you.

« Last Edit: December 17, 2007, 11:28:01 AM by joncarder »
Re: Getting Error while crawling
« Reply #5 on: December 17, 2007, 11:43:01 PM »
Hello,

it looks like your server configuration doesn't allow the script to run long enough to create the full sitemap. Please try increasing the memory_limit and max_execution_time settings in the PHP configuration at your host (php.ini file), or contact hosting support about this.
Re: Getting Error while crawling
« Reply #6 on: December 18, 2007, 08:48:48 AM »
OK, fine. We made the change you suggested (increased the memory limit from 128 MB to 256 MB) and ran the crawling script again. It started to crawl URLs, but after some time, when the script reached depth level 3, it stopped again.

What depth level should we use for crawling? I mean, what exactly does "depth level" mean?

If I stop the script after 2 levels, will it generate a sitemap?
When will the two files below be updated?

1. Text SiteMap
[ External links are visible to forum administrators only ]
2. ROR SiteMap
[ External links are visible to forum administrators only ]

Current status is:
Links depth: 3
Current page: business-profile/clothing/abercrombie-&-fitch/austin/tx/13457977
Pages added to sitemap: 9786
Pages scanned: 9800 (1,500,028.7 Kb)
Pages left: 167715 (+ 12348 queued for the next depth level)
Time passed: 409:21
Time left: 7005:31
Memory usage: 123,295.5 Kb

Awaiting your reply. Thanks.
« Last Edit: December 18, 2007, 09:48:53 AM by joncarder »
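The status figures in the post above can be cross-checked with a little arithmetic. This sketch assumes the "Time passed" and "Time left" values are in minutes:seconds and that the estimate is a simple linear extrapolation (average time per scanned page multiplied by pages left); neither is confirmed in the thread, but the numbers agree under that assumption. The function names are illustrative only.

```python
def parse_min_sec(text):
    """Convert a 'minutes:seconds' string such as '409:21' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def estimate_time_left(elapsed_seconds, pages_scanned, pages_left):
    """Linear extrapolation: average seconds per scanned page times pages left."""
    return elapsed_seconds / pages_scanned * pages_left


elapsed = parse_min_sec("409:21")                    # time passed, in seconds
left = estimate_time_left(elapsed, 9800, 167715)     # figures from the status above

print(f"{elapsed / 9800:.2f} s per page")            # about 2.51 s per page
print(f"{int(left) // 60}:{int(left) % 60:02d}")     # 7005:31, as reported
print(f"{left / 3600:.1f} hours left")               # about 116.8 hours
```

At roughly 2.5 seconds per page, the remaining pages need several days of uninterrupted running, so any execution time limit on the server will cut the crawl short long before it finishes.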
Re: Getting Error while crawling
« Reply #7 on: December 19, 2007, 02:24:50 AM »
Hello,

yes, it will generate the sitemap. The "depth level" option means "the number of clicks required to reach that page starting from the homepage".
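To illustrate that definition, here is a minimal sketch that computes the depth of each page with a breadth-first search over a link graph. The toy site, its URLs and the function name `link_depths` are hypothetical, and this is not the generator's actual code; whether the generator counts the homepage as level 0 or level 1 is not stated in this thread, so level 0 is an assumption here.

```python
from collections import deque


def link_depths(links, start):
    """Return {page: minimum number of clicks from start}, via breadth-first search."""
    depths = {start: 0}          # assumption: the homepage itself is depth 0
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:          # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical toy site: each key links to the pages in its list
site = {
    "/": ["/category", "/about"],
    "/category": ["/category/clothing", "/"],
    "/category/clothing": ["/business-profile/1"],
}

depths = link_depths(site, "/")
# Pages that a crawl limited to depth 2 would include
within_two = sorted(page for page, d in depths.items() if d <= 2)
print(depths)
print(within_two)
```

In this toy graph the business profile page sits at depth 3, so a crawl limited to depth 2 would leave it out of the sitemap.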
Re: Getting Error while crawling
« Reply #8 on: December 27, 2007, 05:10:16 AM »
Hi,
We have started the crawling script at depth level 4 for our site, but the script frequently stops. We have also increased the memory limit to 3 GB, but the problem remains. We have been trying this for the last 3 days.

Currently I am not able to open the following URL to see the status. What is the reason for that? It happened to me yesterday as well.
[ External links are visible to forum administrators only ]

Please look at this issue and try to help me out. Thanks.
Re: Getting Error while crawling
« Reply #9 on: December 27, 2007, 01:20:59 PM »
Hi,
We did run this script for depth level 3. The details of that run are as follows:
Request date: 19 December 2007, 16:22
Processing time: 60801.00s
Pages indexed: 178641
Sitemap files: 5
Pages size: 1,299.33Mb

If you need login info, I will provide it so that you can see the configuration.

Re: Getting Error while crawling
« Reply #10 on: December 27, 2007, 07:43:49 PM »
Hello,

please try to execute the generator from the command line instead of the web interface and check whether you get any error message.
Re: Getting Error while crawling
« Reply #11 on: December 28, 2007, 04:56:53 AM »
Thanks for your reply.

But I am not able to open the following URL in any browser.
[ External links are visible to forum administrators only ]

What is the reason for that? I would like to see the command for running it from the command line.
Re: Getting Error while crawling
« Reply #12 on: December 29, 2007, 12:49:46 AM »
That page is password protected.
The command line is:
/usr/bin/php /path/to/generator/runcrawl.php
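When the crawl stops without a visible message, it helps to keep both the output and the exit code of that command. A small sketch of one way to do this, assuming Python is available on the server; the wrapper name `run_and_report` and the log file name are hypothetical, and the paths are the placeholders from the command above:

```python
import subprocess


def run_and_report(command, log_path):
    """Run command, write its combined stdout/stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Usage (placeholder paths as in the command above, hypothetical log name):
#   code = run_and_report(
#       ["/usr/bin/php", "/path/to/generator/runcrawl.php"], "crawl_cli.log")
#   print("exit code:", code)
# On POSIX systems a negative exit code means the process was terminated by
# the signal with that number, for example by the host killing the process.
```

A non-zero exit code together with the last lines of the log file usually narrows down whether the script hit a PHP limit or was stopped from outside.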
Re: Getting Error while crawling
« Reply #13 on: January 02, 2008, 05:14:57 AM »
You are right. But after providing the username and password, the following URL still does not open, while other links work properly. I can see Configuration, Analyze, etc., but not the Crawling link.

Please look at this issue.
Re: Getting Error while crawling
« Reply #14 on: January 02, 2008, 06:36:42 AM »
You are right. The pages are password protected. But after providing the username and password, the following URL still does not open, while other links work properly. I can see Configuration, Analyze, etc., but not the Crawling link.
[ External links are visible to forum administrators only ]

Please look at this issue.