Crawl Issues _ HELP!
« on: March 06, 2008, 01:53:02 PM »
Hello,

I am new here.  Yesterday I purchased the sitemap generator and have had some luck with it, but I am still running into issues.

It seems to get caught up in my photo gallery.  If I do a crawl of only a few layers, the crawl completes successfully, but if I try to do 7 or 8 layers, it never finishes, and I don't really see an error explaining why.

I have tried adjusting the maximum run time, maximum page count, etc., but no luck as of yet.

Any ideas would be useful.

Thanks
Re: Crawl Issues _ HELP!
« Reply #1 on: March 06, 2008, 10:39:16 PM »
Hello,

does the sitemap generator stop, or does it keep working/crawling your site?
Re: Crawl Issues _ HELP!
« Reply #2 on: March 07, 2008, 03:09:04 PM »
Sometimes the crawling screen comes back (like when you first start it); other times it just gets stuck at, say, level 6 or layer 7 and doesn't indicate anything.  I have also seen an XML error in the sitemap.xml file, at the very bottom.
Re: Crawl Issues _ HELP!
« Reply #3 on: March 07, 2008, 11:04:33 PM »
So, is the sitemap created successfully? I.e., do you see new entries on the change log page?
Re: Crawl Issues _ HELP!
« Reply #4 on: March 10, 2008, 01:00:06 PM »
No, when I am experiencing problems, I don't see an update to the change log.
Re: Crawl Issues _ HELP!
« Reply #5 on: March 10, 2008, 11:05:48 PM »
Hello,

it looks like your server configuration doesn't allow the script to run long enough to create a full sitemap. Please try increasing the memory_limit and max_execution_time settings in the PHP configuration on your host (the php.ini file), or contact your hosting support about this.
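For illustration only (the right values depend on the site's size and what your host allows), the relevant php.ini lines might look something like this:

```ini
; Illustrative values only - raise or lower to suit your host's limits.
memory_limit = 128M        ; memory available to the crawler script
max_execution_time = 600   ; maximum script run time, in seconds
```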
Re: Crawl Issues _ HELP!
« Reply #6 on: March 14, 2008, 01:24:55 PM »
So I am waiting to hear from my web host, but it appears that I don't have access to my php.ini file.

Here's the deal.  I went through my config and set it as shown in the attachment.

I started the crawl with the run in background option selected.

It updated the status up to level 7, but then sat and sat and sat until it finally gave me the message to restart the interrupted session.

The change log did not update and the request date did not change.

Any suggestions other than the php.ini file?
Re: Crawl Issues _ HELP!
« Reply #7 on: March 14, 2008, 04:30:04 PM »
You can create your own php.ini file, put it in the root of your web site, and make the changes needed there.
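Whether this works depends on how the host runs PHP: a local php.ini in the web root is usually honored when PHP runs as CGI/FastCGI, but generally not under mod_php, where the host has to change the settings for you. A minimal sketch of such a file:

```ini
; Minimal per-directory php.ini (takes effect only if the server
; reads local php.ini files, e.g. PHP running as CGI/FastCGI)
memory_limit = 128M
max_execution_time = 600
```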

My problem is that the program hangs up on level 3 and says it's crawling the same page, then times out and says to start over; so I try to start over and it hangs up again.

Links depth: 3
Current page: products.cfm/action/mfgdisplay/start/81/display/10/CategoryName/Control-Circuit-and-Protection/mfgname/Federal-Pacific-Electric-(FPE)/PageNum/9
Pages added to sitemap: 698
Pages scanned: 720 (20,601.2 Kb)
Pages left: 1312 (+ 2724 queued for the next depth level)
Time passed: 41:40
Time left: 75:57
Memory usage: 3,890.7 Kb

When I go into the analyze portion, it gives me the following errors:

Warning:  ksort() expects parameter 1 to be array, null given in D:\inetpub\overstockelectrical\generator\pages\page-analyze.inc.php(2) : eval()'d code on line 55

Warning:  Invalid argument supplied for foreach() in D:\inetpub\overstockelectrical\generator\pages\page-analyze.inc.php(2) : eval()'d code on line 57

SOMEONE HELP!
Re: Crawl Issues _ HELP!
« Reply #8 on: March 14, 2008, 08:25:57 PM »
Please PM me your generator URL and reference to this thread.

Quote
When i go in to the analyze portion it gives me the error:
You should not open the analyze page until the sitemap is created (since there is nothing to analyze at that point).
Re: Crawl Issues _ HELP!
« Reply #9 on: March 15, 2008, 12:09:05 AM »
I have lost the button to start a crawl... any ideas?  This is on the url/generator -> crawl tab!

Should my server be set for PHP 4 or PHP 5?

Re: Crawl Issues _ HELP!
« Reply #10 on: March 15, 2008, 11:07:25 PM »
Hello,

you should increase the memory_limit setting to resolve that.
Both PHP 4 and PHP 5 are supported.
Re: Crawl Issues _ HELP!
« Reply #11 on: March 17, 2008, 03:16:29 PM »
So here is where I am and I am still having problems:

I have access to my php.ini file.  The default values were as follows:

max_execution_time = 300
max_input_time = 60
memory_limit = 18M

Any recommendations on what I should change these settings to?  I have tried increasing them, and the crawl never completes.


Here is what my configuration looks like:

Main Parameters:

Starting URL:
 [ External links are visible to forum administrators only ]

Save sitemap to:
 /hermes/bosweb/web145/b1450/ipw.weflyhot/public_html/sitemap.xml
Current path to Sitemap generator is: /hermes/bosweb/web145/b1450/ipw.weflyhot/public_html/generator/

Your Sitemap URL:
[ External links are visible to forum administrators only ]
 
Create Text Sitemap:
 X Create sitemap in Text format

Create ROR Sitemap:
X Create sitemap in ROR format
It will be stored in the same folder as XML sitemap, but with different filename: ror.xml
 
Create Google Base Feed (RSS):
X Create feed for Google Base
It will be stored in the data/ folder with filename: gbase.xml

Create HTML Sitemap:
X Create html site map for your normal visitors
Please note that this option requires additional resources to perform

HTML Sitemap filename:
/hermes/bosweb/web145/b1450/ipw.weflyhot/public_html/sitemap.html

Sitemap entry attributes (optional)

Change frequency:
Weekly

Last modification:
 Use server's response
 
Priority
0.5

Automatic Priority:
X Automatically assign priority attribute
Enable this option to automatically reduce priority depending on the page's depth level

Individual attributes:
Blank
define specific frequency and priority attributes here in the following format:
"url substring,lastupdate YYYY-mm-dd,frequency,priority".
example:
page.php?product=,2005-11-14,monthly,0.9


Re: Crawl Issues _ HELP!
« Reply #12 on: March 17, 2008, 03:20:32 PM »
Continued

Miscellaneous Definitions (optional)

Number of links per page in HTML sitemap:
40000
(that will split your sitemap on several pages)

Compress sitemap using GZip:
 Use sitemap files compression
(".gz" will be added to all filenames automatically)

Inform (ping) Search Engines upon completion (Google, Yahoo, Ask, Moreover):
 Ping Google when generation is done

Calculate changelog:
 Calculate Change Log after completion
please note that this option requires more resources to complete

  • Crawler Limitations, Finetune (optional)

Maximum pages:
 0 "0" for unlimited

Maximum depth level:
0 "0" for unlimited

Maximum execution time, seconds:
0  "0" for unlimited

Save the script state, every X seconds:
120 this option allows to resume crawling operation if it was interrupted. "0" for no saves

Make a delay between requests, X seconds after each N requests:
60 s after each  500 requests
This option allows to reduce the load on your webserver. "0" for no delay

  • Advanced Settings (optional)


Extract meta description tag
X enable META descriptions
Note: this option may significantly increase memory usage and is not recommended for larger sitemaps

Use IP address for crawling:
Blank

Remove session ID from URLs:
 PHPSESSID sid osCsid
common session parameters (separate with spaces): PHPSESSID, sid, osCsid

Progress state storage type:
X serialize  var_export
try to change this option in case of memory usage issues

Re: Crawl Issues _ HELP!
« Reply #13 on: March 17, 2008, 03:22:51 PM »
Basically what happens is I start the crawl... then after a period of time the "Sitemap generation in progress..." screen shows:

This page cannot be displayed

The page you are looking for is currently....................................................
......
........
Re: Crawl Issues _ HELP!
« Reply #14 on: March 17, 2008, 10:35:54 PM »
Hello,

a memory_limit of 18M is too low in many cases; please increase it. Then check how many pages it crawls before getting interrupted.