• Welcome to Sitemap Generator Forum.
 

Crawl Issues _ HELP!

Started by scottsaxton, March 06, 2008, 01:53:02 PM


scottsaxton

Hello,

I am new here. Yesterday I purchased Sitemap Generator and have had some luck with it, but I am still running into issues.

It seems to get caught up in my photo gallery. If I do a crawl of only a few layers, the crawl completes successfully, but if I try to do 7 or 8 layers, it never finishes and I don't really see an error explaining why.

I have tried messing around with a maximum run time and maximum page count, etc...but no luck as of yet.

Any ideas would be useful.

Thanks


scottsaxton

Sometimes the crawling screen comes back (like when you first start it)... other times it just gets stuck at, say, level 6 or layer 7 and doesn't indicate anything. I have also seen an XML error in the sitemap.xml file... at the very bottom.


scottsaxton

No, when I am experiencing problems, I don't see an update to the change log.

XML-Sitemaps Support

Hello,

it looks like your server configuration doesn't allow the script to run long enough to create the full sitemap. Please try increasing the memory_limit and max_execution_time settings in the PHP configuration at your host (the php.ini file), or contact your hosting support about this.
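For reference, a php.ini fragment with raised limits might look roughly like this. The values below are only illustrative suggestions, not settings from this thread; use whatever your host actually permits:

```ini
; Allow the crawler to use more memory than a low default
memory_limit = 128M

; Let the script run longer before PHP aborts it (in seconds)
max_execution_time = 600
```

If the host does not expose php.ini directly, the same directives can often be changed through a hosting control panel instead.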

scottsaxton

So I am waiting to hear from my web host, but it appears that I don't have access to my php.ini file.

Here's the deal. I went through my config and set it up as shown in the attachment.

I started the crawl with the run in background option selected.

It updated the status up to level 7... but then sat and sat and sat until it finally gave me the message to restart the interrupted session.

The change log did not update and the request date did not change.

Any suggestions other than the php.ini file?

da_lyman

You can create your own php.ini file, put it in the root of your web site, and make the changes needed there.
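One caveat worth noting: a per-directory php.ini is generally only honored when PHP runs as CGI/FastCGI. If the host runs PHP as an Apache module (mod_php) and allows overrides, an .htaccess file in the web root is a possible alternative; a sketch under those assumptions:

```apacheconf
# .htaccess — only honored when PHP runs as an Apache module (mod_php)
php_value memory_limit 128M
php_value max_execution_time 600
```

Under CGI/FastCGI setups, php_value lines in .htaccess cause a server error, so it is worth checking which setup the host uses first.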

My problem is that the program hangs up on level 3 and says it's crawling the same page, then times out and says to start over; when I try to start over, it hangs up again.

Links depth: 3
Current page: products.cfm/action/mfgdisplay/start/81/display/10/CategoryName/Control-Circuit-and-Protection/mfgname/Federal-Pacific-Electric-(FPE)/PageNum/9
Pages added to sitemap: 698
Pages scanned: 720 (20,601.2 Kb)
Pages left: 1312 (+ 2724 queued for the next depth level)
Time passed: 41:40
Time left: 75:57
Memory usage: 3,890.7 Kb

When I go into the analyze portion it gives me this error:

Warning:  ksort() expects parameter 1 to be array, null given in D:\inetpub\overstockelectrical\generator\pages\page-analyze.inc.php(2) : eval()'d code on line 55

Warning:  Invalid argument supplied for foreach() in D:\inetpub\overstockelectrical\generator\pages\page-analyze.inc.php(2) : eval()'d code on line 57

SOMEONE HELP!

XML-Sitemaps Support

Please PM me your generator URL and reference to this thread.

Quote: When I go into the analyze portion it gives me the error:

You should not open the analyze page until the sitemap is created (since there is nothing to analyze at that point).

scottsaxton

I have lost the button to start a crawl... any ideas? This is on the url/generator -> crawl tab!

Should my server be set for PHP 4 or PHP 5?


XML-Sitemaps Support

Hello,

you should increase the memory_limit setting to resolve that.
Both PHP 4 and PHP 5 are supported.

scottsaxton

So here is where I am and I am still having problems:

I have access to my php.ini file.  The default values were as follows:

max_execution_time = 300
max_input_time = 60
memory_limit = 18M

Any recommendations on what I should change these settings to? I have tried increasing them, and the crawl never completes.


Here is what my configuration looks like:

Main Parameters:

Starting URL:
[ External links are visible to forum administrators only ]

Save sitemap to:
/hermes/bosweb/web145/b1450/ipw.weflyhot/public_html/sitemap.xml
Current path to Sitemap generator is: /hermes/bosweb/web145/b1450/ipw.weflyhot/public_html/generator/

Your Sitemap URL:
[ External links are visible to forum administrators only ]

Create Text Sitemap:
X Create sitemap in Text format

Create ROR Sitemap:
X Create sitemap in ROR format
It will be stored in the same folder as XML sitemap, but with different filename: ror.xml

Create Google Base Feed (RSS):
X Create feed for Google Base
It will be stored in the data/ folder with filename: gbase.xml

Create HTML Sitemap:
X Create html site map for your normal visitors
Please note that this option requires additional resources to perform

HTML Sitemap filename:
/hermes/bosweb/web145/b1450/ipw.weflyhot/public_html/sitemap.html

Sitemap entry attributes (optional)

Change frequency:
Weekly

Last modification:
Use server's response

Priority
0.5

Automatic Priority:
X Automatically assign priority attribute
Enable this option to automatically reduce priority depending on the page's depth level

Individual attributes:
Blank
define specific frequency and priority attributes here in the following format:
"url substring,lastupdate YYYY-mm-dd,frequency,priority".
example:
page.php?product=,2005-11-14,monthly,0.9



scottsaxton

Continued

Miscellaneous Definitions (optional)

Number of links per page in HTML sitemap:
40000
(that will split your sitemap on several pages)

Compress sitemap using GZip:
Use sitemap files compression
(".gz" will be added to all filenames automatically)

Inform (ping) Search Engines upon completion (Google, Yahoo, Ask, Moreover):
Ping Google when generation is done

Calculate changelog:
Calculate Change Log after completion
please note that this option requires more resources to complete

  • Crawler Limitations, Finetune (optional)
    Maximum pages:
    0 "0" for unlimited

    Maximum depth level:
    0 "0" for unlimited

    Maximum execution time, seconds:
    0  "0" for unlimited

    Save the script state, every X seconds:
    120 this option allows to resume crawling operation if it was interrupted. "0" for no saves

    Make a delay between requests, X seconds after each N requests:
    60 s after each  500 requests
    This option allows to reduce the load on your webserver. "0" for no delay

  • Advanced Settings (optional)

    Extract meta description tag
    X enable META descriptions
    Note: this option may significantly increase memory usage and is not recommended for larger sitemaps

    Use IP address for crawling:
    Blank

    Remove session ID from URLs:
    PHPSESSID sid osCsid
    common session parameters (separate with spaces): PHPSESSID, sid, osCsid

    Progress state storage type:
    X serialize  var_export
    try to change this option in case of memory usage issues


scottsaxton

Basically what happens is I start the crawl... then after a period of time the "Sitemap generation in progress..." screen shows:

This page cannot be displayed

The page you are looking for is currently...

XML-Sitemaps Support

Hello,

A memory_limit of 18M is too low in many cases, please increase it. Then check how many pages it crawls before getting interrupted.