Re: Sitemap crawling never stops/ends
« Reply #31 on: June 17, 2009, 07:31:10 PM »
Hi there,

This looks like the same issue I'm having, however no resolution of it just yet!

Oddly, I only have the issue on one particular website. If I move my xml-sitemap generator to any other website, it doesn't fault. I've even downloaded a copy of the website it runs away on, and on that copy it works perfectly OK.

When it works on the offline copy, it generates 463 pages for the sitemap. When it runs away on the live site, it gets to 2000-odd pages and won't stop!

In the front-end interface (on the broken site) I see this:

Already in progress. Current process state is displayed:
Links depth: 13
Current page: index.php/reviews/index.php/index.php/component/mailto/?tmpl=component&link=aHR0cDovL3d3dy5nZW9yZ2lhbW9mZmV0dC5uZXQvaW5kZXgucGhwL3Jldmlld3MvaW5kZXgucGhwL2luZGV4LnBocC9yZXZpZXdzLzE5Ny1kb2N0b3J3aG8tZGQ%3D
Pages added to sitemap: 1023
Pages scanned: 4300 (29,133.5 KB)
Pages left: 915 (+ 1272 queued for the next depth level)
Time passed: 0:27:05
Time left: 0:05:45
Memory usage: 5,146.5 Kb

Now, I know I don't have 13 levels, and I know there is something wrong with the site. Looking at the Current page line, it's indexing "index.php/reviews/index.php/index.php/component/mailto", and that just isn't right! My pages don't use index.php (they're SEF'd out), so it's finding URLs that don't exist!

Can anyone please point me in the right direction?

Many thanks.
Re: Sitemap crawling never stops/ends
« Reply #32 on: June 17, 2009, 07:41:15 PM »
And as an update...

I've just downloaded the latest version of the generator... and it made no difference!
Re: Sitemap crawling never stops/ends
« Reply #33 on: June 17, 2009, 10:12:10 PM »
Hello,

try adding this to the "Exclude URLs" setting and start crawling from scratch (no resuming):
Code:
index.php/index.php
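The exclusion works as a substring match against the URL, so any URL containing that fragment is skipped before it is crawled. Here's a small illustration of the idea (only an illustration, not the actual generator code):
Code:
<?php
// Illustration only: substring-based URL exclusion.
$exclude = array('index.php/index.php');

function isExcluded($url, array $exclude) {
    foreach ($exclude as $fragment) {
        if (strpos($url, $fragment) !== false) {
            return true;   // URL contains an excluded fragment - skip it
        }
    }
    return false;
}

var_dump(isExcluded('index.php/reviews/197-doctorwho-dd', $exclude));                      // bool(false)
var_dump(isExcluded('index.php/reviews/index.php/index.php/component/mailto/', $exclude)); // bool(true)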
Re: Sitemap crawling never stops/ends
« Reply #34 on: June 18, 2009, 06:13:10 AM »
Hello Oleg - thanks for your reply.

Following your lead, I've added the URL to the "Exclude URLs" setting and finally got the generator to finish!

It still "over-crawled" by 475 URLs, but at least I could get a sitemap out to see what it was crawling.

The 475 "extra" URLs were logged as broken links (they would be - as they don't exist!), but they did all start with index.php (example below).

Links depth: 9
Current page: index.php/reviews/index.php/reviews/reviews/sc9-index/sc9-intro
Pages added to sitemap: 519
Pages scanned: 960 (7,656.1 KB)
Pages left: 119 (+ 187 queued for the next depth level)
Time passed: 0:05:10
Time left: 0:00:38
Memory usage: 1,610.6 Kb

So - adding index.php to the "Exclude URLs" setting then resulted in a perfect sitemap, with the correct number of pages and no broken links (as it should be). I use SEF, which doesn't add index.php to the URLs.

It's still very odd though - since I can copy the site to a different installation and it runs just fine, I find it hard to believe that it's either the generator script or the website config (so huge thanks for helping!). Possibly server config? Any ideas (although it's outside your remit!)?

Regards
Alan.
Re: Sitemap crawling never stops/ends
« Reply #35 on: June 18, 2009, 08:20:38 PM »
It seems like some links on your site are relative when they should be absolute. For instance, you might have a link like:
<a href="index.php/reviews/xxx"></a>
while it should be:
<a href="/index.php/reviews/xxx"></a>

Re: Sitemap crawling never stops/ends
« Reply #36 on: August 04, 2009, 02:54:18 PM »
Seems I'm having the same problem.
As long as I configure the link depth to 4, everything works fine.
As soon as I increase the link depth (e.g. to 5) I get an error:

memory_limit as per phpinfo: 128M
max_execution_time: 60

Sitemap reports:
Links depth: 4
Current page: blog.php?p=172
Pages added to sitemap: 656
Pages scanned: 680 (13,648.6 KB)
Pages left: 693 (+ 379 queued for the next depth level)
Time passed: 0:01:25
Time left: 0:01:27
Memory usage: 1,618.8 Kb

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 132382727 bytes) in /home/w10500/web/public_html/xmlsitemaps/pages/class.http.inc.php(2) : eval()'d code on line 112
Re: Sitemap crawling never stops/ends
« Reply #37 on: August 05, 2009, 10:46:47 PM »
Hello,

looks like the sitemap generator is trying to crawl very large pages (maybe video or audio files directly linked on your site). Please PM me your generator URL/login so that I can check that.
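If large linked files do turn out to be the cause, one general way to avoid loading them into memory is to check a link's Content-Length with a HEAD request before downloading it. A minimal sketch of that idea (illustrative only, with an assumed 5 MB cap - not the generator's implementation):
Code:
<?php
// Illustrative sketch only - not the generator's implementation.
// Checks a link's Content-Length with a HEAD request before downloading,
// so very large files (e.g. video/audio) are never loaded into memory.
function isTooLarge($url, $maxBytes = 5242880) {   // 5 MB cap is an assumption
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);           // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_exec($ch);
    $length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    return $length > $maxBytes;
}

if (isTooLarge('http://www.example.com/video.mp4')) {   // hypothetical URL
    echo "Skipping oversized file\n";
}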

Re: Sitemap crawling never stops/ends
« Reply #38 on: October 23, 2009, 09:45:20 PM »
Hello,

Sitemap worked great for months, but then it started getting stuck, as others have indicated in this topic.

I set my php.ini file:
 
max_execution_time = 3000
max_input_time = 120
memory_limit = 1024M
 
The settings seem to be active (I checked with the phpinfo file in the generator directory).
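As a further double-check (in case the web server reads a different php.ini than expected), a tiny script like this in the generator directory prints the limits actually in effect for that PHP instance - just an example:
Code:
<?php
// Example check: print the limits actually in effect for this PHP instance
// (the web server may read a different php.ini than the command line).
echo 'memory_limit: ', ini_get('memory_limit'), "\n";             // expect 1024M
echo 'max_execution_time: ', ini_get('max_execution_time'), "\n"; // expect 3000
echo 'max_input_time: ', ini_get('max_input_time'), "\n";         // expect 120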

I blocked a lot of the unnecessary URLs in the program admin.

I set the program to save every 30 seconds, but even after hours, I'm still getting this:

Links depth: 2
Current page: forums/single-for-life--t81s50.html
Pages added to sitemap: 736
Pages scanned: 760 (20,312.2 KB)
Pages left: 184 (+ 1359 queued for the next depth level)
Time passed: 0:10:01
Time left: 0:02:25
Memory usage: 1,931.7 Kb

Any suggestion on what I can do to get it to work? Thanks for any help.
Re: Sitemap crawling never stops/ends
« Reply #40 on: April 01, 2010, 09:12:34 AM »
I have a problem with the standalone engine.

I get this:
Total pages indexed: 1372
Creating sitemaps... and calculating changelog...

and that's it. I don't get my sitemaps. Why?
memory_limit = 1024M, which should be enough resources for the sitemap engine.

What should I do?
Re: Sitemap crawling never stops/ends
« Reply #41 on: April 01, 2010, 08:55:49 PM »
Hello,

please let me know your generator URL/login in private message to check this.
Re: Sitemap crawling never stops/ends
« Reply #42 on: April 05, 2010, 12:36:47 PM »
It's OK now - I hadn't set permissions 775 on the folder.
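In case it helps anyone else: the generator needs write access to the folder it saves the sitemap files into. One way to grant it, with a hypothetical path (adjust to your own installation):
Code:
<?php
// Example only - the path below is hypothetical; adjust to your installation.
// The generator needs write permission on the folder where it saves
// the generated sitemap files.
chmod('/home/example/public_html/generator/data', 0775);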
Re: Sitemap crawling never stops/ends
« Reply #43 on: July 22, 2010, 10:29:35 PM »
Hi,

I just bought the full version and I seem to have a similar, if not the same, problem:

Links depth: 52
Current page: Calendrier-Annee.php?y=1911&cal_country=4
Pages added to sitemap: 3474
Pages scanned: 3480 (180,418.1 KB)
Pages left: 10 (+ 169 queued for the next depth level)
Time passed: 0:08:23
Time left: 0:00:01
Memory usage: 4,218.4 Kb

It has remained at one second left for 20 minutes now, and none of the figures is moving. Help please?
Re: Sitemap crawling never stops/ends
« Reply #44 on: July 23, 2010, 10:59:32 AM »
Hello,

please let me know your generator URL/login in private message to check this.