 

Crawling stops at a certain link

Started by roo7oman, November 01, 2008, 07:06:50 PM


roo7oman

Hi,

I have tried many times to resume the last session, but crawling stops at this link:

Links depth: 3
Current page: showthread.php?t=11650
Pages added to sitemap: 6323
Pages scanned: 6520 (1,157,767.0 KB)
Pages left: 18186 (+ 9792 queued for the next depth level)
Time passed: 34:18
Time left: 95:41
Memory usage: 26,648.0 Kb
Resuming the last session (last updated: 2008-11-01 21:45:40)

What should I do to complete the sitemap crawl?  :(

I have the Unlimited Sitemap Generator.


ron8

I have the same problem.
I have tried resuming several times, and each time it only indexes maybe a few more pages.

Links depth: 3
Current page: Directory/Motel/37:Central_Otago
Pages added to sitemap: 2955
Pages scanned: 2960 (124,902.7 KB)
Pages left: 1689 (+ 8581 queued for the next depth level)
Time passed: 27:46
Time left: 15:51
Memory usage: -
Resuming the last session (last updated: 2008-11-12 20:52:37)

Any solution?


ron8

Hi Oleg

Sorry to bother you again

I set the following limits in php.ini on the site where the generator is installed:

max_execution_time = 9000     ; Maximum execution time of each script, in seconds
max_input_time = 500   ; Maximum amount of time each script may spend parsing request data
memory_limit = 5000M

But it still stopped at around 2,600 pages, depth 3 (with around 9,000 remaining), so it made no difference.

Do you think I need to go even higher? Or is it something else?

(The generator is now installed on a different site from the one I PM'd the details for before; I will PM those to you just in case.)
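
In case the per-site php.ini is simply being ignored, my understanding (just an assumption on my part, it depends on how the host runs PHP) is that the same limits can also be set from a .htaccess file in the /generator directory when PHP runs as an Apache module:

# only takes effect when PHP runs as an Apache module (mod_php), not as CGI/FastCGI
php_value max_execution_time 9000
php_value memory_limit 5000M

I haven't confirmed whether that applies on this host, so treat it as a sketch rather than a fix.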

Many thanks

Ron

Quote from: admin on November 15, 2008, 07:27:18 PM
Hello,

That depends on other factors as well.
Some real-world examples:
about 35,000 URLs indexed - 1h 40min total generation time
about 200,000 URLs indexed - 38 hours total generation time


Quote from: ron8 on November 14, 2008, 11:38:24 PM
Many thanks, I will try that. Any suggestions as to the memory limit or max execution time required for 13,000 pages?
Ron


ron8

I have now uploaded a php.ini to the /generator directory, as advised by my host.

I have set the max execution time to 9500 (I tried 3600 first, with no change in the result).

I have set the memory limit to 2500M.

I have verified that the max execution time is indeed 9500 (using the URL you sent in your PM).
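
For anyone wanting to run the same check, a throwaway PHP script along these lines (my own quick helper, not part of the generator, and the file name is arbitrary) dropped into the /generator directory prints the values PHP actually applies there:

<?php
// limits_check.php - temporary helper to confirm the effective PHP limits
// for this directory; delete it again once the values look right
echo 'max_execution_time: ' . ini_get('max_execution_time') . "\n";
echo 'max_input_time: ' . ini_get('max_input_time') . "\n";
echo 'memory_limit: ' . ini_get('memory_limit') . "\n";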

I restarted the previous crawl; no new pages were added to the sitemap.

I then closed and restarted the generator and started a new crawl. This time it got to 3,015 pages before hanging:

Links depth: 3
Current page: Directory/Hotels/114:Oamaru
Pages added to sitemap: 3015
Pages scanned: 3020 (123,462.6 KB)
Pages left: 1702 (+ 9038 queued for the next depth level)
Time passed: 28:36
Time left: 16:07
Memory usage: -

So what should I try now to get this to complete?

Regards

Ron

info764

Crawling seems to have stopped on my site with this info frozen:

Links depth: 3
Current page: compare.asp?strAction=add&strProductIDs=60793&strFrom=prodtype&numReturnID=7481
Pages added to sitemap: 615
Pages scanned: 620 (12,573.5 KB)
Pages left: 2682 (+ 610 queued for the next depth level)
Time passed: 4:58
Time left: 21:31
Memory usage: -

This was updating, but now it never changes. I have gone on to other tasks and returned to this browser tab, only to see the same thing. How do I know if this is stuck, or if it is still running in the background?


email14

I'm having the same issue with my website; crawling seems to stop at around 2,600 or 2,660 pages.

How did you guys configure it?

RonTom

This still remains unresolved for us. It was suggested that we try running it as a cron job, but we haven't managed to do that yet.
I'd love to know your answer if you find one!
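
For what it's worth, my understanding of the cron suggestion (untested on our side, and the script name below is a guess based on our copy of the generator, so check the readme for the exact file and path) is a crontab entry that runs the crawler from the command line, for example weekly at 3am:

0 3 * * 0 /usr/bin/php /path/to/generator/runcrawl.php

Running it from the command line should avoid the browser/webserver timeouts that seem to be cutting the crawl short.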

XML-Sitemaps Support

Hello,

In most cases this issue is resolved with a custom setup of the "Do not parse", "Exclude URLs", "Memory limit" and other settings (depending on the site).
Please PM me your generator URL/login to check that further.
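
For example, judging by the frozen URL posted above, "add to compare"-style links are a typical candidate for exclusion on store sites, so a setup for that site would likely include an entry such as the following in both the "Exclude URLs" and "Do not parse URLs" fields (the exact patterns depend on the site, which is why I need to look at the generator itself):

compare.asp?strAction=add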

jenelia

How do we stop spiders from crawling a certain part of a page? Or can we only stop them from crawling the page itself, or from crawling its links?

ball.mdr

I have the same problem, but sometimes when I restart the crawl it can finish.

XML-Sitemaps Support

If you want to stop the crawler from following the links on a page, you can add this in the <head> section of the HTML code:
<meta name="robots" content="index,nofollow" />
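
For reference, the standard robots meta directives work separately: "nofollow" stops the links on the page from being followed, while "noindex" keeps the page itself out of the index. So the variant that does both would be:

<meta name="robots" content="noindex,nofollow" />

(The tag syntax itself is standard; whether a particular crawler honours each directive is worth verifying.)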