 

Crawling stops at a certain link

Started by roo7oman, November 01, 2008, 07:06:50 PM


roo7oman

Hi,

I have tried many times to resume the last session, but crawling stops at this link:

Links depth: 3
Current page: showthread.php?t=11650
Pages added to sitemap: 6323
Pages scanned: 6520 (1,157,767.0 KB)
Pages left: 18186 (+ 9792 queued for the next depth level)
Time passed: 34:18
Time left: 95:41
Memory usage: 26,648.0 Kb
Resuming the last session (last updated: 2008-11-01 21:45:40)

What should I do to complete the sitemap crawl?  :(

I have the Unlimited Sitemap Generator.


ron8

I have the same problem.
I have tried resuming several times, and each time it only indexes maybe a few more pages.

Links depth: 3
Current page: Directory/Motel/37:Central_Otago
Pages added to sitemap: 2955
Pages scanned: 2960 (124,902.7 KB)
Pages left: 1689 (+ 8581 queued for the next depth level)
Time passed: 27:46
Time left: 15:51
Memory usage: -
Resuming the last session (last updated: 2008-11-12 20:52:37)

Any solution?


ron8

Hi Oleg

Sorry to bother you again

I set the following limits in php.ini on the site where the generator is installed:

max_execution_time = 9000     ; Maximum execution time of each script, in seconds
max_input_time = 500   ; Maximum amount of time each script may spend parsing request data
memory_limit = 5000M

But it still stopped at around 2,600 pages, depth 3 (with around 9,000 remaining), so it made no difference.

Do you think I need to go even higher? Or is it something else?

(The generator is now installed on a different site from the one I PM'd the details for before; I will PM those to you just in case.)
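
In case the per-site php.ini is simply being ignored, my understanding (just an assumption on my part, it depends on how the host runs PHP) is that the same limits can also be set from a .htaccess file in the /generator directory when PHP runs as an Apache module:

# only takes effect when PHP runs as an Apache module (mod_php), not as CGI/FastCGI
php_value max_execution_time 9000
php_value memory_limit 5000M

I haven't confirmed whether that applies on this host, so treat it as a sketch rather than a fix.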

Many thanks

Ron

Quote from: admin on November 15, 2008, 07:27:18 PM
Hello,

That depends on other factors as well.
Some real-world examples:
about 35,000 URLs indexed - 1h 40min total generation time
about 200,000 URLs indexed - 38 hours total generation time


Quote from: ron8 on November 14, 2008, 11:38:24 PM
Many thanks, I will try that. Any suggestions as to the memory limit or max execution time required for 13,000 pages?
Ron


ron8

I have now uploaded a php.ini to the /generator directory, as advised by my host.

I have set the max execution time to 9500 (I tried 3600 first, with no change in the result).

I have set the memory limit to 2500M.

I have verified that the max execution time is indeed 9500 (using the URL you sent in your PM).
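
For anyone wanting to run the same check, a throwaway PHP script along these lines (my own quick helper, not part of the generator, and the file name is arbitrary) dropped into the /generator directory prints the values PHP actually applies there:

<?php
// limits_check.php - temporary helper to confirm the effective PHP limits
// for this directory; delete it again once the values look right
echo 'max_execution_time: ' . ini_get('max_execution_time') . "\n";
echo 'max_input_time: ' . ini_get('max_input_time') . "\n";
echo 'memory_limit: ' . ini_get('memory_limit') . "\n";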

I restarted the previous crawl; no new pages were added to the sitemap.

I then closed and restarted the generator and started a new crawl. This time it got to 3,015 pages before hanging:

Links depth: 3
Current page: Directory/Hotels/114:Oamaru
Pages added to sitemap: 3015
Pages scanned: 3020 (123,462.6 KB)
Pages left: 1702 (+ 9038 queued for the next depth level)
Time passed: 28:36
Time left: 16:07
Memory usage: -

So what should I try now to get this to complete?

Regards

Ron

info764

Crawling seems to have stopped on my site with this info frozen:

Links depth: 3
Current page: compare.asp?strAction=add&strProductIDs=60793&strFrom=prodtype&numReturnID=7481
Pages added to sitemap: 615
Pages scanned: 620 (12,573.5 KB)
Pages left: 2682 (+ 610 queued for the next depth level)
Time passed: 4:58
Time left: 21:31
Memory usage: -

This was updating, but now it never changes. I have gone on to other tasks and returned to this browser tab, only to see the same thing. How do I know if this is stuck, or if it is still running in the background?


email14

I'm having the same issue with my website; crawling seems to stop at around 2,600 or 2,660 pages.

How did you guys configure it?

RonTom

This still remains unresolved for us. It was suggested that we try running it as a cron job, but we haven't managed to do that yet.
I'd love to know your answer if you find one!
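
For what it's worth, my understanding of the cron suggestion (untested on our side, and the script name below is a guess based on our copy of the generator, so check the readme for the exact file and path) is a crontab entry that runs the crawler from the command line, for example weekly at 3am:

0 3 * * 0 /usr/bin/php /path/to/generator/runcrawl.php

Running it from the command line should avoid the browser/webserver timeouts that seem to be cutting the crawl short.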

XML-Sitemaps Support

Hello,

In most cases this issue is resolved with a custom setup of the "Do not parse", "Exclude URLs", "Memory limit" and other settings (depending on the site).
Please PM me your generator URL/login to check that further.
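
For example, judging by the frozen URL posted above, "add to compare"-style links are a typical candidate for exclusion on store sites, so a setup for that site would likely include an entry such as the following in both the "Exclude URLs" and "Do not parse URLs" fields (the exact patterns depend on the site, which is why I need to look at the generator itself):

compare.asp?strAction=add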

jenelia

How do we stop spiders from crawling a certain part of a page? Or can we only stop them from crawling the page itself, or from crawling its links?

ball.mdr

I have the same problem, but sometimes when I restart the crawl it can finish.

XML-Sitemaps Support

If you want to stop the crawler from following the links on a page, you can add this in the <head> section of the HTML code:
<meta name="robots" content="index,nofollow" />
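
For reference, the standard robots meta directives work separately: "nofollow" stops the links on the page from being followed, while "noindex" keeps the page itself out of the index. So the variant that does both would be:

<meta name="robots" content="noindex,nofollow" />

(The tag syntax itself is standard; whether a particular crawler honours each directive is worth verifying.)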