crawling stop at a certain link
« on: November 01, 2008, 07:06:50 PM »
hi

I have tried many times to resume the last session, but crawling stops at this link:

Links depth: 3
Current page: showthread.php?t=11650
Pages added to sitemap: 6323
Pages scanned: 6520 (1,157,767.0 KB)
Pages left: 18186 (+ 9792 queued for the next depth level)
Time passed: 34:18
Time left: 95:41
Memory usage: 26,648.0 Kb
Resuming the last session (last updated: 2008-11-01 21:45:40)

What should I do to complete the sitemap crawl?  :(

I have the unlimited sitemap generator
« Last Edit: November 01, 2008, 07:09:12 PM by roo7oman »

ron8

Re: crawling stop at a certain link
« Reply #2 on: November 13, 2008, 05:26:08 PM »
I have the same problem: I've tried resuming several times, and each time it only indexes maybe a few more pages.

Links depth: 3
Current page: Directory/Motel/37:Central_Otago
Pages added to sitemap: 2955
Pages scanned: 2960 (124,902.7 KB)
Pages left: 1689 (+ 8581 queued for the next depth level)
Time passed: 27:46
Time left: 15:51
Memory usage: -
Resuming the last session (last updated: 2008-11-12 20:52:37)

Any solution?

ron8

Re: crawling stop at a certain link
« Reply #4 on: December 14, 2008, 06:03:37 PM »
Hi Oleg

Sorry to bother you again

I set the following limits in php.ini on the site where the generator is installed.

max_execution_time = 9000     ; Maximum execution time of each script, in seconds
max_input_time = 500          ; Maximum amount of time each script may spend parsing request data
memory_limit = 5000M
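One thing worth double-checking before raising the limits further (a hedged suggestion, since hosts differ): make sure the directives actually ended up in a file the server reads. A minimal sketch, assuming a per-directory override file at generator/php.ini (the path is an assumption; some hosts want .user.ini or .htaccess directives instead):

```shell
# Write the per-directory override and then confirm the limit directives
# are really present in the file (a typo here silently changes nothing).
mkdir -p generator
cat > generator/php.ini <<'EOF'
max_execution_time = 9000
max_input_time = 500
memory_limit = 512M
EOF
# Print the limit-related directives back out as a sanity check.
grep -E '^(max_execution_time|memory_limit)' generator/php.ini
```

If the grep prints nothing, the web server is almost certainly loading a different php.ini than the one you edited.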

But it still stopped at around 2,600 pages, depth 3 (with around 9,000 remaining), so it made no difference.

Do you think I need to go even higher, or is it something else?

(The generator is now installed on a different site from the one I PM'd you the details for before; I will PM the new details to you just in case.)

Many thanks

Ron

Hello,

That depends on other factors as well.
Some real-world examples are:
about 35,000 URLs indexed - 1h 40min total generation time
about 200,000 URLs indexed - 38 hours total generation time


Many thanks, I will try that. Any suggestion as to the memory limit or maximum time required for 13,000 pages?
Ron

ron8

Re: crawling stop at a certain link
« Reply #6 on: December 16, 2008, 10:06:27 AM »
I have now uploaded a php.ini to the /generator directory, as advised by my host.

I have set max_execution_time to 9500 (I tried 3600 first with no change in the result).

I have set memory_limit to 2500M.

I have verified that max_execution_time is indeed 9500 (using the URL you used in your PM).

I restarted the previous crawl; no new pages were added to the sitemap.

I then closed and restarted the generator and started a new crawl. This time it got to 3015 pages before hanging:

Links depth: 3
Current page: Directory/Hotels/114:Oamaru
Pages added to sitemap: 3015
Pages scanned: 3020 (123,462.6 KB)
Pages left: 1702 (+ 9038 queued for the next depth level)
Time passed: 28:36
Time left: 16:07
Memory usage: -

So what should I try now to get this to complete?

Regards

Ron
Re: crawling stop at a certain link
« Reply #7 on: December 16, 2008, 03:28:22 PM »
Crawling seems to have stopped on my site with this info frozen:

Links depth: 3
Current page: compare.asp?strAction=add&strProductIDs=60793&strFrom=prodtype&numReturnID=7481
Pages added to sitemap: 615
Pages scanned: 620 (12,573.5 KB)
Pages left: 2682 (+ 610 queued for the next depth level)
Time passed: 4:58
Time left: 21:31
Memory usage: -

This was updating, but now it never changes. I have gone on to other tasks and returned to this browser tab, only to see the same thing. How do I know whether this is stuck or still running in the background?
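Assuming you have shell access to the server (an assumption; many shared hosts don't allow it), one way to tell is to look for the crawler's PHP process server-side rather than trusting the frozen page:

```shell
# List any running PHP processes; the [p] bracket trick keeps grep from
# matching its own command line. If nothing turns up (and CPU use is idle),
# the crawl has likely died rather than continued in the background.
ps aux | grep '[p]hp' || echo "no php process found - the crawl is not running"
```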
Re: crawling stop at a certain link
« Reply #9 on: May 08, 2009, 05:22:32 AM »
I'm having the same issue with my website; crawling seems to stop at around 2,600 or 2,660 pages.

How did you guys configure it?
Re: crawling stop at a certain link
« Reply #10 on: May 08, 2009, 05:43:33 AM »
This is still unresolved for us; it was suggested we try running the generator as a cron job, but we haven't managed to do that yet.
I'd love to know your answer if you find one!
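For anyone trying the cron route, here is a sketch of what the entry might look like, assuming the standalone generator ships a command-line runner (the script name runcrawl.php and both paths are assumptions; check your own install and host):

```
# Hypothetical crontab entry: start a crawl at 02:00 every night, detached
# from any browser session so the web server's request timeouts don't apply.
0 2 * * * /usr/bin/php /home/USER/public_html/generator/runcrawl.php
```

Running from cron uses the CLI PHP, which typically has no max_execution_time limit, so long crawls are less likely to be cut off than when started from a browser.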
Re: crawling stop at a certain link
« Reply #11 on: May 10, 2009, 06:39:19 AM »
Hello,

In most cases this issue is resolved with a custom setup of the "Do not parse"/"Exclude URLs"/"Memory limit" and other settings (depending on the site).
Please PM me your generator URL/login to check that further.
Re: crawling stop at a certain link
« Reply #12 on: May 23, 2009, 01:08:57 AM »
How do we stop spiders from crawling a certain part of a page? ... only stop it from crawling the page itself, or from crawling your links?
« Last Edit: May 23, 2009, 01:58:13 PM by admin »
Re: crawling stop at a certain link
« Reply #13 on: May 23, 2009, 03:59:42 AM »
I have the same problem, but sometimes when I restart the crawl it manages to finish.
Re: crawling stop at a certain link
« Reply #14 on: May 23, 2009, 02:00:13 PM »
If you want to stop parsing a page, you can add this in the <head> section of the HTML code:
<meta name="robots" content="index,nofollow" />
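For placement, the tag goes inside the page's <head>. As a note on the values: "index"/"noindex" controls whether the page itself may be indexed, and "follow"/"nofollow" controls whether the links on it are crawled, so "index,nofollow" keeps the page but stops the spider at its links:

```
<head>
  <title>Example page</title>
  <!-- page itself may be indexed, but the links on it are not followed -->
  <meta name="robots" content="index,nofollow" />
</head>
```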