Question about crawling
« on: September 07, 2006, 08:46:10 PM »
Hi,

I have started my first crawl and wondered if anyone can clear something up for me.

My site has 1000s of pages so I am expecting this to take all night; however, the page simply states:

Please wait. Sitemap generation in progress...

Links depth: -
Current page: -
Pages added to sitemap: -
Pages scanned: - (- Kb)
Pages left: - (+ - queued for the next depth level)
Time passed: -
Time left: -
Memory usage: -

This hasn't changed now for the last hour.  Surely there should be information filled in for all of these sections???

Can anyone let me know if this is correct or indeed if there is an issue here?

Many Thanks
Phillip Lloyd
Re: Question about crawling
« Reply #1 on: September 09, 2006, 02:15:25 AM »
Hello Phillip,

in some configurations the output of scripts is cached on the server and is not sent to the browser until the script finishes, so you don't see the progress updates while the script is working. You can execute the sitemap generator with the "Run in background" checkbox enabled and then check back on the "Crawling" page - you should be able to see the current state of the crawl (it will not be updated automatically on the screen; the page only shows the values at the moment it is loaded).
Re: Question about crawling
« Reply #2 on: September 09, 2006, 04:25:31 PM »
I ran the crawl again in the background and closed the window - as it says you can - but when I opened a new page an hour later and went to my generator, it still says:

'Sitemap was not generated yet, please go to Crawling page to start crawler manually or to setup a cron job'.

When I click on Crawling I am presented with the normal page, as if I need to start again???

Can you figure out what might be the problem here???

Many Thanks
Re: Question about crawling
« Reply #3 on: September 10, 2006, 12:30:36 AM »
It is better to keep the window open in this case.

Also, if you have SSH access to your host, it is recommended to execute the sitemap generator from the command line for larger sites.
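As a sketch of what the command-line approach can look like - the generator directory and the runcrawl.php script name here are assumptions based on a typical installation, so check the actual file names in your own generator folder:

```shell
# Log in to the host over SSH first, then run the generator's
# command-line script with the PHP binary. Paths are placeholders.
cd /path/to/public_html/generator
php runcrawl.php

# To keep the crawl running after you disconnect, detach it and log output:
nohup php runcrawl.php > crawl.log 2>&1 &
tail -f crawl.log   # watch progress as the crawl proceeds
```

Run this way, the script is not subject to the web server's response buffering, which is why progress is visible as it happens.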
Re: Question about crawling
« Reply #4 on: September 10, 2006, 02:13:18 PM »
Even when I keep the window open, the browser's progress bar starts to load when I click crawl; within a couple of minutes it has loaded and states: Done.  I can then click through the web as normal.

I have even left this overnight with the window open, and it seems to be in the same state in the morning as it was when I went to bed???

I ask this because I have used GSite Crawler in the past, and even though it took all night, it still processed and provided me with information about what was going on.

I need reassurance, or for someone to look and see what the problem could be.

Thanks
Phillip Lloyd
Re: Question about crawling
« Reply #5 on: September 11, 2006, 02:03:06 AM »
Hello,

please execute the sitemap generator from the command line on your server in this case, or modify the PHP configuration on your host (increase the max_execution_time setting in php.ini).
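For reference, the suggested change looks like the following in php.ini; the value of 300 seconds is only an illustrative example, not a figure from the generator itself:

```ini
; php.ini - allow long-running scripts to finish
; (the common default is 30 seconds, which a large crawl will exceed)
max_execution_time = 300
```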
Re: Question about crawling
« Reply #6 on: September 11, 2006, 10:38:33 AM »
Hi,

Again, I appreciate your response, but would ask, in this instance, for you to break that last bit of information down into easier steps.

Are you saying I should speak to my host and ask them to execute the script from there???

I am on a shared server, will this make any difference???

If you could please explain this one to me a little easier it would be appreciated.

Many Thanks
Phillip
Re: Question about crawling
« Reply #7 on: September 12, 2006, 03:26:37 AM »
Hello Phillip,

Quote
Are you saying I should speak to my host and ask them to execute the script from there???
No, my suggestion was for you to execute the sitemap generator from the command line yourself. This requires SSH access to the host (you can ask your hosting support whether you have it).
Quote
I am on a shared server, will this make any difference???
If you are on a shared server, you are not allowed to change PHP settings on your own, so you should contact your hosting support about increasing the max_execution_time setting in php.ini.


Another option is to use the "Save state" option of Sitemap Generator:
1. Execute the generator and check how long it runs before it is interrupted (for instance, 90 seconds)
2. Set the "Save state" option to 60 seconds
3. Execute the generator again
4. After it stops, open the "Crawling" page again and re-execute the generator with the "Resume generation" checkbox enabled. It will then load the crawling state from the file and continue execution.
You might need to repeat step 4 multiple times until the full sitemap is created.
Re: Question about crawling
« Reply #8 on: September 12, 2006, 02:02:52 PM »
Ok,

I've tried that and after a while of loading (which was longer than normal, I might add) I was presented with this screen:

[screenshot attachment not shown]

Any ideas?

Thanks
Phill
Re: Question about crawling
« Reply #9 on: September 13, 2006, 02:08:04 AM »
Hello,

this error means that the script times out, so you should use one of my suggestions posted above in this thread.
Re: Question about crawling
« Reply #10 on: September 13, 2006, 03:53:04 PM »
Ok,

I've spoken to my host and uploaded a php.ini file to change the default max_execution_time from 30 to 90 seconds.

When I attempted to crawl, I used the second method in your reply and was presented with a resume-from-save option.

So at least I can now see progress.  I ran the crawl again and waited; after a while it looked as though nothing was happening, so I clicked back just to check where the resume option would carry on from.

It had progressed.  This confirmed my suspicions that it was working in the background and that, due to the sheer size of my site, patience was the game.

NOW...

I clicked the crawl button and checked the resume-from option, then clicked RUN

I was presented with the following:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 155 bytes) in /websites/LinuxPackage03/fa/nc/yd/fancydressretail.co.uk/public_html/generator/pages/class.grab.inc.php(2) : eval()'d code(1) : eval()'d code(1) : eval()'d code on line 315

Any ideas how i can resolve this???

A bit long winded I know, but I didn't want to leave anything out.

Thanks
Phill
Re: Question about crawling
« Reply #11 on: September 14, 2006, 03:26:54 AM »
Hello Phill,

unfortunately, this is another configuration issue that can affect larger sites - the memory limit applied to PHP scripts has been exceeded. This can be resolved with another modification in php.ini, by increasing the memory_limit setting.
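The error message above shows the current cap (16777216 bytes = 16M). A php.ini change might look like this; 64M is an illustrative value, and the right figure depends on the site size and what the host allows:

```ini
; php.ini - raise the per-script memory cap from the 16M shown in the error
memory_limit = 64M
```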
Re: Question about crawling
« Reply #12 on: September 18, 2006, 11:33:51 AM »
Right, I've increased the size and I am still presented with the following issue when I continue my sitemap crawl:

The page shows me how long to go, how long it's been going, what page, etc etc...

But it sits on the same URL for however long I am willing to let this script run.

When I click from page to page and resume the sitemap, it tells me that it has generated 1557 pages.

This is only 1 page more than it originally created in the previous crawl???

This just doesn't seem to be working for me, and it is getting increasingly frustrating as each day goes on!!!

Can you please provide me with a workable solution ASAP.

Many Thanks
Phillip Lloyd
« Last Edit: September 18, 2006, 02:18:00 PM by enquiries2 »
Re: Question about crawling
« Reply #13 on: September 18, 2006, 11:36:39 PM »
Hello Phillip,

please let me know your generator URL/login in a private message.