Crawling freezes, no error.
« on: November 30, 2008, 08:02:11 AM »
I have my own server. I have already used your script to index a 225,000 page site in 50 hours without any problems.

On the same server is another site with about 300,000 pages. I am using the exact same settings I used for the first site. However, after about 116,000 pages and 24 hours it just stops.

It's not a memory issue. There is no php error in the php log. Your script just seems to "break".

When I click on the Crawling page, it does not load completely. After:

Run in background
Do not interrupt the script even after closing the browser window until the crawling is complete

There is nothing else. The second checkbox option and the submit button are missing.

This has happened twice now. The first time I blew it off, cleared the data directory, and started over.

But this time it happened again, and it looks like it happened at the same point during the crawl.


CT

Re: Crawling freezes, no error.
« Reply #1 on: November 30, 2008, 08:53:06 PM »
I've attached a cropped screenshot of the problem. It shows how the Crawling page is rendering and how it is "broken".


CT
Re: Crawling freezes, no error.
« Reply #3 on: November 30, 2008, 11:28:21 PM »
I sent a message, but I am not sure if you got it. There was no confirmation of send (or error, for that matter) and there was nothing in the outbox. Let me know if you didn't get it.
Re: Crawling freezes, no error.
« Reply #4 on: December 01, 2008, 07:39:59 PM »
Yes, I got it and replied to you. (The message is not stored in the Outbox unless you enable the "save message" checkbox.)
Re: Crawling freezes, no error.
« Reply #5 on: December 01, 2008, 09:41:23 PM »
Ok, your resume link worked, for a bit. Now there is a new problem.

The script has gotten to this point:

Links depth: 6
Current page: journal/journal_section.php?section=spells&journal=ManzanaOscura
Pages added to sitemap: 133957
Pages scanned: 201080 (5,876,606.7 KB)
Pages left: 56908 (+ 209494 queued for the next depth level)
Time passed: 1903:11
Time left: 538:37
Memory usage: 224,822.0 Kb
Resuming the last session (last updated: 2008-12-01 18:21:14)

However, it keeps quitting without an error. I've tried resuming 5 times and the script never progresses past this point. The Crawling page is no longer broken, I can resume, but it doesn't resume past this spot - it just quits.
Re: Crawling freezes, no error.
« Reply #6 on: December 02, 2008, 09:51:35 PM »
Hello,

It looks like your server configuration doesn't allow the script to run long enough to create the full sitemap. Please try increasing the memory_limit and max_execution_time settings in the PHP configuration at your host (php.ini file), or contact hosting support about this.
Re: Crawling freezes, no error.
« Reply #7 on: December 02, 2008, 10:36:39 PM »
I have my own server.

I have already generated a sitemap with your script for another site on this server. The previous sitemap (which worked) was bigger than this one, ran longer than this one, and used more memory than this one.

So I know the issue is not my server or memory or execution time settings.
Re: Crawling freezes, no error.
« Reply #8 on: December 02, 2008, 10:45:29 PM »
I solved the issue on my own.

"Save the script state, every X seconds" was set to 30 seconds. When I changed it to 180 seconds, the script continued to process without quitting.

You might want to consider adding some code in a future version that won't attempt a save unless the script is "ready". It looks like the auto-save was the cause of the constant script interruption.
Re: Crawling freezes, no error.
« Reply #9 on: December 02, 2008, 11:47:04 PM »
I spoke too soon. The issue is not resolved, but I did narrow it down.

After I changed "Save the script state" to 180 seconds, the script ran for 3 minutes, scanned and indexed more pages, then quit.

I changed "Save the script state" to 600 seconds. The script ran for 5 minutes, scanned and indexed more pages, then quit.

The file crawl_dump.log in the data directory has a timestamp of Dec 1 18:21. This is exactly when this problem arose.

I can only conclude that the source of the problem is the backup routine. The script tries to back itself up, it fails, then the script quits.
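
One way to check that conclusion is to compare the modification time of the dump file against the configured save interval: if the crawl is running but the file has not been touched for longer than the interval, the save is probably failing. Below is a minimal Python sketch of that check. It is not part of the generator; the data/crawl_dump.log path comes from this thread, while the function name dump_is_stale and its parameters are made up for illustration.

```python
import os
import time


def dump_is_stale(path, save_interval_s, now=None):
    """Return True if `path` was last modified more than `save_interval_s`
    seconds ago, or if the file cannot be read at all."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is treated as stale.
        return True
    return (now - mtime) > save_interval_s


if __name__ == "__main__":
    # Path taken from the thread; 600 matches the interval tried above.
    print(dump_is_stale("data/crawl_dump.log", 600))
```

Run it from the generator's folder while a crawl is in progress. A result of True several minutes into a run would be consistent with the save routine not writing the file, though it does not say why.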
Re: Crawling freezes, no error.
« Reply #10 on: December 04, 2008, 09:24:34 PM »
Please try setting 0666 permissions on the crawl_dump.log file or (if possible) remove all files from the data/ folder and start the generator from scratch.
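
For anyone who prefers to script the permission change, here is a minimal Python sketch of the 0666 suggestion. It assumes a POSIX-style server and that you own the file; the helper names make_world_writable and can_write are made up for illustration and are not part of the generator.

```python
import os
import stat


def make_world_writable(path):
    """Set rw-rw-rw- (0666) on `path` and return the resulting permission bits."""
    os.chmod(path, 0o666)
    return stat.S_IMODE(os.stat(path).st_mode)


def can_write(path):
    """Report whether the user running this script may write to `path`."""
    return os.access(path, os.W_OK)


if __name__ == "__main__":
    # Path taken from the thread, relative to the generator's folder.
    dump = os.path.join("data", "crawl_dump.log")
    print(oct(make_world_writable(dump)), can_write(dump))
```

Note that can_write reports access for whoever runs the script, which may not be the same user the web server runs PHP as, so treat it as a rough check only.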