Unfinished sitemaps do not resume after restart.
« on: November 25, 2012, 11:37:53 AM »
I have been running the site map generator as a cron job for some time now.

As you know, it fails more often than not. We have tried everything, and I am sending you the reports that are generated after each cron job.

The code I am using is

[ External links are visible to forum administrators only ]

however each time it fails I get the report that looks like this:

2012-11-25 01:18:22 (1.02 KB/s) - `index.php?op=crawlproc&resume=1.175' saved [459411]

To me that means I have 175 failed sets of files. When I go back and manually restart the failed run, parts of previous sitemaps get added to the new one, and the information is often incorrect. I get files that are half finished with no EOF, or the style sheet is missing, and Google can't parse them.

I have other cron jobs on my server running backups of WordPress files in another directory, and they work 99 percent of the time.

I have tried just about every tweak I can think of to make this program work. I think the program has possibilities but it is costing me ranking. I have lost about half my ranking over the last year.

I am pleading with you for some answers.
Re: Unfinished sitemaps do not resume after restart.
« Reply #2 on: November 26, 2012, 10:13:53 AM »
I had it down to 10,000 files per site map. It still didn't work.
Re: Unfinished sitemaps do not resume after restart.
« Reply #3 on: November 27, 2012, 09:29:06 AM »
Decreasing it even more might help. Generally, the issue should be resolved by increasing the max_execution_time and memory_limit settings.
If the generator process is interrupted, you need to check the server logs to find out why it was stopped.
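If the generator is run from the command line (the runcrawl.php script mentioned later in this thread), max_execution_time and memory_limit can be set for a single run with PHP's -d option instead of editing php.ini. A minimal sketch only: the path is the placeholder used in this thread, the function name run_crawl is made up, and 256M is an illustrative figure, not a recommendation.

```shell
#!/bin/sh
# Hypothetical wrapper: run the generator's CLI script with per-run PHP limits.
run_crawl() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "generator script not found: $script" >&2
        return 1
    fi
    # -d sets a php.ini value for this one invocation only.
    # max_execution_time=0 means no time limit; 256M is an assumed example value.
    php -d max_execution_time=0 -d memory_limit=256M "$script"
}
```

It would be called as run_crawl /path/to/generator/runcrawl.php. PHP's command-line build already defaults max_execution_time to 0, so for a command-line run the memory limit is usually the setting that matters.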
Re: Unfinished sitemaps do not resume after restart.
« Reply #4 on: November 27, 2012, 04:51:15 PM »
I have the max time at 90000.
How do I check server logs? GoDaddy is my host.
Re: Unfinished sitemaps do not resume after restart.
« Reply #6 on: November 29, 2012, 05:02:56 AM »
Wait a minute. I asked a question. Why is it incrementing to `index.php?op=crawlproc&resume=1.175'?
I didn't ask you why it is failing. The cron job should start at "1" each time and, if it fails, resume the unfinished job, not start a completely new one.
I don't see how checking the server logs is going to be beneficial to this aspect of the program.
However, if you're telling me that it is failing often, yes, I agree. I have been sending you the results of the cron job for you to check, but I have not gotten a reply from you about that at all.
Re: Unfinished sitemaps do not resume after restart.
« Reply #8 on: November 30, 2012, 02:33:42 PM »
--2012-11-30 01:10:02--  [ External links are visible to forum administrators only ]
Resolving [ External links are visible to forum administrators only ]... 184.168.182.1
Connecting to [ External links are visible to forum administrators only ]|184.168.182.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.php?op=crawlproc&resume=1.180'

     0K .......... .......... .......... .......... .......... 23.6M
    50K .......... .......... .......... .......... ..........  554
   100K .......... .......... .......... .......... ..........  563
   150K .......... .......... .......... .......... ..........  526
   200K .......... .......... .......... .......... .......... 7.87M
   250K .......... .......... .......... .......... ..........  555
   300K .......... .......... .                                 273 =7m34s

2012-11-30 01:18:49 (725 B/s) - `index.php?op=crawlproc&resume=1.180' saved [329278]

That is the email report that I get. You tell me where the 1.180 is coming from. It's your program.
Re: Unfinished sitemaps do not resume after restart.
« Reply #9 on: November 30, 2012, 04:01:56 PM »
It looks like you are opening the index.php?op=crawlproc URL with a command-line tool, whereas you should run the generator directly from the command line, like this:
php /path/to/generator/runcrawl.php
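
As for the .175 and .180 suffixes in the emailed reports: they look like wget's file naming rather than anything the generator counts. When wget saves a download and a file with that name already exists in the directory, it keeps the old file and writes the new one as name.1, name.2, and so on. So `index.php?op=crawlproc&resume=1.180' would most likely mean that earlier copies of `index.php?op=crawlproc&resume=1' have piled up in the cron job's working directory. A rough sketch of that naming rule (the function name next_download_name is made up for illustration; it mimics the rule and is not wget's own code):

```shell
#!/bin/sh
# Approximation of wget's default naming: if NAME exists, try NAME.1,
# NAME.2, ... and print the first name that is not taken yet.
next_download_name() {
    name=$1
    if [ ! -e "$name" ]; then
        printf '%s\n' "$name"
        return
    fi
    n=1
    while [ -e "$name.$n" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "$name.$n"
}
```

Running the PHP script directly, as above, does not go through wget, so no such numbered files are created.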
Re: Unfinished sitemaps do not resume after restart.
« Reply #10 on: November 30, 2012, 04:28:05 PM »
That is the way you set up the Cron Job for me when I gave you access to the account.
What do I need to change on the cron job using the GoDaddy interface?
Duplicates in sitemap
« Reply #12 on: June 14, 2013, 06:36:47 PM »
I have spent quite a bit of time making sure that the site is crawlable. I used Xenu to crawl it and point out bad links, and I eliminated them. I made sure that the pages were formatted correctly and did anything else that I could think of when the sitemap generator kept failing. During that time I even sent duplicate reports to you so you could look into them.

So here I am with results like this. I watched the generator all day. It would work for a while and then pause. It would reach a point in the site, and then when the screen refreshed, it would display a part of the site that was already crawled. It would continue on and reach a little further and then pause. When the screen refreshed it was back again to a part of the site that was already crawled.

So it finally finished, and this is the sitemap I got: [http://www.2snapsup.com/sitemap.xml.gz]. There are multiple pages that are in the index twice.

At other times, when I do get the crawl to run, it will finish and then immediately start crawling the site again, generating a sitemap that contains only a fraction of the site's pages. My biggest frustration is that I can never duplicate the problem the same way to accurately tell you what is going on so you can possibly fix it.

I wish you could tell me what to do to fix this problem. My rankings in Google have dropped about 75 percent since I started to use this program.

Thanking you in advance.

Mick
Re: Unfinished sitemaps do not resume after restart.
« Reply #13 on: June 18, 2013, 10:42:30 PM »
To avoid duplicates, you would need to make sure that the sitemaps are created in one step (i.e. allow enough resources for the sitemap generator to run without stopping).
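One way to confirm whether a finished file really contains duplicates is to list every URL that appears more than once. A small sketch, assuming a gzipped sitemap with standard <loc> entries; the function name duplicate_locs is made up for illustration, and it checks one file at a time, so a sitemap split across several files would need each part checked (or concatenated first):

```shell
#!/bin/sh
# Print each <loc> URL that occurs more than once in a gzipped sitemap.
duplicate_locs() {
    gunzip -c "$1" |
        grep -o '<loc>[^<]*</loc>' |               # one <loc>...</loc> per line
        sed -e 's/^<loc>//' -e 's#</loc>$##' |     # strip the tags
        sort |
        uniq -d                                    # keep only repeated URLs
}
```

Called as duplicate_locs sitemap.xml.gz, it prints nothing when every URL is unique.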
Re: Unfinished sitemaps do not resume after restart.
« Reply #14 on: June 19, 2013, 12:32:13 AM »
It was done in one complete session. I watched it and captured screen shots of it creating the site map files. Do you want me to post them?