XML Sitemaps Generator

Sitemap Generator Forum
Author Topic: Crawl Issue-Can't get Sitemap Generator to finish  (Read 3964 times)
elliesox
Registered Customer
Newbie
*
Posts: 7


View Profile
« on: March 22, 2008, 01:14:20 PM »


Hi,
I've been running the standalone sitemap generator for over 6 hours.
This is the current picture:
Links depth: 9
Current page: index.php?main_page=shopping_cart&manufacturers_id=3&sort=20a&products_id=40&action=notify&zenid=a6b7e9f56537ab7ed59f4fa314a8cc69
Pages added to sitemap: 3527
Pages scanned: 16320 (268,577.3 Kb)
Pages left: 6111 (+ 2880 queued for the next depth level)
Time passed: 377:03
Time left: 141:11
Memory usage: 14,574.4 Kb
 
My site is not very large.
I only have 110 items listed in my online store at this time.
The website url is: elliesox.com

Am I doing anything wrong?
Config, files/folders it's crawling/location?
Thanks
 
Logged
admin
Administrator
Hero Member
*****
Posts: 2837


View Profile
« Reply #1 on: March 22, 2008, 04:10:48 PM »

Hello,

It looks like your shopping cart software generates a lot of "noise" pages that should not normally be indexed (sorting pages, etc.). Please let me know your generator URL and I will check the crawler exclusion list to resolve the issue.
Logged

elliesox
Registered Customer
Newbie
*
Posts: 7


View Profile
« Reply #2 on: March 22, 2008, 04:32:18 PM »

Hi,

The URL is: [external links are visible to admins only]

But now when I go to the URL, the site returns a 404 error and I'm not getting the Sitemap menu.
Should I delete the generator files and reupload them, and then email you to take a look at the configs?

Thanks
Logged
elliesox
Registered Customer
Newbie
*
Posts: 7


View Profile
« Reply #3 on: March 23, 2008, 02:16:09 PM »

Hi,
I ended up deleting and reuploading the generator folder to my server at the URL, and now I can access the sitemap generator again.
I have not run it again; I'll wait until you can look at the config.
I did save the error_log and crawl_dump.log files and can email them to you if you'd like to see them.
The crawl dump log is very large: 9.49MB.

Let me know.

Thanks
Logged
twoeyesofblue
Registered Customer
Newbie
*
Posts: 8


View Profile
« Reply #4 on: March 23, 2008, 04:31:29 PM »

I too am having the same problem here with a shopping cart and trying to exclude things like urls ending this way:
/Compressors/?page=1&sort=2a
/Compressors/?page=1&sort=3a
/product_reviews.php/products_id/6437?osCsid=caab1a59ead8d2536ca6c11f0f5d3a41
/shop/product_reviews.php/cPath/295_559_563/products_id/2747
/shop/ACCESSORIES/WARN+PRODUCTS/Warn+Winch+Accessories/?page=1&sort=4d

I have tried the excludes, but they do not seem to work.
You will notice it is even adding the session ID, even though dropping it is a default option.
My sitemap this morning was over 7MB and the crawl ran all night...

How can we stop all this?

As I understand it, the exclusions and extension exclusions all go on one line with a space between them (no carriage returns), right?
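
For readers following along, here is a rough Python sketch (not the generator's own code) of what dropping session IDs and sort parameters from a URL amounts to. The parameter names `zenid`, `osCsid` and `sort` are taken from the URLs quoted in this thread; the helper name `strip_noise_params` is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameter names seen in this thread; adjust for your own cart.
NOISE_PARAMS = {"zenid", "osCsid", "sort"}


def strip_noise_params(url, noise=NOISE_PARAMS):
    """Return url with the query parameters named in `noise` removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in noise]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_noise_params("/Compressors/?page=1&sort=2a"))
    print(strip_noise_params(
        "/product_reviews.php/products_id/6437"
        "?osCsid=caab1a59ead8d2536ca6c11f0f5d3a41"))
```

With the session ID and sort parameters removed, many of the "different" URLs above collapse into the same page address, which is why they inflate the crawl so much when they are left in.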
Logged
admin
Administrator
Hero Member
*****
Posts: 2837


View Profile
« Reply #5 on: March 23, 2008, 07:58:38 PM »

Recommended settings for X-Cart websites:
Do Not parse URLs option:
Code:
js=
sort=
action=
write_review
product_reviews
reviews_write
printable=
language=
manufacturers_id=
bestseller=
sort/
action/
js/
printable/
language/
redirect.php
price_match.php

Exclude URLs option:
Code:
redirect.php
js=
sort=
action=
write_review
reviews_write
printable=
manufacturers_id=
bestseller=
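
As a rough illustration of how a list like this takes effect, the Python sketch below assumes each entry is treated as a plain substring of the URL, which is how the entries above read. The generator's real matching logic is not shown in this thread, and `is_excluded` is a hypothetical helper.

```python
# Entries copied from the "Exclude URLs" list above.
EXCLUDE_PATTERNS = [
    "redirect.php", "js=", "sort=", "action=", "write_review",
    "reviews_write", "printable=", "manufacturers_id=", "bestseller=",
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Assumed rule: a URL is excluded if it contains any pattern as a substring."""
    return any(pattern in url for pattern in patterns)


if __name__ == "__main__":
    samples = [
        "index.php?main_page=shopping_cart&manufacturers_id=3&sort=20a",
        "/Compressors/?page=1&sort=2a",
        "/shop/product_reviews.php/cPath/295_559_563/products_id/2747",
    ]
    for url in samples:
        print(is_excluded(url), url)
```

Under that substring assumption the first two sample URLs are excluded and the third is not, because `product_reviews` appears only in the "Do Not parse URLs" list.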
Logged

twoeyesofblue
Registered Customer
Newbie
*
Posts: 8


View Profile
« Reply #6 on: March 23, 2008, 08:38:53 PM »

I run osCommerce and my cart is in the /shop directory. Should any of the above be preceded with this path?
shop/
Logged
elliesox
Registered Customer
Newbie
*
Posts: 7


View Profile
« Reply #7 on: March 24, 2008, 02:22:05 AM »

Hi,
The new config has corrected the problem.
Now that the sitemap has been generated, I have run into another problem.
I have 5 broken links, but cannot find the files they are referred from.
All 5 "referred from" entries are similar.
An example of one is:
index.php?main_page=tell_a_friend&products_id=21&zenid=d6cf8ed5e8c06a327c9c4a5410f5f971

Any suggestions to correct?

Thanks
Logged
elliesox
Registered Customer
Newbie
*
Posts: 7


View Profile
« Reply #8 on: March 24, 2008, 02:48:48 AM »

I submitted it to Google to see what it would come up with and Google came up with 10 "Paths don't match" errors

Let me know if I should PM you the URL(s)

Thanks,
Robert
Logged
twoeyesofblue
Registered Customer
Newbie
*
Posts: 8


View Profile
« Reply #9 on: March 24, 2008, 03:00:08 PM »

The X-Cart suggestions you included, along with a couple more, have solved ALL my problems with the osCommerce cart. It was a pleasure to see everything run from cron last night with no errors and NO MORE TRASH output.

Many thanks for your help. May I suggest making your cart suggestions part of a sticky F.A.Q. for "Shopping Cart Operators", or including them in your doc files?

Thanks,

TwoEyesOfBlue
Logged
admin
Administrator
Hero Member
*****
Posts: 2837


View Profile
« Reply #10 on: March 25, 2008, 01:04:05 AM »

Replied to your PM, Robert.
Logged

admin
Administrator
Hero Member
*****
Posts: 2837


View Profile
« Reply #11 on: March 25, 2008, 01:04:41 AM »

I'm glad that worked for you. Thank you for the suggestion, that is a good idea.

Logged

twoeyesofblue
Registered Customer
Newbie
*
Posts: 8


View Profile
« Reply #12 on: March 25, 2008, 03:53:31 PM »

Well, I guess I spoke a little too soon: I thought that since I received no errors from cron, it had run OK. Later yesterday I realized I had pulled a good one: I had set up the cron job in cPanel but did not activate it (a good reason I got no errors, I guess). However, all sitemaps and the Google Base output were created this morning just fine and completed with no errors. The HTML sitemap did not update, though, and cron sent hundreds of these repeating lines:

Warning: fwrite(): supplied argument is not a valid stream resource in /home/twoeyesofblue/public_html/generator/pages/class.xml-creator.inc.php(2) : eval()'d code on line 169

Warning: fwrite(): supplied argument is not a valid stream resource in /home/twoeyesofblue/public_html/generator/pages/class.xml-creator.inc.php(2) : eval()'d code on line 171

Warning: fwrite(): supplied argument is not a valid stream resource in /home/twoeyesofblue/public_html/generator/pages/class.xml-creator.inc.php(2) : eval()'d code on line 169

Warning: fwrite(): supplied argument is not a valid stream resource in /home/twoeyesofblue/public_html/generator/pages/class.xml-creator.inc.php(2) : eval()'d code on line 171


I have checked all files inside the /generator/pages directory and all have the same 0644 file permissions.
All site map files are 0666 and all updated fine except sitemap.html

Any ideas?

I ran it manually yesterday and saw no errors. I started with an empty sitemap.html and it wrote to it fine. It appears the problem is in updating it.

TwoEyesOfBlue
Logged
admin
Administrator
Hero Member
*****
Posts: 2837


View Profile
« Reply #13 on: March 26, 2008, 02:52:02 AM »

Hello,

If you have any files inside the generator/data/ folder, set their permissions to 0666.
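
A minimal Python sketch of that step, for anyone who prefers a script over an FTP client or shell: it tries to set 0666 on every regular file in a folder and reports the ones it could not change. The helper name `make_writable` is invented here, and the `generator/data` path is the folder named above, relative to wherever the script is run.

```python
import os
import stat


def make_writable(folder, mode=0o666):
    """Try to set `mode` on every regular file in `folder`.

    Returns the names of files that could not be changed, for example
    files owned by another account such as the web server user.
    """
    failed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        if stat.S_IMODE(os.stat(path).st_mode) == mode:
            continue  # already has the requested permissions
        try:
            os.chmod(path, mode)
        except PermissionError:
            failed.append(name)
    return failed


if __name__ == "__main__":
    if os.path.isdir("generator/data"):
        print("could not change:", make_writable("generator/data"))
```

Any file listed as "could not change" is most likely owned by a different user than the one running the script, and has to be fixed by that user or by the host.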
Logged

twoeyesofblue
Registered Customer
Newbie
*
Posts: 8


View Profile
« Reply #14 on: March 26, 2008, 12:11:32 PM »

Well, I can't change them all. The ones that were generated by a manual run from a login are owned by me, the user. All those generated by cron are owned by nobody (Apache) and they are at 0644.

Here is something else. I have sitemap.html through sitemap25.html defined and properly chmodded in the web root. However, I see sitemap.html through sitemap26.html in the /generator/pages directory. The cron run did not write ANYTHING to any of the files in the web root. I went ahead and made a sitemap26.html for the web root and chmodded it correctly. I am going to do a manual run and see what happens. I am starting to suspect the run from cron is failing, while the run that worked the other day, on the first install, was a manual run. I also see that the files in the /generator/pages directory are full of the trash from the shopping cart that we are now blocking with your suggestions. I will let you know what happens on a manual run.
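
The comparison described above (which sitemapN.html files exist in one directory but not the other) can be sketched in Python as follows. This is only an illustration: the function names are made up, and the two directory paths in the usage part are placeholders for the real /generator/pages and web root locations.

```python
import os
import re

# Matches sitemap.html, sitemap1.html, sitemap26.html, and so on.
SITEMAP_RE = re.compile(r"^sitemap\d*\.html$")


def sitemap_pages(folder):
    """Return the set of sitemap.html / sitemapN.html names found in `folder`."""
    return {name for name in os.listdir(folder) if SITEMAP_RE.match(name)}


def missing_in_webroot(pages_dir, webroot):
    """Names present in pages_dir but absent from webroot, sorted."""
    return sorted(sitemap_pages(pages_dir) - sitemap_pages(webroot))


if __name__ == "__main__":
    pages_dir = "public_html/generator/pages"  # placeholder path
    webroot = "public_html"                    # placeholder path
    if os.path.isdir(pages_dir) and os.path.isdir(webroot):
        for name in missing_in_webroot(pages_dir, webroot):
            print("missing in web root:", name)
```

Each name it prints is a page that would need to be created in the web root and made writable before the next run.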
Logged