Generator troubles.
« on: January 11, 2006, 05:11:44 PM »
Hi,
I've had a really easy time using this tool for (I think) two months now. It's been easy to use because my site ([ External links are visible to forum administrators only ]) is definitely under 500 pages, and the tool is user-friendly and fast! But...
Last night I created subdomains for my site, and now when I try to generate a sitemap, it follows literally every "link", including paths to scripts, photo files, folders, etc. It counts everything as a "page" and I have no idea why!
What have I done wrong? Before, it followed only navigational links, and now it follows every path on my pages.

Thanks
Re: Generator troubles.
« Reply #1 on: January 11, 2006, 05:21:20 PM »
Hello,

the generator's crawler follows every link included as a normal link with an <a> tag. If it is picking up pages that are not linked this way, please post (or PM me) an example.
Thanks!
Re: Generator troubles.
« Reply #2 on: January 11, 2006, 05:30:22 PM »
Thanks for responding so quickly.
As part of a thumbnail-viewing script, I include tags like

<a href="#" onClick="return modifyimage('dynloadarea', 9)"><img border="0" src="images/4/thumbs/10.jpg" width="50" height="50" style="margin-bottom: 5px"></a>

which creates a clickable thumbnail that brings up a larger picture. It does not actually navigate. The generator used to ignore this style of <a> tag, but now it follows the path and counts the .jpg as a page. Did something change, or is my scripting off somehow?
Re: Generator troubles.
« Reply #3 on: January 11, 2006, 08:15:16 PM »
Hi,

this link should not be followed by the crawler. Please let me know the URL where this code appears so I can check it further.
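To illustrate the rule described in these replies, here is a minimal Python sketch of a crawler's link-extraction step. It is an assumption for illustration, not the generator's actual code: it reads `href` values from `<a>` tags only, skips in-page links such as `href="#"` and `javascript:` links, resolves relative URLs against the page URL, and drops `#fragments`. The names `LinkCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect crawlable URLs from <a href> attributes (hypothetical sketch)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only <a> tags are treated as links; <img src> and others are ignored.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        href = href.strip()
        # "#" / "#anchor" stay on the current page; "javascript:" just runs script.
        if href.startswith("#") or href.lower().startswith("javascript:"):
            return
        # Resolve relative links, then strip any #fragment.
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        self.links.append(url)


def extract_links(html, base_url):
    """Return the crawlable URLs found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

With this rule, the thumbnail tag from Reply #2 contributes no links: its `href` is `"#"`, and the `.jpg` path sits in an `<img src>` attribute, which the sketch never reads.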
Re: Generator troubles.
« Reply #4 on: January 11, 2006, 08:48:24 PM »
The URL for the above link is [ External links are visible to forum administrators only ]. All of my picture sample pages use this same JavaScript, and the generator follows all of them as links to pages.

Thank you.
Re: Generator troubles.
« Reply #5 on: January 12, 2006, 12:27:33 AM »
Hello,

Thanks, I've checked this.

The crawler finds your folder page [ External links are visible to logged in users only ] and then discovers all the individual page and image links from it. I suggest creating an empty index.html file in this folder to prevent the server from listing all your files. In the meantime, I've modified the crawler to avoid this issue; please check it. :)

pol
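A crawler can also guard against the folder-listing problem described in this reply by not treating plain files as pages. The sketch below is a hypothetical Python filter, not the change pol made to the crawler: it keeps a URL only if its path does not end in one of an assumed list of file extensions (`NON_PAGE_EXTENSIONS`, which you would adjust for your own site). The function names are also hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical set of file types treated as "not a page"; adjust for your site.
NON_PAGE_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".bmp", ".js", ".css", ".zip"}


def is_page_url(url):
    """Return True if the URL's path does not end in a known non-page extension."""
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()  # "" for folders like /images/4/
    return ext not in NON_PAGE_EXTENSIONS


def filter_pages(urls):
    """Keep only the URLs that look like pages."""
    return [u for u in urls if is_page_url(u)]
```

The folder URL itself still passes this filter; the empty index.html suggested above is what stops the server from exposing the file listing in the first place.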

Re: Generator troubles.
« Reply #6 on: February 28, 2006, 01:34:34 AM »
I have a similar issue. I installed the package on Windows 2003 and set all permissions as per the instructions. When I start crawling, it runs for about 5 minutes with the status bar in progress, but then it stops without any update, any error, or any entry in the log.
Re: Generator troubles.
« Reply #7 on: February 28, 2006, 01:36:05 AM »
I got a parsing error on line 5 of my XML. What do you need from me to fix that?

Also, I noticed you keep talking about using the generator, but if I upload my XML file to Google, won't that work the same?

I'm lost  :)
Re: Generator troubles.
« Reply #8 on: March 01, 2006, 05:04:33 PM »
Hello,

you should increase the "maximum script execution time" setting in your server configuration so the script can run long enough to finish crawling your whole website.


P.S. You can use the forum username/password sent to you in the automated email to post in the "Standalone Generator" subforums.

Thank you!
Re: Generator troubles.
« Reply #9 on: March 01, 2006, 05:04:54 PM »
Hello,

please let me know the URL of your sitemap file.
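Before posting a sitemap URL, you can check locally whether the file is well-formed XML and where the parser stops. This is a minimal Python sketch using the standard-library `xml.etree.ElementTree` module; `check_sitemap` is a hypothetical helper name. One possible cause of a parsing error like the "line 5" one in Reply #7 is an unescaped `&` inside a `<loc>` URL (in XML it must be written as `&amp;`), though the actual cause in this thread is not known.

```python
import xml.etree.ElementTree as ET


def check_sitemap(data):
    """Parse sitemap bytes; return None if well-formed, else (line, column, message)."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the XML parser
        return line, column, str(err)
    return None
```

For example, `check_sitemap(open("sitemap.xml", "rb").read())` returns `None` for a well-formed file, or a tuple whose first item is the line number where parsing failed. Note that this only checks XML well-formedness, not whether the file follows the sitemap schema.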