 

Excluding folders and robots.txt

Started by merrick777, July 09, 2009, 08:01:56 AM


merrick777

Very confused... I crawled without having a robots.txt on my host AND without entering anything in the 'Exclude URLs' area - result: 1788 pages indexed.

Then I crawled with robots.txt excluding some folders - result: 1788 pages indexed.

Then I crawled with robots.txt AND folders entered in the 'Exclude URLs' area - result: 1788 pages indexed. Format for entering folders in the 'Exclude' area is:
folder1/
folder2/

How can I make sure my sensitive stuff is excluded (buyer histories, personal buyer data, etc)?
Thx
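A quick way to check what a robots.txt actually blocks is Python's standard `urllib.robotparser`. The sketch below assumes a robots.txt that disallows the two example folders from the post. The domain `example.com` and the sample page paths are placeholders. Keep in mind that robots.txt only asks well-behaved crawlers to stay out. It does not protect private data, which should sit behind a login.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mirroring the folders listed in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
"""

def is_allowed(url: str, agent: str = "*") -> bool:
    """Return True if the robots.txt rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/folder1/orders.html",
                "https://example.com/folder2/"):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Disallow rules are prefix matches on the URL path, so everything under `/folder1/` is blocked while pages elsewhere stay allowed.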


merrick777

Hello Oleg,
No - they are not in the sitemap, but I'm trying to understand the logic... I crawled BEFORE uploading robots.txt to my server and BEFORE entering anything in the 'Exclude URLs' area of 'Configuration'... and still got the same 1,788 pages crawled. While it's true that the sitemap has been generated the way I want (with these pages excluded), my question is: How?? (The only reason I care is that I want to understand the script's logic so I can be sure I am doing everything properly.)

Also, I cannot seem to find an explanation of ROR.XML - what is this and how/why/under what circumstances do I use it?
Thanks so much


XML-Sitemaps Support

Hello,

So that means those pages were not in the sitemap before either. Perhaps there is no way to reach them starting from the homepage, so the sitemap generator's crawler cannot find them.
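To see why unlinked pages never reach the sitemap, here is a minimal breadth-first crawl over a made-up link map. It is not the generator's actual code. It only assumes, as the reply suggests, that the crawler discovers pages solely by following links from the homepage. All page paths are hypothetical.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget", "/"],
    "/about": ["/"],
    "/products/widget": ["/products"],
    # Exists on the server, but no page links to it.
    "/account/history": ["/"],
}

def crawl(start: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl: return every page reachable from `start` via links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

if __name__ == "__main__":
    found = crawl("/", LINKS)
    print(sorted(found))
    print("never found:", sorted(set(LINKS) - found))
```

`/account/history` links out to the homepage, but nothing links in to it, so a link-following crawler never visits it, with or without robots.txt.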

You can find details on ROR sitemaps here: [ External links are visible to logged in users only ]

merrick777

I think you may be right... there isn't a way to reach them from the homepage. However, my blog is reachable from home (and every) page, but it was not crawled. If you wouldn't mind, please have a look and let me know: www.h eal thyleg [ External links are visible to forum administrators only ]
and add '/blog' to go straight to the blog. (I put the spaces in the URL so the bulletin board system won't mess it up, and because I don't want search engines crawling this discussion and pointing it to my site.)
Thx

XML-Sitemaps Support

Hello,

Your blog is redirected to [ External links are visible to logged in users only ] (without www), while the main site is on [ External links are visible to logged in users only ] (with www). You should put both on the same host form: either both with www or both without.
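A simplified sketch of why the blog was skipped, assuming the crawler only follows links whose host name exactly matches the host of the starting URL. That is an assumption about this tool, though it is a common crawler rule. `example.com` stands in for the real domain.

```python
from urllib.parse import urlsplit

def same_host(url: str, start_url: str) -> bool:
    """Return True if `url` has exactly the same host name as `start_url`."""
    # .hostname is lower-cased and excludes any port, so only the name is compared.
    return urlsplit(url).hostname == urlsplit(start_url).hostname

START = "http://www.example.com/"  # placeholder for the main site (with www)

if __name__ == "__main__":
    for link in ("http://www.example.com/blog/",  # blog on the www host
                 "http://example.com/blog/"):     # blog redirected to the bare host
        print(link, "followed" if same_host(link, START) else "skipped")
```

Under this rule, `www.example.com` and `example.com` count as different sites, so moving the blog onto the same form as the main site lets the crawler follow it.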

merrick777

This topic of whether or not to use the www is so confusing to me. Is this something that I can fix by setting a 301 redirect from [ External links are visible to forum administrators only ] to [ External links are visible to forum administrators only ] (and I guess it should be without using masking)? Or is it something that I need to have my designer change in the CSS?  If I go to my site WITH the www, and I lay my mouse over 'blog', it shows that I will be going to [ External links are visible to forum administrators only ] BUT if I go to my site without the www, that same link shows without the www.
Once I fix this, xml-sitemap will crawl the blog also?
Thanks again Oleg - sorry to take so much of your time, but you've been a fantastic help!

merrick777

Oleg, you were on the money! I went into my WordPress admin panel for the blog... changed the blog location to include the www, re-ran the crawl, and I see the blog now included! Nice!