Disallowing urls on dynamic site
« on: January 21, 2006, 02:03:05 PM »
I have a number of URLs in my dynamic (database-driven) site that I want to disallow if they contain segments like the following.

like: ?PHPSESSID=

as in the following
[ External links are visible to forum administrators only ]

and like: all/?orderby=

as in the following:
[ External links are visible to forum administrators only ]

How do I disallow these "pages"? I am tempted to enter strings like the following into the disallow field:
?PHPSESSID=
all/?orderby=
but I am not sure this will work. Please advise.
Re: Disallowing urls on dynamic site
« Reply #1 on: January 21, 2006, 02:16:23 PM »
Hello,

Yes, that should work. Add these strings to the "Exclude URLs" input field in the generator configuration.
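
For illustration, here is a minimal sketch of how a substring-based exclude list could behave. It is an assumption about the matching logic, not the generator's actual code; the function name `filter_urls` and the example.com URLs are hypothetical.

```python
def filter_urls(urls, exclude_substrings):
    """Keep only the URLs that contain none of the exclude substrings."""
    return [
        url for url in urls
        if not any(fragment in url for fragment in exclude_substrings)
    ]


# The two strings from the question, entered as plain substrings.
exclude = ["?PHPSESSID=", "all/?orderby="]

# Hypothetical URLs, made up for this example.
urls = [
    "http://example.com/products/",
    "http://example.com/products/?PHPSESSID=abc123",
    "http://example.com/all/?orderby=price",
    "http://example.com/about.html",
]

kept = filter_urls(urls, exclude)
print(kept)
```

Under this assumption, any URL containing one of the strings anywhere in it is dropped, which would explain a lower page count after adding them.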
Re: Disallowing urls on dynamic site
« Reply #2 on: January 21, 2006, 02:50:33 PM »
Thanks. I tried it and the number of pages indexed was 1,004, which is fewer than in the previous indexing. So I have to believe it worked.

Looks good in /sitemap.xml, however I am unable to find /sitemap_generator/data/urllist.txt or /sitemap_generator/data/sitemap1.html. I just converted to 2.0. Is there some setting I'm missing to create these files?
Re: Disallowing urls on dynamic site
« Reply #3 on: January 21, 2006, 03:14:59 PM »
Perhaps you have disabled the "Create Text Sitemap" and "Create HTML Sitemap" options on the configuration page?
Re: Disallowing urls on dynamic site
« Reply #4 on: January 21, 2006, 03:23:54 PM »
Ooooops! I thought I had checked those. Looking back, I see they were unchecked. Thanks.

One more question. I also have a robots.txt file. Will this conflict with the sitemaps as far as spiders are concerned? With sitemaps in place is a robots.txt file redundant and no longer necessary?
Re: Disallowing urls on dynamic site
« Reply #5 on: January 21, 2006, 05:17:12 PM »
Hi,

A robots.txt file is needed to exclude some of your pages from being indexed by search engines. So, if you already have one on your site, you should leave it there.
The sitemap generator script checks robots.txt and does not include the disallowed URLs in the sitemap.
Re: Disallowing urls on dynamic site
« Reply #6 on: January 21, 2006, 05:44:27 PM »
Re: "The sitemap generator script checks robots.txt and does not include the disallowed URLs in the sitemap."

Aha! So since I have already disallowed URLs in the sitemap generator, does it hurt to disallow them in robots.txt as well?

What's the best strategy/practice here for Google, Yahoo and other search engines? Should I keep robots.txt and NOT exclude in the sitemap generator the URLs that I have already disallowed in robots.txt? Or disallow the URLs in both places?

Would disallowing ?PHPSESSID=, etc. work in robots.txt?
Re: Disallowing urls on dynamic site
« Reply #7 on: January 21, 2006, 10:59:23 PM »
Hi!

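You should keep your robots.txt file in place AND exclude the same pages in the sitemap generator. :)

Generally, "?PHPSESSID="-type links cannot be excluded under the standard robots.txt protocol, because a Disallow rule only matches from the start of the URL path, not a string in the middle of it.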
more details on robots.txt: http://www.robotstxt.org/wc/robots.html
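
To make the difference concrete, here is a rough sketch of prefix-only matching as described in the original robots.txt standard. It is a simplification written for this thread, not any crawler's real code: `is_disallowed` is a hypothetical helper, the example.com URLs are made up, and wildcard extensions that some search engines support are not modeled.

```python
from urllib.parse import urlsplit


def is_disallowed(url, disallow_rules):
    """Return True if the URL's path (plus query) starts with any rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # An empty Disallow value means "allow everything", so skip empty rules.
    return any(rule and target.startswith(rule) for rule in disallow_rules)


session_url = "http://example.com/products/?PHPSESSID=abc123"
other_url = "http://example.com/about.html?PHPSESSID=abc123"

# A bare "?PHPSESSID=" never matches, because every path starts with "/".
bare_rule_hit = is_disallowed(session_url, ["?PHPSESSID="])

# A full path prefix matches, but only for that one path.
prefix_rule_hit = is_disallowed(session_url, ["/products/?PHPSESSID="])
prefix_rule_other = is_disallowed(other_url, ["/products/?PHPSESSID="])

print(bare_rule_hit, prefix_rule_hit, prefix_rule_other)
```

So with prefix matching you would need one rule per path, which is why the generator's "Exclude URLs" field is the more practical place for session ID strings.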
Re: Disallowing urls on dynamic site
« Reply #8 on: January 24, 2006, 12:27:28 PM »
Thanks admin.