XML Sitemaps Generator

Sitemap Generator Forum
July 20, 2008, 08:00:12 AM
Author Topic: Disallowing urls on dynamic site  (Read 13800 times)
paul1
Registered Customer
Newbie
Posts: 5
« on: January 21, 2006, 02:03:05 PM »

My dynamic (database-driven) site has a number of URLs that I want to disallow whenever they contain segments like the following.

like: ?PHPSESSID=

as in the following
[external links are visible to admins only]
[external links are visible to admins only]
[external links are visible to admins only]
[external links are visible to admins only]
[external links are visible to admins only]

and like: all/?orderby=

as in the following:
[external links are visible to admins only]
[external links are visible to admins only]
[external links are visible to admins only]
[external links are visible to admins only]
[external links are visible to admins only]
[external links are visible to admins only]

How do I disallow these "pages"? I am tempted to enter strings like these into the disallow field:
?PHPSESSID=
all/?orderby=
but I am not sure this will work. Please advise.
admin
Administrator
Hero Member
Posts: 2837
« Reply #1 on: January 21, 2006, 02:16:23 PM »

Hello,

Yes, that should work. Add these strings to the "Exclude URLs" input field in the generator configuration.
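
As a rough illustration of how such an exclusion list behaves, here is a minimal Python sketch. It assumes the "Exclude URLs" field does plain substring matching, which this thread implies but does not spell out. The example.com URLs and the `is_excluded` helper are made up for illustration and are not part of the generator.

```python
# Hypothetical sketch: substring-based URL exclusion (assumed behaviour,
# not the generator's actual code).
EXCLUDE_PATTERNS = ["?PHPSESSID=", "all/?orderby="]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL contains any of the exclusion substrings."""
    return any(pattern in url for pattern in patterns)


# Made-up URLs standing in for the hidden links in the question above.
urls = [
    "http://example.com/index.php",
    "http://example.com/index.php?PHPSESSID=abc123",
    "http://example.com/products/all/?orderby=price",
    "http://example.com/products/widgets/",
]

# Only URLs that match none of the patterns would end up in the sitemap.
kept = [u for u in urls if not is_excluded(u)]
print(kept)
```

Under that assumption, the session-ID and orderby variants are dropped and only the two plain URLs remain.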

paul1
« Reply #2 on: January 21, 2006, 02:50:33 PM »

Thanks. I tried it and the number of pages indexed was 1,004, which is fewer than in the previous run. So I have to believe it worked.

It looks good in /sitemap.xml; however, I am unable to find /sitemap_generator/data/urllist.txt or /sitemap_generator/data/sitemap1.html. I just converted to 2.0. Is there some setting I'm missing to create these files?
admin
« Reply #3 on: January 21, 2006, 03:14:59 PM »

Perhaps you have disabled the "Create Text Sitemap" and "Create HTML Sitemap" options on the configuration page?

paul1
« Reply #4 on: January 21, 2006, 03:23:54 PM »

Ooooops! I thought I had checked those. Looking back, I see they were unchecked. Thanks.

One more question. I also have a robots.txt file. Will this conflict with the sitemaps as far as spiders are concerned? With sitemaps in place is a robots.txt file redundant and no longer necessary?
admin
« Reply #5 on: January 21, 2006, 05:17:12 PM »

Hi,

A robots.txt file is needed to exclude some of your pages from being indexed by search engines. So, if you already have one on your site, you should leave it there.
The sitemap generator script checks robots.txt and doesn't include the disallowed URLs in the sitemap.
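
To make the idea concrete, here is a small Python sketch of filtering a URL list against robots.txt rules before writing a sitemap. It uses Python's standard `urllib.robotparser`, not the generator's own logic (which is not shown in this thread), and the robots.txt content and example.com URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
"""

# Parse the rules from memory instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs found while crawling the site.
candidate_urls = [
    "http://example.com/",
    "http://example.com/admin/login.php",
    "http://example.com/cart/view.php",
    "http://example.com/products/widgets/",
]

# Keep only the URLs that robots.txt allows for any crawler ("*").
sitemap_urls = [u for u in candidate_urls if parser.can_fetch("*", u)]
print(sitemap_urls)
```

Only the home page and the widgets page pass the filter here; the /admin/ and /cart/ URLs are left out of the sitemap.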

paul1
« Reply #6 on: January 21, 2006, 05:44:27 PM »

Re: "The sitemap generator script checks robots.txt and doesn't include the disallowed URLs in the sitemap."

Aha! So, since I have already disallowed URLs in the sitemap generator, does it hurt to disallow them in robots.txt as well?

What's the best strategy/practice here for Google, Yahoo and other search engines? Should I keep robots.txt and NOT disallow in the sitemap generator the URLs that I have already disallowed in robots.txt? Or disallow the URLs in both places?

Would disallowing ?PHPSESSID=, etc. work in robots.txt?
admin
« Reply #7 on: January 21, 2006, 10:59:23 PM »

Hi!

You should keep your robots.txt file in place AND exclude the same pages in the sitemap generator. :)

Generally, "?PHPSESSID="-type links are not excluded by the robots.txt protocol.
More details on robots.txt: http://www.robotstxt.org/wc/robots.html
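
The reason is that the original robots.txt standard matches each Disallow value against the start of the URL path, not anywhere inside it. The Python sketch below is a simplified model of that prefix rule, not any particular crawler's implementation. The `is_disallowed` helper and the sample paths are hypothetical, and individual crawlers may support extensions beyond this.

```python
def is_disallowed(url_path, disallow_rules):
    """Simplified original robots.txt semantics: a rule blocks a URL only
    when the URL's path (with its query string) starts with the rule text.
    An empty rule blocks nothing."""
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# A rule written as a bare substring never matches, because paths start with "/".
print(is_disallowed("/index.php?PHPSESSID=abc123", ["?PHPSESSID="]))  # False

# A rule written as a path prefix does match.
print(is_disallowed("/all/?orderby=price", ["/all/?orderby="]))  # True
```

So a session ID that can appear after any page name cannot be caught by a single prefix rule, which is why excluding it in the sitemap generator is still worthwhile.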

paul1
« Reply #8 on: January 24, 2006, 12:27:28 PM »

Thanks admin.