Do not parse URLs
« on: February 26, 2007, 08:25:57 AM »
Hi,

I am not sure if this is the right section of the forum for this question. If not, please move it to the right place.

My question is about the two fields below:

Do not parse URLs: - do not fetch pages that contain these substrings in URL (these URLs will still be added to sitemap!)
Exclude URLs: - do not include URLs that contain these substrings, one string per line

I have hundreds of URLs like the ones below:

introduce_yourself/1249-the_entry_i_meanthe_introduction.html?p=2263
introduce_yourself/1250-hi_therel.html?p=2264
introduce_yourself/1251-hello_I_am_amy.html?p=2263

The URLs above point to the same pages as:
introduce_yourself/1249-the_entry_i_meanthe_introduction.html
introduce_yourself/1250-hi_therel.html
introduce_yourself/1251-hello_I_am_amy.html

So I want to exclude every URL containing .html?p= from the sitemap, but keep the plain .html pages. How do I do this?

A good example, from a sitemap currently being created for one of my sites, would be:

<url>
  <loc>MYDOMAIN/tech_forum/23902-mb_on_new_servers_your_inputs_needed.html</loc>
  <priority>0.5</priority>
  <lastmod>2007-02-25T22:29:57+00:00</lastmod>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>MYDOMAIN/tech_forum/23902-mb_on_new_servers_your_inputs_needed.html?p=73374</loc>
  <priority>0.5</priority>
  <lastmod>2007-02-25T22:29:57+00:00</lastmod>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>MYDOMAIN/general_discussion/23714-congrats_mrgovardhan_vt.html</loc>
  <priority>0.5</priority>
  <lastmod>2007-02-25T22:29:57+00:00</lastmod>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>MYDOMAIN/general_discussion/23714-congrats_mrgovardhan_vt.html?p=73344</loc>
  <priority>0.5</priority>
  <lastmod>2007-02-25T22:29:57+00:00</lastmod>
  <changefreq>weekly</changefreq>
</url>

As you can see, the same page is being added to the sitemap twice, as two different URLs. The only difference is the ?p= parameter after .html, and both URLs lead to the same place. I would like every URL containing .html?p= left out of the sitemap, with the plain .html pages kept.


Any help would be appreciated.
Thanks
Re: Do not parse URLs
« Reply #1 on: February 26, 2007, 02:31:14 PM »
Hello,

You should add the following to both the "Do not parse URLs" and "Exclude URLs" options:
Code:
html?p=
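
For anyone wondering why that one line is enough: going by the option descriptions quoted above, both fields are plain substring matches, so any URL containing html?p= is skipped while the plain .html URLs are untouched. Here is a rough Python sketch of that kind of filtering. It is only an illustration, not the generator's actual code; the function name filter_urls and the sample list are made up for the example.

```python
def filter_urls(urls, exclude_substrings):
    """Keep only the URLs that contain none of the given substrings."""
    return [
        url for url in urls
        if not any(sub in url for sub in exclude_substrings)
    ]


# Sample URLs taken from the question above.
urls = [
    "introduce_yourself/1249-the_entry_i_meanthe_introduction.html",
    "introduce_yourself/1249-the_entry_i_meanthe_introduction.html?p=2263",
    "introduce_yourself/1250-hi_therel.html",
    "introduce_yourself/1250-hi_therel.html?p=2264",
]

# One exclusion string per line in the option box corresponds to one list entry here.
kept = filter_urls(urls, ["html?p="])

for url in kept:
    print(url)
```

With the sample list, only the two plain .html URLs remain. Note that the match is a literal substring, so the ? is not treated as a wildcard.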