Sitemap verification
« on: August 10, 2009, 10:26:39 AM »

I have a list of sitemap files, which I generate programmatically.
These sitemap files are huge, with thousands of URLs.

It is very difficult to check each and every URL manually.

So I have written a utility which parses each sitemap file and, using Apache Commons HttpInvoker, checks whether every URL is valid.
 *    Some invalid URLs return a 404 response, so I can find the problem.

  *   But in some cases an exception occurs and an error page is shown. So this is not a valid URL, but it does not
      return a 404 response.
      The response code is 200.
      So there is no way for me to tell whether it is a valid URL or not.
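
The check described above, including the case where an error page comes back with code 200, could be sketched roughly as follows. This is only a minimal illustration in Python, not the Java utility from this post. The function names and the phrases in ERROR_MARKERS are made up for the example and would have to be replaced with text that the site's real error page shows.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical phrases; replace them with text from your own error page.
ERROR_MARKERS = ("an error occurred", "exception", "page not found")


def parse_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def classify(status, body):
    """Label a response as 'broken', 'soft-error' or 'ok'."""
    # In this sketch, any final status other than 200 counts as broken.
    if status != 200:
        return "broken"
    # Status 200, but the body looks like an error page.
    lowered = body.lower()
    if any(marker in lowered for marker in ERROR_MARKERS):
        return "soft-error"
    return "ok"


def check_url(url, timeout=10):
    """Fetch one URL (redirects are followed) and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError.
        return classify(err.code, "")
    except OSError:
        # DNS failure, refused connection, timeout and similar.
        return "broken"
    return classify(status, body)
```

Matching on page text is only a heuristic and can give false alarms on pages that legitimately contain one of the phrases. If the application can be changed, making the error page itself return a 5xx status would remove the need for it.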

I am not sure, but I have heard that the webmaster tool does the same checking, so there must be something which can help to identify the valid URLs.

(Please note I'm not talking about XML validation, I'm talking about broken URLs where the response code is other than 400.)
Any help on this is appreciated.

Thanks in advance.

Re: Sitemap verification
« Reply #1 on: August 10, 2009, 07:06:38 PM »
Hello Leena,

If you are using our sitemap generator (either the online generator or the standalone generator script), it crawls your site and automatically checks the HTTP response for every link, so broken links are not included in the sitemap.


Re: Sitemap verification
« Reply #2 on: August 12, 2009, 01:30:45 PM »
I have six broken links on my site according to the sitemap. When I test my links at my web host, all of them work. Yesterday it was only five and today it is six. I double-checked and tested them and they work as normal. It is a problem for me since I want to get the bots over them. Can anyone give me any advice? Please help. My links are cloaked, and in my cloaking system they work as normal. I also have more than one link to the same pages; could that maybe be the problem?

Thank you
Eloise :-\
Re: Sitemap verification
« Reply #3 on: August 12, 2009, 09:40:33 PM »

Please check with our HTTP headers viewer tool that those URLs return HTTP code 200 (normal):
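
The same kind of status check can also be run locally. The sketch below is only an illustration in Python and is not the headers viewer tool mentioned above. The function names and the AGENTS table are made up for the example. It requests a URL once per User-Agent string and reports the status code for each, on the assumption (not confirmed in this thread) that a cloaking setup might answer a crawler differently from a browser.

```python
import urllib.error
import urllib.request

# Example User-Agent strings; adjust them to the clients you care about.
AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "crawler": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def status_for(url, user_agent, timeout=10):
    """Return the final HTTP status for url, or None if no response arrived."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout and similar.
        return None


def compare_agents(url, agents=AGENTS, fetch=status_for):
    """Map each agent label to the status code returned for that agent."""
    return {label: fetch(url, ua) for label, ua in agents.items()}
```

If compare_agents returns 200 for "browser" but another code for "crawler", the link behaves differently depending on who requests it, which could explain a link that works by hand but is reported as broken.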