I have a list of sitemap files, which I generate programmatically.
These sitemap files are huge, with thousands of URLs each.
It is very difficult to check every URL manually.
So I have written a utility that parses a sitemap file and, using Apache Commons HttpInvoker, checks whether each URL is valid.
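For context, the parsing step can be sketched roughly as follows. This is only a minimal illustration, not my actual utility: the class name `SitemapUrls` and the method `extractUrls` are made up for the example, and it assumes the sitemap follows the usual layout where each URL sits in a `<loc>` element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects the text of every <loc> element in a sitemap.
class SitemapUrls {

    static List<String> extractUrls(String sitemapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(sitemapXml.getBytes(StandardCharsets.UTF_8)));

            // The factory is not namespace-aware by default, so "loc" matches
            // the element name even when the sitemap declares a default xmlns.
            NodeList locs = doc.getElementsByTagName("loc");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        } catch (Exception e) {
            // Malformed XML or parser setup failure.
            throw new IllegalArgumentException("Could not parse sitemap", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://example.com/a</loc></url>"
                + "</urlset>";
        System.out.println(extractUrls(xml));
    }
}
```

Each URL returned by this step is then requested over HTTP and its status code is inspected.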
* Some invalid URLs return a 404 response, so I can detect the problem.
* But in some cases an exception occurs and an error page is shown. The URL is not valid, but it does not return a 404 response; the response code is 200.
So there is no way for me to tell whether such a URL is valid or not.
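One workaround I can imagine is to look at the response body as well as the status code, and treat a 200 response as broken when the body contains text from the error page. Below is a rough sketch of that idea using the JDK's `java.net.http.HttpClient` (Java 11+) rather than my current HTTP library. The class `SoftErrorCheck`, its methods, and the strings in `ERROR_MARKERS` are all made up for illustration; the markers would have to be replaced with text that really appears on the site's error page.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Locale;

// Hypothetical checker: flags non-2xx responses and 200 responses that look like an error page.
class SoftErrorCheck {

    // Assumed placeholders; use phrases taken from your own error page.
    static final List<String> ERROR_MARKERS =
            List.of("an error has occurred", "internal server error", "stack trace");

    // Pure decision logic, kept separate from the network call so it is easy to test.
    static boolean looksBroken(int statusCode, String body) {
        if (statusCode < 200 || statusCode >= 300) {
            return true; // anything outside 2xx counts as broken here
        }
        String lower = (body == null) ? "" : body.toLowerCase(Locale.ROOT);
        for (String marker : ERROR_MARKERS) {
            if (lower.contains(marker)) {
                return true; // status is 2xx, but the page looks like an error page
            }
        }
        return false;
    }

    // Fetches the URL (following redirects) and applies the heuristic above.
    static boolean isBroken(String url) {
        try {
            HttpClient client = HttpClient.newBuilder()
                    .followRedirects(HttpClient.Redirect.NORMAL)
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return looksBroken(response.statusCode(), response.body());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true;
        } catch (Exception e) {
            return true; // malformed URL, connection failure, etc.
        }
    }
}
```

This is only a heuristic: a valid page that happens to contain one of the marker phrases would be reported as broken. If the server is under my control, making the error page itself return a 5xx status would probably be more reliable than matching on body text.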
I am not sure, but I have heard that webmaster tools perform the same kind of check, so there must be some way to identify the valid URLs.
(Please note that I'm not talking about XML validation; I'm talking about broken URLs where the response code is other than 400.)
Any help on this is appreciated.
Thanks in advance.