I have an interesting situation.
I am building a sitemap for a Zen Cart v1.3.6 store (cart products) with the XML Sitemap generator, and when the generator crawls the site, some of the URLs it records have the session ID attached.
For instance, a URL might be: mysite?someproduct=456&zenid=4sdfhasgfd5jh77
Of course I don't want the session ID showing up in a Google Sitemap URL, so I tried two things. First, I arranged for the generator's user agent to be detected as a spider (so Zen Cart should not attach the session ID when XML Sitemap crawls the site). Second, as a brute-force measure, I configured XML Sitemap to ignore the session name (zenid) in URLs.
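For anyone unfamiliar with the brute-force option: ignoring the session parameter amounts to dropping `zenid=...` from each URL's query string, so URLs that differ only by session ID collapse into one. Below is a minimal Python sketch of that idea, applied to a list of crawled URLs (for example as a post-processing step on the generator's output). It is an illustration under assumptions, not how the XML Sitemap generator actually implements its option; the function names are hypothetical, and `zenid` is assumed to be the store's session name (the Zen Cart default).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "zenid" is the store's session name (Zen Cart's default).
SESSION_PARAM = "zenid"


def strip_session_id(url: str, param: str = SESSION_PARAM) -> str:
    """Return `url` with every `param=...` pair removed from its query string."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters such as "a=" are preserved
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    # An empty query list serialises to "", and urlunsplit then omits the "?"
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_sitemap_urls(urls: list[str]) -> list[str]:
    """Strip the session ID from each URL and drop duplicates, keeping first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for url in urls:
        u = strip_session_id(url)
        if u not in seen:
            seen.add(u)
            cleaned.append(u)
    return cleaned


if __name__ == "__main__":
    crawled = [
        "http://mysite/index.php?someproduct=456&zenid=4sdfhasgfd5jh77",
        "http://mysite/index.php?someproduct=456",
        "http://mysite/index.php?zenid=4sdfhasgfd5jh77",
    ]
    for u in clean_sitemap_urls(crawled):
        print(u)
```

Because `urlencode` re-serialises the query, other parameters may come back with slightly different percent-encoding than the original, so it is worth spot-checking the result. Also note that this only tidies URLs the crawler has already found; it will not by itself recover pages that were skipped during the crawl.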
I also set the crawl rate to one page per second, on the idea that any cookie code would have time to settle, since I believe (is this right?) that cookies are saved only at the end of a page's PHP code.
It looks like when Zen Cart is first accessed by a browser with cookies enabled, some of the links on the home page still have the session ID attached, but after following a link or two, the cookie system kicks in and the session ID disappears from most of the URLs.
So all this was done in hope, but I now get only about 80 pages crawled by XML Sitemap, whereas there are about 250 products! It seems that some of the page URLs that lead to multiple products have the session ID on them, so the whole section gets rejected.
Apparently a session ID in a URL can interfere with a user's shopping cart if the user gets that link from Google or another search engine.
Zen Cart has a file called spiders.txt into which part of the XML Sitemap user agent string can be inserted, and a setting in its admin to detect spiders...
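As I understand it (this is an assumption about v1.3.6 and worth checking against the store's own session init code), when the admin's spider setting is enabled Zen Cart lowercases the visitor's user agent and treats it as a spider if any non-blank line of spiders.txt occurs inside it as a plain substring. The Python sketch below mimics that check so a candidate entry can be tested against the generator's user agent before editing the file. The spiders.txt contents and the user agent shown are made-up placeholders, not the generator's real string; copy the actual string from the server's access log.

```python
# Hypothetical spiders.txt contents; the real file has many more entries.
SPIDERS_TXT = """\
googlebot
slurp
xml-sitemaps
"""


def is_spider(user_agent: str, spider_lines: list[str]) -> bool:
    """Assumed approximation of Zen Cart's check: substring match on the lowercased UA."""
    ua = user_agent.lower()  # in this sketch only the user agent is lowercased, not the entries
    for line in spider_lines:
        entry = line.strip()  # ignore surrounding whitespace and the trailing newline
        if entry and entry in ua:  # blank lines never match
            return True
    return False


if __name__ == "__main__":
    entries = SPIDERS_TXT.splitlines()
    # Placeholder user agent: replace with the string from your access log.
    ua = "Mozilla/5.0 (compatible; XML-Sitemaps Generator)"
    print(is_spider(ua, entries))  # True: "xml-sitemaps" occurs in the lowercased UA
    print(is_spider(ua, ["XML-Sitemaps"]))  # False: an uppercase entry never matches here
```

If this reading of the check is right, entries should be short and lowercase, and not too generic: a substring such as `mozilla` would also match ordinary browsers and switch off sessions for real shoppers.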
Has anyone got any useful ideas on how to deal with the unwanted session ID in URLs?