cookies?
« on: July 01, 2006, 09:37:45 AM »
hi again.

Will the script accept cookies? I have some pages that are "protected" by cookies for age-verification purposes when the page is opened in a normal browser. If the agent string includes "bot", "spider" or "search", the page is loaded normally without the age-verification check.

so:

a.) does the script accept cookies?

b.) if not, what is the agent string of the script, so that I can add it to the list and avoid the age verification?

greetings,
g
Re: cookies?
« Reply #1 on: July 02, 2006, 01:04:38 AM »
Hello,

1. Yes, Sitemap generator supports cookies
2. Just in case, user-agent string is:
Code:
XML Sitemaps Generator 1.0 (https://www.xml-sitemaps.com/)
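As a rough illustration of why the exact string matters: the sketch below checks the quoted user agent against the three keywords mentioned in the first post. It is written in Python purely for illustration; the function name `skips_age_check` and the case-insensitive substring matching are assumptions, not the poster's actual site code.

```python
# User-agent string as quoted in the reply above.
USER_AGENT = "XML Sitemaps Generator 1.0 (https://www.xml-sitemaps.com/)"

# Keywords the original poster's pages look for (from the first post).
CRAWLER_KEYWORDS = ("bot", "spider", "search")


def skips_age_check(user_agent, keywords=CRAWLER_KEYWORDS):
    """Return True if any keyword occurs in the user agent.

    Assumption: matching is a case-insensitive substring test.
    """
    ua = user_agent.lower()
    return any(keyword.lower() in ua for keyword in keywords)


if __name__ == "__main__":
    # None of "bot", "spider", "search" occurs in the generator's string.
    print(skips_age_check(USER_AGENT))
    # Adding a keyword taken from the string itself makes the check pass.
    print(skips_age_check(USER_AGENT, CRAWLER_KEYWORDS + ("sitemaps",)))
```

Under these assumptions the generator would not be recognised by the existing three keywords, so a fragment of its user-agent string (for example "sitemaps") would need to be added to the list.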
Re: cookies?
« Reply #2 on: November 10, 2006, 12:10:35 AM »
I have an interesting situation.
I am sitemapping a Zen Cart (cart products) v1.3.6 site with the generator, and some of the URLs in the sitemap have the session ID attached when crawled by XML Sitemaps.
For instance, a URL might be mysite?someproduct=456&zenid=4sdfhasgfd5jh77

Of course I don't want the session ID showing in a Google sitemap URL, so first I arranged for the generator's user agent to be detected as a spider (so Zen Cart is supposed not to attach the session ID when XML Sitemaps crawls the site), and second, as brute force, I set the session name (zenid) to be ignored in URLs in the XML Sitemaps configuration.

Also, I have the crawl rate at 1 page per second, with the idea that any cookie code gets time to settle, as I think (is that right?) cookies are saved only at the end of a page's PHP code.

It looks like when Zen Cart is first accessed (by a browser with cookies enabled), some of the links on the Zen Cart home page still have the session ID attached, but after following a link or two, the cookie system kicks in and the session ID disappears from most of the URLs.

So all this was done in hope, BUT I now get only about 80 pages crawled by XML Sitemaps, whereas there are about 250 products! Evidently some of the page URLs that lead to multiple products carry the session ID, so the whole section gets rejected.

Apparently a session ID in a URL can interfere with a user's shopping cart if the user gets that link from Google or another search engine.

Zen Cart has a file called 'spiders.txt' into which part of the XML Sitemaps user-agent string can be inserted, and a setting in its admin to detect spiders...

Has anyone got any useful ideas on how to deal with the unwanted url session id?
Re: cookies?
« Reply #3 on: November 10, 2006, 09:52:09 PM »
Hello,

In this case, instead of adding zenid to the URL exclusion list, please try manually modifying your config.inc.php file:
FIND:
Code:
'xs_cleanurls'=>'',
REPLACE WITH:
Code:
'xs_cleanurls'=>'&zenid=.+',
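To see what that pattern would do to a crawled URL, here is a minimal Python sketch. It assumes the generator treats the xs_cleanurls value as a regular expression and deletes whatever it matches; that behaviour is an assumption, and the helper name `clean_url` is made up for the example.

```python
import re

# Pattern copied from the 'xs_cleanurls' value above.
CLEAN_PATTERN = re.compile(r"&zenid=.+")


def clean_url(url):
    """Delete the part of the URL matched by the pattern.

    Assumption: the generator removes the matched text; this only
    mimics that idea and is not the generator's own code.
    """
    return CLEAN_PATTERN.sub("", url)


if __name__ == "__main__":
    # Example URL from the earlier post: the session ID is stripped.
    print(clean_url("mysite?someproduct=456&zenid=4sdfhasgfd5jh77"))
    # '.+' is greedy, so anything after the session ID is stripped too.
    print(clean_url("mysite?someproduct=456&zenid=abc&page=2"))
    # The pattern requires a leading '&', so this URL is left unchanged.
    print(clean_url("mysite?zenid=abc"))
```

Two things worth noting under these assumptions: parameters that follow zenid are removed along with it, and a URL where zenid is the first parameter (directly after the "?") is not touched by this pattern.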