Some problems and questions about SG
« on: March 30, 2009, 10:25:29 AM »
Hi Oleg,
Great script. We've just bought it for our dedicated server.
I have some problems and some questions.

Problems:
1 - The SG only finds 240 URLs. How can that be? We have more than 2,500 news items and 250 reviews stored in our database.

Questions:
1 - To exclude a folder, I should put foldername/ in the "Exclude URLs" field, right?
2 - What's the difference between that and the "Do not parse URLs" field?
3 - I couldn't find the email notification option. Where is it?
4 - Why does it offer so many different sitemap formats (ROR, TXT, HTML, etc.)? Are they useful? Is it better to let the script create them all?

Best regards, Alek
« Last Edit: March 30, 2009, 10:27:47 AM by sales587 »
Re: Some problems and questions about SG
« Reply #1 on: March 30, 2009, 11:15:03 AM »
Just to check, I've placed only .html files in the "Include ONLY" URLs field. All of our reviews and news are generated with this extension. I got only 127 files instead of 3,000. What's the problem? The links are all on the reviews page; obviously they are generated when the page is clicked, and as soon as you click "next page" you can see ALL of our reviews. Same thing for the news.
Re: Some problems and questions about SG
« Reply #2 on: March 31, 2009, 01:52:46 AM »
Hello,

Are all your pages linked so that it's possible to reach any page by clicking, starting from the homepage?


1. Yes, correct.

2. URLs matching "Do not parse" are not fetched from your site, but are still included in the sitemap. This is useful on large sites to increase crawling speed - see the sketch after this list.

3. It's under Miscellaneous Definitions -> Send email notifications.

4. It won't hurt to have them all. The text sitemap was used by Yahoo earlier, but now they support XML sitemaps too.
It might be helpful to link to the HTML sitemap somewhere on your site to let human visitors (and search engines that do not support XML sitemaps) find your pages more easily.
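To illustrate the difference between the two fields with a quick sketch (the folder names here are hypothetical, just for illustration):

Exclude URLs:
private/
(any URL containing "private/" is neither fetched nor listed in the sitemap)

Do not parse URLs:
printable/
(any URL containing "printable/" IS listed in the sitemap, but its content is not fetched, so links inside those pages are not followed)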
Re: Some problems and questions about SG
« Reply #3 on: March 31, 2009, 08:33:14 AM »
Hi Oleg,
"Are all your pages linked so it's possible to reach any pages by clicking, starting from homepage?"


No, obviously not all of them, unless you do a search, for example. As I asked you here: https://www.xml-sitemaps.com/forum/index.php/topic,2827.html

5 - We have dynamic pages created from our database (news and reviews). Does this script get all these pages, or only the pages visited by users?

And this is your answer:
5. The sitemap generator has a built-in crawler that visits all pages on your website and includes them in the sitemap, so it will find ALL your links.

This is my website: [ External links are visible to forum administrators only ] - please check it and let me know.
Re: Some problems and questions about SG
« Reply #4 on: March 31, 2009, 11:44:07 PM »
Hello,

The crawler that is built into the sitemap generator includes all pages in the sitemap, but there must be a way to *find* those pages.
It's NOT necessary for those pages to be visited by other users.

As I can see from your page structure, the only issue is that your "next page" links are JavaScript links, like:
<a href="javascript:loadList('',%202);">

so they cannot be crawled. You should change those links to plain HTML URLs, like:
<a href="news.php?ACTION=filter&RL_PAGE=3&RL_LETTER=&SORT_COLUMN=date&SORT_DIR=desc&SHOW_CAT=">

and that will resolve the issue.
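If you want to keep the JavaScript behaviour for your visitors, a common pattern (just a sketch, reusing the loadList call and query string from the snippets above) is to put the real URL in href and the script in onclick:

<a href="news.php?ACTION=filter&RL_PAGE=2&RL_LETTER=&SORT_COLUMN=date&SORT_DIR=desc&SHOW_CAT=" onclick="loadList('', 2); return false;">next page</a>

Browsers with JavaScript enabled run loadList and stay on the page (return false cancels the navigation), while crawlers ignore onclick and simply follow the href.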
Re: Some problems and questions about SG
« Reply #5 on: April 01, 2009, 10:59:52 AM »
What if we place a hidden link at the bottom of every page that links to a single page with ALL the links to reviews and news? That should work too, right? That way the crawler can see them.
Re: Some problems and questions about SG
« Reply #6 on: April 02, 2009, 01:04:04 AM »
Yes, that sounds good. Make sure that you hide it with color, not inside an HTML comment (since comments are automatically stripped before link parsing).
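For example (the file name all-links.html and the colors are hypothetical):

<!-- <a href="all-links.html">all links</a> -->
(this would NOT be found, since comments are stripped before link parsing)

<a href="all-links.html" style="color:#ffffff">all links</a>
(this works: the link stays in the markup and is simply invisible on a white background)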
Re: Some problems and questions about SG
« Reply #7 on: April 02, 2009, 06:05:32 PM »
Done! Now it works well. Thank you very much!
Re: Some problems and questions about SG
« Reply #9 on: April 03, 2009, 03:15:44 PM »
Hmm, maybe I spoke too soon :D
The crawler is working, but I'm also getting some strange URLs like this:
[ External links are visible to forum administrators only ]

This is the hidden file: [ External links are visible to forum administrators only ]

There is no link like this in that file. The crawler.php file is correct.

Can you explain to me why the sitemap has these strange links?
Re: Some problems and questions about SG
« Reply #10 on: April 06, 2009, 12:01:43 AM »
It's not necessarily the case that this link was found in the hidden file - the sitemap generator continues to crawl your whole site, and the link was found on one of your pages. I see that there are a number of similar-looking pages indexed in Google, so they are valid URLs: http://www.google.com.by/search?q=inurl:%22admin-recensione.html%22&hl=en&client=firefox-a&rls=org.mozilla:en-US:official&hs=eub&filter=0
Re: Some problems and questions about SG
« Reply #11 on: April 06, 2009, 08:58:08 AM »
Yes, sorry, but I didn't explain the issue well.

This link does not exist, yet it works (I don't know why):

[ External links are visible to forum administrators only ]

The right link is [ External links are visible to forum administrators only ]

And another link is [ External links are visible to forum administrators only ]

It seems the crawler joins these 2 links, but I can't understand why. My guess is sketched below.
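If a relative link is involved, the join could happen like this (all URLs below are hypothetical, since the real ones are hidden above): a relative href is resolved against the address of the page it appears on, so

a page at:  http://example.com/recensioni/scheda.html
containing: <a href="admin-recensione.html">

would be resolved by any crawler to:

http://example.com/recensioni/admin-recensione.html

i.e. the directory of the first URL joined with the file name from the second.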
Re: Some problems and questions about SG
« Reply #12 on: April 06, 2009, 09:28:42 AM »
Besides this problem, the number of URLs collected is different on every run.

E.g.:

Request date: 4 April 2009, 12:11
Processing time: 658.36s
Pages indexed: 4140

Request date: 5 April 2009, 05:10
Processing time: 632.96s
Pages indexed: 4031

Request date: 6 April 2009, 05:10
Processing time: 629.08s
Pages indexed: 3973

but we're not deleting any URLs, so why are there fewer of them?
Re: Some problems and questions about SG
« Reply #14 on: April 07, 2009, 08:54:51 AM »
PM sent