Sitemap HTML Issue
« on: June 08, 2006, 03:18:49 PM »
Congratulations on a super product. I ran the free version and it produced 500 links with no problem. However, since installing the unlimited version, the anchor text for each of the page links in each sitemap file is identical (it seems to be the same as the start URL) and does not reflect the actual page title. Any ideas what may be causing this?

XML and URL txt files are fine.

Thanks in advance.

Paul

UPDATE: This issue appears to be related to the PARTIAL indexing of a very large site. As pages are retrieved, additional links are queued for the next level, BUT once enough links have been queued to satisfy the requested number, it appears that the link files are filled up to that number even though the remaining links have never been fetched. The anchor text of the last page retrieved is therefore repeated until the end of the file. This is a bit of a guess, as I only acquired the program earlier today, but I will continue to investigate by running tests of differing sizes. I think I can see the logic involved, in which case the program is functioning in a logical way. Consequently, what I am experiencing is probably related to the unconventional internal link structure being used rather than to any failing in the software.

I have to say this is the best generator I have found to date by some way. Great job.

UPDATE 2: I have changed my opinion again! There may well be an issue. My sitemap request is for 1000 links and I set it to 140 links per page.
Level 1 finds one page with a link into the main site. The first page found then finds new links and as I watch the process it tells me that scanned pages are 20, pages left 115 and links queued +2056.
The screen refreshes are then, respectively:
40  :  95  :  +3785
60  :  75  :  +5334
80  :  55  :  +6854
100  :  35  :  +8600
120  :  15  :  +10258
140 :  860  :  +0
160 :  840  :  +0
etc to
20  :  980  :  +0
and the last cycle to finish.

Now when I look at the Sitemap HTML, the first 135 anchor texts are all correct but nos. 136 to 1000 are a repeat of entry no. 135, so it looks as if all anchor texts are lost from the start of Level 3. Is this a bug or a loop logic failure? I can show you this running over Skype chat if you want, between 10.00am and 6.00pm GMT, as I am in the UK.

 
« Last Edit: June 08, 2006, 09:45:49 PM by Confuscius »
Re: Sitemap HTML Issue
« Reply #1 on: June 08, 2006, 10:23:02 PM »
Hello,

thank you for your kind words :)
As for anchor text in the HTML sitemap, please make sure that you don't have the "Maximum pages" limit set in the Configuration page (this option optimizes generation so that not every listed page is retrieved, and so page titles are unknown).
Please let me know if that solved the problem.
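
To illustrate the behaviour described in this reply, here is a minimal sketch of a crawl with a page limit. It is not the generator's actual code: the `SITE` dictionary, the `crawl` function and the `max_pages` parameter are all hypothetical, and fetching is simulated with an in-memory lookup. It only shows why a URL can end up listed without a known title when the crawl stops as soon as enough URLs have been discovered.

```python
from collections import deque

# Hypothetical in-memory site: URL -> (page title, outgoing links).
SITE = {
    "/": ("Home", ["/a", "/b"]),
    "/a": ("Page A", ["/c", "/d"]),
    "/b": ("Page B", ["/e"]),
    "/c": ("Page C", []),
    "/d": ("Page D", []),
    "/e": ("Page E", []),
}


def crawl(start, max_pages):
    """Breadth-first crawl that stops once max_pages URLs are known.

    Returns a list of (url, title) pairs in discovery order. The title is
    None for URLs that were discovered through a link but never fetched.
    """
    titles = {}            # titles of pages that were actually fetched
    known = [start]        # every URL that will be listed, in order
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        title, links = SITE[url]   # stands in for an HTTP fetch + parse
        titles[url] = title
        for link in links:
            if link not in seen and len(seen) < max_pages:
                seen.add(link)
                known.append(link)
                queue.append(link)
    return [(url, titles.get(url)) for url in known]


for url, title in crawl("/", max_pages=4):
    print(url, "->", title)
```

With a limit of 4, the sketch lists four URLs but fetches only "/" and "/a", so "/b" and "/c" have no title. With a limit larger than the site, every page is fetched and every title is known.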
Re: Sitemap HTML Issue
« Reply #2 on: June 08, 2006, 10:58:22 PM »
Hi Oleg

Thank you for the insight into the workings. Unfortunately, I fear that if I do as you suggest then I may have a bit of an issue. The site itself has the capability of producing in excess of 10 million unique pages and I do not want to create Google Sitemaps covering the whole site in a single run! - also the file structure is very flat (seems to be good for spidering!).

As the page links are queued, would it not be possible, in effect, to cache the text from ">ANCHOR TEXT</a>" as you parse each file and then re-incorporate it as the HTML files are built? Just an idea.
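
A rough sketch of that idea, using only Python's standard `html.parser` module. This is an illustration of the suggestion, not how the generator works: the `AnchorCollector` class, the `collect_anchor_text` helper and the `cache` dictionary are hypothetical names, and keeping the first anchor text seen for each URL is an arbitrary choice.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects href -> anchor text for every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = {}    # href -> first non-empty anchor text seen
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._text).split())
            if text:
                self.anchors.setdefault(self._href, text)
            self._href = None


def collect_anchor_text(html, cache):
    """Parse one page and remember the anchor text of each link it contains."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.anchors.items():
        cache.setdefault(href, text)   # keep the first text seen per URL
    return cache


cache = {}
collect_anchor_text('<a href="/widgets.html">Blue <b>Widgets</b></a>', cache)
print(cache["/widgets.html"])
```

The cache could then be consulted when the HTML sitemap is written, for any URL that was queued but never fetched.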

Alternatively, what are the implications of starting the process knowing that I will have to interrupt it? Or is it possible to sacrifice optimization for perfect HTML files?

By the way, if there is a possible solution to this issue then the particular software that I use has about another 200+ users who would probably love to be able to produce complete partial sitemaps of their large websites as no one thought this was even viable! So close to the seemingly impossible. The XML / txt files are perfect, the HTML would be the absolute icing on the cake.

Many thanks for your efforts.

Paul

PS Although I did not have changelog checked, I have just noticed that the changelog tab reports the following in both boxes:
<br />
<b>Warning</b>:  implode(): Bad arguments. in <b>/home/XXXXXXX/public_html/generator/pages/page-chlog.inc.php(2) : eval()'d code(1) : eval()'d code(1) : eval()'d code</b> on line <b>35</b><br />

XXXXXXX replaces my user account!

PPS Thinking about this again today, I am still puzzled. If I run the free version then I get back 500 perfectly formed links on one sitemap page, but if I run it on my server with a 500 page limit then I still only get the first 135 links with the correct anchor text. I assume that the version you are running is therefore slightly different to the unlimited version?
« Last Edit: June 09, 2006, 05:07:27 PM by Confuscius »
Re: Sitemap HTML Issue
« Reply #3 on: June 09, 2006, 08:44:04 PM »
Hello,

1. Please try to download the script again (I've just updated it) - it should create the titles correctly in html sitemap even when the number of pages is limited in configuration.

2. You can use "Exclude URLs" and "Do not parse URLs" options to exclude the majority of URLs from being crawled / added to sitemap and to narrow down your sitemap to a small part only.

3. If you have disabled the changelog option, this warning message is shown because there is no source data for the change log calculation.
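
As a sketch of how the two options in point 2 differ, the snippet below classifies a URL against two pattern lists. Plain substring matching is an assumption made for illustration, and the `classify` function and its return values are hypothetical; the thread only establishes that excluded URLs are left out, while "Do not parse" URLs are added to the sitemap without being retrieved.

```python
def classify(url, exclude=(), do_not_parse=()):
    """Decide how a crawler should treat a URL.

    Returns:
      "skip"      - URL matches an exclude pattern: not crawled, not listed
      "list_only" - URL matches a do-not-parse pattern: listed, not fetched
      "crawl"     - URL is fetched, parsed for links, and listed
    Matching is by plain substring (an assumption for this sketch).
    """
    if any(pattern in url for pattern in exclude):
        return "skip"
    if any(pattern in url for pattern in do_not_parse):
        return "list_only"
    return "crawl"


exclude = ["/print/", "sessionid="]
do_not_parse = [".pdf"]
for url in ["/forum/print/12", "/docs/manual.pdf", "/forum/topic-12.html"]:
    print(url, "->", classify(url, exclude, do_not_parse))
```

Note that a "list_only" URL is never fetched, so its page title cannot be known to the crawler.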

Let me know if you have further questions :)
Re: Sitemap HTML Issue
« Reply #4 on: June 09, 2006, 11:27:22 PM »
Hi Oleg

Many thanks for the speedy response.

1. - I'll give it a go tomorrow when I'm a bit brighter! I'll report back in due course.
2. - I understand these features and they will be very useful.
3. - I guessed that this would probably be the case.

Hopefully, that will be the end of my questions!

Paul

UPDATE : Fantastic! It now works a treat as my main objective was to be able to increase the size of the sitemap gradually over time to generate the impression of a naturally increasing website size.

As a small thank you, I have made a personal recommendation to other users associated with the product that is used to form the basis of the websites that I build. I sincerely hope, at the very least, that you get quite a few new customers as a result of this. A product that does exactly what it is described to do is becoming rarer these days with all the hype that some people add to their creations. 10/10 from me.
« Last Edit: June 10, 2006, 09:22:45 PM by Confuscius »
Re: Sitemap HTML Issue
« Reply #5 on: June 11, 2006, 10:56:01 AM »
Hi Oleg

One final question! After creating a sitemap for the main domain, I reset the configuration to create a sitemap for a sub-domain. For the Google sitemap files and the HTML sitemap files I can specify locations in the subdirectory to which the subdomain points, BUT I am unable to do this for the Yahoo links file, so it overwrites the main domain file.

Not a major problem, as after producing the main domain Yahoo file I can rename it. Would it be possible to also specify the Yahoo txt file path in a future version? (Please!)

Paul
Re: Sitemap HTML Issue
« Reply #6 on: June 12, 2006, 12:57:45 PM »
Hello Paul,

excellent, I'm glad it works as expected for you now :)

re: yahoo sitemap file location
You can either:
1. create multiple instances of Sitemap generator at your host, like /generator1/, /generator2/. Each one will have a separate configuration and there will be no need to change the configuration every time.
2. OR manually set it in config.inc.php file as described here: https://www.xml-sitemaps.com/forum/index.php/topic,228.html
Re: Sitemap HTML Issue
« Reply #7 on: August 18, 2006, 03:48:06 PM »
Hi
Excellent software. Well done.

I am having a similar issue with anchor text whilst crawling through a mid sized forum (approx 9,000 pages).
It shows the URL as the link anchor text, but I would like it to show the page title as the anchor text.

I have checked that it is allowing unlimited maximum pages (0). I have also tried a variety of config settings.
Any ideas where else to look?
Regards
Neil
Re: Sitemap HTML Issue
« Reply #8 on: August 18, 2006, 08:44:31 PM »
Hello Neil,

do you have some of these pages affected by the "Do not parse URLs" option? (in this case URLs are added to the sitemap, but the pages are not retrieved, so it's impossible to determine the title)
Re: Sitemap HTML Issue
« Reply #9 on: August 19, 2006, 04:10:21 PM »
Hi Oleg
Thanks for the reply
The only URL I had in that page was the 404.html error page, which I have removed.
I have some excluded URLs, but none in the forum.
Here is the sitemap [ External links are visible to forum administrators only ] so you can see the result.

regards
Neil
« Last Edit: August 26, 2006, 11:56:31 AM by neil »
Re: Sitemap HTML Issue
« Reply #10 on: August 20, 2006, 01:51:53 AM »
Hello Neil,

as far as I see, you have "links as anchors" for ../vistit/ URLs, which are redirected to other sites and that's why there is no title in these cases.
Re: Sitemap HTML Issue
« Reply #11 on: August 23, 2006, 01:33:45 PM »
Thanks for the reply.

Would you have any tips on how I can get the forum to produce descriptive anchor text instead of just the URL?
regards
Neil
Re: Sitemap HTML Issue
« Reply #12 on: August 24, 2006, 06:07:03 AM »
Hello,

do you mean the links that you see on the forum pages?
I'm not sure about the exact solution; it would be specific to the forum software used.
Re: Sitemap HTML Issue
« Reply #13 on: August 24, 2006, 12:24:51 PM »
No, I wanted descriptive anchor text for the links the sitemap generates for the html sitemap of my forum.
rgds
Neil
Re: Sitemap HTML Issue
« Reply #14 on: August 25, 2006, 11:35:16 PM »
Ah, I see. So, your suggestion is to use the anchor text of the link found by the crawler if the page is not retrieved or doesn't have a corresponding title tag. Is that correct? :)
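
The fallback described here could look roughly like the sketch below. It is only an illustration of the suggestion: `link_label`, `titles` and `anchor_text` are hypothetical names, with `titles` holding the titles of pages that were fetched and `anchor_text` holding the link text seen by the crawler for each URL.

```python
def link_label(url, titles, anchor_text):
    """Pick the text to show for a URL in the HTML sitemap.

    Preference order: page title, then the anchor text of a link that
    pointed to the page, then the URL itself as a last resort.
    """
    title = (titles.get(url) or "").strip()
    if title:
        return title
    anchor = (anchor_text.get(url) or "").strip()
    if anchor:
        return anchor
    return url


titles = {"/index.html": "Welcome", "/visit/42": None}
anchor_text = {"/visit/42": "Partner site", "/index.html": "Home"}

for url in ["/index.html", "/visit/42", "/unknown.html"]:
    print(url, "->", link_label(url, titles, anchor_text))
```

A page with a title keeps its title, a page that was not retrieved (or has no title tag) falls back to the link text, and a URL with neither is shown as the bare URL, which is the behaviour Neil is currently seeing for all such pages.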