XML Sitemaps Generator

Author Topic: Problems with SEF links in cyrillic alphabet  (Read 9122 times)

capricorn

  • Registered Customer
  • Jr. Member
  • *
  • Posts: 12
Problems with SEF links in cyrillic alphabet
« on: September 02, 2012, 05:00:13 PM »
Hello!

I use the standalone version of XML Sitemap Generator.

My site (Joomla) has a SEF component that rewrites links into the Cyrillic alphabet.

I run the script on my local computer, where I have an Apache web server installed, because the real site contains over 20,000 pages. In the script's configuration section I pointed it at my real site to scan. After that I uploaded the generated sitemap to the real site, after changing the site URL in it of course.

Now I am facing two problems.

1) There were 1000 broken links reported. However, they look OK, and I can open the pages from the list of broken links just by clicking on them. What should I do with them?

2) The sitemap contains links that are supposed to be in Cyrillic but look like gibberish, and of course they give a 404 error when clicked.

I am not much of an expert in this. Could you please advise where to look, or suggest relevant reading? I will send you my sitemap address by PM so you can take a look.
Thanks.

XML-Sitemaps Support

  • Administrator
  • Hero Member
  • *****
  • Posts: 10624
Re: Problems with SEF links in cyrillic alphabet
« Reply #1 on: September 02, 2012, 10:28:09 PM »
Hello,

1. What is an example of a broken link, plus the page referring to it?

2. I'd need an example URL here as well. You might just try enabling the UTF8 support setting in the generator configuration, though.
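
For readers hitting the same symptom, here is a minimal Python sketch of how Cyrillic SEF links can end up as gibberish. It assumes the URLs are UTF-8 bytes that get decoded with a single-byte charset such as windows-1251 somewhere along the way. The example path is hypothetical, and this illustrates the general mechanism only, not the generator's internals.

```python
from urllib.parse import quote, unquote

# Hypothetical Cyrillic SEF path, for illustration only.
path = "/каталог/товар"

# Correct handling: percent-encode the UTF-8 bytes of the path.
encoded = quote(path, encoding="utf-8")

# Decoding with the same charset round-trips cleanly.
round_trip = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as windows-1251 yields mojibake, and a URL
# built from that text points at a page that does not exist (404).
garbled = unquote(encoded, encoding="cp1251", errors="replace")

print(encoded)
print(round_trip == path)
print(garbled == path)
```

If the charset used for decoding matches the one the site actually serves, the links survive the round trip; a mismatch is one plausible cause of the 404s described above.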
Oleg Ignatiuk
www.xml-sitemaps.com
Send me a Private Message

For maximum exposure and traffic for your web site check out our additional SEO Services.

capricorn

Re: Problems with SEF links in cyrillic alphabet
« Reply #2 on: September 03, 2012, 08:15:03 AM »
Hello,

As for 1) you can see them at [external links are visible to admins only].

Regarding 2), I will follow your advice and report the results later. So far I can only say that when I ran the free online generator, the SEF links in Russian in the XML sitemap were OK. I suspect it may also have to do with the settings of my local web server.

BRG

capricorn

Re: Problems with SEF links in cyrillic alphabet
« Reply #3 on: September 03, 2012, 01:00:15 PM »
I enabled UTF-8 and it's OK now. I also set a delay for crawling. It looks better: no 404 pages during a short test run.

Thanks for your advice!

I've got only two questions left at the moment.

1) If some links are still reported as broken due to slow server response, although they are in fact OK, are they nevertheless included in the sitemap, or do I need to do something to have them included?

2) The generator crawls the "print" and "ask a question" pages in the Virtuemart shop, even though I've chosen the Joomla exclusion preset in the config section. These pages look like:

/index.php?page=shop.ask&flypage=flypage.tpl&product_id=123522&category_id=51634&option=com_virtuemart&Itemid=44

/index2.php?option=com_virtuemart&page=shop.product_details&only_page=1&category_id=51634&product_id=123527&pop=1&tmpl=component&

What string do I need to add to "Exclude URLs" to prevent the generator from crawling these pages?

Thanks again!

XML-Sitemaps Support

Re: Problems with SEF links in cyrillic alphabet
« Reply #4 on: September 03, 2012, 03:52:26 PM »
1. If the generator gets a "not found" response, it won't include those pages in the sitemap.
2. I'd recommend adding them to the "Exclude URLs" setting with:
Code: [Select]
shop.ask
pop=1
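
As a rough illustration of how such entries might act on the two URLs from the question, here is a small Python sketch. It assumes each "Exclude URLs" entry is matched as a plain substring of the URL; the thread does not confirm the generator's exact matching rules, and the third URL in the list is a hypothetical product page added for contrast.

```python
def is_excluded(url, patterns):
    """Return True if any exclusion pattern occurs anywhere in the URL.

    Assumption: plain substring matching, which may differ from the
    generator's real behaviour.
    """
    return any(pattern in url for pattern in patterns)


exclude = ["shop.ask", "pop=1"]

urls = [
    "/index.php?page=shop.ask&flypage=flypage.tpl&product_id=123522"
    "&category_id=51634&option=com_virtuemart&Itemid=44",
    "/index2.php?option=com_virtuemart&page=shop.product_details&only_page=1"
    "&category_id=51634&product_id=123527&pop=1&tmpl=component&",
    # Hypothetical regular product page that should stay in the sitemap.
    "/index.php?page=shop.product_details&product_id=123527&option=com_virtuemart",
]

kept = [url for url in urls if not is_excluded(url, exclude)]
print(len(kept))
```

Under this assumption a short entry such as `pop=1` would also match `pop=10`, so exclusion strings are best kept as specific as the URLs allow.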

capricorn

Re: Problems with SEF links in cyrillic alphabet
« Reply #5 on: September 03, 2012, 04:35:33 PM »
Thanks,

It works. However, I had the impression that when I added "shop.feed" to the URL exclusion list, the sitemap for some reason kept including links containing "shop.feed". So I turned off these links on the site for the scanning period.

Also, I set a delay of 2 seconds after every 5 requests. I think my hosting has anti-flooding software running. Slowly but surely  :)
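
The throttling idea can be sketched in Python as follows. This is only an illustration of pausing after each batch of requests; `crawl_with_delay`, `fetch`, and the parameter names are hypothetical and are not the generator's own settings or code.

```python
import time


def crawl_with_delay(urls, fetch, batch_size=5, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing after every batch_size requests."""
    results = []
    for count, url in enumerate(urls, start=1):
        results.append(fetch(url))
        # Pause after each full batch, but not after the very last request.
        if count % batch_size == 0 and count < len(urls):
            sleep(delay_seconds)
    return results


# Demo with a stand-in fetch function and a recorded (not real) sleep.
pauses = []
pages = crawl_with_delay(
    [f"/page-{i}" for i in range(12)],
    fetch=lambda url: url.upper(),
    sleep=pauses.append,
)
print(len(pages), pauses)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in actual use the default `time.sleep` applies.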

BRG.

capricorn

Re: Problems with SEF links in cyrillic alphabet
« Reply #6 on: September 04, 2012, 04:18:45 PM »
Hello Oleg!

After a new run, 1000 links are again reported as broken. Could you please take a look at them here: [external links are visible to admins only].

Also, around 5000 links are not crawled at all. I can't see why.

Thanks.

PS. As I said at the beginning, I run the generator from my local PC and afterwards upload the sitemap and the generator itself to the real site, since the memory limit and execution time on the hosting do not allow running it there.

XML-Sitemaps Support

Re: Problems with SEF links in cyrillic alphabet
« Reply #7 on: September 06, 2012, 05:09:15 AM »
Hello,

The problem in this case is not the server where the generator is running, but the server where the site is hosted, since it stops responding after some time while the generator crawls the site.

capricorn

Re: Problems with SEF links in cyrillic alphabet
« Reply #8 on: September 06, 2012, 05:08:30 PM »
I will try experimenting with the delay time between requests, as well as with the number of requests between delays.

Could you please tell me: is there any limit on the length of encoded URLs for the generator to crawl them successfully?

XML-Sitemaps Support

Re: Problems with SEF links in cyrillic alphabet
« Reply #9 on: September 08, 2012, 08:32:25 PM »
Up to 2048 characters are allowed in URL.
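
To see why this matters for Cyrillic SEF links, here is a small Python sketch that measures a URL after percent-encoding. It assumes the 2048-character limit applies to the encoded form, which the reply does not state explicitly; the example URL and the helper names are hypothetical.

```python
from urllib.parse import quote

MAX_URL_LENGTH = 2048  # limit quoted in the reply above


def encoded_length(url):
    """Length of the URL once non-ASCII characters are percent-encoded as UTF-8."""
    # Keep URL delimiters and existing escapes as they are.
    return len(quote(url, safe=":/?&=%"))


def fits_limit(url, limit=MAX_URL_LENGTH):
    """Return True if the encoded URL is within the assumed limit."""
    return encoded_length(url) <= limit


# Hypothetical Cyrillic SEF URL, for illustration only.
url = "http://example.com/каталог/товар"
print(len(url), encoded_length(url), fits_limit(url))
```

Each Cyrillic letter becomes six characters when encoded (for example `п` becomes `%D0%BF`), so long product and category names use up the limit much faster than their visible length suggests.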

capricorn

Re: Problems with SEF links in cyrillic alphabet
« Reply #10 on: October 01, 2012, 12:57:00 PM »
Hello!

Regardless of the crawl delay settings, I always get exactly 1000 links reported as broken, even on another domain. Why is it always 1000, and why are the vast majority of them always the same links, which actually work? Could you please take a look at [external links are visible to admins only]? Do I need to re-engineer the product and category names?

I use Joomla 1.5, Virtuemart 1.1.9 and SH404SEF.

Thanks

« Last Edit: October 01, 2012, 12:59:05 PM by capricorn »

XML-Sitemaps Support

Re: Problems with SEF links in cyrillic alphabet
« Reply #11 on: October 03, 2012, 11:22:18 PM »
Hello,

Please try enabling the "UTF8 charset" setting in the generator configuration.

 
