XML Sitemaps Generator

Author Topic: robots.txt processing enabled - error  (Read 11135 times)

shiz

  • Registered Customer
  • Approved member
  • *
  • Posts: 6
robots.txt processing enabled - error
« on: July 15, 2015, 04:01:06 PM »
With robots.txt processing enabled, the sitemap generator exits immediately (in both versions 6.1 and 7.1).

My robots.txt uses a grouped block of user agents (maybe the generator doesn't like that), i.e.:

User-agent: *
Disallow: /

User-agent: googlebot
User-agent: bingbot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: MJ12bot
Disallow:

Here's the report:

============================================================
2015-07-15 09:08:01


(memory up: 1,567.2 Kb)
0 | 0 | 0.0 | 0:00:01 | 0:00:00 | 0 | 1,567.2 Kb | 0 | 0 | 1567

[ 1 - , 1]

NEXT LEVEL:1

({skipped  - })

(memory: 1,509.9 Kb)
(saving dump)


Crawling completed
<h4>Completed</h4>Total pages indexed: 0
<br>Creating sitemaps...
 and calculating changelog...
<div id="percprog"></div>
Creating HTML sitemap...<div id="percprog2"></div>sorting.. |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
 |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0

*** *** [external links are visible to admins only]

*** time: 10.263481855392 ***
 |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
 |  | 0.0 | 0:00:00 | 0:00:00 |  |  |  |  | 0
<br />Done, redirecting to sitemap view page. <script> top.location = 'index.php?op=view' </script>

XML-Sitemaps Support

  • Administrator
  • Hero Member
  • *****
  • Posts: 10624
Re: robots.txt processing enabled - error
« Reply #1 on: July 16, 2015, 11:13:39 AM »
Hello,

The generator follows the "User-agent: *" rule.
However, you can disable the "Support robots.txt" setting in the generator configuration.
Oleg Ignatiuk
www.xml-sitemaps.com

shiz

Re: robots.txt processing enabled - error
« Reply #2 on: July 16, 2015, 04:50:01 PM »
Quote from: XML-Sitemaps Support (Reply #1)
The generator follows the "User-agent: *" rule.
However, you can disable the "Support robots.txt" setting in the generator configuration.

According to the documentation, the generator also follows the "User-agent: googlebot" rule, which clearly shows a problem with the current robots.txt implementation.

I suggest you recode it to follow the widely accepted standards.

Googlebot, Bingbot, AhrefsBot et al. also abide by the "User-agent: *" rule, and that doesn't stop them from crawling the site.

XML-Sitemaps Support

Re: robots.txt processing enabled - error
« Reply #3 on: July 17, 2015, 07:00:25 AM »
It follows both "googlebot" and "*" rules, combining them in a restrictive way.
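
To make "combining them in a restrictive way" concrete, here is a rough Python sketch of one possible reading: a URL is crawled only if both the "*" group and the "googlebot" group allow it. This is an assumption about the behavior described, not the generator's actual code; the function name restrictive_allowed and the agent name unlisted-agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the robots.txt from the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: googlebot
User-agent: bingbot
Disallow:
"""


def restrictive_allowed(robots_txt: str, url: str) -> bool:
    """Allow a URL only if BOTH the "*" group and the "googlebot" group allow it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "unlisted-agent" is a hypothetical name that matches no named group,
    # so the parser falls back to the "User-agent: *" group for it.
    star_ok = parser.can_fetch("unlisted-agent", url)
    google_ok = parser.can_fetch("googlebot", url)
    return star_ok and google_ok


print(restrictive_allowed(ROBOTS_TXT, "/index.html"))  # False
```

With the robots.txt from the first post, "Disallow: /" in the "*" group vetoes every URL under this reading, which would be consistent with the report showing 0 pages indexed.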

shiz

Re: robots.txt processing enabled - error
« Reply #4 on: July 17, 2015, 03:19:03 PM »
Yes, I can see that, and it should not. It should process them separately and sequentially whenever both are present, not OR them together, e.g. if ($xxx == '*' || strstr($xxx, 'google')) {...}. What has been disallowed in the first block should be re-allowed in the second block.

In other words, rules like these should cancel each other out:

User-agent: *
Disallow: /

User-agent: googlebot
Disallow:


It should have the same effect as the well-known Apache directives:
order deny,allow
deny from all
allow from google
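
For comparison, Python's standard urllib.robotparser treats each group separately: a crawler that matches a named group uses only that group, and "User-agent: *" is the fallback for everyone else. A minimal sketch using the robots.txt from my first post (the helper is_allowed and the agent name ExampleSitemapBot are hypothetical, not the generator's real user agent):

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the first post, including the grouped user agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: googlebot
User-agent: bingbot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: MJ12bot
Disallow:
"""


def is_allowed(agent: str, url: str = "/") -> bool:
    """Check one agent against ROBOTS_TXT using per-group matching."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# Agents named in the second group get the empty "Disallow:", i.e. full access.
print(is_allowed("googlebot"))          # True
print(is_allowed("MJ12bot"))            # True
# A hypothetical agent that is not listed falls back to "User-agent: *".
print(is_allowed("ExampleSitemapBot"))  # False
```

Note that under these semantics a crawler whose name is not listed still falls under "User-agent: *", so only the agents named in the second group are re-allowed.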



XML-Sitemaps Support

Re: robots.txt processing enabled - error
« Reply #5 on: July 18, 2015, 07:40:00 AM »
It is designed this way because the generator bot is not actually "googlebot". Thank you for the suggestion though; we will consider changing the approach in future versions.

shiz

Re: robots.txt processing enabled - error
« Reply #6 on: July 18, 2015, 03:22:34 PM »
Thank you actually, Oleg.  I appreciate it.

 
