phpBB Locked forums not getting crawled
« on: January 09, 2009, 02:51:47 PM »
Hi there!

First of all, thanks for creating such a nice piece of code.

I have an issue with this script. I bought it mainly to crawl my phpBB3 topics and posts and create a sitemap of them.
Some of my forums (that's what they are called in the phpBB world) are locked to guests: the forums themselves are listed, but their topics cannot be read without entering a username and password.

for example:
[ External links are visible to forum administrators only ]

phpBB has a built-in spider/bot management system that lets them browse these forums by cross-checking their agent signatures.
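
As a rough illustration of what such a signature check involves, here is a minimal Python sketch. It is not phpBB's actual code: the `BOT_SIGNATURES` table and the `match_bot` function are hypothetical, and a real bot list may also check IP addresses.

```python
import re

# Hypothetical bot table: display name -> user-agent fragment to look for.
BOT_SIGNATURES = {
    "Google [Bot]": "Googlebot",
    "Example [Bot]": "ExampleBot/*",
}

def match_bot(user_agent, signatures=BOT_SIGNATURES):
    """Return the name of the first bot whose fragment occurs in user_agent, else None."""
    for name, fragment in signatures.items():
        # '*' in a fragment acts as a wildcard; everything else is matched literally.
        pattern = re.escape(fragment).replace(r"\*", ".*?")
        if re.search(pattern, user_agent, re.IGNORECASE):
            return name
    return None
```

A visitor whose User-Agent matches no entry would be treated as an ordinary guest, which is why a crawler with its own User-Agent string is kept out of the locked forums.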

But how will Googlebot find out about my pages when they are not included in the sitemap?

Is there any way around this?
How can I allow the Standalone Sitemap Generator script to access topics within locked forums?
Re: phpBB Locked forums not getting crawled
« Reply #1 on: January 09, 2009, 10:25:11 PM »
Hello,

You can set this option in the config.inc.php file to let the generator crawl with its User-Agent set to Googlebot's:
Code:
'xs_crawl_ident' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
However, please note that hiding pages from regular visitors while showing them to search engine bots might be treated by search engines as cloaking: http://en.wikipedia.org/wiki/Cloaking
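
To see whether the forum really answers differently depending on the User-Agent, one could compare the responses by hand. A minimal Python sketch, assuming only the standard-library `urllib`; the `build_request` and `fetch` helpers are made up for illustration and are not part of the sitemap generator:

```python
import urllib.error
import urllib.request

# Same identification string as in the setting above.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

def build_request(url, user_agent):
    """Build a GET request that announces the given User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch(url, user_agent, timeout=10):
    """Return (HTTP status, body length) for url fetched with user_agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still show how the server treated this agent.
        return err.code, len(err.read())
```

Calling `fetch` on a locked forum URL once with `GOOGLEBOT_UA` and once with an ordinary browser string, then comparing the two results, would show whether bots and guests are served different pages.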
Re: phpBB Locked forums not getting crawled
« Reply #2 on: January 10, 2009, 03:16:49 PM »
Thanks for the reply.

I read the Wikipedia article on cloaking. Is it harmful for my organization?
I've seen numerous sites indexed in Google (which probably means Googlebot has access to their pages) that ask for a username and password when a regular user (like you and me) visits the same page.
Is this a violation of the search engines' rules?

And what if I want to add more bots? What's the syntax?

One more thing:
I've submitted a sitemap with 1146 URLs, and not a single one has been indexed by Google yet.
I saw these statistics in the Google Webmaster Tools dashboard.

Should I wait for my URLs to get indexed and appear in Google's search listings, or is anything else required?
« Last Edit: January 10, 2009, 03:22:01 PM by robyrulz1 »
Re: phpBB Locked forums not getting crawled
« Reply #3 on: January 11, 2009, 08:27:36 PM »
Hello,

1. That would work to some extent (until the search engines find out), but it could potentially do more harm than good.

2. You should set it to only one bot type. The sitemap generator will use it to crawl your site and, as a result, will be allowed to access your private subforums.

3. It takes some time to get indexed, but the issue might also be related to point #1 above. I'd recommend opening the forums to guests to avoid possible issues.
Re: phpBB Locked forums not getting crawled
« Reply #4 on: January 12, 2009, 12:40:08 PM »
Hello,

I just checked the statistics in Google Webmaster Tools: 119 URLs out of 1150 have been indexed.
This probably means that indexing is in progress. Thanks once again!

I'm so happy!!!
Re: phpBB Locked forums not getting crawled
« Reply #6 on: July 25, 2010, 08:42:50 PM »
Hello!

I am new to the forum, and I apologize in advance for my bad English!

I have the same problem described in this thread, but I cannot find the config.inc.php file to make the change.
The version I installed is Standalone Sitemap Generator (PHP) v4.0, 2010-05-20.

The address of my website is: [ External links are visible to forum administrators only ]
and the address of the forum is: [ External links are visible to forum administrators only ]

Thanks in advance to anyone who can help!
Re: phpBB Locked forums not getting crawled
« Reply #7 on: July 26, 2010, 09:27:56 AM »
Hello,

Please send me your generator URL and login in a private message so that I can check this.
Re: phpBB Locked forums not getting crawled
« Reply #8 on: July 26, 2010, 11:21:49 AM »
Private message sent. I'm hoping for good news.
Re: phpBB Locked forums not getting crawled
« Reply #9 on: July 27, 2010, 03:28:16 PM »
In Sitemap Generator v4.x, please try adding this setting to the generator/data/generator.conf file:
Code:
<option name="xs_crawl_ident">Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</option>
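
After editing the file, it can help to confirm that the option was saved in a form that can be read back. A minimal Python sketch, assuming generator.conf is well-formed XML made of `<option name="...">` elements like the line above; the enclosing `<config>` root element and the `read_option` helper are hypothetical:

```python
import xml.etree.ElementTree as ET

def read_option(conf_xml, name):
    """Return the text of <option name="..."> in conf_xml, or None if absent."""
    root = ET.fromstring(conf_xml)
    for option in root.iter("option"):
        if option.get("name") == name:
            return (option.text or "").strip()
    return None

# Hypothetical file content: only the <option> line comes from this thread.
sample = """<config>
<option name="xs_crawl_ident">Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</option>
</config>"""

print(read_option(sample, "xs_crawl_ident"))
```

If the same check on the real file content returns `None`, the option was not saved as expected, for example because it was pasted outside the existing elements.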
Re: phpBB Locked forums not getting crawled
« Reply #10 on: July 27, 2010, 04:53:43 PM »
Thanks for the quick response.
I updated the file by adding the recommended setting.

Unfortunately, phpBB3 continues to see the crawler as a "Guest" and not as one of the "Bots"  :'(
In fact, the "Who is online" list shows the crawler as:
Mozilla/5.0 (compatible; XML Sitemaps Generator; https://www.xml-sitemaps.com) Gecko XML-Sitemaps/1.0  ???

The sitemap still does not include the forums hidden from guests. Any ideas on how to solve this?
Thanks for your help.
Re: phpBB Locked forums not getting crawled
« Reply #12 on: July 27, 2010, 05:57:00 PM »
Thanks it works perfectly!  ;D

Great job, congratulations!  ;)
Re: phpBB Locked forums not getting crawled
« Reply #13 on: August 11, 2013, 08:24:50 PM »
Greetings to all,
Unfortunately, I have to ask for your help again: after updating to version 6.0, the private phpBB forums are once again not being scanned. I modified the generator.conf file as I had done with version 4.0, but it no longer works  :'(
Please help me, because this feature is very important for my website.

Thanks in advance.
Re: phpBB Locked forums not getting crawled
« Reply #14 on: August 12, 2013, 09:00:14 PM »
Hello to all,
I managed to solve the problem simply by adding the crawler as a bot in phpBB, without changing any more code.
Despite this, many pages of the forum (over 50%) are still not indexed. I worked out that the problem is that all sub-forums and their topics are ignored by the generator; only topics posted directly under a top-level parent forum are scanned.
For example, the topics of the forum
[ External links are visible to forum administrators only ]
are properly indexed.
The sub-forums and related topics in the parent forum
[ External links are visible to forum administrators only ]
are ignored!
This is simply the problem that prevents my forum from being fully indexed!
I should point out that I have never changed the structure of the forums and sub-forums, and with the previous version 4.0 I had no problem. Now, with version 6.0, I have the problem described above.
How can I fix this, please?
If not, how can I reinstall version 4.0 (I no longer have the files)?
Thanks in advance for the help that I am sure you will give me.

Greetings