Welcome to Sitemap Generator Forum.
 

Moved Hosts Now Crawls 1 Page Only

Started by sharingsunshine, July 26, 2018, 10:09:25 PM


sharingsunshine

I have read the various posts pertaining to this problem. I can ping from my terminal, and I have set the correct permissions on the folders.

When I choose to crawl the site it comes back with one page only.

Please help.

Thanks,

Randal


sharingsunshine

Now I am getting this error when I attempt to create a sitemap.

[ External links are visible to forum administrators only ]

XML-Sitemaps Support

There is probably a configuration problem: it looks like your server doesn't allow local network connections via port 80 (http) or 443 (https), so the sitemap generator is not able to crawl the site. This is usually caused by a firewall installed on the host; could you please contact your hosting support about this?
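One quick way to verify local connectivity from the server itself is a bash TCP probe; this is a sketch assuming bash and coreutils `timeout` are available, with `www.yourdomain.com` as a placeholder for the real site:

```shell
# Probe whether a TCP port accepts connections, using bash's /dev/tcp
# pseudo-device. Prints "open" or "closed" for each host:port pair.
check_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port closed"
  fi
}

# www.yourdomain.com is a placeholder -- substitute your own domain.
check_port www.yourdomain.com 80
check_port www.yourdomain.com 443
```

If both ports report closed here but the site is reachable from outside, the firewall is likely blocking loopback or local-network connections specifically.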

sharingsunshine

You were correct and I have since fixed the problem and verified I can telnet into my server using both ports. 

However, I am still getting this error:  [ External links are visible to forum administrators only ]

XML-Sitemaps Support

Please create a test script to check that the connection is working:

<?php

$initurl = 'https://www.yourdomain.com';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $initurl);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$fdata = curl_exec($ch);
if ($errno = curl_errno($ch)) {
    $error_message = curl_error($ch);
    echo "cURL error ({$errno}):\n {$error_message}";
}
$info = curl_getinfo($ch);
print_r($info);
print_r($fdata);
curl_close($ch);

sharingsunshine

This is the page I get when I run that script: [ External links are visible to forum administrators only ]

I ran netstat and got this. From what I know, this seems to show the ports are open:

[root@ip-172-31-8-214 conf.d]# netstat -a | grep -i LISTEN
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN
tcp        0      0 localhost:smtp          0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:sunrpc          0.0.0.0:*               LISTEN
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN
tcp6       0      0 [::]:https              [::]:*                  LISTEN
tcp6       0      0 [::]:mysql              [::]:*                  LISTEN
tcp6       0      0 [::]:sunrpc             [::]:*                  LISTEN
tcp6       0      0 [::]:http               [::]:*                  LISTEN
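The LISTEN lines above can be reduced to just the service names as a sanity check; here is a minimal sketch fed from a sample of that output (on a live host, pipe in `netstat -a` instead). Note that on Linux a `tcp6 [::]` listener normally accepts IPv4 connections as well, so the http/https entries do indicate the ports are open:

```shell
# Reduce `netstat -a` style output to the unique service names that are
# in LISTEN state (field 4 is the local address, service after the colon).
listening_services() {
  grep -i 'LISTEN' | awk '{print $4}' | sed 's/.*://' | sort -u
}

# Sample taken from the netstat output above.
sample='tcp6       0      0 [::]:https              [::]:*                  LISTEN
tcp6       0      0 [::]:http               [::]:*                  LISTEN'

printf '%s\n' "$sample" | listening_services
```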

XML-Sitemaps Support

As seen in the screenshot, the test script receives a 403 Forbidden response. That means the port is open, but your website blocks access from our server's IP address.

XML-Sitemaps Support

It is also possible that your website blocks access based on the generator bot's user-agent. Please try adding this setting in the generator/data/generator.conf file:
<option name="xs_crawl_ident">Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0</option>
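To check whether the user-agent is what triggers the block, you can compare the status codes your site returns for a bot-like string versus a browser string. A sketch using the curl command line, where `www.yourdomain.com` is a placeholder and `SitemapBot/1.0` is a hypothetical bot-like user-agent (substitute the generator's real one):

```shell
# Fetch a URL with a given User-Agent and print only the HTTP status code.
http_status() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' -A "$2" "$1"
}

# SitemapBot/1.0 is a hypothetical bot-like UA; www.yourdomain.com is a placeholder.
echo "bot UA:     $(http_status 'https://www.yourdomain.com/' 'SitemapBot/1.0')"
echo "browser UA: $(http_status 'https://www.yourdomain.com/' 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0')"
```

If the first call returns 403 while the second returns 200, the site is filtering on user-agent, and the xs_crawl_ident setting above (or whitelisting the bot) should resolve it.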

sharingsunshine

You were correct; I have a plugin to stop bots. I never dreamed your bot's name would be in the bot table, but it was. Once I whitelisted it, everything worked perfectly.

Thanks for sticking with me on this.