Sitemap Generation interrupted.
« on: November 08, 2020, 10:15:38 AM »
From other messages here, I understand that generating sitemaps for sites with more pages needs a longer max execution time and more memory.

Could you give me a rough idea of how much execution time and memory would be enough in my case?

I have 10,000 (10k) pages for the generator to crawl.
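A rough way to reason about the time budget is pages × average time per page, divided by however many requests run in parallel. The 0.5 s per page used below is a made-up assumption, not a measured figure for this generator: real fetch time depends on the site, server and network. Memory use depends on the generator's internals and is not modelled here. `estimate_crawl_seconds` is a hypothetical helper.

```python
def estimate_crawl_seconds(pages: int, seconds_per_page: float, parallel_requests: int = 1) -> float:
    """Rough wall-clock estimate: total fetch time spread over parallel requests.

    seconds_per_page is an assumed average, not a measured value for any generator.
    """
    if pages < 0 or seconds_per_page < 0 or parallel_requests < 1:
        raise ValueError("pages and seconds_per_page must be >= 0, parallel_requests >= 1")
    return pages * seconds_per_page / parallel_requests


# Assumed 0.5 s per page, one request at a time, for the 10,000 pages in this post.
total = estimate_crawl_seconds(10_000, 0.5)
print(f"{total:.0f} s (~{total / 60:.0f} min) vs. a 3600 s limit")
```

Under that assumption a sequential crawl needs about 5,000 s, which is already past a one-hour limit; halving or doubling the per-page time moves the estimate proportionally.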
Re: Sitemap Generation interrupted.
« Reply #1 on: November 08, 2020, 11:00:42 AM »
Some follow-up questions:

I have increased the max execution time to 3600 s (1 hour), but the crawl still gets interrupted and doesn't complete in one go. It shows the error: Error 524

Note that I have Cloudflare in front of the site as a proxy server. Could Cloudflare be the cause?
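On the Cloudflare question: error 524 is Cloudflare's timeout error, returned when the origin server accepts the connection but does not send an HTTP response within Cloudflare's default 100-second timeout. Assuming the browser-started crawl keeps one HTTP request open through the proxy until it finishes, raising PHP's max_execution_time beyond that point would not help, because the proxy gives up first. The sketch below only encodes that comparison; `effective_request_limit` is a hypothetical helper.

```python
CLOUDFLARE_DEFAULT_TIMEOUT_S = 100  # Cloudflare's default wait for an origin response


def effective_request_limit(php_max_execution_time: int,
                            proxy_timeout: int = CLOUDFLARE_DEFAULT_TIMEOUT_S) -> int:
    """Seconds a single proxied request can run before something cuts it off.

    A PHP max_execution_time of 0 means "no PHP limit", so only the proxy applies.
    """
    if php_max_execution_time == 0:
        return proxy_timeout
    return min(php_max_execution_time, proxy_timeout)


print(effective_request_limit(3600))  # the proxy timeout is the binding limit
```

A crawl started from the command line does not go through the proxy at all, so this limit would not apply to it.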

I want to run the generator as a cron job, but first I want to make sure everything works with a manual crawl.
Re: Sitemap Generation interrupted.
« Reply #2 on: November 09, 2020, 09:11:29 AM »
Hello, 

In this case I would recommend running the generator from the command line, if you have SSH access to your server.
Re: Sitemap Generation interrupted.
« Reply #3 on: November 10, 2020, 10:54:18 AM »
The command-line approach got things done for me. Thank you for your help 😇
Re: Sitemap Generation interrupted.
« Reply #4 on: December 09, 2020, 03:51:51 PM »
Everything was working fine until now, when I started to see this error:

An error occured
There was an error while retrieving the URL specified: [ External links are visible to forum administrators only ]
HTTP Code:
HTTP/2 403
HTTP headers:
date: Wed, 09 Dec 2020 15:42:37 GMT
content-type: text/html; charset=UTF-8
cf-chl-bypass: 1
cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
expires: Thu, 01 Jan 1970 00:00:01 GMT
x-frame-options: SAMEORIGIN
cf-request-id: 06e9c4449e0000d58b91382000000001
expect-ct: max-

What should I do?
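One clue in this output is the header names. Headers with the `cf-` prefix (here `cf-chl-bypass` and `cf-request-id`) are normally added by Cloudflare's edge, so this 403 at least passed through Cloudflare and may have been produced by it rather than by the origin server; `cf-chl-bypass` appears to relate to Cloudflare's challenge pages, though nothing in the thread confirms which rule fired. The Python sketch below pulls those headers out of a pasted dump; `parse_header_dump` and `cloudflare_headers` are hypothetical helpers.

```python
def parse_header_dump(text: str) -> dict[str, str]:
    """Parse 'name: value' lines from a pasted HTTP response.

    Lines without a colon, or whose part before the colon contains a space
    (e.g. 'HTTP headers:'), are skipped, since header names contain no spaces.
    """
    headers: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name and " " not in name:
            headers[name.lower()] = value.strip()
    return headers


def cloudflare_headers(headers: dict[str, str]) -> list[str]:
    """Header names carrying Cloudflare's 'cf-' prefix, sorted."""
    return sorted(name for name in headers if name.startswith("cf-"))


dump = """HTTP/2 403
HTTP headers:
date: Wed, 09 Dec 2020 15:42:37 GMT
content-type: text/html; charset=UTF-8
cf-chl-bypass: 1
x-frame-options: SAMEORIGIN
cf-request-id: 06e9c4449e0000d58b91382000000001"""

print(cloudflare_headers(parse_header_dump(dump)))  # ['cf-chl-bypass', 'cf-request-id']
```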
Re: Sitemap Generation interrupted.
« Reply #5 on: December 09, 2020, 04:31:44 PM »
I checked today's error log:
[09-Nov-2020 10:46:04 UTC] PHP Deprecated:  idn_to_ascii(): INTL_IDNA_VARIANT_2003 is deprecated in /public_html/generator/pages/class.http.inc.php on line 120

I thought it might be helpful.
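For context, idn_to_ascii() converts an internationalized domain name to its ASCII (Punycode) form, and the log line only says that the IDNA 2003 variant is deprecated (PHP 7.2 and later emit this notice when INTL_IDNA_VARIANT_2003 is used). Assuming the crawled domain is the aaps.space host from the .htaccess posted later in this thread, it is already plain ASCII and the conversion leaves it unchanged, so this notice is probably unrelated to the 403. Python's built-in `idna` codec, which also follows IDNA 2003, illustrates the conversion; `to_ascii_domain` is a hypothetical helper.

```python
def to_ascii_domain(domain: str) -> str:
    """Convert a (possibly internationalized) domain name to its ASCII form."""
    # Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).
    return domain.encode("idna").decode("ascii")


print(to_ascii_domain("aaps.space"))      # aaps.space (already ASCII, unchanged)
print(to_ascii_domain("bücher.example"))  # xn--bcher-kva.example
```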
Re: Sitemap Generation interrupted.
« Reply #6 on: December 10, 2020, 07:22:38 AM »
Hello,

Looks like there is a configuration problem: your website appears to block access from local network connections, so the sitemap generator is not able to crawl the site.
Re: Sitemap Generation interrupted.
« Reply #7 on: December 13, 2020, 05:33:18 PM »
I contacted my hosting provider, and their response is below. I think the hosting team is sidestepping the issue and indirectly telling me, "Sorry, we can't let such scripts run on our server." I'm posting it because I'd like your opinion on it.

Response from hosting support:

I am able to replicate the issue.

I tried to execute the runcrawl.php file and got the error at the following URL:

+++
[ External links are visible to forum administrators only ]

+++

Upon checking the logs, I could not find anything related to this issue.

It seems that the server is blocking this as a security measure, since the script is not designed in a way that the server will accept the request.

I would suggest redesigning the crawler as described in the linked KB article and trying again. If the issue persists, please find an alternative way to crawl the web pages with the help of your developer.

+++
[ External links are visible to forum administrators only ]

+++

Thank you for understanding!
Re: Sitemap Generation interrupted.
« Reply #8 on: December 14, 2020, 09:44:29 AM »
Apparently your website blocks requests from the local IP address. Do you have a file named .htaccess in the domain folder? If so, what are its contents? (There might be a blocking directive in that file.)
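One quick way to look for such a directive is to scan the file for the usual Apache deny rules: `Deny from` (older mod_access_compat syntax), `Require not` and `Require ip`/`Require host` allow-lists (mod_authz_core), and `RewriteRule` lines carrying the `[F]` (forbidden) flag. The pattern list below is a hypothetical starting point, not exhaustive, and a clean result does not rule out blocks in the main server config or at a proxy in front of the site.

```python
import re

# Hypothetical starting list of Apache directives that commonly refuse requests.
BLOCKING_PATTERNS = {
    "Deny from (mod_access_compat)": re.compile(r"^\s*Deny\s+from\b", re.IGNORECASE),
    "Require not (mod_authz_core)": re.compile(r"^\s*Require\s+not\b", re.IGNORECASE),
    "Require ip/host allow-list (mod_authz_core)": re.compile(r"^\s*Require\s+(ip|host)\b", re.IGNORECASE),
    # Only the uppercase short flag form [F] is matched in this sketch.
    "RewriteRule with [F] flag (mod_rewrite)": re.compile(r"^\s*RewriteRule\b.*\[[^\]]*\bF\b[^\]]*\]"),
}


def find_blocking_lines(htaccess_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule label, line) for lines that look like access blocks."""
    hits = []
    for lineno, line in enumerate(htaccess_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):  # skip comment lines
            continue
        for label, pattern in BLOCKING_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


sample = """Order deny,allow
Deny from 127.0.0.1
# Deny from all
RewriteRule ^ - [F,L]"""

for hit in find_blocking_lines(sample):
    print(hit)
```

Run against the rewrite rules posted in this thread (a plain `[R,L]` redirect), this scan finds nothing.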
Re: Sitemap Generation interrupted.
« Reply #9 on: December 14, 2020, 10:47:25 AM »
The file contents are below:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^aaps.space$ [OR]
RewriteCond %{HTTP_HOST} ^mail.aaps.space$ [OR]
RewriteCond %{HTTP_HOST} ^www.aaps.space$
RewriteCond %{SERVER_PORT} 80
RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$
RewriteCond %{REQUEST_URI} !^/\.well-known/cpanel-dcv/[0-9a-zA-Z_-]+$
RewriteCond %{REQUEST_URI} !^/\.well-known/pki-validation/(?:\ Ballot169)?
RewriteCond %{REQUEST_URI} !^/\.well-known/pki-validation/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteRule ^(.*)$ [ External links are visible to forum administrators only ]$1 [R,L]

# php -- BEGIN cPanel-generated handler, do not edit
# This domain inherits the “PHP” package.
# php -- END cPanel-generated handler, do not edit

# BEGIN cPanel-generated php ini directives, do not edit
# Manual editing of this file may result in unexpected behavior.
# To make changes to this file, use the cPanel MultiPHP INI Editor (Home >> Software >> MultiPHP INI Editor)
# For more information, read our documentation ([ External links are visible to forum administrators only ])
<IfModule php7_module>
   php_flag display_errors Off
   php_value max_execution_time 9500
   php_value max_input_time 60
   php_value max_input_vars 1000
   php_value memory_limit 256M
   php_value post_max_size 260M
   php_value session.gc_maxlifetime 1440
   php_value session.save_path "/var/cpanel/php/sessions/ea-php73"
   php_value upload_max_filesize 256M
   php_flag zlib.output_compression Off
</IfModule>
<IfModule lsapi_module>
   php_flag display_errors Off
   php_value max_execution_time 9500
   php_value max_input_time 60
   php_value max_input_vars 1000
   php_value memory_limit 256M
   php_value post_max_size 260M
   php_value session.gc_maxlifetime 1440
   php_value session.save_path "/var/cpanel/php/sessions/ea-php73"
   php_value upload_max_filesize 256M
   php_flag zlib.output_compression Off
</IfModule>
# END cPanel-generated php ini directives, do not edit
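Read literally, the rewrite block only redirects plain-HTTP requests (SERVER_PORT matching `80`) for aaps.space, mail.aaps.space and www.aaps.space to another URL (the target is hidden in the post), skipping the certificate-validation paths under `/.well-known/`; `[R]` without a status code is a temporary 302 redirect. Nothing in it denies access. The sketch below mirrors those conditions so a given request can be checked; `redirect_applies` is a hypothetical helper, and Python's `re` stands in for Apache's regex engine.

```python
import re

# Conditions copied from the posted .htaccess. The three HTTP_HOST conditions are
# joined by [OR]; every other condition is ANDed with them (Apache's default).
HOST_PATTERNS = [r"^aaps.space$", r"^mail.aaps.space$", r"^www.aaps.space$"]
PORT_PATTERN = r"80"  # unanchored, exactly as in the file
# Conditions prefixed with "!" in the file: the rule is skipped when any of these match.
EXCLUDED_URI_PATTERNS = [
    r"^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$",
    r"^/\.well-known/cpanel-dcv/[0-9a-zA-Z_-]+$",
    r"^/\.well-known/pki-validation/(?:\ Ballot169)?",
    r"^/\.well-known/pki-validation/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$",
]


def redirect_applies(host: str, port: int, uri: str) -> bool:
    """Approximate whether the RewriteRule [R,L] redirect fires for a request."""
    host_ok = any(re.search(p, host) for p in HOST_PATTERNS)
    port_ok = re.search(PORT_PATTERN, str(port)) is not None
    uri_ok = not any(re.search(p, uri) for p in EXCLUDED_URI_PATTERNS)
    return host_ok and port_ok and uri_ok


print(redirect_applies("www.aaps.space", 80, "/some-page/"))   # True: plain HTTP is redirected
print(redirect_applies("www.aaps.space", 443, "/some-page/"))  # False: HTTPS is left alone
```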
Re: Sitemap Generation interrupted.
« Reply #10 on: December 14, 2020, 04:18:30 PM »
It looks like it's blocked somewhere else; you might need to ask your website developer to check this.
Re: Sitemap Generation interrupted.
« Reply #11 on: April 11, 2021, 07:27:05 PM »
I'm having a similar issue. I have a large number of pages to crawl because I run a baby name generator website. Are there any suggested settings to try? The crawl also gets interrupted and I have to keep restarting it. I appreciate any help.
Re: Sitemap Generation interrupted.
« Reply #12 on: April 14, 2021, 09:50:28 AM »
Hello, 

In this case I would recommend running the generator from the command line, if you have SSH access to your server.