interrupt.log does not work any more
« on: August 19, 2017, 08:59:29 PM »

I updated to 7.2 today. Before the update I had a last run with the 7.1 crawler and everything was fine.
I started generation via the web frontend and watched debug.log with tail -f.
I observe 503 errors in the log.

Regular pageviews on the server also give increasing numbers of 503 errors. To see whether the crawler causes too much load, I created an empty interrupt.log in the data directory.
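
Over SSH the equivalent would be something like this (the generator path is a placeholder for my installation; the data directory is where the generator keeps its files):

Code:
# an empty interrupt.log in the data directory signals the crawler to stop
touch /path/to/generator/data/interrupt.log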

I see the line "tail: debug.log: file truncated" and shortly after that the logging continues.

I examined the log and found the line "crawling completed". The crawler starts over again.
I even deleted runcrawl.php via FTP and created another interrupt.log.
It gets deleted shortly after crawling is interrupted, and the crawler starts over again.

Calling the frontend again in the browser gives 504 Gateway Time-out errors.

What can I do?

edit:
I closed the browser and opened the frontend in a new instance.
I clicked the stop link. It says the stop signal was sent, but back on the crawling tab it keeps saying "Crawling already in process".
« Last Edit: August 19, 2017, 09:29:40 PM by ebay147 »
Re: interrupt.log does not work any more
« Reply #2 on: August 20, 2017, 09:58:28 AM »
Hello Oleg,

it is still strange. Since I could not stop the crawler, I left it running overnight.
When I read your reply, the last line of debug.log read:

Code:
# RETRY -  - 0 - error() self-redir(-) badreq(0) forbreq() tmout()# zZz 1

The window within the crawling tab showed a connection error.
I reloaded the web frontend. The crawling tab looked as if the crawler wasn't running anymore, showed no interrupt link and offered to continue.

I deleted the crawl_state.log file via FTP.
Just as I was about to upload the interrupt.log file, I noticed that the debug.log had changed:
the last line had been written a second time.
So there might still have been a running instance of the crawler sleeping in a timeout.
I uploaded interrupt.log anyway.

The interrupt.log disappeared, and tail -f said that debug.log got truncated.
I reloaded the crawling tab and clicked "Run" with "Resume" and "Run in Background" unchecked.
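
Regarding the truncation message: plain tail -f notices when the file shrinks and keeps reading from the start, which is what I saw. If the crawler deleted and recreated the log instead, tail -F (follow by name) would be needed:

Code:
# -F re-opens debug.log by name, so it survives deletion/recreation as well as truncation
tail -F /path/to/generator/data/debug.log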

It started and debug.log started growing, but the crawling tab looks as if the crawler weren't running: it has no interrupt link and does not show the progress.

After 167 pages debug.log gets stuck.
It seems the crawler is missing the crawl_state.log now.
Must I provide a new one manually?



Re: interrupt.log does not work any more
« Reply #3 on: August 20, 2017, 11:43:37 AM »
Did that.
I first tried to run the crawler via SSH, but it terminated because my SSH access goes through a proxy.
It said it is meant to be run from the command line.
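
For reference, what I tried looked roughly like this (the generator path is a placeholder; runcrawl.php is the crawler script mentioned above):

Code:
# run the crawler from a real command line instead of the web frontend
cd /path/to/generator
php runcrawl.php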
 
I created crawl_state.log via FTP with permissions set to 666.
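
Over SSH the same thing would be (assuming crawl_state.log lives in the same data directory as interrupt.log):

Code:
# create an empty crawl state file and make it writable for the web server user
touch /path/to/generator/data/crawl_state.log
chmod 666 /path/to/generator/data/crawl_state.log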
It is running now, but I am still curious whether it will terminate after the full pass or start over.
To my astonishment, the crawl_state.log was deleted by the crawler. Perhaps it is absent because "Run in Background" was unchecked for this run.
« Last Edit: August 20, 2017, 11:51:56 AM by ebay147 »
Re: interrupt.log does not work any more
« Reply #4 on: August 20, 2017, 04:52:48 PM »
OK. The full pass ran to the end and the search engines were notified.

Started again, this time with "Resume" checked.
The crawler created a new crawl_state.log.
I'm curious whether it will stop at the end of the pass or start over again.
Re: interrupt.log does not work any more
« Reply #5 on: August 20, 2017, 08:34:03 PM »
It still doesn't stop crawling.
When the pass is completed, "crawling completed" is written to debug.log, and sitemap.xml and sitemap_images.xml are overwritten, as are their .gz versions.
Then debug.log gets truncated and crawling continues.
When I click the interrupt link in the web frontend, it displays "The "stop" signal has been sent to a crawler." The run button is displayed, as if it had really stopped, but tail -f debug.log shows that the crawler is still active.

When I upload an interrupt.log, it gets deleted; debug.log is truncated and starts growing again.

When I click "run" with "resume" and "Run in background" unchecked, the crawling tab shows progress somewhere around "Links depth: 3".


Am I the first to report this?
 
Re: interrupt.log does not work any more
« Reply #6 on: August 21, 2017, 05:15:20 AM »

It might take some time (up to a minute) until the generator is actually stopped.
Re: interrupt.log does not work any more
« Reply #7 on: August 21, 2017, 07:22:14 AM »
Hello Oleg,

yes, I know this.
What irritates me is that after updating to 7.2 the crawler does not stop after the sitemap is saved and the search engines are notified.
It starts over at "Links depth: 1" when "Run in Background" was checked at its start.

When I watch its activity with tail -f debug.log in an SSH window, I see that it reacts to a click on the interrupt link, or to an uploaded interrupt.log, by truncating debug.log, deleting the interrupt.log, and then starting to crawl again.
This goes on and on until I delete the crawl_state.log.
It wasn't like this with 7.1.
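
To check whether an instance really keeps running after a stop signal, something like this should work over SSH (assuming the PHP process still has runcrawl.php on its command line):

Code:
# give the crawler up to a minute to notice interrupt.log, then look for a surviving process
sleep 60
ps aux | grep '[r]uncrawl.php'   # the [r] keeps grep from matching itself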

Re: interrupt.log does not work any more
« Reply #8 on: August 21, 2017, 08:11:59 PM »
Perhaps you still have the browser window open, and it auto-restarts the crawler in that case.
Re: interrupt.log does not work any more
« Reply #9 on: August 21, 2017, 11:38:53 PM »
As far as I remember, previous versions of the crawler terminated after all pages were indexed, the XML files written and the search engines pinged.

I will go for another round to find out.

OK. That was it. The window must be closed to prevent the crawler from starting over after the pass is completed.