fluctuating number of indexed pages
« on: April 12, 2016, 09:26:06 AM »
Hi,

We have the same configuration on three different environments (one production, two test).
In each of our two test environments, the number of indexed pages varies between 6,000 and 31,000 (it should be 31,000).
  • the crawler runs are independent of each other (also in timing)
  • the variation pattern is not parallel between the two environments
  • any number between 6,000 and 31,000 seems possible
  • production environment shows correct and stable numbers

Any ideas about this?
Is there a more informative logging option?

Kind regards

Fritz
Re: fluctuating number of indexed pages
« Reply #1 on: April 13, 2016, 05:18:21 AM »
Hello,

It's possible that the server can't handle the number of requests the generator's crawler is sending. You can try the "Make delay" setting to slow down crawling and avoid overloading the server.
There are no additional log options in the sitemap generator.
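
To illustrate what a per-request delay does, here is a minimal Python sketch. It is not the generator's own code: the function name `paced` and its parameters are hypothetical, and how the "Make delay" setting is implemented internally is not known from this thread.

```python
import time
from typing import Callable, Iterable, Iterator


def paced(urls: Iterable[str], delay_seconds: float,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield URLs one at a time, pausing between consecutive items.

    Hypothetical helper: a crawler that fetches each yielded URL in turn
    sends at most one request per `delay_seconds` (plus fetch time).
    """
    first = True
    for url in urls:
        if not first:
            sleep(delay_seconds)  # pause before every request except the first
        first = False
        yield url
```

A crawl loop would then read `for url in paced(queue, 1.0): fetch(url)`, with `fetch` standing in for whatever performs the request.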
Huge daily differences in sitemap size, meta-noindex partially ignored
« Reply #2 on: April 21, 2016, 10:55:55 AM »
Our sitemap contains between 6,000 and 32,000 pages; it should always be 32,000.
Every day the changelog lists 1000 added and removed URLs.

Among those are pages that contain the following meta-tag:
Code: [Select]
   <meta name="robots" content="NOINDEX,FOLLOW">
One day they get falsely added, the next day correctly removed.

We have an almost identical production environment which works fine.
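
For reference, a noindex check on the tag above can be sketched in Python as follows. This is only an illustration under assumptions: the class and function names are hypothetical, and the sitemap generator's actual parsing logic is not shown anywhere in this thread.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content is a comma-separated, case-insensitive list
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def is_noindex(html: str) -> bool:
    """Return True if the document carries a robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

With a check of this kind, a document that arrives empty or truncated contains no meta tag at all, so `is_noindex("")` is false and the page would look indexable.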
Re: fluctuating number of indexed pages
« Reply #4 on: May 13, 2016, 12:10:57 PM »
I found a new hint.

When I activated the option "enable debug output", I was able to detect this:

Code: [Select]

*** *** https://xxx/psychotherapie/-ort-/e/
| 1,137.4 |

*** time: 5.24092411995 ***
| 1,137.4 |
(memory: 3,277.4 Kb)
| 1,137.4 |
[[[ 200 OK ]]] - 5.24s (0.00 + 0.00) array ( 'date' => 'Fri, 13 May 2016 09:43:19 GMT', 'server' => 'Apache', 'set-cookie' => 'fe_typo_user=a9200ca252196184cb43158e055131ed; path=/', 'vary' => 'Accept-Encoding,User-Agent', 'content-encoding' => 'gzip', 'cache-control' => 'max-age=0', 'expires' => 'Fri, 13 May 2016 09:43:19 GMT', 'content-length' => '9073', 'connection' => 'close', 'content-type' => 'text/html; charset=utf-8', 'x_csize' => 53312, )| 1,137.4 |
({skipped psychotherapie/-ort-/e/ - mrob})
| 1,137.4 |
[ 302 - psychotherapie/-ort-/f-g/, 1] | 1,137.4 |
{ https://xxx/psychotherapie/-ort-/f-g/ }

| 1,137.4 |

*** *** https://xxx/psychotherapie/-ort-/f-g/
| 1,143.7 |

*** time: 6.24322199821 ***
| 1,143.7 |
(memory: 3,270.1 Kb)
| 1,143.7 |
[[[ 200 OK ]]] - 6.24s (0.00 + 0.00) array ( 'date' => 'Fri, 13 May 2016 09:43:24 GMT', 'server' => 'Apache', 'set-cookie' => 'fe_typo_user=327fc51ff6bcb4176ef4c5c92b4d075d; path=/', 'vary' => 'Accept-Encoding,User-Agent', 'content-encoding' => 'gzip', 'cache-control' => 'max-age=0', 'expires' => 'Fri, 13 May 2016 09:43:24 GMT', 'content-length' => '10363', 'connection' => 'close', 'content-type' => 'text/html; charset=utf-8', 'x_csize' => 0, )| 1,143.7 |
((include https://xxx/psychotherapie/-ort-/f-g/))




This snippet shows two technically identical pages.
Both have the above-mentioned noindex meta tag.
Both get served by the server with "200 OK".
But the second gets falsely indexed, and its entry says "'x_csize' => 0,".
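
To find every such entry in a long debug output, a small Python sketch like the following could help. It assumes the log keeps the shape shown in the snippet (a `*** *** <url>` line opening each entry, followed by a header dump containing `'x_csize' => N`); the function name is hypothetical.

```python
import re

# "*** *** https://..." opens a page entry; "*** time: ... ***" does not match.
URL_LINE = re.compile(r"^\*\*\* \*\*\* (\S+)")
CSIZE = re.compile(r"'x_csize' => (\d+)")


def find_empty_bodies(log_text: str) -> list:
    """Return the URLs whose log entry reports 'x_csize' => 0."""
    current = None
    empty = []
    for line in log_text.splitlines():
        url_match = URL_LINE.match(line)
        if url_match:
            current = url_match.group(1)
            continue
        size_match = CSIZE.search(line)
        if size_match and current is not None and int(size_match.group(1)) == 0:
            empty.append(current)
    return empty
```

Running it over saved debug output, for example `find_empty_bodies(open("debug.txt").read())` with a hypothetical file name, gives the list of pages to compare against the falsely added URLs.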

Is this the cause?
What is going wrong here?
How can we fix it?

Kind regards

Fritz
« Last Edit: May 13, 2016, 12:20:51 PM by psyche »
Re: fluctuating number of indexed pages
« Reply #5 on: May 14, 2016, 10:34:45 AM »
It means that the server returned an empty response, possibly because it could not handle that many requests, as mentioned above.
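
As a general illustration of how a client can guard against this, the Python sketch below treats a "200 OK" with an empty body as a failed fetch and retries with a growing pause. It is a sketch under assumptions, not something the sitemap generator is known to do: `fetch_nonempty`, its parameters, and the `fetch` callable are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_nonempty(fetch: Callable[[str], Tuple[int, bytes]], url: str,
                   attempts: int = 3, base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[bytes]:
    """Return the body of `url`, or None if every attempt came back empty.

    `fetch` is any callable returning (status_code, body_bytes).
    """
    for attempt in range(attempts):
        status, body = fetch(url)
        if status == 200 and body:
            return body
        if attempt < attempts - 1:
            # back off before retrying: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

A page for which this returns `None` would be left out of the current run rather than evaluated as if it had no meta tags.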