I have one big question. I have a few sites with more than a million pages (up to 3 million), and I would like to somehow gather or analyze the site structure before I even start the crawl, so I could exclude things, etc.
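For example, one way to preview structure without crawling would be to pull the site's sitemap.xml and count URLs per top-level path segment. This is just a rough sketch of the idea (the sitemap snippet and `example.com` URLs below are made up for illustration):

```python
from collections import Counter
from urllib.parse import urlparse
from xml.etree import ElementTree

# Hypothetical sitemap snippet standing in for a real site's sitemap.xml;
# in practice you would download https://example.com/sitemap.xml first.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post-1</loc></url>
  <url><loc>https://example.com/blog/post-2</loc></url>
  <url><loc>https://example.com/shop/item-1</loc></url>
</urlset>"""

def section_counts(sitemap_xml: str) -> Counter:
    """Count URLs per top-level path segment to preview site structure."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ElementTree.fromstring(sitemap_xml)
    counts = Counter()
    for loc in root.findall(".//sm:loc", ns):
        path = urlparse(loc.text.strip()).path
        segment = path.strip("/").split("/")[0] or "(root)"
        counts[segment] += 1
    return counts

print(section_counts(SITEMAP))  # Counter({'blog': 2, 'shop': 1})
```

With those counts you could decide up front which sections (e.g. `/shop/`) to exclude before the crawl ever starts.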
Or maybe something like this: a "crawl 1 out of every N" option, where N is an integer from 1 to 100. So with 1 out of 10, a 250k-page site would be cut down to 25k pages to scan.
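A "1 out of every N" filter could be sketched like this. Hashing each URL (rather than counting every Nth page found) keeps the sample deterministic across repeat crawls; this is just an illustration of the idea, not a feature of any particular crawler:

```python
import hashlib

def keep_url(url: str, n: int) -> bool:
    """Keep roughly 1 out of every n URLs, chosen deterministically
    by hashing the URL so repeat crawls sample the same pages."""
    if not 1 <= n <= 100:
        raise ValueError("n must be between 1 and 100")
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % n == 0

# With n=10, about a tenth of the URLs survive the filter, so a
# 250k-page site yields roughly 25k pages to scan.
urls = [f"https://example.com/page/{i}" for i in range(250_000)]
sampled = [u for u in urls if keep_url(u, 10)]
print(len(sampled))  # roughly 25,000
```

With n=1 everything is kept, so the same code path covers both full and sampled crawls.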
If you have any ideas, I would love to hear them...