Article Directory Not Being Crawled
« on: July 22, 2008, 03:36:19 PM »
I have over 100,000 articles in my directory, but the crawler only crawls the index page and finds 66 URLs, which is about the same as the number of categories. It does not drill down anywhere from the main page. I know the article pages can be indexed, as Google has indexed over 10,000 of them.

All pages are dynamically produced, so a URL of an article may look like:
www.*******.com/Article/The-10-Commandments-of-Online-Horse-Racing-Betting/110032

There are no robots rules blocking it, either in robots.txt or in META tags.

I've been piddling about with the software for around 2 hours and have lost the will to live  :D

Any help appreciated.  :)
Re: Article Directory Not Being Crawled
« Reply #1 on: July 22, 2008, 04:33:46 PM »
Hello,

Please try our SE bot simulator: enter your category URL and see whether the individual page URLs are found correctly.
Re: Article Directory Not Being Crawled
« Reply #2 on: July 22, 2008, 05:57:07 PM »
 ??? All the categories show up as external links, which they obviously are not, and Google, Yahoo, etc. spider them with no problem.

So what settings do I need?
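
For anyone hitting the same symptom: one possible reason a crawler reports same-site links as external is a hostname mismatch between the start URL and the links on the page (for example www versus non-www). That is only an assumption here, since the thread does not show the site's HTML. The sketch below is in Python rather than the generator's PHP, and `classify_links`, `normalize_host`, and the example.com URLs are hypothetical names used purely for illustration; it is not how the generator itself works.

```python
from urllib.parse import urljoin, urlsplit


def normalize_host(host):
    """Lower-case a hostname and drop a leading 'www.' (illustrative rule only)."""
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_links(page_url, hrefs, ignore_www=False):
    """Label each href found on page_url as 'internal' or 'external'.

    With ignore_www=False the hostnames must match exactly, so a crawl
    started at example.com treats www.example.com links as external.
    """
    base_host = (urlsplit(page_url).hostname or "").lower()
    labels = {}
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links first
        host = (urlsplit(absolute).hostname or "").lower()
        if ignore_www:
            same_site = normalize_host(host) == normalize_host(base_host)
        else:
            same_site = host == base_host
        labels[absolute] = "internal" if same_site else "external"
    return labels


if __name__ == "__main__":
    links = ["http://www.example.com/Category/Sports", "/Article/Some-Title/1"]
    for url, label in classify_links("http://example.com/", links).items():
        print(label, url)
```

If the strict comparison marks the category links as external while the relaxed one does not, starting the crawl from the same hostname the site links to would be the first thing to try.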
Re: Article Directory Not Being Crawled
« Reply #3 on: July 23, 2008, 05:26:38 PM »
Found the problem. I ran it as a cron job so I could capture the errors that flash up and then disappear straight away.

Your software uses fopen; we use cURL for security.

Warning: fopen(/home/w2wart/public_html/site/data/urllist.txt): failed to open stream: Permission denied in /home/w2wart/public_html/site/pages/class.xml-creator.inc.php(2) : eval()'d code on line 95

Warning: fwrite(): supplied argument is not a valid stream resource in /home/w2wart/public_html/site/pages/class.xml-creator.inc.php(2) : eval()'d code on line 177

Any chance of an updated version with CURL? Easy to do.  ;)
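
A side note on the warnings quoted above: both refer to a local file, data/urllist.txt, and the fwrite() warning follows from the failed fopen(), so a quick check of whether that file and its directory are writable by the user running the script may be worth doing as well. The sketch below is a Python illustration, not part of the generator; `check_writable` and the example path are hypothetical.

```python
import os


def check_writable(path):
    """Return a short diagnosis of whether `path` could be opened for writing."""
    directory = os.path.dirname(path) or "."
    if os.path.exists(path):
        # Existing file: the current user needs write permission on it.
        return "ok" if os.access(path, os.W_OK) else "file not writable"
    if not os.path.isdir(directory):
        return "directory missing"
    # New file: the directory must allow creating entries in it.
    if os.access(directory, os.W_OK | os.X_OK):
        return "ok"
    return "directory not writable"


if __name__ == "__main__":
    # Hypothetical path; substitute the one named in your own warning.
    print(check_writable("data/urllist.txt"))
```

Run it as the same user that runs the cron job, since permissions can differ between the web server user and the shell user.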
Re: Article Directory Not Being Crawled
« Reply #4 on: July 25, 2008, 04:12:00 PM »
Hello,

Sure, just enable this option in the config.inc.php file:
   'xs_usecurl'=>1,