totally confused - timeout issue?
« on: June 22, 2011, 02:21:47 PM »
I use HostGator hosting.  I want to use this for a real estate website to scan property listings.  There will probably be a total of 8000-10000 pages.  When I run it, I have a couple of issues.

I got to this point:

Links depth: 3
Current page: idx/mls-m5820781-360_snapdragon_loop_bradenton_fl_34212
Pages added to sitemap: 2243
Pages scanned: 2640 (154,773.5 KB)
Pages left: 2546 (+ 1954 queued for the next depth level)
Time passed: 0:14:29
Time left: 0:13:58
Memory usage: 6,424.4 Kb

Then it stopped scanning and I kept getting the message that the server is not responding, and it keeps counting up to 120 seconds.  I contacted HostGator to increase the memory and timeout limits, and they say they cannot because I am on shared hosting, but I have a 64 MB memory allocation.  It appears this only used 6.4 MB... but they claim that the pages-scanned figure (154,773.5 KB) exceeds it and could be the issue.

I am confused, because if it is the memory allocation, I am only at about 6.4 MB, but if it is the pages-scanned total of 154 MB (before timing out), then that is more than double their max allocation.  I suspect that this is not the issue.  I asked if I am mistakenly at 6.4 MB, and they said no, I am at 64 MB.
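For what it's worth, the two numbers measure different things: "Memory usage" is what the script holds in RAM at any one moment, while the figure next to "Pages scanned" looks like the cumulative size of everything downloaded over the whole crawl. A minimal Python sketch of that distinction (all names here are illustrative, not the generator's actual internals):

```python
def extract_links(body):
    """Stand-in for the generator's link parser."""
    return [w for w in body.split() if w.startswith("idx/")]

def crawl(pages):
    total_kb = 0.0
    for body in pages:                  # each page body is held only briefly
        total_kb += len(body) / 1024.0  # cumulative transfer, not memory
        extract_links(body)             # parse, then the body is discarded
    return total_kb

# 2640 pages of ~60 KB each add up to ~155 MB transferred,
# yet only one page body is ever in memory at a time.
```

So a 154 MB "pages scanned" total does not by itself mean the 64 MB memory limit was hit.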

What are your thoughts on why this is timing out?
What hosting companies can support 10000-20000 inclusions in a sitemap on shared hosting, or is that not possible?

The other problem I am having: I tried to use about 60 links in the INCLUDE ONLY and PARSE ONLY sections (separated by a space for each), which would scan properties by zip code (60 zip codes) in order to cover all the property pages, but the program does not honor the include-only entries... it still seems to scan my entire site. (I tried it for one zip code first, and saw it was scanning properties in other zip codes that it presumably found via other links on my website itself.)
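A possible explanation for that behavior, sketched in Python: filters like "Include only" typically decide which URLs are *written to the sitemap*, while the crawler still fetches and follows every link it discovers on the site. This is an assumption about how the generator works, with example URLs taken from the thread:

```python
def crawl(found_links, include_patterns):
    """Toy crawl: fetch everything, but include only matching URLs."""
    scanned, included = 0, []
    for url in found_links:
        scanned += 1                               # every page is still fetched
        if any(p in url for p in include_patterns):
            included.append(url)                   # only matches reach the sitemap
    return scanned, included

scanned, included = crawl(
    ["idx/mls-m5798223-5002_e_18th_st_bradenton_fl_34203",
     "idx/page-3?idx-q-Counties=Sarasota"],
    ["34203"],                                     # e.g. a zip-code pattern
)
# both pages get scanned, but only the 34203 listing is included
```

If that assumption holds, limiting which pages get *crawled* would be the job of the "Parse only" option rather than "Include only".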
Re: totally confused - timeout issue?
« Reply #1 on: June 22, 2011, 03:54:54 PM »

Please let me know your generator URL/login in a private message so I can check this.
Re: totally confused - timeout issue?
« Reply #2 on: June 22, 2011, 08:53:46 PM »
Thanks! I sent a PM a little while ago
Re: totally confused - timeout issue?
« Reply #3 on: June 23, 2011, 03:39:57 AM »
I have tried a whole bunch of different things to get this to work properly, but have failed.  I have gotten it to scan the website, but it scans everything, not just property listings... and, as I stated in the first post, it times out.  HostGator cannot adjust it.  I have tried a bunch of different approaches.

Basically, I just want this sitemap to scan the dynamic property listing pages created by an IDX plugin, so I figured the zip-code URL links would be the best to use rather than have it scan my whole site for all the different types of search links.  I still have them in the settings.  Can you take a look and see what is wrong?  I obviously don't have it set right, because now it doesn't scan anything.

As a side note, there are two warnings in the ANALYZE tab as well.

Re: totally confused - timeout issue?
« Reply #4 on: June 23, 2011, 05:39:43 PM »

I've updated your crawler settings, please check now.

> As a side note, there are two warnings in the ANALYZE tab as well.
The Analyze tab will not work until the sitemap is completed.
Re: totally confused - timeout issue?
« Reply #5 on: June 23, 2011, 08:31:17 PM »
Thanks for the assistance.  I ran it... but I got this message after 35 minutes or so:

Links depth: 4
Current page: idx/mls-a3940673-7136_presidio_gln_lakewood_ranch_fl_34202
Pages added to sitemap: 5081
Pages scanned: 5860 (137,046.1 KB)
Pages left: 533 (+ 705 queued for the next depth level)
Time passed: 0:35:33
Time left: 0:03:14
Memory usage: 7,794.7 Kb

I am not very knowledgeable with this stuff... but I think I need to find ways to cut down the redundant listings as part of the issue.  I looked at the dump log and here is a small piece of it...  It looks like it is scanning all my IDX links rather than just the zip-code links...

    145 => 'idx/page-3?idx-q-Counties=Sarasota&idx-q-DistressTypes=1&idx-q-PriceMax=200000&idx-q-PriceMin=100000&idx-q-PropertyTypes=180',
    146 => 'idx/page-8?idx-q-Counties=Sarasota&idx-q-DistressTypes=1&idx-q-PriceMax=200000&idx-q-PriceMin=100000&idx-q-PropertyTypes=180',
    147 => 'idx/page-3?idx-q-Counties=Sarasota&idx-q-DistressTypes=1&idx-q-PriceMax=300000&idx-q-PriceMin=200000&idx-q-PropertyTypes=180',
    148 => 'idx/64135--short-sale-homes-in-sarasota/page-4?idx-d-SortOrders%3C0%3E-Column=Price&idx-d-SortOrders%3C0%3E-Direction=DESC',
    149 => 'idx/64135--short-sale-homes-in-sarasota/page-11?idx-d-SortOrders%3C0%3E-Column=Price&idx-d-SortOrders%3C0%3E-Direction=DESC',
    150 => 'idx/mls-m5798223-5002_e_18th_st_bradenton_fl_34203',
    151 => 'idx/mls-m5802622-4420_sanibel_way_bradenton_fl_34203',

One problem with that, I am thinking, is that I can end up with duplicate listings (many times over in some cases) in the sitemap if I don't stick to one method of pulling listing results.

For instance, idx/19644-downtown-condos-sarasota-fl-/page-5?idx-d-SortOrders%3C0%3E-Column=Price&idx-d-SortOrders%3C0%3E-Direction=DESC is one example, which is pulling properties for downtown condos.  But then idx/page-3?idx-q-Counties=Sarasota&idx-q-DistressTypes=1&idx-q-PriceMax=200000&idx-q-PriceMin=100000&idx-q-PropertyTypes=180 is pulling condo distressed sales in the 100-200k price range... and another link could be pulling completely different condo criteria.  The same listings could meet a ton of the different search-link criteria that I have on my site.  I have 400 pages of real estate listings sorted by all different criteria, so I am sure there are lots of duplicates due to listings matching different criteria.

Considering I am having some sort of timeout or memory issue (or maybe a different issue causing it to cease after 30 minutes or so)... I think part of the resolution is to aim for efficiency in the scanning... and the most efficient way to go is to just scan zip codes, because then every property will only be counted once.  Is there a way to do that?
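One way to picture that de-duplication goal: every individual listing URL in the dump log carries an MLS number (`idx/mls-m5798223-...`), so the same property reached through many different search pages can be collapsed by that ID. A hedged Python sketch; the regex is my assumption based on the URLs shown above:

```python
import re

# Assumed listing-URL shape, taken from the dump log: idx/mls-<letter+digits>-...
MLS_RE = re.compile(r"idx/mls-([a-z]\d+)-")

def dedupe_listings(urls):
    """Keep each MLS listing once; pass non-listing URLs through unchanged."""
    seen, unique = set(), []
    for u in urls:
        m = MLS_RE.search(u)
        key = m.group(1) if m else u      # fall back to the full URL
        if key not in seen:
            seen.add(key)
            unique.append(u)
    return unique
```

Crawling only the zip-code pages achieves the same effect at the source, since each property appears in exactly one zip code.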

I have about 30 zip codes, and as an example, here is a link that would pull all properties in zip code 34201:

[ External links are visible to forum administrators only ]

As far as that message goes, I have a 64 MB memory limit on the HostGator server.  Is that the issue with it stopping after 30 minutes or so?  They say that I get a 64 MB memory limit, and it doesn't appear from the message that I have used that.  They also say I cannot adjust the timeout, and that it is set to 30 seconds!  It obviously appears I am running a lot longer than 30 seconds... more like 30 minutes before it stops, so I am unsure what is causing it.
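As an aside, a 30-second PHP execution limit and a 30-minute crawl are not necessarily in conflict: web-based crawlers commonly work in short chunks, saving their queue so the next request can resume where the last one stopped, and no single request exceeds the server limit. A rough Python sketch of that pattern (purely illustrative, not the generator's real code):

```python
import time

TIME_BUDGET = 25  # seconds; stop safely before a hypothetical 30 s server limit

def crawl_chunk(queue, state):
    """Process the queue until the time budget runs out, then hand back state."""
    start = time.monotonic()
    while queue and time.monotonic() - start < TIME_BUDGET:
        url = queue.pop(0)
        state["scanned"] += 1
        # ... fetch url, extract links, append new ones to queue ...
    return queue, state  # persisted, so the next invocation resumes here
```

Under that model, the total crawl time is limited by something else (e.g. the "server is not responding" fetch timeouts), not by the 30-second per-request cap.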

Also, another person I know uses this with property listings, and said to make sure it indexes the links but doesn't crawl them, because crawling takes too long.  Does that make sense, and if so, how do I do that?

My goal is to get all of the 10,000-15,000 or so dynamic listings into the sitemap.
Thanks so much for the assistance!!!!  I really appreciate it.
Re: totally confused - timeout issue?
« Reply #6 on: June 23, 2011, 11:41:08 PM »
Interesting... I just got an email that the sitemap was done.  It seems your changes have worked much better.  It had appeared to me that it had stalled when I sent that last post: it went about 35 minutes, like previously, then stalled and had gone over 5,000 seconds without an update before I posted.  Not sure if it will work again or not.  I will test again.

It has about 13,000 links in the sitemap, so at least I know it can be done with my server... I am close... but they are all kinds of different IDX links, so I know there are lots of duplicate listing URLs in the sitemap.

Basically, I want to get just the unique individual listing URLs into my sitemap (acquired by zip codes).  The following is an example link for the 34201 zip code that would show all listings in that zip code.  What changes do I need to make to the settings to JUST scan the 30 zip codes (one example below) for listings, BUT have the individual listings appear in the sitemap?

[ External links are visible to forum administrators only ]
Re: totally confused - timeout issue?
« Reply #7 on: June 26, 2011, 03:58:25 PM »

I think I got it set up as you want it now:
1. "Parse Only" option set to:
Code: [Select]
search-sarasota-zipcodes/ idx/zip/

search-sarasota-zipcodes/ contains a list of zip code pages, so we need to parse it too, while idx/zip/ contains the links to listings.

2. "Include Only" option:
Code: [Select]
So that *just* individual listing pages are added to the sitemap.
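The two options above can be pictured as a pair of filters: "Parse only" controls which pages the crawler follows links on, and "Include only" controls which URLs reach the sitemap. A Python sketch under that assumption; the `idx/mls-` include pattern is my guess based on the listing URLs in the dump log, since the actual value is not shown in the post:

```python
PARSE_ONLY = ["search-sarasota-zipcodes/", "idx/zip/"]  # from the reply above
INCLUDE_ONLY = ["idx/mls-"]  # assumed pattern for individual listing pages

def should_parse(url):
    """Follow links only on the zip-code index pages."""
    return any(p in url for p in PARSE_ONLY)

def should_include(url):
    """Write only individual listings into the sitemap."""
    return any(p in url for p in INCLUDE_ONLY)
```

With this split, search-result pages in other formats are never parsed, so each listing is discovered once via its zip code and appears once in the sitemap.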
Re: totally confused - timeout issue?
« Reply #8 on: June 28, 2011, 06:38:00 PM »
Thanks Oleg!  I really appreciate your help. That did the trick!