sitemap for ecommerce site
« on: October 16, 2020, 09:11:05 AM »
I have a multivendor ecommerce site with 72,000 products and 5 or 6 vendors.
In addition to adding cart URLs and many copies of the same pages with parameters, like page=3 and url_currency='usd' etc.,
it also pulls in a lot of irrelevant links.
My first scan still has 2 MORE hours to go, and it has already scanned 112,000 pages.

So I'm thinking that I only care about the products and the sellers' store info.
From an SEO standpoint, if I index only these 2 link types, will that do the job?
What other link types should I include?
The site has a very high product turnover rate (700-800 added or deleted a week), so I want to run it weekly.

Re: sitemap for ecommerce site
« Reply #1 on: October 17, 2020, 07:58:08 AM »
Hello,

You would need to use the Exclude URLs setting to avoid crawling those pages:
Code:
cart/
currency=
...
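A rough illustration of how such a list prunes the crawl, assuming the Exclude URLs setting skips any URL that contains one of the listed fragments as a plain substring (the generator's exact matching rules are not described in this thread). The `shop.example` URLs are made up for illustration.

```python
from typing import Iterable, List

# The two fragments from the reply above; its "..." stands for further
# site-specific patterns that are not reproduced here.
EXCLUDE_PATTERNS = ["cart/", "currency="]


def filter_urls(urls: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only URLs that contain none of the exclude fragments.

    Assumes simple substring matching, which may not be exactly how the
    generator's Exclude URLs setting behaves.
    """
    pats = list(patterns)
    return [u for u in urls if not any(p in u for p in pats)]


if __name__ == "__main__":
    sample = [
        "https://shop.example/listing/germany-1214-used-weber-conducting-bp19710",
        "https://shop.example/cart/add?id=2225891",
        "https://shop.example/listing/x?url_currency=usd",
    ]
    for url in filter_urls(sample, EXCLUDE_PATTERNS):
        print(url)
```

Because matching is by substring in this sketch, `url_currency=usd` is caught by `currency=`. A broader pattern such as `page=` would need more care, since paginated listing pages may be the only path by which the crawler reaches some products.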
Re: sitemap for ecommerce site
« Reply #2 on: October 18, 2020, 01:41:50 PM »
OK, it's getting the links I want now,
except that only 17,000 PRODUCT links were produced.
I have 70,000+ products that I expected to be included.
I use "listing/" to get all the product links.
It also took a horrific amount of time to complete, stopping every 15-20 minutes.
I'm thinking that running the cron job every 30 minutes will keep it going.
So my question about that is: once the job is complete and cron is still launching it every 30 minutes, does it create a brand new sitemap?
What is your advice on these 3 concerns?

Thanks
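If cron relaunches the generator every 30 minutes, one common safeguard is to skip a launch while the previous run is still alive. A minimal Unix-only sketch using an advisory `fcntl.flock` lock; the lock-file path is hypothetical, and whether the generator already guards against concurrent runs on its own is not stated in this thread.

```python
import fcntl
import os
import tempfile
from typing import Optional, TextIO

# Hypothetical lock-file location in the system temp directory.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "sitemap-generator.lock")


def try_acquire(path: str) -> Optional[TextIO]:
    """Return an open file holding an exclusive lock, or None if another run holds it."""
    # Append mode, so a failed attempt never truncates the file.
    handle = open(path, "a")
    try:
        # Non-blocking exclusive lock: raises immediately if it is already held.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # The lock lasts until this handle is closed or the process exits.
    return handle


if __name__ == "__main__":
    lock = try_acquire(LOCK_PATH)
    if lock is None:
        print("previous run still active; skipping this cron slot")
    else:
        print("lock acquired; start or resume the generator from here")
```

Only the cron slot that obtains the lock would launch the crawl; whether a launch after a finished run starts a fresh sitemap still depends on the generator's own resume behaviour.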

Re: sitemap for ecommerce site
« Reply #3 on: October 18, 2020, 01:55:49 PM »
ALSO

I'm seeing this in the HTML sitemap:

germany-1214-used-weber-conducting-bp19710/1 page
Germany 1214 Used Weber Conducting (BP19710) - LINK

australia-562-used-star-saphire-2-1973-bp55721/1 page
Australia 562 Used Star Saphire 2 1973 (BP55721)  - LINK

AFTER PAGE 7 (of 20), the name is no longer used as the link text; the product ID is:

russia-503-used-turkment-1933-cv-200-r0753/1 page
2225891 - LINK
morocco-3-5-used-sultan-mohammad-1956-m0264/1 page
2205244 - LINK
Re: sitemap for ecommerce site
« Reply #4 on: October 18, 2020, 04:28:15 PM »
Hello,

Could you please PM me your generator URL, an example URL that is not included in the sitemap, and how that URL can be reached starting from the homepage?

The crawling time itself depends mainly on the website's page generation time, since the generator crawls the site much like search engine bots do.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in 1,000 seconds, or about 17 minutes.
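The same arithmetic, extended to the roughly 72,000 product pages from the first post, as a small sketch. It assumes pages are fetched one at a time at a fixed cost per page and ignores any extra delay between requests.

```python
from datetime import timedelta


def estimate_crawl_time(pages: int, seconds_per_page: float) -> timedelta:
    """Sequential crawl estimate: every page costs `seconds_per_page` to fetch."""
    return timedelta(seconds=pages * seconds_per_page)


if __name__ == "__main__":
    print(estimate_crawl_time(1_000, 1.0))   # 0:16:40
    print(estimate_crawl_time(72_000, 1.0))  # 20:00:00
```

At 1 second per page, the product pages alone would need about 20 hours of uninterrupted crawling, before counting any other pages the crawler visits along the way.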

Re: sitemap for ecommerce site
« Reply #5 on: October 20, 2020, 01:36:13 PM »
Well, it has been running on an hourly cron for 48 hours and has so far indexed 10,000 of the 72,000+ expected product pages, because it stops every 15-35 minutes.
My server is not complaining.
Are there any logs to look at other than the server error log, which shows thousands of 406 errors for unrelated domains?
I'm also piping the cron output to a file; it looks something like this, with an entry for each hour:

Resuming the last session (last updated: 2020-10-20 11:38:01)
12329 | 0 | 1,560,112.4 | 4:25:26 | 0:00:00 | 4 | 41,360.7 Kb | 11702 | 6937 | 41360
12333 | 7805 | 1,560,596.1 | 4:25:31 | 2:48:02 | 4 | 43,147.2 Kb | 11706 | 6940 | 43147
12340 | 7798 | 1,561,472.8 | 4:25:35 | 2:47:50 | 4 | 43,168.3 Kb | 11713 | 6944 | 43168
12344 | 7794 | 1,561,966.0 | 4:25:40 | 2:47:44 | 4 | 43,164.7 Kb | 11717 | 6945 | 43164
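Even without documentation for each column, the cron output can give a rough progress rate. A sketch that parses the lines above, assuming (this is not confirmed anywhere in the thread) that the first column is pages crawled so far and the fourth is elapsed time in H:MM:SS.

```python
from datetime import timedelta
from typing import List, Tuple

# Assumption: column 1 = pages crawled so far, column 4 = elapsed time (H:MM:SS).
LOG = """\
Resuming the last session (last updated: 2020-10-20 11:38:01)
12329 | 0 | 1,560,112.4 | 4:25:26 | 0:00:00 | 4 | 41,360.7 Kb | 11702 | 6937 | 41360
12333 | 7805 | 1,560,596.1 | 4:25:31 | 2:48:02 | 4 | 43,147.2 Kb | 11706 | 6940 | 43147
12340 | 7798 | 1,561,472.8 | 4:25:35 | 2:47:50 | 4 | 43,168.3 Kb | 11713 | 6944 | 43168
12344 | 7794 | 1,561,966.0 | 4:25:40 | 2:47:44 | 4 | 43,164.7 Kb | 11717 | 6945 | 43164
"""


def parse_hms(text: str) -> timedelta:
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def progress_points(log: str) -> List[Tuple[int, timedelta]]:
    points = []
    for line in log.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) < 4 or not cols[0].isdigit():
            continue  # skip "Resuming the last session" and other non-table lines
        points.append((int(cols[0]), parse_hms(cols[3])))
    return points


def pages_per_second(points: List[Tuple[int, timedelta]]) -> float:
    # Rate between the first and last sample (needs two distinct timestamps).
    (p0, t0), (p1, t1) = points[0], points[-1]
    return (p1 - p0) / (t1 - t0).total_seconds()


if __name__ == "__main__":
    rate = pages_per_second(progress_points(LOG))
    print(f"{rate:.2f} pages/s")
```

Under that column assumption, these four lines give 15 pages in 14 seconds, roughly one page per second.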

Can you suggest a strategy?
I want to run the generator every day of the week during certain hours to minimize the server impact, so that it finishes on a certain day, OR make it run fast enough to complete on the day before I want the new sitemap.
The reason is that my products are very dynamic: HUNDREDS and HUNDREDS of additions, deletions and modifications every week.
I am using robots.txt; does the crawl delay set there affect the sitemap generator? It is currently set to 10.
In your app it is set to 1.
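Whether the generator honours the robots.txt delay of 10 or only its own setting of 1 is for the developer to confirm, but a quick calculation shows which schedules could fit a weekly cycle. This sketch assumes about 1 second to generate each page (the example figure used in this thread) plus a fixed pause per request, both in seconds; the 6-hour daily window is hypothetical.

```python
import math


def days_needed(pages: int, fetch_seconds: float, delay_seconds: float,
                hours_per_day: float) -> int:
    """Whole days needed if crawling is only allowed `hours_per_day` per day.

    Assumes each page costs its fetch time plus a fixed pause between requests.
    """
    total_seconds = pages * (fetch_seconds + delay_seconds)
    per_day = hours_per_day * 3600
    return math.ceil(total_seconds / per_day)


if __name__ == "__main__":
    for delay in (1.0, 10.0):
        for window in (6, 24):
            print(f"delay {delay:>4}s, {window:>2}h/day:",
                  days_needed(72_000, 1.0, delay, window), "days")
```

Under these assumptions, a 10-second pause would need 10 days even crawling around the clock (37 days in a 6-hour window), so it could not fit a weekly cycle; a 1-second pause needs 2 days continuously or 7 days in a 6-hour window.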

I'm literally flying blind here without any useful logs on progress or problems.
Re: sitemap for ecommerce site
« Reply #6 on: October 20, 2020, 06:01:37 PM »
Hello,

You can set up a cron task to run on specific days/hours. You can also run it manually via ssh to see the output and check the last lines of output when it stops.
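To check the last lines of output when it stops without keeping an ssh session open for days, cron could call a small wrapper instead of the generator directly. A minimal sketch: it timestamps every output line into a log file and records the exit status. The log file name, the script name `cronlog.py`, and the generator command passed on the command line are placeholders; nothing here is part of the generator itself.

```python
import subprocess
import sys
from collections import deque
from datetime import datetime
from typing import List

# Hypothetical log file name, written to the current directory.
LOG_PATH = "sitemap-cron.log"


def stamp(text: str) -> str:
    return f"{datetime.now():%Y-%m-%d %H:%M:%S} {text}"


def run_and_log(cmd: List[str], log_path: str, tail: int = 20) -> List[str]:
    """Run cmd, append each output line to log_path with a timestamp,
    and return the last `tail` stamped lines (the final one is the exit status)."""
    last: deque = deque(maxlen=tail)
    with open(log_path, "a", encoding="utf-8") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            entry = stamp(line.rstrip("\n"))
            log.write(entry + "\n")
            log.flush()  # flush per line so the log stays current if this wrapper dies too
            last.append(entry)
        # On POSIX, a negative code -N means the child was killed by signal N.
        entry = stamp(f"exit status {proc.wait()}")
        log.write(entry + "\n")
        last.append(entry)
    return list(last)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 cronlog.py <generator command and its arguments>
    for entry in run_and_log(sys.argv[1:], LOG_PATH):
        print(entry)
```

A negative status such as -9 would suggest the process is being killed from outside (for example by a host limit on long-running jobs) rather than the generator exiting on its own.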
Re: sitemap for ecommerce site
« Reply #7 on: October 20, 2020, 09:14:00 PM »
Well, that's what I have done.
ssh is not viable; it's been running for 96 hours.
My question is: why is it so slow, and why do I have to restart it every hour?
Re: sitemap for ecommerce site
« Reply #8 on: October 21, 2020, 09:10:56 AM »
Hello,

The crawling time itself depends mainly on the website's page generation time, since the generator crawls the site much like search engine bots do.
For instance, if it takes 1 second to retrieve each page, then 1,000 pages will be crawled in 1,000 seconds, or about 17 minutes.
Re: sitemap for ecommerce site
« Reply #9 on: October 21, 2020, 09:42:28 AM »
OK, since you refuse to answer any of my questions,
please have a look at it.
I will PM you my installation URL.