Why Are Dynamic Pages Not Included in Sitemaps?
« on: September 11, 2018, 09:56:05 PM »
How can I get the Standalone Sitemap Generator to scan dynamic pages and include them in my sitemap files? The dynamic pages are blog articles that are embedded in the main HTML page, [ External links are visible to forum administrators only ], but each has its own URL. Here is an example URL of one of my blog articles. The generator does not include any of these dynamic pages in the sitemap files it creates:

[ External links are visible to forum administrators only ]

Please advise on how to configure the generator to include these dynamic blog pages in the sitemap files (XML, ROR, GZ, HTML, TXT, mobile).

Thank you.
« Reply #1 on: September 12, 2018, 06:39:01 AM »
Hello,

Are you using the default sitemap generator configuration, or were some options modified? If so, which ones?
« Reply #2 on: September 12, 2018, 06:53:47 AM »
The first four tabs seem straightforward; nothing unusual there.
CRAWLER RULES tab: no changes.
ADVANCED tab:
include pages from any website subdomain NOT CHECKED
Support Cookies CHECKED
use robots.txt file CHECKED
enable canonical URLs CHECKED
AJAX content UNCHECKED
detect hreflang attribute UNCHECKED
Use CURL extension UNCHECKED
enable XSL stylesheet CHECKED
remove "created by" links UNCHECKED
store referring links UNCHECKED
UTF8 charset UNCHECKED
enable debug output UNCHECKED

My robots.txt file looks like this:
User-Agent: *
Disallow: /customer-login.html
Disallow: /customer-pw-reset.html
Disallow: /customer-registration.html
Disallow: /white-cream-litter-info-duch-inactive.html
Disallow: /white-cream-litter-info-hela-inactive.html
Disallow: /thankyou.html
Disallow: /akc-research-pedigree-5gen-files/
Noindex: /akc-research-pedigree-5gen-files/
Sitemap: [ External links are visible to forum administrators only ]
« Last Edit: September 12, 2018, 06:56:07 AM by 001-jlm »
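Since "use robots.txt file" is checked, one quick sanity check is whether the robots.txt above blocks the article URLs. Below is a minimal sketch using Python's standard `urllib.robotparser`; its simple prefix matching may differ in detail from the generator's own robots.txt handling, and `domain.com` is the placeholder host used later in this thread. Under that assumption, the Disallow rules only cover the login, password-reset, registration, inactive-litter, thank-you and pedigree-files paths, so the blog articles are not blocked.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above (the Sitemap line is left out here).
ROBOTS_TXT = """\
User-Agent: *
Disallow: /customer-login.html
Disallow: /customer-pw-reset.html
Disallow: /customer-registration.html
Disallow: /white-cream-litter-info-duch-inactive.html
Disallow: /white-cream-litter-info-hela-inactive.html
Disallow: /thankyou.html
Disallow: /akc-research-pedigree-5gen-files/
Noindex: /akc-research-pedigree-5gen-files/
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines; fields it does not know, such as Noindex, are skipped.
parser.parse(ROBOTS_TXT.splitlines())

for path in (
    "/willow-and-beau-are-expecting--1.html",
    "/they-re-here--18.html",
    "/customer-login.html",
):
    allowed = parser.can_fetch("*", "https://domain.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Note that Noindex is not part of the original robots.txt standard, so a parser like this one simply ignores that line.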
« Reply #3 on: September 12, 2018, 07:09:54 AM »
Hello,

From what I see, it should be working. Please send me your generator URL and login in a private message so I can check this.
« Reply #4 on: September 12, 2018, 07:20:14 AM »
Pls check your PMs. Thanks.
« Reply #5 on: September 12, 2018, 07:29:44 AM »
Update: the blog pages have incorrect canonical meta tags. For instance, on the
https://domain.com/willow-and-beau-are-expecting--1.html
page, it's defined as:
Code:
<link rel="canonical" href="https://domain.com/golden-puppies-blog.html">
while it should be:
Code:
<link rel="canonical" href="https://domain.com/willow-and-beau-are-expecting--1.html">
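To see why the articles drop out of the sitemap, here is a minimal Python sketch of how a crawler that honors canonical tags might choose the URL it records. It is an assumption about the general behavior of canonical-aware crawlers, not the Standalone Sitemap Generator's actual code; `CanonicalFinder`, `sitemap_url_for` and the `honor_canonical` flag (standing in for the "enable canonical URLs" option) are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.canonical = href


def sitemap_url_for(page_url: str, html: str, honor_canonical: bool = True) -> str:
    """Return the URL a canonical-aware crawler would list for a page.

    Assumption: when honor_canonical is True, the crawler lists the
    canonical target instead of the URL it actually fetched.
    """
    if not honor_canonical:
        return page_url
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical or page_url


ARTICLE = "https://domain.com/willow-and-beau-are-expecting--1.html"
WRONG_TAG = '<link rel="canonical" href="https://domain.com/golden-puppies-blog.html">'
FIXED_TAG = f'<link rel="canonical" href="{ARTICLE}">'

print(sitemap_url_for(ARTICLE, WRONG_TAG))         # blog index URL, not the article
print(sitemap_url_for(ARTICLE, FIXED_TAG))         # the article URL
print(sitemap_url_for(ARTICLE, WRONG_TAG, False))  # canonical ignored: the article URL
```

With the wrong tag, every article resolves to golden-puppies-blog.html, so all of them would collapse into a single sitemap entry; a self-referencing canonical tag, or ignoring canonical tags altogether, keeps each article URL.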
« Reply #6 on: September 12, 2018, 07:40:09 AM »
Are you saying I should uncheck the canonical box in the ADVANCED tab? As I said above, the blog articles are embedded in the "golden-puppies-blog.html" page, but the PHP script dynamically creates an HTML page for each article. "willow-and-beau-are-expecting--1.html" is only one of the articles, and more are added every week. Here is a list of the dynamic blog articles (remember, there is no physical page for each article; they are generated dynamically):

willow-and-beau-are-expecting--1.html
helaina-s-four-week-old-pups-are-doing-great--2.html
grain-free-dog-food-what-s-the-scoop--3.html
what-you-should-know-about-heart-disease--4.html
grooming-and-caring-for-a-golden-retriever--6.html
the-effects-of-inbreeding-in-golden-retrievers--7.html
i-have-a-family-to-call-me-their-very-own--8.html
spaying-and-neutering-your-golden-retriever--10.html
i-need-a-cuddle-buddy-to-be-my-person-asap--11.html
any-day-now--12.html
is-it-safe-to-ship-a-golden-retriever-puppy--13.html
beef-rawhide-chews-are-excellent-for-golden-retrievers--14.html
anesthesia-what-you-should-know-for-your-golden-retriever--15.html
oral-hygiene-for-your-golden-retriever--16.html
are-you-over-vaccinating-your-golden-retriever--17.html
they-re-here--18.html
« Reply #7 on: September 12, 2018, 08:13:28 AM »
I removed the canonical tag from the web page in which the blog articles are dynamically embedded, but I left the canonical box checked in the generator. When I ran the generator again, it correctly indexed the 16 dynamic blog pages, plus about 183 other pages, including the kitchen sink. Any recommendations?
« Reply #8 on: September 12, 2018, 08:32:55 AM »
You can either leave the pages without a canonical meta tag or correct the canonical tags on your website pages.
« Reply #9 on: September 12, 2018, 08:42:39 AM »
Oleg, you are not reading my posts. Please look at what I posted earlier. The pages are DYNAMIC. There is no separate canonical tag for each blog article on the page you told me to change. Are you not getting this?
« Reply #10 on: September 12, 2018, 03:29:54 PM »
In that case, you would need to remove it.
Alternatively, you can disable the "enable canonical URLs" setting in the generator configuration, and the sitemap generator will then ignore canonical meta tags.
« Reply #11 on: September 12, 2018, 04:58:58 PM »
If I follow either of those recommendations, the generator indexes 230 pages, which I do not want. I only want my HTML pages indexed; there are only about 29 static and 16 dynamic HTML pages. How do I filter out the other 185 (non-HTML) pages? Please give specific instructions: I have tried some of the filters on the CRAWLER RULES tab, but I must not be using them properly, because the generator always indexes 0 pages when I do.
« Last Edit: September 12, 2018, 05:17:49 PM by 001-jlm »
« Reply #12 on: September 12, 2018, 05:52:53 PM »
If you want to exclude all pages with "?" in the URL, just add "?" to the "Exclude URLs" setting.
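A minimal Python sketch of that kind of filter, assuming each "Exclude URLs" entry is matched as a plain substring of every crawled URL. That matching rule is an inference from the reply above, not something confirmed by the generator's documentation, and `is_excluded`, `filter_urls` and the `?page=2` URL are hypothetical.

```python
from typing import Iterable, List


def is_excluded(url: str, exclude_patterns: Iterable[str]) -> bool:
    """Assumed rule: a URL is excluded if it contains any pattern as a substring."""
    return any(pattern in url for pattern in exclude_patterns)


def filter_urls(urls: Iterable[str], exclude_patterns: Iterable[str]) -> List[str]:
    """Keep only the URLs that match none of the exclude patterns."""
    patterns = list(exclude_patterns)  # materialize so it can be reused for every URL
    return [url for url in urls if not is_excluded(url, patterns)]


crawled = [
    "https://domain.com/golden-puppies-blog.html",
    "https://domain.com/willow-and-beau-are-expecting--1.html",
    "https://domain.com/golden-puppies-blog.html?page=2",  # hypothetical query-string URL
]

for url in filter_urls(crawled, ["?"]):
    print(url)
```

Under this assumption, a pattern that appears in every URL (for example "html" or the domain name) would exclude every page.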