Re: Do not parse URLs are also parsed
« Reply #15 on: March 13, 2011, 02:04:59 PM »
For memory of course will take as much space, but you don't have to parse the array each time when you verify if the URL is new.

Currently you have a factorial scheme in place:

Example:

If array holds 10 items, you have to loop 10 items to see if unique.
If array holds 11 items, you have to loop 11 items to see if unique. With previous that is 1*2*...*10*11 comparison done from start.

When you have 500 000 urls, you have to loop for the last one 500.000 times, and before you have looped 499.999! loop.

So for a 500.000 sitemap you step in total 500.000 factorial  that numbers is 2.51Mbyte long and it's equivalent to 1.0228015846519023653×10^2,632,341

In my proposal you will reduce this large number to 1 for each URL, so that will be 500.000 steps on not 1.0228015846519023653×10^2,632,341.