At the heart of this algorithm is the traditional sieve of Eratosthenes (i.e. given a prime \(p\), mark all multiples of \(p\) as composite), however instead of sieving the entire interval, we only consider small sub-intervals. The benefits of this method are two fold:
Reduction of the space complexity from \(O(n)\), for the traditional sieve, to \(O(\sqrt n)\)
Reduction of cache misses
The latter is of particular importance as cache memory is much more efficient and closer in proximity to the CPU than main memory. Reducing the size of the sieving interval allows for more effective utilization of the cache, which greatly impacts the overall efficiency.
Another optimization over the traditional sieve is the utilization of wheel factorization. With the traditional sieve of Eratosthenes, you typically check every odd index of your logical vector and if the value is true, you have found a prime. With wheel factorization using the first four primes (i.e. 2, 3, 5, and 7) to construct your wheel (i.e. 210 wheel), you only have to check indices of your logical vector that are coprime to 210 (i.e. the product of the first four primes). As an example, with \(n = 10000\) and a 210 wheel, you only have to check 2285 indices vs. 5000 with the classical implementation.