Tuesday, April 26, 2016

(Distinct) Prime Factors

Just as a connection can be shown as between the harmonic series and the average no. of divisors of n, likewise a connection can be shown as between the prime harmonic series (i.e. the sum of the reciprocals of the primes) and the average no. of (distinct) prime factors of n (especially when n is very large).

The sum of the series of prime reciprocals

= 1/2 + 1/3 + 1/5 + 1/7 +  1/11 + ........

  ~ log log n + B1 , where B1 is the Merten's constant = .2614972...

Now once again we can approach the issue of  the average no. of (distinct) prime factors of n from this perspective.

Now clearly, 2 will be a factor in relation to 1/2, 3 a factor in relation to 1/3, 5 a factor in relation to 1/5, 7 a factor in relation to 1/7, 11 a factor in relation to 1/11 of all numbers and in like manner also for all other primes.

Now of course these prime factors will then often be repeated more than once in a number. For example 4 = 2 * 2. However this not effect the number of distinct factors where each prime factor is counted only once!

Therefore this therefore implies that the average number of (distinct) prime factors is simply given as

1/2 + 1/3 + 1/5 + 1/7 + 1/11 +...... 

However, this method of calculation will lead to factor quotients with remainders occurring in most cases, whereas the no. of factors can only occur in whole (integer) amounts.

Thus for example when n = 20, our suggested method (using prime reciprocals) would suggest that 10 include the factor 2. In this instance, no remainder exists (as 2 is a factor of 20).

However with respect to 3, it would suggest that 6.67 would contain 3 factors (whereas in fact 6 would be the correct answer). In the case of 5 we would get the correct integer result of 4 (as 5 is factor of 20). Then in the case of 7 we would get 2.86  (instead of 2) with 11 we would get 1.82 (instead of 1), with 13, 1.54 (instead of 1) with 17, 1.18 (instead of 1) and finally with 19, 1.05 (instead of 1).

So to this extent the prime reciprocal formula overstates slightly the average no. of (distinct) prime factors.

Once again we would expect with larger n that the remainders would tend to vary in a fairly balanced manner as between 0 and 1 with the average remainder therefore .5 (approx).

And as the total no. of primes to n is simply given as n/log n (approx) this would mean that the  discrepancy with respect to the total no. of factors would be given as n/2(log n) approx.

Therefore the average discrepancy per number would be given as 1/2(log n).

However as n becomes very large, this average discrepancy will keep falling ever closer to zero.

Therefore for very large n, the average no. of distinct prime factors is accurately given by the reciprocal prime formula,

i.e.  log log n + B1.

And as B1 is a small constant, this would imply that the average no. of (distinct) prime factors at n (when n is sufficiently large), is given as log log n. 

Monday, April 25, 2016

Surprising Result

As is well known the sum of the harmonic function,

1 + 1/2 + 1/3 + 1/4 + 1/5 + ............ ~ log n + γ (where γ = .5772... represents the Euler-Mascheroni constant).

It is also known (from Dirichlet) that the average no. of divisors (natural number factors) of n

~ log n – 1 + 2γ.

It then struck me how the second result could be related to the first.

In the numbers up to n every number must necessarily include 1 as a factor, then every second number must include 2 as a factor, every 3rd number, 3 as a factor, every fourth number, 4 as a factor and so until finally every nth number includes n as a factor.

Therefore if we were to calculate the average no. of factors to n, this would imply that it would equate to the sum of

1 + 1/2 + 1/3 + 1/4 + ...... + 1/n.

However there is a slight problem with this in that the divisors must in practice be whole numbers.

Therefore when we use the harmonic series, it also includes remainders, thereby over-estimating our result.

For example if we were to consider the total no. of factors to n = 10, using the harmonic series, we would obtain

10 + 5 + 3.33 (instead of 3) + 2.5 (instead of 2) + 2 + 1.67 (instead of 1) + 1.43 (instead of 1) + 1.25 (instead of 1) + 1.11 (instead of 1) + 1.

So apart from the case where the divisor is a factor of n, we get an over-estimate for each quotient using the harmonic series.

So the over-estimate (in this case for n = 10) is obtained by summing the remainders for each quotient (obtained by dividing 10 by each number from 1 up to and including 10).

This sum = . 33 + .5 + .67 + .43 +.25 + .11 = 2.29

So the average remainder up to n = 10 = .229.

Now, as n gets larger it would be reasonable to assume that remainders would vary in a fairly predictable fashion over a range of values from 0 to 1.

Thus the average remainder would be roughly .5.

Therefore the estimate of the average sum of divisors of n (as n increases without limit) would be overstated approximately by .5 using the sum for the harmonic series.

In fact, through using the two formulae, already obtained 1) for the sum of the harmonic series and 2) the average no. of divisors of n respectively, we can obtain a precise estimate for the average remainder.

Thus we subtract 2) from 1) to get the averge value of the remainders,

= log n + γ – (log n – 1 + 2γ) = log n + γ – log n + 1 – 2γ = 1 – γ.

Therefore, with respect to a number n (where n can increase without limit), when it is divided successively by all numbers from 1 to n (inclusive), the average value with respect to the remainders of the quotients arising ~ 1 – γ.


Now the harmonic series relates to the value of the Riemann zeta function for s = 1.

Thus ζ(1) = 1 + 1/2 + 1/3 + 1/4 + .............

Remarkably, the result which we have obtained for the average remainder of these quotients can be related to all other values of the Riemann zeta function (from 2 to n).

So,

1 – γ = {ζ(2) – 1}/2 + {ζ(3) – 1}/3 + {ζ(4) – 1}/4 + {ζ(5) – 1}/5 +…

Thursday, April 7, 2016

Predicting Frequency of Factors

I have mentioned before the Erdős–Kac theorem on my Riemann Hypothesis blog in
"Addendum on Erdős–Kac Theorem" in "More on Erdős–Kac Theorem" and "A Simple Example".

Basically this states if ω(n) is the number of (distinct) prime factors of n, the probability distribution of,

  \frac{\omega(n) - \log\log n}{\sqrt{\log\log n}}  

is the normal distribution.

What is fascinating about this result is that it allows one in principle to calculate for any value of n, the number of composites with 2, 3, 4, 5,...,k factors respectively. Indeed we can also attempt to use it as a means for calculating the number of primes (with 1 factor). However as we shall see, measurements for primes using the normal distribution are not at all reliable.


I will now attempt to illustrate using a simple example.

When n = 1,000,000,000, the average (or mean) number of (distinct) primes is given by log log n = 3.031, with the standard deviation as its square root = 1.741 (correct to 3 decimal places in each case).

We can use the normal distribution (with accompanying tables) to calculate the totals in relation to the number of factors as a percentage of n.

So as 3 is roughly the average number of (distinct) factors (per number) we would expect perhaps that the largest percentage would relate to this category.

However because the normal is a continuous distribution we must adjust the discrete notion of 3 to represent the gap in the normal curve as between 2.5 and 3.5.

So the number of standard deviations that 2.5 lies from the mean is given as (3.031 - 2.5)/1.741 = .30 to 2 decimal places (as my tables are only correct to 2 decimal places).
The area in the normal curve to which this relates = .5 - .3821 = .1179.

Then the corresponding number of standard deviations to which 3.5 corresponds = (3.5 - 3.031)/1.741 = .27.

And the area that this corresponds to is .5 - .3936 = .1064.

Therefore the total area corresponding to numbers with 3 factors = .1179 + .1064 = .2243.

This would suggest therefore that 22% (rounded to the nearest integer) would contain 3 factors. 

The corresponding figures for 4, 5, 6 and 7 factors were then calculated as 19%, 12%, 5% and 2% respectively.

However there is an asymmetry operating, which is not reflected by the use of the normal distribution in this case. Therefore though we can extend indefinitely on the RHS to calculate the probabilities of numbers with an increasing number of factors, this does not equally apply on the LHS where we can now only calculate with respect to two possibilities i.e. for numbers with 2 factors and 1 factor respectively.

If we assume the normal distribution in this case (though not strictly justified) the figures for numbers containing 2 factors and 1 factor (i.e. the primes) works out as 19% and 12% respectively.

I then did a quick empirical study (over a range of 500 from 1,000,000,001 to 1,000,000,500) to calculate the percentage totals that actually occurred in the number system around n (for numbers with these factor totals).






No. of Factors
Normal Dist. Est. (%)
Empirical Estimate (%)
    1
        12     
       4
    2
        20
      18
    3
        22
      29
    4
        19
      27
    5
        12
      14
    6
          5
        6
    7
          2
        1


When the empirical estimates (of actual occurrence) are compared with those based on calculation through the normal distribution, notable discrepancies occur.

In particular, there is considerable over-estimate for the frequency of primes (where actual occurrence is only 1/3 of that predicted using the normal distribution).

Then for the middle scores (3 and 4) actual occurrence is somewhat in excess of that predicted by the normal distribution.

In truth the assumption of a regular normal distribution does not properly apply in this case because the average number of (distinct) prime factors (3) involved per number is still exceedingly low.

However in order to get the appropriate number of factors so that the normal curve does indeed become a suitable approximation, the value of n which in our example is 109, needs to be dramatically increased.


Fro example where the average number of (distinct) prime factors = 10, n = 1010,000.

Therefore, especially in the case of the estimation of primes, the normal distribution could never be used as an accurate measurement. For strictly there is an always unavoidable asymmetry with respect to the LHS of the normal curve. Thus whereas the no. of factors greater than the average can theoretically  increase without limit, the corresponding number of factors less than the average must necessarily end at 1.

Rather, the normal curve can be best used to predict - where the average number of (distinct) factors is relatively high - the % of factors that is likely to occur within specified ranges of that average. However once again it will be less accurate for extreme outliers especially when attempting to calculate the no. of primes (with just one factor).  

Even with the average no. of distinct factors as low as 10, we could estimate pretty accurately for example the % of numbers that would contain exactly 10 distinct factors (12% approx).

We could also say quite accurately that in this case we would expect to find about 68% (approx) within 1 standard deviation of the mean (i.e. between 7 and 13 factors).

However, once again the estimate for primes is widely inaccurate.

Based on the assumption of the normal distribution , we would estimate on average about 1 in 435 numbers as prime, whereas in actual fact it is 1 in 23,000.