Discuss that if Y is uniformly distributed over (0,1), then X = -ln(1-Y) or X = -ln(Y) will be exponentially distributed with parameter $\lambda$ = 1.
Generate an Erlang distribution using the above approach by summing exponential variables.
Do the same using a faster approach.
Use rejection to generate a random variable which has the same or a similar pdf to an Erlang-3 distribution.
Compare timings of all the methods.
$\textbf{Exponential Distribution}$: An exponentially distributed random variable is the continuous analogue of the geometric discrete random variable. Consider a time interval $t = n\delta t$, i.e. we discretize $t$ into $n$ small intervals and let the probability that an event occurs in any one interval be $p = \lambda \delta t$.
The waiting time until the first event is then modelled by an exponential random variable.
CDF: $F_T(t) = 1 - e^{-\lambda t}$
PDF: $\lambda e^{-\lambda t}$
$\textbf{Erlang Distribution}$: This distribution is a special case of the gamma distribution. An Erlang random variable is nothing but the sum of n independent exponential random variables; when n exponentials are summed, the resulting random variable is said to have the Erlang-n distribution.
CDF: $F_{S_n}(x) = 1 - \sum_{j=0}^{n-1} \frac{(\lambda x)^j e^{-\lambda x}}{j!}$
PDF: $f_{S_n}(x) = \frac{(\lambda x)^{n-1} \lambda e^{-\lambda x}}{(n-1)!}$
E[X] = $\frac{n}{\lambda}$. In our case n = 3 and $\lambda$ = 1, hence E[X] = 3.
${\sigma}^2$ = $\frac{n}{{\lambda}^2}$. Again, for the Erlang-3 distribution in our case, ${\sigma}^2$ = 3.
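These moments can be checked empirically. A minimal sketch using NumPy's gamma sampler (an Erlang-n variable is simply a gamma variable with integer shape n):

```python
import numpy as np

rng = np.random.default_rng(42)
n, lam = 3, 1.0

# Erlang-n = gamma with integer shape n and scale 1/lambda
samples = rng.gamma(shape=n, scale=1 / lam, size=200_000)

print(np.mean(samples))  # close to n/lam = 3
print(np.var(samples))   # close to n/lam**2 = 3
```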
$\textbf{Rejection Method}$: The rejection method generates a random variable with a given pdf by sampling from another, easy-to-generate random variable and accepting or rejecting each sample.
If Y is uniformly distributed over (0,1), then X = -ln(1-Y) will be exponentially distributed with parameter $\lambda$. We prove this in the following section.
Recall that the exponential pdf is given by $\lambda e^{-\lambda x}$.
Y ~ U(0,1) and (0 < y < 1)
Thus $f_{Y}(y)$ = 1 for (0 < y < 1)
X = -ln(1-Y)
Thus -X = ln(1-Y)
$e^{-X}$ = $e^{\ln(1-Y)}$
$e^{-X}$ = 1 - Y
Y = 1 - $e^{-X}$ = $g^{-1}(X)$
Thus $\frac{dY}{dX}$ = $e^{-X}$
By the change-of-variables formula, $f_{X}(x)$ = $f_{Y}(g^{-1}(x)) \cdot |\frac{dY}{dX}|$
$f_{X}(x)$ = 1 $\cdot$ $e^{-x}$
Comparing with the exponential pdf $\lambda e^{-\lambda x}$, we see that $f_{X}(x)$ is exponential with $\lambda$ = 1.
Thus we have proved that X = -ln(1-Y) with Y ~ U(0,1) is an exponential random variable with parameter $\lambda$ = 1.
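The result can also be sanity-checked numerically; a small sketch comparing the moments and CDF of X = -ln(1-Y) against the Exponential(1) values:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=200_000)
x = -np.log(1.0 - y)  # inverse-transform sample

# An Exponential(1) variable has mean 1 and variance 1
print(np.mean(x), np.var(x))

# Empirical CDF at t should match F(t) = 1 - e^{-t}
t = 1.5
print(np.mean(x <= t), 1 - np.exp(-t))
```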
If Y is uniformly distributed over (0,1), then X = -ln(Y) will also be exponentially distributed with parameter $\lambda$. We prove this in the following section.
Recall again that the exponential pdf is given by $\lambda e^{-\lambda x}$.
Y ~ U(0,1) and (0 < y < 1)
Thus $f_{Y}(y)$ = 1 for (0 < y < 1)
X = -ln(Y)
Thus -X = ln(Y)
$e^{-X}$ = $e^{\ln(Y)}$
$e^{-X}$ = Y
Y = $e^{-X}$ = $g^{-1}(X)$
Thus $\frac{dY}{dX}$ = -$e^{-X}$
By the change-of-variables formula, $f_{X}(x)$ = $f_{Y}(g^{-1}(x)) \cdot |\frac{dY}{dX}|$
$f_{X}(x)$ = 1 $\cdot$ $e^{-x}$
Comparing with the exponential pdf $\lambda e^{-\lambda x}$, we again have $\lambda$ = 1.
Thus X = -ln(Y) with Y ~ U(0,1) is also an exponential random variable with parameter $\lambda$ = 1.
X = generate_samples(1000)
s1 = 'Distribution of the derived distribution with 1000 samples'
pdf = plot_distribution(X,25,'red','green',s1)
def r():
return generate_Erlang_c(3,1000)
exp = np.random.exponential(scale=1.0, size=10000)
s1 = 'True exponential distribution with 10000 samples'
plot_distribution(exp,25,'orange','blue',s1)
$\textbf{Results}$: As shown by the two graphs above, X = -ln(1-Y) where Y ~ U(0,1) is an exponentially distributed random variable with $\lambda$ = 1.
Without the labels, one could not even tell the derived distribution apart from the true exponential distribution.
erl = np.random.gamma(3,1,size=10000)
s1 = 'True erlang distribution with n=3 and sample size 10000'
plot_distribution(erl,25,'blue','red',s1)
print('Sample mean: ',np.mean(erl))
print('Sample variance: ',np.var(erl))
| Statistic | Theoretical |
|---|---|
| True Mean | 3.0 |
| True Variance | 3.0 |
The figure above shows a true Erlang distribution with n = 3, together with its sample mean and sample variance.
As stated in the theoretical section, the sample mean and sample variance are both very close to the theoretical value of 3.
S_n = generate_Erlang_c(3,1000)
s1 = 'Erlang distribution with n=3 and 1000 samples using approach in part (c)'
plot_distribution(S_n[0],25,'green','orange',s1)
print('Sample mean: ',np.mean(S_n[0]))
print('Sample variance: ',np.var(S_n[0]))
This is the Erlang distribution generated by the summation formula mentioned above.
As we can see, the distribution is very similar to the true Erlang distribution.
The sample mean and variance are also very close to the theoretical values:
| Statistic | Sample | Theoretical |
|---|---|---|
| Mean | 2.96 | 3.0 |
| Variance | 3.101 | 3.0 |
Taking the product of the uniforms and then a single log also gives us an Erlang distribution, for the following reason:
$S_n$ = $-\frac{1}{\lambda}\ln\left(\prod_{i=1}^{n} U_i\right)$ where [$\lambda$ = 1]
$S_n$ = $-\ln\left(\prod_{i=1}^{n} U_i\right)$
$S_n$ = $\sum_{i=1}^{n} -\ln(U_i)$
Let $U_i$ = $Y_i$
Therefore, $S_n$ = $\sum_{i=1}^{n} -\ln(Y_i)$
But from part (a) we know that if X = -ln(Y) with Y ~ U(0,1), then X is an exponential random variable.
Therefore, $S_n$ = $\sum_{i=1}^{n} X_i$, which is exactly the equation used in part (c).
Hence this approach works.
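The algebra above can be verified directly: on the same uniforms, summing the logs and taking one log of the product give identical samples. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 200_000
u = rng.uniform(0.0, 1.0, size=(m, n))

s_sum = np.sum(-np.log(u), axis=1)    # part (c): sum of n exponentials
s_prod = -np.log(np.prod(u, axis=1))  # part (d): one log of the product

# Identical up to floating-point rounding
print(np.allclose(s_sum, s_prod))
print(np.mean(s_prod), np.var(s_prod))  # both close to n = 3
```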
S_n = generate_Erlang_d(3,1000)
s1 = 'Erlang distribution with n=3 and 1000 samples using approach in part (d)'
plot_distribution(S_n[0],25,'orange','green',s1)
print('Sample mean: ',np.mean(S_n[0]))
print('Sample variance: ',np.var(S_n[0]))
| Statistic | Sample Stats[Part(c)] | Sample Stats[Part(d)] | Theoretical |
|---|---|---|---|
| Mean | 2.96 | 3.04 | 3.0 |
| Variance | 3.101 | 3.10 | 3.0 |
In order to generate an Erlang-3 distribution by rejection, we first need to choose a proposal pdf g(x) that resembles it, and then scale g by a factor c so that c·g(x) bounds the Erlang-3 pdf.
To do this, the following steps are taken.
$f_{S_n}(x)$ = $\frac{(\lambda x)^{n-1} \lambda e^{-\lambda x}}{(n-1)!}$
In our case $\lambda = 1$ and n = 3.
Thus, $f_{S_n}(x)$ = $\frac{x^2 e^{-x}}{2}$
We select g(x) = ${\lambda}^{'} e^{-{\lambda}^{'} x}$, where we choose ${\lambda}^{'}$ to be the reciprocal of the mean of the Erlang-3 distribution, i.e. ${\lambda}^{'}$ = 1/3.
Thus g(x) = $\frac{1}{3} e^{\frac{-x}{3}}$
Thus, $\frac{f(x)}{g(x)}$ = $\frac{3}{2} x^2 e^{\frac{-2x}{3}}$
Now, to find the x that maximizes the above ratio, we take its derivative and set it equal to zero.
$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]$ = 0
Thus, $3 x e^{\frac{-2x}{3}}\left[1 - \frac{x}{3}\right]$ = 0
Thus, x = 3
Thus c = $\frac{f(3)}{g(3)}$
Thus c = $\frac{3^3 e^{-2}}{2}$
Thus $\frac{f(x)}{cg(x)}$ = $\frac{x^2 e^{\frac{-2x}{3}}}{3^2 e^{-2}}$
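The derived maximiser x = 3 and bound c = $3^3 e^{-2}/2$ can be checked numerically; a small sketch over a grid:

```python
import numpy as np

# f: Erlang-3 pdf (lambda = 1); g: exponential proposal pdf with rate 1/3
f = lambda x: x**2 * np.exp(-x) / 2.0
g = lambda x: (1.0 / 3.0) * np.exp(-x / 3.0)

xs = np.linspace(0.01, 30.0, 100_000)
ratio = f(xs) / g(xs)

c = 27 * np.exp(-2) / 2          # derived bound, approximately 1.827
print(xs[np.argmax(ratio)])      # maximiser of f/g, close to x = 3
print(ratio.max(), c)            # numerical maximum agrees with c
```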
After these calculations, the algorithm works as follows:
Generate Y from the exponential distribution with rate 1/3 (mean 3).
Generate R ~ U(0,1).
If R <= $\frac{f(Y)}{c g(Y)}$, accept Y as a sample.
Else reject Y and repeat the process.
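These accept/reject steps can be sketched compactly; a minimal illustrative sampler (the helper name `erlang3_rejection` is hypothetical, not part of the report's code, and the proposal is exponential with rate 1/3, i.e. mean 3):

```python
import numpy as np

def erlang3_rejection(n, seed=0):
    """Sketch of the accept/reject steps for the Erlang-3 target."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        y = rng.exponential(scale=3.0)  # proposal from g: rate 1/3, mean 3
        r = rng.uniform()
        # accept with probability f(y)/(c g(y)) = y^2 e^{-2y/3} / (3^2 e^{-2})
        if r <= y * y * np.exp(-2.0 * y / 3.0) / (9.0 * np.exp(-2.0)):
            out.append(y)
    return np.array(out)

s = erlang3_rejection(20_000)
print(np.mean(s), np.var(s))  # both should be close to 3
```

On average about c ≈ 1.83 proposals are needed per accepted sample, which is why this method is slower than the direct approaches.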
exp_3 = np.random.exponential(scale=3, size=10000)
s1 = 'The bounding function of Erlang-3 distribution'
generate_bounding_func(erl,exp_3,85,'orange','green',s1)
S_n_rej = rejection_method_erlang(1000)
s1 = 'The Erlang distribution generated using rejection method'
plot_distribution(S_n_rej[0],25,'purple','cyan',s1)
print('Sample Mean: ',np.mean(S_n_rej[0]))
print('Sample Var: ',np.var(S_n_rej[0]))
| Statistic | Sample Stats[Part(c)] | Sample Stats[Part(d)] | Sample Stats[Part(e)] | Theoretical |
|---|---|---|---|---|
| Mean | 2.96 | 3.04 | 2.95 | 3.0 |
| Variance | 3.101 | 3.10 | 2.77 | 3.0 |
calculate_time(250000)
| Statistic | Sample Stats[Part(c)] | Sample Stats[Part(d)] | Sample Stats[Part(e)] | Theoretical |
|---|---|---|---|---|
| Mean | 2.99 | 2.99 | 3.01 | 3.0 |
| Variance | 2.99 | 2.97 | 2.82 | 3.0 |
| Time Taken | 14.2 seconds | 10.44 seconds | 34.66 seconds | - |
| N | 250,000 | 250,000 | 250,000 | 250,000 |
It was claimed that the approach in part (c) is slower than that in part (d), and the results above confirm this: approach (c) takes 14.2 seconds while approach (d) takes 10.44 seconds.
This is because in approach (c) we take the log of each of the 3 uniform values and then sum them, so three logs are computed per sample.
In approach (d) we multiply the 3 uniform values and take a single log. Since the log is the expensive operation, doing one log per sample instead of three makes approach (d) faster.
Approach (e), the rejection method, is the slowest. This is inherent to the algorithm: whenever the uniform draw exceeds $\frac{f(x)}{cg(x)}$, the proposal is rejected and the entire process must be repeated, so multiple proposals are needed per accepted sample.
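The operation-count argument can be timed directly; a rough sketch using `timeit` on vectorized versions of approaches (c) and (d) (absolute numbers will differ from the loop-based timings reported above):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(3)
u = rng.uniform(size=(250_000, 3))

# part (c): one log per uniform, then a sum; part (d): one product, one log
t_c = timeit(lambda: np.sum(-np.log(u), axis=1), number=20)
t_d = timeit(lambda: -np.log(np.prod(u, axis=1)), number=20)
print(t_c, t_d)  # (d) typically wins: one log per sample instead of three
```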
$\textbf{Conclusions}$:
We can generate an Erlang-n distribution by summing n exponential random variables with parameter $\lambda$.
There are various approaches to do this, but some are faster than others. In our case, the fastest approach is to take the product of n independent uniform variables and then take the log.
However, there are times when we do not have built-in functions to generate these random variables. In such cases we use the rejection method discussed in section (e). Through this method we can generate any random variable X with pdf $f_X(x)$ using another simple, efficient random variable that bounds it. This method takes longer to run, but it works correctly and gives the required results.
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
import seaborn as sns
import warnings
from time import time
warnings.filterwarnings('ignore')
%matplotlib inline
fontP = FontProperties()
fontP.set_size('small')
np.random.seed(42)  # seed the global generator; a bare RandomState(...) call has no effect
def generate_samples(n):
    # Inverse transform: X = -ln(Y) with Y ~ U(0,1) is Exponential(1)
    Y = np.random.uniform(0, 1, n)
    X = -np.log(Y)
    return X
def plot_distribution(X, b, hist_col, kde_col, s):
    plt.figure(figsize=(9, 4), dpi=300)
    # density=True normalises the histogram ('normed' was removed in Matplotlib 3)
    orig_pdf, bins, patches = plt.hist(X, bins=b, density=True,
                                       color=hist_col, alpha=0.3)
    plt.plot(bins[:-1], orig_pdf, color=kde_col)
    plt.xlim(0, 10)
    plt.title(s)
    plt.show()
def generate_Erlang_c(n, m):
    # Part (c): sum n Exponential(1) samples, each generated as -ln(U)
    S_n = []
    start = time()  # Get start time
    for i in range(0, m):
        X_n = []
        for j in range(0, n):
            Y = np.random.uniform(0, 1, 1)
            X = -1 * np.log(Y)
            X_n.append(X)
        S_i = np.sum(X_n)
        S_n.append(S_i)
    end = time()  # Get end time
    tot = end - start
    return np.array(S_n), tot
def generate_Erlang_d(n, m):
    # Part (d): one log of the product of n uniforms
    S_n = []
    start = time()  # Get start time
    for i in range(0, m):
        U = []
        for j in range(0, n):
            U.append(np.random.uniform(0, 1, 1))
        S_i = -1 * np.log(np.prod(U))
        S_n.append(S_i)
    end = time()  # Get end time
    tot = end - start
    return np.array(S_n), tot
def generate_bounding_func(erl, exp, b, hist_col, kde_col, s):
    plt.figure(figsize=(9, 4), dpi=300)
    exp_pdf, bins_exp, patches = plt.hist(exp, bins=b, density=True,
                                          color='green', alpha=0)
    erl_pdf, bins_erl, patches = plt.hist(erl, bins=b, density=True,
                                          color=hist_col, alpha=0.3,
                                          label='erlang-3 dist')
    # Scale g by c = (3^3)e^{-2}/2 ~ 1.82 so that c*g(x) bounds the Erlang-3 pdf
    exp_pdf = exp_pdf * 1.82
    plt.plot(bins_exp[0:len(bins_exp) - 1], exp_pdf, color=kde_col,
             label='1.82 * exp pdf with rate 1/3 (mean 3)')
    plt.plot(bins_erl[0:len(bins_erl) - 1], erl_pdf, color='red',
             label='erlang-3 pdf')
    plt.xlim(0, 10)
    plt.title(s)
    plt.legend()
    plt.show()
def rejection_method_erlang(n):
    S_n = []
    p = 0
    start = time()  # Get start time
    while p < n:
        # Proposal from g: exponential with rate 1/3, i.e. scale (mean) 3
        X = np.random.exponential(scale=3, size=1)
        R = np.random.uniform(0, 1, size=1)
        # Accept with probability f(X)/(c g(X)) = X^2 e^{-2X/3} / (3^2 e^{-2})
        den = 3 * 3 * np.exp(-2)
        num = X * X * np.exp(-X * (2 / 3))
        if R <= (num / den):
            S_n.append(X[0])
            p += 1
    end = time()  # Get end time
    tot = end - start
    return np.array(S_n), tot
def calculate_time(N):
    S_n = generate_Erlang_c(3, N)
    print('Time taken by part c is: ', S_n[1])
    s1 = 'Erlang distribution with n=3 and ' + str(N) + ' samples using approach in part (c)'
    plot_distribution(S_n[0], 25, 'green', 'orange', s1)
    print('Sample Mean for part (c.): ', np.mean(S_n[0]))
    print('Sample Var for part (c.): ', np.var(S_n[0]))
    S_n = generate_Erlang_d(3, N)
    print('Time taken by part d is: ', S_n[1])
    s1 = 'Erlang distribution with n=3 and ' + str(N) + ' samples using approach in part (d)'
    plot_distribution(S_n[0], 25, 'orange', 'green', s1)
    print('Sample Mean for part (d.): ', np.mean(S_n[0]))
    print('Sample Var for part (d.): ', np.var(S_n[0]))
    S_n_rej = rejection_method_erlang(N)
    print('Time taken by part e is: ', S_n_rej[1])
    s1 = 'The Erlang distribution generated using rejection method'
    plot_distribution(S_n_rej[0], 25, 'purple', 'cyan', s1)
    print('Sample Mean for part (e.): ', np.mean(S_n_rej[0]))
    print('Sample Var for part (e.): ', np.var(S_n_rej[0]))