$\textbf{Uniform random variable}$: For a uniform random variable $X \sim U(a,b)$, the mean and variance are:
$E[X] = \mu = \frac{(a+b)}{2}$
$var(X) = \sigma_X^2 = \frac{(b-a)^2}{12}$
Here $a = 0$ and $b = 1$, so the values of $\mu$ and $\sigma_X^2$ are:
$\mu = \frac{(0+1)}{2} = 0.5$
$\sigma_X^2 = \frac{(1-0)^2}{12} = 0.0833$
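The variance of $U(a,b)$ can be derived from the second moment. The sketch below uses only the standard identity $var(X) = E[X^2] - \mu^2$ and the uniform density $\frac{1}{b-a}$ on $[a,b]$; it is added here as a worked check, not as part of the original derivation.

```latex
\begin{align*}
E[X^2] &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3} \\
var(X) &= E[X^2] - \mu^2 = \frac{a^2+ab+b^2}{3} - \frac{(a+b)^2}{4} \\
       &= \frac{4a^2+4ab+4b^2 - 3a^2 - 6ab - 3b^2}{12} = \frac{(b-a)^2}{12}
\end{align*}
```

Note that the term $a^2+ab+b^2$ belongs to the second moment $E[X^2]$ (divided by 3), not to the variance. For $a = 0$ and $b = 1$ both expressions happen to give a numerator of 1, so the variance is $\frac{1}{12} \approx 0.0833$.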
$\textbf{Central Limit Theorem}$: The central limit theorem states that if we repeatedly draw independent random samples of size n from a population with mean $\mu$ and finite variance $\sigma_X^2$, then for large n the sample means are approximately normally distributed with mean $\mu$ and variance $\frac{\sigma_X^2}{n}$.
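A quick numerical check of this statement could look like the sketch below. The sample size `n = 30`, the number of samples `num_samples = 20000`, and the seed are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np

# Seed, n and num_samples are arbitrary choices made for this illustration.
rng = np.random.default_rng(0)
n = 30               # size of each sample
num_samples = 20000  # number of independent samples

# Each row is one sample of size n drawn from U(0, 1).
samples = rng.uniform(0.0, 1.0, size=(num_samples, n))
sample_means = samples.mean(axis=1)

mu = 0.5             # true mean of U(0, 1)
sigma2 = 1.0 / 12.0  # true variance of U(0, 1)

print("mean of sample means:    ", sample_means.mean(), " theory:", mu)
print("variance of sample means:", sample_means.var(ddof=1), " theory:", sigma2 / n)
```

The empirical mean and variance of the sample means should land close to $\mu$ and $\frac{\sigma_X^2}{n}$; a histogram of `sample_means` would show the bell shape.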
$\textbf{Covariance and Correlation}$:
COV(X,Y) = E[XY] - E[X]*E[Y]
CORR(X,Y) = $\frac{COV(X,Y)}{\sigma_{X}\sigma_{Y}}$
Covariance and correlation measure how strongly two random variables are linearly related to each other.
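As a small illustration of these two definitions, the sketch below computes covariance and correlation by hand and compares them with NumPy's `np.cov` and `np.corrcoef`. The variables `x`, `y`, the slope 3.0, and the noise level are hypothetical and chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility

# Hypothetical data: y depends linearly on x, plus a little noise.
x = rng.uniform(0.0, 1.0, size=5000)
noise = rng.normal(0.0, 0.1, size=5000)
y = 3.0 * x + noise

# Covariance from the definition E[XY] - E[X]E[Y] (divides by N).
cov_manual = np.mean(x * y) - np.mean(x) * np.mean(y)

# Same quantity from NumPy; bias=True selects the divide-by-N normalisation.
cov_numpy = np.cov(x, y, bias=True)[0, 1]

# Correlation: covariance scaled by both standard deviations (np.std divides by N).
corr_manual = cov_manual / (np.std(x) * np.std(y))
corr_numpy = np.corrcoef(x, y)[0, 1]

print("covariance :", cov_manual, cov_numpy)
print("correlation:", corr_manual, corr_numpy)
```

The manual and library values agree up to floating-point error, and the correlation is close to 1 because `y` is almost a linear function of `x`.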
import numpy as np
import sklearn as sk
import scipy.stats as sci
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
import matplotlib.mlab as mlab
%matplotlib inline
fontP = FontProperties()
fontP.set_size('small')
########### Calculating mean of Uniform pdf###################
true_mean = (0+1)/2
true_variance = (1)/12
true_std = np.sqrt(true_variance)
print("Mean of true uniform random variable is: ",true_mean)
print("Variance of true uniform random variable is: ",true_variance)
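###############Generate Uniform Random variable################
# np.random.RandomState(seed=42) only builds a separate generator object and discards it;
# np.random.seed seeds the global generator that np.random.uniform actually uses.
np.random.seed(42)
X = np.random.uniform(0,1,1000)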
Recall that for $a = 0$ and $b = 1$ the mean and variance of the uniform random variable are:
$\mu$ = $\frac{(0+1)}{2} = 0.5$
$\sigma_X^2$ = $\frac{(1-0)^2}{12} = 0.0833$
$\mu$ | $\sigma_X^2$ |
---|---|
0.5 | 0.0833 |
There are two things we need to take care of when we are considering a particular statistic of a random variable:
Population parameters: These are the mean ($\mu$) and variance ($\sigma_X^2$) of the whole population. They capture the true underlying mean and variance, but they are generally unknown to us, because computing them would require data from every entity in the population.
Point estimates: Instead of measuring every entity and wasting our resources, we can draw a random sample from the population and compute the mean and variance of that sample. These are the sample mean (m) and the sample variance ($s^2$), and they serve as point estimates of $\mu$ and $\sigma_X^2$. In the example that I mentioned above, a sample can be a finite or small number of males selected independently.
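The two point estimates can be sketched as follows. The sample size of 100 matches the cells in this notebook, while the seed and the variable names are assumptions made for this illustration. The manual formulas are compared against NumPy's `np.mean` and `np.var` with `ddof=1`, which applies the same $n-1$ denominator.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed for reproducibility
sample = rng.uniform(0.0, 1.0, size=100)
n = len(sample)

# Sample mean: sum of the observations divided by n.
m = np.sum(sample) / n

# Sample variance: squared deviations from m, divided by n - 1 (unbiased estimator).
s2 = np.sum((sample - m) ** 2) / (n - 1)

# NumPy equivalents; ddof=1 requests the n - 1 denominator.
m_numpy = np.mean(sample)
s2_numpy = np.var(sample, ddof=1)

print("sample mean    :", m, m_numpy)
print("sample variance:", s2, s2_numpy)
```

Dividing by $n-1$ rather than $n$ corrects for the fact that the deviations are measured from the sample mean m rather than from the unknown $\mu$.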
##############Calculating mean var for N=100########################
X_100 = np.random.uniform(0, 1,100)
m_100 = np.sum(X_100)/len(X_100)
var_100 = np.sum((X_100-m_100)*(X_100-m_100))/(len(X_100)-1)
std_100 = np.sqrt(var_100)
#print('Mean of 100 random sample is: ',m_100)
#print('Variance of 100 random sample is: ',var_100)
#print('Sample variance of sample mean of size 100 is: ',var_100/100)
##############Calculating mean var for N=10000########################
X_10k = np.random.uniform(0, 1,10000)
m_10k = np.sum(X_10k)/len(X_10k)
var_10k = np.sum((X_10k-m_10k)*(X_10k-m_10k))/(len(X_10k)-1)
std_10k = np.sqrt(var_10k)
#print('Mean of 10000 random sample is: ',m_10k)
#print('Variance of 10000 random sample is: ',var_10k)
#print('Sample variance of sample mean of size 10k is: ',var_10k/10000)
###################Hist for N=100##################################
f, (ax1, ax2) = plt.subplots(1, 2,figsize=(18,8))
f.suptitle('Uniform distribution simulations')
count, bins, ignored = ax1.hist(X_100,color='green',alpha=0.5,bins=15,density=True)
ax1.plot(bins, np.ones_like(bins), linewidth=2, color='r',label='True uniform pdf')
ax1.axvline(m_100, color='orange', linestyle='dashed', linewidth=1,label='sample mean')
ax1.axvline(true_mean, color='purple', linestyle='-.', linewidth=1,label='True mean')
ax1.axvline(m_100-std_100, color='black', linestyle=':', linewidth=1,label='1 standard deviation sample')
ax1.axvline(m_100+std_100, color='black', linestyle=':', linewidth=1)
ax1.axvline(true_mean-true_std, color='cyan', linestyle='dashed', linewidth=1,label='1 standard deviation population')
ax1.axvline(true_mean+true_std, color='cyan', linestyle='dashed', linewidth=1)
ax1.set(xlabel="X",ylabel="P(X)")
ax1.legend(loc=9, bbox_to_anchor=(1.2, 0.5))
ax1.set_title('uniform random variable with sample size 100')
###################Hist for N=10000##################################
count, bins, ignored = ax2.hist(X_10k,color='blue',alpha=0.5,bins=15,density=True)
ax2.plot(bins, np.ones_like(bins), linewidth=2, color='r',label='True uniform pdf')
ax2.axvline(m_10k, color='orange', linestyle='dashed', linewidth=1,label='sample mean')
ax2.axvline(true_mean, color='purple', linestyle='-.', linewidth=1,label='True mean')
ax2.axvline(m_10k-std_10k, color='black', linestyle=':', linewidth=1,label='1 standard deviation sample')
ax2.axvline(m_10k+std_10k, color='black', linestyle=':', linewidth=1)
ax2.axvline(true_mean-true_std, color='cyan', linestyle='dashed', linewidth=1,label='1 standard deviation population')
ax2.axvline(true_mean+true_std, color='cyan', linestyle='dashed', linewidth=1)
ax2.set(xlabel="X",ylabel="P(X)")
ax2.legend(loc=9, bbox_to_anchor=(1.2, 0.5))
ax2.set_title('uniform random variable with sample size 10000')
plt.subplots_adjust(left=0.01, wspace=0.52, top=0.8)
plt.show()
sample size (n) | m | $s^2$ | estimated variance of sample mean ($s^2/n$) |
---|---|---|---|
100 | 0.525 | 0.088 | 0.00085 |
10000 | 0.498 | 0.084 | $8.5* 10^{-6}$ |
True value | $\mu$ = 0.5 | $\sigma_X^2$ = 0.083 | $\sigma_X^2/n$ |
#######################Calculating Point Estimates for N=50 samples##########################
point_est = []
for i in range(50):
    # Draw one sample of size 100 and record its sample mean
    X = np.random.uniform(0, 1,100)
    m = np.sum(X)/len(X)
    point_est.append(m)
point_est = np.array(point_est)
mean = np.sum(point_est)/len(point_est)
var = np.sum(np.square(point_est-mean))/(len(point_est)-1)
std = np.sqrt(var)
# Variance of the sample mean is sigma^2/n, where n = 100 is the size of each sample
# (not the number of samples drawn)
true_sample_var = true_variance/100
true_sample_std = np.sqrt(true_sample_var)
############################Plotting distribution for 50 samples############
f, (ax1, ax2) = plt.subplots(1, 2,figsize=(18,8))
f.suptitle('Sampling Distributions: ')
weight = point_est/np.sum(point_est)
count,bins, ignored = ax1.hist(point_est,bins=9,alpha = 0.5,color = 'green',edgecolor='blue')
ax1.axvline(mean, color='yellow', linestyle='dashed', linewidth=2,label='sampling distribution mean')
ax1.axvline(true_mean, color='purple', linestyle='-.', linewidth=2,label='True mean')
ax1.axvline(mean-std, color='black', linestyle=':', linewidth=1.5,label='1 standard deviation sample')
ax1.axvline(mean+std, color='black', linestyle=':', linewidth=1.5)
l1 = 'True 1 Standard deviation'
ax1.axvline(true_mean-true_sample_std, color='cyan', linestyle='dashed', linewidth=1.5,label=l1)
ax1.axvline(true_mean+true_sample_std, color='cyan', linestyle='dashed', linewidth=1.5)
ax1.set(xlabel="Point estimates",ylabel="P(X)")
ax1.legend(loc=9, bbox_to_anchor=(0.8, 1))
ax1.set_title('Sampling Distribution of point estimates with 50 samples')
#print('Mean of point estimate is with samples 50 is: ',mean)
#print('Variance of point estimate with samples 50 is: ',var)
#######################Calculating Point Estimates for N=50000 samples##########################
point_est = []
for i in range(50000):
    # Draw one sample of size 100 and record its sample mean
    X = np.random.uniform(0, 1,100)
    m = np.sum(X)/len(X)
    point_est.append(m)
point_est = np.array(point_est)
mean = np.sum(point_est)/len(point_est)
var = np.sum(np.square(point_est-mean))/(len(point_est)-1)
std = np.sqrt(var)
# Variance of the sample mean is sigma^2/n, where n = 100 is the size of each sample
# (not the number of samples drawn)
true_sample_var = true_variance/100
true_sample_std = np.sqrt(true_sample_var)
weight = point_est/np.sum(point_est)
############################Plotting distribution for 50k samples############
count,bins, ignored = ax2.hist(point_est,bins=100,alpha = 0.5,color = 'orange',edgecolor='blue')
ax2.axvline(mean, color='purple', linestyle='dashed', linewidth=2,label='sampling distribution mean')
ax2.axvline(true_mean, color='green', linestyle='-.', linewidth=2,label='True mean')
ax2.axvline(mean-std, color='black', linestyle=':', linewidth=1.5,label='1 standard deviation sample')
ax2.axvline(mean+std, color='black', linestyle=':', linewidth=1.5)
l2 = 'True 1 Standard deviation'
ax2.axvline(true_mean-true_sample_std, color='cyan', linestyle='dashed', linewidth=1.5,label=l2)
ax2.axvline(true_mean+true_sample_std, color='cyan', linestyle='dashed', linewidth=1.5)
ax2.set(xlabel="Point estimates",ylabel="P(X)")
ax2.legend(loc=9, bbox_to_anchor=(0.8, 1))
ax2.set_title('Sampling Distribution of point estimates with 50000 samples')
#print('Mean of point estimate with sample 50k is: ',mean)
#print('Variance of point estimate with samples 50k: ',var)
plt.subplots_adjust(left=0.01, wspace=0.52, top=0.8)
plt.show()
number of samples (each of size N = 100) | sampling dist mean | sampling dist variance |
---|---|---|
50 | 0.499 | 0.00090 |
50000 | 0.50 | 0.00083 |
Sampling values(N=100) | 0.50 | 0.00085 |
Since the values are drawn independently from a uniform distribution, we expect the covariance between consecutive draws to be zero or very small. Covariance measures the linear relationship between two random variables. Two independent variables always have zero covariance, but zero covariance does not imply that the random variables are independent. Because the magnitude of the covariance depends on the scale of the variables, we will also calculate the correlation, whose range is [-1,1]. A correlation of 1 or -1 means there is a perfect linear relationship between the two random variables, and a value of 0 means there is no linear relationship between them.
COV($X_i$,$X_{i+1}$) = E[$X_i X_{i+1}$] - E[$X_i$]* E[$X_{i+1}$]
COV($X_i$,$X_{i+1}$) $\approx \frac{1}{N}\sum_{i=1}^{N} X_i X_{i+1} - \left(\frac{1}{N}\sum_{i=1}^{N} X_i\right)\left(\frac{1}{N}\sum_{i=1}^{N} X_{i+1}\right)$
CORR($X_i$,$X_{i+1}$) = $\frac{COV(X_i,X_{i+1})}{\sigma_{X_i}\sigma_{X_{i+1}}}$
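The point that zero covariance does not imply independence can be illustrated with a small sketch. The choice of $X \sim U(-1,1)$ and $Y = X^2$ is an assumption made for this example and is not used elsewhere in this notebook: Y is completely determined by X, yet the symmetry of X around 0 makes the covariance vanish.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed for reproducibility

# Hypothetical example: x is symmetric around 0 and y is a deterministic function of x.
x = rng.uniform(-1.0, 1.0, size=100000)
y = x ** 2

# Covariance from the definition E[XY] - E[X]E[Y].
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
corr_xy = np.corrcoef(x, y)[0, 1]

print("covariance :", cov_xy)
print("correlation:", corr_xy)
```

Both values come out close to zero even though knowing `x` fixes `y` exactly, so a near-zero correlation only rules out a linear relationship.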
########Calculating Covariance and Correlation#############
X = np.random.uniform(0,1,1001)
A=0
for i in range(1000):
    A = A + X[i]*X[i+1]
A_final = A/1000
B = (np.sum(X)-X[1000])/1000
C = (np.sum(X)-X[0])/1000
Z = A_final - (B*C)
X_i = X[0:1000]
X_i1 = X[1:1001]
std_Xi = np.std(X_i)
std_Xi_1 = np.std(X_i1)
#print("value of Covariance Z is: ",Z)
#print("Correlation is :",(Z)/(std_Xi*std_Xi_1))
#Show linear relationship
Y = np.random.uniform(0,1,100)
Y_1 = 2*Y + 57
##################Plotting scatter plots#########################
f, (ax1, ax2) = plt.subplots(1, 2,figsize=(12,6))
f.suptitle('Scatter Plots')
ax1.scatter(X_i,X_i1,color='red',alpha=0.6,edgecolors='blue')
ax1.set(xlabel='X_i', ylabel='X_i+1')
ax1.set_title('Scatter plot of uniformly generated X_i and X_i+1')
ax2.scatter(Y,Y_1,color='orange',alpha=0.6,edgecolors='yellow')
ax2.set(xlabel='Y', ylabel='Y_1')
ax2.set_title('Scatter plot of two linearly dependent variables')
plt.show()
covariance(Z) | correlation |
---|---|
0.001 | 0.016 |