Gaussian Distribution for Machine Learning and Data Science (Normal Distribution)

Hemanth Nhs
2 min read · Jun 11, 2019

The Gaussian, or normal, distribution is a very common term in statistics. It is generally used to represent random variables. In machine learning, a typical example is the error of a linear regression model when we don't know the weight vector. In a Gaussian distribution, most of the data lies near the mean, and the density curve generally looks like a bell.

A Gaussian distribution is described by two main parameters: the mean and the variance. The mean is usually denoted by μ and the variance by σ² (σ is the standard deviation). The graph of a Gaussian distribution is symmetric about the mean, and its mean, median, and mode are equal.
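To make the two parameters concrete, here is a minimal sketch that estimates μ and σ² from data using only Python's standard library. The data is synthetic: the values μ = 5 and σ = 2, the sample size, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable

# Hypothetical data: 50,000 draws from a Gaussian with mu = 5 and sigma = 2.
data = [rng.gauss(5.0, 2.0) for _ in range(50_000)]

mu_hat = statistics.fmean(data)                # estimate of the mean (mu)
var_hat = statistics.pvariance(data, mu_hat)   # estimate of the variance (sigma squared)
sigma_hat = var_hat ** 0.5                     # standard deviation (sigma)
median = statistics.median(data)               # should sit close to the mean

print(f"mean={mu_hat:.3f} variance={var_hat:.3f} sigma={sigma_hat:.3f} median={median:.3f}")
```

On a run like this, the estimated mean and median should land close to each other, which is what the symmetry of the distribution suggests.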

Here μ is the mean value of our data and σ measures the spread of our data. We can express the probability density of the Gaussian distribution as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

(Formula as given on Wikipedia)
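The Gaussian probability density can also be written as a few lines of code. The following is a minimal sketch rather than a reference implementation; the function name `gaussian_pdf` and its default arguments are my own choices for illustration.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density at x of a Gaussian with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean, in units of sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

peak = gaussian_pdf(0.0)    # the density is highest at the mean
left = gaussian_pdf(-1.5)   # points equally far from the mean on either side...
right = gaussian_pdf(1.5)   # ...get the same density, because of symmetry

print(peak, left, right)
```

Evaluating the function at equal distances on either side of μ gives the same value, which matches the symmetry described earlier.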

When modeling a large dataset, it is common for most of the data to lie close to the mean value, with only a few, less frequent observations towards the extremes. This is exactly the bell shape of a Gaussian distribution, for example the one with μ = 0 and σ = 1.
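One way to see this concentration around the mean is to draw samples and count how many fall within 1, 2, and 3 standard deviations. This is a small sketch using Python's standard library; the helper `fraction_within`, the seed, and the sample size are assumptions made for illustration.

```python
import random

def fraction_within(samples, mu, sigma, k):
    """Share of samples lying within k standard deviations of mu."""
    inside = sum(1 for x in samples if abs(x - mu) <= k * sigma)
    return inside / len(samples)

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
mu, sigma = 0.0, 1.0    # the standard normal discussed above
samples = [rng.gauss(mu, sigma) for _ in range(100_000)]

for k in (1, 2, 3):
    share = fraction_within(samples, mu, sigma, k)
    print(f"within {k} sigma: {share:.3f}")
```

For a Gaussian, these shares should come out near 68%, 95%, and 99.7%, so observations more than 3σ from the mean are rare.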

We can also refer to the central limit theorem to strengthen this assumption.

The central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normally distributed.
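The theorem can be checked with a quick simulation. The sketch below adds up uniform random numbers (which are clearly not Gaussian), normalizes each sum, and looks at how the results are distributed. The number of terms, the number of trials, and the seed are arbitrary assumptions; a Uniform(0, 1) variable has mean 1/2 and variance 1/12, which is where the normalization comes from.

```python
import math
import random
import statistics

def normalized_sum(rng, n):
    """Sum n Uniform(0, 1) draws, then center and scale to mean 0 and variance 1."""
    total = sum(rng.random() for _ in range(n))
    # The sum has mean n/2 and variance n/12.
    return (total - n * 0.5) / math.sqrt(n / 12.0)

rng = random.Random(42)  # arbitrary fixed seed so the run is repeatable
n_terms = 30             # how many uniform variables go into each sum
trials = 50_000          # how many sums we simulate

sums = [normalized_sum(rng, n_terms) for _ in range(trials)]

mean = statistics.fmean(sums)
stdev = statistics.pstdev(sums)
within_one = sum(1 for s in sums if abs(s) <= 1.0) / trials

print(f"mean={mean:.3f} stdev={stdev:.3f} within 1 sigma={within_one:.3f}")
```

If the normalized sums behave like a standard normal, the mean should be near 0, the standard deviation near 1, and roughly 68% of the sums should fall within one standard deviation of the mean.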

Because of these properties and the central limit theorem (CLT), the Gaussian distribution is often used in machine learning algorithms.

References and Other readings

Normal Distribution — Wikipedia

Normal Distribution — Crash Course

Gaussian Distribution Python Code

Central limit theorem
