Is Statistics important for Data Science?

Introduction

Statistics is the science of conducting studies to collect, organize, summarize, and analyze data and to draw conclusions from it. In short, it is about learning from data.

As a branch of mathematics, statistics mainly deals with collecting information, interpreting it from a data set, and drawing conclusions from it. It can be used in many fields.

For example, when we watch a cricket match, we hear terms like batting average, bowling economy, and strike rate. We also see many graphs and data visualizations. All of these are part of statistics: information is analyzed and the results are presented accordingly.
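To make these cricket terms concrete, here is a minimal Python sketch using the commonly quoted definitions: batting average as runs per dismissal, strike rate as runs per 100 balls faced, and bowling economy as runs conceded per over. The function names and the numbers are made up for illustration and do not come from any real scorecard.

```python
def batting_average(runs_scored, dismissals):
    """Runs scored per dismissal."""
    if dismissals == 0:
        raise ValueError("batting average is undefined with zero dismissals")
    return runs_scored / dismissals


def strike_rate(runs_scored, balls_faced):
    """Runs scored per 100 balls faced."""
    return 100 * runs_scored / balls_faced


def bowling_economy(runs_conceded, balls_bowled):
    """Runs conceded per over, assuming six balls per over."""
    overs = balls_bowled / 6
    return runs_conceded / overs


# Hypothetical player: 450 runs off 500 balls, dismissed 9 times.
print(batting_average(450, 9))
print(strike_rate(450, 500))

# Hypothetical bowler: 200 runs conceded in 240 balls (40 overs).
print(bowling_economy(200, 240))
```

Each number is just a ratio of two counts, which is why such figures are easy to compare across players.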

We can talk about statistics all the time, but do we know the science behind it?

Large cricket organizations use such statistical methods to compare players and teams and rank them accordingly. If we learn the science behind these methods, we can create our own rankings, compare different things, and debate with hard facts.

Statistics is very important in analytics, data science, artificial intelligence (AI), machine learning, and deep learning (deep neural networks). It is used to tackle complex real-world problems so that data professionals such as data analysts and data scientists can analyze data and retrieve meaningful insights from it.

In simple words, stats can be used to derive meaningful insights from data by performing mathematical computations on it.

The field of statistics is divided into two parts: descriptive statistics and inferential statistics. Data comes in two types, quantitative and qualitative, and it can be either labeled or unlabeled.

Some important terms used

Population: In statistics, a population is the entire pool from which a statistical sample is drawn. For example, all the students in a college form a population. A population is contrasted with a sample.

Sample: A sample is a subset of the population, that is, a set of observations drawn from it. A good sample is representative of the population.

Samples are necessary for research because studying the whole population is often impractical. For example, suppose we want to know the average height of the boys in a college.

There can be a large number of boys, so measuring every one of them is not practical. In such cases we take a sample. Because a sample is representative of the population, we select a certain number of boys, measure their heights, and compute the average.
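The height example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "population" is synthetic (5,000 heights generated around 170 cm), the sample size of 100 is arbitrary, and `sample_mean` is a hypothetical helper name.

```python
import random
import statistics


def sample_mean(population, sample_size, seed=0):
    """Mean of a simple random sample drawn without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)


# Synthetic population: heights in cm for 5,000 hypothetical students.
# Real data would come from actual measurements, not a random generator.
generator = random.Random(42)
population_heights = [generator.gauss(170, 8) for _ in range(5000)]

# In practice the population mean is unknown; it is computed here
# only to show how close the sample estimate gets.
population_mean = statistics.mean(population_heights)
estimate = sample_mean(population_heights, sample_size=100, seed=1)

print(f"Population mean:     {population_mean:.1f} cm")
print(f"Sample mean (n=100): {estimate:.1f} cm")
```

Measuring 100 students instead of 5,000 gives an estimate that is usually close to the population average, which is the reason samples are used.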

Variable: A characteristic of each element of a population or a sample is called a variable. In the example above, height is the variable.

Also read: Essential Mathematics to master Data Science

Some of the important topics that we will discuss in upcoming articles are:

Basic Statistics

  • Terms related to statistics.
  • Random variables
  • Population and sample concept.
  • Measures of central tendency
  • Measures of variability
  • Sampling Techniques
  • Measures of Dispersion
  • Gaussian / Normal Distribution

Intermediate Statistics

  • Standard Normal Distribution
  • z-score
  • Probability density function (PDF)
  • Cumulative distribution function (CDF)
  • Hypothesis testing
  • Plotting graphs
  • Kernel Density Estimation
  • Central limit theorem
  • Skewness of data
  • Covariance
  • Pearson correlation coefficient
  • Spearman Rank Correlation

Advanced Statistics

  • Q-Q Plot
  • Chebyshev’s inequality
  • Discrete and continuous distribution
  • Bernoulli and Binomial distribution
  • Log Normal Distribution
  • Power Law distribution
  • Box-Cox transform
  • Poisson Distribution
  • z-stats
  • t-stats
  • Type 1 and Type 2 error
  • chi-square test
  • ANOVA testing
  • F-stats
  • A/B testing

Looking at this list, the topics may seem tough, but that depends on your level of understanding and your determination to learn. It’s not rocket science, and it is quite doable.

It’s important that you know statistics because it is a prerequisite for the rest of your data science journey. So let’s kickstart our journey of statistics here.

The best way to learn anything is to understand it properly and then apply it by implementing it. We learn from our mistakes, so keep practicing until you understand it properly.

Before jumping deep into data science, I would like to repeat that learning statistics is a must.

Let’s go 🚀🚀
