I often come across people throwing around statements like: "Statistics is hard" or "Statistics is so difficult".
I am here to be the bearer of good news. (Drumroll, please.) It is all a myth.
Look, I am not saying that statistics is the easiest thing on the planet. I know this all too well: I majored in it for four years. I am just saying that we are all capable of mastering the basics of the subject. Of course, it may take a few hours of dedicated learning; honestly, unless you are some kind of Einstein or Ramanujan, you are going to have to put in the work.
Now that we all have a positive mindset, let's get started.
Why are you trying to learn statistics?
There are several reasons why someone may need to have an understanding of statistics:
It is a unit being taught in your course. You really do not have a choice here.
You are writing your thesis or dissertation and need to carry out some analytic research.
You are a data analyst or data scientist. (It is impossible to be a good data scientist without mastering the basics of statistics. It is not just about writing code; it is also about understanding what the numbers mean.)
You are a philomath and simply enjoy learning and studying.
If I have left anyone out, kindly let us know your reasons in the comment section below.
The Basics
I have been lucky enough to fit into several of the above-mentioned reasons. With that in mind, here are the statistics basics you need to know:
Variables
Probability
Probability distribution
Descriptive statistics
Inferential statistics
Variables
Understanding variables is key to mastering statistics, because simply knowing a variable's type helps you decide which descriptive and inferential statistics to use.
There are 2 main types of variables:
1. Qualitative (categorical) variables: Their values are categories, which are generally names or labels.
Under qualitative variables we have:
Nominal variables: The values are just names with no particular ordering, e.g. country of residence, race, gender.
Ordinal variables: The values are names with a natural ordering, e.g. the Likert scale (strongly disagree to strongly agree).
2. Quantitative variables: Their values are numbers that can be measured or counted.
Under quantitative variables we have:
Discrete variables: The values are distinct, countable numbers. For example, the number of students in a class (you cannot have half a student; it is either 1, 2, 3, etc.).
Continuous variables: The values can take any number within a range, e.g. weight and height.
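To make the four variable types concrete, here is a small Python sketch using only the standard library `statistics` module. The survey values and variable names (`countries`, `responses`, `siblings`, `heights_cm`) are made up for illustration. It shows one reasonable way to summarize each type: the most common category for a nominal variable, a median computed on scale positions for an ordinal variable, and a mean for the quantitative ones.

```python
from statistics import mean, median_low, mode

# Hypothetical survey data (made up for illustration), one list per variable type.
countries = ["Kenya", "Uganda", "Kenya", "Ghana", "Kenya"]              # nominal
likert_scale = ["strongly disagree", "disagree", "neutral",
                "agree", "strongly agree"]                              # ordered categories
responses = ["agree", "neutral", "strongly agree", "agree", "disagree"]  # ordinal
siblings = [0, 2, 1, 3, 2]                                              # discrete (counts)
heights_cm = [162.5, 170.1, 158.3, 181.0, 175.4]                        # continuous

# Nominal: categories have no order, so the mode (most common value) is a natural summary.
print("Most common country:", mode(countries))

# Ordinal: replace each label with its position on the scale, take the median
# position (median_low always returns an actual position), then map it back to a label.
ranks = [likert_scale.index(r) for r in responses]
print("Median response:", likert_scale[median_low(ranks)])

# Quantitative: arithmetic such as the mean is meaningful.
print("Mean number of siblings:", mean(siblings))
print("Mean height (cm):", round(mean(heights_cm), 1))
```

Asking for the mean of `countries` would make no sense, and turning the Likert labels into positions only works because those categories have a natural order.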
Probability
Probability is a numerical measure, between 0 and 1, of how likely an event is to occur: 0 means the event is impossible and 1 means it is certain.
In life, we often ask ourselves questions like:
How likely am I to get clients for my new business?
How likely is it that the younger generation spends more time on their phones than older generations do?
How likely am I to finish my work if I indulge in just one movie on Netflix?
We basically go about our day-to-day lives on the basis of likelihoods, so understanding probability and the different ways to calculate it is key.
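As a small illustration of two common ways to calculate a probability, the Python sketch below finds the chance of rolling a six on a fair die, first by counting equally likely outcomes (classical probability) and then by simulating many rolls and taking the relative frequency. The fair die, the seed and the 100,000 simulated rolls are assumptions chosen for the example.

```python
import random
from fractions import Fraction

# Classical probability: favourable outcomes / total number of equally likely outcomes.
outcomes = range(1, 7)                           # faces of a fair six-sided die
favourable = [face for face in outcomes if face == 6]
exact = Fraction(len(favourable), len(outcomes))

# Relative-frequency view: repeat the experiment many times and count successes.
random.seed(42)                                  # fixed seed so the run is reproducible
trials = 100_000
sixes = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
estimate = sixes / trials

print("Exact P(six):", exact, "=", round(float(exact), 4))
print("Simulated P(six):", round(estimate, 4))
```

With more simulated rolls, the estimate tends to get closer to the exact value of 1/6.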
Probability distribution
Remember in school when the exam results were announced? It was normal to find a few people with really high grades, a large group with average grades, and another small group with really poor grades. It turns out the scores roughly followed a distribution: the normal distribution.
Understanding a variable's distribution helps you understand which outcomes of a random event are possible and how likely each one is.
There are several other distributions that describe different kinds of events, e.g.:
Poisson distribution
Binomial distribution
Exponential distribution
Uniform distribution
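Here is a minimal Python sketch (assuming Python 3.8 or later, for `statistics.NormalDist`) of three of the distributions mentioned above. The exam-score parameters (mean 60, standard deviation 10), the coin flips and the customer arrival rate are hypothetical values picked for illustration, and `binomial_pmf` and `poisson_pmf` are helper functions written here from the standard formulas rather than library calls.

```python
from math import comb, exp, factorial
from statistics import NormalDist

# Normal: hypothetical exam scores with mean 60 and standard deviation 10.
scores = NormalDist(mu=60, sigma=10)
print("P(score > 80):", round(1 - scores.cdf(80), 4))
print("P(50 <= score <= 70):", round(scores.cdf(70) - scores.cdf(50), 4))

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval when events occur at an average rate lam)."""
    return lam**k * exp(-lam) / factorial(k)

print("P(3 heads in 5 fair coin flips):", binomial_pmf(3, 5, 0.5))
print("P(2 customers in an hour, average 4 per hour):", round(poisson_pmf(2, 4), 4))
```

Each distribution answers a different kind of question: the normal describes a continuous quantity clustered around a mean, the binomial counts successes in a fixed number of trials, and the Poisson counts events in a fixed interval.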
Descriptive statistics
Descriptive statistics are used to summarize and describe the characteristics of a variable.
Here's the catch, though: you cannot describe a variable like country of residence the same way you would describe a person's height. To describe a variable properly, you need to know what kind of variable it is. The basics!
Descriptive statistics are expressed using:
Measures: mean, median, standard deviation, variance, skewness, kurtosis, frequency, correlation
Visualization: bar charts, histograms, pie charts, line graphs, etc.
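The sketch below uses Python's standard library to compute some of these measures on made-up data; the height values and countries are hypothetical. Heights (quantitative) get a mean, median, variance, standard deviation and a moment-based skewness, while countries (qualitative) only get frequency counts and a crude text bar chart standing in for a real plot. Population versions (`pvariance`, `pstdev`) are used here for simplicity; with a sample you would often use `variance` and `stdev` instead.

```python
from collections import Counter
from statistics import mean, median, pstdev, pvariance

# Hypothetical data (made up for illustration).
heights_cm = [158, 162, 165, 165, 168, 170, 172, 175, 181, 190]      # quantitative
countries = ["Kenya", "Ghana", "Kenya", "Nigeria", "Kenya", "Ghana"]  # qualitative

# Quantitative variable: measures of centre and spread make sense.
print("mean:", mean(heights_cm))
print("median:", median(heights_cm))
print("variance:", pvariance(heights_cm))
print("std dev:", round(pstdev(heights_cm), 2))

# Moment-based skewness: the average cubed z-score (population formula).
# A positive value means a longer tail on the right (here, the 190 cm person).
mu, sigma = mean(heights_cm), pstdev(heights_cm)
skewness = sum(((x - mu) / sigma) ** 3 for x in heights_cm) / len(heights_cm)
print("skewness:", round(skewness, 3))

# Qualitative variable: a mean is meaningless, so count frequencies instead
# and draw a crude text bar chart.
for country, count in Counter(countries).most_common():
    print(f"{country:<8} {'#' * count} ({count})")
```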
Inferential statistics
Inferential statistics are used to draw conclusions about a larger population from a sample of data, and to make reasonable guesses (estimates) about it.
One common use of inferential statistics is hypothesis testing. You come up with an idea or question, then run statistical tests to check whether the data support it or not. For example, maybe you would like to know whether people enjoy watching comedy or action movies more.
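To sketch what such a test can look like, suppose (hypothetically) that you survey 100 people and 62 say they prefer comedy over action. Each answer is a two-category variable, so under the null hypothesis of no preference the number of comedy fans follows a binomial distribution with p = 0.5. The Python below implements an exact two-sided binomial test by hand; `binomial_test_two_sided` is a helper written for this example, and the 5% significance level is a common convention, not a rule.

```python
from math import comb

def binomial_test_two_sided(successes, n, p=0.5):
    """Exact two-sided binomial test: add up the probabilities of every outcome
    that is no more likely than the observed one under the null hypothesis."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # A tiny relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))

# Hypothetical survey: 100 people, 62 say they prefer comedy over action.
n, prefer_comedy = 100, 62
p_value = binomial_test_two_sided(prefer_comedy, n)
print(f"p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0 at the 5% level: the split does not look like 50/50.")
else:
    print("Not enough evidence to reject H0 of a 50/50 split.")
```

A small p-value means data like these would be unlikely if people truly had no preference. It does not prove which genre people enjoy more in general, and the conclusion only holds if the sample is representative.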
There is a catch here too. To choose the right inferential statistics, you need to understand both the types of your variables and their probability distributions. Let me say it again: the basics!
*****************
Conclusion
I believe in laying strong foundations, especially when learning. Once you master the basic concepts, everything else can be built on top of them. Stay tuned for my Statistics Zero to Hero beginner course.