Statistics for Data Science — Basic Statistics

Statistics is a foundational component of data science, providing powerful tools and techniques for analyzing and interpreting data. Data scientists use statistical methods to extract meaningful insights from large and complex datasets, identify patterns and trends, and support informed business decisions. With a strong statistical foundation, a data scientist can better understand the behavior of data.

In this blog series, we will cover everything from foundational theories to advanced analytical techniques and explore their real-world applications.

What Is Statistics?

Statistics is the branch of applied mathematics that deals with the collection, organization, analysis, interpretation, and presentation of data.

It is widely used in science, economics, social sciences, business, and engineering to generate insights, make predictions, and guide decision-making. In simple terms, statistics helps us discover patterns, trends, and relationships in data.

Examples

  • Calculating the average (mean) marks of students in an exam.
  • Estimating the average height of all students in a school based on a sample of 100 students.

Key Concepts

Data

Data is any recorded information or fact, such as an observation, a measurement, or a response.

Examples: age, weight, score, income.

Population

A population is the complete set of individuals or items that share a common characteristic and are the subject of study.

Example: all students in a class.

Types of Population

  • Finite population: Can be counted and measured directly. Example: the number of people enrolled in a course.
  • Infinite population: Has no fixed upper bound, or is so large that it cannot be fully counted in practice. Example: every Google search that could ever be performed.

Sample

A sample is a subset of a population used to draw conclusions about the entire population.

Example: surveying 100 students to understand the study habits of all students in a school.

Parameter

A parameter is a numerical value that describes a population.

Example: if the true average height of all students in a school is 5.5 feet, that value is a parameter.

Statistic

A statistic is a numerical value that describes a sample.

Example: if 100 students are measured and their average height is 5.4 feet, that value is a statistic.
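The relationship between population, sample, parameter, and statistic can be sketched in a few lines of Python. This is a minimal illustration, not a recipe: the heights are simulated, and the population size of 1,000, the spread of 0.3 feet, and the seed are all assumptions chosen for the example.

```python
import random
import statistics

# Hypothetical population: heights (in feet) of 1,000 students.
# The values are simulated for illustration, not real measurements.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(5.5, 0.3) for _ in range(1000)]

# Parameter: a number that describes the whole population.
population_mean = statistics.mean(population)

# Sample: 100 students drawn at random, without replacement.
sample = rng.sample(population, 100)

# Statistic: a number that describes only the sample.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f} ft")
print(f"Statistic (sample mean):     {sample_mean:.2f} ft")
```

The two numbers will usually be close but not identical. That gap is sampling error, and it is the reason a statistic is treated as an estimate of a parameter rather than as the parameter itself.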

Variable

A variable is any characteristic or quantity that can take different values.

Examples: age, length, height.

Types of Variables

  • Qualitative (categorical) variable: Describes qualities or categories. Examples: color of a car, blood type, gender.
  • Quantitative (numerical) variable: Represents measurable quantities. Examples: number of children, weight, income.

Types of Quantitative Data

  • Discrete data: Takes specific, countable values (often integers). Example: number of students in a class (30, 31, 33).
  • Continuous data: Can take any value within a range and is obtained by measuring rather than counting. Example: height of a person.

Scales of Measurement

There are four primary scales of measurement:

  • Nominal Scale
  • Ordinal Scale
  • Interval Scale
  • Ratio Scale

Nominal Scale

The nominal scale classifies data into distinct categories with no inherent order.

Examples:

  • Gender: Male, Female
  • Blood Type: A, B, AB, O
  • Marital Status: Single, Married, Divorced

Ordinal Scale

The ordinal scale ranks data in a meaningful order, but the differences between ranks are not necessarily equal and cannot be measured precisely.

Examples:

  • Education Level: High School, Bachelor’s, Master’s, PhD
  • Customer Satisfaction: Very Unsatisfied to Very Satisfied
  • Economic Status: Low, Middle, High
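As a rough sketch of what "meaningful order" allows, the snippet below ranks the education levels from the example above and finds the middle category. The `rank` helper and the survey responses are hypothetical and exist only for illustration.

```python
# Ordered categories, lowest to highest (taken from the education example).
EDUCATION_LEVELS = ["High School", "Bachelor's", "Master's", "PhD"]


def rank(level: str) -> int:
    """Return the position of a level in the ordering (0 = lowest)."""
    return EDUCATION_LEVELS.index(level)


# Hypothetical survey responses (made-up values).
responses = ["Master's", "High School", "PhD", "Bachelor's", "Bachelor's"]

# Sorting by rank is meaningful on an ordinal scale.
ordered = sorted(responses, key=rank)

# So is the median category: the middle response after ordering.
median_level = ordered[len(ordered) // 2]

print(ordered)
print("Median level:", median_level)
```

Averaging the rank numbers would be less defensible, because nothing guarantees that the step from High School to Bachelor's is the same size as the step from Master's to PhD.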

Interval Scale

The interval scale has ordered values with equal intervals, but no true zero point.

Examples:

  • Temperature: Celsius, Fahrenheit
  • Calendar Years
  • IQ Scores

With interval data, addition and subtraction are meaningful, but ratio statements are not. For example, 20 °C is not twice as warm as 10 °C.

Ratio Scale

The ratio scale has all interval scale properties plus an absolute zero point, allowing meaningful ratio comparisons.

Examples:

  • Height
  • Weight
  • Age
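Temperature gives a quick way to see the difference between the interval and ratio scales. The sketch below converts Celsius (interval scale) to Kelvin, which has an absolute zero and therefore behaves as a ratio scale. The function name `celsius_to_kelvin` is just an illustrative helper.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cool_c

# Dividing Celsius values suggests "twice as warm", which is misleading
# because 0 degrees Celsius is not an absence of temperature.
naive_ratio = warm_c / cool_c

# Kelvin has a true zero, so this ratio is physically meaningful.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Difference:    {difference_c:.1f} degrees")
print(f"Celsius ratio: {naive_ratio:.3f}")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
```

On the Kelvin scale the ratio comes out at roughly 1.035, not 2, so 20 °C is only about 3.5% warmer than 10 °C in absolute terms.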

Types of Statistics

There are two major types of statistics:

  • Descriptive Statistics
  • Inferential Statistics

Descriptive Statistics

Descriptive statistics summarizes and presents data in a meaningful way so that we can understand it quickly.

Key Components

  • Measures of central tendency
    • Mean: average value
    • Median: middle value in ordered data
    • Mode: most frequent value
  • Measures of dispersion (variability)
    • Range: highest minus lowest value
    • Variance: average squared deviation from the mean
    • Standard Deviation: square root of variance
  • Frequency distribution
    • Shows how often each value appears (tables, histograms, pie charts)
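The measures listed above can all be computed with Python's standard library. The following is a minimal sketch using a made-up list of exam marks. It uses the population versions of variance and standard deviation (`pvariance`, `pstdev`) because they match the "average squared deviation" definition given here; for a sample, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics
from collections import Counter

# Hypothetical exam marks for ten students (made-up values).
marks = [72, 85, 90, 66, 85, 78, 94, 85, 70, 75]

# Measures of central tendency
mean_mark = statistics.mean(marks)      # average value
median_mark = statistics.median(marks)  # middle value in ordered data
mode_mark = statistics.mode(marks)      # most frequent value

# Measures of dispersion
mark_range = max(marks) - min(marks)    # highest minus lowest value
variance = statistics.pvariance(marks)  # average squared deviation from the mean
std_dev = statistics.pstdev(marks)      # square root of the variance

# Frequency distribution: how often each mark appears
frequency = Counter(marks)

print(f"Mean: {mean_mark}, Median: {median_mark}, Mode: {mode_mark}")
print(f"Range: {mark_range}, Variance: {variance}, Std Dev: {std_dev:.2f}")
print("Frequencies:", sorted(frequency.items()))
```

For these marks the mean is 80, the median is 81.5, the mode is 85, the range is 28, and the variance is 76, which gives a standard deviation of about 8.72.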

Inferential Statistics

Inferential statistics uses sample data to draw conclusions or make predictions about a population.

Key Components

  • Hypothesis Testing
    • Null Hypothesis (H0): no effect or no difference
    • Alternative Hypothesis (H1): effect or difference exists
  • Confidence Intervals
    • A range computed from sample data that, at a stated confidence level (for example 95%), is expected to contain the true population parameter
  • Regression Analysis
    • Studies relationships between variables and supports prediction
  • Statistical Tests
    • t-tests, chi-square tests, ANOVA
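To make hypothesis testing and confidence intervals more concrete, here is a small sketch using only Python's standard library. The sample heights and the hypothesized mean of 5.5 feet are made up for illustration. The interval uses a normal approximation for the critical value, which is an assumption: with only 10 observations, a t critical value would be more appropriate and would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical sample: heights (in feet) of 10 students (made-up values).
sample = [5.1, 5.4, 5.6, 5.3, 5.5, 5.7, 5.2, 5.4, 5.6, 5.2]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)       # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)  # estimated spread of the sample mean

# Critical value for 95% confidence from the standard normal distribution.
# Assumption: normal approximation, chosen to stay within the standard library.
z = statistics.NormalDist().inv_cdf(0.975)

# Confidence interval for the population mean
ci_lower = sample_mean - z * standard_error
ci_upper = sample_mean + z * standard_error

# One-sample t statistic for H0: the population mean equals 5.5 ft
hypothesized_mean = 5.5
t_statistic = (sample_mean - hypothesized_mean) / standard_error

print(f"Sample mean: {sample_mean:.2f} ft")
print(f"Approximate 95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"t statistic: {t_statistic:.3f}")
```

In practice, a library such as SciPy is typically used to turn the t statistic into a p-value. The sketch stops at the statistic itself to avoid depending on anything outside the standard library.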

Thanks for reading.

“Your network is your net worth.” - Tim Sanders

Connect on LinkedIn: md-sawrab

GitHub: md-sawrab

This post is licensed under CC BY 4.0 by the author.