Course Details

 Course Code: DA0101EN

 Audience: Anyone

 Course Level: Intermediate

 Time to Complete: 8 hours

 Learning Path: Applied Data Science with Python

Learn to Analyze Data with Python

This course will take you from Python basics to exploring different types of data. You will learn how to prepare data for analyses, perform simple statistical analyses, create meaningful data visualizations, predict future trends from data, and more.

You will learn how to:

  • Import data sets
  • Clean and prepare data for analysis
  • Manipulate pandas DataFrames
  • Summarize data
  • Build machine learning models using scikit-learn
  • Build data pipelines

Data Analysis libraries:

  • You will learn to use pandas DataFrames, NumPy multi-dimensional arrays, and the SciPy library to work with various datasets. You will be introduced to pandas, an open-source library, to load, manipulate, analyze, and visualize datasets. You will also learn about another open-source library, scikit-learn, and use some of its machine-learning algorithms to build models and make predictions.
  • Data Analysis with Python is delivered through lectures, hands-on labs and assignments.
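
As a rough illustration of how these libraries fit together, the sketch below loads a small table into a pandas DataFrame, fills a missing value, summarizes the data, and fits a scikit-learn linear regression. It is not taken from the course materials: the column names (engine_size, price) and all values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data; column names and values are assumptions, not course data.
df = pd.DataFrame({
    "engine_size": [1.0, 1.5, 2.0, np.nan, 3.0],
    "price": [10.0, 15.0, 20.0, 25.0, 30.0],
})

# Clean: replace the missing engine size with the mean of the known values.
df["engine_size"] = df["engine_size"].fillna(df["engine_size"].mean())

# Summarize: count, mean, standard deviation, and quartiles for each column.
summary = df.describe()

# Model: fit price as a linear function of engine size.
model = LinearRegression()
model.fit(df[["engine_size"]], df["price"])

# Predict the price for a new, hypothetical engine size.
new_data = pd.DataFrame({"engine_size": [2.5]})
predicted = model.predict(new_data)
print(summary)
print(predicted)
```

Filling with the column mean is only one way to handle missing values; dropping the row or using another estimate may suit other datasets better.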

Course Syllabus

  • Learning Objectives
  • Understanding the Domain
  • Understanding the Dataset
  • Python packages for data science
  • Importing and Exporting Data in Python
  • Basic Insights from Datasets
  • Identify and Handle Missing Values
  • Data Formatting
  • Data Normalization
  • Binning
  • Indicator variables
  • Descriptive Statistics
  • Basics of Grouping
  • Correlation
  • More on Correlation
  • Simple and Multiple Linear Regression
  • Model Evaluation Using Visualization
  • Polynomial Regression and Pipelines
  • R-squared and MSE for In-Sample Evaluation
  • Prediction and Decision Making
  • Model Evaluation
  • Over-fitting, Under-fitting and Model Selection
  • Ridge Regression
  • Grid Search
  • Model Refinement

General Information

  • Self-paced
  • Flexible enrolment
  • Audit multiple times
  • Python programming and statistics


  • Some Python experience is expected
  • Python for Data Science

Course Staff


Joseph Santarcangelo Ph.D.

Joseph has a Ph.D. in Electrical Engineering. His research focuses on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.

Mahdi Noorian Ph.D.

Mahdi Noorian is a Postdoctoral Fellow at the Laboratory for Systems, Software and Semantics (LS3) at Ryerson University. He holds a Ph.D. in Computer Science from the University of New Brunswick. As a Data Scientist, he is interested in applying machine learning, data mining, optimization, and semantic data analysis to big data to solve real-world problems.

Other Contributors

Bahare Talayian, Fiorella Wenver, Ke Xing, Steven Dong and Hima Vasudevan have also contributed to this course.