Course Details

 Course Code: ML0101ENv3

 Audience: Anyone

 Course Level: Intermediate

 Time to Complete: 12 hours

 Language: English

About the Course

In this course you will learn about basic statistics and data types, data preparation, feature engineering, model fitting, pipelines, and grid search. Apache Spark™ is a fast, general-purpose engine for large-scale data processing with built-in modules for streaming, machine learning, and graph processing. This course shows you how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters on a Spark cluster.

Course Syllabus

  • Vectors and Labelled Points
  • Local and Distributed Matrices
  • Summary Statistics, Correlations, and Random Data
  • Sampling
  • Hypothesis Testing
  • Statistics, Random data and Sampling on Data Frames
  • Handling Missing Data and Imputing Values
  • Transformers and Estimators
  • Data Normalization
  • Identifying Outliers
  • Feature Vectors
  • Categorical Features
  • Using Explode, User Defined Functions, and Pivot
  • Principal Component Analysis (PCA) in Feature Engineering
  • RFormulas
  • Decision Trees
  • Random Forests
  • Gradient-Boosting Trees
  • Linear Methods
  • Evaluation
  • Predicting Grant Applications: Introduction
  • Predicting Grant Applications: Creating Features
  • Predicting Grant Applications: Building a Pipeline
  • Predicting Grant Applications: Cross Validation and Model Tuning
  • Predicting Grant Applications: Wrapping up

General Information

  • Self-paced
  • Flexible enrolment
  • Audit multiple times
  • There is only ONE chance to pass the course, but multiple attempts per question

Recommended Existing Skills

  • General understanding of Scala
  • Experience with Java (preferred), Python, or another object-oriented language
  • General understanding of machine learning

Course Staff

Petro Verkhogliad

Petro Verkhogliad is a Consulting Manager at Lightbend. He holds a Master's degree in Computer Science with a specialization in Intelligent Systems. He is passionate about functional programming and applications of AI.

Dr Priya Dev

Dr Priya Dev is a lecturer in statistics at ANU and UNSW and also a founder of a mobile commerce startup, Qhopper. She completed a PhD in probability theory at ANU and Columbia University and has been a data analytics consultant to ASX-listed companies and global banks. Qhopper is a massively scalable mobile commerce platform built on the Lightbend platform using Scala and Spark. It bridges the technology gap for hospitality businesses, helping them create better experiences and connect with new and existing customers through their own online ordering, CRM, and business intelligence suite.

Joseph Santarcangelo Ph.D.

Joseph has a Ph.D. in Electrical Engineering. His research focuses on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.

Other Contributors

Agatha Colangelo also contributed.