Crosswalk

Assessment language

Outcome:

KB & PB

Crosswalk of learning goals between the KB and PB versions, organized by KB’s 11 modules (slides); the 12th module is project presentations.

| module | KB | PB | differences |
| 1 – Intro | course expectations and policies | | |
| | tools and resources | | PB: git, command-line, colab |
| | introduction to data science | | |
| | introduction to R | Python (assumed) | PB: github-classroom (no notebooks) |
| 2 – Data Viz | common statistical graphics | | |
| | how to look at data | | |
| | key ingredients of useful plots | | |
| | grammar of graphics | seaborn vs matplotlib | ggplot vs OO nature of Python |
| 3 – Data Processing | Types of data | | PB calls the module: 03-Tidy |
| | Structuring data for data science | | |
| | Data wrangling and transformation | | |
| | Summarizing data | | |
| 4 – EDA | What is EDA? | in PB modules 2 & 3 | not a separate module |
| | Variation and covariation in data | ’’ | ’’ |
| | “Interesting” visualizations | ’’ | ’’ |
| 5 – SQL | What is relational data? | | PB calls the module: 04-Relational |
| | What is SQL? | | |
| | Basics of relational algebra | | |
| | Types of joins | | |
| 6 – Modeling I | What are the goals of modeling? | | PB calls it: 05-Regression |
| | Why linear regression? | | |
| | Fitting linear models | | |
| | Model diagnostics | | |
| 7 – Modeling II | Criteria for evaluating models | | PB calls it: 07-Resampling |
| | Overfitting and how to avoid it | | |
| | Performing cross-validation | | |
| | Selecting models | | |
| 8 – Statistical Inference | What is statistical inference? | in 05-Regression | not a separate module |
| | Distributions of statistics | ’’ | ’’ |
| | Confidence intervals | ’’ | ’’ |
| | Hypothesis tests | ’’ | ’’ |
| 9 – SupervisedML | What are the goals of supervised ML? | | PB calls it: 06-Classification |
| | Building classification models | | |
| | Dealing with class imbalance | | |
| 10 – UnsupervisedML | Goals of unsupervised ML | | PB calls it: 09-Unsupervised |
| | Dimension reduction | | |
| | Clustering | | |
| 11 – Text mining | Structuring text data | | PB calls it: 11-Text |
| | EDA using term frequency | | |
| | Sentiment analysis | | |
| | Topic models | | |
| 12 – Projects | | | |

The following content appears only in PB’s version

| module | KB | PB |
| 10-Trees | N/A | beyond linear models with Trees & SVMs |
| | N/A | decision-tree basics, random forest, ensembles of weak learners |
| | N/A | SVM with nonlinear kernel (relationship to logistic regression) |
| | N/A | intro to image processing (faces and digits) |
| 12-Deep | N/A | optional module (there is no related homework) |
| | N/A | comes before projects only when holidays and scheduling allow |
| | N/A | intro to neural networks (perceptron, relationship to logistic regression) |
| | N/A | function approximation, nonlinearity, stochastic GD (tensorflow playground) |

PB outline (detail)

Approach & Issues