DS 5010

Introduction to Programming for Data Science

Offers an introductory course on fundamentals of programming and data structures. Covers lists, arrays, trees, hash tables, etc.; program design, programming practices, testing, debugging, maintainability, data collection techniques, and data cleaning and preprocessing. Includes a class project, where students use the concepts covered to collect data from the web, clean and preprocess the data, and make it ready for analysis. (This course description is from the Academic Catalog.)
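The cleaning and preprocessing step of the class project can be sketched with only the Python standard library. This is a minimal illustration, not course-provided code. The column names `city` and `temp_c`, the sample rows and the cleaning rules are hypothetical: strip whitespace, drop rows with missing or non-numeric values, and drop duplicates. The `seen` set is a Python hash table, one of the data structures listed above.

```python
import csv
import io


def clean_records(raw_csv: str) -> list[dict]:
    """Parse CSV text and return cleaned rows (hypothetical columns: city, temp_c)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    seen = set()  # hash table used to detect duplicate rows
    cleaned = []
    for row in reader:
        # Missing trailing fields come back as None, so fall back to "".
        city = (row.get("city") or "").strip()
        temp_text = (row.get("temp_c") or "").strip()
        if not city or not temp_text:
            continue  # drop rows with missing values
        try:
            temp = float(temp_text)
        except ValueError:
            continue  # drop rows whose temperature is not numeric
        key = (city.lower(), temp)
        if key in seen:
            continue  # drop duplicates (city compared case-insensitively)
        seen.add(key)
        cleaned.append({"city": city, "temp_c": temp})
    return cleaned


# Hypothetical raw data with stray whitespace, a gap, a bad value and a duplicate.
raw = """city,temp_c
 Boston , 3.5
Portland,
Seattle,abc
boston,3.5
Seattle,7
"""
print(clean_records(raw))
# → [{'city': 'Boston', 'temp_c': 3.5}, {'city': 'Seattle', 'temp_c': 7.0}]
```

In the course project the same steps would more likely be written with pandas. The standard-library version keeps each rule visible.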

Course objectives

This course builds on the foundation provided by CS 5001, 5002 and 5003. Students will apply programming basics to the design and implementation of data science applications. In particular, students will be introduced to programming as a collaborative discipline. The primary language is Python. At the end of DS 5010, a student should be able to do the following:

Schedule (Spring 2022)

The schedule below and other details in this syllabus are subject to change.

Date Week Topic
20-21 Jan 1 GitHub, Colab, Jupyter
27-28 Jan 2 Basics of reading and plotting data
3-4 Feb 3 Case Study #1: Vaccine effectiveness
10-11 Feb 4
17-18 Feb 5 Case Study #2: Data structures
24-25 Feb 6 Case Study #3: NumPy, Pandas & Seaborn
3-4 Mar 7
10-11 Mar 8 Case Study #4: USGS Earthquakes API
17-18 Mar Spring Break – NO CLASS
24-25 Mar 9
31 Mar-1 Apr 10 Case Study #5: GPCOG project
7-8 Apr 11
14-15 Apr 12
21-22 Apr 13
28-29 Apr 14
4-5 May 15 Finals week


Case studies

Projects allow students to gain experience working in small teams on practical problems. Code is developed in a shared GitHub repository using basic collaborative-coding practices such as prototyping in branches, opening pull requests, merging only after independent review by a collaborator, and discussing new functionality in “issues”. Project documentation, attribution and reproducibility are critically important. Documentation should be detailed enough that another technical team could pick up and extend the project at a later date. Each project includes a public-facing GitHub Pages site that gives an overview understandable by a non-technical audience. The repository and GitHub Pages site can contribute to student portfolios.

Examples from Spring 2022:


Development environment

You should have a standard Python development environment installed on your computer, including a text editor or IDE. You will find modern recommendations in McKinney’s text.

You should also install Git as described here: https://docs.github.com/en/get-started/quickstart/set-up-git.
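A quick way to confirm that Python and Git are both visible from your environment is a short check script. This is a hypothetical helper, not a course requirement. The minimum Python version of 3.8 is an assumption, since the syllabus does not state one. The script only reads `sys.version_info`, looks for `git` on the PATH with `shutil.which`, and runs `git --version` if Git is found.

```python
import shutil
import subprocess
import sys


def check_environment(min_python=(3, 8)):
    """Report whether local Python and Git look usable (min_python is an assumed default)."""
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= min_python,
        "git_path": shutil.which("git"),  # None if git is not on PATH
        "git_version": None,
    }
    if report["git_path"]:
        # `git --version` prints a line such as "git version 2.x.y".
        result = subprocess.run(
            [report["git_path"], "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
        report["git_version"] = result.stdout.strip() or None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `git_path` prints as `None`, revisit the Git setup instructions linked above before the first class.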


Grading

Activity Contribution to final grade
Homework 50%
Quizzes 10%
Project 30%
Class Participation 10%
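The weights in the table combine into a final grade as a weighted average. The sketch below assumes that each component is scored on a 0-100 scale and that no other adjustments apply. The syllabus does not say how component scores are scaled or converted to letter grades, so `final_grade` and the sample scores are hypothetical.

```python
# Weights from the grading table above (they sum to 1.0).
WEIGHTS = {
    "homework": 0.50,
    "quizzes": 0.10,
    "project": 0.30,
    "participation": 0.10,
}


def final_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 0.5*90 + 0.1*80 + 0.3*85 + 0.1*100 = 88.5
print(round(final_grade({"homework": 90, "quizzes": 80, "project": 85, "participation": 100}), 2))
```

Because homework carries half the weight, a 10-point change in the homework average moves the final grade by 5 points.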