DS 5010

Introduction to Programming for Data Science

Offers an introductory course on fundamentals of programming and data structures. Covers lists, arrays, trees, hash tables, etc.; program design, programming practices, testing, debugging, maintainability, data collection techniques, and data cleaning and preprocessing. Includes a class project, where students use the concepts covered to collect data from the web, clean and preprocess the data, and make it ready for analysis. (This course description is from the Academic Catalog.)

Course objectives

This course builds on the foundation provided by CS 5001, 5002 and 5003. Students will apply programming basics to the design and implementation of data science applications. In particular, students will be introduced to programming as a collaborative discipline. The primary language is Python. At the end of DS 5010, a student should be able to do the following:

Approach

Case studies

Projects allow students to gain experience working in small teams on practical problems. Code development takes place in a shared GitHub repository using basic tools for collaborative coding, such as prototyping in branches, pull requests, merging after independent collaborator review, and discussing new functionality with “issues”. Project documentation, attribution, and reproducibility are critically important. Documentation should have sufficient detail that another technical team could pick up and expand upon the project at a later date. Projects include a front-facing GitHub Pages site that provides an overview understandable by a non-technical audience. The repo and GitHub Pages site can contribute to student portfolios.

Examples from previous classes:

Texts

Other texts

Development environment

You should have a standard Python development environment installed on your computer, including a text editor or IDE. The texts by VanderPlas and McKinney provide modern recommendations.
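
One quick way to confirm that an environment is ready is to check the interpreter version and whether a few packages can be imported. The sketch below is illustrative only: the package list (numpy, pandas, matplotlib) and the minimum version (3.8) are assumptions, not requirements stated for this course, and the function name check_environment is hypothetical.

```python
import importlib.util
import sys

# Assumed package list for illustration; the course may require different ones.
ASSUMED_PACKAGES = ["numpy", "pandas", "matplotlib"]


def check_environment(packages=ASSUMED_PACKAGES, min_version=(3, 8)):
    """Return a dict reporting whether Python and each package are available.

    min_version is an assumed threshold, not an official course requirement.
    """
    report = {"python_ok": sys.version_info[:2] >= tuple(min_version)}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Running the file as a script prints one line per check. Any package reported as MISSING can usually be installed with the package manager that came with your Python distribution.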

Assessment

Activity Contribution
Homework ~60%
Project ~30%
Class participation ~10%