DS 5110

Introduction to Data Management and Processing

Introduces students to the core tasks in data science, including data collection, storage, tidying, transformation, processing, management, and modeling for the purpose of extracting knowledge from raw observations. Programming is a cross-cutting aspect of the course. Offers students an opportunity to gain experience with data science tasks and tools through short assignments. Includes a term project based on real-world data. (This course description is from the Academic Catalog.)

Texts

Approach

Schedule

See the GitHub repo for course notes and additional detail. All assignments will be posted in Canvas and administered through GitHub Classroom.

Week  Topic                     ISLR2
1     Intro
2     Tidy data                 Ch 1
3     Relational data           Ch 2
4     Regression                Ch 3
5     Classification            Ch 4
6     Resampling                Ch 5
7     Model selection           Ch 6
8     Beyond linearity          Ch 7
9     Trees                     Ch 8
10    Support Vector Machines   Ch 9
11    Unsupervised learning     Ch 12
11    Text mining               TBD
12    Deep learning             Ch 10
13    Project presentations
14    Finals week

Projects

Projects allow students to gain experience working in small teams on practical problems with real-world data. Ideally, these are XN projects that involve external stakeholders who help review prototypes and provide feedback along the way.

Example from Spring 2022:

Development environment

Students should have a standard Python development environment installed on their computer, including a text editor or IDE and Git (as described here). All coding assignments will use GitHub repositories administered through GitHub Classroom.
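As a quick self-check before the first assignment, a short script along the following lines can confirm that Python and Git are both reachable from the command line. This is only a sketch: the function name `check_environment` and the minimum Python version are assumptions made for illustration, not course requirements.

```python
import shutil
import sys

# Assumed minimum version, for illustration only; the course does not state one.
ASSUMED_MIN_PYTHON = (3, 8)


def check_environment(min_python=ASSUMED_MIN_PYTHON):
    """Return a dict describing whether Python and Git look usable."""
    git_path = shutil.which("git")  # None if git is not on the PATH
    return {
        "python_version": tuple(sys.version_info[:3]),
        "python_ok": tuple(sys.version_info[:2]) >= tuple(min_python),
        "git_path": git_path,
        "git_ok": git_path is not None,
    }


if __name__ == "__main__":
    # Print one line per check so problems are easy to spot.
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `git_ok` is reported as `False`, Git is either not installed or not on the system `PATH`.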

Assessment

Activity             Contribution
Homework             ~30%
Quizzes              ~30%
Project              ~30%
Class Participation  ~10%