DS 5110

Introduction to Data Management and Processing


Introduces students to the core tasks in data science, including data collection, storage, tidying, transformation, processing, management, and modeling for the purpose of extracting knowledge from raw observations. Programming is a cross-cutting aspect of the course. Offers students an opportunity to gain experience with data science tasks and tools through short assignments. Includes a term project based on real-world data. (This course description is from the Academic Catalog.)

Texts

Schedule

Week  Topic                  ISL     PDS
1     Intro
2     Data Viz               Ch 1    Ch 2-4
3     Tidy Data              Ch 2    Ch 2-3
4     Relational Data                Ch 3
5     Regression             Ch 3
6     Classification         Ch 4    Ch 5
7     Resampling             Ch 5    Ch 5
8     Selection              Ch 6    Ch 5
9     Trees                  Ch 8    Ch 5
10    SVMs                   Ch 9    Ch 5
11    Unsupervised Learning  Ch 12   Ch 5
12    Text Mining
13    Deep Learning          Ch 10
14    Project Presentations

Approach

By the end of the course, students should be able to access and import a dataset, then clean, transform, and visualize it appropriately for a well-described analytic goal.
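As an illustration of that workflow, here is a minimal pandas sketch. It assumes a small hypothetical CSV of per-station yearly readings stored in "wide" form (one column per year). The data, column names, and helper functions (`load_raw`, `tidy`, `summarize`) are invented for this example and are not course materials. The sketch imports the data, tidies it into one row per observation while dropping missing values, and then aggregates it.

```python
import io

import pandas as pd

# Hypothetical raw data (not from the course): one row per station,
# one column per year, with some missing readings.
RAW_CSV = """station,2021,2022,2023
A,12.5,13.1,
B,9.8,,10.4
C,11.0,11.7,12.2
"""


def load_raw(text: str) -> pd.DataFrame:
    """Access/import step: parse CSV text into a DataFrame."""
    return pd.read_csv(io.StringIO(text))


def tidy(df: pd.DataFrame) -> pd.DataFrame:
    """Clean/tidy step: one row per (station, year) observation.

    melt() reshapes the wide year columns into a 'year' column, the year
    labels are converted to integers, and rows with missing values are dropped.
    """
    long = df.melt(id_vars="station", var_name="year", value_name="value")
    long["year"] = long["year"].astype(int)
    return long.dropna(subset=["value"]).reset_index(drop=True)


def summarize(long: pd.DataFrame) -> pd.DataFrame:
    """Transform step: mean value per year across the stations that reported."""
    return long.groupby("year", as_index=False)["value"].mean()


if __name__ == "__main__":
    summary = summarize(tidy(load_raw(RAW_CSV)))
    print(summary)
```

For the visualization step, a chart such as `summary.plot(x="year", y="value")` could be drawn from the summary table. That call relies on matplotlib being installed.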

Projects

Term projects allow students to gain experience working in small teams on practical problems with real-world data. Ideally, these are XN-style projects that involve external stakeholders who help review prototypes and provide feedback along the way.

Example from Spring 2022:

Development environment

Students should have a standard Python development environment installed on their computers, including a text editor or IDE and git. All coding assignments use GitHub repositories administered through GitHub Classroom.

Assessment

Activity Contribution
Homework ~55%
Project ~30%
Class Participation ~15%
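The sketch below shows how component scores would combine under these approximate weights. It assumes each component is scored on a 0-100 scale. The function name and the exact weights (55/30/15, taken from the "~" figures above) are illustrative assumptions, not the official grading formula. Dividing by the sum of the weights keeps the result on the same 0-100 scale even if the weights are adjusted.

```python
# Approximate weights from the syllabus table. The "~" means the real
# weights may differ, so treat these as illustrative assumptions.
WEIGHTS = {"homework": 0.55, "project": 0.30, "participation": 0.15}


def weighted_grade(scores: dict[str, float],
                   weights: dict[str, float] = WEIGHTS) -> float:
    """Combine 0-100 component scores into one weighted score (hypothetical helper)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Normalize by the total weight so the result stays on a 0-100 scale.
    return sum(weights[k] * scores[k] for k in weights) / total_weight


if __name__ == "__main__":
    print(weighted_grade({"homework": 90, "project": 80, "participation": 100}))
```

With homework at 90, project at 80, and participation at 100, the combination is 0.55 × 90 + 0.30 × 80 + 0.15 × 100 = 88.5, up to floating-point rounding.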