Schedule for Fall 2024: Fridays, 12-3:20 Eastern
The course offers students an opportunity to practice data science skills learned in previous courses, and to build a portfolio. Students practice visualization, data wrangling, and machine learning skills by applying them to semester-long term projects on real-world data. Students may either propose their own projects or choose from a selection of industry options. The course emphasizes the overall data science process, including identification of the scientific problem, selection of appropriate machine learning methods, and visualization and communication of results. There may be occasional lectures on special topics such as visualization, communication, and data science ethics.
This class has no traditional homework assignments. Students are expected to focus on their capstone project(s). Assignments will take the form of project milestones and peer review. Project deliverables include a well-documented GitHub repository that makes it possible for someone with a similar background to reproduce the entire data science pipeline. The repository will also include a GitHub Pages site that communicates results to a general audience.
Peer review is a major component of this course. Students are expected to provide oral and written feedback on their classmates’ projects, addressing the scientific content, the effectiveness of the communication, and the reproducibility of the results.
After completing this course, the student will be able to demonstrate the following competencies: (1) translate the domain science or technology problem into the language of data science, (2) design, evaluate, implement, and deploy analytical strategies that are appropriate for the problem, (3) translate the results into the language of the original science or technology problem, (4) communicate the findings in oral and written form, and (5) provide constructive criticism of other examples of data science projects and reports.