This course is an introduction to using computational tools to derive insights from data. The course is roughly broken into three topics:
- Measurement. How do we accurately measure real-world phenomena?
- Model-based thinking. How do we predict or forecast unobserved phenomena?
- Reliability and Scale. How do we build data-driven computing applications?
Read my welcome email before you do anything else!
Course Structure
Lectures: For Fall 2020, this course will be taught online with recorded lectures (streamed live MWF at 9:10am). On Mondays and Fridays, the lecture is a conceptual lecture introducing theory and concepts. On Wednesdays, we will work together on a problem set or coding example during the allotted lecture time.
Zoom link: https://uchicago.zoom.us/j/93614402820?pwd=NmlmcXZWdEdmMml3Y1hzZjViQXFzdz09
Slack Link (for course projects and logistics): https://join.slack.com/t/cs-ezq4738/shared_invite/zt-hn95c2mv-LpcbB49MvDrwQdWimrBmRQ
Recommended Reading: Naked Statistics https://www.amazon.com/Naked-Statistics-Stripping-Dread-Data/dp/039334777X/, The Art of Statistics https://www.amazon.com/Art-Statistics-How-Learn-Data/dp/1541618513/
Exams and Grading: There will be 3 take-home exams and a quarter-long final course project. Grading is as follows: 0.25*Exam1 + 0.25*Exam2 + 0.25*Exam3 + 0.25*Project
- Exam 1: Friday, October 16th, 9:30am – Sunday, October 18th, 11:59pm
- Exam 2: Friday, November 6th, 9:30am – Sunday, November 8th, 11:59pm
- Exam 3: Wednesday, December 2nd, 9:30am – Friday, December 4th, 11:59pm
- Final Project: due December 10th, 11:59pm (details) REVISED!!
Office hours: 11am-12pm Tuesdays: https://uchicago.zoom.us/j/93614402820?pwd=NmlmcXZWdEdmMml3Y1hzZjViQXFzdz09
Lectures
The main lectures of the course introduce core topics in data science and show how to connect theoretical concepts in statistics with real-world data analysis problems.
Recent events illustrate just how hard it is to quantify and measure real-world phenomena. In this lecture, we discuss the subjectivity of data analysis and what data can and can't do for you. Watch Slides
This lecture describes the types of data we may measure, populations (and their samples), and different types of biases that may enter the data collection process. Watch
This lecture will discuss how we mathematically model “simple” sampling processes and what kinds of insights we can gain from such models. Watch Slides
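To get a feel for this before lecture, here is a small NumPy sketch (hypothetical numbers, not from the course materials) that simulates drawing simple random samples from a population and watches how the estimates behave:

```python
# A minimal sketch: simulating a simple random sampling process with NumPy.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 1 million people, 30% of whom hold some property.
population = rng.random(1_000_000) < 0.30

for n in [100, 1_000, 10_000]:
    # Draw many independent samples of size n and record each sample proportion.
    estimates = [population[rng.integers(0, population.size, n)].mean()
                 for _ in range(500)]
    # The spread of the estimates shrinks roughly like 1/sqrt(n).
    print(n, round(np.mean(estimates), 3), round(np.std(estimates), 4))
```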
Recently, it seems as if opinion polls get everything wrong. This lecture describes how systematic sampling can bias estimates and discusses stratified sampling techniques. Watch Slides
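As a rough illustration (made-up population, not the lecture's example), the sketch below shows how reweighting by known stratum sizes can correct a convenience sample that over-represents one group:

```python
# A minimal sketch: stratified estimation with synthetic data.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: 70% in group A (mean response 0.4),
# 30% in group B (mean response 0.8).
shares = {"A": 0.7, "B": 0.3}
groups = {"A": rng.normal(0.4, 0.1, 700_000), "B": rng.normal(0.8, 0.1, 300_000)}

# A convenience sample that reaches group B far too often (systematic bias).
biased = np.concatenate([rng.choice(groups["A"], 200), rng.choice(groups["B"], 800)])
print("biased estimate:    ", biased.mean())   # pulled toward group B

# Stratified estimate: sample within each group, then reweight by population share.
strat = sum(shares[g] * rng.choice(groups[g], 500).mean() for g in groups)
print("stratified estimate:", strat)           # close to 0.7*0.4 + 0.3*0.8 = 0.52
```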
From the small to the large: we describe how different aggregations of population data can be compared and what those comparisons mean. Watch Slides
A randomized controlled trial is the primary method for determining causation. This lecture describes hypothesis testing, significance, and the assumptions needed to demonstrate causation beyond a reasonable doubt. Watch Slides
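For intuition, here is a sketch of one common computational approach to significance, a permutation test on simulated trial data; this is illustrative only and not necessarily the exact procedure used in lecture:

```python
# A minimal sketch: permutation test for a difference in means (fake data).
import numpy as np

rng = np.random.default_rng(2)

treatment = rng.normal(1.0, 1.0, 50)   # hypothetical outcomes under treatment
control   = rng.normal(0.5, 1.0, 50)   # hypothetical outcomes under control
observed = treatment.mean() - control.mean()

pooled = np.concatenate([treatment, control])
null_diffs = []
for _ in range(10_000):
    rng.shuffle(pooled)                # re-assign labels at random
    null_diffs.append(pooled[:50].mean() - pooled[50:].mean())

# p-value: how often a random labeling produces a difference at least this large.
p = np.mean(np.abs(null_diffs) >= abs(observed))
print(f"observed difference = {observed:.3f}, permutation p-value = {p:.4f}")
```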
Correlation does not imply causation. Just because one variable can predict another does not mean there is a cause-and-effect relationship between them. This lecture defines correlation and differentiates it from causation. Watch Slides
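As a quick illustration (synthetic data, hypothetical variable names), two quantities can be strongly correlated simply because both respond to a hidden common cause:

```python
# A minimal sketch: correlation driven by a confounder, not by causation.
import numpy as np

rng = np.random.default_rng(3)

temperature = rng.normal(25, 5, 1_000)                     # hidden common cause
ice_cream   = 2.0 * temperature + rng.normal(0, 3, 1_000)  # hypothetical sales
sunburns    = 0.5 * temperature + rng.normal(0, 2, 1_000)  # hypothetical cases

# Pearson correlation coefficient between the two "effects".
r = np.corrcoef(ice_cream, sunburns)[0, 1]
print(f"correlation = {r:.2f}")  # high, even though neither causes the other
```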
A single anomalous measurement can affect measures of significance or confidence. This lecture describes how to make measures of correlation and causation more robust to outliers. We also describe the “principle of similar confidence”, where only like measurements should be compared against each other. Watch Slides
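One standard way to blunt the effect of a single outlier, shown below as an illustrative sketch (not necessarily the method covered in lecture), is to correlate ranks rather than raw values:

```python
# A minimal sketch: rank (Spearman) correlation vs. ordinary Pearson correlation.
import numpy as np

rng = np.random.default_rng(4)

x = rng.normal(0, 1, 200)
y = x + rng.normal(0, 0.5, 200)
x[0], y[0] = 50.0, -50.0            # inject one wild, anomalous measurement

def ranks(a):
    return np.argsort(np.argsort(a))  # rank of each value (ties ignored for simplicity)

pearson  = np.corrcoef(x, y)[0, 1]              # badly distorted by the outlier
spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]  # only one rank is out of place
print(f"Pearson = {pearson:.2f}, Spearman = {spearman:.2f}")
```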
This lecture introduces using computational techniques to solve statistical estimation problems. Watch Slides
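As one example of such a technique, here is a short bootstrap sketch (illustrative, simulated data); the lecture may cover different methods:

```python
# A minimal sketch: bootstrap resampling to estimate the uncertainty of a median.
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=2.0, size=200)    # hypothetical observed sample

boot_medians = [np.median(rng.choice(data, size=data.size, replace=True))
                for _ in range(5_000)]

lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(data):.2f}, 95% bootstrap interval = ({lo:.2f}, {hi:.2f})")
```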
This lecture covers model fitting and the bias-variance tradeoff. Watch Slides
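For a rough preview (synthetic data, not from the lecture), fitting polynomials of increasing degree shows training error falling while held-out error often rises:

```python
# A minimal sketch: train vs. test error for polynomial fits of increasing degree.
import numpy as np

rng = np.random.default_rng(6)

x = np.sort(rng.uniform(-3, 3, 60))
y = np.sin(x) + rng.normal(0, 0.3, 60)            # hypothetical noisy data
x_tr, y_tr, x_te, y_te = x[::2], y[::2], x[1::2], y[1::2]

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_tr, y_tr, degree)        # least-squares polynomial fit
    mse = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    print(f"degree {degree}: train MSE = {mse(x_tr, y_tr):.3f}, "
          f"test MSE = {mse(x_te, y_te):.3f}")
```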
This lecture covers prediction rules and the bias-variance tradeoff from a different perspective. Watch Slides
In this lecture, we describe how we measure the success of machine learning models. Watch Slides
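As a small illustration (hypothetical labels and predictions), accuracy alone can look deceptively good on imbalanced data, which is one reason metrics such as precision and recall are also used:

```python
# A minimal sketch: accuracy, precision, and recall on an imbalanced dataset.
import numpy as np

y_true = np.array([0] * 95 + [1] * 5)     # hypothetical labels: 5% positives
y_pred = np.zeros(100, dtype=int)         # a "model" that almost always predicts 0
y_pred[95:97] = 1                         # ...but catches 2 of the 5 positives

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy  = np.mean(y_pred == y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```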
Two lectures on KMeans clustering and principal component analysis. Watch Lecture 1 (Slides); Watch Lecture 2 (Slides)
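If you want to experiment ahead of time, here is a minimal sketch using scikit-learn (assumed to be installed; the data are synthetic):

```python
# A minimal sketch: k-means clustering and a 2-D PCA projection of synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)    # project 5-D data onto 2 axes

print("cluster sizes:", np.bincount(labels))
print("projected shape:", X_2d.shape)          # (300, 2)
```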
Can a machine learn to “see”? This lecture describes how once seemingly intractable problems in computer vision have been solved over the last 10 years. Watch Slides
How does Alexa understand language so well? In this lecture, we describe some of the breakthroughs in natural language understanding that allow us to build complex, conversational systems. Watch Slides
The recent breakthroughs in language, vision, and speech have been driven by machine learning. How and why do these models work, and what still confuses us about their effectiveness? We also discuss the dangers of relying too heavily on predictive models. Watch Slides
Machine learning does not solve everything, and classical data structures play an important role in the analysis and presentation of data. Watch Slides
Practica
Each of the "practicum" lectures illustrates a coding example or a problem set that reinforces the concepts learned in class.
Supplementary Lectures
These are supplementary lectures designed to cover topics you may have learned in prerequisite classes.
A random experiment is any process whose outcome is not known beforehand. This lecture talks about random experiments, outcomes, events, and how to interpret them.
Watch
A probability space assigns relative likelihoods to resultant events from a random experiment. This lecture describes the axioms of probability and the concept of an “algebra” of events.
Watch
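For reference, the standard Kolmogorov axioms for a probability measure P on a sample space Ω, written in LaTeX (the lecture's notation may differ):

```latex
\begin{align*}
  &P(A) \ge 0 \quad \text{for every event } A, \\
  &P(\Omega) = 1, \\
  &P\Bigl(\bigcup_{i \ge 1} A_i\Bigr) = \sum_{i \ge 1} P(A_i)
    \quad \text{for pairwise disjoint events } A_1, A_2, \ldots
\end{align*}
```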
How likely is one event, given that another definitely happens? Probability spaces can be naturally subdivided, and this lecture describes the concepts of conditioning and independence.
Watch
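For reference, the usual definitions (assuming P(B) > 0):

```latex
\begin{align*}
  P(A \mid B) &= \frac{P(A \cap B)}{P(B)}, \\
  A \text{ and } B \text{ independent} &\iff P(A \cap B) = P(A)\,P(B).
\end{align*}
```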
We often assign numerical values to the outcomes of random experiments; these are called random variables. This lecture describes discrete and continuous random variables and their distributions.
Watch
Suppose we repeatedly and independently run a random experiment: how do we characterize the long-term behavior of the process? The expected value of a random variable characterizes the value to which the long-term average of independent trials converges.
Watch
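A quick simulation (illustrative only) of this convergence for a fair six-sided die, whose expected value is 3.5:

```python
# A minimal sketch: the running average of die rolls converges toward E[X] = 3.5.
import numpy as np

rng = np.random.default_rng(7)
rolls = rng.integers(1, 7, size=100_000)           # fair six-sided die
running_avg = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in [10, 100, 10_000, 100_000]:
    print(f"average after {n:>6} rolls: {running_avg[n - 1]:.3f}")
```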
Random experiments are often coupled with each other, so that the result of one gives us some information about another. This lecture gives an overview of how to quantify how informative an observation is. We describe the correlation between random variables and the variance of random variables.
Watch
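A small simulation sketch (with a made-up coupling between the two variables) estimating variances and correlation from samples:

```python
# A minimal sketch: estimating variance and correlation of two coupled random variables.
import numpy as np

rng = np.random.default_rng(8)

x = rng.normal(0, 1, 100_000)               # first random variable
y = 0.8 * x + rng.normal(0, 0.6, 100_000)   # coupled to x, plus independent noise

print("Var(X)  ≈", round(np.var(x), 3))
print("Var(Y)  ≈", round(np.var(y), 3))                    # 0.8^2 * 1 + 0.6^2 = 1.0
print("Corr(X, Y) ≈", round(np.corrcoef(x, y)[0, 1], 3))   # 0.8 / sqrt(1 * 1) = 0.8
```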