Dear Students,

I want to welcome you all to a new school year and the CMSC 21800 course. I can only imagine how surreal it must feel to be a student in such an environment. This fall quarter, at the least, promises to be an interesting one in which we will all have to reckon with many extenuating, non-academic circumstances, including the challenges of remote education, the ebbs and flows of a pandemic, and a tumultuous political atmosphere. While I cannot address any of those challenges, I will try my best to run this course in a way that appreciates the current stress on students while doing adequate service to the material.

## 1. What is the course about?

This course is an introduction to using computational tools to derive insights from data. The course is roughly broken into three topics:

* Measurement. How do we accurately measure real-world phenomena?
* Model-based thinking. How do we predict or forecast unobserved phenomena?
* Reliability and Scale. How do we build data-driven computing applications?

## 2. Who should take this course?

This course is intended for Computer Science majors in satisfaction of the Data Science concentration requirements. I expect students in the course to be proficient programmers (no lectures will be spent teaching you how to program!). On the other hand, we will go through the quantitative methods slowly and in detail. This is a good working distinction between 118/119 and the current course. We do, however, accept students from all majors. Just keep in mind that if you have a weak background in programming (or, conversely, a very strong one in college-level statistics), this course is not for you.

## 3. How is the course structured?

Due to the challenges of remote education, we won't have graded assignments this quarter. Instead, we will work through the assignments together during class time. All class activities are recorded, but you will have to participate synchronously if you'd like to ask questions.
All course materials will be linked through the course website: http://sanjayk.io/cmsc21800/

### 3a. Lectures

On Mondays and Fridays, the lecture will be a conceptual lecture introducing theory and concepts. Questions will only be taken at the end.

MF 9:30am-10:30am Central Time.

Zoom link: https://uchicago.zoom.us/j/93614402820?pwd=NmlmcXZWdEdmMml3Y1hzZjViQXFzdz09

Id: 936 1440 2820

Passcode: 711504

!! The first Wednesday of class will be a regular lecture !!

### 3b. Practica

On Wednesdays, we will work together on a problem set or coding example in the allotted lecture time. This "assignment" will be released on Monday, and I strongly encourage you to try it on your own first, as that will make the class time most valuable.

W 9:30am-10:30am Central Time.

Zoom link: https://uchicago.zoom.us/j/93614402820?pwd=NmlmcXZWdEdmMml3Y1hzZjViQXFzdz09

Id: 936 1440 2820

Passcode: 711504

### 3c. Exams (75% of your grade)

Your grade will be mostly determined by 3 exams (each accounting for 25% of your grade). Each of these exams is a take-home exam that will be assigned on the morning of a lecture and due two days later. The exams themselves are short (designed to take 1-2 hours), but you will have multiple days to do them. I expect you to figure out how to manage your time; late exams will receive a 0.

* Exam 1. October 16th Friday 9:30am - Sunday October 18th 11:59pm
* Exam 2. November 6th Friday 9:30am - Sunday November 8th 11:59pm
* Exam 3. December 2nd Wednesday 9:30am - Friday December 4th 11:59pm

### 3d. Final Project (25% of your grade)

We will also ask you to do a quarter-long final project in groups of 4. This project is an open-ended assignment where you will: formulate a hypothesis, acquire a dataset that can evaluate this hypothesis, and write a "longform" article on your findings. The primary role of the course TAs (all of whom are PhD students in data analytics/science) will be to mentor you.

https://tinyurl.com/y4892647

### 3e. Readings

There are two recommended books (not textbooks!) for this course:

* Naked Statistics https://www.amazon.com/Naked-Statistics-Stripping-Dread-Data/dp/039334777X/
* The Art of Statistics https://www.amazon.com/Art-Statistics-How-Learn-Data/dp/1541618513/

They are completely optional, but I do encourage interested students to read these books as we progress through the course. This course is overwhelming in terms of material, and the books help organize that a bit.

## 4. Grading

Exams are worth 75% of your grade and the final project is worth 25% of your grade. While not explicitly curved, we do adjust grading buckets based on how difficult we think the exam is. Historically, 2/3 of students in my classes get a B+ or higher.

### 4a. Progressive Grading

If you are active on reddit, you've probably heard of "progressive grading" or "socialist grading" in my classes. Simply put, if you have a low grade (C or lower) going into the last exam, I give you an opportunity to raise your grade by turning in typed notes of 2 lectures of your choice. The lower your grade, the more significant the extra credit. An astonishingly small number of failing students actually take advantage of this policy.

### 4b. Pass/Fail

You must indicate your P/F status before the third exam. To achieve a P, you must have a grade higher than a C- in the class and have satisfactorily completed all exams. It is your responsibility to take care of any credit repercussions that may arise due to a non-quality grade; I will not negotiate with your major departments or academic advisors for you!