Introduction to Data Engineering

Course Description: Data-driven models are revolutionizing science and industry. Scalable systems are needed to collect, stream, process, and validate the data behind them. This course is an introduction to “big” data engineering where students will receive hands-on experience building and deploying realistic data-intensive systems. It will cover streaming, data cleaning, relational data modeling and SQL, and machine learning model training. A core theme of the course is “scale”, and we will discuss the theory and the practice of programming with large external datasets that cannot fit in main memory on a single machine. The course will consist of bi-weekly programming assignments, a midterm examination, and a final.

Location: MWF 9:30-10:20 SHFE 203

Office Hours: MW 4:30-5:30 243 JCL (Sanjay)

Office Hours (TA): Wed 11-12 (Rose), Thurs 9:30-10:30 (Will), both in 259 JCL

Grading: Quizzes (10%), Homework (20%), Midterm (30%), Final (40%). The exam schedule is listed below:

  • Midterm (6:30pm-8:30pm, May 10)
  • Final (10:30am-12:30pm, June 12)
  • For any conflicts, a makeup exam will be scheduled prior to these times. It is your responsibility to coordinate this well in advance.

Late Policy: Late work receives 0%. Reasonable exceptions will be considered, including family emergencies, illness, etc.

Official Communication: The TA(s) and Instructor WILL NOT respond to personal emails. Please communicate through Piazza, either with a public post (if the question is of general interest) or with a private message.

Date  Topic                                      Assignments
4/1   Course Introduction (pdf)
4/3   Iterators (pdf)
4/5   Operators (pdf) (submission instructions)  HW0
4/8   Composing Operators (pdf)
4/10  Main-Memory Aggregation (pdf)
4/12  Out-of-core algorithms (pdf)
4/15  Out-of-core cont’d / Hash Join (pdf)       HW1
4/17  In Class Quiz
4/19  Parallelism                                HW0 due
4/22  Parallelism Cont’d
4/24  SQL I
4/29  Intro to Machine Learning                  HW1 due
5/1   Tensor Operators
5/3   ML Systems I
5/6   ML Systems II
5/8   (Guest Lecture) Midterm Review
5/10  ETL and Potter’s Wheel                     Midterm
5/13  Potter’s Wheel Cont’d                      HW2 due
5/15  Formal Logic/Abstract Algebra Review       HW3 out
5/17  Integrity Constraints I
5/20  Integrity Constraints II
5/24  Matching                                   HW4 out
5/29  Matching                                   HW3 due
5/31  Knowledge Bases I
6/3   Knowledge Bases II
6/5   View From The Top                          HW4 due
6/12  Final 10:30am-12:30pm