Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
COMMITS
May 3, 2026
Add disallowed-datasets section to project rubric
Alexey Grigorev committed
April 20, 2026
Add notes about nrows=100 in data ingestion doc (#839)
1879020 committed
docs: add high-quality datasets for 2026 project capstones (#835)
Aman Atar committed
Clean up formatting and comments in big_query.sql (#837)
Sunil committed
add notes for pyspark (#838)
Khanh Nguyen committed
April 8, 2026
Add peer review suggestions for project submissions
Alexey Grigorev committed
March 19, 2026
Document Bruin MCP integration steps for VS Code (#832)
motho17 committed
Add files via upload (#833)
Khang Tran committed
March 12, 2026
Update streaming homework with verified answers and setup hints
Alexey Grigorev committed
March 9, 2026
Add streaming homework for 2026 cohort
Alexey Grigorev committed
March 5, 2026
Update README.md
Alexey Grigorev committed
Fix PySpark installation link in homework.md
Alexey Grigorev committed
March 4, 2026
Update README.md
Alexey Grigorev committed
mod 6 - update taxi_lookup link (forbiden access in aws) (#831)
Michael Garcia-Rollet committed
Add live workshop code, link from README
Alexey Grigorev committed
Update video link in workshop README
Alexey Grigorev committed
Rewrite watermark explanation, add realtime producer, rename jobs
Alexey Grigorev committed
Credit Irem for the Python Kafka examples
Alexey Grigorev committed
Fix: 2026 workshop is by Alexey, not Zach
Alexey Grigorev committed
Fix pyflink attribution: Irem's workshop predates Zach's 2025 stream
Alexey Grigorev committed
Add mkdir before docker compose to prevent root-owned src/
Alexey Grigorev committed
Fix pyflink attribution (Irem vs Zach), mark extras as optional
Alexey Grigorev committed
Delete dataset.md
Alexey Grigorev committed
Add .duckdb to .gitignore
Alexey Grigorev committed
Delete taxi_rides_ny.duckdb
Alexey Grigorev committed
Reorganize 07-streaming: theory/, extras/, concise main README
Alexey Grigorev committed
Expand watermark explanation: why the name, how the subtraction works
Alexey Grigorev committed
March 3, 2026
Fix workshop: slots bug, missing models.py, incomplete aggregation job
Alexey Grigorev committed
Rewrite workshop: gradual structure, NYC taxi data, shared models, Q&A
Alexey Grigorev committed
Switch to kafka-python, add clean Flink config, fix JDK warnings
Alexey Grigorev committed