Building Real-World Systems with AI from Scratch

CS294 • Spring 2026 • UC Berkeley

Instructor: Prof. Ion Stoica
TA: Ziming Mao
Schedule: Monday, 9:00 AM - 11:59 AM
Dates: 1/20/2026 - 5/8/2026
Location: Soda 310
Level: Graduate
Units: 4

Course Description

The central inquiry of this course is: Can AI-based tools enable a small team to build real-world systems orders of magnitude faster than traditional methods?

The rapid advancement of AI has led to significant successes in systems research, particularly in the performance optimization of existing systems. In parallel, AI-powered code assistants have demonstrated increasing capability, often generating entire pull requests for application-level code. However, their utility for the more demanding task of building robust, concurrent, and high-performance systems from the ground up remains a largely open question.

This course tackles this challenge. We will investigate the end-to-end lifecycle of building new systems and subsystems with the latest generative AI tools, as well as ways to improve the tools themselves. Our exploration will include:

  • Designing new architectures that are more amenable to AI-assisted development.
  • Developing new AI tools that target building systems or system components from scratch.
  • Developing fast, accurate simulators.
  • Generating formal specifications for protocols and components.
  • Guiding iterative performance optimization and tuning.

Students will work on hands-on projects to build systems such as distributed collective communication libraries, distributed storage systems, distributed inference engines, new kernels for new accelerators, and more.

Note: Here we use "systems" broadly to include databases, distributed systems, networking, programming languages, and operating systems.

📢 Announcements

Welcome to Spring 2026! Please check back for updates on readings and assignments.

If you haven't done so already, please join our Slack here: CS294-265 Spring 2026 Slack.

Course Schedule

Note: This syllabus is tentative and subject to change. Please check back regularly for updates.

Readings listed for each week are a tentative list, subject to change.

Week 1 (Jan 19): Academic Holiday

Week 2 (Jan 26): Class overview, logistics, project suggestions
  Readings:
  • AlphaEvolve: A coding agent for scientific and algorithmic discovery
  • OpenEvolve: Automated Discovery of Algorithms and Systems
  • ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution
  • Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Week 3 (Feb 2): Ongoing projects
  Readings:
  • AdaEvolve: Adaptive LLM-Driven Zeroth-Order Optimization
  • Evolve^2
  • Evolving SkyRL into a Highly-Modular RL Framework
  • SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
  • GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
  • KISS: A Language Model-based Interactive Software Development Environment

Week 4 (Feb 9): Project proposal presentations

Week 5 (Feb 16): Academic Holiday

Week 6 (Feb 23): Repo-level coding and repair
  Readings:
  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
  • SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
  • Empirical Evaluation of Generalizable Automated Program Repair with Large Language Models
  • OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
  • Rethinking Verification for LLM Code Generation: From Generation to Testing
  Optional:
  • NExT: Teaching Large Language Models to Reason about Code Execution
  • Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
  Speaker: Koushik Sen

Week 7 (Mar 2): Formal Verification & Reliability
  Readings:
  • Chain-of-Verification Reduces Hallucination in Large Language Models
  • Dafny as Verification-Aware Intermediate Language for Code Generation
  • Solving a Million-Step LLM Task with Zero Errors
  • Models That Prove Their Own Correctness
  • Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK
  Optional:
  • Don't Trust: Verify — Grounding LLM Quantitative Reasoning with Autoformalization
  • Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-Aware Formal Verification
  • Position: Trustworthy AI Agents Require the Integration of Large Language Models and Formal Methods
  • Specifications: The missing link to making the development of LLM systems an engineering discipline

Week 8 (Mar 9): Simulations
  Readings:
  • Fast End-to-End Performance Simulation of Accelerated Hardware-Software Stacks
  • Analyzing Metastable Failures
  • SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training
  • Zero-Shot Cost Models for Out-of-Distribution Query Workloads
  • CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation
  Speaker: Rishabh Iyer

Week 9 (Mar 16): AI-driven kernel optimizations
  Readings:
  • KernelBench: Can LLMs Write Efficient GPU Kernels?
  • Astra: A Multi-Agent System for GPU Kernel Performance Optimization
  • Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling
  • EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models
  • Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization
  • VibeTensor: System Software for Deep Learning, Fully Generated by AI Agents
  Speaker: Bing Xu

Week 10 (Mar 23): Spring Recess

Week 11 (Mar 30): AI-driven training and inference optimizations
  Readings:
  • Reasoning Compiler: LLM-Guided Optimizations for Efficient Model Serving
  • Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
  • Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
  • BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute
  • KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning
  Speaker: Amir Yazdan Bakhsh

Week 12 (Apr 6): From academia to industry and everything in between
  Speakers: Eric Liang, Martin Casado, Azalia Mirhoseini

Week 13 (Apr 13): AI-driven networking optimizations
  Readings:
  • AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training
  • Learning Production-Optimized Congestion Control Selection for Alibaba Cloud CDN
  • NetLLM: Adapting Large Language Models for Networking
  • TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
  Speaker: Christopher Fletcher

Week 14 (Apr 20): AI for databases and databases for AI
  Readings:
  • Conformal Prediction for Verifiable Learned Query Optimization
  • Data-Agnostic Cardinality Learning from Imperfect Workloads
  • NeurDB: On the Design and Implementation of an AI-powered Autonomous Database
  • ZeroCard: Cardinality Estimation with Zero Dependence on Target Databases - No Data, No Query, No Retraining
  • λ-Tune: Harnessing Large Language Models for Automated Database System Tuning
  Optional:
  • Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing
  • Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First
  Speaker: Aditya Parameswaran

Week 15 (Apr 27): Project presentations

Grading

Paper presentations and reviews: 20%
Class discussions and participation: 20%
Project: 60%

Course Resources

📁 Course Materials

Access homework assignments, signup sheets, and other course materials: Course Drive Folder

💬 Slack

Join our course Slack for announcements and discussions: CS294-264 Spring 2026 Slack Invite

Academic Integrity

All work submitted must be your own. Collaboration is encouraged for understanding concepts and debugging, but code and written reports must be completed independently unless explicitly stated otherwise. Use of AI tools for completing assignments must be disclosed and will be discussed as part of the course content.