Fall 2021

On-Device Machine Learning is a project-based course covering how to build, train, and deploy models that can run on low-power devices (e.g. smartphones, refrigerators, and mobile robots). The course will cover advanced topics in distillation, quantization, weight imprinting, power calculation, and more. Each week we will discuss a new research paper and area in this space on the first day, and hold a lab working-group on the second. Specifically, students will be provided with low-power compute hardware (e.g. SBCs and inference accelerators) in addition to sensors (e.g. microphones, cameras, and robotics) for their course project. The project will involve three components for building low-power multimodal models:
        (1) inference
        (2) performing training/updates for interactive ML, and
        (3) minimizing power consumption.
The more that can be performed on device, the more privacy-preserving and mobile the solution is.
For each stage of the course project, the final model produced will have an mAh "budget" equivalent to one full charge of a smartphone battery (~4 Ah: roughly 2 hrs on a Jetson Nano, 7 hrs on a Raspberry Pi 4, or 26 hrs on a Raspberry Pi Zero W).
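
As a sanity check, runtime is just charge divided by current draw. A quick Python sketch, using the nominal draws quoted in the hardware notes below (actual draw varies heavily with load and peripherals):

    # Runtime estimate: hours = battery charge (Ah) / current draw (A).
    # Draws are the nominal figures quoted in this syllabus; real
    # consumption depends on load and attached peripherals.
    BUDGET_AH = 4.0  # roughly one full smartphone-battery charge

    nominal_draw_amps = {
        "Raspberry Pi Zero W": 0.150,
        "Raspberry Pi 4": 0.600,
        "Jetson Nano": 2.0,
    }

    for board, amps in nominal_draw_amps.items():
        print(f"{board}: {BUDGET_AH / amps:.1f} h")
    # Raspberry Pi Zero W: 26.7 h
    # Raspberry Pi 4: 6.7 h
    # Jetson Nano: 2.0 h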

Example industry motivation: "... if the coffee maker with voice recognition was in use for four years, the speech recognition cost for chewing on data back in the Mr Coffee datacenter would wipe out the entire revenue stream from that coffee maker, but that same function, if implemented on a device specifically tuned for this very precise job, could be done for under $1 and would not affect the purchase price significantly." -- Source

Instructors

Yonatan Bisk

ybisk@cs.cmu.edu


Slack and Course Communication

All course communication, including slides and discussions, will happen via Slack.

Slack


Assignments Timeline and Grading

The course is split half on paper discussion and half on projects.

Papers:
  – Participation: 20%
  – Paper Presentations: 15%
Project/Lab:
  – Lab Reports (1 page each): 45%
  – Final Report & Presentation: 20%
Participation:
Participation in Class or Slack (20%)
Participation is evaluated as "actively asking/answering questions based on the lectures, readings, and/or assisting other teams with project issues". Concretely, this means that every novel question or helpful answer provided in Slack will count for 1%, up to a total of 20% of your grade.

Submission Policies:

Projects, Hardware, and Resources

The course will be primarily centered on a few multimodal tasks/platforms to facilitate cross-team collaboration and technical assistance. If your team wants to use custom hardware or sensors not listed here -- that's fine, but please reach out so we can discuss it and think through the implications. Every team will also be provided with one of the following Single Board Computers (SBCs).

Example Projects

  Input            Output      Task
  Speech           Text        Open-Domain QA
  Images           Text        Object Detection or ASL Finger Spelling
  Images           Robot Arm   Learning from Demonstration
  Speech + Images  Robot Car   Vision-Language Navigation

[Photos: side view of a three-wheel robot car; LCD screen and camera for ASL]
Single Board Computers

  SBC                  RAM                  Notes
  Raspberry Pi Zero W  512 MB               150 mA draw; limited processor
  Raspberry Pi 4       2 GB, 4 GB, or 8 GB  2 A draw; moderately powerful processor
  Google Coral         1 GB or 4 GB         Edge TPU accelerator (TFLite)
  Jetson Nano          2 GB                 128-core NVIDIA Maxwell GPU (CUDA)
Resources

Classes

Each week pairs a discussion day with a lab day.
Aug 31: Course structure & Background
Sept 2: Hardware and Modality choices
  • Hardware trade-off discussions:
    • Raspberry Pi Zero W 150 mA
    • Raspberry Pi 4 600 mA
    • Jetson Nano & Google Coral 2A
  • Input Options:
    • Speech Recognition: Microphone
    • Images: Camera
  • Output Options:
    • Text Output: LCD Display
    • Control: Robot
    • Additional sensors/extensions will be available: temperature sensors, LEDs, etc.
Sept 7: Understanding the Ecosystem
  • ARM
  • OSs: Raspbian, Ubuntu, Android
  • SBCs vs Micro-controllers
Sept 9: OS and Peripherals setup
  • Dev Boards: Custom ARM builds
  • Build environments: Source install PyTorch/HuggingFace and TFLite
  • Familiarize with hardware
  • Run pretrained models (see the sketch below)
  • Report 1 (5%)
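
For reference, a minimal sketch of the "run pretrained models" step, assuming a working PyTorch/torchvision install on the board; MobileNetV2 and test.jpg are placeholders, not course requirements:

    # Load a small pretrained classifier and run one image through it.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    model = models.mobilenet_v2(pretrained=True).eval()  # small enough for an SBC

    # Standard ImageNet preprocessing.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open("test.jpg").convert("RGB")  # placeholder image path
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    print(logits.argmax(dim=1).item())  # predicted ImageNet class index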
Sept 14: TinyML, TFLite, PyTorch Mobile
Sept 16: Benchmark existing model
  • Performance, space, and power (see the sketch below)
  • Report 2 (5%)
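
Latency and model size can be measured in-process; power draw has to be measured externally (e.g. a USB power meter between the supply and the board). A minimal sketch, reusing the MobileNetV2 placeholder from above:

    import os
    import time
    import torch
    from torchvision import models

    model = models.mobilenet_v2(pretrained=True).eval()
    x = torch.randn(1, 3, 224, 224)  # dummy input at the model's resolution

    # Performance: average latency over repeated runs, after a warm-up.
    with torch.no_grad():
        for _ in range(5):  # warm-up iterations
            model(x)
        runs = 50
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    print(f"latency: {elapsed / runs * 1e3:.1f} ms")

    # Space: size of the serialized weights on disk.
    torch.save(model.state_dict(), "model.pt")
    print(f"size: {os.path.getsize('model.pt') / 1e6:.1f} MB")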
Sept 21: Distillation (loss sketch below)
Sept 23: Fine-tune pretrained model
  • Report 3 (5%)
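
For the distillation discussions (Sept 21/28), a minimal sketch of the classic soft-label loss: temperature-softened teacher and student logits compared with KL divergence, mixed with the usual hard-label cross-entropy. T and alpha are illustrative knobs, not course-prescribed values:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soft targets: KL between temperature-scaled distributions,
        # scaled by T^2 so its gradients match the hard-label term.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)  # ground-truth labels
        return alpha * soft + (1 - alpha) * hard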
Sept 28: Distillation
Sept 30: Thinking
Oct 5: Quantization
Oct 7: Project Proposal
  • Task definition, modalities, and evaluation
  • Report 4 (5%)
Oct 12: Quantization (sketch below)
Oct 14: No class
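
One concrete entry point for the quantization material is PyTorch's post-training dynamic quantization: weights are stored as int8 and dequantized on the fly, shrinking the model and often speeding up CPU inference on boards like these. A sketch on a toy model; your project model would take its place:

    import torch

    # Toy float32 model standing in for a real project model.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    ).eval()

    # Replace Linear layers with int8 dynamically-quantized versions.
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    print(quantized)  # Linear layers are now dynamically quantized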
Oct 19: On-Device Computer Vision
Oct 21: Baseline or Hardcoded system
  • Full task implementation (no training)
  • Report 5 (5%)
Oct 26: Real-Time Speech Recognition
Oct 28: Implementation
  • Documenting Failures, Benchmarking, ...
  • Report 6 (5%)
Nov 2: Weight Imprinting (sketch below)
Nov 4: Implementation
  • Documenting Failures, Benchmarking, ...
  • Report 7 (5%)
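
Weight imprinting (in the spirit of Qi et al.'s "Low-Shot Learning with Imprinted Weights") extends a normalized (cosine) classifier with a new class by setting its weight row to the normalized mean embedding of a few support examples, avoiding on-device gradient updates. A minimal sketch; the dimensions and random tensors are placeholders for a real frozen backbone:

    import torch
    import torch.nn.functional as F

    def imprint_class(classifier_weight, support_embeddings):
        """Append one imprinted class row.

        classifier_weight: (num_classes, d) rows of a cosine classifier.
        support_embeddings: (k, d) backbone embeddings of the new class.
        """
        proto = F.normalize(support_embeddings, dim=1).mean(dim=0)
        proto = F.normalize(proto, dim=0)  # unit-length prototype
        return torch.cat([classifier_weight, proto.unsqueeze(0)], dim=0)

    W = F.normalize(torch.randn(10, 64), dim=1)  # existing 10-way classifier
    support = torch.randn(5, 64)                 # 5 embeddings of a new class
    W_new = imprint_class(W, support)            # now 11-way, no training step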
Nov 9: Neural Architecture Search
Nov 11: Implementation
  • Documenting Failures, Benchmarking, ...
  • Report 8 (5%)
Nov 16: Power implications of accelerators
Nov 18: Carbon & Alternative Power
  • Report 9 (5%)
Nov 23: Multimodal Fusion (sketch below)
Nov 25: No class
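
As a baseline for the multimodal fusion discussion, the simplest approach is late fusion: encode each modality separately, concatenate, and classify. A sketch with placeholder feature sizes (the 40-dim audio and 512-dim image features are assumptions, not course specs):

    import torch
    import torch.nn as nn

    class LateFusion(nn.Module):
        """Concatenate per-modality encodings, then classify."""
        def __init__(self, audio_dim=128, image_dim=256, num_classes=10):
            super().__init__()
            self.audio_enc = nn.Sequential(nn.Linear(40, audio_dim), nn.ReLU())
            self.image_enc = nn.Sequential(nn.Linear(512, image_dim), nn.ReLU())
            self.head = nn.Linear(audio_dim + image_dim, num_classes)

        def forward(self, audio_feats, image_feats):
            fused = torch.cat([self.audio_enc(audio_feats),
                               self.image_enc(image_feats)], dim=-1)
            return self.head(fused)

    logits = LateFusion()(torch.randn(1, 40), torch.randn(1, 512))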
Nov 30: FPGAs, Batteries, Solar, ...
Dec 2: Concerns Discussion and Final Prep
Dec 7: Final Presentation
Dec 9: Final Report due