Fall 2021         DRAFT

On-Device Machine Learning is a project-based course covering how to build, train, and deploy models that can run on low-power devices (e.g. smartphones, refrigerators, and mobile robots). The course will cover advanced topics in distillation, quantization, weight imprinting, power calculation, and more. Each week we will discuss a new research paper and area in this space on one day, and hold a lab working group on the second. Students will be provided with low-power compute hardware (e.g. SBCs and inference accelerators) in addition to peripherals (e.g. microphones, cameras, and robots) for their course project. The project will involve three components for building low-power multimodal models:
        (1) performing inference,
        (2) performing training/updates for interactive ML, and
        (3) minimizing power consumption.
The more that can be performed on device, the more privacy-preserving and mobile the solution is.
For each stage of the course project, the final model produced will have an mAh "budget" equivalent to one full charge of a smartphone battery (~4 Ah: roughly 2 hours on a Jetson Nano, 7 hours on a Raspberry Pi 4, or 26 hours on a Raspberry Pi Zero W).
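These runtimes are simply the budget divided by each board's nominal current draw. A minimal back-of-envelope sketch of the arithmetic, using the draw figures quoted later in this syllabus (real-world draw varies with workload, peripherals, and power mode):

    # Back-of-envelope battery math: hours = capacity (Ah) / draw (A).
    # Draw figures are the ones quoted in this syllabus; actual draw varies.
    BUDGET_AH = 4.0  # roughly one full smartphone battery charge

    draw_amps = {
        "Raspberry Pi Zero W": 0.15,
        "Raspberry Pi 4": 0.6,
        "Jetson Nano": 2.0,
    }

    for board, amps in draw_amps.items():
        print(f"{board}: {BUDGET_AH / amps:.1f} hours")
    # Raspberry Pi Zero W: 26.7 hours
    # Raspberry Pi 4: 6.7 hours
    # Jetson Nano: 2.0 hours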


Instructors

Yonatan Bisk

ybisk@cs.cmu.edu

Teaching Assistants

Student 1

student@cmu.edu

Student 2

student@cmu.edu


Slack and Course Communication

All course communication, including slides and discussions, will happen via Slack.



Assignments Timeline and Grading

The course is split evenly between paper discussions (50%) and projects (50%).
Papers
  – Participation: 15%
  – Paper Presentations: 15%
Project/Lab
  – Lab Reports (1 page): 50%
  – Final Report & Presentation: 20%
Participation:
Participation in Class or Slack (15%)
Participation is evaluated as "actively asking/answering questions based on the lectures, readings, and/or assisting other teams with project issues". Concretely, this means that every novel question or helpful answer provided in Slack will count for 1%, up to a total of 15% of your grade.

Submission Policies:

Projects, Hardware, and Resources

The course will be primarily centered on a few multimodal tasks to facilitate cross-team collaboration and technical assistance. If your team has a good reason to work on something not listed here, please reach out so we can discuss it and put together a proposal. Every team will also be provided with one of the following Single Board Computers (SBCs).

Example Projects
Input            Output     Task
Speech           Text       Open-Domain QA
Images           Text       Object Detection or ASL Finger Spelling
Images           Robot Arm  Learning from Demonstration
Speech + Images  Robot Car  Vision-Language Navigation
Single Board Computers
SBC                  RAM               Notes
Raspberry Pi Zero W  512MB             150 mA draw; limited processor
Raspberry Pi 4       2GB, 4GB, or 8GB  2 A draw; moderately powerful processor
Google Coral         1GB, 4GB          Edge TPU accelerator (TFLite)
Jetson Nano          2GB               128-core NVIDIA Maxwell GPU
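
Several of these boards target TFLite (the Google Coral's Edge TPU in particular, which additionally requires its Edge TPU delegate, omitted here). Below is a minimal sketch of loading and smoke-testing a pretrained TFLite model on an SBC; it assumes the tflite-runtime package is installed, and "model.tflite" is a hypothetical placeholder path:

    # Minimal sketch: run a pretrained TFLite model on an SBC.
    # Assumes tflite-runtime is installed; "model.tflite" is a placeholder.
    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Zero-filled input of the model's expected shape/dtype, as a smoke test.
    dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]))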
Resources

Classes

Each week pairs a Discussion day (papers) with a Lab day (project work):
Aug 31: Course structure & Background
Sept 2: Hardware and Modality choices
  • Hardware trade-off discussions:
    • Raspberry Pi Zero W: 150 mA
    • Raspberry Pi 4: 600 mA
    • Jetson Nano & Google Coral: 2 A
  • Input Options:
    • Speech Recognition: Microphone
    • Images: Camera
  • Output Options:
    • Text Output: LCD Display
    • Control: Robot
    • Additional sensors/extensions will be available: temperature sensors, LEDs, etc.
Sept 7: Understanding the Ecosystem
  • ARM
  • OSs: Raspbian, Ubuntu, Android
  • SBCs vs Micro-controllers
Sept 9: OS and Peripherals setup
  • Dev Boards: Custom ARM builds
  • Build environments: Source install PyTorch/HuggingFace and TFLite
  • Familiarize with hardware
  • Run pretrained models
  • Report 1 (5%)
Sept 14: TinyML, TFLite, PyTorch Mobile
Sept 16: Benchmark existing model
  • Performance, space, and power
  • Report 2 (5%)
Sept 21: Distillation
Sept 23: Fine-tune pretrained model
  • Report 3 (5%)
Sept 28: Distillation
Sept 30: Project Proposal
  • Task definition, modalities, and evaluation
  • Report 4 (5%)
Oct 5: Quantization
Oct 7: Quantization or distillation results
  • Report 5 (5%)
Oct 12: Quantization
Oct 14: No class
Oct 19: On-Device Computer Vision
Oct 21: Inference only
  • Full task implementation (no training)
  • Report 6 (5%)
Oct 26: Real-Time Speech Recognition
Oct 28: Train a new example
  • Report 7 (5%)
Nov 2: Weight Imprinting
Nov 4: Implementation
  • Report 8 (5%)
Nov 9: Neural Architecture Search
Nov 11: Training benchmarks
  • Static dataset provided by instructors
  • Report 9 (5%)
Nov 16: Power implications of accelerators
Nov 18: Carbon & Alternative Power
  • Report 10 (5%)
Nov 23: Multimodal Fusion
Nov 25: No class
Nov 30: FPGAs, Batteries, Solar, ...
Dec 2: TBD
Dec 7: Final Presentation
Dec 9: Final Report due