Fall 2021

On-Device Machine Learning is a project-based course covering how to build, train, and deploy models that can run on low-power devices (e.g. smartphones, refrigerators, and mobile robots). The course will cover advanced topics in distillation, quantization, weight imprinting, power calculation, and more. Each week we will discuss a new research paper and area in this space on the first day, and hold a lab working-group on the second. Specifically, students will be provided with low-power compute hardware (e.g. SBCs and inference accelerators) in addition to sensors (e.g. microphones, cameras, and robotics) for their course project. The project will involve three components for building low-power multimodal models:
        (1) inference
        (2) performing training/updates for interactive ML, and
        (3) minimizing power consumption.
The more that can be performed on device, the more privacy-preserving and mobile the solution is.
For each stage of the course project, the final model produced will have an mAh "budget" equivalent to one full charge of a smartphone battery (~4 Ah: roughly 2 hrs on a Jetson Nano, 7 hrs on a Raspberry Pi 4, or 26 hrs on a Raspberry Pi Zero W).
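
As a sanity check, runtime is just charge divided by current draw. A quick Python sketch, using the nominal draws quoted in the hardware notes below (actual draw varies heavily with load and peripherals):

    # Runtime estimate: hours = battery charge (Ah) / current draw (A).
    # Draws are the nominal figures quoted in this syllabus; real
    # consumption depends on load and attached peripherals.
    BUDGET_AH = 4.0  # roughly one full smartphone-battery charge

    nominal_draw_amps = {
        "Raspberry Pi Zero W": 0.150,
        "Raspberry Pi 4": 0.600,
        "Jetson Nano": 2.0,
    }

    for board, amps in nominal_draw_amps.items():
        print(f"{board}: {BUDGET_AH / amps:.1f} h")
    # Raspberry Pi Zero W: 26.7 h
    # Raspberry Pi 4: 6.7 h
    # Jetson Nano: 2.0 h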

Example industry motivation: "... if the coffee maker with voice recognition was in use for four years, the speech recognition cost for chewing on data back in the Mr Coffee datacenter would wipe out the entire revenue stream from that coffee maker, but that same function, if implemented on a device specifically tuned for this very precise job, could be done for under $1 and would not affect the purchase price significantly." -- Source

Instructors

Yonatan Bisk

ybisk@cs.cmu.edu


Slack and Course Communication

All course communication, including slides and discussions, will happen via Slack.

Slack


Assignments Timeline and Grading

The course is split half on paper discussion and half on projects.

Papers:
  – Participation: 20%
  – Paper Presentations: 15%
Project/Lab:
  – Lab Reports (1 page each): 45%
  – Final Report & Presentation: 20%
Participation:
Participation in Class or Slack (20%)
Participation is evaluated as "actively asking/answering questions based on the lectures, readings, and/or assisting other teams with project issues". Concretely, this means that every novel question or helpful answer provided in Slack will count for 1%, up to a total of 20% of your grade.

Submission Policies:

Projects, Hardware, and Resources

The course will be primarily centered on a few multimodal tasks/platforms to facilitate cross-team collaboration and technical assistance. If your team wants to use custom hardware or sensors not listed here -- that's fine, but please reach out so we can discuss it and think through the implications. Every team will also be provided with one of the following Single Board Computers (SBCs).

Example Projects

  Input            Output      Task
  Speech           Text        Open-Domain QA
  Images           Text        Object Detection or ASL Finger Spelling
  Images           Robot Arm   Learning from Demonstration
  Speech + Images  Robot Car   Vision-Language Navigation

[Photos: side view of a three-wheel robot car; LCD screen and camera for ASL]
Single Board Computers

  SBC                  RAM                  Notes
  Raspberry Pi Zero W  512 MB               150 mA draw; limited processor
  Raspberry Pi 4       2 GB, 4 GB, or 8 GB  2 A draw; moderately powerful processor
  Google Coral         1 GB or 4 GB         Edge TPU accelerator (TFLite)
  Jetson Nano          2 GB                 128-core NVIDIA Maxwell GPU (CUDA)
Resources

Classes

Each week pairs a discussion day with a lab day.
Aug 31: Course structure & Background
Sept 2: Hardware and Modality choices
  • Hardware trade-off discussions:
    • Raspberry Pi Zero W 150 mA
    • Raspberry Pi 4 600 mA
    • Jetson Nano & Google Coral 2A
  • Input Options:
    • Speech Recognition: Microphone
    • Images: Camera
  • Output Options:
    • Text Output: LCD Display
    • Control: Robot
    • Additional sensors/extensions will be available: temperature sensors, LEDs, etc.
Sept 7: Understanding the Ecosystem
  • ARM
  • OSs: Raspbian, Ubuntu, Android
  • SBCs vs Micro-controllers
Sept 9: OS and Peripherals setup
  • Dev Boards: Custom ARM builds
  • Build environments: Source install PyTorch/HuggingFace and TFLite
  • Familiarize with hardware
  • Run pretrained models (see the sketch below)
  • Report 1 (5%)
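
For reference, a minimal sketch of the "run pretrained models" step, assuming a working PyTorch/torchvision install on the board; MobileNetV2 and test.jpg are placeholders, not course requirements:

    # Load a small pretrained classifier and run one image through it.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    model = models.mobilenet_v2(pretrained=True).eval()  # small enough for an SBC

    # Standard ImageNet preprocessing.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open("test.jpg").convert("RGB")  # placeholder image path
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    print(logits.argmax(dim=1).item())  # predicted ImageNet class index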
Sept 14: TinyML, TFLite, PyTorch Mobile
Sept 16: Benchmark existing model
  • Performance, space, and power (see the sketch below)
  • Report 2 (5%)
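
Latency and model size can be measured in-process; power draw has to be measured externally (e.g. a USB power meter between the supply and the board). A minimal sketch, reusing the MobileNetV2 placeholder from above:

    import os
    import time
    import torch
    from torchvision import models

    model = models.mobilenet_v2(pretrained=True).eval()
    x = torch.randn(1, 3, 224, 224)  # dummy input at the model's resolution

    # Performance: average latency over repeated runs, after a warm-up.
    with torch.no_grad():
        for _ in range(5):  # warm-up iterations
            model(x)
        runs = 50
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    print(f"latency: {elapsed / runs * 1e3:.1f} ms")

    # Space: size of the serialized weights on disk.
    torch.save(model.state_dict(), "model.pt")
    print(f"size: {os.path.getsize('model.pt') / 1e6:.1f} MB")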
Sept 21: Distillation (loss sketch below)
Sept 23: Fine-tune pretrained model
  • Report 3 (5%)
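
For the distillation discussions (Sept 21/28), a minimal sketch of the classic soft-label loss: temperature-softened teacher and student logits compared with KL divergence, mixed with the usual hard-label cross-entropy. T and alpha are illustrative knobs, not course-prescribed values:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soft targets: KL between temperature-scaled distributions,
        # scaled by T^2 so its gradients match the hard-label term.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)  # ground-truth labels
        return alpha * soft + (1 - alpha) * hard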
Sept 28: Distillation
Sept 30: Thinking
Oct 5: Quantization
Oct 7: Project Proposal
  • Task definition, modalities, and evaluation
  • Report 4 (5%)
Oct 12: Quantization (sketch below)
Oct 14: No class
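
One concrete entry point for the quantization material is PyTorch's post-training dynamic quantization: weights are stored as int8 and dequantized on the fly, shrinking the model and often speeding up CPU inference on boards like these. A sketch on a toy model; your project model would take its place:

    import torch

    # Toy float32 model standing in for a real project model.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    ).eval()

    # Replace Linear layers with int8 dynamically-quantized versions.
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    print(quantized)  # Linear layers are now dynamically quantized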
Oct 19: On-Device Computer Vision
Oct 21: Baseline or Hardcoded system
  • Full task implementation (no training)
  • Report 5 (5%)
Oct 26: Real-Time Speech Recognition
Oct 28: Implementation
  • Documenting Failures, Benchmarking, ...
  • Report 6 (5%)
Nov 2: Weight Imprinting (sketch below)
Nov 4: Implementation
  • Documenting Failures, Benchmarking, ...
  • Report 7 (5%)
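
Weight imprinting (in the spirit of Qi et al.'s "Low-Shot Learning with Imprinted Weights") extends a normalized (cosine) classifier with a new class by setting its weight row to the normalized mean embedding of a few support examples, avoiding on-device gradient updates. A minimal sketch; the dimensions and random tensors are placeholders for a real frozen backbone:

    import torch
    import torch.nn.functional as F

    def imprint_class(classifier_weight, support_embeddings):
        """Append one imprinted class row.

        classifier_weight: (num_classes, d) rows of a cosine classifier.
        support_embeddings: (k, d) backbone embeddings of the new class.
        """
        proto = F.normalize(support_embeddings, dim=1).mean(dim=0)
        proto = F.normalize(proto, dim=0)  # unit-length prototype
        return torch.cat([classifier_weight, proto.unsqueeze(0)], dim=0)

    W = F.normalize(torch.randn(10, 64), dim=1)  # existing 10-way classifier
    support = torch.randn(5, 64)                 # 5 embeddings of a new class
    W_new = imprint_class(W, support)            # now 11-way, no training step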
Nov 9: Neural Architecture Search
Nov 11: Implementation
  • Documenting Failures, Benchmarking, ...
  • Report 8 (5%)
Nov 16: Power implications of accelerators
Nov 18: Carbon & Alternative Power
  • Report 9 (5%)
Nov 23: Multimodal Fusion (sketch below)
Nov 25: No class
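
As a baseline for the multimodal fusion discussion, the simplest approach is late fusion: encode each modality separately, concatenate, and classify. A sketch with placeholder feature sizes (the 40-dim audio and 512-dim image features are assumptions, not course specs):

    import torch
    import torch.nn as nn

    class LateFusion(nn.Module):
        """Concatenate per-modality encodings, then classify."""
        def __init__(self, audio_dim=128, image_dim=256, num_classes=10):
            super().__init__()
            self.audio_enc = nn.Sequential(nn.Linear(40, audio_dim), nn.ReLU())
            self.image_enc = nn.Sequential(nn.Linear(512, image_dim), nn.ReLU())
            self.head = nn.Linear(audio_dim + image_dim, num_classes)

        def forward(self, audio_feats, image_feats):
            fused = torch.cat([self.audio_enc(audio_feats),
                               self.image_enc(image_feats)], dim=-1)
            return self.head(fused)

    logits = LateFusion()(torch.randn(1, 40), torch.randn(1, 512))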
Nov 30: FPGAs, Batteries, Solar, ...
Dec 2: Concerns Discussion and Final Prep
Dec 7: Final Presentation
Dec 9: Final Report due