Spring 2022 Previous Projects
This course focuses on core techniques and modern advances for integrating different "modalities" into a shared representation or reasoning system. Specifically, these include text, audio, images/videos and action taking.
- Time & Place: 10:10am - 11:30am on Tu/Th (Doherty Hall 2210)
- Canvas: Lectures and additional details (coming soon)
- Course questions and discussion: Slack
  Registered students will be invited daily during the first week of class
- GitHub Template: https://github.com/ybisk/11-777-template
Slack and Canvas
All course communication will happen via Slack and Canvas. All videos will be posted to Canvas for offline viewing, though aspects of the class/teaching will be interactive in the Zoom sessions.
Slack
- #general: For questions about lectures, the course, or help from others on class projects
- #team-N-X: Each team should come up with a name and create its own private channel (invite the TAs and instructor). Use the same name for your GitHub fork and pin the link to the channel. Please also invite us to the GitHub repository. Example: #team-fun-vizwiz
- #dataset-XYZ: Each core dataset will also have its own Slack channel that anyone can join (across teams) to ask for help with setup, preprocessing, and other issues that might arise.
- Private Messages: If there is a question you would like to address to the instructors, please send a DM on Slack. Please check #general-questions first and post there when possible.
Assignments Timeline and Grading
The course is primarily project-based, but there will be readings throughout the course, which are graded only via participation.
Project Timeline and Assignments: (see links for more details)
Feb 03 | Groups Formed | | |
Feb 10 | R1 | Dataset Proposal and Analysis (as a group) | (10%) |
Mar 03 | R2 | Related Work and Model Proposal | (15%) |
Mar 31 | R3 | Baseline Analysis | (15%) |
Finals Week | Presentation | | (10%) |
May 6 | Final | Completed Report | (20%) |
Participation:
Participation in Class or Slack (20%)
Participation is evaluated as "actively asking/answering questions based on the lectures, readings, and/or assisting other teams with project issues". Concretely, this means that every novel question or helpful answer provided in Slack will count for 1%, up to a total of 20% of your grade. Up to two bonus points can be earned, for a maximum of 22%.
Paper Summaries:
Paper Summaries (10%)
Writing a three-sentence summary describing the paper you read earns you 1pt. This summary will be submitted in three text boxes. Specifically: A. State the goal of the paper, B. Explain the key insight, C. State a key limitation or important extension. There will be 11 opportunities, so one bonus point can be earned (11%). Paper summaries are due the following Tuesday night (1 week after being assigned).
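As a rough check on the percentages above, the weights and caps can be tallied in a few lines of Python. This is only a sketch, not the official grading script: the names `PROJECT_WEIGHTS`, `capped_credit`, and `nominal_total` are hypothetical, and it assumes that bonus points are earned the same way as regular points (one per contribution or summary) until the stated maximum is reached.

```python
# Percentages copied from the project timeline above.
PROJECT_WEIGHTS = {"R1": 10, "R2": 15, "R3": 15, "Presentation": 10, "Final": 20}

# Base weight and cap (base plus bonus) for the two individual components.
PARTICIPATION_BASE, PARTICIPATION_CAP = 20, 22
SUMMARY_BASE, SUMMARY_CAP = 10, 11


def capped_credit(count: int, cap: int) -> int:
    """Assumption: each contribution or summary is worth 1 point, up to the cap."""
    return min(max(count, 0), cap)


def nominal_total() -> int:
    """Sum of all base weights, excluding bonus points."""
    return sum(PROJECT_WEIGHTS.values()) + PARTICIPATION_BASE + SUMMARY_BASE
```

Under these assumptions, `nominal_total()` confirms that the base weights add up to 100, `capped_credit(30, PARTICIPATION_CAP)` stops at the 22% participation maximum, and `capped_credit(11, SUMMARY_CAP)` gives the 11% available for completing every paper summary.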
Submission Policies:
- All deadlines are midnight EST (determined by Canvas submission)
- Everyone must submit a PDF of the report to Canvas so we can give individual grades
- Late days: Every team has a budget of 6 late days. Late days are calculated automatically; once the budget is used up, 2% absolute is removed from the max grade.
Tasks & Datasets
The course will be primarily centered on a few datasets/tasks to facilitate cross-team collaboration and technical assistance. If your team has a good reason to work on something else, please reach out so we can discuss it and put together a proposal.
Simulator Based
Room-Across-Room | Code | Multilingual Embodied Navigation |
ALFRED | Code | Embodied instruction following with interaction |
TEACh | Code | Embodied Teaching (and Dialogue) |
Question Answering & Captioning
TextVQA | Code | Text in images (referring expressions and reading) |
WebQA | Code | Multihop Visual QA |
VizWiz | VQA and Captioning | Visual models for blind users |
Social-IQ | Code, Proj page | Video Question Answering focused on social interactions |
Multi-turn QA
CompGuessWhat?! | Visual Guessing Game and Attribute Prediction | |
PhotoBook Dialogue | Data | Visual reference game via dialogue |
Audio
Spoken Image Captions | A series of audio corpora and corresponding images for connecting audio directly to image regions. |
Video
TVQA | Video Question Answering Dataset | |
VATEX | Multilingual Video Captioning and Translation |
Physical hardware / robots / sensors ...
What about physical hardware? robots? tasks not datasets? Let's talk. |
Compute
Limited AWS and Google Cloud compute credits will be made available to each group, so please consider both your interests and available compute resources when deciding on a dataset/project.
Lectures
Tuesday | Thursday |
---|---|
Jan 18: Course Structure | Jan 20: Multimodal applications and datasets |
Readings: | |
Jan 25: Basics: "Deep learning" | Jan 27: Basics: Optimization |
Readings: A listed or proposed dataset/task | |
Feb 1: Unimodal representations (Vision) | Feb 3: Unimodal representations (Language) |
Readings: | |
Feb 8: Project Hours (Project ideas) | Feb 10: Project Hours (Project ideas) |
Readings: A paper of your choosing which is relevant to your project. Note: Team members must choose different papers. | |
Feb 15: Multimodal & Coordinated Representations | Feb 17: Alignment and Attention |
Readings: | |
Feb 22: Alignment + Representation | Feb 24: Alignment + Representation (Cont) |
Readings: | |
Mar 1: Alignment + Representation (Cont) | Mar 3: Ethics (Guest: Emma Strubell) |
Readings: None | |
Mar 8: Spring Break! | Mar 10: Spring Break! |
Readings: None | |
Mar 15: Project Hours (Research Discussion) | Mar 17: Project Hours (Research Discussion) |
Readings: A paper of your choosing which is relevant to your project. Note: Team members must choose different papers. | |
Mar 22: Alignment + Translation | Mar 24: Fusion and co-learning |
Readings: | |
Mar 29: Reinforcement Learning | Mar 31: Multimodal RL |
Readings: | |
Apr 5: Embodiment | Apr 7: -- NO CLASS -- |
Readings: None | |
Apr 12: Embodiment (cont) | Apr 14: New research directions |
Readings: | |
Apr 19: Project Hours (Final) | Apr 21: Project Hours (Final) |
Readings: None | |
Apr 26: Daniel Fried | Apr 28: Chris Paxton |
Readings: | |
May 5 (5:30-8:30pm): Project Presentations (Hybrid: PH 100) | May 6: Final Reports Due |