References
Daniel Adiwardana, Minh-Thang Luong, David R So,
Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang,
Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu,
et al. 2020. Towards a human-like open-domain
chatbot. arXiv preprint arXiv:2001.09977.
Peter Anderson, Xiaodong He, Chris Buehler, Damien
Teney, Mark Johnson, Stephen Gould, and Lei
Zhang. 2017. Bottom-up and top-down attention for
image captioning and visual question answering. Vi-
sual Question Answering Challenge at CVPR 2017.
Peter Anderson, Qi Wu, Damien Teney, Jake Bruce,
Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen
Gould, and Anton van den Hengel. 2018. Vision-
and-Language Navigation: Interpreting visually-
grounded navigation instructions in real environ-
ments. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR).
Jacob Andreas and Dan Klein. 2016. Reasoning about
pragmatics with neural listeners and speakers. In
Proceedings of the 2016 Conference on Empirical
Methods in Natural Language Processing, pages
1173–1182, Austin, Texas.
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Mar-
garet Mitchell, Dhruv Batra, C Lawrence Zitnick,
and Devi Parikh. 2015. Vqa: Visual question an-
swering. In Proceedings of the IEEE international
conference on computer vision, pages 2425–2433.
John Langshaw Austin. 1975. How to do things with
words. Oxford university press.
Philip Bachman, R Devon Hjelm, and William Buch-
walter. 2019. Learning representations by maximiz-
ing mutual information across views. In Advances
in Neural Information Processing Systems 32.
Anton Bakhtin, Laurens van der Maaten, Justin John-
son, Laura Gustafson, and Ross Girshick. 2019.
Phyre: A new benchmark for physical reasoning. In
Advances in Neural Information Processing Systems
32 (NIPS 2019).
Dare A. Baldwin, Ellen M. Markman, Brigitte Bill, Re-
nee N. Desjardins, Jane M. Irwin, and Glynnis Tid-
ball. 1996. Infants’ reliance on a social criterion for
establishing word-object relations. Child Develop-
ment, 67(6):3135–3153.
H.B. Barlow. 1989. Unsupervised learning. Neural
Computation, 1(3):295–311.
Marco Baroni, Silvia Bernardini, Adriano Ferraresi,
and Eros Zanchetta. 2009. The wacky wide web: a
collection of very large linguistically processed web-
crawled corpora. Language resources and evalua-
tion, 43(3):209–226.
Rachel Barr. 2013. Memory constraints on infant learn-
ing from picture books, television, and touchscreens.
Child Development Perspectives, 7(4):205–210.
Lawrence W Barsalou. 2008. Grounded cognition.
Annu. Rev. Psychol., 59:617–645.
Emily M Bender and Alexander Koller. 2020. Climb-
ing towards nlu: On meaning, form, and understand-
ing in the age of data. In Association for Computa-
tional Linguistics (ACL).
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and
Christian Jauvin. 2003. A neural probabilistic lan-
guage model. Journal of Machine Learning Re-
search, 3:1137–1155.
Leon Bergen, Roger Levy, and Noah Goodman. 2016.
Pragmatic reasoning through semantic inference.
Semantics and Pragmatics, 9.
Yonatan Bisk, Kevin Shih, Yejin Choi, and Daniel
Marcu. 2018. Learning Interpretable Spatial Oper-
ations in a Rich 3D Blocks World . In Proceedings
of the Thirty-Second Conference on Artificial Intelli-
gence (AAAI-18).
Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jian-
feng Gao, and Yejin Choi. 2020. PIQA: Reasoning
about physical commonsense in natural language. In
Thirty-Fourth AAAI Conference on Artificial Intelli-
gence.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan.
2003. Latent dirichlet allocation. Journal of Ma-
chine Learning Research, 3:993–1022.
Paul Bloom. 2002. How children learn the meanings
of words. MIT press.
Valts Blukis, Yannick Terme, Eyvind Niklasson,
Ross A. Knepper, and Yoav Artzi. 2019. Learning to
map natural language instructions to physical quad-
copter control using simulated flight. In 3rd Confer-
ence on Robot Learning (CoRL).
Peter F Brown, Peter V deSouza, Robert L Mercer, Vin-
cent J Della Pietra, and Jenifer C Lai. 1992. Class-
based n-gram models of natural language. Compu-
tational Linguistics, 18.
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie
Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind
Neelakantan, Pranav Shyam, Girish Sastry, Amanda
Askell, Sandhini Agarwal, Ariel Herbert-Voss,
Gretchen Krueger, Tom Henighan, Rewon Child,
Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu,
Clemens Winter, Christopher Hesse, Mark Chen,
Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin
Chess, Jack Clark, Christopher Berner, Sam Mc-
Candlish, Alec Radford, Ilya Sutskever, and Dario
Amodei. 2020. Language models are few-shot learn-
ers. In preprint.
Elia Bruni, Gemma Boleda, Marco Baroni, and Nam-
Khanh Tran. 2012. Distributional semantics in tech-
nicolor. In Proceedings of the 50th Annual Meet-
ing of the Association for Computational Linguistics
(Volume 1: Long Papers), pages 136–145, Jeju Is-
land, Korea.