Physical Interaction: Question Answering

PIQA was designed to investigate the physical knowledge of existing models. To what extent are current approaches actually learning about the world?



Submitting to the leaderboard

Submission is simple. Please email your predictions.

To: ybisk--_--cs.cmu.edu

Subject: [PIQA Leaderboard Submission]

Body:

  1. A predictions .lst file (one prediction per line)
  2. A name for your model
  3. Your team name (including your affiliation)
  4. Optionally: a GitHub repo or paper link.
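
As a rough illustration of the predictions file, here is a minimal sketch that writes one prediction per line. It assumes each prediction is the integer 0 (Solution 1) or 1 (Solution 2). That encoding, the function names, and the file name `predictions.lst` are assumptions for illustration, so confirm the expected format before submitting.

```python
from pathlib import Path


def format_predictions(predictions):
    """Return the file body: one prediction per line.

    Assumption: each prediction is 0 (Solution 1) or 1 (Solution 2).
    """
    lines = []
    for i, p in enumerate(predictions):
        if p not in (0, 1):
            raise ValueError(f"prediction {i} must be 0 or 1, got {p!r}")
        lines.append(f"{int(p)}\n")
    return "".join(lines)


def write_predictions(predictions, path="predictions.lst"):
    """Write the predictions to `path` (hypothetical file name)."""
    Path(path).write_text(format_predictions(predictions), encoding="utf-8")
```

Calling `write_predictions([0, 1, 1])` would produce a three-line file, one label per line, in the same order as the test examples.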


I'll try to get back to you within a few days, usually sooner. Each team may submit results for a given model only once every 7 days. Additionally, we reserve the right not to score any of your submissions if you cheat, for instance by using fake names or email addresses to make multiple submissions.


Citation

@inproceedings{Bisk2020,
  author = {Yonatan Bisk and Rowan Zellers and Ronan Le Bras and Jianfeng Gao and Yejin Choi},
  title = {PIQA: Reasoning about Physical Commonsense in Natural Language},
  booktitle = {Thirty-Fourth AAAI Conference on Artificial Intelligence},
  year = {2020},
}

Questions?

Please email me at the address above.

PIQA Leaderboard

Physical IQA is a binary choice task, often better viewed as a set of two (Goal, Solution) pairs:

  • Goal: To separate egg whites from the yolk using a water bottle, you should ...
  • Solution 1: Squeeze the water bottle and press it against the yolk. Release, which creates suction and lifts the yolk.
  • Solution 2: Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk.
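
The "two (Goal, Solution) pairs" view can be sketched as scoring each pair independently and picking the higher-scoring one. The sketch below is only an illustration: `choose_solution` and `toy_score` are hypothetical names, the 0/1 return convention is an assumption, and the word-overlap scorer is a placeholder for a real model, not the method used by any leaderboard entry.

```python
from typing import Callable


def choose_solution(
    goal: str,
    solution_1: str,
    solution_2: str,
    score: Callable[[str, str], float],
) -> int:
    """Score each (goal, solution) pair and return the index of the better one.

    Returns 0 for Solution 1 and 1 for Solution 2 (assumed convention).
    Ties go to Solution 1.
    """
    s1 = score(goal, solution_1)
    s2 = score(goal, solution_2)
    return 0 if s1 >= s2 else 1


def toy_score(goal: str, solution: str) -> float:
    """Placeholder scorer: number of distinct words shared with the goal.

    A real system would replace this with a model's plausibility score.
    """
    goal_words = set(goal.lower().split())
    return float(len(goal_words & set(solution.lower().split())))
```

Any function that maps a (goal, solution) pair to a number can be passed as `score`, which keeps the choice rule separate from the model that produces the scores.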

Evaluation is simple accuracy on this binary task: the percentage of examples for which the model picks the correct solution.
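
For reference, accuracy here is the number of correct choices divided by the number of examples, reported as a percentage. A minimal sketch, assuming predictions and gold labels are equal-length lists of 0/1 integers (the function name and encoding are illustrative, not the official scoring script):

```python
def accuracy(predictions, labels):
    """Percentage of positions where the prediction equals the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return 100.0 * correct / len(labels)
```

For example, three correct predictions out of four gives `accuracy([0, 1, 1, 0], [0, 1, 0, 0]) == 75.0`.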

Rank  Model                                  Accuracy
-     Human Performance (Bisk et al. '20)    94.9
1     RoBERTa-Large (run by AI2)             77.1
2     OpenAI GPT (run by AI2)                69.2
3     BERT-Large (run by AI2)                66.8
4     Majority Class                         50.4
-     Random Performance                     50.0