Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
| Jan 23, 2024
abstract
The paper measures task-level generalizability by taking tasks on which LMs perform well and altering the conditions or rules under which these tasks are performed.
Status
Done
Type
evaluation

Experiment Setting

In each task, the original version under the default conditions and its counterfactual variants share the same reasoning procedure but differ in their input-output mappings.
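As a rough illustration of "same procedure, different input-output mapping", the sketch below uses multi-digit addition, where base 10 plays the role of the default condition and base 9 the counterfactual one. The function name `add_in_base` and the restriction to bases up to 10 are assumptions made for this example; this is not the paper's evaluation code.

```python
def add_in_base(a: str, b: str, base: int) -> str:
    """Add two non-negative numerals written in `base` (2..10).

    The procedure (right-to-left digit addition with carry) is identical
    for every base; only the base, i.e. the "world condition", changes.
    """
    if not 2 <= base <= 10:
        raise ValueError("this sketch only supports bases 2 through 10")

    result_digits = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1

    # Walk both numerals from the least significant digit.
    while i >= 0 or j >= 0 or carry:
        digit_a = int(a[i]) if i >= 0 else 0
        digit_b = int(b[j]) if j >= 0 else 0
        total = digit_a + digit_b + carry
        result_digits.append(str(total % base))  # digit kept in this column
        carry = total // base                    # carry into the next column
        i -= 1
        j -= 1

    return "".join(reversed(result_digits))


# Same inputs, same algorithm, different answers under each condition.
default_answer = add_in_base("27", "62", base=10)
counterfactual_answer = add_in_base("27", "62", base=9)
print(default_answer, counterfactual_answer)
```

A model that has learned the carry procedure itself should handle both calls; a model that has mostly memorized base-10 sums would be expected to do well only on the first.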
 
Note: these settings are not completely outside the realm of human experience, and there is no guarantee that the counterfactual world models are unobserved in the pretraining corpus.
 
Ruling out a confounder: the LLM may fail because it does not understand the instructions, rather than because it genuinely lacks reasoning ability.

Conclusion

This suggests that these models’ ability on these tasks is supported at least in part by non-transferable, default-condition-specific behaviors rather than abstract, generalizable reasoning skills.