Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
Jan 23, 2024
link
Publish Date
Number
reflection
abstract
This report analyses multiple logical reasoning datasets, including popular benchmarks such as LogiQA and ReClor and newly released datasets such as AR-LSAT. GPT-4 and ChatGPT perform reasonably well on traditional benchmarks but poorly on out-of-distribution (OOD) data.
Status
Not started
Type
evaluation
Author
  • Valine
Catalog