When can transformers reason with abstract symbols?

link

Publish Date

Number

reflection

abstract

Shows theoretically that, after extensive fine-tuning (ft), a transformer can perform template matching (the template stays the same; only the variables differ), i.e., the so-called first-order generalization. For symbolic tasks, however, it cannot. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases.
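A minimal sketch of what "same template, different variables" means, under a toy setup assumed here for illustration. The helpers `instantiate` and `matches_template` and the `t*`/`u*` vocabularies are hypothetical and not taken from the paper. Test instances use symbols never seen during training, so a pure memorizer scores 0, while a rule that tracks only the template structure (which slots hold equal tokens) scores 1. It illustrates the task setting only, not the paper's transformer model or its proofs.

```python
import random

# Hypothetical toy setup: a "template" is a tuple of variable slots, e.g. ("a", "b", "a").
def instantiate(template, vocab, rng):
    """Fill each distinct variable in `template` with a distinct token from `vocab`."""
    variables = sorted(set(template))
    tokens = rng.sample(vocab, len(variables))
    binding = dict(zip(variables, tokens))
    return [binding[v] for v in template]

def matches_template(seq, template):
    """True if `seq` is an instance of `template`: equal slots hold equal
    tokens and distinct slots hold distinct tokens."""
    if len(seq) != len(template):
        return False
    binding = {}
    for var, tok in zip(template, seq):
        if var in binding and binding[var] != tok:
            return False
        binding[var] = tok
    # Distinct variables must be bound to distinct tokens.
    return len(set(binding.values())) == len(binding)

rng = random.Random(0)
template = ("a", "b", "a")
train_vocab = [f"t{i}" for i in range(100)]
test_vocab = [f"u{i}" for i in range(100)]  # symbols never seen in training

train = [instantiate(template, train_vocab, rng) for _ in range(5)]
test = [instantiate(template, test_vocab, rng) for _ in range(5)]

# A memorizer only recognizes exact training sequences; none reappear at test time.
seen = {tuple(s) for s in train}
memorizer_acc = sum(tuple(s) in seen for s in test) / len(test)

# A template-level rule ignores token identity and checks only the slot structure.
template_acc = sum(matches_template(s, template) for s in test) / len(test)

print(memorizer_acc, template_acc)  # prints: 0.0 1.0
```

The gap between the two scores is the distinction the note draws: generalizing to unseen symbols requires learning the template, not the specific tokens.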

Status

In progress

Type

evaluation

Author