To diagnose the failures of current models and support research, we're releasing GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the ...
Anthony was feeling stressed. Besides the fact that he had spent absolutely no time thinking about math over the 17-day break ...