The title makes it sound nice, but the reported results are worse than random baselines on several benchmarks, including ones used to claim superiority over BERT. At a glance, HellaSwag, BoolQ, and WinoGrande are all at or below random guessing. At best this is a fun model with broken evaluation. At worst this is medium spam for clout farming - which won't work on anyone who can read the tables.