spindump8930 8 hours ago

The title makes it sound nice, but the reported results are no better than random baselines on several benchmarks, including the ones used to claim superiority over BERT. At a glance, HellaSwag, BoolQ, and WinoGrande are all at or below random guessing. At best this is a fun model with broken evaluation. At worst this is Medium spam for clout farming, which won't work on anyone who can read the tables.
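
For anyone who wants to sanity-check the tables themselves, here is a rough sketch. It assumes uniform random guessing, with HellaSwag scored as 4-way multiple choice and BoolQ and WinoGrande as binary. The helper names are made up for illustration, and you would plug in the accuracies from the post yourself.

```python
# Chance accuracy under uniform random guessing is 1 / number_of_choices.
# Assumption: standard multiple-choice formats for these benchmarks.
NUM_CHOICES = {
    "hellaswag": 4,   # four candidate endings per item
    "boolq": 2,       # yes/no question
    "winogrande": 2,  # two candidate fillers per sentence
}


def chance_accuracy(benchmark: str) -> float:
    """Expected accuracy (0..1) of a uniform random guesser."""
    return 1.0 / NUM_CHOICES[benchmark]


def at_or_below_chance(benchmark: str, reported: float, margin: float = 0.0) -> bool:
    """True if the reported accuracy (0..1) does not beat chance by more than `margin`."""
    return reported <= chance_accuracy(benchmark) + margin
```

Note that for BoolQ, always answering with the majority class does better than 50%, so uniform guessing is the generous baseline there.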