Sometimes it is best to remember that you are a person and are not in competition with algorithms.
The author labels the first riddle thusly:
And reinforces this with the first sentence: If trying to solve it is fun and the faux implication immaterial, awesome. However, if the expressed characterization of: Is bothersome, remember that all of this is just one person's way of shaping your experience such that continued engagement is likely.

Hi, the author here. It was meant as a light joke, and I totally agree that we should not feel bad if an AI can predict the next letter better than us. Maybe I should rephrase it to something like "IQ test for AIs" to hint a bit more about what's to come -- this is what we train the AIs for: to be intelligent.
> We are now getting an equivalent definition of what neural nets are being trained for! LLMs are trained to compress the internet as much as possible!
Nice payoff. Others have also called out the relationship to compression (https://www.youtube.com/watch?v=AKMuA_TVz3A).
Hi, the author here.
@JoshCole Thanks -- I also like that lecture by Ilya, added it to the resources now.
@0ffh -- Yeah, I hope the page does not come across as me trying to claim some new revolutionary insight. People like Ilya were talking about this years ago -- I am just trying to package it into a hopefully more accessible format.
Framing it as compression is reductive (intentionally so). Yes, compression of information is a proxy measure of Kolmogorov complexity, but it's more accurate to say the model is learning the conditional probability distribution: it's a stochastic machine that produces samples from a distribution, not a literal compressed representation of anything (you have to do work to extract the original data, and it doesn't come back 100% intact in all cases).
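To make the prediction/compression equivalence concrete, here is a minimal sketch (with a hypothetical toy character-level model, not code from the article): the sum of a model's negative log2-probabilities over a text is exactly the number of bits an ideal arithmetic coder driven by those same probabilities would spend, which is why sharpening next-token prediction and shrinking the compressed size are the same objective.

    import math

    # Hypothetical toy "model": a fixed conditional distribution over a
    # three-letter alphabet. A real LLM would condition these probabilities
    # on the context instead of ignoring it.
    def toy_model(context):
        return {"a": 0.5, "b": 0.3, "c": 0.2}

    text = "abacca"
    total_bits = 0.0
    for i, ch in enumerate(text):
        p = toy_model(text[:i])[ch]
        # An ideal arithmetic coder driven by these probabilities spends
        # -log2(p) bits on this symbol, so the model's total log-loss on the
        # text equals the size of the compressed output.
        total_bits += -math.log2(p)

    print(f"log-loss = ideal compressed size: {total_bits:.2f} bits for {len(text)} chars")

The caveat above still stands, though: the model stores the distribution, not the data; you only get an actual compressed representation by running a coder with that distribution over the corpus.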
The relationship has been thought about for a long time. In 2006 it even led to the creation of the Hutter Prize, with around €38k paid out so far.