This is the hexadecimal dump of the ASCII contents of myfile.txt.
The #defines obfuscate the code and make it harder to read, replacing printf with O1O, putchar with OlO, and so on. The D() function checks whether a byte is a printable ASCII character.
In summary, this program opens a file, reads it byte by byte, prints the hex value of each byte, and prints the ASCII for the printable characters: a hexadecimal dump utility.
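Once the macros are expanded, the core of a program like that boils down to something along these lines (a rough, de-obfuscated sketch of a hex dump loop; the names and exact output format here are mine, not the original program's):

    #include <ctype.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        FILE *fp = argc > 1 ? fopen(argv[1], "rb") : stdin;
        unsigned char buf[16];
        size_t n, i;
        long offset = 0;

        if (!fp)
            return 1;

        while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
            printf("%06lx ", offset);            /* offset of this line */
            for (i = 0; i < n; i++)
                printf(" %02x", buf[i]);         /* hex value of each byte */
            for (; i < sizeof buf; i++)
                printf("   ");                   /* pad a short final line */
            printf("  ");
            for (i = 0; i < n; i++)              /* printable ASCII, '.' otherwise */
                putchar(isprint(buf[i]) ? buf[i] : '.');
            putchar('\n');
            offset += n;
        }
        return 0;
    }

Run against a text file, it prints an offset, the hex bytes, and the printable characters on each line, which is the behaviour described above.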
I'm well aware of how the code works. I was mostly interested in why it was written like that. nwiswell's comment[1] gave me the hint I needed to find that yes, this was part of an obfuscated C challenge[2].
I know you mean well but LLMs are the very last resource I'd turn to for help. Those things make crap up all the time.
After seeing an LLM do something like this, I've got to ask people who think that LLMs are just "stochastic parrots", "just predicting the next word", or are merely "a blurry jpeg of the web" to think about what's really going on here.
This exact code appears online as a submission to the 1986 IOCCC, so it's likely that it was indeed part of the training set for this LLM, and that there is a significant corpus of text discussing this particular program and other obfuscated programs like it.
I'm not ruling out that this LLM output is "partially organic" rather than "fully regurgitated", but I'd be much more interested to see this LLM explain an obfuscated program that hasn't been floating around the Internet for 35 years.
Even if it's part of the training data and the LLM is just a better search engine, how would I have figured out what the code does without an LLM? I certainly can't paste this into Google.
I mostly agree with the stochastic parrot interpretation, but that doesn't undermine the usefulness or impressiveness. Even if it's just a highly compressed search index, that level of compression is amazing.
My experience is that ChatGPT does a very poor job of writing Brainfuck programs for me; even simple programs like "add two and two" aren't correct. Maybe it would do better if I asked it to explain one instead.
In my experience, LLMs are poor at working with unpopular languages -- probably because their training data does not contain a lot of examples of programs written in those languages, or explanations of them.
So, in other words, they perform precisely how you’d expect a stochastic parrot to perform?
The more popular the language, the more likely the training corpus includes both very similar code samples and explanations of those samples, and the more likely those two converge on a “reasonable” explanation.
Ask it something it’s likely to have seen an answer for and it’s likely to spit out that answer. Interesting? Sure. Impressive? Maybe… but still pretty well captured by “a fuzzy jpeg of the web”.
> Train a human mostly on English, and they'll speak English. Train them mostly on Chinese, and they'll speak Chinese.
Ahh, but ask a human a question in a language they don’t understand and they’ll look at you with bewilderment, not confidently make up a stream of hallucinatory nonsense that only vaguely looks statistically right.
> Or exactly like you’d expect a human to perform.
Not exactly, no… but with just enough of the uncanny valley to make me think the more interesting thought: are we really not much more than stochastic parrots? Or, in other words, are we naturally just slightly more interesting than today’s state of the artificially stupid?
I've seen code like this (three-letter macros for every one and a half(!) syntax constructs, all macros starting with Q, seemingly random indentation). I do not understand why the developer did it or why the company let him. All I know is that in the end he no longer understood his own code and couldn't fix some issues.
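For anyone who hasn't run into that style, here's a made-up miniature of what it looked like (the macro names are invented for illustration, not taken from that codebase):

    /* Three-letter macros, all starting with Q, standing in for plain C syntax. */
    #include <stdio.h>

    #define QIF if (
    #define QTH ) {
    #define QEL } else {
    #define QEN }
    #define QFR for (
    #define QPR printf

    int main(void)
    {
            int i;
        QFR i = 0; i < 3; i++ QTH
      QIF i % 2 == 0 QTH
                QPR("%d is even\n", i);
          QEL
            QPR("%d is odd\n", i);
        QEN
     QEN
        return 0;
    }

After the preprocessor expands the macros it's an ordinary loop with an if/else, but reading it as written is needlessly painful, and the erratic indentation makes it worse.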
In many fonts, they are nearly indistinguishable. For example, in PuTTY's default font, { and ( look identical to me, which has led to many syntax errors in my code.