This is the hexadecimal dump of the ASCII contents of myfile.txt.
The #defines obfuscate the code and make it harder to read, replacing printf with O1O, putchar with OlO, and so on. The D() function checks whether a byte is a printable ASCII character.
In summary, this program opens a file, reads it byte by byte, prints the hex value of each byte, and prints the ASCII for the printable characters: a hexadecimal dump utility.
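Once the macros are expanded, the core of a program like that boils down to something along these lines (a rough, de-obfuscated sketch of a hex dump loop; the names and exact output format here are mine, not the original program's):

    #include <ctype.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        FILE *fp = argc > 1 ? fopen(argv[1], "rb") : stdin;
        unsigned char buf[16];
        size_t n, i;
        long offset = 0;

        if (!fp)
            return 1;

        while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
            printf("%06lx ", offset);            /* offset of this line */
            for (i = 0; i < n; i++)
                printf(" %02x", buf[i]);         /* hex value of each byte */
            for (; i < sizeof buf; i++)
                printf("   ");                   /* pad a short final line */
            printf("  ");
            for (i = 0; i < n; i++)              /* printable ASCII, '.' otherwise */
                putchar(isprint(buf[i]) ? buf[i] : '.');
            putchar('\n');
            offset += n;
        }
        return 0;
    }

Run against a text file, it prints an offset, the hex bytes, and the printable characters on each line, which is the behaviour described above.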
I'm well aware of how the code works. I was mostly interested in why it was written like that. nwiswell's comment[1] gave me the hint I needed to find that yes, this was part of an obfuscated C challenge[2].
I know you mean well but LLMs are the very last resource I'd turn to for help. Those things make crap up all the time.
After seeing an LLM do something like this, I've got to ask people who think that LLMs are just "stochastic parrots", "just predicting the next word", or are merely "a blurry jpeg of the web" to think about what's really going on here.
This exact code appears online as a submission to the 1986 IOCCC, so it's likely that it was indeed part of the training set for this LLM, and that there is a significant corpus of text discussing this particular program and other obfuscated programs like it.
I'm not ruling out that this LLM output is "partially organic" rather than "fully regurgitated", but I'd be much more interested to see this LLM explain an obfuscated program that hasn't been floating around the Internet for 35 years.
Even if it's part of the training data and the LLM is just a better search engine, how would I have figured out what the code does without an LLM? I certainly can't paste this into Google.
I mostly agree with the stochastic parrot interpretation, but that doesn't undermine the usefulness or impressiveness. Even if it's just a highly compressed search index, that level of compression is amazing.
My experience is that ChatGPT does a very poor job of writing Brainfuck programs for me; even simple programs like "add two and two" aren't correct. Maybe it would do better if I asked it to explain one instead.
In my experience, LLMs are poor at working with unpopular languages -- probably because their training data does not contain a lot of examples of programs written in those languages, or explanations of them.
So, in other words, they perform precisely how you’d expect a stochastic parrot to perform?
The more popular the language, the more likely the training corpus includes both very similar code samples and explanations of those samples, and the more likely those two converge on a “reasonable” explanation.
Ask it something it’s likely to have seen an answer for and it’s likely to spit out that answer. Interesting? Sure. Impressive? Maybe… but still pretty well captured by “a fuzzy jpeg of the web”.
> Train a human mostly on English, and they'll speak English. Train them mostly on Chinese, and they'll speak Chinese.
Ahh, but ask a human a question in a language they don’t understand and they’ll look at you with bewilderment, not confidently make up a stream of hallucinatory nonsense that only vaguely looks statistically right.
> Or exactly like you’d expect a human to perform.
Not exactly, no… but with just enough of the uncanny valley to make me think the more interesting thought: are we really not much more than stochastic parrots? Or, in other words, are we naturally just slightly more interesting than today’s state of the artificially stupid?
I've seen code like this (three-letter macros for every one and a half(!) syntax constructs, all macros starting with Q, seemingly random indentation). I do not understand why the developer did it or why the company let him. All I know is that in the end he no longer understood his own code and couldn't fix some issues.
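For anyone who hasn't run into that style, here's a made-up miniature of what it looked like (the macro names are invented for illustration, not taken from that codebase):

    /* Three-letter macros, all starting with Q, standing in for plain C syntax. */
    #include <stdio.h>

    #define QIF if (
    #define QTH ) {
    #define QEL } else {
    #define QEN }
    #define QFR for (
    #define QPR printf

    int main(void)
    {
            int i;
        QFR i = 0; i < 3; i++ QTH
      QIF i % 2 == 0 QTH
                QPR("%d is even\n", i);
          QEL
            QPR("%d is odd\n", i);
        QEN
     QEN
        return 0;
    }

After the preprocessor expands the macros it's an ordinary loop with an if/else, but reading it as written is needlessly painful, and the erratic indentation makes it worse.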
In many fonts, they are nearly indistinguishable. For example, in PuTTY's default font, { and ( look identical to me, which has led to many syntax errors in my code.