Was this for an obfuscated C challenge?


Here's how Claude+[1] explained it:

This program appears to be a hexadecimal dump utility. It does the following:

- It takes a filename as a command line argument and opens that file for reading

- It reads the file byte by byte until EOF

- For each byte, it prints the hexadecimal value of the byte, in the format "%2x " (i.e. 2 hex digits, a space)

- After every 16 bytes, it prints the ASCII representation of those bytes, replacing non-printable characters with "."

- It also has some obfuscated logic with bitwise operations, likely attempting to confuse the reader.

So if you ran it like this:

  ./program myfile.txt
It would output something like:

  54 65 78 74 20 66 69 
  6c 65 2e 0a 54 68 69 
  73 20 69 73 20 61 20 
  74 65 78 74 20 66 69 
  6c 65 2e 0a 54 68 65 
  20 71 75 69 63 6b 20 
  62 72 6f 77 6e 20 66  
  6f 78 0a 6a 75 6d 70 
  73 20 6f 76 65 72 20
  20 74 68 65 20 6c 61
  7a 79 20 64 6f 67 0a 
  2e 2e 2e
Which is the hexadecimal dump of the ASCII contents of myfile.txt.

The #defines are used to obfuscate the code and make it harder to read, replacing printf with O1O, putchar with OlO, etc. The D() function is used to check if a byte is a printable ASCII character.

So in summary, this program opens a file, reads it byte by byte, prints the hex values, and prints the ASCII for printable characters, as a hexadecimal dump utility.
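
For comparison, here is a minimal, un-obfuscated C sketch of the behavior described above. It is not bright.c itself; the 16-byte grouping and the hex-plus-ASCII layout follow the explanation, while details like "%02x", fread() and isprint() are my own choices:

    #include <stdio.h>
    #include <ctype.h>

    /* Minimal hexdump sketch: hex column, then ASCII column, 16 bytes per line. */
    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
        FILE *fp = fopen(argv[1], "rb");
        if (!fp) { perror(argv[1]); return 1; }

        unsigned char buf[16];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
            for (size_t i = 0; i < n; i++)
                printf("%02x ", buf[i]);                    /* hex values */
            for (size_t i = 0; i < n; i++)
                putchar(isprint(buf[i]) ? buf[i] : '.');    /* printable ASCII or '.' */
            putchar('\n');
        }
        fclose(fp);
        return 0;
    }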

[1] - https://poe.com/Claude%2B


I'm well aware of how the code works. I was mostly interested in why it was written like that. nwiswell's comment[1] gave me the hint I needed to find that yes, this was part of an obfuscated C challenge[2].

I know you mean well but LLMs are the very last resource I'd turn to for help. Those things make crap up all the time.

[1]: https://news.ycombinator.com/item?id=36364283

[2]: https://www.ioccc.org/1986/bright/bright.c


What a silly reply. It's very impressive that the other person's LLM was able to figure this out, even if you were clever enough to know it already.


> It also has some obfuscated logic with bitwise operations, likely attempting to confuse the reader.

Bitwise operations are commonly used to improve efficiency.
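
For example (my own illustration, not code from the program above), masking and shifting by powers of two are classic stand-ins for modulo and division:

    #include <stdio.h>

    int main(void)
    {
        unsigned n = 1986;
        printf("%u %u\n", n & 15u, n % 16u);   /* n & 15 equals n % 16 */
        printf("%u %u\n", n >> 4,  n / 16);    /* n >> 4 equals n / 16 */
        return 0;
    }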


After seeing an LLM do something like this, I've got to ask people who think that LLMs are just "stochastic parrots", "just predicting the next word", or are merely "a blurry jpeg of the web" to think about what's really going on here.


This exact code appears online as part of the IOCCC 1986 (it was a submission), so it's likely that this was indeed part of the training set for this LLM and that there is a significant corpus of text discussing this particular program and other obfuscated programs like it.

I'm not ruling out that this LLM output is "partially organic" rather than "fully regurgitated", but I'd be much more interested to see this LLM explain an obfuscated program that hasn't been floating around the Internet for 35 years.


Even if it's part of the training data and the LLM is just a better search engine, how would I have figured out what the code does without an LLM? I certainly can't paste this into Google.

I mostly agree with the stochastic parrot interpretation, but that doesn't undermine the usefulness or impressiveness. Even if it's just a highly compressed search index, that level of compression is amazing.


> how would I have figured out what the code does without an LLM

Start by find-and-replacing those #defines. You can iteratively deobfuscate things by hand (see the sketch below). It's a PITA and takes time, but it's doable.

If you hit a roadblock, run it in a VM.
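
For instance, those macros are pure textual substitution, so a single find-and-replace pass (or letting the preprocessor expand them) already makes the control flow legible. A tiny hypothetical illustration using the #defines quoted downthread, not actual lines from bright.c:

    #include <stdio.h>

    #define O1O printf
    #define OlQ fgetc
    #define QO0 for

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        FILE *fp = fopen(argv[1], "r");
        if (!fp) return 1;
        int l;
        /* obfuscated spelling:             after substitution:               */
        QO0 (; (l = OlQ(fp)) != EOF; )      /* for (; (l = fgetc(fp)) != EOF; ) */
            O1O("%2x ", l);                 /*     printf("%2x ", l);           */
        fclose(fp);
        return 0;
    }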


It's easy to test it with something unpublished


My experience is that ChatGPT does a very poor job writing Brainfuck programs for me, even simple programs like "add two and two" aren't correct. Maybe it would do better if I asked it to explain one instead.


In my experience, LLMs are poor at working with unpopular languages -- probably because their training data does not contain a lot of examples of programs written in those languages, or explanations of them.

They do much better with popular languages.


> They do much better with popular languages.

So, in other words, they perform precisely how you’d expect a stochastic parrot to perform?

The more popular the language the more likely the training corpus includes both very similar code samples and explanation of those code samples, and also the more likely those two converge on a “reasonable” explanation.

Ask it something it’s likely to have seen an answer for and it’s likely to spit out that answer… interesting? Sure, impressive? Maybe… but still pretty well captured by “a fuzzy jpeg of the web”.


"So, in other words, they perform precisely how you’d expect a stochastic parrot to perform?"

Or exactly like you'd expect a human to perform.

Train a human mostly on English, and they'll speak English. Train them mostly on Chinese, and they'll speak Chinese.


> Train a human mostly on English, and they'll speak English. Train them mostly on Chinese, and they'll speak Chinese.

Ahh, but ask a human a question in a language they don’t understand and they’ll look at you with bewilderment, not confidently make up a stream of hallucinatory nonsense that only vaguely looks statistically right.

> Or exactly like you’d expect a human to perform.

Not exactly, no… but with just enough of the uncanny valley to make me think the more interesting thought: are we really not much more than stochastic parrots? Or, in other words, are we naturally just slightly more interesting than today’s state of the artificially stupid?


It didn't even answer the question though...


Funny thing is that GPT-4's explanation mentions twice that the code is "highly" or "heavily" obfuscated. Poe's does not.

https://sharegpt.com/c/G37AW91


> likely attempting to confuse the reader

It's just ordinary C code.


Very obviously not.

Nobody substitutes random three-letter strings for keywords in ordinary C code unless they intend some trivial obfuscation.

    #define O1O printf
    #define OlO putchar
    #define O10 exit
    #define Ol0 strlen
    #define QLQ fopen
    #define OlQ fgetc
    #define O1Q abs
    #define QO0 for
    typedef char lOL;


I've seen code like this (three-letter macros for every one-and-a-half(!) syntax constructs, all macros starting with Q, seemingly random indentation). I do not understand why the developer did that or why the company let him. All I know is that in the end he didn't understand his own code anymore and couldn't fix some issues.


Not random, but permutations of easily confusable characters (0OQ, 1l).


In many fonts they are nearly indistinguishable. For example, in PuTTY's default font, { and ( look identical to me, which has led to many syntax errors in my code.


Did you write the program without the intention of being obfuscatory, and then _later_ submit it to IOCCC? Or did somebody else submit your code?

It's right here, and your name is on it. https://www.ioccc.org/1986/bright/bright.c

It even won an award! https://www.ioccc.org/1986/bright/hint.html


> Did you write the program without the intention of being obfuscatory

I cribbed it from system .h files.



