So You Think You Know C? (2020) [pdf]

okaleniuk · on Sept 17, 2023

Wow! Didn't expect to see it on Hacker News again.

Hello, hi! I'm Oleksandr Kaleniuk, and I wrote the thing back in 2020. Please don't take it all too seriously. There was COVID-19, I had nothing better to do.

I did write a new book since then, and my publisher would bite my head off for missing this opportunity. It's called Geometry for Programmers and it was published by Manning just this year. https://www.manning.com/books/geometry-for-programmers It's about curves, surfaces, transformations, homogeneous coordinates, rational splines, distance fields, and voxels. You don't have to know any advanced math to enjoy the book, but you might want to know a bit of Python. It's for programmers, so the didactic tools there are mostly code, not theorems.

cerved · on Sept 17, 2023

Very nice read, thank you for sharing.

Buttons840 · on Sept 17, 2023

Wow, the HN front page at any given moment gives fewer things to talk about than this one PDF.

This is 10 blog posts in one PDF, and each post is good.

unwind · on Sept 17, 2023

It irks me no end when language-lawyering like this uses sizeof in the stupidly obscure way of the first example.

    sizeof(*(&s))

I mean, wut? Really?? This should just be

    sizeof s

Since sizeof is not a function, and redirecting by dereferencing a pointer that is just created by taking the address of a local is silly.

Not overly impressed by this.

mrkeen · on Sept 17, 2023

Here's Torvalds on it:

> But "sizeof()" really is a function. It acts exactly like a function of it's argument. There is no reason to not treat it that way.

https://lkml.org/lkml/2012/7/11/103

jstimpfle · on Sept 17, 2023

One of the rare cases where I think he's just not seeing the whole picture. For one, sizeof is not a runtime function but evaluated at compile time. (pretty obvious). More importantly, sizeof is syntactically a unary operator. It isn't syntactically a function call, just like "-a" or "*a" aren't function calls.

That's just how it is, and really what's wrong with it? We're not in lisp after all. Let's not pretend it's a function, because otherwise an expression like

    sizeof(obj).member

would be at least confusing.

Gibbon1 · on Sept 17, 2023

In a alternative world where WG14 all got hit by a bus in 1988 sizeof could be a runtime function that the compiler could replace with a compile time constant as an optimization. Lets say Richie won and phat pointers were added to the language. Then sizeof(phat_pointer) would absolutely be a run time function.

jstimpfle · on Sept 17, 2023

But it's not. And there is no reason why getting the size out of a fat pointer at runtime would need to be done by the same syntactic construct that otherwise does static calculations.

Gibbon1 · on Sept 17, 2023

Because C doesn't have object syntax.

AKA

   int a = 10;
   int \*p = &a;

   printf( "size of a = %i, p=%i", a.size, p.size);

jstimpfle · on Sept 18, 2023

Duh, C also doesn't have fat pointers. And C doesn't have a runtime_sizeof keyword. How do you get from this to "The existing sizeof should be re-purposed to be evaluated at runtime"?

Gibbon1 · on Sept 18, 2023

Personal interest is why.

You and WG14 think of C as if it was a historical steam locomotive being carefully maintained in in original state.

Other people think of it as a tool they have to use all the time.

jstimpfle · on Sept 19, 2023

It is a tool I use all the time. I am not concerned about keeping an unaltered timeless gem (which it isn't), but rather things like clarity, orthogonality, simplicity, minimality of concepts etc.

Honestly I am not even sure I am missing fat pointers, other than getting some better diagnostics in pretty rare cases of out-of-bounds access. For coding, I much prefer separate, explicit length fields. I can have as many of them as I need and mutate them as I need. Much more flexible.

The hidden size field of a fat pointer would be more appropriate for a "capacity" field (think std::vector capacity() vs size()) since that is indeed -- more or less, usually -- immutable after allocation of the buffer.

However that is not the "size" field that most would be interested in, not the size field that defines that range of valid data. So again, would be mostly useful for diagnostics, and only if there is a well defined allocation size.

There are some other concerns I have with fat pointers. Let's consider two fat pointers with the same address but different length, how should they compare, p1 == p2 or p1 != p2? Neither p1 < p2 nor p1 > p2 would be a good choice, they would both probably break a lot of code. There is probably quite a lot of code that deducts p1==p2, however substituting p2 for p1 could lead to breakage due to different lengths.

This seems to me like a strong argument against fat pointers, seeing that problems arise from too much magic. While it could be made to work with the standard (since I guess doing any comparison between p1 and p2 could be UB considering p1 and p2 as pointing to different "objects" / one of them invalid), it's probably a path I would not want to go down.

Another, less theoretic and much more practical concern is this: How do you plan to make fat pointers compatible with existing software? If the normal pointer syntax is taken as-is and repurposed to represent fat pointers, those fat pointers will now be bigger than before. I suppose that a good chunk of existing projects would break, and would require significant porting work.

It is ratholes like this that WG14 seems to have mostly avoided rather successfully, and I value that. If you want more features, there is a different language with another philosophy, so no reason for posting negative sentiments (including subtle aggression like "hit by a bus" etc).

kazinator · on Sept 19, 2023

In Lisp terms, sizeof is a special operator.

It used to be compile-time only until C99, which allowed it to be applied to a variable-length array (VLA). Thus it potentially has run-time semantics, though still not a function. Address-of (&) has run-time semantics, since the address of an object is rarely constant, and isn't a function either.

unwind · on Sept 18, 2023

When using variable-length arrays (which I know are being phased out but which still do exist in the language) sizeof is indeed evaluated at runtime. It's weird.

kazinator · on Sept 19, 2023

Address-of (&) can be resolved at load time, making this possible at file scope:

  struct node {
    struct node *next, *prev;
  } circular_list = { &circular_list, &circular_list };

This requires no initialization code to run; linking resolves it.

Very long ago I made a GUI help system for a program, where all the documentation nodes were represented as structures with pointers for their navigation links: next section, previous section, up, children ...

I statically declared the entire content with declarations akin to the above circular_list.

The help GUI was just a window which displayed the node contents, and the buttons like next and prev just chased the pointers to another node which would replace the displayed content.

Worked like a charm.

Good luck giving that to internationalization people to translate. Haha!

unwind · on Sept 18, 2023

It's probably silly to start remote-lawyering with Mr Torvalds of all people, but that is obviously false, and (also obviously) he knows it.

If it acts exactly like a function, show me its prototype, and how you would call it through a function pointer. I'll be here, waiting while thumbing through my old K&R, or something ...

Also, when using variable-length arrays (which are being phased out, but they are in for instance C99) this works:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
       size_t arglens[argc];
       for (size_t i = 0; i < sizeof arglens / sizeof *arglens; ++i) {
         arglens[i] = strlen(argv[i]);
       }
       printf("Computed %zu lens:\n", sizeof arglens / sizeof *arglens);
       for (size_t i = 0; i < sizeof arglens / sizeof *arglens; ++i) {
         printf("%zu: '%s' is %zu\n", i, argv[i], arglens[i]);
       }
       return 0;
    }

Compile (as C99!) and run:

    $ gcc -o arglens -std=c99 arglens.c
    $ ./arglens foo bar baz fooofooofoo
    Computed 5 lens:
    0: './arglens' is 9
    1: 'foo' is 3
    2: 'bar' is 3
    3: 'baz' is 3
    4: 'fooofooofoo' is 11

Notice how sizeof is computing the size of a run-time sized array. Very much unlike a function.

Edit: fix indent for code formatting (hard tabs don't work).

leni536 · on Sept 17, 2023

I agree with his preference, but I think the reasoning should go like `sizeof` is an operator with possibly unclear operator precedence, while `return` is a statement.

flyingcircus3 · on Sept 17, 2023

sizeof still requires parentheses if it's a type and not a variable. To me it's like curly braces' optionality on conditional blocks with one line. If you aren't consistent, you're setting yourself up for heartache.

jstimpfle · on Sept 17, 2023

I think this is the same syntax as the one used for casting -- a "type expression" will only be recognized immediately after an opening paren (or at the start of a statement, in case of a declaration statement). So this is rather consistent.

flyingcircus3 · on Sept 17, 2023

Consistency would be using parentheses for all cases of sizeof, because people without 10 years of experience can pick it up.

jstimpfle · on Sept 17, 2023

If you think something is inconsistent you should ask why and find that what you thought inconsistent are simply 2 different things. The parentheses have nothing to do with sizeof, it's just how you write types (I'm repeating myself).

eddd-ddde · on Sept 17, 2023

This is not obscure at all, I have used it commonly when you have a pointer to an object an want to allocate the size its dereferenced object occupies.

sylware · on Sept 17, 2023

C is the less worse alternative, the less worse compromise.

C syntax is already way too complex.

My opinion: a real world wide standard and royalty-free average modern ISA (for instance 64bits RISC-V), then significant system components get a native assembly version, and then some very high level interpreters (javascript?python?ruby?lua?tintin?etc?) written themselves in assembly. If the pre-processor is not abused (aka they are not trying to code c++ with the assembler pre-processor), we should reach a state much less toxic than what we have today.

layer8 · on Sept 17, 2023

> How satisfied are you with the current state of software on a scale from 1 to -128?

This is brilliant. :D

belter · on Sept 17, 2023

On a scale from 1 to -128...I'd rate it a solid 'undefined'.

tetha · on Sept 17, 2023

I'll note your rating down as excellent, naturally.

dahfizz · on Sept 17, 2023

Kinda disappointed that all five questions just boiled down to the size of structs / ints in memory. There's way more interesting and arcane gotchas in C!

mtreis86 · on Sept 17, 2023

Can you elaborate a bit? I don't know enough C to know many gotchas besides the order of the characters in i++ vs ++i mattering.

skitter · on Sept 17, 2023

Something as simple as adding integers is tricky due to its implicitness. Due to promotion, the following code works on some platforms but is verboten on others:

    int16_t a = 20000;
    int16_t b = a + a;

But even ignoring size, it can bite you:

    unsigned int c = 1;
    signed int d = -2;
    if(c + d > 0)
      puts("1 + -2 > 0");

That code prints because of balancing: The signed value gets cast to an unsized int, which can't hold the value.

For not directly integer related semantics, creating an out-of-bounds pointer is UB, unless it points to one past the end of the object, in which case you can create the pointer but can't read from it. And if an object's lifetime has ended, pointers to it aren't valid anymore even if you don't even read from it:

    void *foo_1 = malloc(1);
    free(foo_1);
    void *foo_2 = malloc(1);
    printf("Checking for same allocatoin: %d", foo_1 == foo_2);

If you go by the standard, strictly speaking this is UB:

    #include <stdio.h>
    int main(void) {
        puts("Hello World);
    }

But don't worry, real compilers will refuse to compile this. Most don't follow the standard anyways, because they don't treat writing to a memory location as side effect.

A large source of gotchas in C are the standard libraries. For example here's a taste of the trouble its locales cause: https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...

smcameron · on Sept 17, 2023

That last link was enjoyable. The best way I know of to amputate locales is to use dlsym() to look up setlocale() and replace it with your own setlocale that calls the real setlocale with arguments LC_ALL, "C", regardless of what arguments are passed in.

readthenotes1 · on Sept 17, 2023

  puts("Hello World);

How does that even compile without the closing "?

P-Nuts · on Sept 17, 2023

The compiler isn’t strictly required by the standard to emit a syntax error when given an invalid program.

esrauch · on Sept 17, 2023

Every other one can be justified by the size of char/int are not fixed, but instead the standard defines a minimum size. Int is usually 32 bit but it's allowed to be 16 bit or anything larger. Char is usually 8 bit but can arbitrarily large if the platform "byte" is not 8 bits.

svilen_dobrev · on Sept 17, 2023

"So if I were to reinvent a programming language for the 21st century,

I would reinvent being responsible instead.

I would reinvent learning your tools;

I would reinvent being attentive to essential details and

being merciless to accidental complexity.

Unlike the languages that come and go with fashion, the things that matter do deserve constant reinvention."

(last paragraph of the book)

---

can't upvote/ engrave it enough. Amen.

nuc1e0n · on Sept 17, 2023

Very cool.

I think this says quite something about the modern human thought process in general. We place a lot of emphasis on knowing how abstract rules we've created should interact and not so much on actually testing things for ourselves.

Sometimes there are unknown factors that will influence results, such as undefined behaviour with C compilers in this case.

_gabe_ · on Sept 17, 2023

> Eventually I learned I had to rely on the standard instead of folklore

Isn’t this the opposite of what should have been done in the case of coding systems for a nuclear facility? I thought the whole point of the exercises was to show that C has lots of undefined behavior, so you should only rely on the behavior of the specific system you’re coding for. If the C standard doesn’t specify what happens when a signed integer overflows, that doesn’t mean you can just ignore it. You could crash the system, report an error, or find out what the targeted system is specified to do in the case of a signed integer overflow. Is there something else I’m missing here, or do the nuclear systems typically have several processors that may have varying behavior for this type of stuff?

dannymi · on Sept 19, 2023

>I thought the whole point of the exercises was to show that C has lots of undefined behavior, so you should only rely on the behavior of the specific system you’re coding for.

In my opinion, the whole point of the exercise was to show that as an engineer, you should have an engineering mindset. And that means collecting requirements, selecting parts, reading a lot of specifications, and participate in writing them.

C is an abstraction. If you program in C, you use that abstraction (so you use the standardized C abstract machine). Why would you program in C and NOT use that abstraction? If you want the latter, just use assembly. Especially in this application.

Otherwise, yes, you hold yourself to the C standard, and you use compilers that are certified to actually implement that standard.

Of course, there needs to be a lot (>50% of the total time) of testing, too. But not to figure out what the compiler does :P

> If the C standard doesn’t specify what happens when a signed integer overflows, that doesn’t mean you can just ignore it.

It means you have to write your programs in a way that signed overflow cannot happen.

> You could crash the system, report an error, or find out what the targeted system is specified to do in the case of a signed integer overflow.

Just make it fail (looong) before the signed integer overflow would happen (with proof).

>what the targeted system is specified to do in the case of a signed integer overflow

What does this mean? What is the "system" in this case?

If the "system" is the hardware, then no. In standardized C, this overflow never makes it down to the hardware. If it's the hardware and the compiler, then sure, but you aren't programming standard C then. I guess you could do that (why would you, though... compliance is gonna reject it).

> Is there something else I’m missing here, or do the nuclear systems typically have several processors that may have varying behavior for this type of stuff?

In practice, you have certified compilers that are certified to actually implement the c standard correctly.

>Is there something else I’m missing here, or do the nuclear systems typically have several processors that may have varying behavior for this type of stuff?

You usually want to be independent of the processor (and/or vendor) used, in order to be able to easily switch it out in case they get uppity/lazy. That's one of the main uses of C.

irundebian · on Sept 17, 2023

Regarding to the struct packing: That's why I love Ada, there are so called representation clauses which can define the size of struct-alike types called records. In C you have to rely on compiler options or features like __attribute__ which are not compliant to the ISO C standard. It looks like there is a ISO-C compliant version for that since C11 but it just doesn't look nice and like a first class citizen of the language.

https://docs.adacore.com/gnat_rm-docs/html/gnat_rm/gnat_rm/r...

rwmj · on Sept 17, 2023

I know he's trying (badly) to make some point about undefined behaviour and/or strict reading of the C standard, but his answers are simply wrong. The first case, for example, returns 8 [edit: the size of the structure] (not "I don't know") under any reasonable compiler, and isn't even UB. The second one returns 0, and is also not UB. etc.

Also if someone wrote this bit of obscurantism in any programming language it should be rejected hard when reviewing the code:

  return (((((i >= i) << i) >> i) <= i));

nanolith · on Sept 17, 2023

It's fine to bristle about underhanded evangelism (e.g. "Use Rust instead of C, because C is a foot cannon and Rust is perfect"), but that's definitely not the case here. This is a reasonable critique of the common C pitfalls.

I use C daily. The answer in each category is "I don't know" because, without the additional information provided by a specific ABI, specific target, and specific compiler, these are all undefined. No language is perfect. It's important to understand the limitations of a language to use it effectively. That is closer to the author's point. This is not a condemnation of C, but rather, an acknowledgement of C, warts and all.

The original point of C was to provide a portable low-level language that could capture expressions instead of writing everything in assembler, so that UNIX could be ported to platforms other than the PDP-11. It's also a reason why C still has uses not shared by newer languages, because these newer languages are more precise. This is a Good Thing, but unfortunately, developers who are unaware of this imprecision can make mistakes that aren't portable and may not behave as they assume. These problems appear, seemingly out of nowhere, when porting source code to a new compiler or a new platform. While the examples the author provides are over-the-top and easy for an experienced C developer to catch, there are more subtle examples, such as promotion and signed integer overflow, that are hard even for an experienced C developer to catch.

This is why I use static analysis, model checking, proof assistants, and even a modern compiler with built-in pedantic checking against the standard. This language has subtle corner cases that even someone with 25-30 years of experience can miss. Unit testing can miss these corner cases unless the right input values are chosen, and the right input values are not always so easy to guess. It's not a condemnation of C or an attempt to evangelize newer languages to point this out, but rather, an acknowledgement that mastery of any language means accepting both its strengths and limitations. C has both.

cuanim · on Sept 17, 2023

Hello, what would be your recommendations for someone who wants to take the next step from basics of C to contributing to FOSS repos (for ex: FFMPEG or VLC or mpv)? I have been working my way through [Modern C](https://inria.hal.science/hal-02383654/file/ModernC.pdf) but I can't seem to figure out how does one start with making(/contributing to) non trivial projects.

nanolith · on Sept 18, 2023

First, I'm no expert on this. While I do contribute patches to various OSS projects, I'm not an insider in any. I possess some old-school netiquette and I've gone through plenty of commercial projects, so there are enough similarities that allow me to submit patches and get them mainlined from time to time. In other words, please take my advice with an appropriate level of salt, as it may not reflect your experience.

Different projects have different cultures. Some are quite insular and difficult to break into. Others are quite easy going. For instance, I was able to send a patch through Meson's process (a fix to ensure that their CMake compatibility layer could build custom assembler files) in a matter of days. In projects like OpenBSD, in contrast, you really have to impress the core developers and become part of the community before you can make significant changes. Either way, don't get frustrated, and always remember that these projects are more often than not made up entirely of volunteers who don't owe you or anyone else anything. Be gracious and polite with all of your interactions, even when dealing with the more abrasive personalities. Don't take things personally. See every interaction as a learning opportunity.

If you are interested in a particular project, then I recommend joining the mailing lists and forums. Lurk for a while. Learn the organization and politics of the project, as these will differ wildly between projects. Spend time studying the project and offer help to users in the user mailing list. If they have an IRC channel, join it, and ask the developers on the channel how you can help. They will have specific recommendations based on your experience and specific areas that they need help with.

As you study the project, compare the code as-implemented with the documentation, and begin by offering patches to the documentation and manual pages where these differ. Documentation tends to go stale in many projects since much of the "fun" involves writing the code. With developer blessing, consider zooming in on test coverage, if the project values test automation, and find ways to improve it. This will give you a deeper understanding of how the software operates, and it will help you uncover errors (bugs) that require fixing. Many of these will be inconsequential or minor, but as you learn from the developers and community, and participate in tightening up the code base, you will gain practical knowledge, as well as visibility in the community and respect. Furthermore, learning to maintain and fix issues in a mature project is an excellent way to learn how to contribute feature changes that are accepted, since it will challenge you to learn a lot about the code base, project organization, and community.

Pay close attention to how new features are proposed and developed. One thing that is common in many projects is that feature requests that are ambitious are often rejected, unless the person making the request is a known quantity who is willing to put in the work to see the feature through. Minor error reports and feature requests that are laser focused and that come with a well thought-out patch set implementing the feature or fix tend to be much better received. Most projects are willing to take in code that feels like it belongs (follows style, testing, and similar guidelines) and that is well developed. Since you are starting out, don't try to be too ambitious. You'll get a feel over time regarding what feature requests can be accepted, especially if you provide a patch in the request.

All of this being said, in the age of github, where anyone can start a project to scratch a personal itch, also consider creating a minor library that implements something that you care about. Keep this library small and laser focused. Go through the process of getting it to work on Linux, Windows, MacOS, FreeBSD, OpenBSD, NetBSD, etc. Learn about cross-platform build systems, writing man pages, writing documentation, and creating official releases. Try to find friends and colleagues willing to test out the library and provide you with feedback. While this is an open ended suggestion, you can learn a lot on your own about the sorts of problems that you need to solve as an OSS developer by trying your hand at creating and maintaining a small library on your own.

The important thing is to commit to realistic goals and be tenacious. There is a lot to learn, and at times, you will get frustrated. Don't give up. Accept reasonable criticism, but learn to ignore the trolls. Always be open to learning new things, and accept that people get VERY PASSIONATE about this stuff. Heated exchanges and insults are unfortunately quite common in this space. Try to take the high road by being polite, and don't get pulled into a flame war. Certain maintainers seem to like flaming newbies for fun. It's best to avoid those communities entirely, but if you can't, then invest in a good pair of asbestos coveralls.

Good luck, and I sincerely hope that you get involved! The OSS community needs developers, and we all start where you are, more or less.

dzaima · on Sept 17, 2023

"are simply wrong" and "under any reasonable compiler" don't mix - yes, for arm(32/64)/x86(-32 or -64) there are common values, but that doesn't make everything else not exist. OP doesn't say that the first two are UB, just that you don't know what they will be without additional information.

For example, here's a GCC-supported target that's available on Compiler Explorer, where the result of question 1 is "3", and question 2 is "1": https://godbolt.org/z/MjT8YhrTc

rwmj · on Sept 17, 2023

Sure, that's an AVR platform where structs are packed and ints are 16 bits, and it returns the size of the struct, in other words, the correct answer. And that's my point, it returns the size of the struct, it's not "don't know", it's not doing anything weird, so why have the question?

dzaima · on Sept 17, 2023

But, thus, upon seeing the program, without additional information you cannot know whether it'll give 8 or 3 (or who knows what else), and thus "I don't know." is very much an appropriate answer. Of course, it might be more clear to say "it depends" or "there isn't enough information to answer", but I'd say "don't know" is nevertheless a correct answer. Definitely more correct than "8" at least.

"it's not doing anything weird" only to people who do already know the C standard inside out. But there is a significant amount of people who might not know everything C (or might know that there are some weird things about types, but still assume 'int' will be at least 4 bytes or something).

It's of course not a question of much practical impact (for anyone not working in embedded at least) but it's nevertheless one that can be at least interesting to some.

rwmj · on Sept 17, 2023

The only time you're writing sizeof(x) is when you want to know the size of 'x', eg. to store it somewhere else or zero out the memory or something of that sort. And it gives the right answer, great! It doesn't ever do something that's undefined or strange, and it's not an obscure part of the C language.

gjm11 · on Sept 17, 2023

It seems like you're responding to something that isn't there -- an accusation along the lines of "... and therefore C is a bad language" or "... and therefore sizeof is poorly designed" or "... and it's bad that the correct answer is 'I don't know'".

The author isn't, so far as I can tell, making any such claim.

He's claiming only this: many people who program in C (or C++, which in this particular respect is the same) think they know that sizeof(...) will be 8, or think they know that sizeof(...) will be 5, and all those people are wrong, because it could be either of those things or various other things too, and there are contexts in which you need to be aware that the assumptions you're inclined to make around this sort of code are wrong.

All of which is straightforwardly correct, so far as I can tell.

As the author says, the question is really about struct padding more than it's about sizeof. It most likely doesn't matter that much whether or not someone knows that sizeof(...) might not be 8 in this situation. But it might matter if, e.g., they read the docs for some binary file format and see that it looks like

  offset type   name
  0000   int    block_size
  0004   char   record_type
  0008   int    user_id
  000C   int    unit_id
  0010   double radiation_level
  0018   int    timestamp

and think "aha, I'll make this neater" and write

  struct protocol_block {
    int block_size;
    char record_type;
    int user_id;
    int unit_id;
    double radiation_level
    int timestamp;
  };

and

  fread(...);
  struct protocol_block * block = buffer;
  int uid = block->user_id;

without being aware that they are making assumptions about what their compiler does with structs (and also about endianness, and other things).

Buttons840 · on Sept 17, 2023

You originally said:

> The first case, for example, returns 8 under any reasonable compiler

That is wrong. This part of C is apparently obscure enough that people make false assumptions like your own I quoted.

After you made that claim, someone provided a case where GCC returns something other than 8 and you edited your statement. Again, your statement was incorrect until you edited it, and so there must be some obscurity involved.

dzaima · on Sept 17, 2023

sizeof isn't an obscure part of the language, yes, but the specific behavior here might still be unexpected for a decent number of people, who might, say, think they can always use "sizeof(int)" and "4" interchangeably to shorten code (which could very well be true for all platforms they will ever care about, but nevertheless is not a guaranteed property by C by itself).

AnimalMuppet · on Sept 17, 2023

I am old. I remember when sizeof(int) was usually 2. And I may live long enough to see sizeof(int) typically be 8.

The 16-to-32 bit transition broke a lot of code that assumed sizeof(int) was 2. The next transition may do the same. (Or, we may keep "int is 32 bits" forever, and use long for 64. Who knows? I don't. You probably don't, either, so don't assume that sizeof(int) = 4.)

dzaima · on Sept 17, 2023

The present behavior on normal 32-bit and 64-bit platforms is that, on both, 'int' is 4 bytes, and 'long long' is 8 bytes; and on 64-bit, whether 'long' is 4 or 8 bytes depends on the ABI/target OS.

I'd imagine it's quite likely that 'int' stays 4 bytes even on hypothetical 128-bit CPUs - there's not much reason to change it, as on 64-bit it's already less than the CPU width, and thus is pretty much arbitrary even today. But yes, anything that wants a 4-byte/32-bit integer should just use <stdint.h>'s int32_t.

lelanthran · on Sept 17, 2023

> But, thus, upon seeing the program, without additional information you cannot know whether it'll give 8 or 3 (or who knows what else), and thus "I don't know." is very much an appropriate answer.

May it is appropriate, but its underhanded.

I mean if someone gave the single line of Go code:

     s := x + y

And then, when you said "that adds two number", the someone replied "Hah! Gotcha! The answer is 'I don't know'"

It's puerile.

stephen_g · on Sept 17, 2023

I don’t get what you’re saying - what’s underhanded about saying structure packing can have unexpected results across platforms, architectures, compilers and even compiler flags?

The point is that you could only ever say for sure what the answer will be if all that is exactly specified, otherwise your assumption could be wrong.

mannykannot · on Sept 17, 2023

Many errors have arisen from someone tacitly assuming what "any reasonable compiler" would do. These questions barely scratch the surface of the possibilities.

Furthermore, just because your code works with Reasonable Compiler 1, it does not mean it will work with Reasonable Compiler 2. I think I have seen enough C compilers that, for each of these questions, you could pick a pair giving different answers.

AnimalMuppet · on Sept 17, 2023

Worse, Reasonable Compiler 1 may give different results, depending on what flags you pass it at compile time.

krychu · on Sept 17, 2023

Your code dictates the compiler's behavior. If you grant the compiler flexibility, it's unreasonable to complain about unexpected results.

mannykannot · on Sept 18, 2023

I don't think it is a complaint; it's an observation in support of a point of view.

P-Nuts · on Sept 17, 2023

I have debugged a problem that was due to a left shift of 1ULL by 64 (in a routine that tried to set the lowest n bits by shifting 1 left and subtracting by 1). I needed to read the Intel docs to find out what actually happens to work out how badly we’d corrupted the customer data and how to write a fix to unmangle it.

What Intel processors do is ignore the upper bits of the shift quantity, so it’s effectively (1ULL << (x % 64)) - 1, therefore the function that should have set all ones for x=64 actually set all zeros.

nuc1e0n · on Sept 17, 2023

I understand where you're coming from, but have you tried them? I think there's a good chance the first question will return 16 on 64 bit systems.

tyingq · on Sept 17, 2023

Returns 3 on Borland Turbo C, which I think most people would say is a reasonable compiler.

matja · on Sept 17, 2023

Depends on the ABI which the compiler targets. The vast majority of 64-bit OS's are ILP32, LLP64 or LP64, where ints are 32-bit - but to the paper's point, you don't know until you compile it.

richardjam73 · on Sept 17, 2023

I got the first example to give 4 and the second one returns 1. OpenWacom DOS 16Bit. Under 32bit it gives 8 AND 0.

amalcon · on Sept 17, 2023

That's an illustrative example. You're not meant to actually use such code, but the relevant insight (that code with binary shifts is rarely portable with types of unspecified size) is accurate.

virtuous_sloth · on Sept 17, 2023

You can't say that with certainty without specifying the compiler, it's version, and the architecture.

minroot · on Sept 17, 2023

It is our fault that we don't comprehend the genius of great men

Borg3 · on Sept 17, 2023

Q1:C Q2:A Q3:C Q4:D Q5:A

And when I saw answers I smiled, Right! because he didnt specified platform.

Yeah C can be devious, but I still love it! Its my favorite language. And Ruby for scripting.

P-Nuts · on Sept 17, 2023

I knew they were all UB except the integer promotion one. Ah well, at least I have -Wconversion turned on to save me from myself on that one.

FpUser · on Sept 17, 2023

I do not "know (tm)" any language but I could and do use many.

lofties · on Sept 17, 2023

notquiteright · on Sept 17, 2023

I knew these were all ”I don’t know” edit: but yes, deceptive phrasing by the quiz author. If you don’t know that, then yeah, you’re probably not a great C programmer. Adjust your mental model: C is meant to be useful on a very wide variety of platforms, including ones that don’t follow the conventions (ASCII character table, size of byte, alignment requirements, etc.) of the most common platforms.

Even including knowledge of what is IB and UB, C is still simpler than most languages in common use these days.

wredue · on Sept 17, 2023

If “the standard doesn’t define it” is a reason for “I don’t knows then all of rust is “I don’t know”. Seems silly.

circuit10 · on Sept 17, 2023

Rust still behaves in a consistent way, anything undefined by the C standard can and often will change between platforms and compilers and optimisation levels

throwawaymaths · on Sept 17, 2023

Undefined per se is not a bad thing. It means "compiler can make choices based on certain assumptions in the name of performance". The problem with c is that you must have a comprehensive dictionary in your brain with tons of corner cases to know what is or is not undefined in any given compiler setting.

If C could have a consistent set of rules, and/or easily tag something as undefined a la "unsafe" or have some sort of visual reminder signal (like using function name prefixes or suffixes) that would go a long way to making it better.

MaxBarraclough · on Sept 17, 2023

> The problem with c is that you must have a comprehensive dictionary in your brain with tons of corner cases to know what is or is not undefined in any given compiler setting.

I'm reminded of a quote: [0]

> Ada has made you lazy and careless.

> You can write programs in C that are just as safe by the simple application of super-human diligence.

To your point about performance:

> compiler can make choices based on certain assumptions in the name of performance

I don't think the performance argument really applies with modern optimising compilers. The (too often overlooked) Ada language is safer than C, but has about the same performance, provided you avoid the features with runtime overhead. Similarly I don't think the performance of Rust, and in particular its Safe Rust subset, suffers much for its lack of undefined behaviour.

It's true that, say, Java, doesn't perform as well as C even today, but Java requires a slew of runtime checks and is hostile to micro-optimised memory-management. In Ada, things like array bounds checks can be enabled for debug builds but disabled for production builds, which isn't easy to do in C.

> If C could have a consistent set of rules, and/or easily tag something as undefined a la "unsafe" or have some sort of visual reminder signal (like using function name prefixes or suffixes)

This essentially can't be done. Even the MISRA C coding style, which aims to help with this kind of thing, can't completely guarantee to eliminate undefined behaviour from C codebases. To illustrate the challenge of 'containing the risk' with the C language, it's undefined behaviour to do this:

    int i; int j = i;

Fortunately other languages do a much better job at offering truly safe subsets (Rust and D for instance).

[0] https://people.cs.kuleuven.be/~dirk.craeynest/quotes.html

krychu · on Sept 17, 2023

> The problem with c is that you must have a comprehensive dictionary in your brain with tons of corner cases to know what is or is not undefined in any given compiler setting.

The cases of undefined behavior in the C standard are independent of compiler settings or options.

> If C could have a consistent set of rules …

The C language has a well-defined standard, but the presence of undefined behavior is a deliberate aspect of that standard.

throwawaymaths · on Sept 17, 2023

Deliberateness is not the same as consistency.

You can have well defined standards, that are wildly inconsistent. For example, in python, in file A:

    def get(dict, key):
      return dict[key]

In file B:

    def get(key, dict):
      return dict[key]

Imagine working in this codebase!

circuit10 · on Sept 17, 2023

I’m not saying it’s a bad thing, but it is something to be aware of

marcosdumay · on Sept 17, 2023

And program runs, and different times the same process execute them. Those two are what separate IB from UB on the original interpretation, from the last century.

But nowadays UB is something much more complex and dangerous than "I can't ever know the results of this".

BruceEel · on Sept 17, 2023

Right. Personally, I think those could have been better phrased as "I can't tell". As for C's simplicity, 100% what you said.

MaxBarraclough · on Sept 17, 2023

> Even including knowledge of what is IB and UB, C is still simpler than most languages in common use these days.

That depends what you mean by simple. C is a fairly small language, but it has far more than its share of footguns.

In terms of compiler engineering, sure, C is comparatively simple.

tester756 · on Sept 17, 2023

>C is still simpler than most languages in common use these days.

Having less keywords and implementing less fancier concepts

doesn't make my day2day life easier, actually it usually makes my day harder.

movedx · on Sept 17, 2023

Would you write a web app in it? (Just out of interest. I'm not trying to be an ass.)

notquiteright · on Sept 17, 2023

No, wrong level of abstraction. Why do manual memory management in a realm where it doesn’t really matter? In addition to the excessive amount of work (also why I wouldn’t choose Rust for the job), you’re opening yourself up to a whole class of errors for no good reason.

addaon · on Sept 17, 2023

I've written plenty of C web apps back in the old cgi-bin days, for no particular reason, and haven't found the memory management an issue. cgi-bin is a great example of a lifetime-managed execution environment where you can just use the heap as one big arena allocator, with malloc() having unchanged semantics (or, if you're feeling fancy, going to a bump allocator), and free() being no-op'd (or, more realistically, just omitted when writing code). Yeah, your high water mark memory usage might be a bit higher than a more managed approach, but the OS is a perfect garbage collector, and the fastest possible garbage collector, for short-lived executables.

The bigger issue is the mediocre string processing libraries that are so common to C.

aidenn0 · on Sept 17, 2023

I wrote my first web app in C (1999ish). Emitting HTML was way easier than any GUI libraries I had encountered up to that point.

I probably wouldn't write one in C today, but webapps are one of the easier string-heavy things to write in C[1]; you can get away with using a hierarchical memory manager and just free everything after the request is complete.

1: Having to do a lot of string manipulation is usually a good sign "you shouldn't be using C" There's a reason awk was written in 1977.

cozzyd · on Sept 18, 2023

if it runs on a microcontroller, then yes