Parameterized types in C using the new tag compatibility rule (nullprogram.com)

107 points by ingve 13 hours ago | 45 comments

fuhsnn 9 hours ago [-]

The recent #def #enddef proposal[1] would eliminate the need for backslashes to define readable macros, making this pattern much more pleasant. Fingers crossed for its inclusion in C2Y!

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3531.txt

cb321 8 hours ago [-]

While long-defs might be nice, even back in ANSI C89 you can get rid of the backslash pattern (or the need to run cc -E output through GNU indent or whatever) by "flipping the script" and defining whole files "parameterized" by their macro environment, like https://github.com/c-blake/bst or https://github.com/glouw/ctl/

Add a namespacing macro and you have a whole generics system, unlike that in TFA.

So, it might add more value to have the C std add an `#include "file.c" name1=val1 name2=val2` preprocessor syntax where name1, name2 would be on a "stack" and be popped after processing the file. This would let you do types/functions/whatever "generic modules" with manual instantiation which kind of fits with C (manual management of memory, bounds checking, etc.) but preprocessor-assisted "macro scoping" for nested generics. Perhaps an idea to play with in your slimcc fork?

glouwbug 2 hours ago [-]

I've been thinking of maybe doing CTL2 with this. Maybe if #def makes it in.

cb321 2 hours ago [-]

I think the #include extension could make vec_vec / vec_list / lst_str type nesting more natural/maybe more general, but maybe just my opinion. :-)

I guess ctags-type tools would need updating for the new possible definition location. Mostly someone needs to decide on a separation syntax for stuff like `name1(..)=expansion1 name2(..)=expansion2` for "in-line" cases. Compiler programs have had `cc -Dname(..)=expansion` or equivalents since the dawn of the language, but they actually get the OS/argv idea of separation from whatever CL args or Windows APIs or etc.

Anyway, it might make sense to first get experience with a slimcc/tinycc/gcc/clang cpp++ extension. ;-) Personally, these days I mostly just use Nim as a better C.

hyperbolablabla 5 hours ago [-]

I really don't think the backslashes are that annoying? Seems unnecessary to complicate the spec with stuff like this.

Arnavion 2 hours ago [-]

Neat similarity to Zig's approach to generic types. The generic type is defined as a type constructor, a function that returns a type. Every instantiation of that generic type is an invocation of that function. So the generic growable list type is `fn ArrayList(comptime T: type) type` and a function that takes two lists of i32 and returns a third is `fn foo(a: ArrayList(i32), b: ArrayList(i32)) ArrayList(i32)`

JonChesterfield 6 hours ago [-]

Not personally interested in this hack, but https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf means struct foo {} defined multiple times with the same fields in the same TU now refers to the same thing instead of to UB and that is a good bugfix.

IAmLiterallyAB 4 hours ago [-]

If you're reaching for that hack, just use C++? You don't have to go all in on C++-isms, you can always write C-style C++ and only use the features you need.

pton_xd 2 hours ago [-]

Yeah as someone who writes C in C++, everytime I see posts bending over backwards trying to fit parameterized types into C I just cringe a little. I understand the appeal of sticking to pure C, but... why do that to yourself? Come on over, we've got lambdas, and operator overloading for those special circumstances... the water's fine!

pjmlp 2 hours ago [-]

Some people will do as much as they can to hurt themselves, only to avoid using C++.

Note that the newer versions of C are basically a "C++ without classes" kind of thing.

glouwbug 2 hours ago [-]

I think the main appeal is subset lock-down and compile times. ~5000 lines in C gets me sub second iteration times, while ~5000 lines in C++ hits the 10 second mark. Including both iostream and format in C++ gets any project up into the ~1.5 second mark which kills my iteration interests.

Second to that I'd say the appeal is just watching something you've known for a long time grow slowly and steadily.

kilpikaarna 2 hours ago [-]

This, and the two pages of incomprehensible compiler spam you get when you make a typo in C++.

uecker 43 minutes ago [-]

I see it the other way round. People hurt themselves by using C++. C++ fans will never understand it, but if you can solve your problem in a much simpler way, this is far better.

waynecochran 4 hours ago [-]

Not always a viable option -- especially for embedded and systems programming.

sim7c00 2 hours ago [-]

you are so right.. though historically i would have disagreed just by being triggered.

templates are the main thing c++ has over c. it's trivial to circumvent or escape the things you don't 'like' about c++, like new and delete (personal obstacle), and write good, nice, modern c++ with templates.

C generic can help but ultimately, in my opinion, the need for templating is a good one to go from C to C++.

unwind 10 hours ago [-]

I think this is an interesting change. Even though I (as someone who has loved C for 30+ years and uses it daily in a professional capacity) don't immediately see a lot of use cases, I'm sure they can be found, as the author demonstrates. Cool, and a good post!

glouwbug 3 hours ago [-]

Combined with C23's auto (see vec_for) you can technically backport the entirety of C++'s STL (of course with skeeto's limitation in his last paragraph in mind). gcc -std=c23. It is a _very_ useful feature for even the mundane, like resizable arrays:

  #include <stdlib.h>
  #include <stdio.h>
  
  #define vec(T) struct { T* val; int size; int cap; }
  
  #define vec_push(self, x) {                                                 \
      if((self).size == (self).cap) {                                         \
          (self).cap = (self).cap == 0 ? 1 : 2 * (self).cap;                  \
          (self).val = realloc((self).val, sizeof(*(self).val) * (self).cap); \
      }                                                                       \
      (self).val[(self).size++] = x;                                          \
  }
  
  #define vec_for(self, at, ...)             \
      for(int i = 0; i < (self).size; i++) { \
          auto at = &(self).val[i];          \
          __VA_ARGS__                        \
      }
  
  typedef vec(char) string;
  
  void string_push(string* self, char* chars)
  {
      if(self->size > 0)
      {
          self->size -= 1;
      }
      while(*chars)
      {
          vec_push(*self, *chars++);
      }
      vec_push(*self, '\0');
  }
  
  int main()
  {
      vec(int) a = {};
      vec_push(a, 1);
      vec_push(a, 2);
      vec_push(a, 3);
      vec_for(a, at, {
          printf("%d\n", *at);
      });
      vec(double) b = {};
      vec_push(b, 1.0);
      vec_push(b, 2.0);
      vec_push(b, 3.0);
      vec_for(b, at, {
          printf("%f\n", *at);
      });
      string c = {};
      string_push(&c, "this is a test");
      string_push(&c, " ");
      string_push(&c, "for c23");
      printf("%s\n", c.val);
  }

uecker 3 hours ago [-]

Here is my experimental library for generic types with some godbolt links to try: https://github.com/uecker/noplate

rwmj 9 hours ago [-]

Slightly off-topic, why is he using ptrdiff_t (instead of size_t) for the cap & len types?

r1chardnl 9 hours ago [-]

From one of his other blogposts. "Guidelines for computing sizes and subscripts"

  Never mix unsigned and signed operands. Prefer signed. If you need to convert an operand, see (2).

https://nullprogram.com/blog/2024/05/24/

https://www.youtube.com/watch?v=wvtFGa6XJDU

poly2it 8 hours ago [-]

I still don't understand how these arguments make sense for new code. Naturally, sizes should be unsigned because they represent values which cannot be negative. If you do pointer/size arithmetic, the only solution to avoid overflows is to overflow-check and range-check before computation.

You cannot even check the signedness of a signed size to detect an overflow, because signed overflow is undefined!

The remaining argument from what I can tell is that comparisons between signed and unsigned sizes are bug-prone. There is however, a dedicated warning to resolve this instantly.

It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.

Given this, I can't understand the justification. I'm currently using unsigned sizes. If you have anything contradicting, please comment :^)
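To make the "check before computation" point concrete, here is a minimal sketch of pre-checked addition for both unsigned and signed sizes. The helper names `size_add_checked` and `ssize_add_checked` are made up for illustration and do not come from the article or any library. In both cases the limit test happens before the addition, so neither the unsigned wraparound nor the undefined signed overflow is ever evaluated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two unsigned sizes, refusing to wrap around. */
bool size_add_checked(size_t a, size_t b, size_t *out)
{
    if (b > SIZE_MAX - a) {          /* a + b would exceed SIZE_MAX */
        return false;
    }
    *out = a + b;
    return true;
}

/* Hypothetical helper: the same idea for signed sizes. Negative inputs are
   rejected, and the range test runs before the addition, so the overflowing
   expression is never evaluated. */
bool ssize_add_checked(ptrdiff_t a, ptrdiff_t b, ptrdiff_t *out)
{
    if (a < 0 || b < 0 || b > PTRDIFF_MAX - a) {
        return false;
    }
    *out = a + b;
    return true;
}
```

Either signedness needs the same shape of check; the difference is only in what happens if the check is forgotten (silent wraparound versus undefined behavior that a sanitizer can trap).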

sparkie 5 hours ago [-]

C offers a different solution to the problem in Annex K of the standard. It provides a type `rsize_t`, which like `size_t` is unsigned, and has the same bit width, but where `RSIZE_MAX` is recommended to be `SIZE_MAX >> 1` or smaller. You perform bounds checking as `<= RSIZE_MAX` to ensure that a value used for indexing is not in the range that would be considered negative if converted to a signed integer. A negative value provided where `rsize_t` is expected would fail the check `<= RSIZE_MAX`.

IMO, this is a better approach than using signed types for indexing, but AFAIK, it's not included in GCC/glibc or gnulib. It's an optional extension and you're supposed to define `__STDC_WANT_LIB_EXT1__` to use it.

I don't know if any compiler actually supports it. It came from Microsoft and was submitted for standardization, but ISO made some changes from Microsoft's own implementation.

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1173.pdf#p...

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1225.pdf
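For readers without an Annex K implementation, the idea can be sketched by hand. This is only an illustration under assumptions: `DEMO_RSIZE_MAX` and `demo_index_ok` are hypothetical names standing in for `RSIZE_MAX` and a bounds check, using the `SIZE_MAX >> 1` bound recommended above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Annex K's RSIZE_MAX, using the recommended
   upper bound. The real macro only exists where Annex K is implemented. */
#define DEMO_RSIZE_MAX (SIZE_MAX >> 1)

/* Hypothetical helper: accept an index only if the length itself is in the
   "restricted" range and the index is below it. A negative value converted
   to size_t lands above DEMO_RSIZE_MAX and is rejected. */
bool demo_index_ok(size_t index, size_t len)
{
    return len <= DEMO_RSIZE_MAX && index < len;
}
```

A call such as `demo_index_ok(0, (size_t)-5)` returns false, because the converted length falls in the upper half of the `size_t` range.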

poly2it 43 minutes ago [-]

This is an interesting middle ground. As ncruces pointed out in a sibling comment, the sign bit in a pointer cannot be set without contradicting the ptrdiff_t type. That makes this seem like a reasonable approach to storing sizes.

uecker 38 minutes ago [-]

"Naturally, sizes should be unsigned because they represent values which cannot be negative."

Unsigned types in C have modular arithmetic, I think they should be used exclusively when this is needed, or maybe if you absolutely need the full range.
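A small sketch of what modular arithmetic means for a size: subtracting one from an empty length. The function names are hypothetical and only serve the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned: 0 - 1 wraps around to SIZE_MAX, which is well-defined
   modular arithmetic but rarely what an index computation wants. */
size_t last_index_unsigned(size_t len)
{
    return len - 1;
}

/* Signed: 0 - 1 is simply -1, which the caller can detect with `< 0`. */
ptrdiff_t last_index_signed(ptrdiff_t len)
{
    return len - 1;
}
```

With an empty container, the unsigned version yields a huge valid-looking index, while the signed version yields a value that is visibly out of range.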

sim7c00 7 hours ago [-]

I dont know either.

int somearray[10];

new_ptr = somearray + signed_value;

element = somearray[signedvalue];

this seems almost criminal to how my brain does logic/C code.

The only thing i could think of is this:

somearray+=11; somearray[-1] // index set to somearray[10] ??

if i'd see my CPU execute that i'd want it to please stop. I'd want my compiler to shout at me like a little child, and be mean until i do better.

-Wall -Wextra -Wpedantic <-- i think that should flag any of these weird practices.

As you stated tho, i'd be keen to learn why i am wrong!

windward 4 hours ago [-]

In the implementation of something like a deque or merge sort, you could have a variable that represents offsets from pointers but which could sensibly be negative. C developers culturally aren't as particular about theoretical correctness of types as developers in some other languages - there's a lot of implicit casting being used - so you'll typically see an `int` used for this. If you do wish to bring some rigidity to your type system, you may argue that this value is distinct from a general integer which could be used for any arithmetic and definitely not just a pointer. So it should be a signed pointer difference.

Arrays aren't the best example, since they are inherently about linear, scalar offsets, but you might see a negative offset from the start of a (decayed) array in the implementation of an allocator with clobber canaries before and after the data.
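As a rough illustration of a sensibly negative offset, here is a sketch that indexes on both sides of a pointer into the middle of an array. The function name `window_sum` is hypothetical. Negative subscripts on a pointer are well-defined as long as the resulting element lies inside the same array.

```c
#include <stddef.h>

/* Hypothetical helper: sum the `radius` elements on each side of *center,
   plus *center itself. The caller must guarantee that center[-radius]
   through center[radius] all lie inside one array. */
int window_sum(const int *center, ptrdiff_t radius)
{
    int sum = 0;
    for (ptrdiff_t i = -radius; i <= radius; i++) {
        sum += center[i];            /* i is negative for the left half */
    }
    return sum;
}
```

Writing the same loop with an unsigned index would require a separate base pointer and an offset bias, which is the kind of bookkeeping a signed `ptrdiff_t` avoids.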

mandarax8 1 hours ago [-]

Any kind of relative/offset pointers require negative pointer arithmetic. https://www.gingerbill.org/article/2020/05/17/relative-point...

poly2it 55 minutes ago [-]

I don't think you can make such a broad statement and be correct in all cases. Negative pointer arithmetic is not by itself a reason to use signed types, except if you are:

1. Certain your added value is negative.

2. Checking for underflows after computation, which you shouldn't.

The article was interesting.

windward 4 hours ago [-]

Pointer arithmetic that could overflow would probably involve a heap and therefore be less likely to require a relative, negative offset. Just use the addresses and errors you get from allocation.

poly2it 2 hours ago [-]

Yes, but there are definitely cases where this doesn't apply, for example when deriving an offset from a user pointer. As such this is not a universal solution.

ncruces 7 hours ago [-]

> It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.

Why?

By the definition of ptrdiff_t, ISTM the size of any object allocated by malloc cannot be out of bounds of ptrdiff_t, so I'm not sure how can you have a useful size_t that uses the sign bit?
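One way to rely on this bound without trusting the allocator is to enforce it in a wrapper. The following is a sketch under assumptions: `alloc_array` is a hypothetical helper, not taken from the article, that takes signed arguments and refuses any request whose total byte count would exceed `PTRDIFF_MAX`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `count` zeroed elements of `size` bytes
   each. Returns NULL for negative inputs or when count * size would exceed
   PTRDIFF_MAX. The division avoids evaluating an overflowing product. */
void *alloc_array(ptrdiff_t count, ptrdiff_t size)
{
    if (count < 0 || size <= 0 || count > PTRDIFF_MAX / size) {
        return NULL;
    }
    return calloc((size_t)count, (size_t)size);
}
```

Every object obtained through such a wrapper has a size that fits in `ptrdiff_t`, so lengths and pointer differences within it can be stored in a signed type without loss.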

foldr 7 hours ago [-]

Stroustrup believes that signed should be preferred to unsigned even for values that can’t be less than zero: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...

poly2it 2 hours ago [-]

I've of course read his argument before, and I think it might be more applicable to C++. I exclusively program in C, and in that regard, the relevant aspects as far as I can tell wouldn't be clearly in favour of a signed type. I also think his discussion on iterator signedness mixes issues with improper bounds checking and attributes it to the size type signedness. What remains I cannot see justify using a signed type other than "just because". I'm not sure it's applicable to C.

uecker 40 minutes ago [-]

I also prefer signed types in C for sizes and indices. You can screen for overflow bugs easily using UBSan (or use it to prevent exploitation).

rurban 6 hours ago [-]

Skeeto and Stroustrup are a bit confused about valid index types. They prefer signed, which will lead to overflows on negative values, but have the advantage of using only half of the valid ranges, so there's more heap for the rest. Very confused

o11c 3 hours ago [-]

Are we getting a non-broken `_Generic` yet? Because that's the thing that made me give up with disgust the last project I tried to write in C. Manually having to do `extern template` a few times is nothing in comparison.

uecker 37 minutes ago [-]

What is a non-broken `_Generic`?

Surac 9 hours ago [-]

i fear this will make sloppy code compile OK more often.

poly2it 8 hours ago [-]

Dear God I hope nobody is committing unreviewed LLM output in C codebases.

pests 4 hours ago [-]

No worries, the LLM commits it for you.

pjmlp 2 hours ago [-]

Eventually they will generate executables directly.

ioasuncvinvaer 9 hours ago [-]

Can you give an example?

tialaramex 9 hours ago [-]

It seems as though this makes it impossible to do the new-type paradigm in C23 ? If Goose and Beaver differ only in their name, C now thinks they're the same type so too bad we can tell a Beaver to fly even though we deliberately required a Goose ?

yorwba 8 hours ago [-]

"Tag compatibility" means that the name has to be the same. The issue the proposal is trying to address is that "struct Goose { float weight; }" and "struct Goose { float weight; }" are different types if declared in different locations of the same translation unit, but the same if declared in different translation units. With tag compatibility, they would always be treated as being the same.

"struct Goose { float weight; }" and "struct Beaver { float weight; }" would remain incompatible, as would "struct { float weight; }" and "struct { float weight; }" (since they're declared without tags.)

tialaramex 8 hours ago [-]

Ah, thanks, that makes sense.
