T (01-14-20, 07:22 PM)
Hi All,

Had this conversation with someone over on the
Raku mailing list:

>> By the way, "C String" REQUIRES a nul at the end:
>> an error in the NativeCall documentation.


> No, it does not. And even if it did, it should better go to the C, not Raku, documentation


Would someone correct me if a C String actually does
not require a nul to terminate itself?

If I am correct, would someone give me the chapter and
verse in the ISO that states this. (The guy I am
conversing with will not take a third-party definition.)
Malcolm McLean (01-14-20, 07:32 PM)
On Tuesday, 14 January 2020 17:22:24 UTC, T wrote:
> Hi All,
> Had this conversation with someone over on the
> Raku mailing list:
> Would someone correct me if a C String actually does
> not require a nul to terminate itself?
> If I am correct, would someone give me the chapter and
> verse in the ISO that states this. (The guy I am
> conversing with will not take a third-party definition.)

Generally when we use the term "C string" we mean a nul-terminated string
with 8 bit characters, or coded in UTF-8.
However the C language allows you to construct strings in memory which are
not nul terminated. These won't work as expected with printf() or any
of the other C string handling functions. However you can write alternatives
to these functions that operate on counted strings, or whatever
alternative you have chosen.
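
A minimal sketch of such alternatives, assuming a hypothetical `struct counted` that carries the length next to the pointer. The names `counted_equal` and `counted_write` are invented for illustration and are not part of any library. Neither function ever looks for a terminating nul.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counted-string type: the length travels with the pointer,
   so no terminating nul is needed (or looked for). */
struct counted {
    const char *data;
    size_t len;
};

/* Equality for counted strings; memcmp compares exactly len bytes. */
int counted_equal(struct counted a, struct counted b)
{
    assert(a.data != NULL || a.len == 0);
    assert(b.data != NULL || b.len == 0);
    return a.len == b.len && (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}

/* Write exactly len bytes to the stream; returns the number of bytes written. */
size_t counted_write(struct counted s, FILE *out)
{
    assert(s.data != NULL || s.len == 0);
    if (s.len == 0)
        return 0;
    return fwrite(s.data, 1, s.len, out);
}
```

Passing `s.data` to `printf("%s", ...)` or `strlen()` instead would read past the end of the data until a zero byte happened to turn up.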
T (01-14-20, 07:43 PM)
On 2020-01-14 09:32, Malcolm McLean wrote:
> On Tuesday, 14 January 2020 17:22:24 UTC, T wrote:
> Generally when we use the term "C string" we mean a nul-terminated string
> with 8 bit characters, or coded in UTF-8.
> However the C language allows you to construct strings in memory which are
> not nul terminated. These won't work as expected with printf() or any
> of the other C string handling functions. However you can write alternatives
> to these functions that operate on counted strings, or whatever
> alternative you have chosen.


So in other words, if you create a custom array of bytes
and have another variable somewhere to keep track of
its length, you don't have to terminate it with a
nul. Am I correct?

And good luck stopping a WinAPI call from careening
until it finds a nul if you don't nul terminate it.

Any idea where in the ISO the nul is documented?
Bart (01-14-20, 07:53 PM)
On 14/01/2020 17:43, T wrote:
> On 2020-01-14 09:32, Malcolm McLean wrote:
> So in other words, if you create a custom array of bytes
> and have another variable somewhere to keep track of
> its length, you don't have to terminate it with a
> nul.  Am I correct?
> And good luck stopping a WinAPI call from careening
> until it finds a nul if you don't nul terminate it.


Most functions in libraries that take a char* that represents text,
expect it to be nul-terminated.

I find that a nuisance as I make extensive use of either counted strings
(with attached length) or strings with a length stored separately.

Calling an API function requiring a terminated string means allocating
N+1 bytes, copying the string into it, adding the terminator, then
passing that block. Then deallocating when done.
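
The allocate, copy, terminate, free sequence described above might look like the following sketch. `to_c_string` is a hypothetical helper name, not a library function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy n bytes of a counted string into a fresh buffer and append the
   terminating nul that C library functions expect.
   Returns NULL on allocation failure; the caller frees the result. */
char *to_c_string(const char *data, size_t n)
{
    assert(data != NULL || n == 0);
    if (n == SIZE_MAX)               /* n + 1 would wrap around to 0 */
        return NULL;
    char *copy = malloc(n + 1);      /* N+1 bytes: room for the terminator */
    if (copy == NULL)
        return NULL;
    if (n > 0)
        memcpy(copy, data, n);
    copy[n] = '\0';
    return copy;
}
```

The caller passes the result to whatever function needs a terminated string and then calls `free()` on it, which is the per-call overhead being complained about.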
Malcolm McLean (01-14-20, 07:57 PM)
On Tuesday, 14 January 2020 17:43:19 UTC, T wrote:
> On 2020-01-14 09:32, Malcolm McLean wrote:
> So in other words, if you create a custom array of bytes
> and have another variable somewhere to keep track of
> its length, you don't have to terminate it with a
> nul. Am I correct?
> And good luck stopping a WinAPI call from careening
> until it finds a nul if you don't nul terminate it.

Yes. Although you actually give a bad example. Quite a lot of Windows API
calls take a length parameter instead of assuming that a string is
nul-terminated. That's for historical reasons, from when the Win API wasn't
specifically designed for C.
> Any idea where in the ISO the nul is documented?

From one version of the standard:

In translation phase 7, a byte or code of value zero is appended to each
multibyte character sequence that results from a string literal or literals.

It's in the section "String literals", 6.4.5.
David Brown (01-14-20, 09:17 PM)
On 14/01/2020 18:53, Bart wrote:
> On 14/01/2020 17:43, T wrote:
> Most functions in libraries that take a char* that represents text,
> expect it to be nul-terminated.
> I find that a nuisance as I make extensive use of either counted strings
> (with attached length) or strings with a length stored separately.
> Calling an API function requiring a terminated string means allocating
> N+1 bytes, copying the string into it, adding the terminator, then
> passing that block. Then deallocating when done.


People could say exactly the same thing about working with Pascal
strings when they have a C string.

If you find you are often doing such conversions, why not just make sure
you have N + 2 bytes of space in the first place? Your first extra byte
is the length (limited to 255 byte strings), then comes the data. And
you can just do "s[s[0] + 1] = 0" to get the null termination before calling
the C function (with s + 1 as the address).
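
A sketch of that layout, with hypothetical helper names `lp_set` and `lp_as_c_string`. With the length in `s[0]` and the data in `s[1]` through `s[N]`, the spare byte for the terminator sits at index `s[0] + 1`, which is why the buffer needs N + 2 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: buf[0] holds the length N (0..255), buf[1..N] hold
   the data, and buf[N + 1] is spare room for a terminator, so the buffer
   needs at least N + 2 bytes. */
void lp_set(unsigned char *buf, size_t cap, const char *data, size_t n)
{
    assert(buf != NULL && data != NULL);
    assert(n <= 255 && cap >= n + 2);
    buf[0] = (unsigned char)n;
    memcpy(buf + 1, data, n);
}

/* Terminate in place and return a pointer usable as a C string. */
char *lp_as_c_string(unsigned char *buf)
{
    assert(buf != NULL);
    buf[buf[0] + 1] = '\0';
    return (char *)(buf + 1);
}
```

The buffer is declared as `unsigned char` so that a length above 127 cannot turn into a negative index on platforms where plain `char` is signed.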
Joe Pfeiffer (01-14-20, 09:36 PM)
T <T> writes:

> Hi All,
> Had this conversation with someone over on the
> Raku mailing list:
> Would someone correct me if a C String actually does
> not require a nul to terminate itself?
> If I am correct, would someone give me the chapter and
> verse in the ISO that states this. (The guy I am
> conversing with will not take a third-party definition.)


The catch here is that C doesn't actually have strings, except for
string literals in the source code. Strings in programs are really
expectations on the part of the standard library, not part of the
language itself.

The definition of the standard C library contains the following:

"7.1.1 Definitions of terms
1 A string is a contiguous sequence of characters terminated by and including the first null
character. The term multibyte string is sometimes used instead to emphasize special
processing given to multibyte characters contained in the string or to avoid confusion
with a wide string. A pointer to a string is a pointer to its initial (lowest addressed)
character. The length of a string is the number of bytes preceding the null character and
the value of a string is the sequence of the values of the contained
characters, in order."

This appears in the C draft standard ISO/IEC 9899:201x (I expect it is
unchanged in the current, adopted standard but this is what I've got).
Keith Thompson (01-14-20, 09:37 PM)
Malcolm McLean <malcolm.arthur.mclean> writes:
> On Tuesday, 14 January 2020 17:43:19 UTC, T wrote: [...]
> From one version of the standard
> In translation phase 7, a byte or code of value zero is appended to
> each multibyte character sequence that results from a string literal or
> literals.
> It's in the section "string literals", 6.4.5


That's not directly relevant. String literals and strings are two
different things. String literals in source code *usually* result in
strings at run time, but not always. For example, the string literal
"foo\0bar" results in an 8-character array whose contents are not a
string. And there are plenty of ways to construct strings without using
string literals:

char s[4];
s[0] = 'f';
s[1] = 'o';
s[2] = 'o';
s[3] = 0;

The answer to the OP's question is 7.1.1 paragraph 1:

A *string* is a contiguous sequence of characters terminated by and
including the first null character.

The word "string" is in italics, so that's the definition of the term.

You can certainly have arrays of characters that are not terminated
by a null character, but such an array (or rather its content)
is not a "string".
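
One way to check the distinction in code, as a sketch: the hypothetical helper `is_exactly_one_string` tests whether an n-byte array, taken as a whole, matches the 7.1.1 definition, meaning its first null character is also its last byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the n-byte array is exactly one string in the 7.1.1 sense:
   its first null character is also its last byte. Returns 0 otherwise,
   including when the array contains no null character at all. */
int is_exactly_one_string(const char *buf, size_t n)
{
    assert(buf != NULL);
    if (n == 0)
        return 0;
    const char *first_nul = memchr(buf, '\0', n);
    return first_nul == buf + n - 1;
}
```

For the array produced by "foo\0bar" (8 bytes) this returns 0, because the first null character is at index 3; `strlen()` on the same array reports 3. For the four-element `s` built by hand above it returns 1, and for a three-byte array holding only 'f', 'o', 'o' it returns 0.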
Manfred (01-14-20, 09:48 PM)
On 1/14/2020 6:57 PM, Malcolm McLean wrote:
> On Tuesday, 14 January 2020 17:43:19 UTC, T wrote:
> Yes. Although you actually give a bad example. Quite a lot of Windows API
> calls take a length parameter instead of assuming that a string is
> nul-terminated. That's for historical reasons when the Win API wasn't
> specifically designed for C.
> From one version of the standard
> In translation phase 7, a byte or code of value zero is appended to each
> multibyte character sequence that results from a string literal or literals.
> It's in the section "string literals", 6.4.5


From n1570:

Section 5.2.1 "Character sets", p2, definition of null character:
"... A byte with all bits set to 0, called the null character, shall
exist in the basic execution character set; it is used to terminate a
character string"

Section 7 "Library", 7.1.1 "Definition of terms", definition of string:

"A string is a contiguous sequence of characters terminated by and
including the first null character. The term multibyte string is
sometimes used instead to emphasize special processing given to
multibyte characters contained in the string or to avoid confusion with
a wide string. A pointer to a string is a pointer to its initial (lowest
addressed) character. The length of a string is the number of bytes
preceding the null character and the value of a string is the sequence
of the values of the contained characters, in order."

This definition applies to the section about the standard library. As
others have said, it is possible to handle other kinds of text data (e.g.
counted strings) by writing the appropriate C code - in this case the
text data would be substantially a raw byte array that is handled (as
text) by the application code.
However, the standard library (which is part of the C standard) defines
strings as above, which is the layout that is required for text data to
be handled by the standard library (unless there are any exceptions from
this in the standard library that I am missing, but that's substantially
it).
Bart (01-14-20, 10:01 PM)
On 14/01/2020 19:17, David Brown wrote:
> On 14/01/2020 18:53, Bart wrote:
>> On 14/01/2020 17:43, T wrote:


> People could say exactly the same thing about working with Pascal
> strings when they have a C string.
> If you find you are often doing such conversions, why not just make sure
> you have N + 2 bytes of space in the first place?  Your first extra byte
> is the length (limited to 255 byte strings), then comes the data.  And
> you can just do "s[s[0] + 1] = 0" to get the null termination before calling
> the C function (with s + 1 as the address).


This only really works for certain kinds of strings. For example, when
you are dealing with an entire, self-contained string when it is not
part of a larger object, or a larger string.

If the counted string involved is a field of a struct, and it needs to
have and be able to use all N bytes of its length, there is no room to
put a terminator for any string of length N.

(Here, there are ways to have a string count and still use all bytes;
using such a string involves constructing a (pointer, length) descriptor,
but adding a terminator involves duplicating it.)

But most importantly, when the counted string is a slice to a substring
which is in the middle of another string, you can't write past the end
of it to add a zero.

Even for independent strings, ensuring space for a terminator can mean
double the allocation space when a string already contains 2**N
characters, depending on allocation strategy.

This also complicates management: you are not dealing with N bytes
of its length, but N+1 bytes. Empty strings that normally are dealt with
simply by N=0 (and there is no pointer), for passing to C functions
require special-casing and providing either NULL or "".
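
For the slice case, some standard functions can be given the length instead of a terminator. A sketch, using a hypothetical `format_slice` helper: the precision in "%.*s" caps how many bytes are read, so nothing has to be written past the end of the slice.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Format a (pointer, length) slice into out without writing a terminator
   into the slice's own storage. The "%.*s" precision caps how many bytes
   are read, so the source need not be nul-terminated.
   Returns the snprintf result (the number of characters the full output
   needs, not counting the terminating nul of out). */
int format_slice(char *out, size_t out_size, const char *ptr, size_t len)
{
    assert(out != NULL && ptr != NULL);
    assert(len <= INT_MAX);          /* the precision argument is an int */
    return snprintf(out, out_size, "%.*s", (int)len, ptr);
}
```

Given a larger string such as "key=value;rest", calling `format_slice(out, sizeof out, whole + 4, 5)` copies only "value" into `out`. This covers formatting only; functions like `fopen()` or `strtol()` still need a terminated copy.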
Keith Thompson (01-14-20, 10:05 PM)
Joe Pfeiffer <pfeiffer> writes:
[...]
> The catch here is that C doesn't actually have strings, except for
> string literals in the source code. Strings in programs are really
> expectations on the part of the standard library, not part of the
> language itself.


C certainly does have strings, and you quoted the definition
yourself.

Strings exist at run time. A lot of standard library functions
require pointers to strings, but it's perfectly possible to use
strings in a C program independently of the standard library.

I suspect there's some true statement behind your statement that
"C doesn't actually have strings", but I honestly don't know what
it is. Can you clarify?

[...]
Joe Pfeiffer (01-14-20, 10:54 PM)
Keith Thompson <Keith.S.Thompson+u> writes:

> Joe Pfeiffer <pfeiffer> writes:
> [...]
> C certainly does have strings, and you quoted the definition
> yourself.
> Strings exist at run time. A lot of standard library functions
> require pointers to strings, but it's perfectly possible to use
> strings in a C program independently of the standard library.
> I suspect there's some true statement behind your statement that
> "C doesn't actually have strings", but I honestly don't know what
> it is. Can you clarify?


The spec separates the definition of the language itself (everything
before chapter 7) from the standard library (chapter 7). Unless I
missed it, chapters 1-6 only mention strings in the context of string
literals. The language (as distinguished from the library) has no
builtin string data type, no builtin string operators such as
concatenation or taking a substring, etc. The only place I know of that
the language itself supports null-terminated strings is by adding a null
to a string literal when initializing a pointer or a long-enough array.
In that sense, strings are in the library but not the language.
James Kuyper (01-15-20, 01:03 AM)
It is perfectly true that C doesn't have a string data type. It does,
however, have a string data format. C is not just the C language, it is
also the C standard library. You can't have a fully conforming
implementation of C without both.
Many of the C standard library routines have their behavior defined in
terms of that data format, and the C language itself has two features
that make it easy to generate data in that format:
1. When a string literal is used to initialize an array of character type
with an unspecified length, or a length large enough to hold a
terminating nul character, it results in the creation of data in that
format.
2. When used in most other contexts (except as the operand of & or
sizeof), a string literal causes the creation of an unnamed character
array containing data in that format, and has a value that points at the
first element of that array.
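
The two features can be sketched as follows. The object names `from_init` and `from_value` and the function `both_are_strings` are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Feature 1: array initialised from a literal with no length given.
   The compiler sizes it to include the terminating nul. */
static const char from_init[] = "abc";

/* Feature 2: a literal used as a value yields a pointer to the first
   element of an unnamed, nul-terminated array. */
static const char *const from_value = "abc";

/* Returns 1 when both objects hold the same 3-character string. */
int both_are_strings(void)
{
    assert(sizeof from_init == 4);   /* 'a', 'b', 'c', '\0' */
    assert(sizeof "abc" == 4);       /* under sizeof the literal stays an array */
    return strlen(from_init) == 3 && strcmp(from_init, from_value) == 0;
}
```

By contrast, an array declared with length 3 and initialised from "abc" is valid C but has no room for the nul, so its contents would not be in the string format.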
Richard Damon (01-15-20, 04:43 AM)
On 1/14/20 12:22 PM, T wrote:
> Hi All,
> Had this conversation with someone over on the
> Raku mailing list:
> Would someone correct me if a C String actually does
> not require a nul to terminate itself?
> If I am correct, would someone give me the chapter and
> verse in the ISO that states this.  (The guy I am
> conversing with will not take a third-party definition.)


7.1.1p1
A string is a contiguous sequence of characters terminated by and
including the first null character.
Keith Thompson (01-15-20, 07:31 AM)
Joe Pfeiffer <pfeiffer> writes:
> Keith Thompson <Keith.S.Thompson+u> writes: [...]
> The spec separates the definition of the language itself (everything
> before chapter 7) from the standard library (chapter 7). Unless I
> missed it, chapters 1-6 only mention strings in the context of string
> literals. The language (as distinguished from the library) has no
> builtin string data type, no builtin string operators such as
> concatenation or taking a substring, etc. The only place I know of that
> the language itself supports null-terminated strings is by adding a null
> to a string literal when initializing a pointer or a long-enough array.
> In that sense, strings are in the library but not the language.


The term "string" is defined in section 7 because nothing in the
language depends on it (note that the value of a string literal isn't a
string if it has an embedded \0), but the definition still applies, and
programs can still contain strings. In this program:

int main(void) {
    char s[] = {'f', 'o', 'o', '\0'};
}

the object s contains a *string* even though it doesn't use the standard
library. More realistically, a program could pass the (decayed) value
of s to a function outside the standard library.
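
As a sketch of such a function, here is a hypothetical `my_length` that depends only on the string format. Apart from the `assert` guard, which is there purely as a precondition check, it calls nothing from the standard library.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical non-library function that relies only on the string
   format: it walks forward until the first null character and returns
   the number of characters before it. */
size_t my_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Called with the array `s` from the program above, it returns 3. Called with an array that has no null character, it would run off the end, which is exactly what the definition in 7.1.1 is guarding against.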
