char a[6] in one source file, and in another I declared
extern char *a. Why didn't it work?
The declaration extern char *a simply does not match the actual
definition. The type "pointer-to-type-T" is not the same as
"array-of-type-T." Use extern char a[].
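A minimal sketch of the matching declaration, squeezed into one translation unit for brevity. In practice the extern declaration would sit in a header or in the other source file; the function name first_char() is made up for illustration.

```c
/* Declaration: "a is an array of char, size given elsewhere." */
extern char a[];

/* Hypothetical helper: uses a through the array declaration above. */
char first_char(void)
{
    return a[0];
}

/* Definition: normally this line lives in a different source file. */
char a[6] = "hello";
```

Had the declaration read extern char *a, code using it would treat the first bytes of the array as if they were a stored pointer value.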
References: CT&P Sec. 3.3 pp. 33-4, Sec. 4.5 pp. 64-5.
char a[] was identical to char *a.
Not at all. (What you heard has to do with formal parameters to
functions; see question 2.4.) Arrays are not pointers. The
array declaration "char a[6];" requests that space for six
characters be set aside, to be known by the name "a." That is,
there is a location named "a" at which six characters can sit.
The pointer declaration "char *p;" on the other hand, requests a
place which holds a pointer. The pointer is to be known by the
name "p," and can point to any char (or contiguous array of
chars) anywhere.
As usual, a picture is worth a thousand words. The statements
    char a[] = "hello";
    char *p = "world";
would result in data structures which could be represented like this:
       +---+---+---+---+---+---+
    a: | h | e | l | l | o |\0 |
       +---+---+---+---+---+---+

       +-----+     +---+---+---+---+---+---+
    p: |  *======> | w | o | r | l | d |\0 |
       +-----+     +---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates
different code depending on whether x is an array or a pointer.
Given the declarations above, when the compiler sees the
expression a[3], it emits code to start at the location "a,"
move three past it, and fetch the character there. When it sees
the expression p[3], it emits code to start at the location "p,"
fetch the pointer value there, add three to the pointer, and
finally fetch the character pointed to. In the example above,
both a[3] and p[3] happen to be the character 'l', but the
compiler gets there differently. (See also questions 17.19 and 17.20.)
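The difference between the two declarations can be observed from within a program. The following is only a sketch: the function name is invented, and p is declared const char * (rather than the plain char * used above) so that the string literal cannot be modified by accident.

```c
#include <assert.h>

/* Illustrative only: returns 1 if the final check holds. */
int array_vs_pointer_demo(void)
{
    char a[] = "hello";        /* array: six chars stored in a itself */
    const char *p = "world";   /* pointer: holds the address of a literal */

    /* Both subscripts yield 'l', though the code reaches them differently. */
    assert(a[3] == 'l');
    assert(p[3] == 'l');

    /* sizeof exposes the difference: six chars versus one pointer. */
    assert(sizeof a == 6);
    assert(sizeof p == sizeof(char *));

    a[0] = 'j';                /* the array's storage may be modified */
    p = a;                     /* the pointer may be aimed elsewhere */
    return p[0] == 'j';
}
```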
Much of the confusion surrounding pointers in C can be traced to a misunderstanding of this statement. Saying that arrays and pointers are "equivalent" means neither that they are identical nor even that they are interchangeable.
"Equivalence" refers to the following key definition:
An lvalue [see question 2.5] of type array-of-T which appears in an expression decays (with three exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T. (The exceptions are when the array is the operand of a sizeof or & operator, or is a literal string initializer for a character array.)
As a consequence of this definition, there is no apparent
difference in the behavior of the "array subscripting" operator
[] as it applies to arrays and pointers. In an expression of
the form a[i], the array reference "a" decays into a pointer,
following the rule above, and is then subscripted just as would
be a pointer variable in the expression p[i] (although the
eventual memory accesses will be different, as explained in
question 2.2). In either case, the expression x[i] (where x is
an array or a pointer) is, by definition, identical to *((x)+(i)).
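The decay rule and two of its exceptions can be checked directly. This is a sketch under the obvious assumptions (a five-element int array; the names decay_demo and whole are invented for the illustration):

```c
#include <assert.h>

/* Illustrative only: exercises decay, subscripting, and the exceptions. */
int decay_demo(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int *p = a;               /* a decays to a pointer to a[0] */
    int (*whole)[5] = &a;     /* & exception: pointer to the entire array */

    assert(p == &a[0]);
    assert(a[2] == *(a + 2));               /* x[i] is *((x)+(i)) */
    assert(p[2] == *(p + 2));
    assert(sizeof a == 5 * sizeof(int));    /* sizeof exception: no decay */
    assert((*whole)[4] == 50);

    return a[2] + p[3];       /* 30 + 40 */
}
```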
References: K&R I Sec. 5.3 pp. 93-6; K&R II Sec. 5.3 p. 99; H&S Sec. 5.4.1 p. 93; ANSI Sec. 3.2.2.1, Sec. 3.3.2.1, Sec. 3.3.6.
Since arrays decay immediately into pointers, an array is never actually passed to a function. As a convenience, any parameter declarations which "look like" arrays, e.g.
    f(a)
    char a[];
are treated by the compiler as if they were pointers, since that is what the function will receive if an array is passed:
    f(a)
    char *a;
This conversion holds only within function formal parameter declarations, nowhere else. If this conversion bothers you, avoid it; many people have concluded that the confusion it causes outweighs the small advantage of having the declaration "look like" the call and/or the uses within the function.
References: K&R I Sec. 5.3 p. 95, Sec. A10.1 p. 205; K&R II Sec. 5.3 p. 100, Sec. A8.6.3 p. 218, Sec. A10.1 p. 226; H&S Sec. 5.4.3 p. 96; ANSI Sec. 3.5.4.3, Sec. 3.7.1, CT&P Sec. 3.3 pp. 33-4.
The ANSI C Standard defines a "modifiable lvalue," which an array is not.
References: ANSI Sec. 3.2.2.1 p. 37.
The sizeof operator reports the size of the pointer parameter which the function actually receives (see question 2.4).
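A short sketch of the effect, using prototype-style declarations. The function names are invented; the definition is deliberately spelled with char *a, which the compiler accepts as matching the char a[] declaration because the two parameter forms denote the same type.

```c
#include <stddef.h>

/* Declared with an array-looking parameter... */
size_t param_size(char a[]);

/* ...and defined with the pointer spelling: the same type. */
size_t param_size(char *a)
{
    return sizeof a;          /* size of a pointer, whatever was passed */
}

size_t size_in_caller(void)
{
    char buf[100] = "";
    return sizeof buf;        /* 100: buf is a true array here */
}

size_t size_in_callee(void)
{
    char buf[100] = "";
    return param_size(buf);   /* buf decays; the callee sees a pointer */
}
```

If the called function needs the array's size, the caller has to pass it as a separate argument.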
This is a bit of an oversimplification. An array name is "constant" in that it cannot be assigned to, but an array is not a pointer, as the discussion and pictures in question 2.2 should make clear.
Arrays automatically allocate space, but can't be relocated or resized. Pointers must be explicitly assigned to point to allocated space (perhaps using malloc), but can be reassigned (i.e. pointed at different objects) at will, and have many other uses besides serving as the base of blocks of memory.
Due to the so-called equivalence of arrays and pointers (see
question 2.3), arrays and pointers often seem interchangeable,
and in particular a pointer to a block of memory assigned by
malloc is frequently treated (and can be referenced using []
exactly) as if it were a true array. (See question 2.14; see
also question 17.20.)
5["abcdef"]. How can this be legal C?
Yes, Virginia, array subscripting is commutative in C. This
curious fact follows from the pointer definition of array
subscripting, namely that a[e] is identical to *((a)+(e)), for
any expression e and primary expression a, as long as one of
them is a pointer expression and one is integral. This
unsuspected commutativity is often mentioned in C texts as if it
were something to be proud of, but it finds no useful
application outside of the Obfuscated C Contest (see question
17.13).
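For the skeptical, a small sketch that a compiler will accept (the function name is invented):

```c
#include <assert.h>

/* Illustrative only: both operand orders name the same element. */
char commutative_subscript_demo(void)
{
    const char *s = "abcdef";

    assert(s[5] == 5[s]);                    /* *(s + 5) versus *(5 + s) */
    assert(5["abcdef"] == *("abcdef" + 5));

    return 5["abcdef"];                      /* the sixth character, 'f' */
}
```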
References: ANSI Rationale Sec. 3.3.2.1 p. 41.
The rule by which arrays decay into pointers is not applied recursively. An array of arrays (i.e. a two-dimensional array in C) decays into a pointer to an array, not a pointer to a pointer. Pointers to arrays can be confusing, and must be treated carefully. (The confusion is heightened by the existence of incorrect compilers, including some versions of pcc and pcc-derived lint's, which improperly accept assignments of multi-dimensional arrays to multi-level pointers.) If you are passing a two-dimensional array to a function:
    int array[NROWS][NCOLUMNS];
    f(array);
the function's declaration should match:
    f(int a[][NCOLUMNS]) {...}
or
    f(int (*ap)[NCOLUMNS]) {...}    /* ap is a pointer to an array */
In the first declaration, the compiler performs the usual implicit parameter rewriting of "array of array" to "pointer to array;" in the second form the pointer declaration is explicit. Since the called function does not allocate space for the array, it does not need to know the overall size, so the number of "rows," NROWS, can be omitted. The "shape" of the array is still important, so the "column" dimension NCOLUMNS (and, for 3- or more dimensional arrays, the intervening ones) must be included.
If a function is already declared as accepting a pointer to a pointer, it is probably incorrect to pass a two-dimensional array directly to it.
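A complete sketch of the second form, assuming small illustrative values for NROWS and NCOLUMNS. The function names sum2d and sum2d_demo are invented, and the row count is passed as an extra argument so the function knows where to stop.

```c
#define NROWS    3
#define NCOLUMNS 4

/* ap points at the first row; each row is an array of NCOLUMNS ints. */
int sum2d(int (*ap)[NCOLUMNS], int nrows)
{
    int i, j, total = 0;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < NCOLUMNS; j++)
            total += ap[i][j];
    return total;
}

int sum2d_demo(void)
{
    int array[NROWS][NCOLUMNS];
    int i, j;

    for (i = 0; i < NROWS; i++)
        for (j = 0; j < NCOLUMNS; j++)
            array[i][j] = i * NCOLUMNS + j;    /* values 0 through 11 */

    /* array decays to int (*)[NCOLUMNS], matching the parameter. */
    return sum2d(array, NROWS);
}
```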
References: K&R I Sec. 5.10 p. 110; K&R II Sec. 5.9 p. 113.
It's not easy. One way is to pass in a pointer to the [0][0]
element, along with the two dimensions, and simulate array
subscripting "by hand:"
    f2(aryp, nrows, ncolumns)
    int *aryp;
    int nrows, ncolumns;
    { ... ary[i][j] is really aryp[i * ncolumns + j] ... }
This function could be called with the array from question 2.10 as
f2(&array[0][0], NROWS, NCOLUMNS);
It must be noted, however, that a program which performs
multidimensional array subscripting "by hand" in this way is not
in strict conformance with the ANSI C Standard; the behavior of
accessing (&array[0][0])[x] is not defined for x >= NCOLUMNS.
gcc allows local arrays to be declared having sizes which are specified by a function's arguments, but this is a nonstandard extension.
See also question 2.15.
Usually, you don't want to. When people speak casually of a pointer to an array, they usually mean a pointer to its first element.
Instead of a pointer to an array, consider using a pointer to one of the array's elements. Arrays of type T decay into pointers to type T (see question 2.3), which is convenient; subscripting or incrementing the resultant pointer accesses the individual members of the array. True pointers to arrays, when subscripted or incremented, step over entire arrays, and are generally only useful when operating on arrays of arrays, if at all. (See question 2.10 above.)
If you really need to declare a pointer to an entire array, use
something like "int (*ap)[N];" where N is the size of the array.
(See also question 10.4.) If the size of the array is unknown,
N can be omitted, but the resulting type, "pointer to array of
unknown size," is useless.
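The difference in step size is easy to see in a sketch (N is given an arbitrary value of 4 here, and the function and variable names are invented):

```c
#include <assert.h>
#include <stddef.h>

#define N 4

/* Illustrative only: compares the two kinds of pointer. */
int pointer_to_array_demo(void)
{
    int rows[3][N] = {
        { 0,  1,  2,  3},
        {10, 11, 12, 13},
        {20, 21, 22, 23}
    };
    int *ip = &rows[0][0];    /* pointer to an element */
    int (*ap)[N] = rows;      /* pointer to an array of N ints */

    ip++;                     /* steps over one int */
    ap++;                     /* steps over one whole row */

    assert(*ip == 1);
    assert((*ap)[0] == 10);
    assert((char *)ap - (char *)rows == (ptrdiff_t)(N * sizeof(int)));

    return (*ap)[2];          /* third element of the second row */
}
```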
int array[NROWS][NCOLUMNS]; array and &array?
Under ANSI/ISO Standard C, &array yields a pointer, of type
pointer-to-array-of-T, to the entire array (see also question 2.12).
Under pre-ANSI C, the & in &array generally elicited a
warning, and was generally ignored. Under all C compilers, an
unadorned reference to an array yields a pointer, of type
pointer-to-T, to the array's first element. (See also question 2.3.)
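Under an ANSI compiler, the two pointers hold the same address but differ in type, which shows up in what they point at. A sketch, with arbitrary dimensions of 3 by 4 and invented names:

```c
#include <assert.h>

/* Illustrative only: same address, different pointed-to types. */
int array_and_address_demo(void)
{
    int array[3][4];
    int (*row)[4] = array;          /* decayed: pointer to the first row */
    int (*whole)[3][4] = &array;    /* pointer to the entire array */

    assert((void *)row == (void *)whole);
    assert(sizeof *row == 4 * sizeof(int));
    assert(sizeof *whole == 12 * sizeof(int));

    return (int)(sizeof *whole / sizeof *row);    /* number of rows */
}
```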
It is usually best to allocate an array of pointers, and then initialize each pointer to a dynamically-allocated "row." Here is a two-dimensional example:
    int **array1 = (int **)malloc(nrows * sizeof(int *));
    for(i = 0; i < nrows; i++)
        array1[i] = (int *)malloc(ncolumns * sizeof(int));
(In "real" code, of course, malloc would be declared correctly, and each return value checked.)
You can keep the array's contents contiguous, at the cost of making later reallocation of individual rows difficult, with a bit of explicit pointer arithmetic:
    int **array2 = (int **)malloc(nrows * sizeof(int *));
    array2[0] = (int *)malloc(nrows * ncolumns * sizeof(int));
    for(i = 1; i < nrows; i++)
        array2[i] = array2[0] + i * ncolumns;
In either case, the elements of the dynamic array can be
accessed with normal-looking array subscripts: array1[i][j]
or array2[i][j].
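Here is one way the contiguous scheme might look with the return values checked and a matching release function. It is a sketch, not a definitive implementation: alloc2d() and free2d() are invented names, the dimensions are assumed to be positive, and no attempt is made to guard against overflow in the size calculation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates nrows x ncolumns ints, contents contiguous.
   Returns NULL if either allocation fails. */
int **alloc2d(int nrows, int ncolumns)
{
    int **rows;
    int i;

    assert(nrows > 0 && ncolumns > 0);    /* precondition of this sketch */

    rows = malloc((size_t)nrows * sizeof *rows);
    if (rows == NULL)
        return NULL;

    rows[0] = malloc((size_t)nrows * ncolumns * sizeof **rows);
    if (rows[0] == NULL) {
        free(rows);
        return NULL;
    }

    for (i = 1; i < nrows; i++)
        rows[i] = rows[0] + i * ncolumns;
    return rows;
}

/* Hypothetical helper: releases what alloc2d() returned. */
void free2d(int **rows)
{
    if (rows != NULL) {
        free(rows[0]);    /* the block of ints */
        free(rows);       /* the row pointers */
    }
}
```

A caller would write something like grid = alloc2d(3, 4), test the result against NULL, use grid[i][j], and finish with free2d(grid).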
If the double indirection implied by the above schemes is for some reason unacceptable, you can simulate a two-dimensional array with a single, dynamically-allocated one-dimensional array:
int *array3 = (int *)malloc(nrows * ncolumns * sizeof(int));
However, you must now perform subscript calculations manually,
accessing the i,jth element with array3[i * ncolumns + j]. (A
macro can hide the explicit calculation, but invoking it then
requires parentheses and commas which don't look exactly like
multidimensional array subscripts.)
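Such a macro might be sketched as follows. The macro name AT and the function flat_demo() are invented for the illustration, and the dimensions are arbitrary.

```c
#include <stdlib.h>

/* Hypothetical macro: element (i, j) of a flat array with ncols columns. */
#define AT(arr, ncols, i, j) ((arr)[(size_t)(i) * (ncols) + (j)])

/* Illustrative only: returns -1 if the allocation fails. */
int flat_demo(void)
{
    int nrows = 3, ncolumns = 4;
    int i, j, result;
    int *array3 = malloc((size_t)nrows * ncolumns * sizeof *array3);

    if (array3 == NULL)
        return -1;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < ncolumns; j++)
            AT(array3, ncolumns, i, j) = 10 * i + j;

    result = AT(array3, ncolumns, 2, 1);    /* row 2, column 1 */
    free(array3);
    return result;
}
```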
Finally, you can use pointers-to-arrays:
int (*array4)[NCOLUMNS] = (int (*)[NCOLUMNS])malloc(nrows * sizeof(*array4));
, but the syntax gets horrific and all but one dimension must be known at compile time.
With all of these techniques, you may of course need to remember to free the arrays (which may take several steps; see question 3.9) when they are no longer needed, and you cannot necessarily intermix the dynamically-allocated arrays with conventional, statically-allocated ones (see question 2.15 below, and also question 2.10).
There is no single perfect method. Given a function f1()
similar to the f() of question 2.10, the array as declared in
question 2.10, f2() as declared in question 2.11, array1, array2,
array3, and array4 as declared in 2.14, and a function f3()
declared as:
    f3(pp, m, n)
    int **pp;
    int m, n;
; the following calls should work as expected:
    f1(array, NROWS, NCOLUMNS);
    f1(array4, nrows, NCOLUMNS);
    f2(&array[0][0], NROWS, NCOLUMNS);
    f2(*array2, nrows, ncolumns);
    f2(array3, nrows, ncolumns);
    f2(*array4, nrows, NCOLUMNS);
    f3(array1, nrows, ncolumns);
    f3(array2, nrows, ncolumns);
The following two calls would probably work, but involve
questionable casts, and work only if the dynamic ncolumns
matches the static NCOLUMNS:
    f1((int (*)[NCOLUMNS])(*array2), nrows, ncolumns);
    f1((int (*)[NCOLUMNS])array3, nrows, ncolumns);
It must again be noted that passing &array[0][0] to f2() is not
strictly conforming; see question 2.11.
If you can understand why all of the above calls work and are written as they are, and if you understand why the combinations that are not listed would not work, then you have a very good understanding of arrays and pointers (and several other areas) in C.
    int realarray[10];
    int *array = &realarray[-1];
Although this technique is attractive (and was used in old editions of the book Numerical Recipes in C), it does not conform to the C standards. Pointer arithmetic is defined only as long as the pointer points within the same allocated block of memory, or to the imaginary "terminating" element one past it; otherwise, the behavior is undefined, even if the pointer is not dereferenced. The code above could fail if, while subtracting the offset, an illegal address were generated (perhaps because the address tried to "wrap around" past the beginning of some memory segment).
References: ANSI Sec. 3.3.6 p. 48, Rationale Sec. 3.2.2.3 p. 38; K&R II Sec. 5.3 p. 100, Sec. 5.4 pp. 102-3, Sec. A7.7 pp. 205-6.
    ...
    int *ip;
    f(ip);
    ...

    void f(ip)
    int *ip;
    {
        static int dummy = 5;
        ip = &dummy;
    }
Did the function try to initialize the pointer itself, or just what it pointed to? Remember that arguments in C are passed by value. The called function altered only the passed copy of the pointer. You'll want to pass the address of the pointer (the function will end up accepting a pointer-to-a-pointer).
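A sketch of the broken and the repaired versions side by side, in prototype style. All three function names are invented for the illustration.

```c
#include <stddef.h>

/* Broken: changes only the function's private copy of the pointer. */
void set_by_value(int *ip)
{
    static int dummy = 5;
    ip = &dummy;
    (void)ip;                 /* silences "set but not used" warnings */
}

/* Repaired: receives the address of the caller's pointer. */
void set_by_address(int **ipp)
{
    static int dummy = 5;
    *ipp = &dummy;
}

/* Illustrative only: returns -1 if the first call had any effect. */
int pass_pointer_demo(void)
{
    int *ip = NULL;

    set_by_value(ip);
    if (ip != NULL)
        return -1;            /* not reached: ip is unchanged */

    set_by_address(&ip);
    return *ip;               /* ip now points at the static dummy */
}
```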
char * pointer that happens to point to some ints, and
I want to step it over them. Why doesn't
    ((int *)p)++;
work?
In C, a cast operator does not mean "pretend these bits have a
different type, and treat them accordingly;" it is a conversion
operator, and by definition it yields an rvalue, which cannot be
assigned to, or incremented with ++. (It is an anomaly in
pcc-derived compilers, and an extension in gcc, that expressions
such as the above are ever accepted.) Say what you mean: use
    p = (char *)((int *)p + 1);
, or simply
    p += sizeof(int);
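Both forms can be seen at work in a short sketch. It assumes that p really does point at properly aligned ints, as it does here because it is derived from an int array; the function name is invented.

```c
#include <assert.h>

/* Illustrative only: steps a char * across an array of ints. */
int step_demo(void)
{
    int values[3] = {7, 8, 9};
    char *p = (char *)values;

    p = (char *)((int *)p + 1);    /* convert, add one int, convert back */
    assert(p == (char *)&values[1]);

    p += sizeof(int);              /* same step, counted in chars */
    assert(p == (char *)&values[2]);

    return *(int *)p;              /* the int p now points at */
}
```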
References: ANSI Sec. 3.3.4, Rationale Sec. 3.3.2.4 p. 43.
void ** pointer to pass a generic pointer to a
function by reference?
void * acts as a generic pointer only because conversions are
applied automatically when other pointer types are assigned to
and from void *'s; these conversions cannot be performed (the
correct underlying pointer type is not known) if an attempt is
made to indirect upon a void ** value which points at something
other than a void *.
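One workaround, offered as a sketch rather than as the only approach, is to route the value through a genuine void * variable, so that the void ** really does point at a void * and the conversions happen in ordinary assignments. The names set_generic() and generic_demo() are invented.

```c
#include <stddef.h>

/* Hypothetical function taking a generic pointer by reference. */
void set_generic(void **vpp, void *value)
{
    *vpp = value;
}

/* Illustrative only: returns the int reached through the result. */
int generic_demo(void)
{
    static int target = 17;
    int *ip = NULL;
    void *tmp = ip;               /* int * to void *: converted here */

    set_generic(&tmp, &target);   /* &tmp points at a real void * */
    ip = tmp;                     /* void * to int *: converted here */

    return *ip;
}
```

Passing (void **)&ip instead would compile with a cast, but would have the function store a void * representation into an int * object, which is exactly the case the answer above warns about.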