\input texinfo @c -*- Texinfo -*-
@comment %**start of header (This is for running Texinfo on a region.)
@tex
\special{twoside}
@end tex
@setfilename c_reference_manual.info
@settitle YARMAC 0.13 --- C Reference Manual --- DRAFT 1998-08-29
@setchapternewpage odd
@comment DAV: Is the header really the right place to put
@comment the setchapternewpage command ?
@set VERSION 0.13 @c huh ?
@paragraphindent none
@comment %**end of header (This is for running Texinfo on a region.)
@ignore
123456789012345678901234567890123456789012345678901234567890123456789012
Only visible in source file !
[FIXME: do tabs in sample programs need to be replaced by spaces ?]
[DVDEUG: YES!!!]
[FIXME:
consider adding some of the style guide suggestions at
http://www.rdrop.com/~cary/html/linux.html
to this text.
]
Should this be put under an "OPL" license ? see
OpenContent
http://www.opencontent.org/home.shtml
@end ignore
@comment from "info:texinfo#Installing_Dir_Entries"
@comment "@dircategory" and "@direntry" are used only by "install-info".
@comment Why doesn't the "info:texinfo#Beginning_a_File" documentation
@comment mention this ?
@dircategory Programming
@direntry
* YARMAC: (c_reference_manual). Yet Another Reference Manual About C.
@end direntry
@ifinfo
@c The summary description and copyright ---
@c --- does not appear in the printed document.
This is an unfinished, unpublished work.
When finished, it will be a C reference manual
and at that time
you may freely distribute it under terms vaguely similar
to the following:
This C reference manual documents some of the features
of the C language that are required for writing programs
in that language.
Copyright @copyright{} 1988 Richard M. Stallman; @copyright{} 1994
Peter Seebach; @copyright{} 1998 David Cary
current maintainer (1998-05-26): David Cary
Permission is granted to make and distribute verbatim
copies of this manual provided the copyright notice and
this permission notice are preserved on all copies.
@ignore
Permission is granted to process this file through TeX
and print the results, provided the printed document
carries a copying permission notice identical to this
one except for the removal of this paragraph (this
paragraph not being relevant to the printed manual).
@end ignore
Permission is granted to copy and distribute modified
versions of this manual under the conditions for
verbatim copying, provided also that the sections
entitled ``Copying'' and ``GNU General Public License''
are included exactly as in the original, and provided
that the entire resulting derived work is distributed
under the terms of a permission notice identical to this
one.
Permission is granted to copy and distribute
translations of this manual into another language,
under the above conditions for modified versions,
except that this permission notice may be stated in a
translation approved by the Free Software Foundation.
@end ifinfo
@titlepage
@c start of title page --- does not appear in the Info file.
@title YARMAC
@subtitle UNFINISHED DRAFT 1998-05-26
@subtitle NOT FOR DISTRIBUTION, YET
@subtitle Yet Another Reference Manual About C
@author current maintainer: David Cary
@page
@vskip 0pt plus 1filll
@c start of copyright page
Copyright @copyright{} 1988 Richard M. Stallman;
@copyright{} 1994 Peter Seebach;
@copyright{} 1998 David Cary
current maintainer (1998-08-23): David Cary
This is an unfinished, unpublished work.
When finished, it will be a C reference manual
and at that time
you may freely distribute it under terms vaguely similar
to the following:
Published by ...
current maintainer: David Cary
Permission is granted to make and distribute verbatim
copies of this manual provided the copyright notice and
this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified
versions of this manual under the conditions for
verbatim copying, provided also that the sections
entitled ``Copying'' and ``GNU General Public License''
are included exactly as in the original, and provided
that the entire resulting derived work is distributed
under the terms of a permission notice identical to this
one.
Permission is granted to copy and distribute
translations of this manual into another language,
under the above conditions for modified versions,
except that this permission notice may be stated in a
translation approved by the Free Software Foundation.
@end titlepage
@comment node-name, next, previous, up
@ifinfo
@node Top, Copying, , (dir)
@top YARMAC
YARMAC version 0.13 (1998-08-29) DRAFT
This is an unfinished, unpublished work.
When finished, it will be a C reference manual.
This manual is intended to cover the standard C language
and all compliant C compilers.
(K&R C, ANSI/ISO C, GNU C, and the upcoming C9X standard)
The GNU C compiler with the
@code{-ansi -pedantic-errors}
options
is a standard C compiler.
Global questions about this DRAFT:
Yes, I know all the chapter headings are all lowercase.
I've been influenced by _IEEE Spectrum_ doing the same thing;
is this just a fad ?
How should I handle URIs ?
When this document is run through texi2html,
I could make links that (a) directly jump to the referenced site.
Some people prefer such links to (b) jump to a bibliography at the end;
only the links in that bibliography actually exit the document.
This document is intended to be a reference for
people who already know a little C
and who are reading other people's C source code and
trying to figure out what's going on.
@comment this is the ``master menu''
@menu
@comment Main Chapters and appendices
* Copying:: YARMAC
will be free to distribute. [FIXME]
* Introduction to YARMAC:: What is YARMAC ? (Overview)
Who is the intended audience ?
* Top-Down View of C:: A Top-Down View of C
* conventions:: general language conventions
* fundamental data types:: fundamental data types
* variables:: variables
* pointers:: pointers
* Creating New Data Types:: Creating New Data Types
* Arithmetic and Bitwise Operators:: Arithmetic and Bitwise Operators
* bool type:: working with the bool type
(true, false, and logical operators)
* expressions:: expressions
* = and side effects:: assignments and side effects
* evaluation order:: Precedence vs. order of evaluation
* sequence points:: sequence points
* branching:: branching
* looping:: looping
* functions:: functions
* scope:: scope, linkage, access and duration
* I/O:: Input and Output
* macros:: macros and the preprocessor
* History of C:: History of C
* History of this document:: History of this document
* Bibliography:: Bibliography
* another view:: The Compiler Writer's View
@comment Indices
* Glossary:: Glossary
* keystroke index:: Keystroke Index
* Concept Index:: Concept Index
@detailmenu
--- Detailed Chapters and subsections ---
* Copying:: YARMAC
will be free to distribute. [FIXME]
* Introduction to YARMAC:: What is YARMAC ? (Overview)
Who is the intended audience ?
* conventions:: general language conventions
* fundamental data types::
* Int Const Default:: Default Type of an Integer Constant
* Int Const Type:: Explicitly Typed Integer Constants
* Int Conversion:: Type Conversion among Integer Types
* Int Promotion:: Default Integer Promotions
* variables:: variables
* pointers:: pointers
* Creating New Data Types:: Creating New Data Types
* Arithmetic and Bitwise Operators:: Arithmetic and Bitwise Operators
* bool type:: working with the bool type
(true, false, and logical operators)
* expressions:: expressions
* = and side effects:: assignments and side effects
* evaluation order:: Precedence vs. order of evaluation
* sequence points:: sequence points
* branching:: branching
* looping:: looping
* functions:: functions
* scope:: scope, linkage, access and duration
* I/O:: Input and Output
* macros:: macros and the preprocessor
* History of C:: History of C
* History of this document:: History of this document
* Bibliography:: Bibliography
* another view:: The Compiler Writer's View
* Glossary:: Glossary
* keystroke index:: Keystroke Index
* Concept Index:: Concept Index
@end detailmenu
@end menu
@end ifinfo
@comment node-name, next, previous, up
@node Copying, Introduction to YARMAC, Top, Top
@chapter Copying (Information on FSF)
@section General Public License and explanation
[FIXME: Is this the latest proper language ?]
Copyright
@copyright{} 1994, 1995, 1996, 1998 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies
of this manual
provided the copyright notice and this permission notice
are preserved on all copies.
Permission is granted to copy and distribute modified versions
of this manual
under the conditions for verbatim copying,
provided that the entire resulting derived work is distributed under
the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations
of this manual into
another language, under the above conditions for modified versions,
except that this permission notice may be stated
in a translation approved by
the Free Software Foundation.
@section Why FSF? (Richard Stallman and the GNU Manifesto)
@section Why GNU C?
@section Trivia
@node Introduction to YARMAC, Top-Down View of C, Copying, Top
@chapter Introduction to YARMAC
C language reference manual
This book intends to be a reference to the C programming language.
It assumes you have already gone through the tutorial
that came with your
C compiler, and are familiar with editing files on your own platform.
This book is not a replacement for the FAQ,
or any other explanatory book;
it is expected to be a mere reference.
A reference manual can have bugs.
Bugs include segments that are inaccurate or unclear.
Bugs include a reference that should have been in the index.
When you find bugs, please report them to the maintainer;
currently, that's @samp{d.cary@@ieee.org}.
I'll try to patch the bug,
or explain why it's a feature.
Please include the version number from the top of the file
in any bug report.
[DVDEUG: --- doesn't show up right!]
The C programming language is widely available ---
widely enough that there are many distinct dialects.
This manual aims to cover the K&R (traditional) dialect, ANSI/ISO C,
GNU C, and the upcoming C9X standard.
The primary targets are ISO C (for portability)
and GNU C (to go with the @samp{gcc} compiler).
The K&R C style is referred to mostly for portability to obscure systems,
and to help identify what was intended by code written in this style.
Some side references to other dialects may be included where necessary;
in particular, some of the more visible incompatibilities with C++
are covered,
because this kind of information can be helpful.
This book contains a lot of opinion and dogma;
this is understood to be the author's personal opinion,
but frequently is related to issues that look harmless
until you use your second compiler.
I have tried to avoid picking sides in the major religious wars,
focussing instead on things that are known to introduce problems.
The goal of this book is to provide a high quality reference manual
for the C language, available in machine-readable form.
No paper index can compete with the vast speed of modern computers
at searching for information.
Each chapter will have a section titled `Trivia'
which will contain likely sticking points, non-obvious implications,
and other things that are expected to answer the questions
of experienced programmers, or which may prove interesting to know.
The Appendices are intended to serve as quick references,
with pointers to more detailed treatments in the text.
The Index in particular, it is hoped, will be of greater value
than indices in computer books usually are.
@section Trivia
There are 2 kinds of bugs: ones I know about (labeled FIXME),
and ones I don't know about.
@node Top-Down View of C, conventions, Introduction to YARMAC, Top
@chapter a top-down view of C
[FIXME: most programmers reference manuals
start with individual characters and build up from there.
I think it might be interesting to start at a high level
and work down; a quick reference for people who
found a C program on the 'web, and need to know how to
change a couple of @code{#define}s to get it working
on their machines; then get more and more detailed
as they need to make a little change here and a little change there...
How high a level should I start ?
Buckminster Fuller says to start with the universe,
then start dividing.
]
[DVDEUG: Right now it isn't written like this.
Are there plans to write it like this,
because personally I would skip it.
The key - unanswered question - is the audience.
If it is a _reference manual_ then it should be for people who know C.
My adding the comparisons to Pascal,
C++ and Java would make it for anyone with computer science.
Restructuring the document
like this would make it more for anyone,
at the cost of reducing its value as a C reference.]
[
I'm thinking about adding a section,
"When should one *not* use C ?"
to add a brief discussion of all the other
wonderful (and free !) tools
that do things that C is overqualified and/or underqualified for.
Simple batch files and sed scripts,
raw assembly,
PERL,
FORTRAN,
Octave,
C++,
etc.
]
One typically creates an executable program
by using the @code{make} utility.
[FIXME: Recent GNU programs tend to
have the user run Configure to generate a make file.]
One goes to the directory containing the source code of the C program
(a bunch of files that usually end in @code{.c} or @code{.h})
@comment (and @code{.cpp} @code{.hpp} FIXME ?)
and also containing a file named @code{makefile},
then types @code{make}, and if everything goes as designed,
the @code{make} utility uses the @code{makefile}
to find all the various parts and combine them into
a single executable file.
Then you type the name of that file
to run it.
[Really large programs typically have subdirectories,
or "branches", each with their own @code{makefile}
that is recursively called from the "root" @code{makefile}].
The @code{make} utility, the format of the @code{makefile},
and the process of compiling the source into an executable
are outside the scope of this manual.
This manual covers what is in the C source files
and how that corresponds to what your executable program actually does
when you run it.
Inside the C source files, there are two kinds of text:
comments intended solely for human readers,
and "code" intended to be understood by a compiler.
@section Overview
A C program consists of a collection of files.
The naming of files is arbitrary; the only thing mandated by most compilers
is that the filename of the code ends in @code{.c}@footnote{
@code{.c}, not @code{.C}.
Regrettably, there is a difference - @code{.C} implies a C++ program, despite
this being error prone even under a UNIX-like system,
and tragic under a system that is case-insensitive.},
and that can be overridden by hand.
In practice, the name of a file is a
brief description of the contents of the file with @code{.c} appended.
In addition to @code{.c} (code) files, there are
@code{.h}@footnote{Again, @code{.H} implies C++.} (header) files.
To control the compilation of these files,
most people use @code{make},
which automatically builds only the files that need to be built.
Many high-quality projects
use @code{autoconf} to help make their code easily portable
to a wide variety of systems. While they are both
useful tools for C programming, they fall outside the realm of this work.
Please see [insert references to the manuals].
A header file contains different material than a code file.
A header file contains information for more than one file:
prototypes for functions[add reference],
global [add reference to glossary] data,
and type definitions
[add reference to compound & user-defined types].
A header file can also include other header files using @code{#include}.
A code file mainly contains
the actual code of the program,
and includes header files (using @code{#include}) so that it recognizes
the information it shares with other files.
@section Trivia
In reality, a header file can include anything ---
the preprocessor [reference?]
just textually replaces each @code{#include} directive
with the entire file it references.
Once this is done,
the compiler can't tell the difference
between lines of text in the original @code{.c} source file
and lines of text that were @code{#include}d.
[/* was:
In reality, a header file can include anything it wants.
The preprocessor [reference?] just textually adds the files together,
and as long as it doesn't depend on a pass of the preprocessor
that occurs before the adding, it will work.
*/]
(This author (David Starner)
once had a program where the main function started in a header file
and finished in the @code{.c} file the header file was included in.)
This is really not recommended;
in general a header file should only contain what is mentioned above.
When the GNU preprocessor encounters
a @code{#include <>} or @code{#include ""},
where does it look for those files ?
[FIXME: answer.]
@section In Comparison
@c In Comparison is written so that one who knows one of the languages can look and tell the major differences in the section,
@c without reading the entire section.
@subsection Java
While a Java class must be saved in a file of the same name,
a C file has no such restriction on file names.
C also has header files (@code{.h})
for data needed in multiple files,
whereas the Java compiler reads the @code{.class} files.
[ARGGH! I don't know any Java (yet).
But I was under the impression that
@code{.java} files were source code
that the Java compiler converted into
@code{.class} executables,
which contradicts this paragraph.]
@subsection C++
C++ uses
@code{.cpp}, @code{.cc}, or @code{.C} source code
and @code{.hpp}, @code{.H}, or @code{.h} header files.
Header files under C++ contain classes and
code in the form of inline functions
as well as what is noted above as being in C header files.
@subsection Pascal
[DVDEUG: ARGGH! I have little familiarity with Pascal.
This is only basic Wirth & Jensen or ISO 7185 Pascal.]
Whereas classic Pascal has a monolithic file structure
with only one file per program,
C has functions and declarations split up between
files and uses header files to hold common data.
[DAV may be bluffing when he says he knows Pascal:]
Many versions of Pascal use Units ...
@node conventions, fundamental data types, Top-Down View of C, Top
@chapter general language conventions
spaces, tabs, whitespace
@section Program Structure
@section Comments
[FIXME: The description about how to use comments needs to flow.]
Everything from a @code{/*} to the first @code{*/}
is a comment intended solely for the human reader.
Comments are ignored by the C compiler;
they have no effect on the executable file.
You can insert a comment anywhere there is "white space".
Please put some comments next to
code you write or change.
You want your comments to tell WHAT your code does, not HOW.
Documentation embedded in the source code is best.
External documentation is all too often separated from the code and lost.
If the documentation is right there in the source code,
it is far easier to update it when the code changes.
@section On one line
@code{ /* @dots{} */ }
@section Blocks
@example
/*
@dots{}
@dots{}
@dots{}
*/
@end example
@section Trivia
Some C compilers (including GNU C) allow the C++ comment,
@code{//} followed by arbitrary text up to a newline.
This is
a standard comment format for C9X.
Unfortunately, too many early C compilers
do not recognize the @code{//} comment style,
so the @code{//} should not be used in portable C source code.
At least one highly-contrived program compiles legally
under both kinds of compilers, but executes differently.
If you have code that uses the @code{//} comment style,
you can convert it to the @code{/* ... */} comment style with
@example
sed -e 'sX^\([^"]*\("[^"]*"[^"]*\)*\)//\(.*\)$X\1/*\3*/Xg' test.cpp > test.c
@end example
[FIXME: what about the DOC++ comment style, ?
http://www.zib.de/Visual/software/doc++/
DVDEUG: It's non-free isn't it? Don't worry about it.
I will add something about literate programming.
]
The extreme in commenting comes with Knuth's literate programming style,
wherein @TeX{} is interlaced with actual code in order to produce
programs that can be read like books. @TeX{} itself is written in this
style. If you are interested, look through the Web2C documentation
and get Cweb.
[How to get this documentation ?]
@node fundamental data types, Int Const Default, conventions , Top
@chapter fundamental data types
Programmers use an infinite number of possible types of data.
All the different types (at least in C) are built
out of the fundamental types
``built-in'' to the C language:
@itemize @bullet
@item
The ``bool'' value
(also called a ``bit''),
which represents either true or false.
@item
The integers (char, int, short, long)
@item
The floating-point numbers (float, double, long double)
@item
Pointers (``*'')
@item
The ``un-type'' (void)
@end itemize
Pointers and user-defined types are covered in a later section
(``Creating New Data Types'').
A data type represents (among other things)
the range of values that a variable can hold.
This range is limited by, and specific to,
the particular compiler used to compile that program.
Every compiler should come with the standard header @code{<limits.h>},
which specifies the largest and smallest values
of its fundamental integer types.
Note that these values are often different for different compilers.
It doesn't make any sense to use any other @code{<limits.h>}
file besides the original one that came with the compiler.
@section Integers (char int long)
@cindex integers
An integer datum always contains some whole number
such as -93, -12, 0, 1, or 69.
But that's not the whole story.
Every value in C must have a specific data type.
As a consequence, it is impossible to simply have the value 7;
it must be 7 in a particular type.
The C language has several different data types for integers.
Each type has a range of possible values;
different types have different ranges.
For example, a variable of type @code{short int}
can hold any value from -32768 to 32767 in
programs compiled by my compiler.
A variable of type @code{int}
can hold a value from -2,147,483,648 to 2,147,483,647 on
most 32-bit machines.
On architectures of other sizes, the range of an @code{int}
could be as small as the range -32,768 to 32,767,
the same as a @code{short int}.
Keep this in mind when writing programs
that might be ported to a smaller machine.
When you define an integer variable,
you must choose one of the standard integer
types for it.
The type controls what range of values the variable can hold,
and at the same time the amount of storage space used for the variable.
It is impossible to
store in a variable a value outside the range of its type;
if you try to do this, the actual result is to store some other value,
a value that is within the permissible range.
(In practice, the extra high-order bits are discarded
and the low-order bits are stored.)
[FIXME: Stroustrup said once,
``unsigned integers, declared @code{unsigned},
obey the laws of arithmetic modulo 2^n.
This implies that unsigned arithmetic does not overflow''.
Is this really true for all standards-compliant compilers ?]
Here are examples of declarations for integer variables:
@example
short int s;
int u, v;
unsigned long x;
unsigned long long z; /* non-portable */
@end example
You can omit the keyword @code{int} if you use
any of the keywords @code{long},
@code{short}, @code{signed} or @code{unsigned}.
@xref{Declarations}.
@section Signed and Unsigned Types
@cindex signed types
@cindex unsigned types
Each integral type has two forms: @samp{signed} and @samp{unsigned}.
The two forms occupy the same amount of storage space
and their ranges are equally large.
A signed type has a range of values centered on zero,
while an unsigned type has a range that starts at zero.
For example,
(on one particular machine using one particular compiler)
the @code{short int} type has 65536 distinct values.
The unsigned form, @code{unsigned short int},
can hold any integer in the range 0 to 65535,
while the signed form, @code{signed short int},
has a range centered on zero, -32768 to 32767 to be exact.
It is common to assume that specific sizes
(16-bit @samp{short}, 32-bit @samp{long})
are ``standard''; this assumption is in error.
(All too often, some people assume that @samp{int} is 16 bits.
Others assume that @samp{int} is 32 bits.
Obviously, both cannot be right, and sometimes both are wrong.)
Similarly, it is not guaranteed
that any integer type can hold a pointer,
although it is quite common for it to be possible.
[FIXME: This paragraph is (or should be) redundant
compared to later information.]
For portability:
@itemize @bullet
@item
Avoid trying to cast a pointer into any integer type.
@item
Use @samp{long} when you need more than 16 bits (up to 32 bits).
@item
Use @samp{char} when you need no more than 8 bits
and want to conserve space.
@item
Use @samp{short} when you need more than 8 bits (up to 16 bits)
and want to conserve space.
@item
Use @samp{int} when you need at most 16 bits and want speed over space.
@end itemize
In general, programs that make assumptions
about the sizes of the integral types
are device drivers for a specific operating system,
or very poorly written.
@kindex signed
@kindex unsigned
You can specify a signed type or an unsigned type by using the keywords
@code{signed} and @code{unsigned} as part of the type name.
@section Table of Integer Types
@kindex int
@kindex short
@kindex char
Here is a table of all the integer types of C,
together with their ranges
(as documented in @code{<limits.h>} in a typical implementation
of GNU C):
You are guaranteed that @samp{char} will be at least 8 bits,
@samp{short}
at least 16, @samp{long} at least 32.
@samp{int} will be at least as large as @samp{short},
and no larger than @samp{long}.
In general, each type will be no larger than the next larger type.
For example, there are implementations where all of the integral types
are 64 bits.
@table @code
@item int
@itemx signed int
Four-byte signed integer; range -2^31 to 2^31-1.
Guaranteed to be at least as large as @samp{short},
i.e., on smaller machines,
the range could be as small as -2^15 to 2^15-1.
@item unsigned int
Four-byte unsigned integer; range zero to 2^32-1.
On smaller machines, the range could be as small as zero to 2^16-1.
@item short int
@itemx signed short int
Two-byte signed integer; range -2^15 to 2^15-1.
@item unsigned short int
Two-byte unsigned integer; range 0 to 2^16-1.
@item signed char
One-byte signed integer; range -128 to 127.
@item unsigned char
One-byte unsigned integer; range 0 to 255.
@item char
Depending on the machine, @code{char} has the same range as either
@code{signed char} or @code{unsigned char}.
The only values that you can count on to fit in a @code{char}
regardless of the type of machine are 0 to 127.
@item long int
@itemx unsigned long int
In GNU C on typical 32-bit machines,
these types have the same size and range as @code{int}
and @code{unsigned int}.
In some other C implementations, @code{long int}
occupies more bytes than @code{int}.
For example, in the original implementation of C, @code{int}
occupied only two bytes (like @code{short int}),
and to get a four-byte integer
it was necessary to use the type @code{long int}.
@item long long int
Double precision signed integers ranging from -2^63 to 2^63-1.
These integers occupy 8 bytes.
(Most C compilers don't support this type.)
@item unsigned long long int
Double precision unsigned integers ranging from 0 to 2^64-1.
These integers occupy 8 bytes.
(Most C compilers don't support this type.)
@end table
Even though two types may be equivalent
(@code{int} and @code{long int} are equivalent in my compiler, and
@code{char} is always equivalent to either @code{unsigned char} or @code{signed char})
they are considered distinct types.
For example, the types pointer-to-@code{int} and pointer-to-@code{long int}
are completely different types.
@section Integer Constants
@cindex integer constant
@cindex octal
@cindex decimal
Any nonnegative integer value can be written as a constant.
There are no constants for negative integer values,
but unary @samp{-} and a positive constant do the job of one.
@subsection Integer Constant Radices
There are three ways of writing integer constants:
decimal, octal and hexadecimal.
@itemize @bullet
@item
A decimal constant is a sequence of digits not starting with a zero.
Any positive number except zero can be written this way.
@item
An octal constant is a sequence of digits starting with a zero.
The zero tells the compiler to interpret the digits in base 8.
Thus, @samp{010} has value 8,
@samp{013} has value 11, and @samp{0100} has value 64.
Strictly speaking, @samp{0} is an octal constant.
But 0 is 0 in any radix.
@item
@cindex hex digit
@kindex 0x
A hexadecimal (or @dfn{hex}) constant is @samp{0x} (or @samp{0X})
followed by a sequence of @dfn{hex digits}.
A hex digit is either a decimal digit, or a letter
in the range @samp{a} through @samp{f} (upper or lower case).
@samp{a} stands for 10,
@samp{b} for 11, and so on, through
@samp{f} for 15.
Thus, the hex constant @samp{0xa} has value 10,
@samp{0x10} has value 16, @samp{0x16} has value 22,
@samp{0x20} has value 32, and @samp{0xff} has value 255.
@end itemize
Hexadecimal constants are used more often than octal constants,
because it is easy to see how a hexadecimal constant breaks down
into separate bytes.
Each pair of hexadecimal digits makes one byte.
Octal constants don't split conveniently into bytes.
@node Int Const Default, Int Const Type, fundamental data types, Top
@subsection Default Type of an Integer Constant
Like all C expressions,
an integer constant specifies a data type as well as a value.
The type is usually determined by the value,
unless you use a suffix letter (@pxref{Int Const Type}).
The type of a decimal constant is taken from the following series:
@example
int, long int, unsigned long int
@end example
@noindent
The type of a decimal constant is the first type in that series
which can hold the constant's value.
Thus, any value that is small enough will have type @code{int}.
In GNU C on machines where @code{long int} has the same range as @code{int},
@code{long int} never plays a role;
not so in other C implementations.
The type of an octal or hex constant is taken from the following series:
@example
int, unsigned int, long int, unsigned long int
@end example
@noindent
There are some values that can fit in an @code{unsigned int}
but not in an @code{int}; if the constant is written in octal or hex,
that unsigned type is used for such values.
In GNU C
(since @code{unsigned int} and @code{unsigned long int}
have the same range),
some values have type @code{unsigned int}
when written as an octal or hex constant,
but have type @code{unsigned long int}
when written as a decimal constant.
@node Int Const Type, Int Conversion, Int Const Default, Top
@subsection Explicitly Typed Integer Constants
@kindex l
@kindex u
The letters @samp{u} and @samp{l} may be used as suffixes
to specify the type of an integer constant.
The letter @samp{u} means it must be unsigned.
The letter @samp{l} means it must be long.
(Upper case is accepted also;
in fact, @samp{L} is better than @samp{l}
because @samp{l} looks too much like a @samp{1}.)
The effect of the suffix
is to reject certain types from the series of possible types.
(The series of possible types depends on the constant's radix;
@pxref{Int Const Default}).
@samp{l} rejects the types that are not long,
and @samp{u} rejects those that are signed.
Once those are rejected, the type used
is the first of those remaining which can hold the actual value.
@subsection Integer Constant Type Examples
Here are some examples of integer constants
and their types.
@itemize @bullet
@item
The hex constant @code{0x80000000} needs 32 bits.
On my compiler [DVDEUG: GNU C?], its type is @code{unsigned int},
because it can fit in that,
whereas it is just barely too large for an @code{int}.
@item
@code{2147483648} is the same value, expressed in decimal.
It is a @code{long unsigned int} because @code{unsigned int}
is never used for decimal constants,
and neither @code{int} nor @code{long int} will hold this value.
@item
@code{0x80000000L} is likewise a @code{long unsigned int}.
@code{unsigned int} is ruled out by the @samp{L},
so the next candidate type that can hold the value is used.
@item
@code{0x80000000u} is an @code{unsigned int},
just like @code{0x80000000}.
The @samp{u} rules out @code{int},
but that has no effect,
since this value doesn't fit in an @code{int} anyway.
@item
@code{2147483648u} is an @code{unsigned int}.
@code{int} and @code{long int} are ruled out by the @samp{u};
under ISO C the @samp{u} suffix also makes @code{unsigned int}
available to decimal constants,
so it is the first candidate that can hold the value.
@item
@code{3l} is a @code{long int}.
@code{int} is barred by the @samp{l}
and @code{long int} is the next candidate for a decimal constant.
[FIXME: Is this `@code{ul}' really valid ?]
@item
@code{4ul} is an @code{unsigned long int}.
@code{int} is barred by the @samp{l};
@code{long int} is barred by the @samp{u};
@code{unsigned long int} is the next candidate for a decimal constant.
@end itemize
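The suffix rules can be checked by asking the compiler which type it chose.
The following is a minimal sketch, not part of the original examples:
it assumes a C11 compiler (for @code{_Generic}),
and @code{IS_TYPE} and @code{suffix_rules_hold} are hypothetical names.
Only checks that hold on every platform are included;
a check such as @code{IS_TYPE(0x80000000, unsigned int)}
would succeed only where @code{int} is 32 bits wide.

@verbatim
```c
/* IS_TYPE is a hypothetical helper; _Generic needs a C11 compiler.
   It yields 1 when expr has exactly type T, and 0 otherwise. */
#define IS_TYPE(expr, T) _Generic((expr), T: 1, default: 0)

/* Returns 1 when the suffixes select the types described in the text.
   None of these checks depends on the sizes of int and long. */
int suffix_rules_hold(void)
{
    return IS_TYPE(3, int)                    /* no suffix, small value */
        && IS_TYPE(3l, long int)              /* l bars int             */
        && IS_TYPE(4ul, unsigned long int)    /* u and l together       */
        && !IS_TYPE(3l, int);
}
```
@end verbatim

Calling @code{suffix_rules_hold()} should return 1 on a conforming compiler.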
@node Int Conversion, Int Promotion, Int Const Type, Top
@section Type Conversion among Integer Types
C allows automatic conversion between integer types.
Conversions can be requested explicitly with casts (@pxref{Casts});
they also happen automatically
when the operands of an arithmetic operator have different types,
and for integer promotion (@pxref{Int Promotion}).
@node Int Promotion, variables, Int Conversion, Top
@section Default Integer Promotions
@cindex promotions (integer)
In C, the @code{short} and @code{char} types (whether signed or not)
are nominally never used for any operation.
Values of these types appearing in arithmetic expressions
are always converted to type @code{int} before any arithmetic is done,
before they are passed as arguments to a function, and so on.
In fact, the GNU C compiler may omit the conversion,
but only when this has no effect on the result.
For understanding the meaning of a C program,
you can assume that the conversion always happens.
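The effect of promotion can be observed directly.
The sketch below uses hypothetical function names
and assumes that @code{unsigned char} is 8 bits wide
and narrower than @code{int}.

@verbatim
```c
/* average_promoted is a hypothetical name used only for this sketch. */
unsigned char average_promoted(unsigned char a, unsigned char b)
{
    /* a and b are promoted to int, so a + b can exceed 255
       without wrapping around before the division. */
    return (unsigned char)((a + b) / 2);
}

/* The sum of two unsigned char operands has type int,
   so it has the size of an int, not of a char. */
int sum_has_int_size(void)
{
    unsigned char a = 0xff, b = 0xff;
    return sizeof(a + b) == sizeof(int);
}
```
@end verbatim

With both arguments equal to @code{0xff}, the sum is the @code{int} value
@code{0x1fe}, and half of that is @code{0xff}.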
@ignore
Controversy over previous 2 paragraphs.
[DVDEUG: Whoa! Default promotion like this disappeared with ISO C!]
[DAV:
Promotion seems to be alive and well. I run
#include <stdio.h>
main(){
unsigned char a, b, c;
a = 0xff;
b = 0xff;
c = (a+b)/2;
printf("%i", (unsigned int)c );
}
If there is *no* promotion, then
(uchar)0xff + (uchar)0xff
should equal (uchar)0xfe.
Divide by 2, and we get 0x7f (printing "127").
However, when I compile this with
$ gcc --version
2.7.2.3
and run it, it prints "255".
Is there a better explanation than just saying that the
chars were promoted to int, so that the result of addition
(uchar)0xff + (uchar)0xff
is (int)0x01fe ?
]
FIXME: What is the most understandable way of summarizing this ?
I prefer easy-to-understand "as if" rules,
even if a particular compiler doesn't happen to actually work that way
internally -- as long as we get the same results.
@end ignore
@section Floating Point Numbers (float, double, long double)
@cindex floating point
@cindex mantissa
@cindex exponent
@cindex scientific notation
@dfn{Floating-point} numbers
are the computer's version of ``scientific notation''.
Floating point data is often called ``real'' data
but strictly speaking this is a misuse of language.
Floating point is often used to represent real-number values,
but general real numbers cannot be exactly represented,
only approximated.
[DVDEUG: Add reference to
"What Every Computer Scientist Should Know About Floating-Point Arithmetic".
Also add references to NaN and Inf.]
[FIXME: is all this really necessary ?
Can't we just say that fixed-point numbers can
handle fractions and numbers of a certain range
with a certain precision, and be done with it ?]
In scientific notation,
a number is represented as the product of its @dfn{mantissa},
which is a number between 1 and 10, and a power of 10.
The power of 10 used is called the @dfn{exponent} of the number.
Here are some examples of numbers in scientific notation:
@table @asis
@item 129
1.29 * (10^2)
@item 100
1.0 * (10^2)
@item 99
9.9 * (10^1)
@item 5.5
5.5 * (10^0)
@item .125
1.25 * (10^-1)
@end table
Floating point notation in the computer
is the binary equivalent of scientific notation.
The mantissa is between 1 (inclusive) and 2 (exclusive)
and is represented in binary;
the exponent is a power of 2 instead of 10.
Here is how the previous examples would look
in the computerized format:
@table @asis
@item 129
1.0000001 * (2^7)
@item 100
1.1001 * (2^6)
@item 99
1.100011 * (2^6)
@item 5.5
1.011 * (2^2)
@item .125
1.0 * (2^-3)
@end table
Note that the exponent of the number zero is not really determined
because 0 * (2^0) = 0 * (2^1) = 0 * (2^@var{anything}).
By convention, when zero is represented as a floating-point number,
zero is used as the exponent value.
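The binary mantissa and exponent of a value can be found
by repeated halving or doubling.
The following is only a sketch of that idea, with a hypothetical function name;
it assumes the argument is positive and finite,
and it does not inspect the machine's actual bit layout.

@verbatim
```c
/* decompose is a hypothetical helper.  For a finite v > 0 it returns the
   mantissa m with 1 <= m < 2 and stores the exponent in *e, so that
   v equals m * 2^(*e).  It must not be called with zero, a negative
   number, an infinity or a NaN. */
double decompose(double v, int *e)
{
    *e = 0;
    while (v >= 2.0) { v /= 2.0; ++*e; }   /* too large: halve   */
    while (v < 1.0)  { v *= 2.0; --*e; }   /* too small: double  */
    return v;
}
```
@end verbatim

For 5.5 this gives the mantissa 1.375 (1.011 in binary) and the exponent 2;
for .125 it gives the mantissa 1.0 and the exponent @minus{}3.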
@section Floating Point Types
Floating-point data types in the computer differ in how many bits are
available for representing the mantissa and the exponent. The number of
mantissa bits determines how much significance can be represented; the
number of exponent bits determines the overall range of magnitudes that
can be represented.
For example, if 7 bits are available for the exponent,
the range of possible exponents is from @minus{}64 to 63,
so the range of possible floating point values
is from 2^@minus{}64 to 1.111@dots{} * 2^63.
With 8 exponent bits, the range of possible exponents doubles
(from @minus{}128 to 127), so the smallest possible positive value
is 2^@minus{}128 and the largest is 1.111@dots{} * 2^127.
If only 4 bits were available for the mantissa,
it would be impossible to distinguish the numbers 16 and 17
(10000 and 10001 in binary).
Only the first 4 significant bits, 1000 in both cases, could be kept.
In actuality, at least 24 bits of mantissa are always available.
This translates to around 7 significant decimal digits.
Since the first bit of the mantissa is always one,
it is often not explicitly represented.
[FIXME: Is this always true for all GNU C implementations ?]
All ANSI C implementations provide three distinct data types for
floating point numbers:
@code{float},
@code{double}, and
@code{long double}.
In GNU C,
@code{float} is a 32-bit single-precision number;
32 bits are available for the mantissa, exponent and sign bit.
Just how the bits are apportioned among mantissa and exponent
depends on the kind of computer in use.
@code{double} is a 64-bit double-precision number.
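@code{long double} has at least the range and precision of @code{double};
on some machines the two are represented identically,
but @code{long double} is always considered a distinct type.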
@section Floating Point Constants
Floating point constants let you express particular floating-point
numbers in C programs. Each floating-point constant specifies a numeric
value and a data type (either @code{float}, @code{double} or @code{long
double}).
The numeric value consists of a mantissa optionally followed by an
exponent. The mantissa is a number with a decimal point.
An exponent is the letter @samp{e} (or @samp{E}) followed by an integer
which may have a sign.
If an exponent is given, the decimal point is not required in
the mantissa.
Here are some examples, all of which have the value 150:
@example
150.0
150e0
15e1
1.5e2
1.5e+2
1.500e2
.015e4
@end example
A letter at the end of the constant specifies the data type.
The letter @samp{F} (or @samp{f}) specifies type @code{float}.
The letter @samp{L} (or @samp{l}) specifies type @code{long double}.
No letter at all specifies the default, which is @code{double}.
It is rarely necessary to use letters to specify the type explicitly.
One time when it is useful is when using the constant in arithmetic
together with values of type @code{float}.
If you do not explicitly specify the type,
the constant is a @code{double},
so the compiler adds code to convert the other values to @code{double}
and the arithmetic is done in @code{double} precision.
If the result that you want is a @code{float},
the extra conversions may make the program unnecessarily slow.
You can avoid the extra conversions
by explicitly specifying the type of your constant as @code{float},
like this:
@example
@{
float x, y = 0.5f;
x = (y + 1.3f) * 2.4f;
@}
@end example
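The effect of the suffix letters can be seen with @code{sizeof}.
This is a rough sketch with hypothetical function names;
equal sizes suggest, but do not strictly prove, that the types are the same.

@verbatim
```c
/* Each function reports whether a constant has the size of the named type. */
int unsuffixed_is_double(void)    { return sizeof(1.3)  == sizeof(double); }
int f_suffix_is_float(void)       { return sizeof(1.3f) == sizeof(float); }
int l_suffix_is_long_double(void) { return sizeof(1.3L) == sizeof(long double); }

/* With float constants, arithmetic on a float operand yields a float. */
float scale(float y)
{
    return (y + 1.3f) * 2.4f;
}
```
@end verbatim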
@section the ``un-type'' (void)
The ``un-type'' @code{void} is used only in these 3 common situations:
@itemize @bullet
@item as the entire parameter list of a function which takes no arguments
@item as the target type of a generic pointer: a pointer of type @code{void *}
can point to an object of any type (see Pointers)
@item as the return type of a function which doesn't return anything
(see Functions for both flavors of this situation).
@end itemize
There are no objects of type @samp{void}.
@section Numeric Type Conversion
In C, any numeric type can be converted automatically to any other
numeric type.
Type conversion happens
in assignments, in arithmetic, and in casts (@pxref{Casts}).
It may also happen in @code{return} statements (@pxref{Return}) and
in function calls when a prototype is in effect (@pxref{Prototype}).
For example, if @code{x} is a variable declared as @code{int}
and @code{f} is declared as @code{float}, then
@example
f = x;
@end example
@noindent
converts the value of @code{x} to floating point and
@example
x = f;
@end example
@noindent
converts the value of @code{f} to an integer.
If a constant appears in a context where
it would need to be converted immediately to another type,
GNU C converts it while compiling the program.
Normally this makes no difference except to speed up execution.
@section Integer Conversion
The general rule when converting a value from one integer type to
another
is that the numeric value is unchanged
if it is within the range of possible values for the new type.
If it is outside the possible range,
then the number's bit pattern is preserved. [FIXME: This is confusing.]
If the number has too many bits to fit, then the least significant bits
are kept, as many as will fit.
@cindex extending
Converting an integer of a narrower type to a wider integer type
(such as @code{char} to @code{int}) is called @dfn{extension}.
If the types are signed, it is called @dfn{sign-extension}.
If the original type is unsigned, it is called @dfn{zero-extension}.
In either case, the number keeps the same value.
There is one other case of extension, from a signed type to an unsigned
one.
This case is an exception because only positive values can go through
unchanged; negative values cannot do so because the unsigned type
cannot represent them.
A negative number large in absolute value becomes a small positive
number, and a negative number close to zero becomes a large positive
number.
This case is error-prone, so check carefully whenever you write code
that converts @code{signed} numbers to @code{unsigned}.
@cindex truncation
When a value of wider type is converted to a narrower type, it keeps
the same value if possible; but often this is impossible.
For example, 513 (1000000001 in binary) cannot keep the same value when
converted to a @code{char}; it is outside the possible range of a
@code{char}.
In this case, the least significant bits remain the same and the rest
are lost.
Thus, 513 converts to the @code{char} value 1.
This is called @dfn{truncation}.
Sometimes truncation of a positive value has a negative result.
For example, truncating 129 (10000001 in binary) to a @code{char} has
the value @minus{}127 because the first 1 in the number is now the
sign-bit.
Of course, this happens only when the result type is a signed type.
There is one other case of integer type conversion, that where the old
and new types are equally wide but one is signed and the other is
unsigned.
In this case, the bit pattern is preserved.
For example, when converting from @code{char} to @code{unsigned char}
or vice versa, values 0 through 127 are unchanged.
@code{char} values @minus{}128 through @minus{}1 map into
@code{unsigned char} values 128 through 255, respectively, and vice
versa.
It was shown above how 129 as an @code{unsigned char} corresponds to
@minus{}127 as a @code{char}.
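The conversions described in this section can be tried out as follows.
This is a sketch with hypothetical function names.
It assumes 8-bit characters and a two's complement machine;
ISO C leaves the result of an out-of-range conversion to a signed type
up to the implementation.
The signed case uses @code{signed char} explicitly,
because plain @code{char} is unsigned on some machines.

@verbatim
```c
#include <limits.h>

/* Narrowing to an unsigned type keeps the low-order bits: 513 becomes 1. */
unsigned char truncate_to_uchar(int v) { return (unsigned char)v; }

/* Narrowing to a signed type: 129 becomes -127 where the bit pattern is
   kept (implementation-defined in ISO C when the value does not fit). */
signed char truncate_to_schar(int v) { return (signed char)v; }

/* Signed to unsigned of the same width: -1 becomes UINT_MAX. */
unsigned int to_unsigned(int v) { return (unsigned int)v; }
```
@end verbatim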
@section Floating Point Conversion
When a value of type @code{float} is converted to @code{double}, it
keeps the same numeric value.
@code{double} can represent anything that @code{float} can.
Likewise when @code{float} or @code{double} is converted to @code{long
double}, accuracy is maintained.
@cindex floating overflow
When a value of type @code{double} is converted to @code{float}, two
kinds of problems must be faced.
@itemize @bullet
@item
@code{float} has fewer mantissa bits.
The most significant mantissa bits are kept, as many as will fit, so
that the result is close to the original value even if not exactly the
same.
@item
@code{float} has fewer exponent bits, so its largest possible value is
smaller.
If the number being converted fits in the possible range of a
@code{float}, this problem has no effect.
If the number does not fit, the result is pure garbage, this being an
example of @dfn{floating overflow}.
@end itemize
@section Integer to Floating Point
When an integer value is converted to a floating point type, in
general, the result is the floating point value which is numerically
closest to the original integer.
In some cases, the integer can be represented exactly.
For example, converting the integer 5 to @code{float} results in the
number 1.25 * 2^2, or, in binary, 1.01 * 2^2, whose value is exactly 5.
But this is not possible for large integers.
An @code{int} has 31 significant bits; in a @code{float}, some of the
32 bits are needed for sign and exponent, leaving typically 24 bits of
significance.
Integers that need more significant bits than this cannot, in general, be represented exactly.
For example, both 268435456 and 268435457 convert to the same floating
point number (these integers are 2^28 and 2^28+1).
This loss of significance does not happen when converting an @code{int}
to a @code{double} because type @code{double} has more than 32 bits of
mantissa.
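The loss of significance can be demonstrated with the two integers
mentioned above.
The sketch below uses hypothetical function names and assumes a @code{float}
with 24 bits of significance and an @code{int} of at least 32 bits.
The @code{volatile} qualifier is there to keep the compiler
from holding the values in a wider register.

@verbatim
```c
/* Returns 1 if a and b become the same value when converted to float. */
int same_as_float(int a, int b)
{
    volatile float fa = (float)a;   /* volatile forces a real float store */
    volatile float fb = (float)b;
    return fa == fb;
}

/* Returns 1 if a and b become the same value when converted to double. */
int same_as_double(int a, int b)
{
    volatile double da = (double)a;
    volatile double db = (double)b;
    return da == db;
}
```
@end verbatim

Under those assumptions, @code{same_as_float(268435456, 268435457)} returns 1
while @code{same_as_double(268435456, 268435457)} returns 0.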
@section Floating Point to Integer
When a floating-point value is converted to an integer type, the
fractional part is discarded; that is, the value is rounded toward zero.
Thus, 1.5 converts to 1, and @minus{}1.5 converts to @minus{}1.
A floating-point value may far exceed the range of an @code{int}.
For example, the largest possible @code{float} value is at least 2^64
--- much too large for an @code{int}.
When such values are converted to @code{int}, the result is undefined.
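One way to stay clear of the undefined case is to test the range
before converting.
The following is a sketch only; @code{to_int_checked} is a hypothetical name,
and the test assumes that every @code{int} value can be represented
exactly as a @code{double}, which holds for a 32-bit @code{int}.

@verbatim
```c
#include <limits.h>

/* to_int_checked is a hypothetical helper: it stores the truncated value
   in *out and returns 1, or returns 0 (leaving *out alone) when v cannot
   be converted safely. */
int to_int_checked(double v, int *out)
{
    /* Written as a negated "in range" test so that a NaN fails as well. */
    if (!(v > (double)INT_MIN - 1.0 && v < (double)INT_MAX + 1.0))
        return 0;
    *out = (int)v;   /* the fractional part is discarded */
    return 1;
}
```
@end verbatim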
[FIXME: this section needs work]
@section Trivia
You can make @code{bool} variables in C++,
but not in ordinary C.
C9x has provisions for boolean variables. [DVDEUG: Specify!!]
Some compilers (GNU C among them) add the type @samp{long long},
which is most often 64 bits.
It is not compatible with ISO C, in which it is a syntax error,
but it may prove helpful or necessary during porting projects.
[FIXME: a few compilers have something vaguely similar to
_int16, _int32, _int64, and others - are
they worth mentioning here ? DVDEUG: I don't see why;
they are extremely non-standard, and not part of GNU-C. ]
@section In Comparison
A function with a @code{void} return type is called
a ``procedure'' in many other languages.
@node variables, pointers, Int Promotion, Top
@chapter variables
@section declaring variables
@example
int x;
@end example
@noindent
declares @code{x} to have type @code{int}.
Every variable used in a C program must be defined, in a
@dfn{declaration}, before it is used.
The declaration has five purposes:
@enumerate
@item
To give the function or variable a name, so it can be used later.
@item
To describe the data type of the function or variable: for example,
whether the value is an integer or a character string.
This is done with a @dfn{type specifier} and a @dfn{declarator}.
[@var{declarator} may not be an English word, but it is the
standard term.]
@item
To specify how storage for a variable should be allocated.
This is done with a @dfn{storage class} (@pxref{Storage Class}).
@item
To specify the @dfn{scope} of the name: for example, whether the name
is known in an entire program or only in the current file or function.
The storage class fills this role also.
@item
Optionally, to give an initial value.
This is done with an @dfn{initializer} (@pxref{Initializers}).
@end enumerate
@section initializing variables
If the variable is static or automatic [FIXME: what other kind of
variable is there ?], an initializer may be added, as in
@example
int x = 5;
@end example
@noindent
which is the same as the previous example except that @code{x} is
initialized to 5 when its storage is allocated.
@xref{Initializers}.
@section Assignment statements (and combinatorial assignment)
[FIXME: huh ?]
@section choosing variable names
start with letter ...
number, ...
underscore ...
...
[FIXME: is there a maximum length ? ] ...
Normal programs cannot use the C keywords for identifiers (variable
names, function names, and user-defined type names). I also
highly recommend that you do not use these other special reserved
words for identifiers:
@table @asis
@item Words that start with underscore
@item C keywords
[FIXME]
@item C++ keywords
asm catch class delete friend
inline new operator private protected
public template try this virtual throw
@end table
@section Type Conversion
@section Automatic type conversion
@section Type casting
@section Quantization Errors
@node pointers, Creating New Data Types, variables, Top
@chapter pointers
@section the pointer type
A pointer represents the address of a block of memory, together with
the data type of the block.
Pointers have several uses:
@itemize @bullet
@item
Pointers represent character strings.
[FIXME: Is this confusing ?
Is this a good pedagogical viewpoint --- that character strings are
directly related to pointers, rather than merely being of type
@code{char []} ?]
@item
A subroutine can be told where to store its output-value by giving it a
pointer to the desired place.
@xref{Address}, for an example of this use.
@item
A subroutine can be told which function to call by giving it a pointer
to the desired function.
@xref{@code{quicksort()}} for an example of this use.
@xref{Function Pointers}.
@item
Trees and linked lists can be created by storing pointers to blocks of
data into other blocks of data.
@xref{Lists}, for an example of this use.
@end itemize
@section declaring pointers
@cindex pointer types
@cindex pointer declarations
In C, every expression must have a single clearly defined data type.
This includes an expression to refer to the contents of a pointer.
C determines the type of the contents by the type of the pointer.
Therefore, C has many types of pointers --- one for each type of
contents.
Each C data type @var{t} has a corresponding pointer type, the type of
pointers-to-@var{t}.
A value of type pointer-to-@var{t} describes the address of a block of
memory whose contents have type @var{t}.
To declare a variable @var{v} to have type pointer-to-@var{t}, pretend
you are declaring @code{* @var{v}} to have type @var{t}.
(This isn't much of a pretense, because @code{* @var{v}} will be an
expression of type @var{t}.) @xref{Declarations}.
For example, to declare @code{p} as a pointer to a @code{char}, write:
@example
char* p;
@end example
[FIXME: Can I delete this paragraph ? Does it say anything that hasn't
already been said, and better, by the previous few paragraphs ?]
A
pointer type is a derived type, and cannot be the basic type of a
declaration. To declare a variable with pointer type, you must also
specify: ``To what type of thing does this variable point ?''. To
declare @var{v} with type pointer-to-@var{t}, one must declare the
complex declarator @code{* @var{v}} to have base type @var{t}. For
example,
[/FIXME]
@example
char* string;
@end example
@noindent
declares @code{string} to be a pointer to @code{char}.
Here the declarator is @code{* string} --- a complex declarator that
expresses the relationship between @code{string}'s type and the
declaration's basic type (@code{char}).
To express pointers to types that are not themselves basic, the @code{*
@var{var}} construct is nested within other declarator constructs.
For example, a pointer to a pointer-to-@code{char} is declared as
follows:
@example
char (*(* stringptr));
@end example
In this case, the parentheses are optional.
This is exactly equivalent to
@example
char** stringptr;
@end example
A pointer-to-a-pointer is commonly called a @dfn{handle}; in this case,
we have a ``handle-to-a-@code{char}''.
If you want a variable named @code{funcptr} to point to a function taking
two @code{double} arguments and returning @code{int}, write:
@example
int (*funcptr)(double, double); /* funcptr is a pointer variable */
@end example
@noindent
Here parentheses are required around @code{*funcptr} to specify that
the @code{*@var{var}} construct is nested within the function-type
construct.
[FIXME: David still doesn't know how to parenthesize arbitrary type
declarations ... is there a simple rule ?]
If you had written
@example
int* funcptr(double, double); /* function prototype */
@end example
@noindent
the compiler would think that you were declaring the function prototype
@example
int* (funcptr(double, double)); /* identical function prototype */
@end example
@noindent
a function whose value is a pointer to an @code{int}.
@xref{Precedence}.
You can add an initializer to a pointer declarator for static and
automatic variables [FIXME: what other kind of variables is there ?].
For example,
@example
char* string = "Hello";
char **stringptr = &string;
int (*funcptr) (double, double) = &double_divide_and_round;
@end example
@noindent
Note that the initializer is added after the entire declarator, but the
value of the initializer must have the same type as the variable being
declared --- @emph{not} the basic type of the declaration.
[@var{initializer} is not an English word, but a special term for
talking about C programs.]
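To show a function pointer in use, here is a sketch that supplies
a definition for @code{double_divide_and_round}.
The text names this function but does not define it,
so the body given here is an assumption made for illustration,
as is the helper @code{call_through_pointer}.

@verbatim
```c
/* Hypothetical definition of the function named in the initializer
   example: divide x by y and round the quotient to the nearest integer,
   with halves rounded away from zero. */
int double_divide_and_round(double x, double y)
{
    double q = x / y;
    return (int)(q >= 0.0 ? q + 0.5 : q - 0.5);
}

/* funcptr is a pointer variable, initialized as in the text. */
int (*funcptr)(double, double) = &double_divide_and_round;

/* Calls whatever function funcptr currently points to. */
int call_through_pointer(double x, double y)
{
    return (*funcptr)(x, y);   /* funcptr(x, y) means the same thing */
}
```
@end verbatim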
@subsection The generic pointer type @code{void *}
The type @code{void *} is used, by convention, for the address of a
block of memory to which no particular type is ascribed.
For example, dynamic memory allocation functions typically return this
type.
If a dynamic allocation function is intended for general use, then
there is no telling what type of data the caller wants to allocate ---
any C data type is possible --- so there is no reason to prefer any one
type for the function to return.
But the value must have @emph{some} type.
@code{void *} is a noncommittal choice.
A pointer of type @code{void *} has no ``contents''; you cannot apply
the @samp{*} operator to it.
However, you can cast it to any other pointer type, and @emph{then}
apply the @samp{*} operator.
For example, the following is valid:
@example
char c;
int i;
struct foo s;
void * x;
x = malloc( sizeof(struct foo) );
c = * (char *) x;
i = * (int *) x;
s = * (struct foo *) x;
@end example
[FIXME: Is this really valid ? I've seen some mainframe operating
systems, if you try to read data out of a uninitialized block, will
core dump your program.]
[FIXME: perhaps a more useful example would be
better here.]
@noindent
Here the block of memory that @code{x} points to is examined first as a
@code{char}, then as an @code{int}, and finally as a @code{struct foo}.
@code{void *} pointers may not be added or subtracted, but they may be
compared like any other pointers.
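A common use of @code{void *} is a function that works on objects
of any type.
The sketch below is one possibility, with the hypothetical name
@code{swap_objects}; it casts the generic pointers to
@code{unsigned char *} before touching the memory.

@verbatim
```c
#include <stddef.h>

/* swap_objects is a hypothetical helper: it exchanges two objects of
   n bytes each, whatever their type.  The objects must not overlap. */
void swap_objects(void *a, void *b, size_t n)
{
    unsigned char *pa = (unsigned char *)a;   /* cast before dereferencing */
    unsigned char *pb = (unsigned char *)b;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```
@end verbatim

The same function can be used for two @code{int} variables with
@code{swap_objects(&x, &y, sizeof x)} and for two structures alike.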
@section where do pointer values come from ?
Pointer values arise in three ways:
@itemize @bullet
@item
The address operator @samp{&} can make a pointer to any variable,
function, array element or structure element.
(Even variables of the user-defined data types discussed in the next
chapter.)
@item
Dynamic storage allocation reports its results as a pointer to the
memory that was allocated.
@item
A null pointer can be made by converting the integer constant zero to a
pointer type.
@end itemize
@subsection Address of a Variable
@kindex & (unary)
@cindex address
The unary operator @samp{&} returns the @dfn{address} of a variable (or
other lvalue).
The contents of this pointer are that variable.
[FIXME: Does this
sentence make sense ? or is this redundant from our discussion of
@code{*} ?].
@samp{&} can be applied to both local and global variables.
For example, suppose that @code{read_two()} is a function that reads
two integers from an input file.
A function can return only one value, so the most convenient way to get
two integers back from @code{read_two()} is to provide two pointers as
arguments, saying where to put the integers.
Then, if we want the integers to be stored in the variables @code{i1}
and @code{i2}, we can write:
@example
read_two(&i1, &i2);
@end example
We would use the following declaration for @code{read_two()} (for info
on @code{void}, @pxref{Void Functions}):
@example
void read_two(int* i1, int* i2);
@end example
@samp{&} is not limited to variables.
It can also be used with structure, union and array elements.
For example, suppose that @code{a} is an array of @code{MAX_INTS}
integers and we want to fill it up with pairs read with
@code{read_two()}.
The following code will work:
@example
int a[MAX_INTS];
int i;
for(i = 0; i + 1 < MAX_INTS; i += 2)@{
read_two(&a[i], &a[i + 1]);
@}
@end example
@code{&@var{a}[@var{i}]} means a pointer to element number @var{i} in
array @var{a}.
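The pieces above can be put together as follows.
This is a sketch under stated assumptions:
the @code{read_two} shown here is only a stand-in that hands out
consecutive integers instead of reading an input file,
and the value of @code{MAX_INTS} and the name @code{fill_pairs}
are chosen for illustration.

@verbatim
```c
#define MAX_INTS 6   /* hypothetical size, chosen for the sketch */

/* Stand-in for the read_two() of the text: instead of reading an input
   file, it stores the next two integers of the sequence 1, 2, 3, ... */
void read_two(int *i1, int *i2)
{
    static int next = 1;
    *i1 = next++;
    *i2 = next++;
}

/* Fills the MAX_INTS elements of a, two elements per call. */
void fill_pairs(int a[MAX_INTS])
{
    int i;
    for (i = 0; i + 1 < MAX_INTS; i += 2) {
        read_two(&a[i], &a[i + 1]);   /* pointers to two array elements */
    }
}
```
@end verbatim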
@subsection Dynamic Allocation (malloc and free)
@dfn{Dynamic allocation} means obtaining a block of memory which is
allocated during the execution of the program.
When memory is allocated dynamically, its size need not be known in
advance.
For example, you can write functions to operate on strings with no
fixed upper limit on the size of the string.
A dynamically allocated block of memory cannot have a variable name in
the ordinary sense.
The only way to refer to it is with a pointer.
In the following examples we use @code{malloc}, which is a standard
library function for dynamic allocation.
It is documented elsewhere (see ...[FIXME]).
For now it is enough to know that the argument to @code{malloc} is the
number of @code{char}s of storage desired, and its value is a
@code{void *} pointer to the block that was allocated (@pxref{Void
Pointers}).
For example, suppose we want a character string, but we don't know until
run time how long it needs to be.
Once our program discovers it needs @code{size} characters, it can
allocate the character string dynamically with
@example
string = (char *) malloc (size + 1);
@dots{}
free(string);
@end example
@noindent
where a cast is used to convert the pointer to the correct data type.
A very common error known as a ``memory leak'' happens when you
repeatedly ask for more memory, but ``forget'' to give it back when you
are done with it.
This causes blocks of memory that you no longer need to steadily build
up.
When the program ends, these blocks are returned to the system; but if
your program runs for a long time, eventually there may be no memory
left.
If there is not enough memory left to fulfill your request (either your
program or other programs in the system have already used it all up),
then @code{malloc()} returns a null pointer.
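The allocate, check, use and free pattern might look like this.
It is a minimal sketch; @code{copy_string} is a hypothetical helper,
not a standard library function.

@verbatim
```c
#include <stdlib.h>
#include <string.h>

/* copy_string is a hypothetical helper.  It returns a newly allocated
   copy of s, or a null pointer if malloc cannot supply the memory.
   The caller must pass the result to free() when done with it. */
char *copy_string(const char *s)
{
    size_t size = strlen(s);
    char *copy = (char *) malloc(size + 1);   /* +1 for the final '\0' */

    if (copy == 0)
        return 0;              /* out of memory: tell the caller */
    memcpy(copy, s, size + 1); /* copies the terminating '\0' too */
    return copy;
}
```
@end verbatim

Every successful call should eventually be matched by one call to
@code{free()}; omitting it produces the memory leak described above.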
C++ additionally offers the operators @code{new} and @code{delete},
which are often easier to use than @code{malloc()} and @code{free()}.
@example
string = new char[size+1]; // This only works in C++
@dots{}
delete[] string; // memory obtained with new[] must be released with delete[]
@end example
@subsection Null Pointers
A pointer of any type may have the null value.
Whenever a pointer happens to have the null value, we call the pointer
a ``@dfn{null pointer}''.
The purpose of a null pointer is to be a distinguishable
value that you can put in a pointer variable to say, ``As of now, this
does not point anywhere.''
To create a null pointer, cast the integer zero to the pointer type
that you want.
For example, @code{(char *) 0} is an expression for a null pointer to a
@code{char}.
@code{0} is automatically converted to a pointer of the correct type when it
is assigned to a pointer variable or compared with a pointer value.
A null pointer has no contents.
If a pointer used as the operand of the @samp{*} operator is null, it
is an error.
On some machines, the results are unpredictable; on others, the result
is inevitably a fatal signal (the program will core dump).
If a pointer value may be null, you should check whether this is so
before attempting to use its contents.
The way to do this is to compare against a null pointer expression or
the integer zero.
For example,
@example
#include <stdio.h>
void
safe_contents(char* p)
@{
if(0 == p)@{ /* The compiler converts this `0' to `(char *)0' */
printf("this is a null pointer.\n");
@}else@{
printf("this pointer points somewhere - it points to \"%s\".\n", p);
@}
@}
int
main(void)
@{
char buffer[] = "TEST"; /* a writable array; a string literal must not be modified */
char * x = buffer;
safe_contents(x);
x[0] = 0;
safe_contents(x);
x = 0;
safe_contents(x);
return 0;
@}
@end example
@noindent
causes this to be printed:
@example
this pointer points somewhere - it points to "TEST".
this pointer points somewhere - it points to "".
this is a null pointer.
@end example
@section what do I do with pointer values once I have them ?
@subsection dereferencing (*)
@cindex contents
@kindex * (unary)
Most of the time, a pointer will actually point to a memory block.
We call the contents of that memory block the @dfn{contents of the
pointer}, for short.
To get the contents of a pointer, apply the unary @samp{*} operator to
the pointer value.
Another operator that is used with pointers to structures is @samp{->}.
It takes one structure element of the contents when the contents are a
structure.
@xref{Structure Pointers}.
...
illegal/undefined when the pointer is not pointing at a ``real'' block
...
can cause core dump ...
most random values, as well as the null value ...
...
@subsection pointers and strings
@subsection pointer arithmetic
@cindex addition (pointer)
@cindex subtraction (pointer)
Two arithmetic operations are defined on pointer types: addition and
subtraction.
Not all pointer data types support them: pointers to @code{void} do
not, and pointers to functions do not.
But all other pointer types do.
Addition and subtraction on pointers can also be done with the
modifying assignment operators (@pxref{Modify}) and the
increment/decrement operators (@pxref{Increment}).
[FIXME: should we mention the type @code{size_t} here ?]
@table @code
@item @var{p} + @var{i}
@itemx @var{i} + @var{p}
The result of adding a pointer @var{p} and an integer @var{i} is a
pointer of the same type as @var{p}, but advanced from @var{p} by
@var{i} objects --- by @var{i} times the length of the object that
@var{p} points to.
This means that if @var{p} points to an element of an array,
@code{@var{p}+@var{i}} points @var{i} elements later.
Thus,
@example
&a[3] + 2
@end example
@noindent
is equivalent to @code{&a[5]}; it takes the address of element number 3
and then advances it by two elements' worth.
This is true whether the elements are @code{char}s or @code{double}s
or large structures.
In fact, @code{&a[@var{i}]} is equivalent to @code{&a[0] + @var{i}}.
@item @var{p} - @var{i}
Subtracting an integer from a pointer is really nothing new.
This expression is equivalent to @code{@var{p} + (- @var{i})}.
@item @var{p1} - @var{p2}
Subtraction is also allowed between two pointers of the same type.
The result (an integer) tells how far apart the two pointers lie,
measured in units of the objects pointed to.
For example,
@example
&a[5] - &a[3]
@end example
@noindent
is invariably 2.
(Note that these pointers may be hundreds of bytes apart if the
elements of @code{a} are large structures.)
The compiler subtracts the addresses, then divides the result by the
size of the objects to which they point.
The subtraction is legitimate only if this division comes out even; the
result is not considered well defined otherwise.
When the subtraction is well defined, the result can be added to
@var{p2} to give back @var{p1}.
@item @var{p}[@var{i}]
The array indexing operator, @code{[]}, can be used with a pointer in
place of an array.
In effect, it regards the pointer as pointing to the first element of
an array, and fetches the contents of the @var{i}th element.
This expression is equivalent to
@example
*(@var{p} + @var{i})
@end example
@end table
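The identities in this table can be checked with a short sketch.
The structure type @code{struct big} and the function names are hypothetical;
the element type is made large on purpose, to show that pointer arithmetic
counts elements rather than bytes.

@verbatim
```c
#include <stddef.h>

struct big { double payload[32]; };   /* hypothetical large element type */

/* Returns 1 if advancing &a[3] by two elements gives &a[5]. */
int add_identity(void)
{
    struct big a[8];
    return &a[3] + 2 == &a[5];
}

/* Returns the distance between element 5 and element 3. */
ptrdiff_t element_distance(void)
{
    struct big a[8];
    return &a[5] - &a[3];        /* counted in elements, not bytes */
}

/* Returns 1 if indexing and explicit pointer arithmetic agree. */
int index_identity(const int *p, int i)
{
    return p[i] == *(p + i);
}
```
@end verbatim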
@subsection Comparison of Pointers
All of the comparison operators can be used on two pointer values of
the same type (@pxref{Comparison}).
The integer zero may also be used as one of the operands.
Zero is converted automatically to a null pointer of the same type as
the other operand.
@samp{==} and @samp{!=} test whether two pointer values are identical
(point to the same place).
The order-comparisons @samp{>}, @samp{<}, @samp{>=} and @samp{<=} test
pointers according to the order in memory of the places they point to.
Smaller addresses are considered ``less''.
[FIXME: I (DAV) used a C compiler that put the 20 address bits of its
machine into 3 bytes, but @code{int} was merely 16 - does this make the
following statement wrong/non-compliant, or was my compiler merely
non-compliant ? What about the type @code{size_t} ?]
On many machines, comparing two pointers into the same array gives the
same result as casting them both to an integer type wide enough to hold
an address and comparing the integers, but the ANSI C standard does not
guarantee this.
@xref{Pointer-Integer}.
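As an illustration of order comparison, here is a minimal sketch that
reverses an array by walking two pointers toward each other until they
meet.
The function name @code{reverse_ints} is hypothetical.
Both pointers always point into the same array, which is the case in
which @samp{<} is well defined.

```c
/* Reverse the n ints starting at first, in place. */
void reverse_ints(int *first, int n)
{
    int *lo, *hi;

    if (n <= 0) {
        return;
    }
    lo = first;            /* first element */
    hi = first + n - 1;    /* last element */
    while (lo < hi) {      /* stop when the pointers meet or cross */
        int tmp = *lo;
        *lo = *hi;
        *hi = tmp;
        lo++;
        hi--;
    }
}
```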
@subsection pointers, structures and lists
@subsection passing values between functions by pointers
@subsection pointers to functions
@subsection Pointer-Integer Conversion
A cast (@pxref{Casts}) can convert an integer value to a pointer value,
or a pointer value to an integer value.
The ANSI C standard does not specify exactly what this conversion
means.
GNU C keeps the same bit pattern when it converts.
As a consequence, the conversion takes no time to execute.
Another consequence is that the result of converting any pointer to an
integer is the difference in bytes between that pointer and a null
pointer.
In fact, for a pointer to a @code{char}, converting to @code{int} is
the same as subtracting a null pointer.
In GNU C, converting a pointer to an integer [FIXME: what kind of
integer ? surely not a @code{short int} ?] and then back to a pointer
produces a value equal to the original pointer.
The same is true if an integer is converted to a pointer and then back
to an integer.
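The round trip described above can be sketched as follows.
This sketch assumes a compiler that provides the type
@code{uintptr_t} from @code{<stdint.h>}, which was added by the later
C99 standard and is not part of ANSI C; the function name
@code{round_trip} is hypothetical.

```c
#include <stdint.h>

/* Convert a pointer to an integer and back again.
   uintptr_t is an unsigned integer type wide enough to hold
   any pointer to void (C99, optional but widely available). */
void *round_trip(void *p)
{
    uintptr_t bits = (uintptr_t) p;
    return (void *) bits;
}
```

Choosing an integer type that is too narrow, such as
@code{short int} on most machines, would lose address bits and break
the round trip.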
@section Trivia
@node Creating New Data Types, Arithmetic and Bitwise Operators, pointers, Top
@chapter Creating New Data Types
@section Arrays
@subsection declaring and initializing arrays
@cindex array
@cindex index
An @dfn{array} is a sequence of elements, all of the same type (the
``element type'').
An individual element is identified by its sequence number (called its
@dfn{index}).
An array type is a derived type, and cannot be the basic type of a
declaration.
To declare a variable with array type, you must always specify: ``What
type of things are in this array ?''.
You must usually also specify ``How many
things are in this array ?'' (the ``@var{length}'' of the array,
occasionally called the ``size'' of the array).
To declare an array @var{a} with @var{length} elements of type @var{t},
one must declare the complex declarator @code{@var{a}[@var{length}]} to
have type @var{t}.
For example,
@example
char buffer[5];
@end example
@noindent
declares an array of 5 @code{char} variables and names the array
@code{buffer}.
Here the declarator is @code{buffer[5]} --- a complex declarator that
expresses the relationship between @code{buffer}'s type and the
declaration's basic type (@code{char}).
The length of an array type must be an integer.
The ANSI C standard requires the length of an array type to be a
positive constant known at compile time.
GNU C also allows zero.
GNU C also allows the length of an array of storage class @code{auto}
to be any expression, which is recomputed each time space for the array
is allocated.
(If the length is negative, the results are undefined.)
The length of the array may be omitted if an initializer is present
because the number of elements in the initializer shows how big the
array must be.
The length of the array may also be omitted for an external variable.
The length of the array may also be omitted in function prototypes:
@example
float average_foot_smelliness( int number_of_feet, float foot_smelliness[] );
@end example
@noindent
Unfortunately, only the length of the @emph{first} (leftmost) dimension
of a multidimensional array may be omitted in a function prototype; all
the other dimensions must be explicitly set in the function prototype.
This makes it impossible to write a function to directly accept a 2D
array of arbitrary size.
There are various (incompatible) tricks to work around this inadequacy.
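One common workaround, sketched here under the assumption that the
caller stores the matrix row by row in a single one-dimensional block,
is to pass a pointer to the first element together with both
dimensions and to compute the index by hand.
The function name @code{sum_matrix} is hypothetical.

```c
/* Sum a rows-by-cols matrix stored row by row in one flat block.
   Element (r, c) lives at index r * cols + c. */
double sum_matrix(const double *m, int rows, int cols)
{
    double total = 0.0;
    int r, c;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

The cost of this trick is that the function can no longer use the
@code{m[r][c]} notation; the multiplication that the compiler would
otherwise generate is written out explicitly.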
[FIXME: should I mention a few ?]
[FIXME: Is there any difference between `initialization' v.
`initializer' ?]
You can add an initialization to an array declarator for static and
automatic arrays.
The initializer for an array consists of a pair of braces surrounding a
sequence of element expressions.
The first item in the sequence initializes @code{array[0]}, the next
initializes @code{array[1]}, etc.
Once we run out of element expressions, the rest of the array is
initialized to zero.
For example,
@example
char * table[3] = @{"small", "medium", "large"@};
int values[3] = @{2, 20, 8192@};
int state[3] = @{0@}; /* zero out the entire array */
@end example
In strict ANSI standard C, the elements of an array initializer must be
compile-time constant expressions.
GNU C allows arbitrary expressions to initialize elements of automatic
arrays; for a static array, since the initialization is done when the
program is loaded, the value must still be constant.
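The two rules about initializers (missing elements become zero, and an
omitted length is deduced from the initializer) can be observed with a
small sketch.
The function names @code{sum_partial} and @code{deduced_length} are
hypothetical.

```c
/* Return the sum of a 5-element array whose initializer
   lists only the first two elements. */
int sum_partial(void)
{
    int values[5] = {7, 9};   /* values[2] through values[4] are 0 */
    int i, total = 0;

    for (i = 0; i < 5; i++) {
        total += values[i];
    }
    return total;
}

/* Return the number of elements the compiler deduced from the
   initializer when the length is omitted. */
int deduced_length(void)
{
    int primes[] = {2, 3, 5, 7};
    return (int) (sizeof primes / sizeof primes[0]);
}
```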
Array types in C are unusual because no expression can have an array
type.
Array types are used only for declaring arrays (variables of array
type).
Functions cannot be declared to return any array type.
Whenever an array variable name appears as an expression, it is
immediately converted to a pointer.
That pointer points to the first element of the array.
Even indexing works this way.
(The @var{length} of an array is also called the size of the array.)
@subsection working with arrays
Referring to an element by its index is called @dfn{indexing}.
In C, indexing is represented with square brackets, as in
@code{buffer[2]}.
In C, indices always count from zero.
The previously defined @code{buffer} contains 5 elements, but 5 would
not be a valid index.
Any attempt to read or write @code{buffer[5]} has undefined behavior; it
may cause a core dump or silently corrupt other data.
The only valid indices to this buffer are 0, 1, 2, 3 and 4; in
other words, we can read and write @code{buffer[0]}, @code{buffer[1]},
@code{buffer[2]}, @code{buffer[3]}, and @code{buffer[4]}.
To express arrays of types that are not themselves basic, the
@code{@var{var}[@var{length}]} construct is nested within other
declarator constructs.
For example, an array of pointers-to-@code{char} is declared as
follows:
@example
char (*(stringptr[512]));
@end example
@noindent
or more simply
@example
char * stringptr[512];
@end example
Similarly, the declaration below declares @code{strings} as an array of
5 elements, each of which is a @code{char *}.
We declare @code{strings[5]} as a pointer to a @code{char}, and that in
turn is done by declaring the complex declarator @code{*strings[5]}
as a @code{char}.
@example
char *strings[5];
@end example
And this declares @code{matrix} as an array of 9 arrays of 10
@code{int}'s.
@example
int matrix[9][10];
@end example
@noindent
Here we pretend to declare @code{matrix[9]} as an
array-of-10-@code{int}'s, so @code{matrix} itself must be an array of 9
of those.
(As an expression, @code{matrix[0]} would be the first subarray, and
@code{matrix[0][9]} would be the last @code{int} in that subarray.)
The length of an array may be omitted when you declare an initialized
variable, because then it can be determined from the initializer.
@xref{Initializers}.
@section Indexing
@cindex indexing
@dfn{Indexing} an array means referring to one element by specifying
its index.
In C, indexing is represented with square brackets.
@table @code
@item @var{array}[@var{index}]
This expression represents the value of the @var{index}th element of
@var{array}.
It is an lvalue; that is to say, it may appear on the left side of an
assignment.
That is how values are stored in array elements.
@end table
Using @var{array} in an expression converts it immediately to a pointer
to the first element of the array.
The indexing operation actually operates on this pointer.
It can equally well operate on any pointer.
It is equivalent to @code{*(@var{array} + @var{index})}.
From this equivalent form, we see that indexing is a symmetrical
operation.
It follows that you can just as well write
@code{@var{index}[@var{array}]}.
In other languages, array indexing may check that the index is within
the valid range for the array that is in use.
In C, this is impossible because the indexing operation actually
operates on a pointer to the first array element.
This pointer carries no information about the length of the array.
Indices that are nominally out of range are often useful.
For example, when indexing a pointer that is not an array, negative
indices may be useful.
If @var{p} is a pointer to an element in the middle of an array,
@code{@var{p}[0]} is that element, @code{@var{p}[1]} is the following
element, and @code{@var{p}[-1]} is the previous element.
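A minimal sketch of negative indexing follows.
The function name @code{sum_with_neighbours} is hypothetical, and the
sketch assumes the caller passes a pointer to an element that really
has a neighbour on each side.

```c
/* Return the sum of the element p points to and its two neighbours.
   p must point to an element that is neither the first nor the last
   element of its array. */
int sum_with_neighbours(const int *p)
{
    return p[-1] + p[0] + p[1];
}
```

With @code{int a[5]}, the call @code{sum_with_neighbours(&a[2])} adds
@code{a[1]}, @code{a[2]} and @code{a[3]}.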
Indexing by a value that appears ``too large'' is useful also.
Often it is necessary to allocate arrays dynamically.
Standard C does not define array types with varying length, so the
usual practice is to declare the array with length 1 but actually
allocate space for as many elements as are needed.
It's the programmer's responsibility to keep track of how many elements
were actually allocated.
Then any index less than that number is valid in fact, even though it
exceeds the nominal length with which the array was declared.
@subsection Multi-dimensional arrays
@subsection Trivia
Multi-dimensional arrays are not very easy to use in C.
Most people who need them re-implement them ...
The ``element type'' is the data type of all the elements of the array.
In C, the ``element type'' of an array may be any type except for
function types and @code{void}.
For example, arrays of arrays are allowed, and so are arrays of
structures and arrays of pointers.
Arrays of pointers to functions are sometimes useful.
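As a sketch of one such use, an array of pointers to functions can
serve as a small dispatch table.
All of the names here (@code{add}, @code{sub}, @code{mul},
@code{operations}, @code{apply}) are hypothetical.

```c
static int add(int x, int y) { return x + y; }
static int sub(int x, int y) { return x - y; }
static int mul(int x, int y) { return x * y; }

/* An array of 3 pointers to functions that take two ints
   and return an int. */
static int (*const operations[3])(int, int) = { add, sub, mul };

/* Apply operation number which (0, 1 or 2) to x and y.
   The caller is responsible for passing a valid index. */
int apply(int which, int x, int y)
{
    return operations[which](x, y);
}
```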
@section Characters and Strings
@subsection initializing strings
@subsection Null termination
@subsection working with strings
@subsection Trivia
@section Structures
@cindex structure
@cindex element
@cindex member
@cindex field
@subsection Structures
@samp{struct}
@comment - didn't I already say this elsewhere ?
A @dfn{structure} is a data object containing several sub-objects, each
of a specified name and type. They need not all have the same data
type. The sub-objects are called @dfn{elements}, @dfn{members} or
@dfn{fields} of the structure.
We also use the term ``element'' for a sub-object of an array. We use
the term ``member'' (and ``field'') only to indicate a sub-object of a
structure.
In an array, a numeric index selects an element.
In a structure, a name selects an element.
[FIXME: is ``member'' always an exact synonym for ``field'' ?]
[FIXME: is there a special term that always indicates a sub-object of an
array, a term that never indicates a sub-object of a structure ?]
@subsection defining structures
@kindex struct
In C, each kind of structure is a distinct data type and is
distinguished by a name called the @dfn{structure tag}.
You must define each kind of structure, specifying its structure tag
name and the names and types of all the fields.
Here is an example:
@example
struct fontunit@{
char code;
int height, width, kern;
int * bitmap;
@};
@end example
@noindent
This defines a structure type that might be used to record the
information about one character in a font.
The structure tag name is @code{fontunit}.
The structure contains five fields: one of type @code{char} named
@code{code}; three of type @code{int} named @code{height},
@code{width}, and @code{kern}; and one of type @code{int *} named
@code{bitmap}.
Once this type is defined, @code{struct fontunit} behaves as the name
of a data type, much like @code{int}.
So it can be used to declare variables.
@subsection declaring structure variables
For example
@example
struct fontunit temp;
struct fontunit *nextunit;
@end example
@noindent
declares @code{temp} to be a structure of this type.
We say that @code{temp} is ``a @code{struct fontunit}''.
This means that @code{temp} is allocated a block of memory that has
enough room for all five fields, one after the next.
By contrast, @code{nextunit} is declared as a pointer to a @code{struct
fontunit} (@pxref{Pointers}).
@code{nextunit} is allocated a block of memory that has enough room for
a single pointer.
@subsection Structure Forward References
@cindex forward reference
In fact, it is possible to use the type @code{struct fontunit} for some
declarations even before it is defined.
Before its definition, the amount of memory space needed to hold it is
not known.
So you are not allowed to define variables or structure fields of that
type.
But you can define @emph{pointers} to that type.
For example, the following is legitimate:
@example
struct fontunit *nextunit;
struct fontunit
@{
char code;
int height, width, kern;
int *bitmap;
@};
@end example
@noindent
The declaration of @code{nextunit} makes a forward reference to a
structure type not as yet defined.
After the definition of @code{struct fontunit} is seen, the C compiler
fully understands the data type of @code{nextunit}.
Until that time, it would be invalid to refer to the contents of
@code{nextunit} with @code{*nextunit}.
Undefined structure types can validly exist only buried within pointer
types.
The forward reference capability is essential for defining recursive
pointer-structures.
For example,
@example
struct mymove
@{
enum piece_type piece;
char new_x, new_y;
struct mymove *alternative;
struct hismove *next_move;
@};
struct hismove
@{
enum piece_type piece;
char new_x, new_y;
struct hismove *alternative;
struct mymove *next_move;
@};
@end example
@noindent
defines a data structure that might be useful in a game-playing
program. Each @code{struct mymove} represents a move that the player
might make; it belongs to a chain of alternative moves. It also points
to the beginning of a chain of possible moves for the opponent, a chain
of @code{struct hismove} structures, one for each move the opponent
might then make. And each @code{struct hismove} structure points to
another chain of @code{struct mymove} structures describing the
possible responses for the player.
Clearly these two structures could not be defined without a forward
reference. But even the @code{struct mymove *alternative;} in the
definition of @code{struct mymove} counts as a forward reference.
@subsection Anonymous Structure Types
It is possible to define a structure type that has no structure tag
name.
This is an anonymous structure type.
Because it is impossible to refer to the type again, the definition of
the type must appear in a declaration of one or more variables.
The variables declared therein are the only ones that can have this
anonymous type.
For example,
@example
struct @{ int i; double d; @} struc1, struc2;
@end example
@noindent
declares each of the variables @code{struc1} and @code{struc2} to
contain an @code{int} and a @code{double}.
This feature in its simplest form is not useful; you could just as well
define each field as a separate variable.
But in more complex usage it may be useful.
For example, it is possible to copy @code{struc1} into @code{struc2}
with a single assignment expression.
Individual variables for the fields could not be copied as a group in
this way.
Also, an array of anonymous structures may be useful.
For example,
@example
struct @{ int i; double d; @} a[10];
@end example
@noindent
defines an array of 10 @code{int}-@code{double} pairs.
The analogous feature for unions is very useful.
@xref{Anonymous Unions}.
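The single-assignment copy mentioned above can be sketched as follows.
The function name @code{copy_pair} is hypothetical; the variables
@code{struc1} and @code{struc2} are declared as in the earlier example.

```c
/* Copy one anonymous-structure variable into another with a single
   assignment, then report whether both fields arrived (1) or not (0). */
int copy_pair(void)
{
    struct { int i; double d; } struc1, struc2;

    struc1.i = 3;
    struc1.d = 0.5;
    struc2 = struc1;          /* copies both fields at once */
    return struc2.i == 3 && struc2.d == 0.5;
}
```

The assignment is accepted because both variables were declared in the
same declaration and therefore share the same anonymous type.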
@subsection Structure Redefinition and Scope
Structure tag names obey the same scoping rule as variable names do
(@pxref{Scoping}).
Each function definition, and each compound statement, forms a scope.
The entire source file also forms a scope.
A structure tag is in effect only during the innermost scope that
contains the structure type definition.
For example, if you define a structure tag name within a function
definition, the tag name is defined only within that function.
Another structure of the same name could be defined in the next
function with no conflict.
Structure tag names and variable names are completely independent.
For example, you can have a structure named @code{foo} and a variable,
function or type named @code{foo} with no interference.
This is actually a common thing to do.
However, structure tags, union tags and enum tags share one name space.
Thus, you may not have @code{struct foo} and @code{union foo} defined
at the same time in one scope.
An attempt to do this will elicit an error message.
@subsection Shadowing Structure Tags
@cindex shadowing
It is invalid to define the same structure tag name twice in one
scoping level.
But a name defined in an outer scope can be temporarily redefined for
an inner scope.
This is called @dfn{shadowing} the name's outer definition.
For example, you can define a structure tag outside of function
definitions (a definition whose scope is the whole file) and make an
overriding definition of the same name inside a function definition.
Within that function, the meaning of the structure tag name is the
definition given in the function.
After the end of the function, that definition ceases to exist and the
tag name has its original meaning again.
Here is an example:
@example
struct foo @{
int i, j;
@};
double
func(double x)
@{
struct foo @{
double i, k;
@};
struct foo * ptr;
@dots{}
return( ptr->i + ptr->k );
@}
/* @i{the first definition of @code{struct foo} is once again in effect} */
@end example
Shadowing is not usually a good idea.
It is clearer to pick distinct names for your structure types.
Occasionally it may be useful together with macros: a macro that
expands into a compound statement might define a structure type for use
within that compound statement.
Shadowing makes it possible to do this without interference from the
surrounding context.
Because structure tags, union tags and enum tags come from the same
name space, you can shadow one kind with another.
For example, you can shadow a union tag name with a structure
definition:
@example
union converter @{ int i[2]; double d; @};
int
foo ()
@{
struct converter @{ char* defn; @};
@dots{}
@}
@end example
@subsection Accessing Structure Elements
@kindex .
@cindex field access
The binary operator @samp{.} refers to a field of a structure.
The left operand is an expression whose type must be a structure.
The right operand is not an expression.
It is the name of one of the fields of that structure.
Thus, after the declarations
@example
struct point @{ int x, y; @};
struct point cursor;
struct point * nextpoint = &cursor;
@end example
@noindent
the expression @code{cursor.x} retrieves the @code{x}-field of the
structure @code{cursor}.
The expression @code{((*nextpoint).x)} retrieves the same value, but we
usually abbreviate that as @code{nextpoint->x} (@pxref{Structure
Pointers}).
The ``@samp{.} expression'' is an lvalue if the left operand is
(@pxref{lvalue}).
Being an lvalue means its address can be taken with @samp{&}
(@pxref{Address}) and usually that a value can be stored there with an
assignment (@pxref{Assignment}).
It is an error to use a left operand whose type is not a structure or
union.
It is an error to use a field name that does not belong to the
particular structure or union type of the left operand.
@subsection Structure Operations
Accessing a field of a structure is not the only way to operate on one.
These other operations are also allowed:
@itemize @bullet
@item
Assignment: An entire structure object can be assigned a new value ---
the value of another structure of the same type.
@xref{Assignment}.
@item
Argument passing: A structure can be passed as an argument to a
function.
It is essential that the function argument be declared as a structure
of the same type.
@xref{Calling}.
@item
Returning: A function can be declared to return a structure type.
Then a call to that function is an expression of that type.
@item
Address: The address of a structure can be taken with @samp{&}
(@pxref{Address}).
This address can be used later to access the original structure or its
components (@pxref{Structure Pointers}).
@end itemize
There are no constant structure values, and type conversion is not
possible for structures.
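The operations listed above can be seen together in a short sketch.
The names @code{translate} and @code{reset} are hypothetical;
@code{struct point} has the same fields as in the field-access example.

```c
struct point { int x, y; };

/* Takes a structure by value and returns a new structure. */
struct point translate(struct point p, int dx, int dy)
{
    p.x += dx;      /* modifies only the local copy */
    p.y += dy;
    return p;
}

/* Takes the address of a structure and modifies the original. */
void reset(struct point *p)
{
    p->x = 0;
    p->y = 0;
}
```

Because @code{translate} receives a copy, the caller's structure is
unchanged after the call; @code{reset}, which receives an address,
changes the caller's structure directly.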
@subsection Structure Size and Alignment
Each structure type defined has an associated required alignment in
memory and a size in bytes.
The alignment required for a structure type is the maximum of the
alignments required by the types of the fields of the structure.
Each field is also aligned within the structure to its own required
alignment.
For example, in the structure
@example
struct foo
@{
char c;
int i;
@};
@end example
@noindent
on a machine in which the address of an @code{int} must be a multiple of
4, 3 bytes are unused in between fields @code{c} and @code{i}. If the
alignment required for an @code{int} is only 2, just 1 unused byte is
needed. In either case, the required alignment of the type @code{struct
foo} is the same as that of @code{int} (because that is certainly not
less than the required alignment of the other field's type, which is 1
for @code{char}).
The size of the structure is equal to the offset of the last field,
plus its size, rounded up to a multiple of the structure's required
alignment.
For example, in
@example
struct bar
@{
int i;
char c;
@};
@end example
@noindent
the required alignment of @code{struct bar} is the same as that of
@code{int}.
Assuming a 4-byte @code{int}, the total size is thus 4 (the offset of
@code{c}) plus 1 (the size of @code{c}), rounded up to a multiple of
that alignment.
The result is 6 or 8 if the alignment required for an @code{int} is 2
or 4, respectively.
This means some space is wasted at the end.
[FIXME: this assumes 4 Byte @code{int}s, which is not always true.
Should we qualify this by saying ``on my particular compiler'',
generalize to the same level of detail, or just gloss over the whole
thing by saying ``padding makes it impossible to know the exact size of
a structure'' ?]
You can make a structure smaller by grouping smaller fields together.
Consider the following two structure types:
@example
struct a @{ char c1; int i; char c2; @};
struct b @{ char c1; char c2; int i; @};
@end example
@code{struct a} occupies 8 or 12 bytes according to the alignment
required by @code{int}, whereas @code{struct b} occupies only 6 or 8.
By putting the two @code{char}'s together, @code{struct b} saves an
amount equal to the alignment required for an @code{int}.
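Since the exact numbers depend on the machine, it may be more reliable
to let the compiler report them.
The following sketch uses @code{sizeof} and the standard
@code{offsetof} macro from @code{<stddef.h>}; the function names
@code{padding_in_a} and @code{padding_in_b} are hypothetical.

```c
#include <stddef.h>

struct a { char c1; int i; char c2; };
struct b { char c1; char c2; int i; };

/* Number of unused bytes in struct a: the total size minus the
   sizes of the three fields themselves. */
size_t padding_in_a(void)
{
    return sizeof(struct a) - (2 * sizeof(char) + sizeof(int));
}

/* The same measurement for struct b. */
size_t padding_in_b(void)
{
    return sizeof(struct b) - (2 * sizeof(char) + sizeof(int));
}
```

An expression such as @code{offsetof(struct a, i)} gives the offset of
a single field, so the gap after @code{c1} can be inspected directly.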
@subsection Pointers to Structures
@kindex ->
When a pointer points to a structure, it is often useful to combine the
two operations of taking the contents of the pointer (a structure) and
taking an element of that structure.
The binary operator @samp{->} does this.
@table @code
@item @var{ptr}->@var{elementname}
The value of this expression is the element named @var{elementname} in
the structure that @var{ptr} points to.
@var{ptr} must be an expression whose type is a pointer to a structure
type, and that structure type must have an element named
@var{elementname}.
This expression is equivalent to @code{(*@var{ptr}).@var{elementname}}.
@end table
For example, suppose we represent a complex number as a structure
containing a real part and an imaginary part:
@example
struct complex @{
double real; double imag;
@};
@end example
Then, given a pointer @var{p} to a complex number, we can calculate the
magnitude squared of the complex number as follows:
@example
double
mag_squared(struct complex *p)@{
return p->real * p->real + p->imag * p->imag;
@}
@end example
@noindent
which is short for
@example
double
mag_squared(struct complex *p)@{
return( ((*p).real) * ((*p).real) + ((*p).imag) * ((*p).imag) );
@}
@end example
@subsection Lists
@cindex nodes
This example shows how structures and pointers are used to make linked
lists.
We define a structure to hold one node of a list of @code{int} values.
The list is made of @dfn{nodes}; each node contains one @code{int}
value and a pointer to the following node:
@comment 1998-05-27:DAV: replaced the term `link' in the original text with the term `node'.
@c Was the original author just confused, or has terminology really changed over the years ?
@c What does the term ``a link of a linked list'' mean these days ?
@c An individual blocks of the list, or a pointer inside that block ?
@example
struct int_list_node
@{
int value;
struct int_list_node *next;
@};
@end example
What goes in the @code{next} element of the last node?
It cannot be a pointer to the following node, because there is no
following node.
Instead, we store there a @dfn{null pointer}: a pointer value that is
recognizably distinct from any possible following node.
The presence of a null pointer indicates that the node is the end of
the list.
@xref{Null Pointers}.
This function @code{int_list_last()}, when given a pointer to a list
(as described above), returns a pointer to the last node of the list.
@example
struct int_list_node *
int_list_last (struct int_list_node *node)@{
while (node->next != 0)@{
node = node->next;
@};
return(node);
@}
@end example
If in the same program we need other kinds of lists --- lists of
@code{double} values or lists of strings, perhaps --- a new structure
type must be defined for each kind of list.
Although the operation of finding the last node is fundamentally the
same for each kind of list, a separate function is needed for each kind
since each function applies only to one data type.
This inconvenience can be remedied with @dfn{unions}.
(C++ offers a totally different remedy.)
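As a further illustration of walking a list until the null pointer is
reached, here is a minimal sketch that counts the nodes.
The function name @code{int_list_length} is hypothetical; the
structure is repeated so that the sketch stands on its own.
Unlike @code{int_list_last()}, this function also accepts an empty
list, represented here by a null pointer.

```c
#include <stddef.h>

struct int_list_node
{
    int value;
    struct int_list_node *next;
};

/* Count the nodes in a list.  A null pointer is treated as
   an empty list and gives a count of 0. */
int int_list_length(const struct int_list_node *node)
{
    int count = 0;

    while (node != NULL) {
        count++;
        node = node->next;
    }
    return count;
}
```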
@subsection Varying-Size Structures
Often it is useful for dynamically allocated structures to end with an
array of varying size.
C requires each array to have a fixed size, so we cannot officially do
this.
What we actually do is define the structure with an array of size zero
or one, but then allocate extra space.
As an example, we will define a font consisting of a sequence of the
@code{struct fontunit} structures previously defined.
Each @code{struct fontunit} describes one character in the font.
Each font needs a different number of @code{struct fontunit} units,
according to how many characters are defined.
The data structure of the font must contain these units and must also
say explicitly how many units there are.
Here is how it is done:
@example
struct fontunit
@{
char code;
int height, width, kern;
int *bitmap;
@};
struct font
@{
int length;
struct fontunit contents[0];
@};
@end example
A font containing @var{x} units can then be allocated with
@example
struct font *
allocate_font (int x)
@{
int nbytes = (sizeof (struct font)
+ x * sizeof (struct fontunit));
struct font *thisfont;
thisfont = (struct font *) malloc (nbytes);
if(thisfont == 0)@{
fatal("virtual memory exceeded");
@}else@{
thisfont->length = x;
@};
return( thisfont );
@}
@end example
@noindent
This example shows how to calculate the size required from the number
of elements; it also illustrates the technique for checking that
@code{malloc} succeeded.
The length used to allocate the font is stored in the font's
@code{length} field.
That way, when the font is accessed later, it is possible to tell how
many elements there actually are.
For example, this function finds the element of @code{font}
whose @code{code} field matches @code{thischar}, and returns a pointer
to that element.
If there is no such element, this function returns a null pointer
(because zero converts automatically to a null pointer; @pxref{Null
Pointers}).
@smallexample
struct fontunit *
font_find_char(struct font *font, char thischar)
@{
/* Point just past the last element that exists */
struct fontunit *end = font->contents + font->length;
struct fontunit *nextunit;
/* Look at each element; stop when past the last.*/
for(nextunit = font->contents; nextunit != end; nextunit++)@{
if(nextunit->code == thischar)@{
return nextunit;
@};
@};
return 0;
@}
@end smallexample
@noindent
Note that @code{font->contents} refers to the field @code{contents}.
Since that is an array, it is immediately converted to a pointer to its
first element.
The array officially has no elements, but that is no problem: The
pointer points to where the first element would be if there were one.
In fact, there really are elements --- dynamically allocated elements
--- and that is exactly where the first one is.
ANSI Standard C does not allow a zero-length array.
If code is to operate on other C implementations, the @code{contents}
field must be given the length 1 and the allocation code must be
changed to match.
The change is in the computation of @code{nbytes}.
This is the result:
@example
struct font @{
int length;
struct fontunit contents[1];
@};
struct font *
allocate_font (int x)
@{
int nbytes = (sizeof (struct font)
+ (x - 1) * sizeof (struct fontunit));
struct font *thisfont;
thisfont = (struct font *) malloc (nbytes);
if(thisfont == 0)@{
fatal("virtual memory exceeded");
@}else@{
thisfont->length = x;
@};
return thisfont;
@}
@end example
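The later C99 standard, which is outside the ANSI C described in this
manual, added an official form of this technique called a
@dfn{flexible array member}: the last field is declared as an array
with no length at all.
The following is a sketch under that assumption; the names
@code{struct flex_font} and @code{allocate_flex_font} are hypothetical,
and the function returns a null pointer on failure instead of calling
@code{fatal}.

```c
#include <stdlib.h>

struct fontunit
{
    char code;
    int height, width, kern;
    int *bitmap;
};

struct flex_font
{
    int length;
    struct fontunit contents[];   /* C99 flexible array member */
};

/* Allocate a font with room for x units.
   Returns a null pointer if x is negative or malloc fails. */
struct flex_font *allocate_flex_font(int x)
{
    struct flex_font *f;

    if (x < 0) {
        return NULL;
    }
    /* sizeof *f does not count the flexible array member,
       so no "x - 1" adjustment is needed. */
    f = malloc(sizeof *f + (size_t) x * sizeof(struct fontunit));
    if (f != NULL) {
        f->length = x;
    }
    return f;
}
```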
@subsection Bit Fields
@cindex bit field
A @dfn{bit field} is a structure field that is not a full byte or word.
You can specify exactly how many bits long it should be.
Bit fields allow you to pack information tightly into a small space.
They are also useful for describing the pattern of data in a hardware
register.
A bit field is defined like any other structure field except that a
colon and a bit-width follow the field name.
For example, this is a structure, designed for a 16-bit @code{int}
compiler, that breaks a 32-bit word down into 8 four-bit fields:
@example
struct half_bytes @{
unsigned int a : 4, b : 4, c : 4, d : 4;
unsigned int e : 4, f : 4, g : 4, h : 4;
@};
@end example
@noindent
You might think that this particular application calls for an array of
four-bit elements, but unfortunately there is no such thing in the C
language.
Bit fields in C exist only as structure fields.
Pointers in C can point only to bytes or multi-byte objects.
A bit field is not usually composed of entire bytes, so in C pointers
to bit fields are not allowed.
Use of the address operator @samp{&} on a bit field causes an error
message (@pxref{Address}).
However, a bit field can be an lvalue for assignment purposes just like
any other structure field (@pxref{Lvalue}).
@subsection Data Types of Bit Fields
The data type of a bit field must be an integer type or an @code{enum}
type.
An integer type may be signed or unsigned.
This choice makes a big difference.
A signed bit field of @var{n} bits has range of values
@minus{}2^(@var{n}@minus{}1) to 2^(@var{n}@minus{}1) @minus{} 1.
An unsigned one of the same number of bits ranges from zero to
2^@var{n} @minus{} 1.
For example, an unsigned bit field of 1 bit can be 0 or 1, but a signed
one-bit field can only be 0 or @minus{}1.
If an @code{enum} type is used, it is treated as unsigned.
The number of bits may not be longer than the word size; that is, the
bit field may not be bigger than an @code{int}.
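The difference between signed and unsigned bit fields can be observed
with a small sketch.
The names @code{struct flags}, @code{store_nibble} and
@code{store_signed_bit} are hypothetical, and the result for the signed
field assumes the usual two's-complement representation.

```c
struct flags
{
    unsigned int nibble : 4;   /* holds 0 through 15 */
    signed int   bit    : 1;   /* holds 0 or -1 */
};

/* Store v in a 4-bit unsigned field and read it back.
   Values too large for the field are reduced modulo 16. */
unsigned int store_nibble(unsigned int v)
{
    struct flags f;

    f.nibble = v;
    return f.nibble;
}

/* Store -1 in a 1-bit signed field and read it back. */
int store_signed_bit(void)
{
    struct flags f;

    f.bit = -1;
    return f.bit;
}
```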
@subsection Bit Field Machine Dependence
Exactly how the fields are packed into bytes depends on the machine.
On machines where the least significant byte of a word is the
lowest-numbered, fields are packed in starting from the least
significant bit.
If the most significant byte is lowest-numbered, fields are packed in
starting from the most significant bit.
Thus, the first field in a sequence of consecutive fields always goes
into the next available byte.
On some machines, fields are freely split across word boundaries.
On others, this is not allowed; then if the next field is too big to
fit in what remains of the current word, it starts in the following
word.
@subsection Bit Field Gaps
[FIXME: Is this true ?]
You can leave a gap of a specified number of bits by defining a field
with the desired width and no name.
For example,
@example
struct foo
@{
unsigned int x : 5;
unsigned int y : 5;
unsigned int : 3;
unsigned int z : 3;
@};
@end example
@noindent
gives 5 bits to @code{x}, 5 to @code{y}, skips the next 3, and gives 3
bits to @code{z}.
The total is 16 bits, or two bytes.
A nameless field with ``size'' zero forces the next field to start at
the beginning of a word.
@subsection trivia
The definition of the structure also serves as the name of a type.
So you can declare variables of that type at the same time as the type
is defined.
For example, it is legitimate to write
@example
struct fontunit
@{
char code;
int height, width, kern;
int *bitmap;
@} *nextunit;
@end example
@noindent
But this is not recommended.
If you keep the structure definition separate from variable
declarations, it is easier to read.
@subsection Shadowing and Forward References
Shadowing causes problems with forward references.
Suppose within the definition of @code{func} below you want to make a
forward reference to @code{struct foo} before defining it.
A definition of @code{struct foo} is already known, so a declaration
such as @code{struct foo *ptr;} would be taken as a use of the existing
definition.
In order to make a forward reference to the new definition to come, you
must first shadow the outer definition with an empty declaration
consisting of just @code{struct foo;}.
@example
struct foo @{ int i, j; @};
double
func (double x)
@{
struct foo;
struct foo *ptr;
struct foo @{ double i, k; @};
@dots{}
return ptr->i + ptr->k;
@}
@end example
@noindent
Normally, @code{struct foo} would be a name for the existing structure
type.
However, when it appears in an empty declaration (one that declares no
variables) it is given a special meaning.
The empty declaration tells the compiler that @code{struct foo} will be
redefined in the current scope, and following uses of @code{struct foo}
should be taken as forward references to the coming definition.
This ``empty declaration'' feature is supported and described in
@code{gcc} because the ANSI C Standard mandates it and you might see
programs that use it.
Using this feature is a very bad idea.
@section Unions
@samp{union}
@subsection Unions
@cindex union
@kindex union
@dfn{Unions} are a kind of type that allows one block of memory to be
regarded as any of several other types.
Each union type is defined by
specifying the alternative types that are its members.
Unions in C are much like structures.
The description of unions here assumes that you understand structures.
@xref{Structures}.
@subsection defining unions
A union definition looks like a structure definition except that the
keyword @code{union} replaces @code{struct} (@pxref{Structure Def}).
Union tag names and structure tag names come from the same name space.
This means that, in any one name scope, one particular name may be the
name of either a structure type or a union type, but not both.
If you define @code{union hack}, you may not also use @code{struct
hack}.
@subsection accessing unions
Union components are accessed using the @samp{.} and @samp{->}
operators, just like structure components (@pxref{Structure Ref}).
They can be assigned, passed as arguments and returned just like
structures (@pxref{Structure Operations}).
There are no constant union values, and type conversion is not possible
for unions.
@subsection When to use a union
There are only 2 reasons to ever use a union: (a) to save space, and
(b) to interpret a single piece of hardware multiple ways.
The ``endian problem'' never happens if you don't use unions.
@subsection Union Members
Here is a sample union definition:
@example
union element
@{
int i;
char *s;
struct window *w;
@};
union element temp;
@end example
@noindent
This union has three members, of three different types.
An object of this union type, such as the variable @code{temp}, has
enough space to hold either an @code{int}, a @code{char *} or a
@code{struct window *}, but not two at once.
The three members of the union variable @code{temp} can be thought of
as three variables of different types that are stored in the same
space.
The value of the union is valid only for the member that was last used
to store in it.
For example, if you store an @code{int} into @code{temp.i}, you can
refer to @code{temp.i} later to get the same @code{int} value, but
@code{temp.s} and @code{temp.w} are invalid and their values are
undefined.
If you later store a @code{char *} value into @code{temp.s}, you can
access @code{temp.s} again to recover the same value, but @code{temp.i}
is now undefined.
The size of the union is at least the largest of the sizes of its
members (the compiler may add padding for alignment).
Contrast this with a structure that has the same members:
@example
struct elements
@{
int i;
char *s;
struct window *w;
@};
@end example
@noindent
This structure has enough space for an @code{int} @emph{and} two
pointers side-by-side.
All three can be stored in it independently.
The size of this structure is (at least) the sum of the sizes of the
members.
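The size relationships can be checked with @code{sizeof}.
The following is a minimal sketch; the two function names are hypothetical, and @code{struct window} is left as an incomplete type because only pointers to it are needed.

```c
#include <stddef.h>

struct window;   /* incomplete type; only pointers to it are used */

union element
{
  int i;
  char *s;
  struct window *w;
};

struct elements
{
  int i;
  char *s;
  struct window *w;
};

/* Return 1 if the union is at least as large as each of its members. */
int
union_holds_largest (void)
{
  return sizeof (union element) >= sizeof (int)
         && sizeof (union element) >= sizeof (char *)
         && sizeof (union element) >= sizeof (struct window *);
}

/* Return 1 if the structure is at least as large as the sum of its
   members (it may be larger because of padding). */
int
struct_holds_all (void)
{
  return sizeof (struct elements)
         >= sizeof (int) + sizeof (char *) + sizeof (struct window *);
}
```

The exact sizes depend on the machine, so the sketch compares sizes rather than printing particular numbers.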
@subsection Alternative-use Storage
The example above for list structure (@pxref{Lists}) shows that you
need a new structure type for each kind of data you want to put into
lists.
When you have one type of structure to represent a list of
@code{int}'s, you need another structure type for a list of @code{char
*} strings, and yet another for a list of @code{struct window *}'s.
What if you want to have one list containing @code{int}'s, @code{char
*}'s and @code{struct window *}'s, in any random order?
This can be
done with the union defined in the previous section.
Here is the definition again:
@example
union element
@{
int i;
char *s;
struct window *w;
@};
@end example
Now we can make a list of @code{union element} values just like a list
of anything else:
@example
struct alt_list_node @{
union element value;
struct alt_list_node *next;
@};
struct alt_list_node *p;
@end example
If @code{p} points to a node of a list of this kind, you can extract
the value as an @code{int} with @code{p->value.i}, or extract it as a
@code{struct window *} with @code{p->value.w}.
This is because @code{p->value} by itself is a value of type
@code{union element}.
But this is not a good solution to the problem.
Nothing in the list node tells you whether the value is supposed to be
interpreted as an @code{int}, a @code{char *} or a @code{struct window
*}.
If you refer to the value the wrong way, you will not get an error
message, just bizarre results.
This problem can be avoided by adding a @dfn{type-code} field to the
node structure, making it a ``self-describing'' structure.
@ifinfo
See the next node.
@end ifinfo
@subsection Unions and Type-code Fields
In the simple list-of-union, it is impossible to tell just by looking
at a node whether it contains an @code{int}, a @code{char *} or a
@code{struct window *}.
So the simple list-of-union structure is useful only when there is some
other way for the program to know how each node should be used.
Most of the time, it is better to add -- to every node -- information
about how to interpret the node's value.
This is done with an additional field in the node structure, called a
``type code'' field because its value informs us of the type of value
in the union.
An enumeration type is often just the right thing for this purpose.
Here is the modified structure definition:
@example
struct alt_list_node
@{
enum @{ IS_INT, IS_STRING, IS_WINDOW @} code;
union element value;
struct alt_list_node *next;
@};
@end example
Then we establish a convention that when the @code{value} field is
properly interpreted as an @code{int}, the value @code{IS_INT} is
stored in the @code{code} field, and so on.
The C language does not enforce this convention.
It is still possible to disregard the convention and do
@example
node->code = IS_INT;
node->value.s = "foo";
@end example
@noindent
But obeying the convention is not hard, and as long as that is done,
the meaning of each element of the list is self-evident.
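One way to make the convention easy to obey is to set the code and the value together in small helper functions, and to test the code before reading the value.
The sketch below assumes this approach; the helpers @code{set_int}, @code{set_string} and @code{sum_ints}, and the tag name @code{enum type_code}, are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>

struct window;

union element
{
  int i;
  char *s;
  struct window *w;
};

/* The enumeration is given a tag here (an assumption of this sketch)
   so that the helper functions can refer to it. */
enum type_code { IS_INT, IS_STRING, IS_WINDOW };

struct alt_list_node
{
  enum type_code code;
  union element value;
  struct alt_list_node *next;
};

/* Store an int, recording the matching type code at the same time. */
void
set_int (struct alt_list_node *n, int i)
{
  n->code = IS_INT;
  n->value.i = i;
}

/* Store a string pointer, recording the matching type code. */
void
set_string (struct alt_list_node *n, char *s)
{
  n->code = IS_STRING;
  n->value.s = s;
}

/* Add up the values of the nodes that hold an int,
   skipping nodes of every other kind. */
int
sum_ints (const struct alt_list_node *p)
{
  int total = 0;
  for (; p != NULL; p = p->next)
    if (p->code == IS_INT)
      total += p->value.i;
  return total;
}
```

Because @code{sum_ints} checks @code{code} before touching @code{value.i}, a node holding a string is never misread as an integer.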
@subsection Unions for Type Puns
Would you like to know what the bit pattern of a @code{char}-pointer
really looks like? Define a union containing types @code{char *} and
@code{int} and see.
Here is how:
@example
int
ptr_as_int (char *p)
@{
union @{ char *p; int i; @} conv;
conv.p = p;
return conv.i;
@}
@end example
@noindent
Here the data is loaded into the union variable @code{conv} as a
pointer, then examined as an integer.
An example actually used in the GNU C compiler involves storing a
@code{double} in a data structure composed of an array of @code{int}s.
Two @code{int}'s provide enough room for the bits of the @code{double},
but we need a way to separate it into two words.
The following union was used:
@example
union converter
@{
int i[2];
double d;
@};
@end example
@noindent
With this union it is possible to take a @code{double} apart and store
it into two @code{int}'s, and later reverse the transformation.
Here is a function to take a @code{double} apart, storing the two
halves into two locations specified by giving pointers to them:
@example
void
dissect_double(double d, int *l, int *h)
@{
  union converter conv;
  conv.d = d;
  *l = conv.i[0];
  *h = conv.i[1];
@}
@end example
Here is how to reassemble the two halves into an identical
@code{double}:
@example
double
reconstruct_double(int l, int h)
@{
  union converter conv;
  conv.i[0] = l;
  conv.i[1] = h;
  return conv.d;
@}
@end example
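The two functions can be tried together as a round trip.
This is a sketch under the assumption that a @code{double} occupies exactly the space of two @code{int}s, which is checked at run time; the function @code{round_trip_ok} is a hypothetical name used only here.

```c
#include <assert.h>

union converter
{
  int i[2];
  double d;
};

/* Split D into two int-sized halves. */
void
dissect_double (double d, int *l, int *h)
{
  union converter conv;
  conv.d = d;
  *l = conv.i[0];
  *h = conv.i[1];
}

/* Rebuild a double from the two halves. */
double
reconstruct_double (int l, int h)
{
  union converter conv;
  conv.i[0] = l;
  conv.i[1] = h;
  return conv.d;
}

/* Return 1 if D survives being taken apart and put back together. */
int
round_trip_ok (double d)
{
  int l, h;
  /* Assumption of this sketch: two ints exactly cover one double. */
  assert (sizeof (double) == 2 * sizeof (int));
  dissect_double (d, &l, &h);
  return reconstruct_double (l, h) == d;
}
```

Which half is the ``low'' one depends on the byte order of the machine, so the sketch only tests that the same value comes back.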
@subsection Union Member Addresses
In general, the members of a union share a common starting address.
The address of any member of the union is equal to that of the union
(though their types are different, so in order to compare them in C you
must cast one to the other's type).
For example, in
@example
union test @{ int i; char c; @} var;
int
check_it()
@{
return ((int *) &var) == (&var.i);
@}
@end example
@noindent
the function @code{check_it} is guaranteed to return 1.
@subsection Run-time Endianness Test
A union of a @code{char} and an @code{int} can be used to tell how the
bytes in an @code{int} are numbered on the machine you are using.
This example shows how.
@example
#include <stdio.h>

void
endian(void)
@{
  union @{
    int i;
    char c;
  @} temp;
  temp.i = 0;
  temp.c = 1;   /* set the lowest-addressed byte */
  if(temp.i == 1)@{
    printf("Little-endian\n");
  @}else if(temp.i == 1 << 24)@{
    printf("32-bit big-endian\n");
  @}else@{
    printf("Something strange\n");
  @}
@}
@end example
@subsection Unions of Structures
Structure types can be used in unions as any other types can.
When this is done, the structure fields are obtained from the union
with two stages of the @samp{.} operator.
The size of the union is, as always, the maximum of the sizes of the
fields.
A common situation is that a union has several members that are
different types of structures.
Often two of the structure types start with similar fields, as shown
here:
@example
struct type1
@{
int x;
char b;
char *name;
int size;
@};
struct type2
@{
int x;
char c;
char *name;
char text[100];
@};
union u
@{
struct type1 t1;
struct type2 t2;
@};
union u u1;
@end example
Here both @code{struct type1} and @code{struct type2} start with the
sequence @code{int}, @code{char}, @code{char *}.
(The field names are not the same, but that is not important.)
In this case, it is guaranteed that you will see the same values for
those three initial fields regardless of whether you access them
through @code{struct type1} or @code{struct type2}.
In other words, @code{u1.t1.x} and @code{u1.t2.x} are the @emph{same
object}; and @code{u1.t1.b} and @code{u1.t2.c} are also the @emph{same
object}.
This fact is a consequence of the fact that the compiler lays out
structure fields in the order you write them, and their size and
spacing depends only on their data types.
If the first @var{n} fields of two structure types match in their
types, the layout of those fields must also match.
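The guarantee can be demonstrated by storing through one member and reading through the other.
In this sketch the functions @code{shared_x} and @code{shared_char} are hypothetical names introduced for the illustration.

```c
struct type1
{
  int x;
  char b;
  char *name;
  int size;
};

struct type2
{
  int x;
  char c;
  char *name;
  char text[100];
};

union u
{
  struct type1 t1;
  struct type2 t2;
};

/* Store VALUE through t1 and read it back through t2. */
int
shared_x (int value)
{
  union u u1;
  u1.t1.x = value;
  return u1.t2.x;
}

/* The second fields have different names but the same type
   and position, so they also share storage. */
char
shared_char (char value)
{
  union u u1;
  u1.t1.b = value;
  return u1.t2.c;
}
```

The same does not hold for @code{size} and @code{text}, which come after the matching initial fields and have different types.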
[FIXME: this next section may be totally bogus]
The code in the previous section can create very confusing source code.
Here is an alternate way of specifying exactly the same layout in
memory, one that is far easier to understand.
This demonstrates that structures can contain unions.
[FIXME: make the reference over in anonymous structure types point
here]
@example
struct type1 @{
int size;
@};
struct type2 @{
char text[100];
@};
struct u @{
int x;
char b;
char *name;
union @{
struct type1 t1;
struct type2 t2;
@};
@};
struct u u1;
@end example
The memory layout of this @code{struct u u1} is identical to the
previous @code{union u u1}.
The code that uses @code{u1} is slightly simplified.
All references to @code{u1.t1.x} or to @code{u1.t2.x} must now be
replaced with @code{u1.x}, which makes it obvious that they were really
referring to the same thing.
Other bits of the code that refer to @code{u1.t1.size} or
@code{u1.t2.text} still access the same area of memory they did before.
[end possibly bogus section]
@subsection Trivia
@section enumerated types (enum)
Enumeration Types
@section Renaming (typedef)
@section Trivia
@node Arithmetic and Bitwise Operators, bool type, Creating New Data Types, Top
@chapter Arithmetic and Bitwise Operators
@section Arithmetic operators (+ - * / %)
@cindex addition (integer)
@cindex subtraction (integer)
@cindex multiplication (integer)
@cindex division (integer)
@cindex quotient (integer)
@cindex remainder
@cindex common type
The type of the result depends on the types of the operands.
First, if either operand has type @code{short} or @code{char} (either
signed or unsigned), it is converted to @code{int} by default
promotion.
Then the @dfn{common type} of the operands is determined.
This is either @code{long unsigned int}, @code{long int},
@code{unsigned int} or @code{int}.
The common type is long if either operand is long; it is unsigned if
either operand is unsigned.
If one operand has an unsigned type and the other has a signed type,
the one with the signed type is converted to unsigned and the
arithmetic is done on unsigned values.
If the signed operand had a negative value, the results may be
counterintuitive, because when this value is converted to an unsigned
type, it becomes a large positive number.
Small negative numbers become positive numbers near the top of the
range of possible values.
For positive numbers, the result of an arithmetic operation is always
the same regardless of whether the type of the numbers is signed or
unsigned, except when the result is so large that it overflows the
range of the type.
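The counterintuitive results of mixing signed and unsigned operands can be seen in a short sketch.
The function names here are hypothetical, chosen only for this illustration.

```c
#include <limits.h>

/* Compare a signed -1 with an unsigned 1.  The -1 is converted to
   unsigned first, becoming UINT_MAX, so the comparison is false. */
int
minus_one_less_than_one (void)
{
  int s = -1;
  unsigned int u = 1;
  return s < u;
}

/* Convert -1 to unsigned: the result is the largest unsigned value. */
unsigned int
minus_one_as_unsigned (void)
{
  int s = -1;
  return (unsigned int) s;
}

/* Add a signed -2 to an unsigned 1.  The arithmetic is done on
   unsigned values, so the result is UINT_MAX rather than -1. */
unsigned int
mixed_sum (void)
{
  int s = -2;
  unsigned int u = 1;
  return s + u;
}
```

If the result of @code{mixed_sum} is stored back into an @code{int}, most machines give @minus{}1 again, which is why the conversion often goes unnoticed until a comparison is involved.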
@kindex +
@kindex -
@kindex * (binary)
@kindex /
@kindex %
@table @samp
@item @var{intexp} + @var{intexp}
Addition of two integer expressions
@item @var{intexp} @minus{} @var{intexp}
Subtraction of two integer expressions
@item @minus{} @var{intexp}
Negation of an integer expression.
Equivalent to @code{0 - @var{intexp}}
@item @var{intexp} * @var{intexp}
Multiplication of two integer expressions
@item @var{a} / @var{b}
Quotient of two integer expressions.
If the exact quotient is not an integer, it is rounded toward zero to
make an integer.
If @var{b} is negative, the quotient is minus the result of dividing by
@code{-@var{b}}.
(The handling of negative operands may be different in other
implementations of C.)
If @var{b} is zero, the division operation raises a signal.
It is possible to write a handler for this signal, but usually it is
more convenient to test whether the divisor is zero before you do the
division.
@item @var{a} % @var{b}
Remainder of two integer expressions.
The remainder is compatible with the quotient: (@var{a} / @var{b}) *
@var{b} + @var{a} % @var{b} is equal to @var{a}.
If @var{b} is zero, the remainder operation raises a signal.
It is possible to write a handler for this signal, but usually it is
more convenient to test whether the divisor is zero before you do the
division.
@end table
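The relationship between quotient and remainder, and the habit of testing the divisor first, can be sketched as follows.
The sketch assumes a compiler that rounds the quotient toward zero (as described above, and as required from the 1999 revision of the ISO standard onward); @code{safe_divide} and @code{div_identity_holds} are hypothetical names.

```c
/* Divide A by B if that is possible.  Return 1 and store the
   quotient through QUOTIENT on success; return 0 without dividing
   when B is zero. */
int
safe_divide (int a, int b, int *quotient)
{
  if (b == 0)
    return 0;
  *quotient = a / b;
  return 1;
}

/* Return 1 if the quotient and remainder of A and B are compatible,
   that is, if (a / b) * b + a % b equals a.  B must not be zero. */
int
div_identity_holds (int a, int b)
{
  return (a / b) * b + a % b == a;
}
```

With rounding toward zero, @code{-7 / 2} is @minus{}3 and @code{-7 % 2} is @minus{}1, so the identity gives @minus{}6 + @minus{}1 = @minus{}7.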
@section increment and decrement (++ --)
@section conversion of types (cast)
@section internal representation of numbers in general
@section bitwise operators (& | ^ ~ >> <<)
@cindex bitwise operations
@cindex boolean operations
@cindex logical operations
[FIXME: we need to use terminology that makes it hard to confuse
``bitwise'' (lots of bits all being operated on at once in a single
value) vs. ``boolean'' (a value containing a single bit).
Perhaps ``bitwise'' vs. ``logical'' ?]
The @dfn{bitwise} operations combine two integers bit by bit.
This means that the operands are considered as binary numbers and lined
up.
The least significant bits (1's bits) of the operands are combined to
make the least significant bit of the result; the 2's bits of the
operands are combined to make the 2's bits of the result; the 4's bits
are combined to make the 4's bit of the result; and so on.
The operands are always treated as unsigned in these operations even if
they have signed types.
Operands of type @code{short} or @code{char} are extended to @samp{int}
before the operation is done, so each operand has at least the width of
an @code{int} (32 bits on many machines).
[FIXME: a picture or some ASCII Art would make this much easier to
visualize.
Remember that a @code{int} is not always 32 bits; and sometimes a
@code{long int} can be used in a bitwise operation - right ?]
Bitwise operations are also called @dfn{boolean} operations because
they are modeled on the laws of boolean algebra, and @dfn{logical}
operations because ``logical'' is traditionally used for any operation
that considers an integer as a sequence of bits.[FIXME: Wrong.]
Although the numbers are considered unsigned in order to perform the
operation, the data type of the result is not always unsigned.
It follows the same rule used for arithmetic operations: it is long if
either operand is long; it is unsigned if either operand is unsigned.
Here are precise definitions of all the bitwise operations.
Bit @var{n} of an unsigned integer @var{a} is @code{(@var{a} >>
@var{n}) % 2} (where @samp{>>} stands for right-shift;
@pxref{Shifting}).
Bit @var{n} of a signed integer is computed by first converting the
integer to unsigned.
@kindex & (binary)
@kindex |
@kindex ^
@kindex ~
@table @samp
@item @var{a} & @var{b}
Bitwise logical-and.
Bit @var{n} of the result is 1 if bit @var{n} in
both operands is 1.
@item @var{a} | @var{b}
Bitwise logical-or.
Bit @var{n} of the result is 1 if bit @var{n} in
either operand is 1.
@item @var{a} ^ @var{b}
Bitwise logical-exclusive-or.
Bit @var{n} of the result is 1 if bit
@var{n} is 1 in one of the operands and 0 in the other.
@item ~ @var{a}
Bitwise logical-not.
Bit @var{n} of the result is 1 if bit @var{n} of
@var{a} is 0.
@end table
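The definitions in the table can be checked bit by bit.
In this sketch @code{bit_n} follows the definition of ``bit @var{n}'' given above, and @code{bitwise_definitions_hold} is a hypothetical helper that tests the low 16 bits of each result.

```c
/* Bit N of A, as defined in the text: (a >> n) % 2. */
unsigned int
bit_n (unsigned int a, unsigned int n)
{
  return (a >> n) % 2;
}

/* Return 1 if, for each of the low 16 bits, the four bitwise
   operators agree with their bit-by-bit definitions. */
int
bitwise_definitions_hold (unsigned int a, unsigned int b)
{
  unsigned int n;
  for (n = 0; n < 16; n++)
    {
      unsigned int an = bit_n (a, n);
      unsigned int bn = bit_n (b, n);
      if (bit_n (a & b, n) != (an && bn))   /* both bits are 1 */
        return 0;
      if (bit_n (a | b, n) != (an || bn))   /* either bit is 1 */
        return 0;
      if (bit_n (a ^ b, n) != (an != bn))   /* the bits differ */
        return 0;
      if (bit_n (~a, n) != !an)             /* the bit is 0 */
        return 0;
    }
  return 1;
}
```

For a concrete case, take 12 (binary 1100) and 10 (binary 1010): @code{12 & 10} is 8, @code{12 | 10} is 14, and @code{12 ^ 10} is 6.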
@section Shift Operators
@cindex shifting
@kindex <<
@kindex >>
@dfn{Shifting} an integer is defined in terms of the binary
representation of the integer.
Shifting left means appending binary zeros to the number's
representation; this has the effect of multiplying by a power of 2.
(If the number is large enough, the most significant digits can be lost
by overflow in the process.) Shifting right means discarding binary
digits from the right of the number.
This has the effect of dividing by a power of 2 and rounding down
(toward negative infinity).
The result of shifting right has the same sign as the operand.
This means that the same bit-pattern for the operand produces a
different result depending on whether it has a signed or unsigned type.
The signed integer @minus{}4 and the unsigned integer @code{0xfffffffc}
have the same bit pattern, but when shifted right one place they
produce the results @minus{}2 and @code{0x7ffffffe}.
These two numbers differ in the highest bit.
When applied to unsigned values, the @code{>>} operator uses
``logical'' right shifting --- it brings zeroes into the most
significant bits of the result.
When applied to signed values, the @code{>>} operator uses
``@dfn{arithmetic}'' right shifting.
This brings zeros into the most significant bits for a positive number,
and ones into the most significant bits for a negative number.
@table @code
@item @var{a} << @var{count}
Shift @var{a} left by @var{count} places.
The result is undefined if @var{count} is negative or if it is greater
than or equal to the number of bits in @var{a} (32 for a 32-bit
@code{int}).
@item @var{a} >> @var{count}
Shift @var{a} right by @var{count} places.
The result is undefined if @var{count} is negative or if it is greater
than or equal to the number of bits in @var{a} (32 for a 32-bit
@code{int}).
@end table
Here are some examples of shifting, with the values that result.
@example
1<<0 == 1
1<<5 == 32
1<<31 == 0x80000000
5<<1 == 10
(-5)<<1 == -10
3>>1 == 1
4>>1 == 2
5>>1 == 2
(-3)>>1 == -2 == 0xfffffffe
(-4)>>1 == -2
(-5)>>1 == -3 == 0xfffffffd
((unsigned)-3) >> 1 == 0x7ffffffe
((unsigned)-4) >> 1 == 0x7ffffffe
((unsigned)-5) >> 1 == 0x7ffffffd
@end example
The ANSI C standard does not specify what happens when a negative
number is shifted.
In GNU C, we have chosen the meaning we think is most useful.
@section Floating Point Arithmetic
@cindex arithmetic (floating)
@cindex common type
The four basic arithmetic operators, @samp{+}, @samp{-}, @samp{*} and
@samp{/}, are allowed on floating point operands as well as integer
operands.
These are the only operations allowed on floating point operands.
The remainder operation (@samp{%}) is not meaningful for floating point
operands because division of floating point numbers does not round the
result to an integer.
When the result of arithmetic is outside the range of possible values
of its type, this is called @dfn{floating point overflow}.
The result of the operation is undefined when overflow happens.
When dividing by a negative number @var{b}, the quotient is minus the
result of dividing by @minus{}@var{b}.
Division by zero has undefined effects, possibly crashing the program.
You should test whether the divisor is zero before dividing.
When operands of two different floating-point types are combined with
an arithmetic operation, the operand of narrower type is converted to
the other (wider) operand's type before the operation is performed.
The types in order of increasing width are @code{float}, @code{double}
and @code{long double}.
Floating point and integer operands may be mixed.
When this is done, the integer operand is converted to floating point,
in the same type as the other operand; then the arithmetic operation is
done in that type.
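The effect of these conversions is easiest to see with division.
The following sketch uses hypothetical function names; it shows that the conversion to floating point happens only when one of the operands is already floating point.

```c
/* Both operands are integers, so integer division is done first
   and only the (already rounded) result is converted to double. */
double
divide_ints (int a, int b)
{
  return a / b;
}

/* One operand is a double, so the integer operand is converted
   to double and the division is done in floating point. */
double
divide_mixed (int a, double b)
{
  return a / b;
}

/* A float combined with a double gives a double result.
   sizeof inspects only the type; nothing is added at run time. */
int
float_plus_double_is_double (void)
{
  float f = 1.0f;
  double d = 2.0;
  return sizeof (f + d) == sizeof (double);
}
```

So @code{divide_ints (1, 2)} yields 0.0, while @code{divide_mixed (1, 2.0)} yields 0.5.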
@section Trivia
@node bool type, expressions, Arithmetic and Bitwise Operators, Top
@chapter working with the bool type (true, false, and logical operators)
@section @code{bool} values
[FIXME: perhaps it would be easier to explain this ``as if'' there were
a @code{bool} type - i.e., from the C++ perspective.
People who knew nothing of type @code{bool} wrote many C compilers
compliant with the ANSI standard.
However, many programmers argue that the @code{bool} type is implicit
in the C language.
A C program compiled on a C++ compiler may create an executable
identical to that generated by a C compiler.
But the C++ perspective is to say that operators like `<' and `>'
return a value of type @code{bool}, and the conditional expression in a
if() is cast to a @code{bool}.]
A @code{bool} value is either @code{true} or @code{false}.
A truth value is either ``true'' or ``false''.
C does not have a distinct data type for truth values, as some
languages do.
(For example, type ``@code{bool}'' in C++).
Instead, any numeric type or pointer type can be used as a truth value.
A zero value represents ``false'', and any nonzero value means ``true''.
Most of the time, it is wise to use only type @code{int} for truth
values and to use only the value 1 to mean ``true''.
Although there is no special type for truth values, there are special
operators in C for creating truth values (comparison operators),
combining truth values (truth operators) and using them (conditional
expressions and conditional statements).
The @var{continue-condition} of a conditional or loop statement must
have a data type which can be
compared against the constant zero, which means an integer zero, a
floating point zero, or a null pointer.
@xref{branching}.
@xref{looping}.
@section Comparison (Relational operators: > >= < <= == !=)
@cindex comparison
Comparison operators test for equality or ordering of either numbers or
pointers.
The result of a comparison is an @code{int} which is either 0 or 1.
Usually this value is used as a truth value.
@table @code
@item @var{a} == @var{b}
@item @var{a} != @var{b}
@item @var{a} < @var{b}
@item @var{a} > @var{b}
@item @var{a} <= @var{b}
@item @var{a} >= @var{b}
@end table
@section Logical operators (&& || !)
The @dfn{truth operators} combine truth values into other truth values.
There are three such operators: ``not true'', ``both true'' and
``either one true''.
The operands of these operators are used only as truth values: their
values are checked only for nonzeroness.
The operands may have any type that is acceptable as a truth value, but
the result always has type @code{int}.
@kindex !
@kindex &&
@kindex ||
@table @samp
@item ! @var{truthexp}
Not true.
Value is 1 if @var{truthexp} equals 0; 0 otherwise.
If @var{truthexp} represents a condition, @code{! @var{truthexp}}
represents the contrary condition.
@item @var{truthexp1} && @var{truthexp2}
``And'' for truth values.
Value is 1 if both @var{truthexp1} and @var{truthexp2} have nonzero
values.
If @var{truthexp1} is zero, @var{truthexp2} is not computed at all; its
side effects do not take place.
@item @var{truthexp1} || @var{truthexp2}
``Or'' for truth values.
Value is 1 if either @var{truthexp1} or @var{truthexp2} has a nonzero
value.
If @var{truthexp1} is nonzero, @var{truthexp2} is not computed at all;
its side effects do not take place.
@end table
The operators @samp{&&} and @samp{||} specify @dfn{conditional
execution}.
This means that, depending on the value of the first operand, the
second operand may or may not be executed.
This makes a difference when the second operand has side effects.
Consider by contrast @code{0 * (x = 4)}.
Its value is always 0, but it has the effect of assigning the value 4
to the variable @code{x}.
Here the sub-expression @code{x = 4} is executed unconditionally, even
in cases where its value is known in advance to be irrelevant.
Most operators in C work this way; all of their operands are executed
unconditionally.
In addition, the order in which the operands are executed is not
specified.
The operators @samp{&&} and @samp{||} are unusual: their operands are
executed in left-to-right order, and if the ultimate result is
determined after the first operand, then the second operand is skipped
entirely.
Thus, in @code{0 && (x = 4)}, since the first operand makes it certain
that the value is zero, the second operand is not computed and @code{x}
is not changed.
In @code{y && (x = 4)}, @code{x} is changed only if @code{y} is
nonzero.
Only one other C expression, the conditional expression, can omit
execution of some of its operands (@pxref{Conditional Expr}).
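The expressions discussed above can be wrapped in functions to observe the conditional execution.
The function names in this sketch are hypothetical.

```c
/* Evaluate  y && (x = 4)  and report the final value of x.
   The assignment happens only when Y is nonzero. */
int
x_after_and (int y)
{
  int x = 0;
  int result = y && (x = 4);
  (void) result;          /* only the side effect matters here */
  return x;
}

/* Evaluate  y || (x = 4)  and report the final value of x.
   The assignment happens only when Y is zero. */
int
x_after_or (int y)
{
  int x = 0;
  int result = y || (x = 4);
  (void) result;
  return x;
}

/* By contrast, every operand of * is executed, so x is always
   changed even though the product is always 0. */
int
x_after_multiply (void)
{
  int x = 0;
  int result = 0 * (x = 4);
  (void) result;
  return x;
}
```

Calling @code{x_after_and (0)} returns 0 because the assignment is skipped, while @code{x_after_multiply ()} returns 4.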
@section Conditional Expressions
@cindex conditional expression
@kindex ? :
A conditional expression lets you select one of two expressions based
on a truth value expression.
It looks like this:
@example
@var{truthexp} ? @var{val1} : @var{val2}
@end example
@var{truthexp} must be a number or a pointer.
If @var{truthexp} is nonzero, @var{val1} is computed and its value is
used.
Otherwise, @var{val2} is computed and its value is used.
Exactly one of @var{val1} and @var{val2} is computed.
If @var{val1} and @var{val2} have the same type, that may be any type,
and the conditional expression has the same type.
(Array and function types are excluded: if either @var{val1} or
@var{val2} is an array or function then it is converted to a pointer
``before'' the conditional expression ``sees'' it.)
In addition, the following cases of different types are allowed:
@itemize @bullet
@item
Both types are numbers.
In this case, the type of the conditional expression is determined as
if the two numbers were being added together.
@item
One operand is void.
Then the other operand may have any type, but the result is void.
@item
One operand is a pointer and the other operand is zero.
Then the value is a pointer of the same type.
@end itemize
In all of these cases, either @var{val1} or @var{val2}, whichever is
selected, is converted to the appropriate result type.
Here are some examples of conditional expressions:
@example
(3 > 1) ? 5 : 2 => 5
(3 < 1) ? 5 : 2 => 2
*p == 0 ? "end of string" : 0
@end example
The last example has type @code{char *} and its value is either the
constant @code{"end of string"} or a null pointer.
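Here is a sketch of conditional expressions used inside functions, covering the same-type case, the pointer-and-zero case, and the mixed numeric case.
The function names are hypothetical; @code{describe} returns @code{const char *} rather than @code{char *} only so that the string constant cannot be modified through the result.

```c
#include <stddef.h>

/* Both alternatives are int, so the result is int. */
int
max_int (int a, int b)
{
  return a > b ? a : b;
}

/* One alternative is a pointer and the other is zero,
   so the result is a pointer (possibly null). */
const char *
describe (const char *p)
{
  return *p == 0 ? "end of string" : 0;
}

/* One alternative is int and the other is double, so the type is
   determined as if they were being added: the result is double.
   sizeof inspects only the type; neither alternative is computed. */
int
mixed_type_is_double (int flag)
{
  return sizeof (flag ? 1 : 2.0) == sizeof (double);
}
```

Note that the type of the mixed expression is @code{double} whichever alternative is selected.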
@section Trivia
Truth values are overwhelmingly used in if() statements.
``Boolean operators'', ``Relational Operators'', ``Truth Operators'',
and ``Logical Operators'' are different ways of saying the same thing.
A ``boolean variable'' can either be true or false; these are often
called ``flags''.
Many people think that the keywords defined in
@code{#include <iso646.h>}
are much easier to read.
This ISO standard header defines the keywords
@code{and and_eq bitand bitor compl not or or_eq xor xor_eq not_eq}
to be exactly equivalent to
@code{&& &= & | ~ ! || |= ^ ^= !=}
[FIXME: is this true ? is there no bitor_eq ?]
@node expressions, = and side effects, bool type, Top
@chapter expressions
@vindex expressions
@vindex operator precedence
@vindex precedence
@section Precedence
[FIXME](Table to be included when I know how to do tables in texinfo.)
@section assigning a value to an expression (var = X)
@section Trivia
According to ANSI, there is no precedence in C; instead, there are many
types of expressions.
Although their terminology is very different, the net effect is
identical to the (hopefully easier to understand) ``associativity and
precedence system'' terminology in this reference manual.
@node = and side effects, evaluation order, expressions, Top
@chapter assignments and side effects
@section Simple Assignment
@kindex =
@cindex lvalue
@cindex assignment
Simple assignment is done with the operator @samp{=}.
On the left of the @samp{=} is a place to store a value; this can be a
variable, a structure element, an array element, or the place a pointer
points.
Expressions that are allowed on the left of an @samp{=} are called
@dfn{lvalues} (left-side values).
On the right of the @samp{=} is an expression for the value to be
stored.
Let's call them @var{l} and @var{r}.
If @var{l} and @var{r} have the same type, it may be any type except
for void, array and function types.
(If @var{r} is an array or function then it is converted automatically
to a pointer before the assignment ``sees'' it.)
In addition, the following cases of mixed types are allowed:
@itemize @bullet
@item
Both @var{l} and @var{r} have numeric types.
Then @var{r} is automatically converted to @var{l}'s type and the
result is stored in @var{l}.
@item
@var{l} has a pointer type and @var{r} is the integer 0.
Then a null pointer is stored in @var{l}.
@end itemize
An assignment is an expression, and therefore has a value.
This value is the altered value of @var{l}.
However, the expression is not an lvalue; it may not be used as the
operand of unary @samp{&} or as the left side of another assignment.
@section Modifying Assignment
@cindex modifying assignment
The @dfn{modifying assignment} operators abbreviate an arithmetic
operation combined with an assignment.
Any arithmetic operator can be used.
These operators do not add any power to the language, but they are
often convenient.
Let's take the most commonly used modifying assignment operator,
@samp{+=}, as an example.
@code{@var{l} += @var{r}} is an abbreviation for @code{@var{l} =
@var{l} + @var{r}}.
It means that the value of @var{r} is added into @var{l}, not simply
stored into @var{l}.
Like simple assignments, modifying assignments are expressions and have
values.
The value of any assignment is the new value of @var{l}.
However, the expression is not an lvalue; it may not be used as the
operand of unary @samp{&} or as the left side of another assignment.
The rules for the types allowed in modifying assignments follow from
the rules for types in simple assignments and in arithmetic operators.
It must be possible to combine @var{l} and @var{r} with the arithmetic
operator used, and the result must be able to be stored into @var{l}.
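A few cases of @samp{+=} are sketched below: the value of the assignment expression, a mixed integer and floating operand, and a pointer on the left side.
All of the function names are hypothetical.

```c
/* The value of  l += r  is the new value of l. */
int
value_of_add_assign (int l, int r)
{
  int v = (l += r);
  return v == l;          /* always 1 */
}

/* l is int and r is double: the addition is done in double,
   then the result is converted back to int for storing,
   discarding the fraction. */
int
add_double_into_int (int l, double r)
{
  l += r;
  return l;
}

/* l is a pointer and r is an integer: the pointer advances
   by r elements, not r bytes. */
int
third_element (void)
{
  int a[4] = { 10, 20, 30, 40 };
  int *p = a;
  p += 2;
  return *p;
}
```

For example, @code{add_double_into_int (1, 2.7)} computes 3.7 and stores 3.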
The following modifying assignment operators are allowed with @var{l}
and @var{r} having any numeric types, and are also allowed if @var{l}
is a pointer type and @var{r} is an integer.
@table @code
@item @var{l} += @var{r}
This expression increments @var{l} by the addition of @var{r}.
[FIXME: way too much passive voice around here.]
@item @var{l} -= @var{r}
The value of @var{l} is decremented by the subtraction of @var{r}.
@end table
The following modifying assignment operators are allowed whenever
@var{l} and @var{r} both have numeric types (either integer or
floating).
It is not necessary for @var{l} and @var{r} to have the same type; in
fact, one may be integer and the other floating.
@table @code
@item @var{l} *= @var{r}
The value of @var{l} is altered by multiplication by @var{r}.
@item @var{l} /= @var{r}
The value of @var{l} is altered by division by @var{r}.
@end table
The following modifying assignment operators are allowed whenever
@var{l} and @var{r} both have integer types.
They need not have the same types.
@table @code
@item @var{l} %= @var{r}
The value of @var{l} is changed to its remainder in division by
@var{r}.
@item @var{l} &= @var{r}
The value of @var{l} is altered by logical-and with @var{r}.
This clears all bits in @var{l} that are clear in @var{r}.
@xref{Bitwise}.
For example, @code{x &= ~4} clears the 4's bit in @code{x}, leaving all
other bits in @code{x} unchanged.
@item @var{l} |= @var{r}
The value of @var{l} is altered by logical-or with @var{r}.
This sets all bits in @var{l} that are set in @var{r}.
@xref{Bitwise}.
For example, @code{x |= 4} sets the 4's bit in @code{x}, leaving all
other bits in @code{x} unchanged.
@