\input texinfo @c -*- Texinfo -*- @comment %**start of header (This is for running Texinfo on a region.) @tex \special{twoside} @end tex @setfilename c_reference_manual.info @settitle YARMAC 0.13--- C Reference Manual --- DRAFT 1998-08-29 @setchapternewpage odd @comment DAV: Is the header really the right place to put @comment the setchapternewpage command ? @set VERSION 0.13 @c huh ? @paragraphindent none @comment %**end of header (This is for running Texinfo on a region.) @ignore 123456789012345678901234567890123456789012345678901234567890123456789012 Only visible in source file ! [FIXME: do tabs in sample programs need to be replaced by spaces ?] [DVDEUG: YES!!!] [FIXME: consider adding some of the style guide suggestions at http://www.rdrop.com/~cary/html/linux.html to this text. ] Should this be put under a "OPL" liscense ? see OpenContent http://www.opencontent.org/home.shtml @end ignore @comment from "info:texinfo#Installing_Dir_Entries" @comment "@dircategory" and "@direntry" are used only by "install-info". @comment Why doesn't the "info:texinfo#Beginning_a_File" documentation @comment mention this ? @dircategory Programming @direntry * YARMAC: (c_reference_manual). Yet Another Reference Manual About C. @end direntry @ifinfo @c The summary description and copyright --- @c --- does not appear in the printed document. This is an unfinished, unpublished work. When finished, it will be a C reference manual and at that time you may freely distribute it under terms vaguely similar to the following: This C reference manual documents some of the features of the C language that are required for writing programs in that language. Copyright @copyright{} 1988 Richard M. Stallman; @copyright{} 1994 Peter Seebach; @copyright{} 1998 David Cary current maintainer (1998-05-26): David Cary Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. 
@ignore Permission is granted to process this file through TeX and print the results, provided the printed document carries a copying permission notice identical to this one except for the removal of this paragraph (this paragraph not being relevant to the printed manual). @end ignore Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled ``Copying'' and ``GNU General Public License'' are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. @end ifinfo @titlepage @c start of title page --- does not appear in the Info file. @title YARMAC @subtitle UNFINISHED DRAFT 1998-05-26 @subtitle NOT FOR DISTRIBUTION, YET @subtitle Yet Another Reference Manual About C @author current maintainer: David Cary @page @vskip 0pt plus 1filll @c start of copyright page Copyright @copyright{} 1988 Richard M. Stallman; @copyright{} 1994 Peter Seebach; @copyright{} 1998 David Cary current maintainer (1998-08-23): David Cary This is an unfinished, unpublished work. When finished, it will be a C reference manual and at that time you may freely distribute it under terms vaguely similar to the following: Published by ... current maintainer: David Cary Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. 
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled ``Copying'' and ``GNU General Public License'' are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. @end titlepage @comment node-name, next, previous, up @ifinfo @node Top, Copying, , (dir) @top YARMAC YARMAC version 0.13 (1998-08-29) DRAFT This is an unfinished, unpublished work. When finished, it will be a C reference manual. This manual is intended to cover the standard C language and all compliant C compilers. (K&R C, ANSI/ISO C, GNU C, and the upcoming C9X standard) The GNU C compiler with the @code{-ansi -pedantic-errors} options is a standard C. Global questions about this DRAFT: Yes, I know all the chapter headings are all lowercase. I've been influenced by _IEEE Spectrum_ doing the same thing; is this just a fad ? How should I handle URIs ? When this document is run through texi2html, I could make links that (a) directly jump to the referenced site. Some people prefer such links to (b) jump to a bibliography at the end; only the links in that bibliography actually exit the document. This document is intended to be a reference for people who already know a little C and who are reading other people's C source code and trying to figure out what's going on. @comment this is the ``master menu'' @menu @comment Main Chapters and appendices * Copying:: YARMAC will be free to distribute. [FIXME] * Introduction to YARMAC:: What is YARMAC ? (Overview) Who is the intended audience ? 
* Top-Down View of C:: A Top-Down View of C * conventions:: general language conventions * fundamental data types:: fundamental data types * variables:: variables * pointers:: pointers * Creating New Data Types:: Creating New Data Types * Arithmetic and Bitwise Operators:: Arithmetic and Bitwise Operators * bool type:: working with the bool type (true, false, and logical operators) * expressions:: expressions * = and side effects:: assignments and side effects * evaluation order:: Precedence vs. order of evaluation * sequence points:: sequence points * branching:: branching * looping:: looping * functions:: functions * scope:: scope, linkage, access and duration * I/O:: Input and Output * macros:: macros and the preprocessor * History of C:: History of C * History of this document:: History of this document * Bibliography:: Bibliography * another view:: The Compiler Writer's View @comment Indices * Glossary:: Glossary * keystroke index:: Keystroke Index * Concept Index:: Concept Index @detailmenu --- Detailed Chapters and subsections --- * Copying:: YARMAC will be free to distribute. [FIXME] * Introduction to YARMAC:: What is YARMAC ? (Overview) Who is the intended audience ? * conventions:: general language conventions * fundamental data types:: * Int Const Default:: Default Type of an Integer Constant * Int Const Type:: Explicitly Typed Integer Constants * Int Conversion:: Type Conversion among Integer Types * Int Promotion:: Default Integer Promotions * variables:: variables * pointers:: pointers * Creating New Data Types:: Creating New Data Types * Arithmetic and Bitwise Operators:: Arithmetic and Bitwise Operators * bool type:: working with the bool type (true, false, and logical operators) * expressions:: expressions * = and side effects:: assignments and side effects * evaluation order:: Precedence vs. 
order of evaluation * sequence points:: sequence points * branching:: branching * looping:: looping * functions:: functions * scope:: scope, linkage, access and duration * I/O:: Input and Output * macros:: macros and the preprocessor * History of C:: History of C * History of this document:: History of this document * Bibliography:: Bibliography * another view:: The Compiler Writer's View * Glossary:: Glossary * keystroke index:: Keystroke Index * Concept Index:: Concept Index @end detailmenu @end menu @end ifinfo @comment node-name, next, previous, up @node Copying, Introduction to YARMAC, Top, Top @chapter Copying (Information on FSF) @section General Public License and explanation [FIXME: Is this the latest proper language ?] Copyright @copyright{} 1994, 1995, 1996, 1998 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. @section Why FSF? (Richard Stallman and the Gnu Manifesto) @section Why Gnu C? @section Trivia @node Introduction to YARMAC, Top-Down View of C, Copying, Top @chapter Introduction to YARMAC C language reference manual This book intends to be a reference to the C programming language. It assumes you have already gone through the tutorial that came with your C compiler, and are familiar with editing files on your own platform. 
This book is not a replacement for the FAQ, or any other explanatory book; it is expected to be a mere reference. A reference manual can have bugs. Bugs include segments that are inaccurate or unclear. Bugs include a reference that should have been in the index. When you find bugs, please report them to the maintainer; currently, that's @samp{d.cary@@ieee.org}. I'll try to patch the bug, or explain why it's a feature. Please include the version number from the top of the file in any bug report. [DVDEUG: --- doesn't show up right!] The C programming language is widely available --- widely enough that there are many distinct dialects. This manual aims to cover the K&R (traditional) dialect, ANSI/ISO C, GNU C, and the upcoming C9X standard. The primary targets are ISO C (for portability) and GNU C (to go with the @samp{gcc} compiler). The K&R C style is referred to mostly for portability to obscure systems, and to help identify what was intended by code written in this style. Some side references to other dialects may be included where necessary; in particular, some of the more visible incompatibilities with C++ are covered, because this kind of information can be helpful. This book contains a lot of opinion and dogma; this is understood to be the author's personal opinion, but frequently is related to issues that look harmless until you use your second compiler. I have tried to avoid picking sides in the major religious wars, focussing instead on things that are known to introduce problems. The goal of this book is to provide a high quality reference manual for the C language, available in machine-readable form. No paper index can compete with the vast speed of modern computers at searching for information. Each chapter will have a section titled `Trivia' which will contain likely sticking points, non-obvious implications, and other things that are expected to answer the questions of experienced programmers, or which may prove interesting to know. 
The Appendices are intended to serve as quick references, with pointers to more detailed treatments in the text. The Index in particular, it is hoped, will be of greater value than indices in computer books usually are.

@section Trivia

There are two kinds of bugs: ones I know about (labeled FIXME), and ones I don't know about.

@node Top-Down View of C, conventions, Introduction to YARMAC, Top
@chapter a top-down view of C

[FIXME: most programmers' reference manuals start with individual characters and build up from there. I think it might be interesting to start at a high level and work down; a quick reference for people who found a C program on the 'web, and need to know how to change a couple of @code{#define}s to get it working on their machines; then get more and more detailed as they need to make a little change here and a little change there... How high a level should I start ? Buckminster Fuller says to start with the universe, then start dividing. ]

[DVDEUG: Right now it isn't written like this. Are there plans to write it like this? Personally, I would skip it. The key unanswered question is the audience. If it is a _reference manual_ then it should be for people who know C. My adding the comparisons to Pascal, C++ and Java would make it for anyone with a computer science background. Restructuring the document like this would make it more for anyone, at the cost of reducing its value as a C reference.]

[ I'm thinking about adding a section, "When should one *not* use C ?" to add a brief discussion of all the other wonderful (and free !) tools that do things that C is overqualified and/or underqualified for. Simple batch files and sed scripts, raw assembly, PERL, FORTRAN, Octave, C++, etc. ]

One typically creates an executable program by using the @code{make} utility. [FIXME: Recent GNU programs tend to have the user run Configure to generate a make file.]
One goes to the directory containing the source code of the C program (a bunch of files that usually end in @code{.c} or @code{.h})
@comment (and @code{.cpp} @code{.hpp} FIXME ?)
and also containing a file named @code{makefile}, then types @code{make}, and if everything goes as designed, the @code{make} utility uses the @code{makefile} to find all the various parts and combine them into a single executable file. Then you type the name of that file to run it. [Really large programs typically have subdirectories, or "branches", each with their own @code{makefile} that is recursively called from the "root" @code{makefile}].

The @code{make} utility, the format of the @code{makefile}, and the process of compiling the source into an executable are outside the scope of this manual. This manual covers what is in the C source files and how that corresponds to what your executable program actually does when you run it.

Inside the C source files, there are two kinds of text: comments intended solely for human readers, and "code" intended to be understood by a compiler.

@section Overview

A C program consists of a collection of files. The naming of files is arbitrary; the only thing mandated by most compilers is that the filename of the code ends in @code{.c}@footnote{ @code{.c}, not @code{.C}. Regrettably, there is a difference: @code{.C} implies a C++ program, despite this being error prone even under a UNIX-like system, and tragic under a system that is case-insensitive.}, and that can be overridden by hand. In practice, the name of a file is a brief description of the contents of the file with @code{.c} appended. In addition to @code{.c} (code) files, there are @code{.h}@footnote{Again, @code{.H} implies C++.} (header) files.

To control the compilation of these files, most people use @code{make}, which automatically builds only the files that need to be built. Many high-quality projects use @code{autoconf} to help make their code easily portable to a wide variety of systems.
While they are both useful tools for C programming, they fall outside the realm of this work. Please see [insert references to the manuals].

A header file contains different material than a code file. A header file contains information for more than one file: prototypes for functions[add reference], global [add reference to glossary] data, and type definitions [add reference to compound & user-defined types]. They can also @code{include} other header files using @code{#include}. A code file mainly contains the actual code of the program, @code{including} (using @code{#include}) header files so that it recognizes the information it shares with other files.

@section Trivia

In reality, a header file can include anything --- the preprocessor [reference?] just textually replaces each @code{#include} directive with the entire file it references. Once this is done, the compiler can't tell the difference between lines of text in the original @code{.c} source file and lines of text that were @code{#include}d.

[/* was: In reality, a header file can include anything it wants. The preprocessor [reference?] just textually adds the files together, and as long as it doesn't depend on a pass of the preprocessor that occurs before the adding, it will work. */]

(This author (David Starner) once had a program where the main function started in a header file and finished in the @code{.c} file the header file was included in.) This is really not recommended; in general a header file should only contain what is mentioned above.

When the GNU preprocessor encounters a @code{#include <>} or @code{#include ""}, where does it look for those files ? [FIXME: answer.]

@section In Comparison
@c In Comparison is written so that one who knows one of the languages can look and tell the major differences in the section,
@c without reading the entire section.

@subsection Java

While a Java class must be saved in a file of the same name, a C file has no such restriction on file names.
C also has header files (@code{.h}) for data needed in multiple files, whereas the Java compiler reads the @code{.class} files.

[ARGGH! I don't know any Java (yet). But I was under the impression that @code{.java} files were source code that the Java compiler converted into @code{.class} executables, which contradicts this paragraph.]

@subsection C++

C++ uses @code{.cpp}, @code{.cc}, or @code{.C} source code and @code{.hpp}, @code{.H}, or @code{.h} header files. Header files under C++ contain classes and code in the form of inline functions as well as what is noted above as being in C header files.

@subsection Pascal

[DVDEUG: ARGGH! I have little familiarity with Pascal. This is only basic Wirth & Jensen or ISO 7185 Pascal.]

Whereas classic Pascal has a monolithic file structure with only one file per program, C has functions and declarations split up between files and uses header files to hold common data.

[DAV may be bluffing when he says he knows Pascal:] Many versions of Pascal use Units ...

@node conventions, fundamental data types, Top-Down View of C, Top
@chapter general language conventions

spaces, tabs, whitespace

@section Program Structure

@section Comments

[FIXME: The description about how to use comments needs to flow.]

Everything from a @code{/*} to the first @code{*/} is a comment intended solely for the human reader. Comments are ignored by the C compiler -- they have no effect on the executable file. You can insert a comment anywhere there is "white space".

Please put some comments next to code you write or change. You want your comments to tell WHAT your code does, not HOW.

Documentation embedded in the source code is best. External documentation all too often is separated, lost. If the documentation is right there in the source code, it is far easier to update it when the code changes.
@section On one line @code{ /* @dots{} */ } @section Blocks @example /* @dots{} @dots{} @dots{} */ @end example @section Trivia Some C compilers (including GNU C) allow the C++ comment, @code{//} followed by arbitrary text up to a newline. This is a standard comment format for C9X. Unfortunately, too many early C compilers do not recognize the @code{//} comment style, so the @code{//} should not be used in portable C source code. At least one highly-contrived program compiles legally under both kinds of compilers, but executes differently. If you have code that uses the @code{//} comment style, you can convert it to the @code{/* ... */} comment style with @example sed -e 'sX^\([^"]*\("[^"]*"[^"]*\)*\)//\(.*\)$X\1/*\3*/Xg' test.cpp > test.c @end example [FIXME: what about the DOC++ comment style, ? http://www.zib.de/Visual/software/doc++/ DVDEUG: It's non-free isn't it? Don't worry about it. I will add something about literate programming. ] The extreme in commenting comes with Knuth's literate programming style, wherein @TeX{} is interlaced with actual code in order to produce programs that can be read like books. @TeX{} itself is written in this style. If you are interested, look through the Web2C documentation and get Cweb. [How to get this documentation ?] @node fundamental data types, Int Const Default, conventions , Top @chapter fundamental data types Programmers use an infinite number of possible types of data. All the different types (at least in C) are built out of the fundamental types ``built-in'' to the C language: @itemize @bullet @item the ``bool'' values (also called a "bit") It represents either true or false. @item The integers (char, int, short, long) @item The floating-point numbers (float, double, long double) @item Pointers (``*'') @item The ``un-type'' (void) @end itemize Pointers and user-defined types are covered in a later section (``Creating New Data Types''). 
A data type represents (among other things) the range of values that a variable can hold. This range is limited by, and specific to, the particular compiler used to compile that program. Every compiler should come with the standard header @code{<limits.h>}, which specifies the largest and smallest values of its fundamental data types. Note that these values are often different for different compilers. It doesn't make any sense to use any other @code{<limits.h>} file besides the original one that came with the compiler.

@section Integers (char int long)
@cindex integers

An integer datum always contains some whole number such as -93, -12, 0, 1, or 69. But that's not the whole story. Every value in C must have a specific data type. As a consequence, it is impossible to simply have the value 7; it must be 7 in a particular type.

The C language has several different data types for integers. Each type has a range of possible values; different types have different ranges. For example, a variable of type @code{short int} can hold any value from -32768 to 32767 in programs compiled by my compiler. A variable of type @code{int} can hold a value from -2,147,483,648 to 2,147,483,647 on most 32-bit machines. On architectures of other sizes, the range of an @code{int} could be as small as the range -32,768 to 32,767, the same as a @code{short int}. Keep this in mind when writing programs that might be ported to a smaller machine.

When you define an integer variable, you must choose one of the standard integer types for it. The type controls what range of values the variable can hold, and at the same time the amount of storage space used for the variable.

It is impossible to store in a variable a value outside the range of its type; if you try to do this, the actual result is to store some other value, a value that is within the permissible range. (In practice, the extra high-order bits are discarded and the low-order bits are stored.)
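The following is a minimal sketch of the two points above: reading a type's range from @code{<limits.h>}, and what happens when a value too large for a type is stored in it. The function names @code{store_in_unsigned_char} and @code{int_range_meets_minimum} are made up for this illustration. The sketch uses an unsigned type because ISO C defines that case exactly (the value is reduced modulo one more than the type's maximum); for a signed type the outcome depends on the compiler.

```c
#include <limits.h>

/* Hypothetical helper: store a possibly out-of-range value in an
   unsigned char.  For unsigned types ISO C defines the result as
   the value modulo (UCHAR_MAX + 1), i.e. the high-order bits are
   discarded and the low-order bits are kept. */
unsigned char store_in_unsigned_char(unsigned long value)
{
    return (unsigned char)value;
}

/* Hypothetical helper: returns 1 if this compiler's int covers at
   least the range -32767 to 32767, using the limits that
   <limits.h> reports for this compiler. */
int int_range_meets_minimum(void)
{
    return INT_MIN <= -32767 && INT_MAX >= 32767;
}
```

On a machine with 8-bit bytes, @code{store_in_unsigned_char(300)} yields 44, which is 300 modulo 256.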
[FIXME: Stroustrup said once, ``unsigned integers, declared @code{unsigned}, obey the laws of arithmetic modulo 2^n. This implies that unsigned arithmetic does not overflow''. Is this really true for all standards-compliant compilers ?] Here are examples of declarations for integer variables: @example short int s; int u, v; unsigned long x; unsigned long long z; /* non-portable */ @end example You can omit the keyword @code{int} if you use any of the keywords @code{long}, @code{short}, @code{signed} or @code{unsigned}. @xref{Declarations}. @section Signed and Unsigned Types @cindex signed types @cindex unsigned types Each integral type has two forms: @samp{signed} and @samp{unsigned}. The two forms occupy the same amount of storage space and their ranges are equally large. A signed type has a range of values centered on zero, while an unsigned type has a range that starts at zero. For example, (on one particular machine using one particular compiler) the @code{ short int} has 65536 distinct values. The unsigned form, @code{unsigned short int}, can hold any integer in the range 0 to 65535, while the signed form, @code{signed short int}, has a range centered on zero, -32768 to 32767 to be exact. It is common to assume that specific sizes (16-bit @samp{short}, 32-bit @samp{long}) are ``standard''; this assumption is in error. (All too often, some people assume that @samp{int} is 16 bits. Others assume that @samp{int} is 32 bits. Obviously, both cannot be right, and sometimes both are wrong.) Similarly, it is not guaranteed that any integer type can hold a pointer, although it is quite common for it to be possible. [FIXME: This paragraph is (or should be) redundant compared to later information.] 
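As a sketch of how a program can check these properties instead of assuming them, the helpers below (whose names are invented for this example) compare the signed and unsigned forms of each type and count the bits of storage a @code{short} actually occupies on the compiler at hand. Nothing here assumes a 16-bit @code{short} or a 32-bit @code{int}.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if the signed and unsigned forms
   of short, int and long occupy the same amount of storage. */
int signed_and_unsigned_same_storage(void)
{
    return sizeof(short) == sizeof(unsigned short)
        && sizeof(int) == sizeof(unsigned int)
        && sizeof(long) == sizeof(unsigned long);
}

/* Hypothetical helper: number of bits of storage in a short on
   this compiler (CHAR_BIT is the number of bits in a char). */
unsigned long bits_in_short(void)
{
    return (unsigned long)sizeof(short) * CHAR_BIT;
}

/* Hypothetical helper: an unsigned range starts at zero, so
   stepping one below zero wraps around to the largest value. */
unsigned short one_below_zero(void)
{
    unsigned short u = 0;
    u = (unsigned short)(u - 1);
    return u;
}
```

A caller that needs, say, more than 16 bits can test @code{bits_in_short()} or the limits in @code{<limits.h>} and pick a wider type rather than trusting a ``standard'' size.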
For portability:

@itemize @bullet
@item
avoid trying to cast a pointer into any integer type;
@item
use @samp{long} when you need more than 16 bits (up to 32 bits);
@item
use @samp{char} when you need no more than 8 bits and want to conserve space;
@item
use @samp{short} when you need more than 8 bits (up to 16 bits) and want to conserve space;
@item
use @samp{int} when you need at most 16 bits and want speed over space.
@end itemize

In general, programs that make assumptions about the sizes of the integral types are device drivers for a specific operating system, or very poorly written.

@kindex signed
@kindex unsigned
You can specify a signed type or an unsigned type by using the keywords @code{signed} and @code{unsigned} as part of the type name.

@section Table of Integer Types
@kindex int
@kindex short
@kindex char

Here is a table of all the integer types of C, together with their ranges (as documented in @code{<limits.h>} in a typical implementation of GNU C):

You are guaranteed that @samp{char} will be at least 8 bits, @samp{short} at least 16, @samp{long} at least 32. @samp{int} will be at least as large as @samp{short}, and no larger than @samp{long}. In general, each type will be no larger than the next larger type. For example, there are implementations where all of the integral types are 64 bits.

@table @code
@item int
@itemx signed int
Four-byte signed integer; range -2^31 to 2^31-1. Guaranteed to be at least as large as @samp{short}, i.e., on smaller machines, the range could be as small as -2^15 to 2^15-1.

@item unsigned int
Four-byte unsigned integer; range zero to 2^32-1. On smaller machines, the range could be as small as zero to 2^16-1.

@item short int
@itemx signed short int
Two-byte signed integer; range -2^15 to 2^15-1.

@item unsigned short int
Two-byte unsigned integer; range 0 to 2^16-1.

@item signed char
One-byte signed integer; range -128 to 127.

@item unsigned char
One-byte unsigned integer; range 0 to 255.

@item char
Depending on the machine, @code{char} is an alias for either @code{signed char} or @code{unsigned char}.
The only values that you can count on to fit in a @code{char} regardless of the type of machine are 0 to 127. @item long int @itemx unsigned long int These types in GNU C are equivalent to @code{int} and @code{unsigned int}. In some other C implementations, @code{long int} occupies more bytes than @code{int}. For example, in the original implementation of C, @code{int} occupied only two bytes (like @code{short int}), and to get a four-byte integer it was necessary to use the type @code{long int}. @item long long int Double precision signed integers ranging from -2^63 to 2^63-1. These integers occupy 8 bytes. (Most C compilers don't support this type.) @item unsigned long long int Double precision unsigned integers ranging from 0 to 2^64-1. These integers occupy 8 bytes. (Most C compilers don't support this type.) @end table Even though two types may be equivalent (@code{int} and @code{long int} are equivalent in my compiler, and @code{char} is always equivalent to either @code{unsigned} or @code{signed char}) they are considered distinct types. For example, the types pointer-to-@code{int} and pointer-to-@code{long int} are completely different types. @section Integer Constants @cindex integer constant @cindex octal @cindex decimal Any positive integer value can be written as a constant. There are no constants for negative integer values, but unary @samp{-} and a positive constant do the job of one. @subsection Integer Constant Radices There are three ways of writing integer constants: decimal, octal and hexadecimal. @itemize @bullet @item A decimal constant is a sequence of digits not starting with a zero. Any positive number except zero can be written this way. @item An octal constant is a sequence of digits starting with a zero. The zero tells the compiler to interpret the digits in base 8. Thus, @samp{010} has value 8, @samp{013} has value 11, and @samp{0100} has value 64. Strictly speaking, @samp{0} is an octal constant. But 0 is 0 in any radix. 
@item @cindex hex digit @kindex 0x A hexadecimal (or @dfn{hex}) constant is @samp{0x} (or @samp{0X}) followed by a sequence of @dfn{hex digits}. A hex digit is either a decimal digit, or a letter in the range @samp{a} through @samp{f} (upper or lower case). @samp{a} stands for 10, @samp{b} for 11, and so on, through @samp{f} for 15. Thus, the hex constant @samp{0xa} has value 10, @samp{0x10} has value 16, @samp{0x16} has value 22, @samp{0x20} has value 32, and @samp{0xff} has value 255. @end itemize Hexadecimal constants are used more often than octal constants, because it is easy to see how a hexadecimal constant breaks down into separate bytes. Each pair of hexadecimal digits makes one byte. Octal constants don't split conveniently into bytes. @node Int Const Default, Int Const Type, fundamental data types, Top @subsection Default Type of an Integer Constant Like all C expressions, an integer constant specifies a data type as well as a value. The type is usually determined by the value, unless you use a suffix letter (@pxref{Int Const Type}). The type of a decimal constant is taken from the following series: @example int, long int, unsigned long int @end example @noindent The type of a decimal constant is the first type in that series which can hold the constant's value. Thus, any value that is small enough will have type @code{int}. In GNU C, @code{long int} never plays a role because it is effectively the same as @code{int}; not so in other C implementations. The type of an octal or hex constant is taken from the following series: @example int, unsigned int, long int, unsigned long int @end example @noindent There are some values that can fit in an @code{unsigned int} but not in an @code{int}; if the constant is written in octal or hex, that unsigned type is used for such values. 
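The radix rules can be checked directly in code. The sketch below, with function names made up for the purpose, compares the octal and hex constants quoted in this section with their decimal values, and uses @code{sizeof} to confirm that a suffix changes a constant's type. It is an illustration only; it does not probe which type a large unsuffixed constant receives, since that depends on the compiler's type sizes.

```c
/* Hypothetical helper: returns 1 if each octal and hex constant
   mentioned in the text equals its stated decimal value. */
int radix_examples_agree(void)
{
    return 010 == 8 && 013 == 11 && 0100 == 64
        && 0xa == 10 && 0x10 == 16 && 0x16 == 22
        && 0x20 == 32 && 0xff == 255 && 0XFF == 255;
}

/* Hypothetical helper: returns 1 if suffixed constants have the
   storage size of the type that the suffix asks for. */
int suffix_sizes_agree(void)
{
    return sizeof(3) == sizeof(int)
        && sizeof(3L) == sizeof(long)
        && sizeof(4u) == sizeof(unsigned int)
        && sizeof(4UL) == sizeof(unsigned long);
}
```

Note that a stray leading zero silently changes a constant's value: @code{010} is eight, not ten.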
In GNU C (since @code{unsigned int} and @code{unsigned long int} have the same range), some values have type @code{unsigned int} when written as an octal or hex constant, but have type @code{unsigned long int} when written as a decimal constant.

@node Int Const Type, Int Conversion, Int Const Default, Top
@subsection Explicitly Typed Integer Constants
@kindex l
@kindex u

The letters @samp{u} and @samp{l} may be used as suffixes to specify the type of an integer constant. The letter @samp{u} means it must be unsigned. The letter @samp{l} means it must be long. (Upper case is accepted also; in fact, @samp{L} is better than @samp{l} because @samp{l} looks too much like a @samp{1}.)

The effect of the suffix is to reject certain types from the series of possible types. (The series of possible types depends on the constant's radix; @pxref{Int Const Default}). @samp{l} rejects the types that are not long, and @samp{u} rejects those that are signed. In ISO C, a constant with a @samp{u} suffix is chosen from the series @code{unsigned int}, @code{unsigned long int} whatever its radix, so @code{unsigned int} is a candidate even for a decimal constant. Once the unsuitable types are rejected, the type used is the first of those remaining which can hold the actual value.

@subsection Integer Constant Type Examples

Here are some examples of integer constants and their types.

@itemize @bullet
@item
The hex constant @code{0x80000000} needs 32 bits. On my compiler [DVDEUG: GNU C?], its type is @code{unsigned int}, because it can fit in that, whereas it is just barely too large for an @code{int}.

@item
@code{2147483648} is the same value, expressed in decimal. It is a @code{long unsigned int} because @code{unsigned int} is never used for unsuffixed decimal constants, and neither @code{int} nor @code{long int} will hold this value.

@item
@code{0x80000000L} is likewise a @code{long unsigned int}. @code{unsigned int} is ruled out by the @samp{L}, so the next candidate type that can hold the value is used.

@item
@code{0x80000000u} is an @code{unsigned int}, just like @code{0x80000000}. The @samp{u} rules out @code{int}, but that has no effect, since this value doesn't fit in an @code{int} anyway.

@item
@code{2147482648u} is an @code{unsigned int} under the ISO C rules (assuming a 32-bit @code{int}). @code{int} and @code{long int} are ruled out by the @samp{u}, and @code{unsigned int} is the first remaining type that can hold the value.

@item
@code{3l} is a @code{long int}. @code{int} is barred by the @samp{l} and @code{long int} is the next candidate for a decimal constant.

[FIXME: Is this `@code{ul}' really valid ?]
@item
@code{4ul} is an @code{unsigned long int}. @code{int} is barred by the @samp{l}; @code{long int} is barred by the @samp{u}; @code{unsigned long int} is the next candidate for a decimal constant.
@end itemize

@node Int Conversion, Int Promotion, Int Const Type, Top
@section Type Conversion among Integer Types

C allows automatic conversion between integer types. Conversions can be requested explicitly with casts (@pxref{Casts}); they also happen automatically when the operands of an arithmetic operator have different types, and for integer promotion (@pxref{Int Promotion}).

@node Int Promotion, variables, Int Conversion, Top
@section Default Integer Promotions
@cindex promotions (integer)

In C, the @code{short} and @code{char} types (whether signed or not) are nominally never used for any operation. Values of these types appearing in arithmetic expressions are always converted to type @code{int} before any arithmetic is done, before they are passed as arguments to a function, and so on.

In fact, the GNU C compiler may omit the conversion, but only when this has no effect on the result. For understanding the meaning of a C program, you can assume that the conversion always happens.

@ignore
Controversy over previous 2 paragraphs.

[DVDEUG: Whoa! Default promotion like this disappeared with ISO C!]

[DAV: Promotion seems to be alive and well. I run

#include <stdio.h>
main(){
  unsigned char a, b, c;
  a = 0xff; b = 0xff;
  c = (a+b)/2;
  printf("%i", (unsigned int)c );
}

If there is *no* promotion, then (uchar)0xff + (uchar)0xff should equal (uchar)0xfe. Divide by 2, and we get 0x7f (printing "127").
However, when I compile this with
$ gcc --version
2.7.2.3
and run it, it prints "255". Is there a better explanation than just saying that the chars were promoted to int, so that the result of addition (uchar)0xff + (uchar)0xff is (int)0x1fe ?
]

FIXME: What is the most understandable way of summarizing this ? I prefer easy-to-understand "as if" rules, even if a particular compiler doesn't happen to actually work that way internally -- as long as we get the same results.
@end ignore

@section Floating Point Numbers (float, double, long double)
@cindex floating point
@cindex mantissa
@cindex exponent
@cindex scientific notation

@dfn{Floating-point} numbers are the computer's version of ``scientific notation''. Floating point data is often called ``real'' data but strictly speaking this is a misuse of language. Floating point is often used to represent real-number values, but general real numbers cannot be exactly represented, only approximated.

[DVDEUG: Add reference to "What every Computer Scientist should know about floating point". Also add references to NAN and Inf.]

[FIXME: is all this really necessary ? Can't we just say that fixed-point numbers can handle fractions and numbers of a certain range with a certain precision, and be done with it ?]

In scientific notation, a number is represented as the product of its @dfn{mantissa}, which is a number at least 1 and less than 10, and a power of 10. The power of 10 used is called the @dfn{exponent} of the number. Here are some examples of numbers in scientific notation:

@table @asis
@item 129
1.29 * (10^2)
@item 100
1.0 * (10^2)
@item 99
9.9 * (10^1)
@item 5.5
5.5 * (10^0)
@item .125
1.25 * (10^-1)
@end table

Floating point notation in the computer is the binary equivalent of scientific notation. The mantissa is between 1 (inclusive) and 2 (exclusive) and is represented in binary; the exponent is a power of 2 instead of 10.
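As a rough illustration of the binary form just described, the following sketch splits a positive value into a mantissa between 1 (inclusive) and 2 (exclusive) and a power-of-2 exponent. The names @code{struct binary_parts} and @code{split_binary} are invented for this example and are not part of any library; the standard function @code{frexp} does a similar job, but reports a mantissa between 0.5 and 1.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper type: value == mantissa * 2^exponent. */
struct binary_parts {
    double mantissa;   /* 1.0 <= mantissa < 2.0 */
    int exponent;      /* power of 2 */
};

/* Split a positive, finite value into binary mantissa and exponent. */
struct binary_parts split_binary(double value)
{
    struct binary_parts parts;

    /* Zero, negative, infinite and NaN inputs are not handled. */
    assert(value > 0.0 && value <= DBL_MAX);

    parts.mantissa = value;
    parts.exponent = 0;
    while (parts.mantissa >= 2.0) {   /* too big: halve it, count a power of 2 */
        parts.mantissa /= 2.0;
        parts.exponent++;
    }
    while (parts.mantissa < 1.0) {    /* too small: double it, remove a power of 2 */
        parts.mantissa *= 2.0;
        parts.exponent--;
    }
    return parts;
}
```

For instance, @code{split_binary(5.5)} gives a mantissa of 1.375 (1.011 in binary) with exponent 2, and @code{split_binary(0.125)} gives a mantissa of 1.0 with exponent @minus{}3.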
Here is how the previous examples would look in the computerized format:

@table @asis
@item 129
1.0000001 * (2^7)
@item 100
1.1001 * (2^6)
@item 99
1.100011 * (2^6)
@item 5.5
1.011 * (2^2)
@item .125
1.0 * (2^-3)
@end table

Note that the exponent of the number zero is not really determined because 0 * (2^0) = 0 * (2^1) = 0 * (2^@var{anything}). By convention, when zero is represented as a floating-point number, zero is used as the exponent value.

@section Floating Point Types

Floating-point data types in the computer differ in how many bits are available for representing the mantissa and the exponent. The number of mantissa bits determines how much significance can be represented; the number of exponent bits determines the overall range of magnitudes that can be represented.

For example, if 7 bits are available for the exponent, the range of possible exponents is from @minus{}64 to 63, so the range of possible floating point values is from 2^@minus{}64 to 1.111@dots{} * 2^63. With 8 exponent bits, the exponent range doubles to @minus{}128 through 127, so the smallest possible positive value becomes 2^64 times smaller and the largest possible positive value 2^64 times larger.

If only 4 bits were available for the mantissa, it would be impossible to distinguish the numbers 16 and 17 (10000 and 10001 in binary). Only the first 4 significant bits, 1000 in both cases, could be kept.

In actuality, at least 24 bits of mantissa are always available. This translates to around 7 significant decimal digits. Since the first bit of the mantissa is always one, it is often not explicitly represented.

[FIXME: Is this always true for all GNU C implementations ?]
All ANSI C implementations provide three distinct data types for floating point numbers: @code{float}, @code{double}, and @code{long double}. In GNU C, @code{float} is a 32-bit single-precision number; 32 bits are available for the mantissa, exponent and sign bit. Just how the bits are apportioned among mantissa and exponent depends on the kind of computer in use.
@code{double} is a 64-bit double-precision number. @code{long double} is equivalent to @code{double}, but it is considered a distinct type. @section Floating Point Constants Floating point constants let you express particular floating-point numbers in C programs. Each floating-point constant specifies a numeric value and a data type (either @code{float}, @code{double} or @code{long double}). The numeric value consists of a mantissa optionally followed by an exponent. The mantissa is a number with a decimal point. An exponent is the letter @samp{e} (or @samp{E}) followed by an integer which may have a sign. If an exponent is given, the decimal point is not required in the mantissa. Here are some examples, all of which have the value 150: @example 150.0 150e0 15e1 1.5e2 1.5e+2 1.500e2 .015e4 @end example A letter at the end of the constant specifies the data type. The letter @samp{F} (or @samp{f}) specifies type @code{float}. The letter @samp{L} (or @samp{l}) specifies type @code{long double}. No letter at all specifies the default, which is @code{double}. It is rarely necessary to use letters to specify the type explicitly. One time when it is useful is when using the constant in arithmetic together with values of type @code{float}: if you do not explicitly specify the type, the constant is a @code{double}. The compiler will add code to convert the other values to @code{double} and the arithmetic would be done in @code{double} precision. If the result that you want is a @code{float}, the extra conversions would make the program unnecessarily slow. 
You can avoid the extra conversions by explicitly specifying the type of your constant as @code{float}, like this:

@example
@{
  float x, y = 1.0f;
  x = (y + 1.3f) * 2.4f;
@}
@end example

@section the ``un-type'' (void)

The ``un-type'' @code{void} is used only in these 3 common situations:

@itemize @bullet
@item
the parameter list of a function which takes no arguments, written @code{(void)}
@item
a generic pointer, i.e., a pointer of type @code{void *}, which can point to an object of any type (see Pointers)
@item
the return type of a function which doesn't return anything (see Functions for both flavors of this situation).
@end itemize

There are no objects of type @samp{void}.

@section Numeric Type Conversion

In C, any numeric type can be converted automatically to any other numeric type. Type conversion happens in assignments, in arithmetic, and in casts (@pxref{Casts}). It may also happen in @code{return} statements (@pxref{Return}) and in function calls when a prototype is in effect (@pxref{Prototype}).

For example, if @code{x} is a variable declared as @code{int} and @code{f} is declared as @code{float}, then

@example
f = x;
@end example

@noindent
converts the value of @code{x} to floating point and

@example
x = f;
@end example

@noindent
converts the value of @code{f} to an integer.

If a constant appears in a context where it would need to be converted immediately to another type, GNU C converts it while compiling the program. Normally this makes no difference except to speed up execution.

@section Integer Conversion

The general rule when converting a value from one integer type to another is that the numeric value is unchanged if it is within the range of possible values for the new type. If it is outside the possible range, then the number's bit pattern is preserved. [FIXME: This is confusing.] If the number has too many bits to fit, then the least significant bits are kept, as many as will fit.
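The rule above can be tried out with a small sketch. It converts to @code{unsigned char} because ANSI C defines the conversion of an out-of-range value to an unsigned type as keeping the low-order bits, whereas the result of converting an out-of-range value to a signed type is implementation-defined. The function names @code{low_byte}, @code{low_byte_by_mask} and @code{check_low_byte} are made up for this illustration, and the spot checks assume an 8-bit @code{char}.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: convert an int to unsigned char.
   In-range values are unchanged; out-of-range values keep only
   their least significant CHAR_BIT bits. */
unsigned char low_byte(int value)
{
    return (unsigned char) value;
}

/* The same result obtained by masking off the low bits explicitly. */
unsigned char low_byte_by_mask(int value)
{
    return (unsigned char) ((unsigned int) value & UCHAR_MAX);
}

/* A few spot checks, assuming CHAR_BIT == 8. */
void check_low_byte(void)
{
    assert(low_byte(65) == 65);    /* in range: value unchanged */
    assert(low_byte(513) == 1);    /* 1000000001 binary keeps 00000001 */
    assert(low_byte(-1) == 255);   /* every kept bit is set */
}
```

The cast and the explicit mask agree for every @code{int}, which is one way to picture ``the least significant bits are kept''.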
@cindex extending
Converting an integer of a narrower type to a wider integer type (such as @code{char} to @code{int}) is called @dfn{extension}. If the types are signed, it is called @dfn{sign-extension}. If the original type is unsigned, it is called @dfn{zero-extension}. In either case, the number keeps the same value.

There is one other case of extension, from a signed type to an unsigned one. This case is an exception because only nonnegative values can go through unchanged; negative values cannot do so because the unsigned type cannot represent them. A negative number large in absolute value becomes a small positive number, and a negative number close to zero becomes a large positive number. This case is error-prone, so check carefully whenever you write code that converts @code{signed} numbers to @code{unsigned}.

@cindex truncation
When a value of wider type is converted to a narrower type, it keeps the same value if possible; but often this is impossible. For example, 513 (1000000001 in binary) cannot keep the same value when converted to a @code{char}; it is outside the possible range of a @code{char}. In this case, the least significant bits remain the same and the rest are lost. Thus, 513 converts to the @code{char} value 1. This is called @dfn{truncation}.

Sometimes truncation of a positive value has a negative result. For example, truncating 129 (10000001 in binary) to a @code{char} has the value @minus{}127 because the first 1 in the number is now the sign-bit. Of course, this happens only when the result type is a signed type.

There is one other case of integer type conversion, that where the old and new types are equally wide but one is signed and the other is unsigned. In this case, the bit pattern is preserved. For example, when converting from @code{char} to @code{unsigned char} or vice versa, values 0 through 127 are unchanged.
@code{char} values @minus{}128 through @minus{}1 map into @code{unsigned char} values 128 through 255, respectively, and vice versa. It was shown above how 129 as an @code{unsigned char} corresponds to @minus{}127 as a @code{char}. @section Floating Point Conversion When a value of type @code{float} is converted to @code{double}, it keeps the same numeric value. @code{double} can represent anything that @code{float} can. Likewise when @code{float} or @code{double} is converted to @code{long double}, accuracy is maintained. @cindex floating overflow When a value of type @code{double} is converted to @code{float}, two kinds of problems must be faced. @itemize @bullet @item @code{float} has fewer mantissa bits. The most significant mantissa bits are kept, as many as will fit, so that the result is close to the original value even if not exactly the same. @item @code{float} has fewer exponent bits, so its largest possible value is smaller. If the number being converted fits in the possible range of a @code{float}, this problem has no effect. If the number does not fit, the result is pure garbage, this being an example of @dfn{floating overflow}. @end itemize @section Integer to Floating Point When an integer value is converted to a floating point type, in general, the result is the floating point value which is numerically closest to the original integer. In some cases, the integer can be represented exactly. For example, converting the integer 5 to @code{float} results in the number 1.25 * 2^2, or, in binary, 1.01 * 2^2, whose value is exactly 5. But this is not possible for large integers. An @code{int} has 31 significant bits; in a @code{float}, some of the 32 bits are needed for sign and exponent, leaving typically 24 bits of significance. Integers greater than this cannot be represented exactly. For example, both 268435456 and 268435457 convert to the same floating point number (these integers are 2^28 and 2^28+1). 
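The loss of significance is easy to observe. The following sketch assumes a @code{float} with 24 bits of significance (the usual IEEE single-precision layout) and uses @code{long} arguments so that the values are guaranteed to fit; the function name @code{same_as_float} is invented for the example.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if a and b convert to the same float.
   The volatile qualifiers force each result to be stored as a float,
   so extra precision held in registers cannot hide the rounding. */
int same_as_float(long a, long b)
{
    volatile float fa = (float) a;
    volatile float fb = (float) b;

    return fa == fb;
}

/* Spot checks, valid when float has a 24-bit significand. */
void check_same_as_float(void)
{
    assert(same_as_float(268435456L, 268435457L));   /* 2^28 and 2^28+1 collide */
    assert(!same_as_float(5L, 6L));                  /* small integers stay exact */
}
```

Under that assumption, the first integer that cannot be told apart from its predecessor is 16777217 (2^24+1).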
This loss of significance does not happen when converting an @code{int} to a @code{double} because type @code{double} has more than 32 bits of mantissa.

@section Floating Point to Integer

When a floating-point value is converted to an integer type, the fractional part is discarded, so the result is rounded toward zero. Thus, 1.5 converts to 1, and @minus{}1.5 converts to @minus{}1.

A floating-point value may far exceed the range of an @code{int}. For example, the largest possible @code{float} value is at least 2^64 --- much too large for an @code{int}. When such values are converted to @code{int}, the result is undefined.

[FIXME: this section needs work]

@section Trivia

You can make @code{bool} variables in C++, but not in ordinary C. C9x has provisions for boolean variables. [DVDEUG: Specify!!]

Some compilers (GNU C among them) add the type @samp{long long}, which is most often 64 bits. It is not compatible with ISO C, in which it is a syntax error, but it may prove helpful or necessary during porting projects.

[FIXME: a few compilers have something vaguely similar to _int16, _int32, _int64, and others - is it worth mentioning them here ? DVDEUG: I don't see why; they are extremely non-standard, and not part of GNU-C. ]

@section In Comparison

A function with a void return type is called a ``procedure'' in many other languages.

@node variables, pointers, Int Promotion, Top
@chapter variables

@section declaring variables

@example
int x;
@end example

@noindent
declares @code{x} to have type @code{int}.

Every variable used in a C program must be defined, in a @dfn{declaration}, before it is used. The declaration has five purposes:

@enumerate
@item
To give the function or variable a name, so it can be used later.

@item
To describe the data type of the function or variable: for example, whether the value is an integer or a character string. This is done with a @dfn{type specifier} and a @dfn{declarator}.
[@var{declarator} may not be an English word, but it is the standard term.]

@item
To specify how storage for a variable should be allocated. This is done with a @dfn{storage class} (@pxref{Storage Class}).

@item
To specify the @dfn{scope} of the name: for example, whether the name is known in an entire program or only in the current file or function. The storage class fills this role also.

@item
Optionally, to give an initial value. This is done with an @dfn{initializer} (@pxref{Initializers}).
@end enumerate

@section initializing variables

If the variable is static or automatic [FIXME: what other kind of variable is there ?], an initializer may be added, as in

@example
int x = 5;
@end example

@noindent
which is the same as the previous example except that @code{x} is initialized to 5 when its storage is allocated. @xref{Initializers}.

@section Assignment statements (and combinatorial assignment)

[FIXME: huh ?]

@section choosing variable names

start with letter ... number, ... underscore ... ... [FIXME: is there a maximum length ? ] ...

Normal programs cannot use the C keywords for identifiers (variable names, function names, and user-defined type names).

I also highly recommend that you do not use these other special reserved words for identifiers:

@table @asis
@item Words that start with underscore
@item C keywords
[FIXME]
@item C++ keywords
asm catch class delete friend inline new operator private protected public template try this virtual throw
@end table

@section Type Conversion

@section Automatic type conversion

@section Type casting

@section Quantization Errors

@node pointers, Creating New Data Types, variables, Top
@chapter pointers

@section the pointer type

A pointer represents the address of a block of memory, together with the data type of the block. Pointers have several uses:

@itemize @bullet
@item
Pointers represent character strings. [FIXME: Is this confusing ?
Is this a good pedagogical viewpoint --- that character strings are directly related to pointers, rather than merely being of type @code{char []} ?] @item A subroutine can be told where to store its output-value by giving it a pointer to the desired place. @xref{Address}, for an example of this use. @item A subroutine can be told which function to call by giving it a pointer to the desired function. @xref{@code{quicksort()}} for an example of this use. @xref{Function Pointers}. @item Trees and linked lists can be created by storing pointers to blocks of data into other blocks of data. @xref{Lists}, for an example of this use. @end itemize @section declaring pointers @cindex pointer types @cindex pointer declarations In C, every expression must have a single clearly defined data type. This includes an expression to refer to the contents of a pointer. C determines the type of the contents by the type of the pointer. Therefore, C has many types of pointers --- one for each type of contents. Each C data type @var{t} has a corresponding pointer type, the type of pointers-to-@var{t}. A value of type pointer-to-@var{t} describes the address of a block of memory whose contents have type @var{t}. To declare a variable @var{v} to have type pointer-to-@var{t}, pretend you are declaring @code{* @var{v}} to have type @var{t}. (This isn't much of a pretense, because @code{* @var{v}} will be an expression of type @var{t}.) @xref{Declarations}. For example, to declare @code{p} as a pointer to a @code{char}, write: @example char* p; @end example [FIXME: Can I delete this paragraph ? Does it say anything that hasn't already been said, and better, by the previous few paragraphs ?] A pointer type is a derived type, and cannot be the basic type of a declaration. To declare a variable with pointer type, you must also specify: ``To what type of thing does this variable point ?''. 
To declare @var{v} with type pointer-to-@var{t}, one must declare the complex declarator @code{* @var{v}} to have base type @var{t}. For example, [/FIXME]

@example
char* string;
@end example

@noindent
declares @code{string} to be a pointer to @code{char}. Here the declarator is @code{* string} --- a complex declarator that expresses the relationship between @code{string}'s type and the declaration's basic type (@code{char}).

To express pointers to types that are not themselves basic, the @code{* @var{var}} construct is nested within other declarator constructs. For example, a pointer to a pointer-to-@code{char} is declared as follows:

@example
char (*(* stringptr));
@end example

In this case, the parentheses are optional. This is exactly equivalent to

@example
char** stringptr;
@end example

A pointer-to-a-pointer is commonly called a @dfn{handle}; in this case, we have a ``handle-to-a-@code{char}''.

If you want a variable named @code{funcptr} to point to a function taking two @code{double} arguments and returning @code{int}, write:

@example
int (*funcptr)(double, double); /* funcptr is a pointer variable */
@end example

@noindent
Here parentheses are required around @code{*funcptr} to specify that the @code{*@var{var}} construct is nested within the function-type construct. [FIXME: David still doesn't know how to parenthesize arbitrary type declarations ... is there a simple rule ?] If you had written

@example
int* funcptr(double, double); /* function prototype */
@end example

@noindent
the compiler would think that you were declaring the function prototype

@example
int* (funcptr(double, double)); /* identical function prototype */
@end example

@noindent
a function whose value is a pointer to an @code{int}. @xref{Precedence}.

You can add an initialization to a pointer declarator for static and automatic variables [FIXME: what other kind of variables is there ?].
For example,

@example
char* string = "Hello";
char **stringptr = &string;
int (*funcptr) (double, double) = &double_divide_and_round;
@end example

@noindent
Note that the initializer is added after the entire declarator, but the value of the initializer must have the same type as the variable being declared --- @emph{not} the basic type of the declaration.

[@var{initializer} is not an English word, but a special term for talking about C programs.]

@subsection The generic pointer type @code{void *}

The type @code{void *} is used, by convention, for the address of a block of memory to which no particular type is ascribed.

For example, dynamic memory allocation functions typically return this type. If a dynamic allocation function is intended for general use, then there is no telling what type of data the caller wants to allocate --- any C data type is possible --- so there is no reason to prefer any one type for the function to return. But the value must have @emph{some} type. @code{void *} is a noncommittal choice.

A pointer of type @code{void *} has no ``contents''; you cannot apply the @samp{*} operator to it. However, you can cast it to any other pointer type, and @emph{then} apply the @samp{*} operator. For example, the following is valid:

@example
char c;
int i;
struct foo s;
void * x;

x = malloc( sizeof(struct foo) );
c = * (char *) x;
i = * (int *) x;
s = * (struct foo *) x;
@end example

[FIXME: Is this really valid ? I've seen some mainframe operating systems, if you try to read data out of a uninitialized block, will core dump your program.]
[FIXME: perhaps a more useful example would be better here.]

@noindent
Here the block of memory that @code{x} points to is examined first as a @code{char}, then as an @code{int}, and finally as a @code{struct foo}.

@code{void *} pointers may not be added or subtracted, but they may be compared like any other pointers.

@section where do pointer values come from ?
Pointer values arise in three ways:

@itemize @bullet
@item
The address operator @samp{&} can make a pointer to any variable, function, array element or structure element. (Even variables of the user-defined data types discussed in the next chapter.)

@item
Dynamic storage allocation reports its results as a pointer to the memory that was allocated.

@item
A null pointer can be made by converting zero (@code{false}) to a pointer type.
@end itemize

@subsection Address of a Variable
@kindex & (unary)
@cindex address

The unary operator @samp{&} returns the @dfn{address} of a variable (or other lvalue). The contents of this pointer are that variable. [FIXME: Does this sentence make sense ? or is this redundant from our discussion of @code{*} ?]. @samp{&} can be applied to both local and global variables.

For example, suppose that @code{read_two()} is a function that reads two integers from an input file. A function can return only one value, so the most convenient way to get two integers back from @code{read_two()} is to provide two pointers as arguments, saying where to put the integers. Then, if we want the integers to be stored in the variables @code{i1} and @code{i2}, we can write:

@example
read_two(&i1, &i2);
@end example

We would use the following declaration for @code{read_two()} (for info on @code{void}, @pxref{Void Functions}):

@example
void read_two(int* i1, int* i2);
@end example

@samp{&} is not limited to variables. It can also be used with structure, union and array elements. For example, suppose that @code{a} is an array of @code{MAX_INTS} integers and we want to fill it up with pairs read with @code{read_two()}. The following code will work:

@example
int a[MAX_INTS];
int i;

for(i = 0; i + 1 < MAX_INTS; i += 2)@{
  read_two(&a[i], &a[i + 1]);
@}
@end example

@code{&@var{a}[@var{i}]} means a pointer to element number @var{i} in array @var{a}.
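To make the calling convention concrete, here is one way a function like @code{read_two()} might be written. This is only a sketch: the text does not show the real function, so the version below, named @code{read_two_from}, is hypothetical. It parses its two integers from a string with the standard function @code{sscanf} instead of reading a file, which keeps the example self-contained, and it returns a success flag, which the @code{read_two()} declared above does not.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of read_two(): stores two integers parsed from
   text through the pointers i1 and i2.  Returns 1 on success, 0 if
   two integers could not be read (the targets may then be partly set). */
int read_two_from(const char *text, int *i1, int *i2)
{
    assert(text != NULL && i1 != NULL && i2 != NULL);

    /* sscanf reports how many items it converted and stored. */
    return sscanf(text, "%d %d", i1, i2) == 2;
}
```

A caller passes addresses just as in the text, for example @code{read_two_from("3 4", &a[0], &a[1])}, after which @code{a[0]} is 3 and @code{a[1]} is 4.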
@subsection Dynamic Allocation (malloc and free)

@dfn{Dynamic allocation} means obtaining a block of memory which is allocated during the execution of the program. When memory is allocated dynamically, its size need not be known in advance. For example, you can write functions to operate on strings with no fixed upper limit on the size of the string.

A dynamically allocated block of memory cannot have a variable name in the ordinary sense. The only way to refer to it is with a pointer.

In the following examples we use @code{malloc}, which is a standard library function for dynamic allocation. It is documented elsewhere (see ...[FIXME]). For now it is enough to know that the argument to @code{malloc} is the number of @code{char}s of storage desired, and its value is a @code{void *} pointer to the block that was allocated (@pxref{Void Pointers}).

For example, suppose we want a character string, but we don't know until run time how long it needs to be. Once our program discovers it needs @code{size} characters, it can allocate the character string dynamically with

@example
string = (char *) malloc (size + 1);
@dots{}
free(string);
@end example

@noindent
where a cast is used to convert the pointer to the correct data type.

A very common error known as a ``memory leak'' happens when you repeatedly ask for more memory, but ``forget'' to give it back when you are done with it. This causes blocks of memory that you no longer need to steadily build up. When the program ends, these blocks are returned to the system; but if your program runs for a long time, eventually there may be no memory left.

If there is not enough memory left to fulfill your request (either your program or other programs in the system have already used it all up), then @code{malloc()} returns a null pointer.

C++ supplements @code{malloc()} and @code{free()} with the operators @code{new} and @code{delete}, which are usually easier to use.
@example
string = new char[size+1]; // This only works in C++
@dots{}
delete[] string;
@end example

@subsection Null Pointers

A pointer of any type may have the null value. Whenever a pointer happens to have the null value, we call it a ``@dfn{null pointer}''.

The purpose of a null pointer is to be a distinguishable value that you can put in a pointer variable to say, ``As of now, this does not point anywhere.''

To create a null pointer, cast the integer zero to the pointer type that you want. For example, @code{(char *) 0} is an expression for a null pointer to a @code{char}. @code{0} is automatically cast to a pointer of the correct type when it is assigned to a pointer variable or compared with a pointer value.

A null pointer has no contents. If a pointer used as the operand of the @samp{*} operator is null, it is an error. On some machines, the results are unpredictable; on others, the result is inevitably a fatal signal (the program will core dump).

If a pointer value may be null, you should check whether this is so before attempting to use its contents. The way to do this is to compare against a null pointer expression or the integer zero. For example,

@example
#include <stdio.h>

void safe_contents(char* p)
@{
  if(0 == p)@{ /* The compiler automatically casts this `0' to a `(char *)0' */
    printf("this is a null pointer.\n");
  @}else@{
    printf("this pointer points somewhere - it points to \"%s\".\n", p);
  @}
@}

int main(void)
@{
  char text[] = "TEST"; /* a writable array; a string literal must not be modified */
  char * x = text;
  safe_contents(x);
  x[0] = 0;
  safe_contents(x);
  x = 0;
  safe_contents(x);
  return 0;
@}
@end example

@noindent
causes this to be printed:

@example
this pointer points somewhere - it points to "TEST".
this pointer points somewhere - it points to "".
this is a null pointer.
@end example

@section what do I do with pointer values once I have them ?

@subsection dereferencing (*)
@cindex contents
@kindex * (unary)

Most of the time, a pointer will actually point to a memory block.
We call the contents of that memory block the @dfn{contents of the pointer}, for short. To get the contents of a pointer, apply the unary @samp{*} operator to the pointer value.

Another operator that is used with pointers to structures is @samp{->}. It takes one structure element of the contents when the contents are a structure. @xref{Structure Pointers}.

... illegal/undefined when the pointer is not pointing at a ``real'' block ... can cause core dump ... most random values, as well as the null value ... ...

@subsection pointers and strings

@subsection pointer arithmetic
@cindex addition (pointer)
@cindex subtraction (pointer)

Two arithmetic operations are defined on pointer types: addition and subtraction. Not all pointer data types support them: pointers to @code{void} do not, and pointers to functions do not. But all other pointer types do.

Addition and subtraction on pointers can also be done with the modifying assignment operators (@pxref{Modify}) and the increment/decrement operators (@pxref{Increment}).

[FIXME: should we mention the type @code{size_t} here ?]

@table @code
@item @var{p} + @var{i}
@itemx @var{i} + @var{p}
The result of adding a pointer @var{p} and an integer @var{i} is a pointer of the same type as @var{p}, but advanced from @var{p} by @var{i} objects --- by @var{i} times the length of the object that @var{p} points to.

This means that if @var{p} points to an element of an array, @code{@var{p}+@var{i}} points @var{i} elements later. Thus,

@example
&a[3] + 2
@end example

@noindent
is equivalent to @code{&a[5]}; it takes the address of the element at index 3 and then advances it by two elements' worth. This is true whether the elements are @code{char}'s or @code{double}'s or large structures. In fact, @code{&a[@var{i}]} is equivalent to @code{&a[0] + @var{i}}.

@item @var{p} - @var{i}
Subtracting an integer from a pointer is really nothing new. This expression is equivalent to @code{@var{p} + (- @var{i})}.

@item @var{p1} - @var{p2}
Subtraction is also allowed between two pointers of the same type. The result (an integer) tells how far apart the two pointers lie, measured in units of the objects pointed to. For example, @example &a[5] - &a[3] @end example @noindent is invariably 2. (Note that these pointers may be hundreds of bytes apart if @code{a[]} is a large structure type). The compiler subtracts the addresses, then divides the result by the size of the objects to which they point. The subtraction is legitimate only if this division comes out even; the result is not considered well defined otherwise. When the subtraction is well defined, the result can be added to @var{p2} to give back @var{p1}. @item @var{p}[@var{i}] The array indexing operator, @code{[]}, can be used with a pointer in place of an array. In effect, it regards the pointer as pointing to the first element of an array, and fetches the contents of the @var{i}th element. This expression is equivalent to @example *(@var{p} + @var{i}) @end example @end table @subsection Comparison of Pointers All of the comparison operators can be used on two pointer values of the same type (@pxref{Comparison}). The integer zero may also be used as one of the operands. Zero is converted automatically to a null pointer of the same type as the other operand. @samp{==} and @samp{!=} test whether two pointer values are identical (point to the same place). The order-comparisons @samp{>}, @samp{<}, @samp{>=} and @samp{<=} test pointers according to the order in memory of the places they point to. Smaller addresses are considered ``less''. [FIXME: I (DAV) used a C compiler that put the 20 address bits of its machine into 3 bytes, but @code{int} was merely 16 - does this make the following statement wrong/non-compliant, or was my compiler merely non-compliant ? What about the type @code{size_t} ?] 
Comparing two pointers gives the same result as casting them both to @samp{int} (on some machines) or @samp{unsigned int} (on other machines) and comparing the integers. @xref{Pointer-Integer}.

@subsection pointers, structures and lists

@subsection passing values between functions by pointers

@subsection pointers to functions

@subsection Pointer-Integer Conversion

A cast (@pxref{Casts}) can convert an integer value to a pointer value, or a pointer value to an integer value. The ANSI C standard does not specify exactly what this conversion means.

GNU C keeps the same bit pattern when it converts. As a consequence, the conversion takes no time to execute. Another consequence is that the result of converting any pointer to an integer is the difference in bytes between that pointer and a null pointer. In fact, for a pointer to a @code{char}, converting to @code{int} is the same as subtracting a null pointer.

In GNU C, converting a pointer to an integer [FIXME: what kind of integer ? surely not a @code{short int} ?] and then back to a pointer produces a value equal to the original pointer. The same is true if an integer is converted to a pointer and then back to an integer.

@section Trivia

@node Creating New Data Types, Arithmetic and Bitwise Operators, pointers, Top
@chapter Creating New Data Types

@section Arrays

@subsection declaring and initializing arrays
@cindex array
@cindex index

An @dfn{array} is a sequence of elements, all of the same type (the ``element type''). An individual element is identified by its sequence number (called its @dfn{index}).

An array type is a derived type, and cannot be the basic type of a declaration. To declare a variable with array type, you must always specify: ``What type of things are in this array ?''. You must usually also specify ``How many things are in this array ?'' (the ``@var{length}'' of the array, occasionally called the ``size'' of the array).
To declare an array @var{a} with @var{length} elements of type @var{t}, one must declare the complex declarator @code{@var{a}[@var{length}]} to have type @var{t}. For example,

@example
char buffer[5];
@end example

@noindent
declares an array of 5 @code{char} variables and names the array @code{buffer}. Here the declarator is @code{buffer[5]} --- a complex declarator that expresses the relationship between @code{buffer}'s type and the declaration's basic type (@code{char}).

The length of an array type must be an integer. The ANSI C standard requires the length of an array type to be a positive constant known at compile time. GNU C also allows zero. GNU C also allows the length of an array of storage class @code{auto} to be any expression, which is recomputed each time space for the array is allocated. (If the length is negative, the results are undefined.)

The length of the array may be omitted if an initializer is present because the number of elements in the initializer shows how big the array must be. The length of the array may also be omitted for an external variable. The length of the array may also be omitted in function prototypes:

@example
float average_foot_smelliness( int number_of_feet,
                               float foot_smelliness[] );
@end example

@noindent
Unfortunately, only the length of the @emph{first} dimension of a multidimensional array may be omitted in a function prototype - all the other dimensions must be explicitly set in the function prototype. This makes it impossible to write a function to directly accept a 2D array of arbitrary size. There are various (incompatible) tricks to work around this inadequacy. [FIXME: should I mention a few ?]

[FIXME: Is there any difference between `initialization' v. `initializer' ?]

You can add an initialization to an array declarator for static and automatic arrays. The initializer for an array consists of a pair of braces surrounding a sequence of element expressions.
The first item in the sequence initializes @code{array[0]}, the next initializes @code{array[1]}, etc. Once we run out of element expressions, the rest of the array is initialized to zero. For example,

@example
char * table[3] = @{"small", "medium", "large"@};
int values[3] = @{2, 20, 8192@};
int state[3] = @{0@}; /* zero out the entire array */
@end example

In strict ANSI standard C, the elements of an array initializer must be compile-time constant expressions. GNU C allows arbitrary expressions to initialize elements of automatic arrays; for a static array, since the initialization is done when the program is loaded, the value must still be constant.

Array types in C are unusual because no expression can have an array type. Array types are used only for declaring arrays (variables of array type). Functions cannot be declared to return any array type. Whenever an array variable name appears as an expression, it is immediately converted to a pointer. That pointer points to the first element of the array. Even indexing works this way.

(The @var{length} of an array is also called the size of the array).

@subsection working with arrays

Referring to an element by its index is called @dfn{indexing}. In C, indexing is represented with square brackets, as in @code{buffer[2]}.

In C, indices always count from zero. The previously defined @code{buffer} contains 5 elements, but 5 would not be a valid index. Any attempt to read or write to @code{buffer[5]} may cause a core dump. The only valid indices to this buffer are 0, 1, 2, 3 and 4 --- in other words, we can now read and write to @code{buffer[0]}, @code{buffer[1]}, @code{buffer[2]}, @code{buffer[3]}, and @code{buffer[4]}.

To express arrays of types that are not themselves basic, the @code{@var{var}[@var{length}]} construct is nested within other declarator constructs.
For example, an array of pointers-to-@code{char} is declared as follows:

@example
char (*(stringptr[512]));
@end example

@noindent
or more simply

@example
char * stringptr[512];
@end example

Similarly,

@example
char *strings[5];
@end example

@noindent
declares @code{strings} as an array of 5 elements, each of which is a @code{char *}. We declare @code{strings[5]} as a pointer to a @code{char}, and that in turn is done by declaring the complex declarator @code{*strings[5]} as a @code{char}.

And this declares @code{matrix} as an array of 9 arrays of 10 @code{int}'s.

@example
int matrix[9][10];
@end example

@noindent
Here we pretend to declare @code{matrix[9]} as an array-of-10-@code{int}'s, so @code{matrix} itself must be an array of 9 of those. (As an expression, @code{matrix[0]} would be the first subarray, and @code{matrix[0][9]} would be the last @code{int} in that subarray.)

The length of an array may be omitted when you declare an initialized variable, because then it can be determined from the initializer. @xref{Initializers}.

@section Indexing
@cindex indexing

@dfn{Indexing} an array means referring to one element by specifying its index. In C, indexing is represented with square brackets.

@table @code
@item @var{array}[@var{index}]
This expression represents the value of the @var{index}th element of @var{array}. It is an lvalue; that is to say, it may appear on the left side of an assignment. That is how values are stored in array elements.
@end table

Using @var{array} in an expression converts it immediately to a pointer to the first element of the array. The indexing operation actually operates on this pointer. It can equally well operate on any pointer. It is equivalent to @code{*(@var{array} + @var{index})}.

From this equivalent form, we see that indexing is a symmetrical operation. It follows that you can just as well write @code{@var{index}[@var{array}]}.
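As a small sketch of the equivalence just described, the following function adds up the same elements three times, once with each spelling of the indexing operation. The function name @code{sum_three_ways} is hypothetical and is used only for this illustration.

```c
#include <assert.h>

/* Sums n ints starting at p, using three equivalent spellings of indexing. */
int sum_three_ways(const int *p, int n)
{
    int i, s1 = 0, s2 = 0, s3 = 0;

    for (i = 0; i < n; i++) {
        s1 += p[i];        /* the usual form                    */
        s2 += *(p + i);    /* what indexing actually means      */
        s3 += i[p];        /* legal, because + is commutative   */
    }
    assert(s1 == s2 && s2 == s3);
    return s1;
}
```

Because an array name converts to a pointer to its first element, the function can be called either with an array, as in @code{sum_three_ways(values, 3)}, or with a pointer into the middle of one, as in @code{sum_three_ways(values + 1, 2)}. The form @code{i[p]} is shown only to make the symmetry visible; it is rarely a good idea in real programs.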
In other languages, array indexing may check that the index is within the valid range for the array that is in use. In C, this is impossible because the indexing operation actually operates on a pointer to the first array element. This pointer carries no information about the length of the array.

Indices that are nominally out of range are often useful. For example, when indexing a pointer that is not an array, negative indices may be useful. If @var{p} is a pointer to an element in the middle of an array, @code{@var{p}[0]} is that element, @code{@var{p}[1]} is the following element, and @code{@var{p}[-1]} is the previous element.

Indexing by a value that appears ``too large'' is useful also. Often it is necessary to allocate arrays dynamically. Standard C does not define array types with varying length, so the usual practice is to declare the array with length 1 but actually allocate space for as many elements as are needed. It's the programmer's responsibility to keep track of how many elements were actually allocated. Then any index less than that number is valid in fact, even though it exceeds the nominal length with which the array was declared.

@subsection Multi-dimensional arrays

@subsection Trivia

Multi-dimensional arrays are not very easy to use in C. Most people who need them re-implement them ...

The ``element type'' is the data type of all the elements of the array. In C, the ``element type'' of an array may be any type except for function types and @code{void}. For example, arrays of arrays are allowed, and so are arrays of structures and arrays of pointers. Arrays of pointers to functions are sometimes useful.

@section Characters and Strings

@subsection initializing strings

@subsection Null termination

@subsection working with strings

@subsection Trivia

@section Structures
@cindex structure
@cindex element
@cindex member
@cindex field

@subsection Structures @samp{struct}
@comment - didn't I already say this elsewhere ?
A @dfn{structure} is a data object containing several sub-objects, each of a specified name and type. They need not all have the same data type. The sub-objects are called @dfn{elements}, @dfn{members} or @dfn{fields} of the structure.

We also use the term ``element'' for a sub-object of an array. We use the term ``member'' (and ``field'') only to indicate a sub-object of a structure. In an array, a numeric index selects an element. In a structure, a name selects an element.

[FIXME: is ``member'' always an exact synonym for ``field'' ?]

[FIXME: is there a special term that always indicates a sub-object of an array, a term that never indicates a sub-object of a structure ?]

@subsection defining structures
@kindex struct

In C, each kind of structure is a distinct data type and is distinguished by a name called the @dfn{structure tag}. You must define each kind of structure, specifying its structure tag name and the names and types of all the fields. Here is an example:

@example
struct fontunit@{
  char code;
  int height, width, kern;
  int * bitmap;
@};
@end example

@noindent
This defines a structure type that might be used to record the information about one character in a font. The structure tag name is @code{fontunit}. The structure contains five fields: one of type @code{char} named @code{code}; three of type @code{int} named @code{height}, @code{width}, and @code{kern}; and one of type @code{int *} named @code{bitmap}.

Once this type is defined, @code{struct fontunit} behaves as the name of a data type, much like @code{int}. So it can be used to declare variables.

@subsection declaring structure variables

For example

@example
struct fontunit temp;
struct fontunit *nextunit;
@end example

@noindent
declares @code{temp} to be a structure of this type. We say that @code{temp} is ``a @code{struct fontunit}''. This means that @code{temp} is allocated a block of memory that has enough room for all five fields, one after the next.
By contrast, @code{nextunit} is declared as a pointer to a @code{struct fontunit} (@pxref{Pointers}). @code{nextunit} is allocated a block of memory that has enough room for a single pointer.

@subsection Structure Forward References
@cindex forward reference

In fact, it is possible to use the type @code{struct fontunit} for some declarations even before it is defined. Before its definition, the amount of memory space needed to hold it is not known. So you are not allowed to define variables or structure fields of that type. But you can define @emph{pointers} to that type. For example, the following is legitimate:

@example
struct fontunit *nextunit;

struct fontunit @{
  char code;
  int height, width, kern;
  int *bitmap;
@};
@end example

@noindent
The declaration of @code{nextunit} makes a forward reference to a structure type not as yet defined. After the definition of @code{struct fontunit} is seen, the C compiler fully understands the data type of @code{nextunit}. Until that time, it would be invalid to refer to the contents of @code{nextunit} with @code{*nextunit}. Undefined structure types can validly exist only buried within pointer types.

The forward reference capability is essential for defining recursive pointer-structures. For example,

@example
struct mymove @{
  enum piece_type piece;
  char new_x, new_y;
  struct mymove *alternative;
  struct hismove *next_move;
@};

struct hismove @{
  enum piece_type piece;
  char new_x, new_y;
  struct hismove *alternative;
  struct mymove *next_move;
@};
@end example

@noindent
defines a data structure that might be useful in a game-playing program. Each @code{struct mymove} represents a move that the player might make; it belongs to a chain of alternative moves. It also points to the beginning of a chain of possible moves for the opponent, a chain of @code{struct hismove} structures, one for each move the opponent might then make.
And each @code{struct hismove} structure points to another chain of @code{struct mymove} structures describing the possible responses for the player.

Clearly these two structures could not be defined without a forward reference. But even the @code{struct mymove *alternative;} in the definition of @code{struct mymove} counts as a forward reference.

@subsection Anonymous Structure Types

It is possible to define a structure type that has no structure tag name. This is an anonymous structure type. Because it is impossible to refer to the type again, the definition of the type must appear in a declaration of one or more variables. The variables declared therein are the only ones that can have this anonymous type. For example,

@example
struct @{ int i; double d; @} struc1, struc2;
@end example

@noindent
declares each of the variables @code{struc1} and @code{struc2} to contain an @code{int} and a @code{double}.

This feature in its simplest form is not useful; you could just as well define each field as a separate variable. But in more complex usage it may be useful. For example, it is possible to copy @code{struc1} into @code{struc2} with a single assignment expression. Individual variables for the fields could not be copied as a group in this way.

Also, an array of anonymous structures may be useful. For example,

@example
struct @{ int i; double d; @} a[10];
@end example

@noindent
defines an array of 10 @code{int}-@code{double} pairs.

The analogous feature for unions is very useful. @xref{Anonymous Unions}.

@subsection Structure Redefinition and Scope

Structure tag names obey the same scoping rule as variable names do (@pxref{Scoping}). Each function definition, and each compound statement, forms a scope. The entire source file also forms a scope. A structure tag is in effect only during the innermost scope that contains the structure type definition.
For example, if you define a structure tag name within a function definition, the tag name is defined only within that function. Another structure of the same name could be defined in the next function with no conflict.

Structure tag names and variable names are completely independent. For example, you can have a structure named @code{foo} and a variable, function or type named @code{foo} with no interference. This is actually a common thing to do.

However, structure tags, union tags and enum tags share one name space. Thus, you may not have @code{struct foo} and @code{union foo} defined at the same time in one scope. An attempt to do this will elicit an error message.

@subsection Shadowing Structure Tags
@cindex shadowing

It is invalid to define the same structure tag name twice in one scoping level. But a name defined in an outer scope can be temporarily redefined for an inner scope. This is called @dfn{shadowing} the name's outer definition.

For example, you can define a structure tag outside of function definitions (a definition whose scope is the whole file) and make an overriding definition of the same name inside a function definition. Within that function, the meaning of the structure tag name is the definition given in the function. After the end of the function, that definition ceases to exist and the tag name has its original meaning again. Here is an example:

@example
struct foo @{ int i, j; @};

double
func(double x)
@{
  struct foo @{ double i, k; @};
  struct foo * ptr;
  @dots{}
  return( ptr->i + ptr->k );
@}
/* @i{the first definition of @code{struct foo} is once again in effect} */
@end example

Shadowing is not usually a good idea. It is clearer to pick distinct names for your structure types. Occasionally it may be useful together with macros: a macro that expands into a compound statement might define a structure type for use within that compound statement. Shadowing makes it possible to do this without interference from the surrounding context.
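The following is a complete variant of the shadowing example that can actually be compiled, offered only as a sketch. It replaces the elided part of @code{func} with a local variable @code{inner} and adds a second function, @code{outer_sum}, to show that the file-scope definition is back in effect afterwards; both of those names are hypothetical additions.

```c
#include <assert.h>

struct foo { int i, j; };              /* file-scope definition */

double func(double x)
{
    struct foo { double i, k; };       /* shadows the outer struct foo */
    struct foo inner;

    /* Inside this function, struct foo holds two doubles. */
    assert(sizeof(struct foo) >= 2 * sizeof(double));

    inner.i = x;
    inner.k = 2.0 * x;
    return inner.i + inner.k;
}

int outer_sum(void)
{
    struct foo outer;                  /* the file-scope definition again */

    outer.i = 3;
    outer.j = 4;                       /* field j exists only in the outer type */
    return outer.i + outer.j;
}
```

Note that @code{outer_sum} could not refer to a field @code{k}, and @code{func} could not refer to a field @code{j}: each function sees only the definition that is in effect in its own scope.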
Because structure tags, union tags and enum tags come from the same name space, you can shadow one kind with another. For example, you can shadow a union tag name with a structure definition:

@example
union converter @{ int i[2]; double d; @};

int
foo ()
@{
  struct converter @{ char* defn; @};
  @dots{}
@}
@end example

@subsection Accessing Structure Elements
@kindex .
@cindex field access

The binary operator @samp{.} refers to a field of a structure. The left operand is an expression whose type must be a structure. The right operand is not an expression. It is the name of one of the fields of that structure. Thus, after the declarations

@example
struct point @{ int x, y; @};
struct point cursor;
struct point * nextpoint = &cursor;
@end example

@noindent
the expression @code{cursor.x} retrieves the @code{x}-field of the structure @code{cursor}. The expression @code{((*nextpoint).x)} retrieves the same value, but we usually abbreviate that as @code{nextpoint->x} (@pxref{Structure Pointers}).

The ``@samp{.} expression'' is an lvalue if the left operand is (@pxref{lvalue}). Being an lvalue means its address can be taken with @samp{&} (@pxref{Address}) and usually that a value can be stored there with an assignment (@pxref{Assignment}).

It is an error to use a left operand whose type is not a structure or union. It is an error to use a field name that does not belong to the particular structure or union type of the left operand.

@subsection Structure Operations

Accessing a field of a structure is not the only way to operate on one. These other operations are also allowed:

@itemize @bullet
@item
Assignment: An entire structure object can be assigned a new value --- the value of another structure of the same type. @xref{Assignment}.

@item
Argument passing: A structure can be passed as an argument to a function. It is essential that the function argument be declared as a structure of the same type. @xref{Calling}.
@item
Returning: A function can be declared to return a structure type. Then a call to that function is an expression of that type.

@item
Address: The address of a structure can be taken with @samp{&} (@pxref{Address}). This address can be used later to access the original structure or its components (@pxref{Structure Pointers}).
@end itemize

There are no constant structure values, and type conversion is not possible for structures.

@subsection Structure Size and Alignment

Each structure type defined has an associated required alignment in memory and a size in bytes.

The alignment required for a structure type is the maximum of the alignments required by the types of the fields of the structure. Each field is also aligned within the structure to its own required alignment. For example, in the structure

@example
struct foo @{ char c; int i; @};
@end example

@noindent
on a machine in which the address of an @code{int} must be a multiple of 4, 3 bytes are unused in between fields @code{c} and @code{i}. If the alignment required for an @code{int} is only 2, just 1 unused byte is needed. In either case, the required alignment of the type @code{struct foo} is the same as that of @code{int} (because that is certainly not less than the required alignment of the other field's type, which is 1 for @code{char}).

The size of the structure is equal to the offset of the last field, plus its size, rounded up to a multiple of the structure's required alignment. For example, in

@example
struct bar @{ int i; char c; @};
@end example

@noindent
the required alignment of @code{struct bar} is the same as that of @code{int}. The total size is thus 4 (the offset of @code{c}) plus 1 (the size of @code{c}), rounded up to a multiple of that alignment. The result is 6 or 8 if the alignment required for an @code{int} is 2 or 4. This means some space is wasted at the end.

[FIXME: this assumes 4 Byte @code{int}s, which is not always true.
Should we qualify this by saying ``on my particular compiler'', generalize to the same level of detail, or just gloss over the whole thing by saying ``padding makes it impossible to know the exact size of a structure'' ?]

You can make a structure smaller by grouping smaller fields together. Consider the following two structure types:

@example
struct a @{ char c1; int i; char c2; @};
struct b @{ char c1; char c2; int i; @};
@end example

@code{struct a} occupies 8 or 12 bytes according to the alignment required by @code{int}, whereas @code{struct b} occupies only 6 or 8. By putting the two @code{char}'s together, @code{struct b} saves an amount equal to the alignment required for an @code{int}.

@subsection Pointers to Structures
@kindex ->

When the type of the contents is a structure type, it is often useful to combine the two operations of taking the contents (a structure) and taking an element of the structure. The binary operator @samp{->} does this.

@table @code
@item @var{ptr}->@var{elementname}
The value of this expression is the element named @var{elementname} in the structure that @var{ptr} points to. @var{ptr} must be an expression whose type is a pointer to a structure type, and that structure type must have an element named @var{elementname}. This expression is equivalent to @code{(*@var{ptr}).@var{elementname}}.
@end table

For example, suppose we represent a complex number as a structure containing a real part and an imaginary part:

@example
struct complex @{ double real; double imag; @};
@end example

Then, given a pointer @var{p} to a complex number, we can calculate the magnitude squared of the complex number as follows:

@example
double
mag_squared(struct complex *p)@{
  return p->real * p->real + p->imag * p->imag;
@}
@end example

@noindent
which is short for

@example
double
mag_squared(struct complex *p)@{
  return( ((*p).real) * ((*p).real) + ((*p).imag) * ((*p).imag) );
@}
@end example

@subsection Lists
@cindex nodes

This example shows how structures and pointers are used to make linked lists.

We define a structure to hold one node of a list of @code{int} values. The list is made of @dfn{nodes}; each node contains one @code{int} value and a pointer to the following node:

@comment 1998-05-27:DAV: replaced the term `link' in the original text with the term `node'.
@c Was the original author just confused, or has terminology really changed over the years ?
@c What does the term ``a link of a linked list'' mean these days ?
@c One of the individual blocks of the list, or a pointer inside that block ?

@example
struct int_list_node @{
  int value;
  struct int_list_node *next;
@};
@end example

What goes in the @code{next} element of the last node? It cannot be a pointer to the following node, because there is no following node. Instead, we store there a @dfn{null pointer}: a pointer value that is recognizably distinct from any possible following node. The presence of a null pointer indicates that the node is the end of the list. @xref{Null Pointers}.

This function @code{int_list_last()}, when given a pointer to a list (as described above), returns a pointer to the last node of the list.
@example
struct int_list_node *
int_list_last (struct int_list_node *node)@{
  while (node->next != 0)@{
    node = node->next;
  @};
  return(node);
@}
@end example

If in the same program we need other kinds of lists --- lists of @code{double} values or lists of strings, perhaps --- a new structure type must be defined for each kind of list. Although the operation of finding the last node is fundamentally the same for each kind of list, a separate function is needed for each kind since each function applies only to one data type. This inconvenience can be remedied with @dfn{unions}. (C++ creates a totally different remedy.)

@subsection Varying-Size Structures

Often it is useful for dynamically allocated structures to end with an array of varying size. C requires each array to have a fixed size, so we cannot officially do this. What we actually do is define the structure with an array of size zero or one, but then allocate extra space.

As an example, we will define a font consisting of a sequence of the @code{struct fontunit} structures previously defined. Each @code{struct fontunit} describes one character in the font. Each font needs a different number of @code{struct fontunit} units, according to how many characters are defined. The data structure of the font must contain these units and must also say explicitly how many units there are.
Here is how it is done:

@example
struct fontunit @{
  char code;
  int height, width, kern;
  int *bitmap;
@};

struct font @{
  int length;
  struct fontunit contents[0];
@};
@end example

A font containing @var{x} units can then be allocated with

@example
struct font *
allocate_font (int x)
@{
  int nbytes = (sizeof (struct font) + x * sizeof (struct fontunit));
  struct font *thisfont;
  thisfont = (struct font *) malloc (nbytes);
  if(thisfont == 0)@{
    fatal("virtual memory exceeded");
  @}else@{
    thisfont->length = x;
  @};
  return( thisfont );
@}
@end example

@noindent
This example shows how to calculate the size required from the number of elements; it also illustrates the technique for checking that @code{malloc} succeeded.

The length used to allocate the font is stored in the font's @code{length} field. That way, when the font is accessed later, it is possible to tell how many elements there actually are. For example, this function finds the element of @code{font} whose @code{code} field matches @code{thischar}, and returns a pointer to that element. If there is no such element, this function returns a null pointer (because zero converts automatically to a null pointer; @pxref{Null Pointer}).

@smallexample
struct fontunit *
font_find_char(struct font *font, char thischar)
@{
  /* Point just past the last element that exists */
  struct fontunit *end = font->contents + font->length;
  struct fontunit *nextunit;
  /* Look at each element; stop when past the last.*/
  for(nextunit = font->contents; nextunit != end; nextunit++)@{
    if(nextunit->code == thischar)@{
      return nextunit;
    @};
  @};
  return 0;
@}
@end smallexample

@noindent
Note that @code{font->contents} refers to the field @code{contents}. Since that is an array, it is immediately converted to a pointer to its first element. The array officially has no elements, but that is no problem: The pointer points to where the first element would be if there were one.
In fact, there really are elements --- dynamically allocated elements --- and that is exactly where the first one is.

ANSI Standard C does not allow a zero-length array. If code is to operate on other C implementations, the @code{contents} field must be given the length 1 and the allocation code must be changed to match. The change is in the computation of @code{nbytes}. This is the result:

@example
struct font @{
  int length;
  struct fontunit contents[1];
@};

struct font *
allocate_font (int x)
@{
  int nbytes = (sizeof (struct font) + (x - 1) * sizeof (struct fontunit));
  struct font *thisfont;
  thisfont = (struct font *) malloc (nbytes);
  if(thisfont == 0)@{
    fatal("virtual memory exceeded");
  @}else@{
    thisfont->length = x;
  @};
  return thisfont;
@}
@end example

@subsection Bit Fields
@cindex bit field

A @dfn{bit field} is a structure field that is not a full byte or word. You can specify exactly how many bits long it should be. Bit fields allow you to pack information tightly into a small space. They are also useful for describing the pattern of data in a hardware register.

A bit field is defined like any other structure field except that a colon and a bit-width follow the field name. For example, this is a structure, designed for a 16-bit @code{int} compiler, that breaks a 32-bit word down into 8 four-bit fields:

@example
struct half_bytes @{
  unsigned int a : 4, b : 4, c : 4, d : 4;
  unsigned int e : 4, f : 4, g : 4, h : 4;
@};
@end example

@noindent
You might think that this particular application calls for an array of four-bit elements, but unfortunately there is no such thing in the C language. Bit fields in C exist only as structure fields.

Pointers in C can point only to bytes or multi-byte objects. A bit field is not usually composed of entire bytes, so in C pointers to bit fields are not allowed. Use of the address operator @samp{&} on a bit field causes an error message (@pxref{Address}).
However, a bit field can be an lvalue for assignment purposes just like any other structure field (@pxref{Lvalue}).

@subsection Data Types of Bit Fields

The data type of a bit field must be an integer type or an @code{enum} type. An integer type may be signed or unsigned. This choice makes a big difference. A signed bit field of @var{n} bits has a range of values from @minus{}2^(@var{n}@minus{}1) to 2^(@var{n}@minus{}1) @minus{} 1. An unsigned one of the same number of bits ranges from zero to 2^@var{n} @minus{} 1. For example, an unsigned bit field of 1 bit can be 0 or 1, but a signed one-bit field can only be 0 or @minus{}1.

If an @code{enum} type is used, it is treated as unsigned.

The number of bits may not be longer than the word size; that is, the bit field may not be bigger than an @code{int}.

@subsection Bit Field Machine Dependence

Exactly how the fields are packed into bytes depends on the machine. On machines where the least significant byte of a word is the lowest-numbered, fields are packed in starting from the least significant bit. If the most significant byte is the lowest-numbered, fields are packed in starting from the most significant bit. Thus, the first field in a sequence of consecutive fields always goes into the next available byte.

On some machines, fields are freely split across word boundaries. On others, this is not allowed; then if the next field is too big to fit in what remains of the current word, it starts in the following word.

@subsection Bit Field Gaps

[FIXME: Is this true ?]

You can leave a gap of a specified number of bits by defining a field with the desired size and no name. For example,

@example
struct foo @{
  unsigned int x : 5;
  unsigned int y : 5;
  unsigned int : 3;
  unsigned int z : 3;
@};
@end example

@noindent
gives 5 bits to @code{x}, 5 to @code{y}, skips the next 3, and gives 3 bits to @code{z}. The total is 16 bits, or two bytes.

A nameless field with ``size'' zero forces the next field to start at the beginning of a word.
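The ranges described above for unsigned bit fields can be checked with a short sketch. It reuses the @code{struct foo} layout from the gap example; the helper functions @code{store_in_x} and @code{store_in_z} are hypothetical and exist only for this illustration. Storing a value that is too large for an unsigned bit field keeps only the low-order bits, that is, the value modulo 2^@var{n}.

```c
#include <assert.h>

struct foo {
    unsigned int x : 5;
    unsigned int y : 5;
    unsigned int   : 3;    /* unnamed gap of 3 bits */
    unsigned int z : 3;
};

/* Stores value into the 5-bit field x and returns what was kept. */
unsigned int store_in_x(unsigned int value)
{
    struct foo f = {0};

    f.x = value;                    /* reduced modulo 2^5 == 32 */
    return f.x;
}

/* Stores value into the 3-bit field z and returns what was kept. */
unsigned int store_in_z(unsigned int value)
{
    struct foo f = {0};

    f.z = value;                    /* reduced modulo 2^3 == 8 */
    assert(f.x == 0 && f.y == 0);   /* the other fields are untouched */
    return f.z;
}
```

Only the unsigned case is shown because its wrap-around behaviour is the same on every implementation. What happens when an out-of-range value is stored in a signed bit field depends on the compiler, as does the actual value of @code{sizeof (struct foo)}.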
@subsection trivia

The definition of the structure also serves as the name of a type. So you can declare variables of that type at the same time as the type is defined. For example, it is legitimate to write

@example
struct fontunit @{
  char code;
  int height, width, kern;
  int *bitmap;
@} *nextunit;
@end example

@noindent
But this is not recommended. If you keep the structure definition separate from variable declarations, it is easier to read.

@subsection Shadowing and Forward References

Shadowing causes problems with forward references. Suppose within the definition of @code{func} above you want to make a forward reference to @code{struct foo} before defining it. A definition of @code{struct foo} is already known, so a declaration such as @code{struct foo *ptr;} would be taken as a use of the existing definition. In order to make a forward reference to the new definition to come, you must first shadow the outer definition with an empty declaration consisting of just @code{struct foo;}.

@example
struct foo @{ int i, j; @};

double
func (double x)
@{
  struct foo;
  struct foo *ptr;
  struct foo @{ double i, k; @};
  @dots{}
  return ptr->i + ptr->k;
@}
@end example

@noindent
Normally, @code{struct foo} would be a name for the existing structure type. However, when it appears in an empty declaration (one that declares no variables) it is given a special meaning. The empty declaration tells the compiler that @code{struct foo} will be redefined in the current scope, and following uses of @code{struct foo} should be taken as forward references to the coming definition.

This ``empty declaration'' feature is supported and described in @code{gcc} because the ANSI C Standard mandates it and you might see programs that use it. Using this feature is a very bad idea.

@section Unions @samp{union}

@subsection Unions
@cindex union
@kindex union

@dfn{Unions} are a kind of type that allow one block of memory to be regarded as any of several other types.
Each union type is defined by specifying the alternative types that are its members.

Unions in C are much like structures. The description of unions here assumes that you understand structures. @xref{Structures}.

@subsection defining unions

A union definition looks like a structure definition except that the keyword @code{union} replaces @code{struct} (@pxref{Structure Def}).

Union tag names and structure tag names come from the same name space. This means that, in any one name scope, one particular name may be the name of either a structure type or a union type, but not both. If you define @code{union hack}, you may not also use @code{struct hack}.

@subsection accessing unions

Union components are accessed using the @samp{.} and @samp{->} operators, just like structure components (@pxref{Structure Ref}). They can be assigned, passed as arguments and returned just like structures (@pxref{Structure Operations}). There are no constant union values, and type conversion is not possible for unions.

@subsection When to use a union

There are only two reasons to ever use a union: (a) to save space, and (b) to interpret a single piece of hardware multiple ways. The ``endian problem'' never happens if you don't use unions.

@subsection Union Members

Here is a sample union definition:

@example
union element
@{
  int i;
  char *s;
  struct window *w;
@};

union element temp;
@end example

@noindent
This union has three members, of three different types. An object of this union type, such as the variable @code{temp}, has enough space to hold either an @code{int}, a @code{char *} or a @code{struct window *}, but not two at once.

The three members of the union variable @code{temp} can be thought of as three variables of different types that are stored in the same space. The value of the union is valid only for the member that was last used to store in it.
For example, if you store an @code{int} into @code{temp.i}, you can refer to @code{temp.i} later to get the same @code{int} value, but @code{temp.s} and @code{temp.w} are invalid and their values are undefined. If you later store a @code{char *} value into @code{temp.s}, you can access @code{temp.s} again to recover the same value, but @code{temp.i} is now undefined.

The size of the union is equal to the largest of the sizes of its members.

Contrast this with a structure that has the same members:

@example
struct elements
@{
  int i;
  char *s;
  struct window *w;
@};
@end example

@noindent
This structure has enough space for an @code{int} @emph{and} two pointers side-by-side. All three can be stored in it independently. The size of this structure is (at least) the sum of the sizes of the members.

@subsection Alternative-use Storage

The example above for list structure (@pxref{Lists}) shows that you need a new structure type for each kind of data you want to put into lists. When you have one type of structure to represent a list of @code{int}'s, you need another structure type for a list of @code{char *} strings, and yet another for a list of @code{struct window *}'s.

What if you want to have one list containing @code{int}'s, @code{char *}'s and @code{struct window *}'s, in any random order? This can be done with the union defined in the previous section. Here is the definition again:

@example
union element
@{
  int i;
  char *s;
  struct window *w;
@};
@end example

Now we can make a list of @code{union element} values just like a list of anything else:

@example
struct alt_list_node
@{
  union element value;
  struct alt_list_node *next;
@};

struct alt_list_node *p;
@end example

If @code{p} points to a node of a list of this kind, you can extract the value as an @code{int} with @code{p->value.i}, or extract it as a @code{struct window *} with @code{p->value.w}. This is because @code{p->value} by itself is a value of type @code{union element}.
But this is not a good solution to the problem. Nothing in the list node tells you whether the value is supposed to be interpreted as an @code{int}, a @code{char *} or a @code{struct window *}. If you refer to the value the wrong way, you will not get an error message, just bizarre results.

This problem can be avoided by adding a @dfn{type-code} field to the node structure, making it a ``self-describing'' structure.
@ifinfo
See the next node.
@end ifinfo

@subsection Unions and Type-code Fields

In the simple list-of-union, it is impossible to tell just by looking at a node whether it contains an @code{int}, a @code{char *} or a @code{struct window *}. So the simple list-of-union structure is useful only when there is some other way for the program to know how each node should be used.

Most of the time, it is better to add -- to every node -- information about how to interpret the node's value. This is done with an additional field in the node structure, called a ``type code'' field because its value informs us of the type of value in the union.

An enumeration type is often just the right thing for this purpose. Here is the modified structure definition:

@example
struct alt_list_node
@{
  enum @{ IS_INT, IS_STRING, IS_WINDOW @} code;
  union element value;
  struct alt_list_node *next;
@};
@end example

Then we establish a convention that when the @code{value} field is properly interpreted as an @code{int}, the value @code{IS_INT} is stored in the @code{code} field, and so on.

The C language does not enforce this convention. It is still possible to disregard the convention and do

@example
node->code = IS_INT;
node->value.s = "foo";
@end example

@noindent
But obeying the convention is not hard, and as long as that is done, the meaning of each element of the list is self-evident.

@subsection Unions for Type Puns

Would you like to know what the bit pattern of a @code{char}-pointer really looks like? Define a union containing types @code{char *} and @code{int} and see.
Here is how:

@example
int
ptr_as_int (char *p)
@{
  union @{ char *p; int i; @} conv;

  conv.p = p;
  return conv.i;
@}
@end example

@noindent
Here the data is loaded into the union variable @code{conv} as a pointer, then examined as an integer.

An example actually used in the GNU C compiler involves storing a @code{double} in a data structure composed of an array of @code{int}s. Two @code{int}'s provide enough room for the bits of the @code{double}, but we need a way to separate it into two words. The following union was used:

@example
union converter
@{
  int i[2];
  double d;
@};
@end example

@noindent
With this union it is possible to take a @code{double} apart and store it into two @code{int}'s, and later reverse the transformation. Here is a function to take a @code{double} apart, storing the two halves into two locations specified by giving pointers to them:

@example
void
dissect_double (double d, int *l, int *h)
@{
  union converter conv;

  conv.d = d;
  *l = conv.i[0];
  *h = conv.i[1];
@}
@end example

Here is how to reassemble the two halves into an identical @code{double}:

@example
double
reconstruct_double (int l, int h)
@{
  union converter conv;

  conv.i[0] = l;
  conv.i[1] = h;
  return conv.d;
@}
@end example

@subsection Union Member Addresses

In general, the members of a union share a common starting address. The address of any member of the union is equal to that of the union (though their types are different, so in order to compare them in C you must cast one to the other's type). For example, in

@example
union test
@{
  int i;
  char c;
@} var;

int
check_it ()
@{
  return ((int *) &var) == (&var.i);
@}
@end example

@noindent
the function @code{check_it} is guaranteed to return 1.

@subsection Run-time Endianness Test

A union of a @code{char} and an @code{int} can be used to tell how the bytes in an @code{int} are numbered on the machine you are using. This example shows how.
@example
#include <stdio.h>

void
endian (void)
@{
  union @{ int i; char c; @} temp;

  temp.i = 0;
  temp.c = 1;   /* sets only the lowest-addressed byte */
  if (temp.i == 1) @{
    printf ("Little-endian\n");
  @} else if (temp.i == 1 << 24) @{
    printf ("32-bit big-endian\n");
  @} else @{
    printf ("Something strange\n");
  @}
@}
@end example

@subsection Unions of Structures

Structure types can be used in unions as any other types can. When this is done, the structure fields are obtained from the union with two stages of the @samp{.} operator. The size of the union is, as always, the maximum of the sizes of the fields.

A common situation is that a union has several members that are different types of structures. Often two of the structure types start with similar fields, as shown here:

@example
struct type1
@{
  int x;
  char b;
  char *name;
  int size;
@};

struct type2
@{
  int x;
  char c;
  char *name;
  char text[100];
@};

union u
@{
  struct type1 t1;
  struct type2 t2;
@};

union u u1;
@end example

Here both @code{struct type1} and @code{struct type2} start with the sequence @code{int}, @code{char}, @code{char *}. (The field names are not the same, but that is not important.) In this case, it is guaranteed that you will see the same values for those three initial fields regardless of whether you access them through @code{struct type1} or @code{struct type2}. In other words, @code{u1.t1.x} and @code{u1.t2.x} are the @emph{same object}; and @code{u1.t1.b} and @code{u1.t2.c} are also the @emph{same object}.

This is a consequence of the fact that the compiler lays out structure fields in the order you write them, and their size and spacing depend only on their data types. If the first @var{n} fields of two structure types match in their types, the layout of those fields must also match.

[FIXME: this next section may be totally bogus]

The code in the previous section can create very confusing source code. Here is an alternate way of specifying exactly the same layout in memory that is far easier to understand. This demonstrates that structures can contain unions.
[FIXME: make the reference over in anonymous structure types point here]

@example
struct type1
@{
  int size;
@};

struct type2
@{
  char text[100];
@};

struct u
@{
  int x;
  char b;
  char *name;
  union
  @{
    struct type1 t1;
    struct type2 t2;
  @};
@};

struct u u1;
@end example

The memory layout of this @code{struct u u1} is identical to the previous @code{union u u1}. The code that uses @code{u1} is slightly simplified. All references to @code{u1.t1.x} or to @code{u1.t2.x} must now be replaced with @code{u1.x}, which makes it obvious that they were really referring to the same thing. Other bits of the code that refer to @code{u1.t1.size} or @code{u1.t2.text} still access the same area of memory they did before.

[end possibly bogus section]

@subsection Trivia

@section enumerated types (enum) Enumeration Types

@section Renaming (typedef)

@section Trivia

@node Arithmetic and Bitwise Operators, bool type, Creating New Data Types, Top
@chapter Arithmetic and Bitwise Operators

@section Arithmetic operators (+ - * / %)
@cindex addition (integer)
@cindex subtraction (integer)
@cindex multiplication (integer)
@cindex division (integer)
@cindex quotient (integer)
@cindex remainder
@cindex common type

The type of the result depends on the types of the operands. First, if either operand has type @code{short} or @code{char} (either signed or unsigned), it is converted to @code{int} by default promotion. Then the @dfn{common type} of the operands is determined. This is either @code{long unsigned int}, @code{long int}, @code{unsigned int} or @code{int}. The common type is long if either operand is long; it is unsigned if either operand is unsigned.

If one operand has an unsigned type and the other has a signed type, the one with the signed type is converted to unsigned and the arithmetic is done on unsigned values. If the signed operand had a negative value, the results may be counterintuitive, because when this value is converted to an unsigned type, it becomes a large positive number.
Small negative numbers become positive numbers near the top of the range of possible values.

For positive numbers, the result of an arithmetic operation is always the same regardless of whether the type of the numbers is signed or unsigned, except when the result is so large that it overflows the range of the type.

@kindex +
@kindex -
@kindex * (binary)
@kindex /
@kindex %
@table @samp
@item @var{intexp} + @var{intexp}
Addition of two integer expressions.

@item @var{intexp} @minus{} @var{intexp}
Subtraction of two integer expressions.

@item @minus{} @var{intexp}
Negation of an integer expression. Equivalent to @code{0 - @var{intexp}}.

@item @var{intexp} * @var{intexp}
Multiplication of two integer expressions.

@item @var{a} / @var{b}
Quotient of two integer expressions. If the exact quotient is not an integer, it is rounded toward zero to make an integer.

If @var{b} is negative, the quotient is minus the result of dividing by @code{-@var{b}}. (The handling of negative operands may be different in other implementations of C.)

If @var{b} is zero, the division operation raises a signal. It is possible to write a handler for this signal, but usually it is more convenient to test whether the divisor is zero before you do the division.

@item @var{a} % @var{b}
Remainder of two integer expressions. The remainder is compatible with the quotient: (@var{a} / @var{b}) * @var{b} + @var{a} % @var{b} is equal to @var{a}.

If @var{b} is zero, the remainder operation raises a signal. It is possible to write a handler for this signal, but usually it is more convenient to test whether the divisor is zero before you do the division.
@end table

@section increment and decrement (++ --)

@section conversion of types (cast)

@section internal representation of numbers in general

@section bitwise operators (& | ^ ~ >> <<)
@cindex bitwise operations
@cindex boolean operations
@cindex logical operations

[FIXME: we need to use terminology that makes it hard to confuse ``bitwise'' (lots of bits all being operated on at once in a single value) vs. ``boolean'' (a value containing a single bit). Perhaps ``bitwise'' vs. ``logical'' ?]

The @dfn{bitwise} operations combine two integers bit by bit. This means that the operands are considered as binary numbers and lined up. The least significant bits (1's bits) of the operands are combined to make the least significant bit of the result; the 2's bits of the operands are combined to make the 2's bit of the result; the 4's bits are combined to make the 4's bit of the result; and so on.

The operands are always treated as unsigned in these operations even if they have signed types. Operands of type @code{short} or @code{char} are extended to @samp{int} before the operation is done, so each operand always has at least as many bits as an @code{int} (32 on many machines).

[FIXME: a picture or some ASCII Art would make this much easier to visualize. Remember that a @code{int} is not always 32 bits; and sometimes a @code{long int} can be used in a bitwise operation - right ?]

Bitwise operations are also called @dfn{boolean} operations because they are modeled on the laws of boolean algebra, and @dfn{logical} operations because ``logical'' is traditionally used for any operation that considers an integer as a sequence of bits. [FIXME: Wrong.]

Although the numbers are considered unsigned in order to perform the operation, the data type of the result is not always unsigned. It follows the same rule used for arithmetic operations: it is long if either operand is long; it is unsigned if either operand is unsigned.

Here are precise definitions of all the bitwise operations.
Bit @var{n} of an unsigned integer @var{a} is @code{(@var{a} >> @var{n}) % 2} (where @samp{>>} stands for right-shift; @pxref{Shifting}). Bit @var{n} of a signed integer is computed by first converting the integer to unsigned.

@kindex & (binary)
@kindex |
@kindex ^
@kindex ~
@table @samp
@item @var{a} & @var{b}
Bitwise logical-and. Bit @var{n} of the result is 1 if bit @var{n} in both operands is 1.

@item @var{a} | @var{b}
Bitwise logical-or. Bit @var{n} of the result is 1 if bit @var{n} in either operand is 1.

@item @var{a} ^ @var{b}
Bitwise logical-exclusive-or. Bit @var{n} of the result is 1 if bit @var{n} is 1 in one of the operands and 0 in the other.

@item ~ @var{a}
Bitwise logical-not. Bit @var{n} of the result is 1 if bit @var{n} of @var{a} is 0.
@end table

@section Shift Operators
@cindex shifting
@kindex <<
@kindex >>

@dfn{Shifting} an integer is defined in terms of the binary representation of the integer. Shifting left means appending binary zeros to the number's representation; this has the effect of multiplying by a power of 2. (If the number is large enough, the most significant digits can be lost by overflow in the process.)

Shifting right means discarding binary digits from the right of the number. This has the effect of dividing by a power of 2 and rounding down (to negative infinity).

The result of shifting right has the same sign as the operand. This means that the same bit-pattern for the operand produces a different result depending on whether it has a signed or unsigned type. The signed integer @minus{}4 and the unsigned integer @code{0xfffffffc} have the same bit pattern, but when shifted right one place they produce the results @minus{}2 and @code{0x7ffffffe}. These two numbers differ in the highest bit.

When applied to unsigned values, the @code{>>} operator uses ``logical'' right shifting --- it brings zeroes into the most significant bits of the result. When applied to signed values, the @code{>>} operator uses ``@dfn{arithmetic}'' right shifting.
This brings zeros into the most significant bits for a positive number, and ones into the most significant bits for a negative number.

@table @code
@item @var{a} << @var{count}
Shift @var{a} left by @var{count} places. The result is undefined if @var{count} is negative or if it is greater than or equal to the number of bits in @var{a} (32 for a 32-bit @code{int}).

@item @var{a} >> @var{count}
Shift @var{a} right by @var{count} places. The result is undefined if @var{count} is negative or if it is greater than or equal to the number of bits in @var{a} (32 for a 32-bit @code{int}).
@end table

Here are some examples of shifting, with the values that result (assuming a 32-bit @code{int}).

@example
1<<0 == 1
1<<5 == 32
1<<31 == 0x80000000
5<<1 == 10
(-5)<<1 == -10
3>>1 == 1
4>>1 == 2
5>>1 == 2
(-3)>>1 == -2 == 0xfffffffe
(-4)>>1 == -2
(-5)>>1 == -3 == 0xfffffffd
((unsigned)-3) >> 1 == 0x7ffffffe
((unsigned)-4) >> 1 == 0x7ffffffe
((unsigned)-5) >> 1 == 0x7ffffffd
@end example

The ANSI C standard does not specify what happens when a negative number is shifted. In GNU C, we have chosen the meaning we think is most useful.

@section Floating Point Arithmetic
@cindex arithmetic (floating)
@cindex common type

The four basic arithmetic operators, @samp{+}, @samp{-}, @samp{*} and @samp{/}, are allowed on floating point operands as well as integer operands. These are the only binary arithmetic operators allowed on floating point operands. The remainder operation (@samp{%}) is not meaningful for floating point operands because division of floating point numbers does not round the result to an integer.

When the result of arithmetic is outside the range of possible values of its type, this is called @dfn{floating point overflow}. The result of the operation is undefined when overflow happens.

When dividing by a negative number @var{b}, the quotient is minus the result of dividing by @minus{}@var{b}.

Division by zero has undefined effects, possibly crashing the program. You should test whether the divisor is zero before dividing.
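
The advice to test the divisor first could be followed along these lines. This is a minimal sketch rather than a recommended interface; the function name @code{safe_divide} and its return convention are hypothetical.

```c
/* Hypothetical helper: divide a by b, storing the quotient in *result.
   Returns 1 on success.  Returns 0 and leaves *result untouched
   if b is zero, so the division is never attempted in that case. */
int safe_divide(double a, double b, double *result)
{
    if (b == 0.0)
        return 0;
    *result = a / b;
    return 1;
}
```

A caller checks the return value before using the quotient. Dividing 6.0 by @minus{}3.0 this way stores @minus{}2.0, which is minus the result of dividing by 3.0.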
When operands of two different floating-point types are combined with an arithmetic operation, the operand of narrower type is converted to the other (wider) operand's type before the operation is performed. The types in order of increasing width are @code{float}, @code{double} and @code{long double}.

Floating point and integer operands may be mixed. When this is done, the integer operand is converted to floating point, in the same type as the other operand; then the arithmetic operation is done in that type.

@section Trivia

@node bool type, expressions, Arithmetic and Bitwise Operators, Top
@chapter working with the bool type (true, false, and logical operators)

@section @code{bool} values

[FIXME: perhaps it would be easier to explain this ``as if'' there were a @code{bool} type - i.e., from the C++ perspective. People who knew nothing of type @code{bool} wrote many C compilers compliant with the ANSI standard. However, many programmers argue that the @code{bool} type is implicit in the C language. A C program compiled on a C++ compiler may create an executable identical to that generated by a C compiler. But the C++ perspective is to say that operators like `<' and `>' return a value of type @code{bool}, and the conditional expression in an if() is cast to a @code{bool}.]

A @code{bool} value is either @code{true} or @code{false}.

A truth value is either ``true'' or ``false''. C does not have a distinct data type for truth values, as some languages do (for example, type ``@code{bool}'' in C++). Instead, any numeric type or pointer type can be used as a truth value. A zero value represents ``false'', and any nonzero value means ``true''.

Most of the time, it is wise to use only type @code{int} for truth values and to use only the value 1 to mean ``true''.
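
One way to follow that advice is to reduce any truth value to exactly 0 or 1 by comparing it against zero. The sketch below is only an illustration; the three helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helpers: each returns an int that is exactly 0 or 1,
   because the result of a comparison is always 0 or 1. */
int truth_of_int(int n)             { return n != 0; }
int truth_of_double(double x)       { return x != 0.0; }
int truth_of_pointer(const void *p) { return p != NULL; }
```

Any nonzero argument, such as @minus{}7 or 0.5, yields 1, and a zero value or a null pointer yields 0.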
Although there is no special type for truth values, there are special operators in C for creating truth values (comparison operators), combining truth values (truth operators) and using them (conditional expressions and conditional statements).

The @var{continue-condition} must have a data type which can be compared against the constant zero, which means an integer zero, a floating point zero, or a null pointer. @xref{branching}. @xref{looping}.

@section Comparison (Relational operators: > >= < <= == !=)
@cindex comparison

Comparison operators test for equality or ordering of either numbers or pointers. The result of a comparison is an @code{int} which is either 0 or 1. Usually this value is used as a truth value.

@table @code
@item @var{a} == @var{b}
@item @var{a} != @var{b}
@item @var{a} < @var{b}
@item @var{a} > @var{b}
@item @var{a} <= @var{b}
@item @var{a} >= @var{b}
@end table

@section Logical operators (&& || !)

The @dfn{truth operators} combine truth values into other truth values. There are three such operators: ``not true'', ``both true'' and ``either one true''.

The operands of these operators are used only as truth values: their values are checked only for nonzeroness. The operands may have any type that is acceptable as a truth value, but the result always has type @code{int}.

@kindex !
@kindex &&
@kindex ||
@table @samp
@item ! @var{truthexp}
Not true. Value is 1 if @var{truthexp} equals 0; 0 otherwise. If @var{truthexp} represents a condition, @code{! @var{truthexp}} represents the contrary condition.

@item @var{truthexp1} && @var{truthexp2}
``And'' for truth values. Value is 1 if both @var{truthexp1} and @var{truthexp2} have nonzero values. If @var{truthexp1} is zero, @var{truthexp2} is not computed at all; its side effects do not take place.

@item @var{truthexp1} || @var{truthexp2}
``Or'' for truth values. Value is 1 if either @var{truthexp1} or @var{truthexp2} has a nonzero value.
If @var{truthexp1} is nonzero, @var{truthexp2} is not computed at all; its side effects do not take place.
@end table

The operators @samp{&&} and @samp{||} specify @dfn{conditional execution}. This means that, depending on the value of the first operand, the second operand may or may not be executed. This makes a difference when the second operand has side effects.

Consider by contrast @code{0 * (x = 4)}. Its value is always 0, but it has the effect of assigning the value 4 to the variable @code{x}. Here the sub-expression @code{x = 4} is executed unconditionally, even in cases where its value is known in advance to be irrelevant. Most operators in C work this way; all of their operands are executed unconditionally. In addition, the order in which the operands are executed is not specified.

The operators @samp{&&} and @samp{||} are unusual: their operands are executed in left-to-right order, and if the ultimate result is determined after the first operand, then the second operand is skipped entirely. Thus, in @code{0 && (x = 4)}, since the first operand makes it certain that the value is zero, the second operand is not computed and @code{x} is not changed. In @code{y && (x = 4)}, @code{x} is changed only if @code{y} is nonzero.

Only one other C expression, the conditional expression, can omit execution of some of its operands (@pxref{Conditional Expr}).

@section Conditional Expressions
@cindex conditional expression
@kindex ? :

A conditional expression lets you select one of two expressions based on a truth value expression. It looks like this:

@example
@var{truthexp} ? @var{val1} : @var{val2}
@end example

@var{truthexp} must be a number or a pointer. If @var{truthexp} is nonzero, @var{val1} is computed and its value is used. Otherwise, @var{val2} is computed and its value is used. Exactly one of @var{val1} and @var{val2} is computed.

If @var{val1} and @var{val2} have the same type, that may be any type, and the conditional expression has the same type.
(Array and function types are excluded: if either @var{val1} or @var{val2} is an array or function then it is converted to a pointer ``before'' the conditional expression ``sees'' it.) In addition, the following cases of different types are allowed:

@itemize @bullet
@item
Both types are numbers. In this case, the type of the conditional expression is determined as if the two numbers were being added together.

@item
One operand is void. Then the other operand may have any type, but the result is void.

@item
One operand is a pointer and the other operand is zero. Then the value is a pointer of the same type.
@end itemize

In all of these cases, either @var{val1} or @var{val2}, whichever is selected, is converted to the appropriate result type.

Here are some examples of conditional expressions:

@example
(3 > 1) ? 5 : 2    => 5
(3 < 1) ? 5 : 2    => 2
*p == 0 ? "end of string" : 0
@end example

The last example has type @code{char *} and its value is either the constant @code{"end of string"} or a null pointer.

@section Trivia

Overwhelmingly used in if() statements.

``Boolean operators'', ``Relational Operators'', ``Truth Operators'', and ``Logical Operators'' are different ways of saying the same thing.

A ``boolean variable'' can either be true or false; these are often called ``flags''.

Many people think that the names defined in @code{#include <iso646.h>} are much easier to read. This ISO standard header defines the names @code{and and_eq bitand bitor compl not or or_eq xor xor_eq not_eq} to be exactly equivalent to @code{&& &= & | ~ ! || |= ^ ^= !=} [FIXME: is this true ? is there no bitor_eq ?]

@node expressions, = and side effects, bool type, Top
@chapter expressions
@vindex expressions
@vindex operator precedence
@vindex precedence

@section Precedence

[FIXME](Table to be included when I know how to do tables in texinfo.)
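
As a small illustration of why precedence matters, here is a sketch of two standard rules of C: @samp{*} binds more tightly than @samp{+}, and @samp{==} binds more tightly than @samp{&}. The function names are hypothetical and exist only for this example.

```c
/* Hypothetical helpers illustrating two precedence rules. */

/* 2 + 3 * 4 groups as 2 + (3 * 4), because * binds tighter than +. */
int arith(void)
{
    return 2 + 3 * 4;
}

/* The unparenthesized expression  x & y == z  groups as  x & (y == z),
   because == binds tighter than &.  It is written out with explicit
   parentheses here so the grouping is visible. */
int parsed_as(int x, int y, int z)
{
    return x & (y == z);
}

/* What a programmer testing masked bits usually intends. */
int intended(int x, int y, int z)
{
    return (x & y) == z;
}
```

With @code{x} = 6, @code{y} = 2 and @code{z} = 2, the two groupings disagree: @code{parsed_as} yields 0 while @code{intended} yields 1. When in doubt, explicit parentheses remove the ambiguity for the reader.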
@section assigning a value to an expression (var = X)

@section Trivia

According to ANSI, there is no precedence in C; instead, there are many types of expressions. Although their terminology is very different, the net effect is identical to the (hopefully easier to understand) ``associativity and precedence system'' terminology in this reference manual.

@node = and side effects, evaluation order, expressions, Top
@chapter assignments and side effects

@section Simple Assignment
@kindex =
@cindex lvalue
@cindex assignment

Simple assignment is done with the operator @samp{=}. On the left of the @samp{=} is a place to store a value; this can be a variable, a structure element, an array element, or the place a pointer points. Expressions that are allowed on the left of an @samp{=} are called @dfn{lvalues} (left-side values). On the right of the @samp{=} is an expression for the value to be stored. Let's call them @var{l} and @var{r}.

If @var{l} and @var{r} have the same type, it may be any type except for void, array and function types. (If @var{r} is an array or function then it is converted automatically to a pointer before the assignment ``sees'' it.) In addition, the following cases of mixed types are allowed:

@itemize @bullet
@item
Both @var{l} and @var{r} have numeric types. Then @var{r} is automatically converted to @var{l}'s type and the result is stored in @var{l}.

@item
@var{l} has a pointer type and @var{r} is the integer 0. Then a null pointer is stored in @var{l}.
@end itemize

An assignment is an expression, and therefore has a value. This value is the altered value of @var{l}. However, the expression is not an lvalue; it may not be used as the operand of unary @samp{&} or as the left side of another assignment.

@section Modifying Assignment
@cindex modifying assignment

The @dfn{modifying assignment} operators abbreviate an arithmetic operation combined with an assignment. Any arithmetic operator can be used.
These operators do not add any power to the language, but they are often convenient.

Let's take the most commonly used modifying assignment operator, @samp{+=}, as an example. @code{@var{l} += @var{r}} is an abbreviation for @code{@var{l} = @var{l} + @var{r}}. It means that the value of @var{r} is added into @var{l}, not simply stored into @var{l}.

Like simple assignments, modifying assignments are expressions and have values. The value of any assignment is the new value of @var{l}. However, the expression is not an lvalue; it may not be used as the operand of unary @samp{&} or as the left side of another assignment.

The rules for the types allowed in modifying assignments follow from the rules for types in simple assignments and in arithmetic operators. It must be possible to combine @var{l} and @var{r} with the arithmetic operator used, and the result must be able to be stored into @var{l}.

The following modifying assignment operators are allowed with @var{l} and @var{r} having any numeric types, and are also allowed if @var{l} is a pointer type and @var{r} is an integer.

@table @code
@item @var{l} += @var{r}
This expression increments @var{l} by the addition of @var{r}. [FIXME: way too much passive voice around here.]

@item @var{l} -= @var{r}
The value of @var{l} is decremented by the subtraction of @var{r}.
@end table

The following modifying assignment operators are allowed whenever @var{l} and @var{r} both have numeric types (either integer or floating). It is not necessary for @var{l} and @var{r} to have the same type; in fact, one may be integer and the other floating.

@table @code
@item @var{l} *= @var{r}
The value of @var{l} is altered by multiplication by @var{r}.

@item @var{l} /= @var{r}
The value of @var{l} is altered by division by @var{r}.
@end table

The following modifying assignment operators are allowed whenever @var{l} and @var{r} both have integer types. They need not have the same types.
@table @code
@item @var{l} %= @var{r}
The value of @var{l} is changed to its remainder in division by @var{r}.

@item @var{l} &= @var{r}
The value of @var{l} is altered by logical-and with @var{r}. This clears all bits in @var{l} that are clear in @var{r}. @xref{Bitwise}.

For example, @code{x &= ~4} clears the 4's bit in @code{x}, leaving all other bits in @code{x} unchanged.

@item @var{l} |= @var{r}
The value of @var{l} is altered by logical-or with @var{r}. This sets all bits in @var{l} that are set in @var{r}. @xref{Bitwise}.

For example, @code{x |= 4} sets the 4's bit in @code{x}, leaving all other bits in @code{x} unchanged.

@