\input texinfo @c -*- Texinfo -*- @comment %**start of header (This is for running Texinfo on a region.) @tex \special{twoside} @end tex @setfilename c_reference_manual.info @settitle YARMAC 0.13--- C Reference Manual --- DRAFT 1998-08-29 @setchapternewpage odd @comment DAV: Is the header really the right place to put @comment the setchapternewpage command ? @set VERSION 0.13 @c huh ? @paragraphindent none @comment %**end of header (This is for running Texinfo on a region.) @ignore 123456789012345678901234567890123456789012345678901234567890123456789012 Only visible in source file ! [FIXME: do tabs in sample programs need to be replaced by spaces ?] [DVDEUG: YES!!!] [FIXME: consider adding some of the style guide suggestions at http://www.rdrop.com/~cary/html/linux.html to this text. ] Should this be put under a "OPL" liscense ? see OpenContent http://www.opencontent.org/home.shtml @end ignore @comment from "info:texinfo#Installing_Dir_Entries" @comment "@dircategory" and "@direntry" are used only by "install-info". @comment Why doesn't the "info:texinfo#Beginning_a_File" documentation @comment mention this ? @dircategory Programming @direntry * YARMAC: (c_reference_manual). Yet Another Reference Manual About C. @end direntry @ifinfo @c The summary description and copyright --- @c --- does not appear in the printed document. This is an unfinished, unpublished work. When finished, it will be a C reference manual and at that time you may freely distribute it under terms vaguely similar to the following: This C reference manual documents some of the features of the C language that are required for writing programs in that language. Copyright @copyright{} 1988 Richard M. Stallman; @copyright{} 1994 Peter Seebach; @copyright{} 1998 David Cary current maintainer (1998-05-26): David Cary Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. 
@ignore Permission is granted to process this file through TeX and print the results, provided the printed document carries a copying permission notice identical to this one except for the removal of this paragraph (this paragraph not being relevant to the printed manual). @end ignore Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled ``Copying'' and ``GNU General Public License'' are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. @end ifinfo @titlepage @c start of title page --- does not appear in the Info file. @title YARMAC @subtitle UNFINISHED DRAFT 1998-05-26 @subtitle NOT FOR DISTRIBUTION, YET @subtitle Yet Another Reference Manual About C @author current maintainer: David Cary @page @vskip 0pt plus 1filll @c start of copyright page Copyright @copyright{} 1988 Richard M. Stallman; @copyright{} 1994 Peter Seebach; @copyright{} 1998 David Cary current maintainer (1998-08-23): David Cary This is an unfinished, unpublished work. When finished, it will be a C reference manual and at that time you may freely distribute it under terms vaguely similar to the following: Published by ... current maintainer: David Cary Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. 
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled ``Copying'' and ``GNU General Public License'' are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. @end titlepage @comment node-name, next, previous, up @ifinfo @node Top, Copying, , (dir) @top YARMAC YARMAC version 0.13 (1998-08-29) DRAFT This is an unfinished, unpublished work. When finished, it will be a C reference manual. This manual is intended to cover the standard C language and all compliant C compilers. (K&R C, ANSI/ISO C, GNU C, and the upcoming C9X standard) The GNU C compiler with the @code{-ansi -pedantic-errors} options is a standard C. Global questions about this DRAFT: Yes, I know all the chapter headings are all lowercase. I've been influenced by _IEEE Spectrum_ doing the same thing; is this just a fad ? How should I handle URIs ? When this document is run through texi2html, I could make links that (a) directly jump to the referenced site. Some people prefer such links to (b) jump to a bibliography at the end; only the links in that bibliography actually exit the document. This document is intended to be a reference for people who already know a little C and who are reading other people's C source code and trying to figure out what's going on. @comment this is the ``master menu'' @menu @comment Main Chapters and appendices * Copying:: YARMAC will be free to distribute. [FIXME] * Introduction to YARMAC:: What is YARMAC ? (Overview) Who is the intended audience ? 
* Top-Down View of C:: A Top-Down View of C * conventions:: general language conventions * fundamental data types:: fundamental data types * variables:: variables * pointers:: pointers * Creating New Data Types:: Creating New Data Types * Arithmetic and Bitwise Operators:: Arithmetic and Bitwise Operators * bool type:: working with the bool type (true, false, and logical operators) * expressions:: expressions * = and side effects:: assignments and side effects * evaluation order:: Precedence vs. order of evaluation * sequence points:: sequence points * branching:: branching * looping:: looping * functions:: functions * scope:: scope, linkage, access and duration * I/O:: Input and Output * macros:: macros and the preprocessor * History of C:: History of C * History of this document:: History of this document * Bibliography:: Bibliography * another view:: The Compiler Writer's View @comment Indices * Glossary:: Glossary * keystroke index:: Keystroke Index * Concept Index:: Concept Index @detailmenu --- Detailed Chapters and subsections --- * Copying:: YARMAC will be free to distribute. [FIXME] * Introduction to YARMAC:: What is YARMAC ? (Overview) Who is the intended audience ? * conventions:: general language conventions * fundamental data types:: * Int Const Default:: Default Type of an Integer Constant * Int Const Type:: Explicitly Typed Integer Constants * Int Conversion:: Type Conversion among Integer Types * Int Promotion:: Default Integer Promotions * variables:: variables * pointers:: pointers * Creating New Data Types:: Creating New Data Types * Arithmetic and Bitwise Operators:: Arithmetic and Bitwise Operators * bool type:: working with the bool type (true, false, and logical operators) * expressions:: expressions * = and side effects:: assignments and side effects * evaluation order:: Precedence vs. 
order of evaluation * sequence points:: sequence points * branching:: branching * looping:: looping * functions:: functions * scope:: scope, linkage, access and duration * I/O:: Input and Output * macros:: macros and the preprocessor * History of C:: History of C * History of this document:: History of this document * Bibliography:: Bibliography * another view:: The Compiler Writer's View * Glossary:: Glossary * keystroke index:: Keystroke Index * Concept Index:: Concept Index @end detailmenu @end menu @end ifinfo @comment node-name, next, previous, up @node Copying, Introduction to YARMAC, Top, Top @chapter Copying (Information on FSF) @section General Public License and explanation [FIXME: Is this the latest proper language ?] Copyright @copyright{} 1994, 1995, 1996, 1998 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. @section Why FSF? (Richard Stallman and the Gnu Manifesto) @section Why Gnu C? @section Trivia @node Introduction to YARMAC, Top-Down View of C, Copying, Top @chapter Introduction to YARMAC C language reference manual This book intends to be a reference to the C programming language. It assumes you have already gone through the tutorial that came with your C compiler, and are familiar with editing files on your own platform. 
This book is not a replacement for the FAQ, or any other explanatory book; it is expected to be a mere reference. A reference manual can have bugs. Bugs include segments that are inaccurate or unclear. Bugs include a reference that should have been in the index. When you find bugs, please report them to the maintainer; currently, that's @samp{d.cary@@ieee.org}. I'll try to patch the bug, or explain why it's a feature. Please include the version number from the top of the file in any bug report. [DVDEUG: --- doesn't show up right!] The C programming language is widely available --- widely enough that there are many distinct dialects. This manual aims to cover the K&R (traditional) dialect, ANSI/ISO C, GNU C, and the upcoming C9X standard. The primary targets are ISO C (for portability) and GNU C (to go with the @samp{gcc} compiler). The K&R C style is referred to mostly for portability to obscure systems, and to help identify what was intended by code written in this style. Some side references to other dialects may be included where necessary; in particular, some of the more visible incompatibilities with C++ are covered, because this kind of information can be helpful. This book contains a lot of opinion and dogma; this is understood to be the author's personal opinion, but frequently is related to issues that look harmless until you use your second compiler. I have tried to avoid picking sides in the major religious wars, focussing instead on things that are known to introduce problems. The goal of this book is to provide a high quality reference manual for the C language, available in machine-readable form. No paper index can compete with the vast speed of modern computers at searching for information. Each chapter will have a section titled `Trivia' which will contain likely sticking points, non-obvious implications, and other things that are expected to answer the questions of experienced programmers, or which may prove interesting to know. 
The Appendices are intended to serve as quick references, with pointers to more detailed treatments in the text. The Index in particular, it is hoped, will be of greater value than indices in computer books usually are.

@section Trivia

There are two kinds of bugs: ones I know about (labeled FIXME), and ones I don't know about.

@node Top-Down View of C, conventions, Introduction to YARMAC, Top
@chapter a top-down view of C

[FIXME: most programmers' reference manuals start with individual characters and build up from there. I think it might be interesting to start at a high level and work down; a quick reference for people who found a C program on the 'web, and need to know how to change a couple of @code{#define}s to get it working on their machines; then get more and more detailed as they need to make a little change here and a little change there... How high a level should I start ? Buckminster Fuller says to start with the universe, then start dividing. ]

[DVDEUG: Right now it isn't written like this. Are there plans to write it like this? Personally, I would skip it. The key unanswered question is the audience. If it is a _reference manual_ then it should be for people who know C. My adding the comparisons to Pascal, C++ and Java would make it usable by anyone with a computer science background. Restructuring the document like this would make it more accessible to anyone, at the cost of reducing its value as a C reference.]

[ I'm thinking about adding a section, "When should one *not* use C ?" to add a brief discussion of all the other wonderful (and free !) tools that do things that C is overqualified and/or underqualified for. Simple batch files and sed scripts, raw assembly, PERL, FORTRAN, Octave, C++, etc. ]

One typically creates an executable program by using the @code{make} utility. [FIXME: Recent GNU programs tend to have the user run Configure to generate a make file.]
One goes to the directory containing the source code of the C program (a bunch of files that usually end in @code{.c} or @code{.h})
@comment (and @code{.cpp} @code{.hpp} FIXME ?)
and also containing a file named @code{makefile}, then types @code{make}, and if everything goes as designed, the @code{make} utility uses the @code{makefile} to find all the various parts and combine them into a single executable file. One then types the name of that file to run it. [Really large programs typically have subdirectories, or "branches", each with its own @code{makefile} that is recursively called from the "root" @code{makefile}.]

The @code{make} utility, the format of the @code{makefile}, and the process of compiling the source into an executable are outside the scope of this manual. This manual covers what is in the C source files and how that corresponds to what your executable program actually does when you run it.

Inside the C source files, there are two kinds of text: comments intended solely for human readers, and "code" intended to be understood by a compiler.

@section Overview

A C program consists of a collection of files. The naming of files is arbitrary; the only thing mandated by most compilers is that the filename of the code ends in @code{.c}@footnote{ @code{.c}, not @code{.C}. Regrettably, there is a difference: @code{.C} implies a C++ program, despite this being error prone even under a UNIX-like system, and tragic under a system that is case-insensitive.}, and that can be overridden by hand. In practice, the name of a file is a brief description of the contents of the file with @code{.c} appended. In addition to @code{.c} (code) files, there are @code{.h}@footnote{Again, @code{.H} implies C++.} (header) files.

To control the compilation of these files, most people use @code{make}, which automatically rebuilds only the files that need to be rebuilt. Many high-quality projects use @code{autoconf} to help make their code easily portable to a wide variety of systems.
While they are both useful tools for C programming, they fall outside the realm of this work. Please see [insert references to the manuals].

A header file contains different material than a code file. A header file contains information for more than one file: prototypes for functions[add reference], global [add reference to glossary] data, and type definitions [add reference to compound & user-defined types]. Header files can also @code{include} other header files using @code{#include}. A code file mainly contains the actual code of the program, @code{including} (using @code{#include}) header files so that it recognizes the information it shares with other files.

@section Trivia

In reality, a header file can include anything --- the preprocessor [reference?] just textually replaces each @code{#include} statement with the entire file it references. Once this is done, the compiler can't tell the difference between lines of text in the original @code{.c} source file and lines of text that were @code{#include}d.

[/* was: In reality, a header file can include anything it wants. The preprocessor [reference?] just textually adds the files together, and as long as it doesn't depend on a pass of the preprocessor that occurs before the adding, it will work. */]

(This author (David Starner) once had a program where the main function started in a header file and finished in the @code{.c} file the header file was included in.) This is really not recommended; in general a header file should only contain what is mentioned above.

When the GNU preprocessor encounters a @code{#include <>} or @code{#include ""}, where does it look for those files ? [FIXME: answer.]

@section In Comparison
@c In Comparison is written so that one who knows one of the languages can look and tell the major differences in the section,
@c without reading the entire section.

@subsection Java

While a Java class must be saved in a file of the same name, a C file has no such restriction on file names.
C also has header files (@code{.h}) for data needed in multiple files, whereas the Java compiler reads the @code{.class} files.

[ARGGH! I don't know any Java (yet). But I was under the impression that @code{.java} files were source code that the Java compiler converted into @code{.class} executables, which contradicts this paragraph.]

@subsection C++

C++ uses @code{.cpp}, @code{.cc}, or @code{.C} source code and @code{.hpp}, @code{.H}, or @code{.h} header files. Header files under C++ contain classes and code in the form of inline functions, in addition to what is noted above as being in C header files.

@subsection Pascal

[DVDEUG: ARGGH! I have little familiarity with Pascal. This is only basic Wirth & Jensen or ISO 7185 Pascal.]

Whereas classic Pascal has a monolithic file structure with only one file per program, C has functions and declarations split up between files and uses header files to hold common data.

[DAV may be bluffing when he says he knows Pascal:] Many versions of Pascal use Units ...

@node conventions, fundamental data types, Top-Down View of C, Top
@chapter general language conventions

spaces, tabs, whitespace

@section Program Structure

@section Comments

[FIXME: The description about how to use comments needs to flow.]

Everything from a @code{/*} to the first @code{*/} is a comment intended solely for the human reader. Comments are ignored by the C compiler -- they have no effect on the executable file. You can insert a comment anywhere there is "white space".

Please put some comments next to code you write or change. You want your comments to tell WHAT your code does, not HOW.

Documentation embedded in the source code is best. External documentation all too often gets separated from the code and lost. If the documentation is right there in the source code, it is far easier to update it when the code changes.
@section On one line @code{ /* @dots{} */ } @section Blocks @example /* @dots{} @dots{} @dots{} */ @end example @section Trivia Some C compilers (including GNU C) allow the C++ comment, @code{//} followed by arbitrary text up to a newline. This is a standard comment format for C9X. Unfortunately, too many early C compilers do not recognize the @code{//} comment style, so the @code{//} should not be used in portable C source code. At least one highly-contrived program compiles legally under both kinds of compilers, but executes differently. If you have code that uses the @code{//} comment style, you can convert it to the @code{/* ... */} comment style with @example sed -e 'sX^\([^"]*\("[^"]*"[^"]*\)*\)//\(.*\)$X\1/*\3*/Xg' test.cpp > test.c @end example [FIXME: what about the DOC++ comment style, ? http://www.zib.de/Visual/software/doc++/ DVDEUG: It's non-free isn't it? Don't worry about it. I will add something about literate programming. ] The extreme in commenting comes with Knuth's literate programming style, wherein @TeX{} is interlaced with actual code in order to produce programs that can be read like books. @TeX{} itself is written in this style. If you are interested, look through the Web2C documentation and get Cweb. [How to get this documentation ?] @node fundamental data types, Int Const Default, conventions , Top @chapter fundamental data types Programmers use an infinite number of possible types of data. All the different types (at least in C) are built out of the fundamental types ``built-in'' to the C language: @itemize @bullet @item the ``bool'' values (also called a "bit") It represents either true or false. @item The integers (char, int, short, long) @item The floating-point numbers (float, double, long double) @item Pointers (``*'') @item The ``un-type'' (void) @end itemize Pointers and user-defined types are covered in a later section (``Creating New Data Types''). 
A data type represents (among other things) the range of values that a variable can hold. This range is limited by, and specific to, the particular compiler used to compile that program. Every compiler should come with the standard header @code{<limits.h>}, which specifies the largest and smallest values of its fundamental integer data types. Note that these values are often different for different compilers. It doesn't make any sense to use any other @code{<limits.h>} file besides the original one that came with the compiler.

@section Integers (char int long)
@cindex integers

An integer datum always contains some whole number such as -93, -12, 0, 1, or 69. But that's not the whole story.

Every value in C must have a specific data type. As a consequence, it is impossible to simply have the value 7; it must be 7 in a particular type.

The C language has several different data types for integers. Each type has a range of possible values; different types have different ranges. For example, a variable of type @code{short int} can hold any value from -32768 to 32767 in programs compiled by my compiler. A variable of type @code{int} can hold a value from -2,147,483,648 to 2,147,483,647 on most 32-bit machines. On architectures of other sizes, the range of an @code{int} could be as small as -32,768 to 32,767, the same as a @code{short int}. Keep this in mind when writing programs that might be ported to a smaller machine.

When you define an integer variable, you must choose one of the standard integer types for it. The type controls what range of values the variable can hold, and at the same time the amount of storage space used for the variable.

It is impossible to store in a variable a value outside the range of its type; if you try to do this, the actual result is to store some other value, a value that is within the permissible range. (In practice, the extra high-order bits are discarded and the low-order bits are stored.)
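The discarding of high-order bits described above can be observed directly for unsigned types, where the result is the original value reduced modulo one more than the largest value of the type. The following is only a sketch: the names @code{store_in_uchar} and @code{show_wraparound} are invented for this illustration, and signed types are deliberately avoided because the result of an out-of-range conversion to a signed type depends on the implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: store a value in an unsigned char and
   report what was actually kept. */
static unsigned long store_in_uchar(unsigned long value)
{
    unsigned char c = (unsigned char)value;  /* high-order bits are discarded */
    return c;
}

/* Print a few samples; call this from main() to see the effect. */
static void show_wraparound(void)
{
    unsigned long samples[] = { 7UL, 255UL, 256UL, 300UL };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        unsigned long kept = store_in_uchar(samples[i]);
        /* The stored value equals the original modulo UCHAR_MAX + 1. */
        assert(kept == samples[i] % ((unsigned long)UCHAR_MAX + 1UL));
        printf("%lu stored in an unsigned char becomes %lu\n",
               samples[i], kept);
    }
}
```

On a machine where @code{unsigned char} has 8 bits, 300 is stored as 44 (that is, 300 - 256), while 7 and 255 are stored unchanged.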
[FIXME: Stroustrup said once, ``unsigned integers, declared @code{unsigned}, obey the laws of arithmetic modulo 2^n. This implies that unsigned arithmetic does not overflow''. Is this really true for all standards-compliant compilers ?] Here are examples of declarations for integer variables: @example short int s; int u, v; unsigned long x; unsigned long long z; /* non-portable */ @end example You can omit the keyword @code{int} if you use any of the keywords @code{long}, @code{short}, @code{signed} or @code{unsigned}. @xref{Declarations}. @section Signed and Unsigned Types @cindex signed types @cindex unsigned types Each integral type has two forms: @samp{signed} and @samp{unsigned}. The two forms occupy the same amount of storage space and their ranges are equally large. A signed type has a range of values centered on zero, while an unsigned type has a range that starts at zero. For example, (on one particular machine using one particular compiler) the @code{ short int} has 65536 distinct values. The unsigned form, @code{unsigned short int}, can hold any integer in the range 0 to 65535, while the signed form, @code{signed short int}, has a range centered on zero, -32768 to 32767 to be exact. It is common to assume that specific sizes (16-bit @samp{short}, 32-bit @samp{long}) are ``standard''; this assumption is in error. (All too often, some people assume that @samp{int} is 16 bits. Others assume that @samp{int} is 32 bits. Obviously, both cannot be right, and sometimes both are wrong.) Similarly, it is not guaranteed that any integer type can hold a pointer, although it is quite common for it to be possible. [FIXME: This paragraph is (or should be) redundant compared to later information.] 
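Rather than assuming particular sizes, a program can ask the compiler. The sketch below prints the ranges that @code{<limits.h>} reports for the signed and unsigned forms of several integer types; the function names are invented for this illustration, and the printed numbers will differ from one compiler to another.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: how many distinct values an unsigned short has.
   (Assumes unsigned long is wider than unsigned short.) */
static unsigned long ushort_value_count(void)
{
    return (unsigned long)USHRT_MAX + 1UL;
}

/* Print the ranges this particular compiler uses. */
static void print_integer_ranges(void)
{
    /* ISO C only promises these minimum magnitudes and this ordering. */
    assert(SHRT_MAX >= 32767 && USHRT_MAX >= 65535U);
    assert(INT_MAX >= SHRT_MAX && LONG_MAX >= INT_MAX);

    printf("short:          %d to %d\n", SHRT_MIN, SHRT_MAX);
    printf("unsigned short: 0 to %u\n", (unsigned)USHRT_MAX);
    printf("int:            %d to %d\n", INT_MIN, INT_MAX);
    printf("unsigned int:   0 to %u\n", UINT_MAX);
    printf("long:           %ld to %ld\n", LONG_MIN, LONG_MAX);
    printf("unsigned long:  0 to %lu\n", ULONG_MAX);
}
```

Calling @code{print_integer_ranges} from @code{main} on two different machines is a quick way to see which of the common size assumptions happen to hold on each.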
For portability,

@itemize @bullet
@item
avoid trying to cast a pointer into any integer type;
@item
use @samp{long} when you need more than 16 bits (up to 32 bits);
@item
use @samp{char} when you need no more than 8 bits and want to conserve space;
@item
use @samp{short} when you need more than 8 bits (up to 16 bits) and want to conserve space;
@item
use @samp{int} when you need at most 16 bits and want speed over space.
@end itemize

In general, programs that make assumptions about the sizes of the integral types are device drivers for a specific operating system, or very poorly written.

@kindex signed
@kindex unsigned
You can specify a signed type or an unsigned type by using the keywords @code{signed} and @code{unsigned} as part of the type name.

@section Table of Integer Types
@kindex int
@kindex short
@kindex char

Here is a table of all the integer types of C, together with their ranges (as documented in @code{<limits.h>} in a typical implementation of GNU C):

You are guaranteed that @samp{char} will be at least 8 bits, @samp{short} at least 16, @samp{long} at least 32. @samp{int} will be at least as large as @samp{short}, and no larger than @samp{long}. In general, each type will be no larger than the next larger type. For example, there are implementations where all of the integral types are 64 bits.

@table @code
@item int
@itemx signed int
Four-byte signed integer; range -2^31 to 2^31-1. Guaranteed to be at least as large as @samp{short}, i.e., on smaller machines, the range could be as small as -2^15 to 2^15-1.

@item unsigned int
Four-byte unsigned integer; range zero to 2^32-1. On smaller machines, the range could be as small as zero to 2^16-1.

@item short int
@itemx signed short int
Two-byte signed integer; range -2^15 to 2^15-1.

@item unsigned short int
Two-byte unsigned integer; range 0 to 2^16-1.

@item signed char
One-byte signed integer; range -128 to 127.

@item unsigned char
One-byte unsigned integer; range 0 to 255.

@item char
Depending on the machine, @code{char} is an alias for either @code{signed char} or @code{unsigned char}.
The only values that you can count on to fit in a @code{char} regardless of the type of machine are 0 to 127. @item long int @itemx unsigned long int These types in GNU C are equivalent to @code{int} and @code{unsigned int}. In some other C implementations, @code{long int} occupies more bytes than @code{int}. For example, in the original implementation of C, @code{int} occupied only two bytes (like @code{short int}), and to get a four-byte integer it was necessary to use the type @code{long int}. @item long long int Double precision signed integers ranging from -2^63 to 2^63-1. These integers occupy 8 bytes. (Most C compilers don't support this type.) @item unsigned long long int Double precision unsigned integers ranging from 0 to 2^64-1. These integers occupy 8 bytes. (Most C compilers don't support this type.) @end table Even though two types may be equivalent (@code{int} and @code{long int} are equivalent in my compiler, and @code{char} is always equivalent to either @code{unsigned} or @code{signed char}) they are considered distinct types. For example, the types pointer-to-@code{int} and pointer-to-@code{long int} are completely different types. @section Integer Constants @cindex integer constant @cindex octal @cindex decimal Any positive integer value can be written as a constant. There are no constants for negative integer values, but unary @samp{-} and a positive constant do the job of one. @subsection Integer Constant Radices There are three ways of writing integer constants: decimal, octal and hexadecimal. @itemize @bullet @item A decimal constant is a sequence of digits not starting with a zero. Any positive number except zero can be written this way. @item An octal constant is a sequence of digits starting with a zero. The zero tells the compiler to interpret the digits in base 8. Thus, @samp{010} has value 8, @samp{013} has value 11, and @samp{0100} has value 64. Strictly speaking, @samp{0} is an octal constant. But 0 is 0 in any radix. 
@item @cindex hex digit @kindex 0x A hexadecimal (or @dfn{hex}) constant is @samp{0x} (or @samp{0X}) followed by a sequence of @dfn{hex digits}. A hex digit is either a decimal digit, or a letter in the range @samp{a} through @samp{f} (upper or lower case). @samp{a} stands for 10, @samp{b} for 11, and so on, through @samp{f} for 15. Thus, the hex constant @samp{0xa} has value 10, @samp{0x10} has value 16, @samp{0x16} has value 22, @samp{0x20} has value 32, and @samp{0xff} has value 255. @end itemize Hexadecimal constants are used more often than octal constants, because it is easy to see how a hexadecimal constant breaks down into separate bytes. Each pair of hexadecimal digits makes one byte. Octal constants don't split conveniently into bytes. @node Int Const Default, Int Const Type, fundamental data types, Top @subsection Default Type of an Integer Constant Like all C expressions, an integer constant specifies a data type as well as a value. The type is usually determined by the value, unless you use a suffix letter (@pxref{Int Const Type}). The type of a decimal constant is taken from the following series: @example int, long int, unsigned long int @end example @noindent The type of a decimal constant is the first type in that series which can hold the constant's value. Thus, any value that is small enough will have type @code{int}. In GNU C, @code{long int} never plays a role because it is effectively the same as @code{int}; not so in other C implementations. The type of an octal or hex constant is taken from the following series: @example int, unsigned int, long int, unsigned long int @end example @noindent There are some values that can fit in an @code{unsigned int} but not in an @code{int}; if the constant is written in octal or hex, that unsigned type is used for such values. 
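The values quoted above for octal and hexadecimal constants can be checked mechanically. The following sketch simply compares each constant from the text with its decimal value; the function name @code{radix_examples_agree} is invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical self-check: returns 1 if every octal and hex constant
   quoted in the text has the decimal value claimed for it. */
static int radix_examples_agree(void)
{
    return 010 == 8       /* octal: 1*8 + 0          */
        && 013 == 11      /* octal: 1*8 + 3          */
        && 0100 == 64     /* octal: 1*64             */
        && 0 == 00        /* zero is zero in any radix */
        && 0xa == 10      /* hex digit a stands for 10 */
        && 0x10 == 16     /* hex: 1*16 + 0           */
        && 0x16 == 22     /* hex: 1*16 + 6           */
        && 0x20 == 32     /* hex: 2*16 + 0           */
        && 0xff == 255    /* hex: 15*16 + 15         */
        && 0XFF == 0xff;  /* upper or lower case is accepted */
}

/* Report the result; call this from main() if desired. */
static void report_radix_check(void)
{
    assert(radix_examples_agree());
    printf("all radix examples agree\n");
}
```

A leading zero is easy to add by accident when aligning columns of numbers; a check like @code{010 == 8} is a reminder that the zero changes the value.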
In GNU C (since @code{unsigned int} and @code{unsigned long int} have the same range), some values have type @code{unsigned int} when written as an octal or hex constant, but have type @code{unsigned long int} when written as a decimal constant.

@node Int Const Type, Int Conversion, Int Const Default, Top
@subsection Explicitly Typed Integer Constants
@kindex l
@kindex u

The letters @samp{u} and @samp{l} may be used as suffixes to specify the type of an integer constant. The letter @samp{u} means it must be unsigned. The letter @samp{l} means it must be long. (Upper case is accepted also; in fact, @samp{L} is better than @samp{l} because @samp{l} looks too much like a @samp{1}.)

The effect of the suffix is to reject certain types from the series of possible types. (The series of possible types depends on the constant's radix; @pxref{Int Const Default}). @samp{l} rejects the types that are not long, and @samp{u} rejects those that are signed. Once those are rejected, the type used is the first of those remaining which can hold the actual value.

@subsection Integer Constant Type Examples

Here are some examples of integer constants and their types.

@itemize @bullet
@item
The hex constant @code{0x80000000} needs 32 bits. On my compiler [DVDEUG: GNU C?], its type is @code{unsigned int}, because it can fit in that, whereas it is just barely too large for an @code{int}.

@item
@code{2147483648} is the same value, expressed in decimal. It is a @code{long unsigned int} because @code{unsigned int} is never used for decimal constants, and neither @code{int} nor @code{long int} will hold this value.

@item
@code{0x80000000L} is likewise a @code{long unsigned int}. @code{unsigned int} is ruled out by the @samp{L}, so the next candidate type that can hold the value is used.

@item
@code{0x80000000u} is an @code{unsigned int}, just like @code{0x80000000}. The @samp{u} rules out @code{int}, but that has no effect, since this value doesn't fit in an @code{int} anyway.
@item
@code{2147482648u} is a @code{long unsigned int}. @code{int} and @code{long int} are ruled out by the @samp{u}, and @code{unsigned int} is ruled out by the choice of decimal radix.

@item
@code{3l} is a @code{long int}. @code{int} is barred by the @samp{l} and @code{long int} is the next candidate for a decimal constant.

[FIXME: Is this `@code{ul}' really valid ?]
@item
@code{4ul} is an @code{unsigned long int}. @code{int} is barred by the @samp{l}; @code{long int} is barred by the @samp{u}; @code{unsigned long int} is the next candidate for a decimal constant.
@end itemize

@node Int Conversion, Int Promotion, Int Const Type, Top
@section Type Conversion among Integer Types

C allows automatic conversion between integer types. Conversions can be requested explicitly with casts (@pxref{Casts}); they also happen automatically when the operands of an arithmetic operator have different types, and for integer promotion (@pxref{Int Promotion}).

@node Int Promotion, variables, Int Conversion, Top
@section Default Integer Promotions
@cindex promotions (integer)

In C, the @code{short} and @code{char} types (whether signed or not) are nominally never used for any operation. Values of these types appearing in arithmetic expressions are always converted to type @code{int} before any arithmetic is done, before they are passed as arguments to a function, and so on.

In fact, the GNU C compiler may omit the conversion, but only when this has no effect on the result. For understanding the meaning of a C program, you can assume that the conversion always happens.

@ignore
Controversy over previous 2 paragraphs.
[DVDEUG: Whoa! Default promotion like this disappeared with ISO C!]
[DAV: Promotion seems to be alive and well. I run

#include <stdio.h>
main(){
  unsigned char a, b, c;
  a = 0xff;
  b = 0xff;
  c = (a+b)/2;
  printf("%i", (unsigned int)c );
}

If there is *no* promotion, then (uchar)0xff + (uchar)0xff should equal (uchar)0xfe. Divide by 2, and we get 0x7f (printing "127").
However, when I compile this with
$ gcc --version
2.7.2.3
and run it, it prints "255". Is there a better explanation than just saying that the chars were promoted to int, so that the result of addition (uchar)0xff + (uchar)0xff is (int)0x01fe ? ]
FIXME: What is the most understandable way of summarizing this ? I prefer easy-to-understand "as if" rules, even if a particular compiler doesn't happen to actually work that way internally -- as long as we get the same results.
@end ignore

@section Floating Point Numbers (float, double, long double)
@cindex floating point
@cindex mantissa
@cindex exponent
@cindex scientific notation

@dfn{Floating-point} numbers are the computer's version of ``scientific notation''. Floating point data is often called ``real'' data but strictly speaking this is a misuse of language. Floating point is often used to represent real-number values, but general real numbers cannot be exactly represented, only approximated.

[DVDEUG: Add reference to "What every Computer Scientist should know about floating point". Also add references to NAN and Inf.]

[FIXME: is all this really necessary ? Can't we just say that fixed-point numbers can handle fractions and numbers of a certain range with a certain precision, and be done with it ?]

In scientific notation, a number is represented as the product of its @dfn{mantissa}, which is a number between 1 and 10, and a power of 10. The power of 10 used is called the @dfn{exponent} of the number. Here are some examples of numbers in scientific notation:

@table @asis
@item 129
1.29 * (10^2)
@item 100
1.0 * (10^2)
@item 99
9.9 * (10^1)
@item 5.5
5.5 * (10^0)
@item .125
1.25 * (10^-1)
@end table

Floating point notation in the computer is the binary equivalent of scientific notation. The mantissa is between 1 (inclusive) and 2 (exclusive) and is represented in binary; the exponent is a power of 2 instead of 10.
Here is how the previous examples would look in the computerized format:

@table @asis
@item 129
1.0000001 * (2^7)
@item 100
1.1001 * (2^6)
@item 99
1.100011 * (2^6)
@item 5.5
1.011 * (2^2)
@item .125
1.0 * (2^-3)
@end table

Note that the exponent of the number zero is not really determined because 0 * (2^0) = 0 * (2^1) = 0 * (2^@var{anything}). By convention, when zero is represented as a floating-point number, zero is used as the exponent value.

@section Floating Point Types

Floating-point data types in the computer differ in how many bits are available for representing the mantissa and the exponent. The number of mantissa bits determines how much significance can be represented; the number of exponent bits determines the overall range of magnitudes that can be represented.

For example, if 7 bits are available for the exponent, the range of possible exponents is from @minus{}64 to 63, so the range of possible floating point values is from 2^@minus{}64 to 1.111@dots{} * 2^63. With 8 exponent bits, the range of possible exponents doubles to @minus{}128 through 127, so the smallest possible positive value becomes 2^@minus{}128 and the largest becomes 1.111@dots{} * 2^127.

If only 4 bits were available for the mantissa, it would be impossible to distinguish the numbers 16 and 17 (10000 and 10001 in binary). Only the first 4 significant bits, 1000 in both cases, could be kept.

In actuality, at least 24 bits of mantissa are always available. This translates to around 7 significant decimal digits. Since the first bit of the mantissa is always one, it is often not explicitly represented. [FIXME: Is this always true for all GNU C implementations ?]

All ANSI C implementations provide three distinct data types for floating point numbers: @code{float}, @code{double}, and @code{long double}. In GNU C, @code{float} is a 32-bit single-precision number; 32 bits are available for the mantissa, exponent and sign bit. Just how the bits are apportioned among mantissa and exponent depends on the kind of computer in use.
@code{double} is a 64-bit double-precision number. @code{long double} is equivalent to @code{double}, but it is considered a distinct type. @section Floating Point Constants Floating point constants let you express particular floating-point numbers in C programs. Each floating-point constant specifies a numeric value and a data type (either @code{float}, @code{double} or @code{long double}). The numeric value consists of a mantissa optionally followed by an exponent. The mantissa is a number with a decimal point. An exponent is the letter @samp{e} (or @samp{E}) followed by an integer which may have a sign. If an exponent is given, the decimal point is not required in the mantissa. Here are some examples, all of which have the value 150: @example 150.0 150e0 15e1 1.5e2 1.5e+2 1.500e2 .015e4 @end example A letter at the end of the constant specifies the data type. The letter @samp{F} (or @samp{f}) specifies type @code{float}. The letter @samp{L} (or @samp{l}) specifies type @code{long double}. No letter at all specifies the default, which is @code{double}. It is rarely necessary to use letters to specify the type explicitly. One time when it is useful is when using the constant in arithmetic together with values of type @code{float}: if you do not explicitly specify the type, the constant is a @code{double}. The compiler will add code to convert the other values to @code{double} and the arithmetic would be done in @code{double} precision. If the result that you want is a @code{float}, the extra conversions would make the program unnecessarily slow. 
You can avoid the extra conversions by explicitly specifying the type of your constant as @code{float}, like this:

@example
@{
  float x, y = 1.0f;
  x = (y + 1.3f) * 2.4f;
@}
@end example

@section the ``un-type'' (void)

The ``un-type'' @code{void} is used only in these 3 common situations:

@itemize @bullet
@item
the type of the single argument to functions which take no arguments
@item
a generic pointer, i.e., a pointer of type @code{void *}, can point to an object of any type (see Pointers)
@item
the return type of a function which doesn't return anything
(see Functions for both flavors of this situation).
@end itemize

There are no objects of type @samp{void}.

@section Numeric Type Conversion

In C, any numeric type can be converted automatically to any other numeric type. Type conversion happens in assignments, in arithmetic, and in casts (@pxref{Casts}). It may also happen in @code{return} statements (@pxref{Return}) and in function calls when a prototype is in effect (@pxref{Prototype}).

For example, if @code{x} is a variable declared as @code{int} and @code{f} is declared as @code{float}, then

@example
f = x;
@end example

@noindent
converts the value of @code{x} to floating point and

@example
x = f;
@end example

@noindent
converts the value of @code{f} to an integer.

If a constant appears in a context where it would need to be converted immediately to another type, GNU C converts it while compiling the program. Normally this makes no difference except to speed up execution.

@section Integer Conversion

The general rule when converting a value from one integer type to another is that the numeric value is unchanged if it is within the range of possible values for the new type. If it is outside the possible range, then the number's bit pattern is preserved. [FIXME: This is confusing.] If the number has too many bits to fit, then the least significant bits are kept, as many as will fit.
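The rule above can be checked with a small sketch. The helper name @code{keep_low_byte} is hypothetical and not part of any library. The sketch assumes an 8-bit @code{unsigned char} and deliberately uses an unsigned target type, because for unsigned targets the out-of-range result is well defined: only the low-order bits survive.

```c
#include <limits.h>

/* Hypothetical helper (not a library function): convert an
   unsigned int to unsigned char.  A value within the range of
   unsigned char is unchanged; a larger value keeps only its
   low-order CHAR_BIT bits, i.e. the value modulo UCHAR_MAX + 1. */
unsigned char keep_low_byte(unsigned int value)
{
    return (unsigned char) value;
}
```

Under the stated assumption, @code{keep_low_byte(200u)} yields 200, while @code{keep_low_byte(513u)} yields 1, since 513 is 1000000001 in binary and only the low eight bits fit.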
@cindex extending Converting an integer of a narrower type to a wider integer type (such as @code{char} to @code{int}) is called @code{extension}. If the types are signed, it is called @code{sign-extension}. If the original type is unsigned, it is called @code{zero-extension}. In either case, the number keeps the same value. There is one other case of extension, from a signed type to an unsigned one. This case is an exception because only positive values can go through unchanged; negative values cannot do so because the unsigned type cannot represent them. A negative number large in absolute value becomes a small positive number, and a negative number close to zero becomes a large positive number. This case is error-prone, so check carefully whenever you write code that converts @code{signed} numbers to @code{unsigned}. @cindex truncation When a value of wider type is converted to a narrower type, it keeps the same value if possible; but often this is impossible. For example, 513 (1000000001 in binary) cannot keep the same value when converted to a @code{char}; it is outside the possible range of a @code{char}. In this case, the least significant bits remain the same and the rest are lost. Thus, 513 converts to the @code{char} value 1. This is called @dfn{truncation}. Sometimes truncation of a positive value has a negative result. For example, truncating 129 (10000001 in binary) to a @code{char} has the value @minus{}127 because the first 1 in the number is now the sign-bit. Of course, this happens only when the result type is a signed type. There is one other case of integer type conversion, that where the old and new types are equally wide but one is signed and the other is unsigned. In this case, the bit pattern is preserved. For example, when converting from @code{char} to @code{unsigned char} or vice versa, values 0 through 127 are unchanged. 
@code{char} values @minus{}128 through @minus{}1 map into @code{unsigned char} values 128 through 255, respectively, and vice versa. It was shown above how 129 as an @code{unsigned char} corresponds to @minus{}127 as a @code{char}. @section Floating Point Conversion When a value of type @code{float} is converted to @code{double}, it keeps the same numeric value. @code{double} can represent anything that @code{float} can. Likewise when @code{float} or @code{double} is converted to @code{long double}, accuracy is maintained. @cindex floating overflow When a value of type @code{double} is converted to @code{float}, two kinds of problems must be faced. @itemize @bullet @item @code{float} has fewer mantissa bits. The most significant mantissa bits are kept, as many as will fit, so that the result is close to the original value even if not exactly the same. @item @code{float} has fewer exponent bits, so its largest possible value is smaller. If the number being converted fits in the possible range of a @code{float}, this problem has no effect. If the number does not fit, the result is pure garbage, this being an example of @dfn{floating overflow}. @end itemize @section Integer to Floating Point When an integer value is converted to a floating point type, in general, the result is the floating point value which is numerically closest to the original integer. In some cases, the integer can be represented exactly. For example, converting the integer 5 to @code{float} results in the number 1.25 * 2^2, or, in binary, 1.01 * 2^2, whose value is exactly 5. But this is not possible for large integers. An @code{int} has 31 significant bits; in a @code{float}, some of the 32 bits are needed for sign and exponent, leaving typically 24 bits of significance. Integers greater than this cannot be represented exactly. For example, both 268435456 and 268435457 convert to the same floating point number (these integers are 2^28 and 2^28+1). 
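As a rough illustration of this loss of significance, the sketch below compares two integers after conversion. The helper names @code{same_as_float} and @code{same_as_double} are hypothetical. The sketch assumes an @code{int} of at least 32 bits and IEEE 754 formats (24 significant bits for @code{float}, 53 for @code{double}), which is typical but not guaranteed.

```c
/* Hypothetical helpers for illustration only.
   The volatile qualifier forces each converted value to be stored
   at its declared precision before the comparison is made. */

/* Returns 1 if a and b convert to the same float value. */
int same_as_float(int a, int b)
{
    volatile float fa = (float) a;
    volatile float fb = (float) b;
    return fa == fb;
}

/* Returns 1 if a and b convert to the same double value. */
int same_as_double(int a, int b)
{
    volatile double da = (double) a;
    volatile double db = (double) b;
    return da == db;
}
```

Under these assumptions, @code{same_as_float(268435456, 268435457)} is 1 because both integers round to 2^28, whereas @code{same_as_double} with the same arguments is 0.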
This loss of significance does not happen when converting an @code{int} to a @code{double} because type @code{double} has more than 32 bits of mantissa.

@section Floating Point to Integer

When a floating-point value is converted to an integer type, the result is the nearest integer, rounding toward zero. Thus, 1.5 converts to 1, and @minus{}1.5 converts to @minus{}1.

A floating-point value may far exceed the range of an @code{int}. For example, the largest possible @code{float} value is at least 2^64 --- much too large for an @code{int}. When such values are converted to @code{int}, the result is undefined.

[FIXME: this section needs work]

@section Trivia

You can make @code{bool} variables in C++, but not in ordinary C. C9x has provisions for boolean variables. [DVDEUG: Specify!!]

Some compilers (GNU C among them) add the type @samp{long long}, which is most often 64 bits. It is not compatible with ISO C, in which it is a syntax error, but it may prove helpful or necessary during porting projects.

[FIXME: a few compilers have something vaguely similar to _int16, _int32, _int64, and others - are they worth mentioning here ?
DVDEUG: I don't see why; they are extremely non-standard, and not part of GNU-C. ]

@section In Comparison

A function with a void return type is usually called a ``procedure'' in most other languages.

@node variables, pointers, Int Promotion, Top
@chapter variables

@section declaring variables

@example
int x;
@end example

@noindent
declares @code{x} to have type @code{int}.

Every variable used in a C program must be defined, in a @dfn{declaration}, before it is used. The declaration has five purposes:

@enumerate
@item
To give the function or variable a name, so it can be used later.

@item
To describe the data type of the function or variable: for example, whether the value is an integer or a character string. This is done with a @dfn{type specifier} and a @dfn{declarator}.
[@var{declarator} may not be an English word, but it is the standard term.]

@item
To specify how storage for a variable should be allocated. This is done with a @dfn{storage class} (@pxref{Storage Class}).

@item
To specify the @dfn{scope} of the name: for example, whether the name is known in an entire program or only in the current file or function. The storage class fills this role also.

@item
Optionally, to give an initial value. This is done with an @dfn{initializer} (@pxref{Initializers}).
@end enumerate

@section initializing variables

If the variable is static or automatic [FIXME: what other kind of variable is there ?], an initializer may be added, as in

@example
int x = 5;
@end example

@noindent
which is the same as the previous example except that @code{x} is initialized to 5 when its storage is allocated. @xref{Initializers}.

@section Assignment statements (and combinatorial assignment)

[FIXME: huh ?]

@section choosing variable names

start with letter ... number, ... underscore ... ... [FIXME: is there a maximum length ? ] ...

Normal programs cannot use the C keywords for identifiers (variable names, function names, and user-defined type names). I also highly recommend that you do not use these other special reserved words for identifiers:

@table @asis
@item Words that start with underscore
@item C keywords
[FIXME]
@item C++ keywords
asm catch class delete friend inline new operator private protected public template try this virtual throw
@end table

@section Type Conversion

@section Automatic type conversion

@section Type casting

@section Quantization Errors

@node pointers, Creating New Data Types, variables, Top
@chapter pointers

@section the pointer type

A pointer represents the address of a block of memory, together with the data type of the block. Pointers have several uses:

@itemize @bullet
@item
Pointers represent character strings. [FIXME: Is this confusing ?
Is this a good pedagogical viewpoint --- that character strings are directly related to pointers, rather than merely being of type @code{char []} ?] @item A subroutine can be told where to store its output-value by giving it a pointer to the desired place. @xref{Address}, for an example of this use. @item A subroutine can be told which function to call by giving it a pointer to the desired function. @xref{@code{quicksort()}} for an example of this use. @xref{Function Pointers}. @item Trees and linked lists can be created by storing pointers to blocks of data into other blocks of data. @xref{Lists}, for an example of this use. @end itemize @section declaring pointers @cindex pointer types @cindex pointer declarations In C, every expression must have a single clearly defined data type. This includes an expression to refer to the contents of a pointer. C determines the type of the contents by the type of the pointer. Therefore, C has many types of pointers --- one for each type of contents. Each C data type @var{t} has a corresponding pointer type, the type of pointers-to-@var{t}. A value of type pointer-to-@var{t} describes the address of a block of memory whose contents have type @var{t}. To declare a variable @var{v} to have type pointer-to-@var{t}, pretend you are declaring @code{* @var{v}} to have type @var{t}. (This isn't much of a pretense, because @code{* @var{v}} will be an expression of type @var{t}.) @xref{Declarations}. For example, to declare @code{p} as a pointer to a @code{char}, write: @example char* p; @end example [FIXME: Can I delete this paragraph ? Does it say anything that hasn't already been said, and better, by the previous few paragraphs ?] A pointer type is a derived type, and cannot be the basic type of a declaration. To declare a variable with pointer type, you must also specify: ``To what type of thing does this variable point ?''. 
To declare @var{v} with type pointer-to-@var{t}, one must declare the complex declarator @code{* @var{v}} to have base type @var{t}. For example, [/FIXME]

@example
char* string;
@end example

@noindent
declares @code{string} to be a pointer to @code{char}. Here the declarator is @code{* string} --- a complex declarator that expresses the relationship between @code{string}'s type and the declaration's basic type (@code{char}).

To express pointers to types that are not themselves basic, the @code{* @var{var}} construct is nested within other declarator constructs. For example, a pointer to a pointer-to-@code{char} is declared as follows:

@example
char (*(* stringptr));
@end example

In this case, the parentheses are optional. This is exactly equivalent to

@example
char** stringptr;
@end example

A pointer-to-a-pointer is commonly called a @dfn{handle}; in this case, we have a ``handle-to-a-@code{char}''.

If you want a variable named @code{funcptr} to point to a function taking two @code{double} arguments and returning @code{int}, write:

@example
int (*funcptr)(double, double); /* funcptr is a pointer variable */
@end example

@noindent
Here parentheses are required around @code{*funcptr} to specify that the @code{*@var{var}} construct is nested within the function-type construct. [FIXME: David still doesn't know how to parenthesize arbitrary type declarations ... is there a simple rule ?] If you had written

@example
int* funcptr(double, double); /* function prototype */
@end example

@noindent
the compiler would think that you were declaring the function prototype

@example
int* (funcptr(double, double)); /* identical function prototype */
@end example

@noindent
a function whose value is a pointer to an @code{int}. @xref{Precedence}.

You can add an initialization to a pointer declarator for static and automatic variables [FIXME: what other kind of variables is there ?].
For example,

@example
char* string = "Hello";
char **stringptr = &string;
int (*funcptr) (double, double) = &double_divide_and_round;
@end example

@noindent
Note that the initializer is added after the entire declarator, but the value of the initializer must have the same type as the variable being declared --- @emph{not} the basic type of the declaration.

[@var{initializer} is not an English word, but a special term for talking about C programs.]

@subsection The generic pointer type @code{void *}

The type @code{void *} is used, by convention, for the address of a block of memory to which no particular type is ascribed. For example, dynamic memory allocation functions typically return this type.

If a dynamic allocation function is intended for general use, then there is no telling what type of data the caller wants to allocate --- any C data type is possible --- so there is no reason to prefer any one type for the function to return. But the value must have @emph{some} type. @code{void *} is a noncommittal choice.

A pointer of type @code{void *} has no ``contents''; you cannot apply the @samp{*} operator to it. However, you can cast it to any other pointer type, and @emph{then} apply the @samp{*} operator. For example, the following is valid:

@example
char c;
int i;
struct foo s;
void * x;

x = malloc( sizeof(struct foo) );

c = * (char *) x;
i = * (int *) x;
s = * (struct foo *) x;
@end example

[FIXME: Is this really valid ? I've seen some mainframe operating systems, if you try to read data out of an uninitialized block, will core dump your program.]
[FIXME: perhaps a more useful example would be better here.]

@noindent
Here the block of memory that @code{x} points to is examined first as a @code{char}, then as an @code{int}, and finally as a @code{struct foo}.

@code{void *} pointers may not be added or subtracted, but they may be compared like any other pointers.

@section where do pointer values come from ?
Pointer values arise in three ways:

@itemize @bullet
@item
The address operator @samp{&} can make a pointer to any variable, function, array element or structure element. (Even variables of the user-defined data types discussed in the next chapter.)

@item
Dynamic storage allocation reports its results as a pointer to the memory that was allocated.

@item
A null pointer can be made by converting zero (@code{false}) to a pointer type.
@end itemize

@subsection Address of a Variable
@kindex & (unary)
@cindex address

The unary operator @samp{&} returns the @dfn{address} of a variable (or other lvalue). The contents of this pointer are that variable. [FIXME: Does this sentence make sense ? or is this redundant from our discussion of @code{*} ?]. @samp{&} can be applied to both local and global variables.

For example, suppose that @code{read_two()} is a function that reads two integers from an input file. A function can return only one value, so the most convenient way to get two integers back from @code{read_two()} is to provide two pointers as arguments, saying where to put the integers. Then, if we want the integers to be stored in the variables @code{i1} and @code{i2}, we can write:

@example
read_two(&i1, &i2);
@end example

We would use the following declaration for @code{read_two()} (for info on @code{void}, @pxref{Void Functions}):

@example
void read_two(int* i1, int* i2);
@end example

@samp{&} is not limited to variables. It can also be used with structure, union and array elements. For example, suppose that @code{a} is an array of @code{MAX_INTS} integers and we want to fill it up with pairs read with @code{read_two()}. The following code will work:

@example
int a[MAX_INTS];
int i;

for(i = 0; i < MAX_INTS; i += 2)@{
  read_two(&a[i], &a[i + 1]);
@}
@end example

@code{&@var{a}[@var{i}]} means a pointer to element number @var{i} in array @var{a}.
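The same pattern of handing back two results through pointer arguments can be sketched without any file input. The function @code{min_max} and its parameter names are hypothetical, chosen only for illustration; it reports the smallest and largest element of an array through two pointers supplied by the caller.

```c
/* Hypothetical helper (illustration only): store the smallest and
   largest of the first `count` elements of `values` in the objects
   that `min_out` and `max_out` point to.  Assumes count >= 1. */
void min_max(const int *values, int count, int *min_out, int *max_out)
{
    int i;

    *min_out = values[0];
    *max_out = values[0];
    for (i = 1; i < count; i++) {
        if (values[i] < *min_out)
            *min_out = values[i];
        if (values[i] > *max_out)
            *max_out = values[i];
    }
}
```

A caller with two @code{int} variables @code{lo} and @code{hi} would write @code{min_max(data, 4, &lo, &hi);}. The results can equally be stored into array elements, as in @code{min_max(data, 4, &pair[0], &pair[1]);}.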
@subsection Dynamic Allocation (malloc and free)

@dfn{Dynamic allocation} means obtaining a block of memory which is allocated during the execution of the program. When memory is allocated dynamically, its size need not be known in advance. For example, you can write functions to operate on strings with no fixed upper limit on the size of the string.

A dynamically allocated block of memory cannot have a variable name in the ordinary sense. The only way to refer to it is with a pointer.

In the following examples we use @code{malloc}, which is a standard library function for dynamic allocation. It is documented elsewhere (see ...[FIXME]). For now it is enough to know that the argument to @code{malloc} is the number of @code{char}s of storage desired, and its value is a @code{void *} pointer to the block that was allocated (@pxref{Void Pointers}).

For example, suppose we want a character string, but we don't know until run time how long it needs to be. Once our program discovers it needs @code{size} characters, it can allocate the character string dynamically with

@example
string = (char *) malloc (size + 1);
@dots{}
free(string);
@end example

@noindent
where a cast is used to convert the pointer to the correct data type.

A very common error known as a ``memory leak'' happens when you repeatedly ask for more memory, but ``forget'' to give it back when you are done with it. This causes blocks of memory that you no longer need to steadily build up. When the program ends, these blocks are returned to the system; but if your program runs for a long time, eventually there may be no memory left.

If there is not enough memory left to fulfill your request (either your program or other programs in the system have already used it all up), then @code{malloc()} returns a null pointer.

C++ supplements @code{malloc()} and @code{free()} with the much easier to use operators @code{new} and @code{delete}.
@example
string = new char[size+1]; // This only works in C++
@dots{}
delete[] string;
@end example

@subsection Null Pointers

A pointer of any type may have the null value. Whenever a pointer happens to have the null value, we call the pointer a ``@dfn{null pointer}''.

The purpose of a ``@dfn{null pointer}'' is to be a distinguishable value that you can put in a pointer variable to say, ``As of now, this does not point anywhere.''

To create a null pointer, cast the integer zero to the pointer type that you want. For example, @code{(char *) 0} is an expression for a null pointer to a @code{char}. @code{0} is automatically cast to a pointer of the correct type when it is assigned to a pointer variable or compared with a pointer value.

A null pointer has no contents. If a pointer used as the operand of the @samp{*} operator is null, it is an error. On some machines, the results are unpredictable; on others, the result is inevitably a fatal signal (the program will core dump).

If a pointer value may be null, you should check whether this is so before attempting to use its contents. The way to do this is to compare against a null pointer expression or the integer zero. For example,

@example
#include <stdio.h>

void safe_contents(char* p)
@{
  if(0 == p)@{
    /* The compiler automatically casts this `0' to a `(char *)0' */
    printf("this is a null pointer.\n");
  @}else@{
    printf("this pointer points somewhere - it points to \"%s\".\n", p);
  @}
@}

int main(void)
@{
  char buffer[] = "TEST";  /* a modifiable array, not a string literal */
  char * x = buffer;
  safe_contents(x);
  x[0] = 0;
  safe_contents(x);
  x = 0;
  safe_contents(x);
  return 0;
@}
@end example

@noindent
causes this to be printed:

@example
this pointer points somewhere - it points to "TEST".
this pointer points somewhere - it points to "".
this is a null pointer.
@end example

@section what do I do with pointer values once I have them ?

@subsection dereferencing (*)
@cindex contents
@kindex * (unary)

Most of the time, a pointer will actually point to a memory block.
We call the contents of that memory block the @dfn{contents of the pointer}, for short. To get the contents of a pointer, apply the unary @samp{*} operator to the pointer value.

Another operator that is used with pointers to structures is @samp{->}. It selects one structure element of the contents when the contents are a structure. @xref{Structure Pointers}.

... illegal/undefined when the pointer is not pointing at a ``real'' block ... can cause core dump ... most random values, as well as the null value ... ...

@subsection pointers and strings

@subsection pointer arithmetic
@cindex addition (pointer)
@cindex subtraction (pointer)

Two arithmetic operations are defined on pointer types: addition and subtraction. Not all pointer data types support them: pointers to @code{void} do not, and pointers to functions do not. But all other pointer types do.

Addition and subtraction on pointers can also be done with the modifying assignment operators (@pxref{Modify}) and the increment/decrement operators (@pxref{Increment}).

[FIXME: should we mention the type @code{size_t} here ?]

@table @code
@item @var{p} + @var{i}
@itemx @var{i} + @var{p}
The result of adding a pointer @var{p} and an integer @var{i} is a pointer of the same type as @var{p}, but advanced from @var{p} by @var{i} objects --- by @var{i} times the length of the object that @var{p} points to.

This means that if @var{p} points to an element of an array, @code{@var{p}+@var{i}} points @var{i} elements later. Thus,

@example
&a[3] + 2
@end example

@noindent
is equivalent to @code{&a[5]}; it takes the address of element number 3 and then advances it by two elements' worth. This is true whether the elements are @code{char}'s or @code{double}'s or large structures.

In fact, @code{&a[@var{i}]} is equivalent to @code{&a[0] + @var{i}}.

@item @var{p} - @var{i}
Subtracting an integer from a pointer is really nothing new. This expression is equivalent to @code{@var{p} + (- @var{i})}.

@item @var{p1} - @var{p2}
Subtraction is also allowed between two pointers of the same type. The result (an integer) tells how far apart the two pointers lie, measured in units of the objects pointed to. For example, @example &a[5] - &a[3] @end example @noindent is invariably 2. (Note that these pointers may be hundreds of bytes apart if @code{a[]} is a large structure type). The compiler subtracts the addresses, then divides the result by the size of the objects to which they point. The subtraction is legitimate only if this division comes out even; the result is not considered well defined otherwise. When the subtraction is well defined, the result can be added to @var{p2} to give back @var{p1}. @item @var{p}[@var{i}] The array indexing operator, @code{[]}, can be used with a pointer in place of an array. In effect, it regards the pointer as pointing to the first element of an array, and fetches the contents of the @var{i}th element. This expression is equivalent to @example *(@var{p} + @var{i}) @end example @end table @subsection Comparison of Pointers All of the comparison operators can be used on two pointer values of the same type (@pxref{Comparison}). The integer zero may also be used as one of the operands. Zero is converted automatically to a null pointer of the same type as the other operand. @samp{==} and @samp{!=} test whether two pointer values are identical (point to the same place). The order-comparisons @samp{>}, @samp{<}, @samp{>=} and @samp{<=} test pointers according to the order in memory of the places they point to. Smaller addresses are considered ``less''. [FIXME: I (DAV) used a C compiler that put the 20 address bits of its machine into 3 bytes, but @code{int} was merely 16 - does this make the following statement wrong/non-compliant, or was my compiler merely non-compliant ? What about the type @code{size_t} ?] 
Comparing two pointers gives the same result as casting them both to @samp{int} (on some machines) or @samp{unsigned int} (on other machines) and comparing the integers. @xref{Pointer-Integer}.

@subsection pointers, structures and lists

@subsection passing values between functions by pointers

@subsection pointers to functions

@subsection Pointer-Integer Conversion

A cast (@pxref{Casts}) can convert an integer value to a pointer value, or a pointer value to an integer value. The ANSI C standard does not specify exactly what this conversion means.

GNU C keeps the same bit pattern when it converts. As a consequence, the conversion takes no time to execute. Another consequence is that the result of converting any pointer to an integer is the difference in bytes between that pointer and a null pointer. In fact, for a pointer to a @code{char}, converting to @code{int} is the same as subtracting a null pointer.

In GNU C, converting a pointer to an integer [FIXME: what kind of integer ? surely not a @code{short int} ?] and then back to a pointer produces a value equal to the original pointer. The same is true if an integer is converted to a pointer and then back to an integer.

@section Trivia

@node Creating New Data Types, Arithmetic and Bitwise Operators, pointers, Top
@chapter Creating New Data Types

@section Arrays

@subsection declaring and initializing arrays
@cindex array
@cindex index

An @dfn{array} is a sequence of elements, all of the same type (the ``element type''). An individual element is identified by its sequence number (called its @dfn{index}).

An array type is a derived type, and cannot be the basic type of a declaration. To declare a variable with array type, you must always specify: ``What type of things are in this array ?''. You must usually also specify ``How many things are in this array ?'' (the ``@var{length}'' of the array, occasionally called the ``size'' of the array).
To declare an array @var{a} with @var{length} elements of type @var{t}, one must declare the complex declarator @code{@var{a}[@var{length}]} to have type @var{t}. For example,

@example
char buffer[5];
@end example

@noindent
declares an array of 5 @code{char} variables and names the array @code{buffer}. Here the declarator is @code{buffer[5]} --- a complex declarator that expresses the relationship between @code{buffer}'s type and the declaration's basic type (@code{char}).

The length of an array type must be an integer. The ANSI C standard requires the length of an array type to be a positive constant known at compile time. GNU C also allows zero. GNU C also allows the length of an array of storage class @code{auto} to be any expression, which is recomputed each time space for the array is allocated. (If the length is negative, the results are undefined.)

The length of the array may be omitted if an initializer is present because the number of elements in the initializer shows how big the array must be. The length of the array may also be omitted for an external variable. The length of the array may also be omitted in function prototypes:

@example
float average_foot_smelliness(
    int number_of_feet,
    float foot_smelliness[]
    );
@end example

@noindent
Unfortunately, only the length of the @emph{first} dimension of a multidimensional array may be omitted in a function prototype - all the other dimensions must be explicitly set in the function prototype. This makes it impossible to write a function to directly accept a 2D array of arbitrary size. There are various (incompatible) tricks to work around this inadequacy. [FIXME: should I mention a few ?]

[FIXME: Is there any difference between `initialization' v. `initializer' ?]

You can add an initialization to an array declarator for static and automatic arrays. The initializer for an array consists of a pair of braces surrounding a sequence of element expressions.
The first item in the sequence initializes @code{array[0]}, the next initializes @code{array[1]}, etc. Once we run out of element expressions, the rest of the array is initialized to zero. For example,

@example
char * table[3] = @{"small", "medium", "large"@};
int values[3] = @{2, 20, 8192@};
int state[3] = @{0@}; /* zero out the entire array */
@end example

In strict ANSI standard C, the elements of an array initializer must be compile-time constant expressions. GNU C allows arbitrary expressions to initialize elements of automatic arrays; for a static array, since the initialization is done when the program is loaded, the value must still be constant.

Array types in C are unusual because no expression can have an array type. Array types are used only for declaring arrays (variables of array type). Functions cannot be declared to return any array type.

Whenever an array variable name appears as an expression, it is immediately converted to a pointer. That pointer points to the first element of the array. Even indexing works this way.

(The @var{length} of an array is also called the size of the array).

@subsection working with arrays

Referring to an element by its index is called @dfn{indexing}. In C, indexing is represented with square brackets, as in @code{buffer[2]}.

In C, indices always count from zero. The previously defined @code{buffer} contains 5 elements, but 5 would not be a valid index. Any attempt to read or write to @code{buffer[5]} may cause a core dump. The only valid indices to this buffer are 0, 1, 2, 3 and 4 --- in other words, we can now read and write to @code{buffer[0]}, @code{buffer[1]}, @code{buffer[2]}, @code{buffer[3]}, and @code{buffer[4]}.

To express arrays of types that are not themselves basic, the @code{@var{var}[@var{length}]} construct is nested within other declarator constructs.
For example, an array of pointers-to-@code{char} is declared as follows:

@example
char (*(stringptr[512]));
@end example

@noindent
or more simply

@example
char * stringptr[512];
@end example

Similarly,

@example
char *strings[5];
@end example

@noindent
declares @code{strings} as an array of 5 elements, each of which is a @code{char *}. We declare @code{strings[5]} as a pointer to a @code{char}, and that in turn is done by declaring the complex declarator @code{*strings[5]} --- as a @code{char}.

And this declares @code{matrix} as an array of 9 arrays of 10 @code{int}'s.

@example
int matrix[9][10];
@end example

@noindent
Here we pretend to declare @code{matrix[9]} as an array-of-10-@code{int}'s, so @code{matrix} itself must be an array of 9 of those. (As an expression, @code{matrix[0]} would be the first subarray, and @code{matrix[0][9]} would be the last @code{int} in that subarray.)

The length of an array may be omitted when you declare an initialized variable, because then it can be determined from the initializer. @xref{Initializers}.

@section Indexing
@cindex indexing

@dfn{Indexing} an array means referring to one element by specifying its index. In C, indexing is represented with square brackets.

@table @code
@item @var{array}[@var{index}]
This expression represents the value of the @var{index}th element of @var{array}. It is an lvalue; that is to say, it may appear on the left side of an assignment. That is how values are stored in array elements.
@end table

Using @var{array} in an expression converts it immediately to a pointer to the first element of the array. The indexing operation actually operates on this pointer. It can equally well operate on any pointer. It is equivalent to @code{*(@var{array} + @var{index})}.

From this equivalent form, we see that indexing is a symmetrical operation. It follows that you can just as well write @code{@var{index}[@var{array}]}.
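
The equivalence between @code{@var{array}[@var{index}]}, @code{*(@var{array} + @var{index})} and @code{@var{index}[@var{array}]} can be checked directly. The following is only a minimal sketch: the helper names @code{same_element} and @code{check_buffer} are made up for this illustration, and @code{buffer} is given five arbitrary initial values.

```c
#include <assert.h>
#include <stddef.h>

static char buffer[5] = { 'a', 'b', 'c', 'd', 'e' };

/* Hypothetical helper: nonzero when the three spellings of an
   indexing expression all designate the same element. */
static int same_element(char *array, size_t index)
{
    return &array[index] == array + index
        && &array[index] == &index[array];
}

/* Exercise the helper on every valid index of buffer (0 through 4). */
static void check_buffer(void)
{
    size_t i;
    for (i = 0; i < sizeof buffer / sizeof buffer[0]; i++)
        assert(same_element(buffer, i));
}
```

The form @code{2[buffer]} is legal but rarely written on purpose; it appears here only to show that the symmetry is real.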
In other languages, array indexing may check that the index is within the valid range for the array that is in use. In C, this is impossible because the indexing operation actually operates on a pointer to the first array element. This pointer carries no information about the length of the array.

Indices that are nominally out of range are often useful. For example, when indexing a pointer that is not an array, negative indices may be useful. If @var{p} is a pointer to an element in the middle of an array, @code{@var{p}[0]} is that element, @code{@var{p}[1]} is the following element, and @code{@var{p}[-1]} is the previous element.

Indexing by a value that appears ``too large'' is useful also. Often it is necessary to allocate arrays dynamically. Standard C does not define array types with varying length, so the usual practice is to declare the array with length 1 but actually allocate space for as many elements as are needed. It's the programmer's responsibility to keep track of how many elements were actually allocated. Then any index less than that number is valid in fact, even though it exceeds the nominal length with which the array was declared.

@subsection Multi-dimensional arrays

@subsection Trivia

Multi-dimensional arrays are not very easy to use in C. Most people who need them re-implement them ...

The ``element type'' is the data type of all the elements of the array. In C, the ``element type'' of an array may be any type except for function types and @code{void}. For example, arrays of arrays are allowed, and so are arrays of structures and arrays of pointers. Arrays of pointers to functions are sometimes useful.

@section Characters and Strings

@subsection initializing strings

@subsection Null termination

@subsection working with strings

@subsection Trivia

@section Structures
@cindex structure
@cindex element
@cindex member
@cindex field

@subsection Structures @samp{struct}

@comment - didn't I already say this elsewhere ?
A @dfn{structure} is a data object containing several sub-objects, each of a specified name and type. They need not all have the same data type. The sub-objects are called @dfn{elements}, @dfn{members} or @dfn{fields} of the structure.

We also use the term ``element'' for a sub-object of an array. We use the term ``member'' (and ``field'') only to indicate a sub-object of a structure. In an array, a numeric index selects an element. In a structure, a name selects an element.

[FIXME: is ``member'' always an exact synonym for ``field'' ?]

[FIXME: is there a special term that always indicates a sub-object of an array, a term that never indicates a sub-object of a structure ?]

@subsection defining structures
@kindex struct

In C, each kind of structure is a distinct data type and is distinguished by a name called the @dfn{structure tag}. You must define each kind of structure, specifying its structure tag name and the names and types of all the fields. Here is an example:

@example
struct fontunit@{
    char code;
    int height, width, kern;
    int * bitmap;
@};
@end example

@noindent
This defines a structure type that might be used to record the information about one character in a font. The structure tag name is @code{fontunit}. The structure contains five fields: one of type @code{char} named @code{code}; three of type @code{int} named @code{height}, @code{width}, and @code{kern}; and one of type @code{int *} named @code{bitmap}.

Once this type is defined, @code{struct fontunit} behaves as the name of a data type, much like @code{int}. So it can be used to declare variables.

@subsection declaring structure variables

For example

@example
struct fontunit temp;
struct fontunit *nextunit;
@end example

@noindent
declares @code{temp} to be a structure of this type. We say that @code{temp} is ``a @code{struct fontunit}''. This means that @code{temp} is allocated a block of memory that has enough room for all five fields, one after the next.
By contrast, @code{nextunit} is declared as a pointer to a @code{struct fontunit} (@pxref{Pointers}). @code{nextunit} is allocated a block of memory that has enough room for a single pointer.

@subsection Structure Forward References
@cindex forward reference

In fact, it is possible to use the type @code{struct fontunit} for some declarations even before it is defined. Before its definition, the amount of memory space needed to hold it is not known. So you are not allowed to define variables or structure fields of that type. But you can define @emph{pointers} to that type. For example, the following is legitimate:

@example
struct fontunit *nextunit;

struct fontunit @{
    char code;
    int height, width, kern;
    int *bitmap;
@};
@end example

@noindent
The declaration of @code{nextunit} makes a forward reference to a structure type not as yet defined.

After the definition of @code{struct fontunit} is seen, the C compiler fully understands the data type of @code{nextunit}. Until that time, it would be invalid to refer to the contents of @code{nextunit} with @code{*nextunit}. Undefined structure types can validly exist only buried within pointer types.

The forward reference capability is essential for defining recursive pointer-structures. For example,

@example
struct mymove @{
    enum piece_type piece;
    char new_x, new_y;
    struct mymove *alternative;
    struct hismove *next_move;
@};

struct hismove @{
    enum piece_type piece;
    char new_x, new_y;
    struct hismove *alternative;
    struct mymove *next_move;
@};
@end example

@noindent
defines a data structure that might be useful in a game-playing program. Each @code{struct mymove} represents a move that the player might make; it belongs to a chain of alternative moves. It also points to the beginning of a chain of possible moves for the opponent, a chain of @code{struct hismove} structures, one for each move the opponent might then make.
And each @code{struct hismove} structure points to another chain of @code{struct mymove} structures describing the possible responses for the player.

Clearly these two structures could not be defined without a forward reference. But even the @code{struct mymove *alternative;} in the definition of @code{struct mymove} counts as a forward reference.

@subsection Anonymous Structure Types

It is possible to define a structure type that has no structure tag name. This is an anonymous structure type. Because it is impossible to refer to the type again, the definition of the type must appear in a declaration of one or more variables. The variables declared therein are the only ones that can have this anonymous type. For example,

@example
struct @{ int i; double d; @} struc1, struc2;
@end example

@noindent
declares each of the variables @code{struc1} and @code{struc2} to contain an @code{int} and a @code{double}.

This feature in its simplest form is not useful; you could just as well define each field as a separate variable. But in more complex usage it may be useful. For example, it is possible to copy @code{struc1} into @code{struc2} with a single assignment expression. Individual variables for the fields could not be copied as a group in this way.

Also, an array of anonymous structures may be useful. For example,

@example
struct @{ int i; double d; @} a[10];
@end example

@noindent
defines an array of 10 @code{int}-@code{double} pairs.

The analogous feature for unions is very useful. @xref{Anonymous Unions}.

@subsection Structure Redefinition and Scope

Structure tag names obey the same scoping rule as variable names do (@pxref{Scoping}). Each function definition, and each compound statement, forms a scope. The entire source file also forms a scope. A structure tag is in effect only during the innermost scope that contains the structure type definition.
For example, if you define a structure tag name within a function definition, the tag name is defined only within that function. Another structure of the same name could be defined in the next function with no conflict.

Structure tag names and variable names are completely independent. For example, you can have a structure named @code{foo} and a variable, function or type named @code{foo} with no interference. This is actually a common thing to do.

However, structure tags, union tags and enum tags share one name space. Thus, you may not have @code{struct foo} and @code{union foo} defined at the same time in one scope. An attempt to do this will elicit an error message.

@subsection Shadowing Structure Tags
@cindex shadowing

It is invalid to define the same structure tag name twice in one scoping level. But a name defined in an outer scope can be temporarily redefined for an inner scope. This is called @dfn{shadowing} the name's outer definition.

For example, you can define a structure tag outside of function definitions (a definition whose scope is the whole file) and make an overriding definition of the same name inside a function definition. Within that function, the meaning of the structure tag name is the definition given in the function. After the end of the function, that definition ceases to exist and the tag name has its original meaning again. Here is an example:

@example
struct foo @{ int i, j; @};

double
func(double x)
@{
    struct foo @{ double i, k; @};
    struct foo * ptr;
    @dots{}
    return( ptr->i + ptr->k );
@}
/* @i{the first definition of @code{struct foo} is once again in effect} */
@end example

Shadowing is not usually a good idea. It is clearer to pick distinct names for your structure types. Occasionally it may be useful together with macros: a macro that expands into a compound statement might define a structure type for use within that compound statement. Shadowing makes it possible to do this without interference from the surrounding context.
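
As a rough illustration of the macro case just described, here is a sketch of a macro whose compound statement defines its own structure type. Everything named here (@code{SWAP_INTS}, @code{struct pair}, @code{tmp_} and @code{swapped_ok}) is invented for this example; the @code{do @dots{} while (0)} wrapper is simply one common way to make such a macro behave like a single statement.

```c
#include <assert.h>

/* File-scope definition: a pair of doubles. */
struct pair { double x, y; };

/* Hypothetical macro.  Inside its compound statement, struct pair
   is redefined as a pair of ints, shadowing the file-scope
   definition until the closing brace. */
#define SWAP_INTS(a, b)                              \
    do {                                             \
        struct pair { int first, second; } tmp_;     \
        tmp_.first = (a);                            \
        tmp_.second = (b);                           \
        (a) = tmp_.second;                           \
        (b) = tmp_.first;                            \
    } while (0)

static int swapped_ok(void)
{
    struct pair outer = { 1.5, 2.5 };  /* outer meaning: doubles */
    int m = 1, n = 2;

    SWAP_INTS(m, n);                   /* inner meaning used in here */

    /* After the macro, struct pair means the pair of doubles again. */
    assert(outer.x == 1.5 && outer.y == 2.5);
    return m == 2 && n == 1;
}
```

The macro works whether or not the surrounding code has its own @code{struct pair}, which is the point of the technique; a distinct tag name would still be easier to read.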
Because structure tags, union tags and enum tags come from the same name space, you can shadow one kind with another. For example, you can shadow a union tag name with a structure definition:

@example
union converter @{ int i[2]; double d; @};

int
foo ()
@{
    struct converter @{ char* defn; @};
    @dots{}
@}
@end example

@subsection Accessing Structure Elements
@kindex .
@cindex field access

The binary operator @samp{.} refers to a field of a structure. The left operand is an expression whose type must be a structure. The right operand is not an expression. It is the name of one of the fields of that structure. Thus, after the declarations

@example
struct point @{ int x, y; @};
struct point cursor;
struct point * nextpoint = &cursor;
@end example

@noindent
the expression @code{cursor.x} retrieves the @code{x}-field of the structure @code{cursor}. The expression @code{((*nextpoint).x)} retrieves the same value, but we usually abbreviate that as @code{nextpoint->x} (@pxref{Structure Pointers}).

The ``@samp{.} expression'' is an lvalue if the left operand is (@pxref{lvalue}). Being an lvalue means its address can be taken with @samp{&} (@pxref{Address}) and usually that a value can be stored there with an assignment (@pxref{Assignment}).

It is an error to use a left operand whose type is not a structure or union. It is an error to use a field name that does not belong to the particular structure or union type of the left operand.

@subsection Structure Operations

Accessing a field of a structure is not the only way to operate on one. These other operations are also allowed:

@itemize @bullet
@item
Assignment: An entire structure object can be assigned a new value --- the value of another structure of the same type. @xref{Assignment}.

@item
Argument passing: A structure can be passed as an argument to a function. It is essential that the function argument be declared as a structure of the same type. @xref{Calling}.
@item
Returning: A function can be declared to return a structure type. Then a call to that function is an expression of that type.

@item
Address: The address of a structure can be taken with @samp{&} (@pxref{Address}). This address can be used later to access the original structure or its components (@pxref{Structure Pointers}).
@end itemize

There are no constant structure values, and type conversion is not possible for structures.

@subsection Structure Size and Alignment

Each structure type defined has an associated required alignment in memory and a size in bytes.

The alignment required for a structure type is the maximum of the alignments required by the types of the fields of the structure. Each field is also aligned within the structure to its own required alignment. For example, in the structure

@example
struct foo @{ char c; int i; @};
@end example

@noindent
on a machine in which the address of an @code{int} must be a multiple of 4, 3 bytes are unused in between fields @code{c} and @code{i}. If the alignment required for an @code{int} is only 2, just 1 unused byte is needed. In either case, the required alignment of the type @code{struct foo} is the same as that of @code{int} (because that is certainly not less than the required alignment of the other field's type, which is 1 for @code{char}).

The size of the structure is equal to the offset of the last field, plus its size, rounded up to a multiple of the structure's required alignment. For example, in

@example
struct bar @{ int i; char c; @};
@end example

@noindent
the required alignment of @code{struct bar} is the same as that of @code{int}. The total size is thus 4 (the offset of @code{c}) plus 1 (the size of @code{c}), rounded up to a multiple of that alignment. The result is 6 or 8 if the alignment required for an @code{int} is 2 or 4. This means some space is wasted at the end.

[FIXME: this assumes 4 Byte @code{int}s, which is not always true.
Should we qualify this by saying ``on my particular compiler'', generalize to the same level of detail, or just gloss over the whole thing by saying ``padding makes it impossible to know the exact size of a structure'' ?]

You can make a structure smaller by grouping smaller fields together. Consider the following two structure types:

@example
struct a @{ char c1; int i; char c2; @};
struct b @{ char c1; char c2; int i; @};
@end example

@code{struct a} occupies 8 or 12 bytes according to the alignment required by @code{int}, whereas @code{struct b} occupies only 6 or 8. By putting the two @code{char}'s together, @code{struct b} saves an amount equal to the alignment required for an @code{int}.

@subsection Pointers to Structures
@kindex ->

When the type of the contents is a structure type, it is often useful to combine the two operations of taking the contents (a structure) and taking an element of the structure. The binary operator @samp{->} does this.

@table @code
@item @var{ptr}->@var{elementname}
The value of this expression is the element named @var{elementname} in the structure that @var{ptr} points to. @var{ptr} must be an expression whose type is a pointer to a structure type, and that structure type must have an element named @var{elementname}. This expression is equivalent to @code{(*@var{ptr}).@var{elementname}}.
@end table

For example, suppose we represent a complex number as a structure containing a real part and an imaginary part:

@example
struct complex @{ double real; double imag; @};
@end example

Then, given a pointer @var{p} to a complex number, we can calculate the magnitude squared of the complex number as follows:

@example
double
mag_squared(struct complex *p)@{
    return p->real * p->real + p->imag * p->imag;
@}
@end example

@noindent
which is short for

@example
double
mag_squared(struct complex *p)@{
    return( ((*p).real) * ((*p).real) + ((*p).imag) * ((*p).imag) );
@}
@end example

@subsection Lists
@cindex nodes

This example shows how structures and pointers are used to make linked lists. We define a structure to hold one node of a list of @code{int} values. The list is made of @dfn{nodes}; each node contains one @code{int} value and a pointer to the following node:

@comment 1998-05-27:DAV: replaced the term `link' in the original text with the term `node'.
@c Was the original author just confused, or has terminology really changed over the years ?
@c What does the term ``a link of a linked list'' mean these days ?
@c An individual block of the list, or a pointer inside that block ?

@example
struct int_list_node @{
    int value;
    struct int_list_node *next;
@};
@end example

What goes in the @code{next} element of the last node? It cannot be a pointer to the following node, because there is no following node. Instead, we store there a @dfn{null pointer}: a pointer value that is recognizably distinct from any possible following node. The presence of a null pointer indicates that the node is the end of the list. @xref{Null Pointers}.

This function @code{int_list_last()}, when given a pointer to a list (as described above), returns a pointer to the last node of the list.
@example
struct int_list_node *
int_list_last (struct int_list_node *node)@{
    while (node->next != 0)@{
        node = node->next;
    @};
    return(node);
@}
@end example

If in the same program we need other kinds of lists --- lists of @code{double} values or lists of strings, perhaps --- a new structure type must be defined for each kind of list. Although the operation of finding the last node is fundamentally the same for each kind of list, a separate function is needed for each kind since each function applies only to one data type. This inconvenience can be remedied with @dfn{unions}. (C++ creates a totally different remedy.)

@subsection Varying-Size Structures

Often it is useful for dynamically allocated structures to end with an array of varying size. C requires each array to have a fixed size, so we cannot officially do this. What we actually do is define the structure with an array of size zero or one, but then allocate extra space.

As an example, we will define a font consisting of a sequence of the @code{struct fontunit} structures previously defined. Each @code{struct fontunit} describes one character in the font. Each font needs a different number of @code{struct fontunit} units, according to how many characters are defined. The data structure of the font must contain these units and must also say explicitly how many units there are.
Here is how it is done:

@example
struct fontunit @{
    char code;
    int height, width, kern;
    int *bitmap;
@};

struct font @{
    int length;
    struct fontunit contents[0];
@};
@end example

A font containing @var{x} units can then be allocated with

@example
struct font *
allocate_font (int x)
@{
    int nbytes = (sizeof (struct font)
                  + x * sizeof (struct fontunit));
    struct font *thisfont;
    thisfont = (struct font *) malloc (nbytes);
    if(thisfont == 0)@{
        fatal("virtual memory exceeded");
    @}else@{
        thisfont->length = x;
    @};
    return( thisfont );
@}
@end example

@noindent
This example shows how to calculate the size required from the number of elements; it also illustrates the technique for checking that @code{malloc} succeeded.

The length used to allocate the font is stored in the font's @code{length} field. That way, when the font is accessed later, it is possible to tell how many elements there actually are. For example, this function finds the element of @code{font} whose @code{code} field matches @code{thischar}, and returns a pointer to that element. If there is no such element, this function returns a null pointer (because zero converts automatically to a null pointer; @pxref{Null Pointer}).

@smallexample
struct fontunit *
font_find_char(struct font *font, char thischar)
@{
    struct fontunit *nextunit;
    /* Point just past the last element that exists */
    struct fontunit *end = font->contents + font->length;
    /* Look at each element; stop when past the last.*/
    for(nextunit = font->contents; nextunit != end; nextunit++)@{
        if(nextunit->code == thischar)@{
            return nextunit;
        @};
    @};
    return 0;
@}
@end smallexample

@noindent
Note that @code{font->contents} refers to the field @code{contents}. Since that is an array, it is immediately converted to a pointer to its first element. The array officially has no elements, but that is no problem: The pointer points to where the first element would be if there were one.
In fact, there really are elements --- dynamically allocated elements --- and that is exactly where the first one is.

ANSI Standard C does not allow a zero-length array. If code is to operate on other C implementations, the @code{contents} field must be given the length 1 and the allocation code must be changed to match. The change is in the computation of @code{nbytes}. This is the result:

@example
struct font @{
    int length;
    struct fontunit contents[1];
@};

struct font *
allocate_font (int x)
@{
    int nbytes = (sizeof (struct font)
                  + (x - 1) * sizeof (struct fontunit));
    struct font *thisfont;
    thisfont = (struct font *) malloc (nbytes);
    if(thisfont == 0)@{
        fatal("virtual memory exceeded");
    @}else@{
        thisfont->length = x;
    @};
    return thisfont;
@}
@end example

@subsection Bit Fields
@cindex bit field

A @dfn{bit field} is a structure field that is not a full byte or word. You can specify exactly how many bits long it should be. Bit fields allow you to pack information tightly into a small space. They are also useful for describing the pattern of data in a hardware register.

A bit field is defined like any other structure field except that a colon and a bit-width follow the field name. For example, this is a structure, designed for a 16-bit @code{int} compiler, that breaks a 32-bit word down into 8 four-bit fields:

@example
struct half_bytes @{
    unsigned int a : 4, b : 4, c : 4, d : 4;
    unsigned int e : 4, f : 4, g : 4, h : 4;
@};
@end example

@noindent
You might think that this particular application calls for an array of four-bit elements, but unfortunately there is no such thing in the C language. Bit fields in C exist only as structure fields.

Pointers in C can point only to bytes or multi-byte objects. A bit field is not usually composed of entire bytes, so in C pointers to bit fields are not allowed. Use of the address operator @samp{&} on a bit field causes an error message (@pxref{Address}).
However, a bit field can be an lvalue for assignment purposes just like any other structure field (@pxref{Lvalue}).

@subsection Data Types of Bit Fields

The data type of a bit field must be an integer type or an @code{enum} type.

An integer type may be signed or unsigned. This choice makes a big difference. A signed bit field of @var{n} bits has a range of values from @minus{}2^(@var{n}@minus{}1) to 2^(@var{n}@minus{}1) @minus{} 1. An unsigned one of the same number of bits ranges from zero to 2^@var{n} @minus{} 1. For example, an unsigned bit field of 1 bit can be 0 or 1, but a signed one-bit field can only be 0 or @minus{}1.

If an @code{enum} type is used, it is treated as unsigned.

The number of bits may not be longer than the word size; that is, the bit field may not be bigger than an @code{int}.

@subsection Bit Field Machine Dependence

Exactly how the fields are packed into bytes depends on the machine. On machines where the least significant byte of a word is the lowest-numbered, fields are packed in starting from the least significant bit. If the most significant byte is the lowest-numbered, fields are packed in starting from the most significant bit. Thus, the first field in a sequence of consecutive fields always goes into the next available byte.

On some machines, fields are freely split across word boundaries. On others, this is not allowed; then if the next field is too big to fit in what remains of the current word, it starts in the following word.

@subsection Bit Field Gaps

[FIXME: Is this true ?]

You can leave a gap of a specified number of bits by defining a field of that size with no name. For example,

@example
struct foo @{
    unsigned int x : 5;
    unsigned int y : 5;
    unsigned int : 3;
    unsigned int z : 3;
@};
@end example

@noindent
gives 5 bits to @code{x}, 5 to @code{y}, skips the next 3, and gives 3 bits to @code{z}. The total is 16 bits, or two bytes.

A nameless field with ``size'' zero forces the next field to start at the beginning of a word.
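
The behaviour of unsigned bit fields described above can be exercised with a small sketch. The structure below repeats the layout of the gap example under the made-up tag @code{packed}; the helpers @code{store_in_x} and @code{check_neighbours} are likewise hypothetical. It deliberately avoids signed bit fields, whose handling of out-of-range values varies between compilers.

```c
#include <assert.h>

struct packed {
    unsigned int x : 5;
    unsigned int y : 5;
    unsigned int   : 3;   /* nameless field: a 3-bit gap */
    unsigned int z : 3;
};

/* Store value in the 5-bit field x and read it back.  Because x is
   unsigned, the stored value is reduced modulo 2^5 = 32. */
static unsigned int store_in_x(unsigned int value)
{
    struct packed p = { 0 };
    p.x = value;
    return p.x;
}

/* Writing one field leaves the neighbouring fields alone. */
static void check_neighbours(void)
{
    struct packed p = { 0 };
    p.x = 31;
    p.z = 7;
    assert(p.x == 31 && p.y == 0 && p.z == 7);
}
```

Note that nothing here depends on where the compiler places the fields within the word; only the widths matter, which is why the sketch is portable even though the packing order is not.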
@subsection trivia

The definition of the structure also serves as the name of a type. So you can declare variables of that type at the same time as the type is defined. For example, it is legitimate to write

@example
struct fontunit @{
    char code;
    int height, width, kern;
    int *bitmap;
@} *nextunit;
@end example

@noindent
But this is not recommended. If you keep the structure definition separate from variable declarations, it is easier to read.

@subsection Shadowing and Forward References

Shadowing causes problems with forward references. Suppose within the definition of @code{func} above you want to make a forward reference to @code{struct foo} before defining it. A definition of @code{struct foo} is already known, so a declaration such as @code{struct foo *ptr;} would be taken as a use of the existing definition. In order to make a forward reference to the new definition to come, you must first shadow the outer definition with an empty declaration consisting of just @code{struct foo;}.

@example
struct foo @{ int i, j; @};

double
func (double x)
@{
    struct foo;
    struct foo *ptr;
    struct foo @{ double i, k; @};
    @dots{}
    return ptr->i + ptr->k;
@}
@end example

@noindent
Normally, @code{struct foo} would be a name for the existing structure type. However, when it appears in an empty declaration (one that declares no variables) it is given a special meaning. The empty declaration tells the compiler that @code{struct foo} will be redefined in the current scope, and following uses of @code{struct foo} should be taken as forward references to the coming definition.

This ``empty declaration'' feature is supported and described in @code{gcc} because the ANSI C Standard mandates it and you might see programs that use it. Using this feature is a very bad idea.

@section Unions @samp{union}

@subsection Unions
@cindex union
@kindex union

@dfn{Unions} are a kind of type that allow one block of memory to be regarded as any of several other types.
Each union type is defined by specifying the alternative types that are its members. Unions in C are much like structures. The description of unions here assumes that you understand structures. @xref{Structures}. @subsection defining unions A union definition looks like a structure definition except that the keyword @code{union} replaces @code{struct} (@pxref{Structure Def}). Union tag names and structure tag names come from the same name space. This means that, in any one name scope, one particular name may be the name of either a structure type or a union type, but not both. If you define @code{union hack}, you may not also use @code{struct hack}. @subsection accessing unions Union components are accessed using the @samp{.} and @samp{->} operators, just like structure components (@pxref{Structure Ref}). They can be assigned, passed as arguments and returned just like structures (@code{Structure Operations}). There are no constant union values, and type conversion is not possible for unions. @subsection When to use a union There are only 2 reasons to ever use a union: (a) to save space, and (b) to interpret a single piece of hardware multiple ways. The "endian problem" never happens if you don't use unions. @subsection Union Members Here is a sample union definition: @example union element @{ int i; char *s; struct window *w; @}; union element temp; @end example @noindent This union has three members, of three different types. An object of this union type, such as the variable @code{temp} has enough space to hold either an @code{int}, a @code{char *} or a @code{struct window *}, but not two at once. The three members of the union variable @code{temp} can be thought of as three variables of different types that are stored in the same space. The value of the union is valid only for the member that was last used to store in it. 
For example, if you store an @code{int} into @code{temp.i}, you can refer to @code{temp.i} later to get the same @code{int} value, but @code{temp.s} and @code{temp.w} are invalid and their values are undefined. If you later store a @code{char *} value into @code{temp.s}, you can access @code{temp.s} again to recover the same value, but @code{temp.i} is now undefined. The size of the union is equal to the largest of the sizes of its members. Contrast this with a structure that has the same members: @example struct elements @{ int i; char *s; struct window *w; @}; @end example @noindent This structure has enough space for an @code{int} @emph{and} two pointers side-by-side. All three can be stored in it independently. The size of this structure is (at least) the sum of the sizes of the members. @subsection Alternative-use Storage The example above for list structure (@pxref{Lists}) shows that you need a new structure type for each kind of data you want to put into lists. When you have one type of structure to represent a list of @code{int}'s, you need another structure type for a list of @code{char *} strings, and yet another for a list of @code{struct window *}'s. What if you want to have one list containing @code{int}'s, @code{char *}'s and @code{struct window *}'s, in any random order? This can be done with the union defined in the previous section. Here is the definition again: @example union element @{ int i; char *s; struct window *w; @}; @end example Now we can make a list of @code{union element} values just like a list of anything else: @example struct alt_list_node @{ union element value; struct alt_list_node *next; @}; struct alt_list_node *p; @end example If @code{p} points to a node of a list of this kind, you can extract the value as an @code{int} with @code{p->value.i}, or extract it as a @code{struct window *} with @code{p->value.w}. This is because @code{p->value} by itself is a value of type @code{union element}. 
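
A minimal sketch of walking such a list follows. It repeats the two definitions from above so that it stands alone. The function name @code{sum_as_ints} is hypothetical, and the code simply assumes that every node holds an @code{int}.

```c
#include <stddef.h>

struct window;                      /* only pointers to it are used here */

union element {
    int i;
    char *s;
    struct window *w;
};

struct alt_list_node {
    union element value;
    struct alt_list_node *next;
};

/* Add up the values in a list, reading every node as an int.
   Nothing stored in a node confirms that this reading is right. */
int sum_as_ints(const struct alt_list_node *p)
{
    int total = 0;

    for (; p != NULL; p = p->next)
        total += p->value.i;
    return total;
}
```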
But this is not a good solution to the problem. Nothing in the list node tells you whether the value is supposed to be interpreted as an @code{int}, a @code{char *} or a @code{struct window *}. If you refer to the value the wrong way, you will not get an error message, just bizarre results.

This problem can be avoided by adding a @dfn{type-code} field to the node structure, making it a ``self-describing'' structure.
@ifinfo
See the next node.
@end ifinfo

@subsection Unions and Type-code Fields

In the simple list-of-union, it is impossible to tell just by looking at a node whether it contains an @code{int}, a @code{char *} or a @code{struct window *}. So the simple list-of-union structure is useful only when there is some other way for the program to know how each node should be used.

Most of the time, it is better to add -- to every node -- information about how to interpret the node's value. This is done with an additional field in the node structure, called a ``type code'' field because its value informs us of the type of value in the union.

An enumeration type is often just the right thing for this purpose. Here is the modified structure definition:

@example
struct alt_list_node
@{
  enum @{ IS_INT, IS_STRING, IS_WINDOW @} code;
  union element value;
  struct alt_list_node *next;
@};
@end example

Then we establish a convention that when the @code{value} field is properly interpreted as an @code{int}, the value @code{IS_INT} is stored in the @code{code} field, and so on.

The C language does not enforce this convention. It is still possible to disregard the convention and do

@example
node->code = IS_INT;
node->value.s = "foo";
@end example

@noindent
But obeying the convention is not hard, and as long as that is done, the meaning of each element of the list is self-evident.

@subsection Unions for Type Puns

Would you like to know what the bit pattern of a @code{char}-pointer really looks like? Define a union containing types @code{char *} and @code{int} and see.
Here is how:

@example
int
ptr_as_int (char *p)
@{
  union @{ char *p; int i; @} conv;

  conv.p = p;
  return conv.i;
@}
@end example

@noindent
Here the data is loaded into the union variable @code{conv} as a pointer, then examined as an integer.

An example actually used in the GNU C compiler involves storing a @code{double} in a data structure composed of an array of @code{int}s. Two @code{int}'s provide enough room for the bits of the @code{double}, but we need a way to separate it into two words. The following union was used:

@example
union converter
@{
  int i[2];
  double d;
@};
@end example

@noindent
With this union it is possible to take a @code{double} apart and store it into two @code{int}'s, and later reverse the transformation.

Here is a function to take a @code{double} apart, storing the two halves into two locations specified by giving pointers to them:

@example
void
dissect_double (double d, int *l, int *h)
@{
  union converter conv;

  conv.d = d;
  *l = conv.i[0];
  *h = conv.i[1];
@}
@end example

Here is how to reassemble the two halves into an identical @code{double}:

@example
double
reconstruct_double (int l, int h)
@{
  union converter conv;

  conv.i[0] = l;
  conv.i[1] = h;
  return conv.d;
@}
@end example

@subsection Union Member Addresses

In general, the members of a union share a common starting address. The address of any member of the union is equal to that of the union (though their types are different, so in order to compare them in C you must cast one to the other's type). For example, in

@example
union test
@{
  int i;
  char c;
@} var;

int
check_it ()
@{
  return ((int *) &var) == (&var.i);
@}
@end example

@noindent
the function @code{check_it} is guaranteed to return 1.

@subsection Run-time Endianness Test

A union of a @code{char} and an @code{int} can be used to tell how the bytes in an @code{int} are numbered on the machine you are using. This example shows how.
@example
#include <stdio.h>

void
endian (void)
@{
  union @{ int i; char c; @} temp;

  temp.i = 0;
  temp.c = 1;   /* set only the lowest-addressed byte */
  if (temp.i == 1) @{
    printf("Little-endian\n");
  @} else if (temp.i == 1 << 24) @{
    printf("32-bit big-endian\n");
  @} else @{
    printf("Something strange\n");
  @}
@}
@end example

@subsection Unions of Structures

Structure types can be used in unions as any other types can. When this is done, the structure fields are obtained from the union with two stages of the @samp{.} operator. The size of the union is, as always, the maximum of the sizes of its members.

A common situation is that a union has several members that are different types of structures. Often two of the structure types start with similar fields, as shown here:

@example
struct type1
@{
  int x;
  char b;
  char *name;
  int size;
@};

struct type2
@{
  int x;
  char c;
  char *name;
  char text[100];
@};

union u
@{
  struct type1 t1;
  struct type2 t2;
@};

union u u1;
@end example

Here both @code{struct type1} and @code{struct type2} start with the sequence @code{int}, @code{char}, @code{char *}. (The field names are not the same, but that is not important.) In this case, it is guaranteed that you will see the same values for those three initial fields regardless of whether you access them through @code{struct type1} or @code{struct type2}. In other words, @code{u1.t1.x} and @code{u1.t2.x} are the @emph{same object}; and @code{u1.t1.b} and @code{u1.t2.c} are also the @emph{same object}.

This is a consequence of the fact that the compiler lays out structure fields in the order you write them, and their size and spacing depend only on their data types. If the first @var{n} fields of two structure types match in their types, the layout of those fields must also match.

[FIXME: this next section may be totally bogus]
The code in the previous section can create very confusing source code. Here is an alternate way of specifying exactly the same layout in memory that is far easier to understand. This demonstrates that structures can contain unions.
[FIXME: make the reference over in anonymous structure types point here] @example struct type1 @{ int size; @}; struct type2 @{ char text[100]; @}; struct u @{ int x; char b; char *name; union @{ struct type1 t1; struct type2 t2; @}; @}; struct u u1; @end example The memory layout of this @code{struct u u1} is identical to the previous @code{union u u1}. The code that uses @code{u1} is slightly simplified. All references to @code{u1.t1.x} or to @code{u1.t2.x} must now be replaced with @code{u1.x}, which makes it obvious that they were really referring to the same thing. Other bits of the code that refer to @code{u1.t1.size} or @code{u1.t2.text} still access the same area of memory they did before. [end possibly bogus section] @subsection Trivia @section enumerated types (enum) Enumeration Types @section Renaming (typedef) @section Trivia @node Arithmetic and Bitwise Operators, bool type, Creating New Data Types, Top @chapter Arithmetic and Bitwise Operators @section Arithmetic operators (+ - * / %) @cindex addition (integer) @cindex subtraction (integer) @cindex multiplication (integer) @cindex division (integer) @cindex quotient (integer) @cindex remainder @cindex common type The type of the result depends on the types of the operands. First, if either operand has type @code{short} or @code{char} (either signed or unsigned), it is converted to @code{int} by default promotion. Then the @dfn{common type} of the operands is determined. This is either @code{long unsigned int}, @code{long int}, @code{unsigned int} or @code{int}. The common type is long if either operand is long; it is unsigned if either operand is unsigned. If one operand has an unsigned type and the other has a signed type, the one with the signed type is converted to unsigned and the arithmetic is done on unsigned values. If the signed operand had a negative value, the results may be counterintuitive, because when this value is converted to an unsigned type, it becomes a large positive number. 
Small negative numbers become positive numbers near the top of the range of possible values.

For positive numbers, the result of an arithmetic operation is always the same regardless of whether the type of the numbers is signed or unsigned, except when the result is so large that it overflows the range of the type.

@kindex +
@kindex -
@kindex * (binary)
@kindex /
@kindex %

@table @samp
@item @var{intexp} + @var{intexp}
Addition of two integer expressions.

@item @var{intexp} @minus{} @var{intexp}
Subtraction of two integer expressions.

@item @minus{} @var{intexp}
Negation of an integer expression. Equivalent to @code{0 - @var{intexp}}.

@item @var{intexp} * @var{intexp}
Multiplication of two integer expressions.

@item @var{a} / @var{b}
Quotient of two integer expressions.

If the exact quotient is not an integer, it is rounded toward zero to make an integer.

If @var{b} is negative, the quotient is minus the result of dividing by @code{-@var{b}}. (The handling of negative operands may be different in other implementations of C.)

If @var{b} is zero, the division operation raises a signal. It is possible to write a handler for this signal, but usually it is more convenient to test whether the divisor is zero before you do the division.

@item @var{a} % @var{b}
Remainder of two integer expressions.

The remainder is compatible with the quotient: (@var{a} / @var{b}) * @var{b} + @var{a} % @var{b} is equal to @var{a}.

If @var{b} is zero, the remainder operation raises a signal. It is possible to write a handler for this signal, but usually it is more convenient to test whether the divisor is zero before you do the division.
@end table

@section increment and decrement (++ --)

@section conversion of types (cast)

@section internal representation of numbers in general

@section bitwise operators (& | ^ ~ >> <<)
@cindex bitwise operations
@cindex boolean operations
@cindex logical operations

[FIXME: we need to use terminology that makes it hard to confuse ``bitwise'' (lots of bits all being operated on at once in a single value) vs. ``boolean'' (a value containing a single bit). Perhaps ``bitwise'' vs. ``logical'' ?]

The @dfn{bitwise} operations combine two integers bit by bit. This means that the operands are considered as binary numbers and lined up. The least significant bits (1's bits) of the operands are combined to make the least significant bit of the result; the 2's bits of the operands are combined to make the 2's bit of the result; the 4's bits are combined to make the 4's bit of the result; and so on.

The operands are always treated as unsigned in these operations even if they have signed types. Operands of type @code{short} or @code{char} are extended to @samp{int} before the operation is done, so on a machine where an @code{int} is 32 bits wide there are always at least 32 bits to operate on in each operand.

[FIXME: a picture or some ASCII Art would make this much easier to visualize. Remember that a @code{int} is not always 32 bits; and sometimes a @code{long int} can be used in a bitwise operation - right ?]

Bitwise operations are also called @dfn{boolean} operations because they are modeled on the laws of boolean algebra, and @dfn{logical} operations because ``logical'' is traditionally used for any operation that considers an integer as a sequence of bits.[FIXME: Wrong.]

Although the numbers are considered unsigned in order to perform the operation, the data type of the result is not always unsigned. It follows the same rule used for arithmetic operations: it is long if either operand is long; it is unsigned if either operand is unsigned.

Here are precise definitions of all the bitwise operations.
Bit @var{n} of an unsigned integer @var{a} is @code{(@var{a} >> @var{n}) % 2} (where @samp{>>} stands for right-shift; @pxref{Shifting}). Bit @var{n} of a signed integer is computed by first converting the integer to unsigned.

@kindex & (binary)
@kindex |
@kindex ^
@kindex ~

@table @samp
@item @var{a} & @var{b}
Bitwise logical-and. Bit @var{n} of the result is 1 if bit @var{n} in both operands is 1.

@item @var{a} | @var{b}
Bitwise logical-or. Bit @var{n} of the result is 1 if bit @var{n} in either operand is 1.

@item @var{a} ^ @var{b}
Bitwise logical-exclusive-or. Bit @var{n} of the result is 1 if bit @var{n} is 1 in one of the operands and 0 in the other.

@item ~ @var{a}
Bitwise logical-not. Bit @var{n} of the result is 1 if bit @var{n} of @var{a} is 0.
@end table

@section Shift Operators
@cindex shifting
@kindex <<
@kindex >>

@dfn{Shifting} an integer is defined in terms of the binary representation of the integer. Shifting left means appending binary zeros to the number's representation; this has the effect of multiplying by a power of 2. (If the number is large enough, the most significant digits can be lost by overflow in the process.)

Shifting right means discarding binary digits from the right of the number. This has the effect of dividing by a power of 2 and rounding down (toward negative infinity).

The result of shifting right has the same sign as the operand. This means that the same bit-pattern for the operand produces a different result depending on whether it has a signed or unsigned type. The signed integer @minus{}4 and the unsigned integer @code{0xfffffffc} have the same bit pattern, but when shifted right one place they produce the results @minus{}2 and @code{0x7ffffffe}. These two numbers differ in the highest bit.

When applied to unsigned values, the @code{>>} operator uses ``logical'' right shifting --- it brings zeros into the most significant bits of the result. When applied to signed values, the @code{>>} operator uses ``@dfn{arithmetic}'' right shifting.
This brings zeros into the most significant bits for a positive number, and ones into the most significant bits for a negative number.

@table @code
@item @var{a} << @var{count}
Shift @var{a} left by @var{count} places. The result is undefined if @var{count} is negative or if it is greater than or equal to the width of @var{a} in bits (32 for a 32-bit @code{int}).

@item @var{a} >> @var{count}
Shift @var{a} right by @var{count} places. The result is undefined if @var{count} is negative or if it is greater than or equal to the width of @var{a} in bits (32 for a 32-bit @code{int}).
@end table

Here are some examples of shifting, with the values that result.

@example
1<<0 == 1
1<<5 == 32
1<<31 == 0x80000000
5<<1 == 10
(-5)<<1 == -10
3>>1 == 1
4>>1 == 2
5>>1 == 2
(-3)>>1 == -2 == 0xfffffffe
(-4)>>1 == -2
(-5)>>1 == -3 == 0xfffffffd
((unsigned)-3) >> 1 == 0x7ffffffe
((unsigned)-4) >> 1 == 0x7ffffffe
((unsigned)-5) >> 1 == 0x7ffffffd
@end example

The ANSI C standard does not specify what happens when a negative number is shifted. In GNU C, we have chosen the meaning we think is most useful.

@section Floating Point Arithmetic
@cindex arithmetic (floating)
@cindex common type

The four basic arithmetic operators, @samp{+}, @samp{-}, @samp{*} and @samp{/}, are allowed on floating point operands as well as integer operands. These are the only binary arithmetic operators allowed on floating point operands. The remainder operation (@samp{%}) is not meaningful for floating point operands because division of floating point numbers does not round the result to an integer.

When the result of arithmetic is outside the range of possible values of its type, this is called @dfn{floating point overflow}. The result of the operation is undefined when overflow happens.

When dividing by a negative number @var{b}, the quotient is minus the result of dividing by @minus{}@var{b}.

Division by zero has undefined effects, possibly crashing the program. You should test whether the divisor is zero before dividing.
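
The advice to test the divisor first can be sketched as a small wrapper. The function name @code{checked_divide} and its convention of returning 1 for success and 0 for failure are assumptions made for this example, not something the language provides.

```c
#include <stddef.h>

/* Hypothetical helper: store a / b in *quotient and return 1,
   or return 0 without dividing when b is zero. */
int checked_divide(double a, double b, double *quotient)
{
    if (quotient == NULL || b == 0.0)
        return 0;           /* refuse to divide */
    *quotient = a / b;
    return 1;
}
```

A caller checks the return value before using the quotient; when the function returns 0, the location that @code{quotient} points to is left untouched.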
When operands of two different floating-point types are combined with an arithmetic operation, the operand of narrower type is converted to the other (wider) operand's type before the operation is performed. The types in order of increasing width are @code{float}, @code{double} and @code{long double}. Floating point and integer operands may be mixed. When this is done, the integer operand is converted to floating point, in the same type as the other operand; then the arithmetic operation is done in that type. @section Trivia @node bool type, expressions, Arithmetic and Bitwise Operators, Top @chapter working with the bool type (true, false, and logical operators) @section @code{bool} values [FIXME: perhaps it would be easier to explain this ``as if'' there were a @code{bool} type - i.e., from the C++ perspective. People who knew nothing of type @code{bool} wrote many C compilers compliant with the ANSI standard. However, many programmers argue that the @code{bool} type is implicit in the C language. A C program compiled on a C++ compiler may create an executable identical to that generated by a C compiler. But the C++ perspective is to say that operators like `<' and `>' return a value of type @code{bool}, and the conditional expression in a if() is cast to a @code{bool}.] A @code{bool} value is either @code{true} or @code{false}. A truth value is either ``true'' or ``false''. C does not have a distinct data type for truth values, as some languages do. (For example, type ``@code{bool}'' in C++). Instead, any numeric type or pointer type can be used as a truth value. A zero value represents ``false'', and any nonzero value means ``true''. Most of the time, it is wise to use only type @code{int} for truth values and to use only the value 1 to mean ``true''. 
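
One way to follow that advice is to reduce any truth value to exactly 0 or 1 before storing it. The helper names @code{truth_of_int} and @code{truth_of_pointer} below are hypothetical; the sketch relies only on the rule that a comparison yields an @code{int} that is either 0 or 1.

```c
#include <stddef.h>

/* Hypothetical helpers: reduce any truth value to exactly 0 or 1. */
int truth_of_int(int x)
{
    return x != 0;          /* any nonzero value becomes 1 */
}

int truth_of_pointer(const char *p)
{
    return p != NULL;       /* a null pointer counts as false */
}
```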
Although there is no special type for truth values, there are special operators in C for creating truth values (comparison operators), combining truth values (truth operators) and using them (conditional expressions and conditional statements).

The @var{continue-condition} must have a data type which can be compared against the constant zero, which means an integer zero, a floating point zero, or a null pointer. @xref{branching}. @xref{looping}.

@section Comparison (Relational operators: > >= < <= == !=)
@cindex comparison

Comparison operators test for equality or ordering of either numbers or pointers. The result of a comparison is an @code{int} which is either 0 or 1. Usually this value is used as a truth value.

@table @code
@item @var{a} == @var{b}
@item @var{a} != @var{b}
@item @var{a} < @var{b}
@item @var{a} > @var{b}
@item @var{a} <= @var{b}
@item @var{a} >= @var{b}
@end table

@section Logical operators (&& || !)

The @dfn{truth operators} combine truth values into other truth values. There are three such operators: ``not true'', ``both true'' and ``either one true''.

The operands of these operators are used only as truth values: their values are checked only for nonzeroness. The operands may have any type that is acceptable as a truth value, but the result always has type @code{int}.

@kindex !
@kindex &&
@kindex ||

@table @samp
@item ! @var{truthexp}
Not true. Value is 1 if @var{truthexp} equals 0; 0 otherwise. If @var{truthexp} represents a condition, @code{! @var{truthexp}} represents the contrary condition.

@item @var{truthexp1} && @var{truthexp2}
``And'' for truth values. Value is 1 if both @var{truthexp1} and @var{truthexp2} have nonzero values. If @var{truthexp1} is zero, @var{truthexp2} is not computed at all; its side effects do not take place.

@item @var{truthexp1} || @var{truthexp2}
``Or'' for truth values. Value is 1 if either @var{truthexp1} or @var{truthexp2} has a nonzero value.
If @var{truthexp1} is nonzero, @var{truthexp2} is not computed at all; its side effects do not take place. @end table The operators @samp{&&} and @samp{||} specify @dfn{conditional execution}. This means that, depending on the value of the first operand, the second operand may or may not be executed. This makes a difference when the second operand has side effects. Consider by contrast @code{0 * (x = 4)}. Its value is always 0, but it has the effect of assigning the value 4 to the variable @code{x}. Here the sub-expression @code{x = 4} is executed unconditionally, even in cases where its value is known in advance to be irrelevant. Most operators in C work this way; all of their operands are executed unconditionally. In addition, the order in which the operands are executed is not specified. The operators @samp{&&} and @samp{||} are unusual: their operands are executed in left-to-right order, and if the ultimate result is determined after the first operand, then the second operand is skipped entirely. Thus, in @code{0 && (x = 4)}, since the first operand makes it certain that the value is zero, the second operand is not computed and @code{x} is not changed. In @code{y && (x = 4)}, @code{x} is changed only if @code{y} is nonzero. Only one other C expression, the conditional expression, can omit execution of some of its operands (@pxref{Conditional Expr}). @section Conditional Expressions @cindex conditional expression @kindex ? : A conditional expression lets you select one of two expressions based on a truth value expression. It looks like this: @example @var{truthexp} ? @var{val1} : @var{val2} @end example @var{truthexp} must be a number or a pointer. If @var{truthexp} is nonzero, @var{val1} is computed and its value is used. Otherwise, @var{val2} is computed and its value is used. Exactly one of @var{val1} and @var{val2} is computed. If @var{val1} and @var{val2} have the same type, that may be any type, and the conditional expression has the same type. 
(Array and function types are excluded: if either @var{val1} or @var{val2} is an array or function then it is converted to a pointer ``before'' the conditional expression ``sees'' it.)

In addition, the following cases of different types are allowed:

@itemize @bullet
@item
Both types are numbers. In this case, the type of the conditional expression is determined as if the two numbers were being added together.

@item
One operand is void. Then the other operand may have any type, but the result is void.

@item
One operand is a pointer and the other operand is zero. Then the value is a pointer of the same type.
@end itemize

In all of these cases, either @var{val1} or @var{val2}, whichever is selected, is converted to the appropriate result type.

Here are some examples of conditional expressions:

@example
(3 > 1) ? 5 : 2    => 5
(3 < 1) ? 5 : 2    => 2
*p == 0 ? "end of string" : 0
@end example

The last example has type @code{char *} and its value is either the constant @code{"end of string"} or a null pointer.

@section Trivia

Overwhelmingly used in if() statements.

``Boolean operators'', ``Relational Operators'', ``Truth Operators'', and ``Logical Operators'' are different ways of saying the same thing.

A ``boolean variable'' can either be true or false; these are often called ``flags''.

Many people think that the keywords defined in @code{#include <iso646.h>} are much easier to read. This ISO standard header defines the keywords @code{and and_eq bitand bitor compl not or or_eq xor xor_eq not_eq} to be exactly equivalent to @code{&& &= & | ~ ! || |= ^ ^= !=}
[FIXME: is this true ? is there no bitor_eq ?]

@node expressions, = and side effects, bool type, Top
@chapter expressions
@vindex expressions
@vindex operator precedence
@vindex precedence

@section Precedence

[FIXME](Table to be included when I know how to do tables in texinfo.)
@section assigning a value to an expression (var = X) @section Trivia According to ANSI, there is no precedence in C; instead, there are many types of expressions. Although their terminology is very different, the net effect is identical to the (hopefully easier to understand) ``associativity and precedence system'' terminology in this reference manual. @node = and side effects, evaluation order, expressions, Top @chapter assignments and side effects @section Simple Assignment @kindex = @cindex lvalue @cindex assignment Simple assignment is done with the operator @samp{=}. On the left of the @samp{=} is a place to store a value; this can be a variable, a structure element, an array element, or the place a pointer points. Expressions that are allowed on the left of an @samp{=} are called @dfn{lvalues} (left-side values). On the right of the @samp{=} is an expression for the value to be stored. Let's call them @var{l} and @var{r}. If @var{l} and @var{r} have the same type, it may be any type except for void, array and function types. (If @var{r} is an array or function then it is converted automatically to a pointer before the assignment ``sees'' it.) In addition, the following cases of mixed types are allowed: @itemize @bullet @item Both @var{l} and @var{r} have numeric types. Then @var{r} is automatically converted to @var{l}'s type and the result is stored in @var{l}. @item @var{l} has a pointer type and @var{r} is the integer 0. Then a null pointer is stored in @var{l}. @end itemize An assignment is an expression, and therefore has a value. This value is the altered value of @var{l}. However, the expression is not a lvalue; it may not be used as the operand of unary @samp{&} or as the left side of another assignment. @section Modifying Assignment @cindex modifying assignment The @dfn{modifying assignment} operators abbreviate an arithmetic operation combined with an assignment. Any arithmetic operator can be used. 
These operators do not add any power to the language, but they are often convenient. Let's take the most commonly used modifying assignment operator, @samp{+=}, as an example. @code{@var{l} += @var{r}} is an abbreviation for @code{@var{l} = @var{l} + @var{r}}. It means that the value of @var{r} is added into @var{l}, not simply stored into @var{l}. Like simple assignments, modifying assignments are expressions and have values. The value of any assignment is the new value of @var{l}. However, the expression is not an lvalue; it may not be used as the operand of unary @samp{&} or as the left side of another assignment. The rules for the types allowed in modifying assignments follow from the rules for types in simple assignments and in arithmetic operators. It must be possible to combine @var{l} and @var{r} with the arithmetic operator used, and the result must be able to be stored into @var{l}. The following modifying assignment operators are allowed with @var{l} and @var{r} having any numeric types, and are also allowed if @var{l} is a pointer type and @var{r} is an integer. @table @code @item @var{l} += @var{r} This expression increments @var{l} by the addition of @var{r}. [FIXME: way too much passive voice around here.] @item @var{l} -= @var{r} The value of @var{l} is decremented by the subtraction of @var{r}. @end table The following modifying assignment operators are allowed whenever @var{l} and @var{r} both have numeric types (either integer or floating). It is not necessary for @var{l} and @var{r} to have the same type; in fact, one may be integer and the other floating. @table @code @item @var{l} *= @var{r} The value of @var{l} is altered by multiplication by @var{r}. @item @var{l} /= @var{r} The value of @var{l} is altered by division by @var{r}. @end table The following modifying assignment operators are allowed whenever @var{l} and @var{r} both have integer types. They need not have the same types. 
@table @code @item @var{l} %= @var{r} The value of @var{l} is changed to its remainder in division by @var{r}. @item @var{l} &= @var{r} The value of @var{l} is altered by logical-and with @var{r}. This clears all bits in @var{l} that are clear in @var{r}. @xref{Bitwise}. For example, @code{x &= ~4} clears the 4's bit in @code{x}, leaving all other bits in @code{x} unchanged. @item @var{l} |= @var{r} The value of @var{l} is altered by logical-or with @var{r}. This sets all bits in @var{l} that are set in @var{r}. @xref{Bitwise}. For example, @code{x |= 4} sets the 4's bit in @code{x}, leaving all other bits in @code{x} unchanged. @item @var{l} ^= @var{r} The value of @var{l} is altered by logical-exclusive-or with @var{r}. This complements all bits in @var{l} that are set in @var{r}. @xref{Bitwise}. For example, @code{x ^= 4} complements the 4's bit in @code{x}, leaving all other bits in @code{x} unchanged. @item @var{l} <<= @var{r} The value of @var{l} is altered by shifting it @var{r} places to the left. The usual rules for shift operators apply. @xref{Shift}. @code{x <<= 3} has the same effect as @code{x *= 8} (but the former is restricted to integers, while the latter is permitted for floating point numbers as well). @item @var{l} >>= @var{r} The value of @var{l} is altered by shifting it @var{r} places to the right. The usual rules for shift operators apply: for example, the sign of the value of @var{l} is preserved. @xref{Shift}. @code{x >>= 3} is @emph{not} equivalent to @code{x /= 8}; they both divide by 8, but they round the result differently when @code{x} is negative. @end table @section Increment Operators @kindex ++ @kindex -- @cindex increment operators The operation of adding or subtracting 1 is so common that there are two special operators for this: @samp{++} for adding 1 and @samp{--} for subtracting 1. These operators may be used on either numbers or pointers. 
(Recall that adding 1 to a pointer actually increments the pointer address by the size of the object it points to; @pxref{Pointer Add}).

When used as prefix operators, preceding a modifiable lvalue, these operators are equivalent to modifying assignments. @code{++@var{l}} is equivalent to @code{(@var{l} += 1)} and @code{--@var{l}} is equivalent to @code{(@var{l} -= 1)}. Aside from syntax, there is no difference. @xref{Modifying Assignment}.

The increment and decrement operators may also be used as @dfn{postfix} operators, following a modifiable lvalue. Prefix or postfix, the effect on the lvalue is the same, but the value of the expression is different. In the prefix case, the value of the expression is the altered value of the lvalue. In the postfix case, the value of the expression is the @emph{original} value of the lvalue: the value it had before being incremented or decremented.

When the increment operator comes before the lvalue, it means that the value is incremented before you see its value. When the increment operator comes after, the lvalue is incremented after you see its value.

Here is an example. Suppose that @code{x} contains 5. Then after

@example
y = x++;
z = ++x;
@end example

@noindent
@code{y} will contain 5, and both @code{z} and @code{x} will contain 7.

@strong{Note: it is dangerous to use increment operators more than once on the same lvalue in a statement. The C language does not specify when the incrementation will take place.} @xref{Order}.

Referring to the incremented lvalue elsewhere in the expression is also dangerous: for example,

@example
x = 5;
y = x + x++;
@end example

@noindent
@c might set @code{y} to either 10 or 11.
@c Wrong!  It has no defined effects; it would be perfectly legal to set y = 15 or 43 or -12
has undefined behavior; @code{y} will most likely end up as 10 or 11, but any result is permitted.

@section Lvalues
@cindex lvalue
@cindex modifiable lvalue

Expressions that have addresses are called @dfn{lvalues}.
Only lvalues can be used as operands of the unary @samp{&} operator (@pxref{Address}). Most kinds of lvalues can also be assigned new values with assignment operators; such lvalues are called @dfn{modifiable lvalues}. @xref{Assignment}.

The simplest kind of lvalue is a variable or function name. Variable and function names are always lvalues. Only variable names can be modifiable lvalues; in addition, a variable name is not modifiable if it was declared @code{const} or if its type is an array type.

In addition, the following operators can produce lvalues:

@table @code
@item * @var{ptr}
The location that a pointer points to is always an lvalue. In fact, its address is equal to @var{ptr}. @xref{Contents}. This lvalue is modifiable provided that it is not @samp{const} and is not a function or array.

@item @var{array}[@var{idx}]
This expression is equivalent to @code{* ((@var{array})+(@var{idx}))}, so it is always an lvalue. Whether it is modifiable can be determined from its type as described under @samp{*}. @xref{Array Ref}.

@item @var{structure} . @var{member}
This expression is an lvalue if @var{structure} is an lvalue. It is modifiable if @var{structure} is modifiable, with two exceptions. First, if the member @var{member} is itself declared @code{const} in the definition of the structure type, then it is not modifiable (but other members of @var{structure} may be modifiable). Second, if member @var{member} is declared as an array, it is not modifiable. (The array's @emph{elements} may be modifiable, but not the array as a whole.) @xref{Structure Ref}. Unions are equivalent to structures in this regard.

@item @var{structureptr}->@var{member}
This expression is equivalent to @code{(*@var{structureptr}).@var{member}}. Therefore, the conditions stated above for the operator @samp{.} apply. @code{*@var{structureptr}} is certainly an lvalue; therefore, this expression is always an lvalue, and it is modifiable provided @var{member} is not declared @code{const} and its type is not an array. @xref{Structure Ref}.

@item (@var{exp})
This is an lvalue if @var{exp} is, and is modifiable if @var{exp} is. Parentheses are for syntactic grouping only; they have no semantic significance.
@end table

@node evaluation order, sequence points, = and side effects, Top
@chapter Precedence vs. order of evaluation

With a few specific exceptions, the order of execution of the parts of a C expression is unspecified. This means that when an expression contains more than one side effect or function call, they can happen in any order. In particular, precedence does @emph{not} determine order of evaluation --- in fact, nothing does, unless explicitly noted otherwise.

For instance, with

@example
x = f() * g() + h();
@end example

@noindent
some (but not all) systems call @samp{h()} first. We are only guaranteed that all the functions are called, that the values returned by @samp{f()} and @samp{g()} are multiplied, and that the product is added to the value returned by @samp{h()}.

Another example:

@example
x = foo() + bar() * (z++);
@end example

You cannot tell whether @code{foo()} will be called before or after @code{bar()}. In addition, @code{z} might be incremented at any time --- before the first function call, between the two calls, or after them both. If @code{z} is a global variable which is also used by the function @code{foo} or by @code{bar}, this might make an important difference.

You might think that the fact that @code{z} is post-incremented makes a difference here, but that is not so. Post-increment means that the value used to multiply by is the value @code{z} has before it is incremented.
But it is possible to save that original value and increment @code{z} early on; it is also possible to use @code{z} in the multiplication, after computing @code{bar()}, and only then increment it. In this example, the multiplication must be done before the addition, but that does not mean that the @emph{operands} of the multiplication must be computed first. It is possible to compute @code{foo()} first, then compute @code{bar()} and @code{z++}, then do the multiplication and the addition. Parentheses have no effect on this issue. Rewriting the above example as follows would make no difference: @example x = foo() + (bar() * (z++)); @end example If a variable (or any lvalue) is affected by a side effect in an expression, even a reference to that variable elsewhere in the expression is ambiguous, because the reference might take place before or after the side effect. For example, @example x = *p++ + *p++; @end example @noindent (which might be intended to fetch two characters from a string and add them) is not safe. It looks as though it is equivalent to @example x = p[0] + p[1], p += 2; @end example @noindent but it might perform both incrementations after both references. Then it would be equivalent to @example x = *p + *p, p += 2; @end example @node sequence points, branching, evaluation order, Top @section sequence points @cindex sequence point @vindex sequence points Above we mentioned that the general rule, that the order of sub-expressions is indeterminate, has some exceptions. These exceptions are called @dfn{sequence points}. Sequence points cause all previous side effects to complete before any new ones begin. The easiest way to explain what a sequence point means is to describe one example. Every function-call expression has a sequence point after its sub-expressions, immediately before the function is called. This means that the arguments of the function call must be computed completely, including any side effects, before the function is actually called. 
It also means that nothing outside the function call can be intermixed with the sub-expressions of the call. Concretely, this means that in @example x = foo(*p++); @end example @noindent you can be sure that @code{p} has already been incremented when @code{foo} is called. Also, in @example x = bar() + foo(*p++); @end example @noindent you can be sure that the reference to and incrementation of @code{p} form a unit with the call to @code{foo}. The call to @code{bar} must come before them all or after them all; it cannot come among them. [FIXME: Is this really true ?] The three C operators that specify conditional execution --- @samp{&&}, @samp{||} and @samp{? @dots{} :} --- all have sequence points after the first sub-expression is computed. This means that the first sub-expression must be entirely executed before the second or third one can begin. It also means that nothing outside of this expression can be intermingled with it. For example, in @example x = foo() + (bar(*p++) && lose()); @end example you can also be sure that @code{p} is already incremented at the time @code{bar} is called. Also, @code{foo} must be called either first or last; it cannot come between or within the operands of the @samp{&&}. @cindex top-level expression Sequence points appear at two other places: in the @samp{,} (compound expression) operator and at the statement level. A @dfn{top-level expression} is an expression that is not contained in any larger expression. (Whether an expression is top-level depends entirely on its context.) For example, in @example if(*p++ != 0)@{ fputc(*p++, outfile); @}; @end example @noindent there are two top-level expressions: @code{*p++ != 0} and @code{fputc(*p++, outfile)}. Every top-level expression is followed by a sequence point. Therefore, in the above example, the first incrementation of @code{p} must be complete before the decision about the @code{if} statement is made. 
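The guarantees described so far can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names @code{cursor}, @code{peek_cursor}, @code{char_after_first}, @code{and_sees_increment} and @code{comma_orders} are hypothetical and appear nowhere else in this manual.

```c
/* Hypothetical shared read position, used only for this sketch. */
static const char *cursor;

/* Returns the character that cursor points at when the call begins. */
static int peek_cursor(int consumed)
{
    (void) consumed;            /* the consumed character is not needed here */
    return *cursor;
}

/* The argument *cursor++ is completely evaluated, side effect included,
   before peek_cursor starts running, so the callee sees the SECOND
   character.  text must contain at least one character. */
int char_after_first(const char *text)
{
    cursor = text;
    return peek_cursor(*cursor++);
}

/* && has a sequence point after its left operand, so i is already 1
   when the right operand reads it. */
int and_sees_increment(void)
{
    int i = 0;
    return (i++ == 0) && (i == 1);
}

/* The comma operator has a sequence point after its left operand:
   the assignment is finished before i + 1 is computed. */
int comma_orders(void)
{
    int i;
    return (i = 5, i + 1);
}
```

With these definitions, @code{char_after_first("ab")} yields @code{'b'}, @code{and_sees_increment()} yields 1 and @code{comma_orders()} yields 6. Each result depends only on a sequence point, never on the order a particular compiler happens to choose.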
To summarize, there are sequence points: @itemize @bullet @item between the evaluation of the arguments of a function and the call of the function, @item After the first sub-expression of @samp{&&}, @samp{||}, @samp{?} and @samp{,}, is computed @item between statements @item at the beginning and end of each of the expressions associated with any @samp{if()}, @samp{while()}, @samp{switch()}, or @samp{for()} statement. [FIXME: do@{@}while() should be included here, right ?] @end itemize This leads to a critical point. @emph{It is not legal to modify the same object twice between sequence points.} An expression like ``@code{i = ++i;}'' is not merely ambiguous; it invokes undefined behavior. The code is not legal, and the program is entitled to abort when this expression is reached. (Or possibly during compilation.) This also applies to more common situations, such as: @example printf("Next two i values: %d, %d\n", i++, i++); @end example @noindent Many compilers will give the expected results from such code, but will change the results they give based on optimization, target processor, or new releases. Such code should be avoided at all costs; it is never conceptually sound. @node branching, looping, sequence points, Top @chapter branching @section if() statement The @code{if} Construct @section Multiple choice if() statements @section Multiple choice switch() statement @section break statement @section default statement @section Trivia @node looping, functions, branching, Top @chapter looping @cindex loops [FIXME: too much redundancy in this section] A @dfn{loop} is a part of a program that (can or does) execute several times in succession. Every loop must contain two things: its @dfn{body}, a piece of program to execute repeatedly, and its @dfn{exit condition}, which says when to stop the repetition. In order for this to be useful, the body usually must do something different each time it is repeated. 
(If it produces the same result each time, you may as well execute it only once, and an optimizing compiler may arrange to do just that.) Because loops abound in most programs, the C language has three special constructs which make loops convenient to write; these are the @code{for()@{@}} statement, the @code{while()@{@}} statement and the @code{do@{@}while()} statement. Each of these statements contain a smaller statement which is the body of the loop. @section for()@{@} loops The @code{for()@{@}} statement is the most complicated loop construct in C, but it makes for the easiest example. Here is a simple example of one: @example for(x = 0; x < 10; x++)@{ printf("%d squared is %d\n", x, x * x); @}; @end example @noindent This loop prints the numbers from 0 to 9, each followed by its square. The body of the loop is the second line, the call to @code{printf} which prints a single number and its square. The first line is what converts the body into a loop; it says how many times to repeat the loop body, and how to vary the effects by changing the value of @code{x} each time. The variable @code{x} is called the @dfn{loop counter} because it counts the number of times the body has been executed. Not every loop has a loop counter variable, but many loops have them. When using a loop counter to control a loop, you must specify three things: @itemize @bullet @item The initial value for the loop counter (zero in this example). @item How to increment the counter for each repetition (add one in this example). The counter is incremented after each repetition of the body. @item The loop's exit condition, which here is expressed in terms of the loop counter. In C, exit conditions are expressed in inverse; you write the condition for @emph{not} exiting. In this example, the loop does not exit as long as @code{x < 10}, which means it does exit when @code{x} is 10. 
@end itemize

@subsection The @code{for} Construct
@kindex for

Often a loop has initialization and incrementation code that conceptually belong to it. If you use @code{while}, the initialization code must precede the loop and the incrementation code must be part of the body. This can result in a correct program but it does not emphasize the purpose served by the initialization and incrementation code.

The @code{for} construct provides a way to write the loop and emphasize the special relationship between the loop and its initialization and incrementation code. In general, a @code{for} loop looks like this:

@example
for(@var{init}; @var{test}; @var{step})
  @var{body}
@end example

@noindent
Here @var{init}, @var{test} and @var{step} are expressions; @var{body} is a statement. The same loop can be written as follows with @code{while}:

@example
@var{init};
while(@var{test}) @{
  @var{body}
  @var{step};
@}
@end example

Any of @var{init}, @var{test} and @var{step} may be omitted (left blank). If @var{test} is omitted, it is equivalent to using 1 (which means the loop executes forever unless it is exited in another way). If @var{init} or @var{step} is omitted, it means there is no initialization or no stepping needed. The construct @samp{for(;;)} is sometimes used to write an infinite loop, but I recommend @samp{while(1)} instead.

As you can see, the difference between @code{for} and @code{while} is mainly one of form, not substance. The only case in which there is a substantive difference is when @code{continue} is used in @var{body}; the effect of @code{continue} in a @code{for} is different from its effect in a @code{while}. The @code{continue} statement is defined to jump to the end of the loop body; in the @code{for} loop, this means the end of @var{body}, so @var{step} is still executed, but in the equivalent @code{while} loop the end of the body is after @var{step}, so @var{step} is skipped. @xref{Continue}.

@subsection while()@{@} loops
@kindex while

The @code{while} construct is the most basic loop construct in C.
It looks like this:

@example
while(@var{continue-condition})@{
  ... @var{body} ...
@};
@end example

The @code{while} first tests the @var{continue-condition}. If it is false, the @var{body} is skipped and the loop exits. Otherwise, the @var{body} is executed; and then the @code{while} tests the @var{continue-condition} again, and so on.

Note that the @code{while} does @emph{not} continuously test the @var{continue-condition} --- if the @var{continue-condition} suddenly becomes false partway through the @var{body}, that fact is ignored. Only after the body has completely finished executing will the @var{continue-condition} be checked again.

In order for the body to do additional work on each repetition (rather than redoing the same job), it must change the values of some variables each time and look at them the next time.

Here is how we could rewrite the previous example to use @code{while}:

@example
x = 0;
while(x < 10)@{
  printf("%d squared is %d\n", x, x * x);
  x++;
@}
@end example

@noindent
Here the body of the loop is a compound statement, which is the pair of braces and everything they contain. The body includes the call to @code{printf} which does the work, and the @code{x++} which arranges for the next repetition to do a different piece of the work.

This particular example looks better with @code{for}; it is just the case for which @code{for} was invented. The reason that @code{while} is useful is that not all loops look just like this. Here is a different kind of loop, one that calculates an approximation to the square root of a number:

@example
double sqrt(double d, double error)@{
  double s = (d + 1) / 2;
  while((s * s) - d > error || (s * s) - d < - error)@{
    s = (s + d/s) / 2;
  @};
  return s;
@}
@end example

@noindent
This function takes two arguments: @var{d}, the (positive) number whose square root is desired, and @var{error}, which says how accurate you want the answer to be. The answer will square to a value that differs from @var{d} by at most @var{error}.
The algorithm used is called ``Newton's method''. Each repetition of the body brings the value of @code{s} closer to the true square root. The loop exit condition tests whether the square of @code{s} is as close as desired. Note that this is not the best way to write a square-root function; it was chosen to be a good example of a loop.

A loop is often used to walk down a chain of structures connected by pointers. For example, suppose the following structure is used for recording a list of names of files to be deleted:

@example
struct deletable@{
  struct deletable *next;
  char *filename;
@};
@end example

@noindent
The @code{next} field points to the next structure in the chain; the last one in the chain has a null pointer (zero, in other words) in its @code{next}. Here is a function to delete all the files named in such a list:

@example
void delete_files(struct deletable *list)
@{
  struct deletable *tail = list;
  while(tail)@{
    unlink(tail->filename);
    tail = tail->next;
  @};
@}
@end example

@noindent
The reason for the local variable @code{tail} is so that, if you stop the program inside this function with a debugger, you can still see the original value of the argument @code{list} even after the loop has executed a few times.

Here is the same function rewritten to use @code{for} instead of @code{while}:

@example
void delete_files(struct deletable *list)
@{
  struct deletable *tail;
  for(tail = list; tail; tail = tail->next)@{
    unlink(tail->filename);
  @};
@}
@end example

If the @var{continue-condition} is zero the first time it is tested, then the loop exits immediately, and the @var{body} is not executed at all.

@section do@{@}while() loops

@subsection The @code{do}-@code{while} Construct
@kindex do
@kindex while

The @code{do}-@code{while} construct is a variant of the simple @code{while} construct.
It looks like this:

@example
do
  @var{body}
while(@var{continue-condition});
@end example

Here @var{body} is a single statement, which is to be repeated; if you wish to use multiple statements in the body, group them inside of braces to make a compound statement.

The @var{continue-condition} is tested after each execution of the @var{body}, and its value is compared against zero. (It must have a data type that can be compared against the constant zero, which means an integer, a floating point number or a pointer.) If the value is zero, the loop exits. Otherwise, the @var{body} is executed and then the @var{continue-condition} is tested again, and so on.

The body of a @code{do}-@code{while} construct is always executed at least once, because the @var{continue-condition} is not tested until afterward. Contrast this with the simple @code{while} construct in which the @var{continue-condition} is tested before the first execution of the @var{body}. If the @var{continue-condition} is false when the loop is entered, a simple @code{while} will detect this immediately and the @var{body} will not be executed, whereas the @code{do}-@code{while} construct will execute the @var{body} once.

@subsection Exiting a Loop with @code{break}
@kindex break

The @code{break} statement exits the innermost @code{for}, @code{while} or @code{do}-@code{while} loop (or @code{switch} construct, but that's another story). It looks exactly like this:

@example
break;
@end example

Here is a trivial example of using @code{break}:

@example
x = 0;
while(1)@{
  if(x >= 10) break;
  printf("%d squared is %d\n", x, x * x);
  x++;
@}
@end example

@noindent
This is our first example of a loop, rewritten to use @code{break} to exit the loop when @code{x} reaches 10. It prints the numbers from zero to 9, each followed by its square.

This is not a very useful example, because here @code{break} does just what a nontrivial @code{while}-condition would do.
The case where @code{break} is really useful is where you would like @dfn{multiple exits} from the loop. Suppose you want to print the numbers from 1 to @var{n}, printing the square of each even number and the cube of each odd number. Here is one way to do it:

@example
x = 1;
while(x <= @var{n})@{
  printf("%d cubed is %d\n", x, x * x * x);
  x++;
  if(x > @var{n}) break;
  printf("%d squared is %d\n", x, x * x);
  x++;
@};
@end example

@noindent
Since we don't know whether @var{n} is even or odd, we have to be prepared to exit the loop after each place the loop counter @code{x} might exceed @var{n}.

As you can see, @code{break} is used inside an @code{if} statement. This is always the case when @code{break} is used to exit a loop, because the alternative is useless. For example, consider this program:

@example
while(x)@{
  win(4);
  break;
  lose(3);
@};
@end example

@noindent
This is a silly program because @code{lose} will never be called; in fact, it is equivalent to the following:

@example
if(x) win(4);
@end example

Another function of the @code{break} statement is to exit a @code{switch} construct. In fact, it will exit either a loop or a @code{switch}, whichever more narrowly surrounds the @code{break}. This has the unfortunate result that you cannot enclose a @code{switch} in a loop and then use @code{break} to exit the loop in one of the cases of @code{switch}. In such a case, use a @code{goto} (@pxref{goto}).

@subsection Exiting a Loop Body with @code{continue}
@kindex continue

The @code{continue} statement is a close relative of the @code{break} statement. It is used in a loop to jump to the end of the loop's body. It looks precisely like this:

@example
continue;
@end example

When @code{continue} is executed inside a @code{while} loop, the next thing that happens is that the @code{while}-condition is tested to see whether the loop should repeat.
When @code{continue} is executed inside a @code{for} loop, as shown here,

@example
for(@var{init}; @var{test}; @var{step})@{
  @dots{}
  if(@dots{}) continue;
  @dots{}
@};
@end example

@noindent
the next thing executed is @var{step}.

It is usually best to avoid the use of @code{continue} and put the rest of the body inside of an @code{if} construct instead. The skeleton shown above would then look like this:

@example
for(@var{init}; @var{test}; @var{step})@{
  @dots{}
  if(! @dots{})@{
    @dots{}
  @}
@}
@end example

It is reasonable to use @code{continue} when there are several places in the loop where you want to use it, and the alternative is several @code{if} constructs which nest deeply enough to be harder to read than @code{continue}.

Because @code{continue} is used infrequently, people reading the code will not be expecting it. They may study parts of a large loop with the unconscious assumption that @code{continue} is not in use, and thus be confused. To prevent such confusion, write prominent comments at the beginning and end of the loop, and/or where @code{continue} is used, to call the reader's attention to it.

@section Nested loops

@section Infinite loops

@section Loops without a body

@section Things C lacks

There is no guaranteed ``arithmetic right shift'' (shift in the sign bit) in C. The result of @code{signed int a = (-1) >> 1} is implementation-defined. See http://www.lysator.liu.se/c/schildt.html section 6.3.7 . (Arithmetic right shift can be emulated with @code{#define ashr(a,n) (((a) < 0) ? ~((~(a)) >> (n)) : ((a) >> (n)))}.)

There is, however, a ``logical right shift'' (shift in zeros) in C. @code{unsigned int a = ((unsigned int)(0) - (unsigned int)(1)) >> 1;} is well defined. Right shifts on unsigned integral values always shift in zeros.

Left shifts on unsigned values are also well defined (they always shift in zeros).

There are no rotate left or rotate right operators in C.

One cannot define a function to accept a standard C array of more than 1 dimension unless most of those dimensions are fixed.
(Only along the ``first'' dimension can the array be variable size). (This leads to lots of array and matrix libraries; everyone seems to make their own variable-sized array type).

@section choosing a loop appropriate to the situation

@section Goto and Labels

The @code{goto} statement allows you to specify precisely the transfers of control within a C function. First you define a @dfn{label} at each place you wish to transfer control to. To do the actual transfer, write @samp{goto @var{label};} specifying the label for the place you want to go.

@subsection A Pedagogical Example of @code{goto}

The best example to clarify the use and meaning of @code{goto} is a completely unrealistic one. Here we rewrite our favorite loop example:

@example
@{
  int x = 0;
  while(x < 10)@{
    printf("%d squared is %d\n", x, x * x);
    x++;
  @}
@}
@end example

@noindent
(which prints the numbers 0 to 9 and their squares) using @code{goto} and labels:

@example
@{
  int x = 0;
 loop_start:
  if(x >= 10) goto loop_exit;
  printf("%d squared is %d\n", x, x * x);
  x++;
  goto loop_start;
 loop_exit: ;
@}
@end example

Here the two transfers of control, which actually occur in every loop, are shown precisely and explicitly. At the end of the loop, execution returns to the beginning of the loop (the label @code{loop_start}) where the exit condition is tested. And if the exit condition is true, control transfers to a point following the loop (the label @code{loop_exit}).

This is a very bad way to write a loop. Not only is it much messier than using @code{while}, but a compiler's loop optimizer may not recognize it as a loop and so may not optimize it.

Note how @samp{loop_exit:} is followed by a semicolon. This is because every label must be followed by a statement. A label followed immediately by a close-brace is not valid C (some other languages allow this). The semicolon, making a null statement, fulfils the requirement to have a statement there.
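For readers who want to compile the pedagogical loop, here is a minimal sketch that wraps it in a function. The name @code{fill_squares} and its parameters are hypothetical; instead of printing, the function stores each square into a caller-supplied array so that the result can be inspected.

```c
/* Stores 0*0, 1*1, ... into out[0..count-1] using only goto and labels.
   fill_squares, out and count are illustrative names, not part of any
   library. */
void fill_squares(int *out, int count)
{
    int x = 0;

loop_start:
    if (x >= count)
        goto loop_exit;         /* exit condition true: leave the loop */
    out[x] = x * x;
    x++;
    goto loop_start;            /* back to the test */

loop_exit:
    ;                           /* null statement: a label needs a statement */
}
```

Because the function returns @code{void} and the label is the last thing before the closing brace, the null statement after @samp{loop_exit:} is required here just as in the example above.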
Note also how the labels are written with less indentation than the statements surrounding them. This is a convention that we recommend, and which Emacs automatically performs.

The scope of a label name is the entire function that it is found in. In other words, you can't have two labels with the same name in a function, not even if they are in different levels of braces.

@subsection Real Examples of @code{goto}

Here is a real example of using a @code{goto} to do what @code{for} and @code{break} cannot conveniently do. This example scans a two-dimensional array in row major order until it finds an element that is zero:

@example
@{
  int i, j;
  for(i = 0; i < n; i++)
    for(j = 0; j < n; j++)
      if(a[i][j] == 0) goto end;
 end: ;
@}
@end example

@noindent
A similar scan of a one-dimensional array would use @code{break} to exit, but @code{break} exits only the innermost loop.

Likewise, @code{goto} is actually used when exiting a loop from within a @code{switch}.

A completely different occasion for using @code{goto} is when two cases of a @code{switch} start out different but finish identically. For example, consider this @code{switch} which handles several binary arithmetic operators for a hypothetical compiler. Assume that the variable @code{operator} is a character, an arithmetic operator, and @code{left} and @code{right} are expression-objects:

@example
@{
  switch(operator)@{
    case '+':
      operation = PLUS;
      goto binop;
    case '-':
      operation = MINUS;
      goto binop;
    case '*':
      operation = TIMES;
    binop:
      result = build_expression(operation, left, right);
      break;
    case '!':
      result = build_expression(NOT, NULL, right);
      break;
  @}
  return(simplify(result));
@}
@end example

@section Trivia

If the body of the loop (the stuff between the @code{@{@}} squiggly brackets) contains exactly 1 statement (no more, no less), then the brackets are optional.

There are 3 ways to exit a loop:

The normal way: set the @var{continue-condition} to false (zero), then wait for the loop to check it again.
Immediately exit a loop with a @code{goto} (@pxref{goto}) or a @code{break} statement (@pxref{break}).

``Iteration'' is a synonym for looping.

@node functions, scope, looping, Top
@chapter functions

@section when to use functions

[FIXME: what needs to go here ?]

@section function declarators and function prototypes
@cindex function prototype

Every function must be declared, with a @dfn{declaration}, before it can be used. (There is one exception: function names may be implicitly declared; @pxref{Implicit Decl}).

A function type is a derived type, and cannot be the basic type of a declaration. To declare a function, you must also specify ``What type of thing does this function return as output ?'' (the type of the value of a call to the function) as well as ``What sort of thing(s) does this function take as input ?''

To declare @var{f} with type ``function-accepting-@var{i}-and-returning-@var{t}'', you must declare the complex declarator @code{@var{f}(@var{i})} to have type @var{t}. For example,

@example
int round(double);
@end example

@noindent
declares @code{round} to be a function returning an @code{int}. Here the declarator is @code{round(double)} --- a complex declarator that expresses the relationship between @code{round}'s type and the declaration's basic type (@code{int}).

When declaring a function type, write the arguments and their types, separated by commas, inside the parentheses that follow the function name (or nested declarator). If the function takes no arguments, write @code{(void)}.

A function declaration that specifies the argument types is called a @dfn{function prototype}. [FIXME: is this correct terminology ? Can the first line of a full function definition be called a ``function prototype'' ?]

To declare functions returning types that are not themselves basic, the @code{@var{func}()} construct is nested within other declarator constructs.
For example, a function returning a string (a pointer-to-@code{char}) is declared as follows:

@example
char *get_name(void);
@end example

When you define a function, the function definition itself serves as a declaration for the rest of the source file. But often a declaration of the function, separate from the full definition, is needed. Such a separate declaration is called a ``function prototype''.

@itemize @bullet
@item
If a function is used in several files, the function should be declared (with a function prototype) in one and only one file: a ``.h'' header file. That header file should be @code{#include}ed in the ``.c'' file that fully defines the function as well as every ``.c'' file that uses that function. (But alas, many programs do not take this advice).

@item
Even if a function is only used in one file, the same file that defines the function, a separate declaration may be needed if there are calls to the function before it is completely defined. (This always happens with mutually recursive functions).
@end itemize

A function type in C applies only to particular functions, each defined in one particular place in the program. Some other languages have ``function variables'' --- variables whose values are functions. In C, pointer-to-function types are used for this purpose.

The function declarations shown in this section, each immediately followed by a semicolon rather than by a body, are function prototypes.

@section function definition (actual code within)

You can think of a function definition as an initialized function declaration, with the body of the function taking the place of an initializer.

This sort of full function definition, in which the function declaration is immediately followed by a block of code in squiggly braces, is sometimes called the ``function implementation''. There are plenty of examples of this in the sections below.
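As a small sketch of the difference between a prototype and an implementation, consider two functions that call each other. The names @code{is_even} and @code{is_odd} are hypothetical; the point is only that the first definition could not be compiled cleanly without the separate prototype above it.

```c
/* Prototype: a declaration ending in a semicolon.  It is needed because
   is_even calls is_odd before is_odd has been defined. */
int is_odd(unsigned n);

/* Implementation: the same kind of declaration followed by a body. */
int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

/* The implementation of is_odd repeats its prototype exactly. */
int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}
```

In a larger program the prototype of @code{is_odd} would normally live in a header file included by every file that calls the function.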
It is usually best if the prototype of a function is exactly character-for-character identical to the function declaration part of the function implementation. [FIXME: there has got to be an easier-to-understand way of saying this ...] @section returning a value (or returning @code{void}) @cindex void type @kindex void The data type @code{void} is a type that has only one possible value. No operations on this value are allowed. Variables may not have type @code{void}, and neither may structure elements. You might well ask why @code{void} exists. In fact, the type @code{void} by itself is rarely used; but other types derived from @code{void} are useful. Both pointers to @code{void} and functions returning @code{void} have important uses. (Arrays of @code{void} are not allowed). When a function is not supposed to return any interesting value, declare it to return type @code{void}. Then any attempt to use the value returned by a call to the function will automatically get an error message. @example void foo(x) double x; @{ extern double *next; if(x == 0)@{ return; @}; *next++ = x; @} @end example This function does not return a useful value --- it uses @code{return} statements with no value, and ends by falling through --- so attempting to use its value would be undefined. Declaring @code{foo} to return type @code{void} causes expressions such as @code{foo(1.1)} to have type @code{void}; therefore, they cannot appear in a context where a value will be used. Remember that if you do not specify the type of value that a function returns, the default is @code{int}, not @code{void}. @section Casting to @code{void} @code{void} itself is sometimes used in casts, to emphasize that a value is not used (@pxref{Casts}). For example, @example (void) foo(x) @end example @noindent (where @code{foo} is a function that returns a value which, in this case, is not being used) can be used to placate the program @code{lint}. 
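A short sketch of the cast in context may help. It assumes a hypothetical helper, @code{log_message}, that returns the character count reported by @code{printf()}, and a hypothetical caller, @code{start_up}, that has no use for that count.

```c
#include <stdio.h>

/* Hypothetical helper: prints a message and returns the number of
   characters written, as reported by printf(). */
int log_message(const char *text)
{
    return printf("log: %s\n", text);
}

/* Hypothetical caller: the return value is deliberately discarded,
   and the cast to void states that intention explicitly. */
void start_up(void)
{
    (void) log_message("starting");
}
```

The cast changes nothing about what the program does; it only documents, for human readers and for checking tools, that ignoring the value was intentional.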
@section Header files, functions and libraries

@section Trivia

A function cannot return an array, or in other words the array declarator construct cannot directly contain the @code{@var{func}()} construct. However, a function @emph{can} return a pointer to an array, or in other words the array declarator @emph{can} directly contain the pointer construct which in turn can contain the @code{@var{func}()} construct.

@example
/* a function that, when called, returns a single pointer,
   which points to an array of strings */
char *(*get_names(void))[];
@end example

Complicated declarations like this can often be simplified using typedefs.

When declaring a function, the names used inside the function for its arguments are technically optional.

@example
void get_answer( char * question, char * answer );
@end example

@noindent
could be declared

@example
/* poor style --- which is which ? */
void get_answer( char *, char * );
@end example

@noindent
In fact, even the types of arguments expected by the function are technically optional; this function could legally be declared

@example
/* horrible style - how many arguments do I need ? */
void get_answer();
@end example

@noindent
instead.

Declaring the types of the arguments is useful because then the C compiler can validate each call to the function, checking that the number and types of arguments written in the call match what the function expects. It can also convert the argument you supply automatically to the type that is required. For example, in

@example
@{
  double sin(double x);
  return sin (1);
@}
@end example

@noindent
the integer 1 is converted automatically to a @code{double} before it is passed to the @code{sin} function. But in

@example
@{
  double sin();
  return sin (1);
@}
@end example

@noindent
the integer 1 is passed as an integer because the compiler does not know any better.
This would produce completely undefined results because the bit pattern of the integer would be interpreted as (part of) a @code{double} when the @code{sin} function actually runs.

Declaring the names of the arguments doesn't really help the compiler any, but it can help people who use the prototypes in the header files you made as a quick summary of what the arguments mean and in what order they are required for each function.

@cindex implicit declaration
Whenever you use an undeclared name as a function (by writing it followed by an open-parenthesis), the name is forthwith declared as a function returning type @code{int}. This is called an @dfn{implicit declaration} because the name is declared without there being any text to constitute a declaration of it.

The scope of the implicit declaration is the innermost pair of curly braces surrounding the place where the declaration happens. Once the close-brace is passed, the function name is once again undeclared. Therefore, you are free to declare it. It may also be implicitly declared again in another scope.

Relying on implicit declaration is generally a bad idea, for the same reasons given above for declaring the types of function arguments.

@subsection difference between ANSI and K&R

@node scope, I/O, functions, Top
@chapter scope, linkage, access and duration

@section what is linkage ?

Linkage is barely a part of the C language. In the ``link'' phase, code from separate ``modules'' (a.k.a. translation units) is put together to form executables. The details on how this is done vary from system to system. However, the C standard specifies some details of what can happen here.

@section what is scope ? when is scope important ?

Scope is the concept of where an identifier, such as a variable name, can be seen. An identifier can have one of several scopes: global, file, function, or block. A variable with block scope is visible within the block in which it is declared (and within any sub-blocks).
Variables cannot have function scope; only labels have function scope. Function scope means that a label is visible from anywhere within the function it's in, even outside of the block it's used in, or before it's defined. A variable with file scope is visible anywhere within the current source file, but only after a declaration of it.

@section Access modifiers (const volatile)

@section Storage Classes and Duration

@section Constant values

@section Trivia

@node I/O, macros, scope, Top
@chapter Input and Output

(How far do we take this?)

[FIXME: which of these is truly ISO standard, and which only work on GNU C ? ]

[FIXME: really should mention some of the more common graphical libraries; making CGI scripts in C that output HTML; etc.]

[FIXME: since C is becoming common on embedded systems that don't have an alphabetic keyboard and a terminal display, mention I/O ports and @code{volatile} and @code{const}]

@section Formatted Input and Output (printf() and scanf())

@code{printf()} and @code{scanf()} are declared in @samp{<stdio.h>}. ``info libc'' will give you more details than you ever wanted to know.

printf fprintf sprintf snprintf scanf sscanf fscanf

@example
int printf(const char* format, ...);
@end example

The first argument is the format string. At run time, @code{printf()} scans through the format string. Plain characters are copied to the output stream unchanged. (This includes the characters `\b', `\a', `\n', `\t', etc.). When @code{printf()} encounters a conversion specifier (which always begins with the @code{%} character) in the format string, it takes the next argument in the argument list and prints it out in the manner specified by the conversion specifier. When @code{printf()} encounters the ``un-specifier'', @code{``%%''}, @code{printf()} prints a literal @code{%}.

Each conversion specifier in the @code{printf()} format string indicates the type of the next argument.
Each conversion specifier in a @code{scanf()} format string indicates the type of the object to which the next argument points. (All of the optional arguments in the @code{scanf()} argument list are pointers).

Unfortunately, it is all too easy to make the error of making the type in the format string inconsistent with the type of the arguments (of @code{printf()} or @code{scanf()}). Such programs often seem to compile and run just fine, but then dump core --- or sometimes just give inexplicable results. The original purpose of the @samp{lint} program was to warn about this common error. Nowadays most decent compilers have this feature of @samp{lint} built in. In particular, if you use @samp{gcc} with the @samp{-Wformat} option, all your calls (to @code{printf()} and the other functions defined with the @code{__format__} attribute) will be checked to make sure that the specified type actually does correspond to the actual type of the corresponding argument.

A conversion specifier in a printf format string looks like this: `%' [flags] [width] [`.' precision] type

A conversion specifier in a scanf format string looks like this: `%' [flags] type

You can almost always get reasonable free-format output without using any of the optional modifiers at all (i.e., just the `%' and the type). The modifiers are mostly used to make the output look ``prettier'' in tables.

The flags are

@code{#} : ``an alternate form''

@code{-} : left-justify the output within the field

@table @var
@item width
a decimal integer that specifies the ``minimum field width''. If the normal conversion produces fewer characters than this, the field is padded with spaces to the specified width. This is a @emph{minimum} value; if the normal conversion produces more characters than this, the field is @emph{not} truncated. Normally, the output is right-justified within the field.
@item precision
specifies the number of digits to be written for the numeric conversions.
If the precision is specified, it consists of a period (`.') followed by a decimal integer. @end table @code{printf()} and @code{scanf()} can handle *only* the fundamental data types plus the ``string type'' (array of characters). They share this list of conversion specifiers: @code{signed int} and @code{unsigned int} %d, %o, %u, %x, %i %d = use signed decimal notation %u = use unsigned decimal notation %o = use octal notation %x = use hexadecimal notation %i = recognize whether the input is hex (starts with 0x), decimal, or octal (starts with 0). synonymous with d when used for output. @code{signed short int} and @code{unsigned short int} %hd, %ho, %hx, %hu, %hi (similar to int specifiers) @code{signed long int} and @code{unsigned long int} %ld, %lo, %lx, %lu, %li (similar to int specifiers) @code{ char} %c = use a literal character. [FIXME: I have seen code that used `%c' with a @code{int} argument. Is this portable ?] @code{ char *} %s = print the string of characters. @code{ float } and @code{ double} %f, %e, %g, %E, %G %f = print in the fixed-point style [-]@var{ddd.ddd}. The number of digits @var{d} after the decimal point is equal to the precision specification (defaults to 6). %e = print in the exponential notation [-]d.ddde+dd or [-]d.ddde-dd, with one digit before the decimal point. The number of digits @var{d} after the decimal point is equal to the precision specification (defaults to 6). %E = print in the exponential notation [-]d.dddE+dd or [-]d.dddE-dd, much like %e. %g = print in either %f or %e, whichever ``looks best''. [FIXME: Stroustrup mentions %f, %e, *and* %d - is that a error in his book ?] %G = print in either %f or %E, whichever ``looks best''. @code{ long double} %Lf, %Le, %Lg, %LE, %LG (similar to @code{double} specifiers) @code{ void * } %p = print the pointer. For example: @example char * t = "test"; printf ("%p", (void*)t); @end example @noindent prints a number --- the address of the string constant ``"test"''. 
It does not print the word ``test''. (The exact format is undefined, but it is typically some hexadecimal notation). [FIXME: I think there is a few more types missing here. @code{float} cannot be printed by printf --- but I think @code{long long int} and a few others can.] [Trivia; is this true ? @code{float} cannot technically be printed by printf, but printf() often appears to print a float because the compiler automagically casts the float to a @code{double}. Does this ever happen with other functions ?] @section Trivia Programs written in C++ can use @code{printf()} and @code{scanf()}, but in C++ @code{cout <<} and @code{cin >>} are always easier to use. There are several more conversion specifier options. For example, explaining ``%-+*.*Lg%n'' is too much to go into here. [FIXME: I probably *should* explain this here ... Or will there be a seperate manual covering printf() and the other library functions ?] @node macros, History of C, I/O, Top @chapter macros and the preprocessor @section Behavior @vindex translation phase Note that ANSI does not require that the preprocessor or linker be separate from the compiler; all that is specified is behavior, not implementation. The preprocessor is the name generally given to the first four translation phases that occur. It performs certain textual modifications, including the interpretation of simple macros, the inclusion of other files, and certain obscure hacks to support unusual input devices or file systems. It has historically almost always been available as a separate program, although ISO makes no guarantee that it is. The translation phases generally thought of as the preprocessor are as follows: 1. Physical source file characters are mapped to the source character set. Say what? This means that any necessary conversions are applied. Be this EBCDIC to ASCII, or converting CR/LF sequences to LF's, whatever translations must be applied to characters are applied. Trigraph sequences are processed. 
The trigraph sequences (enumerated below) are a kluge to allow C to be implemented on deficient machines (MIX-1009, IBM mainframes) which lack the character set to implement C. (Also certain European character sets, which have hijacked some of the ASCII set for letters not present in ASCII.)

@vindex trigraph
The trigraph sequences are:

@table @code
@item ??/
@samp{\}
@item ??=
@samp{#}
@item ??(
@samp{[}
@item ??)
@samp{]}
@item ??'
@samp{^}
@item ??!
@samp{|}
@item ??-
@samp{~}
@item ??<
@samp{@{}
@item ??>
@samp{@}}
@end table

@vindex translation phase 2
@vindex logical source lines
The second phase is merely eliminating backslashes followed by newlines; this allows automatic breaking of C source files at arbitrary positions, as the backslash/newline pairs are removed, not merely replaced with whitespace. It is an error for the last character in a file to be a backslash.

Note that a C environment is free to allow spaces between the backslash and the newline character, though most don't. An environment may define the end of a line however it wishes; on some systems, for instance, any sequence of spaces which includes the 80th character is a newline, so a backslash which is the last non-space character on the line will be matched by this rule.

The results of this phase are called logical source lines. [``newline'' and ``whitespace'' are not English words, but common terms in discussing C.]

A logical source line exceeding 509 characters is not portable; in particular, the long macros favored by some developers will be rejected by some systems.

@vindex comment
The third phase consists of breaking the file into preprocessing tokens and white space. Comments (sequences of characters, starting with @samp{/*} and going until the first occurrence of @samp{*/}) are removed at this time, and replaced with white space. ANSI specifies that they are replaced by a single space; most modern compilers replace them with white space, possibly more than one space.
Portability note: K&R compilers frequently eliminated comments entirely, leading to the construction @example #define paste(a, b) a/**/b @end example @noindent This is not available in ANSI compilers; instead, there is a @code{##} operator to perform the same task. If you wish to test whether a compiler performs the old-style pasting, consider: @example int pastetest(void) @{ int i = 1; return -/**/-i; @} @end example @noindent This returns 1 if comments expand into space (which they are required to by ANSI), or 0 if they do not. Useless, but amusing. Comments do not nest. Rare compilers have a ``feature'' allowing them to; it has never been known to help significantly, and introduces bugs. (This is the main reason for the C++ comment, actually.) To comment out large blocks of code that have comments in them, @samp{#if 0} is recommended. The Rationale explains that comments add (human-language) commentary; the right way to ``uninclude'' source is conditional compilation. The fourth phase handles preprocessing directives and expands macros. @section Preprocessor directives @vindex preprocessor directive A preprocessor directive looks, to the compiler, like the token-sequence @samp{#}, @samp{directive}, @samp{optional-args}. The arguments are ended by a newline. The initial `@samp{#}' must be the first non-whitespace character on the line. Optional whitespace may separate any or all of these. (Whitespace may be required to separate tokens, of course.) It is not necessary in modern C compilers (ISO or GNU C) for a preprocessor directive to start at the beginning of the line; all that is required is that the leading @samp{#} be the first non-whitespace character on a line. K&R compilers frequently require the @samp{#} to be in the first column, however, most of them will accept arbitrary spaces or tabs between the @samp{#} and the directive. 
This is why old code frequently has indents like @example @dots{} # ifdef foo # ifdef bar # endif # endif @end example @noindent This is not necessary with modern compilers. (Where modern is defined as compilers we like.) ANSI would accept @example @dots{} #ifdef foo #ifdef bar #endif #endif @end example @noindent as synonymous with the above. In all preprocessor directives, white space is taken to include only spaces and horizontal tab characters; newlines are not allowed. ([FIXME: is it true that] This is the only time a compiler ever distinguishes between different kinds of whitespace, as long as you don't use the @code{//} C++ comment). The following directives are defined, with the following behavior: @vindex #include The @samp{include} directive causes the named file to be included. The name given may be inside double quotes (@samp{"}), or inside a pair of @samp{<} and @samp{>} symbols. (Technically this is not a string literal; the rules for embedding characters in a string literal do not apply). If it contains the characters @samp{'}, @samp{\}, or @samp{/*}, the behavior is undefined. If a sequence inside @samp{<>}'s contains a @samp{"}, the behavior is also undefined. (No, @samp{/*} isn't a character, but ISO lists it in with the others.) This means that the preprocessor directive @samp{#include "c:\tc\bob.h"} invokes undefined behavior, and does not necessarily try to open a file which has a tab and a backspace in its name. Some compilers may choose to simply pass the @samp{\} on as a part of a file name; this is not portable. If you write the name inside double quotes, the preprocessor searches for a file of that name in an implementation-defined manner, most likely including the current directory. If this search fails, the preprocessor continues to search for that file using the procedure for include files inside @samp{<>} symbols. 
If you write the name inside @samp{<>} symbols, the preprocessor searches in a different, also implementation-defined manner. Most often, the latter form is used for the system include files. (For instance, on a UNIX system, default behavior for @samp{#include <stdio.h>} might be to search the directory @samp{/usr/include}, while @samp{#include "stdio.h"} might search the current directory first. However, it would then search @samp{/usr/include} if it couldn't find anything in the current directory, since it must fall back on the default behavior.)

The @samp{#include} directive will put a named header or source file through phases one through four, recursively.

@vindex #pragma
The @samp{#pragma} directive has implementation-defined behavior. It is specified that an implementation will ignore a pragma it does not understand. Richard Stallman claims that @samp{#pragma} is useless. An early version of GNU C defined all @samp{pragma} directives to abort compilation and invoke rogue, hack, or emacs running the tower of Hanoi demo. If none could be found, it would abort. This is an atypical implementation.

@vindex #error
The @samp{error} directive causes compilation to abort, and an error message to be printed, which incorporates the remainder of the line. @samp{#error} is best used to flag conditions the source cannot handle; for instance,

@example
#ifdef unix
/* code to handle Unix and related environments */
#else
#error No support for non-unix systems, sorry.
#endif
@end example

@noindent
would be more polite than what a lot of programs do, which is compile about a third of the way, then abort because they can't find an obscure system resource.

@vindex #if
@vindex #else
@vindex #elif
@vindex #endif
@vindex #ifdef
@vindex #ifndef
@vindex defined()
The @samp{if} directive and relatives are used to control @dfn{conditional inclusion}.
For instance,

@example
#if defined(__STDC__)
int foo;
#endif
@end example

@noindent
declares an integer @var{foo} if the symbol @samp{__STDC__} is defined. The expression @samp{defined(@var{name})} evaluates to 1 if @var{name} is defined, and to 0 otherwise. It is possible to use it without the parentheses. The @samp{ifdef} and @samp{ifndef} directives are special cases; @samp{#ifdef foo} is equivalent to @samp{#if defined foo}, and @samp{#ifndef bar} is equivalent to @samp{#if ! defined bar}.

Macro expansion happens prior to the evaluation of the expression associated with any @samp{#if} directive. [FIXME: did this get out of chronological order ?]

It is an error for there to be a preprocessing token on the line after an @samp{#else} or @samp{#endif} directive, but white space is allowed. Older compilers did not always have this restriction, and some vendors leave tokens on the ends of these lines in their provided headers. Note that this restriction applies after comments have been removed, so @samp{#endif /* _POSIX_SOURCE */} is valid, while @samp{#endif _POSIX_SOURCE} is an error. Certain Unix vendors whose names are three letters long, and end with `N', frequently ignore this.

The material ``commented out'' with one of these directives is still required to consist of valid preprocessor tokens, although many compilers will allow you to get away with arbitrary text. It is best to use comments to hide text or other material unsuitable for compilation, and @samp{#if} directives to hide legitimate code you don't want to compile. [FIXME: this paragraph is completely incomprehensible]

@vindex #define
The @samp{define} directive causes a macro to be defined. There are two kinds of macros, which are both defined with the same directive. A function-like macro is a macro which takes arguments, and expands into text in which the names given to those arguments are replaced by those arguments in each invocation. A plain macro simply expands to the text given in the definition.
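The two kinds of macro can be sketched as follows. The macro names @code{BUFFER_SIZE} and @code{SQUARE}, and the two small functions that use them, are hypothetical and exist only to show the expansions.

```c
/* A plain macro: every later use of BUFFER_SIZE expands to 128. */
#define BUFFER_SIZE 128

/* A function-like macro: each x in the replacement text is replaced
   by the argument given in the invocation.  The parentheses keep the
   expansion grouped as intended. */
#define SQUARE(x) ((x) * (x))

int buffer_area(void)
{
    /* After expansion and rescanning: ((128) * (128)). */
    return SQUARE(BUFFER_SIZE);
}

int square_of_sum(int a, int b)
{
    /* Expands to ((a + b) * (a + b)). */
    return SQUARE(a + b);
}
```

If @code{SQUARE} were written without the inner parentheses, @code{SQUARE(a + b)} would expand to @code{(a + b * a + b)}, which is a different computation.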
@section Generic macros (no arguments) @section Macros with arguments and encapsulation (lots of () ) Macros, introduced by the @samp{#define} preprocessor directive, are expanded in phase four. A function-like macro is expanded only if the next token after its name is a left parenthesis (@samp{(}). If you wish to suppress any macro definition of a library function, and actually generate a function call to the library, you can use something like this: @example (isupper)(c); /* force library call */ @end example @noindent After a macro is expanded, the results are rescanned to see if there are other macros to be expanded. Macro expansion is where the ANSI @samp{##} and @samp{#} operators take effect. Many macros are already pre-defined for you. For example, all standards-compliant C compilers define the macro @samp{__STDC__} to be 1. Many compilers for languages quite similar to C will not, or will merely define it without giving it a value. There are also the macros @samp{__FILE__} and @samp{__LINE__}, which provide the name of the current file (as a string literal) and the current line number (as an integer constant). The macros @samp{__DATE__} and @samp{__TIME__} provide the date and time this file was compiled (as a string literal). The macro @samp{__cplusplus} is defined when compiling a C++ program. Although ANSI does not allow a compiler which does not conform to the Standard to define the @samp{__STDC__} macro, unfortunately it has no power to determine the behavior of a compiler which does not conform to the standard. @emph{Sigh}. [FIXME: I know the C++ standard requires __DATE__ and __TIME__; does the C standard require them ?] [FIXME: Are these *all* the macros required of a standards-compliant implementation of C ? What other common useful macros (like unix, x86, etc) exist ?] Traditionally, people define macros to be all uppercase. @vindex #undef The @samp{undef} directive removes any macro definition associated with its argument. 
It is legal and harmless if there is no macro definition; it simply has no effect in that case. @vindex #line The @samp{#line} directive can change the current source file and line number. This affects the future expansions of @samp{__FILE__} and @samp{__LINE__}. In most compilers, it will also affect the reporting of error messages, although this is not specified. @chapter Lexical Structure Lexically, a C program is a stream of tokens. This chapter discusses the way that C programs are recognized as streams of tokens. There is some overlap with the preprocessor; the preprocessor does much of the token scanning, and then performs some substitutions, producing a stream of tokens. @vindex maximal munch rule C parses incoming tokens using the maximal munch rule - the rule that a token is the longest sequence of incoming characters that can be seen as a token. This prevents a later typographical error from significantly changing the meaning of code, the way it can in FORTRAN; the parsing is consistent. A C program consists of sequences of tokens. Tokens are optionally separated by whitespace. However, unless the whitespace distinguishes between two interpretations, it is not distinct. In other words, @example x+++y;@{!x!=y;@} @end example @noindent is entirely equivalent to the following: @example x ++ + y ; @{ ! x != y ; @} @end example @noindent (Admittedly, neither does anything other than increment @samp{x}.) This is generally felt to be a great strength of C. Note that, in the above example, it @emph{would} have mattered if it had been written as @samp{x+ ++y;} instead. In this case, @samp{y} would have been incremented, and not @samp{x}. White space is significant only where it serves to distinguish between tokens. (It also affects the expansion of the @samp{__LINE__} macro, in some cases.) 
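The effect of the maximal munch rule on @samp{x+++y} and @samp{x+ ++y} can be checked with a small sketch. The structure @code{munch_result} and the two functions below are hypothetical names introduced only for this illustration.

```c
/* Holds the value of the expression and the operands afterwards. */
struct munch_result {
    int sum;
    int x;
    int y;
};

/* x+++y is read as the tokens  x ++ + y :
   the old value of x is added to y, then x is incremented. */
struct munch_result munch_plus(int x, int y)
{
    struct munch_result r;
    r.sum = x+++y;
    r.x = x;
    r.y = y;
    return r;
}

/* x+ ++y is read as the tokens  x + ++ y :
   y is incremented first, then added to x. */
struct munch_result spaced_plus(int x, int y)
{
    struct munch_result r;
    r.sum = x+ ++y;
    r.x = x;
    r.y = y;
    return r;
}
```

Called with 5 and 10, the first function yields a sum of 15 and leaves @code{x} at 6; the second yields a sum of 16 and leaves @code{y} at 11.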
@section Problems with macros @section when to use macros and when to use functions @section Trivia From this several non-obvious things follow: @itemize @bullet @item The code @example in\ t foo; @end example @noindent is a legitimate declaration of an int @code{`foo'}. There is no problem of parsing the word @code{`int'}. If there were tokens after the `@samp{\}', the code would not be legal. If there were white space after the `@samp{\}', the code might be legal - if the compiler defines a newline in the source environment to be any sequence of whitespace followed by an end of line. (For instance, a punched card environment.) @vindex comment @item It is an error for a comment to start in one file and end in another; comments are taken out in phase 3, before @code{#include} directives are processed. @end itemize @node History of C, History of this document, macros, Top @chapter History of C [FIXME: Do I really need something here ? What ?] @node History of this document, Bibliography, History of C, Top @chapter History of this document I want to help you to write great C programs by making this the best possible C manual. Tell me how I can better serve you by sending your comments on the organization and content of this manual. To: From: email address: Subject: YARMAC 1998-08-07 What are the best features of this manual ? Do you find the organization of this manual easy to follow? If not, why? What additions to this manual do you think would enhance the organization and content ? What deletions from this manual could be made without affecting the overall usefulness ? How would you improve this manual ? [Fixme: this isn't really part of the manual; it's intended to remind me of the goals for this manual, so people helping me revise it can help me push it closer to these goals, and so I can refuse the "help" of people who claim the passive voice sounds "more professional".] 
I wrote this in the active voice, rather than the passive voice, to make it easier to understand. In particular, I cannot write ambiguous sentences like ``X is evaluated.'' in the active voice. Many manuals confuse the reader because the reader doesn't know if *he* is supposed to evaluate X, if the compiler evaluates X, if the program cannot help but evaluate X at runtime, or if *he* must take special care to persuade the program to evaluate X at runtime. The passive voice is disliked by me. I attempt to use gender-neutral language. I try to emphasize the point-of-view of someone reading or writing C source code, completely ignoring the point of view of a compiler writer. I converted some of the manual to Reduced English (E-prime) . This sometimes makes things easier to understand. @ignore Original author: Peter Seebach and Richard M. Stallman Current maintainer: David Cary FIXME: Documentation for GNU utilities and libraries should be written in ``.texinfo'' format (as of 1998-05-17). Change log: 1998-08-29:DAV: fixed a bunch of broken "next node" links. 1998-08-23:DAV: re-organized slightly. 1998-05-26:DAV: Unfortunately, I have never learned to use Emacs. I used a variety of other editors to edit this file, including ``nedit'' and ``ms-word'', then used ``texi2html'' and ``Netscape Navigator'' to test and view this file, and ``makeinfo'' and ``info'' to test it again. 1998-05-26:DAV: renamed from ``C YARM'' to ``Yet Another Reference Manual About C (YARMAC)'' 1998-05-24:DAV: Merged RMS info into PS manual. Now ``Version 0.13''. 1998-05-21:DAV: David Cary took over maintenance. 1996:PS: Peter Seebach made VERSION 0.12 ``C - Yet Another Reference Manual'' (C YARM) 1994:PS: Peter Seebach @samp{seebs@@solon.com} started ``C - Yet Another Reference Manual'' 1991-02-28:CP: Chris Petrilli emailed RMS ``Outline for GNU C Reference Manual''. 1988-08-13:RMS: wrote ``C Reference Manual DRAFT''. 
I started with a copy of ``Yet Another Reference Manual'' from Peter Seebach:

>Date: Sun, 17 May 1998 12:26:43 -0500 (CDT)
>From: "S. Morningthunder"
>To: David Cary
>Subject: Other C reference manual and outline

Used outline from

>From: petrilli@albert.ai.mit.edu (Chris Petrilli)
>Date: Thu, 28 Feb 1991 18:56:46 WET
>To: rms@@albert.ai.mit.edu
>Subject: Outline for GNU C Reference Manual

Then merged in the (much larger) ``C Reference Manual DRAFT'' (C) 1988 Richard M. Stallman.

>Date: Sun, 17 May 1998 12:24:56 -0500 (CDT)
>From: "S. Morningthunder"
>To: David Cary
>Subject: RMS C reference manual

@end ignore

@section Trivia

@node Bibliography, another view, History of this document, Top
@appendix Bibliography

P. J. Plauger, in one of his columns, gave a bibliography with brief reviews of books he found interesting, important, or relevant. I found this to be a great enhancement of a bibliography. Lacking the background for good reviews, I'm leaving these as short paragraphs, covering my impression of the books I cover. These are, of course, my personal opinions, and the Free Software Foundation is not responsible.

GNU Coding Standards http://www.gnu.org/prep/standards_toc.html

_The C++ Programming Language, second edition_ book by Bjarne Stroustrup (c) 1991. Bjarne Stroustrup designed the C++ language.

_Texinfo: The GNU Documentation Format, for Texinfo version 3.12_ (c) 1998-02 http://www.delorie.com/gnu/docs/texinfo/texinfo_toc.html by Robert J. Chassell and Richard M. Stallman DJ Delorie

_Using and Porting GNU CC_ by Richard M. Stallman (C) 1996 Free Software Foundation, Inc. http://www.delorie.com/gnu/docs/gcc/gcc_toc.html

@node another view, Glossary, Bibliography, Top
@appendix The Compiler Writer's View

@section Declaration Syntax

[DVDEUG: Unclear and almost absurd - why is this here? Only a language lawyer could argue this - it doesn't help for learning anything.]

[DAV: Is this a more appropriate section for this paragraph ?
At first I thought this paragraph was incorrect.
It's certainly an interesting point of view. ]

C code consists solely of declarations.  `Normal' statements occur only as
part of the declaration of a function.

The overall syntax of a declaration is as follows:

@example
@var{type-and-storage-class} @var{initdecl}, @var{initdecl}, @dots{};
@end example

@cindex type specifier
@cindex basic type
@noindent
@var{type-and-storage-class} is a sequence of type specifiers and storage
class specifiers.  At least one specifier is required, but it may be
either kind.  The specifiers apply to all the identifiers declared in the
declaration.  The type described by the @dfn{type specifiers} is called
the declaration's @dfn{basic type}.  Only numeric types, structure types,
union types, enumerated types and @code{void} may be basic types.

Each @var{initdecl} names one identifier and describes its type in
relation to the basic type.  The @var{initdecl} also contains the
initializer, if any, for that identifier:

@example
@var{declarator}
@exdent or
@var{declarator} = @var{initializer}
@end example

@cindex declarator
A @dfn{declarator} names a variable or function being declared, and
describes its data type in relation to the declaration's basic type.

The simplest @var{declarator} is just a variable name (some people call
this a ``simple declarator'').  The variable then has the basic type as
its type.  For example,

@example
int x;
@end example

@noindent
declares @code{x} to have type @code{int}.

Nesting the following constructs produces more complex declarators
(some people call the result a ``complex declarator''):

@itemize @bullet
@item
@code{* @var{declarator}} for a pointer type.

@item
@code{@var{declarator}[@var{size}]} for an array type.
(@var{size} may be omitted in certain contexts.)

@item
@code{@var{declarator}()} for a function type with unspecified arguments.

@item
@code{@var{declarator}(@var{argtype},@var{argtype},@dots{})} for a
function type with specified argument types.  Each @var{argtype}
describes the type of one argument of a function.

@item
@code{(@var{declarator})} shows how a declarator can be enclosed in
parentheses.  This has no effect on its meaning.
@end itemize

Parentheses are needed when nesting the @code{* @var{declarator}}
construct with @code{@var{declarator}[@dots{}]} and
@code{@var{declarator}(@dots{})}, to avoid ambiguity between

@example
(* @var{declarator})[@var{size}]
@end example

@noindent
(a pointer to an array) and

@example
* (@var{declarator}[@var{size}])
@end example

@noindent
(an array of pointers).  Without parentheses,
@code{* @var{declarator}[@var{size}]} is equivalent to the latter.

@section A Rough Grammar

The following yacc-like grammar roughly describes the C language.  It is
not (yet) believed to be accurate in all respects.  In particular, note
that the preprocessor handles preprocessor directives before the parser
proper ever sees the program.  C was designed as the output language of a
macro processor; defining the combination as a single language introduces
some ugly problems.

[FIXME: There's no grammar at all here !]

@vindex statements
There are six kinds of statements in C; they are called @samp{labeled},
@samp{compound}, @samp{expression}, @samp{selection}, @samp{iteration},
and @samp{jump} statements.
[FIXME: which kind is the ``return'' statement ?
What about the ``switch'' statement ?]

@vindex labeled statement
A labeled statement is a statement preceded by a label; a label is an
identifier followed by a colon.  The statement part of a labeled statement
may be any other kind of statement.

@vindex compound statement
@vindex block
A compound statement, or block, consists of a declaration list (which is
0 or more declarations), followed by a list of statements (which is 0 or
more statements), inside a pair of braces, @samp{@{@}}.  An empty pair of
braces is a block.
A block introduces a new scope; variables declared in it may hide
variables outside it.

@vindex expression statement
An expression statement is the most common kind of statement; it consists
simply of an expression followed by a semicolon.  The expression is
evaluated for its side effects, and the result is discarded.  For instance,

@example
x = 3;
@end example

@noindent
is an expression statement; the expression `@code{x = 3}' is evaluated,
the side effects happen, and the resulting value (3, most likely) is
discarded.

@vindex null statement
A special case of the expression statement is the null statement, which is
a semicolon without an expression in front of it.  It is useful primarily
as a concise way of writing an empty block, or as something to put a label
in front of.

@vindex selection statement
Selection statements are the @samp{if} and @samp{switch} statements.  They
(in principle) select between alternative courses of action, based on the
value of an expression.

@vindex iteration statement
Iteration statements are the @samp{for}, @samp{while}, and @samp{do}
statements.  They cause a loop body to be executed some number of times,
dependent on an expression.

@vindex jump statement
Jump statements are @samp{goto}, @samp{continue}, @samp{break}, and
@samp{return}.  They cause control to move to another part of the program.

@section Trivia

@itemize @bullet
@item
According to ANSI, there is no precedence in C; instead, there are many
types of expressions.  This makes a parser (mostly) easier to generate.
Many C compilers are written in terms of a precedence grammar, but the
canonical (ISO standard) grammar does not have precedence.  The ANSI C
Standard specifies expressions in a hierarchy of kinds of expressions,
with the operators normally considered the ``highest precedence'' nearest
the leaves of the hierarchy.  Although the terminology is very different,
the net effect is identical to the (hopefully easier to understand)
``associativity and precedence system'' terminology in this reference
manual.
@end itemize

@node Glossary, term index, another view, Top
@appendix Glossary

@table @asis
@item as-if rule
The ``as-if rule'' states that it doesn't matter at all what a compiler
actually does, as long as the results are `as if' the specified behavior
had been followed.  This applies mostly to optimizations, and also to some
limited disregard for the specified order of translation phases.  For
example,

@example
int foo = 3;
foo = 4;
foo = 5;
@end example

@noindent
may simply set @code{foo} to 5; the program will still run `as if' the
intermediate values had been applied.  See also @samp{sequence point}.

@item bug
See also @samp{feature}.

@item built-in type
A type which is defined directly in the Standard, and not a derived type.
For instance, @samp{int} is a built-in type.
Contrast @emph{user-defined type}.
[FIXME: list all the built-in types here.  Does ``pointer'' technically
qualify as a built-in type, since pointers must always point to some other
type ?]
[DVDEUG: Pointers are derived types.]

@item expression
ANSI 6.3 says:

@quotation
An @emph{expression} is a sequence of operators and operands that
specifies computation of a value, or that designates an object or a
function, or that generates side effects, or that performs a combination
thereof.
@end quotation

@noindent
An expression followed by a semicolon forms an expression statement, one
of the six kinds of statements and the most common kind.

@item feature
See also @samp{bug}.

@item full expression
An expression which is not a part of a larger expression.  In this code,

@example
for(i = 0; i < 10; ++i, ++k)@{ @dots{} @};
@end example

@noindent
the @samp{i = 0} is a full expression.  @samp{++i, ++k} is also a full
expression.  However, the @samp{++i} here is not a full expression
(neither is @samp{++k}), because it is a part of a larger expression.

@item maximal munch rule
The rule that the longest sequence of incoming characters that can form a
token is taken as the next token.
For example, @samp{a+++++b} can only be meaningfully read as
@samp{a++ + ++b}, but the maximal munch rule ensures that it will be read
as @samp{a++ ++ +b}, which is a syntax error.  The fragment is thus
illegal.  It is important to note that the maximal munch rule selects the
longest available token, not the longest token with which the code can be
parsed as legal.

@item precedence
The rules that determine how expressions are broken down; for instance,
@samp{a * b + c} is broken down as @samp{(a * b) + c}, and not as
@samp{a * (b + c)}.  C does not actually have precedence, although it acts
like it.
[DVDEUG: WHAAA?]

@item sequence point
The key to the as-if rule.  A point at which everything must be settled;
all previous side effects must be resolved, all further side effects must
not have started yet.  Note that the as-if rule allows some judicious
ignoring of sequence points in special cases, but they are the primary
rule to which it must adhere.  See also @samp{as-if rule}.

@item standard library
The set of functions specified by ANSI to be available to a program; they
provide the tools needed for a large number of tasks, and provide building
blocks for large programs.  Not originally part of the language, the
standard library developed and evolved as a tool for programmers in the
real world - which has led to a few incompatibilities in older systems,
but has helped weed out ill-considered features.

@item translation phase
One of the canonical phases in which translation from source text to
executable code occurs.  Primarily relevant, not because a compiler will
necessarily implement them as phases, but because they describe the order
in which things must @emph{appear} to happen.
See also @samp{as-if rule}.

@item trigraph
One of the most visible mistakes of the ANSI committee; a mechanism by
which machines with a limited character set can supposedly run C.
The original intent was to provide for languages or character sets in
which the values commonly used for @samp{@{} and @samp{@}} were used
instead for accented characters, and for some systems, like mainframes,
with unusual character sets.  Unfortunately, trigraphs are hardly readable
and can change valid code - for example @code{"What??!"} becomes
@code{"What|"}.  With the advent of 8-bit character sets as a common
thing, the need for trigraphs has almost disappeared, but trigraphs
haven't.  They are still useful occasionally, but it is widely regretted
that they are so ugly.

@item user-defined type
Any type defined by the user, most often via @samp{typedef}, or through
the creation of a @samp{struct}, @samp{union}, or @samp{enum}.
[DVDEUG: Okay, but where do the language defined structs come in?]

@item address
[FIXME]

@item value
This manual uses the term ``value'' in only one sense.
A @dfn{value} is [FIXME: what ?].
At the lowest level, a value is a block of information, either in memory
(``the value of a variable'') or in the CPU registers (``an intermediate
value'').  In C, every value has a type.
@end table

@node term index, keystroke index, Glossary, Top
@unnumbered Term Index
@printindex vr

@node keystroke index, function index, term index, Top
@unnumbered Keystroke Index
@comment node-name, next, previous, up
@printindex ky

@node function index, variable index, keystroke index, Top
@unnumbered Function Index
@comment node-name, next, previous, up
@printindex fn

@node variable index, program index, function index, Top
@unnumbered Variable Index
@comment node-name, next, previous, up
@printindex vr

@node program index, data type index, variable index, Top
@unnumbered Program Index
@comment node-name, next, previous, up
@printindex pg

@node data type index, Concept Index, program index, Top
@unnumbered Data Type Index
@comment node-name, next, previous, up
@printindex tp

@ignore
_Computation Structures_
by Stephen A. Ward and Robert H. Halstead, Jr.
(c)1990 by The Massachusetts Institute of Technology The MIT Press ... A1 The C Language: A brief overview -- "C is a relatively low-level language, designed to be easily translated to efficient object code for many modern computers. Unlike LISP, for example, the semantics of C strongly reflect the assumption of compiled (rather than interpreted) implementations and a bias towards simplicity and efficiency of the translated program rather than flexibility and generality of the source language." "A1.1 Simple Data Types and Declarations". char (8 bit) short(16 bit) int (equivalent to short in some C implementations, and to long in others)[FIXME: mention the file] long (32 bit) "Every variable in C has an associated compile-time type." "the "= 13" clause specifies an initial value; if absent, the initial value will be random. Note the use of "/* ... */" syntax to incorporate comments into the source program." "Typed /pointers/ can be manipulated in C. A pointer to a datum is represented at run time by a word containing the machine address of that datum; if the datum occupies multiple (consecutive) bytes, the pointer contains the /lowest/ address occupied by the data to which it points. Pointer variables are declared using one or more asterisks preceding the variable name. Thus the declaration long a, *b, **c; notifies the compiler that the variable a will have as its value a 32-bit integer; b will have as its value a /pointer/ to a 32-bit integer (that is, the address of a memory location containing a long integer), and c's value will be a pointer to a location containing a pointer to a long integer." "A1.2 Expressions" Table A1.1 C expressions and operators Expression Value ------------ ------- a+b Addition a-b Subtraction -a negative [FIXME: is this always 2's complement ?] a*b Multiplication a/b Division a%b Modulus (remainder from a/b) Comparison operators: return bool true or false. 
(a) value of a; parentheses used for grouping ab True if a is greater than b; else false a<=b Less-than-or-equal-to comparison a>=b greater-than-or-equal-to comparison a==b Equal-to comparison (don't confuse with assignment =) a!=b Not-equal-to comparison !a True if a is false (zero); Boolean not. a&&b wordwise AND: false (zero) if either a or b is false (zero) a||b wordwise OR: true if either a or b is true (nonzero). ~a bitwise complement of a a&b bitwise AND a|b bitwise OR a^b bitwise exclusive OR (don't confuse with exponentiation) a>>b integer a shifted right b positions a<c Component c of structure pointed to by p sizeof(x) Size, in bytes, of the representation of x. (many operators missing from this table; +=, |=, &=, etc.) "In this table, a, b, f, and p may be replaced by valid expressions, while x must be a variable (since the storage location associated with it is referenced)." c is the name of a component of a structure. "simple expressions such as 3*(x=x+1) may have side effects as well as values." ... Table A1.5 Summary of C Statements Statement type Use /expr/; Evaluate expression /expr/ if(/test/) /statement/; Conditional test else /statement/; optionally follows @code{if} statement switch(/expr/){ N-way branch (dispatch) case /C1/: ... (each /Ci/ is a constant) case /C2/: ... default: ... } while(/test/) /statement/; Iteration for(/init/; /test/; /incr/) /statement/; return /expr/; Procedure return; /expr/ is optional break; Break out of loop or @code{switch} continue; Continue to next loop iteration /tag/: Define a label for @code{goto} goto /tag/; transfer control. ... A1.5 Structures In addition to arrays, which have homogenous type but variable size, C supports /structures/, which are fixed-size records of heterogenous data. ... struct Employee { char * Name; /* Employee's name */ long Salary; /* Employee's salary */ }; This fragment describes a structure type and gives it the name @code{Employee}. 
Subsequent declarations can treat @code{struct Employee} as a valid C
type; for example,

    struct Employee Payroll[100];

declares an array of 100 structures, each devoted to some employee's
record. ... a structure declaration provides the compiler with a prototype
for the interpretation of blocks of memory. ... The salary of the fourth
employee, for example, is referenced by @code{Payroll[3].Salary}. ...
We might, for example, wish to expand our little payroll database to
include the supervisor of each employee:

    struct Employee {
        char * Name;                    /* Employee's name */
        long Salary;                    /* Employee's salary */
        struct Employee * Supervisor;   /* Employee's boss */
    };
    struct Employee Payroll[100];

... While it is perfectly legal in C to declare a structure component that
is itself a @code{struct} type, the result is that the containing
structure type is enlarged by the size of the contained structure.  If the
supervisor component were declared to be type @code{struct Employee}
rather than a pointer to a @code{struct Employee}, C would complain. ...
We circumvent such difficulties by referencing the supervisor through a
pointer to the appropriate payroll record. ...
Since access to structures through pointers is so common, C provides the
special syntax "/p/->/cname/" to reference component /cname/ of the
structure that /p/ points to.  Use of structure pointers is illustrated by
the silly program in figure A1.6. ...

Figure A1.6 Use of structures.

    struct Employee {
        char * Name;                    /* Employee's name */
        long Salary;                    /* Employee's salary */
        long Points;                    /* Brownie points */
        struct Employee * Supervisor;   /* Employee's boss */
    };
    struct Employee Payroll[100];
    ...
    /* Annual raise program */
    Raise( struct Employee p[100])
    {
      int i;
      for(i=0; i<100; i=i+1){           /* consider each employee */
        p->Salary =                     /* Salary adjustment */
            p->Salary
            + 100                       /* cost-of-living term, */
            + p->Points;                /* merit term */
        p->Points = 0;                  /* Start over next year ! */
        p = p+1;                        /* On to next record !
                                        */
        Check(p);                       /* Make sure no disparities.*/
      }
    }

    /* Make sure employee is getting less than boss: */
    Check( struct Employee * e          /* pointer to record */ )
    {
      if( 0 == e->Supervisor ){         /* Ignore the president */
        return;                         /* (pres. has no boss).*/
      }
      if( e->Salary < (e->Supervisor)->Salary ){ /* Problem here ? */
        return;                         /* Nope, supervisor is happy */
      };
      /* When e's boss is making no more than e is, give the boss a raise,
         then check that boss's new salary causes no additional problems: */
      (e->Supervisor)->Salary = 1 + e->Salary; /* Now boss makes more. */
      Check(e->Supervisor);             /* check further */
    }

[DAV: this seems broken -- the final salaries depend not just on the
"real" data in the employee structure, but also on the arbitrary ordering
of employees.  If we assume the president is p[0] and every employee is
listed *after* his supervisor, then things are OK. ]
}
@end ignore

the much more readable keywords B. Stroustrup suggested be used instead.
(I list them in daves.h; recently that list has become part of the C++
standard and compliant compilers should let you
	  #include <iso646.h>
	
)
[FIXME] [DVDEUG: How do we clean this up? It's illegible, at least when
printed.]

@example
#include <iso646.h>  /* for and, or, bitor, etc.; added with Amendment 1 */

#define and     &&   /* keyword in C++ */
#define and_eq  &=   /* keyword in C++ */
#define bitand  &    /* keyword in C++ */
#define bitor   |    /* keyword in C++ */
#define compl   ~    /* keyword in C++ */
#define not     !    /* keyword in C++ */
#define not_eq  !=   /* keyword in C++ */
#define or      ||   /* keyword in C++ */
#define or_eq   |=   /* keyword in C++ */
#define xor     ^    /* keyword in C++ */
#define xor_eq  ^=   /* keyword in C++ */
@end example

The standard header @code{<iso646.h>} provides readable alternatives to
certain operators or punctuators.

``The ANSI C Standard'' is the common name for ANSI X3.159-1989, American
National Standard for Information Systems - Programming Language - C.

filename extensions
.c .c++ .C .cc .cxx .cpp .h .H .hh

comments and administrative information

human-readable text in source code:
students look for illuminating comments in source code and human-readable
strings in the source code

``If you'd like to know more about writing programs in C, you can't do
much better than reading _The C Programming Language (Second Edition)_ by
Brian W. Kernighan and Dennis M. Ritchie.  For those of you who wish to
voyage a little further, _Reusable Data Structures for C_ by Roger
Sessions is well worth perusing.''
-- recc. Clive ``Max'' Maxfield http://www.maxmon.com/
in _Bebop to the Boolean Boogie_ p. 331.

General rule for end-squiggly-brackets (``@}'') and semicolons (``;''):
Never put a semicolon after the end-squiggly-bracket at the end of a
function.
In a do@{@}while() statement, the semicolon goes after the closing
parenthesis of the while(), not after the end-squiggly-bracket.
Always put a semicolon after the end-squiggly-bracket that ends a struct,
union, or enum declaration or a brace-enclosed initializer.
After the end-squiggly-bracket of any other block a semicolon is
unnecessary, and between an if block and its else it is a syntax error.
A top-down view
``filter'' ``sponge'' ``interactive''

@node Concept Index, , data type index, Top
@unnumbered Concept Index
@comment node-name, next, previous, up
@printindex cp

@summarycontents
@contents
@comment end c_reference_manual.texinfo, maintained by David Cary
@bye

@ignore
.texinfo snippets

Only visible in source file !

@chapter
@section
@subsection

Write a command such as @noindent at the beginning of a line as the only
text on the line.  (@noindent prevents the beginning of the next line from
being indented as the beginning of a paragraph.)

@iftex
Only visible in TeX typesetting !
@end iftex

@ifinfo
Only visible in the Info file !
@end ifinfo

@c Only visible in source file !
@comment Only visible in source file !

What is the difference between @code{void} vs. @samp{void} ?

@itemize @bullet
@item
@item
@end itemize

@enumerate
@item
@item
@item
@end enumerate

... ellipses @dots{}

warning: if you use fread() to read from a file, and you ask it to read
*all* the remaining data in the file, feof() returns false (!), even
though there is no more data in the file to be read.  Only after fread()
tries to read *beyond* the end of the file does feof() return true.
-- Paul

functions have inputs and outputs. [needs example]
Since C is pass-by-value, obviously any "int" or "double" you call a
function with must be an *input* to that function.
The return value of a function is one of its outputs.
Often a single output is inadequate, and one wants to return more stuff --
then what ?
use & in call, * in function definition ... to tell the function where to
put its outputs.
...
One more idiom: Sometimes people use const ... * for some *inputs* to a
function (typically items that are large structures and arrays), since
passing a pointer is much faster than copying the entire item.  The const
indicates that this is really an input to the function, and forbids the
function from changing it.

[FIXME:]
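
[Possible example for the ``functions have inputs and outputs'' note
above.  This is only a sketch: the names divide() and sum() are made up
for illustration and do not come from any library.  It shows plain values
as inputs, pointers as extra outputs (``& in call, * in function
definition''), and a const pointer for a large read-only input.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example functions; the names are illustrative only. */

/* Inputs:  dividend and divisor, passed by value.
   Outputs: *quotient and *remainder, written through the pointers.
   Returns 0 on success, or -1 if divisor is zero (outputs untouched). */
int divide(int dividend, int divisor, int *quotient, int *remainder)
{
    assert(quotient != NULL && remainder != NULL);
    if (divisor == 0)
        return -1;
    *quotient = dividend / divisor;
    *remainder = dividend % divisor;
    return 0;
}

/* Input: an array passed as a const pointer, so nothing is copied and
   the function cannot modify the caller's data.
   Output: the return value, the sum of the first count elements. */
long sum(const int *values, size_t count)
{
    long total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A caller declares ``int q, r;'' and writes ``divide(17, 5, &q, &r);'',
after which q holds 3 and r holds 2.  Writing ``values[0] = 0;'' inside
sum() would draw a compiler diagnostic because of the const.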
Ghostscript uses ANSI syntax for function definitions. Because of this, when compiling with cc, it must preprocess each .c file to convert it to the older syntax defined in Kernighan and Ritchie, which is what most current Unix compilers (other than gcc) support. This step is automatically performed by a utility called ansi2knr, which is included in the Ghostscript distribution. The makefile automatically builds ansi2knr.
-- from make.txt, which you get with a Ghostscript installation.
[Should I mention the URI link to the "GNU Ghostscript" version of
ansi2knr ?]

ANSI C programs can be interpreted in the CH environment:
http://iel.ucdavis.edu/CH/
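
[Possible demonstration of the fread()/feof() warning quoted earlier in
these notes.  A sketch only: probe_feof() is a made-up name, and the
behaviour shown is what typical implementations do when a read consumes
exactly the remaining bytes.  It uses tmpfile() so that it needs no
particular file on disk.]

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write 4 bytes to a temporary file, read exactly
   those 4 bytes back, and record what feof() reports before and after
   one further read attempt.  Each result is stored as 0 or 1.
   Returns 0 on success, -1 if the temporary file could not be used. */
int probe_feof(int *after_exact_read, int *after_extra_read)
{
    char buf[4];
    FILE *fp;

    assert(after_exact_read != NULL && after_extra_read != NULL);
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fwrite("abcd", 1, 4, fp) != 4) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    if (fread(buf, 1, 4, fp) != 4) {
        fclose(fp);
        return -1;
    }
    *after_exact_read = (feof(fp) != 0);  /* all data read, no EOF seen yet */
    (void)fread(buf, 1, 1, fp);           /* attempt to read past the end */
    *after_extra_read = (feof(fp) != 0);  /* now the EOF indicator is set */
    fclose(fp);
    return 0;
}
```

The practical consequence: loop on the count that fread() returns, and
consult feof() or ferror() only after a short read, rather than testing
feof() before each read.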

The November, 1997 Draft Specification for C98
http://plg.uwaterloo.ca/~cforall/C9X/

information on "Makefile Basics"
http://www.gnu.org/prep/standards_41.html

variables not local to a function:
  static: local to this module (file)
    (notice that this word means something completely different when
    modifying a variable local to a function)
  extern: global.

the April 28, 1995 Working Paper of ISO Working Group WG21
ftp://research.att.com/dist/c++std/WP/CD1/

lots of info on the C programming language
http://www.lysator.liu.se/c/

Doug Gwyn's
http://www.lysator.liu.se/c/iso646.h

ansi2knr - Convert ANSI C programs to traditional
("Kernighan & Ritchie") C.
is available via http://www.cs.wisc.edu/~ghost/
as ftp://ftp.cs.wisc.edu/ghost/ansi2knr.c

char* something; // Nothing is const. const char* something; // content is const. char* const something; // pointer is const. const char* const something; // Everything is const. Obviously, as the pointer itself is just a variable, it makes no sense to require it being const. Thus protecting the content pointed to is sufficient.
-- http://www.gamers.org/dEngine/r3D/coding.html
[DAV: huh ? shouldn't the const come _before_ the "*" ?]

stick with GNU getopt/getopt_long.  This parses arguments in a manner
compatible with the GNU coding standards and compatible with Unix.
Additionally it is very simple to use.
or use argp
http://www.gnome.org/devel/start/argp.shtml

Joe maintains the FAQ list for the GNU C++ compiler.
http://www.synopsys.com/news/pubs/research/people/jbuck.html

http://egcs.cygnus.com
"egcs is an experimental step in the development of GCC, the GNU C
compiler."

comp.lang.c Hypertext C-FAQ
http://www.eskimo.com/~scs/C-faq/top.html

"GNU Coding Standards"
http://www.gnu.ai.mit.edu/prep/standards_toc.html
by Richard Stallman

Programming in C
http://www.lysator.liu.se/c/
lots of info about the ANSI C standards, a public-domain version of
"#include ", tutorials, reviews of books on C, detailed technical
comments.
I kind of like the
http://www.lysator.liu.se/c/pikestyle.html
to which it points.  (Like me, he prefers lowercase).

Simple rule: include files should never include include files.
[-- Rob Pike http://www.lysator.liu.se/c/pikestyle.html ]

_The CWEB System of Structured Documentation_
book by Donald E. Knuth
http://sunburn.stanford.edu/~knuth/cweb.html
"CWEB is a version of WEB for documenting C, C++, and Java programs. ...
Thus CWEB combines TeX with today's most widely used professional
programming languages."

http://www.vendian.org/mncharity/ccode/grammar/
http://www.vendian.org/mncharity/ccode/
Some html-ized C language grammars.
Some PERL regular expressions for preprocessor-level parsing of C source
code.
A sloppy diff between ANSI and C9X grammars.

The Motor Industry Research Association, the MISRA Consortium.
http://www.misra.org.uk/
sells "Guidelines for the Use of the C Language in Vehicle Based Software"
for £35 / copy.
Has some other "Development Guidelines for Vehicle Based Software" that
are available for download.
"(MISRA C is also called "Safer C")" Simple Vector Library by Andrew Willmott http://pecan.srv.cs.cmu.edu/afs/cs/user/ph/www/859E/src/svl/doc/svl.html http://pecan.srv.cs.cmu.edu/afs/cs/user/ph/www/859E/src/svl/ SVL vector & matrix package SVL provides 2-, 3- and 4-vector and matrix types, as well as arbitrarily-sized vectors and matrices, and various useful functions and arithmetic operators. in C++. arg_parse(3) by Paul Heckbert http://pecan.srv.cs.cmu.edu/afs/cs/user/ph/www/859E/src/libarg/arg_parse.text http://www.cs.cmu.edu/afs/cs/user/ph/www/859E/src/libarg/ "source code to an argument parser with lots of nifty features. It's written in ANSI C, but it should link with C++ code with no trouble." "It is hoped that use of arg_parse will help standardize argument conventions and reduce the tedium of adding options to programs." @end ignore