Basic constructs (C Plus Plus)

From LiteratePrograms


This page describes some basic constructs of the C++ programming language.


Functions

A function generally has the following structure:

return type function name(argument list)
{
  function body
}
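
As a concrete instance of this structure, here is a minimal sketch. The function name add and its two parameters are hypothetical and chosen only for illustration.

```cpp
// Return type: int; function name: add; argument list: two ints.
int add(int a, int b)
{
  // Function body: compute the result and return it to the caller.
  return a + b;
}
```

A call such as add(2, 3) evaluates the body with a = 2 and b = 3 and yields the returned int.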

Each program has at least one function:

<<function name>>=
main

The function main is called at program startup. As arguments it gets the number of arguments and a list of the program arguments. Those are traditionally named argc and argv, but that's just convention, not demanded by the language.

<<argument list>>=
int argc, char* argv[]

Note that if the program doesn't need program arguments, the argument list may also be empty instead.

The return type of the main function is always

<<return type>>=
int

The returned integer is given back to the operating system in order to indicate success or failure. Returning 0 always indicates success:

<<function body>>=
return 0;

Note that for the main function (and only for the main function), not having a return statement is equivalent to returning 0.

The main function is the only thing which absolutely must be written. Therefore this function alone, assembled from the pieces above, is already a complete program (which in this case does nothing).


Includes

A typical C++ program needs to use libraries (including the C++ standard library), and usually consists of more than one file. To make declarations visible in more than one file, they are written in separate files, usually called headers. The naming of header files is completely arbitrary as far as the compiler is concerned (although some editors automatically switch to an appropriate C++ editing mode if the file has a recognized extension). Typically header files get one of the extensions ".h" (like C headers), ".H", ".hh", ".hpp", ".hxx" or ".h++" (note that some file systems don't allow the last one as a file name). The exception is the standard library headers, which don't have an extension at all (but for compatibility, the C headers are also available in the traditional ".h" form).

Header files are included with the preprocessor directive #include. For standard headers (or headers provided by the system), the file name is enclosed in angle brackets, which causes the compiler only to search the system directories. For self-written headers, the name is enclosed in quotes. Note that include directives provide pure textual inclusion, therefore they should not be used inside any program construct (unless the header is specially designed for that, but that's extremely unusual).

By convention, the include directives are usually collected at the beginning of the C++ file:

includes of example 2

rest of example 2

The example program unconditionally tells the operating system that the program has failed. For that, there exists a special value, EXIT_FAILURE. Note that this is the only truly portable way to signal failure; different operating systems have different conventions to indicate success or failure.

<<rest of example 2>>=
int main()
{
  return EXIT_FAILURE;
}

The value EXIT_FAILURE is defined in the header cstdlib (and the corresponding C header stdlib.h). Therefore this header must be included:

<<includes of example 2>>=
#include <cstdlib>

The following example program implements a helper function failure (which just returns EXIT_FAILURE) in a separate file. The complete program consists of three files: The header file declaring the function, the implementation file defining it, and the main file calling it.

Since header files might be included (indirectly) several times into the same main file, but certain constructs may not be repeated, header files should always get an include guard (although in this simple case, it would not be needed):


failure.h contents


The include guard tests if a header-specific macro is defined, and only if not, passes the rest of the header to the compiler. The first thing it does in that conditional part is to define the macro, so if the same header is included again, the macro will be defined. Note that for this to work, the macro name may not be used otherwise, therefore it should be derived from the file name.
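
A minimal sketch of such an include guard follows. The macro name FAILURE_H_INCLUDED and the type failure_info are assumptions made for illustration, and the header text is written out twice only to simulate the header being included twice into the same file.

```cpp
// First inclusion: the macro is not yet defined, so the contents are compiled.
#ifndef FAILURE_H_INCLUDED
#define FAILURE_H_INCLUDED

int failure();                      // a declaration; repeating it would be harmless
struct failure_info { int code; };  // hypothetical type; a definition must not be repeated

#endif

// Second inclusion: the macro is defined now, so everything up to #endif is skipped.
#ifndef FAILURE_H_INCLUDED
#define FAILURE_H_INCLUDED

int failure();
struct failure_info { int code; };

#endif
```

Without the guard, the second definition of failure_info would be rejected by the compiler as a redefinition.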

In this case, the header just forward declares the function failure, which takes no arguments and returns an int:

<<failure.h contents>>=
int failure();

The function is implemented in the file

<<>>= includes

function failure implementation

By convention, the first thing an implementation file should do is to include the corresponding header. For one, inclusion of the header is usually necessary anyway to get definitions from there. In addition, including it first gives a nice test that the header is self-contained.

<< includes>>=
#include "failure.h"

The implementation of the function is quite simple: It just (again) returns EXIT_FAILURE.

<<function failure implementation>>=
int failure()
{
  return EXIT_FAILURE;
}

To make EXIT_FAILURE available, this file must also include cstdlib.

<< includes>>=
#include <cstdlib>

The main program is implemented in the file

<<>>= includes

example 3 main function

The main function just calls failure to get the value to return:

<<example 3 main function>>=
int main()
{
  return failure();
}

To do so, the header failure.h has to be included:

<< includes>>=
#include "failure.h"

Note that there's no need to include cstdlib, since EXIT_FAILURE isn't used in this file.

The program example3 of course does exactly the same as example2: just report failure to the operating system.

Basic Input and Output

Input and output in C++ are done with streams. The standard output stream is std::cout. Output is done with the << operator. A hello world program could therefore read:

<<>>= includes

int main()
{
  std::cout << "Hello world!" << std::endl;
}

The manipulator std::endl writes a newline and then flushes the stream, which makes sure the line is actually output.

The stream std::cout is found in the standard header iostream:

<< includes>>=
#include <iostream>
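
The << operator can be chained, and it accepts values of different built-in types in the same statement. The following sketch illustrates this; the function name describe is hypothetical, and a std::ostringstream (from the standard header sstream) stands in for std::cout so that the result can be inspected as a string.

```cpp
#include <sstream>
#include <string>

// Builds one line of text with chained << operators.
// Any output stream, including std::cout, is used in the same way.
std::string describe(int count)
{
  std::ostringstream out;
  out << "There are " << count << " items" << "\n";
  return out.str();
}
```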

Input is done by using the >> operator on std::cin, which is found in the same standard header. The following program reads an integer and then prints it again:

#include <iostream>

int main()
{
  ask for integer
  read integer
  output message about integer
}

First, we output a message that we want an integer:

<< ask for integer>>=
std::cout << "Please enter an integer value: ";

Then we read an integer. For this, we need an integer variable. Note that in C++ a common style rule is that a variable should be defined as close to its initialization as possible. In the ideal case, it would be initialized directly at the definition, but for input that's not possible. Instead, we defer the definition until directly before the input statement:

<< read integer>>=
define integer variable
read integer into variable

The variable is named value and is of type int. It is defined as follows:

<< define integer variable>>=
int value;

Finally we can read the integer value.

<< read integer into variable>>=
std::cin >> value;

The stream std::cin is tied to the stream std::cout, so that the text of the previous output statement is guaranteed to be output before waiting for the input.

Finally, we output a message telling which value was input.

<< output message about integer>>=
std::cout << "The number you typed was " << value << std::endl;
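
The same steps work on any input and output stream, which makes them easy to try without a terminal. The sketch below is an illustration under assumptions: the function names ask_and_echo and echo_for are hypothetical, string streams replace std::cin and std::cout, and the check for failed input is an addition that the program above does not perform.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Same steps as the program above, but on arbitrary streams.
void ask_and_echo(std::istream& in, std::ostream& out)
{
  out << "Please enter an integer value: ";
  int value;
  if (in >> value)   // the extraction fails if the input is not an integer
    out << "The number you typed was " << value << std::endl;
  else
    out << "That was not an integer." << std::endl;
}

// Hypothetical helper: feeds the given text to ask_and_echo as if it had been typed.
std::string echo_for(const std::string& typed)
{
  std::istringstream in(typed);
  std::ostringstream out;
  ask_and_echo(in, out);
  return out.str();
}
```

Calling ask_and_echo(std::cin, std::cout) gives the interactive behaviour of the original program.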

Variables and types

Variable definition

In C++, a variable is typically defined by writing its type followed by the variable name and a (usually) optional initialization. For example, the following defines a variable named v of type int, which gets initialized with 3:

<<define v as int and initialize with 3>>=
int v(3);

There's another syntax, called copy initialization, which often looks more natural, but isn't always allowed (and in some cases has subtly different meaning, but not for simple types like int):

<<define v2 as int and copy-initialize with 3>>=
int v2 = 3;

Sometimes a variable initialization can have more than one argument, e.g. the following defines a double-precision complex number variable named z with the initial value (2+3i):

<< includes>>=
#include <complex>

<<define c as std::complex<double> and initialize it with (2+3i)>>=
std::complex<double> z(2,3);

However, some types don't follow the simple rule above. One example is the array type, where the size of the array goes to the right of the variable name:

<<define primes as array of 10 ints and initialize it with the first 10 primes>>=
int primes[10] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };

Note that if an initialization is given, the compiler can figure out the length by itself:

<<define array a without explicitly giving length>>=
int a[] = { 1, 2, 3, 4, 5 };
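
When the length is deduced like this, it can be recovered from the sizes involved. The following is a minimal sketch; the names a_length and sum_of_a are hypothetical.

```cpp
#include <cstddef>

int a[] = { 1, 2, 3, 4, 5 };   // length deduced from the initializer

// Number of elements: size of the whole array divided by the size of one element.
const std::size_t a_length = sizeof(a) / sizeof(a[0]);

// Adds up all elements of a.
int sum_of_a()
{
  int sum = 0;
  for (std::size_t i = 0; i < a_length; ++i)
    sum += a[i];
  return sum;
}
```

Note that this only works where the array itself is visible; a pointer to the first element no longer carries the length.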

Variable definitions can appear either at global scope or at function scope:

<<>>= includes

// variables at global scope
define v as int and initialize with 3
define v2 as int and copy-initialize with 3
define primes as array of 10 ints and initialize it with the first 10 primes

void f()
{
  // variables at function scope
  define c as std::complex<double> and initialize it with (2+3i)
  define array a without explicitly giving length
}

Global variables exist as long as the program runs, whereas function scope variables only exist during the execution of the function. However, the lifetime of a local variable can be extended by prefixing its definition with static. In that case, initialization occurs the first time the definition is executed, and the variable keeps its value between calls:

int remember(int init)
{
  static int var = init;
  return var;
}

The main function demonstrates this by calling the function twice with different values, and showing the result each time:

<< includes>>=
#include <iostream>

int main()
{
  std::cout << remember(42) << "\n";
  std::cout << remember(39) << "\n"; // outputs 42 again
}
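
A common use of a static local variable is to keep state between calls. The following sketch uses a hypothetical function next_id that counts how often it has been called.

```cpp
// Returns 1 on the first call, 2 on the second, and so on.
int next_id()
{
  static int counter = 0;  // initialized only once, on the first call
  ++counter;               // the value survives until the next call
  return counter;
}
```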

Built-in types

C++ has a variety of built-in types. The following variable definitions show all basic types of C++:

<<basic types>>=
char c = 'a';  // character type, guaranteed to have size of one byte
char c2 = 48;  // the type char actually is a numeric type (on systems using ASCII, this initializes c2 with '0')
signed char sc = -3; // one byte, signed (i.e. can take negative values)
unsigned char uc = 3; // one byte, unsigned (i.e. all values are non-negative)
// note that the type char has the same representation and value range as either signed char or unsigned char,
// but is a distinct type from both

wchar_t wc = L'a'; // wide character type

short s = -300; // signed short integer (there is no literal suffix for short); same type as signed short, short int, signed short int
unsigned short us = 300; // unsigned short integer; same type as unsigned short int

int i = -1000; // signed integer; same type as signed int, signed
unsigned u = 1000u; // unsigned integer; same type as unsigned int

long l = -100000L; // long signed integer; same type as long int, signed long, signed long int
unsigned long ul = 100000UL; // long unsigned integer; same type as unsigned long int

bool b = false; // boolean variable; can only take the values false and true (but implicit conversion from/to int exists)

float f = 1.0f; // floating point variable
double d = 1.0; // floating point variable (usually double precision)
long double ld = 1.0L; // floating point value (at least same precision as double, sometimes more)

// the following types were for a long time only a very common extension (and part of C99);
// they have been part of the C++ standard since C++11
long long ll = -1000000000LL; // very long integer, same as long long int, signed long long, signed long long int
unsigned long long ull = 1000000000ULL; // unsigned version of long long
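
The sizes of most of these types depend on the implementation, but a few relations hold everywhere. The sketch below checks some of them; the function name size_guarantees_hold is hypothetical, and CHAR_BIT (the number of bits in a byte) comes from the standard header climits.

```cpp
#include <climits>

// Relations between the built-in integer types that hold on every implementation.
bool size_guarantees_hold()
{
  return sizeof(char) == 1             // by definition, char occupies one byte
      && sizeof(signed char) == 1
      && sizeof(unsigned char) == 1
      && sizeof(short) <= sizeof(int)  // each type is at least as large as the previous one
      && sizeof(int) <= sizeof(long)
      && CHAR_BIT >= 8;                // a byte has at least 8 bits
}
```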

In addition, there's the type void which is special in that you cannot define variables of that type. However you can use it as return type, and in the following type constructs (with the exception of references and arrays).

For each C++ type T with the exception of reference types (see below), you can create any of the following types:

  • T const = constant T; you cannot modify objects of this type
  • T volatile = volatile T; objects of this type may be externally visible/modifiable
  • T* = pointer to T
  • T[n] (n any constant integer) = array of n Ts
  • T& = reference to T; gives a new name to an existing object

Note that the pointer type (T*) is a built-in type even if T is not. Also note that for array types, variable names as well as further type specifiers go in between T and [n]. In some cases parentheses are put around that part in order to remove ambiguities.

The following shows a few examples:

basic types
int const ci = 3; // compile time constant
int const ci2 = some_function(); // constant, but initialized at run time
int volatile vi; // This may be visible from outside the program
extern int const volatile cvi; // The program may not change it, but external processes might; declared here, defined elsewhere
int* pi = &i; // pointer to int, initialized to point to i
int const* pci = &i; // pointer to constant int, initialized to point to i; cannot be used to change i
int* const cpi = &i; // constant pointer to int, initialized to point to i; can be used to change i,
                     // but cannot be reset to point somewhere else
int const* const cpci = &i; // constant pointer to constant int, pointing to i
int const* pci2 = &ci; // pointer to constant int can (but need not) point to actually constant int
void* pv = pi; // pointer to void pointing to i; stores address, but loses type information
void const* pcv = pci; // constness matters also for pointer to void
int ai[3] = { 1, 2, 3 }; // array of 3 ints
int* api[3]; // array of 3 pointers to int
int (*pai)[3]; // pointer to array of 3 ints
int* (*papi)[5]; // pointer to array of 5 pointers to int
int a2[3][4]; // two-dimensional array; actually an array of 3 arrays of 4 ints each
int& ri = i; // gives a new name to i
int const& rci = ai[1]; // gives a name to the second member of the array ai;
                        // the name cannot be used to modify ai[1].
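
The difference between a pointer to constant int and a constant pointer to int is easy to mix up. The following sketch (with a hypothetical function pointer_demo and local variables i and j) shows which operations each of them permits.

```cpp
// Demonstrates what can and cannot be done through each kind of pointer.
int pointer_demo()
{
  int i = 1;
  int j = 5;

  int const* pci = &i;  // pointer to constant int
  int* const cpi = &i;  // constant pointer to int

  *cpi = 2;             // allowed: i may be changed through cpi
  pci = &j;             // allowed: pci may be redirected to another int
  // *pci = 3;          // error: an int cannot be modified through pci
  // cpi = &j;          // error: cpi cannot be redirected

  return i + *pci;      // i is now 2, and pci points to j, which is 5
}
```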

In addition to that, there are function types, of the form return type(arguments). While you cannot define variables of that type (the corresponding syntax defines functions), you can create pointers and references to functions, and those can be used to define variables. As with arrays, the function name or further type specifiers go in the middle, just after the return type:

void foo(int);          // not a variable, but a function declaration
void (*pf)(int) = foo;  // a function pointer pointing to foo
void (**ppf)(int) = &pf; // a pointer to pointer to function, pointing to pf
void (&rf)(int) = foo;  // a function reference to foo; seldom used,
                        // works like a const pointer
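
Function pointers are mostly used to pass a function to another function. The following is a small sketch; the names twice, negated and apply are hypothetical.

```cpp
int twice(int x)   { return 2 * x; }
int negated(int x) { return -x; }

// Calls whichever function the pointer f points to.
int apply(int (*f)(int), int x)
{
  return f(x);   // (*f)(x) means the same
}
```

For example, apply(twice, 4) calls twice(4), while apply(negated, 4) calls negated(4).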

User-defined types

In addition to those built-in types (and a few more to be introduced later), there are various user-defined types. User-defined types have to be defined in every file where they are used. Those definitions have to be the same. This is usually achieved by defining it in a header which then gets included in all files needing the definition.

The simplest one is the enumeration type:

enum color { red, orange, yellow, green, blue, violet }; // defines type "color"
color background = green; // a variable of type color

Enumeration values implicitly convert to integer values; by default, the first enumerator (red in the example above) is given the value 0, the second enumerator gets 1, and so on. However, it's also possible to explicitly give values:

enum flags { flag0 = 1, flag1 = 2, flag2 = 4, flag3 = 8, flag4 = 16 };

Note however that despite the implicit conversion, an enumeration is still a separate type. For example, you cannot assign an int to a variable of enumeration type (without explicit typecast), and you can overload operators on enumeration types.
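
The two directions of conversion can be sketched as follows, using the color type from above. The function names to_int and to_color are hypothetical.

```cpp
enum color { red, orange, yellow, green, blue, violet };

// From enumeration to int: the conversion is implicit.
int to_int(color c)
{
  return c;
}

// From int to enumeration: an explicit cast is required.
color to_color(int n)
{
  return static_cast<color>(n);   // n is assumed to be in the range 0..5
}
```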

The next type is the class type. Class types can be written with the keyword struct or class. The only difference between both is that with struct access is public by default, while with class it's private. The following defines a simple class type node which has two public member variables: An integer named value and a pointer to node named next:

struct node
{
  int value;
  node* next;
};

This also shows that the type's name can already be used inside the type definition, although in restricted ways: it can be used to define pointer and reference members, and to declare member functions (see below). A type in this state is called an incomplete type. After the end of the definition, the type is complete, and there's no restriction as to how it can be used.

If somewhere you only need the incomplete type, you can also just declare that type. Such a declaration looks like this:

struct node;

After that declaration, the name node is recognized by the compiler as an incomplete type (unless a complete definition has been seen before).

The following defines a variable of type node (this requires the type to be complete):

node some_node;

For C compatibility, also the following definition is allowed:

struct node some_other_node;

But this latter form is unidiomatic in C++.

Variables of user-defined type can also be initialized at definition time (and it is encouraged to do so). Types like node which don't have user-defined constructors (see below) can be initialized by giving the values of the members in braces:

node initialized_node = { 7, &some_node };

Here & is the address-of operator, which gives a pointer to the object it is applied to (i.e. some_node).
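
Chaining nodes through their next pointers gives a simple linked list. The following sketch builds a two-element list with brace initialization and walks through it; the names head, tail and sum_list are hypothetical, and a null pointer (written 0) is assumed to mark the end of the list.

```cpp
struct node
{
  int value;
  node* next;
};

// Adds up the values of a list by following the next pointers
// until a null pointer is reached.
int sum_list(node const* first)
{
  int sum = 0;
  for (node const* p = first; p != 0; p = p->next)
    sum += p->value;
  return sum;
}

// A two-element list: head -> tail -> (end)
node tail = { 5, 0 };        // 0 is the null pointer: no further node
node head = { 7, &tail };
```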

Besides data members, classes can also have member functions. For example:

#include <cmath> // for std::sqrt

struct vector3d
{
  double x, y, z;
  double length();
};

double vector3d::length()
{
  return std::sqrt(x*x+y*y+z*z);
}

vector3d diagonal = { 1, 1, 1 };
double diagonal_length = diagonal.length(); // sqrt(3)

There are some member functions which have special syntax and/or semantics. Important examples are constructors and destructors. A constructor is declared by just appending an argument list to the class name. Constructors are demonstrated with the following example class:

struct example_class
{
  example class constructor declarations
  int i;
};

example class constructor definitions

The simplest constructor is the default constructor. It can be used to initialize an object without giving an explicit initializer:

<<example class constructor declarations>>=
example_class();

<<example class constructor definitions>>=
example_class::example_class():
  i(3)
{
  std::cout << "default-initialized an object. i = " << i << "\n";
}

The part starting with ":" is the constructor initializer list; it tells how to initialize the members (in this case, initialize i with 3).

With this constructor, if you write

example_class example_variable;

then the above constructor will be called.

Indeed, if you don't implement any constructor yourself, the compiler will generate a default constructor for you. However, that default constructor will just call the default constructors of the members of user-defined type and leave the members of built-in type uninitialized.
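
A user-written default constructor with an initializer list can be sketched as follows. The class counter and the function default_constructed_value are hypothetical and serve only to illustrate the syntax.

```cpp
// A class with a user-written default constructor.
struct counter
{
  counter();      // default constructor: takes no arguments
  int value;
};

// The constructor initializer list after ":" initializes the member value with 3.
counter::counter():
  value(3)
{
}

int default_constructed_value()
{
  counter c;        // no explicit initializer: counter::counter() is called
  return c.value;
}
```

Because the constructor initializes value, reading it right after the definition of c is safe; with a compiler-generated default constructor it would be uninitialized.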
