The `fork()' system call


From the C programming language, the fork() system call is well known. When a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork().

In this section we will see how C++ can be used to wrap classes around a complex system call like fork(). Much of what follows in this section directly applies to the Unix operating system, and the discussion will therefore focus on that operating system. However, other systems usually provide comparable facilities. The following discussion is based heavily on the notion of design patterns, as published by Gamma et al. (1995).

When fork() is called, the current program is duplicated in memory, thus creating a new process, and both processes continue their execution just below the fork() system call. The two processes may, however, inspect the return value of fork(): the return value in the original process (called the parent process) differs from the return value in the newly created process (called the child process):

  • In the parent process fork() returns the process ID of the child process created by the fork() system call. This is a positive integer value.
  • In the child process fork() returns 0.
  • If fork() fails, -1 is returned.

A basic Fork class should hide all bookkeeping details of a system call like fork() from its users. The class Fork developed here will do just that. The class itself only needs to take care of the proper execution of the fork() system call. Normally, fork() is called to start a child process, usually boiling down to the execution of a separate process. This child process may expect input at its standard input stream and/or may generate output to its standard output and/or standard error streams. Fork does not know all this, and does not have to know what the child process will do. However, Fork objects should be able to activate their child processes.

Unfortunately, Fork's constructor cannot know what actions its child process should perform. Similarly, it cannot know what actions the parent process should perform. For this particular situation, the template method design pattern was developed. According to Gamma et al., the template method design pattern

``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. (The) Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''

This design pattern allows us to define an abstract base class already implementing the essential steps related to the fork() system call and deferring the implementation of certain normally used parts of the fork() system call to subclasses.

The Fork abstract base class itself has the following characteristics:

  • It defines a data member d_pid. This data member will contain the child's process id (in the parent process) and the value 0 in the child process:

        class Fork
        {
            int d_pid;

  • Its public interface declares but two members:
    • a fork() member function, performing the actual forking (i.e., it will create the (new) child process);
    • an empty virtual destructor ~Fork(), which may be overridden by derived classes.

Here is Fork's complete public interface:

        virtual ~Fork()
        {}
        void fork();

  • All remaining member functions are declared in the class's protected section and can thus only be used by derived classes. They are:
    • The member function pid(), allowing derived classes to access the system fork()'s return value:

        int pid()
        {
            return d_pid;
        }

    • A member int waitForChild(), which can be called by parent processes to wait for the completion of their child processes (as discussed below). This member is declared in the class interface. Its implementation is

        #include "fork.ih"

        int Fork::waitForChild()
        {
            int status;

            waitpid(d_pid, &status, 0);

            return WEXITSTATUS(status);
        }

This simple implementation returns the child's exit status to the parent. The called system function waitpid() blocks until the child terminates.

    • When the fork() system call is used, parent and child processes can always be distinguished: d_pid equals the child's process ID in the parent process and 0 in the child process itself. Because the two processes perform different actions, these actions must be implemented by classes derived from Fork. To enforce this requirement, the members childProcess(), defining the child process's actions, and parentProcess(), defining the parent process's actions, were defined as pure virtual functions:

        virtual void childProcess() = 0;    // both must be implemented
        virtual void parentProcess() = 0;

    • In addition, communication between parent and child processes may use standard streams or other facilities, like pipes (cf. section 20.3.3). To facilitate this inter-process communication, derived classes may implement:
      • childRedirections(): this member should be implemented if any standard stream (cin, cout, or cerr) must be redirected in the child process (cf. section 20.3.1);
      • parentRedirections(): this member should be implemented if any standard stream (cin, cout, or cerr) must be redirected in the parent process.

Redirection of the standard streams is necessary if parent and child processes should communicate with each other via the standard streams. Here are the default definitions of these members, provided by the class's interface:

        virtual void childRedirections()
        {}
        virtual void parentRedirections()
        {}
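
What a redirection in the child might look like at the system-call level is sketched below, outside of any class. This is an assumption about one possible approach (a pipe combined with dup2()), not the implementation of the sections referred to above; the function name captureChildOutput() is hypothetical.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <string>

// Hypothetical helper: runs a child whose standard output is redirected
// into a pipe and returns whatever the child wrote ("" on failure).
std::string captureChildOutput()
{
    int fd[2];                              // fd[0]: read end, fd[1]: write end
    if (pipe(fd) == -1)
        return "";

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return "";
    }

    if (pid == 0)
    {
        // the kind of work a childRedirections() override might do:
        close(fd[0]);                       // the child does not read
        dup2(fd[1], STDOUT_FILENO);         // stdout now refers to the pipe
        close(fd[1]);

        // the kind of work a childProcess() override might do:
        char const msg[] = "hello from the child\n";
        if (write(STDOUT_FILENO, msg, sizeof(msg) - 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(fd[1]);                           // the parent does not write
    std::string result;
    char buffer[128];
    ssize_t nRead;
    while ((nRead = read(fd[0], buffer, sizeof(buffer))) > 0)
        result.append(buffer, static_cast<std::size_t>(nRead));
    close(fd[0]);

    waitpid(pid, nullptr, 0);               // collect the child
    return result;
}
```

The parent closes its copy of the write end before reading; otherwise read() would never report end-of-file.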

The member function fork() calls the system function fork(). (Caution: since the system function fork() is called by a member function having the same name, the :: scope resolution operator must be used to prevent a recursive call of the member function itself.) After calling ::fork(), depending on its return value, either parentProcess() or childProcess() is called. Since redirection may be necessary, Fork::fork()'s implementation calls childRedirections() just before calling childProcess(), and parentRedirections() just before calling parentProcess():

    #include "fork.ih"

    void Fork::fork()
    {
        if ((d_pid = ::fork()) < 0)
            throw "Fork::fork() failed";

        if (d_pid == 0)                 // childprocess has pid == 0
        {
            childRedirections();
            childProcess();
            exit(1);                    // we shouldn't come here:
                                        // childProcess() should exit
        }

        parentRedirections();
        parentProcess();
    }

In fork.cc the class's internal header file fork.ih is included. This header file takes care of the inclusion of the necessary system header files, as well as the inclusion of fork.h itself. Its contents are:

    #include "fork.h"
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

Child processes should not return: once they have completed their tasks, they should terminate. This happens automatically when the child process calls a member of the exec...() family, but if the child itself remains active, it must make sure that it terminates properly. A child process normally uses exit() to terminate itself, but exit() does not call the destructors of local objects, i.e., objects defined at the nesting level at which exit() is called or at any enclosing level. Destructors of globally defined objects are called when exit() is used. A childProcess() that terminates using exit() should therefore either call a support member function that defines all the objects it needs, or define all its objects in a compound statement (e.g., a try block), calling exit() beyond the compound statement.

Parent processes should normally wait for their children to complete. When a child process terminates, its parent is informed by a signal (SIGCHLD on Unix systems), and the parent should respond by collecting the child's exit status. If child processes terminate and their parent processes do not collect their exit statuses, then such child processes remain visible as so-called zombie processes.

If parent processes must wait for their children to complete, they may call the member waitForChild(). This member returns the exit status of a child process to its parent.

There exists a situation where the child process continues to live, but the parent dies. In nature this happens all the time: parents tend to die before their children do. In our context (i.e., C++), this is called a daemon program: the parent process dies and the child process continues to run as a child of the basic init process. Again, when the child eventually dies, a signal is sent to its `step-parent' init. No zombie is created here, as init catches the termination signals of all its (step-)children. The construction of a daemon process is very simple, given the availability of the class Fork (cf. section 20.3.2).