Classes for input operations


To derive a class doing input operations from std::streambuf, the class should use an input buffer of at least one character, to allow the use of the member functions istream::putback() and istream::unget(). Stream classes like istream normally allow us to unget at least one character using these members, and since such stream classes usually interface to streambuf objects, the streambuf must be able to honor those requests. Although it is, strictly speaking, not necessary to implement a buffer in classes derived from streambuf, using buffers in these cases is strongly advised: the implementation is simple and straightforward, and the applicability of such classes is greatly improved. Therefore, all our classes derived from streambuf define a buffer of at least one character.
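The claim about ungetting can be checked with a small experiment. In the sketch below, unbufferedbuf is a hypothetical class (not part of this chapter) that serves characters from a std::string by overriding underflow() and uflow() without ever calling setg(), so it has no get area. istream::unget() ends up calling streambuf::sungetc(), which can only step back when gptr() > eback(); otherwise it calls pbackfail(), whose default implementation fails. A std::istringstream, whose std::stringbuf does keep a buffer, is used for comparison; readThenUnget() is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical streambuf: serves characters from a string but never
// calls setg(), so eback(), gptr() and egptr() all remain null.
class unbufferedbuf: public std::streambuf
{
    std::string d_src;
    std::size_t d_pos = 0;

public:
    explicit unbufferedbuf(std::string src)
    :
        d_src(std::move(src))
    {}

protected:
    int_type underflow() override       // peek at the next character
    {
        return d_pos < d_src.size() ?
                    traits_type::to_int_type(d_src[d_pos])
                :
                    traits_type::eof();
    }

    int_type uflow() override           // consume the next character
    {
        return d_pos < d_src.size() ?
                    traits_type::to_int_type(d_src[d_pos++])
                :
                    traits_type::eof();
    }
};

// Hypothetical helper: reads one character from `in' and tries to
// unget it again. Returns true if unget() succeeded.
bool readThenUnget(std::istream &in)
{
    char ch;
    if (!in.get(ch))
        return false;
    return static_cast<bool>(in.unget());
}
```

With unbufferedbuf, readThenUnget() returns false and unget() sets the stream's badbit; with the istringstream it returns true, and the next get() yields the same character again.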


Using a one-character buffer

When deriving a class (e.g., ifdstreambuf) from streambuf using a buffer of one character, at least its member streambuf::underflow() should be overridden, as this is the member to which all requests for input are eventually directed. Since a buffer is also needed, the member streambuf::setg() is used to inform the streambuf base class of the size of the input buffer, so that it is able to set up its input buffer pointers correctly. This will ensure that eback(), gptr(), and egptr() return correct values.

The required class shows the following characteristics:

  • Like the class designed for output operations, this class is derived from std::streambuf as well:

· class ifdstreambuf: public std::streambuf

  • The class has two data members, one of them a fixed-size, one-character buffer. The data members are defined as protected data members so that derived classes (e.g., see section 20.1.2.3) can access them:

· protected:

· int d_fd;

· char d_buffer[1];

  • The constructor initializes the buffer in such a way that gptr() equals egptr(). Since this implies that the buffer is empty, underflow() is called immediately at the first input request to refill the buffer:

· ifdstreambuf(int fd)

· :

· d_fd(fd)

· {

· setg(d_buffer, d_buffer + 1, d_buffer + 1);

· }

  • Finally, underflow() is overridden. It first checks whether the buffer is really empty; if not, the next character in the buffer is returned. If the buffer is empty, it is refilled by reading from the file descriptor. If this fails (for whatever reason), EOF is returned. More sophisticated implementations could, of course, react more intelligently here. If the buffer could be refilled, setg() is called to set up streambuf's buffer pointers correctly.
  • The implementations of the member functions use low-level functions operating on file descriptors, so apart from streambuf the header file unistd.h must have been read by the compiler before the implementations of the member functions can be compiled.
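The description of underflow() above is not accompanied by its code. The following is a minimal sketch consistent with that description (and with ifdnstreambuf::underflow(), shown later in this section), assuming a POSIX system providing read(). Returning characters through traits_type::to_int_type() is a choice made in this sketch, so that a byte such as 0xff is not mistaken for EOF when char is signed; readAll() is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <unistd.h>     // read() (pipe(), write(), close() for testing)

class ifdstreambuf: public std::streambuf
{
protected:
    int d_fd;
    char d_buffer[1];

public:
    explicit ifdstreambuf(int fd)
    :
        d_fd(fd)
    {
        setg(d_buffer, d_buffer + 1, d_buffer + 1);     // empty buffer
    }

protected:
    int_type underflow() override
    {
        if (gptr() < egptr())               // buffer not yet exhausted
            return traits_type::to_int_type(*gptr());

        if (read(d_fd, d_buffer, 1) <= 0)   // EOF or read error
            return traits_type::eof();

        setg(d_buffer, d_buffer, d_buffer + 1);     // one char available
        return traits_type::to_int_type(*gptr());
    }
};

// Hypothetical helper: copies all remaining input from `fd' into a string.
std::string readAll(int fd)
{
    ifdstreambuf buf(fd);
    std::istream in(&buf);
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}
```

For example, readAll(STDIN_FILENO) returns everything a program receives on its standard input.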

This completes the construction of the ifdstreambuf class. It is used in the following program:

#include <unistd.h>

#include <iostream>

#include <istream>

#include "ifdbuf.h"

using namespace std;

int main()

{

ifdstreambuf fds(STDIN_FILENO);

istream is(&fds);

cout << is.rdbuf();

}


Using an n-character buffer

How complex would things get if we decided to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, but apart from that it is basically the same class as ifdstreambuf, developed in the previous section. To make things a bit more interesting, the class ifdnstreambuf developed here also overrides the member streambuf::xsgetn(), to optimize the reading of series of characters. Furthermore, a default constructor is provided which can be used in combination with the open() member to construct an istream object before the file descriptor becomes available. Once the descriptor is available, the open() member can be used to initialize the object's buffer. Later, in section 20.3, we'll encounter such a situation.

To save some space, the success of various calls was not checked. In `real life' implementations, these checks should, of course, not be omitted. The class ifdnstreambuf has the following characteristics:

  • Once again, it is derived from std::streambuf:

· class ifdnstreambuf: public std::streambuf

  • Like the class ifdstreambuf (section 20.1.2.1), its data members are protected. Since the buffer's size is configurable, this size is kept in a dedicated data member, d_bufsize:

· protected:

· int d_fd;

· unsigned d_bufsize;

· char* d_buffer;

  • The default constructor does not allocate a buffer, and can be used to construct an object before the file descriptor becomes known. A second constructor simply passes its arguments to open() which will then initialize the object so that it can actually be used:

· ifdnstreambuf()

· :

· d_bufsize(0),

· d_buffer(0)

· {}

· ifdnstreambuf(int fd, unsigned bufsize = 1)

· {

· open(fd, bufsize);

· }

  • If the object has been initialized by open(), its destructor will both delete the object's buffer and use the file descriptor to close the device:

· ~ifdnstreambuf()

· {

· if (d_bufsize)

· {

· close(d_fd);

· delete[] d_buffer;

· }

· }

  • The open() member simply allocates the object's buffer. It is assumed that the calling program has already opened the device. Once the buffer has been allocated, the base class member setg() is used to ensure that eback(), gptr(), and egptr() return correct values:

· void open(int fd, unsigned bufsize = 1)

· {

· d_fd = fd;

· d_bufsize = bufsize;

· d_buffer = new char[d_bufsize];

· setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize);

· }

  • The overridden member underflow() is implemented almost identically to ifdstreambuf's (section 20.1.2.1) member. The only difference is that the current class supports larger buffers, so up to d_bufsize characters may be read from the device at once:

· int underflow()

· {

· if (gptr() < egptr())

· return traits_type::to_int_type(*gptr());

·

· int nread = read(d_fd, d_buffer, d_bufsize);

·

· if (nread <= 0)

· return EOF;

·

· setg(d_buffer, d_buffer, d_buffer + nread);

· return traits_type::to_int_type(*gptr());

· }

  • Finally, xsgetn() is overridden. In a loop, n is reduced until it reaches 0, at which point the function terminates. The member also returns early if underflow() fails to obtain more characters. This member optimizes the reading of series of characters: instead of calling streambuf::sbumpc() n times, a block of avail characters is copied to the destination, and streambuf::gbump() is used to consume those avail characters from the buffer in a single function call:

· std::streamsize xsgetn(char *dest, std::streamsize n)

· {

· int nread = 0;

·

· while (n)

· {

· if (!in_avail())

· {

· if (underflow() == EOF)

· break;

· }

·

· int avail = in_avail();

·

· if (avail > n)

· avail = n;

·

· memcpy(dest + nread, gptr(), avail);

· gbump(avail);

·

· nread += avail;

· n -= avail;

· }

·

· return nread;

· }

  • The implementations of the member functions use low-level functions operating on file descriptors. So, apart from streambuf, the header files unistd.h and cstring (declaring memcpy()) must have been read by the compiler before the implementations of the member functions can be compiled.

The member function xsgetn() is called by the public member streambuf::sgetn(). The following example illustrates the use of sgetn() with an ifdnstreambuf object:

#include <unistd.h>

#include <iostream>

#include "ifdnbuf.h"

using namespace std;

int main()

{

// internally: 30 char buffer

ifdnstreambuf fds(STDIN_FILENO, 30);

char buf[80];        // main() reads blocks of 80 chars

while (true)

{

unsigned n = fds.sgetn(buf, 80);

if (n == 0)

break;

cout.write(buf, n);

}

}
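streambuf::sgetn() is a public, non-virtual member that forwards to the protected virtual xsgetn(). The sketch below makes that visible; countingbuf is a hypothetical class (not from this chapter) that serves a fixed string and counts how often its xsgetn() override is called.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical streambuf: serves `src' and counts calls of xsgetn().
class countingbuf: public std::streambuf
{
    std::string d_src;
    std::size_t d_pos = 0;
    int d_calls = 0;

public:
    explicit countingbuf(std::string src)
    :
        d_src(std::move(src))
    {}

    int calls() const
    {
        return d_calls;
    }

protected:
    std::streamsize xsgetn(char *dest, std::streamsize n) override
    {
        ++d_calls;
        std::streamsize count = 0;          // copy at most n characters
        while (count < n && d_pos < d_src.size())
            dest[count++] = d_src[d_pos++];
        return count;
    }
};
```

Each sgetn() call results in exactly one xsgetn() call. In the program above, ifdnstreambuf's xsgetn() therefore receives the whole 80-character request at once and copies it out of its 30-character buffer in blocks of at most 30 characters.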


Seeking positions in `streambuf' objects

When devices support seek operations, classes derived from streambuf should override the members streambuf::seekoff() and streambuf::seekpos(). The class ifdseek, developed in this section, can be used to read information from devices supporting such seek operations. The class ifdseek is derived from ifdstreambuf, so it uses a character buffer of just one character. The facilities for performing seek operations that are added to ifdseek make sure that the input buffer is reset when a seek operation is requested. The class could also be derived from ifdnstreambuf; in that case, the arguments used to reset the input buffer must be adapted so that setg()'s second and third arguments point just beyond the available input buffer. Let's have a look at the characteristics of ifdseek:

  • As mentioned, ifdseek is derived from ifdstreambuf. Like the latter class, ifdseek's member functions use facilities declared in unistd.h, so the compiler must have seen unistd.h before it can compile the class's member functions. The class interface itself starts with:

· class ifdseek: public ifdstreambuf

  • To reduce the amount of typing when specifying types and constants from std::streambuf and std::ios, several typedefs are defined at the class's very top:

· typedef std::streambuf::pos_type pos_type;

· typedef std::streambuf::off_type off_type;

· typedef std::ios::seekdir seekdir;

· typedef std::ios::openmode openmode;

These typedefs refer to types that are defined in the header file ios, which must therefore be included as well before the compiler reads ifdseek's class definition.

  • The class is given a rather basic implementation. The only required constructor expects the device's file descriptor. It has no special tasks to perform and only needs to call its base class constructor:

· ifdseek(int fd)

· :

· ifdstreambuf(fd)

· {}

  • The member seekoff() is responsible for performing the actual seek operations. It calls lseek() to seek a new position in the device whose file descriptor is known. If seeking succeeds, setg() is called to define an empty buffer, so that the base class's underflow() member refills the buffer at the next input request.

· pos_type seekoff(off_type offset, seekdir dir, openmode)

· {

· pos_type pos =

· lseek

· (

· d_fd, offset,

· (dir == std::ios::beg) ? SEEK_SET :

· (dir == std::ios::cur) ? SEEK_CUR :

· SEEK_END

· );

·

· if (pos < 0)

· return -1;

·

· setg(d_buffer, d_buffer + 1, d_buffer + 1);

· return pos;

· }

  • Finally, the companion function seekpos is overridden as well: it is actually defined as a call to seekoff():

· pos_type seekpos(pos_type offset, openmode mode)

· {

· return seekoff(offset, std::ios::beg, mode);

· }
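When an n-character buffer is used, resetting the buffer is not the only adjustment: after a refill the device's file position is ahead of the reader by egptr() - gptr() characters, so a relative seek (including the seekoff(0, std::ios::cur, std::ios::in) call that istream::tellg() performs) must subtract that number. The sketch below assumes a POSIX system; the class ifdnseek and the helper rereadSecondLine() are hypothetical, and ifdnseek derives directly from std::streambuf with a std::vector buffer to stay self-contained. fileno() is POSIX, not standard C++.

```cpp
#include <cassert>
#include <cstdio>       // std::tmpfile(), std::fclose(); fileno() (POSIX)
#include <istream>
#include <streambuf>
#include <string>
#include <vector>
#include <unistd.h>     // read(), write(), lseek()

// Hypothetical n-character counterpart of ifdseek (illustration only).
class ifdnseek: public std::streambuf
{
    int d_fd;
    std::vector<char> d_buffer;

public:
    ifdnseek(int fd, std::size_t bufsize)
    :
        d_fd(fd),
        d_buffer(bufsize)
    {
        assert(bufsize > 0);
        resetBuffer();
    }

protected:
    int_type underflow() override
    {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());

        ssize_t nread = read(d_fd, d_buffer.data(), d_buffer.size());
        if (nread <= 0)
            return traits_type::eof();

        setg(d_buffer.data(), d_buffer.data(), d_buffer.data() + nread);
        return traits_type::to_int_type(*gptr());
    }

    pos_type seekoff(off_type offset, std::ios::seekdir dir,
                     std::ios::openmode) override
    {
        // The device is ahead of the reader by the buffered characters
        // that were not yet consumed: compensate for relative seeks.
        if (dir == std::ios::cur)
            offset -= egptr() - gptr();

        off_t pos = lseek(d_fd, offset,
                          dir == std::ios::beg ? SEEK_SET :
                          dir == std::ios::cur ? SEEK_CUR : SEEK_END);
        if (pos < 0)
            return pos_type(off_type(-1));

        resetBuffer();                  // buffered characters are stale
        return pos_type(pos);
    }

    pos_type seekpos(pos_type pos, std::ios::openmode mode) override
    {
        return seekoff(off_type(pos), std::ios::beg, mode);
    }

private:
    void resetBuffer()                  // empty buffer: gptr() == egptr()
    {
        char *end = d_buffer.data() + d_buffer.size();
        setg(d_buffer.data(), end, end);
    }
};

// Hypothetical helper: stores `text' in a temporary file, reads its first
// line, remembers tellg(), reads the second line, seeks back and returns
// the second line as read for the second time.
std::string rereadSecondLine(std::string const &text, std::size_t bufsize)
{
    std::FILE *file = std::tmpfile();
    if (file == nullptr)
        return "";

    int fd = fileno(file);
    std::string line;
    if (write(fd, text.data(), text.size())
            == static_cast<ssize_t>(text.size())
        && lseek(fd, 0, SEEK_SET) == 0)
    {
        ifdnseek buf(fd, bufsize);
        std::istream in(&buf);
        std::getline(in, line);                 // first line
        std::streampos pos = in.tellg();        // start of second line
        std::getline(in, line);                 // second line
        in.seekg(pos);                          // back to its start
        std::getline(in, line);                 // read it once more
    }
    std::fclose(file);
    return line;
}
```

Without the compensation in seekoff(), tellg() would report the device position after the whole buffered block, and the seek back would skip past the second line.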

An example of a program using the class ifdseek is the following. If this program is given its own source file through input redirection, seeking is supported and every line except the first is shown twice:

#include "fdinseek.h"

#include <iostream>

#include <istream>

#include <string>

using namespace std;

int main()

{

ifdseek fds(0);

istream is(&fds);

string s;

while (true)

{

if (!getline(is, s))

break;

streampos pos = is.tellg();

cout << s << '\n';

if (!getline(is, s))

break;

streampos pos2 = is.tellg();

cout << s << '\n';

if (!is.seekg(pos))

{

cout << "Seek failed\n";

break;

}

}

}

20.1.2.4: Multiple `unget()' calls in `streambuf' objects

As mentioned before, streambuf classes and classes derived from streambuf should at least support ungetting the last read character. Special care must be taken when series of unget() calls must be supported. This section describes the construction of a class supporting a configurable number of istream::unget() or istream::putback() calls.

Support for multiple (say `n') unget() calls is realized by reserving an initial section of the input buffer, which is gradually filled up to contain the last n characters read. The class is implemented as follows:

  • Once again, the class is derived from std::streambuf. It defines several data members, allowing the class to perform the bookkeeping required to maintain an unget-buffer of a configurable size:

· class fdunget: public std::streambuf

· {

· int d_fd;

· unsigned d_bufsize;

· unsigned d_reserved;

· char* d_buffer;

· char* d_base;

· public:

  • The class's constructor expects a file descriptor, a buffer size and the number of characters that can be ungot or pushed back as its arguments. This number determines the size of a reserved area, defined as the first d_reserved bytes of the class's input buffer.
    • The input buffer is always at least one byte larger than d_reserved, so at least one byte can be read at each refill. Once d_reserved bytes have been read, at least d_reserved bytes can be ungot.
    • Next, the starting point for reading operations is configured: it is called d_base, and points d_reserved bytes beyond the start of d_buffer. Every buffer refill starts at this location.
    • Now that the buffer has been constructed, we're ready to define streambuf's buffer pointers using setg(). As no characters have been read yet, all pointers are set to point to d_base. If unget() is called at this point, no characters are available, so unget() will (correctly) fail.
    • Finally, the refill buffer's size is determined as the number of allocated bytes minus the size of the reserved area.

Here is the class's constructor:

fdunget (int fd, unsigned bufsz, unsigned unget)

:

d_fd(fd),

d_reserved(unget)

{

unsigned allocate =

bufsz > d_reserved ?

bufsz

:

d_reserved + 1;

d_buffer = new char [allocate];

d_base = d_buffer + d_reserved;

setg(d_base, d_base, d_base);

d_bufsize = allocate - d_reserved;

}

  • The class's destructor simply returns the memory allocated for the buffer to the common pool:

· ~fdunget()

· {

· delete[] d_buffer;

· }

  • Finally, underflow() is overridden.
    • Firstly, the standard check to determine whether the buffer is really empty is applied.
    • If it is empty, the number of characters that could potentially be ungot is determined. At this point the input buffer is exhausted, so this value may be anything between 0 (the initial state) and the input buffer's size (when the reserved area has been filled completely and all characters in the remaining section of the buffer have also been read).
    • Next, the number of bytes to move into the reserved area is computed. This number is at most d_reserved, but equals the actual number of characters that can be ungot if that value is smaller.
    • Now that the number of characters to move into the reserved area is known, this number of characters is moved from the input buffer's end to the area immediately before d_base.
    • Then the buffer is refilled. All this is standard, but notice that reading starts at d_base and not at d_buffer.
    • Finally, streambuf's read buffer pointers are set up. eback() is set to move locations before d_base, thus defining the guaranteed unget area; gptr() is set to d_base, since that is the location of the first character read after a refill; and egptr() is set just beyond the location of the last character read into the buffer.

Here is underflow()'s implementation:

int underflow()

{

if (gptr() < egptr())

return traits_type::to_int_type(*gptr());

unsigned ungetsize = gptr() - eback();

unsigned move = std::min(ungetsize, d_reserved);

memmove(d_base - move, egptr() - move, move);   // the ranges may overlap

int nread = read(d_fd, d_base, d_bufsize);

if (nread <= 0) // none read -> return EOF

return EOF;

setg(d_base - move, d_base, d_base + nread);

return traits_type::to_int_type(*gptr());

}

};

The following program illustrates the class fdunget. It reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined in a buffer holding 3 characters. Just before reading a character, the program tries to unget up to seven characters (ug runs from 0 through 6). This is, of course, not possible; but the program nicely ungets as many characters as possible, considering the actual number of characters read:

#include "fdunget.h"

#include <iostream>

#include <istream>

using namespace std;

int main()

{

fdunget fds(0, 3, 2);

istream is(&fds);

char c;

for (int idx = 0; idx < 10; ++idx)

{

cout << "after reading " << idx << " characters:\n";

for (int ug = 0; ug <= 6; ++ug)

{

if (!is.unget())

{

cout

<< "\tunget failed at attempt " << (ug + 1) << "\n"

<< "\trereading: '";

is.clear();

while (ug--)

{

is.get(c);

cout << c;

}

cout << "'\n";

break;

}

}

if (!is.get(c))

{

cout << " reached\n";

break;

}

cout << "Next character: " << c << '\n';

}

}

/*

Generated output after 'echo abcde | program':

after reading 0 characters:

unget failed at attempt 1

rereading: ''

Next character: a

after reading 1 characters:

unget failed at attempt 2

rereading: 'a'

Next character: b

after reading 2 characters:

unget failed at attempt 3

rereading: 'ab'

Next character: c

after reading 3 characters:

unget failed at attempt 4

rereading: 'abc'

Next character: d

after reading 4 characters:

unget failed at attempt 4

rereading: 'bcd'

Next character: e

after reading 5 characters:

unget failed at attempt 4

rereading: 'cde'

Next character:

after reading 6 characters:

unget failed at attempt 4

rereading: 'de

'

reached

*/
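The program above must be run with input and its output inspected by eye. As a self-checking alternative, the sketch below feeds a variant of fdunget through a POSIX pipe (an assumption; any readable file descriptor would do). It follows the constructor and underflow() shown earlier, with two deliberate changes: the buffer is a std::vector<char>, so no destructor is needed, and characters are shifted into the reserved area using std::memmove(), because when the refill area is smaller than the reserved area the source and destination ranges of that copy can overlap, which memcpy() does not allow. The helper ungetsAfter() is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <istream>
#include <streambuf>
#include <vector>
#include <unistd.h>     // read() (pipe(), write(), close() for testing)

// Variant of fdunget (illustration): std::vector buffer, std::memmove().
class fdunget: public std::streambuf
{
    int d_fd;
    unsigned d_reserved;            // size of the unget area
    std::vector<char> d_buffer;     // unget area followed by refill area
    char *d_base;                   // start of the refill area
    unsigned d_bufsize;             // size of the refill area

public:
    fdunget(int fd, unsigned bufsz, unsigned unget)
    :
        d_fd(fd),
        d_reserved(unget),
        d_buffer(bufsz > unget ? bufsz : unget + 1),
        d_base(d_buffer.data() + d_reserved),
        d_bufsize(d_buffer.size() - d_reserved)
    {
        setg(d_base, d_base, d_base);   // nothing read: nothing to unget
    }

protected:
    int_type underflow() override
    {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());

        unsigned ungetsize = gptr() - eback();
        unsigned move = std::min(ungetsize, d_reserved);
        // Source and destination may overlap, hence memmove()
        std::memmove(d_base - move, egptr() - move, move);

        ssize_t nread = read(d_fd, d_base, d_bufsize);
        if (nread <= 0)
            return traits_type::eof();

        setg(d_base - move, d_base, d_base + nread);
        return traits_type::to_int_type(*gptr());
    }
};

// Hypothetical helper: reads up to `count' characters from `in', then
// returns how many unget() calls succeed before the first one fails.
int ungetsAfter(std::istream &in, int count)
{
    char ch;
    for (int idx = 0; idx != count && in.get(ch); ++idx)
        ;
    int ungot = 0;
    while (in.unget())
        ++ungot;
    in.clear();                     // reset the failure flags
    return ungot;
}
```

With fdunget(fd, 3, 2) and the input abcde, ungetsAfter(in, 4) returns 3 and the next character read is b, matching the `rereading: 'bcd'' line of the output above.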
