Computer Science 111, Assignment 4
Tutorial on kinds of repetition, text files, input error states, and break statements

  1. Preliminaries
  2. Text file input
  3. Error handling with interactive input and text file input
  4. C-strings vs. string class objects, again
  5. Displaying columns of numbers:  the setw manipulator
  6. Text file output
  7. More about the setw manipulator: right-justifying columns of both integers and floating-point numbers, using different kinds of output
  8. Event-controlled repetition vs. count-controlled repetition
  9. Sentinel-controlled repetition
  10. End of input with and without a sentinel value
  11. End of input with text files
  12. Variables of type bool
  13. Kinds of event-controlled repetition
  14. Reading characters in a loop
  15. More about text files
  16. break statements
  17. break statements vs. return statements
  18. Chaining:   consolidating statements in C++


  1. Preliminaries.   To ensure that you have enough disk space, please be sure to get rid of all a.out, l.out, and core files before you begin. To find where they are, type the following at the "forbin>" prompt, in your home directory:

       find . | grep a.out
       find . | grep l.out
       find . | grep core
    

    Then go to the indicated directories and remove the files.

    Also, remove old homework source code files after first backing them up (via FTP) onto more than one diskette.

    Then, if you have not done so already, create a directory named hw04 inside your homework directory. Then change your present working directory to hw04 and copy into it the example files for Assignment 4, as follows:

       cp ~nixon/cs111/hw04/* .
    


  2. Text file input.   Compile and run average2A1.cpp. This program averages a sequence of integers, similarly to average1G.cpp, but takes its input from the text data file numbers.txt instead of from the keyboard.

    Look at the contents of numbers.txt. On forbin, you can look at numbers.txt either by opening it in vi or, more quickly, by typing, at the "forbin>" prompt:

       cat numbers.txt
    

    Then compare average1G.cpp and average2A1.cpp to see how text file input differs from interactive input.

    First, notice that average2A1.cpp contains the following preprocessor directive:

       #include <fstream>
    

    so that the preprocessor can insert the contents of a header file containing information about classes ifstream and ofstream. Recall that a class is the fanciest kind of structured data type. An object, i.e. a variable whose data type is a class, not only stores data, but also has behaviors defined for it, as part of the definition of its class. Our program uses an object of class ifstream to read (take input) from a text file. Later, we will see a program which uses an object of class ofstream to output to a text file.

    Before it can read from the file, our program average2A1.cpp first needs to declare a variable of type ifstream:

          ifstream inputFile;
    

    After declaring inputFile to be an object of class ifstream, our program then tells the ifstream object to prepare to read from the text file numbers.txt, as follows:

          inputFile.open("numbers.txt");
    

    Then, whenever the program needs to do input, average2A1.cpp uses:

          inputFile >> whatever 
    

    instead of

          cin >> whatever 
    

    which all our previous programs have used for interactive input. Also, note that average2A1.cpp does not bother prompting the user for input, i.e. there are no statements like:

          cout << "Enter a number:>"; 
    

    which we needed in average1G.cpp in order to tell the user what to do.

    It is possible to take a program written for interactive input and use it to read from a text file too, though the output will be somewhat messy looking if we do so. Compile average1G.cpp (our interactive version) again. Then run it, this time as follows, rather than just typing a.out:

       a.out < numbers.txt
    

    The "<" redirects the input, so that the program a.out now takes its input from a file instead of from the keyboard. However, observe what the output looks like. As you can see, it displays the prompts, but you can't see the input, and the line returns you would type when doing manual interactive input aren't there. Rather ugly.

    But using average1G.cpp this way is more flexible than average2A1.cpp, because we can now take our input from different files, not just numbers.txt. For example, we can type:

       a.out < numbers2.txt
    

    and thereby average a different sequence of three integers.

    It would be nice if we could re-write average2A1.cpp so that it too could take its input from different data files, not just numbers.txt. For example, we could write it to prompt the user for a filename and get the filename via interactive input. Then, the rest of the program's input could be gotten from the named file rather than from the keyboard.

    Compile and run both average2A2.cpp and average2A3.cpp. Run them both by typing just a.out and nothing else on the same line. Then enter the filename numbers.txt or numbers2.txt interactively when prompted.

    Make sure you spell the filenames correctly. If you misspell them, there won't be an error message, alas. Instead, there will be unpredictable, nonsensical results, known as garbage.

    When I tried misspelling a filename, the garbage apparently included giving an extremely high value to numberOfNumbers, resulting in a loop which repeated many, many times. If this happens to you, press [Ctrl]-C (press the C key while holding down the Ctrl key) as soon as possible. You may still have to wait a minute or so for the loop to quit.

    Later, you'll be shown how to make a program check for errors such as telling an ifstream object to read from a nonexistent file.

    In both average2A2.cpp and average2A3.cpp, the filename has to be stored in memory via some variable. To obtain a filename via interactive input, we need to write something like the following, in which the filename is stored using a variable named filename:

          cout << "Enter filename: ";
          cin >> filename;
    
          // Prepare to read from file:
          ifstream inputFile;
          inputFile.open(filename);
    

    The variable filename must be able to store a sequence of characters, i.e. it must be a string. As we have seen, there are two kinds of strings in C++. In average2A2.cpp, we have declared filename as a C-string, whereas, in average2A3.cpp, we have declared it as a string class object.

    In average2A1.cpp, we told the ifstream object inputFile to prepare to read from text file "numbers.txt" as follows:

       inputFile.open("numbers.txt");
    

    In average2A2.cpp, in which the variable filename was declared as a C-string, we can tell the ifstream object inputFile to prepare to read from the text file whose filename was entered by the user:

       inputFile.open(filename);
    

    Recall that an object's behaviors are defined in the form of functions and operators. In the definition of class ifstream, the open function is defined to tell the ifstream object to prepare to read from the specified file.

    The open function of class ifstream, as originally standardized, expects a C-string -- NOT a string class object -- as an argument. (Compilers supporting the C++11 standard or later also accept a string class object, but the approach shown here works with both older and newer compilers.) Therefore, in average2A3.cpp, in which filename is declared as a string class object, we CANNOT call the open function the same way we did in average2A2.cpp, where filename was declared as a C-string. Fortunately, the string class has a function c_str which generates a C-string equivalent to the string object for which the c_str function is called. So, in average2A3.cpp, we can call an ifstream object's open function as follows:

          inputFile.open(filename.c_str());
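
    Putting the pieces of this section together, here is a minimal sketch of reading from a file whose name is held in a string class object. The helper name averageFromFile is hypothetical (it does not appear in the example files), and the sketch assumes that the data file starts with the number of integers to follow and then lists that many integers. Like the programs discussed so far, it does no error checking beyond avoiding a division by zero.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper, not taken from the assignment files.
// Assumes the file holds a count followed by that many integers.
double averageFromFile(const string& filename)
{
   ifstream inputFile;
   inputFile.open(filename.c_str());   // c_str() supplies the C-string form

   int numberOfNumbers = 0;
   inputFile >> numberOfNumbers;

   int sum = 0;
   int count = 0;
   while ( count < numberOfNumbers )
   {
      int number = 0;
      inputFile >> number;
      sum += number;
      count++;
   }  // while

   // Avoid dividing by zero if the count was 0 or could not be read.
   if ( numberOfNumbers <= 0 )
      return 0.0;
   return static_cast<double>(sum) / numberOfNumbers;
}
```

    For example, averageFromFile("numbers.txt") would return the average of the integers in numbers.txt, provided that file follows the format assumed above.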
    


  3. Error handling with interactive input and text file input.   Compile and run average1G.cpp again. This program averages a sequence of integers obtained via interaction with the user. Try entering non-numeric strings when prompted for integers. Observe that the program outputs garbage and generally behaves very oddly.

    Now compile and run average3.cpp and observe the error message you get for non-numeric input. Then look at the source code. The error-checking is done as follows:

       if ( !cin )
       {
          cout << "You must enter integers only." << endl;
          return 1;
       }  // if
    

    Observe that cin, an object of class istream, can be converted automatically to a value of type bool and used as part of a boolean expression. The resulting bool value is true if all input has been successful so far, false otherwise. Checking its value is called checking the error state of the input stream object.

    Now compile and run average2A3.cpp again. This was one of the versions that prompted you for the filename of a text data file. Try entering a filename of a file which does not exist. Then try entering the filename of a file which does exist, but contains invalid data, e.g. the file average2A3.cpp itself. Observe the garbage results.

    Then compile and run average2B3.cpp, and observe that it generates appropriate error messages for the above two cases. Look now at its source code. It checks the error state of the ifstream variable inputFile in three different parts of the program. First, after the file is opened:

       if ( !inputFile )
       {
          cout << "Can't read file "
                   << filename << "." << endl;
          return 1;
       }  // if
    

    And then again, later, after the number of integers to be averaged has been input from the file:

       if ( !inputFile )
       {
          cout << filename << " file data format error: "
                   << "Can't read number of numbers." << endl;
          return 1;
       }  // if
    

    And then again, later, after each one of the numbers to be averaged has been input from the file:

          if ( !inputFile )
          {
             cout << filename << " file data format error: "
                   << "Not reading " << numberOfNumbers
                          << " consecutive integers." << endl;
             return 1;
          }  // if
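
    The last two checks can be seen working together in the following sketch. It is not the code of average2B3.cpp: the function name averageWithChecks is hypothetical, the messages are shortened, and the input is passed in as a generic input stream so that the same function works with cin, with an ifstream object, or with an istringstream (from <sstream>) holding test text.

```cpp
#include <iostream>
#include <sstream>   // istringstream, handy for trying this without a file
using namespace std;

// Hypothetical helper: reads a count and then that many integers.
// Returns 0 on success and 1 on error, echoing the "return 1" used above.
int averageWithChecks(istream& input, double& average)
{
   int numberOfNumbers = 0;
   input >> numberOfNumbers;
   if ( !input )
   {
      cout << "Data format error: Can't read number of numbers." << endl;
      return 1;
   }  // if

   int sum = 0;
   int count = 0;
   while ( count < numberOfNumbers )
   {
      int number = 0;
      input >> number;
      if ( !input )   // checked after EVERY input
      {
         cout << "Data format error: Not reading " << numberOfNumbers
              << " consecutive integers." << endl;
         return 1;
      }  // if
      sum += number;
      count++;
   }  // while

   if ( numberOfNumbers <= 0 )
   {
      cout << "Nothing to average." << endl;
      return 1;
   }  // if

   average = static_cast<double>(sum) / numberOfNumbers;
   return 0;
}
```

    Given the text "3 1 2 3" the function returns 0 and sets average to 2.0. Given "3 1 x" it prints the second message and returns 1, because the third input fails.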
    

    The following programs:

    are error-checking versions of the following programs, respectively, which we looked at earlier, all of which take input from text files:

    Experiment with running all of them.


  4. C-strings vs. string class objects, again.   Look now at average2B2.cpp. It uses a C-string, declared as an array of characters, for the filename, which is input interactively via cin. Note that the array is declared to hold up to 20 data items of type char. Thus, it can store a string with up to 20 characters, including the characters of the string itself plus a null character to indicate the end of the string. Thus, it can store a filename which is up to 19 characters long, not counting the end-of-string null character.

    Let's see what happens if you try to make this program read from a file whose name is more than 19 characters long. Compile average2B2.cpp again, and run it. When prompted for a filename, enter:

       FileWithAVeryLongName.txt
    

    The result will be a run time error, which might cause a core dump, or it might cause other unpredictable results.

    Unlike a C-string, an object of the C++ string class is smart enough to adjust its length. Compile and run average2B3.cpp, and observe that it does NOT have a problem with the data file named FileWithAVeryLongName.txt.
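
    If a C-string must be used anyway, one common guard is to limit how many characters the extraction operator may store. The sketch below is an illustration only and is not taken from average2B2.cpp; the names MAX_LENGTH, readBoundedWord, and readWord are hypothetical. It relies on the setw manipulator from <iomanip>, which, when applied to an input stream just before reading into a character array, caps the number of characters stored at the given width minus one.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

const int MAX_LENGTH = 20;   // same capacity as the array described above

// Hypothetical helper: reads one word into a 20-element C-string without
// overflowing it. At most MAX_LENGTH - 1 characters are stored, followed
// by the terminating null character.
void readBoundedWord(istream& input, char (&word)[MAX_LENGTH])
{
   word[0] = '\0';   // well-defined contents even if nothing can be read
   input >> setw(MAX_LENGTH) >> word;
}

// Hypothetical helper: a string class object simply grows as needed.
string readWord(istream& input)
{
   string word;
   input >> word;
   return word;
}
```

    With the input FileWithAVeryLongName.txt, readBoundedWord stores only the first 19 characters, so the name is cut short rather than overflowing the array, while readWord returns the whole name. A truncated filename is still wrong, of course, which is one more reason to prefer the string class.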


  5. Displaying columns of numbers:  the setw manipulator.   Compile sloppyColumns.cpp and run it. It displays a sloppy-looking table of the powers of 2. You'll learn how to write an application displaying a neater table later; don't worry about this for now.

    Observe that the powers of 2 grow very quickly. According to the last line in the table, 2 to the 16th power is 65536.

    The table is generated by a while loop, one line of the table per iteration of the loop:

       int power = 1;
       int exponent = 0;
    
       while ( power < 100000 )
       {
          cout << "   " << exponent
                      << "        " << power << endl;
          power *= 2;
          exponent++;
       }  // while
    

    Observe that the powers of 2 are generated by multiplying the variable power by 2 each time through the loop. The variable power is initially set to 1, which is 2 to the zero power.

    Examine the program carefully, until you understand how the while loop does indeed do what it is supposed to do at each step. Among other things, make sure you understand how the loop terminates and thus is not an infinite loop.

    But it would be nice to display columns that are right-justified, i.e. lined up at the right side, the usual way of displaying columns of numbers. For an example of a program which does this, compile neatColumns1.cpp and run it.

    The right-justification is done using the setw manipulator, as follows:

          cout << setw(5) << exponent << " "
                      << setw(11) << power << endl;
    

    The setw manipulator inserts leading spaces in front of the integer that follows it. It inserts enough leading spaces so that the total number of character positions taken up by the leading spaces plus the integer itself is equal to the column width (number of character positions) specified by the argument to the setw manipulator.

    In order to use the setw manipulator, we must include the library header file <iomanip>.

    Carefully compare the while loop in neatColumns1.cpp with the while loop in sloppyColumns.cpp, and make sure you understand all the similarities and differences.
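
    As a self-contained illustration, the loop from sloppyColumns.cpp can be combined with the setw statement shown above. This is a sketch rather than the actual contents of neatColumns1.cpp; the function name writePowersOfTwo is hypothetical, and the output stream is passed in as a parameter so that the table can go to cout or to any other output stream.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>   // ostringstream, for collecting the table in a string
using namespace std;

// Hypothetical helper: writes the powers-of-2 table, right-justified,
// to whatever output stream it is given.
void writePowersOfTwo(ostream& output)
{
   int power = 1;      // 2 to the zero power
   int exponent = 0;

   while ( power < 100000 )
   {
      output << setw(5) << exponent << " "
             << setw(11) << power << endl;
      power *= 2;
      exponent++;
   }  // while
}
```

    Calling writePowersOfTwo(cout) displays the table on the console. Note that setw affects only the one item that follows it, which is why it appears before exponent and again before power.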


  6. Text file output.   In much the same way that we have used objects of class ifstream for text file input, we can also use objects of another class, ofstream, for text file output. Class ofstream, like class ifstream, is declared in standard library header file <fstream>, which must be included in programs that use it.

    A program using an ofstream object must first declare it, e.g. as follows:

       ofstream outputFile;
    

    Then, to tell the ofstream object to prepare to write to a particular file:

       outputFile.open(filename);
    

    where filename is a C-string which has previously been given a value in one way or another. Note that the ofstream object's open function will work regardless of whether a file with the specified filename already exists. If a file with that name does not yet exist, it will be created. If a file with that name does already exist, then the old version will be overwritten by the new -- unless the user doesn't have write permission for that file. In the event of the latter possibility, it is a good idea to check the error state of the ofstream object immediately after opening the file:

       if ( ! outputFile )
    

    and print an error message (via cout) if the error state is false.

    If the file has been opened successfully, then, to output to the file, use the same insertion operator ("<<") that you would use for output via cout. For example:

       outputFile << whatever;
    

    As we have seen, when we input from a text file, the error state of the ifstream object should be checked not only after opening the file, but also after every input. On the other hand, when we output to a file, it is not necessary to check the error state of the ofstream object after every single output. But it is a good idea to check it occasionally. The error state of an ofstream object should be checked not only after the file is opened, but also after all output to the file is finished, to make sure all the output was successful. (The error state will be false if there was an error at any point.)
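
    The two recommended checks, one after opening and one after all output is finished, might look like the following sketch. The function name savePowersOfTwo and the wording of the messages are hypothetical, and the table written is the powers-of-2 table from the previous section.

```cpp
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>
using namespace std;

// Hypothetical helper: returns true only if the whole table was written.
bool savePowersOfTwo(const string& filename)
{
   ofstream outputFile;
   outputFile.open(filename.c_str());
   if ( !outputFile )   // first check: right after opening
   {
      cout << "Can't write file " << filename << "." << endl;
      return false;
   }  // if

   int power = 1;
   int exponent = 0;
   while ( power < 100000 )
   {
      outputFile << setw(5) << exponent << " "
                 << setw(11) << power << endl;
      power *= 2;
      exponent++;
   }  // while

   if ( !outputFile )   // second check: after all output is finished
   {
      cout << "Error while writing file " << filename << "." << endl;
      return false;
   }  // if
   return true;
}
```

    Because endl sends each line on to the file as it is written, a failure partway through leaves the ofstream object in the error state, and the single check after the loop catches it.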

    For an example of a program which uses text file output, compile neatColumns2.cpp and run it. It will create file powersOfTwo.txt, whose contents you can then view by typing, at the "forbin>" prompt:

       cat powersOfTwo.txt
    


  7. More about the setw manipulator: right-justifying columns of both integers and floating-point numbers, using different kinds of output.   We've seen how to use the setw manipulator to right-justify a column of integers displayed to the console. More generally, the setw manipulator is used as follows with integers:

       outputstream << setw(width) << variable_name << endl;
    

    where outputstream is either cout or an ofstream object (used for output to a text file), width is the column width (total number of character positions taken up by the number and the leading spaces), and variable_name is an integer (i.e. a variable of type int, short, or long).

    For an example of the use of the setw manipulator to display a column of integers to a text file, see neatColumns2.cpp.

    So far, we've seen how to right-justify a column of integers. To right-justify a column of floating-point numbers, we also need to specify the number of decimal places, i.e. the number of digits after the decimal point. To right-justify a column of floating point numbers:

       outputstream << setw(width) << setprecision(places)
                        << setiosflags(ios::fixed)
                        << variable_name << endl;
    

    where outputstream is either cout or an ofstream object, width is the column width (total number of character positions taken up by the number and the leading spaces), places is the number of decimal places (digits after the decimal point), and variable_name is a floating-point variable (i.e. a variable of type float or type double).

    For an example, see neatColumns3.cpp, which displays a table of powers of 2.5. The following statement, inside a loop, right-justifies both a column of integers and a column of floating-point numbers:

          cout << setw(5) << exponent << " "
                      << setw(12) << setprecision(3)
                      << setiosflags(ios::fixed)
                      << power << endl;
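
    To see exactly what such a statement produces, the same manipulators can be applied to an ostringstream (from <sstream>), which collects output in a string instead of displaying it. The sketch below is an illustration only; the function name formatRow is hypothetical and does not come from neatColumns3.cpp.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: formats one row of the table as a string.
// The integer column is 5 wide; the floating-point column is 12 wide
// with 3 digits after the decimal point.
string formatRow(int exponent, double power)
{
   ostringstream row;
   row << setw(5) << exponent << " "
       << setw(12) << setprecision(3)
       << setiosflags(ios::fixed)
       << power;
   return row.str();
}
```

    For instance, formatRow(3, 15.625) yields a string of 18 characters: four spaces and the digit 3, one separating space, then six spaces followed by 15.625. The setprecision and setiosflags manipulators do not use up the width set by setw, so the width still applies to power.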
    


  8. Event-controlled repetition vs. count-controlled repetition.   In the loops in all the averaging programs we have looked at so far, the number of iterations (repetitions) is controlled using a variable which counts either up or down for each iteration. The condition tests the value of this variable, comparing it to a predetermined value. A loop controlled in this manner is said to be count-controlled.

    Count-controlled repetition is the most common way of controlling a loop. But we sometimes need to control loops in other ways. These other ways are all known, collectively, as event-controlled repetition.

    For example, in our programs which examined all the characters in a C-string, e.g. lengthCString2B.cpp and countLetters.cpp, there is a variable (length or index) which counts up each time the loop iterates, but does NOT count up to a predetermined value. Note that the loop condition does NOT look like this:

       while ( index < LENGTH )
    

    but rather like this:

       while ( text[index] != '\0')
    

    Another example: Suppose we want to average a sequence of integers entered by the user, but WITHOUT requiring the user to say, in advance, how many numbers will be in the sequence. To require the user to count the numbers manually before entering them, as we did in average3.cpp (in the Assignment 4 example files), can be an inconvenience to the user. It would be better if the user could just enter the numbers to be averaged, without having to enter the number of numbers first. After entering all the numbers, the user could then, somehow, signal that there are no more numbers to be entered.

    This could not be accomplished by a loop which repeats until a count of iterations reaches some pre-determined value. Instead, we would need the loop to repeat until the event of the user signalling that there are no more numbers to be entered.


  9. Sentinel-controlled repetition.   A special data value that signals the end of a sequence, and which is not treated as a valid part of the sequence itself, is known as a sentinel.

    An example of a sentinel is the null character at the end of a C-string. Recall that the null character is not considered to be part of the string itself, but is an extra character AFTER the last character. It occupies a position that is one byte past the last character. Thus, in lengthCString2B.cpp, we determined the length of a C-string using the following while loop:

       int length = 0;
       while ( line[length] != '\0' )
          length = length + 1;
    

    where line is some C-string, and '\0' is an escape sequence for the null character. This loop iterates until the null character is reached.

    When the above while loop exits, length is equal to the index of the location containing the null character. Thus, it is one greater than the index of the location containing the last character which is considered to be part of the string itself. For example, if the string has 4 characters, then the last character is at position 3, and thus the terminating null character is at position 4. So, the length of the string is equal to the final value of length.

    Likewise, consider the variable index in the while loop in countLetters.cpp:

       int countLetters = 0;
       int index = 0;
       while ( text[index] != '\0')
       {
          if ( (text[index] >= 'A'  &&  text[index] <= 'Z')
                  ||  (text[index] >= 'a'  &&  text[index] <= 'z') )
             countLetters = countLetters + 1;
          index = index + 1;
       }
    

    A loop which repeats until a sentinel value is encountered is said to be sentinel-controlled. Sentinel-controlled repetition is a subcategory of event-controlled repetition.


  10. End of input with and without a sentinel value.   In order for sentinel-controlled repetition to work, there must exist a special data value which can be used as the sentinel and for no other purpose. For example, it is commonly accepted that null characters are not to be treated as normal characters, but are used solely to indicate the end of a C-string.

    Unfortunately, in the case of our averaging program, there is no such value.

    If our program were not averaging integers in general, but instead were averaging just exam scores in the range 0 to 100, then we could use a value outside that range, such as -1, as a sentinel value. This we do in averageScores.cpp.

    Below is an abbreviated pseudocode version of the loop in averageScores.cpp. It is recommended that you examine it side-by-side with a printout of the program itself.

       // After prompting user for score:
       int score;
       cin >> score;          // Read FIRST score
       // Check cin error state and value of score,
       // before checking for sentinel value ....
    
       // We are now ready to check for sentinel.
    
       while ( score != -1 )
       {
    
          // If we have reached this point,
          // score is a valid score.
          //
          // Now do everything necessary to
          // process the score most recently read.
          // Add it to a cumulative sum and
          // increment a count of the scores.
          // Then prompt user for another score.
          // Then, after prompting user for score:
    
          cin >> score;       // Read NEXT score
          // Check cin error state and value of score,
          // before checking for sentinel value ....
       }  // while
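
    For comparison, here is one way the pseudocode above could be filled in. It is a sketch under stated assumptions, not the code of averageScores.cpp: the names SENTINEL, readScore, and averageScoresFrom are hypothetical, the prompts are omitted, and the input stream is passed in as a parameter so that the function can be tried with cin or with an istringstream (from <sstream>).

```cpp
#include <iostream>
#include <sstream>   // istringstream, handy for trying this without typing
using namespace std;

const int SENTINEL = -1;   // outside the valid range 0 to 100

// Hypothetical helper: reads one value and checks both the error state
// and the range. The sentinel itself counts as acceptable input.
bool readScore(istream& input, int& score)
{
   input >> score;
   if ( !input )
      return false;
   return score == SENTINEL || (score >= 0 && score <= 100);
}

// Hypothetical helper: returns 0 on success, 1 on any input problem.
int averageScoresFrom(istream& input, double& average)
{
   int sum = 0;
   int count = 0;
   int score = 0;

   if ( !readScore(input, score) )      // Read FIRST score
      return 1;

   while ( score != SENTINEL )
   {
      sum += score;                     // process the score just read
      count++;
      if ( !readScore(input, score) )   // Read NEXT score
         return 1;
   }  // while

   if ( count == 0 )
      return 1;                         // nothing to average
   average = static_cast<double>(sum) / count;
   return 0;
}
```

    With the input "90 80 100 -1" the function returns 0 and sets average to 90.0. The sentinel -1 ends the loop and is never added to the sum.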
    

    However, if we want to be able to average ALL integers, not just integers in some range like 0 to 100, then there is no special value that can be used only as a sentinel.

    So, instead of using a sentinel value of the entered number, we will use the error state of cin to control the loop. We will tell the user to signal end of input by entering a non-numeric character, such as 'X'. When cin tries to read a non-numeric character into a variable of type int, its error state will become false. So our program average4A.cpp has a while loop with the following heading:

       while ( cin )
    

    which will iterate until the error state of cin becomes false, i.e. until the user types a non-numeric character, thereby causing cin to try to read a non-numeric character into the int variable number.

    Below is an abbreviated pseudocode version of the loop in average4A.cpp. Examine it side-by-side with a printout of the program itself.

       // After prompting user for number:
       int number;
       cin >> number;          // Read FIRST number
    
       while ( cin )
       {
    
          // Now do everything necessary to
          // process the number most recently read.
          // Add it to a cumulative sum and
          // increment a count of entered numbers.
          // Then prompt user for another number.
          // Then, after prompting user:
    
          cin >> number;       // Read NEXT number
       }
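
    A runnable version of this pattern might look like the sketch below. It is not the code of average4A.cpp: the function name averageUntilFail is hypothetical, the prompts are omitted, and the input stream is a parameter so that the same loop can be driven by cin, by an ifstream object, or by an istringstream (from <sstream>).

```cpp
#include <iostream>
#include <sstream>   // istringstream, handy for trying this without typing
using namespace std;

// Hypothetical helper: averages integers until an input attempt fails.
// Returns how many numbers were read; average is set only if that
// count is greater than zero.
int averageUntilFail(istream& input, double& average)
{
   int sum = 0;
   int count = 0;

   int number = 0;
   input >> number;          // Read FIRST number

   while ( input )           // true as long as the last input succeeded
   {
      sum += number;
      count++;
      input >> number;       // Read NEXT number
   }  // while

   if ( count > 0 )
      average = static_cast<double>(sum) / count;
   return count;
}
```

    With the input "1 2 3 X" the loop body runs three times, the attempt to read X as an integer puts the stream into the error state, and the function returns 3 with average set to 2.0. Running out of input has the same effect as a non-numeric character, since in both cases the next input attempt fails.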
    

    Let's now examine the programs average3.cpp and average4A.cpp in more detail. Both these programs average a sequence of numbers entered by the user. The average is determined by using a while loop to generate a cumulative sum of the entered numbers, and then, after the loop, dividing the cumulative sum by the number of numbers that were entered. The variable count keeps track of how many numbers have been entered so far.

    In average3.cpp, the user is asked how many numbers are to be averaged. The user's answer is stored in the variable numberOfNumbers. Then the while loop iterates exactly numberOfNumbers times, so that that many numbers are entered and averaged. The loop has this heading:

       while ( count < numberOfNumbers )
    

    On the other hand, average4A.cpp does not ask the user how many numbers will be entered. Instead, it prompts the user to enter the numbers and then a non-numeric character, such as 'X', to signal that there are no more numbers to be entered. As explained earlier, the loop uses the error state of cin to detect the non-numeric character and stops iterating at that point.

    In average3.cpp, observe that the variable count is used to count iterations of the loop, and that count is used in the condition. The loop stops executing when the count of iterations reaches a particular value; and, in that sense, the loop is controlled by the count. On the other hand, although average4A.cpp also counts iterations of the loop, the count is NOT used in the condition, and thus the loop is not count-controlled. Instead, the loop iterates until a particular event occurs, in this case the event of a non-numeric character being encountered in the input. The final value of the count is then used later, in the division which computes the average. Thus, although average3.cpp and average4A.cpp both count iterations of the loop, they do so for very different purposes.

    Other important differences between average3.cpp and average4A.cpp are:

    • average3.cpp does its input at the beginning of the loop body, whereas average4A.cpp does its input both before the loop and at the end of the loop body. These differences are necessary in order for the conditions to work properly. To understand why, examine carefully what both programs do, step by step.

    • In average4A.cpp, because the error state of the input is checked by the loop condition, a separate if block is not needed for that purpose.


  11. End of input with text files.   Make sure there is a copy of the text data file moreNumbers.txt in your hw04 directory. Then compile average5A.cpp and run it with moreNumbers.txt as a command-line argument. This program averages ALL the integers in moreNumbers.txt, WITHOUT interpreting the first number as the number of numbers in the sequence.

    Look now at the source code of average5A.cpp. This program is similar to average4A.cpp except that it reads numbers from a text file, via a file input stream object, instead of from interactive input via cin.

    In average5A.cpp, the while loop is controlled by the error state of inputFile, an object of class ifstream, similarly to the way that the while loop in average4A.cpp is controlled by the error state of cin.

    An ifstream object goes into the fail state if it tries to read from a file after the end of the file has been reached. Thus, an ifstream object's error state can be used to check whether the end of the file has been reached. In average5A.cpp, the while loop reads numbers from the file until the end of the file.

    In average5A.cpp, as in average4A.cpp, the first number is read BEFORE the beginning of the while loop. The while loop's condition tests whether what was read, by the file input stream object, is a valid number. Thus the body of the loop is executed only if the file input stream did not fail. Each iteration of the loop body processes a valid number, adding it to the cumulative sum. Then, at the end of the loop body, an attempt is made to read the NEXT number. If a next number is read successfully, i.e. if there is still at least one previously-unread number in the file, then the loop body is executed again. Otherwise, the program goes on to the next line after the loop.

    In average5A.cpp, observe the use of the setw manipulator to display the numbers in a right-justified column.

    Both average4A.cpp and average5A.cpp use event-controlled repetition, not count-controlled repetition. Observe also that, in both programs, the last statement in the loop body does NOT increment or decrement a variable, as does the last statement in the loop body of average3.cpp.


  12. Variables of type bool.   Compile checkNonAscii1A.cpp and run it. Enter some text, as prompted. It detects whether your entered string contains any non-ASCII characters. Try it both with and without non-ASCII characters. (To type a non-ASCII character, hold down the Alt key while typing a number between 128 and 255 on the numeric keypad only.)

    Consider what this program needs to do in order to detect whether a C-string contains a non-ASCII character. It will need to examine each character, starting at location 0, up to the null character, and determine whether the character is an ASCII character. Thus our program will need to have a loop something like this:

       int index = 0;
       while ( text[index] != '\0')
       {
          // Determine whether or not text[index] is
          // ASCII, and then do something appropriate.
          // But what?
    
          index++;
       }  // while
    

    How do we determine whether the character is ASCII? Recall that the ASCII characters are those with binary codes having numeric values 0 to 127. If type char were unsigned (i.e. treated as containing non-negative numbers only), then the non-ASCII characters' binary codes would have numeric values in the range 128 to 255. However, with many C++ compilers (the language leaves the signedness of plain char up to the compiler), type char is signed by default, with numeric values in the range -128 to +127. In that case, the non-ASCII characters are in the range -128 to -1. So we can detect a non-ASCII character by detecting whether the character has a negative binary code value, as follows:

          if ( text[index] < 0 )
    

    Alternatively, we can detect an ASCII character by detecting whether the character has a non-negative binary code value, as follows:

          if ( text[index] >= 0 )
    

    But what is the appropriate thing to do when a non-ASCII character is detected? We do NOT want to do the following:

       int index = 0;
       while ( text[index] != '\0')
       {
          if ( text[index] >= 0 )
             cout << "The string does NOT contain a non-ASCII character." << endl;
          else
             cout << "The string DOES contain a non-ASCII character." << endl;
    
          index++;
       }  // while
    

    What's wrong here? First, if a non-ASCII character has NOT yet been found at a given index, we don't yet know whether or not the string as a whole contains a non-ASCII character. We won't know this until we reach the end of the string. Second, if the string does contain at least one non-ASCII character, we want to output this information only once, not once for each character.

    So, we don't want an output for each character. We want an output only AFTER the loop is finished. Within the loop itself, we want some means of keeping track of whether a non-ASCII character has been found or not.

    In other words, within the loop, we want to keep track of whether it is true or false that a non-ASCII character has been found yet. To accomplish this, we can use a variable of type bool, a data type which has two values, true and false.

    Now examine the source code in checkNonAscii1A.cpp. This program uses a variable of type bool. The boolean variable nonAsciiFound is used within a loop to indicate whether a non-ASCII character has been found so far:

       bool nonAsciiFound = false;
    
       int index = 0;
       while ( text[index] != '\0')
       {
          if ( text[index] < 0 )
             nonAsciiFound = true;
          index++;
       }
    

    The value of nonAsciiFound is initially false. Each character in the string is then tested, and the value of nonAsciiFound becomes true when a non-ASCII character is found. This technique is sometimes called "innocent until proven guilty."

    Output is then done once, after the loop:

       cout << "Your string ";
       if ( nonAsciiFound )
          cout << "contains at least one non-ASCII character.";
       else
          cout << "does not contain any non-ASCII characters.";
       cout << endl;
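
    Putting the loop and the flag together, here is a minimal sketch of the same technique packaged as a function. The name containsNonAscii is hypothetical; checkNonAscii1A.cpp itself may well do everything in main. The cast to unsigned char is our own variation, so that the sketch does not depend on type char being signed:

```cpp
// Hypothetical function illustrating the flag technique.
// Returns true if the C-string text contains at least one
// character whose code is outside the range 0 to 127.
bool containsNonAscii(const char text[])
{
   bool nonAsciiFound = false;   // "innocent until proven guilty"

   int index = 0;
   while ( text[index] != '\0' )
   {
      // Variation on text[index] < 0 (assumption: 8-bit chars).
      if ( static_cast<unsigned char>(text[index]) > 127 )
         nonAsciiFound = true;
      index++;
   }  // while

   return nonAsciiFound;         // known only AFTER the loop
}
```

    Note that the result is reported only once, after the loop has inspected every character.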
    

    Slight problem: Suppose that the string were very long. Suppose that the string consisted of the entire contents of a very large text file. And suppose that the second character were non-ASCII. In that case, by the time we reach the second character, we already know that the string contains one non-ASCII character, and we don't need to keep examining the rest of the string. But our program does continue to examine the rest of the string.

    This problem is solved in checkNonAscii2A.cpp, as follows:

       bool nonAsciiFound = false;
    
       int index = 0;
       while ( text[index] != '\0' && !nonAsciiFound )
       {
          if ( text[index] < 0 )
             nonAsciiFound = true;
          index++;
       }
    

    Note the exclamation point ("!") in front of nonAsciiFound in the while loop condition. In this context, the exclamation point is the NOT operator, negating the boolean expression that immediately follows it. Thus !nonAsciiFound is true whenever nonAsciiFound is false, and it is false whenever nonAsciiFound is true.

    The above loop quits when either (1) the end of the string is reached OR (2) a non-ASCII character is found.

    But consider carefully why the loop condition uses "and" (&&) rather than "or" (||), whereas the above sentence, describing when the loop quits, used the word "OR." The reason is that the loop condition tells us under what circumstances we want the loop to continue iterating, which is the negation of the circumstances under which we want the loop to quit. And remember DeMorgan's laws:

    not (P or Q) <==> (not P) and (not Q)
    not (P and Q) <==> (not P) or (not Q)

    In general, when using compound conditions, be careful about your AND's and OR's.
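
    One way to convince yourself that DeMorgan's laws apply here is to write the "continue" condition both ways and compare them. The two functions below are a sketch for that purpose only; their names and parameters are hypothetical, with atEnd standing for "the null character has been reached" and found standing for nonAsciiFound:

```cpp
// Hypothetical helpers: two ways to write "keep looping".

// The form used in the loop condition: (not P) and (not Q).
bool keepGoingA(bool atEnd, bool found)
{
   return !atEnd && !found;
}

// The negation of the quit condition: not (P or Q).
bool keepGoingB(bool atEnd, bool found)
{
   return !( atEnd || found );
}
```

    For every one of the four combinations of true and false arguments, the two functions return the same value.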


  13. Kinds of event-controlled repetition.   As we have seen, one important kind of event-controlled repetition is sentinel-controlled repetition, and another kind is EOF (end of file) controlled repetition. Yet another kind is flag-controlled repetition, which uses a boolean variable as a loop condition.

    In checkNonAscii2A.cpp, in the Assignment 4 example files, we used a combination of sentinel-controlled repetition (with the null character acting as the sentinel) and flag-controlled repetition:

       // No non-ASCII character has been found YET:
       bool nonAsciiFound = false;
    
       // Search for a non-ASCII character:
       int index = 0;
       while ( text[index] != '\0' && !nonAsciiFound )
       {
          if ( text[index] < 0 )
             nonAsciiFound = true;
          index++;
       }  // while
    

    Recall that the above loop stops when EITHER (1) the null character is reached OR (2) nonAsciiFound becomes true (and hence !nonAsciiFound becomes false).
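
    To see the early exit at work, the sketch below returns the number of characters that the loop inspects before it quits. The function name charactersExamined is hypothetical, and the cast to unsigned char is our own variation on the test text[index] < 0, used so that the sketch does not depend on type char being signed:

```cpp
// Hypothetical function: returns how many characters the
// flag-controlled loop examines before it quits.
int charactersExamined(const char text[])
{
   // No non-ASCII character has been found YET:
   bool nonAsciiFound = false;

   int index = 0;
   while ( text[index] != '\0' && !nonAsciiFound )
   {
      if ( static_cast<unsigned char>(text[index]) > 127 )
         nonAsciiFound = true;
      index++;
   }  // while

   return index;   // number of characters inspected
}
```

    For a string whose second character is non-ASCII, the function returns 2 no matter how long the rest of the string is.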


  14. Reading characters in a loop.   Our two example programs checkNonAscii1A.cpp and checkNonAscii2A.cpp use cin to input into a C-string, as follows:

       cout << "Enter your string:>";
       char text[LENGTH + 1];
       cin >> text;
    

    Thus, they have the familiar problem of unpredictable results when the string is too long. One way of avoiding this problem, of course, would be to use an object of class string instead of a C-string. However, given the tiny amount of disk space you have on forbin, you might prefer to avoid using the string class, due to the large increase in the size of the executable file that results when a program includes the <string> header file.

    Another alternative is to use the getline function of cin, instead of the extraction operator. With a C-string, the getline function has an argument specifying the maximum number of characters to be read into the C-string, so you can avoid going out of range.

    But, as we have seen, the getline function and the extraction operator do not behave the same regarding whitespace. Suppose that, for whatever reason, we still wanted our program's input to behave like the extraction operator. Suppose we still want our program to skip over leading whitespace, then read non-whitespace characters and stop reading the input when it encounters a whitespace character.

    As it turns out, our program doesn't really need to use a string variable at all in order to behave as we intend. It can simply inspect the sequence of characters entered by the user, one character at a time, without need to store more than one of them in memory at a time. Or, to be more exact, there is no need for our program to store more than one character at a time -- although the standard input stream, separately from our program itself, routinely does store an entire line of keyboard input text in an array of its own, known as an input buffer, as explained below.

    When the user types a line of text, none of the characters on the line can be read by a program until AFTER the user has pressed Enter to end the line. When the user presses Enter, all the characters on the line -- including the end-of-line marker generated by the Enter key itself -- are then put into the input buffer. Only then can the input stream object cin begin to read the characters on the line into one or more variables in our program.

    When the program reaches a statement in which cin is trying to read characters on a line for which the user has not yet pressed the Enter key, then the program is blocked from doing anything at all until the user finally does press Enter. Then, cin can read all the characters on the line.

    Look now at checkNonAscii1B.cpp. This program uses the get function of cin to read characters, one at a time. First it reads and ignores any whitespace characters, and then it reads non-whitespace characters until a whitespace character is encountered. The whitespace characters are the space (' '), tab ('\t'), carriage return ('\r'), and newline ('\n'). Below are the relevant loops and some preceding statements:

       // Read and ignore leading whitespace, until
       // non-whitespace is reached.
       char typed;
       cin.get(typed);         // Read FIRST character.
       while ( typed == ' ' || typed == '\t'
                 || typed == '\r' || typed == '\n' )
       {
          cin.get(typed);      // Read NEXT character.
       }  // while
    
       // Read and echo non-whitespace characters,
       // searching for a non-ASCII character, until
       // whitespace is reached.
       cout << "You typed: ";  // Announce echoed input.
       while ( typed != ' ' && typed != '\t'
                 && typed != '\r' && typed != '\n' )
       {
          cout << typed;       // Output current character.
          if ( !nonAsciiFound && typed < 0)
             nonAsciiFound = true;
          cin.get(typed);      // Read NEXT character.
       }  // while
    

    Compile the program and run it with several different inputs, so you can see exactly what it does. Observe that the input behaves exactly like input with cin's extraction operator.

    However, within our program, there is still one slight difference between the behavior of cin's extraction operator and the behavior of our two loops. Observe that the second loop quits when it encounters a whitespace character. This means that a whitespace character HAS BEEN read into our character variable typed. On the other hand, cin's extraction operator reads non-whitespace characters up to but NOT including the first whitespace character it then encounters.

    We will not concern ourselves, now, with how this slight difference can be eliminated. But you should be aware of it.

    Look now at checkNonAscii2B.cpp. This program is identical to checkNonAscii1B.cpp except that, in checkNonAscii2B.cpp, the second loop quits either when a whitespace character is encountered OR when a non-ASCII character is encountered.


  15. More about text files.   Compile copy.cpp. (This program uses the string class, which is huge, so you may need to delete a few files first.) Then run it. When prompted for a filename, give it the filename of any of the files in your directory, such as perhaps copy.cpp itself.

    The program will generate a copy of the file. The copy's filename will consist of the original's filename prefixed by "copy-". Thus, if your original was copy.cpp itself, the copy will have the name copy-copy.cpp. After you've run the program, you can then view the contents of the copy by using the Unix more command. For example:

       more copy-copy.cpp
    

    Our program creates and opens objects of classes ifstream and ofstream and then copies the input file, one character at a time, via the following while loop:

       // Begin reading the input file, one byte at a time:
       char currentByte;             // the byte most recently read
       inputFile.get(currentByte);   // read the FIRST byte, if any
    
       while ( inputFile )           // i.e. while not end-of-file
       {
          // Output the byte most recently read:
          outputFile << currentByte;
    
          // Read NEXT byte from input file, if any bytes are left:
          inputFile.get(currentByte);
       }  // while not end of file
    

    The get function can be used with ifstream objects as well as with cin. So too can the getline function.

    Look now at lineBreakFixer.cpp, a program very similar to copy.cpp except that it copies its input file one line at a time instead of one byte at a time:

       // Begin reading the input file, one line at a time:
       string line;                  // the line most recently read
       getline( inputFile, line );   // read the FIRST line, if any
    
       while ( inputFile )           // i.e. while not end-of-file
       {
          // Output the line most recently read:
          outputFile << line << endl;
    
          // Read NEXT line from input file, if any lines are left:
          getline( inputFile, line );
       }  // while not end of file
    

    Use both programs to make copies of their own source code. Then generate a detailed listing of files in your present working directory by typing:

       ls -l
    

    at the "forbin>" prompt. Look at the column which lists the number of bytes in each file. Observe that copy-lineBreakFixer.cpp is one byte longer than lineBreakFixer.cpp, i.e. the copy contains one more character. The reason for this is that lineBreakFixer.cpp does not contain a newline character (line return) at the very end of the file, whereas copy-lineBreakFixer.cpp does contain a newline character as its very last character. (The newline is a control character.) Look back at the program's while loop and observe that it always appends an end-of-line marker (which, on a Unix system, means a newline character) after each line of text, regardless of whether there was an end-of-line marker in the original file.

    Thus, our program lineBreakFixer.cpp is good for copying text files only. It is not good for copying any other kind of file, because other kinds of files might be hurt by appending a line return character at the end. On the other hand, copy.cpp can copy any kind of file correctly.


  16. break statements.   In average3.cpp, average4A.cpp, copy.cpp, lineBreakFixer.cpp, and most (but not all) of our other examples of event-controlled repetition, we used while loops like the following:

       statements1;
       while ( condition )  {
          statements2;
          statements1;   // again
       }
    

    where statements1 and statements2 each consist of one or more statements. Note that statements1 appears twice, once before the loop and once at the end of the loop body, whereas statements2 appears only once. For example, in average4A.cpp, statements1 consists of one statement (a cin statement), and statements2 consists of two statements:

       int number;
       cin >> number;
    
       while ( cin )
       {
          sum = sum + number;
          count++;
          cin >> number;
       }
    

    Note that the cin statement occurs twice, once before the loop and once at the end of the loop body.

    Another example can be found in lineBreakFixer.cpp, which contains two identical getline statements, one just before the loop and one at the end of the loop body:

       string line;                  // the line most recently read
       getline( inputFile, line );
    
       while ( inputFile )  {
          outputFile << line << endl;
          getline( inputFile, line );
       }
    

    With this pattern, note that one or more statements occur in two different places. Such pairs of identical statements are sometimes necessary, but are considered undesirable. Suppose that, in debugging a program, you discover that you need to modify such a statement. You then need to remember to modify it on BOTH of the lines on which it occurs in your program. So, it is considered preferable to write everything once, if possible.

    Another way to write the loop in lineBreakFixer.cpp is the following, in copyWhileBreak.cpp:

       while ( true )  {
          getline( inputFile, line );
          if ( ! inputFile )
             break;
          outputFile << line << endl;
       }
    

    Despite the always-true loop condition, the above loop avoids being infinite via the break statement, which forces the loop to quit at that point. In effect, this loop is controlled not by the while condition but by the if condition. In general, a break statement causes the innermost loop (or switch statement) that encloses it, in this case the while loop, to quit. An if statement does not count for this purpose: the break above quits the while loop, not merely the if. Execution then continues with whatever statement appears just below the while loop.
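
    When one loop is nested inside another, a break statement quits only the innermost loop that contains it. The function below is a small sketch of this behavior; its name and its variables are hypothetical and do not come from any of the example files:

```cpp
// Hypothetical function: counts how many times the inner loop
// body runs to completion when the inner loop uses break.
int countInnerIterations()
{
   int count = 0;

   int row = 0;
   while ( row < 3 )             // outer loop: NOT quit by break
   {
      int column = 0;
      while ( true )             // inner loop
      {
         if ( column == 2 )
            break;               // quits the inner loop only
         count++;
         column++;
      }  // inner while

      row++;                     // execution continues here
   }  // outer while

   return count;                 // 2 per row, for 3 rows
}
```

    The function returns 6, because the outer loop still runs all three times even though the inner loop is cut short each time.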

    The above example has the advantage of not repeating ANY lines of code, neither statements nor conditions. The call to getline does not appear more than once, as it does in lineBreakFixer.cpp. Because nothing is repeated, this version is easier to modify. It is easier to edit a line of code if you don't have to track down multiple copies of that line and make sure you've edited them all to be consistent. Thus, many programmers would consider the version with the break statement to be much better in terms of programming style.

    The loop in copyWhileBreak.cpp is also a little more efficient than the loop in copyDoWhile.cpp. In copyWhileBreak.cpp, the while condition of true is trivial to check; it does not involve an actual comparison, as does the if condition.

    However, others would consider the break statement to be bad programming style because it violates the rules of a school of thought known as structured programming, which, among other things, insists that all loops should be controlled by conditions at either the top or the bottom.

    In the old days, most programming languages had a commonly-used keyword goto, which allowed program execution to jump from anywhere to anywhere. All program statements were labeled (numbered), and a goto statement consisted of the word goto followed by the label of the statement to which you wanted your program to jump. All this jumping around could easily make a program very hard to read. That's why branching structures (if and if/else) and loops (e.g. while loops) were invented -- so that goto statements could be eliminated and, instead, a program's jumping around could occur only in predictable, hence easier-to-read patterns.

    A furious debate ensued between the adherents of structured programming and other programmers who preferred the flexibility allowed by goto statements. The break statement represents a compromise in this debate. Although it violates what some regard as the overly rigid rules of structured programming, it does so in a manner more constrained, hence more predictable, and hence more readable, than the total freedom allowed by goto statements.

    In this course, you may use break statements, but you must NOT use goto. C++ does allow goto statements, but newer programming languages such as Java do not, and with good reason.

    In this course, you are not REQUIRED to use break statements (except for the required breaks in a switch statement, as will be discussed later in the semester). You may, if you prefer, follow strictly the rules of structured programming.

    Look now at average4B.cpp and compare it to average4A.cpp. Observe that average4B.cpp uses the following loop:

       while ( true )
       {
          int number;
          cin >> number;
    
          if ( !cin )
             break;
    
          sum += number;
          count++;
       }
    

    in place of the following loop in average4A.cpp:

       int number;
       cin >> number;
    
       while ( cin )
       {
          sum = sum + number;
          count++;
          cin >> number;
       }
    

    In average4B.cpp, observe how our use of the conditional break statement eliminates repetition of the cin statement and also allows us to write the statements in our loop in a more intuitively obvious order.

    Similarly, look now at average5B.cpp and compare it to average5A.cpp. Observe that average5B.cpp uses the following loop:

       while ( true )
       {
          int number;
          inputFile >> number;
    
          if ( !inputFile )
             break;
    
          cout << setw(15) << number << endl;
          sum += number;
          count++;
       }
    

    in place of the following loop in average5A.cpp:

       int number;
       inputFile >> number;
    
       while ( inputFile )
       {
          cout << setw(15) << number << endl;
          sum += number;
          count++;
          inputFile >> number;
       }
    

    A conditional break statement can be used to simplify programs in other ways too. For example, look now at the loop in checkNonAscii3A.cpp and observe how it eliminates the need for the compound condition we used in checkNonAscii2A.cpp. Below is the loop in checkNonAscii3A.cpp:

       bool nonAsciiFound = false;
    
       int index = 0;
       while ( text[index] != '\0' )
       {
          if ( text[index] < 0 )  {
             nonAsciiFound = true;
             break;
          }
    
          index++;
       }
    

    When a non-ASCII character (i.e. a character with a negative binary code value) is encountered, then not only is the boolean variable nonAsciiFound set to true, but also the break statement makes the loop quit at this point, so that the program does not continue inspecting characters after a non-ASCII character has been found.

    Recall how, in checkNonAscii2A.cpp, we accomplished this using a compound condition:

       bool nonAsciiFound = false;
    
       int index = 0;
       while ( text[index] != '\0' && !nonAsciiFound )
       {
          if ( text[index] < 0 )
             nonAsciiFound = true;
          index++;
       }
    

    When you use a compound condition, you need to think carefully about when to use AND ("&&") and when to use OR ("||"). Sometimes this isn't intuitively obvious. For some people, the use of break statements may be easier to do correctly.

    Similarly, compare the loop in checkNonAscii3B.cpp with the loop in checkNonAscii2B.cpp.


  17. break statements vs. return statements.   Often, there is more than one way to write a program. So far, in all versions of our program to check whether a string contains a non-ASCII character, we have used a boolean variable. But it is possible to write that program without a boolean variable, as follows:

       // Search for a non-ASCII character:
       int index = 0;
       while ( text[index] != '\0' )
       {
          if ( text[index] < 0 )  {
             cout << "Your string contains at least one "
                         << "non-ASCII character." << endl;
             return 0;
          }  // if
    
          index++;
       }  // while
    
       // If this point has been reached, the string
       // does not contain any non-ASCII characters.
    
       // Output result:
       cout << "Your string does not contain any "
                         << "non-ASCII characters." << endl;
    

    Look now at checkNonAscii4A.cpp and compare it with checkNonAscii3A.cpp. Note that checkNonAscii4A.cpp uses a return statement in the loop, whereas checkNonAscii3A.cpp uses a break statement. The difference between a return statement and a break statement is that a return statement makes the function that contains it (here, the main function) quit, whereas a break statement makes just the loop quit. With a break statement in a loop, the statements after the loop will still be executed after the break statement is encountered. On the other hand, with a return statement inside a loop, the statements after the loop will NOT be executed after the return statement has been encountered.
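
    A return statement quits whichever function contains it, which need not be main. As a sketch, suppose the search were moved into a function of its own (hasNonAscii is a hypothetical name, and the cast to unsigned char is our own variation on the test text[index] < 0):

```cpp
// Hypothetical function: the return statement inside the loop
// makes this function, not the whole program, quit at once.
bool hasNonAscii(const char text[])
{
   int index = 0;
   while ( text[index] != '\0' )
   {
      if ( static_cast<unsigned char>(text[index]) > 127 )
         return true;            // skips the rest of the function
      index++;
   }  // while

   // If this point has been reached, the string
   // does not contain any non-ASCII characters.
   return false;
}
```

    Here no boolean variable and no break statement are needed; the caller then continues with whatever follows the function call.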


  18. Chaining:   consolidating statements in C++.   In C++, many kinds of statements are, themselves, expressions with values, making it possible to consolidate statements. For example, the following three assignment statements:

       z = 5;
       y = z;
       x = y;
    

    can be consolidated into one statement as follows:

       x = y = z = 5;
    

    The assignment operators are evaluated right to left, and each assignment is, itself, an expression with a value. Thus the above statement is equivalent to:

       x = (y = (z = 5));
    

    The expression z = 5 gets evaluated first. It sets z to a value of 5 and is also, itself, an expression whose value is 5. Once that expression is evaluated, its value, 5, is then assigned to the variable y. When that happens, the expression y = z = 5 is also evaluated, and its value, too, is 5, which is then assigned to the variable x.

    Note that the value of an assignment expression is equal to the value assigned to the variable on the left. Thus the expression z = 5 has a value of 5.
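
    The following sketch uses the value of an assignment expression in two ways. The function name chainedTotal and the extra variable w are hypothetical, introduced only for illustration:

```cpp
// Hypothetical function showing that an assignment is itself
// an expression whose value is the value assigned.
int chainedTotal()
{
   int x, y, z;
   x = y = z = 5;                // same as x = (y = (z = 5));

   int w = (z = 7) + 1;          // z becomes 7; the expression
                                 // z = 7 has the value 7, so
                                 // w becomes 8

   return x + y + z + w;         // 5 + 5 + 7 + 8
}
```

    The function returns 25. Statements like the one that sets w are legal but hard to read, so in practice it is best to chain only simple assignments.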

    An input statement, using either cin or a file input stream, is also an expression with a value. Consider the following statement:

       cin >> number;
    

    But what is the value of the expression cin >> number? Its value is cin itself, NOT the value that has been read into the variable number.

    Thus, for example, the following statement does NOT do what you might hope:

       int copyOfNumber = (cin >> number);
    

    i.e. it is NOT equivalent to:

       cin >> number;
       int copyOfNumber = number;
    

    On the other hand, the following statement:

       (cin >> number) >> anotherNumber;
    

    is equivalent to the following two statements:

       cin >> number;
       cin >> anotherNumber;
    

    These are equivalent because (cin >> number) is an expression whose value is cin itself.

    Because the extraction operator (">>") evaluates from left to right, we don't need the parentheses. We can simply write:

       cin >> number >> anotherNumber;
    

    Thus, for example, we can prompt the user to enter three numbers as follows:

       cout << "Enter three numbers:> ";
       cin >> number1 >> number2 >> number3;
    

    where number1, number2, and number3 are all variables that have been previously declared.

    The insertion operator ("<<") for an output stream (either cout or a file output stream) behaves similarly. We have often seen statements like the following:

       cout << string1 << string2 << string3;
    

    equivalent to:

       ((cout << string1) << string2) << string3;
    

    and thus equivalent to:

       cout << string1;
       cout << string2;
       cout << string3;
    

    The consolidation of statements in this manner is known as chaining.

    Consider again the following loop in average4A.cpp:

       int number;
       cin >> number;
    
       while ( cin )
       {
          sum += number;
          count++;
          cin >> number;
       }
    

    Note that the error state of cin is tested immediately after inputting a number, whether the input is done before the loop or at the end of the loop.

    Hence the two identical input statements (cin >> number) can both be chained with the testing of the error state, as is done in average4C.cpp:

       int number;
       while ( cin >> number )
       {
          sum += number;
          count++;
       }  // while
    

    Likewise, consider the loop in average5A.cpp:

       int number;
       inputFile >> number;
    
       while ( inputFile )
       {
          cout << setw(15) << number << endl;
          sum += number;
          count++;
          inputFile >> number;
       }
    

    Note that the error state of inputFile is tested immediately after inputting a number, whether the input is done before the loop or at the end of the loop.

    Hence the two identical input statements (inputFile >> number) can both be chained with the testing of the error state, as is done in average5C.cpp:

       int number;
       while ( inputFile >> number )
       {
          cout << setw(15) << number << endl;
          sum += number;
          count++;
       }
    

    
    
       

    
    
       

