
9.3 System Structures: A Systematic View of Text Files

Up to this point, we have written most programs as interactive programs; in other words, each program reads all input data from the keyboard and displays all outputs on the screen. This mode of operation is fine for small programs. However, as you begin to write larger programs, you will see that there are many advantages to using disk files for program input and output.

You can create a data file using a text editor in the same way you create a program file. Once the data file is entered in computer memory, you can carefully check and edit each line before you save it as a disk file. When you enter data interactively, you do not have the opportunity to examine and edit the data.

After the data file is saved on disk, you can instruct your program to read data from the data file rather than from the keyboard. Recall from Chapter 2 that this mode of program execution is called batch mode. Because the program data are supplied before execution begins, prompting messages are not required in batch programs. Instead, batch programs must contain display statements that echo print data values, thereby providing a record of the data that are read and processed in a particular run.

Besides giving you the opportunity to check for errors in your data, using data files has another advantage. Because a data file can be read many times, during debugging you can rerun the program as often as you need to, without retyping the test data each time.

You can also instruct your program to write its output to a disk file rather than display it on the screen. When output is written to the screen, it disappears after it scrolls off the screen and cannot be retrieved. However, if program output is written to a disk file, you can use an operating system command such as TYPE filename (VAX/VMS and MS-DOS) or cat filename (UNIX) to list file filename as often as you wish or look at it with your editor. You can also get a hard copy of a disk file by sending it to the printer.

Finally, you can use the output file generated by one program as a data file for another program. For example, a payroll program may compute employee salaries and write each employee's name and salary to an output file. A second program that prints employee checks could use the output of the payroll program as its data file.

Ada's Package Specification for Text Files

You know already that in Ada, input and output are done with packages; Ada.Text_IO is the one we are using in this book. An excerpt of the Ada.Text_IO specification dealing with files appears as Figure 9.1.

Figure 9.1.
Section of Text_IO Dealing with Text Files

WITH IO_Exceptions;
PACKAGE Text_IO IS

  TYPE File_Type IS LIMITED PRIVATE;

  TYPE File_Mode IS (In_File, Out_File);

  ...

  -- File Management

  PROCEDURE Create(File : IN OUT File_Type;
                   Mode : IN File_Mode := Out_File;
                   Name : IN String := "";
                   Form : IN String := "");

  PROCEDURE Open(File : IN OUT File_Type;
                   Mode : IN File_Mode; Name : IN String;
                   Form : IN String := "");

  PROCEDURE Close(File : IN OUT File_Type);
  PROCEDURE Delete(File : IN OUT File_Type);
  PROCEDURE Reset(File : IN OUT File_Type; Mode : IN File_Mode);
  PROCEDURE Reset(File : IN OUT File_Type);

  FUNCTION Mode(File : IN File_Type) RETURN File_Mode;
  FUNCTION Name(File : IN File_Type) RETURN String;
  FUNCTION Form(File : IN File_Type) RETURN String;

  FUNCTION Is_Open(File : IN File_Type) RETURN Boolean;

  -- Control of default Input and output Files

  PROCEDURE Set_Input(File : IN File_Type);
  PROCEDURE Set_Output(File : IN File_Type);

  FUNCTION Standard_Input RETURN File_Type;
  FUNCTION Standard_Output RETURN File_Type;

  FUNCTION Current_Input RETURN File_Type;
  FUNCTION Current_Output RETURN File_Type;

  -- Specification of Line and Page lengths

  PROCEDURE Set_Line_Length(File : IN File_Type; To : IN Count);
  PROCEDURE Set_Line_Length(To : IN Count);

  PROCEDURE Set_Page_Length(File : IN File_Type; To : IN Count);
  PROCEDURE Set_Page_Length(To : IN Count);

  FUNCTION Line_Length(File : IN File_Type) RETURN Count;
  FUNCTION Line_Length RETURN Count;

  FUNCTION Page_Length(File : IN File_Type) RETURN Count;
  FUNCTION Page_Length RETURN Count;

  -- Column, Line, and Page Control

  PROCEDURE New_Line(File : IN File_Type; Spacing : IN Positive_Count := 1);
  PROCEDURE New_Line(Spacing : IN Positive_Count := 1);

  PROCEDURE Skip_Line(File : IN File_Type; Spacing : IN Positive_Count := 1);
  PROCEDURE Skip_Line(Spacing : IN Positive_Count := 1);

  FUNCTION End_of_Line(File : IN File_Type) RETURN Boolean;
  FUNCTION End_of_Line RETURN Boolean;

  PROCEDURE New_Page(File : IN File_Type);
  PROCEDURE New_Page;

  PROCEDURE Skip_Page(File : IN File_Type);
  PROCEDURE Skip_Page;

  FUNCTION End_of_Page(File : IN File_Type) RETURN Boolean;
  FUNCTION End_of_Page RETURN Boolean;

  FUNCTION End_of_File(File : IN File_Type) RETURN Boolean;
  FUNCTION End_of_File RETURN Boolean;

  PROCEDURE Set_Col(File : IN File_Type; To : IN Positive_Count);
  PROCEDURE Set_Col(To : IN Positive_Count);

  PROCEDURE Set_Line(File : IN File_Type; To : IN Positive_Count);
  PROCEDURE Set_Line(To : IN Positive_Count);

  FUNCTION Col(File : IN File_Type) RETURN Positive_Count;
  FUNCTION Col RETURN Positive_Count;

  FUNCTION Line(File : IN File_Type) RETURN Positive_Count;
  FUNCTION Line RETURN Positive_Count;

  FUNCTION Page(File : IN File_Type) RETURN Positive_Count;
  FUNCTION Page RETURN Positive_Count;

  -- Character Input-Output

  PROCEDURE Get(File : IN File_Type; Item : OUT Character);
  PROCEDURE Get(Item : OUT Character);
  PROCEDURE Put(File : IN File_Type; Item : IN Character);
  PROCEDURE Put(Item : IN Character);

  -- String Input-Output

  PROCEDURE Get(File : IN File_Type; Item : OUT String);
  PROCEDURE Get(Item : OUT String);
  PROCEDURE Put(File : IN File_Type; Item : IN String);
  PROCEDURE Put(Item : IN String);

  PROCEDURE Get_Line(File : IN File_Type;
                     Item : OUT String; Last : OUT Natural);
  PROCEDURE Get_Line(Item : OUT String; Last : OUT Natural);
  PROCEDURE Put_Line(File : IN File_Type; Item : IN String);
  PROCEDURE Put_Line(Item : IN String);

  ...

END Text_IO;

A file is defined as a type:

TYPE File_Type IS LIMITED PRIVATE;

We have seen PRIVATE before, but not LIMITED PRIVATE. The latter term is used to designate a type that behaves like a PRIVATE type--the client program cannot directly access details of objects of that type--but is even more restricted: The assignment and equality-checking operations are taken away. A type of this kind has no predefined operations; all client-accessible operations must be defined in the package specification.
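
The restriction can be seen in a small test program. The following is only a sketch; the names Limited_Demo, F1, and F2 are ours and appear nowhere else in the book. The two commented-out statements are the ones we would expect a compiler to reject, while a call to an operation exported by the package, such as Is_Open, is accepted.

```ada
WITH Ada.Text_IO;
PROCEDURE Limited_Demo IS
  -- Hypothetical demonstration; Limited_Demo, F1, and F2 are our names.
  F1 : Ada.Text_IO.File_Type;
  F2 : Ada.Text_IO.File_Type;
BEGIN
  -- Uncommenting either of the next two statements should cause a
  -- compilation error, because File_Type is LIMITED PRIVATE:
  --   F2 := F1;                                      -- no assignment
  --   IF F1 = F2 THEN Ada.Text_IO.New_Line; END IF;  -- no equality test

  -- Operations exported by the package specification are allowed:
  IF NOT Ada.Text_IO.Is_Open(File => F1)
     AND NOT Ada.Text_IO.Is_Open(File => F2) THEN
    Ada.Text_IO.Put_Line(Item => "Neither file is open yet.");
  END IF;
END Limited_Demo;
```

Because neither file variable has been opened or created, both calls to Is_Open return False and the message is displayed.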

Refer to this partial specification as you read the remainder of this section. Many more operations are defined in the specification than we will ever be using in this book, but it is helpful to know that the Ada standard defines all the operations in such a clear fashion as a package specification. The full specification for Ada.Text_IO, which runs for a number of pages, appears in Appendix D.

Reading and Writing Files with Ada.Text_IO

Several previous examples have used files for their input and output. This section gives a systematic explanation of how to get an Ada program to read from a data file and to write program results to an output file with Ada.Text_IO.

A text file is a collection of characters stored under the same name in secondary memory (that is, on a disk). A text file has no fixed size. To mark the end of a text file, the computer places a special character, called the end-of-file character (denoted as <eof>), following the last character in a text file. The Ada literature usually refers to this marker as the file terminator. Its exact form depends on the operating system.

As you create a text file using an editor program, you press the RETURN key to separate the file into lines. Each time you press RETURN, another special character, called the end-of-line character (denoted as <eol>), or line terminator, is placed in the file.

Here are the contents of a text file that consists of two lines of letters, blank characters, and punctuation. Each line ends with <eol>, and <eof> follows the last <eol> in the file. For convenience in scanning the file's contents, we have listed each line of the file as a separate line. In the actual file stored on disk, the characters are stored in consecutive storage locations, with each character occupying a single storage location. The first character of the second line (the letter I) occupies the next storage location following the first <eol>.

    This is a text file!<eol>
    It has two lines.<eol><eof>
A text file can also contain numeric data or mixed numeric and character data. Here is a text file that consists of numeric data and blank characters. Each number is stored on disk as a sequence of digit characters; blank characters separate numbers on the same line.
    1234  345<eol>
    999  -17<eol><eof>

The Keyboard and the Screen as Text Files

In interactive programming, Ada treats data entered at the keyboard as if they were read from the predefined file called Ada.Text_IO.Standard_Input. Pressing the Return key enters the <eol> in this file. In interactive mode, we normally use a sentinel value to indicate the end of data rather than attempt to enter <eof> in system file Ada.Text_IO.Standard_Input. We could use <eof>, however. Its keyboard representation depends on the operating system; control-D and control-Z are often used.

Similarly, displaying characters on the screen is equivalent to writing characters to system file Ada.Text_IO.Standard_Output. The New_Line procedure places the <eol> in this file, resulting in the cursor moving to the start of the next line of the screen. Both Ada.Text_IO.Standard_Input and Ada.Text_IO.Standard_Output are text files because their individual components are characters.

The End_of_Line and End_of_File Functions

Both <eol> and <eof> are different from the other characters in a text file because they are not data characters; in fact, the Ada standard doesn't even specify what they should be because their form depends on the operating system. Many of the Ada Get operations skip over the line terminators. However, if an Ada program attempts to read <eof>, the exception Ada.Text_IO.End_Error is raised.

If we can't read or write these characters in the normal way, how do we process them? Ada.Text_IO provides two functions that enable us to determine whether the next character is <eol> or <eof>. The function Ada.Text_IO.End_of_Line returns a value of True if the next character is <eol>; the function Ada.Text_IO.End_of_File returns a value of True if the next character is <eof>. The algorithm below uses the End_of_Line and End_of_File functions to control the processing of a data file.

Algorithm Skeleton for Processing a Text File, Character by Character

    WHILE NOT Ada.Text_IO.End_of_File (data file) LOOP
        
        WHILE NOT Ada.Text_IO.End_of_Line (data file) LOOP
            process each character in the current line
        END LOOP;
        -- assert: the next character is <eol>
    
        process the <eol> character
    
    END LOOP;
    -- assert: the next character is <eof>
If the data file is not empty, the initial call to End_of_File returns a value of False, and the computer executes the inner WHILE loop. This loop processes each character in a line up to (but not including) the <eol>. For the two-line character data file shown above, the first execution of the inner WHILE loop processes the first line of characters:
    This is a text file!
When the next character is <eol>, the End_of_Line function returns True, so the inner WHILE loop is exited. The <eol> is processed immediately after loop exit, and the outer WHILE loop is repeated.

Each repetition of the outer WHILE loop begins with a call to the End_of_File function to test whether the next character is the <eof> character. If it is, the End_of_File function returns True, so the outer loop is exited. If the next character is not <eof>, the End_of_File function returns False, so the inner loop executes again and processes the next line of data up to <eol>. For the file above, the second execution of the inner WHILE loop processes the second line of characters:

    It has two lines.
After the second <eol> is processed, the next character is <eof>, so the End_of_File function returns True, and the outer WHILE loop is exited. We use this algorithm later in a program that duplicates a file by copying all its characters to another file.

SYNTAX DISPLAY
End_of_Line Function (for Text Files)

Form:
Ada.Text_IO.End_of_Line(filename)

Interpretation:
The function result is True if the next character in file filename is <eol>; otherwise, the function result is False.

Note:
If filename is omitted, the file is assumed to be Ada.Text_IO.Standard_Input (usually the terminal keyboard).

SYNTAX DISPLAY
End_of_File Function (for Text Files)

Form:
Ada.Text_IO.End_of_File(filename)

Interpretation:
The function result is True if the next character in file filename is <eof>; otherwise, the function result is False.

Note:
If filename is omitted, the file is again assumed to be Ada.Text_IO.Standard_Input. If a read operation is attempted when End_of_File (filename) is True, the exception Ada.Text_IO.End_Error is raised; unless the program handles this exception, the program stops.
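
As a small illustration of the two functions, here is a sketch of a program that counts the data characters and lines it reads. It uses the short forms (no file parameter), so it reads from Ada.Text_IO.Standard_Input. The program name Count_Input and its variables are hypothetical, and we assume the predefined package Ada.Integer_Text_IO is available for displaying the counts.

```ada
WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
PROCEDURE Count_Input IS
  -- Hypothetical example: counts data characters and lines
  -- read from Standard_Input until <eof> is next.
  NextCh    : Character;
  CharCount : Natural := 0;
  LineCount : Natural := 0;
BEGIN -- Count_Input

  WHILE NOT Ada.Text_IO.End_of_File LOOP

    WHILE NOT Ada.Text_IO.End_of_Line LOOP
      Ada.Text_IO.Get(Item => NextCh);   -- read one data character
      CharCount := CharCount + 1;
    END LOOP;
    -- assert: the next character is <eol>

    Ada.Text_IO.Skip_Line;               -- move past the <eol>
    LineCount := LineCount + 1;

  END LOOP;
  -- assert: the next character is <eof>

  Ada.Text_IO.Put(Item => "Characters: ");
  Ada.Integer_Text_IO.Put(Item => CharCount, Width => 1);
  Ada.Text_IO.New_Line;
  Ada.Text_IO.Put(Item => "Lines: ");
  Ada.Integer_Text_IO.Put(Item => LineCount, Width => 1);
  Ada.Text_IO.New_Line;

END Count_Input;
```

When the program is run interactively, the data ends when the operating system's end-of-file key combination is typed; the line terminators themselves are not counted as characters.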

Declaring a Text File

Before we can reference a text file in a program, we must declare it just like any other data object. For example, the declarations

    InData : Ada.Text_IO.File_Type;
    OutData : Ada.Text_IO.File_Type;
identify InData and OutData as text file variables of type Ada.Text_IO.File_Type.

Directory Names for Files

To read or write a text file with an Ada program, we must know the file's directory name, or external name, which is the name used to identify it in the disk's directory. A disk's directory lists the names of all files stored on the disk. A file's directory name must follow the conventions that apply on your particular computer system. For example, some systems (MS-DOS, for example) limit you to a file name that consists of at most eight characters, a period, and an extension of at most three characters. Many programmers use the extension .DAT or .TXT to designate a text file.

You need to communicate to the operating system the directory names of any files you are using so that the system knows the correspondence between file variables and directory names. This process varies from computer to computer. Your instructor will give you the details for your particular system.

Preparing a File for Input or Output

Before a program can use a file, the file must be prepared for input or output. At any given time, a file can be used for either input or output, but not both simultaneously. If a file is being used for input, its components can be read as data. If a file is being used for output, new components can be written to the file.

The procedure call statement

    Ada.Text_IO.Open
      (File => InData, Mode => Ada.Text_IO.In_File, Name => "SCORES.DAT");
prepares file InData for input by associating it with the disk file SCORES.DAT and moving its file position pointer to the beginning of the file. The file position pointer selects the next character to be processed in the file. The file SCORES.DAT must have been previously created and located in the current disk directory; if it is not available, the exception Ada.Text_IO.Name_Error is raised.
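
A program that opens a file can guard against a missing file by handling Name_Error. The sketch below assumes only the file name SCORES.DAT from the statement above; the program name Try_Open is hypothetical.

```ada
WITH Ada.Text_IO;
PROCEDURE Try_Open IS
  -- Hypothetical example: attempts to open SCORES.DAT for input
  -- and reports whether the attempt succeeded.
  InData : Ada.Text_IO.File_Type;
BEGIN -- Try_Open

  Ada.Text_IO.Open
    (File => InData, Mode => Ada.Text_IO.In_File, Name => "SCORES.DAT");
  Ada.Text_IO.Put_Line(Item => "SCORES.DAT is open for input.");
  Ada.Text_IO.Close(File => InData);

EXCEPTION

  WHEN Ada.Text_IO.Name_Error =>
    -- Open raised Name_Error, so the file was never opened
    Ada.Text_IO.Put_Line
      (Item => "SCORES.DAT doesn't exist in this directory!");

END Try_Open;
```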

The procedure call statement

    Ada.Text_IO.Create(File=>OutData, Mode=>Ada.Text_IO.Out_File, Name=>"TEST.OUT");
prepares file OutData for output. If no file TEST.OUT is saved on disk, a file that is initially empty (that is, TEST.OUT has no characters) is created. If a file TEST.OUT is already saved on disk, it is deleted and a new one is created.

To read and process a file a second time in the same program run, first close it by performing an operation such as

    Ada.Text_IO.Close(File => OutData);
and then reopen it for input. A program can read and echo print (to the screen) an output file it creates by calling the Close procedure with the newly created file as its parameter. An Open operation prepares this file for input, and your program can then read data from that file.
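
The Reset procedure listed in Figure 9.1 offers another way to reread a file: it repositions the file at its beginning and, if a Mode parameter is supplied, changes the file's mode. The following sketch writes two lines and then reads them back without closing the file in between. The program name Reset_Demo and its variables are hypothetical, and whether Reset is supported for a given kind of file may depend on your system.

```ada
WITH Ada.Text_IO;
PROCEDURE Reset_Demo IS
  -- Hypothetical example: writes two lines to TEST.OUT, then uses
  -- Reset to switch the same file variable to input and echo the lines.
  Scratch : Ada.Text_IO.File_Type;
  Buffer  : String(1..80);
  Last    : Natural;
BEGIN -- Reset_Demo

  Ada.Text_IO.Create
    (File => Scratch, Mode => Ada.Text_IO.Out_File, Name => "TEST.OUT");
  Ada.Text_IO.Put_Line(File => Scratch, Item => "first line");
  Ada.Text_IO.Put_Line(File => Scratch, Item => "second line");

  -- go back to the beginning of the file and prepare it for input
  Ada.Text_IO.Reset(File => Scratch, Mode => Ada.Text_IO.In_File);

  WHILE NOT Ada.Text_IO.End_of_File(File => Scratch) LOOP
    Ada.Text_IO.Get_Line(File => Scratch, Item => Buffer, Last => Last);
    Ada.Text_IO.Put_Line(Item => Buffer(1..Last));
  END LOOP;

  Ada.Text_IO.Close(File => Scratch);

END Reset_Demo;
```

Closing the file and opening it again, as described above, has the same effect and is the approach used in the file copy program in this section.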

Reading and Writing a Text File

You've learned how to declare a text file and how to prepare one for processing. All that remains is to find out how to instruct the computer to read data from an input file or to write program results to an output file.

If NextCh is a type Character variable, we know that the procedure call statement

    Ada.Text_IO.Get (Item => NextCh);
reads the next data character typed at the keyboard into NextCh. This is really an abbreviation for the procedure call statement
    Ada.Text_IO.Get (File => Ada.Text_IO.Standard_Input, Item => NextCh);
which has the same effect. The statement
    Ada.Text_IO.Get (File => InData, Item => NextCh);
reads the next character from file InData into NextCh, where the next character is the one selected by the file position pointer. The computer automatically advances the file position pointer after each read operation. Remember to open InData for input before the first read operation.

In a similar manner, the procedure call statements

    Ada.Text_IO.Put (Item => NextCh);
    Ada.Text_IO.Put (File => Ada.Text_IO.Standard_Output, Item => NextCh);
display the value of NextCh on the screen. The statement
    Ada.Text_IO.Put (File => OutData, Item => NextCh);
writes the value of NextCh to the end of file OutData. Remember to prepare OutData for output (with Create) before the first call to procedure Put.
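
The File parameter works the same way for numeric values. As a sketch, assuming the predefined package Ada.Integer_Text_IO, the following program writes three integers to a file and then reads them back to compute their sum. The program name Sum_Scores and the file name scores.tmp are hypothetical.

```ada
WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
PROCEDURE Sum_Scores IS
  -- Hypothetical example: writes 10, 20, 30 to scores.tmp,
  -- then reads the numbers back and displays their sum.
  Scores : Ada.Text_IO.File_Type;
  Value  : Integer;
  Sum    : Integer := 0;
BEGIN -- Sum_Scores

  Ada.Text_IO.Create
    (File => Scores, Mode => Ada.Text_IO.Out_File, Name => "scores.tmp");
  FOR Index IN 1..3 LOOP
    -- each number is written as blanks followed by digit characters
    Ada.Integer_Text_IO.Put(File => Scores, Item => Index * 10, Width => 4);
  END LOOP;
  Ada.Text_IO.New_Line(File => Scores);
  Ada.Text_IO.Close(File => Scores);

  Ada.Text_IO.Open
    (File => Scores, Mode => Ada.Text_IO.In_File, Name => "scores.tmp");
  WHILE NOT Ada.Text_IO.End_of_File(File => Scores) LOOP
    Ada.Integer_Text_IO.Get(File => Scores, Item => Value);
    Sum := Sum + Value;
  END LOOP;
  Ada.Text_IO.Close(File => Scores);

  Ada.Text_IO.Put(Item => "Sum of scores: ");
  Ada.Integer_Text_IO.Put(Item => Sum, Width => 1);
  Ada.Text_IO.New_Line;

END Sum_Scores;
```

The file written by the first half has the same form as the numeric text file shown earlier in this section: digit characters separated by blanks, with <eol> at the end of the line.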

Example 9.10

It is a good idea to have a backup or duplicate copy of a file in case the original file data are lost. Program 9.3 is an Ada program that copies one file to another; it is similar in function to the file-copying command provided by the operating system.

Program 9.3
File Copy Program

WITH Ada.Text_IO;
PROCEDURE Copy_File IS
------------------------------------------------------------------------
--| Program copies its input file test.dat into its output file 
--| test.out, then closes test.out, reopens it for input,  
--| and displays its contents on the screen.
--| Author: Michael B. Feldman, The George Washington University 
--| Last Modified: November 1995                                     
------------------------------------------------------------------------
 
  InData  : Ada.Text_IO.File_Type;
  OutData : Ada.Text_IO.File_Type;
  NextCh  : Character;

BEGIN -- Copy_File

  -- open input file; create output file
  Ada.Text_IO.Open
    (File=>InData, Mode=>Ada.Text_IO.In_File, Name=>"test.dat");
  Ada.Text_IO.Create
    (File=>OutData, Mode=>Ada.Text_IO.Out_File, Name=>"test.out");

  -- copy input file to output file, character by character
  WHILE NOT Ada.Text_IO.End_of_File(File => InData) LOOP
    WHILE NOT Ada.Text_IO.End_of_Line(File => InData) LOOP

      Ada.Text_IO.Get(File => InData, Item => NextCh);
      Ada.Text_IO.Put(File => OutData, Item => NextCh);

    END LOOP;

    Ada.Text_IO.Skip_Line(File => InData);
    Ada.Text_IO.New_Line(File => OutData);
  END LOOP;

  Ada.Text_IO.Close(File => InData);
  Ada.Text_IO.Close(File => OutData);

  -- reopen the new file and display it on the screen
  Ada.Text_IO.Open
    (File=>InData, Mode=>Ada.Text_IO.In_File, Name=>"test.out");

  WHILE NOT Ada.Text_IO.End_of_File(File => InData) LOOP
    WHILE NOT Ada.Text_IO.End_of_Line(File => InData) LOOP

      Ada.Text_IO.Get(File => InData, Item => NextCh);
      Ada.Text_IO.Put(Item => NextCh);

    END LOOP;

    Ada.Text_IO.Skip_Line(File => InData);
    Ada.Text_IO.New_Line;
  END LOOP;

  Ada.Text_IO.Close(File => InData);

EXCEPTION

  WHEN Ada.Text_IO.Name_Error =>
    Ada.Text_IO.Put
      (Item => "File test.dat doesn't exist in this directory!");
    Ada.Text_IO.New_Line;

END Copy_File;

The nested WHILE loops in Program 9.3 implement the algorithm first shown above. The data file, InData, is the argument in the calls to functions End_of_Line and End_of_File. As long as the next character is not <eol>, the statements

    Ada.Text_IO.Get (File => InData, Item => NextCh);  
    Ada.Text_IO.Put (File => OutData, Item => NextCh);
read the next character of file InData into NextCh, and then write that character to file OutData. If the next character is <eol>, the inner WHILE loop is exited and the statements
    Ada.Text_IO.Skip_Line (File => InData);  
    Ada.Text_IO.New_Line (File => OutData);
are executed. The Ada.Text_IO.Skip_Line procedure does not read any data but simply advances the file position pointer for InData past the <eol> to the first character of the next line. The second statement writes the <eol> to file OutData. After the <eol> is processed, function End_of_File is called again to test whether there are more data characters left to be copied.

It is interesting to contemplate the effect of omitting either the Skip_Line or the New_Line statement. If the New_Line is omitted, the <eol> will not be written to file OutData whenever the end of a line is reached in file InData. Consequently, OutData will contain all the characters in file InData, but on one (possibly very long) line. If the Skip_Line is omitted, the file position pointer will not be advanced and the <eol> will still be the next character. Consequently, End_of_Line (InData) will remain True, the inner loop is exited immediately, and another <eol> is written to file OutData. This continues "forever" or until the program is terminated by its user or until its time limit is exceeded.

After copying the file, the program closes test.out, reopens it for input, and displays its contents on the screen; the algorithm in the second part of the program is nearly identical to that in the first part.
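
The copy can also be done a line at a time with Get_Line, which appears in Figure 9.1. The following is a sketch of that alternative, not a replacement for Program 9.3; the program name Copy_Lines and the 80-character buffer are our assumptions. Because Get_Line skips the <eol> only when the line ends before the buffer is filled, the program writes a line terminator only in that case, so lines longer than the buffer are copied in pieces without being broken.

```ada
WITH Ada.Text_IO;
PROCEDURE Copy_Lines IS
  -- Hypothetical variant of the file copy program: copies test.dat
  -- to test.out one line (or piece of a long line) at a time.
  InData  : Ada.Text_IO.File_Type;
  OutData : Ada.Text_IO.File_Type;
  Buffer  : String(1..80);
  Last    : Natural;
BEGIN -- Copy_Lines

  Ada.Text_IO.Open
    (File => InData, Mode => Ada.Text_IO.In_File, Name => "test.dat");
  Ada.Text_IO.Create
    (File => OutData, Mode => Ada.Text_IO.Out_File, Name => "test.out");

  WHILE NOT Ada.Text_IO.End_of_File(File => InData) LOOP
    Ada.Text_IO.Get_Line(File => InData, Item => Buffer, Last => Last);
    Ada.Text_IO.Put(File => OutData, Item => Buffer(1..Last));
    IF Last < Buffer'Last THEN
      -- the buffer was not filled, so Get_Line reached and skipped <eol>
      Ada.Text_IO.New_Line(File => OutData);
    END IF;
  END LOOP;

  Ada.Text_IO.Close(File => InData);
  Ada.Text_IO.Close(File => OutData);

EXCEPTION

  WHEN Ada.Text_IO.Name_Error =>
    Ada.Text_IO.Put_Line
      (Item => "File test.dat doesn't exist in this directory!");

END Copy_Lines;
```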

A common source of error is forgetting to use a file name with End_of_Line or End_of_File. In this case, the system uses file Ada.Text_IO.Standard_Input. A similar error is forgetting to use a file name with Get or Put. Normally, no error diagnostic is displayed, because there is nothing illegal about this; the computer simply assumes the keyboard or screen is intended instead of the disk file. The cause of the incorrect behavior of the program is therefore not obvious.

Behaviors of the Various Get Operations in Ada.Text_IO

Learning to write input operations correctly is one of the most difficult tasks for a beginner in any programming language, including Ada. It is important to realize that Ada.Text_IO provides many different Get operations. We most frequently use four types: Get for a single character, Get and Get_Line for strings (as we used in Section 9.1), and Get for numeric and enumeration values. Each of these behaves slightly differently with respect to blanks and line terminators in a file (including Standard_Input). Here is a summary of their behaviors; we have used the "short form" for reading from the terminal, but the behavior is identical if a file is used.

Ada.Text_IO.Get(Item => C), where C is a Character variable: Any line terminators are skipped, and then the next character is read into C. A blank is a data character like any other and is not skipped.

Ada.Text_IO.Get(Item => S), where S is a String variable: Exactly S'Length characters are read into S. Line terminators are skipped; blanks are read as data.

Ada.Text_IO.Get_Line(Item => S, Last => L): Characters are read into S until either S is filled or the end of the line is reached, and L is set to the index of the last character read. If reading stops because the end of the line is reached, the line terminator is skipped; if reading stops because S is filled, the line terminator is not skipped.

Get for a numeric or enumeration value: Leading blanks and line terminators are skipped, and then characters are read as long as they can be part of a value of the required type. Reading stops at the first character that cannot be part of the value; that character remains available for the next input operation.

The numeric Get operation can cause trouble if you are not careful: Suppose that you are trying to read an integer value and accidentally type a few numeric digits followed by a letter or punctuation character. This last character will cause reading to stop but will remain available; the already-read numeric digits make up a valid integer literal, so the typing error will not be discovered until the next input operation, which will probably not expect that character and will raise Ada.Text_IO.Data_Error. Be careful!
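
One way to keep a stray character from disturbing later input is to discard the rest of the line after every numeric Get, and to do the same when Data_Error is raised. The following is only a sketch of that idea under our own assumptions; it is not the Robust_Input package of Section 6.7, and the program name Read_Integer_Safely is hypothetical.

```ada
WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
PROCEDURE Read_Integer_Safely IS
  -- Hypothetical example: keeps prompting until an integer is read,
  -- discarding the rest of the input line each time.
  Value : Integer;
BEGIN -- Read_Integer_Safely

  LOOP
    BEGIN
      Ada.Text_IO.Put(Item => "Enter an integer > ");
      Ada.Integer_Text_IO.Get(Item => Value);
      -- discard whatever follows the number on this line,
      -- including a mistyped letter or punctuation character
      Ada.Text_IO.Skip_Line;
      EXIT;  -- a value was read successfully
    EXCEPTION
      WHEN Ada.Text_IO.Data_Error =>
        -- the offending characters are still unread; skip them
        Ada.Text_IO.Skip_Line;
        Ada.Text_IO.Put_Line(Item => "Not an integer; please try again.");
    END;
  END LOOP;

  Ada.Text_IO.Put(Item => "You entered ");
  Ada.Integer_Text_IO.Put(Item => Value, Width => 1);
  Ada.Text_IO.New_Line;

END Read_Integer_Safely;
```

Note that this sketch silently accepts the digits typed before a stray character; a stricter program could test End_of_Line after the Get and ask for the value again if anything else remains on the line.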

Now is the time to take another look at the procedures in package Robust_Input (Section 6.7) to be certain you understand exactly how they work to prevent such a situation from arising.


Copyright © 1996 by Addison-Wesley Publishing Company, Inc.