
7.5 Data Structures: The Character Type

Character variables are declared using the data type Character. A character literal consists of a single printable character (letter, digit, punctuation mark, etc.) enclosed in single quotes. A character value may be assigned to a character variable or associated with a constant identifier as shown below.

    Star : CONSTANT Character := '*';
    NextLetter : Character;

BEGIN 
    NextLetter := 'A';

The character variable NextLetter is assigned the character value 'A' by the assignment statement above. A single character variable or literal can appear on the right-hand side of a character assignment statement. Character values can also be compared, read, and displayed.

Example 7.21

Program 7.9 reads a sentence ending in a period and counts the number of blanks in the sentence. Each character entered after the prompting message is read into the variable Next and tested to see if it is a blank.

The statement

    Ada.Text_IO.Get (Item => Next); 
appears once to prime the loop and a second time within the loop body. Because Next is of type Character, each call reads exactly one character from the data line. The WHILE loop is exited when the last character read is a period.

Program 7.9
Counting the Number of Blanks in a Sentence

WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
PROCEDURE Blank_Count IS
------------------------------------------------------------------------
--| Counts the number of blanks in a sentence.   
--| Author: Michael B. Feldman, The George Washington University 
--| Last Modified: July 1995                                     
------------------------------------------------------------------------

   Blank :    CONSTANT Character :=  ' ';  -- character being counted   
   Sentinel : CONSTANT Character :=  '.';   -- sentinel character   

   Next  : Character;             -- next character in sentence   
   Count : Natural;               -- number of blank characters   

BEGIN  -- Blank_Count   

  Count :=  0;                    -- Initialize Count   
  Ada.Text_IO.Put(Item => "Enter a sentence ending with a period.");
  Ada.Text_IO.New_Line;

  -- Process each input character up to the period   
  Ada.Text_IO.Get(Item => Next);      -- Get first character   
  Ada.Text_IO.Put(Item => Next);
  WHILE Next /= Sentinel LOOP
    -- invariant: Count is the count of blanks so far and
    --   no prior value of Next is the sentinel
      
    IF Next = Blank THEN
      Count := Count + 1;         -- Increment blank count   
    END IF;
    Ada.Text_IO.Get(Item => Next);    -- Get next character   
    Ada.Text_IO.Put(Item => Next);
  END LOOP;
  -- assert: Count is the count of blanks and Next is the sentinel   

  Ada.Text_IO.New_Line;
  Ada.Text_IO.Put(Item => "The number of blanks is ");
  Ada.Integer_Text_IO.Put(Item => Count, Width => 1);
  Ada.Text_IO.New_Line;

END Blank_Count;
Sample Run
Enter a sentence ending with a period.
The  q uick   brown fox   jumped over th  e lazy dogs.
The  q uick   brown fox   jumped over th  e lazy dogs.
The number of blanks is 16

Using Relational Operators with Characters

In Program 7.9, the Boolean expressions
    Next = Blank 
    Next /= Sentinel
are used to determine whether two character variables have the same or different values. Order comparisons can also be performed on character variables using the relational operators < , <= , > , and >= .

To understand the result of an order comparison, we must know something about the way characters are represented internally. Each character has its own unique numeric code; the binary form of this code is stored in a memory cell that has a character value. These binary numbers are compared by the relational operators in the normal way.

The Ada 95 standard uses the 256-character ISO 8859-1 (Latin-1) character set. This character set, an extension of the older 128-character ASCII set, includes the usual letters a-z and A-Z, along with a number of additional characters for the letters used in non-English languages. For example, French uses accented letters like é and à; German has letters using the umlaut such as ü; the Scandinavian languages have diphthongs such as æ; and so forth. For purposes of this book, we use just the 26 uppercase and lowercase letters of English; if you are in another country and wish to use the additional letters, you can find out locally how to do so on your computer or terminal.

The ordinary printable characters have codes from 32 (code for blank or space) to 126 (code for symbol ~); the additional European characters have codes from 160 to 255. The other codes represent nonprintable control characters. Sending a control character to an output device causes the device to perform a special operation such as returning the cursor to column 1, advancing the cursor to the next line, ringing a bell, and so on.
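
As a minimal sketch of how these codes drive the relational operators, the following program displays the positions of two letters and then compares them. The procedure name Show_Codes and its constants are illustrative and do not appear elsewhere in the text.

```ada
WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
PROCEDURE Show_Codes IS
  -- Illustrative sketch; the names here are not part of the original text.
  Upper : CONSTANT Character := 'A';
  Lower : CONSTANT Character := 'a';
BEGIN
  -- Display the numeric code (position) of each character
  Ada.Text_IO.Put(Item => "Position of A is ");
  Ada.Integer_Text_IO.Put(Item => Character'Pos(Upper), Width => 1);
  Ada.Text_IO.New_Line;
  Ada.Text_IO.Put(Item => "Position of a is ");
  Ada.Integer_Text_IO.Put(Item => Character'Pos(Lower), Width => 1);
  Ada.Text_IO.New_Line;

  -- The comparison follows the positions, so 'A' (65) precedes 'a' (97)
  IF Upper < Lower THEN
    Ada.Text_IO.Put(Item => "A is less than a");
  ELSE
    Ada.Text_IO.Put(Item => "A is not less than a");
  END IF;
  Ada.Text_IO.New_Line;
END Show_Codes;
```

With the Latin-1 codes, the program should display the positions 65 and 97 followed by the line "A is less than a", because every uppercase letter precedes every lowercase letter.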

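Some features of the "ordinary" part of the code are as follows: the digit characters '0' through '9' occupy consecutive positions 48 through 57; the uppercase letters 'A' through 'Z' occupy consecutive positions 65 through 90; and the lowercase letters 'a' through 'z' occupy consecutive positions 97 through 122.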

Example 7.22

Let us write a function specified by

    FUNCTION Cap (InChar : Character) RETURN Character;

If InChar is a lowercase letter, Cap(InChar) returns the corresponding uppercase letter; otherwise Cap(InChar) just returns InChar unchanged. The function body makes use of the Pos (position) and Val (value) attribute functions as well as the fact that all the uppercase letters are "together" in the type Character, as are all the lowercase letters. If InChar is lowercase, its position relative to 'a' is used to find the value of the corresponding uppercase letter. As an example, if InChar is 'g', its position relative to 'a' is 6 (remember, the positions start with 0). The corresponding uppercase value is the value at the same position relative to 'A', namely, 'G'.

FUNCTION Cap (InChar : Character) RETURN Character IS

    Temp : Character;

BEGIN

    IF InChar IN 'a' .. 'z' THEN
        -- offset of InChar from 'a', applied to the position of 'A'
        Temp := Character'Val(Character'Pos(InChar)
                  - Character'Pos('a') + Character'Pos('A'));
    ELSE
        Temp := InChar;
    END IF;

    RETURN Temp;

END Cap;
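
To try the function, a short test driver along the following lines could be used. This is only a sketch: the procedure name Test_Cap is hypothetical, and Cap is repeated inside it (in a slightly condensed form) so that the program is self-contained.

```ada
WITH Ada.Text_IO;
PROCEDURE Test_Cap IS
  -- Hypothetical test driver; not part of the original text.

  FUNCTION Cap (InChar : Character) RETURN Character IS
  BEGIN
    IF InChar IN 'a' .. 'z' THEN
      -- same offset from 'A' as InChar has from 'a'
      RETURN Character'Val(Character'Pos(InChar)
               - Character'Pos('a') + Character'Pos('A'));
    ELSE
      RETURN InChar;
    END IF;
  END Cap;

BEGIN -- Test_Cap
  Ada.Text_IO.Put(Item => Cap('g'));   -- lowercase letter: displays G
  Ada.Text_IO.Put(Item => Cap('G'));   -- already uppercase: displays G
  Ada.Text_IO.Put(Item => Cap('7'));   -- not a letter: displays 7
  Ada.Text_IO.New_Line;
END Test_Cap;
```

The three cases exercise both branches of the IF statement: one lowercase letter that is converted, and two characters that are returned unchanged.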

Example 7.23

When you enter or display a token of any kind, you are always entering or displaying sequences of characters, because these are the basic unit of information used by keyboards and display devices. A numeric token--for example, 1257--read by, say, Ada.Integer_Text_IO.Get, cannot be placed in an integer variable directly; the sequence of characters must first be converted to a number--in this case a binary integer. This conversion task is generally done by the input/output routines, often with the help of a system utility program. The important thing to realize is that there is always a program taking care of this.

You now have the background to learn how such a conversion program works. Let's consider the simple case of reading a positive integer as a sequence of individual characters instead of using Ada.Integer_Text_IO.Get. This enables the program to detect and ignore input errors. For example, if the program user enters a letter instead of a number, this error will be detected and the program will prompt again for a data value. Similarly, if the program user types in $15,400 instead of the number 15400, the extra characters will be ignored.

Program 7.10 is a procedure Get_Natural_Token, which reads in a string of characters ending with the sentinel (%) and ignores any character that is not a digit. It also computes the value of the number (of type Natural) formed by the digits only. For example, if the characters $15,43AB0% are entered, the value returned through NumData will be 15430.

Program 7.10
Reading a Token and Converting to Natural

PROCEDURE Get_Natural_Token (NumData : OUT Natural) IS
------------------------------------------------------------------------
--| Reads consecutive characters ending with the symbol %.  Computes
--| the integer value of the digit characters, ignoring nondigits.
--| Author: Michael B. Feldman, The George Washington University 
--| Last Modified: July 1995                                     
------------------------------------------------------------------------

  Base :     CONSTANT Positive := 10;   -- the number system base   
  Sentinel : CONSTANT Character := '%'; -- the sentinel character   

  Next :     Character;                 -- each character read   
  Digit :    Natural;                   -- value of numeric character 
                                        -- (its position relative to '0')
BEGIN -- Get_Natural_Token   

  -- Accumulate the numeric value of the digits in NumData   
  NumData := 0;                         -- initial value is zero   
  Ada.Text_IO.Get(Item => Next);        -- Read first character   
  WHILE Next /= Sentinel LOOP
    -- invariant:
    --   No prior value of Next is the sentinel and
    --   if Next is a digit, NumData is multiplied by Base and
    --   Next's digit value is added to NumData
      
    IF (Next >= '0') AND (Next <= '9') THEN
      -- Process digit   
      Digit := Character'Pos(Next) - Character'Pos('0');
      NumData := Base * NumData + Digit; -- Add digit value   
    END IF;
    Ada.Text_IO.Get(Item => Next);       -- Read next character   
  END  LOOP;
  -- assert:
  --   Next is the sentinel and
  --   NumData is the number in base Base formed from the digit
  --   characters read as data

END Get_Natural_Token;
In Program 7.10, the statements
     Digit := Character'Pos(Next) - Character'Pos('0');  -- Get digit value
     NumData := Base * NumData + Digit;                  -- Add digit value
assign to Digit an integer value between 0 (for character value '0') and 9 (for character value '9'). The number being accumulated in NumData is multiplied by 10, and the value of Digit is added to it. Table 7.6 traces the procedure execution for the input characters 3N5%; the value returned is 35.

Table 7.6
Trace of Procedure Get_Natural_Token for Data Characters 3N5%

Statement                       Next   Digit   NumData   Effect of Statement
                                ?      ?       ?
NumData := 0;                                  0         Initialize NumData
Ada.Text_IO.Get(Item=>Next);    '3'                      Get character
WHILE Next/=Sentinel LOOP       '3'                      '3' /= '%' is True
IF Next>='0' AND Next<='9'      '3'                      '3' is a digit
Digit:=Character'Pos(Next)             3                 Digit value is 3
     - Character'Pos('0');
NumData:=Base*NumData+Digit;           3       3         Add 3 to 0
Ada.Text_IO.Get(Item=>Next);    'N'                      Get character
WHILE Next/=Sentinel LOOP       'N'                      'N' /= '%' is True
IF Next>='0' AND Next<='9'      'N'                      'N' is not a digit
Ada.Text_IO.Get(Item=>Next);    '5'                      Get character
WHILE Next/=Sentinel LOOP       '5'                      '5' /= '%' is True
IF Next>='0' AND Next<='9'      '5'                      '5' is a digit
Digit:=Character'Pos(Next)             5                 Digit value is 5
     - Character'Pos('0');
NumData:=Base*NumData+Digit;           5       35        Add 5 to 30
Ada.Text_IO.Get(Item=>Next);    '%'                      Get character
WHILE Next/=Sentinel LOOP       '%'                      '%' /= '%' is False
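
The same accumulation can be tried without typing any input. The sketch below is an assumption-laden variant, not the procedure from Program 7.10: it takes its characters from a string constant (the sample data from the text with the sentinel left off) instead of reading them with Get, and the names Show_Digit_Value and Data are hypothetical.

```ada
WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
PROCEDURE Show_Digit_Value IS
  -- Illustrative sketch; processes a fixed string rather than keyboard input.
  Base    : CONSTANT Positive := 10;          -- the number system base
  Data    : CONSTANT String := "$15,43AB0";   -- sample data, sentinel omitted
  NumData : Natural := 0;                     -- value accumulated so far
BEGIN
  FOR Index IN Data'Range LOOP
    IF Data(Index) IN '0' .. '9' THEN
      -- shift the accumulated value left one decimal place,
      -- then add the value of the new digit
      NumData := Base * NumData
                 + (Character'Pos(Data(Index)) - Character'Pos('0'));
    END IF;
  END LOOP;

  Ada.Integer_Text_IO.Put(Item => NumData, Width => 1);  -- displays 15430
  Ada.Text_IO.New_Line;
END Show_Digit_Value;
```

The nondigit characters $, the comma, A, and B are skipped, so only the digits 1, 5, 4, 3, and 0 contribute to the result.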

Representing Control Characters

The character set includes a number of "nonprintable" characters which are used for controlling input and output devices. These control characters cannot be represented in programs in the usual way (i.e., by enclosing them in quotes). A control character can be specified in Ada using its position in the Character type (see Appendix C). For example, Character'Val(10) is the line feed character, and Character'Val(7) is the bell character. The statements

    Ada.Text_IO.Put(Item => Character'Val(10));
    Ada.Text_IO.Put(Item => Character'Val(7));
    Ada.Text_IO.Put(Item => Character'Val(7));
will cause the output device to perform a line feed and then ring its bell twice.

Ada also has a more intuitive way of representing the control characters. These characters are all given names by declaring them as character constants in a predefined package Ada.Characters.Latin_1. The statements

    Ada.Text_IO.Put(Item => Ada.Characters.Latin_1.LF);
    Ada.Text_IO.Put(Item => Ada.Characters.Latin_1.Bel);
    Ada.Text_IO.Put(Item => Ada.Characters.Latin_1.Bel);
give the same effect as the statements above, but use the names of the characters instead of their numerical values. A program that uses the Ada.Characters.Latin_1 package must of course be preceded by a context clause
    WITH Ada.Characters.Latin_1;
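
As a quick check that the names and the numeric positions refer to the same characters, a sketch such as the following displays the positions of three of the named constants. The procedure name Show_Control_Positions is hypothetical.

```ada
WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
WITH Ada.Characters.Latin_1;
PROCEDURE Show_Control_Positions IS
  -- Illustrative sketch; displays the positions of some named
  -- control characters instead of sending them to the terminal.
BEGIN
  Ada.Text_IO.Put(Item => "LF  is at position ");
  Ada.Integer_Text_IO.Put
    (Item => Character'Pos(Ada.Characters.Latin_1.LF), Width => 1);
  Ada.Text_IO.New_Line;

  Ada.Text_IO.Put(Item => "BEL is at position ");
  Ada.Integer_Text_IO.Put
    (Item => Character'Pos(Ada.Characters.Latin_1.BEL), Width => 1);
  Ada.Text_IO.New_Line;

  Ada.Text_IO.Put(Item => "ESC is at position ");
  Ada.Integer_Text_IO.Put
    (Item => Character'Pos(Ada.Characters.Latin_1.ESC), Width => 1);
  Ada.Text_IO.New_Line;
END Show_Control_Positions;
```

The positions displayed should be 10, 7, and 27, respectively. Because Ada is not case sensitive, Bel and BEL name the same constant.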

Example 7.24

A collating sequence is a sequence of characters arranged in the order in which they appear in the Latin-1 character set. The Character type is really an enumeration type; each character's position in this type corresponds to its Latin-1 value. Given the declarations

    MinPos : CONSTANT Positive := 32; 
    MaxPos : CONSTANT Positive := 90;
the loop
    FOR NextPos IN MinPos .. MaxPos LOOP
      Ada.Text_IO.Put(Item => Character'Val(NextPos));
    END LOOP;
displays part of the Ada collating sequence. It lists the characters with values 32 through 90, inclusive. The first character--in position 32--is a blank, as follows:

 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ

Example 7.26

In Section 3.7 we introduced the package Screen (Program 3.8 and Program 3.9), which we have used several times since. In Section 3.7 we advised you not to worry about the details of the package body; now, having studied the Character type systematically, you are ready to understand those details. In Program 3.9, the procedure Beep contains a statement

    Ada.Text_IO.Put (Item => Ada.Characters.Latin_1.BEL);
which sends the bell character to the terminal. Instead of displaying this character, the terminal will beep. Procedure ClearScreen contains the statements
    Ada.Text_IO.Put (Item => Ada.Characters.Latin_1.ESC);
    Ada.Text_IO.Put (Item => "[2J");
which send four characters to the terminal. According to standard American National Standards Institute (ANSI) terminal control commands, this sequence will cause the screen to be erased. Finally, the procedure MoveCursor contains these lines:
    Ada.Text_IO.Put (Item => Ada.Characters.Latin_1.ESC);
    Ada.Text_IO.Put (Item => "[" );
    Ada.Integer_Text_IO.Put (Item => Row, Width => 1);
    Ada.Text_IO.Put (Item => ';');
    Ada.Integer_Text_IO.Put (Item => Column, Width => 1);
    Ada.Text_IO.Put (Item => 'f');

The sequence of characters sent to the terminal by these statements will cause the cursor to be moved to the given row/column position. Suppose Row is 15. Under these circumstances, sending the integer value Row does not cause the terminal to display the characters 15; rather, because these characters are sent in the middle of a control command (preceded by Ada.Characters.Latin_1.ESC and [), the terminal obeys the command and moves the cursor to row 15. The command must end with 'f'. It may seem strange to you, but that is what the ANSI terminal control standard specifies. As you saw in the examples using the screen package, these commands really do cause the terminal to carry out the desired actions.
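
To see exactly which characters make up the cursor command, the sketch below displays a visible version of the sequence. It is a hypothetical procedure, not part of the Screen package: the literal text ESC stands in for the nonprintable escape character, Row is 15 as in the discussion above, and the value of Column is an assumed example.

```ada
WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
PROCEDURE Show_Cursor_Command IS
  -- Illustrative sketch: Row is from the text; Column is an assumed value.
  Row    : CONSTANT Positive := 15;
  Column : CONSTANT Positive := 40;
BEGIN
  Ada.Text_IO.Put(Item => "ESC");   -- visible stand-in for Latin_1.ESC
  Ada.Text_IO.Put(Item => "[");
  Ada.Integer_Text_IO.Put(Item => Row, Width => 1);
  Ada.Text_IO.Put(Item => ';');
  Ada.Integer_Text_IO.Put(Item => Column, Width => 1);
  Ada.Text_IO.Put(Item => 'f');
  Ada.Text_IO.New_Line;
END Show_Cursor_Command;
```

The program displays ESC[15;40f. The Width => 1 parameter matters here: with a larger width, Put would pad the numbers with leading blanks, and those blanks would become part of the command sent to the terminal.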

The Package Ada.Characters.Handling

Ada provides a package Ada.Characters.Handling [1], which contains a set of functions to do useful operations on characters. Figure 7.8 gives a partial specification for this package.

Figure 7.8
Partial Specification of Ada.Characters.Handling

PACKAGE Ada.Characters.Handling IS

  -- Character classification functions

  FUNCTION Is_Control           (Item : IN Character) RETURN Boolean;
  FUNCTION Is_Graphic           (Item : IN Character) RETURN Boolean;
  FUNCTION Is_Letter            (Item : IN Character) RETURN Boolean;
  FUNCTION Is_Lower             (Item : IN Character) RETURN Boolean;
  FUNCTION Is_Upper             (Item : IN Character) RETURN Boolean;
  FUNCTION Is_Digit             (Item : IN Character) RETURN Boolean;
  FUNCTION Is_Alphanumeric      (Item : IN Character) RETURN Boolean;
  FUNCTION Is_Special           (Item : IN Character) RETURN Boolean;

 -- Conversion functions for Character and String

  FUNCTION To_Lower (Item : IN Character) RETURN Character;
  FUNCTION To_Upper (Item : IN Character) RETURN Character;

  FUNCTION To_Lower (Item : IN String) RETURN String;
  FUNCTION To_Upper (Item : IN String) RETURN String;

  . . . 

END Ada.Characters.Handling;
Most of these functions are self-explanatory: Is_Letter, Is_Lower, Is_Upper, and Is_Digit return True if their input character is in the given category; Is_Alphanumeric returns True if the character is a letter or a digit; Is_Control returns True if the input character has position 0..31 or 127..159; Is_Graphic returns True if the input character has position 32..126 or 160..255; Is_Special returns True if the character is a graphic character that is not alphanumeric.

To_Upper, like the function Cap we wrote in Example 7.22, returns an uppercase letter; To_Lower produces a lowercase letter. There are corresponding functions for strings: the second To_Upper converts all letters in the string to uppercase, and the second To_Lower converts them all to lowercase.
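
A brief sketch of how these functions might be used follows. The procedure name Show_Handling, the sample string, and the counter names are illustrative assumptions; only the functions from Ada.Characters.Handling come from the package itself.

```ada
WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
WITH Ada.Characters.Handling;
PROCEDURE Show_Handling IS
  -- Illustrative sketch; classifies the characters of a sample string.
  Sample      : CONSTANT String := "Ada 95!";
  LetterCount : Natural := 0;    -- number of letters found
  DigitCount  : Natural := 0;    -- number of digits found
BEGIN
  FOR Index IN Sample'Range LOOP
    IF Ada.Characters.Handling.Is_Letter(Item => Sample(Index)) THEN
      LetterCount := LetterCount + 1;
    ELSIF Ada.Characters.Handling.Is_Digit(Item => Sample(Index)) THEN
      DigitCount := DigitCount + 1;
    END IF;
  END LOOP;

  -- The String version of To_Upper converts the whole sample at once
  Ada.Text_IO.Put(Item => Ada.Characters.Handling.To_Upper(Item => Sample));
  Ada.Text_IO.New_Line;

  Ada.Text_IO.Put(Item => "Letters: ");
  Ada.Integer_Text_IO.Put(Item => LetterCount, Width => 1);
  Ada.Text_IO.Put(Item => "  Digits: ");
  Ada.Integer_Text_IO.Put(Item => DigitCount, Width => 1);
  Ada.Text_IO.New_Line;
END Show_Handling;
```

For the sample string, the program displays ADA 95! and reports 3 letters and 2 digits; the blank and the exclamation point fall into neither category.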

Exercises for Section 7.5

Self-Check

  1. Evaluate the following:
    a. Boolean'Pos(True)
    b. Boolean'Pred(True)
    c. Boolean'Succ(False)
    d. Boolean'Pos(True) - Boolean'Pos(False)

  2. Evaluate the following; assume the letters are consecutive characters.
    a. Character'Pos('D') - Character'Pos('A')
    b. Character'Pos('d') - Character'Pos('a')
    c. Character'Succ(Character'Pred('a'))
    d. Character'Val(Character'Pos('C'))
    e. Character'Val(Character'Pos('C') - Character'Pos('A') + Character'Pos('a'))
    f. Character'Pos('7') - Character'Pos('6')
    g. Character'Pos('9') - Character'Pos('0')
    h. Character'Succ(Character'Succ(Character'Succ('d')))
    i. Character'Val(Character'Pos('A') + 5)
    


[1] This package is new to Ada 95.



Copyright © 1996 by Addison-Wesley Publishing Company, Inc.