Rationale for Ada 2005
7.5 Characters and strings
An important improvement in Ada 2005 is the ability
to deal with 16- and 32-bit characters both in the program text and in
the executing program.
The fine detail of the changes to the program text
is perhaps for the language lawyer. The purpose is to permit the use
of all relevant characters of the entire ISO/IEC 10646:2003 repertoire.
The most important effect is that we can write programs using Cyrillic,
Greek and other character sets.
A good example is provided
by the addition of the constant
π : constant := Pi;
to the package Ada.Numerics.
This enables us to write mathematical programs in a more natural notation
thus
Circumference: Float := 2.0 * π * Radius;
Other examples might
be for describing polar coordinates thus
R: Float := Sqrt(X*X + Y*Y);
θ: Angle := Arctan(Y, X);
and of course in France
we can now declare a decent set of ingredients for breakfast
type Breakfast_Stuff is (Croissant, Café, Œuf, Beurre);
Curiously, although
the ligature æ is in Latin-1 and thus
available in Ada 95 in identifiers, the ligature œ
is not (for reasons we need not go into). However, in Ada 95, œ
is a character of the type Wide_Character
and so even in Ada 95 one can order breakfast thus
Put("Deux œufs easy-over avec jambon"); -- wide string
In order to manipulate 32-bit characters, Ada 2005 includes the types
Wide_Wide_Character and Wide_Wide_String in the package Standard
and the appropriate operations to manipulate them in packages such as
Ada.Strings.Wide_Wide_Bounded
Ada.Strings.Wide_Wide_Fixed
Ada.Strings.Wide_Wide_Maps
Ada.Strings.Wide_Wide_Maps.Wide_Wide_Constants
Ada.Strings.Wide_Wide_Unbounded
Ada.Wide_Wide_Text_IO
Ada.Wide_Wide_Text_IO.Text_Streams
Ada.Wide_Wide_Text_IO.Complex_IO
Ada.Wide_Wide_Text_IO.Editing
There are also new attributes Wide_Wide_Image,
Wide_Wide_Value and Wide_Wide_Width
and so on.
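As a small illustration of the new types and attributes, the following sketch builds a Wide_Wide_String and writes it out. The procedure name and the choice of code point are ours rather than anything in the language definition, and how the characters actually appear on the output depends on the encoding the implementation uses for the file.

```ada
with Ada.Wide_Wide_Text_IO;

procedure Show_Wide_Wide is
   -- Illustrative choice: code point 16#1D11E# (musical G clef) lies
   -- outside the 16-bit range and so needs Wide_Wide_Character.
   Clef : constant Wide_Wide_Character :=
      Wide_Wide_Character'Val (16#1D11E#);
   Text : constant Wide_Wide_String := "Clef: " & Clef;
begin
   Ada.Wide_Wide_Text_IO.Put_Line (Text);
   -- The new attribute returns the image as a Wide_Wide_String.
   Ada.Wide_Wide_Text_IO.Put_Line (Integer'Wide_Wide_Image (Text'Length));
end Show_Wide_Wide;
```

Writing the character as a code point keeps the source text itself in plain ASCII, which avoids any question of how the compiler expects the source to be encoded.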
The addition of wide-wide characters and strings
introduces many additional possibilities for conversions. Just adding
these directly to the existing package Ada.Characters.Handling
could cause ambiguities in existing programs when using literals. So
a new package Ada.Characters.Conversions
has been added. This contains conversions in all combinations between
Character, Wide_Character and Wide_Wide_Character, and similarly for
strings. The existing functions from Is_Character to To_Wide_String in
Ada.Characters.Handling have been banished to Annex J.
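The conversions might be used as in the following sketch, which widens a String, appends a character outside Latin-1 and then narrows the result again. The procedure and object names are illustrative only.

```ada
with Ada.Characters.Conversions; use Ada.Characters.Conversions;
with Ada.Text_IO;

procedure Convert_Demo is
   Narrow : constant String := "theta = ";
   -- Greek small letter theta, written as a code point to keep the
   -- source text in plain ASCII.
   Theta  : constant Wide_Wide_Character :=
      Wide_Wide_Character'Val (16#03B8#);
   Mixed  : constant Wide_Wide_String :=
      To_Wide_Wide_String (Narrow) & Theta;
begin
   -- Is_String reports whether every character fits in type Character.
   Ada.Text_IO.Put_Line (Boolean'Image (Is_String (Mixed)));
   -- Characters that do not fit are replaced by Substitute.
   Ada.Text_IO.Put_Line (To_String (Mixed, Substitute => '?'));
end Convert_Demo;
```

Since theta is not in Latin-1, Is_String should return False here and the narrowed string should end with the substitute character.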
The introduction of more complex writing systems
makes the definition of the case insensitivity of identifiers (the equivalence
between upper and lower case) much more complicated.
In some systems, such
as the ideographic system used by Chinese, Japanese and Korean, there
is only one case, so things are easy. But in other systems, like the
Latin, Greek and Cyrillic alphabets, upper and lower case characters
have to be considered. Their equivalence is usually straightforward but
there are some interesting exceptions such as
- Greek has two forms for lower case
sigma (the normal form σ and the final form ς which is used
at the end of a word). These both convert to the one upper case letter
Σ.
- German has the lower case letter ß
whose upper case form is made of two letters, namely SS.
- Slovenian has a grapheme LJ which
is considered a single letter and has three forms: LJ, Lj and lj.
The Greek situation used to apply in English where
the long s was used in the middle of words (where it looked like an f
but without a cross stroke) and the familiar short s only at the end.
To modern eyes this makes poetic lines such as "Where the bee sucks,
there suck I" somewhat dubious. (This is sung by Ariel in Act V
Scene I of The Tempest by William Shakespeare.)
The definition chosen for Ada 2005 closely follows
those provided by ISO/IEC 10646:2003 and by the Unicode Consortium; this
hopefully means that all users should find that the case insensitivity
of identifiers works as expected in their own language.
Of interest to all users whatever their language
is the addition of a few more subprograms in the string handling packages.
As explained in the Introduction, Ada 95 requires rather too many conversions
between bounded and unbounded strings and the raw type String
and, moreover, multiple searching is inconvenient.
The additional subprograms
in the packages are as follows.
In the package Ada.Strings.Fixed
(assuming use Maps; for brevity)
function Index(
Source: String; Pattern: String;
From: Positive; Going: Direction := Forward;
Mapping: Character_Mapping := Identity) return Natural;
function Index(
Source: String; Pattern: String;
From: Positive; Going: Direction := Forward;
Mapping: Character_Mapping_Function) return Natural;
function Index(
Source: String; Set: Character_Set;
From: Positive; Test: Membership := Inside;
Going: Direction := Forward) return Natural;
function Index_Non_Blank(
Source: String;
From: Positive; Going: Direction := Forward) return Natural;
The difference between these and the existing functions
is that these have an additional parameter From.
This makes it much easier to search for all the occurrences of some pattern
in a string.
Similar functions are also added to the packages
Ada.Strings.Bounded and Ada.Strings.Unbounded.
Thus suppose we want
to find all the occurrences of "bar"
in the string "barbara barnes" held
in the variable BS of type Bounded_String.
(I have put my wife into lower case for convenience.) There are 3 of
course. The existing function Count can be
used to determine this fact quite easily
N := Count(BS, "bar"); -- is 3
But we really need
to know where they are; we want the corresponding index values. The first
is easy in Ada 95
I := Index(BS, "bar"); -- is 1
But to find the next
one in Ada 95 we have to do something such as take a slice by removing
the first three characters and then search again. This would destroy
the original string so we need to make a copy of at least part of it
thus
Part := Delete(BS, I, I+2); -- 2 is length "bar" – 1
I := Index(Part, "bar") + 3; -- is 4
and so on in the not-so-obvious
loop. (There are other ways, such as making a complete copy first; this
could either be in another bounded string or perhaps it is simplest just
to copy it into a normal String first. But
whatever we do it is messy.) In Ada 2005, having found the index of the
first in I, we can find the second by writing
I := Index(BS, "bar", From => I+3);
and so on. This is clearly much easier.
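The complete loop might then be sketched as follows. The instantiation with a maximum length of 20 and the names B20 and Find_All_Bars are assumptions made for the example. Note the guard before each further search: Index propagates Index_Error if From lies outside the string, so we must not step past the end.

```ada
with Ada.Strings.Bounded;
with Ada.Text_IO;

procedure Find_All_Bars is
   -- Assumed instantiation; any Max of at least 14 would do here.
   package B20 is new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 20);
   use B20;

   Pattern : constant String := "bar";
   BS      : constant Bounded_String := To_Bounded_String ("barbara barnes");
   I       : Natural := Index (BS, Pattern);  -- Ada 95 form finds the first
begin
   while I /= 0 loop
      Ada.Text_IO.Put_Line (Natural'Image (I));
      -- Stop if the next starting point would be beyond the string.
      exit when I + Pattern'Length > Length (BS);
      -- Ada 2005 form with From finds the next non-overlapping match.
      I := Index (BS, Pattern, From => I + Pattern'Length);
   end loop;
end Find_All_Bars;
```

For the string above this reports the index values 1, 4 and 9, and the original string is never copied or altered.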
The following are also
added to Ada.Strings.Bounded
procedure Set_Bounded_String(
Target: out Bounded_String;
Source: in String; Drop: in Truncation := Error);
function Bounded_Slice(
Source: Bounded_String;
Low: Positive; High: Natural) return Bounded_String;
procedure Bounded_Slice(
Source: in Bounded_String;
Target: out Bounded_String;
Low: in Positive; High: in Natural);
The procedure Set_Bounded_String
is similar to the existing function To_Bounded_String.
Thus rather than
BS := To_Bounded_String("A Bounded String");
we can equally write
Set_Bounded_String(BS, "A Bounded String");
The slice subprograms
avoid conversion to and from the type String.
Thus to extract the characters from 3 to 9 we can write
BS := Bounded_Slice(BS, 3, 9); -- "Bounded"
whereas in Ada 95 we
have to write something like
BS := To_Bounded_String(Slice(BS, 3, 9));
Similar subprograms are added to Ada.Strings.Unbounded.
These are even more valuable because unbounded strings are typically
implemented with controlled types and the use of a procedure such as
Set_Unbounded_String is much more efficient
than the function To_Unbounded_String because
it avoids assignment and thus calls of Adjust.
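A sketch of the corresponding subprograms for unbounded strings follows; the procedure and object names are ours. Both calls are of the procedure forms, so the results are delivered through out parameters rather than by assignment of a function result.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Slice_Demo is
   US, Part : Unbounded_String;
begin
   -- Procedure form: sets US without assigning a function result.
   Set_Unbounded_String (US, "An Unbounded String");
   -- Procedure form of the slice: characters 4 to 12 go into Part.
   Unbounded_Slice (Source => US, Target => Part, Low => 4, High => 12);
   Ada.Text_IO.Put_Line (To_String (Part));  -- "Unbounded"
end Slice_Demo;
```

Whether the saving is noticeable in practice depends on how the implementation represents unbounded strings, so this should be taken as a pattern rather than a guaranteed optimization.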
Input and output of bounded and unbounded strings in Ada 95 can only
be done by converting to or from the type String. This is both slow
and untidy. This problem is particularly acute with unbounded strings
and so Ada 2005 provides the following additional package (we have added
a use clause for brevity as usual)
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
package Ada.Text_IO.Unbounded_IO is
procedure Put(File: in File_Type; Item: in Unbounded_String);
procedure Put(Item: in Unbounded_String);
procedure Put_Line(File: in File_Type; Item: in Unbounded_String);
procedure Put_Line(Item: in Unbounded_String);
function Get_Line(File: File_Type) return Unbounded_String;
function Get_Line return Unbounded_String;
procedure Get_Line(File: in File_Type; Item: out Unbounded_String);
procedure Get_Line(Item: out Unbounded_String);
end Ada.Text_IO.Unbounded_IO;
The behaviour is as expected.
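For instance, a program that copies standard input to standard output line by line, whatever the length of the lines, might be sketched thus (the procedure name Echo is just for illustration).

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;
with Ada.Text_IO.Unbounded_IO;

procedure Echo is
   Line : Unbounded_String;
begin
   while not Ada.Text_IO.End_Of_File loop
      -- No buffer length to choose and no conversion via String.
      Ada.Text_IO.Unbounded_IO.Get_Line (Line);
      Ada.Text_IO.Unbounded_IO.Put_Line (Line);
   end loop;
end Echo;
```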
There is a similar package for bounded strings but
it is generic. It has to be generic because the package
Generic_Bounded_Length within Strings.Bounded is itself generic and
has to be instantiated with the maximum string size. So the specification
is
with Ada.Strings.Bounded; use Ada.Strings.Bounded;
generic
with package Bounded is new Generic_Bounded_Length(<>);
use Bounded;
package Ada.Text_IO.Bounded_IO is
procedure Put(File: in File_Type; Item: in Bounded_String);
procedure Put(Item: in Bounded_String);
... -- etc as for Unbounded_IO
end Ada.Text_IO.Bounded_IO;
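Using the package thus takes two instantiations, one for the strings and one for their input-output. The following is a minimal sketch; the names Str80 and Str80_IO and the maximum length of 80 are assumptions chosen for the example.

```ada
with Ada.Strings.Bounded;
with Ada.Text_IO.Bounded_IO;

procedure Bounded_Demo is
   -- First fix the maximum length ...
   package Str80 is
      new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 80);
   -- ... then instantiate the IO package with that instance.
   package Str80_IO is new Ada.Text_IO.Bounded_IO (Bounded => Str80);

   Card : Str80.Bounded_String;
begin
   Str80.Set_Bounded_String (Card, "A Bounded String");
   Str80_IO.Put_Line (Card);
end Bounded_Demo;
```

The formal package parameter ensures that the Bounded_String handled by Str80_IO is exactly the type declared in Str80.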
It will be noticed that these packages include functions
Get_Line as well as procedures Put_Line
and Get_Line corresponding to those in Text_IO.
The reason is that procedures Get_Line are
not entirely satisfactory.
If we do successive calls of the procedure Text_IO.Get_Line
using a string of length 80 on a series of lines of length 80 (we are
reading a nice old deck of punched cards), then it does not work as expected.
Alternate calls return a line of characters and a null string (the history
of this behaviour goes back to early Ada 83 days and is best left dormant).
Ada 2005 accordingly
adds corresponding functions Get_Line to the
package Ada.Text_IO itself thus
function Get_Line(File: File_Type) return String;
function Get_Line return String;
Successive calls of a function Get_Line
then neatly return the text on the cards one by one without bother.
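A reading loop using the function form might be sketched as follows (the procedure name Read_Cards is ours). Because the result is used to initialize a constant in an inner block, each line gets a string of exactly the right length and no buffer size need be guessed.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Read_Cards is
begin
   while not End_Of_File loop
      declare
         -- The bounds of Card are taken from the line actually read.
         Card : constant String := Get_Line;
      begin
         Put_Line (Card);
      end;
   end loop;
end Read_Cards;
```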
© 2005, 2006 John Barnes Informatics.