Rationale for Ada 2005
8.3 Maps
We will now turn to the maps and sets packages. We
will start by considering maps which are more exciting than sets and
begin with ordered maps which are a little simpler and then consider
hashed maps.
Remember that a map
is just a means of getting from a value of one type (the key) to another
type (the element). This is not a one-one relationship. Given a key there
is a unique element (if any), but several keys may correspond to the
same element. A simple example is an array. This is a map from the index
type to the component type. Thus if we have
S: String := "animal";
then this provides a map from integers in the range
1 to 6 to some values of the type Character.
Given an integer such as 3 there is a unique
character 'i' but given a character such as
'a' there might be several corresponding integers
(in this case both 1 and 5).
More interesting examples are where the set of used
key values is quite sparse. For example we might have a store where various
spare parts are held. The parts have a five-digit part number and there
are perhaps twenty racks where they are held identified by a letter.
However, only a handful of the five digit numbers are in use so it would
be very wasteful to use an array with the part number as index. What
we want instead is a container which holds just the pairs that matter
such as (34618, 'F'), (27134, 'C') and so on. We can do this using a
map. We usually refer to the pairs of values as nodes of the map.
There are two maps
packages with much in common. One keeps the keys in order and the other
uses a hash function. Here is the specification of the ordered maps package
generally showing just those facilities common to both.
generic
type Key_Type is private;
type Element_Type is private;
with function "<" (Left, Right: Key_Type) return Boolean is <>;
with function "=" (Left, Right: Element_Type) return Boolean is <>;
package Ada.Containers.Ordered_Maps is
pragma Preelaborate(Ordered_Maps);
function Equivalent_Keys(Left: Right: Key_Type) return Boolean;
The generic parameters include the ordering relationship
"<" on the keys and equality
for the elements.
It is assumed that the ordering relationship is well
behaved in the sense that if x < y is true
then y < x is false. We say that two keys
x and y are equivalent
if both x < y and y
< x are false. In other words this defines an equivalence class
on keys. The relationship must also be transitive, that is, if x
< y and y < z are both true then
x < z must also be true.
This concept of an equivalence relationship occurs
throughout the various maps and sets. Sometimes, as here, it is defined
in terms of an order but in other cases, as we shall see, it is defined
by an equivalence function.
It is absolutely vital that the equivalence relations
are defined properly and meet the above requirements. It is not possible
for the container packages to check this and if the operations are wrong
then peculiar behaviour is almost inevitable.
For the convenience
of the user the function Equivalent_Keys is
declared explicitly. It is equivalent to
function Equivalent_Keys(Left, Right: Key_Type) return Boolean is
begin
return not (Left < Right) and not (Right < Left);
end Equivalent_Keys;
The equality operation on elements is not so demanding.
It must be symmetric so that x = y and y
= x are the same but transitivity is not required (although cases
where it would not automatically be transitive are likely to be rare).
The operation is only used for the function "="
on the containers as a whole.
Note that Find and similar
operations for maps and sets work in terms of the equivalence relationship
rather than equality as was the case with lists and vectors.
type Map is tagged private;
pragma Preelaborable_Initialization(Map);
type Cursor is private;
pragma Preelaborable_Initialization(Cursor);
Empty_Map: constant Map;
No_Element: constant Cursor;
The types Map and Cursor
and constants Empty_Map and No_Element
are similar to the corresponding entities in the lists and vectors containers.
function "=" (Left, Right: Map) return Boolean;
function Length(Container: Map) return Count_Type;
function Is_Empty(Container: Map) return Boolean;
procedure Clear(Container: in out Map);
These are again similar to the corresponding entities
for lists. Note that two maps are said to be equal if they have the same
number of nodes with equivalent keys (as defined by "<")
whose corresponding elements are equal (as defined by "=").
function Key(Position: Cursor) return Key_Type;
function Element(Position: Cursor) return Element_Type;
procedure Replace_Element(
Container: in out Map;
Position: in Cursor;
New_Item: in Element_Type);
procedure Query_Element(
Position: in Cursor;
Process: not null access procedure (Key: in Key_Type; Element: in Element_Type));
procedure Update_Element(
Container: in out Map; Position: in Cursor;
Process: not null access procedure (Key: in Key_Type; Element: in out Element_Type));
In this case there is a function Key
as well as a function Element. But there is
no procedure Replace_Key since it would not
make sense to change a key without changing the element as well and this
really comes down to deleting the whole node and then inserting a new
one.
The procedures Query_Element
and Update_Element are slightly different
in that the procedure Process also takes the
key as parameter as well as the element to be read or updated. Note again
that the key cannot be changed. Nevertheless the value of the key is
given since it might be useful in deciding how the update should be performed.
Remember that we cannot get uniquely from an element to a key but only
from a key to an element.
procedure Move(Target, Source: in out Map);
This moves the map from the source to the target
after first clearing the target. It does not make copies of the nodes
so that after the operation the source is empty and Length(Source)
is zero.
procedure Insert(
Container: in out Map;
Key: in Key_Type;
New_Item: in Element_Type;
Position: out Cursor;
Inserted: out Boolean);
procedure Insert(
Container: in out Map;
Key: in Key_Type;
Position: out Cursor;
Inserted: out Boolean);
procedure Insert(
Container: in out Map;
Key: in Key_Type;
New_Item: in Element_Type);
These insert a new node into the map unless a node
with an equivalent key already exists. If it does exist then the first
two return with Inserted set to False
and Position indicating the node whereas the
third raises Constraint_Error (the element
value is not changed). If a node with equivalent key is not found then
a new node is created with the given key, the element value is set to
New_Item when that is given and otherwise
it takes its default value (if any), and Position
is set when given.
Unlike vectors and lists, we do not have to say where
the new node is to be inserted because of course this is an ordered map
and it just goes in the correct place according to the order given by
the generic parameter "<".
procedure Include(
Container: in out Map;
Key: in Key_Type;
New_Item: in Element_Type);
This is somewhat like the last Insert
except that if an existing node with an equivalent key is found then
it is replaced (rather than raising Constraint_Error).
Note that both the key and the element are updated. This is because equivalent
keys might not be totally equal.
For example the key
part might be a record with part number and year of introduction, thus
type Part_Key is
record
Part_Number: Integer;
Year: Integer;
end record;
and we might define
the ordering relationship to be used as the generic parameter simply
in terms of the part number
function "<" (Left, Right: Part_Key) return Boolean is
begin
return Left.Part_Number < Right.Part_Number;
end "<";
In this situation, the keys could match without the
year component being the same and so it would need to be updated. In
other words with this definition of the ordering relation, two keys are
equivalent provided just the part numbers are the same.
procedure Replace(
Container: in out Map;
Key: in Key_Type;
New_Item: in Element_Type);
In this case, Constraint_Error
is raised if the node does not already exist. On replacement both the
key and the element are updated as for Include.
Perhaps a better example of equivalent keys not being
totally equal is if the key were a string. We might decide that the case
of letter did not need to match in the test for equivalence but nevertheless
we would probably want to update with the string as used in the parameter
of Replace.
procedure Exclude(Container: in out Map; Key: in Key_Type);
If there is a node with an equivalent key then it
is deleted. If there is not then nothing happens.
procedure Delete(Container: in out Map; Key: in Key_Type);
procedure Delete(Container: in out Map; Position: in out Cursor);
These delete a node. In the first case if there is
no such equivalent key then Constraint_Error
is raised (by contrast to Exclude which remains
silent in this case). In the second case if the cursor is No_Element
then again Constraint_Error is raised –
there is also a check to ensure that the cursor otherwise does designate
a node in the correct map (remember that cursors designate both an entity
and the container); if this check fails then Program_Error
is raised.
Perhaps it is worth observing that Insert,
Include, Replace,
Exclude and Delete
form a sort of progression from an operation that will insert something,
through operations that might insert, will neither insert nor delete,
might delete, to the final operation that will delete something. Note
also that Include, Replace
and Exclude do not apply to lists and vectors.
function First(Container: Map) return Cursor;
function Last(Container: Map) return Cursor;
function Next(Position: Cursor) return Cursor;
procedure Next(Position: in out Cursor);
function Find(Container: Map; Key: Key_Type) return Cursor;
function Element(Container: Map; Key: Key_Type) return Element;
function Contains(Container: Map; Key: Key_Type) return Boolean;
These should be self-evident. Unlike the operations
on vectors and lists, Find logically searches
the whole map and not just starting at some point (and since it searches
the whole map there is no point in having Reverse_Find).
(In implementation terms it won't actually search the whole map because
it will be structured in a way that makes this unnecessary – as
a balanced tree perhaps.) Moreover, Find uses
the equivalence relation based on the "<"
parameter so in the example it only has to match the part number and
not the year. The function call Element(My_Map, My_Key)
is equivalent to Element(Find(My_Map, My_Key)).
function Has_Element(Position: Cursor) return Boolean;
procedure Iterate(
Container: in Map;
Process: not null access procedure (Position: in Cursor));
These are also as for other containers.
And at last we have
private
... -- not specified by the language
end Ada.Containers.Ordered_Maps;
We have omitted to mention quite a few operations
that have no equivalent in hashed maps – we will come back to these
in a moment.
As an example we can
make a container to hold the information concerning spare parts. We can
use the type Part_Key and the function "<"
as above. We can suppose that the element type is
type Stock_Info is
record
Shelf: Character range 'A' .. 'T';
Stock: Integer;
end record;
This gives both the shelf letter and the number in
stock.
We can then declare
the container thus
package Store_Maps is
new Ordered_Maps(
Key_Type => Part_Key,
Element_Type => Stock_Info,
"<" => "<");
The_Store: Store_Maps.Map;
The last parameter could be omitted because the formal
has a <> default.
We can now add items
to our store by calling
The_Store.Insert((34618, 1998), ('F', 25));
The_Store.Insert((27134, 2004), ('C', 45));
...
We might now have a procedure which, given a part
number, checks to see if it exists and that the stock is not zero, and
if so returns the shelf letter and year number and decrements the stock
count.
procedure Request(
Part: in Integer; OK: out Boolean;
Year: out Integer; Shelf: out Character) is
C: Cursor;
K: Part_Key;
E: Stock_Info;
begin
C := The_Store.Find((Part, 0));
if C = No_Element then
OK := False; return; -- no such key
end if;
E := Element(C); K := Key(C);
Year := K.Year; Shelf := E.Shelf;
if E.Stock = 0 then
OK := False; return; -- out of stock
end if;
Replace_Element(C, (Shelf, E.Stock–1));
OK := True;
end Request;
Note that we had to
put a dummy year number in the call of Find.
We could of course use the new <> notation
for this
C := The_Store.Find((Part, others => <>));
The reader can improve this example at leisure –
by using Update_Element for example.
As another example
suppose we wish to check all through the stock looking for parts whose
stock is low, perhaps less than some given parameter. We can use Iterate
for this as follows
procedure Check_Stock(Low: in Integer) is
procedure Check_It(C: in Cursor) is
begin
if Element(C).Stock < Low then
-- print a message perhaps
Put("Low stock of part ");
Put_Line(Key(C).Part_Number);
end if;
end Check_It;
begin
The_Store.Iterate(Check_It'Access);
end Check_Stock;
Note that this uses a so-called downward closure.
The procedure Check_It has to be declared
locally to Check_Stock in order to access
the parameter Low. (Well you could declare
it outside and copy the parameter Low to a
global variable but that is just the sort of wicked thing one has to
do in lesser languages (such as even Ada 95). It is not task safe for
one thing.)
Another approach is
to use First and Next
and so on thus
procedure Check_Stock(Low: in Integer) is
C: Cursor := The_Store.First;
begin
loop
exit when C = No_Element;
if Element(C).Stock < Low then
-- print a message perhaps
Put("Low stock of part ");
Put_Line(Key(C).Part_Number);
end if;
C := The_Store.Next(C);
end loop;
end Check_Stock;
We will now consider hashed maps. The trouble with
ordered maps in general is that searching can be slow when the map has
many entries. Techniques such as a binary tree can be used but even so
the search time will increase at least as the logarithm of the number
of entries. A better approach is to use a hash function. This will be
familiar to many readers (especially those who have written compilers).
The general idea is as follows.
We define a function which takes a key and returns
some value in a given range. In the case of the Ada containers it has
to return a value of the modular type Hash_Type
which is declared in the root package Ada.Containers.
We could then convert this value onto a range representing an index into
an array whose size corresponds to the capacity of the map. This index
value is the preferred place to store the entry. If there already is
an entry at this place (because some other key has hashed to the same
value) then a number of approaches are possible. One way is to create
a list of entries with the same index value (often called buckets); another
way is simply to put it in the next available slot. The details don't
matter. But the overall effect is that provided the map is not too full
and the hash function is good then we can find an entry almost immediately
more or less irrespective of the size of the map.
So as users all we have to do is to define a suitable
hash function. It should give a good spread of values across the range
of
Hash_Type for the population of keys, it
should avoid clustering and above all for a given key it must
always
return the same hash value. A good discussion on hash functions by Knuth
will be found in
[10].
Defining good hash
functions needs care. In the case of the part numbers we might multiply
the part number by some obscure prime number and then truncate the result
down to the modular type Hash_Type. The author
hesitates to give an example but perhaps
function Part_Hash(P: Part_Key) return Hash_Type is
M31: constant := 2**31–1; -- a nice Mersenne prime
begin
return Hash_Type(P.Part_Number) * M31;
end Part_Hash;
On reflection that's probably a very bad prime to
use because it is so close to half of 2**32
a typical value of Hash_Type'Last+1. Of course
it doesn't have to be prime but simply relatively prime to it such as
5**13. Knuth suggests dividing the range by
the golden number τ = (√5+1)/2 = 1.618... and then taking
the nearest number relatively prime which is in fact simply the nearest
odd number (in this case it is 2654435769).
Here is a historic interlude. Marin Mersenne (1588-1648)
was a Franciscan monk who lived in Paris. He studied numbers of the form
Mp
= 2
p
– 1 where
p is prime. A lot of these are themselves prime.
Mersenne gave a list of those up to 257 which he said were prime (namely
2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257). It was not until 1947 that
it was finally settled that he got some of them wrong (61, 89, and 107
are also prime but 67 and 257 are not). At the time of writing there
are 42 known Mersenne primes and the largest which is also the largest
known prime number is
M25964951
– see
www.mersenne.org.
The specification of the hashed maps package is very
similar to that for ordered maps. It starts
generic
type Key_Type is private;
type Element_Type is private;
with function Hash(Key: Key_Type) return Hash_Type;
with function Equivalent_Keys(Left, Right: Key_Type) return Boolean;
with function "=" (Left, Right: Element_Type) return Boolean is <>;
package Ada.Containers.Hashed_Maps is
pragma Preelaborate(Hashed_Maps);
The differences from the ordered maps package are
that there is an extra generic parameter Hash
giving the hash function and the ordering parameter "<"
has been replaced by the function Equivalent_Keys.
It is this function that defines the equivalence relationship for hashed
maps; it is important that Equivalent_Keys(X, Y)
is always the same as Equivalent_Keys(Y, X).
Moreover if X and Y
are equivalent and Y and Z
are equivalent then X and Z
must also be equivalent.
Note that the function Equivalent_Keys
in the ordered maps package discussed above corresponds to the formal
generic parameter of the same name in this hashed maps package. This
should make it easier to convert between the two forms of packages.
Returning to our example,
if we now write
function Equivalent_Parts(Left, Right: Part_Key) return Boolean is
begin
return Left.Part_Number = Right.Part_Number;
end Equivalent_Parts;
then we can instantiate
the hashed maps package as follows
package Store_Maps is new Hashed_Maps(
Key_Type => Part_Key,
Element_Type => Stock_Info,
Hash => Part_Hash,
Equivalent_Keys => Equivalent_Parts);
The_Store: Store_Maps.Map;
and then the rest of our example will be exactly
as before. It is thus easy to convert from an ordered map to a hashed
map and vice versa provided of course that we only use the facilities
common to both.
We will finish this discussion of maps by briefly
considering the additional facilities in the two packages.
The ordered maps package
has the following additional subprograms
procedure Delete_First(Container: in out Map);
procedure Delete_Last(Container: in out Map);
function First_Element(Container: Map) return Element_Type;
function First_Key(Container: Map) return Key_Type;
function Last_Element(Container: Map) return Element_Type;
function Last_Key(Container: Map) return Key_Type;
function Previous(Position: Cursor) return Cursor;
procedure Previous(Position: in out Cursor);
function Floor(Container: Map; Key: Key_Type) return Cursor;
function Ceiling(Container: Map; Key: Key_Type) return Cursor;
function "<" (Left, Right: Cursor) return Boolean;
function ">" (Left, Right: Cursor) return Boolean;
function "<" (Left: Cursor; Right: Key_Type) return Boolean;
function ">" (Left: Cursor; Right: Key_Type) return Boolean;
function "<" (Left: Key_Type; Right: Cursor) return Boolean;
function ">" (Left: Key_Type; Right: Cursor) return Boolean;
procedure Reverse_Iterate(
Container: in Map;
Process: not null access procedure (Position: in Cursor));
These are again largely self-evident. The functions
Floor and Ceiling
are interesting. Floor searches for the last
node whose key is not greater than Key and
similarly Ceiling searches for the first node
whose key is not less than Key – they
return No_Element if there is no such element.
The subprograms Previous are of course the
opposite of Next and Reverse_Iterate
is like Iterate only backwards.
The functions "<"
and ">" are mostly for convenience.
Thus the first is equivalent to
function "<" (Left, Right: Cursor) return Boolean is
begin
return Key(Left) < Key(Right);
end "<";
Clearly these additional operations must be avoided
if we wish to retain the option of converting to a hashed map later.
Hashed maps have a
very important facility not in ordered maps which is the ability to specify
a capacity as for the vectors package. (Underneath their skin the hashed
maps are a bit like vectors whereas the ordered maps are a bit like lists.)
Thus we have
procedure Reserve_Capacity(Container: in out Map; Capacity: in Count_Type);
function Capacity(Container: Map) return Count_Type;
The behaviour is much as for vectors. We don't have
to set the capacity ourselves since it will be automatically extended
as necessary but it might significantly improve performance to do so.
In the case of maps, increasing the capacity requires the hashing to
be redone which could be quite time consuming, so if we know that our
map is going to be a big one, it is a good idea to set an appropriate
capacity right from the beginning. Note again that Length(M)
cannot exceed Capacity(M) but might be much
less.
The other additional
subprograms for hashed maps are
function Equivalent_Keys(Left, Right: Cursor) return Boolean;
function Equivalent_Keys(Left: Cursor; Right: Key_Type) return Boolean;
function Equivalent_Keys(Left: Key_Type; Right: Cursor) return Boolean;
These (like the additional
"<" and ">"
for ordered maps) are again mostly for convenience. The first is equivalent
to
function Equivalent_Keys(Left, Right: Cursor) return Boolean is
begin
return Equivalent_Keys(Key(Left), Key(Right));
end Equivalent_Keys;
Before moving on to
sets it should be noticed that there are also some useful functions in
the string packages. The main one is
with Ada.Containers;
function Ada.Strings.Hash(Key: String) return Containers.Hash_Type;
pragma Pure(Ada.Strings.Hash);
There is a similar function Ada.Strings.Unbounded.Hash
where the parameter Key has type Unbounded_String.
It simply converts the parameter to the type String
and then calls Ada.Strings.Hash. There is
also a generic function for bounded strings which again calls the basic
function Ada.Strings.Hash. For completeness
the function Ada.Strings.Fixed.Hash is a renaming
of Ada.Strings.Hash.
These are provided because it is often the case that
the key is a string and they save the user from devising good hash functions
for strings which might cause a nasty headache.
We could for example
save ourselves the worry of defining a good hash function in the above
example by making the part number into a 5-character string. So we might
write
function Part_Hash(P: Part_Key) return Hash_Type is
begin
return Ada.Strings.Hash(P.Part_Number);
end Part_Hash;
and if this doesn't work well then we can blame the
vendor.
© 2005, 2006 John Barnes Informatics.
Sponsored in part by: