Value Classes

This section illustrates the declaration of value types using a simple version of the complex number class, CPX. We also describe the benefits of value classes and when you might consider using them. Similar semantics can be obtained using immutable reference classes - we show how with a reference version of CPX. Finally, we discuss the aliasing problems that can arise in FSTR and similar classes when not using value semantics for efficiency reasons.

What are Value Classes

At a fundamental level: value classes define objects which, once created, never change their value. A variable of a value type may only be changed by re-assigning to that variable. When we wish to only modify some portion of a value class (one attribute, say), we are compelled to reassign the whole object. Hence, an attribute

	attr a: INT

of a value class has a setting routine called

 
	a(new_a_value: INT):SAME

which returns a new value object in which the attribute a has the value new_a_value. This contrasts with a reference class, in which the setting routine for a similar attribute would have the signature

	 a(new_a_value: INT)

The syntax of the setting routines of value classes is a common source of confusion when one is first introduced to Sather.

For experienced C programmers the difference between value and reference classes is similar to the difference between structs (value types) and pointers to structs (reference types). Because of that difference, reference objects can be referred to from more than one variable (aliased). Value objects cannot.

The basic types mentioned (except arrays) are value classes. Reference objects must be explicitly allocated with new. Variables have the value void until an object is assigned to them. Void for reference objects is similar to a null pointer in C. Void for value objects means that a predefined value is assigned ( 0 for INT, ` 0` for CHAR, false for BOOL, and 0.0 for FLT). Accessing an attribute of a void value object will always work. Accessing an attribute of a void reference object results in a fatal error.

Value Class Example

We illustrate the use of value classes through a commonly used example: the complex class CPX. The version shown here is a much simplified version of the library class CPX{T}. The key point to note is the manner in which attributes values are set in the create routine.

value class CPX  is
   -- Complex numbers.

   readonly attr real,imag: FLT;
        -- Real and imaginary parts.
   
   create(re,im:FLT):SAME is 
      -- Returns a complex number with its real and imaginary parts set
      -- according to the arguments
      return real(re).imag(im);

      -- This is equivalent to the more detailed steps:	
      -- res: SAME;             -- Declare a result value
      -- res := res.real(re);   -- Set the real part and reassign
      -- The attribute setting routine for real is real(FLT): SAME which 
      -- takes an argument of type FLT and returns (conceptually) a new CPX
      -- res := res.imag(im);  
      -- return(res);      
    end;
   
   is_eq(c: SAME): BOOL is 
      -- Return true if this object is equal to "c"
      return real=c.real and imag=c.imag
   end;

   plus(c:SAME):SAME is
      -- Return a complex number which represents the sum of self and `c'.
      return  #CPX(real+c.real,imag+c.imag); 
      -- # is syntactic sugar for the create routine above
    end;

   str:STR is
      -- A string representation of self of the form "1.02+3.23i".
      return real.str + ("+") + imag.str + "i";
    end;
end;

The complex class may then be used in the following manner.

        b: CPX := #(2.0,3.0);
        d: CPX := #(4.0,5.0);
        c: CPX := b+d;

The key point to note is the assignment of the attributes of the value class. The attribute imag adds the following reader/writer routines to the class interface

        imag(new_imag_value: FLT): SAME      -- Setting attribute routine
        imag: FLT;                           -- Getting attribute routine

The reason that the setting routine returns SAME is that we cannot really modify a value object. Instead, we (conceptually) generate a new value object whose imaginary field has the desired value. The setting routine, imag(FLT):SAME, returns this new value object.

If CPX were a regular reference class, the attribute access routines would be

        imag(new_imag_value: FLT);             -- Setting attribute routine
        image: FLT		               -- Getting attribute routine

For a more elaborate example, please consult the section on nil and void (See Nil and Void)

Using Value Classes

Value classes behave differently from concrete classes both in terms of their abstract behaviour (value semantics) and in terms of their implementation.

To begin with, value classes cannot suffer from aliasing problems, since they are immutable. You can get the same effect with reference classes by not providing any modifying operations in the interface - any operation that would modify the object, returns a new object instead. For example, take a look at the STR class

Value classes have several efficiency advantages over reference classes. Since they are usually stored on the stack, they have no heap management overhead and need not be garbage collected. They also don't use space to store a tag, and the absence of aliasing makes more C compiler optimizations possible. For a small class like CPX, all these factors combine to give a significant win over a reference class implementation. On the other hand, copying large value objects onto the stack can incur significant overhead. Unfortunately the efficiency of a value class appears directly tied to how smart the C compiler is; ``gcc'' is not very bright in this respect.

Note that when a value class is passed as an argument to a function which is expecting an abstract type, the compiler ``boxes'' it i.e. it is given a temporary reference class wrapper with a type-tag. Thus, value objects behaves exactly like an immutable reference objects in this situation.

Rules of Thumb

You want the class to have value semantics. You could still consider an immutable reference class (See immutable classes).
The class is small - the exact speed trade-offs have not been investigated, but value classes have so far been used when there are only a handful of attributes.
There are going to be a large number of objects of that class. This goes along with the previous point. For instance, if you are going to have large arrays of complex numbers, then the space that would be required for an object pointer and an object tag may be considerable.

Non-existant objects: Nil and void

One complexity in using value classes is the meaning of void, which is usually used to indicate a non-existent object. void is perfectly clear in the case of reference objects, where it is implemented by the NULL pointer. However, a value object exists as soon as it is declared (initialized to all zero values), and is never non-existent. Sather's solution is to say that a value object is void if it has this initial, all zero value. This introduces its own problems, since we may well want to use the all-zero value as a legitimate value (for instance, we frequently want to make use of INTs and FLTs with values of zero!).

To illustrate this discussion, we start with a simple example. The same class CPX used above is re-implemented, this time with an abstract type $CPX, below which are a reference implementation REF_CPX and a value implementation VAL_CPX.

The class $CPX defines the abstract inteface which may be realized by either a value or a reference class:

 
        type $CPX is
           real: FLT;
           imag: FLT;
           plus(arg: $CPX): $CPX;
        end;

The reference class implementation defines the ``plus'' routine to return a new object with the sum.

	
        class REF_CPX < $CPX is
          -- Reference class version
          readonly attr real: FLT;  -- Make the reader publicly available
          readonly attr imag: FLT;

          create(r,i: FLT): SAME is
            res ::= new;
            res.real := r; 
            res.imag := i;      
            return(res);
          end;

          plus(arg: $CPX): REF_CPX is
            -- By the contravariant rule:
            -- The argument may be either a $CPX or some supertype of $CPX
            -- The return value may be either a $CPX or some subtype of $CPX.
            new_real: FLT := real+arg.real;
            new_imag: FLT := imag+arg.imag;
            res: REF_CPX := #REF_CPX(new_real,new_imag);
            return(res);
          end;
        end;

The value class implementation is the same as that mentioned previously:

        value class VAL_CPX < $CPX is
          readonly attr real,imag: FLT;

          create(re,im:FLT):SAME is 
             return(real(re).imag(im));
          end;
   
          plus(c:$CPX): VAL_CPX is
             res: VAL_CPX := #(real+c.real,imag+c.imag); 
             return(res);
          end;
        end;

As you can see, there are only subtle differences in the way the value and the reference classes are created. However, what if we want to perform the void test on the variables below?

        
        a: VAL_CPX;
        b: REF_CPX;
        #OUT+void(a);    -- "true"
        #OUT+void(b);    -- "true"

We would like both calls to generate void, since neither variable has been properly initialized, and Sather's definition of void achieves exactly this. However, problematic situations can arise, since there will doubtless be occassions when you want to use a zero valued complex number!

        a: VAL_CPX := #VAL_CPX(0.0,0.0);
        b: REF_CPX := #REF_CPX(0.0,0.0);
        #OUT+void(a);   -- Returns "true"!
        #OUT+void(b);   -- Returns "false"

will generate void for the a and non- void for the call b, because a happened to be set to our void value! The reference class has no such confusion.

The birth of nil

To get around this problem, we can provide a user-defined nil value for the value class - nil will be an unused value which will signify a non-existent value object. In the case of the FLT class, a good nil value is ``NaN''.

        value class VAL_CPX} < $NIL{VAL_CPX, $CPX is
          is_nil: BOOL is
                -- A complex number is nil if both real and im parts are nil
               return(imag.is_nil and real.is_nil);
           end;

           nil: VAL_CPX is
             -- Return the "nil" value, which consists of 
             -- a complex number with two nil FLT components
             return(#(FLT::nil,FLT::nil))
           end;
        end;

In this case, we make use of nil FLT values to signify a nil complex number. The FLT class itself uses ``NaN'' (not a number) to signify a nil value. We subtype from the class $NIL to indicate that VAL_CPX provides the two routines is_nil and nil.

     type IS_NIL is
        is_nil: BOOL; -- Return true if this object is nil
     end;
     type $NIL{T} < $IS_NIL is
        nil: T;    -- The actual nil value
     end;

Hence, when checking to see whether an object is non-existent we can now do the following:

         object_does_not_exist(a: $CPX): BOOL is
            -- Returns true if "a" is a non-existent object.
           typecase a
           when $IS_NIL then return(a.is_nil);
                -- When a is a subtype of $IS_NIL
           else return(void(a)) end;
        end;

This is a fairly standard idiom; you will find code similar to this, particularly in parametrized classes such as FMAP etc, where the nil value is used to indicate empty spots in the hash table.

        a: VAL_CPX := VAL_CPX::nil;
        b: REF_CPX;
        #OUT+object_does_not_exist(a);   -- Returns "true"
        #OUT+object_does_not_exist(b);   -- Returns "true"
        
        c: VAL_CPX := #VAL_CPX(0.0,0.0);
        d: REF_CPX := #REF_CPX(0.0,0.0);
        #OUT+object_does_not_exist(c);   -- Returns "false"
        #OUT+object_does_not_exist(d);   -- Returns "false"

We now need to always initialize value classes when we declare them to the nil value (it would be nice if the language did this automatically, but this introduces other complications). This solution suffers from another problem; for some classes such as INT, there is no single value that can be set aside to signal nil, since we would then not be able to use that value.

Please note that the IS_NIL class does not yet exist, but will shortly.

Immutable Reference Classes

It is possible to get the same semantics as a value class by defining an immutable reference class. Immutable reference objects return a copy of the object whenever an operation might modify the object. An immutable class is not a Sather construct. Rather, it is a property of a reference class interface.

Immutable CPX

Value semantics can be achieved for reference classes my making them immutable. All operations that might be most naturally defined to modify the original object, instead return a new object with the appropriate modification.

We can imagine defining a ``square'' operation in REF_CPX as follows:

        square is
            r ::= real*real-imag*imag;
            i ::= 2*real*imag;
            real := r;
            imag := i;
          end;

However, our complex numbers would then behave unexpectedly.

       a: REF_CPX := #REF_CPX(3.0,4.0);
       c: REF_CPX := a;                -- c is (3.0,4.0)
       a.square;                       -- c's value has also changed to (-7,24)

The resulting value of a is as we expect (-7.0,24.0). The value of c, however, has changed to be (-7.0,24.0). The reason, of course, is that c points to the a object.

The REF_CPX class we defined above was immutable. Square would return a copy of the class instead of modifying the original.

        square: REF_CPX is
            r := real*real-imag*imag;
            i := 2*real*imag;
            return(#REF_CPX(r,i));
	end;

       a: REF_CPX := #REF_CPX(3.0,4.0);
       c: REF_CPX := a;                 -- c is (3.0,4.0)
       a := a.square;                   -- c's value is still (3.0,4.0);

FSTR and Immutable STR

The class STR is an example of such an immutable reference class in the library - a reference class with value semantics. We generally expect value semantics for strings i.e.

        a: STR := "Beetle ";
        b: STR := a;
        a := a+"dung";

At the end of this example, we would like a to hold "Beetle dung" and b to hold ``Beetle''. However, were STR a reference class with aliasing, b might well be modified along with a. There are two possible solutions. The obvious one is to make STR a value class. However, strings can be extremely large (in the complier, whole files are held in strings), and should definitely not be passed on the stack, which is the current implementation. The other choice is to make an immutable reference class, where every modification generates a copy. However, this copying is inefficient. So we have two types of strings. The basic class STR, which is an immutable reference class and the type of the string literal. We also have the more efficient class FSTR, which is not immutable. Using FSTR can therefore result in aliasing bugs.

To explain this futher we consider the plus operation in FSTR and in STR (this version is simplified to explain the point. The library version is more general and efficient). The STR version is relatively straightforward. It just allocates a new STR, sufficiently large to hold the result and copies self and the argument into the allocated STR and returns it.

    plus(s: STR):SAME is
        -- A new string obtained by appending `s' to self.
        -- This routine is actually from  STR::append(STR) in the library.

        -- If either self or sarg is void, return the other
        if void(self) then return s; end;
        if void(s) then return self; end;
        selfsize::=asize;            -- Determine the size of self 
        ssize::=s.asize;             -- and sarg strings
        r::=new(selfsize+ssize);     -- Allocate a new string for result
        r.acopy(self);               -- Copy self 
        r.acopy(selfsize,s);         -- and the argument into the new string
        return r;                    -- return the new string
    end;

The corresponding routine in FSTR is much more complex, since it tries to make use of the original array, if the data will fit. Only if the data will not fit, does it allocate a new array and copy the old data into it. When allocating the new array it doubles the space allocated, so that there is space for the FSTR to grow for a while longer, without copying. This is the technique of amortized dobuling

    plus(s:SAME):SAME is
    -- Append the string `s' to self and return it. 
        -- r will hold the result
        r:SAME;
        l ::= s.length;
	 -- If the argument would fit into our left over space, use self
        if (loc + l < asize) then     r:=self;       else
        -- Otherwise, make the result a new string twice the length of 
	-- self + the argument. This is called amortized doubling 
            r :=new(2*(asize+l));
            if (~void(self)) then  -- If self is not void, copy it over into r
              r.loc := loc;        -- Set the end pointer of r before the copy
              r.acopy(self);      
               -- Mark the old string as destroyed.
               -- This helps prevent aliasing bugs - if someone accesses 
               -- the old string and destroy checking is on, an error occurs
	       -- This helps catch aliasing bugs 
              SYS::destroy(self);   -- The old one should never be used now.
            end;
        end;
        -- "r" now holds the original string and space enough for the arg
        -- If the argument string has a size of 0, just return "r"
        if (l = 0) then return r; end;
        r.loc := r.loc + l;         -- Set the new location
        r.acopy(r.loc-l,s);         -- Copy the argument into the result
        return r;
    end;

Both STRs and FSTRs are meant to behave in roughly the same manner, but FSTRs can exhibit aliasing bugs as shown below.

        a: STR := #STR;         -- A new string
        a := a+"Beetle";        -- Append the string "Beetle" to a
        b: STR := a;            -- "b" now points to "a"
        c: STR := a+" Dung";
        #OUT+c;                 -- Outputs "Beetle Dung"
        #OUT+b;                 -- Outputs "Beetle"  
                                -- "b" has not been modified by aliasing
                                -- because the append operation returned 
                                -- a new string.

        -- To avoid seeing the effect of amortized doubling, we
        -- first create a large amount of space for "a"
        a: FSTR := #FSTR;     -- Create an empty FSTR, which starts out with 
                              -- space for 16 chars
        a := a+"Beetle";      
        b: FSTR := a;         
        c: FSTR := a+" Dung";
        #OUT+c;               -- Outputs "Beetle Dung"
        #OUT+b;               -- Outputs "Beetle Dung"  
                              -- "b" has been modified
                              -- because the append operation modified the
                              -- the original string.

Amortized Doubling further complicates matters

The presence of such unexpected side effects can be even more tricky in reality, since it is usually not obvious to the user when exactly a new string will be allocated (due to the amortized doubling). It is possible for a program that works fine with ``n'' characters to break when we add or delete a few characters.

        a: FSTR := #FSTR("A test");     -- 6 chars
        a := a+" of";                   -- Amortized doubling kicks in
                                        -- "a" now has space for 18 characters
        b: FSTR := a;                   -- "b" is aliased to "a"
        a := a+"123456789012"           -- Add on another 12 characters - this 
                                        -- won't fit into "a"'s 18 chars and 
                                        -- so a new string will be 
                                        -- allocated and returned.
        #OUT+a;                         -- "A test of12345678901"
        #OUT+b;                         -- "A test of";   
                                        -- The original "b" is left unchanged.

However, if we change this program slightly, aliasing suddenly rears its ugly head!

        ... First 3 lines are the same ...
        a := a+"123456"                -- Add on another 6 characters - this 
                                       -- now fits into the space "a" has 
                                       -- left over.
                                       -- We just modify the original string.
        #OUT+a;                        -- "A test of123456"
        #OUT+b;                        -- "A test of123456";  
                                        -- b is changed due to aliasing!

The destroy check allows us to catch some of these problems. If destroy checking is on, reading or modifying b should generate a run-time error.

Using FSTR and other F classes

While the preceeding discussion may make FSTRs seem dangerous, in practice a couple of simple rules will prevent any problems.

Set only one variable to hold the FSTR and always reassign back to that variable when a modification takes place.
Avoid letting an FSTR be part of a class interface. If possible confine FSTR usage to local variables.
In general, only use FSTR when efficiency is an issue; this is usually the case when building up a string in small bits, possibly from parsing or user input.

Value Class History

Value classes were initially proposed to generalize the notion of basic types such as INTs and FLTs. They were particularly relevant to the use of Sather on machines which had special arithmetic types ( FLTs with differing precisions etc.). Later, however, the value semantics came to be emphasized, they acquired attributes and were allowed to be under abstract types.

Benedict A. Gomes
Mon Apr 29 10:12:43 PDT 1996