This section illustrates the declaration of value types using a simple version of the complex number class, CPX. We also describe the benefits of value classes and when you might consider using them. Similar semantics can be obtained using immutable reference classes - we show how with a reference version of CPX. Finally, we discuss the aliasing problems that can arise in FSTR and similar classes when not using value semantics for efficiency reasons.
attr a: INTof a value class has a setting routine called
a(new_a_value: INT):SAMEwhich returns a new value object in which the attribute a has the value new_a_value. This contrasts with a reference class, in which the setting routine for a similar attribute would have the signature
a(new_a_value: INT)The syntax of the setting routines of value classes is a common source of confusion when one is first introduced to Sather.
For experienced C programmers the difference between value and reference classes is similar to the difference between structs (value types) and pointers to structs (reference types). Because of that difference, reference objects can be referred to from more than one variable (aliased). Value objects cannot.
The basic types mentioned (except arrays) are value
classes. Reference objects must be explicitly allocated with
new. Variables have the value void until an object is
assigned to them. Void for reference objects is similar to a null
pointer in C. Void for value objects means that a predefined value is
assigned ( 0 for INT, `
0` for CHAR,
false for BOOL, and 0.0 for FLT).
Accessing an attribute of a void value object will always
work. Accessing an attribute of a void reference object results in a
fatal error.
value class CPX is -- Complex numbers. readonly attr real,imag: FLT; -- Real and imaginary parts. create(re,im:FLT):SAME is -- Returns a complex number with its real and imaginary parts set -- according to the arguments return real(re).imag(im); -- This is equivalent to the more detailed steps: -- res: SAME; -- Declare a result value -- res := res.real(re); -- Set the real part and reassign -- The attribute setting routine for real is real(FLT): SAME which -- takes an argument of type FLT and returns (conceptually) a new CPX -- res := res.imag(im); -- return(res); end; is_eq(c: SAME): BOOL is -- Return true if this object is equal to "c" return real=c.real and imag=c.imag end; plus(c:SAME):SAME is -- Return a complex number which represents the sum of self and `c'. return #CPX(real+c.real,imag+c.imag); -- # is syntactic sugar for the create routine above end; str:STR is -- A string representation of self of the form "1.02+3.23i". return real.str + ("+") + imag.str + "i"; end; end;
The complex class may then be used in the following manner.
b: CPX := #(2.0,3.0); d: CPX := #(4.0,5.0); c: CPX := b+d;
The key point to note is the assignment of the attributes of the value class. The attribute imag adds the following reader/writer routines to the class interface
imag(new_imag_value: FLT): SAME -- Setting attribute routine imag: FLT; -- Getting attribute routine
The reason that the setting routine returns SAME is that we cannot really modify a value object. Instead, we (conceptually) generate a new value object whose imaginary field has the desired value. The setting routine, imag(FLT):SAME, returns this new value object.
If CPX were a regular reference class, the attribute access routines would be
imag(new_imag_value: FLT); -- Setting attribute routine image: FLT -- Getting attribute routineFor a more elaborate example, please consult the section on nil and void (See Nil and Void)
Value classes behave differently from concrete classes both in terms of their abstract behaviour (value semantics) and in terms of their implementation.
To begin with, value classes cannot suffer from aliasing problems, since they are immutable. You can get the same effect with reference classes by not providing any modifying operations in the interface - any operation that would modify the object, returns a new object instead. For example, take a look at the STR class
Value classes have several efficiency advantages over reference classes. Since they are usually stored on the stack, they have no heap management overhead and need not be garbage collected. They also don't use space to store a tag, and the absence of aliasing makes more C compiler optimizations possible. For a small class like CPX, all these factors combine to give a significant win over a reference class implementation. On the other hand, copying large value objects onto the stack can incur significant overhead. Unfortunately the efficiency of a value class appears directly tied to how smart the C compiler is; ``gcc'' is not very bright in this respect.
Note that when a value class is passed as an argument to a function which is expecting an abstract type, the compiler ``boxes'' it i.e. it is given a temporary reference class wrapper with a type-tag. Thus, value objects behaves exactly like an immutable reference objects in this situation.
To illustrate this discussion, we start with a simple example. The same class CPX used above is re-implemented, this time with an abstract type $CPX, below which are a reference implementation REF_CPX and a value implementation VAL_CPX.
The class $CPX defines the abstract inteface which may be realized by either a value or a reference class:
type $CPX is real: FLT; imag: FLT; plus(arg: $CPX): $CPX; end;The reference class implementation defines the ``plus'' routine to return a new object with the sum.
class REF_CPX < $CPX is -- Reference class version readonly attr real: FLT; -- Make the reader publicly available readonly attr imag: FLT; create(r,i: FLT): SAME is res ::= new; res.real := r; res.imag := i; return(res); end; plus(arg: $CPX): REF_CPX is -- By the contravariant rule: -- The argument may be either a $CPX or some supertype of $CPX -- The return value may be either a $CPX or some subtype of $CPX. new_real: FLT := real+arg.real; new_imag: FLT := imag+arg.imag; res: REF_CPX := #REF_CPX(new_real,new_imag); return(res); end; end;The value class implementation is the same as that mentioned previously:
value class VAL_CPX < $CPX is readonly attr real,imag: FLT; create(re,im:FLT):SAME is return(real(re).imag(im)); end; plus(c:$CPX): VAL_CPX is res: VAL_CPX := #(real+c.real,imag+c.imag); return(res); end; end;
As you can see, there are only subtle differences in the way the value and the reference classes are created. However, what if we want to perform the void test on the variables below?
a: VAL_CPX; b: REF_CPX; #OUT+void(a); -- "true" #OUT+void(b); -- "true"We would like both calls to generate void, since neither variable has been properly initialized, and Sather's definition of void achieves exactly this. However, problematic situations can arise, since there will doubtless be occassions when you want to use a zero valued complex number!
a: VAL_CPX := #VAL_CPX(0.0,0.0); b: REF_CPX := #REF_CPX(0.0,0.0); #OUT+void(a); -- Returns "true"! #OUT+void(b); -- Returns "false"will generate void for the a and non- void for the call b, because a happened to be set to our void value! The reference class has no such confusion.
To get around this problem, we can provide a user-defined nil value for the value class - nil will be an unused value which will signify a non-existent value object. In the case of the FLT class, a good nil value is ``NaN''.
value class VAL_CPX} < $NIL{VAL_CPX, $CPX is is_nil: BOOL is -- A complex number is nil if both real and im parts are nil return(imag.is_nil and real.is_nil); end; nil: VAL_CPX is -- Return the "nil" value, which consists of -- a complex number with two nil FLT components return(#(FLT::nil,FLT::nil)) end; end;In this case, we make use of nil FLT values to signify a nil complex number. The FLT class itself uses ``NaN'' (not a number) to signify a nil value. We subtype from the class $NIL to indicate that VAL_CPX provides the two routines is_nil and nil.
type IS_NIL is is_nil: BOOL; -- Return true if this object is nil end; type $NIL{T} < $IS_NIL is nil: T; -- The actual nil value end;Hence, when checking to see whether an object is non-existent we can now do the following:
object_does_not_exist(a: $CPX): BOOL is -- Returns true if "a" is a non-existent object. typecase a when $IS_NIL then return(a.is_nil); -- When a is a subtype of $IS_NIL else return(void(a)) end; end;
This is a fairly standard idiom; you will find code similar to this, particularly in parametrized classes such as FMAP etc, where the nil value is used to indicate empty spots in the hash table.
a: VAL_CPX := VAL_CPX::nil; b: REF_CPX; #OUT+object_does_not_exist(a); -- Returns "true" #OUT+object_does_not_exist(b); -- Returns "true" c: VAL_CPX := #VAL_CPX(0.0,0.0); d: REF_CPX := #REF_CPX(0.0,0.0); #OUT+object_does_not_exist(c); -- Returns "false" #OUT+object_does_not_exist(d); -- Returns "false"We now need to always initialize value classes when we declare them to the nil value (it would be nice if the language did this automatically, but this introduces other complications). This solution suffers from another problem; for some classes such as INT, there is no single value that can be set aside to signal nil, since we would then not be able to use that value.
Please note that the IS_NIL class does not yet exist, but will shortly.
It is possible to get the same semantics as a value class by defining an immutable reference class. Immutable reference objects return a copy of the object whenever an operation might modify the object. An immutable class is not a Sather construct. Rather, it is a property of a reference class interface.
Value semantics can be achieved for reference classes my making them immutable. All operations that might be most naturally defined to modify the original object, instead return a new object with the appropriate modification.
We can imagine defining a ``square'' operation in REF_CPX as follows:
square is r ::= real*real-imag*imag; i ::= 2*real*imag; real := r; imag := i; end;However, our complex numbers would then behave unexpectedly.
a: REF_CPX := #REF_CPX(3.0,4.0); c: REF_CPX := a; -- c is (3.0,4.0) a.square; -- c's value has also changed to (-7,24)
The resulting value of a is as we expect (-7.0,24.0). The value of c, however, has changed to be (-7.0,24.0). The reason, of course, is that c points to the a object.
The REF_CPX class we defined above was immutable. Square would return a copy of the class instead of modifying the original.
square: REF_CPX is r := real*real-imag*imag; i := 2*real*imag; return(#REF_CPX(r,i)); end; a: REF_CPX := #REF_CPX(3.0,4.0); c: REF_CPX := a; -- c is (3.0,4.0) a := a.square; -- c's value is still (3.0,4.0);
The class STR is an example of such an immutable reference class in the library - a reference class with value semantics. We generally expect value semantics for strings i.e.
a: STR := "Beetle "; b: STR := a; a := a+"dung";At the end of this example, we would like a to hold "Beetle dung" and b to hold ``Beetle''. However, were STR a reference class with aliasing, b might well be modified along with a. There are two possible solutions. The obvious one is to make STR a value class. However, strings can be extremely large (in the complier, whole files are held in strings), and should definitely not be passed on the stack, which is the current implementation. The other choice is to make an immutable reference class, where every modification generates a copy. However, this copying is inefficient. So we have two types of strings. The basic class STR, which is an immutable reference class and the type of the string literal. We also have the more efficient class FSTR, which is not immutable. Using FSTR can therefore result in aliasing bugs.
To explain this futher we consider the plus operation in FSTR and in STR (this version is simplified to explain the point. The library version is more general and efficient). The STR version is relatively straightforward. It just allocates a new STR, sufficiently large to hold the result and copies self and the argument into the allocated STR and returns it.
plus(s: STR):SAME is -- A new string obtained by appending `s' to self. -- This routine is actually from STR::append(STR) in the library. -- If either self or sarg is void, return the other if void(self) then return s; end; if void(s) then return self; end; selfsize::=asize; -- Determine the size of self ssize::=s.asize; -- and sarg strings r::=new(selfsize+ssize); -- Allocate a new string for result r.acopy(self); -- Copy self r.acopy(selfsize,s); -- and the argument into the new string return r; -- return the new string end;The corresponding routine in FSTR is much more complex, since it tries to make use of the original array, if the data will fit. Only if the data will not fit, does it allocate a new array and copy the old data into it. When allocating the new array it doubles the space allocated, so that there is space for the FSTR to grow for a while longer, without copying. This is the technique of amortized dobuling
plus(s:SAME):SAME is -- Append the string `s' to self and return it. -- r will hold the result r:SAME; l ::= s.length; -- If the argument would fit into our left over space, use self if (loc + l < asize) then r:=self; else -- Otherwise, make the result a new string twice the length of -- self + the argument. This is called amortized doubling r :=new(2*(asize+l)); if (~void(self)) then -- If self is not void, copy it over into r r.loc := loc; -- Set the end pointer of r before the copy r.acopy(self); -- Mark the old string as destroyed. -- This helps prevent aliasing bugs - if someone accesses -- the old string and destroy checking is on, an error occurs -- This helps catch aliasing bugs SYS::destroy(self); -- The old one should never be used now. end; end; -- "r" now holds the original string and space enough for the arg -- If the argument string has a size of 0, just return "r" if (l = 0) then return r; end; r.loc := r.loc + l; -- Set the new location r.acopy(r.loc-l,s); -- Copy the argument into the result return r; end;
Both STRs and FSTRs are meant to behave in roughly the same manner, but FSTRs can exhibit aliasing bugs as shown below.
a: STR := #STR; -- A new string a := a+"Beetle"; -- Append the string "Beetle" to a b: STR := a; -- "b" now points to "a" c: STR := a+" Dung"; #OUT+c; -- Outputs "Beetle Dung" #OUT+b; -- Outputs "Beetle" -- "b" has not been modified by aliasing -- because the append operation returned -- a new string. -- To avoid seeing the effect of amortized doubling, we -- first create a large amount of space for "a" a: FSTR := #FSTR; -- Create an empty FSTR, which starts out with -- space for 16 chars a := a+"Beetle"; b: FSTR := a; c: FSTR := a+" Dung"; #OUT+c; -- Outputs "Beetle Dung" #OUT+b; -- Outputs "Beetle Dung" -- "b" has been modified -- because the append operation modified the -- the original string.
The presence of such unexpected side effects can be even more tricky in reality, since it is usually not obvious to the user when exactly a new string will be allocated (due to the amortized doubling). It is possible for a program that works fine with ``n'' characters to break when we add or delete a few characters.
a: FSTR := #FSTR("A test"); -- 6 chars a := a+" of"; -- Amortized doubling kicks in -- "a" now has space for 18 characters b: FSTR := a; -- "b" is aliased to "a" a := a+"123456789012" -- Add on another 12 characters - this -- won't fit into "a"'s 18 chars and -- so a new string will be -- allocated and returned. #OUT+a; -- "A test of12345678901" #OUT+b; -- "A test of"; -- The original "b" is left unchanged.However, if we change this program slightly, aliasing suddenly rears its ugly head!
... First 3 lines are the same ... a := a+"123456" -- Add on another 6 characters - this -- now fits into the space "a" has -- left over. -- We just modify the original string. #OUT+a; -- "A test of123456" #OUT+b; -- "A test of123456"; -- b is changed due to aliasing!The destroy check allows us to catch some of these problems. If destroy checking is on, reading or modifying b should generate a run-time error.
While the preceeding discussion may make FSTRs seem dangerous, in practice a couple of simple rules will prevent any problems.