-rw-r--r-- | doc/reference/ExamplesPart.tex | 6890 | ||||
-rw-r--r-- | doc/reference/ReferencePart.tex | 4579 | ||||
-rw-r--r-- | doc/reference/ScalaByExample.tex | 6889 | ||||
-rw-r--r-- | doc/reference/ScalaReference.tex | 5007 |
4 files changed, 11473 insertions, 11892 deletions
diff --git a/doc/reference/ExamplesPart.tex b/doc/reference/ExamplesPart.tex new file mode 100644 index 0000000000..ad6403ae95 --- /dev/null +++ b/doc/reference/ExamplesPart.tex @@ -0,0 +1,6890 @@ +\def\exercise{ + \def\theresult{Exercise~\thesection.\arabic{result}} + \refstepcounter{result} + \trivlist\item[\hskip + \labelsep{\bf \theresult}]} +\def\endexercise{\endtrivlist} + +\newcommand{\rewriteby}[1]{\mbox{\tab\tab\rm(#1)}} + +\chapter{\label{chap:intro}Introduction} + +Scala is a programming language that fuses elements from +object-oriented and functional programming. We introduce Scala here in +an informal way, through a sequence of examples. + +Chapters~\ref{chap:example-one} and \ref{chap:example-auction} +highlight some of the features that make Scala interesting. The +following chapters introduce the language constructs of Scala in a +more thorough way, starting with simple expressions and functions, and +working up through objects and classes, lists and streams, mutable +state, and pattern matching, to more complete examples that show +interesting programming techniques. The present informal exposition is +meant to be complemented by the Scala Language Reference Manual which +specifies Scala in a more detailed and precise way. + +\paragraph{Acknowledgement} +We owe a great debt to Abelson and Sussman's wonderful book +``Structure and Interpretation of Computer +Programs''\cite{abelson-sussman:structure}. Many of their examples and +exercises are also present here. Of course, the working language has +in each case been changed from Scheme to Scala. Furthermore, the +examples make use of Scala's object-oriented constructs where +appropriate. + +\chapter{\label{chap:example-one}A First Example} + +As a first example, here is an implementation of Quicksort in Scala. 
+ +\begin{lstlisting} +def sort(xs: Array[int]): unit = { + def swap(i: int, j: int): unit = { + val t = xs(i); xs(i) = xs(j); xs(j) = t; + } + def sort1(l: int, r: int): unit = { + val pivot = xs((l + r) / 2); + var i = l, j = r; + while (i <= j) { + while (xs(i) < pivot) { i = i + 1 } + while (xs(j) > pivot) { j = j - 1 } + if (i <= j) { + swap(i, j); + i = i + 1; + j = j - 1; + } + } + if (l < j) sort1(l, j); + if (j < r) sort1(i, r); + } + sort1(0, xs.length - 1); +} +\end{lstlisting} + +The implementation looks quite similar to what one would write in Java +or C. We use the same operators and similar control structures. +There are also some minor syntactical differences. In particular: +\begin{itemize} +\item +Definitions start with a reserved word. Function definitions start +with \code{def}, variable definitions start with \code{var} and +definitions of values (i.e. read-only variables) start with \code{val}. +\item +The declared type of a symbol is given after the symbol and a colon. +The declared type can often be omitted, because the compiler can infer +it from the context. +\item +We use \code{unit} instead of \code{void} to define the result type of +a procedure. +\item +Array types are written \code{Array[T]} rather than \code{T[]}, +and array selections are written \code{a(i)} rather than \code{a[i]}. +\item +Functions can be nested inside other functions. Nested functions can +access parameters and local variables of enclosing functions. For +instance, the name of the array \code{xs} is visible in functions +\code{swap} and \code{sort1}, and therefore need not be passed as a +parameter to them. +\end{itemize} +So far, Scala looks like a fairly conventional language with some +syntactic peculiarities. In fact it is possible to write programs in a +conventional imperative or object-oriented style. 
This is important +because it is one of the things that makes it easy to combine Scala +components with components written in mainstream languages such as +Java, C\# or Visual Basic. + +However, it is also possible to write programs in a style which looks +completely different. Here is Quicksort again, this time written in +functional style. + +\begin{lstlisting} +def sort(xs: List[int]): List[int] = + if (xs.length <= 1) xs + else { + val pivot = xs(xs.length / 2); + sort(xs.filter(x => x < pivot)) + ::: xs.filter(x => x == pivot) + ::: sort(xs.filter(x => x > pivot)) + } +\end{lstlisting} + +The functional program works with lists instead of arrays.\footnote{In +a future complete implementation of Scala, we could also have used arrays +instead of lists, but at the moment arrays do not yet support +\code{filter} and \code{:::}.} +It captures the essence of the quicksort algorithm in a concise way: +\begin{itemize} +\item A list with at most one element is already sorted and is returned as is. +\item Otherwise, pick an element in the middle of the list as a pivot. +\item Partition the list into two sub-lists containing elements that +are less than, respectively greater than, the pivot element, and a +third list which contains elements equal to the pivot. +\item Sort the first two sub-lists by a recursive invocation of +the sort function.\footnote{This is not quite what the imperative algorithm does; +the latter partitions the array into two sub-arrays containing elements +less than, or greater than or equal to, the pivot.} +\item The result is obtained by appending the three sub-lists together. +\end{itemize} +Both the imperative and the functional implementation have the same +asymptotic complexity -- $O(N \log N)$ in the average case and +$O(N^2)$ in the worst case. But where the imperative implementation +operates in place by modifying the argument array, the functional +implementation returns a new sorted list and leaves the argument +list unchanged. The functional implementation thus requires more +transient memory than the imperative one. 
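
For comparison, the functional version can be restated for a current
Scala release. The following is only a sketch under stated assumptions:
it uses the present-day spelling \code{Int} and \code{Unit} of the basic
types, and the names \code{QuicksortSketch} and \code{main} are
hypothetical wrappers added for illustration. The sketch returns lists
of at most one element unchanged, so that the recursion terminates.

\begin{lstlisting}
// Assumption: a current Scala 2 or Scala 3 compiler and standard library.
object QuicksortSketch {
  // Functional quicksort on an immutable list; the argument is not modified.
  def sort(xs: List[Int]): List[Int] =
    if (xs.length <= 1) xs // nothing to sort
    else {
      val pivot = xs(xs.length / 2)
      sort(xs.filter(x => x < pivot)) :::
        xs.filter(x => x == pivot) :::
        sort(xs.filter(x => x > pivot))
    }

  def main(args: Array[String]): Unit = {
    val input = List(3, 1, 4, 1, 5, 9, 2, 6)
    println(sort(input)) // the sorted copy
    println(input)       // the original list, still in its original order
  }
}
\end{lstlisting}

Because \code{sort} builds new lists instead of swapping elements, the
second \code{println} shows the input unchanged, which is the in-place
versus copying trade-off described above.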
+ +The functional implementation makes it look like Scala is a language +that is specialized for functional operations on lists. In fact, it +is not; all of the operations used in the example are simple library +methods of a class \code{List[t]} which is part of the standard +Scala library, and which itself is implemented in Scala. + +In particular, there is the method \code{filter} which takes as +argument a {\em predicate function} that maps list elements to +boolean values. The result of \code{filter} is a list consisting of +all the elements of the original list for which the given predicate +function is true. The \code{filter} method of an object of type +\code{List[t]} thus has the signature + +\begin{lstlisting} +def filter(p: t => boolean): List[t] +\end{lstlisting} + +Here, \code{t => boolean} is the type of functions that take an element +of type \code{t} and return a \code{boolean}. Functions like +\code{filter} that take another function as argument or return one as +result are called {\em higher-order} functions. + +In the quicksort program, \code{filter} is applied three times to an +anonymous function argument. The first argument, +\code{x => x < pivot}, represents the function that maps its parameter +\code{x} to the boolean value \code{x < pivot}. That is, it yields +true if \code{x} is smaller than \code{pivot}, false +otherwise. The function is anonymous, i.e.\ it is not defined with a +name. The type of the \code{x} parameter is omitted because a Scala +compiler can infer it automatically from the context where the +function is used. To summarize, \code{xs.filter(x => x < pivot)} +returns a list consisting of all elements of the list \code{xs} that are +smaller than \code{pivot}. + +\comment{ +It is also possible to apply higher-order functions such as +\code{filter} to named function arguments. 
Here is functional +quicksort again, where the three anonymous functions are replaced by +named auxiliary functions that compare the argument to the +\code{pivot} value. + +\begin{lstlisting} +def sort (xs: List[int]): List[int] = + if (xs.length <= 1) xs + else { + val pivot = xs(xs.length / 2); + def ltPivot(x: int) = x < pivot; + def gtPivot(x: int) = x > pivot; + def eqPivot(x: int) = x == pivot; + sort(xs filter ltPivot) + ::: (xs filter eqPivot) + ::: sort(xs filter gtPivot) + } +\end{lstlisting} +} + +An object of type \code{List[t]} also has a method ``\code{:::}'' +which takes another list and returns the result of appending that +list to itself. This method has the signature + +\begin{lstlisting} +def :::(that: List[t]): List[t] +\end{lstlisting} + +Scala does not distinguish between identifiers and operator names. An +identifier can be either a sequence of letters and digits which begins +with a letter, or it can be a sequence of special characters, such as +``\code{+}'', ``\code{*}'', or ``\code{:}''. The last definition thus +introduced a new method identifier ``\code{:::}''. This identifier is +used in the Quicksort example as a binary infix operator that connects +the three sub-lists resulting from the partition. In fact, any method +can be used as an operator in Scala. The binary operation $E\;op\;E'$ +is always interpreted as the method call $E.op(E')$. This holds also +for binary infix operators which start with a letter. The result +expression of the last quicksort example is thus equivalent to +\begin{lstlisting} +sort(xs.filter(x => x < pivot)) + .:::(xs.filter(x => x == pivot)) + .:::(sort(xs.filter(x => x > pivot))) +\end{lstlisting} + +Looking again in detail at the first, imperative implementation of +Quicksort, we find that many of the language constructs used in the +second solution are also present, albeit in a disguised form. 
+ +For instance, ``standard'' binary operators such as \code{+}, +\code{-}, or \code{<} are not treated in any special way. Like +\code{:::}, they are methods of their left operand. Consequently, +the expression \code{i + 1} is regarded as the invocation +\code{i.+(1)} of the \code{+} method of the integer value \code{i}. +Of course, a compiler is free (if it is moderately smart, even expected) +to recognize the special case of calling the \code{+} method over +integer arguments and to generate efficient inline code for it. + +Control constructs such as \code{while} are also not primitive but are +predefined functions in the standard Scala library. Here is the +definition of \code{while} in Scala. +\begin{lstlisting} +def while (def p: boolean) (def s: unit): unit = + if (p) { s ; while(p)(s) } +\end{lstlisting} +The \code{while} function takes as first parameter a test function, +which takes no parameters and yields a boolean value. As second +parameter it takes a command function which also takes no parameters +and yields a trivial result. \code{while} invokes the command function +as long as the test function yields true. Again, compilers are free to +pick specialized implementations of \code{while} that have the same +behavior as the invocation of the function given above. + +\chapter{Programming with Actors and Messages} +\label{chap:example-auction} + +Here's an example that shows an application area for which Scala is +particularly well suited. Consider the task of implementing an +electronic auction service. We use an Erlang-style actor process +model to implement the participants of the auction. Actors are +objects to which messages are sent. Every actor has a ``mailbox'' of +its incoming messages which is represented as a queue. It can work +sequentially through the messages in its mailbox, or search for +messages matching some pattern. 
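
The mailbox idea can be illustrated in isolation. The following sketch
is not the \code{Actor} class used in this chapter; it is a hypothetical,
single-threaded stand-in written for a current Scala release, and the
names \code{MailboxSketch}, \code{MailboxDemo}, \code{send} and
\code{receive} are assumptions chosen for illustration. It only shows
how a queue of messages can be searched for the first message matching a
pattern, without threads or timeouts.

\begin{lstlisting}
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for an actor's mailbox (no concurrency, no timeouts).
final class MailboxSketch {
  private val queue = ListBuffer.empty[Any]

  // Append a message at the end of the queue.
  def send(msg: Any): Unit = queue += msg

  // Remove and handle the oldest queued message for which the handler
  // is defined; return None if no queued message matches.
  def receive[R](handler: PartialFunction[Any, R]): Option[R] = {
    val idx = queue.indexWhere(handler.isDefinedAt)
    if (idx < 0) None
    else Some(handler(queue.remove(idx)))
  }
}

object MailboxDemo {
  def main(args: Array[String]): Unit = {
    val box = new MailboxSketch
    box.send("hello")
    box.send(42)
    // Search for the first Int, skipping the String queued before it.
    println(box.receive { case n: Int => n + 1 })
    // The String is still in the mailbox and is handled next.
    println(box.receive { case s: String => s.length })
    // The mailbox is now empty, so nothing matches.
    println(box.receive { case _ => () })
  }
}
\end{lstlisting}

The handler is a sequence of \code{case} clauses, so a message that
matches none of the patterns simply stays in the queue for a later
\code{receive}.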
+ +\begin{lstlisting}[style=floating,label=fig:simple-auction-msgs,caption=Implementation of an Auction Service] +trait AuctionMessage; +case class Offer(bid: int, client: Actor) extends AuctionMessage; +case class Inquire(client: Actor) extends AuctionMessage; + +trait AuctionReply; +case class Status(asked: int, expire: Date) extends AuctionReply; +case object BestOffer extends AuctionReply; +case class BeatenOffer(maxBid: int) extends AuctionReply; +case class AuctionConcluded(seller: Actor, client: Actor) + extends AuctionReply; +case object AuctionFailed extends AuctionReply; +case object AuctionOver extends AuctionReply; +\end{lstlisting} + +For every traded item there is an auctioneer process that publishes +information about the traded item, that accepts offers from clients +and that communicates with the seller and winning bidder to close the +transaction. We present an overview of a simple implementation +here. + +As a first step, we define the messages that are exchanged during an +auction. There are two abstract base classes (called {\em traits}): +\code{AuctionMessage} for messages from clients to the auction +service, and \code{AuctionReply} for replies from the service to the +clients. For both base classes there exists a number of cases, which +are defined in Figure~\ref{fig:simple-auction-msgs}. 
+ +\begin{lstlisting}[style=floating,label=fig:simple-auction,caption=Implementation of an Auction Service] +class Auction(seller: Actor, minBid: int, closing: Date) extends Actor { + val timeToShutdown = 36000000; // msec + val bidIncrement = 10; + def run() = { + var maxBid = minBid - bidIncrement; + var maxBidder: Actor = _; + var running = true; + while (running) { + receiveWithin ((closing.getTime() - new Date().getTime())) { + case Offer(bid, client) => + if (bid >= maxBid + bidIncrement) { + if (maxBid >= minBid) maxBidder send BeatenOffer(bid); + maxBid = bid; maxBidder = client; client send BestOffer; + } else { + client send BeatenOffer(maxBid); + } + case Inquire(client) => + client send Status(maxBid, closing); + case TIMEOUT => + if (maxBid >= minBid) { + val reply = AuctionConcluded(seller, maxBidder); + maxBidder send reply; seller send reply; + } else { + seller send AuctionFailed; + } + receiveWithin(timeToShutdown) { + case Offer(_, client) => client send AuctionOver + case TIMEOUT => running = false; + } + } + } + } +} +\end{lstlisting} + +For each base class, there are a number of {\em case classes} which +define the format of particular messages in the class. These messages +might well be ultimately mapped to small XML documents. We expect +automatic tools to exist that convert between XML documents and +internal data structures like the ones defined above. + +Figure~\ref{fig:simple-auction} presents a Scala implementation of a +class \code{Auction} for auction processes that coordinate the bidding +on one item. Objects of this class are created by indicating +\begin{itemize} +\item a seller process which needs to be notified when the auction is over, +\item a minimal bid, +\item the date when the auction is to be closed. +\end{itemize} +The process behavior is defined by its \code{run} method. 
That method +repeatedly selects (using \code{receiveWithin}) a message and reacts to it, +until the auction is closed, which is signalled by a \code{TIMEOUT} +message. Before finally stopping, it stays active for another period +determined by the \code{timeToShutdown} constant and replies to +further offers that the auction is closed. + +Here are some further explanations of the constructs used in this +program: +\begin{itemize} +\item +The \code{receiveWithin} method of class \code{Actor} takes as +parameters a time span given in milliseconds and a function that +processes messages in the mailbox. The function is given by a sequence +of cases that each specify a pattern and an action to perform for +messages matching the pattern. The \code{receiveWithin} method selects +the first message in the mailbox which matches one of these patterns +and applies the corresponding action to it. +\item +The last case of \code{receiveWithin} is guarded by a +\code{TIMEOUT} pattern. If no other messages are received in the meantime, this +pattern is triggered after the time span which is passed as argument +to the enclosing \code{receiveWithin} method. \code{TIMEOUT} is a +particular instance of class \code{Message}, which is triggered by the +\code{Actor} implementation itself. +\item +Reply messages are sent using syntax of the form +\code{destination send SomeMessage}. \code{send} is used here as a +binary operator with a process and a message as arguments. This is +equivalent in Scala to the method call +\code{destination.send(SomeMessage)}, i.e. the invocation of +the \code{send} method of the destination process with the given message as +parameter. +\end{itemize} +The preceding discussion gave a flavor of distributed programming in +Scala. It might seem that Scala has a rich set of language constructs +that support actor processes, message sending and receiving, +programming with timeouts, etc. In fact, the opposite is true. 
All the +constructs discussed above are offered as methods in the library class +\code{Actor}. That class is itself implemented in Scala, based on the underlying +thread model of the host language (e.g. Java, or .NET). +The implementation of all features of class \code{Actor} used here is +given in Section~\ref{sec:actors}. + +The advantages of the library-based approach are relative simplicity +of the core language and flexibility for library designers. Because +the core language need not specify details of high-level process +communication, it can be kept simpler and more general. Because the +particular model of messages in a mailbox is a library module, it can +be freely modified if a different model is needed in some +applications. The approach requires however that the core language is +expressive enough to provide the necessary language abstractions in a +convenient way. Scala has been designed with this in mind; one of its +major design goals was that it should be flexible enough to act as a +convenient host language for domain specific languages implemented by +library modules. For instance, the actor communication constructs +presented above can be regarded as one such domain specific language, +which conceptually extends the Scala core. + +\chapter{\label{chap:simple-funs}Expressions and Simple Functions} + +The previous examples gave an impression of what can be done with +Scala. We now introduce its constructs one by one in a more +systematic fashion. We start with the smallest level, expressions and +functions. + +\section{Expressions And Simple Functions} + +A Scala system comes with an interpreter which can be seen as a fancy +calculator. A user interacts with the calculator by typing in +expressions. The calculator returns the evaluation results and their +types. Example: + +\begin{lstlisting} +> 87 + 145 +232: scala.Int + +> 5 + 2 * 3 +11: scala.Int + +> "hello" + " world!" 
+hello world!: scala.String +\end{lstlisting} +It is also possible to name a sub-expression and use the name instead +of the expression afterwards: +\begin{lstlisting} +> def scale = 5 +def scale: scala.Int + +> 7 * scale +35: scala.Int +\end{lstlisting} +\begin{lstlisting} +> def pi = 3.14159 +def pi: scala.Double + +> def radius = 10 +def radius: scala.Int + +> 2 * pi * radius +62.8318: scala.Double +\end{lstlisting} +Definitions start with the reserved word \code{def}; they introduce a +name which stands for the expression following the \code{=} sign. The +interpreter will answer with the introduced name and its type. + +Executing a definition such as \code{def x = e} will not evaluate the +expression \code{e}. Instead \code{e} is evaluated whenever \code{x} +is used. Alternatively, Scala offers a value definition +\code{val x = e}, which does evaluate the right-hand side \code{e} as part of the +evaluation of the definition. If \code{x} is then used subsequently, +it is immediately replaced by the pre-computed value of +\code{e}, so that the expression need not be evaluated again. + +How are expressions evaluated? An expression consisting of operators +and operands is evaluated by repeatedly applying the following +simplification steps. +\begin{itemize} +\item pick the left-most operation +\item evaluate its operands +\item apply the operator to the operand values. +\end{itemize} +A name defined by \code{def}\ is evaluated by replacing the name by the +(unevaluated) definition's right-hand side. A name defined by \code{val} is +evaluated by replacing the name by the value of the definition's +right-hand side. The evaluation process stops once we have reached a +value. A value is some data item such as a string, a number, an array, +or a list. + +\example +Here is an evaluation of an arithmetic expression. 
+\begin{lstlisting} +$\,\,\,$ (2 * pi) * radius +$\rightarrow$ (2 * 3.14159) * radius +$\rightarrow$ 6.28318 * radius +$\rightarrow$ 6.28318 * 10 +$\rightarrow$ 62.8318 +\end{lstlisting} +The process of stepwise simplification of expressions to values is +called {\em reduction}. + +\section{Parameters} + +Using \code{def}, one can also define functions with parameters. Example: +\begin{lstlisting} +> def square(x: double) = x * x +def square(x: double): scala.Double + +> square(2) +4.0: scala.Double + +> square(5 + 4) +81.0: scala.Double + +> square(square(4)) +256.0: scala.Double + +> def sumOfSquares(x: double, y: double) = square(x) + square(y) +def sumOfSquares(x: scala.Double, y: scala.Double): scala.Double +\end{lstlisting} + +Function parameters follow the function name and are always enclosed +in parentheses. Every parameter comes with a type, which is indicated +following the parameter name and a colon. At the present time, we +only need basic numeric types such as the type \code{scala.Double} of +double precision numbers. Scala defines {\em type aliases} for some +standard types, so we can write numeric types as in Java. For instance +\code{double} is a type alias of \code{scala.Double} and \code{int} is +a type alias for \code{scala.Int}. + +Functions with parameters are evaluated analogously to operators in +expressions. First, the arguments of the function are evaluated (in +left-to-right order). Then, the function application is replaced by +the function's right hand side, and at the same time all formal +parameters of the function are replaced by their corresponding actual +arguments. 
+ +\example\ + +\begin{lstlisting} +$\,\,\,$ sumOfSquares(3, 2+2) +$\rightarrow$ sumOfSquares(3, 4) +$\rightarrow$ square(3) + square(4) +$\rightarrow$ 3 * 3 + square(4) +$\rightarrow$ 9 + square(4) +$\rightarrow$ 9 + 4 * 4 +$\rightarrow$ 9 + 16 +$\rightarrow$ 25 +\end{lstlisting} + +The example shows that the interpreter reduces function arguments to +values before rewriting the function application. One could instead +have chosen to apply the function to unreduced arguments. This would +have yielded the following reduction sequence: +\begin{lstlisting} +$\,\,\,$ sumOfSquares(3, 2+2) +$\rightarrow$ square(3) + square(2+2) +$\rightarrow$ 3 * 3 + square(2+2) +$\rightarrow$ 9 + square(2+2) +$\rightarrow$ 9 + (2+2) * (2+2) +$\rightarrow$ 9 + 4 * (2+2) +$\rightarrow$ 9 + 4 * 4 +$\rightarrow$ 9 + 16 +$\rightarrow$ 25 +\end{lstlisting} + +The second evaluation order is known as \emph{call-by-name}, +whereas the first one is known as \emph{call-by-value}. For +expressions that use only pure functions and that therefore can be +reduced with the substitution model, both schemes yield the same final +values. + +Call-by-value has the advantage that it avoids repeated evaluation of +arguments. Call-by-name has the advantage that it avoids evaluation of +arguments when the parameter is not used at all by the function. +Call-by-value is usually more efficient than call-by-name, but a +call-by-value evaluation might loop where a call-by-name evaluation +would terminate. Consider: +\begin{lstlisting} +> def loop: int = loop +def loop: scala.Int + +> def first(x: int, y: int) = x +def first(x: scala.Int, y: scala.Int): scala.Int +\end{lstlisting} +Then \code{first(1, loop)} reduces with call-by-name to \code{1}, +whereas the same term reduces with call-by-value repeatedly to itself, +hence evaluation does not terminate. +\begin{lstlisting} +$\,\,\,$ first(1, loop) +$\rightarrow$ first(1, loop) +$\rightarrow$ first(1, loop) +$\rightarrow$ ... 
+\end{lstlisting} +Scala uses call-by-value by default, but it switches to call-by-name evaluation +if the parameter is preceded by \code{def}. + +\example\ + +\begin{lstlisting} +> def constOne(x: int, def y: int) = 1 +constOne(x: scala.Int, def y: scala.Int): scala.Int + +> constOne(1, loop) +1: scala.Int + +> constOne(loop, 2) // gives an infinite loop. +^C +\end{lstlisting} + +\section{Conditional Expressions} + +Scala's \code{if-else} lets one choose between two alternatives. Its +syntax is like Java's \code{if-else}. But where Java's \code{if-else} +can be used only as an alternative of statements, Scala allows the +same syntax to choose between two expressions. That's why Scala's +\code{if-else} serves also as a substitute for Java's conditional +expression \code{ ... ? ... : ...}. + +\example\ + +\begin{lstlisting} +> def abs(x: double) = if (x >= 0) x else -x +abs(x: double): double +\end{lstlisting} +Scala's boolean expressions are similar to Java's; they are formed +from the constants +\code{true} and +\code{false}, comparison operators, boolean negation \code{!} and the +boolean operators $\,$\code{&&}$\,$ and $\,$\code{||}. + +\section{\label{sec:sqrt}Example: Square Roots by Newton's Method} + +We now illustrate the language elements introduced so far in the +construction of a more interesting program. The task is to write a +function +\begin{lstlisting} +def sqrt(x: double): double = ... +\end{lstlisting} +which computes the square root of \code{x}. + +A common way to compute square roots is by Newton's method of +successive approximations. One starts with an initial guess \code{y} +(say: \code{y = 1}). One then repeatedly improves the current guess +\code{y} by taking the average of \code{y} and \code{x/y}. As an +example, the next three columns indicate the guess \code{y}, the +quotient \code{x/y}, and their average for the first approximations of +$\sqrt 2$. 
+\begin{lstlisting} +1 2/1 = 2 1.5 +1.5 2/1.5 = 1.3333 1.4167 +1.4167 2/1.4167 = 1.4118 1.4142 +1.4142 ... ... + +$y$ $x/y$ $(y + x/y)/2$ +\end{lstlisting} +One can implement this algorithm in Scala by a set of small functions, +which each represent one of the elements of the algorithm. + +We first define a function for iterating from a guess to the result: +\begin{lstlisting} +def sqrtIter(guess: double, x: double): double = + if (isGoodEnough(guess, x)) guess + else sqrtIter(improve(guess, x), x); +\end{lstlisting} +Note that \code{sqrtIter} calls itself recursively. Loops in +imperative programs can always be modelled by recursion in functional +programs. + +Note also that the definition of \code{sqrtIter} contains a return +type, which follows the parameter section. Such return types are +mandatory for recursive functions. For a non-recursive function, the +return type is optional; if it is missing the type checker will +compute it from the type of the function's right-hand side. However, +even for non-recursive functions it is often a good idea to include a +return type for better documentation. + +As a second step, we define the two functions called by +\code{sqrtIter}: a function to \code{improve} the guess and a +termination test \code{isGoodEnough}. Here are their definitions. +\begin{lstlisting} +def improve(guess: double, x: double) = + (guess + x / guess) / 2; + +def isGoodEnough(guess: double, x: double) = + abs(square(guess) - x) < 0.001; +\end{lstlisting} + +Finally, the \code{sqrt} function itself is defined by an application +of \code{sqrtIter}. +\begin{lstlisting} +def sqrt(x: double) = sqrtIter(1.0, x); +\end{lstlisting} + +\begin{exercise} The \code{isGoodEnough} test is not very precise for small +numbers and might lead to non-termination for very large ones (why?). +Design a different version of \code{isGoodEnough} which does not have +these problems. +\end{exercise} + +\begin{exercise} Trace the execution of the \code{sqrt(4)} expression. 
+\end{exercise} + +\section{Nested Functions} + +The functional programming style encourages the construction of many +small helper functions. In the last example, the implementation +of \code{sqrt} made use of the helper functions \code{sqrtIter}, +\code{improve} and \code{isGoodEnough}. The names of these functions +are relevant only for the implementation of \code{sqrt}. We normally +do not want users of \code{sqrt} to access these functions directly. + +We can enforce this (and avoid name-space pollution) by including +the helper functions within the calling function itself: +\begin{lstlisting} +def sqrt(x: double) = { + def sqrtIter(guess: double, x: double): double = + if (isGoodEnough(guess, x)) guess + else sqrtIter(improve(guess, x), x); + def improve(guess: double, x: double) = + (guess + x / guess) / 2; + def isGoodEnough(guess: double, x: double) = + abs(square(guess) - x) < 0.001; + sqrtIter(1.0, x) +} +\end{lstlisting} +In this program, the braces \code{\{ ... \}} enclose a {\em block}. +Blocks in Scala are themselves expressions. Every block ends in a +result expression which defines its value. The result expression may +be preceded by auxiliary definitions, which are visible only in the +block itself. + +Every definition in a block must be followed by a semicolon, which +separates this definition from subsequent definitions or the result +expression. However, a semicolon is inserted implicitly if the +definition ends in a right brace and is followed by a new line. +Therefore, the following are all legal: +\begin{lstlisting} +def f(x) = x + 1; /* `;' mandatory */ +f(1) + f(2) + +def g(x) = {x + 1} +g(1) + g(2) + +def h(x) = {x + 1}; /* `;' mandatory */ h(1) + h(2) +\end{lstlisting} +Scala uses the usual block-structured scoping rules. A name defined in +some outer block is visible also in some inner block, provided it is +not redefined there. This rule permits us to simplify our +\code{sqrt} example. 
We need not pass \code{x} around as an additional parameter of +the nested functions, since it is always visible in them as a +parameter of the outer function \code{sqrt}. Here is the simplified code: +\begin{lstlisting} +def sqrt(x: double) = { + def sqrtIter(guess: double): double = + if (isGoodEnough(guess)) guess + else sqrtIter(improve(guess)); + def improve(guess: double) = + (guess + x / guess) / 2; + def isGoodEnough(guess: double) = + abs(square(guess) - x) < 0.001; + sqrtIter(1.0) +} +\end{lstlisting} + +\section{Tail Recursion} + +Consider the following function to compute the greatest common divisor +of two given numbers. + +\begin{lstlisting} +def gcd(a: int, b: int): int = if (b == 0) a else gcd(b, a % b) +\end{lstlisting} + +Using our substitution model of function evaluation, +\code{gcd(14, 21)} evaluates as follows: + +\begin{lstlisting} +$\,\,$ gcd(14, 21) +$\rightarrow\!$ if (21 == 0) 14 else gcd(21, 14 % 21) +$\rightarrow\!$ if (false) 14 else gcd(21, 14 % 21) +$\rightarrow\!$ gcd(21, 14 % 21) +$\rightarrow\!$ gcd(21, 14) +$\rightarrow\!$ if (14 == 0) 21 else gcd(14, 21 % 14) +$\rightarrow$ $\rightarrow$ gcd(14, 21 % 14) +$\rightarrow\!$ gcd(14, 7) +$\rightarrow\!$ if (7 == 0) 14 else gcd(7, 14 % 7) +$\rightarrow$ $\rightarrow$ gcd(7, 14 % 7) +$\rightarrow\!$ gcd(7, 0) +$\rightarrow\!$ if (0 == 0) 7 else gcd(0, 7 % 0) +$\rightarrow$ $\rightarrow$ 7 +\end{lstlisting} + +Contrast this with the evaluation of another recursive function, +\code{factorial}: + +\begin{lstlisting} +def factorial(n: int): int = if (n == 0) 1 else n * factorial(n - 1) +\end{lstlisting} + +The application \code{factorial(5)} rewrites as follows: +\begin{lstlisting} +$\,\,\,$ factorial(5) +$\rightarrow$ if (5 == 0) 1 else 5 * factorial(5 - 1) +$\rightarrow$ 5 * factorial(5 - 1) +$\rightarrow$ 5 * factorial(4) +$\rightarrow\ldots\rightarrow$ 5 * (4 * factorial(3)) +$\rightarrow\ldots\rightarrow$ 5 * (4 * (3 * factorial(2))) +$\rightarrow\ldots\rightarrow$ 5 * (4 * (3 * (2 * 
factorial(1)))) +$\rightarrow\ldots\rightarrow$ 5 * (4 * (3 * (2 * (1 * factorial(0))))) +$\rightarrow\ldots\rightarrow$ 5 * (4 * (3 * (2 * (1 * 1)))) +$\rightarrow\ldots\rightarrow$ 120 +\end{lstlisting} +There is an important difference between the two rewrite sequences: +The terms in the rewrite sequence of \code{gcd} have again and again +the same form. As evaluation proceeds, their size is bounded by a +constant. By contrast, in the evaluation of factorial we get longer +and longer chains of operands which are then multiplied in the last +part of the evaluation sequence. + +Even though actual implementations of Scala do not work by rewriting +terms, they nevertheless should have the same space behavior as in the +rewrite sequences. In the implementation of \code{gcd}, one notes that +the recursive call to \code{gcd} is the last action performed in the +evaluation of its body. One also says that \code{gcd} is +``tail-recursive''. The final call in a tail-recursive function can be +implemented by a jump back to the beginning of that function. The +arguments of that call can overwrite the parameters of the current +instantiation of \code{gcd}, so that no new stack space is needed. +Hence, tail-recursive functions are iterative processes, which can be +executed in constant space. + +By contrast, the recursive call in \code{factorial} is followed by a +multiplication. Hence, a new stack frame is allocated for the +recursive instance of factorial, and is deallocated after that +instance has finished. The given formulation of the factorial function +is not tail-recursive; it needs space proportional to its input +parameter for its execution. + +More generally, if the last action of a function is a call to another +(possibly the same) function, only a single stack frame is needed for +both functions. Such calls are called ``tail calls''. In principle, +tail calls can always re-use the stack frame of the calling function. 
+However, some run-time environments (such as the Java VM) lack the +primitives to make stack frame re-use for tail calls efficient. A +production quality Scala implementation is therefore only required to +re-use the stack frame of a directly tail-recursive function whose +last action is a call to itself. Other tail calls might be optimized +also, but one should not rely on this across implementations. + +\begin{exercise} Design a tail-recursive version of +\code{factorial}. +\end{exercise} + +\chapter{\label{chap:first-class-funs}First-Class Functions} + +A function in Scala is a ``first-class value''. Like any other value, +it may be passed as a parameter or returned as a result. Functions +which take other functions as parameters or return them as results are +called {\em higher-order} functions. This chapter introduces +higher-order functions and shows how they provide a flexible mechanism +for program composition. + +As a motivating example, consider the following three related tasks: +\begin{enumerate} +\item +Write a function to sum all integers between two given numbers \code{a} and \code{b}: +\begin{lstlisting} +def sumInts(a: int, b: int): double = + if (a > b) 0 else a + sumInts(a + 1, b) +\end{lstlisting} +\item +Write a function to sum the cubes of all integers between two given numbers +\code{a} and \code{b}: +\begin{lstlisting} +def cube(x: int): double = x * x * x +def sumCubes(a: int, b: int): double = + if (a > b) 0 else cube(a) + sumCubes(a + 1, b) +\end{lstlisting} +\item +Write a function to sum the reciprocals of all integers between two given numbers +\code{a} and \code{b}: +\begin{lstlisting} +def sumReciprocals(a: int, b: int): double = + if (a > b) 0 else 1.0 / a + sumReciprocals(a + 1, b) +\end{lstlisting} +\end{enumerate} +These functions are all instances of +\(\sum_{n=a}^{b} f(n)\) for different values of $f$.
+We can factor out the common pattern by defining a function \code{sum}: +\begin{lstlisting} +def sum(f: int => double, a: int, b: int): double = + if (a > b) 0 else f(a) + sum(f, a + 1, b) +\end{lstlisting} +The type \code{int => double} is the type of functions that +take arguments of type \code{int} and return results of type +\code{double}. So \code{sum} is a function which takes another function as +a parameter. In other words, \code{sum} is a {\em higher-order} +function. + +Using \code{sum}, we can formulate the three summing functions as +follows. +\begin{lstlisting} +def sumInts(a: int, b: int): double = sum(id, a, b); +def sumCubes(a: int, b: int): double = sum(cube, a, b); +def sumReciprocals(a: int, b: int): double = sum(reciprocal, a, b); +\end{lstlisting} +where +\begin{lstlisting} +def id(x: int): double = x; +def cube(x: int): double = x * x * x; +def reciprocal(x: int): double = 1.0/x; +\end{lstlisting} + +\section{Anonymous Functions} + +Parameterization by functions tends to create many small functions. In +the previous example, we defined \code{id}, \code{cube} and +\code{reciprocal} as separate functions, so that they could be +passed as arguments to \code{sum}. + +Instead of using named function definitions for these small argument +functions, we can formulate them in a shorter way as {\em anonymous +functions}. An anonymous function is an expression that evaluates to a +function; the function is defined without giving it a name. As an +example consider the anonymous reciprocal function: +\begin{lstlisting} + x: int => 1.0/x +\end{lstlisting} +The part before the arrow `\code{=>}' is the parameter of the function, +whereas the part following the `\code{=>}' is its body. If there are +several parameters, we need to enclose them in parentheses. For +instance, here is an anonymous function which multiplies its two arguments.
+\begin{lstlisting} + (x: double, y: double) => x * y +\end{lstlisting} +Using anonymous functions, we can reformulate the three summation +functions without named auxiliary functions: +\begin{lstlisting} +def sumInts(a: int, b: int): double = sum(x: int => x, a, b); +def sumCubes(a: int, b: int): double = sum(x: int => x * x * x, a, b); +def sumReciprocals(a: int, b: int): double = sum(x: int => 1.0/x, a, b); +\end{lstlisting} +Often, the Scala compiler can deduce the parameter type(s) from the +context of the anonymous function, in which case they can be omitted. +For instance, in the case of \code{sumInts}, \code{sumCubes} and +\code{sumReciprocals}, one knows from the type of +\code{sum} that the first parameter must be a function of type +\code{int => double}. Hence, the parameter type \code{int} is +redundant and may be omitted: +\begin{lstlisting} +def sumInts(a: int, b: int): double = sum(x => x, a, b); +def sumCubes(a: int, b: int): double = sum(x => x * x * x, a, b); +def sumReciprocals(a: int, b: int): double = sum(x => 1.0/x, a, b); +\end{lstlisting} + +Generally, the Scala term +\code{(x}$_1$\code{: T}$_1$\code{, ..., x}$_n$\code{: T}$_n$\code{) => E} +defines a function which maps its parameters +\code{x}$_1$\code{, ..., x}$_n$ to the result of the expression \code{E} +(where \code{E} may refer to \code{x}$_1$\code{, ..., x}$_n$). Anonymous +functions are not essential language elements of Scala, as they can +always be expressed in terms of named functions. Indeed, the +anonymous function +\begin{lstlisting} +(x$_1$: T$_1$, ..., x$_n$: T$_n$) => E +\end{lstlisting} +is equivalent to the block +\begin{lstlisting} +{ def f (x$_1$: T$_1$, ..., x$_n$: T$_n$) = E ; f } +\end{lstlisting} +where \code{f} is a fresh name which is used nowhere else in the program. +We also say that anonymous functions are ``syntactic sugar''. + +\section{Currying} + +The latest formulation of the three summing functions is already quite +compact. But we can do even better.
Note that +\code{a} and \code{b} appear as parameters and arguments of every function +but they do not seem to take part in interesting combinations. Is +there a way to get rid of them? + +Let's try to rewrite \code{sum} so that it does not take the bounds +\code{a} and \code{b} as parameters: +\begin{lstlisting} +def sum(f: int => double) = { + def sumF(a: int, b: int): double = + if (a > b) 0 else f(a) + sumF(a + 1, b); + sumF +} +\end{lstlisting} +In this formulation, \code{sum} is a function which returns another +function, namely the specialized summing function \code{sumF}. This +latter function does all the work; it takes the bounds \code{a} and +\code{b} as parameters, applies \code{sum}'s function parameter \code{f} to all +integers between them, and sums up the results. + +Using this new formulation of \code{sum}, we can now define: +\begin{lstlisting} +def sumInts = sum(x => x); +def sumCubes = sum(x => x * x * x); +def sumReciprocals = sum(x => 1.0/x); +\end{lstlisting} +Or, equivalently, with value definitions: +\begin{lstlisting} +val sumInts = sum(x => x); +val sumCubes = sum(x => x * x * x); +val sumReciprocals = sum(x => 1.0/x); +\end{lstlisting} +These functions can be applied like other functions. For instance, +\begin{lstlisting} +> sumCubes(1, 10) + sumReciprocals(10, 20) +3025.7687714031754: scala.Double +\end{lstlisting} +How are function-returning functions applied? As an example, in the expression +\begin{lstlisting} +sum(x => x * x * x)(1, 10) , +\end{lstlisting} +the function \code{sum} is applied to the cubing function +\code{(x => x * x * x)}. The resulting function is then +applied to the second argument list, \code{(1, 10)}. + +This notation is possible because function application associates to the left. 
+That is, if $\mbox{args}_1$ and $\mbox{args}_2$ are argument lists, then +\bda{lcl} +f(\mbox{args}_1)(\mbox{args}_2) & \ \ \mbox{is equivalent to}\ \ & (f(\mbox{args}_1))(\mbox{args}_2) +\eda +In our example, \code{sum(x => x * x * x)(1, 10)} is equivalent to the +following expression: +\code{(sum(x => x * x * x))(1, 10)}. + +The style of function-returning functions is so useful that Scala has +special syntax for it. For instance, the next definition of \code{sum} +is equivalent to the previous one, but is shorter: +\begin{lstlisting} +def sum(f: int => double)(a: int, b: int): double = + if (a > b) 0 else f(a) + sum(f)(a + 1, b) +\end{lstlisting} +Generally, a curried function definition +\begin{lstlisting} +def f (args$_1$) ... (args$_n$) = E +\end{lstlisting} +where $n > 1$ expands to +\begin{lstlisting} +def f (args$_1$) ... (args$_{n-1}$) = { def g (args$_n$) = E ; g } +\end{lstlisting} +where \code{g} is a fresh identifier. Or, shorter, using an anonymous function: +\begin{lstlisting} +def f (args$_1$) ... (args$_{n-1}$) = ( args$_n$ ) => E . +\end{lstlisting} +Performing this step $n$ times yields that +\begin{lstlisting} +def f (args$_1$) ... (args$_n$) = E +\end{lstlisting} +is equivalent to +\begin{lstlisting} +def f = (args$_1$) => ... => (args$_n$) => E . +\end{lstlisting} +Or, equivalently, using a value definition: +\begin{lstlisting} +val f = (args$_1$) => ... => (args$_n$) => E . +\end{lstlisting} +This style of function definition and application is called {\em +currying} after its promoter, Haskell B.\ Curry, a logician of the +20th century, even though the idea goes back further to Moses +Sch\"onfinkel and Gottlob Frege. + +The type of a function-returning function is expressed analogously to +its parameter list. Taking the last formulation of \code{sum} as an example, +the type of \code{sum} is \code{(int => double) => (int, int) => double}. +This is possible because function types associate to the right. I.e. 
+\begin{lstlisting} +T$_1$ => T$_2$ => T$_3$ $\mbox{is equivalent to}$ T$_1$ => (T$_2$ => T$_3$) +\end{lstlisting} + + +\begin{exercise} +The \code{sum} function uses a linear recursion. Can you write a +tail-recursive one by filling in the ??'s? + +\begin{lstlisting} +def sum(f: int => double)(a: int, b: int): double = { + def iter(a: int, result: double): double = { + if (??) ?? + else iter(??, ??) + } + iter(??, ??) +} +\end{lstlisting} +\end{exercise} + +\begin{exercise} +Write a function \code{product} that computes the product of the +values of functions at points over a given range. +\end{exercise} + +\begin{exercise} +Write \code{factorial} in terms of \code{product}. +\end{exercise} + +\begin{exercise} +Can you write an even more general function which generalizes both +\code{sum} and \code{product}? +\end{exercise} + +\section{Example: Finding Fixed Points of Functions} + +A number \code{x} is called a {\em fixed point} of a function \code{f} if +\begin{lstlisting} +f(x) = x . +\end{lstlisting} +For some functions \code{f} we can locate the fixed point by beginning +with an initial guess and then applying \code{f} repeatedly, until the +value does not change anymore (or the change is within a small +tolerance). This is possible if the sequence +\begin{lstlisting} +x, f(x), f(f(x)), f(f(f(x))), ... +\end{lstlisting} +converges to a fixed point of \code{f}. This idea is captured in +the following ``fixed-point finding function'': +\begin{lstlisting} +val tolerance = 0.0001; +def isCloseEnough(x: double, y: double) = abs((x - y) / x) < tolerance; +def fixedPoint(f: double => double)(firstGuess: double) = { + def iterate(guess: double): double = { + val next = f(guess); + if (isCloseEnough(guess, next)) next + else iterate(next) + } + iterate(firstGuess) +} +\end{lstlisting} +We now apply this idea in a reformulation of the square root function.
+Let's start with a specification of \code{sqrt}: +\begin{lstlisting} +sqrt(x) = $\mbox{the {\sl y} such that}$ y * y = x + = $\mbox{the {\sl y} such that}$ y = x / y +\end{lstlisting} +Hence, \code{sqrt(x)} is a fixed point of the function \code{y => x / y}. +This suggests that \code{sqrt(x)} can be computed by fixed point iteration: +\begin{lstlisting} +def sqrt(x: double) = fixedPoint(y => x / y)(1.0) +\end{lstlisting} +Unfortunately, this does not converge. Let's instrument the fixed point +function with a print statement which keeps track of the current +\code{guess} value: +\begin{lstlisting} +def fixedPoint(f: double => double)(firstGuess: double) = { + def iterate(guess: double): double = { + val next = f(guess); + System.out.println(next); + if (isCloseEnough(guess, next)) next + else iterate(next) + } + iterate(firstGuess) +} +\end{lstlisting} +Then, \code{sqrt(2)} yields: +\begin{lstlisting} + 2.0 + 1.0 + 2.0 + 1.0 + 2.0 + ... +\end{lstlisting} +One way to control such oscillations is to prevent the guess from changing too much. +This can be achieved by {\em averaging} successive values of the original sequence: +\begin{lstlisting} +> def sqrt(x: double) = fixedPoint(y => (y + x/y) / 2)(1.0) +def sqrt(x: scala.Double): scala.Double +> sqrt(2.0) + 1.5 + 1.4166666666666665 + 1.4142156862745097 + 1.4142135623746899 + 1.4142135623746899 +\end{lstlisting} +In fact, expanding the \code{fixedPoint} function yields exactly our +previous definition of \code{sqrt} from Section~\ref{sec:sqrt}. + +The previous examples showed that the expressive power of a language +is considerably enhanced if functions can be passed as arguments. The +next example shows that functions which return functions can also be +very useful. + +Consider again fixed point iterations. We started with the observation +that \code{sqrt(x)} is a fixed point of the function \code{y => x / y}. +Then we made the iteration converge by averaging successive values.
+This technique of {\em average damping} is so general that it +can be wrapped in another function. +\begin{lstlisting} +def averageDamp(f: double => double)(x: double) = (x + f(x)) / 2 +\end{lstlisting} +Using \code{averageDamp}, we can reformulate the square root function +as follows. +\begin{lstlisting} +def sqrt(x: double) = fixedPoint(averageDamp(y => x/y))(1.0) +\end{lstlisting} +This expresses the elements of the algorithm as clearly as possible. + +\begin{exercise} Write a function for cube roots using \code{fixedPoint} and +\code{averageDamp}. +\end{exercise} + +\section{Summary} + +We have seen in the previous chapter that functions are essential +abstractions, because they permit us to introduce general methods of +computing as explicit, named elements in our programming language. +The present chapter has shown that these abstractions can be combined +by higher-order functions to create further abstractions. As +programmers, we should look out for opportunities to abstract and to +reuse. The highest possible level of abstraction is not always the +best, but it is important to know abstraction techniques, so that one +can use abstractions where appropriate. + +\section{Language Elements Seen So Far} + +Chapters~\ref{chap:simple-funs} and \ref{chap:first-class-funs} have +covered Scala's language elements to express expressions and types +consisting of primitive data and functions. The context-free syntax +of these language elements is given below in extended Backus-Naur +form, where `\code{|}' denotes alternatives, \code{[...]} denotes +option (0 or 1 occurrence), and \lstinline@{...}@ denotes repetition +(0 or more occurrences). + +\subsection*{Characters} + +Scala programs are sequences of (Unicode) characters.
We distinguish the +following character sets: +\begin{itemize} +\item +whitespace, such as `\code{ }', tabulator, or newline characters, +\item +letters `\code{a}' to `\code{z}', `\code{A}' to `\code{Z}', +\item +digits `\code{0}' to `\code{9}', +\item +the delimiter characters + +\begin{lstlisting} +. , ; ( ) { } [ ] \ $\mbox{\tt "}$ ' +\end{lstlisting} + +\item +operator characters, such as `\code{#}', `\code{+}', +`\code{:}'. Essentially, these are printable characters which are +in none of the character sets above. +\end{itemize} + +\subsection*{Lexemes:} + +\begin{lstlisting} +ident = letter {letter | digit} + | operator { operator } + | ident '_' ident +literal = $\mbox{``as in Java''}$ +\end{lstlisting} + +Literals are as in Java. They define numbers, characters, strings, or +boolean values. Examples of literals are \code{0}, \code{1.0d10}, \code{'x'}, +\code{"he said \"hi!\""}, or \code{true}. + +Identifiers can be of two forms. They either start with a letter, +which is followed by a (possibly empty) sequence of letters or +digits, or they start with an operator character, which is followed +by a (possibly empty) sequence of operator characters. Both forms of +identifiers may contain underscore characters `\code{_}'. Furthermore, +an underscore character may be followed by either sort of +identifier. Hence, the following are all legal identifiers: +\begin{lstlisting} +x Room10a + -- foldl_: +_vector +\end{lstlisting} +It follows from this rule that subsequent operator-identifiers need to +be separated by whitespace. For instance, the input +\code{x+-y} is parsed as the three-token sequence \code{x}, \code{+-}, +\code{y}. If we want to express the sum of \code{x} with the +negated value of \code{y}, we need to add at least one space, +e.g. \code{x+ -y}. + +The \verb@$@ character is reserved for compiler-generated +identifiers; it should not be used in source programs.
%$ + +The following are reserved words, they may not be used as identifiers: +\begin{lstlisting}[keywordstyle=] +abstract case catch class def +do else extends false final +finally for if import new +null object override package private +protected return sealed super this +trait try true type val +var while with yield +_ : = => <- <: >: # @ +\end{lstlisting} + +\subsection*{Types:} + +\begin{lstlisting} +Type = SimpleType | FunctionType +FunctionType = SimpleType '=>' Type | '(' [Types] ')' '=>' Type +SimpleType = byte | short | char | int | long | double | float | + boolean | unit | String +Types = Type {`,' Type} +\end{lstlisting} + +Types can be: +\begin{itemize} +\item number types \code{byte}, \code{short}, \code{char}, \code{int}, \code{long}, \code{float} and \code{double} (these are as in Java), +\item the type \code{boolean} with values \code{true} and \code{false}, +\item the type \code{unit} with the only value \code{()}, +\item the type \code{String}, +\item function types such as \code{(int, int) => int} or \code{String => Int => String}. +\end{itemize} + +\subsection*{Expressions:} + +\begin{lstlisting} +Expr = InfixExpr | FunctionExpr | if '(' Expr ')' Expr else Expr +InfixExpr = PrefixExpr | InfixExpr Operator InfixExpr +Operator = ident +PrefixExpr = ['+' | '-' | '!' | '~' ] SimpleExpr +SimpleExpr = ident | literal | SimpleExpr '.' 
ident | Block +FunctionExpr = Bindings '=>' Expr +Bindings = ident [':' SimpleType] | '(' [Binding {',' Binding}] ')' +Binding = ident [':' Type] +Block = '{' {Def ';'} Expr '}' +\end{lstlisting} + +Expressions can be: +\begin{itemize} +\item +identifiers such as \code{x}, \code{isGoodEnough}, \code{*}, or \code{+-}, +\item +literals, such as \code{0}, \code{1.0}, or \code{"abc"}, +\item +field and method selections, such as \code{System.out.println}, +\item +function applications, such as \code{sqrt(x)}, +\item +operator applications, such as \code{-x} or \code{y + x}, +\item +conditionals, such as \code{if (x < 0) -x else x}, +\item +blocks, such as \lstinline@{ val x = abs(y) ; x * 2 }@, +\item +anonymous functions, such as \code{x => x + 1} or \code{(x: int, y: int) => x + y}. +\end{itemize} + +\subsection*{Definitions:} + +\begin{lstlisting} +Def = FunDef | ValDef +FunDef = 'def' ident {'(' [Parameters] ')'} [':' Type] '=' Expr +ValDef = 'val' ident [':' Type] '=' Expr +Parameters = Parameter {',' Parameter} +Parameter = ['def'] ident ':' Type +\end{lstlisting} +Definitions can be: +\begin{itemize} +\item +function definitions such as \code{def square(x: int): int = x * x}, +\item +value definitions such as \code{val y = square(2)}. +\end{itemize} + +\chapter{Classes and Objects} +\label{chap:classes} + +Scala does not have a built-in type of rational numbers, but it is +easy to define one, using a class. Here's a possible implementation. 
+ +\begin{lstlisting} +class Rational(n: int, d: int) { + private def gcd(x: int, y: int): int = { + if (x == 0) y + else if (x < 0) gcd(-x, y) + else if (y < 0) -gcd(x, -y) + else gcd(y % x, x); + } + private val g = gcd(n, d); + + val numer: int = n/g; + val denom: int = d/g; + def +(that: Rational) = + new Rational(numer * that.denom + that.numer * denom, + denom * that.denom); + def -(that: Rational) = + new Rational(numer * that.denom - that.numer * denom, + denom * that.denom); + def *(that: Rational) = + new Rational(numer * that.numer, denom * that.denom); + def /(that: Rational) = + new Rational(numer * that.denom, denom * that.numer); +} +\end{lstlisting} +This defines \code{Rational} as a class which takes two constructor +arguments \code{n} and \code{d}, containing the number's numerator and +denominator parts. The class provides fields which return these parts +as well as methods for arithmetic over rational numbers. Each +arithmetic method takes as parameter the right operand of the +operation. The left operand of the operation is always the rational +number of which the method is a member. + +\paragraph{Private members} +The implementation of rational numbers defines a private method +\code{gcd} which computes the greatest common divisor of two +integers, as well as a private field \code{g} which contains the +\code{gcd} of the constructor arguments. These members are inaccessible +outside class \code{Rational}. They are used in the implementation of +the class to eliminate common factors in the constructor arguments in +order to ensure that numerator and denominator are always in +normalized form. + +\paragraph{Creating and Accessing Objects} +As an example of how rational numbers can be used, here's a program +that prints the sum of all numbers $1/i$ where $i$ ranges from 1 to 10.
+\begin{lstlisting} +var i = 1; +var x = new Rational(0, 1); +while (i <= 10) { + x = x + new Rational(1,i); + i = i + 1; +} +System.out.println("" + x.numer + "/" + x.denom); +\end{lstlisting} +The \code{+} operator takes as left operand a string and as right operand a +value of arbitrary type. It returns the result of converting its right +operand to a string and appending it to its left operand. + +\paragraph{Inheritance and Overriding} +Every class in Scala has a superclass which it extends. +\comment{Excepted is +only the root class \code{Object}, which does not have a superclass, +and which is indirectly extended by every other class. } +If a class +does not mention a superclass in its definition, the root type +\code{scala.AnyRef} is implicitly assumed (for Java implementations, +this type is an alias for \code{java.lang.Object}). For instance, class +\code{Rational} could equivalently be defined as +\begin{lstlisting} +class Rational(n: int, d: int) extends AnyRef { + ... // as before +} +\end{lstlisting} +A class inherits all members from its superclass. It may also redefine +(or: {\em override}) some inherited members. For instance, class +\code{java.lang.Object} defines +a method +\code{toString} which returns a representation of the object as a string: +\begin{lstlisting} +class Object { + ... + def toString(): String = ... +} +\end{lstlisting} +The implementation of \code{toString} in \code{Object} +forms a string consisting of the object's class name and a number. It +makes sense to redefine this method for objects that are rational +numbers: +\begin{lstlisting} +class Rational(n: int, d: int) extends AnyRef { + ... // as before + override def toString() = "" + numer + "/" + denom; +} +\end{lstlisting} +Note that, unlike in Java, redefining definitions need to be preceded +by an \code{override} modifier. + +If class $A$ extends class $B$, then objects of type $A$ may be used +wherever objects of type $B$ are expected.
We say in this case that +type $A$ {\em conforms} to type $B$. For instance, \code{Rational} +conforms to \code{AnyRef}, so it is legal to assign a \code{Rational} +value to a variable of type \code{AnyRef}: +\begin{lstlisting} +var x: AnyRef = new Rational(1,2); +\end{lstlisting} + +\paragraph{Parameterless Methods} +%Also unlike in Java, methods in Scala do not necessarily take a +%parameter list. An example is \code{toString}; the method is invoked +%by simply mentioning its name. For instance: +%\begin{lstlisting} +%val r = new Rational(1,2); +%System.out.println(r.toString()); // prints``1/2'' +%\end{lstlisting} +Unlike in Java, methods in Scala do not necessarily take a +parameter list. An example is the \code{square} method below. This +method is invoked by simply mentioning its name. +\begin{lstlisting} +class Rational(n: int, d: int) extends AnyRef { + ... // as before + def square = new Rational(numer*numer, denom*denom); +} +val r = new Rational(3,4); +System.out.println(r.square); // prints ``9/16'' +\end{lstlisting} +That is, parameterless methods are accessed just as value fields such +as \code{numer} are. The difference between values and parameterless +methods lies in their definition. The right-hand side of a value is +evaluated when the object is created, and the value does not change +afterwards. The right-hand side of a parameterless method, on the other +hand, is evaluated each time the method is called. The uniform access +of fields and parameterless methods gives increased flexibility for +the implementer of a class. Often, a field in one version of a class +becomes a computed value in the next version. Uniform access ensures +that clients do not have to be rewritten because of that change. + +\paragraph{Abstract Classes} + +Consider the task of writing a class for sets of integer numbers with +two operations, \code{incl} and \code{contains}.
\code{(s incl x)} +should return a new set which contains the element \code{x} together +with all the elements of set \code{s}. \code{(s contains x)} should +return \code{true} if the set \code{s} contains the element \code{x}, and +should return \code{false} otherwise. The interface of such sets is +given by: +\begin{lstlisting} +abstract class IntSet { + def incl(x: int): IntSet; + def contains(x: int): boolean; +} +\end{lstlisting} +\code{IntSet} is labeled as an \emph{abstract class}. This has two +consequences. First, abstract classes may have {\em deferred} members +which are declared but which do not have an implementation. In our +case, both \code{incl} and \code{contains} are such members. Second, +because an abstract class might have unimplemented members, no objects +of that class may be created using \code{new}. By contrast, an +abstract class may be used as a base class of some other class, which +implements the deferred members. + +\paragraph{Traits} + +Instead of \code{abstract class} one also often uses the keyword +\code{trait} in Scala. A trait is an abstract class with no state, no +constructor arguments, and no side effects during object +initialization. Since \code{IntSet} falls into this category, one can +alternatively define it as a trait: +\begin{lstlisting} +trait IntSet { + def incl(x: int): IntSet; + def contains(x: int): boolean; +} +\end{lstlisting} +A trait corresponds to an interface in Java, except +that a trait can also define implemented methods. + +\paragraph{Implementing Abstract Classes} + +Let's say we plan to implement sets as binary trees. There are two +possible forms of trees: a tree for the empty set, and a tree +consisting of an integer and two subtrees. Here are their +implementations.
+ +\begin{lstlisting} +class EmptySet extends IntSet { + def contains(x: int): boolean = false; + def incl(x: int): IntSet = new NonEmptySet(x, new EmptySet, new EmptySet); +} +\end{lstlisting} + +\begin{lstlisting} +class NonEmptySet(elem:int, left:IntSet, right:IntSet) extends IntSet { + def contains(x: int): boolean = + if (x < elem) left contains x + else if (x > elem) right contains x + else true; + def incl(x: int): IntSet = + if (x < elem) new NonEmptySet(elem, left incl x, right) + else if (x > elem) new NonEmptySet(elem, left, right incl x) + else this; +} +\end{lstlisting} +Both \code{EmptySet} and \code{NonEmptySet} extend class +\code{IntSet}. This implies that types \code{EmptySet} and +\code{NonEmptySet} conform to type \code{IntSet} -- a value of type \code{EmptySet} or \code{NonEmptySet} may be used wherever a value of type \code{IntSet} is required. + +\begin{exercise} Write methods \code{union} and \code{intersection} to form +the union and intersection between two sets. +\end{exercise} + +\begin{exercise} Add a method +\begin{lstlisting} +def excl(x: int) +\end{lstlisting} +to return the given set without the element \code{x}. To accomplish this, +it is useful to also implement a test method +\begin{lstlisting} +def isEmpty: boolean +\end{lstlisting} +for sets. +\end{exercise} + +\paragraph{Dynamic Binding} + +Object-oriented languages (Scala included) use \emph{dynamic dispatch} +for method invocations. That is, the code invoked for a method call +depends on the run-time type of the object which contains the method. +For example, consider the expression \code{s contains 7} where +\code{s} is a value of declared type \code{s: IntSet}. Which code for +\code{contains} is executed depends on the type of value of \code{s} at run-time. +If it is an \code{EmptySet} value, it is the implementation of \code{contains} in class \code{EmptySet} that is executed, and analogously for \code{NonEmptySet} values. 
+This behavior is a direct consequence of our substitution model of evaluation. +For instance, +\begin{lstlisting} + (new EmptySet).contains(7) + +-> $\rewriteby{by replacing {\sl contains} by its body in class {\sl EmptySet}}$ + + false +\end{lstlisting} +Or, +\begin{lstlisting} + new NonEmptySet(7, new EmptySet, new EmptySet).contains(1) + +-> $\rewriteby{by replacing {\sl contains} by its body in class {\sl NonEmptySet}}$ + + if (1 < 7) new EmptySet contains 1 + else if (1 > 7) new EmptySet contains 1 + else true + +-> $\rewriteby{by rewriting the conditional}$ + + new EmptySet contains 1 + +-> $\rewriteby{by replacing {\sl contains} by its body in class {\sl EmptySet}}$ + + false . +\end{lstlisting} + +Dynamic method dispatch is analogous to higher-order function +calls. In both cases, the identity of the code to be executed is known +only at run-time. This similarity is not just superficial. Indeed, +Scala represents every function value as an object (see +Section~\ref{sec:functions}). + + +\paragraph{Objects} + +In the previous implementation of integer sets, empty sets were +expressed with \code{new EmptySet}; so a new object was created every time +an empty set value was required. We could have avoided unnecessary +object creations by defining a value \code{EmptySetVal} once and then using +this value instead of every occurrence of \code{new EmptySet}. For example: +\begin{lstlisting} +val EmptySetVal = new EmptySet; +\end{lstlisting} +One problem with this approach is that a value definition such as the +one above is not a legal top-level definition in Scala; it has to be +part of another class or object. Also, the definition of class +\code{EmptySet} now seems a bit of an overkill -- why define a class of objects, +if we are only interested in a single object of this class? A more +direct approach is to use an {\em object definition}.
Here is +a more streamlined alternative definition of the empty set: +\begin{lstlisting} +object EmptySet extends IntSet { + def contains(x: int): boolean = false; + def incl(x: int): IntSet = new NonEmptySet(x, EmptySet, EmptySet); +} +\end{lstlisting} +The syntax of an object definition follows the syntax of a class +definition; it has an optional extends clause as well as an optional +body. As is the case for classes, the extends clause defines inherited +members of the object whereas the body defines overriding or new +members. However, an object definition defines a single object only; +it is not possible to create other objects with the same structure +using \code{new}. Therefore, object definitions also lack constructor +parameters, which might be present in class definitions. + +Object definitions can appear anywhere in a Scala program, including +at top-level. Since there is no fixed execution order of top-level +entities in Scala, one might ask exactly when the object defined by an +object definition is created and initialized. The answer is that the +object is created the first time one of its members is accessed. This +strategy is called {\em lazy evaluation}. + +\paragraph{Standard Classes} + +\todo{include picture} + +Scala is a pure object-oriented language. This means that every value +in Scala can be regarded as an object. In fact, even primitive types +such as \code{int} or \code{boolean} are not treated specially. They +are defined as type aliases of Scala classes in module \code{Predef}: +\begin{lstlisting} +type boolean = scala.Boolean; +type int = scala.Int; +type long = scala.Long; +... +\end{lstlisting} +For efficiency, the compiler usually represents values of type +\code{scala.Int} by 32 bit integers, values of type +\code{scala.Boolean} by Java's booleans, etc.
But it converts these +specialized representations to objects when required, for instance +when a primitive \code{int} value is passed to a function with a +parameter of type \code{AnyRef}. Hence, the special representation of +primitive values is just an optimization; it does not change the +meaning of a program. + +Here is a specification of class \code{Boolean}. +\begin{lstlisting} +package scala; +trait Boolean { + def && (def x: Boolean): Boolean; + def || (def x: Boolean): Boolean; + def ! : Boolean; + + def == (x: Boolean) : Boolean; + def != (x: Boolean) : Boolean; + def < (x: Boolean) : Boolean; + def > (x: Boolean) : Boolean; + def <= (x: Boolean) : Boolean; + def >= (x: Boolean) : Boolean; +} +\end{lstlisting} +Booleans can be defined using only classes and objects, without +reference to a built-in type of booleans or numbers. A possible +implementation of class \code{Boolean} is given below. This is not +the actual implementation in the standard Scala library. For +efficiency reasons the standard implementation uses built-in +booleans. +\begin{lstlisting} +package scala; +trait Boolean { + def ifThenElse(def thenpart: Boolean, def elsepart: Boolean): Boolean; + + def && (def x: Boolean): Boolean = ifThenElse(x, false); + def || (def x: Boolean): Boolean = ifThenElse(true, x); + def ! : Boolean = ifThenElse(false, true); + + def == (x: Boolean) : Boolean = ifThenElse(x, x.!); + def != (x: Boolean) : Boolean = ifThenElse(x.!, x); + def < (x: Boolean) : Boolean = ifThenElse(false, x); + def > (x: Boolean) : Boolean = ifThenElse(x.!, false); + def <= (x: Boolean) : Boolean = ifThenElse(x, true); + def >= (x: Boolean) : Boolean = ifThenElse(true, x.!); +} +case object True extends Boolean { + def ifThenElse(def t: Boolean, def e: Boolean) = t +} +case object False extends Boolean { + def ifThenElse(def t: Boolean, def e: Boolean) = e +} +\end{lstlisting} +Here is a partial specification of class \code{Int}.
+ +\begin{lstlisting} +package scala; +trait Int extends AnyVal { + def coerce: Long; + def coerce: Float; + def coerce: Double; + + def + (that: Double): Double; + def + (that: Float): Float; + def + (that: Long): Long; + def + (that: Int): Int; // analogous for -, *, /, % + + def << (cnt: Int): Int; // analogous for >>, >>> + + def & (that: Long): Long; + def & (that: Int): Int; // analogous for |, ^ + + def == (that: Double): Boolean; + def == (that: Float): Boolean; + def == (that: Long): Boolean; // analogous for !=, <, >, <=, >= +} +\end{lstlisting} + +Class \code{Int} can in principle also be implemented using just +objects and classes, without reference to a built in type of +integers. To see how, we consider a slightly simpler problem, namely +how to implement a type \code{Nat} of natural (i.e. non-negative) +numbers. Here is the definition of a trait \code{Nat}: +\begin{lstlisting} +trait Nat { + def isZero: Boolean; + def predecessor: Nat; + def successor: Nat; + def + (that: Nat): Nat; + def - (that: Nat): Nat; +} +\end{lstlisting} +To implement the operations of class \code{Nat}, we define a subobject +\code{Zero} and a subclass \code{Succ} (for successor). Each number +\code{N} is represented as \code{N} applications of the \code{Succ} +constructor to \code{Zero}: +\[ +\underbrace{\mbox{\sl new Succ( ... new Succ}}_{\mbox{$N$ times}}\mbox{\sl (Zero) ... )} +\] +The implementation of the \code{Zero} object is straightforward: +\begin{lstlisting} +object Zero extends Nat { + def isZero: Boolean = true; + def predecessor: Nat = throw new Error("negative number"); + def successor: Nat = new Succ(Zero); + def + (that: Nat): Nat = that; + def - (that: Nat): Nat = if (that.isZero) Zero + else throw new Error("negative number") +} +\end{lstlisting} + +The implementation of the predecessor and subtraction functions on +\code{Zero} throws an \code{Error} exception, which aborts the program +with the given error message. 
+ +Here is the implementation of the successor class: +\begin{lstlisting} +class Succ(x: Nat) extends Nat { + def isZero: Boolean = false; + def predecessor: Nat = x; + def successor: Nat = new Succ(this); + def + (that: Nat): Nat = x + that.successor; + def - (that: Nat): Nat = if (that.isZero) this + else x - that.predecessor; +} +\end{lstlisting} +Note the implementation of method \code{successor}. To create the +successor of a number, we need to pass the object itself as an +argument to the \code{Succ} constructor. The object itself is +referenced by the reserved name \code{this}. + +The implementations of \code{+} and \code{-} each contain a recursive +call with the constructor argument as receiver. The recursion will +terminate once the receiver is the \code{Zero} object (which is +guaranteed to happen eventually because of the way numbers are formed). +Subtraction first tests whether its argument is zero, since taking the +predecessor of \code{Zero} would abort the program. + +\begin{exercise} Write an implementation \code{Integer} of integer numbers. +The implementation should support all operations of class \code{Nat} +while adding two methods +\begin{lstlisting} +def isPositive: Boolean +def negate: Integer +\end{lstlisting} +The first method should return \code{true} if the number is positive. The second method should negate the number. +Do not use any of Scala's standard numeric classes in your +implementation. (Hint: There are two possible ways to implement +\code{Integer}. One can either make use of the existing implementation of +\code{Nat}, representing an integer as a natural number and a sign. +Or one can generalize the given implementation of \code{Nat} to +\code{Integer}, using the three subclasses \code{Zero} for 0, +\code{Succ} for positive numbers and \code{Pred} for negative numbers.) +\end{exercise} + + + +\subsection*{Language Elements Introduced In This Chapter} + +\textbf{Types:} +\begin{lstlisting} +Type = ... | ident +\end{lstlisting} + +Types can now be arbitrary identifiers which represent classes. + +\textbf{Expressions:} +\begin{lstlisting} +Expr = ... | Expr '.'
ident | 'new' Expr | 'this' +\end{lstlisting} + +An expression can now be an object creation, or +a selection \code{E.m} of a member \code{m} +from an object-valued expression \code{E}, or it can be the reserved name \code{this}. + +\textbf{Definitions and Declarations:} +\begin{lstlisting} +Def = FunDef | ValDef | ClassDef | TraitDef | ObjectDef +ClassDef = ['abstract'] 'class' ident ['(' [Parameters] ')'] + ['extends' Expr] [`{' {TemplateDef} `}'] +TraitDef = 'trait' ident ['extends' Expr] ['{' {TemplateDef} '}'] +ObjectDef = 'object' ident ['extends' Expr] ['{' {ObjectDef} '}'] +TemplateDef = [Modifier] (Def | Dcl) +ObjectDef = [Modifier] Def +Modifier = 'private' | 'override' +Dcl = FunDcl | ValDcl +FunDcl = 'def' ident {'(' [Parameters] ')'} ':' Type +ValDcl = 'val' ident ':' Type +\end{lstlisting} + +A definition can now be a class, trait or object definition such as +\begin{lstlisting} +class C(params) extends B { defs } +trait T extends B { defs } +object O extends B { defs } +\end{lstlisting} +The definitions \code{defs} in a class, trait or object may be +preceded by modifiers \code{private} or \code{override}. + +Abstract classes and traits may also contain declarations. These +introduce {\em deferred} functions or values with their types, but do +not give an implementation. Deferred members have to be implemented in +subclasses before objects of an abstract class or trait can be created. + +\chapter{Case Classes and Pattern Matching} + +Say, we want to write an interpreter for arithmetic expressions. To +keep things simple initially, we restrict ourselves to just numbers +and \code{+} operations. Such expressions can be represented as a class hierarchy, with an abstract base class \code{Expr} as the root, and two subclasses \code{Number} and +\code{Sum}. 
Then, an expression \code{1 + (3 + 7)} would be represented as +\begin{lstlisting} +new Sum(new Number(1), new Sum(new Number(3), new Number(7))) +\end{lstlisting} +Now, an evaluator of an expression like this needs to know of what +form it is (either \code{Sum} or \code{Number}) and also needs to +access the components of the expression. The following +implementation provides all necessary methods. +\begin{lstlisting} +trait Expr { + def isNumber: boolean; + def isSum: boolean; + def numValue: int; + def leftOp: Expr; + def rightOp: Expr; +} +class Number(n: int) extends Expr { + def isNumber: boolean = true; + def isSum: boolean = false; + def numValue: int = n; + def leftOp: Expr = throw new Error("Number.leftOp"); + def rightOp: Expr = throw new Error("Number.rightOp"); +} +class Sum(e1: Expr, e2: Expr) extends Expr { + def isNumber: boolean = false; + def isSum: boolean = true; + def numValue: int = throw new Error("Sum.numValue"); + def leftOp: Expr = e1; + def rightOp: Expr = e2; +} +\end{lstlisting} +With these classification and access methods, writing an evaluator function is simple: +\begin{lstlisting} +def eval(e: Expr): int = { + if (e.isNumber) e.numValue + else if (e.isSum) eval(e.leftOp) + eval(e.rightOp) + else throw new Error("unrecognized expression kind") +} +\end{lstlisting} +However, defining all these methods in classes \code{Sum} and +\code{Number} is rather tedious. Furthermore, the problem becomes worse +when we want to add new forms of expressions. For instance, consider +adding a new expression form +\code{Prod} for products. Not only do we have to implement a new class \code{Prod}, with all previous classification and access methods; we also have to introduce a +new abstract method \code{isProduct} in class \code{Expr} and +implement that method in subclasses \code{Number}, \code{Sum}, and +\code{Prod}. Having to modify existing code when a system grows is always problematic, since it introduces versioning and maintenance problems. 
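+
+As a hedged sketch of what this extension would involve, the following
+listing adds \code{Prod} to the hierarchy with classification and
+access methods. The wrapper object \code{ProdExample} is a
+hypothetical name introduced only to make the listing self-contained,
+and the capitalized \code{Int} and \code{Boolean} are assumed to be
+interchangeable with \code{int} and \code{boolean}. Comments mark which
+lines are edits to code that already existed.
+\begin{lstlisting}
+object ProdExample {
+  trait Expr {
+    def isNumber: Boolean;
+    def isSum: Boolean;
+    def isProduct: Boolean;             // changed: new abstract method
+    def numValue: Int;
+    def leftOp: Expr;
+    def rightOp: Expr;
+  }
+  class Number(n: Int) extends Expr {
+    def isNumber: Boolean = true;
+    def isSum: Boolean = false;
+    def isProduct: Boolean = false;     // changed: existing class edited
+    def numValue: Int = n;
+    def leftOp: Expr = throw new Error("Number.leftOp");
+    def rightOp: Expr = throw new Error("Number.rightOp");
+  }
+  class Sum(e1: Expr, e2: Expr) extends Expr {
+    def isNumber: Boolean = false;
+    def isSum: Boolean = true;
+    def isProduct: Boolean = false;     // changed: existing class edited
+    def numValue: Int = throw new Error("Sum.numValue");
+    def leftOp: Expr = e1;
+    def rightOp: Expr = e2;
+  }
+  // The new class repeats every classification and access method.
+  class Prod(e1: Expr, e2: Expr) extends Expr {
+    def isNumber: Boolean = false;
+    def isSum: Boolean = false;
+    def isProduct: Boolean = true;
+    def numValue: Int = throw new Error("Prod.numValue");
+    def leftOp: Expr = e1;
+    def rightOp: Expr = e2;
+  }
+  def eval(e: Expr): Int =
+    if (e.isNumber) e.numValue
+    else if (e.isSum) eval(e.leftOp) + eval(e.rightOp)
+    else if (e.isProduct) eval(e.leftOp) * eval(e.rightOp)  // changed
+    else throw new Error("unrecognized expression kind");
+}
+\end{lstlisting}
+Besides the new class itself, the trait, both existing subclasses and
+the evaluator all had to be edited to accommodate one new form of
+expression.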
+ +The promise of object-oriented programming is that such modifications +should be unnecessary, because they can be avoided by re-using +existing, unmodified code through inheritance. Indeed, a more +object-oriented decomposition of our problem solves the problem. The +idea is to make the ``high-level'' operation \code{eval} a method of +each expression class, instead of implementing it as a function +outside the expression class hierarchy, as we have done +before. Because \code{eval} is now a member of all expression nodes, +all classification and access methods become superfluous, and the implementation is simplified considerably: +\begin{lstlisting} +trait Expr { + def eval: int; +} +class Number(n: int) extends Expr { + def eval: int = n; +} +class Sum(e1: Expr, e2: Expr) extends Expr { + def eval: int = e1.eval + e2.eval; +} +\end{lstlisting} +Furthermore, adding a new \code{Prod} class does not entail any changes to existing code: +\begin{lstlisting} +class Prod(e1: Expr, e2: Expr) extends Expr { + def eval: int = e1.eval * e2.eval; +} +\end{lstlisting} + +The conclusion we can draw from this example is that object-oriented +decomposition is the technique of choice for constructing systems that +should be extensible with new types of data. But there is also another +possible way we might want to extend the expression example. We might +want to add new {\em operations} on expressions. For instance, we might +want to add an operation that pretty-prints an expression tree to standard output. + +If we have defined all classification and access methods, such an +operation can easily be written as an external function. 
Here is an +implementation: +\begin{lstlisting} +def print(e: Expr): unit = + if (e.isNumber) System.out.print(e.numValue) + else if (e.isSum) { + System.out.print("("); + print(e.leftOp); + System.out.print("+"); + print(e.rightOp); + System.out.print(")"); + } else throw new Error("unrecognized expression kind"); +\end{lstlisting} +However, if we had opted for an object-oriented decomposition of +expressions, we would need to add a new \code{print} method +to each class: +\begin{lstlisting} +trait Expr { + def eval: int; + def print: unit; +} +class Number(n: int) extends Expr { + def eval: int = n; + def print: unit = System.out.print(n); +} +class Sum(e1: Expr, e2: Expr) extends Expr { + def eval: int = e1.eval + e2.eval; + def print: unit = { + System.out.print("("); + e1.print; + System.out.print("+"); + e2.print; + System.out.print(")"); + } +} +\end{lstlisting} +Hence, classical object-oriented decomposition requires modification +of all existing classes when a system is extended with new operations. + +As yet another way we might want to extend the interpreter, consider +expression simplification. For instance, we might want to write a +function which rewrites expressions of the form +\code{a * b + a * c} to \code{a * (b + c)}. This operation requires inspection of +more than a single node of the expression tree at the same +time. Hence, it cannot be implemented by a method in each expression +kind, unless that method can also inspect other nodes. So we are +forced to have classification and access methods in this case. This +seems to bring us back to square one, with all the problems of +verbosity and extensibility. + +Taking a closer look, one observes that the only purpose of the +classification and access functions is to {\em reverse} the data +construction process. They let us determine, first, which subclass +of an abstract base class was used and, second, what the +constructor arguments were.
Since this situation is quite common, Scala has +a way to automate it with case classes. + +\section{Case Classes and Case Objects} + +{\em Case classes} and {\em case objects} are defined like normal +classes or objects, except that the definition is prefixed with the modifier +\code{case}. For instance, the definitions +\begin{lstlisting} +trait Expr; +case class Number(n: int) extends Expr; +case class Sum(e1: Expr, e2: Expr) extends Expr; +\end{lstlisting} +introduce \code{Number} and \code{Sum} as case classes. +The \code{case} modifier in front of a class or object +definition has the following effects. +\begin{enumerate} +\item Case classes implicitly come with a constructor function, with the same name as the class. In our example, the two functions +\begin{lstlisting} +def Number(n: int) = new Number(n); +def Sum(e1: Expr, e2: Expr) = new Sum(e1, e2); +\end{lstlisting} +would be added. Hence, one can now construct expression trees a bit more concisely, as in +\begin{lstlisting} +Sum(Sum(Number(1), Number(2)), Number(3)) +\end{lstlisting} +\item Case classes and case objects +implicitly come with implementations of methods +\code{toString}, \code{equals} and \code{hashCode}, which override the +methods with the same name in class \code{AnyRef}. The implementation +of these methods takes in each case the structure of a member of a +case class into account. The \code{toString} method represents an +expression tree the way it was constructed. So, +\begin{lstlisting} +Sum(Sum(Number(1), Number(2)), Number(3)) +\end{lstlisting} +would be converted to exactly that string, whereas the default +implementation in class \code{AnyRef} would return a string consisting +of the outermost constructor name \code{Sum} and a number. The +\code{equals} method treats two members of a case class as equal +if they have been constructed with the same constructor and with +arguments which are themselves pairwise equal.
This also affects the +implementation of \code{==} and \code{!=}, which are implemented in +terms of \code{equals} in Scala. So, +\begin{lstlisting} +Sum(Number(1), Number(2)) == Sum(Number(1), Number(2)) +\end{lstlisting} +will yield \code{true}. If \code{Sum} or \code{Number} were not case +classes, the same expression would be \code{false}, since the standard +implementation of \code{equals} in class \code{AnyRef} always treats +objects created by different constructor calls as being different. +The \code{hashCode} method follows the same principle as the other two +methods. It computes a hash code from the case class constructor name +and the hash codes of the constructor arguments, instead of from the object's +address, which is what the default implementation of \code{hashCode} does. +\item +Case classes implicitly come with nullary accessor methods which +retrieve the constructor arguments. +In our example, \code{Number} would obtain an accessor method +\begin{lstlisting} +def n: int +\end{lstlisting} +which returns the constructor parameter \code{n}, whereas \code{Sum} would obtain two accessor methods +\begin{lstlisting} +def e1: Expr, e2: Expr; +\end{lstlisting} +Hence, for a value \code{s} of type \code{Sum}, say, one can now +write \code{s.e1} to access the left operand. However, for a value +\code{e} of type \code{Expr}, the term \code{e.e1} would be illegal +since \code{e1} is defined in \code{Sum}; it is not a member of the +base class \code{Expr}. +So, how do we determine the constructor and access constructor +arguments for values whose static type is the base class \code{Expr}? +This is solved by the fourth and final particularity of case classes. +\item +Case classes allow the construction of {\em patterns} which refer to +the case class constructor. +\end{enumerate} + +\section{Pattern Matching} + +Pattern matching is a generalization of C or Java's \code{switch} +statement to class hierarchies.
Instead of a \code{switch} statement, +there is a standard method \code{match}, which is defined in Scala's +root class \code{Any}, and therefore is available for all objects. +The \code{match} method takes as argument a number of cases. +For instance, here is an implementation of \code{eval} using +pattern matching. +\begin{lstlisting} +def eval(e: Expr): int = e match { + case Number(n) => n + case Sum(l, r) => eval(l) + eval(r) +} +\end{lstlisting} +In this example, there are two cases. Each case associates a pattern +with an expression. Patterns are matched against the selector +value \code{e}. The first pattern in our example, +\code{Number(n)}, matches all values of the form \code{Number(v)}, +where \code{v} is an arbitrary value. In that case, the {\em pattern +variable} \code{n} is bound to the value \code{v}. Similarly, the +pattern \code{Sum(l, r)} matches all selector values of form +\code{Sum(v}$_1$\code{, v}$_2$\code{)} and binds the pattern variables +\code{l} and \code{r} +to \code{v}$_1$ and \code{v}$_2$, respectively. + +In general, patterns are built from +\begin{itemize} +\item Case class constructors, e.g. \code{Number}, \code{Sum}, whose arguments + are again patterns, +\item pattern variables, e.g. \code{n}, \code{e1}, \code{e2}, +\item the ``wildcard'' pattern \code{_}, +\item literals, e.g. \code{1}, \code{true}, \code{"abc"}, +\item constant identifiers, e.g. \code{MAXINT}, \code{EmptySet}. +\end{itemize} +Pattern variables always start with a lower-case letter, so that they +can be distinguished from constant identifiers, which start with an +upper-case letter. Each variable name may occur only once in a +pattern. For instance, \code{Sum(x, x)} would be illegal as a pattern, +since the pattern variable \code{x} occurs twice in it. + +\paragraph{Meaning of Pattern Matching} +A pattern matching expression +\begin{lstlisting} +e.match { case p$_1$ => e$_1$ ...
case p$_n$ => e$_n$ } +\end{lstlisting} +matches the patterns $p_1 \commadots p_n$ in the order they +are written against the selector value \code{e}. +\begin{itemize} +\item +A constructor pattern $C(p_1 \commadots p_n)$ matches all values that +are of type \code{C} (or a subtype thereof) and that have been constructed with +\code{C}-arguments matching patterns $p_1 \commadots p_n$. +\item +A variable pattern \code{x} matches any value and binds the variable +name to that value. +\item +The wildcard pattern `\code{_}' matches any value but does not bind a name to that value. +\item A constant pattern \code{C} matches a value which is +equal (in terms of \code{==}) to \code{C}. +\end{itemize} +The pattern matching expression rewrites to the right-hand-side of the +first case whose pattern matches the selector value. References to +pattern variables are replaced by corresponding constructor arguments. +If none of the patterns matches, the pattern matching expression is +aborted with a \code{MatchError} exception. 
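+
+The rules above can be illustrated with a small sketch. The names
+\code{MatchExample}, \code{describe} and \code{onlyNumber} are
+hypothetical and introduced only for this illustration; the case
+classes repeat the earlier definitions so that the listing is
+self-contained, and the capitalized \code{Int} is assumed to be
+interchangeable with \code{int}.
+\begin{lstlisting}
+object MatchExample {
+  trait Expr;
+  case class Number(n: Int) extends Expr;
+  case class Sum(e1: Expr, e2: Expr) extends Expr;
+
+  // Cases are tried in the order written, so the more specific
+  // patterns have to come before the more general ones.
+  def describe(e: Expr): String = e match {
+    case Number(0)         => "zero"            // literal inside a constructor pattern
+    case Number(_)         => "nonzero number"  // wildcard: matches, binds no name
+    case Sum(Number(0), r) => "zero plus " + describe(r)  // r is bound to the right operand
+    case Sum(_, _)         => "sum"
+  }
+
+  // No case covers Sum, so applying this function to a Sum value
+  // aborts with a MatchError.
+  def onlyNumber(e: Expr): Int = e match {
+    case Number(n) => n
+  }
+}
+\end{lstlisting}
+With these definitions, \code{describe(Sum(Number(0), Number(5)))}
+skips the two \code{Number} cases, matches the third case and yields
+\code{"zero plus nonzero number"}. If the last two cases of
+\code{describe} were swapped, the third could never be selected.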
+ +\example Our substitution model of program evaluation extends quite naturally to pattern matching. For instance, here is how \code{eval} applied to a simple expression is re-written: +\begin{lstlisting} + eval(Sum(Number(1), Number(2))) + +-> $\mbox{\tab\tab\rm(by rewriting the application)}$ + + Sum(Number(1), Number(2)) match { + case Number(n) => n + case Sum(e1, e2) => eval(e1) + eval(e2) + } + +-> $\mbox{\tab\tab\rm(by rewriting the pattern match)}$ + + eval(Number(1)) + eval(Number(2)) + +-> $\mbox{\tab\tab\rm(by rewriting the first application)}$ + + Number(1) match { + case Number(n) => n + case Sum(e1, e2) => eval(e1) + eval(e2) + } + eval(Number(2)) + +-> $\mbox{\tab\tab\rm(by rewriting the pattern match)}$ + + 1 + eval(Number(2)) + +->$^*$ 1 + 2 -> 3 +\end{lstlisting} + +\paragraph{Pattern Matching and Methods} +In the previous example, we have used pattern +matching in a function which was defined outside the class hierarchy +over which it matches. Of course, it is also possible to define a +pattern matching function in that class hierarchy itself. For +instance, we could have defined +\code{eval} as a method of the base class \code{Expr}, and still have used pattern matching in its implementation: +\begin{lstlisting} +trait Expr { + def eval: int = this match { + case Number(n) => n + case Sum(e1, e2) => e1.eval + e2.eval + } +} +\end{lstlisting} + +\begin{exercise} Consider the following definitions representing trees +of integers. These definitions can be seen as an alternative +representation of \code{IntSet}: +\begin{lstlisting} +trait IntTree; +case object EmptyTree extends IntTree; +case class Node(elem: int, left: IntTree, right: IntTree) extends IntTree; +\end{lstlisting} +Complete the following implementations of functions \code{contains} and \code{insert} for +\code{IntTree}s. +\begin{lstlisting} +def contains(t: IntTree, v: int): boolean = t match { ... + ... +} +def insert(t: IntTree, v: int): IntTree = t match { ... + ...
+} +\end{lstlisting} +\end{exercise} + +\paragraph{Pattern Matching Anonymous Functions} + +So far, case-expressions always appeared in conjunction with a +\verb@match@ operation. But it is also possible to use +case-expressions by themselves. A block of case-expressions such as +\begin{lstlisting} +{ case $P_1$ => $E_1$ ... case $P_n$ => $E_n$ } +\end{lstlisting} +is seen by itself as a function which matches its arguments +against the patterns $P_1 \commadots P_n$, and produces the result of +one of $E_1 \commadots E_n$. (If no pattern matches, the function +would throw a \code{MatchError} exception instead.) +In other words, the expression above is seen as a shorthand for the anonymous function +\begin{lstlisting} +(x => x match { case $P_1$ => $E_1$ ... case $P_n$ => $E_n$ }) +\end{lstlisting} +where \code{x} is a fresh variable which is not used +otherwise in the expression. + +\chapter{Generic Types and Methods} + +Classes in Scala can have type parameters. We demonstrate the use of +type parameters with functional stacks as an example. Say, we want to +write a data type of stacks of integers, with methods \code{push}, +\code{top}, \code{pop}, and \code{isEmpty}. This is achieved by the +following class hierarchy: +\begin{lstlisting} +trait IntStack { + def push(x: int): IntStack = new IntNonEmptyStack(x, this); + def isEmpty: boolean; + def top: int; + def pop: IntStack; +} +class IntEmptyStack extends IntStack { + def isEmpty = true; + def top = throw new Error("EmptyStack.top"); + def pop = throw new Error("EmptyStack.pop"); +} +class IntNonEmptyStack(elem: int, rest: IntStack) extends IntStack { + def isEmpty = false; + def top = elem; + def pop = rest; +} +\end{lstlisting} +Of course, it would also make sense to define an abstraction for a +stack of Strings. To do that, one could take the existing abstraction +for \code{IntStack}, rename it to \code{StringStack} and at the same +time rename all occurrences of type \code{int} to \code{String}.
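+
+Carried out mechanically, that renaming might produce something like
+the following sketch. The names \code{StringStackExample},
+\code{StringEmptyStack} and \code{StringNonEmptyStack} are
+hypothetical, and result types are written out explicitly here, which
+is an assumption made for clarity rather than something the integer
+version requires.
+\begin{lstlisting}
+object StringStackExample {
+  // Identical to the integer stack apart from the element type.
+  trait StringStack {
+    def push(x: String): StringStack = new StringNonEmptyStack(x, this);
+    def isEmpty: Boolean;
+    def top: String;
+    def pop: StringStack;
+  }
+  class StringEmptyStack extends StringStack {
+    def isEmpty: Boolean = true;
+    def top: String = throw new Error("EmptyStack.top");
+    def pop: StringStack = throw new Error("EmptyStack.pop");
+  }
+  class StringNonEmptyStack(elem: String, rest: StringStack)
+      extends StringStack {
+    def isEmpty: Boolean = false;
+    def top: String = elem;
+    def pop: StringStack = rest;
+  }
+}
+\end{lstlisting}
+Every line of logic is repeated; only the type names differ. Any later
+correction to one stack would have to be applied by hand to the other.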
+ +A better way, which does not entail code duplication, is to +parameterize the stack definitions with the element type. +Parameterization lets us generalize from a specific instance of a +problem to a more general one. So far, we have used parameterization +only for values, but it is available also for types. To arrive at a +{\em generic} version of \code{Stack}, we equip it with a type +parameter. +\begin{lstlisting} +trait Stack[a] { + def push(x: a): Stack[a] = new NonEmptyStack[a](x, this); + def isEmpty: boolean; + def top: a; + def pop: Stack[a]; +} +class EmptyStack[a] extends Stack[a] { + def isEmpty = true; + def top = throw new Error("EmptyStack.top"); + def pop = throw new Error("EmptyStack.pop"); +} +class NonEmptyStack[a](elem: a, rest: Stack[a]) extends Stack[a] { + def isEmpty = false; + def top = elem; + def pop = rest; +} +\end{lstlisting} +In the definitions above, `\code{a}' is a {\em type parameter} of +class \code{Stack} and its subclasses. Type parameters are arbitrary +names; they are enclosed in brackets instead of parentheses, so that +they can be easily distinguished from value parameters. Here is an +example of how the generic classes are used: +\begin{lstlisting} +val x = new EmptyStack[int]; +val y = x.push(1).push(2); +System.out.println(y.pop.top); +\end{lstlisting} +The first line creates a new empty stack of \code{int}'s. Note the +actual type argument \code{[int]} which replaces the formal type +parameter \code{a}. + +It is also possible to parameterize methods with types. As an example, +here is a generic method which determines whether one stack is a +prefix of another. +\begin{lstlisting} +def isPrefix[a](p: Stack[a], s: Stack[a]): boolean = { + p.isEmpty || + p.top == s.top && isPrefix[a](p.pop, s.pop); +} +\end{lstlisting} +Generic methods such as \code{isPrefix} are also +called {\em polymorphic}. The term comes from the Greek, where it +means ``having many forms''.
To apply a polymorphic method such as +\code{isPrefix}, we pass type parameters as well as value parameters +to it. For instance, +\begin{lstlisting} +val s1 = new EmptyStack[String].push("abc"); +val s2 = new EmptyStack[String].push("abx").push(s1.top); +System.out.println(isPrefix[String](s1, s2)); +\end{lstlisting} + +\paragraph{Local Type Inference} +Passing type parameters such as \code{[int]} or \code{[String]} all +the time can become tedious in applications where generic functions +are used a lot. Quite often, the information in a type parameter is +redundant, because the correct parameter type can also be determined +by inspecting the function's value parameters or expected result type. +Taking the expression \code{isPrefix[String](s1, s2)} as an +example, we know that its value parameters are both of type +\code{Stack[String]}, so we can deduce that the type parameter must +be \code{String}. Scala has a fairly powerful type inferencer which +allows one to omit type parameters to polymorphic functions and +constructors in situations like these. In the example above, one +could have written \code{isPrefix(s1, s2)} and the missing type argument +\code{[String]} would have been inserted by the type inferencer. + +\section{Type Parameter Bounds} + +Now that we know how to make classes generic it is natural to +generalize some of the earlier classes we have written. For instance +class \code{IntSet} could be generalized to sets with arbitrary +element types. Let's try. The trait for generic sets is easily +written. +\begin{lstlisting} +trait Set[a] { + def incl(x: a): Set[a]; + def contains(x: a): boolean; +} +\end{lstlisting} +However, if we still want to implement sets as binary search trees, we +encounter a problem. The \code{contains} and \code{incl} methods both +compare elements using methods \code{<} and \code{>}. For +\code{IntSet} this was OK, since type \code{int} has these two +methods.
But for an arbitrary type parameter \code{a}, we cannot +guarantee this. Therefore, the previous implementation of, say, +\code{contains} would generate a compiler error. +\begin{lstlisting} + def contains(x: a): boolean = + if (x < elem) left contains x + ^ < $\mbox{\sl not a member of type}$ a. +\end{lstlisting} +One way to solve the problem is to restrict the legal types that can +be substituted for type \code{a} to only those types that contain methods +\code{<} and \code{>} of the correct types. There is a trait +\code{Ord[a]} in the standard Scala class library which represents +values which are comparable (via \code{<} and \code{>}) to values of +type \code{a}. We can enforce the comparability of a type by demanding +that the type is a subtype of \code{Ord}. This is done by giving an +upper bound to the type parameter of \code{Set}: +\begin{lstlisting} +trait Set[a <: Ord[a]] { + def incl(x: a): Set[a]; + def contains(x: a): boolean; +} +\end{lstlisting} +The parameter declaration \code{a <: Ord[a]} introduces \code{a} as a +type parameter which must be a subtype of \code{Ord[a]}, i.e.\ its values +must be comparable to values of the same type. + +With this restriction, we can now implement the rest of the generic +set abstraction as we did in the case of \code{IntSet}s before.
+ +\begin{lstlisting} +class EmptySet[a <: Ord[a]] extends Set[a] { + def contains(x: a): boolean = false; + def incl(x: a): Set[a] = new NonEmptySet(x, new EmptySet[a], new EmptySet[a]); +} +\end{lstlisting} + +\begin{lstlisting} +class NonEmptySet[a <: Ord[a]] + (elem: a, left: Set[a], right: Set[a]) extends Set[a] { + def contains(x: a): boolean = + if (x < elem) left contains x + else if (x > elem) right contains x + else true; + def incl(x: a): Set[a] = + if (x < elem) new NonEmptySet(elem, left incl x, right) + else if (x > elem) new NonEmptySet(elem, left, right incl x) + else this; +} +\end{lstlisting} +Note that we have left out the type argument in the object creations +\code{new NonEmptySet(...)}. In the same way as for polymorphic methods, +missing type arguments in constructor calls are inferred from value +arguments and/or the expected result type. + +Here is an example that uses the generic set abstraction. +\begin{lstlisting} +val s = new EmptySet[double].incl(1.0).incl(2.0); +s.contains(1.5) +\end{lstlisting} +This is OK, as type \code{double} implements trait \code{Ord[double]}. +However, the following example is in error. +\begin{lstlisting} +val s = new EmptySet[java.io.File] + ^ java.io.File $\mbox{\sl does not conform to type}$ + $\mbox{\sl parameter bound}$ Ord[java.io.File]. +\end{lstlisting} +To conclude the discussion of type parameter +bounds, here is the definition of trait \code{Ord} in Scala. +\begin{lstlisting} +package scala; +trait Ord[t <: Ord[t]]: t { + def < (that: t): Boolean; + def <=(that: t): Boolean = this < that || this == that; + def > (that: t): Boolean = that < this; + def >=(that: t): Boolean = that <= this; +} +\end{lstlisting} + +\section{Variance Annotations}\label{sec:first-arrays} + +The combination of type parameters and subtyping poses some +interesting questions. For instance, should \code{Stack[String]} be a +subtype of \code{Stack[AnyRef]}?
Intuitively, this seems OK, since a +stack of \code{String}s is a special case of a stack of +\code{AnyRef}s. More generally, if \code{T} is a subtype of type \code{S} +then \code{Stack[T]} should be a subtype of \code{Stack[S]}. +This property is called {\em co-variant} subtyping. + +In Scala, generic types have by default non-variant subtyping. That +is, with \code{Stack} defined as above, stacks with different element +types would never be in a subtype relation. However, we can enforce +co-variant subtyping of stacks by changing the first line of the +definition of class \code{Stack} as follows. +\begin{lstlisting} +class Stack[+a] { +\end{lstlisting} +Prefixing a formal type parameter with a \code{+} indicates that +subtyping is covariant in that parameter. +Besides \code{+}, there is also a prefix \code{-} which indicates +contra-variant subtyping. If \code{Stack} were defined as \code{class +Stack[-a] ...}, then \code{T} being a subtype of type \code{S} would imply +that \code{Stack[S]} is a subtype of \code{Stack[T]} (which in the +case of stacks would be rather surprising!). + +In a purely functional world, all types could be co-variant. However, +the situation changes once we introduce mutable data. Consider the +case of arrays in Java or .NET. Such arrays are represented in Scala +by a generic class \code{Array}. Here is a partial definition of this +class. +\begin{lstlisting} +class Array[a] { + def apply(index: int): a; + def update(index: int, elem: a): unit; +} +\end{lstlisting} +The class above defines the way Scala arrays are seen from Scala user +programs. The Scala compiler will map this abstraction to the +underlying arrays of the host system in most cases where this is +possible. + +In Java, arrays are indeed covariant; that is, for reference types +\code{T} and \code{S}, if \code{T} is a subtype of \code{S}, then +\code{Array[T]} is also a subtype of \code{Array[S]}. This might seem +natural but leads to safety problems that require special runtime +checks.
Here is an example: +\begin{lstlisting} +val x = new Array[String](1); +val y: Array[Any] = x; +y(0) = new Rational(1, 2); // this is syntactic sugar for + // y.update(0, new Rational(1, 2)); +\end{lstlisting} +In the first line, a new array of strings is created. In the second +line, this array is bound to a variable \code{y}, of type +\code{Array[Any]}. Assuming arrays are covariant, this is OK, since +\code{Array[String]} is a subtype of \code{Array[Any]}. Finally, in +the last line a rational number is stored in the array. This is also +OK, since type \code{Rational} is a subtype of the element type +\code{Any} of the array \code{y}. We thus end up storing a rational +number in an array of strings, which clearly violates type soundness. + +Java solves this problem by introducing a run-time check in the third +line which tests whether the stored element is compatible with the +element type with which the array was created. We have seen in the +example that this element type is not necessarily the static element +type of the array being updated. If the test fails, an +\code{ArrayStoreException} is raised. + +Scala solves this problem instead statically, by disallowing the +second line at compile-time, because arrays in Scala have non-variant +subtyping. This raises the question how a Scala compiler verifies that +variance annotations are correct. If we had simply declared arrays +co-variant, how would the potential problem have been detected? + +Scala uses a conservative approximation to verify soundness of +variance annotations. A covariant type parameter of a class may only +appear in co-variant positions inside the class. Among the co-variant +positions are the types of values in the class, the result types of +methods in the class, and type arguments to other covariant types. Not +co-variant are types of formal method parameters. 
Hence, the following +class definition would have been rejected: +\begin{lstlisting} +class Array[+a] { + def apply(index: int): a; + def update(index: int, elem: a): unit; + ^ $\mbox{\sl covariant type parameter}$ a + $\mbox{\sl appears in contravariant position.}$ +} +\end{lstlisting} +So far, so good. Intuitively, the compiler was correct in rejecting +the \code{update} method in a co-variant class because \code{update} +potentially changes state, and therefore undermines the soundness of +co-variant subtyping. + +However, there are also methods which do not mutate state, but where a +type parameter still appears contra-variantly. An example is +\code{push} in type \code{Stack}. Again the Scala compiler will reject +the definition of this method for co-variant stacks. +\begin{lstlisting} +class Stack[+a] { + def push(x: a): Stack[a] = + ^ $\mbox{\sl covariant type parameter}$ a + $\mbox{\sl appears in contravariant position.}$ +\end{lstlisting} +This is a pity, because, unlike arrays, stacks are purely functional data +structures and therefore should enable co-variant subtyping. However, +there is a way to solve the problem by using a polymorphic method +with a lower type parameter bound. + +\section{Lower Bounds} + +We have seen upper bounds for type parameters. In a type parameter +declaration such as \code{t <: U}, the type parameter \code{t} is +restricted to range only over subtypes of type \code{U}. Symmetrical +to this are lower bounds in Scala. In a type parameter declaration +\code{t >: L}, the type parameter \code{t} is restricted to range only +over {\em supertypes} of type \code{L}. (One can also combine lower and +upper bounds, as in \code{t >: L <: U}.) + +Using lower bounds, we can generalize the \code{push} method in +\code{Stack} as follows. 
+\begin{lstlisting} +class Stack[+a] { + def push[b >: a](x: b): Stack[b] = new NonEmptyStack(x, this); +\end{lstlisting} +Technically, this solves our variance problem since now the type +parameter \code{a} appears no longer as a parameter type of method +\code{push}. Instead, it appears as lower bound for another type +parameter of a method, which is classified as a co-variant position. +Hence, the Scala compiler accepts the new definition of \code{push}. + +In fact, we have not only solved the technical variance problem but +also have generalized the definition of \code{push}. Before, we were +required to push only elements with types that conform to the declared +element type of the stack. Now, we can push also elements of a +supertype of this type, but the type of the returned stack will change +accordingly. For instance, we can now push an \code{AnyRef} onto a +stack of \code{String}s, but the resulting stack will be a stack of +\code{AnyRef}s instead of a stack of \code{String}s! + +In summary, one should not hesitate to add variance annotations to +data structures, as this yields rich natural subtyping +relationships. The compiler will detect potential soundness +problems. Even if the compiler's approximation is too conservative, as +in the case of method \code{push} of class \code{Stack}, this will +often suggest a useful generalization of the contested method. + +\section{Least Types} + +Scala does not allow one to parameterize objects with types. That's +why we originally defined a generic class \code{EmptyStack[a]}, even +though a single value denoting empty stacks of arbitrary type would +do. For co-variant stacks, however, one can use the following idiom: +\begin{lstlisting} +object EmptyStack extends Stack[All] { ... } +\end{lstlisting} +The identifier \code{All} refers to the bottom type \code{scala.All}, +which is a subtype of all other types. 
Hence, for co-variant stacks, +\code{Stack[All]} is a subtype of \code{Stack[T]}, for any other type +\code{T}. This makes it possible to use a single empty stack object +in user code. For instance: +\begin{lstlisting} +val s = EmptyStack.push("abc").push(new AnyRef()); +\end{lstlisting} +Let's analyze the type assignment for this expression in detail. The +\code{EmptyStack} object is of type \code{Stack[All]}, which has a +method +\begin{lstlisting} +push[b >: All](elem: b): Stack[b] . +\end{lstlisting} +Local type inference will determine that the type parameter \code{b} +should be instantiated to \code{String} in the application +\code{EmptyStack.push("abc")}. The result type of that application is hence +\code{Stack[String]}, which in turn has a method +\begin{lstlisting} +push[b >: String](elem: b): Stack[b] . +\end{lstlisting} +The final part of the value definition above is the application of +this method to \code{new AnyRef()}. Local type inference will +determine that the type parameter \code{b} should this time be +instantiated to \code{AnyRef}, with result type \code{Stack[AnyRef]}. +Hence, the type assigned to value \code{s} is \code{Stack[AnyRef]}. + +Besides \code{scala.All}, which is a subtype of every other type, +there is also the type \code{scala.AllRef}, which is a subtype of +\code{scala.AnyRef}, and every type derived from it. The \code{null} +literal in Scala is of that type. This makes \code{null} compatible +with every reference type, but not with a value type such as +\code{int}. + +We conclude this section with the complete improved definition of +stacks. Stacks have now co-variant subtyping, the \code{push} method +has been generalized, and the empty stack is represented by a single +object. 
+\begin{lstlisting} +trait Stack[+a] { + def push[b >: a](x: b): Stack[b] = new NonEmptyStack(x, this); + def isEmpty: boolean; + def top: a; + def pop: Stack[a]; +} +object EmptyStack extends Stack[All] { + def isEmpty = true; + def top = throw new Error("EmptyStack.top"); + def pop = throw new Error("EmptyStack.pop"); +} +class NonEmptyStack[+a](elem: a, rest: Stack[a]) extends Stack[a] { + def isEmpty = false; + def top = elem; + def pop = rest; +} +\end{lstlisting} +Many classes in the Scala library are generic. We now present two +commonly used families of generic classes, tuples and functions. The +discussion of another common class, lists, is deferred to the next +chapter. + +\section{Tuples} + +Sometimes, a function needs to return more than one result. For +instance, take the function \code{divmod} which returns the integer quotient +and rest of two given integer arguments. Of course, one can define a +class to hold the two results of \code{divmod}, as in: +\begin{lstlisting} +case class TwoInts(first: int, second: int); +def divmod(x: int, y: int): TwoInts = new TwoInts(x / y, x % y) +\end{lstlisting} +However, having to define a new class for every possible pair of +result types is very tedious. In Scala one can instead use +the generic classes \lstinline@Tuple$n$@, for each $n$ between +2 and 9. As an example, here is the definition of \code{Tuple2}. +\begin{lstlisting} +package scala; +case class Tuple2[a, b](_1: a, _2: b); +\end{lstlisting} +With \code{Tuple2}, the \code{divmod} method can be written as follows. +\begin{lstlisting} +def divmod(x: int, y: int) = new Tuple2[int, int](x / y, x % y) +\end{lstlisting} +As usual, type parameters to constructors can be omitted if they are +deducible from value arguments. Also, Scala defines an alias +\code{Pair} for \code{Tuple2} (as well as \code{Triple} for \code{Tuple3}). +With these conventions, \code{divmod} can equivalently be written as +follows. 
+\begin{lstlisting} +def divmod(x: int, y: int) = Pair(x / y, x % y) +\end{lstlisting} +How are elements of tuples accessed? Since tuples are case classes, +there are two possibilities. One can either access a tuple's fields +using the names of the constructor parameters \lstinline@_$i$@, as in the following example: +\begin{lstlisting} +val xy = divmod(x, y); +System.out.println("quotient: " + xy._1 + ", rest: " + xy._2); +\end{lstlisting} +Or one uses pattern matching on tuples, as in the following example: +\begin{lstlisting} +divmod(x, y) match { + case Pair(n, d) => + System.out.println("quotient: " + n + ", rest: " + d); +} +\end{lstlisting} +Note that type parameters are never used in patterns; it would have +been illegal to write case \code{Pair[int, int](n, d)}. + +\section{Functions}\label{sec:functions} + +Scala is a functional language in that functions are first-class +values. Scala is also an object-oriented language in that every value +is an object. It follows that functions are objects in Scala. For +instance, a function from type \code{String} to type \code{int} is +represented as an instance of the trait \code{Function1[String, int]}. +The \code{Function1} trait is defined as follows. +\begin{lstlisting} +package scala; +trait Function1[-a, +b] { + def apply(x: a): b +} +\end{lstlisting} +Besides \code{Function1}, there are also definitions of +\code{Function0} and \code{Function2} up to \code{Function9} in the +standard Scala library. That is, there is one definition for each +possible number of function parameters between 0 and 9. Scala's +function type syntax ~\lstinline@$T_1 \commadots T_n$ => $S$@~ is +simply an abbreviation for the parameterized type +~\lstinline@Function$n$[$T_1 \commadots T_n, S$]@~. + +Scala uses the same syntax $f(x)$ for function application, no matter +whether $f$ is a method or a function object. 
This is made possible by +the following convention: A function application $f(x)$ where $f$ is +an object (as opposed to a method) is taken to be a shorthand for +\lstinline@$f$.apply($x$)@. Hence, the \code{apply} method of a +function type is inserted automatically where this is necessary. + +That's also why we defined array subscripting in +Section~\ref{sec:first-arrays} by an \code{apply} method. For any +array \code{a}, the subscript operation \code{a(i)} is taken to be a +shorthand for \code{a.apply(i)}. + +Functions are an example where a contra-variant type parameter +declaration is useful. For example, consider the following code: +\begin{lstlisting} +val f: (AnyRef => int) = x => x.hashCode(); +val g: (String => int) = f +g("abc") +\end{lstlisting} +It's sound to bind the value \code{g} of type \code{String => int} to +\code{f}, which is of type \code{AnyRef => int}. Indeed, all one can +do with a function of type \code{String => int} is pass it a string in +order to obtain an integer. Clearly, the same works for function +\code{f}: If we pass it a string (or any other object), we obtain an +integer. This demonstrates that function subtyping is contra-variant +in its argument type whereas it is covariant in its result type. +In short, $S \Rightarrow T$ is a subtype of $S' \Rightarrow T'$, provided +$S'$ is a subtype of $S$ and $T$ is a subtype of $T'$. + +\example Consider the Scala code +\begin{lstlisting} +val plus1: (int => int) = (x: int) => x + 1; +plus1(2) +\end{lstlisting} +This is expanded into the following object code. +\begin{lstlisting} +val plus1: Function1[int, int] = new Function1[int, int] { + def apply(x: int): int = x + 1 +} +plus1.apply(2) +\end{lstlisting} +Here, the object creation \lstinline@new Function1[int, int]{ ... }@ +represents an instance of an {\em anonymous class}. It combines the +creation of a new \code{Function1} object with an implementation of +the \code{apply} method (which is abstract in \code{Function1}). 
+Equivalently, but more verbosely, one could have used a local class: +\begin{lstlisting} +val plus1: Function1[int, int] = { + class Local extends Function1[int, int] { + def apply(x: int): int = x + 1 + } + new Local: Function1[int, int] +} +plus1.apply(2) +\end{lstlisting} + +\chapter{Lists} + +Lists are an important data structure in many Scala programs. +A list containing the elements \code{x}$_1$, \ldots, \code{x}$_n$ is written +\code{List(x}$_1$\code{, ..., x}$_n$\code{)}. Examples are: +\begin{lstlisting} +val fruit = List("apples", "oranges", "pears"); +val nums = List(1, 2, 3, 4); +val diag3 = List(List(1, 0, 0), List(0, 1, 0), List(0, 0, 1)); +val empty = List(); +\end{lstlisting} +Lists are similar to arrays in languages such as C or Java, but there +are also three important differences. First, lists are immutable. That +is, elements of a list cannot be changed by assignment. Second, +lists have a recursive structure, whereas arrays are flat. Third, +lists support a much richer set of operations than arrays usually do. + +\section{Using Lists} + +\paragraph{The List type} +Like arrays, lists are {\em homogeneous}. That is, the elements of a +list all have the same type. The type of a list with elements of type +\code{T} is written \code{List[T]} (compare to \code{T[]} in Java). +\begin{lstlisting} +val fruit: List[String] = List("apples", "oranges", "pears"); +val nums : List[int] = List(1, 2, 3, 4); +val diag3: List[List[int]] = List(List(1, 0, 0), List(0, 1, 0), List(0, 0, 1)); +val empty: List[int] = List(); +\end{lstlisting} + +\paragraph{List constructors} +All lists are built from two more fundamental constructors, \code{Nil} +and \code{::} (pronounced ``cons''). \code{Nil} represents an empty +list. The infix operator \code{::} expresses list extension. That is, +\code{x :: xs} represents a list whose first element is \code{x}, +which is followed by (the elements of) list \code{xs}. 
Hence, the +list values above could also have been defined as follows (in fact +their previous definition is simply syntactic sugar for the definitions below). +\begin{lstlisting} +val fruit = "apples" :: ("oranges" :: ("pears" :: Nil)); +val nums = 1 :: (2 :: (3 :: (4 :: Nil))); +val diag3 = (1 :: (0 :: (0 :: Nil))) :: + (0 :: (1 :: (0 :: Nil))) :: + (0 :: (0 :: (1 :: Nil))) :: Nil; +val empty = Nil; +\end{lstlisting} +The `\code{::}' operation associates to the right: \code{A :: B :: C} is +interpreted as \code{A :: (B :: C)}. Therefore, we can drop the +parentheses in the definitions above. For instance, we can write +shorter +\begin{lstlisting} +val nums = 1 :: 2 :: 3 :: 4 :: Nil; +\end{lstlisting} + +\paragraph{Basic operations on lists} +All operations on lists can be expressed in terms of the following three: + +\begin{tabular}{ll} +\code{head} & returns the first element of a list,\\ +\code{tail} & returns the list consisting of all elements except the\\ +& first element,\\ +\code{isEmpty} & returns \code{true} iff the list is empty +\end{tabular} + +These operations are defined as methods of list objects. So we invoke +them by selecting from the list that's operated on. Examples: +\begin{lstlisting} +empty.isEmpty = true +fruit.isEmpty = false +fruit.head = "apples" +fruit.tail.head = "oranges" +diag3.head = List(1, 0, 0) +\end{lstlisting} +The \code{head} and \code{tail} methods are defined only for non-empty +lists. When selected from an empty list, they throw an exception. + +As an example of how lists can be processed, consider sorting the +elements of a list of numbers into ascending order. One simple way to +do so is {\em insertion sort}, which works as follows: To sort a +non-empty list with first element \code{x} and rest \code{xs}, sort +the remainder \code{xs} and insert the element \code{x} at the right +position in the result. Sorting an empty list will yield the +empty list. 
Expressed as Scala code: +\begin{lstlisting} +def isort(xs: List[int]): List[int] = + if (xs.isEmpty) Nil + else insert(xs.head, isort(xs.tail)) +\end{lstlisting} + +\begin{exercise} Provide an implementation of the missing function +\code{insert}. +\end{exercise} + +\paragraph{List patterns} In fact, \code{::} is defined as a case +class in Scala's standard library. Hence, it is possible to decompose +lists by pattern matching, using patterns composed from the \code{Nil} +and \code{::} constructors. For instance, \code{isort} can be written +alternatively as follows. +\begin{lstlisting} +def isort(xs: List[int]): List[int] = xs match { + case List() => List() + case x :: xs1 => insert(x, isort(xs1)) +} +\end{lstlisting} +where +\begin{lstlisting} +def insert(x: int, xs: List[int]): List[int] = xs match { + case List() => List(x) + case y :: ys => if (x <= y) x :: xs else y :: insert(x, ys) +} +\end{lstlisting} + +\section{Definition of class List I: First Order Methods} +\label{sec:list-first-order} + +Lists are not built in in Scala; they are defined by an abstract class +\code{List}, which comes with two subclasses for \code{::} and \code{Nil}. +In the following we present a tour through class \code{List}. +\begin{lstlisting} +package scala; +abstract class List[+a] { +\end{lstlisting} +\code{List} is an abstract class, so one cannot define elements by +calling the empty \code{List} constructor (e.g. by +\code{new List}). The class has a type parameter \code{a}. It is +co-variant in this parameter, which means that +\code{List[S] <: List[T]} for all types \code{S} and \code{T} such that +\code{S <: T}. The class is situated in the package +\code{scala}. This is a package containing the most important standard +classes of Scala. + \code{List} defines a number of methods, which are +explained in the following. + +\paragraph{Decomposing lists} +First, there are the three basic methods \code{isEmpty}, +\code{head}, \code{tail}. 
Their implementation in terms of pattern +matching is straightforward: +\begin{lstlisting} +def isEmpty: boolean = match { + case Nil => true + case x :: xs => false +} +def head: a = match { + case Nil => throw new Error("Nil.head") + case x :: xs => x +} +def tail: List[a] = match { + case Nil => throw new Error("Nil.tail") + case x :: xs => xs +} +\end{lstlisting} + +The next function computes the length of a list. +\begin{lstlisting} +def length = match { + case Nil => 0 + case x :: xs => 1 + xs.length +} +\end{lstlisting} +\begin{exercise} Design a tail-recursive version of \code{length}. +\end{exercise} + +The next two functions are the complements of \code{head} and +\code{tail}. +\begin{lstlisting} +def last: a; +def init: List[a]; +\end{lstlisting} +\code{xs.last} returns the last element of list \code{xs}, whereas +\code{xs.init} returns all elements of \code{xs} except the last. +Both functions have to traverse the entire list, and are thus less +efficient than their \code{head} and \code{tail} analogues. +Here is the implementation of \code{last}. +\begin{lstlisting} +def last: a = match { + case Nil => throw new Error("Nil.last") + case x :: Nil => x + case x :: xs => xs.last +} +\end{lstlisting} +The implementation of \code{init} is analogous. + +The next three functions return a prefix of the list, or a suffix, or +both. +\begin{lstlisting} +def take(n: int): List[a] = + if (n == 0 || isEmpty) Nil else head :: tail.take(n-1); + +def drop(n: int): List[a] = + if (n == 0 || isEmpty) this else tail.drop(n-1); + +def split(n: int): Pair[List[a], List[a]] = Pair(take(n), drop(n)) +\end{lstlisting} +\code{(xs take n)} returns the first \code{n} elements of list +\code{xs}, or the whole list, if its length is smaller than \code{n}. +\code{(xs drop n)} returns all elements of \code{xs} except the +\code{n} first ones. Finally, \code{(xs split n)} returns a pair +consisting of the lists resulting from \code{xs take n} and +\code{xs drop n}. 
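
As a hedged illustration of the three prefix and suffix methods above, the following sketch restates them as standalone functions in present-day Scala syntax (\code{Int} rather than \code{int}, a tuple rather than \code{Pair}). The enclosing object \code{ListSlices}, the free-standing form, the type parameter name \code{A}, and the guard \code{n <= 0} (which also covers negative arguments) are assumptions made for this example; they are not part of class \code{List} as presented here.

```scala
// Illustrative sketch only; names and the free-standing form are assumptions.
object ListSlices {
  // First n elements of xs, or all of xs if it is shorter than n.
  def take[A](xs: List[A], n: Int): List[A] =
    if (n <= 0 || xs.isEmpty) Nil else xs.head :: take(xs.tail, n - 1)

  // All elements of xs except the first n.
  def drop[A](xs: List[A], n: Int): List[A] =
    if (n <= 0 || xs.isEmpty) xs else drop(xs.tail, n - 1)

  // Both results at once, as a pair.
  def split[A](xs: List[A], n: Int): (List[A], List[A]) =
    (take(xs, n), drop(xs, n))

  val letters  = List("a", "b", "c", "d")
  val firstTwo = take(letters, 2)           // List("a", "b")
  val afterTwo = drop(letters, 2)           // List("c", "d")
  val tooMany  = take(letters, 10)          // the whole list
  val middle   = take(drop(letters, 1), 2)  // elements at indices 1 and 2
}
```

The value \code{middle} combines the two operations to select a run of consecutive elements: dropping one element and then taking two yields the elements at indices 1 and 2.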
+ +The next function returns an element at a given index in a list. +It is thus analogous to array subscripting. Indices start at 0. +\begin{lstlisting} +def apply(n: int): a = drop(n).head; +\end{lstlisting} +The \code{apply} method has a special meaning in Scala. An object with +an \code{apply} method can be applied to arguments as if it was a +function. For instance, to pick the element at index 3 of a list \code{xs}, +one can write either \code{xs.apply(3)} or \code{xs(3)} -- the latter +expression expands into the first. + +With \code{take} and \code{drop}, we can extract sublists consisting +of consecutive elements of the original list. To extract the sublist +$xs_m \commadots xs_{n-1}$ of a list \code{xs}, use: + +\begin{lstlisting} +xs.drop(m).take(n - m) +\end{lstlisting} + +\paragraph{Zipping lists} The next function combines two lists into a list of pairs. +Given two lists +\begin{lstlisting} +xs = List(x$_1$, ..., x$_n$) $\mbox{\rm, and}$ +ys = List(y$_1$, ..., y$_n$) , +\end{lstlisting} +\code{xs zip ys} constructs the list +\code{List(Pair(x}$_1$\code{, y}$_1$\code{), ..., Pair(x}$_n$\code{, y}$_n$\code{))}. +If the two lists have different lengths, the longer one of the two is +truncated. Here is the definition of \code{zip} -- note that it is a +polymorphic method. +\begin{lstlisting} +def zip[b](that: List[b]): List[Pair[a,b]] = + if (this.isEmpty || that.isEmpty) Nil + else Pair(this.head, that.head) :: (this.tail zip that.tail); +\end{lstlisting} + +\paragraph{Consing lists.} +Like any infix operator, \code{::} +is also implemented as a method of an object. In this case, the object +is the list that is extended. This is possible, because operators +ending with a `\code{:}' character are treated specially in Scala. +All such operators are treated as methods of their right operand. 
E.g., +\begin{lstlisting} + x :: y = y.::(x) $\mbox{\rm whereas}$ x + y = x.+(y) +\end{lstlisting} +Note, however, that operands of a binary operation are in each case +evaluated from left to right. So, if \code{D} and \code{E} are +expressions with possible side-effects, \code{D :: E} is translated to +\lstinline@{val x = D; E.::(x)}@ in order to maintain the left-to-right +order of operand evaluation. + +Another difference between operators ending in a `\code{:}' and other +operators concerns their associativity. Operators ending in +`\code{:}' are right-associative, whereas other operators are +left-associative. E.g., +\begin{lstlisting} + x :: y :: z = x :: (y :: z) $\mbox{\rm whereas}$ x + y + z = (x + y) + z +\end{lstlisting} +The definition of \code{::} as a method in +class \code{List} is as follows: +\begin{lstlisting} +def ::[b >: a](x: b): List[b] = new scala.::(x, this); +\end{lstlisting} +Note that \code{::} is defined for all elements \code{x} of type +\code{B} and lists of type \code{List[A]} such that the type \code{B} +of \code{x} is a supertype of the list's element type \code{A}. The result +is in this case a list of \code{B}'s. This +is expressed by the type parameter \code{b} with lower bound \code{a} +in the signature of \code{::}. + +\paragraph{Concatenating lists} +An operation similar to \code{::} is list concatenation, written +`\code{:::}'. The result of \code{(xs ::: ys)} is a list consisting of +all elements of \code{xs}, followed by all elements of \code{ys}. +Because it ends in a colon, \code{:::} is right-associative and is +considered as a method of its right-hand operand. 
Therefore, +\begin{lstlisting} +xs ::: ys ::: zs = xs ::: (ys ::: zs) + = zs.:::(ys).:::(xs) +\end{lstlisting} +Here is the implementation of the \code{:::} method: +\begin{lstlisting} + def :::[b >: a](prefix: List[b]): List[b] = prefix match { + case Nil => this + case p :: ps => this.:::(ps).::(p) + } +\end{lstlisting} + +\paragraph{Reversing lists} Another useful operation +is list reversal. There is a method \code{reverse} in \code{List} to +that effect. Let's try to give its implementation: +\begin{lstlisting} +def reverse[a](xs: List[a]): List[a] = xs match { + case Nil => Nil + case x :: xs => reverse(xs) ::: List(x) +} +\end{lstlisting} +This implementation has the advantage of being simple, but it is not +very efficient. Indeed, one concatenation is executed for every +element in the list. List concatenation takes time proportional to the +length of its first operand. Therefore, the complexity of +\code{reverse(xs)} is +\[ +n + (n - 1) + ... + 1 = n(n+1)/2 +\] +where $n$ is the length of \code{xs}. Can \code{reverse} be +implemented more efficiently? We will see later that there exists +another implementation which has only linear complexity. + +\section{Example: Merge sort} + +The insertion sort presented earlier in this chapter is simple to +formulate, but also not very efficient. Its average complexity is +proportional to the square of the length of the input list. We now +design a program to sort the elements of a list which is more +efficient than insertion sort. A good algorithm for this is {\em merge +sort}, which works as follows. + +First, if the list has zero or one elements, it is already sorted, so +one returns the list unchanged. Longer lists are split into two +sub-lists, each containing about half the elements of the original +list. Each sub-list is sorted by a recursive call to the sort +function, and the resulting two sorted lists are then combined in a +merge operation. 
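
The steps just described can be sketched for the special case of integer lists. This is an illustration in present-day Scala syntax, not the text's own definition: the names \code{IntMergeSort}, \code{merge} and \code{msortInts} are chosen for this example, the comparison is fixed to \code{<}, and \code{splitAt} is the standard-library method on \code{List} that returns a prefix and the remaining suffix.

```scala
// Illustrative sketch only: merge sort specialized to List[Int].
object IntMergeSort {
  // Combine two already-sorted lists into one sorted list.
  def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
    case (Nil, _) => ys
    case (_, Nil) => xs
    case (x :: xs1, y :: ys1) =>
      if (x < y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
  }

  def msortInts(xs: List[Int]): List[Int] = {
    val n = xs.length / 2
    if (n == 0) xs // zero or one element: already sorted
    else {
      // Split into two halves, sort each recursively, then merge.
      val (left, right) = xs.splitAt(n)
      merge(msortInts(left), msortInts(right))
    }
  }
}
```

Because \code{merge} is not tail-recursive, this sketch is suited to short lists only.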
+ +For a general implementation of merge sort, we still have to specify +the type of list elements to be sorted, as well as the function to be +used for the comparison of elements. We obtain a function of maximal +generality by passing these two items as parameters. This leads to the +following implementation. +\begin{lstlisting} +def msort[a](less: (a, a) => boolean)(xs: List[a]): List[a] = { + def merge(xs1: List[a], xs2: List[a]): List[a] = + if (xs1.isEmpty) xs2 + else if (xs2.isEmpty) xs1 + else if (less(xs1.head, xs2.head)) xs1.head :: merge(xs1.tail, xs2) + else xs2.head :: merge(xs1, xs2.tail); + val n = xs.length/2; + if (n == 0) xs + else merge(msort(less)(xs take n), msort(less)(xs drop n)) +} +\end{lstlisting} +The complexity of \code{msort} is $O(N \log(N))$, where $N$ is the +length of the input list. To see why, note that splitting a list in +two and merging two sorted lists each take time proportional to the +length of the argument list(s). Each recursive call of \code{msort} +halves the number of elements in its input, so there are $O(\log(N))$ +consecutive recursive calls until the base case of lists of length 1 +is reached. However, for longer lists each call spawns off two +further calls. Adding everything up we obtain that at each of the +$O(\log(N))$ call levels, every element of the original list takes +part in one split operation and in one merge operation. Hence, every +call level has a total cost proportional to $O(N)$. Since there are +$O(\log(N))$ call levels, we obtain an overall cost of +$O(N \log(N))$. That cost does not depend on the initial distribution +of elements in the list, so the worst case cost is the same as the +average case cost. This makes merge sort an attractive algorithm for +sorting lists. + +Here is an example of how \code{msort} is used. 
+\begin{lstlisting} +msort((x: int, y: int) => x < y)(List(5, 7, 1, 3)) +\end{lstlisting} +The definition of \code{msort} is curried, to make it easy to specialize it with particular +comparison functions. For instance, +\begin{lstlisting} + +val intSort = msort((x: int, y: int) => x < y) +val reverseSort = msort((x: int, y: int) => x > y) +\end{lstlisting} + +\section{Definition of class List II: Higher-Order Methods} + +The examples encountered so far show that functions over lists often +have similar structures. We can identify several patterns of +computation over lists, like: +\begin{itemize} + \item transforming every element of a list in some way. + \item extracting from a list all elements satisfying a criterion. + \item combining the elements of a list using some operator. +\end{itemize} +Functional programming languages enable programmers to write general +functions which implement patterns like this by means of higher-order +functions. We now discuss a set of commonly used higher-order +functions, which are implemented as methods in class \code{List}. + +\paragraph{Mapping over lists} +A common operation is to transform each element of a list and then +return the list of results. For instance, to scale each element of a +list by a given factor. +\begin{lstlisting} +def scaleList(xs: List[double], factor: double): List[double] = xs match { + case Nil => xs + case x :: xs1 => x * factor :: scaleList(xs1, factor) +} +\end{lstlisting} +This pattern can be generalized to the \code{map} method of class \code{List}: +\begin{lstlisting} +abstract class List[a] { ... + def map[b](f: a => b): List[b] = this match { + case Nil => this + case x :: xs => f(x) :: xs.map(f) + } +\end{lstlisting} +Using \code{map}, \code{scaleList} can be more concisely written as follows. 
+\begin{lstlisting} +def scaleList(xs: List[double], factor: double) = + xs map (x => x * factor) +\end{lstlisting} + +As another example, consider the problem of returning a given column +of a matrix which is represented as a list of rows, where each row is +again a list. This is done by the following function \code{column}. + +\begin{lstlisting} +def column[a](xs: List[List[a]], index: int): List[a] = + xs map (row => row at index) +\end{lstlisting} + +Closely related to \code{map} is the \code{foreach} method, which +applies a given function to all elements of a list, but does not +construct a list of results. The function is thus applied only for its +side effect. \code{foreach} is defined as follows. +\begin{lstlisting} + def foreach(f: a => unit): unit = this match { + case Nil => () + case x :: xs => f(x) ; xs.foreach(f) + } +\end{lstlisting} +This function can be used for printing all elements of a list, for instance: +\begin{lstlisting} + xs foreach (x => System.out.println(x)) +\end{lstlisting} + +\begin{exercise} Consider a function which squares all elements of a list and +returns a list with the results. Complete the following two equivalent +definitions of \code{squareList}. + +\begin{lstlisting} +def squareList(xs: List[int]): List[int] = xs match { + case List() => ?? + case y :: ys => ?? +} +def squareList(xs: List[int]): List[int] = + xs map ?? +\end{lstlisting} +\end{exercise} + +\paragraph{Filtering Lists} +Another common operation selects from a list all elements fulfilling a +given criterion. 
For instance, to return a list of all positive +elements in some given list of integers: +\begin{lstlisting} +def posElems(xs: List[int]): List[int] = xs match { + case Nil => xs + case x :: xs1 => if (x > 0) x :: posElems(xs1) else posElems(xs1) +} +\end{lstlisting} +This pattern is generalized to the \code{filter} method of class \code{List}: +\begin{lstlisting} + def filter(p: a => boolean): List[a] = this match { + case Nil => this + case x :: xs => if (p(x)) x :: xs.filter(p) else xs.filter(p) + } +\end{lstlisting} +Using \code{filter}, \code{posElems} can be more concisely written as +follows. +\begin{lstlisting} +def posElems(xs: List[int]): List[int] = + xs filter (x => x > 0) +\end{lstlisting} + +An operation related to filtering is testing whether all elements of a +list satisfy a certain condition. Dually, one might also be interested +in the question whether there exists an element in a list that +satisfies a certain condition. These operations are embodied in the +higher-order functions \code{forall} and \code{exists} of class +\code{List}. +\begin{lstlisting} +def forall(p: a => boolean): boolean = + isEmpty || (p(head) && (tail forall p)); +def exists(p: a => boolean): boolean = + !isEmpty && (p(head) || (tail exists p)); +\end{lstlisting} +To illustrate the use of \code{forall}, consider the question whether +a number is prime. Remember that a number $n$ is prime if it can be +divided without remainder only by one and itself. The most direct +translation of this definition would test that $n$ divided by all +numbers from 2 up to and excluding itself gives a non-zero +remainder. This list of numbers can be generated using a function +\code{List.range} which is defined in object \code{List} as follows. +\begin{lstlisting} +package scala; +object List { ... 
+ def range(from: int, end: int): List[int] = + if (from >= end) Nil else from :: range(from + 1, end); +\end{lstlisting} +For example, \code{List.range(2, n)} +generates the list of all integers from 2 up to and excluding $n$. +The function \code{isPrime} can now simply be defined as follows. +\begin{lstlisting} +def isPrime(n: int) = + List.range(2, n) forall (x => n % x != 0) +\end{lstlisting} +We see that the mathematical definition of prime-ness has been +translated directly into Scala code. + +Exercise: Define \code{forall} and \code{exists} in terms of \code{filter}. + + +\paragraph{Folding and Reducing Lists} +Another common operation is to combine the elements of a list with +some operator. For instance: +\begin{lstlisting} +sum(List(x$_1$, ..., x$_n$)) = 0 + x$_1$ + ... + x$_n$ +product(List(x$_1$, ..., x$_n$)) = 1 * x$_1$ * ... * x$_n$ +\end{lstlisting} +Of course, we can implement both functions with a +recursive scheme: +\begin{lstlisting} +def sum(xs: List[int]): int = xs match { + case Nil => 0 + case y :: ys => y + sum(ys) +} +def product(xs: List[int]): int = xs match { + case Nil => 1 + case y :: ys => y * product(ys) +} +\end{lstlisting} +But we can also use the generalization of this program scheme embodied +in the \code{reduceLeft} method of class \code{List}. This method +inserts a given binary operator between adjacent elements of a given list. +E.g.\ +\begin{lstlisting} +List(x$_1$, ..., x$_n$).reduceLeft(op) = (...(x$_1$ op x$_2$) op ... ) op x$_n$ +\end{lstlisting} +Using \code{reduceLeft}, we can make the common pattern +in \code{sum} and \code{product} apparent: +\begin{lstlisting} +def sum(xs: List[int]) = (0 :: xs) reduceLeft {(x, y) => x + y} +def product(xs: List[int]) = (1 :: xs) reduceLeft {(x, y) => x * y} +\end{lstlisting} +Here is the implementation of \code{reduceLeft}.
+\begin{lstlisting} + def reduceLeft(op: (a, a) => a): a = this match { + case Nil => error("Nil.reduceLeft") + case x :: xs => (xs foldLeft x)(op) + } + def foldLeft[b](z: b)(op: (b, a) => b): b = this match { + case Nil => z + case x :: xs => (xs foldLeft op(z, x))(op) + } +} +\end{lstlisting} +We see that the \code{reduceLeft} method is defined in terms of +another generally useful method, \code{foldLeft}. The latter takes as +additional parameter an {\em accumulator} \code{z}, which is returned +when \code{foldLeft} is applied on an empty list. That is, +\begin{lstlisting} +(List(x$_1$, ..., x$_n$) foldLeft z)(op) = (...(z op x$_1$) op ... ) op x$_n$ +\end{lstlisting} +The \code{sum} and \code{product} methods can be defined alternatively +using \code{foldLeft}: +\begin{lstlisting} +def sum(xs: List[int]) = (xs foldLeft 0) {(x, y) => x + y} +def product(xs: List[int]) = (xs foldLeft 1) {(x, y) => x * y} +\end{lstlisting} + +\paragraph{FoldRight and ReduceRight} +Applications of \code{foldLeft} and \code{reduceLeft} expand to +left-leaning trees. \todo{insert pictures}. They have duals +\code{foldRight} and \code{reduceRight}, which produce right-leaning +trees. +\begin{lstlisting} +List(x$_1$, ..., x$_n$).reduceRight(op) = x$_1$ op ( ... (x$_{n-1}$ op x$_n$)...) +(List(x$_1$, ..., x$_n$) foldRight acc)(op) = x$_1$ op ( ... (x$_n$ op acc)...) +\end{lstlisting} +These are defined as follows. 
+\begin{lstlisting} + def reduceRight(op: (a, a) => a): a = match { + case Nil => error("Nil.reduceRight") + case x :: Nil => x + case x :: xs => op(x, xs.reduceRight(op)) + } + def foldRight[b](z: b)(op: (a, b) => b): b = match { + case Nil => z + case x :: xs => op(x, (xs foldRight z)(op)) + } +\end{lstlisting} + +Class \code{List} also defines two symbolic abbreviations for +\code{foldLeft} and \code{foldRight}: +\begin{lstlisting} + def /:[b](z: b)(f: (b, a) => b): b = foldLeft(z)(f); + def :\[b](z: b)(f: (a, b) => b): b = foldRight(z)(f); +\end{lstlisting} +The method names picture the left/right leaning trees of the fold +operations by forward or backward slashes. The \code{:} points in each +case to the list argument whereas the end of the slash points to the +accumulator (or: zero) argument \code{z}. +That is, +\begin{lstlisting} +(z /: List(x$_1$, ..., x$_n$))(op) = (...(z op x$_1$) op ... ) op x$_n$ +(List(x$_1$, ..., x$_n$) :\ z)(op) = x$_1$ op ( ... (x$_n$ op z)...) +\end{lstlisting} +For associative and commutative operators, \code{/:} and +\code{:\\} are equivalent (even though there may be a difference +in efficiency). But sometimes, only one of the two operators is +appropriate or has the right type: + +\begin{exercise} Consider the problem of writing a function \code{flatten}, +which takes a list of element lists as argument. The result of +\code{flatten} should be the concatenation of all element lists into a +single list. Here is an implementation of this method in terms of +\code{:\\}. +\begin{lstlisting} +def flatten[a](xs: List[List[a]]): List[a] = + (xs :\ Nil) {(x, xs) => x ::: xs} +\end{lstlisting} +In this case it is not possible to replace the application of +\code{:\\} with \code{/:}. Explain why. + +In fact \code{flatten} is predefined together with a set of other +useful functions in an object called \code{List} in the standard Scala +library. It can be accessed from user programs by calling +\code{List.flatten}.
Note that \code{flatten} is not a method of class +\code{List} -- it would not make sense there, since it applies only +to lists of lists, not to all lists in general. +\end{exercise} + +\paragraph{List Reversal Again} We have seen in +Section~\ref{sec:list-first-order} an implementation of method +\code{reverse} whose run-time was quadratic in the length of the list +to be reversed. We now develop a new implementation of \code{reverse}, +which has linear cost. The idea is to use a \code{foldLeft} +operation based on the following program scheme. +\begin{lstlisting} +class List[+a] { ... + def reverse: List[a] = (z? /: this)(op?) +\end{lstlisting} +It only remains to fill in the \code{z?} and \code{op?} parts. Let's +try to deduce them from examples. +\begin{lstlisting} + Nil += Nil.reverse // by specification += (z /: Nil)(op) // by the template for reverse += (Nil foldLeft z)(op) // by the definition of /: += z // by definition of foldLeft +\end{lstlisting} +Hence, \code{z?} must be \code{Nil}. To deduce the second operand, +let's study reversal of a list of length one. +\begin{lstlisting} + List(x) += List(x).reverse // by specification += (Nil /: List(x))(op) // by the template for reverse, with z = Nil += (List(x) foldLeft Nil)(op) // by the definition of /: += op(Nil, x) // by definition of foldLeft +\end{lstlisting} +Hence, \code{op(Nil, x)} equals \code{List(x)}, which is the same +as \code{x :: Nil}. This suggests to take as \code{op} the +\code{::} operator with its operands exchanged. Hence, we arrive at +the following implementation for \code{reverse}, which has linear complexity. +\begin{lstlisting} +def reverse: List[a] = + ((Nil: List[a]) /: this) {(xs, x) => x :: xs} +\end{lstlisting} +(Remark: The type annotation of \code{Nil} is necessary +to make the type inferencer work.) + +\begin{exercise} Fill in the missing expressions to complete the following +definitions of some basic list-manipulation operations as fold +operations. 
+\begin{lstlisting} +def mapFun[a, b](xs: List[a], f: a => b): List[b] = + (xs :\ List[b]()){ ?? } + +def lengthFun[a](xs: List[a]): int = + (0 /: xs){ ?? } +\end{lstlisting} +\end{exercise} + +\paragraph{Nested Mappings} + +We can employ higher-order list processing functions to express many +computations that are normally expressed as nested loops in imperative +languages. + +As an example, consider the following problem: Given a positive +integer $n$, find all pairs of positive integers $i$ and $j$, where +$1 \leq j < i < n$ such that $i + j$ is prime. For instance, if $n = 7$, +the pairs are +\bda{c|lllllll} +i & 2 & 3 & 4 & 4 & 5 & 6 & 6\\ +j & 1 & 2 & 1 & 3 & 2 & 1 & 5\\ \hline +i + j & 3 & 5 & 5 & 7 & 7 & 7 & 11 +\eda + +A natural way to solve this problem consists of two steps. In a first step, +one generates the sequence of all pairs $(i, j)$ of integers such that +$1 \leq j < i < n$. In a second step one then filters from this sequence +all pairs $(i, j)$ such that $i + j$ is prime. + +Looking at the first step in more detail, a natural way to generate +the sequence of pairs consists of three sub-steps. First, generate +all integers $i$ from $1$ up to and excluding $n$. +Second, for each such integer $i$, generate the list of +pairs $(i, 1)$ up to $(i, i-1)$. This can be achieved by a +combination of \code{range} and \code{map}: +\begin{lstlisting} + List.range(1, i) map (x => Pair(i, x)) +\end{lstlisting} +Finally, combine all sublists using \code{foldRight} with \code{:::}.
+Putting everything together gives the following expression: +\begin{lstlisting} +List.range(1, n) + .map(i => List.range(1, i).map(x => Pair(i, x))) + .foldRight(List[Pair[int, int]]()) {(xs, ys) => xs ::: ys} + .filter(pair => isPrime(pair._1 + pair._2)) +\end{lstlisting} + +\paragraph{Flattening Maps} +The combination of mapping and then concatenating sublists +resulting from the map +is so common that there is a special method +for it in class \code{List}: +\begin{lstlisting} +abstract class List[+a] { ... + def flatMap[b](f: a => List[b]): List[b] = match { + case Nil => Nil + case x :: xs => f(x) ::: (xs flatMap f) + } +} +\end{lstlisting} +With \code{flatMap}, the pairs-whose-sum-is-prime expression +could have been written more concisely as follows. +\begin{lstlisting} +List.range(1, n) + .flatMap(i => List.range(1, i).map(x => Pair(i, x))) + .filter(pair => isPrime(pair._1 + pair._2)) +\end{lstlisting} + + + +\section{Summary} + +This chapter has introduced lists as a fundamental data structure in +programming. Since lists are immutable, they are a common data type in +functional programming languages. They play there a role comparable to +arrays in imperative languages. However, the access patterns of +arrays and lists are quite different. Where array accessing is always +done by indexing, this is much less common for lists. We have seen +that \code{scala.List} defines a method called \code{apply} for indexing; +however this operation is much more costly than in the case of arrays +(linear as opposed to constant time). Instead of indexing, lists are +usually traversed recursively, where recursion steps are usually based +on a pattern match over the traversed list. There is also a rich set of +higher-order combinators which allow one to instantiate a set of +predefined patterns of computations over lists. + +\comment{ +\bsh{Reasoning About Lists} + +Recall the concatenation operation for lists: + +\begin{lstlisting} +class List[+a] { + ...
+ def ::: (that: List[a]): List[a] = + if (isEmpty) that + else head :: (tail ::: that) +} +\end{lstlisting} + +We would like to verify that concatenation is associative, with the +empty list \code{List()} as left and right identity: +\bda{lcl} + (xs ::: ys) ::: zs &=& xs ::: (ys ::: zs) \\ + xs ::: List() &=& xs \gap =\ List() ::: xs +\eda +\emph{Q}: How can we prove statements like the one above? + +\emph{A}: By \emph{structural induction} over lists. +\es +\bsh{Reminder: Natural Induction} + +Recall the proof principle of \emph{natural induction}: + +To show a property \mathtext{P(n)} for all numbers \mathtext{n \geq b}: +\be +\item Show that \mathtext{P(b)} holds (\emph{base case}). +\item For arbitrary \mathtext{n \geq b} show: +\begin{quote} + if \mathtext{P(n)} holds, then \mathtext{P(n+1)} holds as well +\end{quote} +(\emph{induction step}). +\ee +%\es\bs +\emph{Example}: Given +\begin{lstlisting} +def factorial(n: int): int = + if (n == 0) 1 + else n * factorial(n-1) +\end{lstlisting} +show that, for all \code{n >= 4}, +\begin{lstlisting} + factorial(n) >= 2$^n$ +\end{lstlisting} +\es\bs +\Case{\code{4}} +is established by simple calculation of \code{factorial(4) = 24} and \code{2$^4$ = 16}. + +\Case{\code{n+1}} +We have for \code{n >= 4}: +\begin{lstlisting} + \= factorial(n + 1) + = \> $\expl{by the second clause of factorial(*)}$ + \> (n + 1) * factorial(n) + >= \> $\expl{by calculation}$ + \> 2 * factorial(n) + >= \> $\expl{by the induction hypothesis}$ + \> 2 * 2$^n$. +\end{lstlisting} +Note that in our proof we can freely apply reduction steps such as in (*) +anywhere in a term. + + +This works because purely functional programs do not have side +effects; so a term is equivalent to the term it reduces to. + +The principle is called {\em\emph{referential transparency}}. 
+\es +\bsh{Structural Induction} + +The principle of structural induction is analogous to natural induction: + +In the case of lists, it is as follows: + +To prove a property \mathtext{P(xs)} for all lists \mathtext{xs}, +\be +\item Show that \code{P(List())} holds (\emph{base case}). +\item For arbitrary lists \mathtext{xs} and elements \mathtext{x} + show: +\begin{quote} + if \mathtext{P(xs)} holds, then \mathtext{P(x :: xs)} holds as well +\end{quote} +(\emph{induction step}). +\ee + +\es +\bsh{Example} + +We show \code{(xs ::: ys) ::: zs = xs ::: (ys ::: zs)} by structural induction +on \code{xs}. + +\Case{\code{List()}} +For the left-hand side, we have: +\begin{lstlisting} + \= (List() ::: ys) ::: zs + = \> $\expl{by first clause of \prog{:::}}$ + \> ys ::: zs +\end{lstlisting} +For the right-hand side, we have: +\begin{lstlisting} + \= List() ::: (ys ::: zs) + = \> $\expl{by first clause of \prog{:::}}$ + \> ys ::: zs +\end{lstlisting} +So the case is established. + +\es +\bs +\Case{\code{x :: xs}} + +For the left-hand side, we have: +\begin{lstlisting} + \= ((x :: xs) ::: ys) ::: zs + = \> $\expl{by second clause of \prog{:::}}$ + \> (x :: (xs ::: ys)) ::: zs + = \> $\expl{by second clause of \prog{:::}}$ + \> x :: ((xs ::: ys) ::: zs) + = \> $\expl{by the induction hypothesis}$ + \> x :: (xs ::: (ys ::: zs)) +\end{lstlisting} + +For the right-hand side, we have: +\begin{lstlisting} + \= (x :: xs) ::: (ys ::: zs) + = \> $\expl{by second clause of \prog{:::}}$ + \> x :: (xs ::: (ys ::: zs)) +\end{lstlisting} +So the case (and with it the property) is established. + +\begin{exercise} +Show by induction on \code{xs} that \code{xs ::: List() = xs}. +\es +\bsh{Example (2)} +\end{exercise} + +As a more difficult example, consider function +\begin{lstlisting} +abstract class List[a] { ... 
+ def reverse: List[a] = match { + case List() => List() + case x :: xs => xs.reverse ::: List(x) + } +} +\end{lstlisting} +We would like to prove the proposition that +\begin{lstlisting} + xs.reverse.reverse = xs . +\end{lstlisting} +We proceed by induction over \code{xs}. The base case is easy to establish: +\begin{lstlisting} + \= List().reverse.reverse + = \> $\expl{by first clause of \prog{reverse}}$ + \> List().reverse + = \> $\expl{by first clause of \prog{reverse}}$ + \> List() +\end{lstlisting} +\es\bs +For the induction step, we try: +\begin{lstlisting} + \= (x :: xs).reverse.reverse + = \> $\expl{by second clause of \prog{reverse}}$ + \> (xs.reverse ::: List(x)).reverse +\end{lstlisting} +There's nothing more we can do to this expression, so we turn to the right side: +\begin{lstlisting} + \= x :: xs + = \> $\expl{by induction hypothesis}$ + \> x :: xs.reverse.reverse +\end{lstlisting} +The two sides have simplified to different expressions. + +So we still have to show that +\begin{lstlisting} + (xs.reverse ::: List(x)).reverse = x :: xs.reverse.reverse +\end{lstlisting} +Trying to prove this directly by induction does not work. + +Instead we have to {\em generalize} the equation to: +\begin{lstlisting} + (ys ::: List(x)).reverse = x :: ys.reverse +\end{lstlisting} +\es\bs +This equation can be proved by a second induction argument over \code{ys}. +(See blackboard). + +\begin{exercise} +Is it the case that \code{(xs drop m) at n = xs at (m + n)} for all +natural numbers \code{m}, \code{n} and all lists \code{xs}? +\end{exercise} + +\es +\bsh{Structural Induction on Trees} + +Structural induction is not restricted to lists; it works for arbitrary +trees. + +The general induction principle is as follows. + +To show that property \code{P(t)} holds for all trees of a certain type, +\begin{itemize} +\item Show \code{P(l)} for all leaf trees \code{$l$}. 
+\item For every interior node \code{t} with subtrees \code{s$_1$, ..., s$_n$}, + show that \code{P(s$_1$) $\wedge$ ... $\wedge$ P(s$_n$) => P(t)}. +\end{itemize} + +\example Recall our definition of \code{IntSet} with +operations \code{contains} and \code{incl}: + +\begin{lstlisting} +abstract class IntSet { + abstract def incl(x: int): IntSet + abstract def contains(x: int): boolean +} +\end{lstlisting} +\es\bs +\begin{lstlisting} +case class Empty extends IntSet { + def contains(x: int): boolean = false + def incl(x: int): IntSet = NonEmpty(x, Empty, Empty) +} +case class NonEmpty(elem: int, left: Set, right: Set) extends IntSet { + def contains(x: int): boolean = + if (x < elem) left contains x + else if (x > elem) right contains x + else true + def incl(x: int): IntSet = + if (x < elem) NonEmpty(elem, left incl x, right) + else if (x > elem) NonEmpty(elem, left, right incl x) + else this +} +\end{lstlisting} +(With \code{case} added, so that we can use factory methods instead of \code{new}). + +What does it mean to prove the correctness of this implementation? +\es +\bsh{Laws of IntSet} + +One way to state and prove the correctness of an implementation is +to prove laws that hold for it. + +In the case of \code{IntSet}, three such laws would be: + +For all sets \code{s}, elements \code{x}, \code{y}: + +\begin{lstlisting} +Empty contains x \= = false +(s incl x) contains x \> = true +(s incl x) contains y \> = s contains y if x $\neq$ y +\end{lstlisting} + +(In fact, one can show that these laws characterize the desired data +type completely). + +How can we establish that these laws hold? + +\emph{Proposition 1}: \code{Empty contains x = false}. + +\emph{Proof}: By the definition of \code{contains} in \code{Empty}. 
+\es\bs +\emph{Proposition 2}: \code{(xs incl x) contains x = true} + +\emph{Proof:} + +\Case{\code{Empty}} +\begin{lstlisting} + \= (Empty incl x) contains x + = \> $\expl{by definition of \prog{incl} in \prog{Empty}}$ + \> NonEmpty(x, Empty, Empty) contains x + = \> $\expl{by definition of \prog{contains} in \prog{NonEmpty}}$ + \> true +\end{lstlisting} + +\Case{\code{NonEmpty(x, l, r)}} +\begin{lstlisting} + \= (NonEmpty(x, l, r) incl x) contains x + = \> $\expl{by definition of \prog{incl} in \prog{NonEmpty}}$ + \> NonEmpty(x, l, r) contains x + = \> $\expl{by definition of \prog{contains} in \prog{Empty}}$ + \> true +\end{lstlisting} +\es\bs +\Case{\code{NonEmpty(y, l, r)} where \code{y < x}} +\begin{lstlisting} + \= (NonEmpty(y, l, r) incl x) contains x + = \> $\expl{by definition of \prog{incl} in \prog{NonEmpty}}$ + \> NonEmpty(y, l, r incl x) contains x + = \> $\expl{by definition of \prog{contains} in \prog{NonEmpty}}$ + \> (r incl x) contains x + = \> $\expl{by the induction hypothesis}$ + \> true +\end{lstlisting} + +\Case{\code{NonEmpty(y, l, r)} where \code{y > x}} is analogous. + +\bigskip + +\emph{Proposition 3}: If \code{x $\neq$ y} then +\code{xs incl y contains x = xs contains x}. + +\emph{Proof:} See blackboard. +\es +\bsh{Exercise} + +Say we add a \code{union} function to \code{IntSet}: + +\begin{lstlisting} +class IntSet { ... + def union(other: IntSet): IntSet +} +class Expty extends IntSet { ... + def union(other: IntSet) = other +} +class NonEmpty(x: int, l: IntSet, r: IntSet) extends IntSet { ... + def union(other: IntSet): IntSet = l union r union other incl x +} +\end{lstlisting} + +The correctness of \code{union} can be subsumed with the following +law: + +\emph{Proposition 4}: +\code{(xs union ys) contains x = xs contains x || ys contains x}. +Is that true ? What hypothesis is missing ? Show a counterexample. + +Show Proposition 4 using structural induction on \code{xs}. +\es +\comment{ + +\emph{Proof:} By induction on \code{xs}. 
+ +\Case{\code{Empty}} + +\Case{\code{NonEmpty(x, l, r)}} + +\Case{\code{NonEmpty(y, l, r)} where \code{y < x}} + +\begin{lstlisting} + \= (Empty union ys) contains x + = \> $\expl{by definition of \prog{union} in \prog{Empty}}$ + \> ys contains x + = \> $\expl{Boolean algebra}$ + \> false || ys contains x + = \> $\expl{by definition of \prog{contains} in \prog{Empty} (reverse)}$ + \> (Empty contains x) || (ys contains x) +\end{lstlisting} + +\begin{lstlisting} + \= (NonEmpty(x, l, r) union ys) contains x + = \> $\expl{by definition of \prog{union} in \prog{NonEmpty}}$ + \> (l union r union ys incl x) contains x + = \> $\expl{by Proposition 2}$ + \> true + = \> $\expl{Boolean algebra}$ + \> true || (ys contains x) + = \> $\expl{by definition of \prog{contains} in \prog{NonEmpty} (reverse)}$ + \> (NonEmpty(x, l, r) contains x) || (ys contains x) +\end{lstlisting} + +\begin{lstlisting} + \= (NonEmpty(y, l, r) union ys) contains x + = \> $\expl{by definition of \prog{union} in \prog{NonEmpty}}$ + \> (l union r union ys incl y) contains x + = \> $\expl{by Proposition 3}$ + \> (l union r union ys) contains x + = \> $\expl{by the induction hypothesis}$ + \> ((l union r) contains x) || (ys contains x) + = \> $\expl{by Proposition 3}$ + \> ((l union r incl y) contains x) || (ys contains x) +\end{lstlisting} + +\Case{\code{NonEmpty(y, l, r)} where \code{y < x}} + ... is analogous. + +\es +}} +\chapter{\label{sec:for-notation}For-Comprehensions} + +The last chapter demonstrated that higher-order functions such as +\verb@map@, \verb@flatMap@, \verb@filter@ provide powerful +constructions for dealing with lists. But sometimes the level of +abstraction required by these functions makes a program hard to +understand. + +To help understandability, Scala has a special notation which +simplifies common patterns of applications of higher-order functions. +This notation builds a bridge between set-comprehensions in +mathematics and for-loops in imperative languages such as C or +Java.
It also closely resembles the query notation of relational +databases. + +As a first example, say we are given a list \code{persons} of persons +with \code{name} and \code{age} fields. To print the names of all +persons in the sequence which are aged over 20, one can write: +\begin{lstlisting} +for (val p <- persons; p.age > 20) yield p.name +\end{lstlisting} +This is equivalent to the following expression , which uses +higher-order functions \code{filter} and \code{map}: +\begin{lstlisting} +persons filter (p => p.age > 20) map (p => p.name) +\end{lstlisting} +The for-comprehension looks a bit like a for-loop in imperative languages, +except that it constructs a list of the results of all iterations. + +Generally, a for-comprehension is of the form +\begin{lstlisting} +for ( $s$ ) yield $e$ +\end{lstlisting} +Here, $s$ is a sequence of {\em generators} and {\em filters}. A {\em +generator} is of the form \code{val x <- e}, where \code{e} is a +list-valued expression. It binds \code{x} to successive values in the +list. A {\em filter} is an expression \code{f} of type +\code{boolean}. It omits from consideration all bindings for which +\code{f} is \code{false}. The sequence $s$ starts in each case with a +generator. If there are several generators in a sequence, later +generators vary more rapidly than earlier ones. + +Here are two examples that show how for-comprehensions are used. +First, let's redo an example of the previous chapter: Given a positive +integer $n$, find all pairs of positive integers $i$ and $j$, where $1 +\leq j < i < n$ such that $i + j$ is prime. With a for-comprehension +this problem is solved as follows: +\begin{lstlisting} +for (val i <- List.range(1, n); + val j <- List.range(1, i); + isPrime(i+j)) yield Pair(i, j) +\end{lstlisting} +This is arguably much clearer than the solution using \code{map}, +\code{flatMap} and \code{filter} that we have developed previously. 
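+
+The correspondence between the two styles can be checked directly. The
+following is a minimal sketch in present-day Scala syntax, which differs
+from the notation used in this text (\code{Int} instead of \code{int},
+no \code{val} in generators, an explicit \code{if} before a filter, and
+tuples instead of \code{Pair}). The object name \code{PrimePairs} and
+its method names are hypothetical and chosen only for illustration.
+\begin{lstlisting}
+object PrimePairs {
+  // isPrime as defined in the previous chapter, in current syntax.
+  def isPrime(n: Int): Boolean =
+    List.range(2, n).forall(x => n % x != 0)
+
+  // All pairs (i, j) with 1 <= j < i < n and i + j prime,
+  // written as a for-comprehension.
+  def primePairs(n: Int): List[(Int, Int)] =
+    for {
+      i <- List.range(1, n)
+      j <- List.range(1, i)
+      if isPrime(i + j)
+    } yield (i, j)
+
+  // The same computation with explicit higher-order functions.
+  def primePairsHof(n: Int): List[(Int, Int)] =
+    List.range(1, n).flatMap(i =>
+      List.range(1, i)
+        .filter(j => isPrime(i + j))
+        .map(j => (i, j)))
+}
+\end{lstlisting}
+For $n = 7$ both methods should return the seven pairs listed in the
+table of the previous chapter, in the same order.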
+ +As a second example, consider computing the scalar product of two +vectors \code{xs} and \code{ys}. Using a for-comprehension, this can +be written as follows. +\begin{lstlisting} + sum (for(val (x, y) <- xs zip ys) yield x * y) +\end{lstlisting} + +\section{The N-Queens Problem} + +For-comprehensions are especially useful for solving combinatorial +puzzles. An example of such a puzzle is the 8-queens problem: Given a +standard chessboard, place 8 queens such that no queen is in check from any +other (a queen can check another piece if they are on the same +column, row, or diagonal). We will now develop a solution to this +problem, generalizing it to chessboards of arbitrary size. Hence, the +problem is to place $n$ queens on a chessboard of size $n \times n$. + +To solve this problem, note that we need to place a queen in each row. +So we could place queens in successive rows, each time checking that a +newly placed queen is not in check from any other queens that have +already been placed. In the course of this search, it might happen +that a queen to be placed in row $k$ would be in check in all fields +of that row from queens in rows $1$ to $k-1$. In that case, we need to +abort that part of the search in order to continue with a different +configuration of queens in rows $1$ to $k-1$. + +This suggests a recursive algorithm. Assume that we have already +generated all solutions of placing $k-1$ queens on a board of size $n +\times n$. We can represent each such solution by a list of length +$k-1$ of column numbers (which can range from $1$ to $n$). We treat +these partial solution lists as stacks, where the column number of the +queen in row $k-1$ comes first in the list, followed by the column +number of the queen in row $k-2$, etc. The bottom of the stack is the +column number of the queen placed in the first row of the board. All +solutions together are then represented as a list of lists, with one +element for each solution.
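+
+The stack representation just described can be made concrete with a
+small sketch. It is written in present-day Scala syntax, and the names
+\code{QueensRepr} and \code{positions} are hypothetical helpers that are
+not part of the solution developed in this chapter. The helper recovers
+explicit (row, column) coordinates from a partial solution, assuming
+rows are numbered from $1$.
+\begin{lstlisting}
+object QueensRepr {
+  // A partial solution lists column numbers with the most recently
+  // placed row first. The list head belongs to row k = queens.length,
+  // the last element to row 1.
+  def positions(queens: List[Int]): List[(Int, Int)] = {
+    val k = queens.length
+    queens.zipWithIndex.map { case (col, i) => (k - i, col) }
+  }
+}
+\end{lstlisting}
+For example, \code{QueensRepr.positions(List(4, 2))} describes queens at
+row 2, column 4 and row 1, column 2. Extending a partial solution by a
+queen in the next row is then a plain cons, as in \code{1 :: List(4, 2)}.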
+ +Now, to place the $k$'th queen, we generate all possible extensions +of each previous solution by one more queen. This yields another list +of solution lists, this time of length $k$. We continue the process +until we have reached solutions of the size of the chessboard $n$. +This algorithmic idea is embodied in function \code{placeQueens} below: +\begin{lstlisting} +def queens(n: int): List[List[int]] = { + def placeQueens(k: int): List[List[int]] = + if (k == 0) List(List()) + else for (val queens <- placeQueens(k - 1); + val column <- List.range(1, n + 1); + isSafe(column, queens, 1)) yield column :: queens; + placeQueens(n); +} +\end{lstlisting} + +\begin{exercise} Write the function +\begin{lstlisting} + def isSafe(col: int, queens: List[int], delta: int): boolean +\end{lstlisting} +which tests whether a queen in the given column \verb@col@ is safe with +respect to the \verb@queens@ already placed. Here, \verb@delta@ is the difference between the row of the queen to be +placed and the row of the first queen in the list. +\end{exercise} + +\section{Querying with For-Comprehensions} + +The for-notation is essentially equivalent to common operations of +database query languages. For instance, say we are given a +database \code{books}, represented as a list of books, where +\code{Book} is defined as follows.
+\begin{lstlisting} +case class Book(title: String, authors: List[String]); +\end{lstlisting} +Here is a small example database: +\begin{lstlisting} +val books: List[Book] = List( + Book("Structure and Interpretation of Computer Programs", + List("Abelson, Harold", "Sussman, Gerald J.")), + Book("Principles of Compiler Design", + List("Aho, Alfred", "Ullman, Jeffrey")), + Book("Programming in Modula-2", + List("Wirth, Niklaus")), + Book("Introduction to Functional Programming", + List("Bird, Richard")), + Book("The Java Language Specification", + List("Gosling, James", "Joy, Bill", "Steele, Guy", "Bracha, Gilad"))); +\end{lstlisting} +Then, to find the titles of all books that have an author whose last name is ``Ullman'': +\begin{lstlisting} +for (val b <- books; val a <- b.authors; a startsWith "Ullman") +yield b.title +\end{lstlisting} +(Here, \code{startsWith} is a method in \code{java.lang.String}). Or, +to find the titles of all books that have the string ``Program'' in +their title: +\begin{lstlisting} +for (val b <- books; (b.title indexOf "Program") >= 0) +yield b.title +\end{lstlisting} +Or, to find the names of all authors that have written at least two +books in the database: +\begin{lstlisting} +for (val b1 <- books; val b2 <- books; b1 != b2; + val a1 <- b1.authors; val a2 <- b2.authors; a1 == a2) +yield a1 +\end{lstlisting} +The last solution is not yet perfect, because authors will appear +several times in the list of results. We still need to remove +duplicate authors from result lists. This can be achieved with the +following function. +\begin{lstlisting} +def removeDuplicates[a](xs: List[a]): List[a] = + if (xs.isEmpty) xs + else xs.head :: removeDuplicates(xs.tail filter (x => x != xs.head)); +\end{lstlisting} +Note that the last expression in method \code{removeDuplicates} +can be equivalently expressed using a for-comprehension.
+\begin{lstlisting} +xs.head :: removeDuplicates(for (val x <- xs.tail; x != xs.head) yield x) +\end{lstlisting} + +\section{Translation of For-Comprehensions} + +Every for-comprehension can be expressed in terms of the three +higher-order functions \code{map}, \code{flatMap} and \code{filter}. +Here is the translation scheme, which is also used by the Scala compiler. +\begin{itemize} +\item +A simple for-comprehension +\begin{lstlisting} +for (val x <- e) yield e' +\end{lstlisting} +is translated to +\begin{lstlisting} +e.map(x => e') +\end{lstlisting} +\item +A for-comprehension +\begin{lstlisting} +for (val x <- e; f; s) yield e' +\end{lstlisting} +where \code{f} is a filter and \code{s} is a (possibly empty) +sequence of generators or filters +is translated to +\begin{lstlisting} +for (val x <- e.filter(x => f); s) yield e' +\end{lstlisting} +and then translation continues with the latter expression. +\item +A for-comprehension +\begin{lstlisting} +for (val x <- e; val y <- e'; s) yield e'' +\end{lstlisting} +where \code{s} is a (possibly empty) +sequence of generators or filters +is translated to +\begin{lstlisting} +e.flatMap(x => for (val y <- e'; s) yield e'') +\end{lstlisting} +and then translation continues with the latter expression. +\end{itemize} +For instance, taking our ``pairs of integers whose sum is prime'' example: +\begin{lstlisting} +for { val i <- range(1, n); + val j <- range(1, i); + isPrime(i+j) +} yield (i, j) +\end{lstlisting} +Here is what we get when we translate this expression: +\begin{lstlisting} +range(1, n) + .flatMap(i => + range(1, i) + .filter(j => isPrime(i+j)) + .map(j => (i, j))) +\end{lstlisting} + +Conversely, it would also be possible to express functions \code{map}, +\code{flatMap} and \code{filter} using for-comprehensions. Here are the +three functions again, this time implemented using for-comprehensions.
+\begin{lstlisting} +object Demo { + def map[a, b](xs: List[a], f: a => b): List[b] = + for (val x <- xs) yield f(x); + + def flatMap[a, b](xs: List[a], f: a => List[b]): List[b] = + for (val x <- xs; val y <- f(x)) yield y; + + def filter[a](xs: List[a], p: a => boolean): List[a] = + for (val x <- xs; p(x)) yield x; +} +\end{lstlisting} +Not surprisingly, the translation of the for-comprehension in the body of +\code{Demo.map} will produce a call to \code{map} in class \code{List}. +Similarly, \code{Demo.flatMap} and \code{Demo.filter} translate to +\code{flatMap} and \code{filter} in class \code{List}. + +\begin{exercise} +Define the following function in terms of \code{for}. +\begin{lstlisting} +def flatten[a](xss: List[List[a]]): List[a] = + (xss :\ List()) ((xs, ys) => xs ::: ys) +\end{lstlisting} +\end{exercise} + +\begin{exercise} +Translate +\begin{lstlisting} +for { val b <- books; val a <- b.authors; a startsWith "Bird" } yield b.title +for { val b <- books; (b.title indexOf "Program") >= 0 } yield b.title +\end{lstlisting} +to higher-order functions. +\end{exercise} + +\section{For-Loops}\label{sec:for-loops} + +For-comprehensions resemble for-loops in imperative languages, except +that they produce a list of results. Sometimes, a list of results is +not needed but we would still like the flexibility of generators and +filters in iterations over lists. This is made possible by a variant +of the for-comprehension syntax, which expresses for-loops: +\begin{lstlisting} +for ( $s$ ) $e$ +\end{lstlisting} +This construct is the same as the standard for-comprehension syntax +except that the keyword \code{yield} is missing. The for-loop is +executed by executing the expression $e$ for each element generated +from the sequence of generators and filters $s$.
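+
+To make the execution model concrete, here is a hedged sketch of a
+for-loop with two generators and a filter, written in present-day Scala
+syntax rather than the notation of this text. The names
+\code{ForLoopSketch} and \code{collectEvens} are hypothetical. The loop
+body is run only for its side effect, which here is appending to a
+mutable buffer from the current standard library.
+\begin{lstlisting}
+import scala.collection.mutable.ListBuffer
+
+object ForLoopSketch {
+  // Visits every element of every row and records the even ones.
+  // No yield is used, so the loop itself produces no list; the
+  // result is built up by the side effect on the buffer.
+  def collectEvens(xss: List[List[Int]]): List[Int] = {
+    val out = ListBuffer[Int]()
+    for {
+      xs <- xss
+      x <- xs
+      if x % 2 == 0
+    } out += x
+    out.toList
+  }
+}
+\end{lstlisting}
+Since later generators vary more rapidly than earlier ones, the elements
+are visited row by row, so the buffer keeps the original order.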
+ +As an example, the following expression prints out all elements of a +matrix represented as a list of lists: + \begin{lstlisting} +for (val xs <- xss) { + for (val x <- xs) System.out.print(x + "\t"); + System.out.println() +} +\end{lstlisting} +The translation of for-loops to higher-order methods of class +\code{List} is similar to the translation of for-comprehensions, but +is simpler. Where for-comprehensions translate to \code{map} and +\code{flatMap}, for-loops translate in each case to \code{foreach}. + +\section{Generalizing For} + +We have seen that the translation of for-comprehensions only relies on +the presence of methods \code{map}, \code{flatMap}, and +\code{filter}. Therefore it is possible to apply the same notation to +generators that produce objects other than lists; these objects only +have to support the three key functions \code{map}, \code{flatMap}, +and \code{filter}. + +The standard Scala library has several other abstractions that support +these three methods and with them support for-comprehensions. We will +encounter some of them in the following chapters. As a programmer you +can also use this principle to enable for-comprehensions for types you +define -- these types just need to support methods \code{map}, +\code{flatMap}, and \code{filter}. + +This is useful in many situations, for instance for database +interfaces, XML trees, or optional values. We will see in +Chapter~\ref{sec:parsers-results} how for-comprehensions can be used +in the definition of parsers for context-free grammars that construct +abstract syntax trees. + +One caveat: it is not automatically assured that the result of +translating a for-comprehension is well-typed. To ensure this, the +types of \code{map}, \code{flatMap} and \code{filter} have to be +essentially similar to the types of these methods in class \code{List}. + +To make this precise, assume you have a parameterized class + \code{C[a]} for which you want to enable for-comprehensions. 
Then + \code{C} should define \code{map}, \code{flatMap} and \code{filter} + with the following types: +\begin{lstlisting} +def map[b](f: a => b): C[b] +def flatMap[b](f: a => C[b]): C[b] +def filter(p: a => boolean): C[a] +\end{lstlisting} +It would be attractive to enforce these types statically in the Scala +compiler, for instance by requiring that any type supporting +for-comprehensions implements a standard trait with these methods +\footnote{In the programming language Haskell, which has similar +constructs, this abstraction is called a ``monad with zero''}. The +problem is that such a standard trait would have to abstract over the +identity of the class \code{C}, for instance by taking \code{C} as a +type parameter. Note that this parameter would be a type constructor, +which gets applied to {\em several different} types in the signatures of +methods \code{map} and \code{flatMap}. Unfortunately, the Scala type +system is too weak to express this construct, since it can handle only +type parameters which are fully applied types. + +\chapter{Mutable State} + +Most programs we have presented so far did not have side-effects +\footnote{We ignore here the fact that some of our programs printed to +standard output, which technically is a side effect.}. Therefore, the +notion of {\em time} did not matter. For a program that terminates, +any sequence of actions would have led to the same result! This is +also reflected by the substitution model of computation, where a +rewrite step can be applied anywhere in a term, and all rewritings +that terminate lead to the same solution. In fact, this {\em +confluence} property is a deep result in $\lambda$-calculus, the +theory underlying functional programming. + +In this chapter, we introduce functions with side effects and study +their behavior. We will see that as a consequence we have to +fundamentally modify the substitution model of computation which we +employed so far. 
+ +\section{Stateful Objects} + +We normally view the world as a set of objects, some of which have +state that {\em changes} over time. Normally, state is associated +with a set of variables that can be changed in the course of a +computation. There is also a more abstract notion of state, which +does not refer to particular constructs of a programming language: An +object {\em has state} (or: {\em is stateful}) if its behavior is +influenced by its history. + +For instance, a bank account object has state, because the question +``can I withdraw 100 CHF?'' +might have different answers during the lifetime of the account. + +In Scala, all mutable state is ultimately built from variables. A +variable definition is written like a value definition, but starts +with \verb@var@ instead of \verb@val@. For instance, the following two +definitions introduce and initialize two variables \code{x} and +\code{count}. +\begin{lstlisting} +var x: String = "abc"; +var count = 111; +\end{lstlisting} +Like a value definition, a variable definition associates a name with +a value. But in the case of a variable definition, this association +may be changed later by an assignment. Such assignments are written +as in C or Java. Examples: +\begin{lstlisting} +x = "hello"; +count = count + 1; +\end{lstlisting} +In Scala, every defined variable has to be initialized at the point of +its definition. For instance, the statement ~\code{var x: int;}~ is +{\em not} regarded as a variable definition, because the initializer +is missing\footnote{If a statement like this appears in a class, it is +instead regarded as a variable declaration, which introduces +abstract access methods for the variable, but does not associate these +methods with a piece of state.}. If one does not know, or does not +care about, the appropriate initializer, one can use a wildcard +instead. I.e. 
+\begin{lstlisting} +var x: T = _; +\end{lstlisting} +will initialize \code{x} to some default value (\code{null} for +reference types, \code{false} for booleans, and the appropriate +version of \code{0} for numeric value types). + +Real-world objects with state are represented in Scala by objects that +have variables as members. For instance, here is a class that +represents bank accounts. +\begin{lstlisting} +class BankAccount { + private var balance = 0; + def deposit(amount: int): unit = + if (amount > 0) balance = balance + amount; + + def withdraw(amount: int): int = + if (0 < amount && amount <= balance) { + balance = balance - amount; + balance + } else error("insufficient funds"); +} +\end{lstlisting} +The class defines a variable \code{balance} which contains the current +balance of an account. Methods \code{deposit} and \code{withdraw} +change the value of this variable through assignments. Note that +\code{balance} is \code{private} in class \code{BankAccount} -- hence +it cannot be accessed directly outside the class. + +To create bank accounts, we use the usual object creation notation: +\begin{lstlisting} +val myAccount = new BankAccount +\end{lstlisting} + +\example Here is a \code{scalaint} session that deals with bank +accounts. + +\begin{lstlisting} +> :l bankaccount.scala +loading file 'bankaccount.scala' +> val account = new BankAccount +val account : BankAccount = BankAccount$\Dollar$class@1797795 +> account deposit 50 +(): scala.Unit +> account withdraw 20 +30: scala.Int +> account withdraw 20 +10: scala.Int +> account withdraw 15 +java.lang.RuntimeException: insufficient funds + at error(Predef.scala:3) + at BankAccount$\Dollar$class.withdraw(bankaccount.scala:13) + at <top-level>(console:1) +> +\end{lstlisting} +The example shows that applying the same operation (\code{withdraw +20}) twice to an account yields different results. So, clearly, +accounts are stateful objects. 
+ +\paragraph{Sameness and Change} +Assignments pose new problems in deciding when two expressions are +``the same''. +If assignments are excluded, and one writes +\begin{lstlisting} +val x = E; val y = E; +\end{lstlisting} +where \code{E} is some arbitrary expression, +then \code{x} and \code{y} can reasonably be assumed to be the same. +I.e. one could have equivalently written +\begin{lstlisting} +val x = E; val y = x; +\end{lstlisting} +(This property is usually called {\em referential transparency}). But +once we admit assignments, the two definition sequences are different. +Consider: +\begin{lstlisting} +val x = new BankAccount; val y = new BankAccount; +\end{lstlisting} +To answer the question whether \code{x} and \code{y} are the same, we +need to be more precise about what ``sameness'' means. This meaning is +captured in the notion of {\em operational equivalence}, which, +somewhat informally, is stated as follows. + +Suppose we have two definitions of \code{x} and \code{y}. +To test whether \code{x} and \code{y} define the same value, proceed +as follows. +\begin{itemize} +\item +Execute the definitions followed by an +arbitrary sequence \code{S} of operations that involve \code{x} and +\code{y}. Observe the results (if any). +\item +Then, execute the definitions with another sequence \code{S'} which +results from \code{S} by renaming all occurrences of \code{y} in +\code{S} to \code{x}. +\item +If the results of running \code{S'} differ from those of running +\code{S}, then surely \code{x} and \code{y} are different. +\item +On the other hand, if all possible pairs of sequences \code{(S, S')} +yield the same results, then \code{x} and \code{y} are the same. +\end{itemize} +In other words, operational equivalence regards two definitions +\code{x} and \code{y} as defining the same value, if no possible +experiment can distinguish between \code{x} and \code{y}. An +experiment in this context consists of two versions of an arbitrary +program which use either +\code{x} or \code{y}. 
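+
+The procedure above can be sketched as a small program. This is only
+an illustration in present-day Scala syntax: the helper
+\code{indistinguishable}, the simplified class \code{Account}, and the
+use of \code{scala.util.Try} to turn a run-time error into an
+observable result are assumptions, not part of this text. Note that a
+single test sequence can only show that two definitions differ;
+establishing sameness would require all possible sequences.
+\begin{lstlisting}
+import scala.util.Try
+
+object EquivalenceDemo {
+  class Account {
+    private var balance = 0
+    def deposit(amount: Int): Unit =
+      if (amount > 0) balance += amount
+    def withdraw(amount: Int): Int =
+      if (0 < amount && amount <= balance) { balance -= amount; balance }
+      else sys.error("insufficient funds")
+  }
+
+  // Runs S on freshly defined (x, y), then S' (y renamed to x) on
+  // another fresh pair, and compares the observed results.
+  def indistinguishable[A](define: () => (Account, Account))
+                          (s: (Account, Account) => A): Boolean = {
+    val (x1, y1) = define()
+    val resultS = Try(s(x1, y1)).toOption   // None if S fails
+    val (x2, _) = define()
+    val resultS1 = Try(s(x2, x2)).toOption  // S' uses x for y
+    resultS == resultS1
+  }
+
+  // Test sequence S: deposit into x, then withdraw from y.
+  val s = (x: Account, y: Account) => { x.deposit(30); y.withdraw(20) }
+
+  def main(args: Array[String]): Unit = {
+    // val x = new Account; val y = new Account
+    println(indistinguishable(() => (new Account, new Account))(s))  // false
+    // val x = new Account; val y = x
+    println(indistinguishable(() => { val a = new Account; (a, a) })(s))  // true
+  }
+}
+\end{lstlisting}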
+ +Given this definition, let's test whether +\begin{lstlisting} +val x = new BankAccount; val y = new BankAccount; +\end{lstlisting} +defines values \code{x} and \code{y} which are the same. +Here are the definitions again, followed by a test sequence: + +\begin{lstlisting} +> val x = new BankAccount +> val y = new BankAccount +> x deposit 30 +(): scala.Unit +> y withdraw 20 +java.lang.RuntimeException: insufficient funds +\end{lstlisting} + +Now, rename all occurrences of \code{y} in that sequence to +\code{x}. We get: +\begin{lstlisting} +> val x = new BankAccount +> val y = new BankAccount +> x deposit 30 +(): scala.Unit +> x withdraw 20 +10: scala.Int +\end{lstlisting} +Since the final results are different, we have established that +\code{x} and \code{y} are not the same. +On the other hand, if we define +\begin{lstlisting} +val x = new BankAccount; val y = x +\end{lstlisting} +then no sequence of operations can distinguish between \code{x} and +\code{y}, so \code{x} and \code{y} are the same in this case. + +\paragraph{Assignment and the Substitution Model} +These examples show that our previous substitution model of +computation cannot be used anymore. After all, under this +model we could always replace a value name by its +defining expression. +For instance in +\begin{lstlisting} +val x = new BankAccount; val y = x +\end{lstlisting} +the \code{x} in the definition of \code{y} could +be replaced by \code{new BankAccount}. +But we have seen that this change leads to a different program. +So the substitution model must be invalid, once we add assignments. + +\section{Imperative Control Structures} + +Scala has the \code{while} and \code{do-while} loop constructs known +from the C and Java languages. There is also a single-branch \code{if} +which leaves out the else-part as well as a \code{return} statement which +aborts a function prematurely. This makes it possible to program in a +conventional imperative style. 
For instance, the following function, +which computes the \code{n}'th power of a given parameter \code{x}, is +implemented using \code{while} and single-branch \code{if}. +\begin{lstlisting} +def power (x: double, n: int): double = { + var r = 1.0; + var b = x; + var i = n; + while (i > 0) { + if ((i & 1) == 1) { r = r * b } + if (i > 1) b = b * b; + i = i >> 1; + } + r +} +\end{lstlisting} +These imperative control constructs are in the language for +convenience. They could have been left out, as the same constructs can +be implemented using just functions. As an example, let's develop a +functional implementation of the while loop. \code{whileLoop} should +be a function that takes two parameters: a condition, of type +\code{boolean}, and a command, of type \code{unit}. Both condition and +command need to be passed by-name, so that they are evaluated +repeatedly for each loop iteration. This leads to the following +definition of \code{whileLoop}. +\begin{lstlisting} +def whileLoop(def condition: boolean)(def command: unit): unit = + if (condition) { + command; whileLoop(condition)(command) + } else {} +\end{lstlisting} +Note that \code{whileLoop} is tail recursive, so it operates in +constant stack space. + +\begin{exercise} Write a function \code{repeatLoop}, which should be +applied as follows: +\begin{lstlisting} +repeatLoop { command } ( condition ) +\end{lstlisting} +Is there also a way to obtain a loop syntax like the following? +\begin{lstlisting} +repeatLoop { command } until ( condition ) +\end{lstlisting} +\end{exercise} + +Some other control constructs known from C and Java are missing in +Scala: There are no \code{break} and \code{continue} jumps for loops. +There are also no for-loops in the Java sense -- these have been +replaced by the more general for-loop construct discussed in +Section~\ref{sec:for-loops}. 
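+
+In present-day Scala, by-name parameters are written \code{=> T}
+rather than with the \code{def} prefix used in this text. Under that
+assumption, here is a runnable sketch of a functional
+\code{whileLoop}, used to compute a power by repeated squaring. The
+names \code{WhileLoopDemo} and \code{base} are hypothetical.
+\begin{lstlisting}
+object WhileLoopDemo {
+  // Both arguments are passed by name, so they are re-evaluated
+  // on every iteration.
+  def whileLoop(condition: => Boolean)(command: => Unit): Unit =
+    if (condition) { command; whileLoop(condition)(command) }
+
+  // x to the n-th power (n >= 0) by repeated squaring.
+  def power(x: Double, n: Int): Double = {
+    var r = 1.0
+    var base = x
+    var i = n
+    whileLoop(i > 0) {
+      if ((i & 1) == 1) r = r * base  // lowest bit set: multiply in
+      base = base * base
+      i = i >> 1
+    }
+    r
+  }
+
+  def main(args: Array[String]): Unit =
+    println(power(2.0, 10))  // prints 1024.0
+}
+\end{lstlisting}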
+ +\section{Extended Example: Discrete Event Simulation} + +We now discuss an example that demonstrates how assignments and +higher-order functions can be combined in interesting ways. +We will build a simulator for digital circuits. + +The example is taken from Abelson and Sussman's book +\cite{abelson-sussman:structure}. We augment their basic (Scheme-) +code by an object-oriented structure which allows code-reuse through +inheritance. The example also shows how discrete event simulation programs +in general are structured and built. + +We start with a little language to describe digital circuits. +A digital circuit is built from {\em wires} and {\em function boxes}. +Wires carry signals which are transformed by function boxes. +We will represent signals by the booleans \code{true} and +\code{false}. + +Basic function boxes (or: {\em gates}) are: +\begin{itemize} +\item An \emph{inverter}, which negates its signal +\item An \emph{and-gate}, which sets its output to the conjunction of its input. +\item An \emph{or-gate}, which sets its output to the disjunction of its +input. +\end{itemize} +Other function boxes can be built by combining basic ones. + +Gates have {\em delays}, so an output of a gate will change only some +time after its inputs change. + +\paragraph{A Language for Digital Circuits} + +We describe the elements of a digital circuit by the following set of +Scala classes and functions. + +First, there is a class \code{Wire} for wires. +We can construct wires as follows. +\begin{lstlisting} +val a = new Wire; +val b = new Wire; +val c = new Wire; +\end{lstlisting} +Second, there are functions +\begin{lstlisting} +def inverter(input: Wire, output: Wire): unit +def andGate(a1: Wire, a2: Wire, output: Wire): unit +def orGate(o1: Wire, o2: Wire, output: Wire): unit +\end{lstlisting} +which ``make'' the basic gates we need (as side-effects). +More complicated function boxes can now be built from these. 
+For instance, to construct a half-adder, we can define: + +\begin{lstlisting} + def halfAdder(a: Wire, b: Wire, s: Wire, c: Wire): unit = { + val d = new Wire; + val e = new Wire; + orGate(a, b, d); + andGate(a, b, c); + inverter(c, e); + andGate(d, e, s); + } +\end{lstlisting} +This abstraction can itself be used, for instance in defining a full +adder: +\begin{lstlisting} + def fullAdder(a: Wire, b: Wire, cin: Wire, sum: Wire, cout: Wire) = { + val s = new Wire; + val c1 = new Wire; + val c2 = new Wire; + halfAdder(a, cin, s, c1); + halfAdder(b, s, sum, c2); + orGate(c1, c2, cout); + } +\end{lstlisting} +Class \code{Wire} and functions \code{inverter}, \code{andGate}, and +\code{orGate} represent thus a little language in which users can +define digital circuits. We now give implementations of this class +and these functions, which allow one to simulate circuits. +These implementations are based on a simple and general API for +discrete event simulation. + +\paragraph{The Simulation API} + +Discrete event simulation performs user-defined \emph{actions} at +specified \emph{times}. +An {\em action} is represented as a function which takes no parameters and +returns a \code{unit} result: +\begin{lstlisting} +type Action = () => unit; +\end{lstlisting} +The \emph{time} is simulated; it is not the actual ``wall-clock'' time. + +A concrete simulation will be done inside an object which inherits +from the abstract \code{Simulation} class. This class has the following +signature: + +\begin{lstlisting} +abstract class Simulation { + def currentTime: int; + def afterDelay(delay: int, def action: Action): unit; + def run: unit; +} +\end{lstlisting} +Here, +\code{currentTime} returns the current simulated time as an integer +number, +\code{afterDelay} schedules an action to be performed at a specified +delay after \code{currentTime}, and +\code{run} runs the simulation until there are no further actions to be +performed. 
+ +\paragraph{The Wire Class} +A wire needs to support three basic actions. +\begin{itemize} +\item[] +\code{getSignal: boolean}~~ returns the current signal on the wire. +\item[] +\code{setSignal(sig: boolean): unit}~~ sets the wire's signal to \code{sig}. +\item[] +\code{addAction(p: Action): unit}~~ attaches the specified procedure +\code{p} to the {\em actions} of the wire. All attached action +procedures will be executed every time the signal of a wire changes. +\end{itemize} +Here is an implementation of the \code{Wire} class: +\begin{lstlisting} +class Wire { + private var sigVal = false; + private var actions: List[Action] = List(); + def getSignal = sigVal; + def setSignal(s: boolean) = + if (s != sigVal) { + sigVal = s; + actions.foreach(action => action()); + } + def addAction(a: Action) = { + actions = a :: actions; a() + } +} +\end{lstlisting} +Two private variables make up the state of a wire. The variable +\code{sigVal} represents the current signal, and the variable +\code{actions} represents the action procedures currently attached to +the wire. + +\paragraph{The Inverter Class} +We implement an inverter by installing an action on its input wire, +namely the action which puts the negated input signal onto the output +signal. The action needs to take effect at \code{InverterDelay} +simulated time units after the input changes. This suggests the +following implementation: +\begin{lstlisting} +def inverter(input: Wire, output: Wire) = { + def invertAction() = { + val inputSig = input.getSignal; + afterDelay(InverterDelay, () => output.setSignal(!inputSig)) + } + input addAction invertAction +} +\end{lstlisting} + +\paragraph{The And-Gate Class} +And-gates are implemented analogously to inverters. The action of an +\code{andGate} is to output the conjunction of its input signals. +This should happen at \code{AndGateDelay} simulated time units after +any one of its two inputs changes. 
Hence, the following implementation: +\begin{lstlisting} +def andGate(a1: Wire, a2: Wire, output: Wire) = { + def andAction() = { + val a1Sig = a1.getSignal; + val a2Sig = a2.getSignal; + afterDelay(AndGateDelay, () => output.setSignal(a1Sig & a2Sig)); + } + a1 addAction andAction; + a2 addAction andAction; +} +\end{lstlisting} + +\begin{exercise} Write the implementation of \code{orGate}. +\end{exercise} + +\begin{exercise} Another way is to define an or-gate by a combination of +inverters and and-gates. Define a function \code{orGate} in terms of +\code{andGate} and \code{inverter}. What is the delay time of this function? +\end{exercise} + +\paragraph{The Simulation Class} + +Now, we just need to implement class \code{Simulation}, and we are +done. The idea is that we maintain inside a \code{Simulation} object +an \emph{agenda} of actions to perform. The agenda is represented as +a list of pairs of actions and the times they need to be run. The +agenda list is sorted, so that earlier actions come before later ones. +\begin{lstlisting} +class Simulation { + private type Agenda = List[Pair[int, Action]]; + private var agenda: Agenda = List(); +\end{lstlisting} +There is also a private variable \code{curtime} to keep track of the +current simulated time. +\begin{lstlisting} + private var curtime = 0; +\end{lstlisting} +An application of the method \code{afterDelay(delay, action)} +inserts the pair \code{(curtime + delay, action)} into the +\code{agenda} list at the appropriate place. 
+\begin{lstlisting} + def afterDelay(delay: int, def action: Action): unit = { + val actiontime = curtime + delay; + def insertAction(ag: Agenda): Agenda = ag match { + case List() => + Pair(actiontime, action) :: ag + case (first @ Pair(time, act)) :: ag1 => + if (actiontime < time) Pair(actiontime, action) :: ag + else first :: insertAction(ag1) + } + agenda = insertAction(agenda) + } +\end{lstlisting} +An application of the \code{run} method removes successive elements +from the \code{agenda}, advances \code{curtime} to the time of each +element, and performs its action. +It continues until the agenda is empty: +\begin{lstlisting} +def run = { + afterDelay(0, () => System.out.println("*** simulation started ***")); + def loop: unit = agenda match { + case List() => + case Pair(time, action) :: agenda1 => + agenda = agenda1; curtime = time; action(); loop + } + loop +} +\end{lstlisting} + + +\paragraph{Running the Simulator} +To run the simulator, we still need a way to inspect changes of +signals on wires. For this purpose, we write a function \code{probe}. +\begin{lstlisting} +def probe(name: String, wire: Wire): unit = { + wire addAction (() => + System.out.println( + name + " " + currentTime + " new_value = " + wire.getSignal) + ) +} +\end{lstlisting} +Now, to see the simulator in action, let's define four wires, and place +probes on two of them: +\begin{lstlisting} +> val input1 = new Wire +> val input2 = new Wire +> val sum = new Wire +> val carry = new Wire + +> probe("sum", sum) +sum 0 new_value = false +> probe("carry", carry) +carry 0 new_value = false +\end{lstlisting} +Now let's define a half-adder connecting the wires: +\begin{lstlisting} +> halfAdder(input1, input2, sum, carry); +\end{lstlisting} +Finally, set one after another the signals on the two input wires to +\code{true} and run the simulation. 
+\begin{lstlisting} +> input1 setSignal true; run +*** simulation started *** +sum 8 new_value = true +> input2 setSignal true; run +carry 11 new_value = true +sum 15 new_value = false +\end{lstlisting} + +\section{Summary} + +We have seen in this chapter the constructs that let us model state in +Scala -- these are variables, assignments, and imperative control +structures. State and assignment complicate our mental model of +computation. In particular, referential transparency is lost. On the +other hand, assignment gives us new ways to formulate programs +elegantly. As always, it depends on the situation whether purely +functional programming or programming with assignments works best. + +\chapter{Computing with Streams} + +The previous chapters have introduced variables, assignment and +stateful objects. We have seen how real-world objects that change +with time can be modelled by changing the state of variables in a +computation. Time changes in the real world thus are modelled by time +changes in program execution. Of course, such time changes are usually +stretched out or compressed, but their relative order is the same. +This seems quite natural, but there is also a price to pay: Our simple +and powerful substitution model for functional computation is no +longer applicable once we introduce variables and assignment. + +Is there another way? Can we model state change in the real world +using only immutable functions? Taking mathematics as a guide, the +answer is clearly yes: A time-changing quantity is simply modelled by +a function \code{f(t)} with a time parameter \code{t}. The same can be +done in computation. Instead of overwriting a variable with successive +values, we represent all these values as successive elements in a +list. So, a mutable variable \code{var x: T} gets replaced by an +immutable value \code{val x: List[T]}. 
In a sense, we trade space for +time -- the different values of the variable now all exist concurrently +as different elements of the list. One advantage of the list-based +view is that we can ``time-travel'', i.e. view several successive +values of the variable at the same time. Another advantage is that we +can make use of the powerful library of list processing functions, +which often simplifies computation. For instance, consider the +imperative way to compute the sum of all prime numbers in an interval: +\begin{lstlisting} +def sumPrimes(start: int, end: int): int = { + var i = start; + var acc = 0; + while (i < end) { + if (isPrime(i)) acc = acc + i; + i = i + 1; + } + acc +} +\end{lstlisting} +Note that the variable \code{i} ``steps through'' all values of the interval +\code{[start .. end-1]}. + +A more functional way is to represent the list of values of variable \code{i} directly as \code{range(start, end)}. Then the function can be rewritten as follows. +\begin{lstlisting} +def sumPrimes(start: int, end: int) = + sum(range(start, end) filter isPrime); +\end{lstlisting} + +No contest which program is shorter and clearer! However, the +functional program is also considerably less efficient since it +constructs a list of all numbers in the interval, and then another one +for the prime numbers. Even worse from an efficiency point of view is +the following example: + +To find the second prime number between \code{1000} and \code{10000}: +\begin{lstlisting} + range(1000, 10000) filter isPrime at 1 +\end{lstlisting} +Here, the list of all numbers between \code{1000} and \code{10000} is +constructed. But most of that list is never inspected! + +However, we can obtain efficient execution for examples like these by +a trick: +\begin{quote} +%\red + Avoid computing the tail of a sequence unless that tail is actually + necessary for the computation. +\end{quote} +We define a new class for such sequences, which is called \code{Stream}. 
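+
+To illustrate the trick, here is a minimal sketch of how a sequence
+with a delayed tail could be defined in present-day Scala. It is not
+the definition of \code{Stream} in the Scala library; the names
+\code{LazySeq}, \code{LazySeqDemo} and \code{built} are hypothetical.
+The tail is passed by name and stored in a \code{lazy val}, so it is
+computed at most once, and only when \code{tail} is first called.
+\begin{lstlisting}
+object LazySeqDemo {
+  trait LazySeq[+A] {
+    def isEmpty: Boolean
+    def head: A
+    def tail: LazySeq[A]
+  }
+
+  val empty: LazySeq[Nothing] = new LazySeq[Nothing] {
+    def isEmpty = true
+    def head: Nothing = sys.error("head of empty sequence")
+    def tail: LazySeq[Nothing] = sys.error("tail of empty sequence")
+  }
+
+  // tl is a by-name parameter: it is not evaluated here.
+  def cons[A](hd: A, tl: => LazySeq[A]): LazySeq[A] = new LazySeq[A] {
+    def isEmpty = false
+    def head: A = hd
+    lazy val tail: LazySeq[A] = tl  // evaluated on first access only
+  }
+
+  var built = 0  // counts how many cells have been constructed so far
+
+  def range(start: Int, end: Int): LazySeq[Int] =
+    if (start >= end) empty
+    else { built += 1; cons(start, range(start + 1, end)) }
+
+  def main(args: Array[String]): Unit = {
+    val s = range(1000, 10000)
+    println(built)             // prints 1: only the first cell exists
+    println(s.tail.tail.head)  // prints 1002
+    println(built)             // prints 3: two tails were demanded
+  }
+}
+\end{lstlisting}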
+ +Streams are created using the constant \code{empty} and the constructor \code{cons}, +which are both defined in module \code{scala.Stream}. For instance, the following +expression constructs a stream with elements \code{1} and \code{2}: +\begin{lstlisting} +Stream.cons(1, Stream.cons(2, Stream.empty)) +\end{lstlisting} +As another example, here is the analogue of \code{List.range}, +but returning a stream instead of a list: +\begin{lstlisting} +def range(start: Int, end: Int): Stream[Int] = + if (start >= end) Stream.empty + else Stream.cons(start, range(start + 1, end)); +\end{lstlisting} +(This function is also defined as given above in module +\code{Stream}). Even though \code{Stream.range} and \code{List.range} +look similar, their execution behavior is completely different: + +\code{Stream.range} immediately returns with a \code{Stream} object +whose first element is \code{start}. All other elements are computed +only when they are \emph{demanded} by calling the \code{tail} method +(which might be never at all). + +Streams are accessed just like lists. As for lists, the basic access +methods are \code{isEmpty}, \code{head} and \code{tail}. For instance, +we can print all elements of a stream as follows. +\begin{lstlisting} +def print[a](xs: Stream[a]): unit = + if (!xs.isEmpty) { System.out.println(xs.head); print(xs.tail) } +\end{lstlisting} +Streams also support almost all other methods defined on lists (see +below for where their method sets differ). For instance, we can find +the second prime number between \code{1000} and \code{10000} by applying methods +\code{filter} and \code{apply} on an interval stream: +\begin{lstlisting} + Stream.range(1000, 10000) filter isPrime at 1 +\end{lstlisting} +The difference to the previous list-based implementation is that now +we do not needlessly construct and test for primality any numbers +beyond 1013, the second prime in the interval. 
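+
+The saving can be observed by counting primality tests. The following
+sketch assumes present-day Scala (2.13 or later), where the library's
+lazily evaluated sequence is called \code{LazyList}; the names
+\code{SecondPrimeDemo}, \code{tested} and the simple trial-division
+\code{isPrime} are hypothetical.
+\begin{lstlisting}
+object SecondPrimeDemo {
+  var tested = 0  // counts calls to isPrime
+
+  // Trial division; adequate for the small numbers used here.
+  def isPrime(n: Int): Boolean = {
+    tested += 1
+    n >= 2 && (2 to math.sqrt(n.toDouble).toInt).forall(n % _ != 0)
+  }
+
+  def main(args: Array[String]): Unit = {
+    // Lazy version: elements are filtered only as far as demanded.
+    val second = LazyList.range(1000, 10000).filter(isPrime).apply(1)
+    println(second)  // prints 1013
+    println(tested)  // number of candidates that were actually tested
+
+    // Strict version for comparison: every element is tested.
+    tested = 0
+    List.range(1000, 10000).filter(isPrime).apply(1)
+    println(tested)  // prints 9000
+  }
+}
+\end{lstlisting}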
+ +\paragraph{Consing and appending streams} Two methods in class \code{List} +which are not supported by class \code{Stream} are \code{::} and +\code{:::}. The reason is that these methods are dispatched on their +right-hand side argument, which means that this argument needs to be +evaluated before the method is called. For instance, in the case of +\code{x :: xs} on lists, the tail \code{xs} needs to be evaluated +before \code{::} can be called and the new list can be constructed. +This does not work for streams, where we require that the tail of a +stream should not be evaluated until it is demanded by a \code{tail} operation. +The argument why list-append \code{:::} cannot be adapted to streams is analogous. + +Instead of \code{x :: xs}, one uses \code{Stream.cons(x, xs)} for +constructing a stream with first element \code{x} and (unevaluated) +rest \code{xs}. Instead of \code{xs ::: ys}, one uses the operation +\code{xs append ys}. + +\chapter{Iterators} + +Iterators are the imperative version of streams. Like streams, +iterators describe potentially infinite lists. However, there is no +data-structure which contains the elements of an iterator. Instead, +iterators allow one to step through the sequence, using two abstract methods \code{next} and \code{hasNext}. +\begin{lstlisting} +trait Iterator[+a] { + def hasNext: boolean; + def next: a; +\end{lstlisting} +Method \code{next} returns successive elements. Method \code{hasNext} +indicates whether there are still more elements to be returned by +\code{next}. Iterators also support some other methods, which are +explained later. + +As an example, here is an application which prints the squares of all +numbers from 1 to 99. 
+\begin{lstlisting} +var it: Iterator[int] = Iterator.range(1, 100); +while (it.hasNext) { + val x = it.next; + System.out.println(x * x) +} +\end{lstlisting} + +\section{Iterator Methods} + +Iterators support a rich set of methods besides \code{next} and +\code{hasNext}, which are described in the following. Many of these +methods mimic a corresponding functionality in lists. + +\paragraph{Append} +Method \code{append} constructs an iterator which resumes with the +given iterator \code{that} after the current iterator has finished. +\begin{lstlisting} + def append[b >: a](that: Iterator[b]): Iterator[b] = new Iterator[b] { + def hasNext = Iterator.this.hasNext || that.hasNext; + def next = if (Iterator.this.hasNext) Iterator.this.next else that.next; + } +\end{lstlisting} +The terms \code{Iterator.this.next} and \code{Iterator.this.hasNext} +in the definition of \code{append} call the corresponding methods as +they are defined in the enclosing \code{Iterator} class. If the +\code{Iterator} prefix to \code{this} had been missing, +\code{hasNext} and \code{next} would have called recursively the +methods being defined in the result of \code{append}, which is not +what we want. + +\paragraph{Map, FlatMap, Foreach} Method \code{map} +constructs an iterator which returns all elements of the original +iterator transformed by a given function \code{f}. +\begin{lstlisting} + def map[b](f: a => b): Iterator[b] = new Iterator[b] { + def hasNext = Iterator.this.hasNext; + def next = f(Iterator.this.next) + } +\end{lstlisting} +Method \code{flatMap} is like method \code{map}, except that the +transformation function \code{f} now returns an iterator. +The result of \code{flatMap} is the iterator resulting from appending +together all iterators returned from successive calls of \code{f}. 
+\begin{lstlisting} + def flatMap[b](f: a => Iterator[b]): Iterator[b] = new Iterator[b] { + private var cur: Iterator[b] = Iterator.empty; + def hasNext: Boolean = + if (cur.hasNext) true + else if (Iterator.this.hasNext) { cur = f(Iterator.this.next); hasNext } + else false; + def next: b = + if (cur.hasNext) cur.next + else if (Iterator.this.hasNext) { cur = f(Iterator.this.next); next } + else error("next on empty iterator"); + } +\end{lstlisting} +Closely related to \code{map} is the \code{foreach} method, which +applies a given function to all elements of an iterator, but does not +construct a list of results: +\begin{lstlisting} + def foreach(f: a => Unit): Unit = + while (hasNext) { f(next) } +\end{lstlisting} + +\paragraph{Filter} Method \code{filter} constructs an iterator which +returns all elements of the original iterator that satisfy a criterion +\code{p}. +\begin{lstlisting} + def filter(p: a => Boolean) = new BufferedIterator[a] { + private val source = + Iterator.this.buffered; + private def skip: Unit = + while (source.hasNext && !p(source.head)) { source.next; () } + def hasNext: Boolean = + { skip; source.hasNext } + def next: a = + { skip; source.next } + def head: a = + { skip; source.head; } + } +\end{lstlisting} +In fact, \code{filter} returns instances of a subclass of iterators +which are ``buffered''. A \code{BufferedIterator} object is an +iterator which has in addition a method \code{head}. This method +returns the element which would otherwise have been returned by +\code{next}, but does not advance beyond that element. Hence, the +element returned by \code{head} is returned again by the next call to +\code{head} or \code{next}. Here is the definition of the +\code{BufferedIterator} trait. 
+\begin{lstlisting} +trait BufferedIterator[+a] extends Iterator[a] { + def head: a +} +\end{lstlisting} +Since \code{map}, \code{flatMap}, \code{filter}, and \code{foreach} +exist for iterators, it follows that for-comprehensions and for-loops +can also be used on iterators. For instance, the application which prints the squares of numbers between 1 and 100 could have equivalently been expressed as follows. +\begin{lstlisting} +for (val i <- Iterator.range(1, 100)) + System.out.println(i * i); +\end{lstlisting} + +\paragraph{Zip} Method \code{zip} takes another iterator and +returns an iterator consisting of pairs of corresponding elements +returned by the two iterators. +\begin{lstlisting} + def zip[b](that: Iterator[b]) = new Iterator[Pair[a, b]] { + def hasNext = Iterator.this.hasNext && that.hasNext; + def next = Pair(Iterator.this.next, that.next); + } +} +\end{lstlisting} + +\section{Constructing Iterators} + +Concrete iterators need to provide implementations for the two +abstract methods \code{next} and \code{hasNext} in class +\code{Iterator}. The simplest iterator is \code{Iterator.empty} which +always returns an empty sequence: +\begin{lstlisting} +object Iterator { + object empty extends Iterator[All] { + def hasNext = false; + def next: All = error("next on empty iterator"); + } +\end{lstlisting} +A more interesting iterator enumerates all elements of an array. This +iterator is constructed by the \code{fromArray} method, which is also defined in the object \code{Iterator}: +\begin{lstlisting} + def fromArray[a](xs: Array[a]) = new Iterator[a] { + private var i = 0; + def hasNext: Boolean = + i < xs.length; + def next: a = + if (i < xs.length) { val x = xs(i) ; i = i + 1 ; x } + else error("next on empty iterator"); + } +\end{lstlisting} +Another iterator enumerates an integer interval. The +\code{Iterator.range} function returns an iterator which traverses a +given interval of integer values. It is defined as follows. 
+\begin{lstlisting} +object Iterator { + def range(start: int, end: int) = new Iterator[int] { + private var current = start; + def hasNext = current < end; + def next = { + val r = current; + if (current < end) current = current + 1 + else throw new Error("end of iterator"); + r + } + } +} +\end{lstlisting} +All iterators seen so far terminate eventually. It is also possible to +define iterators that go on forever. For instance, the following +iterator returns successive integers from some start +value\footnote{Due to the finite representation of type \prog{int}, +numbers will wrap around at $2^{31}$.}. +\begin{lstlisting} +def from(start: int) = new Iterator[int] { + private var last = start - 1; + def hasNext = true; + def next = { last = last + 1; last } +} +\end{lstlisting} + +\section{Using Iterators} + +Here are two more examples of how iterators are used. First, to print all +elements of an array \code{xs: Array[int]}, one can write: +\begin{lstlisting} + Iterator.fromArray(xs) foreach (x => + System.out.println(x)) +\end{lstlisting} +Or, using a for-comprehension: +\begin{lstlisting} + for (val x <- Iterator.fromArray(xs)) + System.out.println(x) +\end{lstlisting} +As a second example, consider the problem of finding the indices of +all the elements in an array of \code{double}s greater than some +\code{limit}. The indices should be returned as an iterator. +This is achieved by the following expression. +\begin{lstlisting} +import Iterator._; +fromArray(xs) +.zip(from(0)) +.filter(case Pair(x, i) => x > limit) +.map(case Pair(x, i) => i) +\end{lstlisting} +Or, using a for-comprehension: +\begin{lstlisting} +import Iterator._; +for (val Pair(x, i) <- fromArray(xs) zip from(0); x > limit) +yield i +\end{lstlisting} + + + + + + + +\chapter{Combinator Parsing}\label{sec:combinator-parsing} + +In this chapter we describe how to write combinator parsers in +Scala. 
Such parsers are constructed from predefined higher-order +functions, so-called {\em parser combinators}, that closely model the +constructions of an EBNF grammar \cite{wirth:ebnf}. + +As a running example, we consider parsers for possibly nested +lists of identifiers and numbers, which +are described by the following context-free grammar. +\bda{p{3cm}cp{10cm}} +letter &::=& /* all letters */ \\ +digit &::=& /* all digits */ \\[0.5em] +ident &::=& letter \{letter $|$ digit \}\\ +number &::=& digit \{digit\}\\[0.5em] +list &::=& `(' [listElems] `)' \\ +listElems &::=& expr [`,' listElems] \\ +expr &::=& ident $|$ number $|$ list + +\eda + +\section{Simple Combinator Parsing} + +In this section we will only be concerned with the task of recognizing +input strings, not with processing them. So we can describe parsers +by the sets of input strings they accept. There are two +fundamental operators over parsers: +\code{&&&} expresses the sequential composition of a parser with +another, while \code{|||} expresses an alternative. These operations +will both be defined as methods of a \code{Parser} class. We will +also define constructors for the following primitive parsers: + +\begin{tabular}{ll} +\code{empty} & The parser that accepts the empty string +\\ +\code{fail} & The parser that accepts no string +\\ +\code{chr(c: char)} + & The parser that accepts the single-character string ``$c$''. +\\ +\code{chr(p: char => boolean)} + & The parser that accepts single-character strings + ``$c$'' \\ + & for which $p(c)$ is true. +\end{tabular} + +There are also the two higher-order parser combinators \code{opt}, +expressing optionality, and \code{rep}, expressing repetition. +For any parser $p$, \code{opt(}$p$\code{)} yields a parser that +accepts the strings accepted by $p$ or else the empty string, while +\code{rep(}$p$\code{)} accepts arbitrary sequences of the strings accepted by +$p$. 
In EBNF, \code{opt(}$p$\code{)} corresponds to $[p]$ and +\code{rep(}$p$\code{)} corresponds to $\{p\}$. + +The central idea of parser combinators is that parsers can be produced +by a straightforward rewrite of the grammar, replacing \code{::=} with +\code{=}, sequencing with +\code{&&&}, choice +\code{|} with \code{|||}, repetition \code{\{...\}} with +\code{rep(...)} and optional occurrence \code{[...]} with \code{opt(...)}. +Applying this process to the grammar of lists +yields the following class. +\begin{lstlisting} +abstract class ListParsers extends Parsers { + def chr(p: char => boolean): Parser; + def chr(c: char): Parser = chr(d: char => d == c); + + def letter : Parser = chr(Character.isLetter); + def digit : Parser = chr(Character.isDigit); + + def ident : Parser = letter &&& rep(letter ||| digit); + def number : Parser = digit &&& rep(digit); + def list : Parser = chr('(') &&& opt(listElems) &&& chr(')'); + def listElems : Parser = expr &&& (chr(',') &&& listElems ||| empty); + def expr : Parser = ident ||| number ||| list; +} +\end{lstlisting} +This class isolates the grammar from other aspects of parsing. It +abstracts over the type of input +and over the method used to parse a single character +(represented by the abstract method \code{chr(p: char => +boolean)}). The missing bits of information need to be supplied by code +applying the parser class. + +It remains to explain how to implement a library with the combinators +described above. We will pack combinators and their underlying +implementation in a base class \code{Parsers}, which is inherited by +\code{ListParsers}. The first question to decide is which underlying +representation type to use for a parser. We treat parsers here +essentially as functions that take a datum of the input type +\code{intype} and that yield a parse result of type +\code{Option[intype]}. The \code{Option} type is predefined as +follows. 
+\begin{lstlisting} +trait Option[+a]; +case object None extends Option[All]; +case class Some[a](x: a) extends Option[a]; +\end{lstlisting} +A parser applied to some input either succeeds or fails. If it fails, +it returns the constant \code{None}. If it succeeds, it returns a +value of the form \code{Some(in1)} where \code{in1} represents the +input that remains to be parsed. +\begin{lstlisting} +abstract class Parsers { + type intype; + abstract class Parser { + type Result = Option[intype]; + def apply(in: intype): Result; +\end{lstlisting} +A parser also implements the combinators +for sequence and alternative: +\begin{lstlisting} + /*** p &&& q applies first p, and if that succeeds, then q + */ + def &&& (def q: Parser) = new Parser { + def apply(in: intype): Result = Parser.this.apply(in) match { + case None => None + case Some(in1) => q(in1) + } + } + + /*** p ||| q applies first p, and, if that fails, then q. + */ + def ||| (def q: Parser) = new Parser { + def apply(in: intype): Result = Parser.this.apply(in) match { + case None => q(in) + case s => s + } + } +\end{lstlisting} +The implementations of the primitive parsers \code{empty} and \code{fail} +are trivial: +\begin{lstlisting} + val empty = new Parser { def apply(in: intype): Result = Some(in) } + val fail = new Parser { def apply(in: intype): Result = None } +\end{lstlisting} +The higher-order parser combinators \code{opt} and \code{rep} can be +defined in terms of the combinators for sequence and alternative: +\begin{lstlisting} + def opt(p: Parser): Parser = p ||| empty; // p? = (p | <empty>) + def rep(p: Parser): Parser = opt(rep1(p)); // p* = [p+] + def rep1(p: Parser): Parser = p &&& rep(p); // p+ = p p* +} // end Parser +\end{lstlisting} +To run combinator parsers, we still need to decide on a way to handle +parser input. Several possibilities exist: The input could be +represented as a list, as an array, or as a random access file. 
Note +that the presented combinator parsers use backtracking to change from +one alternative to another. Therefore, it must be possible to reset +input to a point that was previously parsed. If one restricted the +focus to LL(1) grammars, a non-backtracking implementation of the +parser combinators in class \code{Parsers} would also be possible. In +that case sequential input methods based on (say) iterators or +sequential files would also be possible. + +In our example, we represent the input by a pair of a string, which +contains the input phrase as a whole, and an index, which represents +the portion of the input which has not yet been parsed. Since the +input string does not change, just the index needs to be passed around +as a result of individual parse steps. This leads to the following +class of parsers that read strings: +\begin{lstlisting} +class ParseString(s: String) extends Parsers { + type intype = int; + def chr(p: char => boolean) = new Parser { + def apply(in: int): Parser#Result = + if (in < s.length() && p(s charAt in)) Some(in + 1); + else None; + } + val input = 0; +} +\end{lstlisting} +This class implements a method \code{chr(p: char => boolean)} and a +value \code{input}. The \code{chr} method builds a parser that either +reads a single character satisfying the given predicate \code{p} or +fails. All other parsers over strings are ultimately implemented in +terms of that method. The \code{input} value represents the input as a +whole. In our case, it is simply the value \code{0}, the start index of +the string to be read. + +Note \code{apply}'s result type, \code{Parser#Result}. This syntax +selects the type element \code{Result} of the type \code{Parser}. It +thus corresponds roughly to selecting a static inner class from some +outer class in Java. Note that we could {\em not} have written +\code{Parser.Result}, as the latter would express selection of the +\code{Result} element from a {\em value} named \code{Parser}. 
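
The index-based input representation can be tried out in a few lines.
The following is only a minimal sketch under stated assumptions: it is
written in present-day Scala syntax rather than the syntax of the
listings above, it folds \code{Parsers} and \code{ParseString} into a
single class for brevity, and the enclosing object name
\code{StringRecognizerSketch} is hypothetical.
\begin{lstlisting}
object StringRecognizerSketch {
  class ParseString(s: String) {
    // A parser maps a start index to the index just past the match,
    // or to None if it fails.
    abstract class Parser { self =>
      def apply(in: Int): Option[Int]
      // Sequence: run this parser, then q on the remaining input.
      def &&&(q: => Parser): Parser = new Parser {
        def apply(in: Int): Option[Int] = self(in).flatMap(in1 => q(in1))
      }
      // Alternative: on failure, backtrack to the same index and try q.
      def |||(q: => Parser): Parser = new Parser {
        def apply(in: Int): Option[Int] = self(in).orElse(q(in))
      }
    }
    // Accepts one character satisfying p, advancing the index by one.
    def chr(p: Char => Boolean): Parser = new Parser {
      def apply(in: Int): Option[Int] =
        if (in < s.length && p(s.charAt(in))) Some(in + 1) else None
    }
    val input = 0
  }

  def main(args: Array[String]): Unit = {
    val ps = new ParseString("a1+")
    val twoChars = ps.chr(_.isLetter) &&& ps.chr(_.isLetterOrDigit)
    println(twoChars(ps.input))                                   // Some(2)
    println((ps.chr(_.isDigit) ||| ps.chr(_.isLetter))(ps.input)) // Some(1)
  }
}
\end{lstlisting}
Backtracking costs nothing in this representation: the alternative
simply reuses the index \code{in} it was given, since the string itself
is never modified.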
+ +We have now extended the root class \code{Parsers} in two different +directions: Class \code{ListParsers} defines a grammar of phrases to +be parsed, whereas class \code{ParseString} defines a method by which +such phrases are input. To write a concrete parsing application, we +need to define both grammar and input method. We do this by combining +two extensions of \code{Parsers} using a {\em mixin composition}. +Here is the start of a sample application: +\begin{lstlisting} +object Test { + def main(args: Array[String]): unit = { + val ps = new ListParsers with ParseString(args(0)); +\end{lstlisting} +The last line above creates a new family of parsers by composing class +\code{ListParsers} with class \code{ParseString}. The two classes +share the common superclass \code{Parsers}. The abstract method +\code{chr} in \code{ListParsers} is implemented by class \code{ParseString}. + +To run the parser, we apply the start symbol of the grammar, +\code{expr}, to the argument \code{ps.input} and observe the result: +\begin{lstlisting} + ps.expr(ps.input) match { + case Some(n) => + System.out.println("parsed: " + args(0).substring(0, n)); + case None => + System.out.println("nothing parsed"); + } + } +}// end Test +\end{lstlisting} +Note the syntax ~\code{ps.expr(ps.input)}, which treats the \code{expr} +parser as if it were a function. In Scala, objects with \code{apply} +methods can be applied directly to arguments as if they were functions. + +Here is an example run of the program above: +\begin{lstlisting} +> java examples.Test "(x,1,(y,z))" +parsed: (x,1,(y,z)) +> java examples.Test "(x,,1,(y,z))" +nothing parsed +\end{lstlisting} + +\section{\label{sec:parsers-results}Parsers that Produce Results} + +The combinator library of the previous section does not support the +generation of output from parsing. 
But usually one does not just want +to check whether a given string belongs to the defined language, one +also wants to convert the input string into some internal +representation such as an abstract syntax tree. + +In this section, we modify our parser library to build parsers that +produce results. We will make use of the for-comprehensions introduced +in Chapter~\ref{sec:for-notation}. The basic combinator of sequential +composition, formerly ~\code{p &&& q}, now becomes +\begin{lstlisting} +for (val x <- p; val y <- q) yield e . +\end{lstlisting} +Here, the names \code{x} and \code{y} are bound to the results of +executing the parsers \code{p} and \code{q}. \code{e} is an expression +that uses these results to build the tree returned by the composed +parser. + +Before describing the implementation of the new parser combinators, we +explain how the new building blocks are used. Say we want to modify +our list parser so that it returns an abstract syntax tree of the +parsed expression. Syntax trees are given by the following class hierarchy: +\begin{lstlisting} +abstract class Tree{} +case class Id (s: String) extends Tree {} +case class Num(n: int) extends Tree {} +case class Lst(elems: List[Tree]) extends Tree {} +\end{lstlisting} +That is, a syntax tree is an identifier, an integer number, or a +\code{Lst} node with a list of trees as descendants. + +As a first step towards parsers that produce results we define three +little parsers that return a single read character as result. +\begin{lstlisting} +abstract class CharParsers extends Parsers { + def any: Parser[char]; + def chr(ch: char): Parser[char] = + for (val c <- any; c == ch) yield c; + def chr(p: char => boolean): Parser[char] = + for (val c <- any; p(c)) yield c; +} +\end{lstlisting} +The \code{any} parser succeeds with the first character of remaining +input as long as input is nonempty. 
It is abstract in class +\code{CharParsers} since we want to abstract in this class from the +concrete input method used. The two \code{chr} parsers return as before +the first input character if it equals a given character or matches a +given predicate. They are now implemented in terms of \code{any}. + +The next level is represented by parsers reading identifiers, numbers +and lists. Here is a parser for identifiers. +\begin{lstlisting} +class ListParsers extends CharParsers { + def ident: Parser[Tree] = + for ( + val c: char <- chr(Character.isLetter); + val cs: List[char] <- rep(chr(Character.isLetterOrDigit)) + ) yield Id((c :: cs).mkString("", "", "")); +\end{lstlisting} +Remark: Because \code{chr(...)} returns a single character, its +repetition \code{rep(chr(...))} returns a list of characters. The +\code{yield} part of the for-comprehension converts all intermediate +results into an \code{Id} node with a string as element. To convert +the read characters into a string, it conses them into a single list, +and invokes the \code{mkString} method on the result. + +Here is a parser for numbers: +\begin{lstlisting} + def number: Parser[Tree] = + for ( + val d: char <- chr(Character.isDigit); + val ds: List[char] <- rep(chr(Character.isDigit)) + ) yield Num(((d - '0') /: ds) ((x, digit) => x * 10 + digit - '0')); +\end{lstlisting} +Intermediate results are in this case the leading digit of +the read number, followed by a list of remaining digits. The +\code{yield} part of the for-comprehension reduces these to a number +by a fold-left operation. 
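
The two conversions performed in the \code{yield} parts can be checked
in isolation. The following is a small sketch in present-day Scala
syntax, where \code{foldLeft} plays the role of the \code{/:} operator
used above; the object name \code{YieldConversions} and the helper names
\code{toIdent} and \code{toNumber} are hypothetical and not part of the
parser library.
\begin{lstlisting}
object YieldConversions {
  // Joins the leading character and the remaining ones into a string,
  // as the yield part of the ident parser does.
  def toIdent(c: Char, cs: List[Char]): String = (c :: cs).mkString

  // Folds the digits from left to right, starting with the value of
  // the leading digit, as the yield part of the number parser does.
  def toNumber(d: Char, ds: List[Char]): Int =
    ds.foldLeft(d - '0')((x, digit) => x * 10 + (digit - '0'))

  def main(args: Array[String]): Unit = {
    println(toIdent('x', List('1', 'y')))   // x1y
    println(toNumber('4', List('0', '7')))  // 407
  }
}
\end{lstlisting}
For the digits 4, 0, 7 the fold computes $4$, then $4 \cdot 10 + 0 = 40$,
then $40 \cdot 10 + 7 = 407$.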
+ +Here is a parser for lists: +\begin{lstlisting} + def list: Parser[Tree] = + for ( + val _ <- chr('('); + val es <- listElems ||| succeed(List()); + val _ <- chr(')') + ) yield Lst(es); + + def listElems: Parser[List[Tree]] = + for ( + val x <- expr; + val xs <- chr(',') &&& listElems ||| succeed(List()) + ) yield x :: xs; +\end{lstlisting} +The \code{list} parser returns a \code{Lst} node with a list of trees +as elements. That list is either the result of \code{listElems}, or, +if that fails, the empty list (expressed here as: the result of a +parser which always succeeds with the empty list as result). + +The highest level of our grammar is represented by function +\code{expr}: +\begin{lstlisting} + def expr: Parser[Tree] = + ident ||| number ||| list +}// end ListParsers. +\end{lstlisting} +We now present the parser combinators that support the new +scheme. Parsers that succeed now return a parse result besides the +un-consumed input. +\begin{lstlisting} +abstract class Parsers { + type intype; + trait Parser[a] { + type Result = Option[Pair[a, intype]]; + def apply(in: intype): Result; +\end{lstlisting} +Parsers are parameterized with the type of their result. The class +\code{Parser[a]} now defines new methods \code{map}, \code{flatMap} +and \code{filter}. The \code{for} expressions are mapped by the +compiler to calls of these functions using the scheme described in +Chapter~\ref{sec:for-notation}. For parsers, these methods are +implemented as follows. 
+\begin{lstlisting} + def filter(pred: a => boolean) = new Parser[a] { + def apply(in: intype): Result = Parser.this.apply(in) match { + case None => None + case Some(Pair(x, in1)) => if (pred(x)) Some(Pair(x, in1)) else None + } + } + def map[b](f: a => b) = new Parser[b] { + def apply(in: intype): Result = Parser.this.apply(in) match { + case None => None + case Some(Pair(x, in1)) => Some(Pair(f(x), in1)) + } + } + def flatMap[b](f: a => Parser[b]) = new Parser[b] { + def apply(in: intype): Result = Parser.this.apply(in) match { + case None => None + case Some(Pair(x, in1)) => f(x).apply(in1) + } + } +\end{lstlisting} +The \code{filter} method takes as parameter a predicate $p$ which it +applies to the results of the current parser. If the predicate is +false, the parser fails by returning \code{None}; otherwise it returns +the result of the current parser. The \code{map} method takes as +parameter a function $f$ which it applies to the results of the +current parser. The \code{flatMap} takes as parameter a function +\code{f} which returns a parser. It applies \code{f} to the result of +the current parser and then continues with the resulting parser. The +\code{|||} method is essentially defined as before. The +\code{&&&} method can now be defined in terms of \code{for}. +\begin{lstlisting} + def ||| (def p: Parser[a]) = new Parser[a] { + def apply(in: intype): Result = Parser.this.apply(in) match { + case None => p(in) + case s => s + } + } + + def &&& [b](def p: Parser[b]): Parser[b] = + for (val _ <- this; val x <- p) yield x; + }// end Parser +\end{lstlisting} + +The primitive parser \code{succeed} replaces \code{empty}. It consumes +no input and returns its parameter as result. +\begin{lstlisting} + def succeed[a](x: a) = new Parser[a] { + def apply(in: intype) = Some(Pair(x, in)) + } +\end{lstlisting} + +The parser combinators \code{rep} and \code{opt} now also return +results. 
\code{rep} returns a list which contains as elements the +results of each iteration of its sub-parser. \code{opt} returns a list +which is either empty or contains as its single element the result of the +optional parser. +\begin{lstlisting} + def rep[a](p: Parser[a]): Parser[List[a]] = + rep1(p) ||| succeed(List()); + + def rep1[a](p: Parser[a]): Parser[List[a]] = + for (val x <- p; val xs <- rep(p)) yield x :: xs; + + def opt[a](p: Parser[a]): Parser[List[a]] = + (for (val x <- p) yield List(x)) ||| succeed(List()); +} // end Parsers +\end{lstlisting} +The root class \code{Parsers} abstracts over which kind of +input is parsed. As before, we determine the input method by a separate class. +Here is \code{ParseString}, this time adapted to parsers that return results. +It now defines the method \code{any}, which returns the first input character. +\begin{lstlisting} +class ParseString(s: String) extends Parsers { + type intype = int; + val input = 0; + def any = new Parser[char] { + def apply(in: int): Parser[char]#Result = + if (in < s.length()) Some(Pair(s charAt in, in + 1)) else None; + } +} +\end{lstlisting} +The rest of the application is as before. Here is a test program which +constructs a list parser over strings and prints out the result of +applying it to the command line argument. +\begin{lstlisting} +object Test { + def main(args: Array[String]): unit = { + val ps = new ListParsers with ParseString(args(0)); + ps.expr(ps.input) match { + case Some(Pair(list, _)) => System.out.println("parsed: " + list); + case None => System.out.println("nothing parsed"); + } + } +} +\end{lstlisting} + +\begin{exercise}\label{exercise:end-marker} The parsers we have defined so +far can succeed even if there is some input beyond the parsed text. To +prevent this, one needs a parser which recognizes the end of input. +Redesign the parser library so that such a parser can be introduced. +Which classes need to be modified? 
+\end{exercise} + +\chapter{\label{sec:hm}Hindley/Milner Type Inference} + +This chapter demonstrates Scala's data types and pattern matching by +developing a type inference system in the Hindley/Milner style +\cite{milner:polymorphism}. The source language for the type inferencer is +lambda calculus with a let construct called Mini-ML. Abstract syntax +trees for Mini-ML are represented by the following data type of +\code{Term}s. +\begin{lstlisting} +trait Term {} +case class Var(x: String) extends Term { + override def toString() = x +} +case class Lam(x: String, e: Term) extends Term { + override def toString() = "(\\" + x + "." + e + ")" +} +case class App(f: Term, e: Term) extends Term { + override def toString() = "(" + f + " " + e + ")" +} +case class Let(x: String, e: Term, f: Term) extends Term { + override def toString() = "let " + x + " = " + e + " in " + f; +} +\end{lstlisting} +There are four tree constructors: \code{Var} for variables, \code{Lam} +for function abstractions, \code{App} for function applications, and +\code{Let} for let expressions. Each case class overrides the +\code{toString()} method of class \code{Any}, so that terms can be +printed in legible form. + +We next define the types that are +computed by the inference system. +\begin{lstlisting} +sealed trait Type {} +case class Tyvar(a: String) extends Type { + override def toString() = a +} +case class Arrow(t1: Type, t2: Type) extends Type { + override def toString() = "(" + t1 + "->" + t2 + ")" +} +case class Tycon(k: String, ts: List[Type]) extends Type { + override def toString() = + k + (if (ts.isEmpty) "" else ts.mkString("[", ",", "]")) +} +\end{lstlisting} +There are three type constructors: \code{Tyvar} for type variables, +\code{Arrow} for function types and \code{Tycon} for type constructors +such as \code{boolean} or \code{List}. Type constructors have as +component a list of their type parameters. This list is empty for type +constants such as \code{boolean}. 
Again, the type constructors +implement the \code{toString} method in order to display types legibly. + +Note that \code{Type} is a \code{sealed} class. This means that no +subclasses or data constructors that extend \code{Type} can be formed +outside the sequence of definitions in which \code{Type} is defined. +This makes \code{Type} a {\em closed} algebraic data type with exactly +three alternatives. By contrast, type \code{Term} is an {\em open} +algebraic type for which further alternatives can be defined. + +The main parts of the type inferencer are contained in object +\code{typeInfer}. We start with a utility function which creates +fresh type variables: +\begin{lstlisting} +object typeInfer { + private var n: Int = 0; + def newTyvar(): Type = { n = n + 1 ; Tyvar("a" + n) } +\end{lstlisting} +We next define a class for substitutions. A substitution is an +idempotent function from type variables to types. It maps a finite +number of type variables to some types, and leaves all other type +variables unchanged. The meaning of a substitution is extended +point-wise to a mapping from types to types. +\begin{lstlisting} + trait Subst extends Any with Function1[Type,Type] { + + def lookup(x: Tyvar): Type; + + def apply(t: Type): Type = t match { + case tv @ Tyvar(a) => val u = lookup(tv); if (t == u) t else apply(u); + case Arrow(t1, t2) => Arrow(apply(t1), apply(t2)) + case Tycon(k, ts) => Tycon(k, ts map apply) + } + + def extend(x: Tyvar, t: Type) = new Subst { + def lookup(y: Tyvar): Type = if (x == y) t else Subst.this.lookup(y); + } + } + val emptySubst = new Subst { def lookup(t: Tyvar): Type = t } +\end{lstlisting} +We represent substitutions as functions, of type \code{Type => +Type}. This is achieved by making class \code{Subst} inherit from the +unary function type \code{Function1[Type, Type]}\footnote{ +The class inherits the function type as a mixin rather than as a direct +superclass. 
This is because in the current Scala implementation, the +\code{Function1} type is a Java interface, which cannot be used as a direct +superclass of some other class.}. +To be an instance +of this type, a substitution \code{s} has to implement an \code{apply} +method that takes a \code{Type} as argument and yields another +\code{Type} as result. A function application \code{s(t)} is then +interpreted as \code{s.apply(t)}. + +The \code{lookup} method is abstract in class \code{Subst}. There are +two concrete forms of substitutions which differ in how they +implement this method. One form is defined by the \code{emptySubst} value, +the other is defined by the \code{extend} method in class +\code{Subst}. + +The next data type describes type schemes, which consist of a type and +a list of type variables which appear universally quantified +in the type scheme. +For instance, the type scheme $\forall a\forall b.a \!\arrow\! b$ would be represented in the type checker as: +\begin{lstlisting} +TypeScheme(List(Tyvar("a"), Tyvar("b")), Arrow(Tyvar("a"), Tyvar("b"))) . +\end{lstlisting} +The class definition of type schemes does not carry an extends +clause; this means that type schemes directly extend class +\code{AnyRef}. Even though there is only one possible way to +construct a type scheme, a case class representation was chosen +since it offers convenient ways to decompose an instance of this type into its +parts. +\begin{lstlisting} +case class TypeScheme(tyvars: List[Tyvar], tpe: Type) { + def newInstance: Type = { + (emptySubst /: tyvars) ((s, tv) => s.extend(tv, newTyvar())) (tpe); + } +} +\end{lstlisting} +Type scheme objects come with a method \code{newInstance}, which +returns the type contained in the scheme after all universally quantified type +variables have been renamed to fresh variables. 
The implementation of +this method folds (with \code{/:}) the type scheme's type variables +with an operation which extends a given substitution \code{s} by +renaming a given type variable \code{tv} to a fresh type +variable. The resulting substitution renames all type variables of the +scheme to fresh ones. This substitution is then applied to the type +part of the type scheme. + +The last type we need in the type inferencer is +\code{Env}, a type for environments, which associate variable names +with type schemes. They are represented by a type alias \code{Env} in +module \code{typeInfer}: +\begin{lstlisting} +type Env = List[Pair[String, TypeScheme]]; +\end{lstlisting} +There are two operations on environments. The \code{lookup} function +returns the type scheme associated with a given name, or \code{null} +if the name is not recorded in the environment. +\begin{lstlisting} + def lookup(env: Env, x: String): TypeScheme = env match { + case List() => null + case Pair(y, t) :: env1 => if (x == y) t else lookup(env1, x) + } +\end{lstlisting} +The \code{gen} function turns a given type into a type scheme, +quantifying over all type variables that are free in the type, but +not in the environment. +\begin{lstlisting} + def gen(env: Env, t: Type): TypeScheme = + TypeScheme(tyvars(t) diff tyvars(env), t); +\end{lstlisting} +The set of free type variables of a type is simply the set of all type +variables which occur in the type. It is represented here as a list of +type variables, which is constructed as follows. +\begin{lstlisting} + def tyvars(t: Type): List[Tyvar] = t match { + case tv @ Tyvar(a) => + List(tv) + case Arrow(t1, t2) => + tyvars(t1) union tyvars(t2) + case Tycon(k, ts) => + (List[Tyvar]() /: ts) ((tvs, t) => tvs union tyvars(t)); + } +\end{lstlisting} +Note that the syntax \code{tv @ ...} in the first pattern introduces a variable +which is bound to the pattern that follows. 
Note also that the explicit type parameter \code{[Tyvar]} in the expression of the third +clause is needed to make local type inference work. + +The set of free type variables of a type scheme is the set of free +type variables of its type component, excluding any quantified type variables: +\begin{lstlisting} + def tyvars(ts: TypeScheme): List[Tyvar] = + tyvars(ts.tpe) diff ts.tyvars; +\end{lstlisting} +Finally, the set of free type variables of an environment is the union +of the free type variables of all type schemes recorded in it. +\begin{lstlisting} + def tyvars(env: Env): List[Tyvar] = + (List[Tyvar]() /: env) ((tvs, nt) => tvs union tyvars(nt._2)); +\end{lstlisting} +A central operation of Hindley/Milner type checking is unification, +which computes a substitution to make two given types equal (such a +substitution is called a {\em unifier}). Function \code{mgu} computes +the most general unifier of two given types $t$ and $u$ under a +pre-existing substitution $s$. That is, it returns the most general +substitution $s'$ which extends $s$, and which makes $s'(t)$ and +$s'(u)$ equal types. +\begin{lstlisting} + def mgu(t: Type, u: Type, s: Subst): Subst = Pair(s(t), s(u)) match { + case Pair(Tyvar(a), Tyvar(b)) if (a == b) => + s + case Pair(Tyvar(a), _) if !(tyvars(u) contains a) => + s.extend(Tyvar(a), u) + case Pair(_, Tyvar(a)) => + mgu(u, t, s) + case Pair(Arrow(t1, t2), Arrow(u1, u2)) => + mgu(t1, u1, mgu(t2, u2, s)) + case Pair(Tycon(k1, ts), Tycon(k2, us)) if (k1 == k2) => + (s /: (ts zip us)) ((s, tu) => mgu(tu._1, tu._2, s)) + case _ => + throw new TypeError("cannot unify " + s(t) + " with " + s(u)) + } +\end{lstlisting} +The \code{mgu} function throws a \code{TypeError} exception if no +unifier substitution exists. This can happen because the two types +have different type constructors at corresponding places, or because a +type variable is unified with a type that contains the type variable +itself. 
Such exceptions are modeled here as instances of case classes +that inherit from the predefined \code{Exception} class. +\begin{lstlisting} + case class TypeError(s: String) extends Exception(s) {} +\end{lstlisting} +The main task of the type checker is implemented by function +\code{tp}. This function takes as parameters an environment $env$, a +term $e$, a proto-type $t$, and a +pre-existing substitution $s$. The function yields a substitution +$s'$ that extends $s$ and that +turns $s'(env) \ts e: s'(t)$ into a derivable type judgment according +to the derivation rules of the Hindley/Milner type system \cite{milner:polymorphism}. A +\code{TypeError} exception is thrown if no such substitution exists. +\begin{lstlisting} + def tp(env: Env, e: Term, t: Type, s: Subst): Subst = { + current = e; + e match { + case Var(x) => + val u = lookup(env, x); + if (u == null) throw new TypeError("undefined: " + x); + else mgu(u.newInstance, t, s) + + case Lam(x, e1) => + val a = newTyvar(), b = newTyvar(); + val s1 = mgu(t, Arrow(a, b), s); + val env1 = Pair(x, TypeScheme(List(), a)) :: env; + tp(env1, e1, b, s1) + + case App(e1, e2) => + val a = newTyvar(); + val s1 = tp(env, e1, Arrow(a, t), s); + tp(env, e2, a, s1) + + case Let(x, e1, e2) => + val a = newTyvar(); + val s1 = tp(env, e1, a, s); + tp(Pair(x, gen(env, s1(a))) :: env, e2, t, s1) + } + } + var current: Term = null; +\end{lstlisting} +To aid error diagnostics, the \code{tp} function stores the currently +analyzed sub-term in variable \code{current}. Thus, if type checking +is aborted with a \code{TypeError} exception, this variable will +contain the subterm that caused the problem. + +The last function of the type inference module, \code{typeOf}, is a +simplified facade for \code{tp}. It computes the type of a given term +$e$ in a given environment $env$. 
It does so by creating a fresh type +variable $a$, computing a typing substitution that makes $env \ts e: a$ +into a derivable type judgment, and returning +the result of applying the substitution to $a$. +\begin{lstlisting} + def typeOf(env: Env, e: Term): Type = { + val a = newTyvar(); + tp(env, e, a, emptySubst)(a) + } +}// end typeInfer +\end{lstlisting} +To apply the type inferencer, it is convenient to have a predefined +environment that contains bindings for commonly used constants. The +module \code{predefined} defines an environment \code{env} that +contains bindings for the types of booleans, numbers and lists +together with some primitive operations over them. It also +defines a fixed point operator \code{fix}, which can be used to +represent recursion. +\begin{lstlisting} +object predefined { + val booleanType = Tycon("Boolean", List()); + val intType = Tycon("Int", List()); + def listType(t: Type) = Tycon("List", List(t)); + + private def gen(t: Type): typeInfer.TypeScheme = typeInfer.gen(List(), t); + private val a = typeInfer.newTyvar(); + val env = List( + Pair("true", gen(booleanType)), + Pair("false", gen(booleanType)), + Pair("if", gen(Arrow(booleanType, Arrow(a, Arrow(a, a))))), + Pair("zero", gen(intType)), + Pair("succ", gen(Arrow(intType, intType))), + Pair("nil", gen(listType(a))), + Pair("cons", gen(Arrow(a, Arrow(listType(a), listType(a))))), + Pair("isEmpty", gen(Arrow(listType(a), booleanType))), + Pair("head", gen(Arrow(listType(a), a))), + Pair("tail", gen(Arrow(listType(a), listType(a)))), + Pair("fix", gen(Arrow(Arrow(a, a), a))) + ) +} +\end{lstlisting} +Here's an example of how the type inferencer can be used. 
+Let's define a function \code{showType} which returns the type of +a given term computed in the predefined environment +\code{predefined.env}: +\begin{lstlisting} +object testInfer { + def showType(e: Term): String = + try { + typeInfer.typeOf(predefined.env, e).toString(); + } catch { + case typeInfer.TypeError(msg) => + "\n cannot type: " + typeInfer.current + + "\n reason: " + msg; + } +\end{lstlisting} +Then the application +\begin{lstlisting} +> testInfer.showType(Lam("x", App(App(Var("cons"), Var("x")), Var("nil")))); +\end{lstlisting} +would give the response +\begin{lstlisting} +> (a6->List[a6]) +\end{lstlisting} +To make the type inferencer more useful, we complete it with a +parser. +Function \code{main} of module \code{testInfer} +parses and typechecks a Mini-ML expression which is given as the first +command line argument. +\begin{lstlisting} + def main(args: Array[String]): unit = { + val ps = new MiniMLParsers with ParseString(args(0)); + ps.all(ps.input) match { + case Some(Pair(term, _)) => + System.out.println("" + term + ": " + showType(term)); + case None => + System.out.println("syntax error"); + } + } +}// end testInfer +\end{lstlisting} +To do the parsing, method \code{main} uses the combinator parser +scheme of Chapter~\ref{sec:combinator-parsing}. It creates a parser +family \code{ps} as a mixin composition of parsers +that understand MiniML (but do not know where input comes from) and +parsers that read input from a given string. The \code{MiniMLParsers} +class implements parsers for the following grammar. +\begin{lstlisting} +term ::= "\" ident "." term + | term1 {term1} + | "let" ident "=" term "in" term +term1 ::= ident + | "(" term ")" +all ::= term ";" +\end{lstlisting} +Input as a whole is described by the production \code{all}; it +consists of a term followed by a semicolon. We allow ``whitespace'' +consisting of one or more space, tabulator or newline characters +between any two lexemes (this is not reflected in the grammar +above). 
Identifiers are defined as in +Chapter~\ref{sec:combinator-parsing} except that an identifier cannot +be one of the two reserved words \code{let} and \code{in}. +\begin{lstlisting} +abstract class MiniMLParsers[intype] extends CharParsers[intype] { + + /** whitespace */ + def whitespace = rep{chr(' ') ||| chr('\t') ||| chr('\n')}; + + /** A given character, possibly preceded by whitespace */ + def wschr(ch: char) = whitespace &&& chr(ch); + + /** identifiers or keywords */ + def id: Parser[String] = + for ( + val c: char <- whitespace &&& chr(Character.isLetter); + val cs: List[char] <- rep(chr(Character.isLetterOrDigit)) + ) yield (c :: cs).mkString("", "", ""); + + /** Non-keyword identifiers */ + def ident: Parser[String] = + for (val s <- id; s != "let" && s != "in") yield s; + + /** term = '\' ident '.' term | term1 {term1} | let ident "=" term in term */ + def term: Parser[Term] = + ( for ( + val _ <- wschr('\\'); + val x <- ident; + val _ <- wschr('.'); + val t <- term) + yield Lam(x, t): Term ) + ||| + ( for ( + val letid <- id; letid == "let"; + val x <- ident; + val _ <- wschr('='); + val t <- term; + val inid <- id; inid == "in"; + val c <- term) + yield Let(x, t, c) ) + ||| + ( for ( + val t <- term1; + val ts <- rep(term1)) + yield (t /: ts)((f, arg) => App(f, arg)) ); + + /** term1 = ident | '(' term ')' */ + def term1: Parser[Term] = + ( for (val s <- ident) + yield Var(s): Term ) + ||| + ( for ( + val _ <- wschr('('); + val t <- term; + val _ <- wschr(')')) + yield t ); + + /** all = term ';' */ + def all: Parser[Term] = + for ( + val t <- term; + val _ <- wschr(';')) + yield t; +} +\end{lstlisting} +Here are some sample MiniML programs and the output the type inferencer gives for each of them: +\begin{lstlisting} +> java testInfer +| "\x.\f.f(f x);" +(\x.(\f.(f (f x)))): (a8->((a8->a8)->a8)) + +> java testInfer +| "let id = \x.x +| in if (id true) (id nil) (id (cons zero nil));" +let id = (\x.x) in (((if (id true)) (id nil)) (id ((cons zero) nil))): 
List[Int] + +> java testInfer +| "let id = \x.x +| in if (id true) (id nil);" +let id = (\x.x) in ((if (id true)) (id nil)): (List[a13]->List[a13]) + +> java testInfer +| "let length = fix (\len.\xs. +| if (isEmpty xs) +| zero +| (succ (len (tail xs)))) +| in (length nil);" +let length = (fix (\len.(\xs.(((if (isEmpty xs)) zero) +(succ (len (tail xs))))))) in (length nil): Int + +> java testInfer +| "let id = \x.x +| in if (id true) (id nil) zero;" +let id = (\x.x) in (((if (id true)) (id nil)) zero): + cannot type: zero + reason: cannot unify Int with List[a14] +\end{lstlisting} + +\begin{exercise}\label{exercise:hm-parse} Using the parser library constructed in +Exercise~\ref{exercise:end-marker}, modify the MiniML parser library +so that no marker ``;'' is necessary for indicating the end of input. +\end{exercise} + +\begin{exercise}\label{execcise:hm-extend} Extend the Mini-ML parser and type +inferencer with a \code{letrec} construct which allows the definition of +recursive functions. Syntax: +\begin{lstlisting} +letrec ident "=" term in term . +\end{lstlisting} +The typing of \code{letrec} is as for {let}, +except that the defined identifier is visible in the defining expression. Using \code{letrec}, the \code{length} function for lists can now be defined as follows. +\begin{lstlisting} +letrec length = \xs. + if (isEmpty xs) + zero + (succ (length (tail xs))) +in ... +\end{lstlisting} +\end{exercise} + +\chapter{Abstractions for Concurrency}\label{sec:ex-concurrency} + +This section reviews common concurrent programming patterns and shows +how they can be implemented in Scala. + +\section{Signals and Monitors} + +\example +The {\em monitor} provides the basic means for mutual exclusion +of processes in Scala. It is defined as follows. 
+\begin{lstlisting} +trait Monitor { + def synchronized [a] (def e: a): a; + def await(def cond: boolean) = while (false == cond) { wait() } +} +\end{lstlisting} +The \code{synchronized} method in class \code{Monitor} executes its +argument computation \code{e} in mutually exclusive mode -- at any one +time, only one thread can execute a \code{synchronized} argument of a +given monitor. + +Threads can suspend inside a monitor by waiting on a signal. The +standard \code{java.lang.Object} class offers for this purpose methods +\code{wait} and \code{notify}. Threads that call the \code{wait} +method wait until a \code{notify} method of the same object is called +subsequently by some other thread. Calls to \code{notify} with no +threads waiting for the signal are ignored. +Here are the signatures of these methods in class +\code{java.lang.Object}. +\begin{lstlisting} + def wait(): unit; + def wait(msec: long): unit; + def notify(): unit; + def notifyAll(): unit; +\end{lstlisting} +There is also a timed form of \code{wait}, which blocks only until +a signal is received or the specified amount of time (given in +milliseconds) has elapsed. Furthermore, there is a \code{notifyAll} +method which unblocks all threads which wait for the signal. +These methods, as well as class \code{Monitor}, are primitive in +Scala; they are implemented in terms of the underlying runtime system. + +Typically, a thread waits for some condition to be established. If the +condition does not hold at the time of the wait call, the thread +blocks until some other thread has established the condition. It is +the responsibility of this other thread to wake up waiting processes +by issuing a \code{notify} or \code{notifyAll}. Note, however, that +there is no guarantee that a waiting process gets to run immediately +when the call to notify is issued. It could be that other processes +get to run first which invalidate the condition again. 
Therefore, the +correct form of waiting for a condition $C$ uses a while loop: +\begin{lstlisting} +while (!$C$) wait(); +\end{lstlisting} +The monitor class contains a method \code{await} which does the same +thing; using it, the above loop can be expressed as \lstinline@await($C$)@. + +As an example of how monitors are used, here is an implementation +of a bounded buffer class. +\begin{lstlisting} +class BoundedBuffer[a](N: Int) extends Monitor() { + var in = 0, out = 0, n = 0; + val elems = new Array[a](N); + + def put(x: a) = synchronized { + await (n < N); + elems(in) = x ; in = (in + 1) % N ; n = n + 1; + if (n == 1) notifyAll(); + } + + def get: a = synchronized { + await (n != 0); + val x = elems(out) ; out = (out + 1) % N ; n = n - 1; + if (n == N - 1) notifyAll(); + x + } +} +\end{lstlisting} +And here is a program using a bounded buffer to communicate between a +producer and a consumer process. +\begin{lstlisting} +import concurrent.ops._; +... +val buf = new BoundedBuffer[String](10) +spawn { while (true) { val s = produceString ; buf.put(s) } } +spawn { while (true) { val s = buf.get ; consumeString(s) } } +} +\end{lstlisting} +The \code{spawn} method spawns a new thread which executes the +expression given in the parameter. It is defined in object \code{concurrent.ops} +as follows. +\begin{lstlisting} +def spawn(def p: unit) = { + val t = new Thread() { override def run() = p; } + t.start() +} +\end{lstlisting} + +\comment{ +\section{Logic Variable} + +A logic variable (or lvar for short) offers operations \code{:=} +and \code{value} to define the variable and to retrieve its value. +Variables can be \code{define}d only once. A call to \code{value} +blocks until the variable has been defined. + +Logic variables can be implemented as follows. 
+ +\begin{lstlisting} +class LVar[a] extends Monitor { + private val defined = new Signal + private var isDefined: boolean = false + private var v: a + def value = synchronized { + if (!isDefined) defined.wait + v + } + def :=(x: a) = synchronized { + v = x ; isDefined = true ; defined.send + } +} +\end{lstlisting} +} + +\section{SyncVars} + +A synchronized variable (or syncvar for short) offers \code{get} and +\code{set} operations to read and set the variable. \code{get} operations +block until the variable has been defined. An \code{unset} operation +resets the variable to the undefined state. + +Here's the standard implementation of synchronized variables. +\begin{lstlisting} +package scala.concurrent; +class SyncVar[a] with Monitor { + private var isDefined: Boolean = false; + private var value: a = _; + def get = synchronized { + if (!isDefined) wait(); + value + } + def set(x: a) = synchronized { + value = x ; isDefined = true ; notifyAll(); + } + def isSet: Boolean = + isDefined; + def unset = synchronized { + isDefined = false; + } +} +\end{lstlisting} + +\section{Futures} +\label{sec:futures} + +A {\em future} is a value which is computed in parallel to some other +client thread, to be used by the client thread at some future time. +Futures are used in order to make good use of parallel processing +resources. A typical usage is: + +\begin{lstlisting} +import scala.concurrent.ops._; +... +val x = future(someLengthyComputation); +anotherLengthyComputation; +val y = f(x()) + g(x()); +\end{lstlisting} + +The \code{future} method is defined in object +\code{scala.concurrent.ops} as follows. +\begin{lstlisting} +def future[a](def p: a): unit => a = { + val result = new SyncVar[a]; + spawn { result.set(p) } + (() => result.get) +} +\end{lstlisting} + +The \code{future} method gets as parameter a computation \code{p} to +be performed. The type of the computation is arbitrary; it is +represented by \code{future}'s type parameter \code{a}. 
The +\code{future} method creates a synchronized variable \code{result}, which will +hold the result of the computation. It then forks +off a new thread that computes the result and stores it in +\code{result} when it is finished. In parallel to this thread, +the function returns an anonymous function which yields a value of type \code{a}. +When called, this function waits until \code{result} has been +set and then returns its value. +Because the variable stays set from then on, later invocations of the function +return the result immediately. + +\section{Parallel Computations} + +The next example presents a function \code{par} which takes a pair of +computations as parameters and which returns the results of the computations +in another pair. The two computations are performed in parallel. + +The function is defined in object +\code{scala.concurrent.ops} as follows. +\begin{lstlisting} + def par[a, b](def xp: a, def yp: b): Pair[a, b] = { + val y = new SyncVar[b]; + spawn { y set yp } + Pair(xp, y.get) + } +\end{lstlisting} +Defined in the same place is a function \code{replicate} which runs a +number of replicas of a computation in parallel. Each +replica is passed an integer number which identifies it. +\begin{lstlisting} + def replicate(start: Int, end: Int)(p: Int => Unit): Unit = { + if (start == end) + () + else if (start + 1 == end) + p(start) + else { + val mid = (start + end) / 2; + spawn { replicate(start, mid)(p) } + replicate(mid, end)(p) + } + } +\end{lstlisting} + +The next function uses \code{replicate} to perform parallel +computations on all elements of an array. 
+ +\begin{lstlisting} +def parMap[a,b](f: a => b, xs: Array[a]): Array[b] = { + val results = new Array[b](xs.length); + replicate(0, xs.length) { i => results(i) = f(xs(i)) } + results +} +\end{lstlisting} + +\section{Semaphores} + +A common mechanism for process synchronization is a {\em lock} (or: +{\em semaphore}). A lock offers two atomic actions: \prog{acquire} and +\prog{release}. Here's the implementation of a lock in Scala: + +\begin{lstlisting} +package scala.concurrent; + +class Lock with Monitor { + var available = true; + def acquire = synchronized { + if (!available) wait(); + available = false + } + def release = synchronized { + available = true; + notify() + } +} +\end{lstlisting} + +\section{Readers/Writers} + +A more complex form of synchronization distinguishes between {\em +readers} which access a common resource without modifying it and {\em +writers} which can both access and modify it. To synchronize readers +and writers we need to implement operations \prog{startRead}, \prog{startWrite}, +\prog{endRead}, \prog{endWrite}, such that: +\begin{itemize} +\item there can be multiple concurrent readers, +\item there can only be one writer at one time, +\item pending write requests have priority over pending read requests, +but don't preempt ongoing read operations. +\end{itemize} +The following implementation of a readers/writers lock is based on the +{\em mailbox} concept (see Section~\ref{sec:mailbox}). 
+ +\begin{lstlisting} +import scala.concurrent._; + +class ReadersWriters { + val m = new MailBox; + private case class Writers(n: int), Readers(n: int); + Writers(0); Readers(0); + def startRead = m receive { + case Writers(n) if n == 0 => m receive { + case Readers(n) => Writers(0) ; Readers(n+1); + } + } + def startWrite = m receive { + case Writers(n) => + Writers(n+1); + m receive { case Readers(n) if n == 0 => } + } + def endRead = m receive { + case Readers(n) => Readers(n-1) + } + def endWrite = m receive { + case Writers(n) => Writers(n-1) ; if (n == 0) Readers(0) + } +} +\end{lstlisting} + +\section{Asynchronous Channels} + +A fundamental means of interprocess communication is the asynchronous +channel. Its implementation makes use of the following simple class for linked +lists: +\begin{lstlisting} +class LinkedList[a] { + var elem: a = _; + var next: LinkedList[a] = null; +} +\end{lstlisting} +To facilitate insertion and deletion of elements into linked lists, +every reference into a linked list points to the node which precedes +the node which conceptually forms the top of the list. +Empty linked lists start with a dummy node, whose successor is \code{null}. + +The channel class uses a linked list to store data that has been sent +but not read yet. In the opposite direction, threads that +wish to read from an empty channel register their presence by +incrementing the \code{nreaders} field and waiting to be notified. 
+\begin{lstlisting} +package scala.concurrent; + +class Channel[a] with Monitor { + class LinkedList[a] { + var elem: a = _; + var next: LinkedList[a] = null; + } + private var written = new LinkedList[a]; + private var lastWritten = new LinkedList[a]; + private var nreaders = 0; + + def write(x: a) = synchronized { + lastWritten.elem = x; + lastWritten.next = new LinkedList[a]; + lastWritten = lastWritten.next; + if (nreaders > 0) notify(); + } + + def read: a = synchronized { + if (written.next == null) { + nreaders = nreaders + 1; wait(); nreaders = nreaders - 1; + } + val x = written.elem; + written = written.next; + x + } +} +\end{lstlisting} + +\section{Synchronous Channels} + +Here's an implementation of synchronous channels, where the sender of +a message blocks until that message has been received. Synchronous +channels only need a single variable to store messages in transit, but +three signals are used to coordinate reader and writer processes. +\begin{lstlisting} +package scala.concurrent; + +class SyncChannel[a] with Monitor { + private var data: a = _; + private var reading = false; + private var writing = false; + + def write(x: a) = synchronized { + await(!writing); + data = x; + writing = true; + if (reading) notifyAll(); + else await(reading) + } + + def read: a = synchronized { + await(!reading); + reading = true; + await(writing); + val x = data; + writing = false; + reading = false; + notifyAll(); + x + } +} +\end{lstlisting} + +\section{Workers} + +Here's an implementation of a {\em compute server} in Scala. The +server implements a \code{future} method which evaluates a given +expression in parallel with its caller. Unlike the implementation in +Section~\ref{sec:futures} the server computes futures only with a +predefined number of threads. A possible implementation of the server +could run each thread on a separate processor, and could hence avoid +the overhead inherent in context-switching several threads on a single +processor. 
+ +\begin{lstlisting} +import scala.concurrent._, scala.concurrent.ops._; + +class ComputeServer(n: Int) { + + private trait Job { + type t; + def task: t; + def ret(x: t): Unit; + } + + private val openJobs = new Channel[Job](); + + private def processor(i: Int): Unit = { + while (true) { + val job = openJobs.read; + job.ret(job.task) + } + } + + def future[a](def p: a): () => a = { + val reply = new SyncVar[a](); + openJobs.write{ + new Job { + type t = a; + def task = p; + def ret(x: a) = reply.set(x); + } + } + () => reply.get + } + + spawn(replicate(0, n) { processor }) +} +\end{lstlisting} +Expressions to be computed (i.e. arguments +to calls of \code{future}) are written to the \code{openJobs} +channel. A {\em job} is an object with +\begin{itemize} +\item +An abstract type \code{t} which describes the result of the compute +job. +\item +A parameterless \code{task} method of type \code{t} which denotes +the expression to be computed. +\item +A \code{ret} method which consumes the result once it is +computed. +\end{itemize} +The compute server creates $n$ \code{processor} processes as part of +its initialization. Every such process repeatedly consumes an open +job, evaluates the job's \code{task} method and passes the result on +to the job's +\code{ret} method. The polymorphic \code{future} method creates +a new job where the \code{ret} method is implemented by setting a synchronized variable +named \code{reply} and inserts this job into the set of open jobs by +writing it to the \code{openJobs} channel. It then returns a function which, when called, waits until the corresponding +\code{reply} variable has been set and yields its value. + +The example demonstrates the use of abstract types. The abstract type +\code{t} keeps track of the result type of a job, which can vary +between different jobs. Without abstract types it would be impossible +to offer the same class to the user in a statically type-safe +way, without relying on dynamic type tests and type casts. 
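+
+As a small, self-contained sketch of this point, the following program
+keeps two jobs with different result types in one list and processes
+them without any type test or cast. The names \code{jobSketch},
+\code{run}, \code{intJob} and \code{stringJob} are made up for this
+illustration and are not part of the compute server. The code is written
+in a conservative style that should also be accepted by later Scala
+compilers; it is meant as an illustration only.
+\begin{lstlisting}
+object jobSketch {
+  trait Job {
+    type t;                  // result type, chosen by each job
+    def task: t;             // the computation to perform
+    def ret(x: t): Unit;     // consumer of the result
+  }
+
+  // job.task has type job.t, which is exactly what job.ret expects,
+  // so this type checks for every job, whatever its result type is.
+  def run(job: Job): Unit = job.ret(job.task);
+
+  def main(args: Array[String]): Unit = {
+    val intJob = new Job {
+      type t = Int;
+      def task = 41 + 1;
+      def ret(x: Int) = Console.println("int result: " + x);
+    };
+    val stringJob = new Job {
+      type t = String;
+      def task = "forty" + "-two";
+      def ret(x: String) = Console.println("string result: " + x);
+    };
+    List[Job](intJob, stringJob).foreach(run);
+  }
+}
+\end{lstlisting}
+Inside \code{run} the concrete result type is unknown, yet the call is
+statically safe because both members refer to the same type \code{job.t}.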
+ + +Here is some code which uses the compute server to evaluate +the expression \code{41 + 1}. +\begin{lstlisting} +object Test with Executable { + val server = new ComputeServer(1); + val f = server.future(41 + 1); + Console.println(f()) +} +\end{lstlisting} + +\section{Mailboxes} +\label{sec:mailbox} + +Mailboxes are high-level, flexible constructs for process +synchronization and communication. They allow sending and receiving of +messages. A {\em message} in this context is an arbitrary object. +There is a special message \code{TIMEOUT} which is used to signal a +time-out. +\begin{lstlisting} +case class TIMEOUT; +\end{lstlisting} +Mailboxes implement the following signature. +\begin{lstlisting} +class MailBox { + def send(msg: Any): unit; + def receive[a](f: PartialFunction[Any, a]): a; + def receiveWithin[a](msec: long)(f: PartialFunction[Any, a]): a; +} +\end{lstlisting} +The state of a mailbox consists of a multi-set of messages. +Messages are added to the mailbox by the \code{send} method. Messages +are removed using the \code{receive} method, which is passed a message +processor \code{f} as argument, which is a partial function from +messages to some arbitrary result type. Typically, this function is +implemented as a pattern matching expression. The \code{receive} +method blocks until there is a message in the mailbox for which its +message processor is defined. The matching message is then removed +from the mailbox and the blocked thread is restarted by applying the +message processor to the message. Both sent messages and receivers are +ordered in time. A receiver $r$ is applied to a matching message $m$ +only if there is no other (message, receiver) pair which precedes $(m, +r)$ in the partial ordering on pairs that orders each component in +time. 
+ +As a simple example of how mailboxes are used, consider a +one-place buffer: +\begin{lstlisting} +class OnePlaceBuffer { + private val m = new MailBox; // An internal mailbox + private case class Empty, Full(x: int); // Types of messages we deal with + m send Empty; // Initialization + def write(x: int): unit = + m receive { case Empty => m send Full(x) } + def read: int = + m receive { case Full(x) => m send Empty ; x } +} +\end{lstlisting} +Here's how the mailbox class can be implemented: +\begin{lstlisting} +class MailBox with Monitor { + private abstract class Receiver extends Signal { + def isDefined(msg: Any): boolean; + var msg: Any = null; + } +\end{lstlisting} +We define an internal class for receivers with a test method +\code{isDefined}, which indicates whether the receiver is +defined for a given message. The receiver inherits from class +\code{Signal} a \code{notify} method which is used to wake up a +receiver thread. When the receiver thread is woken up, the message it +needs to be applied to is stored in the \code{msg} variable of +\code{Receiver}. +\begin{lstlisting} + private val sent = new LinkedList[Any]; + private var lastSent = sent; + private val receivers = new LinkedList[Receiver]; + private var lastReceiver = receivers; +\end{lstlisting} +The mailbox class maintains two linked lists, +one for sent but unconsumed messages, the other for waiting receivers. +\begin{lstlisting} + def send(msg: Any): unit = synchronized { + var r = receivers, r1 = r.next; + while (r1 != null && !r1.elem.isDefined(msg)) { + r = r1; r1 = r1.next; + } + if (r1 != null) { + r.next = r1.next; r1.elem.msg = msg; r1.elem.notify; + } else { + lastSent = insert(lastSent, msg); + } + } +\end{lstlisting} +The \code{send} method first checks whether a waiting receiver is +applicable to the sent message. If yes, the receiver is notified. +Otherwise, the message is appended to the linked list of sent messages. 
+\begin{lstlisting} + def receive[a](f: PartialFunction[Any, a]): a = { + val msg: Any = synchronized { + var s = sent, s1 = s.next; + while (s1 != null && !f.isDefinedAt(s1.elem)) { + s = s1; s1 = s1.next + } + if (s1 != null) { + s.next = s1.next; s1.elem + } else { + val r = insert(lastReceiver, new Receiver { + def isDefined(msg: Any) = f.isDefinedAt(msg); + }); + lastReceiver = r; + r.elem.wait(); + r.elem.msg + } + } + f(msg) + } +\end{lstlisting} +The \code{receive} method first checks whether the message processor function +\code{f} can be applied to a message that has already been sent but that +was not yet consumed. If yes, the thread continues immediately by +applying \code{f} to the message. Otherwise, a new receiver is created +and linked into the \code{receivers} list, and the thread waits for a +notification on this receiver. Once the thread is woken up again, it +continues by applying \code{f} to the message that was stored in the +receiver. The \code{insert} method on linked lists appends a new node after the given last node and returns the new node. It is defined as follows. +\begin{lstlisting} + def insert[a](l: LinkedList[a], x: a): LinkedList[a] = { + l.next = new LinkedList[a]; + l.next.elem = x; + l.next // the new last node; its successor is already null + } +\end{lstlisting} +The mailbox class also offers a method \code{receiveWithin} +which blocks for only a specified maximal amount of time. If no +message is received within the specified time interval (given in +milliseconds), the message processor argument $f$ will be unblocked +with the special \code{TIMEOUT} message. 
The implementation of +\code{receiveWithin} is quite similar to \code{receive}: +\begin{lstlisting} + def receiveWithin[a](msec: long)(f: PartialFunction[Any, a]): a = { + val msg: Any = synchronized { + var s = sent, s1 = s.next; + while (s1 != null && !f.isDefinedAt(s1.elem)) { + s = s1; s1 = s1.next ; + } + if (s1 != null) { + s.next = s1.next; s1.elem + } else { + val r = insert(lastReceiver, new Receiver { + def isDefined(msg: Any) = f.isDefinedAt(msg); + }); + lastReceiver = r; + r.elem.wait(msec); + if (r.elem.msg == null) r.elem.msg = TIMEOUT; + r.elem.msg + } + } + f(msg) + } +} // end MailBox +\end{lstlisting} +The only differences are the timed call to \code{wait}, and the +statement following it. + +\section{Actors} +\label{sec:actors} + +Chapter~\ref{chap:example-auction} sketched as a program example the +implementation of an electronic auction service. This service was +based on high-level actor processes, that work by inspecting messages +in their mailbox using pattern matching. An actor is simply a thread +whose communication primitives are those of a mailbox. Actors are +hence defined as a mixin composition extension of Java's standard +\code{Thread} class with the \code{MailBox} class. +\begin{lstlisting} +abstract class Actor extends Thread with MailBox; +\end{lstlisting} + +\comment{ +As an extended example of an application that uses actors, we come +back to the auction server example of Section~\ref{sec:ex-auction}. 
+The following code implements: + +\begin{figure}[thb] +\begin{lstlisting} +class AuctionMessage; +case class + Offer(bid: int, client: Process), // make a bid + Inquire(client: Process) extends AuctionMessage // inquire status + +class AuctionReply; +case class + Status(asked; int, expiration: Date), // asked sum, expiration date + BestOffer, // yours is the best offer + BeatenOffer(maxBid: int), // offer beaten by maxBid + AuctionConcluded(seller: Process, client: Process),// auction concluded + AuctionFailed // failed with no bids + AuctionOver extends AuctionReply // bidding is closed +\end{lstlisting} +\end{figure} + +\begin{lstlisting} +class Auction(seller: Process, minBid: int, closing: Date) + extends Process { + + val timeToShutdown = 36000000 // msec + val delta = 10 // bid increment +\end{lstlisting} +\begin{lstlisting} + def run = { + var askedBid = minBid + var maxBidder: Process = null + while (true) { + receiveWithin ((closing - Date.currentDate).msec) { + case Offer(bid, client) => { + if (bid >= askedBid) { + if (maxBidder != null && maxBidder != client) { + maxBidder send BeatenOffer(bid) + } + maxBidder = client + askedBid = bid + delta + client send BestOffer + } else client send BeatenOffer(maxBid) + } +\end{lstlisting} +\begin{lstlisting} + case Inquire(client) => { + client send Status(askedBid, closing) + } +\end{lstlisting} +\begin{lstlisting} + case TIMEOUT => { + if (maxBidder != null) { + val reply = AuctionConcluded(seller, maxBidder) + maxBidder send reply + seller send reply + } else seller send AuctionFailed + receiveWithin (timeToShutdown) { + case Offer(_, client) => client send AuctionOver ; discardAndContinue + case _ => discardAndContinue + case TIMEOUT => stop + } + } +\end{lstlisting} +\begin{lstlisting} + case _ => discardAndContinue + } + } + } +\end{lstlisting} +\begin{lstlisting} + def houseKeeping: int = { + val Limit = 100 + var nWaiting: int = 0 + receiveWithin(0) { + case _ => + nWaiting = nWaiting + 1 + if (nWaiting 
> Limit) { + receiveWithin(0) { + case Offer(_, _) => continue + case TIMEOUT => + case _ => discardAndContinue + } + } else continue + case TIMEOUT => + } + } +} +\end{lstlisting} +\begin{lstlisting} +class Bidder (auction: Process, minBid: int, maxBid: int) + extends Process { + val MaxTries = 3 + val Unknown = -1 + + var nextBid = Unknown +\end{lstlisting} +\begin{lstlisting} + def getAuctionStatus = { + var nTries = 0 + while (nextBid == Unknown && nTries < MaxTries) { + auction send Inquiry(this) + nTries = nTries + 1 + receiveWithin(waitTime) { + case Status(bid, _) => bid match { + case None => nextBid = minBid + case Some(curBid) => nextBid = curBid + Delta + } + case TIMEOUT => + case _ => continue + } + } + status + } +\end{lstlisting} +\begin{lstlisting} + def bid: unit = { + if (nextBid < maxBid) { + auction send Offer(nextBid, this) + receive { + case BestOffer => + receive { + case BeatenOffer(bestBid) => + nextBid = bestBid + Delta + bid + case AuctionConcluded(seller, client) => + transferPayment(seller, nextBid) + case _ => continue + } + + case BeatenOffer(bestBid) => + nextBid = nextBid + Delta + bid + + case AuctionOver => + + case _ => continue + } + } + } +\end{lstlisting} +\begin{lstlisting} + def run = { + getAuctionStatus + if (nextBid != Unknown) bid + } + + def transferPayment(seller: Process, amount: int) +} +\end{lstlisting} +} diff --git a/doc/reference/ReferencePart.tex b/doc/reference/ReferencePart.tex new file mode 100644 index 0000000000..d6984288e9 --- /dev/null +++ b/doc/reference/ReferencePart.tex @@ -0,0 +1,4579 @@ +\renewcommand{\todo}[1]{} +\newcommand{\notyet}[1]{\footnote{#1 not yet implemented.}} +\newcommand{\Ts}{\mbox{\sl Ts}} +\newcommand{\tps}{\mbox{\sl tps}} +\newcommand{\psig}{\mbox{\sl psig}} +\newcommand{\args}{\mbox{\sl args}} +\newcommand{\targs}{\mbox{\sl targs}} +\newcommand{\enums}{\mbox{\sl enums}} +\newcommand{\proto}{\mbox{\sl pt}} +\newcommand{\argtypes}{\mbox{\sl Ts}} +\newcommand{\stats}{\mbox{\sl 
stats}} +\newcommand{\overload}{\la\mbox{\sf and}\ra} +\newcommand{\op}{\mbox{\sl op}} + +\newcommand{\ifqualified}[1]{} +\newcommand{\iflet}[1]{} +\newcommand{\ifundefvar}[1]{} +\newcommand{\iffinaltype}[1]{} +\newcommand{\ifpackaging}[1]{} +\newcommand{\ifnewfor}[1]{} + +\chapter{Lexical Syntax} + +This chapter defines the syntax of Scala tokens. Tokens are +constructed from characters in the following character sets: +\begin{enumerate} +\item Whitespace characters. +\item Lower case letters ~\lstinline@`a' | $\ldots$ | `z'@~ and +upper case letters ~\lstinline@`A' | $\ldots$ | `Z' | `$\Dollar$' | `_'@. +\item Digits ~\lstinline@`0' | $\ldots$ | `9'@. +\item Parentheses ~\lstinline@`(' | `)' | `[' | `]' | `{' | `}'@. +\item Delimiter characters ~\lstinline@``' | `'' | `"' | `.' | `;' | `,'@. +\item Operator characters. These include all printable ASCII characters +which are in none of the sets above. +\end{enumerate} + +These sets are extended in the usual way to Unicode. + +\section{Identifiers}\label{sec:idents} + +\syntax\begin{lstlisting} +op ::= special {special} +varid ::= lower {letter $|$ digit} [`_' [id]] +id ::= upper {letter $|$ digit} [`_' [id]] + | varid + | op + | ```string chars`'' +\end{lstlisting} + +There are three ways to form an identifier. First, an identifier can +start with a letter which can be followed by an arbitrary sequence of +letters and digits. This may be followed by an underscore +`\lstinline@_@' character and another string of characters that by +themselves make up an identifier. Second, an identifier can start +with a special character followed by an arbitrary sequence of special +characters. Finally, an identifier may also be formed by an arbitrary +string between backquotes (host systems may impose some restrictions +on which strings are legal for identifiers). As usual, a longest +match rule applies. 
For instance, the string + +\begin{lstlisting} +big_bob++=z3 +\end{lstlisting} + +decomposes into the three identifiers \lstinline@big_bob@, \lstinline@++=@, and +\code{z3}. The rules for pattern matching further distinguish between +{\em variable identifiers}, which start with a lower case letter, and +{\em constant identifiers}, which do not. + + +The `\lstinline[mathescape=false]@$@'\comment{$} character is reserved for compiler-synthesized identifiers. +User programs are not allowed to define identifiers which contain `\lstinline[mathescape=false]@$@'\comment{$} +characters. + +The following names are reserved words instead of being members of the +syntactic class \code{id} of lexical identifiers. + +\begin{lstlisting} +abstract case catch class def +do else extends false final +finally for if import new +null object override package private +protected return sealed super this +throw trait try true type +val var while with yield +_ : = => <- <: >: # @ +\end{lstlisting} + +The Unicode operator `$\Rightarrow$' has the ASCII equivalent +`$=>$', which is also reserved. + +\example +Here are examples of identifiers: +\begin{lstlisting} + x Object maxIndex p2p empty_? + + +_field +\end{lstlisting} + +\section{Braces and Semicolons} + +A semicolon `\lstinline@;@' is implicitly inserted after every closing brace +if there is a new line character between closing brace and the next +regular token after it, except if that token cannot legally start a +statement. + +The tokens which cannot legally start a statement +are the following delimiters and reserved words: +\begin{lstlisting} +catch else extends finally with yield +, . ; : = => <- <: >: # @ ) ] } +\end{lstlisting} + +\section{Literals} + +There are literals for integer numbers (of types \code{Int} and \code{Long}), +floating point numbers (of types \code{Float} and \code{Double}), characters, and +strings. The syntax of these literals is in each case as in Java. 
+ +\syntax\begin{lstlisting} +intLit ::= $\mbox{\rm\em ``as in Java''}$ +floatLit ::= $\mbox{\rm\em ``as in Java''}$ +charLit ::= $\mbox{\rm\em ``as in Java''}$ +stringLit ::= $\mbox{\rm\em ``as in Java''}$ +\end{lstlisting} + +\section{Whitespace and Comments} + +Tokens may be separated by whitespace characters (ASCII codes 0 to 32) +and/or comments. Comments come in two forms: + +A single-line comment is a sequence of characters which starts with +\lstinline@//@ and extends to the end of the line. + +A multi-line comment is a sequence of characters between \lstinline@/*@ and +\lstinline@*/@. Multi-line comments may be nested. + + +\chapter{\label{sec:names}Identifiers, Names and Scopes} + +Names in Scala identify types, values, methods, and classes which +are collectively called {\em entities}. Names are introduced by +definitions, declarations (\sref{sec:defs}) or import clauses +(\sref{sec:import}), which are collectively called {\em binders}. + +There are two different name spaces, one for types (\sref{sec:types}) +and one for terms (\sref{sec:exprs}). The same name may designate a +type and a term, depending on the context where the name is used. + +A definition or declaration has a {\em scope} in which the entity +defined by a single name can be accessed using a simple name. Scopes +are nested, and a definition or declaration in some inner scope {\em +shadows} a definition in an outer scope that contributes to the same +name space. Furthermore, a definition or declaration shadows bindings +introduced by a preceding import clause, even if the import clause is +in the same block. Import clauses, on the other hand, only shadow +bindings introduced by other import clauses in outer blocks. 
+ +A reference to an unqualified (type- or term-) identifier $x$ is bound +by the unique binder, which +\begin{itemize} +\item defines an entity with name $x$ in the same namespace as the +identifier, and +\item shadows all other binders that define entities with name $x$ in that namespace. +\end{itemize} +It is an error if no such binder exists. If $x$ is bound by an import +clause, then the simple name $x$ is taken to be equivalent to the +qualified name to which $x$ is mapped by the import clause. If $x$ is bound by a definition or declaration, +then $x$ refers to the entity introduced by that +binder. In that case, the type of $x$ is the type of the referenced +entity. + +\example Consider the following nested definitions and imports: + +\begin{lstlisting} +object m1 { + object m2 { val x: int = 1; val y: int = 2 } + object m3 { val x: boolean = true; val y: String = "" } + val x: int = 3; + { import m2._; // shadows nothing + // reference to `x' is ambiguous here + val x: String = "abc"; // shadows preceding import + // name `x' refers to latest val definition + { import m3._ // shadows only preceding import m2 + // reference to `x' is ambiguous here + // name `y' refers to latest import clause + } + } +} +\end{lstlisting} + +A reference to a qualified (type- or term-) identifier $e.x$ refers to +the member of the type $T$ of $e$ which has the name $x$ in the same +namespace as the identifier. It is an error if $T$ is not a value type +(\sref{sec:value-types}). The type of $e.x$ is the member type of the +referenced entity in $T$. + +\chapter{\label{sec:types}Types} + +\syntax\begin{lstlisting} + Type ::= Type1 `=>' Type + | `(' [Types] `)' `=>' Type + | Type1 + Type1 ::= SimpleType {with SimpleType} [Refinement] + SimpleType ::= StableId + | SimpleType `#' id + | Path `.' 
type + | SimpleType TypeArgs + | `(' Type ')' + Types ::= Type {`,' Type} +\end{lstlisting} + +We distinguish between first-order types and type constructors, which +take type parameters and yield types. A subset of first-order types +called {\em value types} represents sets of (first-class) values. +Value types are either {\em concrete} or {\em abstract}. Every +concrete value type can be represented as a {\em class type}, i.e.\ a +type designator (\sref{sec:type-desig}) that refers to a +class\footnote{We assume that objects and packages also +implicitly define a class (of the same name as the object or package, +but inaccessible to user programs).} (\sref{sec:classes}), +or as a {\em compound type} (\sref{sec:compound-types}) +consisting of class types and possibly +also a refinement (\sref{sec:refinements}) that further constrains the +types of its members. + +A shorthand exists for denoting function types +(\sref{sec:function-types}). Abstract value types are introduced by +type parameters and abstract type bindings (\sref{sec:typedcl}). +Parentheses in types are used for grouping. + +Non-value types capture properties of +identifiers that are not values +(\sref{sec:synthetic-types}). There is no syntax to express these +types directly in Scala. + +\section{Paths}\label{sec:paths}\label{sec:stable-ids} + +\syntax\begin{lstlisting} + StableId ::= id + | Path `.' id + | [id '.'] super [`[' id `]'] `.' id + Path ::= StableId + | [id `.'] this +\end{lstlisting} + +Paths are not types themselves, but they can be a part of named types +and in that way form a central role in Scala's type system. + +A path is one of the following. +\begin{itemize} +\item +The empty path $\epsilon$ (which cannot be written explicitly in user programs). +\item +\lstinline@$C$.this@, where $C$ references a class. +The path \code{this} is taken as a shorthand for \lstinline@$C$.this@ where +$C$ is the name of the class directly enclosing the reference. 
+\item +\lstinline@$p$.$x$@ where $p$ is a path and $x$ is a stable member of $p$. +{\em Stable members} are members introduced by value or object +definitions, as well as packages. +\item +\lstinline@$C$.super.$x$@ or \lstinline@$C$.super[$M\,$].$x$@ +where $C$ references a class and $x$ references a +stable member of the super class or designated mixin class $M$ of $C$. +The prefix \code{super} is taken as a shorthand for \lstinline@$C$.super@ where +$C$ is the name of the class directly enclosing the reference. +\end{itemize} +A {\em stable identifier} is a path which ends in an identifier. + +\section{Value Types}\label{sec:value-types} + +\subsection{Singleton Types} +\label{sec:singleton-type} + +\syntax\begin{lstlisting} + SimpleType ::= Path `.' type +\end{lstlisting} + +A singleton type is of the form \lstinline@$p$.type@, where $p$ is a +path. The type denotes the set of values consisting of +exactly the value denoted by $p$. + +\subsection{Type Projection} +\label{sec:type-project} + +\syntax\begin{lstlisting} +SimpleType ::= SimpleType `#' id +\end{lstlisting} + +A type projection \lstinline@$T$#$x$@ references the type member named +$x$ of type $T$. $T$ must be either a singleton type, +or a non-abstract class type, or a Java class type (in either of the +last two cases, it is guaranteed that $T$ has no abstract type +members). + +\subsection{Type Designators} +\label{sec:type-desig} + +\syntax\begin{lstlisting} + SimpleType ::= StableId +\end{lstlisting} + +A type designator refers to a named value type. It can be simple or +qualified. All such type designators are shorthands for type projections. + +Specifically, the unqualified type name $t$ where $t$ is bound in some +class, object, or package $C$ is taken as a shorthand for +\lstinline@$C$.this.type#$t$@. If $t$ is not bound in a class, object, or +package, then $t$ is taken as a shorthand for +\lstinline@$\epsilon$.type#$t$@. 
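
\example The singleton types and type projections introduced above can be
tried out with the following sketch. It is not part of the language
definition: it assumes a present-day Scala compiler, whose surface syntax
differs in places from the notation of this report, and the names
\code{Paths}, \code{Table}, \code{Node} and \code{maintable} are
hypothetical.

\begin{lstlisting}
object Paths {
  class Table {
    class Node(val key: String)      // Node is a type member of Table
  }

  val maintable = new Table          // a stable value, hence usable in paths

  // singleton type: its only value is the one denoted by the path maintable
  val same: maintable.type = maintable

  // type member Node selected on the path maintable
  val n: maintable.Node = new maintable.Node("k")

  // type projection on the non-abstract class type Table; it accepts
  // the Node of any Table instance, so n conforms to it
  val m: Table#Node = n

  def main(args: Array[String]): Unit =
    println(m.key)
}
\end{lstlisting}

The assignment to \code{m} is accepted because a node selected on one
particular table is also a member of the projection over all tables; the
converse assignment would be rejected.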
+ +A qualified type designator has the form \lstinline@$p$.$t$@ where $p$ is +a path (\sref{sec:paths}) and $t$ is a type name. Such a type designator is +equivalent to the type projection \lstinline@$p$.type#$t$@. + +\example +Some type designators and their expansions are listed below. We assume +a local type parameter $t$, a value \code{maintable} +with a type member \code{Node} and the standard class \lstinline@scala.Int@. +\begin{lstlisting} + t $\epsilon$.type#t + Int scala.type#Int + scala.Int scala.type#Int + data.maintable.Node data.maintable.type#Node +\end{lstlisting} + +\subsection{Parameterized Types} +\label{sec:param-types} + +\syntax\begin{lstlisting} + SimpleType ::= SimpleType TypeArgs + TypeArgs ::= `[' Types `]' +\end{lstlisting} + +A parameterized type $T[T_1 \commadots T_n]$ consists of a type designator +$T$ and actual type parameters $T_1 \commadots T_n$ where $n \geq 1$. $T$ +must refer to a type constructor which takes $n$ type parameters $a_1 \commadots a_n$ +with lower bounds $L_1 \commadots L_n$ and upper bounds $U_1 \commadots U_n$. + +The parameterized type is well-formed if each actual type parameter +{\em conforms to its bounds}, i.e.\ $L_i\sigma <: T_i <: U_i\sigma$ where $\sigma$ +is the substitution $[a_1 := T_1 \commadots a_n := T_n]$. 
+ +\example\label{ex:param-types} +Given the partial type definitions: + +\begin{lstlisting} + class TreeMap[a <: Ord[a], b] { $\ldots$ } + class List[a] { $\ldots$ } + class I extends Ord[I] { $\ldots$ } +\end{lstlisting} + +the following parameterized types are well formed: + +\begin{lstlisting} + TreeMap[I, String] + List[I] + List[List[Boolean]] +\end{lstlisting} + +\example Given the type definitions of \ref{ex:param-types}, +the following types are ill-formed: + +\begin{lstlisting} + TreeMap[I] // illegal: wrong number of parameters + TreeMap[List[I], Boolean] // illegal: type parameter not within bound +\end{lstlisting} + +\subsection{Compound Types} +\label{sec:compound-types} +\label{sec:refinements} + +\syntax\begin{lstlisting} + Type ::= SimpleType {with SimpleType} [Refinement] + Refinement ::= `{' [RefineStat {`;' RefineStat}] `}' + RefineStat ::= Dcl + | type TypeDef {`,' TypeDef} + | +\end{lstlisting} + +A compound type ~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@~ represents +objects with members as given in the component types $T_1 \commadots +T_n$ and the refinement \lstinline@{$R\,$}@. Each component type $T_i$ must be a +class type \todo{Relax for first?}. A +refinement \lstinline@{$R\,$}@ contains declarations and type +definitions. Each declaration or definition in a refinement must +override a declaration or definition in one of the component types +$T_1 \commadots T_n$. The usual rules for overriding (\sref{sec:overriding}) +apply. If no refinement is given, the empty refinement is implicitly +added, i.e. ~\lstinline@$T_1$ with $\ldots$ with $T_n$@~ is a shorthand for +~\lstinline@$T_1$ with $\ldots$ with $T_n$ {}@. 
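
\example The following sketch shows a compound type with two component
types and a refinement. It is an illustration only, written for a
present-day Scala compiler rather than in the notation of this report;
the names \code{CompoundTypes}, \code{Named}, \code{Container},
\code{Elem} and \code{show} are hypothetical.

\begin{lstlisting}
object CompoundTypes {
  trait Named { def name: String }
  trait Container { type Elem; def head: Elem }

  // Compound type: two component types plus a refinement. The refinement
  // overrides the abstract type member Elem declared in Container.
  def show(c: Named with Container { type Elem = Int }): String =
    c.name + ": " + (c.head + 1)     // c.head is known to be an Int here

  def main(args: Array[String]): Unit = {
    // the explicit type annotation keeps the refinement visible
    val box: Named with Container { type Elem = Int } =
      new Named with Container {
        type Elem = Int
        def name = "box"
        def head = 41
      }
    println(show(box))
  }
}
\end{lstlisting}

Without the refinement, \code{c.head} would only have the abstract type
\code{Elem} and the addition in \code{show} would not type-check.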
+ +\subsection{Function Types} +\label{sec:function-types} + +\syntax\begin{lstlisting} + Type ::= Type1 `=>' Type + | `(' [Types] `)' `=>' Type +\end{lstlisting} +The type ~\lstinline@($T_1 \commadots T_n$) => $U$@~ represents the set of function +values that take arguments of types $T_1 \commadots T_n$ and yield +results of type $U$. In the case of exactly one argument type, +~\lstinline@$T$ => $U$@~ is a shorthand for ~\lstinline@($T\,$) => $U$@. Function types +associate to the right, e.g.~\lstinline@($S\,$) => ($T\,$) => $U$@~ is the same as +~\lstinline@($S\,$) => (($T\,$) => $U\,$)@. + +Function types are shorthands for class types that define \code{apply} +functions. Specifically, the $n$-ary function type +~\lstinline@($T_1 \commadots T_n$) => $U$@~ is a shorthand for the class type +\lstinline@Function$n$[$T_1 \commadots T_n$,$U\,$]@. Such class +types are defined in the Scala library for $n$ between 0 and 9 as follows. +\begin{lstlisting} +package scala; +trait Function$n$[-$T_1 \commadots$ -$T_n$, +$R$] { + def apply($x_1$: $T_1 \commadots x_n$: $T_n$): $R$; + override def toString() = "<function>"; +} +\end{lstlisting} +Hence, function types are covariant in their result type, and +contravariant in their argument types. + +\section{Non-Value Types} +\label{sec:synthetic-types} + +The types explained in the following do not denote sets of values, nor +do they appear explicitly in programs. They are introduced in this +report as the internal types of defined identifiers. + +\subsection{Method Types} +\label{sec:method-types} + +A method type is denoted internally as $(\Ts)U$, where $(\Ts)$ is a +sequence of types $(T_1 \commadots T_n)$ for some $n \geq 0$ +and $U$ is a (value or method) type. This type represents named +methods that take arguments of types $T_1 \commadots T_n$ +and that return a result of type $U$. + +Method types associate to the right: $(\Ts_1)(\Ts_2)U$ is treated as +$(\Ts_1)((\Ts_2)U)$. 
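
\example The properties of function types stated in
\sref{sec:function-types} (right associativity, the \code{apply}
member, and variance) can be observed with the sketch below. It is an
illustration only, written for a present-day Scala compiler; the names
\code{FunctionTypes}, \code{add}, \code{inc}, \code{lenAny} and
\code{lenStr} are hypothetical.

\begin{lstlisting}
object FunctionTypes {
  // Int => Int => Int is read as Int => (Int => Int)
  val add: Int => Int => Int = x => y => x + y

  // a function value is an object of a class type with an apply method
  val inc: Function1[Int, Int] = add(1)

  // contravariant in the argument type, covariant in the result type:
  // a function from Any to Int may be used as a function from String to Any
  val lenAny: Any => Int = _.toString.length
  val lenStr: String => Any = lenAny

  def main(args: Array[String]): Unit = {
    println(inc.apply(41))           // the same call as inc(41)
    println(lenStr("abc"))
  }
}
\end{lstlisting}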
+ +A special case are types of methods without any parameters. They are +written here $[]T$, following the syntax for polymorphic method types +(\sref{sec:poly-types}). Parameterless methods name expressions that +are re-evaluated each time the parameterless method name is +referenced. + +Method types do not exist as types of values. If a method name is used +as a value, its type is implicitly converted to a corresponding +function type (\sref{sec:impl-conv}). + +\example The declarations +\begin{lstlisting} +def a: Int +def b (x: Int): Boolean +def c (x: Int) (y: String, z: String): String +\end{lstlisting} +produce the typings +\begin{lstlisting} +a: [] Int +b: (Int) Boolean +c: (Int) (String, String) String +\end{lstlisting} + +\subsection{Polymorphic Method Types} +\label{sec:poly-types} + +A polymorphic method type is denoted internally as ~\lstinline@[$\tps\,$]$T$@~ where +\lstinline@[$\tps\,$]@ is a type parameter section +~\lstinline@[$a_1$ >: $L_1$ <: $U_1 \commadots a_n$ >: $L_n$ <: $U_n$]@~ +for some $n \geq 0$ and $T$ is a +(value or method) type. This type represents named methods that +take type arguments ~\lstinline@$S_1 \commadots S_n$@~ which +conform (\sref{sec:param-types}) to the lower bounds +~\lstinline@$L_1 \commadots L_n$@~ and the upper bounds +~\lstinline@$U_1 \commadots U_n$@~ and that yield results of type $T$. + +\example The declarations +\begin{lstlisting} +def empty[a]: List[a] +def union[a <: Comparable[a]] (x: Set[a], xs: Set[a]): Set[a] +\end{lstlisting} +produce the typings +\begin{lstlisting} +empty : [a >: All <: Any] List[a] +union : [a >: All <: Comparable[a]] (x: Set[a], xs: Set[a]) Set[a] . +\end{lstlisting} + +\comment{ +\subsection{Overloaded Types} +\label{sec:overloaded-types} + +More than one values or methods are defined in the same scope with the +same name, we model + +An overloaded type consisting of type alternatives $T_1 \commadots +T_n (n \geq 2)$ is denoted internally $T_1 \overload \ldots \overload T_n$. 
+ +\example The definitions +\begin{lstlisting} +def println: unit; +def println(s: String): unit = $\ldots$; +def println(x: float): unit = $\ldots$; +def println(x: float, width: int): unit = $\ldots$; +def println[a](x: a)(tostring: a => String): unit = $\ldots$ +\end{lstlisting} +define a single function \code{println} which has an overloaded +type. +\begin{lstlisting} +println: [] unit $\overload$ + (String) unit $\overload$ + (float) unit $\overload$ + (float, int) unit $\overload$ + [a] (a) (a => String) unit +\end{lstlisting} + +\example The definitions +\begin{lstlisting} +def f(x: T): T = $\ldots$; +val f = 0 +\end{lstlisting} +define a function \code{f} which has type ~\lstinline@(x: T)T $\overload$ Int@. +} + +\section{Base Classes and Member Definitions} +\label{sec:base-classes-member-defs} + +Types, bounds and base classes of class members depend on the way the +members are referenced. Central here are three notions, namely: +\begin{enumerate} +\item the notion of the set of base classes of a type $T$, +\item the notion of a type $T$ in some class $C$ seen from some + prefix type $S$, +\item the notion of a member binding of some type $T$. +\end{enumerate} +These notions are defined mutually recursively as follows. + +1. The set of {\em base classes} of a type is a set of class types, +given as follows. +\begin{itemize} +\item +The base classes of a class type $C$ are the base classes of class +$C$. +\item +The base classes of an aliased type are the base classes of its alias. +\item +The base classes of an abstract type are the base classes of its upper bound. +\item +The base classes of a parameterized type +~\lstinline@$C$[$T_1 \commadots T_n$]@~ are the base classes +of type $C$, where every occurrence of a type parameter $a_i$ +of $C$ has been replaced by the corresponding parameter type $T_i$. +\item +The base classes of a singleton type \lstinline@$p$.type@ are the base classes of +the type of $p$. 
+\item +The base classes of a compound type +~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@~ +are the {\em reduced union} of the base +classes of all $T_i$'s. This means: +Let the multi-set $\SS$ be the multi-set-union of the +base classes of all $T_i$'s. +If $\SS$ contains several type instances of the same class, say +~\lstinline@$S^i$#$C$[$T^i_1 \commadots T^i_n$]@~ $(i \in I)$, then +all those instances +are replaced by one of them which conforms to all +others. It is an error if no such instance exists, or if $C$ is not a trait +(\sref{sec:traits}). It follows that the reduced union, if it exists, +produces a set of class types, where different types are instances of different classes. +\item +The base classes of a type selection \lstinline@$S$#$T$@ are +determined as follows. If $T$ is an alias or abstract type, the +previous clauses apply. Otherwise, $T$ must be a (possibly +parameterized) class type, which is defined in some class $B$. Then +the base classes of \lstinline@$S$#$T$@ are the base classes of $T$ +in $B$ seen from the prefix type $S$. +\end{itemize} + +2. The notion of a type $T$ +{\em in class $C$ seen from some prefix type +$S\,$} makes sense only if the prefix type $S$ +has a type instance of class $C$ as a base class, say +~\lstinline@$S'$#$C$[$T_1 \commadots T_n$]@. Then we define as follows. +\begin{itemize} + \item + If \lstinline@$S$ = $\epsilon$.type@, then $T$ in $C$ seen from $S$ is $T$ itself. + \item Otherwise, if $T$ is the $i$'th type parameter of some class $D$, then + \begin{itemize} + \item + If $S$ has a base class ~\lstinline@$D$[$U_1 \commadots U_n$]@, for some type parameters + ~\lstinline@[$U_1 \commadots U_n$]@, then $T$ in $C$ seen from $S$ is $U_i$. + \item + Otherwise, if $C$ is defined in a class $C'$, then + $T$ in $C$ seen from $S$ is the same as $T$ in $C'$ seen from $S'$. + \item + Otherwise, if $C$ is not defined in another class, then + $T$ in $C$ seen from $S$ is $T$ itself. 
+ \end{itemize} +\item + Otherwise, + if $T$ is the singleton type \lstinline@$D$.this.type@ for some class $D$ + then + \begin{itemize} + \item + If $D$ is a subclass of $C$ and + $S$ has a type instance of class $D$ among its base classes, + then $T$ in $C$ seen from $S$ is $S$. + \item + Otherwise, if $C$ is defined in a class $C'$, then + $T$ in $C$ seen from $S$ is the same as $T$ in $C'$ seen from $S'$. + \item + Otherwise, if $C$ is not defined in another class, then + $T$ in $C$ seen from $S$ is $T$ itself. + \end{itemize} +\item + If $T$ is some other type, then the described mapping is applied + to all its type components. +\end{itemize} + +If $T$ is a possibly parameterized class type, where $T$'s class +is defined in some other class $D$, and $S$ is some prefix type, +then we use ``$T$ seen from $S$'' as a shorthand for +``$T$ in $D$ seen from $S$''. + +3. The {\em member bindings} of a type $T$ are all bindings $d$ such that +there exists a type instance of some class $C$ among the base classes of $T$ +and there exists a definition or declaration $d'$ in $C$ +such that $d$ results from $d'$ by replacing every +type $T'$ in $d'$ by $T'$ in $C$ seen from $T$. + +The {\em definition} of a type projection \lstinline@$S$#$t$@ is the member +binding $d$ of the type $t$ in $S$. In that case, we also say +that \lstinline@$S$#$t$@ {\em is defined by} $d$. + +\section{Relations between types} + +We define two relations between types. +\begin{quote}\begin{tabular}{l@{\gap}l@{\gap}l} +\em Type equivalence & $T \equiv U$ & $T$ and $U$ are interchangeable +in all contexts. +\\ +\em Conformance & $T \conforms U$ & Type $T$ conforms to type $U$. 
+\end{tabular}\end{quote} + +\subsection{Type Equivalence} +\label{sec:type-equiv} + +Equivalence $(\equiv)$ between types is the smallest congruence\footnote{ A +congruence is an equivalence relation which is closed under formation +of contexts} such that the following holds: +\begin{itemize} +\item +If $t$ is defined by a type alias ~\lstinline@type $t$ = $T$@, then $t$ is +equivalent to $T$. +\item +If a path $p$ has a singleton type ~\lstinline@$q$.type@, then +~\lstinline@$p$.type $\equiv q$.type@. +\item +If $O$ is defined by an object definition, and $p$ is a path +consisting only of package or object selectors and ending in $O$, then +~\lstinline@$O$.this.type $\equiv p$.type@. +\item +Two compound types are equivalent if their component types are +pairwise equivalent and their refinements are equivalent. Two +refinements are equivalent if they bind the same names and the +modifiers, types and bounds of every declared entity are equivalent in +both refinements. +\item +Two method types are equivalent if they have equivalent result +types, both have the same number of parameters, and corresponding +parameters have equivalent types as well as the same \code{def} or +\lstinline@*@ modifiers. Note that the names of parameters do not matter +for method type equivalence. +\item +Two polymorphic types are equivalent if they have the same number of +type parameters, and, after renaming one set of type parameters by +another, the result types as well as lower and upper bounds of +corresponding type parameters are equivalent. +\item +Two overloaded types are equivalent if for every alternative type in +either type there exists an equivalent alternative type in the other. +\end{itemize} + +\subsection{Conformance} +\label{sec:subtyping} + +The conformance relation $(\conforms)$ is the smallest +transitive relation that satisfies the following conditions. +\begin{itemize} +\item Conformance includes equivalence. If $T \equiv U$ then $T \conforms U$. 
+\item For every value type $T$, + $\mbox{\code{scala.All}} \conforms T \conforms \mbox{\code{scala.Any}}$. +\item For every value type $T \conforms \mbox{\code{scala.AnyRef}}$ + one has $\mbox{\code{scala.AllRef}} \conforms T$. +\item A type variable or abstract type $t$ conforms to its upper bound and + its lower bound conforms to $t$. +\item A class type or parameterized type $c$ conforms to any of its base types $b$. +\item A type projection \lstinline@$T$#$t$@ conforms to \lstinline@$U$#$t$@ if + $T$ conforms to $U$. +\item A parameterized type ~\lstinline@$T$[$T_1 \commadots T_n$]@~ conforms to + ~\lstinline@$T$[$U_1 \commadots U_n$]@~ if + the following three conditions hold for $i = 1 \commadots n$. + \begin{itemize} + \item + If the $i$'th type parameter of $T$ is declared covariant, then $T_i \conforms U_i$. + \item + If the $i$'th type parameter of $T$ is declared contravariant, then $U_i \conforms T_i$. + \item + If the $i$'th type parameter of $T$ is declared neither covariant + nor contravariant, then $U_i \equiv T_i$. + \end{itemize} +\item A compound type ~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@~ conforms to + each of its component types $T_i$. +\item If $T \conforms T_i$ for $i = 1 \commadots n$ and for every + binding of a type or value $x$ in $R$ there exists a member + binding of $x$ in $T$ subsuming it, then $T$ conforms to the + compound type ~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@. +\item If + $T'_i$ conforms to $T_i$ for $i = 1 \commadots n$ and $U$ conforms to $U'$ + then the method type $(T_1 \commadots T_n) U$ conforms to + $(T'_1 \commadots T'_n) U'$. 
+\item If, assuming +$L'_1 \conforms a_1 \conforms U'_1 \commadots L'_n \conforms a_n \conforms U'_n$ +one has $L_i \conforms L'_i$ and $U'_i \conforms U_i$ +for $i = 1 \commadots n$, as well as $T \conforms T'$ then the polymorphic type +$[a_1 >: L_1 <: U_1 \commadots a_n >: L_n <: U_n] T$ conforms to the polymorphic type +$[a_1 >: L'_1 <: U'_1 \commadots a_n >: L'_n <: U'_n] T'$. +\item +An overloaded type $T_1 \overload \ldots \overload T_n$ conforms to each of its alternative types $T_i$. +\item +A type $S$ conforms to the overloaded type $T_1 \overload \ldots \overload T_n$ +if $S$ conforms to each alternative type $T_i$. \todo{Really?} +\end{itemize} + +A declaration or definition in some compound type or class type $C$ +{\em subsumes} another +declaration of the same name in some compound type or class type $C'$, if one of the following holds. +\begin{itemize} +\item +A value declaration ~\lstinline@val $x$: $T$@~ or value definition +~\lstinline@val $x$: $T$ = $e$@~ subsumes a value declaration +~\lstinline@val $x$: $T'$@~ if $T \conforms T'$. +\item +A type alias +$\TYPE;t=T$ subsumes a type alias $\TYPE;t=T'$ if +$T \equiv T'$. +\item +A type declaration ~\lstinline@type $t$ >: $L$ <: $U$@~ subsumes +a type declaration ~\lstinline@type $t$ >: $L'$ <: $U'$@~ if $L' \conforms L$ and +$U \conforms U'$. +\item +A type or class definition of some type $t$ subsumes an abstract +type declaration ~\lstinline@type $t$ >: $L$ <: $U$@~ if +$L \conforms t \conforms U$. +\end{itemize} + +The $(\conforms)$ relation forms a partial order between types. The {\em +least upper bound} or the {\em greatest lower bound} of a set of types +is understood to be relative to that order. + +\paragraph{Note} The least upper bound of a set of types does not always exist. 
For instance, consider +the class definitions +\begin{lstlisting} +class A[+t] {} +class B extends A[B]; +class C extends A[C]; +\end{lstlisting} +Then the types ~\lstinline@A[Any], A[A[Any]], A[A[A[Any]]], ...@~ form +a descending sequence of upper bounds for \code{B} and \code{C}. The +least upper bound would be the infinite limit of that sequence, which +does not exist as a Scala type. Since cases like this are in general +impossible to detect, a Scala compiler is free to reject a term +whose type is specified as a least upper or greatest lower bound, +if that bound would be more complex than some compiler-set +limit\footnote{The current Scala compiler limits the nesting level +of parameterization in such bounds to 10.}. + +\section{Type Erasure} +\label{sec:erasure} + +A type is called {\em generic} if it contains type arguments or type variables. +{\em Type erasure} is a mapping from (possibly generic) types to +non-generic types. We write $|T|$ for the erasure of type $T$. +The erasure mapping is defined as follows. +\begin{itemize} +\item The erasure of a type variable is the erasure of its upper bound. +\item The erasure of a parameterized type $T[T_1 \commadots T_n]$ is $|T|$. +\item The erasure of a singleton type \lstinline@$p$.type@ is the + erasure of the type of $p$. +\item The erasure of a type projection \lstinline@$T$#$x$@ is \lstinline@|$T$|#$x$@. +\item The erasure of a compound type ~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@ + is $|T_1|$. +\item The erasure of every other type is the type itself. +\end{itemize} + +\section{Implicit Conversions} +\label{sec:impl-conv} + +\todo{Include Anything to unit?} + +The following implicit conversions are applied to expressions of +method type that are used as values, rather than being applied to some +arguments. +\begin{itemize} +\item +A parameterless method $m$ of type $[] T$ +is converted to type $T$ by evaluating the expression to which $m$ is bound. 
+\item +An expression $e$ of polymorphic type +\begin{lstlisting} +[$a_1$ >: $L_1$ <: $U_1 \commadots a_n$ >: $L_n$ <: $U_n$]$T$ +\end{lstlisting} +which does not appear as the function part of +a type application is converted to type $T$ +by determining with local type inference +(\sref{sec:local-type-inf}) instance types ~\lstinline@$T_1 \commadots T_n$@~ +for the type variables ~\lstinline@$a_1 \commadots a_n$@~ and +implicitly embedding $e$ in the type application +~\lstinline@$e$[$T_1 \commadots T_n$]@~ (\sref{sec:type-app}). +\item +An expression $e$ of monomorphic method type +$(\Ts_1) \ldots (\Ts_n) U$ of arity $n > 0$ +which does not appear as the function part of an application is +converted to a function type by implicitly embedding $e$ in +the following term, where $x$ is a fresh variable and each $ps_i$ is a +parameter section consisting of parameters with fresh names of types $\Ts_i$: +\begin{lstlisting} +(val $x$ = $e$ ; $(ps_1) \ldots \Arrow \ldots \Arrow (ps_n) \Arrow x(ps_1)\ldots(ps_n)$) +\end{lstlisting} +This conversion is not applicable to functions with call-by-name +parameters \lstinline@def $x$: $T$@ or repeated parameters +\lstinline@$x$: $T$*@ (\sref{sec:parameters}), because its result would +violate the well-formedness rules for anonymous functions +(\sref{sec:closures}). Hence, methods with such parameters +always need to be applied to arguments immediately. +\end{itemize} + +When used in an expression, a value of type \code{byte}, \code{char}, +or \code{short} is always implicitly converted to a value of type +\code{int}. + +If an expression $e$ has type $T$ where $T$ does not conform to the +expected type $pt$ and $T$ has a member named \lstinline@coerce@ of type +$[]U$ where $U$ does conform to $pt$, then the expression is typed and evaluated as if it were +\lstinline@$e$.coerce@. 
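
\example The first and third conversions listed in this section
(evaluation of parameterless methods, and conversion of a method to a
function value) can be observed with the sketch below. It is an
illustration only, written for a present-day Scala compiler, where the
conversion to a function value is triggered by an expected function
type; the names \code{MethodValues}, \code{counter}, \code{next} and
\code{scale} are hypothetical.

\begin{lstlisting}
object MethodValues {
  var counter = 0

  // parameterless method: its body is re-evaluated at every reference
  def next: Int = { counter += 1; counter }

  // method with two parameter sections, internal type (Int)(Int)Int
  def scale(factor: Int)(x: Int): Int = factor * x

  def main(args: Array[String]): Unit = {
    val a = next                     // first evaluation of the body
    val b = next                     // second evaluation, a different result

    // the method is not applied to all its arguments, so it is
    // converted to a function value of the expected type
    val f: Int => Int => Int = scale
    val g: Int => Int = scale(3)

    println(a != b)
    println(f(2)(5) + g(5))
  }
}
\end{lstlisting}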
+ + +\chapter{Basic Declarations and Definitions} +\label{sec:defs} + +\syntax\begin{lstlisting} + Dcl ::= val ValDcl {`,' ValDcl} + | var VarDcl {`,' VarDcl} + | def FunDcl {`,' FunDcl} + | type TypeDcl {`,' TypeDcl} + Def ::= val PatDef {`,' PatDef} + | var VarDef {`,' VarDef} + | def FunDef {`,' FunDef} + | type TypeDef {`,' TypeDef} + | ClsDef +\end{lstlisting} + +A {\em declaration} introduces names and assigns them types. It can +appear as one of the statements of a class definition +(\sref{sec:templates}) or as part of a refinement in a compound +type (\ref{sec:refinements}). + +A {\em definition} introduces names that denote terms or types. It can +form part of an object or class definition or it can be local to a +block. Both declarations and definitions produce {\em bindings} that +associate type names with type definitions or bounds, and that +associate term names with types. + +The scope of a name introduced by a declaration or definition is the +whole statement sequence containing the binding. However, there is a +restriction on forward references: In a statement sequence $s_1 \ldots +s_n$, if a simple name in $s_i$ refers to an entity defined by $s_j$ +where $j \geq i$, then every non-empty statement between and including +$s_i$ and $s_j$ must be an import clause, +or a function, type, class, or object definition. It may not be +a value definition, a variable definition, or an expression. + +\comment{ +Every basic definition may introduce several defined names, separated +by commas. 
These are expanded according to the following scheme: +\bda{lcl} +\VAL;x, y: T = e && \VAL; x: T = e \\ + && \VAL; y: T = x \\[0.5em] + +\LET;x, y: T = e && \LET; x: T = e \\ + && \VAL; y: T = x \\[0.5em] + +\DEF;x, y (ps): T = e &\tab\mbox{expands to}\tab& \DEF; x(ps): T = e \\ + && \DEF; y(ps): T = x(ps)\\[0.5em] + +\VAR;x, y: T := e && \VAR;x: T := e\\ + && \VAR;y: T := x\\[0.5em] + +\TYPE;t,u = T && \TYPE; t = T\\ + && \TYPE; u = t\\[0.5em] +\eda +} + +All definitions have a ``repeated form'' where the initial +definition keyword is followed by several constituent definitions +which are separated by commas. A repeated definition is +always interpreted as a sequence formed from the +constituent definitions. E.g.\ the function definition +~\lstinline@def f(x) = x, g(y) = y@~ expands to +~\lstinline@def f(x) = x; def g(y) = y@~ and +the type definition +~\lstinline@type T, U <: B@~ expands to +~\lstinline@type T; type U <: B@. + +\comment{ +If an element in such a sequence introduces only the defined name, +possibly with some type or value parameters, but leaves out any +aditional parts in the definition, then those parts are implicitly +copied from the next subsequent sequence element which consists of +more than just a defined name and parameters. Examples: +\begin{itemize} +\item[] +The variable declaration ~\lstinline@var x, y: int@~ +expands to ~\lstinline@var x: int; var y: int@. +\item[] +The value definition ~\lstinline@val x, y: int = 1@~ +expands to ~\lstinline@val x: int = 1; val y: int = 1@. +\item[] +The class definition ~\lstinline@case class X(), Y(n: int) extends Z@~ expands to +~\lstinline@case class X extends Z; case class Y(n: int) extends Z@. +\item +The object definition ~\lstinline@case object Red, Green, Blue extends Color@~ +expands to +\begin{lstlisting} +case object Red extends Color; +case object Green extends Color; +case object Blue extends Color . 
+\end{lstlisting} +\end{itemize} +} +\section{Value Declarations and Definitions} +\label{sec:valdef} + +\syntax\begin{lstlisting} + Dcl ::= val ValDcl {`,' ValDcl} + ValDcl ::= id `:' Type + Def ::= val PatDef {`,' PatDef} + PatDef ::= Pattern2 [`:' Type] `=' Expr +\end{lstlisting} + +A value declaration ~\lstinline@val $x$: $T$@~ introduces $x$ as a name of a value of +type $T$. + +A value definition ~\lstinline@val $x$: $T$ = $e$@~ defines $x$ as a name of the +value that results from the evaluation of $e$. The type $T$ may be +omitted, in which case the type of expression $e$ is assumed. +If a type $T$ is given, then $e$ is expected to conform to it. + +Evaluation of the value definition implies evaluation of its +right-hand side $e$. The effect of the value definition is to bind +$x$ to the value of $e$ converted to type $T$. + +Value definitions can alternatively have a pattern +(\sref{sec:patterns}) as left-hand side. If $p$ is some pattern other +than a simple name or a name followed by a colon and a type, then the +value definition ~\lstinline@val $p$ = $e$@~ is expanded as follows: + +1. If the pattern $p$ has bound variables $x_1 \commadots x_n$, where $n > 1$: +\begin{lstlisting} +val $\Dollar x$ = $e$.match {case $p$ => scala.Tuple$n$($x_1 \commadots x_n$)} +val $x_1$ = $\Dollar x$._1 +$\ldots$ +val $x_n$ = $\Dollar x$._n . +\end{lstlisting} +Here, $\Dollar x$ is a fresh name. The class +\lstinline@Tuple$n$@ is defined for $n = 2 \commadots 9$ in package +\code{scala}. + +2. If $p$ has a unique bound variable $x$: +\begin{lstlisting} +val $x$ = $e$.match { case $p$ => $x$ } +\end{lstlisting} + +3. 
If $p$ has no bound variables: +\begin{lstlisting} +$e$.match { case $p$ => ()} +\end{lstlisting} + +\example +The following are examples of value definitions: +\begin{lstlisting} +val pi = 3.1415; +val pi: double = 3.1415; // equivalent to first definition +val Some(x) = f(); // a pattern definition +val x :: xs = mylist; // an infix pattern definition +\end{lstlisting} + +The last two definitions have the following expansions. +\begin{lstlisting} +val x = f().match { case Some(x) => x } + +val x$\Dollar$ = mylist.match { case x :: xs => scala.Tuple2(x, xs) } +val x = x$\Dollar$._1; +val xs = x$\Dollar$._2; +\end{lstlisting} + +\section{Variable Declarations and Definitions} +\label{sec:vardef} + +\syntax\begin{lstlisting} + Dcl ::= var VarDcl {`,' VarDcl} + Def ::= var VarDef {`,' VarDef} + VarDcl ::= id `:' Type + VarDef ::= id [`:' Type] `=' Expr + | id `:' Type `=' `_' +\end{lstlisting} + +A variable declaration ~\lstinline@var $x$: $T$@~ is equivalent to declarations +of a {\em getter function} $x$ and a {\em setter function} +\lstinline@$x$_=@, defined as follows: + +\begin{lstlisting} + def $x$: $T$; + def $x$_= ($y$: $T$): unit +\end{lstlisting} + +An implementation of a class containing variable declarations +may define these variables using variable definitions, or it may +define setter and getter functions directly. + +A variable definition ~\lstinline@var $x$: $T$ = $e$@~ introduces a mutable +variable with type $T$ and initial value as given by the +expression $e$. The type $T$ can be omitted, +in which case the type of $e$ is assumed. If $T$ is given, then $e$ +is expected to conform to it. + +A variable definition ~\lstinline@var $x$: $T$ = _@~ introduces a mutable +variable with type $T$ and a default initial value. 
+The default value depends on the type $T$ as follows: +\begin{quote}\begin{tabular}{ll} +\code{0} & if $T$ is \code{int} or one of its subrange types, \\ +\code{0L} & if $T$ is \code{long},\\ +\lstinline@0.0f@ & if $T$ is \code{float},\\ +\lstinline@0.0d@ & if $T$ is \code{double},\\ +\code{false} & if $T$ is \code{boolean},\\ +\lstinline@()@ & if $T$ is \code{unit}, \\ +\code{null} & for all other types $T$. +\end{tabular}\end{quote} + +When they occur as members of a template, both forms of variable +definition also introduce a getter function $x$ which returns the +value currently assigned to the variable, as well as a setter function +\lstinline@$x$_=@ which changes the value currently assigned to the variable. +The functions have the same signatures as for a variable declaration. +The getter and setter functions are then members of the template +instead of the variable accessed by them. + +\example The following example shows how {\em properties} can be +simulated in Scala. It defines a class \code{TimeOfDayVar} of time +values with updatable integer fields representing hours, minutes, and +seconds. Its implementation contains tests that allow only legal +values to be assigned to these fields. The user code, on the other +hand, accesses these fields just like normal variables. 
+ +\begin{lstlisting} +class TimeOfDayVar { + private var h: int = 0, m: int = 0, s: int = 0; + + def hours = h; + def hours_= (h: int) = if (0 <= h && h < 24) this.h = h + else throw new DateError(); + + def minutes = m; + def minutes_= (m: int) = if (0 <= m && m < 60) this.m = m + else throw new DateError(); + + def seconds = s; + def seconds_= (s: int) = if (0 <= s && s < 60) this.s = s + else throw new DateError(); +} +val d = new TimeOfDayVar; +d.hours = 8; d.minutes = 30; d.seconds = 0; +d.hours = 25; // throws a DateError exception +\end{lstlisting} + +\section{Type Declarations and Type Aliases} +\label{sec:typedcl} +\label{sec:typealias} + +\syntax\begin{lstlisting} + Dcl ::= type TypeDcl {`,' TypeDcl} + TypeDcl ::= id [>: Type] [<: Type] + Def ::= type TypeDef {`,' TypeDef} + TypeDef ::= id [TypeParamClause] `=' Type +\end{lstlisting} + +A {\em type declaration} ~\lstinline@type $t$ >: $L$ <: $U$@~ declares $t$ to +be an abstract type with lower bound type $L$ and upper bound +type $U$. If such a declaration appears as a member declaration +of a type, implementations of the type may implement $t$ with any +type $T$ for which $L \conforms T \conforms U$. Either or both bounds may +be omitted. If the lower bound $L$ is missing, the bottom type +\lstinline@scala.All@ is assumed. If the upper bound $U$ is missing, +the top type \lstinline@scala.Any@ is assumed. + +A {\em type alias} ~\lstinline@type $t$ = $T$@~ defines $t$ to be an alias +name for the type $T$. The left hand side of a type alias may +have a type parameter clause, e.g. ~\lstinline@type $t$[$\tps\,$] = $T$@. The scope +of a type parameter extends over the right hand side $T$ and the +type parameter clause $\tps$ itself. + +The scope rules for definitions (\sref{sec:defs}) and type parameters +(\sref{sec:funsigs}) make it possible that a type name appears in its +own bound or in its right-hand side. 
However, it is a static error if +a type alias refers recursively to the defined type constructor itself. +That is, the type $T$ in a type alias ~\lstinline@type $t$[$\tps\,$] = $T$@~ may not refer +directly or indirectly to the name $t$. It is also an error if +an abstract type is directly or indirectly its own upper or lower bound. + +\example The following are legal type declarations and definitions: +\begin{lstlisting} +type IntList = List[Integer]; +type T <: Comparable[T]; +type Two[a] = Tuple2[a, a]; +\end{lstlisting} + +The following are illegal: +\begin{lstlisting} +type Abs = Comparable[Abs]; // recursive type alias + +type S <: T; // S, T are bounded by themselves. +type T <: S; + +type T <: AnyRef with T; // T is abstract, may not be part of + // compound type + +type T >: Comparable[T.That]; // Cannot select from T. + // T is a type, not a value +\end{lstlisting} + +If a type alias ~\lstinline@type $t$[$\tps\,$] = $S$@~ refers to a class type +$S$, the name $t$ can also be used as a constructor for +objects of type $S$. + +\example The \code{Predef} module contains a definition which establishes \code{Pair} +as an alias of the parameterized class \code{Tuple2}: +\begin{lstlisting} +type Pair[+a, +b] = Tuple2[a, b]; +\end{lstlisting} +As a consequence, for any two types $S$ and $T$, the type +~\lstinline@Pair[$S$, $T\,$]@~ is equivalent to the type ~\lstinline@Tuple2[$S$, $T\,$]@. +\code{Pair} can also be used as a constructor instead of \code{Tuple2}, as in +\begin{lstlisting} +new Pair[Int, Int](1, 2) . +\end{lstlisting} + +\section{Type Parameters} + +\syntax\begin{lstlisting} + TypeParamClause ::= `[' TypeParam {`,' TypeParam} `]' + TypeParam ::= [`+' | `-'] TypeDcl +\end{lstlisting} + + +Type parameters appear in type definitions, class definitions, and +function definitions. The most general form of a type parameter is +~\lstinline@$\pm t$ >: $L$ <: $U$@. 
Here, $L$ and $U$ are lower +and upper bounds that constrain possible type arguments for the +parameter, and $\pm$ is a {\em variance}, i.e.\ an optional prefix +of either \lstinline@+@ or \lstinline@-@. + +\comment{ +The upper bound $U$ in a type parameter clause may not be a final +class. The lower bound may not denote a value type.\todo{Why} +} +The names of all type parameters in a type parameter clause must be +pairwise different. The scope of a type parameter includes in each +case the whole type parameter clause. Therefore it is possible that a +type parameter appears as part of its own bounds or the bounds of +other type parameters in the same clause. However, a type parameter +may not be bounded directly or indirectly by itself. + +\example Here are some well-formed type parameter clauses: +\begin{lstlisting} +[s, t] +[ex <: Throwable] +[a <: Ord[b], b <: a] +[a, b, c >: a <: b] +\end{lstlisting} +The following type parameter clauses are illegal, +since the type parameters are bounded by themselves. +\begin{lstlisting} +[a >: a] +[a <: b, b <: c, c <: a] +\end{lstlisting} + +Variance annotations indicate how type instances with the given type +parameters vary with respect to subtyping (\sref{sec:subtyping}). A +`\lstinline@+@' variance indicates a covariant dependency, a `\lstinline@-@' +variance indicates a contravariant dependency, and a missing variance +indication indicates an invariant dependency. + +A variance annotation constrains the way the annotated type variable +may appear in the type or class which binds the type parameter. In a +type definition ~\lstinline@type $t$[$\tps\,$] = $S$@, type parameters labeled +`\lstinline@+@' must only appear in covariant position in $S$ whereas +type parameters labeled `\lstinline@-@' must only appear in contravariant +position. 
Analogously, for a class definition +~\lstinline@class $c$[$\tps\,$]($ps\,$): $s$ extends $t$@, type parameters labeled +`\lstinline@+@' must only appear in covariant position in the self type +$s$ and the template $t$, whereas type +parameters labeled `\lstinline@-@' must only appear in contravariant +position. + +The variance position of a type parameter in a type or template is +defined as follows. Let the opposite of covariance be contravariance, +and the opposite of invariance be itself. The top-level of the type +or template is always in covariant position. The variance position +changes at the following constructs. +\begin{itemize} +\item +The variance position of a method parameter is the opposite of the +variance position of the enclosing parameter clause. +\item +The variance position of a type parameter is the opposite of the +variance position of the enclosing type parameter clause. +\item +The variance position of the lower bound of a type declaration or type parameter +is the opposite of the variance position of the type declaration or parameter. +\item +The right hand side $S$ of a type alias ~\lstinline@type $t$[$\tps\,$] = $S$@~ +is always in invariant position. +\item +The type of a mutable variable is always in invariant position. +\item +The prefix $S$ of a type selection \lstinline@$S$#$T$@ is always in invariant position. +\item +For a type argument $T$ of a type ~\lstinline@$S$[$\ldots T \ldots$ ]@: If the +corresponding type parameter is invariant, then $T$ is in +invariant position. If the corresponding type parameter is +contravariant, the variance position of $T$ is the opposite of +the variance position of the enclosing type ~\lstinline@$S$[$\ldots T \ldots$ ]@. +\end{itemize} + +\example The following variance annotation is legal. +\begin{lstlisting} +class P[+a, +b] { + val fst: a, snd: b +}\end{lstlisting} +With this variance annotation, elements +of type $P$ subtype covariantly with respect to their arguments. 
+For instance, +\begin{lstlisting} +P[IOException, String] <: P[Throwable, AnyRef] . +\end{lstlisting} + +If we make the elements of $P$ mutable, +the variance annotation becomes illegal. +\begin{lstlisting} +class Q[+a, +b] { + var fst: a, snd: b // **** error: illegal variance: + // `a', `b' occur in invariant position. +} +\end{lstlisting} + +\example The following variance annotation is illegal, since $a$ appears +in contravariant position in the parameter of \code{append}: + +\begin{lstlisting} +trait Vector[+a] { + def append(x: Vector[a]): Vector[a]; + // **** error: illegal variance: + // `a' occurs in contravariant position. +} +\end{lstlisting} +The problem can be avoided by generalizing the type of \code{append} +by means of a lower bound: + +\begin{lstlisting} +trait Vector[+a] { + def append[b >: a](x: Vector[b]): Vector[b]; +} +\end{lstlisting} + +\example Here is a case where a contravariant type parameter is useful. + +\begin{lstlisting} +trait OutputChannel[-a] { + def write(x: a): unit +} +\end{lstlisting} +With that annotation, we have that +\lstinline@OutputChannel[AnyRef]@ conforms to \lstinline@OutputChannel[String]@. +That is, a +channel on which one can write any object can substitute for a channel +on which one can write only strings. + +\section{Function Declarations and Definitions} +\label{sec:defdef} +\label{sec:funsigs} +\label{sec:parameters} + +\syntax\begin{lstlisting} +Dcl ::= def FunDcl {`,' FunDcl} +FunDcl ::= id [FunTypeParamClause] {ParamClause} `:' Type +Def ::= def FunDef {`,' FunDef} +FunDef ::= id [FunTypeParamClause] {ParamClause} + [`:' Type] `=' Expr +FunTypeParamClause ::= `[' TypeDcl {`,' TypeDcl} `]' +ParamClause ::= `(' [Param {`,' Param}] `)' +Param ::= [def] id `:' Type [`*'] +\end{lstlisting} + +A function declaration has the form ~\lstinline@def $f \psig$: $T$@, where +$f$ is the function's name, $\psig$ is its parameter +signature and $T$ is its result type. 
A function definition +~\lstinline@def $f \psig$: $T$ = $e$@~ also includes a {\em function body} $e$, +i.e.\ an expression which defines the function's result. A parameter +signature consists of an optional type parameter clause \lstinline@[$\tps\,$]@, +followed by zero or more value parameter clauses +~\lstinline@($ps_1$)$\ldots$($ps_n$)@. Such a declaration or definition +introduces a value with a (possibly polymorphic) method type whose +parameter types and result type are as given. + +A type parameter clause $\tps$ consists of one or more type +declarations (\sref{sec:typedcl}), which introduce type parameters, +possibly with bounds. The scope of a type parameter includes +the whole signature, including any of the type parameter bounds as +well as the function body, if it is present. + +A value parameter clause $ps$ consists of zero or more formal +parameter bindings such as \lstinline@$x$: $T$@, which bind value +parameters and associate them with their types. The scope of a formal +value parameter name $x$ is the function body, if one is +given. Both type parameter names and value parameter names must be +pairwise distinct. + +Value parameters may be prefixed by \code{def}, e.g.\ +~\lstinline@def $x$:$T$@. The type of such a parameter is then the +parameterless method type ~\lstinline@[]$T$@. This indicates that the +corresponding argument is not evaluated at the point of function +application, but instead is evaluated at each use within the +function. That is, the argument is evaluated using {\em call-by-name}. + +\example The declaration +\begin{lstlisting} +def whileLoop (def cond: Boolean) (def stat: Unit): Unit +\end{lstlisting} +produces the typing +\begin{lstlisting} +whileLoop: (cond: [] Boolean) (stat: [] Unit) Unit +\end{lstlisting} +which indicates that both parameters of \code{whileLoop} are evaluated using +call-by-name. + +The last value parameter of a parameter section may be suffixed by +``\code{*}'', e.g.\ ~\lstinline@(..., $x$:$T$*)@. 
The type of such a +{\em repeated} parameter inside the method is then the sequence type +\lstinline@scala.Seq[$T$]@. Methods with repeated parameters +\lstinline@$T$*@ take a variable number of arguments of type $T$. That is, +if a method $m$ with type ~\lstinline@($T_1 \commadots T_n, S$*)$U$@~ +is applied to arguments $(e_1 \commadots e_k)$ where $k \geq n$, then +$m$ is taken in that application to have type $(T_1 \commadots T_n, S +\commadots S)U$, with $k - n$ occurrences of type $S$. +\todo{Change to ???: If the method +is converted to a function type instead of being applied immediately, +a repeated parameter \lstinline@$T$*@ is taken to be ~\lstinline@scala.Seq[$T$]@~ +instead.} + +\example The following method definition computes the sum of a variable number +of integer arguments. +\begin{lstlisting} +def sum(args: int*) = { + var result = 0; + for (val arg <- args.elements) result = result + arg; + result +} +\end{lstlisting} +The following applications of this method yield \code{0}, \code{1}, +\code{6}, in that order. +\begin{lstlisting} +sum() +sum(1) +sum(1, 2, 3) +\end{lstlisting} + + +The type of the function body must conform to the function's declared +result type, if one is given. If the function definition is not +recursive, the result type may be omitted, in which case it is +determined from the type of the function body. + +\section{Overloaded Definitions} +\label{sec:overloaded-defs} +\todo{change} + +An overloaded definition is a set of $n > 1$ value or function +definitions in the same statement sequence that define the same name, +binding it to types ~\lstinline@$T_1 \commadots T_n$@, respectively. +The individual definitions are called {\em alternatives}. Overloaded +definitions may only appear in the statement sequence of a template. +Alternatives always need to specify the type of the defined entity +completely. It is an error if the types of two alternatives $T_i$ and +$T_j$ have the same erasure (\sref{sec:erasure}). 
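The erasure restriction on alternatives can be illustrated with a small sketch in present-day Scala (2.13 or 3), whose syntax differs from the draft notation used in this document. The object and method names are hypothetical. The commented-out pair shows the kind of definition the rule is meant to exclude.

```scala
// Illustrative sketch in present-day Scala (2.13 or 3); all names are hypothetical.
object Describe {
  // Two alternatives for the same name. Their parameter types erase to
  // different types (Int and String), so both may coexist in one template.
  def describe(x: Int): String = "int " + x
  def describe(x: String): String = "string " + x

  // The following pair would be rejected, because List[Int] and
  // List[String] both erase to List:
  //   def size(xs: List[Int]): Int = xs.length
  //   def size(xs: List[String]): Int = xs.length
}
```

A call such as `Describe.describe(1)` selects the first alternative from the argument type, while `Describe.describe("a")` selects the second.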
+ +\todo{Say something about bridge methods.} +%This must be a well-formed +%overloaded type + +\section{Import Clauses} +\label{sec:import} + +\syntax\begin{lstlisting} + Import ::= import ImportExpr {`,' ImportExpr} + ImportExpr ::= StableId `.' (id | `_' | ImportSelectors) + ImportSelectors ::= `{' {ImportSelector `,'} + (ImportSelector | `_') `}' + ImportSelector ::= id [`=>' id | `=>' `_'] +\end{lstlisting} + +An import clause has the form ~\lstinline@import $p$.$I$@~ where $p$ is a stable +identifier (\sref{sec:paths}) and $I$ is an import expression. +The import expression determines a set of names of members of $p$ +which are made available without qualification. The most general form +of an import expression is a list of {\em import selectors} +\begin{lstlisting} +{ $x_1$ => $y_1 \commadots x_n$ => $y_n$, _ } +\end{lstlisting} +for $n \geq 0$, where the final wildcard `\lstinline@_@' may be absent. It +makes available each member \lstinline@$p$.$x_i$@ under the unqualified name +$y_i$. I.e.\ every import selector ~\lstinline@$x_i$ => $y_i$@~ renames +\lstinline@$p$.$x_i$@ to +$y_i$. If a final wildcard is present, all members $z$ of +$p$ other than ~\lstinline@$x_1 \commadots x_n$@~ are also made available +under their own unqualified names. + +Import selectors work in the same way for type and term members. For +instance, an import clause ~\lstinline@import $p$.{$x$ => $y\,$}@~ renames the term +name \lstinline@$p$.$x$@ to the term name $y$ and the type name \lstinline@$p$.$x$@ +to the type name $y$. At least one of these two names must +reference a member of $p$. + +If the target in an import selector is a wildcard, the import selector +hides access to the source member. For instance, the import selector +~\lstinline@$x$ => _@~ ``renames'' $x$ to the wildcard symbol (which is +unaccessible as a name in user programs), and thereby effectively +prevents unqualified access to $x$. 
This is useful if there is a +final wildcard in the same import selector list, which imports all +members not mentioned in previous import selectors. + +Several shorthands exist. An import selector may be just a simple name +$x$. In this case, $x$ is imported without renaming, so the +import selector is equivalent to ~\lstinline@$x$ => $x$@. Furthermore, it is +possible to replace the whole import selector list by a single +identifier or wildcard. The import clause ~\lstinline@import $p$.$x$@~ is +equivalent to ~\lstinline@import $p$.{$x\,$}@~, i.e.\ it makes available without +qualification the member $x$ of $p$. The import clause +~\lstinline@import $p$._@~ is equivalent to +~\lstinline@import $p$.{_}@, +i.e.\ it makes available without qualification all members of $p$ +(this is analogous to ~\lstinline@import $p$.*@~ in Java). + +An import clause with multiple import expressions +~\lstinline@import $p_1$.$I_1 \commadots p_n$.$I_n$@~ is interpreted as a +sequence of import clauses +~\lstinline@import $p_1$.$I_1$; $\ldots$; import $p_n$.$I_n$@. + +\example Consider the object definition: +\begin{lstlisting} +object M { + def z = 0, one = 1; + def add(x: Int, y: Int): Int = x + y +} +\end{lstlisting} +Then the block +\begin{lstlisting} +{ import M.{one, z => zero, _}; add(zero, one) } +\end{lstlisting} +is equivalent to the block +\begin{lstlisting} +{ M.add(M.z, M.one) } . +\end{lstlisting} + +\chapter{Classes and Objects} +\label{sec:globaldefs} + +\syntax\begin{lstlisting} + ClsDef ::= ([case] class | trait) ClassDef {`,' ClassDef} + | [case] object ObjectDef {`,' ObjectDef} +\end{lstlisting} + +Classes (\sref{sec:classes}) and objects +(\sref{sec:modules}) are both defined in terms of {\em templates}. 
+ +\section{Templates} +\label{sec:templates} + +\syntax\begin{lstlisting} + Template ::= Constr {`with' Constr} [TemplateBody] + TemplateBody ::= `{' [TemplateStat {`;' TemplateStat}] `}' +\end{lstlisting} + +A template defines the type signature, behavior and initial state of a +class of objects or of a single object. Templates form part of +instance creation expressions, class definitions, and object +definitions. A template +~\lstinline@$sc$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$}@~ +consists of a constructor invocation $sc$ +which defines the template's {\em superclass}, constructor invocations +~\lstinline@$mc_1 \commadots mc_n$@~ $(n \geq 0)$, which define the +template's {\em mixin classes}, and a statement sequence $\stats$ which +contains additional member definitions for the template. Superclass +and mixin classes together are called the {\em parent classes} of a +template. They must be pairwise different. The superclass of a +template must be a subtype of the superclass of each mixin class. The +{\em least proper supertype} of a template is the class type or +compound type (\sref{sec:compound-types}) consisting of its parent +classes. + +\todo{introduce ScalaObject} + +Member definitions define new members or override members in the +parent classes. If the template forms part of a class definition, +the statement part $\stats$ may also contain declarations of abstract members. +%The type of each non-private definition or declaration of a +%template must be equivalent to a type which does not refer to any +%private members of that template. + +\todo{Make all references to Java generic} + +\paragraph{Inheriting from Java Types} A template may have a Java class as +its superclass and Java interfaces as its mixin classes. On the other +hand, it is not permitted to have a Java class as a mixin class, or a +Java interface as a superclass. 
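As a rough illustration of a template with a Java superclass and Java and Scala mixin classes, here is a sketch in present-day Scala (2.13 or 3), whose syntax and mixin rules differ in detail from the draft described in this document. The names `Greeter` and `Task` are hypothetical.

```scala
// Illustrative sketch in present-day Scala (2.13 or 3); all names are hypothetical.
trait Greeter {
  def name: String                          // abstract member
  def greet: String = "hello, " + name      // concrete member supplied by the mixin
}

// Superclass: the Java class java.lang.Object.
// Mixin classes: the Java interface Runnable and the trait Greeter.
class Task(val name: String) extends Object with Runnable with Greeter {
  var ran = false
  def run(): Unit = { ran = true }          // implements Runnable.run
}
```

An instance created with `new Task("scala")` has the members of all three parent classes, so both `run()` and `greet` can be called on it.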
+ +\subsection{Constructor Invocations} +\label{sec:constr-invoke} +\syntax\begin{lstlisting} + Constr ::= StableId [TypeArgs] [`(' [Exprs] `)'] +\end{lstlisting} + +Constructor invocations define the type, members, and initial state of +objects created by an instance creation expression, or of parts of an +object's definition which are inherited by a class or object +definition. A constructor invocation is a function application +\lstinline@$x$.$c$($\args\,$)@, where $x$ is a stable identifier +(\sref{sec:stable-ids}), $c$ is a type name which either +designates a class or defines an alias type for one, and $\args$ +is an argument list, which matches one of the constructors of that +class. The prefix `\lstinline@$x$.@' can be omitted. +%The class $c$ must conform to \lstinline@scala.AnyRef@, +%i.e.\ it may not be a value type. +The argument list \lstinline@($\args\,$)@ can also be omitted, in which case an +empty argument list \lstinline@()@ is implicitly added. + +\subsection{Base Classes} +\label{sec:base-classes} + +For every template, class type and constructor invocation we define +two sets of class types: the {\em base classes} and {\em mixin base +classes}. Their definitions are as follows. + +The {\em mixin base classes} of a template +~\lstinline@$sc$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$}@~ +are +the reduced union (\sref{sec:base-classes-member-defs}) of the base classes of all +mixins $mc_i$. The mixin base classes of a class type $C$ are the +mixin base classes of the template augmented by $C$ itself. The +mixin base classes of a constructor invocation of type $T$ are the +mixin base classes of class $T$. + +The {\em base classes} of a template are the reduced union of +the base classes of its superclass and the template's mixin base +classes. The base classes of class \lstinline@scala.Any@ consist of +just the class itself. 
The base classes of some other class type $C$ +are the base classes of the template represented by $C$ augmented by +$C$ itself. The base classes of a constructor invocation of type $T$ +are the base classes of $T$. + +The notions of mixin base classes and base classes are extended from +classes to arbitrary types following the definitions of +\sref{sec:base-classes-member-defs}. + +\comment{ +If two types in the base class sequence of a template refer to the +same class definition, then that definition must define a trait +(\sref{sec:traits}), and the type that comes later in the sequence must +conform to the type that comes first. +(\sref{sec:base-classes-member-defs}). +} + +\example +Consider the following class definitions: +\begin{lstlisting} +class A; +class B extends A; +trait C extends A; +trait D extends A; +class E extends B with C with D; +class F extends B with D with E; +\end{lstlisting} +The mixin base classes and base classes of classes \code{A-F} are given in +the following table: +\begin{quote}\begin{tabular}{|l|l|l|} \hline + \ & Mixin base classes & Base classes \\ \hline +A & A & A, ScalaObject, AnyRef, Any \\ +B & B & B, A, ScalaObject, AnyRef, Any \\ +C & C & C, A, ScalaObject, AnyRef, Any \\ +D & D & D, A, ScalaObject, AnyRef, Any \\ +E & C, D, E & E, B, C, D, A, ScalaObject, AnyRef, Any \\ +F & C, D, E, F & F, B, D, E, C, A, ScalaObject, AnyRef, Any \\ \hline +\end{tabular}\end{quote} +Note that \code{D} is inherited twice by \code{F}, once directly, the +other time indirectly through \code{E}. This is permitted, since +\code{D} is a trait. + + +\subsection{Evaluation} + +The evaluation of a template or constructor invocation depends on +whether the template defines an object or is a superclass of a +constructed object, or whether it is used as a mixin for a defined +object. 
In the second case, the evaluation of a template used as a +mixin depends on an {\em actual superclass}, which is known at the +point where the template is used in a definition of an object, but not +at the point where it is defined. The actual superclass is used in the +determination of the meaning of \code{super} (\sref{sec:this-super}). + +We therefore define two notions of template evaluation: (plain) +evaluation (as a defining template or superclass) and mixin evaluation +with a given superclass $sc$. These notions are defined for templates +and constructor invocations as follows. + +A {\em mixin evaluation with superclass $sc$} of a template +~\lstinline@$sc'$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$}@~ consists of mixin +evaluations with superclass $sc$ of the mixin constructor invocations +~\lstinline@$mc_1 \commadots mc_n$@~ in the order they are given, followed by an +evaluation of the statement sequence $\stats$. Within $\stats$ the +actual superclass refers to $sc$. A mixin evaluation with superclass +$sc$ of a class constructor invocation \code{ci} consists of an evaluation +of the constructor function and its arguments in the order they are +given, followed by a mixin evaluation with superclass $sc$ of the +template represented by the constructor invocation. + +An {\em evaluation} of a template +~\lstinline@$sc$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$}@~ consists of an evaluation of +the superclass constructor invocation $sc$, +followed by a mixin evaluation with superclass $sc$ of the template. An +evaluation of a class constructor invocation \code{ci} consists of an +evaluation of the constructor function and its arguments in +the order they are given, followed by an evaluation of the template +represented by the constructor invocation. + +\subsection{Template Members} + +\label{sec:members} + +The object resulting from evaluation of a template has directly bound +members and inherited members. Members can be abstract or concrete. 
+For a template $T$ these categories are defined as follows. +\begin{enumerate} +\item +A {\em directly bound} member of $T$ is an entity introduced by a member +definition or declaration in $T$'s statement sequence. The +member is called {\em abstract} if it is introduced by a declaration, +{\em concrete} otherwise. +\item +A {\em concrete inherited} member of $T$ is a non-private, concrete member of +one of $T$'s parent classes, except if a member with the same name is +already directly bound in $T$ or the member is mixin-overridden in +$T$. A member $m$ of $T$'s superclass is {\em mixin-overridden} in $T$ +if there is a concrete member of a mixin base class of $T$ which +either overrides $m$ itself or overrides a member named $m$ of a base +class of $T$'s superclass. +\item +An {\em abstract inherited} member of $T$ is a non-private, abstract member +of one of $T$'s parent classes $P_i$, except if the template has a +directly bound or concrete inherited member with the same name, or the +template has an abstract member inherited from a parent class $P_j$ where +$j > i$\todo{OK to leave out?: , and which has the same modifiers and type as the member +inherited from $P_j$ would have in $T$}. +\end{enumerate} +It is an error if a template has more than one member with +the same name. + + + +\comment{ +The type of a member $m$ is determined as follows: If $m$ is defined +in $\stats$, then its type is the type as given in the member's +declaration or definition. Otherwise, if $m$ is inherited from the +base class ~\lstinline@$B$[$T_1$, $\ldots$. $T_n$]@, $B$'s class declaration has formal +parameters ~\lstinline@[$a_1 \commadots a_n$]@, and $M$'s type in $B$ is $U$, then +$M$'s type in $C$ is ~\lstinline@$U$[$a_1$ := $T_1 \commadots a_n$ := $T_n$]@. 
+ +\ifqualified{ +Members of templates have internally qualified names $Q\qex x$ where +$x$ is a simple name and $Q$ is either the empty name $\epsilon$, or +is a qualified name referencing the module or class that first +introduces the member. A basic declaration or definition of $x$ in a +module or class $M$ introduces a member with the following qualified +name: +\begin{enumerate} +\item +If the binding is labeled with an ~\lstinline@override $Q$@\notyet{Override + with qualifier} modifier, +where $Q$ is a fully qualified name of a base class of $M$, then the +qualified name is the qualified expansion (\sref{sec:names}) of $x$ in +$Q$. +\item +If the binding is labeled with an \code{override} modifier without a +base class name, then the qualified name is the qualified expansion +of $x$ in $M$'s least proper supertype (\sref{sec:templates}). +\item +An implicit \code{override} modifier is added and case (2) also +applies if $M$'s least proper supertype contains an abstract member +with simple name $x$. +\item +If no \code{override} modifier is given or implied, then if $M$ is +labeled \code{qualified}, the qualified name is $M\qex x$. If $M$ is +not labeled \code{qualified}, the qualified name is $\epsilon\qex x$. +\end{enumerate} +} +} + +\example Consider the class definitions + +\begin{lstlisting} +class A { def f: Int = 1 ; def g: Int = 2 ; def h: Int = 3 } +abstract class B { def f: Int = 4 ; def g: Int } +abstract class C extends A with B { def h: Int } +\end{lstlisting} + +Then class \code{C} has a directly bound abstract member \code{h}. It +inherits member \code{f} from class \code{B} and member \code{g} from +class \code{A}. 
+ +\ifqualified{ +\example\label{ex:compound-b} +Consider the definitions: +\begin{lstlisting} +qualified class Root extends Any { def r1: Root, r2: Int } +qualified class A extends Root { def r1: A, a: String } +qualified class B extends A { def r1: B, b: Double } +\end{lstlisting} +Then ~\lstinline@A with B@~ has members +\lstinline@Root::r1@ of type \code{B}, \lstinline@Root::r2@ of type \code{Int}, +\lstinline@A::a@ of type \code{String}, and \lstinline@B::b@ of type \code{Double}, +in addition to the members inherited from class \code{Any}. +} + +\subsection{Overriding} +\label{sec:overriding} + +A template member $M$ that has the same \ifqualified{qualified} name +as a non-private member $M'$ of a base class (and that belongs to the +same namespace) is said to {\em override} that member. In this case +the binding of the overriding member $M$ must subsume +(\sref{sec:subtyping}) the binding of the overridden member $M'$. +Furthermore, the overridden definition may not be a class definition. +Method definitions may only override other method definitions (or the +methods implicitly defined by a variable definition). They may not +override value definitions. Finally, the following restrictions +on modifiers apply to $M$ and $M'$: +\begin{itemize} +\item +$M'$ must not be labeled \code{final}. +\item +$M$ must not be labeled \code{private}. +\item +If $M$ is labeled \code{protected}, then $M'$ must also be +labeled \code{protected}. +\item +If $M'$ is not an abstract member, then +$M$ must be labeled \code{override}. +\item +If $M'$ is labeled \code{abstract} and \code{override}, and $M'$ is a +member of the static superclass of the class containing the definition +of $M$, then $M$ must also be labeled \code{abstract} and +\code{override}. 
+\end{itemize} + +\example\label{ex:compound-a} +Consider the definitions: +\begin{lstlisting} +trait Root { type T <: Root } +trait A extends Root { type T <: A } +trait B extends Root { type T <: B } +trait C extends A with B; +\end{lstlisting} +Then the trait definition \code{C} is not well-formed because the +binding of \code{T} in \code{C} is +~\lstinline@type T <: B@, +which fails to subsume the binding ~\lstinline@type T <: A@~ of \code{T} +in type \code{A}. The problem can be solved by adding an overriding +definition of type \code{T} in trait \code{C}: +\begin{lstlisting} +trait C extends A with B { type T <: C } +\end{lstlisting} + +\subsection{Modifiers} +\label{sec:modifiers} + +\syntax\begin{lstlisting} + Modifier ::= LocalModifier + | private + | protected + | override + LocalModifier ::= abstract + | final + | sealed +\end{lstlisting} + +Member definitions may be preceded by modifiers which affect the +\ifqualified{qualified names, }accessibility and usage of the +identifiers bound by them. If several modifiers are given, their +order does not matter, but the same modifier may not occur more than once. +Modifiers preceding a repeated definition apply to all constituent +definitions. The rules governing the validity and meaning of a +modifier are as follows. +\begin{itemize} +\item +The \code{private} modifier can be used with any definition in a +template. Private members can be accessed only from within the template +that defines them. +%Furthermore, accesses are not permitted in +%packagings (\sref{sec:topdefs}) other than the one containing the +%definition. +Private members are not inherited by subclasses and they +may not override definitions in parent classes. +\code{private} may not be applied to abstract members, and it +may not be combined in one modifier list with +\code{protected}, \code{final} or \code{override}. +\item +The \code{protected} modifier applies to class member definitions. 
+Protected members can be accessed from within the template of the defining +class as well as in all templates that have the defining class as a base class. +%Furthermore, accesses from the template of the defining class are not +%permitted in packagings other than the one +%containing the definition. +A protected identifier $x$ may be used as +a member name in a selection \lstinline@$r$.$x$@ only if $r$ is one of the reserved +words \code{this} and +\code{super}, or if $r$'s type conforms to a type-instance of the class +which contains the access. +\item +The \code{override} modifier applies to class member definitions. It +is mandatory for member definitions that override some other concrete +member definition in a super- or mixin-class. If an \code{override} +modifier is given, there must be at least one overridden member +definition. + +The \code{override} modifier has an additional significance when +combined with the \code{abstract} modifier. That modifier combination +is only allowed for members of abstract classes. A member +labeled \code{abstract} and \code{override} must override some +member of the superclass of the class containing the definition. + +We call a member of a template {\em incomplete} if it is either +abstract (i.e.\ defined by a declaration), or it is labeled +\code{abstract} and \code{override} and it overrides an incomplete +member of the template's superclass. + +Note that the \code{abstract override} modifier combination does not +affect whether a member is considered concrete or +abstract. A member for which only a declaration is given is abstract, +whereas a member for which a full definition is given is concrete. + +\item +The \code{abstract} modifier is used in class definitions. It is +mandatory if the class has incomplete members. 
Abstract classes +cannot be instantiated (\sref{sec:inst-creation}) with a constructor +invocation unless followed by mixin constructors or statements which +override all incomplete members of the class. + +The \code{abstract} modifier can also be used in conjunction with +\code{override} for class member definitions. In that case the +previous discussion applies. +\item +The \code{final} modifier applies to class member definitions and to +class definitions. A \code{final} class member definition may not be +overridden in subclasses. A \code{final} class may not be inherited by +a template. \code{final} is redundant for object definitions. Members +of final classes or objects are implicitly also final, so the +\code{final} modifier is redundant for them, too. \code{final} may +not be applied to incomplete members, and it may not be combined in one +modifier list with \code{private} or \code{sealed}. +\item +The \code{sealed} modifier applies to class definitions. A +\code{sealed} class may not be inherited, except if either +\begin{itemize} +\item +the inheriting template is nested within the definition of the sealed +class itself, or +\item +the inheriting template belongs to a class or object definition which +forms part of the same statement sequence as the definition of the +sealed class. +\end{itemize} +\end{itemize} + +\example A useful idiom to prevent clients of a class from +constructing new instances of that class is to declare the class +\code{abstract} and \code{sealed}: + +\begin{lstlisting} +object m { + abstract sealed class C (x: Int) { + def nextC = new C(x + 1) {} + } + val empty = new C(0) {} +} +\end{lstlisting} +For instance, in the code above clients can create instances of class +\lstinline@m.C@ only by calling the \code{nextC} method of an existing \lstinline@m.C@ +object; it is not possible for clients to create objects of class +\lstinline@m.C@ directly. 
Indeed, the following two lines are both in error: + +\begin{lstlisting} + new m.C(0) // **** error: C is abstract, so it cannot be instantiated. + new m.C(0) {} // **** error: illegal inheritance from sealed class. +\end{lstlisting} + +\section{Class Definitions} +\label{sec:classes} + +\syntax\begin{lstlisting} + ClsDef ::= class ClassDef {`,' ClassDef} + ClassDef ::= id [TypeParamClause] [ParamClause] + [`:' SimpleType] ClassTemplate + ClassTemplate ::= extends Template + | TemplateBody + | +\end{lstlisting} + +The most general form of class definition is +~\lstinline@class $c$[$\tps\,$]($ps\,$): $s$ extends $t$@. +Here, +\begin{itemize} +\item[] +$c$ is the name of the class to be defined. +\item[] $\tps$ is a non-empty list of type parameters of the class +being defined. The scope of a type parameter is the whole class +definition including the type parameter section itself. It is +illegal to define two type parameters with the same name. The type +parameter section \lstinline@[$\tps\,$]@ may be omitted. A class with a type +parameter section is called {\em polymorphic}, otherwise it is called +{\em monomorphic}. +\item[] +$ps$ is a formal value parameter clause for the {\em primary +constructor} of the class. The scope of a formal value parameter includes +the template $t$. However, a formal value parameter may not form +part of the types of any of the parent classes or members of $t$. +It is illegal to define two formal value parameters with the same name. +The formal parameter section \lstinline@($ps\,$)@ may be omitted, in which case +an empty parameter section \lstinline@()@ is assumed. +\item[] +$s$ is the {\em self type} of the class. Inside the +class, the type of \code{this} is assumed to be $s$. The self +type must conform to the self types of all classes which are inherited +by the template $t$. The self type declaration `\lstinline@:$s$@' may be +omitted, in which case the self type of the class is assumed to be +equal to \lstinline@$c$[$\tps\,$]@. 
+\item[] +$t$ is a +template (\sref{sec:templates}) of the form +\begin{lstlisting} +$sc$ with $mc_1$ with $\ldots$ with $mc_n$ { $\stats$ } $\gap(n \geq 0)$ +\end{lstlisting} +which defines the base classes, behavior and initial state of objects of +the class. The extends clause ~\lstinline@extends $sc$@~ +can be omitted, in which case +~\lstinline@extends scala.AnyRef@~ is assumed. The class body +~\lstinline@{$\stats\,$}@~ may also be omitted, in which case the empty body +\lstinline@{}@ is assumed. +\end{itemize} +This class definition defines a type \lstinline@$c$[$\tps\,$]@ and a constructor +which when applied to parameters conforming to types $ps$ +initializes instances of type \lstinline@$c$[$\tps\,$]@ by evaluating the template +$t$. + +\subsection{Constructor Definitions}\label{sec:constr-defs} + +\syntax\begin{lstlisting} + FunDef ::= this ParamClause`=' ConstrExpr + ConstrExpr ::= this ArgumentExprs + | `{' this ArgumentExprs {`;' BlockStat} `}' +\end{lstlisting} + +A class may have additional constructors besides the primary +constructor. These are defined by constructor definitions of the form +~\lstinline@def this($ps\,$) = $e$@. Such a definition introduces an +additional constructor for the enclosing class, with parameters as +given in the formal parameter list $ps$, and whose evaluation is +defined by the constructor expression $e$. The scope of each formal +parameter is the constructor expression $e$. A constructor expression +is either a self constructor invocation \lstinline@this($\args\,$)@ or +a block which begins with a self constructor invocation. Neither the +signature, nor the self constructor invocation of a constructor +definition may refer to \verb@this@, or refer to value parameters or +members of the enclosing class by simple name. + +If there are auxiliary constructors of a class $C$, they define +together with $C$'s primary constructor an overloaded constructor +value. 
The usual rules for overloading resolution +(\sref{sec:overloaded-defs}) apply for constructor invocations of $C$, +including the self constructor invocations in the constructor +expressions themselves. To prevent infinite cycles of constructor +invocations, there is the restriction that every self constructor +invocation must refer to a constructor definition which precedes it +(i.e. it must refer to either a preceding auxiliary constructor or the +primary constructor of the class). The type of a constructor +expression must always be such that a generic instance of the class is +constructed. That is, if the class in question has name $C$ and type +parameters \lstinline@[$\tps\,$]@, then each constructor must construct an +instance of \lstinline@$C$[$\tps\,$]@; it is not permitted to instantiate formal +type parameters. + +\example Consider the class definition: + +\begin{lstlisting} +class LinkedList[a]() { + var head = _; + var tail = null; + def isEmpty = tail != null; + def this(head: a) = { this(); this.head = head; } + def this(head: a, tail: LinkedList[a]) = { this(head); this.tail = tail } +} +\end{lstlisting} +This defines a class \code{LinkedList} with an overloaded constructor of type +\begin{lstlisting} +[a](): LinkedList[a] $\overload$ +[a](x: a): LinkedList[a] $\overload$ +[a](x: a, xs: LinkedList[a]): LinkedList[a] . +\end{lstlisting} +The second constructor alternative constructs a singleton list, while the +third one constructs a list with a given head and tail. + +\subsection{Case Classes} +\label{sec:case-classes} + +\syntax\begin{lstlisting} + ClsDef ::= case class ClassDef {`,' ClassDef} +\end{lstlisting} + +If a class definition is prefixed with \code{case}, the class is said +to be a {\em case class}. The primary constructor of a case class may +be used in a constructor pattern (\sref{sec:patterns}). +The following four restrictions ensure efficient pattern matching for +case classes. 
+\begin{enumerate} +\item None of the base classes of a case class may be a case +class. +\item No type may have two different case classes among its base types. +\item A case class may not inherit indirectly from a +\lstinline@sealed@ class. That is, if a base class $b$ of a case class $c$ +is marked \lstinline@sealed@, then $b$ must be a parent class of $c$. +\item +The primary constructor of a case class may not have any call-by-name +parameters (\sref{sec:parameters}). +\end{enumerate} + +A case class definition of ~\lstinline@$c$[$\tps\,$]($ps\,$)@~ with type +parameters $\tps$ and value parameters $ps$ implicitly +generates a function definition for a {\em case class factory} +together with the class definition itself: +\begin{lstlisting} +def c[$\tps\,$]($ps\,$): $s$ = new $c$[$\tps\,$]($ps\,$) +\end{lstlisting} +(Here, $s$ is the self type of class $c$. +If a type parameter section +is missing in the class, it is also missing in the factory +definition). + +Also implicitly defined are accessor member definitions +in the class that return its value parameters. Every binding +$x: T$ in the parameter section leads to a value definition of +$x$ that defines $x$ to be an alias of the parameter. +%Every +%parameterless function binding \lstinline@def x: T@ leads to a +%parameterless function definition of $x$ which returns the result +%of invoking the parameter function. +%The case class may not contain a +%directly bound member with the same simple name as one of its value +%parameters. + +Every case class implicitly overrides some method definitions of class +\lstinline@scala.AnyRef@ (\sref{sec:cls-object}) unless a definition of the same +method is already given in the case class itself or a concrete +definition of the same method is given in some base class of the case +class different from \code{AnyRef}. 
In particular: +\begin{itemize} +\item[] Method ~\lstinline@equals: (Any)boolean@~ is structural equality, where two +instances are equal if they belong to the same class and +have equal (with respect to \code{equals}) primary constructor arguments. +\item[] Method ~\lstinline@hashCode: ()int@~ computes a hash-code +depending on the data structure in a way which maps equal (with respect to +\code{equals}) values to equal hash-codes. +\item[] Method ~\lstinline@toString: ()String@~ returns a string representation which +contains the name of the class and its primary constructor arguments. +\end{itemize} + +\example Here is the definition of abstract syntax for lambda +calculus: + +\begin{lstlisting} +class Expr; +case class + Var (x: String) extends Expr, + Apply (f: Expr, e: Expr) extends Expr, + Lambda (x: String, e: Expr) extends Expr; +\end{lstlisting} +This defines a class \code{Expr} with case classes +\code{Var}, \code{Apply} and \code{Lambda}. A call-by-value evaluator for lambda +expressions could then be written as follows. + +\begin{lstlisting} +type Env = String => Value; +case class Value(e: Expr, env: Env); + +def eval(e: Expr, env: Env): Value = e match { + case Var (x) => + env(x) + case Apply(f, g) => + val Value(Lambda (x, e1), env1) = eval(f, env); + val v = eval(g, env); + eval (e1, (y => if (y == x) v else env1(y))) + case Lambda(_, _) => + Value(e, env) +} +\end{lstlisting} + +It is possible to define further case classes that extend type +\code{Expr} in other parts of the program, for instance +\begin{lstlisting} +case class Number(x: Int) extends Expr; +\end{lstlisting} + +This form of extensibility can be excluded by declaring the base class +\code{Expr} \code{sealed}; in this case, the only classes permitted to +extend \code{Expr} are those which are nested inside \code{Expr}, or +which appear in the same statement sequence as the definition of +\code{Expr}. 
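+
+The implicitly defined members of a case class can be observed with a
+small sketch. It restates the \code{Expr} hierarchy with one definition
+per case class and a \code{sealed abstract} base class, written in the
+syntax of later Scala versions so that it can be run with a current
+compiler (an assumption; the grouped \code{case class} form used above
+is not employed). The name \code{CaseClassDemo} is hypothetical.
+
+\begin{lstlisting}
+sealed abstract class Expr
+case class Var(x: String) extends Expr
+case class Apply(f: Expr, e: Expr) extends Expr
+case class Lambda(x: String, e: Expr) extends Expr
+
+object CaseClassDemo {
+  def main(args: Array[String]): Unit = {
+    // Instances are created through the case class factories.
+    val a = Apply(Var("f"), Var("x"))
+    val b = Apply(Var("f"), Var("x"))
+    println(a == b)                    // structural equality: true
+    println(a.hashCode == b.hashCode)  // equal values, equal hash codes: true
+    println(a)                         // Apply(Var(f),Var(x))
+    println(a.f)                       // accessor for parameter f: Var(f)
+  }
+}
+\end{lstlisting}
+
+Since \code{Expr} is sealed in this sketch, further case classes such as
+\code{Number} would have to appear in the same statement sequence as
+\code{Expr}.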
+ +\section{Traits} + +\label{sec:traits} + +\syntax\begin{lstlisting} + ClsDef ::= trait ClassDef {`,' ClassDef} +\end{lstlisting} + +A class definition which starts with the reserved word \code{trait} +instead of \code{class} defines a trait. A trait is a specific +instance of an abstract class, so the \code{abstract} modifier is +redundant for it. The template of a trait must satisfy the following +three restrictions. +\begin{enumerate} +\item All base classes of the trait are traits. +\item All parent class constructors of a template + must be primary constructors with empty value + parameter lists. +\item All non-empty statements in the template are either imports or pure definitions. +\end{enumerate} +A {\em pure} definition can be evaluated without any side effect. +Function, type, class, or object definitions are always pure. A value +definition is pure if its right-hand side expression is pure. Pure +expressions are paths, literals, and typed expressions +$e: T$ where $e$ is pure. + +These restrictions ensure that the evaluation of the mixin constructor +of a trait has no effect. Therefore, traits may appear several times +in the base classes of a template, whereas other classes cannot. +%\item Packagings may add interface classes as new base classes to an +%existing class or module. + +\example\label{ex:comparable} +The following trait class defines the property of being +ordered, i.e. comparable to objects of some type. It contains an abstract method +\lstinline@<@ and default implementations of the other comparison operators +\lstinline@<=@, \lstinline@>@, and \lstinline@>=@. 
+ +\begin{lstlisting} +trait Ord[t <: Ord[t]]: t { + def < (that: t): Boolean; + def <=(that: t): Boolean = this < that || this == that; + def > (that: t): Boolean = that < this; + def >=(that: t): Boolean = that <= this; +} +\end{lstlisting} + +\section{Object Definitions} +\label{sec:modules} +\label{sec:object-defs} + +\syntax\begin{lstlisting} + ObjectDef ::= id [`:' SimpleType] ClassTemplate +\end{lstlisting} + +An object definition defines a single object of a new class. Its +most general form is +~\lstinline@object $m$: $s$ extends $t$@. Here, +\begin{itemize} +\item[] +$m$ is the name of the object to be defined. +\item[] $s$ is the {\em self type} of the object. References to $m$ +are assumed to have type $s$. Furthermore, inside the template $t$, +the type of \code{this} is also assumed to be $s$. The type of the +anonymous class defined by $t$ must conform to $s$ and $s$ must +conform to the self types of all classes which are inherited by +$t$. The self type declaration `$:s$' may be omitted, in which case +the self type is assumed to be equal to the anonymous class defined by +$t$. +\item[] +$t$ is a +template (\sref{sec:templates}) of the form +\begin{lstlisting} +$sc$ with $mc_1$ with $\ldots$ with $mc_n$ { $\stats$ } +\end{lstlisting} +which defines the base classes, behavior and initial state of $m$. +The extends clause ~\lstinline@extends $sc$@~ +can be omitted, in which case +~\lstinline@extends scala.AnyRef@~ is assumed. The class body +~\lstinline@{$\stats\,$}@~ may also be omitted, in which case the empty body +\lstinline@{}@ is assumed. +\end{itemize} +The object definition defines a single object (or: {\em module}) +conforming to the template $t$. 
It is roughly equivalent to a class +definition and a value definition that creates an object of the class: +\begin{lstlisting} +final class $m\Dollar$cls: $s$ extends $t$; +final val $m$: $s$ = new $m\Dollar$cls; +\end{lstlisting} +(The \code{final} modifiers are omitted if the definition occurs as +part of a block. The class name \lstinline@$m\Dollar$cls@ is not +accessible to user programs.) + +There are, however, two differences between an object definition and a +pair of class and value definitions such as the one given above. First, +object definitions may appear as top-level definitions in a +compilation unit, whereas value definitions may not. Second, the +module defined by an object definition is instantiated lazily. The +~\lstinline@new $m\Dollar$cls@~ constructor is evaluated not at the point +of the object definition, but is instead evaluated the first time $m$ +is dereferenced during execution of the program (which might be +never). An attempt to dereference $m$ again in the course of +evaluation of the constructor leads to an infinite loop or run-time +error. Other threads trying to dereference $m$ while the constructor +is being evaluated block until evaluation is complete. + +\example +Classes in Scala do not have static members; however, an equivalent +effect can be achieved by an accompanying object definition. +For example: +\begin{lstlisting} +abstract class Point { + val x: Double; + val y: Double; + def isOrigin = (x == 0.0 && y == 0.0); +} +object Point { + val origin = new Point() { val x = 0.0, y = 0.0 } +} +\end{lstlisting} +This defines a class \code{Point} and an object \code{Point} which +contains \code{origin} as a member. Note that the double use of the +name \code{Point} is legal, since the class definition defines the name +\code{Point} in the type namespace, whereas the object definition +defines a name in the term namespace. + +This technique is applied by the Scala compiler when interpreting a +Java class with static members. 
Such a class $C$ is conceptually seen +as a pair of a Scala class that contains all instance members of $C$ +and a Scala object that contains all static members of $C$. + +\comment{ +\example Here's an outline of a module definition for a file system. + +\begin{lstlisting} +module FileSystem { + private type FileDirectory; + private val dir: FileDirectory + + interface File { + def read(xs: Array[Byte]) + def close: Unit + } + + private class FileHandle extends File { $\ldots$ } + + def open(name: String): File = $\ldots$ +} +\end{lstlisting} +} + +\chapter{Expressions} +\label{sec:exprs} + +\syntax\begin{lstlisting} + Expr ::= [Bindings `=>'] Expr + | Expr1 + Expr1 ::= if `(' Expr `)' Expr [[`;'] else Expr] + | try `{' Block `}' [catch Expr] [finally Expr] + | while `(' Expr `)' Expr + | do Expr [`;'] while `(' Expr `)' + | for `(' Enumerators `)' (do | yield) Expr + | return [Expr] + | throw Expr + | [SimpleExpr `.'] id `=' Expr + | SimpleExpr ArgumentExprs `=' Expr + | PostfixExpr [`:' Type1] + PostfixExpr ::= InfixExpr [id] + InfixExpr ::= PrefixExpr + | InfixExpr id PrefixExpr + PrefixExpr ::= [`-' | `+' | `~' | `!'] SimpleExpr + SimpleExpr ::= Literal + | Path + | `(' [Expr] `)' + | BlockExpr + | new Template + | SimpleExpr `.' id + | SimpleExpr TypeArgs + | SimpleExpr ArgumentExprs + ArgumentExprs ::= `(' [Exprs] `)' + | BlockExpr + BlockExpr ::= `{' CaseClause {CaseClause} `}' + | `{' Block `}' + Block ::= {BlockStat `;'} [ResultExpr] + ResultExpr ::= Expr1 + | Bindings `=>' Block + Exprs ::= Expr {`,' Expr} +\end{lstlisting} + +Expressions are composed of operators and operands. Expression forms are +discussed subsequently in decreasing order of precedence. + +The typing of expressions is often relative to some {\em expected +type}. When we write ``expression $e$ is expected to conform to +type $T$'', we mean: (1) the expected type of $e$ is +$T$, and (2) the type of expression $e$ must conform to +$T$. 
+ +\section{Literals} + +\syntax\begin{lstlisting} + SimpleExpr ::= Literal + Literal ::= intLit + | floatLit + | charLit + | stringLit + | symbolLit + | true + | false + | null +\end{lstlisting} + +Typing and evaluation of numeric, character, and string literals are +generally as in Java. An integer literal denotes an integer +number. Its type is normally \code{int}. However, if the expected type +$\proto$ of the expression is one of \code{byte}, \code{short}, or +\code{char} and the integer number fits in the numeric range defined +by the type, then the number is converted to type $\proto$ and the +expression's type is $\proto$. A floating point literal denotes a +single-precision or double-precision IEEE floating point number. A +character literal denotes a Unicode character. A string literal +denotes a member of \lstinline@String@. + +A symbol literal ~\lstinline@'$x$@~ is a shorthand for the expression +~\lstinline@scala.Symbol("$x$")@. If the symbol literal is followed by +actual parameters, as in ~\lstinline@'$x$($\args\,$)@, then the whole +expression is taken to be a shorthand for +~\lstinline@scala.Symbol("$x$", $\args\,$)@. + +The boolean truth values are denoted by the reserved words \code{true} +and \code{false}. The type of these expressions is \code{boolean}, and +their evaluation is immediate. + +The \code{null} literal is of type \lstinline@scala.AllRef@. It +denotes a reference value which refers to a special ``null'' object, +which implements methods in class \lstinline@scala.AnyRef@ as follows: +\begin{itemize} +\item +\lstinline@eq($x\,$)@, \lstinline@==($x\,$)@, \lstinline@equals($x\,$)@ return \code{true} iff their +argument $x$ is also the ``null'' object. +\item +\lstinline@isInstanceOf[$T\,$]@ always returns \code{false}. +\item +\lstinline@asInstanceOf[$T\,$]@ returns the ``null'' object itself if +$T$ conforms to \lstinline@scala.AnyRef@, and throws a +\lstinline@NullPointerException@ otherwise. 
+\item +\code{toString()} returns the string ``null''. +\end{itemize} +A reference to any other member of the ``null'' object causes a +\code{NullPointerException} to be thrown. + +\section{Designators} +\label{sec:designators} + +\syntax\begin{lstlisting} + Designator ::= Path + | SimpleExpr `.' id +\end{lstlisting} + +A designator refers to a named term. It can be a {\em simple name} or +a {\em selection}. If $r$ is a stable identifier of type $T$, the +selection $r.x$ refers to the term member of $r$ that is identified in +$T$ by the name $x$. For other expressions $e$, $e.x$ is typed as if +it were $(\VAL;y=e\semi y.x)$ for some fresh name $y$. The typing rules +for blocks imply that in that case $x$'s type may not refer to any +abstract type member of $e$. + +The expected type of a designator's prefix is always missing. +The +type of a designator is normally the type of the entity it refers +to. However, if the designator is a path (\sref{sec:paths}) $p$, +its type is \lstinline@$p$.type@, provided the expression's expected type is +a singleton type, or $p$ occurs as the prefix of a selection +or type selection. + +The selection $e.x$ is evaluated by first evaluating the qualifier +expression $e$. The selection's result is then the value to which the +selector identifier is bound in the object resulting from evaluation of $e$. + +\section{This and Super} +\label{sec:this-super} + +\syntax\begin{lstlisting} + SimpleExpr ::= [id `.'] this + | [id `.'] super [`[' id `]'] `.' id +\end{lstlisting} + +The expression \code{this} can appear in the statement part of a +template or compound type. It stands for the object being defined by +the innermost template or compound type enclosing the reference. If +this is a compound type, the type of \code{this} is that compound type. +If it is a template of an instance creation expression, the type of +\code{this} is the type of that template. 
If it is a template of a +class or object definition with simple name $C$, the type of \code{this} +is the same as the type of \lstinline@$C$.this@. + +The expression \lstinline@$C$.this@ is legal in the statement part of an +enclosing class or object definition with simple name $C$. It +stands for the object being defined by the innermost such definition. +If the expression's expected type is a singleton type, or +\lstinline@$C$.this@ occurs as the prefix of a selection, its type is +\lstinline@$C$.this.type@, otherwise it is the self type of class $C$. + +A reference \lstinline@super.$m$@ in a template refers to the +definition of $m$ in the actual superclass (\sref{sec:base-classes}) +of the template. A reference \lstinline@$C$.super.$m$@ refers to the +definition of $m$ in the actual superclass of the innermost +class or object definition named $C$ which encloses the reference. The +definition $m$ referred to via \code{super} or \lstinline@$C$.super@ +must be concrete, or the template containing the reference must have an +incomplete (\sref{sec:modifiers}) member $m'$ which overrides $m$. + +The \code{super} prefix may be followed by a mixin qualifier +\lstinline@[$M\,$]@, as in \lstinline@$C$.super[$M\,$].$x$@. This is called a {\em mixin +super reference}. In this case, the reference is to the member +$x$ of the (first) mixin class of $C$ whose simple name +is $M$. That member may not be abstract. 
+ +\example\label{ex:super} +Consider the following class definitions + +\begin{lstlisting} +class Root { val x = "Root" } +class A extends Root { override val x = "A" ; val superA = super.x } +class B extends Root { override val x = "B" ; val superB = super.x } +class C extends A with B { + override val x = "C" ; val superC = super.x +} +class D extends A { val superD = super.x } +class E extends C with D { val superE = super.x } +\end{lstlisting} +Then we have: +\begin{lstlisting} +(new A).superA == "Root", (new B).superB == "Root" +(new C).superA == "Root", (new C).superB == "A", (new C).superC == "A" +(new D).superA == "Root", (new D).superD == "A" +(new E).superA == "Root", (new E).superB == "A", (new E).superC == "A", + (new E).superD == "C", (new E).superE == "C" +\end{lstlisting} +Note that the \code{superB} function returns different results +depending on whether \code{B} is used as defining class or as a mixin class. + +\example Consider the following class definitions: +\begin{lstlisting} +class Shape { + override def equals(other: Any) = $\ldots$; + $\ldots$ +} +trait Bordered extends Shape { + val thickness: int; + override def equals(other: Any) = other match { + case that: Bordered => + super equals other && this.thickness == that.thickness + case _ => false + } + $\ldots$ +} +trait Colored extends Shape { + val color: Color; + override def equals(other: Any) = other match { + case that: Colored => + super equals other && this.color == that.color + case _ => false + } + $\ldots$ +} +\end{lstlisting} + +Both definitions of \code{equals} are combined in the class +below. 
+\begin{lstlisting} +trait BorderedColoredShape extends Shape with Bordered with Colored { + override def equals(other: Any) = + super[Bordered].equals(other) && super[Colored].equals(other) +} +\end{lstlisting} + +\section{Function Applications} +\label{sec:apply} + +\syntax\begin{lstlisting} + SimpleExpr ::= SimpleExpr ArgumentExprs +\end{lstlisting} + +An application \lstinline@$f$($e_1 \commadots e_n$)@ applies the function $f$ to the +argument expressions $e_1 \commadots e_n$. If $f$ has a method type +\lstinline@($T_1 \commadots T_n$)U@, the type of each argument +expression $e_i$ must conform to the corresponding parameter type +$T_i$. If $f$ has some value type, the application is taken to be +equivalent to \lstinline@$f$.apply($e_1 \commadots e_n$)@, i.e.\ the +application of an \code{apply} method defined by $f$. + +%Class constructor functions +%(\sref{sec:classes}) can only be applied in constructor invocations +%(\sref{sec:constr-invoke}), never in expressions. + +Evaluation of \lstinline@$f$($e_1 \commadots e_n$)@ usually entails evaluation of +$f$ and $e_1 \commadots e_n$ in that order. Each argument expression +is converted to the type of its corresponding formal parameter. After +that, the application is rewritten to the function's right hand side, +with actual arguments substituted for formal parameters. The result +of evaluating the rewritten right-hand side is finally converted to +the function's declared result type, if one is given. + +The case of a formal \code{def}-parameter with a parameterless +method type \lstinline@[]$T$@ is treated specially. In this case, the +corresponding actual argument expression is not evaluated before the +application. Instead, every use of the formal parameter on the +right-hand side of the rewrite rule entails a re-evaluation of the +actual argument expression. 
In other words, the evaluation order for +\code{def}-parameters is {\em call-by-name} whereas the evaluation +order for normal parameters is {\em call-by-value}. + +\section{Type Applications} +\label{sec:type-app} +\syntax\begin{lstlisting} + SimpleExpr ::= SimpleExpr `[' Types `]' +\end{lstlisting} + +A type application \lstinline@$e$[$T_1 \commadots T_n$]@ instantiates a +polymorphic value $e$ of type +~\lstinline@[$a_1$ >: $L_1$ <: $U_1 \commadots a_n$ >: $L_n$ <: $U_n$]S@~ with +argument types \lstinline@$T_1 \commadots T_n$@. Every argument type +$T_i$ must obey corresponding bounds $L_i$ and +$U_i$. That is, for each $i = 1 \commadots n$, we must +have $L_i \sigma \conforms T_i \conforms U_i \sigma$, where $\sigma$ is the +substitution $[a_1 := T_1 \commadots a_n := T_n]$. The type +of the application is \lstinline@S$\sigma$@. + +The function part $e$ may also have some value type. In this case +the type application is taken to be equivalent to +~\lstinline@$e$.apply[$T_1 \commadots T_n$]@, i.e.\ the +application of an \code{apply} method defined by $e$. + +Type applications can be omitted if local type inference +(\sref{sec:local-type-inf}) can infer best type parameters for a +polymorphic function from the types of the actual function arguments +and the expected result type. + +\section{References to Overloaded Bindings} +\label{sec:overloaded-refs} + +If a name $f$ referenced in an identifier or selection is +overloaded (\sref{sec:overloaded-defs}), the context of the reference +has to identify a unique alternative of the overloaded binding. The +way this is done depends on whether or not $f$ is used as a +function. Let $\AA$ be the set of all type alternatives of +$f$. + +Assume first that $f$ appears as a function in an application, as +in \lstinline@$f$($\args\,$)@. If there is precisely one alternative in +$\AA$ which is a (possibly polymorphic) method type whose arity +matches the number of arguments given, that alternative is chosen. 
+ +Otherwise, let $\argtypes$ be the vector of types obtained by +typing each argument with a missing expected type. One determines +first the set of applicable alternatives. A method type alternative is +{\em applicable} if each type in $\argtypes$ is compatible with +the corresponding formal parameter type in the alternative, and, if +the expected type is defined, the method's result type is compatible to +it. A polymorphic method type is applicable if local type inference +can determine type arguments so that the instantiated method type is +applicable. + +Here, a type $T$ is {\em compatible} to a type $U$ if $T$ +conforms to $U$ after applying implicit conversions +(\sref{sec:impl-conv}). + +Let $\BB$ be the set of applicable alternatives. It is an error if +$\BB$ is empty. Otherwise, one chooses the {\em most specific} +alternative among the alternatives in $\BB$, according to the +following definition of being ``more specific''. +\begin{itemize} +\item +A method type \lstinline@($\Ts\,$)$U$@ is more specific than some other +type $S$ if $S$ is applicable to arguments \lstinline@($ps\,$)@ of +types $\Ts$. +\item +A polymorphic method type +~\lstinline@[$a_1$ >: $L_1$ <: $U_1 \commadots a_n$ >: $L_n$ <: $U_n$]T@~ is +more specific than some other type $S$ if $T$ is more +specific than $S$ under the assumption that for +$i = 1 \commadots n$ each $a_i$ is an abstract type name +bounded from below by $L_i$ and from above by $U_i$. +\item +Any other type is always more specific than a parameterized method +type or a polymorphic type. +\end{itemize} +It is an error if there is no unique alternative in $\BB$ which is +more specific than all other alternatives in $\BB$. + +Assume next that $f$ appears as a function in a type +application, as in \lstinline@$f$[$\targs\,$]@. Then we choose an alternative in +$\AA$ which takes the same number of type parameters as there are +type arguments in $\targs$. 
It is an error if no such alternative +exists, or if it is not unique. + +Assume finally that $f$ does not appear as a function in either +an application or a type application. If an expected type is given, +let $\BB$ be the set of those alternatives in $\AA$ which are +compatible to it. Otherwise, let $\BB$ be the same as $\AA$. +We choose in this case the most specific alternative among all +alternatives in $\BB$. It is an error if there is no unique +alternative in $\BB$ which is more specific than all other +alternatives in $\BB$. + +\example Consider the following definitions: + +\begin{lstlisting} + class A extends B {} + def f(x: B, y: B) = $\ldots$ + def f(x: A, y: B) = $\ldots$ + val a: A, b: B +\end{lstlisting} +Then the application \lstinline@f(b, b)@ refers to the first +definition of $f$ whereas the application \lstinline@f(a, a)@ +refers to the second. Assume now we add a third overloaded definition +\begin{lstlisting} + def f(x: B, y: A) = $\ldots$ +\end{lstlisting} +Then the application \lstinline@f(a, a)@ is rejected for being ambiguous, since +no most specific applicable signature exists. + +\section{Instance Creation Expressions} +\label{sec:inst-creation} + +\syntax\begin{lstlisting} + SimpleExpr ::= new Template +\end{lstlisting} + +A simple instance creation expression is of the form ~\lstinline@new $c$@~ +where $c$ is a constructor invocation +(\sref{sec:constr-invoke}). Let $T$ be the type of $c$. Then $T$ must +denote (a type instance of) a non-abstract subclass of +\lstinline@scala.AnyRef@ which conforms to its self type +(\sref{sec:classes}). The expression is evaluated by creating a fresh +object of type $T$ which is initialized by evaluating $c$. The +type of the expression is $T$'s self type (which might be less +specific than $T\,$). 
+ +A general instance creation expression is of the form +\begin{lstlisting} +new $sc$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$} +\end{lstlisting} +where $n \geq 0$, $sc$ as well as $mc_1 \commadots mc_n$ are +constructor invocations (of types $S, T_1 \commadots T_n$, say) and +$\stats$ is a statement sequence containing initializer statements and +member definitions (\sref{sec:members}). The type of such an instance +creation expression is then the compound type +\lstinline@$S$ with $T_1$ with $\ldots$ with $T_n$ {$R\,$}@, +where \lstinline@{$R\,$}@ is +a refinement (\sref{sec:compound-types}) which declares exactly those +members of $\stats$ that override a member of $S$ or $T_1 \commadots +T_n$. \todo{what about methods and overloaded defs?} For this type to +be well-formed, $R$ may not reference types defined in $\stats$ which +do not themselves form part of $R$. + +The instance creation expression is evaluated by creating a fresh +object, which is initialized by evaluating the expression template. + +\example Consider the class +\begin{lstlisting} +abstract class C { + type T; val x: T; def f(x: T): AnyRef +} +\end{lstlisting} +and the instance creation expression +\begin{lstlisting} +C { type T = Int; val x: T = 1; def f(x: T): T = y; val y: T = 2 } +\end{lstlisting} +Then the created object's type is: +\begin{lstlisting} +C { type T = Int; val x: T; def f(x: T): T } +\end{lstlisting} +The value $y$ is missing from the type, since $y$ does not +override a member of $C$. + +\section{Blocks} +\label{sec:blocks} + +\syntax\begin{lstlisting} + BlockExpr ::= `{' Block `}' + Block ::= [{BlockStat `;'} ResultExpr] +\end{lstlisting} + +A block expression ~\lstinline@{$s_1$; $\ldots$; $s_n$; $e\,$}@~ is constructed from a +sequence of block statements $s_1 \commadots s_n$ and a final +expression $e$. The final expression can be omitted, in which +case the unit value \lstinline@()@ is assumed. 
+ +%Whether or not the scope includes the statement itself +%depends on the kind of definition. + +The expected type of the final expression $e$ is the expected +type of the block. The expected type of all preceding statements is +missing. + +The type of a block ~\lstinline@$s_1$; $\ldots$; $s_n$; $e$@~ is usually the type of +$e$. That type must be equivalent to a type which does not refer +to an entity defined locally in the block. If this condition is +violated, but a fully defined expected type is given, the type of the +block is instead assumed to be the expected type. + +Evaluation of the block entails evaluation of its statement sequence, +followed by an evaluation of the final expression $e$, which +defines the result of the block. + +\example +Written in isolation, +the block +\begin{lstlisting} +{ class C extends B {$\ldots$} ; new C } +\end{lstlisting} +is illegal, since its type +refers to class $C$, which is defined locally in the block. + +However, when used in a definition such as +\begin{lstlisting} +val x: B = { class C extends B {$\ldots$} ; new C } +\end{lstlisting} +the block is well-formed, since the problematic type $C$ can be +replaced by the expected type $B$. + +\section{Prefix, Infix, and Postfix Operations} +\label{sec:infix-operations} + +\syntax\begin{lstlisting} + PostfixExpr ::= InfixExpr [id] + InfixExpr ::= PrefixExpr + | InfixExpr id PrefixExpr + PrefixExpr ::= [`-' | `+' | `!' | `~'] SimpleExpr +\end{lstlisting} + +Expressions can be constructed from operands and operators. A prefix +operation $op;e$ consists of a prefix operator $op$, which +must be one of the identifiers `\lstinline@+@', `\lstinline@-@', `\lstinline@!@', or +`\lstinline@~@', and a simple expression $e$. The expression is +equivalent to the postfix method application $e.op$. + +Prefix operators are different from normal function applications in +that their operand expression need not be atomic. 
For instance, the +input sequence \lstinline@-sin(x)@ is read as \lstinline@-(sin(x))@, whereas the +function application \lstinline@negate sin(x)@ would be parsed as the +application of the infix operator \code{sin} to the operands +\code{negate} and \lstinline@(x)@. + +An infix or postfix operator can be an arbitrary identifier. Infix +operators have precedence and associativity defined as follows: + +The {\em precedence} of an infix operator is determined by the operator's first +character. Characters are listed below in increasing order of +precedence, with characters on the same line having the same precedence. +\begin{lstlisting} + $\mbox{\rm\sl(all letters)}$ + | + ^ + & + < > + = ! + : + + - + * / % + $\mbox{\rm\sl(all other special characters)}$ +\end{lstlisting} +That is, operators starting with a letter have lowest precedence, +followed by operators starting with `\lstinline@|@', etc. + +The {\em associativity} of an operator is determined by the operator's +last character. Operators ending with a colon `\lstinline@:@' are +right-associative. All other operators are left-associative. + +Precedence and associativity of operators determine the grouping of +parts of an expression as follows. +\begin{itemize} +\item If there are several infix operations in an +expression, then operators with higher precedence bind more closely +than operators with lower precedence. +\item If there are consecutive infix +operations $e_0; \op_1; e_1; \op_2 \ldots \op_n; e_n$ +with operators $\op_1 \commadots \op_n$ of the same precedence, +then all these operators must +have the same associativity. If all operators are left-associative, +the sequence is interpreted as +$(\ldots(e_0;\op_1;e_1);\op_2\ldots);\op_n;e_n$. +Otherwise, if all operators are right-associative, the +sequence is interpreted as +$e_0;\op_1;(e_1;\op_2;(\ldots \op_n;e_n)\ldots)$. +\item +Postfix operators always have lower precedence than infix +operators. 
E.g.\ $e_1;\op_1;e_2;\op_2$ is always equivalent to +$(e_1;\op_1;e_2);\op_2$. +\end{itemize} +A postfix operation $e;\op$ is interpreted as $e.\op$. A +left-associative binary operation $e_1;\op;e_2$ is interpreted as +$e_1.\op(e_2)$. If $\op$ is right-associative, the same operation is +interpreted as ~\lstinline@(val $x$=$e_1$; $e_2$.$\op$($x\,$))@, +where $x$ is a fresh name. + +\section{Typed Expressions} + +\syntax\begin{lstlisting} + Expr1 ::= PostfixExpr [`:' Type1] +\end{lstlisting} + +The typed expression $e: T$ has type $T$. The type of +expression $e$ is expected to conform to $T$. The result of +the expression is the value of $e$ converted to type $T$. + +\example Here are examples of well-typed and illegally typed expressions. + +\begin{lstlisting} + 1: int // legal, of type int + 1: long // legal, of type long + // 1: string // illegal +\end{lstlisting} + +\section{Assignments} + +\syntax\begin{lstlisting} + Expr1 ::= Designator `=' Expr + | SimpleExpr ArgumentExprs `=' Expr +\end{lstlisting} + +The interpretation of an assignment to a simple variable ~\lstinline@$x$ = $e$@~ +depends on the definition of $x$. If $x$ denotes a mutable +variable, then the assignment changes the current value of $x$ to be +the result of evaluating the expression $e$. The type of $e$ is +expected to conform to the type of $x$. If $x$ is a parameterless +function defined in some template, and the same template contains a +setter function \lstinline@$x$_=@ as member, then the assignment +~\lstinline@$x$ = $e$@~ is interpreted as the invocation +~\lstinline@$x$_=($e\,$)@~ of that setter function. Analogously, an +assignment ~\lstinline@$f.x$ = $e$@~ to a parameterless function $x$ +is interpreted as the invocation ~\lstinline@$f.x$_=($e\,$)@. 
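+
+\example The setter rule for assignments to parameterless functions can be
+sketched as follows. This is only an illustration under stated assumptions:
+it uses present-day Scala syntax, which differs in places from the notation
+of this chapter, and the names \code{Celsius}, \code{degrees} and
+\code{SetterDemo} are hypothetical.
+\begin{lstlisting}
+class Celsius {
+  private var raw = 0.0
+  // Parameterless function: plays the role of x in the rule above.
+  def degrees: Double = raw
+  // Setter function named degrees_= in the same template.
+  def degrees_=(d: Double): Unit = { raw = d }
+}
+object SetterDemo {
+  def main(args: Array[String]): Unit = {
+    val t = new Celsius
+    t.degrees = 21.5           // interpreted as t.degrees_=(21.5)
+    assert(t.degrees == 21.5)
+  }
+}
+\end{lstlisting}
+The assignment compiles only because the template defines both the
+parameterless function and the matching setter.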
+ +An assignment ~\lstinline@$f$($\args\,$) = $e$@~ with a function application to the +left of the ``\lstinline@=@' operator is interpreted as +~\lstinline@$f.$update($\args$, $e\,$)@, i.e.\ +the invocation of an \code{update} function defined by $f$. + +\example \label{ex:imp-mat-mul} +Here is the usual imperative code for matrix multiplication. + +\begin{lstlisting} +def matmul(xss: Array[Array[double]], yss: Array[Array[double]]) = { + val zss: Array[Array[double]] = new Array(xss.length, yss.length); + var i = 0; + while (i < xss.length) { + var j = 0; + while (j < yss(0).length) { + var acc = 0.0; + var k = 0; + while (k < yss.length) { + acc = acc + xss(i)(k) * yss(k)(j); + k = k + 1 + } + zss(i)(j) = acc; + j = j + 1 + } + i = i + 1 + } + zss +} +\end{lstlisting} +Desugaring the array accesses and assignments yields the following +expanded version: +\begin{lstlisting} +def matmul(xss: Array[Array[double]], yss: Array[Array[double]]) = { + val zss: Array[Array[double]] = new Array(xss.length, yss.length); + var i = 0; + while (i < xss.length) { + var j = 0; + while (j < yss(0).length) { + var acc = 0.0; + var k = 0; + while (k < yss.length) { + acc = acc + xss.apply(i).apply(k) * yss.apply(k).apply(j); + k = k + 1 + } + zss.apply(i).update(j, acc); + j = j + 1 + } + i = i + 1 + } + zss +} +\end{lstlisting} + +\section{Conditional Expressions} + +\syntax\begin{lstlisting} + Expr1 ::= if `(' Expr `)' Expr [[`;'] else Expr] +\end{lstlisting} + +The conditional expression ~\lstinline@if ($e_1$) $e_2$ else $e_3$@~ chooses +one of the values of $e_2$ and $e_3$, depending on the +value of $e_1$. The condition $e_1$ is expected to +conform to type \code{boolean}. The then-part $e_2$ and the +else-part $e_3$ are both expected to conform to the expected +type of the conditional expression. The type of the conditional +expression is the least upper bound of the types of $e_2$ and +$e_3$. 
A semicolon preceding the \code{else} symbol of a +conditional expression is ignored. + +The conditional expression is evaluated by evaluating first +$e_1$. If this evaluates to \code{true}, the result of +evaluating $e_2$ is returned, otherwise the result of +evaluating $e_3$ is returned. + +A short form of the conditional expression eliminates the +else-part. The conditional expression ~\lstinline@if ($e_1$) $e_2$@~ is +evaluated as if it was ~\lstinline@if ($e_1$) $e_2$ else ()@. The type of +this expression is \code{unit} and the then-part +$e_2$ is also expected to conform to type \code{unit}. + +\section{While Loop Expressions} + +\syntax\begin{lstlisting} + Expr1 ::= while `(' Expr ')' Expr +\end{lstlisting} + +The while loop expression ~\lstinline@while ($e_1$) $e_2$@~ is typed and +evaluated as if it was an application of ~\lstinline@whileLoop ($e_1$) ($e_2$)@~ where +the hypothetical function \code{whileLoop} is defined as follows. + +\begin{lstlisting} + def whileLoop(def c: boolean)(def s: unit): unit = + if (c) { s ; whileLoop(c)(s) } else {} +\end{lstlisting} + +\example The loop +\begin{lstlisting} + while (x != 0) { y = y + 1/x ; x = x - 1 } +\end{lstlisting} +is equivalent to the application +\begin{lstlisting} + whileLoop (x != 0) { y = y + 1/x ; x = x - 1 } +\end{lstlisting} +Note that this application will never produce a division-by-zero +error at run-time, since the +expression ~\lstinline@(y = y + 1/x)@~ will be evaluated in the body of +\code{whileLoop} only if the condition parameter is true. + +\section{Do Loop Expressions} + +\syntax\begin{lstlisting} + Expr1 ::= do Expr [`;'] while `(' Expr ')' +\end{lstlisting} + +The do loop expression ~\lstinline@do $e_1$ while ($e_2$)@~ is typed and +evaluated as if it was the expression ~\lstinline@($e_1$ ; while ($e_2$) $e_1$)@. +A semicolon preceding the \code{while} symbol of a do loop expression is ignored. 
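+
+\example The two loop translations can be tried out with ordinary functions.
+The following is a minimal sketch in present-day Scala syntax, assuming that
+by-name parameters (written \lstinline@=> T@) play the role of the
+\code{def}-parameters used above. The names \code{doLoop} and
+\code{LoopDemo} are hypothetical.
+\begin{lstlisting}
+object LoopDemo {
+  // Condition and body are re-evaluated on every use (call-by-name).
+  def whileLoop(c: => Boolean)(s: => Unit): Unit =
+    if (c) { s; whileLoop(c)(s) } else ()
+
+  // do e1 while (e2) behaves like (e1 ; while (e2) e1).
+  def doLoop(s: => Unit)(c: => Boolean): Unit = { s; whileLoop(c)(s) }
+
+  def main(args: Array[String]): Unit = {
+    var x = 3
+    var y = 0
+    // The body runs only while x != 0, so 6 / x never divides by zero.
+    whileLoop (x != 0) { y = y + 6 / x; x = x - 1 }
+    assert(x == 0 && y == 11)   // 6/3 + 6/2 + 6/1
+
+    var n = 0
+    doLoop { n = n + 1 } (n < 0)
+    assert(n == 1)              // the body ran once although n < 0 is false
+  }
+}
+\end{lstlisting}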
+ +\section{Comprehensions} + +\syntax\begin{lstlisting} + Expr1 ::= for `(' Enumerators `)' [yield] Expr + Enumerators ::= Generator {`;' Enumerator} + Enumerator ::= Generator + | Expr + Generator ::= val Pattern1 `<-' Expr +\end{lstlisting} + +A comprehension ~\lstinline@for ($\enums\,$) yield $e$@~ evaluates +expression $e$ for each binding generated by the enumerators +$\enums$. Enumerators start with a generator, which can be followed by +further generators or filters. A {\em generator} +~\lstinline@val $p$ <- $e$@~ +produces bindings from an expression $e$ which is matched in +some way against pattern $p$. A {\em filter} is an expression which restricts +enumerated bindings. The precise meaning of generators and filters is +defined by translation to invocations of four methods: \code{map}, +\code{filter}, \code{flatMap}, and \code{foreach}. These methods can +be implemented in different ways for different carrier types. +\comment{As an +example, an implementation of these methods for lists is given in +\sref{cls-list}.} + +The translation scheme is as follows. +In a first step, every generator ~\lstinline@val $p$ <- $e$@, where $p$ is not +a pattern variable, is replaced by +\begin{lstlisting} +val $p$ <- $e$.filter { case $p$ => true; case _ => false } +\end{lstlisting} +Then, the following +rules are applied repeatedly until all comprehensions have been eliminated. +\begin{itemize} +\item +A generator ~\lstinline@val $p$ <- $e$@~ followed by a filter $f$ is translated to +a single generator ~\lstinline@val $p$ <- $e$.filter($x_1 \commadots x_n$ => $f\,$)@~ where +$x_1 \commadots x_n$ are the free variables of $p$. + +\item +A for-comprehension +~\lstinline@for (val $p$ <- $e\,$) yield $e'$@~ +is translated to +~\lstinline@$e$.map { case $p$ => $e'$ }@. + +\item +A for-comprehension +~\lstinline@for (val $p$ <- $e\,$) $e'$@~ +is translated to +~\lstinline@$e$.foreach { case $p$ => $e'$ }@. 
+ +\item +A for-comprehension +\begin{lstlisting} +for (val $p$ <- $e$; val $p'$ <- $e'; \ldots$) yield $e''$ , +\end{lstlisting} +where \lstinline@$\ldots$@ is a (possibly empty) +sequence of generators or filters, +is translated to +\begin{lstlisting} +$e$.flatMap { case $p$ => for (val $p'$ <- $e'; \ldots$) yield $e''$ } . +\end{lstlisting} +\item +A for-comprehension +\begin{lstlisting} +for (val $p$ <- $e$; val $p'$ <- $e'; \ldots$) $e''$ . +\end{lstlisting} +where \lstinline@$\ldots$@ is a (possibly empty) +sequence of generators or filters, +is translated to +\begin{lstlisting} +$e$.foreach { case $p$ => for (val $p'$ <- $e'; \ldots$) $e''$ } . +\end{lstlisting} +\end{itemize} + +\example +The following code produces all pairs of numbers +between $1$ and $n-1$ whose sums are prime. +\begin{lstlisting} +for { val i <- range(1, n); + val j <- range(1, i); + isPrime(i+j) +} yield Pair (i, j) +\end{lstlisting} +The for-comprehension is translated to: +\begin{lstlisting} +range(1, n) + .flatMap { + case i => range(1, i) + .filter { j => isPrime(i+j) } + .map { case j => Pair(i, j) } } +\end{lstlisting} + +\comment{ +\example +\begin{lstlisting} +package class List[a] { + def map[b](f: (a)b): List[b] = match { + case <> => <> + case x :: xs => f(x) :: xs.map(f) + } + def filter(p: (a)Boolean) = match { + case <> => <> + case x :: xs => if p(x) then x :: xs.filter(p) else xs.filter(p) + } + def flatMap[b](f: (a)List[b]): List[b] = + if (isEmpty) Nil + else f(head) ::: tail.flatMap(f); + def foreach(f: (a)Unit): Unit = + if (isEmpty) () + else (f(head); tail.foreach(f)); +} +\end{lstlisting} + +\example +\begin{lstlisting} +abstract class Graph[Node] { + type Edge = (Node, Node) + val nodes: List[Node] + val edges: List[Edge] + def succs(n: Node) = for ((p, s) <- g.edges, p == n) s + def preds(n: Node) = for ((p, s) <- g.edges, s == n) p +} +def topsort[Node](g: Graph[Node]): List[Node] = { + val sources = for (n <- g.nodes, g.preds(n) == <>) n + if 
(g.nodes.isEmpty) <> + else if (sources.isEmpty) new Error(``topsort of cyclic graph'') throw + else sources :+: topsort(new Graph[Node] { + val nodes = g.nodes diff sources + val edges = for ((p, s) <- g.edges, !(sources contains p)) (p, s) + }) +} +\end{lstlisting} +} + +\example For comprehensions can be used to express vector +and matrix algorithms concisely. +For instance, here is a function to compute the transpose of a given matrix: + +\begin{lstlisting} +def transpose[a](xss: Array[Array[a]]) = { + for (val i <- Array.range(0, xss(0).length)) yield + Array(for (val xs <- xss) yield xs(i)) +} +\end{lstlisting} + +Here is a function to compute the scalar product of two vectors: +\begin{lstlisting} +def scalprod(xs: Array[double], ys: Array[double]) = { + var acc = 0.0; + for (val Pair(x, y) <- xs zip ys) acc = acc + x * y; + acc +} +\end{lstlisting} + +Finally, here is a function to compute the product of two matrices. Compare with the imperative version of Example~\ref{ex:imp-mat-mul}. +\begin{lstlisting} +def matmul(xss: Array[Array[double]], yss: Array[Array[double]]) = { + val ysst = transpose(yss); + for (val xs <- xss) yield + for (val yst <- ysst) yield + scalprod(xs, yst) +} +\end{lstlisting} +The code above makes use of the fact that \code{map}, \code{flatMap}, +\code{filter}, and \code{foreach} are defined for members of class +\lstinline@scala.Array@. + +\section{Return Expressions} + +\syntax\begin{lstlisting} + Expr1 ::= return [Expr] +\end{lstlisting} + +A return expression ~\lstinline@return $e$@~ must occur inside the +body of some enclosing named method or function $f$. This function +must have an explicitly declared result type, and the type of $e$ must +conform to it. The return expression evaluates the expression $e$ and +returns its value as the result of $f$. The evaluation of any statements or +expressions following the return expression is omitted. The type of +a return expression is \code{scala.All}. 
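+
+\example A return expression inside a loop can be sketched as follows. The
+sketch uses present-day Scala syntax, and the names \code{firstNegative} and
+\code{ReturnDemo} are hypothetical.
+\begin{lstlisting}
+object ReturnDemo {
+  // The result type Int is declared explicitly, as the rule above requires.
+  def firstNegative(xs: List[Int]): Int = {
+    var rest = xs
+    while (rest.nonEmpty) {
+      // Evaluation of the remaining statements is omitted once this runs.
+      if (rest.head < 0) return rest.head
+      rest = rest.tail
+    }
+    0   // result when no element is negative
+  }
+
+  def main(args: Array[String]): Unit = {
+    assert(firstNegative(List(3, -7, -2)) == -7)
+    assert(firstNegative(List(1, 2)) == 0)
+  }
+}
+\end{lstlisting}
+Omitting the result type of \code{firstNegative} would make the definition
+ill-formed, since the type of the returned expression could then not be
+checked against a declared result type.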
+ + + +\section{Throw Expressions} + +\syntax\begin{lstlisting} + Expr1 ::= throw Expr +\end{lstlisting} + +A throw expression ~\lstinline@throw $e$@~ evaluates the expression +$e$. The type of this expression must conform to +\code{Throwable}. If $e$ evaluates to an exception +reference, evaluation is aborted with the thrown exception. If $e$ +evaluates to \code{null}, evaluation is instead aborted with a +\code{NullPointerException}. If there is an active +\code{try} expression (\sref{sec:try}) which handles the thrown +exception, evaluation resumes with the handler; otherwise the thread +executing the \code{throw} is aborted. The type of a throw expression +is \code{scala.All}. + +\section{Try Expressions}\label{sec:try} + +\syntax\begin{lstlisting} + Expr1 ::= try `{' Block `}' [catch Expr] [finally Expr] +\end{lstlisting} + +A try expression ~\lstinline@try { $b$ } catch $e$@~ evaluates the block +$b$. If evaluation of $b$ does not cause an exception to be +thrown, the result of $b$ is returned. Otherwise the {\em +handler} $e$ is applied to the thrown exception. Let $\proto$ +be the expected type of the try expression. The block $b$ is +expected to conform to $\proto$. The handler $e$ is expected to +conform to type ~\lstinline@scala.PartialFunction[scala.Throwable, $\proto\,$]@. +The type of the try expression is the least upper bound of the type of +$b$ and the result type of $e$. + +A try expression ~\lstinline@try { $b$ } finally $e$@~ evaluates the block +$b$. If evaluation of $b$ does not cause an exception to be +thrown, the expression $e$ is evaluated. If an exception is thrown +during evaluation of $e$, the evaluation of the try expression is +aborted with the thrown exception. If no exception is thrown during +evaluation of $e$, the result of $b$ is returned as the +result of the try expression. + +If an exception is thrown during +evaluation of $b$, the finally block +$e$ is also evaluated. 
If another exception is thrown +during evaluation of $e$, evaluation of the try expression is +aborted with the thrown exception. If no exception is thrown during +evaluation of $e$, the original exception thrown in $b$ is +re-thrown once evaluation of $e$ has completed. The block +$b$ is expected to conform to the expected type of the try +expression. The finally expression $e$ is expected to conform to +type \code{unit}. + +A try expression ~\lstinline@try { $b$ } catch $e_1$ finally $e_2$@~ is a shorthand +for ~\lstinline@try { try { $b$ } catch $e_1$ } finally $e_2$@. + + + + +\section{Anonymous Functions} +\label{sec:closures} + +\syntax\begin{lstlisting} + Expr1 ::= Bindings `=>' Expr + ResultExpr ::= Bindings `=>' Block + Bindings ::= `(' Binding {`,' Binding} `)' + | id [`:' Type1] + Binding ::= id [`:' Type] +\end{lstlisting} + +The anonymous function ~\lstinline@($x_1$: $T_1 \commadots x_n$: $T_n$) => e@~ +maps parameters $x_i$ of types $T_i$ to a result given +by expression $e$. The scope of each formal parameter +$x_i$ is $e$. Formal parameters must have pairwise distinct names. + +If the expected type of the anonymous function is of the form +~\lstinline@scala.Function$n$[$S_1 \commadots S_n$, $R\,$]@, the +expected type of $e$ is $R$ and the type $T_i$ of any of the +parameters $x_i$ can be omitted, in which +case~\lstinline@$T_i$ = $S_i$@ is assumed. +If the expected type of the anonymous function is +some other type, all formal parameter types must be explicitly given, +and the expected type of $e$ is missing. The type of the anonymous +function +is~\lstinline@scala.Function$n$[$S_1 \commadots S_n$, $T\,$]@, +where $T$ is the type of $e$. $T$ must be equivalent to a +type which does not refer to any of the formal parameters $x_i$. 
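+
+\example The role of the expected type can be sketched as follows, in
+present-day Scala syntax. The names \code{inc}, \code{add} and
+\code{ClosureDemo} are hypothetical.
+\begin{lstlisting}
+object ClosureDemo {
+  def main(args: Array[String]): Unit = {
+    // Expected type Function1[Int, Int]: the parameter type may be omitted
+    // and is taken to be Int.
+    val inc: Function1[Int, Int] = x => x + 1
+
+    // No expected type: all parameter types are given explicitly.
+    val add = (x: Int, y: Int) => x + y
+
+    assert(inc(1) == 2)
+    assert(add(2, 3) == 5)
+  }
+}
+\end{lstlisting}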
+ +The anonymous function is evaluated as the instance creation expression +\begin{lstlisting} +scala.Function$n$[$T_1 \commadots T_n$, $T$] { + def apply($x_1$: $T_1 \commadots x_n$: $T_n$): $T$ = $e$ +} +\end{lstlisting} +In the case of a single formal parameter, ~\lstinline@($x$: $T\,$) => $e$@~ and ~\lstinline@($x\,$) => $e$@~ +can be abbreviated to ~\lstinline@$x$: $T$ => e@, and ~\lstinline@$x$ => $e$@, respectively. + +\example Examples of anonymous functions: + +\begin{lstlisting} + x => x // The identity function + + f => g => x => f(g(x)) // Curried function composition + + (x: Int,y: Int) => x + y // A summation function + + () => { count = count + 1; count } // The function which takes an + // empty parameter list $()$, + // increments a non-local variable + // `count' and returns the new value. +\end{lstlisting} + +\section{Statements} +\label{sec:statements} + +\syntax\begin{lstlisting} + BlockStat ::= Import + | Def + | {LocalModifier} ClsDef + | Expr + | + TemplateStat ::= Import + | {Modifier} Def + | {Modifier} Dcl + | Expr + | +\end{lstlisting} + +Statements occur as parts of blocks and templates. A statement can be +an import, a definition or an expression, or it can be empty. +Statements used in the template of a class definition can also be +declarations. An expression that is used as a statement can have an +arbitrary value type. An expression statement $e$ is evaluated by +evaluating $e$ and discarding the result of the evaluation. +\todo{Generalize to implicit coercion?} + +Block statements may be definitions which bind local names in the +block. The only modifiers allowed in block-local definitions are modifiers +\code{abstract}, \code{final}, or \code{sealed} preceding a class or +object definition. + +With the exception of overloaded definitions +(\sref{sec:overloaded-defs}), a statement sequence making up a block +or template may not contain two definitions or declarations that bind +the same name in the same namespace. 
Evaluation of a statement +sequence entails evaluation of the statements in the order they are +written. + +\chapter{Pattern Matching} + +\section{Patterns} + +% 2003 July - changed to new pattern syntax + semantic Burak +% Nov - incorporated changes to grammar, avoiding empty patterns +% definitions for value and sequence patterns +\label{sec:patterns} + +\syntax\begin{lstlisting} + Pattern ::= Pattern1 { `|' Pattern1 } + Pattern1 ::= varid `:' Type + | `_' `:' Type + | Pattern2 + Pattern2 ::= [varid `@'] Pattern3 + Pattern3 ::= SimplePattern [ '*' | '?' | '+' ] + | SimplePattern { id' SimplePattern } + SimplePattern ::= `_' + | varid + | Literal + | StableId [ `(' [Patterns] `)' ] + | `(' [Patterns] `)' + Patterns ::= Pattern {`,' Pattern} + id' ::= id $\textit{ but not }$ '*' | '?' | '+' | `@' | `|' +\end{lstlisting} + +A pattern is built from constants, constructors, variables and regular +operators. Pattern matching tests whether a given value (or sequence +of values) has the shape defined by a pattern, and, if it does, binds +the variables in the pattern to the corresponding components of the +value (or sequence of values). The same variable name may not be +bound more than once in a pattern. + +\subsection{Value and Sequence Patterns} + +\todo{Need to distinguish between value and sequence patterns at the outside} + +On an abstract level, we distinguish between value patterns and sequence patterns, which are defined in a +mutually inductive manner. A {\em value pattern} describes a set of matching values. A +{\em sequence pattern} describes a set of matching sequences of values. Both sorts of patterns may +contain {\em variable bindings} which serve to extract constituents of a value or sequence, +and may consist of patterns of the respective other sort. + +The type of a pattern and the expected types of variables +within patterns are determined by the context. + +Concretely, we distinguish the following kinds of patterns. 
+ +A {\em wild-card pattern} \_ matches any value. + +A {\em typed pattern} $\_: T$ matches values of type $T$. The type $T$ may be + a class type or a compound type; it may not contain a refinement (\sref{sec:refinements}). +This pattern matches any non-null value of type $T$. $T$ must conform to the pattern's expected +type. A pattern $x:T$ is treated the same way as $x @ (\_:T)$. + +A {\em pattern literal} $l$ matches any value that is equal (in terms +of $==$) to it. Its type must conform to the expected type of the +pattern. + +A {\em named pattern constant} $p$ is a stable identifier +(\sref{sec:stable-ids}). To resolve the syntactic overlap with a +variable pattern, a named pattern constant may not be a simple name +starting with a lower-case letter. The stable identifier $p$ is +expected to conform to the expected type of the pattern. The pattern +matches any value $v$ such that ~\lstinline@$p$ == $v$@~ +(\sref{sec:cls-object}). + +A {\em sequence pattern} $p_1 \commadots p_n$ where $n \geq 0$ is a +sequence of patterns separated by commas and matching the sequence of +values that are matched by the components. Sequence patterns may only +appear under constructor applications, or nested within another sequence pattern. +Note that empty sequence patterns are allowed. The type of value patterns that appear in +a sequence pattern is the expected type as determined from the constructor. +A {\em fixed-length argument pattern} is a special sequence pattern in which +all $p_i$ are value patterns. + +A {\em choice pattern} $p_1 | \ldots | p_n$ is a choice among several +alternatives, which may not contain variable-binding patterns. It +matches every value and every sequence matched by at least one of its alternatives. +Note that the empty sequence may appear as an alternative. An {\em option +pattern} $p?$ is an abbreviation for $(p| )$. A choice is a value pattern if all its branches +are value patterns. 
In this case, all branches must conform to the expected type and the type +of the choice is the least upper bound of the branches. Otherwise, it has the same type as the +sequence pattern it is part of. + +An {\em iterated pattern} $p*$ matches sequences of values +consisting of zero, one or more occurrences of values matched by $p$, +where $p$ may not contain a variable-binding pattern. A {\em non-empty +iterated pattern} $p+$ is an abbreviation for $(p,p*)$. + +A {\em constructor pattern} $c ( p )$ consists of a simple type $c$ +followed by a pattern $p$. If $c$ designates a monomorphic case +class, then it must conform to the expected type of the pattern, and the +pattern must be a fixed-length argument pattern $p_1 \commadots p_n$ +whose length corresponds to the number of arguments of $c$'s primary +constructor. The expected types of the component patterns are then +taken from the formal parameter types of that constructor. If $c$ +designates a polymorphic case class, then there must be a unique type +application instance of it such that the instantiation of $c$ conforms +to the expected type of the pattern. The instantiated formal parameter +types of $c$'s primary constructor are then taken as the expected +types of the component patterns $p_1\commadots p_n$. In both cases, +the pattern matches all objects created from constructor invocations +$c(v_1 \commadots v_n)$ where each component pattern $p_i$ matches the +corresponding value $v_i$. If $c$ does not designate a case class, it +must be a subclass of \lstinline@Seq[$T\,$]@. In that case $p$ may be an +arbitrary sequence pattern. Value patterns in $p$ are expected to conform to +type $T$, and the pattern matches all objects whose \lstinline@elements()@ +method returns a sequence that matches $p$. + +The pattern $(p)$ is regarded as equivalent to the pattern $p$, if $p$ +is a nonempty sequence pattern. The empty tuple $()$ is a shorthand +for the constructor pattern \code{Unit}. 
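+
+\example As an illustration of constructor patterns over monomorphic case
+classes, consider the following sketch. The classes \code{Expr}, \code{Num}
+and \code{Add} and the function \code{eval} are hypothetical names
+introduced here, and the code uses the syntax of current Scala releases.
+
+\begin{lstlisting}
+abstract class Expr;
+case class Num(n: Int) extends Expr;
+case class Add(l: Expr, r: Expr) extends Expr;
+
+object ConstrPatternDemo {
+  // Each pattern has as many component patterns as the primary
+  // constructor of the case class has parameters.
+  def eval(e: Expr): Int = e match {
+    case Num(n) => n
+    case Add(l, r) => eval(l) + eval(r)
+  }
+
+  def main(args: Array[String]): Unit =
+    System.out.println(eval(Add(Num(1), Num(2))));  // prints 3
+}
+\end{lstlisting}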
+ +An {\em infix operation pattern} ~\lstinline@$p$ $op$ $p'$@~ is a shorthand for the +constructor pattern ~\lstinline@$op$($p$, $p'$)@. The precedence and +associativity of operators in patterns is the same as in expressions +(\sref{sec:infix-operations}). The operands may not be empty sequence +patterns. + +\subsection{Variable Binding} + +A {\em variable-binding pattern} $x @ p$ is a simple identifier $x$ +which starts with a lower case letter, together with a pattern $p$. It +matches a value or a sequence of values whenever $p$ does, and in +addition binds the variable name to that value or to that sequence of +values. If $p$ is a value pattern of type $T$, the type of $x$ is also $T$. +If $p$ is a sequence pattern and appears under a constructor $c <: $\lstinline@Seq[$T\,$]@, +then the type of $x$ is \lstinline@List[$T\,$]@, %%\todo{really?} burak:yes +where $T$ is the expected type as dictated by the constructor. A pattern +consisting of only a variable $x$ is treated as the bound value pattern $x @ \_$. + +Regular expressions that contain variable bindings may be ambiguous, +i.e. there might be several ways to match a sequence against the +pattern. In these cases, the \emph{right-longest policy} applies: +patterns that appear more to the right than others in a sequence take precedence in case +of overlaps. + +\example Some examples of patterns are: +\begin{enumerate} +\item +The pattern ~\lstinline@ex: IOException@~ matches all instances of class +\code{IOException}, binding variable \code{ex} to the instance. +\item +The pattern ~\lstinline@Pair(x, _)@~ matches pairs of values, binding \code{x} to +the first component of the pair. The second component is matched +with a wildcard pattern. +\item +The pattern \ \code{List( x, y, xs @ _ * )} matches lists of length $\geq 2$, +binding \code{x} to the list's first element, \code{y} to the list's +second element, and \code{xs} to the remainder, which may be empty. 
+\item +The pattern \ \code{List( 1, x@(( 'a' | 'b' )+),y,_ )} matches a list that +contains 1 as its first element, continues with a non-empty sequence of +\code{'a'}s and \code{'b'}s, followed by two more elements. The sequence of \code{'a'}s and \code{'b'}s +is bound to \code{x}, and the next to last element is bound to \code{y}. +\item +The pattern \code{List( x@( 'a'* ), 'a'+ )} matches a non-empty list of +\code{'a'}s. Because of the right-longest policy, \code{x} will always be bound to +the empty sequence. +\item +The pattern \code{List( x@( 'a'+ ), 'a'* )} also matches a non-empty list of +\code{'a'}s. Here, \code{x} will always be bound to +the sequence containing one \code{'a'}. +\end{enumerate} + +\section{Pattern Matching Expressions} +\label{sec:pattern-match} + +\syntax\begin{lstlisting} + BlockExpr ::= `{' CaseClause {CaseClause} `}' + CaseClause ::= case Pattern [`if' PostfixExpr] `=>' Block +\end{lstlisting} + +A pattern matching expression +~\lstinline@case $p_1$ => $b_1$ $\ldots$ case $p_n$ => $b_n$@ \ consists of a number +$n \geq 1$ of cases. Each case consists of a (possibly guarded) pattern +$p_i$ and a block $b_i$. The scope of the pattern variables in $p_i$ is +the corresponding block $b_i$. + +The expected type of a pattern matching expression must in part be +defined. It must be either ~\lstinline@scala.Function1[$T_p$, $T_r$]@ \ or +~\lstinline@scala.PartialFunction[$T_p$, $T_r$]@, where the argument type +$T_p$ must be fully determined, but the result type +$T_r$ may be undetermined. All patterns are typed +relative to the expected type $T_p$ (\sref{sec:patterns}). The expected type of +every block $b_i$ is $T_r$. +Let $T_b$ be the least upper bound of the types of all blocks +$b_i$. The type of the pattern matching expression is +then the required type with $T_r$ replaced by $T_b$ +(i.e. the type is either ~\lstinline@scala.Function1[$T_p$, $T_b$]@~ or +~\lstinline@scala.PartialFunction[$T_p$, $T_b$]@). 
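+
+\example The two admissible expected types can be sketched as follows.
+The names \code{CaseFunDemo}, \code{small} and \code{describe} are
+hypothetical, and the code is written in the syntax of current Scala
+releases rather than being taken from this specification.
+
+\begin{lstlisting}
+object CaseFunDemo {
+  // Expected type PartialFunction[Int, String]: the argument type Int
+  // is fully determined and is used to type the patterns.
+  val small: PartialFunction[Int, String] = {
+    case 0 => "zero"
+    case 1 => "one"
+  };
+
+  // Expected type Function1[Int, String].
+  val describe: Function1[Int, String] = {
+    case 0 => "zero"
+    case _ => "many"
+  };
+
+  def main(args: Array[String]): Unit = {
+    System.out.println(small.isDefinedAt(2));  // prints false
+    System.out.println(describe(2));           // prints many
+  }
+}
+\end{lstlisting}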
+ +When applying a pattern matching expression to a selector value, +patterns are tried in sequence until one is found which matches the +selector value (\sref{sec:patterns}). Say this case is $\CASE;p_i +\Arrow b_i$. The result of the whole expression is then the result of +evaluating $b_i$, where all pattern variables of $p_i$ are bound to +the corresponding parts of the selector value. If no matching pattern +is found, a \code{scala.MatchError} exception is thrown. + +The pattern in a case may also be followed by a guard suffix \ \code{if e}\ +with a boolean expression $e$. The guard expression is evaluated if +the preceding pattern in the case matches. If the guard expression +evaluates to \code{true}, the pattern match succeeds as normal. If the +guard expression evaluates to \code{false}, the pattern in the case +is considered not to match and the search for a matching pattern +continues. + +\comment{ +A case with several patterns $\CASE;p_1 \commadots p_n ;\IF; e \Arrow b$ is a +shorthand for a sequence of single-pattern cases $\CASE;p_1;\IF;e \Arrow b +;\ldots; \CASE;p_n ;\IF;e\Arrow b$. In this case none of the patterns +$p_i$ may contain a named pattern variable (but the patterns may contain +wild-cards). +} + +In the interest of efficiency the evaluation of a pattern matching +expression may try patterns in some other order than textual +sequence. This might affect evaluation through +side effects in guards. However, it is guaranteed that a guard +expression is evaluated only if the pattern it guards matches. + +\example +Often, pattern matching expressions are used as arguments +of the \code{match} method, which is predefined in class \code{Any} +(\sref{sec:cls-object}) and is implemented there by postfix function +application. 
Here is an example: +\begin{lstlisting} +def length [a] (xs: List[a]) = xs match { + case Nil => 0 + case x :: xs1 => 1 + length (xs1) +} +\end{lstlisting} + +\chapter{Top-Level Definitions} +\label{sec:topdefs} + +\syntax\begin{lstlisting} + CompilationUnit ::= [package QualId `;'] {TopStat `;'} TopStat + TopStat ::= {Modifier} ClsDef + | Import + | Packaging + | + QualId ::= id {`.' id} +\end{lstlisting} + +A compilation unit consists of a sequence of packagings, import +clauses, and class and object definitions, which may be preceded by a +package clause. + +A compilation unit ~\lstinline@package $p$; $\stats$@~ starting with a package +clause is equivalent to a compilation unit consisting of a single +packaging ~\lstinline@package $p$ { $\stats$ }@. + +Implicitly imported into every compilation unit are, in that order: +the package \code{java.lang}, the package \code{scala}, and the object +\code{scala.Predef} (\sref{cls:predef}). Members of a later import in +that order hide members of an earlier import. + +\section{Packagings}\label{sec:packagings} + +\syntax\begin{lstlisting} + Packaging ::= package QualId `{' {TopStat `;'} TopStat `}' +\end{lstlisting} + +A package is a special object which defines a set of member classes, +objects and packages. Unlike other objects, packages are not introduced +by a definition. Instead, the set of members of a package is determined by +packagings. + +A packaging \ \code{package p { ds }}\ injects all definitions in +\code{ds} as members into the package whose qualified name is +$p$. If a definition in \code{ds} is labeled \code{private}, it +is visible only to other members of the package. + +Selections \code{p.m} from $p$ as well as imports from $p$ +work as for objects. However, unlike other objects, packages may not +be used as values. It is illegal to have a package with the same fully +qualified name as a module or a class. 
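+
+\example The following sketch shows two packagings in a single compilation
+unit, with a selection \code{p.B.make} from package \code{p}. The package
+names \code{p} and \code{q} and their members are hypothetical, and the
+code follows the syntax of current Scala releases.
+
+\begin{lstlisting}
+package p {
+  class A;
+  object B {
+    def make: A = new A;
+  }
+}
+
+package q {
+  object Main {
+    // Members of package p are selected as for objects.
+    def main(args: Array[String]): Unit =
+      System.out.println(p.B.make != null);  // prints true
+  }
+}
+\end{lstlisting}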
+ +Top-level definitions outside a packaging are assumed to be injected +into a special empty package. That package cannot be named and +therefore cannot be imported. However, members of the empty package +are visible to each other without qualification. + +\example The following example will create a hello world program as +function \code{main} of module \code{test.HelloWorld}. +\begin{lstlisting} +package test; + +object HelloWorld { + def main(args: Array[String]) = System.out.println("hello world") +} +\end{lstlisting} + +\chapter{Local Type Inference} +\label{sec:local-type-inf} + +To be completed. + +\chapter{The Scala Standard Library} + +The Scala standard library consists of the package \code{scala} with a +number of classes and modules. Some of these classes are described in +the following. + +\section{Root Classes} +\label{sec:cls-root} +\label{sec:cls-any} +\label{sec:cls-object} + +The root of the Scala class hierarchy is formed by class \code{Any}. +Every class in a Scala execution environment inherits directly or +indirectly from this class. Class \code{Any} has two direct +subclasses: \code{AnyRef} and \code{AnyVal}. + +The subclass \code{AnyRef} represents all values which are represented +as objects in the underlying host system. Every user-defined Scala +class inherits directly or indirectly from this class. Furthermore, +every user-defined Scala class also inherits the trait +\code{scala.ScalaObject}. Classes written in other languages still +inherit from \code{scala.AnyRef}, but not from +\code{scala.ScalaObject}. + +The class \code{AnyVal} has a fixed number of subclasses, which describe +values which are not implemented as objects in the underlying host +system. + +Classes \code{AnyRef} and \code{AnyVal} are required to provide only +the members declared in class \code{Any}, but implementations may add +host-specific methods to these classes (for instance, an +implementation may identify class \code{AnyRef} with its own root +class for objects). 
+ +The standard interfaces of these root classes are described by the +following definitions. + +\begin{lstlisting} +package scala; +abstract class Any { + + /** Reference equality */ + final def eq(that: Any): boolean = $\ldots$ + + /** Defined equality */ + def equals(that: Any): boolean = this eq that; + + /** Semantic equality between values of same type */ + final def == (that: Any): boolean = this equals that + + /** Semantic inequality between values of same type */ + final def != (that: Any): boolean = !(this == that) + + /** Hash code */ + def hashCode(): Int = $\ldots$ + + /** Textual representation */ + def toString(): String = $\ldots$ + + /** Type test */ + def isInstanceOf[a]: Boolean = match { + case x: a => true + case _ => false + } + + /** Type cast */ + def asInstanceOf[a]: a = match { + case x: a => x + case _ => if (this eq null) this + else throw new ClassCastException() + } + + /** Pattern match */ + def match[a, b](cases: a => b): b = cases(this); +} +final class AnyVal extends Any; +class AnyRef extends Any; +trait ScalaObject extends AnyRef; +\end{lstlisting} + +The type cast operation \verb@asInstanceOf@ has a special meaning (not +expressed in the code above) when its type parameter is a numeric +type. For any type \lstinline@T <: Double@, and any numeric value +\verb@v@, \lstinline@v.asInstanceOf[T]@ converts \code{v} to type +\code{T} using the rules of Java's numeric type cast operation. The +conversion might truncate the numeric value (as when going from +\code{Long} to \code{Int} or from \code{Int} to \code{Byte}) or it +might lose precision (as when going from \code{Double} to \code{Float} +or when converting between \code{Long} and \code{Float}). + +\section{Value Classes} +\label{cls:value} + +Value classes are classes whose instances are not represented as +objects by the underlying host system. All value classes inherit from +class \code{AnyVal}. 
Scala implementations need to provide the +value classes \code{Unit}, \code{Boolean}, \code{Double}, \code{Float}, +\code{Long}, \code{Int}, \code{Char}, \code{Short}, and \code{Byte} +(but are free to provide others as well). +The signatures of these classes are defined in the following. + +\subsection{Class \large{\code{Double}}} + +\begin{lstlisting} +package scala; +abstract sealed class Double extends AnyVal { + def + (that: Double): Double // double addition + def - (that: Double): Double // double subtraction + def * (that: Double): Double // double multiplication + def / (that: Double): Double // double division + def % (that: Double): Double // double remainder + + def == (that: Double): Boolean // double equality + def != (that: Double): Boolean // double inequality + def < (that: Double): Boolean // double less + def > (that: Double): Boolean // double greater + def <= (that: Double): Boolean // double less or equals + def >= (that: Double): Boolean // double greater or equals + + def - : Double = 0.0 - this // double negation + def + : Double = this +} +\end{lstlisting} + +\subsection{Class \large{\code{Float}}} + +\begin{lstlisting} +package scala; +abstract sealed class Float extends AnyVal { + def coerce: Double // convert to Double + + def + (that: Double): Double; // double addition + def + (that: Float): Float // float addition + /* analogous for -, *, /, % */ + + def == (that: Double): Boolean; // double equality + def == (that: Float): Boolean; // float equality + /* analogous for !=, <, >, <=, >= */ + + def - : Float; // float negation + def + : Float +} +\end{lstlisting} + +\subsection{Class \large{\code{Long}}} + +\begin{lstlisting} +package scala; +abstract sealed class Long extends AnyVal { + def coerce: Double // convert to Double + def coerce: Float // convert to Float + + def + (that: Double): Double; // double addition + def + (that: Float): Float; // float addition + def + (that: Long): Long // long addition + /* analogous for -, *, 
/, % */ + + def << (cnt: Int): Long // long left shift + def >> (cnt: Int): Long // long signed right shift + def >>> (cnt: Int): Long // long unsigned right shift + def & (that: Long): Long // long bitwise and + def | (that: Long): Long // long bitwise or + def ^ (that: Long): Long // long bitwise exclusive or + + def == (that: Double): Boolean; // double equality + def == (that: Float): Boolean; // float equality + def == (that: Long): Boolean // long equality + /* analogous for !=, <, >, <=, >= */ + + def - : Long; // long negation + def + : Long; // long identity + def ~ : Long // long bitwise negation +} +\end{lstlisting} + +\subsection{Class \large{\code{Int}}} + +\begin{lstlisting} +package scala; +abstract sealed class Int extends AnyVal { + def coerce: Double // convert to Double + def coerce: Float // convert to Float + def coerce: Long // convert to Long + + def + (that: Double): Double; // double addition + def + (that: Float): Float; // float addition + def + (that: Long): Long; // long addition + def + (that: Int): Int; // int addition + /* analogous for -, *, /, % */ + + def << (cnt: Int): Int; // int left shift + /* analogous for >>, >>> */ + + def & (that: Long): Long; // long bitwise and + def & (that: Int): Int; // int bitwise and + /* analogous for |, ^ */ + + def == (that: Double): Boolean; // double equality + def == (that: Float): Boolean; // float equality + def == (that: Long): Boolean // long equality + def == (that: Int): Boolean // int equality + /* analogous for !=, <, >, <=, >= */ + + def - : Int; // int negation + def + : Int; // int identity + def ~ : Int; // int bitwise negation +} +\end{lstlisting} + +\subsection{Class \large{\code{Short}}} + +\begin{lstlisting} +package scala; +abstract sealed class Short extends AnyVal { + def coerce: Double // convert to Double + def coerce: Float // convert to Float + def coerce: Long // convert to Long + def coerce: Int // convert to Int +} +\end{lstlisting} + +\subsection{Class 
\large{\code{Char}}} + +\begin{lstlisting} +package scala; +abstract sealed class Char extends AnyVal { + def coerce: Double // convert to Double + def coerce: Float // convert to Float + def coerce: Long // convert to Long + def coerce: Int // convert to Int + + def isDigit: Boolean; // is this character a digit? + def isLetter: Boolean; // is this character a letter? + def isLetterOrDigit: Boolean; // is this character a letter or digit? + def isWhiteSpace: Boolean // is this a whitespace character? +} +\end{lstlisting} + +\subsection{Class \large{\code{Byte}}} + +\begin{lstlisting} +package scala; +abstract sealed class Byte extends AnyVal { + def coerce: Double // convert to Double + def coerce: Float // convert to Float + def coerce: Long // convert to Long + def coerce: Int // convert to Int + def coerce: Short // convert to Short +} +\end{lstlisting} + +\subsection{Class \large{\code{Boolean}}} +\label{sec:cls-boolean} + +\begin{lstlisting} +package scala; +abstract sealed class Boolean extends AnyVal { + def && (def x: Boolean): Boolean; // boolean and + def || (def x: Boolean): Boolean; // boolean or + def & (x: Boolean): Boolean; // boolean strict and + def | (x: Boolean): Boolean // boolean strict or + + def == (x: Boolean): Boolean // boolean equality + def != (x: Boolean): Boolean // boolean inequality + + def ! : Boolean // boolean negation +} +\end{lstlisting} + +\subsection{Class \large{\code{Unit}}} + +\begin{lstlisting} +package scala; +abstract sealed class Unit extends AnyVal; +\end{lstlisting} + +\section{Standard Reference Classes} +\label{cls:reference} + +This section presents some standard Scala reference classes which are +treated in a special way by the Scala compiler -- either Scala provides +syntactic sugar for them, or the Scala compiler generates special code +for their operations. Other classes in the standard Scala library are +documented by HTML pages elsewhere. 
+ +\subsection{Class \large{\code{String}}} + +The \verb@String@ class is usually derived from the standard String +class of the underlying host system (and may be identified with +it). For Scala clients the class is taken to support in each case a +method +\begin{lstlisting} +def + (that: Any): String +\end{lstlisting} +which concatenates its left operand with the textual representation of its +right operand. + +\subsection{The \large{\code{Tuple}} classes} + +Scala defines tuple classes \lstinline@Tuple$n$@ for $n = 2 \commadots 9$. +These are defined as follows. + +\begin{lstlisting} +package scala; +case class Tuple$n$[+a_1, ..., +a_$n$](_1: a_1, ..., _$n$: a_$n$) { + def toString = "(" ++ _1 ++ "," ++ $\ldots$ ++ "," ++_$n$ ++ ")" +} +\end{lstlisting} + +The implicitly imported \code{Predef} object (\sref{cls:predef}) defines +the names \code{Pair} as an alias of \code{Tuple2} and \code{Triple} +as an alias for \code{Tuple3}. + +\subsection{The \large{\code{Function}} Classes} +\label{sec:cls-function} + +Scala defines function classes \lstinline@Function$n$@ for $n = 1 \commadots 9$. +These are defined as follows. + +\begin{lstlisting} +package scala; +class Function$n$[-a_1, ..., -a_$n$, +b] { + def apply(x_1: a_1, ..., x_$n$: a_$n$): b; + def toString = "<function>"; +} +\end{lstlisting} + +\comment{ +There is also a module \code{Function}, defined as follows. +\begin{lstlisting} +package scala; +module Function { + def compose[a](fs: List[(a)a]): (a)a = { + x => fs match { + case Nil => x + case f :: fs1 => compose(fs1)(f(x)) + } + } +} +\end{lstlisting} +} + +A subclass of \lstinline@Function1@ represents partial functions, +which are undefined on some points in their domain. 
In addition to the +\code{apply} method of functions, partial functions also have an +\code{isDefinedAt} method, which tells whether the function is defined +at the given argument: +\begin{lstlisting} +class PartialFunction[-a,+b] extends Function1[a, b] { + def isDefinedAt(x: a): Boolean +} +\end{lstlisting} + +The implicitly imported \code{Predef} object (\sref{cls:predef}) defines the name +\code{Function} as an alias of \code{Function1}. + +\subsection{Class \large{\code{Array}}}\label{cls:array} + +The class of generic arrays is given as follows. + +\begin{lstlisting} +package scala; +class Array[a](length: int) with Function[Int, a] { + def length: int; + def apply(i: Int): a; + def update(i: Int)(x: a): Unit; +} +\end{lstlisting} + +\comment{ +\begin{lstlisting} +module Array { + def create[a](i1: Int): Array[a] = Array[a](i1) + def create[a](i1: Int, i2: Int): Array[Array[a]] = { + val x: Array[Array[a]] = create(i1) + 0 to (i1 - 1) do { i => x(i) = create(i2) } + x + } + $\ldots$ + def create[a](i1: Int, i2: Int, i3: Int, i4: Int, i5: Int, + i6: Int, i7: Int, i8: Int, i9: Int, i10: Int) + : Array[Array[Array[Array[Array[Array[Array[Array[Array[Array[a]]]]]]]]]] = { + val x: Array[Array[Array[Array[Array[Array[Array[Array[Array[a]]]]]]]]] = create(i1) + 0 to (i1 - 1) do { i => x(i) = create(i2, i3, i4, i5, i6, i7, i8, i9, i10) } + x + } +} +\end{lstlisting} +} + +\section{The \large{\code{Predef}} Object}\label{cls:predef} + +The \code{Predef} module defines standard functions and type aliases +for Scala programs. It is always implicitly imported, so that all its +defined members are available without qualification. Here is its +definition for the JVM environment. 
+ +\begin{lstlisting} +package scala; +object Predef { + type byte = scala.Byte; + type short = scala.Short; + type char = scala.Char; + type int = scala.Int; + type long = scala.Long; + type float = scala.Float; + type double = scala.Double; + type boolean = scala.Boolean; + type unit = scala.Unit; + + type String = java.lang.String; + type NullPointerException = java.lang.NullPointerException; + type Throwable = java.lang.Throwable; + // other aliases to be identified + + /** Abort with error message */ + def error(message: String): All = throw new Error(message); + + /** Throw an error if given assertion does not hold. */ + def assert(assertion: Boolean): Unit = + if (!assertion) throw new Error("assertion failed"); + + /** Throw an error with given message if given assertion does not hold */ + def assert(assertion: Boolean, message: Any): Unit = + if (!assertion) throw new Error("assertion failed: " + message); + + /** Create an array with given elements */ + def Array[A](xs: A*): Array[A] = { + val array: Array[A] = new Array[A](xs.length); + var i = 0; + for (val x <- xs.elements) { array(i) = x; i = i + 1; } + array; + } + + /** Aliases for pairs and triples */ + type Pair[+p, +q] = Tuple2[p, q]; + def Pair[a, b](x: a, y: b) = Tuple2(x, y); + type Triple[+a, +b, +c] = Tuple3[a, b, c]; + def Triple[a, b, c](x: a, y: b, z: c) = Tuple3(x, y, z); + + /** Alias for unary functions */ + type Function = Function1; + + /** Some standard simple functions */ + def id[a](x: a): a = x; + def fst[a](x: a, y: Any): a = x; + def scd[a](x: Any, y: a): a = y; +} +\end{lstlisting} + +\appendix +\chapter{Scala Syntax Summary} + +The lexical syntax of Scala is given by the following grammar in EBNF +form. 
+ +\begin{lstlisting} + upper ::= `A' | $\ldots$ | `Z' | `$\Dollar$' | `_' + lower ::= `a' | $\ldots$ | `z' + letter ::= upper | lower + digit ::= `0' | $\ldots$ | `9' + special ::= $\mbox{\rm\em ``all other characters except parentheses ([{}]) and periods''}$ + + op ::= special {special} + varid ::= lower {letter | digit} [`_' [id]] + id ::= upper {letter | digit} [`_' [id]] + | varid + | op + | `\'stringLit + + intLit ::= $\mbox{\rm\em ``as in Java''}$ + floatLit ::= $\mbox{\rm\em ``as in Java''}$ + charLit ::= $\mbox{\rm\em ``as in Java''}$ + stringLit ::= $\mbox{\rm\em ``as in Java''}$ + symbolLit ::= `\'' id + + comment ::= `/*' ``any sequence of characters'' `*/' + | `//' ``any sequence of characters up to end of line'' +\end{lstlisting} + +The context-free syntax of Scala is given by the following EBNF +grammar. + +\begin{lstlisting} + Literal ::= intLit + | floatLit + | charLit + | stringLit + | symbolLit + | true + | false + | null + + StableId ::= id + | Path `.' id + Path ::= StableId + | [id `.'] this + | [id '.'] super [`[' id `]']`.' id + + Type ::= Type1 `=>' Type + | `(' [Types] `)' `=>' Type + | Type1 + Type1 ::= SimpleType {with SimpleType} [Refinement] + SimpleType ::= SimpleType TypeArgs + | SimpleType `#' id + | StableId + | Path `.' 
type + | `(' Type ')' + TypeArgs ::= `[' Types `]' + Types ::= Type {`,' Type} + Refinement ::= `{' [RefineStat {`;' RefineStat}] `}' + RefineStat ::= Dcl + | type TypeDef {`,' TypeDef} + | + + Exprs ::= Expr {`,' Expr} + Expr ::= Bindings `=>' Expr + | Expr1 + Expr1 ::= if `(' Expr1 `)' Expr [[`;'] else Expr] + | try `{' Block `}' [catch Expr] [finally Expr] + | do Expr [`;'] while `(' Expr ')' + | for `(' Enumerators `)' (do | yield) Expr + | return [Expr] + | throw Expr + | [SimpleExpr `.'] id `=' Expr + | SimpleExpr ArgumentExprs `=' Expr + | PostfixExpr [`:' Type1] + PostfixExpr ::= InfixExpr [id] + InfixExpr ::= PrefixExpr + | InfixExpr id PrefixExpr + PrefixExpr ::= [`-' | `+' | `~' | `!'] SimpleExpr + SimpleExpr ::= Literal + | Path + | `(' [Expr] `)' + | BlockExpr + | new Template + | SimpleExpr `.' id + | SimpleExpr TypeArgs + | SimpleExpr ArgumentExprs + ArgumentExprs ::= `(' [Exprs] ')' + | BlockExpr + BlockExpr ::= `{' CaseClause {CaseClause} `}' + | `{' Block `}' + Block ::= {BlockStat `;'} [ResultExpr] + BlockStat ::= Import + | Def + | {LocalModifier} ClsDef + | Expr1 + | + ResultExpr ::= Expr1 + | Bindings `=>' Block + + Enumerators ::= Generator {`;' Enumerator} + Enumerator ::= Generator + | Expr + Generator ::= val Pattern1 `<-' Expr + + CaseClause ::= case Pattern [`if' PostfixExpr] `=>' Block + + Constr ::= StableId [TypeArgs] [`(' [Exprs] `)'] + + Pattern ::= Pattern1 { `|' Pattern1 } + Pattern1 ::= varid `:' Type + | `_' `:' Type + | Pattern2 + Pattern2 ::= [varid `@'] Pattern3 + Pattern3 ::= SimplePattern [ '*' | '?' 
| '+' ] + | SimplePattern { id SimplePattern } + SimplePattern ::= `_' + | varid + | Literal + | StableId [ `(' [Patterns] `)' ] + | `(' [Patterns] `)' + Patterns ::= Pattern {`,' Pattern} + + TypeParamClause ::= `[' TypeParam {`,' TypeParam} `]' + FunTypeParamClause ::= `[' TypeDcl {`,' TypeDcl} `]' + TypeParam ::= [`+' | `-'] TypeDcl + ParamClause ::= `(' [Param {`,' Param}] `)' + Param ::= [def] id `:' Type [`*'] + Bindings ::= id [`:' Type1] + | `(' Binding {`,' Binding} `)' + Binding ::= id [`:' Type] + + Modifier ::= LocalModifier + | private + | protected + | override + LocalModifier ::= abstract + | final + | sealed + + Template ::= Constr {`with' Constr} [TemplateBody] + TemplateBody ::= `{' [TemplateStat {`;' TemplateStat}] `}' + TemplateStat ::= Import + | {Modifier} Def + | {Modifier} Dcl + | Expr + | + + Import ::= import ImportExpr {`,' ImportExpr} + ImportExpr ::= StableId `.' (id | `_' | ImportSelectors) + ImportSelectors ::= `{' {ImportSelector `,'} + (ImportSelector | `_') `}' + ImportSelector ::= id [`=>' id | `=>' `_'] + + Dcl ::= val ValDcl {`,' ValDcl} + | var VarDcl {`,' VarDcl} + | def FunDcl {`,' FunDcl} + | type TypeDcl {`,' TypeDcl} + ValDcl ::= id `:' Type + VarDcl ::= id `:' Type + FunDcl ::= id [FunTypeParamClause] {ParamClause} `:' Type + TypeDcl ::= id [`>:' Type] [`<:' Type] + + Def ::= val PatDef {`,' PatDef} + | var VarDef {`,' VarDef} + | def FunDef {`,' FunDef} + | type TypeDef {`,' TypeDef} + | ClsDef + PatDef ::= Pattern `=' Expr + VarDef ::= id [`:' Type] `=' Expr + | id `:' Type `=' `_' + FunDef ::= id [FunTypeParamClause] {ParamClause} + [`:' Type] `=' Expr + | this ParamClause `=' ConstrExpr + TypeDef ::= id [TypeParamClause] `=' Type + ClsDef ::= ([case] class | trait) ClassDef {`,' ClassDef} + | [case] object ObjectDef {`,' ObjectDef} + ClassDef ::= id [TypeParamClause] [ParamClause] + [`:' SimpleType] ClassTemplate + ObjectDef ::= id [`:' SimpleType] ClassTemplate + ClassTemplate ::= extends Template + | TemplateBody + | 
+ ConstrExpr ::= this ArgumentExprs + | `{' this ArgumentExprs {`;' BlockStat} `}' + + CompilationUnit ::= [package QualId `;'] {TopStat `;'} TopStat + TopStat ::= {Modifier} ClsDef + | Import + | Packaging + | + Packaging ::= package QualId `{' {TopStat `;'} TopStat `}' + QualId ::= id {`.' id} +\end{lstlisting} + +\chapter{Implementation Status} + +The present Scala compiler does not yet implement all of the Scala +specification. Its currently existing omissions and deviations are +listed below. We are working on a refined implementation that +addresses these issues. +\begin{enumerate} +\item +Unicode support is still limited. At present we only permit Unicode +encodings \verb@\uXXXX@ in strings and backquote-enclosed identifiers. +To define or access a Unicode identifier, you need to put it in +backquotes and use the \verb@\uXXXX@ encoding. +\item +The unicode operator ``$\Rightarrow$'' +(\sref{sec:idents}) is not yet recognized; you need to use the two +character ASCII equivalent ``\code{=>}'' instead. +\item +The current implementation does not yet support run-time types. +All types are erased (\sref{sec:erasure}) during compilation. This means that +the following operations give potentially wrong results. +\begin{itemize} +\item +Type tests and type casts to parameterized types. Here it is only tested +that a value is an instance of the given top-level type constructor. +\item +Type tests and type casts to type parameters and abstract types. Here +it is only tested that a value is an instance of the type parameter's upper bound. +\item +Polymorphic array creation. If \code{t} is a type variable or abstract type, then +\code{new Array[t]} will yield an array of the upper bound of \code{t}. +\end{itemize} +\item +Return expressions are not yet permitted inside an anonymous function +or inside a call-by-name argument (i.e.\ a function argument corresponding to a +\code{def} parameter). 
+\item +Members of the empty package (\sref{sec:packagings}) cannot yet be +accessed from other source files. Hence, all library classes and +objects have to be in some package. +\item +At present, auxiliary constructors (\sref{sec:constr-defs}) are only permitted +for monomorphic classes. +\item +The \code{Array} class supports as yet only a restricted set of +operations as given in \sref{cls:array}. It is planned to extend that +interface. In particular, arrays will implement the \code{scala.Seq} +trait as well as the methods needed to support for-comprehensions. +\item +At present, all classes used as mixins must be accessible to the Scala +compiler in source form. +\end{enumerate} + +\end{document} + + +\comment{ +\section{Definitions} + +For a possibly recursive definition such as $\LET;x_1 = e_1 +;\ldots; \LET x_n = e_n$, local type inference proceeds as +follows. +A first phase assigns {\em a-priori types} to the $x_i$. The a-priori +type of $x$ is the declared type of $x$ if a declared type is +given. Otherwise, it is the inherited type, if one is +given. Otherwise, it is undefined. + +A second phase assigns completely defined types to the $x_i$, in some +order. The type of $x$ is the a-priori type, if it is completely +defined. Otherwise, it is the a-priori type of $x$'s right hand side. +The a-priori type of an expression $e$ depends on the form of $e$. +\begin{enumerate} +\item +The a-priori type of a +typed expression $e:T$ is $T$. +\item +The a-priori type of a class instance +creation expression $c;\WITH;(b)$ is $C;\WITH;R$ where $C$ is the +type of the class given in $c$ and $R$ is the a-priori type of block +$b$. +\item +The a-priori type of a block is a record consisting the a-priori +types of each non-private identifier which is declared in the block +and which is visible at in last statement of the block. 
Here, it is +required that every import clause $\IMPORT;e_1 \commadots e_n$ refers +to expressions whose type can be computed with the type information +determined so far. Otherwise, a compile time error results. +\item +The a-priori type of any other expression is the expression's type, if +that type can be computed with the type information determined so far. +Otherwise, a compile time error results. +\end{enumerate} +The compiler will find an ordering in which types are assigned without +compiler errors to all variables $x_1 \commadots x_n$, if such an +ordering exists. This can be achieved by lazy evaluation. +} +\section{Exceptions} +\label{sec:exceptions} + +There is a predefined type \code{Throwable}, as well as functions to +throw and handle values of type \code{Throwable}. These are declared +as follows. + +\begin{lstlisting} + class Throwable { + def throw[a]: a + } + class ExceptOrFinally[a] { + def except (handler: PartialFunction[Throwable,a]): a + def finally (def handler: Unit): a + } + def try [a] (def body: a): ExceptOrFinally[a] +\end{lstlisting} + +The type \code{Throwable} represents exceptions and error objects; it +may be identified with an analogous type of the underlying +implementation such as \code{java.lang.Throwable}. We will in the +following loosely call values of type \code{Throwable} exceptions. + +The \code{throw} method in \code{Throwable} aborts execution of the +thread executing it and passes the thrown exception to the handler +that was most recently installed by a +\code{try} function in the current thread. If no \code{try} method is +active, the thread terminates. + +The \code{try} function executes its body with the given exception +handler. A \code{try} expression comes in two forms. The first form is + +\begin{lstlisting} +try $body$ except $handler$ . +\end{lstlisting} + +If $body$ executes without an exception being thrown, then executing +the try expression is equivalent to just executing $body$. 
If some +exception is thrown from within $body$ for which \code{handler} is defined, +the handler is invoked with the thrown exception as argument. + +The second form of a try expression is + +\begin{lstlisting} +try $body$ finally $handler$ . +\end{lstlisting} + +This expression will execute $body$. A normal execution of $body$ is +followed by an invocation of the $handler$ expression. The $handler$ +expression does not take arguments and has \code{Unit} as result type. +If execution of the handler expression throws an exception, this +exception is propagated out of the \code{try} statement. Otherwise, +if an exception was thrown in $body$ prior to invocation of $handler$, +that exception is re-thrown after the invocation. Finally, if both +$body$ and $handler$ terminate normally, the original result of +$body$ is the result of the \code{try} expression. + +\example An example of a try-except expression: + +\begin{lstlisting} +try { + System.in.readString() +} except { + case ex: EndOfFile => "" +} +\end{lstlisting} + +\example An example of a try-finally expression: + +\begin{lstlisting} +file = open (fileName) +if (file != null) { + try { + process (file) + } finally { + file.close + } +} +\end{lstlisting} + +\section{Concurrency} +\label{sec:concurrency} + +\subsection{Basic Concurrency Constructs} + +Scala programs may be executed by several threads that operate +concurrently. The thread model used is based on the model of the +underlying run-time system. We postulate a predefined +class \code{Thread} for run-time threads, +\code{fork} function to spawn off a new thread, +as well as \code{Monitor} and \code{Signal} classes. These are +specified as follows\notyet{Concurrency constructs are}. + + +\begin{lstlisting} +class Thread { $\ldots$ } +def fork (def p: Unit): Thread +\end{lstlisting} + +The \code{fork} function runs its argument computation \code{p} in a +separate thread. It returns the thread object immediately to its +caller. 
Unhandled exceptions (\sref{sec:exceptions}) thrown during +evaluation of \code{p} abort execution of the forked thread and are +otherwise ignored. + +\begin{lstlisting} +class Monitor { + def synchronized [a] (def e: a): a +} +\end{lstlisting} + +Monitors define a \code{synchronized} method which provides mutual +exclusion between threads. It executes its argument computation +\code{e} while asserting exclusive ownership of the monitor +object whose method is invoked. If some other thread has ownership of +the same monitor object, the computation is delayed until the other +thread has relinquished its ownership. Ownership of a monitor is +relinquished at the end of the argument computation, and while the +computation is waiting for a signal. + +\begin{lstlisting} +class Signal { + def wait: Unit + def wait(msec: Long): Unit + def notify: Unit + def notifyAll: Unit +} +\end{lstlisting} + +The \code{Signal} class provides the basic means for process +synchronization. The \code{wait} method of a signal suspends the +calling thread until it is woken up by some future invocation of the +signal's \code{notify} or \code{notifyAll} method. The \code{notify} +method wakes up one thread that is waiting for the signal. The +\code{notifyAll} method wakes up all threads that are waiting for the +signal. A second version of the \code{wait} method takes a time-out +parameter (given in milliseconds). A thread calling \code{wait(msec)} +will suspend until unblocked by a \code{notify} or \code{notifyAll} +method, or until \code{msec} milliseconds have passed. + +\subsection{Channels} + +\begin{lstlisting} +class Channel[a] { + def write(x: a): Unit + def read: a +} +\end{lstlisting} + +An object of type \code{Channel[a]} offers a write operation, +which writes data of type \code{a} to the channel, and a read +operation, which returns written data as a result. 
The write operation +is non-blocking; that is it returns immediately without waiting for +the written data to be read. + +\subsection{Message Spaces} + +The Scala library also provides message spaces as a higher-level, +flexible construct for process synchronization and communication. A +{\em message} is an arbitrary object that inherits from the +\code{Message} class. +There is a special message \code{TIMEOUT} which is used to signal a time-out. +\begin{lstlisting} +class Message +case class TIMEOUT extends Message +\end{lstlisting} +Message spaces implement the following class. +\begin{lstlisting} +class MessageSpace { + def send(msg: Message): Unit + def receive[a](f: PartialFunction1[Message, a]): a + def receiveWithin[a](msec: Long)(f: PartialFunction1[Message, a]): a +} +\end{lstlisting} +The state of a message space consists of a multi-set of messages. +Messages are added to the space using the \code{send} method. Messages +are removed using the \code{receive} method, which is passed a message +processor \code{f} as argument, which is a partial function from +messages to some arbitrary result type. Typically, this function is +implemented as a pattern matching expression. The \code{receive} +method blocks until there is a message in the space for which its +message processor is defined. The matching message is then removed +from the space and the blocked thread is restarted by applying the +message processor to the message. Both sent messages and receivers are +ordered in time. A receiver $r$ is applied to a matching message $m$ +only if there is no other (message, receiver) pair which precedes $(m, +r)$ in the partial ordering on pairs that orders each component in +time. + +The message space class also offers a method \code{receiveWithin} +which blocks for only a specified maximal amount of time. 
If no +message is received within the specified time interval (given in +milliseconds), the message processor argument $f$ will be unblocked +with the special \code{TIMEOUT} message. + +case class extends { $\ldots$ } + +trait List { } +class Nil +class Cons + +\comment{changes: + Type ::= SimpleType {with SimpleType} [with Refinement] + | class SimpleType + SimpleType ::= SimpleType [TypeArgs] + | `(' [Types] `)' + | + | this +} diff --git a/doc/reference/ScalaByExample.tex b/doc/reference/ScalaByExample.tex index 4241aafbe3..7acbb4a805 100644 --- a/doc/reference/ScalaByExample.tex +++ b/doc/reference/ScalaByExample.tex @@ -22,15 +22,6 @@ } \fi -\def\exercise{ - \def\theresult{Exercise~\thesection.\arabic{result}} - \refstepcounter{result} - \trivlist\item[\hskip - \labelsep{\bf \theresult}]} -\def\endexercise{\endtrivlist} - -\newcommand{\rewriteby}[1]{\mbox{\tab\tab\rm(#1)}} - \renewcommand{\doctitle}{Scala By Example\\[33mm]\ } \renewcommand{\docauthor}{Martin Odersky\\[53mm]\ } @@ -43,6885 +34,7 @@ \mainmatter \sloppy -\chapter{\label{chap:intro}Introduction} - -\input{RationalePart} - -The rest of this document is structured as -follows. Chapters~\ref{chap:example-one} and -\ref{chap:example-auction} highlight some of the features that make -Scala interesting. The following chapters introduce the language -constructs of Scala in a more thorough -way. Chapter~\ref{chap:simple-funs} introduces basic expressions and -simple functions. Chapter~\ref{chap:first-class-funs} introduces -higher-order functions. (to be continued). - -This document owes a great debt to Sussman and Abelson's wonderful book -``Structure and Interpretation of Computer -Programs''\cite{abelson-sussman:structure}. Many of their examples and -exercises are also present here. Of course, the working language has -in each case been changed from Scheme to Scala. Furthermore, the -examples make use of Scala's object-oriented constructs where -appropriate. 
- - -\chapter{\label{chap:example-one}A First Example} - -As a first example, here is an implementation of Quicksort in Scala. - -\begin{lstlisting} -def sort(xs: Array[int]): unit = { - def swap(i: int, j: int): unit = { - val t = xs(i); xs(i) = xs(j); xs(j) = t; - } - def sort1(l: int, r: int): unit = { - val pivot = xs((l + r) / 2); - var i = l, j = r; - while (i <= j) { - while (xs(i) < pivot) { i = i + 1 } - while (xs(j) > pivot) { j = j - 1 } - if (i <= j) { - swap(i, j); - i = i + 1; - j = j - 1; - } - } - if (l < j) sort1(l, j); - if (j < r) sort1(i, r); - } - sort1(0, xs.length - 1); -} -\end{lstlisting} - -The implementation looks quite similar to what one would write in Java -or C. We use the same operators and similar control structures. -There are also some minor syntactical differences. In particular: -\begin{itemize} -\item -Definitions start with a reserved word. Function definitions start -with \code{def}, variable definitions start with \code{var} and -definitions of values (i.e.\ read-only variables) start with \code{val}. -\item -The declared type of a symbol is given after the symbol and a colon. -The declared type can often be omitted, because the compiler can infer -it from the context. -\item -We use \code{unit} instead of \code{void} to define the result type of -a procedure. -\item -Array types are written \code{Array[T]} rather than \code{T[]}, -and array selections are written \code{a(i)} rather than \code{a[i]}. -\item -Functions can be nested inside other functions. Nested functions can -access parameters and local variables of enclosing functions. For -instance, the name of the array \code{xs} is visible in functions -\code{swap} and \code{sort1}, and therefore need not be passed as a -parameter to them. -\end{itemize} -So far, Scala looks like a fairly conventional language with some -syntactic peculiarities. In fact it is possible to write programs in a -conventional imperative or object-oriented style. 
This is important -because it is one of the things that makes it easy to combine Scala -components with components written in mainstream languages such as -Java, C\# or Visual Basic. - -However, it is also possible to write programs in a style which looks -completely different. Here is Quicksort again, this time written in -functional style. - -\begin{lstlisting} -def sort(xs: List[int]): List[int] = if (xs.length <= 1) xs else { - val pivot = xs(xs.length / 2); - sort(xs.filter(x => x < pivot)) - ::: xs.filter(x => x == pivot) - ::: sort(xs.filter(x => x > pivot)) -} -\end{lstlisting} - -The functional program works with lists instead of arrays.\footnote{In -a future complete implementation of Scala, we could also have used arrays -instead of lists, but at the moment arrays do not yet support -\code{filter} and \code{:::}.} -It captures the essence of the quicksort algorithm in a concise way: -\begin{itemize} -\item Pick an element in the middle of the list as a pivot. -\item Partition the list into two sub-lists containing elements that -are less than, respectively greater than the pivot element, and a -third list which contains elements equal to the pivot. -\item Sort the first two sub-lists by a recursive invocation of -the sort function.\footnote{This is not quite what the imperative algorithm does; -the latter partitions the array into two sub-arrays containing elements -less than, or greater than or equal to, the pivot.} -\item The result is obtained by appending the three sub-lists together. -\end{itemize} -Both the imperative and the functional implementation have the same -asymptotic complexity -- $O(N \log N)$ in the average case and -$O(N^2)$ in the worst case. But where the imperative implementation -operates in place by modifying the argument array, the functional -implementation returns a new sorted list and leaves the argument -list unchanged. The functional implementation thus requires more -transient memory than the imperative one. 
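
As a minimal sketch of the functional variant in use, the listing below is written in present-day Scala syntax, which is an assumption on our part: \code{Int} replaces the \code{int} alias, and a guard for lists of at most one element is included so that the recursion terminates. The names \code{input} and \code{sorted} are purely illustrative. It shows that the argument list is left unchanged while a new sorted list is returned.

\begin{lstlisting}
// Functional quicksort; returns a new list and never modifies xs.
def sort(xs: List[Int]): List[Int] =
  if (xs.length <= 1) xs                      // nothing left to partition
  else {
    val pivot = xs(xs.length / 2)             // element in the middle
    sort(xs.filter(x => x < pivot)) :::       // smaller elements, sorted
      xs.filter(x => x == pivot) :::          // elements equal to the pivot
      sort(xs.filter(x => x > pivot))         // larger elements, sorted
  }

val input = List(3, 1, 4, 1, 5, 9, 2, 6)
val sorted = sort(input)
// input still holds the elements in their original order;
// sorted is a separate list holding them in ascending order.
\end{lstlisting}

Each call to \code{filter} builds a fresh list, which is where the additional transient memory mentioned above comes from.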
- -The functional implementation makes it look like Scala is a language -that's specialized for functional operations on lists. In fact, it -is not; all of the operations used in the example are simple library -methods of a class \code{List[t]} which is part of the standard -Scala library, and which itself is implemented in Scala. - -In particular, there is the method \code{filter} which takes as -argument a {\em predicate function} that maps list elements to -boolean values. The result of \code{filter} is a list consisting of -all the elements of the original list for which the given predicate -function is true. The \code{filter} method of an object of type -\code{List[t]} thus has the signature - -\begin{lstlisting} -def filter(p: t => boolean): List[t] -\end{lstlisting} - -Here, \code{t => boolean} is the type of functions that take an element -of type \code{t} and return a \code{boolean}. Functions like -\code{filter} that take another function as argument or return one as -result are called {\em higher-order} functions. - -In the quicksort program, \code{filter} is applied three times to an -anonymous function argument. The first argument, -\code{x => x < pivot}, represents the function that maps its parameter -\code{x} to the boolean value \code{x < pivot}. That is, it yields -true if \code{x} is smaller than \code{pivot}, false -otherwise. The function is anonymous, i.e.\ it is not defined with a -name. The type of the \code{x} parameter is omitted because a Scala -compiler can infer it automatically from the context where the -function is used. To summarize, \code{xs.filter(x => x < pivot)} -returns a list consisting of all elements of the list \code{xs} that are -smaller than \code{pivot}. - -\comment{ -It is also possible to apply higher-order functions such as -\code{filter} to named function arguments. 
Here is functional -quicksort again, where the two anonymous functions are replaced by -named auxiliary functions that compare the argument to the -\code{pivot} value. - -\begin{lstlisting} -def sort (xs: List[int]): List[int] = { - val pivot = xs(xs.length / 2); - def leqPivot(x: int) = x <= pivot; - def gtPivot(x: int) = x > pivot; - def eqPivot(x: int) = x == pivot; - sort(xs filter leqPivot) - ::: sort(xs filter eqPivot) - ::: sort(xs filter gtPivot) -} -\end{lstlisting} -} - -An object of type \code{List[t]} also has a method ``\code{:::}'' -which takes another list and returns the result of appending that -list to this one. This method has the signature - -\begin{lstlisting} -def :::(that: List[t]): List[t] -\end{lstlisting} - -Scala does not distinguish between identifiers and operator names. An -identifier can be either a sequence of letters and digits which begins -with a letter, or it can be a sequence of special characters, such as -``\code{+}'', ``\code{*}'', or ``\code{:}''. The last definition thus -introduced a new method identifier ``\code{:::}''. This identifier is -used in the Quicksort example as a binary infix operator that connects -the three sub-lists resulting from the partition. In fact, any method -can be used as an operator in Scala. The binary operation $E\;op\;E'$ -is always interpreted as the method call $E.op(E')$. This holds also -for binary infix operators which start with a letter. The recursive call -to \code{sort} in the last quicksort example is thus equivalent to -\begin{lstlisting} -sort(xs.filter(x => x < pivot)) - .:::(xs.filter(x => x == pivot)) - .:::(sort(xs.filter(x => x > pivot))) -\end{lstlisting} - -Looking again in detail at the first, imperative implementation of -Quicksort, we find that many of the language constructs used in the -second solution are also present, albeit in a disguised form. 
- -For instance, ``standard'' binary operators such as \code{+}, -\code{-}, or \code{<} are not treated in any special way. Like -\code{:::}, they are methods of their left operand. Consequently, -the expression \code{i + 1} is regarded as the invocation -\code{i.+(1)} of the \code{+} method of the integer value \code{i}. -Of course, a compiler is free (if it is moderately smart, even expected) -to recognize the special case of calling the \code{+} method over -integer arguments and to generate efficient inline code for it. - -Control constructs such as \code{while} are also not primitive but are -predefined functions in the standard Scala library. Here is the -definition of \code{while} in Scala. -\begin{lstlisting} -def while (def p: boolean) (def s: unit): unit = - if (p) { s ; while(p)(s) } -\end{lstlisting} -The \code{while} function takes as first parameter a test function, -which takes no parameters and yields a boolean value. As second -parameter it takes a command function which also takes no parameters -and yields a trivial result. \code{while} invokes the command function -as long as the test function yields true. Again, compilers are free to -pick specialized implementations of \code{while} that have the same -behavior as the invocation of the function given above. - -\chapter{Programming with Actors and Messages} -\label{chap:example-auction} - -Here's an example that shows an application area for which Scala is -particularly well suited. Consider the task of implementing an -electronic auction service. We use an Erlang-style actor process -model to implement the participants of the auction. Actors are -objects to which messages are sent. Every process has a ``mailbox'' of -its incoming messages which is represented as a queue. It can work -sequentially through the messages in its mailbox, or search for -messages matching some pattern. 
- -\begin{lstlisting}[style=floating,label=fig:simple-auction-msgs,caption=Implementation of an Auction Service] -trait AuctionMessage; -case class Offer(bid: int, client: Actor) extends AuctionMessage; -case class Inquire(client: Actor) extends AuctionMessage; - -trait AuctionReply; -case class Status(asked: int, expire: Date) extends AuctionReply; -case object BestOffer extends AuctionReply; -case class BeatenOffer(maxBid: int) extends AuctionReply; -case class AuctionConcluded(seller: Actor, client: Actor) - extends AuctionReply; -case object AuctionFailed extends AuctionReply; -case object AuctionOver extends AuctionReply; -\end{lstlisting} - -For every traded item there is an auctioneer process that publishes -information about the traded item, that accepts offers from clients -and that communicates with the seller and winning bidder to close the -transaction. We present an overview of a simple implementation -here. - -As a first step, we define the messages that are exchanged during an -auction. There are two abstract base classes (called {\em traits}): -\code{AuctionMessage} for messages from clients to the auction -service, and \code{AuctionReply} for replies from the service to the -clients. For both base classes there exists a number of cases, which -are defined in Figure~\ref{fig:simple-auction-msgs}. 
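
To illustrate how the individual cases of such a base class can be told apart, here is a small sketch in present-day Scala syntax. Several parts are assumptions made only for this illustration: the empty \code{Actor} placeholder stands in for the library class, the \code{sealed} modifier is our addition, and the \code{describe} function is hypothetical and not part of the auction service.

\begin{lstlisting}
import java.util.Date

trait Actor  // placeholder only; the text uses a library class of this name

sealed trait AuctionReply
case class Status(asked: Int, expire: Date) extends AuctionReply
case object BestOffer extends AuctionReply
case class BeatenOffer(maxBid: Int) extends AuctionReply
case class AuctionConcluded(seller: Actor, client: Actor)
  extends AuctionReply
case object AuctionFailed extends AuctionReply
case object AuctionOver extends AuctionReply

// Hypothetical helper: renders a reply as text by matching on its case.
def describe(reply: AuctionReply): String = reply match {
  case Status(asked, expire)  => "current bid " + asked + ", closes " + expire
  case BestOffer              => "you hold the best offer"
  case BeatenOffer(maxBid)    => "beaten; best bid is now " + maxBid
  case AuctionConcluded(_, _) => "auction concluded"
  case AuctionFailed          => "auction failed"
  case AuctionOver            => "auction is over"
}
\end{lstlisting}

Each \code{case} names one message format and binds its fields, so a receiver can react to a message according to its shape.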
- -\begin{lstlisting}[style=floating,label=fig:simple-auction,caption=Implementation of an Auction Service] -class Auction(seller: Actor, minBid: int, closing: Date) extends Actor { - val timeToShutdown = 36000000; // msec - val bidIncrement = 10; - def run() = { - var maxBid = minBid - bidIncrement; - var maxBidder: Actor = _; - var running = true; - while (running) { - receiveWithin ((closing.getTime() - new Date().getTime())) { - case Offer(bid, client) => - if (bid >= maxBid + bidIncrement) { - if (maxBid >= minBid) maxBidder send BeatenOffer(bid); - maxBid = bid; maxBidder = client; client send BestOffer; - } else { - client send BeatenOffer(maxBid); - } - case Inquire(client) => - client send Status(maxBid, closing); - case TIMEOUT => - if (maxBid >= minBid) { - val reply = AuctionConcluded(seller, maxBidder); - maxBidder send reply; seller send reply; - } else { - seller send AuctionFailed; - } - receiveWithin(timeToShutdown) { - case Offer(_, client) => client send AuctionOver - case TIMEOUT => running = false; - } - } - } - } -} -\end{lstlisting} - -For each base class, there are a number of {\em case classes} which -define the format of particular messages in the class. These messages -might well be ultimately mapped to small XML documents. We expect -automatic tools to exist that convert between XML documents and -internal data structures like the ones defined above. - -Figure~\ref{fig:simple-auction} presents a Scala implementation of a -class \code{Auction} for auction processes that coordinate the bidding -on one item. Objects of this class are created by indicating -\begin{itemize} -\item a seller process which needs to be notified when the auction is over, -\item a minimal bid, -\item the date when the auction is to be closed. -\end{itemize} -The process behavior is defined by its \code{run} method. 
That method -repeatedly selects (using \code{receiveWithin}) a message and reacts to it, -until the auction is closed, which is signalled by a \code{TIMEOUT} -message. Before finally stopping, it stays active for another period -determined by the \code{timeToShutdown} constant and replies to -further offers that the auction is closed. - -Here are some further explanations of the constructs used in this -program: -\begin{itemize} -\item -The \code{receiveWithin} method of class \code{Actor} takes as -parameters a time span given in milliseconds and a function that -processes messages in the mailbox. The function is given by a sequence -of cases that each specify a pattern and an action to perform for -messages matching the pattern. The \code{receiveWithin} method selects -the first message in the mailbox which matches one of these patterns -and applies the corresponding action to it. -\item -The last case of \code{receiveWithin} is guarded by a -\code{TIMEOUT} pattern. If no other messages are received in the meantime, this -pattern is triggered after the time span which is passed as argument -to the enclosing \code{receiveWithin} method. \code{TIMEOUT} is a -particular instance of class \code{Message}, which is triggered by the -\code{Actor} implementation itself. -\item -Reply messages are sent using syntax of the form -\code{destination send SomeMessage}. \code{send} is used here as a -binary operator with a process and a message as arguments. This is -equivalent in Scala to the method call -\code{destination.send(SomeMessage)}, i.e. the invocation of -the \code{send} of the destination process with the given message as -parameter. -\end{itemize} -The preceding discussion gave a flavor of distributed programming in -Scala. It might seem that Scala has a rich set of language constructs -that support actor processes, message sending and receiving, -programming with timeouts, etc. In fact, the opposite is true. 
All the -constructs discussed above are offered as methods in the library class -\code{Actor}. That class is itself implemented in Scala, based on the underlying -thread model of the host language (e.g. Java, or .NET). -The implementation of all features of class \code{Actor} used here is -given in Section~\ref{sec:actors}. - -The advantages of the library-based approach are relative simplicity -of the core language and flexibility for library designers. Because -the core language need not specify details of high-level process -communication, it can be kept simpler and more general. Because the -particular model of messages in a mailbox is a library module, it can -be freely modified if a different model is needed in some -applications. The approach requires however that the core language is -expressive enough to provide the necessary language abstractions in a -convenient way. Scala has been designed with this in mind; one of its -major design goals was that it should be flexible enough to act as a -convenient host language for domain specific languages implemented by -library modules. For instance, the actor communication constructs -presented above can be regarded as one such domain specific language, -which conceptually extends the Scala core. - -\chapter{\label{chap:simple-funs}Expressions and Simple Functions} - -The previous examples gave an impression of what can be done with -Scala. We now introduce its constructs one by one in a more -systematic fashion. We start with the smallest level, expressions and -functions. - -\section{Expressions And Simple Functions} - -A Scala system comes with an interpreter which can be seen as a fancy -calculator. A user interacts with the calculator by typing in -expressions. The calculator returns the evaluation results and their -types. Example: - -\begin{lstlisting} -> 87 + 145 -232: scala.Int - -> 5 + 2 * 3 -11: scala.Int - -> "hello" + " world!" 
-hello world!: scala.String -\end{lstlisting} -It is also possible to name a sub-expression and use the name instead -of the expression afterwards: -\begin{lstlisting} -> def scale = 5 -def scale: scala.Int - -> 7 * scale -35: scala.Int -\end{lstlisting} -\begin{lstlisting} -> def pi = 3.14159 -def pi: scala.Double - -> def radius = 10 -def radius: scala.Int - -> 2 * pi * radius -62.8318: scala.Double -\end{lstlisting} -Definitions start with the reserved word \code{def}; they introduce a -name which stands for the expression following the \code{=} sign. The -interpreter will answer with the introduced name and its type. - -Executing a definition such as \code{def x = e} will not evaluate the -expression \code{e}. Instead \code{e} is evaluated whenever \code{x} -is used. Alternatively, Scala offers a value definition -\code{val x = e}, which does evaluate the right-hand side \code{e} as part of the -evaluation of the definition. If \code{x} is then used subsequently, -it is immediately replaced by the pre-computed value of -\code{e}, so that the expression need not be evaluated again. - -How are expressions evaluated? An expression consisting of operators -and operands is evaluated by repeatedly applying the following -simplification steps. -\begin{itemize} -\item pick the left-most operation -\item evaluate its operands -\item apply the operator to the operand values. -\end{itemize} -A name defined by \code{def}\ is evaluated by replacing the name by the -(unevaluated) definition's right-hand side. A name defined by \code{val} is -evaluated by replacing the name by the value of the definition's -right-hand side. The evaluation process stops once we have reached a -value. A value is some data item such as a string, a number, an array, -or a list. - -\example -Here is an evaluation of an arithmetic expression. 
-\begin{lstlisting} -$\,\,\,$ (2 * pi) * radius -$\rightarrow$ (2 * 3.14159) * radius -$\rightarrow$ 6.28318 * radius -$\rightarrow$ 6.28318 * 10 -$\rightarrow$ 62.8318 -\end{lstlisting} -The process of stepwise simplification of expressions to values is -called {\em reduction}. - -\section{Parameters} - -Using \code{def}, one can also define functions with parameters. Example: -\begin{lstlisting} -> def square(x: double) = x * x -def square(x: double): scala.Double - -> square(2) -4.0: scala.Double - -> square(5 + 4) -81.0: scala.Double - -> square(square(4)) -256.0: scala.Double - -> def sumOfSquares(x: double, y: double) = square(x) + square(y) -def sumOfSquares(x: scala.Double, y: scala.Double): scala.Double -\end{lstlisting} - -Function parameters follow the function name and are always enclosed -in parentheses. Every parameter comes with a type, which is indicated -following the parameter name and a colon. At the present time, we -only need basic numeric types such as the type \code{scala.Double} of -double precision numbers. Scala defines {\em type aliases} for some -standard types, so we can write numeric types as in Java. For instance -\code{double} is a type alias of \code{scala.Double} and \code{int} is -a type alias for \code{scala.Int}. - -Functions with parameters are evaluated analogously to operators in -expressions. First, the arguments of the function are evaluated (in -left-to-right order). Then, the function application is replaced by -the function's right hand side, and at the same time all formal -parameters of the function are replaced by their corresponding actual -arguments. 
- -\example\ - -\begin{lstlisting} -$\,\,\,$ sumOfSquares(3, 2+2) -$\rightarrow$ sumOfSquares(3, 4) -$\rightarrow$ square(3) + square(4) -$\rightarrow$ 3 * 3 + square(4) -$\rightarrow$ 9 + square(4) -$\rightarrow$ 9 + 4 * 4 -$\rightarrow$ 9 + 16 -$\rightarrow$ 25 -\end{lstlisting} - -The example shows that the interpreter reduces function arguments to -values before rewriting the function application. One could instead -have chosen to apply the function to unreduced arguments. This would -have yielded the following reduction sequence: -\begin{lstlisting} -$\,\,\,$ sumOfSquares(3, 2+2) -$\rightarrow$ square(3) + square(2+2) -$\rightarrow$ 3 * 3 + square(2+2) -$\rightarrow$ 9 + square(2+2) -$\rightarrow$ 9 + (2+2) * (2+2) -$\rightarrow$ 9 + 4 * (2+2) -$\rightarrow$ 9 + 4 * 4 -$\rightarrow$ 9 + 16 -$\rightarrow$ 25 -\end{lstlisting} - -The second evaluation order is known as \emph{call-by-name}, -whereas the first one is known as \emph{call-by-value}. For -expressions that use only pure functions and that therefore can be -reduced with the substitution model, both schemes yield the same final -values. - -Call-by-value has the advantage that it avoids repeated evaluation of -arguments. Call-by-name has the advantage that it avoids evaluation of -arguments when the parameter is not used at all by the function. -Call-by-value is usually more efficient than call-by-name, but a -call-by-value evaluation might loop where a call-by-name evaluation -would terminate. Consider: -\begin{lstlisting} -> def loop: int = loop -def loop: scala.Int - -> def first(x: int, y: int) = x -def first(x: scala.Int, y: scala.Int): scala.Int -\end{lstlisting} -Then \code{first(1, loop)} reduces with call-by-name to \code{1}, -whereas the same term reduces with call-by-value repeatedly to itself, -hence evaluation does not terminate. -\begin{lstlisting} -$\,\,\,$ first(1, loop) -$\rightarrow$ first(1, loop) -$\rightarrow$ first(1, loop) -$\rightarrow$ ... 
-\end{lstlisting} -Scala uses call-by-value by default, but it switches to call-by-name evaluation -if the parameter is preceded by \code{def}. - -\example\ - -\begin{lstlisting} -> def constOne(x: int, def y: int) = 1 -constOne(x: scala.Int, def y: scala.Int): scala.Int - -> constOne(1, loop) -1: scala.Int - -> constOne(loop, 2) // gives an infinite loop. -^C -\end{lstlisting} - -\section{Conditional Expressions} - -Scala's \code{if-else} lets one choose between two alternatives. Its -syntax is like Java's \code{if-else}. But where Java's \code{if-else} -can be used only as an alternative of statements, Scala allows the -same syntax to choose between two expressions. That's why Scala's -\code{if-else} serves also as a substitute for Java's conditional -expression \code{ ... ? ... : ...}. - -\example\ - -\begin{lstlisting} -> def abs(x: double) = if (x >= 0) x else -x -abs(x: double): double -\end{lstlisting} -Scala's boolean expressions are similar to Java's; they are formed -from the constants -\code{true} and -\code{false}, comparison operators, boolean negation \code{!} and the -boolean operators $\,$\code{&&}$\,$ and $\,$\code{||}. - -\section{\label{sec:sqrt}Example: Square Roots by Newton's Method} - -We now illustrate the language elements introduced so far in the -construction of a more interesting program. The task is to write a -function -\begin{lstlisting} -def sqrt(x: double): double = ... -\end{lstlisting} -which computes the square root of \code{x}. - -A common way to compute square roots is by Newton's method of -successive approximations. One starts with an initial guess \code{y} -(say: \code{y = 1}). One then repeatedly improves the current guess -\code{y} by taking the average of \code{y} and \code{x/y}. As an -example, the next three columns indicate the guess \code{y}, the -quotient \code{x/y}, and their average for the first approximations of -$\sqrt 2$. 
-\begin{lstlisting} -1 2/1 = 2 1.5 -1.5 2/1.5 = 1.3333 1.4167 -1.4167 2/1.4167 = 1.4118 1.4142 -1.4142 ... ... - -$y$ $x/y$ $(y + x/y)/2$ -\end{lstlisting} -One can implement this algorithm in Scala by a set of small functions, -each of which represents one of the elements of the algorithm. - -We first define a function for iterating from a guess to the result: -\begin{lstlisting} -def sqrtIter(guess: double, x: double): double = - if (isGoodEnough(guess, x)) guess - else sqrtIter(improve(guess, x), x); -\end{lstlisting} -Note that \code{sqrtIter} calls itself recursively. Loops in -imperative programs can always be modelled by recursion in functional -programs. - -Note also that the definition of \code{sqrtIter} contains a return -type, which follows the parameter section. Such return types are -mandatory for recursive functions. For a non-recursive function, the -return type is optional; if it is missing the type checker will -compute it from the type of the function's right-hand side. However, -even for non-recursive functions it is often a good idea to include a -return type for better documentation. - -As a second step, we define the two functions called by -\code{sqrtIter}: a function to \code{improve} the guess and a -termination test \code{isGoodEnough}. Here are their definitions. -\begin{lstlisting} -def improve(guess: double, x: double) = - (guess + x / guess) / 2; - -def isGoodEnough(guess: double, x: double) = - abs(square(guess) - x) < 0.001; -\end{lstlisting} - -Finally, the \code{sqrt} function itself is defined by an application -of \code{sqrtIter}. -\begin{lstlisting} -def sqrt(x: double) = sqrtIter(1.0, x); -\end{lstlisting} - -\begin{exercise} The \code{isGoodEnough} test is not very precise for small -numbers and might lead to non-termination for very large ones (why?). -Design a different version of \code{isGoodEnough} which does not have -these problems. -\end{exercise} - -\begin{exercise} Trace the execution of the \code{sqrt(4)} expression.
-\end{exercise} - -\section{Nested Functions} - -The functional programming style encourages the construction of many -small helper functions. In the last example, the implementation -of \code{sqrt} made use of the helper functions \code{sqrtIter}, -\code{improve} and \code{isGoodEnough}. The names of these functions -are relevant only for the implementation of \code{sqrt}. We normally -do not want users of \code{sqrt} to access these functions directly. - -We can enforce this (and avoid name-space pollution) by including -the helper functions within the calling function itself: -\begin{lstlisting} -def sqrt(x: double) = { - def sqrtIter(guess: double, x: double): double = - if (isGoodEnough(guess, x)) guess - else sqrtIter(improve(guess, x), x); - def improve(guess: double, x: double) = - (guess + x / guess) / 2; - def isGoodEnough(guess: double, x: double) = - abs(square(guess) - x) < 0.001; - sqrtIter(1.0, x) -} -\end{lstlisting} -In this program, the braces \code{\{ ... \}} enclose a {\em block}. -Blocks in Scala are themselves expressions. Every block ends in a -result expression which defines its value. The result expression may -be preceded by auxiliary definitions, which are visible only in the -block itself. - -Every definition in a block must be followed by a semicolon, which -separates this definition from subsequent definitions or the result -expression. However, a semicolon is inserted implicitly if the -definition ends in a right brace and is followed by a new line. -Therefore, the following are all legal: -\begin{lstlisting} -def f(x) = x + 1; /* `;' mandatory */ -f(1) + f(2) - -def g(x) = {x + 1} -g(1) + g(2) - -def h(x) = {x + 1}; /* `;' mandatory */ h(1) + h(2) -\end{lstlisting} -Scala uses the usual block-structured scoping rules. A name defined in -some outer block is visible also in some inner block, provided it is -not redefined there. This rule permits us to simplify our -\code{sqrt} example. 
We need not pass \code{x} around as an additional parameter of -the nested functions, since it is always visible in them as a -parameter of the outer function \code{sqrt}. Here is the simplified code: -\begin{lstlisting} -def sqrt(x: double) = { - def sqrtIter(guess: double): double = - if (isGoodEnough(guess)) guess - else sqrtIter(improve(guess)); - def improve(guess: double) = - (guess + x / guess) / 2; - def isGoodEnough(guess: double) = - abs(square(guess) - x) < 0.001; - sqrtIter(1.0) -} -\end{lstlisting} - -\section{Tail Recursion} - -Consider the following function to compute the greatest common divisor -of two given numbers. - -\begin{lstlisting} -def gcd(a: int, b: int): int = if (b == 0) a else gcd(b, a % b) -\end{lstlisting} - -Using our substitution model of function evaluation, -\code{gcd(14, 21)} evaluates as follows: - -\begin{lstlisting} -$\,\,$ gcd(14, 21) -$\rightarrow\!$ if (21 == 0) 14 else gcd(21, 14 % 21) -$\rightarrow\!$ if (false) 14 else gcd(21, 14 % 21) -$\rightarrow\!$ gcd(21, 14 % 21) -$\rightarrow\!$ gcd(21, 14) -$\rightarrow\!$ if (14 == 0) 21 else gcd(14, 21 % 14) -$\rightarrow$ $\rightarrow$ gcd(14, 21 % 14) -$\rightarrow\!$ gcd(14, 7) -$\rightarrow\!$ if (7 == 0) 14 else gcd(7, 14 % 7) -$\rightarrow$ $\rightarrow$ gcd(7, 14 % 7) -$\rightarrow\!$ gcd(7, 0) -$\rightarrow\!$ if (0 == 0) 7 else gcd(0, 7 % 0) -$\rightarrow$ $\rightarrow$ 7 -\end{lstlisting} - -Contrast this with the evaluation of another recursive function, -\code{factorial}: - -\begin{lstlisting} -def factorial(n: int): int = if (n == 0) 1 else n * factorial(n - 1) -\end{lstlisting} - -The application \code{factorial(5)} rewrites as follows: -\begin{lstlisting} -$\,\,\,$ factorial(5) -$\rightarrow$ if (5 == 0) 1 else 5 * factorial(5 - 1) -$\rightarrow$ 5 * factorial(5 - 1) -$\rightarrow$ 5 * factorial(4) -$\rightarrow\ldots\rightarrow$ 5 * (4 * factorial(3)) -$\rightarrow\ldots\rightarrow$ 5 * (4 * (3 * factorial(2))) -$\rightarrow\ldots\rightarrow$ 5 * (4 * (3 * (2 * 
factorial(1)))) -$\rightarrow\ldots\rightarrow$ 5 * (4 * (3 * (2 * (1 * factorial(0))))) -$\rightarrow\ldots\rightarrow$ 5 * (4 * (3 * (2 * (1 * 1)))) -$\rightarrow\ldots\rightarrow$ 120 -\end{lstlisting} -There is an important difference between the two rewrite sequences: -The terms in the rewrite sequence of \code{gcd} repeatedly have -the same form. As evaluation proceeds, their size is bounded by a -constant. By contrast, in the evaluation of \code{factorial} we get longer -and longer chains of operands which are then multiplied in the last -part of the evaluation sequence. - -Even though actual implementations of Scala do not work by rewriting -terms, they nevertheless should have the same space behavior as in the -rewrite sequences. In the implementation of \code{gcd}, one notes that -the recursive call to \code{gcd} is the last action performed in the -evaluation of its body. One also says that \code{gcd} is -``tail-recursive''. The final call in a tail-recursive function can be -implemented by a jump back to the beginning of that function. The -arguments of that call can overwrite the parameters of the current -instantiation of \code{gcd}, so that no new stack space is needed. -Hence, tail-recursive functions are iterative processes, which can be -executed in constant space. - -By contrast, the recursive call in \code{factorial} is followed by a -multiplication. Hence, a new stack frame is allocated for the -recursive instance of \code{factorial}, and is deallocated after that -instance has finished. The given formulation of the factorial function -is not tail-recursive; it needs space proportional to its input -parameter for its execution. - -More generally, if the last action of a function is a call to another -(possibly the same) function, only a single stack frame is needed for -both functions. Such calls are called ``tail calls''. In principle, -tail calls can always re-use the stack frame of the calling function.
-However, some run-time environments (such as the Java VM) lack the -primitives to make stack frame re-use for tail calls efficient. A -production quality Scala implementation is therefore only required to -re-use the stack frame of a directly tail-recursive function whose -last action is a call to itself. Other tail calls might be optimized -also, but one should not rely on this across implementations. - -\begin{exercise} Design a tail-recursive version of -\code{factorial}. -\end{exercise} - -\chapter{\label{chap:first-class-funs}First-Class Functions} - -A function in Scala is a ``first-class value''. Like any other value, -it may be passed as a parameter or returned as a result. Functions -which take other functions as parameters or return them as results are -called {\em higher-order} functions. This chapter introduces -higher-order functions and shows how they provide a flexible mechanism -for program composition. - -As a motivating example, consider the following three related tasks: -\begin{enumerate} -\item -Write a function to sum all integers between two given numbers \code{a} and \code{b}: -\begin{lstlisting} -def sumInts(a: int, b: int): double = - if (a > b) 0 else a + sumInts(a + 1, b) -\end{lstlisting} -\item -Write a function to sum the cubes of all integers between two given numbers -\code{a} and \code{b}: -\begin{lstlisting} -def cube(x: int): double = x * x * x -def sumCubes(a: int, b: int): double = - if (a > b) 0 else cube(a) + sumCubes(a + 1, b) -\end{lstlisting} -\item -Write a function to sum the reciprocals of all integers between two given numbers -\code{a} and \code{b}: -\begin{lstlisting} -def sumReciprocals(a: int, b: int): double = - if (a > b) 0 else 1.0 / a + sumReciprocals(a + 1, b) -\end{lstlisting} -\end{enumerate} -These functions are all instances of -\(\sum^b_{n=a} f(n)\) for different values of $f$.
-We can factor out the common pattern by defining a function \code{sum}: -\begin{lstlisting} -def sum(f: int => double, a: int, b: int): double = - if (a > b) 0 else f(a) + sum(f, a + 1, b) -\end{lstlisting} -The type \code{int => double} is the type of functions that -take arguments of type \code{int} and return results of type -\code{double}. So \code{sum} is a function which takes another function as -a parameter. In other words, \code{sum} is a {\em higher-order} -function. - -Using \code{sum}, we can formulate the three summing functions as -follows. -\begin{lstlisting} -def sumInts(a: int, b: int): double = sum(id, a, b); -def sumCubes(a: int, b: int): double = sum(cube, a, b); -def sumReciprocals(a: int, b: int): double = sum(reciprocal, a, b); -\end{lstlisting} -where -\begin{lstlisting} -def id(x: int): double = x; -def cube(x: int): double = x * x * x; -def reciprocal(x: int): double = 1.0/x; -\end{lstlisting} - -\section{Anonymous Functions} - -Parameterization by functions tends to create many small functions. In -the previous example, we defined \code{id}, \code{cube} and -\code{reciprocal} as separate functions, so that they could be -passed as arguments to \code{sum}. - -Instead of using named function definitions for these small argument -functions, we can formulate them in a shorter way as {\em anonymous -functions}. An anonymous function is an expression that evaluates to a -function; the function is defined without giving it a name. As an -example consider the anonymous reciprocal function: -\begin{lstlisting} - x: int => 1.0/x -\end{lstlisting} -The part before the arrow `\code{=>}' is the parameter of the function, -whereas the part following the `\code{=>}' is its body. If there are -several parameters, we need to enclose them in parentheses. For -instance, here is an anonymous function which multiplies its two arguments.
-\begin{lstlisting} - (x: double, y: double) => x * y -\end{lstlisting} -Using anonymous functions, we can reformulate the three summation -functions without named auxiliary functions: -\begin{lstlisting} -def sumInts(a: int, b: int): double = sum(x: int => x, a, b); -def sumCubes(a: int, b: int): double = sum(x: int => x * x * x, a, b); -def sumReciprocals(a: int, b: int): double = sum(x: int => 1.0/x, a, b); -\end{lstlisting} -Often, the Scala compiler can deduce the parameter type(s) from the -context of the anonymous function, in which case they can be omitted. -For instance, in the case of \code{sumInts}, \code{sumCubes} and -\code{sumReciprocals}, one knows from the type of -\code{sum} that the first parameter must be a function of type -\code{int => double}. Hence, the parameter type \code{int} is -redundant and may be omitted: -\begin{lstlisting} -def sumInts(a: int, b: int): double = sum(x => x, a, b); -def sumCubes(a: int, b: int): double = sum(x => x * x * x, a, b); -def sumReciprocals(a: int, b: int): double = sum(x => 1.0/x, a, b); -\end{lstlisting} - -Generally, the Scala term -\code{(x}$_1$\code{: T}$_1$\code{, ..., x}$_n$\code{: T}$_n$\code{) => E} -defines a function which maps its parameters -\code{x}$_1$\code{, ..., x}$_n$ to the result of the expression \code{E} -(where \code{E} may refer to \code{x}$_1$\code{, ..., x}$_n$). Anonymous -functions are not essential language elements of Scala, as they can -always be expressed in terms of named functions. Indeed, the -anonymous function -\begin{lstlisting} -(x$_1$: T$_1$, ..., x$_n$: T$_n$) => E -\end{lstlisting} -is equivalent to the block -\begin{lstlisting} -{ def f (x$_1$: T$_1$, ..., x$_n$: T$_n$) = E ; f } -\end{lstlisting} -where \code{f} is a fresh name which is used nowhere else in the program. -We also say that anonymous functions are ``syntactic sugar''. - -\section{Currying} - -The latest formulation of the three summing functions is already quite -compact. But we can do even better.
Note that -\code{a} and \code{b} appear as parameters and arguments of every function -but they do not seem to take part in interesting combinations. Is -there a way to get rid of them? - -Let's try to rewrite \code{sum} so that it does not take the bounds -\code{a} and \code{b} as parameters: -\begin{lstlisting} -def sum(f: int => double) = { - def sumF(a: int, b: int): double = - if (a > b) 0 else f(a) + sumF(a + 1, b); - sumF -} -\end{lstlisting} -In this formulation, \code{sum} is a function which returns another -function, namely the specialized summing function \code{sumF}. This -latter function does all the work; it takes the bounds \code{a} and -\code{b} as parameters, applies \code{sum}'s function parameter \code{f} to all -integers between them, and sums up the results. - -Using this new formulation of \code{sum}, we can now define: -\begin{lstlisting} -def sumInts = sum(x => x); -def sumCubes = sum(x => x * x * x); -def sumReciprocals = sum(x => 1.0/x); -\end{lstlisting} -Or, equivalently, with value definitions: -\begin{lstlisting} -val sumInts = sum(x => x); -val sumCubes = sum(x => x * x * x); -val sumReciprocals = sum(x => 1.0/x); -\end{lstlisting} -These functions can be applied like other functions. For instance, -\begin{lstlisting} -> sumCubes(1, 10) + sumReciprocals(10, 20) -3025.7687714031754: scala.Double -\end{lstlisting} -How are function-returning functions applied? As an example, in the expression -\begin{lstlisting} -sum(x => x * x * x)(1, 10) , -\end{lstlisting} -the function \code{sum} is applied to the cubing function -\code{(x => x * x * x)}. The resulting function is then -applied to the second argument list, \code{(1, 10)}. - -This notation is possible because function application associates to the left. 
-That is, if $\mbox{args}_1$ and $\mbox{args}_2$ are argument lists, then -\bda{lcl} -f(\mbox{args}_1)(\mbox{args}_2) & \ \ \mbox{is equivalent to}\ \ & (f(\mbox{args}_1))(\mbox{args}_2) -\eda -In our example, \code{sum(x => x * x * x)(1, 10)} is equivalent to the -following expression: -\code{(sum(x => x * x * x))(1, 10)}. - -The style of function-returning functions is so useful that Scala has -special syntax for it. For instance, the next definition of \code{sum} -is equivalent to the previous one, but is shorter: -\begin{lstlisting} -def sum(f: int => double)(a: int, b: int): double = - if (a > b) 0 else f(a) + sum(f)(a + 1, b) -\end{lstlisting} -Generally, a curried function definition -\begin{lstlisting} -def f (args$_1$) ... (args$_n$) = E -\end{lstlisting} -where $n > 1$ expands to -\begin{lstlisting} -def f (args$_1$) ... (args$_{n-1}$) = { def g (args$_n$) = E ; g } -\end{lstlisting} -where \code{g} is a fresh identifier. Or, shorter, using an anonymous function: -\begin{lstlisting} -def f (args$_1$) ... (args$_{n-1}$) = ( args$_n$ ) => E . -\end{lstlisting} -Performing this step $n$ times yields that -\begin{lstlisting} -def f (args$_1$) ... (args$_n$) = E -\end{lstlisting} -is equivalent to -\begin{lstlisting} -def f = (args$_1$) => ... => (args$_n$) => E . -\end{lstlisting} -Or, equivalently, using a value definition: -\begin{lstlisting} -val f = (args$_1$) => ... => (args$_n$) => E . -\end{lstlisting} -This style of function definition and application is called {\em -currying} after its promoter, Haskell B.\ Curry, a logician of the -20th century, even though the idea goes back further to Moses -Sch\"onfinkel and Gottlob Frege. - -The type of a function-returning function is expressed analogously to -its parameter list. Taking the last formulation of \code{sum} as an example, -the type of \code{sum} is \code{(int => double) => (int, int) => double}. -This is possible because function types associate to the right. I.e. 
-\begin{lstlisting} -T$_1$ => T$_2$ => T$_3$ $\mbox{is equivalent to}$ T$_1$ => (T$_2$ => T$_3$) -\end{lstlisting} - - -\begin{exercise} -The \code{sum} function uses a linear recursion. Can you write a -tail-recursive one by filling in the ??'s? - -\begin{lstlisting} -def sum(f: int => double)(a: int, b: int): double = { - def iter(a, result) = { - if (??) ?? - else iter(??, ??) - } - iter(??, ??) -} -\end{lstlisting} -\end{exercise} - -\begin{exercise} -Write a function \code{product} that computes the product of the -values of functions at points over a given range. -\end{exercise} - -\begin{exercise} -Write \code{factorial} in terms of \code{product}. -\end{exercise} - -\begin{exercise} -Can you write an even more general function which generalizes both -\code{sum} and \code{product}? -\end{exercise} - -\section{Example: Finding Fixed Points of Functions} - -A number \code{x} is called a {\em fixed point} of a function \code{f} if -\begin{lstlisting} -f(x) = x . -\end{lstlisting} -For some functions \code{f} we can locate the fixed point by beginning -with an initial guess and then applying \code{f} repeatedly, until the -value does not change anymore (or the change is within a small -tolerance). This is possible if the sequence -\begin{lstlisting} -x, f(x), f(f(x)), f(f(f(x))), ... -\end{lstlisting} -converges to a fixed point of $f$. This idea is captured in -the following ``fixed-point finding function'': -\begin{lstlisting} -val tolerance = 0.0001; -def isCloseEnough(x: double, y: double) = abs((x - y) / x) < tolerance; -def fixedPoint(f: double => double)(firstGuess: double) = { - def iterate(guess: double): double = { - val next = f(guess); - if (isCloseEnough(guess, next)) next - else iterate(next) - } - iterate(firstGuess) -} -\end{lstlisting} -We now apply this idea in a reformulation of the square root function.
-Let's start with a specification of \code{sqrt}: -\begin{lstlisting} -sqrt(x) = $\mbox{the {\sl y} such that}$ y * y = x - = $\mbox{the {\sl y} such that}$ y = x / y -\end{lstlisting} -Hence, \code{sqrt(x)} is a fixed point of the function \code{y => x / y}. -This suggests that \code{sqrt(x)} can be computed by fixed point iteration: -\begin{lstlisting} -def sqrt(x: double) = fixedPoint(y => x / y)(1.0) -\end{lstlisting} -Unfortunately, this does not converge. Let's instrument the fixed point -function with a print statement which keeps track of the current -\code{guess} value: -\begin{lstlisting} -def fixedPoint(f: double => double)(firstGuess: double) = { - def iterate(guess: double): double = { - val next = f(guess); - System.out.println(next); - if (isCloseEnough(guess, next)) next - else iterate(next) - } - iterate(firstGuess) -} -\end{lstlisting} -Then, \code{sqrt(2)} yields: -\begin{lstlisting} - 2.0 - 1.0 - 2.0 - 1.0 - 2.0 - ... -\end{lstlisting} -One way to control such oscillations is to prevent the guess from changing too much. -This can be achieved by {\em averaging} successive values of the original sequence: -\begin{lstlisting} -> def sqrt(x: double) = fixedPoint(y => (y + x/y) / 2)(1.0) -def sqrt(x: scala.Double): scala.Double -> sqrt(2.0) - 1.5 - 1.4166666666666665 - 1.4142156862745097 - 1.4142135623746899 - 1.4142135623746899 -\end{lstlisting} -In fact, expanding the \code{fixedPoint} function yields exactly our -previous definition of \code{sqrt} from Section~\ref{sec:sqrt}. - -The previous examples showed that the expressive power of a language -is considerably enhanced if functions can be passed as arguments. The -next example shows that functions which return functions can also be -very useful. - -Consider again fixed point iterations. We started with the observation -that \code{sqrt(x)} is a fixed point of the function \code{y => x / y}. -Then we made the iteration converge by averaging successive values.
-This technique of {\em average damping} is so general that it -can be wrapped in another function. -\begin{lstlisting} -def averageDamp(f: double => double)(x: double) = (x + f(x)) / 2 -\end{lstlisting} -Using \code{averageDamp}, we can reformulate the square root function -as follows. -\begin{lstlisting} -def sqrt(x: double) = fixedPoint(averageDamp(y => x/y))(1.0) -\end{lstlisting} -This expresses the elements of the algorithm as clearly as possible. - -\begin{exercise} Write a function for cube roots using \code{fixedPoint} and -\code{averageDamp}. -\end{exercise} - -\section{Summary} - -We have seen in the previous chapter that functions are essential -abstractions, because they permit us to introduce general methods of -computing as explicit, named elements in our programming language. -The present chapter has shown that these abstractions can be combined -by higher-order functions to create further abstractions. As -programmers, we should look out for opportunities to abstract and to -reuse. The highest possible level of abstraction is not always the -best, but it is important to know abstraction techniques, so that one -can use abstractions where appropriate. - -\section{Language Elements Seen So Far} - -Chapters~\ref{chap:simple-funs} and \ref{chap:first-class-funs} have -covered Scala's language elements for writing expressions and types -built from primitive data and functions. The context-free syntax -of these language elements is given below in extended Backus-Naur -form, where `\code{|}' denotes alternatives, \code{[...]} denotes -option (0 or 1 occurrence), and \lstinline@{...}@ denotes repetition -(0 or more occurrences). - -\subsection*{Characters} - -Scala programs are sequences of (Unicode) characters.
We distinguish the -following character sets: -\begin{itemize} -\item -whitespace, such as `\code{ }', tabulator, or newline characters, -\item -letters `\code{a}' to `\code{z}', `\code{A}' to `\code{Z}', -\item -digits `\code{0}' to `\code{9}', -\item -the delimiter characters - -\begin{lstlisting} -. , ; ( ) { } [ ] \ $\mbox{\tt "}$ ' -\end{lstlisting} - -\item -operator characters, such as `\code{#}' `\code{+}', -`\code{:}'. Essentially, these are printable characters which are -in none of the character sets above. -\end{itemize} - -\subsection*{Lexemes:} - -\begin{lstlisting} -ident = letter {letter | digit} - | operator { operator } - | ident '_' ident -literal = $\mbox{``as in Java''}$ -\end{lstlisting} - -Literals are as in Java. They define numbers, characters, strings, or -boolean values. Examples of literals are \code{0}, \code{1.0d10}, \code{'x'}, -\code{"he said \"hi!\""}, or \code{true}. - -Identifiers can be of two forms. They either start with a letter, -which is followed by a (possibly empty) sequence of letters or -symbols, or they start with an operator character, which is followed -by a (possibly empty) sequence of operator characters. Both forms of -identifiers may contain underscore characters `\code{_}'. Furthermore, -an underscore character may be followed by either sort of -identifier. Hence, the following are all legal identifiers: -\begin{lstlisting} -x Room10a + -- foldl_: +_vector -\end{lstlisting} -It follows from this rule that subsequent operator-identifiers need to -be separated by whitespace. For instance, the input -\code{x+-y} is parsed as the three token sequence \code{x}, \code{+-}, -\code{y}. If we want to express the sum of \code{x} with the -negated value of \code{y}, we need to add at least one space, -e.g. \code{x+ -y}. - -The \verb@$@ character is reserved for compiler-generated -identifiers; it should not be used in source programs.
%$ - -The following are reserved words; they may not be used as identifiers: -\begin{lstlisting}[keywordstyle=] -abstract case catch class def -do else extends false final -finally for if import new -null object override package private -protected return sealed super this -trait try true type val -var while with yield -_ : = => <- <: >: # @ -\end{lstlisting} - -\subsection*{Types:} - -\begin{lstlisting} -Type = SimpleType | FunctionType -FunctionType = SimpleType '=>' Type | '(' [Types] ')' '=>' Type -SimpleType = byte | short | char | int | long | double | float | - boolean | unit | String -Types = Type {',' Type} -\end{lstlisting} - -Types can be: -\begin{itemize} -\item number types \code{byte}, \code{short}, \code{char}, \code{int}, \code{long}, \code{float} and \code{double} (these are as in Java), -\item the type \code{boolean} with values \code{true} and \code{false}, -\item the type \code{unit} with the only value \code{()}, -\item the type \code{String}, -\item function types such as \code{(int, int) => int} or \code{String => int => String}. -\end{itemize} - -\subsection*{Expressions:} - -\begin{lstlisting} -Expr = InfixExpr | FunctionExpr | if '(' Expr ')' Expr else Expr -InfixExpr = PrefixExpr | InfixExpr Operator InfixExpr -Operator = ident -PrefixExpr = ['+' | '-' | '!' | '~' ] SimpleExpr -SimpleExpr = ident | literal | SimpleExpr '.' 
ident | Block -FunctionExpr = Bindings '=>' Expr -Bindings = ident [':' SimpleType] | '(' [Binding {',' Binding}] ')' -Binding = ident [':' Type] -Block = '{' {Def ';'} Expr '}' -\end{lstlisting} - -Expressions can be: -\begin{itemize} -\item -identifiers such as \code{x}, \code{isGoodEnough}, \code{*}, or \code{+-}, -\item -literals, such as \code{0}, \code{1.0}, or \code{"abc"}, -\item -field and method selections, such as \code{System.out.println}, -\item -function applications, such as \code{sqrt(x)}, -\item -operator applications, such as \code{-x} or \code{y + x}, -\item -conditionals, such as \code{if (x < 0) -x else x}, -\item -blocks, such as \lstinline@{ val x = abs(y) ; x * 2 }@, -\item -anonymous functions, such as \code{x => x + 1} or \code{(x: int, y: int) => x + y}. -\end{itemize} - -\subsection*{Definitions:} - -\begin{lstlisting} -Def = FunDef | ValDef -FunDef = 'def' ident {'(' [Parameters] ')'} [':' Type] '=' Expr -ValDef = 'val' ident [':' Type] '=' Expr -Parameters = Parameter {',' Parameter} -Parameter = ['def'] ident ':' Type -\end{lstlisting} -Definitions can be: -\begin{itemize} -\item -function definitions such as \code{def square(x: int): int = x * x}, -\item -value definitions such as \code{val y = square(2)}. -\end{itemize} - -\chapter{Classes and Objects} -\label{chap:classes} - -Scala does not have a built-in type of rational numbers, but it is -easy to define one, using a class. Here's a possible implementation. 
- -\begin{lstlisting} -class Rational(n: int, d: int) { - private def gcd(x: int, y: int): int = { - if (x == 0) y - else if (x < 0) gcd(-x, y) - else if (y < 0) -gcd(x, -y) - else gcd(y % x, x); - } - private val g = gcd(n, d); - - val numer: int = n/g; - val denom: int = d/g; - def +(that: Rational) = - new Rational(numer * that.denom + that.numer * denom, - denom * that.denom); - def -(that: Rational) = - new Rational(numer * that.denom - that.numer * denom, - denom * that.denom); - def *(that: Rational) = - new Rational(numer * that.numer, denom * that.denom); - def /(that: Rational) = - new Rational(numer * that.denom, denom * that.numer); -} -\end{lstlisting} -This defines \code{Rational} as a class which takes two constructor -arguments \code{n} and \code{d}, containing the number's numerator and -denominator parts. The class provides fields which return these parts -as well as methods for arithmetic over rational numbers. Each -arithmetic method takes as parameter the right operand of the -operation. The left operand of the operation is always the rational -number of which the method is a member. - -\paragraph{Private members} -The implementation of rational numbers defines a private method -\code{gcd} which computes the greatest common divisor of two -integers, as well as a private field \code{g} which contains the -\code{gcd} of the constructor arguments. These members are inaccessible -outside class \code{Rational}. They are used in the implementation of -the class to eliminate common factors in the constructor arguments in -order to ensure that numerator and denominator are always in -normalized form. - -\paragraph{Creating and Accessing Objects} -As an example of how rational numbers can be used, here's a program -that prints the sum of all numbers $1/i$ where $i$ ranges from 1 to 10.
-\begin{lstlisting} -var i = 1; -var x = new Rational(0, 1); -while (i <= 10) { - x = x + new Rational(1,i); - i = i + 1; -} -System.out.println("" + x.numer + "/" + x.denom); -\end{lstlisting} -In the last line, the \code{+} operator takes as left operand a string and as right operand a -value of arbitrary type. It returns the result of converting its right -operand to a string and appending it to its left operand. - -\paragraph{Inheritance and Overriding} -Every class in Scala has a superclass which it extends. -\comment{Excepted is -only the root class \code{Object}, which does not have a superclass, -and which is indirectly extended by every other class. } -If a class -does not mention a superclass in its definition, the root type -\code{scala.AnyRef} is implicitly assumed (for Java implementations, -this type is an alias for \code{java.lang.Object}). For instance, class -\code{Rational} could equivalently be defined as -\begin{lstlisting} -class Rational(n: int, d: int) extends AnyRef { - ... // as before -} -\end{lstlisting} -A class inherits all members from its superclass. It may also redefine -(or: {\em override}) some inherited members. For instance, class -\code{java.lang.Object} defines -a method -\code{toString} which returns a representation of the object as a string: -\begin{lstlisting} -class Object { - ... - def toString(): String = ... -} -\end{lstlisting} -The implementation of \code{toString} in \code{Object} -forms a string consisting of the object's class name and a number. It -makes sense to redefine this method for objects that are rational -numbers: -\begin{lstlisting} -class Rational(n: int, d: int) extends AnyRef { - ... // as before - override def toString() = "" + numer + "/" + denom; -} -\end{lstlisting} -Note that, unlike in Java, a definition that redefines an inherited member must be preceded -by an \code{override} modifier. - -If class $A$ extends class $B$, then objects of type $A$ may be used -wherever objects of type $B$ are expected.
We say in this case that -type $A$ {\em conforms} to type $B$. For instance, \code{Rational} -conforms to \code{AnyRef}, so it is legal to assign a \code{Rational} -value to a variable of type \code{AnyRef}: -\begin{lstlisting} -var x: AnyRef = new Rational(1,2); -\end{lstlisting} - -\paragraph{Parameterless Methods} -%Also unlike in Java, methods in Scala do not necessarily take a -%parameter list. An example is \code{toString}; the method is invoked -%by simply mentioning its name. For instance: -%\begin{lstlisting} -%val r = new Rational(1,2); -%System.out.println(r.toString()); // prints``1/2'' -%\end{lstlisting} -Unlike in Java, methods in Scala do not necessarily take a -parameter list. An example is the \code{square} method below. This -method is invoked by simply mentioning its name. -\begin{lstlisting} -class Rational(n: int, d: int) extends AnyRef { - ... // as before - def square = new Rational(numer*numer, denom*denom); -} -val r = new Rational(3,4); -System.out.println(r.square); // prints ``9/16'' -\end{lstlisting} -That is, parameterless methods are accessed just as value fields such -as \code{numer} are. The difference between values and parameterless -methods lies in their definition. The right-hand side of a value is -evaluated when the object is created, and the value does not change -afterwards. The right-hand side of a parameterless method, on the other -hand, is evaluated each time the method is called. The uniform access -of fields and parameterless methods gives increased flexibility for -the implementer of a class. Often, a field in one version of a class -becomes a computed value in the next version. Uniform access ensures -that clients do not have to be rewritten because of that change. - -\paragraph{Abstract Classes} - -Consider the task of writing a class for sets of integer numbers with -two operations, \code{incl} and \code{contains}.
\code{(s incl x)} -should return a new set which contains the element \code{x} together -with all the elements of set \code{s}. \code{(s contains x)} should -return \code{true} if the set \code{s} contains the element \code{x}, and -should return \code{false} otherwise. The interface of such sets is -given by: -\begin{lstlisting} -abstract class IntSet { - def incl(x: int): IntSet; - def contains(x: int): boolean; -} -\end{lstlisting} -\code{IntSet} is labeled as an \emph{abstract class}. This has two -consequences. First, abstract classes may have {\em deferred} members -which are declared but which do not have an implementation. In our -case, both \code{incl} and \code{contains} are such members. Second, -because an abstract class might have unimplemented members, no objects -of that class may be created using \code{new}. By contrast, an -abstract class may be used as a base class of some other class, which -implements the deferred members. - -\paragraph{Traits} - -Instead of \code{abstract class} one also often uses the keyword -\code{trait} in Scala. A trait is an abstract class with no state, no -constructor arguments, and no side effects during object -initialization. Since \code{IntSet} falls into this category, one can -alternatively define it as a trait: -\begin{lstlisting} -trait IntSet { - def incl(x: int): IntSet; - def contains(x: int): boolean; -} -\end{lstlisting} -A trait corresponds to an interface in Java, except -that a trait can also define implemented methods. - -\paragraph{Implementing Abstract Classes} - -Let's say we plan to implement sets as binary trees. There are two -possible forms of trees: a tree for the empty set, and a tree -consisting of an integer and two subtrees. Here are their -implementations.
- -\begin{lstlisting} -class EmptySet extends IntSet { - def contains(x: int): boolean = false; - def incl(x: int): IntSet = new NonEmptySet(x, new EmptySet, new EmptySet); -} -\end{lstlisting} - -\begin{lstlisting} -class NonEmptySet(elem:int, left:IntSet, right:IntSet) extends IntSet { - def contains(x: int): boolean = - if (x < elem) left contains x - else if (x > elem) right contains x - else true; - def incl(x: int): IntSet = - if (x < elem) new NonEmptySet(elem, left incl x, right) - else if (x > elem) new NonEmptySet(elem, left, right incl x) - else this; -} -\end{lstlisting} -Both \code{EmptySet} and \code{NonEmptySet} extend class -\code{IntSet}. This implies that types \code{EmptySet} and -\code{NonEmptySet} conform to type \code{IntSet} -- a value of type \code{EmptySet} or \code{NonEmptySet} may be used wherever a value of type \code{IntSet} is required. - -\begin{exercise} Write methods \code{union} and \code{intersection} to form -the union and intersection between two sets. -\end{exercise} - -\begin{exercise} Add a method -\begin{lstlisting} -def excl(x: int) -\end{lstlisting} -to return the given set without the element \code{x}. To accomplish this, -it is useful to also implement a test method -\begin{lstlisting} -def isEmpty: boolean -\end{lstlisting} -for sets. -\end{exercise} - -\paragraph{Dynamic Binding} - -Object-oriented languages (Scala included) use \emph{dynamic dispatch} -for method invocations. That is, the code invoked for a method call -depends on the run-time type of the object which contains the method. -For example, consider the expression \code{s contains 7} where -\code{s} is a value of declared type \code{s: IntSet}. Which code for -\code{contains} is executed depends on the type of value of \code{s} at run-time. -If it is an \code{EmptySet} value, it is the implementation of \code{contains} in class \code{EmptySet} that is executed, and analogously for \code{NonEmptySet} values. 
-This behavior is a direct consequence of our substitution model of evaluation. -For instance, -\begin{lstlisting} - (new EmptySet).contains(7) - --> $\rewriteby{by replacing {\sl contains} by its body in class {\sl EmptySet}}$ - - false -\end{lstlisting} -Or, -\begin{lstlisting} - new NonEmptySet(7, new EmptySet, new EmptySet).contains(1) - --> $\rewriteby{by replacing {\sl contains} by its body in class {\sl NonEmptySet}}$ - - if (1 < 7) new EmptySet contains 1 - else if (1 > 7) new EmptySet contains 1 - else true - --> $\rewriteby{by rewriting the conditional}$ - - new EmptySet contains 1 - --> $\rewriteby{by replacing {\sl contains} by its body in class {\sl EmptySet}}$ - - false . -\end{lstlisting} - -Dynamic method dispatch is analogous to higher-order function -calls. In both cases, the identity of code to be executed is known -only at run-time. This similarity is not just superficial. Indeed, -Scala represents every function value as an object (see -Section~\ref{sec:functions}). - - -\paragraph{Objects} - -In the previous implementation of integer sets, empty sets were -expressed with \code{new EmptySet}; so a new object was created every time -an empty set value was required. We could have avoided unnecessary -object creations by defining a value \code{EmptySetVal} once and then using -this value instead of every occurrence of \code{new EmptySet}. For example: -\begin{lstlisting} -val EmptySetVal = new EmptySet; -\end{lstlisting} -One problem with this approach is that a value definition such as the -one above is not a legal top-level definition in Scala; it has to be -part of another class or object. Also, the definition of class -\code{EmptySet} now seems a bit of an overkill -- why define a class of objects, -if we are only interested in a single object of this class? A more -direct approach is to use an {\em object definition}.
Here is -a more streamlined alternative definition of the empty set: -\begin{lstlisting} -object EmptySet extends IntSet { - def contains(x: int): boolean = false; - def incl(x: int): IntSet = new NonEmptySet(x, EmptySet, EmptySet); -} -\end{lstlisting} -The syntax of an object definition follows the syntax of a class -definition; it has an optional extends clause as well as an optional -body. As is the case for classes, the extends clause defines inherited -members of the object whereas the body defines overriding or new -members. However, an object definition defines a single object only; -it is not possible to create other objects with the same structure -using \code{new}. Therefore, object definitions also lack constructor -parameters, which might be present in class definitions. - -Object definitions can appear anywhere in a Scala program, including -at top-level. Since there is no fixed execution order of top-level -entities in Scala, one might ask exactly when the object defined by an -object definition is created and initialized. The answer is that the -object is created the first time one of its members is accessed. This -strategy is called {\em lazy evaluation}. - -\paragraph{Standard Classes} - -\todo{include picture} - -Scala is a pure object-oriented language. This means that every value -in Scala can be regarded as an object. In fact, even primitive types -such as \code{int} or \code{boolean} are not treated specially. They -are defined as type aliases of Scala classes in module \code{Predef}: -\begin{lstlisting} -type boolean = scala.Boolean; -type int = scala.Int; -type long = scala.Long; -... -\end{lstlisting} -For efficiency, the compiler usually represents values of type -\code{scala.Int} by 32 bit integers, values of type -\code{scala.Boolean} by Java's booleans, etc.
But it converts these -specialized representations to objects when required, for instance -when a primitive \code{int} value is passed to a function with a -parameter of type \code{AnyRef}. Hence, the special representation of -primitive values is just an optimization; it does not change the -meaning of a program. - -Here is a specification of class \code{Boolean}. -\begin{lstlisting} -package scala; -trait Boolean { - def && (def x: Boolean): Boolean; - def || (def x: Boolean): Boolean; - def ! : Boolean; - - def == (x: Boolean) : Boolean; - def != (x: Boolean) : Boolean; - def < (x: Boolean) : Boolean; - def > (x: Boolean) : Boolean; - def <= (x: Boolean) : Boolean; - def >= (x: Boolean) : Boolean; -} -\end{lstlisting} -Booleans can be defined using only classes and objects, without -reference to a built-in type of booleans or numbers. A possible -implementation of class \code{Boolean} is given below. This is not -the actual implementation in the standard Scala library. For -efficiency reasons the standard implementation uses built-in -booleans. -\begin{lstlisting} -package scala; -trait Boolean { - def ifThenElse(def thenpart: Boolean, def elsepart: Boolean): Boolean; - - def && (def x: Boolean): Boolean = ifThenElse(x, false); - def || (def x: Boolean): Boolean = ifThenElse(true, x); - def ! : Boolean = ifThenElse(false, true); - - def == (x: Boolean) : Boolean = ifThenElse(x, x.!); - def != (x: Boolean) : Boolean = ifThenElse(x.!, x); - def < (x: Boolean) : Boolean = ifThenElse(false, x); - def > (x: Boolean) : Boolean = ifThenElse(x.!, false); - def <= (x: Boolean) : Boolean = ifThenElse(x, true); - def >= (x: Boolean) : Boolean = ifThenElse(true, x.!); -} -case object True extends Boolean { - def ifThenElse(def t: Boolean, def e: Boolean) = t -} -case object False extends Boolean { - def ifThenElse(def t: Boolean, def e: Boolean) = e -} -\end{lstlisting} -Here is a partial specification of class \code{Int}.
- -\begin{lstlisting} -package scala; -trait Int extends AnyVal { - def coerce: Long; - def coerce: Float; - def coerce: Double; - - def + (that: Double): Double; - def + (that: Float): Float; - def + (that: Long): Long; - def + (that: Int): Int; // analogous for -, *, /, % - - def << (cnt: Int): Int; // analogous for >>, >>> - - def & (that: Long): Long; - def & (that: Int): Int; // analogous for |, ^ - - def == (that: Double): Boolean; - def == (that: Float): Boolean; - def == (that: Long): Boolean; // analogous for !=, <, >, <=, >= -} -\end{lstlisting} - -Class \code{Int} can in principle also be implemented using just -objects and classes, without reference to a built in type of -integers. To see how, we consider a slightly simpler problem, namely -how to implement a type \code{Nat} of natural (i.e. non-negative) -numbers. Here is the definition of a trait \code{Nat}: -\begin{lstlisting} -trait Nat { - def isZero: Boolean; - def predecessor: Nat; - def successor: Nat; - def + (that: Nat): Nat; - def - (that: Nat): Nat; -} -\end{lstlisting} -To implement the operations of class \code{Nat}, we define a subobject -\code{Zero} and a subclass \code{Succ} (for successor). Each number -\code{N} is represented as \code{N} applications of the \code{Succ} -constructor to \code{Zero}: -\[ -\underbrace{\mbox{\sl new Succ( ... new Succ}}_{\mbox{$N$ times}}\mbox{\sl (Zero) ... )} -\] -The implementation of the \code{Zero} object is straightforward: -\begin{lstlisting} -object Zero extends Nat { - def isZero: Boolean = true; - def predecessor: Nat = throw new Error("negative number"); - def successor: Nat = new Succ(Zero); - def + (that: Nat): Nat = that; - def - (that: Nat): Nat = if (that.isZero) Zero - else throw new Error("negative number") -} -\end{lstlisting} - -The implementation of the predecessor and subtraction functions on -\code{Zero} throws an \code{Error} exception, which aborts the program -with the given error message. 
- -Here is the implementation of the successor class: -\begin{lstlisting} -class Succ(x: Nat) extends Nat { - def isZero: Boolean = false; - def predecessor: Nat = x; - def successor: Nat = new Succ(this); - def + (that: Nat): Nat = x + that.successor; - def - (that: Nat): Nat = if (that.isZero) this - else x - that.predecessor; -} -\end{lstlisting} -Note the implementation of method \code{successor}. To create the -successor of a number, we need to pass the object itself as an -argument to the \code{Succ} constructor. The object itself is -referenced by the reserved name \code{this}. - -The implementations of \code{+} and \code{-} each contain a recursive -call with the constructor argument as receiver. The recursion will -terminate once the receiver is the \code{Zero} object (which is -guaranteed to happen eventually because of the way numbers are formed). - -\begin{exercise} Write an implementation \code{Integer} of integer numbers. -The implementation should support all operations of class \code{Nat} -while adding two methods -\begin{lstlisting} -def isPositive: Boolean -def negate: Integer -\end{lstlisting} -The first method should return \code{true} if the number is positive. The second method should negate the number. -Do not use any of Scala's standard numeric classes in your -implementation. (Hint: There are two possible ways to implement -\code{Integer}. One can either make use of the existing implementation of -\code{Nat}, representing an integer as a natural number and a sign. -Or one can generalize the given implementation of \code{Nat} to -\code{Integer}, using the three subclasses \code{Zero} for 0, -\code{Succ} for positive numbers and \code{Pred} for negative numbers.) -\end{exercise} - - - -\subsection*{Language Elements Introduced In This Chapter} - -\textbf{Types:} -\begin{lstlisting} -Type = ... | ident -\end{lstlisting} - -Types can now be arbitrary identifiers which represent classes. - -\textbf{Expressions:} -\begin{lstlisting} -Expr = ... | Expr '.'
ident | 'new' Expr | 'this' -\end{lstlisting} - -An expression can now be an object creation, or -a selection \code{E.m} of a member \code{m} -from an object-valued expression \code{E}, or it can be the reserved name \code{this}. - -\textbf{Definitions and Declarations:} -\begin{lstlisting} -Def = FunDef | ValDef | ClassDef | TraitDef | ObjectDef -ClassDef = ['abstract'] 'class' ident ['(' [Parameters] ')'] - ['extends' Expr] [`{' {TemplateDef} `}'] -TraitDef = 'trait' ident ['extends' Expr] ['{' {TemplateDef} '}'] -ObjectDef = 'object' ident ['extends' Expr] ['{' {ObjectDef} '}'] -TemplateDef = [Modifier] (Def | Dcl) -ObjectDef = [Modifier] Def -Modifier = 'private' | 'override' -Dcl = FunDcl | ValDcl -FunDcl = 'def' ident {'(' [Parameters] ')'} ':' Type -ValDcl = 'val' ident ':' Type -\end{lstlisting} - -A definition can now be a class, trait or object definition such as -\begin{lstlisting} -class C(params) extends B { defs } -trait T extends B { defs } -object O extends B { defs } -\end{lstlisting} -The definitions \code{defs} in a class, trait or object may be -preceded by modifiers \code{private} or \code{override}. - -Abstract classes and traits may also contain declarations. These -introduce {\em deferred} functions or values with their types, but do -not give an implementation. Deferred members have to be implemented in -subclasses before objects of an abstract class or trait can be created. - -\chapter{Case Classes and Pattern Matching} - -Say, we want to write an interpreter for arithmetic expressions. To -keep things simple initially, we restrict ourselves to just numbers -and \code{+} operations. Such expressions can be represented as a class hierarchy, with an abstract base class \code{Expr} as the root, and two subclasses \code{Number} and -\code{Sum}. 
Then, an expression \code{1 + (3 + 7)} would be represented as -\begin{lstlisting} -new Sum(new Number(1), new Sum(new Number(3), new Number(7))) -\end{lstlisting} -Now, an evaluator of an expression like this needs to know of what -form it is (either \code{Sum} or \code{Number}) and also needs to -access the components of the expression. The following -implementation provides all necessary methods. -\begin{lstlisting} -trait Expr { - def isNumber: boolean; - def isSum: boolean; - def numValue: int; - def leftOp: Expr; - def rightOp: Expr; -} -class Number(n: int) extends Expr { - def isNumber: boolean = true; - def isSum: boolean = false; - def numValue: int = n; - def leftOp: Expr = throw new Error("Number.leftOp"); - def rightOp: Expr = throw new Error("Number.rightOp"); -} -class Sum(e1: Expr, e2: Expr) extends Expr { - def isNumber: boolean = false; - def isSum: boolean = true; - def numValue: int = throw new Error("Sum.numValue"); - def leftOp: Expr = e1; - def rightOp: Expr = e2; -} -\end{lstlisting} -With these classification and access methods, writing an evaluator function is simple: -\begin{lstlisting} -def eval(e: Expr): int = { - if (e.isNumber) e.numValue - else if (e.isSum) eval(e.leftOp) + eval(e.rightOp) - else throw new Error("unrecognized expression kind") -} -\end{lstlisting} -However, defining all these methods in classes \code{Sum} and -\code{Number} is rather tedious. Furthermore, the problem becomes worse -when we want to add new forms of expressions. For instance, consider -adding a new expression form -\code{Prod} for products. Not only do we have to implement a new class \code{Prod}, with all previous classification and access methods; we also have to introduce a -new abstract method \code{isProduct} in class \code{Expr} and -implement that method in subclasses \code{Number}, \code{Sum}, and -\code{Prod}. Having to modify existing code when a system grows is always problematic, since it introduces versioning and maintenance problems. 
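To make this cost concrete, the following sketch adds \code{Prod} to the classification-based design. It is only an illustration under stated assumptions: it is written in present-day Scala syntax (\code{Int}, \code{Boolean}, no semicolons) rather than the dialect used in this text, and the wrapper object \code{ClassificationDemo} is a hypothetical name introduced here. Lines marked \code{changed} are edits to code that already existed before \code{Prod} was introduced.

```scala
// Sketch only: present-day Scala syntax; ClassificationDemo is a hypothetical wrapper.
object ClassificationDemo {
  trait Expr {
    def isNumber: Boolean
    def isSum: Boolean
    def isProduct: Boolean                       // changed: new abstract method
    def numValue: Int
    def leftOp: Expr
    def rightOp: Expr
  }
  class Number(n: Int) extends Expr {
    def isNumber: Boolean = true
    def isSum: Boolean = false
    def isProduct: Boolean = false               // changed
    def numValue: Int = n
    def leftOp: Expr = throw new Error("Number.leftOp")
    def rightOp: Expr = throw new Error("Number.rightOp")
  }
  class Sum(e1: Expr, e2: Expr) extends Expr {
    def isNumber: Boolean = false
    def isSum: Boolean = true
    def isProduct: Boolean = false               // changed
    def numValue: Int = throw new Error("Sum.numValue")
    def leftOp: Expr = e1
    def rightOp: Expr = e2
  }
  // New class: repeats every classification and access method.
  class Prod(e1: Expr, e2: Expr) extends Expr {
    def isNumber: Boolean = false
    def isSum: Boolean = false
    def isProduct: Boolean = true
    def numValue: Int = throw new Error("Prod.numValue")
    def leftOp: Expr = e1
    def rightOp: Expr = e2
  }
  def eval(e: Expr): Int =
    if (e.isNumber) e.numValue
    else if (e.isSum) eval(e.leftOp) + eval(e.rightOp)
    else if (e.isProduct) eval(e.leftOp) * eval(e.rightOp)   // changed
    else throw new Error("unrecognized expression kind")
}
```

In this sketch one new expression form touches the base trait, both existing subclasses, and the evaluator, which is the maintenance burden described above.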
- -The promise of object-oriented programming is that such modifications -should be unnecessary, because they can be avoided by re-using -existing, unmodified code through inheritance. Indeed, a more -object-oriented decomposition of our problem solves the problem. The -idea is to make the ``high-level'' operation \code{eval} a method of -each expression class, instead of implementing it as a function -outside the expression class hierarchy, as we have done -before. Because \code{eval} is now a member of all expression nodes, -all classification and access methods become superfluous, and the implementation is simplified considerably: -\begin{lstlisting} -trait Expr { - def eval: int; -} -class Number(n: int) extends Expr { - def eval: int = n; -} -class Sum(e1: Expr, e2: Expr) extends Expr { - def eval: int = e1.eval + e2.eval; -} -\end{lstlisting} -Furthermore, adding a new \code{Prod} class does not entail any changes to existing code: -\begin{lstlisting} -class Prod(e1: Expr, e2: Expr) extends Expr { - def eval: int = e1.eval * e2.eval; -} -\end{lstlisting} - -The conclusion we can draw from this example is that object-oriented -decomposition is the technique of choice for constructing systems that -should be extensible with new types of data. But there is also another -possible way we might want to extend the expression example. We might -want to add new {\em operations} on expressions. For instance, we might -want to add an operation that pretty-prints an expression tree to standard output. - -If we have defined all classification and access methods, such an -operation can easily be written as an external function. 
Here is an -implementation: -\begin{lstlisting} -def print(e: Expr): unit = - if (e.isNumber) System.out.print(e.numValue) - else if (e.isSum) { - System.out.print("("); - print(e.leftOp); - System.out.print("+"); - print(e.rightOp); - System.out.print(")"); - } else throw new Error("unrecognized expression kind"); -\end{lstlisting} -However, if we had opted for an object-oriented decomposition of -expressions, we would need to add a new \code{print} method -to each class: -\begin{lstlisting} -trait Expr { - def eval: int; - def print: unit; -} -class Number(n: int) extends Expr { - def eval: int = n; - def print: unit = System.out.print(n); -} -class Sum(e1: Expr, e2: Expr) extends Expr { - def eval: int = e1.eval + e2.eval; - def print: unit = { - System.out.print("("); - e1.print; - System.out.print("+"); - e2.print; - System.out.print(")"); - } -} -\end{lstlisting} -Hence, classical object-oriented decomposition requires modification -of all existing classes when a system is extended with new operations. - -As yet another way we might want to extend the interpreter, consider -expression simplification. For instance, we might want to write a -function which rewrites expressions of the form -\code{a * b + a * c} to \code{a * (b + c)}. This operation requires inspection of -more than a single node of the expression tree at the same -time. Hence, it cannot be implemented by a method in each expression -kind, unless that method can also inspect other nodes. So we are -forced to have classification and access methods in this case. This -seems to bring us back to square one, with all the problems of -verbosity and extensibility. - -Taking a closer look, one observes that the only purpose of the -classification and access functions is to {\em reverse} the data -construction process. They let us determine, first, which sub-class -of an abstract base class was used and, second, what were the -constructor arguments.
Since this situation is quite common, Scala has -a way to automate it with case classes. - -\section{Case Classes and Case Objects} - -{\em Case classes} and {\em case objects} are defined like normal -classes or objects, except that the definition is prefixed with the modifier -\code{case}. For instance, the definitions -\begin{lstlisting} -trait Expr; -case class Number(n: int) extends Expr; -case class Sum(e1: Expr, e2: Expr) extends Expr; -\end{lstlisting} -introduce \code{Number} and \code{Sum} as case classes. -The \code{case} modifier in front of a class or object -definition has the following effects. -\begin{enumerate} -\item Case classes implicitly come with a constructor function, with the same name as the class. In our example, the two functions -\begin{lstlisting} -def Number(n: int) = new Number(n); -def Sum(e1: Expr, e2: Expr) = new Sum(e1, e2); -\end{lstlisting} -would be added. Hence, one can now construct expression trees a bit more concisely, as in -\begin{lstlisting} -Sum(Sum(Number(1), Number(2)), Number(3)) -\end{lstlisting} -\item Case classes and case objects -implicitly come with implementations of methods -\code{toString}, \code{equals} and \code{hashCode}, which override the -methods with the same name in class \code{AnyRef}. The implementation -of these methods takes in each case the structure of a member of a -case class into account. The \code{toString} method represents an -expression tree the way it was constructed. So, -\begin{lstlisting} -Sum(Sum(Number(1), Number(2)), Number(3)) -\end{lstlisting} -would be converted to exactly that string, whereas the default -implementation in class \code{AnyRef} would return a string consisting -of the outermost constructor name \code{Sum} and a number. The -\code{equals} method treats two members of a case class as equal -if they have been constructed with the same constructor and with -arguments which are themselves pairwise equal.
This also affects the -implementation of \code{==} and \code{!=}, which are implemented in -terms of \code{equals} in Scala. So, -\begin{lstlisting} -Sum(Number(1), Number(2)) == Sum(Number(1), Number(2)) -\end{lstlisting} -will yield \code{true}. If \code{Sum} or \code{Number} were not case -classes, the same expression would be \code{false}, since the standard -implementation of \code{equals} in class \code{AnyRef} always treats -objects created by different constructor calls as being different. -The \code{hashCode} method follows the same principle as the other two -methods. It computes a hash code from the case class constructor name -and the hash codes of the constructor arguments, instead of from the object's -address, which is what the default implementation of \code{hashCode} does. -\item -Case classes implicitly come with nullary accessor methods which -retrieve the constructor arguments. -In our example, \code{Number} would obtain an accessor method -\begin{lstlisting} -def n: int -\end{lstlisting} -which returns the constructor parameter \code{n}, whereas \code{Sum} would obtain two accessor methods -\begin{lstlisting} -def e1: Expr; -def e2: Expr; -\end{lstlisting} -Hence, for a value \code{s} of type \code{Sum}, say, one can now -write \code{s.e1} to access the left operand. However, for a value -\code{e} of type \code{Expr}, the term \code{e.e1} would be illegal -since \code{e1} is defined in \code{Sum}; it is not a member of the -base class \code{Expr}. -So, how do we determine the constructor and access constructor -arguments for values whose static type is the base class \code{Expr}? -This is solved by the fourth and final particularity of case classes. -\item -Case classes allow the construction of {\em patterns} which refer to -the case class constructor. -\end{enumerate} - -\section{Pattern Matching} - -Pattern matching is a generalization of C or Java's \code{switch} -statement to class hierarchies.
Instead of a \code{switch} statement, -there is a standard method \code{match}, which is defined in Scala's -root class \code{Any}, and therefore is available for all objects. -The \code{match} method takes as argument a number of cases. -For instance, here is an implementation of \code{eval} using -pattern matching. -\begin{lstlisting} -def eval(e: Expr): int = e match { - case Number(n) => n - case Sum(l, r) => eval(l) + eval(r) -} -\end{lstlisting} -In this example, there are two cases. Each case associates a pattern -with an expression. Patterns are matched against the selector -value \code{e}. The first pattern in our example, -\code{Number(n)}, matches all values of the form \code{Number(v)}, -where \code{v} is an arbitrary value. In that case, the {\em pattern -variable} \code{n} is bound to the value \code{v}. Similarly, the -pattern \code{Sum(l, r)} matches all selector values of form -\code{Sum(v}$_1$\code{, v}$_2$\code{)} and binds the pattern variables -\code{l} and \code{r} -to \code{v}$_1$ and \code{v}$_2$, respectively. - -In general, patterns are built from -\begin{itemize} -\item case class constructors, e.g. \code{Number}, \code{Sum}, whose arguments - are again patterns, -\item pattern variables, e.g. \code{n}, \code{e1}, \code{e2}, -\item the ``wildcard'' pattern \code{_}, -\item literals, e.g. \code{1}, \code{true}, \code{"abc"}, -\item constant identifiers, e.g. \code{MAXINT}, \code{EmptySet}. -\end{itemize} -Pattern variables always start with a lower-case letter, so that they -can be distinguished from constant identifiers, which start with an -upper-case letter. Each variable name may occur only once in a -pattern. For instance, \code{Sum(x, x)} would be illegal as a pattern, -since the pattern variable \code{x} occurs twice in it. - -\paragraph{Meaning of Pattern Matching} -A pattern matching expression -\begin{lstlisting} -e.match { case p$_1$ => e$_1$ ...
case p$_n$ => e$_n$ } -\end{lstlisting} -matches the patterns $p_1 \commadots p_n$ in the order they -are written against the selector value \code{e}. -\begin{itemize} -\item -A constructor pattern $C(p_1 \commadots p_n)$ matches all values that -are of type \code{C} (or a subtype thereof) and that have been constructed with -\code{C}-arguments matching patterns $p_1 \commadots p_n$. -\item -A variable pattern \code{x} matches any value and binds the variable -name to that value. -\item -The wildcard pattern `\code{_}' matches any value but does not bind a name to that value. -\item A constant pattern \code{C} matches a value which is -equal (in terms of \code{==}) to \code{C}. -\end{itemize} -The pattern matching expression rewrites to the right-hand-side of the -first case whose pattern matches the selector value. References to -pattern variables are replaced by corresponding constructor arguments. -If none of the patterns matches, the pattern matching expression is -aborted with a \code{MatchError} exception. 
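The four kinds of patterns and the first-match rule can be tried out with the following sketch. It is an illustration under assumptions, not part of the original text: it uses present-day Scala syntax (\code{Int}, infix \code{match}), and the names \code{PatternDemo}, \code{Zero}, \code{describe} and \code{onlyNumbers} are hypothetical helpers introduced here.

```scala
// Sketch only: present-day Scala syntax; all names except Expr, Number, Sum are hypothetical.
object PatternDemo {
  trait Expr
  case class Number(n: Int) extends Expr
  case class Sum(e1: Expr, e2: Expr) extends Expr

  // Upper-case identifier, so it is treated as a constant pattern below.
  val Zero = Number(0)

  def describe(e: Expr): String = e match {
    case Zero              => "zero"                        // constant pattern, compared with ==
    case Number(1)         => "one"                         // constructor pattern with a literal
    case Number(n)         => "number " + n                 // variable pattern binds n
    case Sum(Number(_), r) => "number plus " + describe(r)  // wildcard binds nothing
    case Sum(_, _)         => "some other sum"
  }

  // Deliberately incomplete: a Sum argument matches no case and raises MatchError.
  def onlyNumbers(e: Expr): Int = e match {
    case Number(n) => n
  }
}
```

Because cases are tried in the order written, \code{describe(Number(0))} selects the \code{Zero} case even though \code{Number(n)} would also match. Calling \code{onlyNumbers} on a \code{Sum} value aborts with a \code{MatchError}.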
- -\example Our substitution model of program evaluation extends quite naturally to pattern matching. For instance, here is how \code{eval} applied to a simple expression is re-written: -\begin{lstlisting} - eval(Sum(Number(1), Number(2))) - --> $\mbox{\tab\tab\rm(by rewriting the application)}$ - - Sum(Number(1), Number(2)) match { - case Number(n) => n - case Sum(e1, e2) => eval(e1) + eval(e2) - } - --> $\mbox{\tab\tab\rm(by rewriting the pattern match)}$ - - eval(Number(1)) + eval(Number(2)) - --> $\mbox{\tab\tab\rm(by rewriting the first application)}$ - - Number(1) match { - case Number(n) => n - case Sum(e1, e2) => eval(e1) + eval(e2) - } + eval(Number(2)) - --> $\mbox{\tab\tab\rm(by rewriting the pattern match)}$ - - 1 + eval(Number(2)) - -->$^*$ 1 + 2 -> 3 -\end{lstlisting} - -\paragraph{Pattern Matching and Methods} -In the previous example, we have used pattern -matching in a function which was defined outside the class hierarchy -over which it matches. Of course, it is also possible to define a -pattern matching function in that class hierarchy itself. For -instance, we could have defined -\code{eval} as a method of the base class \code{Expr}, and still have used pattern matching in its implementation: -\begin{lstlisting} -trait Expr { - def eval: int = this match { - case Number(n) => n - case Sum(e1, e2) => e1.eval + e2.eval - } -} -\end{lstlisting} - -\begin{exercise} Consider the following definitions representing trees -of integers. These definitions can be seen as an alternative -representation of \code{IntSet}: -\begin{lstlisting} -trait IntTree; -case object EmptyTree extends IntTree; -case class Node(elem: int, left: IntTree, right: IntTree) extends IntTree; -\end{lstlisting} -Complete the following implementations of functions \code{contains} and \code{insert} for -\code{IntTree}'s. -\begin{lstlisting} -def contains(t: IntTree, v: int): boolean = t match { ... - ... -} -def insert(t: IntTree, v: int): IntTree = t match { ... - ...
-} -\end{lstlisting} -\end{exercise} - -\paragraph{Pattern Matching Anonymous Functions} - -So far, case-expressions always appeared in conjunction with a -\verb@match@ operation. But it is also possible to use -case-expressions by themselves. A block of case-expressions such as -\begin{lstlisting} -{ case $P_1$ => $E_1$ ... case $P_n$ => $E_n$ } -\end{lstlisting} -is seen by itself as a function which matches its arguments -against the patterns $P_1 \commadots P_n$, and produces the result of -one of $E_1 \commadots E_n$. (If no pattern matches, the function -would throw a \code{MatchError} exception instead.) -In other words, the expression above is seen as a shorthand for the anonymous function -\begin{lstlisting} -(x => x match { case $P_1$ => $E_1$ ... case $P_n$ => $E_n$ }) -\end{lstlisting} -where \code{x} is a fresh variable which is not used -otherwise in the expression. - -\chapter{Generic Types and Methods} - -Classes in Scala can have type parameters. We demonstrate the use of -type parameters with functional stacks as an example. Say we want to -write a data type of stacks of integers, with methods \code{push}, -\code{top}, \code{pop}, and \code{isEmpty}. This is achieved by the -following class hierarchy: -\begin{lstlisting} -trait IntStack { - def push(x: int): IntStack = new IntNonEmptyStack(x, this); - def isEmpty: boolean - def top: int; - def pop: IntStack; -} -class IntEmptyStack extends IntStack { - def isEmpty = true; - def top = throw new Error("EmptyStack.top"); - def pop = throw new Error("EmptyStack.pop"); -} -class IntNonEmptyStack(elem: int, rest: IntStack) extends IntStack { - def isEmpty = false; - def top = elem; - def pop = rest; -} -\end{lstlisting} -Of course, it would also make sense to define an abstraction for a -stack of \code{String}s. To do that, one could take the existing abstraction -for \code{IntStack}, rename it to \code{StringStack} and at the same -time rename all occurrences of type \code{int} to \code{String}. 
- -A better way, which does not entail code duplication, is to -parameterize the stack definitions with the element type. -Parameterization lets us generalize from a specific instance of a -problem to a more general one. So far, we have used parameterization -only for values, but it is available also for types. To arrive at a -{\em generic} version of \code{Stack}, we equip it with a type -parameter. -\begin{lstlisting} -trait Stack[a] { - def push(x: a): Stack[a] = new NonEmptyStack[a](x, this); - def isEmpty: boolean - def top: a; - def pop: Stack[a]; -} -class EmptyStack[a] extends Stack[a] { - def isEmpty = true; - def top = throw new Error("EmptyStack.top"); - def pop = throw new Error("EmptyStack.pop"); -} -class NonEmptyStack[a](elem: a, rest: Stack[a]) extends Stack[a] { - def isEmpty = false; - def top = elem; - def pop = rest; -} -\end{lstlisting} -In the definitions above, `\code{a}' is a {\em type parameter} of -class \code{Stack} and its subclasses. Type parameters are arbitrary -names; they are enclosed in brackets instead of parentheses, so that -they can be easily distinguished from value parameters. Here is an -example of how the generic classes are used: -\begin{lstlisting} -val x = new EmptyStack[int]; -val y = x.push(1).push(2); -System.out.println(y.pop.top); -\end{lstlisting} -The first line creates a new empty stack of \code{int}'s. Note the -actual type argument \code{[int]} which replaces the formal type -parameter \code{a}. - -It is also possible to parameterize methods with types. As an example, -here is a generic method which determines whether one stack is a -prefix of another. -\begin{lstlisting} -def isPrefix[a](p: Stack[a], s: Stack[a]): boolean = { - p.isEmpty || - p.top == s.top && isPrefix[a](p.pop, s.pop); -} -\end{lstlisting} -Methods with type parameters, i.e.\ generic methods, are also -called {\em polymorphic}. The term comes from the Greek, where it -means ``having many forms''. 
To apply a polymorphic method such as -\code{isPrefix}, we pass type parameters as well as value parameters -to it. For instance, -\begin{lstlisting} -val s1 = new EmptyStack[String].push("abc"); -val s2 = new EmptyStack[String].push("abx").push(s1.top); -System.out.println(isPrefix[String](s1, s2)); -\end{lstlisting} - -\paragraph{Local Type Inference} -Passing type parameters such as \code{[int]} or \code{[String]} all -the time can become tedious in applications where generic functions -are used a lot. Quite often, the information in a type parameter is -redundant, because the correct parameter type can also be determined -by inspecting the function's value parameters or expected result type. -Taking the expression \code{isPrefix[String](s1, s2)} as an -example, we know that its value parameters are both of type -\code{Stack[String]}, so we can deduce that the type parameter must -be \code{String}. Scala has a fairly powerful type inferencer which -allows one to omit type parameters to polymorphic functions and -constructors in situations like these. In the example above, one -could have written \code{isPrefix(s1, s2)} and the missing type argument -\code{[String]} would have been inserted by the type inferencer. - -\section{Type Parameter Bounds} - -Now that we know how to make classes generic, it is natural to -generalize some of the earlier classes we have written. For instance -class \code{IntSet} could be generalized to sets with arbitrary -element types. Let's try. The trait for generic sets is easily -written. -\begin{lstlisting} -trait Set[a] { - def incl(x: a): Set[a]; - def contains(x: a): boolean; -} -\end{lstlisting} -However, if we still want to implement sets as binary search trees, we -encounter a problem. The \code{contains} and \code{incl} methods both -compare elements using methods \code{<} and \code{>}. For -\code{IntSet} this was OK, since type \code{int} has these two -methods. 
But for an arbitrary type parameter \code{a}, we cannot -guarantee this. Therefore, the previous implementation of, say, -\code{contains} would generate a compiler error. -\begin{lstlisting} - def contains(x: a): boolean = - if (x < elem) left contains x - ^ < $\mbox{\sl not a member of type}$ a. -\end{lstlisting} -One way to solve the problem is to restrict the legal types that can -be substituted for type \code{a} to only those types that contain methods -\code{<} and \code{>} of the correct types. There is a trait -\code{Ord[a]} in the standard Scala class library which represents -values which are comparable (via \code{<} and \code{>}) to values of -type \code{a}. We can enforce the comparability of a type by demanding -that the type is a subtype of \code{Ord}. This is done by giving an -upper bound to the type parameter of \code{Set}: -\begin{lstlisting} -trait Set[a <: Ord[a]] { - def incl(x: a): Set[a]; - def contains(x: a): boolean; -} -\end{lstlisting} -The parameter declaration \code{a <: Ord[a]} introduces \code{a} as a -type parameter which must be a subtype of \code{Ord[a]}, i.e.\ its values -must be comparable to values of the same type. - -With this restriction, we can now implement the rest of the generic -set abstraction as we did in the case of \code{IntSet}s before. 
- -\begin{lstlisting} -class EmptySet[a <: Ord[a]] extends Set[a] { - def contains(x: a): boolean = false; - def incl(x: a): Set[a] = new NonEmptySet(x, new EmptySet[a], new EmptySet[a]); -} -\end{lstlisting} - -\begin{lstlisting} -class NonEmptySet[a <: Ord[a]] - (elem: a, left: Set[a], right: Set[a]) extends Set[a] { - def contains(x: a): boolean = - if (x < elem) left contains x - else if (x > elem) right contains x - else true; - def incl(x: a): Set[a] = - if (x < elem) new NonEmptySet(elem, left incl x, right) - else if (x > elem) new NonEmptySet(elem, left, right incl x) - else this; -} -\end{lstlisting} -Note that we have left out the type argument in the object creations -\code{new NonEmptySet(...)}. In the same way as for polymorphic methods, -missing type arguments in constructor calls are inferred from value -arguments and/or the expected result type. - -Here is an example that uses the generic set abstraction. -\begin{lstlisting} -val s = new EmptySet[double].incl(1.0).incl(2.0); -s.contains(1.5) -\end{lstlisting} -This is OK, as type \code{double} implements trait \code{Ord[double]}. -However, the following example is in error. -\begin{lstlisting} -val s = new EmptySet[java.io.File] - ^ java.io.File $\mbox{\sl does not conform to type}$ - $\mbox{\sl parameter bound}$ Ord[java.io.File]. -\end{lstlisting} -To conclude the discussion of type parameter -bounds, here is the definition of trait \code{Ord} in package \code{scala}. -\begin{lstlisting} -package scala; -trait Ord[t <: Ord[t]]: t { - def < (that: t): Boolean; - def <=(that: t): Boolean = this < that || this == that; - def > (that: t): Boolean = that < this; - def >=(that: t): Boolean = that <= this; -} -\end{lstlisting} - -\section{Variance Annotations}\label{sec:first-arrays} - -The combination of type parameters and subtyping poses some -interesting questions. For instance, should \code{Stack[String]} be a -subtype of \code{Stack[AnyRef]}? 
Intuitively, this seems OK, since a -stack of \code{String}s is a special case of a stack of -\code{AnyRef}s. More generally, if \code{T} is a subtype of type \code{S} -then \code{Stack[T]} should be a subtype of \code{Stack[S]}. -This property is called {\em co-variant} subtyping. - -In Scala, generic types have by default non-variant subtyping. That -is, with \code{Stack} defined as above, stacks with different element -types would never be in a subtype relation. However, we can enforce -co-variant subtyping of stacks by changing the first line of the -definition of class \code{Stack} as follows. -\begin{lstlisting} -class Stack[+a] { -\end{lstlisting} -Prefixing a formal type parameter with a \code{+} indicates that -subtyping is covariant in that parameter. -Besides \code{+}, there is also a prefix \code{-} which indicates -contra-variant subtyping. If \code{Stack} were defined as \code{class -Stack[-a] ...}, then \code{T} being a subtype of type \code{S} would imply -that \code{Stack[S]} is a subtype of \code{Stack[T]} (which in the -case of stacks would be rather surprising!). - -In a purely functional world, all types could be co-variant. However, -the situation changes once we introduce mutable data. Consider the -case of arrays in Java or .NET. Such arrays are represented in Scala -by a generic class \code{Array}. Here is a partial definition of this -class. -\begin{lstlisting} -class Array[a] { - def apply(index: int): a; - def update(index: int, elem: a): unit; -} -\end{lstlisting} -The class above defines the way Scala arrays are seen from Scala user -programs. The Scala compiler will map this abstraction to the -underlying arrays of the host system in most cases where this is -possible. - -In Java, arrays are indeed covariant; that is, for reference types -\code{T} and \code{S}, if \code{T} is a subtype of \code{S}, then also -\code{Array[T]} is a subtype of \code{Array[S]}. This might seem -natural but leads to safety problems that require special runtime -checks. 
Here is an example: -\begin{lstlisting} -val x = new Array[String](1); -val y: Array[Any] = x; -y(0) = new Rational(1, 2); // this is syntactic sugar for - // y.update(0, new Rational(1, 2)); -\end{lstlisting} -In the first line, a new array of strings is created. In the second -line, this array is bound to a variable \code{y}, of type -\code{Array[Any]}. Assuming arrays are covariant, this is OK, since -\code{Array[String]} is a subtype of \code{Array[Any]}. Finally, in -the last line a rational number is stored in the array. This is also -OK, since type \code{Rational} is a subtype of the element type -\code{Any} of the array \code{y}. We thus end up storing a rational -number in an array of strings, which clearly violates type soundness. - -Java solves this problem by introducing a run-time check in the third -line which tests whether the stored element is compatible with the -element type with which the array was created. We have seen in the -example that this element type is not necessarily the static element -type of the array being updated. If the test fails, an -\code{ArrayStoreException} is raised. - -Scala solves this problem instead statically, by disallowing the -second line at compile-time, because arrays in Scala have non-variant -subtyping. This raises the question how a Scala compiler verifies that -variance annotations are correct. If we had simply declared arrays -co-variant, how would the potential problem have been detected? - -Scala uses a conservative approximation to verify soundness of -variance annotations. A covariant type parameter of a class may only -appear in co-variant positions inside the class. Among the co-variant -positions are the types of values in the class, the result types of -methods in the class, and type arguments to other covariant types. Not -co-variant are types of formal method parameters. 
Hence, the following -class definition would have been rejected: -\begin{lstlisting} -class Array[+a] { - def apply(index: int): a; - def update(index: int, elem: a): unit; - ^ $\mbox{\sl covariant type parameter}$ a - $\mbox{\sl appears in contravariant position.}$ -} -\end{lstlisting} -So far, so good. Intuitively, the compiler was correct in rejecting -the \code{update} method in a co-variant class because \code{update} -potentially changes state, and therefore undermines the soundness of -co-variant subtyping. - -However, there are also methods which do not mutate state, but where a -type parameter still appears contra-variantly. An example is -\code{push} in type \code{Stack}. Again the Scala compiler will reject -the definition of this method for co-variant stacks. -\begin{lstlisting} -class Stack[+a] { - def push(x: a): Stack[a] = - ^ $\mbox{\sl covariant type parameter}$ a - $\mbox{\sl appears in contravariant position.}$ -\end{lstlisting} -This is a pity, because, unlike arrays, stacks are purely functional data -structures and therefore should enable co-variant subtyping. However, -there is a way to solve the problem by using a polymorphic method -with a lower type parameter bound. - -\section{Lower Bounds} - -We have seen upper bounds for type parameters. In a type parameter -declaration such as \code{t <: U}, the type parameter \code{t} is -restricted to range only over subtypes of type \code{U}. Symmetrical -to this are lower bounds in Scala. In a type parameter declaration -\code{t >: L}, the type parameter \code{t} is restricted to range only -over {\em supertypes} of type \code{L}. (One can also combine lower and -upper bounds, as in \code{t >: L <: U}.) - -Using lower bounds, we can generalize the \code{push} method in -\code{Stack} as follows. 
-\begin{lstlisting} -class Stack[+a] { - def push[b >: a](x: b): Stack[b] = new NonEmptyStack(x, this); -\end{lstlisting} -Technically, this solves our variance problem since now the type -parameter \code{a} no longer appears as a parameter type of method -\code{push}. Instead, it appears as lower bound for another type -parameter of a method, which is classified as a co-variant position. -Hence, the Scala compiler accepts the new definition of \code{push}. - -In fact, we have not only solved the technical variance problem but -also have generalized the definition of \code{push}. Before, we were -required to push only elements with types that conform to the declared -element type of the stack. Now, we can also push elements of a -supertype of this type, but the type of the returned stack will change -accordingly. For instance, we can now push an \code{AnyRef} onto a -stack of \code{String}s, but the resulting stack will be a stack of -\code{AnyRef}s instead of a stack of \code{String}s! - -In summary, do not hesitate to add variance annotations to -your data structures, as this yields rich natural subtyping -relationships. The compiler will detect potential soundness -problems. Even if the compiler's approximation is too conservative, as -in the case of method \code{push} of class \code{Stack}, this will -often suggest a useful generalization of the contested method. - -\section{Least Types} - -Scala does not allow one to parameterize objects with types. That's -why we originally defined a generic class \code{EmptyStack[a]}, even -though a single value denoting empty stacks of arbitrary type would -do. For co-variant stacks, however, one can use the following idiom: -\begin{lstlisting} -object EmptyStack extends Stack[All] { ... } -\end{lstlisting} -The identifier \code{All} refers to the bottom type \code{scala.All}, -which is a subtype of all other types. 
Hence, for co-variant stacks, -\code{Stack[All]} is a subtype of \code{Stack[T]}, for any other type -\code{T}. This makes it possible to use a single empty stack object -in user code. For instance: -\begin{lstlisting} -val s = EmptyStack.push("abc").push(new AnyRef()); -\end{lstlisting} -Let's analyze the type assignment for this expression in detail. The -\code{EmptyStack} object is of type \code{Stack[All]}, which has a -method -\begin{lstlisting} -push[b >: All](elem: b): Stack[b] . -\end{lstlisting} -Local type inference will determine that the type parameter \code{b} -should be instantiated to \code{String} in the application -\code{EmptyStack.push("abc")}. The result type of that application is hence -\code{Stack[String]}, which in turn has a method -\begin{lstlisting} -push[b >: String](elem: b): Stack[b] . -\end{lstlisting} -The final part of the value definition above is the application of -this method to \code{new AnyRef()}. Local type inference will -determine that the type parameter \code{b} should this time be -instantiated to \code{AnyRef}, with result type \code{Stack[AnyRef]}. -Hence, the type assigned to value \code{s} is \code{Stack[AnyRef]}. - -Besides \code{scala.All}, which is a subtype of every other type, -there is also the type \code{scala.AllRef}, which is a subtype of -\code{scala.AnyRef}, and every type derived from it. The \code{null} -literal in Scala is of that type. This makes \code{null} compatible -with every reference type, but not with a value type such as -\code{int}. - -We conclude this section with the complete improved definition of -stacks. Stacks have now co-variant subtyping, the \code{push} method -has been generalized, and the empty stack is represented by a single -object. 
-\begin{lstlisting} -trait Stack[+a] { - def push[b >: a](x: b): Stack[b] = new NonEmptyStack(x, this); - def isEmpty: boolean - def top: a; - def pop: Stack[a]; -} -object EmptyStack extends Stack[All] { - def isEmpty = true; - def top = throw new Error("EmptyStack.top"); - def pop = throw new Error("EmptyStack.pop"); -} -class NonEmptyStack[+a](elem: a, rest: Stack[a]) extends Stack[a] { - def isEmpty = false; - def top = elem; - def pop = rest; -} -\end{lstlisting} -Many classes in the Scala library are generic. We now present two -commonly used families of generic classes, tuples and functions. The -discussion of another common class, lists, is deferred to the next -chapter. - -\section{Tuples} - -Sometimes, a function needs to return more than one result. For -instance, take the function \code{divmod} which returns the integer quotient -and rest of two given integer arguments. Of course, one can define a -class to hold the two results of \code{divmod}, as in: -\begin{lstlisting} -case class TwoInts(first: int, second: int); -def divmod(x: int, y: int): TwoInts = new TwoInts(x / y, x % y) -\end{lstlisting} -However, having to define a new class for every possible pair of -result types is very tedious. In Scala one can instead use -the generic classes \lstinline@Tuple$n$@, for each $n$ between -2 and 9. As an example, here is the definition of \code{Tuple2}. -\begin{lstlisting} -package scala; -case class Tuple2[a, b](_1: a, _2: b); -\end{lstlisting} -With \code{Tuple2}, the \code{divmod} method can be written as follows. -\begin{lstlisting} -def divmod(x: int, y: int) = new Tuple2[int, int](x / y, x % y) -\end{lstlisting} -As usual, type parameters to constructors can be omitted if they are -deducible from value arguments. Also, Scala defines an alias -\code{Pair} for \code{Tuple2} (as well as \code{Triple} for \code{Tuple3}). -With these conventions, \code{divmod} can equivalently be written as -follows. 
-\begin{lstlisting} -def divmod(x: int, y: int) = Pair(x / y, x % y) -\end{lstlisting} -How are elements of tuples accessed? Since tuples are case classes, -there are two possibilities. One can either access a tuple's fields -using the names of the constructor parameters \lstinline@_$i$@, as in the following example: -\begin{lstlisting} -val xy = divmod(x, y); -System.out.println("quotient: " + xy._1 + ", rest: " + xy._2); -\end{lstlisting} -Or one uses pattern matching on tuples, as in the following example: -\begin{lstlisting} -divmod(x, y) match { - case Pair(n, d) => - System.out.println("quotient: " + n + ", rest: " + d); -} -\end{lstlisting} -Note that type parameters are never used in patterns; it would have -been illegal to write \code{case Pair[int, int](n, d)}. - -\section{Functions}\label{sec:functions} - -Scala is a functional language in that functions are first-class -values. Scala is also an object-oriented language in that every value -is an object. It follows that functions are objects in Scala. For -instance, a function from type \code{String} to type \code{int} is -represented as an instance of the trait \code{Function1[String, int]}. -The \code{Function1} trait is defined as follows. -\begin{lstlisting} -package scala; -trait Function1[-a, +b] { - def apply(x: a): b -} -\end{lstlisting} -Besides \code{Function1}, there are also definitions of -\code{Function0} and \code{Function2} up to \code{Function9} in the -standard Scala library. That is, there is one definition for each -possible number of function parameters between 0 and 9. Scala's -function type syntax ~\lstinline@($T_1 \commadots T_n$) => $S$@~ is -simply an abbreviation for the parameterized type -~\lstinline@Function$n$[$T_1 \commadots T_n, S$]@~. - -Scala uses the same syntax $f(x)$ for function application, no matter -whether $f$ is a method or a function object. 
This is made possible by -the following convention: A function application $f(x)$ where $f$ is -an object (as opposed to a method) is taken to be a shorthand for -\lstinline@$f$.apply($x$)@. Hence, the \code{apply} method of a -function type is inserted automatically where this is necessary. - -That's also why we defined array subscripting in -Section~\ref{sec:first-arrays} by an \code{apply} method. For any -array \code{a}, the subscript operation \code{a(i)} is taken to be a -shorthand for \code{a.apply(i)}. - -Functions are an example where a contra-variant type parameter -declaration is useful. For example, consider the following code: -\begin{lstlisting} -val f: (AnyRef => int) = x => x.hashCode(); -val g: (String => int) = f -g("abc") -\end{lstlisting} -It's sound to bind the value \code{g} of type \code{String => int} to -\code{f}, which is of type \code{AnyRef => int}. Indeed, all one can -do with function of type \code{String => int} is pass it a string in -order to obtain an integer. Clearly, the same works for function -\code{f}: If we pass it a string (or any other object), we obtain an -integer. This demonstrates that function subtyping is contra-variant -in its argument type whereas it is covariant in its result type. -In short, $S \Rightarrow T$ is a subtype of $S' \Rightarrow T'$, provided -$S'$ is a subtype of $S$ and $T$ is a subtype of $T'$. - -\example Consider the Scala code -\begin{lstlisting} -val plus1: (int => int) = (x: int) => x + 1; -plus1(2) -\end{lstlisting} -This is expanded into the following object code. -\begin{lstlisting} -val plus1: Function1[int, int] = new Function1[int, int] { - def apply(x: int): int = x + 1 -} -plus1.apply(2) -\end{lstlisting} -Here, the object creation \lstinline@new Function1[int, int]{ ... }@ -represents an instance of an {\em anonymous class}. It combines the -creation of a new \code{Function1} object with an implementation of -the \code{apply} method (which is abstract in \code{Function1}). 
-Equivalently, but more verbosely, one could have used a local class: -\begin{lstlisting} -val plus1: Function1[int, int] = { - class Local extends Function1[int, int] { - def apply(x: int): int = x + 1 - } - new Local: Function1[int, int] -} -plus1.apply(2) -\end{lstlisting} - -\chapter{Lists} - -Lists are an important data structure in many Scala programs. -A list containing the elements \code{x}$_1$, \ldots, \code{x}$_n$ is written -\code{List(x}$_1$\code{, ..., x}$_n$\code{)}. Examples are: -\begin{lstlisting} -val fruit = List("apples", "oranges", "pears"); -val nums = List(1, 2, 3, 4); -val diag3 = List(List(1, 0, 0), List(0, 1, 0), List(0, 0, 1)); -val empty = List(); -\end{lstlisting} -Lists are similar to arrays in languages such as C or Java, but there -are also three important differences. First, lists are immutable. That -is, elements of a list cannot be changed by assignment. Second, -lists have a recursive structure, whereas arrays are flat. Third, -lists support a much richer set of operations than arrays usually do. - -\section{Using Lists} - -\paragraph{The List type} -Like arrays, lists are {\em homogeneous}. That is, the elements of a -list all have the same type. The type of a list with elements of type -\code{T} is written \code{List[T]} (compare to \code{T[]} in Java). -\begin{lstlisting} -val fruit: List[String] = List("apples", "oranges", "pears"); -val nums : List[int] = List(1, 2, 3, 4); -val diag3: List[List[int]] = List(List(1, 0, 0), List(0, 1, 0), List(0, 0, 1)); -val empty: List[int] = List(); -\end{lstlisting} - -\paragraph{List constructors} -All lists are built from two more fundamental constructors, \code{Nil} -and \code{::} (pronounced ``cons''). \code{Nil} represents an empty -list. The infix operator \code{::} expresses list extension. That is, -\code{x :: xs} represents a list whose first element is \code{x}, -which is followed by (the elements of) list \code{xs}. 
Hence, the -list values above could also have been defined as follows (in fact -their previous definition is simply syntactic sugar for the definitions below). -\begin{lstlisting} -val fruit = "apples" :: ("oranges" :: ("pears" :: Nil)); -val nums = 1 :: (2 :: (3 :: (4 :: Nil))); -val diag3 = (1 :: (0 :: (0 :: Nil))) :: - (0 :: (1 :: (0 :: Nil))) :: - (0 :: (0 :: (1 :: Nil))) :: Nil; -val empty = Nil; -\end{lstlisting} -The `\code{::}' operation associates to the right: \code{A :: B :: C} is -interpreted as \code{A :: (B :: C)}. Therefore, we can drop the -parentheses in the definitions above. For instance, we can write -shorter -\begin{lstlisting} -val nums = 1 :: 2 :: 3 :: 4 :: Nil; -\end{lstlisting} - -\paragraph{Basic operations on lists} -All operations on lists can be expressed in terms of the following three: - -\begin{tabular}{ll} -\code{head} & returns the first element of a list,\\ -\code{tail} & returns the list consisting of all elements except the\\ -& first element,\\ -\code{isEmpty} & returns \code{true} iff the list is empty -\end{tabular} - -These operations are defined as methods of list objects. So we invoke -them by selecting from the list that's operated on. Examples: -\begin{lstlisting} -empty.isEmpty = true -fruit.isEmpty = false -fruit.head = "apples" -fruit.tail.head = "oranges" -diag3.head = List(1, 0, 0) -\end{lstlisting} -The \code{head} and \code{tail} methods are defined only for non-empty -lists. When selected from an empty list, they throw an exception. - -As an example of how lists can be processed, consider sorting the -elements of a list of numbers into ascending order. One simple way to -do so is {\em insertion sort}, which works as follows: To sort a -non-empty list with first element \code{x} and rest \code{xs}, sort -the remainder \code{xs} and insert the element \code{x} at the right -position in the result. Sorting an empty list will yield the -empty list. 
Expressed as Scala code: -\begin{lstlisting} -def isort(xs: List[int]): List[int] = - if (xs.isEmpty) Nil - else insert(xs.head, isort(xs.tail)) -\end{lstlisting} - -\begin{exercise} Provide an implementation of the missing function -\code{insert}. -\end{exercise} - -\paragraph{List patterns} In fact, \code{::} is defined as a case -class in Scala's standard library. Hence, it is possible to decompose -lists by pattern matching, using patterns composed from the \code{Nil} -and \code{::} constructors. For instance, \code{isort} can be written -alternatively as follows. -\begin{lstlisting} -def isort(xs: List[int]): List[int] = xs match { - case List() => List() - case x :: xs1 => insert(x, isort(xs1)) -} -\end{lstlisting} -where -\begin{lstlisting} -def insert(x: int, xs: List[int]): List[int] = xs match { - case List() => List(x) - case y :: ys => if (x <= y) x :: xs else y :: insert(x, ys) -} -\end{lstlisting} - -\section{Definition of class List I: First Order Methods} -\label{sec:list-first-order} - -Lists are not built in in Scala; they are defined by an abstract class -\code{List}, which comes with two subclasses for \code{::} and \code{Nil}. -In the following we present a tour through class \code{List}. -\begin{lstlisting} -package scala; -abstract class List[+a] { -\end{lstlisting} -\code{List} is an abstract class, so one cannot define elements by -calling the empty \code{List} constructor (e.g. by -\code{new List}). The class has a type parameter \code{a}. It is -co-variant in this parameter, which means that -\code{List[S] <: List[T]} for all types \code{S} and \code{T} such that -\code{S <: T}. The class is situated in the package -\code{scala}. This is a package containing the most important standard -classes of Scala. - \code{List} defines a number of methods, which are -explained in the following. - -\paragraph{Decomposing lists} -First, there are the three basic methods \code{isEmpty}, -\code{head}, \code{tail}. 
Their implementation in terms of pattern -matching is straightforward: -\begin{lstlisting} -def isEmpty: boolean = match { - case Nil => true - case x :: xs => false -} -def head: a = match { - case Nil => throw new Error("Nil.head") - case x :: xs => x -} -def tail: List[a] = match { - case Nil => throw new Error("Nil.tail") - case x :: xs => xs -} -\end{lstlisting} - -The next function computes the length of a list. -\begin{lstlisting} -def length: int = match { - case Nil => 0 - case x :: xs => 1 + xs.length -} -\end{lstlisting} -\begin{exercise} Design a tail-recursive version of \code{length}. -\end{exercise} - -The next two functions are the complements of \code{head} and -\code{tail}. -\begin{lstlisting} -def last: a; -def init: List[a]; -\end{lstlisting} -\code{xs.last} returns the last element of list \code{xs}, whereas -\code{xs.init} returns all elements of \code{xs} except the last. -Both functions have to traverse the entire list, and are thus less -efficient than their \code{head} and \code{tail} analogues. -Here is the implementation of \code{last}. -\begin{lstlisting} -def last: a = match { - case Nil => throw new Error("Nil.last") - case x :: Nil => x - case x :: xs => xs.last -} -\end{lstlisting} -The implementation of \code{init} is analogous. - -The next three functions return a prefix of the list, or a suffix, or -both. -\begin{lstlisting} -def take(n: int): List[a] = - if (n == 0 || isEmpty) Nil else head :: tail.take(n-1); - -def drop(n: int): List[a] = - if (n == 0 || isEmpty) this else tail.drop(n-1); - -def split(n: int): Pair[List[a], List[a]] = Pair(take(n), drop(n)) -\end{lstlisting} -\code{(xs take n)} returns the first \code{n} elements of list -\code{xs}, or the whole list, if its length is smaller than \code{n}. -\code{(xs drop n)} returns all elements of \code{xs} except the -\code{n} first ones. Finally, \code{(xs split n)} returns a pair -consisting of the lists resulting from \code{xs take n} and -\code{xs drop n}. 
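To make the behaviour of these three operations concrete, here is a short sketch using the list class of the present-day Scala standard library. This is an assumption about the reader's environment: in the current library the pair-returning operation is called \code{splitAt} rather than \code{split}. The object name \code{PrefixSuffixDemo} is hypothetical.

```scala
// Illustration only: uses the current standard library List,
// where the method called `split` in the text is named `splitAt`.
object PrefixSuffixDemo {
  val xs: List[Int] = List(1, 2, 3, 4, 5)

  val firstTwo: List[Int] = xs.take(2)            // the first two elements
  val rest: List[Int]     = xs.drop(2)            // everything after the first two
  val both: (List[Int], List[Int]) = xs.splitAt(2) // the pair (take(2), drop(2))

  // Asking for more elements than the list holds is not an error.
  val wholeList: List[Int] = xs.take(10)          // the whole list
  val nothing: List[Int]   = xs.drop(10)          // the empty list
}
```

For any list \code{xs} and any \code{n}, concatenating \code{xs take n} and \code{xs drop n} gives back \code{xs}.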
- -The next function returns an element at a given index in a list. -It is thus analogous to array subscripting. Indices start at 0. -\begin{lstlisting} -def apply(n: int): a = drop(n).head; -\end{lstlisting} -The \code{apply} method has a special meaning in Scala. An object with -an \code{apply} method can be applied to arguments as if it were a -function. For instance, to pick the element at index 3 of a list \code{xs}, -one can write either \code{xs.apply(3)} or \code{xs(3)} -- the latter -expression expands into the first. - -With \code{take} and \code{drop}, we can extract sublists consisting -of consecutive elements of the original list. To extract the sublist -$xs_m \commadots xs_{n-1}$ of a list \code{xs}, use: - -\begin{lstlisting} -xs.drop(m).take(n - m) -\end{lstlisting} - -\paragraph{Zipping lists} The next function combines two lists into a list of pairs. -Given two lists -\begin{lstlisting} -xs = List(x$_1$, ..., x$_n$) $\mbox{\rm, and}$ -ys = List(y$_1$, ..., y$_n$) , -\end{lstlisting} -\code{xs zip ys} constructs the list -\code{List(Pair(x}$_1$\code{, y}$_1$\code{), ..., Pair(x}$_n$\code{, y}$_n$\code{))}. -If the two lists have different lengths, the longer one of the two is -truncated. Here is the definition of \code{zip} -- note that it is a -polymorphic method. -\begin{lstlisting} -def zip[b](that: List[b]): List[Pair[a,b]] = - if (this.isEmpty || that.isEmpty) Nil - else Pair(this.head, that.head) :: (this.tail zip that.tail); -\end{lstlisting} - -\paragraph{Consing lists.} -Like any infix operator, \code{::} -is also implemented as a method of an object. In this case, the object -is the list that is extended. This is possible, because operators -ending with a `\code{:}' character are treated specially in Scala. -All such operators are treated as methods of their right operand. 
 E.g., -\begin{lstlisting} - x :: y = y.::(x) $\mbox{\rm whereas}$ x + y = x.+(y) -\end{lstlisting} -Note, however, that operands of a binary operation are in each case -evaluated from left to right. So, if \code{D} and \code{E} are -expressions with possible side-effects, \code{D :: E} is translated to -\lstinline@{val x = D; E.::(x)}@ in order to maintain the left-to-right -order of operand evaluation. - -Another difference between operators ending in a `\code{:}' and other -operators concerns their associativity. Operators ending in -`\code{:}' are right-associative, whereas other operators are -left-associative. E.g., -\begin{lstlisting} - x :: y :: z = x :: (y :: z) $\mbox{\rm whereas}$ x + y + z = (x + y) + z -\end{lstlisting} -The definition of \code{::} as a method in -class \code{List} is as follows: -\begin{lstlisting} -def ::[b >: a](x: b): List[b] = new scala.::(x, this); -\end{lstlisting} -Note that \code{::} is defined for all elements \code{x} of type -\code{b} and lists of type \code{List[a]} such that the type \code{b} -of \code{x} is a supertype of the list's element type \code{a}. The result -is in this case a list of \code{b}'s. This -is expressed by the type parameter \code{b} with lower bound \code{a} -in the signature of \code{::}. - -\paragraph{Concatenating lists} -An operation similar to \code{::} is list concatenation, written -`\code{:::}'. The result of \code{(xs ::: ys)} is a list consisting of -all elements of \code{xs}, followed by all elements of \code{ys}. -Because it ends in a colon, \code{:::} is right-associative and is -considered as a method of its right-hand operand.
 Therefore, -\begin{lstlisting} -xs ::: ys ::: zs = xs ::: (ys ::: zs) - = zs.:::(ys).:::(xs) -\end{lstlisting} -Here is the implementation of the \code{:::} method: -\begin{lstlisting} - def :::[b >: a](prefix: List[b]): List[b] = prefix match { - case Nil => this - case p :: ps => this.:::(ps).::(p) - } -\end{lstlisting} - -\paragraph{Reversing lists} Another useful operation -is list reversal. There is a method \code{reverse} in \code{List} to -that effect. Let's try to give its implementation: -\begin{lstlisting} -def reverse[a](xs: List[a]): List[a] = xs match { - case Nil => Nil - case x :: xs => reverse(xs) ::: List(x) -} -\end{lstlisting} -This implementation has the advantage of being simple, but it is not -very efficient. Indeed, one concatenation is executed for every -element in the list. List concatenation takes time proportional to the -length of its first operand. Therefore, the complexity of -\code{reverse(xs)} is -\[ -n + (n - 1) + ... + 1 = n(n+1)/2 -\] -where $n$ is the length of \code{xs}. Can \code{reverse} be -implemented more efficiently? We will see later that there exists -another implementation which has only linear complexity. - -\section{Example: Merge sort} - -The insertion sort presented earlier in this chapter is simple to -formulate, but also not very efficient. Its average complexity is -proportional to the square of the length of the input list. We now -design a program to sort the elements of a list which is more -efficient than insertion sort. A good algorithm for this is {\em merge -sort}, which works as follows. - -First, if the list has zero or one elements, it is already sorted, so -one returns the list unchanged. Longer lists are split into two -sub-lists, each containing about half the elements of the original -list. Each sub-list is sorted by a recursive call to the sort -function, and the resulting two sorted lists are then combined in a -merge operation.
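The three steps just described (split, sort each half, merge) can be sketched for the special case of integer lists as follows. This is only an illustration in present-day Scala syntax, fixed to \code{Int} elements and the \code{<} ordering. The names \code{MergeSortSketch}, \code{merge} and \code{sort} are assumptions made for this sketch, and the library method \code{splitAt} is used for the split.

```scala
object MergeSortSketch {
  // Combine two already sorted lists into one sorted list.
  def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
    case (Nil, _) => ys
    case (_, Nil) => xs
    case (x :: xs1, y :: ys1) =>
      if (x < y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
  }

  def sort(xs: List[Int]): List[Int] = {
    val n = xs.length / 2
    if (n == 0) xs                            // zero or one element: already sorted
    else {
      val (front, back) = xs.splitAt(n)       // two halves of about equal length
      merge(sort(front), sort(back))          // sort each half, then merge
    }
  }
}
```

With these definitions, \code{MergeSortSketch.sort(List(5, 7, 1, 3))} yields \code{List(1, 3, 5, 7)}.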
 - -For a general implementation of merge sort, we still have to specify -the type of list elements to be sorted, as well as the function to be -used for the comparison of elements. We obtain a function of maximal -generality by passing these two items as parameters. This leads to the -following implementation. -\begin{lstlisting} -def msort[a](less: (a, a) => boolean)(xs: List[a]): List[a] = { - def merge(xs1: List[a], xs2: List[a]): List[a] = - if (xs1.isEmpty) xs2 - else if (xs2.isEmpty) xs1 - else if (less(xs1.head, xs2.head)) xs1.head :: merge(xs1.tail, xs2) - else xs2.head :: merge(xs1, xs2.tail); - val n = xs.length/2; - if (n == 0) xs - else merge(msort(less)(xs take n), msort(less)(xs drop n)) -} -\end{lstlisting} -The complexity of \code{msort} is $O(N \log N)$, where $N$ is the -length of the input list. To see why, note that splitting a list in -two and merging two sorted lists each take time proportional to the -length of the argument list(s). Each recursive call of \code{msort} -halves the number of elements in its input, so there are $O(\log N)$ -consecutive recursive calls until the base case of lists of length 1 -is reached. However, for longer lists each call spawns off two -further calls. Adding everything up we obtain that at each of the -$O(\log N)$ call levels, every element of the original list takes -part in one split operation and in one merge operation. Hence, every -call level has a total cost proportional to $N$. Since there are -$O(\log N)$ call levels, we obtain an overall cost of -$O(N \log N)$. That cost does not depend on the initial distribution -of elements in the list, so the worst case cost is the same as the -average case cost. This makes merge sort an attractive algorithm for -sorting lists. - -Here is an example of how \code{msort} is used.
 -\begin{lstlisting} -msort((x: int, y: int) => x < y)(List(5, 7, 1, 3)) -\end{lstlisting} -The definition of \code{msort} is curried, to make it easy to specialize it with particular -comparison functions. For instance, -\begin{lstlisting} - -val intSort = msort((x: int, y: int) => x < y) -val reverseSort = msort((x: int, y: int) => x > y) -\end{lstlisting} - -\section{Definition of class List II: Higher-Order Methods} - -The examples encountered so far show that functions over lists often -have similar structures. We can identify several patterns of -computation over lists, like: -\begin{itemize} - \item transforming every element of a list in some way. - \item extracting from a list all elements satisfying a criterion. - \item combining the elements of a list using some operator. -\end{itemize} -Functional programming languages enable programmers to write general -functions which implement patterns like this by means of higher-order -functions. We now discuss a set of commonly used higher-order -functions, which are implemented as methods in class \code{List}. - -\paragraph{Mapping over lists} -A common operation is to transform each element of a list and then -return the list of results. For instance, the following function scales -each element of a list by a given factor. -\begin{lstlisting} -def scaleList(xs: List[double], factor: double): List[double] = xs match { - case Nil => xs - case x :: xs1 => x * factor :: scaleList(xs1, factor) -} -\end{lstlisting} -This pattern can be generalized to the \code{map} method of class \code{List}: -\begin{lstlisting} -abstract class List[a] { ... - def map[b](f: a => b): List[b] = this match { - case Nil => Nil - case x :: xs => f(x) :: xs.map(f) - } -\end{lstlisting} -Using \code{map}, \code{scaleList} can be more concisely written as follows.
 -\begin{lstlisting} -def scaleList(xs: List[double], factor: double) = - xs map (x => x * factor) -\end{lstlisting} - -As another example, consider the problem of returning a given column -of a matrix which is represented as a list of rows, where each row is -again a list. This is done by the following function \code{column}. - -\begin{lstlisting} -def column[a](xs: List[List[a]], index: int): List[a] = - xs map (row => row(index)) -\end{lstlisting} - -Closely related to \code{map} is the \code{foreach} method, which -applies a given function to all elements of a list, but does not -construct a list of results. The function is thus applied only for its -side effect. \code{foreach} is defined as follows. -\begin{lstlisting} - def foreach(f: a => unit): unit = this match { - case Nil => () - case x :: xs => f(x) ; xs.foreach(f) - } -\end{lstlisting} -This function can be used for printing all elements of a list, for instance: -\begin{lstlisting} - xs foreach (x => System.out.println(x)) -\end{lstlisting} - -\begin{exercise} Consider a function which squares all elements of a list and -returns a list with the results. Complete the following two equivalent -definitions of \code{squareList}. - -\begin{lstlisting} -def squareList(xs: List[int]): List[int] = xs match { - case List() => ?? - case y :: ys => ?? -} -def squareList(xs: List[int]): List[int] = - xs map ?? -\end{lstlisting} -\end{exercise} - -\paragraph{Filtering Lists} -Another common operation selects from a list all elements fulfilling a -given criterion.
 For instance, to return a list of all positive -elements in some given list of integers: -\begin{lstlisting} -def posElems(xs: List[int]): List[int] = xs match { - case Nil => xs - case x :: xs1 => if (x > 0) x :: posElems(xs1) else posElems(xs1) -} -\end{lstlisting} -This pattern is generalized to the \code{filter} method of class \code{List}: -\begin{lstlisting} - def filter(p: a => boolean): List[a] = this match { - case Nil => this - case x :: xs => if (p(x)) x :: xs.filter(p) else xs.filter(p) - } -\end{lstlisting} -Using \code{filter}, \code{posElems} can be more concisely written as -follows. -\begin{lstlisting} -def posElems(xs: List[int]): List[int] = - xs filter (x => x > 0) -\end{lstlisting} - -An operation related to filtering is testing whether all elements of a -list satisfy a certain condition. Dually, one might also be interested -in the question whether there exists an element in a list that -satisfies a certain condition. These operations are embodied in the -higher-order functions \code{forall} and \code{exists} of class -\code{List}. -\begin{lstlisting} -def forall(p: a => boolean): boolean = - isEmpty || (p(head) && (tail forall p)); -def exists(p: a => boolean): boolean = - !isEmpty && (p(head) || (tail exists p)); -\end{lstlisting} -To illustrate the use of \code{forall}, consider the question whether -a number is prime. Remember that a number $n$ is prime if it can be -divided without remainder only by one and itself. The most direct -translation of this definition would test that $n$ divided by all -numbers from 2 up to and excluding itself gives a non-zero -remainder. This list of numbers can be generated using a function -\code{List.range} which is defined in object \code{List} as follows. -\begin{lstlisting} -package scala; -object List { ...
 - def range(from: int, end: int): List[int] = - if (from >= end) Nil else from :: range(from + 1, end); -\end{lstlisting} -For example, \code{List.range(2, n)} -generates the list of all integers from 2 up to and excluding $n$. -The function \code{isPrime} can now simply be defined as follows. -\begin{lstlisting} -def isPrime(n: int) = - List.range(2, n) forall (x => n % x != 0) -\end{lstlisting} -We see that the mathematical definition of prime-ness has been -translated directly into Scala code. - -\begin{exercise} Define \code{forall} and \code{exists} in terms of \code{filter}. -\end{exercise} - -\paragraph{Folding and Reducing Lists} -Another common operation is to combine the elements of a list with -some operator. For instance: -\begin{lstlisting} -sum(List(x$_1$, ..., x$_n$)) = 0 + x$_1$ + ... + x$_n$ -product(List(x$_1$, ..., x$_n$)) = 1 * x$_1$ * ... * x$_n$ -\end{lstlisting} -Of course, we can implement both functions with a -recursive scheme: -\begin{lstlisting} -def sum(xs: List[int]): int = xs match { - case Nil => 0 - case y :: ys => y + sum(ys) -} -def product(xs: List[int]): int = xs match { - case Nil => 1 - case y :: ys => y * product(ys) -} -\end{lstlisting} -But we can also use the generalization of this program scheme embodied -in the \code{reduceLeft} method of class \code{List}. This method -inserts a given binary operator between adjacent elements of a given list. -E.g.\ -\begin{lstlisting} -List(x$_1$, ..., x$_n$).reduceLeft(op) = (...(x$_1$ op x$_2$) op ... ) op x$_n$ -\end{lstlisting} -Using \code{reduceLeft}, we can make the common pattern -in \code{sum} and \code{product} apparent: -\begin{lstlisting} -def sum(xs: List[int]) = (0 :: xs) reduceLeft {(x, y) => x + y} -def product(xs: List[int]) = (1 :: xs) reduceLeft {(x, y) => x * y} -\end{lstlisting} -Here is the implementation of \code{reduceLeft}.
-\begin{lstlisting} - def reduceLeft(op: (a, a) => a): a = this match { - case Nil => error("Nil.reduceLeft") - case x :: xs => (xs foldLeft x)(op) - } - def foldLeft[b](z: b)(op: (b, a) => b): b = this match { - case Nil => z - case x :: xs => (xs foldLeft op(z, x))(op) - } -} -\end{lstlisting} -We see that the \code{reduceLeft} method is defined in terms of -another generally useful method, \code{foldLeft}. The latter takes as -additional parameter an {\em accumulator} \code{z}, which is returned -when \code{foldLeft} is applied on an empty list. That is, -\begin{lstlisting} -(List(x$_1$, ..., x$_n$) foldLeft z)(op) = (...(z op x$_1$) op ... ) op x$_n$ -\end{lstlisting} -The \code{sum} and \code{product} methods can be defined alternatively -using \code{foldLeft}: -\begin{lstlisting} -def sum(xs: List[int]) = (xs foldLeft 0) {(x, y) => x + y} -def product(xs: List[int]) = (xs foldLeft 1) {(x, y) => x * y} -\end{lstlisting} - -\paragraph{FoldRight and ReduceRight} -Applications of \code{foldLeft} and \code{reduceLeft} expand to -left-leaning trees. \todo{insert pictures}. They have duals -\code{foldRight} and \code{reduceRight}, which produce right-leaning -trees. -\begin{lstlisting} -List(x$_1$, ..., x$_n$).reduceRight(op) = x$_1$ op ( ... (x$_{n-1}$ op x$_n$)...) -(List(x$_1$, ..., x$_n$) foldRight acc)(op) = x$_1$ op ( ... (x$_n$ op acc)...) -\end{lstlisting} -These are defined as follows. 
 -\begin{lstlisting} - def reduceRight(op: (a, a) => a): a = match { - case Nil => error("Nil.reduceRight") - case x :: Nil => x - case x :: xs => op(x, xs.reduceRight(op)) - } - def foldRight[b](z: b)(op: (a, b) => b): b = match { - case Nil => z - case x :: xs => op(x, (xs foldRight z)(op)) - } -\end{lstlisting} - -Class \code{List} also defines two symbolic abbreviations for -\code{foldLeft} and \code{foldRight}: -\begin{lstlisting} - def /:[b](z: b)(f: (b, a) => b): b = foldLeft(z)(f); - def :\[b](z: b)(f: (a, b) => b): b = foldRight(z)(f); -\end{lstlisting} -The method names picture the left/right leaning trees of the fold -operations by forward or backward slashes. The \code{:} points in each -case to the list argument whereas the end of the slash points to the -accumulator (or: zero) argument \code{z}. -That is, -\begin{lstlisting} -(z /: List(x$_1$, ..., x$_n$))(op) = (...(z op x$_1$) op ... ) op x$_n$ -(List(x$_1$, ..., x$_n$) :\ z)(op) = x$_1$ op ( ... (x$_n$ op z)...) -\end{lstlisting} -For associative and commutative operators, \code{/:} and -\code{:\\} are equivalent (even though there may be a difference -in efficiency). But sometimes, only one of the two operators is -appropriate or has the right type: - -\begin{exercise} Consider the problem of writing a function \code{flatten}, -which takes a list of element lists as its argument. The result of -\code{flatten} should be the concatenation of all element lists into a -single list. Here is an implementation of this method in terms of -\code{:\\}. -\begin{lstlisting} -def flatten[a](xs: List[List[a]]): List[a] = - (xs :\ Nil) {(x, xs) => x ::: xs} -\end{lstlisting} -In this case it is not possible to replace the application of -\code{:\\} with \code{/:}. Explain why. - -In fact \code{flatten} is predefined together with a set of other -useful functions in an object called \code{List} in the standard Scala -library. It can be accessed from user programs by calling -\code{List.flatten}.
Note that \code{flatten} is not a method of class -\code{List} -- it would not make sense there, since it applies only -to lists of lists, not to all lists in general. -\end{exercise} - -\paragraph{List Reversal Again} We have seen in -Section~\ref{sec:list-first-order} an implementation of method -\code{reverse} whose run-time was quadratic in the length of the list -to be reversed. We now develop a new implementation of \code{reverse}, -which has linear cost. The idea is to use a \code{foldLeft} -operation based on the following program scheme. -\begin{lstlisting} -class List[+a] { ... - def reverse: List[a] = (z? /: this)(op?) -\end{lstlisting} -It only remains to fill in the \code{z?} and \code{op?} parts. Let's -try to deduce them from examples. -\begin{lstlisting} - Nil -= Nil.reverse // by specification -= (z /: Nil)(op) // by the template for reverse -= (Nil foldLeft z)(op) // by the definition of /: -= z // by definition of foldLeft -\end{lstlisting} -Hence, \code{z?} must be \code{Nil}. To deduce the second operand, -let's study reversal of a list of length one. -\begin{lstlisting} - List(x) -= List(x).reverse // by specification -= (Nil /: List(x))(op) // by the template for reverse, with z = Nil -= (List(x) foldLeft Nil)(op) // by the definition of /: -= op(Nil, x) // by definition of foldLeft -\end{lstlisting} -Hence, \code{op(Nil, x)} equals \code{List(x)}, which is the same -as \code{x :: Nil}. This suggests to take as \code{op} the -\code{::} operator with its operands exchanged. Hence, we arrive at -the following implementation for \code{reverse}, which has linear complexity. -\begin{lstlisting} -def reverse: List[a] = - ((Nil: List[a]) /: this) {(xs, x) => x :: xs} -\end{lstlisting} -(Remark: The type annotation of \code{Nil} is necessary -to make the type inferencer work.) - -\begin{exercise} Fill in the missing expressions to complete the following -definitions of some basic list-manipulation operations as fold -operations. 
 -\begin{lstlisting} -def mapFun[a, b](xs: List[a], f: a => b): List[b] = - (xs :\ List[b]()){ ?? } - -def lengthFun[a](xs: List[a]): int = - (0 /: xs){ ?? } -\end{lstlisting} -\end{exercise} - -\paragraph{Nested Mappings} - -We can employ higher-order list processing functions to express many -computations that are normally expressed as nested loops in imperative -languages. - -As an example, consider the following problem: Given a positive -integer $n$, find all pairs of positive integers $i$ and $j$, where -$1 \leq j < i < n$ such that $i + j$ is prime. For instance, if $n = 7$, -the pairs are -\bda{c|lllllll} -i & 2 & 3 & 4 & 4 & 5 & 6 & 6\\ -j & 1 & 2 & 1 & 3 & 2 & 1 & 5\\ \hline -i + j & 3 & 5 & 5 & 7 & 7 & 7 & 11 -\eda - -A natural way to solve this problem consists of two steps. In a first step, -one generates the sequence of all pairs $(i, j)$ of integers such that -$1 \leq j < i < n$. In a second step one then filters from this sequence -all pairs $(i, j)$ such that $i + j$ is prime. - -Looking at the first step in more detail, a natural way to generate -the sequence of pairs consists of three sub-steps. First, generate -all integers between $1$ and $n$ for $i$. -Second, for each integer $i$ between $1$ and $n$, generate the list of -pairs $(i, 1)$ up to $(i, i-1)$. This can be achieved by a -combination of \code{range} and \code{map}: -\begin{lstlisting} - List.range(1, i) map (x => Pair(i, x)) -\end{lstlisting} -Finally, combine all sublists using \code{foldRight} with \code{:::}.
 -Putting everything together gives the following expression: -\begin{lstlisting} -List.range(1, n) - .map(i => List.range(1, i).map(x => Pair(i, x))) - .foldRight(List[Pair[int, int]]()) {(xs, ys) => xs ::: ys} - .filter(pair => isPrime(pair._1 + pair._2)) -\end{lstlisting} - -\paragraph{Flattening Maps} -The combination of mapping and then concatenating sublists -resulting from the map -is so common that there is a special method -for it in class \code{List}: -\begin{lstlisting} -abstract class List[+a] { ... - def flatMap[b](f: a => List[b]): List[b] = match { - case Nil => Nil - case x :: xs => f(x) ::: (xs flatMap f) - } -} -\end{lstlisting} -With \code{flatMap}, the pairs-whose-sum-is-prime expression -could have been written more concisely as follows. -\begin{lstlisting} -List.range(1, n) - .flatMap(i => List.range(1, i).map(x => Pair(i, x))) - .filter(pair => isPrime(pair._1 + pair._2)) -\end{lstlisting} - - - -\section{Summary} - -This chapter has introduced lists as a fundamental data structure in -programming. Since lists are immutable, they are a common data type in -functional programming languages. There they play a role comparable to -arrays in imperative languages. However, the access patterns of -arrays and lists are quite different. Where array accessing is always -done by indexing, this is much less common for lists. We have seen -that \code{scala.List} defines a method called \code{apply} for indexing; -however this operation is much more costly than in the case of arrays -(linear as opposed to constant time). Instead of indexing, lists are -usually traversed recursively, where recursion steps are usually based -on a pattern match over the traversed list. There is also a rich set of -higher-order combinators which allow one to instantiate a set of -predefined patterns of computations over lists. - -\comment{ -\bsh{Reasoning About Lists} - -Recall the concatenation operation for lists: - -\begin{lstlisting} -class List[+a] { - ...
- def ::: (that: List[a]): List[a] = - if (isEmpty) that - else head :: (tail ::: that) -} -\end{lstlisting} - -We would like to verify that concatenation is associative, with the -empty list \code{List()} as left and right identity: -\bda{lcl} - (xs ::: ys) ::: zs &=& xs ::: (ys ::: zs) \\ - xs ::: List() &=& xs \gap =\ List() ::: xs -\eda -\emph{Q}: How can we prove statements like the one above? - -\emph{A}: By \emph{structural induction} over lists. -\es -\bsh{Reminder: Natural Induction} - -Recall the proof principle of \emph{natural induction}: - -To show a property \mathtext{P(n)} for all numbers \mathtext{n \geq b}: -\be -\item Show that \mathtext{P(b)} holds (\emph{base case}). -\item For arbitrary \mathtext{n \geq b} show: -\begin{quote} - if \mathtext{P(n)} holds, then \mathtext{P(n+1)} holds as well -\end{quote} -(\emph{induction step}). -\ee -%\es\bs -\emph{Example}: Given -\begin{lstlisting} -def factorial(n: int): int = - if (n == 0) 1 - else n * factorial(n-1) -\end{lstlisting} -show that, for all \code{n >= 4}, -\begin{lstlisting} - factorial(n) >= 2$^n$ -\end{lstlisting} -\es\bs -\Case{\code{4}} -is established by simple calculation of \code{factorial(4) = 24} and \code{2$^4$ = 16}. - -\Case{\code{n+1}} -We have for \code{n >= 4}: -\begin{lstlisting} - \= factorial(n + 1) - = \> $\expl{by the second clause of factorial(*)}$ - \> (n + 1) * factorial(n) - >= \> $\expl{by calculation}$ - \> 2 * factorial(n) - >= \> $\expl{by the induction hypothesis}$ - \> 2 * 2$^n$. -\end{lstlisting} -Note that in our proof we can freely apply reduction steps such as in (*) -anywhere in a term. - - -This works because purely functional programs do not have side -effects; so a term is equivalent to the term it reduces to. - -The principle is called {\em\emph{referential transparency}}. 
-\es -\bsh{Structural Induction} - -The principle of structural induction is analogous to natural induction: - -In the case of lists, it is as follows: - -To prove a property \mathtext{P(xs)} for all lists \mathtext{xs}, -\be -\item Show that \code{P(List())} holds (\emph{base case}). -\item For arbitrary lists \mathtext{xs} and elements \mathtext{x} - show: -\begin{quote} - if \mathtext{P(xs)} holds, then \mathtext{P(x :: xs)} holds as well -\end{quote} -(\emph{induction step}). -\ee - -\es -\bsh{Example} - -We show \code{(xs ::: ys) ::: zs = xs ::: (ys ::: zs)} by structural induction -on \code{xs}. - -\Case{\code{List()}} -For the left-hand side, we have: -\begin{lstlisting} - \= (List() ::: ys) ::: zs - = \> $\expl{by first clause of \prog{:::}}$ - \> ys ::: zs -\end{lstlisting} -For the right-hand side, we have: -\begin{lstlisting} - \= List() ::: (ys ::: zs) - = \> $\expl{by first clause of \prog{:::}}$ - \> ys ::: zs -\end{lstlisting} -So the case is established. - -\es -\bs -\Case{\code{x :: xs}} - -For the left-hand side, we have: -\begin{lstlisting} - \= ((x :: xs) ::: ys) ::: zs - = \> $\expl{by second clause of \prog{:::}}$ - \> (x :: (xs ::: ys)) ::: zs - = \> $\expl{by second clause of \prog{:::}}$ - \> x :: ((xs ::: ys) ::: zs) - = \> $\expl{by the induction hypothesis}$ - \> x :: (xs ::: (ys ::: zs)) -\end{lstlisting} - -For the right-hand side, we have: -\begin{lstlisting} - \= (x :: xs) ::: (ys ::: zs) - = \> $\expl{by second clause of \prog{:::}}$ - \> x :: (xs ::: (ys ::: zs)) -\end{lstlisting} -So the case (and with it the property) is established. - -\begin{exercise} -Show by induction on \code{xs} that \code{xs ::: List() = xs}. -\es -\bsh{Example (2)} -\end{exercise} - -As a more difficult example, consider function -\begin{lstlisting} -abstract class List[a] { ... 
- def reverse: List[a] = match { - case List() => List() - case x :: xs => xs.reverse ::: List(x) - } -} -\end{lstlisting} -We would like to prove the proposition that -\begin{lstlisting} - xs.reverse.reverse = xs . -\end{lstlisting} -We proceed by induction over \code{xs}. The base case is easy to establish: -\begin{lstlisting} - \= List().reverse.reverse - = \> $\expl{by first clause of \prog{reverse}}$ - \> List().reverse - = \> $\expl{by first clause of \prog{reverse}}$ - \> List() -\end{lstlisting} -\es\bs -For the induction step, we try: -\begin{lstlisting} - \= (x :: xs).reverse.reverse - = \> $\expl{by second clause of \prog{reverse}}$ - \> (xs.reverse ::: List(x)).reverse -\end{lstlisting} -There's nothing more we can do to this expression, so we turn to the right side: -\begin{lstlisting} - \= x :: xs - = \> $\expl{by induction hypothesis}$ - \> x :: xs.reverse.reverse -\end{lstlisting} -The two sides have simplified to different expressions. - -So we still have to show that -\begin{lstlisting} - (xs.reverse ::: List(x)).reverse = x :: xs.reverse.reverse -\end{lstlisting} -Trying to prove this directly by induction does not work. - -Instead we have to {\em generalize} the equation to: -\begin{lstlisting} - (ys ::: List(x)).reverse = x :: ys.reverse -\end{lstlisting} -\es\bs -This equation can be proved by a second induction argument over \code{ys}. -(See blackboard). - -\begin{exercise} -Is it the case that \code{(xs drop m) at n = xs at (m + n)} for all -natural numbers \code{m}, \code{n} and all lists \code{xs}? -\end{exercise} - -\es -\bsh{Structural Induction on Trees} - -Structural induction is not restricted to lists; it works for arbitrary -trees. - -The general induction principle is as follows. - -To show that property \code{P(t)} holds for all trees of a certain type, -\begin{itemize} -\item Show \code{P(l)} for all leaf trees \code{$l$}. 
-\item For every interior node \code{t} with subtrees \code{s$_1$, ..., s$_n$}, - show that \code{P(s$_1$) $\wedge$ ... $\wedge$ P(s$_n$) => P(t)}. -\end{itemize} - -\example Recall our definition of \code{IntSet} with -operations \code{contains} and \code{incl}: - -\begin{lstlisting} -abstract class IntSet { - abstract def incl(x: int): IntSet - abstract def contains(x: int): boolean -} -\end{lstlisting} -\es\bs -\begin{lstlisting} -case class Empty extends IntSet { - def contains(x: int): boolean = false - def incl(x: int): IntSet = NonEmpty(x, Empty, Empty) -} -case class NonEmpty(elem: int, left: Set, right: Set) extends IntSet { - def contains(x: int): boolean = - if (x < elem) left contains x - else if (x > elem) right contains x - else true - def incl(x: int): IntSet = - if (x < elem) NonEmpty(elem, left incl x, right) - else if (x > elem) NonEmpty(elem, left, right incl x) - else this -} -\end{lstlisting} -(With \code{case} added, so that we can use factory methods instead of \code{new}). - -What does it mean to prove the correctness of this implementation? -\es -\bsh{Laws of IntSet} - -One way to state and prove the correctness of an implementation is -to prove laws that hold for it. - -In the case of \code{IntSet}, three such laws would be: - -For all sets \code{s}, elements \code{x}, \code{y}: - -\begin{lstlisting} -Empty contains x \= = false -(s incl x) contains x \> = true -(s incl x) contains y \> = s contains y if x $\neq$ y -\end{lstlisting} - -(In fact, one can show that these laws characterize the desired data -type completely). - -How can we establish that these laws hold? - -\emph{Proposition 1}: \code{Empty contains x = false}. - -\emph{Proof}: By the definition of \code{contains} in \code{Empty}. 
 -\es\bs -\emph{Proposition 2}: \code{(xs incl x) contains x = true} - -\emph{Proof:} - -\Case{\code{Empty}} -\begin{lstlisting} - \= (Empty incl x) contains x - = \> $\expl{by definition of \prog{incl} in \prog{Empty}}$ - \> NonEmpty(x, Empty, Empty) contains x - = \> $\expl{by definition of \prog{contains} in \prog{NonEmpty}}$ - \> true -\end{lstlisting} - -\Case{\code{NonEmpty(x, l, r)}} -\begin{lstlisting} - \= (NonEmpty(x, l, r) incl x) contains x - = \> $\expl{by definition of \prog{incl} in \prog{NonEmpty}}$ - \> NonEmpty(x, l, r) contains x - = \> $\expl{by definition of \prog{contains} in \prog{NonEmpty}}$ - \> true -\end{lstlisting} -\es\bs -\Case{\code{NonEmpty(y, l, r)} where \code{y < x}} -\begin{lstlisting} - \= (NonEmpty(y, l, r) incl x) contains x - = \> $\expl{by definition of \prog{incl} in \prog{NonEmpty}}$ - \> NonEmpty(y, l, r incl x) contains x - = \> $\expl{by definition of \prog{contains} in \prog{NonEmpty}}$ - \> (r incl x) contains x - = \> $\expl{by the induction hypothesis}$ - \> true -\end{lstlisting} - -\Case{\code{NonEmpty(y, l, r)} where \code{y > x}} is analogous. - -\bigskip - -\emph{Proposition 3}: If \code{x $\neq$ y} then -\code{xs incl y contains x = xs contains x}. - -\emph{Proof:} See blackboard. -\es -\bsh{Exercise} - -Say we add a \code{union} function to \code{IntSet}: - -\begin{lstlisting} -class IntSet { ... - def union(other: IntSet): IntSet -} -class Empty extends IntSet { ... - def union(other: IntSet) = other -} -class NonEmpty(x: int, l: IntSet, r: IntSet) extends IntSet { ... - def union(other: IntSet): IntSet = l union r union other incl x -} -\end{lstlisting} - -The correctness of \code{union} can be summarized by the following -law: - -\emph{Proposition 4}: -\code{(xs union ys) contains x = xs contains x || ys contains x}. -Is that true? What hypothesis is missing? Show a counterexample. - -Show Proposition 4 using structural induction on \code{xs}. -\es -\comment{ - -\emph{Proof:} By induction on \code{xs}.
- -\Case{\code{Empty}} - -\Case{\code{NonEmpty(x, l, r)}} - -\Case{\code{NonEmpty(y, l, r)} where \code{y < x}} - -\begin{lstlisting} - \= (Empty union ys) contains x - = \> $\expl{by definition of \prog{union} in \prog{Empty}}$ - \> ys contains x - = \> $\expl{Boolean algebra}$ - \> false || ys contains x - = \> $\expl{by definition of \prog{contains} in \prog{Empty} (reverse)}$ - \> (Empty contains x) || (ys contains x) -\end{lstlisting} - -\begin{lstlisting} - \= (NonEmpty(x, l, r) union ys) contains x - = \> $\expl{by definition of \prog{union} in \prog{NonEmpty}}$ - \> (l union r union ys incl x) contains x - = \> $\expl{by Proposition 2}$ - \> true - = \> $\expl{Boolean algebra}$ - \> true || (ys contains x) - = \> $\expl{by definition of \prog{contains} in \prog{NonEmpty} (reverse)}$ - \> (NonEmpty(x, l, r) contains x) || (ys contains x) -\end{lstlisting} - -\begin{lstlisting} - \= (NonEmpty(y, l, r) union ys) contains x - = \> $\expl{by definition of \prog{union} in \prog{NonEmpty}}$ - \> (l union r union ys incl y) contains x - = \> $\expl{by Proposition 3}$ - \> (l union r union ys) contains x - = \> $\expl{by the induction hypothesis}$ - \> ((l union r) contains x) || (ys contains x) - = \> $\expl{by Proposition 3}$ - \> ((l union r incl y) contains x) || (ys contains x) -\end{lstlisting} - -\Case{\code{NonEmpty(y, l, r)} where \code{y < x}} - ... is analogous. - -\es -}} -\chapter{\label{sec:for-notation}For-Comprehensions} - -The last chapter demonstrated that higher-order functions such as -\verb@map@, \verb@flatMap@, \verb@filter@ provide powerful -constructions for dealing with lists. But sometimes the level of -abstraction required by these functions makes a program hard to -understand. - -To help understandbility, Scala has a special notation which -simplifies common patterns of applications of higher-order functions. -This notation builds a bridge between set-comprehensions in -mathematics and for-loops in imperative languages such as C or -Java. 
It also closely resembles the query notation of relational -databases. - -As a first example, say we are given a list \code{persons} of persons -with \code{name} and \code{age} fields. To obtain the names of all -persons in the list who are aged over 20, one can write: -\begin{lstlisting} -for (val p <- persons; p.age > 20) yield p.name -\end{lstlisting} -This is equivalent to the following expression, which uses -higher-order functions \code{filter} and \code{map}: -\begin{lstlisting} -persons filter (p => p.age > 20) map (p => p.name) -\end{lstlisting} -The for-comprehension looks a bit like a for-loop in imperative languages, -except that it constructs a list of the results of all iterations. - -Generally, a for-comprehension is of the form -\begin{lstlisting} -for ( $s$ ) yield $e$ -\end{lstlisting} -Here, $s$ is a sequence of {\em generators} and {\em filters}. A {\em -generator} is of the form \code{val x <- e}, where \code{e} is a -list-valued expression. It binds \code{x} to successive values in the -list. A {\em filter} is an expression \code{f} of type -\code{boolean}. It omits from consideration all bindings for which -\code{f} is \code{false}. The sequence $s$ starts in each case with a -generator. If there are several generators in a sequence, later -generators vary more rapidly than earlier ones. - -Here are two examples that show how for-comprehensions are used. -First, let's redo an example of the previous chapter: Given a positive -integer $n$, find all pairs of positive integers $i$ and $j$, where $1 -\leq j < i < n$ such that $i + j$ is prime. With a for-comprehension -this problem is solved as follows: -\begin{lstlisting} -for (val i <- List.range(1, n); - val j <- List.range(1, i); - isPrime(i+j)) yield Pair(i, j) -\end{lstlisting} -This is arguably much clearer than the solution using \code{map}, -\code{flatMap} and \code{filter} that we have developed previously.
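The two queries above can be tried out with the following minimal sketch. It is written in present-day Scala syntax (generators without \code{val}, filters introduced by \code{if}, tuples instead of \code{Pair}), which differs from the notation used in this text. The class \code{Person}, the sample data and the naive \code{isPrime} are assumptions made only for illustration.

```scala
// A sketch in present-day Scala syntax; Person, the sample data and
// isPrime are hypothetical helpers, not part of the original text.
object ForComprehensionSketch {
  final case class Person(name: String, age: Int)

  val persons: List[Person] =
    List(Person("Ann", 31), Person("Bob", 17), Person("Cy", 25))

  // One generator followed by one filter.
  val adults: List[String] =
    for (p <- persons if p.age > 20) yield p.name

  // The same query written with filter and map.
  val adultsDesugared: List[String] =
    persons.filter(p => p.age > 20).map(p => p.name)

  // Naive primality test, adequate for small inputs only.
  def isPrime(k: Int): Boolean =
    k > 1 && (2 until k).forall(d => k % d != 0)

  // Two generators and a filter: the later generator (j) varies fastest.
  def primePairs(n: Int): List[(Int, Int)] =
    for {
      i <- List.range(1, n)
      j <- List.range(1, i)
      if isPrime(i + j)
    } yield (i, j)
}
```

In the result of \code{primePairs}, all pairs for one value of \code{i} appear together before the pairs for the next value, which reflects that the later generator varies more rapidly.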
- -As a second example, consider computing the scalar product of two -vectors \code{xs} and \code{ys}. Using a for-comprehension, this can -be written as follows. -\begin{lstlisting} - sum (for(val (x, y) <- xs zip ys) yield x * y) -\end{lstlisting} - -\section{The N-Queens Problem} - -For-comprehensions are especially useful for solving combinatorial -puzzles. An example of such a puzzle is the 8-queens problem: Given a -standard chessboard, place 8 queens such that no queen is in check from any -other (a queen can check another piece if they are on the same -column, row, or diagonal). We will now develop a solution to this -problem, generalizing it to chessboards of arbitrary size. Hence, the -problem is to place $n$ queens on a chessboard of size $n \times n$. - -To solve this problem, note that we need to place a queen in each row. -So we could place queens in successive rows, each time checking that a -newly placed queen is not in check from any other queens that have -already been placed. In the course of this search, it might happen -that a queen to be placed in row $k$ would be in check in all fields -of that row from queens in row $1$ to $k-1$. In that case, we need to -abort that part of the search in order to continue with a different -configuration of queens in rows $1$ to $k-1$. - -This suggests a recursive algorithm. Assume that we have already -generated all solutions of placing $k-1$ queens on a board of size $n -\times n$. We can represent each such solution by a list of length -$k-1$ of column numbers (which can range from $1$ to $n$). We treat -these partial solution lists as stacks, where the column number of the -queen in row $k-1$ comes first in the list, followed by the column -number of the queen in row $k-2$, etc. The bottom of the stack is the -column number of the queen placed in the first row of the board. All -solutions together are then represented as a list of lists, with one -element for each solution.
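To make the stack representation concrete, here is a small sketch in present-day Scala syntax. The helper names \code{positions} and \code{extend} are hypothetical; they only illustrate how a list of column numbers corresponds to board positions and how a partial solution grows by one row.

```scala
// A sketch of the representation only; positions and extend are
// hypothetical helpers and perform no safety check.
object QueensRepresentationSketch {
  // A partial solution lists column numbers, most recently placed row first.
  // positions recovers (row, column) pairs, with rows numbered from 1.
  def positions(queens: List[Int]): List[(Int, Int)] = {
    val k = queens.length
    queens.zipWithIndex.map { case (col, i) => (k - i, col) }
  }

  // Placing a queen in the next row pushes its column onto the stack.
  def extend(col: Int, queens: List[Int]): List[Int] = col :: queens
}
```

For instance, \code{List(4, 2)} stands for queens at row 1, column 2 and row 2, column 4; extending it with column 1 gives \code{List(1, 4, 2)}.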
- -Now, to place the $k$'th queen, we generate all possible extensions -of each previous solution by one more queen. This yields another list -of solution lists, this time of length $k$. We continue the process -until we have reached solutions of the size of the chessboard $n$. -This algorithmic idea is embodied in function \code{placeQueens} below: -\begin{lstlisting} -def queens(n: int): List[List[int]] = { - def placeQueens(k: int): List[List[int]] = - if (k == 0) List(List()) - else for (val queens <- placeQueens(k - 1); - val column <- List.range(1, n + 1); - isSafe(column, queens, 1)) yield column :: queens; - placeQueens(n); -} -\end{lstlisting} - -\begin{exercise} Write the function -\begin{lstlisting} - def isSafe(col: int, queens: List[int], delta: int): boolean -\end{lstlisting} -which tests whether a queen in the given column \verb@col@ is safe with -respect to the \verb@queens@ already placed. Here, \verb@delta@ is the difference between the row of the queen to be -placed and the row of the first queen in the list. -\end{exercise} - -\section{Querying with For-Comprehensions} - -The for-notation is essentially equivalent to common operations of -database query languages. For instance, say we are given a -database \code{books}, represented as a list of books, where -\code{Book} is defined as follows.
-\begin{lstlisting} -case class Book(title: String, authors: List[String]); -\end{lstlisting} -Here is a small example database: -\begin{lstlisting} -val books: List[Book] = List( - Book("Structure and Interpretation of Computer Programs", - List("Abelson, Harold", "Sussman, Gerald J.")), - Book("Principles of Compiler Design", - List("Aho, Alfred", "Ullman, Jeffrey")), - Book("Programming in Modula-2", - List("Wirth, Niklaus")), - Book("Introduction to Functional Programming", - List("Bird, Richard")), - Book("The Java Language Specification", - List("Gosling, James", "Joy, Bill", "Steele, Guy", "Bracha, Gilad"))); -\end{lstlisting} -Then, to find the titles of all books whose author's last name is ``Ullman'': -\begin{lstlisting} -for (val b <- books; val a <- b.authors; a startsWith "Ullman") -yield b.title -\end{lstlisting} -(Here, \code{startsWith} is a method in \code{java.lang.String}). Or, -to find the titles of all books that have the string ``Program'' in -their title: -\begin{lstlisting} -for (val b <- books; (b.title indexOf "Program") >= 0) -yield b.title -\end{lstlisting} -Or, to find the names of all authors that have written at least two -books in the database: -\begin{lstlisting} -for (val b1 <- books; val b2 <- books; b1 != b2; - val a1 <- b1.authors; val a2 <- b2.authors; a1 == a2) -yield a1 -\end{lstlisting} -The last solution is not yet perfect, because authors will appear -several times in the list of results. We still need to remove -duplicate authors from result lists. This can be achieved with the -following function. -\begin{lstlisting} -def removeDuplicates[a](xs: List[a]): List[a] = - if (xs.isEmpty) xs - else xs.head :: removeDuplicates(xs.tail filter (x => x != xs.head)); -\end{lstlisting} -Note that the last expression in method \code{removeDuplicates} -can be equivalently expressed using a for-comprehension.
-\begin{lstlisting} -xs.head :: removeDuplicates(for (val x <- xs.tail; x != xs.head) yield x) -\end{lstlisting} - -\section{Translation of For-Comprehensions} - -Every for-comprehension can be expressed in terms of the three -higher-order functions \code{map}, \code{flatMap} and \code{filter}. -Here is the translation scheme, which is also used by the Scala compiler. -\begin{itemize} -\item -A simple for-comprehension -\begin{lstlisting} -for (val x <- e) yield e' -\end{lstlisting} -is translated to -\begin{lstlisting} -e.map(x => e') -\end{lstlisting} -\item -A for-comprehension -\begin{lstlisting} -for (val x <- e; f; s) yield e' -\end{lstlisting} -where \code{f} is a filter and \code{s} is a (possibly empty) -sequence of generators or filters -is translated to -\begin{lstlisting} -for (val x <- e.filter(x => f); s) yield e' -\end{lstlisting} -and then translation continues with the latter expression. -\item -A for-comprehension -\begin{lstlisting} -for (val x <- e; val y <- e'; s) yield e'' -\end{lstlisting} -where \code{s} is a (possibly empty) -sequence of generators or filters -is translated to -\begin{lstlisting} -e.flatMap(x => for (val y <- e'; s) yield e'') -\end{lstlisting} -and then translation continues with the latter expression. -\end{itemize} -For instance, taking our ``pairs of integers whose sum is prime'' example: -\begin{lstlisting} -for { val i <- range(1, n); - val j <- range(1, i); - isPrime(i+j) -} yield (i, j) -\end{lstlisting} -Here is what we get when we translate this expression: -\begin{lstlisting} -range(1, n) - .flatMap(i => - range(1, i) - .filter(j => isPrime(i+j)) - .map(j => (i, j))) -\end{lstlisting} - -Conversely, it would also be possible to express functions \code{map}, -\code{flatMap} and \code{filter} using for-comprehensions. Here are the -three functions again, this time implemented using for-comprehensions.
-\begin{lstlisting} -object Demo { - def map[a, b](xs: List[a], f: a => b): List[b] = - for (val x <- xs) yield f(x); - - def flatMap[a, b](xs: List[a], f: a => List[b]): List[b] = - for (val x <- xs; val y <- f(x)) yield y; - - def filter[a](xs: List[a], p: a => boolean): List[a] = - for (val x <- xs; p(x)) yield x; -} -\end{lstlisting} -Not surprisingly, the translation of the for-comprehension in the body of -\code{Demo.map} will produce a call to \code{map} in class \code{List}. -Similarly, \code{Demo.flatMap} and \code{Demo.filter} translate to -\code{flatMap} and \code{filter} in class \code{List}. - -\begin{exercise} -Define the following function in terms of \code{for}. -\begin{lstlisting} -def flatten[a](xss: List[List[a]]): List[a] = - (xss :\ List()) ((xs, ys) => xs ::: ys) -\end{lstlisting} -\end{exercise} - -\begin{exercise} -Translate -\begin{lstlisting} -for { val b <- books; val a <- b.authors; a startsWith "Bird" } yield b.title -for { val b <- books; (b.title indexOf "Program") >= 0 } yield b.title -\end{lstlisting} -to higher-order functions. -\end{exercise} - -\section{For-Loops}\label{sec:for-loops} - -For-comprehensions resemble for-loops in imperative languages, except -that they produce a list of results. Sometimes, a list of results is -not needed but we would still like the flexibility of generators and -filters in iterations over lists. This is made possible by a variant -of the for-comprehension syntax, which expresses for-loops: -\begin{lstlisting} -for ( $s$ ) $e$ -\end{lstlisting} -This construct is the same as the standard for-comprehension syntax -except that the keyword \code{yield} is missing. The for-loop is -executed by executing the expression $e$ for each element generated -from the sequence of generators and filters $s$.
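As a minimal sketch of such a loop, the following code sums the even elements of a matrix. It uses present-day Scala syntax, and the names \code{sumEven} and \code{sumEvenByForeach} are hypothetical. Since a for-loop produces no result list, its only purpose is the side effect of its body, here an update of a local variable. The second function spells out the same iteration with explicit calls to \code{foreach} and \code{filter}, as one plausible hand-written equivalent rather than the exact output of the compiler.

```scala
// A sketch in present-day Scala syntax; both function names are hypothetical.
object ForLoopSketch {
  // For-loop with two generators and a filter; the body runs once per
  // generated element and is used only for its side effect.
  def sumEven(xss: List[List[Int]]): Int = {
    var total = 0
    for (xs <- xss; x <- xs if x % 2 == 0) total += x
    total
  }

  // A hand-written equivalent using higher-order methods.
  def sumEvenByForeach(xss: List[List[Int]]): Int = {
    var total = 0
    xss.foreach(xs => xs.filter(x => x % 2 == 0).foreach(x => total += x))
    total
  }
}
```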
- -As an example, the following expression prints out all elements of a -matrix represented as a list of lists: - \begin{lstlisting} -for (xs <- xss) { - for (x <- xs) System.out.print(x + "\t") - System.out.println() -} -\end{lstlisting} -The translation of for-loops to higher-order methods of class -\code{List} is similar to the translation of for-comprehensions, but -is simpler. Where for-comprehensions translate to \code{map} and -\code{flatMap}, for-loops translate in each case to \code{foreach}. - -\section{Generalizing For} - -We have seen that the translation of for-comprehensions only relies on -the presence of methods \code{map}, \code{flatMap}, and -\code{filter}. Therefore it is possible to apply the same notation to -generators that produce objects other than lists; these objects only -have to support the three key functions \code{map}, \code{flatMap}, -and \code{filter}. - -The standard Scala library has several other abstractions that support -these three methods and with them support for-comprehensions. We will -encounter some of them in the following chapters. As a programmer you -can also use this principle to enable for-comprehensions for types you -define -- these types just need to support methods \code{map}, -\code{flatMap}, and \code{filter}. - -There are many cases where this is useful, for example database -interfaces, XML trees, or optional values. We will see in -Chapter~\ref{sec:parsers-results} how for-comprehensions can be used -in the definition of parsers for context-free grammars that construct -abstract syntax trees. - -One caveat: It is not assured automatically that the result of -translating a for-comprehension is well-typed. To ensure this, the -types of \code{map}, \code{flatMap} and \code{filter} have to be -essentially similar to the types of these methods in class \code{List}. - -To make this precise, assume you have a parameterized class - \code{C[a]} for which you want to enable for-comprehensions.
Then - \code{C} should define \code{map}, \code{flatMap} and \code{filter} - with the following types: -\begin{lstlisting} -def map[b](f: a => b): C[b] -def flatMap[b](f: a => C[b]): C[b] -def filter(p: a => boolean): C[a] -\end{lstlisting} -It would be attractive to enforce these types statically in the Scala -compiler, for instance by requiring that any type supporting -for-comprehensions implements a standard trait with these methods -\footnote{In the programming language Haskell, which has similar -constructs, this abstraction is called a ``monad with zero''}. The -problem is that such a standard trait would have to abstract over the -identity of the class \code{C}, for instance by taking \code{C} as a -type parameter. Note that this parameter would be a type constructor, -which gets applied to {\em several different} types in the signatures of -methods \code{map} and \code{flatMap}. Unfortunately, the Scala type -system is too weak to express this construct, since it can handle only -type parameters which are fully applied types. - -\chapter{Mutable State} - -Most programs we have presented so far did not have side-effects -\footnote{We ignore here the fact that some of our programs printed to -standard output, which technically is a side effect.}. Therefore, the -notion of {\em time} did not matter. For a program that terminates, -any sequence of actions would have led to the same result! This is -also reflected by the substitution model of computation, where a -rewrite step can be applied anywhere in a term, and all rewritings -that terminate lead to the same solution. In fact, this {\em -confluence} property is a deep result in $\lambda$-calculus, the -theory underlying functional programming. - -In this chapter, we introduce functions with side effects and study -their behavior. We will see that as a consequence we have to -fundamentally modify the substitution model of computation which we -employed so far.
- -\section{Stateful Objects} - -We normally view the world as a set of objects, some of which have -state that {\em changes} over time. Normally, state is associated -with a set of variables that can be changed in the course of a -computation. There is also a more abstract notion of state, which -does not refer to particular constructs of a programming language: An -object {\em has state} (or: {\em is stateful}) if its behavior is -influenced by its history. - -For instance, a bank account object has state, because the question -``can I withdraw 100 CHF?'' -might have different answers during the lifetime of the account. - -In Scala, all mutable state is ultimately built from variables. A -variable definition is written like a value definition, but starts -with \verb@var@ instead of \verb@val@. For instance, the following two -definitions introduce and initialize two variables \code{x} and -\code{count}. -\begin{lstlisting} -var x: String = "abc"; -var count = 111; -\end{lstlisting} -Like a value definition, a variable definition associates a name with -a value. But in the case of a variable definition, this association -may be changed later by an assignment. Such assignments are written -as in C or Java. Examples: -\begin{lstlisting} -x = "hello"; -count = count + 1; -\end{lstlisting} -In Scala, every defined variable has to be initialized at the point of -its definition. For instance, the statement ~\code{var x: int;}~ is -{\em not} regarded as a variable definition, because the initializer -is missing\footnote{If a statement like this appears in a class, it is -instead regarded as a variable declaration, which introduces -abstract access methods for the variable, but does not associate these -methods with a piece of state.}. If one does not know, or does not -care about, the appropriate initializer, one can use a wildcard -instead. I.e.
-\begin{lstlisting} -var x: T = _; -\end{lstlisting} -will initialize \code{x} to some default value (\code{null} for -reference types, \code{false} for booleans, and the appropriate -version of \code{0} for numeric value types). - -Real-world objects with state are represented in Scala by objects that -have variables as members. For instance, here is a class that -represents bank accounts. -\begin{lstlisting} -class BankAccount { - private var balance = 0; - def deposit(amount: int): unit = - if (amount > 0) balance = balance + amount; - - def withdraw(amount: int): int = - if (0 < amount && amount <= balance) { - balance = balance - amount; - balance - } else error("insufficient funds"); -} -\end{lstlisting} -The class defines a variable \code{balance} which contains the current -balance of an account. Methods \code{deposit} and \code{withdraw} -change the value of this variable through assignments. Note that -\code{balance} is \code{private} in class \code{BankAccount} -- hence -it cannot be accessed directly outside the class. - -To create bank-accounts, we use the usual object creation notation: -\begin{lstlisting} -val myAccount = new BankAccount -\end{lstlisting} - -\example Here is a \code{scalaint} session that deals with bank -accounts. - -\begin{lstlisting} -> :l bankaccount.scala -loading file 'bankaccount.scala' -> val account = new BankAccount -val account : BankAccount = BankAccount$\Dollar$class@1797795 -> account deposit 50 -(): scala.Unit -> account withdraw 20 -30: scala.Int -> account withdraw 20 -10: scala.Int -> account withdraw 15 -java.lang.RuntimeException: insufficient funds - at error(Predef.scala:3) - at BankAccount$\Dollar$class.withdraw(bankaccount.scala:13) - at <top-level>(console:1) -> -\end{lstlisting} -The example shows that applying the same operation (\code{withdraw -20}) twice to an account yields different results. So, clearly, -accounts are stateful objects.
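The session above can be replayed as a small self-contained program. The following is a sketch in present-day Scala syntax, not the original \code{bankaccount.scala}: the call to \code{error} is replaced by an explicit \code{RuntimeException}, and the helper \code{replay} is a hypothetical name introduced only to collect the observed results.

```scala
import scala.util.Try

// A sketch in present-day Scala syntax; replay is a hypothetical helper.
object BankAccountSketch {
  class BankAccount {
    private var balance = 0

    def deposit(amount: Int): Unit =
      if (amount > 0) balance = balance + amount

    def withdraw(amount: Int): Int =
      if (0 < amount && amount <= balance) {
        balance = balance - amount
        balance
      } else throw new RuntimeException("insufficient funds")
  }

  // Replays the session: two identical withdrawals, then a failing one.
  def replay(): (Int, Int, Boolean) = {
    val account = new BankAccount
    account.deposit(50)
    val first = account.withdraw(20)   // balance drops from 50 to 30
    val second = account.withdraw(20)  // same call, balance drops to 10
    val thirdFails = Try(account.withdraw(15)).isFailure
    (first, second, thirdFails)
  }
}
```

The two calls \code{withdraw(20)} return different values, so the result of an operation depends on the history of the account.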
- -\paragraph{Sameness and Change} -Assignments pose new problems in deciding when two expressions are -``the same''. -If assignments are excluded, and one writes -\begin{lstlisting} -val x = E; val y = E; -\end{lstlisting} -where \code{E} is some arbitrary expression, -then \code{x} and \code{y} can reasonably be assumed to be the same. -I.e. one could have equivalently written -\begin{lstlisting} -val x = E; val y = x; -\end{lstlisting} -(This property is usually called {\em referential transparency}). But -once we admit assignments, the two definition sequences are different. -Consider: -\begin{lstlisting} -val x = new BankAccount; val y = new BankAccount; -\end{lstlisting} -To answer the question whether \code{x} and \code{y} are the same, we -need to be more precise what ``sameness'' means. This meaning is -captured in the notion of {\em operational equivalence}, which, -somewhat informally, is stated as follows. - -Suppose we have two definitions of \code{x} and \code{y}. -To test whether \code{x} and \code{y} define the same value, proceed -as follows. -\begin{itemize} -\item -Execute the definitions followed by an -arbitrary sequence \code{S} of operations that involve \code{x} and -\code{y}. Observe the results (if any). -\item -Then, execute the definitions with another sequence \code{S'} which -results from \code{S} by renaming all occurrences of \code{y} in -\code{S} to \code{x}. -\item -If the results of running \code{S'} are different, then surely -\code{x} and \code{y} are different. -\item -On the other hand, if all possible pairs of sequences \code{(S, S')} -yield the same results, then \code{x} and \code{y} are the same. -\end{itemize} -In other words, operational equivalence regards two definitions -\code{x} and \code{y} as defining the same value, if no possible -experiment can distinguish between \code{x} and \code{y}. An -experiment in this context consists of two versions of an arbitrary program which use either -\code{x} or \code{y}.
- -Given this definition, let's test whether -\begin{lstlisting} -val x = new BankAccount; val y = new BankAccount; -\end{lstlisting} -defines values \code{x} and \code{y} which are the same. -Here are the definitions again, followed by a test sequence: - -\begin{lstlisting} -> val x = new BankAccount -> val y = new BankAccount -> x deposit 30 -() -> y withdraw 20 -java.lang.RuntimeException: insufficient funds -\end{lstlisting} - -Now, rename all occurrences of \code{y} in that sequence to -\code{x}. We get: -\begin{lstlisting} -> val x = new BankAccount -> val y = new BankAccount -> x deposit 30 -() -> x withdraw 20 -10 -\end{lstlisting} -Since the final results are different, we have established that -\code{x} and \code{y} are not the same. -On the other hand, if we define -\begin{lstlisting} -val x = new BankAccount; val y = x -\end{lstlisting} -then no sequence of operations can distinguish between \code{x} and -\code{y}, so \code{x} and \code{y} are the same in this case. - -\paragraph{Assignment and the Substitution Model} -These examples show that our previous substitution model of -computation cannot be used anymore. After all, under this -model we could always replace a value name by its -defining expression. -For instance in -\begin{lstlisting} -val x = new BankAccount; val y = x -\end{lstlisting} -the \code{x} in the definition of \code{y} could -be replaced by \code{new BankAccount}. -But we have seen that this change leads to a different program. -So the substitution model must be invalid, once we add assignments. - -\section{Imperative Control Structures} - -Scala has the \code{while} and \code{do-while} loop constructs known -from the C and Java languages. There is also a single-branch \code{if} -which leaves out the else-part as well as a \code{return} statement which -aborts a function prematurely. This makes it possible to program in a -conventional imperative style.
For instance, the following function, -which computes the \code{n}'th power of a given parameter \code{x}, is -implemented using \code{while} and single-branch \code{if}. -\begin{lstlisting} -def power (x: double, n: int): double = { - var r = 1.0; - var b = x; - var i = n; - while (i > 0) { - if ((i & 1) == 1) { r = r * b } - if (i > 1) b = b * b; - i = i >> 1; - } - r -} -\end{lstlisting} -These imperative control constructs are in the language for -convenience. They could have been left out, as the same constructs can -be implemented using just functions. As an example, let's develop a -functional implementation of the while loop. \code{whileLoop} should -be a function that takes two parameters: a condition, of type -\code{boolean}, and a command, of type \code{unit}. Both condition and -command need to be passed by-name, so that they are evaluated -repeatedly for each loop iteration. This leads to the following -definition of \code{whileLoop}. -\begin{lstlisting} -def whileLoop(def condition: boolean)(def command: unit): unit = - if (condition) { - command; whileLoop(condition)(command) - } else {} -\end{lstlisting} -Note that \code{whileLoop} is tail recursive, so it operates in -constant stack space. - -\begin{exercise} Write a function \code{repeatLoop}, which should be -applied as follows: -\begin{lstlisting} -repeatLoop { command } ( condition ) -\end{lstlisting} -Is there also a way to obtain a loop syntax like the following? -\begin{lstlisting} -repeatLoop { command } until ( condition ) -\end{lstlisting} -\end{exercise} - -Some other control constructs known from C and Java are missing in -Scala: There are no \code{break} and \code{continue} jumps for loops. -There are also no for-loops in the Java sense -- these have been -replaced by the more general for-loop construct discussed in -Section~\ref{sec:for-loops}.
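The by-name \code{whileLoop} of this section can be tried out with the following sketch in present-day Scala syntax, where by-name parameters are written \code{=> Boolean} and \code{=> Unit} instead of with a \code{def} prefix. The function \code{sumTo} is a hypothetical client added only to exercise the loop.

```scala
// A sketch in present-day Scala syntax; sumTo is a hypothetical client.
object WhileLoopSketch {
  // Both parameters are by-name, so they are re-evaluated on every iteration.
  def whileLoop(condition: => Boolean)(command: => Unit): Unit =
    if (condition) {
      command
      whileLoop(condition)(command) // recursive call in tail position
    }

  // Sums 1 + 2 + ... + n using whileLoop in place of the built-in while.
  def sumTo(n: Int): Int = {
    var i = 1
    var acc = 0
    whileLoop(i <= n) {
      acc += i
      i += 1
    }
    acc
  }
}
```

If the parameters were passed by value instead, the condition would be evaluated only once, before the first call, and the loop would either not run at all or never terminate.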
- -\section{Extended Example: Discrete Event Simulation} - -We now discuss an example that demonstrates how assignments and -higher-order functions can be combined in interesting ways. -We will build a simulator for digital circuits. - -The example is taken from Abelson and Sussman's book -\cite{abelson-sussman:structure}. We augment their basic (Scheme-) -code by an object-oriented structure which allows code-reuse through -inheritance. The example also shows how discrete event simulation programs -in general are structured and built. - -We start with a little language to describe digital circuits. -A digital circuit is built from {\em wires} and {\em function boxes}. -Wires carry signals which are transformed by function boxes. -We will represent signals by the booleans \code{true} and -\code{false}. - -Basic function boxes (or: {\em gates}) are: -\begin{itemize} -\item An \emph{inverter}, which negates its signal. -\item An \emph{and-gate}, which sets its output to the conjunction of its inputs. -\item An \emph{or-gate}, which sets its output to the disjunction of its -inputs. -\end{itemize} -Other function boxes can be built by combining basic ones. - -Gates have {\em delays}, so an output of a gate will change only some -time after its inputs change. - -\paragraph{A Language for Digital Circuits} - -We describe the elements of a digital circuit by the following set of -Scala classes and functions. - -First, there is a class \code{Wire} for wires. -We can construct wires as follows. -\begin{lstlisting} -val a = new Wire; -val b = new Wire; -val c = new Wire; -\end{lstlisting} -Second, there are functions -\begin{lstlisting} -def inverter(input: Wire, output: Wire): unit -def andGate(a1: Wire, a2: Wire, output: Wire): unit -def orGate(o1: Wire, o2: Wire, output: Wire): unit -\end{lstlisting} -which ``make'' the basic gates we need (as side-effects). -More complicated function boxes can now be built from these.
-For instance, to construct a half-adder, we can define: - -\begin{lstlisting} - def halfAdder(a: Wire, b: Wire, s: Wire, c: Wire): unit = { - val d = new Wire; - val e = new Wire; - orGate(a, b, d); - andGate(a, b, c); - inverter(c, e); - andGate(d, e, s); - } -\end{lstlisting} -This abstraction can itself be used, for instance in defining a full -adder: -\begin{lstlisting} - def fullAdder(a: Wire, b: Wire, cin: Wire, sum: Wire, cout: Wire) = { - val s = new Wire; - val c1 = new Wire; - val c2 = new Wire; - halfAdder(a, cin, s, c1); - halfAdder(b, s, sum, c2); - orGate(c1, c2, cout); - } -\end{lstlisting} -Class \code{Wire} and functions \code{inverter}, \code{andGate}, and -\code{orGate} represent thus a little language in which users can -define digital circuits. We now give implementations of this class -and these functions, which allow one to simulate circuits. -These implementations are based on a simple and general API for -discrete event simulation. - -\paragraph{The Simulation API} - -Discrete event simulation performs user-defined \emph{actions} at -specified \emph{times}. -An {\em action} is represented as a function which takes no parameters and -returns a \code{unit} result: -\begin{lstlisting} -type Action = () => unit; -\end{lstlisting} -The \emph{time} is simulated; it is not the actual ``wall-clock'' time. - -A concrete simulation will be done inside an object which inherits -from the abstract \code{Simulation} class. This class has the following -signature: - -\begin{lstlisting} -abstract class Simulation { - def currentTime: int; - def afterDelay(delay: int, def action: Action): unit; - def run: unit; -} -\end{lstlisting} -Here, -\code{currentTime} returns the current simulated time as an integer -number, -\code{afterDelay} schedules an action to be performed at a specified -delay after \code{currentTime}, and -\code{run} runs the simulation until there are no further actions to be -performed. 
- -\paragraph{The Wire Class} -A wire needs to support three basic actions. -\begin{itemize} -\item[] -\code{getSignal: boolean}~~ returns the current signal on the wire. -\item[] -\code{setSignal(sig: boolean): unit}~~ sets the wire's signal to \code{sig}. -\item[] -\code{addAction(p: Action): unit}~~ attaches the specified procedure -\code{p} to the {\em actions} of the wire. All attached action -procedures will be executed every time the signal of a wire changes. -\end{itemize} -Here is an implementation of the \code{Wire} class: -\begin{lstlisting} -class Wire { - private var sigVal = false; - private var actions: List[Action] = List(); - def getSignal = sigVal; - def setSignal(s: boolean) = - if (s != sigVal) { - sigVal = s; - actions.foreach(action => action()); - } - def addAction(a: Action) = { - actions = a :: actions; a() - } -} -\end{lstlisting} -Two private variables make up the state of a wire. The variable -\code{sigVal} represents the current signal, and the variable -\code{actions} represents the action procedures currently attached to -the wire. - -\paragraph{The Inverter Class} -We implement an inverter by installing an action on its input wire, -namely the action which puts the negated input signal onto the output -signal. The action needs to take effect at \code{InverterDelay} -simulated time units after the input changes. This suggests the -following implementation: -\begin{lstlisting} -def inverter(input: Wire, output: Wire) = { - def invertAction() = { - val inputSig = input.getSignal; - afterDelay(InverterDelay, () => output.setSignal(!inputSig)) - } - input addAction invertAction -} -\end{lstlisting} - -\paragraph{The And-Gate Class} -And-gates are implemented analogously to inverters. The action of an -\code{andGate} is to output the conjunction of its input signals. -This should happen at \code{AndGateDelay} simulated time units after -any one of its two inputs changes. 
Hence, the following implementation: -\begin{lstlisting} -def andGate(a1: Wire, a2: Wire, output: Wire) = { - def andAction() = { - val a1Sig = a1.getSignal; - val a2Sig = a2.getSignal; - afterDelay(AndGateDelay, () => output.setSignal(a1Sig & a2Sig)); - } - a1 addAction andAction; - a2 addAction andAction; -} -\end{lstlisting} - -\begin{exercise} Write the implementation of \code{orGate}. -\end{exercise} - -\begin{exercise} Another way is to define an or-gate by a combination of -inverters and and-gates. Define a function \code{orGate} in terms of -\code{andGate} and \code{inverter}. What is the delay time of this function? -\end{exercise} - -\paragraph{The Simulation Class} - -Now, we just need to implement class \code{Simulation}, and we are -done. The idea is that we maintain inside a \code{Simulation} object -an \emph{agenda} of actions to perform. The agenda is represented as -a list of pairs of actions and the times they need to be run. The -agenda list is sorted, so that earlier actions come before later ones. -\begin{lstlisting} -class Simulation { - private type Agenda = List[Pair[int, Action]]; - private var agenda: Agenda = List(); -\end{lstlisting} -There is also a private variable \code{curtime} to keep track of the -current simulated time. -\begin{lstlisting} - private var curtime = 0; -\end{lstlisting} -An application of the method \code{afterDelay(delay, action)} -inserts the pair \code{(curtime + delay, action)} into the -\code{agenda} list at the appropriate place.
-\begin{lstlisting} - def afterDelay(delay: int, def action: Action): unit = { - val actiontime = curtime + delay; - def insertAction(ag: Agenda): Agenda = ag match { - case List() => - Pair(actiontime, action) :: ag - case (first @ Pair(time, act)) :: ag1 => - if (actiontime < time) Pair(actiontime, action) :: ag - else first :: insertAction(ag1) - } - agenda = insertAction(agenda) - } -\end{lstlisting} -An application of the \code{run} method removes successive elements -from the \code{agenda} and performs their actions, advancing the -simulated time to the time of each action. -It continues until the agenda is empty: -\begin{lstlisting} -def run = { - afterDelay(0, () => System.out.println("*** simulation started ***")); - def loop: unit = agenda match { - case List() => - case Pair(time, action) :: agenda1 => - agenda = agenda1; curtime = time; action(); loop - } - loop -} -\end{lstlisting} - - -\paragraph{Running the Simulator} -To run the simulator, we still need a way to inspect changes of -signals on wires. For this purpose, we write a function \code{probe}. -\begin{lstlisting} -def probe(name: String, wire: Wire): unit = { - wire addAction (() => - System.out.println( - name + " " + currentTime + " new_value = " + wire.getSignal) - ) -} -\end{lstlisting} -Now, to see the simulator in action, let's define four wires, and place -probes on two of them: -\begin{lstlisting} -> val input1 = new Wire -> val input2 = new Wire -> val sum = new Wire -> val carry = new Wire - -> probe("sum", sum) -sum 0 new_value = false -> probe("carry", carry) -carry 0 new_value = false -\end{lstlisting} -Now let's define a half-adder connecting the wires: -\begin{lstlisting} -> halfAdder(input1, input2, sum, carry); -\end{lstlisting} -Finally, set one after another the signals on the two input wires to -\code{true} and run the simulation.
-\begin{lstlisting} -> input1 setSignal true; run -*** simulation started *** -sum 8 new_value = true -> input2 setSignal true; run -carry 11 new_value = true -sum 15 new_value = false -\end{lstlisting} - -\section{Summary} - -We have seen in this chapter the constructs that let us model state in -Scala -- these are variables, assignments, and imperative control -structures. State and assignment complicate our mental model of -computation. In particular, referential transparency is lost. On the -other hand, assignment gives us new ways to formulate programs -elegantly. As always, it depends on the situation whether purely -functional programming or programming with assignments works best. - -\chapter{Computing with Streams} - -The previous chapters have introduced variables, assignment and -stateful objects. We have seen how real-world objects that change -with time can be modelled by changing the state of variables in a -computation. Time changes in the real world thus are modelled by time -changes in program execution. Of course, such time changes are usually -stretched out or compressed, but their relative order is the same. -This seems quite natural, but there is also a price to pay: Our simple -and powerful substitution model for functional computation is no -longer applicable once we introduce variables and assignment. - -Is there another way? Can we model state change in the real world -using only immutable functions? Taking mathematics as a guide, the -answer is clearly yes: A time-changing quantity is simply modelled by -a function \code{f(t)} with a time parameter \code{t}. The same can be -done in computation. Instead of overwriting a variable with successive -values, we represent all these values as successive elements in a -list. So, a mutable variable \code{var x: T} gets replaced by an -immutable value \code{val x: List[T]}.
In a sense, we trade space for -time -- the different values of the variable now all exist concurrently -as different elements of the list. One advantage of the list-based -view is that we can ``time-travel'', i.e. view several successive -values of the variable at the same time. Another advantage is that we -can make use of the powerful library of list processing functions, -which often simplifies computation. For instance, consider the -imperative way to compute the sum of all prime numbers in an interval: -\begin{lstlisting} -def sumPrimes(start: int, end: int): int = { - var i = start; - var acc = 0; - while (i < end) { - if (isPrime(i)) acc = acc + i; - i = i + 1; - } - acc -} -\end{lstlisting} -Note that the variable \code{i} ``steps through'' all values of the interval -\code{[start .. end-1]}. - -A more functional way is to represent the list of values of variable \code{i} directly as \code{range(start, end)}. Then the function can be rewritten as follows. -\begin{lstlisting} -def sumPrimes(start: int, end: int) = - sum(range(start, end) filter isPrime); -\end{lstlisting} - -No contest which program is shorter and clearer! However, the -functional program is also considerably less efficient since it -constructs a list of all numbers in the interval, and then another one -for the prime numbers. Even worse from an efficiency point of view is -the following example: - -To find the second prime number between \code{1000} and \code{10000}: -\begin{lstlisting} - range(1000, 10000) filter isPrime at 1 -\end{lstlisting} -Here, the list of all numbers between \code{1000} and \code{10000} is -constructed. But most of that list is never inspected! - -However, we can obtain efficient execution for examples like these by -a trick: -\begin{quote} -%\red - Avoid computing the tail of a sequence unless that tail is actually - necessary for the computation. -\end{quote} -We define a new class for such sequences, which is called \code{Stream}.
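To make the trick concrete, here is a minimal sketch of how such a sequence class could be built. It is written in present-day Scala syntax, which differs from the notation used in this text, and the names \code{LazySeq}, \code{cons}, \code{empty} and \code{cellsBuilt} are hypothetical; it is not the library's actual \code{Stream} implementation. The tail is passed by name and cached on first access, so it is computed at most once and only when demanded.

```scala
// Sketch only: LazySeq, cons, empty and cellsBuilt are hypothetical names,
// written in present-day Scala rather than the notation of this text.
object LazySeqSketch {
  trait LazySeq[+A] {
    def isEmpty: Boolean
    def head: A
    def tail: LazySeq[A]
  }

  // The empty sequence has neither head nor tail.
  val empty: LazySeq[Nothing] = new LazySeq[Nothing] {
    def isEmpty = true
    def head = throw new NoSuchElementException("head of empty sequence")
    def tail = throw new NoSuchElementException("tail of empty sequence")
  }

  // The tail is a by-name parameter: it is evaluated on the first call
  // of `tail` and cached afterwards by the lazy val.
  def cons[A](hd: A, tl: => LazySeq[A]): LazySeq[A] = new LazySeq[A] {
    def isEmpty = false
    def head = hd
    lazy val tail: LazySeq[A] = tl
  }

  // Counts how many cells have been constructed so far.
  var cellsBuilt = 0

  def range(start: Int, end: Int): LazySeq[Int] =
    if (start >= end) empty
    else { cellsBuilt += 1; cons(start, range(start + 1, end)) }
}
```

Under these assumptions, \code{LazySeqSketch.range(1000, 10000)} builds a single cell; a second cell is built only once \code{tail} is called on the result.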
- -Streams are created using the constant \code{empty} and the constructor \code{cons}, -which are both defined in module \code{scala.Stream}. For instance, the following -expression constructs a stream with elements \code{1} and \code{2}: -\begin{lstlisting} -Stream.cons(1, Stream.cons(2, Stream.empty)) -\end{lstlisting} -As another example, here is the analogue of \code{List.range}, -but returning a stream instead of a list: -\begin{lstlisting} -def range(start: Int, end: Int): Stream[Int] = - if (start >= end) Stream.empty - else Stream.cons(start, range(start + 1, end)); -\end{lstlisting} -(This function is also defined as given above in module -\code{Stream}.) Even though \code{Stream.range} and \code{List.range} -look similar, their execution behavior is completely different: - -\code{Stream.range} immediately returns with a \code{Stream} object -whose first element is \code{start}. All other elements are computed -only when they are \emph{demanded} by calling the \code{tail} method -(which might be never at all). - -Streams are accessed just like lists. As for lists, the basic access -methods are \code{isEmpty}, \code{head} and \code{tail}. For instance, -we can print all elements of a stream as follows. -\begin{lstlisting} -def print[a](xs: Stream[a]): unit = - if (!xs.isEmpty) { System.out.println(xs.head); print(xs.tail) } -\end{lstlisting} -Streams also support almost all other methods defined on lists (see -below for where their method sets differ). For instance, we can find -the second prime number between \code{1000} and \code{10000} by applying methods -\code{filter} and \code{at} on an interval stream: -\begin{lstlisting} - Stream.range(1000, 10000) filter isPrime at 1 -\end{lstlisting} -The difference to the previous list-based implementation is that now -we do not needlessly construct and test for primality any numbers -beyond 1013, the second prime in the interval.
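The saving can be observed directly. The following sketch uses \code{LazyList}, the lazily evaluated sequence of the present-day Scala standard library (2.13 and later), as a stand-in for the \code{Stream} class described here; the object name, the helper \code{isPrime} and the counter \code{largestTested} are assumptions made for illustration.

```scala
// Sketch in present-day Scala; LazyList stands in for the Stream class of this text.
object SecondPrimeSketch {
  // Records the largest number that was ever tested for primality.
  var largestTested = 0

  // Simple trial division, instrumented to record its argument.
  def isPrime(n: Int): Boolean = {
    largestTested = math.max(largestTested, n)
    n > 1 && (2 to math.sqrt(n.toDouble).toInt).forall(n % _ != 0)
  }

  // Second prime in [1000, 10000): index 1 of the lazily filtered interval.
  def secondPrime: Int = {
    val primes = LazyList.range(1000, 10000).filter(isPrime)
    primes(1)
  }
}
```

After evaluating \code{SecondPrimeSketch.secondPrime}, the counter shows which part of the interval was actually inspected; replacing \code{LazyList} by \code{List} makes the filter run over the whole interval.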
- -\paragraph{Consing and appending streams} Two methods in class \code{List} -which are not supported by class \code{Stream} are \code{::} and -\code{:::}. The reason is that these methods are dispatched on their -right-hand side argument, which means that this argument needs to be -evaluated before the method is called. For instance, in the case of -\code{x :: xs} on lists, the tail \code{xs} needs to be evaluated -before \code{::} can be called and the new list can be constructed. -This does not work for streams, where we require that the tail of a -stream should not be evaluated until it is demanded by a \code{tail} operation. -The argument why list-append \code{:::} cannot be adapted to streams is analogous. - -Instead of \code{x :: xs}, one uses \code{Stream.cons(x, xs)} for -constructing a stream with first element \code{x} and (unevaluated) -rest \code{xs}. Instead of \code{xs ::: ys}, one uses the operation -\code{xs append ys}. - -\chapter{Iterators} - -Iterators are the imperative version of streams. Like streams, -iterators describe potentially infinite lists. However, there is no -data structure which contains the elements of an iterator. Instead, -iterators allow one to step through the sequence, using two abstract methods \code{next} and \code{hasNext}. -\begin{lstlisting} -trait Iterator[+a] { - def hasNext: boolean; - def next: a; -\end{lstlisting} -Method \code{next} returns successive elements. Method \code{hasNext} -indicates whether there are still more elements to be returned by -\code{next}. Iterators also support some other methods, which are -explained later. - -As an example, here is an application which prints the squares of all -numbers from 1 to 99.
-\begin{lstlisting} -var it: Iterator[int] = Iterator.range(1, 100); -while (it.hasNext) { - val x = it.next; - System.out.println(x * x) -} -\end{lstlisting} - -\section{Iterator Methods} - -Iterators support a rich set of methods besides \code{next} and -\code{hasNext}, which are described in the following. Many of these -methods mimic a corresponding functionality in lists. - -\paragraph{Append} -Method \code{append} constructs an iterator which resumes with the -given iterator \code{that} after the current iterator has finished. -\begin{lstlisting} - def append[b >: a](that: Iterator[b]): Iterator[b] = new Iterator[b] { - def hasNext = Iterator.this.hasNext || that.hasNext; - def next = if (Iterator.this.hasNext) Iterator.this.next else that.next; - } -\end{lstlisting} -The terms \code{Iterator.this.next} and \code{Iterator.this.hasNext} -in the definition of \code{append} call the corresponding methods as -they are defined in the enclosing \code{Iterator} class. If the -\code{Iterator} prefix to \code{this} had been missing, -\code{hasNext} and \code{next} would have recursively called the -methods being defined in the result of \code{append}, which is not -what we want. - -\paragraph{Map, FlatMap, Foreach} Method \code{map} -constructs an iterator which returns all elements of the original -iterator transformed by a given function \code{f}. -\begin{lstlisting} - def map[b](f: a => b): Iterator[b] = new Iterator[b] { - def hasNext = Iterator.this.hasNext; - def next = f(Iterator.this.next) - } -\end{lstlisting} -Method \code{flatMap} is like method \code{map}, except that the -transformation function \code{f} now returns an iterator. -The result of \code{flatMap} is the iterator resulting from appending -together all iterators returned from successive calls of \code{f}.
-\begin{lstlisting} - def flatMap[b](f: a => Iterator[b]): Iterator[b] = new Iterator[b] { - private var cur: Iterator[b] = Iterator.empty; - def hasNext: Boolean = - if (cur.hasNext) true - else if (Iterator.this.hasNext) { cur = f(Iterator.this.next); hasNext } - else false; - def next: b = - if (cur.hasNext) cur.next - else if (Iterator.this.hasNext) { cur = f(Iterator.this.next); next } - else error("next on empty iterator"); - } -\end{lstlisting} -Closely related to \code{map} is the \code{foreach} method, which -applies a given function to all elements of an iterator, but does not -construct a list of results. -\begin{lstlisting} - def foreach(f: a => Unit): Unit = - while (hasNext) { f(next) } -\end{lstlisting} - -\paragraph{Filter} Method \code{filter} constructs an iterator which -returns all elements of the original iterator that satisfy a criterion -\code{p}. -\begin{lstlisting} - def filter(p: a => Boolean) = new BufferedIterator[a] { - private val source = - Iterator.this.buffered; - private def skip: Unit = - while (source.hasNext && !p(source.head)) { source.next; () } - def hasNext: Boolean = - { skip; source.hasNext } - def next: a = - { skip; source.next } - def head: a = - { skip; source.head } - } -\end{lstlisting} -In fact, \code{filter} returns instances of a subclass of iterators -which are ``buffered''. A \code{BufferedIterator} object is an -iterator which has in addition a method \code{head}. This method -returns the element which would otherwise have been returned by -\code{next}, but does not advance beyond that element. Hence, the -element returned by \code{head} is returned again by the next call to -\code{head} or \code{next}. Here is the definition of the -\code{BufferedIterator} trait.
-\begin{lstlisting} -trait BufferedIterator[+a] extends Iterator[a] { - def head: a -} -\end{lstlisting} -Since \code{map}, \code{flatMap}, \code{filter}, and \code{foreach} -exist for iterators, it follows that for-comprehensions and for-loops -can also be used on iterators. For instance, the application which prints the squares of the numbers from 1 to 99 could have equivalently been expressed as follows. -\begin{lstlisting} -for (val i <- Iterator.range(1, 100)) - System.out.println(i * i); -\end{lstlisting} - -\paragraph{Zip} Method \code{zip} takes another iterator and -returns an iterator consisting of pairs of corresponding elements -returned by the two iterators. -\begin{lstlisting} - def zip[b](that: Iterator[b]) = new Iterator[Pair[a, b]] { - def hasNext = Iterator.this.hasNext && that.hasNext; - def next = Pair(Iterator.this.next, that.next); - } -} -\end{lstlisting} - -\section{Constructing Iterators} - -Concrete iterators need to provide implementations for the two -abstract methods \code{next} and \code{hasNext} in class -\code{Iterator}. The simplest iterator is \code{Iterator.empty} which -always returns an empty sequence: -\begin{lstlisting} -object Iterator { - object empty extends Iterator[All] { - def hasNext = false; - def next: All = error("next on empty iterator"); - } -\end{lstlisting} -A more interesting iterator enumerates all elements of an array. This -iterator is constructed by the \code{fromArray} method, which is also defined in the object \code{Iterator}: -\begin{lstlisting} - def fromArray[a](xs: Array[a]) = new Iterator[a] { - private var i = 0; - def hasNext: Boolean = - i < xs.length; - def next: a = - if (i < xs.length) { val x = xs(i) ; i = i + 1 ; x } - else error("next on empty iterator"); - } -\end{lstlisting} -Another iterator enumerates an integer interval. The -\code{Iterator.range} function returns an iterator which traverses a -given interval of integer values. It is defined as follows.
-\begin{lstlisting} -object Iterator { - def range(start: int, end: int) = new Iterator[int] { - private var current = start; - def hasNext = current < end; - def next = { - val r = current; - if (current < end) current = current + 1 - else throw new Error("end of iterator"); - r - } - } -} -\end{lstlisting} -All iterators seen so far terminate eventually. It is also possible to -define iterators that go on forever. For instance, the following -iterator returns successive integers from some start -value\footnote{Due to the finite representation of type \prog{int}, -numbers will wrap around at $2^{31}$.}. -\begin{lstlisting} -def from(start: int) = new Iterator[int] { - private var last = start - 1; - def hasNext = true; - def next = { last = last + 1; last } -} -\end{lstlisting} - -\section{Using Iterators} - -Here are two more examples of how iterators are used. First, to print all -elements of an array \code{xs: Array[int]}, one can write: -\begin{lstlisting} - Iterator.fromArray(xs) foreach (x => - System.out.println(x)) -\end{lstlisting} -Or, using a for-comprehension: -\begin{lstlisting} - for (val x <- Iterator.fromArray(xs)) - System.out.println(x) -\end{lstlisting} -As a second example, consider the problem of finding the indices of -all the elements in an array of \code{double}s greater than some -\code{limit}. The indices should be returned as an iterator. -This is achieved by the following expression. -\begin{lstlisting} -import Iterator._; -fromArray(xs) -.zip(from(0)) -.filter(case Pair(x, i) => x > limit) -.map(case Pair(x, i) => i) -\end{lstlisting} -Or, using a for-comprehension: -\begin{lstlisting} -import Iterator._; -for (val Pair(x, i) <- fromArray(xs) zip from(0); x > limit) -yield i -\end{lstlisting} - - - - - - - -\chapter{Combinator Parsing}\label{sec:combinator-parsing} - -In this chapter we describe how to write combinator parsers in -Scala.
Such parsers are constructed from predefined higher-order -functions, so called {\em parser combinators}, that closely model the -constructions of an EBNF grammar \cite{wirth:ebnf}. - -As running example, we consider parsers for possibly nested -lists of identifiers and numbers, which -are described by the following context-free grammar. -\bda{p{3cm}cp{10cm}} -letter &::=& /* all letters */ \\ -digit &::=& /* all digits */ \\[0.5em] -ident &::=& letter \{letter $|$ digit \}\\ -number &::=& digit \{digit\}\\[0.5em] -list &::=& `(' [listElems] `)' \\ -listElems &::=& expr [`,' listElems] \\ -expr &::=& ident | number | list - -\eda - -\section{Simple Combinator Parsing} - -In this section we will only be concerned with the task of recognizing -input strings, not with processing them. So we can describe parsers -by the sets of input strings they accept. There are two -fundamental operators over parsers: -\code{&&&} expresses the sequential composition of a parser with -another, while \code{|||} expresses an alternative. These operations -will both be defined as methods of a \code{Parser} class. We will -also define constructors for the following primitive parsers: - -\begin{tabular}{ll} -\code{empty} & The parser that accepts the empty string -\\ -\code{fail} & The parser that accepts no string -\\ -\code{chr(c: char)} - & The parser that accepts the single-character string ``$c$''. -\\ -\code{chr(p: char => boolean)} - & The parser that accepts single-character strings - ``$c$'' \\ - & for which $p(c)$ is true. -\end{tabular} - -There are also the two higher-order parser combinators \code{opt}, -expressing optionality and \code{rep}, expressing repetition. -For any parser $p$, \code{opt(}$p$\code{)} yields a parser that -accepts the strings accepted by $p$ or else the empty string, while -\code{rep(}$p$\code{)} accepts arbitrary sequences of the strings accepted by -$p$. 
In EBNF, \code{opt(}$p$\code{)} corresponds to $[p]$ and -\code{rep(}$p$\code{)} corresponds to $\{p\}$. - -The central idea of parser combinators is that parsers can be produced -by a straightforward rewrite of the grammar, replacing \code{::=} with -\code{=}, sequencing with -\code{&&&}, choice -\code{|} with \code{|||}, repetition \code{\{...\}} with -\code{rep(...)} and optional occurrence \code{[...]} with \code{opt(...)}. -Applying this process to the grammar of lists -yields the following class. -\begin{lstlisting} -abstract class ListParsers extends Parsers { - def chr(p: char => boolean): Parser; - def chr(c: char): Parser = chr(d: char => d == c); - - def letter : Parser = chr(Character.isLetter); - def digit : Parser = chr(Character.isDigit); - - def ident : Parser = letter &&& rep(letter ||| digit); - def number : Parser = digit &&& rep(digit); - def list : Parser = chr('(') &&& opt(listElems) &&& chr(')'); - def listElems : Parser = expr &&& (chr(',') &&& listElems ||| empty); - def expr : Parser = ident ||| number ||| list; -} -\end{lstlisting} -This class isolates the grammar from other aspects of parsing. It -abstracts over the type of input -and over the method used to parse a single character -(represented by the abstract method \code{chr(p: char => -boolean)}). The missing bits of information need to be supplied by code -applying the parser class. - -It remains to explain how to implement a library with the combinators -described above. We will pack combinators and their underlying -implementation in a base class \code{Parsers}, which is inherited by -\code{ListParsers}. The first question to decide is which underlying -representation type to use for a parser. We treat parsers here -essentially as functions that take a datum of the input type -\code{intype} and that yield a parse result of type -\code{Option[intype]}. The \code{Option} type is predefined as -follows.
-\begin{lstlisting} -trait Option[+a]; -case object None extends Option[All]; -case class Some[a](x: a) extends Option[a]; -\end{lstlisting} -A parser applied to some input either succeeds or fails. If it fails, -it returns the constant \code{None}. If it succeeds, it returns a -value of the form \code{Some(in1)} where \code{in1} represents the -input that remains to be parsed. -\begin{lstlisting} -abstract class Parsers { - type intype; - abstract class Parser { - type Result = Option[intype]; - def apply(in: intype): Result; -\end{lstlisting} -A parser also implements the combinators -for sequence and alternative: -\begin{lstlisting} - /*** p &&& q applies first p, and if that succeeds, then q - */ - def &&& (def q: Parser) = new Parser { - def apply(in: intype): Result = Parser.this.apply(in) match { - case None => None - case Some(in1) => q(in1) - } - } - - /*** p ||| q applies first p, and, if that fails, then q. - */ - def ||| (def q: Parser) = new Parser { - def apply(in: intype): Result = Parser.this.apply(in) match { - case None => q(in) - case s => s - } - } -\end{lstlisting} -The implementations of the primitive parsers \code{empty} and \code{fail} -are trivial: -\begin{lstlisting} - val empty = new Parser { def apply(in: intype): Result = Some(in) } - val fail = new Parser { def apply(in: intype): Result = None } -\end{lstlisting} -The higher-order parser combinators \code{opt} and \code{rep} can be -defined in terms of the combinators for sequence and alternative: -\begin{lstlisting} - def opt(p: Parser): Parser = p ||| empty; // p? = (p | <empty>) - def rep(p: Parser): Parser = opt(rep1(p)); // p* = [p+] - def rep1(p: Parser): Parser = p &&& rep(p); // p+ = p p* -} // end Parser -\end{lstlisting} -To run combinator parsers, we still need to decide on a way to handle -parser input. Several possibilities exist: The input could be -represented as a list, as an array, or as a random access file. 
Note -that the presented combinator parsers use backtracking to change from -one alternative to another. Therefore, it must be possible to reset -input to a point that was previously parsed. If one restricted the -focus to LL(1) grammars, a non-backtracking implementation of the -parser combinators in class \code{Parsers} would also be possible. In -that case sequential input methods based on (say) iterators or -sequential files would also be possible. - -In our example, we represent the input by a pair of a string, which -contains the input phrase as a whole, and an index, which represents -the portion of the input which has not yet been parsed. Since the -input string does not change, just the index needs to be passed around -as a result of individual parse steps. This leads to the following -class of parsers that read strings: -\begin{lstlisting} -class ParseString(s: String) extends Parsers { - type intype = int; - def chr(p: char => boolean) = new Parser { - def apply(in: int): Parser#Result = - if (in < s.length() && p(s charAt in)) Some(in + 1) - else None; - } - val input = 0; -} -\end{lstlisting} -This class implements a method \code{chr(p: char => boolean)} and a -value \code{input}. The \code{chr} method builds a parser that either -reads a single character satisfying the given predicate \code{p} or -fails. All other parsers over strings are ultimately implemented in -terms of that method. The \code{input} value represents the input as a -whole. In our case, it is simply the value \code{0}, the start index of -the string to be read. - -Note \code{apply}'s result type, \code{Parser#Result}. This syntax -selects the type element \code{Result} of the type \code{Parser}. It -thus corresponds roughly to selecting a static inner class from some -outer class in Java. Note that we could {\em not} have written -\code{Parser.Result}, as the latter would express selection of the -\code{Result} element from a {\em value} named \code{Parser}.
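The index-based representation can be tried out in isolation. The following is a compact sketch in present-day Scala syntax, not the class hierarchy of this chapter: a recognizer is a function from the input string and a start index to the index after the match, or \code{None} on failure. The names \code{Recognizer}, \code{chrIf} and \code{accepts} are hypothetical, and the by-name parameters play the role of the \code{def} parameters used in the text.

```scala
// Sketch only: a string recognizer with backtracking, in present-day Scala.
object RecognizerSketch {
  // run(s, i) returns Some(j) if a prefix of s starting at i is accepted,
  // where j is the index of the first unconsumed character.
  final class Recognizer(val run: (String, Int) => Option[Int]) { self =>
    // Sequence: q continues where this recognizer stopped.
    def &&&(q: => Recognizer): Recognizer =
      new Recognizer((s, i) => self.run(s, i).flatMap(j => q.run(s, j)))
    // Alternative: on failure, q restarts at the same index (backtracking).
    def |||(q: => Recognizer): Recognizer =
      new Recognizer((s, i) => self.run(s, i).orElse(q.run(s, i)))
  }

  val empty: Recognizer = new Recognizer((_, i) => Some(i))

  def chrIf(p: Char => Boolean): Recognizer =
    new Recognizer((s, i) => if (i < s.length && p(s.charAt(i))) Some(i + 1) else None)
  def chr(c: Char): Recognizer = chrIf(d => d == c)

  def opt(p: => Recognizer): Recognizer = p ||| empty
  def rep(p: => Recognizer): Recognizer = (p &&& rep(p)) ||| empty

  // The list grammar of this chapter.
  def ident: Recognizer = chrIf(_.isLetter) &&& rep(chrIf(_.isLetterOrDigit))
  def number: Recognizer = chrIf(_.isDigit) &&& rep(chrIf(_.isDigit))
  def list: Recognizer = chr('(') &&& opt(listElems) &&& chr(')')
  def listElems: Recognizer = expr &&& ((chr(',') &&& listElems) ||| empty)
  def expr: Recognizer = ident ||| number ||| list

  // Index after the longest prefix accepted as an expression, if any.
  def accepts(input: String): Option[Int] = expr.run(input, 0)
}
```

Because the input string itself never changes, backtracking costs nothing more than reusing an earlier index, which is why \code{|||} can simply call its second alternative with the original \code{i}.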
- -We have now extended the root class \code{Parsers} in two different -directions: Class \code{ListParsers} defines a grammar of phrases to -be parsed, whereas class \code{ParseString} defines a method by which -such phrases are input. To write a concrete parsing application, we -need to define both grammar and input method. We do this by combining -two extensions of \code{Parsers} using a {\em mixin composition}. -Here is the start of a sample application: -\begin{lstlisting} -object Test { - def main(args: Array[String]): unit = { - val ps = new ListParsers with ParseString(args(0)); -\end{lstlisting} -The last line above creates a new family of parsers by composing class -\code{ListParsers} with class \code{ParseString}. The two classes -share the common superclass \code{Parsers}. The abstract method -\code{chr} in \code{ListParsers} is implemented by class \code{ParseString}. - -To run the parser, we apply the start symbol of the grammar -\code{expr} to the argument \code{ps.input} and observe the result: -\begin{lstlisting} - ps.expr(ps.input) match { - case Some(n) => - System.out.println("parsed: " + args(0).substring(0, n)); - case None => - System.out.println("nothing parsed"); - } - } -}// end Test -\end{lstlisting} -Note the syntax ~\code{ps.expr(ps.input)}, which treats the \code{expr} -parser as if it were a function. In Scala, objects with \code{apply} -methods can be applied directly to arguments as if they were functions. - -Here is an example run of the program above: -\begin{lstlisting} -> java examples.Test "(x,1,(y,z))" -parsed: (x,1,(y,z)) -> java examples.Test "(x,,1,(y,z))" -nothing parsed -\end{lstlisting} - -\section{\label{sec:parsers-results}Parsers that Produce Results} - -The combinator library of the previous section does not support the -generation of output from parsing.
But usually one does not just want -to check whether a given string belongs to the defined language, one -also wants to convert the input string into some internal -representation such as an abstract syntax tree. - -In this section, we modify our parser library to build parsers that -produce results. We will make use of the for-comprehensions introduced -in Chapter~\ref{sec:for-notation}. The basic combinator of sequential -composition, formerly ~\code{p &&& q}, now becomes -\begin{lstlisting} -for (val x <- p; val y <- q) yield e . -\end{lstlisting} -Here, the names \code{x} and \code{y} are bound to the results of -executing the parsers \code{p} and \code{q}. \code{e} is an expression -that uses these results to build the tree returned by the composed -parser. - -Before describing the implementation of the new parser combinators, we -explain how the new building blocks are used. Say we want to modify -our list parser so that it returns an abstract syntax tree of the -parsed expression. Syntax trees are given by the following class hierarchy: -\begin{lstlisting} -abstract class Tree{} -case class Id (s: String) extends Tree {} -case class Num(n: int) extends Tree {} -case class Lst(elems: List[Tree]) extends Tree {} -\end{lstlisting} -That is, a syntax tree is an identifier, an integer number, or a -\code{Lst} node with a list of trees as descendants. - -As a first step towards parsers that produce results we define three -little parsers that return a single read character as result. -\begin{lstlisting} -abstract class CharParsers extends Parsers { - def any: Parser[char]; - def chr(ch: char): Parser[char] = - for (val c <- any; c == ch) yield c; - def chr(p: char => boolean): Parser[char] = - for (val c <- any; p(c)) yield c; -} -\end{lstlisting} -The \code{any} parser succeeds with the first character of remaining -input as long as input is nonempty. 
It is abstract in class -\code{CharParsers} since we want to abstract in this class from the -concrete input method used. The two \code{chr} parsers return as before -the first input character if it equals a given character or matches a -given predicate. They are now implemented in terms of \code{any}. - -The next level is represented by parsers reading identifiers, numbers -and lists. Here is a parser for identifiers. -\begin{lstlisting} -class ListParsers extends CharParsers { - def ident: Parser[Tree] = - for ( - val c: char <- chr(Character.isLetter); - val cs: List[char] <- rep(chr(Character.isLetterOrDigit)) - ) yield Id((c :: cs).mkString("", "", "")); -\end{lstlisting} -Remark: Because \code{chr(...)} returns a single character, its -repetition \code{rep(chr(...))} returns a list of characters. The -\code{yield} part of the for-comprehension converts all intermediate -results into an \code{Id} node with a string as element. To convert -the read characters into a string, it conses them into a single list, -and invokes the \code{mkString} method on the result. - -Here is a parser for numbers: -\begin{lstlisting} - def number: Parser[Tree] = - for ( - val d: char <- chr(Character.isDigit); - val ds: List[char] <- rep(chr(Character.isDigit)) - ) yield Num(((d - '0') /: ds) ((x, digit) => x * 10 + digit - '0')); -\end{lstlisting} -Intermediate results are in this case the leading digit of -the read number, followed by a list of remaining digits. The -\code{yield} part of the for-comprehension reduces these to a number -by a fold-left operation.
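The two conversions in the \code{yield} parts can be checked on their own. The sketch below restates them in present-day Scala syntax, with \code{foldLeft} written out in place of the \code{/:} operator; the object and function names are hypothetical.

```scala
// Sketch only: the result conversions of the ident and number parsers.
object YieldSketch {
  // Leading character plus remaining characters, joined into one string.
  def charsToString(c: Char, cs: List[Char]): String =
    (c :: cs).mkString

  // Start with the value of the leading digit, then shift the accumulator
  // one decimal place and add the next digit, from left to right.
  def digitsToInt(d: Char, ds: List[Char]): Int =
    ds.foldLeft(d - '0')((x, digit) => x * 10 + (digit - '0'))
}
```

For the digits \code{4}, \code{0}, \code{7} the accumulator takes the values 4, 40 and 407 in turn.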
- -Here is a parser for lists: -\begin{lstlisting} - def list: Parser[Tree] = - for ( - val _ <- chr('('); - val es <- listElems ||| succeed(List()); - val _ <- chr(')') - ) yield Lst(es); - - def listElems: Parser[List[Tree]] = - for ( - val x <- expr; - val xs <- chr(',') &&& listElems ||| succeed(List()) - ) yield x :: xs; -\end{lstlisting} -The \code{list} parser returns a \code{Lst} node with a list of trees -as elements. That list is either the result of \code{listElems}, or, -if that fails, the empty list (expressed here as: the result of a -parser which always succeeds with the empty list as result). - -The highest level of our grammar is represented by function -\code{expr}: -\begin{lstlisting} - def expr: Parser[Tree] = - ident ||| number ||| list -}// end ListParsers. -\end{lstlisting} -We now present the parser combinators that support the new -scheme. Parsers that succeed now return a parse result besides the -un-consumed input. -\begin{lstlisting} -abstract class Parsers { - type intype; - trait Parser[a] { - type Result = Option[Pair[a, intype]]; - def apply(in: intype): Result; -\end{lstlisting} -Parsers are parameterized with the type of their result. The class -\code{Parser[a]} now defines new methods \code{map}, \code{flatMap} -and \code{filter}. The \code{for} expressions are mapped by the -compiler to calls of these functions using the scheme described in -Chapter~\ref{sec:for-notation}. For parsers, these methods are -implemented as follows. 
-\begin{lstlisting} - def filter(pred: a => boolean) = new Parser[a] { - def apply(in: intype): Result = Parser.this.apply(in) match { - case None => None - case Some(Pair(x, in1)) => if (pred(x)) Some(Pair(x, in1)) else None - } - } - def map[b](f: a => b) = new Parser[b] { - def apply(in: intype): Result = Parser.this.apply(in) match { - case None => None - case Some(Pair(x, in1)) => Some(Pair(f(x), in1)) - } - } - def flatMap[b](f: a => Parser[b]) = new Parser[b] { - def apply(in: intype): Result = Parser.this.apply(in) match { - case None => None - case Some(Pair(x, in1)) => f(x).apply(in1) - } - } -\end{lstlisting} -The \code{filter} method takes as parameter a predicate $p$ which it -applies to the results of the current parser. If the predicate is -false, the parser fails by returning \code{None}; otherwise it returns -the result of the current parser. The \code{map} method takes as -parameter a function $f$ which it applies to the results of the -current parser. The \code{flatMap} takes as parameter a function -\code{f} which returns a parser. It applies \code{f} to the result of -the current parser and then continues with the resulting parser. The -\code{|||} method is essentially defined as before. The -\code{&&&} method can now be defined in terms of \code{for}. -\begin{lstlisting} - def ||| (def p: Parser[a]) = new Parser[a] { - def apply(in: intype): Result = Parser.this.apply(in) match { - case None => p(in) - case s => s - } - } - - def &&& [b](def p: Parser[b]): Parser[b] = - for (val _ <- this; val x <- p) yield x; - }// end Parser -\end{lstlisting} - -The primitive parser \code{succeed} replaces \code{empty}. It consumes -no input and returns its parameter as result. -\begin{lstlisting} - def succeed[a](x: a) = new Parser[a] { - def apply(in: intype) = Some(Pair(x, in)) - } -\end{lstlisting} - -The parser combinators \code{rep} and \code{opt} now also return -results. 
\code{rep} returns a list which contains as elements the -results of each iteration of its sub-parser. \code{opt} returns a list -which is either empty or contains as its single element the result of the -optional parser. -\begin{lstlisting} - def rep[a](p: Parser[a]): Parser[List[a]] = - rep1(p) ||| succeed(List()); - - def rep1[a](p: Parser[a]): Parser[List[a]] = - for (val x <- p; val xs <- rep(p)) yield x :: xs; - - def opt[a](p: Parser[a]): Parser[List[a]] = - (for (val x <- p) yield List(x)) ||| succeed(List()); -} // end Parsers -\end{lstlisting} -The root class \code{Parsers} abstracts over which kind of -input is parsed. As before, we determine the input method by a separate class. -Here is \code{ParseString}, this time adapted to parsers that return results. -It now defines the method \code{any}, which returns the first input character. -\begin{lstlisting} -class ParseString(s: String) extends Parsers { - type intype = int; - val input = 0; - def any = new Parser[char] { - def apply(in: int): Parser[char]#Result = - if (in < s.length()) Some(Pair(s charAt in, in + 1)) else None; - } -} -\end{lstlisting} -The rest of the application is as before. Here is a test program which -constructs a list parser over strings and prints out the result of -applying it to the command line argument. -\begin{lstlisting} -object Test { - def main(args: Array[String]): unit = { - val ps = new ListParsers with ParseString(args(0)); - ps.expr(ps.input) match { - case Some(Pair(list, _)) => System.out.println("parsed: " + list); - case None => System.out.println("nothing parsed"); - } - } -} -\end{lstlisting} - -\begin{exercise}\label{exercise:end-marker} The parsers we have defined so -far can succeed even if there is some input beyond the parsed text. To -prevent this, one needs a parser which recognizes the end of input. -Redesign the parser library so that such a parser can be introduced. -Which classes need to be modified?
-\end{exercise} - -\chapter{\label{sec:hm}Hindley/Milner Type Inference} - -This chapter demonstrates Scala's data types and pattern matching by -developing a type inference system in the Hindley/Milner style -\cite{milner:polymorphism}. The source language for the type inferencer is -lambda calculus with a let construct called Mini-ML. Abstract syntax -trees for the Mini-ML are represented by the following data type of -\code{Terms}. -\begin{lstlisting} -trait Term {} -case class Var(x: String) extends Term { - override def toString() = x -} -case class Lam(x: String, e: Term) extends Term { - override def toString() = "(\\" + x + "." + e + ")" -} -case class App(f: Term, e: Term) extends Term { - override def toString() = "(" + f + " " + e + ")" -} -case class Let(x: String, e: Term, f: Term) extends Term { - override def toString() = "let " + x + " = " + e + " in " + f; -} -\end{lstlisting} -There are four tree constructors: \code{Var} for variables, \code{Lam} -for function abstractions, \code{App} for function applications, and -\code{Let} for let expressions. Each case class overrides the -\code{toString()} method of class \code{Any}, so that terms can be -printed in legible form. - -We next define the types that are -computed by the inference system. -\begin{lstlisting} -sealed trait Type {} -case class Tyvar(a: String) extends Type { - override def toString() = a -} -case class Arrow(t1: Type, t2: Type) extends Type { - override def toString() = "(" + t1 + "->" + t2 + ")" -} -case class Tycon(k: String, ts: List[Type]) extends Type { - override def toString() = - k + (if (ts.isEmpty) "" else ts.mkString("[", ",", "]")) -} -\end{lstlisting} -There are three type constructors: \code{Tyvar} for type variables, -\code{Arrow} for function types and \code{Tycon} for type constructors -such as \code{boolean} or \code{List}. Type constructors have as -component a list of their type parameters. This list is empty for type -constants such as \code{boolean}. 
Again, the type constructors -implement the \code{toString} method in order to display types legibly. - -Note that \code{Type} is a \code{sealed} class. This means that no -subclasses or data constructors that extend \code{Type} can be formed -outside the sequence of definitions in which \code{Type} is defined. -This makes \code{Type} a {\em closed} algebraic data type with exactly -three alternatives. By contrast, type \code{Term} is an {\em open} -algebraic type for which further alternatives can be defined. - -The main parts of the type inferencer are contained in object -\code{typeInfer}. We start with a utility function which creates -fresh type variables: -\begin{lstlisting} -object typeInfer { - private var n: Int = 0; - def newTyvar(): Type = { n = n + 1 ; Tyvar("a" + n) } -\end{lstlisting} -We next define a class for substitutions. A substitution is an -idempotent function from type variables to types. It maps a finite -number of type variables to some types, and leaves all other type -variables unchanged. The meaning of a substitution is extended -point-wise to a mapping from types to types. -\begin{lstlisting} - trait Subst extends Any with Function1[Type,Type] { - - def lookup(x: Tyvar): Type; - - def apply(t: Type): Type = t match { - case tv @ Tyvar(a) => val u = lookup(tv); if (t == u) t else apply(u); - case Arrow(t1, t2) => Arrow(apply(t1), apply(t2)) - case Tycon(k, ts) => Tycon(k, ts map apply) - } - - def extend(x: Tyvar, t: Type) = new Subst { - def lookup(y: Tyvar): Type = if (x == y) t else Subst.this.lookup(y); - } - } - val emptySubst = new Subst { def lookup(t: Tyvar): Type = t } -\end{lstlisting} -We represent substitutions as functions, of type \code{Type => -Type}. This is achieved by making class \code{Subst} inherit from the -unary function type \code{Function1[Type, Type]}\footnote{ -The class inherits the function type as a mixin rather than as a direct -superclass. 
This is because in the current Scala implementation, the -\code{Function1} type is a Java interface, which cannot be used as a direct -superclass of some other class.}. -To be an instance -of this type, a substitution \code{s} has to implement an \code{apply} -method that takes a \code{Type} as argument and yields another -\code{Type} as result. A function application \code{s(t)} is then -interpreted as \code{s.apply(t)}. - -The \code{lookup} method is abstract in class \code{Subst}. There are -two concrete forms of substitutions which differ in how they -implement this method. One form is defined by the \code{emptySubst} value, -the other is defined by the \code{extend} method in class -\code{Subst}. - -The next data type describes type schemes, which consist of a type and -a list of type variables which appear universally quantified -in the type scheme. -For instance, the type scheme $\forall a\forall b.a \!\arrow\! b$ would be represented in the type checker as: -\begin{lstlisting} -TypeScheme(List(Tyvar("a"), Tyvar("b")), Arrow(Tyvar("a"), Tyvar("b"))) . -\end{lstlisting} -The class definition of type schemes does not carry an extends -clause; this means that type schemes directly extend class -\code{AnyRef}. Even though there is only one possible way to -construct a type scheme, a case class representation was chosen -since it offers convenient ways to decompose an instance of this type into its -parts. -\begin{lstlisting} -case class TypeScheme(tyvars: List[Tyvar], tpe: Type) { - def newInstance: Type = { - (emptySubst /: tyvars) ((s, tv) => s.extend(tv, newTyvar())) (tpe); - } -} -\end{lstlisting} -Type scheme objects come with a method \code{newInstance}, which -returns the type contained in the scheme after all universally quantified type -variables have been renamed to fresh variables. 
The implementation of -this method folds (with \code{/:}) the type scheme's type variables -with an operation which extends a given substitution \code{s} by -renaming a given type variable \code{tv} to a fresh type -variable. The resulting substitution renames all type variables of the -scheme to fresh ones. This substitution is then applied to the type -part of the type scheme. - -The last type we need in the type inferencer is -\code{Env}, a type for environments, which associate variable names -with type schemes. They are represented by a type alias \code{Env} in -module \code{typeInfer}: -\begin{lstlisting} -type Env = List[Pair[String, TypeScheme]]; -\end{lstlisting} -There are two operations on environments. The \code{lookup} function -returns the type scheme associated with a given name, or \code{null} -if the name is not recorded in the environment. -\begin{lstlisting} - def lookup(env: Env, x: String): TypeScheme = env match { - case List() => null - case Pair(y, t) :: env1 => if (x == y) t else lookup(env1, x) - } -\end{lstlisting} -The \code{gen} function turns a given type into a type scheme, -quantifying over all type variables that are free in the type, but -not in the environment. -\begin{lstlisting} - def gen(env: Env, t: Type): TypeScheme = - TypeScheme(tyvars(t) diff tyvars(env), t); -\end{lstlisting} -The set of free type variables of a type is simply the set of all type -variables which occur in the type. It is represented here as a list of -type variables, which is constructed as follows. -\begin{lstlisting} - def tyvars(t: Type): List[Tyvar] = t match { - case tv @ Tyvar(a) => - List(tv) - case Arrow(t1, t2) => - tyvars(t1) union tyvars(t2) - case Tycon(k, ts) => - (List[Tyvar]() /: ts) ((tvs, t) => tvs union tyvars(t)); - } -\end{lstlisting} -Note that the syntax \code{tv @ ...} in the first pattern introduces a variable -which is bound to the pattern that follows. 
Note also that the explicit type parameter \code{[Tyvar]} in the expression of the third -clause is needed to make local type inference work. - -The set of free type variables of a type scheme is the set of free -type variables of its type component, excluding any quantified type variables: -\begin{lstlisting} - def tyvars(ts: TypeScheme): List[Tyvar] = - tyvars(ts.tpe) diff ts.tyvars; -\end{lstlisting} -Finally, the set of free type variables of an environment is the union -of the free type variables of all type schemes recorded in it. -\begin{lstlisting} - def tyvars(env: Env): List[Tyvar] = - (List[Tyvar]() /: env) ((tvs, nt) => tvs union tyvars(nt._2)); -\end{lstlisting} -A central operation of Hindley/Milner type checking is unification, -which computes a substitution to make two given types equal (such a -substitution is called a {\em unifier}). Function \code{mgu} computes -the most general unifier of two given types $t$ and $u$ under a -pre-existing substitution $s$. That is, it returns the most general -substitution $s'$ which extends $s$, and which makes $s'(t)$ and -$s'(u)$ equal types. -\begin{lstlisting} - def mgu(t: Type, u: Type, s: Subst): Subst = Pair(s(t), s(u)) match { - case Pair(Tyvar(a), Tyvar(b)) if (a == b) => - s - case Pair(Tyvar(a), _) if !(tyvars(u) contains a) => - s.extend(Tyvar(a), u) - case Pair(_, Tyvar(a)) => - mgu(u, t, s) - case Pair(Arrow(t1, t2), Arrow(u1, u2)) => - mgu(t1, u1, mgu(t2, u2, s)) - case Pair(Tycon(k1, ts), Tycon(k2, us)) if (k1 == k2) => - (s /: (ts zip us)) ((s, tu) => mgu(tu._1, tu._2, s)) - case _ => - throw new TypeError("cannot unify " + s(t) + " with " + s(u)) - } -\end{lstlisting} -The \code{mgu} function throws a \code{TypeError} exception if no -unifier substitution exists. This can happen because the two types -have different type constructors at corresponding places, or because a -type variable is unified with a type that contains the type variable -itself. 
Such exceptions are modeled here as instances of case classes -that inherit from the predefined \code{Exception} class. -\begin{lstlisting} - case class TypeError(s: String) extends Exception(s) {} -\end{lstlisting} -The main task of the type checker is implemented by function -\code{tp}. This function takes as parameters an environment $env$, a -term $e$, a proto-type $t$, and a -pre-existing substitution $s$. The function yields a substitution -$s'$ that extends $s$ and that -turns $s'(env) \ts e: s'(t)$ into a derivable type judgment according -to the derivation rules of the Hindley/Milner type system \cite{milner:polymorphism}. A -\code{TypeError} exception is thrown if no such substitution exists. -\begin{lstlisting} - def tp(env: Env, e: Term, t: Type, s: Subst): Subst = { - current = e; - e match { - case Var(x) => - val u = lookup(env, x); - if (u == null) throw new TypeError("undefined: " + x); - else mgu(u.newInstance, t, s) - - case Lam(x, e1) => - val a = newTyvar(), b = newTyvar(); - val s1 = mgu(t, Arrow(a, b), s); - val env1 = Pair(x, TypeScheme(List(), a)) :: env; - tp(env1, e1, b, s1) - - case App(e1, e2) => - val a = newTyvar(); - val s1 = tp(env, e1, Arrow(a, t), s); - tp(env, e2, a, s1) - - case Let(x, e1, e2) => - val a = newTyvar(); - val s1 = tp(env, e1, a, s); - tp(Pair(x, gen(env, s1(a))) :: env, e2, t, s1) - } - } - var current: Term = null; -\end{lstlisting} -To aid error diagnostics, the \code{tp} function stores the currently -analyzed sub-term in variable \code{current}. Thus, if type checking -is aborted with a \code{TypeError} exception, this variable will -contain the subterm that caused the problem. - -The last function of the type inference module, \code{typeOf}, is a -simplified facade for \code{tp}. It computes the type of a given term -$e$ in a given environment $env$. 
It does so by creating a fresh type -variable $a$, computing a typing substitution that makes $env \ts e: a$ -into a derivable type judgment, and returning -the result of applying the substitution to $a$. -\begin{lstlisting} - def typeOf(env: Env, e: Term): Type = { - val a = newTyvar(); - tp(env, e, a, emptySubst)(a) - } -}// end typeInfer -\end{lstlisting} -To apply the type inferencer, it is convenient to have a predefined -environment that contains bindings for commonly used constants. The -module \code{predefined} defines an environment \code{env} that -contains bindings for the types of booleans, numbers and lists -together with some primitive operations over them. It also -defines a fixed point operator \code{fix}, which can be used to -represent recursion. -\begin{lstlisting} -object predefined { - val booleanType = Tycon("Boolean", List()); - val intType = Tycon("Int", List()); - def listType(t: Type) = Tycon("List", List(t)); - - private def gen(t: Type): typeInfer.TypeScheme = typeInfer.gen(List(), t); - private val a = typeInfer.newTyvar(); - val env = List( - Pair("true", gen(booleanType)), - Pair("false", gen(booleanType)), - Pair("if", gen(Arrow(booleanType, Arrow(a, Arrow(a, a))))), - Pair("zero", gen(intType)), - Pair("succ", gen(Arrow(intType, intType))), - Pair("nil", gen(listType(a))), - Pair("cons", gen(Arrow(a, Arrow(listType(a), listType(a))))), - Pair("isEmpty", gen(Arrow(listType(a), booleanType))), - Pair("head", gen(Arrow(listType(a), a))), - Pair("tail", gen(Arrow(listType(a), listType(a)))), - Pair("fix", gen(Arrow(Arrow(a, a), a))) - ) -} -\end{lstlisting} -Here's an example how the type inferencer can be used. 
-Let's define a function \code{showType} which returns the type of -a given term computed in the predefined environment -\code{predefined.env}: -\begin{lstlisting} -object testInfer { - def showType(e: Term): String = - try { - typeInfer.typeOf(predefined.env, e).toString(); - } catch { - case typeInfer.TypeError(msg) => - "\n cannot type: " + typeInfer.current + - "\n reason: " + msg; - } -\end{lstlisting} -Then the application -\begin{lstlisting} -> testInfer.showType(Lam("x", App(App(Var("cons"), Var("x")), Var("nil")))); -\end{lstlisting} -would give the response -\begin{lstlisting} -> (a6->List[a6]) -\end{lstlisting} -To make the type inferencer more useful, we complete it with a -parser. -Function \code{main} of module \code{testInfer} -parses and typechecks a Mini-ML expression which is given as the first -command line argument. -\begin{lstlisting} - def main(args: Array[String]): unit = { - val ps = new MiniMLParsers with ParseString(args(0)); - ps.all(ps.input) match { - case Some(Pair(term, _)) => - System.out.println("" + term + ": " + showType(term)); - case None => - System.out.println("syntax error"); - } - } -}// end testInfer -\end{lstlisting} -To do the parsing, method \code{main} uses the combinator parser -scheme of Chapter~\ref{sec:combinator-parsing}. It creates a parser -family \code{ps} as a mixin composition of parsers -that understand Mini-ML (but do not know where input comes from) and -parsers that read input from a given string. The \code{MiniMLParsers} -class implements parsers for the following grammar. -\begin{lstlisting} -term ::= "\" ident "." term - | term1 {term1} - | "let" ident "=" term "in" term -term1 ::= ident - | "(" term ")" -all ::= term ";" -\end{lstlisting} -Input as a whole is described by the production \code{all}; it -consists of a term followed by a semicolon. We allow ``whitespace'' -consisting of one or more space, tab or newline characters -between any two lexemes (this is not reflected in the grammar -above). 
Identifiers are defined as in -Chapter~\ref{sec:combinator-parsing} except that an identifier cannot -be one of the two reserved words "let" and "in". -\begin{lstlisting} -abstract class MiniMLParsers[intype] extends CharParsers[intype] { - - /** whitespace */ - def whitespace = rep{chr(' ') ||| chr('\t') ||| chr('\n')}; - - /** A given character, possible preceded by whitespace */ - def wschr(ch: char) = whitespace &&& chr(ch); - - /** identifiers or keywords */ - def id: Parser[String] = - for ( - val c: char <- whitespace &&& chr(Character.isLetter); - val cs: List[char] <- rep(chr(Character.isLetterOrDigit)) - ) yield (c :: cs).mkString("", "", ""); - - /** Non-keyword identifiers */ - def ident: Parser[String] = - for (val s <- id; s != "let" && s != "in") yield s; - - /** term = '\' ident '.' term | term1 {term1} | let ident "=" term in term */ - def term: Parser[Term] = - ( for ( - val _ <- wschr('\\'); - val x <- ident; - val _ <- wschr('.'); - val t <- term) - yield Lam(x, t): Term ) - ||| - ( for ( - val letid <- id; letid == "let"; - val x <- ident; - val _ <- wschr('='); - val t <- term; - val inid <- id; inid == "in"; - val c <- term) - yield Let(x, t, c) ) - ||| - ( for ( - val t <- term1; - val ts <- rep(term1)) - yield (t /: ts)((f, arg) => App(f, arg)) ); - - /** term1 = ident | '(' term ')' */ - def term1: Parser[Term] = - ( for (val s <- ident) - yield Var(s): Term ) - ||| - ( for ( - val _ <- wschr('('); - val t <- term; - val _ <- wschr(')')) - yield t ); - - /** all = term ';' */ - def all: Parser[Term] = - for ( - val t <- term; - val _ <- wschr(';')) - yield t; -} -\end{lstlisting} -Here are some sample MiniML programs and the output the type inferencer gives for each of them: -\begin{lstlisting} -> java testInfer -| "\x.\f.f(f x);" -(\x.(\f.(f (f x)))): (a8->((a8->a8)->a8)) - -> java testInfer -| "let id = \x.x -| in if (id true) (id nil) (id (cons zero nil));" -let id = (\x.x) in (((if (id true)) (id nil)) (id ((cons zero) nil))): 
List[Int] - -> java testInfer -| "let id = \x.x -| in if (id true) (id nil);" -let id = (\x.x) in ((if (id true)) (id nil)): (List[a13]->List[a13]) - -> java testInfer -| "let length = fix (\len.\xs. -| if (isEmpty xs) -| zero -| (succ (len (tail xs)))) -| in (length nil);" -let length = (fix (\len.(\xs.(((if (isEmpty xs)) zero) -(succ (len (tail xs))))))) in (length nil): Int - -> java testInfer -| "let id = \x.x -| in if (id true) (id nil) zero;" -let id = (\x.x) in (((if (id true)) (id nil)) zero): - cannot type: zero - reason: cannot unify Int with List[a14] -\end{lstlisting} - -\begin{exercise}\label{exercise:hm-parse} Using the parser library constructed in -Exercise~\ref{exercise:end-marker}, modify the MiniML parser library -so that no marker ``;'' is necessary for indicating the end of input. -\end{exercise} - -\begin{exercise}\label{execcise:hm-extend} Extend the Mini-ML parser and type -inferencer with a \code{letrec} construct which allows the definition of -recursive functions. Syntax: -\begin{lstlisting} -letrec ident "=" term in term . -\end{lstlisting} -The typing of \code{letrec} is as for {let}, -except that the defined identifier is visible in the defining expression. Using \code{letrec}, the \code{length} function for lists can now be defined as follows. -\begin{lstlisting} -letrec length = \xs. - if (isEmpty xs) - zero - (succ (length (tail xs))) -in ... -\end{lstlisting} -\end{exercise} - -\chapter{Abstractions for Concurrency}\label{sec:ex-concurrency} - -This section reviews common concurrent programming patterns and shows -how they can be implemented in Scala. - -\section{Signals and Monitors} - -\example -The {\em monitor} provides the basic means for mutual exclusion -of processes in Scala. It is defined as follows. 
-\begin{lstlisting} -trait Monitor { - def synchronized [a] (def e: a): a; - def await(def cond: boolean) = while (false == cond) { wait() } -} -\end{lstlisting} -The \code{synchronized} method in class \code{Monitor} executes its -argument computation \code{e} in mutually exclusive mode -- at any one -time, only one thread can execute a \code{synchronized} argument of a -given monitor. - -Threads can suspend inside a monitor by waiting on a signal. The -standard \code{java.lang.Object} class offers for this purpose methods -\code{wait} and \code{notify}. Threads that call the \code{wait} -method wait until a \code{notify} method of the same object is called -subsequently by some other thread. Calls to \code{notify} with no -threads waiting for the signal are ignored. -Here are the signatures of these methods in class -\code{java.lang.Object}. -\begin{lstlisting} - def wait(): unit; - def wait(msec: long): unit; - def notify(): unit; - def notifyAll(): unit; -\end{lstlisting} -There is also a timed form of \code{wait}, which blocks only until -a signal is received or the specified amount of time (given in -milliseconds) has elapsed. Furthermore, there is a \code{notifyAll} -method which unblocks all threads which wait for the signal. -These methods, as well as class \code{Monitor}, are primitive in -Scala; they are implemented in terms of the underlying runtime system. - -Typically, a thread waits for some condition to be established. If the -condition does not hold at the time of the wait call, the thread -blocks until some other thread has established the condition. It is -the responsibility of this other thread to wake up waiting processes -by issuing a \code{notify} or \code{notifyAll}. Note, however, that -there is no guarantee that a waiting process gets to run immediately -when the call to \code{notify} is issued. It could be that other processes -get to run first which invalidate the condition again. 
Therefore, the -correct form of waiting for a condition $C$ uses a while loop: -\begin{lstlisting} -while (!$C$) wait(); -\end{lstlisting} -The monitor class contains a method \code{await} which does the same -thing; using it, the above loop can be expressed as \lstinline@await($C$)@. - -As an example of how monitors are used, here is an implementation -of a bounded buffer class. -\begin{lstlisting} -class BoundedBuffer[a](N: Int) extends Monitor() { - var in = 0, out = 0, n = 0; - val elems = new Array[a](N); - - def put(x: a) = synchronized { - await (n < N); - elems(in) = x ; in = (in + 1) % N ; n = n + 1; - if (n == 1) notifyAll(); - } - - def get: a = synchronized { - await (n != 0); - val x = elems(out) ; out = (out + 1) % N ; n = n - 1; - if (n == N - 1) notifyAll(); - x - } -} -\end{lstlisting} -And here is a program using a bounded buffer to communicate between a -producer and a consumer process. -\begin{lstlisting} -import concurrent.ops._; -... -val buf = new BoundedBuffer[String](10) -spawn { while (true) { val s = produceString ; buf.put(s) } } -spawn { while (true) { val s = buf.get ; consumeString(s) } } -} -\end{lstlisting} -The \code{spawn} method spawns a new thread which executes the -expression given in the parameter. It is defined in object \code{concurrent.ops} -as follows. -\begin{lstlisting} -def spawn(def p: unit) = { - val t = new Thread() { override def run() = p; } - t.start() -} -\end{lstlisting} - -\comment{ -\section{Logic Variable} - -A logic variable (or lvar for short) offers operations \code{:=} -and \code{value} to define the variable and to retrieve its value. -Variables can be \code{define}d only once. A call to \code{value} -blocks until the variable has been defined. - -Logic variables can be implemented as follows. 
- -\begin{lstlisting} -class LVar[a] extends Monitor { - private val defined = new Signal - private var isDefined: boolean = false - private var v: a - def value = synchronized { - if (!isDefined) defined.wait - v - } - def :=(x: a) = synchronized { - v = x ; isDefined = true ; defined.send - } -} -\end{lstlisting} -} - -\section{SyncVars} - -A synchronized variable (or syncvar for short) offers \code{get} and -\code{set} operations to read and set the variable. \code{get} operations -block until the variable has been defined. An \code{unset} operation -resets the variable to the undefined state. - -Here's the standard implementation of synchronized variables. -\begin{lstlisting} -package scala.concurrent; -class SyncVar[a] with Monitor { - private var isDefined: Boolean = false; - private var value: a = _; - def get = synchronized { - if (!isDefined) wait(); - value - } - def set(x: a) = synchronized { - value = x ; isDefined = true ; notifyAll(); - } - def isSet: Boolean = - isDefined; - def unset = synchronized { - isDefined = false; - } -} -\end{lstlisting} - -\section{Futures} -\label{sec:futures} - -A {\em future} is a value which is computed in parallel to some other -client thread, to be used by the client thread at some future time. -Futures are used in order to make good use of parallel processing -resources. A typical usage is: - -\begin{lstlisting} -import scala.concurrent.ops._; -... -val x = future(someLengthyComputation); -anotherLengthyComputation; -val y = f(x()) + g(x()); -\end{lstlisting} - -The \code{future} method is defined in object -\code{scala.concurrent.ops} as follows. -\begin{lstlisting} -def future[a](def p: a): () => a = { - val result = new SyncVar[a]; - fork { result.set(p) } - (() => result.get) -} -\end{lstlisting} - -The \code{future} method takes as parameter a computation \code{p} to -be performed. The type of the computation is arbitrary; it is -represented by \code{future}'s type parameter \code{a}. 
The -\code{future} method creates a synchronized variable \code{result} -which will hold the result of the computation. It then forks -off a new thread that computes the result and stores it in -\code{result} when it is finished. In parallel to this thread, -the function returns an anonymous function which yields a value of type \code{a}. -When called, this function blocks until \code{result} has been -set and then returns its value. -Since the variable stays set from then on, later invocations of the function -return the result immediately. - -\section{Parallel Computations} - -The next example presents a function \code{par} which takes a pair of -computations as parameters and which returns the results of the computations -in another pair. The two computations are performed in parallel. - -The function is defined in object -\code{scala.concurrent.ops} as follows. -\begin{lstlisting} - def par[a, b](def xp: a, def yp: b): Pair[a, b] = { - val y = new SyncVar[b]; - spawn { y set yp } - Pair(xp, y.get) - } -\end{lstlisting} -Defined in the same place is a function \code{replicate} which runs a -number of instances of a computation in parallel. Each -instance is passed an integer number which identifies it. -\begin{lstlisting} - def replicate(start: Int, end: Int)(p: Int => Unit): Unit = { - if (start == end) - () - else if (start + 1 == end) - p(start) - else { - val mid = (start + end) / 2; - spawn { replicate(start, mid)(p) } - replicate(mid, end)(p) - } - } -\end{lstlisting} - -The next function uses \code{replicate} to perform parallel -computations on all elements of an array. 
- -\begin{lstlisting} -def parMap[a,b](f: a => b, xs: Array[a]): Array[b] = { - val results = new Array[b](xs.length); - replicate(0, xs.length) { i => results(i) = f(xs(i)) } - results -} -\end{lstlisting} - -\section{Semaphores} - -A common mechanism for process synchronization is a {\em lock} (or: -{\em semaphore}). A lock offers two atomic actions: \prog{acquire} and -\prog{release}. Here's the implementation of a lock in Scala: - -\begin{lstlisting} -package scala.concurrent; - -class Lock with Monitor { - var available = true; - def acquire = synchronized { - if (!available) wait(); - available = false - } - def release = synchronized { - available = true; - notify() - } -} -\end{lstlisting} - -\section{Readers/Writers} - -A more complex form of synchronization distinguishes between {\em -readers} which access a common resource without modifying it and {\em -writers} which can both access and modify it. To synchronize readers -and writers we need to implement operations \prog{startRead}, \prog{startWrite}, -\prog{endRead}, \prog{endWrite}, such that: -\begin{itemize} -\item there can be multiple concurrent readers, -\item there can only be one writer at one time, -\item pending write requests have priority over pending read requests, -but don't preempt ongoing read operations. -\end{itemize} -The following implementation of a readers/writers lock is based on the -{\em mailbox} concept (see Section~\ref{sec:mailbox}). 
- -\begin{lstlisting} -import scala.concurrent._; - -class ReadersWriters { - val m = new MailBox; - private case class Writers(n: int), Readers(n: int); - Writers(0); Readers(0); - def startRead = m receive { - case Writers(n) if n == 0 => m receive { - case Readers(n) => Writers(0) ; Readers(n+1); - } - } - def startWrite = m receive { - case Writers(n) => - Writers(n+1); - m receive { case Readers(n) if n == 0 => } - } - def endRead = m receive { - case Readers(n) => Readers(n-1) - } - def endWrite = m receive { - case Writers(n) => Writers(n-1) ; if (n == 0) Readers(0) - } -} -\end{lstlisting} - -\section{Asynchronous Channels} - -A fundamental way of interprocess communication is the asynchronous -channel. Its implementation makes use of the following simple class for linked -lists: -\begin{lstlisting} -class LinkedList[a] { - var elem: a = _; - var next: LinkedList[a] = null; -} -\end{lstlisting} -To facilitate insertion of elements into and deletion of elements from linked lists, -every reference into a linked list points to the node which precedes -the node which conceptually forms the top of the list. -Empty linked lists start with a dummy node, whose successor is \code{null}. - -The channel class uses a linked list to store data that has been sent -but not read yet. In the opposite direction, threads that -wish to read from an empty channel register their presence by -incrementing the \code{nreaders} field and waiting to be notified. 
-\begin{lstlisting} -package scala.concurrent; - -class Channel[a] with Monitor { - class LinkedList[a] { - var elem: a = _; - var next: LinkedList[a] = null; - } - private var written = new LinkedList[a]; - private var lastWritten = written; - private var nreaders = 0; - - def write(x: a) = synchronized { - lastWritten.elem = x; - lastWritten.next = new LinkedList[a]; - lastWritten = lastWritten.next; - if (nreaders > 0) notify(); - } - - def read: a = synchronized { - if (written.next == null) { - nreaders = nreaders + 1; wait(); nreaders = nreaders - 1; - } - val x = written.elem; - written = written.next; - x - } -} -\end{lstlisting} - -\section{Synchronous Channels} - -Here's an implementation of synchronous channels, where the sender of -a message blocks until that message has been received. Synchronous -channels only need a single variable to store messages in transit, but -two flags and the monitor's signals are used to coordinate reader and writer processes. -\begin{lstlisting} -package scala.concurrent; - -class SyncChannel[a] with Monitor { - private var data: a = _; - private var reading = false; - private var writing = false; - - def write(x: a) = synchronized { - await(!writing); - data = x; - writing = true; - if (reading) notifyAll(); - else await(reading) - } - - def read: a = synchronized { - await(!reading); - reading = true; - await(writing); - val x = data; - writing = false; - reading = false; - notifyAll(); - x - } -} -\end{lstlisting} - -\section{Workers} - -Here's an implementation of a {\em compute server} in Scala. The -server implements a \code{future} method which evaluates a given -expression in parallel with its caller. Unlike the implementation in -Section~\ref{sec:futures} the server computes futures only with a -predefined number of threads. A possible implementation of the server -could run each thread on a separate processor, and could hence avoid -the overhead inherent in context-switching several threads on a single -processor. 
- -\begin{lstlisting} -import scala.concurrent._, scala.concurrent.ops._; - -class ComputeServer(n: Int) { - - private trait Job { - type t; - def task: t; - def ret(x: t): Unit; - } - - private val openJobs = new Channel[Job](); - - private def processor(i: Int): Unit = { - while (true) { - val job = openJobs.read; - job.ret(job.task) - } - } - - def future[a](def p: a): () => a = { - val reply = new SyncVar[a](); - openJobs.write{ - new Job { - type t = a; - def task = p; - def ret(x: a) = reply.set(x); - } - } - () => reply.get - } - - spawn(replicate(0, n) { processor }) -} -\end{lstlisting} -Expressions to be computed (i.e. arguments -to calls of \code{future}) are written to the \code{openJobs} -channel. A {\em job} is an object with -\begin{itemize} -\item -An abstract type \code{t} which describes the result of the compute -job. -\item -A parameterless \code{task} method of type \code{t} which denotes -the expression to be computed. -\item -A \code{ret} method which consumes the result once it is -computed. -\end{itemize} -The compute server creates $n$ \code{processor} processes as part of -its initialization. Every such process repeatedly consumes an open -job, evaluates the job's \code{task} method and passes the result on -to the job's -\code{ret} method. The polymorphic \code{future} method creates -a new job whose \code{ret} method stores the result in a synchronized variable -named \code{reply}, and inserts this job into the set of open jobs by -writing it to the \code{openJobs} channel. It then returns a function which, -when called, blocks until \code{reply} has been set and yields its value. - -The example demonstrates the use of abstract types. The abstract type -\code{t} keeps track of the result type of a job, which can vary -between different jobs. Without abstract types it would be impossible -to offer the same class to the user in a statically type-safe -way, without relying on dynamic type tests and type casts. 
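
The role of the abstract type can be seen in isolation in the following
sketch. It is written in present-day Scala syntax rather than the syntax
used in the rest of this chapter, and the names \code{AbstractTypeDemo},
\code{run} and \code{demo} are made up for the illustration; only the shape
of \code{Job} follows the compute server.
\begin{lstlisting}
trait Job {
  type T                 // result type, chosen by each concrete job
  def task: T            // the expression to be computed
  def ret(x: T): Unit    // consumes the result
}

object AbstractTypeDemo {
  // job.task has the path-dependent type job.T, which is exactly
  // what job.ret expects, so no type test or cast is needed here.
  def run(job: Job): Unit = job.ret(job.task)

  // Builds a job with T = Int, runs it, and returns what ret received.
  def demo(): Int = {
    var result = 0
    val intJob = new Job {
      type T = Int
      def task: Int = 41 + 1
      def ret(x: Int): Unit = { result = x }
    }
    run(intJob)
    result
  }

  def main(args: Array[String]): Unit = println(demo())
}
\end{lstlisting}
The function \code{run} plays the part of \code{processor}: it handles jobs
of any result type through the single interface \code{Job}, and the compiler
still checks that each result reaches a consumer of the matching type.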
- - -Here is some code which uses the compute server to evaluate -the expression \code{41 + 1}. -\begin{lstlisting} -object Test with Executable { - val server = new ComputeServer(1); - val f = server.future(41 + 1); - Console.println(f()) -} -\end{lstlisting} - -\section{Mailboxes} -\label{sec:mailbox} - -Mailboxes are high-level, flexible constructs for process -synchronization and communication. They allow sending and receiving of -messages. A {\em message} in this context is an arbitrary object. -There is a special message \code{TIMEOUT} which is used to signal a -time-out. -\begin{lstlisting} -case class TIMEOUT; -\end{lstlisting} -Mailboxes implement the following signature. -\begin{lstlisting} -class MailBox { - def send(msg: Any): unit; - def receive[a](f: PartialFunction[Any, a]): a; - def receiveWithin[a](msec: long)(f: PartialFunction[Any, a]): a; -} -\end{lstlisting} -The state of a mailbox consists of a multi-set of messages. -Messages are added to the mailbox by the \code{send} method. Messages -are removed using the \code{receive} method, which is passed a message -processor \code{f} as argument. The message processor is a partial function from -messages to some arbitrary result type. Typically, this function is -implemented as a pattern matching expression. The \code{receive} -method blocks until there is a message in the mailbox for which its -message processor is defined. The matching message is then removed -from the mailbox and the blocked thread is restarted by applying the -message processor to the message. Both sent messages and receivers are -ordered in time. A receiver $r$ is applied to a matching message $m$ -only if there is no other (message, receiver) pair which precedes $(m, -r)$ in the partial ordering on pairs that orders each component in -time. 
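The selective-receive behaviour described above can be illustrated with a deliberately simplified mailbox written in present-day Scala. This is a sketch under stated assumptions, not the implementation discussed in this document: it keeps pending messages in a `ListBuffer`, wakes all waiting threads on every `send`, and does not enforce the ordering among receivers. It only shows that `receive` removes the oldest message on which its partial function is defined and leaves the other messages in place.

```scala
import scala.collection.mutable.ListBuffer

// Sketch: a simplified mailbox with selective receive.
class MailBox {
  // Messages that were sent but not yet consumed, oldest first.
  private val sent = ListBuffer.empty[Any]

  // Appends a message and wakes up every thread blocked in receive.
  def send(msg: Any): Unit = synchronized {
    sent += msg
    notifyAll()
  }

  // Blocks until some pending message is in the domain of f,
  // removes the oldest such message and applies f to it.
  def receive[A](f: PartialFunction[Any, A]): A = {
    val msg = synchronized {
      var idx = sent.indexWhere(m => f.isDefinedAt(m))
      while (idx < 0) {
        wait() // releases the monitor until the next send
        idx = sent.indexWhere(m => f.isDefinedAt(m))
      }
      sent.remove(idx)
    }
    f(msg) // the processor runs outside the monitor
  }
}
```

In this sketch a receiver that only matches integers skips over an earlier string message, which then remains available to a later receiver.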
- -As a simple example of how mailboxes are used, consider a -one-place buffer: -\begin{lstlisting} -class OnePlaceBuffer { - private val m = new MailBox; // An internal mailbox - private case class Empty, Full(x: int); // Types of messages we deal with - m send Empty; // Initialization - def write(x: int): unit = - m receive { case Empty => m send Full(x) } - def read: int = - m receive { case Full(x) => m send Empty ; x } -} -\end{lstlisting} -Here's how the mailbox class can be implemented: -\begin{lstlisting} -class MailBox with Monitor { - private abstract class Receiver extends Signal { - def isDefined(msg: Any): boolean; - var msg: Any = null; - } -\end{lstlisting} -We define an internal class for receivers with a test method -\code{isDefined}, which indicates whether the receiver is -defined for a given message. The receiver inherits from class -\code{Signal} a \code{notify} method which is used to wake up a -receiver thread. When the receiver thread is woken up, the message it -needs to be applied to is stored in the \code{msg} variable of -\code{Receiver}. -\begin{lstlisting} - private val sent = new LinkedList[Any]; - private var lastSent = sent; - private val receivers = new LinkedList[Receiver]; - private var lastReceiver = receivers; -\end{lstlisting} -The mailbox class maintains two linked lists, -one for sent but unconsumed messages, the other for waiting receivers. -\begin{lstlisting} - def send(msg: Any): unit = synchronized { - var r = receivers, r1 = r.next; - while (r1 != null && !r1.elem.isDefined(msg)) { - r = r1; r1 = r1.next; - } - if (r1 != null) { - r.next = r1.next; r1.elem.msg = msg; r1.elem.notify; - } else { - lastSent = insert(lastSent, msg); - } - } -\end{lstlisting} -The \code{send} method first checks whether a waiting receiver is -applicable to the sent message. If yes, the receiver is notified. -Otherwise, the message is appended to the linked list of sent messages. 
-\begin{lstlisting} - def receive[a](f: PartialFunction[Any, a]): a = { - val msg: Any = synchronized { - var s = sent, s1 = s.next; - while (s1 != null && !f.isDefinedAt(s1.elem)) { - s = s1; s1 = s1.next - } - if (s1 != null) { - s.next = s1.next; s1.elem - } else { - val r = insert(lastReceiver, new Receiver { - def isDefined(msg: Any) = f.isDefinedAt(msg); - }); - lastReceiver = r; - r.elem.wait(); - r.elem.msg - } - } - f(msg) - } -\end{lstlisting} -The \code{receive} method first checks whether the message processor function -\code{f} can be applied to a message that has already been sent but that -was not yet consumed. If yes, the thread continues immediately by -applying \code{f} to the message. Otherwise, a new receiver is created -and linked into the \code{receivers} list, and the thread waits for a -notification on this receiver. Once the thread is woken up again, it -continues by applying \code{f} to the message that was stored in the -receiver. The insert method on linked lists is defined as follows. -\begin{lstlisting} - def insert(l: LinkedList[a], x: a): LinkedList[a] = { - l.next = new LinkedList[a]; - l.next.elem = x; - l.next.next = l.next; - l - } -\end{lstlisting} -The mailbox class also offers a method \code{receiveWithin} -which blocks for only a specified maximal amount of time. If no -message is received within the specified time interval (given in -milliseconds), the message processor argument $f$ will be unblocked -with the special \code{TIMEOUT} message. 
The implementation of -\code{receiveWithin} is quite similar to \code{receive}: -\begin{lstlisting} - def receiveWithin[a](msec: long)(f: PartialFunction[Any, a]): a = { - val msg: Any = synchronized { - var s = sent, s1 = s.next; - while (s1 != null && !f.isDefinedAt(s1.elem)) { - s = s1; s1 = s1.next ; - } - if (s1 != null) { - s.next = s1.next; s1.elem - } else { - val r = insert(lastReceiver, new Receiver { - def isDefined(msg: Any) = f.isDefinedAt(msg); - }); - lastReceiver = r; - r.elem.wait(msec); - if (r.elem.msg == null) r.elem.msg = TIMEOUT; - r.elem.msg - } - } - f(msg) - } -} // end MailBox -\end{lstlisting} -The only differences are the timed call to \code{wait}, and the -statement following it. - -\section{Actors} -\label{sec:actors} - -Chapter~\ref{chap:example-auction} sketched as a program example the -implementation of an electronic auction service. This service was -based on high-level actor processes, that work by inspecting messages -in their mailbox using pattern matching. An actor is simply a thread -whose communication primitives are those of a mailbox. Actors are -hence defined as a mixin composition extension of Java's standard -\code{Thread} class with the \code{MailBox} class. -\begin{lstlisting} -abstract class Actor extends Thread with MailBox; -\end{lstlisting} - -\comment{ -As an extended example of an application that uses actors, we come -back to the auction server example of Section~\ref{sec:ex-auction}. 
-The following code implements: - -\begin{figure}[thb] -\begin{lstlisting} -class AuctionMessage; -case class - Offer(bid: int, client: Process), // make a bid - Inquire(client: Process) extends AuctionMessage // inquire status - -class AuctionReply; -case class - Status(asked; int, expiration: Date), // asked sum, expiration date - BestOffer, // yours is the best offer - BeatenOffer(maxBid: int), // offer beaten by maxBid - AuctionConcluded(seller: Process, client: Process),// auction concluded - AuctionFailed // failed with no bids - AuctionOver extends AuctionReply // bidding is closed -\end{lstlisting} -\end{figure} - -\begin{lstlisting} -class Auction(seller: Process, minBid: int, closing: Date) - extends Process { - - val timeToShutdown = 36000000 // msec - val delta = 10 // bid increment -\end{lstlisting} -\begin{lstlisting} - def run = { - var askedBid = minBid - var maxBidder: Process = null - while (true) { - receiveWithin ((closing - Date.currentDate).msec) { - case Offer(bid, client) => { - if (bid >= askedBid) { - if (maxBidder != null && maxBidder != client) { - maxBidder send BeatenOffer(bid) - } - maxBidder = client - askedBid = bid + delta - client send BestOffer - } else client send BeatenOffer(maxBid) - } -\end{lstlisting} -\begin{lstlisting} - case Inquire(client) => { - client send Status(askedBid, closing) - } -\end{lstlisting} -\begin{lstlisting} - case TIMEOUT => { - if (maxBidder != null) { - val reply = AuctionConcluded(seller, maxBidder) - maxBidder send reply - seller send reply - } else seller send AuctionFailed - receiveWithin (timeToShutdown) { - case Offer(_, client) => client send AuctionOver ; discardAndContinue - case _ => discardAndContinue - case TIMEOUT => stop - } - } -\end{lstlisting} -\begin{lstlisting} - case _ => discardAndContinue - } - } - } -\end{lstlisting} -\begin{lstlisting} - def houseKeeping: int = { - val Limit = 100 - var nWaiting: int = 0 - receiveWithin(0) { - case _ => - nWaiting = nWaiting + 1 - if (nWaiting 
> Limit) { - receiveWithin(0) { - case Offer(_, _) => continue - case TIMEOUT => - case _ => discardAndContinue - } - } else continue - case TIMEOUT => - } - } -} -\end{lstlisting} -\begin{lstlisting} -class Bidder (auction: Process, minBid: int, maxBid: int) - extends Process { - val MaxTries = 3 - val Unknown = -1 - - var nextBid = Unknown -\end{lstlisting} -\begin{lstlisting} - def getAuctionStatus = { - var nTries = 0 - while (nextBid == Unknown && nTries < MaxTries) { - auction send Inquiry(this) - nTries = nTries + 1 - receiveWithin(waitTime) { - case Status(bid, _) => bid match { - case None => nextBid = minBid - case Some(curBid) => nextBid = curBid + Delta - } - case TIMEOUT => - case _ => continue - } - } - status - } -\end{lstlisting} -\begin{lstlisting} - def bid: unit = { - if (nextBid < maxBid) { - auction send Offer(nextBid, this) - receive { - case BestOffer => - receive { - case BeatenOffer(bestBid) => - nextBid = bestBid + Delta - bid - case AuctionConcluded(seller, client) => - transferPayment(seller, nextBid) - case _ => continue - } - - case BeatenOffer(bestBid) => - nextBid = nextBid + Delta - bid - - case AuctionOver => - - case _ => continue - } - } - } -\end{lstlisting} -\begin{lstlisting} - def run = { - getAuctionStatus - if (nextBid != Unknown) bid - } - - def transferPayment(seller: Process, amount: int) -} -\end{lstlisting} -} - +\input{ExamplesPart} \bibliographystyle{alpha} \bibliography{Scala} diff --git a/doc/reference/ScalaReference.tex b/doc/reference/ScalaReference.tex index a2df3eadfc..6794167de8 100644 --- a/doc/reference/ScalaReference.tex +++ b/doc/reference/ScalaReference.tex @@ -50,27 +50,6 @@ \usepackage{math} \usepackage{scaladefs} -\renewcommand{\todo}[1]{} -\newcommand{\notyet}[1]{\footnote{#1 not yet implemented.}} -\newcommand{\Ts}{\mbox{\sl Ts}} -\newcommand{\tps}{\mbox{\sl tps}} -\newcommand{\psig}{\mbox{\sl psig}} -\newcommand{\args}{\mbox{\sl args}} -\newcommand{\targs}{\mbox{\sl targs}} 
-\newcommand{\enums}{\mbox{\sl enums}} -\newcommand{\proto}{\mbox{\sl pt}} -\newcommand{\argtypes}{\mbox{\sl Ts}} -\newcommand{\stats}{\mbox{\sl stats}} -\newcommand{\overload}{\la\mbox{\sf and}\ra} -\newcommand{\op}{\mbox{\sl op}} - -\newcommand{\ifqualified}[1]{} -\newcommand{\iflet}[1]{} -\newcommand{\ifundefvar}[1]{} -\newcommand{\iffinaltype}[1]{} -\newcommand{\ifpackaging}[1]{} -\newcommand{\ifnewfor}[1]{} - \ifpdf \pdfinfo { /Author (Martin Odersky) @@ -109,4994 +88,14 @@ Matthias Zenger \\[25mm]\ } %\todo{`:' as synonym for $\EXTENDS$?} -\chapter{Rationale} +\part{Rationale} \input{RationalePart} -\chapter{Lexical Syntax} - -This chapter defines the syntax of Scala tokens. Tokens are -constructed from characters in the following character sets: -\begin{enumerate} -\item Whitespace characters. -\item Lower case letters ~\lstinline@`a' | $\ldots$ | `z'@~ and -upper case letters ~\lstinline@`A' | $\ldots$ | `Z' | `$\Dollar$' | `_'@. -\item Digits ~\lstinline@`0' | $\ldots$ | `9'@. -\item Parentheses ~\lstinline@`(' | `)' | `[' | `]' | `{' | `}'@. -\item Delimiter characters ~\lstinline@``' | `'' | `"' | `.' | `;' | `,'@. -\item Operator characters. These include all printable ASCII characters -which are in none of the sets above. -\end{enumerate} - -These sets are extended in the usual way to Unicode. - -\section{Identifiers}\label{sec:idents} - -\syntax\begin{lstlisting} -op ::= special {special} -varid ::= lower {letter $|$ digit} [`_' [id]] -id ::= upper {letter $|$ digit} [`_' [id]] - | varid - | op - | ```string chars`'' -\end{lstlisting} - -There are three ways to form an identifier. First, an identifier can -start with a letter which can be followed by an arbitrary sequence of -letters and digits. This may be followed by an underscore -`\lstinline@_@' character and another string of characters that by -themselves make up an identifier. Second, an identifier can start -with a special character followed by an arbitrary sequence of special -characters. 
Finally, an identifier may also be formed by an arbitrary -string between backquotes (host systems may impose some restrictions -on which strings are legal for identifiers). As usual, a longest -match rule applies. For instance, the string - -\begin{lstlisting} -big_bob++=z3 -\end{lstlisting} - -decomposes into the three identifiers \lstinline@big_bob@, \lstinline@++=@, and -\code{z3}. The rules for pattern matching further distinguish between -{\em variable identifiers}, which start with a lower case letter, and -{\em constant identifiers}, which do not. - - -The `\lstinline[mathescape=false]@$@'\comment{$} character is reserved for compiler-synthesized identifiers. -User programs are not allowed to define identifiers which contain `\lstinline[mathescape=false]@$@'\comment{$} -characters. - -The following names are reserved words instead of being members of the -syntactic class \code{id} of lexical identifiers. - -\begin{lstlisting} -abstract case catch class def -do else extends false final -finally for if import new -null object override package private -protected return sealed super this -throw trait try true type -val var while with yield -_ : = => <- <: >: # @ -\end{lstlisting} - -The Unicode operator `$\Rightarrow$' has the ASCII equivalent -`$=>$', which is also reserved. - -\example -Here are examples of identifiers: -\begin{lstlisting} - x Object maxIndex p2p empty_? - + +_field -\end{lstlisting} - -\section{Braces and Semicolons} - -A semicolon `\lstinline@;@' is implicitly inserted after every closing brace -if there is a new line character between closing brace and the next -regular token after it, except if that token cannot legally start a -statement. - -The tokens which cannot legally start a statement -are the following delimiters and reserved words: -\begin{lstlisting} -catch else extends finally with yield -, . 
; : = => <- <: >: # @ ) ] } -\end{lstlisting} - -\section{Literals} - -There are literals for integer numbers (of types \code{Int} and \code{Long}), -floating point numbers (of types \code{Float} and \code{Double}), characters, and -strings. The syntax of these literals is in each case as in Java. - -\syntax\begin{lstlisting} -intLit ::= $\mbox{\rm\em ``as in Java''}$ -floatLit ::= $\mbox{\rm\em ``as in Java''}$ -charLit ::= $\mbox{\rm\em ``as in Java''}$ -stringLit ::= $\mbox{\rm\em ``as in Java''}$ -\end{lstlisting} - -\section{Whitespace and Comments} - -Tokens may be separated by whitespace characters (ASCII codes 0 to 32) -and/or comments. Comments come in two forms: - -A single-line comment is a sequence of characters which starts with -\lstinline@//@ and extends to the end of the line. - -A multi-line comment is a sequence of characters between \lstinline@/*@ and -\lstinline@*/@. Multi-line comments may be nested. - - -\chapter{\label{sec:names}Identifiers, Names and Scopes} - -Names in Scala identify types, values, methods, and classes which -are collectively called {\em entities}. Names are introduced by -definitions, declarations (\sref{sec:defs}) or import clauses -(\sref{sec:import}), which are collectively called {\em binders}. - -There are two different name spaces, one for types (\sref{sec:types}) -and one for terms (\sref{sec:exprs}). The same name may designate a -type and a term, depending on the context where the name is used. - -A definition or declaration has a {\em scope} in which the entity -defined by a single name can be accessed using a simple name. Scopes -are nested, and a definition or declaration in some inner scope {\em -shadows} a definition in an outer scope that contributes to the same -name space. Furthermore, a definition or declaration shadows bindings -introduced by a preceding import clause, even if the import clause is -in the same block. 
Import clauses, on the other hand, only shadow -bindings introduced by other import clauses in outer blocks. - -A reference to an unqualified (type- or term-) identifier $x$ is bound -by the unique binder, which -\begin{itemize} -\item defines an entity with name $x$ in the same namespace as the -identifier, and -\item shadows all other binders that define entities with name $x$ in that namespace. -\end{itemize} -It is an error if no such binder exists. If $x$ is bound by an import -clause, then the simple name $x$ is taken to be equivalent to the -qualified name to which $x$ is mapped by the import clause. If $x$ is bound by a definition or declaration, -then $x$ refers to the entity introduced by that -binder. In that case, the type of $x$ is the type of the referenced -entity. - -\example Consider the following nested definitions and imports: - -\begin{lstlisting} -object m1 { - object m2 { val x: int = 1; val y: int = 2 } - object m3 { val x: boolean = true; val y: String = "" } - val x: int = 3; - { import m2._; // shadows nothing - // reference to `x' is ambiguous here - val x: String = "abc"; // shadows preceding import - // name `x' refers to latest val definition - { import m3._ // shadows only preceding import m2 - // reference to `x' is ambiguous here - // name `y' refers to latest import clause - } - } -} -\end{lstlisting} - -A reference to a qualified (type- or term-) identifier $e.x$ refers to -the member of the type $T$ of $e$ which has the name $x$ in the same -namespace as the identifier. It is an error if $T$ is not a value type -(\sref{sec:value-types}). The type of $e.x$ is the member type of the -referenced entity in $T$. - -\chapter{\label{sec:types}Types} - -\syntax\begin{lstlisting} - Type ::= Type1 `=>' Type - | `(' [Types] `)' `=>' Type - | Type1 - Type1 ::= SimpleType {with SimpleType} [Refinement] - SimpleType ::= StableId - | SimpleType `#' id - | Path `.' 
type - | SimpleType TypeArgs - | `(' Type ')' - Types ::= Type {`,' Type} -\end{lstlisting} - -We distinguish between first-order types and type constructors, which -take type parameters and yield types. A subset of first-order types -called {\em value types} represents sets of (first-class) values. -Value types are either {\em concrete} or {\em abstract}. Every -concrete value type can be represented as a {\em class type}, i.e.\ a -type designator (\sref{sec:type-desig}) that refers to a -class\footnote{We assume that objects and packages also -implicitly define a class (of the same name as the object or package, -but inaccessible to user programs).} (\sref{sec:classes}), -or as a {\em compound type} (\sref{sec:compound-types}) -consisting of class types and possibly -also a refinement (\sref{sec:refinements}) that further constrains the -types of its members. - -A shorthand exists for denoting function types -(\sref{sec:function-types}). Abstract value types are introduced by -type parameters and abstract type bindings (\sref{sec:typedcl}). -Parentheses in types are used for grouping. - -Non-value types capture properties of -identifiers that are not values -(\sref{sec:synthetic-types}). There is no syntax to express these -types directly in Scala. - -\section{Paths}\label{sec:paths}\label{sec:stable-ids} - -\syntax\begin{lstlisting} - StableId ::= id - | Path `.' id - | [id '.'] super [`[' id `]'] `.' id - Path ::= StableId - | [id `.'] this -\end{lstlisting} - -Paths are not types themselves, but they can be a part of named types -and in that way form a central role in Scala's type system. - -A path is one of the following. -\begin{itemize} -\item -The empty path $\epsilon$ (which cannot be written explicitly in user programs). -\item -\lstinline@$C$.this@, where $C$ references a class. -The path \code{this} is taken as a shorthand for \lstinline@$C$.this@ where -$C$ is the name of the class directly enclosing the reference. 
-\item -\lstinline@$p$.$x$@ where $p$ is a path and $x$ is a stable member of $p$. -{\em Stable members} are members introduced by value or object -definitions, as well as packages. -\item -\lstinline@$C$.super.$x$@ or \lstinline@$C$.super[$M\,$].$x$@ -where $C$ references a class and $x$ references a -stable member of the super class or designated mixin class $M$ of $C$. -The prefix \code{super} is taken as a shorthand for \lstinline@$C$.super@ where -$C$ is the name of the class directly enclosing the reference. -\end{itemize} -A {\em stable identifier} is a path which ends in an identifier. - -\section{Value Types}\label{sec:value-types} - -\subsection{Singleton Types} -\label{sec:singleton-type} - -\syntax\begin{lstlisting} - SimpleType ::= Path `.' type -\end{lstlisting} - -A singleton type is of the form \lstinline@$p$.type@, where $p$ is a -path. The type denotes the set of values consisting of -exactly the value denoted by $p$. - -\subsection{Type Projection} -\label{sec:type-project} - -\syntax\begin{lstlisting} -SimpleType ::= SimpleType `#' id -\end{lstlisting} - -A type projection \lstinline@$T$#$x$@ references the type member named -$x$ of type $T$. $T$ must be either a singleton type, -or a non-abstract class type, or a Java class type (in either of the -last two cases, it is guaranteed that $T$ has no abstract type -members). - -\subsection{Type Designators} -\label{sec:type-desig} - -\syntax\begin{lstlisting} - SimpleType ::= StableId -\end{lstlisting} - -A type designator refers to a named value type. It can be simple or -qualified. All such type designators are shorthands for type projections. - -Specifically, the unqualified type name $t$ where $t$ is bound in some -class, object, or package $C$ is taken as a shorthand for -\lstinline@$C$.this.type#$t$@. If $t$ is not bound in a class, object, or -package, then $t$ is taken as a shorthand for -\lstinline@$\epsilon$.type#$t$@. 
- -A qualified type designator has the form \lstinline@$p$.$t$@ where $p$ is -a path (\sref{sec:paths}) and $t$ is a type name. Such a type designator is -equivalent to the type projection \lstinline@$p$.type#$t$@. - -\example -Some type designators and their expansions are listed below. We assume -a local type parameter $t$, a value \code{maintable} -with a type member \code{Node} and the standard class \lstinline@scala.Int@, -\begin{lstlisting} - t $\epsilon$.type#t - Int scala.type#Int - scala.Int scala.type#Int - data.maintable.Node data.maintable.type#Node -\end{lstlisting} - -\subsection{Parameterized Types} -\label{sec:param-types} - -\syntax\begin{lstlisting} - SimpleType ::= SimpleType TypeArgs - TypeArgs ::= `[' Types `]' -\end{lstlisting} - -A parameterized type $T[T_1 \commadots T_n]$ consists of a type designator -$T$ and type parameters $T_1 \commadots T_n$ where $n \geq 1$. $T$ -must refer to a type constructor which takes $n$ type parameters $a_1 \commadots a_n$ -with lower bounds $L_1 \commadots L_n$ and upper bounds $U_1 \commadots U_n$. - -The parameterized type is well-formed if each actual type parameter -{\em conforms to its bounds}, i.e.\ $L_i\sigma <: T_i <: U_i\sigma$ where $\sigma$ -is the substitution $[a_1 := T_1 \commadots a_n := T_n]$. 
- -\example\label{ex:param-types} -Given the partial type definitions: - -\begin{lstlisting} - class TreeMap[a <: Ord[a], b] { $\ldots$ } - class List[a] { $\ldots$ } - class I extends Ord[I] { $\ldots$ } -\end{lstlisting} - -the following parameterized types are well formed: - -\begin{lstlisting} - TreeMap[I, String] - List[I] - List[List[Boolean]] -\end{lstlisting} - -\example Given the type definitions of \ref{ex:param-types}, -the following types are ill-formed: - -\begin{lstlisting} - TreeMap[I] // illegal: wrong number of parameters - TreeMap[List[I], Boolean] // illegal: type parameter not within bound -\end{lstlisting} - -\subsection{Compound Types} -\label{sec:compound-types} -\label{sec:refinements} - -\syntax\begin{lstlisting} - Type ::= SimpleType {with SimpleType} [Refinement] - Refinement ::= `{' [RefineStat {`;' RefineStat}] `}' - RefineStat ::= Dcl - | type TypeDef {`,' TypeDef} - | -\end{lstlisting} - -A compound type ~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@~ represents -objects with members as given in the component types $T_1 \commadots -T_n$ and the refinement \lstinline@{$R\,$}@. Each component type $T_i$ must be a -class type \todo{Relax for first?}. A -refinement \lstinline@{$R\,$}@ contains declarations and type -definitions. Each declaration or definition in a refinement must -override a declaration or definition in one of the component types -$T_1 \commadots T_n$. The usual rules for overriding (\sref{sec:overriding}) -apply. If no refinement is given, the empty refinement is implicitly -added, i.e. ~\lstinline@$T_1$ with $\ldots$ with $T_n$@~ is a shorthand for -~\lstinline@$T_1$ with $\ldots$ with $T_n$ {}@. 
- -\subsection{Function Types} -\label{sec:function-types} - -\syntax\begin{lstlisting} - Type ::= Type1 `=>' Type - | `(' [Types] `)' `=>' Type -\end{lstlisting} -The type ~\lstinline@($T_1 \commadots T_n$) => $U$@~ represents the set of function -values that take arguments of types $T_1 \commadots T_n$ and yield -results of type $U$. In the case of exactly one argument type -~\lstinline@$T$ => $U$@~ is a shorthand for ~\lstinline@($T\,$) => $U$@. Function types -associate to the right, e.g.~\lstinline@($S\,$) => ($T\,$) => $U$@~ is the same as -~\lstinline@($S\,$) => (($T\,$) => $U\,$)@. - -Function types are shorthands for class types that define \code{apply} -functions. Specifically, the $n$-ary function type -~\lstinline@($T_1 \commadots T_n$) => $U$@~ is a shorthand for the class type -\lstinline@Function$n$[$T_1 \commadots T_n$,$U\,$]@. Such class -types are defined in the Scala library for $n$ between 0 and 9 as follows. -\begin{lstlisting} -package scala; -trait Function$n$[-$T_1 \commadots$ -$T_n$, +$R$] { - def apply($x_1$: $T_1 \commadots x_n$: $T_n$): $R$; - override def toString() = "<function>"; -} -\end{lstlisting} -Hence, function types are covariant in their result type, and -contravariant in their argument types. - -\section{Non-Value Types} -\label{sec:synthetic-types} - -The types explained in the following do not denote sets of values, nor -do they appear explicitly in programs. They are introduced in this -report as the internal types of defined identifiers. - -\subsection{Method Types} -\label{sec:method-types} - -A method type is denoted internally as $(\Ts)U$, where $(\Ts)$ is a -sequence of types $(T_1 \commadots T_n)$ for some $n \geq 0$ -and $U$ is a (value or method) type. This type represents named -methods that take arguments of types $T_1 \commadots T_n$ -and that return a result of type $U$. - -Method types associate to the right: $(\Ts_1)(\Ts_2)U$ is treated as -$(\Ts_1)((\Ts_2)U)$. 
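The rules above for function types and method types can be observed in a few lines of present-day Scala. This is a sketch for illustration only; the object name `FunctionTypeSketch` and all member names are hypothetical, and the syntax is that of current Scala rather than the dialect used in this report.

```scala
object FunctionTypeSketch {
  // `Int => Int` is a shorthand for the class type Function1[Int, Int].
  val inc: Int => Int = x => x + 1
  val incAsClass: Function1[Int, Int] = inc

  // `=>` associates to the right: Int => Int => Int means Int => (Int => Int).
  val add: Int => Int => Int = a => b => a + b
  val addExplicit: Int => (Int => Int) = add

  // Contravariant argument, covariant result:
  // a value of type Any => Int conforms to String => Any.
  val length: Any => Int = _.toString.length
  val widened: String => Any = length

  // A method with two parameter sections, internally typed (Int)(String, String)String.
  def c(x: Int)(y: String, z: String): String = y * x + z

  // Used as a value, the method corresponds to a right-nested function type.
  val cAsFunction: Int => (String, String) => String =
    x => (y, z) => c(x)(y, z)
}
```

The assignments to `incAsClass`, `addExplicit` and `widened` compile only because of the shorthand, associativity and variance rules stated above.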
- -Types of methods without any parameters are a special case. They are -written here $[]T$, following the syntax for polymorphic method types -(\sref{sec:poly-types}). Parameterless methods name expressions that -are re-evaluated each time the parameterless method name is -referenced. - -Method types do not exist as types of values. If a method name is used -as a value, its type is implicitly converted to a corresponding -function type (\sref{sec:impl-conv}). - -\example The declarations -\begin{lstlisting} -def a: Int -def b (x: Int): Boolean -def c (x: Int) (y: String, z: String): String -\end{lstlisting} -produce the typings -\begin{lstlisting} -a: [] Int -b: (Int) Boolean -c: (Int) (String, String) String -\end{lstlisting} - -\subsection{Polymorphic Method Types} -\label{sec:poly-types} - -A polymorphic method type is denoted internally as ~\lstinline@[$\tps\,$]$T$@~ where -\lstinline@[$\tps\,$]@ is a type parameter section -~\lstinline@[$a_1$ >: $L_1$ <: $U_1 \commadots a_n$ >: $L_n$ <: $U_n$]@~ -for some $n \geq 0$ and $T$ is a -(value or method) type. This type represents named methods that -take type arguments ~\lstinline@$S_1 \commadots S_n$@~ which -conform (\sref{sec:param-types}) to the lower bounds -~\lstinline@$L_1 \commadots L_n$@~ and the upper bounds -~\lstinline@$U_1 \commadots U_n$@~ and that yield results of type $T$. - -\example The declarations -\begin{lstlisting} -def empty[a]: List[a] -def union[a <: Comparable[a]] (x: Set[a], xs: Set[a]): Set[a] -\end{lstlisting} -produce the typings -\begin{lstlisting} -empty : [a >: All <: Any] List[a] -union : [a >: All <: Comparable[a]] (x: Set[a], xs: Set[a]) Set[a] . -\end{lstlisting} - -\comment{ -\subsection{Overloaded Types} -\label{sec:overloaded-types} - -More than one values or methods are defined in the same scope with the -same name, we model - -An overloaded type consisting of type alternatives $T_1 \commadots -T_n (n \geq 2)$ is denoted internally $T_1 \overload \ldots \overload T_n$. 
- -\example The definitions -\begin{lstlisting} -def println: unit; -def println(s: string): unit = $\ldots$; -def println(x: float): unit = $\ldots$; -def println(x: float, width: int): unit = $\ldots$; -def println[a](x: a)(tostring: a => String): unit = $\ldots$ -\end{lstlisting} -define a single function \code{println} which has an overloaded -type. -\begin{lstlisting} -println: [] unit $\overload$ - (String) unit $\overload$ - (float) unit $\overload$ - (float, int) unit $\overload$ - [a] (a) (a => String) unit -\end{lstlisting} - -\example The definitions -\begin{lstlisting} -def f(x: T): T = $\ldots$; -val f = 0 -\end{lstlisting} -define a function \code{f} which has type ~\lstinline@(x: T)T $\overload$ Int@. -} - -\section{Base Classes and Member Definitions} -\label{sec:base-classes-member-defs} - -Types, bounds and base classes of class members depend on the way the -members are referenced. Central here are three notions, namely: -\begin{enumerate} -\item the notion of the set of base classes of a type $T$, -\item the notion of a type $T$ in some class $C$ seen from some - prefix type $S$, -\item the notion of a member binding of some type $T$. -\end{enumerate} -These notions are defined mutually recursively as follows. - -1. The set of {\em base classes} of a type is a set of class types, -given as follows. -\begin{itemize} -\item -The base classes of a class type $C$ are the base classes of class -$C$. -\item -The base classes of an aliased type are the base classes of its alias. -\item -The base classes of an abstract type are the base classes of its upper bound. -\item -The base classes of a parameterized type -~\lstinline@$C$[$T_1 \commadots T_n$]@~ are the base classes -of type $C$, where every occurrence of a type parameter $a_i$ -of $C$ has been replaced by the corresponding parameter type $T_i$. -\item -The base classes of a singleton type \lstinline@$p$.type@ are the base classes of -the type of $p$. 
-\item -The base classes of a compound type -~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@~ -are the {\em reduced union} of the base -classes of all $T_i$'s. This means: -Let the multi-set $\SS$ be the multi-set-union of the -base classes of all $T_i$'s. -If $\SS$ contains several type instances of the same class, say -~\lstinline@$S^i$#$C$[$T^i_1 \commadots T^i_n$]@~ $(i \in I)$, then -all those instances -are replaced by one of them which conforms to all -others. It is an error if no such instance exists, or if $C$ is not a trait -(\sref{sec:traits}). It follows that the reduced union, if it exists, -produces a set of class types, where different types are instances of different classes. -\item -The base classes of a type selection \lstinline@$S$#$T$@ are -determined as follows. If $T$ is an alias or abstract type, the -previous clauses apply. Otherwise, $T$ must be a (possibly -parameterized) class type, which is defined in some class $B$. Then -the base classes of \lstinline@$S$#$T$@ are the base classes of $T$ -in $B$ seen from the prefix type $S$. -\end{itemize} - -2. The notion of a type $T$ -{\em in class $C$ seen from some prefix type -$S\,$} makes sense only if the prefix type $S$ -has a type instance of class $C$ as a base class, say -~\lstinline@$S'$#$C$[$T_1 \commadots T_n$]@. Then we define as follows. -\begin{itemize} - \item - If \lstinline@$S$ = $\epsilon$.type@, then $T$ in $C$ seen from $S$ is $T$ itself. - \item Otherwise, if $T$ is the $i$'th type parameter of some class $D$, then - \begin{itemize} - \item - If $S$ has a base class ~\lstinline@$D$[$U_1 \commadots U_n$]@, for some type parameters - ~\lstinline@[$U_1 \commadots U_n$]@, then $T$ in $C$ seen from $S$ is $U_i$. - \item - Otherwise, if $C$ is defined in a class $C'$, then - $T$ in $C$ seen from $S$ is the same as $T$ in $C'$ seen from $S'$. - \item - Otherwise, if $C$ is not defined in another class, then - $T$ in $C$ seen from $S$ is $T$ itself. 
- \end{itemize} -\item - Otherwise, - if $T$ is the singleton type \lstinline@$D$.this.type@ for some class $D$ - then - \begin{itemize} - \item - If $D$ is a subclass of $C$ and - $S$ has a type instance of class $D$ among its base classes, - then $T$ in $C$ seen from $S$ is $S$. - \item - Otherwise, if $C$ is defined in a class $C'$, then - $T$ in $C$ seen from $S$ is the same as $T$ in $C'$ seen from $S'$. - \item - Otherwise, if $C$ is not defined in another class, then - $T$ in $C$ seen from $S$ is $T$ itself. - \end{itemize} -\item - If $T$ is some other type, then the described mapping is applied - to all its type components. -\end{itemize} - -If $T$ is a possibly parameterized class type, where $T$'s class -is defined in some other class $D$, and $S$ is some prefix type, -then we use ``$T$ seen from $S$'' as a shorthand for -``$T$ in $D$ seen from $S$''. - -3. The {\em member bindings} of a type $T$ are all bindings $d$ such that -there exists a type instance of some class $C$ among the base classes of $T$ -and there exists a definition or declaration $d'$ in $C$ -such that $d$ results from $d'$ by replacing every -type $T'$ in $d'$ by $T'$ in $C$ seen from $T$. - -The {\em definition} of a type projection \lstinline@$S$#$t$@ is the member -binding $d$ of the type $t$ in $S$. In that case, we also say -that \lstinline@$S$#$t$@ {\em is defined by} $d$. - -\section{Relations between types} - -We define two relations between types. -\begin{quote}\begin{tabular}{l@{\gap}l@{\gap}l} -\em Type equivalence & $T \equiv U$ & $T$ and $U$ are interchangeable -in all contexts. -\\ -\em Conformance & $T \conforms U$ & Type $T$ conforms to type $U$.
-\end{tabular}\end{quote} - -\subsection{Type Equivalence} -\label{sec:type-equiv} - -Equivalence $(\equiv)$ between types is the smallest congruence\footnote{ A -congruence is an equivalence relation which is closed under formation -of contexts} such that the following holds: -\begin{itemize} -\item -If $t$ is defined by a type alias ~\lstinline@type $t$ = $T$@, then $t$ is -equivalent to $T$. -\item -If a path $p$ has a singleton type ~\lstinline@$q$.type@, then -~\lstinline@$p$.type $\equiv q$.type@. -\item -If $O$ is defined by an object definition, and $p$ is a path -consisting only of package or object selectors and ending in $O$, then -~\lstinline@$O$.this.type $\equiv p$.type@. -\item -Two compound types are equivalent if their component types are -pairwise equivalent and their refinements are equivalent. Two -refinements are equivalent if they bind the same names and the -modifiers, types and bounds of every declared entity are equivalent in -both refinements. -\item -Two method types are equivalent if they have equivalent result -types, both have the same number of parameters, and corresponding -parameters have equivalent types as well as the same \code{def} or -\lstinline@*@ modifiers. Note that the names of parameters do not matter -for method type equivalence. -\item -Two polymorphic types are equivalent if they have the same number of -type parameters, and, after renaming one set of type parameters by -another, the result types as well as lower and upper bounds of -corresponding type parameters are equivalent. -\item -Two overloaded types are equivalent if for every alternative type in -either type there exists an equivalent alternative type in the other. -\end{itemize} - -\subsection{Conformance} -\label{sec:subtyping} - -The conformance relation $(\conforms)$ is the smallest -transitive relation that satisfies the following conditions. -\begin{itemize} -\item Conformance includes equivalence. If $T \equiv U$ then $T \conforms U$. 
-\item For every value type $T$, - $\mbox{\code{scala.All}} \conforms T \conforms \mbox{\code{scala.Any}}$. -\item For every value type $T \conforms \mbox{\code{scala.AnyRef}}$ - one has $\mbox{\code{scala.AllRef}} \conforms T$. -\item A type variable or abstract type $t$ conforms to its upper bound and - its lower bound conforms to $t$. -\item A class type or parameterized type $c$ conforms to any of its basetypes, $b$. -\item A type projection \lstinline@$T$#$t$@ conforms to \lstinline@$U$#$t$@ if - $T$ conforms to $U$. -\item A parameterized type ~\lstinline@$T$[$T_1 \commadots T_n$]@~ conforms to - ~\lstinline@$T$[$U_1 \commadots U_n$]@~ if - the following three conditions hold for $i = 1 \commadots n$. - \begin{itemize} - \item - If the $i$'th type parameter of $T$ is declared covariant, then $T_i \conforms U_i$. - \item - If the $i$'th type parameter of $T$ is declared contravariant, then $U_i \conforms T_i$. - \item - If the $i$'th type parameter of $T$ is declared neither covariant - nor contravariant, then $U_i \equiv T_i$. - \end{itemize} -\item A compound type ~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@~ conforms to - each of its component types $T_i$. -\item If $T \conforms U_i$ for $i = 1 \commadots n$ and for every - binding of a type or value $x$ in $R$ there exists a member - binding of $x$ in $T$ subsuming it, then $T$ conforms to the - compound type ~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@. -\item If - $T'_i$ conforms to $T_i$ for $i = 1 \commadots n$ and $U$ conforms to $U'$ - then the method type $(T_1 \commadots T_n) U$ conforms to - $(T'_1 \commadots T'_n) U'$. 
-\item If, assuming -$L'_1 \conforms a_1 \conforms U'_1 \commadots L'_n \conforms a_n \conforms U'_n$ -one has $L_i \conforms L'_i$ and $U'_i \conforms U_i$ -for $i = 1 \commadots n$, as well as $T \conforms T'$ then the polymorphic type -$[a_1 >: L_1 <: U_1 \commadots a_n >: L_n <: U_n] T$ conforms to the polymorphic type -$[a_1 >: L'_1 <: U'_1 \commadots a_n >: L'_n <: U'_n] T'$. -\item -An overloaded type $T_1 \overload \ldots \overload T_n$ conforms to each of its alternative types $T_i$. -\item -A type $S$ conforms to the overloaded type $T_1 \overload \ldots \overload T_n$ -if $S$ conforms to each alternative type $T_i$. \todo{Really?} -\end{itemize} - -A declaration or definition in some compound type or class type $C$ -{\em subsumes} another -declaration of the same name in some compound type or class type $C'$, if one of the following holds. -\begin{itemize} -\item -A value declaration ~\lstinline@val $x$: $T$@~ or value definition -~\lstinline@val $x$: $T$ = $e$@~ subsumes a value declaration -~\lstinline@val $x$: $T'$@~ if $T \conforms T'$. -\item -A type alias -$\TYPE;t=T$ subsumes a type alias $\TYPE;t=T'$ if -$T \equiv T'$. -\item -A type declaration ~\lstinline@type $t$ >: $L$ <: $U$@~ subsumes -a type declaration ~\lstinline@type $t$ >: $L'$ <: $U'$@~ if $L' \conforms L$ and -$U \conforms U'$. -\item -A type or class definition of some type $t$ subsumes an abstract -type declaration ~\lstinline@type t >: L <: U@~ if -$L \conforms t \conforms U$. -\end{itemize} - -The $(\conforms)$ relation forms a partial order between types. The {\em -least upper bound} or the {\em greatest lower bound} of a set of types -is understood to be relative to that order. - -\paragraph{Note} The least upper bound of a set of types does not always exist.
For instance, consider -the class definitions -\begin{lstlisting} -class A[+t] {} -class B extends A[B]; -class C extends A[C]; -\end{lstlisting} -Then the types ~\lstinline@A[Any], A[A[Any]], A[A[A[Any]]], ...@~ form -a descending sequence of upper bounds for \code{B} and \code{C}. The -least upper bound would be the infinite limit of that sequence, which -does not exist as a Scala type. Since cases like this are in general -impossible to detect, a Scala compiler is free to reject a term -which has a type specified as a least upper or greatest lower bound, -if that bound would be more complex than some compiler-set -limit\footnote{The current Scala compiler limits the nesting level -of parameterization in such bounds to 10.}. - -\section{Type Erasure} -\label{sec:erasure} - -A type is called {\em generic} if it contains type arguments or type variables. -{\em Type erasure} is a mapping from (possibly generic) types to -non-generic types. We write $|T|$ for the erasure of type $T$. -The erasure mapping is defined as follows. -\begin{itemize} -\item The erasure of a type variable is the erasure of its upper bound. -\item The erasure of a parameterized type $T[T_1 \commadots T_n]$ is $|T|$. -\item The erasure of a singleton type \lstinline@$p$.type@ is the - erasure of the type of $p$. -\item The erasure of a type projection \lstinline@$T$#$x$@ is \lstinline@|$T$|#$x$@. -\item The erasure of a compound type ~\lstinline@$T_1$ with $\ldots$ with $T_n$ {$R\,$}@ - is $|T_1|$. -\item The erasure of every other type is the type itself. -\end{itemize} - -\section{Implicit Conversions} -\label{sec:impl-conv} - -\todo{Include Anything to unit?} - -The following implicit conversions are applied to expressions of -method type that are used as values, rather than being applied to some -arguments. -\begin{itemize} -\item -A parameterless method $m$ of type $[] T$ -is converted to type $T$ by evaluating the expression to which $m$ is bound.
-\item -An expression $e$ of polymorphic type -\begin{lstlisting} -[$a_1$ >: $L_1$ <: $U_1 \commadots a_n$ >: $L_n$ <: $U_n$]$T$ -\end{lstlisting} -which does not appear as the function part of -a type application is converted to type $T$ -by determining with local type inference -(\sref{sec:local-type-inf}) instance types ~\lstinline@$T_1 \commadots T_n$@~ -for the type variables ~\lstinline@$a_1 \commadots a_n$@~ and -implicitly embedding $e$ in the type application -~\lstinline@$e$[$T_1 \commadots T_n$]@~ (\sref{sec:type-app}). -\item -An expression $e$ of monomorphic method type -$(\Ts_1) \ldots (\Ts_n) U$ of arity $n > 0$ -which does not appear as the function part of an application is -converted to a function type by implicitly embedding $e$ in -the following term, where $x$ is a fresh variable and each $ps_i$ is a -parameter section consisting of parameters with fresh names of types $\Ts_i$: -\begin{lstlisting} -(val $x$ = $e$ ; $(ps_1) \ldots \Arrow \ldots \Arrow (ps_n) \Arrow x(ps_1)\ldots(ps_n)$) -\end{lstlisting} -This conversion is not applicable to functions with call-by-name -parameters \lstinline@def $x$: $T$@ or repeated parameters -\lstinline@x: T*@ (\sref{sec:parameters}), because its result would -violate the well-formedness rules for anonymous functions -(\sref{sec:closures}). Hence, methods with such parameters -always need to be applied to arguments immediately. -\end{itemize} - -When used in an expression, a value of type \code{byte}, \code{char}, -or \code{short} is always implicitly converted to a value of type -\code{int}. - -If an expression $e$ has type $T$ where $T$ does not conform to the -expected type $pt$ and $T$ has a member named \lstinline@coerce@ of type -$[]U$ where $U$ does conform to $pt$, then the expression is typed and evaluated as if it were -\lstinline@$e$.coerce@.
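\example The conversion of a method to a function value described above can be sketched as follows. The sketch is written in the syntax of later Scala releases, which is an assumption and differs in places from the notation used in this chapter. The names \code{EtaExpansionSketch}, \code{scale}, \code{asFunction} and \code{explicit} are hypothetical and serve only as illustration.
\begin{lstlisting}
object EtaExpansionSketch {
  // A method with two parameter sections, i.e. of method type (Int)(Int)Int.
  def scale(factor: Int)(x: Int): Int = factor * x

  // Used as a value where a function type is expected, the method is
  // converted to a function value instead of being applied.
  val asFunction: Int => Int => Int = scale

  // The same function written out by hand, one anonymous function
  // per parameter section.
  val explicit: Int => Int => Int =
    (factor: Int) => (x: Int) => scale(factor)(x)

  def main(args: Array[String]): Unit =
    println(asFunction(3)(4) == explicit(3)(4))
}
\end{lstlisting}
Under these assumptions, \code{asFunction} and \code{explicit} denote functions that agree on all arguments; the second form spells out what the implicit embedding produces.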
- - -\chapter{Basic Declarations and Definitions} -\label{sec:defs} - -\syntax\begin{lstlisting} - Dcl ::= val ValDcl {`,' ValDcl} - | var VarDcl {`,' VarDcl} - | def FunDcl {`,' FunDcl} - | type TypeDcl {`,' TypeDcl} - Def ::= val PatDef {`,' PatDef} - | var VarDef {`,' VarDef} - | def FunDef {`,' FunDef} - | type TypeDef {`,' TypeDef} - | ClsDef -\end{lstlisting} - -A {\em declaration} introduces names and assigns them types. It can -appear as one of the statements of a class definition -(\sref{sec:templates}) or as part of a refinement in a compound -type (\ref{sec:refinements}). - -A {\em definition} introduces names that denote terms or types. It can -form part of an object or class definition or it can be local to a -block. Both declarations and definitions produce {\em bindings} that -associate type names with type definitions or bounds, and that -associate term names with types. - -The scope of a name introduced by a declaration or definition is the -whole statement sequence containing the binding. However, there is a -restriction on forward references: In a statement sequence $s_1 \ldots -s_n$, if a simple name in $s_i$ refers to an entity defined by $s_j$ -where $j \geq i$, then every non-empty statement between and including -$s_i$ and $s_j$ must be an import clause, -or a function, type, class, or object definition. It may not be -a value definition, a variable definition, or an expression. - -\comment{ -Every basic definition may introduce several defined names, separated -by commas.
These are expanded according to the following scheme: -\bda{lcl} -\VAL;x, y: T = e && \VAL; x: T = e \\ - && \VAL; y: T = x \\[0.5em] - -\LET;x, y: T = e && \LET; x: T = e \\ - && \VAL; y: T = x \\[0.5em] - -\DEF;x, y (ps): T = e &\tab\mbox{expands to}\tab& \DEF; x(ps): T = e \\ - && \DEF; y(ps): T = x(ps)\\[0.5em] - -\VAR;x, y: T := e && \VAR;x: T := e\\ - && \VAR;y: T := x\\[0.5em] - -\TYPE;t,u = T && \TYPE; t = T\\ - && \TYPE; u = t\\[0.5em] -\eda -} - -All definitions have a ``repeated form'' where the initial -definition keyword is followed by several constituent definitions -which are separated by commas. A repeated definition is -always interpreted as a sequence formed from the -constituent definitions. E.g.\ the function definition -~\lstinline@def f(x) = x, g(y) = y@~ expands to -~\lstinline@def f(x) = x; def g(y) = y@~ and -the type definition -~\lstinline@type T, U <: B@~ expands to -~\lstinline@type T; type U <: B@. - -\comment{ -If an element in such a sequence introduces only the defined name, -possibly with some type or value parameters, but leaves out any -additional parts in the definition, then those parts are implicitly -copied from the next subsequent sequence element which consists of -more than just a defined name and parameters. Examples: -\begin{itemize} -\item[] -The variable declaration ~\lstinline@var x, y: int@~ -expands to ~\lstinline@var x: int; var y: int@. -\item[] -The value definition ~\lstinline@val x, y: int = 1@~ -expands to ~\lstinline@val x: int = 1; val y: int = 1@. -\item[] -The class definition ~\lstinline@case class X(), Y(n: int) extends Z@~ expands to -~\lstinline@case class X extends Z; case class Y(n: int) extends Z@. -\item -The object definition ~\lstinline@case object Red, Green, Blue extends Color@~ -expands to -\begin{lstlisting} -case object Red extends Color; -case object Green extends Color; -case object Blue extends Color .
-\end{lstlisting} -\end{itemize} -} -\section{Value Declarations and Definitions} -\label{sec:valdef} - -\syntax\begin{lstlisting} - Dcl ::= val ValDcl {`,' ValDcl} - ValDcl ::= id `:' Type - Def ::= val PatDef {`,' PatDef} - PatDef ::= Pattern2 [`:' Type] `=' Expr -\end{lstlisting} - -A value declaration ~\lstinline@val $x$: $T$@~ introduces $x$ as a name of a value of -type $T$. - -A value definition ~\lstinline@val $x$: $T$ = $e$@~ defines $x$ as a name of the -value that results from the evaluation of $e$. The type $T$ may be -omitted, in which case the type of expression $e$ is assumed. -If a type $T$ is given, then $e$ is expected to conform to it. - -Evaluation of the value definition implies evaluation of its -right-hand side $e$. The effect of the value definition is to bind -$x$ to the value of $e$ converted to type $T$. - -Value definitions can alternatively have a pattern -(\sref{sec:patterns}) as left-hand side. If $p$ is some pattern other -than a simple name or a name followed by a colon and a type, then the -value definition ~\lstinline@val $p$ = $e$@~ is expanded as follows: - -1. If the pattern $p$ has bound variables $x_1 \commadots x_n$, where $n > 1$: -\begin{lstlisting} -val $\Dollar x$ = $e$.match {case $p$ => scala.Tuple$n$($x_1 \commadots x_n$)} -val $x_1$ = $\Dollar x$._1 -$\ldots$ -val $x_n$ = $\Dollar x$._n . -\end{lstlisting} -Here, $\Dollar x$ is a fresh name. The class -\lstinline@Tuple$n$@ is defined for $n = 2 \commadots 9$ in package -\code{scala}. - -2. If $p$ has a unique bound variable $x$: -\begin{lstlisting} -val $x$ = $e$.match { case $p$ => $x$ } -\end{lstlisting} - -3. 
If $p$ has no bound variables: -\begin{lstlisting} -$e$.match { case $p$ => ()} -\end{lstlisting} - -\example -The following are examples of value definitions -\begin{lstlisting} -val pi = 3.1415; -val pi: double = 3.1415; // equivalent to first definition -val Some(x) = f(); // a pattern definition -val x :: xs = mylist; // an infix pattern definition -\end{lstlisting} - -The last two definitions have the following expansions. -\begin{lstlisting} -val x = f().match { case Some(x) => x } - -val x$\Dollar$ = mylist.match { case x :: xs => scala.Tuple2(x, xs) } -val x = x$\Dollar$._1; -val xs = x$\Dollar$._2; -\end{lstlisting} - -\section{Variable Declarations and Definitions} -\label{sec:vardef} - -\syntax\begin{lstlisting} - Dcl ::= var VarDcl {`,' VarDcl} - Def ::= var VarDef {`,' VarDef} - VarDcl ::= id `:' Type - VarDef ::= id [`:' Type] `=' Expr - | id `:' Type `=' `_' -\end{lstlisting} - -A variable declaration ~\lstinline@var $x$: $T$@~ is equivalent to declarations -of a {\em getter function} $x$ and a {\em setter function} -\lstinline@$x$_=@, defined as follows: - -\begin{lstlisting} - def $x$: $T$; - def $x$_= ($y$: $T$): unit -\end{lstlisting} - -An implementation of a class containing variable declarations -may define these variables using variable definitions, or it may -define setter and getter functions directly. - -A variable definition ~\lstinline@var $x$: $T$ = $e$@~ introduces a mutable -variable with type $T$ and initial value as given by the -expression $e$. The type $T$ can be omitted, -in which case the type of $e$ is assumed. If $T$ is given, then $e$ -is expected to conform to it. - -A variable definition ~\lstinline@var $x$: $T$ = _@~ introduces a mutable -variable with type $T$ and a default initial value.
-The default value depends on the type $T$ as follows: -\begin{quote}\begin{tabular}{ll} -\code{0} & if $T$ is \code{int} or one of its subrange types, \\ -\code{0L} & if $T$ is \code{long},\\ -\lstinline@0.0f@ & if $T$ is \code{float},\\ -\lstinline@0.0d@ & if $T$ is \code{double},\\ -\code{false} & if $T$ is \code{boolean},\\ -\lstinline@()@ & if $T$ is \code{unit}, \\ -\code{null} & for all other types $T$. -\end{tabular}\end{quote} - -When they occur as members of a template, both forms of variable -definition also introduce a getter function $x$ which returns the -value currently assigned to the variable, as well as a setter function -\lstinline@$x$_=@ which changes the value currently assigned to the variable. -The functions have the same signatures as for a variable declaration. -The getter and setter functions are then members of the template -instead of the variable accessed by them. - -\example The following example shows how {\em properties} can be -simulated in Scala. It defines a class \code{TimeOfDayVar} of time -values with updatable integer fields representing hours, minutes, and -seconds. Its implementation contains tests that allow only legal -values to be assigned to these fields. The user code, on the other -hand, accesses these fields just like normal variables. 
- -\begin{lstlisting} -class TimeOfDayVar { - private var h: int = 0, m: int = 0, s: int = 0; - - def hours = h; - def hours_= (h: int) = if (0 <= h && h < 24) this.h = h - else throw new DateError(); - - def minutes = m; - def minutes_= (m: int) = if (0 <= m && m < 60) this.m = m - else throw new DateError(); - - def seconds = s; - def seconds_= (s: int) = if (0 <= s && s < 60) this.s = s - else throw new DateError(); -} -val d = new TimeOfDayVar; -d.hours = 8; d.minutes = 30; d.seconds = 0; -d.hours = 25; // throws a DateError exception -\end{lstlisting} - -\section{Type Declarations and Type Aliases} -\label{sec:typedcl} -\label{sec:typealias} - -\syntax\begin{lstlisting} - Dcl ::= type TypeDcl {`,' TypeDcl} - TypeDcl ::= id [>: Type] [<: Type] - Def ::= type TypeDef {`,' TypeDef} - TypeDef ::= id [TypeParamClause] `=' Type -\end{lstlisting} - -A {\em type declaration} ~\lstinline@type $t$ >: $L$ <: $U$@~ declares $t$ to -be an abstract type with lower bound type $L$ and upper bound -type $U$. If such a declaration appears as a member declaration -of a type, implementations of the type may implement $t$ with any -type $T$ for which $L \conforms T \conforms U$. Either or both bounds may -be omitted. If the lower bound $L$ is missing, the bottom type -\lstinline@scala.All@ is assumed. If the upper bound $U$ is missing, -the top type \lstinline@scala.Any@ is assumed. - -A {\em type alias} ~\lstinline@type $t$ = $T$@~ defines $t$ to be an alias -name for the type $T$. The left hand side of a type alias may -have a type parameter clause, e.g. ~\lstinline@type $t$[$\tps\,$] = $T$@. The scope -of a type parameter extends over the right hand side $T$ and the -type parameter clause $\tps$ itself. - -The scope rules for definitions (\sref{sec:defs}) and type parameters -(\sref{sec:funsigs}) make it possible that a type name appears in its -own bound or in its right-hand side.
However, it is a static error if -a type alias refers recursively to the defined type constructor itself. -That is, the type $T$ in a type alias ~\lstinline@type $t$[$\tps\,$] = $T$@~ may not refer -directly or indirectly to the name $t$. It is also an error if -an abstract type is directly or indirectly its own upper or lower bound. - -\example The following are legal type declarations and definitions: -\begin{lstlisting} -type IntList = List[Integer]; -type T <: Comparable[T]; -type Two[a] = Tuple2[a, a]; -\end{lstlisting} - -The following are illegal: -\begin{lstlisting} -type Abs = Comparable[Abs]; // recursive type alias - -type S <: T; // S, T are bounded by themselves. -type T <: S; - -type T <: AnyRef with T; // T is abstract, may not be part of - // compound type - -type T >: Comparable[T.That]; // Cannot select from T. - // T is a type, not a value -\end{lstlisting} - -If a type alias ~\lstinline@type $t$[$\tps\,$] = $S$@~ refers to a class type -$S$, the name $t$ can also be used as a constructor for -objects of type $S$. - -\example The \code{Predef} module contains a definition which establishes \code{Pair} -as an alias of the parameterized class \code{Tuple2}: -\begin{lstlisting} -type Pair[+a, +b] = Tuple2[a, b]; -\end{lstlisting} -As a consequence, for any two types $S$ and $T$, the type -~\lstinline@Pair[$S$, $T\,$]@~ is equivalent to the type ~\lstinline@Tuple2[$S$, $T\,$]@. -\code{Pair} can also be used as a constructor instead of \code{Tuple2}, as in -\begin{lstlisting} -new Pair[Int, Int](1, 2) . -\end{lstlisting} - -\section{Type Parameters} - -\syntax\begin{lstlisting} - TypeParamClause ::= `[' TypeParam {`,' TypeParam} `]' - TypeParam ::= [`+' | `-'] TypeDcl -\end{lstlisting} - - -Type parameters appear in type definitions, class definitions, and -function definitions. The most general form of a type parameter is -~\lstinline@$\pm t$ >: $L$ <: $U$@. 
Here, $L$ and $U$ are lower -and upper bounds that constrain possible type arguments for the -parameter, and $\pm$ is a {\em variance}, i.e.\ an optional prefix -of either \lstinline@+@ or \lstinline@-@. - -\comment{ -The upper bound $U$ in a type parameter clause may not be a final -class. The lower bound may not denote a value type.\todo{Why} -} -The names of all type parameters in a type parameter clause must be -pairwise different. The scope of a type parameter includes in each -case the whole type parameter clause. Therefore it is possible that a -type parameter appears as part of its own bounds or the bounds of -other type parameters in the same clause. However, a type parameter -may not be bounded directly or indirectly by itself. - -\example Here are some well-formed type parameter clauses: -\begin{lstlisting} -[s, t] -[ex <: Throwable] -[a <: Ord[b], b <: a] -[a, b, c >: a <: b] -\end{lstlisting} -The following type parameter clauses are illegal -since type parameters are bounded by themselves. -\begin{lstlisting} -[a >: a] -[a <: b, b <: c, c <: a] -\end{lstlisting} - -Variance annotations indicate how type instances with the given type -parameters vary with respect to subtyping (\sref{sec:subtyping}). A -`\lstinline@+@' variance indicates a covariant dependency, a `\lstinline@-@' -variance indicates a contravariant dependency, and a missing variance -indication indicates an invariant dependency. - -A variance annotation constrains the way the annotated type variable -may appear in the type or class which binds the type parameter. In a -type definition ~\lstinline@type $t$[$\tps\,$] = $S$@, type parameters labeled -`\lstinline@+@' must only appear in covariant position in $S$ whereas -type parameters labeled `\lstinline@-@' must only appear in contravariant -position.
Analogously, for a class definition -~\lstinline@class $c$[$\tps\,$]($ps\,$): $s$ extends $t$@, type parameters labeled -`\lstinline@+@' must only appear in covariant position in the self type -$s$ and the template $t$, whereas type -parameters labeled `\lstinline@-@' must only appear in contravariant -position. - -The variance position of a type parameter in a type or template is -defined as follows. Let the opposite of covariance be contravariance, -and the opposite of invariance be itself. The top-level of the type -or template is always in covariant position. The variance position -changes at the following constructs. -\begin{itemize} -\item -The variance position of a method parameter is the opposite of the -variance position of the enclosing parameter clause. -\item -The variance position of a type parameter is the opposite of the -variance position of the enclosing type parameter clause. -\item -The variance position of the lower bound of a type declaration or type parameter -is the opposite of the variance position of the type declaration or parameter. -\item -The right hand side $S$ of a type alias ~\lstinline@type $t$[$\tps\,$] = $S$@~ -is always in invariant position. -\item -The type of a mutable variable is always in invariant position. -\item -The prefix $S$ of a type selection \lstinline@$S$#$T$@ is always in invariant position. -\item -For a type argument $T$ of a type ~\lstinline@$S$[$\ldots T \ldots$ ]@: If the -corresponding type parameter is invariant, then $T$ is in -invariant position. If the corresponding type parameter is -contravariant, the variance position of $T$ is the opposite of -the variance position of the enclosing type ~\lstinline@$S$[$\ldots T \ldots$ ]@. -\end{itemize} - -\example The following variance annotation is legal. -\begin{lstlisting} -class P[+a, +b] { - val fst: a, snd: b -}\end{lstlisting} -With this variance annotation, elements -of type $P$ subtype covariantly with respect to their arguments.
-For instance, -\begin{lstlisting} -P[IOException, String] <: P[Throwable, AnyRef] . -\end{lstlisting} - -If we make the elements of $P$ mutable, -the variance annotation becomes illegal. -\begin{lstlisting} -class Q[+a, +b] { - var fst: a, snd: b // **** error: illegal variance: - // `a', `b' occur in invariant position. -} -\end{lstlisting} - -\example The following variance annotation is illegal, since $a$ appears -in contravariant position in the parameter of \code{append}: - -\begin{lstlisting} -trait Vector[+a] { - def append(x: Vector[a]): Vector[a]; - // **** error: illegal variance: - // `a' occurs in contravariant position. -} -\end{lstlisting} -The problem can be avoided by generalizing the type of \code{append} -by means of a lower bound: - -\begin{lstlisting} -trait Vector[+a] { - def append[b >: a](x: Vector[b]): Vector[b]; -} -\end{lstlisting} - -\example Here is a case where a contravariant type parameter is useful. - -\begin{lstlisting} -trait OutputChannel[-a] { - def write(x: a): unit -} -\end{lstlisting} -With that annotation, we have that -\lstinline@OutputChannel[AnyRef]@ conforms to \lstinline@OutputChannel[String]@. -That is, a -channel on which one can write any object can substitute for a channel -on which one can write only strings. - -\section{Function Declarations and Definitions} -\label{sec:defdef} -\label{sec:funsigs} -\label{sec:parameters} - -\syntax\begin{lstlisting} -Dcl ::= def FunDcl {`,' FunDcl} -FunDcl ::= id [FunTypeParamClause] {ParamClause} `:' Type -Def ::= def FunDef {`,' FunDef} -FunDef ::= id [FunTypeParamClause] {ParamClause} - [`:' Type] `=' Expr -FunTypeParamClause ::= `[' TypeDcl {`,' TypeDcl} `]' -ParamClause ::= `(' [Param {`,' Param}] `)' -Param ::= [def] id `:' Type [`*'] -\end{lstlisting} - -A function declaration has the form ~\lstinline@def $f \psig$: $T$@, where -$f$ is the function's name, $\psig$ is its parameter -signature and $T$ is its result type.
A function definition -~\lstinline@def $f \psig$: $T$ = $e$@~ also includes a {\em function body} $e$, -i.e.\ an expression which defines the function's result. A parameter -signature consists of an optional type parameter clause \lstinline@[$\tps\,$]@, -followed by zero or more value parameter clauses -~\lstinline@($ps_1$)$\ldots$($ps_n$)@. Such a declaration or definition -introduces a value with a (possibly polymorphic) method type whose -parameter types and result type are as given. - -A type parameter clause $\tps$ consists of one or more type -declarations (\sref{sec:typedcl}), which introduce type parameters, -possibly with bounds. The scope of a type parameter includes -the whole signature, including any of the type parameter bounds as -well as the function body, if it is present. - -A value parameter clause $ps$ consists of zero or more formal -parameter bindings such as \lstinline@$x$: $T$@, which bind value -parameters and associate them with their types. The scope of a formal -value parameter name $x$ is the function body, if one is -given. Both type parameter names and value parameter names must be -pairwise distinct. - -Value parameters may be prefixed by \code{def}, e.g.\ -~\lstinline@def $x$:$T$@. The type of such a parameter is then the -parameterless method type ~\lstinline@[]$T$@. This indicates that the -corresponding argument is not evaluated at the point of function -application, but instead is evaluated at each use within the -function. That is, the argument is evaluated using {\em call-by-name}. - -\example The declaration -\begin{lstlisting} -def whileLoop (def cond: Boolean) (def stat: Unit): Unit -\end{lstlisting} -produces the typing -\begin{lstlisting} -whileLoop: (cond: [] Boolean) (stat: [] Unit) Unit -\end{lstlisting} -which indicates that both parameters of \code{whileLoop} are evaluated using -call-by-name. - -The last value parameter of a parameter section may be suffixed by -``\code{*}'', e.g.\ ~\lstinline@(..., $x$:$T$*)@.
The type of such a -{\em repeated} parameter inside the method is then the sequence type -\lstinline@scala.Seq[$T$]@. Methods with repeated parameters -\lstinline@$T$*@ take a variable number of arguments of type $T$. That is, -if a method $m$ with type ~\lstinline@($T_1 \commadots T_n, S$*)$U$@~ -is applied to arguments $(e_1 \commadots e_k)$ where $k \geq n$, then -$m$ is taken in that application to have type $(T_1 \commadots T_n, S -\commadots S)U$, with $k - n$ occurrences of type $S$. -\todo{Change to ???: If the method -is converted to a function type instead of being applied immediately, -a repeated parameter \lstinline@$T$*@ is taken to be ~\lstinline@scala.Seq[$T$]@~ -instead.} - -\example The following method definition computes the sum of a variable number -of integer arguments. -\begin{lstlisting} -def sum(args: int*) = { - var result = 0; - for (val arg <- args.elements) result = result + arg; - result -} -\end{lstlisting} -The following applications of this method yield \code{0}, \code{1}, -\code{15}, in that order. -\begin{lstlisting} -sum() -sum(1) -sum(1, 2, 3, 4, 5) -\end{lstlisting} - - -The type of the function body must conform to the function's declared -result type, if one is given. If the function definition is not -recursive, the result type may be omitted, in which case it is -determined from the type of the function body. - -\section{Overloaded Definitions} -\label{sec:overloaded-defs} -\todo{change} - -An overloaded definition is a set of $n > 1$ value or function -definitions in the same statement sequence that define the same name, -binding it to types ~\lstinline@$T_1 \commadots T_n$@, respectively. -The individual definitions are called {\em alternatives}. Overloaded -definitions may only appear in the statement sequence of a template. -Alternatives always need to specify the type of the defined entity -completely. It is an error if the types of two alternatives $T_i$ and -$T_j$ have the same erasure (\sref{sec:erasure}).
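\example The erasure rule for overloaded definitions can be sketched as follows. The sketch uses the syntax of later Scala releases, which is an assumption; the names \code{OverloadSketch}, \code{show} and \code{size} are hypothetical. The two alternatives of \code{show} are accepted because their parameter types \code{Int} and \code{String} have different erasures. The commented-out pair shows the error case: both parameter types are parameterized types built from \code{List}, so they have the same erasure.
\begin{lstlisting}
object OverloadSketch {
  // Two alternatives of the overloaded name show. Each one states
  // its parameter and result types completely.
  def show(x: Int): String = "int " + x
  def show(x: String): String = "string " + x

  // Defining both of the following alternatives would be an error,
  // because List[Int] and List[String] have the same erasure:
  //   def size(xs: List[Int]): Int = xs.length
  //   def size(xs: List[String]): Int = xs.length

  def main(args: Array[String]): Unit = {
    println(show(1))
    println(show("one"))
  }
}
\end{lstlisting}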
- -\todo{Say something about bridge methods.} -%This must be a well-formed -%overloaded type - -\section{Import Clauses} -\label{sec:import} - -\syntax\begin{lstlisting} - Import ::= import ImportExpr {`,' ImportExpr} - ImportExpr ::= StableId `.' (id | `_' | ImportSelectors) - ImportSelectors ::= `{' {ImportSelector `,'} - (ImportSelector | `_') `}' - ImportSelector ::= id [`=>' id | `=>' `_'] -\end{lstlisting} - -An import clause has the form ~\lstinline@import $p$.$I$@~ where $p$ is a stable -identifier (\sref{sec:paths}) and $I$ is an import expression. -The import expression determines a set of names of members of $p$ -which are made available without qualification. The most general form -of an import expression is a list of {\em import selectors} -\begin{lstlisting} -{ $x_1$ => $y_1 \commadots x_n$ => $y_n$, _ } -\end{lstlisting} -for $n \geq 0$, where the final wildcard `\lstinline@_@' may be absent. It -makes available each member \lstinline@$p$.$x_i$@ under the unqualified name -$y_i$. I.e.\ every import selector ~\lstinline@$x_i$ => $y_i$@~ renames -\lstinline@$p$.$x_i$@ to -$y_i$. If a final wildcard is present, all members $z$ of -$p$ other than ~\lstinline@$x_1 \commadots x_n$@~ are also made available -under their own unqualified names. - -Import selectors work in the same way for type and term members. For -instance, an import clause ~\lstinline@import $p$.{$x$ => $y\,$}@~ renames the term -name \lstinline@$p$.$x$@ to the term name $y$ and the type name \lstinline@$p$.$x$@ -to the type name $y$. At least one of these two names must -reference a member of $p$. - -If the target in an import selector is a wildcard, the import selector -hides access to the source member. For instance, the import selector -~\lstinline@$x$ => _@~ ``renames'' $x$ to the wildcard symbol (which is -unaccessible as a name in user programs), and thereby effectively -prevents unqualified access to $x$. 
This is useful if there is a -final wildcard in the same import selector list, which imports all -members not mentioned in previous import selectors. - -Several shorthands exist. An import selector may be just a simple name -$x$. In this case, $x$ is imported without renaming, so the -import selector is equivalent to ~\lstinline@$x$ => $x$@. Furthermore, it is -possible to replace the whole import selector list by a single -identifier or wildcard. The import clause ~\lstinline@import $p$.$x$@~ is -equivalent to ~\lstinline@import $p$.{$x\,$}@~, i.e.\ it makes available without -qualification the member $x$ of $p$. The import clause -~\lstinline@import $p$._@~ is equivalent to -~\lstinline@import $p$.{_}@, -i.e.\ it makes available without qualification all members of $p$ -(this is analogous to ~\lstinline@import $p$.*@~ in Java). - -An import clause with multiple import expressions -~\lstinline@import $p_1$.$I_1 \commadots p_n$.$I_n$@~ is interpreted as a -sequence of import clauses -~\lstinline@import $p_1$.$I_1$; $\ldots$; import $p_n$.$I_n$@. - -\example Consider the object definition: -\begin{lstlisting} -object M { - def z = 0, one = 1; - def add(x: Int, y: Int): Int = x + y -} -\end{lstlisting} -Then the block -\begin{lstlisting} -{ import M.{one, z => zero, _}; add(zero, one) } -\end{lstlisting} -is equivalent to the block -\begin{lstlisting} -{ M.add(M.z, M.one) } . -\end{lstlisting} - -\chapter{Classes and Objects} -\label{sec:globaldefs} - -\syntax\begin{lstlisting} - ClsDef ::= ([case] class | trait) ClassDef {`,' ClassDef} - | [case] object ObjectDef {`,' ObjectDef} -\end{lstlisting} - -Classes (\sref{sec:classes}) and objects -(\sref{sec:modules}) are both defined in terms of {\em templates}. 
- -\section{Templates} -\label{sec:templates} - -\syntax\begin{lstlisting} - Template ::= Constr {`with' Constr} [TemplateBody] - TemplateBody ::= `{' [TemplateStat {`;' TemplateStat}] `}' -\end{lstlisting} - -A template defines the type signature, behavior and initial state of a -class of objects or of a single object. Templates form part of -instance creation expressions, class definitions, and object -definitions. A template -~\lstinline@$sc$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$}@~ -consists of a constructor invocation $sc$ -which defines the template's {\em superclass}, constructor invocations -~\lstinline@$mc_1 \commadots mc_n$@~ $(n \geq 0)$, which define the -template's {\em mixin classes}, and a statement sequence $\stats$ which -contains additional member definitions for the template. Superclass -and mixin classes together are called the {\em parent classes} of a -template. They must be pairwise different. The superclass of a -template must be a subtype of the superclass of each mixin class. The -{\em least proper supertype} of a template is the class type or -compound type (\sref{sec:compound-types}) consisting of its parent -classes. - -\todo{introduce ScalaObject} - -Member definitions define new members or override members in the -parent classes. If the template forms part of a class definition, -the statement part $\stats$ may also contain declarations of abstract members. -%The type of each non-private definition or declaration of a -%template must be equivalent to a type which does not refer to any -%private members of that template. - -\todo{Make all references to Java generic} - -\paragraph{Inheriting from Java Types} A template may have a Java class as -its superclass and Java interfaces as its mixin classes. On the other -hand, it is not permitted to have a Java class as a mixin class, or a -Java interface as a superclass. 
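As an illustration of a template with a Java superclass and a Java interface as mixin, consider the following sketch. It assumes present-day Scala syntax and the standard Java classes \code{java.util.TimerTask} and \code{java.lang.Comparable}; the class name \code{Tick} is hypothetical.

```scala
// Superclass: the Java class java.util.TimerTask.
// Mixin: the Java interface java.lang.Comparable.
class Tick extends java.util.TimerTask with java.lang.Comparable[Tick] {
  var count: Int = 0

  // Implements the abstract method run() of TimerTask.
  override def run(): Unit = { count += 1 }

  // Implements compareTo of Comparable by comparing the counters.
  override def compareTo(that: Tick): Int = Integer.compare(count, that.count)
}
```

Swapping the two parents, so that the Java class \code{TimerTask} appears in mixin position, is not accepted.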
- -\subsection{Constructor Invocations} -\label{sec:constr-invoke} -\syntax\begin{lstlisting} - Constr ::= StableId [TypeArgs] [`(' [Exprs] `)'] -\end{lstlisting} - -Constructor invocations define the type, members, and initial state of -objects created by an instance creation expression, or of parts of an -object's definition which are inherited by a class or object -definition. A constructor invocation is a function application -\lstinline@$x$.$c$($\args\,$)@, where $x$ is a stable identifier -(\sref{sec:stable-ids}), $c$ is a type name which either -designates a class or defines an alias type for one, and $\args$ -is an argument list, which matches one of the constructors of that -class. The prefix `\lstinline@$x$.@' can be omitted. -%The class $c$ must conform to \lstinline@scala.AnyRef@, -%i.e.\ it may not be a value type. -The argument list \lstinline@($\args\,$)@ can also be omitted, in which case an -empty argument list \lstinline@()@ is implicitly added. - -\subsection{Base Classes} -\label{sec:base-classes} - -For every template, class type and constructor invocation we define -two sets of class types: the {\em base classes} and {\em mixin base -classes}. Their definitions are as follows. - -The {\em mixin base classes} of a template -~\lstinline@$sc$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$}@~ -are -the reduced union (\sref{sec:base-classes-member-defs}) of the base classes of all -mixins $mc_i$. The mixin base classes of a class type $C$ are the -mixin base classes of the template augmented by $C$ itself. The -mixin base classes of a constructor invocation of type $T$ are the -mixin base classes of class $T$. - -The {\em base classes} of a template are the reduced union of -the base classes of its superclass and the template's mixin base -classes. The base classes of class \lstinline@scala.Any@ consist of -just the class itself. 
The base classes of some other class type $C$ -are the base classes of the template represented by $C$ augmented by -$C$ itself. The base classes of a constructor invocation of type $T$ -are the base classes of $T$. - -The notions of mixin base classes and base classes are extended from -classes to arbitrary types following the definitions of -\sref{sec:base-classes-member-defs}. - -\comment{ -If two types in the base class sequence of a template refer to the -same class definition, then that definition must define a trait -(\sref{sec:traits}), and the type that comes later in the sequence must -conform to the type that comes first. -(\sref{sec:base-classes-member-defs}). -} - -\example -Consider the following class definitions: -\begin{lstlisting} -class A; -class B extends A; -trait C extends A; -trait D extends A; -class E extends B with C with D; -class F extends B with D with E; -\end{lstlisting} -The mixin base classes and base classes of classes \code{A-F} are given in -the following table: -\begin{quote}\begin{tabular}{|l|l|l|} \hline - \ & Mixin base classes & Base classes \\ \hline -A & A & A, ScalaObject, AnyRef, Any \\ -B & B & B, A, ScalaObject, AnyRef, Any \\ -C & C & C, A, ScalaObject, AnyRef, Any \\ -D & D & D, A, ScalaObject, AnyRef, Any \\ -E & C, D, E & E, B, C, D, A, ScalaObject, AnyRef, Any \\ -F & C, D, E, F & F, B, D, E, C, A, ScalaObject, AnyRef, Any \\ \hline -\end{tabular}\end{quote} -Note that \code{D} is inherited twice by \code{F}, once directly, the -other time indirectly through \code{E}. This is permitted, since -\code{D} is a trait. - - -\subsection{Evaluation} - -The evaluation of a template or constructor invocation depends on -whether the template defines an object or is a superclass of a -constructed object, or whether it is used as a mixin for a defined -object. 
In the second case, the evaluation of a template used as a -mixin depends on an {\em actual superclass}, which is known at the -point where the template is used in a definition of an object, but not -at the point where it is defined. The actual superclass is used in the -determination of the meaning of \code{super} (\sref{sec:this-super}). - -We therefore define two notions of template evaluation: (Plain) -evaluation (as a defining template or superclass) and mixin evaluation -with a given superclass $sc$. These notions are defined for templates -and constructor invocations as follows. - -A {\em mixin evaluation with superclass $sc$} of a template -~\lstinline@$sc'$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$}@~ consists of mixin -evaluations with superclass $sc$ of the mixin constructor invocations -~\lstinline@$mc_1 \commadots mc_n$@~ in the order they are given, followed by an -evaluation of the statement sequence $\stats$. Within $\stats$ the -actual superclass refers to $sc$. A mixin evaluation with superclass -$sc$ of a class constructor invocation \code{ci} consists of an evaluation -of the constructor function and its arguments in the order they are -given, followed by a mixin evaluation with superclass $sc$ of the -template represented by the constructor invocation. - -An {\em evaluation} of a template -~\lstinline@$sc$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$}@~ consists of an evaluation of -the superclass constructor invocation $sc$, -followed by a mixin evaluation with superclass $sc$ of the template. An -evaluation of a class constructor invocation \code{ci} consists of an -evaluation of the constructor function and its arguments in -the order they are given, followed by an evaluation of the template -represented by the constructor invocation. - -\subsection{Template Members} - -\label{sec:members} - -The object resulting from evaluation of a template has directly bound -members and inherited members. Members can be abstract or concrete. 
-For a template $T$ these categories are defined as follows. -\begin{enumerate} -\item -A {\em directly bound} member of $T$ is an entity introduced by a member -definition or declaration in $T$'s statement sequence. The -member is called {\em abstract} if it is introduced by a declaration, -{\em concrete} otherwise. -\item -A {\em concrete inherited} member of $T$ is a non-private, concrete member of -one of $T$'s parent classes, except if a member with the same name is -already directly bound in $T$ or the member is mixin-overridden in -$T$. A member $m$ of $T$'s superclass is {\em mixin-overridden} in $T$ -if there is a concrete member of a mixin base class of $T$ which -either overrides $m$ itself or overrides a member named $m$ of a base -class of $T$'s superclass. -\item -An {\em abstract inherited} member of $T$ is a non-private, abstract member -of one of $T$'s parent classes $P_i$, except if the template has a -directly bound or concrete inherited member with the same name, or the -template has an abstract member inherited from a parent class $P_j$ where -$j > i$\todo{OK to leave out?: , and which has the same modifiers and type as the member -inherited from $P_j$ would have in $T$}. -\end{enumerate} -It is an error if a template has more than one member with -the same name. - - - -\comment{ -The type of a member $m$ is determined as follows: If $m$ is defined -in $\stats$, then its type is the type as given in the member's -declaration or definition. Otherwise, if $m$ is inherited from the -base class ~\lstinline@$B$[$T_1$, $\ldots$. $T_n$]@, $B$'s class declaration has formal -parameters ~\lstinline@[$a_1 \commadots a_n$]@, and $M$'s type in $B$ is $U$, then -$M$'s type in $C$ is ~\lstinline@$U$[$a_1$ := $T_1 \commadots a_n$ := $T_n$]@. 
- -\ifqualified{ -Members of templates have internally qualified names $Q\qex x$ where -$x$ is a simple name and $Q$ is either the empty name $\epsilon$, or -is a qualified name referencing the module or class that first -introduces the member. A basic declaration or definition of $x$ in a -module or class $M$ introduces a member with the following qualified -name: -\begin{enumerate} -\item -If the binding is labeled with an ~\lstinline@override $Q$@\notyet{Override - with qualifier} modifier, -where $Q$ is a fully qualified name of a base class of $M$, then the -qualified name is the qualified expansion (\sref{sec:names}) of $x$ in -$Q$. -\item -If the binding is labeled with an \code{override} modifier without a -base class name, then the qualified name is the qualified expansion -of $x$ in $M$'s least proper supertype (\sref{sec:templates}). -\item -An implicit \code{override} modifier is added and case (2) also -applies if $M$'s least proper supertype contains an abstract member -with simple name $x$. -\item -If no \code{override} modifier is given or implied, then if $M$ is -labeled \code{qualified}, the qualified name is $M\qex x$. If $M$ is -not labeled \code{qualified}, the qualified name is $\epsilon\qex x$. -\end{enumerate} -} -} - -\example Consider the class definitions - -\begin{lstlisting} -class A { def f: Int = 1 ; def g: Int = 2 ; def h: Int = 3 } -abstract class B { def f: Int = 4 ; def g: Int } -abstract class C extends A with B { def h: Int } -\end{lstlisting} - -Then class \code{C} has a directly bound abstract member \code{h}. It -inherits member \code{f} from class \code{B} and member \code{g} from -class \code{A}. 
- -\ifqualified{ -\example\label{ex:compound-b} -Consider the definitions: -\begin{lstlisting} -qualified class Root extends Any { def r1: Root, r2: Int } -qualified class A extends Root { def r1: A, a: String } -qualified class B extends A { def r1: B, b: Double } -\end{lstlisting} -Then ~\lstinline@A with B@~ has members -\lstinline@Root::r1@ of type \code{B}, \lstinline@Root::r2@ of type \code{Int}, -\lstinline@A::a:@ of type \code{String}, and \lstinline@B::b@ of type \code{Double}, -in addition to the members inherited from class \code{Any}. -} - -\subsection{Overriding} -\label{sec:overriding} - -A template member $M$ that has the same \ifqualified{qualified} name -as a non-private member $M'$ of a base class (and that belongs to the -same namespace) is said to {\em override} that member. In this case -the binding of the overriding member $M$ must subsume -(\sref{sec:subtyping}) the binding of the overridden member $M'$. -Furthermore, the overridden definition may not be a class definition. -Method definitions may only override other method definitions (or the -methods implicitly defined by a variable definition). They may not -override value definitions. Finally, the following restrictions -on modifiers apply to $M$ and $M'$: -\begin{itemize} -\item -$M'$ must not be labeled \code{final}. -\item -$M$ must not be labeled \code{private}. -\item -If $M$ is labeled \code{protected}, then $M'$ must also be -labeled \code{protected}. -\item -If $M'$ is not an abstract member, then -$M$ must be labeled \code{override}. -\item -If $M'$ is labelled \code{abstract} and \code{override}, and $M'$ is a -member of the static superclass of the class containing the definition -of $M$, then $M$ must also be labelled \code{abstract} and -\code{override}. 
-\end{itemize} - -\example\label{ex:compound-a} -Consider the definitions: -\begin{lstlisting} -trait Root { type T <: Root } -trait A extends Root { type T <: A } -trait B extends Root { type T <: B } -trait C extends A with B; -\end{lstlisting} -Then the trait definition \code{C} is not well-formed because the -binding of \code{T} in \code{C} is -~\lstinline@type T <: B@, -which fails to subsume the binding ~\lstinline@type T <: A@~ of \code{T} -in type \code{A}. The problem can be solved by adding an overriding -definition of type \code{T} in class \code{C}: -\begin{lstlisting} -class C extends A with B { type T <: C } -\end{lstlisting} - -\subsection{Modifiers} -\label{sec:modifiers} - -\syntax\begin{lstlisting} - Modifier ::= LocalModifier - | private - | protected - | override - LocalModifier ::= abstract - | final - | sealed -\end{lstlisting} - -Member definitions may be preceded by modifiers which affect the -\ifqualified{qualified names, }accessibility and usage of the -identifiers bound by them. If several modifiers are given, their -order does not matter, but the same modifier may not occur repeatedly. -Modifiers preceding a repeated definition apply to all constituent -definitions. The rules governing the validity and meaning of a -modifier are as follows. -\begin{itemize} -\item -The \code{private} modifier can be used with any definition in a -template. Private members can be accessed only from within the template -that defines them. -%Furthermore, accesses are not permitted in -%packagings (\sref{sec:topdefs}) other than the one containing the -%definition. -Private members are not inherited by subclasses and they -may not override definitions in parent classes. -\code{private} may not be applied to abstract members, and it -may not be combined in one modifier list with -\code{protected}, \code{final} or \code{override}. -\item -The \code{protected} modifier applies to class member definitions. 
-Protected members can be accessed from within the template of the defining -class as well as in all templates that have the defining class as a base class. -%Furthermore, accesses from the template of the defining class are not -%permitted in packagings other than the one -%containing the definition. -A protected identifier $x$ may be used as -a member name in a selection \lstinline@$r$.$x$@ only if $r$ is one of the reserved -words \code{this} and -\code{super}, or if $r$'s type conforms to a type-instance of the class -which contains the access. -\item -The \code{override} modifier applies to class member definitions. It -is mandatory for member definitions that override some other concrete -member definition in a super- or mixin-class. If an \code{override} -modifier is given, there must be at least one overridden member -definition. - -The \code{override} modifier has an additional significance when -combined with the \code{abstract} modifier. That modifier combination -is only allowed for members of abstract classes. A member -labelled \code{abstract} and \code{override} must override some -member of the superclass of the class containing the definition. - -We call a member of a template {\em incomplete} if it is either -abstract (i.e.\ defined by a declaration), or it is labelled -\code{abstract} and \code{override} and it overrides an incomplete -member of the template's superclass. - -Note that the \code{abstract override} modifier combination does not -affect whether a member is concrete or -abstract. A member for which only a declaration is given is abstract, -whereas a member for which a full definition is given is concrete. - -\item -The \code{abstract} modifier is used in class definitions. It is -mandatory if the class has incomplete members. 
Abstract classes -cannot be instantiated (\sref{sec:inst-creation}) with a constructor -invocation unless followed by mixin constructors or statements which -override all incomplete members of the class. - -The \code{abstract} modifier can also be used in conjunction with -\code{override} for class member definitions. In that case the meaning -of the previous discussion applies. -\item -The \code{final} modifier applies to class member definitions and to -class definitions. A \code{final} class member definition may not be -overridden in subclasses. A \code{final} class may not be inherited by -a template. \code{final} is redundant for object definitions. Members -of final classes or objects are implicitly also final, so the -\code{final} modifier is redundant for them, too. \code{final} may -not be applied to incomplete members, and it may not be combined in one -modifier list with \code{private} or \code{sealed}. -\item -The \code{sealed} modifier applies to class definitions. A -\code{sealed} class may not be inherited, except if either -\begin{itemize} -\item -the inheriting template is nested within the definition of the sealed -class itself, or -\item -the inheriting template belongs to a class or object definition which -forms part of the same statement sequence as the definition of the -sealed class. -\end{itemize} -\end{itemize} - -\example A useful idiom to prevent clients of a class from -constructing new instances of that class is to declare the class -\code{abstract} and \code{sealed}: - -\begin{lstlisting} -object m { - abstract sealed class C (x: Int) { - def nextC = new C(x + 1) {} - } - val empty = new C(0) {} -} -\end{lstlisting} -For instance, in the code above clients can create instances of class -\lstinline@m.C@ only by calling the \code{nextC} method of an existing \lstinline@m.C@ -object; it is not possible for clients to create objects of class -\lstinline@m.C@ directly. 
Indeed the following two lines are both in error: - -\begin{lstlisting} - new m.C(0) // **** error: C is abstract, so it cannot be instantiated. - new m.C(0) {} // **** error: illegal inheritance from sealed class. -\end{lstlisting} - -\section{Class Definitions} -\label{sec:classes} - -\syntax\begin{lstlisting} - ClsDef ::= class ClassDef {`,' ClassDef} - ClassDef ::= id [TypeParamClause] [ParamClause] - [`:' SimpleType] ClassTemplate - ClassTemplate ::= extends Template - | TemplateBody - | -\end{lstlisting} - -The most general form of class definition is -~\lstinline@class $c$[$\tps\,$]($ps\,$): $s$ extends $t$@. -Here, -\begin{itemize} -\item[] -$c$ is the name of the class to be defined. -\item[] $\tps$ is a non-empty list of type parameters of the class -being defined. The scope of a type parameter is the whole class -definition including the type parameter section itself. It is -illegal to define two type parameters with the same name. The type -parameter section \lstinline@[$\tps\,$]@ may be omitted. A class with a type -parameter section is called {\em polymorphic}, otherwise it is called -{\em monomorphic}. -\item[] -$ps$ is a formal value parameter clause for the {\em primary -constructor} of the class. The scope of a formal value parameter includes -the template $t$. However, a formal value parameter may not form -part of the types of any of the parent classes or members of $t$. -It is illegal to define two formal value parameters with the same name. -The formal parameter section \lstinline@($ps\,$)@ may be omitted, in which case -an empty parameter section \lstinline@()@ is assumed. -\item[] -$s$ is the {\em self type} of the class. Inside the -class, the type of \code{this} is assumed to be $s$. The self -type must conform to the self types of all classes which are inherited -by the template $t$. The self type declaration `\lstinline@:$s$@' may be -omitted, in which case the self type of the class is assumed to be -equal to \lstinline@$c$[$\tps\,$]@. 
-\item[] -$t$ is a -template (\sref{sec:templates}) of the form -\begin{lstlisting} -$sc$ with $mc_1$ with $\ldots$ with $mc_n$ { $\stats$ } $\gap(n \geq 0)$ -\end{lstlisting} -which defines the base classes, behavior and initial state of objects of -the class. The extends clause ~\lstinline@extends $sc$@~ -can be omitted, in which case -~\lstinline@extends scala.AnyRef@~ is assumed. The class body -~\lstinline@{$\stats\,$}@~ may also be omitted, in which case the empty body -\lstinline@{}@ is assumed. -\end{itemize} -This class definition defines a type \lstinline@$c$[$\tps\,$]@ and a constructor -which when applied to parameters conforming to types $ps$ -initializes instances of type \lstinline@$c$[$\tps\,$]@ by evaluating the template -$t$. - -\subsection{Constructor Definitions}\label{sec:constr-defs} - -\syntax\begin{lstlisting} - FunDef ::= this ParamClause`=' ConstrExpr - ConstrExpr ::= this ArgumentExprs - | `{' this ArgumentExprs {`;' BlockStat} `}' -\end{lstlisting} - -A class may have additional constructors besides the primary -constructor. These are defined by constructor definitions of the form -~\lstinline@def this($ps\,$) = $e$@. Such a definition introduces an -additional constructor for the enclosing class, with parameters as -given in the formal parameter list $ps$, and whose evaluation is -defined by the constructor expression $e$. The scope of each formal -parameter is the constructor expression $e$. A constructor expression -is either a self constructor invocation \lstinline@this($\args\,$)@ or -a block which begins with a self constructor invocation. Neither the -signature, nor the self constructor invocation of a constructor -definition may refer to \verb@this@, or refer to value parameters or -members of the enclosing class by simple name. - -If there are auxiliary constructors of a class $C$, they define -together with $C$'s primary constructor an overloaded constructor -value. 
The usual rules for overloading resolution -(\sref{sec:overloaded-defs}) apply for constructor invocations of $C$, -including the self constructor invocations in the constructor -expressions themselves. To prevent infinite cycles of constructor -invocations, there is the restriction that every self constructor -invocation must refer to a constructor definition which precedes it -(i.e.\ it must refer to either a preceding auxiliary constructor or the -primary constructor of the class). A constructor -expression must always construct a generic instance of the class. -I.e., if the class in question has name $C$ and type -parameters \lstinline@[$\tps\,$]@, then each constructor must construct an -instance of \lstinline@$C$[$\tps\,$]@; it is not permitted to instantiate formal -type parameters. - -\example Consider the class definition - -\begin{lstlisting} -class LinkedList[a]() { - var head = _; - var tail = null; - def isEmpty = tail != null; - def this(head: a) = { this(); this.head = head; } - def this(head: a, tail: LinkedList[a]) = { this(head); this.tail = tail } -} -\end{lstlisting} -This defines a class \code{LinkedList} with an overloaded constructor of type -\begin{lstlisting} -[a](): LinkedList[a] $\overload$ -[a](x: a): LinkedList[a] $\overload$ -[a](x: a, xs: LinkedList[a]): LinkedList[a] . -\end{lstlisting} -The second constructor alternative constructs a singleton list, while the -third one constructs a list with a given head and tail. - -\subsection{Case Classes} -\label{sec:case-classes} - -\syntax\begin{lstlisting} - ClsDef ::= case class ClassDef {`,' ClassDef} -\end{lstlisting} - -If a class definition is prefixed with \code{case}, the class is said -to be a {\em case class}. The primary constructor of a case class may -be used in a constructor pattern (\sref{sec:patterns}). -The following four restrictions ensure efficient pattern matching for -case classes. 
-\begin{enumerate} -\item None of the base classes of a case class may be a case -class. -\item No type may have two different case classes among its base types. -\item A case class may not inherit indirectly from a -\lstinline@sealed@ class. That is, if a base class $b$ of a case class $c$ -is marked \lstinline@sealed@, then $b$ must be a parent class of $c$. -\item -The primary constructor of a case class may not have any call-by-name -parameters (\sref{sec:parameters}). -\end{enumerate} - -A case class definition of ~\lstinline@$c$[$\tps\,$]($ps\,$)@~ with type -parameters $\tps$ and value parameters $ps$ implicitly -generates a function definition for a {\em case class factory} -together with the class definition itself: -\begin{lstlisting} -def c[$\tps\,$]($ps\,$): $s$ = new $c$[$\tps\,$]($ps\,$) -\end{lstlisting} -(Here, $s$ is the self type of class $c$. -If a type parameter section -is missing in the class, it is also missing in the factory -definition). - -Also implicitly defined are accessor member definitions -in the class that return its value parameters. Every binding -$x: T$ in the parameter section leads to a value definition of -$x$ that defines $x$ to be an alias of the parameter. -%Every -%parameterless function binding \lstinline@def x: T@ leads to a -%parameterless function definition of $x$ which returns the result -%of invoking the parameter function. -%The case class may not contain a -%directly bound member with the same simple name as one of its value -%parameters. - -Every case class implicitly overrides some method definitions of class -\lstinline@scala.AnyRef@ (\sref{sec:cls-object}) unless a definition of the same -method is already given in the case class itself or a concrete -definition of the same method is given in some base class of the case -class different from \code{AnyRef}. 
In particular: -\begin{itemize} -\item[] Method ~\lstinline@equals: (Any)boolean@~ is structural equality, where two -instances are equal if they belong to the same class and -have equal (with respect to \code{equals}) primary constructor arguments. -\item[] Method ~\lstinline@hashCode: ()int@~ computes a hash-code -depending on the data structure in a way which maps equal (with respect to -\code{equals}) values to equal hash-codes. -\item[] Method ~\lstinline@toString: ()String@~ returns a string representation which -contains the name of the class and its primary constructor arguments. -\end{itemize} - -\example Here is the definition of abstract syntax for lambda -calculus: - -\begin{lstlisting} -class Expr; -case class - Var (x: String) extends Expr, - Apply (f: Expr, e: Expr) extends Expr, - Lambda (x: String, e: Expr) extends Expr; -\end{lstlisting} -This defines a class \code{Expr} with case classes -\code{Var}, \code{Apply} and \code{Lambda}. A call-by-value evaluator for lambda -expressions could then be written as follows. - -\begin{lstlisting} -type Env = String => Value; -case class Value(e: Expr, env: Env); - -def eval(e: Expr, env: Env): Value = e match { - case Var (x) => - env(x) - case Apply(f, g) => - val Value(Lambda (x, e1), env1) = eval(f, env); - val v = eval(g, env); - eval (e1, (y => if (y == x) v else env1(y))) - case Lambda(_, _) => - Value(e, env) -} -\end{lstlisting} - -It is possible to define further case classes that extend type -\code{Expr} in other parts of the program, for instance -\begin{lstlisting} -case class Number(x: Int) extends Expr; -\end{lstlisting} - -This form of extensibility can be excluded by declaring the base class -\code{Expr} \code{sealed}; in this case, the only classes permitted to -extend \code{Expr} are those which are nested inside \code{Expr}, or -which appear in the same statement sequence as the definition of -\code{Expr}. 
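The implicitly generated members of case classes can be observed in a small sketch. It uses present-day Scala syntax (one \code{case class} definition per class, no trailing semicolons) rather than the draft syntax of this text; the object name \code{CaseClassDemo} and the helper \code{countVars} are hypothetical.

```scala
// A sealed variant of the abstract syntax above: no further case classes
// can extend Expr outside this statement sequence.
sealed abstract class Expr
case class Var(x: String) extends Expr
case class Apply(f: Expr, e: Expr) extends Expr
case class Lambda(x: String, e: Expr) extends Expr

object CaseClassDemo {
  // Counts variable occurrences by matching on constructor patterns.
  def countVars(e: Expr): Int = e match {
    case Var(_)          => 1
    case Apply(f, a)     => countVars(f) + countVars(a)
    case Lambda(_, body) => countVars(body)
  }

  // Both values are built with the case class factories, without `new`.
  val id1: Expr = Lambda("x", Var("x"))
  val id2: Expr = Lambda("x", Var("x"))
}
```

Here \code{id1 == id2} holds although the two values are distinct objects, their hash codes agree, and \code{id1.toString} names the class together with its primary constructor arguments.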
- -\section{Traits} - -\label{sec:traits} - -\syntax\begin{lstlisting} - ClsDef ::= trait ClassDef {`,' ClassDef} -\end{lstlisting} - -A class definition which starts with the reserved word \code{trait} -instead of \code{class} defines a trait. A trait is a specific -instance of an abstract class, so the \code{abstract} modifier is -redundant for it. The template of a trait must satisfy the following -three restrictions. -\begin{enumerate} -\item All base classes of the trait are traits. -\item All parent class constructors of a template - must be primary constructors with empty value - parameter lists. -\item All non-empty statements in the template are either imports or pure definitions. -\end{enumerate} -A {\em pure} definition can be evaluated without any side effect. -Function, type, class, or object definitions are always pure. A value -definition is pure if its right-hand side expression is pure. Pure -expressions are paths, literals, and typed expressions -$e: T$ where $e$ is pure. - -These restrictions ensure that the evaluation of the mixin constructor -of a trait has no effect. Therefore, traits may appear several times -in the base classes of a template, whereas other classes cannot. -%\item Packagings may add interface classes as new base classes to an -%existing class or module. - -\example\label{ex:comparable} -The following trait class defines the property of being -ordered, i.e. comparable to objects of some type. It contains an abstract method -\lstinline@<@ and default implementations of the other comparison operators -\lstinline@<=@, \lstinline@>@, and \lstinline@>=@. 
- -\begin{lstlisting} -trait Ord[t <: Ord[t]]: t { - def < (that: t): Boolean; - def <=(that: t): Boolean = this < that || this == that; - def > (that: t): Boolean = that < this; - def >=(that: t): Boolean = that <= this; -} -\end{lstlisting} - -\section{Object Definitions} -\label{sec:modules} -\label{sec:object-defs} - -\syntax\begin{lstlisting} - ObjectDef ::= id [`:' SimpleType] ClassTemplate -\end{lstlisting} - -An object definition defines a single object of a new class. Its -most general form is -~\lstinline@object $m$: $s$ extends $t$@. Here, -\begin{itemize} -\item[] -$m$ is the name of the object to be defined. -\item[] $s$ is the {\em self type} of the object. References to $m$ -are assumed to have type $s$. Furthermore, inside the template $t$, -the type of \code{this} is also assumed to be $s$. The type of the -anonymous class defined by $t$ must conform to $s$ and $s$ must -conform to the self types of all classes which are inherited by -$t$. The self type declaration `$:s$' may be omitted, in which case -the self type is assumed to be equal to the anonymous class defined by -$t$. -\item[] -$t$ is a -template (\sref{sec:templates}) of the form -\begin{lstlisting} -$sc$ with $mc_1$ with $\ldots$ with $mc_n$ { $\stats$ } -\end{lstlisting} -which defines the base classes, behavior and initial state of $m$. -The extends clause ~\lstinline@extends $sc$@~ -can be omitted, in which case -~\lstinline@extends scala.AnyRef@~ is assumed. The class body -~\lstinline@{$\stats\,$}@~ may also be omitted, in which case the empty body -\lstinline@{}@ is assumed. -\end{itemize} -The object definition defines a single object (or: {\em module}) -conforming to the template $t$. 
It is roughly equivalent to a class -definition and a value definition that creates an object of the class: -\begin{lstlisting} -final class $m\Dollar$cls: $s$ extends $t$; -final val $m$: $s$ = new $m\Dollar$cls; -\end{lstlisting} -(The \code{final} modifiers are omitted if the definition occurs as -part of a block. The class name \lstinline@$m\Dollar$cls@ is not -accessible to user programs.) - -There are however two differences between an object definition and a -pair of class and value definitions such as the one given above. First, -object definitions may appear as top-level definitions in a -compilation unit, whereas value definitions may not. Second, the -module defined by an object definition is instantiated lazily. The -~\lstinline@new $m\Dollar$cls@~ constructor is evaluated not at the point -of the object definition, but is instead evaluated the first time $m$ -is dereferenced during execution of the program (which might be never -at all). An attempt to dereference $m$ again in the course of -evaluation of the constructor leads to an infinite loop or run-time -error. Other threads trying to dereference $m$ while the constructor -is being evaluated block until evaluation is complete. - -\example -Classes in Scala do not have static members; however, an equivalent -effect can be achieved by an accompanying object definition, -e.g.: -\begin{lstlisting} -abstract class Point { - val x: Double; - val y: Double; - def isOrigin = (x == 0.0 && y == 0.0); -} -object Point { - val origin = new Point() { val x = 0.0, y = 0.0 } -} -\end{lstlisting} -This defines a class \code{Point} and an object \code{Point} which -contains \code{origin} as a member. Note that the double use of the -name \code{Point} is legal, since the class definition defines the name -\code{Point} in the type namespace, whereas the object definition -defines a name in the term namespace. - -This technique is applied by the Scala compiler when interpreting a -Java class with static members. 
Such a class $C$ is conceptually seen -as a pair of a Scala class that contains all instance members of $C$ -and a Scala object that contains all static members of $C$. - -\comment{ -\example Here's an outline of a module definition for a file system. - -\begin{lstlisting} -module FileSystem { - private type FileDirectory; - private val dir: FileDirectory - - interface File { - def read(xs: Array[Byte]) - def close: Unit - } - - private class FileHandle extends File { $\ldots$ } - - def open(name: String): File = $\ldots$ -} -\end{lstlisting} -} - -\chapter{Expressions} -\label{sec:exprs} - -\syntax\begin{lstlisting} - Expr ::= [Bindings `=>'] Expr - | Expr1 - Expr1 ::= if `(' Expr `)' Expr [[`;'] else Expr] - | try `{' block `}' [catch Expr] [finally Expr] - | while '(' Expr ')' Expr - | do Expr [`;'] while `(' Expr ')' - | for `(' Enumerators `)' (do | yield) Expr - | return [Expr] - | throw Expr - | [SimpleExpr `.'] id `=' Expr - | SimpleExpr ArgumentExprs `=' Expr - | PostfixExpr [`:' Type1] - PostfixExpr ::= InfixExpr [id] - InfixExpr ::= PrefixExpr - | InfixExpr id PrefixExpr - PrefixExpr ::= [`-' | `+' | `~' | `!'] SimpleExpr - SimpleExpr ::= Literal - | Path - | `(' [Expr] `)' - | BlockExpr - | new Template - | SimpleExpr `.' id - | SimpleExpr TypeArgs - | SimpleExpr ArgumentExprs - ArgumentExprs ::= `(' [Exprs] ')' - | BlockExpr - BlockExpr ::= `{' CaseClause {CaseClause} `}' - | `{' Block `}' - Block ::= {BlockStat `;'} [ResultExpr] - ResultExpr ::= Expr1 - | Bindings `=>' Block - Exprs ::= Expr {`,' Expr} -\end{lstlisting} - -Expressions are composed of operators and operands. Expression forms are -discussed subsequently in decreasing order of precedence. - -The typing of expressions is often relative to some {\em expected -type}. When we write ``expression $e$ is expected to conform to -type $T$'', we mean: (1) the expected type of $e$ is -$T$, and (2) the type of expression $e$ must conform to -$T$. 
- -\section{Literals} - -\syntax\begin{lstlisting} - SimpleExpr ::= Literal - Literal ::= intLit - | floatLit - | charLit - | stringLit - | symbolLit - | true - | false - | null -\end{lstlisting} - -Typing and evaluation of numeric, character, and string literals are -generally as in Java. An integer literal denotes an integer -number. Its type is normally \code{int}. However, if the expected type -$\proto$ of the expression is either \code{byte}, \code{short}, or -\code{char} and the integer number fits in the numeric range defined -by the type, then the number is converted to type $\proto$ and the -expression's type is $\proto$. A floating point literal denotes a -single-precision or double-precision IEEE floating point number. A -character literal denotes a Unicode character. A string literal -denotes a member of \lstinline@String@. - -A symbol literal ~\lstinline@'$x$@~ is a shorthand for the expression -~\lstinline@scala.Symbol("$x$")@. If the symbol literal is followed by -actual parameters, as in ~\lstinline@'$x$($\args\,$)@, then the whole -expression is taken to be a shorthand for -~\lstinline@scala.Symbol("$x$", $\args\,$)@. - -The boolean truth values are denoted by the reserved words \code{true} -and \code{false}. The type of these expressions is \code{boolean}, and -their evaluation is immediate. - -The \code{null} literal is of type \lstinline@scala.AllRef@. It -denotes a reference value which refers to a special ``null'' object, -which implements methods in class \lstinline@scala.AnyRef@ as follows: -\begin{itemize} -\item -\lstinline@eq($x\,$)@, \lstinline@==($x\,$)@, \lstinline@equals($x\,$)@ return \code{true} iff their -argument $x$ is also the ``null'' object. -\item -\lstinline@isInstanceOf[$T\,$]@ always returns \code{false}. -\item -\lstinline@asInstanceOf[$T\,$]@ returns the ``null'' object itself if -$T$ conforms to \lstinline@scala.AnyRef@, and throws a -\lstinline@NullPointerException@ otherwise. 
-\item -\code{toString()} returns the string ``null''. -\end{itemize} -A reference to any other member of the ``null'' object causes a -\code{NullPointerException} to be thrown. - -\section{Designators} -\label{sec:designators} - -\syntax\begin{lstlisting} - Designator ::= Path - | SimpleExpr `.' id -\end{lstlisting} - -A designator refers to a named term. It can be a {\em simple name} or -a {\em selection}. If $r$ is a stable identifier of type $T$, the -selection $r.x$ refers to the term member of $r$ that is identified in -$T$ by the name $x$. For other expressions $e$, $e.x$ is typed as if -it were $(\VAL;y=e\semi y.x)$ for some fresh name $y$. The typing rules -for blocks imply that in that case $x$'s type may not refer to any -abstract type member of $e$. - -The expected type of a designator's prefix is always missing. -The -type of a designator is normally the type of the entity it refers -to. However, if the designator is a path (\sref{sec:paths}) $p$, -its type is \lstinline@$p$.type@, provided the expression's expected type is -a singleton type, or $p$ occurs as the prefix of a selection -or type selection. - -The selection $e.x$ is evaluated by first evaluating the qualifier -expression $e$. The selection's result is then the value to which the -selector identifier is bound in the object resulting from evaluation of $e$. - -\section{This and Super} -\label{sec:this-super} - -\syntax\begin{lstlisting} - SimpleExpr ::= [id `.'] this - | [id `.'] super [`[' id `]'] `.' id -\end{lstlisting} - -The expression \code{this} can appear in the statement part of a -template or compound type. It stands for the object being defined by -the innermost template or compound type enclosing the reference. If -this is a compound type, the type of \code{this} is that compound type. -If it is a template of an instance creation expression, the type of -\code{this} is the type of that template. 
If it is a template of a -class or object definition with simple name $C$, the type of \code{this} -is the same as the type of \lstinline@$C$.this@. - -The expression \lstinline@$C$.this@ is legal in the statement part of an -enclosing class or object definition with simple name $C$. It -stands for the object being defined by the innermost such definition. -If the expression's expected type is a singleton type, or -\lstinline@$C$.this@ occurs as the prefix of a selection, its type is -\lstinline@$C$.this.type@, otherwise it is the self type of class $C$. - -A reference \lstinline@super.$m$@ in a template refers to the -definition of $m$ in the actual superclass (\sref{sec:base-classes}) -of the template. A reference \lstinline@$C$.super.$m$@ refers to the -definition of $m$ in the actual superclass of the innermost enclosing -class or object definition named $C$ which encloses the reference. The -definition $m$ referred to via \code{super} or \lstinline@$C$.super@ -must be concrete, or the template containing the reference must have an -incomplete (\sref{sec:modifiers}) member $m'$ which overrides $m$. - -The \code{super} prefix may be followed by a mixin qualifier -\lstinline@[$M\,$]@, as in \lstinline@$C$.super[$M\,$].$x$@. This is called a {\em mixin -super reference}. In this case, the reference is to the member -$x$ of the (first) mixin class of $C$ whose simple name -is $M$. That member may not be abstract. 
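The difference between a plain \code{super} reference and a mixin super reference can be sketched in present-day Scala as follows. This is only an illustration: the classes \code{Shape}, \code{Bordered}, \code{Colored} and \code{Fancy} and the method \code{describe} are hypothetical, and the resolution of plain \code{super} shown here follows the linearization rules of current Scala compilers, which may differ from the rules given in this document.

```scala
// Hypothetical classes, written in present-day Scala syntax.
class Shape { def describe: String = "shape" }
trait Bordered extends Shape {
  override def describe: String = "bordered " + super.describe
}
trait Colored extends Shape {
  override def describe: String = "colored " + super.describe
}

class Fancy extends Shape with Bordered with Colored {
  // Plain super: resolved through the linearization of Fancy.
  def viaSuper: String = super.describe
  // Mixin super references: the qualifier names the parent to start from.
  def viaBordered: String = super[Bordered].describe
  def viaColored: String = super[Colored].describe
}
```

The qualifier \lstinline@[Bordered]@ or \lstinline@[Colored]@ selects the implementation explicitly, which is useful when several parents define the same member.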
- -\example\label{ex:super} -Consider the following class definitions - -\begin{lstlisting} -class Root { val x = "Root" } -class A extends Root { override val x = "A" ; val superA = super.x } -class B extends Root { override val x = "B" ; val superB = super.x } -class C extends A with B { - override val x = "C" ; val superC = super.x -} -class D extends A { val superD = super.x } -class E extends C with D { val superE = super.x } -\end{lstlisting} -Then we have: -\begin{lstlisting} -(new A).superA == "Root", (new B).superB == "Root" -(new C).superA == "Root", (new C).superB == "A", (new C).superC == "A" -(new D).superA == "Root", (new D).superD == "A" -(new E).superA == "Root", (new E).superB == "A", (new E).superC == "A", - (new E).superD == "C", (new E).superE == "C" -\end{lstlisting} -Note that the \code{superB} function returns different results -depending on whether \code{B} is used as defining class or as a mixin class. - -\example Consider the following class definitions: -\begin{lstlisting} -class Shape { - override def equals(other: Any) = $\ldots$; - $\ldots$ -} -trait Bordered extends Shape { - val thickness: int; - override def equals(other: Any) = other match { - case that: Bordered => - super equals other && this.thickness == that.thickness - case _ => false - } - $\ldots$ -} -trait Colored extends Shape { - val color: Color; - override def equals(other: Any) = other match { - case that: Colored => - super equals other && this.color == that.color - case _ => false - } - $\ldots$ -} -\end{lstlisting} - -Both definitions of \code{equals} are combined in the class -below. 
-\begin{lstlisting} -trait BorderedColoredShape extends Shape with Bordered with Colored { - override def equals(other: Any) = - super[Bordered].equals(other) && super[Colored].equals(other) -} -\end{lstlisting} - -\section{Function Applications} -\label{sec:apply} - -\syntax\begin{lstlisting} - SimpleExpr ::= SimpleExpr ArgumentExprs -\end{lstlisting} - -An application \lstinline@$f$($e_1 \commadots e_n$)@ applies the function $f$ to the -argument expressions $e_1 \commadots e_n$. If $f$ has a method type -\lstinline@($T_1 \commadots T_n$)$U$@, the type of each argument -expression $e_i$ must conform to the corresponding parameter type -$T_i$. If $f$ has some value type, the application is taken to be -equivalent to \lstinline@$f$.apply($e_1 \commadots e_n$)@, i.e.\ the -application of an \code{apply} method defined by $f$. - -%Class constructor functions -%(\sref{sec:classes}) can only be applied in constructor invocations -%(\sref{sec:constr-invoke}), never in expressions. - -Evaluation of \lstinline@$f$($e_1 \commadots e_n$)@ usually entails evaluation of -$f$ and $e_1 \commadots e_n$ in that order. Each argument expression -is converted to the type of its corresponding formal parameter. After -that, the application is rewritten to the function's right-hand side, -with actual arguments substituted for formal parameters. The result -of evaluating the rewritten right-hand side is finally converted to -the function's declared result type, if one is given. - -The case of a formal \code{def}-parameter with a parameterless -method type \lstinline@[]$T$@ is treated specially. In this case, the -corresponding actual argument expression is not evaluated before the -application. Instead, every use of the formal parameter on the -right-hand side of the rewrite rule entails a re-evaluation of the -actual argument expression. 
In other words, the evaluation order for -\code{def}-parameters is {\em call-by-name} whereas the evaluation -order for normal parameters is {\em call-by-value}. - -\section{Type Applications} -\label{sec:type-app} -\syntax\begin{lstlisting} - SimpleExpr ::= SimpleExpr `[' Types `]' -\end{lstlisting} - -A type application \lstinline@$e$[$T_1 \commadots T_n$]@ instantiates a -polymorphic value $e$ of type -~\lstinline@[$a_1$ >: $L_1$ <: $U_1 \commadots a_n$ >: $L_n$ <: $U_n$]S@~ with -argument types \lstinline@$T_1 \commadots T_n$@. Every argument type -$T_i$ must obey corresponding bounds $L_i$ and -$U_i$. That is, for each $i = 1 \commadots n$, we must -have $L_i \sigma \conforms T_i \conforms U_i \sigma$, where $\sigma$ is the -substitution $[a_1 := T_1 \commadots a_n := T_n]$. The type -of the application is \lstinline@S$\sigma$@. - -The function part $e$ may also have some value type. In this case -the type application is taken to be equivalent to -~\lstinline@$e$.apply[$T_1 \commadots T_n$]@, i.e.\ the -application of an \code{apply} method defined by $e$. - -Type applications can be omitted if local type inference -(\sref{sec:local-type-inf}) can infer the best type parameters for a -polymorphic function from the types of the actual function arguments -and the expected result type. - -\section{References to Overloaded Bindings} -\label{sec:overloaded-refs} - -If a name $f$ referenced in an identifier or selection is -overloaded (\sref{sec:overloaded-defs}), the context of the reference -has to identify a unique alternative of the overloaded binding. The -way this is done depends on whether or not $f$ is used as a -function. Let $\AA$ be the set of all type alternatives of -$f$. - -Assume first that $f$ appears as a function in an application, as -in \lstinline@$f$($\args\,$)@. If there is precisely one alternative in -$\AA$ which is a (possibly polymorphic) method type whose arity -matches the number of arguments given, that alternative is chosen. 
- -Otherwise, let $\argtypes$ be the vector of types obtained by -typing each argument with a missing expected type. One determines -first the set of applicable alternatives. A method type alternative is -{\em applicable} if each type in $\argtypes$ is compatible with -the corresponding formal parameter type in the alternative, and, if -the expected type is defined, the method's result type is compatible to -it. A polymorphic method type is applicable if local type inference -can determine type arguments so that the instantiated method type is -applicable. - -Here, a type $T$ is {\em compatible} to a type $U$ if $T$ -conforms to $U$ after applying implicit conversions -(\sref{sec:impl-conv}). - -Let $\BB$ be the set of applicable alternatives. It is an error if -$\BB$ is empty. Otherwise, one chooses the {\em most specific} -alternative among the alternatives in $\BB$, according to the -following definition of being ``more specific''. -\begin{itemize} -\item -A method type \lstinline@($\Ts\,$)$U$@ is more specific than some other -type $S$ if $S$ is applicable to arguments \lstinline@($ps\,$)@ of -types $\Ts$. -\item -A polymorphic method type -~\lstinline@[$a_1$ >: $L_1$ <: $U_1 \commadots a_n$ >: $L_n$ <: $U_n$]T@~ is -more specific than some other type $S$ if $T$ is more -specific than $S$ under the assumption that for -$i = 1 \commadots n$ each $a_i$ is an abstract type name -bounded from below by $L_i$ and from above by $U_i$. -\item -Any other type is always more specific than a parameterized method -type or a polymorphic type. -\end{itemize} -It is an error if there is no unique alternative in $\BB$ which is -more specific than all other alternatives in $\BB$. - -Assume next that $f$ appears as a function in a type -application, as in \lstinline@$f$[$\targs\,$]@. Then we choose an alternative in -$\AA$ which takes the same number of type parameters as there are -type arguments in $\targs$. 
It is an error if no such alternative -exists, or if it is not unique. - -Assume finally that $f$ does not appear as a function in either -an application or a type application. If an expected type is given, -let $\BB$ be the set of those alternatives in $\AA$ which are -compatible to it. Otherwise, let $\BB$ be the same as $\AA$. -We choose in this case the most specific alternative among all -alternatives in $\BB$. It is an error if there is no unique -alternative in $\BB$ which is more specific than all other -alternatives in $\BB$. - -\example Consider the following definitions: - -\begin{lstlisting} - class A extends B {} - def f(x: B, y: B) = $\ldots$ - def f(x: A, y: B) = $\ldots$ - val a: A, b: B -\end{lstlisting} -Then the application \lstinline@f(b, b)@ refers to the first -definition of $f$ whereas the application \lstinline@f(a, a)@ -refers to the second. Assume now we add a third overloaded definition -\begin{lstlisting} - def f(x: B, y: A) = $\ldots$ -\end{lstlisting} -Then the application \lstinline@f(a, a)@ is rejected for being ambiguous, since -no most specific applicable signature exists. - -\section{Instance Creation Expressions} -\label{sec:inst-creation} - -\syntax\begin{lstlisting} - SimpleExpr ::= new Template -\end{lstlisting} - -A simple instance creation expression is of the form ~\lstinline@new $c$@~ -where $c$ is a constructor invocation -(\sref{sec:constr-invoke}). Let $T$ be the type of $c$. Then $T$ must -denote (a type instance of) a non-abstract subclass of -\lstinline@scala.AnyRef@ which conforms to its self type -(\sref{sec:classes}). The expression is evaluated by creating a fresh -object of type $T$ which is initialized by evaluating $c$. The -type of the expression is $T$'s self type (which might be less -specific than $T\,$). 
- -A general instance creation expression is of the form -\begin{lstlisting} -new $sc$ with $mc_1$ with $\ldots$ with $mc_n$ {$\stats\,$} -\end{lstlisting} -where $n \geq 0$, $sc$ as well as $mc_1 \commadots mc_n$ are -constructor invocations (of types $S, T_1 \commadots T_n$, say) and -$\stats$ is a statement sequence containing initializer statements and -member definitions (\sref{sec:members}). The type of such an instance -creation expression is then the compound type -\lstinline@$S$ with $T_1$ with $\ldots$ with $T_n$ {$R\,$}@, -where \lstinline@{$R\,$}@ is -a refinement (\sref{sec:compound-types}) which declares exactly those -members of $\stats$ that override a member of $S$ or $T_1 \commadots -T_n$. \todo{what about methods and overloaded defs?} For this type to -be well-formed, $R$ may not reference types defined in $\stats$ which -do not themselves form part of $R$. - -The instance creation expression is evaluated by creating a fresh -object, which is initialized by evaluating the expression template. - -\example Consider the class -\begin{lstlisting} -abstract class C { - type T; val x: T; def f(x: T): AnyRef -} -\end{lstlisting} -and the instance creation expression -\begin{lstlisting} -new C { type T = Int; val x: T = 1; def f(x: T): T = y; val y: T = 2 } -\end{lstlisting} -Then the created object's type is: -\begin{lstlisting} -C { type T = Int; val x: T; def f(x: T): T } -\end{lstlisting} -The value $y$ is missing from the type, since $y$ does not -override a member of $C$. - -\section{Blocks} -\label{sec:blocks} - -\syntax\begin{lstlisting} - BlockExpr ::= `{' Block `}' - Block ::= [{BlockStat `;'} ResultExpr] -\end{lstlisting} - -A block expression ~\lstinline@{$s_1$; $\ldots$; $s_n$; $e\,$}@~ is constructed from a -sequence of block statements $s_1 \commadots s_n$ and a final -expression $e$. The final expression can be omitted, in which -case the unit value \lstinline@()@ is assumed. 
- -%Whether or not the scope includes the statement itself -%depends on the kind of definition. - -The expected type of the final expression $e$ is the expected -type of the block. The expected type of all preceding statements is -missing. - -The type of a block ~\lstinline@$s_1$; $\ldots$; $s_n$; $e$@~ is usually the type of -$e$. That type must be equivalent to a type which does not refer -to an entity defined locally in the block. If this condition is -violated, but a fully defined expected type is given, the type of the -block is instead assumed to be the expected type. - -Evaluation of the block entails evaluation of its statement sequence, -followed by an evaluation of the final expression $e$, which -defines the result of the block. - -\example -Written in isolation, -the block -\begin{lstlisting} -{ class C extends B {$\ldots$} ; new C } -\end{lstlisting} -is illegal, since its type -refers to class $C$, which is defined locally in the block. - -However, when used in a definition such as -\begin{lstlisting} -val x: B = { class C extends B {$\ldots$} ; new C } -\end{lstlisting} -the block is well-formed, since the problematic type $C$ can be -replaced by the expected type $B$. - -\section{Prefix, Infix, and Postfix Operations} -\label{sec:infix-operations} - -\syntax\begin{lstlisting} - PostfixExpr ::= InfixExpr [id] - InfixExpr ::= PrefixExpr - | InfixExpr id PrefixExpr - PrefixExpr ::= [`-' | `+' | `!' | `~'] SimpleExpr -\end{lstlisting} - -Expressions can be constructed from operands and operators. A prefix -operation $op;e$ consists of a prefix operator $op$, which -must be one of the identifiers `\lstinline@+@', `\lstinline@-@', `\lstinline@!@', or -`\lstinline@~@', and a simple expression $e$. The expression is -equivalent to the postfix method application $e.op$. - -Prefix operators are different from normal function applications in -that their operand expression need not be atomic. 
For instance, the -input sequence \lstinline@-sin(x)@ is read as \lstinline@-(sin(x))@, whereas the -function application \lstinline@negate sin(x)@ would be parsed as the -application of the infix operator \code{sin} to the operands -\code{negate} and \lstinline@(x)@. - -An infix or postfix operator can be an arbitrary identifier. Infix -operators have precedence and associativity defined as follows: - -The {\em precedence} of an infix operator is determined by the operator's first -character. Characters are listed below in increasing order of -precedence, with characters on the same line having the same precedence. -\begin{lstlisting} - $\mbox{\rm\sl(all letters)}$ - | - ^ - & - < > - = ! - : - + - - * / % - $\mbox{\rm\sl(all other special characters)}$ -\end{lstlisting} -That is, operators starting with a letter have lowest precedence, -followed by operators starting with `\lstinline@|@', etc. - -The {\em associativity} of an operator is determined by the operator's -last character. Operators ending with a colon `\lstinline@:@' are -right-associative. All other operators are left-associative. - -Precedence and associativity of operators determine the grouping of -parts of an expression as follows. -\begin{itemize} -\item If there are several infix operations in an -expression, then operators with higher precedence bind more closely -than operators with lower precedence. -\item If there are consecutive infix -operations $e_0; \op_1; e_1; \op_2 \ldots \op_n; e_n$ -with operators $\op_1 \commadots \op_n$ of the same precedence, -then all these operators must -have the same associativity. If all operators are left-associative, -the sequence is interpreted as -$(\ldots(e_0;\op_1;e_1);\op_2\ldots);\op_n;e_n$. -Otherwise, if all operators are right-associative, the -sequence is interpreted as -$e_0;\op_1;(e_1;\op_2;(\ldots \op_n;e_n)\ldots)$. -\item -Postfix operators always have lower precedence than infix -operators. 
E.g.\ $e_1;\op_1;e_2;\op_2$ is always equivalent to -$(e_1;\op_1;e_2);\op_2$. -\end{itemize} -A postfix operation $e;\op$ is interpreted as $e.\op$. A -left-associative binary operation $e_1;\op;e_2$ is interpreted as -$e_1.\op(e_2)$. If $\op$ is right-associative, the same operation is -interpreted as ~\lstinline@(val $x$=$e_1$; $e_2$.$\op$($x\,$))@, -where $x$ is a fresh name. - -\section{Typed Expressions} - -\syntax\begin{lstlisting} - Expr1 ::= PostfixExpr [`:' Type1] -\end{lstlisting} - -The typed expression $e: T$ has type $T$. The type of -expression $e$ is expected to conform to $T$. The result of -the expression is the value of $e$ converted to type $T$. - -\example Here are examples of well-typed and illegally typed expressions. - -\begin{lstlisting} - 1: int // legal, of type int - 1: long // legal, of type long - // 1: string // illegal -\end{lstlisting} - -\section{Assignments} - -\syntax\begin{lstlisting} - Expr1 ::= Designator `=' Expr - | SimpleExpr ArgumentExprs `=' Expr -\end{lstlisting} - -The interpretation of an assignment to a simple variable ~\lstinline@$x$ = $e$@~ -depends on the definition of $x$. If $x$ denotes a mutable -variable, then the assignment changes the current value of $x$ to be -the result of evaluating the expression $e$. The type of $e$ is -expected to conform to the type of $x$. If $x$ is a parameterless -function defined in some template, and the same template contains a -setter function \lstinline@$x$_=@ as member, then the assignment -~\lstinline@$x$ = $e$@~ is interpreted as the invocation -~\lstinline@$x$_=($e\,$)@~ of that setter function. Analogously, an -assignment ~\lstinline@$f.x$ = $e$@~ to a parameterless function $x$ -is interpreted as the invocation ~\lstinline@$f.x$_=($e\,$)@. 
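The setter convention described above can be sketched in present-day Scala as follows. The class \code{Celsius}, its members, and the helper \code{demo} are hypothetical names chosen for this illustration only; the \code{writes} counter exists merely to make the setter invocation observable.

```scala
// Hypothetical class: a parameterless function `value` with a setter `value_=`.
class Celsius {
  private var degrees: Double = 0.0
  var writes: Int = 0                      // counts setter invocations
  def value: Double = degrees              // parameterless function
  def value_=(d: Double): Unit = {         // setter function
    degrees = d
    writes += 1
  }
}

def demo(): Celsius = {
  val t = new Celsius
  t.value = 21.5   // interpreted as the invocation t.value_=(21.5)
  t
}
```

Since \code{value} is not a mutable variable, the assignment in \code{demo} only type-checks because the matching setter is defined in the same template.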
- -An assignment ~\lstinline@$f$($\args\,$) = $e$@~ with a function application to the -left of the `\lstinline@=@' operator is interpreted as -~\lstinline@$f.$update($\args$, $e\,$)@, i.e.\ -the invocation of an \code{update} function defined by $f$. - -\example \label{ex:imp-mat-mul} -Here is the usual imperative code for matrix multiplication. - -\begin{lstlisting} -def matmul(xss: Array[Array[double]], yss: Array[Array[double]]) = { - val zss: Array[Array[double]] = new Array(xss.length, yss(0).length); - var i = 0; - while (i < xss.length) { - var j = 0; - while (j < yss(0).length) { - var acc = 0.0; - var k = 0; - while (k < yss.length) { - acc = acc + xss(i)(k) * yss(k)(j); - k = k + 1 - } - zss(i)(j) = acc; - j = j + 1 - } - i = i + 1 - } - zss -} -\end{lstlisting} -Desugaring the array accesses and assignments yields the following -expanded version: -\begin{lstlisting} -def matmul(xss: Array[Array[double]], yss: Array[Array[double]]) = { - val zss: Array[Array[double]] = new Array(xss.length, yss.apply(0).length); - var i = 0; - while (i < xss.length) { - var j = 0; - while (j < yss.apply(0).length) { - var acc = 0.0; - var k = 0; - while (k < yss.length) { - acc = acc + xss.apply(i).apply(k) * yss.apply(k).apply(j); - k = k + 1 - } - zss.apply(i).update(j, acc); - j = j + 1 - } - i = i + 1 - } - zss -} -\end{lstlisting} - -\section{Conditional Expressions} - -\syntax\begin{lstlisting} - Expr1 ::= if `(' Expr `)' Expr [[`;'] else Expr] -\end{lstlisting} - -The conditional expression ~\lstinline@if ($e_1$) $e_2$ else $e_3$@~ chooses -one of the values of $e_2$ and $e_3$, depending on the -value of $e_1$. The condition $e_1$ is expected to -conform to type \code{boolean}. The then-part $e_2$ and the -else-part $e_3$ are both expected to conform to the expected -type of the conditional expression. The type of the conditional -expression is the least upper bound of the types of $e_2$ and -$e_3$. 
A semicolon preceding the \code{else} symbol of a -conditional expression is ignored. - -The conditional expression is evaluated by evaluating first -$e_1$. If this evaluates to \code{true}, the result of -evaluating $e_2$ is returned, otherwise the result of -evaluating $e_3$ is returned. - -A short form of the conditional expression eliminates the -else-part. The conditional expression ~\lstinline@if ($e_1$) $e_2$@~ is -evaluated as if it was ~\lstinline@if ($e_1$) $e_2$ else ()@. The type of -this expression is \code{unit} and the then-part -$e_2$ is also expected to conform to type \code{unit}. - -\section{While Loop Expressions} - -\syntax\begin{lstlisting} - Expr1 ::= while `(' Expr `)' Expr -\end{lstlisting} - -The while loop expression ~\lstinline@while ($e_1$) $e_2$@~ is typed and -evaluated as if it was an application of ~\lstinline@whileLoop ($e_1$) ($e_2$)@~ where -the hypothetical function \code{whileLoop} is defined as follows. - -\begin{lstlisting} - def whileLoop(def c: boolean)(def s: unit): unit = - if (c) { s ; whileLoop(c)(s) } else {} -\end{lstlisting} - -\example The loop -\begin{lstlisting} - while (x != 0) { y = y + 1/x ; x = x - 1 } -\end{lstlisting} -is equivalent to the application -\begin{lstlisting} - whileLoop (x != 0) { y = y + 1/x ; x = x - 1 } -\end{lstlisting} -Note that this application will never produce a division-by-zero -error at run-time, since the -expression ~\lstinline@(y = y + 1/x)@~ will be evaluated in the body of -\code{whileLoop} only if the condition parameter is true. - -\section{Do Loop Expressions} - -\syntax\begin{lstlisting} - Expr1 ::= do Expr [`;'] while `(' Expr `)' -\end{lstlisting} - -The do loop expression ~\lstinline@do $e_1$ while ($e_2$)@~ is typed and -evaluated as if it was the expression ~\lstinline@($e_1$ ; while ($e_2$) $e_1$)@. -A semicolon preceding the \code{while} symbol of a do loop expression is ignored. 
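The hypothetical \code{whileLoop} function can be written down in present-day Scala, where call-by-name parameters are declared with \lstinline@=>@ rather than with \code{def}. The following is a sketch under that assumption; the function \code{harmonic} is a hypothetical driver added for illustration and is not part of this document.

```scala
// Both parameters are call-by-name, so they are re-evaluated on every use.
def whileLoop(cond: => Boolean)(body: => Unit): Unit =
  if (cond) { body; whileLoop(cond)(body) } else ()

// Hypothetical driver: sums 1/x for x = n, n-1, ..., 1.
def harmonic(n: Int): Double = {
  var x = n
  var y = 0.0
  // The body runs only while x != 0, so 1.0 / x never divides by zero.
  whileLoop (x != 0) { y = y + 1.0 / x; x = x - 1 }
  y
}
```

If the parameters were ordinary call-by-value parameters, the condition and the body would each be evaluated exactly once before the call, and the loop would either not run or never terminate.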
- -\section{Comprehensions} - -\syntax\begin{lstlisting} - Expr1 ::= for `(' Enumerators `)' [yield] Expr - Enumerators ::= Generator {`;' Enumerator} - Enumerator ::= Generator - | Expr - Generator ::= val Pattern1 `<-' Expr -\end{lstlisting} - -A comprehension ~\lstinline@for ($\enums\,$) yield $e$@~ evaluates -expression $e$ for each binding generated by the enumerators -$\enums$. Enumerators start with a generator, which can be followed by -further generators or filters. A {\em generator} -~\lstinline@val $p$ <- $e$@~ -produces bindings from an expression $e$ which is matched in -some way against pattern $p$. A {\em filter} is an expression which restricts -enumerated bindings. The precise meaning of generators and filters is -defined by translation to invocations of four methods: \code{map}, -\code{filter}, \code{flatMap}, and \code{foreach}. These methods can -be implemented in different ways for different carrier types. -\comment{As an -example, an implementation of these methods for lists is given in -\sref{cls-list}.} - -The translation scheme is as follows. -In a first step, every generator ~\lstinline@val $p$ <- $e$@, where $p$ is not -a pattern variable, is replaced by -\begin{lstlisting} -val $p$ <- $e$.filter { case $p$ => true; case _ => false } -\end{lstlisting} -Then, the following -rules are applied repeatedly until all comprehensions have been eliminated. -\begin{itemize} -\item -A generator ~\lstinline@val $p$ <- $e$@~ followed by a filter $f$ is translated to -a single generator ~\lstinline@val $p$ <- $e$.filter($x_1 \commadots x_n$ => $f\,$)@~ where -$x_1 \commadots x_n$ are the free variables of $p$. - -\item -A for-comprehension -~\lstinline@for (val $p$ <- $e\,$) yield $e'$@~ -is translated to -~\lstinline@$e$.map { case $p$ => $e'$ }@. - -\item -A for-comprehension -~\lstinline@for (val $p$ <- $e\,$) $e'$@~ -is translated to -~\lstinline@$e$.foreach { case $p$ => $e'$ }@. 
- -\item -A for-comprehension -\begin{lstlisting} -for (val $p$ <- $e$; val $p'$ <- $e'; \ldots$) yield $e''$ , -\end{lstlisting} -where \lstinline@$\ldots$@ is a (possibly empty) -sequence of generators or filters, -is translated to -\begin{lstlisting} -$e$.flatMap { case $p$ => for (val $p'$ <- $e'; \ldots$) yield $e''$ } . -\end{lstlisting} -\item -A for-comprehension -\begin{lstlisting} -for (val $p$ <- $e$; val $p'$ <- $e'; \ldots$) $e''$ , -\end{lstlisting} -where \lstinline@$\ldots$@ is a (possibly empty) -sequence of generators or filters, -is translated to -\begin{lstlisting} -$e$.foreach { case $p$ => for (val $p'$ <- $e'; \ldots$) $e''$ } . -\end{lstlisting} -\end{itemize} - -\example -The following code produces all pairs of numbers -between $1$ and $n-1$ whose sums are prime. -\begin{lstlisting} -for { val i <- range(1, n); - val j <- range(1, i); - isPrime(i+j) -} yield Pair (i, j) -\end{lstlisting} -The for-comprehension is translated to: -\begin{lstlisting} -range(1, n) - .flatMap { - case i => range(1, i) - .filter { j => isPrime(i+j) } - .map { case j => Pair(i, j) } } -\end{lstlisting} - -\comment{ -\example -\begin{lstlisting} -package class List[a] { - def map[b](f: (a)b): List[b] = match { - case <> => <> - case x :: xs => f(x) :: xs.map(f) - } - def filter(p: (a)Boolean) = match { - case <> => <> - case x :: xs => if p(x) then x :: xs.filter(p) else xs.filter(p) - } - def flatMap[b](f: (a)List[b]): List[b] = - if (isEmpty) Nil - else f(head) ::: tail.flatMap(f); - def foreach(f: (a)Unit): Unit = - if (isEmpty) () - else (f(head); tail.foreach(f)); -} -\end{lstlisting} - -\example -\begin{lstlisting} -abstract class Graph[Node] { - type Edge = (Node, Node) - val nodes: List[Node] - val edges: List[Edge] - def succs(n: Node) = for ((p, s) <- g.edges, p == n) s - def preds(n: Node) = for ((p, s) <- g.edges, s == n) p -} -def topsort[Node](g: Graph[Node]): List[Node] = { - val sources = for (n <- g.nodes, g.preds(n) == <>) n - if
(g.nodes.isEmpty) <> - else if (sources.isEmpty) new Error(``topsort of cyclic graph'') throw - else sources :+: topsort(new Graph[Node] { - val nodes = g.nodes diff sources - val edges = for ((p, s) <- g.edges, !(sources contains p)) (p, s) - }) -} -\end{lstlisting} -} - -\example For comprehensions can be used to express vector -and matrix algorithms concisely. -For instance, here is a function to compute the transpose of a given matrix: - -\begin{lstlisting} -def transpose[a](xss: Array[Array[a]]) = { - for (val i <- Array.range(0, xss(0).length)) yield - Array(for (val xs <- xss) yield xs(i)) -} -\end{lstlisting} - -Here is a function to compute the scalar product of two vectors: -\begin{lstlisting} -def scalprod(xs: Array[double], ys: Array[double]) = { - var acc = 0.0; - for (val Pair(x, y) <- xs zip ys) acc = acc + x * y; - acc -} -\end{lstlisting} - -Finally, here is a function to compute the product of two matrices. Compare with the imperative version of \ref{ex:imp-mat-mul}. -\begin{lstlisting} -def matmul(xss: Array[Array[double]], yss: Array[Array[double]]) = { - val ysst = transpose(yss); - for (val xs <- xss) yield - for (val yst <- ysst) yield - scalprod(xs, yst) -} -\end{lstlisting} -The code above makes use of the fact that \code{map}, \code{flatMap}, -\code{filter}, and \code{foreach} are defined for members of class -\lstinline@scala.Array@. - -\section{Return Expressions} - -\syntax\begin{lstlisting} - Expr1 ::= return [Expr] -\end{lstlisting} - -A return expression ~\lstinline@return $e$@~ must occur inside the -body of some enclosing named method or function $f$. This function -must have an explicitly declared result type, and the type of $e$ must -conform to it. The return expression evaluates the expression $e$ and -returns its value as the result of $f$. The evaluation of any statements or -expressions following the return expression is omitted. The type of -a return expression is \code{scala.All}.
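-
-\example A minimal sketch of an early return, written in the notation of
-this chapter. The function name \code{indexOf} and its parameters are
-hypothetical.
-\begin{lstlisting}
-def indexOf(xs: Array[Int], x: Int): Int = {
-  var i = 0;
-  while (i < xs.length) {
-    if (xs(i) == x) return i;    // leaves indexOf with result i
-    i = i + 1
-  };
-  -1                             // reached only if x was not found
-}
-\end{lstlisting}
-The result type \code{Int} is declared explicitly, as required for a
-function containing a return expression. The expression
-~\lstinline@return i@~ has type \code{scala.All}, which conforms to every
-other type, so it is accepted as the then-part of the short conditional
-even though that part is expected to conform to \code{unit}.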
- - - -\section{Throw Expressions} - -\syntax\begin{lstlisting} - Expr1 ::= throw Expr -\end{lstlisting} - -A throw expression ~\lstinline@throw $e$@~ evaluates the expression -$e$. The type of this expression must conform to -\code{Throwable}. If $e$ evaluates to an exception -reference, evaluation is aborted with the thrown exception. If $e$ -evaluates to \code{null}, evaluation is instead aborted with a -\code{NullPointerException}. If there is an active -\code{try} expression (\sref{sec:try}) which handles the thrown -exception, evaluation resumes with the handler; otherwise the thread -executing the \code{throw} is aborted. The type of a throw expression -is \code{scala.All}. - -\section{Try Expressions}\label{sec:try} - -\syntax\begin{lstlisting} - Expr1 ::= try `{' Block `}' [catch Expr] [finally Expr] -\end{lstlisting} - -A try expression ~\lstinline@try { $b$ } catch $e$@~ evaluates the block -$b$. If evaluation of $b$ does not cause an exception to be -thrown, the result of $b$ is returned. Otherwise the {\em -handler} $e$ is applied to the thrown exception. Let $\proto$ -be the expected type of the try expression. The block $b$ is -expected to conform to $\proto$. The handler $e$ is expected to -conform to type ~\lstinline@scala.PartialFunction[scala.Throwable, $\proto\,$]@. -The type of the try expression is the least upper bound of the type of -$b$ and the result type of $e$. - -A try expression ~\lstinline@try { $b$ } finally $e$@~ evaluates the block -$b$. If evaluation of $b$ does not cause an exception to be -thrown, the expression $e$ is evaluated. If an exception is thrown -during evaluation of $e$, the evaluation of the try expression is -aborted with the thrown exception. If no exception is thrown during -evaluation of $e$, the result of $b$ is returned as the -result of the try expression. - -If an exception is thrown during -evaluation of $b$, the finally block -$e$ is also evaluated.
If another exception is thrown -during evaluation of $e$, evaluation of the try expression is -aborted with the thrown exception. If no exception is thrown during -evaluation of $e$, the original exception thrown in $b$ is -re-thrown once evaluation of $e$ has completed. The block -$b$ is expected to conform to the expected type of the try -expression. The finally expression $e$ is expected to conform to -type \code{unit}. - -A try expression ~\lstinline@try { $b$ } catch $e_1$ finally $e_2$@~ is a shorthand -for ~\lstinline@try { try { $b$ } catch $e_1$ } finally $e_2$@. - - - - -\section{Anonymous Functions} -\label{sec:closures} - -\syntax\begin{lstlisting} - Expr1 ::= Bindings `=>' Expr - ResultExpr ::= Bindings `=>' Block - Bindings ::= `(' Binding {`,' Binding} `)' - | id [`:' Type1] - Binding ::= id [`:' Type] -\end{lstlisting} - -The anonymous function ~\lstinline@($x_1$: $T_1 \commadots x_n$: $T_n$) => $e$@~ -maps parameters $x_i$ of types $T_i$ to a result given -by expression $e$. The scope of each formal parameter -$x_i$ is $e$. Formal parameters must have pairwise distinct names. - -If the expected type of the anonymous function is of the form -~\lstinline@scala.Function$n$[$S_1 \commadots S_n$, $R\,$]@, the -expected type of $e$ is $R$ and the type $T_i$ of any of the -parameters $x_i$ can be omitted, in which -case~\lstinline@$T_i$ = $S_i$@ is assumed. -If the expected type of the anonymous function is -some other type, all formal parameter types must be explicitly given, -and the expected type of $e$ is missing. The type of the anonymous -function -is~\lstinline@scala.Function$n$[$S_1 \commadots S_n$, $T\,$]@, -where $T$ is the type of $e$. $T$ must be equivalent to a -type which does not refer to any of the formal parameters $x_i$.
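-
-\example A minimal sketch of how the expected type determines omitted
-parameter types. The names \code{inc} and \code{add} are hypothetical.
-\begin{lstlisting}
-  val inc: Function1[Int, Int] = x => x + 1;   // type of x taken from the expected type
-  val add = (x: Int, y: Int) => x + y          // no expected type: parameter types are written out
-\end{lstlisting}
-In the first definition the expected type is
-~\lstinline@scala.Function1[Int, Int]@, so the parameter type of \code{x}
-may be omitted and \code{Int} is assumed. In the second definition there
-is no expected function type, so both parameter types must be given; the
-resulting type is ~\lstinline@scala.Function2[Int, Int, Int]@.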
- -The anonymous function is evaluated as the instance creation expression -\begin{lstlisting} -scala.Function$n$[$T_1 \commadots T_n$, $T$] { - def apply($x_1$: $T_1 \commadots x_n$: $T_n$): $T$ = $e$ -} -\end{lstlisting} -In the case of a single formal parameter, ~\lstinline@($x$: $T\,$) => $e$@~ and ~\lstinline@($x\,$) => $e$@~ -can be abbreviated to ~\lstinline@$x$: $T$ => e@, and ~\lstinline@$x$ => $e$@, respectively. - -\example Examples of anonymous functions: - -\begin{lstlisting} - x => x // The identity function - - f => g => x => f(g(x)) // Curried function composition - - (x: Int,y: Int) => x + y // A summation function - - () => { count = count + 1; count } // The function which takes an - // empty parameter list $()$, - // increments a non-local variable - // `count' and returns the new value. -\end{lstlisting} - -\section{Statements} -\label{sec:statements} - -\syntax\begin{lstlisting} - BlockStat ::= Import - | Def - | {LocalModifier} ClsDef - | Expr - | - TemplateStat ::= Import - | {Modifier} Def - | {Modifier} Dcl - | Expr - | -\end{lstlisting} - -Statements occur as parts of blocks and templates. A statement can be -an import, a definition or an expression, or it can be empty. -Statements used in the template of a class definition can also be -declarations. An expression that is used as a statement can have an -arbitrary value type. An expression statement $e$ is evaluated by -evaluating $e$ and discarding the result of the evaluation. -\todo{Generalize to implicit coercion?} - -Block statements may be definitions which bind local names in the -block. The only modifiers allowed in block-local definitions are modifiers -\code{abstract}, \code{final}, or \code{sealed} preceding a class or -object definition. - -With the exception of overloaded definitions -(\sref{sec:overloaded-defs}), a statement sequence making up a block -or template may not contain two definitions or declarations that bind -the same name in the same namespace. 
Evaluation of a statement -sequence entails evaluation of the statements in the order they are -written. - -\chapter{Pattern Matching} - -\section{Patterns} - -% 2003 July - changed to new pattern syntax + semantic Burak -% Nov - incorporated changes to grammar, avoiding empty patterns -% definitions for value and sequence patterns -\label{sec:patterns} - -\syntax\begin{lstlisting} - Pattern ::= Pattern1 { `|' Pattern1 } - Pattern1 ::= varid `:' Type - | `_' `:' Type - | Pattern2 - Pattern2 ::= [varid `@'] Pattern3 - Pattern3 ::= SimplePattern [ '*' | '?' | '+' ] - | SimplePattern { id' SimplePattern } - SimplePattern ::= `_' - | varid - | Literal - | StableId [ `(' [Patterns] `)' ] - | `(' [Patterns] `)' - Patterns ::= Pattern {`,' Pattern} - id' ::= id $\textit{ but not }$ '*' | '?' | '+' | `@' | `|' -\end{lstlisting} - -A pattern is built from constants, constructors, variables and regular -operators. Pattern matching tests whether a given value (or sequence -of values) has the shape defined by a pattern, and, if it does, binds -the variables in the pattern to the corresponding components of the -value (or sequence of values). The same variable name may not be -bound more than once in a pattern. - -\subsection{Value and Sequence Patterns} - -\todo{Need to distinguish between value and sequence patterns at the outside} - -On an abstract level, we distinguish between value patterns and sequence patterns, which are defined in a -mutually inductive manner. A {\em value pattern} describes a set of matching values. A -{\em sequence pattern} describes a set of matching sequences of values. Both sorts of patterns may -contain {\em variable bindings} which serve to extract constituents of a value or sequence, -and may consist of patterns of the respective other sort. - -The type of a pattern and the expected types of variables -within patterns are determined by the context. - -Concretely, we distinguish the following kinds of patterns.
- -A {\em wild-card pattern} \_ matches any value. - -A {\em typed pattern} $\_: T$ matches values of type $T$. The type $T$ may be - a class type or a compound type; it may not contain a refinement (\sref{sec:refinements}). -This pattern matches any non-null value of type $T$. $T$ must conform to the pattern's expected -type. A pattern $x:T$ is treated the same way as $x @ (\_:T)$. - -A {\em pattern literal} $l$ matches any value that is equal (in terms -of $==$) to it. Its type must conform to the expected type of the -pattern. - -A {\em named pattern constant} $p$ is a stable identifier -(\sref{sec:stable-ids}). To resolve the syntactic overlap with a -variable pattern, a named pattern constant may not be a simple name -starting with a lower-case letter. The stable identifier $p$ is -expected to conform to the expected type of the pattern. The pattern -matches any value $v$ such that ~\lstinline@$p$ == $v$@~ -(\sref{sec:cls-object}). - -A {\em sequence pattern} $p_1 \commadots p_n$ where $n \geq 0$ is a -sequence of patterns separated by commas and matching the sequence of -values that are matched by the components. Sequence patterns may only -appear under constructor applications, or nested within another sequence pattern. -Note that empty sequence patterns are allowed. The type of value patterns that appear in -a sequence pattern is the expected type as determined from the constructor. -A {\em fixed-length argument pattern} is a special sequence pattern -where all $p_i$ are value patterns. - -A {\em choice pattern} $p_1 | \ldots | p_n$ is a choice among several -alternatives, which may not contain variable-binding patterns. It -matches every value and every sequence matched by at least one of its alternatives. -Note that the empty sequence may appear as an alternative. An {\em option -pattern} $p?$ is an abbreviation for $(p| )$. A choice is a value pattern if all its branches -are value patterns.
In this case, all branches must conform to the expected type and the type -of the choice is the least upper bound of the branches. Otherwise, it has the same type as the -sequence pattern it is part of. - -An {\em iterated pattern} $p*$ matches sequences of values -consisting of zero, one or more occurrences of values matched by $p$, -where $p$ may not contain a variable-binding pattern. A {\em non-empty -iterated pattern} $p+$ is an abbreviation for $(p,p*)$. - -A {\em constructor pattern} $c ( p )$ consists of a simple type $c$ -followed by a pattern $p$. If $c$ designates a monomorphic case -class, then it must conform to the expected type of the pattern, the -pattern must be a fixed length argument pattern $p_1 \commadots p_n$ -whose length corresponds to the number of arguments of $c$'s primary -constructor. The expected types of the component patterns are then -taken from the formal parameter types of (said) constructor. If $c$ -designates a polymorphic case class, then there must be a unique type -application instance of it such that the instantiation of $c$ conforms -to the expected type of the pattern. The instantiated formal parameter -types of $c$'s primary constructor are then taken as the expected -types of the component patterns $p_1\commadots p_n$. In both cases, -the pattern matches all objects created from constructor invocations -$c(v_1 \commadots v_n)$ where each component pattern $p_i$ matches the -corresponding value $v_i$. If $c$ does not designate a case class, it -must be a subclass of \lstinline@Seq[$T\,$]@. In that case $p$ may be an -arbitrary sequence pattern. Value patterns in $p$ are expected to conform to -type $T$, and the pattern matches all objects whose \lstinline@elements()@ -method returns a sequence that matches $p$. - -The pattern $(p)$ is regarded as equivalent to the pattern $p$, if $p$ -is a nonempty sequence pattern. The empty tuple $()$ is a shorthand -for the constructor pattern \code{Unit}. 
- -An {\em infix operation pattern} ~\lstinline@$p$ $op$ $p'$@~ is a shorthand for the -constructor pattern ~\lstinline@$op$($p$, $p'$)@. The precedence and -associativity of operators in patterns is the same as in expressions -(\sref{sec:infix-operations}). The operands may not be empty sequence -patterns. - -\subsection{Variable Binding} - -A {\em variable-binding pattern} $x @ p$ is a simple identifier $x$ -which starts with a lower case letter, together with a pattern $p$. It -matches a value or a sequence of values whenever $p$ does, and in -addition binds the variable name to that value or to that sequence of -values. If $p$ is a value pattern of type $T$, the type of $x$ is also $T$. -If $p$ is a sequence pattern and appears under a constructor $c <: $\lstinline@Seq[$T\,$]@, -then the type of $x$ is \lstinline@List[$T\,$]@. %%\todo{really?} burak:yes -where $T$ is the expected type as dictated by the constructor. A pattern -consisting of only a variable $x$ is treated as the bound value pattern $x @ \_$. - -Regular expressions that contain variable bindings may be ambiguous, -i.e. there might be several ways to match a sequence against the -pattern. In these cases, the \emph{right-longest policy} applies: -patterns that appear more to the right than others in a sequence take precedence in case -of overlaps. - -\example Some examples of patterns are: -\begin{enumerate} -\item -The pattern ~\lstinline@ex: IOException@~ matches all instances of class -\code{IOException}, binding variable \code{ex} to the instance. -\item -The pattern ~\lstinline@Pair(x, _)@~ matches pairs of values, binding \code{x} to -the first component of the pair. The second component is matched -with a wildcard pattern. -\item -The pattern \ \code{List( x, y, xs @ _ * )} matches lists of length $\geq 2$, -binding \code{x} to the list's first element, \code{y} to the list's -second element, and \code{xs} to the remainder, which may be empty. 
-\item -The pattern \ \code{List( 1, x@(( 'a' | 'b' )+),y,_ )} matches a list that -contains 1 as its first element, continues with a non-empty sequence of -\code{'a'}s and \code{'b'}s, followed by two more elements. The sequence of \code{'a'}s and \code{'b'}s -is bound to \code{x}, and the next to last element is bound to \code{y}. -\item -The pattern \code{List( x@( 'a'* ), 'a'+ )} matches a non-empty list of -\code{'a'}s. Because of the right-longest policy, \code{x} will always be bound to -the empty sequence. -\item -The pattern \code{List( x@( 'a'+ ), 'a'* )} also matches a non-empty list of -\code{'a'}s. Here, \code{x} will always be bound to -the sequence containing one \code{'a'}. -\end{enumerate} - -\section{Pattern Matching Expressions} -\label{sec:pattern-match} - -\syntax\begin{lstlisting} - BlockExpr ::= `{' CaseClause {CaseClause} `}' - CaseClause ::= case Pattern [`if' PostfixExpr] `=>' Block -\end{lstlisting} - -A pattern matching expression -~\lstinline@case $p_1$ => $b_1$ $\ldots$ case $p_n$ => $b_n$@ \ consists of a number -$n \geq 1$ of cases. Each case consists of a (possibly guarded) pattern -$p_i$ and a block $b_i$. The scope of the pattern variables in $p_i$ is -the corresponding block $b_i$. - -The expected type of a pattern matching expression must in part be -defined. It must be either ~\lstinline@scala.Function1[$T_p$, $T_r$]@ \ or -~\lstinline@scala.PartialFunction[$T_p$, $T_r$]@, where the argument type -$T_p$ must be fully determined, but the result type -$T_r$ may be undetermined. All patterns are typed -relative to the expected type $T_p$ (\sref{sec:patterns}). The expected type of -every block $b_i$ is $T_r$. -Let $T_b$ be the least upper bound of the types of all blocks -$b_i$. The type of the pattern matching expression is -then the required type with $T_r$ replaced by $T_b$ -(i.e. the type is either ~\lstinline@scala.Function1[$T_p$, $T_b$]@~ or -~\lstinline@scala.PartialFunction[$T_p$, $T_b$]@).
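-
-\example A minimal sketch of this typing rule, using a hypothetical
-value \code{describe}.
-\begin{lstlisting}
-  val describe: PartialFunction[Int, String] = {
-    case 0 => "zero"
-    case 1 => "one"
-  }
-\end{lstlisting}
-Here the expected type fixes $T_p$ to \code{Int} and $T_r$ to
-\code{String}. Both patterns are typed relative to \code{Int}, both
-blocks are expected to conform to \code{String}, and the type of the
-whole expression is ~\lstinline@scala.PartialFunction[Int, String]@.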
- -When applying a pattern matching expression to a selector value, -patterns are tried in sequence until one is found which matches the -selector value (\sref{sec:patterns}). Say this case is $\CASE;p_i -\Arrow b_i$. The result of the whole expression is then the result of -evaluating $b_i$, where all pattern variables of $p_i$ are bound to -the corresponding parts of the selector value. If no matching pattern -is found, a \code{scala.MatchError} exception is thrown. - -The pattern in a case may also be followed by a guard suffix \ \code{if e}\ -with a boolean expression $e$. The guard expression is evaluated if -the preceding pattern in the case matches. If the guard expression -evaluates to \code{true}, the pattern match succeeds as normal. If the -guard expression evaluates to \code{false}, the pattern in the case -is considered not to match and the search for a matching pattern -continues. - -\comment{ -A case with several patterns $\CASE;p_1 \commadots p_n ;\IF; e \Arrow b$ is a -shorthand for a sequence of single-pattern cases $\CASE;p_1;\IF;e \Arrow b -;\ldots; \CASE;p_n ;\IF;e\Arrow b$. In this case none of the patterns -$p_i$ may contain a named pattern variable (but the patterns may contain -wild-cards). -} - -In the interest of efficiency the evaluation of a pattern matching -expression may try patterns in some other order than textual -sequence. This might affect evaluation through -side effects in guards. However, it is guaranteed that a guard -expression is evaluated only if the pattern it guards matches. - -\example -Often, pattern matching expressions are used as arguments -of the \code{match} method, which is predefined in class \code{Any} -(\sref{sec:cls-object}) and is implemented there by postfix function -application. 
Here is an example: -\begin{lstlisting} -def length [a] (xs: List[a]) = xs match { - case Nil => 0 - case x :: xs1 => 1 + length (xs1) -} -\end{lstlisting} - -\chapter{Top-Level Definitions} -\label{sec:topdefs} - -\syntax\begin{lstlisting} - CompilationUnit ::= [package QualId `;'] {TopStat `;'} TopStat - TopStat ::= {Modifier} ClsDef - | Import - | Packaging - | - QualId ::= id {`.' id} -\end{lstlisting} - -A compilation unit consists of a sequence of packagings, import -clauses, and class and object definitions, which may be preceded by a -package clause. - -A compilation unit ~\lstinline@package $p$; $\stats$@~ starting with a package -clause is equivalent to a compilation unit consisting of a single -packaging ~\lstinline@package $p$ { $\stats$ }@. - -Implicitly imported into every compilation unit are, in that order : -the package \code{java.lang}, the package \code{scala}, and the object -\code{scala.Predef} (\sref{cls:predef}). Members of a later import in -that order hide members of an earlier import. - -\section{Packagings}\label{sec:packagings} - -\syntax\begin{lstlisting} - Packaging ::= package QualId `{' {TopStat `;'} TopStat `}' -\end{lstlisting} - -A package is a special object which defines a set of member classes, -objects and packages. Unlike other objects, packages are not introduced -by a definition. Instead, the set of members of a package is determined by -packagings. - -A packaging \ \code{package p { ds }}\ injects all definitions in -\code{ds} as members into the package whose qualified name is -$p$. If a definition in \code{ds} is labeled \code{private}, it -is visible only for other members in the package. - -Selections \code{p.m} from $p$ as well as imports from $p$ -work as for objects. However, unlike other objects, packages may not -be used as values. It is illegal to have a package with the same fully -qualified name as a module or a class. 
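-
-\example A minimal sketch of a packaging. The names \code{geometry},
-\code{Point} and \code{Helper} are hypothetical.
-\begin{lstlisting}
-package geometry {
-  class Point(x: Double, y: Double);
-  private class Helper
-}
-\end{lstlisting}
-This packaging injects the classes \code{Point} and \code{Helper} as
-members into package \code{geometry}. \code{Point} can be selected as
-\code{geometry.Point} or imported from \code{geometry}. Being labeled
-\code{private}, \code{Helper} is visible only to other members of
-\code{geometry}.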
- -Top-level definitions outside a packaging are assumed to be injected -into a special empty package. That package cannot be named and -therefore cannot be imported. However, members of the empty package -are visible to each other without qualification. - -\example The following example will create a hello world program as -function \code{main} of module \code{test.HelloWorld}. -\begin{lstlisting} -package test; - -object HelloWorld { - def main(args: Array[String]) = System.out.println("hello world") -} -\end{lstlisting} - -\chapter{Local Type Inference} -\label{sec:local-type-inf} - -To be completed. - -\chapter{The Scala Standard Library} - -The Scala standard library consists of the package \code{scala} with a -number of classes and modules. Some of these classes are described in -the following. - -\section{Root Classes} -\label{sec:cls-root} -\label{sec:cls-any} -\label{sec:cls-object} - -The root of the Scala class hierarchy is formed by class \code{Any}. -Every class in a Scala execution environment inherits directly or -indirectly from this class. Class \code{Any} has two direct -subclasses: \code{AnyRef} and \code{AnyVal}. - -The subclass \code{AnyRef} represents all values which are represented -as objects in the underlying host system. Every user-defined Scala -class inherits directly or indirectly from this class. Furthermore, -every user-defined Scala class also inherits the trait -\code{scala.ScalaObject}. Classes written in other languages still -inherit from \code{scala.AnyRef}, but not from -\code{scala.ScalaObject}. - -The class \code{AnyVal} has a fixed number of subclasses, which describe -values which are not implemented as objects in the underlying host -system. - -Classes \code{AnyRef} and \code{AnyVal} are required to provide only -the members declared in class \code{Any}, but implementations may add -host-specific methods to these classes (for instance, an -implementation may identify class \code{AnyRef} with its own root -class for objects).
- -The standard interface of these root classes is described by the -following definitions. - -\begin{lstlisting} -package scala; -abstract class Any { - - /** Reference equality */ - final def eq(that: Any): boolean = $\ldots$ - - /** Defined equality */ - def equals(that: Any): boolean = this eq that; - - /** Semantic equality between values of same type */ - final def == (that: Any): boolean = this equals that - - /** Semantic inequality between values of same type */ - final def != (that: Any): boolean = !(this == that) - - /** Hash code */ - def hashCode(): Int = $\ldots$ - - /** Textual representation */ - def toString(): String = $\ldots$ - - /** Type test */ - def isInstanceOf[a]: Boolean = match { - case x: a => true - case _ => false - } - - /** Type cast */ - def asInstanceOf[a]: a = match { - case x: a => x - case _ => if (this eq null) this - else throw new ClassCastException() - } - - /** Pattern match */ - def match[a, b](cases: a => b): b = cases(this); -} -final class AnyVal extends Any; -class AnyRef extends Any; -trait ScalaObject extends AnyRef; -\end{lstlisting} - -The type cast operation \verb@asInstanceOf@ has a special meaning (not -expressed in the code above) when its type parameter is a numeric -type. For any type \lstinline@T <: Double@, and any numeric value -\verb@v@, \lstinline@v.asInstanceOf[T]@ converts \code{v} to type -\code{T} using the rules of Java's numeric type cast operation. The -conversion might truncate the numeric value (as when going from -\code{Long} to \code{Int} or from \code{Int} to \code{Byte}) or it -might lose precision (as when going from \code{Double} to \code{Float} -or when converting between \code{Long} and \code{Float}). - -\section{Value Classes} -\label{cls:value} - -Value classes are classes whose instances are not represented as -objects by the underlying host system. All value classes inherit from -class \code{AnyVal}.
Scala implementations need to provide the -value classes \code{Unit}, \code{Boolean}, \code{Double}, \code{Float}, -\code{Long}, \code{Int}, \code{Char}, \code{Short}, and \code{Byte} -(but are free to provide others as well). -The signatures of these classes are defined in the following. - -\subsection{Class \large{\code{Double}}} - -\begin{lstlisting} -package scala; -abstract sealed class Double extends AnyVal { - def + (that: Double): Double // double addition - def - (that: Double): Double // double subtraction - def * (that: Double): Double // double multiplication - def / (that: Double): Double // double division - def % (that: Double): Double // double remainder - - def == (that: Double): Boolean // double equality - def != (that: Double): Boolean // double inequality - def < (that: Double): Boolean // double less - def > (that: Double): Boolean // double greater - def <= (that: Double): Boolean // double less or equals - def >= (that: Double): Boolean // double greater or equals - - def - : Double = 0.0 - this // double negation - def + : Double = this -} -\end{lstlisting} - -\subsection{Class \large{\code{Float}}} - -\begin{lstlisting} -package scala; -abstract sealed class Float extends AnyVal { - def coerce: Double // convert to Double - - def + (that: Double): Double; // double addition - def + (that: Float): Float // float addition - /* analogous for -, *, /, % */ - - def == (that: Double): Boolean; // double equality - def == (that: Float): Boolean; // float equality - /* analogous for !=, <, >, <=, >= */ - - def - : Float; // float negation - def + : Float -} -\end{lstlisting} - -\subsection{Class \large{\code{Long}}} - -\begin{lstlisting} -package scala; -abstract sealed class Long extends AnyVal { - def coerce: Double // convert to Double - def coerce: Float // convert to Float - - def + (that: Double): Double; // double addition - def + (that: Float): Float; // float addition - def + (that: Long): Long // long addition - /* analogous for -, *,
/, % */ - - def << (cnt: Int): Long // long left shift - def >> (cnt: Int): Long // long signed right shift - def >>> (cnt: Int): Long // long unsigned right shift - def & (that: Long): Long // long bitwise and - def | (that: Long): Long // long bitwise or - def ^ (that: Long): Long // long bitwise exclusive or - - def == (that: Double): Boolean; // double equality - def == (that: Float): Boolean; // float equality - def == (that: Long): Boolean // long equality - /* analogous for !=, <, >, <=, >= */ - - def - : Long; // long negation - def + : Long; // long identity - def ~ : Long // long bitwise negation -} -\end{lstlisting} - -\subsection{Class \large{\code{Int}}} - -\begin{lstlisting} -package scala; -abstract sealed class Int extends AnyVal { - def coerce: Double // convert to Double - def coerce: Float // convert to Float - def coerce: Long // convert to Long - - def + (that: Double): Double; // double addition - def + (that: Float): Float; // float addition - def + (that: Long): Long; // long addition - def + (that: Int): Int; // int addition - /* analogous for -, *, /, % */ - - def << (cnt: Int): Int; // int left shift - /* analogous for >>, >>> */ - - def & (that: Long): Long; // long bitwise and - def & (that: Int): Int; // int bitwise and - /* analogous for |, ^ */ - - def == (that: Double): Boolean; // double equality - def == (that: Float): Boolean; // float equality - def == (that: Long): Boolean // long equality - def == (that: Int): Boolean // int equality - /* analogous for !=, <, >, <=, >= */ - - def - : Int; // int negation - def + : Int; // int identity - def ~ : Int; // int bitwise negation -} -\end{lstlisting} - -\subsection{Class \large{\code{Short}}} - -\begin{lstlisting} -package scala; -abstract sealed class Short extends AnyVal { - def coerce: Double // convert to Double - def coerce: Float // convert to Float - def coerce: Long // convert to Long - def coerce: Int // convert to Int -} -\end{lstlisting} - -\subsection{Class
\large{\code{Char}}} - -\begin{lstlisting} -package scala; -abstract sealed class Char extends AnyVal { - def coerce: Double // convert to Double - def coerce: Float // convert to Float - def coerce: Long // convert to Long - def coerce: Int // convert to Int - - def isDigit: Boolean; // is this character a digit? - def isLetter: Boolean; // is this character a letter? - def isLetterOrDigit: Boolean; // is this character a letter or digit? - def isWhiteSpace: Boolean // is this a whitespace character? -} -\end{lstlisting} - -\subsection{Class \large{\code{Byte}}} - -\begin{lstlisting} -package scala; -abstract sealed class Byte extends AnyVal { - def coerce: Double // convert to Double - def coerce: Float // convert to Float - def coerce: Long // convert to Long - def coerce: Int // convert to Int - def coerce: Short // convert to Short -} -\end{lstlisting} - -\subsection{Class \large{\code{Boolean}}} -\label{sec:cls-boolean} - -\begin{lstlisting} -package scala; -abstract sealed class Boolean extends AnyVal { - def && (def x: Boolean): Boolean; // boolean and - def || (def x: Boolean): Boolean; // boolean or - def & (x: Boolean): Boolean; // boolean strict and - def | (x: Boolean): Boolean // boolean strict or - - def == (x: Boolean): Boolean // boolean equality - def != (x: Boolean): Boolean // boolean inequality - - def ! : Boolean // boolean negation -} -\end{lstlisting} - -\subsection{Class \large{\code{Unit}}} - -\begin{lstlisting} -package scala; -abstract sealed class Unit extends AnyVal; -\end{lstlisting} - -\section{Standard Reference Classes} -\label{cls:reference} - -This section presents some standard Scala reference classes which are -treated in a special way by the Scala compiler -- either Scala provides -syntactic sugar for them, or the Scala compiler generates special code -for their operations. Other classes in the standard Scala library are -documented by HTML pages elsewhere.
- -\subsection{Class \large{\code{String}}} - -The \verb@String@ class is usually derived from the standard String -class of the underlying host system (and may be identified with -it). For Scala clients the class is taken to support in each case a -method -\begin{lstlisting} -def + (that: Any): String -\end{lstlisting} -which concatenates its left operand with the textual representation of its -right operand. - -\subsection{The \large{\code{Tuple}} classes} - -Scala defines tuple classes \lstinline@Tuple$n$@ for $n = 2 \commadots 9$. -These are defined as follows. - -\begin{lstlisting} -package scala; -case class Tuple$n$[+a_1, ..., +a_n](_1: a_1, ..., _$n$: a_$n$) { - def toString = "(" ++ _1 ++ "," ++ $\ldots$ ++ "," ++_$n$ ++ ")" -} -\end{lstlisting} - -The implicitly imported \code{Predef} object (\sref{cls:predef}) defines -the names \code{Pair} as an alias of \code{Tuple2} and \code{Triple} -as an alias for \code{Tuple3}. - -\subsection{The \large{\code{Function}} Classes} -\label{sec:cls-function} - -Scala defines function classes \lstinline@Function$n$@ for $n = 1 \commadots 9$. -These are defined as follows. - -\begin{lstlisting} -package scala; -class Function$n$[-a_1, ..., -a_$n$, +b] { - def apply(x_1: a_1, ..., x_$n$: a_$n$): b; - def toString = "<function>"; -} -\end{lstlisting} - -\comment{ -There is also a module \code{Function}, defined as follows. -\begin{lstlisting} -package scala; -module Function { - def compose[a](fs: List[(a)a]): (a)a = { - x => fs match { - case Nil => x - case f :: fs1 => compose(fs1)(f(x)) - } - } -} -\end{lstlisting} -} - -A subclass of \lstinline@Function1@ represents partial functions, -which are undefined on some points in their domain. 
In addition to the -\code{apply} method of functions, partial functions also have an -\code{isDefinedAt} method, which tells whether the function is defined -at the given argument: -\begin{lstlisting} -class PartialFunction[-a,+b] extends Function1[a, b] { - def isDefinedAt(x: a): Boolean -} -\end{lstlisting} - -The implicitly imported \code{Predef} object (\sref{cls:predef}) defines the name -\code{Function} as an alias of \code{Function1}. - -\subsection{Class \large{\code{Array}}}\label{cls:array} - -The class of generic arrays is given as follows. - -\begin{lstlisting} -package scala; -class Array[a](length: int) with Function[Int, a] { - def length: int; - def apply(i: Int): a; - def update(i: Int)(x: a): Unit; -} -\end{lstlisting} - -\comment{ -\begin{lstlisting} -module Array { - def create[a](i1: Int): Array[a] = Array[a](i1) - def create[a](i1: Int, i2: Int): Array[Array[a]] = { - val x: Array[Array[a]] = create(i1) - 0 to (i1 - 1) do { i => x(i) = create(i2) } - x - } - $\ldots$ - def create[a](i1: Int, i2: Int, i3: Int, i4: Int, i5: Int, - i6: Int, i7: Int, i8: Int, i9: Int, i10: Int) - : Array[Array[Array[Array[Array[Array[Array[Array[Array[Array[a]]]]]]]]]] = { - val x: Array[Array[Array[Array[Array[Array[Array[Array[Array[a]]]]]]]]] = create(i1) - 0 to (i1 - 1) do { i => x(i) = create(i2, i3, i4, i5, i6, i7, i8, i9, i10) } - x - } -} -\end{lstlisting} -} - -\section{The \large{\code{Predef}} Object}\label{cls:predef} - -The \code{Predef} module defines standard functions and type aliases -for Scala programs. It is always implicitly imported, so that all its -defined members are available without qualification. Here is its -definition for the JVM environment. 
- -\begin{lstlisting} -package scala; -object Predef { - type byte = scala.Byte; - type short = scala.Short; - type char = scala.Char; - type int = scala.Int; - type long = scala.Long; - type float = scala.Float; - type double = scala.Double; - type boolean = scala.Boolean; - type unit = scala.Unit; - - type String = java.lang.String; - type NullPointerException = java.lang.NullPointerException; - type Throwable = java.lang.Throwable; - // other aliases to be identified - - /** Abort with error message */ - def error(message: String): All = throw new Error(message); - - /** Throw an error if given assertion does not hold. */ - def assert(assertion: Boolean): Unit = - if (!assertion) throw new Error("assertion failed"); - - /** Throw an error with given message if given assertion does not hold */ - def assert(assertion: Boolean, message: Any): Unit = - if (!assertion) throw new Error("assertion failed: " + message); - - /** Create an array with given elements */ - def Array[A](xs: A*): Array[A] = { - val array: Array[A] = new Array[A](xs.length); - var i = 0; - for (val x <- xs.elements) { array(i) = x; i = i + 1; } - array; - } - - /** Aliases for pairs and triples */ - type Pair[+p, +q] = Tuple2[p, q]; - def Pair[a, b](x: a, y: b) = Tuple2(x, y); - type Triple[+a, +b, +c] = Tuple3[a, b, c]; - def Triple[a, b, c](x: a, y: b, z: c) = Tuple3(x, y, z); - - /** Alias for unary functions */ - type Function = Function1; - - /** Some standard simple functions */ - def id[a](x: a): a = x; - def fst[a](x: a, y: Any): a = x; - def scd[a](x: Any, y: a): a = y; -} -\end{lstlisting} +\part{The Scala Language Specification} +\input{ReferencePart} \bibliographystyle{alpha} \bibliography{Scala} -\appendix -\chapter{Scala Syntax Summary} - -The lexical syntax of Scala is given by the following grammar in EBNF -form. 
- -\begin{lstlisting} - upper ::= `A' | $\ldots$ | `Z' | `$\Dollar$' | `_' - lower ::= `a' | $\ldots$ | `z' - letter ::= upper | lower - digit ::= `0' | $\ldots$ | `9' - special ::= $\mbox{\rm\em ``all other characters except parentheses ([{}]) and periods''}$ - - op ::= special {special} - varid ::= lower {letter | digit} [`_' [id]] - id ::= upper {letter | digit} [`_' [id]] - | varid - | op - | `\'stringLit - - intLit ::= $\mbox{\rm\em ``as in Java''}$ - floatLit ::= $\mbox{\rm\em ``as in Java''}$ - charLit ::= $\mbox{\rm\em ``as in Java''}$ - stringLit ::= $\mbox{\rm\em ``as in Java''}$ - symbolLit ::= `\'' id - - comment ::= `/*' ``any sequence of characters'' `*/' - | `//' `any sequence of characters up to end of line'' -\end{lstlisting} - -The context-free syntax of Scala is given by the following EBNF -grammar. - -\begin{lstlisting} - Literal ::= intLit - | floatLit - | charLit - | stringLit - | symbolLit - | true - | false - | null - - StableId ::= id - | Path `.' id - Path ::= StableId - | [id `.'] this - | [id '.'] super [`[' id `]']`.' id - - Type ::= Type1 `=>' Type - | `(' [Types] `)' `=>' Type - | Type1 - Type1 ::= SimpleType {with SimpleType} [Refinement] - SimpleType ::= SimpleType TypeArgs - | SimpleType `#' id - | StableId - | Path `.' 
type - | `(' Type ')' - TypeArgs ::= `[' Types `]' - Types ::= Type {`,' Type} - Refinement ::= `{' [RefineStat {`;' RefineStat}] `}' - RefineStat ::= Dcl - | type TypeDef {`,' TypeDef} - | - - Exprs ::= Expr {`,' Expr} - Expr ::= Bindings `=>' Expr - | Expr1 - Expr1 ::= if `(' Expr1 `)' Expr [[`;'] else Expr] - | try `{' Block `}' [catch Expr] [finally Expr] - | do Expr [`;'] while `(' Expr ')' - | for `(' Enumerators `)' (do | yield) Expr - | return [Expr] - | throw Expr - | [SimpleExpr `.'] id `=' Expr - | SimpleExpr ArgumentExprs `=' Expr - | PostfixExpr [`:' Type1] - PostfixExpr ::= InfixExpr [id] - InfixExpr ::= PrefixExpr - | InfixExpr id PrefixExpr - PrefixExpr ::= [`-' | `+' | `~' | `!'] SimpleExpr - SimpleExpr ::= Literal - | Path - | `(' [Expr] `)' - | BlockExpr - | new Template - | SimpleExpr `.' id - | SimpleExpr TypeArgs - | SimpleExpr ArgumentExprs - ArgumentExprs ::= `(' [Exprs] ')' - | BlockExpr - BlockExpr ::= `{' CaseClause {CaseClause} `}' - | `{' Block `}' - Block ::= {BlockStat `;'} [ResultExpr] - BlockStat ::= Import - | Def - | {LocalModifier} ClsDef - | Expr1 - | - ResultExpr ::= Expr1 - | Bindings `=>' Block - - Enumerators ::= Generator {`;' Enumerator} - Enumerator ::= Generator - | Expr - Generator ::= val Pattern1 `<-' Expr - - CaseClause ::= case Pattern [`if' PostfixExpr] `=>' Block - - Constr ::= StableId [TypeArgs] [`(' [Exprs] `)'] - - Pattern ::= Pattern1 { `|' Pattern1 } - Pattern1 ::= varid `:' Type - | `_' `:' Type - | Pattern2 - Pattern2 ::= [varid `@'] Pattern3 - Pattern3 ::= SimplePattern [ '*' | '?' 
| '+' ] - | SimplePattern { id SimplePattern } - SimplePattern ::= `_' - | varid - | Literal - | StableId [ `(' [Patterns] `)' ] - | `(' [Patterns] `)' - Patterns ::= Pattern {`,' Pattern} - - TypeParamClause ::= `[' TypeParam {`,' TypeParam} `]' - FunTypeParamClause ::= `[' TypeDcl {`,' TypeDcl} `]' - TypeParam ::= [`+' | `-'] TypeDcl - ParamClause ::= `(' [Param {`,' Param}] `)' - Param ::= [def] id `:' Type [`*'] - Bindings ::= id [`:' Type1] - | `(' Binding {`,' Binding} `)' - Binding ::= id [`:' Type] - - Modifier ::= LocalModifier - | private - | protected - | override - LocalModifier ::= abstract - | final - | sealed - - Template ::= Constr {`with' Constr} [TemplateBody] - TemplateBody ::= `{' [TemplateStat {`;' TemplateStat}] `}' - TemplateStat ::= Import - | {Modifier} Def - | {Modifier} Dcl - | Expr - | - - Import ::= import ImportExpr {`,' ImportExpr} - ImportExpr ::= StableId `.' (id | `_' | ImportSelectors) - ImportSelectors ::= `{' {ImportSelector `,'} - (ImportSelector | `_') `}' - ImportSelector ::= id [`=>' id | `=>' `_'] - - Dcl ::= val ValDcl {`,' ValDcl} - | var VarDcl {`,' VarDcl} - | def FunDcl {`,' FunDcl} - | type TypeDcl {`,' TypeDcl} - ValDcl ::= id `:' Type - VarDcl ::= id `:' Type - FunDcl ::= id [FunTypeParamClause] {ParamClause} `:' Type - TypeDcl ::= id [`>:' Type] [`<:' Type] - - Def ::= val PatDef {`,' PatDef} - | var VarDef {`,' VarDef} - | def FunDef {`,' FunDef} - | type TypeDef {`,' TypeDef} - | ClsDef - PatDef ::= Pattern `=' Expr - VarDef ::= id [`:' Type] `=' Expr - | id `:' Type `=' `_' - FunDef ::= id [FunTypeParamClause] {ParamClause} - [`:' Type] `=' Expr - | this ParamClause `=' ConstrExpr - TypeDef ::= id [TypeParamClause] `=' Type - ClsDef ::= ([case] class | trait) ClassDef {`,' ClassDef} - | [case] object ObjectDef {`,' ObjectDef} - ClassDef ::= id [TypeParamClause] [ParamClause] - [`:' SimpleType] ClassTemplate - ObjectDef ::= id [`:' SimpleType] ClassTemplate - ClassTemplate ::= extends Template - | TemplateBody - | 
- ConstrExpr ::= this ArgumentExprs - | `{' this ArgumentExprs {`;' BlockStat} `}' - - CompilationUnit ::= [package QualId `;'] {TopStat `;'} TopStat - TopStat ::= {Modifier} ClsDef - | Import - | Packaging - | - Packaging ::= package QualId `{' {TopStat `;'} TopStat `}' - QualId ::= id {`.' id} -\end{lstlisting} - -\chapter{Implementation Status} - -The present Scala compiler does not yet implement all of the Scala -specification. Its currently existing omissions and deviations are -listed below. We are working on a refined implementation that -addresses these issues. -\begin{enumerate} -\item -Unicode support is still limited. At present we only permit Unicode -encodings \verb@\uXXXX@ in strings and backquote-enclosed identifiers. -To define or access a Unicode identifier, you need to put it in -backquotes and use the \verb@\uXXXX@ encoding. -\item -The unicode operator ``$\Rightarrow$'' -(\sref{sec:idents}) is not yet recognized; you need to use the two -character ASCII equivalent ``\code{=>}'' instead. -\item -The current implementation does not yet support run-time types. -All types are erased (\sref{sec:erasure}) during compilation. This means that -the following operations give potentially wrong results. -\begin{itemize} -\item -Type tests and type casts to parameterized types. Here it is only tested -that a value is an instance of the given top-level type constructor. -\item -Type tests and type casts to type parameters and abstract types. Here -it is only tested that a value is an instance of the type parameter's upper bound. -\item -Polymorphic array creation. If \code{t} is a type variable or abstract type, then -\code{new Array[t]} will yield an array of the upper bound of \code{t}. -\end{itemize} -\item -Return expressions are not yet permitted inside an anonymous function -or inside a call-by-name argument (i.e.\ a function argument corresponding to a -\code{def} parameter). 
-\item -Members of the empty package (\sref{sec:packagings}) cannot yet be -accessed from other source files. Hence, all library classes and -objects have to be in some package. -\item -At present, auxiliary constructors (\sref{sec:constr-defs}) are only permitted -for monomorphic classes. -\item -The \code{Array} class supports as yet only a restricted set of -operations as given in \sref{cls:array}. It is planned to extend that -interface. In particular, arrays will implement the \code{scala.Seq} -trait as well as the methods needed to support for-comprehensions. -\item -At present, all classes used as mixins must be accessible to the Scala -compiler in source form. -\end{enumerate} - -\end{document} - - -\comment{ -\section{Definitions} - -For a possibly recursive definition such as $\LET;x_1 = e_1 -;\ldots; \LET x_n = e_n$, local type inference proceeds as -follows. -A first phase assigns {\em a-priori types} to the $x_i$. The a-priori -type of $x$ is the declared type of $x$ if a declared type is -given. Otherwise, it is the inherited type, if one is -given. Otherwise, it is undefined. - -A second phase assigns completely defined types to the $x_i$, in some -order. The type of $x$ is the a-priori type, if it is completely -defined. Otherwise, it is the a-priori type of $x$'s right hand side. -The a-priori type of an expression $e$ depends on the form of $e$. -\begin{enumerate} -\item -The a-priori type of a -typed expression $e:T$ is $T$. -\item -The a-priori type of a class instance -creation expression $c;\WITH;(b)$ is $C;\WITH;R$ where $C$ is the -type of the class given in $c$ and $R$ is the a-priori type of block -$b$. -\item -The a-priori type of a block is a record consisting of the a-priori -types of each non-private identifier which is declared in the block -and which is visible in the last statement of the block. 
Here, it is -required that every import clause $\IMPORT;e_1 \commadots e_n$ refers -to expressions whose type can be computed with the type information -determined so far. Otherwise, a compile time error results. -\item -The a-priori type of any other expression is the expression's type, if -that type can be computed with the type information determined so far. -Otherwise, a compile time error results. -\end{enumerate} -The compiler will find an ordering in which types are assigned without -compiler errors to all variables $x_1 \commadots x_n$, if such an -ordering exists. This can be achieved by lazy evaluation. -} -\section{Exceptions} -\label{sec:exceptions} - -There is a predefined type \code{Throwable}, as well as functions to -throw and handle values of type \code{Throwable}. These are declared -as follows. - -\begin{lstlisting} - class Throwable { - def throw[a]: a - } - class ExceptOrFinally[a] { - def except (handler: PartialFunction[Throwable,a]): a - def finally (def handler: Unit): a - } - def try [a] (def body: a): ExceptOrFinally[a] -\end{lstlisting} - -The type \code{Throwable} represents exceptions and error objects; it -may be identified with an analogous type of the underlying -implementation such as \code{java.lang.Throwable}. We will in the -following loosely call values of type \code{Throwable} exceptions. - -The \code{throw} method in \code{Throwable} aborts execution of the -thread executing it and passes the thrown exception to the handler -that was most recently installed by a -\code{try} function in the current thread. If no \code{try} method is -active, the thread terminates. - -The \code{try} function executes its body with the given exception -handler. A \code{try} expression comes in two forms. The first form is - -\begin{lstlisting} -try $body$ except $handler$ . -\end{lstlisting} - -If $body$ executes without an exception being thrown, then executing -the try expression is equivalent to just executing $body$. 
If some -exception is thrown from within $body$ for which \code{handler} is defined, -the handler is invoked with the thrown exception as argument. - -The second form of a try expression is - -\begin{lstlisting} -try $body$ finally $handler$ . -\end{lstlisting} - -This expression will execute $body$. A normal execution of $body$ is -followed by an invocation of the $handler$ expression. The $handler$ -expression does not take arguments and has \code{Unit} as result type. -If execution of the handler expression throws an exception, this -exception is propagated out of the \code{try} statement. Otherwise, -if an exception was thrown in $body$ prior to invocation of $handler$, -that exception is re-thrown after the invocation. Finally, if both -$body$ and $handler$ terminate normally, the original result of -$body$ is the result of the \code{try} expression. - -\example An example of a try-except expression: - -\begin{lstlisting} -try { - System.in.readString() -} except { - case ex: EndOfFile => "" -} -\end{lstlisting} - -\example An example of a try-finally expression: - -\begin{lstlisting} -file = open (fileName) -if (file != null) { - try { - process (file) - } finally { - file.close - } -} -\end{lstlisting} - -\section{Concurrency} -\label{sec:concurrency} - -\subsection{Basic Concurrency Constructs} - -Scala programs may be executed by several threads that operate -concurrently. The thread model used is based on the model of the -underlying run-time system. We postulate a predefined -class \code{Thread} for run-time threads, -a \code{fork} function to spawn off a new thread, -as well as \code{Monitor} and \code{Signal} classes. These are -specified as follows\notyet{Concurrency constructs are}. - - -\begin{lstlisting} -class Thread { $\ldots$ } -def fork (def p: Unit): Thread -\end{lstlisting} - -The \code{fork} function runs its argument computation \code{p} in a -separate thread. It returns the thread object immediately to its -caller. 
Unhandled exceptions (\sref{sec:exceptions}) thrown during -evaluation of \code{p} abort execution of the forked thread and are -otherwise ignored. - -\begin{lstlisting} -class Monitor { - def synchronized [a] (def e: a): a -} -\end{lstlisting} - -Monitors define a \code{synchronized} method which provides mutual -exclusion between threads. It executes its argument computation -\code{e} while asserting exclusive ownership of the monitor -object whose method is invoked. If some other thread has ownership of -the same monitor object, the computation is delayed until the other -process has relinquished its ownership. Ownership of a monitor is -relinquished at the end of the argument computation, and while the -computation is waiting for a signal. - -\begin{lstlisting} -class Signal { - def wait: Unit - def wait(msec: Long): Unit - def notify: Unit - def notifyAll: Unit -} -\end{lstlisting} - -The \code{Signal} class provides the basic means for process -synchronization. The \code{wait} method of a signal suspends the -calling thread until it is woken up by some future invocation of the -signal's \code{notify} or \code{notifyAll} method. The \code{notify} -method wakes up one thread that is waiting for the signal. The -\code{notifyAll} method wakes up all threads that are waiting for the -signal. A second version of the \code{wait} method takes a time-out -parameter (given in milliseconds). A thread calling \code{wait(msec)} -will suspend until unblocked by a \code{notify} or \code{notifyAll} -method, or until the \code{msec} milliseconds have passed. - -\subsection{Channels} - -\begin{lstlisting} -class Channel[a] { - def write(x: a): Unit - def read: a -} -\end{lstlisting} - -Objects of type \code{Channel[a]} offer a write operation, -which writes data of type \code{a} to the channel, and a read -operation, which returns written data as a result. 
The write operation -is non-blocking; that is it returns immediately without waiting for -the written data to be read. - -\subsection{Message Spaces} - -The Scala library also provides message spaces as a higher-level, -flexible construct for process synchronization and communication. A -{\em message} is an arbitrary object that inherits from the -\code{Message} class. -There is a special message \code{TIMEOUT} which is used to signal a time-out. -\begin{lstlisting} -class Message -case class TIMEOUT extends Message -\end{lstlisting} -Message spaces implement the following class. -\begin{lstlisting} -class MessageSpace { - def send(msg: Message): Unit - def receive[a](f: PartialFunction1[Message, a]): a - def receiveWithin[a](msec: Long)(f: PartialFunction1[Message, a]): a -} -\end{lstlisting} -The state of a message space consists of a multi-set of messages. -Messages are added to the space using the \code{send} method. Messages -are removed using the \code{receive} method, which is passed a message -processor \code{f} as argument, which is a partial function from -messages to some arbitrary result type. Typically, this function is -implemented as a pattern matching expression. The \code{receive} -method blocks until there is a message in the space for which its -message processor is defined. The matching message is then removed -from the space and the blocked thread is restarted by applying the -message processor to the message. Both sent messages and receivers are -ordered in time. A receiver $r$ is applied to a matching message $m$ -only if there is no other (message, receiver) pair which precedes $(m, -r)$ in the partial ordering on pairs that orders each component in -time. - -The message space class also offers a method \code{receiveWithin} -which blocks for only a specified maximal amount of time. 
If no -message is received within the specified time interval (given in -milliseconds), the message processor argument $f$ will be unblocked -with the special \code{TIMEOUT} message. - -case class extends { $\ldots$ } - -trait List { } -class Nil -class Cons - -\comment{changes: - Type ::= SimpleType {with SimpleType} [with Refinement] - | class SimpleType - SimpleType ::= SimpleType [TypeArgs] - | `(' [Types] `)' - | - | this -} \end{document} - -\comment{changes: - - Type ::= SimpleType {with SimpleType} [with Refinement] - | class SimpleType - SimpleType ::= TypeDesignator [TypeArgs] - | `(' Type `,' Types `)' - | `(' [Types] `)' Type - | this - - PureDef ::= module ModuleDef {`,' ModuleDef} - ::= def FunDef {`,' FunDef} - | type TypeDef {`,' TypeDef} - | [case] class ClassDef {`,' ClassDef} - | case CaseDef {`,' CaseDef} - CaseDef ::= Ids ClassTemplate - - Modifier ::= final - | private - | protected - | override [QualId] - | qualified - | abstract - -\section{Class Aliases} -\label{sec:class-alias} - -\syntax\begin{lstlisting} - ClassDef ::= ClassAlias - InterfaceDef ::= ClassAlias - ClassAlias ::= id [TypeParamClause] `=' SimpleType -\end{lstlisting} - -Classes may also be defined to be aliases for other classes. A class -alias is of the form $\CLASS;c[$\tps\,$] = d[$\targs\,$]$ where $d[$\targs\,$]$ is a -class type. Both $\tps$ and $\targs$ may be empty. -This introduces the type $c[$\tps\,$]$ as an alias for type -$d[$\targs\,$]$, in the same way the following type alias definition would: -\begin{lstlisting} -type c[$\tps\,$] = d[$\targs\,$] -\end{lstlisting} -The class alias definition is legal if the type alias definition would be legal. - -Assuming $d$ defines a class with type parameters $$\tps$'$ and -parameters $(ps_1) \ldots (ps_n)$, the newly defined type is also -introduced as a class with a constructor which takes type parameters -$[$\tps\,$]$, and which takes value parameters -$([$\targs$/$\tps$']ps_1)\ldots([$\targs$/$\tps$']ps_n)$. 
- -The modifiers \code{private} and -\code{protected} apply to a class alias independently of the class it represents. -The class $c$ is regarded as final if $d$ is final, or if a -\code{final} modifier is given for the alias definition. -$c$ is regarded as a case class iff $d$ is one. In this -case, -\begin{itemize} -\item the alias definition may also be prefixed with \code{case}, and -\item the case constructor is also aliased, as if it was -defined such: -\begin{lstlisting} -def c[$\tps\,$]($ps_1$)\ldots($ps_n$):D = d[$\targs\,$]$([$\targs$/$\tps$']$ps_1$)\ldots([$\targs$/$\tps$']$ps_n$)$ . -\end{lstlisting} -The new function $c$ is again classified as a case constructor, so -it may appear in constructor patterns (\sref{sec:patterns}). -\end{itemize} -Aliases for interfaces follow the same rules as class aliases, but -start with \code{interface} instead of \code{class}. -} - -type T extends { $\ldots$ } - -class C extends { $\ldots$ } - -new C { $\ldots$ } - -type C -class C < { $\ldots$ } - -A & B & C & -\ifqualified{ -Parameter clauses (\sref{sec:funsigs}), -definitions that are local to a block (\sref{sec:blocks}), and import -clauses always introduce {\em simple names} $x$, which consist of a -single identifier. On the other hand, definitions and declarations -that form part of a module (\sref{sec:modules}) or a class -(\sref{sec:classes}) conceptually always introduce {\em qualified -names}\notyet{Qualified names are} -$Q\qex x$ where a simple name $x$ comes with a qualified -identifier $Q$. $Q$ is either the fully qualified name of a module or -class which is labelled -\code{qualified}, or it is the empty name $\epsilon$. - -The {\em fully qualified name} of a module or class $M[$\targs\,$]$ with -simple name $M$ and type arguments $[$\targs\,$]$ is -\begin{itemize} -\item $Q.M$, if the definition of $M$ appears in the template defining -a module or class with fully qualified name $Q$. 
-\item -$M$ if the definition of $M$ appears on the top-level or as a definition -in a block. -\end{itemize} -} - -\ifqualified{ -It is possible that a definition in some class or module $M$ -introduces several qualified names $Q_1\qex x \commadots Q_n\qex x$ in a name -space that have the same simple name suffix but different qualifiers -$Q_1 \commadots Q_n$. This happens for instance if a module \code{M} -implements two qualified classes \code{C}, \code{D} that each define a -function \code{f}: -\begin{lstlisting} -qualified abstract class B { def f: Unit = ""} -qualified abstract class C extends B { def f: Unit } -qualified abstract class D extends B { def f: Unit } - -module M extends C with D { - override C def f = println("C::f") - override D def f = println("D::f") - - // f // error: ambiguous - (this:D).f // prints ``D::f'' -} - -def main() = (M:C).f // prints ``C::f'' -\end{lstlisting} -Members of modules or classes are accessed using simple names, -not qualified names. - -The {\em qualified expansion} of a simple name $x$ in some type $T$ is -determined as follows: Let $Q_1\qex x \commadots Q_n\qex x$ be all the -qualified names of members of $T$ that have a simple name suffix $x$. -If one of the qualifiers $Q_i$ is the empty name $\epsilon$, then the -qualified expansion of $x$ in $T$ is $\epsilon\qex x$. Otherwise, let -$C_1 -\commadots C_n$ be the base classes (\sref{sec:base-classes}) -of $T$ that have fully qualified -names $Q_1 -\commadots Q_n$, respectively. If there exists a least class $C_j$ -among the $C_i$ in the subclass ordering, then the qualified expansion -of $x$ in $T$ is $Q_j\qex x$. Otherwise the qualified expansion does not -exist. - -Conversely, if $Q\qex x$ is the qualified expansion of some simple -name $x$ in $M$, we say that the entity named $Q\qex x$ in $M$ is {\em -identified in $M$ by the simple name} $x$. We leave out the -qualification ``in $M$'' if it is clear from the context. 
-In the example above, the qualified expansion of \code{f} in \code{C} -is \code{C::f}, because \code{C} is a subclass of \code{B}. On the -other hand, the qualified expansion of \code{f} in \code{M} does not -exist, since among the two choices \code{C::f} and \code{D::f} neither -class is a subclass of the other. - -A member access $e.x$ of some type term $e$ of type $T$ references the -member identified in $T$ by the simple name $x$ (i.e.\ the member -which is named by the qualified expansion of $x$ in $T\,$). - -In the example above, the simple name \code{f} in \code{M} would be -ambiguous since the qualified expansion of \code{f} in \code{M} does -not exist. To reference one of the two functions with simple name -\code{f}, one can use an explicit typing. For instance, the name -\code{(this:D).f} references the implementation of \code{D::f} in -\code{M}. -} - -\comment{ -\example The following example illustrates the difference between -virtual and non-virtual members with respect to overriding. - -\begin{lstlisting} -class C { - virtual def f = "f in C" - def g = "g in C" - def both1 = this.f ++ ", " ++ this.g - def both2 = f ++ ", " ++ g -} - -class D extends C { - override def f = "f in D" - override def g = "redefined g in D" - new def g = "new g in D" -} - -val d = D -println(d.f) // prints ``f in D'' -println(d.g) // prints ``new g in D'' -println(d.both1) // prints ``f in D, redefined g in D'' -println(d.both2) // prints ``f in D, g in C'' - -val c: C = d -println(c.f) // prints ``f in D'' -println(c.g) // prints ``redefined g in D'' -println(c.both1) // prints ``f in D, redefined g in D'' -println(c.both2) // prints ``f in D, g in C'' -\end{lstlisting} -} - -\comment{ -\section{The Self Type} -\label{sec:self-type} - -\syntax\begin{lstlisting} -SimpleType ::= $\This$ -\end{lstlisting} - -The self type \code{this} may be used in the statement part of a -template, where it refers to the type of the object being defined by -the template. 
It is the type of the self reference \code{this}. - -For every leaf class (\sref{sec:modifiers}) $C$, \code{this} is -treated as an alias for the class itself, as if it was declared such: -\begin{lstlisting} -final class C $\ldots$ { - type this = C - $\ldots$ -} -\end{lstlisting} -For non-leaf classes $C$, \code{this} is treated as an abstract type -bounded by the class itself, as if it was declared such: -\begin{lstlisting} -abstract class C $\ldots$ { - type this extends C - $\ldots$ -} -\end{lstlisting} - -Analogously, for every compound type \lstinline@$T_1$ with $\ldots$ with $T_n$@, -\code{this} is treated as an abstract type conforming to the whole compound -type, as if it was bound in the refinement -\begin{lstlisting} -type this extends $T_1$ with $\ldots$ with $T_n$ . -\end{lstlisting} -Finally, for every declaration of a parameter or abstract type -\mbox{$a \extends T\,$}, \code{this} is treated as an abstract type -conforming to $a$, as if the bound type $T$ was augmented to -\lstinline@$T$ { abstract type this extends $a$ }@~. -On the other hand, if the parameter or abstract type is declared -\code{final}, as in $\FINAL;a \extends T$, then \code{this} is treated as an alias -for $a$, as if the bound type $T$ was augmented to -\lstinline@$T$ { type this = $a$ }@~. - -\example -Consider the following classes for one- and two-dimensional -points with a \code{distance} method that computes the distance -between two points of the same type. 
-\begin{lstlisting} -class Point1D(x: Float) { - def xCoord = x - def distance (that: this) = abs(this.xCoord - that.xCoord) - def self: this = this -} -final class FinalPoint1D(x: Float) extends Point1D(x) - -class Point2D(x: Float, y: Float) extends Point1D(x) { - def yCoord = y - override def distance(that: this) = - sqrt (square(this.xCoord - that.xCoord) + square(this.yCoord - that.yCoord)) -} -\end{lstlisting} -Assume the following definitions: -\begin{lstlisting} -val p1f: FinalPoint1D = FinalPoint1D(0.0) -val p1a: Point1D = p1f -val p1b: Point1D = Point2D(3.0, 4.0) -\end{lstlisting} -Of the following expressions, three are well-formed, the other three -are ill-formed. -\begin{lstlisting} -p1f distance p1f // OK, yields 0.0 -p1f distance p1b // OK, yields 3.0 -p1a distance p1a // OK, yields 0.0 -p1a distance p1f // ERROR, required: p1a.this, found: FinalPoint1D -p1a distance p1b // ERROR, required: p1a.this, found: p1b.this -p1b distance p1a // ERROR, required: p1b.this, found: p1a.this -\end{lstlisting} -The last of these expressions would cause an illegal access to a -non-existing member \code{yCoord} of an object of type \code{Point1D}, -if it were permitted to execute in spite of being not well-typed. -} - -\iflet{ -\section{Let Definitions} -\label{sec:letdef} - -\syntax\begin{lstlisting} - PureDef ::= $\LET$ ValDef {`,' ValDef} - ValDef ::= id [`:' Type] `=' Expr -\end{lstlisting} - -A let definition $\LET;x: T = e$ defines $x$ as a name of the value -that results from the delayed evaluation of $e$. The type $T$ must be -a concrete value type (\sref{sec:types}) and the type of the -expression $e$ must conform to $T$. The effect of the let definition -is to bind the left-hand side $x$ to the result of evaluating $e$ -converted to type $T$. However, the expression $e$ is not evaluated -at the point of the let definition, but is instead evaluated the first -time $x$ is dereferenced during execution of the program (which might -be never at all). 
An attempt to dereference $x$ again in the course of -evaluation of $e$ leads to a run-time error. Other threads trying to -dereference $x$ while $e$ is being evaluated block until evaluation is -complete. - -The type $T$ may be omitted if it can be determined using local type -inference (\sref{sec:local-type-inf}). -} - -\section{Packagings} - -\syntax\begin{lstlisting} - Packaging ::= package QualId `{' {TopStat `;'} TopStat `}' -\end{lstlisting} - -A package is a special object which defines a set of member classes, -objects and packages. Unlike other objects, packages are not defined -by a definition. Instead, the set of members is determined by -packagings. - -A packaging \code{package p { ds }} injects all definitions in -\code{ds} as members into the package whose qualified name is -\code{p}. If a definition in \code{ds} is labelled \code{private}, it -is visible only to other members in the package. - -Selections \code{p.m} from \code{p} as well as imports from \code{p} -work as for objects. However, unlike other objects, packages may not -be used as values. It is illegal to have a package with the same fully -qualified name as an object or a class. - -Top-level definitions outside a packaging are assumed to be injected -into a special empty package. That package cannot be named and -therefore cannot be imported. However, members of the empty package -are visible to each other without qualification. - -\example The following example creates a hello world program as -function \code{main} of module \code{test.HelloWorld}. -\begin{lstlisting} -package test; - -object HelloWorld { - def main(args: Array[String]) = System.out.println("hello world") -} -\end{lstlisting} - -\ifpackaging{ -Packagings augment top-level modules and classes. A simple packaging -$$\PACKAGE;id;\WITH;mi_1;\ldots;\WITH;mi_n;\WITH;($\stats\,$)$$ augments the -template of the top-level module or class named $id$ with new mixin -classes and with new member definitions. 
- -The static effect of such a packaging can be expressed as a -source-to-source transformation which adds $mi_1 \commadots mi_n$ to -the mixin classes of $id$, and which adds the definitions in $\stats$ -to the statement part of $id$'s template. Each type $mi_j$ must refer -to an interface type and $\stats$ must consist only of pure and local -definitions. The augmented template and any class that extends it -must be well-formed. The additional definitions may not overwrite -definitions of the augmented template, and they may not access private -members of it. - -Several packagings can be applied to the same top-level definition, -and those packagings may reside in different compilation units. - -A qualified packaging $\PACKAGE;Q.id;\WITH;t$ is equivalent to the -nested packagings -\begin{lstlisting} -package $Q$ { - package $id$ with $t$ -} -\end{lstlisting} - -A packaging with type parameters $\PACKAGE;c[$\tps\,$];\WITH;$\ldots$$ applies to -a parameterized class $c$. The number of type parameters must equal -the number of type parameters of $c$, and every bound in $\tps$ must -conform to the corresponding bound in the original definition of $c$. - -The augmented class has the type parameters given in its original -definition. If a parameter $a$ of an augmented class has a bound $T$ -which is a strict subtype of the corresponding bound in the original -class, $a \conforms T$ is taken as an {\em application condition} for -the packaging. That is, every time a member defined in the packaging -is accessed or a conformance between class $c$ and a mixin base class -of the packaging needs to be established, (an instantiation of) the -application condition is checked. An unvalidated application -condition constitutes a type error. \todo{Need to specify more -precisely when application conditions are checked} - -\example The following example will create a hello world program as -function \code{main} of module \code{test.HelloWorld}. 
-\begin{lstlisting} -package test { - module HelloWorld { - def main(args: Array[String]) = out.println("hello world") - } -} -\end{lstlisting} -This assumes there exists a top-level definition that defines a -\code{test} module, e.g.: -\begin{lstlisting} -module test -\end{lstlisting} - -\example The following packaging adds class \code{Comparable} -(\ref{ex:comparable}) as a mixin to class -\code{scala.List}, provided the list elements are also comparable. -Every instance of \lstinline@List[$T\,$]@ will then implement -\lstinline@Comparable[List[$T\,$]]@ in the way it is defined in the -packaging. Each use of the added functionality for an instance type -\lstinline@List[$T\,$]@ requires that the application condition -\lstinline@T $<:$ Comparable@ is satisfied. -\begin{lstlisting} -package scala.List[a extends Comparable[a]] with Comparable[List[a]] { - def < (that: List[a]) = (this, that) match { - case (_, Nil) => False - case (Nil, _) => True - case (x :: xs, y :: ys) => (x < y) || (x == y && xs < ys) - } -} -\end{lstlisting} -} - - - -} -\end{lstlisting} -} - - |