summaryrefslogtreecommitdiff
path: root/book/src/main/scalatex/book/indepth/DesignSpace.scalatex
blob: da17a4f9e95b11eb2a2a878912f27e890d6f5cd9 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
@p
    Scala.js is a relatively large project, and is the result of both an enormous amount of hard work as well as a number of decisions that craft what it's like to program in Scala.js today. Many of these decisions result in marked differences from the behavior of the same code running on the JVM. This chapter explores the reasoning and rationale behind these decisions.


@sect("Why No Reflection?")
  @p
    Scala.js prohibits reflection as it makes dead-code elimination difficult, and the compiler relies heavily on dead-code elimination to generate reasonably-sized executables. The chapter on @sect.ref("The Compilation Pipeline") goes into more detail of why, but a rough estimate of the effect of various optimizations on a small application is:

  @ul
    @li
      @b{Full Output} - ~20mb
    @li
      @b{Naive Dead-Code-Elimnation} - ~800kb
    @li
      @b{Inlining Dead-Code-Elimnation} - ~600kb
    @li
      @b{Minified by Google Closure Compiler} - ~200kb

  @p
    The default output size of 20mb makes the executables difficult to work with. Even though browsers can deal with 20mb Javascript blobs, it takes the browser several seconds to even load it, and up to a minute after that for the JIT to optimize the whole thing.

  @sect{Dead Code Elimination}
    @p
      To illustrate why reflection makes things difficult, consider a tiny application:

    @hl.scala
      @@JSExport
      object App extends js.JSApp{
        @@JSExport
        def main() = {
          println(foo())
        }
        def foo() = 10
        def bar = "i am a cow"
      }
      object Dead{
        def complexFunction() = ...
      }

    @p
      When the @sect.ref("Fast Optimization", "Scala.js optimizer"),  looks at this application, it is able to deduce certain things immediately:

    @ul
      @li
        @hl.scala{App} and @hl.scala{App.main} are exported via @hl.scala{@@JSExport}, and thus can't be considered dead code.
      @li
        @hl.scala{App.foo} is called from @hl.scala{App.main}, and so has to be kept around
      @li
        @hl.scala{App.bar} is never called from @hl.scala{App.main} or @hl.scala{App.foo}, and so can be eliminated
      @li
        @hl.scala{Dead}, including @hl.scala{Dead.complexFunction}, are not called from any live code, and can be eliminated.

    @p
      The actual process is a bit more involved than this, but this is a first-approximation of how the dead-code-elimination works: you start with a small set of live code (e.g. @hl.scala{@@JSExport}ed things), search out to find the things which are recursively reachable from that set, and eliminate all the rest. This means that the Scala.js compiler can eliminate, e.g., parts of the Scala standard library that you are not using. The standard library is not small, and makes up the bulk of the 20mb of the uncompressed blob.

  @sect{Whither Reflection?}
    @p
      To imagine why reflection makes this difficult, imagine a slightly modified program which includes some reflective calls in @hl.scala{App.main}

    @hl.scala
      @@JSExport
      object App extends js.JSApp{
        @@JSExport
        def main() = {
          Class.forName(userInput()).getMethod(userInput()).invoke()
        }
        def foo() = 10
        def bar = "i am a cow"
      }
      object Dead{
        def complexFunction() = ...
      }

    @p
      Here, we're assuming @hl.scala{userInput()} is some method which returns a @hl.scala{String} that was input by the user or otherwise somehow decided at runtime.
    @p
      We can start the same process: @hl.scala{App.main} is live since we @hl.scala{@@JSExport}ed it, but what objects or methods are reachable from @hl.scala{App.main}? The answer is: it depends on the values of @hl.scala{userInput()}, which we don't know. And hence we don't know which classes or methods are reachable! Depending on what @hl.scala{userInput()} returns, any or all methods and classes could be used by @hl.scala{App.main()}.
    @p
      This leaves us a few options:

    @ul
      @li
        Keep every method or class around at runtime. This severely hampers the compiler's ability to optimize, and results in massive 20mb executables.
      @li
        Ignore reflection, and go ahead and eliminate/optimize things assuming reflection did not exist.
      @li
        Allow the user to annotate methods/classes that should be kept, and eliminate the rest.

    @p
      All three are possible options: Scala.js started off with #1. #3 is the approach used by @lnk("Proguard", "http://proguard.sourceforge.net/manual/examples.html#annotated"), which lets you annotate things e.g. @hl.scala{@@KeepApplication} to preserve things for reflection and preventing Proguard from eliminating them as dead code.

    @p
      In the end, Scala.js chose #2. This is helped by the fact that overall, Scala code tends not to use reflection as heavily as Java, or dynamic languages which use it heavily. Scala uses techniques such as @lnk("lambdas", "http://docs.scala-lang.org/tutorials/tour/anonymous-function-syntax.html") or @lnk("implicits", "http://docs.scala-lang.org/tutorials/tour/implicit-parameters.html") to satisfy many use cases which Java has traditionally used reflection for, while friendly to the optimizer.

    @p
      There are a range of use-cases for reflection where you want to inspect an object's structure or methods, where lambdas or implicits don't help. People use reflection to @lnk("serialize objects", "http://jackson.codehaus.org/DataBindingDeepDive"), or for @lnk("routing messages to methods", "https://access.redhat.com/documentation/en-US/Fuse_ESB_Enterprise/7.1/html/Implementing_Enterprise_Integration_Patterns/files/BasicPrinciples-BeanIntegration.html"). However, both these cases can be satisfied by...

  @sect{Macros}

    @p
      The Scala programming language, since the 2.10.x series, has support for @lnk("Macros", "http://docs.scala-lang.org/overviews/macros/overview.html") in the language. Although experimental, these are heavily used in many projects such as Play and Slick and Akka, and allow a developer to perform compile-time computations and generate code where-ever the macros are used.

    @p
      People typically think of macros as AST-transformers: you pass in an AST and get a modified AST out. However, in Scala, these ASTs are strongly-typed, and the macro is able to inspect the types involved in generating the output AST. This leads to a lot of @lnk("interesting techniques", "http://docs.scala-lang.org/overviews/macros/implicits.html") around macros where you synthesize ASTs based on the type (explicit or inferred) of the macro callsite, something that is impossible in dynamic languages.

    @p
      Practically, this means that you can use macros to do things such as inspecting the methods, fields and other type-level properties of a typed value. This allows us to do things like @lnk("serialize objects with no boilerplate", "https://github.com/lihaoyi/upickle"):

    @hl.scala
      import upickle._

      case class Thing(a: Int, b: String)
      write(Thing(1, "gg"))
      // res23: String = {"a": 1, "b": "gg"}

    @p
      Or to @lnk("route messages to the appropiate methods", "https://github.com/lihaoyi/autowire") without boilerplate, and @i{without} using reflection!

    @p
      The fact that you can satisfy these use cases with macros is non-obvious: in dynamic languages, macros only get an AST, which is basically opaque when you're only passing a single value to it. With Scala, you get the value @i{together with it's type}, which lets you inspect the type and generate the proper serialization/routing code that is impossible to do in a dynamic language with macros.

    @p
      Using macros here also plays well with the Scala.js optimizer: the macros are fully expanded before the optimizer is run, so by the time the optimizer sees the code, there is no more magic left: it is then free to do dead-code-elimination/inlining/other-optimizations without worrying about reflection causing the code to do weird things at runtime. Thus, we've managed to substitute most of the main use-cases of reflection, and so can do without it.

@sect{Why does error behavior differ?}
  @p
    Scala.js deviates from the semantics of Scala-JVM in several ways. Many of these ways revolve around the edge-conditions of a program: what happens when something goes wrong? An array index is out of bounds? An integer is divided-by-zero? These differences cause some amount of annoyance when debugging, since when you mess up an array index, you expect an exception, not silently-invalid-data!

  @p
    In most of these cases, it was a trade-off between performance and correctness. These are situations where the default semantics of Scala deviate from that of Javascript, and Scala.js would have to perform extra work to emulate the desired behavior. For example, compare the division behavior of the JVM and Javascript.
  @sect{Divide-by-zero: a case study}
    @hl.scala
      /*JVM*/
      15 / 4              // 3
    @hl.javascript
      /*JS*/
      15 / 4              // 3.25
    @p
      On the JVM, integer division is a primitive, and dividing @hl.scala{15 / 4} gives @hl.scala{3}. However, in Javascript, it gives @hl.javascript{3.25}, since all numbers of double-precision floating points.

    @p
      Scala.js works around this in the general case by adding a @hl.javascript{| 0} to the translation, e.g.

    @hl.scala
      /*JVM*/
      15 / 4              // 3
    @hl.javascript
      /*JS*/
      (15 / 4) | 0        // 3

    @p
      This gives the correct result for most numbers, and is reasonably efficient. However, what about dividing-by-zero?

    @hl.scala
      /*JVM*/
      15 / 0              // ArithmeticException
    @hl.javascript
      /*JS*/
      15 / 0              // Infinity
      (15 / 0) | 0        // 0

    @p
      On the JVM, the JVM is kind enough to throw an exception for you. However, in Javascript, the integer simply wraps around to @hl.javascript{Infinity}, which then gets truncated down to zero.
    @p
      So that's the current behavior of integers in Scala.js. One may ask: can we fix it? And the answer is, we can:
    @hl.scala
      /*JVM*/
      1 / 0               // ArithmeticException
    @hl.javascript
      /*JS*/
      function intDivide(x, y){
        var z = x / y
        if (z == Infinity) throw new ArithmeticException("Divide by Zero")
        else return z
      }
      intDivide(1, 0)     // ArithmeticException
    @p
      This translation fixes the problem, and enforces that the @hl.scala{ArithmeticException} is thrown at the correct time. However, this approach causes some overhead: what was previously two primitive operations is now a function call, a local variable assignment, and a conditional. That is a lot more expensive than two primitive operations!

  @sect{The Performance/Correctness Tradeoff}
    @p
      In the end, a lot of the semantic differences listed here come down to the same tradeoff: we could make the code behave more-like-Scala, but at a cost of adding overhead via function calls and other checks. Furthermore, the cost is paid regardless of whether the "exceptional case" is triggered or not: in the example above, every division in the program pays the cost!
    @p
      The decision to not support these exceptional cases comes down to a value judgement: how often do people actually depend on an exception being thrown as part of their program semantics, e.g. by catching it and performing actions? And how often are they just a way of indicating bugs? It turns out that very few @hl.scala{ArithmeticException}s, @hl.scala{ArrayIndexOutOfBoundsException}s, or similar are actually a necessary part of the program! They exist during debugging, but after that, these code paths are never relied upon "in production".
    @p
      Thus Scala.js goes for a compromise: in the Fast Optimization mode, we run the code with all these checks in place, so as to catch cases where these errors occur close-to-the-source and make it easy for you to debug them. In Full Optimization mode, on the other hand, we remove these checks, assuming you've already ran through these cases and found any bugs during development.
    @p
      This is a common pattern in situations where there's a tradeoff between debuggability and speed. In Scala.js' case, it allows us to get good debuggability in development, as well as good performance in production. There's some loss in debuggability in development, sacrificed in exchange for greater performance.


@sect("Why Jars instead of RequireJS/CommonJS")
  @p
    In JVM-land, the standard method for distributing these libraries is as @lnk("Maven Artifacts", "http://stackoverflow.com/questions/2487485/what-is-maven-artifact"). These are typically published in a public location such as @lnk("Maven Central", "http://search.maven.org/"), where others can find and download them for use. Typically downloads are done automatically by the build-tool: in Scala-JVM typically this is SBT.
  @p
    In Javascript-land, there are multiple ways of acquiring dependencies: @lnk("CommonJS", "http://en.wikipedia.org/wiki/CommonJS") and @lnk("RequireJS/AMD", "http://requirejs.org/") are two competing standards with a host of implementations. Historically, a third approach has been most common: the developer would simply download the modules himself, check it into source-control and manually add a @hl.html{<script>} tag to the HTML page that will make the functionality available through some global variable.
  @p
    In Scala.js, we side with the JVM standard of distributing libraries as maven jars. This lets us take advantage of all the existing tooling around Scala to handle these jars (SBT, Ivy, Maven Central, etc.) which is far more mature and cohesive than the story in Javascript-land. For example, the Scalatags library we used in the earlier is @lnk("published on maven central", "http://search.maven.org/#search%7Cga%7C1%7Cscalatags"), and adding one line to SBT is enough to pull it down and include it in our project.

  @p
    One interesting wrinkle in Scala.js's case is that since Scala can compile to both Scala.js and Scala-JVM, it is entirely possible to publish a library that can run on both client and server! This chapter will explore the process of building, testing, and publishing such a library.