@import BookData._

@p
    Scala.js is a relatively large project, and is the result of both an enormous amount of hard work and a number of decisions that shape what it's like to program in Scala.js today. Many of these decisions result in marked differences from the behavior of the same code running on the JVM. This chapter explores the reasoning and rationale behind these decisions.


@sect{Why No Reflection?}
  @p
    Scala.js prohibits reflection as it makes dead-code elimination difficult, and the compiler relies heavily on dead-code elimination to generate reasonably-sized executables. The chapter on @sect.ref("The Compilation Pipeline") goes into more detail about why, but a rough estimate of the effect of various optimizations on a small application is:

  @ul
    @li
      @b{Full Output} - ~20mb
    @li
      @b{Naive Dead-Code-Elimination} - ~800kb
    @li
      @b{Inlining Dead-Code-Elimination} - ~600kb
    @li
      @b{Minified by Google Closure Compiler} - ~200kb

  @p
    The default output size of 20mb makes the executables difficult to work with. Even though browsers can deal with 20mb Javascript blobs, it takes the browser several seconds to even load it, and up to a minute after that for the JIT to optimize the whole thing.

  @sect{Dead Code Elimination}
    @p
      To illustrate why reflection makes things difficult, consider a tiny application:

    @hl.scala
      @@JSExport
      object App extends js.JSApp{
        @@JSExport
        def main() = {
          println(foo())
        }
        def foo() = 10
        def bar = "i am a cow"
      }
      object Dead{
        def complexFunction() = ...
      }

    @p
      When the @sect.ref("Fast Optimization", "Scala.js optimizer") looks at this application, it is able to deduce certain things immediately:

    @ul
      @li
        @hl.scala{App} and @hl.scala{App.main} are exported via @hl.scala{@@JSExport}, and thus can't be considered dead code.
      @li
        @hl.scala{App.foo} is called from @hl.scala{App.main}, and so has to be kept around
      @li
        @hl.scala{App.bar} is never called from @hl.scala{App.main} or @hl.scala{App.foo}, and so can be eliminated
      @li
        @hl.scala{Dead}, including @hl.scala{Dead.complexFunction}, is never called from any live code, and can be eliminated.

    @p
      The actual process is a bit more involved than this, but this is a first approximation of how the dead-code-elimination works: you start with a small set of live code (e.g. @hl.scala{@@JSExport}ed things), search out to find everything that is recursively reachable from that set, and eliminate all the rest. This means that the Scala.js compiler can eliminate, e.g., parts of the Scala standard library that you are not using. The standard library is not small, and makes up the bulk of the 20mb uncompressed blob.
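
    @p
      As a first sketch, you can think of this as a reachability computation over the call graph. The following is a minimal illustration, not the actual Scala.js implementation; the @hl.scala{CallGraph} type and the method names in it are made up for this example:

    @hl.scala
      // Hypothetical call-graph: each method maps to the methods it calls
      type CallGraph = Map[String, Set[String]]

      def reachable(entryPoints: Set[String], graph: CallGraph): Set[String] = {
        // Worklist-style traversal: keep growing the live set until
        // no new methods are discovered
        def expand(live: Set[String]): Set[String] = {
          val next = live ++ live.flatMap(m => graph.getOrElse(m, Set.empty[String]))
          if (next == live) live else expand(next)
        }
        expand(entryPoints)
      }

      val graph: CallGraph = Map(
        "App.main" -> Set("App.foo"),
        "App.foo"  -> Set.empty,
        "App.bar"  -> Set.empty,
        "Dead.complexFunction" -> Set.empty
      )

      reachable(Set("App.main"), graph)
      // Set(App.main, App.foo): App.bar and Dead.complexFunction are dead code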

  @sect{Whither Reflection?}
    @p
      To see why reflection makes this difficult, consider a slightly modified program which includes some reflective calls in @hl.scala{App.main}:

    @hl.scala
      @@JSExport
      object App extends js.JSApp{
        @@JSExport
        def main() = {
          Class.forName(userInput()).getMethod(userInput()).invoke(null)
        }
        def foo() = 10
        def bar = "i am a cow"
      }
      object Dead{
        def complexFunction() = ...
      }

    @p
      Here, we're assuming @hl.scala{userInput()} is some method which returns a @hl.scala{String} that was input by the user or otherwise somehow decided at runtime.
    @p
      We can start the same process: @hl.scala{App.main} is live since we @hl.scala{@@JSExport}ed it, but what objects or methods are reachable from @hl.scala{App.main}? The answer is: it depends on the values of @hl.scala{userInput()}, which we don't know. And hence we don't know which classes or methods are reachable! Depending on what @hl.scala{userInput()} returns, any or all methods and classes could be used by @hl.scala{App.main()}.
    @p
      This leaves us a few options:

    @ul
      @li
        Keep every method or class around at runtime. This severely hampers the compiler's ability to optimize, and results in massive 20mb executables.
      @li
        Ignore reflection, and go ahead and eliminate/optimize things assuming reflection did not exist.
      @li
        Allow the user to annotate methods/classes that should be kept, and eliminate the rest.

    @p
      All three are possible options: Scala.js started off with #1. #3 is the approach used by @lnk("Proguard", "http://proguard.sourceforge.net/manual/examples.html#annotated"), which lets you annotate things with e.g. @hl.scala{@@KeepApplication} to preserve them for reflection and prevent Proguard from eliminating them as dead code.

    @p
      In the end, Scala.js chose #2. This is helped by the fact that, overall, Scala code tends not to use reflection as heavily as Java, or as the dynamic languages that lean on it heavily. Scala uses techniques such as @lnk("lambdas", "http://docs.scala-lang.org/tutorials/tour/anonymous-function-syntax.html") or @lnk("implicits", "http://docs.scala-lang.org/tutorials/tour/implicit-parameters.html") to satisfy many of the use cases which Java has traditionally used reflection for, while remaining friendly to the optimizer.
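
    @p
      For example, where a Java framework might reflectively look up a "converter" for a value's class at runtime, idiomatic Scala often passes that behavior in statically via an implicit, which the optimizer can see and trace. A minimal sketch of the idea:

    @hl.scala
      // A typeclass replaces the reflective "find the right converter" lookup:
      // the compiler resolves the implicit at compile-time, so the optimizer
      // knows exactly which code is reachable
      trait Show[T]{ def show(t: T): String }

      implicit val showInt: Show[Int] = new Show[Int]{
        def show(t: Int) = t.toString
      }
      implicit val showBool: Show[Boolean] = new Show[Boolean]{
        def show(t: Boolean) = if (t) "yes" else "no"
      }

      def display[T](t: T)(implicit s: Show[T]): String = s.show(t)

      display(123)  // "123"
      display(true) // "yes"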

    @p
      There is a range of use-cases for reflection where you want to inspect an object's structure or methods, and where lambdas or implicits don't help. People use reflection to @lnk("serialize objects", "http://jackson.codehaus.org/DataBindingDeepDive"), or for @lnk("routing messages to methods", "https://access.redhat.com/documentation/en-US/Fuse_ESB_Enterprise/7.1/html/Implementing_Enterprise_Integration_Patterns/files/BasicPrinciples-BeanIntegration.html"). However, both these cases can be satisfied by...

  @sect{Macros}

    @p
      The Scala programming language has had support for @lnk("Macros", "http://docs.scala-lang.org/overviews/macros/overview.html") since the 2.10.x series. Although experimental, macros are heavily used in many projects such as Play, Slick and Akka, and allow a developer to perform compile-time computations and generate code wherever the macros are used.

    @p
      People typically think of macros as AST-transformers: you pass in an AST and get a modified AST out. However, in Scala, these ASTs are strongly-typed, and the macro is able to inspect the types involved in generating the output AST. This leads to a lot of @lnk("interesting techniques", "http://docs.scala-lang.org/overviews/macros/implicits.html") around macros where you synthesize ASTs based on the type (explicit or inferred) of the macro callsite, something that is impossible in dynamic languages.

    @p
      Practically, this means that you can use macros to do things such as inspecting the methods, fields and other type-level properties of a typed value. This allows us to do things like @lnk("serialize objects with no boilerplate", "https://github.com/lihaoyi/upickle"):

    @hl.scala
      import upickle._

      case class Thing(a: Int, b: String)
      write(Thing(1, "gg"))
      // res23: String = {"a": 1, "b": "gg"}

    @p
      Or to @lnk("route messages to the appropriate methods", "https://github.com/lihaoyi/autowire") without boilerplate, and @i{without} using reflection!

    @p
      The fact that you can satisfy these use cases with macros is non-obvious: in dynamic languages, macros only get an AST, which is basically opaque when you're only passing a single value to it. With Scala, you get the value @i{together with its type}, which lets you inspect the type and generate the proper serialization/routing code, something that is impossible to do with macros in a dynamic language.
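
    @p
      To make that concrete, here is a small, simplified sketch of a macro (written against Scala 2.11-style macros; the @hl.scala{FieldNames} object is hypothetical, purely for illustration) which reads the field names of a case class at compile-time. Note how the implementation works with the @i{type} of @hl.scala{T}, not just an opaque syntax tree:

    @hl.scala
      import scala.language.experimental.macros
      import scala.reflect.macros.blackbox

      object FieldNames{
        // Expands at compile-time into a List of the case-class field names
        def apply[T]: List[String] = macro impl[T]

        def impl[T: c.WeakTypeTag](c: blackbox.Context): c.Expr[List[String]] = {
          import c.universe._
          // Inspect the *type* of T to find its case-class accessors
          val names = weakTypeOf[T].decls.collect{
            case m: MethodSymbol if m.isCaseAccessor => m.name.toString
          }.toList
          c.Expr[List[String]](q"_root_.scala.List(..$names)")
        }
      }

      // In a separate compilation unit:
      // case class Thing(a: Int, b: String)
      // FieldNames[Thing] // List("a", "b")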

    @p
      Using macros here also plays well with the Scala.js optimizer: the macros are fully expanded before the optimizer is run, so by the time the optimizer sees the code, there is no more magic left: it is then free to do dead-code-elimination/inlining/other-optimizations without worrying about reflection causing the code to do weird things at runtime. Thus, we've managed to substitute most of the main use-cases of reflection, and so can do without it.

@sect{Why does error behavior differ?}
  @p
    Scala.js deviates from the semantics of Scala-JVM in several ways. Many of these deviations revolve around the edge-conditions of a program: what happens when something goes wrong? An array index is out of bounds? An integer is divided by zero? These differences cause some amount of annoyance when debugging, since when you mess up an array index, you expect an exception, not silently-invalid-data!

  @p
    In most of these cases, it was a trade-off between performance and correctness. These are situations where the default semantics of Scala deviate from those of Javascript, and Scala.js would have to perform extra work to emulate the desired behavior. For example, compare the division behavior of the JVM and Javascript.
  @sect{Divide-by-zero: a case study}
    @hl.scala
      /*JVM*/
      15 / 4              // 3
    @hl.javascript
      /*JS*/
      15 / 4              // 3.25
    @p
      On the JVM, integer division is a primitive, and dividing @hl.scala{15 / 4} gives @hl.scala{3}. However, in Javascript, it gives @hl.javascript{3.25}, since in Javascript all numbers are double-precision floating point values.

    @p
      Scala.js works around this in the general case by adding a @hl.javascript{| 0} to the translation, e.g.

    @hl.scala
      /*JVM*/
      15 / 4              // 3
    @hl.javascript
      /*JS*/
      (15 / 4) | 0        // 3

    @p
      This gives the correct result for most numbers, and is reasonably efficient (actually, it tends to be @i{more} efficient on modern VMs). However, what about dividing-by-zero?

    @hl.scala
      /*JVM*/
      15 / 0              // ArithmeticException
    @hl.javascript
      /*JS*/
      15 / 0              // Infinity
      (15 / 0) | 0        // 0

    @p
      The JVM is kind enough to throw an exception for you. However, in Javascript, the division simply results in @hl.javascript{Infinity}, which then gets truncated down to zero.
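    @p
      From the point of view of the Scala code you write, the observable difference looks like this:
    @hl.scala
      def divide(a: Int, b: Int): Int = a / b

      divide(15, 4) // 3 on the JVM, 3 in Scala.js
      divide(15, 0) // ArithmeticException on the JVM, 0 in Scala.js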
    @p
      So that's the current behavior of integers in Scala.js. One may ask: can we fix it? And the answer is, we can:
    @hl.scala
      /*JVM*/
      1 / 0               // ArithmeticException
    @hl.javascript
      /*JS*/
      function intDivide(x, y){
        // check the divisor up-front: the plain "| 0" translation alone
        // would silently turn a divide-by-zero into 0
        if (y === 0) throw new ArithmeticException("Divide by Zero")
        var z = (x / y) | 0
        return z
      }
      intDivide(1, 0)     // ArithmeticException
    @p
      This translation fixes the problem, and ensures that the @hl.scala{ArithmeticException} is thrown at the correct time. However, this approach causes some overhead: what was previously two primitive operations is now a function call, a local variable assignment, and a conditional. That is a lot more expensive than two primitive operations!

  @sect{The Performance/Correctness Tradeoff}
    @p
      In the end, a lot of the semantic differences listed here come down to the same tradeoff: we could make the code behave more-like-Scala, but at a cost of adding overhead via function calls and other checks. Furthermore, the cost is paid regardless of whether the "exceptional case" is triggered or not: in the example above, every division in the program pays the cost!
    @p
      The decision to not support these exceptional cases comes down to a value judgement: how often do people actually depend on an exception being thrown as part of their program semantics, e.g. by catching it and performing actions? And how often are they just a way of indicating bugs? It turns out that very few @hl.scala{ArithmeticException}s, @hl.scala{ArrayIndexOutOfBoundsException}s, or similar are actually a necessary part of the program! They exist during debugging, but after that, these code paths are never relied upon "in production".
    @p
      Thus Scala.js goes for a compromise: in the Fast Optimization mode, we run the code with all these checks in place (this is work in progress; currently only @code{asInstanceOf}s are thus checked), so as to catch cases where these errors occur close-to-the-source and make it easy for you to debug them. In Full Optimization mode, on the other hand, we remove these checks, assuming you've already run through these cases and found any bugs during development.
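    @p
      For example, the one check that is currently performed in Fast Optimization mode is the @hl.scala{asInstanceOf} cast:
    @hl.scala
      val x: Any = "hello"

      // Fast Optimization: throws a ClassCastException, just like the JVM
      // Full Optimization: the check is removed, and the invalid value simply
      // flows onwards, typically surfacing later as a confusing, unrelated error
      val n = x.asInstanceOf[Int]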
    @p
      This is a common pattern in situations where there's a tradeoff between debuggability and speed. In Scala.js' case, it allows us to get good debuggability in development, as well as good performance in production. There's some loss of debuggability in production, sacrificed in exchange for greater performance.

@sect{Small Executables}
  Why do we care so much about how big our executables are in Scala.js? Why don't we care about how big they are on Scala-JVM? This comes down to three main reasons:

  @ul
    @li
      When cross-compiling Scala to Javascript, the end-result tends to be much more verbose than when cross-compiled to Java Bytecode.
    @li
      Scala.js typically is run in web browsers, which typically do not work well with large executables compared to e.g. the JVM
    @li
      Scala.js often is delivered to many users over the network, and long download times force users to wait, degrading the user experience

  @p
    These factors combined mean that Scala.js has to put in extra effort to optimize the code to reduce its size at compile-time.

  @sect{Raw Verbosity}
    @p
      Scala.js compiles to Javascript source code, while Scala-JVM compiles to Java bytecode. Java bytecode is a binary format and thus somewhat optimized for size, while Javascript is textual and is designed to be easy to read and write by hand.
    @p
      What does this mean, concretely? It means that a symbol marking something, e.g. the start of a function, is often a single byte in Java bytecode. It may not even have a delimiter at all, the meaning of the binary data instead being inferred from its position in the file! In Javascript, on the other hand, declaring a function takes a long-and-verbose @hl.javascript{function} keyword, which together with peripheral punctuation (@code{.}, @code{ = }, etc.) often adds up to tens of bytes to express a single idea.
    @p
      The upshot is that expressing the same logic in Javascript usually takes more "raw code" than expressing it in Java bytecode. Even though Java bytecode is relatively verbose for a binary format, it is still significantly more concise than Javascript, and it shows: the Scala standard library weighs in at a cool 6mb on Scala-JVM, while it weighs 20mb on Scala.js.
    @p
      All other things being equal, this would mean that Scala.js has to work harder than Scala-JVM to keep its code-size down. Alas, all other things are not equal.

  @sect{Browsers Performance}
    @p
      Without any optimization, a naive compilation to Scala.js results in an executable (including the standard library) weighing around 20mb. On the surface, this isn't a problem: runtimes like the JVM have no issue with loading 20mb of Java bytecode to execute; many large desktop applications weigh in the 100s of megabytes while still loading and executing fine.
    @p
      However, the web browser isn't a native execution environment; loading 20mb of Javascript is enough to heavily tax even the most modern web browsers such as Chrome and Firefox. Even though most of the code comprises class and method definitions whose contents are never executed, loading such a payload into e.g. Chrome makes it freeze for 5-10 seconds initially. Even after that, once the code has all been parsed and isn't being actively executed, having all this Javascript around makes the browser sluggish for up to a minute before the JIT compiler can speed things up.
    @p
      Overall, this means that you probably do not want to work with un-optimized Scala.js executables. Even for development, the slow load times and initial sluggishness make testing the results of your hard work in the browser a frustrating experience. But that's not all...

  @sect{Deployment Size}
    @p
      Scala.js applications often run in the browser. Not just any browser, but the browsers of your users, who have come to your website or web-app to try and accomplish some task. This is in stark contrast to Scala-JVM applications, which most often run on servers: servers that you own and control, and can deploy code to at your leisure.

    @p
      When running code on your own servers in some data center, you often do not care how big the compiled code is: the Scala standard library is several (6-7) megabytes, which, added to your own code and any third-party libraries you're using, may add up to tens of megabytes, maybe a hundred or two if it's a relatively large application. Even that pales in comparison to the size of the JVM, which weighs in at 100s of megabytes.
    @p
      Even so, you are deploying your code on a machine (virtual or real) which has several gigabytes of memory and 100s of gigabytes of disk space. Even if the size of the code makes deployment slower, you only deploy fresh code a handful of times a day at most, and the size of your executable typically does not worry you.
    @p
      Scala.js is different: it runs in the browsers of your users. Before it can run in their browser, it first has to be downloaded, probably over a connection that is much slower than the one used to deploy your code to your servers or data-center. It probably is downloaded thousands of times per day, and every user who downloads it must pay the cost of waiting for it to finish downloading before they can take any actions on your website.

    @p
      A typical website loads ~100kb-1mb of Javascript, and 1mb is on the heavy side. Most Javascript libraries weigh in on the order of 50-100kb. For Scala.js to be useful in the browser, it has to be able to compare favorably with these numbers.

  @hr

  @p
    Thus, while on Scala-JVM you typically have executables that (including dependencies) end up weighing 10s to 100s of megabytes, Scala.js has a much tighter budget. A hello world Scala.js application weighs in at around 100kb, and as you write more code and use more libraries (and parts of the standard library) this number rises to the 100s of kb. This isn't tiny, especially compared to the many small Javascript libraries out there, but it definitely is much smaller than what you'd be used to on the JVM.