book/src/main/scalatex/book/indepth/CompilationPipeline.scalatex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210

@import BookData._

@p
  Scala.js is implemented as a compiler plugin in the Scala compiler. Despite this, the overall process looks very different from that of a normal Scala application. This is because Scala.js optimizes for the size of the compiled executable, which is something that Scala-JVM does not usually do.

@sect{Whole Program Optimizaton}
  @p
    At a first approximation, Scala.js achieves its tiny executables by using whole-program optimization. Scala-JVM, like Java, allows for separate compilation: this means that after compilation, you can combine your compiled code with code compiled separately, which can interact with the code you already compiled in an ad-hoc basis: code from both sides can call each others methods, instantiate each others classes, etc. without any limits.

  @p
    Even things like package-private do not help you: Java packages are separate-compile-able too, and multiple compilation runs can dump things in the same package! You may think that private members and methods may be some salvation, but the Java ecosystem typically relies heavily on reflection, which depends on the fact that these private things remain exactly as-they-are.

  @p
    Overall, this makes it difficult to do any meaningful optimization: you never know whether or not you can eliminate a class, method or field. Even if it's not used anywhere you can see, it could easily be used by some other code compiled separately, or accessed through reflection.

  @p
    With Scala.js, we have decided to forgo reflection, and forgo separate compilation, in exchange for smaller executables. This is made easier by the fact that the pure-Scala ecosystem makes little use of reflection overall. Thus, at the right before shipping your Scala.js app to your users, the Scala.js optimizer gathers up all your Scala.js code, determines which things are used and which are not, and eliminates all the un-used classes/methods/variables. This allows us to achieve a much smaller code size than is possible with reflection/separate-compilation support. Furthermore, because we forgo these two things, we can perform much more aggressive inlining and other compile-time optimizations than is possible with Scala-JVM, further reducing code size and improving performance.

  @p
    It's worth noting that such optimizations exist as an option on the JVM aswell: @lnk("Proguard", "http://proguard.sourceforge.net/") is a well known library for doing similar DCE/optimization for Java/Scala applications, and is extensively used in developing mobile applications which face similar "minimize-code-size" constraints that web-apps do. However, the bulk of Scala code which runs on the server does not use these tools.

@sect{How Compilation Works}
  @p
    The Scala.js compilation pipeline is roughly split into multiple stages:

  @ul
    @li
      @b{Initial Compilation}: @code{.scala} files to @code{.class} and @code{.sjsir} files
    @li
      @b{Fast Optimization}: @code{.sjsir} files to one smallish/fast @code{.js} file, or
    @li
      @b{Full Optimization}: @code{.sjsir} files to one smaller/faster @code{.js} file

  @p
    @code{.scala} files are the source code you're familiar with. @code{.class} files are the JVM-targetted artifacts which aren't used for actually producing @code{.js} files, but are kept around for pretty much everything else: the compiler uses them for separate compilation and macros, and tools such as @lnk.misc.IntelliJ or @lnk.misc.Eclipse use these files to provide IDE support for Scala.js code. @code{.js} files are the output Javascript, which we can execute in a web browser.
  @p
    @code{.sjsir} files are worth calling out: the name stands for "ScalaJS Intermediate Representation", and these files contain compiled code half-way between Scala and Javascript: most Scala features have by this point been replaced by their Java/Javascript equivalents, but it still contains Types (which have all been inferred) that can aid in analysis. Many Scala.js specific optimizations take place on this IR.

  @p
    Each stage has a purpose, and together the stages do bring benefits to offset their cost in complexity. The original compilation pipeline was much more simple:

  @ul
    @li
      @b{Compilation}: @code{.scala} files to @code{.js} files

  @p
    But produced far larger (20mb) and slower executables. This section will explore each stage and we'll learn what these stages do, starting with a small example program:

  @hl.scala
    def main() = {
      var x = 0
      while(x < 999){
        x = x + "2".toInt
      }
      println(x)
    }

  @sect{Compilation}
    @p
      As described earlier, the Scala.js compiler is implemented as a Scala compiler plugin, and lives in the main repository in @lnk("compiler/", "https://github.com/scala-js/scala-js/tree/master/compiler"). The bulk of the plugin runs after the @code{mixin} phase in the @lnk("Scala compilation pipeline", "http://stackoverflow.com/a/4528092/871202"). By this point:

    @ul
      @li
        Types and implicits have all been inferred
      @li
        Pattern-matches have been compiled to imperative code
      @li
        @hl.scala("@tailrec") functions have been translated to while-loops, @hl.scala{lazy val}s have been replaced by @hl.scala{var}s.
      @li
        @hl.scala{trait}s have been @lnk("replaced by interfaces and classes", "http://stackoverflow.com/a/2558317/871202")

    @p
      Overall, by the time the Scala.js compiler plugin takes action, most of the high-level features of the Scala language have already been removed. Compared to a hypothetical, alternative "from scratch" implementation, this approach has several advantages:

    @ul
      @li
        It helps ensure that the semantics of these features always, 100% match that of Scala-JVM
      @li
        It reduces the amount of implementation work required by re-using the existing compilation phases

    @p
      This first phase is mostly a translation from the Scala compiler's internal AST to the Scala.js Intermediate Representation, and does not contain very many interesting optimizations. At the end of the initial compilation, the Scala compiler with Scala.js plugin results in two sets of files:

    @ul
      @li
        The original @code{.class} files, @i{almost} as if they were compiled on the JVM, but not quite. They are sufficiently valid that the compiler can execute macros defined in them, but they should not be used to actually run.
      @li
        The @code{.sjsir} files, destined for further compilation in the Scala.js pipeline.

    @p
      The ASTs defined in the @code{.sjsir} files is at about the same level of abstraction as the @hl.scala{Tree}s that the Scala compiler is working with at this stage. However, the @hl.scala{Tree}s within the Scala compiler contain a lot of cruft related to the compiler internals, and are also not easily serializable. This phase cleans them up into a "purer" format, (defined in the @lnk("ir/", "https://github.com/scala-js/scala-js/blob/master/ir/src/main/scala/scala/scalajs/ir/Trees.scala") folder) which is also serializable.

    @p
      This is the only phase in the Scala.js compilation pipeline that separate compilation is possible: you can compile many different sets of Scala.js @code{.scala} files separately, only to combine them later. This is used e.g. for distributing Scala.js libraries as Maven Jars, which are compiled separately by library authors to be combined into a final executable later.

  @sect{Fast Optimization}
    @p
      Without optimizations, the actual JavaScript code emitted for the above snippet would look like this:
    @hl.javascript
      ScalaJS.c.Lexample_ScalaJSExample$.prototype.main__V = (function() {
        var x = 0;
        while ((x < 999)) {
          x = ((x + new ScalaJS.c.sci_StringOps().init___T(
            ScalaJS.m.s_Predef$().augmentString__T__T("2")).toInt__I()) | 0)
        };
        ScalaJS.m.s_Predef$().println__O__V(x)
      });
    @p
      This is a pretty straightforward translation from the intermediate reprensentation into vanilla JavaScript code:

    @ul
      @li
        Scala-style method @hl.scala{def}s become Javascript-style prototype-function-assignment
      @li
        Scala @hl.scala{val}s and @hl.scala{var}s become Javascript @hl.scala{var}s
      @li
        Scala @hl.scala{while}s become Javascript @hl.scala{while}s
      @li
        Implicits are materialized, hence all the @hl.scala{StringOps} and @hl.scala{augmentString} extensions are present in the output
      @li
        Classes and methods are fully-qualified, e.g. @hl.scala{println} becomes @hl.scala{Predef().println}
      @li
        Method names are qualified by their types, e.g. @hl.scala{__O__V} means that @hl.scala{println} takes @hl.scala{Object} and returns @hl.scala{void}

    @p
      This is an incomplete description of the translation, but it should give a good sense of how the translation from Scala to Javascript looks like. In general, the output is verbose but straightforward.

    @p
      In addition to this superficial translation, the optimizer does a number of things which are more subtle and vary from case to case. Without diving into too much detail, here are a few optimizations that are performed:

    @ul
      @li
        @b{Dead-code elimination}: entry-points to the program such as @hl.scala("@JSExport")ed methods/classes are kept, as are any methods/classes that these reference. All others are removed. This reduces the potentially 20mb of Javascript generated by a naive compilation to a more manageable 400kb-1mb for a typical application
      @li
        @b{Inlining}: under some circumstances, the optimizer inlines the implementation of methods at call sites. For example, it does so for all "small enough" methods. This typically reduces the code size by a small amount, but offers a several-times speedup of the generated code by inlining away much of the overhead from the abstractions (implicit-conversions, higher-order-functions, etc.) in Scala's standard library.
      @li
        @b{Constant-folding}: due to inlining and other optimizations, some variables that could have arbitrary are known to contain a constant. These variables are replaced by their respective constants, which, in turn, can trigger more optimizations.
      @li
        @b{Closure elimination}: probably one of the most important optimizations. When inlining a higher-order method such as @code{map}, the optimizer can in turn inline the anonymous function inside the body of the loop, effectively turning polymorphic dispatch with closures into bare-metal loops.
    @p
      Applying these optimizations on our examples results in the following JavaScript code instead, which is what you typically execute in fastOpt stage:

    @hl.javascript
      ScalaJS.c.Lexample_ScalaJSExample$.prototype.main__V = (function() {
        var x = 0;
        while ((x < 999)) {
          var jsx$1 = x;
          var this$2 = new ScalaJS.c.sci_StringOps().init___T("2");
          var this$4 = ScalaJS.m.jl_Integer$();
          var s = this$2.repr$1;
          x = ((jsx$1 + this$4.parseInt__T__I__I(s, 10)) | 0)
        };
        var x$1 = x;
        var this$6 = ScalaJS.m.s_Console$();
        var this$7 = this$6.outVar$2;
        ScalaJS.as.Ljava_io_PrintStream(this$7.tl$1.get__O()).println__O__V(x$1)
      });

    @p
      As a whole-program optimization, it tightly ties together the code it is compiling and does not let you e.g. inject additional classes later. This does not mean you cannot interact with external code at all: you can, but it has to go through explicitly @hl.scala{@@JSExport}ed methods and classes via Javascript Interop, and not on ad-hoc classes/methods within the module. Thus it's entirely possible to have multiple "whole-programs" running in the same browser; they just will likely have duplicate copies of e.g. standard library classes inside of them, since they cannot share the code as it's not exported.

    @p
      While the input for this phase is the aggregate @code{.sjsir} files from your project and all your dependencies, the output is executable Javascript. This phase usually runs in less than a second, outputs a Javascript blob in the 400kb-1mb range, and is suitable for repeated use during development. This corresponds to the @code{fastOptJS} command in SBT.

  @sect{Full Optimization}
    @hl.javascript
      Fd.prototype.main = function() {
        for(var a = 0;999 > a;) {
          var b = (new D).j("2");
          E();
          a = a + Ja(0, b.R) | 0
        }
        b = Xa(ed().pc.Sb);
        fd(b, gd(s(), a));
        fd(b, "\n");
      };

    @p
      The @lnk("Google Closure Compiler", "https://developers.google.com/closure/compiler/") (GCC) is a set of tools that work with Javascript. It has multiple @lnk("levels of optimization", "https://developers.google.com/closure/compiler/docs/compilation_levels"), doing everything from basic whitespace-removal to heavy optimization. It is an old, relatively mature project that is relied on both inside and outside Google to optimize the delivery of Javascript to the browser.

    @p
      Scala.js uses GCC in its most aggressive mode: @lnk("Advanced Optimization", "https://developers.google.com/closure/compiler/docs/api-tutorial3"). GCC spits out a compressed, minified version of the Javascript (above) that @sect.ref{Fast Optimization} spits out: e.g. in the example above, all identifiers have been renamed to short strings, the @hl.javascript{while}-loop has been replaced by a @hl.javascript{for}-loop, and the @hl.scala{println} function has been inlined.

    @p
      As described in the linked documentation, GCC performs optimizations such as:

    @ul
      @li
        Whitespace removal
      @li
        Variable and property renaming
      @li
        Dead code elimination
      @li
        Inlining

    @p
      Notably, GCC @i{does not preserve the semantics of arbitrary Javascript}! In particular, it only works for a subset of Javascript that it understands and can properly analyze. This is an issue when hand-writing Javascript for GCC since it's very easy to step outside that subset and have GCC break your code, but is not a worry when using Scala.js: the Scala.js optimizer (the previous phase in the pipeline) automatically outputs Javascript which GCC understands and can work with.
    @p
      There is some overlap between the optimizations performed by the Scala.js optimizer and GCC. For example, both apply DCE and inlining in some form. However, there are also a lot of optimizations specific to each tool. In general, the Scala.js optimizer is more concerned about producing very efficient JavaScript code, while GCC shines at making that JavaScript as small as possible (in terms of the number of characters).
    @p
      The combination of both these tools produces small and fast output blobs: ~100-400kb. This takes 5-10 seconds to run, which makes it somewhat slow for iterative development, so it's typically only run right before final testing and deployment. This corresponds to the @code{fullOptJS} command in SBT.

@hr

@p
  This hopefully has given a good overview of how the Scala.js compilation pipeline works. The pipeline and optimizer is a work-in-progress, and is changing all the time in an attempt to achieve ever-smaller executables and ever-faster code.

@p
  This whole chapter has been focused on the @i{what} but not the @i{why}. The chapter on @sect.ref{Scala.js' Design Space} contains a section which talks about @sect.ref("Small Executables", "why we care so much about small executables").