Diffstat (limited to 'docs/tasks.md')
-rw-r--r-- | docs/tasks.md | 310 |
1 files changed, 310 insertions, 0 deletions
diff --git a/docs/tasks.md b/docs/tasks.md
new file mode 100644
index 00000000..ef9270bb
--- /dev/null
+++ b/docs/tasks.md
@@ -0,0 +1,310 @@
+One of Mill's core abstractions is its *Task Graph*: this is how Mill defines,
+orders and caches work it needs to do, and exists independently of any support
+for building Scala.
+
+The following is a simple self-contained example using Mill to compile Java:
+
+```scala
+import ammonite.ops._, mill._
+
+def sourceRootPath = pwd / 'src
+def resourceRootPath = pwd / 'resources
+
+def sourceRoot = T.source{ sourceRootPath }
+
+def resourceRoot = T.source{ resourceRootPath }
+
+def allSources = T{ ls.rec(sourceRoot().path).map(PathRef(_)) }
+
+def classFiles = T{
+  mkdir(T.ctx().dest)
+  import ammonite.ops._
+  %("javac", allSources().map(_.path.toString()), "-d", T.ctx().dest)(wd = T.ctx().dest)
+  PathRef(T.ctx().dest)
+}
+
+def jar = T{ mill.modules.Jvm.createJar(Agg(resourceRoot().path, classFiles().path)) }
+
+def run(mainClsName: String) = T.command{
+  %%('java, "-cp", classFiles().path, mainClsName)
+}
+```
+
+Here, we have two `T.source`s, `sourceRoot` and `resourceRoot`, which act as the
+roots of our task graph. `allSources` depends on `sourceRoot` by calling
+`sourceRoot()` to extract its value, `classFiles` depends on `allSources` the
+same way, and `jar` depends on both `classFiles` and `resourceRoot`.
+
+Filesystem operations in Mill are done using the
+[Ammonite-Ops](http://ammonite.io/#Ammonite-Ops) library.
+
+The above build defines the following task graph:
+
+```
+sourceRoot -> allSources -> classFiles
+                                |
+                                v
+           resourceRoot ---->  jar
+```
+
+When you first evaluate `jar` (e.g. via `mill jar` at the command line), it will
+evaluate all the defined targets: `sourceRoot`, `allSources`, `classFiles`,
+`resourceRoot` and `jar`.
+
+Subsequent `mill jar` runs will evaluate only as much as is necessary, depending on
+what input sources changed:
+
+- If the files in `sourceRoot` change, it will re-evaluate `allSources`,
+  compiling to `classFiles`, and building the `jar`
+
+- If the files in `resourceRoot` change, it will only re-evaluate `jar` and use
+  the cached output of `allSources` and `classFiles`
+
+Apart from the `foo()` call-sites which define what each target depends on, the
+code within each `T{...}` wrapper is arbitrary Scala code that can compute an
+arbitrary result from its inputs.
+
+## Different Kinds of Tasks
+
+There are three primary kinds of *Tasks* that you should care about:
+
+- [Targets](#targets), defined using `T{...}`
+- [Sources](#sources), defined using `T.source{...}`
+- [Commands](#commands), defined using `T.command{...}`
+
+### Targets
+
+```scala
+def allSources = T{ ls.rec(sourceRoot().path).map(PathRef(_)) }
+```
+
+`Target`s are defined using the `def foo = T{...}` syntax, and dependencies on
+other targets are defined using `foo()` to extract the value from them. Apart
+from the `foo()` calls, the `T{...}` block contains arbitrary code that does
+some work and returns a result.
+
+Each target (e.g. `classFiles`) is assigned a path on disk as scratch space and to
+store its output files at `out/classFiles/dest/`, and its returned metadata is
+automatically JSON-serialized and stored at `out/classFiles/meta.json`. The
+return-value of targets has to be JSON-serializable via
+[uPickle](https://github.com/lihaoyi/upickle).
+
+If you want to return a file or a set of files as the result of a `Target`,
+write them to disk within your `T.ctx().dest` available through the
+[Task Context API](#task-context-api) and return a `PathRef` to the files you
+wrote.
+
+If a target's inputs change but its output does not, e.g. someone changes a
+comment within the source files that doesn't affect the classfiles, then
+downstream targets do not re-evaluate. This is determined using the `.hashCode`
+of the Target's return value. For targets returning `ammonite.ops.Path`s that
+reference files on disk, you can wrap the `Path` in a `PathRef` (shown above)
+whose `.hashCode()` will include the hashes of all files on disk at time of
+creation.
+
+The graph of inter-dependent targets is evaluated in topological order; that
+means that the body of a target will not even begin to evaluate if one of its
+upstream dependencies has failed. This is unlike normal Scala functions: a plain
+old function `foo` would evaluate halfway and then blow up if one of `foo`'s
+dependencies throws an exception.
+
+Targets cannot take parameters and must be 0-argument `def`s defined directly
+within a `Module` body.
+
+### Sources
+
+```scala
+def sourceRootPath = pwd / 'src
+
+def sourceRoot = T.source{ sourceRootPath }
+```
+
+`Source`s are defined using `T.source{ ... }`, taking an `ammonite.ops.Path` as
+an input. A `Source` is a subclass of `Target[PathRef]`: this means that its
+build signature/`hashCode` depends not just on the path it refers to (e.g.
+`foo/bar/baz`) but also the MD5 hash of the filesystem tree under that path.
+
+### Commands
+
+```scala
+def run(mainClsName: String) = T.command{
+  %%('java, "-cp", classFiles().path, mainClsName)
+}
+```
+
+Defined using `T.command{ ... }` syntax, `Command`s can run arbitrary code, with
+dependencies declared using the same `foo()` syntax (e.g. `classFiles()` above).
+Commands can be parametrized, but their output is not cached, so they will
+re-evaluate every time even if none of their inputs have changed.
+
+Like [Targets](#targets), a command only evaluates after all its upstream
+dependencies have completed, and will not begin to run if any upstream
+dependency has failed.
+
+Commands are assigned the same scratch/output directory `out/run/dest/` as
+Targets are, and their returned metadata is stored at the same `out/run/meta.json`
+path for consumption by external tools.
+
+Commands can only be defined directly within a `Module` body.
+
+## Task Context API
+
+There are several APIs available to you within the body of a `T{...}` or
+`T.command{...}` block to help you write the code implementing your Target or
+Command:
+
+### mill.util.Ctx.DefCtx
+
+- `T.ctx().dest`
+- `implicitly[mill.util.Ctx.DefCtx]`
+
+This is the unique `out/classFiles/dest/` path or `out/run/dest/` path that is
+assigned to every Target or Command. It is cleared before your task runs, and
+you can use it as a scratch space for temporary files or a place to put returned
+artifacts. This is guaranteed to be unique for every `Target` or `Command`, so
+you can be sure that you will not collide or interfere with anyone else writing
+to those same paths.
+
+### mill.util.Ctx.LogCtx
+
+- `T.ctx().log`
+- `implicitly[mill.util.Ctx.LogCtx]`
+
+This is the default logger provided for every task. While your task is running,
+`System.out` and `System.err` are also redirected to this logger. The logs for a
+task are streamed to standard out/error as you would expect, but each task's
+specific output is also streamed to a log file on disk e.g. `out/run/log` or
+`out/classFiles/log` for you to inspect later.
+
+## Other Tasks
+
+- [Anonymous Tasks](#anonymous-tasks), defined using `T.task{...}`
+- [Persistent Targets](#persistent-targets)
+- [Inputs](#inputs)
+- [Workers](#workers)
+
+
+### Anonymous Tasks
+
+```scala
+def foo(x: Int) = T.task{ ... x ... bar() ... }
+```
+
+You can define anonymous tasks using the `T.task{ ... }` syntax. These are not
+runnable from the command-line, but can be used to share common code you find
+yourself repeating in `Target`s and `Command`s.
+
+```scala
+def downstreamTarget = T{ ... foo() ... }
+def downstreamCommand = T.command{ ... foo() ... }
+```
+Anonymous tasks' output does not need to be JSON-serializable, their output is
+not cached, and they can be defined with or without arguments. Unlike
+[Targets](#targets) or [Commands](#commands), anonymous tasks can be defined
+anywhere and passed around any way you want, until you finally make use of them
+within a downstream target or command.
+
+While an anonymous task `foo`'s own output is not cached, if it is used in a
+downstream target `bar` and the upstream targets `baz` and `qux` haven't changed,
+`bar`'s cached output will be used and `foo`'s evaluation will be skipped
+altogether.
+
+### Persistent Targets
+```scala
+def foo = T.persistent{ ... }
+```
+
+Identical to [Targets](#targets), except that the `dest/` directory is not
+cleared in between runs.
+
+This is useful if you are running external incremental-compilers, such as
+Scala's [Zinc](https://github.com/sbt/zinc) or JavaScript's
+[WebPack](https://webpack.js.org/), which rely on filesystem caches to speed up
+incremental execution of their particular build step.
+
+Since Mill no longer forces a "clean slate" re-evaluation of `T.persistent`
+targets, it is up to you to ensure your code (or the third-party incremental
+compilers you rely on!) is deterministic. It should always converge to the
+same outputs for a given set of inputs, regardless of what builds and what
+filesystem states existed before.
+
+### Inputs
+
+```scala
+def foo = T.input{ ... }
+```
+
+A generalization of [Sources](#sources), `T.input`s are tasks that re-evaluate
+*every time* (unlike [Anonymous Tasks](#anonymous-tasks)), containing an
+arbitrary block of code.
+
+Inputs can be used to force re-evaluation of some external property that may
+affect your build. For example, if I have a [Target](#targets) `bar` that makes
+use of the current git version:
+
+```scala
+def bar = T{ ... %%("git", "rev-parse", "HEAD").out.string ... }
+```
+
+`bar` will not know that `git rev-parse` can change, and will
+not know to re-evaluate when your `git rev-parse HEAD` *does* change. This means
+`bar` will continue to use any previously cached value, and `bar`'s output will
+be out of date!
+
+To fix this, you can wrap your `git rev-parse HEAD` in a `T.input`:
+
+```scala
+def foo = T.input{ %%("git", "rev-parse", "HEAD").out.string }
+def bar = T{ ... foo() ... }
+```
+
+This makes `foo` re-evaluate on every build; if `git rev-parse HEAD`
+does not change, that will not invalidate `bar`'s caches. But if `git rev-parse
+HEAD` *does* change, `foo`'s output will change and `bar` will be correctly
+invalidated and re-computed using the new version of `foo`.
+
+Note that because `T.input`s re-evaluate every time, you should ensure that the
+code you put in `T.input` runs quickly. Ideally it should just be a simple check
+"did anything change?" and any heavy-lifting can be delegated to downstream
+targets.
+
+### Workers
+
+```scala
+def foo = T.worker{ ... }
+```
+
+Most tasks dispose of their in-memory return-value every evaluation; in the case
+of [Targets](#targets), this is stored on disk and loaded next time if
+necessary, while [Commands](#commands) just re-compute them each time. Even if
+you use `--watch` or the Build REPL to keep the Mill process running, all this
+state is still discarded and re-built every evaluation.
+
+Workers are unique in that they store their in-memory return-value between
+evaluations. This makes them useful for storing in-memory caches or references
+to long-lived external worker processes that you can re-use.
+
+Mill uses workers to manage long-lived instances of the
+[Zinc Incremental Scala Compiler](https://github.com/sbt/zinc) and the
+[Scala.js Optimizer](https://github.com/scala-js/scala-js). This lets us keep
+them in-memory with warm caches and fast incremental execution.
+
+Like [Persistent Targets](#persistent-targets), Workers inherently involve
+mutable state, and it is up to the implementation to ensure that this mutable
+state is only used for caching/performance and does not affect the
+externally-visible behavior of the worker.
+
+## Cheat Sheet
+
+The following table might help you make sense of the small collection of
+different Task types:
+
+|                                | Target | Command | Source/Input | Anonymous Task | Persistent Target | Worker |
+|:-------------------------------|:-------|:--------|:-------------|:---------------|:------------------|:-------|
+| Cached on Disk                 | X      | X       |              |                | X                 |        |
+| Must be JSON Writable          | X      | X       |              |                | X                 |        |
+| Must be JSON Readable          | X      |         |              |                | X                 |        |
+| Runnable from the Command Line | X      | X       |              |                | X                 |        |
+| Can Take Arguments             |        | X       |              | X              |                   |        |
+| Cached between Evaluations     |        |         |              |                |                   | X      |
+
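
The invalidation rule described under Targets and Inputs in `docs/tasks.md` (a downstream target re-runs only when the hash of an upstream return value changes, while inputs re-run every time) can be sketched in plain Scala. This is a minimal model under stated assumptions, not Mill's implementation: `Node`, `alwaysRun`, `runs` and `Demo` are hypothetical names, the cache lives in memory rather than under `out/`, and `##` stands in for the `.hashCode` check.

```scala
// Hypothetical mini-model of hash-based invalidation; none of this is Mill's API.
final class Node[T](val name: String, deps: Seq[Node[_]], alwaysRun: Boolean = false)(
    body: Seq[Any] => T) {
  // Last upstream signature together with the value computed from it.
  private var cache: Option[(Seq[Int], T)] = None
  // How many times `body` actually ran, exposed for inspection.
  var runs = 0

  def value(): T = {
    // Upstream nodes are evaluated first, so `body` never starts before them.
    val inputs: Seq[Any] = deps.map(_.value())
    val signature = inputs.map(_.##)
    cache match {
      case Some((cachedSignature, cachedValue))
          if !alwaysRun && cachedSignature == signature =>
        cachedValue // upstream outputs unchanged: reuse the cached result
      case _ =>
        runs += 1
        val result = body(inputs)
        cache = Some((signature, result))
        result
    }
  }
}

object Demo {
  // Stand-ins for external state: the git revision and a source file's text.
  var gitHead = "abc123"
  var sourceText = "class A // v1"

  // Nodes with alwaysRun = true play the role of a `T.input`.
  val head = new Node[String]("head", Nil, alwaysRun = true)(_ => gitHead)
  val sources = new Node[String]("sources", Nil, alwaysRun = true)(_ => sourceText)
  // "Compiling" drops the trailing comment, so comment-only edits give equal output.
  val classFiles = new Node[String]("classFiles", Seq(sources))(in =>
    in.head.toString.takeWhile(_ != '/').trim)
  val jar = new Node[String]("jar", Seq(classFiles, head))(in => in.mkString("|"))
}
```

In this sketch, editing only the comment in `Demo.sourceText` re-runs `classFiles` but leaves `jar` cached, because the output of `classFiles` hashes the same; changing `Demo.gitHead` re-runs `jar` alone.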