author    Seigneurin, Alexis (CONT) <Alexis.Seigneurin@capitalone.com>  2016-09-01 09:32:05 +0100
committer Sean Owen <sowen@cloudera.com>  2016-09-01 09:32:05 +0100
commit    dd859f95c0aaa0b7c8fbff0a5f108cf3c9bf520a (patch)
tree      d6df3fd9bab228fba912ed6ea5316f749d8956ef /docs/structured-streaming-programming-guide.md
parent    a18c169fd050e71fdb07b153ae0fa5c410d8de27 (diff)
fixed typos

fixed 2 typos

Author: Seigneurin, Alexis (CONT) <Alexis.Seigneurin@capitalone.com>

Closes #14877 from aseigneurin/fix-typo-2.
Diffstat (limited to 'docs/structured-streaming-programming-guide.md')
-rw-r--r--  docs/structured-streaming-programming-guide.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index cdc3975d7c..c7ed3b04bc 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -400,7 +400,7 @@ data, thus relieving the users from reasoning about it. As an example, let’s
see how this model handles event-time based processing and late arriving data.
## Handling Event-time and Late Data
-Event-time is the time embedded in the data itself. For many applications, you may want to operate on this event-time. For example, if you want to get the number of events generated by IoT devices every minute, then you probably want to use the time when the data was generated (that is, event-time in the data), rather than the time Spark receives them. This event-time is very naturally expressed in this model -- each event from the devices is a row in the table, and event-time is a column value in the row. This allows window-based aggregations (e.g. number of event every minute) to be just a special type of grouping and aggregation on the even-time column -- each time window is a group and each row can belong to multiple windows/groups. Therefore, such event-time-window-based aggregation queries can be defined consistently on both a static dataset (e.g. from collected device events logs) as well as on a data stream, making the life of the user much easier.
+Event-time is the time embedded in the data itself. For many applications, you may want to operate on this event-time. For example, if you want to get the number of events generated by IoT devices every minute, then you probably want to use the time when the data was generated (that is, event-time in the data), rather than the time Spark receives them. This event-time is very naturally expressed in this model -- each event from the devices is a row in the table, and event-time is a column value in the row. This allows window-based aggregations (e.g. number of events every minute) to be just a special type of grouping and aggregation on the event-time column -- each time window is a group and each row can belong to multiple windows/groups. Therefore, such event-time-window-based aggregation queries can be defined consistently on both a static dataset (e.g. from collected device event logs) as well as on a data stream, making the life of the user much easier.
Furthermore, this model naturally handles data that has arrived later than expected based on its event-time. Since Spark is updating the Result Table, it has full control over updating/cleaning up the aggregates when there is late data. While not yet implemented in Spark 2.0, event-time watermarking will be used to manage this data. These are explained later in more detail in the [Window Operations](#window-operations-on-event-time) section.
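To make the windowed grouping described in this hunk concrete, here is a minimal Scala sketch against the Spark 2.0 Structured Streaming API. The socket source, the comma-separated input format, and the `device`/`eventTime` column names are assumptions for illustration only; they are not part of this patch.

{% highlight scala %}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder.appName("EventTimeWindows").getOrCreate()
import spark.implicits._

// Hypothetical stream of "device,timestamp" lines from a socket source.
val events = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .selectExpr(
    "split(value, ',')[0] AS device",
    "CAST(split(value, ',')[1] AS timestamp) AS eventTime")

// A 10-minute window sliding every 5 minutes over the event-time column:
// each window is a group, and a row can fall into two overlapping windows.
val windowedCounts = events
  .groupBy(window($"eventTime", "10 minutes", "5 minutes"), $"device")
  .count()
{% endhighlight %}

The same `groupBy(window(...))` expression works unchanged on a static DataFrame read from collected event logs, which is the consistency the paragraph above is pointing at.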
@@ -535,7 +535,7 @@ ds.filter(_.signal > 10).map(_.device) // using typed APIs
df.groupBy("type").count() // using untyped API
// Running average signal for each device type
-Import org.apache.spark.sql.expressions.scalalang.typed._
+import org.apache.spark.sql.expressions.scalalang.typed._
ds.groupByKey(_.type).agg(typed.avg(_.signal)) // using typed API
{% endhighlight %}
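As a companion to the corrected import, here is a self-contained sketch of the untyped and typed aggregations shown in this hunk. The `DeviceData` schema and the socket source are assumptions for illustration; note that the guide's field name `type` is a reserved word in Scala, so the sketch uses `deviceType` instead.

{% highlight scala %}
import org.apache.spark.sql.expressions.scalalang.typed
import org.apache.spark.sql.{Dataset, SparkSession}

// Illustrative schema; a real case class cannot use `type` as a field name
// without backticks, so deviceType stands in for it here.
case class DeviceData(device: String, deviceType: String, signal: Double)

val spark = SparkSession.builder.appName("TypedUntyped").getOrCreate()
import spark.implicits._

// Hypothetical stream of "device,deviceType,signal" lines from a socket source.
val df = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .selectExpr(
    "split(value, ',')[0] AS device",
    "split(value, ',')[1] AS deviceType",
    "CAST(split(value, ',')[2] AS double) AS signal")
val ds: Dataset[DeviceData] = df.as[DeviceData]

// Untyped API: column names are checked at runtime.
df.groupBy("deviceType").count()

// Typed API: fields are checked at compile time; typed.avg comes from
// org.apache.spark.sql.expressions.scalalang.typed.
ds.groupByKey(_.deviceType).agg(typed.avg(_.signal))
{% endhighlight %}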