author: Som Snytt <som.snytt@gmail.com> 2015-06-29 07:57:33 -0700
committer: Som Snytt <som.snytt@gmail.com> 2015-06-29 07:57:33 -0700
commit: ab527ce8cc0220443bda5cc3337ebae158c2fe74
tree: 415931bf3424c5a53c4aba346f7679cc9d33f0a7 /spec
parent: aad7c67fe047c6ea9b40ff9588adf0b51dbcf57b
SI-6810 Spec reflects literal parsing literally
Emphasize that literal parsing accepts Unicode escapes as if they were escaped. In particular, a newline represented by its Unicode escape does not terminate the line in the middle of a literal.
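As a rough illustration of the behavior this message describes, the sketch below compares literals written with the Unicode escape for line feed against their `\n` counterparts. It assumes a compiler whose scanner follows the revised wording; the object name `UnicodeEscapeLiterals` and its members are hypothetical and do not appear in the commit.

```scala
// Minimal sketch, not part of the commit: names here are illustrative only.
object UnicodeEscapeLiterals {
  // Character literal written with the Unicode escape for line feed.
  val escapedChar: Char = '\u000A'

  // The same character written with the ordinary escape sequence.
  val newlineChar: Char = '\n'

  // The escaped line feed sits inside the string and does not end the literal.
  val escapedString: String = "Hello,\u000Aworld!"

  def main(args: Array[String]): Unit = {
    // Both spellings denote the same character.
    assert(escapedChar == newlineChar)
    // The string is equal to the one spelled with the ordinary escape.
    assert(escapedString == "Hello,\nworld!")
    // Six characters, one line feed, six characters.
    assert(escapedString.length == 13)
    println("Unicode-escaped line feed accepted inside literals")
  }
}
```

A raw line break typed between the quotes would still be rejected; only the escaped spelling is accepted inside the literal.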
Diffstat (limited to 'spec')
-rw-r--r-- spec/01-lexical-syntax.md 49
-rw-r--r-- spec/13-syntax-summary.md 5
2 files changed, 30 insertions, 24 deletions
diff --git a/spec/01-lexical-syntax.md b/spec/01-lexical-syntax.md
index e26cb796c8..06e3a458a4 100644
--- a/spec/01-lexical-syntax.md
+++ b/spec/01-lexical-syntax.md
@@ -398,40 +398,46 @@ members of type `Boolean`.
### Character Literals
```ebnf
-characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
+characterLiteral ::= ‘'’ (charNoQuoteOrNewline | UnicodeEscape | charEscapeSeq) ‘'’
```
A character literal is a single character enclosed in quotes.
-The character is either a printable unicode character or is described
-by an [escape sequence](#escape-sequences).
+The character can be any Unicode character except the single quote
+delimiter or `\u000A` (LF) or `\u000D` (CR);
+or any Unicode character represented by either a
+[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).
> ```scala
> 'a' '\u0041' '\n' '\t'
> ```
-Note that `'\u000A'` is _not_ a valid character literal because
-Unicode conversion is done before literal parsing and the Unicode
-character `\u000A` (line feed) is not a printable
-character. One can use instead the escape sequence `'\n'` or
-the octal escape `'\12'` ([see here](#escape-sequences)).
+Note that although Unicode conversion is done early during parsing,
+so that Unicode characters are generally equivalent to their escaped
+expansion in the source text, literal parsing accepts arbitrary
+Unicode escapes, including the character literal `'\u000A'`,
+which can also be written using the escape sequence `'\n'`.
### String Literals
```ebnf
stringLiteral ::= ‘"’ {stringElement} ‘"’
-stringElement ::= printableCharNoDoubleQuote | charEscapeSeq
+stringElement ::= charNoDoubleQuoteOrNewline | UnicodeEscape | charEscapeSeq
```
-A string literal is a sequence of characters in double quotes. The
-characters are either printable unicode character or are described by
-[escape sequences](#escape-sequences). If the string literal
-contains a double quote character, it must be escaped,
-i.e. `"\""`. The value of a string literal is an instance of
-class `String`.
+A string literal is a sequence of characters in double quotes.
+The characters can be any Unicode character except the double quote
+delimiter or `\u000A` (LF) or `\u000D` (CR);
+or any Unicode character represented by either a
+[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).
+
+If the string literal contains a double quote character, it must be escaped using
+`"\""`.
+
+The value of a string literal is an instance of class `String`.
> ```scala
-> "Hello,\nWorld!"
-> "This string contains a \" character."
+> "Hello, world!\n"
+> "\"Hello,\" replied the world."
> ```
#### Multi-Line String Literals
@@ -443,11 +449,10 @@ multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}
A multi-line string literal is a sequence of characters enclosed in
triple quotes `""" ... """`. The sequence of characters is
-arbitrary, except that it may contain three or more consuctive quote characters
-only at the very end. Characters
-must not necessarily be printable; newlines or other
-control characters are also permitted. Unicode escapes work as everywhere else, but none
-of the escape sequences [here](#escape-sequences) are interpreted.
+arbitrary, except that it may contain three or more consecutive quote characters
+only at the very end. In particular, embedded newlines
+are permitted. Unicode escapes work as everywhere else, but none
+of the [escape sequences](#escape-sequences) are interpreted.
> ```scala
> """the present string
diff --git a/spec/13-syntax-summary.md b/spec/13-syntax-summary.md
index 7f73e107de..a4b4aae570 100644
--- a/spec/13-syntax-summary.md
+++ b/spec/13-syntax-summary.md
@@ -57,11 +57,12 @@ floatType ::= ‘F’ | ‘f’ | ‘D’ | ‘d’
booleanLiteral ::= ‘true’ | ‘false’
-characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
+characterLiteral ::= ‘'’ (charNoQuoteOrNewline | UnicodeEscape | charEscapeSeq) ‘'’
stringLiteral ::= ‘"’ {stringElement} ‘"’
| ‘"""’ multiLineChars ‘"""’
-stringElement ::= (printableChar except ‘"’)
+stringElement ::= charNoDoubleQuoteOrNewline
+ | UnicodeEscape
| charEscapeSeq
multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}