Skip to content

Tokenise suffixes on all literals #19103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 20, 2014
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 12 additions & 4 deletions mk/grammar.mk
Original file line number Diff line number Diff line change
Expand Up @@ -30,17 +30,25 @@ endef
$(BG):
$(Q)mkdir -p $(BG)

$(BG)RustLexer.class: $(SG)RustLexer.g4
$(BG)RustLexer.class: $(BG) $(SG)RustLexer.g4
$(Q)$(CFG_ANTLR4) -o $(B)grammar $(SG)RustLexer.g4
$(Q)$(CFG_JAVAC) -d $(BG) $(BG)RustLexer.java

$(BG)verify: $(SG)verify.rs rustc-stage2-H-$(CFG_BUILD) $(LD)stamp.regex_macros $(LD)stamp.rustc
$(Q)$(RUSTC) -O --out-dir $(BG) -L $(L) $(SG)verify.rs
check-build-lexer-verifier: $(BG)verify

ifeq ($(NO_REBUILD),)
VERIFY_DEPS := rustc-stage2-H-$(CFG_BUILD) $(LD)stamp.regex_macros $(LD)stamp.rustc
else
VERIFY_DEPS :=
endif

$(BG)verify: $(BG) $(SG)verify.rs $(VERIFY_DEPS)
$(Q)$(RUSTC) --out-dir $(BG) -L $(L) $(SG)verify.rs

ifdef CFG_JAVAC
ifdef CFG_ANTLR4
ifdef CFG_GRUN
check-lexer: $(BG) $(BG)RustLexer.class $(BG)verify
check-lexer: $(BG) $(BG)RustLexer.class check-build-lexer-verifier
$(info Verifying libsyntax against the reference lexer ...)
$(Q)$(SG)check.sh $(S) "$(BG)" \
"$(CFG_GRUN)" "$(BG)verify" "$(BG)RustLexer.tokens"
Expand Down
2 changes: 1 addition & 1 deletion mk/tests.mk
Original file line number Diff line number Diff line change
Expand Up @@ -199,7 +199,7 @@ check-docs: cleantestlibs cleantmptestlogs check-stage2-docs

# Some less critical tests that are not prone to breakage.
# Not run as part of the normal test suite, but tested by bors on checkin.
check-secondary: check-build-compiletest check-lexer check-pretty
check-secondary: check-build-compiletest check-build-lexer-verifier check-lexer check-pretty

# check + check-secondary.
#
Expand Down
43 changes: 20 additions & 23 deletions src/doc/reference.md
Original file line number Diff line number Diff line change
Expand Up @@ -216,9 +216,15 @@ rather than referring to it by name or some other evaluation rule. A literal is
a form of constant expression, so is evaluated (primarily) at compile time.

```{.ebnf .gram}
literal : string_lit | char_lit | byte_string_lit | byte_lit | num_lit ;
lit_suffix : ident;
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit ] lit_suffix ?;
```

The optional suffix is only used for certain numeric literals, but is
reserved for future extension, that is, the above gives the lexical
grammar, but a Rust parser will reject everything but the 12 special
cases mentioned in [Number literals](#number-literals) below.

#### Character and string literals

```{.ebnf .gram}
Expand Down Expand Up @@ -371,27 +377,20 @@ b"\\x52"; br"\x52"; // \x52
#### Number literals

```{.ebnf .gram}
num_lit : nonzero_dec [ dec_digit | '_' ] * num_suffix ?
| '0' [ [ dec_digit | '_' ] * num_suffix ?
| 'b' [ '1' | '0' | '_' ] + int_suffix ?
| 'o' [ oct_digit | '_' ] + int_suffix ?
| 'x' [ hex_digit | '_' ] + int_suffix ? ] ;

num_suffix : int_suffix | float_suffix ;
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
| '0' [ [ dec_digit | '_' ] * float_suffix ?
| 'b' [ '1' | '0' | '_' ] +
| 'o' [ oct_digit | '_' ] +
| 'x' [ hex_digit | '_' ] + ] ;

int_suffix : 'u' int_suffix_size ?
| 'i' int_suffix_size ? ;
int_suffix_size : [ '8' | "16" | "32" | "64" ] ;
float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? float_suffix_ty ? ;
float_suffix_ty : 'f' [ "32" | "64" ] ;
exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
```

A _number literal_ is either an _integer literal_ or a _floating-point
literal_. The grammar for recognizing the two kinds of literals is mixed, as
they are differentiated by suffixes.
literal_. The grammar for recognizing the two kinds of literals is mixed.

##### Integer literals

Expand All @@ -406,9 +405,9 @@ An _integer literal_ has one of four forms:
* A _binary literal_ starts with the character sequence `U+0030` `U+0062`
(`0b`) and continues as any mixture of binary digits and underscores.

An integer literal may be followed (immediately, without any spaces) by an
_integer suffix_, which changes the type of the literal. There are two kinds of
integer literal suffix:
Like any literal, an integer literal may be followed (immediately,
without any spaces) by an _integer suffix_, which forcibly sets the
type of the literal. There are 10 valid values for an integer suffix:

* The `i` and `u` suffixes give the literal type `int` or `uint`,
respectively.
Expand Down Expand Up @@ -443,11 +442,9 @@ A _floating-point literal_ has one of two forms:
* A single _decimal literal_ followed by an _exponent_.

By default, a floating-point literal has a generic type, and, like integer
literals, the type must be uniquely determined from the context. A
floating-point literal may be followed (immediately, without any spaces) by a
_floating-point suffix_, which changes the type of the literal. There are two
floating-point suffixes: `f32`, and `f64` (the 32-bit and 64-bit floating point
types).
literals, the type must be uniquely determined from the context. There are two valid
_floating-point suffixes_, `f32` and `f64` (the 32-bit and 64-bit floating point
types), which explicitly determine the type of the literal.

Examples of floating-point literals of various forms:

Expand Down
44 changes: 15 additions & 29 deletions src/grammar/RustLexer.g4
Original file line number Diff line number Diff line change
Expand Up @@ -92,49 +92,35 @@ fragment CHAR_ESCAPE
| 'U' HEXIT HEXIT HEXIT HEXIT HEXIT HEXIT HEXIT HEXIT
;

LIT_CHAR
: '\'' ( '\\' CHAR_ESCAPE | ~[\\'\n\t\r] ) '\''
fragment SUFFIX
: IDENT
;

LIT_BYTE
: 'b\'' ( '\\' ( [xX] HEXIT HEXIT | [nrt\\'"0] ) | ~[\\'\n\t\r] ) '\''
LIT_CHAR
: '\'' ( '\\' CHAR_ESCAPE | ~[\\'\n\t\r] ) '\'' SUFFIX?
;

fragment INT_SUFFIX
: 'i'
| 'i8'
| 'i16'
| 'i32'
| 'i64'
| 'u'
| 'u8'
| 'u16'
| 'u32'
| 'u64'
LIT_BYTE
: 'b\'' ( '\\' ( [xX] HEXIT HEXIT | [nrt\\'"0] ) | ~[\\'\n\t\r] ) '\'' SUFFIX?
;

LIT_INTEGER
: [0-9][0-9_]* INT_SUFFIX?
| '0b' [01][01_]* INT_SUFFIX?
| '0o' [0-7][0-7_]* INT_SUFFIX?
| '0x' [0-9a-fA-F][0-9a-fA-F_]* INT_SUFFIX?
;

fragment FLOAT_SUFFIX
: 'f32'
| 'f64'
: [0-9][0-9_]* SUFFIX?
| '0b' [01][01_]* SUFFIX?
| '0o' [0-7][0-7_]* SUFFIX?
| '0x' [0-9a-fA-F][0-9a-fA-F_]* SUFFIX?
;

LIT_FLOAT
: [0-9][0-9_]* ('.' | ('.' [0-9][0-9_]*)? ([eE] [-+]? [0-9][0-9_]*)? FLOAT_SUFFIX?)
: [0-9][0-9_]* ('.' | ('.' [0-9][0-9_]*)? ([eE] [-+]? [0-9][0-9_]*)? SUFFIX?)
;

LIT_STR
: '"' ('\\\n' | '\\\r\n' | '\\' CHAR_ESCAPE | .)*? '"'
: '"' ('\\\n' | '\\\r\n' | '\\' CHAR_ESCAPE | .)*? '"' SUFFIX?
;

LIT_BINARY : 'b' LIT_STR ;
LIT_BINARY_RAW : 'rb' LIT_STR_RAW ;
LIT_BINARY : 'b' LIT_STR SUFFIX?;
LIT_BINARY_RAW : 'rb' LIT_STR_RAW SUFFIX?;

/* this is a bit messy */

Expand All @@ -148,7 +134,7 @@ fragment LIT_STR_RAW_INNER2
;

LIT_STR_RAW
: 'r' LIT_STR_RAW_INNER
: 'r' LIT_STR_RAW_INNER SUFFIX?
;

IDENT : XID_start XID_continue* ;
Expand Down
Loading