CSE130: Programming Languages Principles & Paradigms

A Scheme Interpreter

Part One: A Scanner for Scheme Tokens

Complete the implementation of a Lisp-like scanner, using the provided Java code. The token grammar is a subset of the lexical syntax specified in the Revised^6 Report on the Algorithmic Language Scheme, Section 4.2 (Lexical Syntax).

The subset allows only identifiers, booleans, characters, strings, numbers (decimal integers only), parentheses, square brackets, the single quote, and the dot.

Lexeme ::= Identifier | Boolean | Number | Character | String | Punctuation

Identifier ::= Initial {Subsequent} | PeculiarIdentifier
Initial ::= Constituent | SpecialInitial
Constituent ::= Letter
Letter ::= a | b | c | ⋯ | z | A | B | C | ⋯ | Z
SpecialInitial ::= ! | $ | % | & | * | / | : | < | = | > | ? | ^ | _ | ~
Subsequent ::= Initial | Digit | SpecialSubsequent
Digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
SpecialSubsequent ::= + | - | . | @
PeculiarIdentifier ::= + | - | ... | ->{Subsequent}
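
The Identifier productions can be checked one character class at a time. The following is a minimal sketch, assuming the candidate lexeme has already been cut out of the input as a String. The class IdentifierSketch and its method names are hypothetical and are not part of the provided starter code.

```java
// Hypothetical helper, not part of the starter code: checks whether a
// whole string matches the Identifier production.
final class IdentifierSketch {
    private static final String SPECIAL_INITIAL = "!$%&*/:<=>?^_~";
    private static final String SPECIAL_SUBSEQUENT = "+-.@";

    // Letter ::= a..z | A..Z (ASCII letters only, as in the grammar)
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Initial ::= Constituent | SpecialInitial
    static boolean isInitial(char c) {
        return isLetter(c) || SPECIAL_INITIAL.indexOf(c) >= 0;
    }

    // Subsequent ::= Initial | Digit | SpecialSubsequent
    static boolean isSubsequent(char c) {
        return isInitial(c)
            || (c >= '0' && c <= '9')
            || SPECIAL_SUBSEQUENT.indexOf(c) >= 0;
    }

    // True when every character from index 'from' onward is a Subsequent.
    private static boolean allSubsequent(String s, int from) {
        for (int i = from; i < s.length(); i++) {
            if (!isSubsequent(s.charAt(i))) return false;
        }
        return true;
    }

    static boolean isIdentifier(String s) {
        if (s.isEmpty()) return false;
        // PeculiarIdentifier ::= + | - | ... | ->{Subsequent}
        if (s.equals("+") || s.equals("-") || s.equals("...")) return true;
        if (s.startsWith("->")) return allSubsequent(s, 2);
        // Identifier ::= Initial {Subsequent}
        return isInitial(s.charAt(0)) && allSubsequent(s, 1);
    }
}
```

Under this sketch, list->vector, set! and ->x are accepted, while +x and 1abc are rejected because neither + nor a digit is an Initial.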

Boolean ::= #t | #T | #f | #F

Number ::= Sign Uinteger10
Sign ::= «empty» | + | -
Uinteger10 ::= Digit {Digit}
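
Because + and - on their own are peculiar identifiers, a sign only starts a Number when at least one digit follows it. The following is a minimal sketch of that check, assuming the candidate lexeme is available as a String. NumberSketch is a hypothetical name, not part of the starter code.

```java
// Hypothetical helper, not part of the starter code.
final class NumberSketch {
    // Returns true when s matches Number ::= Sign Uinteger10.
    static boolean isNumber(String s) {
        int i = 0;
        // Sign ::= «empty» | + | -
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            i++;
        }
        // Uinteger10 needs at least one digit, so a bare sign is not a number.
        if (i == s.length()) return false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }
}
```

With this sketch, 42, -7 and +0 are numbers, while +, - and 12a are not.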

Character ::= #\«any character»
            | #\CharacterName
CharacterName ::= nul | tab | newline | return | space
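
A character lexeme is either #\ followed by a single character or #\ followed by one of the five names, so #\n denotes the letter n while #\newline denotes a line feed. The following is a minimal sketch of decoding a complete lexeme. CharacterSketch is a hypothetical name, and the sketch assumes the scanner has already decided where the lexeme ends and that «any character» is a single Java char.

```java
// Hypothetical helper, not part of the starter code.
final class CharacterSketch {
    // Decodes a complete character lexeme such as "#\a" or "#\newline".
    // Returns the character's code, or -1 when the text does not match
    // the Character production.
    static int decode(String lexeme) {
        if (!lexeme.startsWith("#\\") || lexeme.length() < 3) return -1;
        String body = lexeme.substring(2);
        switch (body) {
            case "nul":     return 0;
            case "tab":     return '\t';
            case "newline": return '\n';
            case "return":  return '\r';
            case "space":   return ' ';
            default:
                // Assumption: «any character» is exactly one Java char.
                return body.length() == 1 ? body.charAt(0) : -1;
        }
    }
}
```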

String ::= "{StringElement}"
StringElement ::= NonEscaped | \t | \n | \r | \" | \\
NonEscaped ::= «any character other than double-quote or backslash»
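
The escape sequences in StringElement all begin with a backslash, so a string scanner has to treat the character after a backslash specially and reject escapes that the grammar does not list. The following is a minimal sketch, assuming the complete lexeme (including both quotes) is available as a String. StringSketch is a hypothetical name, and returning null for malformed input is a choice made for this sketch only.

```java
// Hypothetical helper, not part of the starter code.
final class StringSketch {
    // Decodes a complete string lexeme, including its surrounding quotes.
    // Returns null when the text does not match the String production.
    static String decode(String lexeme) {
        int n = lexeme.length();
        if (n < 2 || lexeme.charAt(0) != '"' || lexeme.charAt(n - 1) != '"') {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < n - 1; i++) {
            char c = lexeme.charAt(i);
            if (c == '"') return null;        // unescaped quote inside the body
            if (c != '\\') {                  // NonEscaped: copy as-is
                out.append(c);
                continue;
            }
            if (++i >= n - 1) return null;    // the closing quote was escaped
            switch (lexeme.charAt(i)) {
                case 't':  out.append('\t'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:   return null;       // escape not in the grammar
            }
        }
        return out.toString();
    }
}
```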

Punctuation ::= ( | ) | [ | ] | ' | .

Whitespace and comments are not represented by lexemes. Your scanner should follow these rules:

Whitespace ::= «any t such that Character.isWhitespace(t) is true»
Comment ::= ; «all subsequent characters up to a line ending»

A line ending is any character t for which Character.getType(t) returns Character.LINE_SEPARATOR.
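
The two rules above can be combined into a single loop that runs before each lexeme is scanned. The following is a minimal sketch, assuming the input is held in a String and positions are plain indices. SkipSketch and its methods are hypothetical names, not part of the starter code. In Java, Character.getType('\n') is Character.CONTROL rather than Character.LINE_SEPARATOR, so this sketch additionally assumes that '\n' and '\r' should end a comment; confirm that against the starter code before relying on it.

```java
// Hypothetical helper, not part of the starter code.
final class SkipSketch {
    // Assumption: '\n' and '\r' count as line endings in addition to
    // characters whose type is Character.LINE_SEPARATOR.
    static boolean isLineEnding(char c) {
        return c == '\n' || c == '\r'
            || Character.getType(c) == Character.LINE_SEPARATOR;
    }

    // Returns the index of the first character at or after pos that is
    // neither whitespace nor part of a comment, or src.length() if the
    // rest of the input is only whitespace and comments.
    static int skip(String src, int pos) {
        int i = pos;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == ';') {
                // Consume the comment up to, but not including, the line
                // ending; the ending itself is whitespace and is consumed
                // on the next pass through the loop.
                while (i < src.length() && !isLineEnding(src.charAt(i))) {
                    i++;
                }
            } else {
                break;
            }
        }
        return i;
    }
}
```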

Details

Download schemeProjectStarter.zip and edit TokenScanner.java.