README
![build status](https://travis-ci.org/michaelherndon/ux-lexer.png)
ux-lexer
version: 0.10.0-alpha1
Extensible Lexer for JavaScript without requiring regular expressions.
This library provides a foundation for writing custom lexers in JavaScript that perform lexical analysis and produce tokens.
- Create rules.
- Add the rules to a lexer in order of precedence.
- Parse the lexical syntax from a string.
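The three steps above can be sketched without the library. The following is only a minimal illustration of rule-based lexing without regular expressions; the names `rules`, `tokenize`, `isDigit`, `isLetter`, and `isSpace` are hypothetical stand-ins and are not the ux-lexer API.

```javascript
// Hypothetical stand-in, not the ux-lexer API.
// Character tests written without regular expressions.
function isDigit(c) { return c >= "0" && c <= "9"; }
function isLetter(c) { return (c >= "a" && c <= "z") || (c >= "A" && c <= "Z"); }
function isSpace(c) { return c === " " || c === "\t" || c === "\n"; }

// Step 1: create rules. Step 2: list them in order of precedence.
var rules = [
  { tokenName: "NUMBER", match: isDigit },
  { tokenName: "IDENTIFIER", match: function (c) { return isLetter(c) || c === "_"; } },
  { tokenName: "WHITESPACE", match: isSpace }
];

// Step 3: walk the string, pick the first rule that matches the current
// character, and consume characters for as long as that rule keeps matching.
function tokenize(text, rules) {
  var tokens = [];
  var i = 0;
  while (i < text.length) {
    var rule = null;
    for (var r = 0; r < rules.length; r++) {
      if (rules[r].match(text.charAt(i))) { rule = rules[r]; break; }
    }
    if (rule === null) {
      // No rule matched: emit a single-character UNKNOWN token and move on.
      tokens.push({ name: "UNKNOWN", value: text.charAt(i), position: i });
      i++;
      continue;
    }
    var start = i;
    while (i < text.length && rule.match(text.charAt(i))) { i++; }
    tokens.push({ name: rule.tokenName, value: text.slice(start, i), position: start });
  }
  return tokens;
}

var tokens = tokenize("foo_bar 42+x", rules);
console.log(tokens.map(function (t) { return t.name; }).join(" "));
// NUMBER is listed first, so it wins whenever a character could match several rules.
```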
Key Requirements
- A lexer that does not require regular expressions.
- Runs in multiple JavaScript environments: browsers, web workers, Node.js, Windows RT.
- Keep the library at a bare minimum.
Rationale
Some problems need a better solution than using regular expressions to parse tokens from strings, especially HTML. Regular expressions can be inefficient and hard to read, and they give you only so much control.
Builds
The builds for nodejs, browser, and amd will let you include files as needed. The builds will also include a util.js file that includes all the methods bundled in a single file. The build process creates 3 versions of the scripts, one for CommonJS, one for AMD, and one with closures for the browser.
The source files are included with the npm module inside the src folder, so that developers and script consumers can cherry-pick the functionality they want and include the files as needed for custom builds.
To build and test everything use:
$ grunt ci
There are two different versions of ux-lexer. lexer.js requires ux-util as a dependency, while lexer-all.js bundles only the methods it needs from ux-util and so has zero external dependencies.
Browser Distribution
location: dist/browser
The browser distribution wraps functionality in closures and exposes it through the global variable "ux.util". To use a method, do the following:
<script src="/scripts/ux-util/equals.js"></script>
var equals = self.ux.util.equals;
if(equals(left, right))
{
// do something
}
AMD Distribution
location: dist/amd[/lib]
The AMD distribution has the main file in the root, and the rest of the files are placed in the lib folder. This way the same require statements work both with node and with a loader such as RequireJS in a browser.
CommonJS Distribution
location: lib
The files are located inside of the lib folder.
API
Reader
Provides methods to read, scan, and peek at parts of an array-like value or object.
constructor(Array|String enumerable)
Takes an array-like source that can be iterated over and has a zero-based index property accessor.
example
var reader = new Reader("text to reader");
var example = function() {
var argReader = new Reader(arguments);
};
current
Gets the current value or object for the current position.
limit
Gets the number of items in the enumerable. The farthest index the reader can move to is limit - 1.
position
Gets the position/index in the enumerable that the reader is currently pointing to.
data
Gets the enumerable data that the reader was given.
emptyValue
Gets or sets the value that the reader treats as the end of the string or file. This defaults to null.
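How these properties relate to one another can be sketched with a small stand-in. `MiniReader` below is hypothetical and is not ux-lexer's Reader; in particular, starting at position -1 (before the first item) and leaving the state unchanged at the end are assumptions made for this sketch.

```javascript
// Hypothetical MiniReader: illustrates the properties only, not ux-lexer's Reader.
function MiniReader(enumerable) {
  this.data = enumerable;           // the enumerable the reader was given
  this.limit = enumerable.length;   // number of items
  this.position = -1;               // assumption: starts before the first item
  this.emptyValue = null;           // end-of-input marker
  this.current = this.emptyValue;   // nothing has been read yet
}

// Advance by one item, or return emptyValue once the end has been reached.
MiniReader.prototype.nextValue = function () {
  if (this.position + 1 >= this.limit) {
    return this.emptyValue;
  }
  this.position += 1;
  this.current = this.data[this.position];
  return this.current;
};

var reader = new MiniReader("abc");
reader.nextValue();
console.log(reader.position, reader.current, reader.limit); // 0 a 3
```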
dispose()
Removes any references that the reader is holding on to. By default, the reader disposes of its reference to the enumerable and of the methods that are created in the constructor.
example
var using = require("ux-util/lib/using");
using(new Reader("some text here!"), function(r){
}); // dispose will be called.
var reader = new Reader("some other text");
console.log(reader.peek(0));
reader.dispose();
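The dispose pattern used with `using` can be sketched as follows. This `using` function is a hypothetical stand-in written for illustration; the actual helper in ux-util may behave differently. The `resource` object is likewise made up and only mimics an object with a dispose() method.

```javascript
// Hypothetical stand-in for a using() helper: run the callback, then always
// call dispose(), even if the callback throws.
function using(resource, callback) {
  try {
    return callback(resource);
  } finally {
    resource.dispose();
  }
}

var log = [];
var resource = {
  data: "some text here!",
  dispose: function () {
    this.data = null;       // drop the reference to the enumerable
    log.push("disposed");
  }
};

var length = using(resource, function (r) { return r.data.length; });
console.log(length, log);
```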
next()
Returns the next value in the enumerable. If the environment supports StopIteration, next() throws it when done. Otherwise it throws an Error with the message "StopIteration".
example
var example = function(){
for(var value of new Reader(arguments)) {
console.log(value);
}
}
example("one", "two", "three");
var reader = new Reader("function(arg1, arg2) {}");
try {
while(true) {
console.log(reader.next());
}
} catch(e) {
if(e.message !== "StopIteration" && !(typeof StopIteration !== "undefined" && e instanceof StopIteration)) {
// log error or rethrow
}
}
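The throw-on-completion behaviour described for next() can be sketched in isolation. The helpers `createNext`, `isStop`, and `drain` are hypothetical names used only for this illustration and are not part of ux-lexer.

```javascript
// Hypothetical sketch of a next() that signals the end of input by throwing.
function createNext(enumerable) {
  var position = -1;
  return function next() {
    if (position + 1 >= enumerable.length) {
      // Use the native StopIteration where the environment defines one,
      // otherwise fall back to a plain Error with the same message.
      if (typeof StopIteration !== "undefined") { throw StopIteration; }
      throw new Error("StopIteration");
    }
    position += 1;
    return enumerable[position];
  };
}

// True only for the end-of-iteration signal, in either form.
function isStop(e) {
  return (typeof StopIteration !== "undefined" && e instanceof StopIteration) ||
         (e instanceof Error && e.message === "StopIteration");
}

// Collect every value, swallowing the stop signal and rethrowing anything else.
function drain(next) {
  var values = [];
  try {
    while (true) { values.push(next()); }
  } catch (e) {
    if (!isStop(e)) { throw e; }
  }
  return values;
}

var drained = drain(createNext("abc"));
console.log(drained.join(","));
```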
nextValue()
Returns the next value, or the emptyValue if the reader has reached the end of the enumerable.
example
var reader = new Reader("function(arg1, arg2) {}");
var current = null;
while((current = reader.nextValue()) !== reader.emptyValue) {
console.log(current);
}
peek(Number position)
Returns the value or object at the specified position if the position is less than the limit, otherwise it returns the emptyValue.
example
var reader = new Reader("ABCDEF");
console.log(reader.peek(3)); // D
console.log(reader.peek(9)); // null
peekAtNext()
Returns the value, object, or emptyValue for the next position in the reader.
example
var reader = new Reader("ABCDED");
reader.next();
var c = reader.peekAtNext();
console.log(c); // B
reset()
Returns the reader to the start position so that the enumerable can be read again.
example
var reader = new Reader("ABCDEF");
var current = null;
while((current = reader.nextValue())) {
console.log(current);
}
reader.reset();
while((current = reader.nextValue())) {
console.log("v2: " + current);
}
scan(Function|Object predicate, [Number position], [Number limit])
Searches a section of the enumerable for values or objects that match the predicate and returns the position of the match.
example
var reader = new Reader("ABCDEFEDCBA");
var position = reader.scan("D");
console.log(position); // 3
position = reader.scan(function scan(c) {
var count = scan.count || 0;
if(c === 'D')
{
if(count === 1)
return true;
scan.count = 1;
}
return false;
});
position = reader.scan("D", 5);
console.log(position); // 7
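One way such a scan could work is sketched below as a standalone function. It is not ux-lexer's implementation: the `scan(data, predicate, position, limit)` signature, treating `position` as inclusive and `limit` as an exclusive end index, and returning -1 when nothing matches are all assumptions made for the illustration.

```javascript
// Hypothetical scan(): accepts either a predicate function or a value to
// compare against, plus an optional start position and end limit.
function scan(data, predicate, position, limit) {
  var test = typeof predicate === "function"
    ? predicate
    : function (value) { return value === predicate; };
  var start = typeof position === "number" ? position : 0;
  var end = typeof limit === "number" ? Math.min(limit, data.length) : data.length;
  for (var i = start; i < end; i++) {
    if (test(data[i])) { return i; }
  }
  return -1; // assumption: -1 signals "no match"
}

var text = "ABCDEFEDCBA";
console.log(scan(text, "D"));                                // 3
console.log(scan(text, "D", 5));                             // 7
console.log(scan(text, function (c) { return c > "E"; }));   // 5
console.log(scan(text, "D", 4, 7));                          // -1
```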
slice(Number offset, Number limit)
Returns an array of values or objects that starts at the offset position up to the specified limit.
example
var reader = new Reader("ABCDEFEDCBA");
var slice = reader.slice(1,2);
console.log(slice); // ["B","C"];
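Judging from the example, the second argument is a count of items rather than an end index, which differs from Array.prototype.slice. A minimal sketch of that reading, using a hypothetical helper `sliceByCount` that is not part of ux-lexer:

```javascript
// Hypothetical helper: take `limit` items starting at `offset` from any
// array-like value, assuming `limit` is a count rather than an end index.
function sliceByCount(data, offset, limit) {
  return Array.prototype.slice.call(data, offset, offset + limit);
}

var byCount = sliceByCount("ABCDEFEDCBA", 1, 2);
var byEndIndex = Array.prototype.slice.call("ABCDEFEDCBA", 1, 2);
console.log(byCount);    // [ 'B', 'C' ]
console.log(byEndIndex); // [ 'B' ]
```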
to(Number position)
Moves the reader to the specified position.
example
var reader = new Reader("ABCDEFEDCBA");
reader.to(3);
var next = reader.nextValue();
console.log(next); // "E"
LexerRule
Rules determine how characters are consumed and transformed into a token.
LexerRule Example
var IdentifierRule = LexerRule.extend({
tokenName: "IDENTIFIER",
value: null,
position: null,
match: function(character, reader) {
var alpha = Lexer.isLetter(character);
if(!alpha || (character !== "_" && character !== "