README
Parser builder
Synopsis
Writing parsers is quite a common but sometimes lengthy task. To ease this process atok-parser leverages the atok tokenizer and performs the basic steps to set up a streaming parser, such as:
- Automatically instantiate a tokenizer with provided options
- Provide a mechanism to locate an error in the input data
track([Boolean]): keep track of the line and column positions to be used when building errors. Note that when set, tracking incurs a performance penalty.
- Proxy basic node.js streaming methods:
write(),end(),pause()andresume() - Proxy basic node.js streaming events (note that [data] and [end] are not automatically proxied) and some of atok
- [drain]
- [debug]
- Provide preset variables within the parser constructor
- atok {Object}: atok tokenizer instance
- self {Object}: this
- Provide helpers that simplify parsing rules (see below for description)
whitespace()number()float()word()string()utf8()chunk()stringList()match()noop()wait()
Download
It is published on node package manager (npm). To install, do:
npm install atok-parser
Usage
A silly example to illustrate the various pre defined variables and parser definition. It parses a flot number and returns the value via its #parse method.
function myParser (options) {
function handler (num) {
// The options are set from the myParser function parameters
// self is already set to the Parser instance
if ( options.check && !isFinite(num) )
return self.emit('error', new Error('Invalid float: ' + num))
self.emit('data', num)
}
// the float() and whitespace() helpers are provided by atok-parser
atok.float(handler)
atok.whitespace()
}
var Parser = require('..').createParser(myParser)
// Add the #parse() method to the Parser
Parser.prototype.parse = function (data) {
var res
// One (silly) way to make parse() look synchronous...
this.once('data', function (data) {
res = data
})
this.write(data)
// ...write() is synchronous
return res
}
// Instantiate a parser
var p = new Parser({ check: true })
// Parse a valid float
var validfloat = p.parse('123.456 ')
console.log('parsed data is of type', typeof validfloat, 'value', validfloat)
// The following data will produce an invalid float and an error
p.on('error', console.error)
var invalidfloat = p.parse('123.456e1234 ')
Methods
createParserFromFile(file[, parserOptions, parserEvents, atokOptions]): return a parser class (Function) based on the input file.- file {String}: file to read the parser from (.js extension is optional)
- parserOptions {String}: coma separated list of parser options
- parserEvents {Object}: events emitted by the parser with their arguments count
- atokOptions {Object}: tokenizer options
The following variables are made available to the parser javascript code:
atok {_Object_}: atok tokenizer instanciated with provided options. Also set as this.atok DO NOT DELETEself {_Object_}: reference to this
Predefined methods:
write(data)end([data])pause()resume()debug([logger (_Function_)])track(flag (_Boolean_))
Events automatically forwarded from tokenizer to parser:
draindebug
createParser(data[, parserOptions, parserEvents, atokOptions]): same ascreateParserFromFile()but with supplied content instead of a file name- data {String | Array | Function}: the content to be used, can also be an array of strings or a function. If a function, its parameters are used as parser options unless parserOptions is set
Helpers
Helpers are a set of standard Atok rules organized to match a specific type of data. If the data is encountered, the handler is fired with the results. If not, the rule is ignored. The behaviour of a single helper is the same as a single Atok rule:
- go to the next rule if no match, unless
continue(jump, jumpOnFail)was applied to the helper - go back to the first rule of the rule set upon match, unless
continue(jump)was applied to the helper - next rule set can be set using
next(ruleSetId) - rules can be jumped around by using
continue(jump, jumpOnFail). A helper has exactly the size of a single rule, which greatly helps defining complex rules.
// Parse a whitespace separated list of floats
var myParser = [
'atok.float(function (n) { self.emit("data", n) })'
, 'atok.continue(-1, -2)'
, 'atok.whitespace()'
]
var Parser = require('atok-parser').createParser(myParser)
var p = new Parser
p.on('data', function (num) {
console.log(typeof num, num)
})
p.end('0.133 0.255')
Arguments are not required. If no handler is specified, the [data] event will be emitted with the corresponding data.
whitespace(handler): ignore consecutive spaces, tabs, line breaks.handler(whitespace)
number(handler): process positive integershandler(num)
float(handler): process float numbers. NB. the result can be an invalid float (NaN or Infinity).handler(floatNumber)
word(handler): process a word containing letters, digits and underscoreshandler(word)
string([start, end, esc,] handler): process a delimited string. If end is not supplied, it is set to start.- start {String}: starting pattern (default=")
- end {String}: ending pattern (default=")
- esc {String}: escape character (default=)
handler(string)
utf8([start, end,] handler): process a delimited string containing UTF-8 encoded characters. If end is not supplied, it is set to start.- start {String}: starting pattern (default=")
- end {String}: ending pattern (default=")
handler(UTF-8String)
chunk(charSet, handler):- charSet {Object}: object defining the charsets to be used as matching characters e.g. { start: 'aA', end 'zZ' } matches all letters
handler(chunk)
stringList([start, end, separator,] handler): process a delimited list of strings- start {String}: starting pattern (default=()
- end {String}: ending pattern (default=))
- separator {String}: separator character (default=,)
handler(listOfStrings)
match(start, end, stringQuotes, handler): find a matching pattern (e.g. bracket matching), skipping string content if required- start {String}: starting pattern to look for
- end {String}: ending pattern to look for
- stringQuotes {Array}: array of string delimiters (default=['"', "'"]). Use an empty array to disable string content processing
handler(token)
noop(next): passthrough - does not do anything except applying given properties (useful to branch rules without having to useatok#saveRuleSet()andatok#loadRuleSet())- next {String}: next ruleset to load
wait(atokPattern[...atokPattern], handler): wait for the given pattern. Nothing happens until data is received that triggers the pattern. Must be preceded bycontinue()to properly work. Typical usage is when expecting a string the starting quote is received but not the end... so wait until then and resume the rules workflow.nvp([nameCharSet, separator, endPattern] handler): parse a named value pair (default nameCharSet={ start: 'aA0_', end: 'zZ9_' }, separator==, endPattern={ firstOf: ' \t\n\r' }). Disable endPattern by setting it to '' or [].handler(name, value)
Examples
A set of examples are located under the examples/ directory.