sax-super-stream

Transform stream implemented using SAX with hierarchical parsing

Transform stream that converts XML into objects by applying a hierarchy of element parsers. It is implemented using the sax parser, which allows it to process large XML files in a memory-efficient manner. It is also flexible: by configuring element parsers only for the elements you need to extract data from, you can avoid creating an intermediate representation of the entire XML structure.

Install

$ npm install --save sax-super-stream

Usage

The example below shows how to print the titles of the articles from an RSS feed.

var getlet = require('getlet');
var stream = require('sax-super-stream');

var PARSERS = {
  'rss': {
    'channel': {
      'item': {
        $: stream.object,
        'title': {
          $text: function(text, o) { o.title = text; }
        }
      }
    }
  }
};

getlet('http://blog.npmjs.org/rss')
  .pipe(stream(PARSERS))
  .on('data', function(item) {
    console.log(item.title);
  });

More examples can be found in Furkot GPX and KML importers.

API

stream(parserConfig[, options])

Creates a transform stream that reads XML and writes objects.

  • parserConfig - hierarchical configuration of element parsers; each entry corresponds to an element in the XML tree, and each value describes the action performed when that element is encountered during parsing

  • options - optional set of options passed to the sax parser; the defaults are:

    • trim - true
    • normalize - true
    • lowercase - false
    • xmlns - true
    • position - false
    • strictEntities - true
    • noscript - true
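
As a rough illustration of how overrides relate to these defaults, the sketch below merges a caller's options over the values listed above. DEFAULTS and resolveOptions are hypothetical names, and the merge itself is an assumption about the behavior, since this README does not show how the library combines the two.

```javascript
// Defaults copied from the list above.
var DEFAULTS = {
  trim: true,
  normalize: true,
  lowercase: false,
  xmlns: true,
  position: false,
  strictEntities: true,
  noscript: true
};

// Hypothetical helper (not the library's code): caller-supplied keys win,
// every other option keeps its default.
function resolveOptions(options) {
  return Object.assign({}, DEFAULTS, options);
}

var resolved = resolveOptions({ trim: false, position: true });
console.log(resolved);
```

With the module itself, the same overrides would be passed as the second argument, as in stream(PARSERS, { trim: false, position: true }).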

parserConfig

parserConfig is a hierarchical object that contains references to either parse functions or other parserConfig objects.

parse function - function(xmlnode, object, context)

  • xmlnode - sax node with attributes
  • object - reference to the object currently being constructed, if any
  • context - shared object available to parser functions; it can be used to store intermediate data

Inside a parse function, this is bound to the stack of objects currently being parsed.
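
A minimal sketch of a custom parse function with the signature above. parseLink and the hand-built fakeNode are hypothetical; the node mimics the shape sax reports in xmlns mode, where each attribute is an object with a value field. How the stream uses a parse function's return value is not described here, so the sketch relies only on mutating object and context.

```javascript
// Hypothetical parse function: copies the href attribute onto the object
// that is currently being built.
function parseLink(xmlnode, object, context) {
  var href = xmlnode.attributes.href;
  if (object && href) {
    // In sax's xmlns mode an attribute is an object; otherwise it is a string.
    object.link = typeof href === 'string' ? href : href.value;
  }
  // context is shared scratch space; here it counts the links seen so far.
  context.links = (context.links || 0) + 1;
}

// Hand-built stand-ins for what the stream would pass in.
var fakeNode = {
  name: 'link',
  attributes: { href: { name: 'href', value: 'http://example.com/' } }
};
var item = {};
var context = {};
parseLink(fakeNode, item, context);
console.log(item.link, context.links);
```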

parse config reference - object

Each property of the object represents a direct child element of the parsed node in the XML hierarchy. The special $ property is a self reference, so

'item': parseItemFunction

is the same as:

'item': {
  '$': parseItemFunction
}
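
The equivalence can be pictured as a normalization step. The helper below is purely illustrative: normalizeEntry is a hypothetical name and is not claimed to be how the library handles the shorthand internally.

```javascript
// Hypothetical helper, not part of the library: expands the function
// shorthand into the explicit object form with a `$` self reference.
function normalizeEntry(entry) {
  return typeof entry === 'function' ? { $: entry } : entry;
}

// Placeholder parse function used only for the comparison below.
function parseItemFunction(xmlnode, object, context) {
  return {};
}

var shorthand = normalizeEntry(parseItemFunction);
var explicit = normalizeEntry({ $: parseItemFunction });

// Both forms end up pointing at the same parse function.
console.log(shorthand.$ === explicit.$);
```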

special values

  • $after - function(object, context) - called when the element's closing tag is reached, after the element content has been parsed
  • $text - function(text, object, context) - called when the element's text content is encountered
  • $uri - string - if specified, the element is parsed only when its namespace matches, otherwise it is ignored; if $uri is not specified, namespaces are ignored
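
A sketch of a config that combines the three special values, assuming an Atom feed. The element names and the namespace URI come from the Atom format rather than from this README, and the $ parser returning a new object is an assumption about how the current object gets established. The callbacks are invoked by hand at the end only to show their signatures; in real use the stream calls them.

```javascript
var ATOM_PARSERS = {
  'feed': {
    // Ignore a <feed> element unless it is in the Atom namespace.
    $uri: 'http://www.w3.org/2005/Atom',
    'entry': {
      // Assumption: the value returned here becomes the `object`
      // handed to the nested callbacks.
      $: function() { return {}; },
      'title': {
        $text: function(text, entry) { entry.title = text; }
      },
      // Called once the closing tag of <entry> has been reached.
      $after: function(entry, context) {
        context.count = (context.count || 0) + 1;
        entry.index = context.count;
      }
    }
  }
};

// Manual walk-through of the calls for a single <entry>.
var entryParsers = ATOM_PARSERS.feed.entry;
var entry = entryParsers.$();
var context = {};
entryParsers.title.$text('Hello', entry, context);
entryParsers.$after(entry, context);
console.log(entry);
```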

predefined parsers

There are several predefined parser functions that can be used in parser config:

  • object(name) - creates a new object and optionally assigns it to the parent's name property
  • collection(name) - creates a new Array and optionally assigns it to the parent's name property
  • appendToCollection(name) - creates a new object and appends it to the Array stored in the parent's name property, creating a new Array if it does not exist yet
  • assignTo(name) - assigns the value to the parent's name property

License

MIT © Damian Krzeminski