README

address-normalizer

JavaScript address normalizer for the Cuban address schema (tested on a Havana data set). It creates structured addresses from raw address strings.

Installation

npm install --save @citykleta/habana-address-normalizer

Alternatively, it can be loaded from the unpkg CDN.

Usage

create_address

The main function is create_address, which takes an address string as its argument and returns a normalized address:

interface NormalizedAddress {
    number?: string;
    street?: StreetLike;
    between?: StreetLike[];
    corner?: StreetLike[];
    district?: string;
    municipality?: string;
}

interface StreetLike {
    name: string;
}

It uses a basic grammar parser that follows the common way of writing an address in Cuba.

Examples:

import {create_address} from 'path/to/this/module';

create_address('Calle San Ignacio e/ Chacón y Empedrado, Havana Vieja');
// {"street":{"name":"calle san ignacio"},"between":[{"name":"chacón"},{"name":"empedrado"}],"municipality":"havana vieja"}
create_address('Calle 47 #3417 e/ 34 y 36, Playa');
// {"street":{"name":"calle 47"},"number":"3417","between":[{"name":"34"},{"name":"36"}],"municipality":"playa"}
create_address('Sta. Catalina #213 esq. Luz Caballero, Diez de Octubre');
// {"number":"213","corner":[{"name":"luz caballero"},{"name":"santa catalina"}],"municipality":"diez de octubre"}

Note that the municipality ("municipio") must be separated from the rest of the address by a punctuation mark such as a comma.
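As an illustration of how the NormalizedAddress shape can be consumed, here is a minimal sketch that turns such an object back into a single display string. The formatAddress helper is hypothetical and not part of this library; the "e/", "esq." and "y" connectors are simply borrowed from the example inputs above, and the input object is copied from the second example output.

```javascript
// Hypothetical helper, not part of the library: builds a display string
// from an object shaped like the NormalizedAddress interface above.
function formatAddress(address) {
    const parts = [];
    if (address.street) {
        parts.push(address.street.name);
    }
    if (address.number) {
        parts.push(`#${address.number}`);
    }
    if (address.between && address.between.length > 0) {
        parts.push(`e/ ${address.between.map((s) => s.name).join(' y ')}`);
    }
    if (address.corner && address.corner.length > 0) {
        parts.push(`esq. ${address.corner.map((s) => s.name).join(' y ')}`);
    }
    const line = parts.join(' ');
    // District and municipality are optional, so drop whichever is missing.
    const area = [address.district, address.municipality].filter(Boolean).join(', ');
    return [line, area].filter(Boolean).join(', ');
}

// Input copied from the second example output above.
const formatted = formatAddress({
    street: {name: 'calle 47'},
    number: '3417',
    between: [{name: '34'}, {name: '36'}],
    municipality: 'playa'
});
console.log(formatted); // calle 47 #3417 e/ 34 y 36, playa
```

Because every field of the interface is optional, the helper checks each one before using it and returns an empty string for an empty object.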

Tokenizer

For further processing, or as an extension point, you can use the tokenizer directly. The tokenize function takes a string and returns an iterable iterator of tokens (for the definition of a token, refer to the type declaration files).

import {tokenize} from 'path/to/this/module';

for (const t of tokenize('Calle San Ignacio e/ Chacón y Empedrado, Havana Vieja')){
    console.log(t);
}

/*
{ category: 2, value: 'calle' }
{ category: 0, value: ' ' }
{ category: 2, value: 'san' }
{ category: 0, value: ' ' }
{ category: 2, value: 'ignacio' }
{ category: 0, value: ' ' }
{ category: 3, value: 'e/' }
{ category: 0, value: ' ' }
{ category: 2, value: 'chacón' }
{ category: 0, value: ' ' }
{ category: 3, value: 'y' }
{ category: 0, value: ' ' }
{ category: 2, value: 'empedrado' }
{ category: 1, value: ',' }
{ category: 0, value: ' ' }
{ category: 2, value: 'havana' }
{ category: 0, value: ' ' }
{ category: 2, value: 'vieja' }
 */
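Since the token stream includes whitespace tokens, a common first step when building on the tokenizer is to filter them out. The sketch below assumes that category 0 means whitespace, which is only inferred from the sample output above; check the type declaration files for the actual category definitions. The withoutWhitespace helper is hypothetical, and the input is a hand-written excerpt of the output above, so the snippet runs without the library.

```javascript
// Assumption: category 0 marks whitespace, as the sample output above suggests.
const WHITESPACE_CATEGORY = 0;

// Hypothetical helper: lazily yields every token that is not whitespace.
// It accepts any iterable, so the result of tokenize(...) could be passed in directly.
function* withoutWhitespace(tokens) {
    for (const token of tokens) {
        if (token.category !== WHITESPACE_CATEGORY) {
            yield token;
        }
    }
}

// Hand-written excerpt of the sample output above.
const sampleTokens = [
    {category: 2, value: 'calle'},
    {category: 0, value: ' '},
    {category: 2, value: 'san'},
    {category: 0, value: ' '},
    {category: 2, value: 'ignacio'},
    {category: 0, value: ' '},
    {category: 3, value: 'e/'}
];

const tokenValues = [...withoutWhitespace(sampleTokens)].map((t) => t.value);
console.log(tokenValues); // [ 'calle', 'san', 'ignacio', 'e/' ]
```

A generator keeps the filtering lazy, which matches the iterator-based style of tokenize.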

Parser

You can also get the abstract syntax tree of the (very simplified) grammar using the parse function:

import {parse} from 'path/to/this/module';

const ast = parse('Sta. Catalina #213 esq. Luz Caballero, Diez de Octubre');

/*
{
    'groups': [{
        'type': 4,
        'members': [
            {'type': 3, 'value': 'santa catalina'},
            {'type': 0, 'value': {'type': 3, 'value': '213'}},
            {'type': 1, 'value': [{'type': 3, 'value': 'luz caballero'}]}
        ]
    }, {
        'type': 4,
        'members': [{'type': 3, 'value': 'diez de octubre'}]
    }]
};
 */
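To show how such a tree might be traversed, the following sketch collects every string value it contains. It relies only on the shapes visible in the sample output above (groups, members, and a value that holds a string, a node, or an array of nodes); the meaning of the numeric type codes is not assumed. The collectValues helper is hypothetical, and the tree is copied from the output above so that the snippet runs without the library.

```javascript
// Hypothetical helper: recursively collects every string `value` in the tree.
function collectValues(node, out = []) {
    if (Array.isArray(node)) {
        node.forEach((child) => collectValues(child, out));
    } else if (node && typeof node === 'object') {
        if (typeof node.value === 'string') {
            out.push(node.value);
        } else {
            // Descend into whichever container this node carries.
            collectValues(node.groups || node.members || node.value, out);
        }
    }
    return out;
}

// Tree copied from the sample output above.
const sampleAst = {
    groups: [{
        type: 4,
        members: [
            {type: 3, value: 'santa catalina'},
            {type: 0, value: {type: 3, value: '213'}},
            {type: 1, value: [{type: 3, value: 'luz caballero'}]}
        ]
    }, {
        type: 4,
        members: [{type: 3, value: 'diez de octubre'}]
    }]
};

const astValues = collectValues(sampleAst);
console.log(astValues);
// [ 'santa catalina', '213', 'luz caballero', 'diez de octubre' ]
```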

@citykleta/habana-address-normalizer
