@stdlib/datasets-spam-assassin

Spam Assassin public mail corpus.

Usage (no npm install needed)

<script type="module">
  import stdlibDatasetsSpamAssassin from 'https://cdn.skypack.dev/@stdlib/datasets-spam-assassin';
</script>


Spam Assassin


The Spam Assassin public mail corpus.

Installation

npm install @stdlib/datasets-spam-assassin

Usage

var corpus = require( '@stdlib/datasets-spam-assassin' );

corpus()

Returns the Spam Assassin public mail corpus.

var data = corpus();
// returns [{...},{...},...]

Each array element has the following fields:

  • id: message id (relative to message group)
  • group: message group
  • checksum: object containing checksum info
  • text: message text (including headers)
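Because the text field includes the headers, it can be useful to separate them from the message body. The following is a minimal sketch, assuming the messages follow the usual e-mail convention of a blank line between headers and body. The splitMessage helper and the sample text are hypothetical and are not part of this package.

```javascript
// Hypothetical helper (not part of this package): splits a raw message into
// its header block and body at the first blank line.
function splitMessage( text ) {
    // Accept both LF and CRLF line endings.
    var match = /\r?\n\r?\n/.exec( text );
    if ( match === null ) {
        // No blank line found: treat the whole text as headers.
        return { 'headers': text, 'body': '' };
    }
    return {
        'headers': text.slice( 0, match.index ),
        'body': text.slice( match.index + match[ 0 ].length )
    };
}

// Made-up sample text standing in for data[ i ].text:
var sampleText = 'From: alice@example.com\nSubject: Hello\n\nFirst line.\nSecond line.';

var parts = splitMessage( sampleText );
console.log( parts.headers );
console.log( parts.body );
```

With the real corpus, the same helper could be applied to each element, e.g. splitMessage( data[ i ].text ).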

The message group may be one of the following:

  • easy-ham-1: easier to detect non-spam e-mails (2500 messages)
  • easy-ham-2: easier to detect non-spam e-mails collected at a later date (1400 messages)
  • hard-ham-1: harder to detect non-spam e-mails (250 messages)
  • spam-1: spam e-mails (500 messages)
  • spam-2: spam e-mails collected at a later date (1396 messages)
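The group names above can serve as labels, for example when preparing data for a spam classifier. The sketch below tallies messages per group and derives a spam/non-spam flag from the group name. The isSpam and countByGroup helpers and the sample records are hypothetical; the records only mimic the documented fields and are not real corpus data.

```javascript
// Hypothetical records mimicking the documented shape (values are made up):
var sample = [
    { 'id': '00001', 'group': 'spam-1', 'checksum': { 'type': 'MD5', 'value': 'aaa' }, 'text': 'Subject: Win\n\nClick here' },
    { 'id': '00002', 'group': 'spam-1', 'checksum': { 'type': 'MD5', 'value': 'bbb' }, 'text': 'Subject: Offer\n\nBuy now' },
    { 'id': '00001', 'group': 'easy-ham-1', 'checksum': { 'type': 'MD5', 'value': 'ccc' }, 'text': 'Subject: Lunch\n\nNoon?' },
    { 'id': '00001', 'group': 'hard-ham-1', 'checksum': { 'type': 'MD5', 'value': 'ddd' }, 'text': 'Subject: News\n\nUpdate' }
];

// Hypothetical helper: a record counts as spam if its group name starts with "spam-".
function isSpam( record ) {
    return record.group.indexOf( 'spam-' ) === 0;
}

// Hypothetical helper: returns an object mapping each group name to its message count.
function countByGroup( data ) {
    var counts = {};
    var g;
    var i;
    for ( i = 0; i < data.length; i++ ) {
        g = data[ i ].group;
        counts[ g ] = ( counts[ g ] || 0 ) + 1;
    }
    return counts;
}

var counts = countByGroup( sample );
var nspam = sample.filter( isSpam ).length;

console.log( counts );
console.log( 'Spam messages: %d', nspam );
```

Passing the array returned by corpus() in place of sample should yield one count per group listed above.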

The checksum object contains the following fields:

  • type: checksum type (e.g., MD5)
  • value: checksum value

Examples

var corpus = require( '@stdlib/datasets-spam-assassin' );

var data;
var i;

data = corpus();
for ( i = 0; i < data.length; i++ ) {
    console.log( 'Character Count: %d', data[ i ].text.length );
}

CLI

Installation

To use the module as a general utility, install the module globally:

npm install -g @stdlib/datasets-spam-assassin

Usage

Usage: spam-assassin [options]

Options:

  -h,    --help                Print this message.
  -V,    --version             Print the package version.
         --format fmt          Output format: 'txt' or 'ndjson'.

Notes

  • The CLI supports two output formats: plain text (txt) and newline-delimited JSON (NDJSON). The default output format is txt.
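NDJSON output can be consumed line by line from other programs. The following is a minimal sketch of parsing such output in JavaScript. It assumes each line is one JSON-serialized message record; the exact fields emitted by the CLI are not documented here, so the sample string and the parseNDJSON helper are hypothetical.

```javascript
// Hypothetical helper: parses newline-delimited JSON into an array of objects.
function parseNDJSON( str ) {
    var lines = str.split( /\r?\n/ );
    var out = [];
    var i;
    for ( i = 0; i < lines.length; i++ ) {
        // Skip empty lines (e.g., after a trailing newline).
        if ( lines[ i ].trim() !== '' ) {
            out.push( JSON.parse( lines[ i ] ) );
        }
    }
    return out;
}

// Made-up sample standing in for captured CLI output:
var sampleOutput = '{"id":"00001","group":"spam-1"}\n{"id":"00002","group":"easy-ham-1"}\n';

var records = parseNDJSON( sampleOutput );
console.log( 'Parsed %d records', records.length );
```

In practice, the input string might come from a file written with spam-assassin --format ndjson > corpus.ndjson. For the full corpus, a streaming line reader would likely be preferable to loading everything into one string.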

Examples

$ spam-assassin

License

The data files (databases) are licensed under an Open Data Commons Public Domain Dedication & License 1.0 and their contents are licensed under Creative Commons Zero v1.0 Universal. The software is licensed under Apache License, Version 2.0.


Notice

This package is part of stdlib, a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high-performance libraries for mathematics, statistics, streams, utilities, and more.

For more information on the project, filing bug reports and feature requests, and guidance on how to develop stdlib, see the main project repository.

Copyright

Copyright © 2016-2022. The Stdlib Authors.