h2o.js

Node.js bindings to H2O

Usage no npm install needed!

<script type="module">
  import h2oJs from 'https://cdn.skypack.dev/h2o.js';
</script>

README

H₂O.js

This Node.js / io.js module provides access to the H₂O JVM (and extensions thereof), its objects, its machine-learning algorithms, and modeling support (basic munging and feature generation) capabilities.

It is designed to bring H₂O to a wider audience of data and machine learning devotees that work exclusively with Javascript, for building machine learning applications or doing data munging in a fast, scalable environment without any extra mental anguish about threads and parallelism.

H₂O also supports R, Python, Scala and Java.

What is H₂O?

H₂O is a piece of Java software for data modeling and general computing. There are many different views of the H₂O software, but the primary view of H₂O is that of a distributed (many machines), parallel (many CPUs), in memory (several hundred GBs Xmx) processing engine.

There are two levels of parallelism:

within node
across (or between) node.

The goal, remember, is to "simply" add more processors to a given problem in order to produce a solution faster. The conceptual paradigm MapReduce (also known as "divide and conquer and combine") along with a good concurrent application structure (c.f. jsr166y and NonBlockingHashMap) enable this type of scaling in H₂O (we’re really cooking with gas now!).

For application developers and data scientists, the gritty details of thread-safety, algorithm parallelism, and node coherence on a network are concealed by simple-to-use REST calls that are all documented here. In addition, H₂O is an open-source project under the Apache v2 licence. All of the source code is on Github, there is an active Google Group mailing list, our nightly tests are open for perusal, our JIRA ticketing system is also open for public use. Last, but not least, we regularly engage the machine learning community all over the nation with a very busy meetup schedule (so if you’re not in The Valley, no sweat, we’re probably coming to you soon!), and finally, we host our very own H₂O World conference. We also sometimes host hack-a-thons at our campus in Mountain View, CA. Needless to say, there is a lot of support for the application developer.

In order to make the most out of H₂O, there are some key conceptual pieces that are helpful to know before getting started. Mainly, it’s helpful to know about the different types of objects that live in H₂O and what the rules of engagement are in the context of the REST API (which is what any non-JVM interface is all about).

Let’s get started!

The H₂O Object System

H₂O sports a distributed key-value store (the "DKV"), which contains pointers to the various objects that make up the H₂O ecosystem. The DKV is a kind of biosphere in that it encapsulates all shared objects (though, it may not encapsulate all objects). Some shared objects are mutable by the client; some shared objects are read-only by the client, but mutable by H₂O (e.g. a model being constructed will change over time); and actions by the client may have side-effects on other clients (multi-tenancy is not a supported model of use, but it is possible for multiple clients to attach to a single H₂O cloud).

Briefly, these objects are:

Key: A key is an entry in the DKV that maps to an object in H₂O.
Frame: A Frame is a collection of Vec objects. It is a 2D array of elements.
Vec: A Vec is a collection of Chunk objects. It is a 1D array of elements.
Chunk: A Chunk holds a fraction of the BigData. It is a 1D array of elements.
ModelMetrics: A collection of metrics for a given category of model.
Model: A model is an immutable object having predict and metrics methods.
Job: A Job is a non-blocking task that performs a finite amount of work.

Many of these objects have no meaning to an end Javascript user, but in order to make sense of the objects available in this module it is helpful to understand how these objects map to objects in the JVM (because after all, this module is an interface that allows the manipulation of a distributed system).

(To be continued...)

Details

Downloads (weekly)
Updated: April 1, 2015
Created: March 20, 2015
Package score learn more: Package Security
ES Module Entrypoint Info
Export Map Info
Keywords
License
README
Repository URL
TypeScript Types Info
License: MIT
Dependencies: 5
Links: Package Security snyk.io

Homepage github.com
Issues github.com
Repository github.com
Keywords: machine learning predictive analytics predictive modeling data mining computational statistics statistics statistical learning clustering classification regression deep learning
Collaborators: @lo5