
AranAccess

Deprecated: This repository is no longer maintained; we encourage you to switch to lachrist/linvail instead. In particular, this module is not compatible with aran@3.0.0 and later. This module was originally named linvail. Because it was tightly coupled to the JavaScript instrumenter Aran, we renamed it aran-access. However, we have since removed all dependencies between the two modules, so we decided to switch back to its original name: linvail.

AranAccess is an npm module that implements an access control system around JavaScript code transformed by Aran. Its motivation is to enable dynamic analyses capable of tracking primitive values across the object graph.

Getting Started

```
npm install acorn aran astring aran-access
```

```
const Acorn = require("acorn");
const Aran = require("aran");
const Astring = require("astring");
const AranAccess = require("aran-access");

// Every value entering the membrane is paired with a unique shadow label.
let counter = 0;
const access = AranAccess({
  enter: (value) => ({concrete:value, shadow:"#"+(counter++)}),
  leave: (value) => value.concrete
});
// Weave a trap for every entry of the advice.
const pointcut = Object.keys(access.advice);
const aran = Aran({
  namespace: "TRAPS",
  sandbox: true,
  pointcut: pointcut
});
// `scope` is the second argument received by transform (see API below).
access.membrane.transform = (script, scope) =>
  Astring.generate(aran.weave(Acorn.parse(script), pointcut, scope));

// Override two traps to log the shadow labels they produce.
global.TRAPS = Object.assign({}, access.advice);
global.TRAPS.primitive = (primitive, serial) => {
  const result = access.advice.primitive(primitive, serial);
  console.log(result.shadow+"("+result.concrete+") // @"+serial);
  return result;
};
global.TRAPS.binary = (operator, left, right, serial) => {
  const result = access.advice.binary(operator, left, right, serial);
  console.log(result.shadow+"("+result.concrete+") = "+left.shadow+" "+operator+" "+right.shadow+" // @"+serial);
  return result;
};

global.eval(Astring.generate(aran.setup()));
global.eval(access.membrane.transform(`
let division = {};
division.dividend = 1 - 1;
division.divisor = 20 - 2 * 10;
division.result = division.dividend / division.divisor;
if (isNaN(division.result))
  console.log("!!!NaN division result!!!");
`, "global"));
```

```
#6(apply) // @2
#10(defineProperty) // @2
#14(getPrototypeOf) // @2
#18(keys) // @2
#22(iterator) // @2
#24(undefined) // @2
#26(dividend) // @6
#27(1) // @9
#28(1) // @10
#29(0) = #27 - #28 // @8
#30(divisor) // @12
#31(20) // @15
#32(2) // @17
#33(10) // @18
#34(20) = #32 * #33 // @16
#35(0) = #31 - #34 // @14
#36(result) // @20
#37(dividend) // @23
#38(divisor) // @25
#39(NaN) = #29 / #35 // @22
#42(result) // @30
#45(log) // @33
#46(!!!NaN division result!!!) // @35
!!!NaN division result!!!
```

Demonstrators

demo/analysis/identity.js: Demonstrates the API of this module without producing any observable effect.
demo/analysis/identity-wrapper.js: Still produces no observable effect, but wraps every value entering the system and unwraps it as it leaves the system.
demo/analysis/tracer.js: Uses an identity membrane and logs every operation.
demo/analysis/wrapper: Every value entering transformed code areas is wrapped to provide a well-defined identity. Every wrapper leaving transformed code areas is unwrapped so the behavior of the base program is not altered. Wrapping and unwrapping operations are logged.
demo/analysis/concolic: In this very simple concolic executor, primitive values produced by literals, binary operations and unary operations are all considered symbolic. It also uses a wrapper membrane but overrides a couple of traps to log data dependencies.
demo/analysis/dependency: Same as above, but every trap logs the data flow.

API

Categories

A wild value is either:
- a primitive
- a reference which satisfies the below constraints:
  - its prototype is a wild value
  - the values of its data properties are wild values
  - the getters and setters of its accessor properties are wild values
  - applying it with a wild value this-argument and wild value arguments will return a wild value
  - constructing it with wild value arguments will return a wild value
A tame value is either:
- a primitive
- a reference which satisfies the below constraints:
  - its prototype is a tame value
  - the values of its data properties are inner values, except if the reference is an array; in that case, its length property remains a wild value
  - the getters and setters of its accessor properties are tame values
  - applying it with a tame value this-argument and inner value arguments will return an inner value
  - constructing it with inner value arguments will return an inner value
An inner value is whatever the user-defined enter function returns; leave converts it back to a tame value.
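To make the data-property rule more concrete, here is a toy, shallow sketch. It assumes the `{concrete, shadow}` inner representation from the Getting Started example, copies plain objects and arrays instead of intercepting them, and only handles primitive-valued data properties. `toyCapture` and `toyRelease` are hypothetical helpers, not this module's `capture` and `release`.

```javascript
// Inner representation borrowed from the Getting Started example (an assumption).
let counter = 0;
const enter = (value) => ({ concrete: value, shadow: "#" + counter++ });
const leave = (inner) => inner.concrete;

// Hypothetical: build a tame copy whose data properties hold inner values.
// Only own enumerable keys are visited, so an array's length is never wrapped:
// the array maintains it itself and it stays a plain number.
const toyCapture = (wild) => {
  const tame = Array.isArray(wild) ? [] : {};
  for (const key of Object.keys(wild)) {
    tame[key] = enter(wild[key]);
  }
  return tame;
};

// Hypothetical inverse: unwrap every data property with leave.
const toyRelease = (tame) => {
  const wild = Array.isArray(tame) ? [] : {};
  for (const key of Object.keys(tame)) {
    wild[key] = leave(tame[key]);
  }
  return wild;
};

const tamePoint = toyCapture({ x: 1, y: 2 });
console.log(tamePoint.x); // { concrete: 1, shadow: '#0' }
console.log(toyRelease(tamePoint)); // { x: 1, y: 2 }
const tameList = toyCapture([true, null]);
console.log(tameList.length); // 2
```

A real membrane has to handle references, prototypes, accessors and calls as described above, which a shallow copy cannot do.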

`access = require("aran-access")(membrane)`

`{advice, membrane, capture, release} = require("aran-access")({enter, leave, check, transform})`

inner = enter(tame): User-defined function to convert a tame value to an inner value.
tame = leave(inner): User-defined function to convert an inner value to a tame value.
check :: boolean, default false: Indicates whether runtime checks should be performed inside capture and release to detect type errors between tame and wild values. To improve the chance that these checks spot a type error at the right place, membrane.enter and membrane.leave can be rewritten as:
```
// User-defined conversions (identity functions here as placeholders).
const enter = (value) => value; // tame value -> inner value
const leave = (value) => value; // inner value -> tame value
// Calling access.release on each tame value and discarding the result
// lets the runtime checks flag a non-tame value as early as possible.
membrane.enter = (value) => {
  access.release(value);
  return enter(value);
};
membrane.leave = (value) => {
  const result = leave(value);
  access.release(result);
  return result;
};
```
transformed = transform(original, scope): This function will be called to transform code before it is passed to the infamous eval function. If membrane.transform is not defined, access.advice.eval will throw an exception.
advice :: object: An Aran advice; it contains Aran traps and a SANDBOX property whose value is set to access.capture(global). The user can modify the advice before letting Aran use it.
access.membrane :: object: The same value as the given argument.
tame = access.capture(wild): Convert a wild value into a tame value.
wild = access.release(tame): Convert a tame value into a wild value.

Discussion

Aran, and program transformation in general, is good for introspecting the control flow and the data flow of pointers. Things become more difficult when reasoning about the data flow of primitive values. For instance, there is no way at the JavaScript language level to differentiate two null values, even though they have different origins. This restriction affects every JavaScript primitive value because primitives are copied into different parts of the program's state, e.g. the environment and the value stack. All this copying blurs the concept of a primitive value's identity and lifetime. By contrast, objects can be properly differentiated based on their address in the store. This situation arises in almost every mainstream programming language. We now discuss several strategies to provide an identity to primitive values:
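A small illustration of the point above: two `null` values produced at different program locations compare as identical, so nothing at the language level records where each one came from, whereas two objects with the same content keep distinct identities. The function names are illustrative only.

```javascript
// Two different origins for null: a literal and a property read.
const fromLiteral = () => null;
const fromProperty = () => {
  const record = { field: null };
  return record.field;
};

// The primitives are indistinguishable: their origin is lost.
console.log(fromLiteral() === fromProperty()); // true
console.log(Object.is(fromLiteral(), fromProperty())); // true

// Two objects with the same content still have distinct identities.
const first = { value: null };
const second = { value: null };
console.log(first === second); // false
```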

Shadow States: For low-level languages such as binary code, primitive values are often tracked by maintaining a so-called "shadow state" that mirrors the concrete program state. This shadow state contains analysis-related information about the program values situated at the same location in the concrete state. Valgrind is a popular binary instrumentation framework which uses this technique to enable many data-flow analyses. The difficulty of this technique lies in maintaining the shadow state while non-instrumented functions are being executed. In JavaScript, this problem typically arises when objects are passed to non-instrumented functions such as builtins. Keeping the shadow state in sync during such operations requires knowing the exact semantics of the non-instrumented function. Since there are so many different builtin functions in JavaScript, this is very hard to do.
Record And Replay: Record and replay systems such as Jalangi are an intelligent response to the challenge of keeping the shadow state in sync with the concrete state. Acknowledging that divergences between shadow and concrete states cannot be completely avoided, these systems allow divergences in the replay phase, which can be recovered from by utilizing the trace gathered during the record phase. We propose two arguments against this technique. First, every time a divergence is resolved in the replay phase, values of unknown origin are introduced, which necessarily diminishes the precision of the resulting analysis. Second, the replay phase only provides information about partial executions, which can be puzzling to reason about.
Wrappers: Instead of providing an entirely separate shadow state, wrappers constitute a finer-grained solution. By wrapping primitive values inside objects, we can simply let them propagate through the data flow of the base program. The challenge introduced by wrappers is to make them behave like their wrapped primitive value when they reach non-transformed code. We explore three solutions to this challenge:
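To see why non-instrumented builtins are the hard part, consider a hypothetical shadow store that mirrors each object's properties with analysis labels. As long as writes go through the instrumented helper, concrete and shadow states agree; a builtin such as `Array.prototype.reverse` moves concrete values without informing the shadow store, so the two states diverge. `shadowStore`, `writeWithShadow` and `readShadow` are illustrative names, not part of this module.

```javascript
// Hypothetical shadow store: object -> (property key -> shadow label).
const shadowStore = new WeakMap();

// Instrumented write: update the concrete state and the shadow state together.
const writeWithShadow = (object, key, value, shadow) => {
  object[key] = value;
  if (!shadowStore.has(object)) {
    shadowStore.set(object, new Map());
  }
  shadowStore.get(object).set(String(key), shadow);
};

const readShadow = (object, key) => {
  const shadows = shadowStore.get(object);
  return shadows === undefined ? undefined : shadows.get(String(key));
};

const letters = [];
writeWithShadow(letters, 0, "a", "#a-origin");
writeWithShadow(letters, 1, "b", "#b-origin");
console.log(letters[0], readShadow(letters, 0)); // a #a-origin

// A non-instrumented builtin rearranges the concrete state only.
letters.reverse();
// The concrete value at index 0 is now "b", but its shadow still says "#a-origin".
console.log(letters[0], readShadow(letters, 0)); // b #a-origin
```

Fixing this would require a hand-written shadow model of `reverse`, and of every other builtin, which is exactly the burden described above.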
- Boxed Values: JavaScript makes it possible to box booleans, numbers and strings. Besides the fact that symbols, undefined and null cannot be tracked by this method, boxed values do not always behave like their primitive counterparts within builtins.
```
const assert = require("assert");
// Strings cannot be differentiated based on their origin
let string1 = "abc";
let string2 = "abc";
assert(string1 === string2);
// Boxed strings can be differentiated based on their origin
let boxed_string1 = new String("abc");
let boxed_string2 = new String("abc");
assert(boxed_string1 !== boxed_string2);
// Boxed values behave as primitives in some builtins:
assert(JSON.stringify({a:string1}) === JSON.stringify({a:boxed_string1}));
// In others, they don't: defining a property on a primitive throws a TypeError...
let error;
try {
  Object.defineProperty(string1, "foo", {value:"bar"});
} catch (e) {
  error = e;
}
assert(error instanceof TypeError);
// ...whereas it succeeds on a boxed string
Object.defineProperty(boxed_string1, "foo", {value:"bar"});
```
- valueOf Method: A mechanism similar to boxed values is to use the valueOf method. Many JavaScript builtins expecting a primitive value but receiving an object will try to convert this object into a primitive using its valueOf method. As with boxed values, this solution is not bulletproof: there are many cases where the valueOf method will not be invoked.
- Explicit Wrappers: Finally, the last option consists in using explicit wrappers, which must be cleaned up before they escape to non-instrumented code. This requires setting up an access control system between instrumented and non-instrumented code. This is the solution this module directly enables.
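The valueOf approach from the list above can be sketched as follows. `makeWrapper` is a hypothetical helper: it returns an object carrying a shadow label whose `valueOf` yields the wrapped primitive. Arithmetic and loose comparisons call `valueOf`, but strict equality, `typeof` and truthiness tests do not, which is one way such wrappers leak.

```javascript
// Hypothetical wrapper: keeps a shadow label next to the wrapped primitive.
const makeWrapper = (primitive, shadow) => ({
  shadow,
  valueOf: () => primitive,
});

const zero = makeWrapper(0, "#1");

// Cases where valueOf is invoked: the wrapper behaves like 0.
console.log(zero + 1); // 1
console.log(zero == 0); // true
console.log(zero < 1); // true

// Cases where valueOf is not invoked: the wrapper does not behave like 0.
console.log(zero === 0); // false
console.log(typeof zero); // object
console.log(zero ? "truthy" : "falsy"); // truthy
```

The last line is the most troublesome: the primitive `0` is falsy, but its wrapper is truthy, so a plain `if` statement in non-instrumented code silently takes the other branch.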

Acknowledgments

I'm Laurent Christophe, a PhD student at the Vrije Universiteit Brussel (VUB). I'm working at the SOFT language lab; my promoters are Coen De Roover and Wolfgang De Meuter. I'm currently employed on the Tearless project.
