1cli

Use JS code snippets for command-line file munging

Usage (no npm install needed):

<script type="module">
  import cli from 'https://cdn.skypack.dev/1cli';
</script>

Use JavaScript code snippets within "one-line" shell command-lines for file munging.

  • Special support for json and jsonl files, comma-separated csv files, tab-separated tsv files and pipe (|) separated psv files.

  • Short names for useful functionality:

    • _ represents the current "line".

    • _p(...) like console.log(...).

    • _j(o) converts o to/from json.
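
The to/from behaviour of _j can be pictured with a small sketch. The function toFromJson below is a hypothetical stand-in, not 1cli's actual implementation; it assumes that strings are parsed as json and that any other value is serialized.

```javascript
// Hypothetical stand-in for _j (an assumption, not 1cli's actual source):
// strings are parsed as JSON, any other value is serialized to JSON.
function toFromJson(arg) {
  return typeof arg === 'string' ? JSON.parse(arg) : JSON.stringify(arg);
}

const text = toFromJson({ user: 'john', course: 'cs123' });
console.log(text);                   // {"user":"john","course":"cs123"}
console.log(toFromJson(text).user);  // john
```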

For example, to convert a .csv file to a .jsonl file:

# show contents of csv file:
$ cat examples/vm-ips.csv 
user,course,ip
john,cs123,192.168.1.2
bill,cs223,192.168.1.3
mary,cs123,10.1.2.3
sue,cs223,192.168.1.4
# convert
$ 1cli -e '_p(_j(_))' examples/vm-ips.csv > ~/tmp/vm-ips.jsonl
# show result
$ cat ~/tmp/vm-ips.jsonl 
{"user":"john","course":"cs123","ip":"192.168.1.2"}
{"user":"bill","course":"cs223","ip":"192.168.1.3"}
{"user":"mary","course":"cs123","ip":"10.1.2.3"}
{"user":"sue","course":"cs223","ip":"192.168.1.4"}
$

The .csv extension on the input file causes it to be read into a list of objects, one per data row, with the first line supplying the keys. The -e (--eval) option runs the provided code block _p(_j(_)) once for each csv object, printing it out as json.
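
In plain JavaScript the same conversion might look roughly like the sketch below. The parseCsv helper is hypothetical and is not part of 1cli; it assumes simple input with no quoted fields or embedded commas.

```javascript
// Hypothetical helper (not part of 1cli): parse header-first CSV text into
// an array of objects. Assumes no quoted fields or embedded commas.
function parseCsv(text) {
  const [header, ...rows] = text.trim().split('\n');
  const keys = header.split(',');
  return rows.map((row) => {
    const cells = row.split(',');
    return Object.fromEntries(keys.map((k, i) => [k, cells[i]]));
  });
}

// First and third data rows of examples/vm-ips.csv.
const csv = 'user,course,ip\njohn,cs123,192.168.1.2\nmary,cs123,10.1.2.3\n';

// Equivalent of running _p(_j(_)) once per "line".
const jsonl = parseCsv(csv).map((line) => JSON.stringify(line));
console.log(jsonl.join('\n'));
```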

The following example shows converting the same file vm-ips.csv into a json file with keys for each user:

$ 1cli -e 'BEGIN o={}' -e 'o[_.user]=_' -e 'END _p(_j(o))' \
     examples/vm-ips.csv >~/tmp/vm-ips.json
# show output (edited to limit line length)
$ cat ~/tmp/vm-ips.json
{"john":{"user":"john","course":"cs123","ip":"192.168.1.2"},
 "bill":{"user":"bill","course":"cs223","ip":"192.168.1.3"},
 "mary":{"user":"mary","course":"cs123","ip":"10.1.2.3"},
 "sue":{"user":"sue","course":"cs223","ip":"192.168.1.4"}
}
$

The BEGIN code block is executed only once at the start of the script and is used to set up an empty object o. The subsequent code block runs for each csv object and adds it to o under its user key. The END code block is executed only once at the end and prints o as json.
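
The three blocks correspond roughly to the following plain JavaScript. This is only a sketch of the control flow, with the rows written out by hand rather than read from the file:

```javascript
// Rows as they would arrive from vm-ips.csv (first two shown for brevity).
const rows = [
  { user: 'john', course: 'cs123', ip: '192.168.1.2' },
  { user: 'bill', course: 'cs223', ip: '192.168.1.3' },
];

const o = {};                         // BEGIN o={}
for (const _ of rows) o[_.user] = _;  // o[_.user]=_ for each "line"
console.log(JSON.stringify(o));       // END _p(_j(o))
```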

The following example filters vm-ips.csv, printing as jsonl only those csv objects whose ip starts with "10.":

$ 1cli -e '_.ip.startsWith("10.") && _p(_j(_))' examples/vm-ips.csv
{"user":"mary","course":"cs123","ip":"10.1.2.3"}
$
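
The test-then-print idiom used for filtering is equivalent to an ordinary filter, sketched below in plain JavaScript. The third row is made up for illustration and is not in vm-ips.csv; it shows why the trailing dot in the prefix matters.

```javascript
const rows = [
  { user: 'john', ip: '192.168.1.2' },
  { user: 'mary', ip: '10.1.2.3' },
  { user: 'zed', ip: '100.7.7.7' },  // made-up row, not in vm-ips.csv
];

// Keep only addresses in 10.x.x.x; the prefix "10" alone would also
// match 100.7.7.7.
const matches = rows.filter((r) => r.ip.startsWith('10.'));
matches.forEach((r) => console.log(JSON.stringify(r)));
```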

The following example uses the previously generated vm-ips.json file along with a new student-info.jsonl file to join the entries that belong to the same user. Note that the user field is matched against the portion of the email before the @ in the student-info.jsonl file:

# show student-info.jsonl input file
$ cat examples/student-info.jsonl 
{ "id": "123-465-mar", "first": "Mary", "last": "Traub", "email": "mary@x.com" }
{ "id": "132-456-sue", "first": "Sue", "last": "Rawls", "email": "sue@x.com" }
{ "id": "123-456-jhn", "first": "John", "last": "Smith", "email": "john@x.com" }
{ "id": "123-456-bil", "first": "Bill", "last": "Gray", "email": "bill@x.com" }
# join user from vm-ips.json to email id from student-info.jsonl
$ 1cli -e 'u=_.email.m(/^[^@]+/)[0]; x={..._contents[0][u], ..._}; _p(_j(x))' \
    ~/tmp/vm-ips.json examples/student-info.jsonl \
    > ~/tmp/student-vm-ips.jsonl
# show results (edited for line-length)
$ cat ~/tmp/student-vm-ips.jsonl 
{"user":"mary","course":"cs123","ip":"10.1.2.3","id":"123-465-mar",
    "first":"Mary","last":"Traub","email":"mary@x.com"}
{"user":"sue","course":"cs223","ip":"192.168.1.4","id":"132-456-sue",
    "first":"Sue","last":"Rawls","email":"sue@x.com"}
{"user":"john","course":"cs123","ip":"192.168.1.2","id":"123-456-jhn",
    "first":"John","last":"Smith","email":"john@x.com"}
{"user":"bill","course":"cs223","ip":"192.168.1.3","id":"123-456-bil",
    "first":"Bill","last":"Gray","email":"bill@x.com"}
$
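
The join logic can be sketched in plain JavaScript as follows. The variable names vmIps and students are illustrative only, and just two records from each file are written out by hand.

```javascript
// Contents of the first file (vm-ips.json), keyed by user; this plays the
// role of _contents[0] in the command above.
const vmIps = {
  mary: { user: 'mary', course: 'cs123', ip: '10.1.2.3' },
  sue: { user: 'sue', course: 'cs223', ip: '192.168.1.4' },
};

// "Lines" of student-info.jsonl after JSON parsing.
const students = [
  { id: '123-465-mar', first: 'Mary', last: 'Traub', email: 'mary@x.com' },
  { id: '132-456-sue', first: 'Sue', last: 'Rawls', email: 'sue@x.com' },
];

const joined = students.map((s) => {
  const u = s.email.match(/^[^@]+/)[0];  // text before the @
  return { ...vmIps[u], ...s };          // vm fields first, then student fields
});
joined.forEach((x) => console.log(JSON.stringify(x)));
```

Because the vm-ips entry is spread first, its keys come first in the output and any key present in both objects takes its value from the student record.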

Here is the manual produced using the --man option:

$ 1cli --man
Usage: 1cli [options] [...path]

Options:
  -d, --debug                  output generated functions on stderr
  -e, --eval <code>            evaluate <code>. Can be specified multiple
                               times. If --loop, then evaluate for each _
                               "line". If <code> starts with BEGIN/END then
                               evaluate only at start/end.  (default: [])
  -f, --field-sep <sep>        use <sep> to split _ line into _0, _1, ... when
                               --loop (default: "/\\s+/")
  -h, --help                   display help for command
  -l, --no-loop                run code only once
  -L, --loop                   repeat code for each _ "line" (default: true)
  -m, --no-monkey              do not monkey-patch standard classes
  -M, --monkey                 monkey-patch standard classes (default: true)
  --man                        output manual on stdout
  -p, --no-print               do not print _ "line" after each loop iteration
  -P, --print                  print _ "line" after each loop iteration
                               (default: false)
  --src <path>                 like specifying <path> in [...path]; always
                               recognize --ext extensions and loop over "lines"
                               when applicable
  --src-l <path>               alias for --src-no-loop
  --src-lx <path>              alias for --src-no-ext-no-split
  --src-no-ext <path>          like specifying <path> in [...path]; loop over
                               "lines" but do not recognize special -X
                               extensions
  --src-no-loop <path>         like specifying <path> in [...path]; recognize
                               --ext extensions but do not loop over "lines"
  --src-no-loop-no-ext <path>  like specifying <path> in [...path] arguments;
                               do not recognize special extensions; do not loop
                               over "lines"
  --src-x <path>               alias for --src-no-ext
  -v, --version                output the version number
  -x, --no-ext                 no special handling for extensions in [path...]
  -X, --ext                    special handling for json, jsonl, csv, psv, tsv
                               extensions in [path...] (default: true)

Process files specified by [...path] or by --src* options
using --eval <code> blocks. 

If a path is specified as '-', then read from standard input; if an
extension is required, then attempt to guess an extension based on the
initial content.

If --loop, then repeat <code> blocks for each "line" of file contents.
A <code> block starting with 'BEGIN' is executed only once at the
start. A <code> block starting with 'END' is executed only once at the
end.

Unless extension processing has been turned off by specifying
--no-ext or by using the --src-*no-ext options, the following
special extensions are recognized:

  .csv:       parsed as comma-separated CSV with first line as header
  .json:      parsed as JSON content; never split into lines.
  .jsonl:     each line parsed as JSON; always split into lines
  .psv:       parsed as pipe '|' separated CSV with first line as header
  .tsv:       parsed as tab-separated CSV with first line as header


Note that all of the above extensions except .json are read
in as an array of objects and processed within the --eval
loop blocks (unless --no-loop or --src-no-loop* is specified).

The code for each block has access to the following constants:

  _contents:  array of contents of all files specified by <path...> or --src
  _d:         _d(path): return array of contents of directory dir
  _entries:   _entries(obj) => Object.entries(obj)
  _f:         _f(path): returns array of "lines" from path
  _j:         _j(arg) => convert arg to/from JSON
  _keys:      _keys(obj) => Object.keys(obj)
  _p:         _p(...) is an alias for console.log(...)
  _paths:     array of paths of all files specified by <path...> or --src
  _values:    _values(obj) => Object.values(obj)
  _x:         _x(cmd) returns stdout for executing shell command cmd


When a block is being executed repeatedly because
of the --loop option, it has access to the following additional
variables:

  _:          current "line" being processed
  _c:         contents of current path
  _n:         current line number (1-origin)
  _path:      current path being processed


Specifying --monkey-patch, patches standard classes with
convenience methods:

  m:          str.m(...) => str.match(...); results[0, 1]...] put into $0, $1...
  r:          str.r(...) => str.replace(...)
  s:          str.s(...) => str.split(...)


$
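
As an illustration of what the monkey-patched short names amount to, the sketch below adds m, r and s to String.prototype. It is an assumption about how such aliases could be written, not 1cli's actual source, and it does not model the $0, $1... capture variables.

```javascript
// Illustrative only: an assumption about how short aliases like these could
// be added, not 1cli's actual source. The $0, $1... capture variables are
// not modelled here.
Object.defineProperties(String.prototype, {
  m: { value(...args) { return this.match(...args); }, configurable: true },
  r: { value(...args) { return this.replace(...args); }, configurable: true },
  s: { value(...args) { return this.split(...args); }, configurable: true },
});

console.log('mary@x.com'.m(/^[^@]+/)[0]);  // mary
console.log('10.1.2.3'.s('.').length);     // 4
console.log('cs123'.r(/^cs/, 'CS'));       // CS123
```

Patching built-in prototypes is convenient in throwaway one-liners but can clash with other code, which is presumably why --no-monkey is offered.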