@oktupol/base-emoji

Like base32 and base64, but with emoji. Uses 1024 unique emoji as its character set.

Usage (no npm install needed):

<script type="module">
  import oktupolBaseEmoji from 'https://cdn.skypack.dev/@oktupol/base-emoji';
</script>


๐Ÿ‘ช๐Ÿ—พ๐Ÿคต๐Ÿฏ Base Emoji ๐Ÿฆง๐Ÿฅ…๐Ÿ”๐Ÿš

There is base32, there is base64, now there is base-emoji!

Installation

Install base-emoji as a CLI executable using npm:

npm install -g @oktupol/base-emoji

or as a library inside your JavaScript or TypeScript project:

npm install @oktupol/base-emoji

Usage

CLI

  • Encode data from stdin:

    echo 'Hello World' | base-emoji
    
    ==> ๐Ÿ…๐Ÿš“๐Ÿ“ฟ๐Ÿ™‰๐Ÿค๐Ÿ๐Ÿ•Ž๐Ÿšฅ๐ŸŒฟ๐Ÿค›๐Ÿ•“
    
  • Decode with the -d flag:

    echo '๐ŸŽ๐Ÿป๐Ÿช–๐Ÿฆญ๐Ÿƒ๐Ÿป๐Ÿชถ๐Ÿฆˆ๐Ÿ†๐ŸŒ—๐Ÿ‘ฉ๐Ÿถ๐Ÿ•—' | base-emoji -d
    
    ==> I like emojis
    
  • Encode or decode data from a file

    cat.jpg (image: 2009, Michael Wilson, CC BY-NC-ND 2.0)

    base-emoji cat.jpg
    
    ==>
    โžฟ๐ŸŒพ๐Ÿ“›๐Ÿคน๐Ÿคœ๐Ÿ˜ก๐Ÿ—ป๐Ÿฆ•๐Ÿ˜€๐Ÿ˜†๐Ÿ“–๐Ÿคน๐Ÿ’…๐Ÿ˜€๐Ÿ˜€๐Ÿ™‚๐Ÿ˜€๐Ÿคช๐Ÿ™๐Ÿคน๐Ÿ˜˜๐Ÿ˜€๐Ÿ˜€๐Ÿ˜ƒ๐Ÿ˜€๐Ÿ˜€๐Ÿคฃ๐Ÿถ๐Ÿ˜€๐Ÿ˜€๐Ÿ˜€๐Ÿ˜€
    ๐Ÿ˜€๐Ÿ˜€๐Ÿ˜€๐Ÿคพ๐Ÿชฃ๐Ÿ™‚๐Ÿƒ๐Ÿ˜ป๐Ÿง‡๐Ÿ“บ๐Ÿ•Ž๐Ÿงพ๐Ÿง‡๐Ÿฅป๐Ÿ˜‡๐ŸŽท๐Ÿ‘จ๐Ÿ˜๐Ÿฅ„๐Ÿš‡๐Ÿช๐Ÿ˜Ÿ๐Ÿคน๐Ÿ˜€๐Ÿ˜€๐Ÿ˜€๐Ÿค‘๐Ÿฆ๐Ÿ˜…๐Ÿ‘๐Ÿ˜€๐Ÿ“ฟ
    ๐Ÿค˜๐Ÿ’‹๐Ÿ‘—๐Ÿคน๐Ÿ˜€๐Ÿคจ...
    

    cat.jpg.emoji: the full output of the above command

  • Redirect the output of any command into a file:

    base-emoji -d dog.jpg.emoji > dog.jpg
    
  • When encoding, optionally use the -a flag to armor the output

    base-emoji -a some-document.pdf
    
    ==> 
    ๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ข๐Ÿ’๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต
    ๐Ÿฆ๐Ÿ‘ญ๐Ÿช›๐Ÿ‘ž๐Ÿคฅ๐Ÿ‘โณ๐Ÿ˜€๐Ÿ˜€๐Ÿคด๐ŸšŽ๐Ÿ˜ฒ๐Ÿฆฅ๐Ÿ˜€๐Ÿ˜€๐Ÿ€๐Ÿ˜€๐Ÿ˜€๐Ÿค™๐Ÿฅƒ๐Ÿคช๐Ÿ˜€๐Ÿ˜€๐Ÿƒ๐Ÿงช๐Ÿšฟ๐Ÿ’พ๐Ÿ˜€๐Ÿ˜€๐Ÿ˜ฆ๐Ÿ‘ฎ๐Ÿš‡
    ...
    ๐Ÿ”ƒ๐Ÿ˜€๐Ÿ˜€๐Ÿ˜€๐Ÿฆ„๐Ÿ˜ซ๐Ÿช›๐Ÿฆถ๐Ÿ‘ช๐Ÿฅƒ๐Ÿ–ค๐Ÿ•“
    ๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ข๐Ÿ’”๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต
    
  • When encoding with armor, optionally use the --descriptor option to specify a descriptor

    gpg --export-secret-key my@email.tld | base-emoji -a --descriptor '๐Ÿคซ๐Ÿ”‘๐Ÿ™Š'
    
    ==>
    ๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿคซ๐Ÿ”‘๐Ÿ™Š๐Ÿ’๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต
    ๐Ÿ•ง๐Ÿ’ฆ๐Ÿฆฒ๐Ÿ‘๐Ÿ•ž๐Ÿง๐Ÿช๐Ÿซ•๐Ÿ“ค๐Ÿฅฏ๐Ÿฆญ๐Ÿฅฌ๐Ÿšธ๐Ÿชฆ๐Ÿ‡๐Ÿชถ๐Ÿฏ๐Ÿธ๐ŸฅŠโž–๐Ÿงโžฟ๐Ÿช ๐ŸŽ๐Ÿชฅ๐ŸฅŒ๐Ÿ๐Ÿ”™๐Ÿฆ๐Ÿง‚๐Ÿ•ž๐Ÿด
    ...
    ๐Ÿšฃ๐Ÿšถ๐Ÿ’’๐Ÿฆ”๐Ÿฆƒ๐Ÿ‘‚๐ŸŽฑ๐Ÿ˜’๐ŸŒฑโ›…๐ŸŒต๐Ÿ•“
    ๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿคซ๐Ÿ”‘๐Ÿ™Š๐Ÿ’”๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต    
    
    
  • For a complete list of available options, run

    base-emoji --help
    

Inside a Node project

The base-emoji library can be imported using

CommonJS:

const { BaseEmoji } = require('@oktupol/base-emoji');

ES6, TypeScript:

import { BaseEmoji } from '@oktupol/base-emoji';

There are two functions:

BaseEmoji.encode()

Usage:

const result = BaseEmoji.encode(data, options);

Parameters:

  • data (required) - one of:

    • a string
    • an ArrayBufferLike (e.g. ArrayBuffer, Uint8Array)
  • options (optional) - an object with the following structure; all keys are optional:

    {
      armor?: boolean;
      armorDescriptor?: string;
      wrap?: number;
    }
    
    • armor - if true, the resulting output will be armored.
    • armorDescriptor - when armored, this value is used in the header and footer of the output.
    • wrap - if provided, the output is wrapped after every n characters.
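The description of wrap does not say what counts as a "character". If it means emoji symbols, keep in mind that a JavaScript string's length counts UTF-16 code units, so each emoji outside the Basic Multilingual Plane counts as two. The helper below is a hypothetical sketch (wrapSymbols is not part of @oktupol/base-emoji) that wraps by code point instead, assuming each emoji is a single code point:

```typescript
// Hypothetical helper, not part of @oktupol/base-emoji.
// Splits an encoded string into lines of `n` emoji each, assuming every
// emoji is a single Unicode code point.
function wrapSymbols(encoded: string, n: number): string {
  // Array.from iterates a string by code point, so a surrogate pair such as
  // U+1F600 (two UTF-16 code units) stays together as one element.
  const symbols = Array.from(encoded);
  const lines: string[] = [];
  for (let i = 0; i < symbols.length; i += n) {
    lines.push(symbols.slice(i, i + n).join(""));
  }
  return lines.join("\n");
}

const sample = "😀😟🌒🕕";
console.log(sample.length);          // 8: UTF-16 code units, not emoji
console.log(wrapSymbols(sample, 2)); // "😀😟\n🌒🕕"
```

Slicing the raw string with substring(0, n) instead would cut emoji in half whenever n is odd.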

BaseEmoji.decode()

Usage:

const result = BaseEmoji.decode(data, options);

Parameters:

  • data (required) - A base-emoji encoded string
  • options (optional) - an object with the following structure; all keys are optional:
    {
      output?: 'string' | 'binary';
    }
    
    • output - if 'string', the decoded data is returned as a string; if 'binary', it is returned as a Uint8Array.

How does it work

The principle is identical to that of base64. In base64, the data bits are regrouped from their original 8-bit bytes into 6-bit groups, of which there are 64 possible values, and each group is then represented by one ASCII character.

bytes  |    104 = h    |    105 = i    |     33 = !    | ...
DATA   |0 1 1 0 1 0.0 0'0 1 1 0.1 0 0 1'0 0.1 0 0 0 0 1| ...
base64 |   26 = a  |    6 = G  |   36 = k  |   33 = h  | ...

Therefore, the base64 representation of hi! is aGkh.
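The regrouping can be sketched as a small TypeScript encoder. This is a minimal illustration using the standard base64 alphabet with "=" padding, not production code; toBase64 is a hypothetical name:

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Regroups 8-bit bytes into 6-bit values and maps each to one ASCII character.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  let buffer = 0; // pending bits, most significant first
  let bits = 0;   // number of pending bits in `buffer`
  for (let i = 0; i < bytes.length; i++) {
    buffer = (buffer << 8) | bytes[i];
    bits += 8;
    // Emit every complete 6-bit group.
    while (bits >= 6) {
      bits -= 6;
      out += B64.charAt((buffer >> bits) & 0x3f);
    }
    buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) {
    // Left-align the leftover bits in a final 6-bit group.
    out += B64.charAt((buffer << (6 - bits)) & 0x3f);
  }
  while (out.length % 4 !== 0) out += "="; // pad to a multiple of 4
  return out;
}

// "hi!" in ASCII: 104, 105, 33
console.log(toBase64(new Uint8Array([104, 105, 33]))); // "aGkh"
```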

In base-emoji, 1024 (2^10) different symbols are used to represent 10-bit groups.

bytes      |    104 = h    |    105 = i    |     33 = !    |              ...
DATA       |0 1 1 0 1 0 0 0'0 1.1 0 1 0 0 1'0 0 1 0.0 0 0 1'0 0 0 0 0 0.0 ...
base-emoji |      417 = ๐Ÿ’     |      658 = ๐ŸŒ’     |       64 = ๐Ÿ˜Ÿ     |  ...

The complete list of emoji is located in emoji-map.json.
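The same regrouping with 10-bit groups can be sketched in TypeScript. This is an illustration under stated assumptions, not the library's code: it stops at the numeric values 0 to 1023, because the value-to-emoji table lives in emoji-map.json, and the names toTenBitGroups and Grouped are hypothetical. It also reports how many zero bits were appended to fill the last group:

```typescript
interface Grouped {
  indices: number[]; // one value per emoji, 0..1023
  overhang: number;  // zero bits appended to complete the last group
}

// Regroups 8-bit bytes into 10-bit values (hypothetical helper).
function toTenBitGroups(bytes: Uint8Array): Grouped {
  const indices: number[] = [];
  let buffer = 0; // pending bits, most significant first
  let bits = 0;   // number of pending bits (always < 10 between bytes)
  for (let i = 0; i < bytes.length; i++) {
    buffer = (buffer << 8) | bytes[i];
    bits += 8;
    // At most 17 bits are pending here, so at most one group is complete.
    if (bits >= 10) {
      bits -= 10;
      indices.push((buffer >> bits) & 0x3ff);
      buffer &= (1 << bits) - 1;
    }
  }
  let overhang = 0;
  if (bits > 0) {
    // Fill the final group with zero bits on the right.
    overhang = 10 - bits;
    indices.push((buffer << overhang) & 0x3ff);
  }
  return { indices, overhang };
}

const hiGroups = toTenBitGroups(new Uint8Array([104, 105, 33])); // "hi!"
console.log(hiGroups.indices.join(","), hiGroups.overhang); // 417,658,64 6
```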

Padding

Since 10 is not a multiple of 8, base-emoji-encoded data usually carries a few more bits at the end than the original data. In the above example, the base-emoji representation of the string hi! has 6 bits of overhang. Knowing the overhang is especially important once it reaches 8 bits, because otherwise it would be ambiguous whether the last 8 bits are a byte of the original data or not.

To indicate the length of the overhang, one of the following symbols is appended to the end of the base-emoji encoded string:

Padding character | 🕛 | 🕐 | 🕑 | 🕒 | 🕓 | 🕔 | 🕕 | 🕖 | 🕗 | 🕘
Bits of overhang  | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9

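The padding character for 0 bits of overhang is optional, and the characters for 1, 3, 5, 7 and 9 bits cannot occur when the input is a whole number of bytes: the input length in bits is then a multiple of 8, so its remainder modulo 10 is even, and so is the overhang.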
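The overhang, and therefore the padding character, depends only on the input length. The sketch below computes it, assuming the padding characters in the table above are the clock faces from 🕛 (0 bits) through 🕘 (9 bits); PADDING, overhangBits and paddingFor are hypothetical names:

```typescript
// Padding characters indexed by bits of overhang (0..9).
const PADDING = ["🕛", "🕐", "🕑", "🕒", "🕓", "🕔", "🕕", "🕖", "🕗", "🕘"];

// Zero bits needed to round 8 * byteCount up to a multiple of 10.
// Since 8 * byteCount is even, the result is always 0, 2, 4, 6 or 8.
function overhangBits(byteCount: number): number {
  return (10 - ((8 * byteCount) % 10)) % 10;
}

function paddingFor(byteCount: number): string {
  return PADDING[overhangBits(byteCount)];
}

console.log(overhangBits(3), paddingFor(3)); // 6 🕕  ("hi!")
```

As a cross-check against the CLI examples: echo 'Hello World' pipes 12 bytes (11 characters plus the trailing newline), giving 4 bits of overhang, and echo 'I like emojis' pipes 14 bytes, giving the 8-bit case described above. The encoded outputs of those two commands end in the 4-bit and 8-bit padding characters respectively.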

In the above example, there are six bits of overhang, so the encoded string receives the padding character 🕕. Hence, the full base-emoji representation of hi! is ๐Ÿ’๐ŸŒ’๐Ÿ˜Ÿ๐Ÿ••.
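Decoding reverses the regrouping: the 10-bit values are concatenated again and the overhang, read from the padding character, tells the decoder how many trailing zero bits to drop. The sketch below assumes the emoji-to-value lookup (via emoji-map.json) has already happened; fromTenBitGroups is a hypothetical name, not the library's API:

```typescript
// Rebuilds the original bytes from 10-bit values and the overhang (hypothetical helper).
function fromTenBitGroups(indices: number[], overhang: number): Uint8Array {
  const totalBits = indices.length * 10 - overhang; // bits of real data
  const out = new Uint8Array(Math.floor(totalBits / 8));
  let buffer = 0; // pending bits, most significant first
  let bits = 0;   // number of pending bits
  let written = 0;
  for (const value of indices) {
    buffer = (buffer << 10) | (value & 0x3ff);
    bits += 10;
    // Emit complete bytes, but never more than the data actually holds;
    // whatever remains at the end is the overhang and is discarded.
    while (bits >= 8 && written < out.length) {
      bits -= 8;
      out[written++] = (buffer >> bits) & 0xff;
    }
    buffer &= (1 << bits) - 1;
  }
  return out;
}

console.log(fromTenBitGroups([417, 658, 64], 6).join(",")); // 104,105,33 ("hi!")

// Four symbols carry 40 bits: four bytes plus 8 bits of overhang,
// or five bytes with no overhang. Only the padding tells them apart.
console.log(fromTenBitGroups([0, 0, 0, 0], 8).length); // 4
console.log(fromTenBitGroups([0, 0, 0, 0], 0).length); // 5
```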

Efficiency

All that being said, base-emoji is horribly inefficient at encoding data.

In base64, where every 6-bit group is encoded as one single-byte ASCII character, the encoded data is 4/3 times the size of the original data, i.e. around 33.3% larger.

In base-emoji, each of the 1024 symbols encodes a 10-bit group; however, these symbols are emoji, and most of them take 4 bytes each in UTF-8. An exact ratio can't be given because Unicode characters vary in encoded size, but a quick test with 1000 random bytes showed a roughly threefold increase.
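A rough bound on the size ratio follows from bits per symbol and bytes per symbol. This assumes one byte per base64 character and either 3 or 4 UTF-8 bytes per emoji, and ignores line breaks and padding:

$$
r_{\text{base64}} = \frac{8}{6} \cdot 1 = \frac{4}{3} \approx 1.33,
\qquad
r_{\text{base-emoji}} = \frac{8}{10} \cdot b,\quad b \in \{3, 4\}
\;\Rightarrow\; 2.4 \le r_{\text{base-emoji}} \le 3.2
$$

With most symbols at 4 bytes, the ratio sits close to the upper bound of 3.2, in line with the roughly threefold increase observed.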

head -c 1000 /dev/urandom | base64 | wc -c
==> 1354

head -c 1000 /dev/urandom | base32 | wc -c
==> 1622

head -c 1000 /dev/urandom | base-emoji | wc -c
==> about 3175
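The base64 and base32 figures can be reproduced exactly once line breaks are counted. The sketch below assumes GNU coreutils, whose base64 and base32 wrap output at 76 columns by default, and wc -c counting every newline; for base-emoji it only gives an upper bound for the symbols themselves (4 bytes each, no line breaks, no padding character). All function names are hypothetical:

```typescript
// Characters plus one "\n" per full or partial line (assumes chars > 0).
function wrappedSize(chars: number, width: number): number {
  return chars + Math.ceil(chars / width);
}

// base64: 4 characters per 3 input bytes, rounded up, wrapped at 76.
function base64Size(n: number): number {
  return wrappedSize(4 * Math.ceil(n / 3), 76);
}

// base32: 8 characters per 5 input bytes, rounded up, wrapped at 76.
function base32Size(n: number): number {
  return wrappedSize(8 * Math.ceil(n / 5), 76);
}

// base-emoji: one symbol per 10 bits, at most 4 UTF-8 bytes per symbol.
function baseEmojiMaxSymbolBytes(n: number): number {
  return Math.ceil((8 * n) / 10) * 4;
}

console.log(base64Size(1000));              // 1354
console.log(base32Size(1000));              // 1622
console.log(baseEmojiMaxSymbolBytes(1000)); // 3200
```

The measured base-emoji size of about 3175 is below the 3200-byte bound, which is consistent with some symbols being 3-byte emoji from the Basic Multilingual Plane, such as ➿ (U+27BF).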