This repo is for EMASM, the Ethereum Macro Assembler (derived from the name MASM for the classic Microsoft Macro Assembler). EMASM was developed for the ETHBoston 2019 hackathon to provide tooling for a technique that came to me in a dream I had a few weeks earlier.
First it is necessary to understand how a transaction to create a contract works: When you omit the "to" field in a eth_sendTransaction payload it essentially gives a new meaning to your calldata, instead of loading a contract binary as opcodes and making the calldata available to the opcodes via CALLDATACOPY/CALLDATALOAD, your calldata actually IS opcodes and essentially any logic you perform in the constructor has the address of the future contract before the contract runtime code is even loaded. Once the constructor function finishes running it will typically CODECOPY the runtime code of the contract (which is simply appended to the end of the constructor function) and RETURN the entire buffer. The return data of this transaction is now stored on chain and is the opcodes that get loaded in the "to" address.
Now that you understand this, what do you think will happen if you pass your contract deploy code from solc to "eth_call" and omit the "to" field? Well, it turns out, a standard ETH RPC will also think you are a contract constructor, but since this is "eth_call," it will just return your runtime data as a response. Now, what if you exploit this behavior to actually not return runtime code at all, why don't we just do any EVM code and RETURN the result of this computation? This becomes much more useful when you use CALL instructions to actually directly call as many contracts as you need to, or even the same one (which just has a poor API for returning batch data) and also we don't want to have to deploy a proxy contract as a view layer to perform this task. In fact, we can write these payloads to avoid doing any of the ABI encoding/decoding we would have to do in Solidity as we pack/unpack inputs/outputs between phases of deriving the final output, and also we can skip the overhead of the ETH node loading the view layer binary from the trie to setup the execution context. All of this matters if you are trying to max out the performance of doing complex reads from the chain accessible via contract APIs, and it's also fun!
So what is EMASM?
emasm([ '0x1', 'addatwo', 'jump', ['somelabel', ['0x0', 'mstore', '0x20', '0x0', 'return']], ['addatwo', ['0x2', 'add', 'somelabel', 'jump']], ]); // computes 0x1 + 0x2 in an extremely obfuscated manner
Similar syntax exists for encoding raw bytes into the binary, in case you want to use CODECOPY to copy some data to memory for any reason. Just prefix your label with
bytes: and anywhere in the program you can use
bytes:<label>:size. The fact that the second element of the array for a bytes string is also an array is for consistency with the syntax for segments; elements in the array defining your bytes beyond the first element are simply discarded.
emasm([ ['bytes:somelabel', ['0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff']], ['bytes:somelabel:ptr', 'bytes:somelabel:size'] ]); // defines a long bytes string with all bits set and then pushes the offset into the final binary where the bytes segment is encoded, then pushes the size in bytes of the string on the stack
The only other things to note in emasm is that your segments and bytes that you encode in the AST occur in the final binary in the order they appear. Additionally, the only special rules around nested arrays are when an array appears that follows the rules for defining labels or bytes segment. The entire array is essentially recursively flattened. The motivation for this is that it makes your macros completely composable without having to consider any sort of concat operations with arrays.
There is very little validation in the EMASM assembler function, primarily due to the idea that we want to max out the speed at which we can go from AST to bytecode on your dApp. I have various other copouts if you don't like that one. Eventually I will add an optional pass to the assembler that will provide errors.
How do I even use assembly?
EVM assembly takes practice! But luckily, it's easier than assembly languages for modern chipsets. Skim the Ethereum yellow paper and pay close attention to the appendix for the list of possible opcodes. All of these opcodes can be used in EMASM!
For an easier reference after you're introduced, check out the Solidity docs for inline assembly, essentially the first argument to Solidity assembly functions must be the topmost position on the stack, second argument is the item on the stack pushed on immediately prior to the topmost, etc. Good luck.
Do you have any macros?
Not really, but if you make a good one make a npm module and feel free to share and I can post links to it here. It would be neat to at least see an implementation of ABIEncoderv2, something I wanted in Huff but seemed not doable in a generic way with Huff's version of metaprogramming.
Raymond Pulver IV Author of IDEX Ostensible Madman