elfinfo


A JavaScript library for parsing information from ELF files. Functionality is basic at the moment, but stack analysis and better disassembly will hopefully follow in the near future.

Usage

import * as fs from 'fs/promises';
import { open } from 'elfinfo';

// Parse the specified ELF file.
const elfdata = await fs.readFile('someelffile');
const info = await open(elfdata);

The open function above will parse the ELF file header, program headers, and sections. It will also read the symbol table and strings.

open can be called with a variety of arguments: a string is treated as a path and the file is opened, a buffer, array, or blob is parsed directly from memory, and a file handle is read from.
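To give a feel for what parsing the ELF file header involves, here is a minimal sketch that reads a few header fields by hand from an in-memory buffer. It is independent of elfinfo: the function name readElfHeaderBasics and the shape of the returned object are made up for illustration, and only the field offsets come from the ELF format itself.

```javascript
// Sketch only: reads a few ELF header fields by hand; this is not elfinfo's code.
function readElfHeaderBasics(bytes) {
  if (bytes.length < 32) {
    throw new Error('too short to be an ELF file');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
  if (view.getUint32(0, false) !== 0x7f454c46) {
    throw new Error('missing ELF magic');
  }

  const is64 = view.getUint8(4) === 2;         // EI_CLASS: 1 = 32-bit, 2 = 64-bit
  const littleEndian = view.getUint8(5) === 1; // EI_DATA: 1 = little, 2 = big endian

  return {
    bits: is64 ? 64 : 32,
    littleEndian,
    type: view.getUint16(16, littleEndian),    // e_type, e.g. 2 = executable
    machine: view.getUint16(18, littleEndian), // e_machine, e.g. 0x3e = x86-64
    // e_entry is 8 bytes wide in ELF64, so it is read as a BigInt.
    entry: is64
      ? view.getBigUint64(24, littleEndian)
      : view.getUint32(24, littleEndian),
  };
}

// A hand-built 64-byte header standing in for the start of a real file.
const header = new Uint8Array(64);
header.set([0x7f, 0x45, 0x4c, 0x46, 2, 1, 1, 0]);
const headerView = new DataView(header.buffer);
headerView.setUint16(16, 2, true);            // executable
headerView.setUint16(18, 0x3e, true);         // x86-64
headerView.setBigUint64(24, 0x4003e0n, true); // entry point

const basics = readElfHeaderBasics(header);
console.log(basics.bits, basics.machine.toString(16), '0x' + basics.entry.toString(16));
```

A Node Buffer returned by fs.readFile is a Uint8Array, so it could be passed to this sketch directly.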

Examining the data

Several functions are provided on the ELF data structure for examining information about symbols and translating addresses. For example, getSymbolsInSection returns all the symbols that exist in a specified section, and getSymbolFileOffset tells you the actual file offset of a symbol (if possible) so you can read the symbol data. There are also functions for dealing with VMA and LMA addresses. Documentation is currently pending, but autocomplete should work in an IDE like VS Code.
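As a rough illustration of what "symbols in a section" means, the sketch below selects symbols whose address falls inside a section's address range. This is not how elfinfo is known to implement getSymbolsInSection (a real implementation may well use the section index stored with each symbol instead). The helper symbolsInRange and the field names addr, size, value, and name are assumptions made for this example.

```javascript
// Sketch only: symbolsInRange is a hypothetical helper, not an elfinfo function.
// Addresses are BigInt, matching how 64-bit values are represented.
function symbolsInRange(symbols, section) {
  const start = section.addr;
  const end = section.addr + section.size; // exclusive end of the section
  return symbols.filter(s => s.value >= start && s.value < end);
}

// Hand-written sample data for an x86-64 executable.
const textSection = { name: '.text', addr: 0x4003e0n, size: 0x292n };
const someSymbols = [
  { name: '_init', value: 0x4003a8n },  // lives in .init, before .text
  { name: '_start', value: 0x4003e0n },
  { name: 'main', value: 0x400590n },
  { name: '_fini', value: 0x400674n },  // lives in .fini, after .text
];

const inText = symbolsInRange(someSymbols, textSection).map(s => s.name);
console.log(inText);
```

Matching purely on address is only an approximation: symbols with a zero value (file symbols, undefined imports) carry no useful address and need to be handled separately.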

BigInt and Number

JavaScript numbers are doubles, which cannot represent every 64-bit integer exactly. This is non-ideal for 64-bit file offsets, so for 64-bit ELF files BigInt is used wherever the data is stored as a 64-bit number in the ELF file or where something refers to a memory location. This can be a pain, since you can't mix BigInt and Number in arithmetic. There isn't currently a nice solution (I mean, what can you do?), so just be aware of it.
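The following sketch shows the problem and two common ways of living with it. It uses only standard JavaScript; the helper toSafeNumber and the variable names are made up for illustration and are not part of elfinfo.

```javascript
// Mixing the two types in arithmetic throws a TypeError.
const symbolAddress = 0x400590n; // BigInt, as used for 64-bit ELF values
const fieldOffset = 8;           // ordinary Number

let mixedError = null;
try {
  symbolAddress + fieldOffset;
} catch (err) {
  mixedError = err;
}
console.log(mixedError instanceof TypeError);

// Option 1: lift the Number into a BigInt and stay in BigInt arithmetic.
const fieldAddress = symbolAddress + BigInt(fieldOffset);

// Option 2: drop to a Number, but only when the value is exactly representable.
function toSafeNumber(value) {
  if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < BigInt(Number.MIN_SAFE_INTEGER)) {
    throw new RangeError(`${value} does not fit in a Number without losing precision`);
  }
  return Number(value);
}

const fileOffset = toSafeNumber(0x590n);
console.log(fieldAddress.toString(16), fileOffset);
```

Comparisons such as < and >= do work across the two types, so only arithmetic needs the explicit conversion.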

Terminology

ELF and ELF tools (such as readelf) sometimes use conflicting terminology. Here is an indication of what things mean according to this library:

  • a Segment refers to a piece of data that exists in the ELF file and is to be loaded into memory at a certain location. In the ELF file they are stored as Program Header Entries. A segment consists mainly of a file offset and two memory locations, the virtual and physical memory locations.
  • a Section refers to the various sections stored in the ELF file. A section has an address which is always a virtual (VMA) address. Each section mainly consists of a name, a type, a virtual memory location, and a size. There are many kinds of sections, but the main ones are those that contain program data (either code or data), symbols, and strings. elfinfo currently parses string and symbol sections.
  • a Symbol can refer to many different things, but usually refers to a function or variable used in code. There are also symbols for sections and files. Symbols are used for debugging or other analysis and do not affect program execution. Symbols are stored in symbol table sections and the names of symbols are stored in string table sections. Stored with the symbol is the name of the symbol, the type of the symbol, the virtual memory location of the symbol, sometimes the size of the symbol, and some other things.
  • a Virtual Address refers to the address a segment, section, or symbol has in memory. This is sometimes referred to as a VMA address or a memory address.
  • a Physical Address refers to the address a segment, section, or symbol has in non-volatile storage. This does not refer to the offset in the file. A normal ELF executable for an operating system like Linux will usually have virtual addresses that match the physical addresses, since the file can be mapped into memory wherever needed. In embedded systems, however, the data for virtual memory locations needs to be stored somewhere in flash, and the physical address is that location. It is also called the LMA or load address. Some symbols and sections don't have a physical address (for example, BSS section symbols that are cleared in memory on startup).
  • a File Offset refers to a location in the ELF file itself. Only segments have file offsets, but the file offset can be calculated for a section or symbol if the symbol or section has a physical address.
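The relationship between addresses and file offsets can be sketched as follows. This is an illustration of the idea rather than elfinfo's implementation: virtualAddressToFileOffset is a hypothetical helper, the field names offset, vaddr, filesz, and memsz are assumptions, and the sample segments are chosen so that virtual and physical addresses coincide, as in a typical Linux executable.

```javascript
// Sketch only: virtualAddressToFileOffset is a hypothetical helper, not elfinfo's API.
// Each segment records where it sits in the file and where it is loaded in memory.
function virtualAddressToFileOffset(segments, vaddr) {
  for (const seg of segments) {
    const delta = vaddr - seg.vaddr;
    if (delta >= 0n && delta < seg.memsz) {
      // Bytes past filesz exist only in memory (for example zero-initialised .bss).
      return delta < seg.filesz ? seg.offset + delta : null;
    }
  }
  return null; // the address is not covered by any of the given segments
}

// Two loadable segments of a small x86-64 executable.
const loadSegments = [
  { offset: 0x0n, vaddr: 0x400000n, filesz: 0x840n, memsz: 0x840n },
  { offset: 0xe10n, vaddr: 0x600e10n, filesz: 0x210n, memsz: 0x218n },
];

const codeOffset = virtualAddressToFileOffset(loadSegments, 0x400590n);
const dataOffset = virtualAddressToFileOffset(loadSegments, 0x600e20n);
const bssOffset = virtualAddressToFileOffset(loadSegments, 0x601020n);
console.log(codeOffset, dataOffset, bssOffset);
```

The last lookup returns null because the address lies in the part of the second segment that is larger in memory than in the file, so there are no bytes in the file to read.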

What gets parsed

A debug function is also provided that produces human-readable output similar to that of readelf or objdump.

import * as fs from 'fs/promises';
import { open, debug } from 'elfinfo';

// read the ELF file
const elfdata = await fs.readFile('someelffile');
const info = await open(elfdata);

// generate human-readable output
const fileinfo = debug(info);
console.log(fileinfo);

This will produce the following output. This may help you get an idea of what elfinfo parses at the moment:

Path: someelffile
Class:                             ELF64 (2)
Bits:                              64 bits
Data:                              Little endian (1)
Version:                           1
OS/ABI:                            SystemV (0x00)
ABI version:                       0
Type:                              Executable (0x02)
ISA/machine:                       x64 (0x3e)
ISA/machine version:               1
Entry Point:                       0x004003e0
Program header offset:             0x40
Section header offset:             0x18b0
Flags:                             0 (0x00)
Program headers:                   56 bytes × 9
Section headers:                   64 bytes × 27
String table section index:        26


Program Header Entries:

    #   Type                 Offset             VirtAddr           PhysAddr           FileSize           MemSiz             Align      Flags
    0   Program Header Table 0x0000000000000040 0x0000000000400040 0x0000000000400040 0x00000000000001f8 0x00000000000001f8 0x00000008 Read
    1   Interp               0x0000000000000238 0x0000000000400238 0x0000000000400238 0x000000000000001c 0x000000000000001c 0x00000001 Read
    2   Load                 0x0000000000000000 0x0000000000400000 0x0000000000400000 0x0000000000000840 0x0000000000000840 0x00200000 Read | Execute
    3   Load                 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10 0x0000000000000210 0x0000000000000218 0x00200000 Read | Write
    4   Dynamic              0x0000000000000e20 0x0000000000600e20 0x0000000000600e20 0x00000000000001d0 0x00000000000001d0 0x00000008 Read | Write
    5   Note                 0x0000000000000254 0x0000000000400254 0x0000000000400254 0x0000000000000020 0x0000000000000020 0x00000004 Read
    6   GNU EH frame         0x00000000000006bc 0x00000000004006bc 0x00000000004006bc 0x000000000000004c 0x000000000000004c 0x00000004 Read
    7   GNU stack info       0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x00000010 Read | Write
    8   GNU ro relocation    0x0000000000000e10 0x0000000000600e10 0x0000000000600e10 0x00000000000001f0 0x00000000000001f0 0x00000001 Read


Sections:

    #   Name               Type                             Address            Offset             Size               EntSize            Link  Info  Align      Flags
    0   <null>             NULL section                     0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000             0x00000000 <none>
    1   .interp            Prog bits                        0x0000000000400238 0x0000000000000238 0x000000000000001c 0x0000000000000000             0x00000001 Alloc
    2   .note.ABI-tag      Note                             0x0000000000400254 0x0000000000000254 0x0000000000000020 0x0000000000000000             0x00000004 Alloc
    3   .gnu.hash          GNU hash section                 0x0000000000400278 0x0000000000000278 0x000000000000001c 0x0000000000000000    4        0x00000008 Alloc
    4   .dynsym            Dynamic linking symbols section  0x0000000000400298 0x0000000000000298 0x0000000000000060 0x0000000000000018    5     1  0x00000008 Alloc
    5   .dynstr            String table                     0x00000000004002f8 0x00000000000002f8 0x000000000000003f 0x0000000000000000             0x00000001 Alloc
    6   .gnu.version       GNU version symbol table         0x0000000000400338 0x0000000000000338 0x0000000000000008 0x0000000000000002    4        0x00000002 Alloc
    7   .gnu.version_r     GNU version needs section        0x0000000000400340 0x0000000000000340 0x0000000000000020 0x0000000000000000    5     1  0x00000008 Alloc
    8   .rela.dyn          Relocation section with addends  0x0000000000400360 0x0000000000000360 0x0000000000000030 0x0000000000000018    4        0x00000008 Alloc
    9   .rela.plt          Relocation section with addends  0x0000000000400390 0x0000000000000390 0x0000000000000018 0x0000000000000018    4    21  0x00000008 Alloc | Info Link
    10  .init              Prog bits                        0x00000000004003a8 0x00000000000003a8 0x0000000000000017 0x0000000000000000             0x00000004 Alloc | Executable
    11  .plt               Prog bits                        0x00000000004003c0 0x00000000000003c0 0x0000000000000020 0x0000000000000010             0x00000010 Alloc | Executable
    12  .text              Prog bits                        0x00000000004003e0 0x00000000000003e0 0x0000000000000292 0x0000000000000000             0x00000010 Alloc | Executable
    13  .fini              Prog bits                        0x0000000000400674 0x0000000000000674 0x0000000000000009 0x0000000000000000             0x00000004 Alloc | Executable
    14  .rodata            Prog bits                        0x0000000000400680 0x0000000000000680 0x000000000000003a 0x0000000000000000             0x00000004 Alloc
    15  .eh_frame_hdr      Prog bits                        0x00000000004006bc 0x00000000000006bc 0x000000000000004c 0x0000000000000000             0x00000004 Alloc
    16  .eh_frame          Prog bits                        0x0000000000400708 0x0000000000000708 0x0000000000000138 0x0000000000000000             0x00000008 Alloc
    17  .init_array        Init array                       0x0000000000600e10 0x0000000000000e10 0x0000000000000008 0x0000000000000008             0x00000008 Writeable | Alloc
    18  .fini_array        Fini array                       0x0000000000600e18 0x0000000000000e18 0x0000000000000008 0x0000000000000008             0x00000008 Writeable | Alloc
    19  .dynamic           Dynamic                          0x0000000000600e20 0x0000000000000e20 0x00000000000001d0 0x0000000000000010    5        0x00000008 Writeable | Alloc
    20  .got               Prog bits                        0x0000000000600ff0 0x0000000000000ff0 0x0000000000000010 0x0000000000000008             0x00000008 Writeable | Alloc
    21  .got.plt           Prog bits                        0x0000000000601000 0x0000000000001000 0x0000000000000020 0x0000000000000008             0x00000008 Writeable | Alloc
    22  .bss               No bits                          0x0000000000601020 0x0000000000001020 0x0000000000000008 0x0000000000000000             0x00000001 Writeable | Alloc
    23  .comment           Prog bits                        0x0000000000000000 0x0000000000001020 0x000000000000005f 0x0000000000000001             0x00000001 Merge | Strings
    24  .symtab            Symbol table                     0x0000000000000000 0x0000000000001080 0x0000000000000570 0x0000000000000018   25    42  0x00000008 <none>
    25  .strtab            String table                     0x0000000000000000 0x00000000000015f0 0x00000000000001d1 0x0000000000000000             0x00000001 <none>
    26  .shstrtab          String table                     0x0000000000000000 0x00000000000017c1 0x00000000000000ea 0x0000000000000000             0x00000001 <none>


Symbols for section #4 .dynsym:

    #   Value              Size       Type                         Bind   Visibility Name
        0 0x0000000000000000 0x00000000 None                         Local  Default
        1 0x0000000000000000 0x00000000 Function                     Global Default    printf
        2 0x0000000000000000 0x00000000 Function                     Global Default    __libc_start_main
        3 0x0000000000000000 0x00000000 None                         Weak   Default    __gmon_start__


Symbols for section #24 .symtab:

    #   Value              Size       Type                         Bind   Visibility Name
        0 0x0000000000000000 0x00000000 None                         Local  Default
        1 0x0000000000400238 0x00000000 Section                      Local  Default
        2 0x0000000000400254 0x00000000 Section                      Local  Default
        3 0x0000000000400278 0x00000000 Section                      Local  Default
        4 0x0000000000400298 0x00000000 Section                      Local  Default
        5 0x00000000004002f8 0x00000000 Section                      Local  Default
        6 0x0000000000400338 0x00000000 Section                      Local  Default
        7 0x0000000000400340 0x00000000 Section                      Local  Default
        8 0x0000000000400360 0x00000000 Section                      Local  Default
        9 0x0000000000400390 0x00000000 Section                      Local  Default
        10 0x00000000004003a8 0x00000000 Section                      Local  Default
        11 0x00000000004003c0 0x00000000 Section                      Local  Default
        12 0x00000000004003e0 0x00000000 Section                      Local  Default
        13 0x0000000000400674 0x00000000 Section                      Local  Default
        14 0x0000000000400680 0x00000000 Section                      Local  Default
        15 0x00000000004006bc 0x00000000 Section                      Local  Default
        16 0x0000000000400708 0x00000000 Section                      Local  Default
        17 0x0000000000600e10 0x00000000 Section                      Local  Default
        18 0x0000000000600e18 0x00000000 Section                      Local  Default
        19 0x0000000000600e20 0x00000000 Section                      Local  Default
        20 0x0000000000600ff0 0x00000000 Section                      Local  Default
        21 0x0000000000601000 0x00000000 Section                      Local  Default
        22 0x0000000000601020 0x00000000 Section                      Local  Default
        23 0x0000000000000000 0x00000000 Section                      Local  Default
        24 0x0000000000000000 0x00000000 File                         Local  Default    crtstuff.c
        25 0x0000000000400420 0x00000000 Function                     Local  Default    deregister_tm_clones
        26 0x0000000000400450 0x00000000 Function                     Local  Default    register_tm_clones
        27 0x0000000000400490 0x00000000 Function                     Local  Default    __do_global_dtors_aux
        28 0x0000000000601020 0x00000001 Object                       Local  Default    completed.7698
        29 0x0000000000600e18 0x00000000 Object                       Local  Default    __do_global_dtors_aux_fini_array_entry
        30 0x00000000004004c0 0x00000000 Function                     Local  Default    frame_dummy
        31 0x0000000000600e10 0x00000000 Object                       Local  Default    __frame_dummy_init_array_entry
        32 0x0000000000000000 0x00000000 File                         Local  Default    factorial.cpp
        33 0x0000000000000000 0x00000000 File                         Local  Default    main.cpp
        34 0x0000000000000000 0x00000000 File                         Local  Default    crtstuff.c
        35 0x000000000040083c 0x00000000 Object                       Local  Default    __FRAME_END__
        36 0x0000000000000000 0x00000000 File                         Local  Default
        37 0x0000000000600e18 0x00000000 None                         Local  Default    __init_array_end
        38 0x0000000000600e20 0x00000000 Object                       Local  Default    _DYNAMIC
        39 0x0000000000600e10 0x00000000 None                         Local  Default    __init_array_start
        40 0x00000000004006bc 0x00000000 None                         Local  Default    __GNU_EH_FRAME_HDR
        41 0x0000000000601000 0x00000000 Object                       Local  Default    _GLOBAL_OFFSET_TABLE_
        42 0x0000000000400670 0x00000002 Function                     Global Default    __libc_csu_fini
        43 0x0000000000601020 0x00000000 None                         Global Default    _edata
        44 0x0000000000400674 0x00000000 Function                     Global Default    _fini
        45 0x0000000000000000 0x00000000 Function                     Global Default    printf@@GLIBC_2.2.5
        46 0x0000000000000000 0x00000000 Function                     Global Default    __libc_start_main@@GLIBC_2.2.5
        47 0x0000000000400510 0x00000072 Function                     Global Default    _Z10factorialff
        48 0x00000000004004d0 0x0000003e Function                     Global Default    _Z10factorialii
        49 0x0000000000000000 0x00000000 None                         Weak   Default    __gmon_start__
        50 0x0000000000400600 0x00000065 Function                     Global Default    __libc_csu_init
        51 0x0000000000601028 0x00000000 None                         Global Default    _end
        52 0x0000000000400410 0x00000002 Function                     Global Hidden     _dl_relocate_static_pie
        53 0x00000000004003e0 0x0000002b Function                     Global Default    _start
        54 0x0000000000601020 0x00000000 None                         Global Default    __bss_start
        55 0x0000000000400590 0x0000006f Function                     Global Default    main
        56 0x0000000000601020 0x00000000 Object                       Global Hidden     __TMC_END__
        57 0x00000000004003a8 0x00000000 Function                     Global Default    _init
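The .symtab section above has an entry size of 0x18 (24) bytes, which is the size of one ELF64 symbol entry. As a sketch of what is stored in each entry, the code below decodes one by hand and looks its name up in a string table. It is independent of elfinfo: decodeSymbol64, readName, and the returned field names are made up for illustration, while the byte layout follows the ELF64 format.

```javascript
// Sketch only: decodes one 24-byte ELF64 symbol entry by hand; not elfinfo's code.
function decodeSymbol64(view, entryOffset, littleEndian) {
  const info = view.getUint8(entryOffset + 4);
  return {
    nameOffset: view.getUint32(entryOffset, littleEndian), // offset into the string table
    bind: info >> 4,     // 0 = local, 1 = global, 2 = weak
    type: info & 0x0f,   // 1 = object, 2 = function, 3 = section, 4 = file
    visibility: view.getUint8(entryOffset + 5) & 0x03, // 0 = default, 2 = hidden
    sectionIndex: view.getUint16(entryOffset + 6, littleEndian),
    value: view.getBigUint64(entryOffset + 8, littleEndian),
    size: view.getBigUint64(entryOffset + 16, littleEndian),
  };
}

// Names are NUL-terminated strings inside the linked string table.
function readName(stringTable, offset) {
  let end = offset;
  while (end < stringTable.length && stringTable[end] !== 0) end++;
  return new TextDecoder().decode(stringTable.subarray(offset, end));
}

// A hand-built entry describing a global function named main.
const entry = new Uint8Array(24);
const entryView = new DataView(entry.buffer);
entryView.setUint32(0, 1, true);            // name starts at string table offset 1
entryView.setUint8(4, (1 << 4) | 2);        // global binding, function type
entryView.setUint16(6, 12, true);           // defined in section #12
entryView.setBigUint64(8, 0x400590n, true); // value (address)
entryView.setBigUint64(16, 0x6fn, true);    // size in bytes
const stringTable = new TextEncoder().encode('\0main\0');

const decoded = decodeSymbol64(entryView, 0, true);
const decodedName = readName(stringTable, decoded.nameOffset);
console.log(decodedName, decoded.type, decoded.bind, '0x' + decoded.value.toString(16));
```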

Disassembly

Basic disassembly is supported via the peer dependency disassembler. If the package is installed, disassembly of functions is possible. For example:

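import * as elfinfo from 'elfinfo';
// Assumption: getSymbolByName and disassemble are taken to be exported by elfinfo as well.
import { getSymbolByName, disassemble } from 'elfinfo';

// parse the ELF file. readSymbolData tells it to load the symbol data into memory
const { elf } = await elfinfo.open('program_path', { readSymbolData: true });

// get a symbol
const symbol = getSymbolByName(elf, 'main');

// disassemble
const instructions = disassemble(elf, symbol);

// format the instruction data, one instruction per line
const instructionsString = instructions
  .map(x => `${x.mnemonic} ${x.operands}`).join('\n');
console.log(instructionsString);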

This will produce output like this:

...
mov rbp, rsp
sub rsp, 0x20
mov dword ptr [rbp - 4], 0
mov dword ptr [rbp - 8], edi
mov qword ptr [rbp - 0x10], rsi
mov dword ptr [rbp - 0x14], 0
movsxd rax, dword ptr [rbp - 0x14]
cmp rax, 0xa
jae 0x40123f
mov eax, dword ptr [rbp - 0x14]
add eax, 1
...

Testing

In order to run tests you will need to have the following programs installed and in your path

And then build the test programs by running testprograms/build.sh.

Note: just because the programs compile doesn't mean they will work or represent how one should write programs for any of the given platforms. The idea is to generate executables for the tests. The tests don't run the programs; they only expect the ELF files to contain certain things.

Roadmap

Done:

  • Read ELF file, including segments and sections.
  • Read symbol and string tables and relate them to sections.
  • Provide functions for dealing with addresses (VMA, LMA, and file).
  • Basic disassembly of functions.

TODO:

  • Async file API.
  • Blob API.
  • Documentation.
  • Better disassembly of functions.
  • Rudimentary binary analysis, especially stack analysis.
  • Demangling of C++ names (or other names for that matter).
  • Performance. Though the ELF parsing happens in an instant, the functions for inspecting the structure are slow and will suffer on big files.
  • Test on more platforms. Currently we do cursory checks for x64, RISC-V, and ARM Cortex-M. Other platforms of interest could be MIPS, PowerPC, etc. Executables for these systems should load, but no tests have been done.
  • A companion library for visualization. I would like to see where everything is and easily be able to spot functions that are too big or in the wrong place.
  • Enough information in the ELF structure to be able to write ELF files.

License

See LICENSE which applies to all files in this repository unless otherwise specified.