README
filetype.js
Detect the file type of a Buffer/Uint8Array/ArrayBuffer
The file type is detected by checking the magic number of the buffer.
This package is for detecting binary-based file formats, not text-based formats like .txt, .csv, .svg, etc.
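To illustrate what "checking the magic number" means, here is a minimal sketch that tests a buffer for the fixed 8-byte PNG signature. The helper name looksLikePng is hypothetical and not part of this package; the actual detection covers many more formats and is likely more involved.

```javascript
// Hypothetical helper for illustration; not part of this package's API.
// Every PNG file starts with this fixed 8-byte signature.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function looksLikePng(buffer) {
	// A buffer shorter than the signature cannot match.
	if (buffer.length < PNG_SIGNATURE.length) {
		return false;
	}
	// Compare only the leading bytes against the signature.
	const head = Buffer.from(buffer).subarray(0, PNG_SIGNATURE.length);
	return PNG_SIGNATURE.equals(head);
}

console.log(looksLikePng(Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00])));
//=> true
console.log(looksLikePng(Buffer.from('plain text')));
//=> false
```

A text file has no such signature, which is why text-based formats are out of scope.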
Installation
$ npm install @jedithepro/filetype.js
Usage
Node.js
Determine file type from a file:
const FileType = require('@jedithepro/filetype.js');
(async () => {
console.log(await FileType.fromFile('Unicorn.png'));
//=> {ext: 'png', mime: 'image/png'}
})();
Determine file type from a Buffer, which may be a portion of the beginning of a file:
const FileType = require('@jedithepro/filetype.js');
const readChunk = require('read-chunk');
(async () => {
const buffer = readChunk.sync('Unicorn.png', 0, 4100);
console.log(await FileType.fromBuffer(buffer));
//=> {ext: 'png', mime: 'image/png'}
})();
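If you would rather not depend on read-chunk, the leading bytes can also be read with Node's built-in fs module. The following is a sketch under that assumption; readFirstBytes is a hypothetical helper, and the buffer it returns would be passed to FileType.fromBuffer() as in the example above.

```javascript
const fs = require('fs');

// Hypothetical helper: read at most `length` bytes from the start of a file.
function readFirstBytes(filePath, length) {
	const fd = fs.openSync(filePath, 'r');
	try {
		const buffer = Buffer.alloc(length);
		// readSync returns how many bytes were actually read, which can be
		// fewer than requested when the file is shorter than `length`.
		const bytesRead = fs.readSync(fd, buffer, 0, length, 0);
		return buffer.subarray(0, bytesRead);
	} finally {
		// Always release the file descriptor, even if the read throws.
		fs.closeSync(fd);
	}
}
```

For example, readFirstBytes('Unicorn.png', 4100) would return the same leading portion of the file that the read-chunk call above reads.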
Determine file type from a stream:
const fs = require('fs');
const FileType = require('@jedithepro/filetype.js');
(async () => {
const stream = fs.createReadStream('Unicorn.mp4');
console.log(await FileType.fromStream(stream));
//=> {ext: 'mp4', mime: 'video/mp4'}
})();
The stream method can also be used to read from a remote location:
const got = require('got');
const FileType = require('@jedithepro/filetype.js');
const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';
(async () => {
const stream = got.stream(url);
console.log(await FileType.fromStream(stream));
//=> {ext: 'jpg', mime: 'image/jpeg'}
})();
Another stream example:
const stream = require('stream');
const fs = require('fs');
const crypto = require('crypto');
const FileType = require('@jedithepro/filetype.js');
(async () => {
const read = fs.createReadStream('encrypted.enc');
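// `alg`, `key`, and `iv` are placeholders for your cipher algorithm, key, and initialization vector.
const decipher = crypto.createDecipheriv(alg, key, iv);
// pipe() returns the destination stream; stream.pipeline() would require a callback as its last argument.
const fileTypeStream = await FileType.stream(read.pipe(decipher));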
console.log(fileTypeStream.fileType);
//=> {ext: 'mov', mime: 'video/quicktime'}
const write = fs.createWriteStream(`decrypted.${fileTypeStream.fileType.ext}`);
fileTypeStream.pipe(write);
})();
API
FileType.fromBuffer(buffer)
Detect the file type of a Buffer, Uint8Array, or ArrayBuffer.
The file type is detected by checking the magic number of the buffer.
If file access is available, it is recommended to use FileType.fromFile() instead.
Returns a Promise for an object with the detected file type and MIME type:
- ext - One of the supported file types
- mime - The MIME type
Or undefined when there is no match.
buffer
Type: Buffer | Uint8Array | ArrayBuffer
A buffer representing file data. It works best if the buffer contains the entire file, but it may also work with a smaller portion.
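Because the result is undefined when nothing matches, callers usually want a fallback. The sketch below shows one way to handle that; mimeOrDefault is a hypothetical helper, and the generic application/octet-stream fallback is an assumption rather than something this package prescribes.

```javascript
// Hypothetical helper: turn a detection result (or undefined) into a MIME type,
// falling back to a generic binary type when nothing matched.
function mimeOrDefault(result, fallback = 'application/octet-stream') {
	return result === undefined ? fallback : result.mime;
}

console.log(mimeOrDefault({ext: 'png', mime: 'image/png'}));
//=> image/png
console.log(mimeOrDefault(undefined));
//=> application/octet-stream
```

With this package, the argument would be the awaited result of FileType.fromBuffer(buffer).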
FileType.fromFile(filePath)
Detect the file type of a file path.
The file type is detected by checking the magic number of the buffer.
Returns a Promise for an object with the detected file type and MIME type:
- ext - One of the supported file types
- mime - The MIME type
Or undefined when there is no match.
filePath
Type: string
The file path to parse.
FileType.fromStream(stream)
Detect the file type of a Node.js readable stream.
The file type is detected by checking the magic number of the buffer.
Returns a Promise for an object with the detected file type and MIME type:
- ext - One of the supported file types
- mime - The MIME type
Or undefined when there is no match.
stream
Type: stream.Readable
A readable stream representing file data.
FileType.fromTokenizer(tokenizer)
Detect the file type from an ITokenizer source.
This method is used internally, but can also be used with a special "tokenizer" reader.
A tokenizer exposes the internal read functions, so that alternative transport mechanisms for accessing files can be implemented and used.
Returns a Promise for an object with the detected file type and MIME type:
- ext - One of the supported file types
- mime - The MIME type
Or undefined when there is no match.
An example is @tokenizer/http, which requests data using HTTP range requests. Unlike a conventional stream, a tokenizer can skip ahead (seek, fast-forward) in the data. For example, you may only need to read the first 6 bytes and the last 128 bytes, which can be an advantage when reading the entire file would take longer.
const {makeTokenizer} = require('@tokenizer/http');
const FileType = require('@jedithepro/filetype.js');
const audioTrackUrl = 'https://test-audio.netlify.com/Various%20Artists%20-%202009%20-%20netBloc%20Vol%2024_%20tiuqottigeloot%20%5BMP3-V2%5D/01%20-%20Diablo%20Swing%20Orchestra%20-%20Heroines.mp3';
(async () => {
const httpTokenizer = await makeTokenizer(audioTrackUrl);
const fileType = await FileType.fromTokenizer(httpTokenizer);
console.log(fileType);
//=> {ext: 'mp3', mime: 'audio/mpeg'}
})();
Or use @tokenizer/s3 to determine the file type of a file stored on Amazon S3:
const FileType = require('@jedithepro/filetype.js');
const S3 = require('aws-sdk/clients/s3');
const {makeTokenizer} = require('@tokenizer/s3');
(async () => {
// Initialize the S3 client
const s3 = new S3();
// Initialize the S3 tokenizer.
const s3Tokenizer = await makeTokenizer(s3, {
Bucket: 'affectlab',
Key: '1min_35sec.mp4'
});
// Figure out what kind of file it is.
const fileType = await FileType.fromTokenizer(s3Tokenizer);
console.log(fileType);
})();
Note that only the minimum amount of data required to determine the file type is read (okay, just a bit extra to prevent too many fragmented reads).
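To make the seek-based access pattern concrete, the following sketch reads only the first 6 bytes and the last 128 bytes of a local file using positioned reads. It relies solely on Node's fs module and is not the tokenizer API; readHeadAndTail is a hypothetical helper, and the default sizes are taken from the example figures mentioned above.

```javascript
const fs = require('fs');

// Hypothetical helper, not part of this package or of any tokenizer API.
// Reads `headSize` bytes from the start and `tailSize` bytes from the end
// of a file without reading anything in between.
function readHeadAndTail(filePath, headSize = 6, tailSize = 128) {
	const fd = fs.openSync(filePath, 'r');
	try {
		const {size} = fs.fstatSync(fd);
		// Clamp both sizes so small files do not cause out-of-range reads.
		const head = Buffer.alloc(Math.min(headSize, size));
		const tail = Buffer.alloc(Math.min(tailSize, size));
		// The last argument is the absolute file position to read from.
		fs.readSync(fd, head, 0, head.length, 0);
		fs.readSync(fd, tail, 0, tail.length, size - tail.length);
		return {head, tail};
	} finally {
		fs.closeSync(fd);
	}
}
```

A tokenizer over HTTP or S3 can apply the same idea with range requests, so the bytes in between are never transferred.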
FileType.extensions
Returns a set of supported file extensions.
FileType.mimeTypes
Returns a set of supported MIME types.
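Because both properties are sets, membership can be checked with Set.prototype.has(). The sketch below uses a small stand-in set so that it runs on its own; with this package you would pass FileType.extensions instead. The helper name isSupportedExtension is hypothetical.

```javascript
// Stand-in for FileType.extensions so the sketch runs without the package.
const extensions = new Set(['jpg', 'png', 'gif', 'Z']);

// Hypothetical helper: accepts 'png' or '.png' and checks set membership.
// Matching is kept case-sensitive because the supported list contains 'Z'.
function isSupportedExtension(supported, ext) {
	const normalized = ext.startsWith('.') ? ext.slice(1) : ext;
	return supported.has(normalized);
}

console.log(isSupportedExtension(extensions, '.png'));
//=> true
console.log(isSupportedExtension(extensions, 'txt'));
//=> false
```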
Supported file types
- jpg
- png
- apng - Animated Portable Network Graphics
- gif
- webp
- flif
- cr2 - Canon Raw image file (v2)
- cr3 - Canon Raw image file (v3)
- orf - Olympus Raw image file
- arw - Sony Alpha Raw image file
- dng - Adobe Digital Negative image file
- nef - Nikon Electronic Format image file
- rw2 - Panasonic RAW image file
- raf - Fujifilm RAW image file
- tif
- bmp
- icns
- jxr
- psd
- indd
- zip
- tar
- rar
- gz
- bz2
- 7z
- dmg
- mp4
- mid
- mkv
- webm
- mov
- avi
- mpg
- mp1 - MPEG-1 Audio Layer I
- mp2
- mp3
- ogg
- ogv
- ogm
- oga
- spx
- ogx
- opus
- flac
- wav
- qcp
- amr
- pdf
- epub
- mobi - Mobipocket
- exe
- swf
- rtf
- woff
- woff2
- eot
- ttf
- otf
- ico
- flv
- ps
- xz
- sqlite
- nes
- crx
- xpi
- cab
- deb
- ar
- rpm
- Z
- lz
- cfb
- mxf
- mts
- wasm
- blend
- bpg
- docx
- pptx
- xlsx
- jp2 - JPEG 2000
- jpm - JPEG 2000
- jpx - JPEG 2000
- mj2 - Motion JPEG 2000
- aif
- odt - OpenDocument for word processing
- ods - OpenDocument for spreadsheets
- odp - OpenDocument for presentations
- xml
- heic
- cur
- ktx
- ape - Monkey's Audio
- wv - WavPack
- asf - Advanced Systems Format
- dcm - DICOM Image File
- mpc - Musepack (SV7 & SV8)
- ics - iCalendar
- glb - GL Transmission Format
- pcap - Libpcap File Format
- dsf - Sony DSD Stream File (DSF)
- lnk - Microsoft Windows file shortcut
- alias - macOS Alias file
- voc - Creative Voice File
- ac3 - ATSC A/52 Audio File
- 3gp - Multimedia container format defined by the Third Generation Partnership Project (3GPP) for 3G UMTS multimedia services
- 3g2 - Multimedia container format defined by the 3GPP2 for 3G CDMA2000 multimedia services
- m4v - MPEG-4 Visual bitstreams
- m4p - MPEG-4 files with audio streams encrypted by FairPlay Digital Rights Management as were sold through the iTunes Store
- m4a - Audio-only MPEG-4 files
- m4b - Audiobook and podcast MPEG-4 files, which also contain metadata including chapter markers, images, and hyperlinks
- f4v - ISO base media file format used by Adobe Flash Player
- f4p - ISO base media file format protected by Adobe Access DRM used by Adobe Flash Player
- f4a - Audio-only ISO base media file format used by Adobe Flash Player
- f4b - Audiobook and podcast ISO base media file format used by Adobe Flash Player
- mie - Dedicated meta information format which supports storage of binary as well as textual meta information
- shp - Geospatial vector data format
- arrow - Columnar format for tables of data
- aac - Advanced Audio Coding
- it - Audio module format: Impulse Tracker
- s3m - Audio module format: ScreamTracker 3
- xm - Audio module format: FastTracker 2
- ai - Adobe Illustrator Artwork
- skp - SketchUp
- avif - AV1 Image File Format
- eps - Encapsulated PostScript
- lzh - LZH archive
- pgp - Pretty Good Privacy
- asar - Archive format primarily used to enclose Electron applications
- stl - Standard Tesselated Geometry File Format (ASCII only)
Pull requests are welcome for additional commonly used file types.
The following file types will not be accepted:
- MS-CFB: Microsoft Compound File Binary File Format based formats, too old and difficult to parse:
  - .doc - Microsoft Word 97-2003 Document
  - .xls - Microsoft Excel 97-2003 Document
  - .ppt - Microsoft PowerPoint 97-2003 Document
  - .msi - Microsoft Windows Installer
- .csv - Reason.
- .svg - Detecting it requires a full-blown parser. Check out is-svg for something that mostly works.