CSVTOJSON

A tool for converting CSV data to JSON, with support for customised parsers.

No npm install is needed in the browser; the module can be loaded straight from a CDN:

```html
<script type="module">
import csvtojson from 'https://cdn.skypack.dev/csvtojson';
</script>
```
The csvtojson module is a comprehensive Node.js CSV parser that converts CSV to JSON or column arrays. It can be used as a Node.js library, a command line tool, or in the browser. For example, to stream and parse a remote CSV file:
```javascript
const request = require('request');
const csv = require('csvtojson');

csv()
  .fromStream(request.get('http://mywebsite.com/mycsvfile.csv'))
  .subscribe((json) => {
    return new Promise((resolve, reject) => {
      // long operation for each json e.g. transform / write into database.
    });
  }, onError, onComplete);
```
output: The format to be converted to. "json" (default) -- convert csv to json. "csv" -- convert csv to csv row array. "line" -- convert csv to csv line string
delimiter: delimiter used for separating columns. Use "auto" if the delimiter is unknown in advance; in this case, the delimiter will be auto-detected (by best attempt). Use an array to give a list of potential delimiters e.g. [",","|","$"]. default: ","
quote: If a column contains the delimiter, a quote character can be used to surround the column content, e.g. "hello, world" won't be split into two columns while parsing. Set to "off" to ignore all quotes. default: " (double quote)
trim: Indicates whether the parser trims spaces surrounding column content, e.g. " content " will be trimmed to "content". Default: true
checkType: Turns field type checking on or off. Default is false. (The default is true if version < 1.1.4)
ignoreEmpty: Ignore empty values in CSV columns. Set this to true to skip columns whose value is not given. Default: false.
fork (experimental): Fork another process to parse the CSV stream. Useful when running many concurrent parsing sessions on large CSV files. Default: false
noheader: Indicates that the CSV data has no header row and the first row is a data row. Default is false. See header row
headers: An array to specify the headers of CSV data. If --noheader is false, this value will override CSV header row. Default: null. Example: ["my field","name"]. See header row
flatKeys: Don't interpret dots (.) and square brackets in header fields as nested object or array identifiers at all (treat them like regular characters for JSON field identifiers). Default: false.
maxRowLength: the maximum number of characters a CSV row can contain. 0 means infinite. If the maximum is exceeded, the parser will emit an "error" of "row_exceed". If possibly corrupted CSV data is provided, give it a number like 65535 so the parser won't consume excessive memory. default: 0
checkColumn: whether to check that the number of columns in a row matches the number of headers. If the column count does not match the header count, an error of "mismatched_column" will be emitted. default: false
eol: End of line character. If omitted, parser will attempt to retrieve it from the first chunks of CSV data.
escape: escape character used in quoted columns. Default is double quote (") according to RFC 4180. Change to backslash (\) or another character for your own case.
includeColumns: This parameter instructs the parser to include only those columns as specified by the regular expression. Example: /(name|age)/ will parse and include columns whose header contains "name" or "age"
ignoreColumns: This parameter instructs the parser to ignore columns as specified by the regular expression. Example: /(name|age)/ will ignore columns whose header contains "name" or "age"
colParser: Allows overriding the parsing logic for a specific column. It accepts a JSON object with fields like headName: <String | Function | ColParser>, e.g. {field1:'number'} will use the built-in number parser to convert values in the field1 column to numbers. For more information see details below
alwaysSplitAtEOL: Always interpret each line (as defined by eol, e.g. \n) as a row. This prevents eol characters from being used within a row (even inside a quoted field). Default is false. Change to true only if you are confident your data contains no inline line breaks (such as a line break inside a cell with multi-line text).
nullObject: How to parse a CSV cell containing "null". The default, false, keeps "null" as a string. Change to true if a null object is needed.
downstreamFormat: Sets the JSON array format expected downstream. "line" (also called ndjson format) writes lines of JSON (without square brackets or commas) downstream. "array" writes a complete JSON array string downstream (suitable for a file writable stream etc.). Default: "line"
needEmitAll: The parser will build the full JSON result when .then is called (or await is used). If this is not desired, set this to false. Default is true.
All parameters can also be used in the command line tool.
Asynchronous Result Process
Since v2.0.0, asynchronous processing has been fully supported.
e.g. Process each JSON result asynchronously.
```javascript
csv()
  .fromFile(csvFile)
  .subscribe((json) => {
    return new Promise((resolve, reject) => {
      // Async operation on the json
      // don't forget to call resolve and reject
    });
  });
```
Note that if an error is emitted, the process will stop, as node.js automatically unpipe()s the upper-stream and chained down-streams. This will cause the end event never to be emitted, because end is only emitted when all data has been consumed. If you need to know when parsing has finished, use the done event instead of end.
To avoid producing nested JSON, simply set flatKeys:true in the parameters.
```javascript
/**
csvStr:
a.b,a.c
1,2
*/
csv({ flatKeys: true })
  .fromString(csvStr)
  .subscribe((jsonObj) => {
    // {"a.b":1,"a.c":2} rather than {"a":{"b":1,"c":2}}
  });
```
Header Row
csvtojson uses the CSV header row to generate JSON keys. However, it does not require the CSV source to contain a header row. There are 4 ways to define header rows:
First row of csv source. Use the first row of the CSV source as the header row. This is the default.
If the first row of the CSV source is a header row but it is incorrect and needs to be replaced, use the headers:[] and noheader:false parameters.
If the original CSV source has no header row but the header definition is known, use the headers:[] and noheader:true parameters.
If the original CSV source has no header row and the header definition is unknown, use noheader:true. This will automatically add field1, field2, ..., fieldN headers to the CSV cells.
Example
```javascript
// replace header row (first row) from original source with 'header1, header2'
csv({
  noheader: false,
  headers: ['header1', 'header2']
})

// original source has no header row. add 'field1' 'field2' ... 'fieldN' as csv header
csv({
  noheader: true
})

// original source has no header row. use 'header1' 'header2' as its header row
csv({
  noheader: true,
  headers: ['header1', 'header2']
})
```
Column Parser
Column Parser allows writing a custom parser for a column in CSV data.
What is Column Parser
When csvtojson walks through CSV data, it converts the value in each cell to something else. For example, if checkType is true, csvtojson will attempt to find a proper type parser according to the cell value. That is, if a cell value is "5", a numberParser will be used, and all values under that column will then be transformed by the numberParser.
Built-in parsers
There are currently the following built-in parsers:
string: Convert value to string
number: Convert value to number
omit: omit the whole column
These will override types inferred from the checkType:true parameter. More built-in parsers will be added as requested on the issues page.
Browser Usage
Using csvtojson in the browser is quite simple. There are two ways:
1. Embed script directly into script tag
There is a pre-built script located at browser/csvtojson.min.js. Simply include that file in a script tag in the index.html page:
```html
<script src="node_modules/csvtojson/browser/csvtojson.min.js"></script>
<!-- or use cdn -->
<script src="https://cdn.rawgit.com/Keyang/node-csvtojson/d41f44aa/browser/csvtojson.min.js"></script>
```