Csvkit - Command line tools to work with CSV
In my day job as a Backend Developer in a E-Commerce Company I often have to communicate data to other departments. As always in the business world this happens via Excel Tables (or in my case .csv-files).
As you all know it can be a pain in the ass to handle large csv files - but dont worry anymore, help is on the way.
In some cases we have to work with exports/imports to third party software which only supports csv. As we work with a product database containing around 240k SKUs the files get kinda big (~2GB 🤦♂️ ). Sometimes the tables have to be filtered, converted or cleaned up - as a developer naturally i would write a small script to performce this tasks. This takes (some) time, is probably not very interesting and results in a very specific tool that you usally need exactly once - not very productive or fulfilling.
But not anymore - say hello to csvkit. I recently stumbled across this gem. It helps you converting csv files (from/to excel, json, …), have a look at you csv data, extract and combine csv data and even enables you to query csv like a database table via SQL.
With the magic of unix pipes you can achive virtually everything you need in a single one line bash command. No need to share some script files - a single message via slack an a colleague can use the same “tool”.
Examples
Convert a excel to json
$ in2csv /tmp/test.xls | csvjson | jq
[
{
"Some": "a",
"Column ": 123,
"Test": "foobar"
},
{
"Some": "b",
"Column ": 4242,
"Test": "baz"
}
]
Extract some columns a csv:
$ csvcut -e iso8859-1 -c number,active,stock -d ";" ./somelarge.csv > ./stocks_only.csv
The execution time for the tool is also very reasonable (test file was around 2GB):
$ time csvcut -e iso8859-1 -c number,active,stock -d ";" ./somelarge.csv > ./stocks_only.csv
real 0m8,869s
user 0m8,581s
sys 0m0,276s