In my day job as a backend developer at an e-commerce company I often have to communicate data to other departments. As always in the business world, this happens via Excel tables (or, in my case, .csv files).
As you all know, handling large csv files can be a pain in the ass - but don't worry anymore, help is on the way.

In some cases we have to work with exports/imports for third-party software that only supports csv. Since we work with a product database containing around 240k SKUs, the files get pretty big (~2GB 🤦‍♂️ ). Sometimes the tables have to be filtered, converted or cleaned up - as a developer, I would naturally write a small script to perform these tasks. That takes (some) time, is probably not very interesting, and results in a very specific tool that you usually need exactly once - not very productive or fulfilling.

But not anymore - say hello to csvkit. I recently stumbled across this gem. It helps you convert csv files (from/to Excel, JSON, …), take a look at your csv data, extract and combine columns, and it even lets you query csv files like database tables via SQL.

With the magic of unix pipes you can achieve virtually everything you need in a single one-line bash command. No need to share script files - a single message via Slack and a colleague can use the same “tool”.
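For instance, filtering a semicolon-separated export down to the active products and pretty-printing the first rows is a single pipe. A quick sketch (the file name and columns here are made up for illustration):

$ csvcut -e iso8859-1 -d ";" -c number,active,stock ./export.csv \
    | csvgrep -c active -r "^1$" \
    | csvlook --max-rows 10

csvgrep keeps only the rows whose active column matches the regex, and csvlook renders the result as a fixed-width table in the terminal.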

Examples

Convert an Excel file to JSON:

$ in2csv /tmp/test.xls | csvjson | jq
[
  {
    "Some": "a",
    "Column ": 123,
    "Test": "foobar"
  },
  {
    "Some": "b",
    "Column ": 4242,
    "Test": "baz"
  }
]
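If you just want to have a quick look at the data instead of converting it, csvlook renders the rows as a Markdown-compatible table (same test file as above):

$ in2csv /tmp/test.xls | csvlook

csvstat works the same way and prints summary statistics (column types, unique values, min/max, …) for every column.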

Extract some columns from a csv:

$ csvcut -e iso8859-1 -c number,active,stock -d ";" ./somelarge.csv > ./stocks_only.csv
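And for the SQL part: csvsql loads the file into an in-memory SQLite database, using the file name as the table name, so you can query it directly. A sketch on the extracted file from above (assuming active is stored as 0/1):

$ csvsql --query "SELECT number, stock FROM stocks_only WHERE active = 1" ./stocks_only.csv

The result comes back as csv on stdout, so it can be piped right into the next tool.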

The execution time of csvcut is also very reasonable (the test file was around 2GB):

$ time csvcut -e iso8859-1 -c number,active,stock -d ";" ./somelarge.csv > ./stocks_only.csv

real	0m8,869s
user	0m8,581s
sys	0m0,276s