
So I have a couple questions:

1. This assumes all your data fits in memory, or no?

2. How do you deal with I/O wait on the CPU when reading from disk and processing line by line, if the data doesn't fit into memory?

3. Isn't CSV just as useful? Many languages support using the "header" row of a CSV to generate a hash/array/whatever based on those headers (defeating the purpose of parsing JSON back and forth).

4. Again, exporting as CSV is even faster, especially if you're using PostgreSQL's "COPY" or MySQL's "LOAD DATA INFILE". I can't speak for SQLite, though.

Thank you!

author

I love these questions!

1 & 2. These questions deserve a whole post of their own, so I apologize for trying to stuff a complicated issue into a comment. The way I would deal with it is by adding a --line-records flag to the "json-sql query" subcommand. However, I haven't implemented this yet, and would happily review a pull request implementing such a feature.
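To make that concrete, here's a minimal sketch of the streaming idea in Python. Everything below is an assumption on my part: line-delimited JSON input, a fixed two-column schema, and the table name "t". It's an illustration of the approach, not the actual implementation:

```python
import json
import sqlite3
import sys

def stream_records(lines, conn, batch_size=1000):
    """Sketch: load line-delimited JSON into SQLite in fixed-size
    batches so the full input never has to sit in memory."""
    batch = []
    for line in lines:
        record = json.loads(line)                     # one JSON object per line
        batch.append((record["id"], record["name"]))  # assumed schema
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)", batch)
            batch.clear()
    if batch:
        conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)", batch)
    conn.commit()

conn = sqlite3.connect("out.db")
conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER, name TEXT)")
stream_records(sys.stdin, conn)  # reading stdin keeps the pipeline streaming
```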

3. CSV is great for representing single tables via the standard "header + values" convention, as you suggested. However, I'm not aware of a standard convention in CSV for representing the multiple-table output of json-sql read. Coming up with one in CSV would require a complicated spec, whereas JSON objects provide a natural way to represent this.

I've also found that more of my engineering problems as a whole can be reduced to JSON problems than to CSV problems. While a lot of data can be stuffed into a CSV, there's no natural way to call something like csv.load(data) in various languages and get the structure back.
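To illustrate the difference, here's a small Python example. The payload below is my own hypothetical multi-table shape, not necessarily the exact format json-sql read emits:

```python
import json

# Hypothetical multi-table payload: one object, several named tables.
payload = """
{
  "users":  [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}],
  "orders": [{"id": 10, "user_id": 1, "total": 9.99}]
}
"""

data = json.loads(payload)            # full structure back in one call
for table, rows in data.items():
    print(table, len(rows), "rows")   # users 2 rows / orders 1 rows

# A CSV equivalent would need a custom convention for table boundaries
# and nesting before any of this works.
```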

This also means that I have a rich set of tools and techniques for working with JSON, but not CSV. To handle CSV data, I actually have twin tools, csv-to-json and json-to-csv, to turn all of my CSV problems into JSON problems. Notably, json-to-csv only works on a subset of JSON values, whereas csv-to-json works on all CSV.
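As a rough sketch of the csv-to-json direction (an illustration of the idea, not the actual tool), Python's csv.DictReader already applies the header convention you mentioned, and the result serializes straight to JSON:

```python
import csv
import io
import json

# Every CSV maps cleanly onto a list of flat objects.
raw = "id,name\n1,Ada\n2,Grace\n"
rows = list(csv.DictReader(io.StringIO(raw)))
print(json.dumps(rows))  # [{"id": "1", "name": "Ada"}, ...]

# The reverse direction only works for flat, homogeneous JSON, which is
# why json-to-csv handles just a subset of JSON values.
```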

4. CSV is faster to export in terms of machine time. However, this blog is focused on coding faster, so it always trades off machine time for development velocity. In practice, json-sql will have insufficient performance for some problems. In those cases, custom code, or at least a csv-sql command that wraps a SQL query in COPY/LOAD DATA, might work well.
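For the PostgreSQL case, such a (hypothetical) csv-sql wrapper could be as thin as handing a query to psycopg2's copy_expert. The connection string and overall command shape below are assumptions for illustration:

```python
import sys

import psycopg2  # assumed driver; any client exposing COPY would do

# Sketch of a hypothetical csv-sql command: wrap an arbitrary query in
# PostgreSQL's COPY and stream the result to stdout as CSV.
query = sys.argv[1]  # e.g. "SELECT * FROM users"
conn = psycopg2.connect("dbname=mydb")  # placeholder connection details
with conn, conn.cursor() as cur:
    cur.copy_expert(f"COPY ({query}) TO STDOUT WITH CSV HEADER", sys.stdout)
conn.close()
```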
