# go-tablib: format-agnostic tabular dataset library [![MIT License](http://img.shields.io/badge/license-MIT-blue.svg?style=flat-square)][license] [![Go Documentation](http://img.shields.io/badge/go-documentation-blue.svg?style=flat-square)][godocs] [![Go Report Card](https://goreportcard.com/badge/github.com/agrison/go-tablib)][goreportcard] [![Build Status](https://travis-ci.org/agrison/go-tablib.svg?branch=master)](https://travis-ci.org/agrison/go-tablib) [license]: https://github.com/agrison/go-tablib/blob/master/LICENSE [godocs]: https://godoc.org/github.com/agrison/go-tablib [goreportcard]: https://goreportcard.com/report/github.com/agrison/go-tablib Go-Tablib is a format-agnostic tabular dataset library, written in Go. This is a port of the famous Python's [tablib](https://github.com/kennethreitz/tablib) by Kenneth Reitz with some new features. Export formats supported: * JSON (Sets + Books) * YAML (Sets + Books) * XLSX (Sets + Books) * XML (Sets + Books) * TSV (Sets) * CSV (Sets) * ASCII + Markdown (Sets) * MySQL (Sets) * Postgres (Sets) Loading formats supported: * JSON (Sets + Books) * YAML (Sets + Books) * XML (Sets) * CSV (Sets) * TSV (Sets) ## Overview ### tablib.Dataset A Dataset is a table of tabular data. It must have a header row. Datasets can be exported to JSON, YAML, CSV, TSV, and XML. They can be filtered, sorted and validated against constraint on columns. ### tablib.Databook A Databook is a set of Datasets. The most common form of a Databook is an Excel file with multiple spreadsheets. Databooks can be exported to JSON, YAML and XML. ### tablib.Exportable An exportable is a struct that holds a buffer representing the Databook or Dataset after it has been formated to any of the supported export formats. At this point the Datbook or Dataset cannot be modified anymore, but it can be returned as a `string`, a `[]byte` or written to a `io.Writer` or a file. ## Usage Creates a dataset and populate it: ```go ds := NewDataset([]string{"firstName", "lastName"}) ``` Add new rows: ```go ds.Append([]interface{}{"John", "Adams"}) ds.AppendValues("George", "Washington") ``` Add new columns: ```go ds.AppendColumn("age", []interface{}{90, 67}) ds.AppendColumnValues("sex", "male", "male") ``` Add a dynamic column, by passing a function which has access to the current row, and must return a value: ```go func lastNameLen(row []interface{}) interface{} { return len(row[1].(string)) } ds.AppendDynamicColumn("lastName length", lastNameLen) ds.CSV() // >> // firstName, lastName, age, sex, lastName length // John, Adams, 90, male, 5 // George, Washington, 67, male, 10 ``` Delete rows: ```go ds.DeleteRow(1) // starts at 0 ``` Delete columns: ```go ds.DeleteColumn("sex") ``` Get a row or multiple rows: ```go row, _ := ds.Row(0) fmt.Println(row["firstName"]) // George rows, _ := ds.Rows(0, 1) fmt.Println(rows[0]["firstName"]) // George fmt.Println(rows[1]["firstName"]) // Thomas ``` Slice a Dataset: ```go newDs, _ := ds.Slice(1, 5) // returns a fresh Dataset with rows [1..5[ ``` ## Filtering You can add **tags** to rows by using a specific `Dataset` method. This allows you to filter your `Dataset` later. This can be useful to separate rows of data based on arbitrary criteria (e.g. origin) that you don’t want to include in your `Dataset`. ```go ds := NewDataset([]string{"Maker", "Model"}) ds.AppendTagged([]interface{}{"Porsche", "911"}, "fast", "luxury") ds.AppendTagged([]interface{}{"Skoda", "Octavia"}, "family") ds.AppendTagged([]interface{}{"Ferrari", "458"}, "fast", "luxury") ds.AppendValues("Citroen", "Picasso") ds.AppendValues("Bentley", "Continental") ds.Tag(4, "luxury") // Bentley ds.AppendValuesTagged("Aston Martin", "DB9", /* these are tags */ "fast", "luxury") ``` Filtering the `Dataset` is possible by calling `Filter(column)`: ```go luxuryCars, err := ds.Filter("luxury").CSV() fmt.Println(luxuryCars) // >>> // Maker,Model // Porsche,911 // Ferrari,458 // Bentley,Continental // Aston Martin,DB9 ``` ```go fastCars, err := ds.Filter("fast").CSV() fmt.Println(fastCars) // >>> // Maker,Model // Porsche,911 // Ferrari,458 // Aston Martin,DB9 ``` Tags at a specific row can be retrieved by calling `Dataset.Tags(index int)` ## Sorting Datasets can be sorted by a specific column. ```go ds := NewDataset([]string{"Maker", "Model", "Year"}) ds.AppendValues("Porsche", "991", 2012) ds.AppendValues("Skoda", "Octavia", 2011) ds.AppendValues("Ferrari", "458", 2009) ds.AppendValues("Citroen", "Picasso II", 2013) ds.AppendValues("Bentley", "Continental GT", 2003) sorted, err := ds.Sort("Year").CSV() fmt.Println(sorted) // >> // Maker, Model, Year // Bentley, Continental GT, 2003 // Ferrari, 458, 2009 // Skoda, Octavia, 2011 // Porsche, 991, 2012 // Citroen, Picasso II, 2013 ``` ## Constraining Datasets can have columns constrained by functions and further checked if valid. ```go ds := NewDataset([]string{"Maker", "Model", "Year"}) ds.AppendValues("Porsche", "991", 2012) ds.AppendValues("Skoda", "Octavia", 2011) ds.AppendValues("Ferrari", "458", 2009) ds.AppendValues("Citroen", "Picasso II", 2013) ds.AppendValues("Bentley", "Continental GT", 2003) ds.ConstrainColumn("Year", func(val interface{}) bool { return val.(int) > 2008 }) ds.ValidFailFast() // false if !ds.Valid() { // validate the whole dataset, errors are retrieved in Dataset.ValidationErrors ds.ValidationErrors[0] // Row: 4, Column: 2 } ``` A Dataset with constrained columns can be filtered to keep only the rows satisfying the constraints. ```go valid := ds.ValidSubset().Tabular("simple") // Cars after 2008 fmt.Println(valid) ``` Will output: ``` ------------ --------------- --------- Maker Model Year ------------ --------------- --------- Porsche 991 2012 Skoda Octavia 2011 Ferrari 458 2009 Citroen Picasso II 2013 ------------ --------------- --------- ``` ```go invalid := ds.InvalidSubset().Tabular("simple") // Cars before 2008 fmt.Println(invalid) ``` Will output: ``` ------------ ------------------- --------- Maker Model Year ------------ ------------------- --------- Bentley Continental GT 2003 ------------ ------------------- --------- ``` ## Loading ### JSON ```go ds, _ := LoadJSON([]byte(`[ {"age":90,"firstName":"John","lastName":"Adams"}, {"age":67,"firstName":"George","lastName":"Washington"}, {"age":83,"firstName":"Henry","lastName":"Ford"} ]`)) ``` ### YAML ```go ds, _ := LoadYAML([]byte(`- age: 90 firstName: John lastName: Adams - age: 67 firstName: George lastName: Washington - age: 83 firstName: Henry lastName: Ford`)) ``` ## Exports ### Exportable Any of the following export format returns an `*Exportable` which means you can use: - `Bytes()` to get the content as a byte array - `String()` to get the content as a string - `WriteTo(io.Writer)` to write the content to an `io.Writer` - `WriteFile(filename string, perm os.FileMode)` to write to a file It avoids unnecessary conversion between `string` and `[]byte` to output/write/whatever. Thanks to [@figlief](https://github.com/figlief) for the proposition. ### JSON ```go json, _ := ds.JSON() fmt.Println(json) ``` Will output: ```json [{"age":90,"firstName":"John","lastName":"Adams"},{"age":67,"firstName":"George","lastName":"Washington"},{"age":83,"firstName":"Henry","lastName":"Ford"}] ``` ### XML ```go xml, _ := ds.XML() fmt.Println(xml) ``` Will ouput: ```xml 90 John Adams 67 George Washington 83 Henry Ford ``` ### CSV ```go csv, _ := ds.CSV() fmt.Println(csv) ``` Will ouput: ```csv firstName,lastName,age John,Adams,90 George,Washington,67 Henry,Ford,83 ``` ### TSV ```go tsv, _ := ds.TSV() fmt.Println(tsv) ``` Will ouput: ```tsv firstName lastName age John Adams 90 George Washington 67 Henry Ford 83 ``` ### YAML ```go yaml, _ := ds.YAML() fmt.Println(yaml) ``` Will ouput: ```yaml - age: 90 firstName: John lastName: Adams - age: 67 firstName: George lastName: Washington - age: 83 firstName: Henry lastName: Ford ``` ### HTML ```go html := ds.HTML() fmt.Println(html) ``` Will output: ```html
firstName lastName age
George Washington 90
Henry Ford 67
Foo Bar 83
``` ### XLSX ```go xlsx, _ := ds.XLSX() fmt.Println(xlsx) // >>> // binary content xlsx.WriteTo(...) ``` ### ASCII #### Grid format ```go ascii := ds.Tabular("grid" /* tablib.TabularGrid */) fmt.Println(ascii) ``` Will output: ``` +--------------+---------------+--------+ | firstName | lastName | age | +==============+===============+========+ | George | Washington | 90 | +--------------+---------------+--------+ | Henry | Ford | 67 | +--------------+---------------+--------+ | Foo | Bar | 83 | +--------------+---------------+--------+ ``` #### Simple format ```go ascii := ds.Tabular("simple" /* tablib.TabularSimple */) fmt.Println(ascii) ``` Will output: ``` -------------- --------------- -------- firstName lastName age -------------- --------------- -------- George Washington 90 Henry Ford 67 Foo Bar 83 -------------- --------------- -------- ``` #### Condensed format ```go ascii := ds.Tabular("condensed" /* tablib.TabularCondensed */) fmt.Println(ascii) ``` Similar to simple but with less line feed: ``` -------------- --------------- -------- firstName lastName age -------------- --------------- -------- George Washington 90 Henry Ford 67 Foo Bar 83 -------------- --------------- -------- ``` ### Markdown Markdown tables are similar to the Tabular condensed format, except that they have pipe characters separating columns. ```go mkd := ds.Markdown() // or mkd := ds.Tabular("markdown" /* tablib.TabularMarkdown */) fmt.Println(mkd) ``` Will output: ``` | firstName | lastName | gpa | | -------------- | --------------- | ------- | | John | Adams | 90 | | George | Washington | 67 | | Thomas | Jefferson | 50 | ``` Which equals to the following when rendered as HTML: | firstName | lastName | gpa | | -------------- | --------------- | ------- | | John | Adams | 90 | | George | Washington | 67 | | Thomas | Jefferson | 50 | ### MySQL ```go sql := ds.MySQL() fmt.Println(sql) ``` Will output: ```sql CREATE TABLE IF NOT EXISTS presidents ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, firstName VARCHAR(9), lastName VARCHAR(8), gpa DOUBLE ); INSERT INTO presidents VALUES(1, 'Jacques', 'Chirac', 88); INSERT INTO presidents VALUES(2, 'Nicolas', 'Sarkozy', 98); INSERT INTO presidents VALUES(3, 'François', 'Hollande', 34); COMMIT; ``` Numeric (`uint`, `int`, `float`, ...) are stored as `DOUBLE`, `string`s as `VARCHAR` with width set to the length of the longest string in the column, and `time.Time`s are stored as `TIMESTAMP`. ### Postgres ```go sql := ds.Postgres() fmt.Println(sql) ``` Will output: ```sql CREATE TABLE IF NOT EXISTS presidents ( id SERIAL PRIMARY KEY, firstName TEXT, lastName TEXT, gpa NUMERIC ); INSERT INTO presidents VALUES(1, 'Jacques', 'Chirac', 88); INSERT INTO presidents VALUES(2, 'Nicolas', 'Sarkozy', 98); INSERT INTO presidents VALUES(3, 'François', 'Hollande', 34); COMMIT; ``` Numerics (`uint`, `int`, `float`, ...) are stored as `NUMERIC`, `string`s as `TEXT` and `time.Time`s are stored as `TIMESTAMP`. ## Databooks This is an example of how to use Databooks. ```go db := NewDatabook() // or loading a JSON content db, err := LoadDatabookJSON([]byte(`...`)) // or a YAML content db, err := LoadDatabookYAML([]byte(`...`)) // a dataset of presidents presidents, _ := LoadJSON([]byte(`[ {"Age":90,"First name":"John","Last name":"Adams"}, {"Age":67,"First name":"George","Last name":"Washington"}, {"Age":83,"First name":"Henry","Last name":"Ford"} ]`)) // a dataset of cars cars := NewDataset([]string{"Maker", "Model", "Year"}) cars.AppendValues("Porsche", "991", 2012) cars.AppendValues("Skoda", "Octavia", 2011) cars.AppendValues("Ferrari", "458", 2009) cars.AppendValues("Citroen", "Picasso II", 2013) cars.AppendValues("Bentley", "Continental GT", 2003) // add the sheets to the Databook db.AddSheet("Cars", cars.Sort("Year")) db.AddSheet("Presidents", presidents.SortReverse("Age")) fmt.Println(db.JSON()) ``` Will output the following JSON representation of the Databook: ```json [ { "title": "Cars", "data": [ {"Maker":"Bentley","Model":"Continental GT","Year":2003}, {"Maker":"Ferrari","Model":"458","Year":2009}, {"Maker":"Skoda","Model":"Octavia","Year":2011}, {"Maker":"Porsche","Model":"991","Year":2012}, {"Maker":"Citroen","Model":"Picasso II","Year":2013} ] }, { "title": "Presidents", "data": [ {"Age":90,"First name":"John","Last name":"Adams"}, {"Age":83,"First name":"Henry","Last name":"Ford"}, {"Age":67,"First name":"George","Last name":"Washington"} ] } ] ``` ## Installation ```bash go get github.com/agrison/go-tablib ``` For those wanting the v1 version where export methods returned a `string` and not an `Exportable`: ```bash go get gopkg.in/agrison/go-tablib.v1 ``` ## TODO * Loading in more formats * Support more formats: DBF, XLS, LATEX, ... ## Contribute It is a work in progress, so it may exist some bugs and edge cases not covered by the test suite. But we're on Github and this is Open Source, pull requests are more than welcomed, come and have some fun :) ## Acknowledgement Thanks to kennethreitz for the first implementation in Python, [`github.com/bndr/gotabulate`](https://github.com/bndr/gotabulate), [`github.com/clbanning/mxj`](https://github.com/clbanning/mxj), [`github.com/tealeg/xlsx`](https://github.com/tealeg/xlsx), [`gopkg.in/yaml.v2`](https://gopkg.in/yaml.v2)