Python Dataset Library -- CSV to SQLite ;)


Out of frustration of just needing to handle some (temporary) data properly and save it (csv and json are nice, but i wanted sqlilte in this case to query it easier) I found the dataset library.

This makes handling Databases (supposedly) as easy as json, .. and i have to admit it really makes stuff where one doesn't want to have full control over table creation,.. really simple and fast

Here's one really quick'n dirty csv -> sql (sqlite in my case) code:

import csv, uuid, dataset

def csv2dataset(fin, fout):
with open(fin, 'r') as finp:
inpdata = csv.DictReader(finp)
db = dataset.connect('sqlite:///' + fout)
table = db['contact']
for contact in inpdata:
contact['uuid'] = uuid.uuid4().hex
contact['verified'] = False

if __name__ == '__main__':
csv2dataset('addresses.csv', 'newsletterdata.sqlite')

all it does is open the csv file, uses DictReader to get the data.

Then we "connect" to the SQLite db and get/create  a table called 'contact' (this syntax seems to be the same as db.get_table - which creates a table if it doesn't exist .. there also is db.load_table which only fetches existing ones).

Loop through the contact data from the csv, add a column with an UUID (i don't like to use verification Urls that include the email, so i generated UUIDs) and a boolean column. The rest of the column names are from the source data (the colum names were in the first row of the csv file, which is what DictReader used)

The schema created from this is here:

CREATE TABLE contact (
first_name TEXT,
last_name TEXT,
display_name TEXT,
email TEXT,
organization TEXT,
uuid TEXT,
verified BOOLEAN,
CHECK (verified IN (0, 1))

if you want to query the DB it is just about the same (import, .. and error handling / checks omitted):

contacts = db.load_table('contact')
# find one only - this is None if nothing is returned
contact = contacts.find_one(contacts.table.columns.uuid=myuuid)
# find all occurences contact - returns an iterable object even if there are no matches
contact = contacts.find(contacts.table.columns.uuid == myuuid)

For the rest please consult the documentation on thie dataset webpage

Current rating: 5


There are currently no comments

New Comment


required (not published)