Gmail Data Visualization with d3.js
TL;DR: see the visualization in action.
Google now allows Gmail users to download a complete archive of their Google
services data from Google Takeout. I
downloaded a copy of my Gmail account and decided to poke around a bit. I’ve
always enjoyed manipulating large amounts of textual data with old-school UNIX
tools like grep, so this sounded like a good excuse
to have some fun.
It took nearly two hours before my archive was available for download. Google
runs a backup at the time you request a download, and the file itself was a
couple of gigs. I wasn’t sure what format to expect for my Gmail backup.
Google archives all of my emails into a single
.mbox file. The file was
massive: 1.4 gigs of straight text. Attachments are stored in the mbox file
as well. They are base64 encoded and attached as multipart email attachments.
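To get a feel for how much of that 1.4 gigs is attachments rather than text, something like the following could work. It is only a sketch using Python's standard email package; the helper name attachment_sizes is made up for illustration, and it assumes every attachment part carries a filename.

```python
import email


def attachment_sizes(msg):
    """Return (filename, decoded size in bytes) for each attachment part.

    Hypothetical helper: a part counts as an attachment here only if it
    has a filename, which is an assumption about how the mail was sent.
    """
    sizes = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename is None:
            continue  # body text or a multipart container, not an attachment
        # decode=True undoes the base64 transfer encoding.
        payload = part.get_payload(decode=True)
        sizes.append((filename, len(payload or b"")))
    return sizes
```

Since base64 turns every 3 bytes into 4 characters, the decoded sizes come out roughly a quarter smaller than the space the attachments take up inside the mbox file.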
Long story short, I parsed
gmail.mbox, extracted a few metadata points
from each email, and stored the results in a SQL database.
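For the curious, the parse-and-store step can be sketched in a few lines of Python. This is not the exact script I ran, just a minimal version assuming the standard mailbox and sqlite3 modules; the function name index_mbox, the emails table, and the choice of columns are all illustrative.

```python
import mailbox
import sqlite3


def header(msg, name):
    """Return a header as plain text, or None when it is missing."""
    value = msg.get(name)
    return None if value is None else str(value)


def index_mbox(mbox_path, db_path):
    """Copy a few headers from every message in an mbox file into SQLite.

    The table layout is an assumption made for this sketch.
    Returns the number of messages stored.
    """
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        "thread_id TEXT, sender TEXT, sent TEXT, subject TEXT, labels TEXT)"
    )
    # create=False: fail loudly instead of creating an empty mbox file.
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for msg in box:
            db.execute(
                "INSERT INTO emails VALUES (?, ?, ?, ?, ?)",
                (
                    header(msg, "X-GM-THRID"),
                    header(msg, "From"),
                    header(msg, "Date"),
                    header(msg, "Subject"),
                    header(msg, "X-Gmail-Labels"),
                ),
            )
            count += 1
        db.commit()
    finally:
        box.close()
        db.close()
    return count
```

Calling index_mbox("gmail.mbox", "gmail.db") would then leave one row per email, ready for poking at with plain SQL.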
A very useful piece of metadata is the custom
X-Gmail-Labels header Google
adds to all emails. The value of the header is the list of labels applied to an
email I receive. I have around one hundred filter rules for incoming mail, so each
piece is categorized. Anything in the
Inbox category is an email not
matching a filter rule.
    From 1457861861978585811@xxx Tue Jan 21 05:30:14 2014
    X-GM-THRID: 1457861861978585811
    X-Gmail-Labels: Basecamp,VE,Important # <---- BOOM
    ...
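Turning that header into per-label counts is mostly string splitting. Here is a rough sketch, assuming the labels are always comma-separated as in the sample above (a label that itself contains a comma would need more care); split_labels and count_labels are names invented for this example.

```python
from collections import Counter


def split_labels(header_value):
    """Split an X-Gmail-Labels value into a list of label names.

    Assumes a plain comma-separated list, as in the sample header.
    """
    if not header_value:
        return []
    return [label.strip() for label in header_value.split(",") if label.strip()]


def count_labels(header_values):
    """Count how many emails carry each label."""
    counts = Counter()
    for value in header_values:
        counts.update(split_labels(value))
    return counts
```

Feeding it the header value from every email gives a label-to-count mapping, which is the kind of aggregate a visualization can be built on.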
I spent a fair amount of time deciding how to visualize the data. After several SQL queries and a few dead ends, I took inspiration from Mike Bostock’s NYT visualization of President Obama’s 2013 budget.
See the visualization in action.
Discussion on Hacker News.