Gmail Data Visualization with d3.js
Feb 23, 2014
TL;DR: see the visualization in action.
Google now lets Gmail users download a complete archive of their Google services data from Google Takeout.
I downloaded a copy of my Gmail account and decided to poke around a bit.
I've always enjoyed manipulating large amounts of textual data with old-school UNIX commands (grep, etc.), so this sounded like a good excuse to have some fun.
It took nearly two hours before my archive was available for download: Google builds the archive at the time you request it, and the file itself was a couple of gigabytes.
I wasn't sure what format to expect for my Gmail backup.
It turns out Google archives all of my email into a single mbox file.
The file was massive: 1.4 GB of plain text.
Attachments are stored in the mbox file as well.
They are base64-encoded and included as parts of a multipart message.
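To see how attachments sit inside a message, here is a minimal sketch using Python's standard email package (the post doesn't say which tooling was used). It walks a message's MIME parts and reports each attachment's filename and decoded size; list_attachments and the sample message are hypothetical illustrations. Because base64 turns every 3 bytes into 4 characters, an attachment takes roughly a third more room in the mbox than its decoded size.

```python
from email.message import EmailMessage


def list_attachments(msg):
    """Yield (filename, decoded_size_in_bytes) for each attachment part of a message."""
    for part in msg.walk():
        # walk() visits the message itself and every nested MIME part, depth first.
        if part.get_content_disposition() == "attachment":
            # decode=True reverses the Content-Transfer-Encoding (e.g. base64).
            data = part.get_payload(decode=True) or b""
            yield part.get_filename(), len(data)


# Build a small multipart message; add_attachment base64-encodes binary data by default.
msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See attached.")
msg.add_attachment(b"hello world", maintype="application",
                   subtype="octet-stream", filename="report.bin")

print(list(list_attachments(msg)))  # [('report.bin', 11)]
```

The plain-text body part has no Content-Disposition header, so only the attachment is reported.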
Long story short, I parsed the gmail.mbox file, extracted a few metadata points from each email, and stored the results in a SQL database.
A very useful piece of metadata is the custom X-Gmail-Labels header Google adds to every email.
Its value is the comma-separated list of labels applied to the message.
I have around one hundred filter rules for incoming mail, so each message gets categorized.
Anything in the Inbox category is an email that didn't match any filter rule.
    From 1457861861978585811@xxx Tue Jan 21 05:30:14 2014
    X-GM-THRID: 1457861861978585811
    X-Gmail-Labels: Basecamp,VE,Important # <---- BOOM
    ...
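Splitting that header value is enough to tally labels across the whole archive. This is a minimal Python sketch, assuming labels are separated by bare commas and that no label name itself contains a comma; parse_labels and count_labels are hypothetical helpers, not part of the original project. Inbox is simply counted like any other label here.

```python
from collections import Counter


def parse_labels(header_value):
    """Split an X-Gmail-Labels header value into a list of label names."""
    if not header_value:
        return []
    return [label.strip() for label in header_value.split(",") if label.strip()]


def count_labels(header_values):
    """Tally how many messages carry each label across many header values."""
    counts = Counter()
    for value in header_values:
        counts.update(parse_labels(value))
    return counts


print(parse_labels("Basecamp,VE,Important"))  # ['Basecamp', 'VE', 'Important']
print(count_labels(["Basecamp,VE,Important", "Inbox,Important", "VE"]))
```

A message with several labels contributes one count to each of them, so the label totals can add up to more than the number of messages.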
I spent a fair amount of time deciding how to visualize the data. After several SQL queries and a few dead ends, I took inspiration from Mike Bostock's NYT visualization of President Obama's 2013 budget.
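The post doesn't list its SQL queries, so as a hypothetical illustration of the kind of aggregate a chart needs, here is a sketch that counts messages per month and emits JSON that a d3 page could load (for example with d3.json). It assumes a hypothetical emails table with an ISO-8601 sent_at text column; monthly_counts_json is an illustrative name, not the author's code.

```python
import json
import sqlite3


def monthly_counts_json(conn):
    """Count messages per calendar month and return them as a JSON array.

    Assumes ISO-8601 `sent_at` strings, so the first seven characters
    ("YYYY-MM") identify the month in the sender's own time zone.
    """
    rows = conn.execute(
        "SELECT substr(sent_at, 1, 7) AS month, COUNT(*) AS n "
        "FROM emails WHERE sent_at IS NOT NULL "
        "GROUP BY month ORDER BY month"
    ).fetchall()
    return json.dumps([{"month": month, "count": n} for month, n in rows])


# Tiny in-memory table standing in for the real archive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, sent_at TEXT)")
conn.executemany(
    "INSERT INTO emails (sent_at) VALUES (?)",
    [("2014-01-21T05:30:14+00:00",), ("2014-01-22T09:00:00+00:00",),
     ("2014-02-01T12:00:00+00:00",)],
)
print(monthly_counts_json(conn))
# [{"month": "2014-01", "count": 2}, {"month": "2014-02", "count": 1}]
```

Doing the grouping in SQL keeps the JSON handed to the browser small, which matters when the underlying archive holds many thousands of messages.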
See the visualization in action.
Discussion on Hacker News.