About

 
I took this picture a few years ago at sunrise in Death Valley National Park. You can barely make out my wife in it -- a tiny black speck about 1/3 of the width in from the right.

I'm a Computational Journalist at ProPublica, where I use data science and code to do investigative journalism. Before this, I was a Machine Learning Engineer at Atrium, where I wrote software to automatically extract and analyze legal terms from contracts.

I've worked on stories that have:

My stories have appeared in the New York Times, Wired, and the Guardian, among other publications.

My 2017 blog post, which uncovered more than a million astroturf comments on the FCC's net neutrality regulations, unexpectedly went viral. It was covered in the Washington Post, Fortune, and Engadget, and I was invited onto Science Friday to explain my work. I’ve also been interviewed about my data science work in the New York Times and Forbes.

My Experience

Before working in a newsroom, I gained experience at small startups, at large international law firms, and in government. The common thread tying those roles together has been my ability to quickly analyze, contextualize, and synthesize data and to clearly communicate the results.

I have a law degree from Columbia Law School, where I was also the Editor-in-Chief of the Columbia Science and Technology Law Review, and a bachelor's degree in systems engineering from the University of Waterloo, specializing in cognitive science.

My Data Science Philosophy

I believe in using models that are reproducible and, to the extent possible, interpretable. I use deep learning only when appropriate to the problem at hand, not as a first resort.

I believe in style guides (for both natural and programming languages), but not in blindly following rules at the expense of broader principles (for both natural and programming languages).

I believe in writing DRY code (and thinking about why), and I believe that data science projects for fun should have whimsical names. :-)

Most of all, I believe in paying attention to the details, in being clear about my assumptions, in being aware of the limitations of statistics, and in doing great work that matters.

Get in touch

I'd love to hear your feedback or questions about my work. Shoot me an email if you have an opportunity in mind or an idea for collaboration. You can reach me at jeff.kao at propublica.org, at +1 646 789 5351 on Signal, or through the contact page.