Best Practices for Data Diaries in Investigative Journalism and Beyond



Earlier this month, Farsight attended the National Institute for Computer-Assisted Reporting (NICAR) Conference in New Orleans to deliver hands-on DNSDB training to nearly 100 data journalists. The two standing-room-only workshops, "Finding the story: Using DNS search for investigative journalism," were delivered by Farsight Security CEO Dr. Paul Vixie.

One of the primary takeaways from this terrific conference, which drew over 1,000 data journalists from around the world, is the critical role data plays in today's investigations. In addition to our own trainings, I attended several other sessions, including "Dear Diary: Best Practices For Keeping a Data Diary." In this session, WBUR's Ally Jarmanning moderated a great discussion with reporters Ken Sweet of the Associated Press and Emmanuel Martinez of The Markup, who explained why data diaries matter and shared their best practices for keeping them.

In this article, I want to share some information from this session, and I would be interested to learn more about how you and your team record your data use in your investigations.

The session began by defining the term "data diary." A data diary is a recipe for your analysis – like a recipe for your food. It should record not only the data and/or records you used (the type of data, the origin of the data), but also every step you take in the investigation – and the reasons behind those steps. Even if you are only changing a variable in an Excel spreadsheet – or just sorting a column – the reporters recommend that you capture it in your data diary.

The data diary should contain your code or formulas, where you downloaded the data (and how you saved it), text that explains your analysis, and your outputs – whatever your code spit out.

Possible formats for data diaries include Google Docs, text files, Jupyter Notebooks, or Microsoft Word – whatever works for you, as long as it is clear and readable to other people.
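For readers who keep their diary as a plain text file, here is a minimal Python sketch of what one diary entry could look like in practice. It is only an illustration of the advice above, not a tool presented at the session: the function names `log_step` and `file_sha256`, the entry layout, and the idea of storing a file hash alongside the data file name are all my own assumptions.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, to pin down exactly which copy was used."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_step(diary_path, action, reason, data_file=None, source=None, output=None):
    """Append one timestamped entry to a plain-text data diary and return the entry text.

    The field names (Action, Reason, Data file, Source, Output) are a hypothetical
    layout; adapt them to whatever your team finds readable.
    """
    lines = [
        f"[{datetime.now(timezone.utc).isoformat(timespec='seconds')}]",
        f"Action: {action}",
        f"Reason: {reason}",
    ]
    if data_file is not None:
        lines.append(f"Data file: {data_file} (sha256 {file_sha256(data_file)})")
    if source is not None:
        lines.append(f"Source: {source}")
    if output is not None:
        lines.append(f"Output: {output}")
    entry = "\n".join(lines) + "\n\n"
    # Append mode, so earlier entries are never overwritten.
    with open(diary_path, "a", encoding="utf-8") as diary:
        diary.write(entry)
    return entry
```

A call such as `log_step("diary.txt", "Sorted column B descending", "Looking for the largest values", data_file="data.csv", source="where the file was downloaded")` would record even a small spreadsheet step together with the reason behind it.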

The reporters described how the data diary provides an opportunity for you to have a dialogue with your data team. Everybody reads it, from fact checkers and beat reporters to editors and more. The diary provides context for an investigation. It enables the reporters to verify findings and build trust with their team. In addition, if you need to take a break from your investigation, the data diary can be the roadmap that helps you get back to the last step in your investigation.

This session reminded me that recording your data use or types of pivots is critical for all investigations – not just news stories. Recently, one of our DNSDB users shared, "When using DNSDB as an investigative tool, it is important to keep notes of the trail you are going down; sometimes you can go down to the second or fourth level unexpectedly and go off-track, and want to return to your starting point. You should also record your different datasets – and how they may link together. I verify every connection/link."
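To make the idea of an investigation trail concrete, here is a small Python sketch that records each pivot along with the step it came from, so you can always see how deep you have gone and how to get back to your starting point. It is a hypothetical note-keeping structure of my own (the `Pivot` class and `render` function are illustrative names), it does not query DNSDB or any other service, and the domains and IP address are reserved documentation examples.

```python
class Pivot:
    """One step in an investigation trail: what was looked up, and why."""

    def __init__(self, query, note="", parent=None):
        self.query = query
        self.note = note
        self.parent = parent      # the step this pivot came from (None at the start)
        self.children = []        # pivots made from this step

    def pivot(self, query, note=""):
        """Record a new pivot made from this step and return it."""
        child = Pivot(query, note, parent=self)
        self.children.append(child)
        return child

    def path_from_start(self):
        """Return the queries leading from the starting point to this step."""
        node, path = self, []
        while node is not None:
            path.append(node.query)
            node = node.parent
        return path[::-1]

    def level(self):
        """How many pivots away from the starting point this step is."""
        return len(self.path_from_start()) - 1


def render(node, indent=0):
    """Return the trail below `node` as indented text lines for the diary."""
    suffix = f" ({node.note})" if node.note else ""
    lines = ["  " * indent + f"- {node.query}{suffix}"]
    for child in node.children:
        lines.extend(render(child, indent + 1))
    return lines


start = Pivot("example.com", "starting point")
ip = start.pivot("192.0.2.10", "address seen for the domain")
other = ip.pivot("example.net", "also seen on that address")
print("\n".join(render(start)))
print("Level:", other.level(), "| Path:", " -> ".join(other.path_from_start()))
```

Pasting the rendered trail into the diary keeps the links between datasets visible, which makes it easier for a colleague to verify each connection.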

What are some of your own tips and tricks for recording the use of data in your investigations? I'd like to learn more and invite you to share your best practices with me. I will share them in a future blog post.

To learn more about our investigative journalism grant program, visit here. To read more about how investigative reporters are using DNSDB in their investigations, I encourage you to read these recent WSJ, Columbia Journalism Review, Reuters and ProPublica articles.

Thank you to NICAR for the opportunity to attend and train at your event. Finally, thank you to all the great investigative journalists who are on the front lines every day, working to uncover the truth for their readers.

Karen Burke is the Director of Corporate Communications for Farsight Security, Inc.