Telling Stories with Data – Project Workflow
Article by Lu Kang and Liukun Xie
Tom Calver, Data Journalist, BBC News, Friday 12th October 2018
Bush House, King’s College London
Tom Calver, a data journalist in the Visual Journalism department of BBC News, gave a talk at King’s College London on “telling stories with data and the data project workflow”, as well as the impressive work his team has done. Tom is a seasoned data journalist with extensive experience in investigations and data storytelling. Before joining BBC News, he was a data journalist at Which? magazine.
Tom’s speech was mainly about the data project workflow shown in Figure 1 below, with plenty of anecdotes added to make the presentation more vivid and accessible.
Figure 1. Data projects workflow
Tom began by presenting recent examples of his work, including the articles “Police under pressure” (Panorama), “Two-thirds of Britain unaffordable to young renters” and “House prices down on a decade ago in most areas”. He then explained four key principles of data journalism: telling stories with data, putting the reader at the heart of stories, treating data like any other source, and using sophisticated tools to help users. He then walked the class through the workflow of a typical data project.
The workflow usually consists of four steps: generating ideas, obtaining the data, analysing the data and presenting the data.
Step 1: Generating ideas
Tom emphasised the importance of generating ideas that will captivate readers. Idea generation can be harder than other parts of the process, as it can be tricky to come up with sound ideas. Tom therefore suggested various ways to find inspiration, for example reading different magazines, newspapers and letters, comparing and refreshing existing stories, and building a set of questions around your topic.
Step 2: Data collection
It is essential to decide what data the story requires and to consider the various ways of obtaining it. Tom recommended a granular approach, classifying data into separate columns so that each attribute of the gathered data is captured and logged. He then gave an overview of data sources: government sources such as the Office for National Statistics, healthcare data from the NHS and crime data from the Metropolitan Police; public information data; surveys; and so on. Most of these sources are transparent and free, and the data can be retrieved conveniently; for example, government data is available from data.gov.uk and NHS data from NHS Digital. If non-public data is needed, Freedom of Information (FOI) requests can be made to public bodies, setting out the data requested and the rationale for the request, with a typical response time of 20 working days.
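A minimal Python sketch of why granular data pays off, assuming made-up records: each row is a single observation with every attribute in its own column. Data stored at the finest level can be rolled up to any coarser level later (area to region, region to national), whereas figures that arrive already aggregated cannot be broken back down. The column names, areas and counts are hypothetical, not drawn from any real dataset.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only: one row per area per year,
# with every attribute in its own column (a "granular" layout).
RECORDS = [
    {"area": "Area A", "region": "North", "year": 2017, "count": 120, "source": "example"},
    {"area": "Area B", "region": "North", "year": 2017, "count": 80, "source": "example"},
    {"area": "Area C", "region": "South", "year": 2017, "count": 200, "source": "example"},
    {"area": "Area A", "region": "North", "year": 2018, "count": 130, "source": "example"},
    {"area": "Area B", "region": "North", "year": 2018, "count": 70, "source": "example"},
    {"area": "Area C", "region": "South", "year": 2018, "count": 210, "source": "example"},
]


def aggregate(records, *keys):
    """Sum the 'count' column over every combination of the given key columns."""
    totals = defaultdict(int)
    for row in records:
        group = tuple(row[k] for k in keys)
        totals[group] += row["count"]
    return dict(totals)


if __name__ == "__main__":
    # Granular rows can be rolled up to whichever level the story needs.
    print(aggregate(RECORDS, "region", "year"))
    print(aggregate(RECORDS, "year"))
```

Keeping a `source` column on every row also records where each figure came from, which helps when the data is checked later.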
Step 3: Analysing the data
Tom explained that this stage begins with preprocessing: transforming the collected data into something that is easier to manipulate. Not all collected data is relevant to the topic at hand, and raw data must be processed, or ‘cleaned’, before it can be presented. Basic cleaning includes finding and replacing values; where necessary, this can be followed by more advanced cleaning in Python or R. When analysing data, it usually helps to consider two types of output: the overall storylines of interest, and what could be relevant to lots of different people. For the former, it is important to interrogate the data: compare it against targets, check whether it has changed over time, combine it with other data such as demographics, and visualise it. The latter matters most when presenting data to different audiences, such as different age groups or people in different areas. Tom also emphasised that it is vital to make the analysis reproducible and to keep a log of the steps taken in processing the data.
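The ways of interrogating data listed above can be sketched in Python, assuming a tiny made-up dataset: checking performance against a target, measuring change over time, and combining counts with population figures to produce a rate. The areas, figures and the 95% target are hypothetical placeholders, not real statistics, and the function names are illustrative only.

```python
# All figures below are hypothetical and for illustration only.
PERFORMANCE = {  # share of cases meeting a standard, by (area, year)
    ("Area A", 2017): 0.93,
    ("Area A", 2018): 0.89,
    ("Area B", 2017): 0.96,
    ("Area B", 2018): 0.95,
}
TARGET = 0.95  # hypothetical target share
INCIDENTS = {"Area A": 5_000, "Area B": 1_500}
POPULATION = {"Area A": 250_000, "Area B": 100_000}


def below_target(performance, year, target=TARGET):
    """Compare against a target: areas that missed it in a given year."""
    return sorted(area for (area, y), share in performance.items()
                  if y == year and share < target)


def change_in_points(performance, area, start, end):
    """Change over time, in percentage points, between two years."""
    return round((performance[(area, end)] - performance[(area, start)]) * 100, 1)


def rate_per_1000(counts, population):
    """Combine with other data: counts per 1,000 residents in each area."""
    return {area: round(counts[area] * 1000 / population[area], 1) for area in counts}


if __name__ == "__main__":
    print(below_target(PERFORMANCE, 2018))
    print(change_in_points(PERFORMANCE, "Area A", 2017, 2018))
    print(rate_per_1000(INCIDENTS, POPULATION))
```

Converting raw counts into a rate per head is one simple way of combining a dataset with population figures, so that larger and smaller areas can be compared fairly.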
Step 4: Presenting your findings
Mapping the selected data to a visual representation, through good design, computer algorithms and a suitable choice of chart, is a complex process. Tom said there are many aspects to consider, such as which charts to use, how to add context and how to make the result clear and easy to understand. One should also consider interactivity and personalisation. A well-designed visualisation, such as the one in Figure 2, is therefore essential at this step.
Figure 2: Polio data visualisation
Tom also introduced a useful toolkit for each step of the data project workflow (Figure 3). In the first step, stay curious and find inspiration from the web, magazines, newspapers, conversations and so on. In the second step, you can use a web browser, Excel or Google Sheets to check and organise your data; R or Python are helpful but not essential. For the third step, Tableau is an excellent visualisation tool, and OpenRefine, Access, Python or R can also assist with the analysis. For the last step, Tom said that R is widely used at BBC News to analyse and present data to the public. Datawrapper for charting data, Tableau for visualising and exploring data, and HTML for coding the web pages are also very helpful. In short, there is a wide variety of tools that can help at every stage of the data workflow.
Figure 3: Toolkit
Q and A:
What are the main tasks of data journalists? How does the team work?
Data journalists cover all areas of news, both domestic and overseas, and every project, large or small, requires their participation. There are usually about two designers in the team, responsible for designing appropriate blueprints and deciding what the website tools look like, and there may also be two developers responsible for building the tools. For small projects, data journalists may carry out all the work themselves, but large projects may involve a team of four or five data journalists, designers and developers working together.
How do you evaluate the accuracy or correctness of the data collected?
There are different ways to check, for example using Excel and other tools. In general, at least two people work on every project: one who conducts the analysis and another who verifies it. To ensure the robustness of the collected data, two main sources of data are checked. One is official sources such as government records; data from such sources will have been repeatedly verified and so can be used directly. Ensuring that the data is accurate is crucial before anything is published by BBC News.
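The two-person check described above can be mirrored in code: the analyst and the checker compute the same figure independently, by different routes, and the figure is only accepted if the two agree. A minimal Python sketch, assuming made-up prices; the function names are hypothetical and stand in for whatever each person actually does.

```python
import math
import statistics

# Hypothetical raw values for illustration only.
PRICES = [180_000, 250_000, 320_000, 1_200_000, 95_000]


def analyst_average(values):
    """First person's calculation: a hand-written mean."""
    total = 0
    for v in values:
        total += v
    return total / len(values)


def checker_average(values):
    """Second person's calculation: an independent route via the standard library."""
    return statistics.mean(values)


def figures_agree(a, b, rel_tol=1e-9):
    """Accept a figure only if both independent calculations match."""
    return math.isclose(a, b, rel_tol=rel_tol)


if __name__ == "__main__":
    a = analyst_average(PRICES)
    b = checker_average(PRICES)
    print(a, b, figures_agree(a, b))
```

The value of the check comes from the two routes being genuinely independent; re-running the same script twice would not catch a mistake in it.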
How many people work on data in the department?
There are eight data journalists and 10 to 15 designers in the BBC’s department, spread across a number of projects. They do a lot of programming in different languages, such as R. They are responsible not only for BBC News in the UK but also for overseas output (BBC World Service). Generally, one person can expect to be working on eight projects at the same time.
Is there any training for data analysis?
There is plenty of training available in the workplace. For instance, colleagues run an hour-long training session every Friday so that they can learn from one another by sharing knowledge, ideas and new skills. The rapidly changing world makes it essential for journalist-coders to keep improving and keep their technical skills up to date.
What is the data journalist trying to illustrate by using an index of homes sold for more than £1m, compared with the average sale price?
The average house price is the main measure most people use to evaluate the value of a housing asset. This figure can be obtained from multiple online platforms, such as Zoopla and Rightmove, so it is readily available. However, the team wanted to present a different perspective that would also prove very useful. The number of houses sold for over £1m can potentially reflect the level of safety and the demographics of a housing estate, and these factors typically play a more significant role when deciding to buy a house.
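To see why a count of sales above £1m can tell a different story from the average, consider a minimal Python sketch with two made-up areas. The article does not say exactly how the BBC index was built, so the strict "more than £1m" cut-off and the share-of-sales measure used here are assumptions for illustration. A single very expensive sale can lift one area's mean price above its neighbour's, even though the neighbour has more homes selling for over £1m; the median is shown alongside for comparison.

```python
from statistics import mean, median

THRESHOLD = 1_000_000  # £1m cut-off (assumed strict "more than")

# Hypothetical sale prices, in pounds, for two made-up areas.
SALES = {
    "Area X": [240_000, 260_000, 275_000, 300_000, 3_500_000],
    "Area Y": [600_000, 650_000, 700_000, 1_050_000, 1_100_000],
}


def summarise(prices, threshold=THRESHOLD):
    """Return the mean, the median and the share of sales above the threshold."""
    over = sum(1 for p in prices if p > threshold)
    return {
        "mean": mean(prices),
        "median": median(prices),
        "share_over_threshold": over / len(prices),
    }


if __name__ == "__main__":
    for area, prices in SALES.items():
        print(area, summarise(prices))
```

In this toy data Area X has the higher mean because of one outlier sale, while Area Y has twice the share of £1m-plus sales, which is the kind of contrast a separate index can bring out.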
Do different projects require different website tools to be created every time?
The BBC News data team starts from a general blueprint, or template. When a project from a different field arises, such as crime, education or medical research, the original website tools are modified and adapted to suit the new data. The team runs the data monthly and releases updated versions of the website tools.
What does Tom generally do in the data group?
Tom mainly works on the first three steps: idea generation, data collection and analysis. Analysis is the most time-consuming, because data sources may not be reliable and the data must be double-checked after every operation.
Is a record kept of the changes made while cleaning the data?
Firstly, the original data is stored safely so that it can be referred back to if needed. Secondly, data journalists are required to log every operation and method, to ensure that the data has been handled carefully and correctly.
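The practice described here, keeping the original untouched and logging every operation, can be sketched in Python. This is a minimal illustration, assuming made-up raw rows and a hypothetical `clean` function: it works on a copy of the data, applies simple find-and-replace fixes, and returns a log of every step alongside the cleaned rows.

```python
import copy

# Hypothetical raw rows, as they might arrive from a messy source file.
RAW = [
    {"area": " Area A ", "value": "1,200"},
    {"area": "Area B", "value": "n/a"},
    {"area": "area c", "value": "950"},
]

MISSING_MARKERS = {"", "n/a", "na", "-"}  # assumed placeholders for missing data


def clean(raw_rows):
    """Return (cleaned_rows, log); raw_rows itself is never modified."""
    rows = copy.deepcopy(raw_rows)  # work on a copy so the original stays intact
    log = []

    # Step 1: trim stray whitespace and standardise capitalisation of names.
    for row in rows:
        row["area"] = row["area"].strip().title()
    log.append("Step 1: trimmed whitespace and title-cased 'area'")

    # Step 2: find and replace missing-value markers with None.
    replaced = 0
    for row in rows:
        if row["value"].strip().lower() in MISSING_MARKERS:
            row["value"] = None
            replaced += 1
    log.append(f"Step 2: replaced {replaced} missing-value marker(s) with None")

    # Step 3: remove thousands separators and convert the remaining values to int.
    for row in rows:
        if row["value"] is not None:
            row["value"] = int(row["value"].replace(",", ""))
    log.append("Step 3: removed thousands separators and converted 'value' to int")

    return rows, log


if __name__ == "__main__":
    cleaned, steps = clean(RAW)
    for entry in steps:
        print(entry)
    print(cleaned)
```

Saving the log next to the cleaned output means anyone can re-run or audit each step, which also supports the reproducibility Tom stressed in Step 3.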
Figures: Tom Calver