This is a case study about using data science and data visualization to tell a deeper story about a municipal population. It’s included here to illustrate the challenges of bringing enormous data sets to a single screen.
Data from many municipal departments is isolated, making it difficult to pinpoint areas needing more or different resources. How can the city uncover hidden needs within its communities by examining the data sets in new ways?
We framed our approach to this project around equity and equality within the city’s diverse communities.
Why equity and equality?
Many people conflate these two words, but they describe very different concepts. For example, two areas of the city may receive the same amount of funding for parks and recreation. However, one area must split its funds among many smaller, older parks, while the other has fewer, newer parks. The first park district spends all its funding on basic maintenance and cannot afford improvements. The second district has lower maintenance costs and can spend its budget on improvements. Both districts receive equal funding, but not equitable funding.
This illustration shows the essential difference between equality and equity.
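The parks example reduces to simple arithmetic. The sketch below uses made-up budgets, park counts, and upkeep costs (none of these figures come from the project, and `ParkDistrict` is a hypothetical helper) to show how identical budgets can leave very different amounts for improvements.

```python
from dataclasses import dataclass


@dataclass
class ParkDistrict:
    """A park district with a hypothetical budget and upkeep cost per park."""
    name: str
    budget: int            # annual parks funding (made-up figure)
    park_count: int
    upkeep_per_park: int   # annual basic maintenance per park (made-up figure)

    def upkeep_total(self) -> int:
        return self.park_count * self.upkeep_per_park

    def improvement_funds(self) -> int:
        # Money left after basic maintenance; never reported as negative.
        return max(self.budget - self.upkeep_total(), 0)


# Equal budgets, different park stock (illustrative numbers only).
older = ParkDistrict("District A", budget=1_000_000, park_count=10, upkeep_per_park=100_000)
newer = ParkDistrict("District B", budget=1_000_000, park_count=4, upkeep_per_park=50_000)

for d in (older, newer):
    print(f"{d.name}: budget {d.budget:,}, left for improvements {d.improvement_funds():,}")
```

With these assumed numbers, District A has nothing left for improvements while District B keeps 800,000: equal funding, unequal outcomes.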
To better understand the problem, our team studied how other municipalities, at home and abroad, evaluate their communities. While every city generates quantitative data, the process of understanding its populations often stops there. Statistics and numbers don’t tell the whole story; deeper insights come when the data are integrated and visualized. For example, the City of Santa Rosa, CA, developed an impressive city “report card” that reveals multi-layered information about the different zones in the city.
The City of Santa Rosa, CA, created a scorecard showing different aspects of its diverse city zones.
The problem our team was trying to solve was slightly different from Santa Rosa’s. Instead of a static report card, we needed a tool that would show how different variables (time, boundaries, money, and crime) could interact dynamically.
Before we could start looking at how to integrate data and layers, we had to answer some difficult questions.
We also had to address a common problem in data science: data sets that are out of sync, incomplete, or missing entirely, or that can only be compared “apples to oranges.”
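Before combining sources, it helps to see exactly where each one has holes. The sketch below is a hypothetical audit, not the team’s actual process: it assumes each record can be reduced to a (data set, census tract, year) triple and lists, per data set, the tract-year cells it does not cover. The data set names and records are invented for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical observations: (data set, census tract, year).
RECORDS = [
    ("census", "T1", 2015), ("census", "T2", 2015),
    ("crime", "T1", 2015), ("crime", "T1", 2016), ("crime", "T2", 2016),
    ("youth_programs", "T1", 2016),
]


def coverage_gaps(records, tracts, years):
    """For each data set, list the (tract, year) cells it has no data for."""
    covered = defaultdict(set)
    for dataset, tract, year in records:
        covered[dataset].add((tract, year))
    expected = set(product(tracts, years))
    return {dataset: sorted(expected - cells) for dataset, cells in covered.items()}


gaps = coverage_gaps(RECORDS, tracts=["T1", "T2"], years=[2015, 2016])
for dataset, missing in gaps.items():
    print(f"{dataset}: missing {missing}")
```

A data set with no records at all never appears in the output, so a real audit would also start from an explicit list of expected sources.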
Here’s an example of misleading data. The first visualization leads the viewer to think one team is significantly better than the other. The second places the same data in a more appropriate context.
Source: National Geographic
From the beginning of the project, we knew that we would be working with some kind of map interface. The research we did with other cities suggested to our data science team that we could roll up multiple data sets together to create a composite score for a geographical area.
We called this the Community Wellbeing Score (CWS). The score was a high-level indicator composed of community vital statistics, including health indicators, education rates, employment rates, and similar data.
The score was not a number on a fixed scale such as 1 to 100. There was no absolute top or bottom score because the score was relative: it reflected how one area of the city ranked against all other parts of the city.
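One way to build a relative composite score is to rank every area on each indicator, then average the ranks. The sketch below assumes percentile ranks and equal weights; the article does not say how the team actually weighted or combined indicators, and the indicator names and values are hypothetical.

```python
def percentile_ranks(values, higher_is_better=True):
    """Rank each area against every other area: 0.0 = worst, 1.0 = best.

    Ties split the difference (mid-rank), so identical values share a rank.
    """
    n = len(values)
    if n < 2:
        return {area: 0.5 for area in values}
    ranks = {}
    for area, v in values.items():
        if higher_is_better:
            worse = sum(1 for other in values.values() if other < v)
        else:
            worse = sum(1 for other in values.values() if other > v)
        ties = sum(1 for other in values.values() if other == v) - 1  # exclude self
        ranks[area] = (worse + 0.5 * ties) / (n - 1)
    return ranks


def composite_score(indicators):
    """Equal-weight average of per-indicator percentile ranks for each area."""
    per_indicator = [percentile_ranks(vals, hib) for vals, hib in indicators.values()]
    areas = per_indicator[0].keys()
    return {a: sum(r[a] for r in per_indicator) / len(per_indicator) for a in areas}


# Hypothetical tract-level indicators: (values by tract, higher_is_better).
INDICATORS = {
    "graduation_rate": ({"T1": 0.92, "T2": 0.78, "T3": 0.85}, True),
    "employment_rate": ({"T1": 0.95, "T2": 0.88, "T3": 0.90}, True),
    "uninsured_rate": ({"T1": 0.05, "T2": 0.15, "T3": 0.10}, False),
}

scores = composite_score(INDICATORS)
print(scores)
```

Because each tract is scored only against the others, adding or redrawing a tract can move every other tract’s score. That is the trade-off of a relative score: it answers “how does this area compare?” rather than “how well is this area doing?”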
As a result, we needed a way to represent the Community Wellbeing Score on the map. We chose a heat map because it shows how a value varies across a geographic area.
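To color a map, each area’s score has to become a fill color. The sketch below assumes scores already lie in [0, 1] (as relative ranks do) and splits that range into equal-width classes. The five-step palette, the `color_for` helper, and the tract scores are hypothetical choices, not the project’s design.

```python
# Hypothetical five-class sequential ramp, lightest to darkest.
PALETTE = ["#fef0d9", "#fdcc8a", "#fc8d59", "#e34a33", "#b30000"]


def color_for(score, palette=PALETTE):
    """Map a relative score in [0, 1] to one of len(palette) equal-width classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Scale into class space; clamp so a score of exactly 1.0 lands in the top class.
    index = min(int(score * len(palette)), len(palette) - 1)
    return palette[index]


# Example: relative scores for three hypothetical tracts.
tract_scores = {"T1": 1.0, "T2": 0.0, "T3": 0.5}
fills = {tract: color_for(s) for tract, s in tract_scores.items()}
print(fills)
```

Since percentile ranks are spread roughly evenly between 0 and 1, equal-width classes on ranks behave much like quantile classes on the raw data.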
These whiteboards show early efforts to work through user interactions with the product.
Once we had settled on the Community Wellbeing Score heat map as the base layer, we began experimenting with how to layer other types of information on top of it.
The image above is among the first wireframes we created. The Community Wellbeing Score at left can be filtered by its component scores. The smaller circles on top of the heat map indicate crime reports. We later changed our approach to criminal activity, treating it as an area-level measure like the CWS rather than plotting individual incidents.
The “Resources Access Score” at lower right was also renamed “Access to City Services,” because it would not have been possible to incorporate and filter every potential resource in a given area.
Several iterations later, we arrived here.
The first wireframe in this set shows an overview of the city with census tract boundaries. Although the dropdown includes other boundary options, we chose census tracts for several reasons. First, tracts change little over time and their data is free to use, which makes them a reliable resource for a proof of concept.
Second, census data provided much of the information for the CWS. In this wireframe, only the CWS is turned on. Other options, such as crime or funding, could be toggled on or off, which would change the appearance of the heat map.
The second wireframe in this set zooms in on a particular tract. Again, only the CWS is toggled on. At the bottom of the screen, users can look at trends over time. The “blue sky ask” on this project was a timeline slider that would let users view changes over time, but there was not enough synchronous data to deliver it.
In the third wireframe, crime data is toggled on. The heat map responds by shifting opacity and hue wherever new hot spots (darker in color) or cool spots appear.
The resource shown in the center of the screen (the community center) gives another layer of insight about why hotter or cooler spots may exist within a boundary. Resources like schools and clinics can have a large impact on the CWS because they affect things like education rates or childhood health and wellness.
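Toggling a layer such as crime effectively recomputes the value each tract is colored by. The sketch below is one hypothetical way to do that: every layer is assumed to be pre-scaled to [0, 1] with higher meaning better conditions, and enabled layers are averaged with equal weight. The layer names, values, and weighting are assumptions, and the sketch models only the value change, not the opacity and hue treatment in the wireframes.

```python
# Hypothetical per-tract layer values, each already scaled to [0, 1].
# Assumption: for every layer, higher means better conditions
# (so the crime layer is higher where fewer incidents were reported).
LAYERS = {
    "cws": {"T1": 0.9, "T2": 0.2, "T3": 0.6},
    "crime": {"T1": 0.7, "T2": 0.4, "T3": 0.1},
    "funding": {"T1": 0.5, "T2": 0.5, "T3": 0.8},
}


def blended_view(layers, enabled):
    """Equal-weight average of the enabled layers for each tract."""
    active = [layers[name] for name in enabled]
    if not active:
        raise ValueError("enable at least one layer")
    tracts = active[0].keys()
    return {t: sum(layer[t] for layer in active) / len(active) for t in tracts}


base = blended_view(LAYERS, ["cws"])                 # only the CWS toggled on
with_crime = blended_view(LAYERS, ["cws", "crime"])  # crime toggled on as well
for tract in base:
    print(tract, round(base[tract], 2), "->", round(with_crime[tract], 2))
```

In this toy data, T3 moves the most when crime is switched on, which is the kind of shift that would surface a new hot or cool spot on the map.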
It’s not possible to show the final screens here, but they were very similar to the frames shown above. At this point, all the key elements that give insight into equity are in place. These building blocks (resources, funding, crime, and the CWS) let users visualize how a number of complex factors interact. Users can view city data at multiple granularities, from the whole city down to a single neighborhood.
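Moving between granularities means rolling tract-level data up into larger boundaries. The sketch below assumes a population-weighted average of a raw rate; the boundary names, populations, and rates are hypothetical, and the article does not describe the team’s aggregation method.

```python
from collections import defaultdict

# Hypothetical tract data: tract -> (larger boundary, population, rate).
TRACTS = {
    "T1": ("North", 4000, 0.92),
    "T2": ("North", 1000, 0.78),
    "T3": ("South", 3000, 0.85),
}


def roll_up(tracts):
    """Population-weighted average of a tract-level rate for each larger boundary."""
    weighted_sum = defaultdict(float)
    population = defaultdict(int)
    for boundary, pop, rate in tracts.values():
        weighted_sum[boundary] += pop * rate
        population[boundary] += pop
    return {b: weighted_sum[b] / population[b] for b in weighted_sum}


by_boundary = roll_up(TRACTS)
print(by_boundary)
```

This aggregates the raw rates rather than the relative scores: ranks do not average meaningfully across areas of different size, so a relative score at the coarser level would be recomputed from the rolled-up values.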
Not all databases are equal. Census data are released at intervals of years, whereas some city data sets are measured in months (for example, youth summer programs). This project’s big challenge was maintaining intellectual integrity in the face of non-analogous data.
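A minimal sketch of the alignment problem, assuming one yearly census-derived series and one monthly program count (both invented): monthly counts are summed into calendar years, then only the years present in every series are kept. Summing is an assumption that suits counts; rates or averages would need a different rule.

```python
from collections import defaultdict

# Hypothetical series: a census-derived value per year, and a city program counted per month.
CENSUS_BY_YEAR = {2015: 0.81, 2016: 0.83}
PROGRAM_BY_MONTH = {
    (2016, 6): 120, (2016, 7): 140, (2016, 8): 95,
    (2017, 6): 130, (2017, 7): 150,
}


def monthly_to_annual(monthly):
    """Sum monthly counts into calendar-year totals (assumes counts, not rates)."""
    annual = defaultdict(int)
    for (year, _month), count in monthly.items():
        annual[year] += count
    return dict(annual)


def shared_years(*series):
    """Years present in every series: the only points a common timeline can show."""
    common = set(series[0])
    for s in series[1:]:
        common &= set(s)
    return sorted(common)


program_by_year = monthly_to_annual(PROGRAM_BY_MONTH)
timeline = shared_years(CENSUS_BY_YEAR, program_by_year)
print(program_by_year, timeline)
```

Even in this toy case, only one year survives alignment, which illustrates why a timeline slider was out of reach without more synchronous data.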
The story of a population cannot be told by a single number. It’s a multifaceted, evolving data set, and 2D media, even screens with dynamic interactions, can barely do it justice.