Setting the Scene
In 2017, San Francisco became the first city in the United States where 100 percent of residents live within a 10-minute walk of a park, according to an assessment by the Trust for Public Land.
The Trust for Public Land publishes an annual green space ranking that measures how well the 100 most populous American cities meet their residents’ need for green space. The ranking is presented as a ParkScore, with the methodology briefly explained on the organization’s website.
The Trust for Public Land acquires data from park inventories taken from park-owning agencies, with the data for all cities studied in a given year provided on its website. The main issue with ParkScore data is that the values are aggregated: instead of considering the number of parks per city, the ranking assesses the total green space land area. Because of this, comparing the number of green spaces in San Francisco (220 spaces managed by the Parks and Recreation Department) to the Trust for Public Land’s analysis is not straightforward. Luckily, the San Francisco P&R Department’s website provides various datasets on the parkland it manages in the city. Yet, after browsing the data, I began questioning whether every San Franciscan actually lives within a 10-minute walk of a park. Let’s explore why!
Exploratory Question & Motivation
Three main observations helped me define the exploratory question. Firstly, the easily accessible data on San Francisco Parks and Recreation’s website included two primary datasets with entries covering parks in the city, and their counts conflicted: the Park Lands dataset had 222 entries, and the Recreation and Parks Properties dataset had 240 entries. Secondly, after taking a closer look at the contents of these datasets, I found that the authors included playgrounds, golf courses, and recreational centers as part of the city’s park space. Lastly, the park inclusion criteria on the Trust for Public Land’s website explicitly state that they count school parks with joint-use agreements and privately owned parks managed for full public use. They do not include playgrounds without joint-use agreements or private golf courses in their assessment (Screenshot 1). Since the Trust for Public Land takes data from park-owning agencies, the P&R Department’s datasets would be one of its primary sources of information, but are they representative of the green space in the city?
Such discrepancies raise questions about data transparency. While the Trust for Public Land ranking might be correct, the public-facing data provided by San Francisco’s Parks and Recreation Department is misleading to those who might use it for research, scholarly or academic work, or community-led efforts in city management. In broader terms, not being transparent about the number of accessible green spaces in a given city might lead to the creation of green space deserts, similar to transit deserts, where there is a gap between green space availability and the needs of the local population. Misleading records on available green space can lead urban planners and local authorities to overlook communal needs. In this case, we should not include entries such as golf courses, playgrounds, and recreational centers in the green space data pool because these places are only accessible to specific groups and aren’t inclusive of the larger community. Since they are often privately owned, they also have different funding needs and capabilities.
Thus, my exploratory question is: How does our understanding of green space in today’s San Francisco change once we examine recent public-facing green space data provided by San Francisco P&R Department? I intend to clean two datasets compiled by San Francisco’s P&R Department with these objectives:
- To check whether the number of entries would decrease significantly (by about 22 for the Park Lands dataset and 24 for the Recreation and Parks Properties dataset, which would mean about a 10% decrease)
- To find the number of unique place names across both datasets, which speaks directly to the exploratory question.
A Drizzle of Context
It's fair to recognize that a 10-minute walk (approximately half a mile or 800 meters) might be a good indicator of parks' accessibility in San Francisco. Still, the World Health Organization recommends that all people reside within 300 meters of green space, a benchmark the city falls short of.
Yet, why is proper reporting of green space crucial, and more importantly, for whom? The WHO 2022 report on blue-green infrastructure in Europe notes that people of lower socioeconomic standing benefit more from green spaces than those from privileged groups. Green infrastructure is especially effective at improving mental health and reducing stress in underserved communities. Findings from London, Berlin, and Sheffield suggest green spaces are central to fostering social inclusion for asylum seekers, immigrants, and other disadvantaged groups. Parks are also linked to better mental and physical health in children and the elderly, creating space for social interactions and preventing social isolation among the elderly. While reports on green spaces' quality and quantity govern policy-making, funding, and development decisions for all socioeconomic groups, they will inevitably impact those at a greater disadvantage the most. This consideration falls under the umbrella of Spatial Justice, meaning that underreporting or improper reporting of a city's public spaces increases the burden-benefit imbalance between privileged and less privileged social groups, leading to further marginalization of the latter.
There is, however, a challenge that comes with reporting. Its name is the paradox of exposure, and in San Francisco’s context, it relates to the city’s unhoused population. At the start of Covid-19, parks across the US became encampment grounds for people experiencing homelessness due to the closures of community centers and charities. Covid and park space regulations did not mix well, leading police to dismantle such encampments. Nevertheless, there is a common understanding that unhoused people exercise their “right to the city” by co-managing and co-creating these spaces according to their needs while building a community around them. What needs immediate attention is how authorities and park management respond to people experiencing homelessness who use park space. In 2019, the US National Recreation and Park Association wrote: “The symptomatic impacts of homelessness, such as trash, camps and the ongoing presence of people experiencing homelessness, often upset housed park users and drove many of them to voice public complaints to park management, police departments and health departments”. Such statements, coming from the country’s leading non-profit for the advancement of public parks, turn a blind eye to marginalized communities’ needs, instead recognizing primarily white and privileged voices. We should note that reporting on the central role of green spaces in unhoused populations’ daily lives is crucial for increasing the accessibility and quality of these places to benefit underserved communities. At the same time, it will also point to the tensions between socioeconomic groups that prevent the advancement of parks for people experiencing homelessness.
Lastly, proper reporting is vital in addressing climate change and public health. High-density urban areas are at a higher risk of urban heat island effects than rural and less built-up areas. Urban heat islands often correlate with impervious surfaces and little tree coverage, frequently a legacy of redlining in certain neighborhoods, and such areas become green space deserts.
On Data
The two datasets hosted on the Parks and Recreation Department’s website are:
- The Park Lands dataset was created in 2016 and updated in 2019. There is no clear explanation of the contents of this dataset, but for this assignment, I assume that the entries are the parklands supervised by the Department;
- The Recreation and Parks Properties dataset was created in 2019 and updated in 2022. This dataset includes the lands owned and maintained by the Recreation and Parks Department.
Both datasets include place names, the longitude and latitude of each location, the area in acres and square feet, and the multipolygon coordinates for plotting. The data is available in multiple formats, but for data pre-processing purposes, I will work with the CSV format.
Data Pre-Processing
The data cleaning process is described in Diagram 1, created on FigJam:
I executed the process with the pandas library in a Google Colab notebook, which allows for easy sharing and the ability for multiple users to contribute code at the same time. I added extensive comments describing all the steps (Screenshot 2) and linked the datasets for easy reproduction.
My main challenge during this process was deciding how to output the final CSV files. While I had a dataframe with all unique place names sorted alphabetically, it did not have the rest of the columns from the original datasets. This would have caused an issue if I wanted to plot the locations of green spaces using the geopandas package for Python, since the coordinate data would be missing, along with the land areas and other potentially useful information. Thus, I decided to preserve the original structure of both datasets (which also differs between the two) by saving the pre-processed data as two separate CSV files. This means that while the entries no longer include playgrounds, golf courses, and recreational centers, places with the same names still exist in both datasets.
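The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the Colab notebook: the column name `name`, the keyword list, and the toy rows are all assumptions made for the example, and the real datasets use their own column headers.

```python
import pandas as pd

# Hypothetical keyword list; the notebook's exact exclusion terms are not shown here.
EXCLUDE_KEYWORDS = ["playground", "golf", "recreation center", "tennis"]


def drop_non_green_spaces(df, name_col="name"):
    """Return the rows whose place name contains none of the exclusion keywords.

    All other columns (coordinates, acres, ...) are kept, so the result
    preserves the structure of the input dataset.
    """
    pattern = "|".join(EXCLUDE_KEYWORDS)
    is_excluded = df[name_col].str.contains(pattern, case=False, na=False)
    return df[~is_excluded].copy()


# Toy rows invented for illustration; they are not taken from either dataset.
toy = pd.DataFrame(
    {
        "name": [
            "Example Park",
            "Example Playground",
            "Sample Golf Course",
            "Sample Open Space",
        ],
        "acres": [1.5, 0.3, 40.0, 2.2],
    }
)

cleaned = drop_non_green_spaces(toy)
print(cleaned["name"].tolist())  # ['Example Park', 'Sample Open Space']
```

Filtering rows in place, rather than building a separate list of names, is what keeps the coordinate and area columns available for later plotting.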
Results Discussion
Once I removed the entries that are not green spaces or parks, such as playgrounds, recreation centers, golf courses, and tennis courts, the number of entries went from 222 to 152 for the Park Lands dataset and from 240 to 204 for the Recreation and Parks Properties dataset, as shown in Table 1. These are decreases of about 32% and 15% respectively, both above the 10% threshold. The total number of unique places by name I obtained is only 185 (as I mentioned above, I did not create a separate dataset for the unique entries across both datasets to preserve the original structure). One potential limitation of finding the unique place names is that I matched the entries by exact spelling, thus not accounting for possible spelling mistakes. When cleaning the data, I did not exclude open spaces since I worked under the assumption that they are more likely to share qualities with green spaces than the entries I excluded.
While this approach is relatively simplistic and would require site visits to confirm whether the listed places actually have the qualities of green spaces, it still shows that the data provided by San Francisco’s Parks and Recreation Department includes sites other than parks. While I cannot confidently claim that this is overreporting of green spaces, it sure smells like it.
When analyzing green space deserts in cities, data like this might significantly impact the results if taken as a “single source of truth.” An urban planner or a community member might not realize that the entries they are mapping are not solely parks or green spaces unless they examine the data closely. This might lead to populating diagrams and reports with incorrect data, potentially preventing planners from effectively identifying green space deserts. That, in turn, can contribute to heat island effects that disproportionately impact the public health of underserved communities. Funding and development decisions might also rely on reports with misleading information, diverting funds from communities that need them more. While more research is required to establish a causal relationship between overreporting and how well underserved communities are served, there are grounds to suspect a link.
This exploration highlights the significance of being critical of the data a researcher is using. Examining the P&R Department’s park data suggests that public-facing data might be more confusing than helpful in our understanding of green space in San Francisco. While green space data seems easy to collect and digitize for general use, cross-examining multiple sources would be beneficial for studying this assignment’s exploratory question. With the data as it stands, establishing whether all San Franciscans live within a 10-minute walk of a park becomes more challenging.
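For readers who want to check the arithmetic, here is a small sketch using only the entry counts reported above. The `unique_names` helper and its toy inputs are hypothetical: it contrasts exact-spelling matching, which is what I used, with a slightly more forgiving variant that ignores case and extra whitespace. It does not catch genuine spelling mistakes.

```python
def percent_decrease(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Entry counts before and after cleaning, as reported in the text.
counts = {
    "Park Lands": (222, 152),
    "Recreation and Parks Properties": (240, 204),
}
for dataset, (before, after) in counts.items():
    drop = percent_decrease(before, after)
    print(f"{dataset}: {before} -> {after} ({drop:.1f}% decrease)")


def unique_names(*name_lists, normalize=False):
    """Collect the unique place names across several lists.

    With normalize=False, names are matched by exact spelling.
    With normalize=True, case and surrounding/repeated whitespace are ignored.
    """
    seen = set()
    for names in name_lists:
        for name in names:
            key = " ".join(name.split()).casefold() if normalize else name
            seen.add(key)
    return sorted(seen)


# Toy names invented for illustration only.
list_a = ["Example Park", "Sample Open Space"]
list_b = ["example park ", "Another Park"]
print(len(unique_names(list_a, list_b)))                  # 4
print(len(unique_names(list_a, list_b, normalize=True)))  # 3
```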
Next Steps
While working with these datasets, I have brainstormed ways in which a researcher could extend this project to provide more robust quantitative results. Firstly, the researcher could combine the two CSV files I pre-processed into one (which, given my current results, should have 185 entries). For this, the researcher should preserve the columns from each dataset that would allow plotting the locations. Secondly, the researcher can remove entries I might have missed (e.g., because of spelling mistakes) and decide whether to include open spaces. Lastly, I suggest using the geopandas package to plot a polygon layer consisting of the unique entries representing green spaces (this is possible since the entries include multipolygon coordinates). It would allow the researcher to plot a half-mile distance (approximately a 10-minute walk) around the given green spaces to identify whether there are green space deserts in San Francisco. Bringing in further sources of parkland data and comparing them to find overlaps and gaps would strengthen the approach even more.
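As a rough first pass at the half-mile check, one could start from the longitude and latitude columns before moving on to polygons. The sketch below is not the geopandas buffering suggested above: it swaps in a simpler straight-line (haversine) distance from a point to each park's single coordinate pair, so it ignores park shapes and real walking routes. The coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
HALF_MILE_M = 804.672       # half a mile in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_walk(point, parks, max_m=HALF_MILE_M):
    """True if any park coordinate lies within `max_m` meters of `point`."""
    lat, lon = point
    return any(haversine_m(lat, lon, p_lat, p_lon) <= max_m for p_lat, p_lon in parks)


# Hypothetical park coordinate and test points, not taken from the datasets.
parks = [(37.770, -122.420)]
near_point = (37.775, -122.420)  # roughly 556 m north of the park
far_point = (37.780, -122.420)   # roughly 1.1 km north of the park

print(within_walk(near_point, parks))  # True
print(within_walk(far_point, parks))   # False
```

Points that fail this check for every green space would be candidates for a closer look with the full polygon data.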
Data Sharing
- Park Lands Cleaned Dataset
- Recreation and Parks Properties Cleaned Dataset
- Google Colab Notebook
- Original Park Lands Dataset
- Original Recreation and Parks Properties Dataset
Credits:
Photo by Jeffrey Eisen from Unsplash; edited in Adobe Fresco