Building blocks of an inclusive data infrastructure
To reap the benefits of the smart city hype, cities need to get the basics right
For cities across the world, becoming a “smart city” seems inevitable as a myriad of digital innovations, from the Internet of Things (IoT) and artificial intelligence to big data, have become mainstream. In this piece, we share a framework for digital transformation that cities and agencies can use to leverage opportunities for improved planning, policymaking, and service delivery.
While the universe of new and emerging applications of big data in cities is exciting and wide-ranging, it is critical that there is a greater emphasis on the key building blocks of an effective and inclusive city-wide data ecosystem. In this Inclusive Cities Catalyst piece, we lay out key building blocks, and outline practical steps and resources that can help cities make progress towards putting these building blocks in place.
1. Data standards
According to the US Geological Survey, a scientific agency of the United States government, “data standards are the guidelines by which data are described and recorded.” 1
Establishing a common format and definition of data through data standards is critical for cities and policymakers to be able to understand, share, or combine a given dataset with other data sources. Data standards also reduce the time needed to translate or clean data, “a common barrier encountered by data scientists, taking 26% of data scientists’ on-the-job time.”2 For instance, diverging formats for date variables (e.g., April 2, 2024, 04-02-24, 04/02/2024) require data scientists to manually interpret and convert dates into a common format.
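The date example above can be made concrete with a minimal sketch. It assumes ISO 8601 (YYYY-MM-DD) as the target format and assumes that ambiguous inputs such as 04-02-24 follow the US month-day-year convention; the function name and the list of accepted formats are illustrative, not part of any particular standard.

```python
from datetime import datetime

# Formats this sketch accepts, tried in order.
# Assumption: "04-02-24" and "04/02/2024" are month-day-year (US convention).
KNOWN_FORMATS = ["%B %d, %Y", "%m-%d-%y", "%m/%d/%Y", "%Y-%m-%d"]


def to_iso_date(raw: str) -> str:
    """Convert a date string in one of the known formats to YYYY-MM-DD."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    # Fail loudly rather than guess: unknown formats need a human decision.
    raise ValueError(f"Unrecognized date format: {raw!r}")


for sample in ["April 2, 2024", "04-02-24", "04/02/2024"]:
    print(sample, "->", to_iso_date(sample))
```

Agreeing on a single format at the point of collection removes the need for this kind of guesswork altogether, which is the main argument for a parameter-level standard.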
The UK Office for National Statistics (ONS) documents the landscape of data standards, which shows that there is a wide range of standards relating to different aspects of data3:
Classification Standards
Data File Format Standards
Data Format Standards
Data Management Standards
Data Organization Standards
Data Provider Standards
Data Sharing Standards
Geospatial Data Standards
Governance Standards
Metadata Standards
Statistical Unit Definition Standards
While all of these data standards are important, cities and urban policymakers should pay particular attention to these foundational data standards:
Classification standards: Depending on the sensitivity of the data an organization holds, there need to be different levels of classification, which determine a number of things, including who has access to that data and how long the data needs to be retained. Typically, there are four classifications for data: public, internal-only, confidential, and restricted.
Dataset-level standards: Dataset-level standards specify the scientific domain, structure, relationships, field labels, and parameter-level standards for the dataset as a whole. Parameter-level standards define the format and units for a given parameter (for example, date/time or location) or field within a dataset and help users correctly interpret the values. Parameter-level standards should be adopted at the time of data collection, that is, when values in a field are created or recorded.
Data encoding and interface standards: Data encoding standards define the rules for structuring and organizing data for use in a given context. These standards ensure that when applications read data, the information and context are preserved. Data encoding standards are generally associated with a file format (there are different formats through which information is encoded for storage in a computer file, for example, PNG vs. JPG for images). The use of universal, open-source formats allows for collaboration and accessibility by different types of organizations and stakeholders. Over the past decade, cities across the world have invested in the development of open data portals to facilitate transparency and civic engagement. The presence of APIs opens vital data assets up to public and private developers who can build relevant software applications that re-use the data, for example, to give real-time updates on public transport4. Cities like Jakarta, Portland, or Buenos Aires highlight how open data approaches through common data standards can be a catalyst for collaboration and partnerships between government, academia, civil society, and the private sector5. Below are some examples of open data encoding standards used in urban planning:
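To make the idea of an open encoding standard concrete, here is a minimal sketch using GeoJSON (RFC 7946), chosen here purely as an illustration. The helper function, the stop name, and the coordinates are hypothetical.

```python
import json


def bus_stop_feature(name: str, lon: float, lat: float) -> dict:
    """Build a GeoJSON Feature for a point location.

    GeoJSON positions are written as [longitude, latitude], in that order.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }


# A FeatureCollection is the usual container for a set of features.
collection = {
    "type": "FeatureCollection",
    "features": [bus_stop_feature("Central Station (hypothetical)", 13.4, 52.52)],
}

# Any tool that understands the standard can read the encoded text back
# without losing structure or context.
encoded = json.dumps(collection)
decoded = json.loads(encoded)
print(decoded["features"][0]["properties"]["name"])
```

Because the structure is fixed by the standard rather than by one vendor's software, a dataset published this way can be opened by mapping tools, web applications, and analysis scripts alike.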
2. Data system infrastructure and architecture
Data architecture is concerned with the methods by which data is collected, stored, and used. Collection can be as simple as an online survey or poll or, in the case of “smart” cities, can be built on decentralized networks of sensors and control devices. In turn, both wired and wireless networks connect the sensors with their control systems. Control systems gather, store, and process data, and then transmit the output to the point where it can be actioned or consumed. Edge computing infrastructure handles time-sensitive applications and data aggregation, while private and public cloud infrastructure provides general-purpose utility computing, big data analysis, and long-term information storage6.
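The division of labor between edge and cloud can be sketched as follows. This is a simplified illustration under our own assumptions: the `Reading` type, the sensor identifiers, and the choice of summary statistics are hypothetical, and a real deployment would also handle timestamps, batching windows, and network failures.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    sensor_id: str
    value: float  # e.g. a traffic count or an air quality measurement


def aggregate_at_edge(readings: list[Reading]) -> dict[str, dict[str, float]]:
    """Summarize raw readings per sensor before sending them to the cloud.

    Forwarding one summary per sensor instead of every raw reading reduces
    the volume of data that has to travel over the network.
    """
    by_sensor: dict[str, list[float]] = {}
    for reading in readings:
        by_sensor.setdefault(reading.sensor_id, []).append(reading.value)
    return {
        sensor_id: {"count": len(values), "mean": mean(values), "max": max(values)}
        for sensor_id, values in by_sensor.items()
    }


raw = [Reading("junction-a", 10.0), Reading("junction-a", 20.0), Reading("junction-b", 5.0)]
print(aggregate_at_edge(raw))
```

The summaries would then be stored and analyzed in the cloud, while the edge device remains free to react quickly to local conditions.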
Cities face a limited set of choices when it comes to edge computing infrastructure, cloud infrastructure, data collection applications, and data visualization products. These choices interact with other key themes such as data standards, sensors, and other hardware options, as well as wider data governance and strategy questions.
Past research has highlighted that social and technological legacies create path dependencies for smart city development7. Yet when it comes to the current set of choices cities face, there is little research about the long-run implications of different hardware and software choices, and few resources are available to support urban policymakers in making the ones that will maximize value for their communities.
3. Data governance
Trust in data governance principles and structures is vital for cities aiming to ensure that a broad range of stakeholders and citizens actively participate in a city’s data strategy and wider smart city ambitions.
For New America, Natalie Chyi and Yuliya Panfil adapt principles from economist Elinor Ostrom’s work on the commons, which she defined as resources, like water, that are shared and managed by a group. Her work challenges the well-known “Tragedy of the Commons” phenomenon and argues that individuals and communities can effectively manage their own collective resources.
Chyi and Panfil identify four key principles from Ostrom’s work and apply them to the challenge of urban data governance:
Promote responsibility for data governance among multiple layers of nested enterprises.
Create processes for the affected community to participate in making and modifying the rules around data.
Develop an effective monitoring system to be carried out by the community.
Provide accessible means for dispute resolution, use graduated sanctions against rule-breakers, and make enforcement measures clear8.
Smart city initiatives have often been criticized for their failure to leverage inputs from citizens, civil society, and community organizations. By drawing on Ostrom’s work, city leaders can ensure that governance frameworks continue to be tailored to local needs and conditions, that cities have the flexibility to experiment with different rules and procedures, and that outcomes do not exclude any particular group of stakeholders. Ultimately, these data governance principles not only make urban data governance more democratic but will, in the longer term, also contribute to more effective, and smarter, smart cities.
4. Data privacy
Every urban data strategy should aim to adhere to key data privacy principles around the collection, use, sharing, retention, and disposal of data. The EU General Data Protection Regulation (GDPR) sets out seven key principles for the lawful processing of personal data:
Lawfulness, fairness, and transparency: Personal data collection has to be lawful, and it should be clearly stated that data is being collected as well as the reason why it is collected.
Purpose limitation: Organizations should only collect personal data for a specific purpose, clearly state what that purpose is, and only collect data for as long as necessary to complete that purpose. Processing that takes place in the public interest or for scientific, historical, or statistical purposes is given more freedom.
Data minimization: Organizations should only process the personal data needed to achieve their stated processing purposes.
Accuracy: Organizations should take every reasonable step to erase or rectify data that is inaccurate or incomplete, while individuals should have the right to request that organizations do so within a set time frame.
Storage limitation: Organizations have to delete personal data that is no longer required.
Integrity and confidentiality: The GDPR requires that personal data be processed in a manner that ensures appropriate security of the personal data, including protection against unauthorized or unlawful processing and against accidental loss, destruction, or damage, using appropriate technical or organizational measures.
The seventh principle, accountability, stipulates that all relevant organizations processing personal data have to be able to demonstrate that they are meeting the above compliance requirements and how they are doing so. User provisioning can support organizations in meeting these principles: within an organization, various users will need to access the data for different reasons, and their user permissions should reflect the appropriate access needed and allow for identity management. As much as possible, individual log-in credentials and accounts should be created to ensure that accounts are given the appropriate permissions and are modified, disabled, and deleted as necessary.
Consistent enforcement of these principles across organizations in government, private sector, and civil society is vital for trust and buy-in.
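One way to picture user provisioning is as a mapping from roles to the most sensitive data classification each role may read. The sketch below reuses the four classification levels named under data standards (public, internal-only, confidential, restricted); the role names and their clearances are hypothetical and would be set by each organization.

```python
from enum import IntEnum
from typing import Dict


class Classification(IntEnum):
    """The four typical classification levels, ordered by sensitivity."""
    PUBLIC = 0
    INTERNAL_ONLY = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical roles and the highest classification each one may read.
ROLE_CLEARANCE: Dict[str, Classification] = {
    "resident": Classification.PUBLIC,
    "analyst": Classification.CONFIDENTIAL,
    "data_protection_officer": Classification.RESTRICTED,
}


def can_read(role: str, dataset_level: Classification) -> bool:
    """Return True if the role is cleared for the dataset's classification."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return False  # unknown roles get no access by default
    return dataset_level <= clearance


print(can_read("analyst", Classification.CONFIDENTIAL))
print(can_read("resident", Classification.INTERNAL_ONLY))
```

Denying access to unknown roles by default means that a forgotten or misconfigured account fails safely rather than exposing data.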
5. Capacity building
Developing smart city use cases requires engaging a range of specialized data science skillsets. The talent pools of the public and private sectors vary greatly, and while there are some areas of overlap, it’s vital that cities invest in developing their staff internally, develop growth pathways within data science, systems engineering, and other complementary skillsets, and design programs for engaging with private sector expertise.
Sabbaticals, fellowship programs, and rotational programs can help develop the desired technical skillsets internally and provide opportunities for cross-pollination with the private sector and research institutions. The U.S. Digital Corps is an example of an early-career program designed to develop a talent pool at the “intersection of technology and public service”. Additionally, data.org’s recent report Workforce Wanted: Data Talent for Social Impact identifies an opportunity to shape and support a pool of 3.5 million data professionals focused on social impact in low- and middle-income countries (LMICs) over the next ten years. It provides resources and pathways for public policy makers aiming to build data science capabilities in the public sector.
6. Designing data products and services
For data to lead to tangible impact or policy outcomes, it must be transformed into a product or service. This requires multiple stages of processing, analysis, interpretation, design, and visualization. Data products can range from visualizations to apps to decision support systems or reports. Aligning with the end-user of the output from the outset and co-creating the output with them is critical to ensure that the product of the analysis will actually be useful and correspond to end-user needs.
In other instances, the value of quantitative analysis is only fully realized when combined with qualitative insights that are not always openly accessible to data scientists. It is therefore critical that teams include a diverse range of subject matter experts, civil society representatives, and policymakers to ensure that data products and services take advantage of existing insights and are tailored to the local context.
Takeaways
Here are some practical steps cities can take toward realizing these building blocks:
Start where you are: Cities and local agencies are often stretched thin and underfunded, but they can still build on the capabilities they already have. A great example is Freetown’s new property tax system, which used satellite imagery and in-person data collection to reform the city’s property tax valuation, collection, and enforcement. The property tax system enabled Freetown to quintuple its property tax revenue, mobilize funds towards vital public services, gain public support through transparent valuation, and equitably enforce collections. The program is a great example of a city using its domestic capabilities coupled with appropriate external data sources to meet key administrative objectives.
Simple changes can also have a significant impact, such as:
Using data validation features in spreadsheets (available in Excel and Sheets), which can vastly improve the quality of existing data and make strides toward adhering to data standards
Establishing file naming structures
Establishing data sharing norms (via secure email, Dropbox, or other means)
Removing sensitive information from surveys and datasets where it is not needed (e.g., removing the Full Name requirement from public transit surveys)
Evaluating product subscriptions to remove redundancies and recover funding for other data initiatives
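As an illustration of one of these simple changes, removing sensitive information that is not needed, the sketch below drops such fields from survey records before they are stored or shared. The field names and the sample record are hypothetical.

```python
# Hypothetical field names that this survey does not need for its analysis.
SENSITIVE_FIELDS = {"full_name", "phone_number"}


def minimize(record: dict) -> dict:
    """Return a copy of the record without the sensitive fields."""
    return {key: value for key, value in record.items() if key not in SENSITIVE_FIELDS}


survey_response = {
    "full_name": "Jane Example",  # made-up respondent
    "route": "Line 4",
    "satisfaction": 4,
}
print(minimize(survey_response))
```

Better still is not asking for the field in the first place, so that the data never has to be removed.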
Appoint a chief data officer with the authority to work across different departments: CDOs play an important role in making sure that data becomes a strategic asset for their cities in supporting evidence-based decisions that result in better service delivery to residents. CDOs can work across departments and drive interventions aimed at aligning data standards and data products. Barcelona’s approach to data governance has been widely discussed and researched. In order to ensure transparency and accountability of data governance processes, Barcelona City Council created a Municipal Data Office (MDO) which maintains compliance with GDPR, is led by the Chief Digital Officer, and is supervised by a Data Protection Officer. The Data Office’s role is to govern and analyze data and coordinate its management in different areas and districts in Barcelona. One of the main goals of the MDO is to unlock data of social or public value through the ecosystem by negotiating and arriving at data-sharing agreements with stakeholders9.
Conduct a city-wide data audit that identifies where data is collected, in what format, and for what purpose: Given the vast range of different data repositories and the tendency for data to be siloed across different institutions and departments, data audits can be exhausting, inefficient, and resource-intensive. Setting priorities for data audits is therefore critical. A common data audit strategy followed by local governments is to give priority to datasets that relate to critical public policy challenges and lend themselves to publication on Open Data portals. Open Data portals can be built to different scales. As of June 2019:
Berlin has 1,689 datasets
Greater London has 1,254 datasets
Vienna has 470 datasets
Barcelona has 425 datasets
Copenhagen has 270 datasets
Paris has 252 datasets
Eindhoven has 91 datasets10
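The audit itself can start as a simple inventory that records, for each dataset, where it is collected, in what format, and for what purpose, and that puts the priority datasets first. The sketch below is one possible shape for such an inventory; the field names and sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AuditEntry:
    name: str
    department: str            # where the data is collected
    file_format: str           # in what format
    purpose: str               # for what purpose
    policy_critical: bool      # relates to a critical public policy challenge
    open_data_candidate: bool  # suitable for an open data portal


def prioritize(entries: List[AuditEntry]) -> List[AuditEntry]:
    """Order entries so that policy-critical, publishable datasets come first."""
    return sorted(
        entries,
        key=lambda entry: (entry.policy_critical, entry.open_data_candidate),
        reverse=True,
    )


inventory = [
    AuditEntry("Staff rota", "HR", "xlsx", "Scheduling", False, False),
    AuditEntry("Bus ridership", "Transport", "csv", "Route planning", True, True),
    AuditEntry("Housing complaints", "Housing", "pdf", "Case handling", True, False),
]
for entry in prioritize(inventory):
    print(entry.name)
```

Even a short list like this makes it easier to decide which datasets to clean, standardize, and publish first.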
Join a network and community of practice of like-minded cities and urban policymakers: Cities have the ability to quickly learn and adapt best practices. By engaging in associations and communities of practice, cities can learn from each other and openly exchange views on the benefits and challenges associated with different big data strategies. Relevant associations include:
Open Data Lab’s City Incubator for public intrapreneurs
Apolitical provides courses and training as well as the opportunity to join a wider community of practice exclusively for public sector officials
Open Data Institute’s City Programme - start by consulting this interactive open data toolkit for city leaders and this great webinar on how cities can better leverage open data for the built environment
European Innovation Partnership on Smart Cities and Communities
Provide staff with practical and ongoing training and support on data privacy and the meaning of data privacy principles in the context of different use cases: While there is a range of online certifications and training on data privacy for public sector officials, these have to be complemented by in-person training, discussions, practices, and safeguards.
To reap the benefits arising from innovations in digital technology and data processing capacities, cities need to get the basics right.
If you’re an urban policymaker and are seeking advice on where your city stands and how it can approach its data strategy, feel free to get in touch with us at inclusivecitiescatalyst@gmail.com.
Stay tuned for our next piece - and if you’re an urban planner/urban policy expert/urban data strategist based on the African continent, and are open to working on new projects, get in touch with us to be featured in our upcoming directory.
1. US Geological Survey
2. US Geological Survey
3. Place Changers (2021), The case for data standards in urban planning and community engagement
4. Tony Blair Institute for Global Change (2021), Connected systems - rethinking cities in the age of mass data
5. Open Data for Development (2021), State of Open Data - Open Data and Urban Development
6. Server Technology (2022), A Smart City Overview
7. Meijer and Thaens (2021), Path Dependency of Smart Cities: How Technological and Social Legacies Condition Smart City Development
8. Chyi and Panfil (2020), A Commons Approach to Smart City Data Governance
9. The Data Economy Lab (2020), Cities & Data Sharing - Part 3: Barcelona
10. Heseltine Institute for Public Policy (2021), Building Data Ecosystems to Unlock the Value of (Big) Data: A Good Practices Reference Guide