This article by Harman Singh Dhodi, Kiran Tati, Sapna Maheshwari, and Washim Nawaz was originally published on Amazon Web Services’ Big Data Blog.
In our increasingly digital world, affordable access to high-speed broadband is a necessity to fully participate in our society, yet there are still millions of American households without internet access. HR&A Advisors, a multi-disciplinary consultancy with extensive work in the broadband and digital equity space, is helping its state, county, and municipal clients deliver affordable internet access by analyzing locally specific digital inclusion needs and building tailored digital equity plans.
The first step in this process is mapping the digital divide. Which households don’t have access to the internet at home? Where do they live? What are their specific needs?
Public data sources aren’t sufficient for building a true understanding of digital inclusion needs. To fill in the gaps in existing data, HR&A creates digital equity surveys to build a more complete picture before developing digital equity plans. HR&A has used Amazon Redshift Serverless and CARTO to process survey findings more efficiently and create custom interactive dashboards to facilitate understanding of the results. HR&A’s collaboration with Amazon Redshift and CARTO has resulted in a 75% reduction in overall deployment and dashboard management time and helped the team achieve the following technical goals:
- Load survey results (CSV files) and geometry data (shapefiles) in a data warehouse
- Perform geo-spatial transformations using extract, load, and transform (ELT) jobs that join geometry data with survey results within the data warehouse, so that survey results can be visualized on a map
- Integrate with a business intelligence (BI) tool for advanced geo-spatial functions, visualizations, and mapping dashboards
- Scale data warehouse capacity up or down to address workloads of varying complexity in a cost-efficient manner
In this post, we unpack how HR&A uses Amazon Redshift spatial analytics and CARTO for cost-effective geo-spatial measurement of digital inclusion and internet access across multiple US states.
Before we get to the architecture details, here is what HR&A and its client, Colorado’s Office of the Future of Work, have to say about the solution.
“Working with the team at HR&A Advisors, Colorado’s Digital Equity Team created a custom dashboard that allowed us to very effectively evaluate our reach while surveying historically marginalized populations across Colorado. This dynamic tool, powered by AWS and CARTO, provided robust visualizations of which regions and populations were interacting with our survey, enabling us to zoom in quickly and address gaps in coverage. Ensuring we were able to seek out data from those who are most impacted by the digital divide in Colorado has been vital to addressing digital inequities in our state.”
— Melanie Colletti, Digital Equity Manager at Colorado’s Office of the Future of Work
“AWS allows us to securely house all of our survey data in one place, quickly scrub and analyze it on Amazon Redshift, and mirror the results through integration with data visualization tools such as CARTO without the data ever leaving AWS. This frees up our local computer space, greatly automates the survey cleaning and analysis step, and allows our clients to easily access the data results. Following the proof of concept and development of first prototype, almost all of our state clients showed interest in using the same solution for their states.”
— Harman Singh Dhodi, Analyst at HR&A Advisors, Inc.
Storing and analyzing large survey datasets
HR&A used Redshift Serverless to store large amounts of digital inclusion data in one place and to transform and analyze it quickly. CARTO’s analytical toolkit extended the spatial capabilities of Amazon Redshift and integrated with CARTO’s data visualization tools, all without the data ever leaving the AWS environment. This cut down significantly on analytical turnaround times.
The CARTO Analytics Toolbox for Redshift is composed of a set of user-defined functions and procedures organized in a set of modules based on the functionality they offer.
The following figure shows the solution and workflow steps developed during the proof of concept with a virtual private cloud (VPC) on Amazon Redshift.
Figure 1: Workflow illustrating data ingestion, transformation, and visualization using Redshift and CARTO.
In the following sections, we discuss each phase in the workflow in more detail.
Data ingestion
HR&A receives survey data as wide CSV files with hundreds of columns in each file and related spatial data in hexadecimal Extended Well-Known Binary (EWKB) in the form of shapefiles. These files are stored in Amazon Simple Storage Service (Amazon S3).
The Redshift COPY command is used to ingest the spatial data from shapefiles into the native GEOMETRY data type supported in Amazon Redshift. A combination of Amazon Redshift Spectrum and COPY commands is used to ingest the survey data stored as CSV files. For files with unknown structures, AWS Glue crawlers are used to extract metadata and create table definitions in the Data Catalog. These table definitions are used as the metadata repository for external tables in Amazon Redshift.
For files with known structures, a Redshift stored procedure is used, which takes the file location and table name as parameters and runs a COPY command to load the raw data into corresponding Redshift tables.
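As a rough illustration of the loading step described above, the following Python sketch builds the kind of COPY statement such a procedure might run for a given file location and table name. It is not HR&A's stored procedure: the function name, the table naming rule, and the choice of `IGNOREHEADER 1` are assumptions made for the example, and the bucket, table, and IAM role values are placeholders.

```python
import re

# Hypothetical guard: allow only plain "table" or "schema.table" names.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def _quote_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_statement(s3_path: str, table_name: str, iam_role: str) -> str:
    """Return a Redshift COPY statement chosen by file extension (.csv or .shp)."""
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    lowered = s3_path.lower()
    if lowered.endswith(".shp"):
        # FORMAT SHAPEFILE loads shapes into a GEOMETRY column.
        options = "FORMAT SHAPEFILE"
    elif lowered.endswith(".csv"):
        # Assumption: every survey CSV starts with a single header row.
        options = "FORMAT CSV IGNOREHEADER 1"
    else:
        raise ValueError(f"unsupported file type: {s3_path!r}")
    return (
        f"COPY {table_name} FROM {_quote_literal(s3_path)} "
        f"IAM_ROLE {_quote_literal(iam_role)} {options};"
    )


if __name__ == "__main__":
    # Placeholder values, not taken from the article.
    print(build_copy_statement(
        "s3://example-bucket/surveys/colorado_survey.csv",
        "raw.colorado_survey",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

Validating the table name and escaping the string literals keeps a caller-supplied file location or table name from altering the statement, which matters when the values come from object keys in a bucket.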
Data transformation
Multiple stored procedures are used to split the raw table data and load it into corresponding target tables while applying the user-defined transformations.
These transformation rules include transformation of GEOMETRY data using native Redshift geo-spatial functions, like ST_Area and ST_Length, and CARTO’s advanced spatial functions, which are readily available in Amazon Redshift as part of the CARTO Analytics Toolbox for Redshift installation. Furthermore, all the data ingestion and transformation steps are automated using an AWS Lambda function that runs the Redshift query whenever a dataset in Amazon S3 is updated.
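The automation step can be sketched as a Lambda handler that reacts to an S3 event notification and submits one statement per updated object. The article does not say how the Lambda function reaches Redshift; this sketch assumes the Redshift Data API (`execute_statement` on the boto3 `redshift-data` client). The workgroup, database, and procedure names are hypothetical, as is the convention of deriving the table name from the file name.

```python
from urllib.parse import unquote_plus

# Hypothetical configuration: none of these names come from the article.
WORKGROUP = "example-workgroup"
DATABASE = "dev"
PROCEDURE = "load_raw_data"  # stands in for a load procedure taking (path, table)


def _escape(value: str) -> str:
    """Double single quotes so the value is safe inside a SQL string literal."""
    return value.replace("'", "''")


def statements_for_event(event: dict) -> list:
    """Turn an S3 event notification into one CALL statement per object."""
    statements = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Assumed convention: the file name without its extension is the table.
        table = key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        path = f"s3://{bucket}/{key}"
        statements.append(
            f"CALL {PROCEDURE}('{_escape(path)}', '{_escape(table)}');"
        )
    return statements


def lambda_handler(event, context, client=None):
    """Submit each statement to Redshift Serverless through the Data API."""
    if client is None:
        import boto3  # imported lazily so the helper above runs without AWS access
        client = boto3.client("redshift-data")
    statement_ids = []
    for sql in statements_for_event(event):
        response = client.execute_statement(
            WorkgroupName=WORKGROUP, Database=DATABASE, Sql=sql
        )
        statement_ids.append(response["Id"])
    return {"statement_ids": statement_ids}
```

The Data API call is asynchronous: it returns a statement ID rather than a result, so a production version would also need to check statement status and handle failures.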
Data visualization
The HR&A team used CARTO’s Redshift connector to connect to the Redshift Serverless endpoint and built dashboards using CARTO’s SQL interface and widgets to assist mapping while performing dynamic calculations of the map data as per client needs.
The following are sample screenshots of the dashboards that show survey responses by ZIP code. The counties that are in lighter shades represent limited survey responses and need to be included in the targeted data collection strategy.
The first image shows the dashboard without any active filters. The second image shows the map and charts filtered to respondents who took the survey in Spanish. The user can select and toggle between features by clicking on the respective category in any of the bar charts.
Figure 2: Illustrative Digital Equity Survey Dashboard for the State of Colorado. (© HR&A Advisors)
Figure 3: Illustrative Digital Equity Survey Dashboard for the State of Colorado, filtered for respondents who took the survey in Spanish. (© HR&A Advisors)
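The coverage check these dashboards support, spotting areas with few responses and optionally filtering by survey language, can be approximated outside the dashboard with a short Python sketch. The record layout (`zip_code` and `language` keys) and the threshold are hypothetical; the article does not describe the survey schema or any cutoff HR&A used.

```python
from collections import Counter


def low_response_areas(responses, all_zip_codes, min_responses=5, language=None):
    """Return the ZIP codes whose response count is below min_responses.

    responses: iterable of dicts with 'zip_code' and 'language' keys
        (a hypothetical schema for this example).
    all_zip_codes: every ZIP code in the study area, so that areas with
        zero responses are reported too.
    min_responses: illustrative threshold, not a value from the article.
    language: if given, count only responses taken in that language.
    """
    counts = Counter(
        r["zip_code"]
        for r in responses
        if language is None or r["language"] == language
    )
    # Counter returns 0 for ZIP codes that never appear in the responses.
    return sorted(z for z in all_zip_codes if counts[z] < min_responses)
```

Passing the full list of ZIP codes matters: areas with no responses at all never appear in the survey data, yet they are the ones a targeted data collection strategy most needs to find.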
The result: A new standard for automatically updating digital inclusion dashboards
After developing the first interactive dashboard prototype with this methodology, five of HR&A’s state clients (CA, TX, NV, CO, and MA) showed interest in the solution. HR&A was able to implement it for each of them within 2 months—an incredibly quick turnaround for a custom, interactive digital inclusion dashboard.
HR&A also realized about a 75% reduction in overall deployment and dashboard management time, which meant the consulting team could redirect their focus from manually analyzing data to helping clients interpret and strategically plan around the results. Finally, the dashboard’s user-friendly interface made survey data more accessible to a wider range of stakeholders. This helped build a shared understanding when assessing gaps in each state’s digital inclusion landscape and allowed for a targeted data collection strategy from areas with limited survey responses, thereby supporting more productive collaboration overall.
Conclusion
In this post, we showed how HR&A was able to analyze geo-spatial data in large volumes using Amazon Redshift Serverless and CARTO.
With HR&A’s successful implementation, it’s evident that Redshift Serverless, with its flexibility and scalability, can be used as a catalyst for positive social change. As HR&A continues to pave the way for digital equity, its story stands as a testament to how AWS services and partners can be used to address real-world challenges.
We encourage you to explore Redshift Serverless with CARTO for analyzing spatial data and let us know your experience in the comments.