Mat Schaffer

10 years of Ruby-powered citizen science

In the wake of the 2011 Tohoku earthquake and tsunami, people were worried for their safety. Safecast answered that call and went on to build the largest open radiation database in the world.

10 years later our science project continues, with Ruby at its heart. Our radiation measurements span the globe and are freely available for anyone to use. And now with projects like Airnote we’re using that expertise to tackle new environmental challenges such as the California wildfires.

RubyKaigi Takeout 2021: https://rubykaigi.org/2021-takeout/presentations/matschaffer.html


00:00:00.240 Hello everybody, and thank you for coming to my talk. This is "10 Years of Ruby-Powered Citizen Science" at RubyKaigi Takeout 2021. My name is Mat Schaffer. I'm an engineer at Elastic and a volunteer at Safecast, which is what I'll be talking about today. I'm also a resident of Yamanashi Prefecture, so you can see me there hanging out by my woodpile.
00:00:10.639 I want to point out that while Elastic is my employer, my work on Safecast is a volunteer time-off activity that I do as part of my work at Elastic. The company gives me five full business days per year to spend on volunteer projects like this one. It's a global company, and it lets me live and work in Japan as part of a worldwide organization. If you're interested in doing similar things, check out our careers page.
00:00:50.559 So, about Safecast: Safecast is probably best known for the radiation map that you see here. For the past ten years, we've been collecting radiation data and other environmental hazard data in response to the Fukushima disaster. I talked about Safecast before at RubyKaigi 2017. If you want a deeper dive into how the project got started and what we do, please take a look at that presentation, as it has much more detail. Today, I'll give only a brief overview.
00:01:40.160 In the wake of the 2011 Tohoku earthquake and tsunami and the subsequent nuclear disaster at Fukushima, there was a lot of confusion about radiation levels. Were people safe? Did they need to evacuate? There really wasn't much data available to answer these questions. In response, a number of organizations got started, Safecast among them, initially with a Geiger counter duct-taped to the outside of a truck window and an iPhone to log the readings. This simple idea worked: it let us collect radiation measurements and give people the data they needed to make crucial decisions about their safety.
00:02:07.360 As you can imagine, it got a little more sophisticated over time; it wasn't really scalable for people to duct-tape Geiger counters to their windows. So, we evolved from there. We turned the duct-taped Geiger counter into a computer encased in a weatherproof box, which we eventually shrank down into a device that could be strapped to the side of a car. We recruited over a thousand volunteers around the country, and now around the world, to help drive around and collect radiation measurements. You can even see one mounted to the back of a postal service bike. These devices can be attached anywhere, allowing us to gather crucial data about environmental hazards that affect people in their local areas.
00:02:32.160 Fast forward 10 years, and we now have about 196 million data points in our database. We're referenced and cited by several important organizations, whose logos you can see on the screen. Our map is now the largest open dataset of environmental radiation data, and all of it is released into the public domain under Creative Commons Zero (CC0). We recently celebrated our 10th anniversary with a 24-hour telethon on our YouTube channel. We also give presentations at international nuclear conferences, and we keep tracking unfolding events through environmental hazard data, primarily radiation but increasingly air quality as well.
00:03:06.959 I want to talk about how it works, and I'll break this down into two parts. First, the devices. There isn't much Ruby in this part yet, but there could be. We're doing some interesting things with M5Stack, and I've noticed there's an mruby build for M5Stack. If you work with it or know anything about it, please get in touch. We would love to integrate more Ruby into our devices; that could be a lot of fun.
00:03:48.800 Right now, we have three main workhorse devices in the field. The first is the bGeigie Nano, the large unit on the left of the slide, which you saw strapped to the car earlier. It's a weatherproof kit that anyone can build and take anywhere. I've taken my own around the world many times, and these units still provide much of the data in our database. The second, Pointcast, is a radiation sensor that bolts to a wall and needs a power source. I covered Pointcast in my 2017 presentation; it was our primary fixed-point sensor reporting data back to us. Installing and powering these devices was a challenge, which led to the development of Solarcast. Solarcast is still something of a prototype, but it was designed for easy installation without worrying about power or Ethernet: you drop it and forget it, and it measures both air quality and radiation.
00:05:02.720 We also have the Solarcast Nano, a radiation-only version of the Solarcast. It follows the same drop-and-forget principle and is weatherproof and 3G-connected. We've been placing these on fence posts and telephone poles near nuclear facilities so they report radiation data on an ongoing basis. Additionally, we created the bGeigieCast, an ESP32 device that plugs into the shield of your bGeigie Nano. It turns your mobile sensor into a fixed-point sensor by connecting over Wi-Fi and Bluetooth. As long as you provide continuous power, the device will keep reporting data.
00:06:02.240 Finally, we have the bGeigie Zen, our next generation of the big Nano unit. It has an M5Stack on the front and GPS and a radiation sensor on the back, similar to the existing Nano. The M5Stack is compelling because it packages many components that previously had to be soldered onto the bGeigie Nano, which makes assembly much simpler. We're currently focused on making this device widely available.
00:07:28.480 We also developed Airnote in collaboration with Blues Wireless. It grew out of Solarcast: we took the challenges we faced there as an opportunity to build the solutions into this new air sensor. Airnote is solar-powered, and you can install it anywhere by simply taping it to a window or wall in a sunny spot. It continually reports PM2.5 data back to our servers, and that data is public too. The device uses the Notecard from Blues Wireless, which comes with 500 megabytes of data over ten years built in, so there's no separate data plan contract to worry about, and it works globally. You just turn it on, stick it somewhere sunny, and you're done.
00:08:16.240 The collected data is available through a Grafana instance, which is also public. Here is an example from my own Airnote measuring PM2.5 and temperature outdoors. It's been an interesting project, and we're also experimenting with radiation versions of Airnote that can be solar-powered and deployed in the same drop-and-forget way. The idea is to deploy IoT sensors worldwide, using whichever technology is best suited, to collect data on a range of hazards.
00:09:10.720 Next, I want to talk about how the system works on the cloud side, which is where Ruby comes into play. I covered this in detail in my 2017 talk, and we've made several updates since then. Devices such as Airnote, the bGeigieCast, and Solarcast produce data that is normalized into JSON and fed into our main ingestion framework, a Ruby application called Ingest. Ingest receives the data over HTTP and publishes it to Amazon Simple Notification Service (SNS), which distributes it to various destinations.
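As a rough illustration of that normalize-then-fan-out flow, here is a minimal Ruby sketch. It is not Safecast's actual Ingest code: `InMemoryTopic` is a hypothetical stand-in for an SNS topic, and the device payload fields (`device`, `timestamp`, `pm25`, `id`, `time`, `cpm`) and the normalized keys are assumptions made up for the example. In a real deployment, the publish step would go to SNS (for instance through the aws-sdk-sns gem) rather than an in-process loop.

```ruby
require "json"
require "time"

# Hypothetical stand-in for an SNS topic: every subscriber receives every
# published message (fan-out).
class InMemoryTopic
  def initialize
    @subscribers = []
  end

  def subscribe(&handler)
    @subscribers << handler
  end

  def publish(message)
    @subscribers.each { |handler| handler.call(message) }
  end
end

# Hypothetical normalizer: maps device-specific payloads onto one shared
# shape so downstream consumers never need per-device parsing.
def normalize(device_type, payload)
  case device_type
  when :airnote
    # Assumed payload: epoch-seconds timestamp and a PM2.5 reading.
    { device_id: payload.fetch("device"),
      captured_at: Time.at(payload.fetch("timestamp")).utc.iso8601,
      value: payload.fetch("pm25"), unit: "ug/m3" }
  when :bgeigie
    # Assumed payload: ISO 8601 time string and counts per minute.
    { device_id: payload.fetch("id").to_s,
      captured_at: Time.iso8601(payload.fetch("time")).utc.iso8601,
      value: payload.fetch("cpm"), unit: "cpm" }
  else
    raise ArgumentError, "unknown device type: #{device_type}"
  end
end

# Usage: two hypothetical consumers, both fed from one publish.
topic = InMemoryTopic.new
topic.subscribe { |msg| puts "search index <- #{msg}" } # e.g. Elasticsearch
topic.subscribe { |msg| puts "archive <- #{msg}" }      # e.g. an S3 bucket
reading = { "id" => 2207, "time" => "2021-09-01T09:00:00+09:00", "cpm" => 35 }
topic.publish(JSON.generate(normalize(:bgeigie, reading)))
```

Because every consumer subscribes to the same normalized message, adding a new destination doesn't require touching any device-specific code.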
00:10:26.160 From SNS, data flows into Elasticsearch on Elastic Cloud, into S3 buckets owned by Safecast, and into our own RDS instances. If you visit the datasets URL, you'll find the Amazon Resource Name (ARN) of a public SNS topic that you can subscribe to with virtually anything. That includes email, although that would probably overwhelm your inbox; an HTTP subscription is the smarter choice. Essentially, any time data comes into Ingest, whether from Airnote, the bGeigie, Solarcast, or other devices, it gets routed to your application, and you can process it however you see fit. We also have an API application built with Ruby on Rails at api.safecast.org, which provides a UI and management layer for these data sources and handles bulk uploads.
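For anyone wiring up an HTTP subscription, here is a sketch of the core logic such an endpoint might need. SNS POSTs a JSON envelope whose `Type` field distinguishes a one-time `SubscriptionConfirmation` (confirmed by visiting its `SubscribeURL`) from a `Notification` carrying the payload in `Message`. The assumption that `Message` holds a JSON measurement document, and the method name `handle_sns_body`, are hypothetical. A production endpoint should also verify SNS message signatures, which this sketch omits.

```ruby
require "json"

# Core logic of an HTTP(S) endpoint subscribed to an SNS topic.
# Returns a [kind, data] pair describing what the caller should do.
# NOTE: signature verification is intentionally omitted in this sketch.
def handle_sns_body(raw_body)
  envelope = JSON.parse(raw_body)
  case envelope["Type"]
  when "SubscriptionConfirmation"
    # SNS delivers nothing until someone visits SubscribeURL.
    [:confirm, envelope.fetch("SubscribeURL")]
  when "Notification"
    # Assumption: Message is the JSON document that was published.
    [:measurement, JSON.parse(envelope.fetch("Message"))]
  else
    # e.g. UnsubscribeConfirmation; nothing to process here.
    [:ignore, envelope["Type"]]
  end
end
```

In a Rack or Rails app, you would call this from the POST route you register as the subscription endpoint, then issue a GET to the returned URL for `:confirm` results.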
00:11:50.640 Older devices like the bGeigie Nano don't report continuously. Instead, they write log files to SD cards, which can then be uploaded to the Safecast API to get the data into our servers. In the last four years, we've made significant infrastructure updates, such as the public SNS topic I just described, which I'm excited about because it lets anyone use our data. We've also introduced public S3 buckets for data we've already collected. We've moved from DigitalOcean and Cloud66, which we were using in 2017, to AWS and RDS, which has greatly improved our scalability. Our continuous integration (CI) has improved too, moving from Travis to Semaphore and then to CircleCI, and we're now exploring GitHub Actions.
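Since those SD-card logs are plain text, here is a hedged Ruby sketch of parsing one line. It assumes the NMEA-style `$BNRDD` line layout used by bGeigie logs, with the field positions noted in the comments; treat those positions, and all method names, as assumptions rather than a spec. The checksum (XOR of the characters between `$` and `*`) and the degrees-and-minutes coordinate conversion follow the usual NMEA conventions.

```ruby
# Assumed layout (only the fields below are interpreted):
#   $BNRDD,<id>,<iso8601 time>,<cpm>,<count 5s>,<total>,<A|V>,<ddmm.mmmm>,<N|S>,
#   <dddmm.mmmm>,<E|W>,<altitude m>,<A|V>,...*<checksum>

# NMEA checksum: XOR of every byte between "$" and "*", as two hex digits.
def nmea_checksum(body)
  format("%02X", body.bytes.reduce(0) { |acc, b| acc ^ b })
end

# Convert NMEA (d)ddmm.mmmm plus hemisphere into signed decimal degrees.
def nmea_to_degrees(value, hemisphere)
  raw = Float(value)
  degrees = (raw / 100).floor
  minutes = raw - degrees * 100
  decimal = degrees + minutes / 60.0
  %w[S W].include?(hemisphere) ? -decimal : decimal
end

# Returns a hash for a valid line, or nil for malformed/corrupted lines.
def parse_bgeigie_line(line)
  match = line.strip.match(/\A\$(?<body>[^*]+)\*(?<sum>\h{2})\z/)
  return nil unless match && nmea_checksum(match[:body]) == match[:sum].upcase

  fields = match[:body].split(",")
  return nil unless fields[0] == "BNRDD" && fields.size >= 13

  {
    device_id: fields[1],
    captured_at: fields[2],
    cpm: Integer(fields[3], 10), # base 10 so leading zeros aren't octal
    radiation_valid: fields[6] == "A",
    latitude: nmea_to_degrees(fields[7], fields[8]),
    longitude: nmea_to_degrees(fields[9], fields[10]),
    altitude_m: Float(fields[11]),
    gps_valid: fields[12] == "A"
  }
end
```

Rejecting lines with a bad checksum is a cheap way to drop records corrupted on the SD card before they ever reach the database.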
00:12:53.920 We also moved our monitoring from New Relic to Elastic APM, which has been advantageous because we're no longer restricted by host limits; instead, we pay only for data storage. Elastic, where I work, also sponsors us generously. As a result, we can collect data across all our applications, and even user workstations can report to Elastic APM without any hassle. We've handled many upgrades in the past four years, including Ruby, Rails, Postgres, and numerous gems, and the amount of contribution we see in our public repositories is astounding. Since 2017, we've also added Windows support and Gitpod development environments, which make it easier for collaborators to contribute.
00:14:20.160 We've also introduced many new features, such as an improved bulk import flow for bGeigie uploads, a new tracking system for data approvals, and auto-approval, which speeds up getting newly uploaded data onto map.safecast.org. The new system processes uploads quickly, contacts contributors about errant data points, and lets us approve valid data swiftly. Auto-approval is a big improvement: it speeds up the whole process, so contributors see their data on our public map as soon as it's validated.
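To illustrate what an auto-approval check could look like, here is a small Ruby sketch. The rules and the `MAX_PLAUSIBLE_CPM` threshold are hypothetical, not Safecast's actual criteria, and readings are assumed to be hashes with `:cpm`, `:latitude`, `:longitude`, `:gps_valid`, and `:radiation_valid` keys. Clean uploads are approved automatically; anything flagged is held for review, with the offending points listed so someone can follow up with the contributor.

```ruby
# Hypothetical plausibility ceiling; not Safecast's real threshold.
MAX_PLAUSIBLE_CPM = 10_000

Review = Struct.new(:status, :flagged, keyword_init: true)

# Hypothetical per-reading rules; returns a list of problem symbols.
def problems_for(reading)
  problems = []
  problems << :invalid_gps unless reading[:gps_valid]
  problems << :invalid_radiation unless reading[:radiation_valid]
  problems << :bad_latitude unless (-90.0..90.0).cover?(reading[:latitude])
  problems << :bad_longitude unless (-180.0..180.0).cover?(reading[:longitude])
  problems << :implausible_cpm if reading[:cpm] > MAX_PLAUSIBLE_CPM
  problems
end

# Approve clean uploads; otherwise map reading index => problems.
def review_upload(readings)
  flagged = readings.each_with_index.filter_map do |reading, index|
    problems = problems_for(reading)
    [index, problems] unless problems.empty?
  end
  status = flagged.empty? ? :auto_approved : :needs_review
  Review.new(status: status, flagged: flagged.to_h)
end
```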
00:15:10.240 We also launched support for cosmic data, meaning measurements taken at high altitude aboard flights. Originally, we disallowed cosmic uploads so they wouldn't clutter the map of surface-level radiation. Now we accept them and present them separately, so the surface map stays clean while those measurements remain accessible. I encourage everyone to explore this data: at 11,000 meters, it's fascinating to see how the gamma rates vary over the course of a flight.
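One simple way to keep cosmic data off the surface map is to classify each upload by altitude. The Ruby sketch below assumes readings carry an `:altitude_m` key and uses a hypothetical 9,000 m cutoff, chosen because no ground location is higher than Everest (about 8,849 m). Safecast's real system may rely on how uploads are labeled instead.

```ruby
# Hypothetical cutoff: nothing on the ground is higher than Everest
# (about 8,849 m), so a reading above 9,000 m can't be surface data.
COSMIC_ALTITUDE_M = 9_000.0

# An upload counts as cosmic if any of its readings is above the cutoff.
def cosmic_upload?(readings)
  readings.any? { |r| r[:altitude_m] > COSMIC_ALTITUDE_M }
end

# Route the whole upload to either the surface map or the cosmic view.
def route_upload(readings)
  cosmic_upload?(readings) ? :cosmic : :surface
end
```

Classifying the whole upload, rather than each point, also keeps a flight's taxi, climb, and descent readings off the surface map.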
00:16:32.640 We've also kept up our public outreach. For our 10th anniversary telethon, we made all the videos publicly available at safecast.10. At the start of the pandemic, we contributed to COVID-19 tracking, especially tracking of testing, but we've since shifted our focus back to air and radiation measurement. Throughout, we've also hosted online workshops that teach kids about electronics, radiation, and their environment. And we've made numerous podcast and video appearances and given interviews, plus plenty of other fun, miscellaneous activities that you can follow on our Twitter.
00:17:32.640 Now I'd like to share how you can help. A big reason I give these talks is to encourage more involvement in our project. The Ruby and Rails ecosystems move quickly, which is a challenge for a volunteer organization like ours with no dedicated developers. We make progress, but we sometimes fall behind, and because our app has been around for a long time, some of the older code needs reworking. Our development environments are solid, so if you have experience with Ruby, Rails, or gem upgrades, PRs are genuinely welcome.
00:19:06.360 We also need help upgrading PostGIS and Elasticsearch; both are challenging because of the amount of data we manage. We have around 200 gigabytes in PostGIS, and I'd guess the Elasticsearch side is similar. Fortunately, our cloud infrastructure makes it easy to snapshot our databases, so we can test upgrades without disrupting the main production instance. If you'd like to help with testing or other data-related deployments, I'll gladly provide whatever resources you need.
00:20:06.720 Finally, I'll keep curating our help-wanted issues, focusing on tasks that require minimal Safecast domain knowledge. Ultimately, though, contributions work best when they match your own interests: if there's a feature you're passionate about, I encourage you to pursue it. I believe we can create something exceptional together, and I appreciate you tuning in to my talk.
00:20:34.280 Thank you for listening. I hope you'll consider taking part in the Safecast project. Our main homepage is safecast.org, and you can reach me on Twitter or GitHub as matschaffer. My personal email is [email protected]. I'd love to discuss any of the topics I covered, so feel free to reach out. I'll stick around for any questions. Enjoy the conference!