RailsConf 2021

The Cost of Data

by Vaidehi Joshi

In the talk 'The Cost of Data' presented at RailsConf 2021 by Vaidehi Joshi, the focus is on the environmental impact of data storage, particularly through data centers. Joshi emphasizes how the digital data we generate is physically stored in data centers, which consume significant energy and have a substantial carbon footprint.

Key points discussed include:

- Understanding Data Storage: Most of the data generated online is stored in data centers, which are physical infrastructures that house servers and related hardware.

- Energy Consumption: Data centers use approximately 200 to 500 terawatt hours of electricity annually, representing 1 to 3 percent of the global electricity demand.

- Environmental Impact: The IT sector contributes about 2 percent of global greenhouse gas emissions, with data centers accounting for at least 0.3 percent. The impact of data centers varies based on the local energy sources they utilize, highlighting the importance of renewable energy.

- Cloud Providers Comparison: Two major cloud providers, AWS and Google Cloud, are compared in their sustainability practices. AWS aims for 100% renewable energy by 2030 but faces challenges with regions relying on fossil fuels, while Google purchases renewable energy equivalent to its consumption, demonstrating greater transparency and commitment to sustainability.

- Future of Data Centers: As global data traffic increases, energy demand from data centers is expected to rise, placing more pressure on energy resources. Solutions such as locating data centers in cooler climates and recycling waste energy are discussed. Additionally, there are calls for developers to consider where their data is stored and to promote sustainable practices.

- Individual and Collective Action: Joshi encourages individuals in tech to understand the carbon footprint of their data usage, choose cloud providers wisely, and advocate for transparency in energy sourcing from providers. Tools and resources are shared to facilitate this knowledge, including the Green Web Foundation for assessing data center impacts and the Low Impact Manifesto for web development.

The talk concludes with a hopeful message that the tech industry can lead by example in sustainability and suggests that developers hold the power to influence practices towards a more sustainable future.

00:00:05.660 Hi everyone, my name is Vaidehi, and I'm really happy to be here with all of you today.
00:00:11.400 A little bit about me: I am a lead software engineer at Forem, the open-source software that powers online communities like DEV and CodeNewbie. In my free time, I really love learning new things and sharing them with others.
00:00:24.720 These are a couple of the side projects I've done in my spare time. If you're interested in things like computer science, distributed systems, or really weird and strange computing history facts, you should check these projects out and come nerd out with me.
00:00:36.899 More recently, I've been obsessed with learning about something completely different and new to me. I've been really interested in how and where we store our data.
00:01:02.879 Now, you might be thinking: what do you mean? We've been thinking about how and where we store our data. Our data lives in a database, right? On some server somewhere. What’s there to think about? Well, it turns out there's more to this than meets the eye. So let's dig a little deeper.
00:01:24.420 Everything we create and share on the Internet is physically stored somewhere. However, most of us aren't thinking about how and where this data is stored. As developers, we might think that our data is ephemeral, but all of it lives somewhere. And when you work with code all day, it can be really easy to forget about the physical things that are powering our world.
00:01:41.700 So let's try to learn more about the physical aspects of our jobs together. These days, the majority of all data in the computing world is stored in something called a data center. A data center is one or many buildings that store servers and all of the hardware associated with those servers.
00:02:12.660 Before data centers were ubiquitous, we had to house all of our servers ourselves, sometimes in a single room. However, over time, people found that renting out servers in a data center was a lot easier than maintaining their own server rooms. That's how data centers became much more commonplace.
00:02:31.140 Most of us probably aren't even thinking about whether our data lives in a data center or not, and that's because most of us rely on cloud providers to think of that for us. Cloud providers are companies that rent out servers within their own data centers. If you use any of these cloud providers, then you're actually just renting out the servers in the data centers that these companies own, which is basically the service that they provide.
00:02:53.580 Now, renting out servers in a data center tends to be a lot easier than maintaining one's own server rooms. Why is that? When you store your content in a cloud provider's data center, then you don't have to maintain your own machines or your own network. You also conveniently have access to better hardware and improved security infrastructure, like backups and distributed databases.
00:03:14.519 Overall, storing content in a cloud provider's data center is just more efficient than running a data center of your own. Plus, when you sign up for a cloud provider, they'll promise you things like little to no downtime, which is a really nice benefit. But what do these cloud providers actually do in order to keep things running, to ensure operability, and to deliver on that promise of no downtime?
00:03:47.819 Well, to make sure that you can always access your data and keep their systems reliable, cloud providers will often set up more than one instance of your server in various different locations. This is something known as redundancy, which is the idea of having duplicates of something in the event of a failure. However, the trouble with having large numbers of servers is that the more servers you have, the more servers you need to power and maintain. Servers are demanding—they are powered by electrical energy. To take it a step further, not only do servers use energy, but they also emit energy in the form of heat.
00:04:45.720 If a server emits too much heat, then it will actually overheat and fail. Therefore, servers also need to be cooled in order to stay running and operable. Because of these hardware constraints, every data center will have some mechanism to remove hot air and supply cold air. So, there are two things happening: not only do we need energy to power servers, but we also need energy to cool them down too.
00:05:39.419 But just how much energy are we talking about? To be honest, it’s really hard to find exact numbers on any of this stuff, so I'll try to give you the best estimate based on my research. Looking at conservative estimates, data centers worldwide use approximately 200 terawatt hours a year. By more generous estimates, some studies say that they consume up to 500 terawatt hours a year. That's quite a range, especially since one terawatt hour is the energy used by drawing a trillion watts of power for a full hour.
00:06:09.300 For some context, the United Kingdom, which had a population of approximately 66 million in 2019, consumes only 303 terawatt hours a year. If we choose the more liberal estimate of data centers globally consuming closer to 500 terawatt hours a year, you can start to see that data centers collectively consume more electricity than many countries.
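To make the scale of these figures concrete, here is a minimal sketch of the arithmetic, using only the numbers quoted above (200 and 500 terawatt hours for data centers, 303 for the United Kingdom). The function and constant names are illustrative, not taken from the talk.

```python
# Figures quoted in the talk, in terawatt hours (TWh) per year.
DATA_CENTERS_LOW_TWH = 200   # conservative estimate
DATA_CENTERS_HIGH_TWH = 500  # generous estimate
UK_TWH = 303                 # United Kingdom, for comparison

WATT_HOURS_PER_TWH = 10**12  # 1 TWh is one trillion watt hours


def twh_to_wh(twh):
    """Convert terawatt hours to watt hours."""
    return twh * WATT_HOURS_PER_TWH


def ratio_to_uk(twh):
    """Express a yearly figure as a multiple of the UK's yearly consumption."""
    return twh / UK_TWH


for label, twh in [("conservative", DATA_CENTERS_LOW_TWH),
                   ("generous", DATA_CENTERS_HIGH_TWH)]:
    print(f"{label}: {twh} TWh = {twh_to_wh(twh):.1e} Wh, "
          f"{ratio_to_uk(twh):.2f}x the UK")
```

On the conservative estimate, data centers use roughly two thirds of what the UK does; on the generous estimate, about 1.65 times as much.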
00:07:13.259 Depending on which estimate we go with, we can say that overall, data centers currently demand somewhere between one and three percent of the world's total electricity. This is pretty surprising when you think about the fact that data centers exist purely to store our data and make it available to us. Now, if data centers are using so much of the global energy supply, it’s worth asking: where does that energy come from?
00:08:06.300 The electricity to power data centers depends entirely on their local power grid. Unfortunately, depending on where you're located, your local power grid might actually burn fossil fuels as its main source of electricity. Fossil fuels emit greenhouse gases into the atmosphere. The important thing to remember here is that the environmental impact of every data center completely depends on where its energy comes from.
00:09:01.860 If its energy comes from fossil fuels, then the environmental impact of that data center is going to be far more detrimental than a data center whose energy comes from a more renewable source. When we start thinking about the overall impact of all data centers in the world and the tech sector, we can see how the energy that powers all of this infrastructure has a very real and tangible impact on the world.
00:09:35.760 Studies suggest that the IT and communication technology industries account for approximately two percent of all greenhouse gas emissions. Within that two percent, conservative estimates suggest that data centers alone are responsible for at least 0.3 percent of global emissions. This might not seem like a lot in the grand scheme of things, but when we think about the IT sector in the context of other industries and the impact they make in terms of emissions, it's actually quite significant.
00:10:41.460 Based on these numbers, it turns out that the tech sector has a carbon footprint equivalent to that of the airline industry, which is something that many of us may never have thought about. I know that this revelation came as a shock to me. So, what are we doing about this? The answer depends on each individual cloud provider and their values.
00:11:53.400 There's a great white paper published in 2018 called 'The State of Data Center Energy Use.' It was written by Paul Johnston and Anne Currie. This paper goes into a lot of detail about the environmental impacts of six of the biggest cloud providers. It also gives each of these cloud providers a grade in terms of the sustainability of their servers.
00:12:11.920 The original paper was written in 2018, which is when I discovered it. However, the authors of this paper updated it in 2020. I'll be referencing the updated version.
00:12:37.920 I mentioned that the white paper talks about many different cloud providers, but I'm only going to delve into two of them in detail today. However, I encourage you to check out the entire white paper because it’s very well-written and contains some wonderful resources.
00:12:53.460 Let's start with AWS, which is the largest cloud provider. They have committed to running on 100% renewable energy by the year 2030.
00:13:11.400 AWS allows you to house your servers and instances in different zones across the globe. By my count, they have around 22 zones, but only five of those zones are actually carbon neutral. According to AWS's website, these are the five zones that they claim are carbon neutral.
00:13:46.899 If you store your data or provision a server in one of these carbon neutral zones, then AWS will purchase something called a carbon offset on your behalf. When AWS buys a carbon offset, they're really purchasing renewable energy and then putting it back into the electrical grid. These carbon offsets are what make these zones carbon neutral.
00:14:19.440 It's really important to note that while carbon neutral sounds good, there's a little more to it than meets the eye. If you store your data in one of these zones, you're still emitting greenhouse gases. Effectively, you're merely slowing down your emissions by buying renewable power generation from somewhere else in the form of a carbon offset.
00:15:05.700 Carbon offsets aren’t a long-term sustainable solution because they don't actually remove carbon emissions from the atmosphere; they only slow the rate at which emissions go out into the atmosphere.
00:15:24.120 Despite the fact that AWS buys carbon offsets for those five carbon neutral regions I mentioned earlier, they still have other more problematic regions, like 'US East One,' which is located in Northern Virginia in the United States. Many companies host their data and provision their servers in this region.
00:15:52.080 Unfortunately, Dominion Energy, which is the power supplier for Virginia and this region, has doubled down on fossil fuels. They’ve taken steps to ensure that fossil fuels remain a part of their business model and, by some reports, they’ve tried to curb other renewable sources of energy, like solar. As a result, this means that all the data centers in this region are relying on a power grid that's powered by fossil fuels.
00:16:19.980 Now, you’ll remember that goal AWS has of becoming 100% renewable. Despite that goal, and despite the fact that this entire region, 'US East One,' relies on a fossil-fuel-powered grid, they're actually continuing to open new data centers there. Since 2017, they’ve increased their operations in US East One by 59%.
00:17:04.500 To make matters worse, AWS is the least transparent of all the major cloud providers, despite being the biggest. They do not do a good job of publicly reporting data on their current energy use or how quickly their energy use is growing. This makes it really hard to know if the renewable energy they claim to buy through carbon offsets is anywhere close to offsetting how much energy they actually use across all their data centers.
00:17:37.920 Now, if we look at Google Cloud, things start to look a little brighter. The Google Cloud Platform buys carbon offsets for all of its servers, not just a select few in specific regions. In fact, their parent company, Alphabet, is the largest corporate buyer of renewable energy. Compared to AWS, Google has been quite transparent about the fact that they know they can’t power 100% of their servers through renewable energy—at least not yet.
00:18:05.880 Instead, they’ve adopted a different strategy. For each kilowatt-hour of energy that Google Cloud Platform servers consume, Google purchases a matching kilowatt-hour of clean renewable energy and adds it back into the power grid. This means that the energy Google buys could be produced at a different time or in a different place from the location of their data centers.
00:18:52.620 Ultimately, they are actually adding new clean energy sources back into the electrical grid and matching it to all of the energy their servers consume. As a result, while hosting on Google Cloud might generate new carbon emissions, 100% of those emissions are offset. As it turns out, Google is surprisingly the leader in this entire sector and doing much better than many other cloud providers.
00:19:30.780 Microsoft is the only other provider that has met its 100% sustainability goal. You can read about that in the white paper. Google has also made it easy for their customers to pinpoint the exact usage and carbon emissions of their regions. They recently introduced something called the Carbon-Free Energy percentage (CFE), which measures the average percentage of carbon-free energy consumed in a particular region on an hourly basis.
00:20:13.140 This percentage also takes into account any carbon offsets and renewable energy purchased for that specific location. This data's transparency allows Google Cloud platform customers to focus on regions that maximize carbon-free energy and help reduce their carbon footprint, which is impressive.
00:20:59.560 More recently, Google has set a new goal: to run their business on carbon-free energy 24/7 everywhere across all data centers by the year 2030. This initiative is termed 'Real-Time Matching.' The idea here is to consider what time of day it is, and what season it is, regarding when energy is generated. Google aims to match that energy with the real-time demand of their data centers across all servers.
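The difference between matching energy by total volume and matching it hour by hour can be sketched with a few lines of arithmetic. This is only an illustration under stated assumptions: the numbers below are invented, the function names are hypothetical, and Google's actual accounting method is not described in the talk and may differ.

```python
def volume_match(consumed_kwh, clean_kwh):
    """Share of consumption covered when clean energy is matched by total volume,
    regardless of when it was generated (capped at 100%)."""
    return min(sum(clean_kwh) / sum(consumed_kwh), 1.0)


def hourly_match(consumed_kwh, clean_kwh):
    """Share of consumption covered when clean energy only counts
    in the same time block in which it was generated."""
    matched = sum(min(used, clean) for used, clean in zip(consumed_kwh, clean_kwh))
    return matched / sum(consumed_kwh)


# Hypothetical day split into four six-hour blocks: a flat server load,
# and a solar-heavy clean supply that peaks at midday and is zero at night.
consumed = [100, 100, 100, 100]
clean = [0, 250, 150, 0]

print(f"matched by volume: {volume_match(consumed, clean):.0%}")
print(f"matched hour by hour: {hourly_match(consumed, clean):.0%}")
```

With these made-up numbers, the purchases cover 100% of consumption by volume but only 50% when each block has to be covered by clean energy generated in that same block. That gap is what a 24/7 goal is trying to close.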
00:21:50.100 Now that we understand the realities of two very different cloud providers and how they're powering their data centers, how will energy usage of data centers scale into the future? Research papers have consistently found that global data traffic is growing quickly, leading to an inevitable rise in data center usage. This will demand more and more energy.
00:22:41.520 The estimates suggest that by the end of this decade, the IT industry could use between 8% and 21% of the total global electricity demand. Given that we're an industry that values efficiency and scalability, we must collectively start thinking about this problem.
00:23:16.140 The good news is that many of us are already working on solutions to this inevitability from a hardware perspective. Some cloud providers are constructing new data centers in colder climates where energy required to cool their servers is minimized.
00:23:35.520 Other cloud providers are recovering wasted heat from servers and repurposing it. For example, one data center in Sweden recovered wasted heat from their servers and used it to heat ten thousand apartments in Stockholm. Additionally, companies like Stripe and Basecamp have committed to becoming fully carbon neutral or carbon negative, which is exciting to see.
00:24:09.780 Each of us has agency in solving this problem. What can we do to mitigate the environmental impact of the data we create and store? First and foremost, we can find out where our data resides and whether it is stored in regions that are green, carbon neutral, and powered by renewable energy.
00:25:02.460 If your data isn't in a green region, it’s worth your time to determine what it might take to migrate that data to a different location or provider that is carbon neutral. If you're fortunate enough to be starting something fresh and need to provision new servers or databases, you now have the knowledge to select a clean cloud provider right from the start.
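One way to start that audit is to list what you run, note the region each resource lives in, and flag anything outside the regions your provider reports as green. The sketch below assumes a hand-written inventory and a placeholder set of green regions; the region and resource names are hypothetical, and the real list has to come from your provider's own published information.

```python
from collections import defaultdict

# Placeholder set: fill this in from your provider's own published list.
GREEN_REGIONS = {"region-a", "region-b"}

# Hypothetical inventory of (resource name, region) pairs.
inventory = [
    ("web-app", "region-a"),
    ("primary-db", "region-c"),
    ("image-bucket", "region-c"),
    ("worker-queue", "region-b"),
]


def resources_outside_green_regions(resources, green_regions):
    """Group resources by region, keeping only regions not in the green set."""
    flagged = defaultdict(list)
    for name, region in resources:
        if region not in green_regions:
            flagged[region].append(name)
    return dict(flagged)


for region, names in resources_outside_green_regions(inventory, GREEN_REGIONS).items():
    print(f"{region}: consider migrating {', '.join(names)}")
```

Even a list this simple gives a team something specific to discuss: which resources sit in which regions, and which ones would be candidates to move.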
00:25:40.020 Another step we can take is to draw attention to this issue and discuss it with our teams. If you're at a smaller company, this might mean talking about cloud provider choices internally. However, at larger companies, especially those with substantial enterprise accounts, you can pressure your cloud provider to be more transparent about the energy sources powering their data centers.
00:26:12.180 If they're not transparent, encourage them to clarify their data and make it easier for you as a customer to make green and sustainable choices. Collectively, we can hold these companies accountable. It's estimated that 53% of all servers will be located at hyperscale cloud data centers by the end of 2021. This means more than half of all functioning servers that are owned and operated are likely provisioned by AWS, Google, or Microsoft.
00:26:57.420 We need to focus on compelling them to commit to sustainable practices and follow through on their promises to mitigate the eventual growth that will happen. Thankfully, some providers are already taking steps in that direction. If you work for a company providing cloud services and have the power to do so, you can take it a step further by holding your company accountable from the inside.
00:27:47.880 There are some active employees at Amazon pushing for these changes, which is encouraging to see. You could build something to make it easy to find information about this topic. Accurate statistics on energy consumption can be hard to find, and sometimes it's challenging to access data about these cloud providers. It is vital we make this data accessible so everyone can be thoughtful when making impactful choices.
00:28:34.260 A great example is a tool called the Cloud Sustainability Console, which is a Chrome extension highlighting which regions in AWS are carbon neutral. When you log into your AWS account, this extension lets you know whether you’re operating in a carbon neutral zone.
00:29:12.900 At the end of the day, we could all stand to be more aware and thoughtful about the computational resources we rely on in our daily lives as developers. This awareness extends beyond data centers to the websites we build, the data we fetch, and the content we store, all integral to our jobs. Sometimes we do this without a second thought, but the reality is that all this content and data requires energy in some form.
00:29:44.880 It's crucial to be mindful of that, especially as creators of these technologies and tools. I recently discovered a wonderful repository on GitHub called the Low Impact Manifesto, created by a Danish company called Organic Basics. It contains ten rules on creating a low-impact website and is open-sourced for us to use. Reflecting on every project and website I've built has made me think deeply about the data, resources, fonts, images, and requests I've made without considering the computational costs.
00:30:42.420 Another excellent resource is the Sustainable Web Manifesto, which lists solid principles that can guide us in creating a more sustainable internet. If you take away nothing else from this talk, I hope you will all become aware of the physical impacts of our actions, even when they may not be immediately visible. It's essential to recognize the impact our industry makes on both people and the planet as a whole.
00:31:43.980 Our industry relies on finite resources, even if we can't see them or don't often think about them. It's our responsibility to take that crucial first step to learn about these issues and educate ourselves and others. As I've learned more about data storage, I've realized that most of us don’t even know what the long-term costs of storing this data will be or the ramifications of our actions. Nobody is sure how much impact this will have five or ten years down the line.
00:32:03.720 Despite this uncertainty, I remain optimistic. While it might seem that our industry only contributes to this overwhelming problem, I believe we can choose to change the narrative. Historically, technology has pushed societal progress, and we can lead by example.
00:32:15.780 We are uniquely positioned since we are not only consumers of technology but also creators. This gives us the power to influence and guide our industry toward sustainability. If we are more aware and collectively commit to guiding our field towards a sustainable future, I am confident we can make a significant difference.
00:32:29.880 If you're interested in learning more about data centers and green energy, or if you're looking for statistics or articles, please check out costofdata.dev. There’s a wealth of information available, and I couldn’t cover everything in this talk, so I’ve compiled it for you to explore. Thank you so much for taking the time to listen, and I hope you learned something.