Summarized using AI

The Cost of Data

Vaidehi Joshi • February 20, 2020 • Earth • Talk

In her talk titled "The Cost of Data" delivered at RubyConf AU 2020, Vaidehi Joshi discusses the often overlooked realities and responsibilities surrounding data storage in the digital age. She highlights the physical infrastructure of the internet, specifically data centers and cloud services, underscoring the environmental impact associated with their energy consumption. Key topics include:

  • The Evolution of Data Storage: Vaidehi outlines the transition from personal server rooms to professional data centers and cloud services, emphasizing the efficiency and convenience of modern cloud providers like AWS, Microsoft Azure, and Google Cloud.
  • Data Center Energy Consumption: She reveals that data centers consume approximately 200 terawatt-hours of energy annually, 1-3% of the world's electricity usage, and that the information and communication technology sector as a whole has roughly the same carbon footprint as the airline industry.
  • Environmental Responsibility: The talk explores the implications of this consumption, noting that many data centers rely on power grids fed by fossil fuels. Vaidehi contrasts the varying sustainability commitments among cloud providers, such as AWS's goal for 100% renewable energy without a specified timeline versus Google's proactive approach of purchasing matching renewable energy for every kilowatt-hour consumed.
  • Future Predictions: She discusses the projected growth in global data traffic and its impact on energy demand: in an expected-case projection, the tech sector could account for about 21% of total global electricity demand by 2030.
  • Call to Action: Vaidehi encourages developers and companies to become aware of where their data is stored and to advocate for greener options. She references tools like the Green Web Foundation and community initiatives like the Sustainable Servers petition, urging individuals to pressure cloud providers for more transparency and sustainability practices.

The conclusion emphasizes the industry's potential to reduce its environmental impact through awareness and collective action, highlighting the privilege and responsibility technologists hold in shaping a more sustainable future.

Vaidehi Joshi

The internet grows every day. Every second, one of us is making calls to an API, uploading images, and streaming the latest content. But what is the cost of this—is it free? This talk explores the reality of data, and what responsibilities we have as creators and consumers of tech.

Vaidehi is a senior engineer at DEV, where she builds community and helps improve the software careers of millions. She enjoys building and breaking code, but loves creating empathetic engineering teams a whole lot more. She is the creator of basecs and baseds, two writing series exploring the fundamentals of computer science and distributed systems. She also co-hosts the Base.cs Podcast, and is a producer of the BaseCS and Byte Sized video series.

Produced by NDV: https://youtube.com/channel/UCQ7dFBzZGlBvtU2hCecsBBg?sub_confirmation=1

#ruby #rubyconf #rubyconfau #programming

Thu Feb 20 09:30:00 2020 at Plenary Room

RubyConf AU 2020

00:00:00 Oh great, everything's just working! I love it. Hi everyone! Oh, I'm sorry they put me first, and so therefore it is my moral responsibility to get you all excited for these two days.
00:00:09 So let's try that again, hi everyone! That's much better.
00:00:15 My name is Vaidehi Joshi, and I'm really excited to be here. This is my first time at RubyConf Australia and my first time in Australia.
00:00:22 So thank you! I will take all cheers and applause for the next 35 minutes; they're greatly appreciated.
00:00:29 This is a brand new talk, so it's always a little nerve-wracking. But I love Ruby conferences, especially single-track ones.
00:00:42 So I'm hoping that you all will like this. A little bit about me: as my very kind MCs talked about, I work at DEV, where we create great content for developers.
00:00:57 We build a community for software developers, so if you're interested in some really cool technical and non-technical content, you should check us out.
00:01:10 We're also hiring, so you should come chat with me afterwards if you're interested. I have some DEV stickers.
00:01:15 That's another reason to come talk to me afterwards—if you like stickers!
00:01:26 I also do a couple of side projects in my free time outside of work. I won't go through all of them—maybe you might have heard of some of them? I have stickers for these too!
00:01:38 So you should definitely come talk to me and get stickers for those. I basically just want to make friends, and that's why I bring stickers to conferences.
00:01:44 I would love to talk to everyone.
00:01:50 Outside of work and these side projects, what I really love doing is learning how stuff works.
00:01:59 Specifically, I like digging into why things are the way they are.
00:02:06 The thing that I've been diving into recently is the internet. I know, I like to pick really small topics that are not big and expansive at all.
00:02:18 But I've been learning about what the internet is built on, and it's built on a lot of different things.
00:02:26 As software developers, we don't always think about the physical aspects of what the internet is powered on.
00:02:34 Sometimes, all the physical aspects of it are out of sight, out of mind, because our jobs can feel really intangible.
00:02:40 But those physical things do exist: there are fiber cables under the ocean, there are cell towers in very remote locations.
00:02:53 They exist even if we don't think about them all the time.
00:03:01 Recently, I’ve been thinking about the physical aspects of all the data that we store on the internet.
00:03:18 Everything we put on the web is stored somewhere, but how many of us actually think about this?
00:03:25 I guess most of us don’t. I certainly didn’t until recently.
00:03:34 I want to tell you a little bit about what's going on in that side of the tech sector because it's fascinating and eye-opening.
00:03:40 All the data we create every time we make an API call, save to a database, or update some records is stored somewhere on the planet.
00:03:47 It used to be that you could store your data anywhere, like in a server room.
00:03:55 People who are veterans in the industry might remember server rooms; they still exist in a few places.
00:04:08 You used to be able to build your own server racks and put them in a room, having your own localized set of servers.
00:04:20 But the reality is that those are really hard to maintain; they're not easy to build either.
00:04:26 That's where data centers came about; they became a solution for this problem.
00:04:41 We can think of data centers as a large server room—a big building.
00:04:46 The sole purpose of this building is to house servers and all the hardware associated with them.
00:04:58 Some companies choose to build their own data centers because they get more control and security.
00:05:11 However, that can also be expensive and not feasible for everyone.
00:05:22 So there's another option: cloud services or cloud providers.
00:05:29 These are companies that rent out their data centers and the space inside them to other entities.
00:05:40 You probably know some of the larger ones: the largest by size is AWS, followed by Microsoft Azure and Google Cloud Platform.
00:05:54 Then there's Alibaba, which I think is the fourth largest in size.
00:06:07 More and more companies and individuals are switching to using cloud providers.
00:06:13 They are storing their content in data centers hosted by these services.
00:06:20 When you do this, you get a couple of benefits: you don't need to maintain that hardware yourself.
00:06:32 You also don’t need to think too much about the network, and you get better hardware.
00:06:40 Some security benefits come in the form of distributed databases and backups.
00:06:52 Hosting with a cloud provider can also be more efficient. The 2014 Data Center Efficiency Assessment found that typical cloud providers achieved about 65% utilization of their servers.
00:07:07 This was compared to 15% utilization of on-site data center servers.
00:07:14 On these numbers alone, the switch looks efficient: when a company moves from hosting its own servers to servers provisioned by a cloud provider, it needs only about a quarter as many servers in total.
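Here is a rough worked example of where the "about four times" figure comes from. If the same workload runs on servers at 15% average utilization instead of 65%, the number of servers it needs scales with 1/utilization. The sketch assumes an idealized, perfectly divisible workload. The 100-server workload is a hypothetical figure, and the sketch ignores redundancy and peak headroom, so treat it as back-of-the-envelope arithmetic rather than a capacity plan.

```python
def servers_needed(workload: float, utilization: float) -> float:
    """Servers required to carry `workload` (measured in fully busy servers)
    when each server runs at the given average utilization (0 to 1)."""
    return workload / utilization


# Utilization figures quoted in the talk (2014 Data Center Efficiency Assessment).
ON_PREM_UTILIZATION = 0.15
CLOUD_UTILIZATION = 0.65

# Hypothetical workload: the equivalent of 100 servers running flat out.
workload = 100.0

on_prem = servers_needed(workload, ON_PREM_UTILIZATION)
cloud = servers_needed(workload, CLOUD_UTILIZATION)
ratio = on_prem / cloud  # equals 0.65 / 0.15, independent of the workload size

print(f"on-prem: {on_prem:.0f} servers, cloud: {cloud:.0f} servers, ratio: {ratio:.1f}x")
# prints: on-prem: 667 servers, cloud: 154 servers, ratio: 4.3x
```

The ratio depends only on the two utilization rates, so the workload size cancels out. Any real migration will also vary with how much spare capacity each side keeps for failover.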
00:07:23 Data suggests that people are migrating in that direction as well.
00:07:30 A Cisco report found that in 2016, 88% of data center traffic was flowing through the cloud.
00:07:38 Next year, they estimate that 96% of all data center traffic is going to go through the cloud.
00:07:44 Most of us are probably doing exactly this too, whether we know it or not.
00:07:55 In fact, in my six years of software development, every company that I have worked with has stored their data on AWS or has used Heroku, which uses AWS.
00:08:06 I never really thought about what that meant for the projects I was working on and all the data we were collecting and storing.
00:08:14 Based on the data I've shared, it seems efficient to do that, right? It seems good, better than building your own data center.
00:08:22 But the reality is that it's a bit more complicated.
00:08:31 When you start looking at what's going on at these data centers, you might have a different perspective.
00:08:41 When you put your data in a cloud provider's data center, they make you sign an SLA, which is short for Service Level Agreement.
00:08:54 What this SLA says is that the cloud provider guarantees your service will stay up and running.
00:09:02 It's sort of like a lease agreement where they're the landlord.
00:09:08 One of the things this SLA often states is that the cloud provider promises little to no downtime. But how do you actually do that?
00:09:20 What's the reality of promising no downtime?
00:09:29 Data centers must keep their operations reliable and plan for redundancy, so that your app, and every other client of that cloud provider, keeps running even if a data center goes down.
00:09:41 They also have to ensure, at a very baseline level, that the lights stay on. You need the lights to keep the servers on.
00:09:50 Servers consume a lot of energy and give off a lot of heat, and if they overheat, they fail.
00:09:58 So, not only do these data centers need to keep the servers running, they also have to cool these servers.
00:10:07 They do that by removing the hot air in a data center and supplying cold air.
00:10:14 This also takes energy; it’s not just about making sure the lights stay on and the servers are working.
00:10:21 It’s also about making sure that the servers stay at a certain temperature. Now, in the grand scheme of things, what does powering all of these data centers actually look like?
00:10:36 Before I go any further, I want to preface this with a large caveat: it is very hard to find data on this topic.
00:10:43 So I’ve done my best to show you a range of numbers. I've picked the most conservative data and estimates I can find.
00:10:54 But it's just hard to find consistent statistics, so keep that in mind.
00:11:05 Data centers worldwide consume approximately 200 terawatt-hours of energy a year.
00:11:12 For context, one terawatt-hour is the energy used by drawing a trillion watts of power for one hour.
00:11:19 When I read this, I thought, 'That means nothing to me!'
00:11:26 To give you a little more perspective, 200 terawatt-hours is more than some entire countries consume in a year.
00:11:38 The country of Iran uses less than this, and the United Kingdom also uses much less.
00:11:45 Scotland, which has a population of roughly 5 million, uses 25 terawatt-hours nationally in a year.
00:11:52 Data centers worldwide are using over 200 terawatt-hours. This means that they demand somewhere between 1% and 3% of the world's electricity.
00:12:08 That is not nothing. Our entire sector, the information and communication technology sector, accounts for 2% of global greenhouse gas emissions.
00:12:15 Another way of thinking about this is that our sector has the same carbon footprint as the airline industry.
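The units here are easy to mix up. A terawatt is a rate of power, while a terawatt-hour is an amount of energy. The sketch below converts the talk's 200 TWh per year into an average continuous power draw and compares it with the Scotland figure quoted above. It assumes consumption is spread evenly over a 365-day year, which real data centers only approximate.

```python
# Annual electricity use quoted in the talk, in terawatt-hours (TWh).
DATA_CENTERS_TWH_PER_YEAR = 200
SCOTLAND_TWH_PER_YEAR = 25

WH_PER_TWH = 10**12          # "tera" means 10^12
HOURS_PER_YEAR = 365 * 24    # 8,760 hours, ignoring leap years


def average_power_gw(twh_per_year: float) -> float:
    """Average continuous power, in gigawatts, implied by a yearly energy total,
    assuming the energy is drawn evenly across the whole year."""
    watts = twh_per_year * WH_PER_TWH / HOURS_PER_YEAR
    return watts / 1e9


power = average_power_gw(DATA_CENTERS_TWH_PER_YEAR)
scotlands = DATA_CENTERS_TWH_PER_YEAR / SCOTLAND_TWH_PER_YEAR

print(f"about {power:.1f} GW running nonstop, or {scotlands:.0f}x Scotland's annual use")
# prints: about 22.8 GW running nonstop, or 8x Scotland's annual use
```

Reading the figure as 200 terawatts (a power) rather than 200 terawatt-hours (an energy) would overstate demand by a factor of 8,760, the number of hours in a year.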
00:12:22 Now, looking at these numbers, you might have thought the same thing that I had: why are we causing such a large footprint?
00:12:29 What is causing this to happen?
00:12:36 As far as data centers go, the answer has to do with where they get their energy.
00:12:42 The electricity that cloud providers use to run those data centers, which they promise to keep operating under their SLAs, can come from many different places.
00:12:50 It really depends on the energy flowing into the power grid.
00:13:01 Unfortunately, many data centers rely on power grids that use fossil fuels that cause greenhouse gas emissions.
00:13:08 Some cloud providers are more self-aware than others and are making strides to change this, while others are not doing as well.
00:13:16 There's this great white paper written by Paul Johnson and Ann Curry that goes into detail about six cloud providers and how they are doing.
00:13:31 I encourage you to look at this white paper in depth later, but I want to share a few highlights.
00:13:45 The first one we'll discuss is AWS, the largest cloud provider. They have made a commitment to go 100% renewable energy.
00:13:54 However, they haven't set a date for this goal, so your guess is as good as mine as to when that will happen.
00:14:04 AWS has many different zones where you can store your data, but only five of them are carbon neutral.
00:14:12 This means that if you store your data in a carbon neutral zone, AWS will offset your server emissions.
00:14:19 They do this by buying a carbon offset, which means that they're purchasing renewable energy from somewhere else and putting it back into the electrical grid.
00:14:30 However, this doesn’t change the fact that they’re still causing greenhouse gas emissions.
00:14:36 You can think of purchasing an offset as a way to balance out the emissions you're putting out into the atmosphere, but it’s not a great long-term solution.
00:14:51 Importantly, AWS is not very transparent about what they are doing.
00:14:58 They don’t publicly report data on their current energy use or the rate at which they’re growing.
00:15:07 This lack of information makes it hard to know if the renewable energy they’re purchasing as carbon offsets is truly offsetting the quantity of energy they are using.
00:15:19 If you look at their regions that are not carbon neutral, such as US East 1, it only gets worse.
00:15:28 This particular region is notorious for outages, often caused by events at one of its data centers.
00:15:35 Many companies use data centers in this region as their primary storage.
00:15:43 The problem with US East 1 is that the power supplier in Virginia has explicitly chosen not to invest in renewable energy, instead doubling down on fossil fuels.
00:15:58 As a result, all the data centers in this region contribute to greenhouse gas emissions.
00:16:08 Since 2017, AWS has increased its operations in this region by 59%.
00:16:17 So I'm honestly not sure how they'll hit that 100% goal, which might be why they didn’t set a date.
00:16:25 Now let’s switch to Microsoft.
00:16:31 Microsoft's cloud platform, Azure, is trying a different approach.
00:16:39 Internally, Microsoft uses something called a carbon fee model.
00:16:46 This means that every single business unit inside Microsoft gets charged a fee based on the carbon emissions of their operations.
00:16:55 This includes data centers, which is a cool initiative.
00:17:01 Microsoft has invested in renewable energy and sustainable energy projects since 2014.
00:17:08 They’ve been matching 100% of their energy consumption with renewable energy.
00:17:14 Unlike AWS, they don’t buy carbon offsets; they buy renewable energy certificates.
00:17:21 In my opinion, this isn't as good as a carbon offset.
00:17:28 While a carbon offset represents a reduction in greenhouse gas emissions, a renewable energy certificate just signifies a quantity of renewable energy purchased.
00:17:36 Microsoft announced a plan for 60% renewable energy powering their data centers by the end of 2020.
00:17:44 However, they haven’t been entirely transparent about how they will achieve this.
00:17:51 Despite this promise, it's still not clear which regions in Azure are powered by renewable energy versus which are offset by renewable energy certificates.
00:18:05 Now, Google's situation is a bit different.
00:18:12 Google Cloud Platform buys carbon offsets for all its servers, not just those in specific regions.
00:18:20 In fact, Alphabet is the largest corporate buyer of renewable energy.
00:18:33 Google has faced challenges getting 100% renewable energy for its servers, but they purchase a matching kilowatt hour of clean energy for every kilowatt hour they consume.
00:18:41 This means that, even though Google Cloud contributes to carbon emissions, they offset everything added to the atmosphere.
00:18:54 Google is leading among these four cloud providers. They are doing much better than others.
00:19:03 Microsoft is the only other cloud provider that has actually met the 100% sustainability goal.
00:19:11 However, I can't talk about what's going well without mentioning the challenges, especially those faced by providers based in China.
00:19:20 Last month, Greenpeace released a report tracking the state of renewable energy in China's tech industry.
00:19:32 Researchers estimate that between 2019 and 2023, the data center industry in China will increase by up to 66%.
00:19:44 This is important because most data centers in China are currently powered by coal.
00:19:51 In that report, 50% of the companies analyzed claimed they purchased renewable energy or relocated closer to renewable energy sources.
00:20:00 One of those companies is Alibaba, but they have made very small steps towards purchasing renewables.
00:20:10 They have not been public about their data; it is difficult to find any information on their greenhouse gas emissions.
00:20:18 Without going into more detail, we now know a little about the reality of data centers.
00:20:29 The question now is how this will scale into the future. Are we doing enough?
00:20:39 Talking about the future of data is challenging because there is disagreement around predictions.
00:20:48 Researchers disagree about forecasts, but here are some things to consider.
00:20:55 One side effect of using a cloud provider is the ease of increasing usage.
00:21:02 It is easy to provision a new server with an uptick in data or traffic.
00:21:10 We can imagine how that might cause things to scale and impact data center traffic over time.
00:21:18 Another reality is that global data traffic is growing quickly.
00:21:28 According to a Cisco report published in the Journal of Industrial Ecology, global data center traffic increased fivefold between 2010 and 2015.
00:21:41 Researchers estimated approximately 28 billion devices are going to be connected to the internet in 2020.
00:21:53 The International Energy Agency found that in 2017, there were 18.4 billion IoT devices connected.
00:22:01 They estimate that number will grow to 20 billion this year.
00:22:09 The same report found that total global traffic in 2017 would likely surpass 1 zettabyte, with estimates that by 2022, it would hit 4.2 zettabytes.
00:22:20 After zettabytes come yottabytes, which is ten to the power of 24 bytes. My brain cannot fathom what that is, but it's cool!
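To put these growth figures in annual terms, here is a compound annual growth rate (CAGR) sketch. Growing from a start value to an end value over n years implies a yearly rate of (end/start)^(1/n) - 1. The inputs are the figures quoted in the talk: about 1 ZB in 2017 growing to a projected 4.2 ZB in 2022, and a fivefold rise in data center traffic from 2010 to 2015. Smooth compounding is an assumption, since real traffic does not grow evenly year to year.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by going from `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# ~1 ZB of global traffic in 2017, projected ~4.2 ZB in 2022 (figures from the talk).
traffic_rate = cagr(1.0, 4.2, 2022 - 2017)

# Data center traffic grew fivefold between 2010 and 2015 (figure from the talk).
dc_rate = cagr(1.0, 5.0, 2015 - 2010)

# Sanity check: compounding the yearly rate over five years recovers the 4.2x total.
assert abs((1 + traffic_rate) ** 5 - 4.2) < 1e-9

print(f"global traffic: {traffic_rate:.0%}/year, data center traffic: {dc_rate:.0%}/year")
# prints: global traffic: 33%/year, data center traffic: 38%/year
```

At roughly 33% a year, traffic doubles about every two and a half years.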
00:22:31 Technology is going to continue demanding more and more energy.
00:22:42 This graph shows an expected-case projection.
00:22:50 In the expected case, our sector will demand approximately 21% of the total global electricity demand by 2030.
00:23:02 Given that we're an industry focused on efficiency and scalability, we should think about this issue.
00:23:08 How many of us have actually thought about this?
00:23:15 On a positive note, some are considering this problem.
00:23:23 More companies are investing in new data centers in colder climates to reduce the energy needed for cooling.
00:23:31 In 2018, DigiPlex, based in Sweden, used an interesting cooling technique.
00:23:39 They recovered waste heat from servers to heat over 10,000 apartments in Stockholm.
00:23:46 So the waste heat wasn’t wasted after all!
00:23:53 A few years ago, Oregon Health & Science University built a data center covered by a geodesic dome.
00:24:01 It used natural convection to eliminate cooling equipment, saving on electricity costs.
00:24:08 Stripe, which doesn’t have its own data centers and uses AWS, has committed to carbon neutrality.
00:24:17 They bought enough carbon offsets to reach net zero emissions by 2017.
00:24:27 The community is also raising awareness about this topic.
00:24:34 The authors of the white paper created a petition called the Sustainable Servers by 2028 petition, accumulating many signatures.
00:24:48 They plan to lobby cloud providers to invest in sustainable energy sources.
00:24:54 Each of us has a role in solving this problem too.
00:25:05 One of the biggest issues with storing cloud data is determining where it lives and if that area is clean.
00:25:15 First, figure out where your data lives.
00:25:24 Is it stored in a region that's carbon neutral, or somewhere causing greenhouse gas emissions?
00:25:32 Finding that information can be quite difficult.
00:25:42 The Green Web Foundation is a great resource for answering questions about your projects and any app or website you use.
00:25:52 Another step we can take is migrating data to a greener region.
00:26:01 Admittedly, if you've done a data migration, you know it’s a big ask, but it’s doable.
00:26:11 If you’re lucky enough to start fresh, provision servers in a carbon-neutral location from the start.
00:26:20 A great step is drawing attention to this issue. At my last company, I discovered we were storing data in US East 1.
00:26:31 I was horrified, so I brought it up to the team.
00:26:39 We began discussing what it would take to migrate out of US East 1.
00:26:46 Hopefully, they will go ahead and do that; it would be awesome!
00:26:54 At small companies, it might just mean discussing the issue internally.
00:27:02 But at larger companies, especially those with enterprise accounts, you can pressure your cloud provider.
00:27:08 Ask them to be transparent about where they're storing data and how they're sourcing their energy.
00:27:16 If you host with AWS, discuss it with your account manager and demand more renewable options.
00:27:25 Account managers really listen to their customers.
00:27:35 If you work for a cloud services provider, you can do even more.
00:27:42 Amazing employees at Microsoft and Amazon are already doing this.
00:27:49 If you want to, you can also create solutions for making this data more accessible and transparent.
00:27:58 A great example is the Cloud Sustainability Console, a Chrome extension highlighting the green regions in the AWS dashboard.
00:28:08 This is more than what Amazon is currently doing!
00:28:17 Super helpful and informative!
00:28:25 If you take nothing else from this talk, I hope you gain awareness of the physical implications of our work every day.
00:28:35 I don't think there are any easy answers here.
00:28:41 But I believe that, given the privilege we have in this industry, we have a responsibility to take the first step.
00:28:49 I want to thank those who made me aware of this problem.
00:28:56 Thank you to Denise who tweeted an amazing infographic about data centers.
00:29:05 Also to Paul Johnson and Ann Curry for their research in that white paper.
00:29:12 Hearing a talk on climate change is kind of a downer, but now you know why I was so excited at the beginning.
00:29:20 We're using a finite resource, whether we realize it or not.
00:29:30 As I learned more about this topic, it became clear that we may not understand the true cost of storing data.
00:29:38 There’s a sticker price we don't know about yet.
00:29:47 Though I spent a lot of time learning about this, I don’t feel entirely disheartened.
00:29:54 While it might seem we’re contributing to the problem, there’s another perspective.
00:30:05 Our industry has pushed things forward historically, even if sometimes we mess up.
00:30:14 We’ve set good examples for other industries facing the same challenges.
00:30:22 I’m actually optimistic that we can set a great example again.
00:30:29 I see people in our industry working to change the status quo, designing better solutions.
00:30:35 None of us in this room are just consumers of technology; we're creators, which means we have a voice.
00:30:42 The only question that remains is whether you will use it.
00:30:54 Thank you.