Rails Security at Scale

by Jack McCracken

In the 'Rails Security at Scale' presentation held at RailsConf 2019, Jack McCracken from Shopify discusses the security measures implemented in one of the largest Rails applications. With Shopify experiencing significant growth, managing security for over 800,000 merchants has become increasingly challenging. McCracken emphasizes the necessity of embedding security culture within the organization, where every developer understands the implications of their changes on users. Key points of the presentation include:

Scaling Security: With a growing number of developers, it’s impractical for a small security team to oversee all code changes. Developers must be made aware of their responsibilities in safeguarding the platform.
Tool Development: McCracken highlights the importance of implementing tools like RuboCop and custom bots (e.g., Caution Tape Bot) to automatically flag potential security issues in code.
Safe Defaults: Ensuring that security measures are 'safe by default' is crucial. An example includes modifying the usage of html_safe to prevent cross-site scripting vulnerabilities.
Error Handling: Instead of punitive measures for mistakes, a supportive environment should encourage learning from mistakes without compromising security. Implementing features like Firewald helps manage permissions effectively.
Bug Bounty Programs: These programs incentivize security researchers to discover vulnerabilities. McCracken shares examples of critical vulnerabilities uncovered through their bug bounty program that saved Shopify significant potential losses.
Fun and Engaging Training: To cultivate a security-conscious culture, Shopify organizes engaging training events (like hackfests) where developers can learn through hands-on experiences.

The main takeaways encourage treating security as a shared responsibility among all developers, the importance of creating tools that support secure coding practices, and fostering an enjoyable learning environment to improve security awareness. McCracken concludes by inviting curious individuals to apply for open positions in Shopify’s application security team.

00:00:21.840 All right, hi everyone. I'm Jack McCracken and I work at Shopify.

00:00:26.960 Today, we're going to talk a little bit about how we manage security in one of the largest Rails applications in the world.

00:00:32.640 First off, a little bit about me. They took this photo with a low-resolution camera, so I'm sorry about that.

00:00:40.239 However, I love puzzles, I love crosswords, and I'm really big into baking. These are just a couple of things that get my mind off software.

00:00:46.879 Most importantly, I love security. I love making sure that every single one of Shopify's merchants is secure and does not need to worry about their platform security.

00:00:52.480 They can just focus on what makes them awesome. So, first off, what is Shopify? Shopify is a multi-channel e-commerce platform, which is a fancy way of saying that whenever you want to sell something, Shopify will let you sell it wherever.

00:01:07.439 If your customers are on Instagram, we sell on Instagram. If your customers are on Facebook, we sell on Facebook. Now, let's discuss some context for what we're dealing with.

00:01:19.040 In 2017, the main Shopify application deployed about 40 changes to production per day, and we had around 1,900 employees. At that time, we supported 375,000 merchants and handled 80,000 peak requests per second.

00:01:33.759 So, the joke at Shopify is that we double every year; however, I'm going to show you that we've actually slightly outpaced that. Just a year and a half later, we now deploy 150 changes to the core product per day and have 4,000 employees, 1,500 of whom work solely in research and development.

00:01:48.240 We also now support 800,000 merchants and handle 170,000 peak requests per second. What this means for me is that our two main customer bases, as a security team, are the employees who need to implement safe changes and the merchants who need to use those safe changes.

00:02:13.920 Both groups have doubled or tripled in the past year and a half, which leaves me feeling a little overwhelmed. I joined in 2017 when those earlier numbers were accurate, and at that point, I felt scared.

00:02:25.280 There were a thousand people working on this product, and I wondered how 10 people could secure it. How could we ensure that every merchant using Shopify had a good experience and didn’t feel insecure or suffer from a data breach?

00:02:48.240 It turns out, it's simply impossible for 10 people to police the actions of 1,000 people. You need to build security into your culture and make every developer aware that their changes impact 800,000 merchants or a million users.

00:03:06.319 These are real people, and it's critical that those values are embedded in your culture to ensure that they remain top of mind.

00:03:26.000 Another thing we heavily rely on is automation through the use of bots. I like to break down what we do into three categories to effectively manage security.

00:03:50.240 The first category is making things safe by default. If something looks right, it should indeed be right. If it seems secure, it should be secure. And if something looks scary, it should be treated as scary.

00:04:04.640 The second category is making messing up not the end of the world. Nobody wants to be that developer who feels shamed by a security person for making a mistake.

00:04:10.080 That discourages improvement and can even drive people away. Lastly, we aim to make security cool. No one wants to deal with that 'security guy' yelling at them.

00:04:22.160 People want security features in their products; they care about the overall product. So why not enable them to make their products more secure?

00:04:34.880 Starting with making things safe by default, our goal is to ensure that if someone is doing something, it is inherently safe. If it's obvious that something is safe, it should be.

00:04:50.639 If it's not, then there’s a discrepancy that needs addressing.

00:05:03.199 To keep things safe by default, you could set up tools like Brakeman in CI, which many people talk about. Brakeman is an excellent tool, but it requires a capable security team to ensure that the findings it surfaces are relevant to developers.

00:05:14.800 If not, it can be ignored easily, as we have experienced in the past, so it’s essential to keep it simple and ensure the changes implemented do not hinder a developer's ability to do their job.

00:05:27.280 The first example I'll discuss is a common vulnerability we encounter: Cross-Site Scripting (XSS). Most of you are likely familiar with it, but let's delve a bit deeper.

00:05:39.039 Consider a standard ERB template where we have an insertion of the parameter name.

00:05:44.639 This template contains a method called 'html_safe.' If we visit the site and enter something like 'jack', we get 'Hello, jack' as expected. However, if we input JavaScript instead, like 'alert(document.cookie)', this poses a severe security risk.

00:06:02.639 Every time I demonstrate this, people react with shock, as if to say, 'Oh no, another security alert!' I like to explain this by noting that if anyone sends you this link and you click on it, they can potentially do anything you can do in that application.

00:06:14.400 In the case of Shopify, it could allow them to add a product, change a password, or anything else. This is critical because, as a large company, we do not want to be a target for such vulnerabilities.

00:06:31.680 When you place that into the DOM, you see 'Hello' followed by a literal script tag, which means you are permitting whoever is inserting code to execute it.

00:06:45.440 The problem here stems from using 'html_safe.' If we remove that, we get 'Hello' followed by various symbols, which may look weird but is much safer than suffering from an XSS vulnerability.

00:06:58.319 Now, about 'html_safe'—I entered security with no prior experience a year and a half ago, thinking it marked strings as escaped. However, it does the exact opposite; the name is misleading and potentially dangerous.

00:07:11.759 What it does is turn the string into a 'safe buffer', which, when inserted into an ERB template, won't get escaped, allowing literal HTML to be included.
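
The behavior described here can be sketched in plain Ruby. This is a simplified model under stated assumptions, not Rails' actual implementation: the names TrustedHtml, mark_html_safe, and render_value are hypothetical, and only ERB::Util.html_escape comes from the standard library.

```ruby
require "erb"

# Hypothetical marker class: a string the author has vouched for as HTML.
class TrustedHtml < String; end

# Hypothetical stand-in for html_safe: it does NOT escape anything,
# it only tags the string as "do not escape me".
def mark_html_safe(string)
  TrustedHtml.new(string)
end

# Hypothetical stand-in for what a template does with <%= value %>:
# tagged strings pass through untouched, everything else is escaped.
def render_value(value)
  value.is_a?(TrustedHtml) ? value.to_s : ERB::Util.html_escape(value)
end

payload = "<script>alert(document.cookie)</script>"

puts render_value(payload)
# prints &lt;script&gt;alert(document.cookie)&lt;/script&gt;

puts render_value(mark_html_safe(payload))
# prints <script>alert(document.cookie)</script>
```

The second call is the dangerous one: marking user input as safe is exactly what lets the script tag reach the page.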

00:07:22.720 So, we've enabled a rule within our code to check for the usage of 'html_safe' and 'raw,' which function similarly, and flag those instances.

00:07:34.400 However, what if you genuinely mean to use 'html_safe'? For example, when rendering Markdown that you are sure is safe?

00:07:50.560 To mitigate this, we renamed the method in our codebase to 'dangerously_output_as_html' to indicate its associated risks.

00:08:07.840 Additionally, we developed the Caution Tape Bot to remind folks whenever 'html_safe' is used in a pull request.

00:08:20.240 It simply suggests reviewing documentation and requesting that the team checks over the code.

00:08:34.720 Implementing this can be done relatively simply with a GitHub action, making it efficient to maintain security hygiene.
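
A minimal sketch of the kind of check such a bot might run over a pull request diff. The talk does not show the Caution Tape Bot's code, so everything here is an assumption: the RISKY_PATTERNS table, the caution_comments name, and the message wording are all hypothetical.

```ruby
# Hypothetical rule table: pattern to look for => reminder to post.
RISKY_PATTERNS = {
  /\.html_safe\b/ => "html_safe disables HTML escaping; please review the XSS docs.",
  /\braw[\s(]/ => "raw disables HTML escaping; please review the XSS docs."
}.freeze

# Returns one reminder per risky call found on lines ADDED by the diff.
# Context lines, removed lines, and the "+++" file header are ignored.
def caution_comments(diff_text)
  comments = []
  diff_text.each_line do |line|
    next unless line.start_with?("+") && !line.start_with?("+++")

    RISKY_PATTERNS.each do |pattern, message|
      comments << "#{message} Found in: #{line[1..-1].strip}" if pattern.match?(line)
    end
  end
  comments
end

diff = <<~DIFF
  +++ b/app/views/hello.html.erb
  -<p>Hello, <%= params[:name] %></p>
  +<p>Hello, <%= params[:name].html_safe %></p>
  +<p><%= raw(@bio) %></p>
DIFF

puts caution_comments(diff)
```

Only looking at added lines keeps the bot quiet about existing code, which matters if the goal is a reminder developers do not learn to ignore.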

00:08:45.600 We've also built an ERB lint application that checks for JavaScript contexts because, in JavaScript, escaping out of that context is not handled automatically, risking arbitrary JavaScript execution.
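
To illustrate why JavaScript contexts need separate handling, here is a small sketch. It assumes a template that writes a value unquoted into a script block (for example `var id = <%= value %>;`); the js_literal helper is hypothetical and is not the API of Shopify's linter.

```ruby
require "erb"
require "json"

# Hypothetical helper: serialize a Ruby value as a JavaScript literal.
# JSON quoting keeps the value inside one string literal, and the extra
# escaping stops "</script>" and line separators from ending the block.
def js_literal(value)
  JSON.generate(value).gsub(/[<>&\u2028\u2029]/) { |ch| format("\\u%04x", ch.ord) }
end

user_input = "1; alert(document.cookie)"

# HTML escaping has nothing to escape here, so the attacker's statement
# would still run if this were written into a script block.
puts ERB::Util.html_escape(user_input)
# prints 1; alert(document.cookie)

# Serialized as a JavaScript string, the input is only data.
puts js_literal(user_input)
# prints "1; alert(document.cookie)"
```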

00:09:03.679 If you add its findings to the development process, you will see a marked improvement in security practices. You can find the link to that resource in these slides, which I will share on Twitter.

00:09:24.000 Moving on to the next significant issue—Cross-Site Request Forgery (CSRF)—this occurs when an attacker executes an action on behalf of a user without their knowledge by getting them to visit a malicious site.

00:09:50.560 For instance, if 'good.com' allows users to change their account details via a form which accepts tokens, targeting the CSRF vulnerability could let an attacker adjust your credentials without your awareness.

00:10:23.679 Rails handles this by checking whether session tokens align with tokens in forms to prevent unauthorized actions, although some developers may disable this feature, thus introducing security risks.
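
The token check can be sketched as follows. This is a simplified model and not Rails' implementation, which among other things masks the token on each request; the session hash and the csrf_token_for and valid_request? names are assumptions for illustration.

```ruby
require "securerandom"

# Constant-time comparison so the check does not leak how many leading
# characters of a guessed token were correct.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize

  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Hypothetical: issue one random token per session and embed it in forms.
def csrf_token_for(session)
  session[:csrf_token] ||= SecureRandom.base64(32)
end

# Hypothetical: accept a state-changing request only if the submitted
# token matches the one stored in the user's session.
def valid_request?(session, submitted_token)
  expected = session[:csrf_token]
  return false if expected.nil? || submitted_token.nil?

  secure_compare(expected, submitted_token)
end

session = {}
form_token = csrf_token_for(session)

puts valid_request?(session, form_token)     # prints true
puts valid_request?(session, "forged-token") # prints false
```

A malicious site can make the browser send the session cookie, but it cannot read the token embedded in the legitimate form, so the forged request fails the comparison.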

00:10:37.360 To counter this, we created another rule for the Caution Tape Bot, which checks for any instances of 'skip_before_action :verify_authenticity_token' and notifies developers of appropriate usage.

00:10:53.760 The second theme I'd like to address is that messing up should not be the end of the world. It’s essential to approach mistakes positively.

00:11:06.160 A popular saying is that we don’t make mistakes; we make happy accidents. Often, it's about addressing the failure of the system, rather than blaming individuals for errors.

00:11:24.000 We've implemented safeguards that alert developers to mistakes quickly and effectively, minimizing potential data exposure. A frequent issue arises when controlling access to resources.

00:11:45.440 For example, if a user tries to edit a blog post that doesn’t belong to them, that should not succeed. To handle this, we built a gem that checks whether the object being accessed relates to the current user and denies access if it does not.

00:12:03.360 If the logged-in user tries to access data meant for another user, instead of triggering an authorization error, they receive a generalized error message.

00:12:24.160 This approach should protect your application more gracefully while notifying developers of potential security issues needing resolution.
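
A minimal sketch of this pattern, assuming an in-memory list of posts. The talk does not show the gem's API, so Post, POSTS, find_owned_post, and the on_violation hook are all hypothetical names.

```ruby
class RecordNotFound < StandardError; end

Post = Struct.new(:id, :owner_id, :title)

# Hypothetical data store standing in for a database table.
POSTS = [
  Post.new(1, 10, "Alice's post"),
  Post.new(2, 20, "Bob's post")
].freeze

# Returns the post only if it belongs to the current user. A post owned by
# someone else raises the same generic error as a missing post, while the
# on_violation hook lets developers hear about the attempted access.
def find_owned_post(current_user_id, post_id, on_violation: ->(_details) {})
  post = POSTS.find { |candidate| candidate.id == post_id }
  raise RecordNotFound, "Not found" if post.nil?

  unless post.owner_id == current_user_id
    on_violation.call({ user_id: current_user_id, post_id: post_id })
    raise RecordNotFound, "Not found"
  end

  post
end

puts find_owned_post(10, 1).title # prints Alice's post
```

Raising the same error in both cases means the response does not reveal whether another user's record exists.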

00:12:39.760 Next, I want to talk about a tool we internally developed called Watchtower. Though it is not open-source, it's not overly complicated.

00:12:55.040 Watchtower scans a list of domains for vulnerabilities that stem from common applications like GraphQL and Sidekiq.

00:13:05.440 We found that many of these Rails apps often leave unmaintained Sidekiq instances exposed to the public, creating security flaws.

00:13:26.720 To identify these vulnerabilities, I wrote a bash script that systematically checks each service, reporting vulnerabilities found.
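
Watchtower is not open source and the original was a bash script, so the following Ruby sketch only illustrates the general idea. The probed paths, the treatment of a 200 response as "exposed", and the names PROBE_PATHS, http_status, and exposed_endpoints are all assumptions.

```ruby
require "net/http"
require "uri"

# Hypothetical list of paths to probe and what each one usually serves.
PROBE_PATHS = {
  "/sidekiq" => "Sidekiq web UI",
  "/graphql" => "GraphQL endpoint"
}.freeze

# Returns the HTTP status code for a URL, or nil if the request fails.
def http_status(url)
  Net::HTTP.get_response(URI(url)).code.to_i
rescue StandardError
  nil
end

# Reports every probed path that answers 200 without any login step.
# The fetch parameter can be swapped out, for example in tests.
def exposed_endpoints(domains, fetch: method(:http_status))
  domains.flat_map do |domain|
    PROBE_PATHS.filter_map do |path, label|
      url = "https://#{domain}#{path}"
      "#{label} responded 200 at #{url}" if fetch.call(url) == 200
    end
  end
end
```

Calling `exposed_endpoints(["shop.example"])` would make real requests, so a scan like this should only be pointed at domains you own.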

00:13:45.200 The next step to enhancing our security is establishing a bug bounty program. These programs incentivize ethical hackers to find vulnerabilities, effectively bolstering our security.

00:14:01.440 While it isn’t suitable for every company, a bug bounty can be beneficial as it engages good actors against bad actors who are already looking for vulnerabilities.

00:14:22.080 As a good start, create a dedicated security contact email address and provide a page explaining your security policies. You'll find hackers who can help secure your site.

00:14:40.800 Now, I'll discuss a couple of examples from our bug bounty program. The first example is the biggest payout received, a server-side request forgery exploit.

00:15:00.880 It was a subtle exploit where the hacker could redirect requests by manipulating a customization in our internal exchange app.

00:15:18.800 The detailed report they provided saved us a considerable amount of money as it allowed us to mitigate the threat before it could be exploited.

00:15:36.639 The second example involved a vulnerability that allowed a user to confirm an email without proper authorization, enabling them to log into any Shopify store.

00:15:47.360 By acting swiftly, we managed to patch this vulnerability during the holiday season.

00:16:03.440 The takeaway here is that bug bounty programs are instrumental in uncovering vulnerabilities that static analysis tools might miss.

00:16:21.120 As a closing note, it’s impossible for a small team to secure applications when hundreds of developers are actively making changes.

00:16:39.520 It’s vital that every developer understands they impact security for 800,000 merchants, and they are responsible for that.

00:16:57.839 So, how can we update developers on security trends without making it a full-time job? By making it engaging, showing them how to become hackers!

00:17:16.799 We call our security workshops 'Learn to Hack,' and we've had over 300 employees participate. During these sessions, we demonstrate vulnerabilities within mock apps.

00:17:33.599 Workshops encourage interaction and creative problem solving, and we've had incredible success with engagement.

00:17:50.480 Additionally, we initiated a Halloween Hackfest where employees competed to find security vulnerabilities in a fun, cooperative way.

00:18:06.559 This event not only increased security awareness but provided insight into real-world security practices.

00:18:22.639 We subsequently built the Shopify Capture The Flag (CTF) Challenge, which continues to engage employees year-round.

00:18:41.920 This initiative has encouraged employees to work on vulnerabilities and sharpen their security skills even amidst busy schedules.

00:19:00.720 So, before we wrap up, here are a few key takeaways from the lessons learned in our security journey.

00:19:20.240 Great security tooling does not solve the security problems of a company without proper developer input. Developers care about secure products—they really do.

00:19:38.879 Automating security practices in your development workflow is crucial for scaling security efforts. It is not practical to hire security personnel equal to the number of developers.

00:19:58.080 Make learning about security appealing; helping developers understand the risks in a relatable way is far more effective than boring lectures.

00:20:17.440 Ultimately, security is everyone’s responsibility. Taking the initiative to learn about security vulnerabilities will strengthen your team as a whole.

00:20:36.000 Thank you for listening. Since this is a sponsored talk, I’d like to mention that Shopify is hiring in application security, as well as numerous roles for Rails developers.

00:20:54.000 If you’re interested in any of this, feel free to come up to me or raise your hand. We have about ten minutes for questions.
