00:00:21.840
All right, hi everyone. I'm Jack McCracken and I work at Shopify.
00:00:26.960
Today, we're going to talk a little bit about how we manage security in one of the largest Rails applications in the world.
00:00:32.640
First off, a little bit about me. They took this photo with a low-resolution camera, so I'm sorry about that.
00:00:40.239
However, I love puzzles, I love crosswords, and I'm really big into baking. These are just a couple of things that get my mind off software.
00:00:46.879
Most importantly, I love security. I love making sure that every single one of Shopify's merchants is secure and does not need to worry about their platform security.
00:00:52.480
They can just focus on what makes them awesome. So, first off, what is Shopify? Shopify is a multi-channel e-commerce platform, which is a fancy way of saying that whenever you want to sell something, Shopify will let you sell it wherever.
00:01:07.439
If your customers are on Instagram, we sell on Instagram. If your customers are on Facebook, we sell on Facebook. Now, let's discuss some context for what we're dealing with.
00:01:19.040
In 2017, the main Shopify application deployed about 40 changes to production per day, and we had around 1,900 employees. At that time, we supported 375,000 merchants and handled 80,000 peak requests per second.
00:01:33.759
So, the joke at Shopify is that we double every year; however, I'm going to show you that we've actually slightly outpaced that. Just a year and a half later, we now deploy 150 changes to the core product per day and have 4,000 employees, 1,500 of whom work solely in research and development.
00:01:48.240
We also now support 800,000 merchants and handle 170,000 peak requests per second. What this means for me is that our two main customer bases, as a security team, are the employees who need to implement safe changes and the merchants who need to use those safe changes.
00:02:13.920
Both groups have doubled or tripled in the past year and a half, which leaves me feeling a little overwhelmed. I joined in 2017 when those earlier numbers were accurate, and at that point, I felt scared.
00:02:25.280
There were a thousand people working on this product, and I wondered how 10 people could secure it. How could we ensure that every merchant using Shopify had a good experience and didn’t feel insecure or suffer from a data breach?
00:02:48.240
It turns out, it's simply impossible for 10 people to police the actions of 1,000 people. You need to build security into your culture and make every developer aware that their changes impact 800,000 merchants or a million users.
00:03:06.319
These are real people, and it's critical that those values are embedded in your culture to ensure that they remain top of mind.
00:03:26.000
Another thing we heavily rely on is automation through the use of bots. I like to break down what we do into three categories to effectively manage security.
00:03:50.240
The first category is making things safe by default. If something looks right, it should indeed be right. If it seems secure, it should be secure. And if something looks scary, it should be treated as scary.
00:04:04.640
The second category is making messing up not the end of the world. Nobody wants to be that developer who feels shamed by a security person for making a mistake.
00:04:10.080
That discourages improvement and can even drive people away. Lastly, we aim to make security cool. No one wants to deal with that 'security guy' yelling at them.
00:04:22.160
People want security features in their products; they care about the overall product. So why not enable them to make their products more secure?
00:04:34.880
Starting with making things safe by default, our goal is to ensure that if someone is doing something, it is inherently safe. If it's obvious that something is safe, it should be.
00:04:50.639
If it's not, then there’s a discrepancy that needs addressing.
00:05:03.199
To keep things safe by default, you could set up tools like Brakeman in CI, which many people talk about. Brakeman is an excellent tool, but it requires a capable security team to ensure that the changes made are relevant to developers.
00:05:14.800
If not, it can be ignored easily, as we have experienced in the past, so it’s essential to keep it simple and ensure the changes implemented do not hinder a developer's ability to do their job.
00:05:27.280
The first example I'll discuss is a common vulnerability we encounter: Cross-Site Scripting (XSS). Most of you are likely familiar with it, but let's delve a bit deeper.
00:05:39.039
Consider a standard ERB template where we have an insertion of the parameter name.
00:05:44.639
This template contains a method called 'html_safe.' If we visit the site and enter something like 'jack', we get 'Hello, jack' as expected. However, if we input JavaScript instead, like 'alert(document.cookie)', this poses a severe security risk.
00:06:02.639
Every time I demonstrate this, people react with shock, as if to say, 'Oh no, another security alert!' I like to explain it this way: if anyone sends you this link and you click on it, they can potentially do anything you can do in that application.
00:06:14.400
In the case of Shopify, it could allow them to add a product, change a password, or anything else. This is critical because, as a large company, we do not want to be a target for such vulnerabilities.
00:06:31.680
When you place that into the DOM, you see 'Hello' followed by a literal script tag, which means you are permitting whoever is inserting code to execute it.
00:06:45.440
The problem here stems from using 'html_safe.' If we remove that, we get 'Hello' followed by various symbols, which may look weird but is much safer than suffering from an XSS vulnerability.
00:06:58.319
Now, about 'html_safe'—I entered security with no prior experience a year and a half ago, thinking it marked strings as escaped. However, it does the exact opposite; the name is misleading and potentially dangerous.
00:07:11.759
What it does is turn the string into a 'safe buffer', which, when inserted into an ERB template, won't be escaped, allowing literal HTML to be included.
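To make that mechanism concrete, here is a minimal plain-Ruby sketch of the idea. It is not Rails' actual implementation: `SafeString`, `mark_html_safe`, and `render_value` are hypothetical names standing in for the safe buffer, for `html_safe`, and for the escaping a Rails ERB template performs. Only `ERB::Util.html_escape` is a real standard-library call.

```ruby
require "erb"

# Hypothetical stand-in for Rails' safe buffer: a String subclass that
# the renderer trusts and therefore does not escape.
class SafeString < String; end

# Hypothetical helper mirroring what html_safe does: it only marks the
# string as trusted, it does not clean it in any way.
def mark_html_safe(value)
  SafeString.new(value)
end

# Hypothetical renderer: escape everything unless it was marked trusted.
def render_value(value)
  value.is_a?(SafeString) ? value.to_s : ERB::Util.html_escape(value)
end

payload = "<script>alert(document.cookie)</script>"

# Without the marker, the script tag is rendered as harmless text.
escaped = "Hello, #{render_value(payload)}"

# With the marker, the script tag reaches the page verbatim.
unescaped = "Hello, #{render_value(mark_html_safe(payload))}"
```

The point of the sketch is that marking a string as safe performs no sanitization at all; it only switches the escaping off for that value.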
00:07:22.720
So, we've enabled a rule within our code to check for the usage of 'html_safe' and 'raw,' which function similarly, and flag those instances.
00:07:34.400
However, what if you genuinely mean to use 'html_safe'? For example, when rendering Markdown that you are sure is safe?
00:07:50.560
To mitigate this, we renamed the method in our codebase to 'dangerously_output_as_html' to indicate its associated risks.
00:08:07.840
Additionally, we developed the Caution Tape Bot to remind folks whenever 'html_safe' is used in a pull request.
00:08:20.240
It simply suggests reviewing the documentation and asks that the team check over the code.
00:08:34.720
Implementing this can be done relatively simply with a GitHub action, making it efficient to maintain security hygiene.
00:08:45.600
We've also built an ERB lint application that checks for JavaScript contexts, because the automatic HTML escaping in ERB does not protect values inserted into a JavaScript context, which risks arbitrary JavaScript execution.
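A small sketch of why HTML escaping alone does not help inside a script block, using only the Ruby standard library. `js_literal` is a hypothetical helper, not part of the linter mentioned here or of Rails; it assumes the value should end up as a JavaScript string literal, and it escapes `<` so the value cannot close the surrounding script tag.

```ruby
require "erb"
require "json"

payload = "alert(document.cookie)"

# HTML escaping leaves this payload untouched because it contains no HTML
# metacharacters. Interpolated into a script, it would run as code.
html_escaped = ERB::Util.html_escape(payload)
unsafe_script = "<script>var name = #{html_escaped};</script>"

# Hypothetical helper: emit the value as a quoted JavaScript string literal.
# The block form of gsub inserts the replacement text literally.
def js_literal(value)
  JSON.generate(value.to_s).gsub("<") { "\\u003c" }
end

# Here the payload is data inside quotes, not executable code.
safe_script = "<script>var name = #{js_literal(payload)};</script>"
```

In `unsafe_script` the payload sits in the script as a bare expression, while in `safe_script` it is a quoted string.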
00:09:03.679
If you add its findings to the development process, you will see a marked improvement in security practices. You can find the link to that resource in these slides, which I will share on Twitter.
00:09:24.000
Moving on to the next significant issue—Cross-Site Request Forgery (CSRF)—this occurs when an attacker executes an action on behalf of a user without their knowledge by getting them to visit a malicious site.
00:09:50.560
For instance, if 'good.com' lets users change their account details via a form that accepts tokens, a CSRF vulnerability in that form could let an attacker's site submit it on your behalf and change your credentials without your awareness.
00:10:23.679
Rails handles this by checking that the token submitted with a form matches the token stored in the user's session, which prevents unauthorized actions. However, some developers disable this check, which introduces security risks.
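The token check can be sketched in plain Ruby as follows. This is a simplified illustration under stated assumptions, not Rails' implementation (Rails, for example, also masks the token on each request). The `session` hash, the two form hashes, and `tokens_match?` are hypothetical; `authenticity_token` is the parameter name Rails uses.

```ruby
require "securerandom"

# Compare two strings without stopping at the first differing byte.
def tokens_match?(expected, given)
  return false unless expected.is_a?(String) && given.is_a?(String)
  return false unless expected.bytesize == given.bytesize

  diff = 0
  expected.bytes.zip(given.bytes) { |a, b| diff |= a ^ b }
  diff.zero?
end

# Hypothetical session store: the server remembers one random token per session.
session = { csrf_token: SecureRandom.hex(32) }

# A form rendered by the real site embeds the session's token.
legit_form = {
  "email" => "me@example.com",
  "authenticity_token" => session[:csrf_token]
}

# A form hosted on an attacker's site cannot read that token, so it is missing.
forged_form = { "email" => "attacker@example.com" }

legit_ok = tokens_match?(session[:csrf_token], legit_form["authenticity_token"])
forged_ok = tokens_match?(session[:csrf_token], forged_form["authenticity_token"])
```

The forged request fails because the browser sends the victim's cookies automatically, but the attacker's page has no way to learn the token that belongs in the form body.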
00:10:37.360
To counter this, we created another rule for the Caution Tape Bot, which checks for any instances of 'skip_before_action :verify_authenticity_token' and notifies developers of appropriate usage.
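A bot rule of this kind can be approximated with a few lines of Ruby that scan the added lines of a diff. This is only a sketch: the rule table, the reminder messages, and `caution_findings` are hypothetical, and the real bot's rules and wording are not described in the talk.

```ruby
# Hypothetical rule table: pattern to look for, and the reminder to post.
RULES = {
  /\.html_safe\b|\braw[\s(]/ =>
    "Output is not escaped here; please confirm the input is trusted.",
  /skip_before_action\s+:verify_authenticity_token/ =>
    "CSRF protection is disabled here; please confirm this is intended."
}.freeze

# Return [line_number_in_diff, message] pairs for lines that a diff adds.
def caution_findings(diff_text)
  findings = []
  diff_text.each_line.with_index(1) do |line, number|
    # Only added lines matter; "+++" is the file header, not content.
    next unless line.start_with?("+") && !line.start_with?("+++")

    RULES.each do |pattern, message|
      findings << [number, message] if line.match?(pattern)
    end
  end
  findings
end
```

A pattern match like this is deliberately noisy: it does not decide whether the usage is wrong, it only prompts a human to look.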
00:10:53.760
The second theme I'd like to address is that messing up should not be the end of the world. It’s essential to approach mistakes positively.
00:11:06.160
A popular saying is that we don’t make mistakes; we make happy accidents. Often, it's about addressing the failure of the system, rather than blaming individuals for errors.
00:11:24.000
We've implemented safeguards that alert developers to mistakes quickly and effectively, minimizing potential data exposure. A frequent issue arises when controlling access to resources.
00:11:45.440
For example, if a user tries to edit a blog post that doesn’t belong to them, that should not succeed. To handle this, we built a gem that checks whether the object being accessed is related to the current user and denies access if it is not.
00:12:03.360
If the logged-in user tries to access data meant for another user, instead of triggering an authorization error, they receive a generalized error message.
00:12:24.160
This approach should protect your application more gracefully while notifying developers of potential security issues needing resolution.
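The ownership check can be illustrated with a short plain-Ruby sketch. It is not the gem mentioned in the talk, whose API is not shown; `User`, `Post`, `RecordNotFound`, and `find_owned_post` are hypothetical names chosen for the example.

```ruby
# Hypothetical models for illustration only.
User = Struct.new(:id)
Post = Struct.new(:id, :owner_id, :title)

class RecordNotFound < StandardError; end

# Look up a post only within the set the current user owns. A post owned by
# someone else is indistinguishable from one that does not exist.
def find_owned_post(posts, post_id, current_user)
  post = posts.find do |candidate|
    candidate.id == post_id && candidate.owner_id == current_user.id
  end
  raise RecordNotFound, "post #{post_id} not found" if post.nil?

  post
end

alice = User.new(1)
bob = User.new(2)
posts = [
  Post.new(10, alice.id, "Alice's draft"),
  Post.new(11, bob.id, "Bob's draft")
]
```

Raising the same generic error for a missing post and for a post owned by someone else avoids confirming to an attacker that a given record exists.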
00:12:39.760
Next, I want to talk about a tool we internally developed called Watchtower. Though it is not open-source, it's not overly complicated.
00:12:55.040
Watchtower scans a list of domains for vulnerabilities that stem from common applications like GraphQL and Sidekiq.
00:13:05.440
We found that many of these Rails apps often leave unmaintained Sidekiq instances exposed to the public, creating security flaws.
00:13:26.720
To identify these vulnerabilities, I wrote a bash script that systematically checks each service, reporting vulnerabilities found.
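The original was a bash script and is not shown in the talk. As a rough illustration of the same idea in Ruby, the sketch below probes each domain for a few paths and reports the ones that answer. The paths `/sidekiq` and `/graphql`, the treatment of an unauthenticated HTTP 200 as a finding, and the names `DEFAULT_PATHS`, `HTTP_PROBE`, and `exposed_endpoints` are all assumptions made for the example.

```ruby
require "net/http"
require "uri"

# Assumed paths where these services are commonly mounted.
DEFAULT_PATHS = ["/sidekiq", "/graphql"].freeze

# Default probe: true when an unauthenticated GET returns HTTP 200.
# Any network error is treated as "not exposed".
HTTP_PROBE = lambda do |url|
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPOK)
rescue StandardError
  false
end

# Return the URLs that answer the probe. The probe is injectable so the
# scan logic can be exercised without touching the network.
def exposed_endpoints(domains, paths: DEFAULT_PATHS, probe: HTTP_PROBE)
  domains.flat_map do |domain|
    paths.map { |path| "https://#{domain}#{path}" }
         .select { |url| probe.call(url) }
  end
end
```

Calling `exposed_endpoints(["shop.example.com"])` would perform real requests, so a scanner like this should only be pointed at domains you own.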
00:13:45.200
The next step to enhancing our security is establishing a bug bounty program. These programs incentivize ethical hackers to find vulnerabilities, effectively bolstering our security.
00:14:01.440
While it isn’t suitable for every company, a bug bounty can be beneficial as it engages good actors against bad actors who are already looking for vulnerabilities.
00:14:22.080
As a good start, create an email like '[email protected]' and provide a page explaining your security policies. You'll find hackers who can help secure your site.
00:14:40.800
Now, I'll discuss a couple of examples from our bug bounty program. The first example is the biggest payout received, a server-side request forgery exploit.
00:15:00.880
It was a subtle exploit where the hacker could redirect requests by manipulating a customization in our internal exchange app.
00:15:18.800
The detailed report they provided saved us a considerable amount of money as it allowed us to mitigate the threat before it could be exploited.
00:15:36.639
The second example involved a vulnerability that allowed a user to confirm an email without proper authorization, enabling them to log into any Shopify store.
00:15:47.360
By acting swiftly, we managed to patch this vulnerability during the holiday season.
00:16:03.440
The takeaway here is that bug bounty programs are instrumental in uncovering vulnerabilities that static analysis tools might miss.
00:16:21.120
As a closing note, it’s impossible for a small team to secure applications when hundreds of developers are actively making changes.
00:16:39.520
It’s vital that every developer understands they impact security for 800,000 merchants, and they are responsible for that.
00:16:57.839
So, how can we update developers on security trends without making it a full-time job? By making it engaging, showing them how to become hackers!
00:17:16.799
We call our security workshops 'Learn to Hack,' and we've had over 300 employees participate. During these sessions, we demonstrate vulnerabilities within mock apps.
00:17:33.599
Workshops encourage interaction and creative problem solving, and we've had incredible success with engagement.
00:17:50.480
Additionally, we initiated a Halloween Hackfest where employees competed to find security vulnerabilities in a fun, cooperative way.
00:18:06.559
This event not only increased security awareness but provided insight into real-world security practices.
00:18:22.639
We subsequently built the Shopify Capture The Flag (CTF) Challenge, which continues to engage employees year-round.
00:18:41.920
This initiative has encouraged employees to work on vulnerabilities and sharpen their security skills even amidst busy schedules.
00:19:00.720
So, before we wrap up, here are a few key takeaways from the lessons learned in our security journey.
00:19:20.240
Great security tooling does not solve the security problems of a company without proper developer input. Developers care about secure products—they really do.
00:19:38.879
Automating security practices in your development workflow is crucial for scaling security efforts. It is not practical to hire security personnel equal to the number of developers.
00:19:58.080
Make learning about security appealing; helping developers understand the risks in a relatable way is far more effective than boring lectures.
00:20:17.440
Ultimately, security is everyone’s responsibility. Taking the initiative to learn about security vulnerabilities will strengthen your team as a whole.
00:20:36.000
Thank you for listening. Since this is a sponsored talk, I’d like to mention that Shopify is hiring in application security, as well as numerous roles for Rails developers.
00:20:54.000
If you’re interested in any of this, feel free to come up to me or raise your hand. We have about ten minutes for questions.