RailsConf 2020 CE

Enterprise Identity Management on Rails

by Brynn Gitt and Oliver Sanford

The video titled "Enterprise Identity Management on Rails" presented by Brynn Gitt and Oliver Sanford at RailsConf 2020 explores the complexities involved in identity management within Ruby on Rails applications, particularly for larger enterprises. It emphasizes the importance of planning identity management strategies from the beginning to avoid challenges such as managing user accounts across multiple organizations and ensuring secure and efficient authentication processes. Here are the key points discussed throughout the presentation:

  • Scope Users to Organizations: The presenters recommend scoping user accounts to specific organizations to prevent identity management issues that arise when a user belongs to multiple organizations. This includes creating separate accounts for users in different organizations.

  • Enterprise Authentication: Typical password management is not the focus. Instead, the presenters recommend Single Sign-On (SSO) and suggest OmniAuth as a modular way to support multiple authentication strategies.

  • Understanding Identity Protocols: The video discusses SAML (Security Assertion Markup Language) for authentication and SCIM (System for Cross-domain Identity Management) for provisioning, explaining their relevance and application in enterprise-sized deployments.

  • Designing with Integration in Mind: The importance of considering identity provider integrations, such as Azure Active Directory or Google OAuth, is stressed. Implementing an effective access control system and managing user attributes are key to a successful integration.

  • Edge Cases and Observability: The presenters share insights on handling edge cases, such as email changes and user deletions, and emphasize the need for observability in logging significant events in the identity management process.

  • Testing and Debugging: They detail best practices for testing identity management integrations, including using specialized tools and frameworks, as well as practical suggestions for debugging SAML integrations with identity providers like Okta.

  • Implementation Libraries: Finally, an overview of various libraries and gems available for SCIM and SAML integration is provided, highlighting tools like SCIM Kit, SCIM Rails, and SCIM Engine for developers looking to implement identity management solutions in Rails applications.

In conclusion, the speakers aim to equip developers with practical insights and strategies to create scalable and robust identity management systems in Ruby on Rails, ensuring compliance with enterprise needs and reducing future complexities.

00:00:09.110 Welcome to RailsConf 2020 Couch Edition and Enterprise Identity Management on Rails. I'm Brynn Gitt.
00:00:15.449 And I'm Oliver Sanford. We're software engineers at Mode Analytics. This talk is not about resetting passwords.
00:00:21.360 We don't like passwords very much, and we're assuming you're already familiar enough with Rails to have seen what password management typically looks like.
00:00:28.230 We're also going to skip right past session storage. Yes, in a web app, session storage is the backbone of identity management.
00:00:34.649 It's something you have to handle in a way that suits your scale and infrastructure. The various Rails session stores are well covered elsewhere.
00:00:41.640 This talk is also not about real ID verification, biometrics, or other new forms of strong identity verification for government or health processes.
00:00:46.710 Instead, it is about several topics you'll want to consider if you're building a product or service for larger businesses.
00:00:54.149 This talk is about some of the lessons we've learned in handling identity concerns in Rails apps, particularly through debugging SAML and developing a SCIM integration.
00:01:06.030 We'll start with a big-picture design consideration and then look more closely at a couple of areas like handling enterprise authentication in Rails and implementing SCIM.
00:01:16.979 The obvious but wrong way to set up identity management in Rails is to have all users share a single space.
00:01:23.310 Users belong to an organization, validate that no two user accounts can have the same email or username, and you're done, right?
00:01:29.880 Wrong! Sometimes the same person belongs to more than one organization. Add a membership table, and voila! Users can be part of multiple organizations.
00:01:40.709 Here's where the trouble starts: you'll run into policy conflicts when two organizations have different login policies. Which one do you apply?
00:01:48.840 If someone has logged in fully according to one organization's policies, using a strategy or provider that another organization otherwise wouldn't allow, what is their authorization level?
00:01:54.119 When a person is removed, which organization has permission to delete their account?
00:02:00.869 A better approach is to scope users to organizations from the beginning of your project. This way, you'll validate traits like the email and username only within a single organization.
00:02:07.020 If [email protected] needs to be part of two orgs, each org will need to maintain a separate [email protected] account.
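A minimal sketch of what scoping uniqueness to an organization means, written in plain Ruby so it runs without a database. The `Directory` class, its methods, and the example addresses are hypothetical. In a Rails model the same rule is commonly expressed as `validates :email, uniqueness: { scope: :organization_id }`, ideally backed by a composite unique index.

```ruby
# Hypothetical in-memory stand-in for a users table.
# Uniqueness is enforced on the (organization_id, email) pair, not on email alone.
class Directory
  User = Struct.new(:organization_id, :email, keyword_init: true)

  def initialize
    @users = {}
  end

  # Returns the new user, or raises if the email is taken in that organization.
  def add_user(organization_id:, email:)
    key = [organization_id, email.downcase]
    raise ArgumentError, "email already taken in this organization" if @users.key?(key)

    @users[key] = User.new(organization_id: organization_id, email: email.downcase)
  end

  def count
    @users.size
  end
end

directory = Directory.new
directory.add_user(organization_id: 1, email: "pat@example.com")
# The same address in a second organization is a separate account.
directory.add_user(organization_id: 2, email: "pat@example.com")
```

Each organization can then apply its own login policy and delete its own account without affecting the other.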
00:02:13.650 To understand the consequences of the two different design patterns, let's compare GitHub with Slack.
00:02:19.290 GitHub emerged out of an open source-oriented ethos, much like Twitter, where developers had handles in a single global namespace.
00:02:26.860 Slack, on the other hand, was designed for the enterprise from the beginning. You log in to a workspace specific to your organization.
00:02:34.489 The GitHub model is oriented toward global interactions and sharing. A user can still belong to an organization; however, things become more challenging when they belong to multiple organizations.
00:02:40.450 I definitely logged into my personal GitHub account recently and discovered that I was still a member of a large organization.
00:02:46.780 I stopped contracting for that organization three years ago.
00:02:52.389 The fact that GitHub has developed its Enterprise Cloud Edition, which offers unique username account spaces, SAML authentication, and SCIM integration at considerable additional cost, speaks to the effort required
00:02:58.120 to rebuild or refactor their service to move away from the global namespace pattern.
00:03:05.739 To the extent that GitHub is really a B2B service—that is, if the platform is mostly used while people are at work with their employers paying for it—the open-source model becomes extraneous.
00:03:12.670 If you have the luxury of planning for enterprise support from the ground up, take those early decisions carefully.
00:03:18.609 How can you design your product or service so that enterprise identity management is a breeze?
00:03:24.490 Probably the best gift you can give yourself is to scope each user account entirely to a single organization.
00:03:30.370 If you need individuals to have accounts, either add them all to an invisible general public organization or make them each an organization of one.
00:03:36.039 Observability is fundamental. Identify and define any regular business events you want to log, such as authentication success or failure, a user provisioned event, or a user deletion event.
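One way to make these business events concrete is a small logger that writes one structured line per event. This is only a sketch under assumptions: the `IdentityEvents` class, the event names, and the payload fields are hypothetical, and a real application would likely route these through its existing analytics or exception service objects.

```ruby
require "json"
require "logger"
require "stringio"
require "time"

# Hypothetical event logger: one JSON line per identity event.
class IdentityEvents
  EVENTS = %i[authentication_succeeded authentication_failed user_provisioned user_deleted].freeze

  def initialize(io)
    @logger = Logger.new(io)
    # Emit the message as-is so each line is parseable JSON.
    @logger.formatter = ->(_severity, _time, _progname, message) { "#{message}\n" }
  end

  def record(event, organization_id:, **details)
    raise ArgumentError, "unknown event #{event}" unless EVENTS.include?(event)

    payload = { event: event, organization_id: organization_id, at: Time.now.utc.iso8601 }.merge(details)
    @logger.info(JSON.generate(payload))
  end
end

output = StringIO.new
events = IdentityEvents.new(output)
events.record(:authentication_failed, organization_id: 42, provider: "saml", reason: "signature_mismatch")
```

Keeping the set of event names closed makes it harder for an unplanned event to slip into the logs under a misspelled name.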
00:03:43.059 In an existing service, your product probably already has service objects to handle exceptions and analytics. Use them!
00:03:50.610 If you don't have them yet, build them. Half the battle in development is exploring and tuning a living system.
00:03:57.610 If you work with the running system locally and explore actual payloads, you'll find it makes the documentation much more concrete than the formalistic description of an RFC.
00:04:05.189 Even for local development, a log visualization product will serve you well compared to manually parsing the Rails logs for everything you need to find.
00:04:09.480 Many Rails apps start out life offering authentication via username and password,
00:04:13.169 perhaps using something like Devise. Try to skip this step! Storing individual passwords is insecure and risky. Avoid it if you can.
00:04:20.490 What's good about Devise is that its modules implement many fundamentals of identity management in an itemized way.
00:04:26.250 They're instructive and represent concerns you may still want to address, even if in a way that fits better with your existing practices.
00:04:33.960 For our purposes, the minimum viable implementation is single sign-on via some combination of public OAuth 2 vendors.
00:04:39.120 Users will click on the first available single sign-on button and defer to an external service.
00:04:44.940 At this stage, you'll need something more modular than username and password implementation in the Rails ecosystem, and that something is OmniAuth.
00:04:53.780 OmniAuth really has only one central idea: all forms of authentication boil down to a request phase and a callback phase.
00:04:59.780 In the request phase, we check for evidence of the user's identity. We look for an existing session and ask for credentials
00:05:06.430 if we don't find it. In the callback phase, we pass verified identity information to your application and handle anything required to log them in.
00:05:13.000 OmniAuth supports some useful hooks: before the request phase (a good place for CSRF verification and sending information to analytics),
00:05:20.050 before the callback phase, and on failure. In between, it delegates the actual work to whichever authentication strategy is appropriate.
00:05:27.200 Depending on your customers, you may find Google OAuth 2 is the most popular single sign-on method, or perhaps GitHub or Azure Active Directory.
00:05:32.450 In these cases, you're connecting to public OAuth 2 authorization servers, and the connection details are more or less the same for all your customers.
00:05:38.440 So you'll wind up with something that looks like a pile of configuration constants from various OmniAuth strategy gems.
00:05:45.000 This may work for a while, but there are some disadvantages to this system.
00:05:50.630 Not all your customers will have all the login methods set up, so people might click on the wrong thing and wind up in a dead end.
00:05:57.610 Moreover, your customers may require you to ensure their team only logs in with an approved strategy, so gate access to both the request and callback actions where possible.
00:06:05.189 Since your system now supports multiple authentication providers, you'll need to track the appropriate external UID for each of them.
00:06:12.480 You'll find it in the OmniAuth auth hash and save it to an identity model associated with the user.
00:06:18.230 At this point, the minimal possible ERD looks something like this.
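The diagram itself is not part of the transcript. As a rough sketch of the relationship described, a user can have one identity per provider, each carrying that provider's external UID. The `User`, `Identity`, and `IdentityStore` names are hypothetical plain-Ruby stand-ins for Active Record models, and the hash below only imitates the shape of the auth hash OmniAuth places in `request.env["omniauth.auth"]` (`provider`, `uid`, `info`).

```ruby
# Plain-Ruby stand-ins for the User and Identity models (hypothetical names).
User = Struct.new(:id, :organization_id, :email, keyword_init: true)
Identity = Struct.new(:user_id, :provider, :uid, keyword_init: true)

class IdentityStore
  def initialize
    @identities = {}
  end

  # An identity is unique per (organization, provider, uid).
  def link(user, auth)
    key = [user.organization_id, auth.fetch("provider"), auth.fetch("uid")]
    @identities[key] ||= Identity.new(user_id: user.id, provider: auth["provider"], uid: auth["uid"])
  end

  def find(organization_id, auth)
    @identities[[organization_id, auth.fetch("provider"), auth.fetch("uid")]]
  end
end

# Shaped like the hash OmniAuth exposes; the values are made up.
auth = { "provider" => "google_oauth2", "uid" => "1234567890", "info" => { "email" => "pat@example.com" } }

user = User.new(id: 1, organization_id: 7, email: "pat@example.com")
store = IdentityStore.new
store.link(user, auth)
```

Looking up by provider and UID rather than by email keeps the login stable when the email later changes.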
00:06:24.069 SAML stands for Security Assertion Markup Language. The 2.0 specification was approved way back in 2005 before the world saw its first iPhone.
00:06:30.430 SAML has some limitations, but it's fully realized and fairly widely adopted in larger organizations. It's a dialect of XML with a few interrelated documents.
00:06:41.410 You may wish to handle this with the public OAuth 2 vendors we discussed in the last section. Most of your connection details will be constants.
00:06:48.920 No worries! It won't be long before customers want to connect to identity providers completely under their own control.
00:06:56.520 They might ask you to do this with OAuth too, but we found the most widely requested enterprise authentication protocol is SAML.
00:07:04.270 In the future, it may be OpenID Connect.
00:07:10.150 A SAML authentication flow can be initiated either directly from the identity provider or from the service provider—your service.
00:07:17.790 There are some good reasons you'd want to initiate the request yourself: your request will likely have a short timeout associated with it.
00:07:24.360 It may also contain some disposable immutable data you wish the identity provider to send back as part of the SAML assertion.
00:07:29.840 In the OAuth 2 world, the SAML assertion would be called an authorization token.
00:07:36.960 Both measures help mitigate the risk of man-in-the-middle attacks.
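A sketch of those two measures, assuming the service provider tracks the ID of each request it issues and compares it with the `InResponseTo` value of the SAML response. The `PendingRequests` class and the five-minute lifetime are assumptions for illustration; SAML libraries typically offer their own equivalent checks.

```ruby
require "securerandom"

# Hypothetical tracker for outstanding service-provider-initiated requests.
class PendingRequests
  def initialize(ttl_seconds: 300)
    @ttl = ttl_seconds
    @issued = {}
  end

  # Call when redirecting the user to the identity provider.
  def issue(now: Time.now)
    id = "_#{SecureRandom.uuid}" # XML IDs may not start with a digit
    @issued[id] = now + @ttl
    id
  end

  # Call with the InResponseTo value from the SAML response.
  # Each ID is accepted at most once, and only before it expires.
  def redeem(id, now: Time.now)
    expires_at = @issued.delete(id)
    !expires_at.nil? && now <= expires_at
  end
end

pending = PendingRequests.new(ttl_seconds: 300)
issued_at = Time.now
request_id = pending.issue(now: issued_at)
```

A response that arrives late, twice, or with an ID the service never issued is rejected before any account lookup happens.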
00:07:42.310 In a SAML authentication flow, the security assertion or SAML response document contains the verified identifying information of the logged-in user.
00:07:49.360 Once the SAML response is received by the client and submitted to the service provider, the authentication flow is done.
00:07:57.190 The service provider may still decide not to log the user in for some reason, like a policy violation regarding a particular resource.
00:08:04.110 But from the moment the client sends the token and the response document back to the service provider, the identifying information is delivered.
00:08:11.690 Of course, it's also essential that the SAML documents always contain a signature that matches the rest of the information provided.
00:08:18.100 The main issue with SAML became clear within a few years after the release of 2.0: it assumes that the client is a web browser.
00:08:24.630 The SAML specification makes this assumption regarding the transport of this final step using either POST or redirect.
00:08:31.380 However, for mobile clients, this step does not work.
00:08:37.980 OAuth 2 addresses this limitation by providing for direct communication between the service and the authorization server.
00:08:44.240 The catch is that the identifying information is not provided to the user as part of the authentication token.
00:08:51.520 Instead, the service provider has to request it as part of the verification step.
00:08:57.640 If you need to support multiple customers with their own identity providers, take a look at OmniAuth MultiProvider.
00:09:04.650 This makes it easy to pull the parameters for a SAML strategy out of an Active Record model.
00:09:11.150 One limitation of OmniAuth MultiProvider: SAML 2.0 supports automatic configuration through metadata exchange, and OmniAuth supports an options phase
00:09:18.000 for uses like that, but OmniAuth MultiProvider will need some extra tuning to get that to work.
00:09:25.320 In the meantime, consider setting up a feature flag or environment toggle to log the full XML of SAML assertions.
00:09:31.850 In the event a particular customer has trouble setting up their SAML provider, you can grab the assertion and see exactly what is required.
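A minimal sketch of such a toggle, assuming the response arrives through the POST binding as a Base64-encoded `SAMLResponse` parameter. The `SAML_DEBUG` variable name and the `log_saml_response` helper are hypothetical.

```ruby
require "base64"
require "logger"
require "stringio"

# Logs the decoded SAML response only when the (hypothetical) SAML_DEBUG toggle is on.
def log_saml_response(params, logger:, env: ENV)
  return false unless env["SAML_DEBUG"] == "1"

  xml = Base64.decode64(params.fetch("SAMLResponse"))
  logger.debug("SAML response: #{xml}")
  true
end

sample_xml = '<samlp:Response ID="_abc"></samlp:Response>'
params = { "SAMLResponse" => Base64.strict_encode64(sample_xml) }
log_output = StringIO.new
logger = Logger.new(log_output)

log_saml_response(params, logger: logger, env: { "SAML_DEBUG" => "1" })
```

Assertions contain personal data, so a toggle like this is best left off by default and enabled only while a specific integration is being debugged.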
00:09:38.300 Here are a few other suggestions that are useful for debugging SAML and other authentication protocols.
00:09:45.110 Set up test organizations with your most popular identity vendors such as Okta or OneLogin.
00:09:52.550 Even if it's an identity provider that your organization also uses internally, you probably won't be able to use your organizational account for development and debugging.
00:09:59.020 Instead, you'll want a test organization with larger vendors such as Okta. This is not necessarily included in your contract.
00:10:05.970 It's something you'll want to negotiate. If you use trial accounts for development and testing, you'll find that they may expire and lose valuable state every 30 days.
00:10:12.520 To avoid this, you'll want your vendor account manager to include permanent testing resources from the beginning.
00:10:19.220 Rather than relying on SaaS vendors with a large surface area, another option that's helpful for understanding SAML is to debug against an open-source identity provider such as Keycloak.
00:10:25.480 This has the advantage of being free, plus you can quickly set up and run the identity provider locally using Docker.
00:10:31.560 Keycloak is certainly a thinner solution than Okta, but in practice this makes it easier to locate the certificates and settings you need.
00:10:37.050 You'll also want to be able to inspect authentication flows in process from the client perspective.
00:10:44.470 For this, use ngrok and a SAML browser plugin. The latter adds XML decoding and inspection to your browser developer tools.
00:10:51.600 It's super helpful to see exactly what's being sent for each stage of the flow and to decode the SAML assertions right there.
00:10:59.300 Before you get your SCIM implementation in place, you may need to handle account provisioning just in time.
00:11:05.540 After your single sign-on authentication response is received, the main idea is to treat the authorization token as being authoritative.
00:11:13.290 If Google says that Billy Moore from Greenfield is at your door, and you don't have an account for Billy Moore yet, you can create one for him as part of the login flow.
00:11:21.290 This is what we call just-in-time provisioning.
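A sketch of just-in-time provisioning in plain Ruby: look the user up by the external UID from the verified response, and create the account during login if none exists. The `JitProvisioner` class and the shape of the `auth` hash are hypothetical.

```ruby
# Hypothetical in-memory user store keyed by (organization_id, provider, uid).
class JitProvisioner
  User = Struct.new(:organization_id, :provider, :uid, :email, :name, keyword_init: true)

  attr_reader :provisioned

  def initialize
    @users = {}
    @provisioned = []
  end

  # `auth` is the verified identity from the SSO response.
  # Returns the existing user, or creates one during the login flow.
  def sign_in(organization_id, auth)
    key = [organization_id, auth.fetch(:provider), auth.fetch(:uid)]
    @users.fetch(key) do
      user = User.new(organization_id: organization_id, provider: auth[:provider],
                      uid: auth[:uid], email: auth[:email], name: auth[:name])
      @provisioned << user # a good place to emit a "user provisioned" event
      @users[key] = user
    end
  end
end

provisioner = JitProvisioner.new
auth = { provider: "google_oauth2", uid: "g-123", email: "billy@example.com", name: "Billy Moore" }
first = provisioner.sign_in(1, auth)
second = provisioner.sign_in(1, auth)
```

The second login finds the account created by the first, so provisioning happens only once per external UID.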
00:11:28.110 A SAML plus just-in-time provisioning scenario raises a couple of edge cases which you'll need to think through.
00:11:35.920 Among them are email and name changes, as when an individual gets married, and email inheritance, as when Billy Moore departs and Billy Strayhorn arrives.
00:11:42.070 What is SCIM? SCIM stands for System for Cross-domain Identity Management. Note that throughout this presentation, when we say SCIM, we refer to version 2.0 of the SCIM API.
00:11:50.610 SCIM is an API that is implemented by a service provider and used by an identity provider to manage resources on the service provider.
00:11:57.040 As a B2B company, we implemented the SCIM API to allow our customers to manage their users and groups through third-party software,
00:12:04.050 specifically starting with Okta. This gives the administrators of our customer organizations more control over who can access our products and the permissions they have.
00:12:11.150 There are two portions of the SCIM API: discoverability and operations.
00:12:19.410 The discoverability portion of the API tells a client about the supported features, resources, and attributes.
00:12:26.120 The operations portion of the SCIM API provides the abilities to create, read, update, and delete resources, search for resources, and perform operations on those resources.
00:12:33.070 The two resources defined in the SCIM API are users and groups. However, this can also be extended to other resources in your application.
00:12:39.630 Here's an example of the user resource. We can see many of the attributes that are defined in the SCIM core schema.
00:12:46.380 And here is the group resource. There's a lot to take in here from these two resources.
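The slides are not reproduced in the transcript. As an illustration only, here is a trimmed user and group built as Ruby hashes and printed as JSON. The attribute names and schema URNs come from the SCIM 2.0 core schema; every value is invented for the example.

```ruby
require "json"

# A trimmed SCIM 2.0 user resource; values are made up.
user_resource = {
  "schemas" => ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "id" => "2819c223",
  "externalId" => "00u1abcd",
  "userName" => "billy@example.com",
  "name" => { "givenName" => "Billy", "familyName" => "Moore" },
  "emails" => [{ "value" => "billy@example.com", "type" => "work", "primary" => true }],
  "active" => true,
  "meta" => { "resourceType" => "User" }
}

# A trimmed SCIM 2.0 group resource that references the user by id.
group_resource = {
  "schemas" => ["urn:ietf:params:scim:schemas:core:2.0:Group"],
  "id" => "e9e30dba",
  "displayName" => "Analysts",
  "members" => [{ "value" => user_resource["id"], "display" => "Billy Moore" }],
  "meta" => { "resourceType" => "Group" }
}

puts JSON.pretty_generate(user_resource)
puts JSON.pretty_generate(group_resource)
```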
00:12:52.090 So, to start off, we will want to determine how much of the SCIM API we need and want to support in order for our administrators, our customers, to manage resources efficiently.
00:12:58.450 This will narrow the scope of our SCIM API.
00:13:05.300 We will want to think about how SCIM interacts with our internal permission structure.
00:13:12.020 If we use groups to assign permissions, we'll want to implement the groups operations of the SCIM API.
00:13:18.910 We could also use a roles attribute on users to specify multiple values or a user type field to specify a singular value for our application.
00:13:25.410 We use both roles and groups for different permissions, so we will implement the roles attribute in our SCIM API and support the groups resource.
00:13:31.039 We have also worked with our product managers, customer success, and sales teams to determine which identity providers are the most valuable to our customers.
00:13:39.060 We have decided to start by integrating with Okta, so we will only need to implement the portions of the SCIM API which Okta requires.
00:13:44.520 It's important to also spend time researching the other identity providers we may want to support in the future.
00:13:51.170 We don't want to dig ourselves into a hole by ignoring the rest of the SCIM API and missing a crucial piece from the beginning.
00:13:58.240 So we'll remove some of the user attributes from the user resource that we don't need to support and add some others to get our supported user resource.
00:14:04.199 We'll do the same for groups.
00:14:11.049 Our resources are starting to look more manageable now. We will also be able to limit which endpoints are included in our first iteration of our SCIM API
00:14:17.500 by only implementing those that Okta uses. This eliminates the entire discoverability section of the SCIM API and the bulk operations, too.
00:14:25.740 So, isn't there a gem for all this? Let's take a look at three gems to help us implement the SCIM API.
00:14:30.610 SCIM Kit by mokha, SCIM Rails by Lessonly, and SCIM Engine by Cisco AMP.
00:14:38.720 Each of these gems covers a slightly different piece of the SCIM API.
00:14:44.670 SCIM Kit covers the discoverability portion, while SCIM Rails covers the operations portion, but only for users.
00:14:51.570 SCIM Engine covers both the discoverability and operations.
00:14:57.950 The SCIM Kit gem focuses on the schema and resources for your application.
00:15:04.470 It helps provide the discoverability portion of the SCIM API by telling an identity provider about how to use your API.
00:15:10.570 It doesn't help you implement the operations portion of SCIM, so after implementing this gem into your Rails application, you will still need to do all the work to manage resources.
00:15:16.040 The pieces of the SCIM API that SCIM Kit will help you implement aren't required by Okta, so for us, this meant skipping this entirely to narrow our scope.
00:15:23.170 Next up, we have the SCIM Rails gem. This gem is not fully SCIM compliant.
00:15:30.270 It focuses on the components of SCIM that are required by Okta within the set of features Okta supports.
00:15:36.570 This gem helps implement the SCIM API for users, but does not support groups or adding other resources.
00:15:44.610 This gem also makes assumptions about how your underlying data model looks.
00:15:52.140 It assumes your users are organized within a company that has a scope to get these users.
00:15:58.490 It also assumes that your data model matches the core SCIM schema. Due to our existing code, there wasn't enough flexibility here to add support for groups.
00:16:06.950 It was also difficult to handle the differences between our data models and the SCIM core schema.
00:16:13.410 Lastly, we have the SCIM Engine gem. This gem aims to be more general-purpose than the SCIM Kit and SCIM Rails gems.
00:16:19.410 It supports the core schema endpoints, the operation endpoints, and can be extended to handle multiple resource types.
00:16:27.020 There are some pieces you'll need to implement yourself, though, to be fully SCIM compliant.
00:16:34.470 You'll need to implement the index action with filtering, handling of PATCH parameters, and other application-specific logic.
00:16:41.480 That other logic is summarized nicely in an example controller in the SCIM Engine repository:
00:16:49.850 convert the SCIM Engine resources to your application object and save. This is definitely easier said than done.
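To give a sense of the filtering and PATCH handling involved, here is a deliberately narrow sketch. It handles only the `attribute eq "value"` filter form and only `replace` operations, which is far from the full grammar in the SCIM protocol. The helper names are hypothetical and are not part of any of the gems discussed.

```ruby
# Parses only the simplest filter form: attribute eq "value".
# Returns nil for anything else.
def parse_eq_filter(filter)
  match = /\A(\w+) eq "([^"]*)"\z/.match(filter.to_s)
  match && { attribute: match[1], value: match[2] }
end

# Applies "replace" operations to a hash of attributes; other ops are rejected.
def apply_patch(attributes, patch)
  patch.fetch("Operations").reduce(attributes) do |current, operation|
    raise ArgumentError, "unsupported op" unless operation["op"].to_s.downcase == "replace"

    # With a path, the value replaces that attribute; without one, the value is a hash of attributes.
    changes = operation["path"] ? { operation["path"] => operation["value"] } : operation.fetch("value")
    current.merge(changes)
  end
end

filter = parse_eq_filter('userName eq "billy@example.com"')

# The kind of body an identity provider might send to deactivate a user.
patch = {
  "schemas" => ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations" => [{ "op" => "replace", "value" => { "active" => false } }]
}
updated = apply_patch({ "userName" => "billy@example.com", "active" => true }, patch)
```

The operation name is compared case-insensitively because identity providers do not all send it in the same case.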
00:16:56.170 Authorizing requests to your SCIM API endpoints is the hardest part of writing your SCIM implementation.
00:17:02.660 It opens up the question: should I really do what this request is asking?
00:17:09.920 The answer to this question lies in the permissions of your application.
00:17:17.270 In our case, users belong to organizations and organizations have domains.
00:17:24.130 If a request comes in for a user with an email address domain that belongs to the organization making the request, we fulfill it.
00:17:30.450 We use domains as an additional way to determine if the request has permission to create or link to users with a specific email.
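The domain check can be sketched as follows. The `Organization` struct and `owns_email?` method are hypothetical, and a real implementation would sit alongside the API token check rather than replace it.

```ruby
Organization = Struct.new(:name, :domains, keyword_init: true) do
  # True when the email's domain is one the organization has claimed.
  def owns_email?(email)
    _local, separator, domain = email.to_s.downcase.rpartition("@")
    return false if separator.empty? || domain.empty?

    domains.include?(domain)
  end
end

org = Organization.new(name: "Example Corp", domains: ["example.com"])
org.owns_email?("pat@example.com")
```

Matching the whole domain exactly, rather than testing whether the address merely contains it, avoids accepting look-alike addresses such as ones ending in `example.com.evil.test`.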
00:17:36.540 In addition to linking users, we also ran into edge cases and questions from our customers.
00:17:43.170 These edge cases also apply to SSO with just-in-time provisioning.
00:17:50.100 The first edge case we ran into involved emails. Someone got married and decided that goatlover was not an excessively professional email handle.
00:17:57.950 With SCIM, it's now time for your administrator to link your accounts. They use your first name, last name, and email in the identity provider.
00:18:05.340 However, it doesn't link to your goatlover email handle that you used to sign up for the service provider.
00:18:12.820 Instead, you have a fresh blank account that's missing all your previous work.
00:18:19.250 In this case, we provided our customers with instructions on how to remove the new account, change the email within our application on the old account, and then retry the SCIM request.
00:18:25.700 Another solution would be to use the goatlover email handle in the identity provider.
00:18:32.490 Then change it within the identity provider to trigger an update operation on our SCIM API.
00:18:39.350 In this case, we expect the request to have an external UID that matches an existing user and a new email address.
00:18:46.790 This is another edge case we've added as a warning in our SCIM integration guide.
00:18:53.010 SSO with just-in-time provisioning handles this similarly: when we get the SSO login request, we check both the external UID and email address.
00:19:00.590 If the user already exists in our system, determined by their external UID, but the authorization token specifies a different name or email address, we're looking at an email change.
00:19:06.680 On the other hand, we may see the external UID has changed, but the email already exists in the system.
00:19:13.270 This results in a more subtle situation to think about. This happens when Billy Moore departs and Billy Strayhorn arrives.
00:19:20.890 The latter really wants to be [email protected].
00:19:27.570 It depends on how the external system treats email addresses.
00:19:34.520 If it does not enforce email uniqueness, stop right here! There's nothing really you can conclude, and your customer will have to take certain administrative actions manually.
00:19:41.490 If you can assume that the email, though it might be mutable, is unique within the customer organization, then based on our handling code, the new external UID paired with an old email address likely means the old UID is now inactive.
00:19:48.940 In other words, someone has left the customer organization, and you are now looking at a new person.
00:19:56.050 This situation is what we call email inheritance.
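The cases above can be summarized as a small decision function. This is a sketch with hypothetical names and data, and it only classifies the login; what to do about each outcome remains a business decision.

```ruby
# `users` holds the existing accounts in one organization as plain hashes.
def classify_login(users, uid:, email:)
  by_uid = users.find { |user| user[:uid] == uid }
  by_email = users.find { |user| user[:email] == email }

  if by_uid
    by_uid[:email] == email ? :known_user : :email_change
  elsif by_email
    # New UID, old email: only meaningful if the provider enforces unique emails.
    :email_inheritance
  else
    :new_user
  end
end

users = [{ uid: "idp-001", email: "billy@example.com" }]

classify_login(users, uid: "idp-001", email: "billy.m@example.com") # email change
classify_login(users, uid: "idp-002", email: "billy@example.com")   # email inheritance
```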
00:20:02.170 Checking the external UID is the solution for both SCIM and SSO with just-in-time provisioning.
00:20:09.130 The challenge here is that the correct actions are to deactivate or delete the prior user account.
00:20:16.490 However, in a mature system, these actions may not be so easy.
00:20:24.050 A user may have many associated records shared and accessed within the team.
00:20:30.320 You must be careful about what you delete and how you manage the disposition of any ambiguous situations.
00:20:37.200 For instance, if the user scheduled for deletion is the creator or owner of a resource, where does that resource fall in their absence?
00:20:44.250 It's also possible that user or administrative error has led to this situation, in which case you'd better hope there's an easy way to undelete anything you may need.
00:20:50.930 When you decide to deprovision a user in the course of another user's login or SCIM request, that's a significant business decision.
00:20:59.580 You will definitely want to ensure you have observability and maybe even proactive alerting for these events.
00:21:06.260 Next, we'll talk about testing your SCIM API to ensure it will work with identity providers to manage resources on your application.
00:21:12.790 There are a couple of things we want to confirm when testing your SCIM integration.
00:21:20.020 The API complies with the format specified by the SCIM protocol.
00:21:25.740 Each endpoint implements the business logic we expect to take place.
00:21:30.520 Requests are authorized appropriately and only modify resources they should access.
00:21:37.370 Integrating with our preferred identity provider works. We're able to make changes in the identity provider and see those changes within our application.
00:21:42.200 This also includes running through common and uncommon administrative workflows.
00:21:48.990 For each of these, there will be different tools you can use. Okta provides tests for your API.
00:21:55.530 They also provide basic tests of integrating with their preview environment that work with BlazeMeter Runscope.
00:22:01.800 You'll also want to include RSpec or another framework for integration and unit tests, and possibly penetration testing to ensure only authorized users are accessing and making changes through your SCIM API.
00:22:08.370 Okta also requires an entire suite of manual testing to be done through the Okta Admin environment.
00:22:16.380 This manual testing is tedious, but it will teach you how to use Okta as an administrator.
00:22:23.220 It also shows the workflows administrators use for common tasks like assigning users to your applications, deactivating and suspending users, and pushing groups.
00:22:29.260 If you don't know how a specific workflow is done in Okta, you can probably find it in the manual testing spreadsheet.
00:22:36.390 We also relied on a tool called ngrok to capture and replay requests that hit our SCIM development server.
00:22:42.000 When your SCIM API is working, you can then submit it to Okta's Integration Network.
00:22:49.170 To submit, you'll need to provide Okta with an integration guide, show passing automated tests in BlazeMeter Runscope,
00:22:55.210 confirm that the manual QA tests pass, and provide testing credentials for your application.
00:23:02.570 Each time we contacted Okta, there was about a one-week turnaround.
00:23:09.500 Our initial feedback involved updating support contacts for our company, removing optional BlazeMeter Runscope tests that were not passing, providing testing credentials, and confirming that we performed manual testing of our integration.
00:23:15.960 Our first two bugs came from a race condition on creating users quickly and business logic for our application.
00:23:23.480 The next round of bugs involved asking us to update screenshots in their integration guide and an error in the Okta tests.
00:23:30.330 There was a typo in the role they assigned.
00:23:36.340 After one more round of updating screenshots in our integration guide, our application was approved.
00:23:43.430 It took about five weeks from first submission to getting final approval.
00:23:49.340 We hope you have enjoyed hearing about the lessons we've learned in handling identity concerns in Rails apps, debugging SAML, and developing a SCIM integration.
00:23:54.490 We've shared design considerations and insights into enterprise identity management on Rails.
00:24:00.210 Thanks for watching!