Ruby on Ales 2012
Outgrowing the Cloud
Summarized using AI

by Mike Moore

In this presentation titled "Outgrowing the Cloud," Mike Moore shares his experiences with cloud services and the challenges that arise as applications expand. Initially, cloud providers simplify application deployment and scaling, but complications can emerge as requirements become more complex.

Key Points Discussed:
- Introduction to Cloud Services: Moore highlights the appeal of the cloud for deploying applications and managing infrastructure. He emphasizes that while cloud services lower barriers and foster innovation, they are not without limitations.
- Bloomfire's Experience: Moore reflects on Bloomfire’s initial success with cloud services, where the standard architecture included web servers, load balancers, and databases. He details how customer domains and HTTPS requirements necessitated significant adjustments, notably the need for separate IP addresses.
- Challenges Encountered:
  - IP Management: The need for unique IPs for customer domains created unsustainable costs, which became prohibitive for their pricing model.
  - Customer Whitelists: Adding servers required constant updates to whitelists for corporate clients, presenting an ongoing administrative hurdle.
  - Security Risks: Direct access to web servers by external parties posed severe security threats.
- Focus Areas for Improvement: To address their challenges, Moore identifies five critical areas:
  - Networking: Establishing well-structured networks with controlled access.
  - Security: Transitioning to in-house access management.
  - Availability: Implementing solid monitoring and disaster recovery plans.
  - Automation: Utilizing tools like Chef for process efficiency.
  - Platform Understanding: Being aware of the specific environment for optimal configuration.
- Transition to Virtual Private Cloud (VPC): The team migrated to a VPC on AWS, which allowed them to manage their infrastructure more securely while keeping costs in check. This included terminating SSL at the load balancer, removing the need to install certificates directly on the web servers.
- Results and Benefits: The changes led to improved infrastructure scalability, reduced security risks, and enhanced availability through disaster recovery processes and automation tools like Chef and Cron.

In conclusion, Moore emphasizes the importance of being proactive about challenges as applications scale and encourages teams to maintain innovative practices. By adapting infrastructure without modifying application code, Bloomfire exemplified how to effectively leverage cloud capabilities while addressing specific needs.

00:00:21.619 This talk is about outgrowing the cloud. It's more of an experiential record of my time using the cloud. For those who don't know me, my name is Mike Moore, and my handle is blowmage. I work at a company called Bloomfire, where we have a really great team and we enjoy our time together. By the way, we're hiring, so if you're looking for a job, feel free to reach out!
00:00:45.870 One of the things I do is run the MountainWest RubyConf. By a show of hands, how many people are planning to attend MountainWest in less than two weeks? Excellent! I'm super excited about this year's MountainWest. However, if you look at our logo, it's very literal, and I don't particularly like it. I hope we can change that by the time the conference starts.
00:01:14.050 Let's talk about the cloud. Everybody loves the cloud. The cloud offers an awful lot: it makes it super easy to spin up a new application, it simplifies hardware provisioning, and it helps with growth and scaling. But I know most of you are thinking about the fun aspects of the cloud. The best thing about the cloud is... well, Rainbow Dash! Rainbow Dash is the coolest of all.
00:01:32.170 Alright, show of hands! Are there any bronies in the audience? Just one? Okay, everyone else, you might want to grab a beer because it might not get any better for you from here.
00:02:25.880 Now, here is a really simple diagram showing the systems architecture. On the bottom, we have the web server, and on top is the API server. I'm going to use color coding to make this easier to visualize; the red square represents databases. One common practice is to run multiple web servers for redundancy, so the site stays up if one falls over, and then you need a load balancer in front of them. This is a typical approach to building applications.
00:02:38.390 When Bloomfire started, we were on the cloud, and it worked really well for us. A user on the internet would send a request to our load balancer, which would hand it off to a web server. The web server would query the database, package the data, and send it back to the user. I am a strong believer in PDD (Pain Driven Development): I prefer not to solve a problem until I absolutely need to.
00:03:21.810 But that changed when we dealt with customer domains. We provide a self-service option that lets customers use their own domain. This works fine in most cases. However, it fails spectacularly with HTTPS, because the server has to present its SSL certificate before the client says which domain it is requesting. The only way to pick the right certificate is by the IP address the request arrived on, so each customer domain needs its own IP, and that led to significant challenges.
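To make the certificate problem concrete, here is a minimal Python sketch of how a server chooses a certificate for a client that doesn't send a hostname during the handshake (the TLS Server Name Indication extension, which not every browser supported in 2012). The IP addresses, file names, and the `select_certificate` function are hypothetical, chosen only for illustration.

```python
# Hypothetical addresses (documentation range) and certificate files.
CERT_BY_IP = {
    "203.0.113.10": "app.bloomfire.example.pem",
    "203.0.113.11": "docs.customer-a.example.pem",
}


def select_certificate(dest_ip: str) -> str:
    """Return the certificate to present on a new HTTPS connection.

    For a client that doesn't send SNI, the TLS handshake finishes before
    the HTTP request (and its Host header) arrives, so the IP address the
    client connected to is the only key the server can use.
    """
    try:
        return CERT_BY_IP[dest_ip]
    except KeyError:
        raise LookupError(f"no certificate bound to {dest_ip}") from None


print(select_certificate("203.0.113.11"))  # docs.customer-a.example.pem
```

Every new HTTPS customer domain means another entry in that table, which is to say another public IP.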
00:04:02.490 Users who come in through Bloomfire's own domain take one route, but those using customer domains have to take another. Unfortunately, our cloud provider did not allow multiple IPs on a single instance unless you paid an exorbitant fee. At $80 a month for each IP, that was just too steep, given that we were charging around $100 a month for the service. This model was unsustainable for our business.
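The arithmetic behind that decision, as a small Python sketch using the talk's rounded figures ($80 per IP and around $100 per customer per month); the function name is only for illustration.

```python
def ip_cost_share(ip_cost_per_month: float, price_per_month: float) -> float:
    """Fraction of one customer's monthly revenue consumed by a dedicated IP."""
    if price_per_month <= 0:
        raise ValueError("price must be positive")
    return ip_cost_per_month / price_per_month


share = ip_cost_share(80.0, 100.0)
print(f"{share:.0%} of revenue goes to the IP alone")  # 80% of revenue goes to the IP alone
```

That leaves roughly $20 a month per customer domain to cover everything else.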
00:04:55.530 A second pain point was customer whitelists. Corporate clients often have strict authentication requirements, so they ask us to whitelist our servers' IP addresses before we can integrate with their services. Every time we add capacity by adding a new web server, that whitelist has to be updated. As we grow, this could become a severe obstacle.
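Why each new server breaks the integration can be seen with Python's standard `ipaddress` module. This is a sketch under assumptions: the whitelist contents, the addresses, and `is_whitelisted` are hypothetical stand-ins for a client's firewall rule.

```python
import ipaddress

# Hypothetical whitelist a corporate client keeps for our servers.
CUSTOMER_WHITELIST = (
    ipaddress.ip_network("198.51.100.10/32"),
    ipaddress.ip_network("198.51.100.11/32"),
)


def is_whitelisted(source_ip: str, whitelist=CUSTOMER_WHITELIST) -> bool:
    """True if a request from source_ip would pass the client's firewall."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in whitelist)


# Existing web servers pass; a newly added server with a fresh IP is
# rejected until the client updates its list.
print(is_whitelisted("198.51.100.10"))  # True
print(is_whitelisted("198.51.100.12"))  # False
```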
00:05:51.229 Another concern was that our web servers could be hit directly. Anyone who knew the IP and port could connect to a web server, or even to our database, which is a significant security risk. We wanted to change this and grow our practices beyond what our cloud setup offered.
00:06:38.330 To do this, there were five key things we needed to focus on: networking, security, availability, automation, and understanding our platform. Networking is fundamental; we needed to establish a logical group of networks that could only be accessed in specified ways. This means understanding subnets and ensuring strict control over access. It's important to configure permissions properly to avoid unauthorized access.
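A minimal sketch of that kind of subnet layout, again with Python's `ipaddress` module. The talk doesn't give Bloomfire's actual address plan, so the 10.0.0.0/16 range, the public/private split, and the `allowed` rule are all assumptions for illustration.

```python
import ipaddress

# Hypothetical address plan; the talk doesn't give the real CIDRs.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = list(VPC_CIDR.subnets(new_prefix=24))  # 256 /24 subnets

PUBLIC_SUBNET = SUBNETS[0]   # 10.0.0.0/24: load balancer side
PRIVATE_SUBNET = SUBNETS[1]  # 10.0.1.0/24: web and database servers


def allowed(src: str, dst: str) -> bool:
    """Toy access rule for traffic entering the private subnet.

    Only the public subnet (where the load balancer lives) and the private
    subnet itself may reach private hosts; everything else is refused.
    """
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if d in PRIVATE_SUBNET:
        return s in PUBLIC_SUBNET or s in PRIVATE_SUBNET
    return True  # rules for other destinations are out of scope here


print(PRIVATE_SUBNET, PRIVATE_SUBNET.is_private)  # 10.0.1.0/24 True
print(allowed("203.0.113.7", "10.0.1.20"))        # False
```

In a real VPC these rules would live in the provider's network configuration rather than in application code; the sketch only shows the shape of the policy.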
00:07:38.119 Security is another critical factor. Previously, we relied on our cloud provider for access management. With this transition, we had to control access ourselves, which seemed daunting at first. Availability comes from monitoring our applications, implementing disaster recovery, and making sure we've covered all our bases.
00:08:19.080 Automation became crucial for efficiency. We decided to use Chef for our automation needs; it's a fantastic tool that streamlines processes. It's also important to understand what platform you're using, whether it's Debian-based or something else. We packaged our Rails apps as deb packages and focused on building automation around them.
00:09:11.240 In summary, we identified that our main problems centered around customer domains and whitelists. To mitigate these challenges, we attempted to simulate a data center where we would have dedicated resources and configurations.
00:09:59.400 We decided to stay in the cloud but migrated our servers to a Virtual Private Cloud (VPC) on AWS, which gave us a safe and secure environment. Inside it, we managed our own subnets and used Amazon's load balancer, which relieved us of significant management overhead. It also let us terminate SSL at the load balancer instead of managing certificates on each web server.
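A toy Python model of what terminating SSL at the load balancer means for the web servers. It is not Amazon's implementation: the `Request` and `TerminatingBalancer` classes, the round-robin policy, and the backend addresses are assumptions. `X-Forwarded-Proto` is the conventional header a terminating proxy uses to tell the backend which scheme the client originally used.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Request:
    host: str
    path: str
    headers: dict = field(default_factory=dict)


class TerminatingBalancer:
    """Toy load balancer that terminates SSL before forwarding.

    It holds the certificates, so backends see plain HTTP plus an
    X-Forwarded-Proto header recording the scheme the client used.
    """

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("need at least one backend")
        self._next_backend = itertools.cycle(backends)  # round-robin

    def forward(self, request: Request, scheme: str = "https"):
        """Pick a backend and return it with the request it should receive."""
        backend = next(self._next_backend)
        headers = dict(request.headers)  # copy; don't mutate the caller's dict
        headers["X-Forwarded-Proto"] = scheme
        return backend, Request(request.host, request.path, headers)


lb = TerminatingBalancer(["10.0.1.10", "10.0.1.11"])
backend, upstream = lb.forward(Request("docs.customer-a.example", "/"))
print(backend, upstream.headers)  # 10.0.1.10 {'X-Forwarded-Proto': 'https'}
```

The backends keep private addresses and never touch a certificate, which is the property the VPC move was after.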
00:11:57.750 Incoming requests now go through our load balancer rather than hitting the web servers directly, which removes that security issue. This resolved our primary problems: we could handle multiple customer domains and balance load effectively, and the new architecture was cost-effective.
00:13:38.579 We can scale our infrastructure easily within the private cloud. With the right architecture, we can focus on providing services to our customers without worrying about multiple external IPs cluttering our resources. Furthermore, all the IPs within those subnets are private, enhancing our security posture.
00:14:01.590 Our availability strategies included using backup snapshots and ensuring proper monitoring for disaster recovery. We utilized tools like Cron and Rake for automation, which proved invaluable in maintaining uptime and data consistency.
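The talk says snapshots were driven by Cron and Rake but doesn't describe a retention policy, so the rule below (keep the newest seven, prune the rest) and the `snapshots_to_prune` function are hypothetical. It's a minimal Python sketch of the pruning logic such a scheduled job might run.

```python
from datetime import datetime, timedelta


def snapshots_to_prune(snapshots, keep_last=7):
    """Return snapshot timestamps older than the newest `keep_last` ones.

    Hypothetical retention policy: keep a fixed number of recent
    snapshots and delete everything older.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    ordered = sorted(snapshots, reverse=True)  # newest first
    return ordered[keep_last:]


# Ten nightly snapshots; with keep_last=7 the three oldest are pruned.
start = datetime(2012, 4, 1, 3, 0)
nightly = [start + timedelta(days=i) for i in range(10)]
old = snapshots_to_prune(nightly)
print([ts.date().isoformat() for ts in old])  # ['2012-04-03', '2012-04-02', '2012-04-01']
```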
00:14:35.920 Finally, our automation through tools like Chef not only streamlined our processes but also allowed us to deploy quickly and without hassle. This way, we've minimized our downtime, maintained continuous deployments, and ensured our scaling strategy was efficient.
00:15:51.770 Institutional courage is also critical for success: a company like Bloomfire supports team-driven innovation and encourages radical changes. Thankfully, we tackled all these issues without changing our application at all; the improvements were purely on the infrastructure side.
00:19:09.210 To conclude, I hope you can use our experience as a reference to make the cloud work as effectively as possible for you. So, let's take a page from Rainbow Dash and kick some serious butt! Now, I'd love to open the floor for questions.
00:19:50.490 Regarding automation tools, we've settled on Chef Solo, which serves us well. We have a Git repository for our Chef recipes, and upon standing up a new server, we pull from that repository. It’s a straightforward process that allows for efficient configurations.
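A sketch of that bootstrap step in Python. The talk only says a new server pulls the recipe repository and runs Chef Solo; the checkout path, the `solo.rb`/`node.json` layout, the repository URL, and the `bootstrap_commands` helper are assumptions. The `-c` (config file) and `-j` (JSON attributes) options are standard `chef-solo` flags. The function only builds the commands; on a real server each would be passed to `subprocess.run(cmd, check=True)`.

```python
import shlex
from pathlib import PurePosixPath


def bootstrap_commands(repo_url: str, checkout_dir: str = "/var/chef",
                       already_cloned: bool = False):
    """Build the commands a new server runs to configure itself with Chef Solo.

    Hypothetical layout: the recipe repo has solo.rb and node.json at its root.
    """
    checkout = PurePosixPath(checkout_dir)
    fetch = (["git", "-C", str(checkout), "pull"] if already_cloned
             else ["git", "clone", repo_url, str(checkout)])
    converge = ["chef-solo", "-c", str(checkout / "solo.rb"),
                "-j", str(checkout / "node.json")]
    return [fetch, converge]


for cmd in bootstrap_commands("git@example.com:ops/chef-recipes.git"):
    print(shlex.join(cmd))
```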
00:21:13.300 As far as monitoring goes, we've integrated New Relic to track server performance across our environment. It has been particularly useful for identifying process issues before they become significant problems, and overall it has provided great value in keeping our applications running smoothly.
00:23:02.080 In terms of cost, our total expenses with the new setup came out about 10% lower than what we previously spent while hosted on Engine Yard. By optimizing our infrastructure in the cloud, we've managed not only to improve performance but also to reduce costs.