00:00:21.619
This talk is about outgrowing the cloud. It is more of a first-hand account of my experience using the cloud. For those who don't know me, my name is Mike Moore, and my handle is Voltage. I work at a company called Bloomfire, where we have a really great team and we enjoy our time together. By the way, we are hiring, so if you're looking for a job, feel free to reach out!
00:00:45.870
One of the things I do is run the Mountainless conference. So, by a show of hands, how many people are planning to attend Mountainless in less than two weeks? Excellent! I’m super excited for this year at Mountainless. However, if you look at our logo, it’s very literal, and I don't particularly like it. I hope we can change that by the time the conference starts.
00:01:14.050
Let's talk about the cloud. Everybody loves the cloud. The cloud offers an awful lot. It makes it super easy to stand up a new application, it simplifies hardware provisioning, and it helps with growth and scaling. But I know most of you are thinking about the fun aspects of the cloud. The best thing about the cloud is... well, Rainbow Dash! Rainbow Dash is the coolest of all.
00:01:32.170
Alright, show of hands! Are there any bronies in the audience? Just one? Okay, everyone else, you might want to grab a beer because it might not get any better for you from here.
00:02:25.880
Now, here is a really simple diagram showing the systems architecture. On the bottom, we have the web server, and on top is the API server. I'm going to use color coordination to make this easier to visualize. The red square represents databases. One common practice is to put multiple web servers in place for redundancy in case one falls over, and then you need a load balancer in front of them. This is a typical approach to building applications.
00:02:38.390
Initially, when Bloomfire started, we were on the cloud, and it worked really well for us. A user on the internet would request something from our load balancer, which would hand off to the web server. The web server would communicate with the database, package the data, and send it back to the user. I am a strong believer in PDD (Pain-Driven Development): during development, I prefer not to solve a problem until I absolutely need to.
00:03:21.810
But that changed when I dealt with customer domains. We provide a self-service option for our customers, allowing them to use their own domain. This works fine for most cases. However, it fails spectacularly with HTTPS, because the server has to present the SSL certificate before the client has said which host it is requesting. That meant we needed a separate IP for each customer domain, which led to significant challenges.
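The constraint can be illustrated with a toy model. This is only a sketch, assuming clients that give no hostname hint during the handshake; the IP addresses and certificate names are invented for illustration and are not Bloomfire's.

```python
# Toy model: choosing a certificate when the hostname is not yet known.
# All addresses and certificate names below are hypothetical.
CERT_BY_IP = {
    "203.0.113.10": "cert for *.bloomfire.example",
    "203.0.113.11": "cert for kb.customer-a.example",
    "203.0.113.12": "cert for help.customer-b.example",
}


def certificate_for_connection(destination_ip: str) -> str:
    """Pick the certificate to present during the handshake.

    The HTTP Host header has not been sent yet at this point, so the
    only thing the server can key on is the address the client dialed.
    """
    return CERT_BY_IP[destination_ip]


print(certificate_for_connection("203.0.113.11"))
```

With one certificate per address, every customer domain served over HTTPS needs its own IP.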
00:04:02.490
Users coming in through a Bloomfire domain go one way, but those using customer domains have to take another route. Unfortunately, our cloud provider did not allow multiple IPs on a single instance unless you paid an exorbitant fee. At $80 a month for each IP, that was just too steep, given that we were charging around $100 a month for the service. This model was unsustainable for our business.
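As a quick check of those numbers, here is the arithmetic, assuming one dedicated IP per customer using a custom domain (an assumption; the two dollar figures are the approximate ones quoted above).

```python
# Approximate monthly figures quoted in the talk.
ip_cost_per_month = 80    # dollars per additional IP
price_per_month = 100     # dollars charged for the service (roughly)

# Assumption: each custom-domain customer needs exactly one extra IP.
share = ip_cost_per_month / price_per_month
remaining = price_per_month - ip_cost_per_month

print(f"IP cost takes {share:.0%} of revenue, leaving ${remaining}")
```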
00:04:55.530
A second pain point was customer whitelists. Corporate clients often have strict authentication needs, which means they have to whitelist our servers before we can integrate with their services. This is a challenge because every time we add capacity by adding a new web server, that whitelist has to be updated. As we grow, this has the potential to become a severe obstacle.
00:05:51.229
Additionally, our web servers could be reached directly, which was concerning. Anyone who knew the IP and port could connect to a web server, or even to our database, directly, which is a significant security risk. We wanted to change this and grow our practices beyond what our cloud setup offered at the time.
00:06:38.330
To do this, there were five key things we needed to focus on: networking, security, availability, automation, and understanding our platform. Networking is fundamental; we needed to establish a logical group of networks that could only be accessed in specified ways. This means understanding subnets and ensuring strict control over access. It's important to configure permissions properly to avoid unauthorized access.
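The idea of subnets that can only be reached in specified ways can be sketched with Python's standard `ipaddress` module. The address ranges, zone names, and the database port below are all hypothetical; the talk does not give the real ones.

```python
import ipaddress

# Hypothetical address plan; not the real Bloomfire ranges.
SUBNETS = {
    "web": ipaddress.ip_network("10.0.1.0/24"),
    "db": ipaddress.ip_network("10.0.2.0/24"),
}

# Allowed (source zone, destination zone, port) triples.
# Port 5432 is an illustrative choice, not taken from the talk.
ALLOWED = {("web", "db", 5432)}


def zone_of(address: str):
    """Return the name of the subnet containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SUBNETS.items():
        if ip in network:
            return name
    return None


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Permit traffic only when the zone pair and port are listed."""
    return (zone_of(src), zone_of(dst), port) in ALLOWED


print(is_allowed("10.0.1.15", "10.0.2.20", 5432))     # web -> db
print(is_allowed("198.51.100.7", "10.0.2.20", 5432))  # outside -> db
```

Anything not listed is denied by default, so a host outside the known subnets cannot reach the database even if it knows the address and port.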
00:07:38.119
Security is another critical factor. Previously, we relied on our cloud provider for access management. However, as we made this transition, we needed to control access ourselves, which appeared daunting at first. The availability of the system comes from monitoring applications, implementing disaster recovery, and ensuring that we cover all our bases.
00:08:19.080
Automation became crucial for efficiency. We decided to use Chef for our automation needs. It's a fantastic tool that streamlines processes. It's also important to understand what platform you’re using, whether it’s a Debian-based platform or something else. We packaged up our Rails apps as deb packages and focused on creating automation solutions.
00:09:11.240
In summary, we identified that our main problems centered around customer domains and whitelists. To mitigate these challenges, we attempted to simulate a data center where we would have dedicated resources and configurations.
00:09:59.400
We decided to stay in the cloud but migrated our servers to a Virtual Private Cloud (VPC) on AWS, which provided us with a safe and secure environment. In this cloud, we could manage our subnets and the load balancer from Amazon, which relieved us of significant management overhead. This also allowed us to use SSL via the load balancer without needing to manage certificates on the web servers directly.
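The arrangement can be pictured with a toy model. It is a sketch only, assuming each domain gets its own load balancer front end that holds that domain's certificate and forwards plain HTTP to shared web servers on private addresses. The talk does not spell out the exact layout, and every name and address here is hypothetical.

```python
from itertools import cycle


class LoadBalancer:
    """Toy front end: holds the certificate, forwards plain HTTP inward."""

    def __init__(self, domain, certificate, backends):
        self.domain = domain
        self.certificate = certificate
        self._backends = cycle(backends)  # simple round-robin

    def handle(self, path):
        # TLS ends here; the chosen web server never sees the certificate.
        backend = next(self._backends)
        return f"http://{backend}{path}"


# Hypothetical private addresses shared by every front end.
WEB_SERVERS = ["10.0.1.11:8080", "10.0.1.12:8080"]

main = LoadBalancer("bloomfire.example", "bloomfire cert", WEB_SERVERS)
custom = LoadBalancer("kb.customer-a.example", "customer-a cert", WEB_SERVERS)

print(main.handle("/posts"))
print(custom.handle("/posts"))
```

In this model the certificates live only on the front ends, so adding a web server means appending one private address to `WEB_SERVERS`.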
00:11:57.750
As requests come in, they go through our load balancer and are handled without the earlier security issues. This resolved our primary problems, handling multiple customer domains and load balancing, at a cost that made sense for us.
00:13:38.579
We can scale our infrastructure easily within the private cloud. With the right architecture, we can focus on providing services to our customers without worrying about multiple external IPs cluttering our resources. Furthermore, all the IPs within those subnets are private, enhancing our security posture.
00:14:01.590
Our availability strategies included taking backup snapshots and setting up proper monitoring for disaster recovery. We used tools like cron and Rake to automate this, which proved invaluable in maintaining uptime and data consistency.
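One check a scheduled job could run is whether the most recent backup snapshot is too old. The sketch below is illustrative only; the 24-hour threshold and the function name are assumptions, not details from the talk.

```python
from datetime import datetime, timedelta


def snapshot_is_stale(last_snapshot: datetime, now: datetime,
                      max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True when the newest snapshot is older than max_age.

    The 24-hour default is an assumed threshold for illustration.
    """
    return now - last_snapshot > max_age


now = datetime(2013, 3, 1, 12, 0)
print(snapshot_is_stale(datetime(2013, 3, 1, 2, 0), now))    # 10 hours old
print(snapshot_is_stale(datetime(2013, 2, 27, 2, 0), now))   # over 2 days old
```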
00:14:35.920
Finally, our automation through tools like Chef not only streamlined our processes but also allowed us to deploy quickly and without hassle. This way, we've minimized our downtime, maintained continuous deployments, and ensured our scaling strategy was efficient.
00:15:51.770
Institutional courage is also critical for success. A company like Bloomfire supports team-driven innovation and encourages making radical changes. Thankfully, we managed to tackle all these issues without altering our application directly; we made these improvements purely on the infrastructure side.
00:19:09.210
To conclude, I hope that, using our experience as a reference, you can shape your own cloud environment to be as effective as possible. So, let's take a page from Rainbow Dash and kick some serious butt! Now, I’d love to open the floor for questions.
00:19:50.490
Regarding automation tools, we've settled on Chef Solo, which serves us well. We have a Git repository for our Chef recipes, and upon standing up a new server, we pull from that repository. It’s a straightforward process that allows for efficient configurations.
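The bootstrap flow described here can be sketched as two commands: clone the recipes, then run Chef Solo against them. The repository URL, checkout directory, and the `solo.rb` and `node.json` file names are hypothetical. The function only builds the command lists and does not execute anything.

```python
def bootstrap_commands(repo_url: str, checkout_dir: str) -> list:
    """Build the commands a new server would run, without running them.

    The file names solo.rb and node.json are assumed for illustration.
    """
    return [
        # Fetch the Chef recipes from the Git repository.
        ["git", "clone", repo_url, checkout_dir],
        # Run Chef Solo with a config file (-c) and JSON attributes (-j).
        ["chef-solo",
         "-c", f"{checkout_dir}/solo.rb",
         "-j", f"{checkout_dir}/node.json"],
    ]


for command in bootstrap_commands("git@git.example:ops/chef-repo.git",
                                  "/opt/chef-repo"):
    print(" ".join(command))
```

Keeping the commands as lists makes them easy to hand to something like `subprocess.run` later without shell quoting problems.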
00:21:13.300
As far as monitoring goes, we've integrated New Relic to track server performance across our environment. This tool has been particularly beneficial in identifying process issues before they become significant problems. Overall, it has provided great value, allowing us to keep applications running smoothly.
00:23:02.080
In terms of cost, we found the total expense of the new setup to be 10% lower than what we previously spent while hosted on Engine Yard. By optimizing our infrastructure, we've managed not only to improve performance but also to reduce costs.