Castle On a Cloud: The GitHub Story

In "Castle On a Cloud: The GitHub Story", Ben Bleything of GitHub describes the infrastructure behind the company's internal applications, most of which run on AWS and Heroku. GitHub.com gets most of the attention, but this talk covers the rest of the cloud footprint: roughly 350 AWS instances, 250 Heroku dynos, and assorted other cloud services.

Key points discussed include:
- Infrastructure Overview: GitHub runs hundreds of internal applications, most of them hosted on Amazon EC2 and Heroku.
- ChatOps: GitHub's distributed Ops team works largely through ChatOps, automating operational tasks and running them from Campfire chat rooms with tools like Hubot.
- Data Center Specifics: The main GitHub.com site runs out of a data center in Virginia, mostly on high-density Dell C-series sled servers, with no virtualization in the path that serves the website.
- Resource Management in AWS: GitHub utilizes various AWS services, such as EC2, RDS, S3, CloudFront, and more, to manage their resources while facing challenges like AWS's resource limits, particularly with S3 buckets.
- Identity and Access Management (IAM): Bleything discusses utilizing IAM to manage AWS credentials more efficiently, emphasizing the ability to consolidate storage locations while ensuring proper access controls.
- Operational Challenges and Solutions: Internal decisions, combined with AWS's limit on S3 bucket creation, pushed the team toward a workaround: a single bucket with a separate prefix for each environment. This significantly reduced their bucket count.
- Cloud AWS Tool: The Cloud AWS tool allows team members to interact with their EC2 resources via chat, showcasing how automation streamlines operational tasks and improves response time.
- Database Management: Bleything explains that the team prefers to manage databases with MySQL rather than Postgres on Heroku because of existing expertise and tooling, although some developers do use Postgres.
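
The single-bucket workaround mentioned above can be sketched roughly as follows. This is a minimal illustration, not GitHub's actual layout: the bucket name, the environment names, and the `app/environment/path` key scheme are all assumptions made for the example, and no AWS calls are made.

```python
from dataclasses import dataclass

# Hypothetical environment names; the talk summary does not list the real ones.
VALID_ENVIRONMENTS = ("production", "staging", "development")

# Hypothetical shared bucket that replaces one bucket per app and environment.
SHARED_BUCKET = "example-shared-bucket"


@dataclass(frozen=True)
class ObjectLocation:
    """Where an object lives: one shared bucket plus a prefixed key."""
    bucket: str
    key: str


def prefix_for(app: str, environment: str) -> str:
    """Return the key prefix that stands in for a dedicated bucket."""
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return f"{app}/{environment}/"


def locate(app: str, environment: str, path: str) -> ObjectLocation:
    """Map (app, environment, path) onto a key inside the shared bucket."""
    # Strip leading slashes so the key never contains an empty path segment.
    clean_path = path.lstrip("/")
    return ObjectLocation(SHARED_BUCKET, prefix_for(app, environment) + clean_path)


if __name__ == "__main__":
    print(locate("billing", "staging", "/reports/2014-02.csv"))
```

Because every environment shares one bucket, access control under this kind of scheme would typically be scoped by prefix (for example, through IAM policies) rather than by bucket.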

In conclusion, Bleything's talk covers the complexities of running GitHub's large cloud infrastructure and the strategies the team uses to manage it: automation, careful resource management, and ChatOps. The main takeaway is that well-integrated cloud services and tooling improve operational capability while helping the team work around provider resource limits.

Ben Bleything • February 20, 2014 • Earth

When you think "GitHub", you're probably thinking of what we lovingly refer to as GitHub Dot Com: The Web Site. GitHub Dot Com: The Web Site runs on an incredibly interesting infrastructure composed of very powerful, cleverly configured, and deeply handsome servers. This is not their story.

This is the story of the other 90% of our infrastructure. This is the story of the 350 AWS instances, 250 Heroku dynos, and dozens of Rackspace Cloud, Softlayer, and ESX VMs we run. This is a story of tooling and monitoring, of happiness and heartbreak, and, ultimately, of The Cloud.

Big Ruby 2014