Ruby Video
Description
GitLab, the open source alternative to GitHub written in Rails, does not scale automatically out of the box: it stores its Git repositories on a single filesystem, which makes storage capacity hard to expand. Rather than attaching a NAS server, we decided to replace the filesystem with cloud-based object storage (such as S3). This required changes to both the Ruby layer and the deeper C layers. In this talk, we will show how we made the change and overcame the performance loss introduced by network I/O. We will also show how we achieved high availability after the changes.
Summary
In this talk presented at RailsConf 2016, Minqi Pan discusses the complexities of scaling GitLab, an open-source alternative to GitHub, for large organizations like Alibaba. Due to GitLab's architecture, which relies on a single filesystem to store git repositories, scaling becomes a significant challenge. Instead of using traditional NAS for file storage, the decision was made to transition to a cloud-based object storage solution like Alibaba OSS, similar to Amazon S3. This transition necessitated a careful redesign of both the Ruby layer and lower-level C components to cope with performance degradation from network I/O.

Key points covered include:

- **GitLab Architecture**: GitLab functions as a black box that operates through HTTP and SSH, with backends like PostgreSQL and Redis. The architecture can create bottlenecks when scaling due to its reliance on a traditional filesystem for repository storage.
- **Innovative Solutions**: Pan introduced 'ssh-to-http', a project to translate SSH requests into HTTP requests to simplify server interactions. Load balancing was handled through IPVS for more efficient traffic management during requests.
- **Cloud Migration**: The need to move GitLab’s storage to a cloud solution was emphasized due to the limitations of existing architecture. Moving to Alibaba OSS aligned better with the need for scalability and ease of maintenance.
- **Handling Git Operations**: The talk detailed how the design connected Git to LibGit2, transforming how data retrieval operations were performed, especially dealing with packed content, which posed unique challenges due to its structure in Git.
- **Performance Concerns**: Initial performance benchmarks indicated slower responses post-transition due to the necessary switch from fast filesystem operations to slower HTTP communications. Caching strategies were examined as a means to mitigate these performance losses.
- **Future Developments**: Pan also hinted at exploring the potential of creating an AWS S3 compatible back-end to enhance deployment flexibility.

In conclusion, effective caching and innovative architectural adjustments are critical to overcoming performance issues encountered with the shift to cloud storage. This talk serves as a comprehensive overview for Rails developers facing similar scaling issues with distributed applications.
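The caching idea mentioned above can be illustrated with a minimal Ruby sketch. It is not the speaker's or GitLab's implementation, and the summary does not say which caching strategy the talk actually used. The class names `FakeObjectStore` and `CachedObjectStore` are hypothetical, and the "remote" store is an in-memory stand-in that only counts fetches. The sketch relies on one property of Git: objects are addressed by a hash of their content, so a cached object never goes stale and needs no invalidation (mutable data such as refs would need different handling).

```ruby
require "digest"

# Hypothetical stand-in for a remote object store (an S3-like service).
# It counts fetches so the number of simulated network round trips is visible.
class FakeObjectStore
  attr_reader :fetch_count

  def initialize
    @blobs = {}
    @fetch_count = 0
  end

  # Store content under its SHA-1 to mimic content addressing.
  # (Real Git also hashes a type/length header; that detail is omitted here.)
  def put(content)
    key = Digest::SHA1.hexdigest(content)
    @blobs[key] = content
    key
  end

  # Each call represents one slow network request.
  def get(key)
    @fetch_count += 1
    @blobs.fetch(key)
  end
end

# Read-through cache with least-recently-used eviction, placed in front of
# any backend that responds to #get. Safe only for immutable objects.
class CachedObjectStore
  def initialize(backend, max_entries: 1024)
    @backend = backend
    @max_entries = max_entries
    @cache = {} # Ruby hashes keep insertion order, which serves as LRU order
  end

  def get(key)
    if @cache.key?(key)
      value = @cache.delete(key)
      @cache[key] = value # re-insert to mark as most recently used
      return value
    end

    value = @backend.get(key)
    @cache[key] = value
    @cache.shift if @cache.size > @max_entries # drop the least recently used
    value
  end
end

remote = FakeObjectStore.new
sha = remote.put("hello\n")
cached_store = CachedObjectStore.new(remote, max_entries: 2)
3.times { cached_store.get(sha) }
puts remote.fetch_count # only the first read reaches the backend
```

In a real deployment the cache would more likely live on local disk or in a shared service such as Redis, and the backend would issue HTTP requests to the object store; those choices are outside what this summary describes.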