Twitter Mobile

LA RubyConf 2011

Benjamin Sandofsky

#ruby

#mobile-development

#continuous-deployment

#agile-development

#ruby-on-rails

The video, titled 'Twitter Mobile', features Benjamin Sandofsky, an engineer from Twitter's mobile team, who discusses the architecture and engineering practices of Twitter, particularly focusing on its mobile applications. The main topic centers on how Twitter scales its services to manage a significant volume of tweets and user interactions, especially via mobile platforms.

Key Points Discussed:
- Volume of Tweets: Twitter experiences around 110 million tweets daily, with approximately 40% originating from mobile devices.
- Architecture Overview: Sandofsky provides a high-level understanding of Twitter's architecture, explaining the flow of a tweet from the client to processing and delivery. The architecture involves a core codebase being transitioned into a service-oriented architecture.
- Mobile Platform: Emphasis on mobile.twitter.com and SMS services illustrates the importance of mobile traffic. The mobile site is a pure API client that enhances user experience while maintaining backend efficiency.
- Development Practices: The presentation highlights best practices derived from Ruby, including iterative development, automated testing, and code reviews which improve efficiency and quality control.
- Case Study: Sandofsky shares specific examples from mobile development, such as the creation of a REST client in only 51 days and improvements concerning API connections.
- Scaling Challenges: Discussion on the complexities of managing SMS services directly with numerous carriers, emphasizing the intricacies in business agreements, service configurations, and logging disconnections.
- Deployment and Testing: Twitter's continuous deployment strategy includes daily updates and thorough automated testing, facilitating rapid iteration and quality assurance.
- Team Dynamics: The importance of a generalist development approach fosters flexibility and avoids knowledge silos within the team.

Conclusions and Takeaways:
- Twitter’s engineering culture effectively combines people and processes to deliver quality service at scale.
- The use of Ruby in various applications demonstrates the balance between rapid development and robust architecture tailored for mobile needs.
- Cultural values, such as collaboration and iterative development, are essential for sustaining growth and innovation within Twitter’s mobile team. Sandofsky concludes with an invitation for engagement, emphasizing the importance of technical proficiency and cultural fit in potential candidates for his team.

00:00:29.760 My name is Ben Sandofsky. As you mentioned, I work on the mobile team, and the reason I'm here today is that Twitter is one of the largest companies running Ruby in production today.

00:00:35.920 To give you some scope of how large we are, we currently have about 110 million tweets coming in every day to the service. Given the one-to-many nature of Twitter's relationships, the actual number of tweets being delivered every day is considerably larger.

00:00:47.840 The record for tweets per second is about 6,939, which occurred on New Year's Day this year. They actually overlaid a map of the world showing some epicenters where the largest number of tweets were coming through on that day. You can check it out on the blog.

00:01:05.840 As far as my team is concerned, about 40 percent of those tweets coming in daily originate from mobile. This traffic could be through a native app like Twitter for iPhone, the mobile website mobile.twitter.com, or our SMS service, which is especially huge in countries where smartphones are not prevalent.

00:01:30.320 Today, I'll cover the general architecture of Twitter so you can see where mobile fits into the bigger picture. Then, I'll provide some case studies from our mobile team, focusing on three areas where we currently use Ruby because our team loves it and utilizes it whenever possible. Out of these, we're probably going to dive the deepest into the mobile website since most of you are likely familiar with using Ruby on Rails for web apps.

00:02:00.000 I'll wrap up by discussing some best practices we co-opted from Ruby early in the company’s development, which I think were major influences on our engineering best practices. Hopefully, we'll also have some time at the end to answer questions.

00:02:21.520 Let's start by discussing the architecture and where mobile fits in. If you open your favorite client, type 'Hello, world!', and hit tweet, it goes through a sizable chart that I'm going to break down into three parts.

00:02:38.959 As you load your client and send a tweet, it is sent to one of our front ends. If you're on twitter.com, it goes to our web front end, which runs Rails; if you're using a native app, it goes to api.twitter.com. mobile.twitter.com is actually a 100% pure API client: it sits on top of api.twitter.com as a thin layer.

00:03:03.519 Your tweet then enters the core code that handles all business rules, including who's following whom and the various lookups on accounts. For now, this is handled in one relatively large codebase which we are splitting into a service-oriented architecture. We're still in the process of decoupling all front-end code from the core code, but this is the architecture we are striving for.

00:03:41.120 I won't go into detail about the services making up the core code, but I can mention some examples such as our social graph service powered by Flock, our open-source graph database, our user service on MySQL, and a geo service for timelines. If you’re interested, feel free to catch me afterwards, and I’ll be happy to discuss those in more detail.

00:04:06.720 This part of the chart covers the request-response cycle. When you hit tweet, your tweet routes to the main code. We verify that we've actually registered your tweet, but it isn't delivered to users immediately. Within the request-response cycle, we return a 200 status, but what we have actually done is drop your tweet into a queue.

00:04:17.359 The reason for this design is mainly to ensure performance and scalability. Essentially, we have a big queue full of tweets and a series of daemons on the other side of this queue processing the tweets one at a time and handling them as necessary. For example, when we send a tweet, there's a daemon responsible for checking the social graph to see who follows that user and subsequently delivering the tweet to all of those followers' timelines.
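The accept-then-queue design described above can be sketched in a few lines of Ruby. This is a minimal in-memory illustration, not Twitter's code: `FOLLOWERS`, `TIMELINES`, `TWEET_QUEUE`, `post_tweet`, and `run_fanout_daemon` are hypothetical names, and a thread stands in for a separate daemon process.

```ruby
require "thread"

# Hypothetical in-memory stand-ins for the social graph and the timelines.
FOLLOWERS = { "alice" => ["bob", "carol"] }            # author => followers
TIMELINES = Hash.new { |hash, user| hash[user] = [] }  # user => delivered tweets
TWEET_QUEUE = Queue.new

# Request-response side: register the tweet, enqueue it, answer right away.
def post_tweet(author, text)
  TWEET_QUEUE << { author: author, text: text }
  200 # the client gets its response before any delivery happens
end

# Daemon side: pull tweets off the queue one at a time and fan them out.
def run_fanout_daemon
  Thread.new do
    while (tweet = TWEET_QUEUE.pop) # a nil entry acts as the shutdown signal
      FOLLOWERS.fetch(tweet[:author], []).each do |follower|
        TIMELINES[follower] << tweet
      end
    end
  end
end

daemon = run_fanout_daemon
status = post_tweet("alice", "Hello, world!")
TWEET_QUEUE << nil # ask the daemon to stop once the queue drains
daemon.join

puts status
puts TIMELINES["bob"].first[:text]
```

The point of the split is that `post_tweet` does a constant amount of work regardless of how many followers the author has; the cost of delivery is paid later by the daemon.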

00:04:45.759 We have other daemons that handle SMS and push notifications, checking if users have devices linked to their accounts, or if any followers have devices linked. We also have the Streaming API consumers that pull tweets off and distribute them to Firehose or User Stream consumers. These are just examples of the numerous daemons we utilize for various purposes.

00:05:22.080 We've already touched on several areas where the mobile team plays a role, like the mobile website and SMS functions. Now, I'd like to do a quick show of hands: how many here have ever had to interoperate with an SMS carrier or operator?

00:05:39.759 Now, of that group, how many actually connect directly to the carriers instead of using services like Twilio? This is a smaller group. For those unfamiliar, turning on a carrier is quite a bit of work. It's not as simple as just pointing to a phone number and sending a text message. We first have to establish business agreements with each carrier worldwide, such as AT&T, Verizon, and T-Mobile in the US; there are literally hundreds of carriers globally.

00:06:30.720 For each carrier, we set up agreements outlining what we can and cannot do with users. We also need to configure each carrier, since each country or carrier may have a different short code. In the US we use 40404, but elsewhere it depends on what's available in each country. We also need to internationalize all commands and manage the disconnection lists.

00:06:55.199 When a user turns off their phone service, the carrier must log that number. They periodically upload to us a list of phone numbers that are no longer with them so we can purge them from our database. If not, someone else might receive direct messages intended for that number, which can be quite awkward.

00:07:13.680 As for the codebase itself, it is divided into two parts: the web service portion that manages incoming SMS messages from the carriers and handles business logic, and the outgoing code, represented by a set of daemons calling out to the carriers, often using a SOAP-like interface, and in some instances, persistent connections.

00:07:37.919 The main code, as I mentioned, handles a lot of the product logic. For instance, when we pull a tweet off the queue to dispatch to a user, we first check whether the device is asleep. Users can choose settings to specify that they don’t want to receive SMS notifications from midnight to 6 AM, or they can adjust other configurations before we push the message to the carrier.
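A "device asleep" check like the one described could look roughly like this. It is a sketch under assumptions: the method names and the `sleep_start`/`sleep_end` settings keys are invented for illustration, and hours are taken to be 0-23 in the user's local time.

```ruby
# Returns true when `hour` (0-23, user's local time) falls inside the quiet
# window. The window may wrap past midnight, for example 22 to 6.
def device_asleep?(hour, sleep_start, sleep_end)
  return false if sleep_start == sleep_end # no quiet window configured

  if sleep_start < sleep_end
    hour >= sleep_start && hour < sleep_end
  else
    hour >= sleep_start || hour < sleep_end
  end
end

# Decides whether an SMS may be pushed to the carrier right now.
def deliver_sms?(hour, settings)
  !device_asleep?(hour, settings[:sleep_start], settings[:sleep_end])
end

settings = { sleep_start: 0, sleep_end: 6 } # midnight to 6 AM
puts deliver_sms?(3, settings) # false
puts deliver_sms?(9, settings) # true
```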

00:08:03.280 For incoming SMS, we expose an endpoint that allows carriers to make RESTful calls to post messages. Incoming messages aren't only tweets; users can also issue text commands like follow or block. Thus, we must have that layer to handle and respond to those commands within the app.
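To illustrate the command layer, here is a minimal sketch of classifying an incoming SMS body as either a command or a plain tweet. The `COMMANDS` list holds only the two commands named in the talk, and `parse_sms` and its result format are assumptions made for this example.

```ruby
# Hypothetical command set; the talk names only follow and block.
COMMANDS = %w[follow block].freeze

# Classifies an incoming SMS body as a command or a plain tweet.
# A command is a known keyword followed by exactly one username.
def parse_sms(body)
  text = body.strip
  word, argument = text.split(/\s+/, 2)

  if word && COMMANDS.include?(word.downcase) && argument.to_s.match?(/\A@?\w+\z/)
    { type: :command, name: word.downcase.to_sym, target: argument.sub(/\A@/, "") }
  else
    { type: :tweet, text: text }
  end
end

p parse_sms("FOLLOW @ben")
p parse_sms("Hello, world!")
```

Requiring a single username after the keyword keeps a sentence such as "follow me on twitter" from being treated as a command; it is posted as a tweet instead.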

00:08:54.720 Currently, the app is a Rails application. The logic comprises about 2,500 lines of code, with a code-to-test ratio of about 1:1.5, meaning there are more tests than actual code.

00:09:38.560 When we develop daemons to connect to the carriers, or, for instance, to enable Apple Push Notifications, we simply subclass our daemon base class, override one method, and put all the necessary logic there. We've refactored our daemon code into a core set of classes that are used across the entire company.

00:10:00.160 The key advantage here is that our operations team understands how to monitor and spin up new instances of these daemons whenever we need them, making it a unified codebase shared across all Ruby-based daemons in the company.

00:10:23.440 Regarding the logic set in that single overridden method, it's about 300 lines of code which handles both calling Apple and receiving disconnection notifications when users uninstall the iPhone app, along with various sanity checks.
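The subclass-and-override pattern can be sketched as follows. `QueueDaemon`, `PushNotificationDaemon`, and `#process` are hypothetical names; the shared base class in the talk is not shown in the transcript, and a plain Array stands in for the real queue.

```ruby
# Hypothetical base class: it owns the loop and error handling, so every
# daemon behaves the same way operationally.
class QueueDaemon
  def initialize(queue)
    @queue = queue
  end

  # Drains the queue, handing each message to #process.
  def run
    until @queue.empty?
      message = @queue.shift
      begin
        process(message)
      rescue StandardError => e
        warn "#{self.class.name} failed on #{message.inspect}: #{e.message}"
      end
    end
  end

  # The single method a subclass is expected to override.
  def process(_message)
    raise NotImplementedError, "#{self.class.name} must implement #process"
  end
end

# A new daemon only supplies its own per-message logic.
class PushNotificationDaemon < QueueDaemon
  attr_reader :sent

  def initialize(queue)
    super
    @sent = []
  end

  def process(message)
    # Sanity check before doing any work.
    raise ArgumentError, "missing device token" unless message[:device_token]

    @sent << "push to #{message[:device_token]}: #{message[:text]}"
  end
end

daemon = PushNotificationDaemon.new([
  { device_token: "abc", text: "hi" },
  { text: "no token" } # rejected by the sanity check and logged
])
daemon.run
puts daemon.sent
```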

00:10:53.120 In large distributed systems, various failures can occur. For instance, we've seen cases where multiple duplicate SMSs arrive. Hence, when you're dealing with hundreds of daemons all at once, it's crucial to implement sanity checks to avoid such issues. In the U.S., costs are associated with each SMS sent.
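One simple form of sanity check against duplicate SMSs is to remember a fingerprint of each message already sent. The sketch below is an assumption about how such a guard might look, not the approach described in the talk; `DuplicateGuard` is a made-up name, and with hundreds of daemons the set of fingerprints would have to live in shared, expiring storage rather than in process memory.

```ruby
require "digest"
require "set"

# Hypothetical guard that remembers which (recipient, body) pairs were sent.
class DuplicateGuard
  def initialize
    @seen = Set.new
  end

  # Returns true the first time a message is seen and false for repeats.
  def first_delivery?(phone_number, body)
    fingerprint = Digest::SHA256.hexdigest("#{phone_number}\0#{body}")
    !@seen.add?(fingerprint).nil? # Set#add? returns nil if already present
  end
end

guard = DuplicateGuard.new
puts guard.first_delivery?("+15550100", "Hello, world!") # true
puts guard.first_delivery?("+15550100", "Hello, world!") # false
```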

00:11:22.360 We must also ensure payload formatting for Apple’s notifications and conduct auditing to track which carriers receive more SMSs over time. For example, we've seen spikes in activity in places like Cairo recently.

00:11:50.720 Switching gears, the previous example illustrated how we use Ruby in a web service context. Now, I want to focus on how we develop a web front end using Rails for the mobile website, which I believe serves as an excellent example of agile product development.

00:12:15.760 This project was developed by just two engineers in 51 days, from the first commit to the public preview. During that time, we moved the tech stack from what the existing web apps at Twitter were running on to Unicorn, running in production at Twitter.

00:12:43.640 Shortly before the production deployment, we switched once more to Rainbows for concurrency reasons. Although adopting higher performance systems is great, it comes with operational overhead, requiring us to rewrite Puppet configurations and work with our ops teams.

00:13:02.640 Within those same 51 days, we also developed a REST client from scratch, which I’ll elaborate on soon. Importantly, we maintained a strong focus on code quality, ensuring we kept a 1:1 code to test ratio.

00:13:25.200 The overall goal of this project was to create a framework that makes connecting to the API easier. We believe that you don’t need to be an advanced Ruby hacker to effectively utilize Ruby.

00:13:52.720 Development of a domain-specific language revolved around connecting to the Twitter API, allowing us to rapidly implement new API endpoints as they became available. Here’s an example of our REST client making a call to the home timeline.

00:14:28.080 The 'current user' is simply an instance based on our cookie, and the API call fetches their home timeline. The method is dynamically generated by making that call within the class, leading to our domain-specific language creation for connecting to the API.

00:14:58.800 This syntax allows us to easily establish connections with specific API paths while mapping the returned JSON data to appropriate Ruby classes. The language is intuitive, and once you take a few moments to learn it, you can rapidly define your full API in just a page. This significantly reduces boilerplate code and minimizes errors.
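The slide code is not part of the transcript, so the following is only a sketch of what such a DSL might look like. `ApiClient`, the `get` macro, the `as:` option, `Status`, and the injected transport are all hypothetical; the endpoint paths are illustrative.

```ruby
require "json"

# Hypothetical DSL: `get` is a class-level macro that defines one instance
# method per API endpoint and maps the returned JSON onto a Ruby class.
class ApiClient
  def self.get(name, path, as:)
    define_method(name) do |params = {}|
      body = @transport.call(path, params) # raw JSON string
      JSON.parse(body).map { |attrs| as.new(attrs) }
    end
  end

  # `transport` is anything responding to #call(path, params).
  def initialize(transport)
    @transport = transport
  end
end

# Minimal value class wrapping the parsed JSON attributes.
Status = Struct.new(:attributes) do
  def text
    attributes["text"]
  end
end

# The whole API surface is declared in a few lines.
class TwitterClient < ApiClient
  get :home_timeline, "/1/statuses/home_timeline.json", as: Status
  get :mentions,      "/1/statuses/mentions.json",      as: Status
end

# Canned transport standing in for a real HTTP call.
fake_transport = lambda do |path, _params|
  JSON.generate([{ "text" => "Hello from #{path}" }])
end

client = TwitterClient.new(fake_transport)
puts client.home_timeline.first.text
```

Because each endpoint is one declaration, adding a new API call means adding one line rather than writing another method full of HTTP and parsing boilerplate.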

00:15:42.080 However, the challenge doesn't just end there; we also need to map the JSON responses back to user-friendly Ruby objects. This is accomplished through an objectification module that also provides class methods for describing how JSON keys map onto the attributes of Ruby objects.

00:16:10.240 We can define attributes we want accessors for, along with specific parsing behaviors for certain keys. For example, when dealing with tweets, we take the user key and its corresponding sub-hash and pass it to the User class, implementing the same DSL mechanism to objectify it.
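A module along these lines might be sketched as below. The module name `Objectify` and the class methods `attributes`, `attribute`, and `from_hash` are assumptions chosen for this example; only the idea (class-level declarations mapping JSON keys to attributes, with nested objects parsed by another class) comes from the talk.

```ruby
# Hypothetical module: class methods describe how hash keys become attributes.
module Objectify
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Attribute name => optional parser (nil means "use the value as-is").
    def attribute_parsers
      @attribute_parsers ||= {}
    end

    # Declares plain attributes with reader methods.
    def attributes(*names)
      names.each { |name| attribute(name) }
    end

    # Declares one attribute, optionally parsed by `with` (responds to #call).
    def attribute(name, with: nil)
      attribute_parsers[name] = with
      attr_reader name
    end

    # Builds an instance from a parsed JSON hash with string keys.
    def from_hash(hash)
      new.tap do |object|
        attribute_parsers.each do |name, parser|
          raw = hash[name.to_s]
          value = parser && !raw.nil? ? parser.call(raw) : raw
          object.instance_variable_set("@#{name}", value)
        end
      end
    end
  end
end

class User
  include Objectify
  attributes :screen_name, :name
end

class Tweet
  include Objectify
  attributes :id, :text
  # The "user" sub-hash is handed to User, which objectifies it the same way.
  attribute :user, with: User.method(:from_hash)
end

tweet = Tweet.from_hash(
  "id" => 1,
  "text" => "Hello, world!",
  "user" => { "screen_name" => "ben", "name" => "Ben" }
)
puts tweet.user.screen_name
```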

00:17:08.640 One significant topic I want to address is scaling, which may seem daunting but isn't as challenging as it appears. For instance, many developers use the JSON C library, when the YAJL C library for parsing JSON performs significantly faster based on benchmarks.

00:17:43.680 For our mobile website, we had to create our own CAPTCHAs: the site has to work on low-end devices, which ruled out reCAPTCHA. Surprisingly, few people have explored GD2, an excellent image library that is significantly faster for image processing than ImageMagick.

00:18:24.720 We also built a simplified wrapper around GD2 to manage memory allocation, making it straightforward for users to work with. Mindfulness about object allocations can improve CPU performance tremendously.

00:18:55.760 While many assume that garbage collectors handle memory management entirely, awareness of memory allocations remains crucial, particularly in high-load systems. Understanding the implications behind your code choices can lead to significant performance improvements.
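As a small illustration of how code choices affect allocations, the sketch below counts allocated objects for two ways of building the same string. It assumes CRuby, where `GC.stat(:total_allocated_objects)` is available; the `allocations` helper and the example data are made up for this purpose.

```ruby
# Counts objects allocated while the block runs (CRuby only).
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

WORDS = Array.new(1_000) { |i| "word#{i}" }.freeze

# `+=` builds a brand-new String on every iteration.
concat_with_plus = allocations do
  result = ""
  WORDS.each { |word| result += word }
end

# `<<` appends in place, reusing one String.
concat_in_place = allocations do
  result = +""
  WORDS.each { |word| result << word }
end

puts "+= allocated #{concat_with_plus} objects"
puts "<< allocated #{concat_in_place} objects"
```

Every object allocated is work for the garbage collector later, which is why a seemingly equivalent rewrite can show up as a CPU difference under load.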

00:19:39.088 Although we discussed CPU usage on the front end, we need to focus on concurrency in our web applications. For example, if you've optimized your app to process requests in 10 milliseconds but find you're often waiting 100 milliseconds on API calls or database queries, that can severely affect performance.

00:20:30.080 In testing our app, we discovered most of our time was spent on I/O-bound tasks rather than CPU-intensive operations. By batching requests efficiently, we could achieve many more results without overloading our servers.

00:21:21.440 Ruby’s concurrency model can be restrictive, which led us to explore alternatives and settle on a library called Typhoeus for making RESTful requests. Typhoeus can run multiple requests in parallel despite the global interpreter lock.

00:22:00.800 With this, we implemented a structure that lets us batch different REST calls and wait for the whole batch to finish, while other threads keep running without being blocked.
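The benefit of batching I/O-bound calls can be shown with the standard library alone. This sketch deliberately uses plain threads and `sleep` as a stand-in for network latency; it is not Typhoeus's API, and `fake_api_call`, `fetch_batch`, and the latencies are invented. Like blocking I/O, `sleep` releases the global interpreter lock, so the waits overlap.

```ruby
require "benchmark"

# Stand-in for a remote API call: sleeps to simulate network latency.
def fake_api_call(name, latency)
  sleep(latency)
  "#{name} response"
end

# Three independent calls, each waiting about 100 ms.
CALLS = { timeline: 0.1, mentions: 0.1, profile: 0.1 }.freeze

# Starts every call in its own thread, then waits for all of them.
def fetch_batch(calls)
  threads = calls.map do |name, latency|
    Thread.new { [name, fake_api_call(name, latency)] }
  end
  threads.map(&:value).to_h # Thread#value joins and returns the result
end

results = nil
elapsed = Benchmark.realtime { results = fetch_batch(CALLS) }

puts results.inspect
puts format("batched: %.2fs, sequential would be about %.2fs", elapsed, CALLS.values.sum)
```

Run sequentially, the three calls would cost roughly the sum of their latencies; batched, the total is close to the slowest single call.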

00:22:46.800 Another unconventional use of Ruby within the mobile team is as a testing harness. Automated testing tools for Cocoa apps are still not as advanced as Ruby's testing frameworks. Instead of porting those tools over to Cocoa, we opted to let Ruby handle testing.

00:23:30.800 The core logic for our iPhone and iPad apps runs purely on Objective-C without dependencies on iOS frameworks. This setup allows us to treat Cocoa objects like Ruby objects smoothly.

00:24:00.800 After swiftly integrating a user interface using Interface Builder, we are working on continuous integration where updates trigger automated tests with each push.

00:24:34.400 As we move forward, I’d like to discuss company culture, often compared to cement: it’s easy to shape when it’s wet, but once it hardens, it’s challenging to change. Our early company culture benefitted from enthusiastic Ruby developers, embracing practices like automated testing and agile methodologies.

00:25:23.440 Everyone on our mobile team adopts a generalist approach rather than splitting into specialists. This enables team members to flexibly work across projects without creating single points of failure.

00:26:09.680 We practice iterative development and hold weekly planning meetings where we determine user stories and targets for the week, supported by tools such as Pivotal Tracker. Each week, we evaluate progress and plan releases according to the ongoing value of the project.

00:27:05.920 We deploy new code daily on Twitter.com with approximately three deployments each day. Each deployment comprises various feature branches to enhance responsiveness.

00:27:58.880 The deployment process requires thorough engagement with our test suite, which runs on a fleet of machines, allowing it to be parallelized and efficiently completed in about two minutes.
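How the suite is actually divided across the fleet is not described in the talk. As one simple possibility, test files can be dealt out round-robin to a fixed number of workers; `shard` and the file names below are hypothetical.

```ruby
# Splits test files into `worker_count` buckets, round-robin, so that each
# worker machine can run its bucket independently of the others.
def shard(test_files, worker_count)
  buckets = Array.new(worker_count) { [] }
  test_files.sort.each_with_index do |file, index|
    buckets[index % worker_count] << file
  end
  buckets
end

files = %w[
  test/api_test.rb test/captcha_test.rb test/sms_test.rb
  test/timeline_test.rb test/user_test.rb
]
shard(files, 2).each_with_index do |bucket, worker|
  puts "worker #{worker}: #{bucket.join(' ')}"
end
```

Sorting first makes the assignment deterministic, so the same file always lands on the same worker for a given worker count.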

00:29:07.280 Despite the complexity of native client updates, we manage weekly releases to our employees for feedback, utilizing an enterprise account to facilitate these updates.

00:29:44.800 Now, let’s talk about the type of candidates we look for in our mobile team. We prioritize technical proficiency, but even more so, we value passion for the product and personal project experience.

00:30:13.560 Cultural fit is critical; we're willing to pass on outstanding candidates if they do not mesh well with our team's values. We're fully committed to developing a positive, collaborative environment.

00:31:03.840 I want to conclude by thanking everyone for their attention and engagement today. I hope you are excited about the opportunities available within our company. I look forward to speaking with any interested candidates.

00:32:10.480 If you have a question regarding a known issue on the mobile site, I may not be the best source. I've been focused on the iPhone team for the past few months.

00:32:31.919 However, if you created a ticket for me, I'd be more than happy to look into it further and provide assistance.

00:33:10.560 When discussing full-stack mobile development, we firmly believe in having generalists rather than specialists, which fosters knowledge-sharing and flexibility amongst team members.

00:33:45.440 We conduct code reviews in our workflow to ensure thoroughness and knowledge sharing. Before merging code into the main repository, a review is required, promoting eyes on the code and sharing best practices.

00:34:25.440 As for the overhead of our rigorous code review and testing setup, we provide built-in tools that reduce it, making the process easier than it might initially seem.

00:35:01.760 To keep our processes efficient, we give teams the flexibility to decide what works for them and encourage them to share observations with their peers, rather than imposing rigid rules.

00:35:54.880 While we’re actively adapting our current practices, we want to keep the key principles and improve how we discuss new feature designs before deploying updates.

00:36:24.640 We use Ruby for testing, but UI automation has been less explored within our mobile team. We've looked into UI testing options and might adopt them in the future.

00:37:18.080 In conclusion, if you have any questions or topics you'd like to discuss further, please don’t hesitate to reach out. Thank you so much to everyone for your time and your engagement in today’s presentation.

00:39:06.640 Thank you!
