00:00:19.439
All right, so yeah, I'm Jeff Dagley. I work at Zynga, formerly known as New Toy. How many of you have played a 'With Friends' game? I like that you keep me in a job! So, we've got the 'With Friends' family. The back end is a Rails service. On the front end, we have 'Chess with Friends' and various other games.
00:00:39.600
So, how many play chess? There are a few of you. Words with Friends is our most popular game. Scramble with Friends has a few players as well. That’s actually the older icon of Hanging with Friends. Is anyone still playing Hanging with Friends? What about Matching with Friends? Yes, there are a few of you. And lastly, Gems with Friends is the one I currently work on.
00:01:05.360
All these games came out of McKinney, so if you're wondering where Words with Friends and Chess with Friends started, it was in the Downtown Public Library of McKinney. The two brothers, Paul and David Bettner, left their jobs when the iPhone came out and said, 'We're going to do something with this; it seems great!' So they went up to the library, which was quiet, and started building 'Chess with Friends'.
00:01:22.880
For those of you who don't know, 'Chess with Friends' was actually the first game. They released it, and while it was doing okay, they went on to build 'Words with Friends,' which became even more successful. I joined in October 2009 initially as a contractor for New Toy.
00:01:39.840
Today marks my three-year anniversary with Zynga, so I'm excited about that! It's fun stuff; three years at a company these days is quite an accomplishment. What are we going to talk about today? I've been listening to talks about solving big problems, so I'm going to share how we ran into those problems ourselves, including some mistakes I made along the way.
00:02:02.960
We didn't start off trying to solve those problems; we began as two guys in a library building a game on iOS without knowing the server back end. We built as we went along. We'll discuss the mistakes we made and how we've corrected them over time, as well as where we're going forward.
00:02:43.040
To your right is the day the iPhone 4 launched; we were the number one app in the iTunes App Store! That was an exciting place to be and shows where we came from. Here are the With Friends games we've developed. This illustrates the ecosystem of what we're currently supporting.
00:03:20.000
We have one back-end system supporting all these different games, with various clients and versions out in the wild. For example, 'Chess with Friends' has both a free and paid version; so does 'Hanging with Friends.' 'Scramble' and 'Gems with Friends' also have free and paid versions. 'Words with Friends,' our big one, has iPhone, iPad, and Android versions, available on Google Play and the Kindle.
00:03:36.160
Talking about supporting multiple clients and platforms, we have a fun problem in that not everyone updates to the latest version right away. Many players are still using older game versions. All this means we must ensure we don’t break older versions when releasing updates.
00:04:00.640
In case you haven't heard, a few people like our games, including Alec Baldwin, who got kicked off a plane for playing 'Words with Friends.' Various celebrities have tweeted or mentioned us, which is interesting. Fred Durst, the lead singer of Limp Bizkit, tweeted his username inviting people to play with him.
00:04:29.280
This led to our servers going down as everyone flooded to start games with him. We had to implement limits, as the server could not handle the overwhelming number of requests. Now, if you try to start a game with somebody exceeding their 20-game limit, our system alerts you.
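A minimal sketch of the kind of limit described above, assuming an in-memory hash of active game counts. The names `start_game!`, `GameLimitExceeded`, and `active_games` are hypothetical and not taken from the actual service; only the 20-game figure comes from the talk.

```ruby
# Hypothetical sketch: cap how many active games one player can have.
MAX_ACTIVE_GAMES = 20 # the limit mentioned in the talk

class GameLimitExceeded < StandardError; end

# `active_games` is an illustrative hash of user name => active game count.
def start_game!(active_games, challenger, opponent)
  # Reject the request before touching any counts if either player is full.
  [challenger, opponent].each do |user|
    if active_games.fetch(user, 0) >= MAX_ACTIVE_GAMES
      raise GameLimitExceeded, "#{user} already has #{MAX_ACTIVE_GAMES} active games"
    end
  end

  # Both players are under the limit, so count the new game for each.
  [challenger, opponent].each do |user|
    active_games[user] = active_games.fetch(user, 0) + 1
  end
  active_games
end
```

Checking both players before incrementing either count keeps a rejected request from leaving a half-updated state behind.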
00:05:11.759
We all know that Rails is often said to have scaling issues, and this is a joke among us. When I joined Paul and David, they shared their experience of building the back end with Rails, claiming it was 'sexy.' This kind of line does help with decision-making about technology, but it made me wonder if they thought I was sexy too.
00:06:04.639
What are the constraints we're dealing with? One major one is backwards compatibility. The current iOS version is 6.1.2, while Android is 4.2.2. The sheer number of Android versions and devices complicates matters. While we can force client upgrades when introducing major server changes, we want to avoid disrupting users whenever possible.
00:07:43.440
So we’ve kept parts of our code that check client versions. In one instance, we had to explicitly handle cases where a negative ID was sent from a client due to issues converting data types. This capability allows us to react more quickly on the server side.
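As a rough illustration of a server-side version check like the one described above, the sketch below repairs a negative ID only for clients older than some fixed release. The version number `4.3.0`, the helper name `normalize_id`, and the assumption that the negative value came from an unsigned 32-bit ID wrapping into a signed integer are all hypothetical; the talk does not give those details.

```ruby
require "rubygems" # Gem::Version is normally loaded already; required here for clarity

# Hypothetical: the first client release that sends IDs correctly.
FIXED_IN_VERSION = Gem::Version.new("4.3.0")

# Assumes old clients wrapped an unsigned 32-bit ID into a signed 32-bit integer,
# so adding 2**32 recovers the original value.
def normalize_id(raw_id, client_version)
  old_client = Gem::Version.new(client_version) < FIXED_IN_VERSION
  return raw_id + 2**32 if old_client && raw_id < 0

  raw_id
end
```

Keeping the workaround behind a version check means newer clients are never silently "corrected", and the branch can be deleted once the old versions age out.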
00:08:59.760
When the clients were initially built, they used Active Record’s XML functionality rather than JSON. Many early clients are still requesting XML responses. We are transitioning new features to JSON, but we still have a substantial amount of XML traffic.
00:10:36.159
We support multiple game types within our one service, and we originally had a lot of conditionals designed to handle the specific logic for each game.
00:11:23.919
We have since refactored into smaller classes to better manage this logic. However, any change in shared logic affecting purchasing across all games must be carefully tested to avoid breaking other versions. Despite our best efforts, it's still easy to miss issues when updating one game that may inadvertently affect others.
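The refactoring described above, from per-game conditionals to smaller classes, might look something like the following sketch. The class names and the `board_size` method are illustrative stand-ins for whatever game-specific logic the real service carries.

```ruby
# Instead of `if game_type == "chess" ... elsif game_type == "words" ...`
# scattered through shared code, each game gets its own small class.
class ChessRules
  def board_size
    8
  end
end

class WordsRules
  def board_size
    15
  end
end

# Illustrative registry mapping a game type string to its rules class.
RULES_BY_GAME_TYPE = {
  "chess" => ChessRules,
  "words" => WordsRules
}.freeze

def rules_for(game_type)
  klass = RULES_BY_GAME_TYPE.fetch(game_type) do
    raise ArgumentError, "unknown game type: #{game_type}"
  end
  klass.new
end
```

With this shape, adding a game means adding a class and a registry entry, while shared code only ever calls `rules_for(game_type)`.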
00:12:54.000
The rapid growth of our user base did not come with a dedicated Ruby conference or in-depth planning. The Bettner brothers focused on developing 'Chess with Friends,' and all our growth has been organic. This organic growth brought many challenges, evidenced by our patchwork codebase filled with fixes.
00:13:59.600
So, how many of you are familiar with YAGNI? It stands for 'You Aren't Gonna Need It.' As our database grew, we discussed whether to shard it, but ultimately decided it wasn’t necessary yet—again, focusing on the simplest solution possible.
00:14:51.440
We began our journey on Slicehost, similar to how you might start with Heroku today. Slicehost was easy to set up; we eventually switched to Rails Machine, which provided us with dedicated hardware. Later, we migrated to SoftLayer, which supplied us with bare metal machines. Finally, after being acquired by Zynga, we moved into their data center.
00:16:32.480
We now operate on over 400 app servers, deploying around 35 unicorns on each server. We have over 40 worker machines processing background queues, 70-plus memcache servers, and over 125 MySQL shards for more complex needs.
00:17:14.960
MySQL-related issues plagued us early on. One notable incident occurred when we received a surge of new players after John Mayer tweeted about 'Words with Friends.' This influx strained our system and made late-night gameplay painfully slow.
00:18:57.480
We faced challenges like the 'Movepocalypse,' where we experienced ID exhaustion due to how Rails defaults to Integer IDs. This led us to coordinate client updates to ensure they could support a larger ID data type.
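To make the ID exhaustion concrete: a signed 32-bit integer column tops out at 2,147,483,647. The sketch below estimates how much headroom is left; the helper names and the sample row counts in the usage note are illustrative, not figures from the talk.

```ruby
SIGNED_INT32_MAX = 2**31 - 1 # 2_147_483_647, the ceiling of a signed 32-bit ID
SIGNED_INT64_MAX = 2**63 - 1 # the ceiling after moving to a 64-bit ID

# How many IDs are left before the column overflows.
def remaining_ids(current_max_id, limit = SIGNED_INT32_MAX)
  limit - current_max_id
end

# Whole days of headroom at a steady insert rate (integer division).
def days_until_exhaustion(current_max_id, new_rows_per_day, limit = SIGNED_INT32_MAX)
  remaining_ids(current_max_id, limit) / new_rows_per_day
end
```

For example, with a hypothetical 2 billion rows already inserted and 10 million new rows per day, `days_until_exhaustion(2_000_000_000, 10_000_000)` gives 14 days, which is why the server change and the client updates had to be coordinated.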
00:19:55.679
Concurrently, the 'Chatpocalypse' arose, leading to significant limitations. While we focused on managing the volume of moves, we guaranteed the integrity of chat functionality.
00:20:54.640
We worked through some interesting fixes, such as using an ID generator that accidentally collided with the MySQL auto-increment numbers, highlighting the need for careful planning.
00:21:43.679
We also experienced issues with MySQL's simultaneous connections limit. As we scaled our application, over 512 concurrent connections became routine, necessitating the implementation of a connection pooling strategy.
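The talk does not say how the pooling was implemented. As a generic sketch of the idea, the class below hands out a fixed number of connections and makes callers wait when all are in use. `SimplePool` and the string "connections" are hypothetical placeholders for real database handles.

```ruby
# A fixed-size pool: at most `size` connections ever exist, no matter
# how many threads ask for one.
class SimplePool
  def initialize(size)
    @available = Queue.new
    # The block builds each connection; here it is whatever the caller returns.
    size.times { |index| @available << yield(index) }
  end

  # Borrow a connection, run the block, and always return the connection.
  def with_connection
    connection = @available.pop # blocks while the pool is empty
    yield connection
  ensure
    @available << connection if connection
  end

  def available_count
    @available.size
  end
end
```

Because `pop` blocks when the queue is empty, extra requests wait for a free connection instead of opening a new one and pushing the database past its connection limit.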
00:23:29.520
To address the challenges of scaling, we’ve kept track of every move made in our games, continuously refining database architectures to better accommodate our growing user base. We adopted partitioning strategies in light of the overhead from our earlier systems.
00:24:15.879
Over the years, we added humor into our development culture. For example, during the 'Movepocalypse,' we transitioned from simple moves tables to the 'Dance Moves' table to manage all our movements efficiently while introducing sharding.
00:25:53.679
The evolution of our game tables shows how we creatively adjusted our database design to keep pace with the growth in users. We explored sharding as a solution to our expanding data requirements, splitting massive tables into manageable sizes.
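One common way to split a large table across shards is to derive the shard from the record's ID. The sketch below assumes a simple modulo scheme and a made-up naming pattern; the talk does not describe how rows were actually mapped to shards.

```ruby
SHARD_COUNT = 125 # illustrative; the talk mentions over 125 MySQL shards

# Map a game ID to a shard number in 0...shard_count.
def shard_for(game_id, shard_count = SHARD_COUNT)
  game_id % shard_count
end

# Hypothetical naming convention for the database holding that shard.
def shard_name(game_id)
  format("moves_shard_%03d", shard_for(game_id))
end
```

Keying on the game ID keeps every move for a game on the same shard, so loading one game never has to query several databases. The trade-off of plain modulo is that changing the shard count moves most rows.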
00:26:40.160
The design of large tables made it challenging to add new columns or data types. We began utilizing flexible JSON columns to store dynamic information without needing to alter table schemas.
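A small sketch of the flexible-column idea, using a plain hash to stand in for a database row. The column name `extras` and the helper names are assumptions for illustration only.

```ruby
require "json"

# Store arbitrary attributes as a JSON string in one text column,
# so new attributes need no schema change on a huge table.
def write_extras(row, extras)
  row.merge("extras" => JSON.generate(extras))
end

# Read one attribute back, tolerating rows written before the column was used.
def read_extra(row, key, default = nil)
  JSON.parse(row["extras"] || "{}").fetch(key, default)
end
```

The cost of this approach is that the database cannot index or validate what is inside the JSON text, so it suits optional, rarely queried data.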
00:27:30.679
As we shifted to a sharded approach, we tackled issues related to how different databases generated auto-incrementing IDs. We turned to Redis for effective ID generation, allowing for consistent tracking across multiple systems.
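A central counter gives every shard IDs from one sequence. The sketch below uses an in-memory stand-in rather than a real Redis connection so that it runs on its own; redis-rb offers `set` and `incr` calls of a similar shape, but the class names, the key name, and the seeding step here are assumptions, not the production design.

```ruby
# In-memory stand-in for a Redis client, limited to the two calls used below.
class InMemoryCounter
  def initialize
    @values = Hash.new(0)
    @lock = Mutex.new
  end

  def set(key, value)
    @lock.synchronize { @values[key] = value }
  end

  # Atomically add one and return the new value, like Redis INCR.
  def incr(key)
    @lock.synchronize { @values[key] += 1 }
  end
end

class IdGenerator
  def initialize(counter, key)
    @counter = counter
    @key = key
  end

  # Run once, before first use: start above the highest ID the old
  # auto-increment columns ever issued, so the two ranges cannot collide.
  def seed!(highest_existing_id)
    @counter.set(@key, highest_existing_id)
  end

  def next_id
    @counter.incr(@key)
  end
end
```

Seeding the counter above the existing maximum is the step that guards against the kind of collision with auto-increment values mentioned earlier.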
00:28:50.800
We introduced caching mechanisms to optimize our database interactions, focusing on areas that would yield significant performance improvements. Over time, we scaled our caching solutions to accommodate surging user activity.
00:30:25.920
Our efforts to maintain reliability extended to careful monitoring practices. Implementing tools like New Relic made tracking performance and troubleshooting problems more efficient.
00:31:54.080
We used real-time analytics to inform our decisions during major implementations. The practice of rolling out changes in increments allowed us to manage risk effectively and adjust whenever necessary.
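One simple way to roll a change out in increments is to enable it for a percentage of users and raise that percentage as the metrics stay healthy. This is a generic sketch; the function name and the modulo bucketing are assumptions, not the mechanism used by the team.

```ruby
# Enable a change for roughly `rollout_percent` of users.
# Bucketing on the user ID keeps each user's experience stable between requests.
def enabled_for?(user_id, rollout_percent)
  (user_id % 100) < rollout_percent
end
```

Starting at 1 and stepping up to 100 exposes a problem to a small group first, and setting the value back to 0 turns the change off without a deploy.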
00:32:43.440
In my experience, it’s imperative to focus on metrics that reveal the health of the application, especially under stress. By tracking these consistently, we can refine our approach and meet growing user expectations.
00:33:39.040
Investing in a robust testing culture helped us avoid shipping without proper quality checks. We adopted a rule that every pull request must include accompanying tests to maintain code integrity.
00:35:20.000
Deploying code under scrutiny allowed us to stabilize our application over time. While it's inevitable that bugs will arise, a diligent testing strategy positions us to catch potential setbacks sooner.
00:36:05.000
Staying on top of code deployment and observability enabled us to forge a more robust system. For example, being cautious about how we managed cache versions significantly minimized issues related to inconsistent data.
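Cache versioning can be sketched as putting a version string into every cache key, so that a deploy which changes the cached format simply stops reading the old entries. The constant, key layout, and helper names below are illustrative, and a plain hash stands in for memcache.

```ruby
CACHE_VERSION = "v2" # illustrative; bump whenever the cached format changes

def cache_key(model, id, version = CACHE_VERSION)
  "#{version}:#{model}:#{id}"
end

# Return the cached value, or compute it with the block and store it.
# `store` is a Hash here, standing in for a memcache client.
def fetch_cached(store, key)
  store.fetch(key) { store[key] = yield }
end
```

After a version bump, old and new servers read different keys during the deploy, so neither side sees data written in a format it does not understand. The stale entries are never read again and eventually expire.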
00:37:14.720
We integrated strategies to handle known edge cases, which further strengthened our codebase. This includes remaining aware of how changes could affect fundamental functionality.
00:38:42.920
Overall, our experiences managing the games highlighted the importance of adaptability within a development environment. Continuous efforts to refine processes and technology choices have been paramount to our growth.
00:39:50.640
And of course, I can't forget to mention a few ‘Words with Friends’ tips. Knowing some two- and three-letter words can significantly increase your game performance. I'm more than happy to share those insights if you’re interested.
00:41:15.840
You can find me online as g daggly on most platforms. Thank you all for your attention!