Consuming the Twitter Streaming API with Ruby and MongoDB

by Jeff Linwood

Summarized using AI

The presentation "Consuming the Twitter Streaming API with Ruby and MongoDB," given by Jeff Linwood at LoneStarRuby Conf 2011, shows how to use the Twitter Streaming API together with MongoDB from Ruby. The main objectives are to demonstrate how to listen for tweets matching specific criteria and to save that data for later analysis.

Key Points Discussed:
- Twitter Streaming API Overview: The Twitter Streaming API allows developers to receive real-time tweets filtered by keywords, users, or geographic locations, enhancing data accessibility and reducing server load.
- Setup and Configuration: Jeff begins with a Ruby script that listens for tweets about Ashton Kutcher (@aplusk). The script captures tweets mentioning the account without needing to follow it, allowing broader data collection.
- Data Storage with MongoDB: MongoDB serves as a NoSQL database that complements the JSON-like structure of tweets received. Each tweet is stored as a document, facilitating further analysis.
- Streaming Process: The Ruby TweetStream gem simplifies connection to the Twitter Streaming API and manages tweet collection asynchronously, offering an efficient way to handle real-time data.
- Limitations: The Twitter Streaming API imposes certain restrictions, such as limits on the number of keywords and user IDs that can be tracked, and a single long-lived connection should be maintained, since excessive reconnecting can get an account banned.
- Practical Application and Deployment: The command-line application can be easily deployed on services like Heroku, with attention to security practices regarding credentials.
- Future Prospects: The collected tweet data can be further analyzed for trends, sentiment analysis, or customer feedback, benefitting businesses focused on social media analytics.

Conclusion: The presentation reinforces the versatility and potential of integrating the Twitter Streaming API with MongoDB using Ruby, and encourages viewers to explore this combination for their own social media analytics projects. Jeff closes by inviting questions about the implementation details and application usage.

00:00:18.710 This presentation is about the Twitter Streaming API and how to use it effectively. I will discuss how to listen for tweets containing specific keywords or tweets from certain users, and how to store them in MongoDB. This overview will cover the basics of the Twitter Streaming API, and I will show you a simple Ruby script that can be deployed on platforms like Heroku to listen for these tweets and store them in a MongoDB database.
00:00:50.370 To demonstrate this, I will show you a Ruby script that listens for tweets mentioning Ashton Kutcher. His Twitter username is @aplusk. Importantly, you don't have to follow Ashton Kutcher for this to work. I will be using one of my own Twitter accounts to demonstrate the connection, since Twitter requires a username and password to access the Streaming API. However, I can show tweets from anyone in the Twitter sphere who is tweeting about Ashton.
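A minimal sketch of what such a script can look like, using the TweetStream gem that comes up later in the talk; the credentials are placeholders, and this is an illustration rather than the speaker's actual code:

```ruby
require 'rubygems'
require 'tweetstream'

# Placeholder credentials -- the Streaming API of this era used HTTP Basic Auth.
client = TweetStream::Client.new('my_twitter_user', 'my_password')

# Listen for any tweet mentioning the keyword "aplusk" and print it as it arrives.
client.track('aplusk') do |status|
  puts "#{status.user.screen_name}: #{status.text}"
end
```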
00:01:11.880 As I set up the demonstration, the Ruby script listens for any tweets related to @aplusk. When I wrote this script, it was an interesting day with more engagement, but today it's relatively quiet. The tweets displayed may vary depending on the time. For instance, tweets from users interacting with @aplusk may show up. As those tweets are collected, they are stored in the MongoDB database, allowing for analysis later.
00:02:02.330 Now, let’s discuss the technology behind this. The Twitter Streaming API was released to reduce load on Twitter's servers while providing developers a real-time stream of Twitter data. Additionally, I will touch on MongoDB, which is a NoSQL database. It differs from traditional SQL databases in terms of setup and querying language, allowing for more flexibility and advanced features, such as direct support for map and reduce operations.
00:02:34.380 This application is not a web application built with Rails or Sinatra; it is a command-line application, which means you can deploy it just about anywhere. It helps to understand how information flows through Twitter's APIs: the newer Twitter web interface is itself a JavaScript client talking to the Twitter API, and you can write your own clients and tools against those same APIs.
00:03:16.730 The Twitter Streaming API is what I will focus on today. It's used for real-time connections to Twitter, collecting tweets based on specified keywords, users, or geographic locations. Another API available is the User Stream API, which features the tweets that appear in your Twitter feed, allowing for personal connections. However, Twitter currently discourages the creation of generic Twitter clients due to recent changes in their developer terms.
00:04:05.740 Site Streams are designed for larger projects, aggregating multiple user streams, which helps when scaling to larger applications. Many developers use the REST API for tasks like retrieving recent tweets or a user's tweets, making it the most common way to interact with Twitter. Lastly, the Search API behaves a little differently from the other APIs because of its origins in an acquisition; it provides access to tweets that match a search query.
00:04:44.170 With the Twitter Streaming API, you can track keywords, users, and locations. Keyword tracking works as an OR search across up to 400 keywords. User tracking lets you monitor tweets from any users you specify, and location tracking collects tweets from specific geographic regions or bounding boxes.
00:05:54.370 The TweetStream gem is a powerful tool for this purpose. It performs the heavy lifting for your Ruby code, allowing for easier setup without complicated configurations. It uses EventMachine for asynchronous processing, making it efficient for real-time data collection. To dive deeper into using this, you can check out the TweetStream documentation.
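As an illustrative sketch of the three tracking modes just described (keyword, user, and location): the constructor and method names below follow the TweetStream gem's style of that era, but they vary between gem versions, so treat this as an outline. Only one of these streams would run at a time, since each account gets a single connection:

```ruby
require 'tweetstream'

client = TweetStream::Client.new('my_twitter_user', 'my_password')

# Keyword tracking: multiple terms are ORed together by the Streaming API (up to 400).
client.track('aplusk', 'lonestarruby') { |status| puts status.text }

# User tracking: follow takes numeric Twitter user IDs (placeholders here), not screen names.
client.follow(12345, 67890) { |status| puts status.text }

# Location tracking: a bounding box given as sw_lng,sw_lat,ne_lng,ne_lat -- roughly Austin, TX here.
client.filter(:locations => '-98.0,30.1,-97.5,30.6') { |status| puts status.text }
```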
00:06:59.100 When connecting to the Twitter Streaming API, you’ll receive JSON responses. Here's a sample of what JSON for a tweet looks like. It contains various fields, including the text, user information, and whether the tweet was retweeted. This data is typically more content-rich than one might expect, containing links and user IDs, allowing developers to parse and utilize the data effectively in their applications.
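An abbreviated, made-up example of that JSON and how it parses into a plain Ruby hash; real tweets carry many more fields than shown here:

```ruby
require 'json'

# Abbreviated, illustrative tweet payload -- IDs and values are placeholders.
raw = <<-JSON
{
  "created_at": "Fri Aug 12 18:04:35 +0000 2011",
  "id_str": "102148234882391821",
  "text": "@aplusk great talk today!",
  "retweeted": false,
  "user": { "id_str": "12345", "screen_name": "some_user", "followers_count": 42 },
  "entities": {
    "hashtags": [],
    "urls": [],
    "user_mentions": [{ "screen_name": "aplusk" }]
  }
}
JSON

tweet = JSON.parse(raw)
puts tweet['text']
puts tweet['user']['screen_name']
```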
00:09:01.110 An interesting feature of the Twitter API is that it parses out parts of each tweet for you, delivering the textual content, URLs, and hashtags as separate entities. This can be advantageous for developing applications centered on trends or sentiment analysis. Because the data arrives as JSON, it maps naturally onto MongoDB, which works with JSON-like structures as well.
00:10:03.300 One limitation to note is that while connecting to the Twitter Streaming API, you need to use HTTP Basic Authentication instead of OAuth, which is used more commonly with their REST API. This distinction exists likely because the Streaming API often runs on a server rather than a client application.
00:10:31.290 Each Twitter account is limited to just one stream, so if you plan to develop multiple applications, you may need separate Twitter accounts. This is crucial when trying to maintain control over Twitter's data flow and ensuring that you are compliant with their restrictions.
00:11:36.770 When using the Streaming API, it is essential not to constantly reconnect to Twitter, as reconnections are costly for Twitter's servers. Instead, you should keep a single connection open. If you write an application that continuously sends reconnect requests, Twitter reserves the right to ban it.
00:12:36.650 The Streaming API has other limits on tracking capacity: 400 keywords, up to 5,000 user IDs, and 25 geographic bounding boxes. If you find you are reaching these limits, you can request elevated access from Twitter.
00:13:43.760 Now, moving onto MongoDB, you can utilize it as a means of storing these tweets for later analysis. MongoDB is a NoSQL database that allows for a flexible structure for data storage. When dealing with tweets, MongoDB's JSON-like structure aligns perfectly with the data retrieved from the Twitter API.
00:15:50.710 MongoDB has its own query language, which you'll need to learn as you dive in, but its design aligns well with how data is structured in an application like this. For instance, you can use MongoDB for aggregation with MapReduce, embedding JavaScript functions in your Ruby code.
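A hedged sketch of that kind of aggregation: counting stored tweets per author with map/reduce, the JavaScript functions embedded as Ruby strings. This is written against the 1.x mongo Ruby driver that was current at the time (newer drivers expose a different API), and the database and collection names are placeholders:

```ruby
require 'mongo'

db     = Mongo::Connection.new('localhost', 27017).db('twitter')
tweets = db.collection('tweets')

# JavaScript functions run inside MongoDB: emit one count per tweet author,
# then sum the counts for each screen name.
map_fn = <<-JS
  function() { emit(this.user.screen_name, 1); }
JS

reduce_fn = <<-JS
  function(key, values) {
    var total = 0;
    values.forEach(function(v) { total += v; });
    return total;
  }
JS

# Inline output returns the raw command result, whose 'results' key holds the counts.
results = tweets.map_reduce(map_fn, reduce_fn, :out => { :inline => 1 }, :raw => true)
results['results'].each do |row|
  puts "#{row['_id']}: #{row['value']}"
end
```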
00:16:59.580 Connecting to a MongoDB instance is straightforward, and when inserting tweets you can store the JSON structure you received more or less as-is. There are also MongoDB drivers specifically for Ruby that make this communication smooth.
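For example, with the mongo gem of that era (the 1.x driver; newer versions use Mongo::Client instead), connecting and storing one document per tweet might look roughly like this; the database and collection names are placeholders:

```ruby
require 'rubygems'
require 'mongo'

# Placeholder database and collection names.
db     = Mongo::Connection.new('localhost', 27017).db('twitter')
tweets = db.collection('tweets')

# A parsed tweet is just a Ruby hash, so it can be inserted directly as a document.
tweet = { 'text' => '@aplusk great talk today!',
          'user' => { 'screen_name' => 'some_user' } }
tweets.insert(tweet)

puts "Collection now holds #{tweets.count} tweets"
```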
00:18:48.790 You can incorporate MongoDB within any Rails or Sinatra application, utilizing traditional SQL for parts of your application while employing MongoDB for specific datasets, like tweets. This flexibility allows for different architectural decisions without losing performance or usability.
00:20:31.240 I have set up a simple Ruby script that connects to Twitter, pulls the relevant data, and organizes it into a format that's usable for analysis. It features easy configuration through a simple config file that stores your credentials without exposing them in the code.
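One plausible way to set that up is a small YAML file kept out of version control; the file name and keys here are placeholders, not necessarily what the speaker used:

```ruby
# config.yml (kept out of git, e.g. via .gitignore):
#
#   twitter:
#     username: my_twitter_user
#     password: my_password
#   mongodb:
#     uri: mongodb://localhost:27017/twitter

require 'yaml'

CONFIG = YAML.load_file('config.yml')
puts CONFIG['twitter']['username']
```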
00:21:19.740 The script works by connecting to Twitter, listening for relevant tweets, and saving them into the MongoDB database. Each tweet is stored as a document within a designated collection, making the data accessible in real time. You can track keywords and hashtags and follow users to ensure comprehensive data collection.
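Putting the pieces together, a hedged sketch of that loop (not the speaker's exact script) might look like this, reusing the placeholder config file from above:

```ruby
require 'rubygems'
require 'yaml'
require 'mongo'
require 'tweetstream'

config = YAML.load_file('config.yml')

# Open the collection that will hold one document per captured tweet.
db     = Mongo::Connection.new('localhost', 27017).db('twitter')
tweets = db.collection('tweets')

client = TweetStream::Client.new(config['twitter']['username'],
                                 config['twitter']['password'])

# For every matching status, store a document and log it to the console.
client.track('aplusk') do |status|
  tweets.insert('text'        => status.text,
                'screen_name' => status.user.screen_name,
                'created_at'  => status.created_at)
  puts "Saved tweet from @#{status.user.screen_name}"
end
```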
00:22:59.700 As I mentioned, I've deployed this application on Heroku using a free MongoDB hosting service. By setting up your environment variables correctly, you can easily connect to your database and manage it directly from your Ruby application. Instructions on deploying are available on my website.
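As an illustration of that environment-based setup: MONGOHQ_URL here is just one example of an add-on's connection-string variable, the Procfile entry assumes the script is called stream.rb, and the connection code follows the 1.x driver conventions of the time:

```ruby
# Procfile -- tells Heroku to run the script as a worker process:
#   worker: bundle exec ruby stream.rb

require 'mongo'
require 'uri'

# Read the connection string from the environment on Heroku,
# falling back to a local database in development.
mongo_uri = ENV['MONGOHQ_URL'] || 'mongodb://localhost:27017/twitter'
uri       = URI.parse(mongo_uri)

connection = Mongo::Connection.from_uri(mongo_uri)
db         = connection.db(uri.path.gsub(/^\//, ''))
tweets     = db.collection('tweets')
```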
00:24:35.970 While setting up on Heroku, I followed best practices by securing my credentials and keeping sensitive information from being pushed to GitHub. Maintaining good security protocols is important to avoid unauthorized access to accounts.
00:25:47.500 In conclusion, you have the flexibility to deploy the application on various platforms. You can run it as a command-line utility, or set it up on a dedicated server. The setup is straightforward and allows for scalability if necessary.
00:27:13.150 Once this data is collected, the next steps are to determine how to process and analyze the collected tweets. This might involve aggregating data, performing sentiment analysis, or finding trends based on keyword interactions and user engagement.
00:28:45.090 While many have used similar technologies for social media analytics, companies are increasingly interested in monitoring online sentiments and brand reputations. These tools allow businesses to stay attuned to customer feedback and improve their image based on real-time data.
00:29:59.080 If you’re passionate about social media analytics and data processing, I encourage you to explore these tools further. Thank you for attending this presentation, and I'd be happy to address any questions now.
00:31:02.210 If you have questions about using the Twitter Streaming API or integrating it with MongoDB, feel free to ask. I'm always interested to hear what projects you all are working on and how you might use this technology.