LoneStarRuby Conf 2009

Summarized using AI

Rails in the Cloud

Ian Warshak • August 27, 2009 • Earth

The video titled "Rails in the Cloud" features Ian Warshak, a developer from RightScale, discussing how to integrate Ruby on Rails applications with cloud services, particularly focusing on Amazon Web Services (AWS). In his presentation, Warshak aims to provide insights into utilizing cloud infrastructure and services for web applications, using a simple photo application called Pictor as a case study.

Key Points Discussed:
- Introduction to Cloud Technologies:
  - Warshak differentiates between cloud infrastructure (e.g., Amazon EC2) and cloud services (e.g., Amazon S3).
- Pictor Application:
  - He introduces Pictor, a photo uploading application that processes images asynchronously, allowing users to upload photos while the application handles conversions in the background.
- Cloud Infrastructure:
  - Emphasizes the need for a scalable solution without managing physical servers, highlighting the advantages of automating deployment and using AWS services.
  - The application runs on EC2 and utilizes S3 for storage, ensuring efficient handling of image files.
- Handling Image Processing:
  - Discusses the use of asynchronous processing for image uploads, leveraging services like Amazon SQS for job queuing and processing servers to handle requests efficiently.
  - Highlights the significance of using a Content Delivery Network (CDN) like Amazon CloudFront to serve images quickly to users.
- Database Management:
  - Warshak touches upon using Amazon SimpleDB, noting its differences from traditional relational databases and how it stores data in a non-relational format.
  - Describes job tracking for image processing using SimpleDB to monitor the state of each image.
- System Architecture and Job Management:
  - Presents the application architecture, illustrating server pools and the workflow from image upload through processing to display. Jobs are created upon uploads, processed via a daemon service, and monitored for success or failure.
- Real-Time Demonstration:
  - Warshak conducts a live demo, showcasing the upload process, monitoring job statuses, and displaying processed images.

Conclusions and Takeaways:
- Cloud services, particularly AWS, offer flexible, scalable solutions that allow developers to focus on building applications rather than managing infrastructure.
- Automating server configurations and using cloud services help streamline development processes while providing robust performance.
- The session emphasizes the growing importance of cloud computing in application development, encouraging developers to embrace these technologies for better efficiency and scalability.

Rails in the Cloud by: Ian Warshak

00:00:21.680 Okay, can y'all hear me? Alright, cool. My name is Ian Warshak.
00:00:27.400 I am a developer, and I work at RightScale. RightScale provides a cloud platform management tool that you can use.
00:00:33.960 I have some code that I'll be showing you, and here's the GitHub account where you can check it out.
00:00:39.040 I won't be showing a lot of the code, so feel free to look at it on your own. I'm going to be talking about Rails in the Cloud.
00:00:45.559 I'm going to focus mostly on Amazon, specifically Amazon's web services and cloud-based services. I'll walk you through a simple app that I wrote for this purpose.
00:00:52.559 It doesn't do much; it's quite arbitrary and a bit silly, but hopefully, it will give you ideas on how you can leverage Rails tools in your own applications.
00:00:59.199 The first thing I wanted to do is differentiate terms when we talk about the cloud. It's a big, nebulous term that means different things to different people. I separate this into two buckets: cloud infrastructure and cloud services. Cloud infrastructure refers to servers, like Amazon EC2, which is the big one. There's also Rackspace, Flexiscale, and GoGrid. Cloud services refer to solutions like Amazon S3 or Amazon Simple Queue Service. These aren't infrastructure but rather services.
00:01:18.479 I wrote this app called Pictor, a play on Flickr. I thought of it as a sort of 'Flickr killer' because it can scale. I realize that sounds pretentious, but that was my goal.
00:01:30.200 It's actually a simple photo application. You upload a photo, and the application transforms it in a couple of ways and displays the images on the homepage. It's not impressive, but some interesting aspects include that the photos are converted and processed asynchronously.
00:01:42.519 As I mentioned, I’m going to talk about the technologies behind it. I hope it provides ideas for using these technologies in your own work. I chose this photo application idea because it naturally lends itself to cloud computing: it uses a lot of disk storage, plus bandwidth for users downloading pictures.
00:02:00.280 The offline processing part is crucial because processing photos with tools like ImageMagick can take time. You don't want users waiting for responses while the photo converts or resizes.
00:02:17.440 I foresee this as being an instant hit, and we are going to need scaling right out of the gate.
00:02:30.200 If I were building this for real, I would want to utilize cloud infrastructure and services for several reasons. I don't want to build or configure hardware. Making arrangements for 10 servers to prepare for future needs doesn't appeal to me.
00:02:47.560 I want to offload as much of this work as possible. For a small team or single developer, managing disks or storage solutions takes too much time. Instead, I want to spend as little time as possible on that and focus more on development.
00:03:06.920 So, how did I build this app? It’s a Rails application with roughly 300 lines of code, although I don't have tests for it. It runs on Amazon EC2 servers. I have two pools of servers: one pool for the web front end, managed by load balancers, and another pool for processing servers that handle the work.
00:03:37.360 All the photos are stored on Amazon S3, meaning I don’t have to worry about where to store these files on my servers, avoiding concerns about using SANs. I'm also using a Content Delivery Network, specifically Amazon's CloudFront, which I'll explain more about later.
00:03:56.560 I use the RightScale AWS gem for virtually everything, as RightScale has an extensive library for utilizing Amazon's web services. This is what we use for our own infrastructure, which we've distributed as open-source.
00:04:06.920 This library includes a Ruby interface for Amazon EC2, S3, and other web services. Why did I opt for Amazon? It's primarily because Amazon is the market leader in cloud services. There's a lot of sound competition, but currently, Amazon is the big dog in the field.
00:04:44.760 The integration between services is seamless; if I have an EC2 server, transferring data between servers is free. Generally, they charge for bandwidth, but bandwidth between their services is cost-free.
00:05:01.920 There are APIs for all of this, along with tons of libraries and documentation. While I mentioned using the RightScale AWS gem, there are many other libraries available, which makes it more accessible if you're considering using these technologies.
00:05:17.280 In case you're not familiar with EC2, it's essentially servers on demand. With an API call, you can launch a server in Amazon's data centers, and you pay by the hour, which is convenient because I don’t want to pay upfront.
00:05:48.639 One of the main caveats of EC2 is that it's not persistent. If you shut down your server, any configurations you made will be gone. This means you need to automate your configuration, which can be a pain initially but is ultimately beneficial.
00:06:26.640 Running multiple configured server images can become unsustainable, especially when scaling. RightScale provides tools for this, but there are also options like Chef, Puppet, and various systems configuration tools.
00:06:55.440 Amazon also has Elastic Block Store, which provides persistent disk storage for EC2 servers. However, I won't delve too deeply into that today. I'm sure many of you know about Amazon S3, a popular online storage service where you pay for the data you store.
00:07:25.760 CloudFront, on the other hand, acts as a Content Distribution Network, which I'm using to efficiently serve images and content to end-users.
00:07:41.360 With distribution servers worldwide, you can create a CloudFront domain. Essentially, by associating an S3 bucket with CloudFront, I get a unique domain allowing images to serve quickly.
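On the application side, this mostly means that image URLs point at the CloudFront domain instead of at the app servers. A minimal sketch of that idea follows; the domain and object key are hypothetical placeholders, not values from the talk.

```ruby
require "erb"

# Hypothetical CloudFront domain; a real one is assigned by Amazon
# when a distribution is created for the S3 bucket.
CLOUDFRONT_DOMAIN = "d1234example.cloudfront.net"

# Build the public URL for an object key stored in the bucket that
# backs the distribution. Each path segment is percent-encoded.
def cdn_url(key, domain: CLOUDFRONT_DOMAIN)
  escaped = key.split("/").map { |part| ERB::Util.url_encode(part) }.join("/")
  "http://#{domain}/#{escaped}"
end

puts cdn_url("processed/photo 1_mono.jpg")
```

A Rails view could pass the result to `image_tag`, so the browser fetches the file from the CDN rather than from the web servers.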
00:08:00.000 For Amazon Simple Queue Service, it's a basic queuing service where I send messages to a queue and pull messages off of it. When a user uploads a picture, it creates jobs on the queue. The processing servers pull these jobs, process images, and perform necessary tasks.
00:08:50.960 What's neat about most messaging queues is that they ensure job durability by returning messages to the queue if processing fails. This means if something goes wrong, the job won’t get deleted immediately, allowing for retries.
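The retry behavior described here can be illustrated with a small in-memory stand-in. This is a sketch of the semantics only, not the SQS API: the class and method names are hypothetical, and the explicit `expire_in_flight` call stands in for the timeout after which an undeleted message becomes visible again.

```ruby
# In-memory stand-in for a queue with receive/delete semantics.
# This is NOT the SQS API; all names here are hypothetical.
class SketchQueue
  def initialize
    @visible = []     # messages ready to be received
    @in_flight = {}   # receipt handle => message currently being processed
    @next_handle = 0
  end

  def send_message(body)
    @visible << body
  end

  # Hand out a message without deleting it; it stays "in flight".
  def receive
    return nil if @visible.empty?
    body = @visible.shift
    handle = (@next_handle += 1)
    @in_flight[handle] = body
    [handle, body]
  end

  # Called by a worker only after the job has succeeded.
  def delete(handle)
    @in_flight.delete(handle)
  end

  # Stand-in for a timeout expiring: undeleted messages return to the queue.
  def expire_in_flight
    @visible.concat(@in_flight.values)
    @in_flight.clear
  end

  def size
    @visible.size
  end
end

queue = SketchQueue.new
queue.send_message("convert photo-1 monochrome")

handle, body = queue.receive
# Simulate a worker that crashed before calling delete:
queue.expire_in_flight
puts queue.size  # the job is available again for a retry
```

The design point is that receiving and deleting are separate steps, so a job only disappears once a worker confirms that it finished.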
00:09:19.520 Amazon SimpleDB is a non-relational database they offer. It can be quite challenging to navigate if you're coming from a relational database background, primarily because it lacks tables and joins.
00:09:43.560 In SimpleDB, the closest thing to a table is called a domain. You can't perform joins across two domains, which forces you to denormalize your data, keeping related data together.
00:10:07.640 The lack of a schema allows you to define the domain as you proceed, which is quite powerful. However, all data is stored as a string, so you need to handle things like dates and integers carefully.
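Because values are compared as strings, numbers and dates need an encoding whose string order matches their natural order. The sketch below uses zero-padding and ISO 8601 UTC timestamps, a common approach for string-only stores rather than something the talk spells out. The helper names and the padding width are assumptions.

```ruby
require "time"

# Values are compared as strings, so integers are zero-padded to a
# fixed width to keep numeric order. The width of 10 is an assumption.
def encode_int(n, width: 10)
  raise ArgumentError, "negative values need an offset scheme" if n < 0
  n.to_s.rjust(width, "0")
end

def decode_int(s)
  Integer(s, 10)
end

# ISO 8601 UTC timestamps sort chronologically as plain strings.
def encode_time(t)
  t.getutc.iso8601
end

def decode_time(s)
  Time.iso8601(s)
end

puts "9" < "10"                      # false: plain strings misorder numbers
puts encode_int(9) < encode_int(10)  # true: padded strings keep numeric order
```

Negative numbers are rejected here because padding alone does not order them correctly; they would need an additional offset.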
00:10:33.920 SimpleDB automatically indexes all your data, so you don't need to do any index management. However, you have to remain aware that querying is not as fast as a local database, because each query is a web service call over the network.
00:11:20.960 As your data grows, the speed remains fairly constant: performance stays stable whether you have 10,000 records or 10 million.
00:12:08.880 To summarize, you can worry less about scaling in some ways, but you still need to consider aspects of configuration management and automation. Having automated configurations allows systems to boot and configure correctly, minimizing potential headaches.
00:12:40.960 With SimpleDB, database performance is consistent, which is something that MySQL can struggle with when you have a large number of records. It excels at handling large amounts of data without degrading performance.
00:12:59.919 Amazon S3 and CloudFront handle all my static file serving, which lessens the load on my servers significantly.
00:13:28.560 Here's a diagram of the architecture of the application. SQS, SimpleDB, and S3 are all services that I utilize. My web servers and processing servers operate in two separate pools.
00:14:00.960 Users upload pictures to the application. When they upload a picture, they're directed to upload directly to S3. This means my servers handle less work, which is beneficial.
00:14:36.280 Each upload form contains a hidden redirect URL to facilitate this process. Upon successful upload, users are redirected back to my servers for further processing.
00:15:02.560 The redirect URL is utilized to initiate job creation and to start processing the uploaded image once it is received by Amazon.
00:15:30.160 As for job creation, once the server receives requests and the image name, I create jobs to handle the processing tasks, which are then sent to the queue.
00:15:57.360 In addition to creating the jobs, I also generate a SimpleDB record indicating the processing state of the image.
00:16:30.960 This record keeps track of the image name and job statuses, allowing me to ascertain when processing is complete.
00:17:05.040 There's a monitor job that checks if both jobs are done and updates the database accordingly with their statuses.
00:17:45.000 Now, after processing the pictures, I update the SimpleDB record, providing a status update based on the completion of the processing tasks.
00:18:20.000 The processor daemon runs in a loop on a server, continually pulling job requests from the queue and executing them.
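A worker loop of this shape might look like the sketch below. It is an illustration under assumptions, not the Pictor daemon: a plain Array stands in for the queue service, the job hashes and handler names are hypothetical, and `max_passes` exists only so that the example terminates.

```ruby
# One pass of the loop: take a job, run it, and requeue it on failure.
# The queue is a plain Array standing in for a real queue service.
def process_next(queue, handlers, log: [])
  job = queue.shift
  return false if job.nil?

  begin
    handlers.fetch(job[:type]).call(job)
    log << [:done, job[:type], job[:image]]
  rescue StandardError => e
    queue.push(job)  # put the job back so it can be retried
    log << [:failed, job[:type], e.message]
  end
  true
end

# A real daemon would loop forever (max_passes: nil) and sleep when idle.
def run_daemon(queue, handlers, max_passes: nil, idle_sleep: 1)
  passes = 0
  log = []
  until max_passes && passes >= max_passes
    sleep(idle_sleep) unless process_next(queue, handlers, log: log)
    passes += 1
  end
  log
end

queue = [
  { type: :monochrome, image: "photo-1.jpg" },
  { type: :paint, image: "photo-1.jpg" }
]
handlers = {
  monochrome: ->(job) { puts "monochrome #{job[:image]}" },
  paint: ->(job) { puts "paint #{job[:image]}" }
}
run_daemon(queue, handlers, max_passes: 2, idle_sleep: 0)
```

Because each pass handles exactly one job, running several copies of the loop on separate servers is enough to spread the work across a processing pool.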
00:19:12.960 Each picture that is uploaded results in two processed images, one monochrome and one with a paint effect, thereby fulfilling the task requirements for each upload.
00:19:45.120 Let me demonstrate the application with a live demo now. If you have any questions, feel free to ask.
00:20:20.800 Currently, one daemon is running per processing server, while in a more advanced setup, you would typically want several daemons running on each server.
00:20:44.800 This setup effectively minimizes processing delays by designing the system for each job to be handled separately.
00:21:06.240 As the uploads progress, I'm able to see two web servers and monitor how jobs are distributed among the servers.
00:21:32.720 If I refresh, the app will switch between the two processing servers based on which server completed the jobs.
00:22:00.000 You can see a live demo of the application now, allowing you to upload a picture and monitor the processing.
00:22:32.640 The upload form is tied directly to Amazon S3. There are hidden values that authorize the upload and ensure that all processes are handled securely.
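The hidden values for a browser-based S3 upload are typically a Base64-encoded policy document and a signature over it. The sketch below builds them with the HMAC-SHA1 policy signature that S3 POST uploads used at the time of the talk; current AWS documentation describes Signature Version 4 instead. The transcript does not show how Pictor builds its form, so the method name and all argument values are placeholders.

```ruby
require "base64"
require "json"
require "openssl"
require "time"

# Build the hidden fields for a browser-based POST upload to S3 using
# the legacy HMAC-SHA1 policy signature. All values are placeholders.
def s3_upload_fields(bucket:, key_prefix:, redirect_url:, access_key:, secret_key:, expires_at:)
  policy = {
    "expiration" => expires_at.getutc.iso8601,
    "conditions" => [
      { "bucket" => bucket },
      ["starts-with", "$key", key_prefix],
      { "acl" => "public-read" },
      { "success_action_redirect" => redirect_url }
    ]
  }
  encoded_policy = Base64.strict_encode64(JSON.generate(policy))
  signature = Base64.strict_encode64(
    OpenSSL::HMAC.digest(OpenSSL::Digest.new("sha1"), secret_key, encoded_policy)
  )
  {
    "key" => "#{key_prefix}${filename}",  # S3 substitutes the uploaded file name
    "AWSAccessKeyId" => access_key,
    "acl" => "public-read",
    "success_action_redirect" => redirect_url,
    "policy" => encoded_policy,
    "signature" => signature
  }
end

fields = s3_upload_fields(
  bucket: "example-bucket",
  key_prefix: "uploads/",
  redirect_url: "http://app.example.test/photos/uploaded",
  access_key: "EXAMPLE_ACCESS_KEY",
  secret_key: "example-secret",
  expires_at: Time.utc(2009, 8, 28)
)
puts fields.keys
```

Each pair would be rendered as a hidden input in the form ahead of the file field. The secret key never reaches the browser; only the signature derived from it does.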
00:23:03.280 When the file has been uploaded successfully to S3, the user is redirected back to my server to begin the next steps in processing.
00:23:35.280 Streamlining the upload process allows for a smoother user experience, enhancing how users interact with the application.
00:24:01.600 Once my servers receive the request with an image identifier from the URL, I create job instances required for the image conversion process.
00:24:37.440 Each conversion job is serialized into a format suitable for the queue, making them ready for processing.
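Serializing a job can be as simple as turning its fields into a string that a worker can parse back. The sketch below uses JSON as an assumption, since the transcript does not say which format Pictor uses; the `ConvertJob` fields and the `jobs_for_upload` helper are hypothetical.

```ruby
require "json"

# Hypothetical job description; the field names are assumptions,
# not taken from the Pictor source.
ConvertJob = Struct.new(:image_key, :effect) do
  # Turn the job into a queue message body.
  def to_message
    JSON.generate("image_key" => image_key, "effect" => effect)
  end

  # Rebuild a job from a message body on the worker side.
  def self.from_message(body)
    data = JSON.parse(body)
    new(data.fetch("image_key"), data.fetch("effect"))
  end
end

# One upload fans out into one job per effect.
def jobs_for_upload(image_key, effects: %w[monochrome paint])
  effects.map { |effect| ConvertJob.new(image_key, effect) }
end

messages = jobs_for_upload("uploads/photo-1.jpg").map(&:to_message)
puts messages
```

Keeping the message to plain data (a key and an effect name) means the worker needs nothing from the web server except what is on the queue and in S3.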
00:25:16.920 My convert job class has a specific method that handles the actual image conversion, utilizing ImageMagick to perform the processing tasks.
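As an illustration of what such a method might run, the sketch below builds ImageMagick command lines for the two effects mentioned in the talk, using the `-monochrome` and `-paint` options. The paint radius, the output naming, and the helper names are assumptions. Newer ImageMagick releases ship the command as `magick` rather than `convert`.

```ruby
# ImageMagick options for the two effects mentioned in the talk.
# The paint radius of 4 is an assumption chosen for illustration.
EFFECT_OPTIONS = {
  "monochrome" => ["-monochrome"],
  "paint"      => ["-paint", "4"]
}.freeze

# Build the argv for a `convert` call without running it.
def convert_command(input_path, effect)
  options = EFFECT_OPTIONS.fetch(effect) { raise ArgumentError, "unknown effect: #{effect}" }
  ext = File.extname(input_path)
  output_path = "#{input_path.chomp(ext)}_#{effect}#{ext}"
  ["convert", input_path, *options, output_path]
end

# Passing an argv array to system avoids shell interpolation of file names.
def run_conversion(input_path, effect)
  system(*convert_command(input_path, effect))
end

p convert_command("photo-1.jpg", "paint")
```

Separating command construction from execution keeps the conversion logic testable on machines where ImageMagick is not installed.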
00:25:52.760 I create jobs based on user uploads, and these jobs are handled by the processing servers, ensuring they complete tasks efficiently.
00:26:26.000 The system maintains a record for every image undergoing conversion processes, tracking their statuses across transformations.
00:27:01.040 Once both processing tasks are completed, SimpleDB records will reflect the finished statuses. This transparency helps manage workflows effortlessly.
00:27:34.920 The system continuously checks if the images have been processed and updates the corresponding database entries. If both conversions are complete, the application shows the images.
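The completion check can be sketched with a plain hash standing in for the SimpleDB record. The attribute names and status values below are assumptions; they are kept as strings because SimpleDB stores string attributes.

```ruby
# Hypothetical status record for one uploaded image, with one
# status attribute per conversion.
def new_status_record(image_key, effects: %w[monochrome paint])
  record = { "image" => image_key }
  effects.each { |effect| record["#{effect}_status"] = "pending" }
  record
end

# Return a copy of the record with one conversion marked as done.
def mark_done(record, effect)
  record.merge("#{effect}_status" => "done")
end

# The check a monitor would run: are all conversions finished?
def finished?(record)
  statuses = record.select { |name, _| name.end_with?("_status") }.values
  !statuses.empty? && statuses.all? { |status| status == "done" }
end

record = new_status_record("uploads/photo-1.jpg")
record = mark_done(record, "monochrome")
puts finished?(record)  # false: the paint job is still pending
record = mark_done(record, "paint")
puts finished?(record)  # true: both conversions are done
```

Storing both statuses on the same record fits the denormalized style described earlier: the page that lists images needs only one lookup per image to decide whether to show it.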
00:28:10.880 Here's how I update records, signaling to the system that the respective image has been transformed and is ready to be displayed.
00:28:47.679 My processor daemon runs iteratively, managing job loads systematically. Every upload leads to converted images, maintaining efficiency within the architecture.
00:29:23.760 Let me provide a brief walkthrough of uploading images and monitoring the conversions in real time.
00:30:02.479 By refreshing the page, you will see the processing states of the uploaded images, further indicating how they are dynamically managed across the server.
00:30:37.599 Would anybody like to view the uploaded images with applied effects as part of the demonstration?
00:31:12.000 As a lighthearted side note, I recently had some memorable experiences on a mission trip to Guatemala, where I assisted with medical work.
00:31:46.200 As the demo progresses, you can see the completion of jobs across my processing servers, showcasing the readiness of converted images with applied effects.
00:32:22.000 Currently, you can observe two separate conversions happening, with the corresponding job results reflecting the processing outputs on the screen.
00:33:03.760 If you have any questions about the project or the workings of the setup, please feel free to ask. I'm happy to provide more insights.
00:33:38.400 Security protocols during image uploads involve HTTPS connections, ensuring that all data exchanged remains secure and authorized.
00:34:10.000 Post-upload, users are redirected back to the appropriate page within the application where they can view the images or job summaries.
00:34:43.000 If there are more inquiries about the processes shown, or deeper dives into the code repository, I'm happy to assist further, as many resources are available.
00:35:20.800 The underlying principles and security mechanisms of the API ensure that user interactions are consistently managed, keeping services both reliable and efficient.
00:36:00.800 Please keep the questions flowing, as it’s valuable for understanding how we can apply cloud solutions effectively in various contexts.
00:36:21.719 All right, thank you everyone for your attention and engagement throughout this presentation. Let’s discuss any final thoughts or questions you may have.
00:36:38.880 Thank you for being here today.