Ancient City Ruby 2016

Get Ready for Parallel Programming, Featuring Parallella

Parallella is a single-board computer roughly the size of a credit card or Raspberry Pi. Parallella runs Linux. It has 18 cores (2 ARM, 16 RISC) and you can buy it online for about $150. This talk will explore two questions: (1) How parallel execution differs from serial, and (2) Why we care about parallelism.

This talk is the sequel to Ray's Parallella talk from 2015. To get a head-start on the subject, check out Part One: http://rayhightower.com/blog/2015/08/22/madison-ruby-and-parallella/.

00:00:00.410 Thank you for having me, Hashrocket. It's a pleasure to be at Ancient City Ruby today and to talk to you about Parallella.
00:00:08.130 Parallella is a single-board computer roughly the size of a Raspberry Pi or a credit card. It has an RJ45 Ethernet port for networking and runs from a 5-volt, 1-amp power supply. It has micro USB connectors for your keyboard and mouse (you will need an adapter for those), a micro HDMI port for video output, and a microSD slot for storage, which takes the place of a traditional hard drive.
00:00:20.609 More importantly, it is a single-board computer with 18 cores; this is how it differs from the Raspberry Pi. Today, we will talk about parallelism and why it is significant for web developers, mobile developers, or any type of development, as there are problems that parallelism solves that we might encounter now or in the future.
00:01:01.260 My name is Ray Hightower, and I run a software company called Wisdom Group. My team and I organize several conferences, including the Chicago Ruby User Group and Windy City Rails, which focuses on Ruby on Rails. We also host a new conference, Windy City Things, in Chicago, dedicated to the Internet of Things. But enough about me; let's talk about Parallella.
00:01:26.520 This is the Parallella desktop, and as you can see, it features a terminal window, a browser, and it is equipped with the best text editor known to humans: Vim. I might have started some discussion about that, but it also has Emacs, which is another excellent text editor. You can install tools on it just like any other Linux desktop.
00:01:37.409 I have a Parallella inside of a 3D-printed case created by Dr. Suzanne J. Matthews and her team at West Point. She is a professor of computer science and computer engineering and teaches parallel programming and high-performance computing to cadets at West Point. Dr. Matthews was inspired to create a 3D-printable case so that anyone can print their own.
00:02:05.009 If you visit Thingiverse, the companion website for MakerBot, you can find the plans for the case and print one yourself. The cases are designed to snap together like Lego bricks, so you can build a cluster of Parallella devices when you need more computational power.
00:02:24.450 So why do we care about parallelism? More specifically, what can we accomplish now with parallelism that we could not do before? These questions are crucial as everyone has someone to answer to, whether it’s clients if you own a company, or your own bank account.
00:02:39.640 To sum it up, the big reason is Moore's Law, or rather, the end of Moore's Law. Most of you are familiar with this concept: Moore's Law states that the number of transistors on a chip doubles approximately every 18 months. This exponential growth in transistors has led to remarkable improvements in functionality, as illustrated by the advances in smartphones.
00:03:22.800 However, there's a drawback: Moore's Law is approaching its limits, meaning we need new ways to extract performance from our silicon. One way to achieve this is through parallelism, and Parallella serves as a platform for experimenting with this concept.
00:03:46.850 Parallelism has existed for quite some time. For instance, Admiral Grace Hopper, who worked on COBOL, noted that rather than making a single powerful ox pull a heavy load, we can harness multiple oxen to share the burden. The same idea applies to computers: when one processor is not enough, we can add more processors instead of trying to build a bigger one.
00:04:09.640 When I first explored parallelism, I worked with the Parallella board, excited about concurrency and parallelism. A key lesson I learned is that concurrency and parallelism are not the same. Rob Pike, one of the creators of the Go programming language, discusses this distinction in a talk available on my blog.
00:04:33.550 In simple terms, concurrency allows at least two threads to make progress, while parallelism enables at least two threads to execute simultaneously. Think back to early personal computers that managed multiple tasks on a single processor. That was concurrency: the processor switched rapidly between tasks, so each one made progress, but only one ran at any instant. Parallelism runs multiple tasks at the same instant on separate cores.
00:05:41.920 Another important consideration in parallel computing is energy consumption. When it comes to supercomputers, energy costs can be significant. For instance, the fastest supercomputer in China consumes nearly 18 million watts, costing millions annually. In contrast, Parallella typically uses only 5 watts, resulting in minimal operating costs.
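The power figures above can be turned into a rough annual cost. The sketch below assumes a flat electricity price of $0.10 per kWh and a constant draw all year; both are illustrative assumptions, not figures from the talk, and the function name is hypothetical.

```python
# Rough annual electricity cost at constant power draw.
HOURS_PER_YEAR = 24 * 365   # 8,760 hours
PRICE_PER_KWH = 0.10        # ASSUMED price in USD per kWh, not from the talk


def annual_cost_usd(watts: float) -> float:
    """Cost of running a device at the given power for one full year."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * PRICE_PER_KWH


supercomputer_cost = annual_cost_usd(18_000_000)  # about 18 MW, per the talk
parallella_cost = annual_cost_usd(5)              # 5 W, per the talk

print(f"Supercomputer: ${supercomputer_cost:,.0f} per year")
print(f"Parallella:    ${parallella_cost:,.2f} per year")
```

Under these assumptions the supercomputer costs roughly $15.8 million per year to power, while the Parallella costs a little over $4.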
00:06:04.680 To visualize the low power consumption, I used a device similar to a phone charger, which delivers 5 watts of power. By repurposing old electronics, I created a setup demonstrating that Parallella consumes just 5 watts, making it an efficient option.
00:06:37.580 Now, let's touch on computer architecture, particularly RISC and ARM. RISC, or Reduced Instruction Set Computer, optimizes computer architecture to focus on the 20% of instructions that are used 80% of the time. ARM is a proprietary architecture developed by ARM Holdings and found in many mobile devices.
00:07:32.090 Parallella features two ARM cores and 16 RISC cores, with the 16 RISC cores organized on a single Epiphany chip. This architecture allows for efficient processing. The Epiphany architecture is designed to scale, potentially accommodating thousands of cores in the future.
00:08:00.960 Let's conduct an experiment comparing the Parallella's performance to a Mac running OS X. We will calculate all prime numbers between 0 and 16 million. First, we'll run the serial version on Parallella, which takes approximately 237 seconds to find about 1 million primes.
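The program used in the talk is not shown in the transcript. As a minimal sketch of what a serial prime count looks like, assuming simple trial division (the function names are hypothetical, and a small limit is used so the example finishes quickly):

```python
def is_prime(n: int) -> bool:
    """Trial division: test odd divisors up to the square root of n."""
    if n < 2:
        return False
    if n < 4:
        return True            # 2 and 3 are prime
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def count_primes(start: int, stop: int) -> int:
    """Count primes in the half-open range [start, stop), one number at a time."""
    return sum(1 for n in range(start, stop) if is_prime(n))


# The talk used a limit of 16 million; 100,000 keeps this demo fast.
print(count_primes(0, 100_000))
```

Every number is tested in sequence on a single core, so the run time grows with the size of the range.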
00:09:01.750 Next, I'll run the same task on the Mac. As you would expect, a MacBook Pro outperforms a $150 Parallella board running serially: it took only 14 seconds to get the same results. Now let's assess the performance when running on Parallella in parallel.
00:09:40.760 To execute this in parallel on Parallella, I'll include a library called EDA.h, which enables us to address the cores efficiently. After compilation, the parallel execution is dramatically faster, taking just 18.6 seconds.
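The Epiphany code itself is not reproduced in the transcript. The sketch below illustrates the same idea on an ordinary multicore machine, using operating-system processes from Python's standard library rather than the Epiphany cores or SDK. The function names and the choice of 16 contiguous chunks are assumptions made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    """Trial division: test odd divisors up to the square root of n."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def count_primes(bounds: tuple[int, int]) -> int:
    """Count primes in [start, stop); each call is independent of the others."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


def split_range(start: int, stop: int, parts: int) -> list[tuple[int, int]]:
    """Split [start, stop) into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def count_primes_parallel(start: int, stop: int, workers: int = 16) -> int:
    """Hand one chunk to each worker process and add up the partial counts."""
    chunks = split_range(start, stop, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    # Small limit for a quick demo; the talk used 16 million.
    print(count_primes_parallel(0, 100_000))
```

The problem divides cleanly because no chunk needs a result from any other chunk. One caveat of contiguous chunks is that trial division costs more for larger numbers, so the last chunk takes longer than the first; interleaving the numbers across workers would balance the load better.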
00:10:00.730 In summary, our results illustrate that for this embarrassingly parallel problem, a $150 Parallella (18.6 seconds) can deliver performance comparable to a $2,000 Mac (14 seconds). This is a good example of how we can exploit parallelism.
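The timings quoted in the talk give a quick speedup calculation. The sketch below assumes that all 16 Epiphany cores took part in the parallel run.

```python
serial_seconds = 237.0     # Parallella, serial run (from the talk)
parallel_seconds = 18.6    # Parallella, parallel run (from the talk)
cores = 16                 # ASSUMED number of cores used in the parallel run

speedup = serial_seconds / parallel_seconds   # how many times faster
efficiency = speedup / cores                  # fraction of the ideal 16x

print(f"Speedup:    {speedup:.1f}x")
print(f"Efficiency: {efficiency:.0%}")
```

The speedup comes out at about 12.7x, or roughly 80% of the ideal 16x for 16 cores.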
00:10:51.420 Another area where parallelism excels is in calculating the Mandelbrot set. Here, Parallella divides the work into 16 chunks, allowing the cores to collaborate efficiently in real-time calculations.
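How the demo divides the image is not described in detail. As a minimal sketch, assuming the image is cut into 16 horizontal bands, the decomposition might look like this (image size, iteration limit, and function names are all illustrative):

```python
WIDTH, HEIGHT = 64, 32     # small character-cell image for illustration
MAX_ITER = 50              # iteration limit per point
BANDS = 16                 # one band per core, as in the talk


def escape_count(c: complex, max_iter: int = MAX_ITER) -> int:
    """Iterate z = z*z + c; return the step at which |z| exceeds 2."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter        # never escaped: treated as inside the set


def render_band(band: int) -> list[list[int]]:
    """Compute the rows of one band; needs no data from any other band."""
    rows_per_band = HEIGHT // BANDS
    first_row = band * rows_per_band
    rows = []
    for y in range(first_row, first_row + rows_per_band):
        imag = -1.0 + 2.0 * y / (HEIGHT - 1)
        rows.append([
            escape_count(complex(-2.0 + 3.0 * x / (WIDTH - 1), imag))
            for x in range(WIDTH)
        ])
    return rows


# Each render_band call is independent, so each could run on its own core.
# Here they run one after another to keep the sketch simple.
image = [row for band in range(BANDS) for row in render_band(band)]

for row in image:
    print("".join("#" if n == MAX_ITER else "." for n in row))
```

Because every pixel depends only on its own coordinates, the bands can be computed in any order and then stitched together.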
00:11:35.680 So why do we care about parallelism? In my work with Wisdom Group, we became interested in parallelism due to a client focused on next-generation supercomputers with hundreds of thousands, or even millions, of cores. Our firm assists them in tracking performance data and optimizing systems.
00:12:44.920 Another practical application of parallelism is weather prediction. By dividing the geographical area into smaller cubes, meteorologists can process data on temperature, humidity, and other factors much more efficiently. Smaller cubes lead to more precise predictions, though they require increased processing power.
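The cost of smaller cubes can be made concrete with a short calculation. The region size and cube sizes below are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
import math


def cell_count(region_km: tuple[float, float, float], cell_km: float) -> int:
    """Number of cubes with edge cell_km needed to tile a box-shaped region."""
    return math.prod(math.ceil(side / cell_km) for side in region_km)


region = (1000, 1000, 10)          # ASSUMED region: 1000 x 1000 km, 10 km tall

coarse = cell_count(region, 10)    # 10 km cubes
fine = cell_count(region, 5)       # 5 km cubes

print(coarse, fine, fine // coarse)
```

Halving the edge of each cube multiplies the number of cubes by eight, which is why finer forecasts demand so much more processing power.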
00:13:41.100 Finite element analysis is another application, allowing engineers to assess structural integrity by analyzing various elements under different stresses. By breaking the object into finite elements, engineers can determine how forces will affect each section.
00:14:40.140 To demonstrate how high-level languages like Ruby integrate with parallel computing, I have a simple chat application running on Parallella utilizing Sinatra and WebSockets. This reflects how parallel processing can enhance application functionality.
00:15:30.780 Another important aspect of Parallella is its Field-Programmable Gate Array (FPGA) functionality. This allows for quick implementation of Boolean logic and rapid reconfiguration for various tasks, which is crucial for many applications.
00:16:22.370 The Parallella board can be operated either as a headless server or with a monitor attached, depending on your project needs. The headless configuration leaves more FPGA gates available for processing, allowing for more extensive customization.
00:16:55.390 There are many exciting uses for FPGAs, including in finance where speed is crucial for traders. It's important to leverage advancements in technology while respecting the potential challenges that accompany them.
00:17:48.180 Now, looking towards the future, another parallel single-board computer called Pine64 was announced via Kickstarter. It starts at $15 and includes ARM cores capable of 4K video, making it a powerful alternative for parallel processing.
00:19:03.900 Both the Parallella and Pine64 boards have their respective advantages, and your choice should depend on the specific requirements of your project. In conclusion, I encourage you to explore parallel programming as it holds immense potential for innovation.
00:20:08.650 I appreciate your attention today. If anyone has questions, please feel free to ask.
00:21:15.860 One question I received is about transitioning from serial to parallel execution. The key steps are to include the necessary library and to make sure your problem can be divided into pieces with few dependencies between them.
00:21:39.600 Additionally, someone asked about the Pine64 Kickstarter. The Kickstarter has ended, and they raised significant funds. Pine64 is beginning to fulfill pledges, and the board may soon be available for sale directly.
00:22:04.740 As for the Nvidia Tegra, it is based on a GPU. However, the pricing for those units can be higher than Parallella or Pine64, so it's essential to weigh the benefits against your budget.
00:23:02.400 Thank you all for your engagement, and I'm happy to discuss any further questions or topics of interest!