PipelineFX co-founder and CEO Richard Lewis shares tips and tricks for scaling rendering workloads into the Cloud, including understanding data traffic patterns, matching hardware, and syncing data.
Richard Lewis is the co-founder and CEO of PipelineFX. With a focus on computer graphics since 1986, he also co-founded one of Hawaii’s leading computer systems integration firms, which he led for over 20 years. From 1998 to 2002, Lewis led the team that served as the full-service systems integrator for Square USA’s Honolulu studio during the making of ‘Final Fantasy: The Spirits Within’ and ‘The Animatrix: Final Flight of the Osiris.’ Lewis negotiated the exclusive rights to the render management software developed by Square USA, which is now called “Qube!”.
Whether you’re a small- to medium-sized studio trying to efficiently manage a moderate rendering workload or you’re running several hundred farm machines and want a fast, dependable way to handle burst capacity, the Cloud is the latest option of interest. Using Cloud resources is not a new concept, but prices have come down enough to make it feasible, and providers like Google are investing resources into learning how studios work so they can attract the business. Scaling into the Cloud is becoming easier by the day.
Scaling comes with challenges, though. It can bring to light problems that you never knew you had. Increasing your rendering workload, and consequently the amount of data traversing your network and systems, can create bottlenecks and expose software or scripts in your pipeline that were never meant to handle more than a few simultaneous jobs.
Because scaling rendering on its own can create enough support problems to occupy any studio’s time, why would you want to increase risk by adding cloud rendering into the equation? You’ve heard the stories. Is it really worth it? Won’t it end in tragedy? Not necessarily. If you consider these eight tips when scaling your rendering workload into the cloud, you’ll have a greater chance of success.
1. Understand data traffic patterns on your network
It’s important to understand where your data travels in your network when you initiate a render job. Which subnets are currently maxed out or getting close? What kind of data should really travel alone or outside of the channels of other data types? What is the traffic like through your VPN connection to the cloud? One PipelineFX client, Moonbot Studios, learned the importance of seeing VPN traffic patterns as they happen. Moonbot’s Brennan Chapman emphasized the benefits of visualizing data traffic patterns in real time. “We were having problems with our VPN connection falling down,” he recounted. “What helped with tracking down the problem was watching the live graph of our throughput on our internet connection.”
By watching the trending graphs and charts generated by the live transfer of data across his network, Chapman was able to detect the terminal packet loss caused by the complete saturation of Moonbot’s line.
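As a minimal sketch of the kind of live throughput monitoring Chapman describes, the Python below turns periodic byte-counter readings from a network interface into link utilization and flags the intervals where the line is close to saturated. The sample readings, the 100 Mbit/s link capacity, and the 90 percent threshold are assumptions for illustration, not figures from Moonbot.

```python
from typing import List, Tuple


def utilization(samples: List[Tuple[float, int]], link_bps: float) -> List[float]:
    """Convert (timestamp_seconds, cumulative_bytes) readings into the
    fraction of link capacity used during each interval between readings."""
    result = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order readings
        bits_per_second = (b1 - b0) * 8 / elapsed
        result.append(bits_per_second / link_bps)
    return result


def saturated_intervals(util: List[float], threshold: float = 0.9) -> List[int]:
    """Return the indexes of intervals at or above the threshold."""
    return [i for i, u in enumerate(util) if u >= threshold]


# Hypothetical counter readings taken every 10 seconds on a 100 Mbit/s line.
samples = [(0.0, 0), (10.0, 50_000_000), (20.0, 170_000_000), (30.0, 295_000_000)]
util = utilization(samples, link_bps=100_000_000)
hot = saturated_intervals(util)
print(util, hot)
```

In practice the readings would come from whatever your router, firewall, or monitoring system exposes; the point is simply to watch utilization over time rather than discover saturation after the VPN drops.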
2. Watch the volume of requests to your render farm manager
Rendering pipelines are filled with helpful scripts and utilities to ensure successful job completion. A script that constantly polls all running jobs on the farm every 10 seconds may work very well at smaller scales, but when studios multiply their rendering workload, that same script and others that run innocent little calls to the Supervisor can become a crippling source of traffic. Before you scale, try to get a handle on the utilities your render jobs run that need Supervisor data. Consider replacing polling of the Supervisor with an event-driven approach via callbacks. Callbacks can be used to report changes in render job status as they happen, resulting in significantly less work for the Supervisor.
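To make the difference concrete, here is a rough sketch comparing the request volume of a polling script with an event-driven one. `JobEvents` is a toy, in-process registry written for this illustration; it is not the Qube! API, and the job count, status names, and the assumption of one request per job per poll are all hypothetical.

```python
from collections import defaultdict


def polling_requests(running_jobs, poll_interval_s, duration_s):
    """Requests sent if a script asks about every running job each interval."""
    return running_jobs * (duration_s // poll_interval_s)


class JobEvents:
    """Toy callback registry: handlers run only when a job changes status."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.notifications = 0

    def on(self, status, handler):
        """Register a handler to call when a job reaches the given status."""
        self._handlers[status].append(handler)

    def emit(self, job_id, status):
        """Record one status change and call the handlers registered for it."""
        self.notifications += 1
        for handler in self._handlers[status]:
            handler(job_id)


events = JobEvents()
finished = []
events.on("complete", finished.append)

# Hypothetical hour of work: 500 jobs, each with two status changes.
for job_id in range(500):
    events.emit(job_id, "running")
    events.emit(job_id, "complete")

polled = polling_requests(running_jobs=500, poll_interval_s=10, duration_s=3600)
print(polled, events.notifications)
```

Under these assumptions the polling script generates 180,000 requests in an hour, while the callback approach generates 1,000 notifications, one per status change.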
3. Don’t burden your file server
The same phenomenon can happen with utilities run during rendering that rely on shared network locations. During larger render farm loads, a simple write to a local file server can contribute to bringing down the whole pipeline. Wrappers are a common source of traffic across shared network locations, so before you distribute a wrapper that runs every time Maya is opened, especially during rendering, learn what your wrappers do and whether those commands are even needed during the rendering phase.
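One way to act on this, sketched below, is to have the wrapper check whether it is running as part of a batch render and skip any steps that only matter to interactive artists. The environment variable `STUDIO_BATCH_RENDER` and the step names are hypothetical; your pipeline will have its own way of signaling a render context.

```python
import os


def is_batch_render(env=None):
    """True when the (hypothetical) render flag is set in the environment."""
    env = os.environ if env is None else env
    return env.get("STUDIO_BATCH_RENDER", "0") == "1"


def wrapper_steps(env=None):
    """Return the startup steps the wrapper should run in this context."""
    steps = ["set_project_paths"]  # needed by artists and render nodes alike
    if not is_batch_render(env):
        # Steps that touch the shared file server and only help interactive use.
        steps += ["write_usage_log_to_share", "sync_shelf_prefs_from_share"]
    return steps


print(wrapper_steps({"STUDIO_BATCH_RENDER": "1"}))
print(wrapper_steps({}))
```

With the flag set, the wrapper performs no reads or writes against the shared location, so the number of render nodes no longer multiplies that traffic.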
4. Don’t underestimate license management during rendering
To render, you need to make sure you have a software license available. It’s very easy to overlook the license count as your workload increases. Even more commonly overlooked is how many times an application pings the license server during a render job. As with the Supervisor, license server traffic climbs steeply with scale, so the license server is another machine that needs adequate horsepower for the job. Remember that you may need to manage many times more licenses than before when scaling to the cloud.
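A back-of-the-envelope estimate can help here. The sketch below assumes one license seat per concurrent render process and a fixed number of license server requests per process per hour; both are simplifying assumptions, and real licensing terms and application behavior vary. The node counts are hypothetical.

```python
def license_load(nodes, processes_per_node, requests_per_process_hour):
    """Estimate seats needed and license server requests per hour,
    assuming one seat per concurrent render process."""
    seats = nodes * processes_per_node
    requests_per_hour = seats * requests_per_process_hour
    return seats, requests_per_hour


# Hypothetical farm: 50 on-premises nodes, bursting to 250 nodes with the cloud.
on_prem = license_load(nodes=50, processes_per_node=2, requests_per_process_hour=30)
burst = license_load(nodes=250, processes_per_node=2, requests_per_process_hour=30)
print(on_prem, burst)
```

Even under these simple assumptions, a fivefold burst means five times the seats and at least five times the requests hitting the license server.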
5. Match your hardware to the type of render job
Don’t overlook hardware options and how well they suit the kind of render job running. For instance, you probably don’t want to assign a machine low on memory to handle Nuke jobs. Nor do you want to assign a machine rich in memory to long, slow computational jobs like simulations. On closer inspection, you’ll see that certain processors work better than others with certain render file types, or that you may need more SSD storage to be really effective.
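The matching idea can be sketched as a simple best-fit selection: pick the machine that meets a job’s minimum requirements while wasting the least memory and the fewest cores. The node names, specifications, and job requirements below are invented for illustration and are not recommendations for any particular application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    cores: int
    memory_gb: int


@dataclass(frozen=True)
class JobProfile:
    name: str
    min_cores: int
    min_memory_gb: int


def best_fit(job: JobProfile, nodes: List[Node]) -> Optional[Node]:
    """Choose the node that satisfies the job with the least spare capacity."""
    candidates = [
        n for n in nodes
        if n.cores >= job.min_cores and n.memory_gb >= job.min_memory_gb
    ]
    if not candidates:
        return None  # nothing in the pool can run this job
    return min(
        candidates,
        key=lambda n: (n.memory_gb - job.min_memory_gb, n.cores - job.min_cores),
    )


# Hypothetical machine types and job requirements.
pool = [
    Node("highmem", cores=16, memory_gb=256),
    Node("highcpu", cores=64, memory_gb=64),
    Node("small", cores=8, memory_gb=16),
]
comp = JobProfile("nuke_comp", min_cores=8, min_memory_gb=128)
sim = JobProfile("simulation", min_cores=32, min_memory_gb=32)

print(best_fit(comp, pool).name, best_fit(sim, pool).name)
```

A render manager’s own host-selection rules would normally do this work; the sketch only shows the kind of reasoning to encode in them.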
6. Cache or sync data to reduce traffic at scale
We’ve mentioned before how important caching data is when managing your renders. It’s a great way to reduce the traffic on your machines and network because it removes the need to send redundant data. That can make quite a difference, especially in the Cloud. You want your render nodes and asset hosting to be fronted with caching appliances. Or, you can consider a live sync of render assets from your studio to the Cloud.
Jordan Soles of RodeoFX explains how he integrated rendering in the Google Cloud into his pipeline: “We put a great deal of effort into making our asset management/version publishing system ‘Cloud aware.’ This means that when new models, textures, etcetera, are published, they automatically get sent to the Cloud,” he said. “Ultimately, we end up reducing the amount of data needed to be transferred to the Cloud’s filesystem which allows our renders to start quickly.”
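The idea of sending only what the Cloud does not already have can be sketched with content hashes: compare a digest of each local asset against a manifest of what has already been uploaded, and transfer only the differences. This is an illustrative approach, not a description of how RodeoFX’s system works; the asset paths and in-memory dictionaries are stand-ins for real files and a real manifest store.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """Content hash used to tell whether an asset has changed."""
    return hashlib.sha256(data).hexdigest()


def assets_to_upload(local: Dict[str, bytes], remote_manifest: Dict[str, str]) -> List[str]:
    """Return paths whose content is missing from, or differs from, the remote copy."""
    return sorted(
        path for path, data in local.items()
        if remote_manifest.get(path) != digest(data)
    )


# Hypothetical published assets and the manifest of what the Cloud already holds.
local_assets = {
    "char/hero_model_v003.abc": b"model v3",
    "char/hero_diffuse_v007.tx": b"texture v7",
    "env/city_layout_v012.abc": b"layout v12",
}
cloud_manifest = {
    "char/hero_model_v003.abc": digest(b"model v3"),
    "char/hero_diffuse_v007.tx": digest(b"texture v6"),
}

pending = assets_to_upload(local_assets, cloud_manifest)
print(pending)
```

Running a check like this at publish time, rather than at render time, means the assets are already in place when the render job starts.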
7. Your Supervisor needs horsepower!
The Supervisor is a complex controller of your render farm, and it’s responsible for massive amounts of communication to be effective. When you dramatically increase the amount of render work that you dispatch, you sharply increase the amount of communication and data going through the Supervisor. It’s only fair to make sure your Supe can handle the additional traffic and isn’t bogged down with high I/O or other hardware-bound limitations. Before the Cloud, sizing a Supervisor for rendering was a little simpler, as the rendering resources were known and more or less constant. With today’s hybrid Cloud pipelines, a render manager Supervisor may be required to dispatch to two, three, four or more times as many render nodes as exist on-premises.
8. Keep a watchful eye on your spending
Through personal experience or the sobering tales of peers, we have all learned how easy it is to overspend on resources in the Cloud. With the pressure of a show deadline and a mounting backlog of render jobs, it becomes even more tempting to just use whatever is needed to “get the job done.” But the correct, affordable solution lies in finding a way to get the job done with smaller, more modest additions to your rendering resources. It’s important to keep a vigilant watch on your Cloud spending as you dispatch render jobs.
According to Google Cloud Platform Product Manager Todd Prives, users are able to understand the scalability they can achieve on their cloud nodes because “…certain cloud vendors out there offer per-minute billing, which allows you to scale workloads on a 1 virtual resource to 1 frame basis.”
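To see why the billing increment matters when each frame runs on its own virtual machine, consider the rough estimate below. The hourly price, frame count, and render time are hypothetical, and the model ignores storage, network egress, and machine startup time.

```python
import math


def render_cost(frames, minutes_per_frame, price_per_hour, billing_increment_min):
    """Estimate cost when each frame runs on its own instance and usage is
    rounded up to the provider's billing increment."""
    increments = math.ceil(minutes_per_frame / billing_increment_min)
    billed_minutes = increments * billing_increment_min
    return frames * billed_minutes * price_per_hour / 60


# Hypothetical shot: 1,000 frames at 12 minutes each on a $1.50/hour instance.
per_minute = render_cost(1000, 12, price_per_hour=1.50, billing_increment_min=1)
per_hour = render_cost(1000, 12, price_per_hour=1.50, billing_increment_min=60)
print(per_minute, per_hour)
```

Under these assumptions the same work costs $300 with per-minute billing and $1,500 with per-hour billing, which is why one-frame-per-instance scaling only makes financial sense with fine-grained billing.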
After years of what-ifs, the Cloud is finally mature enough to augment or replace the rendering resources in production studios. Qube! customers are actively reaching out to PipelineFX for help getting started with the Cloud or asking how it works for other studios. RodeoFX and Moonbot are out there proving that it works, making these eight tips a good way to gain some of their insight into how to make the Cloud work for you.