I felt ready to start competing on Kaggle to develop my deep learning skills. My office laptop, however, was completely unprepared. To effectively train deep convolutional neural networks I needed hardware that packs a punch.
I was at a crossroads.
- Should I pay for cloud services to run my calculations?
- Is the peace of mind of an out-of-the-box computer worth the money?
- How much time would I have to invest to design and build my own?
Being more than an occasional user, I ruled out cloud services based on cost. Building a workstation from parts is cheaper than an out-of-the-box solution. Yet there is no free lunch: what you save in money you pay for in time and risk. But what if we assume this time is not spent but rather invested in building hardware knowledge? I was convinced this offsets the risk.
In this article I will walk you through my personal build process. There are plenty of articles showcasing complete workstation builds that work well. Rather than pitching you my build, I want to focus your attention on the decisions and trade-offs I faced at each step. The aim is to leave you with a framework to navigate your way in this fascinating world.
For the whole part selection process I relied heavily on PC Part Picker which is completely free. In addition to a massive database of thoroughly reviewed parts, the website offers a tool that verifies part compatibility. I also shared my initial build on the forums to gather feedback and adjusted my final build accordingly.
The motherboard, GPU and CPU that made it into the final build
The GPU plays the lead role
The central component of any deep learning workstation is the GPU. Suggesting how to select the ideal GPU for your particular use case is beyond the scope of this article. This guide offers an excellent treatment of the subject.
As for me, I narrowed down my choice of vendor to NVIDIA, which currently dominates the market thanks to its established deep learning libraries built on CUDA. I was particularly interested in computer vision, which requires sufficient GPU memory (in gigabytes).
Memory is the critical spec to get right because the GPU stores the entire model and its input batch (a number of images) in memory. I preferred an RTX over a GTX model. RTX cards, built on the Turing architecture, can train models at a lower 16-bit precision instead of the 32-bit precision GTX cards are limited to. Lower precision may sound like a drawback, but it means an RTX card needs only half as much memory to train the same model.
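The halving effect is easy to put in numbers. The sketch below estimates the memory footprint of a single input batch at both precisions; the batch shape is an illustrative assumption, not a measurement from my build.

```python
# Back-of-the-envelope memory estimate for one batch of images,
# comparing 32-bit (4 bytes/value) and 16-bit (2 bytes/value) precision.

def batch_memory_gb(batch_size, channels, height, width, bytes_per_value):
    """Memory needed to hold one input batch, in gigabytes."""
    values = batch_size * channels * height * width
    return values * bytes_per_value / 1024**3

# A hypothetical batch of 64 RGB images at 512 x 512 resolution.
fp32 = batch_memory_gb(64, 3, 512, 512, bytes_per_value=4)
fp16 = batch_memory_gb(64, 3, 512, 512, bytes_per_value=2)

print(f"fp32 batch: {fp32:.2f} GB")  # 0.19 GB
print(f"fp16 batch: {fp16:.2f} GB")  # exactly half: 0.09 GB
```

The same factor of two applies to the model's weights and activations, which is where most of the savings come from on a memory-bound card.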
I reduced my choice to either the RTX 2070 or the more advanced RTX 2080 Ti. I made a bet on the more expensive RTX 2080 Ti because of its larger memory: 11 GB versus 8 GB. In addition to computing faster (more FLOPS), it allowed me to experiment with a particularly heavy encoder and a large batch size. Totalling 8.7 GB of memory, this would have been impossible on the RTX 2070.
This 8.7 GB model would not have fit on an RTX 2070 GPU
I bought one GPU but picked the other components to comfortably support two RTX 2080 Ti GPUs. The higher cost of these "unnecessarily" powerful parts is relatively low compared to the price of a single GPU anyway. Running multiple GPUs in tandem is accomplished by NVIDIA's Scalable Link Interface (SLI), implemented on RTX cards over an NVLink bridge. Accommodating an SLI bridge restricted my choice of motherboard.
The RTX 2080 Ti consumes up to 260 watts and produces a lot of heat. Two cooling fan types are available:
- A blower fan that blows air over the card and out of the workstation case.
- A double fan system that circulates air inside the case on to the card.
I chose the double fans because they provide a stronger airflow. I trusted my case fan to pull enough fresh air into the case to keep up to two cards cool. For three or more cards the blower fan would have been more appropriate.
The final build without lid showing a free slot for a second GPU
Remember: the other components’ purpose is to serve the GPU.
The CPU prepares batches of training and validation data for the GPU. A rule of thumb I followed was to make sure my CPU has a minimum of 4 threads (corresponding to two cores) for each GPU. By that measure, an Intel Core i7 with 4 cores and 8 threads can reliably serve two RTX 2080 Ti GPUs. Yet I bought the AMD Ryzen 7 2700X with 8 cores and 16 threads. A definite overkill. However, at just 165 USD this was an absolute bargain. Besides, other parts of my work would benefit from the powerful CPU.
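The threads-per-GPU rule of thumb can be sketched as a small sanity check. The helper name and its exact logic are my own illustration, not code from the build:

```python
# Reserve at least four CPU threads per GPU for data loading; the
# returned number could be passed as num_workers to a data loader.

def workers_per_gpu(n_gpus, available_threads, threads_per_gpu=4):
    """Suggest data-loading workers per GPU, or fail if the CPU is too weak."""
    required = n_gpus * threads_per_gpu
    if available_threads < required:
        raise RuntimeError(
            f"CPU has {available_threads} threads, need {required}"
        )
    return threads_per_gpu

# The Ryzen 7 2700X exposes 16 threads: plenty for two RTX 2080 Tis,
# which only require 2 x 4 = 8 of them.
print(workers_per_gpu(n_gpus=2, available_threads=16))  # 4
```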
I mounted an ARCTIC Freezer 13 on top of my CPU. The cooler looks like an engine block of a small motorcycle but does an outstanding job keeping my CPU cool. Thankfully I could rely on PC Part Picker to find a case that would fit this monstrosity.
What about PCIe-lanes?
During my CPU search I repeatedly stumbled upon a discussion about PCIe lanes. PCIe ("Peripheral Component Interconnect Express") lanes are the data links that connect the various components to the motherboard. I decided not to pay any attention to PCIe lanes because their impact on performance is negligible. Besides, training neural networks with PyTorch's data loader and pinned memory renders the number of PCIe lanes largely irrelevant. The AMD CPU further restricted my choice of motherboard.
As a kid I had this notion that the motherboard was this super-important component that powered the entire computer. It turns out its main job is to fit and connect all the parts. I therefore picked a motherboard that could physically fit two GPUs side by side and had sufficient PCIe slots to connect the AMD Ryzen CPU and both GPUs. I verified that it's one of the SLI-ready motherboards. It is also one of the few that offer USB 3.0 Type-C front panel support.
RAM size does not directly affect performance. However, sufficient memory is necessary to execute GPU code without swapping to disk. The rule of thumb is to get RAM that at least matches the GPU memory: 16 GB of RAM for my 11 GB GPU. Working with large datasets is uncomfortable with only 16 GB, so I bought 32 GB of Corsair Vengeance LPX DDR4-3000 memory. I purposefully got two 16 GB sticks instead of a single 32 GB stick to take advantage of the dual-channel memory bus.
The opinions regarding RAM speed (indicated by the 3000 number, in MHz) are contradictory to say the least. Some sources state that RAM speed is irrelevant for deep learning, or even for performance in general. Others state that the AMD Ryzen series is a notable exception: the speed of AMD's Infinity Fabric is tied to the memory clock rate, as explained in this clever video. For peace of mind I bought the cheapest 3000 MHz version of the 2x16 GB kit I could find.
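The dual-channel argument for two sticks comes down to nominal DDR4 arithmetic: each 64-bit channel moves 8 bytes per transfer, and a second populated channel doubles the theoretical peak.

```python
# Peak theoretical DDR4 bandwidth from the transfer rate (MT/s) and the
# number of populated channels. These are nominal figures, not benchmarks.

def ddr4_bandwidth_gb_s(transfer_rate_mt_s, channels):
    """Peak bandwidth in GB/s: transfers/s x 8 bytes per 64-bit channel."""
    return transfer_rate_mt_s * 8 * channels / 1000

one_stick = ddr4_bandwidth_gb_s(3000, channels=1)   # a single 32 GB stick
two_sticks = ddr4_bandwidth_gb_s(3000, channels=2)  # the 2 x 16 GB kit
print(one_stick, two_sticks)  # 24.0 vs 48.0 GB/s
```

Real-world gains are smaller than this 2x ceiling, but the kit split costs nothing extra.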
I bought an SSD with an NVMe connection for training. NVMe lets the SSD read its data straight from a PCIe slot on the motherboard, at a much faster rate than a traditional SATA connection. I use my 1 TB Western Digital Black NVMe SSD both for training and long-term storage. If 1 TB is not enough for you, buying a (cheaper) HDD for long-term storage while reserving the SSD for work makes sense.
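If you want a quick feel for the difference between drives, a crude sequential-read timing like the sketch below will do; note that OS caching inflates the number, so it is no substitute for a proper benchmark tool.

```python
# Rough sequential-read throughput: write a scratch file, then time
# reading it back in 1 MB chunks.
import os
import tempfile
import time

def read_throughput_mb_s(path, chunk=1024 * 1024):
    """Sequential read speed of a file in MB/s (cache-inflated)."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    elapsed = time.perf_counter() - start
    return size / (1024 * 1024) / elapsed

# Create a 16 MB scratch file to measure against.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(16 * 1024 * 1024))
    scratch = f.name

print(f"{read_throughput_mb_s(scratch):.0f} MB/s")
os.remove(scratch)
```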
5. Power supply
For the power supply I considered both its power output (in watts) and its energy efficiency (in percent). The supplied power must exceed the total power consumed by all parts. I estimated my requirement by adding the power consumption of the CPU and two GPUs plus a ten percent margin for the other components:
(45 W + 2 x 260 W) x 1.10 ≈ 620 W
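The same estimate as a calculation, using the nominal CPU draw and the peak draw of two GPUs:

```python
# PSU sizing estimate: component draws plus a ten percent margin for
# the motherboard, RAM, disks and fans.
CPU_W = 45       # AMD Ryzen 7 2700X, nominal
GPU_W = 260      # RTX 2080 Ti, peak
N_GPUS = 2
MARGIN = 0.10

required_w = (CPU_W + N_GPUS * GPU_W) * (1 + MARGIN)
print(required_w)  # roughly 620 W
```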
The Corsair 1000 W power supply I bought is probably a bit of an overkill for this build. But it allows me to keep this high-quality power supply when I add or upgrade parts in the future. The 80+ Platinum energy efficiency certification will pay for itself through months of smaller energy bills.
I bought the Lian Li PC-O11 Dynamic ATX Full Tower Case. Besides being able to fit my giant CPU cooler, the larger size simplified the wire handling during assembly. The case comes without fans so I added the Noctua NF-F12 120 mm fan.
Most programmers recommend a 27-inch screen, which I found way too big. I bought two Dell P2418D monitors and am very satisfied with their 23.8-inch screen size. The IPS (In-Plane Switching) technology allows for good colors at wide viewing angles, which matters in a dual-screen setup. For writing code the refresh rate is irrelevant, so 60 Hz was no dealbreaker. Some programmers also swore by a 4K screen resolution to better make out the characters in their code. I found my code to be perfectly readable at 2560 x 1440.
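What actually determines character sharpness is pixel density, not resolution alone. A quick pixels-per-inch comparison between the Dell and a hypothetical 27-inch 4K panel:

```python
# Pixels per inch (PPI) from resolution and diagonal screen size.
from math import hypot

def ppi(width_px, height_px, diagonal_in):
    """Pixel density: diagonal pixel count divided by diagonal inches."""
    return hypot(width_px, height_px) / diagonal_in

dell = ppi(2560, 1440, 23.8)   # Dell P2418D
uhd27 = ppi(3840, 2160, 27.0)  # a 27-inch 4K panel, for comparison
print(f"{dell:.0f} vs {uhd27:.0f} PPI")  # ~123 vs ~163
```

At 123 PPI the Dell is already well above the ~110 PPI typical of a 27-inch 1440p screen, which is why the text looks crisp in practice.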
While I was searching for my monitor I stumbled on an interesting case where a GPU was damaged by a bad DisplayPort cable included with the monitor. This got me pretty worried considering I dropped more than a grand on my GPU. However, the odds of this fault occurring are so low that I never bothered to check whether my monitors' DP cables passed the latest VESA certifications. Still, after countless hours of GPU training my workstation is completely healthy and runs at peak performance to this day.
The completed workstation sitting on my desk
I hope this article gave you the courage to build your personal deep learning workstation. Myself, I had a blast diving deep into this subject! In addition to finding a solution that fit my needs, I explored a fascinating world previously alien to me. I'm sure that as years go by and my workstation inevitably becomes obsolete, the knowledge I gained from this experience will endure and continue to serve me well.