Description

Thanks to AMD, building a gaming PC in late 2019 is pretty straightforward for almost any budget. However, building a rig for deep learning on a budget can be confusing, and you really have to dig into the hardware to find the best combination of parts available. At work, I have a very powerful workstation with almost no compromises, but I wanted a personal build for my own projects (mostly NLP) for around 1000USD. If you are not familiar with these builds: most of the "budget" deep learning PC guides on the internet are 2000USD+, so half of that price is definitely budget. Note that I live in Japan and bought almost everything here (used), so the prices shown are converted from yen to USD. I game sometimes and wanted this PC to run my favorite games smoothly, but this was always secondary to the main purpose of the build when I was hunting for parts.

I decided to allocate half of the budget (a maximum of 500USD) just for GPU(s), as this is the most important component of every deep learning build. ROCm is not quite there yet (but it looks very promising), so I went with Nvidia/CUDA. TL;DR: I bought two GTX 1070s. Longer version: a 2070 or 2x1660 Ti fall well behind 2x1070, and a 2080 Super or anything above is really out of budget. In fact, in both CNNs and LSTMs, 2x1070 outperforms a 2080 or a 1080 Ti. Where I live, a good deal for a 1080 Ti or a non-Super 2080 is ~560-580USD. An RTX 2080 would have the advantage of tensor cores, so I could do mixed-precision training. However, that's not pure half-precision training, as fp16 doesn't work well in all cases (though I admit it can be really useful). State-of-the-art language models use ~13GB even for base models. With 2x1070 I have 2x8GB of memory for fp32, and if I don't need that much memory, I can train two networks simultaneously. I wanted a model with good overclocking performance, so I chose the MSI Gaming X. The two cards were 391USD together, and both were in mint condition (I just repasted them). If I get a really good deal (like with one of these cards), I could even buy a third one and end up with three 1070s for roughly the price of a single 2080/1080 Ti. I do plan to add one more card in the future, but that will be a blower-style RTX (probably a 2080 Super). Originally, I planned to disable the second card for gaming on Windows, but I found out that quite a few of my favorite games support SLI, so I got an HB bridge (on Linux, I didn't even enable SLI, as it's irrelevant for deep learning, unlike NVLink). On Windows, the cards overclock to +115MHz core and +415MHz memory (2101MHz boost clock with 4212MHz memory). Based on benchmarks, the cards perform at the level of stock 1070 Tis. With almost the same OC on Linux, a CUDA GPGPU stress test draws 200-215W per card at full load.
Temps are amazing with the Gaming X, and because the card's fans are big, it's not even loud. Edit: After spending some time with the PC, here is my experience: of course, games don't scale like 3DMark, so the SLI score is pretty much useless, but in deep learning performance these two cards are really close to a stock 2080 Ti.
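Running two independent trainings, one per card, mostly comes down to hiding all but one GPU from each process. A minimal sketch, assuming each job is a standalone Python training script (the script names in the usage note are hypothetical), using the standard CUDA_VISIBLE_DEVICES and CUDA_DEVICE_ORDER environment variables that CUDA-based frameworks honor:

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index: int) -> dict:
    """Return a copy of the current environment that exposes a single GPU to a child process."""
    env = os.environ.copy()
    # Number devices by PCI bus ID so the indices match what nvidia-smi reports.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # The child process sees only this card, and it appears there as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_jobs(jobs):
    """Start one training script per GPU. `jobs` is a list of (script_path, gpu_index) pairs."""
    return [
        subprocess.Popen([sys.executable, script], env=env_for_gpu(gpu))
        for script, gpu in jobs
    ]
```

For example, `procs = launch_jobs([("train_bilstm.py", 0), ("train_cnn.py", 1)])`, then call `p.wait()` on each process. Inside each script the visible card shows up as device 0, so the training code itself doesn't need to know which physical GPU it got.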

The part hunt continued with the motherboard and CPU. I have 2 GPUs for now and will buy another one in the future, so I needed a Mobo with at least three x16-length PCIe slots. A native NVMe slot is absolutely not a priority. It has to support 32GB of RAM at the very least, preferably 64GB. DDR3 is still fine, but I needed a board that can push it to at least the low 2000s (MHz) with acceptable timings. For machine learning purposes, it hardly matters whether the GPUs run at x8 or x16, but three cards at x8 still means the CPU needs at least 24 (3x8) PCIe lanes. One core per GPU is totally fine, but hyperthreading is a must, as preprocessing tasks are easily parallelized. I looked into several platforms and Mobo-CPU combinations, and found that the sweet spot for cost-performance is X79/LGA2011 (I wouldn't go older). X79 Mobos are usually not that cheap, but if you look at them as a combo with the CPU, you can find good deals (I would never recommend those cheap AliExpress X79 boards, as you need a good VRM for stability and overclocking). I got the E5-1650 v2, a 6-core, unlocked, soldered Xeon, for a mere 55USD. The Mobo is an ASRock ATX board. It supports quad-channel DDR3 up to ~2600MHz OC, but unfortunately the maximum capacity is 32GB. This was a compromise I had to make, because X79 boards can be really overpriced. Overall, this is a good board with a pretty decent VRM and a lot of ports, and thank god it has a Clear CMOS button on the I/O panel. After installing, I updated the CPU to the latest available microcode (and disabled the Spectre and Meltdown mitigations on both OSes) and flashed the newest BIOS. The base Vcore is set to 1.36V, and the CPU is stable at 4.8GHz on all cores (so I'm pretty lucky). This means the actual voltage is ~1.34V at idle and ~1.39V under full load.
When torturing her poor soul with Prime95, the voltage sometimes goes up to 1.407V, which is about my psychological limit for this Ivy Bridge chip. Temperatures are fine (more on that later). In AMD terms, it is around the level of a 2600X (sometimes better, sometimes worse, but pretty close). Edit: based on a commenter's input, I changed the voltage settings for the CPU. LLC is set to Level 3, and Vcore is 1.334V at idle with a maximum of 1.368V in Prime95. Edit 2: Aida64 cache stress was unstable, so I had to raise the voltage a little more, but temps are still OK. Also, I forgot to mention it: at full load it draws 171W.
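The lane requirement and the x8-vs-x16 trade-off are simple arithmetic. A rough sketch, assuming PCIe 3.0 links and the 40 CPU lanes Intel lists for the E5-1650 v2; how the board actually wires its slots (x16/x8/x8, etc.) is a separate constraint that this does not model:

```python
# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b line encoding.
PCIE3_GT_PER_S = 8.0
PCIE3_ENCODING_EFFICIENCY = 128 / 130

# Lanes provided by the E5-1650 v2 (Intel spec); the board decides how slots share them.
CPU_PCIE_LANES = 40


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Approximate usable one-direction PCIe 3.0 bandwidth in GB/s for a given link width."""
    return PCIE3_GT_PER_S * PCIE3_ENCODING_EFFICIENCY * lanes / 8  # 8 bits per byte


def lanes_needed(num_gpus: int, lanes_per_gpu: int = 8) -> int:
    """CPU lanes consumed if every GPU gets the same link width."""
    return num_gpus * lanes_per_gpu


for gpus in (2, 3):
    need = lanes_needed(gpus)
    print(f"{gpus} GPUs at x8: {need} lanes, fits in {CPU_PCIE_LANES}: {need <= CPU_PCIE_LANES}")
print(f"x8: {pcie3_bandwidth_gb_s(8):.2f} GB/s, x16: {pcie3_bandwidth_gb_s(16):.2f} GB/s")
```

An x8 link works out to roughly 7.9 GB/s per direction versus roughly 15.8 GB/s for x16. Whether halving that matters depends on how much data each training step copies to the card; these are theoretical ceilings, not measurements from this build.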

Both the CPU and Mobo support ECC memory, but the Mobo only supports unbuffered ECC, which is about the same price as non-ECC memory. And it's not like I need it anyway. So for memory, I got the Kingston HyperX Beast 32GB (4x8) kit. It's a very good DDR3 kit (in benchmarks, it outperforms a lot of 3000-3200 DDR4 kits), and the XMP profile worked as it's supposed to: the memory is running at 2400MHz in quad-channel. I tightened the timings a little, and it's stable at 10-12-13-30. Maybe I could tighten them a bit more, but memory OC/tweaking can take a lot of time. To be honest, I wanted to get the memory cheaper, but better DDR3 kits are actually not that cheap (at least here), and I didn't want to get a 1600 kit with loose timings just to save $40.
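The theoretical numbers behind quad-channel DDR3 holding its own are easy to work out. A sketch, assuming a standard 64-bit (8-byte) bus per channel; the dual-channel DDR4-3200 CL16 kit is only a hypothetical comparison point, and measured results (like the Memory Mark figures in the benchmark list) will be lower than these peaks:

```python
def peak_bandwidth_gb_s(mt_per_s: float, channels: int, bus_width_bytes: int = 8) -> float:
    """Theoretical peak bandwidth: transfers/s x 8-byte bus x number of channels."""
    return mt_per_s * bus_width_bytes * channels / 1000


def cas_latency_ns(cas: int, mt_per_s: float) -> float:
    """CAS latency in nanoseconds. DDR transfers twice per clock, so the clock is half the MT/s."""
    clock_mhz = mt_per_s / 2
    return cas / clock_mhz * 1000


# This build: DDR3-2400 CL10 in quad-channel.
print(f"DDR3-2400 CL10 x4: {peak_bandwidth_gb_s(2400, 4):.1f} GB/s, {cas_latency_ns(10, 2400):.2f} ns")
# Hypothetical comparison: a common DDR4-3200 CL16 kit in dual-channel.
print(f"DDR4-3200 CL16 x2: {peak_bandwidth_gb_s(3200, 2):.1f} GB/s, {cas_latency_ns(16, 3200):.2f} ns")
```

DDR3-2400 CL10 in quad-channel comes out at 76.8 GB/s peak with about 8.3 ns CAS latency, against 51.2 GB/s and 10 ns for the dual-channel DDR4-3200 CL16 example.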

Storage was the only part I already had for this build, and it's on the parts list. I have two 240GB Kingston SSDs for the two OSes, and a 4TB WD Black HDD for additional storage. Well, the SSDs are... mediocre, but they're enough for the OS plus a few games/datasets, so it's fine for now. I don't keep anything important on the WD Black (one of these actually failed on me last month), but it's pretty fast for a hard disk.

I have an unused 500W power supply, but it's obviously not enough. A heavily overclocked old Xeon, two overclocked 1070 Gaming X cards, and a high-end RTX card in the future... I needed ~1000W to be on the safe side. But the used market doesn't always offer exactly what you want, and the only decent option was this FSP Aurum Pro 1200W beast. It's a quality PSU with Gold efficiency (and Tier A on the LTT PSU Tier List), and I got a really good deal on it. The drawback is that it's 180mm long, so it won't fit in most mid-tower cases.
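The ~1000W target can be sanity-checked by adding up worst-case draws. A back-of-the-envelope sketch: the CPU and GTX 1070 numbers are the ones measured earlier in this log (reading the 171W full-load figure as the CPU's draw), while the future RTX card (250W, the reference rating of a 2080 Super) and the 75W for everything else are assumptions:

```python
# Peak draw estimates in watts. The first three are measurements quoted in this build log;
# the last two are assumptions for illustration.
loads_w = {
    "CPU, full load (measured)": 171,
    "GTX 1070 #1, CUDA stress test (measured max)": 215,
    "GTX 1070 #2, CUDA stress test (measured max)": 215,
    "future RTX card (assumed, 2080 Super reference rating)": 250,
    "board, RAM, drives, fans (assumed)": 75,
}

total_w = sum(loads_w.values())
print(f"Estimated worst-case draw: {total_w} W")
for psu_w in (500, 1200):
    print(f"{psu_w} W PSU -> {total_w / psu_w:.0%} of rated output")
```

That lands at about 926W if everything peaked at once, which real training or gaming rarely does; the 1200W unit would run at roughly 77% of its rating in that worst case, while the old 500W unit is far out of its depth.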

Which brings us to the case. I should mention that cases are totally overpriced here compared to the US or Europe. The length of the PSU proved to be the biggest restriction, but I didn't want to get a full tower just because of this. I wanted a case with a full mesh front because of the multi-GPU setup, and preferably a mesh top as well. So I got the Zalman i3 mid-tower case, which was everything I wanted and even more... on paper. It has a nice elevation for the PSU fan, and for $56 it comes with 4 preinstalled white LED fans. While not loud, these are pretty weak fans, so I sold all of them for $9, which makes it a $47 case without fans. It has a fan controller (off + 3 levels, 6 headers), which proved to be quite convenient actually. It doesn't have a tempered glass window (just acrylic), but I couldn't care less for this build. Sounds pretty good so far, right? This case could have been the value king for performance, but there are some drawbacks. Some of the cable holes and in-case distances don't make sense at all and are super inconvenient. Even if you can fit a 180mm PSU, you can barely mount the HDDs in the bay. Once you've removed the PCIe slot covers, you cannot put them back (fortunately we have thousands of those lying around in the lab, so I replaced them because it bothered me). There is almost no room at all for cable management. If you install a microATX board with a 150mm PSU, one GPU and an SSD, I'm sure it's fine. But if you use the case to its full potential, you will have a hard time. Anyway, I managed to close the back somehow, but I had to route some of the cables around the SSDs (pic 1). When I buy the third GPU I will definitely get a big full tower case... But it's not all bad. It has amazing airflow, the fan controller is a nice plus, and it's cheap. I kept the top and bottom filters but removed the front one, as I could see it would definitely limit airflow.
I'd rather clean it more often than run a hot, loud system. Oh, I almost forgot: of course, the holes in the PSU shroud weren't aligned with the additional power header (sometimes called an EZ Plug, it's a Molex connector near the PCIe slots), so I had to cut one of the holes bigger. I might have been fine without it, but the Mobo only has one 8-pin connector, and the cards draw a considerable amount of power... so better safe than sorry.

Next is cooling. I really didn't want an AIO, because even if the fan speeds are controlled by the BIOS, controlling the pump on Linux can be a challenge or outright impossible (with the H150i Pro on another system, I had to set it to max performance on Windows, otherwise it just ran in some silent mode on Linux, and OpenCorsairLink didn't work). The CPU cooler is from AliExpress (bought on sale with a coupon). Except for my first computer, which I built with my father when I was ~10 years old, I have only used Noctua for air cooling, so I have pretty high standards when it comes to cooling. I didn't want to pay for a D14/D15, so I got the "best" tower-style air cooler I could find on AliExpress that would still fit my case (with RAM clearance). It came with two RGB fans with so-so CFM, so I moved those to the top of the case and replaced them on the cooler with a high-CFM fan in push and an average fan in pull. The other fans are two average ones at the rear and front-top, and two high-CFM ones at the front-middle and front-bottom (these are for GPU airflow). All fans except the CPU fans are controlled by the case fan controller (6 in total). I had to mutilate the controller headers a little, as they are 2-pin headers in a 3-pin-wide housing, while 4 of the case fans are PWM. The reason I didn't just plug them into the motherboard (with splitters, a SATA-powered hub, etc.) is that I never really succeeded in controlling case fans on Linux (on Windows, SpeedFan is brilliant though). For browsing/YouTube, I don't even turn on the case fans. For training/gaming the fans are on the 1st or 2nd level. At level 3 I get diminishing returns and the system is pretty loud, so I don't use it. The CPU idles at 28-29C, full-load temperature is around 70-75C, and the maximum observed temperature in Prime95 (small FFTs) was 89C. With this much airflow the PC could almost fly, and neither (overclocked) GPU has reached 70C so far, whatever I did.
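On Linux, motherboard-connected fans are normally exposed through the kernel's hwmon sysfs interface (/sys/class/hwmon/hwmonN/), where temperatures appear as temp*_input files in millidegrees Celsius and fan duty as pwm* files ranging 0-255. A minimal sketch of reading those and computing a linear fan curve; the curve thresholds are hypothetical, and whether a board's fan headers show up at all depends on the kernel driver for its monitoring chip:

```python
from pathlib import Path


def list_hwmon_chips(root="/sys/class/hwmon") -> dict:
    """Map each hwmonN directory to the chip name reported by its kernel driver."""
    chips = {}
    for d in sorted(Path(root).glob("hwmon*")):
        name_file = d / "name"
        if name_file.exists():
            chips[d.name] = name_file.read_text().strip()
    return chips


def read_temps_c(hwmon_dir) -> dict:
    """Read every temp*_input file in one hwmon directory (values are millidegrees Celsius)."""
    root = Path(hwmon_dir)
    if not root.is_dir():
        return {}
    return {
        sensor.name: int(sensor.read_text().strip()) / 1000.0
        for sensor in sorted(root.glob("temp*_input"))
    }


def fan_curve_pwm(temp_c: float, t_low: float = 40.0, t_high: float = 75.0,
                  pwm_min: int = 80, pwm_max: int = 255) -> int:
    """Hypothetical linear fan curve mapped onto the 0-255 range used by hwmon pwm* files."""
    if temp_c <= t_low:
        return pwm_min
    if temp_c >= t_high:
        return pwm_max
    frac = (temp_c - t_low) / (t_high - t_low)
    return round(pwm_min + frac * (pwm_max - pwm_min))
```

To actually apply a value, the usual route is writing 1 (manual mode) to pwmN_enable and then the duty value to pwmN, as root. If the driver doesn't expose those files for a given header, this route won't work for it.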

The display I'm using is a Dell P2416D, which I got for $135. It's a 23.8" 2K (2560x1440) monitor, so the pixel density is high and the image quality is amazing. It's a 60Hz monitor, but I overclocked it to 70Hz (some people reached 75Hz with the same model, but mine started to skip frames at that refresh rate). I never play FPS games, so this refresh rate is totally fine (I won't upgrade anytime soon). Nothing special about the other peripherals: I have an old Logicool mouse and keyboard, and I never had any issues with them while working or gaming. The speakers are the Creative Inspire T7700, a 7.1 set from 2003. I only use it as 3.1, which is more than enough for me.
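The high pixel density and the benefit of the 70Hz overclock can both be quantified. A quick sketch, using the standard definition of PPI (diagonal resolution in pixels divided by diagonal size in inches):

```python
import math


def pixels_per_inch(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixel density: diagonal length in pixels divided by the diagonal size in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


def frame_time_ms(refresh_hz: float) -> float:
    """Time available per frame at a given refresh rate."""
    return 1000.0 / refresh_hz


print(f"P2416D (23.8in, 2560x1440): {pixels_per_inch(2560, 1440, 23.8):.1f} PPI")
print(f"60 Hz: {frame_time_ms(60):.2f} ms/frame, 70 Hz: {frame_time_ms(70):.2f} ms/frame")
```

For the P2416D this gives about 123.4 PPI, and the overclock shortens the frame time from about 16.7 ms to about 14.3 ms.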

Synthetics/benchmarks:
CPU: CPU-Z (multi/single) - 3676.8/495.3, CPU Mark - 17343, Geekbench 5 CPU (multi/single) - 6478/1045, Cinebench R20 (multi/single) - 2730/352, Cinebench R15 (multi/single) - 1266/171.
Memory: Memory Mark - 3319 (108 dops / 16002 uncached read / 31895 cached read / 12352 write / 23 latency / 53836 threaded).
GPU: Geekbench 5 (single card) - CUDA 50860, OpenCL 53083, Vulkan 54927; TensorFlow 1.14 ResNet-50 benchmark (both cards) - 286.36 images/sec (on par with a stock Titan V, and almost at the level of a stock 2080 Ti).
Other: 3DMark Time Spy (SLI) - 11777 (13330 graphics / 7095 CPU); Shadow of the Tomb Raider benchmark (SLI), 2560x1440, all settings on/maxed, SMAA - average 99 FPS.
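Throughput figures like the 286.36 images/sec above are images processed divided by wall-clock time over a run of training steps. A generic sketch of such a harness (not the actual TensorFlow benchmark used here), assuming `step` runs one training iteration on one batch:

```python
import time
from typing import Callable


def images_per_sec(step: Callable[[], object], batch_size: int,
                   warmup_steps: int = 5, timed_steps: int = 50) -> float:
    """Average throughput of `step`, which processes one batch per call.

    For GPU work, `step` must block until the device has finished (e.g. by
    reading the loss back to the host); otherwise asynchronous kernel launches
    make the measured rate look higher than it really is.
    """
    for _ in range(warmup_steps):  # let caches, autotuning and clocks settle first
        step()
    start = time.perf_counter()
    for _ in range(timed_steps):
        step()
    elapsed = time.perf_counter() - start
    return batch_size * timed_steps / elapsed
```

For a multi-GPU run, `batch_size` should be the global batch processed per step across all cards, so the result is comparable with single-GPU numbers.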

Real workloads: After tuning everything, I tried the system with one of my BiLSTM models. Of course, it's not 4x2080 Ti, but this 2x1070 setup is amazing bang for the buck. As for games, I decided to play Witcher 3 again. With everything maxed out, modded graphics and 2K resolution, the framerate is vsync-locked at 70fps and it never drops (CPU temp is 40-50C, GPUs are 45-55C).


Comments

  • 2 months ago
  • 2 points

An LLC setting that boosts voltage under load is NOT good for 24/7 stability. Ideally you want some droop so that you can avoid electromigration. I burned out 4 E5-1660 Xeons through electromigration because of aggressive LLC.

I own a Gigabyte X79S-UP5 and speak from experience.

  • 2 months ago
  • 1 point

Thank you for your advice. I will fix this when I get home.

  • 2 months ago
  • 1 point

I changed the voltage settings and updated the relevant parts of the text accordingly (the original settings are still there, so maybe somebody can learn from them). I looked into LLC and settled on 1.334V at idle with a maximum of 1.368V (settled = it worked). I ran some quick stress tests and it was fine, but I will see how it goes in the long run. 1.368V at 4.8GHz sounds a little low to me for this chip, but maybe I'm just lucky.

  • 2 months ago
  • 1 point

Idle should be higher than load, because high voltage at low current with proper cooling doesn't damage the chip. Your load voltage should droop by about the same amount as it rises right now, so invert the voltages (1.334V load and 1.368V idle) and increase as needed.

  • 2 months ago
  • 1 point

Thanks for the answer! This is my understanding of LLC now:
1. If LLC is on a high/extreme mode, voltages can go way above the set Vcore, which can degrade the chip.
2. If the voltage fluctuates a lot between idle and load, it can cause instability.
Neither is the case currently, so what am I missing? I get that even 1.4V at idle probably wouldn't hurt the chip if load drops to around 1.38V, but what's the point if the idle-load gap isn't big and the load voltage is far from dangerous anyway?

  • 2 months ago
  • 1 point

Too much droop (dropping a ton of voltage under load) does introduce instability, but the load voltage is where you want to be stable. 1.39V is where you ideally want your highest idle voltage to be, because with C-states disabled, 1.4V+ at idle could still be slightly dangerous. Ideally you want to droop 0.03-0.04V, as more droop will introduce instability.

Because of this, you might want to back down to 4.7 or 4.6Ghz.

  • 2 months ago
  • 1 point

I still don't understand why the current settings are incorrect. Why must idle be higher than load, if both idle and load voltages are totally safe and the system is stable? Do you mean that around 1.37V might still be too high?

  • 2 months ago
  • 1 point

More & more people talk about deep learning capabilities, but what does that look like when you do it at home? Are there "handy" applications of deep learning for the average PC enthusiast, or is it more of an industrial thing?

  • 2 months ago
  • 1 point

Although we use deep learning technologies every day, developing these methods is another thing. There are several practical applications in all kinds of business intelligence (recommender systems, customer need assessment, advertising...), medical analysis, manufacturing, etc., but most of the people learning about or developing deep learning models are from industry or academia. Of course, there are also plenty of enthusiasts who already know some coding and are just interested in the technology or want to enter Kaggle competitions. Some people start learning about machine learning because they want to get into the stock market with algorithmic trading, or just have a good startup idea. However, I see no reason why an "average" PC enthusiast would learn about it and buy the hardware for it.

As for me, there are some things I'm interested in and have my own ideas about, but they're outside the scope of my job. Maybe I'll produce a few papers about my projects or even make some money out of them, but the point is to have fun and do what I enjoy. Not so different from gaming in this respect.

  • 2 months ago
  • 1 point

Honestly, I don't understand most of what I just read, but +1 for taking the time to type it all out.

  • 2 months ago
  • 1 point

haha thanks. At first I didn't want to make it this long, but then I figured maybe somebody would want to use it as a 'guide' for building a budget deep learning PC.

[comment deleted]
  • 2 months ago
  • 1 point

E5-16xx (original, v2, v3) Xeons have an unlocked multiplier.

[comment deleted]
  • 2 months ago
  • 1 point

Thanks! I'm very lucky with this chip. And it was a bit of a gamble with the AliExpress cooler, but it turned out okay.