
AMD’s Epyc Milan offers double Intel Xeon’s datacenter performance

March 15, 2021

Whether your primary ask is higher performance per watt, per physical rack unit, or per TCO dollar, AMD’s Epyc Milan is an extremely strong contender.

Today, AMD launched Epyc Milan, the server / data center implementation of its Zen 3 architecture. The story for Epyc Milan is largely the same as the one told by Ryzen 5000: lots of cores, high boost-clock rates, a 19 percent gen-on-gen uplift, and an awful lot of polite schadenfreude at rival Intel’s expense.

The comparison between AMD and Intel is even more stark in the server room than it was in consumer PCs and workstations, because there’s no “but single thread” to fall back on here. Intel clung to a single-threaded performance lead over AMD for some time even after AMD began dominating in multithreaded performance. Although that lead disappeared in 2020, Intel could at least still point to near-equivalent single-threaded performance and pooh-pooh the relevance of the all-threaded performance it was getting crushed on.

This isn’t an excuse you can make in the data center—Epyc and Xeon Scalable are both aimed squarely at massively multitenanted, all-threads workloads, and Xeon Scalable just can’t keep up.

Head to head with Xeon Scalable

AMD took a giant leap forward in 2019 that Intel has so far been unable to replicate.
You can handle more massively multithreaded concurrent workloads with fewer systems by going Epyc instead of Xeon.

It shouldn’t be difficult to find an Epyc-powered server to handle your workload, at any level of the stack.


We’ll get into some of the architectural changes in Epyc Milan later, but they probably won’t surprise readers who follow CPU architecture closely. The transition from Rome to Milan is a shift from Zen 2 to Zen 3, and it plays out in the rack with Epyc Milan much as it did on the desktop with Ryzen 5000.

We prefer the simple, boots-on-the-ground perspective here: these are faster processors than their Xeon competitors, and you can get more done with less physical space and electrical power with them. AMD presented a slide with a smoothed progress curve that shows Epyc lurching into high gear in 2017, bypassing Xeon and continuing to leave its rival in the dust.

We’re not entirely certain we agree with the smoothing. Xeon Scalable and Epyc were in a dead heat in both 2017 and 2018, then Epyc took a truly massive leap forward in 2019 with the first Zen 2 parts. The smoothed curve seems to be trying to hammer home the point that Epyc continues to improve at a solid rate rather than stagnating.

AMD found plenty of ways to show Epyc Milan doubling Xeon Scalable’s performance. This one’s the money shot, in our opinion.

The Xeon Gold 6258R being used as a comparison here is legit—although nowhere near as expensive as a Platinum 8280, its performance is near-identical.

Shifting from Specrate floating point to Specrate integer doesn’t change things much—we’re still looking at just over double the performance of a Xeon 6258R.

JVM performance gets an even bigger delta than Specrate, with a whopping 2.17x performance boost above Xeon Platinum 8280.

Even if you drop from the 64-core Epyc 7763 down to the 32-core 75F3, you’re still looking at a 1.7x performance boost above Intel’s best.


There’s no denying the performance delta between Epyc and its closest Xeon competitors, and AMD’s presentation leaves no stone unturned in the quest to demonstrate it. AMD’s flagship 64-core Epyc 7763 is shown turning in more than double the performance of a Xeon Gold 6258R in Specrate 2017 integer, Specrate 2017 floating point, and Java Virtual Machine benchmarks.

Even more impressively, AMD CEO Lisa Su presented a slide showing 2.12x as many VDI desktop sessions running on an Epyc 7763 system as on a Xeon Platinum 8280 system. The only remaining question is whether these are fair comparisons to begin with—some were against Xeon Gold, one against Xeon Platinum, and none is against the most current Intel line-up. What gives?

There are effectively no publicly accessible benchmarks available for newer Xeons like the 8380HL, and they aren’t any faster than the Xeon Platinum 8280 anyway, even using Intel’s own numbers. Using the Xeon Gold 6258R in most comparisons also makes sense: it offers near-identical performance to the Xeon Platinum 8280, at the same TDP and significantly lower cost.

In other words, these numbers are being presented without any “gotchas” that we could find—AMD is comparing its flagships to Intel’s in the most reasonable head-to-head comparisons possible.
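
To put the "fewer systems" claim in concrete terms, here is a rough sizing sketch. The only figure taken from AMD’s presentation is the 2.12x VDI ratio; the per-server session count and the fleet target are made-up placeholders, so treat the output as illustrative arithmetic rather than a capacity plan.

```python
import math

def servers_needed(total_sessions, sessions_per_server):
    """Whole servers required to host total_sessions, rounding up."""
    return math.ceil(total_sessions / sessions_per_server)

# From AMD's slide: an Epyc 7763 system ran 2.12x the VDI sessions
# of a Xeon Platinum 8280 system.
EPYC_VS_XEON_VDI_RATIO = 2.12

# Hypothetical placeholders, not from the article.
XEON_SESSIONS_PER_SERVER = 100
TARGET_SESSIONS = 10_000

xeon_servers = servers_needed(TARGET_SESSIONS, XEON_SESSIONS_PER_SERVER)
epyc_servers = servers_needed(
    TARGET_SESSIONS, XEON_SESSIONS_PER_SERVER * EPYC_VS_XEON_VDI_RATIO
)

print(f"Xeon servers: {xeon_servers}, Epyc servers: {epyc_servers}")
```

Under those assumed inputs, the Epyc fleet comes out at a little under half the size of the Xeon fleet, which is what a ratio slightly above 2x implies.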

Architectural changes from Rome to Milan

Zen 3 offers a 19% IPC uplift versus Zen 2. No new motherboard required: Milan CPUs go on Rome boards just fine after a BIOS upgrade.
If you want to see all the new goodies in one place, this is your infographic.

Here’s where that 19% improved IPC comes from—better branch prediction, wider execution pipeline, and more load/store per cycle.

Zen 2 and Zen 3 each have 4MiB of L3 cache per core, but Zen 3 unifies it, sharing 32MiB among eight cores rather than 16MiB among four.


Milan offers 19 percent higher IPC (instructions per clock cycle) than Rome did, largely due to Zen 3’s improved branch prediction, wider execution pipeline, and increased load/store operations per clock cycle.

Zen 3 also offers a more unified L3 cache design than Zen 2 did. This one takes a little explaining—Zen 2 / Rome offered a 16MiB L3 cache for each four-core group; Zen 3 / Milan instead offers 32MiB for each eight-core group. This still breaks down to 4MiB of L3 per core—but for workloads in which multiple cores share data, Zen 3’s more unified design can add up to big savings.

If 3MiB of L3 cache data is identical for eight cores, Rome would have needed to burn 6MiB on it, an identical copy in each of two four-core groupings. Milan can instead store the same 3MiB once, in a single cache serving all eight cores. This also means individual cores can address more L3 cache: 32MiB for Milan versus Rome’s 16MiB. The result is faster communication between cores and cache for large workloads, with a corresponding reduction in effective memory latency.
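
The duplication arithmetic can be written out as a tiny model. This is only a sketch of the reasoning above, not a cache simulator: the function name and parameters are ours, and it assumes the cores sharing the data are packed into as few L3 groups as possible.

```python
import math

def l3_footprint_mib(shared_mib, cores_using, cores_per_l3_group):
    """MiB of L3 consumed when cores_using cores all need the same data.

    Each separate L3 group that contains one of those cores has to hold
    its own copy of the shared data.
    """
    groups_touched = math.ceil(cores_using / cores_per_l3_group)
    return shared_mib * groups_touched

# Rome: 16MiB of L3 per four-core group. Milan: 32MiB per eight-core group.
rome_mib = l3_footprint_mib(3, cores_using=8, cores_per_l3_group=4)
milan_mib = l3_footprint_mib(3, cores_using=8, cores_per_l3_group=8)

print(f"Rome: {rome_mib}MiB, Milan: {milan_mib}MiB")
```

The same function shows where the unified design stops helping: once the sharing cores span more than one eight-core group, Milan has to duplicate too.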

Security improvements

Milan, like Rome before it, mitigates speculative execution attacks more thoroughly than Xeon. The third row is of particular note here.
SEV-SNP (Secure Nested Paging) and Shadow Stack protection are new to Zen 3.


AMD’s Epyc has enjoyed a generally better security reputation than Intel’s Xeon, and for good reason. The Spectre and Spectre V4 speculative execution attacks have been mitigated in hardware as well as at the OS / Hypervisor levels since Epyc Rome. Milan adds support for Secure Nested Paging—offering protection for trusted VMs from untrusted hypervisors—and a new feature called CET Shadow Stack.

The Shadow Stack feature helps protect against Return Oriented Programming attacks by mirroring return addresses. This allows the system to detect and mitigate an attack that successfully overflows one stack but does not reach the shadow stack. Use of this feature requires software updates in the operating system and/or hypervisor.
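
The idea is easier to see in a toy model. The sketch below is purely conceptual: the class and exception names are invented for illustration, and real hardware does this check in the CPU on call and return instructions rather than in software. It shows an overwritten return address being caught because the protected copy no longer matches.

```python
class ShadowStackViolation(Exception):
    """Raised when a return address does not match its protected copy."""

class ToyCpu:
    def __init__(self):
        self.stack = []         # ordinary stack, writable by a buggy program
        self.shadow_stack = []  # protected copy holding return addresses only

    def call(self, return_address):
        # A call pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # A return pops both stacks and refuses to continue on a mismatch.
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ShadowStackViolation(
                f"return to {address:#x}, expected {expected:#x}"
            )
        return address

cpu = ToyCpu()
cpu.call(0x401000)
cpu.stack[-1] = 0xDEADBEEF  # simulated overflow clobbers the return address

try:
    cpu.ret()
    detected = False
except ShadowStackViolation as err:
    detected = True
    print("blocked:", err)
```

An attacker who can only write to the ordinary stack never touches `shadow_stack`, so the hijacked return is detected instead of executed.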

Epyc Milan CPU models

Epyc Milan launches in 15 flavors, ranging from the eight-core 72F3, with a boost clock up to 4.1GHz and a 180W TDP, to the massive 7763, with 64 cores, a boost clock up to 3.5GHz, and a 280W TDP.

All Milan models offer SMT (two threads per core), 8 channels of DDR4-3200 RAM per socket, 128 lanes of PCIe4, Secure Memory Encryption (encryption of system RAM against side-channel attacks), Secure Encrypted Virtualization (encryption of individual VMs against side-channel attacks from other VMs or from the host), and more.
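
For a sense of scale, the memory and PCIe figures translate into theoretical peak bandwidth as follows. This is back-of-the-envelope arithmetic using the standard DDR4 and PCIe 4.0 signaling rates (a 64-bit data bus per DDR4 channel; 16 GT/s per PCIe 4.0 lane with 128b/130b encoding). These are ceilings, not measured throughput, and the function names are ours.

```python
def ddr_peak_gb_per_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    bus_bytes=8 reflects the 64-bit data bus of a DDR4 channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9

def pcie4_peak_gb_per_s(lanes):
    """Theoretical peak PCIe 4.0 bandwidth in GB/s, one direction.

    Each lane signals at 16 GT/s and carries 128 payload bits per 130 sent.
    """
    return lanes * 16e9 * (128 / 130) / 8 / 1e9

memory_gb_s = ddr_peak_gb_per_s(channels=8, mega_transfers_per_s=3200)
pcie_gb_s = pcie4_peak_gb_per_s(lanes=128)

print(f"DDR4-3200 x 8 channels: {memory_gb_s:.1f} GB/s per socket")
print(f"PCIe 4.0 x 128 lanes:   {pcie_gb_s:.1f} GB/s per direction")
```

That works out to roughly 205 GB/s of memory bandwidth per socket and about 252 GB/s of PCIe bandwidth in each direction, before any protocol or real-world overhead.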

The SKUs are grouped into three categories. The highest per-core performance comes from SKUs with an “F” as the third character, ranging from the eight-core/180W 72F3 to the 32-core/280W 75F3. (We suspect that the “F” is for fast.)

The next grouping, optimized for highest core/thread count per socket, begins with “76” or “77” and ranges from the 48C/225W 7643 to the 64C/280W 7763. If you’re looking for the most firepower per rack unit that you can find, these should be the first models on your list.

The remainder of Milan’s SKU lineup begins with either 73, 74, or 75 and is aimed at a “balanced” profile, looking to optimize performance and TCO. These range from the 16C/155W 7343P to the 32C/225W 7543.

Finally, when you see a “P” in any of these SKUs, it denotes a single-socket model.
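
The naming rules above are regular enough to express as a small lookup. The helper below is a hypothetical sketch built only from the rules described in this article; it is not an official AMD decoder, and the segment labels are our own shorthand.

```python
def describe_milan_sku(model):
    """Classify an Epyc Milan model number using the article's naming rules."""
    model = model.strip().upper()
    single_socket = model.endswith("P")           # "P" marks single-socket parts
    base = model[:-1] if single_socket else model
    if len(base) != 4 or not base.startswith("7"):
        raise ValueError(f"not a recognized Milan model number: {model!r}")

    if base[2] == "F":                            # e.g. 72F3, 75F3
        segment = "per-core performance"
    elif base[:2] in ("76", "77"):                # e.g. 7643, 7763
        segment = "core density"
    elif base[:2] in ("73", "74", "75"):          # e.g. 7343P, 7543
        segment = "balanced"
    else:
        segment = "unknown"

    return {"model": model, "segment": segment, "single_socket": single_socket}

for name in ("72F3", "75F3", "7763", "7343P"):
    print(describe_milan_sku(name))
```

The “F” check comes first on purpose: the 75F3 starts with “75” but belongs to the per-core performance group, not the balanced one.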

Discussing Milan with a leading server OEM

After consuming AMD’s data, we spoke to Supermicro’s Senior VP of Field Application Engineering, Vik Malyala. Supermicro has already shipped about 1,000 Milan-powered servers to select customers, and Malyala briefly confirmed the broad outlines of AMD’s performance data (yes, they’re fast; yes, 19 percent gen-on-gen uplift is about right) before we moved on to the real elephant in the room: supply.

According to Malyala, AMD has acknowledged that the supply chain doesn’t have a lot of wiggle room in it this year. Supermicro was told it would need to forecast its CPU supply needs to AMD well ahead of time in order to get timely delivery, a situation Malyala says applies to many upstream vendors this year.

Although AMD’s promises to Supermicro are less than concrete (it hopes to fulfill orders with “minimal disruption,” given adequate forecasting), Malyala says that AMD has hit its shipping targets so far. Supermicro is extending the same hand-in-hand approach to its larger customers that AMD extends to its OEMs: enterprises and data centers forecast their needs to the OEM, which allows it, too, to deliver in a predictable fashion.

This sort of advance forecasting and delivery isn’t really applicable to small businesses that might only buy a few servers once every three to 10 years, of course. Malyala says those organizations are looking at a “probably less than three-week scenario” for small, ad-hoc orders.

When we asked about the level of interest and order volume Supermicro sees for Epyc versus Xeon servers, Malyala simply replied “customer interest [in Milan] has been extremely strong.”
