Monday 27 April 2015

What is 802.11ax WiFi, and will it really deliver 10Gbps?

Linksys WRT 1900AC. Curvier, and with four antennae.

Wireless standards tend to get proposed, drafted, and finally accepted at what seems like a glacial pace. It’s been roughly 16 years since we began to see the first 802.11b wireless routers and laptops. In the intervening time, only three more mainstream standards have taken hold: 802.11g, 802.11n, and now 802.11ac. (I’m leaving out some lesser-used ones like 802.11a for the purposes of this story.)
Now a new standard looms on the horizon. And if you thought that your new 802.11ac router’s maximum speed of 1,300Mbps was already fast, think again. With 802.11ac fully certified and out the door, the Wi-Fi Alliance has started looking at its successor, 802.11ax — and it looks pretty enticing. While you may have a hard time getting more than 400Mbps to your smartphone via 802.11ac, 802.11ax should deliver real-world speeds above 2Gbps. And in a lab-based trial of technology similar to 802.11ax, Huawei recently hit a max speed of 10.53Gbps, or around 1.3 gigabytes of data transfer per second. Clearly, 802.11ax is going to be fast. But what is it exactly?

What is 802.11ax WiFi?

The easiest way to think of 802.11ax is to start with 802.11ac — which allows for up to four different spatial streams (MIMO) — and then to massively increase the spectral efficiency (and thus max throughput) of each stream. Like its predecessor, 802.11ax operates in the 5GHz band, where there’s a lot more space for wide (80MHz and 160MHz) channels.
With 802.11ax, you get four MIMO (multiple-input-multiple-output) spatial streams, with each stream multiplexed with OFDA (orthogonal frequency division access). There is some confusion here as to whether the Wi-Fi Alliance and Huawei (which leads the 802.11ax working group) mean OFDA, or OFDMA. OFDMA (multiple access) is a well-known technique (and is the reason LTE is excellent for what it is). Either way, OFDM, OFDA, and OFDMA refer to methods of frequency-division multiplexing — each channel is separated into dozens, or even hundreds, of smaller subchannels, each with a slightly different frequency. Because those subcarriers are mathematically orthogonal to one another, they can be packed closer together and still be cleanly demultiplexed at the receiver.
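To make the “orthogonal” part a little more concrete, here’s a minimal NumPy sketch — illustrative numbers only, not actual 802.11ax parameters — showing why subcarriers spaced at exactly one cycle per symbol don’t interfere with one another: correlate a subcarrier with itself and you recover its full energy, correlate it with a neighbor and the result sums to (nearly) zero, which is what lets a receiver cleanly separate tones that overlap in frequency.

```python
import numpy as np

# Toy OFDM illustration: subcarriers spaced 1/T apart are orthogonal over a
# symbol of duration T, so tones that overlap in frequency can still be
# separated cleanly at the receiver. Illustrative numbers, not real 802.11ax
# parameters.
fs = 20e6                  # sample rate in samples/second
N = 64                     # samples per OFDM symbol
t = np.arange(N) / fs      # time axis for one symbol
T = N / fs                 # symbol duration
spacing = 1 / T            # subcarrier spacing that guarantees orthogonality

def subcarrier(k):
    """Complex tone for subcarrier index k."""
    return np.exp(2j * np.pi * k * spacing * t)

s5, s6 = subcarrier(5), subcarrier(6)

# Correlating a subcarrier with itself recovers its full energy...
print(abs(np.vdot(s5, s5)) / N)   # ~1.0
# ...while correlating it against a neighboring subcarrier sums to ~zero.
print(abs(np.vdot(s5, s6)) / N)   # ~0.0
```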
According to Huawei, the use of OFDA increases spectral efficiency by 10 times, which essentially translates into 10 times the max theoretical bandwidth — though a 4x gain looks like the more realistic real-world outcome.
5GHz channels in North America
This lovely diagram shows you North America’s 5GHz channels, and where those 20/40/80/160MHz blocks fit in. As you can see, at 5GHz, you won’t ever get more than two 160MHz channels (and even then, only if you live in the boonies without interference from neighbors).

How fast is 802.11ax?

If we go for the more conservative 4x estimate, and assume a massive 160MHz channel, the maximum speed of a single 802.11ax stream will be around 3.5Gbps (compared with 866Mbps for a single 802.11ac stream). Multiply that out to a 4×4 MIMO network and you get a total capacity of 14Gbps. If you had a smartphone or laptop capable of two or three streams, you’d get some blazing connection speeds (7Gbps equates to around 900 megabytes per second; 10.5Gbps equates to around 1,300MB/sec).
In a more realistic setup with 80MHz channels, we’re probably looking at a single-stream speed of around 1.6Gbps, which is a quite reasonable 200MB/sec. Again, if your mobile device supports MIMO, you could be seeing 400 or 600MB/sec. And in an even more realistic setup with 40MHz channels (such as what you’d probably get in a crowded apartment block), a single 802.11ax stream would net you 800Mbps (100MB/sec), or a total network capacity of 3.2Gbps. (Read: How to boost your WiFi speed by choosing the right channel.)
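If you want to play with these numbers yourself, the arithmetic is simple enough to capture in a few lines of Python. The per-stream rates below are the rough estimates used in this article (roughly a 4x gain over 802.11ac), not official spec figures.

```python
# Back-of-the-envelope 802.11ax throughput math using the rough per-stream
# estimates above. These are ballpark figures, not official spec numbers.
PER_STREAM_GBPS = {160: 3.5, 80: 1.6, 40: 0.8}  # channel width (MHz) -> Gbps

def link_speed(channel_mhz, streams):
    """Aggregate link rate in Gbps for a given channel width and MIMO stream count."""
    return PER_STREAM_GBPS[channel_mhz] * streams

def gbps_to_mb_per_sec(gbps):
    """Convert gigabits per second to (decimal) megabytes per second."""
    return gbps * 1000 / 8

for width in (160, 80, 40):
    for streams in (1, 2, 4):
        rate = link_speed(width, streams)
        print(f"{width}MHz x {streams} stream(s): "
              f"{rate:.1f}Gbps (~{gbps_to_mb_per_sec(rate):.0f}MB/s)")
```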

802.11ax range, reliability, and other factors

So far, neither the Wi-Fi Alliance nor Huawei has said much about 802.11ax’s other important features. Huawei says that “intelligent spectrum allocation” and “interference coordination” will be employed — but most modern WiFi hardware already does that.
It’s fairly safe to assume that working range will stay the same or increase slightly. Reliability should improve a little with the inclusion of OFDA, and with the aforementioned spectrum allocation and interference coordination features. Congestion should also be reduced, both as a result of those features and because data will be transferred between devices faster, freeing up the airwaves for other connections.
Otherwise, 802.11ax will work in roughly the same fashion as 802.11ac, just with massively increased throughput. As we covered in our Linksys WRT1900AC review, 802.11ac is already pretty great. 802.11ax will just take things to the next level.

Do we need these kinds of speeds?

The problem, as with all things WiFi, isn’t necessarily the speed of the network itself — it’s congestion, and more than that even, it’s what the devices themselves are capable of. For example, even 802.11ax’s slowest speed of 100MB/sec is pushing it for a hard drive — and it’s faster than what the eMMC NAND flash storage in most smartphones can handle as well. Best-case scenario, a modern smartphone’s storage tops out at around 90MB/sec sequential read, 20MB/sec sequential write — worst case, with lots of little files, you’re looking at speeds in the single-megabyte range. Obviously, for the wider 80MHz and 160MHz channels, you’re going to need some desktop SSDs (or an array of desktop SSDs) to take advantage of 802.11ax’s max speeds.
Of course, not every use-case requires you to read or write data to a slow storage medium. But even so, alternate uses like streaming 4K video still fall far short of these multi-gigabit speeds. Even if Netflix begins streaming 8K in the next few years (and you thought there wasn’t enough to watch in 4K!), 802.11ax will have more than enough bandwidth. And the bottleneck isn’t your WiFi; it’s your internet connection. The current time frame for 802.11ax certification is 2018 — until then, upgrading to 802.11ac (if you haven’t already) should be a nice stopgap.

Tuesday 21 April 2015

AMD details new power efficiency improvements, update on ‘25×20’ project

 AMD Carrizo
Energy efficiency is (if you’ll pardon the pun) a hot topic. Foundries and semiconductor manufacturers now trumpet their power-saving initiatives with the same fervor they once reserved for clock speed and performance improvements. AMD is no exception to this trend, and the company has just published a new white paper that details the work it’s doing as part of its ‘25×20’ project, which aims to increase performance per watt by 25x within five years.

If you’ve followed our discussions on microprocessor trends and general power innovation, much of what the paper lays out will be familiar. The paper steps through hUMA (Heterogeneous Unified Memory Access) and the overall advantages of HSA, as well as the slowing rate of power improvements delivered strictly by foundry process shrinks. The most interesting area for our purposes is the additional information AMD is offering around Adaptive Voltage and Frequency Scaling, or AVFS. Most of these improvements are specific to Carrizo — the Carrizo-L platform doesn’t implement them.

AVFS vs DVFS

There are two primary methods of conserving power in a microprocessor — the aforementioned AVFS, and Dynamic Voltage and Frequency Scaling, or DVFS. Both AMD and Intel have made use of DVFS for over a decade. DVFS uses what’s called open-loop scaling. In this type of system, the CPU vendor determines the optimal voltage for the chip based on the target application and frequency. DVFS is not calibrated to any specific chips — instead, Intel, AMD, and other vendors create a statistical model that predicts what voltage level a chip that’s already verified as good will need to operate at a given frequency.
DVFS is always designed to incorporate a significant amount of voltage margin. A CPU’s operating temperature will affect its voltage requirements, and since AMD and Intel don’t know whether any given SoC will be operating at 40C or 80C, they tweak the DVFS model to ensure the chip won’t destabilize. In practice, this means margins of 10-20% at any given point.
AMD-ISSCC6
According to AMD’s Sam Naffziger, this overhead corresponds to a great deal of waste, because modern CPU power consumption tends to scale with the square of the voltage increase (and closer to the cube once leakage is considered). A 10% increase in voltage therefore translates into roughly a 20% increase in power consumption.
AVFS, in contrast, uses a closed-loop system in which on-die hardware mechanisms manage the voltage — by taking real-time measurements of the junction temperature and current frequency, and adjusting the voltage to match them. This method eliminates the power waste discussed above by eliminating the traditional guard bands that are required to ensure proper operation of every piece of silicon.
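Here’s a heavily simplified sketch of the difference — a conceptual model of our own, not AMD’s actual algorithm, with made-up voltage and temperature numbers. The open-loop DVFS table assumes worst-case temperature plus a fat guard band; the closed-loop AVFS controller reads the measured temperature and adds only a sliver of margin, and because power scales roughly with the square of voltage, the shaved margin turns directly into energy savings.

```python
# Simplified contrast between open-loop DVFS and closed-loop AVFS.
# Conceptual model only -- the voltage numbers and temperature sensitivity
# are invented to show the shape of the tradeoff, not AMD's real algorithm.

def required_voltage(freq_ghz, temp_c):
    """Hypothetical minimum stable voltage for this particular die."""
    return 0.80 + 0.10 * freq_ghz + 0.0005 * temp_c

def dvfs_voltage(freq_ghz):
    """Open loop: assume worst-case temperature plus a ~10% guard band."""
    return required_voltage(freq_ghz, temp_c=95) * 1.10

def avfs_voltage(freq_ghz, measured_temp_c, margin=0.01):
    """Closed loop: on-die sensors report temperature, so only a tiny margin."""
    return required_voltage(freq_ghz, measured_temp_c) + margin

def relative_power(voltage, baseline):
    """Dynamic power scales roughly with V^2 at a fixed frequency."""
    return (voltage / baseline) ** 2

freq, temp = 3.0, 55          # chip actually running at 3GHz and 55C
v_open = dvfs_voltage(freq)
v_closed = avfs_voltage(freq, temp)
print(f"DVFS voltage: {v_open:.3f}V, AVFS voltage: {v_closed:.3f}V")
print(f"AVFS power vs DVFS: {relative_power(v_closed, v_open):.0%}")
```

With these invented numbers the closed-loop chip lands at roughly 80% of the open-loop power — about the same order of saving as the 20% figure AMD cites.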
There’s another advantage of AVFS that Naffziger doesn’t mention, though it’s not clear how relevant this is to AMD’s own interests: It can reduce the impact of process variation.
One of the differences between semiconductor manufacturing and more mundane products is that a semiconductor manufacturer doesn’t really “know” what kind of chips it built until it tests them. Every wafer of chips will have its own unique characteristics — some chips will use less voltage than others, some will hit higher clock speeds, and some won’t work at all. A well-known SoC design built on a mature node will suffer much less variation than a cutting-edge chip on brand-new process technology. But there’s always some degree of variance.
This variance is sometimes referred to as the process strength.
ProcessVariation1
This graph shows three types of devices. The “Weak” device runs at 800MHz and draws the least power, but has the lowest available clock headroom. This chip won’t overclock well at all. A “Nominal” device will draw slightly more power, but has more frequency headroom, should the manufacturer choose to use it. A “Strong” device has the highest potential frequency headroom and requires less voltage to hit its maximum clock than a “Nominal” device — but its overall power consumption is higher at the same frequency and voltage because it leaks more power. Under DVFS, the manufacturer would fix each of these CPUs at 1V — the amount of voltage the weakest chip needs in order to ensure smooth operation.
ProcessVariation2
AVFS, on the other hand, offers more options. By measuring and adjusting the voltage in real time, AVFS can determine that only the “Weak” chip needs a full 1V to operate. The “Nominal” chip can run at 0.95V, while the Strong chip can actually run at 0.9V. AVFS thus compensates for variation in process technology to ensure a more uniform product experience, and may improve yields as well.
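The same weak/nominal/strong example can be run as a toy calculation. The per-die voltages below are the figures from the chart above; the square-of-voltage relationship is the usual first-order approximation for dynamic power and ignores leakage.

```python
# Toy illustration of the weak/nominal/strong example: DVFS pins every chip
# at the voltage the weakest part needs, while AVFS lets each die settle at
# its own minimum. Dynamic power is approximated as proportional to V^2.
chips = {"weak": 1.00, "nominal": 0.95, "strong": 0.90}   # per-die Vmin at 800MHz
dvfs_voltage = max(chips.values())                        # worst case for all: 1.00V

for name, vmin in chips.items():
    power_saving = 1 - (vmin / dvfs_voltage) ** 2
    print(f"{name:8s}: AVFS runs at {vmin:.2f}V "
          f"(~{power_saving:.0%} less dynamic power than a fixed {dvfs_voltage:.2f}V)")
```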

Will AVFS give Carrizo a boost?

After years of promises from AMD with precious little to show for the company’s CPU efforts (unless you count the collapse of its core business as a victory), enthusiasts are understandably skeptical about what Carrizo will offer. The good news on this front is that AVFS isn’t just an idea AMD pulled out of its hat. It’s a well-understood tradeoff that adds design complexity to the chip, but it can help compensate for foundry variation, and it’s generally believed to reduce power consumption by at least the 20% figure that AMD is claiming. Some publications claim benefits of up to 45% depending on the workload (bear in mind that these are very different chips and target markets).
Reducing power consumption by 20% should allow for better battery life. But it’s not clear yet how this power tuning will impact performance. In theory, AMD should be able to hit higher boost frequencies, but this white paper notes AMD “has designed power management algorithms to optimize for typical use rather than peak computation periods that only occur (briefly) during the most demanding workloads. The result is a number of race-to-idle techniques to put a computer into sleep mode as frequently as possible to reduce average energy use.”
AMD is also baking in new support for the S0i3 idle state with Carrizo. S0i3 idles the chip in a deeper sleep than previous modes, and this should improve overall laptop battery life when the system isn’t in use.
AMD-Si03
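To see why racing to idle can pay off, here’s a toy energy comparison with entirely invented numbers — not AMD’s model, just an illustration of the tradeoff. Finishing a fixed chunk of work quickly at a higher voltage and frequency costs more power while busy, but if a deep idle state (something like S0i3) can then gate off nearly the whole platform, the total energy over the window can still come out lower than stretching the work across the window at a slower operating point.

```python
# Toy race-to-idle energy comparison. All numbers are invented for
# illustration; the point is the shape of the tradeoff, not AMD's figures.
# "race": finish the work quickly at a high frequency/voltage, then drop the
#         whole platform into a deep idle state.
# "slow": stretch the work across the whole window at a lower operating point.

def energy(core_power_w, busy_s, window_s, platform_w=2.0, deep_idle_w=0.1):
    """Energy (joules) over the window: busy phase plus idle remainder."""
    busy = (core_power_w + platform_w) * busy_s
    idle = deep_idle_w * (window_s - busy_s)
    return busy + idle

WORK_CYCLES = 2e9          # fixed amount of work to finish in a 2-second window
WINDOW_S = 2.0

# Crude dynamic-power model: core power ~ V^2 * f (scaled to watts).
race_busy = WORK_CYCLES / 2e9             # 1.0 s busy at 2GHz, ~1.0V
race_core_w = 1.0**2 * 2.0                # 2.0 W
slow_busy = WORK_CYCLES / 1e9             # 2.0 s busy at 1GHz, ~0.8V
slow_core_w = 0.8**2 * 1.0                # 0.64 W

print(f"race-to-idle:    {energy(race_core_w, race_busy, WINDOW_S):.2f} J")
print(f"slow-and-steady: {energy(slow_core_w, slow_busy, WINDOW_S):.2f} J")
# Racing wins here because the always-on platform power dominates once the
# deep idle state can gate almost everything off.
```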
It’s an open question, at this point, whether these techniques and strategies are flexible enough to enable higher performance during those high-use periods while still cutting overall power consumption. While a few Carrizo benchmarks have leaked to date, they aren’t very useful, at least without knowing more about the power and frequency bands the leaked chips are targeting — especially since leaked data is based on engineering samples, and may not be representative of final performance.
Here’s what I expect overall: Carrizo includes a number of power management techniques that are generally touted as reducing power consumption. It integrates more components on-die (another power reduction measure) and it offers support for idle power modes that previous AMD chips couldn’t use. When you pack all these improvements together, it’s reasonable to assume Carrizo will offer significant improvements in battery life. Exactly how much will depend on the OEMs themselves — we’ve seen plenty of evidence around Core M to illustrate that the decisions OEMs make around cooling and components have drastic impacts on the devices themselves.
Performance information suggests that Carrizo’s top-end chips may be slightly faster than Kaveri, but AMD’s own guidance suggests that the chip will be strongest in the low power bands below 35W. If Carrizo mimics Kaveri in this regard, it will see significant improvements in the 15-20W range, but may only tie its predecessor’s performance at the 35W band.
After talking to Sam Naffziger, we don’t expect these improvements and capabilities to be a one-off. AMD has yet to reveal any specifics of its plans for Zen in this regard, but there’s every reason to think future chips will leverage capabilities like AVFS as well.

Tuesday 14 April 2015

US government blocks Intel, Chinese collaboration over nuclear fears, national security

Tianhe-2

A major planned upgrade to the most powerful supercomputer in the world, the Chinese-built Tianhe-2, has been effectively canceled by the US government. The original plan was to boost its capability, currently at ~57 petaflops (PFLOPS), up to 110 PFLOPS using faster Xeon processors and Xeon Phi add-in boards. The Department of Commerce has now scuttled that plan. Under US law, the DoC is required to regulate sales of certain products if it has information that indicates the hardware will be used in a manner “contrary to the national security or foreign policy interests of the United States.”

Classification
According to the report, the license agreement granted to China’s National University of Defense Technology was provisional and subject to case-by-case agreement. Intel, in other words, never had free rein to sell hardware in China. Its ability to do so was dependent on what the equipment was used for. The phrase “nuclear explosive activities” is defined as including: “Research on or development, design, manufacture, construction, testing, or maintenance of any nuclear explosive device, or components or subsystems of such a device.”
In order to impose such a classification, the government is required to certify that it has “more than positive knowledge” that the new equipment would be used in such activities. But the exact details remain secret for obvious reasons. For now, the Tianhe-2 will remain stuck at its existing technology level. Intel, meanwhile, will still have a use for those Xeon Phis — the company just signed an agreement with the US government to deliver the hardware for two supercomputers in 2016 and 2018.

Implications and politics

There are several ways to read this new classification. On the one hand, it’s unlikely to harm Intel’s finances — Intel had sold hardware to the Tianhe-2 project at a steep discount according to many sources, and such wins are often valued for their prestige and PR rather than their profitability. This new restriction won’t let another company step in to fill the gap, either — and even if such a substitution were allowed, it’s incredibly unlikely that IBM, Nvidia, or AMD could offer an alternative solution.
It’s also possible that this classification is a way of subtly raising pressure on the Chinese with regard to political matters in the South China Sea. China has been pumping sand onto coral atolls in the area in an attempt to bolster its territorial claim to the Spratly Islands. The Spratly Islands are claimed by a number of countries, including China, which has argued that its borders and territorial sovereignty should extend across the area. Other nations, including the Philippines, Brunei, Vietnam, and the US, have taken a dim view of this. Refusing to sell China the parts to upgrade its supercomputer could be a not-so-subtle message about the potential consequences of Chinese aggression in the area.
China’s Loongson processor. The latest version is an eight-core chip built on 28nm.
Restricting China’s ability to buy high-end x86 hardware could lead the country to invest more heavily in building its own CPU architectures and in partnering with alternative suppliers. But this was almost certainly going to happen no matter what. China is ambitious and eager to create its own homegrown OS and CPU alternatives. The Loongson CPU has continued to evolve over the last few years, and is reportedly capable of executing x86 code at up to 70% of the performance of native binaries thanks to hardware-assisted emulation. Tests on the older Loongson 2F core showed that it lagged ARM and Intel in power efficiency, but the current 3B chip is several generations more advanced. These events might spur China to invest even more heavily in that effort, even though the chip was under development long before these issues arose.

Thursday 9 April 2015

Clear smart glass that generates its own electricity

Transparent glass triboelectric effect
It may seem like everything is becoming “smart” these days, but sometimes a new breakthrough makes you stand up and take notice. The latest one involves smart glass, of which we’ve seen various types in the past. Smart glass means more than just scratch- or shatter-resistance, like the Corning Gorilla Glass on your phone. It means the glass has some kind of special properties, like shifting tint in the sun, or preventing heat from passing through it.

That in and of itself is nothing new, either. But now a team of researchers has developed a new kind of smart glass containing materials that enable the triboelectric effect, which captures the energy inherent in the static electricity generated when two different materials collide. In other words, the glass can not only change color, but generate electricity as well, as Phys.org reports.
Back in January, we reported on a flexible nanogenerator that also utilized the triboelectric effect to generate power for mobile devices by moving your body, using your skin as a source of static electricity. And we’ve seen nano-sized triboelectrics before, although they weren’t transparent.
Electricity
The idea behind the smart glass is certainly different, but still pretty simple. It works like this: Glass is often subject to the elements, like rain and heavy winds. When those things collide with glass, you’ve got the two dissimilar materials needed for the triboelectric effect. So the team developed a dual-layer glass to harness it.
The first layer contains nanogenerators that capture the positive charge carried by water droplets, which they pick up by rubbing against the air on their way down from the clouds, the report said. The second layer holds two charged plastic sheets with tiny springs between them; as wind pressure on the glass increases, the plastic sheets are pushed closer together, creating an electric current.
The resulting glass is completely clear at first, but develops a blue tint as it generates up to 130mW of electricity per square meter of glass. That won’t power your refrigerator, but it’ll charge your phone. And it sounds as if the blue tint is pretty light — still perfectly see-through, rather than murky and merely translucent.
The next step: figuring out how to store the generated energy using embedded, also-transparent super-capacitors. The researchers also said they’re looking into how to integrate the glass with wireless networking, since there’s no separate power source needed. This is all in contrast to something like a fully transparent solar cell, which could turn every window in your home into a power source via solar energy. If and when just one of these transparent products can make it to market, it could have a profound effect on the market for renewables — and the way we power everyday devices in our homes.

Wednesday 1 April 2015

DirectX 12, LiquidVR may breathe fresh life into AMD GPUs, thanks to asynchronous compute

amd-radeon-graphics-card

With DirectX 12 coming soon alongside Windows 10, VR technology ramping up from multiple vendors, and the Vulkan API having already debuted, it’s an exceedingly interesting time to be in PC gaming. AMD’s GCN architecture is three years old at this point, but certain features baked into the chips at launch (and expanded with Hawaii in 2013) are only now coming into their own, thanks to the improvements ushered in by next-generation APIs.

One of the critical technologies underpinning this argument is the set of Asynchronous Command Engines (ACEs) that are part of every GCN-class video card. The original HD 7900 family had two ACEs per GPU, while AMD’s Hawaii-class hardware bumped that number to eight.
ASynch-Engine
AMD’s Hawaii, Kaveri, and at least the PS4 have eight ACEs. The Xbox One presumably does as well.
AMD’s Graphics Core Next (GCN) GPUs are capable of asynchronous execution to some degree, as are Nvidia GPUs based on the GTX 900 “Maxwell” family. Previous Nvidia cards like Kepler and even the GTX Titan were not.

What’s an Asynchronous Command Engine?

The ACE units inside AMD’s GCN architecture are designed for flexibility. The chart below explains the difference — instead of being forced to execute a single queue in a predetermined order, even when it makes no sense to do so, tasks from different queues can be scheduled and completed independently. This gives the GPU some limited ability to execute tasks out of order — if the GPU knows that a time-sensitive operation needing only 10ns of compute time is sitting in the queue alongside a long memory copy that isn’t particularly time-sensitive but will take 100,000ns, it can pull the short task forward, complete it, and then run the longer operation.
Asynchronous vs. Synchronous execution
Asynchronous vs. synchronous threading
The point of using ACEs is that they allow the GPU to process and execute multiple command streams in parallel. In DirectX 11, this capability wasn’t really accessible — the API was heavily abstracted, and multiple developers have told us that multi-threading support in DX11 was essentially broken from day one. As a result, there’s been no real way to tell the graphics card to handle graphics and compute in the same workload.
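As a thought experiment — this is a toy scheduler in Python, not real GPU, driver, or Direct3D 12 code — here’s how running tasks from independent queues changes the latency of the short, time-sensitive operation from the example above.

```python
# Toy scheduler contrasting a single serial queue with independent queues,
# along the lines of the 10ns-task-behind-a-long-copy example above.
# Pure illustration -- not real GPU, driver, or Direct3D 12 code.

tasks = [
    ("long_memory_copy", 100_000),   # duration in ns, not latency-sensitive
    ("short_compute_op",      10),   # latency-sensitive
]

def serial(tasks):
    """One queue, fixed order: everything waits behind whatever was submitted first."""
    clock, finish = 0, {}
    for name, duration in tasks:
        clock += duration
        finish[name] = clock
    return finish

def independent_queues(tasks):
    """Each task lives in its own queue and can start immediately."""
    return {name: duration for name, duration in tasks}

print("serial:     ", serial(tasks))
print("independent:", independent_queues(tasks))
# serial:      {'long_memory_copy': 100000, 'short_compute_op': 100010}
# independent: {'long_memory_copy': 100000, 'short_compute_op': 10}
```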
GPU pipelines in DX11 vs. DX12
AMD’s original GCN hardware may have debuted with just two ACEs, but AMD claims that it added six ACE units to Hawaii as part of a forward-looking plan, knowing that the hardware would one day be useful. That’s precisely the sort of thing you’d expect a company to say, but there’s some objective evidence that Team Red is being honest. Back when GCN and Nvidia’s Kepler were going head to head, it quickly became apparent that while the two companies were neck and neck in gaming, AMD’s GCN was far more powerful than Nvidia’s GK104 and GK110 in many GPGPU workloads. The comparison was particularly lopsided in cryptocurrency mining, where AMD cards were able to shred Nvidia hardware thanks to a more powerful compute engine and support for some SHA-1 functions in hardware.
When AMD built Kaveri and the SoCs for the PS4 and Xbox One, it included eight ACEs in those chips as well. The thinking behind that move was that adding more asynchronous compute capability would allow programmers to use the GPU’s computational horsepower more effectively. Physics and certain other types of in-game calculations, including some of the work that’s done in virtual reality simulation, can be handled in the background.
ASynch-ShaderPerf
Asynchronous shader performance in a simulated demo.
AMD’s argument is that with DX12 (and Mantle / Vulkan), developers can finally use these engines to their full potential. In the pipeline diagram above, the top row is the DX11 method of doing things, in which work is mostly handled serially. The bottom row is the DX12 methodology.
Whether programmers will take advantage of these specific AMD capabilities is an open question, but the fact that both the PS4 and Xbox One have a full set of ACEs to work with suggests that they may. If developers are already writing code to execute on GCN hardware, moving that support over to DX12 and Windows 10 is no big deal.
Games-Asynch
A few PS4 titles and just one PC game use asynchronous shaders now, but that could change.
Right now, AMD has only released information on the PS4’s use of asynchronous shaders, but that doesn’t mean the Xbox One can’t use them. It’s possible that the DX12 API push Microsoft is planning for that console will add the capability.
VR-Async
AMD is also pushing ACEs as a major feature of its LiquidVR platform — a fundamental capability that it claims will give Radeon cards an edge over their Nvidia counterparts. We’ll need to see final hardware and drivers before drawing any such conclusions, of course, but the compute capabilities of the company’s cards are well established. It’s worth noting that while AMD did have an advantage in this area over Kepler, which had only one compute and one graphics pipeline, Maxwell has one graphics pipeline and 32 compute pipes, compared with AMD’s eight ACEs. Whether this impacts performance in shipping titles is something we’ll only be able to answer once DX12 games that specifically use these features are on the market.
The question, from the end-user perspective, obviously boils down to which company is going to offer better performance (or a better price/performance ratio) under the next-generation DX12 API. It’s far too early to make a determination on that front — recent DirectX 12 tests in 3DMark put AMD’s R9 290X out in front of Nvidia’s GTX 980, while Star Swarm results from earlier this year reversed that result.
What is clear is that DX12 and Vulkan are reinventing 3D APIs and, by extension, game development in ways we haven’t seen in years. The new capabilities of these frameworks are set to improve everything from multi-GPU configurations to VR displays. Toss in features like 4K monitors and FreeSync / G-Sync support, and it’s an exciting time for the PC gaming industry.