
Hardware Accelerators for Neural Networks

Summary

With the rise of artificial intelligence and its insatiable demand for low-energy, high-performance computing, research is focusing on emerging technologies that perform new functions, such as large-scale vector-matrix multiplication or synaptic weighting and neuronal response. These application-specific solutions, referred to as hardware accelerators, require evaluation of the performance of their new functionality. We are enabling this research by designing and building prototypes of, and developing measurements on, small- to medium-scale hardware accelerators consisting of a variety of nanodevices, such as magnetic tunnel junctions and memristive crossbars.

Description

Microscope image of a crossbar array
Fig. 1. (left) Microscope image of a crossbar array of 20,000 magnetic tunnel junctions integrated with CMOS. Surrounding enclosure is a probe card containing hundreds of needles used to probe the chip. (right) Circuit schematic of a 2-by-2 portion of the crossbar array. Magnetic tunnel junctions are measured by turning on the transistors and applying a voltage to either the rows or columns.
Credit: NIST

One promising candidate for building a hardware accelerator comes from the field of spintronics, where information is carried by electronic spin rather than charge. Magnetic tunnel junctions are particularly well suited because of their multifunctionality and compatibility with standard integrated circuits. The most straightforward way to use magnetic tunnel junctions in an AI hardware accelerator is as controllable binary “weights” connecting the neurons of a neural network. Neural networks have become a workhorse in many aspects of artificial intelligence, from image and voice recognition to search algorithms and even self-driving cars. Implementing them as software in typical computing environments is not as energy efficient as it could be. The use of crossbar arrays of programmable devices, such as magnetic tunnel junctions, is one hardware-accelerator approach that aims to improve this efficiency. Working with Western Digital, we have integrated 20,000 magnetic tunnel junctions in a crossbar array with standard CMOS to produce a prototype hardware accelerator (see Fig. 1). We use this prototype crossbar array to perform neural network inference of handwritten digits, creating a versatile, accessible platform for developing characterization and performance metrics.
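The crossbar's role in inference can be pictured with a toy calculation. In this illustrative sketch (the conductance values, array size, and input voltages are assumptions for demonstration, not measurements from the NIST prototype), each junction is programmed to one of two conductance states, row voltages encode the input vector, and the current summed on each column by Kirchhoff's law is the corresponding dot product:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed conductances (siemens) for the two binary MTJ states
G_LOW, G_HIGH = 1e-6, 2e-6
rows, cols = 8, 4

# Binary weight pattern programmed into the array (0 -> low, 1 -> high conductance)
bits = rng.integers(0, 2, size=(rows, cols))
G = np.where(bits == 1, G_HIGH, G_LOW)

# Input vector encoded as voltages applied to the rows
v = rng.uniform(0.0, 0.2, size=rows)

# Ohm's law per junction plus current summation per column
# gives the vector-matrix product in a single read operation.
i_cols = v @ G

# Same result computed junction by junction, confirming the analogy
i_check = np.array([sum(v[r] * G[r, c] for r in range(rows)) for c in range(cols)])
assert np.allclose(i_cols, i_check)
```

Because all column currents are read out simultaneously, the entire vector-matrix product is computed in one step, which is the source of the hoped-for energy advantage over a conventional software implementation.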

Performance of magnetic tunnel junction crossbar
Fig. 2. (left) Performance of a magnetic tunnel junction crossbar with broken devices. We tested thirty-six different crossbar arrays and plotted the error of the neural network, where a lower error means higher performance. (right) Performance improvement after implementing a neural network training method that accounts for underlying device properties.
Credit: NIST

By using magnetic tunnel junctions – physical devices – to represent the weights of a neural network, we learned that broken devices can severely impact performance. On a crossbar array of twenty thousand magnetic tunnel junctions, even as few as one hundred broken devices (less than one percent) can significantly reduce the performance of the neural network. Each chip has its own unique configuration of broken devices. We found that if these configurations are taken into account during neural network training, performance can be recovered toward ideal levels (Fig. 2). For real-world applications, training millions of unique configurations for millions of chips is impractical. We found that we can instead use the underlying statistics of broken devices on a chip to produce solutions that perform well for any configuration. This finding provides a path to ensuring that a network works well on these crossbars, even if it is impossible to know exactly when or where devices will break.
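One way to picture configuration-aware training is the sketch below: weights at known broken positions are pinned to their stuck value and excluded from gradient updates, so the remaining weights learn to compensate. The linear model, stuck-at-zero fault model, and 5 % fault rate here are illustrative assumptions, not the actual NIST training method or measured device statistics.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_out = 20, 3
X = rng.normal(size=(200, n_in))
true_W = rng.normal(size=(n_in, n_out))
Y = X @ true_W                           # synthetic regression targets

# Randomly mark ~5 % of weight cells as broken, stuck at zero
stuck = rng.random((n_in, n_out)) < 0.05
stuck_value = 0.0

W = np.zeros((n_in, n_out))
W[stuck] = stuck_value

lr = 0.01
for _ in range(500):
    grad = X.T @ (X @ W - Y) / len(X)    # least-squares gradient
    grad[stuck] = 0.0                    # fault-aware step: never update stuck cells
    W -= lr * grad
    W[stuck] = stuck_value               # keep broken devices pinned

mse = np.mean((X @ W - Y) ** 2)          # much lower than the untrained error
```

The same masking idea extends to training on fault statistics rather than one known fault map: sampling a fresh random mask each step yields weights that tolerate any configuration drawn from the same distribution.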

Another emerging technology we are investigating is the memristive device junction (memristor). The ability to set the resistance of these devices to a continuum of values as a function of programming current makes them prime candidates for representing weights in a neural network. Many information-dense neural networks require precise weights within a certain range, and unlike magnetic tunnel junctions, memristors show potential to deliver these continuous characteristics. At the same time, memristor technology suffers from potentially high device-to-device variation, low yield, and limited read-write endurance. Matching the emerging technology to the proper application is an important consideration in this case.

Ensemble Averaging
Fig. 3. Results that demonstrate the improvement in neural network performance from ensemble averaging. (left) As the number of unusable devices increases (stuck percentage), a network with low redundancy quickly drops in performance (LEA-Simple). (right) By increasing redundancy, the network maintains performance for a wider range of broken devices.
Credit: NIST

In collaboration with George Washington University, we demonstrated that memristors can be integrated with CMOS in the same way as magnetic tunnel junctions. By implementing a method known as ensemble averaging, we demonstrated that even with low yield and high variation in device properties, one can operate a neural network on a crossbar array of memristors (see Fig. 3). Ensemble averaging improves performance through redundancy: multiple copies of the same network (using different sets of devices), averaged at the right time during operation, can recover performance by more than 50 %.
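The intuition behind ensemble averaging can be shown with a small sketch: the same weight matrix is programmed onto several sets of devices, each copy acquires its own random stuck cells, and the copies' outputs are averaged at read time so independent errors partially cancel. The matrix sizes, stuck-at-zero fault model, and 10 % fault rate are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

n_in, n_out, copies = 64, 10, 5
W_ideal = rng.normal(size=(n_in, n_out))
x = rng.normal(size=n_in)

def program(W, stuck_frac):
    """Return a copy of W with a random fraction of cells stuck at zero,
    mimicking broken devices on one physical crossbar."""
    Wc = W.copy()
    Wc[rng.random(W.shape) < stuck_frac] = 0.0
    return Wc

# Redundant copies of the network, each with an independent fault pattern
ensemble = [program(W_ideal, stuck_frac=0.10) for _ in range(copies)]

y_ideal = x @ W_ideal
y_single = x @ ensemble[0]                            # one faulty copy alone
y_avg = np.mean([x @ Wc for Wc in ensemble], axis=0)  # ensemble average

err_single = np.linalg.norm(y_single - y_ideal)
err_avg = np.linalg.norm(y_avg - y_ideal)
```

Because each copy's fault pattern is independent, the averaged output's error shrinks roughly as one over the square root of the number of copies, at the cost of extra devices, which is the redundancy-versus-performance trade-off shown in Fig. 3.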

Created March 18, 2025, Updated March 28, 2025