With the rise of artificial intelligence and its insatiable demand for low-energy, high-performance computing, research is focusing on emerging technologies that perform new functions such as large-scale vector-matrix multiplication or synaptic weighting and neuronal response. These application-specific solutions, referred to as hardware accelerators, require evaluation of the performance of their new functionality. We are enabling this research by designing and building prototypes of, and developing measurements on, small- to medium-scale hardware accelerators built from a variety of nanodevices, such as magnetic tunnel junctions and memristive crossbars.
One promising candidate for building a hardware accelerator comes from the field of spintronics, where information is carried by electron spin rather than charge. Magnetic tunnel junctions are particularly well suited because of their multifunctionality and compatibility with standard integrated circuits. The most straightforward way to use magnetic tunnel junctions in an AI hardware accelerator is as controllable binary “weights” connecting the neurons of a neural network. Neural networks have become a workhorse in many aspects of artificial intelligence, from image and voice recognition to search algorithms and even self-driving cars. Implementing them in software on conventional computing hardware is not as energy efficient as it could be. Crossbar arrays of programmable devices, such as magnetic tunnel junctions, are one hardware accelerator approach that aims to improve this efficiency. Working with Western Digital, we have integrated 20,000 magnetic tunnel junctions with standard CMOS in a crossbar array to produce a prototype hardware accelerator (see Fig. 1). We use this prototype crossbar array to perform neural network inference on handwritten digits, creating a versatile, accessible platform for developing characterization and performance metrics.
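As a rough illustration of the operation such a crossbar accelerates, the short Python sketch below maps binary weights onto high and low device conductances and reads out a layer's response as column currents, which is exactly a vector-matrix multiplication. The conductance values and layer sizes are placeholders chosen for illustration, not parameters of the prototype.

    import numpy as np

    # Hypothetical conductances (siemens) for the two binary MTJ states.
    G_LOW, G_HIGH = 5e-6, 15e-6

    def program_crossbar(binary_weights):
        """Map binary weights (0/1) onto device conductances in a crossbar."""
        return np.where(binary_weights == 1, G_HIGH, G_LOW)

    def crossbar_vmm(conductances, input_voltages):
        """Each column current is the sum of V_i * G_ij (Ohm's and Kirchhoff's laws),
        i.e., one row of a vector-matrix multiplication done in the analog domain."""
        return input_voltages @ conductances

    # Example: a 784-input, 10-output layer for handwritten-digit inference.
    rng = np.random.default_rng(0)
    weights = rng.integers(0, 2, size=(784, 10))   # placeholder trained binary weights
    G = program_crossbar(weights)
    x = rng.random(784)                            # pixel values encoded as voltages
    currents = crossbar_vmm(G, x)                  # one analog inference step
    print(currents.shape)                          # (10,) -- one current per output neuron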
By using magnetic tunnel junctions (physical devices) to represent the weights of a neural network, we learned that broken devices can severely impact performance. On a crossbar array of twenty thousand magnetic tunnel junctions, even as few as one hundred broken devices (less than one percent) can significantly reduce the performance of the neural network. Each chip has its own unique configuration of broken devices. We found that accounting for a chip's specific configuration during neural network training recovers performance toward ideal levels (Fig. 2). For real-world applications, however, training a separate network for each of millions of unique chip configurations is impractical. Instead, we found that we can use the underlying statistics of broken devices on a chip to produce solutions that perform well for any configuration. This finding provides a path to ensuring that a network works well on these crossbars, even when it is impossible to know exactly when or where devices will break.
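A minimal sketch of this statistical approach is shown below, assuming a simple PyTorch model and treating a broken device as a weight stuck at zero; the fault rate, fault model, and network are illustrative and do not reproduce the prototype's actual training setup. Sampling a fresh random fault configuration at every training step exposes the network to the statistics of broken devices rather than to any single chip's layout.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FaultyLinear(nn.Module):
        """Linear layer in which a random fraction of 'devices' is broken each forward pass."""
        def __init__(self, in_features, out_features, fault_rate=0.01):
            super().__init__()
            self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
            self.bias = nn.Parameter(torch.zeros(out_features))
            self.fault_rate = fault_rate

        def forward(self, x):
            if self.training:
                # Fresh configuration of broken devices (stuck at zero) every step,
                # so training sees fault statistics rather than one fixed chip.
                mask = (torch.rand_like(self.weight) > self.fault_rate).float()
            else:
                mask = torch.ones_like(self.weight)  # or a measured chip-specific mask
            return F.linear(x, self.weight * mask, self.bias)

    # One-layer digit classifier trained against the fault statistics.
    model = FaultyLinear(784, 10, fault_rate=0.005)   # ~0.5 % broken devices
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    images = torch.rand(64, 784)                      # placeholder batch of digit images
    labels = torch.randint(0, 10, (64,))
    loss = F.cross_entropy(model(images), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()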
Another emerging technology we are investigating is the memristive device junction (memristor). The ability to set the resistance of these devices to a continuum of values as a function of programming current makes them prime candidates for representing weights in a neural network. Many neural networks encode a rich density of information and therefore require precise weights within a certain range. Unlike magnetic tunnel junctions, memristors show potential to deliver these continuous characteristics. At the same time, memristor technology suffers from potentially high device-to-device variation, low yield, and a limited number of read-write cycles. Matching the emerging technology to the proper application is an important consideration in this case.
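To make the contrast concrete, the sketch below compares the two weight representations: a binary magnetic tunnel junction keeps only a thresholded value, while a memristor can hold a clipped weight anywhere within its programmable conductance range. The conductance bounds are made-up numbers used only for illustration.

    import numpy as np

    # Hypothetical programmable conductance range of a memristor (siemens).
    G_MIN, G_MAX = 1e-6, 50e-6

    def mtj_weight(w):
        """Binary MTJ: only a thresholded version of the weight survives."""
        return np.where(w > 0, 1.0, 0.0)

    def memristor_weight(w, w_max=1.0):
        """Memristor: continuous weight clipped to a range and scaled to a conductance."""
        w_clipped = np.clip(w, -w_max, w_max)
        return G_MIN + (w_clipped + w_max) / (2 * w_max) * (G_MAX - G_MIN)

    w = np.array([-1.3, -0.2, 0.05, 0.7])
    print(mtj_weight(w))        # [0. 0. 1. 1.] -- coarse binary representation
    print(memristor_weight(w))  # four distinct conductances between G_MIN and G_MAX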
In collaboration with George Washington University, we demonstrated that memristors can be integrated with CMOS in the same way as magnetic tunnel junctions. By implementing a method known as ensemble averaging, we showed that a neural network can operate on a crossbar array of memristors even with low device yield and high variation in device properties (see Fig. 3). Ensemble averaging improves performance through redundancy: multiple copies of the same network, mapped onto different sets of devices and averaged at the right point during operation, can recover performance by more than 50 %.
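The sketch below captures the basic idea with illustrative (not measured) yield and variation numbers: each copy of a layer is mapped onto a different set of imperfect devices, and averaging the copies' outputs suppresses the random device-level errors that any single copy suffers.

    import numpy as np

    rng = np.random.default_rng(1)

    def map_to_devices(w, device_yield=0.8, variation=0.3):
        """One hardware mapping of ideal weights: some devices are dead, the rest vary."""
        alive = rng.random(w.shape) < device_yield
        gain = rng.normal(1.0, variation, size=w.shape)
        return w * alive * gain

    def ensemble_layer(x, w, copies=5):
        """Average the outputs of several independently mapped copies of the same layer."""
        return np.mean([x @ map_to_devices(w) for _ in range(copies)], axis=0)

    w = rng.normal(0, 1, size=(784, 10))   # placeholder ideal trained weights
    x = rng.random(784)
    single = x @ map_to_devices(w)         # one imperfect copy
    averaged = ensemble_layer(x, w)        # ensemble of five copies
    ideal = x @ w
    # The averaged output tracks the ideal result more closely than a single copy.
    print(np.linalg.norm(single - ideal), np.linalg.norm(averaged - ideal))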