
New algorithms open possibilities for training AI models on analog chips

When analog chips are used for language models, their physical properties limit them to inference. But IBM Research scientists are working on several new algorithms that equip these energy-efficient processors to train models.

An abstract image showing the connection between a digital system and an analog in-memory computing device.


Deep neural network training requires running many processors for days at a time, and as AI systems continue to scale, finding cheaper and more efficient ways to perform this training is becoming increasingly important. But IBM Research scientist Tayfun Gokmen and his team are taking a creative approach to the issue, developing algorithms that will enable analog AI devices to accelerate the process of deep neural network training — and do it more energy efficiently than CPUs or GPUs can.

Until now, inference has been the chief focus of in-memory computing. But there are greater energy and compute savings to be found in training, Gokmen argues, because the computational cost of model training is monumentally higher. Unfortunately, when researchers use these in-memory computing devices for training, they don't always behave. The materials used in these devices, like the atomic filaments in resistive random-access memory or the chalcogenide glass in phase-change memory, struggle with noise and switching issues, so researchers must devise new algorithms to make these devices useful for accelerating deep neural network workloads.

One of the big problems they encounter is that many in-memory training algorithms demand levels of computational fidelity that can be unrealistic to achieve on analog devices. The team's approach makes great strides toward resolving that problem, with algorithms that work around that requirement. They've published their results in Nature Communications.1

Analog in-memory computing

Most traditional chip designs, like CPUs or GPUs, have discrete memory and processing units, and must shuttle data back and forth between the two. This hindrance to a chip’s latency is referred to as the von Neumann bottleneck. With analog in-memory chips, however, there’s no separation between compute and memory, making these processors exceptionally economical compared to traditional designs. There’s no shuttling of data — in the case of AI, the model weights — back and forth through the von Neumann bottleneck.
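As a rough illustration of why this matters, the sketch below models a crossbar read in plain NumPy: the weight matrix stays put, and the multiply-accumulate is treated as happening where the weights are stored, with a small read-noise term added. The array sizes, noise level, and function name are assumptions made for illustration, not IBM hardware or software interfaces; real crossbars also encode signed weights as pairs of non-negative conductances.

```python
import numpy as np

rng = np.random.default_rng(0)

def analog_matvec(conductances, voltages, read_noise=0.02):
    """Toy crossbar read: each cell contributes current = conductance * voltage
    (Ohm's law), and currents sum along each column (Kirchhoff's law), so the
    matrix-vector product is computed where the weights are stored. The read
    noise term is an illustrative assumption."""
    ideal = conductances.T @ voltages  # analog multiply-accumulate, no weight movement
    noise = read_noise * np.abs(ideal).mean() * rng.standard_normal(ideal.shape)
    return ideal + noise

# 256 inputs (rows) by 128 outputs (columns); signed values stand in for what
# real hardware would encode with pairs of non-negative conductances.
G = rng.uniform(-1.0, 1.0, size=(256, 128))
x = rng.standard_normal(256)  # input activations applied as voltages

y = analog_matvec(G, x)
print(y.shape)  # (128,) -- one layer's forward pass without shuttling weights
```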

In an analog device, model weights for a neural network are held not in transistors, but in devices that store them in physical form. These units contain specialized materials that change their conductance or resistance to encode intermediate values between 0 and 1. This quality means a single analog memory device can hold more values than a single transistor, and crossbars full of the devices make efficient use of space. But these analog units also have drawbacks: AI model training adjusts model weights billions or trillions of times — an easy task for digital transistors that can switch on and off over and over — but these physical memory devices can’t handle all that switching. Changing their physical state trillions of times will degrade their structure and reduce their computation fidelity.
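A toy model can make both points concrete: one analog cell holding many levels rather than a single bit, and fidelity that degrades as programming events pile up. Everything here, including the level count, the endurance budget, and the wear model, is a made-up illustration rather than a description of any particular IBM device.

```python
import numpy as np

class MultilevelCell:
    """Toy multi-level analog memory cell: it stores one of many conductance
    levels (64 here, an illustrative number) instead of a single bit, but its
    programming gets noisier as write events accumulate (a toy wear model)."""

    def __init__(self, levels=64, endurance=1_000_000, seed=3):
        self.levels = levels
        self.endurance = endurance            # illustrative write-endurance budget
        self.writes = 0
        self.level = levels // 2              # start mid-range
        self.rng = np.random.default_rng(seed)

    def program(self, delta_levels):
        """Nudge the stored level up or down by a number of levels."""
        self.writes += 1
        wear = self.writes / self.endurance   # fraction of the endurance budget used
        noise = self.rng.normal(0.0, 0.5 + 5.0 * wear)
        target = self.level + delta_levels + noise
        self.level = int(np.clip(np.rint(target), 0, self.levels - 1))

    def read(self):
        return self.level / (self.levels - 1)  # normalized weight in [0, 1]

cell = MultilevelCell()
cell.program(+5)
print(cell.read())   # a fractional value, not just 0 or 1
```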

For this reason, training is usually done on digital hardware and then the weights are ported over to an analog device, where they’re locked in for inference and won’t be tweaked further. “This is basically a one-time effort,” Gokmen says. “Then you’re using the same weights again and again.”

Training requires incremental adjustments, though, so the basic challenge is how to do these updates efficiently and reliably. The team's proposed solution: use electrical pulses to simultaneously compute each weight gradient and perform the model weight update. But when you do it this way, you're relying on the device to take the update correctly, and devices often fail to do so, either because of stochasticity or because of device-to-device variability. "One device may be updated by a certain amount, but when you go to another device, that amount may be different," he says.
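The sketch below is a loose, illustrative take on that kind of pulse-based update: rows and columns fire stochastic pulses, a cell changes only when its row and column pulses coincide, and the expected change is proportional to the outer product of activations and errors, while device-to-device and cycle-to-cycle variability make each realized update imperfect. The probabilities, step sizes, and noise levels are assumptions for illustration, not the scheme used in the team's hardware or paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 64, 32

W = np.zeros((n_in, n_out))   # weights living on the analog array

# Device-to-device variability: every cell gets its own step size, fixed at
# "fabrication time" (the 30% spread is an illustrative number).
step_size = 0.001 * (1.0 + 0.3 * rng.standard_normal((n_in, n_out)))

def pulsed_update(W, x, d, lr=0.1, n_pulses=31):
    """Toy pulse-based rank-1 update: rows fire with probability tied to the
    activations x, columns with probability tied to the errors d, and a cell
    updates only when its row and column pulses coincide. The expected change
    is proportional to the outer product of x and d, but each realized update
    is noisy and device-dependent."""
    p_row = np.clip(np.abs(x) * np.sqrt(lr), 0.0, 1.0)
    p_col = np.clip(np.abs(d) * np.sqrt(lr), 0.0, 1.0)
    for _ in range(n_pulses):
        fire_row = rng.random(len(x)) < p_row
        fire_col = rng.random(len(d)) < p_col
        coincide = np.outer(fire_row, fire_col)              # both lines pulsed
        sign = np.outer(np.sign(x), np.sign(d))
        jitter = 1.0 + 0.2 * rng.standard_normal(W.shape)    # cycle-to-cycle noise
        W = W - coincide * sign * step_size * jitter
    return W

x = rng.standard_normal(n_in)    # forward-pass activations
d = rng.standard_normal(n_out)   # backpropagated errors
W = pulsed_update(W, x, d)
```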

Beyond this inconsistency, there are issues with the materials themselves. Depending on where in the conductance range the weight value lies, and how much you're trying to change it, analog memory devices may struggle to make the adjustment. Specifically, Gokmen says, the increments of change tend to be stronger at first, but once the material reaches high conductance it becomes saturated, and it's harder to adjust the values further. Similarly, if you bring the material's conductance down, the weights drop quickly at first and then saturate near the bottom of the range. In short, Gokmen says, these are just a few of the 10 or more different things that can go wrong when training AI models on these types of devices.
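That saturation behavior is easy to picture with a simple "soft bounds" response model, in which each pulse moves the weight by an amount proportional to the remaining headroom: early pulses move it a lot, later ones barely at all. The step size and bounds below are illustrative assumptions rather than measured device parameters.

```python
import numpy as np

def soft_bounds_step(w, direction, dw=0.05, w_min=-1.0, w_max=1.0):
    """Toy saturating device response: each pulse moves the weight by an
    amount proportional to the remaining headroom, so steps are large in the
    middle of the range and shrink toward zero near the bounds. dw and the
    bounds are illustrative assumptions, not measured device parameters."""
    if direction > 0:   # potentiation pulse (conductance up)
        return w + dw * (w_max - w) / (w_max - w_min)
    else:               # depression pulse (conductance down)
        return w - dw * (w - w_min) / (w_max - w_min)

w = 0.0
trace = [w]
for _ in range(200):            # keep pulsing "up"
    w = soft_bounds_step(w, +1)
    trace.append(w)

# Early pulses move the weight quickly; later ones barely move it at all.
print(trace[1] - trace[0], trace[-1] - trace[-2])   # ~0.025 vs ~0.0002
```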

In-memory training algorithms

Materials scientists at IBM Research are working on addressing some of these problems at the physical level, but in the meantime, other researchers like Gokmen and his team are developing algorithms to overcome the hurdles that come with analog devices.

The team took two approaches to the problems with training models on analog in-memory devices. The algorithms they arrived at are called Analog Gradient Accumulation with Dynamic reference (AGAD) and Chopped Tiki-Taka version 2 (c-TTv2). Both are revised versions of an existing algorithm from Gokmen's team, Tiki-Taka, named after the style of soccer play made famous by the Spanish national team, which involves lots of short passes to maintain possession of the ball.

With these approaches, they're tackling a handful of the problems that arise from the non-ideal properties of in-memory computing devices. These include noise, both from one cycle to the next and from the variability of one device to the next. "We can also address the nonlinear switching behavior of the devices," Gokmen says, referring to the saturation problems mentioned above. Any one of these three issues causes inconsistent model weight updates during analog in-memory AI training. The algorithms also help correct noise in the symmetry point, a measure that describes the conductance level at which a memory device stabilizes when it's fed an electrical pulse. "It might not be a fixed point, it may be drifting around, and it could be different from one device to another," says Gokmen.
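To give a flavor of the general approach, and only a flavor, the toy loop below follows the broad Tiki-Taka pattern the article describes: gradient information accumulates on a fast auxiliary array, is read out relative to a per-device reference (a stand-in for the symmetry point), is periodically transferred to the weight array, and is accumulated with a sign that flips between transfers (the "chopping" idea) so systematic asymmetries tend to cancel. The actual AGAD and c-TTv2 update rules are specified in the team's Nature Communications paper; every constant and helper here is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (16, 8)

W = np.zeros(shape)                      # "slow" array holding the model weights
A = np.zeros(shape)                      # "fast" auxiliary array accumulating gradient info
ref = 0.05 * rng.standard_normal(shape)  # per-device references (stand-ins for symmetry points)
chopper = 1.0                            # sign flipped between transfers (the "chopped" idea)

def noisy_write(arr, update):
    """Imperfect analog write: device-to-device gain spread on every update."""
    gain = 1.0 + 0.3 * rng.standard_normal(arr.shape)
    return arr + gain * update

A = ref.copy()                           # start accumulation at the reference
for step in range(1, 201):
    grad = 0.01 * rng.standard_normal(shape)      # stand-in for a real backprop gradient
    A = noisy_write(A, -chopper * grad)           # accumulate the (sign-flipped) gradient in place
    if step % 10 == 0:
        readout = chopper * (A - ref)             # read relative to the reference, undo the sign
        W = noisy_write(W, 0.1 * readout)         # transfer a fraction onto the weight array
        A = ref.copy()                            # restart accumulation at the reference
        chopper = -chopper                        # flip the sign so asymmetry errors tend to cancel
```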

In simulations of analog in-memory model training, they found both AGAD and c-TTv2 showed greatly reduced error rates compared to their previous TTv2 algorithm.

One major advance they’ve achieved with these algorithms is the ability to perform the model weight updates completely in memory, rather than offloading them to a digital device. “We’re really pushing that effort internally,” Gokmen says. “In terms of algorithm development, we’re ahead of the curve.” Now they’re ready to train small models on the available analog devices, but those plans are still coming together, and they depend on the availability of suitable analog hardware.

What’s next

The field of analog computing is still in its early days. Despite algorithmically addressing about half of the materials problems involved in training analog in-memory processors, the team's test results still show a performance gap for larger neural networks. Their follow-up work will interrogate why this happens, Gokmen says. "We still don't understand why we have that gap."

To build on the results, Gokmen and his team are collaborating with researchers at Rensselaer Polytechnic Institute to devise mathematical explanations and justifications for the effects they observed in their experiments.

IBM Research's scientists are developing the hardware that will run tomorrow's AI models, including the experimental core design shown off last year, which uses analog in-memory computing for inference workloads. There are also digital processors that can perform in-memory computing, including IBM's AIU NorthPole chip, which takes inspiration from the brain. Researchers working on analog in-memory computing argue that deep neural networks would work even better on this hardware if their architectures were co-designed with the algorithms for running them on analog devices, and these algorithms are part of the path that will get us there.

Notes

  1. In-memory computing is a type of computer architecture that eliminates the physical separation between memory and CPU. In data-intensive computation, including deep learning, runtime and energy consumption are dominated by data communication between the memory and CPU. By running deep learning models on in-memory computing chips, the same operations can be run in less time and with less energy.

References

  1. Rasch, Malte J., et al. "Fast and Robust Analog In-Memory Deep Neural Network Training." Nature Communications, vol. 15, no. 1, Aug. 2024, p. 7133.