The next generation of neural networks could live in hardware
Networks programmed directly into computer chip hardware can identify images faster, and use much less energy, than the traditional neural networks that underpin most modern AI systems. That’s according to work presented at a leading machine learning conference in Vancouver last week.
Neural networks, from GPT-4 to Stable Diffusion, are built by wiring together perceptrons, which are highly simplified simulations of the neurons in our brains. In very large numbers, perceptrons are powerful, but they also consume enormous volumes of energy—so much that Microsoft has penned a deal that will reopen Three Mile Island to power its AI advancements.
Part of the trouble is that perceptrons are just software abstractions—running a perceptron network on a GPU requires translating that network into the language of hardware, which takes time and energy. Building a network directly from hardware components does away with a lot of those costs. One day, they could even be built directly into chips used in smartphones and other devices, dramatically reducing the need to send data to and from servers.
Felix Petersen, who did this work as a postdoctoral researcher at Stanford University, has a strategy for making that happen. He designed networks composed of logic gates, which are some of the basic building blocks of computer chips. Made up of a few transistors apiece, logic gates accept two bits—1s or 0s—as inputs and, according to a rule determined by their specific pattern of transistors, output a single bit. Just like perceptrons, logic gates can be chained up into networks. And running logic-gate networks is cheap, fast, and easy: in his talk at the Neural Information Processing Systems (NeurIPS) conference, Petersen said that they consume less energy than perceptron networks by a factor of hundreds of thousands.
Logic-gate networks don’t perform nearly as well as traditional neural networks on tasks like image labeling. But the approach’s speed and efficiency make it promising, according to Zhiru Zhang, a professor of electrical and computer engineering at Cornell University. “If we can close the gap, then this could potentially open up a lot of possibilities on this edge of machine learning,” he says.
Petersen didn’t go looking for ways to build energy-efficient AI networks. He came to logic gates through an interest in “differentiable relaxations,” or strategies for wrangling certain classes of mathematical problems into a form that calculus can solve. “It really started off as a mathematical and methodological curiosity,” he says.
Backpropagation, the training algorithm that made the deep-learning revolution possible, was an obvious use case for this approach. Because backpropagation runs on calculus, it can’t be used directly to train logic-gate networks. Logic gates only work with 0s and 1s, and calculus demands answers about all the fractions in between. Petersen devised a way to “relax” logic-gate networks enough for backpropagation by creating functions that work like logic gates on 0s and 1s but also give answers for intermediate values. He ran simulated networks with those gates through training and then converted the relaxed logic-gate network back into something that he could implement in computer hardware.
One challenge with this approach is that training the relaxed networks is tough. Each node in the network could end up as any one of 16 different logic gates, and the 16 probabilities associated with each of those gates must be kept track of and continually adjusted. That takes a huge amount of time and energy—during his NeurIPS talk, Petersen said that training his networks takes hundreds of times longer than training conventional neural networks on GPUs. At universities, which can’t afford to amass hundreds of thousands of GPUs, that amount of GPU time can be tough to swing—Petersen developed these networks, in collaboration with his colleagues, at Stanford University and the University of Konstanz. “It definitely makes the research tremendously hard,” he says.
Once the network has been trained, though, things get way, way cheaper. Petersen compared his logic-gate networks with a cohort of other ultra-efficient networks, such as binary neural networks, which use simplified perceptrons that can process only binary values. The logic-gate networks did just as well as these other efficient methods at classifying images in the CIFAR-10 data set, which includes 10 different categories of low-resolution pictures, from “frog” to “truck.” It achieved this with fewer than a tenth of the logic gates required by those other methods, and in less than a thousandth of the time. Petersen tested his networks using programmable computer chips called FPGAs, which can be used to emulate many different potential patterns of logic gates; implementing the networks in non-programmable ASIC chips would reduce costs even further, because programmable chips need to use more components in order to achieve their flexibility.
Farinaz Koushanfar, a professor of electrical and computer engineering at the University of California, San Diego, says she isn’t convinced that logic-gate networks will be able to perform when faced with more realistic problems. “It’s a cute idea, but I’m not sure how well it scales,” she says. She notes that the logic-gate networks can only be trained approximately, via the relaxation strategy, and approximations can fail. That hasn’t caused issues yet, but Koushanfar says that it could prove more problematic as the networks grow.
Nevertheless, Petersen is ambitious. He plans to continue pushing the abilities of his logic-gate networks, and he hopes, eventually, to create what he calls a “hardware foundation model.” A powerful, general-purpose logic-gate network for vision could be mass-produced directly on computer chips, and those chips could be integrated into devices like personal phones and computers. That could reap enormous energy benefits, Petersen says. If those networks could effectively reconstruct photos and videos from low-resolution information, for example, then far less data would need to be sent between servers and personal devices.
Petersen acknowledges that logic-gate networks will never compete with traditional neural networks on performance, but that isn’t his goal. Making something that works, and that is as efficient as possible, should be enough. “It won’t be the best model,” he says. “But it should be the cheapest.”