Software-Hardware Co-Design Becomes Real


For the past 20 years, the industry has sought to deploy hardware/software co-design concepts. While that approach has made some progress, its inverse, software/hardware co-design, appears to have a much brighter future.

In order to understand the distinction between the two approaches, it is important to define some of the basics.

Hardware/software co-design is essentially a bottom-up process, where hardware is developed first with a general concept of how it is to be used. Software is then mapped to that hardware. This is sometimes called platform-based design. A very recent example of this is Arm's new Scalable Open Architecture for Embedded Edge (SOAFEE), which seeks to enable software-defined automotive development.

Software/hardware co-design, in contrast, is a top-down process where software workloads are used to drive the hardware architectures. This is becoming a much more popular approach today, and it is typified by AI inference engines and heterogeneous architectures. High-level synthesis is also a form of this methodology.

Both are viable design approaches, and some design flows are a combination of the two. “It always goes back to fundamentals, the economy of scale,” says Michael Young, director of product marketing at Cadence. “It is based on the function you need to implement, and that generally translates into response time. Certain functions have real-time, mission-critical constraints. The balance between hardware and software is clear in these cases, because you need to make sure that whatever you do, the response time is within a defined limit. Other applications do not have this restriction and can be done when resources are available.”

But there are other pressures at play today as Moore’s Law scaling slows down. “What’s happening is that the software is driving the functionality in the hardware,” says Simon Davidmann, CEO at Imperas Software. “Products need software that is more efficient, and that is driving the hardware architectures.”

Neither approach is better than the other. “We see both hardware-first and software-first design approaches, and either of the two alone can yield sub-optimal results,” says Tim Kogel, principal applications engineer at Synopsys. “In AI, optimizing the hardware, AI algorithm, and AI compiler is a phase-coupled problem. They need to be designed, analyzed, and optimized together to arrive at an optimized solution. As a simple example, the size of the local memory in an AI accelerator determines the optimal loop tiling in the AI compiler.”
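Kogel's example can be made concrete: the tile size an AI compiler picks for a matrix multiply is bounded by how many operand tiles fit in the accelerator's local memory. The following Python sketch uses invented sizes and a deliberately naive multiply; it is an illustration of the coupling, not a description of any real compiler.

```python
# Sketch: deriving a loop tile size from an accelerator's local memory budget.
# All sizes and function names are illustrative assumptions.

def pick_tile(local_mem_bytes, elem_bytes=4):
    """Largest square tile T such that three TxT tiles (A, B, C) fit in local memory."""
    t = 1
    while 3 * ((t + 1) ** 2) * elem_bytes <= local_mem_bytes:
        t += 1
    return t

def tiled_matmul(a, b, n, t):
    """Naive tiled n x n matrix multiply; a and b are row-major lists of lists."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, t):
        for j0 in range(0, n, t):
            for k0 in range(0, n, t):
                # Each T x T x T block is what would be staged in local memory.
                for i in range(i0, min(i0 + t, n)):
                    for j in range(j0, min(j0 + t, n)):
                        s = c[i][j]
                        for k in range(k0, min(k0 + t, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c
```

With a hypothetical 48KB local memory and 4-byte elements, `pick_tile` returns 64, so the compiler would tile the loops in 64x64 blocks; double the memory and the optimal tiling changes, which is exactly the phase coupling Kogel describes.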

Costs are a very important part of the equation. “Co-design is a very good approach to realize highly optimized hardware for a given problem,” says Andy Heinig, group leader for advanced system integration and department head for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “But this high level of optimization is one of the drawbacks of the approach. Optimized designs are very expensive, and as a result such an approach can only work if the number of produced devices is very high. Most applications do not need optimized hardware, instead using more flexible architectures that can be re-used in different applications. Highly optimized but flexible architectures should be the result of the next-generation hardware/software co-design flows.”

High-level synthesis
The automatic generation of hardware from software has been a goal of academia and industry for several decades, and this led to the development of high-level synthesis (HLS). “Software that is developed to run on a CPU is not the most optimal code for high-level synthesis,” says Anoop Saha, senior manager for strategy and business development at Siemens EDA. “The mapping is inherently one of serial code into parallel blocks, and this is challenging. That is the value of HLS and how you do it. We have seen uses of SystemC, which has native support for multi-threading, but that is hardware-oriented and not software-oriented.”
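The serial-to-parallel mapping Saha describes can be sketched with a simple reduction. A chained accumulation carries a dependency from one iteration to the next, so it synthesizes to a single adder used n times; restructuring it as a tree reduction exposes independent additions at each level that could become parallel adders. This Python sketch only illustrates the restructuring, not an actual HLS transformation:

```python
# Sketch of the serial-to-parallel restructuring that HLS must perform.

def serial_sum(xs):
    """Loop-carried dependency: each step needs the previous result."""
    acc = 0
    for x in xs:
        acc += x
    return acc

def tree_sum(xs):
    """Tree reduction: additions within one level are independent,
    so hardware could perform them with parallel adders in log2(n) steps."""
    level = list(xs)
    while len(level) > 1:
        if len(level) % 2:      # pad odd-length levels
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]
```

Both functions compute the same result, but only the second form makes the available parallelism explicit, which is the kind of structure an HLS tool can map onto hardware.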

Challenges remain with this approach. “We have been investing in it continuously, and we have continued to increase the adoption of it,” says Nick Ni, director of marketing, Software and AI Solutions at Xilinx. “Ten years ago, 99% of people only wrote Verilog and VHDL. But more than half of our developers are using HLS today for one piece of IP, so we have made a lot of progress in terms of adoption. The bottom line is that I don’t think anything has really taken off from a hardware/software co-design perspective. There have been a lot of interesting proposals on the language front to make it more parallel, more multi-processor friendly, and these are definitely going in the right direction. For example, OpenCL was really trying to get there, but it has lost steam.”

Platform-based approach
Platform-based design does not attempt to inject as much automation. Instead, it relies on human intervention based on analysis. “Hardware/software co-design has been happening for quite a while,” says Michael Frank, fellow and system architect at Arteris IP. “People have been trying to estimate the behavior of the platform and evaluate its performance using real software for quite a while. The industry has been building better simulators, such as gem5 and QEMU. This has extended into systems where accelerators have been included, where you build models of accelerators and offload your CPUs by running parts of the code on the accelerator. And then you try to balance this, moving more functionality from the software into the hardware.”

Arm recently announced a new software architecture and reference implementation called Scalable Open Architecture for Embedded Edge (SOAFEE), and two new reference hardware platforms to accelerate the software-defined future of automotive. “To address the software-defined needs of cars, it is imperative to deliver a standardized framework that enhances proven cloud-native technologies that work at scale with the real-time and safety features required in automotive applications,” says Chet Babla, vice president of automotive at Arm’s Automotive and IoT Line of Business. “This same framework also can benefit other real-time and safety-critical use cases, such as robotics and industrial automation.”

This works well for some classes of applications. “We are seeing more hardware/software co-design, not just because the paradigm of processing has changed, but also the paradigm of hardware has changed,” says Siemens’ Saha. “In the past, the hardware was very general-purpose, where you had an ISA layer on top of it. The software sits on top of that. It provides a very clean segmentation of the boundary between software and hardware and how they interact with each other. This reduces time to market. But in order to change that, they have to change the software programming paradigm, and that impacts the ROI.”

A tipping point
It has been suggested that Nvidia created a tipping point with CUDA. While it was not the first time that a new programming model and methodology had been created, it is arguably the first time that it was successful. In fact, it turned what was an esoteric parallel-processing hardware architecture into something that approached a general-purpose compute platform for certain classes of problems. Without that, the GPU would still just be a graphics processor.

“CUDA was far ahead of OpenCL, because it was basically making the description of the parallelism platform agnostic,” says Arteris’ Frank. “But this was not the first. Ptolemy (UC Berkeley) was a way of modeling parallelism and modeling data-driven models. OpenMP, automatic parallelizing compilers — people have been working on this for a long time, and solving it is not trivial. Building the hardware platform to be a good target for the compiler turns out to be the right approach. Nvidia was one of the first ones to get that right.”

Xilinx’s Ni agrees. “It is always easiest if the user can put in explicit parallelism, like CUDA or even OpenCL. That makes it explicit and easier to compile. Making that fully exploit the pipeline, fully exploit the memory, is still a non-trivial problem.”
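The explicit-parallelism model Frank and Ni credit CUDA with popularizing has a simple shape: the programmer writes the body for one index, and the runtime applies it across the whole index space. The sketch below mimics that model in Python with a thread pool standing in for a GPU grid; the `launch` helper and kernel name are invented for illustration.

```python
# Illustrative sketch of the SPMD model CUDA popularized: write the
# per-thread body once, launch it over an index space.

from multiprocessing.dummy import Pool  # thread pool as a stand-in for a GPU grid

def saxpy_kernel(i, a, x, y, out):
    """Per-'thread' body: computes one element of out = a*x + y."""
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    """'Grid launch': run the kernel once per index, conceptually in parallel."""
    with Pool() as pool:
        pool.map(lambda i: kernel(i, *args), range(n))

n = 4
x, y = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
out = [0.0] * n
launch(saxpy_kernel, n, 2.0, x, y, out)
# out is now [12.0, 24.0, 36.0, 48.0]
```

Because the parallelism is stated explicitly in the launch, the compiler does not have to discover it, which is Ni's point about why such code is easier to map efficiently onto pipelines and memory systems.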

Impact of AI
The rapid development of AI has flipped the focus from a hardware-first to a software-first flow. “Understanding AI and ML software workloads is the critical first step to beginning to devise a hardware architecture,” says Lee Flanagan, CBO for Esperanto Technologies. “Workloads in AI are abstractly described in models, and there are many different types of models across AI applications. These models are used to drive AI chip architectures. For example, ResNet-50 (Residual Networks) is a convolutional neural network, which drives the needs for dense matrix computations for image classification. Recommendation systems for ML, however, require an architecture that supports sparse matrices across large models in a deep memory system.”
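Flanagan's contrast between workload types can be sketched directly: a dense layer reads every weight, so it is compute-bound with regular memory access, while a recommendation-style embedding lookup touches only a few rows of a table far larger than any local memory, so it is memory-bound with irregular access. The sketch below uses tiny invented sizes purely to show the difference in access pattern.

```python
# Sketch of the workload contrast that drives different AI architectures.

def dense_matvec(w, x):
    """CNN-style dense layer: every weight is read -> regular, compute-heavy."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def sparse_embedding_sum(table, ids):
    """Recommendation-style lookup: only referenced rows of a (conceptually
    huge) table are read -> irregular, memory-bound access."""
    dim = len(next(iter(table.values())))
    out = [0.0] * dim
    for i in ids:                 # a handful of ids out of millions of rows
        for d in range(dim):
            out[d] += table[i][d]
    return out
```

An architecture tuned for the first function wants wide multiply-accumulate arrays; one tuned for the second wants deep, high-bandwidth memory, which is why the model class drives the chip architecture.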

Specialized hardware is required to deploy the software when it has to meet latency requirements. “Many AI frameworks were designed to run in the cloud because that was the only way you could get 100 processors or 1,000 processors,” says Imperas’ Davidmann. “What’s happening nowadays is that people want all this data processing in the devices at the endpoint, and near the edge in the IoT. This is software/hardware co-design, where people are building the hardware to enable the software. They do not build a piece of hardware and see what software runs on it, which is what happened 20 years ago. Now they are driven by the needs of the software.”

While AI is the obvious application, the trend is much more general than that. “As stated by Hennessy/Patterson, AI is clearly driving a new golden age of computer architecture,” says Synopsys’ Kogel. “Moore’s Law is running out of steam, and with a projected 1,000X growth of design complexity in the next 10 years, AI is asking for more than Moore can deliver. The only…