
Intel's New Method Boosts MLP Performance on Intel GPUs

Intel's new method for multi-layer perceptrons (MLPs) pushes performance limits on Intel GPUs. It could challenge Nvidia's dominance in machine learning and AI.


Researchers have developed a new method to enhance the performance of multi-layer perceptrons (MLPs), a fundamental building block of machine learning (ML) and artificial intelligence (AI). The approach, detailed in the paper 'Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs', focuses on improving GPU utilization and throughput for MLPs on Intel hardware.

The Intel-led team has created a fully-fused implementation of MLPs: multiple layers are combined into a single GPU kernel, reducing memory traffic and kernel-launch latency. The method is particularly effective for narrow MLPs with an arbitrary number of layers and a small, constant number of neurons per layer.
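The idea behind layer fusion can be sketched in plain C++. The paper's actual implementation is a SYCL GPU kernel; the names and structure below are our own simplification, showing only the core point: all layers are evaluated in one pass, with activations held in a small local buffer (standing in for registers or shared local memory) instead of being written to global memory between layers.

```cpp
#include <array>
#include <vector>

// Illustrative sketch only, not the paper's SYCL code.
// A narrow MLP with a constant width W, evaluated "fused":
// one function (analogous to one GPU kernel) loops over all layers.
constexpr int W = 64;  // constant layer width, as in the paper's benchmarks
using Vec = std::array<float, W>;
using Mat = std::array<std::array<float, W>, W>;

inline float relu(float x) { return x > 0.0f ? x : 0.0f; }

// Fused forward pass: intermediate activations stay in the local buffer
// `next`, so nothing is written back to (global) memory between layers
// and no per-layer kernel launch is needed.
Vec fused_forward(const std::vector<Mat>& weights, Vec act) {
    for (const Mat& Wl : weights) {   // each layer in turn
        Vec next{};                   // on-chip buffer on a GPU
        for (int i = 0; i < W; ++i) {
            float acc = 0.0f;
            for (int j = 0; j < W; ++j) acc += Wl[i][j] * act[j];
            next[i] = relu(acc);
        }
        act = next;                   // feed the next layer directly
    }
    return act;
}
```

An unfused implementation would instead launch one kernel per layer and round-trip the activation vector through global memory each time; fusion removes both costs.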

The SYCL implementation on Intel GPUs has shown impressive results. For MLPs of width 64, it outperforms an equivalent CUDA implementation by a factor of up to 2.84 in inference and 1.75 in training. The fully-fused MLPs also significantly increase arithmetic intensity and performance, surpassing the IPEX and CUDA PyTorch versions, the latter running on Nvidia's H100 GPU. The paper demonstrates the efficiency of the SYCL implementation in diverse applications, including image compression, Neural Radiance Fields, and physics-informed machine learning.
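Why fusion raises arithmetic intensity (FLOPs per byte of global-memory traffic) can be seen with a back-of-envelope estimate. The model and numbers below are our own illustration, not figures from the paper: an unfused MLP reads and writes the activation vector once per layer, while a fused one only touches memory for the weights and the network's overall input and output.

```cpp
// Illustrative arithmetic-intensity estimate for a width-W MLP with
// L layers and batch size B, in fp32 (4 bytes per value).
// Assumptions (ours, not the paper's): weights are loaded once per batch,
// and fused execution keeps all intermediate activations on-chip.
double intensity(int W, int L, int B, bool fused) {
    double flops = 2.0 * W * W * L * B;     // one multiply-add per weight per sample
    double weight_bytes = 4.0 * W * W * L;  // weights read once
    double act_bytes;
    if (fused) {
        // only the net's input and output vectors cross global memory
        act_bytes = 4.0 * W * B * 2;
    } else {
        // every layer writes its output and the next layer reads it back
        act_bytes = 4.0 * W * B * (L + 1);
    }
    return flops / (weight_bytes + act_bytes);
}
```

For a width-64, 4-layer MLP this estimate already puts the fused variant at more than twice the arithmetic intensity of the unfused one, which is what moves the kernel away from being memory-bound.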

The first SYCL implementation of fully-fused MLPs on Intel GPUs has proven to be a significant advancement. It offers improved performance and resource efficiency, making it a promising tool for various ML applications. The work highlights the potential of Intel GPUs in ML and AI, challenging the dominance of Nvidia GPUs in this field.
