The Ecosystem for Scientific Software (E4S) provides a curated collection of interoperable, performance-portable software packages that enable scalable, high-performance applications across diverse computing architectures. Within the E4S ecosystem, portable parallel programming systems play a critical role in achieving both productivity and performance on heterogeneous systems — including multicore CPUs, GPUs, and emerging accelerators — from vendors such as NVIDIA, AMD, and Intel.

E4S includes a set of portable programming models, frameworks, and runtime systems that allow application developers to write code once and execute efficiently across multiple platforms. These systems cover both intra-node (shared-memory and accelerator) parallelism and inter-node (distributed-memory) parallelism, while also leveraging modern language-supported features available in Fortran, C++, and LLVM-based compilers.


The Value of Portable Programming Layers

Portable programming layers abstract away hardware-specific details while maintaining high performance. This enables developers to target a wide range of architectures without rewriting large portions of their applications.

MPI: The Backbone of Distributed Scientific Computing

MPI (Message Passing Interface) remains the foundational model for distributed-memory programming. It provides a standardized API for exchanging messages among processes running on different nodes in a cluster. E4S supports MPICH, OpenMPI, and vendor-tuned variants (e.g., Cray MPICH), ensuring broad portability and performance across DOE supercomputers. MPI’s stability, mature tooling, and ecosystem-wide interoperability make it indispensable for scalable applications.

Kokkos and the Rise of Performance Portability

Kokkos, originating from Sandia National Laboratories, provides a C++ programming model for writing performance-portable code targeting multiple backends such as CUDA, HIP, SYCL, and OpenMP. Kokkos is tightly integrated with Trilinos, PETSc, and other E4S math libraries, allowing applications to exploit GPUs and CPUs through a single, modern C++ abstraction layer. Its companion ecosystem includes Kokkos Kernels and Kokkos Tools, providing optimized linear algebra and performance instrumentation capabilities.

OpenMP, OpenACC, and LLVM Compiler Advancements

Compiler-supported parallelism models such as OpenMP and OpenACC enable incremental parallelization of legacy codes using pragma directives. Recent advances in LLVM-based compilers — including Clang/Flang for C++ and Fortran — provide robust support for GPU offloading, unified memory, and mixed-language interoperability. These tools complement E4S programming systems by lowering barriers for performance portability in both research and production settings.


Intra-Node Programming Systems

These systems focus on exploiting shared-memory and accelerator-level parallelism within a compute node. They provide APIs and abstractions for multi-core CPUs, GPUs, and other accelerators.

System Description Key Features Supported Architectures
Kokkos C++ performance-portable programming model Multi-backend (CUDA, HIP, SYCL, OpenMP), hierarchical parallelism, memory spaces NVIDIA, AMD, Intel GPUs; CPUs
RAJA Loop-based C++ abstraction layer developed by LLNL Portable loop execution and memory management CUDA, HIP, OpenMP, TBB
OpenMP (LLVM/Clang) Directive-based parallelism model integrated in LLVM Tasking, SIMD, GPU offload support CPUs and GPUs across vendors
OpenACC Directive-based accelerator programming model High-level GPU offload without deep hardware knowledge NVIDIA GPUs, multicore CPUs
Legion Task-based parallel runtime for data-centric applications Logical regions, dynamic scheduling Multicore and distributed hybrid systems
UPC++ Asynchronous PGAS (Partitioned Global Address Space) library for C++ Remote memory access, futures, distributed objects Shared and distributed memory systems

Inter-Node Programming Systems

These systems enable scalable distributed-memory programming, coordinating computation and communication across multiple nodes.

System Description Key Features Supported Architectures
MPI (MPICH, OpenMPI) Standard message-passing library for distributed-memory systems Point-to-point, collective, one-sided, persistent communication CPU/GPU clusters
Charm++ Message-driven, migratable objects for parallel applications Adaptive load balancing, asynchronous execution Clusters, exascale systems
HPX Asynchronous C++ runtime system Futures, active messages, fine-grained parallelism Multicore and distributed nodes
PaRSEC DAG-based runtime for distributed task scheduling Dynamic dependency tracking Heterogeneous clusters
Legion Realm Runtime Distributed execution layer of Legion Scalable data distribution and communication Hybrid systems
GasNet-EX Communication layer for PGAS languages (UPC++, Chapel) High-performance low-level messaging Clusters, leadership-class systems

Language-Supported Parallelism via LLVM Compilers

Modern Fortran and C++ compilers built on LLVM infrastructure (including Flang, Clang, and vendor derivatives) now provide direct language and directive-based access to parallelism, significantly narrowing the gap between compilers and specialized libraries.

Language / Compiler Parallelism Model E4S Relevance Key Features
Fortran (Flang / LLVM Flang) Coarrays, DO CONCURRENT, OpenMP Interoperates with MPI and math libraries Standard parallel features with GPU offloading
C++ (Clang / LLVM) C++17/20 parallel STL, OpenMP, SYCL Enables integration with Kokkos, RAJA, HPX Unified parallel execution policies
SYCL (via DPC++, hipSYCL) Single-source heterogeneous programming model Bridges HPC and AI kernels in E4S Cross-vendor GPU support, C++ templates
LLVM/OpenACC Directive-based offload Complements Kokkos and OpenMP for incremental migration Portable performance on accelerators

Summary

E4S programming systems enable developers to write portable, high-performance applications capable of running efficiently across diverse architectures. By integrating low-level communication systems like MPI, node-level abstractions like Kokkos and RAJA, and modern compiler technologies supporting OpenMP, OpenACC, and SYCL, E4S ensures that scientific applications can evolve with hardware and remain sustainable over decades.

Together, these tools form the foundation of the DOE’s exascale software ecosystem — unifying productivity, portability, and performance across the world’s fastest computing platforms.