E4S AI & Machine Learning

The E4S (Ecosystem for Scientific Software) distribution integrates a growing set of AI and machine learning (AI/ML) libraries and tools that complement traditional scientific computing workflows. These capabilities help users harness the power of machine learning for data analysis, surrogate modeling, uncertainty quantification, and hybrid AI–HPC workflows that accelerate discovery.

AI/ML tools in E4S are designed to interoperate with established programming models (such as MPI, Kokkos, and OpenMP) and numerical libraries (such as PETSc and Trilinos), enabling scientific applications to adopt AI methods without compromising portability, performance, or maintainability across diverse architectures.

The Value of Using AI/ML Libraries and Tools

AI/ML libraries and tools provide new ways to accelerate scientific computing and data-driven discovery. They help researchers create predictive models that learn from data, automate analysis of large-scale simulations, and bridge the gap between traditional numerical modeling and emerging data-centric approaches.

By leveraging AI/ML libraries that are part of E4S, users benefit from:

Portability and performance: E4S ensures compatibility with DOE leadership-class systems and major hardware platforms (NVIDIA, AMD, Intel, ARM).
Interoperability: AI/ML frameworks integrate with high-performance data and math libraries, facilitating hybrid workflows that combine simulation and learning.
Reproducibility: E4S provides well-tested, versioned, and build-cached environments, simplifying deployment across HPC facilities.
Extensibility: The ecosystem supports emerging AI models, scientific surrogates, and domain-aware learning frameworks built by national labs and research partners.

E4S thus bridges the growing intersection between traditional simulation-based computing and modern AI-driven workflows.

Industry-Standard AI/ML Products Supported in E4S

Product	Description	Website
TensorFlow	Widely used deep learning framework supporting CPUs, GPUs, and TPUs, integrated with Python and C++ APIs	https://www.tensorflow.org
PyTorch	Popular deep learning framework with dynamic computation graphs and strong GPU support	https://pytorch.org
ONNX	Open Neural Network Exchange format enabling model interoperability across frameworks	https://onnx.ai
Horovod	Distributed deep learning training framework for TensorFlow, PyTorch, and MXNet	https://github.com/horovod/horovod
scikit-learn	Classical machine learning library for Python, featuring regression, classification, and clustering	https://scikit-learn.org
XGBoost	Gradient boosting framework for scalable supervised learning	https://xgboost.ai
LightGBM	Gradient boosting framework optimized for performance and memory efficiency	https://lightgbm.readthedocs.io

Emerging Scientific AI/ML Products

Product	Description	Website
TorchSWE	Torch-based shallow water equation solver demonstrating physics-informed AI for PDEs	https://github.com/torchswe/torchswe
Modulus	NVIDIA-supported framework for physics-informed neural networks (PINNs) and operator learning	https://developer.nvidia.com/modulus
Colmena	Toolkit for steering AI-driven campaigns and adaptive sampling in HPC environments	https://github.com/exalearn/colmena
DeepHyper	Scalable hyperparameter optimization and neural architecture search framework for HPC systems	https://deephyper.readthedocs.io
SmartSim	Orchestration framework for coupling AI/ML models with HPC simulations	https://www.craylabs.org/docs/smartsim
AI4S (AI for Science) Tools	DOE-developed toolkits for applying machine learning to scientific simulations and experiments	https://ai4science.energy.gov
Maestro Workflow	Workflow manager enabling AI-driven orchestration and resource coordination	https://github.com/LLNL/maestrowf

Vendor-Provided AI/ML Tools from NVIDIA, AMD, Intel, and ARM

Vendor	Product	Description	Website
NVIDIA	cuDNN	GPU-accelerated primitives for deep neural networks, used by TensorFlow and PyTorch	https://developer.nvidia.com/cudnn
NVIDIA	TensorRT	Inference optimizer and runtime for high-performance deep learning inference	https://developer.nvidia.com/tensorrt
AMD	MIOpen	Deep learning primitives library optimized for ROCm GPUs	https://rocmdocs.amd.com/projects/miopen
AMD	ROCm AI Stack	Integrated AI software stack including PyTorch and TensorFlow with ROCm backend	https://rocmdocs.amd.com/en/latest/
Intel	oneDNN	Deep Neural Network Library optimized for Intel CPUs and GPUs	https://github.com/oneapi-src/oneDNN
Intel	OpenVINO	Toolkit for optimizing and deploying AI inference across Intel devices	https://www.intel.com/openvino
ARM	Compute Library	Optimized AI and computer vision functions for ARM-based architectures	https://developer.arm.com/solutions/machine-learning-on-arm/developer-material/compute-library

Summary

E4S’s integration of AI/ML tools complements its strong foundation in numerical, performance, and I/O libraries. By combining domain-tested HPC libraries with industry-standard and emerging AI frameworks, E4S enables the next generation of hybrid simulation–AI workflows that are portable, scalable, and sustainable across future exascale and AI-accelerated systems.