Construct your prompt from the instructions below, then use the E4S Guide Bot.

Chat with the E4S Guide Bot

Introduction

Selecting the right data and visualization libraries or tools is an important step in building efficient, scalable, and maintainable workflows for scientific computing and high-performance data analysis. The E4S 25.06 release includes a rich collection of data management, I/O, and visualization libraries designed for parallel performance, portability, and ecosystem integration.

Newcomers should consider attributes of their data structures, access patterns, parallelism model, visualization scale, supported hardware, and compatibility with programming environments. Understanding these attributes helps identify tools that meet the technical and scientific needs of a given project, whether the goal is to efficiently store and move simulation output, visualize multi-terabyte datasets, or integrate AI-assisted analytics into scientific workflows.

When defining selection attributes, it helps to consider both broadly meaningful characteristics (like portability or API support) and situation-specific attributes (like in-situ visualization, streaming data, or exascale readiness).

These attributes can then be described in a structured form that allows a chatbot or decision assistant to guide the selection of appropriate E4S products such as ADIOS2, HDF5, ParaView, Ascent, VisIt, VTK-m, or Catalyst2.

Example Prompt

I need a parallel data I/O library that can handle structured mesh output from a multi-GPU simulation on Frontier. The workflow runs in MPI with C++ and uses Kokkos for parallelism. The output should be compatible with ParaView Catalyst for in-situ visualization. I prefer an open-source library with strong E4S support and minimal code changes for integration.
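
One way to hold the prompt above in a structured form is sketched below. The `SelectionRequest` class and its field names are hypothetical, chosen only for illustration; they are not part of E4S or of the Guide Bot. The values are taken from the example prompt.

```python
from dataclasses import dataclass, field


@dataclass
class SelectionRequest:
    """Hypothetical container for selection attributes; not an E4S API."""
    need: str
    data_model: str
    platform: str
    parallelism: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)
    must_work_with: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the attributes as labeled lines, one attribute per line."""
        lines = [
            f"Need: {self.need}",
            f"Data model: {self.data_model}",
            f"Platform: {self.platform}",
        ]
        optional = [
            ("Parallelism", self.parallelism),
            ("Languages", self.languages),
            ("Must work with", self.must_work_with),
            ("Preferences", self.preferences),
        ]
        for label, values in optional:
            if values:  # skip attributes the user did not specify
                lines.append(f"{label}: {', '.join(values)}")
        return "\n".join(lines)


# Values below restate the example prompt from this page.
request = SelectionRequest(
    need="parallel data I/O library",
    data_model="structured mesh",
    platform="Frontier (multi-GPU)",
    parallelism=["MPI", "Kokkos"],
    languages=["C++"],
    must_work_with=["ParaView Catalyst (in-situ visualization)"],
    preferences=["open source", "strong E4S support", "minimal code changes"],
)
print(request.to_prompt())
```

Keeping one attribute per line makes it easy to see which attributes were left unspecified before the prompt is sent.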


Attributes for Data Libraries and Tools

Broadly Meaningful Attributes

| Attribute | Description |
| --- | --- |
| Parallel I/O support | Whether the library supports scalable I/O for distributed applications using MPI or similar models |
| Data model | Type of data supported (structured, unstructured, tabular, hierarchical, key-value, etc.) |
| Supported APIs | Available language bindings such as C, C++, Fortran, Python |
| Portability | Extent to which the library is portable across CPU and GPU systems and HPC platforms |
| Metadata management | Ability to manage and query metadata efficiently at scale |
| Compression | Support for data compression and decompression, including lossy or lossless modes |
| Checkpoint/restart capabilities | Whether the library supports state checkpointing and restart for resilience |
| File format compatibility | Interoperability with common file formats like HDF5, NetCDF, or BP5 |
| Data streaming | Ability to handle real-time or asynchronous data streams |
| E4S integration | Availability as part of E4S releases with verified Spack package support |
| License | Software license type and compatibility with project policies |
| Community support | Availability of documentation, tutorials, and active maintenance |
| Performance tuning | Options for optimizing data layout, buffering, and I/O scheduling |
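
The attributes above can be turned into a simple comparison, sketched here under stated assumptions. The candidate names (`library_a`, `library_b`), their attribute values, and the weights are all placeholders invented for illustration; they make no claim about any real E4S product and should be replaced with values taken from each product's documentation.

```python
# Hypothetical requirement weights: larger means more important to the project.
WEIGHTS = {
    "parallel_io": 3,
    "compression": 1,
    "data_streaming": 2,
    "python_api": 1,
}

# Placeholder candidates with made-up attribute values, NOT claims about
# real E4S products; fill these in from each product's documentation.
CANDIDATES = {
    "library_a": {"parallel_io": True, "compression": True,
                  "data_streaming": False, "python_api": True},
    "library_b": {"parallel_io": True, "compression": False,
                  "data_streaming": True, "python_api": True},
}


def score(attributes, weights):
    """Sum the weights of every weighted attribute the candidate provides."""
    return sum(w for name, w in weights.items() if attributes.get(name, False))


def rank(candidates, weights):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, score(attrs, weights)) for name, attrs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, total in rank(CANDIDATES, WEIGHTS):
    print(f"{name}: {total}")
```

A weighted sum is only one possible design. Hard requirements, such as a mandatory license type, are usually better handled by filtering candidates out before scoring.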

Situation-Specific Attributes for Data Libraries and Tools

For Simulation Output

| Attribute | Description |
| --- | --- |
| In-situ I/O | Direct output to visualization or analysis frameworks during simulation runtime |
| Temporal data management | Support for time series and multi-timestep data |
| Large-file scalability | Ability to handle multi-terabyte output efficiently |
| Domain decomposition mapping | Handling of partitioned data in structured or unstructured domains |
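
To make the domain decomposition attribute concrete: parallel I/O libraries typically describe each rank's piece of a global array by a start offset and a count (for example, HDF5 hyperslab selections and ADIOS2 variable definitions both use this idea). The sketch below computes those two numbers for a one-dimensional block decomposition. The function name `block_decompose` is hypothetical, and no I/O library is called.

```python
def block_decompose(global_size, num_ranks, rank):
    """Return (start, count) of the contiguous block owned by `rank`.

    The first `global_size % num_ranks` ranks get one extra element, so
    block sizes differ by at most one.
    """
    if not 0 <= rank < num_ranks:
        raise ValueError("rank must be in [0, num_ranks)")
    base, extra = divmod(global_size, num_ranks)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


# Example: 10 cells along one axis split across 4 ranks.
for r in range(4):
    print(r, block_decompose(10, 4, r))
```

For a multi-dimensional structured mesh, the same calculation can be applied independently along each axis.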

For Machine Learning Data Pipelines

| Attribute | Description |
| --- | --- |
| Tensor data support | Ability to read/write multi-dimensional tensor data efficiently |
| Streaming ingestion | Support for high-frequency or batched input data |
| Integration with AI frameworks | Compatibility with TensorFlow, PyTorch, or ONNX data formats |
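
As a rough illustration of batched streaming ingestion, the sketch below groups records from any iterable into fixed-size batches without loading the whole stream into memory. The `sensor_stream` generator is a hypothetical stand-in for a real data source, and the code uses only the Python standard library rather than any specific E4S product.

```python
from itertools import islice


def batches(samples, batch_size):
    """Yield lists of up to `batch_size` items from any iterable, lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(samples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:  # the source is exhausted
            return
        yield batch


def sensor_stream(n):
    """Hypothetical stand-in for a source that produces records over time."""
    for i in range(n):
        yield {"step": i, "value": i * 0.5}


for batch in batches(sensor_stream(5), batch_size=2):
    print([record["step"] for record in batch])
```

The final batch may be shorter than `batch_size`; a training pipeline would need to decide whether to pad it, drop it, or process it as is.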

For Exascale and Heterogeneous Systems

| Attribute | Description |
| --- | --- |
| GPU-direct I/O | Capability to move data between GPU memory and storage without staging it through CPU (host) memory |
| Burst buffer support | Awareness of fast intermediate storage tiers on supercomputers |
| Resilience features | Fault-tolerant I/O and restart mechanisms at extreme scale |
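
One basic building block behind restart mechanisms is making sure a checkpoint file is never left half-written if the job is killed mid-write. The single-process sketch below shows the common write-then-rename pattern using only the Python standard library. It is an illustration of the idea, not how any particular E4S library implements checkpointing, and the function names are hypothetical.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Write `state` as JSON so that `path` is never left half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push data to storage
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path, default):
    """Return the saved state, or `default` when no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


# Usage: resume from the last checkpoint, advance, and save again.
with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "state.json")
    state = read_checkpoint(ckpt, {"step": 0})
    state["step"] += 10
    write_checkpoint(ckpt, state)
    print(read_checkpoint(ckpt, {"step": 0}))
```

At extreme scale, coordinating many ranks and storage tiers adds concerns this sketch does not address.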

Attributes for Visualization Libraries and Tools

Broadly Meaningful Attributes

| Attribute | Description |
| --- | --- |
| Rendering model | Type of rendering (rasterization, ray tracing, volume rendering) supported |
| Parallel rendering | Ability to distribute rendering across multiple nodes or GPUs |
| Supported data formats | Input formats supported (e.g., VTK, HDF5, ADIOS2, Exodus) |
| Integration with simulation frameworks | Ability to couple visualization with simulation codes for in-situ workflows |
| Language bindings | Availability of Python, C++, or scripting interfaces |
| Interactivity | Support for interactive exploration versus batch rendering |
| Scalability | Suitability for large data visualization on clusters or supercomputers |
| Extensibility | Ease of adding custom filters, readers, or visualization modules |
| E4S integration | Inclusion in E4S releases and maintained Spack recipes |
| Portability | Availability across Linux, macOS, Windows, and major HPC systems |
| Automation | Support for scripting and reproducible visualization workflows |
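
To make the supported data formats attribute concrete, the sketch below writes a small scalar field on a regular grid in the ASCII legacy VTK format, which tools such as ParaView and VisIt can open. It uses only the Python standard library; the function name, the field name `temperature`, and the sample values are illustrative assumptions. The legacy ASCII format is convenient for small tests but is not intended for large parallel output.

```python
import os
import tempfile


def write_vtk_structured_points(path, dims, values, name="scalar",
                                origin=(0.0, 0.0, 0.0),
                                spacing=(1.0, 1.0, 1.0)):
    """Write point scalars on a regular grid as an ASCII legacy VTK file.

    `values` must hold nx*ny*nz numbers ordered with x varying fastest.
    `name` must not contain spaces.
    """
    nx, ny, nz = dims
    if len(values) != nx * ny * nz:
        raise ValueError("values length must equal nx * ny * nz")
    with open(path, "w") as handle:
        handle.write("# vtk DataFile Version 3.0\n")
        handle.write("example scalar field\n")
        handle.write("ASCII\n")
        handle.write("DATASET STRUCTURED_POINTS\n")
        handle.write(f"DIMENSIONS {nx} {ny} {nz}\n")
        handle.write("ORIGIN {} {} {}\n".format(*origin))
        handle.write("SPACING {} {} {}\n".format(*spacing))
        handle.write(f"POINT_DATA {nx * ny * nz}\n")
        handle.write(f"SCALARS {name} float 1\n")
        handle.write("LOOKUP_TABLE default\n")
        for value in values:
            handle.write(f"{float(value)}\n")


# Usage: a 3 x 2 x 1 grid whose value is x + 10 * y at each point.
with tempfile.TemporaryDirectory() as workdir:
    out = os.path.join(workdir, "field.vtk")
    field = [x + 10 * y for y in range(2) for x in range(3)]
    write_vtk_structured_points(out, (3, 2, 1), field, name="temperature")
    with open(out) as handle:
        print(handle.read())
```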

Situation-Specific Attributes for Visualization Libraries and Tools

For In-Situ Visualization

| Attribute | Description |
| --- | --- |
| Catalyst or Ascent compatibility | Support for in-situ data coupling through Catalyst2 or Ascent |
| Data coupling mechanism | Method used to connect simulation data streams to visualization components |
| Memory sharing | Ability to operate without full data copies |
| Runtime overhead | Impact on simulation performance during coupled visualization |
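
Runtime overhead can be estimated by timing the simulation step and the visualization hook separately. The sketch below is a hypothetical harness, assuming the in-situ call can be wrapped in a Python callable that runs every N steps. `InSituMonitor`, `step_fn`, and `hook` are illustrative names, not Catalyst2 or Ascent APIs, and the lambdas stand in for real work.

```python
import time


class InSituMonitor:
    """Hypothetical harness: runs a hook every N steps and times it."""

    def __init__(self, hook, every):
        if every < 1:
            raise ValueError("every must be at least 1")
        self.hook = hook
        self.every = every
        self.hook_calls = 0
        self.sim_seconds = 0.0
        self.hook_seconds = 0.0

    def run(self, step_fn, num_steps):
        """Advance the simulation, invoking the hook on every N-th step."""
        for step in range(num_steps):
            t0 = time.perf_counter()
            state = step_fn(step)
            t1 = time.perf_counter()
            self.sim_seconds += t1 - t0
            if step % self.every == 0:
                self.hook(step, state)
                self.hook_seconds += time.perf_counter() - t1
                self.hook_calls += 1

    def overhead_fraction(self):
        """Share of total measured time spent inside the hook."""
        total = self.sim_seconds + self.hook_seconds
        return self.hook_seconds / total if total > 0 else 0.0


# Stand-ins for a real simulation step and a real visualization call.
monitor = InSituMonitor(hook=lambda step, state: sum(state), every=5)
monitor.run(step_fn=lambda step: [step * x for x in range(1000)], num_steps=10)
print(monitor.hook_calls, f"{monitor.overhead_fraction():.1%}")
```

Raising `every` trades temporal resolution of the visualization for lower overhead, which is often the first tuning knob to try.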

For Post-Processing and Analysis

| Attribute | Description |
| --- | --- |
| File import range | Ability to load data from multiple simulation tools or formats |
| Parallel batch rendering | Capability to render large datasets without interactivity |
| Derived field generation | Tools for computing secondary quantities from raw data |
| Scripting interface | Availability of programmable interfaces for automation (e.g., Python scripting in ParaView) |
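
As a small example of derived field generation, the sketch below computes a speed field (vector magnitude) from three velocity components. It is written in plain Python for clarity; a visualization tool would typically offer this as a built-in filter or expression, and the function name here is hypothetical.

```python
import math


def vector_magnitude(vx, vy, vz):
    """Return the per-point magnitude of a vector field given as components."""
    if not (len(vx) == len(vy) == len(vz)):
        raise ValueError("component arrays must have the same length")
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(vx, vy, vz)]


# Two sample points with velocities (3, 4, 0) and (0, 0, 2).
speed = vector_magnitude([3.0, 0.0], [4.0, 0.0], [0.0, 2.0])
print(speed)  # [5.0, 2.0]
```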

For Interactive or Immersive Visualization

| Attribute | Description |
| --- | --- |
| Interactive mode | Support for real-time user manipulation and exploration |
| Remote visualization | Ability to interact with large data sets remotely via client-server models |
| VR/AR readiness | Compatibility with immersive visualization systems |
| Web interface | Support for web-based exploration using WebGL or similar technologies |