xSDK: Building an Ecosystem of Highly Efficient Math Libraries for Exascale

Ongoing efforts to build increasingly powerful computer architectures are establishing new avenues for more complex and higher-fidelity simulations. When coupled with data analytics and learning, these simulations inspire novel scientific insights and deeper understanding. Exascale computers will be much faster than previous computer generations, performing 10¹⁸ operations per second — or 1,000 times faster than petascale. To achieve these performance improvements, computer architectures are becoming increasingly complex and incorporating deep memory hierarchies, very high node and core counts, and heterogeneous features like graphics processing units (GPUs). Such architectural changes impact the full breadth of computing scales, as heterogeneity pervades even current-generation laptops, workstations, and moderate-sized clusters.

While emerging advanced architectures provide unprecedented opportunities, they also present significant challenges for developers of scientific applications—such as multiphysics and multiscale codes—who must adapt their software to handle disruptive changes in architectures and innovative programming models that have not yet stabilized. Developers need to consider increasing concurrency while reducing communication and synchronization, as well as other complexities like the potential use of mixed precision to leverage the computing power that is available in low-precision tensor cores. On the one hand, developers should implement new scientific capabilities that in turn increase code complexity. On the other hand, these codes must be portable to new architectures, which requires the inclusion of novel programming models and the restructuring of code to achieve good performance. Addressing these issues is beyond the capability of any single person or team, thus necessitating collaboration among many teams that can encapsulate their expertise in reusable software and work together to create sustainable software ecosystems.

**Figure 1.** A Biological and Environmental Research (BER)-integrated hydrology and reactive transport simulation at the Copper Creek watershed in Colorado’s East River, and two simulations performed by the Center for Efficient Exascale Discretizations (CEED). **1a.** Advanced Terrestrial Simulator (ATS)-integrated hydrology spin-up phase. The surface plot shows the depth of ponded water and the subsurface plot depicts water saturation. **1b.** Injector: incompressible Navier-Stokes with fully implicit mass transport. **1c.** Simulation of a laser-driven radiating Kelvin-Helmholtz instability using a high-order multi-material arbitrary Lagrangian-Eulerian radiation-hydrodynamics discretization. Figure 1a courtesy of David Moulton and Zexuan Xu; 1b courtesy of Julian Andrej; and 1c courtesy of Tzanio Kolev, Robert Rieben, Vladimir Tomov, and Veselin Dobrev.

The Need for Software Ecosystem Perspectives

Sophisticated mathematical algorithms—which are required by many application codes—come from scientific libraries that are independently developed by experts who possess deep knowledge of these methods [2]. This approach moves much of the burden of porting to new architectures—possibly including novel programming models—and ensuring the correct and efficient performance of these components to library developers, who know their codes best. Large multiphysics applications often require the utilization of many independent libraries, which can lead to additional difficulties.

As many application developers have likely experienced, developing and compiling a large code that uses multiple, independently-developed libraries can be tricky. Issues like inconsistent header files, inconsistent versions, and namespace collisions can prevent a seamless build and cause hours or even days of frustration. The ability to effortlessly build a set of complementary libraries and use them in combination is crucial. Such libraries must be sustainable, well tested, and interoperable. To address these issues, developers are embracing software ecosystem perspectives as they work towards the common goals of productivity, sustainability, and portability.
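To make the namespace-collision problem concrete, the following minimal Python sketch reports every symbol name that more than one library exports. The library and symbol names (`libalpha`, `libbeta`, `libgamma`, `init`, and so on) are hypothetical and do not describe any actual package. A real check would read the symbol tables of compiled libraries, for example with a tool such as `nm`, rather than hand-written lists.

```python
from collections import defaultdict


def find_collisions(exports):
    """Map each symbol exported by more than one library to those libraries."""
    owners = defaultdict(list)
    for library, symbols in exports.items():
        for symbol in symbols:
            owners[symbol].append(library)
    return {sym: sorted(libs) for sym, libs in owners.items() if len(libs) > 1}


# Hypothetical export lists; libgamma prefixes its symbols and so avoids clashes.
EXPORTS = {
    "libalpha": {"init", "solve", "finalize"},
    "libbeta": {"init", "refine", "finalize"},
    "libgamma": {"gam_init", "gam_solve"},
}

collisions = find_collisions(EXPORTS)
for symbol in sorted(collisions):
    print(symbol, "is defined by", ", ".join(collisions[symbol]))
```

In this sketch only the two libraries with generic names collide, which is the motivation for giving every public symbol a package-specific prefix.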

xSDK History

The Extreme-scale Scientific Software Development Kit (xSDK) is an example of one such ecosystem. Work on the xSDK began in 2014 as part of the Interoperable Design of Extreme-scale Application Software (IDEAS) project, which was sponsored by the U.S. Department of Energy (DOE) Office of Science as a partnership between the Offices of Advanced Scientific Computing Research (ASCR) and Biological and Environmental Research (BER). IDEAS aimed to address challenges in software productivity and sustainability, with an emphasis on terrestrial ecosystem modeling. Research involved several subsurface simulation codes on the application side and four high-performance math libraries: hypre, PETSc, SuperLU, and Trilinos. Even with this limited number of packages, it was impossible to reliably build, link, and run a single executable due to various incompatibilities. As summarized in a recent community report [1], work on xSDK and IDEAS expanded in 2017 with the DOE Exascale Computing Project (ECP), which requires intensive development of applications and software technologies while anticipating and adapting to continuous advances in computing architectures.

**Figure 2.** Simulations performed by the ExaWind and WDMApp projects that utilize xSDK libraries. **2a.** Flow structure around an NREL five-megawatt wind turbine rotor generated by the ExaWind Nalu-Wind high-performance computational code. **2b.** Core-edge coupled turbulence simulation in a realistic tokamak geometry. The color contours represent perturbed plasma density. Figure 2a courtesy of Shreyas Ananthan and Ganesh Vijayakumar; 2b courtesy of Julian Dominski, Choong-Seock Chang, and Amitava Bhattacharjee.

A Set of Guidelines

To overcome some of the difficulties of building a code with multiple independently-developed math libraries, we created xSDK community policies — a set of rules or guidelines that each package agrees to follow. These mandatory policies address the topics of configuration, installation, testing, Message Passing Interface usage, portability, contact and version information, open-source licensing, namespacing, and repository access. They help improve software quality, usability, access, and sustainability. In addition, several recommended policies focus on error handling and use of a public repository, among other topics. A software package may become an xSDK member if it achieves sustainable compatibility with the community policies and interoperability with another xSDK library — either by using or being used by a different xSDK package. This approach addresses a variety of potential pitfalls in the combined utilization of diverse packages but is simultaneously not so heavy-handed as to interfere with the software strategies of individual libraries. The xSDK project regularly reviews the policies, makes changes when appropriate, and encourages ongoing input from the broader community to ensure that the policies do not become stale.
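The membership rule described above can be sketched as a simple check. This is a simplified illustration and not the xSDK project's actual review procedure: the topic labels are shorthand for the mandatory policy topics listed in the text, the `Candidate` class and the package name `newlib` are hypothetical, and the "sustainable" aspect of compatibility is a human judgment that code like this cannot capture.

```python
from dataclasses import dataclass, field

# Shorthand labels for the mandatory policy topics named in the text; the
# real policies are more detailed than these labels suggest.
MANDATORY_TOPICS = frozenset({
    "configuration", "installation", "testing", "mpi_usage", "portability",
    "contact_and_version_info", "open_source_license", "namespacing",
    "repository_access",
})


@dataclass
class Candidate:
    name: str
    policies_met: set = field(default_factory=set)
    uses: set = field(default_factory=set)     # packages this one calls
    used_by: set = field(default_factory=set)  # packages that call this one


def missing_policies(pkg):
    """Return the mandatory topics that the candidate does not yet satisfy."""
    return MANDATORY_TOPICS - pkg.policies_met


def is_eligible(pkg, members):
    """Eligible if all mandatory topics are met and the candidate uses,
    or is used by, at least one current member."""
    interoperates = bool((pkg.uses | pkg.used_by) & set(members))
    return not missing_policies(pkg) and interoperates


members = {"hypre", "PETSc", "SuperLU", "Trilinos"}
newlib = Candidate("newlib", set(MANDATORY_TOPICS), uses={"PETSc"})  # hypothetical
print(is_eligible(newlib, members))
```

A candidate that meets every policy but neither uses nor is used by a member fails the check, which mirrors the interoperability requirement.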

Testing and Releases

The xSDK developers provide regular releases that generally include new member packages. The xSDK currently comprises 23 math libraries and two domain components. It employs Spack, a flexible package manager that supports multiple versions, configurations, compilers, and platforms. Spack automatically detects any dependencies between packages based on scripts that are provided by library developers, which facilitates smooth and fast builds of large application codes. Before each xSDK release, ongoing regular testing is extended to include a variety of different platforms; this ensures a seamless build and helps handle any issues that have arisen during the libraries’ continual development. Developers also regularly update xSDK documentation, including instructions for platforms (such as DOE leadership computing facilities) where further commands are necessary for the build process. In addition, a suite of example codes called xsdk-examples demonstrates interoperability among packages. These examples provide both a training tool for users and a test suite that confirms correct xSDK builds. The xSDK project also offers training presentations at conferences, tutorials, and webinars for current and (hopefully) future users and collaborators.
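The idea of deriving a build order from declared dependencies can be sketched with a topological sort. This is not Spack's implementation, which also resolves versions, configurations, and compilers; it only shows the ordering step. The dependency map below is illustrative, since actual dependencies vary with package version and build options, and the sketch assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter

# Illustrative map: package -> packages that must be built before it.
DEPENDS_ON = {
    "hypre": set(),
    "superlu": set(),
    "petsc": {"hypre", "superlu"},
    "slepc": {"petsc"},
    "mfem": {"hypre", "petsc"},
}


def build_order(depends_on):
    """Return a list in which every package appears after its dependencies."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


order = build_order(DEPENDS_ON)
print(order)
```

Several valid orders usually exist; the only guarantee is that each package follows everything it depends on, and that a circular dependency is reported instead of producing an order.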

Application Impact

**Figure 3.** A subset of xSDK that displays the libraries used in the applications in Figures 1 and 2. \(A \rightarrow B\) indicates that library A calls library B. Figure courtesy of the authors.

The xSDK community serves as a strong foundation of advanced numerical software for next-generation scientific applications. Adoption of xSDK community policies has improved the quality of member packages and enables a seamless combined build. Increasing the interoperability layers among packages supplies additional options for algorithms and data structures, thus allowing applications to explore cutting-edge advances across multiple packages. Figures 1 and 2 depict several simulations that were generated with multiple xSDK libraries in combination. The BER-integrated hydrology and reactive transport simulation (see Figure 1a) requires a variety of solvers and time integration methods that come from the xSDK libraries hypre, Trilinos, PETSc, and SuperLU. Figure 1 also shows two simulations from the ECP's Center for Efficient Exascale Discretizations (CEED), which addresses applications like fluid dynamics, radiation hydrodynamics, multiscale coupled urban systems, and climate modeling.

CEED applications in turn require the discretization packages MFEM and PUMI, time integration schemes provided by PETSc and SUNDIALS, and a variety of solvers that are available in Ginkgo, hypre, MAGMA, PETSc, STRUMPACK, and SuperLU. The ECP’s ExaWind project, which investigates complex fluid physics in wind farms (see Figure 2a), used the interoperability of Trilinos and hypre within xSDK to discover methods that its researchers had not previously considered and now use regularly. The ECP project WDMApp, which pursues a high-fidelity whole device model of magnetically confined fusion plasma (see Figure 2b), requires solvers for linear systems, nonlinear systems, and eigenvalue problems that are available in the xSDK libraries hypre, PETSc, MAGMA, SLEPc, and SuperLU. Figure 3 illustrates the xSDK packages that researchers used for the simulations in Figures 1 and 2, as well as their interoperabilities.

Future Plans

We will routinely modify xSDK’s community policies to ensure that they meet community needs as software strategies evolve over time. To facilitate policy changes (the modification or removal of an existing policy, or the addition of a new one), we have established a well-defined process that requires agreement among the majority of xSDK package developers. For example, in response to a request by DOE computing facilities, we recently added a recommended policy that asks developers to include specific material (such as contacts, license information, and a changelog) in each package’s top source directory, as well as a policy on documentation quality.

Finally, we will continue to add suitable packages to xSDK based on application code needs; these additions will be accompanied by required testing to ensure robust build and execution. We will increase interoperability layers between libraries when it makes sense to do so, and add new example codes to the test suite that demonstrate interoperabilities. Plans also exist for xSDK builds with special functionalities, such as CUDA or OpenMP capabilities for computers with GPUs or multicore nodes. Moreover, we note that the Extreme-scale Scientific Software Stack (E4S.io) effort addresses broader work across the entire software stack.

Building a successful and sustainable software ecosystem requires the contributions of all kinds of people. We welcome your input on a variety of platforms:

Send comments and questions to xsdk-developers@xsdk.info
Provide comments and pull requests for changes to the online community policies
Consider incorporating xSDK community policies into your software and contributing to xSDK

We look forward to hearing from you.

This article is based on Ulrike Meier Yang’s invited talk at the 2020 SIAM Annual Meeting, which took place virtually last July. Yang’s presentation is available on SIAM’s YouTube Channel.

References

[1] Heroux, M.A., McInnes, L.C., Bernholdt, D., Dubey, A., Gonsiorowski, E., Marques, O., & Wolfenbarger, P. (2020). Advancing scientific productivity through better scientific software: Developer productivity and software sustainability report. Oak Ridge, TN: U.S. Department of Energy Office of Science. DOI: 10.2172/1606662.
[2] McInnes, L.C., Katz, D.S., & Lathrop, S. (2019, December 2). Computational research software: Challenges and community organizations working for culture change. SIAM News, 52(10), p. 5.

About the Authors

Ulrike Meier Yang

Lawrence Livermore National Laboratory

Ulrike Meier Yang leads the Mathematical Algorithms & Computing Group at Lawrence Livermore National Laboratory’s Center for Applied Scientific Computing. Her research interests lie in numerical algorithms, particularly algebraic multigrid methods, high-performance computing, and scientific software design. She contributes to the scalable linear solvers library “hypre” and leads the xSDK project in the U.S. Department of Energy’s Exascale Computing Project.

Lois Curfman McInnes

Senior Computational Scientist, Argonne National Laboratory

Lois Curfman McInnes is a senior computational scientist and Argonne Distinguished Fellow in the Mathematics and Computer Science Division at Argonne National Laboratory. Her work focuses on high-performance scientific computing, with an emphasis on scalable numerical libraries and community collaboration towards productive and sustainable software ecosystems. She served as chair of the SIAM Activity Group on Supercomputing from 2022 to 2023.