Research Projects and Software

Master and Bachelor Projects

Projects for Master and Bachelor theses are available within the three ongoing research projects presented below. Please contact me directly.

MSCA-IF Robust Project: Robust and Energy-Efficient Numerical Solvers Towards Reliable and Sustainable Scientific Computations (ongoing)

Computations in parallel environments, such as the emerging Exascale systems, are usually orchestrated by complex runtimes that employ various strategies to distribute computations and data uniformly and efficiently. However, these strategies, while pursuing excellent performance scalability, may also impair the numerical reliability (accuracy and reproducibility) of the final results due to dynamic and, thus, non-deterministic execution combined with the non-associativity of floating-point operations. Additionally, scientific computations frequently rely on a single working precision for problems of varying complexity, which leads to significant underutilization of the floating-point representation or to a lack of accuracy. The Robust project aims to address the issue of reliable and sustainable scientific computations by developing robust, energy-efficient, and high-performance algorithmic solutions for the underlying numerical linear algebra solvers and libraries, as well as by applying these solutions to applications and kernels at scale.

InterFLOP project (ongoing)

The InterFLOP project aims to provide a modular and scalable platform for analyzing and controlling the costs of the floating-point (FP) behavior of today's real programs, which face new floating-point paradigms (bigger problems, new architectures, new representation formats). My work on the project concerns mixed-precision strategies for linear algebra kernels and solvers, as well as optimistic error analysis.

(Eventually) consistent collective operations (ongoing)

This work is part of the EPEEC project. EPEEC's main goal is to develop and deploy a production-ready parallel programming environment that turns upcoming overwhelmingly heterogeneous exascale supercomputers into manageable platforms for domain application developers. The consortium will significantly advance and integrate existing state-of-the-art components based on European technology (programming models, runtime systems, and tools) with key features enabling three overarching objectives: high coding productivity, high performance, and energy awareness.

I am leading the effort on developing efficient collectives for DL and HPC codes. We explore the space of eventually consistent collectives by investigating the Stale Synchronous Parallel (SSP) synchronization model, which allows workers to compute iterations using stale data within a bounded window. In the SSP model, a worker can receive updates while performing computation on stale data and, thus, seamlessly overlap communication and computation. Additionally, we consider the possibility of dropping the part of the data in a collective that falls below a user-defined threshold. This allows the application to proceed with computations once part of the data has arrived instead of waiting for the full amount. Furthermore, we also provide classic/consistent asynchronous variants of Allreduce, suitable for large messages, especially in ML/DL codes, as well as of AlltoAll, which is a time-consuming operation within the Quantum Espresso application. With this effort, we aim to extend and enhance the current set of collectives in GPI-2 (only a few are available) in order to provide developers/users with a library of collectives that facilitates their work.

ExBLAS -- Exact BLAS (ongoing)

ExBLAS stands for the Exact (fast, accurate, and reproducible) BLAS library. ExBLAS aims to provide new algorithms and implementations for fundamental linear algebra operations -- like those included in the BLAS library -- that deliver reproducible and accurate results with reasonable performance overhead compared to the non-reproducible standard implementations on modern parallel architectures such as Intel Xeon Phi co-processors and GPU accelerators. We construct our approach in such a way that it is independent of data partitioning, the order of computations, thread scheduling, and reduction tree schemes. Depending on the type of BLAS routine, the performance overhead can be close to zero for memory-bound routines like parallel reduction, or within a factor of 15 for compute-bound routines such as matrix-matrix multiplication. The desired overhead for these routines is within a factor of 10, so there is still room for improvement at both the algorithmic and code levels.

The ExBLAS library is available on GitHub under the Modified BSD license.

AllScale (past)

The potential of existing programming models to effectively utilise future Exascale systems, while addressing the challenges of energy efficiency, diminishing resilience, and hardware diversity, is severely limited. It follows that the lack of appropriate, high-productivity, and portable programming models for Exascale computing is a fundamental barrier for the future of science and engineering. In the AllScale project, we propose a solution, the AllScale Environment, for the effective development of highly scalable, resilient, and performance-portable parallel applications for Exascale systems. AllScale follows three design principles:

  1. Use a single parallel programming model to target all the levels of hardware parallelism available in extreme-scale computing systems;
  2. Leverage the inherent advantages of nested recursive parallelism for adaptive parallelisation, automatic resilience management, and auto-tuning for multiple optimisation objectives;
  3. Provide a programming interface that is fully compatible with widely used industry standards and existing toolchains.
More information on the AllScale project can be found here.

INTERTWinE (past)

The INTERTWinE project addresses the problem of programming model design and implementation for the Exascale.

The supercomputing community worldwide has set itself the challenge of building, by the end of this decade, a supercomputer which can deliver an exaflop, i.e. 10^18 flop/s, or a million million million calculations per second. This poses challenges not only in terms of hardware development, but also software development. One of the main challenges is in the interoperability of application programming interfaces (APIs). This project seeks to address this interoperability, bringing together the principal European organisations driving the evolution of programming models and their implementations. More information on the INTERTWinE project can be found here.

EZTrace -- Easy to Trace (past)

The Easy to Trace (EZTrace) library is a general framework for analyzing high-performance and parallel applications. EZTrace relies on a tracing mechanism: an application is executed once in order to record its execution trace. EZTrace ships with modules for the main libraries used in parallel programming, e.g. MPI and OpenMP, and it allows third-party developers to create modules that instrument their own functions. After the execution of the application, EZTrace analyzes the resulting trace and extracts various statistics. Furthermore, EZTrace can generate trace files in different formats.

The initial version of EZTrace was created at the University of Kyoto by Prof. Dr. François Trahay. Later, other members joined the project, such as Dr. Mathieu Faverge (Innovative Computing Laboratory, University of Tennessee) as well as Dr. Damien Martin-Guillerez and Dr. François Rue (both INRIA Bordeaux). My main responsibilities in this project were to improve the performance of EZTrace in terms of scalability and resource usage. I also enabled support for EZTrace on energy-efficient HPC systems such as ARM processors. In addition, I was involved in maintaining the library, writing tests, and enhancing the documentation. EZTrace is available here under the CeCILL-B license.

LiTL -- Lightweight Trace Library (past)

The Lightweight Trace Library (LiTL) is a scalable binary trace library that aims to provide performance analysis tools with a low-overhead event recording service. LiTL minimizes its usage of CPU time and memory in order to avoid disturbing the application being analyzed. LiTL is also completely thread-safe, which allows recording events from multi-threaded applications. It records events only from user space, which permits using only the relevant data structures and functionality and thus simplifies the maintainability of the library. Last but not least, LiTL is a generic library that can be used in conjunction with a wide range of performance analysis tools, such as EZTrace. The LiTL team is composed of Prof. Dr. François Trahay and me. I was the main contributor to LiTL, writing the code, tests, and documentation, as well as maintaining the library. The library is available here under the BSD license.