Release Notes

This document contains a list of new features and known limitations of Intel® SHMEM releases.

Release 1.2.0

New Features and Enhancements

  • Support for Intel® MPI Library as a host back-end for Intel® SHMEM. Please follow the instructions in Building Intel® SHMEM.
  • Support for on_queue API extensions, allowing OpenSHMEM operations to be queued on SYCL devices from the host. These APIs also give users the option to provide a list of SYCL events as a dependency vector (see the sketch following this list).
  • Experimental support for OSHMPI. Intel® SHMEM can now be configured to run over OSHMPI with a suitable MPI back-end. More details are available in Building Intel® SHMEM.
  • Support for Intel® SHMEM on Intel® Tiber™ AI Cloud. Please follow the instructions here.
  • Limited support for OpenSHMEM thread models, including host API support for thread initialization and query routines.
  • Device and host API support for vector point-to-point synchronization operations.
  • Support for OFI Libfabric MLX provider-enabled networks via Intel® MPI Library.
  • Bug fixes improving functionality and performance.
  • Updated specification with new feature descriptions and APIs.
  • An improved and expanded set of unit tests covering the functionality of the new APIs.
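The on_queue extensions let the host enqueue OpenSHMEM operations onto a SYCL queue with optional event dependencies. The following is a minimal sketch of that usage; the routine name ishmemx_putmem_on_queue and its exact parameter list are assumptions made for illustration, so consult the Intel® SHMEM specification for the actual API.

```cpp
// Hedged sketch of host-initiated, queue-ordered OpenSHMEM operations.
// ASSUMPTION: ishmemx_putmem_on_queue(dest, src, nbytes, pe, queue, deps)
// is an illustrative signature, not necessarily the exact library API.
#include <ishmem.h>
#include <ishmemx.h>
#include <sycl/sycl.hpp>

int main() {
    ishmem_init();
    sycl::queue q;

    constexpr size_t nelems = 1024;
    int *src = (int *) ishmem_malloc(nelems * sizeof(int));
    int *dst = (int *) ishmem_malloc(nelems * sizeof(int));
    int target = (ishmem_my_pe() + 1) % ishmem_n_pes();

    // Produce the source data on the device; the fill event becomes a
    // dependency of the queued put.
    sycl::event fill = q.fill(src, ishmem_my_pe(), nelems);

    // Queue the put on the SYCL queue from the host, passing the dependency
    // vector so the put runs only after the fill completes.
    ishmemx_putmem_on_queue(dst, src, nelems * sizeof(int), target, q, {fill});

    q.wait();
    ishmem_barrier_all();

    ishmem_free(src);
    ishmem_free(dst);
    ishmem_finalize();
    return 0;
}
```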

Known Limitations

  • Only Sandia OpenSHMEM and Intel® MPI Library are currently supported as host back-ends.
  • Not all APIs from the OpenSHMEM standard are supported. Please refer to Supported/Unsupported Features for a complete view.
  • Intel® SHMEM requires a one-to-one mapping of PEs to SYCL devices. This means that Intel® SHMEM applications must be launched with no more processes per compute node than there are available SYCL devices on that node. By default, the Intel® SHMEM runtime treats each individual device tile as a single SYCL device and assigns one tile per PE.
  • All collective operations within a kernel must complete before invoking a subsequent kernel-initiated collective operation.
  • To run Intel® SHMEM with SOS using the Slingshot provider in OFI, the environment variable FI_CXI_OPTIMIZED_MRS=0 must be set. Setting FI_CXI_DEFAULT_CQ_SIZE=131072 is also recommended.
  • To run Intel® SHMEM with SOS using the verbs provider, the environment variable MLX5_SCATTER_TO_CQE=0 must be set.
  • To run Intel® SHMEM with Intel® MPI Library, the environment variable I_MPI_OFFLOAD=1 must be set. Additionally, I_MPI_OFFLOAD_RDMA=1 may be necessary for GPU RDMA, depending on the OFI provider. Please refer to the reference guide for further details.
  • The C++ templated APIs are currently available only with a Debug build (configured with -DCMAKE_BUILD_TYPE=Debug).
  • Inter-node communication in Intel® SHMEM requires dma-buf support in the Linux kernel. Inter-node functionality in Intel® SHMEM Release 1.2.0 is tested with SUSE Linux Enterprise Server 15 SP4.

Release 1.1.0

New Features and Enhancements

  • Support for OpenSHMEM 1.5 teams and team-based collective operations (a host-side sketch follows this list).
  • Device and host API support for strided RMA operations (ibput and ibget) from OpenSHMEM 1.6 (see the second sketch after this list).
  • Device and host API support for non-blocking atomic operations.
  • Device and host API support for size-based RMA and signaling operations.
  • Device and host API support for all/any/some versions of point-to-point synchronization operations.
  • Device and host API support for signal set, add, and wait-until operations.
  • Fixed implementation of ishmem_free.
  • Compatible with Sandia OpenSHMEM (SOS) v1.5.3rc1 and newer releases.
  • Support for OFI PSM3 provider-enabled networks via SOS.
  • Updated specification covering the teams API, size-based RMA, non-blocking AMO, team-based collectives, all/any/some flavors of synchronization operations, utility extensions for printing messages, etc.
  • An improved and expanded set of unit tests covering the functionality of the new APIs.
  • New examples illustrating use cases of Intel® SHMEM functionalities including the Teams APIs.
  • Updated launcher script to launch Intel® SHMEM applications on the available SYCL devices in the system.
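As noted above, Release 1.1.0 adds teams and team-based collectives. The sketch below shows host-side usage under the assumption that ishmem_team_split_strided, ISHMEM_TEAM_WORLD, ISHMEM_TEAM_INVALID, ishmem_team_destroy, and ishmem_int_sum_reduce mirror their OpenSHMEM 1.5 counterparts; see the specification and the bundled examples for the authoritative forms.

```cpp
// Hedged sketch of team creation and a team-based reduction.
// ASSUMPTION: the ishmem_team_* names and ishmem_int_sum_reduce follow the
// OpenSHMEM 1.5 signatures; verify against the Intel(R) SHMEM specification.
#include <ishmem.h>
#include <sycl/sycl.hpp>

int main() {
    ishmem_init();
    sycl::queue q;

    // Split the world team into the even-numbered PEs (start 0, stride 2).
    ishmem_team_t even_team;
    int nteam = (ishmem_n_pes() + 1) / 2;
    ishmem_team_split_strided(ISHMEM_TEAM_WORLD, 0, 2, nteam, NULL, 0, &even_team);

    int *src = (int *) ishmem_malloc(sizeof(int));
    int *dst = (int *) ishmem_malloc(sizeof(int));
    int one = 1;
    q.memcpy(src, &one, sizeof(int)).wait();  // symmetric heap is device memory

    if (even_team != ISHMEM_TEAM_INVALID) {
        // Sum-reduce a single int across the even-numbered PEs only.
        ishmem_int_sum_reduce(even_team, dst, src, 1);
        ishmem_team_destroy(even_team);
    }

    ishmem_free(src);
    ishmem_free(dst);
    ishmem_finalize();
    return 0;
}
```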
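The strided ibput/ibget routines transfer regularly spaced blocks of data in a single call. The fragment below is only an illustration; ishmem_int_ibput and its argument order (destination stride, source stride, block size, block count, PE) are assumed to match the OpenSHMEM 1.6 ibput proposal.

```cpp
// Hedged sketch of a strided put: copy one column of a row-major
// nrows x ncols matrix that lives on the symmetric heap to PE 'pe'.
// ASSUMPTION: ishmem_int_ibput(dest, src, dst, sst, bsize, nblocks, pe)
// follows the OpenSHMEM 1.6 ibput proposal.
#include <cstddef>
#include <ishmem.h>

void put_column(int *dest, const int *src, size_t nrows, size_t ncols, int pe) {
    ishmem_int_ibput(dest, src,
                     /* dest stride   */ (ptrdiff_t) ncols,
                     /* source stride */ (ptrdiff_t) ncols,
                     /* block size    */ 1,
                     /* block count   */ nrows,
                     pe);
}
```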

Known Limitations

  • Only Sandia OpenSHMEM is currently supported as the host back-end.
  • Not all APIs from the OpenSHMEM standard are supported. Please refer to Supported/Unsupported Features for a complete view.
  • Intel® SHMEM requires a one-to-one mapping of PEs to SYCL devices. This means that Intel® SHMEM applications must be launched with no more processes per compute node than there are available SYCL devices on that node. By default, the Intel® SHMEM runtime treats each individual device tile as a single SYCL device and assigns one tile per PE.
  • All collective operations within a kernel must complete before invoking a subsequent kernel-initiated collective operation.
  • To run Intel® SHMEM with SOS using the Slingshot provider in OFI, the environment variable FI_CXI_OPTIMIZED_MRS=0 must be set. Setting FI_CXI_DEFAULT_CQ_SIZE=131072 is also recommended.
  • To run Intel® SHMEM with SOS using the verbs provider, the environment variable MLX5_SCATTER_TO_CQE=0 must be set.
  • Inter-node communication in Intel® SHMEM requires dma-buf support in the Linux kernel. Inter-node functionality in Intel® SHMEM Release 1.1.0 is tested with SUSE Linux Enterprise Server 15 SP4.

Release 1.0.0

New Features

  • OpenSHMEM programming on Intel® GPUs.
  • A complete specification detailing the programming model, supported API, example programs, build and run instructions, etc.
  • Device and host API support for OpenSHMEM 1.5-compliant point-to-point Remote Memory Access, Atomic Memory Operations, Signaling, Memory Ordering, and Synchronization Operations (a minimal device-initiated sketch follows this list).
  • Device and host API support for OpenSHMEM collective operations across all PEs.
  • Device API support for SYCL work-group and sub-group level extensions of Remote Memory Access, Signaling, Collective, Memory Ordering, and Synchronization Operations.
  • Support for C++ template function routines replacing the C11 generic selection routines from the OpenSHMEM specification.
  • GPU RDMA support when configured with Sandia OpenSHMEM as the host back-end with suitable OFI providers.
  • Support for bypassing device-to-device communication and using SYCL USM host memory as the symmetric heap via environment variables.
  • A comprehensive set of unit tests covering the functionality of core operations.
  • A suite of performance benchmarks covering device-initiated operation performance for a subset of the operations.
  • An implementation of a 2D stencil kernel utilizing Intel® SHMEM RMA operations.
  • Examples to illustrate different use cases of Intel® SHMEM functionalities.
  • A launcher script to launch Intel® SHMEM applications on the available SYCL devices in the system with the correct mapping.
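A minimal device-initiated RMA sketch follows, assuming the core routines (ishmem_init, ishmem_malloc, ishmem_putmem, ishmem_barrier_all, ishmem_free, ishmem_finalize) mirror the OpenSHMEM routines they are derived from; the bundled examples and the specification show complete, verified programs.

```cpp
// Hedged sketch of a put issued from inside a SYCL kernel.
// ASSUMPTION: ishmem_putmem and friends mirror their OpenSHMEM counterparts.
#include <ishmem.h>
#include <sycl/sycl.hpp>

int main() {
    ishmem_init();
    sycl::queue q;

    constexpr size_t nelems = 256;
    int *src = (int *) ishmem_malloc(nelems * sizeof(int));
    int *dst = (int *) ishmem_malloc(nelems * sizeof(int));

    int my_pe = ishmem_my_pe();
    int target = (my_pe + 1) % ishmem_n_pes();

    // Fill the source buffer and issue the put from device code.
    q.single_task([=]() {
        for (size_t i = 0; i < nelems; i++)
            src[i] = my_pe;
        ishmem_putmem(dst, src, nelems * sizeof(int), target);
    }).wait();

    // Barrier (which implies a quiet) before any PE reads data written to it.
    ishmem_barrier_all();

    ishmem_free(src);
    ishmem_free(dst);
    ishmem_finalize();
    return 0;
}
```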

Known Limitations

  • Only Sandia OpenSHMEM is currently supported as the host back-end.
  • Not all APIs from the OpenSHMEM standard are supported. Please refer to Supported/Unsupported Features for a complete view.
  • Intel® SHMEM requires a one-to-one mapping of PEs to SYCL devices. This means that Intel® SHMEM applications must be launched with no more processes per compute node than there are available SYCL devices on that node. By default, the Intel® SHMEM runtime treats each individual device tile as a single SYCL device.
  • Current implementation of ishmem_free does not release memory for use in subsequent allocations.
  • Intel® SHMEM does not yet support teams-based collectives. All collectives must operate on the world team.
  • All collective operations within a kernel must complete before another kernel invokes collective operations.
  • Intel® SHMEM forces a single tile per PE when ZE_FLAT_DEVICE_HIERARCHY is set to COMBINED or COMPOSITE mode.
  • To run Intel® SHMEM with SOS using the Slingshot provider in OFI, the environment variable FI_CXI_OPTIMIZED_MRS=0 must be set. Setting FI_CXI_DEFAULT_CQ_SIZE=131072 is also recommended.
  • To run Intel® SHMEM with SOS using the verbs provider, the environment variable MLX5_SCATTER_TO_CQE=0 must be set.
  • Inter-node communication in Intel® SHMEM requires dma-buf support in the Linux kernel. Inter-node functionality in Intel® SHMEM Release 1.0.0 is tested with SUSE Linux Enterprise Server 15 SP4.