This document contains a list of new features and known limitations of Intel® SHMEM releases.
- Support for Intel® MPI Library as a host back-end for Intel® SHMEM. Please follow the instructions in Building Intel® SHMEM.
- Support for `on_queue` API extensions, which allow OpenSHMEM operations to be queued on SYCL devices from the host. These APIs also give users the option to provide a list of SYCL events as a dependency vector (see the sketch after this list).
- Experimental support for OSHMPI. Intel® SHMEM can now be configured to run over OSHMPI with a suitable MPI back-end. More details are available in Building Intel® SHMEM.
- Support for Intel® SHMEM on Intel® Tiber™ AI Cloud. Please follow instructions here.
- Limited support for OpenSHMEM thread models, with host API support for thread initialization and query routines.
- Device and host API support for vector point-to-point synchronization operations.
- Support for OFI Libfabric MLX provider-enabled networks via Intel® MPI Library.
- Bug fixes improving functionality and performance.
- Updated specification with new feature descriptions and APIs.
- An improved and additional set of unit tests covering functionality of the new APIs.
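
A minimal host-side sketch of the `on_queue` extension described in the list above. The call name, argument order, and header layout below are assumptions inferred from the feature description rather than taken verbatim from the specification; please verify against the Intel® SHMEM specification before use.

```cpp
// Sketch only: ishmemx_int_put_on_queue and its argument order
// (dest, src, nelems, pe, queue, dependency vector) are assumed here.
#include <vector>
#include <sycl/sycl.hpp>
#include <ishmem.h>
#include <ishmemx.h>

int main() {
    ishmem_init();
    sycl::queue q;

    int my_pe = ishmem_my_pe();
    int n_pes = ishmem_n_pes();

    constexpr size_t nelems = 1024;
    int *src = (int *) ishmem_malloc(nelems * sizeof(int));
    int *dst = (int *) ishmem_malloc(nelems * sizeof(int));

    // Produce the source buffer on the device; the returned event becomes a
    // dependency for the queued put.
    sycl::event fill = q.fill(src, my_pe, nelems);

    // Queue the put on the SYCL queue from the host, ordered after `fill`.
    std::vector<sycl::event> deps{fill};
    ishmemx_int_put_on_queue(dst, src, nelems, (my_pe + 1) % n_pes, q, deps);

    q.wait();
    ishmem_barrier_all();

    ishmem_free(src);
    ishmem_free(dst);
    ishmem_finalize();
    return 0;
}
```

Queuing the operation from the host avoids launching a kernel just to issue the put, while the dependency vector lets the operation slot into an existing SYCL dependency graph.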
- Only Sandia OpenSHMEM and Intel® MPI Library are currently supported as the host back-end.
- Not all APIs from OpenSHMEM standard are supported. Please refer to Supported/Unsupported Features to get a complete view.
- Intel® SHMEM requires a one-to-one mapping of PEs to SYCL devices. This implies that Intel® SHMEM executions must launch with a number of processes on each compute node that is no more than the number of available SYCL devices on each one of those nodes. By default, the Intel® SHMEM runtime considers each individual device tile to make up a single SYCL device and assigns a tile per PE.
- All collective operations within a kernel must complete before invoking a subsequent kernel-initiated collective operation.
- To run Intel® SHMEM with SOS enabling the Slingshot provider in OFI, the environment variable `FI_CXI_OPTIMIZED_MRS=0` must be used. It is also recommended to use `FI_CXI_DEFAULT_CQ_SIZE=131072`.
- To run Intel® SHMEM with SOS enabling the verbs provider, the environment variable `MLX5_SCATTER_TO_CQE=0` must be used.
- To run Intel® SHMEM with Intel® MPI Library, the environment variable `I_MPI_OFFLOAD=1` must be used. Additionally, `I_MPI_OFFLOAD_RDMA=1` may be necessary for GPU RDMA depending on the OFI provider. Please refer to the reference guide for further details.
- The C++ templated APIs are currently available only with a Debug build (using `-DCMAKE_BUILD_TYPE=Debug` during configuration).
- Inter-node communication in Intel® SHMEM requires dma-buf support in the Linux kernel. Inter-node functionality in Intel® SHMEM Release 1.2.0 is tested with SUSE Linux Enterprise Server 15 SP4.
- Support for OpenSHMEM 1.5 teams and team-based collective operations (see the sketch after this list).
- Device and host API support for the strided RMA operations `ibput` and `ibget` from OpenSHMEM 1.6.
- Device and host API support for non-blocking atomic operations.
- Device and host API support for size-based RMA and signaling operations.
- Device and host API support for all/any/some versions of point-to-point synchronization operations.
- Device and host API support for signal set, add, and wait-until operations.
- Fixed implementation of `ishmem_free`.
- Compatible with Sandia OpenSHMEM (SOS) v1.5.3rc1 and newer releases.
- Support for OFI PSM3 provider-enabled networks via SOS.
- Updated specification with the teams API, size-based RMA, non-blocking AMO, team-based collectives, all/any/some flavors of synchronization operations, utility extensions for print messages, etc.
- An improved and additional set of unit tests covering functionality of the new APIs.
- New examples illustrating use cases of Intel® SHMEM functionalities including the Teams APIs.
- Updated launcher script to launch Intel® SHMEM applications on the available SYCL devices in the system.
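
A brief sketch of the teams support listed above, assuming Intel® SHMEM mirrors the OpenSHMEM 1.5 team interfaces under an `ishmem_` prefix (`ISHMEM_TEAM_WORLD`, `ishmem_team_split_strided`, `ishmem_team_destroy`, and a team-based `ishmem_int_sum_reduce`); these names are assumptions based on that convention, so confirm them against the specification.

```cpp
// Sketch only: the team-related names below are assumed to follow the
// OpenSHMEM 1.5 naming convention with an ishmem_ prefix.
#include <sycl/sycl.hpp>
#include <ishmem.h>

int main() {
    ishmem_init();
    sycl::queue q;

    int my_pe = ishmem_my_pe();
    int n_pes = ishmem_n_pes();

    // Split the world team into a sub-team containing the even-numbered PEs.
    ishmem_team_t even_team;
    ishmem_team_split_strided(ISHMEM_TEAM_WORLD, 0, 2, (n_pes + 1) / 2,
                              nullptr, 0, &even_team);

    int *src = (int *) ishmem_malloc(sizeof(int));
    int *sum = (int *) ishmem_malloc(sizeof(int));
    q.fill(src, my_pe, 1).wait();  // symmetric heap resides in device memory

    if (even_team != ISHMEM_TEAM_INVALID) {
        // Team-based reduction across the even-numbered PEs only (host API).
        ishmem_int_sum_reduce(even_team, sum, src, 1);
        ishmem_team_destroy(even_team);
    }

    ishmem_barrier_all();
    ishmem_free(src);
    ishmem_free(sum);
    ishmem_finalize();
    return 0;
}
```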
- Only Sandia OpenSHMEM is currently supported as the host back-end.
- Not all APIs from OpenSHMEM standard are supported. Please refer to Supported/Unsupported Features to get a complete view.
- Intel® SHMEM requires a one-to-one mapping of PEs to SYCL devices. This implies that Intel® SHMEM executions must launch with a number of processes on each compute node that is no more than the number of available SYCL devices on each one of those nodes. By default, the Intel® SHMEM runtime considers each individual device tile to make up a single SYCL device and assigns a tile per PE.
- All collective operations within a kernel must complete before invoking a subsequent kernel-initiated collective operation.
- To run Intel® SHMEM with SOS enabling the Slingshot provider in OFI, the environment variable `FI_CXI_OPTIMIZED_MRS=0` must be used. It is also recommended to use `FI_CXI_DEFAULT_CQ_SIZE=131072`.
- To run Intel® SHMEM with SOS enabling the verbs provider, the environment variable `MLX5_SCATTER_TO_CQE=0` must be used.
- Inter-node communication in Intel® SHMEM requires dma-buf support in the Linux kernel. Inter-node functionality in Intel® SHMEM Release 1.1.0 is tested with SUSE Linux Enterprise Server 15 SP4.
- OpenSHMEM programming on Intel® GPUs.
- A complete specification detailing the programming model, supported API, example programs, build and run instructions, etc.
- Device and host API support for OpenSHMEM 1.5 compliant point-to-point Remote Memory Access, Atomic Memory Operations, Signaling, Memory Ordering, and Synchronization Operations (see the sketch after this list).
- Device and host API support for OpenSHMEM collective operations across all PEs.
- Device API support for SYCL work-group and sub-group level extensions of Remote Memory Access, Signaling, Collective, Memory Ordering, and Synchronization Operations.
- Support for C++ template function routines replacing the C11 generic selection routines from the OpenSHMEM specification.
- GPU RDMA support when configured with Sandia OpenSHMEM with suitable OFI providers as host back-end.
- Support for bypassing device-to-device communication and using SYCL USM host memory as the symmetric heap via environment variables.
- A comprehensive set of unit tests covering the functionality of core operations.
- A suite of performance benchmarks covering device-initiated operation performance for a subset of the operations.
- An implementation of a 2D stencil kernel utilizing Intel® SHMEM RMA operations.
- Examples to illustrate different use cases of Intel® SHMEM functionalities.
- A launcher script to launch Intel® SHMEM applications on the available SYCL devices in the system with the correct mapping.
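
An illustrative sketch of the device-initiated programming model listed above: each PE launches a SYCL kernel that puts its PE number into the symmetric buffer of its neighbor. The buffer names and single-task structure are an assumption of typical usage, not the library's canonical example; the example programs in the specification show the authoritative form.

```cpp
// Sketch only: a single-task SYCL kernel issuing a device-initiated put.
#include <sycl/sycl.hpp>
#include <ishmem.h>

int main() {
    ishmem_init();
    sycl::queue q;

    int my_pe = ishmem_my_pe();
    int n_pes = ishmem_n_pes();
    int target = (my_pe + 1) % n_pes;

    int *src = (int *) ishmem_malloc(sizeof(int));
    int *dst = (int *) ishmem_malloc(sizeof(int));
    q.fill(src, my_pe, 1).wait();

    // Device-initiated put: the kernel running on this PE's GPU writes
    // directly into the symmetric heap of the neighboring PE.
    q.single_task([=]() {
        ishmem_int_put(dst, src, 1, target);
        ishmem_quiet();  // complete the put before the kernel finishes
    }).wait();

    ishmem_barrier_all();  // host-side synchronization across PEs

    ishmem_free(src);
    ishmem_free(dst);
    ishmem_finalize();
    return 0;
}
```

The work-group and sub-group extensions mentioned above provide cooperative variants of the same operations for use inside parallel kernels, where multiple work-items participate in a single transfer.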
- Only Sandia OpenSHMEM is currently supported as the host back-end.
- Not all APIs from OpenSHMEM standard are supported. Please refer to Supported/Unsupported Features to get a complete view.
- Intel® SHMEM requires a one-to-one mapping of PEs to SYCL devices. This implies that Intel® SHMEM executions must launch with a number of processes on each compute node that is no more than the number of available SYCL devices on each one of those nodes. By default, the Intel® SHMEM runtime considers each individual device tile to make up a single SYCL device.
- The current implementation of `ishmem_free` does not release memory for use in subsequent allocations.
- Intel® SHMEM does not yet support teams-based collectives. All collectives must operate on the world team.
- All collective operations must complete before another kernel calls collective operations.
- Intel® SHMEM forces assigning a single tile per PE when using `ZE_FLAT_DEVICE_HIERARCHY` in `COMBINED` or `COMPOSITE` mode.
- To run Intel® SHMEM with SOS enabling the Slingshot provider in OFI, the environment variable `FI_CXI_OPTIMIZED_MRS=0` must be used. It is also recommended to use `FI_CXI_DEFAULT_CQ_SIZE=131072`.
- To run Intel® SHMEM with SOS enabling the verbs provider, the environment variable `MLX5_SCATTER_TO_CQE=0` must be used.
- Inter-node communication in Intel® SHMEM requires dma-buf support in the Linux kernel. Inter-node functionality in Intel® SHMEM Release 1.0.0 is tested with SUSE Linux Enterprise Server 15 SP4.