Update from github actions

uxlfoundation · Nov 8, 2024 · commit a8fe97f

Showing 3,210 changed files with 1,589,104 additions and 0 deletions.
diff --git a/.nojekyll b/.nojekyll
diff --git a/index.html b/index.html
@@ -0,0 +1,9 @@
+<!DOCTYPE html>
+<html>
+  <head>
+    <meta http-equiv="refresh" content="7; url='https://uxlfoundation.github.io/oneAPI-spec/spec'" />
+  </head>
+  <body>
+    <p>Please follow <a href="https://uxlfoundation.github.io/oneAPI-spec/spec">this link</a>.</p>
+  </body>
+</html>
diff --git a/spec/404.html b/spec/404.html
diff --git a/spec/_images/bf16_programming.png b/spec/_images/bf16_programming.png
diff --git a/spec/_images/critical_path_in_graph.png b/spec/_images/critical_path_in_graph.png
diff --git a/spec/_images/data_analytics_stages.png b/spec/_images/data_analytics_stages.png
diff --git a/spec/_images/data_management_flow.png b/spec/_images/data_management_flow.png
diff --git a/spec/_images/dataset.png b/spec/_images/dataset.png
diff --git a/spec/_images/dep_graph.jpg b/spec/_images/dep_graph.jpg
diff --git a/spec/_images/e2eframeworks.png b/spec/_images/e2eframeworks.png
diff --git a/spec/_images/error_functions_plot.jpg b/spec/_images/error_functions_plot.jpg
diff --git a/spec/_images/extbrc_async.png b/spec/_images/extbrc_async.png
diff --git a/spec/_images/frame_cmplx.png b/spec/_images/frame_cmplx.png
diff --git a/spec/_images/graphviz-1d3aa857b4784f656d4eac77e4266e8349f08e34.png b/spec/_images/graphviz-1d3aa857b4784f656d4eac77e4266e8349f08e34.png
diff --git a/spec/_images/graphviz-1d3aa857b4784f656d4eac77e4266e8349f08e34.png.map b/spec/_images/graphviz-1d3aa857b4784f656d4eac77e4266e8349f08e34.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-262dd76ec9d2c127fae2d926c0e385f27c3ea770.png b/spec/_images/graphviz-262dd76ec9d2c127fae2d926c0e385f27c3ea770.png
diff --git a/spec/_images/graphviz-262dd76ec9d2c127fae2d926c0e385f27c3ea770.png.map b/spec/_images/graphviz-262dd76ec9d2c127fae2d926c0e385f27c3ea770.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-39e31570c6f96e0db07b1f492158588ad3e61a54.png b/spec/_images/graphviz-39e31570c6f96e0db07b1f492158588ad3e61a54.png
diff --git a/spec/_images/graphviz-39e31570c6f96e0db07b1f492158588ad3e61a54.png.map b/spec/_images/graphviz-39e31570c6f96e0db07b1f492158588ad3e61a54.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-4c49c9a0025ac6d1a15e64ee523965878c6ce453.png b/spec/_images/graphviz-4c49c9a0025ac6d1a15e64ee523965878c6ce453.png
diff --git a/spec/_images/graphviz-4c49c9a0025ac6d1a15e64ee523965878c6ce453.png.map b/spec/_images/graphviz-4c49c9a0025ac6d1a15e64ee523965878c6ce453.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-5a3d1fe20eb74b213cfce550b4f5465cdbfc9178.png b/spec/_images/graphviz-5a3d1fe20eb74b213cfce550b4f5465cdbfc9178.png
diff --git a/spec/_images/graphviz-5a3d1fe20eb74b213cfce550b4f5465cdbfc9178.png.map b/spec/_images/graphviz-5a3d1fe20eb74b213cfce550b4f5465cdbfc9178.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-71a26ccb9d4c3aed52dcdee7146cb1c18ccb5b9d.png b/spec/_images/graphviz-71a26ccb9d4c3aed52dcdee7146cb1c18ccb5b9d.png
diff --git a/spec/_images/graphviz-71a26ccb9d4c3aed52dcdee7146cb1c18ccb5b9d.png.map b/spec/_images/graphviz-71a26ccb9d4c3aed52dcdee7146cb1c18ccb5b9d.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-791be0cd51f839971dd94c448aaaa288de2c70aa.png b/spec/_images/graphviz-791be0cd51f839971dd94c448aaaa288de2c70aa.png
diff --git a/spec/_images/graphviz-791be0cd51f839971dd94c448aaaa288de2c70aa.png.map b/spec/_images/graphviz-791be0cd51f839971dd94c448aaaa288de2c70aa.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-aaa036013d52c433dae6ffb9e95663c19fa1ea9f.png b/spec/_images/graphviz-aaa036013d52c433dae6ffb9e95663c19fa1ea9f.png
diff --git a/spec/_images/graphviz-aaa036013d52c433dae6ffb9e95663c19fa1ea9f.png.map b/spec/_images/graphviz-aaa036013d52c433dae6ffb9e95663c19fa1ea9f.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-afb8259d26d0164efda5a9ab3f08532191385dea.png b/spec/_images/graphviz-afb8259d26d0164efda5a9ab3f08532191385dea.png
diff --git a/spec/_images/graphviz-afb8259d26d0164efda5a9ab3f08532191385dea.png.map b/spec/_images/graphviz-afb8259d26d0164efda5a9ab3f08532191385dea.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/graphviz-f19bb34f815e805def8bc641648d6c943fda1175.png b/spec/_images/graphviz-f19bb34f815e805def8bc641648d6c943fda1175.png
diff --git a/spec/_images/graphviz-f19bb34f815e805def8bc641648d6c943fda1175.png.map b/spec/_images/graphviz-f19bb34f815e805def8bc641648d6c943fda1175.png.map
@@ -0,0 +1,2 @@
+<map id="%3" name="%3">
+</map>
diff --git a/spec/_images/half_edges.png b/spec/_images/half_edges.png
diff --git a/spec/_images/img_bf16_diagram.png b/spec/_images/img_bf16_diagram.png
diff --git a/spec/_images/img_execution_model.png b/spec/_images/img_execution_model.png
diff --git a/spec/_images/img_programming_model.png b/spec/_images/img_programming_model.png
diff --git a/spec/_images/int8_programming.png b/spec/_images/int8_programming.png
diff --git a/spec/_images/inverse_error_functions_plot.jpg b/spec/_images/inverse_error_functions_plot.jpg
diff --git a/spec/_images/message_flow_graph.jpg b/spec/_images/message_flow_graph.jpg
diff --git a/spec/_images/oneapi-architecture.png b/spec/_images/oneapi-architecture.png
diff --git a/spec/_images/plantuml-31c75932926528784d2be9756405f8417f6f5f74.png b/spec/_images/plantuml-31c75932926528784d2be9756405f8417f6f5f74.png
diff --git a/spec/_images/plantuml-e316494e8cc45441639c66133394db889078644d.png b/spec/_images/plantuml-e316494e8cc45441639c66133394db889078644d.png
diff --git a/spec/_images/programming_concepts.png b/spec/_images/programming_concepts.png
diff --git a/spec/_images/quad_uv.png b/spec/_images/quad_uv.png
diff --git a/spec/_images/rng-leapfrog.png b/spec/_images/rng-leapfrog.png
diff --git a/spec/_images/rng-skip-ahead.png b/spec/_images/rng-skip-ahead.png
diff --git a/spec/_images/sdk_function_naming_convention.png b/spec/_images/sdk_function_naming_convention.png
diff --git a/spec/_images/structured_spherical_coords.png b/spec/_images/structured_spherical_coords.png
diff --git a/spec/_images/table_accessor_usage_example.png b/spec/_images/table_accessor_usage_example.png
diff --git a/spec/_images/triangle_uv.png b/spec/_images/triangle_uv.png
diff --git a/spec/_images/unrolled_stack_rnn.jpg b/spec/_images/unrolled_stack_rnn.jpg
diff --git a/spec/_images/vdb_structure.png b/spec/_images/vdb_structure.png
diff --git a/spec/_images/vpp_region_of_interest_operation.png b/spec/_images/vpp_region_of_interest_operation.png
diff --git a/spec/_plantuml/31/31c75932926528784d2be9756405f8417f6f5f74.png b/spec/_plantuml/31/31c75932926528784d2be9756405f8417f6f5f74.png
diff --git a/spec/_plantuml/e3/e316494e8cc45441639c66133394db889078644d.png b/spec/_plantuml/e3/e316494e8cc45441639c66133394db889078644d.png
diff --git a/spec/_sources/404.rst b/spec/_sources/404.rst
@@ -0,0 +1,14 @@
+.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
+..
+.. SPDX-License-Identifier: CC-BY-4.0
+
+==============
+Page Not Found
+==============
+
+We cannot find the page. Please try:
+
+- Starting the navigation from `spec.oneapi.io <https://spec.oneapi.io>`__
+- Clearing your `browser cache <https://clear-my-cache.com/>`__ and
+  starting the navigation from `spec.oneapi.io <https://spec.oneapi.io>`__
+- Filing an issue in `GitHub <https://github.com/uxlfoundation/oneapi-spec/issues>`__
diff --git a/spec/_sources/404.rst.txt b/spec/_sources/404.rst.txt
@@ -0,0 +1,15 @@
+.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
+..
+.. SPDX-License-Identifier: CC-BY-4.0
+
+==============
+Page Not Found
+==============
+
+We cannot find the page. Please try:
+
+- Starting the navigation from `spec.oneapi.com <https://spec.oneapi.com>`__
+- Clearing your `browser cache <https://clear-my-cache.com/>`__ and
+  starting the navigation from `spec.oneapi.com <https://spec.oneapi.com>`__
+- Filing an issue in `GitHub <https://github.com/oneapi-src/oneapi-spec/issues>`__
+- Emailing to: `[email protected] <mailto:[email protected]>`__
diff --git a/spec/_sources/architecture.rst b/spec/_sources/architecture.rst
@@ -0,0 +1,154 @@
+.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
+..
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Software Architecture
+=====================
+
+oneAPI provides a common developer interface across a range of data
+parallel accelerators (see the figure below).  Programmers use SYCL
+for both API programming and direct programming.  The capabilities of
+a oneAPI platform are determined by the Level Zero interface, which
+provides system software a common abstraction for a oneAPI device.
+
+.. image:: oneapi-architecture.png
+
+oneAPI Platform
+---------------
+
+A oneAPI platform consists of a *host* and a collection of
+*devices*.  The host is typically a multi-core CPU, and the devices
+are one or more GPUs, FPGAs, and other accelerators.  The processor
+serving as the host can also be targeted as a device by the software.
+
+Each device has an associated command *queue*.  An application that
+employs oneAPI runs on the host, following standard C++ execution
+semantics.  To run a *function object* on a device, the application
+submits a *command group* containing the function object to the
+device’s queue.  A function object contains a function definition
+together with associated variables. A function object submitted to a
+queue is also referred to as a *data parallel kernel* or simply a
+*kernel*.
+
+The application running on the host and the functions running on the
+devices communicate through *memory*.  oneAPI defines several
+mechanisms for sharing memory across the platform, depending on the
+capabilities of the devices:
+
+
+=========================  ===========
+Memory Sharing Mechanism   Description
+=========================  ===========
+Buffer objects             | An application can create *buffer objects*
+                           | to pass data to devices.  A buffer is an
+                           | array of data.  A command group will define
+                           | *accessor objects* to identify which
+                           | buffers are accessed in this call to the
+                           | device.  The oneAPI runtime will ensure the
+                           | data in the buffer is accessible to the
+                           | function running on the device.  The
+                           | buffer-accessor mechanism is available on
+                           | all oneAPI platforms.
+Unified addressing         | Unified addressing guarantees that the host and
+                           | all devices will share a unified address space.
+                           | Pointer values in the unified address space will
+                           | always refer to the same location in memory.
+Unified shared memory      | Unified shared memory enables data to be shared
+                           | through pointers without using buffers and
+                           | accessors. There are several levels of support
+                           | for this feature, depending on the capabilities
+                           | of the underlying device.
+=========================  ===========
+
+
+The *scheduler* determines when a command group is run on a
+device.  The following mechanisms are used to determine when a command
+group is ready to run:
+
+  - If the buffer-accessor method is used, the command group is ready
+    when the buffers are defined and copied to the device as
+    necessary.
+
+  - If an ordered queue is used for a device, the command group is
+    ready as soon as the prior command groups in the queue are
+    finished.
+
+  - If unified shared memory is used, you must specify a set of event
+    objects which the command group depends on, and the command group
+    is ready when all of the events are completed.
+
+The application on the host and the functions on the devices can
+*synchronize* through *events*, which are objects that can coordinate
+execution.  If the buffer-accessor mechanism
+is used, the application and device can also synchronize through a
+*host accessor*, through the destruction of a buffer object, or
+through other more advanced mechanisms.
+
+API Programming Example
+-----------------------
+
+API programming requires the programmer to specify the target device and the
+memory communication strategy.  In the following example, we call the
+oneMKL matrix multiply routine, GEMM.  We are writing in SYCL and
+omitting irrelevant details.
+
+We create a queue initialized with a *gpu_selector* to specify that we
+want the computation performed on a GPU, and we define buffers to hold the
+arrays allocated on the host.  Compared to a standard C++ GEMM call,
+we add a parameter to specify the queue, and we replace the references
+to the arrays with references to the buffers that contain the arrays.
+Otherwise, this is the standard GEMM C++ interface.
+
+.. code:: cpp
+
+  using namespace cl::sycl;
+
+  // declare host arrays
+  double *A = new double[M*N];
+  double *B = new double[N*P];
+  double *C = new double[M*P];
+
+  {
+      // Initialize the device queue with a gpu_selector
+      queue q{gpu_selector()};
+
+      // Creating 1D buffers for matrices which are bound to host arrays
+      buffer<double, 1> a{A, range<1>{M*N}};
+      buffer<double, 1> b{B, range<1>{N*P}};
+      buffer<double, 1> c{C, range<1>{M*P}};
+
+      mkl::transpose nT = mkl::transpose::nontrans;
+      // Syntax
+      //   void gemm(queue &exec_queue, transpose transa, transpose transb,
+      //             int64_t m, int64_t n, int64_t k, T alpha,
+      //             buffer<T,1> &a, int64_t lda,
+      //             buffer<T,1> &b, int64_t ldb, T beta,
+      //             buffer<T,1> &c, int64_t ldc);
+      // call gemm
+      mkl::blas::gemm(q, nT, nT, M, P, N, 1.0, a, M, b, N, 0.0, c, M);
+  }
+  // When we exit the block, the buffer destructor writes the result back to C.
+
+Direct Programming Example
+--------------------------
+
+With direct programming, we specify the target device and the memory
+communication strategy, as we do for API programming.  In addition, we
+must define and submit a command group to perform the computation.
+In the following example, we write a simple data parallel matrix
+multiply.  We are writing in SYCL and omitting irrelevant
+details.
+
+We create a queue initialized with a *gpu_selector* to specify that the
+command group should run on the GPU, and we define buffers to hold the
+arrays allocated on the host. We then submit the command group to the
+queue to perform the computation.  The command group defines accessors
+to specify we are reading arrays A and B and writing to C.  We then
+write a C++ lambda to create a function object that computes one
+element of the resulting matrix multiply.  We specify this function
+object as a parameter to a :code:`parallel_for` which maps the
+function across the arrays :code:`A` and :code:`B` in parallel.  When
+we leave the scope, the destructor for the buffer object holding
+:code:`C` writes the data back to the host array.
+
+.. literalinclude:: example.cpp
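Note on the GEMM call in the API Programming Example of `architecture.rst` above: the leading dimensions it passes (`lda = M`, `ldb = N`, `ldc = M`) are consistent with column-major storage. As a minimal host-only sketch under that assumption, the same computation, C = alpha * A * B + beta * C, can be written in plain C++ without SYCL or oneMKL. The helper name `reference_gemm` is hypothetical and is not part of any library; it only illustrates what the arguments mean.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not a oneMKL API).
// Computes C = alpha * A * B + beta * C for non-transposed matrices,
// ASSUMING column-major storage:
//   A is m x k with leading dimension lda (element (i, p) at a[i + p * lda])
//   B is k x n with leading dimension ldb (element (p, j) at b[p + j * ldb])
//   C is m x n with leading dimension ldc (element (i, j) at c[i + j * ldc])
void reference_gemm(std::int64_t m, std::int64_t n, std::int64_t k,
                    double alpha,
                    const std::vector<double>& a, std::int64_t lda,
                    const std::vector<double>& b, std::int64_t ldb,
                    double beta,
                    std::vector<double>& c, std::int64_t ldc) {
    for (std::int64_t j = 0; j < n; ++j) {
        for (std::int64_t i = 0; i < m; ++i) {
            // Dot product of row i of A with column j of B.
            double sum = 0.0;
            for (std::int64_t p = 0; p < k; ++p) {
                sum += a[i + p * lda] * b[p + j * ldb];
            }
            // When beta is zero, C is overwritten without being read.
            const double scaled_c = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
            c[i + j * ldc] = alpha * sum + scaled_c;
        }
    }
}
```

In terms of the example's dimensions, with the host data held in `std::vector<double>`, the call in the diff corresponds to `reference_gemm(M, P, N, 1.0, A, M, B, N, 0.0, C, M)`: `m = M`, `n = P`, and `k = N` is the shared inner dimension.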
diff --git a/spec/_sources/architecture.rst.txt b/spec/_sources/architecture.rst.txt
@@ -0,0 +1,154 @@
+.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
+..
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Software Architecture
+=====================
+
+oneAPI provides a common developer interface across a range of data
+parallel accelerators (see the figure below).  Programmers use SYCL
+for both API programming and direct programming.  The capabilities of
+a oneAPI platform are determined by the Level Zero interface, which
+provides system software a common abstraction for a oneAPI device.
+
+.. image:: oneapi-architecture.png
+
+oneAPI Platform
+---------------
+
+A oneAPI platform consists of a *host* and a collection of
+*devices*.  The host is typically a multi-core CPU, and the devices
+are one or more GPUs, FPGAs, and other accelerators.  The processor
+serving as the host can also be targeted as a device by the software.
+
+Each device has an associated command *queue*.  An application that
+employs oneAPI runs on the host, following standard C++ execution
+semantics.  To run a *function object* on a device, the application
+submits a *command group* containing the function object to the
+device’s queue.  A function object contains a function definition
+together with associated variables. A function object submitted to a
+queue is also referred to as a *data parallel kernel* or simply a
+*kernel*.
+
+The application running on the host and the functions running on the
+devices communicate through *memory*.  oneAPI defines several
+mechanisms for sharing memory across the platform, depending on the
+capabilities of the devices:
+
+
+=========================  ===========
+Memory Sharing Mechanism   Description
+=========================  ===========
+Buffer objects             | An application can create *buffer objects*
+                           | to pass data to devices.  A buffer is an
+                           | array of data.  A command group will define
+                           | *accessor objects* to identify which
+                           | buffers are accessed in this call to the
+                           | device.  The oneAPI runtime will ensure the
+                           | data in the buffer is accessible to the
+                           | function running on the device.  The
+                           | buffer-accessor mechanism is available on
+                           | all oneAPI platforms.
+Unified addressing         | Unified addressing guarantees that the host and
+                           | all devices will share a unified address space.
+                           | Pointer values in the unified address space will
+                           | always refer to the same location in memory.
+Unified shared memory      | Unified shared memory enables data to be shared
+                           | through pointers without using buffers and
+                           | accessors. There are several levels of support
+                           | for this feature, depending on the capabilities
+                           | of the underlying device.
+=========================  ===========
+
+
+The *scheduler* determines when a command group is run on a
+device.  The following mechanisms are used to determine when a command
+group is ready to run:
+
+  - If the buffer-accessor method is used, the command group is ready
+    when the buffers are defined and copied to the device as
+    necessary.
+
+  - If an ordered queue is used for a device, the command group is
+    ready as soon as the prior command groups in the queue are
+    finished.
+
+  - If unified shared memory is used, you must specify a set of event
+    objects which the command group depends on, and the command group
+    is ready when all of the events are completed.
+
+The application on the host and the functions on the devices can
+*synchronize* through *events*, which are objects that can coordinate
+execution.  If the buffer-accessor mechanism
+is used, the application and device can also synchronize through a
+*host accessor*, through the destruction of a buffer object, or
+through other more advanced mechanisms.
+
+API Programming Example
+-----------------------
+
+API programming requires the programmer to specify the target device and the
+memory communication strategy.  In the following example, we call the
+oneMKL matrix multiply routine, GEMM.  We are writing in SYCL and
+omitting irrelevant details.
+
+We create a queue initialized with a *gpu_selector* to specify that we
+want the computation performed on a GPU, and we define buffers to hold the
+arrays allocated on the host.  Compared to a standard C++ GEMM call,
+we add a parameter to specify the queue, and we replace the references
+to the arrays with references to the buffers that contain the arrays.
+Otherwise, this is the standard GEMM C++ interface.
+
+.. code:: cpp
+
+  using namespace cl::sycl;
+
+  // declare host arrays
+  double *A = new double[M*N];
+  double *B = new double[N*P];
+  double *C = new double[M*P];
+
+  {
+      // Initialize the device queue with a gpu_selector
+      queue q{gpu_selector()};
+
+      // Creating 1D buffers for matrices which are bound to host arrays
+      buffer<double, 1> a{A, range<1>{M*N}};
+      buffer<double, 1> b{B, range<1>{N*P}};
+      buffer<double, 1> c{C, range<1>{M*P}};
+
+      mkl::transpose nT = mkl::transpose::nontrans;
+      // Syntax
+      //   void gemm(queue &exec_queue, transpose transa, transpose transb,
+      //             int64_t m, int64_t n, int64_t k, T alpha,
+      //             buffer<T,1> &a, int64_t lda,
+      //             buffer<T,1> &b, int64_t ldb, T beta,
+      //             buffer<T,1> &c, int64_t ldc);
+      // call gemm
+      mkl::blas::gemm(q, nT, nT, M, P, N, 1.0, a, M, b, N, 0.0, c, M);
+  }
+  // When we exit the block, the buffer destructor writes the result back to C.
+
+Direct Programming Example
+--------------------------
+
+With direct programming, we specify the target device and the memory
+communication strategy, as we do for API programming.  In addition, we
+must define and submit a command group to perform the computation.
+In the following example, we write a simple data parallel matrix
+multiply.  We are writing in SYCL and omitting irrelevant
+details.
+
+We create a queue initialized with a *gpu_selector* to specify that the
+command group should run on the GPU, and we define buffers to hold the
+arrays allocated on the host. We then submit the command group to the
+queue to perform the computation.  The command group defines accessors
+to specify we are reading arrays A and B and writing to C.  We then
+write a C++ lambda to create a function object that computes one
+element of the resulting matrix multiply.  We specify this function
+object as a parameter to a :code:`parallel_for` which maps the
+function across the arrays :code:`A` and :code:`B` in parallel.  When
+we leave the scope, the destructor for the buffer object holding
+:code:`C` writes the data back to the host array.
+
+.. literalinclude:: example.cpp
diff --git a/spec/_sources/elements/element_list.rst b/spec/_sources/elements/element_list.rst
@@ -0,0 +1,16 @@
+.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
+..
+.. SPDX-License-Identifier: CC-BY-4.0
+
+- :ref:`oneDPL-section`: A companion to the DPC++ Compiler for
+  programming oneAPI devices with APIs from the C++ standard library,
+  Parallel STL, and extensions.
+- :ref:`oneDNN-section`: High-performance implementations of
+  primitives for deep learning frameworks.
+- :ref:`oneCCL-section`: Communication primitives for scaling deep
+  learning frameworks across multiple devices.
+- :ref:`oneDAL-section`: Algorithms for accelerated data science.
+- :ref:`oneTBB-section`: Library for adding thread-based parallelism
+  to complex applications on multiprocessors.
+- :ref:`oneMKL-section`: High-performance math routines for science,
+  engineering, and financial applications.