SYCL
Original author(s) | Khronos Group
---|---
Developer(s) | Khronos Group
Initial release | March 2014
Stable release | 2020 revision 8 / 19 October 2023[1]
Operating system | Cross-platform
Platform | Cross-platform
Type | High-level programming language
Website | www.khronos.org/sycl
SYCL (pronounced "sickle") is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by Khronos Group, announced in March 2014.
Origin of the name
SYCL originally stood for SYstem-wide Compute Language,[2] but since 2020 its developers have stated that SYCL is simply a name: it is no longer an acronym and contains no reference to OpenCL.[3]
Purpose
SYCL is a royalty-free, cross-platform abstraction layer that builds on the concepts, portability and efficiency of OpenCL and allows code for heterogeneous processors to be written in a "single-source" style using completely standard C++. With single-source development, C++ template functions can contain both host and device code, so complex algorithms that use hardware accelerators can be written once and reused throughout the source code on different types of data.
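The following minimal sketch illustrates this single-source style (the array size, variable names and kernel are arbitrary examples, not part of the standard); the device code is an ordinary C++ lambda in the same source file as the host code:

```cpp
// Illustrative sketch only: single-source vector addition using SYCL 2020
// buffers and accessors. All names and sizes are arbitrary.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  constexpr std::size_t n = 1024;
  std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

  sycl::queue q;  // selects a default device (CPU, GPU, or other accelerator)

  {
    // Buffers manage the host data while kernels may use it.
    sycl::buffer<float> buf_a(a.data(), sycl::range<1>(n));
    sycl::buffer<float> buf_b(b.data(), sycl::range<1>(n));
    sycl::buffer<float> buf_c(c.data(), sycl::range<1>(n));

    q.submit([&](sycl::handler& h) {
      // Accessors declare how each buffer is used; the runtime derives the
      // necessary data transfers and dependencies from them.
      sycl::accessor acc_a(buf_a, h, sycl::read_only);
      sycl::accessor acc_b(buf_b, h, sycl::read_only);
      sycl::accessor acc_c(buf_c, h, sycl::write_only, sycl::no_init);

      // Device code: an ordinary C++ lambda in the same translation unit.
      h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        acc_c[i] = acc_a[i] + acc_b[i];
      });
    });
  }  // Buffer destruction copies the results back into the host vector c.

  std::cout << "c[0] = " << c[0] << "\n";  // prints 3
  return 0;
}
```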
The SYCL standard started as the higher-level programming model sub-group of the OpenCL working group and was originally developed for use with OpenCL and SPIR. Since September 20, 2019 it has been a Khronos working group independent of the OpenCL working group, and starting with SYCL 2020 it has been generalized into a broader heterogeneous framework able to target other systems. This is made possible by the concept of a generic backend that can target any acceleration API while allowing full interoperability with that API, such as using existing native libraries to reach maximum performance while simplifying the programming effort. For example, the Open SYCL implementation targets ROCm and CUDA via AMD's cross-vendor HIP.
Versions
SYCL was introduced at GDC in March 2014 with provisional version 1.2,[4] then the SYCL 1.2 final version was introduced at IWOCL 2015 in May 2015.[5]
The latest version of the earlier SYCL 1.2.1 series is SYCL 1.2.1 revision 7, published on April 27, 2020 (the first revision of that series was published on December 6, 2017[6]).
A provisional SYCL 2.2 was introduced at IWOCL 2016 in May 2016,[7] targeting C++14 and OpenCL 2.2. However, the SYCL committee chose not to finalize this version and instead moved towards a more flexible specification to address the increasing diversity of hardware accelerators, including artificial-intelligence engines, which led to SYCL 2020.
SYCL 2020 was first released as revision 2 on February 9, 2021,[8] taking into account feedback from users and implementers on the SYCL 2020 Provisional Specification revision 1, published on June 30, 2020.[9] Later revisions followed, such as revision 6 published on November 13, 2022; the latest is revision 8, published on October 19, 2023.[1] C++17 and OpenCL 3.0 support are main targets of this release. Unified shared memory (USM) is one of its main features for GPUs with OpenCL and CUDA support.
At IWOCL 2021 a roadmap was presented. DPC++, ComputeCpp, Open SYCL, triSYCL and neoSYCL are the main implementations of SYCL. The next target in development is support for C++20 in a future SYCL 202x.[10]
Implementations
- Data Parallel C++ (DPC++): an open-source Intel project that brings SYCL to LLVM and oneAPI. The compiler framework is based on C++17 and parts of C++20, with SYCL 2020 support.[11][12]
- ComputeCpp: a SYCL 1.2.1-conformant framework from Codeplay with a free community edition.[13][14] Now deprecated in favor of DPC++.[15]
- AdaptiveCpp (formerly hipSYCL and Open SYCL): incomplete SYCL 1.2.1 support, without images or OpenCL interoperability; partial SYCL 2020 support.[16] Supports AMD (ROCm), Nvidia (CUDA), Intel (Level Zero via SPIR-V), and CPUs (LLVM + OpenMP).[17] Can produce fully generic binaries using a just-in-time runtime. Supports C++ standard parallelism (std::execution) in addition to SYCL.[18]
- triSYCL: based on C++20, OpenMP and OpenCL; slow development, incomplete, with a version built on top of DPC++.[19]
- neoSYCL: nearly complete SYCL 1.2.1 support, targeting the HPC SX-Aurora TSUBASA; no OpenCL-specific features such as image support.[20][21]
- SYCL-gtx: C++11 support for OpenCL 1.2+; far from complete and no longer actively developed.[22]
- Sylkan: an experimental implementation of SYCL for Vulkan devices.[23]
- Polygeist: has a fork that compiles SYCL through MLIR,[24] backed by the company Inteon.[25]
Extensions
SYCL safety critical
In March 2023 the Khronos Group announced the creation of the SYCL SC Working Group,[26] with the objective of creating a high-level heterogeneous computing framework for safety-critical systems. Such systems span various fields, including the avionics, automotive, industrial, and medical sectors.
The SYCL Safety Critical framework will comply with several industry standards to ensure its reliability and safety. These include MISRA C++ 202X,[27] which provides guidelines for the use of C++ in critical systems; RTCA DO-178C / EASA ED-12C,[28] the standards for software considerations in airborne systems and equipment certification; ISO 26262/21448,[29] which pertains to the functional safety of road vehicles; IEC 61508, which covers the functional safety of electrical/electronic/programmable electronic safety-related systems; and IEC 62304, which relates to the lifecycle requirements for medical device software.[26]
Software
Examples of software applications that use SYCL include:
- Bioinformatics
  - GROMACS: a molecular dynamics package widely used in bioinformatics and computational chemistry. Starting with its accelerated 2021 release, GROMACS uses SYCL 2020 for efficient computation on various hardware accelerators.[30]
  - LiGen: a molecular docking program that uses SYCL to accelerate computational tasks related to molecular structure analysis and docking simulations.[31]
  - Autodock: another molecular docking program that leverages SYCL to accelerate the prediction of how small molecules bind to a receptor of known 3D structure.[32]
- Automotive industry
- Cosmology
  - CRK-HACC: a cosmological N-body simulation code that has been ported to SYCL. It uses SYCL to accelerate calculations related to large-scale structure formation and dynamics in the universe.[34]
Resources
Khronos maintains a list of SYCL resources.[35] Codeplay Software also provides tutorials on the website sycl.tech, along with other information and news about the SYCL ecosystem.
License
The source files for building the specification (such as Makefiles and some scripts), the SYCL headers, and the SYCL code samples are under the Apache 2.0 license.[36]
Comparison with other tools
The open standards SYCL and OpenCL are similar to the programming models of Nvidia's proprietary CUDA stack and of HIP from AMD's open-source ROCm stack.[37]
Within the Khronos Group ecosystem, OpenCL and Vulkan are the low-level non-single-source APIs, providing fine-grained control over hardware resources and operations. OpenCL is widely used for parallel programming across various hardware types, while Vulkan primarily focuses on high-performance graphics and computing tasks.[38]
SYCL, on the other hand, is the high-level single-source C++ embedded domain-specific language (eDSL). It enables developers to write code for heterogeneous computing systems, including CPUs, GPUs, and other accelerators, using a single-source approach. This means that both host and device code can be written in the same C++ source file.[39]
CUDA
By comparison, the single-source C++ embedded domain-specific language version of CUDA, named the "CUDA Runtime API", is somewhat similar to SYCL; Intel has released a tool called SYCLomatic that automatically translates code from CUDA to SYCL.[40] However, there is also a lesser-known non-single-source version of CUDA, the "CUDA Driver API", which is similar to OpenCL and is used, for example, by the CUDA Runtime API implementation itself.[37]
SYCL extends the C++ AMP features, relieving the programmer of the need to explicitly transfer data between the host and devices by using buffers and accessors. This contrasts with CUDA (prior to the introduction of Unified Memory in CUDA 6), where explicit data transfers were required. Starting with SYCL 2020, it is also possible to use unified shared memory (USM) instead of buffers and accessors, providing a lower-level programming model similar to Unified Memory in CUDA.[41]
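As an illustrative sketch (not taken from the cited sources; names and sizes are arbitrary, and a device supporting shared USM allocations is assumed), the same kind of element-wise computation can be written with USM so that no buffers or accessors appear:

```cpp
// Illustrative sketch only: element-wise addition with SYCL 2020 unified
// shared memory (USM). Assumes the selected device supports shared
// allocations (sycl::aspect::usm_shared_allocations).
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  constexpr std::size_t n = 1024;
  sycl::queue q;

  // Shared allocations are reachable from both host and device; no explicit
  // copies are written by the programmer, similar to CUDA Unified Memory.
  float* a = sycl::malloc_shared<float>(n, q);
  float* b = sycl::malloc_shared<float>(n, q);
  float* c = sycl::malloc_shared<float>(n, q);
  for (std::size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

  // Without accessors, ordering is expressed explicitly (here by waiting).
  q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
    c[i] = a[i] + b[i];
  }).wait();

  std::cout << "c[0] = " << c[0] << "\n";  // prints 3

  sycl::free(a, q);
  sycl::free(b, q);
  sycl::free(c, q);
  return 0;
}
```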
SYCL is higher-level than C++ AMP and CUDA in that it does not require building an explicit dependency graph between all the kernels; instead it provides automatic asynchronous scheduling of kernels, with communication and computation overlap. This is done through the concept of accessors, without requiring any compiler support.[42]
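A minimal sketch of this mechanism (illustrative only; names are arbitrary): two kernels touching the same buffer are ordered automatically, because the runtime derives a dependency from their accessors rather than from anything the programmer declares:

```cpp
// Illustrative sketch only: the second kernel is automatically scheduled
// after the first because both declare accessors to the same buffer.
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>

int main() {
  constexpr std::size_t n = 1024;
  std::vector<int> data(n, 1);
  sycl::queue q;

  {
    sycl::buffer<int> buf(data.data(), sycl::range<1>(n));

    // Kernel 1: doubles every element.
    q.submit([&](sycl::handler& h) {
      sycl::accessor acc(buf, h, sycl::read_write);
      h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) { acc[i] *= 2; });
    });

    // Kernel 2: adds 3. Its accessor implies a dependency on kernel 1,
    // so no explicit event or graph handling is required.
    q.submit([&](sycl::handler& h) {
      sycl::accessor acc(buf, h, sycl::read_write);
      h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) { acc[i] += 3; });
    });
  }  // Buffer destruction waits for both kernels and writes back the data.

  std::cout << "data[0] = " << data[0] << "\n";  // prints 5
  return 0;
}
```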
Unlike C++ AMP and CUDA, SYCL is a pure C++ eDSL without any C++ extensions. This allows a basic CPU implementation to rely on a pure runtime library, without any special compiler.[39]
Both the DPC++[43] and AdaptiveCpp[44] compilers provide a backend for NVIDIA GPUs, similar to CUDA. This allows SYCL code to be compiled and run on NVIDIA hardware, letting developers use SYCL's high-level abstractions on CUDA-capable GPUs.[43][44]
ROCm HIP
ROCm HIP targets Nvidia GPUs, AMD GPUs, and x86 CPUs. HIP is a lower-level API that closely resembles CUDA's APIs;[45] for example, AMD released a tool called HIPIFY that automatically translates CUDA code to HIP.[46] Many of the points made in the comparison between CUDA and SYCL therefore also apply to the comparison between HIP and SYCL.[47]
ROCm HIP has some similarities to SYCL in the sense that it can target various vendors (AMD and Nvidia) and accelerator types (GPU and CPU).[48] However, SYCL can target a broader range of accelerators and vendors, and it supports multiple types of accelerators simultaneously within a single application through the concept of backends. Additionally, SYCL is pure C++, whereas HIP, like CUDA, uses some language extensions that prevent it from being compiled with a standard C++ compiler.[47]
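For illustration (an implementation-dependent sketch, not a normative example), a single SYCL application can enumerate every platform and device exposed by the installed backends and create one queue per device; which vendors appear depends on the SYCL implementation and drivers present:

```cpp
// Illustrative sketch only: listing all platforms/devices visible through
// the installed SYCL backends. Output depends entirely on the system.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  for (const auto& platform : sycl::platform::get_platforms()) {
    std::cout << "Platform: "
              << platform.get_info<sycl::info::platform::name>() << "\n";
    for (const auto& device : platform.get_devices()) {
      std::cout << "  Device: "
                << device.get_info<sycl::info::device::name>() << "\n";
      sycl::queue q(device);  // work could now be submitted to this device
    }
  }
  return 0;
}
```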
Both the DPC++[43] and AdaptiveCpp[44] compilers provide backends for NVIDIA and AMD GPUs, as HIP does. This enables SYCL code to be compiled and executed on hardware from both vendors, giving developers the flexibility to use SYCL's high-level abstractions across a diverse range of devices and platforms.[43][44]
Other programming models
SYCL has many similarities to the Kokkos programming model,[49] including the use of opaque multi-dimensional array objects (SYCL buffers and Kokkos arrays), multi-dimensional ranges for parallel execution, and reductions (added in SYCL 2020). Numerous features in SYCL 2020 were added in response to feedback from the Kokkos community.
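As an illustrative sketch of one of those shared features (names are arbitrary; a device with shared USM allocations is assumed for the result variable), a SYCL 2020 reduction can be written as follows:

```cpp
// Illustrative sketch only: a SYCL 2020 reduction summing the indices
// 0..n-1. Assumes the device supports shared USM allocations.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  constexpr std::size_t n = 1024;
  sycl::queue q;

  int* sum = sycl::malloc_shared<int>(1, q);
  *sum = 0;

  q.parallel_for(sycl::range<1>(n),
                 sycl::reduction(sum, sycl::plus<int>()),
                 [=](sycl::id<1> i, auto& acc) {
                   acc += static_cast<int>(i[0]);
                 })
      .wait();

  std::cout << "sum = " << *sum << "\n";  // prints 523776, i.e. n*(n-1)/2

  sycl::free(sum, q);
  return 0;
}
```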
See also
References
- ^ "Khronos SYCL Registry - the Khronos Group Inc".
- ^ Keryell, Ronan (17 November 2019). "SYCL: A Single-Source C++ Standard for Heterogeneous Computing" (PDF). Khronos.org. Retrieved 26 September 2023.
- ^ Keryell, Ronan. "Meaning of SYCL". GitHub. Retrieved 5 February 2021.
- ^ Khronos Group (19 March 2014). "Khronos Releases SYCL 1.2 Provisional Specification". Khronos. Retrieved 20 August 2017.
- ^ Khronos Group (11 May 2015). "Khronos Releases SYCL 1.2 Final Specification". Khronos. Retrieved 20 August 2017.
- ^ Khronos Group (6 December 2017). "The Khronos Group Releases Finalized SYCL 1.2.1". Khronos. Retrieved 12 December 2017.
- ^ Khronos Group (18 April 2016). "Khronos Releases OpenCL 2.2 Provisional Specification with OpenCL C++ Kernel Language". Khronos. Retrieved 18 September 2017.
- ^ Khronos Group (9 February 2021). "Khronos Releases SYCL 2020 Specification". Khronos. Retrieved 22 February 2021.
- ^ Khronos Group (30 June 2020). "Khronos Steps Towards Widespread Deployment of SYCL with Release of SYCL 2020 Provisional Specification". Khronos. Retrieved 4 December 2020.
- ^ https://www.iwocl.org/wp-content/uploads/k04-iwocl-syclcon-2021-wong-slides.pdf [bare URL PDF]
- ^ https://www.iwocl.org/wp-content/uploads/k01-iwocl-syclcon-2021-reinders-slides.pdf [bare URL PDF]
- ^ "Compile Cross-Architecture: Intel® oneAPI DPC++/C++ Compiler".
- ^ "Home - ComputeCpp CE - Products - Codeplay Developer".
- ^ "Guides - ComputeCpp CE - Products - Codeplay Developer".
- ^ "The Future of ComputeCpp". www.codeplay.com. Retrieved 2023-12-09.
- ^ "AdaptiveCpp feature support". GitHub. 4 July 2023.
- ^ "AdaptiveCpp/doc/compilation.md at develop · AdaptiveCpp/AdaptiveCpp". GitHub.
- ^ "AdaptiveCpp (formerly known as hipSYCL / Open SYCL)". GitHub. 4 July 2023.
- ^ "triSYCL". GitHub. 6 January 2022.
- ^ Ke, Yinan; Agung, Mulya; Takizawa, Hiroyuki (2021). "NeoSYCL: A SYCL implementation for SX-Aurora TSUBASA". The International Conference on High Performance Computing in Asia-Pacific Region. pp. 50–57. doi:10.1145/3432261.3432268. ISBN 9781450388429. S2CID 231597238.
- ^ Ke, Yinan; Agung, Mulya; Takizawa, Hiroyuki (2021). "NeoSYCL: A SYCL implementation for SX-Aurora TSUBASA". The International Conference on High Performance Computing in Asia-Pacific Region. pp. 50–57. doi:10.1145/3432261.3432268. ISBN 9781450388429. S2CID 231597238.
- ^ "Sycl-GTX". GitHub. 10 April 2021.
- ^ https://www.iwocl.org/wp-content/uploads/14-iwocl-syclcon-2021-thoman-slides.pdf [bare URL PDF]
- ^ "Polygeist". GitHub. 25 February 2022.
- ^ "Inteon". 25 February 2022.
- ^ a b "Khronos to Create SYCL SC Open Standard for Safety-Critical C++ Based Heterogeneous Compute". The Khronos Group. 2023-03-15. Retrieved 2024-07-10.
- ^ "MISRA". Retrieved 2024-07-11.
- ^ "ED-12C Aviation Software Standards Training - Airborne". Eurocae. Retrieved 2024-07-11.
- ^ "SOTIF – practical training". www.kuglermaag.com. Retrieved 2024-07-11.
- ^ https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf [bare URL PDF]
- ^ Crisci, Luigi; Salimi Beni, Majid; Cosenza, Biagio; Scipione, Nicolò; Gadioli, Davide; Vitali, Emanuele; Palermo, Gianluca; Beccari, Andrea (2022-05-10). "Towards a Portable Drug Discovery Pipeline with SYCL 2020". Proceedings of the 10th International Workshop on OpenCL. IWOCL '22. New York, NY, USA: Association for Computing Machinery: 1–2. doi:10.1145/3529538.3529688. ISBN 978-1-4503-9658-5.
- ^ Solis-Vasquez, Leonardo; Mascarenhas, Edward; Koch, Andreas (2023-04-18). "Experiences Migrating CUDA to SYCL: A Molecular Docking Case Study". Proceedings of the 2023 International Workshop on OpenCL. IWOCL '23. New York, NY, USA: Association for Computing Machinery: 1–11. doi:10.1145/3585341.3585372. ISBN 979-8-4007-0745-2.
- ^ https://www.iwocl.org/wp-content/uploads/20-iwocl-syclcon-2021-rudkin-slides.pdf [bare URL PDF]
- ^ Rangel, Esteban Miguel; Pennycook, Simon John; Pope, Adrian; Frontiere, Nicholas; Ma, Zhiqiang; Madananth, Varsha (2023-11-12). "A Performance-Portable SYCL Implementation of CRK-HACC for Exascale". Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis. SC-W '23. New York, NY, USA: Association for Computing Machinery: 1114–1125. doi:10.1145/3624062.3624187. ISBN 979-8-4007-0785-8.
- ^ "SYCL Resources". khronos.org. Khronos group. 20 January 2014.
- ^ "SYCL Open Source Specification". GitHub. 10 January 2022.
- ^ a b Breyer, Marcel; Van Craen, Alexander; Pflüger, Dirk (2022-05-10). "A Comparison of SYCL, OpenCL, CUDA, and OpenMP for Massively Parallel Support Vector Machine Classification on Multi-Vendor Hardware". Proceedings of the 10th International Workshop on OpenCL. IWOCL '22. New York, NY, USA: Association for Computing Machinery: 1–12. doi:10.1145/3529538.3529980. ISBN 978-1-4503-9658-5.
- ^ "SYCL - C++ Single-source Heterogeneous Programming for Acceleration Offload". The Khronos Group. 2014-01-20. Retrieved 2024-07-12.
- ^ a b "SYCL™ 2020 Specification (revision 8)". registry.khronos.org. Retrieved 2024-07-12.
- ^ oneapi-src/SYCLomatic, oneAPI-SRC, 2024-07-11, retrieved 2024-07-11
- ^ Chen, Jolly; Dessole, Monica; Varbanescu, Ana Lucia (2024-01-24), Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame, doi:10.48550/arXiv.2401.13310, retrieved 2024-07-12
- ^ "Buffer Accessor Modes". Intel. Retrieved 2024-07-11.
- ^ a b c d "DPC++ Documentation — oneAPI DPC++ Compiler documentation". intel.github.io. Retrieved 2024-07-11.
- ^ a b c d "AdaptiveCpp/doc/sycl-ecosystem.md at develop · AdaptiveCpp/AdaptiveCpp". GitHub. Retrieved 2024-07-11.
- ^ ROCm/HIP, AMD ROCm™ Software, 2024-07-11, retrieved 2024-07-11
- ^ "HIPIFY/README.md at amd-staging · ROCm/HIPIFY". GitHub. Retrieved 2024-07-11.
- ^ a b Jin, Zheming; Vetter, Jeffrey S. (November 2022). "Evaluating Nonuniform Reduction in HIP and SYCL on GPUs". IEEE: 37–43. doi:10.1109/DRBSD56682.2022.00010. ISBN 978-1-6654-6337-9.
- ^ Reguly, Istvan Z. (2023-11-12). "Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications". Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis. SC-W '23. New York, NY, USA: Association for Computing Machinery: 1038–1047. doi:10.1145/3624062.3624180. ISBN 979-8-4007-0785-8.
- ^ Hammond, Jeff R.; Kinsner, Michael; Brodman, James (2019). "A comparative analysis of Kokkos and SYCL as heterogeneous, parallel programming models for C++ applications". Proceedings of the International Workshop on OpenCL. pp. 1–2. doi:10.1145/3318170.3318193. ISBN 9781450362306. S2CID 195777149.
External links
- Khronos SYCL webpage
- The SYCL specifications in Khronos registry
- C++17 ParallelSTL in SYCL
- SYCL tech resources
- Codeplay ComputeCpp SYCL implementation
- Implementation of SYCL started by Intel with the goal of Clang/LLVM up-streaming
- AdaptiveCpp (formerly known as hipSYCL / Open SYCL)
- triSYCL open-source SYCL implementation
- "SYCL 2020 Launches with New Name, New Features, and High Ambition", HPCWire article (2021 Feb. 9th)
- James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian: "Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL", Apress (2021), OpenAccess.