User:Gianaccordi/sandbox1

Extensions

SYCL Safety Critical

In March 2023, the Khronos Group announced the creation of the SYCL SC Working Group[1], with the objective of creating a high-level heterogeneous computing framework for safety-critical systems. These systems span various fields, including the avionics, automotive, industrial, and medical sectors.

The SYCL Safety Critical framework will comply with several industry standards to ensure its reliability and safety. These include MISRA C++ 202X[2], which provides guidelines for the use of C++ in critical systems; RTCA DO-178C / EASA ED-12C[3], the standards for software considerations in airborne systems and equipment certification; ISO 26262/21448[4], which pertains to the functional safety of road vehicles; IEC 61508, which covers the functional safety of electrical/electronic/programmable electronic safety-related systems; and IEC 62304, which relates to the lifecycle requirements for medical device software[1].

Software

The following are examples of software applications that use SYCL:

  • Bioinformatics
    • GROMACS: A molecular dynamics package widely used in bioinformatics and computational chemistry. Since its accelerated version released in 2021, GROMACS has used SYCL 2020 for efficient computation on various hardware accelerators[5].
    • LiGen: A molecular docking software that utilizes SYCL for accelerating computational tasks related to molecular structure analysis and docking simulations[6].
    • AutoDock: Another molecular docking program, which uses SYCL to accelerate the prediction of how small molecules bind to a receptor of known 3D structure[7].
  • Automotive Industry
    • ISO 26262: The international standard for functional safety of automotive electrical and electronic systems. SYCL is used in automotive applications to accelerate safety-critical computations and simulations, helping systems comply with this stringent safety standard[8].
  • Cosmology
    • CRK-HACC: A cosmological N-body simulation code that has been ported to SYCL. It uses SYCL to accelerate calculations of large-scale structure formation and dynamics in the universe[9].

Comparison with other Tools

The open standards SYCL and OpenCL are similar to the programming models of Nvidia's proprietary CUDA stack and of HIP, part of AMD's open-source ROCm stack.[10]

Within the Khronos Group ecosystem, OpenCL and Vulkan are the low-level, non-single-source APIs, providing fine-grained control over hardware resources and operations. OpenCL is widely used for parallel programming across various hardware types, while Vulkan primarily focuses on high-performance graphics and compute tasks.[11]

SYCL, on the other hand, is a high-level, single-source C++ embedded domain-specific language (eDSL). It enables developers to write code for heterogeneous computing systems, including CPUs, GPUs, and other accelerators, using a single-source approach, meaning that host and device code can be written in the same C++ source file.[12]

CUDA

By comparison, the single-source C++ embedded domain-specific language version of CUDA, named the "CUDA Runtime API," is somewhat similar to SYCL. In fact, Intel released a tool called SYCLomatic that automatically translates code from CUDA to SYCL.[13] There is also a less well-known non-single-source version of CUDA, the "CUDA Driver API," which is similar to OpenCL and is used, for example, by the CUDA Runtime API implementation itself.[10]

SYCL extends the C++ AMP features, relieving the programmer from explicitly transferring data between the host and devices through its buffers and accessors. This is in contrast to CUDA (prior to the introduction of Unified Memory in CUDA 6), where explicit data transfers were required. Starting with SYCL 2020, it is also possible to use unified shared memory (USM) instead of buffers and accessors, providing a lower-level programming model similar to Unified Memory in CUDA.[14]

SYCL is higher-level than C++ AMP and CUDA: the programmer does not need to build an explicit dependency graph between kernels, and the runtime automatically schedules kernels asynchronously, overlapping communication and computation. This is all achieved through the concept of accessors, without requiring any compiler support.[15]

Unlike C++ AMP and CUDA, SYCL is a pure C++ eDSL without any language extensions. This allows a basic CPU implementation that relies on a pure runtime library, without any special compiler.[12]

Both DPC++[16] and AdaptiveCpp[17] compilers provide a backend to NVIDIA GPUs, similar to how CUDA does. This allows SYCL code to be compiled and run on NVIDIA hardware, allowing developers to leverage SYCL's high-level abstractions on CUDA-capable GPUs.[16][17]

ROCm HIP

ROCm HIP targets Nvidia GPUs, AMD GPUs, and x86 CPUs. HIP is a lower-level API that closely resembles the CUDA APIs.[18] For example, AMD released a tool called HIPIFY that can automatically translate CUDA code to HIP.[19] Therefore, many of the points in the comparison between CUDA and SYCL also apply to the comparison between HIP and SYCL.[20]

ROCm HIP has some similarities to SYCL in the sense that it can target various vendors (AMD and Nvidia) and accelerator types (GPU and CPU).[21] However, SYCL can target a broader range of accelerators and vendors. SYCL supports multiple types of accelerators simultaneously within a single application through the concept of backends. Additionally, SYCL is written in pure C++, whereas HIP, like CUDA, uses some language extensions. These extensions prevent HIP from being compiled with a standard C++ compiler.[20]

Both DPC++[16] and AdaptiveCpp[17] compilers provide backends for NVIDIA and AMD GPUs, similar to how HIP does. This enables SYCL code to be compiled and executed on hardware from these vendors, offering developers the flexibility to leverage SYCL's high-level abstractions across a diverse range of devices and platforms.[17][16]

Kokkos

SYCL has many similarities to the Kokkos programming model,[22] including the use of opaque multi-dimensional array objects (SYCL buffers and Kokkos arrays), multi-dimensional ranges for parallel execution, and reductions (added in SYCL 2020).[23] Numerous features in SYCL 2020 were added in response to feedback from the Kokkos community.

SYCL focuses more on heterogeneous systems: thanks to its integration with OpenCL, it can be adopted on a wide range of devices. Kokkos, on the other hand, targets most HPC platforms[24] and is therefore more oriented toward HPC performance.

As of 2024, the Kokkos team is developing a SYCL backend[25], which enables Kokkos to target Intel hardware in addition to the platforms it already supports. This development broadens the applicability of Kokkos and allows for greater flexibility in leveraging different hardware architectures within HPC applications.[22]

Raja

Raja[26][27] is a library of C++ software abstractions that enable architecture and programming-model portability for HPC applications.

Like SYCL, it provides portable code across heterogeneous platforms. However, unlike SYCL, Raja introduces an abstraction layer over other programming models such as CUDA, HIP, and OpenMP.[28] This allows developers to write their code once and run it on various backends without modifying the core logic. Raja is maintained and developed at Lawrence Livermore National Laboratory (LLNL), whereas SYCL is an open standard maintained by the Khronos Group.[11]

Similar to Kokkos, Raja is more tailored for HPC use cases, focusing on performance and scalability in high-performance computing environments. In contrast, SYCL supports a broader range of devices, making it more versatile for different types of applications beyond just HPC.[27]

As of 2024, the Raja team is developing a SYCL backend[29], which will enable Raja to also target Intel hardware. This development will enhance Raja's portability and flexibility, allowing it to leverage SYCL's capabilities and expand its applicability across a wider array of hardware platforms.[11]

OpenMP

OpenMP supports computational offloading to external accelerators[30], primarily targeting multi-core architectures and GPUs. SYCL, on the other hand, is oriented towards a broader range of devices due to its integration with OpenCL, which enables support for various types of hardware accelerators.[31]

OpenMP uses a pragma-based approach, where the programmer annotates the code with directives, and the compiler handles the complexity of parallel execution and memory management. This high-level abstraction makes it easier for developers to parallelize their applications without dealing with the intricate details of memory transfers and synchronization.[32]

Both OpenMP and SYCL support C++ and are standardized. OpenMP is standardized by the OpenMP Architecture Review Board (ARB), while SYCL is standardized by the Khronos Group.[11]

OpenMP has wide support from various compilers, including GCC and Clang.[33]

std::par

std::par is shorthand for the parallel execution policies introduced in the C++17 standard[34], which are designed to facilitate the parallel execution of standard algorithms on C++ standard containers. A policy such as std::execution::par can be passed to algorithms like std::for_each, std::transform, and std::reduce, enabling efficient use of multi-core processors and other parallel hardware without significant changes to the code.[35]

SYCL can be used as a backend for std::par, enabling the execution of standard algorithms on a wide range of external accelerators, including GPUs from Intel, AMD, and NVIDIA, as well as other types of accelerators.[36] By leveraging SYCL's capabilities, developers can write standard C++ code that seamlessly executes on heterogeneous computing environments. This integration allows for greater flexibility and performance optimization across different hardware platforms.[36]

The use of SYCL as a backend for std::par is compiler-dependent, meaning it requires a compiler that supports both SYCL and the parallel execution policies introduced in C++17.[36] Examples of such compilers include DPC++ and other SYCL-compliant compilers. With these compilers, developers can take advantage of SYCL's abstractions for memory management and parallel execution while still using the familiar C++ standard algorithms and execution policies.[16]

References

  1. ^ a b "Khronos to Create SYCL SC Open Standard for Safety-Critical C++ Based Heterogeneous Compute". The Khronos Group. 2023-03-15. Retrieved 2024-07-10.
  2. ^ "MISRA". Retrieved 2024-07-11.
  3. ^ "ED-12C Aviation Software Standards Training - Airborne". Eurocae. Retrieved 2024-07-11.
  4. ^ "SOTIF – practical training". www.kuglermaag.com. Retrieved 2024-07-11.
  5. ^ https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf [bare URL PDF]
  6. ^ Crisci, Luigi; Salimi Beni, Majid; Cosenza, Biagio; Scipione, Nicolò; Gadioli, Davide; Vitali, Emanuele; Palermo, Gianluca; Beccari, Andrea (2022-05-10). "Towards a Portable Drug Discovery Pipeline with SYCL 2020". Proceedings of the 10th International Workshop on OpenCL. IWOCL '22. New York, NY, USA: Association for Computing Machinery: 1–2. doi:10.1145/3529538.3529688. ISBN 978-1-4503-9658-5.
  7. ^ Solis-Vasquez, Leonardo; Mascarenhas, Edward; Koch, Andreas (2023-04-18). "Experiences Migrating CUDA to SYCL: A Molecular Docking Case Study". Proceedings of the 2023 International Workshop on OpenCL. IWOCL '23. New York, NY, USA: Association for Computing Machinery: 1–11. doi:10.1145/3585341.3585372. ISBN 979-8-4007-0745-2.
  8. ^ https://www.iwocl.org/wp-content/uploads/20-iwocl-syclcon-2021-rudkin-slides.pdf [bare URL PDF]
  9. ^ Rangel, Esteban Miguel; Pennycook, Simon John; Pope, Adrian; Frontiere, Nicholas; Ma, Zhiqiang; Madananth, Varsha (2023-11-12). "A Performance-Portable SYCL Implementation of CRK-HACC for Exascale". Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis. SC-W '23. New York, NY, USA: Association for Computing Machinery: 1114–1125. doi:10.1145/3624062.3624187. ISBN 979-8-4007-0785-8.
  10. ^ a b Breyer, Marcel; Van Craen, Alexander; Pflüger, Dirk (2022-05-10). "A Comparison of SYCL, OpenCL, CUDA, and OpenMP for Massively Parallel Support Vector Machine Classification on Multi-Vendor Hardware". Proceedings of the 10th International Workshop on OpenCL. IWOCL '22. New York, NY, USA: Association for Computing Machinery: 1–12. doi:10.1145/3529538.3529980. ISBN 978-1-4503-9658-5.
  11. ^ a b c d "SYCL - C++ Single-source Heterogeneous Programming for Acceleration Offload". The Khronos Group. 2014-01-20. Retrieved 2024-07-12.
  12. ^ a b "SYCL™ 2020 Specification (revision 8)". registry.khronos.org. Retrieved 2024-07-12.
  13. ^ oneapi-src/SYCLomatic, oneAPI-SRC, 2024-07-11, retrieved 2024-07-11
  14. ^ Chen, Jolly; Dessole, Monica; Varbanescu, Ana Lucia (2024-01-24), Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame, doi:10.48550/arXiv.2401.13310, retrieved 2024-07-12
  15. ^ "Buffer Accessor Modes". Intel. Retrieved 2024-07-11.
  16. ^ a b c d e "DPC++ Documentation — oneAPI DPC++ Compiler documentation". intel.github.io. Retrieved 2024-07-11.
  17. ^ a b c d "AdaptiveCpp/doc/sycl-ecosystem.md at develop · AdaptiveCpp/AdaptiveCpp". GitHub. Retrieved 2024-07-11.
  18. ^ ROCm/HIP, AMD ROCm™ Software, 2024-07-11, retrieved 2024-07-11
  19. ^ "HIPIFY/README.md at amd-staging · ROCm/HIPIFY". GitHub. Retrieved 2024-07-11.
  20. ^ a b Jin, Zheming; Vetter, Jeffrey S. (November 2022). "Evaluating Nonuniform Reduction in HIP and SYCL on GPUs". IEEE: 37–43. doi:10.1109/DRBSD56682.2022.00010. ISBN 978-1-6654-6337-9.
  21. ^ Reguly, Istvan Z. (2023-11-12). "Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications". Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis. SC-W '23. New York, NY, USA: Association for Computing Machinery: 1038–1047. doi:10.1145/3624062.3624180. ISBN 979-8-4007-0785-8.
  22. ^ a b Hammond, Jeff R.; Kinsner, Michael; Brodman, James (2019). "A comparative analysis of Kokkos and SYCL as heterogeneous, parallel programming models for C++ applications". Proceedings of the International Workshop on OpenCL. pp. 1–2. doi:10.1145/3318170.3318193. ISBN 9781450362306. S2CID 195777149.
  23. ^ Dufek, Amanda S.; Gayatri, Rahulkumar; Mehta, Neil; Doerfler, Douglas; Cook, Brandon; Ghadar, Yasaman; DeTar, Carleton (November 2021). "Case Study of Using Kokkos and SYCL as Performance-Portable Frameworks for Milc-Dslash Benchmark on NVIDIA, AMD and Intel GPUs". IEEE: 57–67. doi:10.1109/P3HPC54578.2021.00009. ISBN 978-1-6654-2439-4.
  24. ^ Trott, Christian R.; Lebrun-Grandié, Damien; Arndt, Daniel; Ciesko, Jan; Dang, Vinh; Ellingwood, Nathan; Gayatri, Rahulkumar; Harvey, Evan; Hollman, Daisy S. (2022), Kokkos 3: Programming Model Extensions for the Exascale Era, retrieved 2024-07-10
  25. ^ Arndt, Daniel; Lebrun-Grandie, Damien; Trott, Christian (2024-04-08). "Experiences with implementing Kokkos' SYCL backend". Proceedings of the 12th International Workshop on OpenCL and SYCL. IWOCL '24. New York, NY, USA: Association for Computing Machinery: 1–11. doi:10.1145/3648115.3648118. ISBN 979-8-4007-1790-1.
  26. ^ LLNL/RAJA, Lawrence Livermore National Laboratory, 2024-07-08, retrieved 2024-07-10
  27. ^ a b Beckingsale, David A.; Scogland, Thomas RW; Burmark, Jason; Hornung, Rich; Jones, Holger; Killian, William; Kunen, Adam J.; Pearce, Olga; Robinson, Peter; Ryujin, Brian S. (November 2019). "RAJA: Portable Performance for Large-Scale Scientific Applications". IEEE: 71–81. doi:10.1109/P3HPC49587.2019.00012. ISBN 978-1-7281-6003-0.
  28. ^ Beckingsale, David A.; Scogland, Thomas RW; Burmark, Jason; Hornung, Rich; Jones, Holger; Killian, William; Kunen, Adam J.; Pearce, Olga; Robinson, Peter; Ryujin, Brian S. (November 2019). "RAJA: Portable Performance for Large-Scale Scientific Applications". IEEE: 71–81. doi:10.1109/P3HPC49587.2019.00012. ISBN 978-1-7281-6003-0.
  29. ^ Homerding, Brian; Vargas, Arturo; Scogland, Tom; Chen, Robert; Davis, Mike; Hornung, Rich (2024-04-08). "Enabling RAJA on Intel GPUs with SYCL". Proceedings of the 12th International Workshop on OpenCL and SYCL. IWOCL '24. New York, NY, USA: Association for Computing Machinery: 1–10. doi:10.1145/3648115.3648131. ISBN 979-8-4007-1790-1.
  30. ^ tim.lewis. "Home". OpenMP. Retrieved 2024-07-10.
  31. ^ "OpenCL - The Open Standard for Parallel Programming of Heterogeneous Systems". The Khronos Group. 2013-07-21. Retrieved 2024-07-12.
  32. ^ Friedman, Richard. "Reference Guides". OpenMP. Retrieved 2024-07-12.
  33. ^ "OpenMP Compilers & Tools".
  34. ^ "std::execution::seq, std::execution::par, std::execution::par_unseq, std::execution::unseq - cppreference.com". en.cppreference.com. Retrieved 2024-07-10.
  35. ^ "Accelerating Standard C++ with GPUs Using stdpar". NVIDIA Technical Blog. 2020-08-04. Retrieved 2024-07-10.
  36. ^ a b c Alpay, Aksel; Heuveline, Vincent (2024-04-08). "AdaptiveCpp Stdpar: C++ Standard Parallelism Integrated Into a SYCL Compiler". Proceedings of the 12th International Workshop on OpenCL and SYCL. IWOCL '24. New York, NY, USA: Association for Computing Machinery: 1–12. doi:10.1145/3648115.3648117. ISBN 979-8-4007-1790-1.