'''SHMEM''' (from Cray Research's "shared memory" library<ref name="craych3"/>) is a family of parallel programming libraries providing one-sided, [[Remote direct memory access|RDMA]], parallel-processing interfaces for low-latency distributed-memory supercomputers. The SHMEM acronym was subsequently reverse-engineered to mean "Symmetric Hierarchical MEMory".<ref name="cse590"/> The library was later expanded to [[distributed memory]] parallel computer clusters, and is used as a parallel programming interface or as a low-level interface on which to build [[partitioned global address space]] (PGAS) systems and languages.<ref>{{cite web|url=http://www.bsc.es/sites/default/files/public/mare_nostrum/hpcac2012-5_mellanox.pdf|year=2012|publisher=Mellanox|accessdate=18 January 2014|quote=SHMEM is being used/proposed as a lower level interface for PGAS implementations|title=New Accelerations for Parallel Programming}}</ref> Libsma, the first SHMEM library, was created by Richard Smith at Cray Research in 1993 as a set of thin interfaces to access the CRAY T3D's inter-processor-communication hardware. SHMEM has been implemented by Cray Research, SGI, Cray Inc., Quadrics, HP, GSHMEM, IBM, QLogic, Mellanox, and the Universities of Houston and Florida; there is also the open-source OpenSHMEM.<ref name="openshmem-rma-toward">{{cite book|last=Poole|first=Stephen|title=Encyclopedia of Parallel Computing |chapter=OpenSHMEM - Toward a Unified RMA Model |year=2011|pages=1379–1391|doi=10.1007/978-0-387-09766-4_490|isbn=978-0-387-09765-7}}</ref>


SHMEM laid the foundations for low-latency (sub-microsecond) one-sided communication.<ref>[http://users.sdsc.edu/~lcarring/Papers/2012_CUG.pdf Tools for Benchmarking, Tracing, and Simulating SHMEM Applications] // CUG 2012, paper by San Diego Supercomputer center and ORNL</ref> After its use on the CRAY T3E,<ref>[https://books.google.com/books?id=eeNswgpkBW4C&pg=PA59&dq=%22SHMEM%22+popular Recent Advances in Parallel Virtual Machine and Message Passing ..., Volume 11] page 59: "One-sided communication as a programming paradigm was made popular initially by the SHMEM library on the Cray T3D and T3E..."</ref> its popularity waned as few machines could deliver the near-microsecond latencies necessary to maintain efficiency for its hallmark individual-word communication. With the advent of popular sub-microsecond interconnects, SHMEM has been used to address the necessity of hyper-efficient, portable, parallel-communication methods for exascale computing.<ref name=":0">{{Cite web|url=http://www.csm.ornl.gov/workshops/openshmem2015/agenda_technical.html|title=OpenSHMEM 2015|website=www.csm.ornl.gov|access-date=2017-04-10}}</ref>


Programs written using SHMEM are started on several computers connected by a high-performance network that the SHMEM library in use supports. Every computer runs a copy of the program ([[SPMD]]); each copy is called a PE (processing element). PEs can ask the SHMEM library to perform remote memory-access operations, such as reading ("shmem_get") or writing ("shmem_put") data. Peer-to-peer operations are one-sided, meaning that no active cooperation from the remote thread is needed to complete the action (though it can poll its local memory for changes using "shmem_wait"). Operations can act on short types such as bytes or words, or on longer datatypes such as arrays, which may be evenly strided or indexed (only some elements of the array are sent). For short datatypes, SHMEM can perform atomic operations ([[Compare-and-swap|CAS]], fetch-and-add, atomic increment, etc.) even in remote memory. There are also two different synchronization methods:<ref name="openshmem-rma-toward"/> task-control synchronization (barriers and locks) and functions to enforce memory fencing and ordering. SHMEM offers several collective operations, which must be started by all PEs, such as reductions, broadcast, and collect.
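As an illustration of the one-sided model described above, the following minimal sketch (not taken from any vendor's documentation; function names follow the modern OpenSHMEM specification, whereas older libraries spell some of them differently, e.g. "shmem_wait" rather than "shmem_long_wait_until") has PE 0 write a value into PE 1's memory and then raise a flag, while PE 1 merely polls its own local memory:

```c
#include <stdio.h>
#include <shmem.h>

/* Statically allocated globals live in the symmetric segment,
   so they have the same address on every PE. */
long flag = 0;
long data = 0;

int main(void) {
    shmem_init();
    int me = shmem_my_pe();
    if (me == 0) {
        long value = 42;
        shmem_long_put(&data, &value, 1, 1); /* one-sided write into PE 1 */
        shmem_fence();                       /* order the data put before the flag put */
        long one = 1;
        shmem_long_put(&flag, &one, 1, 1);   /* signal PE 1 that data has arrived */
    } else if (me == 1) {
        /* No receive call: PE 1 just waits for its local flag to change. */
        shmem_long_wait_until(&flag, SHMEM_CMP_NE, 0);
        printf("PE 1 received %ld\n", data);
    }
    shmem_finalize();
    return 0;
}
```

Running such a program requires an OpenSHMEM implementation and launcher (typically {{Mono|oshrun -np 2 ./a.out}}).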


Every PE has part of its memory declared as a "symmetric" segment (or shared memory area); the rest of its memory is private. Only "shared" memory can be accessed in one-sided operations from remote PEs. Programmers can use static-memory constructs or the shmalloc/shfree routines to create objects with the same symmetric address on every PE.
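A sketch of symmetric allocation (shmalloc/shfree are the classic Cray/SGI names; the OpenSHMEM specification calls them {{Mono|shmem_malloc}} and {{Mono|shmem_free}}, used here) — because the call is collective and returns the same symmetric address everywhere, a PE can pass its own local pointer as the remote address:

```c
#include <shmem.h>

int main(void) {
    shmem_init();
    /* Collective call: every PE must allocate the same size, and each
       receives a buffer at the same symmetric address. */
    long *buf = (long *)shmem_malloc(8 * sizeof(long));
    int me = shmem_my_pe();
    buf[0] = me;              /* each PE writes into its own copy */
    shmem_barrier_all();
    long remote;
    /* Read buf[0] from PE 0: the same pointer names the remote object. */
    shmem_long_get(&remote, buf, 1, 0);
    shmem_free(buf);          /* also collective */
    shmem_finalize();
    return 0;
}
```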


== Typical SHMEM functions ==
* {{Mono|start_pes(N)}}: start ''N'' processing elements (PEs)
* {{Mono|_my_pe()}}: ask SHMEM to return the PE identifier of the current thread
* {{Mono|shmem_barrier_all()}}: wait until all PEs reach the barrier, then let them continue
* {{Mono|shmem_put(target, source, length, pe)}}: write data of length "length" from the local address "source" to the remote address "target" on the PE with id "pe"
* {{Mono|shmem_get(target, source, length, pe)}}: read data of length "length" from the remote address "source" on the PE with id "pe" and save the read values into the local address "target"<ref>{{Cite web |title=man shmem_get|author=SGI TPL |url=http://docs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man3/shmem_get.3.html |archive-url=https://web.archive.org/web/20140201200101/http://docs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man3/shmem_get.3.html |archive-date=2014-02-01 |url-status=dead}}</ref>
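The calls above can be tied together in a short sketch using these legacy names (modern OpenSHMEM renames them, e.g. {{Mono|shmem_init()}}/{{Mono|shmem_my_pe()}}; on the original Cray/SGI libraries the generic {{Mono|shmem_put}} transferred 64-bit words, an assumption made in the comments below):

```c
#include <stdio.h>
#include <shmem.h>

long dst;                     /* global data resides in the symmetric segment */

int main(void) {
    start_pes(0);             /* 0: take the number of PEs from the environment */
    int me = _my_pe();
    long src = me + 100;
    shmem_barrier_all();      /* make sure every PE has initialised dst */
    if (me == 1)
        /* one word (64 bits) written into PE 0's dst; PE 0 issues no receive */
        shmem_put(&dst, &src, 1, 0);
    shmem_barrier_all();      /* the barrier also completes outstanding puts */
    if (me == 0)
        printf("PE 0: dst = %ld\n", dst);
    return 0;
}
```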


== List of SHMEM implementations ==
<!-- <ref name="openshmem-rma-toward"/> -->
* Cray Research: Original SHMEM for [[Cray T3D]], [[Cray T3E]], and Cray Research PVP supercomputers<ref name="uh2012" />
* SGI: SGI-SHMEM for systems with [[NUMAlink]] and for Altix systems built with InfiniBand network adapters
* Cray Inc.: MP-SHMEM for Unicos MP (X1E supercomputer)
* Cray Inc.: LC-SHMEM for Unicos LC (Cray XT3, XT4, XT5)
* Quadrics: Q-SHMEM<ref>[http://staff.psc.edu/oneal/compaq/ShmemMan.pdf Shmem Programming Manual] // Quadrics, 2000-2001</ref> for Linux clusters with QsNet interconnect<ref name="uh2012"/>
* Cyclops-64 SHMEM
* HP SHMEM<ref name="uh2012"/>
* IBM SHMEM<ref name="uh2012"/>
* GPSHMEM<ref name="uh2012"/>

=== OpenSHMEM implementations ===
OpenSHMEM is a standardization effort by SGI and Open Source Software Solutions, Inc.
* University of Houston: Reference OpenSHMEM<ref name="openshmem-rma-toward"/><ref name="uh2012"/>
* Mellanox ScalableSHMEM<ref name="uh2012"/>
* Portals-SHMEM (on top of [[Portals network programming api|Portals interface]])
* University of Florida: Gator SHMEM<ref name="uh2012"/>
* [[Open MPI]] includes an implementation of OpenSHMEM<ref>[https://www.open-mpi.org OpenMPI]</ref>
* Adapteva Epiphany Coprocessor<ref>James Ross and David Richie. An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor. In Proceedings of the Third Workshop on OpenSHMEM and Related Technologies, {{Cite web|url=https://www.csm.ornl.gov/workshops/openshmem2016/|title=OpenSHMEM 2016|website=www.csm.ornl.gov}}. Springer, 2016.</ref>
* Sandia OpenSHMEM (SOS) supports multiple networking APIs<ref>[https://github.com/Sandia-OpenSHMEM/SOS Sandia OpenSHMEM (SOS) on GitHub]</ref>


==Disadvantages==
In its first years, SHMEM was available only on certain Cray Research machines (and later also on SGI systems)<ref name="craych3">{{cite tech report |author=Cray Research |date=1999 |title=Cray T3E C and C++ Optimization Guide |number=004-2178-002 |pages=45{{ndash}}83 |url=http://docs.cray.com/books/004-2178-002/06chap3.pdf |archive-url=https://web.archive.org/web/20140201100405/http://docs.cray.com/books/004-2178-002/06chap3.pdf |archive-date=2014-02-01 |url-status=dead}}</ref> equipped with special networks, which limited the library's spread and amounted to [[vendor lock-in]] (for example, Cray Research recommended partially rewriting MPI programs to combine MPI and shmem calls, which made the programs non-portable to pure-MPI environments).


SHMEM was not defined as a standard,<ref name="uh2012"/><ref name="craych3"/> so several incompatible variants of SHMEM libraries were created by other vendors. The libraries had different include-file names and different management-function names for starting PEs or getting the current PE id,<ref name="uh2012"/> and some functions were changed or not supported.


Some SHMEM routines were designed around Cray T3D architecture limitations; for example, reductions and broadcasts could be started only on subsets of PEs whose size is a [[power of two]].<ref name="cse590">[http://courses.cs.washington.edu/courses/cse590o/02wi/otherapproaches.pdf Introduction to Parallel Computing - 3.11 Related Work] // cse590o course, University of Washington, Winter 2002; page 154</ref><ref name="uh2012"/>

Variants of SHMEM libraries can run on top of any MPI library, even when a cluster has only non-[[Remote direct memory access|RDMA]]-optimized Ethernet, although performance will typically be worse than with enhanced networking protocols.


Memory in the shared region must be allocated using special functions ({{Mono|shmalloc}}/{{Mono|shfree}}), not the system {{Mono|malloc}}.<ref name="uh2012"/>


SHMEM is available only for C and Fortran (some versions also for C++).<ref name="uh2012">{{cite web |title=OpenSHMEM Tutorial |last1=Nanjegowda |first1=Ram |last2=Pophale |first2=Swaroop |last3=Curtis |first3=Tony |year=2012 |publisher=University of Houston, Texas |url=http://www2.cs.uh.edu/~tonyc/pgas12/openshmem-tutorial/pgas-2012-slides.pdf |archive-url=https://web.archive.org/web/20140201160650/http://www2.cs.uh.edu/~tonyc/pgas12/openshmem-tutorial/pgas-2012-slides.pdf |archive-date=2014-02-01 |url-status=dead}}</ref>


Many disadvantages of SHMEM have been overcome with the use of OpenSHMEM on popular sub-microsecond interconnects, driven by exascale development.<ref name=":0" />


== See also ==
<!-- partially filled according http://www.csm.ornl.gov/workshops/openshmem2013/documents/ReducingSynchronizationOverheadThroughBundledCommunication.pdf "4 Related Work" by Intel -->
* [[Message Passing Interface]] (especially one-sided operations of MPI-2)
* [[ARMCI]]
* [[Active Messages]]
* [[Unified Parallel C]] (one of PGAS languages, can be implemented on top of SHMEM)
* [[OpenSHMEM]] (Open release of the SHMEM API by Open Source Software Solutions, Inc.)


== References ==
{{Reflist}}

== Further reading ==
* [https://web.archive.org/web/19991118150814/http://www.sdsc.edu/SDSCwire/v3.15/shmem_07_30_97.html Using SHMEM on the CRAY T3E]
* [http://docs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man3/intro_shmem.3.html man intro_shmem] (SGI TPL) - Introduction to the SHMEM programming model
* [http://www.openshmem.org/site/ OpenSHMEM]: an effort to create a specification for a standardized API for parallel programming in the Partitioned Global Address Space.


{{Parallel Computing}}
