Jump to content

Parallel algorithm: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
added information about developing an effective parallel algorithm for solution in "parallelizability" section and citation to book
 
(41 intermediate revisions by 31 users not shown)
Line 1: Line 1:
{{refimprove|date=November 2012}}
{{More citations needed|date=November 2012}}
In [[computer science]], a '''parallel algorithm''', as opposed to a traditional [[serial algorithm]], is an [[algorithm]] which can be executed a piece at a time on many different processing devices, and then combined together again at the end to get the correct result.<ref>{{cite paper| title=Parallel Algorithms | first1=Guy E. | last1=Blelloch | first2=Bruce M. | last2=Maggs | publisher=School of Computer Science, [[Carnegie Mellon University]] | location=USA | accessdate=7 November 2012 }}</ref>
In [[computer science]], a '''parallel algorithm''', as opposed to a traditional [[serial algorithm]], is an [[algorithm]] which can do multiple operations in a given time. It has been a tradition of [[computer science]] to describe serial algorithms in [[abstract machine]] models, often the one known as [[random-access machine]]. Similarly, many computer science researchers have used a so-called [[parallel random-access machine]] (PRAM) as a parallel abstract machine (shared-memory).<ref>{{cite web |title=Parallel Algorithms |first1=Guy E. |last1=Blelloch |first2=Bruce M. |last2=Maggs |publisher=School of Computer Science, [[Carnegie Mellon University]] |location=USA |access-date=2015-07-27 |url=https://www.cs.cmu.edu/~guyb/papers/BM04.pdf}}</ref><ref>{{cite web |title=Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques, 104 pages |first=Uzi |last=Vishkin |publisher=Class notes of courses on parallel algorithms taught since 1992 at the University of Maryland, College Park, Tel Aviv University and the Technion |date=2009 |url=http://users.umiacs.umd.edu/~vishkin/PUBLICATIONS/classnotes.pdf}}</ref>


Many parallel algorithms are executed [[concurrent computing|concurrently]] – though in general [[concurrent algorithm]]s are a distinct concept – and thus these concepts are often conflated, with which aspect of an algorithm is parallel and which is concurrent not being clearly distinguished. Further, non-parallel, non-concurrent algorithms are often referred to as "sequential algorithms", by contrast with concurrent algorithms.
Many parallel algorithms are executed [[concurrent computing|concurrently]] – though in general [[concurrent algorithm]]s are a distinct concept – and thus these concepts are often conflated, with which aspect of an algorithm is parallel and which is concurrent not being clearly distinguished. Further, non-parallel, non-concurrent algorithms are often referred to as "[[sequential algorithm]]s", by contrast with concurrent algorithms.


==Parallelizability==
==Parallelizability==
{{see also|Analysis of parallel algorithms}}
Algorithms vary significantly in how parallelizable they are, ranging from easily parallelizable to completely unparallelizable. Further, a given problem may accommodate different algorithms, which may be more or less parallelizable.
Algorithms vary significantly in how parallelizable they are, ranging from easily parallelizable to completely unparallelizable. Further, a given problem may accommodate different algorithms, which may be more or less parallelizable.


Some problems are easy to divide up into pieces in this way – these are called ''[[embarrassingly parallel problem]]s.'' For example, splitting up the job of checking all of the numbers from one to a hundred thousand to see which are [[prime number|primes]] could be done by assigning a subset of the numbers to each available processor, and then putting the list of positive results back together. Algorithms are also used for things such as rubik's cubing and for hash decryption.
Some problems are easy to divide up into pieces in this way – these are called ''[[embarrassingly parallel]] problems.'' Examples include many algorithms to solve [[Rubik's Cube]]s and find values which result in a given [[associative array|hash]].{{citation needed|date=December 2019}}


Some problems cannot be split up into parallel portions, as they require the results from a preceding step to effectively carry on with the next step – these are called '''{{visible anchor|inherently serial problem}}s'''. Examples include iterative [[Numerical analysis|numerical methods]], such as [[Newton's method]], iterative solutions to the [[three-body problem]], and most of the available algorithms to compute [[pi]] (π).
Some problems cannot be split up into parallel portions, as they require the results from a preceding step to effectively carry on with the next step – these are called '''{{visible anchor|inherently serial problem}}s'''. Examples include iterative [[numerical analysis|numerical methods]], such as [[Newton's method]], iterative solutions to the [[three-body problem]], and most of the available algorithms to compute [[pi]] (π).{{citation needed|date=August 2019}} Some sequential algorithms can be converted into parallel algorithms using [[automatic parallelization]].<ref name="MXian1997">{{cite book |author1=Megson G M|author2=Chen Xian|title=Automatic Parallelization For A Class Of Regular Computations|url=https://books.google.com/books?id=1wHtCgAAQBAJ&q=%22parallel+algorithm%22|date=4 January 1997|publisher=World Scientific|isbn=978-981-4498-41-8}}</ref>


In many cases developing an effective parallel algorithm for solution of some task requires attraction of new ideas and methods comparing to creating a sequential algorithm version. These are, for instance, practically important problems of searching a target element in data structures, evaluation of an algebraic expression, etc.<ref>{{Cite book |last=Kurgalin |first=Sergei |title=The discrete math workbook: a companion manual using Python |last2=Borzunov |first2=Sergei |date=2020 |publisher=Springer Naturel |isbn=978-3-030-42220-2 |edition=2nd |series=Texts in Computer Science |location=Cham, Switzerland}}</ref>
Computing prime numbers is an interesting example of a problem where different algorithms vary significantly in parallelizability. The [[sieve of Eratosthenes]] is inherently serial – though highly efficient for small numbers – as it uses the ''k''th prime number as input to its ''k'' step, which produces the ''k''+1st prime; while other algorithms are embarrassingly parallel, as they operate on a single number without needing to know all primes up to that point.


==Motivation==
==Motivation==
Parallel algorithms on individual devices have become more common since the early 2000s because of substantial improvements in [[multiprocessing]] systems and the rise of [[multi-core]] processors. Up until the end of 2004, single-core processor performance rapidly increased via [[frequency scaling]], andhus it was easier to construct a computer with a single fast core than one with many slower cores with the same [[throughput]], so multicore systems were of more limited use. Since 2004 however, frequency scaling hit a wall, and thus multicore systems have become more widespread, making parallel algorithms of more general use.
Parallel algorithms on individual devices have become more common since the early 2000s because of substantial improvements in [[multiprocessing]] systems and the rise of [[multi-core]] processors. Up until the end of 2004, single-core processor performance rapidly increased via [[frequency scaling]], and thus it was easier to construct a computer with a single fast core than one with many slower cores with the same [[throughput]], so multicore systems were of more limited use. Since 2004 however, frequency scaling hit a wall, and thus multicore systems have become more widespread, making parallel algorithms of more general use.


==Issues==
==Issues==

===Communication===
===Communication===
The cost or complexity of serial algorithms is estimated in terms of the space (memory) and time (processor cycles) that they take. Parallel algorithms need to optimize one more resource, the communication between different processors. There are two ways parallel processors communicate, shared memory or message passing.
The cost or complexity of serial algorithms is estimated in terms of the space (memory) and time (processor cycles) that they take. Parallel algorithms need to optimize one more resource, the communication between different processors. There are two ways parallel processors communicate, shared memory or message passing.


[[Shared memory]] processing needs additional [[Lock (computer science)|locking]] for the data, imposes the overhead of additional processor and bus cycles, and also serializes some portion of the algorithm.
[[Shared memory]] processing needs additional [[lock (computer science)|locking]] for the data, imposes the overhead of additional processor and bus cycles, and also serializes some portion of the algorithm.


[[Message passing]] processing uses channels and message boxes but this communication adds transfer overhead on the bus, additional memory need for queues and message boxes and latency in the messages. Designs of parallel processors use special buses like crossbar so that the communication overhead will be small but it is the parallel algorithm that decides the volume of the traffic.
[[Message passing]] processing uses channels and message boxes but this communication adds transfer overhead on the bus, additional memory need for queues and message boxes and latency in the messages. Designs of parallel processors use special [[bus (computing)|buses]] like [[crossbar switch|crossbar]] so that the communication overhead will be small but it is the parallel algorithm that decides the volume of the traffic.


If the communication overhead of additional processors outweighs the benefit of adding another processor, one encounters [[parallel slowdown]].
If the communication overhead of additional processors outweighs the benefit of adding another processor, one encounters [[parallel slowdown]].
Line 28: Line 30:
===Load balancing===
===Load balancing===
{{main|Load balancing (computing)}}
{{main|Load balancing (computing)}}
Another problem with parallel algorithms is ensuring that they are suitably [[Load balancing (computing)|load balanced]], by ensuring that ''load'' (overall work) is balanced, rather than input size being balanced. For example, checking all numbers from one to a hundred thousand for primality is easy to split amongst processors; however, if the numbers are simply divided out evenly (1–1,000, 1,001–2,000, etc.), the amount of work will be unbalanced, as smaller numbers are easier to process by this algorithm (easier to test for primality), and thus some processors will get more work to do than the others, which will sit idle until the loaded processors complete.
Another problem with parallel algorithms is ensuring that they are suitably [[load balancing (computing)|load balanced]], by ensuring that ''load'' (overall work) is balanced, rather than input size being balanced. For example, checking all numbers from one to a hundred thousand for primality is easy to split among processors; however, if the numbers are simply divided out evenly (1–1,000, 1,001–2,000, etc.), the amount of work will be unbalanced, as smaller numbers are easier to process by this algorithm (easier to test for primality), and thus some processors will get more work to do than the others, which will sit idle until the loaded processors complete.


==Distributed algorithms==
==Distributed algorithms==
{{main|Distributed algorithm}}
{{main|Distributed algorithm}}
{{section-stub|date=February 2014}}
{{expand section|date=February 2014}}
A subtype of parallel algorithms, ''[[distributed algorithm]]s'' are algorithms designed to work in [[cluster computing]] and [[distributed computing]] environments, where additional concerns beyond the scope of "classical" parallel algorithms need to be addressed.
A subtype of parallel algorithms, ''[[distributed algorithm]]s'', are algorithms designed to work in [[cluster computing]] and [[distributed computing]] environments, where additional concerns beyond the scope of "classical" parallel algorithms need to be addressed.


==See also==
==See also==
* [[Multiple-agent system]] (MAS)
* [[Multiple-agent system]] (MAS)
* [[Parallel algorithms for matrix multiplication]]
* [[Neural network]]
* [[Parallel algorithms for minimum spanning trees]]
* [[Parallel computing]]
* [[Parallel computing]]
* [[Parareal]]


==References==
==References==
{{reflist}}
{{Reflist}}


==External links==
==External links==
*[http://www-unix.mcs.anl.gov/dbpp/ Designing and Building Parallel Programs page at the US Argonne National Laboratories]
*[https://www.mcs.anl.gov/~itf/dbpp/ Designing and Building Parallel Programs], US Argonne National Laboratory

{{Parallel computing}}
{{Authority control}}


[[Category:Parallel computing]]
[[Category:Parallel computing]]

Latest revision as of 17:16, 25 July 2024

In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm which can do multiple operations in a given time. It has been a tradition of computer science to describe serial algorithms in abstract machine models, often the one known as random-access machine. Similarly, many computer science researchers have used a so-called parallel random-access machine (PRAM) as a parallel abstract machine (shared-memory).[1][2]

Many parallel algorithms are executed concurrently – though in general concurrent algorithms are a distinct concept – and thus these concepts are often conflated, with which aspect of an algorithm is parallel and which is concurrent not being clearly distinguished. Further, non-parallel, non-concurrent algorithms are often referred to as "sequential algorithms", by contrast with concurrent algorithms.

Parallelizability

[edit]

Algorithms vary significantly in how parallelizable they are, ranging from easily parallelizable to completely unparallelizable. Further, a given problem may accommodate different algorithms, which may be more or less parallelizable.

Some problems are easy to divide up into pieces in this way – these are called embarrassingly parallel problems. Examples include many algorithms to solve Rubik's Cubes and find values which result in a given hash.[citation needed]

Some problems cannot be split up into parallel portions, as they require the results from a preceding step to effectively carry on with the next step – these are called inherently serial problems. Examples include iterative numerical methods, such as Newton's method, iterative solutions to the three-body problem, and most of the available algorithms to compute pi (π).[citation needed] Some sequential algorithms can be converted into parallel algorithms using automatic parallelization.[3]

In many cases developing an effective parallel algorithm for solution of some task requires attraction of new ideas and methods comparing to creating a sequential algorithm version. These are, for instance, practically important problems of searching a target element in data structures, evaluation of an algebraic expression, etc.[4]

Motivation

[edit]

Parallel algorithms on individual devices have become more common since the early 2000s because of substantial improvements in multiprocessing systems and the rise of multi-core processors. Up until the end of 2004, single-core processor performance rapidly increased via frequency scaling, and thus it was easier to construct a computer with a single fast core than one with many slower cores with the same throughput, so multicore systems were of more limited use. Since 2004 however, frequency scaling hit a wall, and thus multicore systems have become more widespread, making parallel algorithms of more general use.

Issues

[edit]

Communication

[edit]

The cost or complexity of serial algorithms is estimated in terms of the space (memory) and time (processor cycles) that they take. Parallel algorithms need to optimize one more resource, the communication between different processors. There are two ways parallel processors communicate, shared memory or message passing.

Shared memory processing needs additional locking for the data, imposes the overhead of additional processor and bus cycles, and also serializes some portion of the algorithm.

Message passing processing uses channels and message boxes but this communication adds transfer overhead on the bus, additional memory need for queues and message boxes and latency in the messages. Designs of parallel processors use special buses like crossbar so that the communication overhead will be small but it is the parallel algorithm that decides the volume of the traffic.

If the communication overhead of additional processors outweighs the benefit of adding another processor, one encounters parallel slowdown.

Load balancing

[edit]

Another problem with parallel algorithms is ensuring that they are suitably load balanced, by ensuring that load (overall work) is balanced, rather than input size being balanced. For example, checking all numbers from one to a hundred thousand for primality is easy to split among processors; however, if the numbers are simply divided out evenly (1–1,000, 1,001–2,000, etc.), the amount of work will be unbalanced, as smaller numbers are easier to process by this algorithm (easier to test for primality), and thus some processors will get more work to do than the others, which will sit idle until the loaded processors complete.

Distributed algorithms

[edit]

A subtype of parallel algorithms, distributed algorithms, are algorithms designed to work in cluster computing and distributed computing environments, where additional concerns beyond the scope of "classical" parallel algorithms need to be addressed.

See also

[edit]

References

[edit]
  1. ^ Blelloch, Guy E.; Maggs, Bruce M. "Parallel Algorithms" (PDF). USA: School of Computer Science, Carnegie Mellon University. Retrieved 2015-07-27.
  2. ^ Vishkin, Uzi (2009). "Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques, 104 pages" (PDF). Class notes of courses on parallel algorithms taught since 1992 at the University of Maryland, College Park, Tel Aviv University and the Technion.
  3. ^ Megson G M; Chen Xian (4 January 1997). Automatic Parallelization For A Class Of Regular Computations. World Scientific. ISBN 978-981-4498-41-8.
  4. ^ Kurgalin, Sergei; Borzunov, Sergei (2020). The discrete math workbook: a companion manual using Python. Texts in Computer Science (2nd ed.). Cham, Switzerland: Springer Naturel. ISBN 978-3-030-42220-2.
[edit]