Two-phase commit protocol: Difference between revisions
Tag: section blanking |
m Added sequence diagrams for more scenarios |
||
(41 intermediate revisions by 28 users not shown) | |||
Line 1: | Line 1: | ||
{{Short description|Computer science transaction algorithm}} |
|||
{{Redirect|2PC|the play in American and Canadian football|Two-point conversion|the cryptographic protocol|Commitment scheme}} |
|||
{{Redirect|2PC|the play in American and Canadian football|Two-point conversion|the cryptographic protocol|Commitment scheme|the concurrency control|Two-phase locking}} |
|||
[[File:Two phase commit seq diagram success 01.png|thumb|The sequence diagram showing the success path of Two Phase Commit protocol created with [https://fizzbee.io FizzBee]]] |
|||
In [[transaction processing]], [[database]]s, and [[computer networking]], the '''two-phase commit protocol''' ('''2PC''') is a type of [[Atomic commit|atomic commitment protocol]] (ACP). It is a [[distributed algorithm]] that coordinates all the processes that participate in a [[Distributed transaction|distributed atomic transaction]] on whether to ''[[Commit (data management)|commit]]'' or ''abort'' (''roll back'') the transaction (it is a specialized type of [[Consensus (computer science)|consensus]] protocol). The protocol achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used.<ref name="bernstein1987">[[Phil Bernstein|Philip A. Bernstein]], Vassos Hadzilacos, Nathan Goodman (1987): [http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx ''Concurrency Control and Recovery in Database Systems''], Chapter 7, Addison Wesley Publishing Company, {{ISBN|0-201-10715-5}}</ref><ref name="weikum2001">[[Gerhard Weikum]], Gottfried Vossen (2001): [http://www.elsevier.com/wps/find/bookdescription.cws_home/677937/description#description ''Transactional Information Systems''], Chapter 19, Elsevier, {{ISBN|1-55860-508-8}}</ref><ref name=Bern2009>Philip A. Bernstein, Eric Newcomer (2009): [http://www.elsevierdirect.com/product.jsp?isbn=9781558606234 ''Principles of Transaction Processing'', 2nd Edition], Chapter 8, Morgan Kaufmann (Elsevier), {{ISBN|978-1-55860-623-4}}</ref> |
|||
In [[transaction processing]], [[database]]s, and [[computer networking]], the '''two-phase commit protocol''' ('''2PC''', ''tupac'') is a type of [[Atomic commit|atomic commitment protocol]] (ACP). It is a [[distributed algorithm]] that coordinates all the processes that participate in a [[Distributed transaction|distributed atomic transaction]] on whether to [[Commit (data management)|commit]] or abort (roll back) the transaction. This protocol (a specialised type of [[Consensus (computer science)|consensus]] protocol) achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used.<ref name="bernstein1987">[[Phil Bernstein|Philip A. Bernstein]], Vassos Hadzilacos, Nathan Goodman (1987): [http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx ''Concurrency Control and Recovery in Database Systems''], Chapter 7, Addison Wesley Publishing Company, {{ISBN|0-201-10715-5}}</ref><ref name="weikum2001">[[Gerhard Weikum]], Gottfried Vossen (2001): [http://www.elsevier.com/wps/find/bookdescription.cws_home/677937/description#description ''Transactional Information Systems''], Chapter 19, Elsevier, {{ISBN|1-55860-508-8}}</ref><ref name=Bern2009>Philip A. Bernstein, Eric Newcomer (2009): [http://www.elsevierdirect.com/product.jsp?isbn=9781558606234 ''Principles of Transaction Processing'', 2nd Edition] {{Webarchive|url=https://web.archive.org/web/20100807151625/http://www.elsevierdirect.com/product.jsp?isbn=9781558606234 |date=2010-08-07 }}, Chapter 8, Morgan Kaufmann (Elsevier), {{ISBN|978-1-55860-623-4}}</ref> |
|||
However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases) the protocol's participants use [[Server log|logging]] of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's [[recovery procedure]]s. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol. |
However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases) the protocol's participants use [[Server log|logging]] of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's [[recovery procedure]]s. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol. |
||
In a "normal execution" of any single [[distributed transaction]] (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases: |
In a "normal execution" of any single [[distributed transaction]] (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases: |
||
#The |
#The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all the transaction's participating processes (named participants, cohorts, or workers) to take the necessary steps for either committing or aborting the transaction and to vote, either "Yes": commit (if the transaction participant's local portion execution has ended properly), or "No": abort (if a problem has been detected with the local portion), and |
||
#The |
#The commit phase, in which, based on voting of the participants, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies the result to all the participants. The participants then follow with the needed actions (commit or abort) with their local transactional resources (also called recoverable resources; e.g., database data) and their respective portions in the transaction's other output (if applicable). |
||
The two-phase commit (2PC) protocol should not be confused with the [[two-phase locking]] (2PL) protocol, a [[concurrency control]] protocol. |
|||
==Assumptions== |
==Assumptions== |
||
The protocol works in the following manner: one node is a designated |
The protocol works in the following manner: one node is a designated coordinator, which is the master site, and the rest of the nodes in the network are designated the participants. The protocol assumes that: |
||
# there is [[stable storage]] at each node with a [[Write ahead logging|write-ahead log]], |
|||
# no node crashes forever, |
|||
# the data in the write-ahead log is never lost or corrupted in a crash, and |
|||
# any two nodes can communicate with each other. |
|||
The last assumption is not too restrictive, as network communication can typically be rerouted. The first two assumptions are much stronger; if a node is totally destroyed then data can be lost. |
|||
The protocol is initiated by the coordinator after the last step of the transaction has been reached. The participants then respond with an '''agreement''' message or an '''abort''' message depending on whether the transaction has been processed successfully at the participant. |
|||
The protocol is initiated by the coordinator after the last step of the transaction has been reached. The participants then respond with an agreement message or an abort message depending on whether the transaction has been processed successfully at the participant. |
|||
==Basic algorithm== |
==Basic algorithm== |
||
Line 19: | Line 27: | ||
===Commit request (or voting) phase=== |
===Commit request (or voting) phase=== |
||
#The coordinator sends a |
#The coordinator sends a query to commit message to all participants and waits until it has received a reply from all participants. |
||
#The participants execute the transaction up to the point where they will be asked to commit. They each write an entry to their |
#The participants execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their [[redo log]]. |
||
#Each participant replies with an |
#Each participant replies with an agreement message (participant votes Yes to commit), if the participant's actions succeeded, or an abort message (participant votes No to commit), if the participant experiences a failure that will make it impossible to commit. |
||
===Commit (or completion) phase=== |
===Commit (or completion) phase=== |
||
====Success==== |
====Success==== |
||
If the coordinator received an |
If the coordinator received an agreement message from all participants during the commit-request phase: |
||
#The coordinator sends a |
#The coordinator sends a commit message to all the participants. |
||
#Each participant completes the operation, and releases all the locks and resources held during the transaction. |
#Each participant completes the operation, and releases all the locks and resources held during the transaction. |
||
#Each participant sends an |
#Each participant sends an acknowledgement to the coordinator. |
||
#The coordinator completes the transaction when all |
#The coordinator completes the transaction when all acknowledgements have been received. |
||
====Failure==== |
|||
If any participant votes No during the commit-request phase (or the coordinator's timeout expires): |
|||
#The coordinator sends a rollback message to all the participants. |
|||
#Each participant undoes the transaction using the undo log, and releases the resources and locks held during the transaction. |
|||
#Each participant sends an acknowledgement to the coordinator. |
|||
#The coordinator undoes the transaction when all acknowledgements have been received. |
|||
====Message flow==== |
====Message flow==== |
||
<pre> |
<pre> |
||
Coordinator Participant |
Coordinator Participant |
||
QUERY TO COMMIT |
|||
--------------------------------> |
--------------------------------> |
||
VOTE YES/NO prepare*/abort* |
|||
<------------------------------- |
<------------------------------- |
||
commit*/abort* |
commit*/abort* COMMIT/ROLLBACK |
||
--------------------------------> |
--------------------------------> |
||
ACKNOWLEDGEMENT commit*/abort* |
|||
<-------------------------------- |
<-------------------------------- |
||
end |
end |
||
</pre> |
</pre> |
||
An * next to the record type means that the record is forced to stable storage.<ref name="mohan1986">[[C. Mohan]], Bruce Lindsay and R. Obermarck (1986): [http://dl.acm.org/citation.cfm?id=7266 "Transaction management in the R* distributed database management system"],''ACM Transactions on Database Systems (TODS)'', Volume 11 Issue 4, Dec. 1986, Pages 378 - 396</ref> |
An * next to the record type means that the record is forced to stable storage.<ref name="mohan1986">[[C. Mohan]], Bruce Lindsay and R. Obermarck (1986): [http://dl.acm.org/citation.cfm?id=7266 "Transaction management in the R* distributed database management system"],''ACM Transactions on Database Systems (TODS)'', Volume 11 Issue 4, Dec. 1986, Pages 378 - 396</ref> |
||
==Formal specification== |
|||
Two phase commit is a classic example widely used for an introductory example of various formal methods systems. |
|||
Formal specification can be model checked to validate the system for safety and liveness properties. Unlike testing, formal verification tools employ techniques like theorem proving and model checking to ensure the correctness in all possible scenarios. |
|||
Here is a specification for Two Phase Commit protocol in FizzBee language. The following specification specifies only a safety property specified as `always assertion` - there is never a case where one participant committed but another participant aborted |
|||
<syntaxhighlight lang="python"> |
|||
--- |
|||
deadlock_detection: false |
|||
--- |
|||
NUM_PARTICIPANTS = 2 |
|||
role Coordinator: |
|||
action Init: |
|||
self.prepared = set() |
|||
self.state = "init" |
|||
action Write: |
|||
if self.state != "init": |
|||
return |
|||
self.state = "working" |
|||
# Phase 1 |
|||
for p in participants: |
|||
vote = p.Prepare() |
|||
if vote == 'aborted': |
|||
# One participant aborted, start Phase 2: Failure case |
|||
self.Abort() |
|||
return |
|||
self.prepared.add(p.ID) |
|||
# Phase 2: Every participant responded success |
|||
self.Commit() |
|||
func Abort(): |
|||
self.state = "aborted" |
|||
for p in participants: |
|||
p.Abort() |
|||
func Commit(): |
|||
if self.state == 'working' and len(self.prepared) == len(participants): |
|||
self.state = 'committed' |
|||
for p in participants: |
|||
p.Commit() |
|||
role Participant: |
|||
action Init: |
|||
self.state = "working" |
|||
func Prepare(): |
|||
if self.state != 'working': |
|||
return self.state |
|||
oneof: |
|||
self.state = 'prepared' |
|||
self.state = 'aborted' |
|||
return self.state |
|||
func Commit(): |
|||
self.state = 'committed' |
|||
func Abort(): |
|||
self.state = 'aborted' |
|||
always assertion ParticipantsConsistent: |
|||
for p1 in participants: |
|||
for p2 in participants: |
|||
if p1.state == 'committed' and p2.state == 'aborted': |
|||
return False |
|||
return True |
|||
action Init: |
|||
participants = [] |
|||
for i in range(NUM_PARTICIPANTS): |
|||
p = Participant(ID=i) |
|||
participants.append(p) |
|||
coordinator = Coordinator() |
|||
</syntaxhighlight> |
|||
This formal specification can be verified using [https://fizzbee.io/play FizzBee online playground] |
|||
'''Liveness Properties:''' |
|||
One liveness property typically expected is, every transaction once initiated must be completed - either committed or aborted. |
|||
===Sequence diagrams=== |
|||
====Success==== |
|||
[[File:Two phase commit seq diagram success 01.png|thumb|none|The sequence diagram showing the success path of Two Phase Commit protocol created with [https://fizzbee.io FizzBee]]] |
|||
====Failure==== |
|||
[[File:Two phase commit seq diagram abort.png|thumb|none|The sequence diagram showing the abort path of Two Phase Commit protocol created with [[FizzBee]]]] |
|||
Timeout |
|||
[[File:Two phase commit seq diagram - msg delay and timeout 01.png|thumb|none|Sequence diagram showing the Two Phase Commit Protocol showing timeout handler in action]] |
|||
[[File:Two phase commit seq diagram - msg delay and timeout 04.png|thumb|none|Sequence diagram showing the Two Phase Commit Protocol showing timeout handler in action]] |
|||
Error recovery |
|||
[[File:Two phase commit seq diagram - coordinator crash recovery.png|thumb|none|Sequence diagram showing the Two Phase Commit Protocol how a coordinator crashed and recovered]] |
|||
==Disadvantages== |
==Disadvantages== |
||
The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some participants will never resolve their transactions: After a participant has sent an |
# The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. |
||
# If the coordinator fails permanently, some participants will never resolve their transactions: After a participant has sent an agreement message as a response to the commit-request message from the coordinator, it will block until a commit or rollback is received. |
|||
# A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond. |
|||
==Implementing the two-phase commit protocol== |
==Implementing the two-phase commit protocol== |
||
===Common architecture=== |
===Common architecture=== |
||
In many cases the 2PC protocol is distributed in a computer network. It is easily distributed by implementing multiple dedicated 2PC components similar to each other, typically named |
In many cases the 2PC protocol is distributed in a computer network. It is easily distributed by implementing multiple dedicated 2PC components similar to each other, typically named [[transaction manager]]s (TMs; also referred to as 2PC agents or Transaction Processing Monitors), that carry out the protocol's execution for each transaction (e.g., [[The Open Group]]'s [[X/Open XA]]). The databases involved with a distributed transaction, the participants, both the coordinator and participants, register to close TMs (typically residing on respective same network nodes as the participants) for terminating that transaction using 2PC. Each distributed transaction has an ad hoc set of TMs, the TMs to which the transaction participants register. A leader, the coordinator TM, exists for each transaction to coordinate 2PC for it, typically the TM of the coordinator database. However, the coordinator role can be transferred to another TM for performance or reliability reasons. Rather than exchanging 2PC messages among themselves, the participants exchange the messages with their respective TMs. The relevant TMs communicate among themselves to execute the 2PC protocol schema above, "representing" the respective participants, for terminating that transaction. With this architecture the protocol is fully distributed (does not need any central processing component or data structure), and scales up with number of network nodes (network size) effectively. |
||
This common architecture is also effective for the distribution of other [[Atomic commit|atomic commitment protocol]]s besides 2PC, since all such protocols use the same voting mechanism and outcome propagation to protocol participants.<ref name="bernstein1987" /><ref name="weikum2001" /> |
This common architecture is also effective for the distribution of other [[Atomic commit|atomic commitment protocol]]s besides 2PC, since all such protocols use the same voting mechanism and outcome propagation to protocol participants.<ref name="bernstein1987" /><ref name="weikum2001" /> |
||
===Protocol optimizations=== |
===Protocol optimizations=== |
||
[[Database]] research has been done on ways to get most of the benefits of the two-phase commit protocol while reducing costs by |
[[Database]] research has been done on ways to get most of the benefits of the two-phase commit protocol while reducing costs by protocol optimizations<ref name="bernstein1987" /><ref name="weikum2001" /><ref name="Bern2009" /> and protocol operations saving under certain system's behavior assumptions. |
||
====Presumed |
====Presumed abort and presumed commit==== |
||
Presumed abort or Presumed commit are common such optimizations.<ref name="weikum2001" /><ref name=Bern2009/><ref name="mohan1983">[[C. Mohan]], Bruce Lindsay (1985): [http://portal.acm.org/citation.cfm?id=850772 "Efficient commit protocols for the tree of processes model of distributed transactions"],''ACM SIGOPS Operating Systems Review'', |
|||
19(2),pp. 40-52 (April 1985)</ref> An assumption about the outcome of transactions, either commit, or abort, can save both messages and logging operations by the participants during the 2PC protocol's execution. For example, when presumed abort, if during system recovery from failure no logged evidence for commit of some transaction is found by the recovery procedure, then it assumes that the transaction has been aborted, and acts accordingly. This means that it does not matter if aborts are logged at all, and such logging can be saved under this assumption. Typically a penalty of additional operations is paid during recovery from failure, depending on optimization type. Thus the best variant of optimization, if any, is chosen according to failure and transaction outcome statistics. |
19(2),pp. 40-52 (April 1985)</ref> An assumption about the outcome of transactions, either commit, or abort, can save both messages and logging operations by the participants during the 2PC protocol's execution. For example, when presumed abort, if during system recovery from failure no logged evidence for commit of some transaction is found by the recovery procedure, then it assumes that the transaction has been aborted, and acts accordingly. This means that it does not matter if aborts are logged at all, and such logging can be saved under this assumption. Typically a penalty of additional operations is paid during recovery from failure, depending on optimization type. Thus the best variant of optimization, if any, is chosen according to failure and transaction outcome statistics. |
||
====Tree two-phase commit protocol==== |
====Tree two-phase commit protocol==== |
||
The |
The [[Tree (data structure)|Tree]] 2PC protocol<ref name="weikum2001" /> (also called Nested 2PC, or Recursive 2PC) is a common variant of 2PC in a [[computer network]], which better utilizes the underlying communication infrastructure. The participants in a distributed transaction are typically invoked in an order which defines a tree structure, the invocation tree, where the participants are the nodes and the edges are the invocations (communication links). The same tree is commonly utilized to complete the transaction by a 2PC protocol, but also another communication tree can be utilized for this, in principle. In a tree 2PC the coordinator is considered the root ("top") of a communication tree (inverted tree), while the participants are the other nodes. The coordinator can be the node that originated the transaction (invoked recursively (transitively) the other participants), but also another node in the same tree can take the coordinator role instead. 2PC messages from the coordinator are propagated "down" the tree, while messages to the coordinator are "collected" by a participant from all the participants below it, before it sends the appropriate message "up" the tree (except an abort message, which is propagated "up" immediately upon receiving it or if the current participant initiates the abort). |
||
The |
The Dynamic two-phase commit (Dynamic two-phase commitment, D2PC) protocol<ref name="weikum2001" /><ref name="raz1995">[[Yoav Raz]] (1995): [[doi:10.1007/3-540-58907-4_14|"The Dynamic Two Phase Commitment (D2PC) protocol "]],''Database Theory — ICDT '95'', ''Lecture Notes in Computer Science'', Volume 893/1995, pp. 162-176, Springer, {{ISBN|978-3-540-58907-5}}</ref> is a variant of Tree 2PC with no predetermined coordinator. It subsumes several optimizations that have been proposed earlier. Agreement messages (Yes votes) start to propagate from all the leaves, each leaf when completing its tasks on behalf of the transaction (becoming ready). An intermediate (non leaf) node sends ready when an agreement message to the last (single) neighboring node from which agreement message has not yet been received. The coordinator is determined dynamically by racing agreement messages over the transaction tree, at the place where they collide. They collide either at a transaction tree node, to be the coordinator, or on a tree edge. In the latter case one of the two edge's nodes is elected as a coordinator (any node). D2PC is time optimal (among all the instances of a specific transaction tree, and any specific Tree 2PC protocol implementation; all instances have the same tree; each instance has a different node as coordinator): By choosing an optimal coordinator D2PC commits both the coordinator and each participant in minimum possible time, allowing the earliest possible release of locked resources in each transaction participant (tree node). |
||
==See also== |
==See also== |
||
*[[Atomic commit]] |
|||
*[[Commit (data management)]] |
|||
*[[Three-phase commit protocol]] |
*[[Three-phase commit protocol]] |
||
*[[X/Open XA|XA]] |
|||
*[[Paxos algorithm]] |
*[[Paxos algorithm]] |
||
*[[Raft_(algorithm)|Raft algorithm]] |
|||
*[[Two Generals' Problem]] |
*[[Two Generals' Problem]] |
||
==References== |
==References== |
||
{{Reflist}} |
{{Reflist}} |
||
==External links== |
|||
*[http://exploredatabase.blogspot.in/2014/07/two-phase-commit-protocol-in-pictures.html Two Phase Commit protocol explained in Pictures] by exploreDatabase |
|||
{{DEFAULTSORT:Two-Phase Commit Protocol}} |
{{DEFAULTSORT:Two-Phase Commit Protocol}} |
Latest revision as of 09:21, 25 November 2024
In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC, tupac) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (roll back) the transaction. This protocol (a specialised type of consensus protocol) achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used.[1][2][3] However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases) the protocol's participants use logging of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol.
In a "normal execution" of any single distributed transaction (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases:
- The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all the transaction's participating processes (named participants, cohorts, or workers) to take the necessary steps for either committing or aborting the transaction and to vote, either "Yes": commit (if the transaction participant's local portion execution has ended properly), or "No": abort (if a problem has been detected with the local portion), and
- The commit phase, in which, based on voting of the participants, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies the result to all the participants. The participants then follow with the needed actions (commit or abort) with their local transactional resources (also called recoverable resources; e.g., database data) and their respective portions in the transaction's other output (if applicable).
The two-phase commit (2PC) protocol should not be confused with the two-phase locking (2PL) protocol, a concurrency control protocol.
Assumptions
[edit]The protocol works in the following manner: one node is a designated coordinator, which is the master site, and the rest of the nodes in the network are designated the participants. The protocol assumes that:
- there is stable storage at each node with a write-ahead log,
- no node crashes forever,
- the data in the write-ahead log is never lost or corrupted in a crash, and
- any two nodes can communicate with each other.
The last assumption is not too restrictive, as network communication can typically be rerouted. The first two assumptions are much stronger; if a node is totally destroyed then data can be lost.
The protocol is initiated by the coordinator after the last step of the transaction has been reached. The participants then respond with an agreement message or an abort message depending on whether the transaction has been processed successfully at the participant.
Basic algorithm
[edit]Commit request (or voting) phase
[edit]- The coordinator sends a query to commit message to all participants and waits until it has received a reply from all participants.
- The participants execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log.
- Each participant replies with an agreement message (participant votes Yes to commit), if the participant's actions succeeded, or an abort message (participant votes No to commit), if the participant experiences a failure that will make it impossible to commit.
Commit (or completion) phase
[edit]Success
[edit]If the coordinator received an agreement message from all participants during the commit-request phase:
- The coordinator sends a commit message to all the participants.
- Each participant completes the operation, and releases all the locks and resources held during the transaction.
- Each participant sends an acknowledgement to the coordinator.
- The coordinator completes the transaction when all acknowledgements have been received.
Failure
[edit]If any participant votes No during the commit-request phase (or the coordinator's timeout expires):
- The coordinator sends a rollback message to all the participants.
- Each participant undoes the transaction using the undo log, and releases the resources and locks held during the transaction.
- Each participant sends an acknowledgement to the coordinator.
- The coordinator undoes the transaction when all acknowledgements have been received.
Message flow
[edit]Coordinator Participant QUERY TO COMMIT --------------------------------> VOTE YES/NO prepare*/abort* <------------------------------- commit*/abort* COMMIT/ROLLBACK --------------------------------> ACKNOWLEDGEMENT commit*/abort* <-------------------------------- end
An * next to the record type means that the record is forced to stable storage.[4]
Formal specification
[edit]Two phase commit is a classic example widely used for an introductory example of various formal methods systems.
Formal specification can be model checked to validate the system for safety and liveness properties. Unlike testing, formal verification tools employ techniques like theorem proving and model checking to ensure the correctness in all possible scenarios.
Here is a specification for Two Phase Commit protocol in FizzBee language. The following specification specifies only a safety property specified as `always assertion` - there is never a case where one participant committed but another participant aborted
---
deadlock_detection: false
---
NUM_PARTICIPANTS = 2
role Coordinator:
action Init:
self.prepared = set()
self.state = "init"
action Write:
if self.state != "init":
return
self.state = "working"
# Phase 1
for p in participants:
vote = p.Prepare()
if vote == 'aborted':
# One participant aborted, start Phase 2: Failure case
self.Abort()
return
self.prepared.add(p.ID)
# Phase 2: Every participant responded success
self.Commit()
func Abort():
self.state = "aborted"
for p in participants:
p.Abort()
func Commit():
if self.state == 'working' and len(self.prepared) == len(participants):
self.state = 'committed'
for p in participants:
p.Commit()
role Participant:
action Init:
self.state = "working"
func Prepare():
if self.state != 'working':
return self.state
oneof:
self.state = 'prepared'
self.state = 'aborted'
return self.state
func Commit():
self.state = 'committed'
func Abort():
self.state = 'aborted'
always assertion ParticipantsConsistent:
for p1 in participants:
for p2 in participants:
if p1.state == 'committed' and p2.state == 'aborted':
return False
return True
action Init:
participants = []
for i in range(NUM_PARTICIPANTS):
p = Participant(ID=i)
participants.append(p)
coordinator = Coordinator()
This formal specification can be verified using FizzBee online playground
Liveness Properties:
One liveness property typically expected is, every transaction once initiated must be completed - either committed or aborted.
Sequence diagrams
[edit]Success
[edit]Failure
[edit]Timeout
Error recovery
Disadvantages
[edit]- The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol.
- If the coordinator fails permanently, some participants will never resolve their transactions: After a participant has sent an agreement message as a response to the commit-request message from the coordinator, it will block until a commit or rollback is received.
- A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond.
Implementing the two-phase commit protocol
[edit]Common architecture
[edit]In many cases the 2PC protocol is distributed in a computer network. It is easily distributed by implementing multiple dedicated 2PC components similar to each other, typically named transaction managers (TMs; also referred to as 2PC agents or Transaction Processing Monitors), that carry out the protocol's execution for each transaction (e.g., The Open Group's X/Open XA). The databases involved with a distributed transaction, the participants, both the coordinator and participants, register to close TMs (typically residing on respective same network nodes as the participants) for terminating that transaction using 2PC. Each distributed transaction has an ad hoc set of TMs, the TMs to which the transaction participants register. A leader, the coordinator TM, exists for each transaction to coordinate 2PC for it, typically the TM of the coordinator database. However, the coordinator role can be transferred to another TM for performance or reliability reasons. Rather than exchanging 2PC messages among themselves, the participants exchange the messages with their respective TMs. The relevant TMs communicate among themselves to execute the 2PC protocol schema above, "representing" the respective participants, for terminating that transaction. With this architecture the protocol is fully distributed (does not need any central processing component or data structure), and scales up with number of network nodes (network size) effectively.
This common architecture is also effective for the distribution of other atomic commitment protocols besides 2PC, since all such protocols use the same voting mechanism and outcome propagation to protocol participants.[1][2]
Protocol optimizations
[edit]Database research has been done on ways to get most of the benefits of the two-phase commit protocol while reducing costs by protocol optimizations[1][2][3] and protocol operations saving under certain system's behavior assumptions.
Presumed abort and presumed commit
[edit]Presumed abort or Presumed commit are common such optimizations.[2][3][5] An assumption about the outcome of transactions, either commit, or abort, can save both messages and logging operations by the participants during the 2PC protocol's execution. For example, when presumed abort, if during system recovery from failure no logged evidence for commit of some transaction is found by the recovery procedure, then it assumes that the transaction has been aborted, and acts accordingly. This means that it does not matter if aborts are logged at all, and such logging can be saved under this assumption. Typically a penalty of additional operations is paid during recovery from failure, depending on optimization type. Thus the best variant of optimization, if any, is chosen according to failure and transaction outcome statistics.
Tree two-phase commit protocol
[edit]The Tree 2PC protocol[2] (also called Nested 2PC, or Recursive 2PC) is a common variant of 2PC in a computer network, which better utilizes the underlying communication infrastructure. The participants in a distributed transaction are typically invoked in an order which defines a tree structure, the invocation tree, where the participants are the nodes and the edges are the invocations (communication links). The same tree is commonly utilized to complete the transaction by a 2PC protocol, but also another communication tree can be utilized for this, in principle. In a tree 2PC the coordinator is considered the root ("top") of a communication tree (inverted tree), while the participants are the other nodes. The coordinator can be the node that originated the transaction (invoked recursively (transitively) the other participants), but also another node in the same tree can take the coordinator role instead. 2PC messages from the coordinator are propagated "down" the tree, while messages to the coordinator are "collected" by a participant from all the participants below it, before it sends the appropriate message "up" the tree (except an abort message, which is propagated "up" immediately upon receiving it or if the current participant initiates the abort).
The Dynamic two-phase commit (Dynamic two-phase commitment, D2PC) protocol[2][6] is a variant of Tree 2PC with no predetermined coordinator. It subsumes several optimizations that have been proposed earlier. Agreement messages (Yes votes) start to propagate from all the leaves, each leaf when completing its tasks on behalf of the transaction (becoming ready). An intermediate (non leaf) node sends ready when an agreement message to the last (single) neighboring node from which agreement message has not yet been received. The coordinator is determined dynamically by racing agreement messages over the transaction tree, at the place where they collide. They collide either at a transaction tree node, to be the coordinator, or on a tree edge. In the latter case one of the two edge's nodes is elected as a coordinator (any node). D2PC is time optimal (among all the instances of a specific transaction tree, and any specific Tree 2PC protocol implementation; all instances have the same tree; each instance has a different node as coordinator): By choosing an optimal coordinator D2PC commits both the coordinator and each participant in minimum possible time, allowing the earliest possible release of locked resources in each transaction participant (tree node).
See also
[edit]References
[edit]- ^ a b c Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (1987): Concurrency Control and Recovery in Database Systems, Chapter 7, Addison Wesley Publishing Company, ISBN 0-201-10715-5
- ^ a b c d e f Gerhard Weikum, Gottfried Vossen (2001): Transactional Information Systems, Chapter 19, Elsevier, ISBN 1-55860-508-8
- ^ a b c Philip A. Bernstein, Eric Newcomer (2009): Principles of Transaction Processing, 2nd Edition Archived 2010-08-07 at the Wayback Machine, Chapter 8, Morgan Kaufmann (Elsevier), ISBN 978-1-55860-623-4
- ^ C. Mohan, Bruce Lindsay and R. Obermarck (1986): "Transaction management in the R* distributed database management system",ACM Transactions on Database Systems (TODS), Volume 11 Issue 4, Dec. 1986, Pages 378 - 396
- ^ C. Mohan, Bruce Lindsay (1985): "Efficient commit protocols for the tree of processes model of distributed transactions",ACM SIGOPS Operating Systems Review, 19(2),pp. 40-52 (April 1985)
- ^ Yoav Raz (1995): "The Dynamic Two Phase Commitment (D2PC) protocol ",Database Theory — ICDT '95, Lecture Notes in Computer Science, Volume 893/1995, pp. 162-176, Springer, ISBN 978-3-540-58907-5