Maximum subarray problem
In computer science, the maximum sum subarray problem is the task of finding a contiguous subarray with the largest sum, within a given one-dimensional array A[1...n] of numbers. Formally, the task is to find indices i and j with 1 ≤ i ≤ j ≤ n, such that the sum
A[i] + A[i+1] + ⋯ + A[j]
is as large as possible. (Some formulations of the problem also allow the empty subarray to be considered; by convention, the sum of all values of the empty subarray is zero.) Each number in the input array A could be positive, negative, or zero.[1]
For example, for the array of values [−2, 1, −3, 4, −1, 2, 1, −5, 4], the contiguous subarray with the largest sum is [4, −1, 2, 1], with sum 6.
Some properties of this problem are:
- If the array contains all non-negative numbers, then the problem is trivial; a maximum subarray is the entire array.
- If the array contains all non-positive numbers, then a solution is any subarray of size 1 containing the maximal value of the array (or the empty subarray, if it is permitted).
- Several different sub-arrays may have the same maximum sum.
This problem can be solved using several different algorithmic techniques, including brute force,[2] divide and conquer,[3] dynamic programming,[4] and reduction to shortest paths.[citation needed]
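For reference, the brute-force approach simply tries every pair of start and end indices and recomputes each sum from scratch, taking cubic time. A minimal sketch in Python (the function name max_subarray_bruteforce is illustrative, not taken from the cited sources):
def max_subarray_bruteforce(numbers):
    """Check every contiguous subarray, recomputing each sum from scratch: O(n^3) time."""
    best_sum = 0  # sum of the empty subarray; use float('-inf') to disallow it
    n = len(numbers)
    for i in range(n):
        for j in range(i, n):
            best_sum = max(best_sum, sum(numbers[i:j + 1]))  # sum of numbers[i..j]
    return best_sum
Keeping a running sum while j grows (or precomputing prefix sums, cf. note 1) removes the innermost summation and lowers the running time to O(n^2); the approaches discussed below do substantially better.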
History
The maximum subarray problem was proposed by Ulf Grenander in 1977 as a simplified model for maximum likelihood estimation of patterns in digitized images.[5]
Grenander was looking to find a rectangular subarray with maximum sum, in a two-dimensional array of real numbers. A brute-force algorithm for the two-dimensional problem runs in O(n^6) time; because this was prohibitively slow, Grenander proposed the one-dimensional problem to gain insight into its structure. Grenander derived an algorithm that solves the one-dimensional problem in O(n^2) time,[note 1] improving the brute force running time of O(n^3). When Michael Shamos heard about the problem, he overnight devised an O(n log n) divide-and-conquer algorithm for it. Soon after, Shamos described the one-dimensional problem and its history at a Carnegie Mellon University seminar attended by Jay Kadane, who within a minute designed an O(n)-time algorithm,[5][6][7] which is as fast as possible.[note 2] In 1982, David Gries obtained the same O(n)-time algorithm by applying Dijkstra's "standard strategy";[8] in 1989, Richard Bird derived it by purely algebraic manipulation of the brute-force algorithm using the Bird–Meertens formalism.[9]
Grenander's two-dimensional generalization can be solved in O(n^3) time either by using Kadane's algorithm as a subroutine, or through a divide-and-conquer approach. Slightly faster algorithms based on distance matrix multiplication have been proposed by Tamaki & Tokuyama (1998) and by Takaoka (2002). There is some evidence that no significantly faster algorithm exists; an algorithm that solves the two-dimensional maximum subarray problem in O(n^(3−ε)) time, for any ε > 0, would imply a similarly fast algorithm for the all-pairs shortest paths problem.[10]
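To illustrate the first of these reductions, the following sketch collapses every strip of consecutive rows into a single row of column sums and runs a one-dimensional maximum subarray routine on it, giving cubic time for an n × n array. It assumes a 1-D max_subarray function such as the one given in the Kadane's algorithm section below; the name max_subarray_2d is illustrative:
def max_subarray_2d(matrix):
    """Maximum-sum rectangle of a non-empty rectangular 2-D list, via a 1-D algorithm on column sums."""
    best = float('-inf')
    n_rows, n_cols = len(matrix), len(matrix[0])
    for top in range(n_rows):
        column_sums = [0] * n_cols
        for bottom in range(top, n_rows):
            for col in range(n_cols):
                column_sums[col] += matrix[bottom][col]  # sums over rows top..bottom
            best = max(best, max_subarray(column_sums))  # best rectangle spanning these rows
    return best
With the empty-subarray-admitting version of max_subarray, an all-negative matrix yields 0 (the empty rectangle); the non-empty variant can be substituted to force a non-empty rectangle.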
Applications
Maximum subarray problems arise in many fields, such as genomic sequence analysis and computer vision.
Genomic sequence analysis employs maximum subarray algorithms to identify important biological segments of protein sequences.[citation needed] These problems include conserved segments, GC-rich regions, tandem repeats, low-complexity filter, DNA binding domains, and regions of high charge.[citation needed]
In computer vision, maximum-subarray algorithms are used on bitmap images to detect the brightest area in an image.
Kadane's algorithm
Empty subarrays admitted
Kadane's original algorithm solves the problem version when empty subarrays are admitted. It scans the given array from left to right.
In the j-th step, it computes the subarray with the largest sum ending at j; this sum is maintained in variable current_sum.[note 3] Moreover, it computes the subarray with the largest sum anywhere in A[1 ... j], maintained in variable best_sum,[note 4] and easily obtained as the maximum of all values of current_sum seen so far, cf. line 7 of the algorithm.
As a loop invariant, in the j-th step, the old value of current_sum holds the maximum over all i ∈ {1, ..., j} of the sum A[i] + ⋯ + A[j−1].[note 5] Therefore, current_sum + A[j][note 6] is the maximum over all i ∈ {1, ..., j} of the sum A[i] + ⋯ + A[j]. To extend the latter maximum to cover also the case i = j + 1, it is sufficient to consider also the empty subarray A[j+1 ... j]. This is done in line 6 by assigning max(0, current_sum + A[j]) as the new value of current_sum, which after that holds the maximum over all i ∈ {1, ..., j+1} of the sum A[i] + ⋯ + A[j].
Thus, the problem can be solved with the following code,[4][7] expressed here in Python:
def max_subarray(numbers):
"""Find the largest sum of any contiguous subarray."""
best_sum = 0
current_sum = 0
for x in numbers:
current_sum = max(0, current_sum + x)
best_sum = max(best_sum, current_sum)
return best_sum
This version of the algorithm will return 0 if the input contains no positive elements (including when the input is empty).
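For instance, applied to the example array from the introduction, it returns 6, the sum of the subarray [4, −1, 2, 1]:
print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
print(max_subarray([-1, -2, -3]))  # 0, the empty subarray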
No empty subarrays admitted
For the variant of the problem which disallows empty subarrays, best_sum should be initialized to negative infinity instead[11] and also in the for loop current_sum should be updated as max(x, current_sum + x).[note 7]
In that case, if the input contains no positive element, the returned value is that of the largest element (i.e., the value closest to 0), or negative infinity if the input was empty. For correctness, an exception should be raised when the input array is empty, since an empty array has no maximum subarray:
def max_subarray(numbers):
"""Find the largest sum of any contiguous subarray."""
    if not numbers:
raise ValueError('Empty array has no nonempty subarrays')
best_sum = float('-inf')
current_sum = 0
for x in numbers:
current_sum = max(x, current_sum + x)
best_sum = max(best_sum, current_sum)
return best_sum
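For example, this variant returns the largest single element on an all-negative input, and still returns 6 on the example array:
print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
print(max_subarray([-5, -2, -8]))  # -2, the best singleton subarray
# max_subarray([]) raises ValueError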
Conditionally admitting empty subarrays
Whether empty subarrays are admitted only matters when all numbers in the input array are negative. In this case, the maximum subarray will either be empty (when empty subarrays are allowed), or contain the largest number in the input array (when empty subarrays are not allowed).
An alternative algorithm that admits empty subarrays is easily developed from the algorithm given above which does not admit empty subarrays: the only changes needed are to drop the check for an empty input (since the empty subarray is now a valid answer) and to return max(best_sum, 0) instead of best_sum. It can be seen that this version is correct:
- For an empty input array, the loop body never executes and best_sum remains minus infinity, so this algorithm will return zero, which corresponds to the sum of elements of an empty subarray.
- For an input array containing only negative numbers, the previous algorithm will return the largest of those numbers, which is negative. So this algorithm will return zero, which corresponds to the sum of elements of an empty subarray.
- For all other cases, there is at least one nonnegative number in the input, so there is a nonempty subarray for which the sum of the elements is at least 0. Since the sum of the elements is always zero for empty subarrays, it does not matter whether empty subarrays are admitted or not, so this algorithm correctly returns the same answer as the previous algorithm.
This algorithm can also be converted to a version that conditionally admits empty subarrays, based on a parameter: if empty subarrays are admitted, return max(0, best_sum), otherwise, return best_sum. An exception should be raised if the input array is empty but empty subarrays are not admitted:
def max_subarray(numbers, admit_empty_subarrays=True):
"""Find the largest sum of any contiguous subarray."""
    if not admit_empty_subarrays and not numbers:
raise ValueError('Empty array has no nonempty subarrays')
best_sum = float('-inf')
current_sum = 0
for x in numbers:
current_sum = max(x, current_sum + x)
best_sum = max(best_sum, current_sum)
if admit_empty_subarrays:
best_sum = max(0, best_sum)
return best_sum
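The parameter then selects between the two conventions, for example:
print(max_subarray([-5, -2, -8]))  # 0, the empty subarray is admitted
print(max_subarray([-5, -2, -8], admit_empty_subarrays=False))  # -2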
Computing the best subarray's position
The algorithm can be modified to keep track of the starting and ending indices of the maximum subarray as well:
def max_subarray(numbers):
"""Find a contiguous subarray with the largest sum."""
best_sum = 0 # or: float('-inf')
best_start = best_end = 0 # or: None
current_sum = 0
for current_end, x in enumerate(numbers):
if current_sum <= 0:
# Start a new sequence at the current element
current_start = current_end
current_sum = x
else:
# Extend the existing sequence with the current element
current_sum += x
if current_sum > best_sum:
best_sum = current_sum
best_start = current_start
best_end = current_end + 1 # the +1 is to make 'best_end' match Python's slice convention (endpoint excluded)
return best_sum, best_start, best_end
In Python, arrays are indexed starting from 0, and slices exclude the endpoint, so that the subarray [22, 33] in the array a=[-11, 22, 33, -44] would be expressed as a[1:3].
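For example, applied to the array from the introduction, this version returns the sum 6 together with the slice bounds 3 and 7 of the best subarray:
numbers = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
best_sum, best_start, best_end = max_subarray(numbers)
print(best_sum, numbers[best_start:best_end])  # 6 [4, -1, 2, 1]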
Complexity
Because of the way this algorithm uses optimal substructure (the maximum subarray ending at each position is calculated in a simple way from a related but smaller and overlapping subproblem, namely the maximum subarray ending at the previous position), it can be viewed as a simple example of dynamic programming.
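Written as a recurrence, with c_j denoting the largest sum of a subarray ending at position j (the quantity held in current_sum), this substructure reads c_j = max(A[j], c_{j−1} + A[j]) for the variant without empty subarrays, or c_j = max(0, c_{j−1} + A[j]) when empty subarrays are admitted; the overall answer is the maximum of the c_j.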
The runtime complexity of Kadane's algorithm is O(n).[4][7]
Generalizations
Similar problems may be posed for higher-dimensional arrays, but their solutions are more complicated; see, e.g., Takaoka (2002). Brodal & Jørgensen (2007) showed how to find the k largest subarray sums in a one-dimensional array, in the optimal time bound O(n + k).
The maximum-sum k-disjoint subarrays can also be computed in the optimal time bound O(n + k).[12]
Notes
- ^ By using a precomputed table of cumulative sums to compute the subarray sum in constant time.
- ^ Since every algorithm must at least scan the array once, which already takes O(n) time.
- ^ Named MaxEndingHere in Bentley (1989), and c in Gries (1982).
- ^ Named MaxSoFar in Bentley (1989), and s in Gries (1982).
- ^ This sum is 0 when i = j, corresponding to the empty subarray A[j ... j−1].
- ^ In the Python code, A[j] is expressed as x, with the index j left implicit.
- ^ While the latter modification is not mentioned by Bentley (1989), it achieves maintaining the modified loop invariant current_sum = max over i ∈ {1, ..., j−1} of A[i] + ⋯ + A[j−1] at the beginning of the j-th step.
References
- ^ Bentley 1989, p. 69.
- ^ Bentley 1989, p. 70.
- ^ Bentley 1989, p. 73.
- ^ a b c Bentley 1989, p. 74.
- ^ a b Bentley 1984, p. 868-869.
- ^ Bentley 1989, p. 76-77.
- ^ a b c Gries 1982, p. 211.
- ^ Gries 1982, p. 209-211.
- ^ Bird 1989, Sect.8, p.126.
- ^ Backurs, Dikkala & Tzamos 2016.
- ^ Bentley 1989, p. 78,171.
- ^ Bengtsson & Chen 2007.
- Backurs, Arturs; Dikkala, Nishanth; Tzamos, Christos (2016), "Tight Hardness Results for Maximum Weight Rectangles", Proc. 43rd International Colloquium on Automata, Languages, and Programming: 81:1–81:13, doi:10.4230/LIPIcs.ICALP.2016.81, S2CID 12720136
- Bae, Sung Eun (2007), Sequential and Parallel Algorithms for the Generalized Maximum Subarray Problem (PDF) (Ph.D. thesis), University of Canterbury, S2CID 2681670, archived from the original (PDF) on 2017-10-26.
- Bengtsson, Fredrik; Chen, Jingsen (2007), Computing maximum-scoring segments optimally (PDF) (Research report), Luleå University of Technology
- Bentley, Jon (1984), "Programming Pearls: Algorithm Design Techniques", Communications of the ACM, 27 (9): 865–873, doi:10.1145/358234.381162, S2CID 207565329
- Bentley, Jon (May 1989), Programming Pearls (2nd? ed.), Reading, MA: Addison Wesley, ISBN 0-201-10331-1
- Bird, Richard S. (1989), "Algebraic Identities for Program Calculation" (PDF), The Computer Journal, 32 (2): 122–126, doi:10.1093/comjnl/32.2.122[dead link ]
- Brodal, Gerth Stølting; Jørgensen, Allan Grønlund (2007), "A linear time algorithm for the k maximal sums problem", Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, vol. 4708, Springer-Verlag, pp. 442–453, doi:10.1007/978-3-540-74456-6_40.
- Gries, David (1982), "A Note on the Standard Strategy for Developing Loop Invariants and Loops" (PDF), Science of Computer Programming, 2 (3): 207–241, doi:10.1016/0167-6423(83)90015-1, hdl:1813/6370
- Takaoka, Tadao (2002), "Efficient algorithms for the maximum subarray problem by distance matrix multiplication", Electronic Notes in Theoretical Computer Science, 61: 191–200, doi:10.1016/S1571-0661(04)00313-5.
- Tamaki, Hisao; Tokuyama, Takeshi (1998), "Algorithms for the Maximum Subarray Problem Based on Matrix Multiplication", Proceedings of the 9th Symposium on Discrete Algorithms (SODA): 446–452, retrieved November 17, 2018
External links
- TAN, Lirong. "Maximum Contiguous Subarray Sum Problems" (PDF). Archived from the original (PDF) on 2015-10-10. Retrieved 2017-10-26.
- Mu, Shin-Cheng (2010). "The Maximum Segment Sum Problem: Its Origin, and a Derivation".
- "Notes on Maximum Subarray Problem". 2012.
- www.algorithmist.com
- alexeigor.wikidot.com
- greatest subsequential sum problem on Rosetta Code
- geeksforgeeks page on Kadane's Algorithm