Maximum coverage problem: Difference between revisions

Content deleted Content added

Inline

Latest revision as of 21:56, 27 December 2024

The maximum coverage problem is a classical question in computer science, computational complexity theory, and operations research. It is a problem that is widely taught in approximation algorithms.

As input you are given several sets and a number $k$ . The sets may have some elements in common. You must select at most $k$ of these sets such that the maximum number of elements are covered, i.e. the union of the selected sets has maximal size.

Formally, (unweighted) Maximum Coverage

Instance: A number

k

and a collection of sets

S=\{S_{1},S_{2},\ldots ,S_{m}\}

.

Objective: Find a subset

S'\subseteq S

of sets, such that

\left|S'\right|\leq k

and the number of covered elements

\left|\bigcup _{S_{i}\in S'}{S_{i}}\right|

is maximized.

The maximum coverage problem is NP-hard, and cannot be approximated to within $1-{\frac {1}{e}}+o(1)\approx 0.632$ under standard assumptions. This result essentially matches the approximation ratio achieved by the generic greedy algorithm used for maximization of submodular functions with a cardinality constraint.^[1]

ILP formulation

The maximum coverage problem can be formulated as the following integer linear program.

maximize	$\sum _{e_{j}\in E}y_{j}$	(maximizing the sum of covered elements)
subject to	$\sum {x_{i}}\leq k$	(no more than $k$ sets are selected)
	$\sum _{e_{j}\in S_{i}}x_{i}\geq y_{j}$	(if $y_{j}>0$ then at least one set $e_{j}\in S_{i}$ is selected)
	$y_{j}\in \{0,1\}$	(if $y_{j}=1$ then $e_{j}$ is covered)
	$x_{i}\in \{0,1\}$	(if $x_{i}=1$ then $S_{i}$ is selected for the cover)

Greedy algorithm

The greedy algorithm for maximum coverage chooses sets according to one rule: at each stage, choose a set which contains the largest number of uncovered elements. It can be shown that this algorithm achieves an approximation ratio of $1-{\frac {1}{e}}$ .^[2] ln-approximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for maximum coverage unless $P=NP$ .^[3]

Known extensions

The inapproximability results apply to all extensions of the maximum coverage problem since they hold the maximum coverage problem as a special case.

The Maximum Coverage Problem can be applied to road traffic situations; one such example is selecting which bus routes in a public transportation network should be installed with pothole detectors to maximise coverage, when only a limited number of sensors is available. This problem is a known extension of the Maximum Coverage Problem and was first explored in literature by Junade Ali and Vladimir Dyo.^[4]

Weighted version

In the weighted version every element $e_{j}$ has a weight $w(e_{j})$ . The task is to find a maximum coverage which has maximum weight. The basic version is a special case when all weights are $1$ .

maximize

\sum _{e\in E}w(e_{j})\cdot y_{j}

. (maximizing the weighted sum of covered elements).

subject to

\sum {x_{i}}\leq k

; (no more than

k

sets are selected).

\sum _{e_{j}\in S_{i}}x_{i}\geq y_{j}

; (if

y_{j}>0

then at least one set

e_{j}\in S_{i}

is selected).

y_{j}\in \{0,1\}

; (if

y_{j}=1

then

e_{j}

is covered)

x_{i}\in \{0,1\}

(if

x_{i}=1

then

S_{i}

is selected for the cover).

The greedy algorithm for the weighted maximum coverage at each stage chooses a set that contains the maximum weight of uncovered elements. This algorithm achieves an approximation ratio of $1-{\frac {1}{e}}$ .^[1]

Budgeted maximum coverage

In the budgeted maximum coverage version, not only does every element $e_{j}$ have a weight $w(e_{j})$ , but also every set $S_{i}$ has a cost $c(S_{i})$ . Instead of $k$ that limits the number of sets in the cover a budget $B$ is given. This budget $B$ limits the total cost of the cover that can be chosen.

maximize

\sum _{e\in E}w(e_{j})\cdot y_{j}

. (maximizing the weighted sum of covered elements).

subject to

\sum {c(S_{i})\cdot x_{i}}\leq B

; (the cost of the selected sets cannot exceed

B

).

\sum _{e_{j}\in S_{i}}x_{i}\geq y_{j}

; (if

y_{j}>0

then at least one set

e_{j}\in S_{i}

is selected).

y_{j}\in \{0,1\}

; (if

y_{j}=1

then

e_{j}

is covered)

x_{i}\in \{0,1\}

(if

x_{i}=1

then

S_{i}

is selected for the cover).

A greedy algorithm will no longer produce solutions with a performance guarantee. Namely, the worst case behavior of this algorithm might be very far from the optimal solution. The approximation algorithm is extended by the following way. First, define a modified greedy algorithm, that selects the set $S_{i}$ that has the best ratio of weighted uncovered elements to cost. Second, among covers of cardinality $1,2,...,k-1$ , find the best cover that does not violate the budget. Call this cover $H_{1}$ . Third, find all covers of cardinality $k$ that do not violate the budget. Using these covers of cardinality $k$ as starting points, apply the modified greedy algorithm, maintaining the best cover found so far. Call this cover $H_{2}$ . At the end of the process, the approximate best cover will be either $H_{1}$ or $H_{2}$ . This algorithm achieves an approximation ratio of $1-{1 \over e}$ for values of $k\geq 3$ . This is the best possible approximation ratio unless $NP\subseteq DTIME(n^{O(\log \log n)})$ .^[5]

Generalized maximum coverage

In the generalized maximum coverage version every set $S_{i}$ has a cost $c(S_{i})$ , element $e_{j}$ has a different weight and cost depending on which set covers it. Namely, if $e_{j}$ is covered by set $S_{i}$ the weight of $e_{j}$ is $w_{i}(e_{j})$ and its cost is $c_{i}(e_{j})$ . A budget $B$ is given for the total cost of the solution.

maximize

\sum _{e\in E,S_{i}}w_{i}(e_{j})\cdot y_{ij}

. (maximizing the weighted sum of covered elements in the sets in which they are covered).

subject to

\sum {c_{i}(e_{j})\cdot y_{ij}}+\sum {c(S_{i})\cdot x_{i}}\leq B

; (the cost of the selected sets cannot exceed

B

).

\sum _{i}y_{ij}\leq 1

; (element

e_{j}=1

can only be covered by at most one set).

\sum _{S_{i}}x_{i}\geq y_{ij}

; (if

y_{j}>0

then at least one set

e_{j}\in S_{i}

is selected).

y_{ij}\in \{0,1\}

; (if

y_{ij}=1

then

e_{j}

is covered by set

S_{i}

)

x_{i}\in \{0,1\}

(if

x_{i}=1

then

S_{i}

is selected for the cover).

Generalized maximum coverage algorithm

The algorithm uses the concept of residual cost/weight. The residual cost/weight is measured against a tentative solution and it is the difference of the cost/weight from the cost/weight gained by a tentative solution.

The algorithm has several stages. First, find a solution using greedy algorithm. In each iteration of the greedy algorithm the tentative solution is added the set which contains the maximum residual weight of elements divided by the residual cost of these elements along with the residual cost of the set. Second, compare the solution gained by the first step to the best solution which uses a small number of sets. Third, return the best out of all examined solutions. This algorithm achieves an approximation ratio of $1-1/e-o(1)$ .^[6]

Notes

^ ^a ^b G. L. Nemhauser, L. A. Wolsey and M. L. Fisher. An analysis of approximations for maximizing submodular set functions I, Mathematical Programming 14 (1978), 265–294
^ Hochbaum, Dorit S. (1997). "Approximating Covering and Packing Problems: Set Cover, Vertex Cover, Independent Set, and Related Problems". In Hochbaum, Dorit S. (ed.). Approximation Algorithms for NP-Hard Problems. Boston: PWS Publishing Company. pp. 94–143. ISBN 978-053494968-6.
^ Feige, Uriel (July 1998). "A Threshold of ln n for Approximating Set Cover". Journal of the ACM. 45 (4). New York, NY, USA: Association for Computing Machinery: 634–652. doi:10.1145/285055.285059. ISSN 0004-5411. S2CID 52827488.
^ Ali, Junade; Dyo, Vladimir (2017). "Coverage and Mobile Sensor Placement for Vehicles on Predetermined Routes: A Greedy Heuristic Approach". Proceedings of the 14th International Joint Conference on e-Business and Telecommunications. Vol. 2: WINSYS. pp. 83–88. doi:10.5220/0006469800830088. ISBN 978-989-758-261-5.
^ Khuller, Samir; Moss, Anna; Naor, Joseph (Seffi) (1999). "The budgeted maximum coverage problem". Information Processing Letters. 70: 39–45. CiteSeerX 10.1.1.49.5784. doi:10.1016/S0020-0190(99)00031-9.
^ Cohen, Reuven; Katzir, Liran (2008). "The Generalized Maximum Coverage Problem". Information Processing Letters. 108: 15–22. CiteSeerX 10.1.1.156.2073. doi:10.1016/j.ipl.2008.03.017.

References

Vazirani, Vijay V. (2001). Approximation Algorithms. Springer-Verlag. ISBN 978-3-540-65367-7.

[NVF-1] G. L. Nemhauser, L. A. Wolsey and M. L. Fisher. An analysis of approximations for maximizing submodular set functions I, Mathematical Programming 14 (1978), 265–294

[2] Hochbaum, Dorit S. (1997). "Approximating Covering and Packing Problems: Set Cover, Vertex Cover, Independent Set, and Related Problems". In Hochbaum, Dorit S. (ed.). Approximation Algorithms for NP-Hard Problems. Boston: PWS Publishing Company. pp. 94–143. ISBN 978-053494968-6.

[3] Feige, Uriel (July 1998). "A Threshold of ln n for Approximating Set Cover". Journal of the ACM. 45 (4). New York, NY, USA: Association for Computing Machinery: 634–652. doi:10.1145/285055.285059. ISSN 0004-5411. S2CID 52827488.

[4] Ali, Junade; Dyo, Vladimir (2017). "Coverage and Mobile Sensor Placement for Vehicles on Predetermined Routes: A Greedy Heuristic Approach". Proceedings of the 14th International Joint Conference on e-Business and Telecommunications. Vol. 2: WINSYS. pp. 83–88. doi:10.5220/0006469800830088. ISBN 978-989-758-261-5.

[5] Khuller, Samir; Moss, Anna; Naor, Joseph (Seffi) (1999). "The budgeted maximum coverage problem". Information Processing Letters. 70: 39–45. CiteSeerX 10.1.1.49.5784. doi:10.1016/S0020-0190(99)00031-9.

[6] Cohen, Reuven; Katzir, Liran (2008). "The Generalized Maximum Coverage Problem". Information Processing Letters. 108: 15–22. CiteSeerX 10.1.1.156.2073. doi:10.1016/j.ipl.2008.03.017.

[1]

[2]

[3]

[4]

[5]

[6]

@@ Line 8: / Line 8: @@
 Formally, (unweighted) Maximum Coverage
-: Instance: A number <math> k </math> and a collection of sets <math> S = S_1, S_2, \ldots, S_m </math>.
+: Instance: A number <math> k </math> and a collection of sets <math> S = \{S_1, S_2, \ldots, S_m\} </math>.
-: Objective: Find a subset <math> S^{'} \subseteq S</math> of sets, such that <math> \left| S^{'} \right| \leq k</math> and the number of covered elements <math> \left| \bigcup_{S_i \in S^{'}}{S_i} \right| </math> is maximized.
+: Objective: Find a subset <math> S' \subseteq S</math> of sets, such that <math> \left| S' \right| \leq k</math> and the number of covered elements <math> \left| \bigcup_{S_i \in S'}{S_i} \right| </math> is maximized.
-The maximum coverage problem is [[NP-hard]], and cannot be approximated within <math>1 - \frac{1}{e} + o(1) \approx 0.632</math> under standard assumptions.
+The maximum coverage problem is [[NP-hard]], and cannot be approximated to within <math>1 - \frac{1}{e} + o(1) \approx 0.632</math> under standard assumptions.
-This result essentially matches the approximation ratio achieved by the generic greedy algorithm used for  [[Submodular_set_function#Optimization_problems|maximization of submodular functions with a cardinality constraint]].<ref name="NVF"> [[George Nemhauser|G. L. Nemhauser]], L. A. Wolsey and M. L. Fisher. An analysis of approximations for maximizing submodular set functions I, Mathematical Programming 14 (1978), 265–294</ref>
+This result essentially matches the approximation ratio achieved by the generic greedy algorithm used for  [[Submodular set function#Optimization problems|maximization of submodular functions with a cardinality constraint]].<ref name="NVF">[[George Nemhauser|G. L. Nemhauser]], L. A. Wolsey and M. L. Fisher. An analysis of approximations for maximizing submodular set functions I, Mathematical Programming 14 (1978), 265–294</ref>
 ==ILP formulation==
@@ Line 29: / Line 29: @@
 == Greedy algorithm ==
-The [[greedy algorithm]] for maximum coverage chooses sets according to one rule: at each stage, choose a set which contains the largest number of uncovered elements. It can be shown that this algorithm achieves an approximation ratio of <math>1 - \frac{1}{e}</math>.<ref>{{cite book | last=Hochbaum | first=Dorit S. | authorlink=Dorit S. Hochbaum | editor-first=Dorit S. | editor-last=Hochbaum | year=1997 | chapter=Approximating Covering and Packing Problems: Set Cover, Vertex Cover, Independent Set, and Related Problems | title=Approximation Algorithms for NP-Hard Problems | publisher=PWS Publishing Company | location=Boston | isbn=053494968-1 | pages=94–143}}</ref> Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for maximum coverage.<ref>{{cite article | last = Feige | first = Uriel | authorlink = Uriel Feige | title = A Threshold of ln ''n'' for Approximating Set Cover | journal = Journal of the ACM | volume = 45 | number = 4 |date=July 1998 | issn = 0004-5411 | pages = 634–652 | doi = 10.1145/285055.285059 | publisher = Association for Computing Machinery | location = New York, NY, USA}}</ref>
+The [[greedy algorithm]] for maximum coverage chooses sets according to one rule: at each stage, choose a set which contains the largest number of uncovered elements. It can be shown that this algorithm achieves an approximation ratio of <math>1 - \frac{1}{e}</math>.<ref>{{cite book | last=Hochbaum | first=Dorit S. | author-link=Dorit S. Hochbaum | editor-first=Dorit S. | editor-last=Hochbaum | year=1997 | chapter=Approximating Covering and Packing Problems: Set Cover, Vertex Cover, Independent Set, and Related Problems | title=Approximation Algorithms for NP-Hard Problems | publisher=PWS Publishing Company | location=Boston | isbn=978-053494968-6 | pages=94–143}}</ref> ln-approximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for maximum coverage unless <math>P = NP</math>.<ref>{{cite journal | last = Feige | first = Uriel | author-link = Uriel Feige | title = A Threshold of ln ''n'' for Approximating Set Cover | journal = Journal of the ACM | volume = 45 | number = 4 |date=July 1998 | issn = 0004-5411 | pages = 634–652 | doi = 10.1145/285055.285059 | publisher = Association for Computing Machinery | location = New York, NY, USA| s2cid = 52827488 | doi-access = free }}</ref>
 == Known extensions ==
 The inapproximability results apply to all extensions of the maximum coverage problem since they hold the maximum coverage problem as a special case.
+The Maximum Coverage Problem can be applied to road traffic situations; one such example is selecting which bus routes in a public transportation network should be installed with pothole detectors to maximise coverage, when only a limited number of sensors is available. This problem is a known extension of the Maximum Coverage Problem and was first explored in literature by Junade Ali and Vladimir Dyo.<ref>{{cite book|last1=Ali|first1=Junade|last2=Dyo|first2=Vladimir|title=Proceedings of the 14th International Joint Conference on e-Business and Telecommunications |chapter=Coverage and Mobile Sensor Placement for Vehicles on Predetermined Routes: A Greedy Heuristic Approach |date=2017|volume=2: WINSYS|pages=83–88|doi=10.5220/0006469800830088|url=http://www.scitepress.org/DigitalLibrary/PublicationsDetail.aspx?ID=ddWw1NMB3VI%3d|isbn=978-989-758-261-5}}</ref>
 == Weighted version ==
@@ Line 39: / Line 41: @@
 :maximize <math>\sum_{e \in E} w(e_j) \cdot y_j </math>. (maximizing the weighted sum of covered elements).
 :subject to <math> \sum{x_i}  \leq k </math>; (no more than <math>k</math> sets are selected).
-::<math> \sum_{e_j \in S_i} x_i \geq y_j </math>; (if <math>y_j \geq 0 </math> then at least one set <math>e_j \in S_i</math> is selected).
+::<math> \sum_{e_j \in S_i} x_i \geq y_j </math>; (if <math>y_j > 0 </math> then at least one set <math>e_j \in S_i</math> is selected).
 ::<math>y_j \in \{0,1\}</math>; (if <math>y_j=1</math> then <math>e_j</math> is covered)
 ::<math>x_i \in \{0,1\}</math> (if <math>x_i=1</math> then <math>S_i</math> is selected for the cover).
-The greedy algorithm for the weighted maximum coverage at each stage chooses a set which contains the maximum weight of uncovered elements. This algorithm achieves an approximation ratio of <math>1 - \frac{1}{e}</math>.<ref name="NVF"/>
+The greedy algorithm for the weighted maximum coverage at each stage chooses a set that contains the maximum weight of uncovered elements. This algorithm achieves an approximation ratio of <math>1 - \frac{1}{e}</math>.<ref name="NVF"/>
 == Budgeted maximum coverage  ==
@@ Line 50: / Line 52: @@
 :maximize <math>\sum_{e \in E} w(e_j) \cdot y_j </math>. (maximizing the weighted sum of covered elements).
 :subject to <math> \sum{c(S_i) \cdot x_i}  \leq B </math>; (the cost of the selected sets cannot exceed <math>B</math>).
-::<math> \sum_{e_j \in S_i} x_i \geq y_j </math>; (if <math>y_j \geq 0 </math> then at least one set <math>e_j \in S_i</math> is selected).
+::<math> \sum_{e_j \in S_i} x_i \geq y_j </math>; (if <math>y_j > 0 </math> then at least one set <math>e_j \in S_i</math> is selected).
 ::<math>y_j \in \{0,1\}</math>; (if <math>y_j=1</math> then <math>e_j</math> is covered)
 ::<math>x_i \in \{0,1\}</math> (if <math>x_i=1</math> then <math>S_i</math> is selected for the cover).
-A greedy algorithm will no longer produce solutions with a performance guarantee. Namely, the worst case behavior of this algorithm might be very far from the optimal solution. The approximation algorithm is extended by the following way. First, after finding a solution using the greedy algorithm, return the better of the greedy algorithm's solution and the set of largest weight. Call this algorithm the modified greedy algorithm. Second, starting with all possible families of sets of sizes from one to (at least) three, augment these solutions with the modified greedy algorithm. Third, return the best out of all augmented solutions. This algorithm achieves an approximation ratio of <math>1- 1/e</math>. This is the best possible approximation ratio unless <math>NP \subseteq DTIME(n^{O(\log\log n)})</math>.<ref>Khuller, S., Moss, A., and Naor, J. 1999. [http://dx.doi.org/10.1016/S0020-0190(99)00031-9 The budgeted maximum coverage problem]. ''Inf. Process. Lett''. 70, 1 (Apr. 1999), 39-45.</ref>
+A greedy algorithm will no longer produce solutions with a performance guarantee. Namely, the worst case behavior of this algorithm might be very far from the optimal solution. The approximation algorithm is extended by the following way. First, define a modified greedy algorithm, that selects the set <math>S_i</math> that has the best ratio of weighted uncovered elements to cost. Second, among covers of cardinality <math>1, 2, ..., k-1</math>, find the best cover that does not violate the budget. Call this cover <math>H_1</math>. Third, find all covers of cardinality <math>k</math> that do not violate the budget. Using these covers of cardinality <math>k</math> as starting points, apply the modified greedy algorithm, maintaining the best cover found so far. Call this cover <math>H_2</math>. At the end of the process, the approximate best cover will be either <math>H_1</math> or <math>H_2</math>. This algorithm achieves an approximation ratio of <math>1- {1 \over e}</math> for values of <math>k \geq 3</math>. This is the best possible approximation ratio unless <math>NP \subseteq DTIME(n^{O(\log\log n)})</math>.<ref>{{Cite journal |doi = 10.1016/S0020-0190(99)00031-9|title = The budgeted maximum coverage problem|journal = Information Processing Letters|volume = 70|pages = 39–45|year = 1999|last1 = Khuller|first1 = Samir|last2 = Moss|first2 = Anna|last3 = Naor|first3 = Joseph (Seffi)|citeseerx = 10.1.1.49.5784}}</ref>
 == Generalized maximum coverage  ==
@@ Line 66: / Line 68: @@
 :subject to <math> \sum{c_i(e_j) \cdot y_{ij}} + \sum{c(S_i) \cdot x_i}  \leq B </math>; (the cost of the selected sets cannot exceed <math>B</math>).
 ::<math> \sum_{i} y_{ij} \leq 1 </math>; (element <math>e_j=1</math> can only be covered by at most one set).
-::<math> \sum_{S_i} x_i \geq y_{ij} </math>; (if <math>y_j \geq 0 </math> then at least one set <math>e_j \in S_i</math> is selected).
+::<math> \sum_{S_i} x_i \geq y_{ij} </math>; (if <math>y_j > 0 </math> then at least one set <math>e_j \in S_i</math> is selected).
 ::<math>y_{ij} \in \{0,1\} </math>; (if <math>y_{ij}=1</math> then <math>e_j</math> is covered by set <math>S_i</math>)
 ::<math>x_i \in \{0,1\}</math> (if <math>x_i=1</math> then <math>S_i</math> is selected for the cover).
@@ Line 73: / Line 75: @@
 The algorithm uses the concept of residual cost/weight. The residual cost/weight is measured against a tentative solution and it is the difference of the cost/weight from the cost/weight gained by a tentative solution.
-The algorithm has several stages. First, find a solution using greedy algorithm. In each iteration of the greedy algorithm the tentative solution is added the set which contains the maximum residual weight of elements divided by the residual cost of these elements along with the residual cost of the set. Second, compare the solution gained by the first step to the best solution which uses a small number of sets. Third, return the best out of all examined solutions. This algorithm achieves an approximation ratio of <math>1-1/e - o(1)</math>.<ref>Cohen, R. and Katzir, L. 2008. [http://dx.doi.org/10.1016/j.ipl.2008.03.017 The Generalized Maximum Coverage Problem]. ''Inf. Process. Lett''. 108, 1 (Sep. 2008), 15-22.</ref>
+The algorithm has several stages. First, find a solution using greedy algorithm. In each iteration of the greedy algorithm the tentative solution is added the set which contains the maximum residual weight of elements divided by the residual cost of these elements along with the residual cost of the set. Second, compare the solution gained by the first step to the best solution which uses a small number of sets. Third, return the best out of all examined solutions. This algorithm achieves an approximation ratio of <math>1-1/e - o(1)</math>.<ref>{{Cite journal |doi = 10.1016/j.ipl.2008.03.017|title = The Generalized Maximum Coverage Problem|journal = Information Processing Letters|volume = 108|pages = 15–22|year = 2008|last1 = Cohen|first1 = Reuven|last2 = Katzir|first2 = Liran|citeseerx = 10.1.1.156.2073}}</ref>
 == Related problems ==
@@ Line 82: / Line 84: @@
 == References ==
-* {{Cite book | last=Vazirani | first=Vijay V. | authorlink=Vijay Vazirani | title=Approximation Algorithms | year=2001 | publisher=Springer-Verlag | isbn=3-540-65367-8 | pages=}}
+* {{Cite book | last=Vazirani | first=Vijay V. | author-link=Vijay Vazirani | title=Approximation Algorithms | year=2001 | publisher=Springer-Verlag | isbn=978-3-540-65367-7 }}
-== External links ==
-[[Category:Set families]]
+[[Category:Families of sets]]
 [[Category:NP-complete problems]]