Maximum coverage problem

The maximum coverage problem is a classical question in computer science and computational complexity theory. It is a problem that is widely taught in approximation algorithms.

As input you are given several sets and a number $k$. The sets may have some elements in common. You must select at most $k$ of these sets such that the maximum number of elements are covered, i.e. the union of the selected sets has maximal size.

Formally, (unweighted) Maximum Coverage

Instance: A number $k$ and a collection of sets $S=S_{1},S_{2},\ldots ,S_{m}$, where $S_{i}\subseteq \left\{e_{1},e_{2},\ldots ,e_{n}\right\}$.

Objective: Find a subset $S^{'}\subseteq S$ of sets, such that $\left|S^{'}\right|\leq k$ and the number of covered elements $\left|\bigcup _{S_{i}\in S^{'}}{S_{i}}\right|$ is maximized.
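To make the objective concrete, the following is a minimal sketch of an exhaustive search that tries every choice of at most $k$ sets. The function names and the example instance are illustrative assumptions, not part of the problem definition, and the running time is exponential, so this is only usable on tiny inputs.

```python
from itertools import combinations


def covered(sets, chosen):
    """Return the union of the sets whose indices are in `chosen`."""
    result = set()
    for i in chosen:
        result |= sets[i]
    return result


def brute_force_max_coverage(sets, k):
    """Try every choice of at most k sets and keep the largest union."""
    best_choice, best_size = (), 0
    for r in range(1, min(k, len(sets)) + 1):
        for chosen in combinations(range(len(sets)), r):
            size = len(covered(sets, chosen))
            if size > best_size:
                best_choice, best_size = chosen, size
    return best_choice, best_size


# Hypothetical instance: four sets over the elements 1..6, with k = 2.
example_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
choice, size = brute_force_max_coverage(example_sets, 2)
print(choice, size)  # (0, 2) 6
```

Here the first and third sets together cover all six elements, which no other pair achieves.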

The maximum coverage problem is NP-hard, and cannot be approximated within ${\frac {e}{e-1}}-o(1)\approx 1.58$ under standard assumptions. This result essentially matches the approximation ratio achieved by the greedy algorithm.

ILP formulation

The maximum coverage problem can be formulated as the following integer linear program.

maximize $\sum _{e_{j}\in E}y_{j}$ (maximizing the number of covered elements, where $E=\left\{e_{1},e_{2},\ldots ,e_{n}\right\}$)

subject to

$\sum _{i}x_{i}\leq k$ (no more than $k$ sets are selected)

$\sum _{i:\,e_{j}\in S_{i}}x_{i}\geq y_{j}$ (if $y_{j}>0$ then at least one set $S_{i}$ with $e_{j}\in S_{i}$ is selected)

$0\leq y_{j}\leq 1$ (if $y_{j}=1$ then $e_{j}$ is covered)

$x_{i}\in \{0,1\}$ (if $x_{i}=1$ then $S_{i}$ is selected for the cover)
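The constraints can be read as a checklist for a candidate assignment. The sketch below is not a solver; it only tests whether given values of $x_{i}$ and $y_{j}$ satisfy the program and, if so, returns the objective value. The function name and the data layout (a list for $x$, a dictionary for $y$) are assumptions made for illustration.

```python
def check_ilp_solution(sets, universe, k, x, y):
    """Return the objective value if (x, y) is feasible, otherwise None.

    x[i] is the 0/1 selection variable of sets[i];
    y[e] is the coverage variable of element e.
    """
    # x_i must be binary and at most k sets may be selected.
    if any(xi not in (0, 1) for xi in x):
        return None
    if sum(x) > k:
        return None
    for e in universe:
        # 0 <= y_j <= 1
        if not 0 <= y[e] <= 1:
            return None
        # y_j may be positive only if a selected set contains e_j.
        selected_containing_e = sum(x[i] for i, s in enumerate(sets) if e in s)
        if selected_containing_e < y[e]:
            return None
    return sum(y[e] for e in universe)


# Hypothetical instance, with k = 2.
ilp_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
ilp_universe = {1, 2, 3, 4, 5, 6}
all_covered = {e: 1 for e in ilp_universe}

feasible = check_ilp_solution(ilp_sets, ilp_universe, 2, [1, 0, 1, 0], all_covered)
infeasible = check_ilp_solution(ilp_sets, ilp_universe, 2, [1, 0, 0, 0], all_covered)
print(feasible, infeasible)  # 6 None
```

The second call fails because element 4 is marked as covered although no selected set contains it.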

Greedy algorithm

The greedy algorithm for maximum coverage chooses sets according to one rule: at each stage, choose a set which contains the largest number of uncovered elements. It can be shown that this algorithm achieves an approximation ratio of ${\frac {e}{e-1}}$ . Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for maximum coverage.
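The greedy rule can be sketched in a few lines. The function name and the example instance below are illustrative assumptions; ties are broken in favor of the set with the smallest index, a choice the rule itself leaves open.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the one that covers the most new elements."""
    chosen, covered = [], set()
    for _ in range(min(k, len(sets))):
        best = max(
            (i for i in range(len(sets)) if i not in chosen),
            key=lambda i: len(sets[i] - covered),
        )
        if not sets[best] - covered:
            break  # no remaining set adds anything new
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical instance where greedy is not optimal for k = 2.
greedy_sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}]
greedy_choice, greedy_covered = greedy_max_coverage(greedy_sets, 2)
print(greedy_choice, len(greedy_covered))  # [0, 1] 5
```

On this instance greedy first takes the largest set and ends up covering 5 elements, whereas the second and third sets together cover all 6. The gap is consistent with the approximation guarantee, which bounds how far greedy can fall short rather than promising optimality.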

Known Extensions

The inapproximability results apply to all extensions of the maximum coverage problem, since they contain maximum coverage as a special case.

Weighted version

In the weighted version every element $e_{j}$ has a weight $w(e_{j})$. The task is to find a coverage which has maximum weight. The basic version is the special case in which all weights are $1$.

maximize $\sum _{e_{j}\in E}w(e_{j})\cdot y_{j}$ (maximizing the weighted sum of covered elements)

subject to

$\sum _{i}x_{i}\leq k$ (no more than $k$ sets are selected)

$\sum _{i:\,e_{j}\in S_{i}}x_{i}\geq y_{j}$ (if $y_{j}>0$ then at least one set $S_{i}$ with $e_{j}\in S_{i}$ is selected)

$0\leq y_{j}\leq 1$ (if $y_{j}=1$ then $e_{j}$ is covered)

$x_{i}\in \{0,1\}$ (if $x_{i}=1$ then $S_{i}$ is selected for the cover)

The greedy algorithm for the weighted maximum coverage at each stage chooses a set which contains the maximum weight of uncovered elements. This algorithm achieves an approximation ratio of ${\frac {e}{e-1}}$ .
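The weighted greedy rule differs from the unweighted one only in how the gain of a set is measured. A minimal sketch, assuming weights are given as a dictionary from elements to non-negative numbers (the names and the instance are illustrative):

```python
def greedy_weighted_coverage(sets, weight, k):
    """Pick up to k sets, each time taking the one that adds the most uncovered weight."""
    chosen, covered = [], set()

    def gain(i):
        # Total weight of the elements of sets[i] that are not yet covered.
        return sum(weight[e] for e in sets[i] - covered)

    for _ in range(min(k, len(sets))):
        candidates = [i for i in range(len(sets)) if i not in chosen]
        best = max(candidates, key=gain)
        if gain(best) <= 0:
            break  # nothing left to gain
        chosen.append(best)
        covered |= sets[best]
    return chosen, sum(weight[e] for e in covered)


# Hypothetical instance: the single heavy element "d" outweighs the three-element set.
weighted_sets = [{"a", "b", "c"}, {"d"}, {"c", "e"}]
weights = {"a": 1, "b": 1, "c": 1, "d": 10, "e": 3}
weighted_choice, weighted_total = greedy_weighted_coverage(weighted_sets, weights, 2)
print(weighted_choice, weighted_total)  # [1, 2] 14
```

With all weights equal to $1$ this reduces to the unweighted greedy rule.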

Budgeted Maximum Coverage

In the budgeted maximum coverage version, not only does every element $e_{j}$ have a weight $w(e_{j})$, but every set $S_{i}$ also has a cost $c(S_{i})$. Instead of a number $k$ that limits the number of sets in the cover, a budget $B$ is given. This budget $B$ limits the total cost of the sets that can be chosen.

maximize $\sum _{e_{j}\in E}w(e_{j})\cdot y_{j}$ (maximizing the weighted sum of covered elements)

subject to

$\sum _{i}c(S_{i})\cdot x_{i}\leq B$ (the cost of the selected sets cannot exceed $B$)

$\sum _{i:\,e_{j}\in S_{i}}x_{i}\geq y_{j}$ (if $y_{j}>0$ then at least one set $S_{i}$ with $e_{j}\in S_{i}$ is selected)

$0\leq y_{j}\leq 1$ (if $y_{j}=1$ then $e_{j}$ is covered)

$x_{i}\in \{0,1\}$ (if $x_{i}=1$ then $S_{i}$ is selected for the cover)

A plain greedy algorithm no longer produces solutions with a performance guarantee: its worst-case behavior can be very far from the optimal solution. The approximation algorithm is extended in the following way. First, find a solution using the greedy algorithm, where in each iteration the tentative solution adds the set that maximizes the weight of uncovered elements divided by the cost of the set. Second, compare the solution obtained in the first step with the best solution that uses a small number of sets. Third, return the best of all examined solutions. This algorithm achieves an approximation ratio of ${\frac {e}{e-1}}$.
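The three steps can be sketched as follows under a simplifying assumption: "a small number of sets" is taken to mean a single set. This simplified variant carries a weaker guarantee than the ratio quoted above; the stronger bound in Khuller, Moss and Naor relies on enumerating small subsets of more than one set and extending each of them greedily. The function name, the data layout, and the requirement that all costs are positive are assumptions of this sketch.

```python
def budgeted_coverage(sets, weight, cost, budget):
    """Ratio greedy within the budget, compared against the best single affordable set."""

    def total_weight(elements):
        return sum(weight[e] for e in elements)

    # Step 1: greedy by uncovered weight per unit of cost; sets that do not fit are skipped.
    chosen, covered, spent = [], set(), 0
    remaining = list(range(len(sets)))
    while remaining:
        best = max(remaining, key=lambda i: total_weight(sets[i] - covered) / cost[i])
        remaining.remove(best)
        if spent + cost[best] <= budget and sets[best] - covered:
            chosen.append(best)
            covered |= sets[best]
            spent += cost[best]

    # Step 2: best solution consisting of one affordable set.
    affordable = [i for i in range(len(sets)) if cost[i] <= budget]
    if affordable:
        single = max(affordable, key=lambda i: total_weight(sets[i]))
        # Step 3: return the better of the two candidates.
        if total_weight(sets[single]) > total_weight(covered):
            return [single], total_weight(sets[single])
    return chosen, total_weight(covered)


# Hypothetical instance where ratio greedy alone does badly.
budget_sets = [{"a"}, {"b"}]
budget_weights = {"a": 2, "b": 10}
budget_result = budgeted_coverage(budget_sets, budget_weights, cost=[1, 10], budget=10)
print(budget_result)  # ([1], 10)
```

In this instance the cheap set has the better weight-to-cost ratio, so greedy takes it first and can then no longer afford the valuable set. The comparison in the second step recovers the better solution.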

Generalized Maximum Coverage

In the generalized maximum coverage version every set $S_{i}$ has a cost $c(S_{i})$, and every element $e_{j}$ has a weight and a cost that depend on which set covers it. Namely, if $e_{j}$ is covered by set $S_{i}$, the weight of $e_{j}$ is $w_{i}(e_{j})$ and its cost is $c_{i}(e_{j})$. A budget $B$ is given for the total cost of the solution.

maximize $\sum _{e_{j}\in E,\,S_{i}}w_{i}(e_{j})\cdot y_{ij}$ (maximizing the weighted sum of covered elements in the sets in which they are covered)

subject to

$\sum _{i,j}c_{i}(e_{j})\cdot y_{ij}+\sum _{i}c(S_{i})\cdot x_{i}\leq B$ (the total cost of the covered elements and of the selected sets cannot exceed $B$)

$\sum _{i}y_{ij}\leq 1$ (element $e_{j}$ can be covered by at most one set)

$x_{i}\geq y_{ij}$ (if $y_{ij}=1$ then $S_{i}$ is selected)

$y_{ij}\in \{0,1\}$ (if $y_{ij}=1$ then $e_{j}$ is covered by set $S_{i}$)

$x_{i}\in \{0,1\}$ (if $x_{i}=1$ then $S_{i}$ is selected for the cover)

Generalized Maximum Coverage Algorithm

The algorithm uses the concept of residual cost/weight. The residual cost/weight is measured against a tentative solution: it is the difference between the cost/weight in question and the cost/weight already gained by the tentative solution.

The algorithm has several stages. First, find a solution using the greedy algorithm. In each iteration of the greedy algorithm, the set added to the tentative solution is the one that maximizes the residual weight of its elements divided by the sum of the residual cost of these elements and the residual cost of the set. Second, compare the solution obtained in the first step with the best solution that uses a small number of sets. Third, return the best of all examined solutions. This algorithm achieves an approximation ratio of ${\frac {e}{e-1}}+o(1)$.

References

Vazirani, Vijay V. (2001). Approximation Algorithms. Springer-Verlag. ISBN 3-540-65367-8.
Feige, Uriel (1998). A Threshold of ln $n$ for Approximating Set Cover. Journal of the ACM 45 (4): 634-652.
Khuller, Samir; Moss, Anna; Naor, Joseph (Seffi) (1999). The budgeted maximum coverage problem. Information Processing Letters 70 (1): 39-45.
Cohen, Reuven; Katzir, Liran (2008). The generalized maximum coverage problem. Information Processing Letters 108 (1): 15-22.
