OPTICS algorithm
OPTICS is an algorithm for finding cluster in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander[1]. In its basic idea, it is similar to DBSCAN, but addresses one of DBSCAN's major weaknesses: the problem to detect meaningful cluster in data of varying density. In order to do so, the points of the database are (linearily) ordered such that points which are spatially closest become neighbors in the ordering. Additionally, a special distance is stored for each point that represents the density that needs to be accepted for a cluster in order to have both points belong to the same cluster. This is best represented in a dendrogram.
Basic idea
Like DBSCAN, OPTICS requires two parameters: , which describes the lowest density we want to consider, and , describing the number of points required to form a cluster. A point is a core point if at least points are found within its -neighborhood . Contrary to DBSCAN, OPTICS also consideres points that are part of a more densely packed cluster, so each point is assigned a core distance that basically describes the distance to its th point:
The reachability-distance of a point from another point describes the that we need to assume as the density of a cluster in order to have belong to the same cluster as :
Both the core-distance and the reachability-distance are undefined if no sufficiently dense cluster (w.r.t. ) is available. Given a sufficiently large , this will never happen, but then every -neighborhood query will return the entire database, resulting in an untractable runtime cost. Hence, the parameter is required to cut off the density of clusters that is no longer considered to be interesting.
Pseudocode
The basic approach of OPTICS is similar to DBSCAN, but instead of maintaining a set of cluster members, a priority queue is used.
OPTICS(DB, eps, MinPts) for each unprocessed point p of DB N = getNeighbors(p, eps) mark p as processed output p to the ordered list Seeds = empty priority queue if (core-distance(p, eps, Minpts) != UNDEFINED) update(N, p, Seeds, eps, Minpts) for each point q in Seeds N' = getNeighbors(q, eps) mark q as processed output q to the ordered list if (core-distance(q, eps, Minpts) != UNDEFINED) update(N', q, Seeds, eps, Minpts)
update(N, p, Seeds, eps, Minpts) coredist = core-distance(p, eps, MinPts) for each o in N if (o is not processed) new-reach-dist = max(coredist, dist(p,o)) if (o.reachability-distance == UNDEFINED) // o is not in Seeds o.reachability-distance = new-reach-dist Seeds.insert(o, new-reach-dist) else // o in Seeds, check for improvement if (new-reach-dist < o.reachability-distance) o.reachability-distance = new-reach-dist Seeds.move-up(o, new-reach-dist)
References
- ^ Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander (1999). "OPTICS: Ordering Points To Identify the Clustering Structure". ACM SIGMOD international conference on Management of data. ACM Press. pp. 49–60.
{{cite conference}}
: Unknown parameter|booktitle=
ignored (|book-title=
suggested) (help)CS1 maint: multiple names: authors list (link)