Talk:Cluster analysis: Difference between revisions


Latest revision as of 13:54, 15 February 2024

This is the talk page for discussing improvements to the Cluster analysis article.
This is not a forum for general discussion of the article's subject.


Archives: 1 (auto-archiving period: 12 months)

Databases (inactive)

This article is within the scope of WikiProject Databases, a project which is currently considered to be inactive.

Computer science High‑importance

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as High-importance on the project's importance scale.


Robotics Mid‑importance

This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Mid-importance on the project's importance scale.

This article has been marked as needing immediate attention.

Statistics High‑importance

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as High-importance on the importance scale.

Text has been copied to or from this article; see the list below. The source pages now serve to provide attribution for the content in the destination pages and must not be deleted as long as the copies exist. For attribution and to access older versions of the copied text, please see the history links below.

Copied Cluster analysis (history) → Hierarchical clustering (diff)
Copied Cluster analysis (history) → Fuzzy clustering (diff)
Copied Cluster analysis (history) → Educational data mining (diff)
Copied Cluster analysis (history) → Spectral clustering (diff)

The content of this article has been derived in whole or part from https://github.com/eXascaleInfolab/clubmark/tree/master/docs. Permission has been received from the copyright holder to release this material under both the Creative Commons Attribution-ShareAlike 3.0 Unported license and the GNU Free Documentation License. You may use either or both licenses. Evidence of this has been confirmed and stored by VRT volunteers, under ticket number 2019021110001288. The material is also available under Creative Commons Attribution 4.0 and Apache 2.0.

Infinity-norm

Can someone please make infinity-norm a link: infinity-norm

(The article is currently locked.)

Sabotage

This page appears to have been deliberately vandalised.

Please unlock this page.

V-means clustering

A Google search for "V-means clustering" only returns this Wikipedia article. Can someone provide a citation for this?

For future reference, this is the V-means paragraph that was removed:

V-means clustering

V-means clustering utilizes cluster analysis and nonparametric statistical tests to key researchers into segments of data that may contain distinct homogeneous sub-sets. The methodology embraced by V-means clustering circumvents many of the problems that traditionally beleaguer standard techniques for categorizing data. First, instead of relying on analyst predictions for the number of distinct sub-sets (k-means clustering), V-means clustering generates a Pareto-optimal number of sub-sets. V-means clustering is calibrated to a user-defined confidence level p, whereby the algorithm divides the data and then recombines the resulting groups until the probability that any given group belongs to the same distribution as either of its neighbors is less than p.

Second, V-means clustering makes use of repeated iterations of the nonparametric Kolmogorov-Smirnov test. Standard methods of dividing data into its constituent parts are often entangled in definitions of distances (distance measure clustering) or in assumptions about the normality of the data (expectation maximization clustering), but nonparametric analysis draws inference from the distribution functions of sets.

Third, the method is conceptually simple. Some methods combine multiple techniques in sequence in order to produce more robust results. From a practical standpoint this muddles the meaning of the results and frequently leads to conclusions typical of “data dredging.”
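
The removed paragraph names the two-sample Kolmogorov-Smirnov test without showing it. As a rough illustration only, the sketch below computes the test statistic (the largest gap between the two empirical distribution functions). The function name `ks_statistic` is hypothetical, no p-value is derived, and this is not an implementation of "V-means", for which no source has been found.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions of samples a and b.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, so ties are handled.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Identical samples have no gap; disjoint samples have the maximum gap.
print(ks_statistic([1, 2, 3], [1, 2, 3]))
print(ks_statistic([1, 2, 3], [10, 11, 12]))
```

A small statistic suggests two neighboring groups could share a distribution; turning it into the probability the paragraph compares against p would need the Kolmogorov distribution, which is omitted here.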

Fuzzy c-means clarification

I believe there is a typo at "typological analysis"; it should be "topological".

The explanation of the fuzzy c-means algorithm is quite difficult to follow. The order of the bullet points is correct, but it is unclear which steps are repeated and when.

"The fuzzy c-means algorithm is greatly similar to the k-means algorithm:

- Choose a number of clusters
- Assign randomly to each point coefficients for being in the clusters
- Repeat until the algorithm has converged (that is, the coefficients' change between two iterations is no more than ε, the given sensitivity threshold):
  - Compute the centroid for each cluster, using the formula above
  - For each point, compute its coefficients of being in the clusters, using the formula above"
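
To make the loop structure in the quoted steps concrete, here is a minimal Python sketch. The "formula above" is not reproduced on this talk page, so the sketch assumes the standard fuzzy c-means updates: each centroid is the mean of all points weighted by their coefficients raised to the power m, and each coefficient is proportional to distance^(-2/(m-1)). The names `fuzzy_c_means`, `m`, `eps`, `max_iter` and `seed` are illustrative, not taken from the article.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Illustrative fuzzy c-means (requires m > 1).

    Returns (centroids, coefficients).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Assign random coefficients to each point, normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])

    centroids = []
    # Everything inside this loop is what gets repeated.
    for _ in range(max_iter):
        # Each centroid is the mean of all points, weighted by u**m.
        centroids = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centroids.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )

        # Recompute every point's coefficients from its centroid distances.
        change = 0.0
        for i in range(n):
            dists = [math.dist(points[i], centre) for centre in centroids]
            zeros = [j for j, dist in enumerate(dists) if dist == 0.0]
            if zeros:
                # The point sits exactly on a centroid: share full membership.
                new_row = [1.0 / len(zeros) if j in zeros else 0.0
                           for j in range(c)]
            else:
                inverse = [dist ** (-2.0 / (m - 1.0)) for dist in dists]
                total = sum(inverse)
                new_row = [value / total for value in inverse]
            change = max(change,
                         max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Converged: no coefficient moved by more than eps.
        if change <= eps:
            break
    return centroids, u


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, coefficients = fuzzy_c_means(points, c=2)
for point, row in zip(points, coefficients):
    print(point, [round(value, 3) for value in row])
```

Only the two inner steps (centroids, then coefficients) are repeated; choosing the number of clusters and the random initial assignment happen once, before the loop.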

Also, aren't c-means and k-means just different names for the same thing? If so, can they be made consistent throughout?

The c-means clustering relates only to the fuzzy logic clustering algorithm. You could say that k-means is the convergence of c-clustering with ordinary logic, rather than fuzzy logic.

Remove or update grid-based clustering?

The grid-based clustering section has no real references and is poorly described in comparison to the rest of the article.

@@ Line 1: / Line 1: @@
-{{talkheader}}
+{{Talk header}}
 {{notice|{{Graph:PageViews|365}}|heading=Daily page views |center=y |image=Open data small color.png}}
+{{WikiProject banner shell|class=C|
-{{WikiProjectBannerShell|
+=
-={{WPDATABASE|importance=high|class=C}}
-{{WikiProject Computer science|importance=high|class=C}}
+{{WikiProject Databases|importance=high}}
-{{WikiProject Robotics|class=C|importance=mid|attention=yes}}
+{{WikiProject Computer science|importance=high}}
-{{WPStatistics|importance=high|class=C}}
+{{WikiProject Robotics|importance=mid|attention=yes}}
+{{WikiProject Statistics|importance=high}}
-}}
-{{User:MiszaBot/config
-| algo=old(365d)
-| archive=Talk:Cluster analysis/Archive %(counter)d
-| counter=1
-| maxarchivesize=75K
-| archiveheader={{Automatic archive navigator}}
-| minthreadsleft=5
-| minthreadstoarchive=1
 }}
 {{Copied
@@ Line 33: / Line 25: @@
 |diff4        = http://en.wikipedia.org/enwiki/w/index.php?title=Cluster_analysis&diff=453684361&oldid=453662528
+}}
+{{User:MiszaBot/config
+| algo=old(365d)
+| archive=Talk:Cluster analysis/Archive %(counter)d
+| counter=1
+| maxarchivesize=75K
+| archiveheader={{Automatic archive navigator}}
+| minthreadsleft=5
+| minthreadstoarchive=1
 }}
 {{backwardscopy|url=http://files.aiscience.org/journal/article/html/70110028.html|title=What is Data Mining Methods with Different Group of Clustering and Classification|org=American Institute of Science, American Journal of Mobile Systems, Applications and Services|year=2015|monthday=October|comments=The authors even copied the sentence: 'An overview of algorithms explained in Wikipedia can be found in the list of statistics algorithms.', and the content on Wikipedia significantly predates this publication.}}
-{{ConfirmationOTRS|source=https://github.com/eXascaleInfolab/clubmark/tree/master/docs|otrs=2019021110001288|license=dual|note=Also available under [https://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0] and [https://www.apache.org/licenses/LICENSE-2.0 Apache 2.0]}}
+{{Ticket confirmation|source=https://github.com/eXascaleInfolab/clubmark/tree/master/docs|id=2019021110001288|license=dual|note=Also available under [https://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0] and [https://www.apache.org/licenses/LICENSE-2.0 Apache 2.0]}}
-{{dashboard.wikiedu.org assignment | course = Wikipedia:Wiki_Ed/New_York_University/Research_Process_and_Methodology_-_RPM_FA_2020_-_MASY1-GC_1260_200_Thu_(Fall_2020) | assignments = [[User:Rc4230|Rc4230]] | start_date = 2020-09-06 | end_date = 2020-12-06 }}
 == Inifinity-norm ==
@@ Line 87: / Line 87: @@
 The c-means clustering relates only to the fuzzy logic clustering algorithm. You could say that k-means is teh convergence of c-clustering with ordinary logic, rather than fuzzy logic.
-== What's the point of cluster analysis? ==
+== Remove or update grid-based clustering? ==
+The grid-based clustering section has no real references and poorly described in comparison to the rest of the article.
-Could someone the statistical field include a line or two in the intro (or elsewhere) that explains the purpose of the cluster analysis? The "What" and "How" is explained to a good extent but I can't find the "why" anywhere.  Given it's use in machine learning and data mining, I think it would be timely to include the reasons.
-[[User:Economicactvist|Economicactvist]] ([[User talk:Economicactvist|talk]]) 08:26, 25 June 2019 (UTC)