Nilsimsa Hash
Revision as of 14:08, 23 February 2014
Nilsimsa is an anti-spam-focused locality-sensitive hashing algorithm originally proposed by the cmeclax remailer operator in 2001[1] and later reviewed by Damiani et al. in their 2004 paper "An Open Digest-based Technique for Spam Detection".[2] The goal of Nilsimsa is to generate a hash digest of an email message such that the digests of two similar messages are similar to each other. Unlike cryptographic hash functions such as SHA-1 or MD5, where a small modification to a document produces a completely different hash, a small modification does not substantially change the resulting Nilsimsa digest. Nilsimsa satisfies three requirements outlined by the paper's authors:
- The digest identifying each message should not vary significantly (sic) for changes that can be produced automatically.
- The encoding must be robust against intentional attacks.
- The encoding should support an extremely low risk of false positives.
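To make the contrast with cryptographic hashes concrete, the following is a minimal Python sketch of a Nilsimsa-style digest, not the reference algorithm. It loosely follows the structure of Nilsimsa as reviewed by Damiani et al.: trigrams drawn from a sliding five-byte window are counted into 256 accumulators, a digest bit is set wherever an accumulator exceeds the mean count, and two digests are scored as 128 minus the number of differing bits. It assumes two simplifications: the trigram-to-slot mapping uses BLAKE2b truncated to one byte instead of Nilsimsa's fixed transition table, and the set of trigram combinations is chosen for illustration. Its digests therefore will not match those of real Nilsimsa implementations. The names `lsh_digest` and `similarity` are hypothetical.

```python
import hashlib
from itertools import combinations


def _bucket(a: int, b: int, c: int, salt: int) -> int:
    """Map a trigram (plus its combination index) to one of 256 slots.

    Assumption: real Nilsimsa uses a fixed 256-byte transition table here;
    BLAKE2b truncated to one byte is a hypothetical stand-in.
    """
    return hashlib.blake2b(bytes((a, b, c, salt)), digest_size=1).digest()[0]


def lsh_digest(data: bytes) -> int:
    """Return a 256-bit Nilsimsa-style digest as an integer (sketch only)."""
    acc = [0] * 256
    total = 0
    for i in range(4, len(data)):
        cur = data[i]
        prev = data[i - 4:i]  # the four bytes preceding `cur`
        # Combine the current byte with each pair of preceding bytes
        # (6 trigrams per position; an illustrative choice).
        for salt, (j, k) in enumerate(combinations(range(4), 2)):
            acc[_bucket(cur, prev[j], prev[k], salt)] += 1
            total += 1
    threshold = total / 256  # mean count per accumulator slot
    digest = 0
    for slot, count in enumerate(acc):
        if count > threshold:
            digest |= 1 << slot
    return digest


def similarity(d1: int, d2: int) -> int:
    """Score in [-128, 128]: 128 minus the number of differing digest bits."""
    return 128 - bin(d1 ^ d2).count("1")


if __name__ == "__main__":
    msg = b"Dear customer, your account has been selected for a special offer. " * 4
    tweaked = msg.replace(b"special", b"spezial", 1)  # one-byte substitution
    other = b"Minutes of the weekly engineering sync: release moved to Friday. " * 4
    d_msg, d_tweak, d_other = map(lsh_digest, (msg, tweaked, other))
    print("tweaked:", similarity(d_msg, d_tweak))  # stays high (>= 92 in this sketch)
    print("other:  ", similarity(d_msg, d_other))  # typically much lower
    # A cryptographic hash of the tweaked message bears no resemblance.
    print(hashlib.sha1(msg).hexdigest())
    print(hashlib.sha1(tweaked).hexdigest())
```

In this sketch a one-byte substitution keeps the message length, and hence the threshold, unchanged, and alters at most 18 trigrams, each moving one count between at most two slots. At most 36 digest bits can therefore flip, so the score cannot fall below 92. The two SHA-1 digests, by contrast, are unrelated.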
Jesse Kornblum took Nilsimsa similarity matching into consideration when developing his fuzzy hashing technique in 2006,[3] which used the algorithms of spamsum by Andrew Tridgell (2002).[4]
Several implementations of Nilsimsa exist as open-source software.[5][6][7]
References
- ^ cmeclax remailer operator (10 February 2002). "Nilsimsa v.0.2.4". Archived from the original on 7 July 2005. Retrieved 23 February 2014.
- ^ Damiani et al. (2004). "An Open Digest-based Technique for Spam Detection" (PDF). Retrieved 1 September 2013.
- ^ Jesse Kornblum (15 May 2008). "The Fuzzy Hashing Patent". LiveJournal. Retrieved 23 February 2014.
- ^ Jesse Kornblum (2006). "Identifying almost identical files using context triggered piecewise hashing" (PDF). DFRWS. Retrieved 23 February 2014.
- ^ "py-nilsimsa - Python port of Nilsimsa locality-sensitive hash - Google Project Hosting". Code.google.com. Retrieved 1 September 2013.
- ^ "Nilsimsa". Nilsimsa.rubyforge.org. Retrieved 1 September 2013.
- ^ "Digest::Nilsimsa". metacpan.org. Retrieved 1 September 2013.