Nilsimsa Hash: Difference between revisions

From Wikipedia, the free encyclopedia
Trek00 (talk | contribs)
clarified the original source and added references
Trek00 (talk | contribs)
m better source
Revision as of 14:08, 23 February 2014

Nilsimsa is an anti-spam focused locality-sensitive hashing algorithm originally proposed by the cmeclax remailer operator in 2001[1] and then reviewed by Damiani et al. in their 2004 paper titled "An Open Digest-based Technique for Spam Detection".[2] The goal of Nilsimsa is to generate a hash digest of an email message such that the digests of two similar messages are similar to each other. In contrast to cryptographic hash functions such as SHA-1 or MD5, a small modification to a document does not substantially change its Nilsimsa digest. Nilsimsa satisfies three requirements outlined by the paper's authors:

  1. The digest identifying each message should not vary significantly (sic) for changes that can be produced automatically.
  2. The encoding must be robust against intentional attacks.
  3. The encoding should support an extremely low risk of false positives.
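
The contrast with cryptographic hash functions described above can be illustrated with a small sketch. The `toy_digest` function below is a hypothetical stand-in written for this example, not the actual Nilsimsa algorithm: it simply tallies byte trigrams into 256 buckets and sets a bit for each bucket whose count exceeds the mean. The bucket scheme, the digest width and the sample messages are all assumptions made for illustration.

```python
import hashlib

DIGEST_BITS = 256  # toy digest width; an assumption of this sketch


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def toy_digest(text: str) -> bytes:
    """Hypothetical locality-sensitive digest (NOT the real Nilsimsa algorithm)."""
    data = text.encode("utf-8")
    counts = [0] * DIGEST_BITS
    # Tally every 3-byte window into one of 256 buckets.
    for i in range(len(data) - 2):
        bucket = hashlib.sha1(data[i:i + 3]).digest()[0]  # first byte: 0..255
        counts[bucket] += 1
    # A bucket's bit is set when its count exceeds the mean count.
    threshold = sum(counts) / DIGEST_BITS
    out = bytearray(DIGEST_BITS // 8)
    for idx, count in enumerate(counts):
        if count > threshold:
            out[idx // 8] |= 1 << (idx % 8)
    return bytes(out)


# Two sample messages that differ in a single character.
msg_a = "Buy cheap watches now, limited offer ends today!"
msg_b = "Buy cheap watchez now, limited offer ends today!"

sha_diff = bit_difference(hashlib.sha1(msg_a.encode()).digest(),
                          hashlib.sha1(msg_b.encode()).digest())
toy_diff = bit_difference(toy_digest(msg_a), toy_digest(msg_b))
print(f"SHA-1 bits changed: {sha_diff} of 160")
print(f"toy digest bits changed: {toy_diff} of {DIGEST_BITS}")
```

Substituting one byte alters at most three trigrams, so at most six buckets change their count and the toy digest differs in at most six bits. A cryptographic hash such as SHA-1 is designed so that the same edit flips roughly half of its output bits.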

Jesse Kornblum took Nilsimsa similarity matching into consideration when developing fuzzy hashing in 2006,[3] which used the algorithms of spamsum by Andrew Tridgell (2002).[4]

Several implementations of Nilsimsa exist as open-source software.[5][6][7]

References

  1. ^ cmeclax remailer operator (10 February 2002). "Nilsimsa v.0.2.4". Archived from the original on 7 July 2005. Retrieved 23 February 2014.
  2. ^ Damiani et al. (2004). "An Open Digest-based Technique for Spam Detection" (PDF). Retrieved 1 September 2013.
  3. ^ Jesse Kornblum (15 May 2008). "The Fuzzy Hashing Patent". LiveJournal. Retrieved 23 February 2014.
  4. ^ Jesse Kornblum (2006). "Identifying almost identical files using context triggered piecewise hashing" (PDF). DFRWS. Retrieved 23 February 2014.
  5. ^ "py-nilsimsa - Python port of Nilsimsa locality-sensitive hash - Google Project Hosting". Code.google.com. Retrieved 1 September 2013.
  6. ^ "Nilsimsa". Nilsimsa.rubyforge.org. Retrieved 1 September 2013.
  7. ^ "Digest::Nilsimsa". metacpan.org. Retrieved 1 September 2013.