
User:Finkga/Cyber Analytics: Difference between revisions

From Wikipedia, the free encyclopedia
{{inactive userpage blanked}}
{{Draft}}
<!-- This is a draft article -->

= Cyber Analytics =
[[File:Cyber_Analytics_graphic_v7.jpg|thumb|right|Cyber analytics is a science that supports the analysis of cyber data.]]
Cyber analytics is a branch of [[Analytics | analytics]] that applies to the domain of [[computers]], [[computer network | networks]], and related [[computer data | data]]. It is the science of analysis applied to computers and computer networks. [[Analysis]] is decision-making based on observable facts (data), and a scientific approach compares those observations to hypothetical models. Thus, cyber analytics helps the [[analyst]] understand the behavior of computers, networks, and users from the data that computer systems use and generate. In short, cyber analytics tells the story behind cyber data. It can be used to support [[computer security]], computer or network administration, auditing, and many other application areas. An alternative definition is the systematic derivation of behavior from data.

== Derivation ==
Cyber analytics assumes there is a unifying story behind the fractured set of available data. In fact, there are many different stories interwoven through the various streams of data. Which story is seen also depends on the purposes and perspective of the analyst. The cyber analyst's job includes [[synthesis]] of these separate streams, [[abductive reasoning | abduction]] of [[hypothesis | hypotheses]] that may explain them, and [[analysis]] of the hypotheses by comparing them to the data. Thus, cyber analytics is the science of investigating the meaning of computer data. A more accurate term might be '''cyber investigation''', but this connotes [[law enforcement]], which is only one possible application area. "Analytics" is more neutral and is therefore preferred.

== Distinctive Features of Cyber Analytics ==
All analytic sciences support analysts who must make sense of massive, streaming data. Cyber analytics differs from other analytics primarily because of the characteristics of the data and of the analysts who use it.
Computer and network data is generated by simpler processes than most textual data, so we would expect it to have lower [[entropy (information theory) | entropy]] than data of human origin. However, cyber data is generated in extremely high volumes and at extremely high velocities. It is therefore streaming data that cannot easily be stored for long periods of time for off-line analysis. Cyber analysts often come from [[system administrator | system administration]], [[programmer | programming]], or other technical backgrounds rather than from [[statistics]], where formal data analysis is taught. As a result, they often have their own approach to analysis based on subject matter expertise.<ref>Fink GA, North CL, Endert A, and Rose SJ, “Visualizing Cyber Security: Usable Workspaces.” In Proceedings of the 2009 Workshop on Visualization for Computer Security (VizSEC 2009).</ref>
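
The entropy comparison above can be made concrete with a simple estimator. The following is a minimal sketch, not a method taken from the cited work: it computes the empirical (zeroth-order) Shannon entropy of a byte string in bits per byte. The function name <code>shannon_entropy</code> and the sample records are hypothetical and were invented for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the empirical Shannon entropy of `data` in bits per byte.

    This is a zeroth-order estimate: it looks only at how often each
    byte value occurs and ignores ordering and repetition across records.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over observed byte values of -p * log2(p)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical samples, invented for illustration only.
machine_sample = b"0001 0002 0003 0004 0005 0006 0007 0008"
human_sample = b"Which story is seen depends on the analyst."

print(f"machine-like sample: {shannon_entropy(machine_sample):.2f} bits/byte")
print(f"human-like sample:   {shannon_entropy(human_sample):.2f} bits/byte")
```

Because this estimate ignores the repetition between successive records, it tends to overstate the entropy of highly regular log data. An estimator that models context would be needed to test the claim rigorously.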

Cyber analytics supports analysis, but not necessarily for human analysts. [[Machine learning]] techniques can inform automated analysis and response facilities that might not require human intervention at all. In contrast, [[Visual Analytics]] is more human-centric, with the human user essential as the consumer of the visualizations. Cyber analytics can be applied to [[forensics | forensic]] investigations or to the prediction of future events. The latter is similar to [[Predictive analytics]], which is used mostly in business.

The key difference between cyber analytics for computer security and other forms of analysis is the essentially adversarial nature of the analysis.

=== Cyber data ===
Cyber data is characterized by an extreme volume and velocity of highly structured data, most of which is not meant for humans to read. For example, Fink cites 500 million events as the daily volume of security-related log events that DOE passes up the chain for central analysis.<ref>Fink GA, McKinnon AD, Clements S, and Frincke DA, "Tensions in security collaboration goals and how this affects incident detection and response," chapter three in <i>Collaborative Cyber Security and Trust Management</i>, IGI Global, to appear.</ref> Although cyber data is not normally human-readable, many log formats (such as syslog<ref>http://www.ietf.org/rfc/rfc5424.txt?number=5424</ref>) contain human-readable content. The data is typically structured according to some machine-oriented protocol, but the protocols used may be non-standard, or may be proprietary implementations of standard protocols that do not fully interoperate.
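
As an illustration of how a machine-oriented log format mixes structured and human-readable content, the sketch below parses the header of a syslog message in the RFC 5424 layout, where the leading priority value encodes facility × 8 + severity. It is a simplified sketch under stated assumptions: the names <code>SyslogHeader</code> and <code>parse_header</code> are hypothetical, the structured-data and message fields are left unparsed, and the sample line is modeled on the style of the examples in the RFC.

```python
import re
from typing import NamedTuple, Optional

# Simplified RFC 5424 header: <PRI>VERSION TIMESTAMP HOSTNAME APP PROCID MSGID rest
HEADER = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)",
    re.DOTALL,
)


class SyslogHeader(NamedTuple):
    facility: int
    severity: int
    version: int
    timestamp: str
    hostname: str
    app_name: str
    procid: str
    msgid: str
    rest: str  # structured data and free-text message, left unparsed here


def parse_header(line: str) -> Optional[SyslogHeader]:
    """Parse the header fields of one syslog line, or return None."""
    m = HEADER.match(line)
    if m is None:
        return None
    pri = int(m["pri"])
    if pri > 191:  # largest valid value: facility 23, severity 7
        return None
    facility, severity = divmod(pri, 8)  # PRI = facility * 8 + severity
    return SyslogHeader(
        facility, severity, int(m["version"]), m["timestamp"],
        m["hostname"], m["app"], m["procid"], m["msgid"], m["rest"],
    )


sample = ("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
          "'su root' failed for lonvick on /dev/pts/8")
header = parse_header(sample)
print(header)
```

In the sample, the priority value 34 decodes to facility 4 and severity 2. Only the trailing message text is meant for a human reader.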

The high velocity of cyber log data makes it impractical to store in full for long periods.{{citation needed}}

=== Cyber analysts ===

== The Need for Cyber Analytics ==
DOE cyber analysts must maintain near real-time situational awareness of a widely dispersed enterprise with over 100 sites, 500 thousand machines, and nearly 500 million events daily. The number of daily events is expected to soar into the billions in the near future. To maintain the safety of the DOE infrastructure, analysts must be able to gain a nation-wide perspective within seconds to minutes of a major event.
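
To give a sense of scale, the figures above can be turned into average rates. The short calculation below is only a back-of-the-envelope sketch: it treats the rounded figures in the text ("over 100 sites", "500 thousand machines", "nearly 500 million events") as exact and assumes events are spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Rounded figures taken from the text and treated as exact for illustration.
events_per_day = 500_000_000
sites = 100
machines = 500_000

events_per_second = events_per_day / SECONDS_PER_DAY
events_per_site_per_day = events_per_day / sites
events_per_machine_per_day = events_per_day / machines

print(f"average events per second:      {events_per_second:,.0f}")
print(f"average events per site per day: {events_per_site_per_day:,.0f}")
print(f"average events per machine/day:  {events_per_machine_per_day:,.0f}")
```

On these assumptions the enterprise averages roughly 5,800 events per second, or about 1,000 events per machine per day.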

Analysis centers ask trending questions such as, “Are attacks becoming more effective?”, “Are attackers becoming more sophisticated?”, and “Are defenders improving their defensive posture?”. They must also answer key agency questions such as, “What resources is this external IP address accessing?”, and “Can you characterize the sites nation X is interested in?”.

Cyber analysts need tools for automated pattern extraction and recognition to track and monitor interesting events and to show how bit patterns form indicators of behavioral patterns. They need predictive tools to support timely adaptation. For instance, they need the ability to detect the probes that form precursors of full-blown attacks. DOE cyber analysts need to be able to extend lessons learned at one site across the enterprise and to mitigate the effects of attacks before they happen.
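
One simple way to flag probe-like behavior is to count how many distinct targets a single source touches within a short time window. The sketch below illustrates that idea only and does not describe any tool used by DOE. The function <code>detect_probes</code>, the event tuple layout, the 60-second window, and the threshold of 20 are all hypothetical choices, and the addresses come from ranges reserved for documentation.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event layout: (timestamp_seconds, src_ip, dst_ip, dst_port)
Event = Tuple[float, str, str, int]
Target = Tuple[str, int]


def detect_probes(
    events: Iterable[Event], window: float = 60.0, threshold: int = 20
) -> Iterator[Tuple[float, str, int]]:
    """Yield (timestamp, src_ip, distinct_targets) for each event at which a
    source has touched at least `threshold` distinct (dst_ip, dst_port) pairs
    within the last `window` seconds. Events must arrive in timestamp order.
    """
    recent: Dict[str, Deque[Tuple[float, Target]]] = defaultdict(deque)
    for ts, src, dst, port in events:
        queue = recent[src]
        queue.append((ts, (dst, port)))
        # Drop entries that have fallen out of the sliding window.
        while queue and queue[0][0] < ts - window:
            queue.popleft()
        distinct = len({target for _, target in queue})
        if distinct >= threshold:
            yield ts, src, distinct


# Invented traffic: one source sweeps 25 ports, another repeats one request.
scan = [(float(i), "203.0.113.5", "192.0.2.10", 1000 + i) for i in range(25)]
normal = [(float(i), "198.51.100.7", "192.0.2.10", 443) for i in range(25)]
alerts = list(detect_probes(sorted(scan + normal)))
flagged = {src for _, src, _ in alerts}
print(flagged)
```

This sketch keeps per-source state indefinitely and raises an alert on every event past the threshold. A production detector would need to expire idle sources and suppress duplicate alerts, and a patient adversary can evade any fixed window by probing slowly.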

== Challenges ==
Cyber analytics is a new science that needs the rigor of standard procedures for measurement, repeatability, and prediction. Reference data sets and test suites can enable fair comparison of competing methods. Unfortunately, realistic cyber data is typically highly sensitive. We need [[Anonymization#Anonymity_on_the_Internet | anonymization]] methods that preserve the security properties of collected data without compromising the privacy of the providers.

Cyber analytics spans multiple scales, from processors and processes, to computers, routers, and other devices, to networks and internetworks.

Cyber analytics will enable predictive and adaptive approaches that improve defenders’ situational awareness and help analysts react in a timely manner. Human-guided automated response is needed for Internet-speed attacks. Large-scale collaboration in cyber defense requires very broad, nontraditional command and control strategies. Finally, defenders need to learn to use deception and to detect deception by attackers.

== Tools ==
Cyber analysis tools and methods must be sensitive to the needs of the analyst so that they enable sense-making without forcing the analyst toward particular conclusions or uses of the data.

==References==
{{reflist}}

--[[User:Finkga|Finkga]] ([[User talk:Finkga|talk]]) 20:08, 17 July 2009 (UTC)

Latest revision as of 21:21, 27 April 2018