User:Finkga/Cyber Analytics
Cyber Analytics
Cyber analytics is a branch of analytics that applies to the domain of computers, networks, and related data: the science of analysis applied to computers and computer networks. Analysis is decision-making based on observable facts (data), and a scientific approach compares those observations to hypothetical models. Cyber analytics thus helps the analyst understand the behavior of computers, networks, and users from the data that computer systems use and generate; it tells the story behind cyber data. It can support computer security, computer or network administration, auditing, and many other application areas. An alternative definition is the systematic derivation of behavior from data[1].
Derivation
Cyber analytics assumes there is a unifying story behind the fractured set of available data. In fact, many different stories are interwoven through the various streams of data, and the purposes, perspective, and biases of the analyst may determine which story he or she seeks. The cyber analyst's job includes synthesis of these separate streams, abduction of hypotheses that may explain them, and analysis of those hypotheses by comparing them to the data. Cyber analytics is thus the science of investigating the meaning of computer data. A more accurate term might be cyber investigation, but that term connotes law enforcement, which is only one possible application area; analytics is more neutral and is therefore preferred.
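Comparing hypotheses to data can be made concrete with a likelihood score. The sketch below is illustrative only: it assumes each hypothesis can be expressed as an expected event rate and that per-minute counts follow a Poisson model. The data, the hypothesis names, and the functions `poisson_log_likelihood` and `rank_hypotheses` are hypothetical and are not a method described in the sources.

```python
import math


def poisson_log_likelihood(counts, rate):
    """Log-likelihood of per-interval event counts under a Poisson(rate) model."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def rank_hypotheses(counts, hypotheses):
    """Return (name, log-likelihood) pairs, best-supported hypothesis first.

    hypotheses maps a hypothesis name to its assumed expected count per interval.
    """
    scored = [(name, poisson_log_likelihood(counts, rate))
              for name, rate in hypotheses.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: failed logins per minute observed on one host.
    observed = [37, 41, 45, 40, 39]
    # Hypothetical competing explanations, each expressed as an expected rate.
    hypotheses = {"routine user error": 2.0, "password-guessing attack": 40.0}
    for name, score in rank_hypotheses(observed, hypotheses):
        print(f"{name}: {score:.1f}")
```

A real investigation would weigh many more hypotheses and richer models, but the pattern of scoring each candidate explanation against the same observations is the same.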
Distinctive Features of Cyber Analytics
All analytic sciences support analysts who must make sense of massive (and often streaming) data. Cyber analytics differs from the others primarily in the characteristics of its data and of the analysts who use it. Computer and network data is generated by simpler processes than most textual data,[citation needed] so it should have lower entropy than data of human origin. However, cyber data is generated at extremely high volume and velocity, and much of it arrives as streams that cannot easily be stored for long periods for off-line analysis. Cyber analysts often come from system administration, programming, or other technical backgrounds rather than from statistics, where formal data analysis is taught, and they often bring their own approach to analysis based on subject-matter expertise[2].
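The entropy claim above can be checked empirically. The sketch below computes the empirical Shannon entropy of a string's character distribution in bits per symbol; character-level entropy is only a crude proxy (it ignores structure across symbols), and the sample strings and the function name `shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Empirical Shannon entropy of the character distribution, in bits per symbol."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical samples: one machine-generated log line, one human sentence.
    log_line = "2010-06-01T12:00:00Z fw01 DROP TCP 10.0.0.5:4431 -> 10.0.0.9:22"
    sentence = "The analyst suspected the late-night logins were not routine."
    print(f"log line: {shannon_entropy(log_line):.2f} bits/char")
    print(f"sentence: {shannon_entropy(sentence):.2f} bits/char")
```

Meaningful comparisons require large corpora of each kind of data, not single lines.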
Cyber analytics supports analysis, but not necessarily analysis by humans. Machine learning techniques can inform automated analysis and response facilities that might not require human intervention at all. In contrast, visual analytics is more human-centric: the human user is essential as the consumer of the visualizations. Cyber analytics can be applied to forensic investigations or to predicting future events; the latter is similar to predictive analytics, which is used mostly in business.
The key difference between cyber analytics for computer security and other forms of cyber analytics is the essential adversarial nature of security[3].
Cyber data
Cyber data is characterized by an extreme volume and velocity of highly structured data that is mostly not intended for humans to read. For example, Fink et al. report that the U.S. Department of Energy (DOE) passes 500 million security-related log events per day up the chain for central analysis[4]. Although many log formats (such as syslog[5]) contain human-readable content, the data is typically structured according to some machine-oriented protocol, and the protocols used may be non-standard, or may be proprietary implementations of standard protocols that do not fully interoperate.
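As an illustration of machine-oriented structure wrapped around human-readable text, the sketch below parses the header of a syslog message in the RFC 5424 format (`<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG`). It is a simplified, hypothetical parser (`parse_rfc5424` is not a library function): it does not handle escaped `]` characters inside structured-data values or the optional UTF-8 byte-order mark before the message.

```python
import re

# Simplified RFC 5424 layout; STRUCTURED-DATA is "-" or one or more [...] elements.
SYSLOG_5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,3}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[[^\]]*\])+)"
    r"(?: (?P<msg>.*))?$"
)


def parse_rfc5424(line):
    """Return a dict of header fields, or None if the line does not match."""
    m = SYSLOG_5424.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # PRI encodes facility * 8 + severity.
    rec["facility"], rec["severity"] = divmod(int(rec.pop("pri")), 8)
    # The NILVALUE "-" means the field is absent.
    for key in ("timestamp", "hostname", "app_name", "procid", "msgid", "sd"):
        if rec[key] == "-":
            rec[key] = None
    return rec


if __name__ == "__main__":
    print(parse_rfc5424("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com "
                        "su - ID47 - 'su root' failed for lonvick on /dev/pts/8"))
```

Only the trailing `msg` field is meant for people; every other field is positional and machine-oriented.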
The high velocity of cyber log data makes it impractical to store in full[6]. Because new data arrives continuously, static snapshots are impractical; instead, analysis must be incremental, with only partial indexing[7].
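One classic one-pass technique for such streams is the Misra-Gries frequent-items summary, used here purely as an illustration rather than as a method from the cited sources. It keeps at most `k - 1` counters no matter how long the stream runs, and any item occurring more than n/k times in a stream of n items is guaranteed to survive, with its count underestimated by at most n/k. The function name and the idea of feeding it source addresses are assumptions.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k - 1 counters.

    Any item with true frequency > n/k (n = stream length) is retained;
    retained counts underestimate true counts by at most n/k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical stream of source addresses from a connection log.
    stream = ["10.0.0.5"] * 60 + [f"10.0.1.{i}" for i in range(40)]
    print(misra_gries(stream, k=3))
```

The memory bound is what makes the approach suitable when the raw stream cannot be kept; the price is approximate counts.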
Cyber analysts
The Need for Cyber Analytics
DOE cyber analysts must maintain near-real-time situational awareness of a widely dispersed enterprise with over 100 sites, 500,000 machines, and nearly 500 million events daily. The number of daily events is expected to soar into the billions in the near future. To maintain the safety of the DOE infrastructure, analysts must be able to gain a nation-wide perspective within seconds to minutes of a major event.
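A quick back-of-the-envelope calculation puts the figures above in perspective: 500 million events per day averages roughly 5,800 events per second, or about 1,000 events per machine per day across 500,000 machines. The helper below is a hypothetical sketch; it computes averages only, and real traffic is bursty, so peak rates are higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rates(events_per_day, machines):
    """Average event rates implied by a daily event volume (ignores burstiness)."""
    return {
        "events_per_second": events_per_day / SECONDS_PER_DAY,
        "events_per_machine_per_day": events_per_day / machines,
    }


if __name__ == "__main__":
    print(average_rates(500_000_000, 500_000))
    # Growth into the billions of events per day raises the average accordingly.
    print(average_rates(2_000_000_000, 500_000))
```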
Analysis centers ask trend questions such as “Are attacks becoming more effective?”, “Are attackers becoming more sophisticated?”, and “Are defenders improving their defensive posture?” They must also answer key agency questions such as “What resources is this external IP address accessing?” and “Can you characterize the sites nation X is interested in?”
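A question such as “What resources is this external IP address accessing?” reduces to a filter-and-aggregate query over flow records. The sketch below assumes a hypothetical `Flow` record with source, destination, port, and byte count; real flow formats carry more fields, and at DOE scale the same query would run against an indexed store rather than a Python list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical, minimal flow record."""
    src_ip: str
    dst_ip: str
    dst_port: int
    num_bytes: int


def resources_accessed(flows, external_ip):
    """Total bytes per (dst_ip, dst_port) contacted by one source, largest first."""
    usage = Counter()
    for f in flows:
        if f.src_ip == external_ip:
            usage[(f.dst_ip, f.dst_port)] += f.num_bytes
    return usage.most_common()


if __name__ == "__main__":
    flows = [
        Flow("203.0.113.7", "10.1.2.3", 443, 1200),
        Flow("203.0.113.7", "10.1.9.9", 22, 300),
        Flow("198.51.100.1", "10.1.2.3", 443, 5000),
    ]
    print(resources_accessed(flows, "203.0.113.7"))
```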
Cyber analysts need tools for automated pattern extraction and recognition to track and monitor interesting events and to show how bit patterns form indicators of behavioral patterns. They need predictive tools to support timely adaptation; for instance, they need the ability to detect the probes that are precursors of full-blown attacks. DOE cyber analysts need to be able to extend lessons learned at one site across the enterprise and to mitigate the effects of attacks before they happen.
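A simple example of probe detection is a threshold heuristic for port scans: flag any source that touches many distinct destination ports within a sliding time window. This is a common textbook heuristic, not a method taken from the sources, and the event tuple layout, the window length, and the threshold are all hypothetical. Slow or distributed scans evade it, which is one reason predictive tools remain a research need.

```python
from collections import defaultdict, deque


def detect_port_scans(events, window_seconds=60, port_threshold=20):
    """Flag sources touching >= port_threshold distinct ports within window_seconds.

    events: iterable of (timestamp_seconds, src_ip, dst_ip, dst_port), sorted by time.
    Returns the set of flagged source addresses.
    """
    recent = defaultdict(deque)  # src_ip -> (timestamp, dst_port) pairs still in window
    flagged = set()
    for ts, src, _dst, port in events:
        window = recent[src]
        window.append((ts, port))
        # Evict this source's events that have fallen out of the sliding window.
        while window and window[0][0] <= ts - window_seconds:
            window.popleft()
        if len({p for _, p in window}) >= port_threshold:
            flagged.add(src)
    return flagged


if __name__ == "__main__":
    scan = [(i, "192.0.2.66", "10.0.0.1", 1000 + i) for i in range(25)]
    normal = [(i, "10.0.0.8", "10.0.0.1", 443) for i in range(25)]
    print(detect_port_scans(sorted(scan + normal)))
```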
Challenges
Cyber analytics is a new science that needs the rigor of standard procedures for measurement, repeatability, and prediction. Reference data sets and test suites can provide fair comparison of competing methods. Unfortunately, realistic cyber data is typically highly sensitive, so anonymization methods are needed that preserve the security properties of collected data without compromising the privacy of its providers.
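One known family of such methods is prefix-preserving IP address anonymization: two addresses that share their first k bits map to pseudonyms that share exactly their first k bits, so subnet structure survives while real addresses are hidden. The sketch below follows the construction used by schemes such as Crypto-PAn (output bit i is the original bit XORed with a keyed pseudorandom function of the preceding bits), but substitutes HMAC-SHA-256 for Crypto-PAn's AES-based function; `anonymize_ipv4` and the key are hypothetical, and this is not a vetted implementation.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(address: str, key: bytes) -> str:
    """Keyed, deterministic, prefix-preserving pseudonym for an IPv4 address."""
    original = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        # The first i bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Flip bit depends only on the prefix length and prefix value, so
        # addresses sharing a prefix receive identical flips over that prefix.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


if __name__ == "__main__":
    key = b"hypothetical-secret-key"
    for addr in ("192.168.1.10", "192.168.1.200", "10.0.0.1"):
        print(addr, "->", anonymize_ipv4(addr, key))
```

Because the mapping is keyed, data providers can share anonymized traces while keeping the key private.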
Cyber analytics spans multiple scales from processors and processes to computers, routers, and other devices to networks and internetworks.
Cyber analytics will enable predictive and adaptive approaches that improve defenders’ situational awareness and help analysts react in a timely manner. Human-guided automated response is needed for Internet-speed attacks. Large-scale collaboration in cyber defense requires very broad, nontraditional command and control strategies. Finally, defenders need to learn to use deception and to detect deception by attackers.
Tools
Cyber analysis tools and methods must be sensitive to the needs of the analyst so that they enable sense-making without forcing the analyst toward particular conclusions or uses of the data.
References
- ^ Notes from the 2010 San Francisco Bay Area DOE Cyber Grassroots meeting, Analytics subgroup report. Strelitz RA and Fink GA, eds.
- ^ Fink GA, North CL, Endert A, and Rose SJ, “Visualizing Cyber Security: Usable Workspaces.” In Proceedings of the 2009 Workshop on Visualization for Computer Security (VizSEC 2009).
- ^ Dalvi N, Domingos P, Mausam, Sanghai S, and Verma D, "Adversarial Classification." In Proceedings of the 2004 ACM Conference on Knowledge Discovery and Data Mining (KDD 2004), pp. 99-108. ACM Press, 2004.
- ^ Fink GA, McKinnon AD, Clements S, and Frincke DA, "Tensions in security collaboration goals and how this affects incident detection and response," chapter three in Collaborative Cyber Security and Trust Management, IGI Global, to appear.
- ^ Gerhards R, "The Syslog Protocol," RFC 5424, IETF, March 2009. http://www.ietf.org/rfc/rfc5424.txt?number=5424
- ^ Babcock B, Babu S, Datar M, Motwani R, and Widom J, "Models and issues in data stream systems." In Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '02), Madison, Wisconsin, June 3-5, 2002, pp. 1-16. ACM, New York, NY, 2002. DOI: http://doi.acm.org/10.1145/543613.543615
- ^ Delbru R, Toupikov N, Catasta M, Fuller R, and Tummarello G. In Lucene in Action, Second Edition.