Does machine learning for cybersecurity live up to the hype?
Trying to keep intruders out of your network is like trying to keep vermin out of a house in a bad neighborhood. The rodents and insects never stop trying to get in, and the best you can do is slow them down, minimize the damage they do and try to eliminate them once they get inside. Like rats, cyber intruders can find the smallest opening, even one that exists for a very short time. And, like rats, they never stop trying. And as with rats, the best way to find them is to pay attention to anomalies. With rats, it may be odors, or droppings, or noises in the wall. With cyber intruders, the anomalies can be much more subtle. Defending against them is similar to attempting to defend against swarm attacks: vast numbers of armed drones that coordinate their activities without human direction — a physical attack vector that keeps defense planners awake at night.
Because there is so much data to sift through, and because attack vectors change too quickly for a human to keep ahead of them, one would think that machine learning (ML) would be a good way to keep up. A trivial example of this constant change from everyday life is the way spam callers regularly change their false IDs to evade call blockers. ML is, after all, already used in a great many data-rich activities: marketing, facial recognition, image analysis and more.
Supervised versus unsupervised
But how effective is ML for cybersecurity? There is plenty of hype about the ability of ML systems to detect spam and malware, but much less about their ability to detect other types of attacks. This is perhaps because ML has been much less successful against those other attacks. The big difference between the two cases, says Raffael Marty, vice president of corporate strategy at Forcepoint, is the difference between supervised and unsupervised learning.
In a post to Towards Data Science, Marty distinguishes between supervised (where there is “good, labeled data”) and unsupervised (unlabeled, unstructured data) cases. Learning is easier when there are examples to study. Supervised learning works well in detecting spam and malware, he says, because there are good data sets for those, but it does poorly where the data is unlabeled, as in anomaly detection and risk scoring. And for unsupervised learning, he continues, the challenge is even greater.
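The distinction Marty draws can be illustrated with a minimal Python sketch. It is not taken from his post: the toy messages, the login counts, the function names and the 3.5 threshold are all assumptions chosen for illustration. The first half learns from labeled examples (a tiny naive Bayes spam filter); the second half has no labels at all and can only flag values that sit far from the rest.

```python
import math
from collections import Counter
from statistics import median

# --- Supervised: labeled examples exist, so the model can learn from them ---
# Toy, made-up training data; the labels are what make this "supervised".
LABELED = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("please review the monday report", "ham"),
]

def train_naive_bayes(examples):
    """Count words per label; this is the entire 'training' step."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.split():
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# --- Unsupervised: no labels, only "what looks unlike everything else?" ---
def flag_anomalies(values, threshold=3.5):
    """Flag values whose robust z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

model = train_naive_bayes(LABELED)
print(classify("free prize now", model))

# Hypothetical daily login counts for one account; nothing says which are bad.
daily_logins = [12, 15, 11, 14, 13, 12, 96, 14]
print(flag_anomalies(daily_logins))
```

The unsupervised half can say only that a value is unusual, not that it is malicious, which is one way to see why anomaly detection and risk scoring are the harder cases.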
Machine learning: useful or not?
“Unfortunately,” says Alexander Polyakov, CTO and co-founder at ERPScan, “machine learning will never be a silver bullet for cybersecurity compared to image recognition or natural language processing, two areas where machine learning is thriving. There will always be a person who tries to find issues in our systems and bypass them. Therefore, if we detect 90% [of] attacks today, new methods will be invented tomorrow. To make things worse, hackers could also use machine learning to carry out their nefarious endeavors.” Microsoft echoes this in its promotional material: “[Hackers] reverse-engineer protections and build systems that support mutations in behavior. They masquerade their activities as noise, and learn quickly from mistakes.”
Stephen Newman, CTO of computer security company Damballa, points out that “the science of ML as it applies to cybersecurity is probably one of the most complex and least understood topics today.” Craig Hinkley, CEO of WhiteHat Security, warns, “AI and ML are unlikely, at least in the near future, to deliver the much-heralded ‘self-healing network.’ The technology does, however, bring to the table a previously unavailable smart layer that forms a critical first-response defense from hackers.”
In the end, AI and ML are just part of a successful defense strategy: “Businesses are best served by combining the AI technology with other tried and true analytics tools,” says CIO Dive editor Alex Hickey, “such as simple pattern matching, statistical methods, and rules and first-order logic.”
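As a rough illustration of that layered approach, the Python sketch below runs a pattern match, a hand-written rule, a simple statistical test and an ML score side by side. Everything in it is assumed for the example: the `Event` fields, the signature list, the business-hours rule, the thresholds, and `ml_score`, which stands in for the output of some separately trained model.

```python
import re
from dataclasses import dataclass
from statistics import mean, pstdev

# Illustrative signatures only; a real list would be far longer.
KNOWN_BAD = re.compile(r"(\.\./|<script|union\s+select)", re.IGNORECASE)

@dataclass
class Event:
    user: str
    request: str
    hour: int        # hour of day, 0-23
    bytes_out: int
    ml_score: float  # hypothetical model output in [0, 1]

def evaluate(event, baseline_bytes, ml_threshold=0.8, z_threshold=3.0):
    """Return the names of every layer that considers the event suspicious."""
    reasons = []
    # Layer 1: simple pattern matching against known-bad strings.
    if KNOWN_BAD.search(event.request):
        reasons.append("pattern")
    # Layer 2: a hand-written rule (admin activity outside 08:00-18:00).
    if event.user == "admin" and not 8 <= event.hour < 18:
        reasons.append("rule")
    # Layer 3: a statistical check of outbound volume against a baseline.
    mu, sigma = mean(baseline_bytes), pstdev(baseline_bytes)
    if sigma > 0 and (event.bytes_out - mu) / sigma > z_threshold:
        reasons.append("statistics")
    # Layer 4: the ML model's opinion, treated as one signal among several.
    if event.ml_score >= ml_threshold:
        reasons.append("ml")
    return reasons

baseline = [1000, 1200, 900, 1100, 1000]
suspicious = Event("admin", "GET /item?id=1 UNION SELECT password", 3, 5000, 0.2)
print(evaluate(suspicious, baseline))
```

In this example the ML score is low, yet the other three layers still flag the event, which is the practical argument for not relying on the model alone.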