Reining in Software Trojan Horses

Deep learning research identifies cybersecurity risks

What’s the easiest way for hackers or spies to penetrate a secured computer network?

Have the network managers open the door and invite them in.

Almost every organization that runs a network buys basic software from third-party vendors. The bad guys have figured out that these vendors offer a way in: break into the supplier’s systems and hide malware inside the software before it is delivered. The software becomes a digital Trojan horse, carrying attackers inside the network’s walls.

That was the strategy behind a huge espionage campaign, first revealed in December 2020, that compromised several major U.S. government agencies, including the Justice Department and the Treasury, as well as private companies including Google and Microsoft. It has been described as one of the largest and most successful digital espionage cases in history.

That’s where Professor Dianxiang Xu comes in. In the SS&C Data Analytics, Cybersecurity and High Performance Computing Facility of the Robert W. Plaster Free Enterprise and Research Center, Xu is using deep learning, a specialized area of artificial intelligence (AI), to help combat the emerging threat. The goal is to use static code analysis, which examines a program’s code without running it, to find potential defects and security vulnerabilities. The work is funded by a National Science Foundation grant.

“Software vulnerability is a major source of cybersecurity risks. It is very difficult to identify vulnerabilities in software code as software has significantly increased in both size and complexity,” Xu says.

“Finding software vulnerabilities is analogous to ‘searching for a needle in a haystack.’ Recent advances in deep learning can be promising for predicting software vulnerabilities.”
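To give a sense of what static code analysis means in practice, here is a minimal sketch in Python. It is a hand-written rule, not the deep learning approach Xu is studying, and the names `RISKY_CALLS` and `find_risky_calls` are hypothetical, chosen only for illustration. The sketch reads a program’s source text and flags direct calls to built-in functions that are often dangerous when given untrusted input, without ever running the program.

```python
import ast

# Hypothetical rule set for illustration: built-ins that are often risky on untrusted input.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each direct call to a risky built-in."""
    tree = ast.parse(source)  # parse only; the analyzed code is never executed
    findings = []
    for node in ast.walk(tree):
        # Match simple calls such as eval(...); attribute calls like obj.eval(...) are ignored.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = '''
user_input = input()
result = eval(user_input)
print(result)
'''

print(find_risky_calls(sample))  # prints [(3, 'eval')]
```

A fixed rule like this catches only the patterns someone thought to write down, which is one reason researchers are exploring models that learn from large amounts of code instead.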

Spies and hackers aren’t the only bad guys Xu is working to combat. He is also studying ways to use AI to collect and process digital evidence for presentation to juries in court.

Xu is basing his network security work on a deep learning model known as The Transformer.

“The Transformer is a deep learning model introduced in 2017, used primarily in the field of natural language processing, or NLP,” he says. “It has enabled training on larger datasets than was possible before it was introduced. The pretrained transformer systems such as BERT (Bidirectional Encoder Representations from Transformers) have achieved state-of-the-art performance on a number of NLP tasks.”

“Considering the similarity and difference between natural languages and programming languages, we expect the transformer systems can be pretrained with a large amount of computer code so as to improve various program understanding tasks, such as detection of vulnerabilities in source code,” he adds.

So, how vital is the anti-spyware research being done by computer scientists such as Xu?

In an article for The New Yorker, Sue Halpern wrote: “The simple truth is that cyber defense is hard, and in a country like the United States, where so much of our critical infrastructure is privately owned, it’s even harder. Every router, every software program, every industrial controller may inadvertently offer a way for malicious actors to enter and compromise a network.”

Inside the Plaster Center, Xu can be found chipping away at those many cyber threats, one model at a time.

