Deep RL Vision Attention Malware Detection

1 minute read

Published:

Introduction

With the proliferation and development of contemporary computer systems, the demand for reliable data protection has reached an all time high. Traditional methods of malware detection no longer provide systems with the security they so vitally need. Attackers have developed novel and advanced methodoglogies to disguise malicious activity and avoid detection, leaving systems valnurable to malicous data acquistion. In specific, polymorphic malware has become increasingly difficult to detect due its dynamic nature. Polymorphic code refers to a species to code that uses a mutation engine to change its code whilst keeping the fundamental algorithm the same. This presents a problem for traditional methods that don’t account for mutations, and rely on firm pattern matching to detect anomalies. When polymorphic malware mutates it becomes increasing difficult to relate it back to its intial state. This neccitates an algorithm/model capable of recognizing and detecting overarching algorithm structure.

All software intitates interaction with hardware components, and at some point software intentions will need to be processed as operational codes, whether the software be goodware or malware, in order to interact with hardware. Operational code data therefore presents us with valuable information about softwatre and software intentions.

In this paper we focus on malware detection using operational code data. Using image processing techniques, operational code data is visually represented. This allows for the analysis of the relationship between nonadjoining instrcutions, which may revel intially obscure software intentions. Essentially opting for the examination of the overarching algortithm structure rather than the direct sequences.

Data Processing

The intial data set contained opcode sequences for multiple programs, both malware and goodware. Operational codes were encoded into numerical representations, each numerical value representing its repsective instruction label. The data was then reshaped to include a height, width and depth, allowing for the visualization of the data.

The data set was visualized using image processing techniques.

Convolutional Neural Network