Proposed Approach
Brief Summary
The proposed model utilizes a deep reinforcement learning architecture to implement a vision attention based binomial malware classification system. The system is composed of multiple processes, each of which contributes a vital function to the whole. The core processes, and the novelties within them, are described in detail below.
Convolutional Neural Network
A convolutional neural network (CNN) is a class of deep neural networks commonly utilized in image analysis and classification. A CNN is composed primarily of convolution layers and pooling layers. Convolution layers apply learned filters to extract features from a raw image, while pooling layers reduce the dimensions of the resulting feature maps, thereby scaling down the number of parameters and minimizing computation. CNNs are able to autonomously learn feature representations from raw data, which allows the model to learn abstract features from a provided training set and apply those representations when classifying testing data.
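As a concrete illustration (not the specific architecture used in this work), a minimal CNN for single-channel images could be written as follows in PyTorch; the layer sizes and the assumed 64x64 input resolution are illustrative choices only:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal sketch of a CNN: two convolution/pooling stages followed by a classifier."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution extracts local features
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling halves each spatial dimension
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 input images

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))
```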
Reinforcement Learning
Reinforcement learning is a subfield of machine learning focused on mapping environment states to optimal actions, with the aim of maximizing reward. An agent is placed within an environment and equipped with a finite set of possible actions; through repeated interaction, the agent learns the optimal behavior in each state. The foundation of reinforcement learning is the reward function, which drives the agent to pursue directed goals. Q-learning is a reinforcement learning algorithm concerned with identifying an optimal action-selection policy for a given Markov Decision Process. A Markov Decision Process is a mathematical framework for modeling decision making in discrete time, defined as a 4-tuple:
$(S, A, P_a, R_a)$, where $S$ is the set of states, $A$ is the set of actions available to the agent, $P_a(s, s')$ is the probability that taking action $a$ in state $s$ leads to state $s'$, and $R_a(s, s')$ is the immediate reward received after transitioning from state $s$ to state $s'$ under action $a$.
A Q-table is established for the agent, mapping each possible state to the action that maximizes reward. The reward estimate, referred to as the Q-value, can be formalized using the Bellman equation, as shown below. Essentially, the Q-value is equal to the immediate reward plus the estimated long-term reward multiplied by a discount factor (denoted $\gamma$), which weighs the importance of long-term reward against immediate reward.
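A standard statement of this relationship for tabular Q-learning is the Bellman optimality equation, written here for deterministic transitions, together with the corresponding update rule; the symbols $\alpha$ (learning rate) and $\gamma$ (discount factor) are conventional choices rather than notation fixed elsewhere in this section:

$$Q^*(s, a) = R(s, a) + \gamma \max_{a'} Q^*(s', a')$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$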
If the Markov property holds, the future is independent of the past given the present.
Vision Attention Based Deep Reinforcement Learning Model
The proposed Deep Reinforcement Learning based vision attention architecture comprises three vital stages: data processing, image augmentation, and core model training.
The core model involves an agent that chooses from a finite set of actions, each of which influences the environment and carries the agent from one state to the next. Each state is a visualized set of operational codes. The transition from one state to another is facilitated by the selection and application of an action.
The environment is a holistic representation of the visualized operational code data. Each initial state is a single raw image visualization of operational code data; subsequent states are focused variants of the initial state.
After observing the state of the environment, an action must be selected from the set of possible actions, named the action space. These actions apply carefully selected image manipulation filters to the current image. The filters examine the spatial arrangement of intensity in regions of the image and, fundamentally, highlight and emphasize those regions.
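The exact filters are not enumerated here; as a hedged illustration, an action space built from common smoothing and edge-emphasis filters could be expressed as a list of callables (scipy.ndimage is used purely as an example library, and the specific filters and parameters are assumptions):

```python
import numpy as np
from scipy import ndimage

# Illustrative action space: each action maps an image to a "focused" variant.
ACTIONS = [
    lambda img: ndimage.gaussian_filter(img, sigma=1.0),  # smooth, de-emphasize fine noise
    lambda img: ndimage.laplace(img),                      # emphasize rapid intensity changes
    lambda img: ndimage.sobel(img, axis=0),                # highlight horizontal edge structure
    lambda img: ndimage.sobel(img, axis=1),                # highlight vertical edge structure
    lambda img: ndimage.median_filter(img, size=3),        # suppress isolated outliers
]

def apply_action(state: np.ndarray, action_index: int) -> np.ndarray:
    """Transition: produce the next state by applying the chosen filter."""
    return ACTIONS[action_index](state)
```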
The reward function is vital in motivating the agent to choose instrumental actions. Because correct binomial classification is the core aim of an accurate model, the reward function was implemented to complement this aim.
The agent interacts with the environment exclusively through its actions. According to the Markov property, all past states may be disregarded: the present state alone forms the foundation for the future state. Progression toward an optimal future state can therefore be conducted through analysis of the present state.
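To make the interaction loop concrete, the sketch below outlines a minimal environment under the assumptions above, reusing the illustrative apply_action helper; the classifier callable, the episode length, and the +1/-1 reward values are assumptions for illustration rather than the settings used in this work:

```python
import numpy as np

class OpcodeImageEnv:
    """Sketch of the environment: states are visualized opcode images, actions apply
    focusing filters, and reward reflects binomial classification correctness."""

    def __init__(self, image: np.ndarray, label: int, classifier, max_steps: int = 5):
        self.initial_image = image    # raw visualized opcode data (initial state)
        self.label = label            # ground-truth binary label (0 = benign, 1 = malware)
        self.classifier = classifier  # callable: image -> predicted label (assumed)
        self.max_steps = max_steps
        self.reset()

    def reset(self) -> np.ndarray:
        self.state = self.initial_image.copy()
        self.steps = 0
        return self.state

    def step(self, action_index: int):
        # Transition: the chosen filter produces the next (focused) state.
        self.state = apply_action(self.state, action_index)
        self.steps += 1
        done = self.steps >= self.max_steps
        # Reward: positive when the focused image is classified correctly.
        prediction = self.classifier(self.state)
        reward = 1.0 if prediction == self.label else -1.0
        return self.state, reward, done
```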
Data Processing
Image Augmentation
The proposed architecture is grounded on the assumption that the dataset contains as many pre-focused images as possible. To satisfy this assumption, a systematic procedure was followed, ensuring that each training image was exposed to as many actions as possible.
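A minimal sketch of that augmentation pass, reusing the illustrative ACTIONS list and apply_action helper above, is shown below; the exhaustive single-step expansion is an assumption about the procedure's exact form:

```python
def augment_dataset(images, labels):
    """Expand the training set by applying every candidate action to every image."""
    augmented_images, augmented_labels = [], []
    for image, label in zip(images, labels):
        augmented_images.append(image)       # keep the original, unfocused image
        augmented_labels.append(label)
        for index in range(len(ACTIONS)):    # add one focused variant per action
            augmented_images.append(apply_action(image, index))
            augmented_labels.append(label)
    return augmented_images, augmented_labels
```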