
Malware Detection

©2017 Textbook 69 Pages

Summary

This work studies the behavior of malicious software, examines the security challenges it poses, and attempts to detect malware behavior automatically using a dynamic approach. Various classification techniques are studied. Malware samples are then grouped according to these techniques, and samples with unknown characteristics are clustered into an unknown group. The classifiers used in this research are k-Nearest Neighbors (kNN), J48 Decision Tree, and n-grams.

Excerpt

Table Of Contents


Nandal, Priyanka: Malware Detection, Hamburg, Anchor Academic Publishing 2017
PDF-eBook-ISBN: 978-3-96067-708-6
Printing/Production: Anchor Academic Publishing, Hamburg, 2017
Bibliographical Information of the German National Library:
The German National Library lists this publication in the German National Bibliography.
Detailed bibliographic data can be found at: http://dnb.d-nb.de
All rights reserved. This publication may not be reproduced, stored in a retrieval system
or transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without the prior permission of the publishers.
This work, including all of its parts, is protected by copyright. Any use outside the limits
of copyright law without the consent of the publisher is prohibited and punishable by law.
This applies in particular to reproduction, translation, microfilming, and storage and
processing in electronic systems.
The reproduction of common names, trade names, product designations, etc. in this work
does not justify the assumption, even without special marking, that such names are to be
regarded as free within the meaning of trademark and brand protection legislation and may
therefore be used by anyone.
The information in this work has been compiled with care. Nevertheless, errors cannot be
completely ruled out, and Diplomica Verlag GmbH, the authors, and the translators assume
no legal responsibility or any liability for any remaining incorrect information and its
consequences.
© Anchor Academic Publishing, an imprint of Diplomica Verlag GmbH
Hermannstal 119k, 22119 Hamburg
http://www.diplomica-verlag.de, Hamburg 2017
Printed in Germany

TABLE OF CONTENTS
Chapter 1: Introduction
1.1 Types of Malware
1.1.1 Virus
1.1.1.1 Virus Classification by Target
1.1.1.2 Virus Classification by Self-Protection Strategy
1.1.2 Worm
1.1.2.1 Activation
1.1.2.2 Payload
1.1.2.3 Target Discovery
1.1.2.4 Propagation
1.1.3 Trojans
1.1.3.1 How Trojans work
1.1.3.2 Trojans Types
1.1.4 Adware
1.1.5 Spyware
1.1.6 Rootkit
1.1.7 Backdoor
1.1.8 Keylogger
1.1.9 Ransomware
1.1.10 Remote Administration Tools
1.1.11 Botnet
1.1.12 Scareware
1.2 Malware Classification Tree
1.3 Classification Methods
1.3.1 Supervised Methods
1.3.1.1 Naive Bayes Classifier
1.3.1.2 J48 Decision Trees
1.3.1.3 Support Vector Machines
1.3.1.4 K-nearest neighbors
1.3.1.5 N-grams
1.3.2 Unsupervised Methods
1.3.2.1 K-means clustering algorithm
1.4 Difference between K-nearest neighbor and K-means clustering algorithm
1.5 Malware Detection Techniques
1.5.1 Signature Based Detection
1.5.2 Anomaly Based Detection
1.5.3 Heuristic Analysis or Pro-Active Defense
1.5.3.1 Hidden Markov Model Based Detection
1.5.4 Genetic Signature Detection
1.6 Malware Detection Tools
1.6.1 Microsoft Process Explorer
1.6.2 Trend Micro's HijackThis
1.6.3 Kaspersky's GetSystemInfo
1.6.4 Microsoft Baseline Security Analyzer
1.6.5 Secunia inspection scanners
1.6.6 Antivirus Programs
1.6.7 Microsoft's Malicious Software Removal Tool
1.6.8 SUPER Antispyware
1.6.9 Malwarebytes Anti-Malware
1.6.10 GMER
Chapter 2: Literature Survey
2.1 Data Mining
2.1.1 Dynamic Misuse Detection
2.1.2 Dynamic Hybrid Detection
2.1.3 Static Anomaly Detection
2.1.4 Static Hybrid Detection
2.1.5 Static Misuse Detection
2.2 Machine Learning
2.3 Cloud Computing
2.4 Android Malware Detection
Chapter 3: Implementation
3.1 Problem Description
3.2 System Requirements
3.3 Datasets Used
3.4 Process Outline
3.4.1 Sandbox Configuration
3.4.2 Feature extraction and selection
3.4.3 Application of classification methods
3.5 Results and Discussion
Chapter 4: Conclusion and Future work
4.1 Conclusion
4.2 Future Work
References

LIST OF FIGURES
Figure 1: No. of Malware Specimens
Figure 2: Malware Classification Tree
Figure 3: SVM Scheme
Figure 4: KNN Scheme
Figure 5: Screenshot of .bytes file
Figure 6: Screenshot of .asm file
Figure 7: Screenshot of prediction file

CHAPTER 1
INTRODUCTION
Computer security, also known as cyber security or IT security, is the protection of computer
systems from theft of or damage to their hardware, software, or information, as well as from
disruption or misdirection of the services they provide. Security controls are used to provide
confidentiality, integrity, and availability of the data, software, hardware, and firmware of
computer systems. To secure a computer system, it is important to understand the attacks that
can be made against it. The major types of attack include phishing, spamming, exploits, and
malware.
Phishing attacks are designed to steal a person's login and password details so that the cyber
criminal can assume control of the victim's social network, email, and online bank accounts.
Seventy per cent of internet users choose the same password for almost every web service they
use. This is why phishing is so effective: with a single set of login details, the criminal can
access multiple private accounts and manipulate them for personal gain.
Spamming is when a cyber criminal sends emails designed to make a victim spend money on
counterfeit or fake goods. Most spam messages advertise products such as pharmaceuticals or
security software, which people believe they need in order to solve a security issue that does
not actually exist. The most widely recognized form is email spam. Other forms include IM
spam, blog spam, discussion forum spam, cell phone messaging spam, etc.
An exploit is a piece of software, a chunk of data, or a sequence of commands that takes
advantage of a bug, glitch, or vulnerability in order to cause unintended or unanticipated
behavior in computer software, hardware, or other electronic (usually computerized) devices.
This frequently includes gaining control of a computer system, allowing privilege escalation,
or mounting a denial-of-service attack.

Malware has become one of the major cyber threats with the expansion of the internet. Malware
is a relatively new term that gets its name from malicious software. It is defined as software
designed to infiltrate or damage a computer system without the owner's informed consent. Any
software performing malicious actions, including information stealing, spying, etc., can be
referred to as malware. Malware is thus a generic term for all kinds of computer threats:
malicious software used to infect individual computers or an entire organization's network.
It exploits vulnerabilities in the target system, such as a bug in legitimate software (e.g.,
a browser or web application plug-in) that can be hijacked. It can also infect a computer and
make it part of a botnet, which means the cyber criminal can control the computer and use it
to send malware to others.
According to the definition given by Kaspersky Lab (2017), malware is a type of computer program
designed to infect a legitimate user's computer and inflict harm on it in multiple ways [1]. With
the expansion of the internet, the diversity of malware is also increasing. Millions of hosts are
being attacked because anti-virus scanners do not fully meet the need for protection. According
to Kaspersky Lab (2016) [2], 6,563,145 different hosts were attacked and 4,000,000 unique
malware objects were detected in 2015. Juniper Research (2016) predicts that the cost of data
breaches will increase to $2.1 trillion globally by 2019 [3]. The tools used for attacks are now
widely available on the internet.
Figure 1 shows how rapidly malware is increasing in volume. The x-axis in Figure 1 indicates the
year and the y-axis indicates the number of malware specimens generated in the specified year.

Figure 1. No. of Malware Specimens
(Courtesy: https://www.gdatasoftware.com/blog/2017/04/29666-malware-trends-2017)
Therefore, malware protection of computer systems is one of the most important cyber security
tasks for individual users and businesses, since even a single attack can result in compromised
data and significant losses. Massive losses and frequent attacks dictate the need for accurate
and timely detection methods. Current static and dynamic methods do not provide efficient
detection, especially when dealing with zero-day attacks. For this reason, machine learning-based
techniques can be used.
1.1 Types of Malware
This section categorizes malware into different classes depending on its purpose. A simple
classification of malware consists of file infectors and stand-alone malware. Another way of
classifying malware is based on its particular action: viruses, worms, backdoors, trojans,
rootkits, spyware, adware, etc. Computer virus detection has evolved into malicious program
detection since Cohen first formalized the term computer virus in 1983 [4].

1.1.1 Virus
This is the simplest form of malicious software. It is simply any piece of software that is loaded
and launched without the user's permission while reproducing itself or infecting (modifying) other
software [5]. In other words, a computer virus is self-replicating code (including possibly
evolved copies of itself) that infects other executable programs. Viruses usually need human
intervention for replication and execution.
1.1.1.1 Virus Classification by Target
This section defines the target as the means exploited by the virus for execution. Based upon the
target, viruses can be classified into three major classes.
a) Boot Sector Virus
The Master Boot Record (boot sector in DOS) is a piece of code that runs every time a computer
system is booted. Boot sector viruses infect the MBR on the disk, thereby gaining the privilege
of being executed every time the computer system starts up.
b) File Virus
File viruses are the most common form of virus. They infect the file system on a computer.
File viruses infect executable programs and are executed every time the infected program is
run.
c) Macro Virus
Macro viruses infect documents and templates instead of executable programs. They are written in
a macro programming language that is built into applications like Microsoft Word or Excel.
A macro virus can be executed automatically every time the document is opened with the
application.

1.1.1.2 Virus Classification by Self-Protection Strategy
A self-protection strategy can be defined as the technique used by a virus to avoid detection.
Such techniques are also known as anti-antivirus techniques. Based upon self-protection
strategies, viruses can be classified into the following categories.
a) No Concealment
The first category comprises viruses that use no concealment at all. The virus code is clean,
without any garbage instructions or encryption.
b) Code Obfuscation
Code obfuscation is a technique developed to avoid specific-signature detection. It includes
adding no-op instructions, unnecessary jumps, etc., so that the virus code looks muddled
and the signature fails to match.
c) Encryption
The next line of defense used by virus writers to defeat signature detection was code
encryption. Encrypted viruses use an encrypted virus body and an unencrypted decryption
engine. For each infection, the virus is encrypted with a different key to avoid presenting a
constant signature.
d) Polymorphism
Encrypted viruses are caught by the presence of the unencrypted decryption engine, which
remains constant for every infection. Mutation techniques remove this weakness. Polymorphic
viruses feature a mutation engine that generates the decryption engine on the fly. Such a
virus consists of a decryption engine, a mutation engine, and a payload. The encrypted virus
body and the mutating decryption engine together leave no constant signature.

e) Metamorphism
A metamorphic virus is self-mutating in the truest sense of the term, as it has no constant
parts. The virus body itself changes during the infection process, and hence the infected file
represents a new generation that does not resemble the parent.
f) Stealth
Stealth techniques, also called code armoring, refer to the set of techniques developed by
virus writers to evade more recent detection methods such as activity monitoring and code
emulation. The techniques include anti-disassembly, anti-debugging, anti-emulation,
anti-heuristics, etc.
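The effect of these strategies on signature-based detection can be illustrated with a harmless
toy model. The sketch below is not taken from this work: the byte strings, the SIGNATURE and
DECODER_STUB constants, and the helper names are hypothetical, and the "body" is an inert
placeholder string rather than executable code. Under those assumptions, it shows that a fixed
byte signature matches an unconcealed body, stops matching once filler bytes are interleaved or
the body is XOR-encoded with a per-copy key, and that a constant decoder stub still offers a
stable pattern, which is the weakness that polymorphism is meant to remove.

```python
# Toy model of signature matching; all data is inert placeholder bytes.
SIGNATURE = b"PLACEHOLDER-BODY"        # hypothetical scanner signature
DECODER_STUB = b"<constant-decoder>"   # stands in for an unencrypted decryption engine


def matches(sample: bytes, signature: bytes = SIGNATURE) -> bool:
    """Naive signature scan: is the signature a substring of the sample?"""
    return signature in sample


def interleave_filler(body: bytes, filler: int = 0x90) -> bytes:
    """Obfuscation stand-in: insert one filler byte after every body byte."""
    out = bytearray()
    for b in body:
        out.append(b)
        out.append(filler)
    return bytes(out)


def xor_encode(body: bytes, key: int) -> bytes:
    """Encryption stand-in: XOR every byte with a single-byte key."""
    return bytes(b ^ key for b in body)


# a) No concealment: the signature is present verbatim.
plain = b"header:" + SIGNATURE
# b) Obfuscation: filler bytes break up the byte sequence.
obfuscated = b"header:" + interleave_filler(SIGNATURE)
# c) Encryption: a different key per copy, but the same decoder stub.
encoded_a = DECODER_STUB + xor_encode(SIGNATURE, 0x21)
encoded_b = DECODER_STUB + xor_encode(SIGNATURE, 0x5A)

print(matches(plain))                    # True
print(matches(obfuscated))               # False
print(matches(encoded_a))                # False
print(encoded_a == encoded_b)            # False: the two copies differ
print(matches(encoded_a, DECODER_STUB))  # True: the constant stub is still detectable
```

A real scanner works on far richer patterns than a single substring; the point here is only
why a constant byte sequence is what each successive strategy tries to eliminate.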
1.1.2 Worm
A computer worm is a self-replicating, stand-alone program that spreads on computer networks.
Worms usually do not need any extra help from a user to replicate and execute. A worm can spread
over the network and replicate to other machines [6]. The life cycle of worms has been
defined by Dan Ellis [7]. Based upon the operations involved in each phase of the life cycle,
worms can be classified into different categories. The taxonomy below is based on similar
factors [8].
1.1.2.1 Activation
Activation defines the means by which a worm is activated on the target system. This is the
first phase in a worm's life cycle. Based upon activation techniques, worms can be classified
into the following classes.
a) Human Activation
This is the slowest form of activation that requires a human to execute the worm.

b) Human Activity-Based Activation
In this form of activation, worm execution is triggered by some action that the user
performs that is not directly related to the worm, such as launching an application program.
c) Scheduled Process Activation
This type of activation depends upon scheduled system processes such as automatic
download of software updates, etc.
d) Self Activation
This is the fastest form of activation, where a worm initiates its own execution by exploiting
vulnerabilities in programs that are always running, such as database or web servers.
1.1.2.2 Payload
The next phase in the worm's life cycle is payload delivery. The payload describes what a worm
does after the infection.
a) None
The majority of worms do not carry any payload. They still cause havoc by increasing machine
and network traffic load.
b) Internet Remote Control
Some worms open a backdoor on the victim's machine, thus allowing others to connect to that
machine via the internet.
c) Spam-Relays
Some worms convert the victim machine into a spam relay, thus allowing spammers to use it
as a server.
1.1.2.3 Target Discovery
In this phase, once the payload is delivered, the worm starts looking for new targets to attack.

a) Scanning Worms
Scanning worms scan for targets by scanning sequentially through a block of addresses or by
scanning randomly.
b) Flash Worms
Flash worms use a pre-generated target list or a hit list to accelerate the target discovery
process.
c) Metaserver Worms
This type of worm uses a list of addresses to infect that is maintained by an external
metaserver.
d) Topological Worms
Topological worms try to find the local communication topology by searching through a list
of hosts maintained by application programs.
e) Passive Worms
Passive worms rely on user intervention, or wait for targets to contact them, in order to execute.
1.1.2.4 Propagation
Propagation defines the means by which a worm spreads on a network. Based upon the
propagation mechanism, worms can be divided into the following categories.
a) Self-Carried
Self-carried worms are usually self-activated. They copy themselves to the target
as part of the infection process.
b) Second Channel
Second channel worms copy their body after the infection by creating a connection from the
target back to the host to download the body.

c) Embedded
Embedded worms embed themselves in the normal communication process as a stealth
technique.
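The four classification factors above (activation, payload, target discovery, and propagation)
can be read as independent fields of a single record. The following sketch is one possible way
to encode the taxonomy for labeling analysis results. It is an illustration only: the class
names, the WormProfile record, and the sample label "sample-001" are hypothetical and do not
come from this work or from any real worm family.

```python
from dataclasses import dataclass
from enum import Enum


class Activation(Enum):
    HUMAN = "human activation"
    HUMAN_ACTIVITY = "human activity-based activation"
    SCHEDULED_PROCESS = "scheduled process activation"
    SELF = "self activation"


class Payload(Enum):
    NONE = "none"
    INTERNET_REMOTE_CONTROL = "internet remote control"
    SPAM_RELAY = "spam relay"


class TargetDiscovery(Enum):
    SCANNING = "scanning"
    FLASH = "flash"
    METASERVER = "metaserver"
    TOPOLOGICAL = "topological"
    PASSIVE = "passive"


class Propagation(Enum):
    SELF_CARRIED = "self-carried"
    SECOND_CHANNEL = "second channel"
    EMBEDDED = "embedded"


@dataclass(frozen=True)
class WormProfile:
    """One label per life-cycle factor, as in the taxonomy above."""
    name: str
    activation: Activation
    payload: Payload
    target_discovery: TargetDiscovery
    propagation: Propagation

    def summary(self) -> str:
        return (
            f"{self.name}: {self.activation.value}, payload {self.payload.value}, "
            f"{self.target_discovery.value} target discovery, "
            f"{self.propagation.value} propagation"
        )


# Hypothetical sample label, not a real worm family.
example = WormProfile(
    name="sample-001",
    activation=Activation.SELF,
    payload=Payload.NONE,
    target_discovery=TargetDiscovery.SCANNING,
    propagation=Propagation.SELF_CARRIED,
)
print(example.summary())
```

Keeping each factor as a separate enumeration mirrors the text: a worm receives exactly one
value per life-cycle phase, and the combinations are not restricted.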
1.1.3 Trojans
Although the words Trojan, worm, and virus are often used interchangeably, they are not the same.
Viruses, worms, and Trojan horses are all malicious programs that can cause damage to your
computer, but there are differences among the three. A Trojan horse, also known as a Trojan, is a
piece of malware that appears to perform a certain action but in fact performs another, such as
transmitting a computer virus. At first glance it appears to be useful software, but it does
damage once installed or run on your computer. Those on the receiving end of a Trojan horse
are usually tricked into opening it because they believe they are receiving legitimate software
or files from a legitimate source. When a Trojan is activated on your computer, the results can
vary. Some Trojans are designed to be more annoying than malicious (for example, changing your
desktop or adding silly active desktop icons), while others can cause serious damage by deleting
files and destroying information on your system. Trojans are also known to create a backdoor on
your computer that gives malicious users access to your system, possibly allowing confidential or
personal information to be compromised. Unlike viruses and worms, Trojans do not reproduce
by infecting other files, nor do they self-replicate. Simply put, a Trojan horse is not a computer
virus: it does not propagate by self-replication but relies heavily on the exploitation of an
end user. The term is instead a categorical attribute that can encompass many different forms
of code. Because it describes how the malware is delivered rather than how it spreads, a computer
worm or virus may also be a Trojan horse. The term is derived from the classical story of the
Trojan horse.

Details

Pages: 69
Type of Edition: First edition
Year: 2017
ISBN (PDF): 9783960677086
ISBN (Softcover): 9783960672081
File size: 3.8 MB
Language: English
Publication date: 2017 (November)
Grade: 9.5
Keywords: J48 Decision Tree, n-grams, k-Nearest Neighbors, Malicious software, IT security, Dynamic approach, Computer security, Trojan, Adware, Spyware, Computer virus