PhD

Surviving the massive proliferation of mobile malware

From 2017 to 2020, I dedicated my life to this PhD in Computer Science at Université de Rennes 1. I was part of the WIDE team, a joint research team focusing on large-scale distributed computer systems.
It was a rewarding experience that allowed me to acquire strong research and full-stack experiences in Software Engineering. This page provides a summary of my thesis as well as relevant resources. You can download my thesis manuscript here.

Abstract

Nowadays, many of us are surrounded by smart devices that seamlessly operate interactively and autonomously together with multiple services to make our lives more confortable. These smart devices are part of larger ecosystems, in which various companies collaborate to ease the distribution of applications between developers and users. However malicious attackers take advantage of them illegitimately to infect users’ smart devices with malicious applications. Despite all the efforts made to defend these ecosystems, the rate of devices infected with malware is still increasing in 2020. In this thesis, we explore three research axes with the aim of globally improving malware detection in the Android ecosystem. We demonstrate that the accuracy of machine learning-based detection systems can be improved by automating their evaluation and by reusing the concept of AutoML to fine-tune learning algorithms parameters. We propose an approach to automatically create malware variants from combinations of complex evasion techniques to diversify experimental malware datasets in order to challenge existing detection systems. Finally, we propose methods to globally increase the quality of experimental datasets used to train and test detection systems.

Surviving the massive proliferation of mobile malware

Link to a pdf of my thesis manuscript

Thesis defense

Publications

DroidAutoML: A Microservice Architecture to Automate the Evaluation of Android Machine Learning Detection Systems

The mobile ecosystem is witnessing an unprecedented increase in the number of malware in the wild. To fight this threat, actors from both research and industry are constantly innovating to bring concrete solutions to improve security and malware protection. Traditional solutions such as signature-based anti viruses have shown their limits in front of massive proliferation of new malware, which are most often only variants specifically designed to bypass signature-based detection. Accordingly, it paves the way to the emergence of new approaches based on Machine Learning (ML) technics to boost the detection of unknown malware variants. Unfortunately, these solutions are most often underexploited due to the time and resource costs required to adequately fine tune machine learning algorithms. In reality, in the Android community, state-of-the-art studies do not focus on model training, and most often go through an empirical study with a manual process to choose the learning strategy, and/or use default values as parameters to configure ML algorithms. However, in the ML domain, it is well known admitted that to solve efficiently a ML problem, the tunability of hyper-parameters is of the utmost importance. Nevertheless, as soon as the targeted ML problem involves a massive amount of data, there is a strong tension between feasibility of exploring all combinations and accuracy. This tension imposes to automate the search for optimal hyper-parameters applied to ML algorithms, that is not anymore possible to achieve manually. To this end, we propose a generic and scalable solution to automatically both configure and evaluate ML algorithms to efficiently detect Android malware detection systems. Our approach is based on devOps principles and a microservice architecture deployed over a set of nodes to scale and exhaustively test a large number of ML algorithms and hyper-parameters combinations. With our approach, we are able to systematically find the best fit to increase up to 11% the accuracy of two state-of-the-art Android malware detection systems.