Thursday, May 7, 2026

LAB 10: APK Security Scanner - Multi-Agent Static Analysis pt.2

Taking Android Apps Apart: Building an AI-Powered Security Scanner

Part 2: The Analysis, Execution and Findings

In Part 1 we defined our objective: building an automated AI pipeline to analyze Android apps and identify security flaws. With our local environment set up and our JADX translator ready, it was time to put our scanner to the test.

In this post we walk through the actual reverse engineering process. We detail our methodology, the roadblocks we hit and our final discoveries as we transitioned from simulated logic to real-world AI agents.

Target Recap

We set out to analyze Android applications to automatically recover source code, inspect machine code and identify underlying vulnerabilities like hidden passwords. Our targets included a suite of intentionally vulnerable mock apps to establish a baseline. After that we used a real open-source application called QuickTiles to test how robust our scanner was against actual Android architecture.

Tech Stack and Architecture

Before we dive into the results it helps to understand exactly how our machinery operates behind the scenes. Our system is built on a custom technology stack designed for speed and security.

The Foundation

We built our backend using Node.js which is a popular environment that lets us run JavaScript on a server instead of just in a web browser. The frontend is a clean web dashboard built with standard HTML and JavaScript.

The Translators (JADX and MobSF)

When an app is uploaded it first goes to JADX, an open-source decompiler. It unzips the Android app and translates the robotic DEX bytecode back into human-readable Java code. We also connected MobSF (Mobile Security Framework). MobSF runs the app through hundreds of predefined security rules. It acts as our automated backup checker to ensure we do not miss anything obvious.

The 8-Agent Assembly Line

Our AI does not tackle the app all at once. Instead we built a pipeline of specialized AI agents. They communicate with each other in a sequential chain. When one finishes its job it packages its findings and passes them directly to the next. This relay-race system ensures each agent stays focused on its specific job.
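
The relay-race idea can be sketched in a few lines of JavaScript. This is a minimal illustration and not our actual pipeline code: the three agents and the fields they return are made up for the example, and it is kept synchronous for brevity (agents that call an LLM would need async/await).

```javascript
// Hypothetical agents: each takes the shared context and returns new fields.
const intakeAgent = (ctx) => ({ validated: ctx.fileName.endsWith(".apk") });
const decompileAgent = (ctx) => ({ sources: ctx.validated ? ["MainActivity.java"] : [] });
const reportAgent = (ctx) => ({ summary: `${ctx.sources.length} file(s) ready for review` });

// Run the agents in order, merging each result into the context for the next one.
function runPipeline(agents, initialContext) {
  return agents.reduce((ctx, agent) => ({ ...ctx, ...agent(ctx) }), initialContext);
}

const result = runPipeline([intakeAgent, decompileAgent, reportAgent], { fileName: "demo.apk" });
console.log(result.summary);
```

Because every agent only sees the context handed to it, an agent can be swapped out (for example from rule-based to LLM-powered) without touching its neighbors.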

We recently upgraded our pipeline. While some agents still use fast deterministic code several of our most critical agents are now powered by the Gemini 2.5 Flash API. This means they are no longer just following static rules but are actually thinking and reasoning about the code in real-time.

The actual brainpower for our real-world LLM agents is located in the narrateFindings and addSmaliEvidence functions. The sections below walk through each agent in the chain and note where we upgraded the simulated logic to use actual AI calls.

Intake Validation Agent

This acts as the bouncer at the door. It checks the uploaded file to make sure it is a real Android app and not a disguised file before letting it into the system.
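
A bouncer check along these lines might look like the sketch below. It assumes two simple heuristics that are not necessarily what our agent does: the file must start with the ZIP signature, and the name AndroidManifest.xml must appear somewhere in the bytes (ZIP normally stores entry names uncompressed). The function name looksLikeApk is hypothetical.

```javascript
// ZIP local file header signature: the bytes "PK\x03\x04".
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Heuristic only: a real validator would parse the archive properly.
function looksLikeApk(fileBytes) {
  if (fileBytes.length < ZIP_MAGIC.length) return false;
  const hasZipHeader = fileBytes.subarray(0, 4).equals(ZIP_MAGIC);
  // Entry names are usually stored as plain text, so a byte search can find them.
  const hasManifest = fileBytes.includes("AndroidManifest.xml");
  return hasZipHeader && hasManifest;
}
```

A renamed PDF or image fails the first check, and an ordinary ZIP without a manifest fails the second.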

Decompilation Agent

This agent operates the JADX translator. It carefully unpacks the app and organizes the translated Java code so the other agents can read it.
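
As a rough sketch, driving JADX from Node.js could look like this. It assumes the jadx command-line tool is installed and on the PATH, and uses its -d option to choose the output directory. The helper names buildJadxArgs and decompile are ours for illustration.

```javascript
const { execFile } = require("node:child_process");

// Build the argument list for: jadx -d <outputDir> <apkPath>
function buildJadxArgs(apkPath, outputDir) {
  return ["-d", outputDir, apkPath];
}

// Run JADX and resolve with the output directory once it exits cleanly.
function decompile(apkPath, outputDir) {
  return new Promise((resolve, reject) => {
    execFile("jadx", buildJadxArgs(apkPath, outputDir), (error) => {
      if (error) reject(error);
      else resolve(outputDir);
    });
  });
}
```

Passing the arguments as an array through execFile, rather than building one shell string, keeps an odd uploaded file name from being interpreted as a shell command.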

Permission & Component Agents (LLM Powered)

These agents now send the entire AndroidManifest.xml to Gemini. The AI reads the file and identifies risky configurations that are too complex for simple search rules to catch.

Static Logic Flaw Agent

This agent acts like a detective reading a book. It scans the human-readable Java code looking for hardcoded passwords or hidden bypasses. It acts as a fast filter to find interesting files for the AI.
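
A fast filter of this kind can be as small as one regular expression. The pattern below is a hypothetical example, not our exact rule set: it flags any line where a variable with a secret-sounding name is assigned a string literal.

```javascript
// Hypothetical pattern: a secret-sounding variable name assigned a string literal.
const SECRET_PATTERN = /(password|passwd|secret|api_?key|token)\w*\s*=\s*"([^"]+)"/i;

// Return the 1-based line number and trimmed text of every matching line.
function findHardcodedSecrets(javaSource) {
  const hits = [];
  javaSource.split("\n").forEach((text, index) => {
    if (SECRET_PATTERN.test(text)) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}

const sample = [
  "public class Login {",
  '    private static final String ADMIN_PASSWORD = "hunter2";',
  "    int attempts = 0;",
  "}",
].join("\n");

console.log(findHardcodedSecrets(sample));
```

A rule like this is cheap to run over every file, which is what makes it useful for deciding which files are worth sending to the AI.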

Risk Narrative Agent (LLM Powered)

Instead of using pre-written templates this agent now sends every finding to Gemini. The AI writes a custom story for every vulnerability it sees and explains exactly how a hacker might exploit it.
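
To give a feel for what "sending a finding to the AI" involves, here is a minimal sketch. It is not the body of our narrateFindings function: the finding fields, the prompt wording and the callModel parameter (a stand-in for whatever client talks to the LLM) are all assumptions made for the example.

```javascript
// Hypothetical finding shape: { type, file, line, snippet }.
function buildNarrativePrompt(finding) {
  return [
    "You are a mobile security reviewer.",
    `Finding type: ${finding.type}`,
    `File: ${finding.file} (line ${finding.line})`,
    "Code snippet:",
    finding.snippet,
    "Explain in plain English why this is risky, how an attacker could exploit it, and how to fix it.",
  ].join("\n");
}

// callModel is a placeholder: any async function that takes a prompt and returns text.
async function narrateFinding(finding, callModel) {
  const narrative = await callModel(buildNarrativePrompt(finding));
  return { ...finding, narrative };
}
```

Keeping the prompt builder separate from the network call makes it easy to inspect exactly what text leaves the machine.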

Smali/DEX Bytecode Agent (LLM Powered)

This is our most advanced specialist. It extracts raw robotic machine instructions (known as Smali) and sends them to Gemini. The AI translates this difficult code into plain English so we can understand the low-level logic.
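
Before anything is sent to the AI, the relevant method has to be cut out of the Smali file. The sketch below shows one way to do that using the .method and .end method markers that Smali uses around every method. The helper name extractSmaliMethod and the sample Smali are hand-written for illustration and do not come from any app we scanned.

```javascript
// Return the first method with the given name, from ".method" to ".end method".
function extractSmaliMethod(smaliText, methodName) {
  const lines = smaliText.split("\n");
  const start = lines.findIndex(
    (line) => line.trim().startsWith(".method") && line.includes(` ${methodName}(`)
  );
  if (start === -1) return null;
  const end = lines.findIndex((line, i) => i > start && line.trim() === ".end method");
  if (end === -1) return null;
  return lines.slice(start, end + 1).join("\n");
}

// Illustrative Smali: a getter that returns a boolean field.
const sampleSmali = [
  ".class public Lcom/example/Billing;",
  ".super Ljava/lang/Object;",
  "",
  ".method public isPremium()Z",
  "    .locals 1",
  "    iget-boolean v0, p0, Lcom/example/Billing;->premium:Z",
  "    return v0",
  ".end method",
].join("\n");

console.log(extractSmaliMethod(sampleSmali, "isPremium"));
```

Sending one method at a time keeps the request small and gives the AI a self-contained unit of logic to explain.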

Report Aggregation Agent

This is the manager. It takes the stories from all the other agents and organizes them into a clean color-coded report for our dashboard.
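
The manager's job can be sketched as a merge followed by a sort. The severity names, the colors and the shape of the agent results below are assumptions for the example rather than our real data model.

```javascript
// Hypothetical severity levels and card colors for the dashboard.
const SEVERITY_ORDER = ["high", "medium", "low"];
const SEVERITY_COLOR = { high: "red", medium: "orange", low: "yellow" };

// Merge the findings of every agent and put the most severe ones first.
function aggregateReport(agentResults) {
  const findings = agentResults.flatMap((result) =>
    result.findings.map((finding) => ({
      ...finding,
      agent: result.agent,
      color: SEVERITY_COLOR[finding.severity] ?? "gray",
    }))
  );
  // Unknown severities sort after the known ones.
  const rank = (finding) => {
    const index = SEVERITY_ORDER.indexOf(finding.severity);
    return index === -1 ? SEVERITY_ORDER.length : index;
  };
  return findings.sort((a, b) => rank(a) - rank(b));
}
```

Tagging each finding with the agent that produced it makes it possible to trace a card on the dashboard back to its source.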

MCP Connections (The Universal Plugs)

We connected our scanner using MCP (the Model Context Protocol). This is a fancy way of saying we built a universal plug. It allows our local scanner to connect directly to larger AI models so the AI can securely pull data from our tools. Instead of the AI guessing how to use our system, we gave it specific "buttons" it can press.
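
In MCP each "button" is called a tool, and a tool is described by a name, a description and a JSON Schema for its inputs. The sketch below imitates that shape without using any MCP library. The list_findings tool, its handler and the callTool dispatcher are hypothetical, and a real server would sit behind an MCP SDK and its transport.

```javascript
// Tool descriptions in the shape MCP uses: description plus a JSON Schema for inputs.
// The handler is our own addition for this sketch and is not part of the description.
const tools = {
  list_findings: {
    description: "List scanner findings for a previously uploaded APK.",
    inputSchema: {
      type: "object",
      properties: { scanId: { type: "string" } },
      required: ["scanId"],
    },
    handler: ({ scanId }) => [{ scanId, type: "hardcoded-secret", severity: "high" }],
  },
};

// Dispatch a call by tool name, rejecting anything that was not advertised.
function callTool(name, args) {
  if (!Object.hasOwn(tools, name)) throw new Error(`Unknown tool: ${name}`);
  const tool = tools[name];
  for (const field of tool.inputSchema.required) {
    if (!(field in args)) throw new Error(`Missing argument: ${field}`);
  }
  return { content: [{ type: "text", text: JSON.stringify(tool.handler(args)) }] };
}
```

The AI can only reach what is listed in the tools table, which is the point of giving it buttons instead of open access.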

Methodology

To inspect the targets we orchestrated a multi-stage workflow:

Intake and Extraction

We uploaded the app via our custom web dashboard.

Decompilation Pipeline

Our system handed the app over to JADX. It unpacked the app and translated the confusing robotic machine code into Java code.

Hybrid Analysis

We used fast search rules to find "files of interest" in the decompiled code, then routed those specific snippets to our Gemini-powered LLM agents for deep reasoning.

Aggregation

The Report Aggregation Agent collected all these findings and created a unified color-coded report for our dashboard.

Analysis Narrative

We began our analysis with our mock apps to validate the pipeline. Dropping the files into the website successfully triggered the pipeline and the dashboard populated with red warning cards highlighting intentionally placed hidden passwords.

The Evolution from Simulation to Reality

Initially we used "make-believe" agents that only followed static rules. While this worked for our mock apps it failed to capture the nuance of real-world code. By introducing the Gemini API we transformed the scanner.

The Course Correction

We realized that AI is not built to read entire massive applications at once. We kept our fast "Regular Expression" search tools as a spotlight. They find the suspicious lines of code and then we send only those specific lines to Gemini. This hybrid approach allowed us to use the speed of local code with the intelligence of a massive AI model.
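
The spotlight step can be sketched as follows: given the line numbers that the search rules flagged, cut out each line with a little surrounding context and merge windows that overlap. The function name extractSnippets and the default of two context lines are assumptions for the example.

```javascript
// Turn flagged line numbers (1-based) into merged snippets with surrounding context.
function extractSnippets(lines, flaggedLines, context = 2) {
  const windows = [];
  for (const lineNo of [...flaggedLines].sort((a, b) => a - b)) {
    const start = Math.max(1, lineNo - context);
    const end = Math.min(lines.length, lineNo + context);
    const last = windows[windows.length - 1];
    if (last && start <= last.end + 1) {
      // Overlapping or adjacent windows become one snippet.
      last.end = Math.max(last.end, end);
    } else {
      windows.push({ start, end });
    }
  }
  return windows.map(({ start, end }) => ({
    start,
    end,
    text: lines.slice(start - 1, end).join("\n"),
  }));
}
```

Keeping the start and end line numbers with each snippet means the AI's explanation can later be matched back to the exact place in the decompiled file.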

Findings

What did our final LLM-powered pipeline discover?

Intelligent Manifest Review

Our Permission Agent successfully identified a risky "Backup Enabled" flag (the android:allowBackup attribute in the manifest). While a search rule could have found this, the AI added context by explaining exactly how a local attacker could steal app data using the ADB tool.
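
For comparison, the plain search-rule version of this check fits in a few lines. The sketch assumes the manifest has already been decoded to text and looks for the android:allowBackup attribute. The function name backupSetting is ours.

```javascript
// Report how the manifest sets android:allowBackup: "true", "false" or "unset".
function backupSetting(manifestXml) {
  const match = /android:allowBackup\s*=\s*"(true|false)"/.exec(manifestXml);
  // Android documents the default as true, so "unset" still deserves a look.
  return match ? match[1] : "unset";
}

const sampleManifest = [
  '<manifest xmlns:android="http://schemas.android.com/apk/res/android">',
  '  <application android:allowBackup="true" android:label="Demo" />',
  "</manifest>",
].join("\n");

console.log(backupSetting(sampleManifest));
```

The rule can say that the flag is set, but not why it matters for this particular app, which is the gap the AI explanation fills.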

Real-Time Bytecode Translation

Our Smali agent proved to be the most impressive upgrade. When we sent it raw machine code it correctly explained that the code was checking a local boolean variable to see if "premium" mode was active. It then accurately described how to bypass that check.

Custom Remediation Advice

Every finding on our dashboard now contains custom advice written by the AI. Instead of generic warnings the developer sees specific instructions tailored to their exact code snippet.

Validation

To ensure the AI was not just making things up (hallucinating) we validated the results using two methods:

Manual Comparison

We opened the decompiled output in JADX ourselves and checked the flagged code by hand. The line numbers and AI-generated explanations matched the actual logic.

MobSF Cross-Reference

We ran the same app through our backup scanner MobSF. The findings aligned perfectly which confirmed that our AI agents were producing high-fidelity security insights.
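
A cross-reference like this can be automated once both sets of findings are in a common shape. The sketch below assumes each finding has been reduced to a file and a category. That shape and the matching rule are our assumptions, and MobSF's own report would first need to be mapped into it.

```javascript
// Hypothetical key: two findings match when they share a file and a category.
const keyOf = (finding) => `${finding.file}::${finding.category}`;

// Split the findings into those both scanners agree on and those only one reported.
function crossReference(ourFindings, mobsfFindings) {
  const mobsfKeys = new Set(mobsfFindings.map(keyOf));
  const ourKeys = new Set(ourFindings.map(keyOf));
  return {
    confirmed: ourFindings.filter((f) => mobsfKeys.has(keyOf(f))),
    onlyOurs: ourFindings.filter((f) => !mobsfKeys.has(keyOf(f))),
    onlyMobsf: mobsfFindings.filter((f) => !ourKeys.has(keyOf(f))),
  };
}
```

Anything that lands in onlyOurs is a candidate hallucination to check by hand, and anything in onlyMobsf is something our agents may have missed.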

Future Deployment Plans

Right now everything runs securely on our private local computers. However we have ambitious plans for the future.

We plan to package this entire scanner and introduce it as a public service. To do this we plan to deploy the web application using modern cloud platforms like Vercel or Netlify. These platforms will allow us to easily host our code on the internet. To ensure our new service stays safe from hackers we will host it behind Cloudflare. Cloudflare acts as a massive digital shield that provides better protection by blocking bad traffic before it ever reaches our servers.

Reflection

Building this scanner taught us a valuable lesson about the intersection of AI and security. We learned that the most powerful systems are not 100% AI but are instead "AI-Augmented".

By using fast local code as a "filter" and the Gemini API as the "brain" we built a tool that is both fast and incredibly intelligent. The transition from simulated agents to actual LLM agents was the final step in creating a truly professional-grade security scanner.

Ultimately we learned that combining a mature translator tool like JADX with the reasoning capabilities of Gemini creates a powerful synergy that can take apart even the most complex Android applications.

LAB 9: APK Security Scanner, Multi-agent Static Analysis

Taking Android Apps Apart: Building an AI-Powered Security Scanner

Part 1: Problem Definition, Background and Preparation

Mobile applications are a primary target for hackers in today’s digital landscape. Everything from our personal banking to our smart home devices lives on our phones. But what happens when software developers accidentally leave the digital "keys to the castle" inside the app itself?

In this series we will explore the world of Reverse Engineering for Android applications. Reverse engineering simply means taking something that is already built (like an app) and taking it apart to see exactly how it works on the inside.

We will not just do this manually though. We will be building an automated AI-powered vulnerability scanner. This is a system that uses Artificial Intelligence to automatically look for security flaws and explain how to fix them.

Here is a look at how we will approach this challenge, the background knowledge you need to follow along and our preparation for the analysis.

Project Overview

What are we reverse engineering?

We will target Android Application Packages (APKs). An APK is the file format Android uses to install apps on your phone much like a .exe file on a Windows computer. By opening up an APK we will be able to look at the underlying code and structure that makes the app run.

What is the broader context?

This project falls under Mobile Application Security Testing (MAST) which is the practice of looking for security holes in mobile apps before bad actors can find them. Usually security analysts use various tools to take an app apart and spend hours reading through the code to find mistakes. We will speed up this process by using AI Agents. Think of an "agent" as a specialized digital assistant. Instead of one AI trying to do everything we will use a "pipeline" or assembly line of several AI agents where each has a specific job. These jobs will include tasks like reading permissions, looking for passwords or writing the final report.

Why this target?

Taking Android apps apart is a fascinating mix of software development and security. Our team will enhance our security scanner by adding a new AI agent specifically trained to read Bytecode (the low-level robotic instructions the phone reads). The challenge of having an AI read this confusing machine-level code and translate it into plain English for a human will be a perfect way to test how smart AI really is at security research.

Research Question and Goal

What do we hope to understand?

Our main goal will be to see if a team of specialized AI agents can accurately take apart an Android app, read the code and find logic flaws. A logic flaw is a mistake in how the app is built. For example it is a flaw if a developer accidentally typed a secret password directly into the app's code (known as a "hardcoded secret") or if they built a screen that bypasses the login page. We specifically want to know if the AI will be able to understand the low-level machine code just as well as standard human-readable code.

What counts as success?

A successful result will be a fully automated assembly line that will:

Automatically unpack and translate the app's code.
Identify real security mistakes without giving us too many false alarms.
Extract the exact lines of code where the mistake is found as evidence.
Generate a plain-English explanation of why it is dangerous and how to fix it.
Display all of this on a clean professional web dashboard.

Background Information

Before we dive into the analysis we should break down some of the technical jargon:

The APK Format: Even though an APK ends in .apk it is essentially just a .zip file. If you rename it to .zip and extract it you will find several files inside. The most important are the AndroidManifest.xml (the rulebook that tells the phone what the app is allowed to do like access the camera) and classes.dex (the actual code).
Decompiling: When a programmer writes an app they write it in a language humans can read (like Java). Before it goes to the phone it is translated into a machine language that only the phone can read. This is called compiling. Decompiling is doing that process in reverse where we translate the machine code back into human-readable code.
DEX and Smali: Android's specific machine code is called DEX (Dalvik Executable). Because DEX is just binary data, reverse engineers use tools to translate it into a slightly more readable format called Smali. It will still be very difficult to read which is why our newest AI agent will focus specifically on understanding it.
JADX: This is a popular free software tool that acts as a translator. It takes the confusing DEX machine code and attempts to decompile it all the way back into the original Java code, making it much easier for both humans and our AI to read.
MobSF (Mobile Security Framework): An industry-standard software tool used by security professionals to automatically scan apps. We will plug this into our system as an optional backup to double-check our AI's work.

Initial Reconnaissance

What information will be available?

We will have the basic scaffolding for our scanner. Our setup will include:

A web dashboard where we can drag and drop an app to scan it.
An assembly line of established AI assistants handling different parts of the scan.
A suite of "mock" apps we will build ourselves. These will be intentionally broken apps designed to safely test if our scanner works.
A real open-source app to test how our scanner handles a real-world scenario.

What will be unknown?

The biggest unknown will be how our AI handles Obfuscation. Obfuscation is a trick developers use to intentionally scramble their code before publishing it. They change clear variable names like "password" to random letters like "a.b.c()". Since AI relies heavily on words to understand context we will not know if the AI gets confused when the code is scrambled.
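
One simple way to get a first hint that an app is obfuscated is to measure how many of its class names are only one or two characters long. The sketch below is a rough heuristic under that assumption, and the function name shortNameRatio is ours.

```javascript
// Share of classes whose simple name is one or two characters long, such as "a" or "b1".
function shortNameRatio(classNames) {
  if (classNames.length === 0) return 0;
  const short = classNames.filter((name) => {
    const simpleName = name.split(".").pop();
    return simpleName.length <= 2;
  });
  return short.length / classNames.length;
}

console.log(shortNameRatio(["com.example.LoginActivity", "a.b.c", "a.b.d", "com.example.Util"]));
```

A high ratio would suggest that the AI should be told up front not to trust the names it sees.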

Challenges and Constraints

Technical Obstacles

Obfuscation: As mentioned scrambled code will make it harder for the AI to understand what it is looking at.

Information Overload: A large app can contain millions of lines of code once its libraries are counted. If we try to feed all of that into an AI like ChatGPT at once, it will not fit in the model's working memory and the request will fail or be cut short. This is known as exceeding the "context limit". We will have to figure out how to filter the code and only give the AI the most important pieces.
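
One way to stay under the limit is to give the AI a budget and stop adding code once it is spent. The sketch below assumes a rough rule of thumb of about four characters per token, which is an approximation and not a real tokenizer. The function names are ours.

```javascript
// Rough estimate only (an assumption): about 4 characters per token.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Walk the snippets in priority order and keep each one that still fits the budget.
function selectWithinBudget(snippets, maxTokens) {
  const selected = [];
  let used = 0;
  for (const snippet of snippets) {
    const cost = estimateTokens(snippet);
    if (used + cost > maxTokens) continue; // too big for what is left, try the next one
    selected.push(snippet);
    used += cost;
  }
  return { selected, used };
}
```

With the snippets sorted so the most suspicious code comes first, whatever gets left out is the material least likely to matter.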

Operational & Ethical Constraints

Taking apart software you do not own can sometimes break user agreements or laws. To stay strictly within legal and ethical boundaries we will only test apps we built ourselves or free open-source apps that give explicit permission to be studied.

Preparation Plan

Tooling and Environment: We will keep everything on our local private computer to ensure no data leaks out to the public internet.

The Brain: We will use an AI service to power our agents.

The Interface: A custom-built website running on our own computer.

The Translator: JADX (installed on our computer) to translate the app code.

Safety Precautions

Because we might be dealing with broken or dangerous code safety will be our top priority. We will use a technique called Static Analysis. This means we will only read the code like a book. We will never actually install or run the apps on a phone or computer. Since the code will not be running it cannot harm our system.

Initial Strategy

Our immediate next steps will involve finishing our newest Smali/DEX AI agent. We will run our safe mock apps through the assembly line to make sure everything works. Then we will upload the real open-source app, have JADX translate it and see what our AI agents can discover hidden in the code!

Stay tuned for Part 2 where we will push the button, unpack the apps and see what secrets our AI uncovers!