Taking Android Apps Apart: Building an AI-Powered Security Scanner

Part 2: The Analysis, Execution and Findings

In Part 1 we defined our objective: building an automated AI pipeline that analyzes Android apps and identifies security flaws. With our local environment set up and the JADX decompiler ready, it was time to put our scanner to the test.

In this post we walk through the actual reverse engineering process: our methodology, the roadblocks we hit, and our final discoveries as we moved from simulated logic to real-world AI agents.

Target Recap

We set out to analyze Android applications automatically: recover source code, inspect the underlying bytecode, and identify vulnerabilities such as hardcoded passwords. Our first targets were a suite of intentionally vulnerable mock apps, which gave us a baseline. After that we used a real open-source application called QuickTiles to test how well the scanner held up against a real-world Android codebase.

Tech Stack and Architecture

Before we dive into the results it helps to understand exactly how our machinery operates behind the scenes. Our system is built on a custom technology stack designed for speed and security.

The Foundation

We built our backend using Node.js, a popular runtime that lets us run JavaScript on a server instead of just in a web browser. The frontend is a clean web dashboard built with standard HTML and JavaScript.

The Translators (JADX and MobSF)

When an app is uploaded it first goes to JADX, an open-source decompiler. An Android app (APK) is a ZIP archive, so JADX unpacks it and translates the compiled Dalvik bytecode (the DEX files) back into human-readable Java code. We also connected MobSF (Mobile Security Framework), which runs the app through a large set of predefined security rules. It acts as our automated backup checker to ensure we do not miss anything obvious.

The 8-Agent Assembly Line

Our AI does not tackle the app all at once. Instead, we built a pipeline of specialized AI agents that communicate in a sequential chain. When one finishes its job, it packages its findings and passes them directly to the next. This relay-race system keeps each agent focused on its specific job.
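The relay-race idea can be sketched in a few lines of JavaScript. This is a minimal illustration, not our production code: the three agents and the shape of the shared context object are hypothetical stand-ins.

```javascript
// Hypothetical agents: each takes the shared context and returns an updated copy.
const intakeAgent = (ctx) => ({ ...ctx, validated: true });
const decompileAgent = (ctx) => ({ ...ctx, sources: ["MainActivity.java"] });
const reportAgent = (ctx) => ({ ...ctx, report: `${ctx.sources.length} file(s) scanned` });

// Run the agents strictly in order, feeding each one the previous agent's output.
function runPipeline(agents, initialContext) {
  return agents.reduce((ctx, agent) => agent(ctx), initialContext);
}

const result = runPipeline([intakeAgent, decompileAgent, reportAgent], { apk: "sample.apk" });
console.log(result.report); // 1 file(s) scanned
```

Agents that call an LLM would be asynchronous, so a real chain would await each step, but the hand-off pattern stays the same.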

We recently upgraded our pipeline. While some agents still use fast deterministic code, several of our most critical agents now call the Gemini 2.5 Flash model through the Gemini API. This means they are no longer limited to static rules: the model reasons about each piece of code it is given at scan time.

The LLM calls for these agents live in the narrateFindings and addSmaliEvidence functions. Here is what each agent in the pipeline does after the upgrade from simulated logic to actual AI calls:

Intake Validation Agent

This acts as the bouncer at the door. It checks the uploaded file to make sure it is a real Android app and not a disguised file before letting it into the system.
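One cheap check a bouncer like this can make is on the file's first bytes. An APK is a ZIP archive, and ZIP files start with the signature "PK" followed by the bytes 0x03 0x04. The sketch below assumes that check only; the function name is hypothetical and this is not necessarily how our agent is written.

```javascript
// ZIP local file header signature: "PK\x03\x04".
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];

// Returns true when the first four bytes match the ZIP signature.
// `bytes` is a Buffer or Uint8Array holding the start of the uploaded file.
function looksLikeZipArchive(bytes) {
  if (!bytes || bytes.length < ZIP_MAGIC.length) return false;
  return ZIP_MAGIC.every((value, index) => bytes[index] === value);
}

console.log(looksLikeZipArchive(Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0x14]))); // true
console.log(looksLikeZipArchive(Uint8Array.from([0x25, 0x50, 0x44, 0x46])));       // false (a PDF header)
```

A stricter check would also confirm that the archive contains AndroidManifest.xml and at least one DEX file, since any ZIP passes the signature test.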

Decompilation Agent

This agent operates the JADX translator. It carefully unpacks the app and organizes the translated Java code so the other agents can read it.
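As a rough sketch of what "operating JADX" means from Node.js, the agent can start the jadx command-line tool with an output directory (the -d option) and the APK path. The helper names and paths below are hypothetical, and the sketch assumes jadx is installed and on the PATH.

```javascript
// Build the argument list for the jadx command-line tool.
// "-d" sets the output directory for the decompiled sources.
function buildJadxArgs(apkPath, outputDir) {
  return ["-d", outputDir, apkPath];
}

// Launch jadx as a child process. Arguments are passed as an array,
// so the uploaded file name is never interpreted by a shell.
function decompile(apkPath, outputDir, callback) {
  const { execFile } = require("child_process");
  execFile("jadx", buildJadxArgs(apkPath, outputDir), callback);
}

console.log(["jadx", ...buildJadxArgs("uploads/sample.apk", "work/sample")].join(" "));
// jadx -d work/sample uploads/sample.apk
```

Passing the arguments as an array rather than a single command string matters here because the APK name comes from an upload and should be treated as untrusted input.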

Permission & Component Agents (LLM Powered)

These agents now send the entire AndroidManifest.xml to Gemini. The AI reads the file and identifies risky configurations that are too complex for simple search rules to catch.

Static Logic Flaw Agent

This agent acts like a detective reading a book. It scans the human-readable Java code for hardcoded passwords and hidden bypasses, and serves as a fast filter that finds interesting files for the AI.
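A filter of this kind can be as simple as a regular expression run over each line. The pattern and function name below are illustrative assumptions, not our actual rule set, which would be larger.

```javascript
// Illustrative pattern only: a secret-looking name assigned a string literal.
const SECRET_PATTERNS = [
  /(password|passwd|secret|api_?key)\s*=\s*"[^"]+"/i,
];

// Return one record per matching line, with 1-based line numbers.
function findHardcodedSecrets(fileName, source) {
  const hits = [];
  source.split("\n").forEach((text, index) => {
    if (SECRET_PATTERNS.some((pattern) => pattern.test(text))) {
      hits.push({ file: fileName, line: index + 1, text: text.trim() });
    }
  });
  return hits;
}

const sample = [
  "public class Login {",
  '    private static final String PASSWORD = "hunter2";',
  "    boolean check(String input) { return PASSWORD.equals(input); }",
  "}",
].join("\n");

console.log(findHardcodedSecrets("Login.java", sample));
// one hit, on line 2
```

Only the assignment on line 2 is flagged. The later use of PASSWORD on line 3 is not, because the pattern requires a string literal on the right-hand side.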

Risk Narrative Agent (LLM Powered)

Instead of using pre-written templates, this agent now sends every finding to Gemini. The AI writes a custom narrative for each vulnerability and explains how an attacker might exploit it.
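To make the idea concrete, here is a hedged sketch of what a function like narrateFindings could look like. Only the function name comes from our codebase. The prompt wording, the finding fields, and the generateText parameter (a stand-in for whatever wraps the Gemini call) are assumptions made for illustration.

```javascript
// Build the prompt for one finding. The wording is an assumption, not our real prompt.
function buildNarrativePrompt(finding) {
  return [
    "You are an Android security reviewer.",
    `Finding type: ${finding.type}`,
    `File: ${finding.file}, line ${finding.line}`,
    "Code snippet:",
    finding.snippet,
    "Explain the risk, how an attacker could exploit it, and how to fix it.",
  ].join("\n");
}

// `generateText` is a hypothetical async function (prompt) => string
// that wraps the actual model call.
async function narrateFindings(findings, generateText) {
  const narrated = [];
  for (const finding of findings) {
    const narrative = await generateText(buildNarrativePrompt(finding));
    narrated.push({ ...finding, narrative });
  }
  return narrated;
}

// Usage with a stub in place of the real model.
const stubModel = async (prompt) => `stub narrative for a ${prompt.length}-character prompt`;
const demoFinding = { type: "Hardcoded password", file: "Login.java", line: 2, snippet: 'String PASSWORD = "hunter2";' };
narrateFindings([demoFinding], stubModel).then((out) => console.log(out[0].narrative));
```

Injecting the model call as a parameter keeps the agent testable without network access, and it makes swapping models a one-line change.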

Smali/DEX Bytecode Agent (LLM Powered)

This is our most advanced specialist. It extracts Smali, the low-level assembly-style text form of the app's Dalvik bytecode, and sends it to Gemini. The AI translates this difficult code into plain English so we can understand the low-level logic.
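Before anything is sent to the model, the Smali text has to be cut into manageable pieces. In Smali, every method sits between a ".method" line and an ".end method" line, so one plausible approach is to split on those markers. The sketch below assumes that approach. The function name and the sample class are hypothetical.

```javascript
// Split Smali source into method bodies using the .method / .end method directives.
function extractSmaliMethods(smaliText) {
  const methods = [];
  let current = null;
  for (const line of smaliText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith(".method ")) {
      current = [trimmed];
    } else if (current) {
      current.push(trimmed);
      if (trimmed === ".end method") {
        methods.push(current.join("\n"));
        current = null;
      }
    }
  }
  return methods;
}

const smali = [
  ".class public Lcom/example/Billing;",
  ".super Ljava/lang/Object;",
  "",
  ".method public isPremium()Z",
  "    .locals 1",
  "    iget-boolean v0, p0, Lcom/example/Billing;->premium:Z",
  "    return v0",
  ".end method",
].join("\n");

// Keep only the methods that mention the keyword we care about.
const interesting = extractSmaliMethods(smali).filter((m) => m.toLowerCase().includes("premium"));
console.log(interesting.length); // 1
```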

Report Aggregation Agent

This is the manager. It takes the stories from all the other agents and organizes them into a clean color-coded report for our dashboard.
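A minimal sketch of the manager's job: merge the per-agent lists, attach a color to each finding, and sort the most severe ones to the top. The severity levels, the color mapping beyond red, and the field names are assumptions for illustration.

```javascript
const SEVERITY_ORDER = ["high", "medium", "low"];
const SEVERITY_COLOR = { high: "red", medium: "orange", low: "yellow" };

// Unknown severities sort after all known ones.
function severityRank(severity) {
  const index = SEVERITY_ORDER.indexOf(severity);
  return index === -1 ? SEVERITY_ORDER.length : index;
}

// Flatten the per-agent arrays, tag each finding with a color, and sort by severity.
function aggregateReport(agentOutputs) {
  return agentOutputs
    .flat()
    .map((finding) => ({ ...finding, color: SEVERITY_COLOR[finding.severity] || "gray" }))
    .sort((a, b) => severityRank(a.severity) - severityRank(b.severity));
}

const report = aggregateReport([
  [{ agent: "permission", title: "Backup enabled", severity: "medium" }],
  [{ agent: "static", title: "Hardcoded password", severity: "high" }],
]);
console.log(report.map((f) => `${f.color}: ${f.title}`));
// [ 'red: Hardcoded password', 'orange: Backup enabled' ]
```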

MCP Connections (The Universal Plugs)

We connected our scanner using MCP (the Model Context Protocol). This is a fancy way of saying we built a universal plug. It lets larger AI models connect to our local scanner and pull data from our tools in a controlled way. Instead of the AI guessing how to use our system, we gave it specific "buttons" (tools) it can press.
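In MCP, each "button" is a tool described by a name, a description, and a JSON Schema for its input, and a tool call returns a list of content items. The sketch below mimics those shapes with plain objects so the idea is visible without any SDK. The list_findings tool and its handler are hypothetical, and a real server would be built on an MCP SDK rather than this hand-rolled dispatcher.

```javascript
// Hypothetical tool registry. Each entry carries an MCP-style description and input schema.
const tools = {
  list_findings: {
    description: "Return the findings recorded for a previously scanned APK.",
    inputSchema: {
      type: "object",
      properties: { scanId: { type: "string" } },
      required: ["scanId"],
    },
    handler: ({ scanId }) => `No findings stored for scan ${scanId}`,
  },
};

// Dispatch a tool call: reject unknown tools and missing required arguments.
function callTool(name, args) {
  if (!Object.prototype.hasOwnProperty.call(tools, name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  const tool = tools[name];
  for (const key of tool.inputSchema.required) {
    if (!(key in args)) throw new Error(`Missing argument: ${key}`);
  }
  return { content: [{ type: "text", text: tool.handler(args) }] };
}

console.log(callTool("list_findings", { scanId: "abc" }).content[0].text);
// No findings stored for scan abc
```

The point of the schema is that the model can only press buttons that exist, with arguments of the expected shape.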

Methodology

To inspect the targets we orchestrated a multi-stage workflow:

Intake and Extraction

We uploaded the app via our custom web dashboard.

Decompilation Pipeline

Our system handed the app over to JADX, which unpacked it and translated the DEX bytecode into Java code.

Hybrid Analysis

We used fast search rules to find "files of interest", then routed those specific snippets to our Gemini-powered LLM agents for deep reasoning.

Aggregation

The Report Aggregation Agent collected all these findings and created a unified color-coded report for our dashboard.

Analysis Narrative

We began our analysis with our mock apps to validate the pipeline. Uploading the files through the dashboard successfully triggered the pipeline, and the dashboard populated with red warning cards highlighting the intentionally placed hardcoded passwords.

The Evolution from Simulation to Reality

Initially we used simulated ("make-believe") agents that only followed static rules. This worked for our mock apps but failed to capture the nuance of real-world code. Introducing the Gemini API let the critical agents reason about code that static rules could not handle.

The Course Correction

We realized that an LLM is not built to read an entire large application at once, because its context window is limited. So we kept our fast regular-expression search tools as a spotlight. They find the suspicious lines of code, and we send only those lines to Gemini. This hybrid approach combines the speed of local code with the reasoning of a large AI model.
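The "spotlight" step can be sketched as a function that takes the line numbers flagged by the search rules and cuts out each one with a little surrounding context. The function name and the default of two context lines are assumptions for illustration.

```javascript
// For each flagged 1-based line number, return that line plus a few neighbors.
function extractSnippets(source, hitLines, contextLines = 2) {
  const lines = source.split("\n");
  return hitLines.map((lineNumber) => {
    const start = Math.max(1, lineNumber - contextLines);
    const end = Math.min(lines.length, lineNumber + contextLines);
    return { startLine: start, endLine: end, text: lines.slice(start - 1, end).join("\n") };
  });
}

const source = Array.from({ length: 10 }, (_, i) => `line${i + 1}`).join("\n");
console.log(extractSnippets(source, [5], 1));
// [ { startLine: 4, endLine: 6, text: 'line4\nline5\nline6' } ]
```

Only these short snippets go to the model, which keeps each request small no matter how large the app is.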

Findings

What did our final LLM-powered pipeline discover?

Intelligent Manifest Review

Our Permission Agent successfully identified a risky "Backup Enabled" flag (the android:allowBackup attribute set to true in the manifest). While a search rule could have found this, the AI added context by explaining exactly how a local attacker could steal app data using the ADB tool.
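For comparison, here is roughly what the search-rule version of this check looks like, assuming the "Backup Enabled" finding corresponds to android:allowBackup="true" on the application element of the decoded manifest. The function name and sample manifest are hypothetical.

```javascript
// True when the <application> tag explicitly sets android:allowBackup="true".
// Expects the decoded (text) manifest, not the binary XML stored inside the APK.
function hasBackupEnabled(manifestXml) {
  return /<application\b[^>]*\bandroid:allowBackup\s*=\s*"true"/.test(manifestXml);
}

const manifest = [
  '<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">',
  "    <application",
  '        android:label="Example"',
  '        android:allowBackup="true">',
  "    </application>",
  "</manifest>",
].join("\n");

console.log(hasBackupEnabled(manifest)); // true
console.log(hasBackupEnabled(manifest.replace('"true"', '"false"'))); // false
```

The rule can say that the flag is set. It cannot explain why that matters for this particular app, which is the part the LLM adds.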

Real-Time Bytecode Translation

Our Smali agent proved to be the most impressive upgrade. When we sent it raw Smali, it correctly explained that the code was checking a local boolean variable to see whether "premium" mode was active, and it accurately described how that check could be bypassed.

Custom Remediation Advice

Every finding on our dashboard now contains custom advice written by the AI. Instead of generic warnings the developer sees specific instructions tailored to their exact code snippet.

Validation

To ensure the AI was not just making things up (hallucinating) we validated the results using two methods:

Manual Comparison

We opened the decompiled output in JADX ourselves and checked the code by eye. The line numbers and AI-generated explanations perfectly matched the actual logic.

MobSF Cross-Reference

We ran the same app through our backup scanner MobSF. The findings aligned perfectly which confirmed that our AI agents were producing high-fidelity security insights.
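A cross-reference like this can also be automated once both result sets are reduced to a common shape. The sketch below assumes each finding has already been normalized to a type and a file name. That normalization is the hypothetical part, since MobSF's own report format is different and would need a mapping step first.

```javascript
// Compare two normalized finding lists and split them into agreement and disagreement.
function crossReference(ourFindings, mobsfFindings) {
  const key = (finding) => `${finding.type}:${finding.file}`;
  const ourKeys = new Set(ourFindings.map(key));
  const mobsfKeys = new Set(mobsfFindings.map(key));
  return {
    confirmed: ourFindings.filter((f) => mobsfKeys.has(key(f))),
    onlyOurs: ourFindings.filter((f) => !mobsfKeys.has(key(f))),
    onlyMobsf: mobsfFindings.filter((f) => !ourKeys.has(key(f))),
  };
}

const comparison = crossReference(
  [{ type: "hardcoded_secret", file: "Login.java" }, { type: "premium_bypass", file: "Billing.smali" }],
  [{ type: "hardcoded_secret", file: "Login.java" }, { type: "allow_backup", file: "AndroidManifest.xml" }]
);
console.log(comparison.confirmed.length, comparison.onlyOurs.length, comparison.onlyMobsf.length);
// 1 1 1
```

Findings that appear only on our side are the ones most worth checking by hand, since those are where a hallucination would hide.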

Future Deployment Plans

Right now everything runs securely on our private local computers. However, we have ambitious plans for the future.

We plan to package this entire scanner and offer it as a public service. To do this we plan to deploy the web application on a modern cloud platform such as Vercel or Netlify, which will let us host our code on the internet easily. To keep the new service safe from attackers we will put it behind Cloudflare, which acts as a digital shield by blocking bad traffic before it ever reaches our servers.

Reflection

Building this scanner taught us a valuable lesson about the intersection of AI and security. We learned that the most powerful systems are not 100% AI but are instead "AI-Augmented".

By using fast local code as a "filter" and the Gemini API as the "brain", we built a tool that is both fast and intelligent. The transition from simulated agents to actual LLM agents was the final step in creating a truly professional-grade security scanner.

Ultimately we learned that combining a mature translator tool like JADX with the reasoning capabilities of Gemini creates a powerful synergy that can take apart even the most complex Android applications.

Art Krasniqi's CyberBlog

Thursday, May 7, 2026

LAB 10: APK Security Scanner - Multi-Agent Static Analysis pt.2