Attacking YOLOv8 vs YOLOv10: Adversarial CNN Misclassification
A writeup of the NHNC 2025 CTF CNN challenge.
Summary
In this challenge, we explored the vulnerabilities of two object detection models, YOLOv8 and YOLOv10, by crafting an adversarial image that causes the models to disagree on predictions with a significant confidence gap. Using insights gleaned from their confusion matrices, we identified weak spots in classification consistency and exploited them with image transformations such as noise injection, blurring, color shifting, and rotation.
By systematically perturbing a source image and evaluating predictions in a loop, we found a transformation that met both conditions:
- The two YOLO versions predicted different object classes
- The absolute difference in their confidence scores was at least 0.4
This adversarial example reveals real-world concerns in machine learning systems: even small, natural-looking perturbations can cause models to behave inconsistently, especially across versions. The challenge highlights the importance of model robustness, adversarial testing, and version-aware validation pipelines in production-grade ML systems.
Challenge Description
Title: Attack CCN?
Description: Did u know how to attack CNN?
Category: Machine Learning / Adversarial Attacks
Points: 500
Difficulty: Medium
Maker: kohiro
The challenge required creating an adversarial image that satisfies two conditions:
different_prediction = result_v8["class_name"] != result_v10["class_name"]
confidence_gap = abs(result_v8["confidence"] - result_v10["confidence"]) >= 0.4
What We Had
We were provided:
- Confusion matrices for both YOLOv8 and YOLOv10
- Serialized PyTorch models (YOLOv8 and YOLOv10 checkpoints)
- A web endpoint:
http://chal.78727867.xyz:5000/
The goal was to upload a single adversarial image that causes the two YOLO versions to predict different classes, with a confidence difference of at least 0.4. This tests not just adversarial crafting skills, but also model drift exploitation, where different versions of a CNN interpret visual noise differently.
Confusion Matrix Analysis
To craft a precise, low-effort, high-yield adversarial image, we first analyzed each model's weaknesses.
YOLOv8 Confusion Matrix
This matrix shows how often YOLOv8 correctly classifies traffic signs. Most classes are well-classified, with very high diagonal values (~0.9+). For example:
- Speed Limit 120 → 92% correct
- Stop → 100% correct
- Green Light → 80% correct
But there are some off-diagonal cells with non-zero values, indicating misclassifications:
- Red Light is misclassified as background 22% of the time
- Speed Limit 90 is misclassified as background 9% of the time
Meaning: high accuracy across most classes, with confident and robust predictions. Stop and Speed Limit 120 are almost perfectly predicted, and background confusion is low.
YOLOv10 Confusion Matrix
YOLOv10 shows different patterns:
- Speed Limit 60 → Only 60% correct (vs YOLOv8’s higher accuracy)
- Speed Limit 90 → 70% correct (vs YOLOv8’s 91%)
- More background confusion overall
TL;DR
YOLOv8 is more confident and accurate, while YOLOv10 has more uncertainty, especially around speed limit signs. This suggests speed limit signs are good targets for adversarial attacks.
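To turn that observation into a concrete target, one can rank classes by how far apart the two diagonals sit. A minimal sketch, using illustrative per-class recall values rather than the exact challenge matrices (the class order and most numbers are assumptions):

import numpy as np

# Illustrative per-class recall read off the two confusion-matrix diagonals
# (not the exact challenge data)
classes = ["Green Light", "Red Light", "Speed Limit 60",
           "Speed Limit 90", "Speed Limit 120", "Stop"]
diag_v8 = np.array([0.80, 0.78, 0.88, 0.91, 0.92, 1.00])
diag_v10 = np.array([0.78, 0.75, 0.60, 0.70, 0.88, 0.97])

# The larger the gap between versions, the more promising the class is as an attack target
gap = diag_v8 - diag_v10
for idx in np.argsort(-gap):
    print(f"{classes[idx]:<16} v8={diag_v8[idx]:.2f} v10={diag_v10[idx]:.2f} gap={gap[idx]:+.2f}")

With numbers like these, Speed Limit 60 and Speed Limit 90 float to the top, which matches the intuition above.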
What is this about?!
This challenge demonstrates model drift: different versions of the same model architecture can behave differently on edge cases. Even small perturbations can cause:
- Different predictions between model versions
- Large confidence gaps in their certainty
This is a real-world security concern in ML systems where model updates might introduce new vulnerabilities.
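A version-aware validation pipeline can catch this kind of drift before deployment by replaying a held-out set through both checkpoints and flagging disagreements. A minimal sketch, assuming a local validation_images/ folder and the same checkpoint files used later in the exploit:

from pathlib import Path
from ultralytics import YOLO

model_old = YOLO("yolov8n.pt")
model_new = YOLO("yolov10n.pt")

def top_prediction(model, img_path):
    """Return (class_name, confidence) of the first detection, or (None, 0.0)."""
    results = model(img_path, verbose=False)
    boxes = results[0].boxes
    if len(boxes) == 0:
        return None, 0.0
    cls_id = int(boxes.cls[0].item())
    return results[0].names[cls_id], float(boxes.conf[0].item())

# Flag every validation image where the two versions drift apart
disagreements = []
for img in sorted(Path("validation_images").glob("*.jpg")):
    cls_old, conf_old = top_prediction(model_old, str(img))
    cls_new, conf_new = top_prediction(model_new, str(img))
    if cls_old != cls_new or abs(conf_old - conf_new) >= 0.4:
        disagreements.append((img.name, cls_old, conf_old, cls_new, conf_new))

print(f"{len(disagreements)} images show cross-version drift")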
Exploit
Transformations Used
Our adversarial image generation used multiple transformation techniques:
- Gaussian Noise - Random pixel perturbations
- Gaussian Blur - Smoothing to reduce fine details
- Color Shifting - Hue/saturation adjustments
- Rotation - Small angle rotations
- Brightness/Contrast - Lighting adjustments
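A quick sanity check that the combined transforms stay visually subtle is to measure the pixel-level difference they introduce. A minimal sketch, assuming the source image and the transformed.webp output written by the script in the walkthrough below:

import cv2
import numpy as np

original = cv2.imread("traffic_sign.webp").astype(np.float64)
perturbed = cv2.imread("transformed.webp").astype(np.float64)

# Mean absolute per-pixel change and PSNR; higher PSNR means the perturbation is closer to invisible
diff = np.abs(original - perturbed)
mse = np.mean(diff ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
print(f"mean |delta| per pixel: {diff.mean():.2f}")
print(f"PSNR: {psnr:.1f} dB")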
Exploit Walkthrough
Requirements
import torch
from ultralytics import YOLO
import cv2
import numpy as np
import random
from PIL import Image, ImageEnhance
import requests
Exploit Code
import torch
from ultralytics import YOLO
import cv2
import numpy as np
import random
from PIL import Image, ImageEnhance
import requests

# Load models
model_v8 = YOLO('yolov8n.pt')
model_v10 = YOLO('yolov10n.pt')

SOURCE_IMAGE = "traffic_sign.webp"  # Base image to transform

def transform_image(img_path):
    """Apply random transformations to create adversarial examples"""
    img = cv2.imread(img_path)

    # Random Gaussian noise
    noise = np.random.normal(0, random.uniform(5, 25), img.shape).astype(np.uint8)
    img = cv2.add(img, noise)

    # Random blur
    if random.random() > 0.5:
        kernel_size = random.choice([3, 5, 7])
        img = cv2.GaussianBlur(img, (kernel_size, kernel_size), 0)

    # Color shifting
    img_pil = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    # Random hue shift
    enhancer = ImageEnhance.Color(img_pil)
    img_pil = enhancer.enhance(random.uniform(0.7, 1.3))

    # Random brightness
    enhancer = ImageEnhance.Brightness(img_pil)
    img_pil = enhancer.enhance(random.uniform(0.8, 1.2))

    # Random contrast
    enhancer = ImageEnhance.Contrast(img_pil)
    img_pil = enhancer.enhance(random.uniform(0.8, 1.2))

    # Convert back to OpenCV format
    img = cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)

    # Random rotation
    if random.random() > 0.5:
        angle = random.uniform(-15, 15)
        center = (img.shape[1] // 2, img.shape[0] // 2)
        matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
        img = cv2.warpAffine(img, matrix, (img.shape[1], img.shape[0]))

    # Save transformed image
    output_path = "transformed.webp"
    cv2.imwrite(output_path, img)
    return output_path

def get_prediction(model, img_path):
    """Get prediction from YOLO model"""
    results = model(img_path, verbose=False)
    if len(results[0].boxes) > 0 and len(results[0].boxes.cls) > 0:
        class_id = int(results[0].boxes.cls[0].item())
        class_name = results[0].names[class_id]
        confidence = float(results[0].boxes.conf[0].item())
        return class_name, confidence
    return None, 0.0

def submit_image(img_path):
    """Submit image to challenge server"""
    url = "http://chal.78727867.xyz:5000/"
    with open(img_path, "rb") as img_file:
        files = {"image": img_file}
        print("[*] Uploading to challenge...")
        res = requests.post(url, files=files)
    print("[*] Server response:\n")
    print(res.text)

# Main loop
for i in range(1000):
    img_path = transform_image(SOURCE_IMAGE)
    class_v8, conf_v8 = get_prediction(model_v8, img_path)
    class_v10, conf_v10 = get_prediction(model_v10, img_path)
    print(f"[{i}] YOLOv8: {class_v8} ({conf_v8:.2f}) | YOLOv10: {class_v10} ({conf_v10:.2f})")
    if class_v8 and class_v10:
        if class_v8 != class_v10 and abs(conf_v8 - conf_v10) >= 0.4:
            print("\n[+] Found adversarial image!")
            print(f" YOLOv8 → {class_v8} ({conf_v8:.2f})")
            print(f" YOLOv10 → {class_v10} ({conf_v10:.2f})")
            submit_image(img_path)
            break
Results
After running the exploit, we successfully found an adversarial image:
[16] YOLOv8: Speed Limit 90 (0.86) | YOLOv10: Speed Limit 60 (0.34)
[+] Found adversarial image!
YOLOv8 → Speed Limit 90 (0.86)
YOLOv10 → Speed Limit 60 (0.34)
The server responded with the flag:
🎉 FLAG: NHNC{you_kn0w_h0w_t0_d0_adv3rs3ria1_attack}
Why This Worked
YOLO models are CNN-based and sensitive to small perturbations, especially in the earlier convolution layers. YOLOv8 and YOLOv10 differ in architecture and weights, and likely in training data and hyperparameters, so they respond differently to noise.
The transformation pipeline pushed images into decision boundary territory, regions where small input changes cause large output changes. This is a textbook black-box adversarial attack: we never needed gradients, only output labels and confidence scores.
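One way to see that boundary directly is to sweep a single transform parameter and watch where the two versions start to disagree. A minimal sketch, assuming the same checkpoints and source image as the exploit (the sigma range is arbitrary):

import cv2
import numpy as np
from ultralytics import YOLO

model_v8 = YOLO("yolov8n.pt")
model_v10 = YOLO("yolov10n.pt")
base = cv2.imread("traffic_sign.webp")

def top_pred(model, img):
    results = model(img, verbose=False)
    boxes = results[0].boxes
    if len(boxes) == 0:
        return None, 0.0
    return results[0].names[int(boxes.cls[0])], float(boxes.conf[0])

# Increase the noise strength step by step; the sigma at which the labels
# or confidences diverge is roughly where the input crosses a decision boundary
rng = np.random.default_rng(0)
for sigma in range(0, 45, 5):
    noisy = np.clip(base.astype(np.int16) + rng.normal(0, sigma, base.shape), 0, 255).astype(np.uint8)
    c8, p8 = top_pred(model_v8, noisy)
    c10, p10 = top_pred(model_v10, noisy)
    print(f"sigma={sigma:2d}  v8: {c8} ({p8:.2f})  v10: {c10} ({p10:.2f})")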
This challenge highlights critical security considerations for production ML systems:
- Model robustness testing before deployment
- Version-aware validation when updating models
- Adversarial training to improve resilience
- Ensemble methods to reduce single-point failures (a minimal agreement gate is sketched below)
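For the last point, here is a minimal sketch of such an agreement gate, reusing the two checkpoints from the exploit: the system only acts on a detection when both versions name the same class with comparable confidence, and abstains otherwise.

from ultralytics import YOLO

model_v8 = YOLO("yolov8n.pt")
model_v10 = YOLO("yolov10n.pt")

def agreed_prediction(img_path, max_gap=0.4):
    """Return (class_name, confidence) only if both versions agree; otherwise None."""
    preds = []
    for model in (model_v8, model_v10):
        results = model(img_path, verbose=False)
        boxes = results[0].boxes
        if len(boxes) == 0:
            return None  # one member saw nothing: abstain
        preds.append((results[0].names[int(boxes.cls[0])], float(boxes.conf[0])))
    (cls_a, conf_a), (cls_b, conf_b) = preds
    if cls_a != cls_b or abs(conf_a - conf_b) >= max_gap:
        return None  # members disagree: abstain and escalate for review
    return cls_a, min(conf_a, conf_b)

print(agreed_prediction("traffic_sign.webp"))

An image like the one found above would be rejected by this gate rather than silently misread by whichever version happens to be deployed.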
The attack demonstrates how even natural-looking image transformations can exploit subtle differences between model versions, making this a realistic threat vector in real-world applications.


