Chapter 1: Computer Vision and Object Detection Fundamentals


Learning Objectives

  1. Understand the basic concepts and application scenarios of computer vision
  2. Master the fundamentals of image processing
  3. Understand the definition and challenges of object detection tasks
  4. Become familiar with object detection evaluation metrics (mAP, IoU, etc.)

1.1 Basic Concepts of Computer Vision

1.1.1 What is Computer Vision

Computer Vision (CV) is an important branch of artificial intelligence that aims to enable computers to “understand” images and videos like humans do.

Core tasks include (a short sketch after this list contrasts their typical outputs):

  • Image Classification: Determining what objects are present in an image
  • Object Detection: Finding the location and category of objects in an image
  • Semantic Segmentation: Assigning category labels to each pixel in an image
  • Instance Segmentation: Distinguishing different instances of the same category
  • Pose Estimation: Detecting key points of human bodies or objects
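
To make these distinctions concrete, the sketch below shows the kind of output each task produces for a single image. The field names and values are illustrative only, not taken from any particular library:

# Hypothetical outputs for one image -- exact formats vary by library
task_outputs = {
    "Image Classification": "cat",                                     # one label per image
    "Object Detection": [("cat", 0.92, [34, 50, 200, 240])],           # (label, score, [x1, y1, x2, y2])
    "Semantic Segmentation": "H x W array of per-pixel class IDs",
    "Instance Segmentation": "one H x W boolean mask per object",
    "Pose Estimation": [("nose", (120, 88)), ("left_eye", (105, 80))]  # named keypoints
}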

1.1.2 Application Scenarios of Computer Vision

# Computer vision application domains
cv_applications = {
    "Autonomous Driving": {
        "tasks": ["Vehicle Detection", "Pedestrian Detection", "Traffic Sign Recognition", "Lane Detection"],
        "technologies": ["Multi-Object Detection", "Depth Estimation", "Optical Flow", "SLAM"]
    },
    "Medical Imaging": {
        "tasks": ["Lesion Detection", "Organ Segmentation", "Disease Diagnosis", "Surgical Navigation"],
        "technologies": ["Medical Image Analysis", "3D Reconstruction", "Image Registration", "CAD Systems"]
    },
    "Security Surveillance": {
        "tasks": ["Face Recognition", "Behavior Analysis", "Anomaly Detection", "License Plate Recognition"],
        "technologies": ["Real-time Detection", "Object Tracking", "Behavior Recognition", "Crowd Analysis"]
    },
    "Industrial Inspection": {
        "tasks": ["Defect Detection", "Quality Control", "Assembly Inspection", "Dimension Measurement"],
        "technologies": ["Surface Inspection", "Shape Matching", "Precision Measurement", "Automated Inspection"]
    },
    "Retail E-commerce": {
        "tasks": ["Product Recognition", "Virtual Try-on", "Smart Recommendations", "Inventory Management"],
        "technologies": ["Object Recognition", "Image Search", "AR/VR", "Visual Recommendations"]
    }
}

1.2 Image Processing Fundamentals

1.2.1 Digital Image Representation
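
A digital image is simply an array of pixel intensities: height x width for grayscale, height x width x channels for color, most commonly uint8 values in [0, 255]. A minimal sketch with a synthetic 4x4 image:

import numpy as np

# A synthetic 4x4 RGB image: shape (height, width, channels), uint8 in [0, 255]
tiny = np.zeros((4, 4, 3), dtype=np.uint8)
tiny[0, 0] = [255, 0, 0]       # set the top-left pixel to pure red
print(tiny.shape, tiny.dtype)  # (4, 4, 3) uint8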

import numpy as np
import cv2
from PIL import Image
import matplotlib.pyplot as plt

class ImageProcessor:
    def __init__(self):
        self.image = None

    def load_image(self, image_path):
        """Load an image from disk (OpenCV uses BGR channel order)"""
        self.image = cv2.imread(image_path)
        if self.image is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        return self.image

    def image_info(self):
        """Print basic image properties"""
        if self.image is not None:
            height, width = self.image.shape[:2]
            channels = self.image.shape[2] if self.image.ndim == 3 else 1
            print(f"Image dimensions: {width} x {height}")
            print(f"Number of channels: {channels}")
            print(f"Data type: {self.image.dtype}")
            print(f"Pixel value range: {self.image.min()} - {self.image.max()}")

    def color_space_conversion(self):
        """Color space conversion"""
        conversions = {}

        # BGR to RGB
        conversions['RGB'] = cv2.cvtColor(self.image, cv2.COLOR_BGR2RGB)

        # BGR to Gray
        conversions['Gray'] = cv2.cvtColor(self.image, cv2.COLOR_BGR2GRAY)

        # BGR to HSV
        conversions['HSV'] = cv2.cvtColor(self.image, cv2.COLOR_BGR2HSV)

        # BGR to LAB
        conversions['LAB'] = cv2.cvtColor(self.image, cv2.COLOR_BGR2LAB)

        return conversions

    def basic_operations(self):
        """Basic image operations"""
        operations = {}

        # Image resizing
        operations['resized'] = cv2.resize(self.image, (640, 480))

        # Image rotation
        center = (self.image.shape[1]//2, self.image.shape[0]//2)
        rotation_matrix = cv2.getRotationMatrix2D(center, 45, 1.0)
        operations['rotated'] = cv2.warpAffine(self.image, rotation_matrix,
                                             (self.image.shape[1], self.image.shape[0]))

        # Image flipping
        operations['flipped_h'] = cv2.flip(self.image, 1)  # Horizontal flip
        operations['flipped_v'] = cv2.flip(self.image, 0)  # Vertical flip

        # Image cropping
        h, w = self.image.shape[:2]
        operations['cropped'] = self.image[h//4:3*h//4, w//4:3*w//4]

        return operations

# Example usage
processor = ImageProcessor()
# image = processor.load_image('example.jpg')
# processor.image_info()
# conversions = processor.color_space_conversion()
# operations = processor.basic_operations()
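
If no image file is at hand, a synthetic array works just as well, since OpenCV functions accept any correctly shaped uint8 array (a minimal sketch):

# Exercise the processor on a random 480x640 BGR image instead of a file
processor.image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
processor.image_info()
gray = processor.color_space_conversion()['Gray']
print(gray.shape)  # (480, 640) -- a single channel after grayscale conversion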

1.2.2 Image Preprocessing Techniques

class ImagePreprocessor:
    def __init__(self):
        pass

    def noise_reduction(self, image):
        """Noise reduction"""
        methods = {}

        # Gaussian blur
        methods['gaussian'] = cv2.GaussianBlur(image, (5, 5), 0)

        # Median blur
        methods['median'] = cv2.medianBlur(image, 5)

        # Bilateral filter
        methods['bilateral'] = cv2.bilateralFilter(image, 9, 75, 75)

        return methods

    def edge_detection(self, image):
        """Edge detection"""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        edges = {}

        # Canny edge detection
        edges['canny'] = cv2.Canny(gray, 50, 150)

        # Sobel edge detection
        edges['sobel_x'] = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
        edges['sobel_y'] = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
        edges['sobel'] = np.sqrt(edges['sobel_x']**2 + edges['sobel_y']**2)

        # Laplacian edge detection
        edges['laplacian'] = cv2.Laplacian(gray, cv2.CV_64F)

        return edges

    def histogram_analysis(self, image):
        """Histogram analysis and equalization"""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # Calculate histogram
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256])

        # Histogram equalization
        equalized = cv2.equalizeHist(gray)

        # Adaptive histogram equalization
        clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8,8))
        adaptive_eq = clahe.apply(gray)

        return {
            'histogram': hist,
            'equalized': equalized,
            'adaptive_equalized': adaptive_eq
        }

# Example usage
preprocessor = ImagePreprocessor()
# noise_methods = preprocessor.noise_reduction(image)
# edge_results = preprocessor.edge_detection(image)
# hist_results = preprocessor.histogram_analysis(image)
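
As before, a synthetic image is enough to exercise these methods (a sketch):

# Run the preprocessing routines on a random test image
test_img = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
denoised = preprocessor.noise_reduction(test_img)
edges = preprocessor.edge_detection(test_img)
print(denoised['gaussian'].shape)  # (240, 320, 3)
print(edges['canny'].dtype)        # uint8 -- Canny returns a binary edge map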

1.3 Object Detection Task Definition

1.3.1 What is Object Detection

Object Detection is one of the core tasks in computer vision, aiming to simultaneously accomplish:

  1. Classification: Determining which categories of objects exist in the image
  2. Localization: Determining the specific positions of these objects in the image
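
In code, a single detection is therefore usually encoded as a (class, confidence, bounding box) triple. Two box conventions are common: the corner format [x1, y1, x2, y2] used throughout this chapter, and the center format [cx, cy, w, h] that YOLO-style models predict. A minimal conversion sketch (the function name is ours, not from any library):

def xywh_to_xyxy(box):
    """Convert a center-format box [cx, cy, w, h] to corner format [x1, y1, x2, y2]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# A detection: class label, confidence score, and box location
detection = ("car", 0.87, xywh_to_xyxy([100, 60, 40, 20]))  # box -> [80.0, 50.0, 120.0, 70.0]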

1.3.2 Challenges in Object Detection

class ObjectDetectionChallenges:
    def __init__(self):
        self.challenges = {
            "Scale Variation": {
                "description": "Same object appears in different sizes at different distances",
                "solutions": ["Multi-scale Training", "Feature Pyramid", "Scale Augmentation"],
                "example": "Distant vehicles are small, nearby vehicles are large"
            },
            "Occlusion": {
                "description": "Objects are partially or completely occluded by other objects",
                "solutions": ["Partial Detection", "Context Information", "Multi-view Fusion"],
                "example": "Face detection in crowds, traffic signs occluded by leaves"
            },
            "Illumination Changes": {
                "description": "Object appearance varies under different lighting conditions",
                "solutions": ["Data Augmentation", "Illumination Normalization", "Robust Features"],
                "example": "Vehicle detection in day and night, pedestrians in shadows"
            },
            "Complex Background": {
                "description": "Object detection is difficult in complex backgrounds",
                "solutions": ["Context Modeling", "Background Suppression", "Attention Mechanism"],
                "example": "Animals in forests, pedestrians on streets"
            },
            "Intra-class Variation": {
                "description": "Large appearance differences within the same category",
                "solutions": ["Diverse Training Data", "Feature Learning", "Data Augmentation"],
                "example": "Different dog breeds, different car models"
            },
            "Real-time Requirements": {
                "description": "Many applications require real-time detection performance",
                "solutions": ["Lightweight Networks", "Model Compression", "Hardware Optimization"],
                "example": "Autonomous driving, real-time surveillance systems"
            }
        }

    def print_challenges(self):
        """Print all challenges"""
        for challenge, details in self.challenges.items():
            print(f"\n{challenge}:")
            print(f"  Description: {details['description']}")
            print(f"  Solutions: {', '.join(details['solutions'])}")
            print(f"  Example: {details['example']}")

# Create instance
challenges = ObjectDetectionChallenges()
challenges.print_challenges()

1.3.3 Object Detection Algorithm Classification

class DetectionAlgorithmTaxonomy:
    def __init__(self):
        self.algorithms = {
            "Traditional Methods": {
                "characteristics": "Based on hand-crafted features and traditional machine learning",
                "representative_algorithms": [
                    "Viola-Jones",
                    "HOG + SVM",
                    "DPM (Deformable Part Models)"
                ],
                "advantages": ["Clear theory", "Low computational resource requirements"],
                "disadvantages": ["Limited feature representation", "Poor generalization"]
            },
            "Two-Stage Methods": {
                "characteristics": "First generate candidate regions, then classify and regress",
                "representative_algorithms": [
                    "R-CNN",
                    "Fast R-CNN",
                    "Faster R-CNN",
                    "Mask R-CNN"
                ],
                "advantages": ["High detection accuracy", "Suitable for complex scenes"],
                "disadvantages": ["Relatively slow", "Complex system"]
            },
            "One-Stage Methods": {
                "characteristics": "Directly predict object category and location",
                "representative_algorithms": [
                    "YOLO series",
                    "SSD",
                    "RetinaNet",
                    "FCOS"
                ],
                "advantages": ["Fast speed", "End-to-end training", "Suitable for real-time applications"],
                "disadvantages": ["Slightly lower accuracy than two-stage", "Difficulty with small objects"]
            },
            "Transformer Methods": {
                "characteristics": "Detection methods based on attention mechanisms",
                "representative_algorithms": [
                    "DETR",
                    "Deformable DETR",
                    "Sparse DETR"
                ],
                "advantages": ["Strong global modeling capability", "No NMS post-processing needed"],
                "disadvantages": ["Slow training convergence", "High computational complexity"]
            }
        }

    def compare_methods(self):
        """Summarize the speed/accuracy trade-off of each method family"""
        comparison = {
            "method_type": [],
            "speed": [],
            "accuracy": []
        }

        for method, details in self.algorithms.items():
            comparison["method_type"].append(method)

            # Rough speed ranking inferred from the listed advantages
            if "Fast speed" in details["advantages"]:
                comparison["speed"].append("Fast")
            elif method == "Two-Stage Methods":
                comparison["speed"].append("Slow")
            else:
                comparison["speed"].append("Medium")

            # Rough accuracy ranking inferred the same way
            if "High detection accuracy" in details["advantages"]:
                comparison["accuracy"].append("High")
            elif method == "Traditional Methods":
                comparison["accuracy"].append("Low")
            else:
                comparison["accuracy"].append("Medium")

        return comparison

taxonomy = DetectionAlgorithmTaxonomy()

1.4 Object Detection Evaluation Metrics

1.4.1 IoU (Intersection over Union)
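
IoU measures how well two boxes A and B overlap: IoU = area(A ∩ B) / area(A ∪ B). It ranges from 0 (no overlap) to 1 (identical boxes) and is the building block for every metric that follows.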

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

class DetectionMetrics:
    def __init__(self):
        pass

    def calculate_iou(self, box1, box2):
        """
        Calculate IoU between two bounding boxes
        box format: [x1, y1, x2, y2]
        """
        # Calculate intersection coordinates
        x1 = max(box1[0], box2[0])
        y1 = max(box1[1], box2[1])
        x2 = min(box1[2], box2[2])
        y2 = min(box1[3], box2[3])

        # Check if there is intersection
        if x2 <= x1 or y2 <= y1:
            return 0.0

        # Calculate intersection area
        intersection = (x2 - x1) * (y2 - y1)

        # Calculate individual areas
        area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])

        # Calculate union area
        union = area1 + area2 - intersection

        # Calculate IoU
        iou = intersection / union
        return iou

    def visualize_iou(self, box1, box2):
        """Visualize IoU calculation process"""
        fig, ax = plt.subplots(1, 1, figsize=(8, 8))

        # Draw bounding boxes
        rect1 = patches.Rectangle((box1[0], box1[1]),
                                 box1[2]-box1[0], box1[3]-box1[1],
                                 linewidth=2, edgecolor='red',
                                 facecolor='red', alpha=0.3, label='Ground Truth')

        rect2 = patches.Rectangle((box2[0], box2[1]),
                                 box2[2]-box2[0], box2[3]-box2[1],
                                 linewidth=2, edgecolor='blue',
                                 facecolor='blue', alpha=0.3, label='Prediction')

        ax.add_patch(rect1)
        ax.add_patch(rect2)

        # Calculate and draw intersection
        x1 = max(box1[0], box2[0])
        y1 = max(box1[1], box2[1])
        x2 = min(box1[2], box2[2])
        y2 = min(box1[3], box2[3])

        if x2 > x1 and y2 > y1:
            intersection_rect = patches.Rectangle((x1, y1), x2-x1, y2-y1,
                                                linewidth=2, edgecolor='green',
                                                facecolor='green', alpha=0.5,
                                                label='Intersection')
            ax.add_patch(intersection_rect)

        # Calculate IoU
        iou = self.calculate_iou(box1, box2)

        ax.set_xlim(0, 10)
        ax.set_ylim(0, 10)
        ax.set_title(f'IoU = {iou:.3f}')
        ax.legend()
        ax.grid(True)

        return fig, iou

    def iou_threshold_analysis(self):
        """IoU threshold analysis"""
        thresholds = {
            0.1: "Very low overlap, usually not considered a correct detection",
            0.3: "Low overlap, a loose match at best",
            0.5: "Medium overlap, the PASCAL VOC standard",
            0.75: "High overlap, the strict COCO metric (mAP@0.75)",
            0.9: "Very high overlap, almost perfect match"
        }

        for threshold, description in thresholds.items():
            print(f"IoU = {threshold}: {description}")

        return thresholds

# Example usage
metrics = DetectionMetrics()

# Example bounding boxes
gt_box = [2, 2, 6, 6]  # Ground Truth
pred_box = [3, 3, 7, 7]  # Prediction
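# By hand: intersection = 3 x 3 = 9, union = 16 + 16 - 9 = 23, so IoU = 9/23 ≈ 0.391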

iou_score = metrics.calculate_iou(gt_box, pred_box)
print(f"IoU Score: {iou_score:.3f}")

# Visualize IoU
# fig, iou = metrics.visualize_iou(gt_box, pred_box)
# plt.show()

# IoU threshold analysis
thresholds = metrics.iou_threshold_analysis()

1.4.2 Precision, Recall, and F1 Score
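
All three metrics derive from the confusion-matrix counts: Precision = TP / (TP + FP) measures how many detections are correct, Recall = TP / (TP + FN) measures how many ground-truth objects are found, and F1 = 2 · Precision · Recall / (Precision + Recall) is their harmonic mean.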

class PrecisionRecallMetrics:
    def __init__(self):
        pass

    def calculate_confusion_matrix(self, predictions, ground_truths, iou_threshold=0.5):
        """
        Calculate confusion matrix components
        predictions: [(box, confidence, class_id), ...]
        ground_truths: [(box, class_id), ...]
        """
        tp = 0  # True Positives
        fp = 0  # False Positives
        fn = 0  # False Negatives

        matched_gt = set()  # Matched ground truths

        # Sort predictions by confidence
        predictions = sorted(predictions, key=lambda x: x[1], reverse=True)

        for pred_box, confidence, pred_class in predictions:
            best_iou = 0
            best_gt_idx = -1

            # Find best matching ground truth
            for gt_idx, (gt_box, gt_class) in enumerate(ground_truths):
                if gt_class != pred_class:
                    continue

                iou = self.calculate_iou(pred_box, gt_box)
                if iou > best_iou:
                    best_iou = iou
                    best_gt_idx = gt_idx

            # Determine if it's a correct detection
            if best_iou >= iou_threshold and best_gt_idx not in matched_gt:
                tp += 1
                matched_gt.add(best_gt_idx)
            else:
                fp += 1

        # Calculate undetected ground truths
        fn = len(ground_truths) - len(matched_gt)

        return tp, fp, fn

    def calculate_iou(self, box1, box2):
        """Calculate IoU (reuse previous function)"""
        x1 = max(box1[0], box2[0])
        y1 = max(box1[1], box2[1])
        x2 = min(box1[2], box2[2])
        y2 = min(box1[3], box2[3])

        if x2 <= x1 or y2 <= y1:
            return 0.0

        intersection = (x2 - x1) * (y2 - y1)
        area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
        union = area1 + area2 - intersection

        return intersection / union

    def calculate_metrics(self, tp, fp, fn):
        """Calculate precision, recall, and F1 score"""
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

        return {
            'precision': precision,
            'recall': recall,
            'f1_score': f1_score,
            'tp': tp,
            'fp': fp,
            'fn': fn
        }

    def plot_pr_curve(self, predictions, ground_truths, class_id=0):
        """Plot the Precision-Recall curve for one class"""
        # Keep only the requested class, sorted by confidence
        predictions = sorted([p for p in predictions if p[2] == class_id],
                             key=lambda x: x[1], reverse=True)
        ground_truths = [g for g in ground_truths if g[1] == class_id]

        precisions = []
        recalls = []

        for threshold in np.arange(0.1, 1.0, 0.1):
            filtered_predictions = [(box, conf, cls) for box, conf, cls in predictions if conf >= threshold]

            tp, fp, fn = self.calculate_confusion_matrix(filtered_predictions, ground_truths)
            metrics = self.calculate_metrics(tp, fp, fn)

            precisions.append(metrics['precision'])
            recalls.append(metrics['recall'])

        # Plot PR curve
        plt.figure(figsize=(8, 6))
        plt.plot(recalls, precisions, 'b-', linewidth=2)
        plt.xlabel('Recall')
        plt.ylabel('Precision')
        plt.title('Precision-Recall Curve')
        plt.grid(True)
        plt.xlim([0, 1])
        plt.ylim([0, 1])

        # Approximate AP as the area under the PR curve via the trapezoidal
        # rule, sorting by recall so the integration runs left to right
        order = np.argsort(recalls)
        r = np.array(recalls)[order]
        p = np.array(precisions)[order]
        ap = float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2))
        plt.text(0.6, 0.2, f'AP = {ap:.3f}', fontsize=12,
                bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow"))

        return precisions, recalls, ap

# Example usage
pr_metrics = PrecisionRecallMetrics()

# Example data
predictions = [
    ([1, 1, 3, 3], 0.9, 0),  # (box, confidence, class_id)
    ([2, 2, 4, 4], 0.8, 0),
    ([5, 5, 7, 7], 0.7, 1)
]

ground_truths = [
    ([1, 1, 3, 3], 0),  # (box, class_id)
    ([5, 5, 7, 7], 1)
]

tp, fp, fn = pr_metrics.calculate_confusion_matrix(predictions, ground_truths)
metrics_result = pr_metrics.calculate_metrics(tp, fp, fn)
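# Expected here: TP = 2, FP = 1, FN = 0 -> precision 0.667, recall 1.000, F1 0.800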

print("Detection Performance Metrics:")
print(f"True Positives: {metrics_result['tp']}")
print(f"False Positives: {metrics_result['fp']}")
print(f"False Negatives: {metrics_result['fn']}")
print(f"Precision: {metrics_result['precision']:.3f}")
print(f"Recall: {metrics_result['recall']:.3f}")
print(f"F1 Score: {metrics_result['f1_score']:.3f}")

1.4.3 mAP (mean Average Precision)
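
AP (Average Precision) summarizes a single class's precision-recall curve as the area under it, so it accounts for all confidence thresholds at once. mAP averages AP over classes; the COCO protocol additionally averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The calculator below implements both variants.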

class mAPCalculator:
    def __init__(self):
        pass

    def calculate_ap(self, precision, recall):
        """
        Calculate AP (Average Precision) for a single class
        Using interpolation method
        """
        # Add endpoints
        precision = np.concatenate(([0], precision, [0]))
        recall = np.concatenate(([0], recall, [1]))

        # Ensure precision is monotonically decreasing
        for i in range(len(precision) - 1, 0, -1):
            precision[i - 1] = max(precision[i - 1], precision[i])

        # Find points where recall changes
        indices = np.where(recall[1:] != recall[:-1])[0]

        # Calculate area
        ap = np.sum((recall[indices + 1] - recall[indices]) * precision[indices + 1])

        return ap

    def calculate_map(self, all_predictions, all_ground_truths, num_classes, iou_thresholds=None):
        """
        Calculate mAP for multiple classes
        """
        if iou_thresholds is None:
            iou_thresholds = [0.5]  # PASCAL VOC standard

        class_aps = {}

        for class_id in range(num_classes):
            # Extract predictions and ground truths for current class
            class_predictions = [(box, conf, cls) for box, conf, cls in all_predictions if cls == class_id]
            class_ground_truths = [(box, cls) for box, cls in all_ground_truths if cls == class_id]

            if len(class_ground_truths) == 0:
                continue

            class_aps[class_id] = []

            for iou_threshold in iou_thresholds:
                # Calculate precision and recall at different confidence thresholds
                precisions = []
                recalls = []

                # Sort by confidence
                class_predictions.sort(key=lambda x: x[1], reverse=True)

                tp_cumsum = 0
                fp_cumsum = 0
                matched_gt = set()

                for pred_box, confidence, pred_class in class_predictions:
                    best_iou = 0
                    best_gt_idx = -1

                    for gt_idx, (gt_box, gt_class) in enumerate(class_ground_truths):
                        iou = self.calculate_iou(pred_box, gt_box)
                        if iou > best_iou:
                            best_iou = iou
                            best_gt_idx = gt_idx

                    if best_iou >= iou_threshold and best_gt_idx not in matched_gt:
                        tp_cumsum += 1
                        matched_gt.add(best_gt_idx)
                    else:
                        fp_cumsum += 1

                    precision = tp_cumsum / (tp_cumsum + fp_cumsum)
                    recall = tp_cumsum / len(class_ground_truths)

                    precisions.append(precision)
                    recalls.append(recall)

                # Calculate AP
                if len(precisions) > 0:
                    ap = self.calculate_ap(np.array(precisions), np.array(recalls))
                    class_aps[class_id].append(ap)

        # Calculate mAP
        all_aps = []
        for class_id, aps in class_aps.items():
            if len(aps) > 0:
                all_aps.extend(aps)

        map_score = np.mean(all_aps) if len(all_aps) > 0 else 0.0

        return {
            'mAP': map_score,
            'class_APs': class_aps,
            'detailed_results': {
                'per_class_ap': {cls: np.mean(aps) if len(aps) > 0 else 0 for cls, aps in class_aps.items()},
                'iou_thresholds': iou_thresholds
            }
        }

    def calculate_iou(self, box1, box2):
        """Calculate IoU"""
        x1 = max(box1[0], box2[0])
        y1 = max(box1[1], box2[1])
        x2 = min(box1[2], box2[2])
        y2 = min(box1[3], box2[3])

        if x2 <= x1 or y2 <= y1:
            return 0.0

        intersection = (x2 - x1) * (y2 - y1)
        area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
        union = area1 + area2 - intersection

        return intersection / union

    def coco_map(self, all_predictions, all_ground_truths, num_classes):
        """
        Calculate COCO standard mAP (IoU from 0.5 to 0.95, step 0.05)
        """
        iou_thresholds = np.arange(0.5, 1.0, 0.05)

        results = self.calculate_map(all_predictions, all_ground_truths,
                                   num_classes, iou_thresholds)

        # Additionally calculate mAP@0.5 and mAP@0.75
        map_50 = self.calculate_map(all_predictions, all_ground_truths,
                                  num_classes, [0.5])['mAP']
        map_75 = self.calculate_map(all_predictions, all_ground_truths,
                                  num_classes, [0.75])['mAP']

        results['mAP@0.5'] = map_50
        results['mAP@0.75'] = map_75

        return results

# Example usage
map_calculator = mAPCalculator()

# Example data (multi-class)
all_predictions = [
    ([1, 1, 3, 3], 0.9, 0),  # Class 0
    ([2, 2, 4, 4], 0.8, 0),
    ([5, 5, 7, 7], 0.7, 1),  # Class 1
    ([6, 6, 8, 8], 0.6, 1)
]

all_ground_truths = [
    ([1, 1, 3, 3], 0),
    ([5, 5, 7, 7], 1),
    ([8, 8, 10, 10], 1)
]

# Calculate mAP
map_results = map_calculator.calculate_map(all_predictions, all_ground_truths, num_classes=2)
print("mAP Results:")
print(f"mAP: {map_results['mAP']:.3f}")
print("Per-class AP:")
for class_id, ap in map_results['detailed_results']['per_class_ap'].items():
    print(f"  Class {class_id}: {ap:.3f}")

# Calculate COCO-style mAP
coco_results = map_calculator.coco_map(all_predictions, all_ground_truths, num_classes=2)
print(f"\nCOCO-style Evaluation:")
print(f"mAP@0.5:0.95: {coco_results['mAP']:.3f}")
print(f"mAP@0.5: {coco_results['mAP@0.5']:.3f}")
print(f"mAP@0.75: {coco_results['mAP@0.75']:.3f}")

1.5 Metrics Comparison and Selection

class MetricsComparison:
    def __init__(self):
        self.metrics_overview = {
            "IoU": {
                "purpose": "Measure bounding box overlap degree",
                "range": "[0, 1]",
                "advantages": ["Intuitive and easy to understand", "Simple calculation", "Widely used"],
                "disadvantages": ["Only considers overlap", "Doesn't consider category", "Threshold dependent"],
                "applicable_scenarios": "Bounding box quality assessment"
            },
            "Precision": {
                "purpose": "Measure detection result accuracy",
                "range": "[0, 1]",
                "advantages": ["Reflects false positives", "Intuitive calculation"],
                "disadvantages": ["Doesn't consider false negatives", "Threshold sensitive"],
                "applicable_scenarios": "Applications concerned with false positive rate"
            },
            "Recall": {
                "purpose": "Measure detection completeness",
                "range": "[0, 1]",
                "advantages": ["Reflects false negatives", "Intuitive calculation"],
                "disadvantages": ["Doesn't consider false positives", "Threshold sensitive"],
                "applicable_scenarios": "Applications concerned with false negative rate"
            },
            "F1-Score": {
                "purpose": "Balance precision and recall",
                "range": "[0, 1]",
                "advantages": ["Comprehensive metric", "Single value"],
                "disadvantages": ["Equal weight average", "May hide details"],
                "applicable_scenarios": "General evaluation requiring balance"
            },
            "AP": {
                "purpose": "Comprehensive performance evaluation for single class",
                "range": "[0, 1]",
                "advantages": ["Considers all thresholds", "Comprehensive evaluation", "Standardized"],
                "disadvantages": ["Complex calculation", "Difficult to interpret"],
                "applicable_scenarios": "In-depth evaluation for single class"
            },
            "mAP": {
                "purpose": "Comprehensive performance evaluation for multiple classes",
                "range": "[0, 1]",
                "advantages": ["Multi-class comprehensive", "Industry standard", "Comparable"],
                "disadvantages": ["Averaging may hide differences", "Most complex calculation"],
                "applicable_scenarios": "Multi-class detection evaluation standard"
            }
        }

    def print_comparison(self):
        """Print metrics comparison"""
        print("Object Detection Evaluation Metrics Comparison:")
        print("=" * 60)

        for metric, details in self.metrics_overview.items():
            print(f"\n{metric}:")
            for key, value in details.items():
                if isinstance(value, list):
                    print(f"  {key}: {', '.join(value)}")
                else:
                    print(f"  {key}: {value}")

    def choose_metrics(self, application_type):
        """Recommend metrics based on application type"""
        recommendations = {
            "Autonomous Driving": {
                "main_metrics": ["mAP@0.5", "Recall"],
                "reason": "Safety critical, cannot miss detections, high accuracy required",
                "additional_considerations": ["Real-time performance", "Small object detection capability"]
            },
            "Industrial Inspection": {
                "main_metrics": ["Precision", "F1-Score"],
                "reason": "Avoid waste from false positives, balance accuracy and completeness",
                "additional_considerations": ["Defect type balance", "Detection consistency"]
            },
            "Security Surveillance": {
                "main_metrics": ["Recall", "mAP@0.5"],
                "reason": "Cannot miss suspicious targets, overall performance must be good",
                "additional_considerations": ["Real-time processing capability", "Night performance"]
            },
            "Medical Imaging": {
                "main_metrics": ["Recall", "Precision"],
                "reason": "Cannot miss diagnoses, also need to control misdiagnoses",
                "additional_considerations": ["Sensitivity", "Specificity", "Clinical significance"]
            },
            "Retail Applications": {
                "main_metrics": ["mAP", "F1-Score"],
                "reason": "Multi-class products, need balanced performance",
                "additional_considerations": ["User experience", "Cost-effectiveness"]
            }
        }

        if application_type in recommendations:
            rec = recommendations[application_type]
            print(f"\n{application_type} Application Recommended Metrics:")
            print(f"Main metrics: {', '.join(rec['main_metrics'])}")
            print(f"Reason: {rec['reason']}")
            print(f"Additional considerations: {', '.join(rec['additional_considerations'])}")
        else:
            print("No recommendations found for this application type, please choose general metric: mAP")

        return recommendations.get(application_type)

# Example usage
comparison = MetricsComparison()
comparison.print_comparison()

# Recommend metrics based on application
autonomous_driving_metrics = comparison.choose_metrics("Autonomous Driving")
industrial_metrics = comparison.choose_metrics("Industrial Inspection")

1.6 Chapter Summary

1.6.1 Core Concepts Review

  1. Computer Vision is the technology that enables computers to understand visual information
  2. Object Detection simultaneously solves classification and localization problems
  3. Image Preprocessing is an important step in improving detection performance
  4. Evaluation Metrics help objectively assess algorithm performance

1.6.2 Important Technical Points

  • IoU calculation and threshold selection
  • Balancing precision and recall
  • mAP calculation method and significance
  • Metric selection for different application scenarios

1.6.3 Practical Points

  • Understand data characteristics and choose appropriate preprocessing methods
  • Select evaluation metrics based on application requirements
  • Focus on balancing algorithm real-time performance and accuracy
  • Pay attention to small object detection and occlusion handling

1.6.4 Next Chapter Preview

The next chapter will delve into deep learning fundamentals and convolutional neural networks, which are important theoretical foundations for understanding YOLO algorithms. We will learn:

  • Basic principles of deep learning
  • CNN structure and working mechanisms
  • Evolution of common network architectures
  • Preparation for YOLO learning

Through this chapter we have established the fundamentals of object detection, laying the groundwork for the in-depth study of the YOLO family of algorithms in the chapters ahead.