Chapter 1: Computer Vision and Object Detection Fundamentals
Learning Objectives
- Understand the basic concepts and application scenarios of computer vision
- Master the fundamentals of image processing
- Understand the definition and challenges of object detection tasks
- Familiarize with object detection evaluation metrics (mAP, IoU, etc.)
1.1 Basic Concepts of Computer Vision
1.1.1 What is Computer Vision
Computer Vision (CV) is an important branch of artificial intelligence that aims to enable computers to “understand” images and videos like humans do.
Core tasks include:
- Image Classification: Determining what objects are present in an image
- Object Detection: Finding the location and category of objects in an image
- Semantic Segmentation: Assigning category labels to each pixel in an image
- Instance Segmentation: Distinguishing different instances of the same category
- Pose Estimation: Detecting key points of human bodies or objects
1.1.2 Application Scenarios of Computer Vision
# Computer vision application domains
cv_applications = {
    "Autonomous Driving": {
        "tasks": ["Vehicle Detection", "Pedestrian Detection", "Traffic Sign Recognition", "Lane Detection"],
        "technologies": ["Multi-Object Detection", "Depth Estimation", "Optical Flow", "SLAM"]
    },
    "Medical Imaging": {
        "tasks": ["Lesion Detection", "Organ Segmentation", "Disease Diagnosis", "Surgical Navigation"],
        "technologies": ["Medical Image Analysis", "3D Reconstruction", "Image Registration", "CAD Systems"]
    },
    "Security Surveillance": {
        "tasks": ["Face Recognition", "Behavior Analysis", "Anomaly Detection", "License Plate Recognition"],
        "technologies": ["Real-time Detection", "Object Tracking", "Behavior Recognition", "Crowd Analysis"]
    },
    "Industrial Inspection": {
        "tasks": ["Defect Detection", "Quality Control", "Assembly Inspection", "Dimension Measurement"],
        "technologies": ["Surface Inspection", "Shape Matching", "Precision Measurement", "Automated Inspection"]
    },
    "Retail E-commerce": {
        "tasks": ["Product Recognition", "Virtual Try-on", "Smart Recommendations", "Inventory Management"],
        "technologies": ["Object Recognition", "Image Search", "AR/VR", "Visual Recommendations"]
    }
}
1.2 Image Processing Fundamentals
1.2.1 Digital Image Representation
import numpy as np
import cv2
from PIL import Image
import matplotlib.pyplot as plt

class ImageProcessor:
    def __init__(self):
        self.image = None

    def load_image(self, image_path):
        """Load an image from disk (OpenCV reads in BGR channel order)."""
        self.image = cv2.imread(image_path)
        if self.image is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        return self.image

    def image_info(self):
        """Print basic image information."""
        if self.image is None:
            return
        height, width = self.image.shape[:2]
        channels = self.image.shape[2] if self.image.ndim == 3 else 1
        print(f"Image dimensions: {width} x {height}")
        print(f"Number of channels: {channels}")
        print(f"Data type: {self.image.dtype}")
        print(f"Pixel value range: {self.image.min()} - {self.image.max()}")

    def color_space_conversion(self):
        """Convert the BGR image into several common color spaces."""
        conversions = {}
        # BGR to RGB
        conversions['RGB'] = cv2.cvtColor(self.image, cv2.COLOR_BGR2RGB)
        # BGR to Gray
        conversions['Gray'] = cv2.cvtColor(self.image, cv2.COLOR_BGR2GRAY)
        # BGR to HSV
        conversions['HSV'] = cv2.cvtColor(self.image, cv2.COLOR_BGR2HSV)
        # BGR to LAB
        conversions['LAB'] = cv2.cvtColor(self.image, cv2.COLOR_BGR2LAB)
        return conversions

    def basic_operations(self):
        """Basic geometric image operations."""
        operations = {}
        # Image resizing
        operations['resized'] = cv2.resize(self.image, (640, 480))
        # Image rotation (45 degrees around the image center)
        center = (self.image.shape[1] // 2, self.image.shape[0] // 2)
        rotation_matrix = cv2.getRotationMatrix2D(center, 45, 1.0)
        operations['rotated'] = cv2.warpAffine(self.image, rotation_matrix,
                                               (self.image.shape[1], self.image.shape[0]))
        # Image flipping
        operations['flipped_h'] = cv2.flip(self.image, 1)  # Horizontal flip
        operations['flipped_v'] = cv2.flip(self.image, 0)  # Vertical flip
        # Center crop (middle half of the image)
        h, w = self.image.shape[:2]
        operations['cropped'] = self.image[h // 4:3 * h // 4, w // 4:3 * w // 4]
        return operations

# Example usage
processor = ImageProcessor()
# image = processor.load_image('example.jpg')
# processor.image_info()
# conversions = processor.color_space_conversion()
# operations = processor.basic_operations()
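The calls above are commented out because they assume an image file on disk. As a minimal self-contained check, you can hand the processor a synthetic NumPy image instead (the gradient-plus-rectangle pattern here is purely illustrative):

import numpy as np
import cv2

# Hypothetical synthetic image so the examples run without a file on disk
synthetic = np.zeros((480, 640, 3), dtype=np.uint8)
synthetic[:, :, 0] = np.linspace(0, 255, 640, dtype=np.uint8)      # blue gradient
cv2.rectangle(synthetic, (200, 150), (440, 330), (0, 255, 0), -1)  # filled green box

processor.image = synthetic  # bypass load_image for the in-memory array
processor.image_info()       # 640 x 480, 3 channels, uint8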
1.2.2 Image Preprocessing Techniques
class ImagePreprocessor:
    def __init__(self):
        pass

    def noise_reduction(self, image):
        """Noise reduction with three common filters."""
        methods = {}
        # Gaussian blur: smooths uniformly, also softens edges
        methods['gaussian'] = cv2.GaussianBlur(image, (5, 5), 0)
        # Median blur: effective against salt-and-pepper noise
        methods['median'] = cv2.medianBlur(image, 5)
        # Bilateral filter: smooths while preserving edges
        methods['bilateral'] = cv2.bilateralFilter(image, 9, 75, 75)
        return methods

    def edge_detection(self, image):
        """Edge detection on a BGR input image."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        edges = {}
        # Canny edge detection (hysteresis thresholds 50 and 150)
        edges['canny'] = cv2.Canny(gray, 50, 150)
        # Sobel gradients and their magnitude
        edges['sobel_x'] = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
        edges['sobel_y'] = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
        edges['sobel'] = np.sqrt(edges['sobel_x']**2 + edges['sobel_y']**2)
        # Laplacian (second-derivative) edge detection
        edges['laplacian'] = cv2.Laplacian(gray, cv2.CV_64F)
        return edges

    def histogram_analysis(self, image):
        """Histogram analysis and equalization."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Calculate the grayscale histogram
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256])
        # Global histogram equalization
        equalized = cv2.equalizeHist(gray)
        # Contrast-limited adaptive histogram equalization (CLAHE)
        clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
        adaptive_eq = clahe.apply(gray)
        return {
            'histogram': hist,
            'equalized': equalized,
            'adaptive_equalized': adaptive_eq
        }

# Example usage
preprocessor = ImagePreprocessor()
# noise_methods = preprocessor.noise_reduction(image)
# edge_results = preprocessor.edge_detection(image)
# hist_results = preprocessor.histogram_analysis(image)
1.3 Object Detection Task Definition
1.3.1 What is Object Detection
Object Detection is one of the core tasks in computer vision. It must accomplish two things simultaneously (a sketch of a typical detection output follows the list):
- Classification: Determining which categories of objects exist in the image
- Localization: Determining the specific positions of these objects in the image
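Concretely, a detector's output for one image is usually a list of (bounding box, category, confidence) triples. Below is a minimal sketch of this data structure, using the [x1, y1, x2, y2] box convention adopted throughout this chapter; the boxes and labels are made up for illustration:

# One detection = where (box), what (class), and how sure (confidence)
detections = [
    {"box": [48, 30, 210, 180],  "class": "dog",    "confidence": 0.92},
    {"box": [230, 60, 330, 150], "class": "person", "confidence": 0.81},
]
for det in detections:
    x1, y1, x2, y2 = det["box"]
    print(f"{det['class']} ({det['confidence']:.2f}) at "
          f"({x1}, {y1})-({x2}, {y2})")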
1.3.2 Challenges in Object Detection
class ObjectDetectionChallenges:
    def __init__(self):
        self.challenges = {
            "Scale Variation": {
                "description": "The same object appears in different sizes at different distances",
                "solutions": ["Multi-scale Training", "Feature Pyramid", "Scale Augmentation"],
                "example": "Distant vehicles are small, nearby vehicles are large"
            },
            "Occlusion": {
                "description": "Objects are partially or completely occluded by other objects",
                "solutions": ["Partial Detection", "Context Information", "Multi-view Fusion"],
                "example": "Face detection in crowds, traffic signs occluded by leaves"
            },
            "Illumination Changes": {
                "description": "Object appearance varies under different lighting conditions",
                "solutions": ["Data Augmentation", "Illumination Normalization", "Robust Features"],
                "example": "Vehicle detection in day and night, pedestrians in shadows"
            },
            "Complex Background": {
                "description": "Objects are hard to separate from cluttered backgrounds",
                "solutions": ["Context Modeling", "Background Suppression", "Attention Mechanism"],
                "example": "Animals in forests, pedestrians on busy streets"
            },
            "Intra-class Variation": {
                "description": "Large appearance differences within the same category",
                "solutions": ["Diverse Training Data", "Feature Learning", "Data Augmentation"],
                "example": "Different dog breeds, different car models"
            },
            "Real-time Requirements": {
                "description": "Many applications require real-time detection performance",
                "solutions": ["Lightweight Networks", "Model Compression", "Hardware Optimization"],
                "example": "Autonomous driving, real-time surveillance systems"
            }
        }

    def print_challenges(self):
        """Print all challenges with their solutions and examples."""
        for challenge, details in self.challenges.items():
            print(f"\n{challenge}:")
            print(f"  Description: {details['description']}")
            print(f"  Solutions: {', '.join(details['solutions'])}")
            print(f"  Example: {details['example']}")

# Create an instance and list the challenges
challenges = ObjectDetectionChallenges()
challenges.print_challenges()
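Several of the mitigations listed above (scale augmentation, illumination normalization, diverse training data) boil down to data augmentation at training time. Here is a minimal sketch of such transforms, assuming an OpenCV BGR image array; the parameter ranges are illustrative choices, not tuned values:

import random
import numpy as np
import cv2

def simple_augment(image):
    """Illustrative augmentations for scale, viewpoint, and illumination."""
    # Random scale jitter in [0.8, 1.2] (scale-variation robustness)
    s = random.uniform(0.8, 1.2)
    h, w = image.shape[:2]
    image = cv2.resize(image, (int(w * s), int(h * s)))
    # Random horizontal flip (viewpoint variation)
    if random.random() < 0.5:
        image = cv2.flip(image, 1)
    # Random brightness shift (illumination variation)
    beta = random.uniform(-30, 30)
    image = np.clip(image.astype(np.float32) + beta, 0, 255).astype(np.uint8)
    return image

Note that in a real detection pipeline the bounding box annotations must be transformed together with the image, e.g. flipped and rescaled by the same factors.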
1.3.3 Object Detection Algorithm Classification
class DetectionAlgorithmTaxonomy:
    def __init__(self):
        self.algorithms = {
            "Traditional Methods": {
                "characteristics": "Based on hand-crafted features and traditional machine learning",
                "representative_algorithms": [
                    "Viola-Jones",
                    "HOG + SVM",
                    "DPM (Deformable Part Models)"
                ],
                "advantages": ["Clear theory", "Low computational resource requirements"],
                "disadvantages": ["Limited feature representation", "Poor generalization"]
            },
            "Two-Stage Methods": {
                "characteristics": "First generate candidate regions, then classify and regress",
                "representative_algorithms": [
                    "R-CNN",
                    "Fast R-CNN",
                    "Faster R-CNN",
                    "Mask R-CNN"
                ],
                "advantages": ["High detection accuracy", "Suitable for complex scenes"],
                "disadvantages": ["Relatively slow", "Complex system"]
            },
            "One-Stage Methods": {
                "characteristics": "Directly predict object category and location in one pass",
                "representative_algorithms": [
                    "YOLO series",
                    "SSD",
                    "RetinaNet",
                    "FCOS"
                ],
                "advantages": ["Fast speed", "End-to-end training", "Suitable for real-time applications"],
                "disadvantages": ["Slightly lower accuracy than two-stage", "Difficulty with small objects"]
            },
            "Transformer Methods": {
                "characteristics": "Detection methods based on attention mechanisms",
                "representative_algorithms": [
                    "DETR",
                    "Deformable DETR",
                    "Sparse DETR"
                ],
                "advantages": ["Strong global modeling capability", "No NMS post-processing needed"],
                "disadvantages": ["Slow training convergence", "High computational complexity"]
            }
        }

    def compare_methods(self):
        """Tabulate qualitative trade-offs between the method families."""
        # Rough qualitative ratings, consistent with the advantages and
        # disadvantages listed above
        profiles = {
            "Traditional Methods": ("Medium", "Low", "Low", "Resource-constrained, simple scenes"),
            "Two-Stage Methods": ("Slow", "High", "High", "Accuracy-critical, offline analysis"),
            "One-Stage Methods": ("Fast", "Medium-High", "Medium", "Real-time applications"),
            "Transformer Methods": ("Medium", "High", "High", "Global-context scenes, research frontier")
        }
        comparison = {
            "method_type": [],
            "speed": [],
            "accuracy": [],
            "complexity": [],
            "application_scenarios": []
        }
        for method in self.algorithms:
            speed, accuracy, complexity, scenarios = profiles[method]
            comparison["method_type"].append(method)
            comparison["speed"].append(speed)
            comparison["accuracy"].append(accuracy)
            comparison["complexity"].append(complexity)
            comparison["application_scenarios"].append(scenarios)
        return comparison

taxonomy = DetectionAlgorithmTaxonomy()
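The taxonomy mentions NMS (Non-Maximum Suppression), the post-processing step that CNN detectors use to remove duplicate boxes and that DETR-style methods avoid. Below is a minimal greedy NMS sketch in this chapter's [x1, y1, x2, y2] box format:

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    boxes = np.asarray(boxes, dtype=np.float32)
    order = np.argsort(scores)[::-1]  # indices sorted by score, descending
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        # IoU of the chosen box against all remaining boxes
        rest = boxes[order[1:]]
        x1 = np.maximum(boxes[i, 0], rest[:, 0])
        y1 = np.maximum(boxes[i, 1], rest[:, 1])
        x2 = np.minimum(boxes[i, 2], rest[:, 2])
        y2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter)
        # Keep only boxes that do not overlap the chosen box too much
        order = order[1:][iou < iou_threshold]
    return keep

# Example: the second box heavily overlaps the first and is suppressed
print(nms([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]],
          [0.9, 0.8, 0.7]))  # -> [0, 2]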
1.4 Object Detection Evaluation Metrics
1.4.1 IoU (Intersection over Union)
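IoU compares a predicted box B_p with a ground-truth box B_gt as the ratio of the area they share to the area they jointly cover:

IoU = area(B_p ∩ B_gt) / area(B_p ∪ B_gt)

It is 0 for disjoint boxes and 1 for a perfect match. The implementation below computes it directly from the box coordinates.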
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

class DetectionMetrics:
    def __init__(self):
        pass

    def calculate_iou(self, box1, box2):
        """
        Calculate IoU between two bounding boxes.
        Box format: [x1, y1, x2, y2]
        """
        # Coordinates of the intersection rectangle
        x1 = max(box1[0], box2[0])
        y1 = max(box1[1], box2[1])
        x2 = min(box1[2], box2[2])
        y2 = min(box1[3], box2[3])
        # Check if there is an intersection
        if x2 <= x1 or y2 <= y1:
            return 0.0
        # Intersection and union areas
        intersection = (x2 - x1) * (y2 - y1)
        area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
        union = area1 + area2 - intersection
        return intersection / union

    def visualize_iou(self, box1, box2):
        """Visualize the IoU calculation for two boxes."""
        fig, ax = plt.subplots(1, 1, figsize=(8, 8))
        # Draw both bounding boxes
        rect1 = patches.Rectangle((box1[0], box1[1]),
                                  box1[2] - box1[0], box1[3] - box1[1],
                                  linewidth=2, edgecolor='red',
                                  facecolor='red', alpha=0.3, label='Ground Truth')
        rect2 = patches.Rectangle((box2[0], box2[1]),
                                  box2[2] - box2[0], box2[3] - box2[1],
                                  linewidth=2, edgecolor='blue',
                                  facecolor='blue', alpha=0.3, label='Prediction')
        ax.add_patch(rect1)
        ax.add_patch(rect2)
        # Calculate and draw the intersection, if any
        x1 = max(box1[0], box2[0])
        y1 = max(box1[1], box2[1])
        x2 = min(box1[2], box2[2])
        y2 = min(box1[3], box2[3])
        if x2 > x1 and y2 > y1:
            intersection_rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                                  linewidth=2, edgecolor='green',
                                                  facecolor='green', alpha=0.5,
                                                  label='Intersection')
            ax.add_patch(intersection_rect)
        iou = self.calculate_iou(box1, box2)
        ax.set_xlim(0, 10)
        ax.set_ylim(0, 10)
        ax.set_title(f'IoU = {iou:.3f}')
        ax.legend()
        ax.grid(True)
        return fig, iou

    def iou_threshold_analysis(self):
        """Interpret common IoU thresholds."""
        thresholds = {
            0.1: "Very low overlap, usually not counted as a correct detection",
            0.3: "Low overlap, a loose match at best",
            0.5: "Medium overlap, the classic PASCAL VOC criterion",
            0.7: "High overlap, a stricter criterion (COCO averages over 0.50-0.95)",
            0.9: "Very high overlap, almost a perfect match"
        }
        for threshold, description in thresholds.items():
            print(f"IoU = {threshold}: {description}")
        return thresholds

# Example usage
metrics = DetectionMetrics()
# Example bounding boxes
gt_box = [2, 2, 6, 6]    # Ground truth
pred_box = [3, 3, 7, 7]  # Prediction
iou_score = metrics.calculate_iou(gt_box, pred_box)
print(f"IoU Score: {iou_score:.3f}")  # 9 / 23 ≈ 0.391
# Visualize IoU
# fig, iou = metrics.visualize_iou(gt_box, pred_box)
# plt.show()
# IoU threshold analysis
thresholds = metrics.iou_threshold_analysis()
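When evaluating many detections, looping over box pairs one at a time in Python is slow. Here is a vectorized NumPy variant of calculate_iou that scores one box against a whole (N, 4) array of boxes; it is a sketch for illustration and is not used by the classes below:

import numpy as np

def iou_one_to_many(box, boxes):
    """IoU of a single [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    boxes = np.asarray(boxes, dtype=np.float32)
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + areas - inter)

print(iou_one_to_many([2, 2, 6, 6], [[3, 3, 7, 7], [0, 0, 1, 1]]))  # ≈ [0.391, 0.0]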
1.4.2 Precision, Recall, and F1 Score
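All three metrics are derived from the counts of true positives (TP, correct detections), false positives (FP, spurious detections), and false negatives (FN, missed objects) at a fixed IoU threshold:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 * Precision * Recall / (Precision + Recall)

Precision asks how many of the detections are right, recall asks how many of the real objects were found, and F1 is their harmonic mean. The class below implements the matching procedure and these formulas.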
class PrecisionRecallMetrics:
    def __init__(self):
        pass

    def calculate_confusion_matrix(self, predictions, ground_truths, iou_threshold=0.5):
        """
        Count TP / FP / FN for one image.
        predictions: [(box, confidence, class_id), ...]
        ground_truths: [(box, class_id), ...]
        """
        tp = 0  # True positives
        fp = 0  # False positives
        matched_gt = set()  # Indices of ground truths already matched
        # Process predictions in descending confidence order
        predictions = sorted(predictions, key=lambda x: x[1], reverse=True)
        for pred_box, confidence, pred_class in predictions:
            best_iou = 0
            best_gt_idx = -1
            # Find the best-matching ground truth of the same class
            for gt_idx, (gt_box, gt_class) in enumerate(ground_truths):
                if gt_class != pred_class:
                    continue
                iou = self.calculate_iou(pred_box, gt_box)
                if iou > best_iou:
                    best_iou = iou
                    best_gt_idx = gt_idx
            # A correct detection needs enough IoU and a not-yet-matched ground truth
            if best_iou >= iou_threshold and best_gt_idx not in matched_gt:
                tp += 1
                matched_gt.add(best_gt_idx)
            else:
                fp += 1
        # Ground truths that no prediction matched are false negatives
        fn = len(ground_truths) - len(matched_gt)
        return tp, fp, fn

    def calculate_iou(self, box1, box2):
        """Calculate IoU (same as in DetectionMetrics)."""
        x1 = max(box1[0], box2[0])
        y1 = max(box1[1], box2[1])
        x2 = min(box1[2], box2[2])
        y2 = min(box1[3], box2[3])
        if x2 <= x1 or y2 <= y1:
            return 0.0
        intersection = (x2 - x1) * (y2 - y1)
        area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
        union = area1 + area2 - intersection
        return intersection / union

    def calculate_metrics(self, tp, fp, fn):
        """Calculate precision, recall, and F1 score."""
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        return {
            'precision': precision,
            'recall': recall,
            'f1_score': f1_score,
            'tp': tp,
            'fp': fp,
            'fn': fn
        }

    def plot_pr_curve(self, predictions, ground_truths, class_id=0):
        """Plot the precision-recall curve for one class."""
        # Keep only the requested class
        predictions = [p for p in predictions if p[2] == class_id]
        ground_truths = [g for g in ground_truths if g[1] == class_id]
        precisions = []
        recalls = []
        # Sweep the confidence threshold
        for threshold in np.arange(0.1, 1.0, 0.1):
            filtered = [(box, conf, cls) for box, conf, cls in predictions if conf >= threshold]
            tp, fp, fn = self.calculate_confusion_matrix(filtered, ground_truths)
            m = self.calculate_metrics(tp, fp, fn)
            precisions.append(m['precision'])
            recalls.append(m['recall'])
        # Plot the PR curve
        plt.figure(figsize=(8, 6))
        plt.plot(recalls, precisions, 'b-', linewidth=2)
        plt.xlabel('Recall')
        plt.ylabel('Precision')
        plt.title('Precision-Recall Curve')
        plt.grid(True)
        plt.xlim([0, 1])
        plt.ylim([0, 1])
        # AP as the area under the PR curve; sort by recall so the integration
        # runs left to right (raising the threshold lowers recall)
        order = np.argsort(recalls)
        ap = np.trapz(np.array(precisions)[order], np.array(recalls)[order])
        plt.text(0.6, 0.2, f'AP = {ap:.3f}', fontsize=12,
                 bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow"))
        return precisions, recalls, ap

# Example usage
pr_metrics = PrecisionRecallMetrics()
# Example data
predictions = [
    ([1, 1, 3, 3], 0.9, 0),  # (box, confidence, class_id)
    ([2, 2, 4, 4], 0.8, 0),
    ([5, 5, 7, 7], 0.7, 1)
]
ground_truths = [
    ([1, 1, 3, 3], 0),  # (box, class_id)
    ([5, 5, 7, 7], 1)
]
tp, fp, fn = pr_metrics.calculate_confusion_matrix(predictions, ground_truths)
metrics_result = pr_metrics.calculate_metrics(tp, fp, fn)
print("Detection Performance Metrics:")
print(f"True Positives: {metrics_result['tp']}")
print(f"False Positives: {metrics_result['fp']}")
print(f"False Negatives: {metrics_result['fn']}")
print(f"Precision: {metrics_result['precision']:.3f}")
print(f"Recall: {metrics_result['recall']:.3f}")
print(f"F1 Score: {metrics_result['f1_score']:.3f}")
1.4.3 mAP (mean Average Precision)
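AP (Average Precision) summarizes the entire precision-recall curve of one class as the area under it, AP = ∫ p(r) dr over recall r from 0 to 1. The implementation below uses the standard all-point interpolation, where each precision value is replaced by the maximum precision at any equal-or-higher recall before integrating. mAP is then the mean of the per-class APs; the COCO benchmark additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05.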
class mAPCalculator:
    def __init__(self):
        pass

    def calculate_ap(self, precision, recall):
        """
        Calculate AP (Average Precision) for a single class
        using all-point interpolation.
        """
        # Add endpoints
        precision = np.concatenate(([0], precision, [0]))
        recall = np.concatenate(([0], recall, [1]))
        # Make precision monotonically decreasing (interpolation step)
        for i in range(len(precision) - 1, 0, -1):
            precision[i - 1] = max(precision[i - 1], precision[i])
        # Find points where recall changes
        indices = np.where(recall[1:] != recall[:-1])[0]
        # Sum the rectangular areas under the interpolated curve
        ap = np.sum((recall[indices + 1] - recall[indices]) * precision[indices + 1])
        return ap

    def calculate_map(self, all_predictions, all_ground_truths, num_classes, iou_thresholds=None):
        """Calculate mAP over multiple classes."""
        if iou_thresholds is None:
            iou_thresholds = [0.5]  # PASCAL VOC standard
        class_aps = {}
        for class_id in range(num_classes):
            # Predictions and ground truths for the current class
            class_predictions = [(box, conf, cls) for box, conf, cls in all_predictions if cls == class_id]
            class_ground_truths = [(box, cls) for box, cls in all_ground_truths if cls == class_id]
            if len(class_ground_truths) == 0:
                continue
            class_aps[class_id] = []
            # Sort once by descending confidence
            class_predictions.sort(key=lambda x: x[1], reverse=True)
            for iou_threshold in iou_thresholds:
                # Running precision/recall as predictions are consumed
                precisions = []
                recalls = []
                tp_cumsum = 0
                fp_cumsum = 0
                matched_gt = set()
                for pred_box, confidence, pred_class in class_predictions:
                    best_iou = 0
                    best_gt_idx = -1
                    for gt_idx, (gt_box, gt_class) in enumerate(class_ground_truths):
                        iou = self.calculate_iou(pred_box, gt_box)
                        if iou > best_iou:
                            best_iou = iou
                            best_gt_idx = gt_idx
                    if best_iou >= iou_threshold and best_gt_idx not in matched_gt:
                        tp_cumsum += 1
                        matched_gt.add(best_gt_idx)
                    else:
                        fp_cumsum += 1
                    precision = tp_cumsum / (tp_cumsum + fp_cumsum)
                    recall = tp_cumsum / len(class_ground_truths)
                    precisions.append(precision)
                    recalls.append(recall)
                # AP at this IoU threshold
                if len(precisions) > 0:
                    ap = self.calculate_ap(np.array(precisions), np.array(recalls))
                    class_aps[class_id].append(ap)
        # mAP: mean over all classes and IoU thresholds
        all_aps = []
        for class_id, aps in class_aps.items():
            if len(aps) > 0:
                all_aps.extend(aps)
        map_score = np.mean(all_aps) if len(all_aps) > 0 else 0.0
        return {
            'mAP': map_score,
            'class_APs': class_aps,
            'detailed_results': {
                'per_class_ap': {cls: np.mean(aps) if len(aps) > 0 else 0 for cls, aps in class_aps.items()},
                'iou_thresholds': iou_thresholds
            }
        }

    def calculate_iou(self, box1, box2):
        """Calculate IoU."""
        x1 = max(box1[0], box2[0])
        y1 = max(box1[1], box2[1])
        x2 = min(box1[2], box2[2])
        y2 = min(box1[3], box2[3])
        if x2 <= x1 or y2 <= y1:
            return 0.0
        intersection = (x2 - x1) * (y2 - y1)
        area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
        union = area1 + area2 - intersection
        return intersection / union

    def coco_map(self, all_predictions, all_ground_truths, num_classes):
        """COCO-style mAP: average over IoU thresholds 0.50 to 0.95, step 0.05."""
        iou_thresholds = np.arange(0.5, 1.0, 0.05)
        results = self.calculate_map(all_predictions, all_ground_truths,
                                     num_classes, iou_thresholds)
        # Additionally report mAP@0.5 and mAP@0.75
        map_50 = self.calculate_map(all_predictions, all_ground_truths,
                                    num_classes, [0.5])['mAP']
        map_75 = self.calculate_map(all_predictions, all_ground_truths,
                                    num_classes, [0.75])['mAP']
        results['mAP@0.5'] = map_50
        results['mAP@0.75'] = map_75
        return results

# Example usage
map_calculator = mAPCalculator()
# Example data (multi-class)
all_predictions = [
    ([1, 1, 3, 3], 0.9, 0),  # Class 0
    ([2, 2, 4, 4], 0.8, 0),
    ([5, 5, 7, 7], 0.7, 1),  # Class 1
    ([6, 6, 8, 8], 0.6, 1)
]
all_ground_truths = [
    ([1, 1, 3, 3], 0),
    ([5, 5, 7, 7], 1),
    ([8, 8, 10, 10], 1)
]
# Calculate mAP
map_results = map_calculator.calculate_map(all_predictions, all_ground_truths, num_classes=2)
print("mAP Results:")
print(f"mAP: {map_results['mAP']:.3f}")
print("Per-class AP:")
for class_id, ap in map_results['detailed_results']['per_class_ap'].items():
    print(f"  Class {class_id}: {ap:.3f}")
# Calculate COCO-style mAP
coco_results = map_calculator.coco_map(all_predictions, all_ground_truths, num_classes=2)
print(f"\nCOCO-style Evaluation:")
print(f"mAP@0.5:0.95: {coco_results['mAP']:.3f}")
print(f"mAP@0.5: {coco_results['mAP@0.5']:.3f}")
print(f"mAP@0.75: {coco_results['mAP@0.75']:.3f}")
1.5 Metrics Comparison and Selection
class MetricsComparison:
    def __init__(self):
        self.metrics_overview = {
            "IoU": {
                "purpose": "Measure bounding box overlap",
                "range": "[0, 1]",
                "advantages": ["Intuitive and easy to understand", "Simple calculation", "Widely used"],
                "disadvantages": ["Only considers overlap", "Ignores category", "Threshold dependent"],
                "applicable_scenarios": "Bounding box quality assessment"
            },
            "Precision": {
                "purpose": "Measure detection result accuracy",
                "range": "[0, 1]",
                "advantages": ["Reflects false positives", "Intuitive calculation"],
                "disadvantages": ["Ignores false negatives", "Threshold sensitive"],
                "applicable_scenarios": "Applications concerned with the false positive rate"
            },
            "Recall": {
                "purpose": "Measure detection completeness",
                "range": "[0, 1]",
                "advantages": ["Reflects false negatives", "Intuitive calculation"],
                "disadvantages": ["Ignores false positives", "Threshold sensitive"],
                "applicable_scenarios": "Applications concerned with the false negative rate"
            },
            "F1-Score": {
                "purpose": "Balance precision and recall",
                "range": "[0, 1]",
                "advantages": ["Single comprehensive value"],
                "disadvantages": ["Equal weighting", "May hide details"],
                "applicable_scenarios": "General evaluation requiring balance"
            },
            "AP": {
                "purpose": "Comprehensive single-class performance evaluation",
                "range": "[0, 1]",
                "advantages": ["Considers all thresholds", "Comprehensive", "Standardized"],
                "disadvantages": ["More complex to compute", "Harder to interpret"],
                "applicable_scenarios": "In-depth evaluation of a single class"
            },
            "mAP": {
                "purpose": "Comprehensive multi-class performance evaluation",
                "range": "[0, 1]",
                "advantages": ["Covers all classes", "Industry standard", "Comparable across methods"],
                "disadvantages": ["Averaging may hide per-class differences", "Most complex to compute"],
                "applicable_scenarios": "Standard for multi-class detection evaluation"
            }
        }

    def print_comparison(self):
        """Print the metrics comparison table."""
        print("Object Detection Evaluation Metrics Comparison:")
        print("=" * 60)
        for metric, details in self.metrics_overview.items():
            print(f"\n{metric}:")
            for key, value in details.items():
                if isinstance(value, list):
                    print(f"  {key}: {', '.join(value)}")
                else:
                    print(f"  {key}: {value}")

    def choose_metrics(self, application_type):
        """Recommend metrics based on the application type."""
        recommendations = {
            "Autonomous Driving": {
                "main_metrics": ["mAP@0.5", "Recall"],
                "reason": "Safety critical: missed detections are unacceptable and high accuracy is required",
                "additional_considerations": ["Real-time performance", "Small object detection capability"]
            },
            "Industrial Inspection": {
                "main_metrics": ["Precision", "F1-Score"],
                "reason": "Avoid waste from false positives while balancing accuracy and completeness",
                "additional_considerations": ["Defect type balance", "Detection consistency"]
            },
            "Security Surveillance": {
                "main_metrics": ["Recall", "mAP@0.5"],
                "reason": "Suspicious targets must not be missed, and overall performance must stay high",
                "additional_considerations": ["Real-time processing capability", "Night performance"]
            },
            "Medical Imaging": {
                "main_metrics": ["Recall", "Precision"],
                "reason": "Missed diagnoses are unacceptable, but misdiagnoses must also be controlled",
                "additional_considerations": ["Sensitivity", "Specificity", "Clinical significance"]
            },
            "Retail Applications": {
                "main_metrics": ["mAP", "F1-Score"],
                "reason": "Many product classes require balanced performance",
                "additional_considerations": ["User experience", "Cost-effectiveness"]
            }
        }
        if application_type in recommendations:
            rec = recommendations[application_type]
            print(f"\n{application_type} Application Recommended Metrics:")
            print(f"Main metrics: {', '.join(rec['main_metrics'])}")
            print(f"Reason: {rec['reason']}")
            print(f"Additional considerations: {', '.join(rec['additional_considerations'])}")
        else:
            print("No recommendations for this application type; default to the general metric mAP")
        return recommendations.get(application_type)

# Example usage
comparison = MetricsComparison()
comparison.print_comparison()
# Recommend metrics based on application
autonomous_driving_metrics = comparison.choose_metrics("Autonomous Driving")
industrial_metrics = comparison.choose_metrics("Industrial Inspection")
1.6 Chapter Summary
1.6.1 Core Concepts Review
- Computer Vision is the technology that enables computers to understand visual information
- Object Detection simultaneously solves classification and localization problems
- Image Preprocessing is an important step in improving detection performance
- Evaluation Metrics help objectively assess algorithm performance
1.6.2 Important Technical Points
- IoU calculation and threshold selection
- Balancing precision and recall
- mAP calculation method and significance
- Metric selection for different application scenarios
1.6.3 Practical Points
- Understand data characteristics and choose appropriate preprocessing methods
- Select evaluation metrics based on application requirements
- Focus on balancing algorithm real-time performance and accuracy
- Pay attention to small object detection and occlusion handling
1.6.4 Next Chapter Preview
The next chapter will delve into deep learning fundamentals and convolutional neural networks, which are important theoretical foundations for understanding YOLO algorithms. We will learn:
- Basic principles of deep learning
- CNN structure and working mechanisms
- Evolution of common network architectures
- Preparation for YOLO learning
Through this chapter, we have established the core concepts of object detection, laying the groundwork for the in-depth study of the YOLO series algorithms in the chapters that follow.