\n",
"\n",
"Dans ce TP, nous allons mettre en pratique certaines des méthodes présentées en cours pour localiser des objets dans une image.\n",
"\n",
"En localisation et détection, on cherche à déterminer la position d'un objet, ainsi que sa classe, sous la forme d'une boîte englobante de largeur $b_w$ et hauteur $b_h$, et dont le centre a pour coordonnées le point $(b_x, b_y)$. \n",
"\n",
"
\n",
"
Figure 1: Modèle de boîte englobante utilisé pour la localisation
\n",
"\n",
"Le problème de localisation considère qu'un seul objet est présent sur l'image, alors que le problème de détection cherche à déterminer l'ensemble des objets présents sur l'image.\n",
"\n",
"\n",
"\n",
"Pour commencer, récupérez les images de la base de données :\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"id": "2ZjveWpbuNeV"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"fatal: le chemin de destination 'mangeoires_loc' existe déjà et n'est pas un répertoire vide.\n"
]
}
],
"source": [
"!git clone https://github.com/axelcarlier/mangeoires_loc.git"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LF6aRZLE2-Yl"
},
"source": [
"La base de données consiste en des photographies prises par une caméra reliée à une Raspberry Pi, cachée dans une mangeoire. Plusieurs mangeoires sont disséminées dans la nature en Occitanie, et l'objectif de [ce projet](https://econect.cnrs.fr/) est la reconnaissance des espèces et le comptage des individus qui viennent se poser devant le caméra, afin de suivre l'évolution des populations d'oiseaux et ainsi monitorer la biodiversité.\n",
"\n",
"La base de données qui vous est fournie regroupe 11 espèces d'animaux, majoritairement des oiseaux, désignés par un code : \n",
"\n",
"1. Mésange charbonnière (**MESCHA**)\n",
"2. Verdier d'Europe (**VEREUR**)\n",
"3. Écureuil roux (**ECUROU**)\n",
"4. Pie bavarde (**PIEBAV**)\n",
"5. Sittelle torchepot (**SITTOR**)\n",
"6. Pinson des arbres (**PINARB**)\n",
"7. Mésange noire (**MESNOI**)\n",
"8. Mésange nonnette (**MESNON**)\n",
"9. Mésange bleue (**MESBLE**)\n",
"10. Rouge-gorge (**ROUGOR**)\n",
"11. Accenteur mouchet (**ACCMOU**)\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
"
\n",
"
Figure 2: Exemples d'images de la base de données
"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kxZ6cVouz9Lp"
},
"source": [
"# Localisation et classification d'objet\n",
"\n",
"Dans cette partie, nous allons nous concentrer sur le problème de la localisation d'un seul objet par classe. La base de données a été épurée pour se concentrer uniquement sur ce cas.\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"id": "SBA3fa8RSDpt"
},
"outputs": [],
"source": [
"import PIL\n",
"from PIL import Image\n",
"import csv\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import tensorflow as tf\n",
"\n",
"from tensorflow import keras\n",
"from tensorflow.keras import layers\n",
"from tensorflow.keras import models\n",
"from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout, Input\n",
"from tensorflow.keras.models import Model, Sequential\n",
"from tensorflow.keras.optimizers import Adam\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HKRm5oT-_Qsw"
},
"source": [
"## Préparation des données\n",
"\n",
"Le code ci-dessous permet de charger les données et les formater pour la classification. Prenez le temps de regarder un peu le format des labels $y$.\n",
"Notez que les images sont rendues carrées lors du chargement."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"id": "Q54zSuMvGM-5"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['MESCHA', 'VEREUR', 'ECUROU', 'PIEBAV', 'SITTOR', 'PINARB', 'MESNOI', 'MESNON', 'MESBLE', 'ROUGOR', 'ACCMOU']\n"
]
}
],
"source": [
"# Lecture du CSV contenant les informations relatives à la base de données\n",
"dataset = []\n",
"with open('mangeoires_loc/bd_mangeoires_equilibre.csv', newline='') as csvfile:\n",
"\tfilereader = csv.reader(csvfile, delimiter=' ', quotechar='|')\n",
"\tfor row in filereader:\n",
"\t\tdata = row[0].split(',')\n",
"\t\tif data[0] != 'Data':\n",
"\t\t\tbox = [float(data[5]), float(data[6]), float(data[7]), float(data[8])]\n",
"\t\t\tnew_entry = {'type': data[0], 'specie': data[1], 'path': data[2], 'shape': [float(data[3]), float(data[4])], 'box': box}\n",
"\t\t\tdataset.append(new_entry)\n",
"\n",
"# Nombre de classes de la base de données et intitulé des classes\n",
"class_labels = list(dict.fromkeys([item['specie'] for item in dataset]))\n",
"num_classes = len(class_labels)\n",
"\n",
"# Extraction des données d'apprentissage et de test \n",
"dataset_train = [item for item in dataset if item['type']=='TRAIN']\n",
"dataset_test = [item for item in dataset if item['type']=='TEST']\n",
"\n",
"print(class_labels)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"id": "JPzqdJBVJLWJ"
},
"outputs": [],
"source": [
"def build_localization_tensors(image_size, dataset, num_classes):\n",
" # Préparation des structures de données pour x et y\n",
" x = np.zeros((len(dataset), image_size, image_size, 3))\n",
" y = np.empty((len(dataset), num_classes + 5)) # 1 + 4 + num_classes : présence / boîte englobante / classes\n",
"\n",
" # Compteur de parcours du dataset\n",
" i = 0\n",
"\n",
" for item in dataset:\n",
" # Lecture de l'image\n",
" img = Image.open('mangeoires_loc/' + item['path'])\n",
" # Mise à l'échelle de l'image\n",
" img = img.resize((image_size,image_size), Image.ANTIALIAS)\n",
" # Remplissage de la variable x\n",
" x[i] = np.asarray(img)\n",
"\n",
" y[i, 0] = 1 # Un objet est toujours présent !\n",
"\n",
" # Coordonnées de boîte englobante\n",
" img_shape = item['shape']\n",
" box = item['box']\n",
" bx = (box[0] + (box[2] - box[0])/2)/img_shape[0]\n",
" by = (box[1] + (box[3] - box[1])/2)/img_shape[1]\n",
" bw = (box[2] - box[0])/img_shape[0]\n",
" bh = (box[3] - box[1])/img_shape[1]\n",
" y[i, 1] = bx\n",
" y[i, 2] = by\n",
" y[i, 3] = bw\n",
" y[i, 4] = bh\n",
"\n",
" # Probabilités de classe, sous la forme d'une one-hot vector\n",
" label = class_labels.index(item['specie'])\n",
" classes_probabilities = keras.utils.to_categorical(label, num_classes=num_classes)\n",
" y[i, 5:] = classes_probabilities\n",
"\n",
" i = i+1\n",
"\n",
" return x, y\n"
]
},
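The label layout built by `build_localization_tensors` can be illustrated on a single annotation. The sketch below is a minimal standalone reproduction, assuming (as the computation above suggests, since the CSV format is not documented here) that `box` holds the pixel corners `[x_min, y_min, x_max, y_max]` and `shape` holds `[width, height]`. The helper `make_label` and the numbers are hypothetical.

```python
import numpy as np


def make_label(box, img_shape, class_index, num_classes):
    """Build one label row [presence, bx, by, bw, bh, one-hot classes].

    Hypothetical helper mirroring build_localization_tensors, assuming
    box = [x_min, y_min, x_max, y_max] in pixels and img_shape = [width, height].
    """
    x_min, y_min, x_max, y_max = box
    width, height = img_shape
    label = np.zeros(5 + num_classes)
    label[0] = 1.0                                     # an object is present
    label[1] = (x_min + (x_max - x_min) / 2) / width   # b_x: center, relative to the width
    label[2] = (y_min + (y_max - y_min) / 2) / height  # b_y: center, relative to the height
    label[3] = (x_max - x_min) / width                 # b_w
    label[4] = (y_max - y_min) / height                # b_h
    label[5 + class_index] = 1.0                       # one-hot class vector
    return label


# Hypothetical annotation: a 100 x 50 px box on a 400 x 200 px image, class index 2 out of 11
y_example = make_label([100, 50, 200, 100], [400, 200], class_index=2, num_classes=11)
print(y_example[:5])  # values: 1, 0.375, 0.375, 0.25, 0.25
```

Because the coordinates are divided by the original image size, they remain valid after the image is resized to a square.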
{
"cell_type": "markdown",
"metadata": {
"id": "fnlljZWc_i1L"
},
"source": [
"Séparation des données d'entraînement pour extraire un ensemble de validation, et pré-traitement des données."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"id": "FJLRiuFX_VPL"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_3171614/3333676307.py:13: DeprecationWarning: ANTIALIAS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.\n",
" img = img.resize((image_size,image_size), Image.ANTIALIAS)\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"\n",
"# Pour la suite du TP on considèrera des images de taille 64x64x3\n",
"# Augmenter cette valeur donnerait de meilleurs résultats mais nécessiterait des calculs plus long.\n",
"IMAGE_SIZE = 64\n",
"\n",
"# Lecture des données d'entraînement et de test\n",
"x, y = build_localization_tensors(IMAGE_SIZE, dataset_train, num_classes)\n",
"x_test, y_test = build_localization_tensors(IMAGE_SIZE, dataset_test, num_classes)\n",
"\n",
"#Extraction d'un ensemble de validation\n",
"x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.10, random_state=42)\n",
"\n",
"# Pour améliorer l'entraînement, on peut centrer-réduire les coordonnées des bounding boxes...\n",
"y_std = np.std(y_train, axis=0)\n",
"y_mean = np.mean(y_train, axis=0)\n",
"y_train[...,1:5] = (y_train[...,1:5] - y_mean[1:5])/y_std[1:5]\n",
"y_val[...,1:5] = (y_val[...,1:5] - y_mean[1:5])/y_std[1:5]\n",
"y_test[...,1:5] = (y_test[...,1:5] - y_mean[1:5])/y_std[1:5]\n",
"\n",
"# ... et normaliser les valeurs de couleur\n",
"x_train = x_train/255\n",
"x_val = x_val/255\n",
"x_test = x_test/255"
]
},
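Only columns 1 to 4 (the box) are standardized: column 0 (presence) is always 1, so its standard deviation is 0, and the one-hot class columns are left untouched. The toy sketch below uses a hypothetical label matrix `y_demo` (class columns omitted) to show the zero standard deviation of the presence column and to check that the "denormalization" used later, `value * std + mean`, recovers the original coordinates.

```python
import numpy as np

# Hypothetical labels for 4 images: columns [presence, bx, by, bw, bh] (classes omitted)
y_demo = np.array([
    [1.0, 0.2, 0.4, 0.10, 0.20],
    [1.0, 0.4, 0.5, 0.20, 0.30],
    [1.0, 0.6, 0.6, 0.30, 0.20],
    [1.0, 0.8, 0.5, 0.40, 0.30],
])

mean = y_demo.mean(axis=0)
std = y_demo.std(axis=0)
print(std[0])  # 0.0: the presence column is constant and must not be divided by its std

# Standardize only the box columns 1:5, as in the cell above
y_norm = y_demo.copy()
y_norm[:, 1:5] = (y_demo[:, 1:5] - mean[1:5]) / std[1:5]

# "Denormalization": invert the transform to get back relative coordinates
y_back = y_norm[:, 1:5] * std[1:5] + mean[1:5]
print(np.allclose(y_back, y_demo[:, 1:5]))  # True
```

The statistics are computed on the training split only and reused unchanged for validation and test, which is what the cell above does with `y_mean` and `y_std`.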
{
"cell_type": "markdown",
"metadata": {
"id": "DE4wQYq3AKnA"
},
"source": [
"## Fonctions utiles"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "n6NBpMtaM-C0"
},
"source": [
"Une fonction de calcul de l'intersection sur union, qui nous sera utile pour les métriques d'évaluation de nos méthodes :\n",
"\n",
"$$ IoU (R_1, R_2) = \\frac{\\mathcal{A} (R_1 \\cap R_2)}{\\mathcal{A} (R_1 \\cup R_2)} = \\frac{\\mathcal{A} (R_1 \\cap R_2)}{\\mathcal{A} (R_1) + \\mathcal{A} (R_2) - \\mathcal{A} (R_1 \\cap R_2)} $$ \n",
"\n",
"
\n",
"\n",
"
\n"
]
},
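As a worked example of the formula, take the two rectangles of the first test case of `intersection_sur_union` further down, in the `[cx, cy, w, h]` format. The small helper `corners` is illustrative only; the computation converts each box to its corners, measures the overlap along each axis, then applies the second form of the formula.

```python
def corners(cx, cy, w, h):
    """Convert a [cx, cy, w, h] box into (x_min, y_min, x_max, y_max)."""
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# R1 = [2.5, 2, 1, 4] and R2 = [2, 3, 4, 2]
x1_min, y1_min, x1_max, y1_max = corners(2.5, 2, 1, 4)  # (2.0, 0.0, 3.0, 4.0)
x2_min, y2_min, x2_max, y2_max = corners(2, 3, 4, 2)    # (0.0, 2.0, 4.0, 4.0)

# Overlap along each axis, clamped at 0 when the boxes are disjoint
inter_w = max(0.0, min(x1_max, x2_max) - max(x1_min, x2_min))  # 3 - 2 = 1
inter_h = max(0.0, min(y1_max, y2_max) - max(y1_min, y2_min))  # 4 - 2 = 2
inter = inter_w * inter_h                                       # 2

area1 = 1 * 4  # w1 * h1
area2 = 4 * 2  # w2 * h2
iou_value = inter / (area1 + area2 - inter)                     # 2 / 10
print(iou_value)  # 0.2
```

The clamp at 0 matters: without it, two disjoint boxes would produce a negative (or, with an absolute value, a spurious positive) overlap.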
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"id": "rr-O1XuwLoL3"
},
"outputs": [],
"source": [
"### A COMPLETER\n",
"def intersection_sur_union(box1, box2):\n",
" \"\"\"\n",
" Calcul de l'intersection sur union entre deux rectangles box1 et box2\n",
"\n",
" Arguments:\n",
" box1, box2 -- les coordonnées des deux rectangles, chacun sous la forme [cx, cy, w, h]\n",
" où (cx, cy) désigne les coordonnées du centre du rectangle, \n",
" w sa largeur et h sa hauteur\n",
"\n",
" Retourne :\n",
" iou -- la valeur d'intersection sur union entre les deux rectangles \n",
" \"\"\"\n",
"\n",
" # unpacking\n",
" cx1, cy1, w1, h1 = box1\n",
" cx2, cy2, w2, h2 = box2\n",
"\n",
" # haut gauche R1\n",
" x1 = cx1 - w1/2\n",
" y1 = cy1 - h1/2\n",
" \n",
" # bas droite R1\n",
" x2 = cx1 + w1/2\n",
" y2 = cy1 + h1/2\n",
"\n",
" # haut gauche R2\n",
" x3 = cx2 - w2/2\n",
" y3 = cy2 - h2/2\n",
" \n",
" # bas droite R2\n",
" x4 = cx2 + w2/2\n",
" y4 = cy2 + 2/2\n",
"\n",
" # haut gauche intersection\n",
" xi1 = max(x1, x3)\n",
" yi1 = max(y1, y3)\n",
"\n",
" # bas droite intersection\n",
" xi2 = min(x2, x4)\n",
" yi2 = min(y2, y4)\n",
"\n",
" # dimension de l'intersection\n",
" wi = abs(xi2 - xi1)\n",
" hi = abs(yi2 - yi1)\n",
"\n",
" # calcul des aires\n",
" aR1iR2 = wi * hi\n",
" aR1 = w1 * h1\n",
" aR2 = w2 * h2\n",
"\n",
" return aR1iR2 / (aR1 + aR2 - aR1iR2)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"id": "SdqAFWFxRx8l"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.2\n",
"0.0\n"
]
}
],
"source": [
"print(intersection_sur_union([2.5, 2, 1, 4], [2, 3, 4, 2])) # Résultat attendu : 0.2\n",
"print(intersection_sur_union([2.5, 2, 1, 4], [5, 3, 4, 2])) # Résultat attendu : 0.0"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RLpMJGi3QC9i"
},
"source": [
"Calcul des différentes métriques : \n",
"\n",
"$$ P = \\frac{TP}{TP + FP} $$\n",
"\n",
"$$ R = \\frac{TP}{TP + FN} $$\n",
"\n",
"$$ F1 = \\frac{2}{\\frac{1}{P} + \\frac{1}{R}} $$\n",
"\n",
"où $TP$ désigne le nombre de vrais positifs, $FP$ le nombre de faux positifs, $FN$ le nombre de faux négatifs, $P$ la précision, $R$ le rappel et $F1$ le F1-score.\n",
"\n",
"On considère souvent qu'une détection est correcte si la classification est valide et que l'intersection sur union entre vérité terrain et prédiction est supérieure à 0.5 (on utilisera un seuil modifiable *iou_thres*)."
]
},
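For instance, with the counts of 'Classe 3' in the test cell further down (1 true positive, 1 false positive, 0 false negatives), the formulas give P = 0.5, R = 1 and F1 ≈ 0.67. The hypothetical helper `precision_recall_f1` below computes them using the equivalent form $F1 = 2PR/(P+R)$, and returns 0 whenever a denominator is zero; that convention is an assumption, not something the formulas above specify.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1-score from counts, set to 0 when undefined."""
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    # Harmonic mean: 2 / (1/P + 1/R) == 2PR / (P + R), safe when P or R is 0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


# Counts for one class: 1 true positive, 1 false positive, 0 false negatives
p, r, f1 = precision_recall_f1(tp=1, fp=1, fn=0)
print(f"P = {p:.2f}, R = {r:.2f}, F1 = {f1:.2f}")  # P = 0.50, R = 1.00, F1 = 0.67
```

Writing F1 as 2PR/(P+R) avoids the division by zero that 2/(1/P + 1/R) raises for a class with no true positive but some false positives and false negatives.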
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"id": "5e_aIvjXLq32"
},
"outputs": [],
"source": [
"# A COMPLETER\n",
"def global_accuracy(y_true, y_pred, iou_thres=0.5):\n",
" \"\"\"\n",
" Calcul, pour chaque classe de la précision, du rappel et du F1-score ainsi \n",
" que du pourcentage global de bonnes détections.\n",
"\n",
" Arguments:\n",
" y_true -- les labels de la vérité terrain, de dimension (M, 1+4+N) où M désigne\n",
" le nombre d'éléments du dataset et N le nombre de classes (11 dans notre cas)\n",
" y_pred -- les labels prédits par un modèle, de dimension (M, 1+4+N) \n",
" iou_thres -- seuil d'intersection sur union entre une boîte \"vérité-terrain\" et \n",
" une boite prédite au-dessus duquel on considère que la prédiction est correcte \n",
"\n",
" Retourne :\n",
" class_res -- liste de longueur N contenant des dictionnaires sous la forme \n",
" {\"Précision\": p, \"Rappel\": r, \"F-score\": f} résumant les métriques\n",
" précision, rappel et F1-score pour chacune des classes.\n",
" accuracy -- pourcentage global de bonnes détections\n",
" \"\"\"\n",
" # Initialisation des métriques : nombre de vrais positifs (TP), faux positifs (FP)\n",
" # et faux négatifs (FN) pour chaque classe\n",
" class_metrics = []\n",
" for i in range(num_classes):\n",
" class_metrics.append({'TP': 0, 'FP': 0, 'FN': 0})\n",
"\n",
" # Nombres de détections correctes et de détections incorrectes\n",
" total_correct_detections = 0\n",
" total_incorrect_detections = 0\n",
" for i in range(y_true.shape[0]):\n",
" # Labels vérité-terrain et prédits\n",
" groundtruth_label = np.argmax(y_true[i,5:])\n",
" predicted_label = np.argmax(y_pred[i,5:])\n",
"\n",
" # Coordonnées de boîtes englobantes réelles et prédites\n",
" bx_true = (y_true[i,1]*y_std[1] + y_mean[1])\n",
" by_true = (y_true[i,2]*y_std[2] + y_mean[2])\n",
" bw_true = (y_true[i,3]*y_std[3] + y_mean[3])\n",
" bh_true = (y_true[i,4]*y_std[4] + y_mean[4]) \n",
" bx_pred = (y_pred[i,1]*y_std[1] + y_mean[1])\n",
" by_pred = (y_pred[i,2]*y_std[2] + y_mean[2])\n",
" bw_pred = (y_pred[i,3]*y_std[3] + y_mean[3])\n",
" bh_pred = (y_pred[i,4]*y_std[4] + y_mean[4]) \n",
"\n",
" # Calcul de l'intersection sur union\n",
" iou = intersection_sur_union([bx_true, by_true, bw_true, bh_true], [bx_pred, by_pred, bw_pred, bh_pred])\n",
" \n",
" # Si la détection est correcte : \n",
" if groundtruth_label == predicted_label and iou > iou_thres:\n",
" total_correct_detections += 1\n",
" class_metrics[predicted_label][\"TP\"] += 1\n",
" else:\n",
" total_incorrect_detections += 1\n",
" class_metrics[predicted_label][\"FP\"] += 1\n",
" class_metrics[groundtruth_label][\"FN\"] += 1\n",
"\n",
"# for i in range(num_classes):\n",
"# print(class_metrics[i])\n",
"\n",
" class_res = []\n",
" for i in range(num_classes):\n",
" TP = class_metrics[i][\"TP\"]\n",
" FP = class_metrics[i][\"FP\"]\n",
" FN = class_metrics[i][\"FN\"]\n",
" if (TP or FP) and (TP or FN): \n",
" P = TP / (TP + FP)\n",
" R = TP / (TP + FN)\n",
" F_score = 2 / ( 1/P + 1/R )\n",
" else:\n",
" P = 0\n",
" R = 0\n",
" F_score = 0\n",
" class_res.append({'Precision': P, 'Rappel': R, 'F-score': F_score})\n",
"\n",
" accuracy = total_correct_detections / (total_correct_detections + total_incorrect_detections)\n",
" return class_res, accuracy"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"id": "AOOWhuk4VBZE"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"La précision globale est de 66.7%\n",
"\n",
"--------------------------------------------\n",
"| Classe | Précision | Rappel | F1-score |\n",
"--------------------------------------------\n",
"| Classe 1 | 1.00 | 1.00 | 1.00 |\n",
"--------------------------------------------\n",
"| Classe 2 | 0.00 | 0.00 | 0.00 |\n",
"--------------------------------------------\n",
"| Classe 3 | 0.50 | 1.00 | 0.67 |\n",
"--------------------------------------------\n"
]
}
],
"source": [
"num_class_test = 3\n",
"class_labels_test = ['Classe 1', 'Classe 2', 'Classe 3']\n",
"y_true_test = np.ones((num_class_test,8))\n",
"y_true_test[0,:2] = [0.5, 0.5]\n",
"y_true_test[0, 5:] = [1, 0, 0]\n",
"y_true_test[1,:2] = [0.5, 0.5]\n",
"y_true_test[1, 5:] = [0, 1, 0]\n",
"y_true_test[2,:2] = [0.5, 0.5]\n",
"y_true_test[2, 5:] = [0, 0, 1]\n",
"y_pred_test = np.ones((num_class_test,8))\n",
"y_pred_test[0,:2] = [0.6, 0.6]\n",
"y_pred_test[0, 5:] = [1, 0, 0]\n",
"y_pred_test[1,:2] = [2.5, 2.5]\n",
"y_pred_test[1, 5:] = [0, 0, 1]\n",
"y_pred_test[2,:2] = [0.6, 0.6]\n",
"y_pred_test[2, 5:] = [0, 0, 1]\n",
"\n",
"class_res_test, acc_test = global_accuracy(y_true_test, y_pred_test)\n",
"\n",
"print(f\"La précision globale est de {100 * acc_test:.1f}%\")\n",
"\n",
"print()\n",
"print(\"--------------------------------------------\")\n",
"print(\"| Classe | Précision | Rappel | F1-score |\")\n",
"print(\"--------------------------------------------\")\n",
"for i in range(num_class_test):\n",
" print(f\"| {class_labels_test[i]:9s}| {class_res_test[i]['Precision']:.2f} | {class_res_test[i]['Rappel']:.2f} | {class_res_test[i]['F-score']:.2f} |\")\n",
" print(\"--------------------------------------------\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Q1Tg4VFVVB-w"
},
"source": [
"**Affichage attendu :**\n",
"```\n",
"La précision globale est de 66.7%\n",
"\n",
"--------------------------------------------\n",
"| Classe | Précision | Rappel | F1-score |\n",
"--------------------------------------------\n",
"| Classe 1 | 1.00 | 1.00 | 1.00 |\n",
"--------------------------------------------\n",
"| Classe 2 | 0.00 | 0.00 | 0.00 |\n",
"--------------------------------------------\n",
"| Classe 3 | 0.50 | 1.00 | 0.67 |\n",
"--------------------------------------------\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cMntCEgkANMg"
},
"source": [
"La fonction ci-dessous permet de calculer l'intersection sur union sur des tenseurs (et non des tableaux numpy), elle sera donc utilisable comme métrique pendant l'entraînement."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"id": "tVk9cB1WAMUK"
},
"outputs": [],
"source": [
"def compute_iou(y_true, y_pred):\n",
" ### \"Dénormalisation\" des coordonnées des boîtes englobantes\n",
" pred_box_xy = y_pred[..., 0:2]* y_std[0:2] + y_mean[0:2]\n",
" true_box_xy = y_true[..., 0:2]* y_std[0:2] + y_mean[0:2]\n",
"\n",
" ### \"Dénormalisation\" des largeur et hauteur des boîtes englobantes\n",
" pred_box_wh = y_pred[..., 2:4] * y_std[2:4] + y_mean[2:4]\n",
" true_box_wh = y_true[..., 2:4] * y_std[2:4] + y_mean[2:4]\n",
" \n",
" # Calcul des coordonnées minimales et maximales des boiptes englobantes réelles\n",
" true_wh_half = true_box_wh / 2.\n",
" true_mins = true_box_xy - true_wh_half\n",
" true_maxes = true_box_xy + true_wh_half\n",
" \n",
" # Calcul des coordonnées minimales et maximales des boiptes englobantes prédites\n",
" pred_wh_half = pred_box_wh / 2.\n",
" pred_mins = pred_box_xy - pred_wh_half\n",
" pred_maxes = pred_box_xy + pred_wh_half \n",
" \n",
" # Détermination de l'intersection des boîtes englobantes\n",
" intersect_mins = tf.maximum(pred_mins, true_mins)\n",
" intersect_maxes = tf.minimum(pred_maxes, true_maxes)\n",
" intersect_wh = tf.maximum(intersect_maxes - intersect_mins, 0.)\n",
" intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]\n",
" \n",
" # Aire des boîtes englobantes prédites et réelles\n",
" true_areas = true_box_wh[..., 0] * true_box_wh[..., 1]\n",
" pred_areas = pred_box_wh[..., 0] * pred_box_wh[..., 1]\n",
"\n",
" # Aire de l'union des boîtes prédites et réelles\n",
" union_areas = pred_areas + true_areas - intersect_areas\n",
"\n",
" iou_scores = tf.truediv(intersect_areas, union_areas)\n",
" return iou_scores\n",
"\n",
"def iou():\n",
" def iou_metrics(y_true, y_pred):\n",
" return compute_iou(y_true, y_pred)\n",
" iou_metrics.__name__= \"IoU\"\n",
" return iou_metrics"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vidE3XHlAkst"
},
"source": [
"Visualisation des données et labels"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"id": "8yotZHKgAiV1"
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"