1. Introduction
In light of the challenges posed by climate change, economic crises, overpopulation, and high global demand for protein, poultry production emerges as a more environmentally friendly option than meat from other livestock. It has a lower impact on greenhouse gas emissions and a higher feed conversion efficiency [1]. Additionally, poultry production requires fewer resources and has a faster production cycle, making it an appealing protein source [2]. However, the intensification of breeding has presented challenges in farm management and disease control, emphasizing the need to prioritize poultry welfare. To address these concerns and enhance poultry production, the integration of automated illness detection systems becomes indispensable [3].
Features represent visual characteristics associated with a specific biological reaction or process under investigation. These visual features are extracted from a specific region of interest (ROI) and differ from the segmentation features used to separate the broiler from the background. Unlike deep learning techniques, machine vision systems require these features to be extracted manually from all the images. In broiler monitoring systems, these features can generally be divided into three broad classes: morphological features, locomotion features, and optical flow measures [4–7].
Morphological features, including size and shape, are commonly used to describe agricultural products and can be used to detect diseased chickens based on differences in posture [7,8]. Animal size is a crucial morphological characteristic used to monitor animal growth and health, aiding in the assessment of vitality indicators such as disease, feed conversion, growth, and market maturity [9,10]. Shape-based techniques are frequently favored because of their robustness to sensor noise and invariance to changes in light and color [11]. Locomotion behaviors, such as poultry gait, are important for monitoring birds for lameness, activity, and health, detecting infection and infestation, and assessing management practices [4,7]. Optical flow, a computer vision technique, estimates the motion of objects in a video sequence by computing displacement vectors of pixels between consecutive frames, enabling tracking of movement in the scene [12].
Automated recognition, counting, and tracking of chickens in poultry houses are vital for improved farming productivity and animal care. However, this task faces challenges, including noisy backgrounds, varying lighting conditions, and obstructions caused by equipment in broiler rearing areas [13]. To fully embrace precision farming, adoption of artificial intelligence and digital monitoring is crucial [14]. Utilization of cameras and artificial intelligence allows for continuous monitoring of chicken behaviors [13].
The implementation of accurate and real-time identification of poultry health status has become possible with the advancement of artificial intelligence techniques, including machine learning, deep learning, and computer vision algorithms. Convolutional neural networks (CNNs) have emerged as a prominent choice for poultry detection and tracking using imaging technologies [15–18]. CNNs have been extensively employed in poultry classification [19], achieving an accuracy of 97% in detecting sick laying hens from infrared thermal images [20], as well as in poultry vocalization and pecking activity detection using acoustic data [21,22] and in lameness recognition [23].
Nowadays, many studies have used several CNN-based detection algorithms, including Faster R-CNN, Dense CNN (DenseNet), ResNet, SSD, and YOLO [24–33]. Bird vocalizations captured in the open field can be localized and segmented using YOLOv2 and DenseNet [34]. Neethirajan [13] trained and tested YOLOv5 to enhance chick tracking precision and used a Kalman filter to enable the model to track multiple chickens concurrently. It is now practical to automatically identify chickens, track their behavior as a flock, identify diseases, monitor heat stress, determine comfort levels, and more. YOLO is a popular single-stage detection algorithm with high speed and accuracy [35,36]. It performs well in detecting occluded and small objects in challenging field settings and has a faster detection rate than other detection algorithms [37,38]. Lastly, YOLOv7 was applied for hemp duck count estimation [39]. These research works offer valuable insights into the effectiveness of deep learning models for broiler detection, demonstrating their potential for various applications in poultry farming and disease control. However, it is crucial to consider the specific requirements of each detection task and select the most appropriate model accordingly. While various studies have utilized deep learning algorithms to detect broilers, laying hens, and ducks through visual images, these approaches may not be suitable for broiler detection and pathological phenomena classification in low-light conditions using YOLO-based algorithms [40,41]. Further investigation and validation on larger datasets and real-world scenarios are necessary to optimize the performance and generalization of these models in practical applications.
The YOLOv8 detector, the latest member of the YOLO series developed by Ultralytics, introduces a novel approach by predicting object centers directly rather than relying on predefined anchor boxes. Notably, YOLOv8 utilizes mosaic augmentation during online training, enabling it to detect small variations in the submitted images. Its versions range from YOLOv8n to YOLOv8x, achieving mean average precision (mAP) values of 37.3 and 53.9, respectively, on the COCO dataset [42]; these models have demonstrated significant advancements in real-time detection and improved accuracy [39]. In this study, YOLOv8 and YOLOv7 were employed to detect and classify pathological phenomena in broilers, marking their first use as brand-new detectors for this purpose.
This study introduces a novel approach for detecting chickens using computer vision YOLO-based algorithms applied to thermal and visual images with distinct complex backgrounds. The use of thermal images offers advantages such as independence from illumination, enabling monitoring of chickens under different lighting conditions. Additionally, thermal imaging allows for tracking chicken body temperatures in the poultry house through the detection of infrared rays. The aim of this work was to create a comprehensive dataset of pathological phenomena in broilers utilizing both visual and infrared thermal images for training YOLO-based models to effectively detect broilers and accurately identify specific pathological phenomena, such as slipped tendons, stressed chickens with open beaks, lethargic chickens, and diseased eyes, while considering challenges related to varying lighting conditions. By employing both visual- and thermal-based models for monitoring, farmers can obtain results from both thermal and visual viewpoints, ultimately enhancing the overall dependability of the monitoring process.
2. Materials and Methods
2.1. The Dataset Characteristics
The dataset utilized in this study comprises thermal and visual images collected from 600 Cobb Avian 48 male and female broiler birds of various ages at the research farm of the Department of Poultry Production, Faculty of Agriculture, Kafrelsheikh University, Egypt. Thermal images were captured using a UNI-T UTi165A thermal imaging camera (manufactured by Uni-Trend Technology Company Limited, Dongguan City, Guangdong Province, China) with a temperature range of −10 °C to 400 °C, a measurement resolution of 0.1 °C, and an IR resolution of 19,200 pixels. Various viewpoints, including frontal, flat, side, single, and group views, were considered to ensure image diversity (Figure 1). Measures were taken to ensure accurate thermal image acquisition, such as capturing multiple images of specific regions, considering environmental conditions, and providing an acclimation period for broilers. Visual images were obtained using a Redmi 10 phone camera (manufactured by Xiaomi Corporation, Beijing, China) with specifications including a 13 MP front camera, 8 GB RAM, 128 GB storage, a Qualcomm Snapdragon 678 processor, and Android 11 operating system. The thermal and visual images used in this study are available online [43].
Figure 1. Typical examples of thermal images.
The image labeling consists of six classes: lethargic chickens, slipped tendons, diseased eyes, stressed (beaks open) chickens, pendulous crops, and healthy chickens (Figure 2). The annotation, splitting, preprocessing, and augmentation of the images were accomplished using the “Roboflow” software [44]. This software provides the necessary tools to transform raw photos into a custom computer vision model and use it in applications [45,46]. To accurately outline the disease phenomenon in broilers, the background areas were minimized to reduce the surrounding area. Examples of labeled broiler diseases in both optical and thermal images are shown in Figure 2. The annotated images were saved in XML format files. All chicken images were annotated across the six classes. The dataset was split into a training set (2.9 K images), a validation set (269 images), and a test set (145 images). Overfitting is a critical concern in deep learning models; to address it, techniques such as regularization (e.g., L1 or L2 regularization), dropout, early stopping, and data augmentation are commonly used. To guard against overfitting, no images are repeated among the training, validation, and test sets [47]. Preprocessing consisted of auto-orientation and resizing (stretch) to 640 × 640 pixels; the dataset was exported in YOLOv7 PyTorch format and downloaded as a zip folder to the local computer.
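A YOLO-format export of such a dataset is typically described by a small data.yaml file listing the split directories and class names. The sketch below is illustrative only; the class names and directory paths are placeholders, not the exact strings of the published dataset.

```python
# Minimal sketch, assuming a typical Roboflow "YOLOv7 PyTorch" export layout.
# Class names and paths are placeholders, not the exact labels of the dataset.
import yaml

data_cfg = {
    "train": "train/images",
    "val": "valid/images",
    "test": "test/images",
    "nc": 6,  # six annotation classes
    "names": [
        "healthy", "lethargic", "slipped-tendon",
        "diseased-eye", "stressed-open-beak", "pendulous-crop",
    ],
}

with open("data.yaml", "w") as f:
    yaml.safe_dump(data_cfg, f, sort_keys=False)
```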
Figure 2. Typical classes include labeled images from a broiler dataset. Pendulous crop (1), stressed (beaks open) (2), diseased eyes (3), slipped tendons (4), and lethargic chicken (5). The rows (A1–A5,B1–B5) show ROIs of thermal and visual images with their varied annotations and the rows (C1–C5,D1–D5) show different types of thermal analysis.
Poultry houses have a complex environment in which sidelight, backlight, slight occlusion, and strong occlusion affect the uniformity of broiler images, causing false or missed detections of targets. The training images should therefore cover many scenes so that the extracted features can overcome the interference of complex scenes [48]. However, limitations arise when dealing with diseased broilers, as the availability of images is restricted by disease-related constraints, the limited number of affected broilers, and time constraints in capturing the images. Consequently, current advancements in deep learning research focus on enhancing existing data to augment the training dataset and improve neural network generalization.
In Figure 2 (column 2), chickens can be observed panting, which involves opening their beaks and breathing rapidly. This behavior is a mechanism chickens use to release and dissipate internal heat, similar to how dogs pant. Panting and rapid breathing are considered early indicators of heat stress in chickens [49]. Figure 2 (column 4) illustrates a condition known as slipped tendon or perosis, a metabolic disease causing leg weakness in chickens, ducks, and turkeys. It usually occurs in poultry under six weeks of age, resulting in flattened and enlarged hocks [50]. Lastly, Figure 2 (column 1) shows a pendulous crop, often referred to as a spastic crop.
The thermal analysis of infected sections in this study is depicted in the table presented in Figure 2, where two different approaches were employed. The first approach is shown in row C and the second in row D. The first technique identifies the afflicted area and then determines its maximum, minimum, and average temperatures. The second technique applies a temperature mask to better visualize the heat distribution pattern and the hottest regions.
2.2. Thermal Image Processing
This section describes the methodology and image processing techniques used in this study. Infrared cameras can identify points of heat concentration in chickens. However, obtaining clear thermal images is only the initial step in thermography. The real challenges lie in the subsequent processing and interpretation of these images to transform them into meaningful thermograms, which can then serve as the basis for efficient optimization measures on the objects captured thermographically. To analyze and evaluate thermal images, powerful analysis software is essential. In this study, the UTi165A software [51] was utilized to process and annotate the thermal images captured via the UTi165A thermal camera. The process involved several steps, as shown in Figure 3. Once the thermal images are imported, various adjustments can be performed using the UTi165A software's tools, including adjusting the color palette, temperature range, image enhancement, and other settings to improve the visibility and interpretation of thermal patterns; this step is very useful for highlighting the temperature gradient inside the image. The UTi165A software also provides annotation tools that place markers on the image to highlight specific regions, objects of interest, or anomalies. In this study, these markers were not used for YOLO training; they only served to guide the annotation of each anomalous area as a specific pathological phenomenon in the Roboflow software. After processing and annotating the thermal images, the work can be saved within the UTi165A software to preserve the changes made; the software can save the images with annotations overlaid or store the annotations separately as metadata associated with the images. The processed images help the YOLO-based model train more accurately on thermal images.
Figure 3. Flowchart of processing and annotation of thermal images via UTi165A.
2.3. Feature Extraction from Thermal Images
In this study, the features extracted from thermal images are primarily based on patterns and pixel intensity rather than colors. Since thermal images capture temperature distributions rather than visible light, color information is less relevant for feature extraction in this context. The YOLO-based model analyzes the thermal images by focusing on patterns and variations in pixel intensity, which correspond to temperature differences. It utilizes convolutional layers to detect distinct thermal patterns indicative of different pathological phenomena. It learns to recognize sharp transitions, temperature gradients, hotspots, and other thermal patterns characteristic of specific conditions, without relying on predefined rules. Through iterative learning, the model automatically extracts discriminative thermal features for accurate classification of broiler diseases.
2.4. Experimental Environment and Hyper-Parameter Settings
The computer used in this research was equipped with an 8-core, 16-thread, 10th-generation Intel(R) Core(TM) i7-10870H CPU operating at a base clock of 2.21 GHz with a turbo speed of 5 GHz, a 16 MB cache, and a maximum memory size of 128 GB (DDR4-2933). The GPU was an NVIDIA GeForce RTX 3060 with 3840 CUDA cores and 6 GB of video memory. The operating system was Windows 10, and the software versions used were PyTorch 1.8.1, Python 3.8, and CUDA 11 [52,53]. For the training phase of this study, the number of epochs was set to 100, the batch size was 8, and the input size was 416 × 416 pixels. Regularization was applied through Batch Normalization (BN) layers while updating the model's weights. The momentum factor was set to 0.937, and the weight decay rate was set to 0.0005, adding a penalty term to the loss function to avoid overfitting. The initial learning rate was set to 0.01 during the training process (Table 1).
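For reference, the reported settings can be collected in a single configuration. The sketch below is illustrative only; the key names are generic placeholders, not the exact fields of Table 1 or of any particular YOLO release.

```python
# Hedged sketch: the hyper-parameter values stated above, gathered into a
# dictionary. Key names are indicative placeholders.
hyperparameters = {
    "epochs": 100,           # number of training epochs
    "batch_size": 8,         # images per batch
    "img_size": 416,         # input resolution (pixels)
    "lr0": 0.01,             # initial learning rate
    "momentum": 0.937,       # momentum factor
    "weight_decay": 0.0005,  # penalty added to the loss to limit overfitting
}
```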
Table 1. Hyperparameters definitions of YOLO versions 8, 7 and 5.
2.5. Data Augmentation
In this study, challenges were initially encountered when training YOLO-based models using raw visual- and thermal-based datasets, as the model’s performance was found to be unsatisfactory. To overcome this, the importance of finding suitable augmentation techniques to create a more diverse and high-quality dataset for improved model training was recognized [54]. Numerous augmentation techniques were used to enhance generalizability and prevent model overfitting. These techniques were implemented using Roboflow. Horizontal and vertical mirroring, rotating (90°, 180°, and 270°), blurring, and adding noise are examples of conventional data augmentation techniques [55]. The conventional image augmentation techniques are outlined in Table 2.
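As an illustration of these conventional augmentations, the sketch below reproduces them offline with the Albumentations library. In the study itself they were applied through Roboflow, so this code is only an assumed offline equivalent, not the pipeline actually used.

```python
# Hedged sketch: an offline equivalent of the conventional augmentations in
# Table 2 (mirroring, 90/180/270-degree rotation, blur, noise), written with
# Albumentations for illustration; the study applied them via Roboflow.
import albumentations as A

augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),   # horizontal mirroring
        A.VerticalFlip(p=0.5),     # vertical mirroring
        A.RandomRotate90(p=0.5),   # rotation by a random multiple of 90 degrees
        A.GaussianBlur(p=0.3),     # blurring
        A.GaussNoise(p=0.3),       # additive noise
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Example call: augment(image=img, bboxes=boxes, class_labels=labels)
```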
Table 2. Configurations of the used conventional augmentation techniques.
2.6. Mosaic Data Augmentation
Mosaic data augmentation, a recently introduced technique that combines several images, substantially enriches the background of the objects being detected [56]. Such techniques can expand the available datasets and strengthen the resilience of detection models in intricate scenes [36]. In this research, the thermal and visual datasets for broiler and disease detection in different complicated scenes were augmented by combining the conventional and Mosaic techniques to develop a reliable detection model.
As shown in Figure 4, the Mosaic data augmentation steps are as follows: first, a batch of image data is randomly extracted from the broiler pathological phenomena dataset; then, four images of this batch are arbitrarily chosen, scaled, dispersed, and spliced into a new image, and these steps are repeated batch-size times. Finally, the YOLOv8-based algorithm was trained using the Mosaic data augmentation, which is well suited to small object detection [39,57].
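A minimal sketch of this splicing step is shown below. It assumes NumPy/OpenCV images and omits the remapping of bounding boxes for brevity, so it illustrates only the image-composition part of Mosaic augmentation.

```python
# Hedged sketch of the Mosaic composition step: four randomly chosen images
# are resized and spliced into a single 2x2 composite. Bounding-box remapping
# is omitted for brevity.
import random
import numpy as np
import cv2

def mosaic(images, out_size=640):
    """Combine four randomly selected images into one out_size x out_size mosaic."""
    chosen = random.sample(images, 4)
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    corners = [(0, 0), (0, half), (half, 0), (half, half)]  # top-left of each tile
    for img, (y, x) in zip(chosen, corners):
        canvas[y:y + half, x:x + half] = cv2.resize(img, (half, half))
    return canvas
```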
The augmented training set was ultimately composed of 1600 images for each combination of various augmentation methods, resulting in a total of 9600 images specifically for pathological phenomena (Figure 5). To achieve this, the Roboflow online program was configured to triple the image data for each augmentation technique.
Figure 5. Arrangement of thermal and visual augmented training sets.
2.7. Architecture of YOLOv8
The YOLOv8 architecture has been modified to achieve higher object detection precision in complex scenes than previous versions of YOLO, as shown in Figure 6. The updated architecture includes a backbone consisting of a series of convolutional layers that extract features at different resolutions; these features are then passed through a neck module, where they are consolidated before being fed into the detection head. A total of 100 sample datasets were chosen from Roboflow Universe to evaluate how well models generalize to new domains. The small version of YOLOv8 was evaluated alongside YOLOv5 and YOLOv7 on this RF100 benchmark, and YOLOv8 achieved an overall better mAP. There are five different versions of YOLOv8, from the smallest YOLOv8n with a 37.3 mAP score to the largest YOLOv8x with a 53.9 mAP score on COCO [58].
Figure 6. YOLOv8 network architecture includes four generic modules of the input terminal, backbone, head, and prediction.
2.8. Improved YOLOv8 with Anchor Free
YOLO series models have undergone multiple iterations over time, with each new version building upon the previous ones to overcome limitations and improve performance. The YOLOv8 architecture adopts an anchor-free approach, similar to YOLOX, for object detection. This anchor-free approach eliminates the need for predefined anchors or reference points, resulting in more efficient and adaptable object detection across different scales and aspect ratios. During training, loss functions are employed to optimize the model's parameters by minimizing the discrepancy between predicted values and ground truth annotations. YOLOv8 utilizes similar loss functions to YOLO versions 5 and 7, including box loss and classification loss. However, it deviates from using an objectness loss and instead employs a distribution focal loss, which treats the continuous distribution of box locations as a discretized probability distribution. This approach considers box locations as probability distributions rather than precise coordinates, providing a different perspective on object detection. Anchor boxes were a challenging component of older YOLO models; the anchor-free design lowers the number of box predictions, which speeds up the post-processing of candidate detections after inference [42].
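To make the distributional view of box locations concrete, the sketch below decodes one box edge from a discrete distribution over bins by taking its expected value. The bin count (reg_max = 16) and the tensor layout are assumptions for illustration, not values taken from this study's configuration.

```python
# Hedged sketch of the distributional box representation described above: each
# box edge is predicted as a probability distribution over discrete bins, and
# the expected value of that distribution gives the edge offset.
import torch

def decode_box_edges(edge_logits: torch.Tensor, reg_max: int = 16) -> torch.Tensor:
    """edge_logits: (..., 4, reg_max) raw scores for the four box edges."""
    probs = edge_logits.softmax(dim=-1)                                   # distribution per edge
    bins = torch.arange(reg_max, dtype=probs.dtype, device=probs.device)  # bin indices 0..reg_max-1
    return (probs * bins).sum(dim=-1)                                     # expected offset per edge
```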
2.9. Evaluation Metrics
2.9.1. Detection Evaluation Metrics: Mean Average Precision, mAP
The diagram in Figure 7 illustrates how mAP is computed. The calculation begins by recording the detections for each class, then computes precision, recall, and average precision (AP) using 11-point interpolation, and finally computes mAP. mAP is a metric for measuring the performance of object detection and segmentation systems.
Figure 7. Calculation steps of mean average precision, mAP.
Intersection over Union (IoU) is a widely employed metric, ranging from 0 to 1, that evaluates the accuracy and precision of object detection and segmentation algorithms by calculating the ratio of the overlapping area to the total area covered by both regions, facilitating quantitative assessment of algorithm performance (Figure 8).
Figure 8. Intersection over Union (IoU), equation (a), bounding box localizations (b).
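Written out in a standard form consistent with the equation in Figure 8a, the IoU of a predicted box B_p and a ground-truth box B_gt is:

\[
\mathrm{IoU} = \frac{\mathrm{Area}(B_p \cap B_{gt})}{\mathrm{Area}(B_p \cup B_{gt})}
\]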
2.9.2. Performance Assessors
The standard performance measures used to determine the accuracy of the trained classification model are precision, recall, F1 score, and accuracy [59]. These performance measures depend primarily on four key outcomes of the model predictions, defined as follows:
(1) TP: True positives, implying that the model correctly predicted a label compared to the ground truth (actual data). (2) TN: True negatives, meaning that the model did not predict a label and the label is not part of the ground truth. (3) FP: False positives, denoting that the model predicted a label, but it is not part of the ground truth (type one error). (4) FN: False negatives, meaning that the model did not predict a label, but it is part of the ground truth (type two error).
Calculating the subsequent performance metrics is straightforward once these prediction outcomes have been counted:
Precision
Precision measures how well the model can find true positives (TP) among all positive predictions (TP + FP).
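In standard notation, consistent with this definition:

\[
\mathrm{Precision} = \frac{TP}{TP + FP}
\]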
Recall
Recall measures how well the model can find true positives (TP) among all actual positive samples (TP + FN).
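In standard notation, consistent with this definition:

\[
\mathrm{Recall} = \frac{TP}{TP + FN}
\]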
F1 Score
The optimal confidence threshold is the one at which precision and recall yield the highest F1 score. The F1 score expresses the balance between precision and recall: when the F1 score is high, both precision and recall are high, and vice versa.
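In standard notation, as the harmonic mean of precision and recall:

\[
F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]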
Average Precision (AP)
Precision reflects the model's ability to distinguish negative samples; the higher the precision, the stronger this ability. Recall measures how well the model can locate positive samples; the higher the recall, the stronger the model's capacity to detect positive samples. Combining the two gives the F1 score, and the higher the F1 score, the more reliable the model. The average precision (AP), calculated independently for each category, is the average of the maximum precision values over all recall levels. Precision is a fairly intuitive evaluation metric, but on its own it captures only part of the picture; mAP, recall, and the F1 score were thus introduced for a comprehensive evaluation. The following formulas were used to determine precision and mAP [60].
The precision at each recall level r is interpolated by taking the maximum precision measured at any recall level that exceeds r.
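Reconstructed in standard 11-point interpolation notation, consistent with this description:

\[
p_{\mathrm{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})
\]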
The IoU threshold is set to 0.5 [61]. When an object is detected multiple times, the detection with the highest confidence is counted as the positive sample and the others as negative samples. The precision values at the 11 breakpoints that divide the 0–1 horizontal axis into 10 equal intervals were obtained from the smoothed precision–recall curve, and their average was taken as the final AP using Equation (5).
The mAP is a metric used to evaluate object detection models such as Fast R-CNN, YOLO, and Mask R-CNN. The mean of the average precision (AP) values is calculated over recall values from 0 to 1. The mAP formula involves the following sub-metrics: confusion matrix, IoU, recall, and precision. It is obtained by calculating the AP for each class and then averaging over the S classes.
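Reconstructed in standard form, consistent with the 11-point procedure and the class averaging described above:

\[
AP = \frac{1}{11} \sum_{r \in \{0,\, 0.1,\, \ldots,\, 1\}} p_{\mathrm{interp}}(r), \qquad
mAP = \frac{1}{S} \sum_{i=1}^{S} AP_i
\]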
2.10. Train YOLOv8 Model
Once the dataset was annotated and classified into different disease categories using the Roboflow software, it was exported in YOLOv8 PyTorch format as YOLO TXT and YAML files. All formatted data were downloaded as a zip folder for the YOLOv8 training process and model development procedures. The YOLOv8 model was trained on a local machine with the following settings: image size of 640, batch size of 8, 100 training epochs, and yolov8s.pt as the initial weights. The flowchart in Figure 9 illustrates the model development stages.
Figure 9. Developing procedures of the YOLOv8 object detection model on the customized dataset.
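A minimal training sketch with these settings, using the Ultralytics Python API, is shown below. The data.yaml path is a placeholder for the exported Roboflow dataset, and the call is illustrative rather than the exact script used in this study.

```python
# Hedged sketch of the training run described above (Ultralytics API); the
# data.yaml path is a placeholder for the exported Roboflow dataset.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")        # pretrained YOLOv8-small weights
results = model.train(
    data="data.yaml",             # dataset definition exported from Roboflow
    imgsz=640,                    # input image size
    batch=8,                      # batch size
    epochs=100,                   # number of training epochs
)
```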
3. Results and Discussion
3.1. Experimental Analysis of Chicken Detection and Diseases Classifications
The whole dataset consisted of 10,000 thermal and visual images with 50,000 annotated frames, divided into training (80%), testing (10%), and validation (10%) sets. The model was trained on the entire dataset for broiler detection using 100 epochs and a batch size of 8, taking approximately 9.6 h to complete. The graphs in Figures 10 and 11 show the model's performance improvement, with various metrics for both the training and validation sets, including classification (cls_loss), objectness (obj_loss), distribution focal (dfl_loss), and box (box_loss) losses. These metrics assess the model's ability to locate broilers accurately, determine their class, and detect pathological phenomena, with the focal loss function addressing class imbalance during training. The focal loss modifies the cross-entropy loss to concentrate learning on hard, misclassified samples by dynamically scaling the loss with a factor that decays to zero as confidence in the correct class rises. The model improves rapidly in precision, mean average precision, and recall before plateauing after approximately 37 epochs (Figure 10A). The validation objectness and box losses decreased until about 35 to 40 epochs.
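In its commonly used form (stated here for reference, not taken from the study's configuration), the focal loss scales the cross-entropy term as:

\[
FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t)
\]

where p_t is the predicted probability of the correct class and the exponent γ controls how quickly the scaling factor decays to zero as p_t approaches one.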
Figure 10. Plots of training and validation sets and performance metrics of YOLOv7-based model for broilers (A) first model (visual-based model) for broiler detection, only at different lighting conditions and (B) second model (thermal-based model) for pathological phenomena identification through raw data (without augmentation process) of thermal images datasets, over 100 of training epochs.
Figure 11. Plots of training and validation sets of broiler pathological phenomena classification model created through YOLOv8, second model (thermal-based), through thermal images dataset augmented using the Mosaic technique, box, classification, and distributional focal (dfl) losses, and performance metrics of precision, recall, and mean average precision over the training epochs.
The evaluation metrics precision and recall evaluate the accuracy and completeness of the model in detecting and labeling specific phenomena in broilers. According to Tables 3 and 4, the YOLOv8-based thermal image model for pathological phenomena detection achieved a high precision and recall of 0.988 and 0.956, respectively, resulting in an optimal F1 score of 0.972, indicating a good balance between precision and recall.
Table 3. Comparison of the model performance metrics of the YOLO versions using the thermal and visual image datasets augmented using the Mosaic technique for the classification of broiler pathological phenomena.
Table 4. YOLOv8-based model training performance by using raw and augmented thermal and visual datasets via traditional augmentation methods for the classification of broiler pathological phenomena.
There are two distinct forms of mAP, as seen in Figures 10 and 11. The first, mAP50, represents the mean average precision at an IoU threshold of 0.5; the developed YOLOv7-based model achieved an average value of 0.95 using the raw visual and thermal image datasets (Figure 10A). The other form, mAP50-95, is calculated at IoU thresholds from 0.5 to 0.95 in steps of 0.05; the YOLOv7-based model achieved values above 0.7, indicating that the model performs well for broiler detection under various lighting circumstances. However, when identifying pathological phenomena, the YOLOv7-based model's performance and loss metrics show lower efficacy (Figure 10B), suggesting the need for a sufficient, high-quality dataset for reliable and precise detection. To address this, two types of image data augmentation, traditional and Mosaic techniques, were used in this study (Tables 3 and 4), coinciding with the release of the new YOLOv8 version.
Figure 11 shows that the progression of the classification loss during model training (train/cls_loss) steadily decreased over epochs, starting at 4.5 and reaching 0.21. In contrast, the validation classification loss (val/cls_loss) started at 3.1 and ended at 0.6, which is three times greater than the train/cls_loss. This significant difference suggests that the augmentation process is primarily responsible for this variation between the two losses.
3.2. Model Comparison and the Influence of Different Dataset Augmentation Methods
The YOLOv8-based model, trained with the thermal image dataset augmented via the Mosaic augmentation method (Table 3 and Figure 11), exhibited a gradual reduction in training and validation loss over 100 epochs. The performance metrics improved rapidly during the initial 25–40 epochs and then stabilized close to one. For instance, the mAP50 metric reached 0.99 by epoch 99, showing consistent performance from epoch 73 until the end of training (Figure 11).
The Mosaic method proved to be the most effective technique for thermal image dataset augmentation in YOLO-based training for broiler pathological phenomena identification. The YOLOv8-based model achieved the highest performance metric values among the previous versions 7 and 5 (Table 3), with mAP50 and mAP50-95 reaching 0.988 and 0.857, respectively, indicating exceptional performance in identifying pathological phenomena in complex scenes. While the Mosaic method markedly improved the thermal image dataset, its impact on the visual image dataset was less pronounced, though still substantial, with mAP50 and mAP50-95 reaching 0.829 and 0.679, respectively. YOLO versions 8, 7, and 5 took 0.863, 0.992, and 0.139 h, respectively, to train on the thermal image dataset augmented via the Mosaic method.

Traditional augmentation methods were applied individually or in combination to augment both the thermal and visual image datasets, and the augmented datasets were then used to train the YOLOv8-based model; the training results are presented in Table 4. Bounding box augmentation was applied to enhance the detection of quickly moving objects; its fundamental notion is to change the information inside the bounding box by varying, for instance, the brightness and blur of an object relative to its background, creating additional training data merely by modifying the bounding boxes of a video frame. Thermal and visual image datasets augmented using the bounding box method were rotated by 90°, given a brightness of 25, or subjected to a box-shear process (Table 4). With the grayscale augmentation method, the input image is randomly transformed into a single-channel grayscale output image; this leads the model to place less emphasis on color, since a single grayscale value may not be appropriate when developing a detection model for objects of only one color. Grayscale augmentation differs from grayscaling as a preprocessing phase: as an augmentation step, it is applied arbitrarily to a portion of the images in the training dataset.

The combination of grayscale augmentation with 90° rotation and box-shear of the bounding box for the visual image dataset increased the trained model's precision to 0.901 (Table 4). Augmenting the visual image dataset via the brightness 30 method increased the precision to 0.969 and raised mAP50 and mAP50-95 to 0.654 and 0.412, respectively. The model trained on the visual image dataset augmented via a combination of flip, rotation, blur, and cutout methods achieved a precision, mAP50, and mAP50-95 of 0.914, 0.573, and 0.349, respectively. The lower recall of the models trained on datasets augmented with the Shear + Brightness + Noise and Cutout methods indicates that these models can identify pathological phenomena with recall values of only 0.539 and 0.606, respectively; however, their higher precisions of 0.903 and 0.964 show that they label the identified pathological phenomena correctly. The F1 score gives a final impression of model performance in the classification process; the two previous models have F1 scores of 0.675 and 0.744, respectively.
In general, all these traditional augmentation trials indicate that the brightness augmentation method has the greatest impact on visual image dataset quality, enhancing precision, recall, and mAP50 (see Table 4), whereas the other augmentation methods have only a limited effect on some performance measures of the visual image dataset. Comparing visual and thermal images, visual images benefit more from the traditional augmentation techniques than thermal images. This is because the predominantly white color of broilers inside the poultry house makes visual images more responsive to traditional augmentation than thermal images. In contrast, the Mosaic augmentation technique enhances thermal images more than visual images because of the temperature gradient present in thermal images, which fits the working principle of Mosaic augmentation. Mosaic augmentation combines multiple thermal images into a single composite image, producing a larger field of view and providing additional context for the model to learn from. By including multiple broilers or surroundings in one image, the model can capture a more comprehensive understanding of the thermal patterns and the relationships between different areas, thereby improving its ability to detect and classify abnormalities. This explains why the Mosaic method dramatically enhances the performance metrics of the YOLOv8-based model trained on the augmented thermal image dataset, whose temperature gradients do not exist in visual images (Table 3). Augmenting the visual image datasets with the Mosaic method can still raise mAP50 and mAP50-95 to suitable levels for both the trained YOLOv8- and YOLOv7-based models, but the precision indexes remain at 0.861 and 0.802, respectively.
3.3. Developed Models Capacity
The YOLOv8-based model created in this study for broiler pathological phenomena detection is more precise and accurate with the infrared thermal camera, while broiler activity can be monitored during light hours through the YOLOv8-based model for the surveillance optical camera. The quality of the captured images is affected by illumination intensity, and previously developed models have struggled to detect objects under various lighting situations. These challenges can be resolved by integrating optical and thermal cameras. The thermal camera is capable of capturing images in diverse lighting and weather conditions, offering a wide field of view that captures a greater variety of objects with enhanced clarity. Thus, the quality of the image acquisition devices, whether thermal or optical, affects algorithm training. Consequently, both thermal and optical images make up the thermal- and visual-based datasets used in this study to develop thermal- and visual-based YOLOv8 models that work concurrently, providing different perspectives on the broilers' states. The main motivation for using thermal and optical cameras together is the requirement to obtain more precise data represented in two types of images, which is especially important for intensive poultry farms where broilers must be continuously monitored. With the help of this method, the proposed models are more reliable for monitoring broilers around the clock in various local or microclimate circumstances. Images of the broilers were taken from roughly all viewpoints and at multiple locations and orientations. The significant poultry pathological phenomena acknowledged in the poultry house of the Faculty of Agriculture at Kafrelsheikh University were lethargic broilers, slipped tendons, diseased eyes, stressed broilers (with open beaks), and pendulous crops, in addition to healthy broilers. The model results show thermal and optical detection of the different broiler cases (Figure 12).
Figure 12. Thermal and optical surveillance cameras for chicken and hen detection, body temperature illustration, and pathological phenomena identifications inside the poultry houses.
4. Conclusions
The proposed model is appropriate for detecting and classifying the pathological phenomena of broilers in intensive poultry houses, which require round-the-clock monitoring during the production season to avoid dangers. The environment inside these poultry houses is not maintained consistently and presents complex scenes with sidelight, backlight, slight and strong occlusions, and daytime and nighttime illumination. Production tools, such as heaters, fans, feeders, drinking lines, dust, and others, affect light intensity at different locations inside the house. Poultry production requires caring for individual birds from the first day they enter the poultry house until they reach the proper harvesting size. Three different versions of the YOLO-based algorithm were tested to identify the best one. The developed YOLOv8-based model demonstrates enhanced and reliable performance in broiler detection and pathological phenomena classification compared with the other YOLO versions. The developed model was trained on broiler detection at various ages and sizes. Five main categories of pathological phenomena can be recognized: stressed (beaks open), diseased eyes, slipped tendons, pendulous crop, and lethargic, in addition to healthy broilers. The developed model was trained using images captured with infrared thermal and visual cameras. Experiments were performed using raw thermal and visual datasets to train YOLOv7 for broiler detection and pathological phenomena classification. The performance measures of the YOLOv7-based model trained with the raw datasets show acceptable levels for broiler detection; in contrast, the model used for pathological phenomena classification has the lowest performance for the thermal image dataset, with mAP50 and mAP50-95 of 0.478 and 0.278, respectively.
For this reason, data augmentation methods are necessary to enhance the quality of the thermal and visual image datasets. Different augmentation methods were applied individually or in combination to identify the most suitable one. The thermal image dataset augmented via the Mosaic method shows the highest performance metrics when training the YOLOv8-based model, with an mAP50 of 0.988, an mAP50-95 of 0.857, an F1 score of 0.972, a precision of 0.988, and a recall of 0.956. Therefore, this model has the most efficient capacity for broiler detection and pathological phenomena classification in all environmental conditions.
Overall, the implementation of the YOLOv8-based model in intensive poultry production offers significant benefits, enabling timely and accurate monitoring to avoid potential dangers during the production season. With its ability to handle complex scenes and diverse lighting conditions, this model contributes to improved poultry welfare and efficient disease control. The findings of this study open avenues for further advancements in precision livestock farming and demonstrate the potential of AI-based detection systems in enhancing poultry production management and animal care.
Author Contributions: Conceptualization, W.M.E., M.F.A., G.G.A.E.-W., I.A.E. and I.S.E.-S.; methodology, W.M.E., J.G., G.G.A.E.-W., I.A.E., S.K.A. and L.A.A.-S.; software, W.M.E., M.F.A., F.S.M., I.S.E.-S. and G.G.A.E.-W.; validation, W.M.E., J.G., G.G.A.E.-W., M.A.A., S.K.A. and M.F.A.; formal analysis, W.M.E., J.G., G.G.A.E.-W., M.F.A. and F.S.M.; investigation, W.M.E., J.G., I.S.E.-S., I.A.E. and M.F.A.; resources, I.A.E., S.K.A., L.A.A.-S., M.A.A. and F.S.M.; data curation, W.M.E., J.G., G.G.A.E.-W., I.S.E.-S. and M.F.A.; writing—original draft preparation, W.M.E., J.G., G.G.A.E.-W., I.A.E., I.S.E.-S. and M.F.A.; writing—review and editing, W.M.E., J.G., G.G.A.E.-W., I.S.E.-S. and M.F.A.; visualization, W.M.E., J.G., G.G.A.E.-W., S.K.A., L.A.A.-S., M.A.A., F.S.M. and M.F.A.; supervision, W.M.E.; project administration, W.M.E., S.K.A., M.A.A., F.S.M. and L.A.A.-S.; funding acquisition, S.K.A. and L.A.A.-S. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R365), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups Project under grant number (R.G.P.2/77/44).
Institutional Review Board Statement: Not applicable.
Data Availability Statement: The data presented in this study are available in the main manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
This article was originally published in Agriculture 2023,13,1527. https://doi.org/10.3390/agriculture13081527. This is an Open Access article distributed under the terms and conditions of the Creative Commons Attribution (CCBY) license (https://creativecommons.org/licenses/by/4.0/).