Introduction
Working principle of stereo vision sensor and calibration process
Algorithms for plant feature characterization
Application of stereo vision for feature characterization in crops
Application of stereo vision for feature characterization in fruit trees
Factor affecting stereo vision and stereo matching algorithm
Challenges of stereovision imaging and solutions
Conclusions
Introduction
Feature characterization is the process of extracting and describing distinctive features or points of interest in an image, and it underpins applications such as object tracking, 3D reconstruction, and plant monitoring. Stereo vision systems can be used for detailed 3D characterization and feature extraction of plants by combining RGB color data with depth information. Plant features such as plant height, canopy volume, row distance, and plant spacing are essential for understanding and managing plant growth and agricultural practices. Characterizing these features is crucial in upland crop and fruit cultivation for resource optimization and allocation, yield estimation, monitoring and management, harvest planning, and research and development that promote sustainable agricultural practices. Stereo vision uses two or more cameras to capture images of the same scene, here plants, from different viewpoints. The images are then processed to extract corresponding feature points and to calculate their 3D coordinates through triangulation. This capability is necessary for applications that require depth perception, such as autonomous driving, robotics, and 3D reconstruction (Amean et al., 2013). Stereo vision provides high-resolution and accurate depth maps, especially for close-range objects and textured surfaces. It can work in various lighting conditions and is cost-effective compared to other depth sensing technologies. Stereo vision has enabled feature characterization of fruit trees and upland crops using depth maps and disparity maps (Hu et al., 2020; Hao et al., 2022; Zhong et al., 2021), and it has also been applied to autonomous driving in agriculture (Feng et al., 2020; Muhovic and Pers, 2020), robot navigation (Yu et al., 2019; Zhang et al., 2022), and agro-industrial inspection (Chen and Shen, 2023; Zhou et al., 2020).
Several key methods are used for feature characterization of plants with stereo vision: (i) structural characterization, which constructs the 3D structure and geometry of plants by generating dense point clouds from the depth data, measures plant height, leaf area, biomass, and other structural traits from the 3D models, and analyzes growth patterns and morphological changes over time by registering 3D models from different time points (Ruigrok et al., 2024; Dandrifosse et al., 2020); (ii) spectral characterization, which estimates plant health parameters such as chlorophyll content and nutrient deficiencies from the spectral data (Yoon and Thai, 2010); and (iii) deep learning for feature extraction, which uses RGB-D data from stereo cameras as input to deep learning models for plant detection and segmentation in dense scenes and trains convolutional neural networks (CNNs) on stereo data to automatically extract features such as leaf angles and plant architecture (Xiang et al., 2023).
Stereo vision has been applied in various smart agricultural activities, particularly in the cultivation of upland crops and fruit trees, such as detecting, mapping, and digitizing canopy geometry for different plant architectures (Scalisi et al., 2024), geometric characterization of trees (Rosell and Sanz, 2012), crop height estimation (Kim et al., 2021), orchard and tree mapping (Nielsen et al., 2009), automatic plant feature recognition (Amean, 2017), and characterization of upland crop plants (Xiang et al., 2023). It is an important technique for quantifying plant features such as canopy height, canopy volume, and plant and row spacing in smart agriculture (Malekabadi et al., 2019; Bao et al., 2019; Dandrifosse et al., 2020; Guo et al., 2018; Kim et al., 2020; Ni and Burks, 2018; Shan et al., 2018; Rovira-Mas et al., 2010; Milien et al., 2012). Stereo vision has also enabled the reconstruction of 3D images of plants and trees (Lee et al., 2010; Sanz-Cortiella et al., 2011a, 2011b; Rosell and Sanz, 2012; Ni and Burks, 2013; Usha and Singh, 2013; Malekabadi et al., 2019). Stereo vision offers cost-effective and rapid plant structural information, including growth patterns and 3D geometry reconstruction. It facilitates measurements of plant height, convex hull volume, surface area, and stem diameter, surpassing several other sensors used in agriculture (Dandrifosse et al., 2020; Ni et al., 2016; Ni and Burks, 2013; Ni and Burks, 2018). For these reasons, stereo vision is becoming more popular for feature characterization of plants.
A review study highlighted a stereo vision system in which a custom image processing algorithm was used to calculate geometric features, such as leaf area and plant dimensions, of Boston lettuce grown in a plant factory, with promising results (Yeh et al., 2014). Stereo vision has been added to the other sensors and measurement techniques used in plant feature characterization, such as laser sensors, depth cameras, LiDAR, high-resolution radar, ultrasonic sensors, digital photographs, and high-resolution X-ray computed tomography (Rosell and Sanz, 2012; Malekabadi et al., 2019; Muller-Linow et al., 2015). Stereo vision has been used for disparity mapping and for analyzing tree and plant geometry, and it assists in measuring distance from a stereo vision camera attached to agricultural machinery. It has also been utilized for object detection, fruit recognition, and growth stage monitoring through accurate 3D reconstruction with depth mapping (Tavares and Vaz, 2009). It is preferred for plant feature characterization because it provides cost-effective and rapid 3D imaging, adapts to natural field conditions, and can overcome challenges such as homogeneous leaf texture and complex canopy architecture. Its suitability for outdoor imaging under varying illumination and its high-resolution depth and color information make stereo vision crucial for plant feature characterization. Given these advantages and the extensive applicability of stereo vision technology, this review aims to provide an overview of its use in characterizing features of upland crops and fruit trees.
Working principle of stereo vision sensor and calibration process
Stereo vision sensors are widely used in agricultural applications to provide depth perception and 3D information about objects and the environment. Examples include plant monitoring, such as plant height and canopy volume measurement using disparity and depth maps, autonomous navigation of agricultural vehicles such as tractors and combine harvesters under field conditions, and inter-plant spacing measurement (Jin and Tang, 2009; Vazquez-Arellano et al., 2016). Several stereo vision sensors commonly used for stereo vision technology are shown in Table 1 with their specifications.
Stereo vision uses two identical cameras to capture dual images of a target from different perspectives, and depth is inferred from the two images acquired from these viewpoints. Fig. 1 represents the working principle of the stereo vision technique. By mimicking human visual perception, stereo vision generates depth maps from images that reflect the triangular similarity of rays from multiple viewpoints. Quantifiable depth perception is obtained by recording objects with two cameras located on a common baseline, with a fixed distance between the two lens centers (Degadwala et al., 2020). The main concept of stereo vision is the observation of a scene from two slightly different perspectives, which makes it possible to relate pixel positions in the two images according to the principle of triangulation and thereby extract three-dimensional information. In Fig. 1, the image coordinate system is illustrated with deviations dl and dr for the left and right images of a stereo pair, respectively. The baseline (b) is the distance between the centers of the two lenses, and the focal length (f) is the distance from the image planes to the lens centers. The difference between the offsets is inversely proportional to the distance (Rd) from the object to the camera, and this relationship is utilized to compute the depth information of the object. Using the values of dl and dr, the disparity (Dp) is calculated according to equation (1).
Table 1.
Several stereo vision sensors that are usually used for stereo vision technology.
Rd and Dp are calculated using the equation (2).
Where, b is the baseline of stereo camera (mm), f is the focal length of the lenses, Dp is the disparity (pixels), W is the pixels size (mm/pixel), and Rd is the Z coordinate distance in a 3D frame (mm).
Similarly, the X and Y coordinates are calculated according to the equation (3) using the pixel values of xl and yl in right image which are the disparity values (Rovira-Mas et al. 2004).
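As a hedged illustration of the triangulation relations described above, the following Python sketch assumes the standard parallel-axis stereo model, in which the disparity is the difference between the horizontal positions of a point in the left and right images and the range is inversely proportional to the disparity. The function name, the sign convention, the use of left-image pixel coordinates measured from the principal point, and the sample values are assumptions made for illustration and are not taken from the cited studies.

```python
def triangulate(xl_px, yl_px, xr_px, baseline_mm, focal_mm, pixel_size_mm):
    """Return (X, Y, Z) in mm for a point seen at (xl_px, yl_px) in the left
    image and at column xr_px in the right image.

    Assumes rectified, parallel cameras and pixel coordinates measured from
    the principal point (assumptions of this sketch).
    """
    disparity_px = xl_px - xr_px  # Dp, in pixels
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Range along the optical axis: Rd = b * f / (W * Dp)
    z_mm = baseline_mm * focal_mm / (pixel_size_mm * disparity_px)
    # Lateral coordinates by similar triangles in the left camera
    x_mm = xl_px * pixel_size_mm * z_mm / focal_mm
    y_mm = yl_px * pixel_size_mm * z_mm / focal_mm
    return x_mm, y_mm, z_mm


# Hypothetical camera: 120 mm baseline, 4 mm lens, 6 micrometre pixels.
x, y, z = triangulate(xl_px=120, yl_px=-40, xr_px=100,
                      baseline_mm=120.0, focal_mm=4.0, pixel_size_mm=0.006)
```

Because the range depends on the reciprocal of the disparity, a one-pixel matching error has a much larger effect on distant points than on near ones, which is one reason calibration and matching quality matter for field measurements.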
Precise camera calibration is essential for accurate 3D information, correcting lens distortions and estimating intrinsic parameters such as focal length and principal point (Malekabadi et al., 2019). The calibration methods such as using chessboard patterns optimize the accuracy by minimizing discrepancies in observed features (Kumar et al., 2020; Zhang et al., 2023). Various calibration methods, including manual, Matlab toolbox, and OpenCV-based approaches, are adopted for stereo vision (Rovira-Mas et al., 2010; Bao et al., 2019; Zhang et al., 2022), with Zhang method being popular due to its simplicity and robustness (Sampling and Methods, 2023; Yu et al., 2019; Li et al., 2018; Zhong et al., 2021). Multi-camera calibration methods (Liu et al., 2022), such as using circular plates for feature extraction, enhance the precision (Cui et al., 2016). Calibration significantly impacts the stereo vision accuracy (Feng and Fang, 2021; Sampling and Methods, 2023; Yu et al., 2019), with errors leading to increased uncertainties in 3D reconstruction (Korthals et al., 2018).
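The idea of minimizing discrepancies in observed features can be made concrete with a reprojection error, which is the quantity that chessboard-based calibration typically tries to reduce. The Python sketch below is a minimal illustration under stated assumptions: an ideal pinhole camera with no lens distortion, 3D corner positions already expressed in the camera frame, and hypothetical intrinsic values and detections. It evaluates the error for given parameters and does not perform the optimization itself.

```python
import math


def project(point_xyz, fx, fy, cx, cy):
    """Project a 3D point given in the camera frame to pixel coordinates
    using an ideal pinhole model (no lens distortion)."""
    x, y, z = point_xyz
    return (fx * x / z + cx, fy * y / z + cy)


def rms_reprojection_error(points_3d, observed_px, fx, fy, cx, cy):
    """Root-mean-square distance (pixels) between detected corners and the
    corners predicted by the current camera parameters."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points_3d, observed_px):
        u, v = project(p, fx, fy, cx, cy)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(total / len(points_3d))


# Hypothetical chessboard corners (mm, camera frame) and detections (pixels).
corners = [(0.0, 0.0, 500.0), (30.0, 0.0, 500.0), (0.0, 30.0, 500.0)]
detected = [(320.0, 240.0), (368.0, 240.0), (320.0, 288.5)]
error_px = rms_reprojection_error(corners, detected,
                                  fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

A calibration routine would adjust the focal lengths, principal point, distortion terms, and board poses until this error is as small as possible across many views of the target.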
Data acquisition and processing techniques
Data acquisition and processing techniques are vital for the effectiveness of stereo vision and matching algorithms. With appropriate acquisition and processing, stereo vision systems can achieve robust and accurate depth estimation, which is essential for applications such as autonomous navigation, augmented reality, and object recognition. Ni et al. (2016) developed a procedure for data acquisition of plants and trees in which two stereo cameras were assembled in parallel with a baseline of 30 mm to reconstruct the full view of plants and trees. Multiple images were captured from different view angles, with the target plants and trees at the center and the stereo vision camera positioned around them. Moreover, images were suggested to be captured from adjacent locations so that they share an overlapping region. Bao et al. (2019) used Phenobot 1.0, an autonomous data acquisition system based on stereo vision, and collected over 100,000 stereo images of sorghum, a tall crop with dense canopies. The study faced challenges at the maturity stage because of plant height (0.5-3 m) and horizontally spreading leaves that blocked the mid-level and top-level camera views. The study also suggested attaching additional sets of stereo vision camera heads vertically with variable tilting angles as an alternative solution. The processing consists of image segmentation, 3D image reconstruction, depth mapping, and disparity mapping.
Image segmentation is crucial in various computer vision applications, such as 3D reconstruction, classification, object recognition, and motion detection (Mohammed and El-Sheimy, 2019). One study highlighted the effectiveness of stereo vision in 3D reconstruction, noise-free object segmentation, and generation of a segmented disparity map (Mohammed and El-Sheimy, 2019). Combining stereo vision with a clustering algorithm was found to provide improved segmentation results compared to methods relying solely on color or geometry (Dal et al., 2012; Mutto et al., 2011; Imaging et al., 2021; Sheng et al., 2020; Zhao et al., 2016). Various methods exist for image segmentation, including thresholding, edge-based, region-based, watershed, clustering-based, and neural-network-based methods (Imaging et al., 2021). The thresholding method analyzes the gray-level histogram of full or partial images to generate threshold values and segments objects by clustering pixels; it is widely accepted for its simplicity, robustness, and accuracy (Imaging et al., 2021; Li et al., 2011). The edge-based method identifies object boundaries by detecting image edges, offering low complexity but susceptibility to noise (Imaging et al., 2021). The region-based method employs the homogeneity of pixel features such as gray scale, color, or texture, aiming to partition the image into regions with distinct characteristics (Angelina et al., 2012; Khokher et al., 2013).
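The thresholding method described above can be illustrated with Otsu's method, a well-known histogram-based way of choosing a threshold. It is named here as one possible instance and is not necessarily the method used in the cited works. The sketch below works on a flat list of 8-bit gray values; the function name and the sample pixel values are hypothetical.

```python
def otsu_threshold(gray_values, levels=256):
    """Return the gray level t that maximizes the between-class variance.
    Pixels with value > t are treated as foreground."""
    hist = [0] * levels
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]               # pixels at or below t
        if w_bg == 0:
            continue
        w_fg = total - w_bg           # pixels above t
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical image: dark soil pixels and bright plant pixels.
pixels = [10, 12, 11, 13, 200, 205, 198, 202]
t = otsu_threshold(pixels)
mask = [v > t for v in pixels]   # True marks the brighter class
```

In practice the same idea is usually applied to a vegetation index or a single color channel rather than raw gray levels, since plant and background pixels separate more cleanly there.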
The watershed method treats an image as a topographic map and uses variations in flood water heights and watershed lines for segmentation (Chai et al., 2006; Kang et al., 2009). A high-efficiency hardware accelerator was developed for a self-organizing map (SOM) neural network to implement unsupervised color segmentation of stereo images in real time (Imaging et al., 2021; Torbati et al., 2014; Ortiz et al., 2014). The clustering method leverages intraclass and interclass homogeneity for optimal segmentation, often using K-means clustering (Imaging et al., 2021), fuzzy c-means (Li and Shen, 2008), and probabilistic extensions such as the Gaussian mixture model with the expectation–maximization algorithm. This approach to image segmentation has been widely adopted for its simplicity and accurate segmentation results (Fauvel et al., 2013). Stereo rectification is the process of eliminating lens distortions and standardizing image pairs by aligning the optical axes and ensuring row alignment of the image planes; plant image pairs are rectified in this way. Stereo rectification aligns the images along epipolar lines and compensates for lens distortion, including the fish-eye effect around the image boundary (Li et al., 2017).
For projective reconstruction of plant and tree canopies with stereo cameras, Ni et al. (2016b) used the visual structure-from-motion (VisualSFM) method. The actual 3D points for image pairs were estimated, and the projective reconstruction was transformed into a metric reconstruction through rigid transformation. A validation experiment using a hexagon box demonstrated the ability of the method to achieve true-size reconstruction. For 3D reconstruction of a dormant cherry tree, two Kinect devices were used in an indoor environment, where some branches were missed because of occlusion and the long distance between the camera and the tree (Wang and Zhang, 2013). To reconstruct 3D models of horticultural crops, Song and Eng (2008) used stereo vision with cameras installed above the crops to reconstruct the top view of the plants. Han and Burks (2013) studied 3D reconstruction of citrus canopy using stereo depth cameras, where the 8-point algorithm was used for stitching consecutive images into a mosaic; the results did not achieve real-size reconstruction. Stereo vision has also been used for 3D reconstruction of corn plants (Wang et al., 2009).
Cheng et al. (2016) demonstrated that depth mapping through stereo vision can be used to estimate the 3D structure of a scene from images captured by a stereo camera. The study also showed that pixel depth can be estimated by matching pixels and knowing the camera geometry, and it applied depth maps in robot navigation, driver assistance systems, and autonomous driving. Ansari et al. (2010) reported that the depth map provides distance information for tasks such as object detection and distance estimation, and that stereo vision-based depth sensing captures depth at longer ranges with a high frame rate and a larger field of view, making it suitable for both indoor and outdoor applications. The depth map enables 3D object detection and 3D reconstruction of objects in agricultural fields. Fig. 2 shows the procedure for calculating the depth values from the images for depth mapping according to equation (4):
Where, D is the depth value from the images, d is the disparity, and the expression x/xl and xl-xr are to determine the disparity between two images.
A disparity map is a visual representation of the positional shift between corresponding pixels of a pair of stereo images, from which depth is derived. In agriculture, it is used to monitor the growth of trees, plants, and crops, as well as for crop detection and height measurement in digital farming (Nugroho et al., 2020). The disparity map helps in assessing the geometric characteristics of tree canopies and is an important tool in precision agriculture. It has diverse applications in robotics, object detection, remote sensing, and autonomous driving, which are closely relevant to agriculture (Quan et al., 2023; Shean et al., 2016; Zhou et al., 2020). It enables the automatic measurement of crop height, which is a crucial factor in agricultural management and decision-making. The use of stereo vision systems and the computation of the disparity map contribute to the efficient monitoring and management of agricultural resources and can support various aspects of agricultural innovation and productivity improvement (Malekabadi et al., 2019). Several commonly used stereo vision sensors are listed in Table 1 with their technical specifications. A disparity map is a matrix with the same dimensions as the image, and the values within the matrix represent the distances between corresponding pixels in the left and right images of a stereo pair (Malekabadi et al., 2019). Amean (2017) described a binocular stereo model with identical cameras separated by a baseline distance and with coplanar image planes for disparity calculation.
Algorithms for plant feature characterization
Stereo matching algorithms and 3D reconstruction algorithms are used for plant feature characterization with stereo vision. Stereo matching algorithms are crucial for characterizing plant features because they allow 3D structures to be measured and reconstructed quickly and affordably. Corresponding pixels are identified in multiple views to compute disparities, which also aids applications such as autonomous driving and robotics (Quan et al., 2023; Shean et al., 2016; Zhou et al., 2020). Methods such as energy minimization and window-based comparison are utilized in stereo matching algorithms (Malekabadi et al., 2019; Quan et al., 2023; Yao and Xu, 2019; Zhou et al., 2020). Stereo matching algorithms are categorized into local, global, and semi-global methods (Zhou et al., 2020), with approaches including intensity-based and feature-based matching (Chen et al., 2023; Islam et al., 2023; Okura, 2022; Zhong et al., 2021). Stereo matching has been enhanced by deep learning (Zhou et al., 2020), with non-end-to-end and unsupervised learning algorithms showing promise but facing challenges such as high computational errors and low-quality results (Quan et al., 2023; Tankovich et al., 2021). Techniques such as stereo camera calibration and rectification are crucial for quality 3D reconstruction of field crops (Bao et al., 2019).
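A local, window-based matching method of the kind mentioned above can be sketched in a few lines. The example below is a simplified illustration, assuming rectified images so that the search runs along a single scanline, and using the sum of absolute differences (SAD) as the matching cost. The function name, the window size, and the synthetic intensity values are hypothetical choices for this sketch.

```python
def sad_disparity_row(left_row, right_row, half_window=1, max_disparity=4):
    """Disparity per pixel for one rectified scanline using the sum of
    absolute differences (SAD) over a 1-D window.

    Border pixels that cannot hold a full window keep disparity 0.
    """
    n = len(left_row)
    disparities = [0] * n
    for x in range(half_window, n - half_window):
        best_cost, best_d = None, 0
        for d in range(max_disparity + 1):
            if x - d - half_window < 0:
                break  # candidate window would leave the right image
            cost = sum(abs(left_row[x + k] - right_row[x - d + k])
                       for k in range(-half_window, half_window + 1))
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


# Synthetic scanline: the left view is the right view shifted by 2 pixels.
right = [5, 9, 40, 80, 30, 7, 3, 6, 2, 8]
left = [0, 0] + right[:-2]
disp = sad_disparity_row(left, right)
```

Global and semi-global methods differ from this sketch mainly by adding a smoothness term across neighboring pixels, which helps in the low-texture regions where a purely local cost is ambiguous.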
Various techniques were used in 3D reconstruction of plants in agricultural fields using stereo vision to address challenges such as capturing non-rigid plants in noisy environments. These techniques included sensor data fusion, real-time reconstruction using depth cameras such as Microsoft Kinect, algorithms based on multi-view image sequences, and integration of machine learning for crop analysis and morphological feature characterization (Chen et al., 2020; Sampaio et al., 2021). These approaches catered to the unique requirements of 3D reconstruction in agriculture, providing advantages such as detailed morphological feature characterization and improved monitoring of plant characteristics. Real-time 3D reconstruction using Microsoft Kinect cameras was found cost-effective for on-the-field applications (Harandi et al., 2023). Integration of stereo matching algorithms with 3D reconstruction techniques could enhance the accuracy and efficiency of plant feature characterization processes. Fig. 3 demonstrates the stereo matching flow diagram of 3D reconstruction techniques of plants.
Application of stereo vision for feature characterization in crops
Measurement of plant height and canopy volume
Plant height was mentioned as an important morphological factor for crop growth identification, yield prediction, and crop cultivation management by Kim et al. (2021). The height of cotton plants was estimated using a tractor-mounted setup employing a Kinect-v2 sensor by Jiang et al. (2016), demonstrating its efficacy in real field conditions. A 3D model for measuring cauliflower plants was developed utilizing a Kinect-v1 based algorithm by Andujar et al. (2016), showing only 2 cm variance from the actual height. Depth map of wheat plants was generated by Dandrifosse et al. (2020), where a segmentation mask technique was followed to represent the plant height as the distance between camera-wheat and camera-ground distances (Fig. 4). Stereo vision achieved 97% precision in determining mean spike top heights when compared to manual measurements, with RMSE of 0.016 m. Plant height estimation of five crops including cabbage, potato, sesame, radish, and soybean was conducted using stereo vision by Kim et al. (2021), with the plant height estimated having an R2 ranging from 0.78 to 0.84 and less than 5% error for five different crops. Fig. 5 showed the steps of stereo image processing techniques for measuring the crop plant height. Three types of plants (croton, Jalapeno pepper, lemon tree) of varying leaf sizes were reconstructed using a metric reconstruction method for canopy volume estimation where reconstruction accuracy was verified with hexagon box of known volume and wrapped with printed citrus leaf images using a stereo vision. Plant canopy volumes were calculated by bounding boxes and divided into voxels. For canopy volume estimation, unused voxels were removed, and the volume of remaining voxels were summed in the study (Ni et al., 2016).
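The two measurements described in this subsection can be sketched as follows. The sketch is loosely modeled on the descriptions above and is not the implementation of the cited studies: plant height is taken as the camera-to-ground distance minus the smallest camera-to-plant distance inside a segmentation mask, assuming a downward-looking camera, and canopy volume is approximated by counting occupied voxels. Function names, the use of the minimum depth, and all sample values are assumptions.

```python
import math


def plant_height(depth_map_m, plant_mask, camera_to_ground_m):
    """Height (m) = camera-to-ground distance minus the smallest
    camera-to-plant distance among masked pixels."""
    plant_depths = [d
                    for depth_row, mask_row in zip(depth_map_m, plant_mask)
                    for d, m in zip(depth_row, mask_row) if m]
    return camera_to_ground_m - min(plant_depths)


def voxel_volume(points_m, voxel_size_m):
    """Approximate canopy volume (m^3) as the number of occupied voxels
    times the volume of one voxel; empty voxels are never counted."""
    occupied = {tuple(math.floor(c / voxel_size_m) for c in p)
                for p in points_m}
    return len(occupied) * voxel_size_m ** 3


# Hypothetical 3x3 depth map (m) with a plant mask (1 = plant pixel).
depth = [[1.50, 1.50, 1.50],
         [1.50, 0.90, 0.95],
         [1.50, 0.92, 1.50]]
mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 0]]
height_m = plant_height(depth, mask, camera_to_ground_m=1.50)

# Hypothetical canopy points (m) falling into two 0.5 m voxels.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (0.6, 0.1, 0.1), (0.7, 0.2, 0.3)]
volume_m3 = voxel_volume(cloud, voxel_size_m=0.5)
```

The voxel size controls the trade-off in the volume estimate: large voxels overestimate sparse canopies, while very small voxels undercount when the point cloud is not dense enough to fill them.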
Plant spacing and row distance measurement
Kim et al. (2021) utilized stereo images to generate disparity maps, calculating pixel disparities to determine crop depth for inter-row distance and plant spacing measurements. Qiu et al. (2018) highlighted the laborious nature of conventional inter-plant space measurements, which necessitates automatic measurement methods. Stereo vision was suggested for plant spacing and row distance measurements because individual plants are difficult to separate using colour cameras. Mooney and Johnson (2014) demonstrated a stereo vision-based corn plant sensing technique with promising performance in individual corn plant detection and centre location measurements. Under natural light conditions, 96.7% of plants were correctly detected, with maximum distance errors of 5 cm and 1 cm for 74.6% and 62.3% of detections, respectively. Therefore, stereo vision can successfully measure plant spacing and row distance.
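Once plant centres have been located along a row, the spacing itself is a simple difference of consecutive positions. The short sketch below assumes that the centre positions are already available as distances along the row in centimetres; the function name and the sample positions are hypothetical.

```python
def plant_spacings(centres_cm):
    """Distances (cm) between consecutive plant centres along a row.
    Input order does not matter because positions are sorted first."""
    ordered = sorted(centres_cm)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical plant centre positions along one row (cm).
spacings = plant_spacings([61.0, 0.0, 20.5, 40.0])
mean_spacing = sum(spacings) / len(spacings)
```

The same differencing applied to row centre lines across the field would give the row distance.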
Application of stereo vision for feature characterization in fruit trees
Tree height and canopy estimation
Characterizing the morphological features of fruit trees is an essential but labor-intensive task in horticulture. Tree height and canopy size are often measured manually, but manual measurements might not be accurate and reliable, particularly for canopy volume. Malekabadi et al. (2019) captured stereo images of plants and generated a disparity map of the canopy shape; the algorithm achieved height calculations with errors of less than 7% for both elliptical (6-7%) and conical trees (2-3%). Dong and Isler (2018) used stereo vision techniques to estimate morphological parameters of apple trees, such as tree height and canopy volume, using an alpha-shape algorithm with bounding boxes. The height and volume of the bounding box represented the tree height and volume, respectively. The result showed 4 cm and 3.8 cm trunk diameter of error in height and volume measurement, respectively. Costa et al. (2019) used stereo vision to measure hazelnut tree distance and reported reasonable accuracy, with errors of less than 5% at distances below 20 m.
Comparison of stereo vision and different proximal sensors
In agriculture, various proximal sensors and techniques, including LiDAR, time-of-flight cameras, structure-from-motion, and ultrasonic sensors, are used alongside mono and multi-view stereo vision for plant and tree characterization (Hui et al., 2018; Jay et al., 2014; Kazmi et al., 2012; Li et al., 2014; Perez-Sanz et al., 2017; Scharr et al., 2017). LiDAR sensors create precise canopy models but are expensive and often combined with RGB cameras for colour accuracy. Time-of-flight cameras offer quick depth computation but struggle in strong sunlight. Binocular stereo vision is cost-effective for outdoor conditions but faces challenges with stereo matching errors. Multi-view stereo systems enhance depth map quality, while structure-from-motion reconstructs scenes using a single moving camera. In field conditions, stereo vision is a simple and robust technique for studying canopy architecture, although in-field applications for crop detection and leaf characterization remain limited compared to laboratory settings (Leemans et al., 2013; Muller-Linow et al., 2015; Tilneac et al., 2012). Recent comparisons of 3D sensors for plant feature characterization indicate that stereo vision, while sensitive to sunlight and not ideal for outdoor use, can still provide in-depth information without special shading. Sunlight issues are also faced by time-of-flight cameras, with no current solution, while multi-view stereo and structure-from-motion methods, like binocular stereo, can be enhanced with additional cameras or more shots per scene (Li et al., 2014; Perez-Sanz et al., 2017; Qiu et al., 2018; Vazquez-Arellano et al., 2016; Wang et al., 2018; Yuan et al., 2018). Ultrasonic sensors are cost-effective for measuring plant height but lack the accuracy needed for creating 3D models, unlike LiDAR, which offers better accuracy.
Stereo vision is cheaper, smaller, and more flexible than LiDAR, and it outperforms LiDAR in providing color and metadata without multiple sensors while offering better resolution (Jimenez-Berni et al., 2018; Li et al., 2017). Table 2 shows several key considerations for the LiDAR and stereo vision sensors used for plant characterization.
Table 2.
Comparison of stereo vision with LiDAR.
Factor affecting stereo vision and stereo matching algorithm
Stereo vision and stereo matching are major technologies in computer vision and play an important role in reconstructing the three-dimensional (3D) structures of the real world from two-dimensional (2D) images. Their applications span diverse fields, including autonomous driving, augmented reality, robot navigation, and agricultural field operations. Despite this widespread utility, using stereo matching or disparity estimation to match pixels across differently exposed stereo or multiview images presents considerable challenges. Traditional stereo vision and matching algorithms have various limitations that impede their effectiveness in complex scenarios. Direct sunlight affects stereo matching, causing issues such as overexposure that degrade image quality and algorithm performance (Li et al., 2014). A study evaluating a stereo vision system for cotton row detection and boll location estimation in direct sunlight underscores the importance of considering light conditions in stereo matching algorithm development (Fue et al., 2020). When direct sunlight and leaf reflection violate the assumed constraints, pixel intensities vary across the stereoscopic images, which makes stereo matching difficult, especially in sunlit zones, and reduces the clarity of leaf texture (Muller-Linow et al., 2015). Sunlight also affects stereoscopic system design, including the choice of baseline and camera height (Li et al., 2017). Field crop stereo matching is influenced by factors such as lighting conditions, homogeneous colors, dense canopies causing occlusion, intense sunlight leading to specular reflection, and the difficulty of preserving thin structures such as stems (Bao et al., 2019). Further challenges in stereo matching include color inconsistencies, varying illumination, sensor differences, and specular reflections (Dattagupta, 2012).
Solutions to address sunlight sensitivity include enhancing stereo matching algorithms with the census transform and using a shadowing device to minimize sunlight impact (Dandrifosse et al., 2020).
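The census transform helps with sunlight sensitivity because it encodes only the brightness ordering of a pixel and its neighbours, not their absolute intensities. The Python sketch below is a minimal illustration for a single 3x3 neighbourhood, with the Hamming distance between two census codes used as the matching cost. The function names, the bit ordering, and the simulated gain and offset are assumptions of this sketch.

```python
def census_3x3(image, row, col):
    """Census code of the 3x3 neighbourhood around (row, col): one bit per
    neighbour, set to 1 when the neighbour is darker than the centre."""
    centre = image[row][col]
    bits = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # the centre pixel is not compared with itself
            darker = image[row + dr][col + dc] < centre
            bits = (bits << 1) | (1 if darker else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
# Same patch under a simulated illumination change (gain 2, offset 15).
brighter = [[2 * v + 15 for v in r] for r in patch]
cost = hamming(census_3x3(patch, 1, 1), census_3x3(brighter, 1, 1))
```

Any brightness change that preserves the ordering of intensities leaves the census code unchanged, whereas an intensity-difference cost such as SAD would report a large mismatch for the same pair of patches.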
Handling occlusions and reflections is a common challenge in stereo vision, leading to mismatches and ambiguities in stereo matching. Minimizing occlusions and reflections through suitable camera configurations and lighting conditions is essential. Robust stereo matching algorithms capable of handling outliers and errors are also necessary. Proper camera calibration is crucial in stereo vision to determine camera parameters accurately, ensuring precise depth estimates and clear images. Using high-quality calibration targets like checkerboards or dot grids and following a rigorous procedure covering various angles and distances minimizes calibration errors. Regular recalibration, especially after exposure to environmental factors, maintains accuracy. Dealing with texture less and repetitive regions poses another challenge in stereo vision, making stereo matching difficult due to a lack of distinctive cues. Adding artificial texture or markers to such regions and utilizing stereo matching algorithms incorporating global or semi-global constraints help overcome this challenge, optimizing matching accuracy over a large area.
Challenges of stereovision imaging and solutions
Several challenges are encountered in the characterization of plants using stereo vision, including high matching time, cost, incorrect matching outputs, stereo image collection in unstructured orchard environments, registration of reconstructions of the two sides of fruit tree rows, computation of the disparity map, and the effects of canopy shapes (Zhang et al., 2023). Occlusion, especially in dense plant canopies, affects the accuracy of plant feature characterization and measurement (Lowe, 2004; Mirbod et al., 2023; Tan et al., 2007; Wang and Zhang, 2013; Zhang et al., 2023). Challenges in accurately estimating plant parameters and canopy structure include the computation of trees from the disparity map, the effects of canopy shapes on stereo vision, and the registration of reconstructions of the two sides of fruit tree rows (Ni et al., 2016; Zhang et al., 2023). The complexity of data related to various stages of plant growth, ambient environmental conditions, and leaf overlapping were also identified as challenges in plant feature characterization and measurement (Amean, 2017). To address these challenges, several techniques have been suggested to improve accuracy, such as improving texture and lighting, avoiding strong sunlight, using local algorithms where global algorithms do not perform well, reconstructing plant canopies with camera matrices, and combining stereo vision with other 3D methods (Li et al., 2014; Ni and Burks, 2013; Wang et al., 2020; Zhang et al., 2023). Occlusion in unstructured orchards, caused by leaves, branches, and fruits, hampers proper depth measurement. To solve this issue, the semi-global matching method was optimized to obtain high accuracy with sparse disparity values. An improved bilateral filtering technique was suggested to fill the holes and discontinuities generated by occlusion.
Furthermore, a pyramid fusion model was recommended to combine numerous low-resolution bilateral filtering results, improving accuracy and efficiency and producing dense disparity maps with errors reduced to 3.2 mm, an average relative error of 1.79%, and a time saving of more than 90%. When lighting is insufficient, some portions of the depth image are not visible, which makes it challenging to extract detailed information from the images. To address this challenge, several image enhancement algorithms, such as gamma correction, histogram equalization, and Contrast Limited Adaptive Histogram Equalization (CLAHE), have been suggested to improve image quality. Gamma correction makes the image brighter without changing the disparity. Histogram equalization can reveal the parts of the depth image that are missing because of poor lighting, but it can also amplify noise. CLAHE has been shown to enhance depth images by revealing the missing parts while introducing less noise (Xu et al., 2016).
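The first two enhancement steps can be sketched on a small 8-bit grayscale image as follows. This is a minimal illustration using standard textbook formulas, not the procedure of the cited study; the function names, the gamma value, and the sample image are assumptions. CLAHE is not implemented here: it applies the same equalization per tile with a clipped histogram, and libraries such as OpenCV provide it (`cv2.createCLAHE`).

```python
def gamma_correct(image, gamma=0.5):
    """Brighten (gamma < 1) or darken (gamma > 1) an 8-bit grayscale image."""
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[v] for v in row] for row in image]


def equalize_histogram(image):
    """Global histogram equalization of an 8-bit grayscale image."""
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in image]  # constant image, nothing to stretch
    # Map each intensity so the output spans the full 0..255 range.
    lut = [max(0, round(255 * (c - cdf_min) / (total - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in image]


# Made-up under-exposed image: all intensities lie between 10 and 50.
dark = [
    [10, 20, 30],
    [20, 30, 40],
    [30, 40, 50],
]
brightened = gamma_correct(dark, gamma=0.5)
equalized = equalize_histogram(dark)
```

Gamma correction lifts every pixel while preserving their order, whereas equalization stretches the occupied intensity range to the full 0 to 255 scale, which is also why it tends to amplify noise in dark regions.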
Accounting for the dynamic environment is critical when acquiring stereo vision images in the open field. To ensure robust and accurate depth perception in such settings, several factors should be considered: lighting conditions; computational demands; algorithmic approaches, such as efficient algorithms and post-processing; hardware configurations, such as the camera setup and the choice between generic multi-core CPUs and optimized hardware; and environmental adaptability to varying weather conditions and times of day. By addressing these concerns, real-time stereo vision systems can achieve good performance and reliability over wide areas, making them appropriate for applications such as autonomous driving of agricultural machinery, robotic navigation, and outdoor depth sensing in agricultural operating environments.
Conclusions
Stereo vision has emerged as a promising tool for characterizing the features of upland crop plants and fruit trees, as evidenced by various studies. It has been successfully used to estimate plant height, canopy volume, row distance, and plant spacing. These applications demonstrate the capability of stereo vision to offer detailed and accurate insights into plant features, providing high-precision measurements and an understanding of plant 3D geometries. This review emphasized data acquisition, camera calibration, stereo mapping, the different algorithms used for stereo matching, ROI segmentation using bounding boxes, and 3D reconstruction of plants. Several challenges faced in plant feature characterization were identified, and techniques for overcoming them were suggested. To improve the accuracy of stereo vision image processing, the suggested techniques include improving texture and lighting, avoiding strong sunlight, using local algorithms where global algorithms do not perform well, reconstructing plant canopies with camera matrices, and combining stereo vision with other 3D methods. An improved bilateral filtering technique was suggested to fill the holes and discontinuities generated by occlusion. Image enhancement algorithms such as gamma correction, histogram equalization, and CLAHE were recommended for stereo vision image processing to improve accuracy. Wider adoption of stereo vision techniques in precision agriculture holds considerable promise for advancing agricultural practices and enabling efficient and precise measurement of plants and trees.













