Genetic Research with Computer Vision: A Case Study in Studying Seed Dormancy


<h2>Assisting genetic research with computer vision</h2> <span data-preserver-spaces="true">To improve the detection of genetic signatures in seeds correlated with their dormancy, we trained computer vision models. These models picked up more than the single morphological parameter previously known to correlate with dormancy. Genetic research with computer vision is opening up the field to new discoveries and potential for growth. </span> <blockquote><span data-preserver-spaces="true">New to computer vision? </span><a class="editor-rtfLink" href="https://wordpress.appsilon.com/image-classification-tutorial/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Read our guide on getting started with fastai, ResNet, MobileNet, and more</span></a><span data-preserver-spaces="true">.</span></blockquote> <span data-preserver-spaces="true">Navigate to a section:</span> <ul><li><a href="#introduction">Introduction</a></li><li><a href="#data">Data</a></li><li><a href="#benchmark">Benchmark</a></li><li><a href="#architecture">Architecture</a></li><li><a href="#results">Results</a></li><li><a href="#conclusion">Conclusion</a></li></ul> <h2 id="introduction"><span data-preserver-spaces="true">Introduction</span></h2> <span data-preserver-spaces="true">Seed dormancy is an actively studied biological phenomenon that matters in many areas of the economy (e.g., the early stages of the food supply chain) and ecology (e.g., studying the effects of the climate crisis on plants). 
</span> <span data-preserver-spaces="true">An active contributor to understanding the genetic mechanisms controlling seed dormancy is the </span><a class="editor-rtfLink" href="https://swiezewskilab.pl/" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Świeżewski </span></strong>lab</a><span data-preserver-spaces="true">, hosted at the </span><a class="editor-rtfLink" href="https://www.ibb.waw.pl/en" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Institute of Biochemistry and Biophysics of the Polish Academy of Sciences</span></a><span data-preserver-spaces="true">. </span> <span data-preserver-spaces="true">The researchers at the lab investigate the entire cycle of seed generation, dormancy, and germination, which broadens the understanding of critical triggers in the process and opens the possibility of finding scalable ways to monitor it.</span> Looking at seeds of thale cress (<em>Arabidopsis thaliana</em>), the researchers noticed that a morphological parameter differentiates, to some extent, the seeds that have germinated from those that are still dormant after a key period of two weeks from the moment of observation. <img class="wp-image-6333 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0211fce6734eff8381a2c_test-1.webp" alt="Image 1 - Distributions of one of the morphological parameters of germinated and dormant seeds are shown, indicating that the parameter is more likely to be large for germinated seeds." width="1125" height="729" /> Image 1 - Distributions of one of the morphological parameters of germinated and dormant seeds are shown, indicating that the parameter is more likely to be large for germinated seeds. <span data-preserver-spaces="true">The morphological parameter of seeds was estimated from pairs of images obtained from a machine that handles their planting. 
The researchers contacted us for assistance in analyzing the data with </span><strong><span data-preserver-spaces="true">computer vision</span></strong><span data-preserver-spaces="true">. They hypothesized that such models might find more visually distinguishable features, making it possible to tell apart seeds that will be dormant or germinated two weeks after the photos were taken.</span> <h2 id="data"><span data-preserver-spaces="true">Genetic research data for computer vision models</span></h2> <span data-preserver-spaces="true">The data given to us consisted of several thousand pairs of images of seeds, each seed being roughly 0.2 mm in diameter. The photos came from a machine used to automate the placement of seeds on the trays where they are monitored. </span> <img class="wp-image-6328 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b021217a16ac39e34c1210_2_1_75.webp" alt="Image 2 - Pair of photos of a single seed (suspended on a pneumatic needle)" width="1196" height="599" /> Image 2 - Pair of photos of a single seed (suspended on a pneumatic needle) <span data-preserver-spaces="true">The images were taken in a highly controlled environment, and only a tiny fraction of each image contains the seed. Hence, we needed to identify the part of the image containing the seed to help the planned models focus on the seed's features. </span> <span data-preserver-spaces="true">Additionally, cropping the photos to the seed reduces their size, which speeds up model training. However, the needle was not always in the same location in the image, so we needed to perform </span><strong><span data-preserver-spaces="true">adaptive cropping</span></strong><span data-preserver-spaces="true">. 
We summed the color intensities along each row and each column of pixels in the image, then analyzed the resulting signals and automatically identified the peaks in the red channel.</span> <img class="size-full wp-image-6314" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b395865b5a6f8240aaa2bc_3-5.webp" alt="Image 3 - Signals with pixels' intensities summed per row for the above photos (analogous plots were analyzed for the columns). The peaks are marked with a cross. The brightening of the background at the bottom of the images translates to a steadily growing intensity in color in the above plots." width="1570" height="516" /> Image 3 - Signals with pixels' intensities summed per row for the above photos (analogous plots were analyzed for the columns). The peaks are marked with a cross. The brightening of the background at the bottom of the images translates to a steadily growing intensity in color in the above plots. <span data-preserver-spaces="true">With a fixed square window centered on the locations of the peaks, we were able to crop the images to single out the seeds.</span> <img class="wp-image-6329 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b02123ebdf0e3658806b80_4_75.webp" alt="Image 4 - Photos of a seed with the seeds' location identified using the cropping based on the location of the peak of color intensity. Uncropped images used for genetic research with computer vision. " width="1194" height="642" /> Image 4 - Photos of a seed with the seed's location identified by cropping around the peak of color intensity. <img class="size-full wp-image-6315" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b39587df484ee4e661a739_5-5.webp" alt="Image 5 - Cropped photos of a seed with actual resolution. Such images were later fed into the neural networks." 
width="656" height="330" /> Image 5 - Cropped photos of a seed with actual resolution. Such images were later fed into the neural networks. <img class="wp-image-6316 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b395e5f32c7f090d664242_6-5.webp" alt="Image 6 - More examples of cropped images (a single image per pair is shown). Images used for genetic research with computer vision. It can be seen that it is next to impossible for a human eye to tell the difference between the classes." width="1306" height="1338" /> Image 6 - More examples of cropped images (a single image per pair is shown). It is next to impossible for the human eye to tell the difference between the classes. <span data-preserver-spaces="true">The resolution is not very high. Even though the original images consist of almost 400,000 pixels each, the seeds typically occupy around 500 of them. This poses a challenge for any attempt to recognize the visual features of the seeds.</span> <span data-preserver-spaces="true">In addition to cropping the images, we had to clean the data with respect to the labels we received. The seeds were labeled with one of </span><strong><span data-preserver-spaces="true">four categories</span></strong><span data-preserver-spaces="true">: germinated, dormant, lost, and dead. The first two were the classes of interest for our purposes. We dropped the other two classes from the current analysis, as they resulted from improper seed handling.</span> <span data-preserver-spaces="true">Interestingly, the data came from four different trays (each containing seeds from a distinct plant specimen), and each tray had a different distribution of the four initial classes. 
Many interesting observations can be made by visualizing the location of the seeds on the round trays.</span> <img class="size-full wp-image-6335" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0212546af19f939a92d3a_trays-1.webp" alt="Image 7 - The class distributions varied significantly between the trays" width="2160" height="1942" /> Image 7 - The class distributions varied significantly between the trays <span data-preserver-spaces="true">We selected 10% of the data (stratified by the label) as a validation set.</span> <span data-preserver-spaces="true">All results were evaluated on that set, and all models were trained only on the remaining data.</span> <h2 id="benchmark"><span data-preserver-spaces="true">Benchmark</span></h2> <span data-preserver-spaces="true">We knew that the morphological parameter of seeds correlates with their dormancy in the above sense, so we first trained a simple extreme gradient boosting tree model to predict the label (whether the seed germinated or stayed dormant). The predictions are based only on the value of that morphological parameter, measured separately in each of the two photos of a given seed. </span> <span data-preserver-spaces="true">The model reached </span><strong><span data-preserver-spaces="true">62% validation accuracy</span></strong><span data-preserver-spaces="true">, which is not very impressive for a binary classifier but serves as a benchmark for further modeling. In particular, this benchmark quantifies the extent to which seed dormancy can be explained by the mentioned morphological parameter alone.</span> <h2 id="architecture"><span data-preserver-spaces="true">Architecture</span></h2> <span data-preserver-spaces="true">To accommodate the data being pairs of RGB photos, we adapted known neural network architectures to accept six input channels, mixing the information from the two photos only in the later dense layers. 
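</span>
<span data-preserver-spaces="true">The post does not say how the six input channels were reconciled with mixing the photos only late, so what follows is just a minimal sketch of one possible reading: stack the two RGB photos into a single six-channel input and apply a first layer whose weights are split into two groups, so that each output feature depends on one photo only. It uses plain Python lists to stay dependency-free. The names <code>stack_pair</code> and <code>grouped_pointwise</code>, the toy pixel values, and the weights are hypothetical and not taken from the project.</span>

```python
from typing import List, Sequence

Image = List[List[List[float]]]  # height x width x channels


def stack_pair(photo_a: Image, photo_b: Image) -> Image:
    """Concatenate two RGB photos pixel by pixel into one six-channel image."""
    if len(photo_a) != len(photo_b):
        raise ValueError("photos must have the same height")
    stacked = []
    for row_a, row_b in zip(photo_a, photo_b):
        if len(row_a) != len(row_b):
            raise ValueError("photos must have the same width")
        stacked.append([list(pa) + list(pb) for pa, pb in zip(row_a, row_b)])
    return stacked


def grouped_pointwise(pixel: Sequence[float],
                      weights_a: Sequence[Sequence[float]],
                      weights_b: Sequence[Sequence[float]]) -> List[float]:
    """Apply a 1x1 layer with two groups to a single six-channel pixel.

    Channels 0-2 (photo A) are combined only with weights_a and channels
    3-5 (photo B) only with weights_b, so no output mixes the two photos.
    """
    if len(pixel) != 6:
        raise ValueError("expected six channels")
    feats_a = [sum(w * x for w, x in zip(row, pixel[:3])) for row in weights_a]
    feats_b = [sum(w * x for w, x in zip(row, pixel[3:])) for row in weights_b]
    return feats_a + feats_b


# Toy 1x2 photos with made-up values, for illustration only.
photo_a = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
photo_b = [[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]]
pair = stack_pair(photo_a, photo_b)

# Hypothetical weights: one output feature per photo.
w_a = [[1.0, 1.0, 1.0]]
w_b = [[0.5, 0.5, 0.5]]
features = grouped_pointwise(pair[0][0], w_a, w_b)
```

<span data-preserver-spaces="true">In a deep learning framework, this idea would typically correspond to a grouped convolution in the first layers, with the features of the two photos meeting only in the dense layers that follow.</span>
<span data-preserver-spaces="true">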
</span> <span data-preserver-spaces="true">We started by training residual networks from the ResNet family, as they are the most robust in our experience, offering promising results at a reasonable training pace. We eventually settled on an adapted version of EfficientNet-B3, which gave us a </span><strong><span data-preserver-spaces="true">validation accuracy of 70%</span></strong><span data-preserver-spaces="true">.</span> <h2 id="results"><span data-preserver-spaces="true">Results</span></h2> Using computer vision analyses of the pairs of images, we were able to predict seed dormancy with higher accuracy (70% versus 62%) than a model based solely on the single morphological parameter known to correlate with the dormancy of the seeds. <span data-preserver-spaces="true">It's an exciting finding, as it supports the hypothesis that the genetic mechanisms controlling dormancy are also reflected in more visually distinguishable features.</span> <span data-preserver-spaces="true">To check the above result, we analyzed how the model's accuracy varies with the size of the seeds. While there is a difference (seeds with the morphological feature more pronounced are classified more accurately), it is not significant. </span> <span data-preserver-spaces="true">Moreover, the tray in which the seeds were grown was a stronger indicator of how predictable their dormancy was. Seeds from tray 2 were the hardest to predict, and those from trays 1 and 4 were the easiest (with accuracy reaching 75%). 
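</span>
<span data-preserver-spaces="true">A per-tray breakdown like the one described above can be computed by grouping validation predictions by tray. The sketch below is a minimal, dependency-free illustration. The function name <code>accuracy_by_group</code> and the toy labels are made up for the example; they are not the project's code, data, or results.</span>

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(groups: Sequence[Hashable],
                      y_true: Sequence[str],
                      y_pred: Sequence[str]) -> Dict[Hashable, float]:
    """Return the fraction of correct predictions within each group."""
    if not (len(groups) == len(y_true) == len(y_pred)):
        raise ValueError("all inputs must have the same length")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for group, truth, prediction in zip(groups, y_true, y_pred):
        total[group] += 1
        correct[group] += int(truth == prediction)
    return {group: correct[group] / total[group] for group in total}


# Made-up validation records: tray id, true label, predicted label.
trays = [1, 1, 2, 2, 2, 4]
y_true = ["germinated", "dormant", "germinated", "dormant", "dormant", "germinated"]
y_pred = ["germinated", "dormant", "dormant", "dormant", "germinated", "germinated"]

per_tray = accuracy_by_group(trays, y_true, y_pred)
```

<span data-preserver-spaces="true">The same function works for other groupings, for example bins of the morphological parameter instead of tray ids.</span>
<span data-preserver-spaces="true">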
</span> <span data-preserver-spaces="true">Interestingly, for trays 1 and 4, the morphological parameter distributions of the two classes are well aligned, and the balance between classes is complementary.</span> <img class="wp-image-6334 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b021267e876a3f78c5bbc1_sizes_per_tray-1.webp" alt="Image 8 - Distributions of the morphological parameter for the labels separately and individual trays. Trays 2 and 3 exhibit the shift of germinated seeds towards larger values, while trays 1 and 4 only to a modest extent - the two distributions for those trays are better aligned." width="2400" height="2400" /> Image 8 - Distributions of the morphological parameter, shown separately for each label and each tray. Trays 2 and 3 exhibit the shift of germinated seeds towards larger values, while trays 1 and 4 do so only to a modest extent - the two distributions for those trays are better aligned. <h2 id="conclusion"><span data-preserver-spaces="true">Conclusion: genetic research with computer vision - a gateway to growth</span></h2> <span data-preserver-spaces="true">In this project, we prepared and modeled a very interesting dataset consisting of pairs of tiny seed photos. 
Our key contribution was the confirmation of the researchers' hypothesis - dormancy is controlled by genetic mechanisms that have visual signatures at a very early stage in the development of the seeds.</span> <span data-preserver-spaces="true">Learn more about computer vision:</span> <ul><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/object-detection-yolo-algorithm/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">YOLO Algorithm and YOLO Object Detection: An Introduction</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/pp-yolo-object-detection/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">PP-YOLO Object Detection Algorithm: Why It's Faster than YOLOv4</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/convolutional-neural-networks/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Convolutional Neural Networks: An Introduction</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/transfer-learning-introduction/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Introduction to Transfer Learning: Effective Machine Learning Without Custom Architecture</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/fast-ai-in-r/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Fast.ai in R: How to Make a Computer Vision Model within an R Environment</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/satellite-image-analysis-with-fast-ai-for-disaster-recovery/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Satellite Image Analysis with fast.ai for Disaster Recovery</span></a></li></ul> <a href="https://appsilon.com/careers/" target="_blank" rel="noopener noreferrer"><img class="aligncenter wp-image-5940 size-large" 
src="https://wordpress.appsilon.com/wp-content/uploads/2020/11/appsilon.hiring0-1024x576.jpg" alt="Job application call to action " width="1024" height="576" /></a> <p style="text-align: center;"><strong><span data-preserver-spaces="true">Appsilon is hiring for remote roles! See our </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Careers</span></strong></a><strong><span data-preserver-spaces="true"> page for all open positions, including </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#r-shiny-developer" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">R Shiny Developers</span></strong></a><strong><span data-preserver-spaces="true">, </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#fullstack-software-engineer-tech-lead" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Fullstack Engineers</span></strong></a><strong><span data-preserver-spaces="true">, </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#frontend-engineer" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Frontend Engineers</span></strong></a><strong><span data-preserver-spaces="true">, a </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#senior-infrastructure-engineer" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Senior Infrastructure Engineer</span></strong></a><strong><span data-preserver-spaces="true">, and a </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#community-manager" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Community Manager</span></strong></a><strong><span data-preserver-spaces="true">. 
Join Appsilon and work on groundbreaking projects with the world's most influential Fortune 500 companies.</span></strong><a href="https://appsilon.com/careers/" target="_blank" rel="noopener noreferrer"> </a></p>
