Monitoring Marine Ecosystems with Machine Learning and Neural Networks

Marine ecosystems are changing rapidly in response to climate change. These changes will alter ecological processes and <a href="https://appsilon.com/monitoring-ecosystems-with-computer-vision/" target="_blank" rel="noopener noreferrer">species interactions</a>. Monitoring and quickly responding to these changes depends, in part, on the speed at which researchers can process large datasets, which is often a slow, labor-intensive task. By addressing this bottleneck, machine learning models offer a way to improve how we monitor marine and other ecosystems. Ultimately, we can use ML to shorten the reaction time to an evolving crisis. <ul><li><a href="#howml">How machine learning is improving ecology research</a></li><li><a href="#whyzooplankton">Why the state of Arctic zooplankton health is worrying</a></li><li><a href="#difficulties">What makes it so hard to monitor marine ecosystems</a></li><li><a href="#data-and-model">The data and the model</a></li><li><a href="#model-quality">Assessing model quality</a></li><li><a href="#summary">Summary</a></li></ul> <hr /> <h2 id="howml">How machine learning is improving ecology research</h2> By combining our experience and the domain expertise of Prof. Frederic Maps from <a href="https://www.ulaval.ca/en" target="_blank" rel="noopener noreferrer">Université Laval</a> and Dr. Sakina-Dorothée Ayata from <a href="https://www.sorbonne-universite.fr/" target="_blank" rel="noopener noreferrer">Sorbonne University</a>, we developed a machine learning model that is able to identify <a href="https://en.wikipedia.org/wiki/Copepod" target="_blank" rel="noopener noreferrer">copepod</a> lipid sacs in LOKI images [<a href="http://dx.doi.org/10.1109/OCEANSE.2009.5278252" target="_blank" rel="noopener noreferrer">1</a>]. The mass of lipids inside a copepod is a good measure of the health of Arctic marine ecosystems since copepods are a fundamental link in most food chains. 
Our solution decreases the time required to process several thousand images from weeks to minutes. This paves the way for a breakthrough in the study and monitoring of Arctic ecosystems. Moreover, since the machine learning algorithms we have trained are easily adjustable to other marine ecosystems, the impact can extend beyond the Arctic. <blockquote>"Manual annotation of the lipid sac in each individual image takes a lot of time, but could it be used to train an algorithm to automatically identify these structures in thousands of additional images? If we could automatically measure the size of the lipid sacs in the images we have and that have been collected in the Arctic, it could provide us new knowledge on the ecological status of the arctic marine ecosystem, and it is affected by environmental changes" - Dr. Sakina-Dorothée Ayata</blockquote> <img class="size-full wp-image-11569" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01e68cbd83bfce97b03ec_false-positive-false-negative.webp" alt="Example copepod LOKI Image with lipid sac annotated" width="500" height="400" /> Example copepod LOKI image with the lipid sac annotation and model predictions (yellow pixels are the annotation of the lipid sac, blue pixels are erroneously predicted to be a lipid sac by the model, green pixels are where the annotations and predictions match). <h2 id="whyzooplankton"><b>Why the state of Arctic zooplankton health is worrying</b></h2> The Arctic Ocean forms a peculiar ecosystem subjected to harsh environmental conditions, including a thick ice cover and total darkness for months! All life relies on a brief period of a few summer weeks during which unicellular photosynthetic organisms, phytoplankton, have enough light to grow. In fact, Arctic animals do not rely directly on phytoplankton, but rather on other tiny organisms, copepods. 
Copepods have developed an amazing ability to gorge themselves on phytoplankton and transform this feast into huge stores of energetic lipids. Arctic copepods thus allow the whole trophic network, from fish larvae to enormous baleen whales, to benefit all year long from the pulsed summer productivity. As a result, any change in their ability to thrive in such a sensitive environment could have critical consequences for the ecosystem as a whole. A major concern is that the impacts of climate change have been strongest and fastest in the Arctic region. Some changes in copepod species assemblages have already been observed: in some areas, the large, lipid-rich Arctic species are pushed away by smaller, leaner species of Atlantic origin. <h2 id="difficulties"><b>What makes it so hard to monitor marine ecosystems</b></h2> Monitoring marine plankton is a challenging endeavor. It traditionally requires the sampling of planktonic organisms with plankton nets during a scientific cruise. When sampled, these fragile organisms can be damaged. Once in the lab, it takes an incredible amount of time for experts to identify each organism individually under a microscope. Today, new techniques such as in situ imaging are available to avoid damaging these fragile organisms and to automate taxonomic identification from plankton images. More interestingly, quantitative imaging of marine plankton also provides access to individual features of planktonic organisms. These features include morphology, behavior, or physiological state, such as the size of lipid sacs of copepods. <h2><b>Building machine learning models to monitor marine ecosystems</b></h2> <h3 id="data-and-model"><b>The data and the model</b></h3> Prof. Frederic Maps provided us with LOKI images collected in situ. 
These rather high-detail images, with each pixel being 23 by 23 µm (0.023 x 0.023 mm), present tiny organisms and <a href="https://en.wikipedia.org/wiki/Marine_snow" target="_blank" rel="noopener noreferrer">other objects</a> that drift in Arctic waters. After an image is taken, it's classified with a random forest model [<a href="http://dx.doi.org/10.1016/j.mio.2016.03.003" target="_blank" rel="noopener noreferrer">2</a>]. In this project, we worked on all images classified as different kinds of copepods. There were approximately <b>2400</b> of them. For those copepod images, Prof. Frederic Maps provided us with lipid sac segmentation masks prepared by a Master's student, Alexandra Mercier. <img class="size-full wp-image-11567" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01e69c37dda6258e56482_copepods-2.webp" alt="LOKI image, segmentation mask - annotating the location of the lipid sac" width="504" height="436" /> LOKI image, segmentation mask annotating the location of the lipid sac, and LOKI image with segmentation mask. In a short two-week sprint, we prepared a model that predicts the lipid sac location in the image. Since the size of a pixel is fixed for LOKI images, we can convert the number of lipid sac pixels to the physical area occupied by the sac. Using the results of Vogedes et al., we can then convert area <i>A</i> in mm^2 to mass <i>M</i> in mg [<a href="https://doi.org/10.1093/plankt/fbq068">3</a>]. 
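The conversion described above (pixel count to area, then area to mass with the relation M = 0.197 A^{1.38} from [3]) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy mask are hypothetical and are not taken from the project code; the pixel size and the coefficients come from the text.

```python
# Side length of one LOKI pixel in mm (23 µm), as stated in the text.
PIXEL_SIDE_MM = 0.023


def lipid_area_mm2(mask):
    """Area of the lipid sac: number of nonzero mask pixels times the area of one pixel."""
    n_pixels = sum(1 for row in mask for value in row if value)
    return n_pixels * PIXEL_SIDE_MM ** 2


def lipid_mass_mg(area_mm2):
    """Lipid mass in mg from area in mm^2, using M = 0.197 * A^1.38 [3]."""
    return 0.197 * area_mm2 ** 1.38


# Hypothetical 4x5 binary mask with 6 lipid-sac pixels (toy data, not a real LOKI image).
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

area = lipid_area_mm2(toy_mask)
mass = lipid_mass_mg(area)
print(f"area = {area:.6f} mm^2, mass = {mass:.3e} mg")
```

Because the exponent is greater than 1, errors in the predicted area are amplified in the mass estimate, which is one reason to evaluate the model on mass as well as on pixel overlap.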
<p style="text-align: center;">M = 0.197 A^{1.38}</p> <h3 id="model-quality"><b>Assessing model quality</b></h3> We focused on three metrics: <ol><li><a href="https://en.wikipedia.org/wiki/Jaccard_index" target="_blank" rel="noopener noreferrer">Intersection over union</a> (IoU) for every prediction;</li><li style="font-weight: 400;" aria-level="1">Mean of errors of predictions - we want our model to be unbiased;</li><li style="font-weight: 400;" aria-level="1">Total mass of lipids in copepods.</li></ol> Our model achieves 0.70 IoU on validation data not used for training. The mean of errors is 0.012 mg. For reference, the mean lipid sac mass in the validation data is 0.764 mg. The error is not consistent across the whole range of copepod sizes: larger copepods are predicted better. The total mass of lipid content over 225 validation images was predicted as 181.58 mg, while the annotated value was 184.34 mg (a difference of about 1.5%). Domain expert Prof. Frederic Maps believes this accuracy is more than sufficient to use the model on historical LOKI image data. He also suggests that, with some adjustment, we might use this model in other regions of the Earth as well. After the completion of the research, we shared with Prof. Maps the trained model, the code we used to obtain it, and a detailed study of the results. The study revealed two unexpected strengths: the predictions of the model are unbiased (the error distribution is symmetrical), and the model’s performance is even better when we focus on the key copepod subspecies of interest. <h2 id="summary"><b>Summary - applying machine learning to environmental studies</b></h2> Combining expertise in different areas often leads to interesting results very quickly. There is a lot that machine learning can do in the biodiversity area. The primary building blocks are an idea, the data, and skilled engineers. At Appsilon, we have skilled ML engineers. 
We are eager to help preserve biodiversity by utilizing our skills and knowledge. If you have an interesting idea and the data but lack highly skilled engineers, we're happy to support your initiative, especially if it improves the quality of life on our planet. We encourage you to browse our machine learning case studies and see how we can help your project through our Data For Good initiative. 
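For readers who want to run a similar evaluation on their own segmentation masks, here is a minimal Python sketch of the three metrics named in the model-quality section: intersection over union, the mean of errors, and the total mass. It is not the code used in the project. The function names, the toy masks, the toy masses, and the sign convention for the error (predicted minus annotated) are all assumptions made for illustration.

```python
def iou(pred, truth):
    """Intersection over union (Jaccard index) of two binary masks of equal shape."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_error(predicted, annotated):
    """Mean signed error (predicted minus annotated); values near zero suggest no bias."""
    return sum(p - a for p, a in zip(predicted, annotated)) / len(predicted)


def total_mass(masses):
    """Total lipid mass over a set of images."""
    return sum(masses)


# Toy masks (hypothetical): 2 pixels overlap, 4 pixels are in either mask.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
print("IoU:", iou(pred_mask, true_mask))

# Toy per-image masses in mg (hypothetical values, not project data).
pred_mg = [0.70, 0.90, 0.50]
true_mg = [0.75, 0.85, 0.55]
print("mean error (mg):", mean_error(pred_mg, true_mg))
print("total predicted vs annotated (mg):", total_mass(pred_mg), total_mass(true_mg))
```

A mean error close to zero together with a roughly symmetrical error distribution is what allows per-image errors to cancel out, so the total mass can be accurate even when individual masks overlap imperfectly.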
