Researchers of the attoworld-team of the Ludwig-Maximilians-Universität München and the Max-Planck-Institute of Quantum Optics have now gained further insights into how molecular fingerprinting can be used more effectively to analyse complex molecular samples.

A mix of molecules can be very complex, for example when it is extracted from liquids such as blood or urine. Capturing such a molecular composition precisely in a so-called molecular fingerprint is a major challenge. However, if successful, it could provide information about whether an organism is healthy or diseased.
A powerful way to analyse complex samples for their chemical composition is molecular spectroscopy using infrared or Raman techniques. Although the technology is widely used, the capabilities and limitations of molecular fingerprinting are not yet fully understood. Researchers at the attoworld-team of the Ludwig-Maximilians-Universität München and the Max-Planck-Institute of Quantum Optics have now gained further insights into how the technique can be used more effectively to analyse complex molecular samples. Here, Dr. Marinus Huber and Tarek Eissa describe the approach they have taken to tackle this challenge.

Marinus, could you explain how molecular spectroscopy works?
Infrared and Raman spectroscopy are two complementary techniques, both used to study molecular vibrations by analysing their interactions with electromagnetic radiation. When these molecular vibrations are measured, they can be associated with specific chemical bonds within molecules and thus provide insight into the chemical composition of the investigated sample.
Infrared spectroscopy measures the absorption or transmission of infrared radiation by a molecule. When a molecule absorbs infrared radiation, it causes the bonds between its atoms to vibrate, and the specific way that the molecule vibrates depends on the types of atoms and bonds present. By measuring which frequencies of infrared radiation are absorbed by a sample, scientists can determine what types of bonds are present in the molecule and how they are arranged.
Raman spectroscopy, on the other hand, measures the scattering of light by a molecule. When a molecule scatters light, the scattered photons may gain or lose energy depending on the vibrational energy of the molecule. The frequency shifts in the scattered light can reveal information about the molecular vibrations, and therefore the chemical composition and structure of the molecule – its fingerprint.
Both infrared and Raman spectroscopy are powerful tools for identifying and characterizing molecules in a variety of applications, from identifying unknown substances to analysing the chemical composition of biological samples.
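To make the frequency shift mentioned above concrete: the Raman shift is conventionally reported in wavenumbers (cm^-1) as the difference between the inverse wavelengths of the incoming and the scattered light. The following is a minimal sketch of that standard conversion. The function name and the example wavelengths (532 nm excitation, 580 nm scattered) are illustrative assumptions and do not come from the interview or the publication.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Raman shift in cm^-1 from excitation and scattered wavelengths in nm.

    Illustrative helper (hypothetical name). Uses 1 nm = 1e-7 cm,
    so the wavenumber in cm^-1 is 1e7 / wavelength_in_nm.
    """
    return 1e7 / excitation_nm - 1e7 / scattered_nm


# Assumed example values: a 532 nm laser and light scattered at 580 nm.
shift = raman_shift_cm1(532.0, 580.0)
print(f"Raman shift: {shift:.0f} cm^-1")

# A positive shift means the photon lost energy to a molecular vibration
# (Stokes scattering); a negative shift means it gained energy (anti-Stokes).
```

Each vibrational mode of a molecule contributes a band at its own characteristic shift, which is why the full set of bands can serve as a fingerprint.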

Marinus, how is molecular spectroscopy used by the attoworld-team?
The main application of molecular spectroscopy within the attoworld team is the infrared fingerprinting of bodily fluids, such as blood serum and plasma, for disease detection. The underlying idea is that even local diseases can affect the metabolism and leave their traces in the molecular composition of systemic body fluids, such as blood. This information can be extracted by measuring the vibrational spectrum using infrared spectroscopy. In the next step, we apply machine learning algorithms to identify spectral patterns associated with different diseases, which can then be used for disease diagnostics. In principle, such an approach could be translated into practice and integrated into healthcare.

Marinus, can you briefly summarize what your latest publication is about?
In our latest publication, we aimed to better understand the underlying capabilities and potential limitations of infrared fingerprinting for disease detection. To study this, we created a mathematical model: a way to compute artificial infrared fingerprints of blood serum that reflect the properties of actual measured fingerprints, but that also take other effects, such as measurement noise, into account as additional parameters. The advantage of the simulation is that we can freely adjust the model parameters used to generate the infrared fingerprints (for example, the molecular complexity of the sample) and investigate their influence on the accuracy of disease detection. This allows us to better understand which experimental boundary conditions are relevant for a medical application without the need for elaborate experiments, and we can then direct future clinical studies accordingly.
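The idea of generating artificial fingerprints with adjustable parameters can be illustrated with a toy example. The sketch below is not the model from the publication. It is a deliberately simplified assumption in which a spectrum is a sum of Gaussian absorption bands, one per molecular component, with person-to-person variation in the concentrations and additive measurement noise. All names, band positions, and default values are hypothetical.

```python
import math
import random


def gaussian_band(x, center, width):
    """Unit-height Gaussian absorption band (toy line shape)."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def simulate_fingerprint(rng, diseased, n_components=5, bio_var=0.1,
                         noise=0.01, effect=0.05, n_points=100):
    """Return one artificial spectrum as a list of absorbance values.

    Hypothetical parameters:
      n_components  molecular complexity (number of absorbing components)
      bio_var       between-person spread of component concentrations
      noise         standard deviation of the measurement noise
      effect        concentration change of component 0 caused by disease
    """
    wavenumbers = [1000 + 8 * i for i in range(n_points)]  # 1000..1792 cm^-1
    centers = [1000 + 800 * (k + 1) / (n_components + 1)
               for k in range(n_components)]
    # Each person has slightly different concentrations around a baseline of 1.
    conc = [rng.gauss(1.0, bio_var) for _ in range(n_components)]
    if diseased:
        conc[0] += effect  # disease changes the level of one component
    spectrum = []
    for x in wavenumbers:
        signal = sum(c * gaussian_band(x, mu, 30.0)
                     for c, mu in zip(conc, centers))
        spectrum.append(signal + rng.gauss(0.0, noise))
    return spectrum


rng = random.Random(42)
healthy = simulate_fingerprint(rng, diseased=False)
diseased = simulate_fingerprint(rng, diseased=True)
print(len(healthy), len(diseased))
```

Because every parameter is an explicit argument, each one can be varied independently while the others are held fixed, which is the property the answer above highlights.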

Marinus, what are the challenges?
One of the main challenges was to find a mathematical model that could accurately replicate the results we obtained from experimental datasets in the lab. Ultimately, the approach we described in the publication proved successful in independently reproducing all major findings. However, one challenge remains open: determining how well the model performs outside the parameter space for which we have data for calibration and cross-comparison. We are actively working on this and plan to address it in the near future.

Tarek, how does artificial intelligence come into play to help address these challenges?
Artificial intelligence can learn to recognize patterns in datasets that contain different groups of measurements. You can teach, or train, artificial intelligence algorithms on many measurements whose group membership you already know. The idea is that once an algorithm has seen enough data, it can make predictions on new data it has not seen before and, for example, decide whether a molecular fingerprint is healthy or diseased. The crucial factor here is that the ability to make these predictions is largely determined by the availability and quality of the data it was trained on. This is where the simulation model comes into play. We use it to repeatedly manipulate the training data for diseased and healthy molecular fingerprints. Then, we study the effects of these variations on the accuracy of the predictions. This gives us quick insights into how each simulated parameter influences these predictions and helps identify where to better focus our experimental efforts to improve disease detection efficacy.
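The procedure described here (train on simulated data, change one simulation parameter, observe the prediction accuracy) can be sketched in a few lines. The example below is a toy under stated assumptions, not the pipeline used in the publication. The simulated "fingerprints" are plain feature vectors, the classifier is a simple nearest-centroid rule chosen for brevity, and all names and parameter values are hypothetical.

```python
import random


def simulate(rng, diseased, n_features=20, bio_var=0.5, noise=0.1, effect=0.5):
    """Toy fingerprint: person-to-person variability plus measurement noise.

    Disease shifts the first three features by `effect` (an assumption).
    """
    x = [rng.gauss(0.0, bio_var) + rng.gauss(0.0, noise)
         for _ in range(n_features)]
    if diseased:
        for j in range(3):
            x[j] += effect
    return x


def centroid(rows):
    """Feature-wise mean of a list of equally long vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def accuracy(rng, n_train, n_test, **params):
    """Train a nearest-centroid classifier and score it on fresh samples."""
    train_h = [simulate(rng, False, **params) for _ in range(n_train)]
    train_d = [simulate(rng, True, **params) for _ in range(n_train)]
    c_h, c_d = centroid(train_h), centroid(train_d)
    correct = 0
    for _ in range(n_test):
        for label in (False, True):
            x = simulate(rng, label, **params)
            predicted = sqdist(x, c_d) < sqdist(x, c_h)
            correct += (predicted == label)
    return correct / (2 * n_test)


rng = random.Random(0)
# Vary one simulation parameter at a time and watch the accuracy change.
for bio_var in (0.1, 0.5, 2.0):
    acc = accuracy(rng, n_train=100, n_test=500, bio_var=bio_var)
    print(f"biological variability {bio_var}: accuracy {acc:.2f}")
```

The same loop could sweep `noise` or `n_train` instead of `bio_var`, which mirrors the kind of parameter study described in the answer, here only in a strongly simplified form.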

Tarek, what insights does your research provide?
We investigated the effects of several adjustable simulation parameters: specifically, the number of samples used for training, the biological variability and measurement noise, and the molecular complexity of the samples studied. I would say that our biggest finding was that the ability to distinguish cancer cases from controls improved very significantly when the biological variability and molecular complexity were reduced. While this may seem like an obvious conclusion, the significance lies in our ability to now quantitatively estimate the degree of improvement we can expect from experimental studies and procedures that target these factors. Especially given that it is challenging to realize these improvements experimentally, our findings can guide researchers in estimating the potential rewards of such efforts.

Marinus, how can your findings be applied?
We can imagine several possible applications. First, our findings will help to improve the design of future studies. Now that we know the potentially limiting factors, we can account for them and plan future clinical studies and samplings accordingly. For example, our results suggest that longitudinal fingerprinting, which means following a person over time, will increase the accuracy of disease detection significantly. The good news here is that longitudinal sample collection is already ongoing!
Beyond that, the model can also be applied to other applications of infrared spectroscopy, such as detecting trace analytes in complex samples. When the experimental conditions are well characterized, it can help determine which analytes can be detected without the need for performing unnecessary experiments.
Overall, we believe that our model will help us understand the potential and limitations of vibrational fingerprinting for various applications and, in turn, help avoid unnecessary and resource-intensive experiments.

Original publication:
T. Eissa, K. Kepesidis, M. Zigman, and M. Huber
Limits and Prospects of Molecular Fingerprinting for Phenotyping Biological Systems Revealed through In Silico Modeling
Analytical Chemistry (2023)