A data mining tool for untargeted biomarkers analysis: grapes ripening application
In metabolomics, data generated by untargeted approaches can be very complex due to the typically extensive number of features in raw data (with and without chemical relevance), dependence on raw data preprocessing methods, and lack of selective data mining tools to appropriately interpret these data. Extraction of meaningful information from these data is still a significant challenge in metabolomics.
Moreover, currently available tools may overprocess the data, eliminating useful information. This work proposes a data mining tool for metabolomics data, specifically liquid chromatography-mass spectrometry (LC-MS) data, that enhances the extraction of meaningful chemical information.
The algorithm was designed to be as general as possible in highlighting chemically relevant features while discarding non-informative signals, especially background features.
The proposed algorithm was applied to an LC-MS data set generated from the analysis of grapes collected over a 4-month developmental period. The algorithm's output is a short list of features from metabolites that are worth investigating further, for example by high-resolution mass spectrometry (HRMS) fragmentation for subsequent identification.
The performance of the algorithm in estimating potentially interesting features was compared with that of the MZmine software. For this case study, MZmine yielded a final set of 37 features (out of 1543 initially identified) that still included noise features, whereas the proposed algorithm identified 99 systematic features free of noise. In addition, the algorithm required half as many user-defined parameters as MZmine. Overall, the proposed algorithm demonstrated a greater ability to pinpoint features that may be associated with grape development and maturation processes while requiring minimal parameter definition, thus reducing user uncertainty and the risk of compromising experimental information.