Gdaphen, R pipeline to identify the most important qualitative and quantitative predictor variables from phenotypic data.
Publication record
Publication date
January 2023
Journal
BMC Bioinformatics
Authors
Identified members of Cancéropôle Est:
Dr HERAULT Yann
All authors:
Muñiz Moreno MDM, Gavériaux-Ruff C, Herault Y
PubMed link
Abstract
In individuals or animals suffering from genetic or acquired diseases, it is important to identify which clinical or phenotypic variables can be used to discriminate between disease and non-disease states, or to characterize the response to treatments or sexual dimorphism. However, the data often suffer from a low number of samples, a high number of variables, or unbalanced experimental designs. Moreover, several parameters can be recorded in the same test, so correlations between them should be assessed, and a more complex statistical framework is necessary for the analysis. Packages that provide the relevant analysis tools already exist, but they are not available together in one place, which makes choosing and implementing a method difficult for non-statisticians.
Keywords
Bootstrapping, Clinical data, Discrimination, Generalized linear models, Imputation, Machine learning, Model, Phenotypic data, Prediction, R package, Random forest
Reference
BMC Bioinformatics. 2023 Jan 26;24(1):28