Gdaphen, R pipeline to identify the most important qualitative and quantitative predictor variables from phenotypic data.

Publication record


Publication date

January 2023

Journal

BMC Bioinformatics

Authors

Identified members of the Cancéropôle Est:
Dr HERAULT Yann


All authors:
Muñiz Moreno MDM, Gavériaux-Ruff C, Herault Y

Abstract

In individuals or animals suffering from genetic or acquired diseases, it is important to identify which clinical or phenotypic variables can be used to discriminate between disease and non-disease states, between responses to treatment, or between sexes (sexual dimorphism). However, the data often suffer from a low number of samples, a high number of variables, or unbalanced experimental designs. Moreover, several parameters can be recorded in the same test, so correlations between them should be assessed, and a more complex statistical framework is needed for the analysis. Packages that provide suitable analysis tools already exist, but they are not available together, which makes choosing and implementing an analysis method difficult for non-statisticians.

Keywords

Bootstrapping, Clinical data, Discrimination, Generalized linear models, Imputation, Machine learning, Model, Phenotypic data, Prediction, R package, Random forest

Reference

BMC Bioinformatics. 2023 Jan 26;24(1):28.