Evolutionary Markov Chain for Adverse Drug Reaction

emcAdr is an R package implementing an evolutionary Markov chain Monte Carlo algorithm (an adaptation of Metropolis-Hastings). It is designed for medical data, in particular records of patients taking medications.

Supervisor: Mr. Birmele Etienne

Install the package using the .tar.gz file

First, clone the emcAdr GitHub repository. Then install the package by running the following command in the R console:

install.packages("~/path/to/emcAdr/package_src/emcAdr_1.0.tar.gz", repos = NULL, type = "source")

Get the modified ATC tree (containing every medication)

The algorithm uses a modified medication tree in which each node carries an upper bound: the index of the last drug in the drug family rooted at that node (for a leaf, the upper bound is the index of the leaf itself in your 2D array). You can find the original drug tree in the emcAdr/data folder. You can also use your own tree, but an upper bound for each node is mandatory.
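As a rough sketch of what the upper bound means, the snippet below computes it for a toy tree stored in depth-first order. The column names (`code`, `depth`, `upperBound`), the toy codes and the helper `compute_upper_bound` are assumptions made for illustration and are not taken from the package; indexes are 0-based to match the convention used for patient data.

```r
# Toy drug tree listed in depth-first order.
# Column names and codes are hypothetical, chosen only for this illustration.
tree <- data.frame(
  code  = c("A", "A01", "A01A", "A01B", "A02", "B", "B01"),
  depth = c(1L, 2L, 3L, 3L, 2L, 1L, 2L)
)

# For each node, find the 0-based index of the last node of its subtree.
# In depth-first order, the subtree of node i is the run of following rows
# whose depth is strictly greater than depth[i].
compute_upper_bound <- function(depth) {
  n <- length(depth)
  upper <- integer(n)
  for (i in seq_len(n)) {
    j <- i
    while (j < n && depth[j + 1L] > depth[i]) {
      j <- j + 1L
    }
    upper[i] <- j - 1L  # convert R's 1-based position to a 0-based index
  }
  upper
}

tree$upperBound <- compute_upper_bound(tree$depth)
print(tree)
```

In this toy tree, a leaf such as `A01A` (0-based index 2) gets its own index as upper bound, while `A` gets the index of `A02`, the last node of its family.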

Build or use a dataset of patients

The algorithm requires a data frame of patients. Each row of this data frame represents a single patient: the medications they are taking and a boolean indicating whether they have the adverse drug reaction (ADR) under consideration.

There are two mandatory columns: patientATC and patientADR. patientATC holds the indexes of the drugs taken by the patient in the drug tree (indexes start at 0), and patientADR is the boolean indicating whether the patient has the ADR.

Here is an example of a row for a patient who takes 3 drugs (with indexes 12, 56 and 798) and does not have the adverse drug reaction under consideration:

| patientATC | patientADR |
| --- | --- |
| 12, 56, 798 | 0 |
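One possible way to build such a data frame in R is sketched below. It assumes that patientATC can be stored as a list column of integer vectors, which is an assumption about the expected layout rather than something the text above confirms. The first row mirrors the example above; the second and third patients are made up.

```r
# Sketch of a patient data frame with the two mandatory columns.
# Assumption: patientATC is a list column holding one integer vector
# of 0-based drug indexes per patient.
patients <- data.frame(
  patientATC = I(list(
    c(12L, 56L, 798L),  # the patient from the example row above
    c(3L),              # hypothetical patient taking a single drug
    c(56L, 102L)        # hypothetical patient taking two drugs
  )),
  patientADR = c(FALSE, TRUE, FALSE)
)

# Number of drugs taken by each patient
lengths(patients$patientATC)
```

`I()` keeps the list as a single column instead of letting `data.frame` spread it across several columns.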

Use the EMC function

You are now ready to use the EMC function contained in the package. Here is an example:

res <- EMC(n = 100, nbIndividuals = 5, ATCtree = ATC_Tree_UpperBound_2014, observations = simulPatient_df, startingIndividuals = c(), startingTemperatures = c())

Install: install.packages('emcAdr')

Monthly Downloads: 172

Version: 1.2

License: GPL-3

Maintainer: Jules Bangard

Last Published: February 27th, 2025

Functions in emcAdr (1.2)

p_value_cocktails

Used to add the p_value to each cocktail of a cocktail list
get_dissimilarity_from_txt_file

Recover the square distance matrix between cocktails, where entry (i, j) is the distance between cocktails i and j in the CSV file containing the results of the genetic algorithm
get_dissimilarity_from_genetic_results

Recover the square distance matrix between cocktails, where entry (i, j) is the distance between cocktails i and j in the genetic_results list
string_list_to_int_cocktails

Function used to convert a string vector of drugs in the form "drug1:drug2" to a vector of ATC tree indexes, e.g. c(ATC_index(drug1), ATC_index(drug2))
trueDistributionSizeTwoCocktail

The true distribution of the score among all size-two cocktails
trueDistributionDrugs

The true distribution of the score among all single nodes of the ATC tree
qq_plot_output

Make a quantile-quantile plot from the output of the MCMC algorithm (DistributionApproximation) and the output of the algorithm that exhaustively calculates the distribution
print_csv

Print every cocktail found by the genetic algorithm when used with the hyperparam_test_genetic_algorithm function. This condenses the solutions found in each file by collapsing similar cocktails into a single row per cocktail.
p_value_csv_file

Used to add the p_value to each cocktail of a CSV file that is an output of the genetic algorithm
p_value_on_sampled

Calculate p-value of sampled value
p_value_genetic_results

Used to add the p_value to each cocktail of an output of the genetic algorithm
plot_evolution

Plot the evolution of the mean and the best value of the population used by the GeneticAlgorithm
plot_frequency

Plot the histogram of the approximation of the RR distribution
clustering_genetic_algorithm

Clustering of the solutions of the genetic algorithm using the hclust algorithm
FAERS_myopathy

FAERS Myopathy Dataset
OutsandingScoreToDistribution

Output the outstanding score (Outstanding_score) produced by the MCMC algorithm in a special format
calculate_divergence

Calculate the divergence between two distributions (the true distribution and the learned one)
computeMetrics_size2

Function used in the reference article to compare diverse Disproportionality Analysis metrics
compute_RR_on_list

Function used to compute the Relative Risk on a list of cocktails
DistributionApproximation

The MCMC method that runs the random walk on a single cocktail in order to estimate the distribution of the score among cocktails of size Smax.
ATCtoNumeric

Convert the ATC codes of each patient to the corresponding DFS numbers of the ATC tree
GeneticAlgorithm

Genetic algorithm that searches for the riskiest cocktails (the ones that maximize the fitness function, the hypergeometric score in our case)
hyperparam_test_genetic_algorithm

This function offers a convenient way to try different sets of parameters for the genetic algorithm. It runs every possible combination of mutation_rate, nb_elite and alphas nb_test_desired times. For each set of parameters, the results are saved in a file named after that parameter set. The results of all runs can then be gathered in a CSV file with the print_csv function, by specifying the names of the files to process and the number of runs performed for each parameter set.
hclust_genetic_solution

Clustering of the solutions of the genetic algorithm using the hclust algorithm
histogramToDitribution

Convert the histogram returned by the DistributionApproximation function to a real-number distribution (that can be used in a test, for example)
int_cocktail_to_string_cocktail

Function used to convert integer cocktails (like the ones output by the DistributionApproximation function) to string cocktails in order to make them more readable
csv_to_population

Function used to convert genetic algorithm results stored in a .csv file to a data structure that can be used by the clustering algorithm
compute_hypergeom_on_list

Function used to compute the Hypergeometric score on a list of cocktails
ATC_Tree_UpperBound_2024

ATC Tree Upper Bound 2024
emcAdr-package

Package-level documentation for emcAdr
get_dissimilarity_from_cocktail_list

Recover the square distance matrix between cocktails, where entry (i, j) is the distance between cocktails i and j in an arbitrary cocktail list
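To make two of the ideas in this index more concrete (converting "drug1:drug2" strings to tree indexes, and computing a relative risk for a cocktail), here is a minimal base-R sketch on toy data. It does not call the package and does not claim to reproduce its implementation. The helpers `string_to_indexes` and `relative_risk`, the toy codes and the toy patients are all hypothetical. It also assumes a patient counts as exposed only when they take every drug of the cocktail by exact index, which may differ from how the package treats drug families.

```r
# Toy list of drug codes; position minus one plays the role of the
# 0-based index in the tree (codes are made up for illustration).
atc_codes <- c("A01", "A02", "B01", "C03")

# Hypothetical helper: convert "drug1:drug2" strings to 0-based index vectors.
string_to_indexes <- function(cocktails, codes) {
  lapply(
    strsplit(cocktails, ":", fixed = TRUE),
    function(drugs) match(drugs, codes) - 1L
  )
}

cocktails <- string_to_indexes(c("A01:B01", "C03"), atc_codes)

# Toy patients: drug indexes taken and ADR status.
patients <- data.frame(patientADR = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE))
patients$patientATC <- list(
  c(0L, 2L), c(0L, 2L, 3L), c(0L, 2L), c(1L), c(3L), c(1L, 3L)
)

# Hypothetical helper: relative risk of the ADR for patients taking every
# drug of the cocktail versus all other patients.
relative_risk <- function(cocktail, patients) {
  exposed <- vapply(
    patients$patientATC,
    function(taken) all(cocktail %in% taken),
    logical(1)
  )
  risk_exposed   <- mean(patients$patientADR[exposed])
  risk_unexposed <- mean(patients$patientADR[!exposed])
  risk_exposed / risk_unexposed
}

rr <- relative_risk(cocktails[[1]], patients)
print(rr)
```

In this toy data, two of the three patients taking both A01 and B01 have the ADR, against one of the three others, so the relative risk of that cocktail is 2.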