Evolutionary Markov Chain for Adverse Drug Reaction

R package implementing an Evolutionary Monte-Carlo Markov Chain algorithm (an adaptation of Metropolis-Hastings). This package is designed to be used with medical data, especially with patients using medications.

Supervisor : Mr. Birmele Etienne

Install the package via CRAN repository

You can simply install the package by using the following command:

install.packages('emcAdr')

Get the modified ATC tree (containing every medication)

The algorithm uses a modified medication tree in which each node carries an upper bound: the index of the last drug in the drug family represented by that node (when the node is not a leaf). The original drug tree is in the emcAdr/data folder. You can also use your own tree, but an upper bound is mandatory for every node (the upper bound of a leaf is the index of that leaf in your 2D array).
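The upper-bound rule can be illustrated on a toy tree. The snippet below is only a sketch: the data frame, its column names (code, depth) and the helper compute_upper_bound are hypothetical and not part of emcAdr. It assumes the rows are listed in depth-first order and that indexes are 0-based; shift the result by one if your tree uses 1-based row numbers.

```r
# Toy tree in depth-first order; 'code' and 'depth' are hypothetical columns
toy_tree <- data.frame(
  code  = c("A", "A01", "A01A", "A01B", "A02", "B", "B01"),
  depth = c(1L, 2L, 3L, 3L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# For each node, find the 0-based index of the last row of its subtree
compute_upper_bound <- function(depth) {
  n <- length(depth)
  upper <- integer(n)
  for (i in seq_len(n)) {
    j <- i
    # Following rows that are deeper than node i are its descendants
    while (j < n && depth[j + 1L] > depth[i]) {
      j <- j + 1L
    }
    upper[i] <- j - 1L  # convert the 1-based row number to a 0-based index
  }
  upper
}

toy_tree$upperBound <- compute_upper_bound(toy_tree$depth)
print(toy_tree)
```

In this sketch a leaf such as A01A (0-based index 2) gets its own index as upper bound, while the family A gets the index of its last descendant, A02.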

Build or use a dataset of patients

The algorithm requires a data frame of patients. Each row of this data frame represents a single patient: the medications they are taking and a boolean indicating whether they have the adverse drug reaction under consideration.

There are two mandatory columns: patientATC and patientADR. The first holds the indexes of the drugs taken by the patient in the drug tree (indexes start at 0); the second is the boolean indicating whether the patient has the ADR.

Here is an example of a row for a patient who takes 3 drugs (with indexes 12, 56 and 798) and does not have the adverse drug reaction under consideration:

patientATC     patientADR
12, 56, 798    0
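A data frame of this shape can be built in base R as sketched below. This assumes patientATC is stored as a list column holding one integer vector per patient, which is an assumption about the expected layout rather than something stated above; the second patient is invented for illustration.

```r
# One row per patient; patientADR is the boolean outcome
patients <- data.frame(patientADR = c(FALSE, TRUE))

# Assumed layout: a list column with one vector of 0-based drug indexes per patient
patients$patientATC <- list(
  c(12L, 56L, 798L),  # the patient from the example row above
  c(3L)               # hypothetical second patient taking a single drug
)

str(patients)
```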

Use the EMC function

You are now ready to use the EMC function contained in the package. Here is an example:

res <- EMC(n = 100, nbIndividuals = 5, ATCtree = ATC_Tree_UpperBound_2014, observations = simulPatient_df, startingIndividuals = c(), startingTemperatures = c())
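The package describes its algorithm as an adaptation of Metropolis-Hastings. For readers unfamiliar with that family of methods, the sketch below shows a plain Metropolis-Hastings random walk on a small discrete state space. It is a generic illustration, not the EMC implementation: the target weights and the function metropolis_hastings are hypothetical, and no tree, cocktail or temperature is involved.

```r
set.seed(42)

# Unnormalised toy target over states 1..10 (hypothetical scores)
target <- c(1, 2, 4, 8, 16, 8, 4, 2, 1, 1)

metropolis_hastings <- function(target, n_iter, start = 1L) {
  n_states <- length(target)
  chain <- integer(n_iter)
  current <- start
  for (t in seq_len(n_iter)) {
    # Symmetric proposal: move one step left or right
    proposal <- current + sample(c(-1L, 1L), 1)
    # Proposals outside the state space have zero target mass and are rejected
    if (proposal >= 1L && proposal <= n_states) {
      accept_prob <- min(1, target[proposal] / target[current])
      if (runif(1) < accept_prob) {
        current <- proposal
      }
    }
    chain[t] <- current
  }
  chain
}

chain <- metropolis_hastings(target, n_iter = 5000)

# Empirical visit frequencies, to compare with target / sum(target)
empirical <- tabulate(chain, nbins = length(target)) / length(chain)
print(round(empirical, 3))
```

Because the proposal is symmetric, the acceptance probability reduces to the ratio of target values, so the chain spends more time in high-scoring states.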

Install

install.packages('emcAdr')

Monthly Downloads: 172

Version: 1.3

License: GPL-3

Maintainer: Jules Bangard

Last Published: February 5th, 2026

Functions in emcAdr (1.3)

trueDistributionSizeTwoCocktail

The true distribution of the score among all size-two cocktails
p_value_cocktails

Used to add the p-value to each cocktail of a cocktail list
p_value_csv_file

Used to add the p-value to each cocktail of a CSV file output by the genetic algorithm
int_cocktail_to_string_cocktail

Function used to convert integer cocktails (such as those output by the DistributionApproximation function) to string cocktails in order to make them more readable
hyperparam_test_genetic_algorithm

This function makes it convenient to try different sets of parameters for the genetic algorithm. It runs every possible combination of mutation_rate, nb_elite and alphas nb_test_desired times. For each set of parameters, the results are saved in a file named after that set. The results of all runs can then be gathered in a CSV file with the print_csv function, by specifying the names of the files to process and the number of runs performed for each parameter set
qq_plot_output

Make a quantile-quantile plot from the output of the MCMC algorithm (DistributionApproximation) and the output of the algorithm that computes the distribution exhaustively
p_value_genetic_results

Used to add the p-value to each cocktail in an output of the genetic algorithm
csv_to_population

Function used to convert genetic algorithm results stored in a .csv file to a data structure that can be used by the clustering algorithm
p_value_on_sampled

Calculate p-value of sampled value
print_csv

Print every cocktail found by the genetic algorithm when it is run through the hyperparam_test_genetic_algorithm function. This condenses the solutions found in each file by collapsing similar cocktails into a single row per cocktail.
trueDistributionDrugs

The true distribution of the score among all single nodes of the ATC tree
string_list_to_int_cocktails

Function used to convert a string vector of drugs of the form "drug1:drug2" to a vector of ATC tree indexes, e.g. c(ATC_index(drug1), ATC_index(drug2))
remove_higher_cocktails

Filter out drug cocktails with high-level ATC classifications
run_firth_regression

Firth Penalized Logistic Regression for Drug Cocktails
plot_evolution

Plot the evolution of the mean and the best value of the population used by the GeneticAlgorithm
plot_frequency

Plot the histogram of the approximation of the RR distribution
combination_data_frame

Generate Matrix for Drug Combinations
clustering_genetic_algorithm

Clustering of the solutions of the genetic algorithm using the hclust algorithm
GeneticAlgorithm

Genetic algorithm that searches for the riskiest cocktails (the ones which maximize the fitness function, the Hypergeometric score in our case)
ATC_Tree_UpperBound_2024

ATC Tree Upper Bound 2024
calculate_divergence

Calculate the divergence between 2 distributions (the true Distribution and the learned one)
computeMetrics_size2

Function used in the reference article to compare diverse Disproportionality Analysis metrics
DistributionApproximation

The MCMC method that runs the random walk on a single cocktail in order to estimate the distribution of score among cocktails of size Smax.
ATCtoNumeric

Convert the ATC codes of each patient to the corresponding DFS numbers of the ATC tree
OutsandingScoreToDistribution

Output the outstanding score (Outstanding_score) returned by the MCMC algorithm in a special format
compute_hypergeom_on_list

Function used to compute the Hypergeometric score on a list of cocktails
compute_RR_on_list

Function used to compute the Relative Risk on a list of cocktails
compute_hypergeom_cocktail

Function used to compute the Hypergeometric score on a cocktail
get_dissimilarity_from_cocktail_list

Recover the square matrix of distance between cocktails where the index (i,j) of the matrix is the distance between cocktails i and j in an arbitrary cocktail list
hclust_genetic_solution

Clustering of the solutions of the genetic algorithm using the hclust algorithm
get_dissimilarity_from_genetic_results

Recover the square matrix of distance between cocktails where the index (i,j) of the matrix is the distance between cocktails i and j in the genetic_results list.
FAERS_myopathy

FAERS Myopathy Dataset
histogramToDitribution

Convert the histogram returned by the DistributionApproximation function, to a real number distribution (that can be used in a test for example)
emcAdr-package

tools:::Rd_package_title("emcAdr")
get_dissimilarity_from_txt_file

Recover the square matrix of distance between cocktails where the index (i,j) of the matrix is the distance between cocktails i and j in the csv file containing results of genetic algorithm
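
Several of the functions listed above score a cocktail with a Relative Risk or a Hypergeometric score. As a rough illustration of what such scores measure, the base R sketch below computes both for one cocktail on toy data. It does not call emcAdr: the data are invented, a patient is assumed to be exposed when they take every drug of the cocktail, and taking -log10 of the hypergeometric upper tail as the score is an assumption, so the package's exact definitions may differ.

```r
# Toy observations (invented): drug indexes per patient and ADR flag
patients <- data.frame(patientADR = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE))
patients$patientATC <- list(
  c(1L, 2L), c(1L, 2L, 5L), c(1L), c(3L), c(2L, 1L), c(4L, 5L)
)

cocktail <- c(1L, 2L)

# Assumption: a patient is exposed when they take every drug of the cocktail
exposed <- vapply(
  patients$patientATC,
  function(drugs) all(cocktail %in% drugs),
  logical(1)
)

n_total       <- nrow(patients)
n_adr         <- sum(patients$patientADR)
n_exposed     <- sum(exposed)
n_exposed_adr <- sum(exposed & patients$patientADR)

# Relative risk: ADR rate among exposed divided by ADR rate among unexposed
risk_exposed   <- n_exposed_adr / n_exposed
risk_unexposed <- (n_adr - n_exposed_adr) / (n_total - n_exposed)
relative_risk  <- risk_exposed / risk_unexposed

# Hypergeometric upper tail: probability of seeing at least n_exposed_adr
# ADR cases among n_exposed patients drawn at random from the population
p_tail <- phyper(
  n_exposed_adr - 1,
  m = n_adr,
  n = n_total - n_adr,
  k = n_exposed,
  lower.tail = FALSE
)

# Assumed score: larger values mean a more surprising excess of ADR cases
hypergeom_score <- -log10(p_tail)

print(c(RR = relative_risk, p = p_tail, score = hypergeom_score))
```

Note that the relative risk is undefined or infinite when no unexposed patient has the ADR, which is one reason a tail-probability score can be more convenient on sparse data.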