chemoDivCheck: Check data formatting

Description

Checks that the datasets used by other functions in the chemodiv package are correctly formatted.

Usage

chemoDivCheck(sampleData, compoundData)

Value

One or more messages pointing out problems with data formatting, or a message confirming that the datasets appear to be correctly formatted.

Arguments

sampleData: Data frame with the relative concentration of each compound (column) in every sample (row).
compoundData: Data frame with the compounds in sampleData as rows. Should have a column named "compound" with common names of the compounds, a column named "smiles" with SMILES IDs of the compounds, and a column named "inchikey" with the InChIKey IDs for the compounds.
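As a rough illustration of the two arguments, the sketch below builds a pair of small data frames in base R with the layout described above. The compounds, values and identifiers are hypothetical stand-ins chosen for illustration and should be verified against a chemical database before real use. The sketch also assumes that the column names of sampleData should match the "compound" column in the same order; the Examples section flags a row-reordered compound table as incorrectly formatted, which suggests this, but the exact checks performed by chemoDivCheck are not listed here.

```r
# Hypothetical sample data: one row per sample, one column per compound,
# with values given as relative concentrations (each row sums to 1).
sampleData <- data.frame(
  limonene = c(0.7, 0.4, 0.5),
  linalool = c(0.3, 0.6, 0.5)
)

# Hypothetical compound data: one row per compound, with the three
# required columns "compound", "smiles" and "inchikey".
# Identifiers are illustrative; verify them before real use.
compoundData <- data.frame(
  compound = c("limonene", "linalool"),
  smiles = c("CC1=CCC(CC1)C(=C)C", "CC(=CCCC(C)(C=C)O)C"),
  inchikey = c("XMGQYMWWDOXHJM-UHFFFAOYSA-N", "CDOSHBSSFJOMGT-UHFFFAOYSA-N"),
  stringsAsFactors = FALSE
)

# Assumed consistency condition (not confirmed by this page):
# compounds appear in the same order in both datasets.
namesMatch <- identical(colnames(sampleData), compoundData$compound)

# Rows of sampleData should sum to 1 if the values are proportions.
rowsSumToOne <- all(abs(rowSums(sampleData) - 1) < 1e-8)
```

With the chemodiv package loaded, data frames built this way could then be passed as chemoDivCheck(sampleData, compoundData).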

Details

The function performs a number of checks on the two main input datasets to make sure they are formatted in a way suitable for the other functions in the package. This should make it easier to construct the datasets correctly before starting any analyses.

Two datasets are needed to use the full set of analyses included in the package, and both can be checked for formatting issues. The first dataset should contain the proportions of different compounds (columns) in different samples (rows). Note that all calculations of diversity, and most calculations of dissimilarity, are performed only on relative, rather than absolute, values. The second dataset should be a data frame with three columns containing the compound name, SMILES ID and InChIKey ID of every compound present in the first dataset. See chemodiv for details on obtaining SMILES and InChIKey IDs. For compound names, avoid a leading number, Greek letters and special characters, and make sure there are no trailing spaces.
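The preparation steps described above can be sketched in base R before running the check. The sketch below is only an illustration under stated assumptions: absData and compoundNames are hypothetical inputs, and the name screening is one possible reading of the naming advice (non-ASCII characters are used as a proxy for Greek letters and special characters), not a description of what chemoDivCheck tests internally.

```r
# Hypothetical absolute measurements (e.g. peak areas), samples as rows.
absData <- data.frame(
  limonene = c(12, 3),
  linalool = c(4, 9)
)

# Convert to relative values: divide every row by its row total.
relData <- absData / rowSums(absData)

# Hypothetical compound names, three of which go against the naming advice.
compoundNames <- c("limonene", "2-heptanone", "linalool ", "\u03b1-pinene")

# Name starts with a number.
startsWithDigit <- grepl("^[0-9]", compoundNames)

# Name ends with whitespace.
hasTrailingSpace <- grepl("\\s$", compoundNames)

# Name contains a non-ASCII character (e.g. a Greek letter).
hasNonAscii <- vapply(
  enc2utf8(compoundNames),
  function(nm) any(utf8ToInt(nm) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

# Names flagged by at least one of the screens above.
problemNames <- compoundNames[startsWithDigit | hasTrailingSpace | hasNonAscii]
```

Screening names this way before building compoundData is a design convenience; the messages returned by chemoDivCheck remain the reference for whether the datasets are acceptable to the package.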

Examples

data(minimalSampData)
data(minimalCompData)
chemoDivCheck(minimalSampData, minimalCompData) # Correct format
chemoDivCheck(minimalSampData, minimalCompData[c(2,3,1),]) # Incorrect format

data(alpinaSampData)
data(alpinaCompData)
chemoDivCheck(sampleData = alpinaSampData, compoundData = alpinaCompData)