bioset

bioset is intended to help you work with sets of raw data.

Working in a lab, it is not uncommon to end up with a data set of raw values (because that is what your measuring device spat out) that you now need to transform and organise before you can work with it.

Installation

A stable version of bioset is available on CRAN: https://cran.r-project.org/package=bioset

So all you need to do is:

install.packages("bioset")

You can find the latest additions and changes on GitHub. To spare CRAN administrators' time it is requested of all package authors not to submit changes too frequently.

Consequently, I will make new features available on GitHub first. Packages I have not yet submitted to CRAN will be labelled vX.Y.Z-pre.N and appear under: https://github.com/randomchars42/bioset/releases.

To install those packages you can use githubinstall:

# install.packages("githubinstall")
# replace vX.Y.Z-pre.N with the tag of the pre-release you want
githubinstall::gh_install_packages("bioset", ref = "vX.Y.Z-pre.N")

You can install the very latest changes from the master branch of bioset on GitHub with:

# install.packages("devtools")
devtools::install_github("randomchars42/bioset")

Why? What bioset can do for you

bioset lets you:

  • import raw data organised in matrices, e.g. measured values of an 8 x 12 (96-well) bio-assay plate
  • calculate concentrations using samples with known concentrations (calibrators) in your dataset
  • calculate means and variability for duplicates / triplicates / ...
  • convert your concentrations to (more or less) arbitrary units of concentration

Data import

Suppose you have an ods / xls(x) file with raw values obtained from a measurement like this:

     1    2    3    4    5    6
A  102  107  156  145  360  342
B  198  203  101  121  231  226
C  296  291  276  283  430  413
D  430  386  325  298  110  119

Save them as set_1.csv. A csv file is like an ods / xls(x) file, but it is basically a text file with the values separated by commas. Current versions of LibreOffice / OpenOffice / Microsoft Office offer an option "Save as" > "csv".

Load the package.

library("bioset")

Then you can use set_read() to get all values with their position as name in a nice tibble:

set_read()

set  position  sample_id  name  value
1    A1        A1         A1    102
1    B1        B1         B1    198
1    C1        C1         C1    296
1    D1        D1         D1    430
1    A2        A2         A2    107
1    B2        B2         B2    203
1    C2        C2         C2    291
1    D2        D2         D2    386
1    A3        A3         A3    156
1    B3        B3         B3    101
1    C3        C3         C3    276
1    D3        D3         D3    325
1    A4        A4         A4    145
1    B4        B4         B4    121
1    C4        C4         C4    283
1    D4        D4         D4    298
1    A5        A5         A5    360
1    B5        B5         B5    231
1    C5        C5         C5    430
1    D5        D5         D5    110
1    A6        A6         A6    342
1    B6        B6         B6    226
1    C6        C6         C6    413
1    D6        D6         D6    119
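
If you are curious what this reshaping amounts to, here is a minimal base R sketch. It is not bioset's implementation, just an illustration under the assumption that the plate is already available as a matrix. The names plate and long are made up for this example.

```r
# Raw plate values typed in by hand: rows A-D, columns 1-6.
plate <- matrix(
  c(102, 107, 156, 145, 360, 342,
    198, 203, 101, 121, 231, 226,
    296, 291, 276, 283, 430, 413,
    430, 386, 325, 298, 110, 119),
  nrow = 4, byrow = TRUE,
  dimnames = list(LETTERS[1:4], as.character(1:6))
)

# Walk the matrix column by column, so A1, B1, C1, D1 come before A2.
long <- data.frame(
  set = 1,
  position = paste0(
    rep(rownames(plate), times = ncol(plate)),
    rep(colnames(plate), each = nrow(plate))
  ),
  value = as.vector(plate),
  stringsAsFactors = FALSE
)

head(long, 5)
```

The first five rows hold the positions A1, B1, C1, D1 and A2, in the same order as the tibble above.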

set_read() automagically reads set_1.csv in your current directory. If you have more than one set use set_read(num = 2) to read set 2, etc.

If your files are called plate_1.csv, plate_2.csv, ... (or run_1.csv, run_2.csv, ...) you can set file_name = "plate_#NUM#.csv" (or "run_#NUM#.csv", ...).

If your files are stored in ./files/ tell set_read() where to look via path = "./files/".
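
To make the #NUM# placeholder a bit more tangible, here is a small sketch of how such a template could be turned into a file path. set_file() is a hypothetical helper written for this example; it is not part of bioset and only mimics the idea behind file_name and path.

```r
# Hypothetical helper (not part of bioset): build the path of set `num`
# by replacing the literal placeholder #NUM# in the file name template.
set_file <- function(num, file_name = "set_#NUM#.csv", path = ".") {
  file.path(path, sub("#NUM#", as.character(num), file_name, fixed = TRUE))
}

set_file(1)
# "./set_1.csv"
set_file(2, file_name = "plate_#NUM#.csv", path = "./files")
# "./files/plate_2.csv"
```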

Naming the values

Before feeding your samples into your measuring device you most likely drafted some sort of plan which position corresponds to which sample (didn't you?).

   1     2     3  4  5  6
A  CAL1  CAL1  A  A  B  B
B  CAL2  CAL2  C  C  D  D
C  CAL3  CAL3  E  E  F  F
D  CAL4  CAL4  G  G  H  H

So you had some calibrators (1-4) and samples A, B, C, D, E, F, G, H, each in duplicates.

To easily set the names for your samples just copy the names into your set_1.csv:

   1     2     3    4    5    6
A  102   107   156  145  360  342
B  198   203   101  121  231  226
C  296   291   276  283  430  413
D  430   386   325  298  110  119
E  CAL1  CAL1  A    A    B    B
F  CAL2  CAL2  C    C    D    D
G  CAL3  CAL3  E    E    F    F
H  CAL4  CAL4  G    G    H    H

Tell set_read() your data contains the names and which column should hold those names by setting additional_vars = c("name").

set_read(
  additional_vars = c("name")
)

This will get you:

set  position  sample_id  name  value
1    A1        CAL1       CAL1  102
1    B1        CAL2       CAL2  198
1    C1        CAL3       CAL3  296
1    D1        CAL4       CAL4  430
1    A2        CAL1       CAL1  107
1    B2        CAL2       CAL2  203
1    C2        CAL3       CAL3  291
1    D2        CAL4       CAL4  386
1    A3        A          A     156
1    B3        C          C     101
1    C3        E          E     276
1    D3        G          G     325
1    A4        A          A     145
1    B4        C          C     121
1    C4        E          E     283
1    D4        G          G     298
1    A5        B          B     360
1    B5        D          D     231
1    C5        F          F     430
1    D5        H          H     110
1    A6        B          B     342
1    B6        D          D     226
1    C6        F          F     413
1    D6        H          H     119

Encoding additional properties

Suppose samples A, B, C, D were taken at day 1 and E, F, G, H were taken from the same rats / individuals / patients on day 2.

It would be more elegant to encode that into the data:

   1     2     3    4    5    6
A  102   107   156  145  360  342
B  198   203   101  121  231  226
C  296   291   276  283  430  413
D  430   386   325  298  110  119
E  CAL1  CAL1  A_1  A_1  B_1  B_1
F  CAL2  CAL2  C_1  C_1  D_1  D_1
G  CAL3  CAL3  A_2  A_2  B_2  B_2
H  CAL4  CAL4  C_2  C_2  D_2  D_2

Now, tell set_read() your data contains the names and day by setting additional_vars = c("name", "day"). This will get you:

set_read(
  additional_vars = c("name", "day")
)

set  position  sample_id  name  day  value
1    A1        CAL1       CAL1  NA   102
1    B1        CAL2       CAL2  NA   198
1    C1        CAL3       CAL3  NA   296
1    D1        CAL4       CAL4  NA   430
1    A2        CAL1       CAL1  NA   107
1    B2        CAL2       CAL2  NA   203
1    C2        CAL3       CAL3  NA   291
1    D2        CAL4       CAL4  NA   386
1    A3        A_1        A     1    156
1    B3        C_1        C     1    101
1    C3        A_2        A     2    276
1    D3        C_2        C     2    325
1    A4        A_1        A     1    145
1    B4        C_1        C     1    121
1    C4        A_2        A     2    283
1    D4        C_2        C     2    298
1    A5        B_1        B     1    360
1    B5        D_1        D     1    231
1    C5        B_2        B     2    430
1    D5        D_2        D     2    110
1    A6        B_1        B     1    342
1    B6        D_1        D     1    226
1    C6        B_2        B     2    413
1    D6        D_2        D     2    119
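
The decoding step itself is easy to picture in base R. The following is only a sketch of the idea, assuming "_" as the separator. It does not claim to mirror what set_read() does internally, and labels, vars and decoded are names made up for this example.

```r
labels <- c("CAL1", "A_1", "C_1", "A_2", "C_2")
vars <- c("name", "day")

# Split every label at "_"; labels with fewer pieces are padded with NA.
parts <- strsplit(labels, "_", fixed = TRUE)
padded <- lapply(parts, function(p) p[seq_along(vars)])

decoded <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(decoded) <- vars
decoded$sample_id <- labels

decoded
```

Calibrators carry no day, so their day ends up as NA, just like in the table above. Note that in this sketch day stays a character column; convert it with as.integer() if you need numbers.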

Calculating concentrations

Probably, your measuring device only gave you raw values (extinction rates / relative light units / ...). You know the concentrations of CAL1, CAL2, CAL3 and CAL4. Conveniently, the raw values and the concentrations follow a linear relationship. To get the concentrations for the rest of the samples you need to interpolate between those calibrators.

set_calc_concentrations() does exactly this for you:

# data holds the tibble returned by set_read() above
data <- set_calc_concentrations(
  data,
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4) # ng / ml
)
data

set  position  sample_id  name  day  value  real  conc       recovery
1    A1        CAL1       CAL1  NA   102    1     1.0089686  1.0089686
1    B1        CAL2       CAL2  NA   198    2     1.9656203  0.9828102
1    C1        CAL3       CAL3  NA   296    3     2.9422023  0.9807341
1    D1        CAL4       CAL4  NA   430    4     4.2775286  1.0693822
1    A2        CAL1       CAL1  NA   107    1     1.0587942  1.0587942
1    B2        CAL2       CAL2  NA   203    2     2.0154459  1.0077230
1    C2        CAL3       CAL3  NA   291    3     2.8923767  0.9641256
1    D2        CAL4       CAL4  NA   386    4     3.8390633  0.9597658
1    A3        A_1        A     1    156    NA    1.5470852  NA
1    B3        C_1        C     1    101    NA    0.9990035  NA
1    C3        A_2        A     2    276    NA    2.7428999  NA
1    D3        C_2        C     2    325    NA    3.2311908  NA
1    A4        A_1        A     1    145    NA    1.4374689  NA
1    B4        C_1        C     1    121    NA    1.1983059  NA
1    C4        A_2        A     2    283    NA    2.8126557  NA
1    D4        C_2        C     2    298    NA    2.9621325  NA
1    A5        B_1        B     1    360    NA    3.5799701  NA
1    B5        D_1        D     1    231    NA    2.2944694  NA
1    C5        B_2        B     2    430    NA    4.2775286  NA
1    D5        D_2        D     2    110    NA    1.0886896  NA
1    A6        B_1        B     1    342    NA    3.4005979  NA
1    B6        D_1        D     1    226    NA    2.2446437  NA
1    C6        B_2        B     2    413    NA    4.1081216  NA
1    D6        D_2        D     2    119    NA    1.1783757  NA
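
If you want to check these numbers by hand, the linear case can be sketched in base R: fit a straight line through the calibrators with lm() and invert it. This is an illustration using the calibrator readings from the table above, not a copy of bioset's internals. The names cal, fit and to_conc are made up for this example.

```r
# Calibrator readings (from the table above) and their known concentrations.
cal <- data.frame(
  real  = c(1, 2, 3, 4, 1, 2, 3, 4),
  value = c(102, 198, 296, 430, 107, 203, 291, 386)
)

# Fit value = intercept + slope * real on the calibrators.
fit <- lm(value ~ real, data = cal)
intercept <- unname(coef(fit)[1])
slope <- unname(coef(fit)[2])

# Invert the line to turn raw values into concentrations.
to_conc <- function(value) (value - intercept) / slope

to_conc(c(156, 101))           # wells A3 and B3
to_conc(cal$value) / cal$real  # recovery of the calibrators
```

For the wells A3 and B3 this gives roughly 1.547 and 0.999, which matches the conc column above. The recovery is simply the calculated concentration divided by the known one.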

Your calibrators are not so linear? Perhaps they are after a ln-ln transformation? Then you can use model_func = fit_lnln and interpolate_func = interpolate_lnln. Basically, you can use any function as model_func that returns a model which is understood by your interpolate_func.
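
To illustrate what a ln-ln fit means, here is a base R sketch with made-up calibrators that follow a power law exactly. It only demonstrates the transformation and makes no claim about how fit_lnln and interpolate_lnln are implemented. The names cal, a, b and to_conc_lnln are hypothetical.

```r
# Made-up calibrators that follow value = 50 * real^1.5 exactly.
cal <- data.frame(real = c(1, 2, 4, 8))
cal$value <- 50 * cal$real^1.5

# After taking logs on both sides the relationship is a straight line:
# log(value) = a + b * log(real)
fit <- lm(log(value) ~ log(real), data = cal)
a <- unname(coef(fit)[1])
b <- unname(coef(fit)[2])

# Invert the line on the log scale and transform back.
to_conc_lnln <- function(value) exp((log(value) - a) / b)

to_conc_lnln(50 * 2.5^1.5)  # a reading that corresponds to a concentration of 2.5
```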

Duplicates / Triplicates / ...

So your samples were measured in duplicates. For your further research you might want to use the mean and perhaps exclude samples with too much spread in their values.

set_calc_variability() to the rescue.

data <- set_calc_variability(
  data = data,
  ids = sample_id,
  value,
  conc
)

This will give you the mean and coefficient of variation (as well as n of the samples and the standard deviation) for the columns value and conc. It will use sample_id to determine which rows belong to the same sample.

set  position  sample_id  name  day  value  real  conc       recovery   value_n  value_mean  value_sd   value_cv   conc_n  conc_mean  conc_sd    conc_cv
1    A1        CAL1       CAL1  NA   102    1     1.0089686  1.0089686  2        104.5       3.535534   0.0338329  2       1.033881   0.0352320  0.0340774
1    B1        CAL2       CAL2  NA   198    2     1.9656203  0.9828102  2        200.5       3.535534   0.0176336  2       1.990533   0.0352320  0.0176998
1    C1        CAL3       CAL3  NA   296    3     2.9422023  0.9807341  2        293.5       3.535534   0.0120461  2       2.917289   0.0352320  0.0120770
1    D1        CAL4       CAL4  NA   430    4     4.2775286  1.0693822  2        408.0       31.112698  0.0762566  2       4.058296   0.3100418  0.0763970
1    A2        CAL1       CAL1  NA   107    1     1.0587942  1.0587942  2        104.5       3.535534   0.0338329  2       1.033881   0.0352320  0.0340774
1    B2        CAL2       CAL2  NA   203    2     2.0154459  1.0077230  2        200.5       3.535534   0.0176336  2       1.990533   0.0352320  0.0176998
1    C2        CAL3       CAL3  NA   291    3     2.8923767  0.9641256  2        293.5       3.535534   0.0120461  2       2.917289   0.0352320  0.0120770
1    D2        CAL4       CAL4  NA   386    4     3.8390633  0.9597658  2        408.0       31.112698  0.0762566  2       4.058296   0.3100418  0.0763970
1    A3        A_1        A     1    156    NA    1.5470852  NA         2        150.5       7.778175   0.0516822  2       1.492277   0.0775105  0.0519411
1    B3        C_1        C     1    101    NA    0.9990035  NA         2        111.0       14.142136  0.1274066  2       1.098655   0.1409281  0.1282733
1    C3        A_2        A     2    276    NA    2.7428999  NA         2        279.5       4.949747   0.0177093  2       2.777778   0.0493248  0.0177569
1    D3        C_2        C     2    325    NA    3.2311908  NA         2        311.5       19.091883  0.0612902  2       3.096662   0.1902529  0.0614381
1    A4        A_1        A     1    145    NA    1.4374689  NA         2        150.5       7.778175   0.0516822  2       1.492277   0.0775105  0.0519411
1    B4        C_1        C     1    121    NA    1.1983059  NA         2        111.0       14.142136  0.1274066  2       1.098655   0.1409281  0.1282733
1    C4        A_2        A     2    283    NA    2.8126557  NA         2        279.5       4.949747   0.0177093  2       2.777778   0.0493248  0.0177569
1    D4        C_2        C     2    298    NA    2.9621325  NA         2        311.5       19.091883  0.0612902  2       3.096662   0.1902529  0.0614381
1    A5        B_1        B     1    360    NA    3.5799701  NA         2        351.0       12.727922  0.0362619  2       3.490284   0.1268353  0.0363395
1    B5        D_1        D     1    231    NA    2.2944694  NA         2        228.5       3.535534   0.0154728  2       2.269557   0.0352320  0.0155237
1    C5        B_2        B     2    430    NA    4.2775286  NA         2        421.5       12.020815  0.0285191  2       4.192825   0.1197889  0.0285700
1    D5        D_2        D     2    110    NA    1.0886896  NA         2        114.5       6.363961   0.0555804  2       1.133533   0.0634176  0.0559469
1    A6        B_1        B     1    342    NA    3.4005979  NA         2        351.0       12.727922  0.0362619  2       3.490284   0.1268353  0.0363395
1    B6        D_1        D     1    226    NA    2.2446437  NA         2        228.5       3.535534   0.0154728  2       2.269557   0.0352320  0.0155237
1    C6        B_2        B     2    413    NA    4.1081216  NA         2        421.5       12.020815  0.0285191  2       4.192825   0.1197889  0.0285700
1    D6        D_2        D     2    119    NA    1.1783757  NA         2        114.5       6.363961   0.0555804  2       1.133533   0.0634176  0.0559469
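
The statistics themselves are nothing exotic. As a rough base R sketch (not bioset's code), the per-sample n, mean, standard deviation and coefficient of variation for two of the duplicates above can be reproduced with ave(). The data frame dup is made up for this example.

```r
# Raw values of the duplicates A_1 and C_1, taken from the table above.
dup <- data.frame(
  sample_id = c("A_1", "C_1", "A_1", "C_1"),
  value = c(156, 101, 145, 121),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each sample_id and repeats the result on every
# row of that group, so duplicates end up with identical statistics.
dup$value_n    <- ave(dup$value, dup$sample_id, FUN = length)
dup$value_mean <- ave(dup$value, dup$sample_id, FUN = mean)
dup$value_sd   <- ave(dup$value, dup$sample_id, FUN = sd)
dup$value_cv   <- dup$value_sd / dup$value_mean

dup
```

The coefficient of variation is the standard deviation divided by the mean, which is why it is the handiest column for judging the spread between duplicates.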

The short way

If you need to read and transform multiple sets, sets_read() can do that for you.

It takes basically the same arguments as set_read(), set_calc_concentrations() and set_calc_variability() and combines their functionality. The principal difference is that sets_read() takes sets, the number of sets to process.

It returns a list and may (write_data = TRUE) create two files in your current directory: data_all.csv and data_samples.csv with the processed data.

sets_read()'s list holds the following items:

  • $all: here you will find all the data, including calibrators, duplicates, ... (saved in data_all.csv if write_data = TRUE)
  • $samples: only one row per distinct sample here, without calibrators and without duplicates. Most often you will work with this data (saved in data_samples.csv if write_data = TRUE)
  • $set1: a list
    • $plot: a plot showing you the function used to calculate the concentrations for this set. The points represent the calibrators.
    • $model: the model as returned by model_func
  • ($set2 - $setN): the same information for every set you have

Take a look at the data

# now you may run it :)
result_list <- sets_read(
  sets = 1,
  sep = ",",
  additional_vars = c("name", "day"),
  cal_names = c("CAL1", "CAL2", "CAL3", "CAL4"),
  cal_values = c(1, 2, 3, 4) # ng / ml
)
result_list$all

set  position  sample_id  name  day  value  real  recovery   n  raw  raw_mean  raw_sd     raw_cv     concentration  concentration_sd  concentration_cv
1    A1        CAL1       CAL1  NA   102    1     1.0089686  2  102  104.5     3.535534   0.0338329  1.033881       0.0352320         0.0340774
1    B1        CAL2       CAL2  NA   198    2     0.9828102  2  198  200.5     3.535534   0.0176336  1.990533       0.0352320         0.0176998
1    C1        CAL3       CAL3  NA   296    3     0.9807341  2  296  293.5     3.535534   0.0120461  2.917289       0.0352320         0.0120770
1    D1        CAL4       CAL4  NA   430    4     1.0693822  2  430  408.0     31.112698  0.0762566  4.058296       0.3100418         0.0763970
1    A2        CAL1       CAL1  NA   107    1     1.0587942  2  107  104.5     3.535534   0.0338329  1.033881       0.0352320         0.0340774
1    B2        CAL2       CAL2  NA   203    2     1.0077230  2  203  200.5     3.535534   0.0176336  1.990533       0.0352320         0.0176998
1    C2        CAL3       CAL3  NA   291    3     0.9641256  2  291  293.5     3.535534   0.0120461  2.917289       0.0352320         0.0120770
1    D2        CAL4       CAL4  NA   386    4     0.9597658  2  386  408.0     31.112698  0.0762566  4.058296       0.3100418         0.0763970
1    A3        A_1        A     1    156    NA    NA         2  156  150.5     7.778175   0.0516822  1.492277       0.0775105         0.0519411
1    B3        C_1        C     1    101    NA    NA         2  101  111.0     14.142136  0.1274066  1.098655       0.1409281         0.1282733
1    C3        A_2        A     2    276    NA    NA         2  276  279.5     4.949747   0.0177093  2.777778       0.0493248         0.0177569
1    D3        C_2        C     2    325    NA    NA         2  325  311.5     19.091883  0.0612902  3.096662       0.1902529         0.0614381
1    A4        A_1        A     1    145    NA    NA         2  145  150.5     7.778175   0.0516822  1.492277       0.0775105         0.0519411
1    B4        C_1        C     1    121    NA    NA         2  121  111.0     14.142136  0.1274066  1.098655       0.1409281         0.1282733
1    C4        A_2        A     2    283    NA    NA         2  283  279.5     4.949747   0.0177093  2.777778       0.0493248         0.0177569
1    D4        C_2        C     2    298    NA    NA         2  298  311.5     19.091883  0.0612902  3.096662       0.1902529         0.0614381
1    A5        B_1        B     1    360    NA    NA         2  360  351.0     12.727922  0.0362619  3.490284       0.1268353         0.0363395
1    B5        D_1        D     1    231    NA    NA         2  231  228.5     3.535534   0.0154728  2.269557       0.0352320         0.0155237
1    C5        B_2        B     2    430    NA    NA         2  430  421.5     12.020815  0.0285191  4.192825       0.1197889         0.0285700
1    D5        D_2        D     2    110    NA    NA         2  110  114.5     6.363961   0.0555804  1.133533       0.0634176         0.0559469
1    A6        B_1        B     1    342    NA    NA         2  342  351.0     12.727922  0.0362619  3.490284       0.1268353         0.0363395
1    B6        D_1        D     1    226    NA    NA         2  226  228.5     3.535534   0.0154728  2.269557       0.0352320         0.0155237
1    C6        B_2        B     2    413    NA    NA         2  413  421.5     12.020815  0.0285191  4.192825       0.1197889         0.0285700
1    D6        D_2        D     2    119    NA    NA         2  119  114.5     6.363961   0.0555804  1.133533       0.0634176         0.0559469

result_list$samples

position  sample_id  name  day  plate  n  raw    raw_sd     raw_cv     concentration  concentration_sd  concentration_cv
A3        A_1        A     1    1      2  150.5  7.778175   0.0516822  1.492277       0.0775105         0.0519411
B3        C_1        C     1    1      2  111.0  14.142136  0.1274066  1.098655       0.1409281         0.1282733
C3        A_2        A     2    1      2  279.5  4.949747   0.0177093  2.777778       0.0493248         0.0177569
D3        C_2        C     2    1      2  311.5  19.091883  0.0612902  3.096662       0.1902529         0.0614381
A5        B_1        B     1    1      2  351.0  12.727922  0.0362619  3.490284       0.1268353         0.0363395
B5        D_1        D     1    1      2  228.5  3.535534   0.0154728  2.269557       0.0352320         0.0155237
C5        B_2        B     2    1      2  421.5  12.020815  0.0285191  4.192825       0.1197889         0.0285700
D5        D_2        D     2    1      2  114.5  6.363961   0.0555804  1.133533       0.0634176         0.0559469

result_list$set1$plot
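
Once you have one row per sample, excluding samples with too much spread is a one-liner. The sketch below uses a hand-typed excerpt of the samples table and an arbitrary cut-off of 10 % for the coefficient of variation. Both the cut-off and the names samples, max_cv and kept are assumptions made for this example; pick whatever threshold your assay calls for.

```r
# Hand-typed excerpt of the samples table shown above.
samples <- data.frame(
  sample_id = c("A_1", "C_1", "A_2", "C_2"),
  concentration = c(1.492277, 1.098655, 2.777778, 3.096662),
  concentration_cv = c(0.0519411, 0.1282733, 0.0177569, 0.0614381),
  stringsAsFactors = FALSE
)

max_cv <- 0.10  # arbitrary cut-off, adjust to your assay

# Keep only samples whose duplicates agree well enough.
kept <- samples[samples$concentration_cv <= max_cv, ]
kept$sample_id
```

With this cut-off C_1 is dropped, because its duplicates (101 and 121) differ by almost 13 %.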

Version

0.2.3

License

MIT + file LICENSE

Maintainer

Eike Christian Kühn

Last Published

November 13th, 2018

Functions in bioset (0.2.3)

  • convert_conc: Convert a value of the given concentration into another concentration.
  • calc_factor_prefix: Get a factor to convert metric prefixes.
  • sets_read: Read sets and calculate concentrations and variability.
  • set_read: Read a data set from a data-sheet and turn it into a multi-column tibble.
  • models_linear: Linear model functions.
  • models_lnln: Model functions for data requiring ln-ln-transformation to fit a model.
  • set_calc_concentrations: Calculate concentrations for the set using contained calibrators.
  • set_calc_variability: Calculate parameters of variability for a given set of values.
  • convert_prefix: Convert between metric prefixes.
  • bioset-package: Convert a matrix of raw values into nice and tidy data.
  • calc_factor_conc: Get a factor to convert concentrations.