thinkr

{thinkr} is a set of tools for cleaning up messy files.

It contains tools for cleaning up messy ‘Excel’ files so that they are suitable for R. People who have worked with ‘Excel’ for years have built more or less complicated sheets whose names, characters, and formats are not homogeneous. To make these sheets usable in R, we built a set of functions that avoid the majority of import problems while preserving as much of the data as possible.

Installation

CRAN version

install.packages("thinkr")

GitHub development version

# install.packages("devtools")
devtools::install_github("ThinkR-open/thinkr")

Once installed, you can load {thinkr}:

library(thinkr)

or without the package startup message:

suppressPackageStartupMessages(library(thinkr))

Usage

peep

The peep function prints intermediate outputs inside a {dplyr}/%>% pipeline and passes the data on unchanged to the next step.

library(dplyr) # provides rename(), used below
data(iris)
# just symbols
iris %>%
  peep(head, tail) %>%
  rename(species = Species) %>%
  summary()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 145          6.7         3.3          5.7         2.5 virginica
#> 146          6.7         3.0          5.2         2.3 virginica
#> 147          6.3         2.5          5.0         1.9 virginica
#> 148          6.5         3.0          5.2         2.0 virginica
#> 149          6.2         3.4          5.4         2.3 virginica
#> 150          5.9         3.0          5.1         1.8 virginica
#>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#>  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
#>  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
#>  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
#>  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
#>  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
#>  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
#>        species  
#>  setosa    :50  
#>  versicolor:50  
#>  virginica :50  
#>                 
#>                 
#> 
# expressions with .
iris %>%
  peep(head(., n = 2), tail(., n = 3)) %>%
  summary()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 148          6.5         3.0          5.2         2.0 virginica
#> 149          6.2         3.4          5.4         2.3 virginica
#> 150          5.9         3.0          5.1         1.8 virginica
#>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#>  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
#>  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
#>  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
#>  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
#>  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
#>  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
#>        Species  
#>  setosa    :50  
#>  versicolor:50  
#>  virginica :50  
#>                 
#>                 
#> 
# or both
iris %>%
  peep(head, tail(., n = 3)) %>%
  summary()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 148          6.5         3.0          5.2         2.0 virginica
#> 149          6.2         3.4          5.4         2.3 virginica
#> 150          5.9         3.0          5.1         1.8 virginica
#>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#>  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
#>  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
#>  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
#>  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
#>  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
#>  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
#>        Species  
#>  setosa    :50  
#>  versicolor:50  
#>  virginica :50  
#>                 
#>                 
#> 
# use verbose to see what happens
iris %>%
  peep(head, tail(., n = 3), verbose = TRUE) %>%
  summary()
#> head(.)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
#> tail(., n = 3)
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 148          6.5         3.0          5.2         2.0 virginica
#> 149          6.2         3.4          5.4         2.3 virginica
#> 150          5.9         3.0          5.1         1.8 virginica
#>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#>  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
#>  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
#>  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
#>  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
#>  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
#>  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
#>        Species  
#>  setosa    :50  
#>  versicolor:50  
#>  virginica :50  
#>                 
#>                 
#> 
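
To make the idea behind peep concrete, here is a minimal base R sketch of a "print and pass through" helper. It is not the {thinkr} implementation: the name peep_sketch is hypothetical, it accepts only functions (not expressions with .), and it has no verbose argument.

```r
# Hypothetical helper, not part of {thinkr}: apply each function to the data,
# print the result, then return the original data unchanged so that a
# pipeline could continue from it.
peep_sketch <- function(data, ...) {
  for (f in list(...)) {
    print(f(data))
  }
  data
}

# Prints the first six rows, then the last three, then summarises the
# untouched data.
res <- peep_sketch(iris, head, function(d) tail(d, n = 3))
summary(res)
```

Because the helper returns its input as-is, it can sit between any two steps of a pipeline without changing the result.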

clean_*

The clean_names function cleans messy column names by removing special characters, spaces, etc.

data(iris)

iris %>% head()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
iris %>%
  clean_names() %>%
  head()
#>   sepal_length sepal_width petal_length petal_width species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

The clean_vec function cleans character vectors in the same way, removing special characters, spaces, etc.

vector <- c("Jean Sébastien", "Anne-Sophie", "44@Bernard2")
cleaned <- clean_vec(vector)
cleaned
#> [1] "jean_sebastien" "anne_sophie"    "x44_bernard2"
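
The kind of normalisation shown above can be approximated in base R. The sketch below is an assumption about the general approach, not the code {thinkr} uses: clean_sketch is a hypothetical name, and it handles ASCII input only (it does not transliterate accented characters such as "é", which clean_vec does).

```r
# Hypothetical helper, not part of {thinkr}: lower-case, replace runs of
# non-alphanumeric characters with "_", trim stray underscores, and prefix
# values that start with a digit with "x".
clean_sketch <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  x <- gsub("^_+|_+$", "", x)
  ifelse(grepl("^[0-9]", x), paste0("x", x), x)
}

clean_sketch(c("Anne-Sophie", "44@Bernard2", " First Name "))
#> [1] "anne_sophie"  "x44_bernard2" "first_name"

# The same helper applied to column names
names(iris) <- clean_sketch(names(iris))
names(iris)
#> [1] "sepal_length" "sepal_width"  "petal_length" "petal_width"  "species"
data(iris) # restore the original dataset
```

Prefixing a leading digit with "x" keeps the result a syntactically valid R name, which matters once the values are used as column names.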

Excel positions

Convert a column number to its Excel column name, and vice versa:

ncol_to_excel(6)
#> [1] "F"
excel_to_ncol("AF")
#> [1] 32
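
Excel column names form a base-26 numbering without a zero digit ("A" = 1, "Z" = 26, "AA" = 27). The sketch below illustrates that conversion in base R. The function names are hypothetical, and this is not necessarily how {thinkr} implements ncol_to_excel and excel_to_ncol.

```r
# Hypothetical helpers, not part of {thinkr}.

# Column number -> Excel name: repeatedly take the remainder of (n - 1) by 26,
# because the letters run from 1 to 26 rather than from 0 to 25.
num_to_excel_sketch <- function(n) {
  stopifnot(length(n) == 1, n >= 1, n == round(n))
  out <- character(0)
  while (n > 0) {
    r <- (n - 1) %% 26
    out <- c(LETTERS[r + 1], out)
    n <- (n - 1) %/% 26
  }
  paste(out, collapse = "")
}

# Excel name -> column number: each letter is a digit weighted by a power of 26.
excel_to_num_sketch <- function(name) {
  chars <- strsplit(toupper(name), "")[[1]]
  digits <- match(chars, LETTERS)
  stopifnot(!anyNA(digits))
  sum(digits * 26^(rev(seq_along(digits)) - 1))
}

num_to_excel_sketch(6)
#> [1] "F"
num_to_excel_sketch(27)
#> [1] "AA"
excel_to_num_sketch("AF")
#> [1] 32
```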

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Monthly Downloads

295

Version

0.16

License

GPL-3

Maintainer

Vincent Guyader

Last Published

August 22nd, 2022

Functions in thinkr (0.16)

%ni%: Not in
is_likert: Is a factor a Likert scale?
look_like_a_number: Return TRUE if this looks like a number
replace_pattern: Replace a pattern everywhere in a data.frame
%>%: Pipe operator
gsub2: Like gsub, but keeps a factor as a factor
is.12: Does this vector contain only 1 and 2?
is.01: Does this vector contain only 0 and 1?
is_full_figures: Predicate for a character vector full of figures
is_full_na: Predicate for a full-NA vector
thinkr-package: thinkr: Tools for Cleaning Up Messy Files
make_unique: make.unique improvement
save_as_csv: Export a data.frame to CSV
as_mon_numeric: Transform a vector into numeric if meaningful, even with a bad decimal separator, spaces, or %
set_col_type: Set a given column type for each column in a data.frame
peep: Peep the pipeline
all_ggplot_to_pptx: Save all ggplots in a pptx
.efface_test: Delete the .test file in the testthat folder
clean_vec: Clean a character vector
dput_levels: Return the R instruction to create levels
excel_names: Get the position or Excel name of a column
clean_levels: Clean level labels
clean_names: Clean names
from_excel_to_posixt: Transform the Excel numeric date format into POSIXct
find_name: Find a pattern in a dataset's names