diffCsv: Diff CSV Files

Description

Reads CSV files with read.csv and passes the resulting data frames on to diffPrint. The values in extra are passed as arguments to both read.csv and print. If you need different extra arguments for each of those functions, read the files with read.csv yourself and pass the resulting data frames to diffPrint.
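The "do it yourself" route can look like the sketch below. It assumes the diffobj package is installed. The data, file names and the particular arguments (stringsAsFactors for read.csv, digits for print) are illustrative choices. A print-only argument such as digits cannot be sent through diffCsv's extra, because read.csv would reject it as an unused argument.

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# Arguments meant only for read.csv are given here ...
tar <- read.csv(f1, stringsAsFactors = TRUE)
cur <- read.csv(f2, stringsAsFactors = TRUE)

# ... and arguments meant only for print go in `extra`
res <- diffPrint(tar, cur, extra = list(digits = 3), pager = "off")
unlink(c(f1, f2))
res  # display the diff
```

The result is the same kind of Diff object diffCsv returns; only the way the inputs are read differs.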

Usage

diffCsv(target, current, ...)
# S4 method for ANY
diffCsv(target, current, mode = gdo("mode"),
  context = gdo("context"), format = gdo("format"),
  brightness = gdo("brightness"), color.mode = gdo("color.mode"),
  word.diff = gdo("word.diff"), pager = gdo("pager"),
  guides = gdo("guides"), trim = gdo("trim"), rds = gdo("rds"),
  unwrap.atomic = gdo("unwrap.atomic"), max.diffs = gdo("max.diffs"),
  disp.width = gdo("disp.width"),
  ignore.white.space = gdo("ignore.white.space"),
  convert.hz.white.space = gdo("convert.hz.white.space"),
  tab.stops = gdo("tab.stops"), line.limit = gdo("line.limit"),
  hunk.limit = gdo("hunk.limit"), align = gdo("align"),
  style = gdo("style"), palette.of.styles = gdo("palette"),
  frame = par_frame(), interactive = gdo("interactive"),
  term.colors = gdo("term.colors"), tar.banner = NULL, cur.banner = NULL,
  extra = list())

Arguments

target

character(1L) or file connection with read capability; if character, should point to a CSV file

current

like target

...

unused, for compatibility of methods with generics

mode

character(1L), one of:

“unified”: diff mode used by git diff
“sidebyside”: line up the differences side by side
“context”: show the target and current hunks in their entirety; this mode takes up a lot of screen space but makes it easier to see what the objects actually look like
“auto”: default mode; picks one of the above, favoring “sidebyside” unless getOption("width") is less than 80, or, in diffPrint, the objects are dimensioned and do not fit side by side, or, in diffChr, diffDeparse, and diffFile, the output does not fit side by side without wrapping
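To bypass the “auto” choice, pass one of the other modes explicitly. A minimal sketch, assuming diffobj is installed; the data and variable names are illustrative:

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# Force a git-style unified diff rather than letting "auto" choose
res.unified <- diffCsv(f1, f2, mode = "unified", pager = "off")
# Show the target and current hunks in their entirety
res.context <- diffCsv(f1, f2, mode = "context", pager = "off")
unlink(c(f1, f2))
```

Printing either object (for example by typing res.unified at the console) displays the diff.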

context

integer(1L), how many lines of context are shown on either side of differences (defaults to 2). Set to -1L to show all available context. Set to “auto” to display as many as 10 lines or as few as 1, depending on whether the total screen lines fit within the number of lines specified in line.limit. Alternatively, pass the return value of auto_context to fine-tune the parameters of the automatic context calculation.
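The three common ways of setting context are sketched below, assuming diffobj is installed; the data, the line.limit value and the variable names are illustrative:

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

res.one  <- diffCsv(f1, f2, context = 1L, pager = "off")   # one line either side
res.all  <- diffCsv(f1, f2, context = -1L, pager = "off")  # all available context
# "auto": between 1 and 10 lines of context, chosen to fit within line.limit
res.auto <- diffCsv(f1, f2, context = "auto", line.limit = 30L, pager = "off")
unlink(c(f1, f2))
```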

format

character(1L), controls the diff output format, one of:

“auto”: selects the output format based on terminal capabilities; will attempt to use one of the ANSI formats if they appear to be supported, and if not, will attempt to use HTML and browser output if in interactive mode.
“raw”: plain text
“ansi8”: color and format diffs using basic ANSI escape sequences
“ansi256”: like “ansi8”, except using the full range of ANSI formatting options
“html”: color and format using HTML markup; the resulting string is processed with enc2utf8 when output as a full web page (see docs for html.output under Style).

Defaults to “auto”. See palette.of.styles for details on customization, style for full control of output format.
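Setting format explicitly skips terminal detection. A minimal sketch, assuming diffobj is installed; the data and variable names are illustrative:

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

res.raw  <- diffCsv(f1, f2, format = "raw", pager = "off")      # plain text
res.ansi <- diffCsv(f1, f2, format = "ansi256", pager = "off")  # full-range ANSI styling
unlink(c(f1, f2))
```

“raw” is the safe choice when the output is written to a log or file; the ANSI formats only render correctly in a terminal that understands the escape sequences.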

brightness

character, one of “light”, “dark”, “neutral”, useful for adjusting the color scheme to light or dark terminals. “neutral” by default. See PaletteOfStyles for details and limitations. Advanced: you may specify brightness as a function of format. For example, if you typically wish to use a “dark” color scheme, except when in “html” format, where you prefer the “light” scheme, you may use c("dark", html="light") as the value for this parameter. This is particularly useful if format is set to “auto” or if you want to specify a default value for this parameter via options. Any names you use should correspond to a format. You must have one unnamed value, which will be used as the default for all formats that are not explicitly specified.

color.mode

character, one of “rgb” or “yb”. Defaults to “yb”. “yb” stands for “Yellow-Blue” for color schemes that rely primarily on those colors to style diffs. Those colors can be easily distinguished by individuals with limited red-green color sensitivity. See PaletteOfStyles for details and limitations. Also offers the same advanced usage as the brightness parameter.
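The per-format form described for brightness works the same way for color.mode. A minimal sketch, assuming diffobj is installed; the data, variable names and the particular combination of values are illustrative:

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# Unnamed values are the defaults for every format; the "html" entries
# apply only when the output format is "html"
res <- diffCsv(f1, f2, format = "ansi256",
  brightness = c("dark", html = "light"),
  color.mode = c("yb", html = "rgb"),
  pager = "off")
unlink(c(f1, f2))
```

Because format is "ansi256" here, only the unnamed values (“dark”, “yb”) take effect; the named entries matter when the same call renders as HTML.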

word.diff

TRUE (default) or FALSE, whether to run a secondary word diff on the in-hunk differences. For atomic vectors, setting this to FALSE could make the diff slower (see the unwrap.atomic parameter). For other uses, particularly with diffChr, setting this to FALSE can substantially improve performance.
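Comparing the two settings on the same files is a quick way to judge whether the word diff is worth its cost. A minimal sketch, assuming diffobj is installed; the data and variable names are illustrative:

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

res.word <- diffCsv(f1, f2, pager = "off")                     # default: word diff inside hunks
res.line <- diffCsv(f1, f2, word.diff = FALSE, pager = "off")  # line-level differences only
unlink(c(f1, f2))
```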

pager

one of “auto” (default), “on”, “off”, or a Pager object; controls whether and how a pager is used to display the diff output. If “on”, uses the pager associated with the Style specified via the style parameter. The default behavior is to use a pager if either the R console does not support ANSI colors, or the output of the diff* methods would be taller than one screen. If the system pager is not known to support ANSI colors, the output is displayed as HTML in the IDE viewer if available, or in the web browser if not. See Pager, view_or_browse, Style, and PaletteOfStyles for more details and for instructions on how to modify the default behavior.

guides

TRUE (default), FALSE, or a function that accepts at least two arguments and requires no more than two arguments. Guides are additional context lines that are not strictly part of a hunk, but provide important contextual data (e.g. column headers). If TRUE, the guide lines are shown in addition to the normal diff output, typically in a different color to indicate they are not part of the hunk. If a function, it should accept the object being diffed as its first argument and the character representation of that object as its second. The function should return the indices of the elements of that character representation that should be treated as guides. See guides for more details.
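A custom guide function only has to map the printed lines to indices. The sketch below assumes diffobj is installed; header_guides is a hypothetical helper, not part of diffobj, that marks every printed line containing the "Sepal.Length" column header as a guide.

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# Hypothetical guide function: first argument is the object being diffed,
# second is its character representation; returns integer line indices
header_guides <- function(obj, obj.as.chr)
  which(grepl("Sepal.Length", obj.as.chr, fixed = TRUE))

res.fun  <- diffCsv(f1, f2, guides = header_guides, pager = "off")
res.none <- diffCsv(f1, f2, guides = FALSE, pager = "off")  # no guide lines
unlink(c(f1, f2))
```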

trim

TRUE (default), FALSE, or a function that accepts at least two arguments and requires no more than two arguments. The function should compute, for each line in the captured output, what portion of that line should be diffed. By default, this is used to remove row metadata (e.g. [1,]) so that differences in it alone do not show up in the diff. See trim for more details.
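Turning trimming off makes every character of each printed line part of the comparison. A minimal sketch, assuming diffobj is installed; the data and variable names are illustrative:

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

res.trim   <- diffCsv(f1, f2, pager = "off")                # default trimming (trim = TRUE)
res.notrim <- diffCsv(f1, f2, trim = FALSE, pager = "off")  # whole lines, row labels included
unlink(c(f1, f2))
```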

rds

TRUE (default) or FALSE; if TRUE, checks whether target and/or current point to a file that can be read with readRDS and, if so, loads the R object contained in the file and carries out the diff on the object instead of the original argument. Currently there is no mechanism for specifying additional arguments to readRDS.

unwrap.atomic

TRUE (default) or FALSE. Only relevant for diffPrint: if TRUE, and word.diff is also TRUE, and both target and current are unnamed and atomic, the vectors are unwrapped and diffed element by element, and then re-wrapped. Since diffPrint is fundamentally a line diff, the re-wrapped lines are lined up in a manner that is as consistent as possible with the unwrapped diff. Lines that contain the location of the word differences will be paired up. Since the vectors may well be wrapped with different periodicities, this can produce paired lines that look like they should not be paired up, though the locations of the differences should be. It is entirely possible that setting this parameter to FALSE will result in a slower diff. This happens if two vectors are actually fairly similar, but their line representations are not. For example, in comparing 1:100 to c(100, 1:99), there is really only one difference at the “word” level, but every screen line is different.
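The comparison from the paragraph above can be run both ways to see the effect. A minimal sketch, assuming diffobj is installed; the variable names are illustrative:

```r
library(diffobj)

# Only one "word" differs, but every screen line of the printed vectors does
res.unwrap <- diffPrint(1:100, c(100, 1:99), pager = "off")  # default: unwrapped, element-wise diff
res.lines  <- diffPrint(1:100, c(100, 1:99), unwrap.atomic = FALSE, pager = "off")
```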

max.diffs

integer(1L), number of differences after which we abandon the O(n^2) diff algorithm in favor of a linear one. Set to -1L to always stick to the original algorithm (defaults to 10000L).

disp.width

integer(1L), number of display columns to take up; note that in “sidebyside” mode the effective display width is half this number (set to 0L to use the default widths, which are getOption("width") for normal styles and 80L for HTML styles).

ignore.white.space

TRUE or FALSE, whether to ignore differences in horizontal whitespace (i.e. spaces and tabs) when computing the diff (defaults to FALSE)

convert.hz.white.space

TRUE or FALSE, whether to modify input strings that contain tabs and carriage returns so that they display as they would with those characters, but without using those characters (defaults to TRUE). The conversion assumes that tab stops are spaced evenly eight characters apart on the terminal. If this is not the case, you may specify the tab stops explicitly with tab.stops.

tab.stops

integer, what tab stops to use when converting hard tabs to spaces. Non-integer values are coerced to integer (defaults to 8L). You may specify more than one tab stop. If the display width exceeds that addressable by your tab stops, the last tab stop will be repeated.

line.limit

integer(2L) or integer(1L), if length 1 how many lines of output to show, where -1 means no limit. If length 2, the first value indicates the threshold of screen lines to begin truncating output, and the second the number of lines to truncate to, which should be fewer than the threshold. Note that this parameter is implemented on a best-efforts basis and should not be relied on to produce the exact number of lines requested. If you want a specific number of lines use [ or head / tail. One advantage of line.limit over these other options is that you can combine it with context="auto" and auto max.level selection (the latter for diffStr), which allows the diff to dynamically adjust to make best use of the available display lines. [, head, and tail just subset the text of the output.

hunk.limit

integer(2L) or integer(1L), how many diff hunks to show. Behaves similarly to line.limit. The number of hunks in a particular diff depends on how many differences there are and on how much context is used, since context can cause two hunks to bleed into each other and become one.
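The limits are easier to see with several separated differences. A minimal sketch, assuming diffobj is installed; the data, the limit values and the variable names are illustrative, and, as noted for line.limit, the truncation is best-effort rather than exact.

```r
library(diffobj)

# Illustrative data: three widely separated differences
iris.2 <- iris
iris.2$Sepal.Length[c(5, 60, 120)] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# Once output exceeds 20 screen lines, truncate it to about 10
res.lines <- diffCsv(f1, f2, line.limit = c(20L, 10L), pager = "off")
# Show at most one hunk
res.hunk  <- diffCsv(f1, f2, hunk.limit = 1L, pager = "off")
# Let context = "auto" shrink the context to fit within 20 lines
res.auto  <- diffCsv(f1, f2, context = "auto", line.limit = 20L, pager = "off")
unlink(c(f1, f2))
```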

align

numeric(1L) between 0 and 1, proportion of words in a line of target that must be matched in a line of current in the same hunk for those lines to be paired up when displayed (defaults to 0.25), or an AlignThreshold object. Set to 1 to turn off alignment, which will cause all lines in a hunk from target to show up first, followed by all lines from current. Note that to be aligned, lines must meet the threshold and have at least 3 matching alphanumeric characters (see AlignThreshold for details).
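A stricter or looser threshold changes only how lines within a hunk are paired for display. A minimal sketch, assuming diffobj is installed; the data, thresholds and variable names are illustrative:

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

res.default <- diffCsv(f1, f2, pager = "off")                # default threshold
res.strict  <- diffCsv(f1, f2, align = 0.75, pager = "off")  # 75% of words must match
res.off     <- diffCsv(f1, f2, align = 1, pager = "off")     # alignment turned off
unlink(c(f1, f2))
```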

style

“auto”, a Style object, or a list; “auto” by default. If a Style object, it overrides the format, brightness, and color.mode parameters. The Style object provides full control of diff output styling. If a list, then the same as “auto”, except that if the auto-selected Style requires instantiation (see PaletteOfStyles), the list contents will be used as arguments when instantiating the style object. See Style for more details, in particular the examples.
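Passing a Style object directly fixes the output styling regardless of terminal detection. The sketch assumes diffobj is installed and exports the StyleRaw generator for plain-text output (check the Style documentation for the exact set of style classes in your version); the data and variable names are illustrative.

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# A Style object takes precedence over format, brightness and color.mode
res <- diffCsv(f1, f2, style = StyleRaw(), pager = "off")
unlink(c(f1, f2))
```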

palette.of.styles

PaletteOfStyles object; advanced usage, contains all the Style objects or “classRepresentation” objects extending Style that are selected by specifying the format, brightness, and color.mode parameters. See PaletteOfStyles for more details.

frame

an environment to use as the evaluation frame for the print/show/str calls and, for diffObj, as the evaluation frame for the diffPrint/diffStr calls. Defaults to the return value of par_frame.

interactive

TRUE or FALSE, whether the function is being run in interactive mode; defaults to the return value of interactive. If in interactive mode, the pager will be used if pager is “auto”, and if ANSI styles are not supported and style is “auto”, output will be sent to the viewer/browser as HTML.

term.colors

integer(1L) how many ANSI colors are supported by the terminal. This variable is provided for when crayon::num_colors does not properly detect how many ANSI colors are supported by your terminal. Defaults to return value of crayon::num_colors and should be 8 or 256 to allow ANSI colors, or any other number to disallow them. This only impacts output format selection when style and format are both set to “auto”.

tar.banner

character(1L), language, or NULL, used to generate the text to display ahead of the diff section representing the target output. If NULL will use the deparsed target expression, if language, will use the language as it would the target expression, if character(1L), will use the string with no modifications. The language mode is provided because diffStr modifies the expression prior to display (e.g. by wrapping it in a call to str). Note that it is possible in some cases that the substituted value of target actually is character(1L), but if you provide a character(1L) value here it will be assumed you intend to use that value literally.

cur.banner

character(1L) like tar.banner, but for current
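Since diffCsv is called with file paths, the default banners show those paths; literal strings usually read better. A minimal sketch, assuming diffobj is installed; the data, banner text and variable names are illustrative:

```r
library(diffobj)

# Illustrative data: iris and a copy that differs in one cell
iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(iris, f1, row.names = FALSE)
write.csv(iris.2, f2, row.names = FALSE)

# character(1L) banners are used literally
res <- diffCsv(f1, f2, tar.banner = "iris.csv (original)",
  cur.banner = "iris.csv (edited)", pager = "off")
unlink(c(f1, f2))
```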

extra

list, additional arguments to pass on to the functions used to create the text representation of the objects to diff (e.g. print, str, etc.); for diffCsv these are passed to both read.csv and print.

Value

a Diff object; see diffPrint.

Examples

iris.2 <- iris
iris.2$Sepal.Length[5] <- 99
f1 <- tempfile()
f2 <- tempfile()
write.csv(iris, f1, row.names=FALSE)
write.csv(iris.2, f2, row.names=FALSE)
## `pager="off"` for CRAN compliance; you may omit in normal use
diffCsv(f1, f2, pager="off")
unlink(c(f1, f2))