formatC
Formatting Using C-style Formats
formatC() formats numbers individually and flexibly using C style format specifications.
prettyNum() is used for “prettifying” (possibly formatted) numbers, also in format.default.
.format.zeros(x), an auxiliary function of prettyNum(), re-formats the zeros in a vector x of formatted numbers.
Usage
formatC(x, digits = NULL, width = NULL,
        format = NULL, flag = "", mode = NULL,
        big.mark = "", big.interval = 3L,
        small.mark = "", small.interval = 5L,
        decimal.mark = getOption("OutDec"),
        preserve.width = "individual",
        zero.print = NULL, replace.zero = TRUE,
        drop0trailing = FALSE)

prettyNum(x, big.mark = "", big.interval = 3L,
          small.mark = "", small.interval = 5L,
          decimal.mark = getOption("OutDec"), input.d.mark = decimal.mark,
          preserve.width = c("common", "individual", "none"),
          zero.print = NULL, replace.zero = FALSE,
          drop0trailing = FALSE, is.cmplx = NA,
          ...)

.format.zeros(x, zero.print, nx = suppressWarnings(as.numeric(x)),
              replace = FALSE, warn.non.fitting = TRUE)
Arguments
- x
an atomic numerical or character object, possibly complex only for prettyNum(), typically a vector of real numbers. Any class is discarded, with a warning.
- digits
the desired number of digits after the decimal point (format = "f") or significant digits (format = "g", = "e" or = "fg").
Default: 2 for integer, 4 for real numbers. If less than 0, the C default of 6 digits is used. If specified as more than 50, 50 will be used with a warning unless format = "f" where it is limited to typically 324. (Not more than 15-21 digits need be accurate, depending on the OS and compiler used. This limit is just a precaution against segfaults in the underlying C runtime.)
- width
the total field width; if both digits and width are unspecified, width defaults to 1, otherwise to digits + 1. width = 0 will use width = digits, width < 0 means left justify the number in this field (equivalent to flag = "-"). If necessary, the result will have more characters than width. For character data this is interpreted in characters (not bytes nor display width).
- format
equal to "d" (for integers), "f", "e", "E", "g", "G", "fg" (for reals), or "s" (for strings). Default is "d" for integers, "g" for reals.
"f" gives numbers in the usual xxx.xxx format; "e" and "E" give n.ddde+nn or n.dddE+nn (scientific format); "g" and "G" put x[i] into scientific format only if it saves space to do so and drop trailing zeros and decimal point, unless flag contains "#" which keeps trailing zeros for the "g", "G" formats.
"fg" (our own hybrid format) uses fixed format as "f", but digits as the minimum number of significant digits. This can lead to quite long result strings, see examples below. Note that unlike signif this prints large numbers with more significant digits than digits. Trailing zeros are dropped in this format, unless flag contains "#".
- flag
for formatC, a character string giving a format modifier as in Kernighan and Ritchie (1988, page 243) or the C99 standard.
  "0"  pads with leading zeros;
  "-"  does left adjustment;
  "+"  ensures a sign in all cases, i.e., "+" for positive numbers;
  " "  if the first character is not a sign, the space character " " will be used instead;
  "#"  specifies “an alternative output form”, specifically depending on format;
  "'"  on some platform-locale combinations, activates “thousands' grouping” for decimal conversion;
  "I"  in some versions of glibc, allows integer conversion to use the locale's alternative output digits, if any.
There can be more than one of these flags, in any order. Other characters used to have no effect for character formatting, but signal an error since R 3.4.0.
- mode
"double" (or "real"), "integer" or "character". Default: Determined from the storage mode of x.
- big.mark
character; if not empty used as mark between every big.interval decimals before (hence big) the decimal point.
- big.interval
see big.mark above; defaults to 3.
- small.mark
character; if not empty used as mark between every small.interval decimals after (hence small) the decimal point.
- small.interval
see small.mark above; defaults to 5.
- decimal.mark
the character to be used to indicate the numeric decimal point.
- input.d.mark
if x is character, the character known to have been used as the numeric decimal point in x.
- preserve.width
string specifying if the string widths should be preserved where possible in those cases where marks (big.mark or small.mark) are added. "common", the default, corresponds to format-like behavior whereas "individual" is the default in formatC(). Value can be abbreviated.
- zero.print
logical, character string or NULL specifying if and how zeros should be formatted specially. Useful for pretty printing ‘sparse’ objects.
- replace.zero, replace
logical; if zero.print is a character string, indicates if the exact zero entries in x should be simply replaced by zero.print. Otherwise, depending on the widths of the respective strings, the (formatted) zeroes are partly replaced by zero.print and then padded with " " to the right where applicable. In that case (false replace[.zero]), if the zero.print string does not fit, a warning is produced (if warn.non.fitting is true).
This works via prettyNum(), which calls .format.zeros(*, replace=replace.zero) three times in this case, see the ‘Details’.
- warn.non.fitting
logical; if it is true, replace[.zero] is false and the zero.print string does not fit, a warning is signalled.
- drop0trailing
logical, indicating if trailing zeros, i.e., "0" after the decimal mark, should be removed; also drops "e+00" in exponential formats. This is simply passed to prettyNum(), see the ‘Details’.
- is.cmplx
optional logical, to be used when x is "character" to indicate if it stems from a complex vector or not. By default (NA), x is checked to ‘look like’ complex.
- ...
arguments passed to format.
- nx
numeric vector of the same length as x, typically the numbers of which the character vector x is the pre-format.
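A minimal sketch of how a few of the arguments above combine. The input values are made up for illustration, and because padding details can differ between R versions, the results are printed rather than quoted here.

```r
# Hypothetical sparse-looking data: exact zeros mixed with other values.
x <- c(0, 1.5, 0, 20)

# Show zeros as "." while the non-zero entries are formatted as usual.
sparse <- formatC(x, format = "f", digits = 1, zero.print = ".")

# Group the digits before the decimal point in threes.
grouped <- formatC(1234567.891, format = "f", digits = 2, big.mark = ",")

# Remove trailing zeros after the decimal mark.
trimmed <- formatC(c(3.24, 2.5), format = "f", digits = 4, drop0trailing = TRUE)

print(sparse)
print(grouped)
print(trimmed)
```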
Details
For numbers, formatC() calls prettyNum() when needed which itself calls .format.zeros(*, replace=replace.zero). (“when needed”: when zero.print is not NULL, drop0trailing is true, or one of big.mark, small.mark, or decimal.mark is not at default.)
If you set format it overrides the setting of mode, so formatC(123.45, mode = "double", format = "d") gives 123.
The rendering of scientific format is platform-dependent: some systems use n.ddde+nnn or n.dddenn rather than n.ddde+nn.
formatC does not necessarily align the numbers on the decimal point, so formatC(c(6.11, 13.1), digits = 2, format = "fg") gives c("6.1", " 13"). If you want common formatting for several numbers, use format.
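A short sketch contrasting the two behaviours on the same vector. It assumes the default session options (digits = 7, OutDec = "."); the exact padding of the formatC() result may vary, so it is only printed.

```r
x <- c(6.11, 13.1)

# formatC() treats each element on its own, so the decimal points need not line up.
individual <- formatC(x, digits = 2, format = "fg")

# format() chooses one common layout (same width, same decimals) for the vector.
common <- format(x)

print(individual)
print(common)
```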
prettyNum is the utility function for prettifying x. x can be complex (or format(<complex>)), here. If x is not a character, format(x[i], ...) is applied to each element, and then it is left unchanged if all the other arguments are at their defaults. Use the input.d.mark argument for prettyNum(x) when x is a character vector not resulting from something like format(<number>) with a period as decimal mark.
Because gsub is used to insert the big.mark and small.mark, special characters need escaping. In particular, to insert a single backslash, use "\\\\".
The C doubles used for R numerical vectors have signed zeros, which formatC may output as -0, -0.000 ….
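A brief illustration of where a signed zero comes from. Whether the minus sign actually appears in the formatted string depends on the C runtime, so the formatC() result is printed without a stated expected value.

```r
# Doubles distinguish 0 and -0, although the two compare equal.
pos_zero <- 0
neg_zero <- -0

print(pos_zero == neg_zero)           # the two zeros compare equal
print(c(1 / pos_zero, 1 / neg_zero))  # but their reciprocals differ in sign

# formatC() hands each value to the C runtime, so the sign of zero may show up.
zeros <- formatC(c(pos_zero, neg_zero), format = "f", digits = 3)
print(zeros)
```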
There is a warning if big.mark and decimal.mark are the same: that would be confusing to those reading the output.
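A sketch of how such a warning could be caught programmatically. The tryCatch() wrapper is an illustration added here, not something the function requires, and the wording of the message is not assumed.

```r
# Capture the warning (if any) instead of letting it print.
msg <- tryCatch(
  {
    formatC(1234567.5, format = "f", digits = 1,
            big.mark = ",", decimal.mark = ",")
    NA_character_  # reached only when no warning was signalled
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# Using distinct marks avoids the ambiguity.
ok <- formatC(1234567.5, format = "f", digits = 1,
              big.mark = ".", decimal.mark = ",")
print(ok)
```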
Value
A character object of same size and attributes as x (after discarding any class), in the current locale's encoding. Unlike format, each number is formatted individually.
Looping over each element of x, the C function sprintf(…) is called for numeric inputs (inside the C function str_signif).
For character x, formatC does simple (left or right) padding with white space.
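For character input the padding can be sketched as follows; the labels are arbitrary sample strings.

```r
labels <- c("a", "Abc", "no way")

right <- formatC(labels, width = 8)              # pad on the left (right-justify)
left  <- formatC(labels, width = 8, flag = "-")  # pad on the right (left-justify)

print(right)
print(left)
```

A negative width, as in formatC(labels, width = -8), is described above as equivalent to flag = "-".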
Note
The default for decimal.mark in formatC() was changed in R 3.2.0; for use within print methods in packages which might be used with earlier versions, use decimal.mark = getOption("OutDec") explicitly.
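One way a package might follow this advice is sketched below. The helper name fmt_value is hypothetical and chosen only for illustration.

```r
# Hypothetical helper for a print method; 'fmt_value' is an illustrative name.
fmt_value <- function(x, digits = 3) {
  # Pass the session's decimal mark explicitly instead of relying on the default.
  formatC(x, digits = digits, format = "f",
          decimal.mark = getOption("OutDec"))
}

op <- options(OutDec = ",")   # temporarily switch the output decimal mark
with_comma <- fmt_value(pi)
options(op)                   # restore the previous setting
with_session_mark <- fmt_value(pi)

print(c(with_comma, with_session_mark))
```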
References
Kernighan, B. W. and Ritchie, D. M. (1988) The C Programming Language. Second edition. Prentice Hall.
See Also
sprintf for more general C-like formatting.
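As a rough comparison, the same fixed-point layout can be requested through either function. This sketch assumes the default decimal mark "."; the template string in the last line is made up.

```r
# The same fixed-point layout requested through both interfaces.
via_sprintf <- sprintf("%8.3f", pi)
via_formatC <- formatC(pi, width = 8, digits = 3, format = "f")
print(c(via_sprintf, via_formatC))

# sprintf() can also embed several values in one template string.
print(sprintf("%s has %d rows (%.1f%% complete)", "tbl", 12L, 87.5))
```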
Examples
library(base)
# NOT RUN {
xx <- pi * 10^(-5:4)
cbind(format(xx, digits = 4), formatC(xx))
cbind(formatC(xx, width = 9, flag = "-"))
cbind(formatC(xx, digits = 5, width = 8, format = "f", flag = "0"))
cbind(format(xx, digits = 4), formatC(xx, digits = 4, format = "fg"))
f <- (-2:4); f <- f*16^f
# Default ("g") format:
formatC(pi*f)
# Fixed ("f") format, more than one flag ('width' partly "enlarged"):
cbind(formatC(pi*f, digits = 3, width=9, format = "f", flag = "0+"))
formatC( c("a", "Abc", "no way"), width = -7) # <=> flag = "-"
formatC(c((-1:1)/0,c(1,100)*pi), width = 8, digits = 1)
## note that some of the results here depend on the implementation
## of long-double arithmetic, which is platform-specific.
xx <- c(1e-12,-3.98765e-10,1.45645e-69,1e-70,pi*1e37,3.44e4)
## 1 2 3 4 5 6
formatC(xx)
formatC(xx, format = "fg") # special "fixed" format.
formatC(xx[1:4], format = "f", digits = 75) #>> even longer strings
formatC(c(3.24, 2.3e-6), format = "f", digits = 11)
formatC(c(3.24, 2.3e-6), format = "f", digits = 11, drop0trailing = TRUE)
r <- c("76491283764.97430", "29.12345678901", "-7.1234", "-100.1","1123")
## American:
prettyNum(r, big.mark = ",")
## Some Europeans:
prettyNum(r, big.mark = "'", decimal.mark = ",")
(dd <- sapply(1:10, function(i) paste((9:0)[1:i], collapse = "")))
prettyNum(dd, big.mark = "'")
## examples of 'small.mark'
pN <- stats::pnorm(1:7, lower.tail = FALSE)
cbind(format (pN, small.mark = " ", digits = 15))
cbind(formatC(pN, small.mark = " ", digits = 17, format = "f"))
cbind(ff <- format(1.2345 + 10^(0:5), width = 11, big.mark = "'"))
## all with same width (one more than the specified minimum)
## individual formatting to common width:
fc <- formatC(1.234 + 10^(0:8), format = "fg", width = 11, big.mark = "'")
cbind(fc)
## Powers of two, stored exactly, formatted individually:
pow.2 <- formatC(2^-(1:32), digits = 24, width = 1, format = "fg")
## nicely printed (the last line showing 5^32 exactly):
noquote(cbind(pow.2))
## complex numbers:
r <- 10.0000001; rv <- (r/10)^(1:10)
(zv <- (rv + 1i*rv))
op <- options(digits = 7) ## (system default)
(pnv <- prettyNum(zv))
stopifnot(pnv == "1+1i", pnv == format(zv),
pnv == prettyNum(zv, drop0trailing = TRUE))
## more digits change the picture:
options(digits = 8)
head(fv <- format(zv), 3)
prettyNum(fv)
prettyNum(fv, drop0trailing = TRUE) # a bit nicer
options(op)
## The ' flag :
doLC <- FALSE # <= R warns, so change to TRUE manually if you want to see the effect
if(doLC)
oldLC <- Sys.setlocale("LC_NUMERIC", "de_CH.UTF-8")
formatC(1.234 + 10^(0:4), format = "fg", width = 11, flag = "'")
## --> ..... " 1'001" " 10'001" on supported platforms
if(doLC) ## revert, typically to "C" :
Sys.setlocale("LC_NUMERIC", oldLC)
# }
Community examples
# `formatC()`

`formatC()` is based upon the C/C++ function `printf()`. The [documentation](http://www.cplusplus.com/reference/cstdio/printf) for that function contains details on how many of the arguments to `formatC()` work.

## Basic usage

`formatC()` takes a numeric vector as an input and returns a character vector. With no arguments other than `x`, `formatC()` works like [`signif()`](https://www.rdocumentation.org/packages/base/topics/signif) + [`as.character()`](https://www.rdocumentation.org/packages/base/topics/character) (though with different rules for rounding).

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x)
as.character(signif(x, 4))
```

## `digits` and `format` arguments

These arguments affect each other, and need to be discussed together.

`format` determines whether the numbers should be displayed using fixed, scientific, or integer-like formatting. `digits` determines either the number of _significant digits_ or the number of _decimal places_ to round the numbers to, depending upon the value of `format`.

Note that the documentation is apparently wrong (as of v3.3.2): the default value of `digits` is always 4, even for `integer` inputs.

`format = "d"` treats the numbers as integers, so nothing after the decimal place is shown. Conversion to integer using [`as.integer()`](https://www.rdocumentation.org/packages/base/topics/integer) is shown for comparison.

```{r}
(x <- c(1.2345, 6.789) * 10 ^ (-2:3) * c(1, -1, 1))
formatC(x, format = "d")
as.character(as.integer(x))
```

`format = "e"` uses scientific formatting. In this case, `digits` sets the number of digits printed after the decimal point of the mantissa, so each value shows `digits + 1` significant digits. See [`signif()`](https://www.rdocumentation.org/packages/base/topics/signif) for comparison.
```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "e", digits = 1)
formatC(x, format = "e", digits = 2)
formatC(x, format = "e", digits = 3)
formatC(x, format = "e") # implicitly digits = 4
formatC(x, format = "e", digits = 5)
```

`format = "E"` is a variant of `format = "e"` that prints an upper-case `"E"` to denote the start of the exponent.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "E")
```

`format = "f"` uses fixed formatting. In this case, `digits` refers to the number of decimal places to round each number to. See [`round()`](https://www.rdocumentation.org/packages/base/topics/round) for comparison.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "f", digits = 1)
formatC(x, format = "f", digits = 2)
formatC(x, format = "f", digits = 3)
formatC(x, format = "f") # implicitly digits = 4
formatC(x, format = "f", digits = 5)
```

`format = "g"` automatically mixes fixed or scientific, trying to choose whichever takes the least amount of space. `digits` refers to significant digits.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "g", digits = 1)
formatC(x, format = "g", digits = 2)
formatC(x, format = "g", digits = 3)
formatC(x, format = "g") # implicitly digits = 4
formatC(x, format = "g", digits = 5)
```

`format = "G"` is a variant of `format = "g"` that prints an upper-case `"E"` to denote the start of the exponent in scientifically formatted values.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "G")
```

`format = "fg"` uses fixed notation like `format = "f"`, except that `digits` refers to the minimum number of significant digits rather than decimal places. In this case, trailing zeroes are dropped.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "fg", digits = 1)
formatC(x, format = "fg", digits = 2)
formatC(x, format = "fg", digits = 3)
formatC(x, format = "fg") # implicitly digits = 4
formatC(x, format = "fg", digits = 5)
```

Negative values of `digits` are treated as `digits = 6`.
```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "e", digits = -1)
formatC(x, format = "f", digits = -1)
```

The largest allowed value of `digits` is `324` when `format = "f"`, and 50 otherwise. Note that under most circumstances, you won't get better than 15 significant figures of accuracy, so the rest will be gibberish.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "f", digits = 325)
formatC(x, format = "e", digits = 51)
```

`format = "s"` is a little different. It is designed to be used with an input that already contains strings (rather than numbers). Its main purpose is to be used with the `width` argument (see below) to pad strings to a specified number of characters.

## `width` argument

`width` specifies the minimum number of characters of each element of the result. If the number is too short, it is padded on the left with spaces. This is mostly useful for getting columns of numbers to line up.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "f", width = 6)
formatC(x, format = "f", width = 7)
formatC(x, format = "f", width = 8)
formatC(x, format = "f", width = 9)
```

You can specify a negative width to pad on the right instead.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "f", width = -6)
formatC(x, format = "f", width = -7)
formatC(x, format = "f", width = -8)
formatC(x, format = "f", width = -9)
```

`width = 0` is interpreted as `width = digits`, in which case you should never have padding.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, format = "f", digits = 8, width = 0)
formatC(x, format = "f", digits = 8, width = 8)
```

## `flag` argument

There are five modifying flags available: `+`, `" "` (a space), `-`, `0`, and `#`. `flag` takes a string containing zero or more of these flags in any order.

`flag = "+"` means that positive numbers are prefixed with a plus sign.

```{r}
(x <- c(1, -1, 1) * 1.2345 * 10 ^ (-4:4))
formatC(x, flag = "+")
```

`flag = " "` means that positive numbers are prefixed with a space.
When `flag = "+ "`, the plus sign takes priority.

```{r}
(x <- c(1, -1, 1) * 1.2345 * 10 ^ (-4:4))
formatC(x, flag = " ")
# space flag ignored
formatC(x, flag = "+ ")
formatC(x, flag = " +")
```

`flag = "-"` means pad short strings on the right, exactly like using a negative `width`. Using `flag = "-"` and a negative `width` _doesn't_ double reverse the padding; it still pads on the right.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, width = 10, flag = "-")
formatC(x, width = -10, flag = "-")
```

`flag = "0"` means pad short strings with zeroes, not spaces. It cannot be used in combination with `flag = "-"` to pad with zeroes on the right.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, width = 10, flag = "0")
formatC(x, width = 10, flag = "-0") # zero flag ignored
```

`flag = "#"` prints a trailing decimal point in the case where `digits = 0`.

```{r}
(x <- 1.2345 * 10 ^ (-4:4))
formatC(x, digits = 0, format = "e")
formatC(x, digits = 0, format = "e", flag = "#")
formatC(x, digits = 0, format = "f")
formatC(x, digits = 0, format = "f", flag = "#")
formatC(x, digits = 0, format = "g")
formatC(x, digits = 0, format = "g", flag = "#")
```

## `big.mark` and `big.interval` arguments

When the format is a fixed type (`format %in% c("d", "f", "fg")`), `big.mark` and `big.interval` control separators of digits before the decimal point. For example, in the USA, monetary quantities are often separated by a comma after every 3 digits. In France, monetary quantities are separated by a space every three digits.

See [`Sys.localeconv()`](https://www.rdocumentation.org/packages/base/topics/Sys.localeconv) and the C++ documentation for [`lconv` structures](http://www.cplusplus.com/reference/clocale/lconv), which describe the correct formatting for numbers in a given locale. In particular, note the `thousands_sep`, `grouping`, `mon_thousands_sep`, and `mon_grouping` elements.
```{r}
(x <- 1.2345 * 10 ^ (1:10))
formatC(x, format = "d", big.mark = ",", big.interval = 3)
formatC(x, format = "d", big.mark = " ", big.interval = 3)
```

## `small.mark`, `small.interval`, and `decimal.mark` arguments

These work in a similar way to `big.mark` and `big.interval`, but they control the decimal place, and what happens afterwards. Use of these arguments only makes sense when `format %in% c("f", "fg")`.

```{r}
(x <- 1.2345 * 10 ^ (0:-10))
formatC(x, format = "fg", decimal.mark = ".", small.mark = ",", small.interval = 3)
formatC(x, format = "fg", decimal.mark = ",", small.mark = " ", small.interval = 3)
```