tidyr 1.0.2

tidyr 1.0.1

tidyr 1.0.0

Breaking changes

See vignette("in-packages") for a detailed transition guide.

Pivoting

New pivot_longer() and pivot_wider() provide modern alternatives to spread() and gather(). They have been carefully redesigned to be easier to learn and remember, and include many new features. Learn more in vignette("pivot").

These functions resolve multiple existing issues with spread()/gather(). Both functions now handle multiple value columns (#149/#150), support more vector types (#333), use tidyverse conventions for duplicated column names (#496, #478), and are symmetric (#453). pivot_longer() gracefully handles duplicated column names (#472), and can directly split column names into multiple variables. pivot_wider() can now aggregate (#474), select keys (#572), and has control over generated column names (#208).
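As a minimal sketch of the round trip between the two functions, the example below reshapes a small made-up table. The data frame and its column names (id, year, value) are invented for illustration and are not one of the bundled datasets.

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per year.
wide <- tibble(
  id = c("a", "b"),
  `2019` = c(1, 3),
  `2020` = c(2, 4)
)

# Wide to long: column names move into `year`, cell values into `value`.
long <- pivot_longer(
  wide,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "value"
)

# Long back to wide, which illustrates the symmetry of the two functions.
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

Here long has one row per id and year combination, and wide_again has the same columns as the starting table.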

To demonstrate how these functions work in practice, tidyr has gained several new datasets: relig_income, construction, billboard, us_rent_income, fish_encounters and world_bank_pop.

Finally, tidyr demos have been removed. They are dated, and have been superseded by vignette("pivot").

Rectangling

tidyr contains four new functions to support rectangling, turning a deeply nested list into a tidy tibble: unnest_longer(), unnest_wider(), unnest_auto(), and hoist(). They are documented in a new vignette: vignette("rectangle").

unnest_longer() and unnest_wider() make it easier to unnest list-columns of vectors into either rows or columns (#418). unnest_auto() automatically picks between _longer() and _wider() using heuristics based on the presence of common names.
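The difference between the two directions can be sketched with two small made-up tibbles (the column names scores and info are hypothetical): unnamed vectors are a natural fit for rows, while elements that share names are a natural fit for columns.

```r
library(tidyr)

# List-column of unnamed vectors with varying lengths.
by_rows <- tibble(
  id = 1:2,
  scores = list(c(10, 20), c(30, 40, 50))
)

# List-column of lists whose elements share the names `a` and `b`.
by_cols <- tibble(
  id = 1:2,
  info = list(list(a = 1, b = "x"), list(a = 2, b = "y"))
)

# One output row per element of `scores`.
longer <- unnest_longer(by_rows, scores)

# One output column per name found in `info`.
wider <- unnest_wider(by_cols, info)
```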

New hoist() provides a convenient way of plucking components of a list-column out into their own top-level columns (#341). This is particularly useful when you are working with deeply nested JSON, because it provides a convenient shortcut for the mutate() + map() pattern:

df %>% hoist(metadata, name = "name")
# shortcut for
df %>% mutate(name = map_chr(metadata, "name"))
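For a version that can be run as-is, the sketch below builds a small made-up tibble with a metadata list-column (the contents are invented for illustration) and hoists the name component out of it.

```r
library(tidyr)

# Hypothetical nested data, similar in shape to parsed JSON.
df <- tibble(
  id = 1:2,
  metadata = list(
    list(name = "alpha", tags = list("x", "y")),
    list(name = "beta", tags = list("z"))
  )
)

# Pull the `name` component of each element into a top-level column.
out <- hoist(df, metadata, name = "name")
```

The remaining components stay in the metadata list-column, so further calls can extract them later.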

Nesting

nest() and unnest() have been updated with new interfaces that are more closely aligned to evolving tidyverse conventions. They use the theory developed in vctrs to more consistently handle mixtures of input types, and their arguments have been overhauled based on the last few years of experience. They are supported by a new vignette("nest"), which outlines some of the main ideas of nested data (it’s still very rough, but will get better over time).

The biggest change is to their operation with multiple columns: df %>% unnest(x, y, z) becomes df %>% unnest(c(x, y, z)) and df %>% nest(x, y, z) becomes df %>% nest(data = c(x, y, z)).
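The new multi-column forms can be sketched as follows. The data frames and column names are made up for illustration.

```r
library(tidyr)

df <- tibble(g = c(1, 1, 2), x = 1:3, y = c("a", "b", "c"))

# New interface: name the list-column and give its columns inside c().
nested <- nest(df, data = c(x, y))

# Unnesting the single list-column restores one row per observation.
flat <- unnest(nested, data)

# Several list-columns are unnested together by wrapping them in c().
df2 <- tibble(
  id = 1:2,
  x = list(1:2, 3L),
  y = list(c("a", "b"), "c")
)
both <- unnest(df2, c(x, y))
```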

I have done my best to ensure that common uses of nest() and unnest() will continue to work, generating an informative warning telling you precisely how you need to update your code. Please file an issue if I’ve missed an important use case.

unnest() has been overhauled:

Packing and chopping

Under the hood, nest() and unnest() are implemented with chop(), pack(), unchop(), and unpack():

Packing and chopping are interesting primarily because they are the atomic operations underlying nesting (and similarly, unchopping and unpacking underlie unnesting), and I don’t expect them to be used directly very often.
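A rough sketch of what each operation does on its own, using a small made-up data frame: chopping changes the number of rows and leaves list-columns of vectors, while packing keeps the rows and groups columns into a data frame column.

```r
library(tidyr)

df <- tibble(g = c(1, 1, 2), x = 1:3, y = c("a", "b", "c"))

# chop(): one row per value of g; x and y become list-columns of vectors.
chopped <- chop(df, c(x, y))

# unchop(): the inverse, back to one row per element.
unchopped <- unchop(chopped, c(x, y))

# pack(): same number of rows; x and y move into a data frame column `data`.
packed <- pack(df, data = c(x, y))

# unpack(): the inverse, back to ordinary top-level columns.
unpacked <- unpack(packed, data)
```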

New features

Bug fixes and minor improvements

tidyr 0.8.3

tidyr 0.8.2

tidyr 0.8.1

tidyr 0.8.0

Breaking changes

New features

Bug fixes and minor improvements

tidyr 0.7.2

tidyr 0.7.1

This is a hotfix release to account for some tidyselect changes in the unit tests.

Note that the upcoming version of tidyselect backtracks on some of the changes announced for 0.7.0. The special evaluation semantics for selection have been changed back to the old behaviour because the new rules were causing too much trouble and confusion. From now on data expressions (symbols and calls to : and c()) can refer to both registered variables and to objects from the context.

However, the semantics for context expressions (any calls other than to : and c()) remain the same. Those expressions are evaluated in the context only and cannot refer to registered variables. If you’re writing functions and refer to contextual objects, it is still a good idea to avoid data expressions by following the advice of the 0.7.0 release notes.

tidyr 0.7.0

This release includes important changes to tidyr internals. tidyr now supports the new tidy evaluation framework for quoting (NSE) functions. It also uses the new tidyselect package as its selecting backend.

Breaking changes

Switch to tidy evaluation

tidyr is now a tidy evaluation grammar. See the programming vignette in dplyr for practical information about tidy evaluation.

The tidyr port is a bit special. While the philosophy of tidy evaluation is that R code should refer to real objects (from the data frame or from the context), we had to make some exceptions to this rule for tidyr. The reason is that several functions accept bare symbols to specify the names of new columns to create (gather() being a prime example). This is not tidy because the symbols do not represent any actual object. Our workaround is to capture these arguments using rlang::quo_name() (so they still support quasiquotation and you can unquote symbols or strings). This type of NSE is now discouraged in the tidyverse: symbols in R code should represent real objects.
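As a minimal sketch of what unquoting looks like for these name arguments, the example below passes the new column names to gather() as strings held in variables. The data frame and the variable names key_name and value_name are made up for illustration.

```r
library(tidyr)

df <- tibble(id = 1:2, a = c(1, 2), b = c(3, 4))

# Names of the columns to create, stored as ordinary strings.
key_name <- "variable"
value_name <- "measurement"

# Unquote the strings with !! so gather() uses their values as column names.
long <- gather(df, !!key_name, !!value_name, a, b)
```

Without the !! operator, the bare symbols key_name and value_name would themselves become the column names.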

Following the switch to tidy eval, the underscored variants are softly deprecated. However, they will remain available for some time, without warning, for backward compatibility.

Switch to the tidyselect backend

The selecting backend of dplyr has been extracted into a standalone package, tidyselect, which tidyr now uses for selecting variables. It is used for selecting multiple variables (in drop_na()) as well as single variables (the col argument of extract() and separate(), and the key and value arguments of spread()). This implies the following changes:

tidyr 0.6.3

tidyr 0.6.2

tidyr 0.6.1

tidyr 0.6.0

API changes

Bug fixes and minor improvements

tidyr 0.5.1

tidyr 0.5.0

New functions

Bug fixes and minor improvements

tidyr 0.4.1

tidyr 0.4.0

Nested data frames

nest() and unnest() have been overhauled to support a useful way of structuring data frames: the nested data frame. In a grouped data frame, you have one row per observation, and additional metadata define the groups. In a nested data frame, you have one row per group, and the individual observations are stored in a column that is a list of data frames. This is a useful structure when you have lists of other objects (like models) with one element per group.
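The structure can be sketched with a small made-up dataset and one linear model per group. Note the assumption: the nest() call uses the interface from tidyr 1.0.0 described earlier in this file, not the one that shipped with 0.4.0.

```r
library(tidyr)

# Hypothetical data: two groups with three observations each.
df <- tibble(
  group = rep(c("a", "b"), each = 3),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 3, 6, 9)
)

# One row per group; the observations live in the list-column `data`.
nested <- nest(df, data = c(x, y))

# Store one fitted model per group alongside the data it was fitted to.
nested$model <- lapply(nested$data, function(d) lm(y ~ x, data = d))

# Summarise each model with its slope.
nested$slope <- vapply(
  nested$model,
  function(m) unname(coef(m)[["x"]]),
  numeric(1)
)
```

Keeping the data, the model, and its summary in the same row means the objects for a group cannot drift out of sync.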

Expanding

Minor bug fixes and improvements

tidyr 0.3.1

tidyr 0.3.0

New features

Bug fixes and minor improvements

tidyr 0.2.0

New functions

Bug fixes and minor improvements