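R package metadata and collection signals, gathered in one place.
The collection signals that need to be judged on the first screen are placed first.
Backend-related packages detected from DESCRIPTION.
Basic metadata is condensed into small cards and tokens.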
| Package | Type | Spec |
|---|---|---|
| cli CRAN · 1.2.1 · 2026-05-30 | Imports | cli (>= 3.6.2) |
| generics CRAN · 1.2.1 · 2026-05-30 | Imports | generics |
| glue CRAN · 1.2.1 · 2026-05-30 | Imports | glue (>= 1.3.2) |
| lifecycle CRAN · 1.2.1 · 2026-05-30 | Imports | lifecycle (>= 1.0.5) |
| magrittr CRAN · 1.2.1 · 2026-05-30 | Imports | magrittr (>= 1.5) |
| methods CRAN · 1.2.1 · 2026-05-30 | Imports | methods |
| pillar CRAN · 1.2.1 · 2026-05-30 | Imports | pillar (>= 1.9.0) |
| R6 CRAN · 1.2.1 · 2026-05-30 | Imports | R6 |
| rlang CRAN · 1.2.1 · 2026-05-30 | Imports | rlang (>= 1.1.7) |
| tibble CRAN · 1.2.1 · 2026-05-30 | Imports | tibble (>= 3.2.0) |
| tidyselect CRAN · 1.2.1 · 2026-05-30 | Imports | tidyselect (>= 1.2.0) |
| utils CRAN · 1.2.1 · 2026-05-30 | Imports | utils |
| vctrs CRAN · 1.2.1 · 2026-05-30 | Imports | vctrs (>= 0.7.1) |
| broom CRAN · 1.2.1 · 2026-05-30 | Suggests | broom |
| covr CRAN · 1.2.1 · 2026-05-30 | Suggests | covr |
| DBI CRAN · 1.2.1 · 2026-05-30 | Suggests | DBI |
| dbplyr CRAN · 1.2.1 · 2026-05-30 | Suggests | dbplyr (>= 2.2.1) |
| ggplot2 CRAN · 1.2.1 · 2026-05-30 | Suggests | ggplot2 |
| knitr CRAN · 1.2.1 · 2026-05-30 | Suggests | knitr |
| Lahman CRAN · 1.2.1 · 2026-05-30 | Suggests | Lahman |
| lobstr CRAN · 1.2.1 · 2026-05-30 | Suggests | lobstr |
| nycflights13 CRAN · 1.2.1 · 2026-05-30 | Suggests | nycflights13 |
| purrr CRAN · 1.2.1 · 2026-05-30 | Suggests | purrr |
| rmarkdown CRAN · 1.2.1 · 2026-05-30 | Suggests | rmarkdown |
| RSQLite CRAN · 1.2.1 · 2026-05-30 | Suggests | RSQLite |
| stringi CRAN · 1.2.1 · 2026-05-30 | Suggests | stringi (>= 1.7.6) |
| testthat CRAN · 1.2.1 · 2026-05-30 | Suggests | testthat (>= 3.1.5) |
| tidyr CRAN · 1.2.1 · 2026-05-30 | Suggests | tidyr (>= 1.3.0) |
| withr CRAN · 1.2.1 · 2026-05-30 | Suggests | withr |
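The Spec column follows the dependency syntax of DESCRIPTION fields: a package name, optionally followed by a version constraint in parentheses. As a minimal sketch of how such a field could be split into names and minimum versions, the base R helper below may be useful. `parse_specs()` is a hypothetical name, not part of any package, and it assumes only `>=` constraints, which is all that appears in these tables.

```r
# Hypothetical helper: split a DESCRIPTION-style dependency field into
# package names and minimum versions. Assumes only ">=" constraints.
parse_specs <- function(field) {
  # One entry per comma-separated dependency, with surrounding spaces removed
  specs <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])

  # The package name is everything before the optional "(...)" part
  pkg <- sub("\\s*\\(.*$", "", specs)

  # Extract the version after ">=" when a constraint is present
  has_version <- grepl("(", specs, fixed = TRUE)
  min_version <- ifelse(
    has_version,
    sub("^.*>=\\s*([^)]+)\\).*$", "\\1", specs),
    NA_character_
  )

  data.frame(package = pkg, min_version = min_version, stringsAsFactors = FALSE)
}

imports <- "cli (>= 3.6.2), generics, glue (>= 1.3.2), lifecycle (>= 1.0.5)"
parse_specs(imports)
```

The name pattern does not require a space before the parenthesis, so a spec written as `dplyr(>= 1.0.0)` is parsed the same way as `dplyr (>= 1.0.0)`.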
| Package | Type | Spec |
|---|---|---|
| actLifer 1.0.0 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.10) |
| adjustr 0.2.0 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.0) |
| Andromeda 1.2.0 CRAN · 2026-05-30 | Depends | dplyr |
| AQuadtree 1.0.6 CRAN · 2026-05-30 | Depends | dplyr |
| arse 1.0.0 CRAN · 2026-05-06 | Depends | dplyr |
| autoCovariateSelection 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| autoGO 1.0.3 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.0) |
| babyTimeR 0.1.0 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.4) |
| BatchGetSymbols 2.6.4 CRAN · 2026-05-30 | Depends | dplyr |
| bbnet 1.2.1 CRAN · 2026-05-30 | Depends | dplyr |
| btb 0.2.2 CRAN · 2026-05-30 | Depends | dplyr |
| BTYD 2.4.3 CRAN · 2026-05-30 | Depends | dplyr |
| bulletcp 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| bunching 0.8.6 CRAN · 2026-05-30 | Depends | dplyr (>= 0.8.1) |
| cchsflow 2.1.0 CRAN · 2026-05-30 | Depends | dplyr (>= 0.8.2) |
| cft 1.0.0 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.10) |
| chunked 0.6.2 CRAN · 2026-05-30 | Depends | dplyr (>= 0.7) |
| ClassificationEnsembles 1.0.2 CRAN · 2026-05-30 | Depends | dplyr |
| ConconiAnaerobicThresholdTest 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| CONCUR 1.5 CRAN · 2026-05-30 | Depends | dplyr |
| contTimeCausal 1.1 CRAN · 2026-05-30 | Depends | dplyr |
| countmaskr 0.1.1 CRAN · 2026-05-30 | Depends | dplyr |
| cspp 0.3.3 CRAN · 2026-05-30 | Depends | dplyr(>= 1.0.0) |
| cvcqv 1.0.3 CRAN · 2026-05-30 | Depends | dplyr (>= 0.8.0.1) |
| dartR 2.9.9.5 CRAN · 2026-05-30 | Depends | dplyr |
| dartR.base 1.2.3 CRAN · 2026-05-30 | Depends | dplyr |
| diffEnrich 0.1.2 CRAN · 2026-05-30 | Depends | dplyr |
| duckplyr 1.2.1 CRAN · 2026-05-30 | Depends | dplyr (>= 1.2.0) |
| dvmisc 1.1.4 CRAN · 2026-05-30 | Depends | dplyr |
| echoice2 0.2.5 CRAN · 2026-05-30 | Depends | dplyr |
| egor 1.25.10 CRAN · 2026-05-30 | Depends | dplyr |
| EHRtemporalVariability 1.2.2 CRAN · 2026-05-30 | Depends | dplyr |
| EpiCurve 2.4-2 CRAN · 2026-05-30 | Depends | dplyr |
| EpiStats 1.6-2 CRAN · 2026-05-30 | Depends | dplyr |
| estimraw 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| etl 0.4.3 CRAN · 2026-05-30 | Depends | dplyr |
| evalHTE 0.1.1 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.10) |
| evalITR 1.0.0 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0) |
| eyetrackingR 0.2.2 CRAN · 2026-05-30 | Depends | dplyr (>= 0.7.4) |
| flatr 0.1.1 CRAN · 2026-05-30 | Depends | dplyr |
| ForecastingEnsembles 0.5.1 CRAN · 2026-05-30 | Depends | dplyr |
| FormulR 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| funMoDisco 1.1.5 CRAN · 2026-05-30 | Depends | dplyr |
| gen5helper 1.0.1 CRAN · 2026-05-30 | Depends | dplyr |
| geotoolsR 1.2.1 CRAN · 2026-05-30 | Depends | dplyr |
| GerminaR 2.1.6 CRAN · 2026-05-30 | Depends | dplyr |
| geslaR 1.0-1 CRAN · 2026-05-30 | Depends | dplyr |
| ggmcmc 1.5.1.2 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.0) |
| ggraptR 1.3 CRAN · 2026-05-30 | Depends | dplyr (>= 0.7.5) |
| ggsurvey 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| goat 1.1.5 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.3) |
| Greymodels 2.0.1 CRAN · 2026-05-30 | Depends | dplyr |
| HEDA 0.1.5 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.2) |
| huito 0.2.6 CRAN · 2026-05-30 | Depends | dplyr |
| IBRtools 0.1.3 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.6) |
| immundata 0.0.7 CRAN · 2026-05-30 | Depends | dplyr (>= 1.2.0) |
| implyr 0.5.0 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.2) |
| integr 1.0.0 CRAN · 2026-05-30 | Depends | dplyr (>= 0.7.6) |
| inti 0.6.92 CRAN · 2026-05-30 | Depends | dplyr |
| IPDfromKM 0.1.10 CRAN · 2026-05-30 | Depends | dplyr |
| itraxR 1.13.2 CRAN · 2026-05-30 | Depends | dplyr |
| LogisticEnsembles 1.0.2 CRAN · 2026-05-30 | Depends | dplyr |
| longitudinalANAL 0.2 CRAN · 2026-05-30 | Depends | dplyr |
| MacBehaviour 1.2.8 CRAN · 2026-05-30 | Depends | dplyr |
| malan 1.0.4 CRAN · 2026-05-30 | Depends | dplyr (>= 0.7.3) |
| manydata 1.1.3 CRAN · 2026-05-30 | Depends | dplyr |
| matchMulti 1.1.14 CRAN · 2026-05-30 | Depends | dplyr |
| metevalue 0.2.4 CRAN · 2026-05-30 | Depends | dplyr |
| mevr 1.1.1 CRAN · 2026-05-30 | Depends | dplyr |
| micropan 2.1 CRAN · 2026-05-30 | Depends | dplyr |
| microseq 2.1.7 CRAN · 2026-05-30 | Depends | dplyr |
| micss 0.3.1 CRAN · 2026-05-30 | Depends | dplyr |
| miLAG 1.0.5 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.8) |
| MiscMetabar 0.14.4 CRAN · 2026-05-19 | Depends | dplyr |
| mixopt 0.1.3 CRAN · 2026-05-30 | Depends | dplyr |
| monobin 0.2.4 CRAN · 2026-05-30 | Depends | dplyr |
| normfluodbf 2.0.3 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.4) |
| NTLKwIEx 0.2.0 CRAN · 2026-05-30 | Depends | dplyr |
| NumericEnsembles 1.2 CRAN · 2026-05-30 | Depends | dplyr |
| onmaRg 1.0.3 CRAN · 2026-05-30 | Depends | dplyr |
| OSFD 3.1 CRAN · 2026-05-30 | Depends | dplyr |
| PAMpal 1.5.2 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.1) |
| phenModel 1.0 CRAN · 2026-05-30 | Depends | dplyr |
| photosynthesisLRC 1.0.6 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.4) |
| pii 1.3.0 CRAN · 2026-05-30 | Depends | dplyr |
| PogromcyDanych 1.7.1 CRAN · 2026-05-30 | Depends | dplyr |
| PolicyPortfolios 0.5 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0) |
| prodest 1.0.1 CRAN · 2026-05-30 | Depends | dplyr |
| promotionImpact 0.1.5 CRAN · 2026-05-30 | Depends | dplyr (>= 0.7.6) |
| psychometric 2.4 CRAN · 2026-05-30 | Depends | dplyr |
| PupillometryR 0.0.7 CRAN · 2026-05-30 | Depends | dplyr |
| PupilPre 0.6.3 CRAN · 2026-05-30 | Depends | dplyr (>= 0.8.0) |
| QuantPsyc 1.6 CRAN · 2026-05-30 | Depends | dplyr |
| quickpsy 0.1.5.2 CRAN · 2026-05-30 | Depends | dplyr |
| QurvE 1.1.2 CRAN · 2026-05-30 | Depends | dplyr |
| radiant.data 1.6.8 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.2) |
| rangecondprob 0.4.1 CRAN · 2026-05-30 | Depends | dplyr |
| RCPA 0.2.8 CRAN · 2026-05-30 | Depends | dplyr |
| recipes 1.3.2 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.0) |
| recurrentpseudo 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| ReDaMoR 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| repello 1.0.1 CRAN · 2026-05-30 | Depends | dplyr |
| rise 1.0.4 CRAN · 2026-05-30 | Depends | dplyr |
| RiskScorescvd 0.3.1 CRAN · 2026-05-30 | Depends | dplyr (>= 1.1.2) |
| rmdHelpers 1.3.1 CRAN · 2026-05-30 | Depends | dplyr |
| rollup 0.1.0 CRAN · 2026-05-30 | Depends | dplyr |
| rQSAR 1.0.0 CRAN · 2026-05-30 | Depends | dplyr |
| saccadr 0.1.3 CRAN · 2026-05-30 | Depends | dplyr |
| samplesize4surveys 4.1.1 CRAN · 2026-05-30 | Depends | dplyr |
| SCdeconR 1.0.2 CRAN · 2026-05-30 | Depends | dplyr |
| SEERaBomb 2019.2 CRAN · 2026-05-30 | Depends | dplyr |
| sfc 0.1.1 CRAN · 2026-05-30 | Depends | dplyr |
| shinyML 1.0.1 CRAN · 2026-05-30 | Depends | dplyr |
| shinySIR 0.1.2 CRAN · 2026-05-14 | Depends | dplyr (>= 0.8.0.1) |
| SimDissolution 0.1.0 CRAN · 2026-05-30 | Depends | dplyr |
| simITS 0.1.1 CRAN · 2026-05-30 | Depends | dplyr |
| spect 1.0 CRAN · 2026-05-30 | Depends | dplyr |
| StatMatch 1.4.3 CRAN · 2026-05-30 | Depends | dplyr |
| stcos 0.3.1 CRAN · 2026-05-13 | Depends | dplyr |
| sugarbag 0.1.10 CRAN · 2026-05-30 | Depends | dplyr (>= 1.0.0) |
| Type | Packages |
|---|---|
| Depends | 137 |
| Imports | 4,657 |
| Suggests | 1,114 |
| Enhances | 1 |
NEWS

dplyr 1.2.1

dplyr is now fully compliant with the R C API (#7819).

dplyr 1.2.0

New features

New `filter_out()` companion to `filter()`. Use `filter()` when specifying rows to keep. Use `filter_out()` when specifying rows to drop.

`filter_out()` simplifies cases where you would have previously used a `filter()` to drop rows. It is particularly useful when missing values are involved. For example, to drop rows where the `count` is zero:

    df |> filter(count != 0 | is.na(count))

    df |> filter_out(count == 0)

With `filter()`, you must provide a “negative” condition of `!= 0` and must explicitly guard against accidentally dropping rows with `NA`.
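The `NA` behaviour just described can be checked on a small example. This is a sketch on made-up data (the `df` tibble and its `id` column are assumptions for illustration). The `filter_out()` call is guarded by a version check because, per the entry above, it only exists from dplyr 1.2.0 onwards.

```r
library(dplyr)

# Made-up data: one zero count and one missing count
df <- tibble(id = 1:4, count = c(0, 2, NA, 5))

# filter() keeps rows where the condition is TRUE, so the NA row is
# silently dropped along with the zero row: ids 2 and 4 remain
unguarded <- df |> filter(count != 0)

# Guarding with is.na() keeps the NA row: ids 2, 3 and 4 remain
guarded <- df |> filter(count != 0 | is.na(count))

# Only run on versions that ship filter_out(); the call mirrors the
# NEWS example and should drop only the zero row
if (packageVersion("dplyr") >= "1.2.0") {
  dropped <- df |> filter_out(count == 0)
  stopifnot(identical(dropped$id, guarded$id))
}
```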
With `filter_out()`, you directly specify rows to drop and you don’t have to guard against dropping rows with `NA`, which tends to result in much clearer code.

This work is a result of Tidyup 8: Expanding the `filter()` family, with a lot of great feedback from the community (#6560, #6891).

New `when_any()` and `when_all()`, which are elementwise versions of `any()` and `all()`. Alternatively, you can think of them as performing repeated `|` and `&` on any number of inputs, for example:

- `when_any(x, y, z)` is equivalent to `x | y | z`.
- `when_all(x, y, z)` is equivalent to `x & y & z`.

`when_any()` is particularly useful within `filter()` and `filter_out()` to specify comma separated conditions combined with `|` rather than `&`, like:

    # With `|`
    countries |>
      filter(
        (name %in% c("US", "CA") & between(score, 200, 300)) |
          (name %in% c("PR", "RU") & between(score, 100, 200))
      )

    # With `when_any()`, you drop the explicit `|`, the extra `()`, and your
    # conditions are all indented to the same level
    countries |>
      filter(when_any(
        name %in% c("US", "CA") & between(score, 200, 300),
        name %in% c("PR", "RU") & between(score, 100, 200)
      ))

    # To drop these rows instead, use `filter_out()`
    countries |>
      filter_out(when_any(
        name %in% c("US", "CA") & between(score, 200, 300),
        name %in% c("PR", "RU") & between(score, 100, 200)
      ))

This work is a result of Tidyup 8: Expanding the `filter()` family.

`case_when()` is now part of a family of 4 related functions, 3 of which are new:

- Use `case_when()` to create a new vector based on logical conditions.
- Use `replace_when()` to update an existing vector based on logical conditions.
- Use `recode_values()` to create a new vector by mapping all old values to new values.
- Use `replace_values()` to update an existing vector by mapping some old values to new values.

Learn all about these in a new vignette, `vignette("recoding-replacing")`.
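The difference between creating a new vector and updating an existing one can be sketched with functions that predate this release. The example below uses `case_when()` for the first case and `if_else()` as a stand-in for the second. The three new functions are deliberately not called, since their full signatures are not shown in this excerpt, and the `score` vector is made up for illustration.

```r
library(dplyr)

# Made-up input with one missing value
score <- c(35, 62, 88, NA)

# Create a NEW vector from logical conditions; elements matched by no
# condition (here the NA) fall through to `.default`
grade <- case_when(
  score >= 80 ~ "high",
  score >= 50 ~ "medium",
  score < 50 ~ "low",
  .default = "unknown"
)
grade
# "low" "medium" "high" "unknown"

# UPDATE an existing vector: only elements meeting the condition change,
# the rest keep their original value
capped <- if_else(score > 80, 80, score)
capped
# 35 62 80 NA
```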
`replace_when()` is particularly useful for conditionally mutating rows within one or more columns, and can be thought of as an enhanced version of `base::replace()`.

`recode_values()` and `replace_values()` have the familiar `case_when()`-style formula interface for easy interactive use, but also have `from` and `to` arguments as a way for you to incorporate a pre-built lookup table, making them more holistic replacements for both `case_match()` and `recode()`.

This work is a result of Tidyup 7: Recoding and replacing values in the tidyverse, with a lot of great feedback from the community (#7728, #7729).

`case_when()` has gained a new `.unmatched` argument. For extra safety, set `.unmatched = "error"` rather than providing a `.default` when you believe that you’ve handled every possible case, and it will error if a case is left unhandled. The new `recode_values()` also has this argument (#7653).

`if_else()`, `case_when()`, and `coalesce()` have gotten significantly faster and use much less memory due to a rewrite in C via vctrs (#7723, #7725, #7727).

New `ptype` argument for `between()`, allowing users to specify the desired output type. This is particularly useful for ordered factors and other complex types where the default common type behavior might not be ideal (#6906, @JamesHWade).

New `rbind()` method for `rowwise_df` to avoid creating corrupt rowwise data frames (r-lib/vctrs#1935).

Lifecycle changes

Newly stable

- `.by` has moved from experimental to stable (#7762).
- `reframe()` has moved from experimental to stable (#7713, @VisruthSK).

Newly breaking

- `if_else()` no longer allows `condition` to be a logical array. It must be a logical vector with no `dim` attribute (#7723).

Newly deprecated

`case_match()` is soft-deprecated, and is fully replaced by `recode_values()` and `replace_values()`, which are more flexible, more powerful, and have much better names.

In `case_when()`, supplying all size 1 LHS inputs along with a size >1 RHS input is now soft-deprecated.
This is an improper usage of `case_when()` that should instead be a series of `if` statements, like:

    # Scalars!
    code <- 1L
    flavor <- "vanilla"

    # Improper usage:
    case_when(
      code == 1L && flavor == "chocolate" ~ x,
      code == 1L && flavor == "vanilla" ~

README

dplyr

Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

- `mutate()` adds new variables that are functions of existing variables.
- `select()` picks variables based on their names.
- `filter()` picks cases based on their values.
- `summarise()` reduces multiple values down to a single summary.
- `arrange()` changes the ordering of the rows.
These all combine naturally with `group_by()` which allows you to perform any operation “by group”. You can learn more about them in `vignette("dplyr")`. As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in `vignette("two-table")`.

If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.

Backends

In addition to data frames/tibbles, dplyr makes working with other computational backends accessible and efficient. Below is a list of alternative backends:

- arrow for larger-than-memory datasets, including on remote cloud storage like AWS S3, using the Apache Arrow C++ engine, Acero.
- dbplyr for data stored in a relational database. Translates your dplyr code to SQL.
- dtplyr for large, in-memory datasets. Translates your dplyr code to high performance data.table code.
- duckplyr for large, in-memory datasets. Translates your dplyr code to high performance duckdb queries with zero extra copies and an automatic R fallback when translation isn’t possible.
- sparklyr for very large datasets stored in Apache Spark.

Installation

    # The easiest way to get dplyr is to install the whole tidyverse:
    install.packages("tidyverse")

    # Alternatively, install just dplyr:
    install.packages("dplyr")

Development version

To get a bug fix or to use a feature from the development version, you can install the development version of dplyr from GitHub.
    # install.packages("pak")
    pak::pak("tidyverse/dplyr")

Usage

    library(dplyr)

    starwars |>
      filter(species == "Droid")
    #> # A tibble: 6 × 14
    #>   name   height  mass hair_color skin_color  eye_color birth_year sex   gender
    #>   <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>
    #> 1 C-3PO     167    75 <NA>       gold        yellow           112 none  masculi…
    #> 2 R2-D2      96    32 <NA>       white, blue red               33 none  masculi…
    #> 3 R5-D4      97    32 <NA>       white, red  red               NA none  masculi…
    #> 4 IG-88     200   140 none       metal       red               15 none  masculi…
    #> 5 R4-P17     96    NA none       silver, red red, blue         NA none  feminine
    #> # ℹ 1 more row
    #> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
    #> #   vehicles <list>, starships <list>

    starwars |>
      select(name, ends_with("color"))
    #> # A tibble: 87 × 4
    #>   name           hair_color skin_color  eye_color
    #>   <chr>          <chr>      <chr>       <chr>
    #> 1 Luke Skywalker blond      fair        blue
    #> 2 C-3PO          <NA>       gold        yellow
    #> 3 R2-D2          <NA>       white, blue red
    #> 4 Darth Vader    none       white       yellow
    #> 5 Leia Organa    brown      light       brown
    #> # ℹ 82 more rows

    starwars |>
      mutate(name, bmi = mass / ((height / 100)^2)) |>
      select(name:mass, bmi)
    #> # A tibble: 87 × 4
    #>   name           height  mass   bmi
    #>   <chr>           <int> <dbl> <dbl>
    #> 1 Luke Skywalker    172    77  26.0
    #> 2 C-3PO             167    75  26.9
    #> 3 R2-D2              96    32  34.7
    #> 4 Darth Vader       202   136  33.3
    #> 5 Leia Organa       150    49  21.8
    #> # ℹ 82 more rows

    starwars |>
      arrange(desc(mass))
    #> # A tibble: 87 × 14
    #>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
    #>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
    #> 1 Jabba De…    175  1358 <NA>       green-tan… orange         600   herm… mascu…
    #> 2 Grievous     216   159 none       brown, wh… green, y…       NA   male  mascu…
    #> 3 IG-88        200   140 none       metal      red             15   none  mascu…
    #> 4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
    #> 5 Tarfful      234   136 brown      brown      blue            NA   male  mascu…
    #> # ℹ 82 more rows
    #> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
    #> #   vehicles <list>, starships <list>

    starwars |>
      group_by(species) |>
      summarise(
        n = n(),
        mass = mean(mass, na.rm = TRUE)
      ) |>
      filter(
        n > 1,
        mass > 50
      )
    #> # A tibble: 9 × 3
    #>   species      n  mass
    #>   <chr>    <int> <dbl>
    #> 1 Droid        6  69.8
    #> 2 Gungan       3  74
    #> 3 Human       35  81.3
    #> 4 Kaminoan     2  88
    #> 5 Mirialan     2  53.1
    #> # ℹ 4 more rows

Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, please use forum.posit.co.

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Help for package dplyr

Package {dplyr}

Contents

dplyr-package across all_equal all_vars args_by arrange arrange_all auto_copy backend_dbplyr band_members between bind_cols bind_rows c_across case-and-replace-when case_match check_dbplyr coalesce common_by compute consecutive_id context copy_to count cross_join cumall defunct defunct-each defunct-lazyeval deprec-context desc dim_desc distinct distinct_all distinct_prepare do dplyr-locale dplyr_by dplyr_data_masking dplyr_extending dplyr_tidy_select explain filter filter-joins filter_all funs glimpse group_by group_by_all group_by_drop_default group_cols group_data group_map group_nest group_split group_trim grouped_df ident if_else join_by last_dplyr_warnings lead-lag make_tbl mutate mutate-joins mutate_all n_distinct na_if near nest_by nest_join new_grouped_df nth ntile order_by percent_rank pick progress_estimated pull recode recode-and-replace-values reexports reframe relocate rename row_number rows rowwise same_src sample_n scoped select select_all setops slice sql src src_tbls starwars storms summarise summarise_all tbl tbl_ptype tbl_vars tidyeval-compat top_n transmute vars when-any-all
with_groups with_order

| Field | Value |
|---|---|
| Type | Package |
| Title | A Grammar of Data Manipulation |
| Version | 1.2.1 |
| Description | A fast, consistent tool for working with data frame like objects, both in memory and out of memory. |
| License | MIT + file LICENSE |
| URL | https://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr |
| BugReports | https://github.com/tidyverse/dplyr/issues |
| Depends | R (≥ 4.1.0) |
| Imports | cli (≥ 3.6.2), generics, glue (≥ 1.3.2), lifecycle (≥ 1.0.5), magrittr (≥ 1.5), methods, pillar (≥ 1.9.0), R6, rlang (≥ 1.1.7), tibble (≥ 3.2.0), tidyselect (≥ 1.2.0), utils, vctrs (≥ 0.7.1) |
| Suggests | broom, covr, DBI, dbplyr (≥ 2.2.1), ggplot2, knitr, Lahman, lobstr, nycflights13, purrr, rmarkdown, RSQLite, stringi (≥ 1.7.6), testthat (≥ 3.1.5), tidyr (≥ 1.3.0), withr |
| VignetteBuilder | knitr |
| Config/build/compilation-database | true |
| Config/Needs/website | tidyverse/tidytemplate |
| Config/testthat/edition | 3 |
| Encoding | UTF-8 |
| LazyData | true |
| RoxygenNote | 7.3.3 |
| NeedsCompilation | yes |
| Packaged | 2026-04-02 19:51:05 UTC; hadleywickham |
| Author | Hadley Wickham [aut, cre], Romain François [aut], Lionel Henry [aut], Kirill Müller [aut], Davis Vaughan [aut], Posit Software, PBC [cph, fnd] |
| Maintainer | Hadley Wickham <hadley@posit.co> |
| Repository | CRAN |
| Date/Publication | 2026-04-03 07:30:08 UTC |

dplyr: A Grammar of Data Manipulation

Description

To learn more about dplyr, start with the vignettes: `browseVignettes(package = "dplyr")`

Author(s)

Maintainer: Hadley Wickham hadley@posit.co (ORCID)

Authors:

- Romain François (ORCID)
- Lionel Henry
- Kirill Müller (ORCID)
- Davis Vaughan davis@posit.co (ORCID)

Other contributors:

- Posit Software, PBC [copyright holder, funder]

See Also

Useful links:

- https://dplyr.tidyverse.org
- https://github.com/tidyverse/dplyr
- Report bugs at https://github.com/tidyverse/dplyr/issues

Apply a function (or functions) across multiple columns

Description

`across()` makes it easy to apply the same transformation to multiple columns, allowing you to use `select()` semantics inside "data-masking" functions like `summarise()` and `mutate()`. See `vignette("colwise")` for more details.

`if_any()` and `if_all()` apply the same predicate function to a selection of columns and combine the results into a single logical vector: `if_any()` is `TRUE` when the predicate is `TRUE` for any of the selected columns, `if_all()` is `TRUE` when the predicate is `TRUE` for all selected columns.

If you just need to select columns without applying a transformation to each of them, then you probably want to use `pick()` instead.

`across()` supersedes the family of "scoped variants" like `summarise_at()`, `summarise_if()`, and `summarise_all()`.

Usage

    across(.cols, .fns, ..., .names = NULL, .unpack = FALSE)

    if_any(.cols, .fns, ..., .names = NULL)

    if_all(.cols, .fns, ..., .names = NULL)

Arguments

`.cols`: `<tidy-select>` Columns to transform. You can't select grouping columns because they are already automatically handled by the verb (i.e. `summarise()` or `mutate()`).

`.fns`: Functions to apply to each of the selected columns. Possible values are:

- A function, e.g. `mean`.
- A purrr-style lambda, e.g. `~ mean(.x, na.rm = TRUE)`.
- A named list of functions or lambdas, e.g. `list(mean = mean, n_miss = ~ sum(is.na(.x)))`. Each function is applied to each column, and the output is named by combining the function name and the column name using the glue specification in `.names`.

Within these functions you can use `cur_column()` and `cur_group()` to access the current column and grouping keys respectively.

`...`: Additional arguments for the function calls in `.fns` are no longer accepted in `...` because it's not clear when they should be evaluated: once per `across()` or once per group? Instead supply additional arguments directly in `.fns` by using a lambda. For example, instead of `across(a:b, mean, na.rm = TRUE)` write `across(a:b, ~ mean(.x, na.rm = TRUE))`.

`.names`: A glue specification that describes how to name the output columns. This can use `{.col}` to stand for the selected column name, and `{.fn}` to stand for the name of the function being applied. The default (`NULL`) is equivalent to `"{.col}"` for the single function case and `"{.col}_{.fn}"` for the case where a list is used for `.fns`.

`.unpack`: Optionally unpack data frames returned by functions in `.fns`, which expands the df-columns out into individual columns, retaining the number of rows in the data frame.

- If `FALSE`, the default, no unpacking is done.
- If `TRUE`, unpacking is done with a default glue specification of `"{outer}_{inner}"`.
- Otherwise, a single glue specification can be supplied to describe how to name the unpacked columns. This can use `{outer}` to refer to the name originally generated by `.names`, and `{inner}` to refer to the names of the data frame you are unpacking.

Details

When there are no selected columns:

- `if_any()` will return `FALSE`, consistent with the behavior of `any()` when called without inputs.
- `if_all()` will return `TRUE`, consistent with the behavior of `all()` when called without inputs.

Value

`across()` typically returns a tibble with one column for each column in `.cols` and each function in `.fns`. If `.unpack` is used, more columns may be returned depending on how the results of `.fns` are unpacked.

`if_any()` and `if_all()` return a logical vector.

Timing of evaluation

R code in dplyr verbs is generally evaluated once per group. Inside `across()` however, code is evaluated once for each combination of columns and groups. If the evaluation timing is important, for example if you're generating random variables, think about when it should happen and place your code accordingly.
gdf <- tibble(g = c(1, 1, 2, 3), v1 = 10:13, v2 = 20:23) |> group_by(g) set.seed(1) # Outside: 1 normal variate n <- rnorm(1) gdf |> mutate(across(v1:v2, ~ .x + n)) #> # A tibble: 4 x 3 #> # Groups: g [3] #> g v1 v2 #> <dbl> <dbl> <dbl> #> 1 1 9.37 19.4 #> 2 1 10.4 20.4 #> 3 2 11.4 21.4 #> 4 3 12.4 22.4 # Inside a verb: 3 normal variates (ngroup) gdf |> mutate(n = rnorm(1), across(v1:v2, ~ .x + n)) #> # A tibble: 4 x 4 #> # Groups: g [3] #> g v1 v2 n #> <dbl> <dbl> <dbl> <dbl> #> 1 1 10.2 20.2 0.184 #> 2 1 11.2 21.2 0.184 #> 3 2 11.2 21.2 -0.836 #> 4 3 14.6 24.6 1.60 # Inside `across()`: 6 normal variates (ncol * ngroup) gdf |> mutate(across(v1:v2, ~ .x + rnorm(1))) #> # A tibble: 4 x 3 #> # Groups: g [3] #> g v1 v2 #> <dbl> <dbl> <dbl> #> 1 1 10.3 20.7 #> 2 1 11.3 21.7 #> 3 2 11.2 22.6 #> 4 3 13.5 22.7 See Also c_across() for a function that returns a vector Examples # For better printing iris <- as_tibble(iris) # across() ----------------------------------------------------------------- # Using everythPackage 'dplyr' reference manual Package 'dplyr' Title: A Grammar of Data Manipulation Description: A fast, consistent tool for working with data frame like objects, both in memory and out of memory. 
Authors: Hadley Wickham [aut, cre] (ORCID: <https://orcid.org/0000-0003-4757-117X>), Romain François [aut] (ORCID: <https://orcid.org/0000-0002-2444-4226>), Lionel Henry [aut], Kirill Müller [aut] (ORCID: <https://orcid.org/0000-0002-1416-3412>), Davis Vaughan [aut] (ORCID: <https://orcid.org/0000-0003-4777-038X>), Posit Software, PBC [cph, fnd]
Maintainer: Hadley Wickham
License: MIT + file LICENSE
Version: 1.2.1.9000
Built: 2026-05-06 17:33:23 UTC
Source: https://github.com/tidyverse/dplyr

Help Index

- Apply a function (or functions) across multiple columns
- Apply predicate to all variables
- Order rows using column values
- Copy tables to same source, if necessary
- Band membership
- Detect where values fall in a specified range
- Bind multiple data frames by column
- Bind multiple data frames by row
- Combine values from multiple columns
- A general vectorised if-else
- Find the first non-missing element
- Force computation of a database query
- Generate a unique identifier for consecutive combinations
- Information about the "current" group or variable
- Copy a local data frame to a remote src
- Count the observations in each group
- Cross join
- Cumulative versions of any, all, and mean
- Descending order
- Keep distinct/unique rows
- Per-operation grouping with .by/by
- Explain details of a tbl
- Keep or drop rows that match a condition
- Filtering joins
- Get a glimpse of your data
- Group by one or more variables
- Select grouping variables
- Apply a function to each group
- Trim grouping structure
- Flag a character vector as SQL identifiers
- Vectorised if-else
- Join specifications
- Compute lagged or leading values
- Create, modify, and delete columns
- Mutating joins
- Count unique combinations
- Convert values to NA
- Compare two numeric vectors
- Nest join
- Extract the first, last, or nth value from a vector
- Bucket a numeric vector into n groups
- A helper function for ordering window function output
- Proportional ranking functions
- Select a subset of columns
- Extract a single column
- Recode values
- Recode and replace values
- Transform each group to an arbitrary number of rows
- Change column order
- Rename columns
- Integer ranking functions
- Manipulate individual rows
- Group input by rows
- Operate on a selection of variables
- Keep or drop columns using their names and types
- Set operations
- Subset rows using their positions
- SQL escaping.
- Starwars characters
- Storm tracks data
- Summarise each group down to one row
- Create a table from a data source
- Select variables
- Elementwise any() and all()

Apply a function (or functions) across multiple columns

Description

across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside "data-masking" functions like summarise() and mutate(). See vignette("colwise") for more details.

if_any() and if_all() apply the same predicate function to a selection of columns and combine the results into a single logical vector: if_any() is TRUE when the predicate is TRUE for any of the selected columns, if_all() is TRUE when the predicate is TRUE for all selected columns.

If you just need to select columns without applying a transformation to each of them, then you probably want to use pick() instead.

across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().

Usage

    across(.cols, .fns, ..., .names = NULL, .unpack = FALSE)

    if_any(.cols, .fns, ..., .names = NULL)

    if_all(.cols, .fns, ..., .names = NULL)

Arguments

.cols
<tidy-select> Columns to transform. You can't select grouping columns because they are already automatically handled by the verb (i.e. summarise() or mutate()).

.fns
Functions to apply to each of the selected columns. Possible values are:
- A function, e.g. mean.
- A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)
- A named list of functions or lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x))). Each function is applied to each column, and the output is named by combining the function name and the column name using the glue specification in .names.

Within these functions you can use cur_column() and cur_group() to access the current column and grouping keys respectively.

...
Additional arguments for the function calls in .fns are no longer accepted in ... because it's not clear when they should be evaluated: once per across() or once per group? Instead supply additional arguments directly in .fns by using a lambda. For example, instead of across(a:b, mean, na.rm = TRUE) write across(a:b, ~ mean(.x, na.rm = TRUE)).
Examples

    # For better printing
    iris <- as_tibble(iris)

    # across() -----------------------------------------------------------------
    # Using everything() to apply the same function to all columns
    iris |> mutate(across(everything(), as.character))

    # Different ways to select the same set of columns
    # See <https://tidyselect.r-lib.org/articles/syntax.html> for details
    iris |> mutate(across(c(Sepal.Length, Sepal.Width), round))
    iris |> mutate(across(c(1, 2), round))
    iris |> mutate(across(1:Sepal.Width, round))
    iris |> mutate(across(where(is.double) & !c(Petal.Length, Petal.Width), round))

    # Using an external vector of names
    cols <- c("Sepal.Length", "Petal.Width")
    iris |> mutate(across(all_of(cols), round))

    # If the external vector is named, the output columns will be named according
    # to those names
    names(cols) <- tolower(cols)
    iris |> mutate(across(all_of(cols), round))

    # A purrr-style formula
    iris |>
      group_by(Species) |>
      summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))

    # A named list of functions
    iris |>
      group_by(Species) |>
      summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd)))

    # Use the .names argument to control the output names
    iris |>
      group_by(Species) |>
      summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"))
    iris |>
      group_by(Species) |>
      summarise(
        across(
          starts_with("Sepal"),
          list(mean = mean, sd = sd),
          .names = "{.col}.{.fn}"
        )
      )

    # If a named external vector is used for column selection, .names will use
    # those names when constructing the output names
    iris |>
      group_by(Species) |>
      summarise(across(all_of(cols), mean, .names = "mean_{.col}"))

    # When the list is not named, .fn is replaced by the function's position
    iris |>
      group_by(Species) |>
      summarise(
        across(starts_with("Sepal"), list(mean, sd), .names = "{.col}.fn{.fn}")
      )

    # When the functions in .fns return a data frame, you typically get a
    # "packed" data frame back
    quantile_df <- function(x, probs = c(0.25, 0.5, 0.75)) {
      tibble(quantile = probs, value = quantile(x, probs))
    }

    iris |> reframe(across(starts_with("Sepal"), quantile_df))

    # Use .unpack to automatically expand these packed data frames into their
    # individual columns
    iris |> reframe(across(starts_with("Sepal"), quantile_df, .unpack = TRUE))

    # .unpack can utilize a glue specification if you don't like the defaults
    iris |>
      reframe(
        across(starts_with("Sepal"), quantile_df, .unpack = "{outer}.{inner}")
      )

    # This is also useful inside mutate(), for example, with a multi-lag helper
    multilag <- function(x, lags = 1:3) {
      names(lags) <- as.character(lags)
      purrr::map_dfr(lags, lag, x = x)
    }

    iris |>
      group_by(Species) |>
      mutate(across(starts_with("Sepal"), multilag, .unpack = TRUE)) |>
      select(Species, starts_with("Sepal"))

    # if_any() and if_all() ----------------------------------------------------
    iris |> filter(if_any(ends_with("Width"), ~ . > 4))
    iris |> filter_out(if_any(ends_with("Width"), ~ . > 4))

    iris |> filter(if_all(ends_with("Width"), ~ . > 2))
    iris |> filter_out(if_all(ends_with("Width"), ~ . > 2))

[Deprecated] all_equal() allows you to compare data frames, optionally ignoring row and column names. It is deprecated as of dplyr 1.1.0, because it makes it too easy to ignore important differences.
    all_equal(
      target,
      current,
      ignore_col_order = TRUE,
      ignore_row_order = TRUE,
      convert = FALSE,
      ...
    )

    scramble <- function(x) x[sample(nrow(x)), sample(ncol(x))]

    # `all_equal()` ignored row and column ordering by default,
    # but we now feel that that makes it too easy to make mistakes
    mtcars2 <- scramble(mtcars)
    all_equal(mtcars, mtcars2)

    # Instead, be explicit about the row and column ordering
    all.equal(
      mtcars,
      mtcars2[rownames(mtcars), names(mtcars)]
    )

[Superseded] all_vars() and any_vars() were only needed for the scoped verbs, which have been superseded by the use of across() in an existing verb. See vignette("colwise") for details.

These quoting functions signal to scoped filtering verbs (e.g. filter_if() or filter_all()) that a predicate expression should be applied to all relevant variables. The all_vars() variant takes the intersection of the predicate expressions with & while the any_vars() variant takes the union with |.
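The intersection/union behaviour described above can be compared with the if_all() / if_any() replacements. This is a minimal sketch on a made-up tibble (df and its columns a and b are illustrative, not from the manual); it assumes the superseded filter_all() is still available in the installed dplyr version.

```r
library(dplyr)

# Hypothetical toy data: one all-positive row, two mixed rows, one all-negative row
df <- tibble(a = c(1, -1, 2, -3), b = c(3, 4, -5, -6))

# Superseded scoped form: the predicate must hold for every column (&)
scoped_all <- filter_all(df, all_vars(. > 0))
# Modern form using if_all() inside filter()
modern_all <- filter(df, if_all(everything(), ~ .x > 0))

# Superseded scoped form: the predicate must hold for at least one column (|)
scoped_any <- filter_all(df, any_vars(. > 0))
# Modern form using if_any() inside filter()
modern_any <- filter(df, if_any(everything(), ~ .x > 0))

nrow(modern_all)
nrow(modern_any)
```

Only the first row passes the "all" filter, while the first three rows pass the "any" filter; the scoped and modern forms select the same rows.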
    all_vars(expr)

    any_vars(expr)

Use @inheritParams args_by to consistently document .by.
arrange() orders the rows of a data frame by the values of selected columns. Unlike other dplyr verbs, arrange() largely ignores grouping; you need to explicitly mention grouping variables (or use .by_group = TRUE) in order to group by them, and functions of variables are evaluated once per data frame, not once per group.
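A small sketch of the grouping behaviour described above, using a made-up four-row tibble (df, g, and x are illustrative names): the grouped data frame is sorted as a whole unless .by_group = TRUE is supplied.

```r
library(dplyr)

# Hypothetical toy data with two groups
df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 4, 2, 3))
gdf <- group_by(df, g)

# Grouping is ignored: rows are ordered by x across the whole data frame
ignored <- arrange(gdf, x)
ignored$x
#> [1] 1 2 3 4

# With .by_group = TRUE, rows are ordered by g first, then by x within each group
by_group <- arrange(gdf, x, .by_group = TRUE)
by_group$x
#> [1] 1 4 2 3
```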
    arrange(.data, ..., .by_group = FALSE)

    # S3 method for class 'data.frame'
    arrange(.data, ..., .by_group = FALSE, .locale = NULL)

    arrange(mtcars, cyl, disp)
    arrange(mtcars, desc(disp))

    # grouped arrange ignores groups
    by_cyl <- mtcars |> group_by(cyl)
    by_cyl |> arrange(desc(wt))
    # Unless you specifically ask:
    by_cyl |> arrange(desc(wt), .by_group = TRUE)

    # use embracing when wrapping in a function;
    # see ?rlang::args_data_masking for more details
    tidy_eval_arrange <- function(.data, var) {
      .data |>
        arrange({{ var }})
    }
    tidy_eval_arrange(mtcars, mpg)

    # Use `across()` or `pick()` to select columns with tidy-select
    iris |> arrange(pick(starts_with("Sepal")))
    iris |> arrange(across(starts_with("Sepal"), desc))

[Superseded] Scoped verbs (_if, _at, _all) have been superseded by the use of pick() or across() in an existing verb. See vignette("colwise") for details.

These scoped variants of arrange() sort a data frame by a selection of variables. Like arrange(), you can modify the variables before ordering with the .funs argument.
    arrange_all(.tbl, .funs = list(), ..., .by_group = FALSE, .locale = NULL)

    arrange_at(.tbl, .vars, .funs = list(), ..., .by_group = FALSE, .locale = NULL)

    arrange_if(
      .tbl,
      .predicate,
      .funs = list(),
      ...,
      .by_group = FALSE,
      .locale = NULL
    )

    df <- as_tibble(mtcars)
    arrange_all(df)
    # ->
    arrange(df, pick(everything()))

    arrange_all(df, desc)
    # ->
    arrange(df, across(everything(), desc))

Copy tables to same source, if necessary
    auto_copy(x, y, copy = FALSE, ...)

The sql_ generics are used to build the different types of SQL queries. The default implementations in dbplyr generate ANSI 92 compliant SQL. The db_ generics execute actions on the database. The default implementations in dbplyr typically just call the standard DBI S4 method.
    db_desc(x)
    sql_translate_env(con)
    db_list_tables(con)
    db_has_table(con, table)
    db_data_type(con, fields)
    db_save_query(con, sql, name, temporary = TRUE, ...)
    db_begin(con, ...)
    db_commit(con, ...)
    db_rollback(con, ...)
    db_write_table(con, table, types, values, temporary = FALSE, ...)
    db_create_table(con, table, types, temporary = FALSE, ...)
    db_insert_into(con, table, values, ...)
    db_create_indexes(con, table, indexes = NULL, unique = FALSE, ...)
    db_create_index(con, table, columns, name = NULL, unique = FALSE, ...)
    db_drop_table(con, table, force = FALSE, ...)
    db_analyze(con, table, ...)
    db_explain(con, sql, ...)
    db_query_fields(con, sql, ...)
    db_query_rows(con, sql, ...)
    sql_select(
      con,
      select,
      from,
      where = NULL,
      group_by = NULL,
      having = NULL,
      order_by = NULL,
      limit = NULL,
      distinct = FALSE,
      ...
    )
    sql_subquery(con, from, name = random_table_name(), ...)
    sql_join(con, x, y, vars, type = "inner", by = NULL, ...)
    sql_semi_join(con, x, y, anti = FALSE, by = NULL, ...)
    sql_set_op(con, x, y, method)
    sql_escape_string(con, x)
    sql_escape_ident(con, x)

These data sets describe band members of the Beatles and Rolling Stones. They are toy data sets that can be displayed in their entirety on a slide (e.g. to demonstrate a join).
band_members
band_instruments
band_instruments2

band_members
band_instruments
band_instruments2

This is a shortcut for x >= left & x <= right, implemented for local vectors and translated to the appropriate SQL for remote tables.
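The equivalence stated above can be checked directly on a local vector. This is a minimal sketch; the vector `x` and the bounds 7 and 9 are made up for illustration.

```r
library(dplyr)

# Illustrative data, including a missing value
x <- c(1, 5, 7, 9, 12, NA)

# between() is inclusive on both ends
shortcut <- between(x, 7, 9)

# The long-hand form the shortcut stands for
manual <- x >= 7 & x <= 9

identical(shortcut, manual)
```

Missing values propagate in both forms, so an `NA` input gives an `NA` result rather than `FALSE`.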
between(x, left, right, ..., ptype = NULL)

between(1:12, 7, 9)

x <- rnorm(1e2)
x[between(x, -1, 1)]

# On a tibble using `filter()`
filter(starwars, between(height, 100, 150))

# Using the `ptype` argument with ordered factors, where otherwise everything
# is cast to the common type of character before the comparison
x <- ordered(
  c("low", "medium", "high", "medium"),
  levels = c("low", "medium", "high")
)
between(x, "medium", "high")
between(x, "medium", "high", ptype = x)

Bind any number of data frames by column, making a wider result. This is similar to do.call(cbind, dfs). Where possible, prefer using a join to combine multiple data frames. bind_cols() binds the rows in the order in which they appear, so it is easy to create meaningless results without realising it.
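To make the warning about row order concrete, the sketch below compares positional binding with a keyed join. The two tibbles `members` and `instruments` are hypothetical and deliberately stored in different row orders.

```r
library(dplyr)

# Hypothetical tables describing overlapping people in different row orders
members <- tibble(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
)
instruments <- tibble(
  name = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
)

# bind_cols() pairs rows purely by position, so Mick is given John's instrument
by_position <- bind_cols(members, instruments["plays"])

# left_join() pairs rows by key, so unmatched people get NA instead
by_key <- left_join(members, instruments, by = "name")

by_position
by_key
```

The positional result looks plausible but is wrong for every row, which is why a join is usually the safer choice when a key column exists.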
bind_cols(
  ...,
  .name_repair = c("unique", "universal", "check_unique", "minimal")
)

df1 <- tibble(x = 1:3)
df2 <- tibble(y = 3:1)
bind_cols(df1, df2)

# Row sizes must be compatible when column-binding
try(bind_cols(tibble(x = 1:3), tibble(y = 1:2)))

Bind any number of data frames by row, making a longer result. This is similar to do.call(rbind, dfs), but the output will contain all columns that appear in any of the inputs.
bind_rows(..., .id = NULL)

df1 <- tibble(x = 1:2, y = letters[1:2])
df2 <- tibble(x = 4:5, z = 1:2)

# You can supply individual data frames as arguments:
bind_rows(df1, df2)

# Or a list of data frames:
bind_rows(list(df1, df2))

# When you supply a column name with the `.id` argument, a new
# column is created to link each row to its original data frame
bind_rows(list(df1, df2), .id = "id")
bind_rows(list(a = df1, b = df2), .id = "id")

c_across() is designed to work with rowwise() to make it easy to perform row-wise aggregations. It has two differences from c():
- It uses tidy select semantics so you can easily select multiple variables. See vignette("rowwise") for more details.
- It uses vctrs::vec_c() in order to give safer outputs.
c_across(cols)

df <- tibble(id = 1:4, w = runif(4), x = runif(4), y = runif(4), z = runif(4))
df |>
  rowwise() |>
  mutate(
    sum = sum(c_across(w:z)),
    sd = sd(c_across(w:z))
  )

case_when() and replace_when() are two forms of vectorized if_else(). They work by evaluating each case sequentially and using the first match for each element to determine the corresponding value in the output vector.
- Use case_when() when creating an entirely new vector.
- Use replace_when() when partially updating an existing vector.

If you are just replacing a few values within an existing vector, then replace_when() is always a better choice because it is type stable, size stable, pipes better, and better expresses intent.

A major difference between the two functions is what happens when no cases match:
- case_when() falls through to a .default as a final "else" statement.
- replace_when() retains the original values from x.

See vignette("recoding-replacing") for more examples.
case_when( ..., .default = NULL, .unmatched = "default", .ptype = NULL, .size = NULL ) replace_when(x, ...)x <- 1:70 case_when( x %% 35 == 0 ~ "fizz buzz", x %% 5 == 0 ~ "fizz", x %% 7 == 0 ~ "buzz", .default = as.character(x) ) # Like an if statement, the arguments are evaluated in order, so you must # proceed from the most specific to the most general. This won't work: case_when( x %% 5 == 0 ~ "fizz", x %% 7 == 0 ~ "buzz", x %% 35 == 0 ~ "fizz buzz", .default = as.character(x) ) # If none of the cases match and no `.default` is supplied, NA is used: case_when( x %% 35 == 0 ~ "fizz buzz", x %% 5 == 0 ~ "fizz", x %% 7 == 0 ~ "buzz" ) # Note that `NA` values on the LHS are treated like `FALSE` and will be # assigned the `.default` value. You must handle them explicitly if you # want to use a different value. The exact way to handle missing values is # dependent on the set of LHS conditions you use. x[2:4] <- NA_real_ case_when( x %% 35 == 0 ~ "fizz buzz", x %% 5 == 0 ~ "fizz", x %% 7 == 0 ~ "buzz", is.na(x) ~ "nope", .default = as.character(x) ) # `case_when()` is not a replacement for basic if/else control flow. When # you have a single scalar condition, using if/else is faster, simpler to # reason about, and is lazy on the branch that isn't run. For example, this # seems to work: x <- "value" case_when(is.character(x) ~ x, .default = "not-a-character") # Until `x` is a non-character type x <- 1 try(case_when(is.character(x) ~ x, .default = "not-a-character")) # Instead, you should use if/else if (is.character(x)) y <- x else y <- "not-a-character" y # If you believe that you've covered every possible case, then set # `.unmatched = "error"` rather than supplying a `.default`. This adds an # extra layer of safety to `case_when()` and is particularly useful when you # have a series of complex expressions! 
set.seed(123) x <- sample(50) # Oops, we forgot to handle `50` try(case_when( x < 10 ~ "ten", x < 20 ~ "twenty", x < 30 ~ "thirty", x < 40 ~ "forty", x < 50 ~ "fifty", .unmatched = "error" )) case_when( x < 10 ~ "ten", x < 20 ~ "twenty", x < 30 ~ "thirty", x < 40 ~ "forty", x <= 50 ~ "fifty", .unmatched = "error" ) # Note that `NA` is considered unmatched and must be handled with its own # explicit case, even if that case just propagates the missing value! x[c(2, 5)] <- NA case_when( x < 10 ~ "ten", x < 20 ~ "twenty", x < 30 ~ "thirty", x < 40 ~ "forty", x <= 50 ~ "fifty", is.na(x) ~ NA, .unmatched = "error" ) # `replace_when()` is useful when you're updating an existing vector, # rather than creating an entirely new one. Note the so-far unused "puppy" # factor level: pets <- tibble( name = c("Max", "Bella", "Chuck", "Luna", "Cooper"), type = factor( c("dog", "dog", "cat", "dog", "cat"), levels = c("dog", "cat", "puppy") ), age = c(1, 3, 5, 2, 4) ) # We can replace some values with `"puppy"` based on arbitrary conditions. # Even though we are using a character `"puppy"` value, `replace_when()` will # automatically cast it to the factor type of `type` for us. pets |> mutate( type = replace_when(type, type == "dog" & age <= 2 ~ "puppy") ) # Compare that with this `case_when()` call, which loses the factor class. # It's always better to use `replace_when()` when updating a few values in # an existing vector! pets |> mutate( type = case_when(type == "dog" & age <= 2 ~ "puppy", .default = type) ) # `case_when()` and `replace_when()` evaluate all RHS expressions, and then # construct their result by extracting the selected (via the LHS expressions) # parts. For example, `NaN`s are produced here because `sqrt(y)` is evaluated # on all of `y`, not just where `y >= 0`. 
y <- seq(-2, 2, by = .5) replace_when(y, y >= 0 ~ sqrt(y)) # These functions are particularly useful inside `mutate()` when you want to # create a new variable that relies on a complex combination of existing # variables starwars |> select(name:mass, gender, species) |> mutate( type = case_when( height > 200 | mass > 200 ~ "large", species == "Droid" ~ "robot", .default = "other" ) ) # `case_when()` is not a tidy eval function. If you'd like to reuse # the same patterns, extract the `case_when()` call into a normal # function: case_character_type <- function(height, mass, species) case_when( height > 200 | mass > 200 ~ "large", species == "Droid" ~ "robot", .default = "other" ) case_character_type(150, 250, "Droid") case_character_type(150, 150, "Droid") # Such functions can be used inside `mutate()` as well: starwars |> mutate(type = case_character_type(height, mass, species)) |> pull(type) # `case_when()` ignores `NULL` inputs. This is useful when you'd # like to use a pattern only under certain conditions. Here we'll # take advantage of the fact that `if` returns `NULL` when there is # no `else` clause: case_character_type <- function(height, mass, species, robots = TRUE) case_when( height > 200 | mass > 200 ~ "large", if (robots) species == "Droid" ~ "robot", .default = "other" ) starwars |> mutate(type = case_character_type(height, mass, species, robots = FALSE)) |> pull(type) # `replace_when()` can also be used in combination with `pick()` to # conditionally mutate rows within multiple columns using a single condition. # Here `replace_when()` returns a data frame with new `species` and `name` # columns, which `mutate()` then automatically unpacks. 
starwars |>
  select(homeworld, species, name) |>
  mutate(replace_when(
    pick(species, name),
    homeworld == "Tatooine" ~ tibble(
      species = "Tatooinese",
      name = paste(name, "(Tatooine)")
    )
  ))

[Deprecated] case_match() is deprecated. Please use recode_values() and replace_values() instead, which are more powerful, have more intuitive names, and have better safety. In addition to the familiar two-sided formula interface, these functions also have from and to arguments which allow you to incorporate a lookup table into the recoding process.

This function allows you to vectorise multiple switch() statements. Each case is evaluated sequentially and the first match for each element determines the corresponding value in the output vector. If no cases match, the .default is used.
case_match(.x, ..., .default = NULL, .ptype = NULL)# `case_match()` is deprecated and has been replaced by `recode_values()` and # `replace_values()` x <- c("a", "b", "a", "d", "b", NA, "c", "e") # `recode_values()` is a 1:1 replacement for `case_match()` case_match( x, "a" ~ 1, "b" ~ 2, "c" ~ 3, "d" ~ 4 ) recode_values( x, "a" ~ 1, "b" ~ 2, "c" ~ 3, "d" ~ 4 ) # `recode_values()` has an additional `unmatched` argument to help you catch # missed mappings try(recode_values( x, "a" ~ 1, "b" ~ 2, "c" ~ 3, "d" ~ 4, unmatched = "error" )) # `recode_values()` also has additional `from` and `to` arguments, which are # useful when your lookup table is defined elsewhere (for example, it could # be read in from a CSV file). This is very difficult to do with # `case_match()`! lookup <- tribble( ~from, ~to, "a", 1, "b", 2, "c", 3, "d", 4 ) recode_values(x, from = lookup$from, to = lookup$to) # Both `case_match()` and `recode_values()` work with more than just # character inputs: y <- as.integer(c(1, 2, 1, 3, 1, NA, 2, 4)) case_match( y, c(1, 3) ~ "odd", c(2, 4) ~ "even", .default = "missing" ) recode_values( y, c(1, 3) ~ "odd", c(2, 4) ~ "even", default = "missing" ) # Or with a lookup table lookup <- tribble( ~from, ~to, c(1, 3), "odd", c(2, 4), "even" ) recode_values(y, from = lookup$from, to = lookup$to, default = "missing") # `replace_values()` is a convenient way to replace selected values, leaving # everything else as is. It's similar to `case_match(y, .default = y)`. replace_values(y, NA ~ 0) case_match(y, NA ~ 0, .default = y) # Notably, `replace_values()` is type stable, which means that `y` can't # change types out from under you, unlike with `case_match()`! typeof(y) typeof(replace_values(y, NA ~ 0)) typeof(case_match(y, NA ~ 0, .default = y)) # We believe that `replace_values()` better expresses intent when doing a # partial replacement. 
# Compare these two `mutate()` calls, each with the goals of:
# - Replace missings in `hair_color`
# - Replace some of the `species`
starwars |>
  mutate(
    hair_color = case_match(hair_color, NA ~ "unknown", .default = hair_color),
    species = case_match(
      species,
      "Human" ~ "Humanoid",
      "Droid" ~ "Robot",
      c("Wookiee", "Ewok") ~ "Hairy",
      .default = species
    ),
    .keep = "used"
  )

updates <- tribble(
  ~from, ~to,
  "Human", "Humanoid",
  "Droid", "Robot",
  c("Wookiee", "Ewok"), "Hairy"
)

starwars |>
  mutate(
    hair_color = replace_values(hair_color, NA ~ "unknown"),
    species = replace_values(species, from = updates$from, to = updates$to),
    .keep = "used"
  )

In dplyr 0.7.0, a number of database and SQL functions moved from dplyr to dbplyr. The generic functions stayed in dplyr (since there is no easy way to conditionally import a generic from different packages), but many other SQL and database helper functions moved. If you have written a backend, these functions generate the code you need to work with both dplyr 0.5.0 and dplyr 0.7.0.
check_dbplyr()

wrap_dbplyr_obj(obj_name)

if (requireNamespace("dbplyr", quietly = TRUE)) {
  wrap_dbplyr_obj("build_sql")
  wrap_dbplyr_obj("base_agg")
}

Given a set of vectors, coalesce() finds the first non-missing value at each position. It's inspired by the SQL COALESCE function which does the same thing for SQL NULLs.
coalesce(..., .ptype = NULL, .size = NULL)# Replace missing values with a single value x <- sample(c(1:5, NA, NA, NA)) coalesce(x, 0L) # Or replace missing values with the corresponding non-missing value in # another vector x <- c(1, 2, NA, NA, 5, NA) y <- c(NA, NA, 3, 4, 5, NA) coalesce(x, y) # For cases like these where your replacement is a single value or a single # vector, `replace_values()` works just as well replace_values(x, NA ~ 0) coalesce(x, 0) replace_values(x, NA ~ y) coalesce(x, y) # `coalesce()` really shines when you have >2 vectors to coalesce with z <- c(NA, 2, 3, 4, 5, 6) coalesce(x, y, z) # If you're looking to replace values with `NA`, rather than replacing `NA` # with a value, then use `replace_values()` x <- c(0, -1, 5, -99, 8) replace_values(x, c(-1, -99) ~ NA) # The equivalent to a missing value in a list is `NULL` coalesce(list(1, 2, NULL, NA), list(0)) # Supply lists of vectors by splicing them into dots vecs <- list( c(1, 2, NA, NA, 5), c(NA, NA, 3, 4, 5) ) coalesce(!!!vecs)Extract out common by variables
common_by(by = NULL, x, y)

compute() stores results in a remote temporary table. collect() retrieves data into a local tibble. collapse() is slightly different: it doesn't force computation, but instead forces generation of the SQL query. This is sometimes needed to work around bugs in dplyr's SQL generation. All functions preserve grouping and ordering.
compute(x, ...)
collect(x, ...)
collapse(x, ...)

if (requireNamespace("dbplyr", quietly = TRUE) && requireNamespace("RSQLite", quietly = TRUE)) {
  mtcars2 <- dbplyr::src_memdb() |>
    copy_to(mtcars, name = "mtcars2-cc", overwrite = TRUE)

  remote <- mtcars2 |>
    filter(cyl == 8) |>
    select(mpg:drat)

  # Compute query and save in remote table
  compute(remote)

  # Compute query and bring back to this session
  collect(remote)

  # Creates a fresh query based on the generated SQL
  collapse(remote)
}

consecutive_id() generates a unique identifier that increments every time a variable (or combination of variables) changes. Inspired by data.table::rleid().
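As a rough illustration of the run-based numbering described above, the sketch below compares consecutive_id() with an id built from base R's rle(). The vector `x` is made up, and the rle() construction is only an analogy for a single vector, not how dplyr implements the function.

```r
library(dplyr)

# Illustrative vector with a value ("a") that reappears after a gap
x <- c("a", "a", "b", "a", "a", "c")

# A new id starts whenever the value differs from the previous one,
# so the second run of "a" gets its own id
ids <- consecutive_id(x)

# Base R analogue built from run lengths
runs <- rle(x)
ids_base <- rep(seq_along(runs$lengths), runs$lengths)

ids
ids_base
```

Unlike a plain group id, the identifier depends on position, which is what makes it useful for splitting a series into consecutive runs.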
consecutive_id(...)

consecutive_id(c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, NA, NA))
consecutive_id(c(1, 1, 1, 2, 1, 1, 2, 2))

df <- data.frame(x = c(0, 0, 1, 0), y = c(2, 2, 2, 2))
df |> group_by(x, y) |> summarise(n = n())
df |> group_by(id = consecutive_id(x, y), x, y) |> summarise(n = n())

These functions return information about the "current" group or "current" variable, so only work inside specific contexts like summarise() and mutate().
- n() gives the current group size.
- cur_group() gives the group keys, a tibble with one row and one column for each grouping variable.
- cur_group_id() gives a unique numeric identifier for the current group.
- cur_group_rows() gives the row indices for the current group.
- cur_column() gives the name of the current column (in across() only).

See group_data() for equivalent functions that return values for all groups. See pick() for a way to select a subset of columns using tidyselect syntax while inside summarise() or mutate().
n() cur_group() cur_group_id() cur_group_rows() cur_column()df <- tibble( g = sample(rep(letters[1:3], 1:3)), x = runif(6), y = runif(6) ) gf <- df |> group_by(g) gf |> summarise(n = n()) gf |> mutate(id = cur_group_id()) gf |> reframe(row = cur_group_rows()) gf |> summarise(data = list(cur_group())) gf |> mutate(across(everything(), ~ paste(cur_column(), round(.x, 2))))This function uploads a local data frame into a remote data source, creating the table definition as needed. Wherever possible, the new object will be temporary, limited to the current connection to the source.
copy_to(dest, df, name = deparse(substitute(df)), overwrite = FALSE, ...)

iris2 <- dbplyr::src_memdb() |> copy_to(iris, overwrite = TRUE)
iris2

count() lets you quickly count the unique values of one or more variables: df |> count(a, b) is roughly equivalent to df |> group_by(a, b) |> summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df |> summarise(n = n()). Supply wt to perform weighted counts, switching the summary from n = n() to n = sum(wt).

add_count() and add_tally() are equivalents to count() and tally() but use mutate() instead of summarise() so that they add a new column with group-wise counts.
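The equivalences described above can be checked side by side. This is a minimal sketch on a made-up tibble `df` with a grouping column `g` and a weight column `w`.

```r
library(dplyr)

# Illustrative data: a grouping column and a weight column
df <- tibble(
  g = c("a", "b", "a", "a", "b"),
  w = c(1, 2, 3, 4, 5)
)

# count() versus the long-hand group_by() + summarise()
counted <- df |> count(g)
grouped <- df |> group_by(g) |> summarise(n = n())

# With `wt`, the summary switches from n() to sum(wt)
weighted <- df |> count(g, wt = w)
summed <- df |> group_by(g) |> summarise(n = sum(w))

# add_count() keeps every row and appends the group-wise count
with_counts <- df |> add_count(g)

counted
weighted
with_counts
```

The "roughly" in the description matters: count() also handles details such as sorting and naming the output column, so the long-hand forms match here only because the example is simple.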
count(x, ..., wt = NULL, sort = FALSE, name = NULL)

# S3 method for class 'data.frame'
count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = group_by_drop_default(x)
)

tally(x, wt = NULL, sort = FALSE, name = NULL)

add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())

add_tally(x, wt = NULL, sort = FALSE, name = NULL)

# count() is a convenient way to get a sense of the distribution of
# values in a dataset
starwars |> count(species)
starwars |> count(species, sort = TRUE)
starwars |> count(sex, gender, sort = TRUE)
starwars |> count(birth_decade = round(birth_year, -1))

# use the `wt` argument to perform a weighted count. This is useful
# when the data has already been aggregated once
df <- tribble(
  ~name,    ~gender,  ~runs,
  "Max",    "male",   10,
  "Sandra", "female", 1,
  "Susan",  "female", 4
)
# counts rows:
df |> count(gender)
# counts runs:
df |> count(gender, wt = runs)

# When factors are involved, `.drop = FALSE` can be used to retain factor
# levels that don't appear in the data
df2 <- tibble(
  id = 1:5,
  type = factor(c("a", "c", "a", NA, "a"), levels = c("a", "b", "c"))
)
df2 |> count(type)
df2 |> count(type, .drop = FALSE)

# Or, using `group_by()`:
df2 |> group_by(type, .drop = FALSE) |> count()

# tally() is a lower-level function that assumes you've done the grouping
starwars |> tally()
starwars |> group_by(species) |> tally()

# both count() and tally() have add_ variants that work like
# mutate() instead of summarise
df |> add_count(gender, wt = runs)
df |> add_tally(wt = runs)

Cross joins match each row in x to every row in y, resulting in a data frame with nrow(x) * nrow(y) rows.

Since cross joins result in all possible matches between x and y, they technically serve as the basis for all mutating joins, which can generally be thought of as cross joins followed by a filter. In practice, a more specialized procedure is used for better performance.
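The "cross join followed by a filter" view can be sketched on two small tables. The tibbles `x` and `y` and their column names are hypothetical, and the comparison only covers the inner join case.

```r
library(dplyr)

# Illustrative tables with differently named key columns
x <- tibble(id = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(2, 3, 4), y_val = c("B", "C", "D"))

# Every row of x is paired with every row of y
crossed <- cross_join(x, y)
nrow(crossed) == nrow(x) * nrow(y)

# Keeping only the pairs whose keys match reproduces an inner join
filtered <- crossed |>
  filter(id == key) |>
  select(id, x_val, y_val)

joined <- inner_join(x, y, by = c("id" = "key"))

filtered
joined
```

This is only a conceptual model. As noted above, the real joins use a more specialized procedure and never materialise the full cross product.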
cross_join(x, y, ..., copy = FALSE, suffix = c(".x", ".y"))# Cross joins match each row in `x` to every row in `y`. # Data within the columns is not used in the matching process. cross_join(band_instruments, band_members) # Control the suffix added to variables duplicated in # `x` and `y` with `suffix`. cross_join(band_instruments, band_members, suffix = c("", "_y"))dplyr provides cumall(), cumany(), and cummean() to complete R's set of cumulative functions.
cumall(x)
cumany(x)
cummean(x)

# `cummean()` returns a numeric/integer vector of the same length
# as the input vector.
x <- c(1, 3, 5, 2, 2)
cummean(x)
cumsum(x) / seq_along(x)

# `cumall()` and `cumany()` return logicals
cumall(x < 5)
cumany(x == 3)

# `cumall()` vs. `cumany()`
df <- data.frame(
  date = as.Date("2020-01-01") + 0:6,
  balance = c(100, 50, 25, -25, -50, 30, 120)
)
# all rows after first overdraft
df |> filter(cumany(balance < 0))
# all rows until first overdraft
df |> filter(cumall(!(balance < 0)))

[Defunct] These functions were deprecated for at least two years before being made defunct. If there's a known replacement, calling the function will tell you about it.
# Deprecated in 1.0.0 -------------------------------------
combine(...)

src_mysql(
  dbname,
  host = NULL,
  port = 0L,
  username = "root",
  password = "",
  ...
)

src_postgres(
  dbname = NULL,
  host = NULL,
  port = NULL,
  user = NULL,
  password = NULL,
  ...
)

src_sqlite(path, create = FALSE)

src_local(tbl, pkg = NULL, env = NULL)

src_df(pkg = NULL, env = NULL)

tbl_df(data)

as.tbl(x, ...)

add_rownames(df, var = "rowname")

[Defunct] mutate_each() and summarise_each() are deprecated in favour of the new across() function that works within summarise() and mutate().
summarise_each(tbl, funs, ...)
summarise_each_(tbl, funs, vars)
mutate_each(tbl, funs, ...)
mutate_each_(tbl, funs, vars)
summarize_each(tbl, funs, ...)
summarize_each_(tbl, funs, vars)

[Defunct] dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.

Unquoting triggers immediate evaluation of its operand and inlines the result within the captured expression. This result can be a value or an expression to be evaluated later with the rest of the argument. See vignette("programming") for more information.
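A minimal sketch of unquoting in place of an underscored verb follows. The helper `mean_by()` is hypothetical and not part of dplyr; it assumes column names arrive as strings and uses rlang::sym() plus the `!!` operator to inline them into the captured expressions.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): group and summarise by
# columns whose names are supplied as strings
mean_by <- function(data, group_col, value_col) {
  group_sym <- rlang::sym(group_col)  # turn each string into a symbol
  value_sym <- rlang::sym(value_col)
  data |>
    group_by(!!group_sym) |>              # `!!` inlines the symbol
    summarise(mean = mean(!!value_sym))   # before the verb evaluates it
}

result <- mean_by(mtcars, "cyl", "mpg")
result
```

For function arguments passed as bare column names rather than strings, embracing with `{{ }}` is usually the shorter route; see vignette("programming").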
add_count_(x, vars, wt = NULL, sort = FALSE)
add_tally_(x, wt, sort = FALSE)
arrange_(.data, ..., .dots = list())
count_(x, vars, wt = NULL, sort = FALSE, .drop = group_by_drop_default(x))
distinct_(.data, ..., .dots, .keep_all = FALSE)
do_(.data, ..., .dots = list())
filter_(.data, ..., .dots = list())
funs_(dots, args = list(), env = base_env())
group_by_(.data, ..., .dots = list(), add = FALSE)
group_indices_(.data, ..., .dots = list())
mutate_(.data, ..., .dots = list())
tally_(x, wt, sort = FALSE)
transmute_(.data, ..., .dots = list())
rename_(.data, ..., .dots = list())
select_(.data, ..., .dots = list())
slice_(.data, ..., .dots = list())
summarise_(.data, ..., .dots = list())
summarize_(.data, ..., .dots = list())

[Deprecated] These functions were deprecated in dplyr 1.1.0.
- cur_data() is deprecated in favor of pick().
- cur_data_all() is deprecated but does not have a direct replacement as selecting the grouping variables is not well-defined and is unlikely to ever be useful.
cur_data()
cur_data_all()

Transform a vector into a format that will be sorted in descending order. This is useful within arrange().
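One case where this helps is mixing ascending and descending keys in a single arrange() call. The sketch below uses a made-up tibble `df` and also checks that ordering by desc(x) matches base R's decreasing order for a vector without ties.

```r
library(dplyr)

# Ordering by the transformed vector reverses the usual sort
x <- c(3, 1, 2)
order(desc(x))
order(x, decreasing = TRUE)

# Illustrative data: sort `g` ascending, then `v` descending within `g`
df <- tibble(
  g = c("a", "b", "a", "b"),
  v = c(1, 2, 3, 4)
)
sorted <- df |> arrange(g, desc(v))
sorted
```

Wrapping only some columns in desc() is what lets a single call express "ascending by group, descending by value".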
desc(x)desc(1:10) desc(factor(letters)) first_day <- seq(as.Date("1910/1/1"), as.Date("1920/1/1"), "years") desc(first_day) starwars |> arrange(desc(mass))Prints the dimensions of an array-like object in a user-friendly manner, substituting NA with ?? (for SQL queries).
dim_desc(x)

dim_desc(mtcars)

Keep only unique/distinct rows from a data frame. This is similar to unique.data.frame() but considerably faster.
distinct(.data, ..., .keep_all = FALSE)

df <- tibble(
  x = sample(10, 100, rep = TRUE),
  y = sample(10, 100, rep = TRUE)
)
nrow(df)
nrow(distinct(df))
nrow(distinct(df, x, y))

distinct(df, x)
distinct(df, y)

# You can choose to keep all other variables as well
distinct(df, x, .keep_all = TRUE)
distinct(df, y, .keep_all = TRUE)

# You can also use distinct on computed variables
distinct(df, diff = abs(x - y))

# Use `pick()` to select columns with tidy-select
distinct(starwars, pick(contains("color")))

# Grouping -------------------------------------------------
df <- tibble(
  g = c(1, 1, 2, 2, 2),
  x = c(1, 1, 2, 1, 2),
  y = c(3, 2, 1, 3, 1)
)
df <- df |> group_by(g)

# With grouped data frames, distinctness is computed within each group
df |> distinct(x)

# When `...` are omitted, `distinct()` still computes distinctness using
# all variables in the data frame
df |> distinct()

[Superseded] Scoped verbs (_if, _at, _all) have been superseded by the use of pick() or across() in an existing verb. See vignette("colwise") for details. These scoped variants of distinct() extract distinct rows by a selection of variables. Like distinct(), you can modify the variables before ordering with the .funs argument.
distinct_all(.tbl, .funs = list(), ..., .keep_all = FALSE)

distinct_at(.tbl, .vars, .funs = list(), ..., .keep_all = FALSE)

distinct_if(.tbl, .predicate, .funs = list(), ..., .keep_all = FALSE)

df <- tibble(x = rep(2:5, each = 2) / 2, y = rep(2:3, each = 4) / 2)

distinct_all(df)
# ->
distinct(df, pick(everything()))

distinct_at(df, vars(x, y))
# ->
distinct(df, pick(x, y))

distinct_if(df, is.numeric)
# ->
distinct(df, pick(where(is.numeric)))

# You can supply a function that will be applied before extracting the distinct values
# The variables of the sorted tibble keep their original values.
distinct_all(df, round)
# ->
distinct(df, across(everything(), round))

The *_prepare() functions perform standard manipulation that is needed prior to actual data processing. They are only needed by packages that implement dplyr backends.
distinct_prepare(
  .data,
  vars,
  group_vars = character(),
  .keep_all = FALSE,
  caller_env = caller_env(2),
  error_call = caller_env()
)

group_by_prepare(
  .data,
  ...,
  .add = FALSE,
  .dots = deprecated(),
  add = deprecated(),
  error_call = caller_env()
)

[Superseded] do() is superseded as of dplyr 1.0.0, because its syntax never really felt like it belonged with the rest of dplyr. It's replaced by a combination of reframe() (which can produce multiple rows and multiple columns), nest_by() (which creates a rowwise tibble of nested data), and pick() (which allows you to access the data for the "current" group).
do(.data, ...)

# do() with unnamed arguments becomes reframe() or summarise()
# . becomes pick()
by_cyl <- mtcars |> group_by(cyl)
by_cyl |> do(head(., 2))
# ->
by_cyl |> reframe(head(pick(everything()), 2))
by_cyl |> slice_head(n = 2)

# Can refer to variables directly
by_cyl |> do(mean = mean(.$vs))
# ->
by_cyl |> summarise(mean = mean(vs))

# do() with named arguments becomes nest_by() + mutate() & list()
models <- by_cyl |> do(mod = lm(mpg ~ disp, data = .))
# ->
models <- mtcars |>
  nest_by(cyl) |>
  mutate(mod = list(lm(mpg ~ disp, data = data)))
models |> summarise(rsq = summary(mod)$r.squared)

# use broom to turn models into data
models |> do(data.frame(
  var = names(coef(.$mod)),
  coef(summary(.$mod)))
)
if (requireNamespace("broom", quietly = TRUE)) {
  # ->
  models |> reframe(broom::tidy(mod))
}

This page documents details about the locale used by arrange() when ordering character vectors.

Default locale

The default locale used by arrange() is the C locale. This is used when .locale = NULL unless the deprecated dplyr.legacy_locale global option is set to TRUE. You can also force the C locale to be used unconditionally with .locale = "C".

The C locale is not exactly the same as English locales, such as "en". The main difference is that the C locale groups the English alphabet by case, while most English locales group the alphabet by letter. For example, c("a", "b", "C", "B", "c") will sort as c("B", "C", "a", "b", "c") in the C locale, with all uppercase letters coming before lowercase letters, but will sort as c("a", "b", "B", "c", "C") in an English locale. This often makes little practical difference during data analysis, because both return identical results when case is consistent between observations.

Reproducibility

The C locale has the benefit of being completely reproducible across all supported R versions and operating systems with no extra effort.
If you set .locale to an option from stringi::stri_locale_list(), then stringi must be installed by anyone who wants to run your code. If you utilize this in a package, then stringi should be placed in Imports.

Legacy behavior

[Deprecated] Prior to dplyr 1.1.0, character columns were ordered in the system locale. Setting the global option dplyr.legacy_locale to TRUE retains this legacy behavior, but this has been deprecated. Update existing code to explicitly call arrange(.locale = ) instead. Run Sys.getlocale("LC_COLLATE") to determine your system locale, and compare that against the list in stringi::stri_locale_list() to find an appropriate value for .locale, e.g. "en_US" for American English. Setting .locale directly will override any usage of dplyr.legacy_locale.
if (dplyr:::has_minimum_stringi()) {
  df <- tibble(x = c("a", "b", "C", "B", "c"))
  df

  # Default locale is C, which groups the English alphabet by case, placing
  # uppercase letters before lowercase letters.
  arrange(df, x)

  # The American English locale groups the alphabet by letter.
  # Explicitly override `.locale` with `"en"` for this ordering.
  arrange(df, x, .locale = "en")

  # This Danish letter is expected to sort after `z`
  df <- tibble(x = c("o", "p", "\u00F8", "z"))
  df

  # The American English locale sorts it right after `o`
  arrange(df, x, .locale = "en")

  # Using `"da"` for Danish ordering gives the expected result
  arrange(df, x, .locale = "da")
}

To learn more about dplyr, start with the vignettes: browseVignettes(package = "dplyr")
There are two ways to group in dplyr:
- Persistent grouping with group_by()
- Per-operation grouping with .by/by

This help page is dedicated to explaining where and why you might want to use the latter.

Depending on the dplyr verb, the per-operation grouping argument may be named .by or by. The Supported verbs section below outlines this on a case-by-case basis. The remainder of this page will refer to .by for simplicity.

Grouping radically affects the computation of the dplyr verb you use it with, and one of the goals of .by is to allow you to place that grouping specification alongside the code that actually uses it. As an added benefit, with .by you no longer need to remember to ungroup() after summarise(), and summarise() won't ever message you about how it's handling the groups!

This idea comes from data.table (https://CRAN.R-project.org/package=data.table), which allows you to specify by alongside modifications in j, like: dt[, .(x = mean(x)), by = g].

Supported verbs

- mutate(.by = )
- summarise(.by = )
- reframe(.by = )
- filter(.by = )
- filter_out(.by = )
- slice(.by = )
- slice_head(by = ) and slice_tail(by = )
- slice_min(by = ) and slice_max(by = )
- slice_sample(by = )

Note that some dplyr verbs use by while others use .by. This is a purely technical difference.
Differences between .by and group_by()

| .by | group_by() |
|---|---|
| Grouping only affects a single verb | Grouping is persistent across multiple verbs |
| Selects variables with tidy-select | Computes expressions with data-masking |
| Summaries use existing order of group keys | Summaries sort group keys in ascending order |

Using .by

Let's take a look at the two grouping approaches using this expenses data set, which tracks costs accumulated across various ids and regions:

expenses <- tibble(
  id = c(1, 2, 1, 3, 1, 2, 3),
  region = c("A", "A", "A", "B", "B", "A", "A"),
  cost = c(25, 20, 19, 12, 9, 6, 6)
)
expenses
#> # A tibble: 7 x 3
#>      id region  cost
#>   <dbl> <chr>  <dbl>
#> 1     1 A         25
#> 2     2 A         20
#> 3     1 A         19
#> 4     3 B         12
#> 5     1 B          9
#> 6     2 A          6
#> 7     3 A          6

Imagine that you wanted to compute the average cost per region. You'd probably write something like this:

expenses |>
  group_by(region) |>
  summarise(cost = mean(cost))
#> # A tibble: 2 x 2
#>   region  cost
#>   <chr>  <dbl>
#> 1 A       15.2
#> 2 B       10.5

Instead, you can now specify the grouping inline within the verb:

expenses |>
  summarise(cost = mean(cost), .by = region)
#> # A tibble: 2 x 2
#>   region  cost
#>   <chr>  <dbl>
#> 1 A       15.2
#> 2 B       10.5

.by applies to a single operation, meaning that since expenses was an ungrouped data frame, the result after applying .by will also always be an ungrouped data frame, regardless of the number of grouping columns.
html<div class="sourceCode r">expenses |> summarise(cost = mean(cost), .by = c(id, region)) #> # A tibble: 5 x 3 #> id region cost #> <dbl> <chr> <dbl> #> 1 1 A 22 #> 2 2 A 13 #> 3 3 B 12 #> 4 1 B 9 #> 5 3 A 6 html</div> Compare that with group_by() |> summarise(), where summarise() generally peels off 1 layer of grouping by default, typically with a message that it is doing so: html<div class="sourceCode r">expenses |> group_by(id, region) |> summarise(cost = mean(cost)) #> `summarise()` has regrouped the output. #> i Summaries were computed grouped by id and region. #> i Output is grouped by id. #> i Use `summarise(.groups = "drop_last")` to silence this message. #> i Use `summarise(.by = c(id, region))` for per-operation grouping #> (`?dplyr::dplyr_by`) instead. #> # A tibble: 5 x 3 #> # Groups: id [3] #> id region cost #> <dbl> <chr> <dbl> #> 1 1 A 22 #> 2 1 B 9 #> 3 2 A 13 #> 4 3 A 6 #> 5 3 B 12 html</div> Because .by grouping applies to a single operation, you don't need to worry about ungrouping, and it never needs to emit a message to remind you what it is doing with the groups. Note that with .by we specified multiple columns to group by using the [=dplyr_tidy_select]tidy-select syntax c(id, region). If you have a character vector of column names you'd like to group by, you can do so with .by = all_of(my_cols). It will group by the columns in the order they were provided. To prevent surprising results, you can't use .by on an existing grouped data frame: html<div class="sourceCode r">expenses |> group_by(id) |> summarise(cost = mean(cost), .by = c(id, region)) #> Error in `summarise()`: #> ! Can't supply `.by` when `.data` is a grouped data frame. html</div> So far we've focused on the usage of .by with summarise(), but .by works with a number of other dplyr verbs. 
For example, you could append the mean cost per region onto the original data frame as a new column rather than computing a summary: html<div class="sourceCode r">expenses |> mutate(cost_by_region = mean(cost), .by = region) #> # A tibble: 7 x 4 #> id region cost cost_by_region #> <dbl> <chr> <dbl> <dbl> #> 1 1 A 25 15.2 #> 2 2 A 20 15.2 #> 3 1 A 19 15.2 #> 4 3 B 12 10.5 #> 5 1 B 9 10.5 #> 6 2 A 6 15.2 #> 7 3 A 6 15.2 html</div> Or you could slice out the maximum cost per combination of id and region: html<div class="sourceCode r"># Note that the argument is named `by` in `slice_max()` expenses |> slice_max(cost, n = 1, by = c(id, region)) #> # A tibble: 5 x 3 #> id region cost #> <dbl> <chr> <dbl> #> 1 1 A 25 #> 2 2 A 20 #> 3 3 B 12 #> 4 1 B 9 #> 5 3 A 6 html</div> Result ordering When used with .by, summarise(), reframe(), and slice() all maintain the ordering of the existing data. This is different from group_by(), which has always sorted the group keys in ascending order. html<div class="sourceCode r">df <- tibble( month = c("jan", "jan", "feb", "feb", "mar"), temp = c(20, 25, 18, 20, 40) ) # Uses ordering by "first appearance" in the original data df |> summarise(average_temp = mean(temp), .by = month) #> # A tibble: 3 x 2 #> month average_temp #> <chr> <dbl> #> 1 jan 22.5 #> 2 feb 19 #> 3 mar 40 # Sorts in ascending order df |> group_by(month) |> summarise(average_temp = mean(temp)) #> # A tibble: 3 x 2 #> month average_temp #> <chr> <dbl> #> 1 feb 19 #> 2 jan 22.5 #> 3 mar 40 html</div> If you need sorted group keys, we recommend that you explicitly use [=arrange]arrange() either before or after the call to summarise(), reframe(), or slice(). This also gives you full access to all of arrange()'s features, such as desc() and the .locale argument. Verbs without .by support If a dplyr verb doesn't support .by, then that typically means that the verb isn't inherently affected by grouping. 
For example, pull() and rename() don't support .by, because specifying columns to group by would not affect their implementations.

That said, there are a few exceptions where a dplyr verb doesn't support .by but does have special support for grouped data frames created by group_by(). This is typically because the verbs are required to retain the grouping columns, for example:

- select() always retains grouping columns, with a message if any aren't specified in the select() call.
- distinct() and count() place unspecified grouping columns at the front of the data frame before computing their results.
- arrange() has a .by_group argument to optionally order by grouping columns first.

If group_by() didn't exist, then these verbs would not have special support for grouped data frames.
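As a minimal sketch of grouping by a character vector of column names, the example below reuses the expenses data from this page together with `.by = all_of(my_cols)`. The vector name `my_cols` and the choice of columns are illustrative assumptions, and the code assumes dplyr 1.1.0 or later, where `.by` is available.

```r
library(dplyr)

expenses <- tibble(
  id = c(1, 2, 1, 3, 1, 2, 3),
  region = c("A", "A", "A", "B", "B", "A", "A"),
  cost = c(25, 20, 19, 12, 9, 6, 6)
)

# `my_cols` is a hypothetical character vector of grouping columns
my_cols <- c("region", "id")

# Per-operation grouping: groups by region, then id, for this call only
res <- expenses |>
  summarise(cost = mean(cost), .by = all_of(my_cols))

res
```

Because `.by` only applies to this one call, `res` should come back ungrouped, with the key columns in the order given by `my_cols` and the groups in order of first appearance.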
This page is now located at ?rlang::args_data_masking.
[Experimental] These three functions, along with names<- and 1d numeric [ (i.e. x[loc]) methods, provide a minimal interface for extending dplyr to work with new data frame subclasses. This means that for simple cases you should only need to provide a couple of methods, rather than a method for every dplyr verb. These functions are a stop-gap measure until we figure out how to solve the problem more generally, but it's likely that any code you write to implement them will find a home in what comes next.
dplyr_row_slice(data, i, ...)
dplyr_col_modify(data, cols)
dplyr_reconstruct(data, template)

This page describes the <tidy-select> argument modifier which indicates the argument supports tidy selections. Tidy selection provides a concise dialect of R for selecting variables based on their names or properties. Tidy selection is a variant of tidy evaluation. This means that inside functions, tidy-select arguments require special attention, as described in the Indirection section below. If you've never heard of tidy evaluation before, start with vignette("programming").
This is a generic function which gives more details about an object than print(), and is more focused on human readable output than str().
explain(x, ...)
show_query(x, ...)

if (requireNamespace("dbplyr", quietly = TRUE) && requireNamespace("RSQLite", quietly = TRUE)) withAutoprint({ # examplesIf
lahman_s <- dbplyr::lahman_sqlite()
batting <- tbl(lahman_s, "Batting")
batting |> show_query()
batting |> explain()

# The batting database has indices on all ID variables:
# SQLite automatically picks the most restrictive index
batting |> filter(lgID == "NL" & yearID == 2000L) |> explain()

# OR's will use multiple indexes
batting |> filter(lgID == "NL" | yearID == 2000) |> explain()

# Joins will use indexes in both tables
teams <- tbl(lahman_s, "Teams")
batting |> left_join(teams, c("yearID", "teamID")) |> explain()
}) # examplesIf

These functions are used to subset a data frame, applying the expressions in ... to determine which rows should be kept (for filter()) or dropped (for filter_out()).

Multiple conditions can be supplied separated by a comma. These will be combined with the & operator. To combine comma separated conditions using | instead, wrap them in when_any().

Both filter() and filter_out() treat NA like FALSE. This subtle behavior can impact how you write your conditions when missing values are involved. See the section on Missing values for important details and examples.
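The NA handling described above can be sketched with a tiny data frame. The tibble `df` and its column `x` are made up for illustration; only `filter()` and `is.na()` are used here, so the sketch does not depend on the newer `filter_out()` verb.

```r
library(dplyr)

# Hypothetical data with one missing value
df <- tibble(x = c(1, NA, 3))

# `NA > 1` evaluates to NA, which filter() treats like FALSE,
# so the row with the missing value is dropped
kept_default <- df |> filter(x > 1)

# To keep missing values, add an explicit is.na() condition
kept_with_na <- df |> filter(x > 1 | is.na(x))

kept_default
kept_with_na
```

The second call shows the explicit handling that becomes necessary when rows with missing values should be retained.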
filter(.data, ..., .by = NULL, .preserve = FALSE) filter_out(.data, ..., .by = NULL, .preserve = FALSE)# Filtering for one criterion filter(starwars, species == "Human") # Filtering for multiple criteria within a single logical expression filter(starwars, hair_color == "none" & eye_color == "black") filter(starwars, hair_color == "none" | eye_color == "black") # Multiple comma separated expressions are combined using `&` starwars |> filter(hair_color == "none", eye_color == "black") # To combine comma separated expressions using `|` instead, use `when_any()` starwars |> filter(when_any(hair_color == "none", eye_color == "black")) # Filtering out to drop rows filter_out(starwars, hair_color == "none") # When filtering out, it can be useful to first interactively filter for the # rows you want to drop, just to double check that you've written the # conditions correctly. Then, just change `filter()` to `filter_out()`. filter(starwars, mass > 1000, eye_color == "orange") filter_out(starwars, mass > 1000, eye_color == "orange") # The filtering operation may yield different results on grouped # tibbles because the expressions are computed within groups. # # The following keeps rows where `mass` is greater than the # global average: starwars |> filter(mass > mean(mass, na.rm = TRUE)) # Whereas this keeps rows with `mass` greater than the per `gender` # average: starwars |> filter(mass > mean(mass, na.rm = TRUE), .by = gender) # If you find yourself trying to use a `filter()` to drop rows, then # you should consider if switching to `filter_out()` can simplify your # conditions. For example, to drop blond individuals, you might try: starwars |> filter(hair_color != "blond") # But this also drops rows with an `NA` hair color! To retain those: starwars |> filter(hair_color != "blond" | is.na(hair_color)) # But explicit `NA` handling like this can quickly get unwieldy, especially # with multiple conditions. 
Since your intent was to specify rows to drop # rather than rows to keep, use `filter_out()`. This also removes the need # for any explicit `NA` handling. starwars |> filter_out(hair_color == "blond") # To refer to column names that are stored as strings, use the `.data` # pronoun: vars <- c("mass", "height") cond <- c(80, 150) starwars |> filter( .data[[vars[[1]]]] > cond[[1]], .data[[vars[[2]]]] > cond[[2]] ) # Learn more in ?rlang::args_data_maskingFiltering joins filter rows from x based on the presence or absence of matches in y: semi_join() returns all rows from x with a match in y. anti_join() returns all rows from x without a match in y.
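A small sketch of the two filtering joins, using made-up tables `x` and `y` that share a hypothetical `key` column. It is meant only to illustrate that both joins return rows and columns of `x` alone.

```r
library(dplyr)

# Hypothetical inputs: keys 2 and 3 appear in both tables
x <- tibble(key = c(1, 2, 3), val_x = c("a", "b", "c"))
y <- tibble(key = c(2, 3, 4), val_y = c("B", "C", "D"))

# Rows of x that have a match in y
kept <- semi_join(x, y, by = "key")

# Rows of x that have no match in y
dropped <- anti_join(x, y, by = "key")

kept
dropped
```

Neither result gains the `val_y` column, which is what distinguishes filtering joins from mutating joins.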
semi_join(x, y, by = NULL, copy = FALSE, ...)

## S3 method for class 'data.frame'
semi_join(x, y, by = NULL, copy = FALSE, ..., na_matches = c("na", "never"))

anti_join(x, y, by = NULL, copy = FALSE, ...)

## S3 method for class 'data.frame'
anti_join(x, y, by = NULL, copy = FALSE, ..., na_matches = c("na", "never"))

# "Filtering" joins keep cases from the LHS
band_members |> semi_join(band_instruments)
band_members |> anti_join(band_instruments)

# To suppress the message about joining variables, supply `by`
band_members |> semi_join(band_instruments, by = join_by(name))
# This is good practice in production code

[Superseded] Scoped verbs (_if, _at, _all) have been superseded by the use of if_all() or if_any() in an existing verb. See vignette("colwise") for details.

These scoped filtering verbs apply a predicate expression to a selection of variables. The predicate expression should be quoted with all_vars() or any_vars() and should mention the pronoun . to refer to variables.
filter_all(.tbl, .vars_predicate, .preserve = FALSE) filter_if(.tbl, .predicate, .vars_predicate, .preserve = FALSE) filter_at(.tbl, .vars, .vars_predicate, .preserve = FALSE)# While filter() accepts expressions with specific variables, the # scoped filter verbs take an expression with the pronoun `.` and # replicate it over all variables. This expression should be quoted # with all_vars() or any_vars(): all_vars(is.na(.)) any_vars(is.na(.)) # You can take the intersection of the replicated expressions: filter_all(mtcars, all_vars(. > 150)) # -> filter(mtcars, if_all(everything(), ~ .x > 150)) # Or the union: filter_all(mtcars, any_vars(. > 150)) # -> filter(mtcars, if_any(everything(), ~ . > 150)) # You can vary the selection of columns on which to apply the # predicate. filter_at() takes a vars() specification: filter_at(mtcars, vars(starts_with("d")), any_vars((. %% 2) == 0)) # -> filter(mtcars, if_any(starts_with("d"), ~ (.x %% 2) == 0)) # And filter_if() selects variables with a predicate function: filter_if(mtcars, ~ all(floor(.) == .), all_vars(. != 0)) # -> is_int <- function(x) all(floor(x) == x) filter(mtcars, if_all(where(is_int), ~ .x != 0))htmlhttps://lifecycle.r-lib.org/articles/stages.html#deprecatedlifecycle-deprecated.svgoptions: alt='[Deprecated]'[Deprecated] funs() is deprecated; please use list() instead. We deprecated this function because it provided a unique way of specifying anonymous functions, rather than adopting the conventions used by purrr and other packages in the tidyverse.
funs(..., .args = list())funs("mean", mean(., na.rm = TRUE)) # -> list(mean = mean, mean = ~ mean(.x, na.rm = TRUE)) funs(m1 = mean, m2 = "mean", m3 = mean(., na.rm = TRUE)) # -> list(m1 = mean, m2 = "mean", m3 = ~ mean(.x, na.rm = TRUE))glimpse() is like a transposed version of print(): columns run down the page, and data runs across. This makes it possible to see every column in a data frame. It's a little like [=str]str() applied to a data frame but it tries to show you as much data as possible. (And it always shows the underlying data, even when applied to a remote data source.) glimpse() is provided by the pillar package, and re-exported by dplyr. See [pillar:glimpse]pillar::glimpse() for more details.
glimpse(mtcars) # Note that original x is (invisibly) returned, allowing `glimpse()` to be # used within a pipeline. mtcars |> glimpse() |> select(1:3) glimpse(starwars)Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data)) ungroup(x, ...)by_cyl <- mtcars |> group_by(cyl) # grouping doesn't change how the data looks (apart from listing # how it's grouped): by_cyl # It changes how it acts with the other dplyr verbs: by_cyl |> summarise( disp = mean(disp), hp = mean(hp) ) by_cyl |> filter(disp == max(disp)) # Each call to summarise() removes a layer of grouping by_vs_am <- mtcars |> group_by(vs, am) by_vs <- by_vs_am |> summarise(n = n()) by_vs by_vs |> summarise(n = sum(n)) # To removing grouping, use ungroup by_vs |> ungroup() |> summarise(n = sum(n)) # By default, group_by() overrides existing grouping by_cyl |> group_by(vs, am) |> group_vars() # Use add = TRUE to instead append by_cyl |> group_by(vs, am, .add = TRUE) |> group_vars() # You can group by expressions: this is a short-hand # for a mutate() followed by a group_by() mtcars |> group_by(vsam = vs + am) # The implicit mutate() step is always performed on the # ungrouped data. Here we get 3 groups: mtcars |> group_by(vs) |> group_by(hp_cut = cut(hp, 3)) # If you want it to be performed by groups, # you have to use an explicit mutate() call. # Here we get 3 groups per value of vs mtcars |> group_by(vs) |> mutate(hp_cut = cut(hp, 3)) |> group_by(hp_cut) # when factors are involved and .drop = FALSE, groups can be empty tbl <- tibble( x = 1:10, y = factor(rep(c("a", "c"), each = 5), levels = c("a", "b", "c")) ) tbl |> group_by(y, .drop = FALSE) |> group_rows()htmlhttps://lifecycle.r-lib.org/articles/stages.html#supersededlifecycle-superseded.svgoptions: alt='[Superseded]'[Superseded] Scoped verbs (_if, _at, _all) have been superseded by the use of [=pick]pick() or [=across]across() in an existing verb. See vignette("colwise") for details. These scoped variants of [=group_by]group_by() group a data frame by a selection of variables. Like [=group_by]group_by(), they have optional mutate semantics.
group_by_all( .tbl, .funs = list(), ..., .add = FALSE, .drop = group_by_drop_default(.tbl) ) group_by_at( .tbl, .vars, .funs = list(), ..., .add = FALSE, .drop = group_by_drop_default(.tbl) ) group_by_if( .tbl, .predicate, .funs = list(), ..., .add = FALSE, .drop = group_by_drop_default(.tbl) )# Group a data frame by all variables: group_by_all(mtcars) # -> mtcars |> group_by(pick(everything())) # Group by variables selected with a predicate: group_by_if(iris, is.factor) # -> iris |> group_by(pick(where(is.factor))) # Group by variables selected by name: group_by_at(mtcars, vars(vs, am)) # -> mtcars |> group_by(pick(vs, am)) # Like group_by(), the scoped variants have optional mutate # semantics. This provide a shortcut for group_by() + mutate(): d <- tibble(x=c(1,1,2,2), y=c(1,2,1,2)) group_by_all(d, as.factor) # -> d |> group_by(across(everything(), as.factor)) group_by_if(iris, is.factor, as.character) # -> iris |> group_by(across(where(is.factor), as.character))Default value for .drop argument of group_by
group_by_drop_default(.tbl)

group_by_drop_default(iris)

iris |>
  group_by(Species) |>
  group_by_drop_default()

iris |>
  group_by(Species, .drop = FALSE) |>
  group_by_drop_default()

This selection helper matches grouping variables. It can be used in select() or vars() selections.
group_cols(vars = NULL, data = NULL)gdf <- iris |> group_by(Species) gdf |> select(group_cols()) # Remove the grouping variables from mutate selections: gdf |> mutate_at(vars(-group_cols()), `/`, 100) # -> No longer necessary with across() gdf |> mutate(across(everything(), ~ . / 100))This collection of functions accesses data about grouped data frames in various ways: group_data() returns a data frame that defines the grouping structure. The columns give the values of the grouping variables. The last column, always called .rows, is a list of integer vectors that gives the location of the rows in each group. group_keys() returns a data frame describing the groups. group_rows() returns a list of integer vectors giving the rows that each group contains. group_indices() returns an integer vector the same length as .data that gives the group that each row belongs to. group_vars() gives names of grouping variables as character vector. groups() gives the names of the grouping variables as a list of symbols. group_size() gives the size of each group. n_groups() gives the total number of groups. See context for equivalent functions that return values for the current group.
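The accessors listed above can be compared side by side on one small grouped tibble. The data frame `gf` and its columns `g` and `x` are hypothetical and chosen so that the group sizes differ.

```r
library(dplyr)

# Hypothetical grouped data: group "a" has 3 rows, group "b" has 2
gf <- tibble(g = c("a", "b", "a", "b", "a"), x = 1:5) |>
  group_by(g)

group_vars(gf)     # names of the grouping variables
group_keys(gf)     # one row per group
group_size(gf)     # number of rows in each group
n_groups(gf)       # total number of groups
group_indices(gf)  # group number for every row
group_rows(gf)     # row locations for every group
group_data(gf)     # keys plus the `.rows` list-column
```

Reading the outputs together makes it easier to see that `group_data()` bundles what `group_keys()` and `group_rows()` return separately.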
group_data(.data) group_keys(.tbl, ...) group_rows(.data) group_indices(.data, ...) group_vars(x) groups(x) group_size(x) n_groups(x)df <- tibble(x = c(1,1,2,2)) group_vars(df) group_rows(df) group_data(df) group_indices(df) gf <- group_by(df, x) group_vars(gf) group_rows(gf) group_data(gf) group_indices(gf)htmlhttps://lifecycle.r-lib.org/articles/stages.html#experimentallifecycle-experimental.svgoptions: alt='[Experimental]'[Experimental] group_map(), group_modify() and group_walk() are purrr-style functions that can be used to iterate on grouped tibbles.
group_map(.data, .f, ..., .keep = FALSE) group_modify(.data, .f, ..., .keep = FALSE) group_walk(.data, .f, ..., .keep = FALSE)# return a list mtcars |> group_by(cyl) |> group_map(~ head(.x, 2L)) # return a tibble grouped by `cyl` with 2 rows per group # the grouping data is recalculated mtcars |> group_by(cyl) |> group_modify(~ head(.x, 2L)) if (requireNamespace("broom", quietly = TRUE)) withAutoprint(\ # examplesIf # a list of tibbles iris |> group_by(Species) |> group_map(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x))) # a restructured grouped tibble iris |> group_by(Species) |> group_modify(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x))) \) # examplesIf # a list of vectors iris |> group_by(Species) |> group_map(~ quantile(.x$Petal.Length, probs = c(0.25, 0.5, 0.75))) # to use group_modify() the lambda must return a data frame iris |> group_by(Species) |> group_modify(~ quantile(.x$Petal.Length, probs = c(0.25, 0.5, 0.75)) |> tibble::enframe(name = "prob", value = "quantile") ) iris |> group_by(Species) |> group_modify(~ .x |> purrr::map_dfc(fivenum) |> mutate(nms = c("min", "Q1", "median", "Q3", "max")) ) # group_walk() is for side effects dir.create(temp <- tempfile()) iris |> group_by(Species) |> group_walk(~ write.csv(.x, file = file.path(temp, paste0(.y$Species, ".csv")))) list.files(temp, pattern = "csv$") unlink(temp, recursive = TRUE) # group_modify() and ungrouped data frames mtcars |> group_modify(~ head(.x, 2L))htmlhttps://lifecycle.r-lib.org/articles/stages.html#experimentallifecycle-experimental.svgoptions: alt='[Experimental]'[Experimental] Nest a tibble using a grouping specification
group_nest(.tbl, ..., .key = "data", keep = FALSE)

#----- use case 1: a grouped data frame
iris |>
  group_by(Species) |>
  group_nest()

# this can be useful if the grouped data has been altered before nesting
iris |>
  group_by(Species) |>
  filter(Sepal.Length > mean(Sepal.Length)) |>
  group_nest()

#----- use case 2: using group_nest() on an ungrouped data frame with
# a grouping specification that uses the data mask
starwars |>
  group_nest(species, homeworld)

[Experimental] group_split() works like base::split() but:

- It uses the grouping structure from group_by() and therefore is subject to the data mask
- It does not name the elements of the list based on the grouping as this only works well for a single character grouping variable. Instead, use group_keys() to access a data frame that defines the groups.

group_split() is primarily designed to work with grouped data frames. You can pass ... to group and split an ungrouped data frame, but this is generally not very useful as you won't have easy access to the group metadata.
group_split(.tbl, ..., .keep = TRUE)

ir <- iris |> group_by(Species)

group_split(ir)
group_keys(ir)

[Experimental] Drops unused levels of all factors that are used as grouping variables, then recalculates the grouping structure.

group_trim() is particularly useful after a filter() that is intended to select a subset of groups.
group_trim(.tbl, .drop = group_by_drop_default(.tbl))

iris |>
  group_by(Species) |>
  filter(Species == "setosa", .preserve = TRUE) |>
  group_trim()

The easiest way to create a grouped data frame is to call the group_by() method on a data frame or tbl: this will take care of capturing the unevaluated expressions for you.

These functions are designed for programmatic use. For data analysis purposes see group_data() for the accessor functions that retrieve various metadata from a grouped data frame.
grouped_df(data, vars, drop = group_by_drop_default(data))
is.grouped_df(x)
is_grouped_df(x)

ident() takes strings and turns them into database identifiers (e.g. table or column names), quoting them using the identifier rules for your database. ident_q() does the same, but assumes the names have already been quoted, preventing them from being quoted again.

These are generally for internal use only; if you need to supply a table name that is qualified with schema or catalog, or has already been quoted for some other reason, use I().
ident(...)

# Identifiers are escaped with "
if (requireNamespace("dbplyr", quietly = TRUE)) withAutoprint({ # examplesIf
ident("x")
}) # examplesIf

if_else() is a vectorized if-else. Compared to the base R equivalent, ifelse(), this function allows you to handle missing values in the condition with missing and always takes true, false, and missing into account when determining what the output type should be.
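To make the point about output types concrete, here is a small sketch with dates, where base `ifelse()` is known to drop the class. The vector `d` and the `cutoff` date are invented for illustration.

```r
library(dplyr)

# Hypothetical dates, one of them missing
d <- as.Date(c("2024-01-01", NA, "2024-03-01"))
cutoff <- as.Date("2024-02-01")

# Base R: the Date class is lost and plain numbers come back
base_res <- ifelse(d < cutoff, d, cutoff)

# dplyr: the result stays a Date, and `missing` fills in the
# positions where the condition itself is NA
dplyr_res <- if_else(d < cutoff, d, cutoff, missing = cutoff)

class(base_res)
class(dplyr_res)
dplyr_res
```

Here `missing` is given the same type as `true` and `false`, since all three are taken into account when the output type is determined.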
if_else( condition, true, false, missing = NULL, ..., ptype = NULL, size = deprecated() )x <- c(-5:5, NA) if_else(x < 0, NA, x) # Explicitly handle `NA` values in the `condition` with `missing` if_else(x < 0, "negative", "positive", missing = "missing") # Unlike `ifelse()`, `if_else()` preserves types x <- factor(sample(letters[1:5], 10, replace = TRUE)) ifelse(x %in% c("a", "b", "c"), x, NA) if_else(x %in% c("a", "b", "c"), x, NA) # `if_else()` is often useful for creating new columns inside of `mutate()` starwars |> mutate(category = if_else(height < 100, "short", "tall"), .keep = "used")join_by() constructs a specification that describes how to join two tables using a small domain specific language. The result can be supplied as the by argument to any of the join functions (such as [=left_join]left_join()).
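As a brief sketch of passing a join_by() specification to a join function, the example below joins two made-up tables whose key columns have different names. The tables `members` and `instruments` are hypothetical, and the comparison with the named character vector form is included only to show that both describe the same equality join. It assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical tables with differently named key columns
members <- tibble(name = c("Mick", "John", "Paul"),
                  band = c("Stones", "Beatles", "Beatles"))
instruments <- tibble(artist = c("John", "Paul", "Keith"),
                      plays = c("guitar", "bass", "guitar"))

# Equality join written with join_by()
res_join_by <- left_join(members, instruments, by = join_by(name == artist))

# The same join written with a named character vector
res_named <- left_join(members, instruments, by = c("name" = "artist"))

res_join_by
```

For plain equality joins the two spellings are interchangeable; join_by() becomes necessary once inequality, rolling, or overlap conditions are involved.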
join_by(...)sales <- tibble( id = c(1L, 1L, 1L, 2L, 2L), sale_date = as.Date(c("2018-12-31", "2019-01-02", "2019-01-05", "2019-01-04", "2019-01-01")) ) sales promos <- tibble( id = c(1L, 1L, 2L), promo_date = as.Date(c("2019-01-01", "2019-01-05", "2019-01-02")) ) promos # Match `id` to `id`, and `sale_date` to `promo_date` by <- join_by(id, sale_date == promo_date) left_join(sales, promos, by) # For each `sale_date` within a particular `id`, # find all `promo_date`s that occurred before that particular sale by <- join_by(id, sale_date >= promo_date) left_join(sales, promos, by) # For each `sale_date` within a particular `id`, # find only the closest `promo_date` that occurred before that sale by <- join_by(id, closest(sale_date >= promo_date)) left_join(sales, promos, by) # If you want to disallow exact matching in rolling joins, use `>` rather # than `>=`. Note that the promo on `2019-01-05` is no longer considered the # closest match for the sale on the same date. by <- join_by(id, closest(sale_date > promo_date)) left_join(sales, promos, by) # Same as before, but also require that the promo had to occur at most 1 # day before the sale was made. We'll use a full join to see that id 2's # promo on `2019-01-02` is no longer matched to the sale on `2019-01-04`. sales <- mutate(sales, sale_date_lower = sale_date - 1) by <- join_by(id, closest(sale_date >= promo_date), sale_date_lower <= promo_date) full_join(sales, promos, by) # --------------------------------------------------------------------------- segments <- tibble( segment_id = 1:4, chromosome = c("chr1", "chr2", "chr2", "chr1"), start = c(140, 210, 380, 230), end = c(150, 240, 415, 280) ) segments reference <- tibble( reference_id = 1:4, chromosome = c("chr1", "chr1", "chr2", "chr2"), start = c(100, 200, 300, 415), end = c(150, 250, 399, 450) ) reference # Find every time a segment `start` falls between the reference # `[start, end]` range. 
by <- join_by(chromosome, between(start, start, end)) full_join(segments, reference, by) # If you wanted the reference columns first, supply `reference` as `x` # and `segments` as `y`, then explicitly refer to their columns using `x$` # and `y$`. by <- join_by(chromosome, between(y$start, x$start, x$end)) full_join(reference, segments, by) # Find every time a segment falls completely within a reference. # Sometimes using `x$` and `y$` makes your intentions clearer, even if they # match the default behavior. by <- join_by(chromosome, within(x$start, x$end, y$start, y$end)) inner_join(segments, reference, by) # Find every time a segment overlaps a reference in any way. by <- join_by(chromosome, overlaps(x$start, x$end, y$start, y$end)) full_join(segments, reference, by) # It is common to have right-open ranges with bounds like `[)`, which would # mean an end value of `415` would no longer overlap a start value of `415`. # Setting `bounds` allows you to compute overlaps with those kinds of ranges. by <- join_by(chromosome, overlaps(x$start, x$end, y$start, y$end, bounds = "[)")) full_join(segments, reference, by)Warnings that occur inside a dplyr verb like mutate() are caught and stashed away instead of being emitted to the console. This prevents rowwise and grouped data frames from flooding the console with warnings. To see the original warnings, use last_dplyr_warnings().
last_dplyr_warnings(n = 5)

Find the "previous" (lag()) or "next" (lead()) values in a vector. Useful for comparing values behind or ahead of the current values.
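A typical use of comparing against the previous value is a period-over-period change. The sketch below uses a made-up `sales` table; `dplyr::lag()` is called with its namespace so that it cannot be confused with `stats::lag()`.

```r
library(dplyr)

# Hypothetical yearly revenue figures, already sorted by year
sales <- tibble(year = 2020:2023, revenue = c(100, 120, 90, 150))

out <- sales |>
  mutate(
    # difference from the previous row; the first row has nothing behind it
    change = revenue - dplyr::lag(revenue),
    # value of the following row; the last row has nothing ahead of it
    next_revenue = dplyr::lead(revenue)
  )

out
```

The positions with no neighbour are filled with NA unless a `default` is supplied.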
lag(x, n = 1L, default = NULL, order_by = NULL, ...) lead(x, n = 1L, default = NULL, order_by = NULL, ...)lag(1:5) lead(1:5) x <- 1:5 tibble(behind = lag(x), x, ahead = lead(x)) # If you want to look more rows behind or ahead, use `n` lag(1:5, n = 1) lag(1:5, n = 2) lead(1:5, n = 1) lead(1:5, n = 2) # If you want to define a value to pad with, use `default` lag(1:5) lag(1:5, default = 0) lead(1:5) lead(1:5, default = 6) # If the data are not already ordered, use `order_by` scrambled <- slice_sample( tibble(year = 2000:2005, value = (0:5) ^ 2), prop = 1 ) wrong <- mutate(scrambled, previous_year_value = lag(value)) arrange(wrong, year) right <- mutate(scrambled, previous_year_value = lag(value, order_by = year)) arrange(right, year)tbl() is the standard constructor for tbls. is.tbl() tests.
make_tbl(subclass, ...)

mutate() creates new columns that are functions of existing variables. It can also modify (if the name is the same as an existing column) and delete columns (by setting their value to NULL).
mutate(.data, ...) mutatedata.frame( .data, ..., .by = NULL, .keep = c("all", "used", "unused", "none"), .before = NULL, .after = NULL )# Newly created variables are available immediately starwars |> select(name, mass) |> mutate( mass2 = mass * 2, mass2_squared = mass2 * mass2 ) # As well as adding new variables, you can use mutate() to # remove variables and modify existing variables. starwars |> select(name, height, mass, homeworld) |> mutate( mass = NULL, height = height * 0.0328084 # convert to feet ) # Use across() with mutate() to apply a transformation # to multiple columns in a tibble. starwars |> select(name, homeworld, species) |> mutate(across(!name, as.factor)) # see more in ?across # Window functions are useful for grouped mutates: starwars |> select(name, mass, homeworld) |> group_by(homeworld) |> mutate(rank = min_rank(desc(mass))) # see `vignette("window-functions")` for more details # By default, new columns are placed on the far right. df <- tibble(x = 1, y = 2) df |> mutate(z = x + y) df |> mutate(z = x + y, .before = 1) df |> mutate(z = x + y, .after = x) # By default, mutate() keeps all columns from the input data. df <- tibble(x = 1, y = 2, a = "a", b = "b") df |> mutate(z = x + y, .keep = "all") # the default df |> mutate(z = x + y, .keep = "used") df |> mutate(z = x + y, .keep = "unused") df |> mutate(z = x + y, .keep = "none") # Grouping ---------------------------------------- # The mutate operation may yield different results on grouped # tibbles because the expressions are computed within groups. 
# The following normalises `mass` by the global average: starwars |> select(name, mass, species) |> mutate(mass_norm = mass / mean(mass, na.rm = TRUE)) # Whereas this normalises `mass` by the averages within species # levels: starwars |> select(name, mass, species) |> group_by(species) |> mutate(mass_norm = mass / mean(mass, na.rm = TRUE)) # Indirection ---------------------------------------- # Refer to column names stored as strings with the `.data` pronoun: vars <- c("mass", "height") mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]]) # Learn more in ?rlang::args_data_maskingMutating joins add columns from y to x, matching observations based on the keys. There are four mutating joins: the inner join, and the three outer joins. Inner join An inner_join() only keeps observations from x that have a matching key in y. The most important property of an inner join is that unmatched rows in either input are not included in the result. This means that generally inner joins are not appropriate in most analyses, because it is too easy to lose observations. Outer joins The three outer joins keep observations that appear in at least one of the data frames: A left_join() keeps all observations in x. A right_join() keeps all observations in y. A full_join() keeps all observations in x and y.
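The difference between the four mutating joins can be seen by counting rows on a pair of small tables. The tables `x` and `y` and their `key` column are hypothetical; keys 2 and 3 appear in both, key 1 only in `x`, and key 4 only in `y`.

```r
library(dplyr)

# Hypothetical inputs with partially overlapping keys
x <- tibble(key = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- tibble(key = c(2, 3, 4), b = c("y2", "y3", "y4"))

inner <- inner_join(x, y, by = "key")  # only keys present in both
left  <- left_join(x, y, by = "key")   # every row of x
right <- right_join(x, y, by = "key")  # every row of y
full  <- full_join(x, y, by = "key")   # every row of x and y

# Row counts for each join type
c(inner = nrow(inner), left = nrow(left), right = nrow(right), full = nrow(full))
```

Only the inner join silently loses the unmatched observations; the outer joins keep them and fill the missing side with NA.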
inner_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL ) inner_joindata.frame( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL, na_matches = c("na", "never"), multiple = "all", unmatched = "drop", relationship = NULL ) left_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL ) left_joindata.frame( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL, na_matches = c("na", "never"), multiple = "all", unmatched = "drop", relationship = NULL ) right_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL ) right_joindata.frame( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL, na_matches = c("na", "never"), multiple = "all", unmatched = "drop", relationship = NULL ) full_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL ) full_joindata.frame( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL, na_matches = c("na", "never"), multiple = "all", relationship = NULL )band_members |> inner_join(band_instruments) band_members |> left_join(band_instruments) band_members |> right_join(band_instruments) band_members |> full_join(band_instruments) # To suppress the message about joining variables, supply `by` band_members |> inner_join(band_instruments, by = join_by(name)) # This is good practice in production code # Use an equality expression if the join variables have different names band_members |> full_join(band_instruments2, by = join_by(name == artist)) # By default, the join keys from `x` and `y` are coalesced in the output; use # `keep = TRUE` to keep the join keys from both `x` and `y` band_members |> full_join(band_instruments2, by = join_by(name == artist), keep = TRUE) # If a row in `x` matches multiple rows in `y`, all the rows in `y` will be # returned once for each matching row in `x`. 
    df1 <- tibble(x = 1:3)
    df2 <- tibble(x = c(1, 1, 2), y = c("first", "second", "third"))
    df1 |> left_join(df2)

    # If a row in `y` also matches multiple rows in `x`, this is known as a
    # many-to-many relationship, which is typically a result of an improperly
    # specified join or some kind of messy data. In this case, a warning is
    # thrown by default:
    df3 <- tibble(x = c(1, 1, 1, 3))
    df3 |> left_join(df2)

    # In the rare case where a many-to-many relationship is expected, set
    # `relationship = "many-to-many"` to silence this warning
    df3 |> left_join(df2, relationship = "many-to-many")

    # Use `join_by()` with a condition other than `==` to perform an inequality
    # join. Here we match on every instance where `df1$x > df2$x`.
    df1 |> left_join(df2, join_by(x > x))

    # By default, NAs match other NAs so that there are two
    # rows in the output of this join:
    df1 <- data.frame(x = c(1, NA), y = 2)
    df2 <- data.frame(x = c(1, NA), z = 3)
    left_join(df1, df2)

    # You can optionally request that NAs don't match, giving
    # a result that more closely resembles SQL joins
    left_join(df1, df2, na_matches = "never")

[Superseded](https://lifecycle.r-lib.org/articles/stages.html#superseded)

Scoped verbs (`_if`, `_at`, `_all`) have been superseded by the use of `pick()` or `across()` in an existing verb. See `vignette("colwise")` for details.

The scoped variants of `mutate()` and `transmute()` make it easy to apply the same transformation to multiple variables. There are three variants:

`_all` affects every variable.

`_at` affects variables selected with a character vector or `vars()`.

`_if` affects variables selected with a predicate function.

    mutate_all(.tbl, .funs, ...)
    mutate_if(.tbl, .predicate, .funs, ...)
    mutate_at(.tbl, .vars, .funs, ..., .cols = NULL)
    transmute_all(.tbl, .funs, ...)
    transmute_if(.tbl, .predicate, .funs, ...)
    transmute_at(.tbl, .vars, .funs, ..., .cols = NULL)

    iris <- as_tibble(iris)

    # All variants can be passed functions and additional arguments,
    # purrr-style. The _at() variants directly support strings. Here
    # we'll scale the variables `height` and `mass`:
    scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)
    starwars |> mutate_at(c("height", "mass"), scale2)
    # ->
    starwars |> mutate(across(c("height", "mass"), scale2))

    # You can pass additional arguments to the function:
    starwars |> mutate_at(c("height", "mass"), scale2, na.rm = TRUE)
    starwars |> mutate_at(c("height", "mass"), ~scale2(., na.rm = TRUE))
    # ->
    starwars |> mutate(across(c("height", "mass"), ~ scale2(.x, na.rm = TRUE)))

    # You can also supply selection helpers to _at() functions but you have
    # to quote them with vars():
    iris |> mutate_at(vars(matches("Sepal")), log)
    iris |> mutate(across(matches("Sepal"), log))

    # The _if() variants apply a predicate function (a function that
    # returns TRUE or FALSE) to determine the relevant subset of
    # columns. Here we scale all the numeric columns:
    starwars |> mutate_if(is.numeric, scale2, na.rm = TRUE)
    starwars |> mutate(across(where(is.numeric), ~ scale2(.x, na.rm = TRUE)))

    # mutate_if() is particularly useful for transforming variables from
    # one type to another
    iris |> mutate_if(is.factor, as.character)
    iris |> mutate_if(is.double, as.integer)
    # ->
    iris |> mutate(across(where(is.factor), as.character))
    iris |> mutate(across(where(is.double), as.integer))

    # Multiple transformations ----------------------------------------

    # If you want to apply multiple transformations, pass a list of
    # functions.
    # When there are multiple functions, they create new
    # variables instead of modifying the variables in place:
    iris |> mutate_if(is.numeric, list(scale2, log))
    iris |> mutate_if(is.numeric, list(~scale2(.), ~log(.)))
    iris |> mutate_if(is.numeric, list(scale = scale2, log = log))
    # ->
    iris |>
      as_tibble() |>
      mutate(across(where(is.numeric), list(scale = scale2, log = log)))

    # When there's only one function in the list, it modifies existing
    # variables in place. Give it a name to instead create new variables:
    iris |> mutate_if(is.numeric, list(scale2))
    iris |> mutate_if(is.numeric, list(scale = scale2))

`n_distinct()` counts the number of unique/distinct combinations in a set of one or more vectors. It's a faster and more concise equivalent to `nrow(unique(data.frame(...)))`.
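The base R expression mentioned above can be wrapped in a small function to see what is being counted. The name `n_distinct_sketch()` is hypothetical and exists only for this illustration; it is the slower equivalent the text refers to, not dplyr's implementation.

```r
# Hypothetical helper, written only to illustrate the equivalence.
n_distinct_sketch <- function(...) {
  nrow(unique(data.frame(...)))
}

x <- c(1, 1, 2, 2, 2)
y <- c(3, 3, NA, 3, 3)

n_distinct_sketch(x)     # distinct values of x
n_distinct_sketch(x, y)  # distinct (x, y) pairs; NA counts as a value here
```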

    n_distinct(..., na.rm = FALSE)

    x <- c(1, 1, 2, 2, 2)
    n_distinct(x)

    y <- c(3, 3, NA, 3, 3)
    n_distinct(y)
    n_distinct(y, na.rm = TRUE)

    # Pairs (1, 3), (2, 3), and (2, NA) are distinct
    n_distinct(x, y)

    # (2, NA) is dropped, leaving 2 distinct combinations
    n_distinct(x, y, na.rm = TRUE)

    # Also works with data frames
    n_distinct(data.frame(x, y))

This is a translation of the SQL command `NULLIF`. It is useful if you want to convert an annoying value to `NA`.

    na_if(x, y)

    # `na_if()` is useful for replacing a single problematic value with `NA`
    na_if(c(-99, 1, 4, 3, -99, 5), -99)
    na_if(c("abc", "def", "", "ghi"), "")

    # You can use it to standardize `NaN`s to `NA`
    na_if(c(1, NaN, NA, 2, NaN), NaN)

    # Because `na_if()` is an R translation of SQL's `NULLIF` command,
    # it compares `x` and `y` element by element. Where `x` and `y` are
    # equal, the value in `x` is replaced with an `NA`.
    na_if(
      x = c(1, 2, 5, 5, 6),
      y = c(0, 2, 3, 5, 4)
    )

    # If you have multiple problematic values that you'd like to replace with
    # `NA`, then `replace_values()` is a better choice than `na_if()`
    x <- c(-99, 1, 4, 0, -99, 5, -1, 0, 5)
    replace_values(x, c(0, -1, -99) ~ NA)
    # You'd have to nest `na_if()`s to achieve this
    try(na_if(x, c(0, -1, -99)))
    na_if(na_if(na_if(x, 0), -1), -99)

    # If you'd like to replace values that match a logical condition with `NA`,
    # use `replace_when()`
    replace_when(x, x < 0 ~ NA)

    # If you'd like to replace `NA` with some other value, use `replace_values()`
    x <- c(NA, 5, 2, NA, 0, 3)
    replace_values(x, NA ~ 0)

    # `na_if()` is particularly useful inside `mutate()`
    starwars |>
      select(name, eye_color) |>
      mutate(eye_color = na_if(eye_color, "unknown"))

    # `na_if()` can also be used with `mutate()` and `across()`
    # to alter multiple columns
    starwars |>
      mutate(across(where(is.character), ~na_if(., "unknown")))

This is a safe way of comparing whether two vectors of floating point numbers are (pairwise) equal. It is safer than using `==`, because it has a built-in tolerance.
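To make the idea of a built-in tolerance concrete, here is a minimal base R sketch. The helper name `near_sketch()` is hypothetical, and the absolute-difference comparison is an assumption about how such a tolerance check can be written, not a statement about dplyr's internals.

```r
# Hypothetical helper: TRUE wherever two doubles differ by less than `tol`.
near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

sqrt(2)^2 == 2             # exact comparison is tripped up by rounding error
near_sketch(sqrt(2)^2, 2)  # the difference is far below the tolerance
near_sketch(c(1, 2), c(1 + 1e-10, 2.1))  # compared pairwise
```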

    near(x, y, tol = .Machine$double.eps^0.5)

    sqrt(2) ^ 2 == 2
    near(sqrt(2) ^ 2, 2)

[Experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)

`nest_by()` is closely related to `group_by()`. However, instead of storing the group structure in the metadata, it is made explicit in the data, giving each group key a single row along with a list-column of data frames that contain all the other data.

`nest_by()` returns a rowwise data frame, which makes operations on the grouped data particularly elegant. See `vignette("rowwise")` for more details.

    nest_by(.data, ..., .key = "data", .keep = FALSE)

    # After nesting, you get one row per group
    iris |> nest_by(Species)
    starwars |> nest_by(species)

    # The output is grouped by row, which makes modelling particularly easy
    models <- mtcars |>
      nest_by(cyl) |>
      mutate(model = list(lm(mpg ~ wt, data = data)))
    models

    models |> summarise(rsq = summary(model)$r.squared)

    if (requireNamespace("broom", quietly = TRUE)) withAutoprint({
      # This is particularly elegant with the broom functions
      models |> summarise(broom::glance(model))
      models |> reframe(broom::tidy(model))
    })

    # Note that you can also `reframe()` to unnest the data
    models |> reframe(data)

A nest join leaves `x` almost unchanged, except that it adds a new list-column, where each element contains the rows from `y` that match the corresponding row in `x`.

    nest_join(x, y, by = NULL, copy = FALSE, keep = NULL, name = NULL, ...)

    # S3 method for class 'data.frame'
    nest_join(
      x, y, by = NULL, copy = FALSE, keep = NULL, name = NULL, ...,
      na_matches = c("na", "never"), unmatched = "drop"
    )

    df1 <- tibble(x = 1:3)
    df2 <- tibble(x = c(2, 3, 3), y = c("a", "b", "c"))

    out <- nest_join(df1, df2)
    out
    out$df2

`new_grouped_df()` and `new_rowwise_df()` are constructors designed to be high-performance, so they only check types, not values. This means it is the caller's responsibility to create valid values, and hence they are for expert use only.

`validate_grouped_df()` and `validate_rowwise_df()` validate the attributes of a `grouped_df` or a `rowwise_df`.

    new_grouped_df(x, groups, ..., class = character())
    validate_grouped_df(x, check_bounds = FALSE)
    new_rowwise_df(data, group_data = NULL, ..., class = character())
    validate_rowwise_df(x)

    # 5 bootstrap samples
    tbl <- new_grouped_df(
      tibble(x = rnorm(10)),
      groups = tibble(".rows" := replicate(5, sample(1:10, replace = TRUE), simplify = FALSE))
    )
    # mean of each bootstrap sample
    summarise(tbl, x = mean(x))

These are useful helpers for extracting a single value from a vector. They are guaranteed to return a meaningful value, even when the input is shorter than expected. You can also provide an optional secondary vector that defines the ordering.

    nth(x, n, order_by = NULL, default = NULL, na_rm = FALSE)
    first(x, order_by = NULL, default = NULL, na_rm = FALSE)
    last(x, order_by = NULL, default = NULL, na_rm = FALSE)

    x <- 1:10
    y <- 10:1

    first(x)
    last(y)

    nth(x, 1)
    nth(x, 5)
    nth(x, -2)

    # `first()` and `last()` are often useful in `summarise()`
    df <- tibble(x = x, y = y)
    df |>
      summarise(
        across(x:y, first, .names = "{col}_first"),
        y_last = last(y)
      )

    # Selecting a position that is out of bounds returns a default value
    nth(x, 11)
    nth(x, 0)

    # This out of bounds behavior also applies to empty vectors
    first(integer())

    # You can customize the default value with `default`
    nth(x, 11, default = -1L)
    first(integer(), default = 0L)

    # `order_by` provides optional ordering
    last(x)
    last(x, order_by = y)

    # `na_rm` removes missing values before extracting the value
    z <- c(NA, NA, 1, 3, NA, 5, NA)
    first(z)
    first(z, na_rm = TRUE)
    last(z, na_rm = TRUE)
    nth(z, 3, na_rm = TRUE)

    # For data frames, these select entire rows
    df <- tibble(a = 1:5, b = 6:10)
    first(df)
    nth(df, 4)

`ntile()` is a sort of very rough rank, which breaks the input vector into `n` buckets. If `length(x)` is not an integer multiple of `n`, the size of the buckets will differ by up to one, with larger buckets coming first.

Unlike other ranking functions, `ntile()` ignores ties: it will create evenly sized buckets even if the same value of `x` ends up in different buckets.
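The bucket rule described above can be sketched in base R. The helper `ntile_sketch()` is hypothetical: it assumes missing values are left out of the buckets and that ties are broken by position, and it is meant to illustrate the rule rather than reproduce dplyr's code.

```r
# Hypothetical helper illustrating the bucket rule; not dplyr's code.
ntile_sketch <- function(x, n) {
  n_obs <- sum(!is.na(x))
  # Every bucket gets the base size; the first `n_obs %% n` buckets get one extra,
  # so the larger buckets come first.
  sizes <- n_obs %/% n + (seq_len(n) <= n_obs %% n)
  buckets <- rep(seq_len(n), times = sizes)
  # Ties are broken by position, so equal values can land in different buckets.
  buckets[rank(x, ties.method = "first", na.last = "keep")]
}

ntile_sketch(1:8, 3)                   # bucket sizes 3, 3, 2
ntile_sketch(rep(1, 8), 3)             # identical values are still split up
ntile_sketch(c(5, 1, 3, 2, 2, NA), 2)  # the missing value stays missing
```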

    ntile(x = row_number(), n)

    x <- c(5, 1, 3, 2, 2, NA)
    ntile(x, 2)
    ntile(x, 4)

    # If the bucket sizes are uneven, the larger buckets come first
    ntile(1:8, 3)

    # Ties are ignored
    ntile(rep(1, 8), 3)

This function makes it possible to control the ordering of window functions in R that don't have a specific ordering parameter. When translated to SQL it will modify the order clause of the OVER function.
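One way to picture what controlling the ordering of a window function means is the base R sketch below. The helper `with_order_sketch()` and the sample vectors are hypothetical: the helper sorts the input, applies the function, and then returns each result to its original position.

```r
# Hypothetical helper: apply `fun` to `x` in the order given by `order_by`,
# then put each result back at its original position.
with_order_sketch <- function(order_by, fun, x) {
  ord <- order(order_by)
  out <- fun(x[ord])
  out[order(ord)]
}

year  <- c(2002, 2000, 2001)
value <- c(4, 0, 1)

cumsum(value)                           # running total in row order
with_order_sketch(year, cumsum, value)  # running total in year order
```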

    order_by(order_by, call)

    order_by(10:1, cumsum(1:10))
    x <- 10:1
    y <- 1:10
    order_by(x, cumsum(y))

    df <- data.frame(year = 2000:2005, value = (0:5) ^ 2)
    scrambled <- df[sample(nrow(df)), ]

    wrong <- mutate(scrambled, running = cumsum(value))
    arrange(wrong, year)

    right <- mutate(scrambled, running = order_by(year, cumsum(value)))
    arrange(right, year)

These two ranking functions implement two slightly different ways to compute a percentile. For each `x_i` in `x`:

`cume_dist(x)` counts the total number of values less than or equal to `x_i`, and divides it by the number of observations.

`percent_rank(x)` counts the total number of values less than `x_i`, and divides it by the number of observations minus 1.

In both cases, missing values are ignored when counting the number of observations.
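The two definitions above, including the rule about missing values, can be written out by hand in base R. The helper names are hypothetical, and the sketch assumes that a missing input yields a missing rank; it follows the definitions as stated and is not dplyr's implementation.

```r
# Hypothetical helpers following the definitions above; not dplyr's code.
cume_dist_sketch <- function(x) {
  n <- sum(!is.na(x))  # missing values are not counted as observations
  vapply(x, function(xi) {
    if (is.na(xi)) NA_real_ else sum(x <= xi, na.rm = TRUE) / n
  }, numeric(1))
}

percent_rank_sketch <- function(x) {
  n <- sum(!is.na(x))
  vapply(x, function(xi) {
    if (is.na(xi)) NA_real_ else sum(x < xi, na.rm = TRUE) / (n - 1)
  }, numeric(1))
}

x <- c(5, 1, 3, 2, 2, NA)
cume_dist_sketch(x)
percent_rank_sketch(x)
```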

    percent_rank(x)
    cume_dist(x)

    x <- c(5, 1, 3, 2, 2)

    cume_dist(x)
    percent_rank(x)

    # You can understand what's going on by computing it by hand
    sapply(x, function(xi) sum(x <= xi) / length(x))
    sapply(x, function(xi) sum(x < xi) / (length(x) - 1))
    # The real computations are a little more complex in order to
    # correctly deal with missing values

`pick()` provides a way to easily select a subset of columns from your data using `select()` semantics while inside a "data-masking" function like `mutate()` or `summarise()`. `pick()` returns a data frame containing the selected columns for the current group.

`pick()` is complementary to `across()`:

With `pick()`, you typically apply a function to the full data frame.

With `across()`, you typically apply a function to each column.

    pick(...)

    df <- tibble(
      x = c(3, 2, 2, 2, 1),
      y = c(0, 2, 1, 1, 4),
      z1 = c("a", "a", "a", "b", "a"),
      z2 = c("c", "d", "d", "a", "c")
    )
    df

    # `pick()` provides a way to select a subset of your columns using
    # tidyselect. It returns a data frame.
    df |> mutate(cols = pick(x, y))

    # This is useful for functions that take data frames as inputs.
    # For example, you can compute a joint rank between `x` and `y`.
    df |> mutate(rank = dense_rank(pick(x, y)))

    # `pick()` is also useful as a bridge between data-masking functions (like
    # `mutate()` or `group_by()`) and functions with tidy-select behavior (like
    # `select()`). For example, you can use `pick()` to create a wrapper around
    # `group_by()` that takes a tidy-selection of columns to group on. For more
    # bridge patterns, see
    # https://rlang.r-lib.org/reference/topic-data-mask-programming.html#bridge-patterns.
    my_group_by <- function(data, cols) {
      group_by(data, pick({{ cols }}))
    }

    df |> my_group_by(c(x, starts_with("z")))

    # Or you can use it to dynamically select columns to `count()` by
    df |> count(pick(starts_with("z")))

[Deprecated](https://lifecycle.r-lib.org/articles/stages.html#deprecated)

This progress bar has been deprecated since providing progress bars is not the responsibility of dplyr. Instead, you might try the more powerful progress package (https://github.com/r-lib/progress).

This reference class represents a text progress bar that displays the estimated time remaining. When finished, it displays the total duration. The automatic progress bar can be disabled by setting option `dplyr.show_progress` to `FALSE`.

    progress_estimated(n, min_time = 0)

    p <- progress_estimated(3)
    p$tick()
    p$tick()
    p$tick()

    p <- progress_estimated(3)
    for (i in 1:3) p$pause(0.1)$tick()$print()

    p <- progress_estimated(3)
    p$tick()$print()$
      pause(1)$stop()

    # If min_time is set, progress bar not shown until that many
    # seconds have elapsed
    p <- progress_estimated(3, min_time = 3)
    for (i in 1:3) p$pause(0.1)$tick()$print()

    p <- progress_estimated(10, min_time = 3)
    for (i in 1:10) p$pause(0.5)$tick()$print()

`pull()` is similar to `$`. It's mostly useful because it looks a little nicer in pipes, it also works with remote data frames, and it can optionally name the output.

    pull(.data, var = -1, name = NULL, ...)

    mtcars |> pull(-1)
    mtcars |> pull(1)
    mtcars |> pull(cyl)

    if (requireNamespace("dbplyr", quietly = TRUE) && requireNamespace("RSQLite", quietly = TRUE)) withAutoprint({
      # Also works for remote sources
      df <- dbplyr::memdb_frame(x = 1:10, y = 10:1, .name = "pull-ex")
      df |>
        mutate(z = x * y) |>
        pull()
    })

    # Pull a named vector
    starwars |> pull(height, name)

[Superseded](https://lifecycle.r-lib.org/articles/stages.html#superseded)

`recode()` is superseded in favor of `recode_values()` and `replace_values()`, which are more general and have a much better interface. `recode_factor()` is also superseded; however, its direct replacement is not currently available but will eventually live in forcats (https://forcats.tidyverse.org/). For creating new variables based on logical vectors, use `if_else()`. For even more complicated criteria, use `case_when()`.

`recode()` is a vectorised version of `switch()`: you can replace numeric values based on their position or their name, and character or factor values only by their name. This is an S3 generic: dplyr provides methods for numeric, character, and factors. You can use `recode()` directly with factors; it will preserve the existing order of levels while changing the values. Alternatively, you can use `recode_factor()`, which will change the order of levels to match the order of replacements.

    recode(.x, ..., .default = NULL, .missing = NULL)
    recode_factor(.x, ..., .default = NULL, .missing = NULL, .ordered = FALSE)

    set.seed(1234)
    x <- sample(c("a", "b", "c"), 10, replace = TRUE)

    # `recode()` is superseded by `recode_values()` and `replace_values()`

    # If you are fully recoding a vector use `recode_values()`
    recode(x, a = "Apple", b = "Banana", .default = NA_character_)
    recode_values(x, "a" ~ "Apple", "b" ~ "Banana")

    # With a default
    recode(x, a = "Apple", b = "Banana", .default = "unknown")
    recode_values(x, "a" ~ "Apple", "b" ~ "Banana", default = "unknown")

    # If you are partially updating a vector and want to retain the original
    # vector's values in locations you don't make a replacement, use
    # `replace_values()`
    recode(x, a = "Apple", b = "Banana")
    replace_values(x, "a" ~ "Apple", "b" ~ "Banana")

    # `replace_values()` is easier to use with numeric vectors, because you don't
    # need to turn the numeric values into names
    y <- c(1:4, NA)
    recode(y, `2` = 20L, `4` = 40L)
    replace_values(y, 2 ~ 20L, 4 ~ 40L)

    # `recode()` is particularly confusing because it tries to handle both
    # full recodings to new vector types and partial updating of an existing
    # vector. With the above example, using doubles (20) rather than integers
    # (20L) results in a warning from `recode()`, because it thinks you are
    # doing a full recode and missed a case. `replace_values()` is type stable
    # on `y` and will instead coerce the double values to integer.
    recode(y, `2` = 20, `4` = 40)
    replace_values(y, 2 ~ 20, 4 ~ 40)

    # This also makes `replace_values()` much safer. If you provide
    # incompatible types, it will error.
    recode(y, `2` = "20", `4` = "40")
    try(replace_values(y, 2 ~ "20", 4 ~ "40"))

    # If you were trying to fully recode the vector and want a different output
    # type, use `recode_values()`
    recode_values(y, 2 ~ "20", 4 ~ "40")

    # And if you want to ensure you don't miss a case, use `unmatched`, which
    # errors rather than warns
    try(recode_values(y, 2 ~ "20", 4 ~ "40", unmatched = "error"))

    # ---------------------------------------------------------------------------
    # Lookup tables

    # If you were splicing an external lookup vector into `recode()`, you can
    # instead use the `from` and `to` arguments of `recode_values()`
    x <- c("a", "b", "a", "c", "d", "c")

    lookup <- c(
      "a" = "A",
      "b" = "B",
      "c" = "C",
      "d" = "D"
    )

    recode(x, !!!lookup)
    recode_values(x, from = names(lookup), to = unname(lookup))

    # `recode_values()` is much more flexible here because the lookup table
    # isn't restricted to just character values. We recommend using `tribble()`
    # to build your lookup tables.
    lookup <- tribble(
      ~from, ~to,
      "a", 1,
      "b", 2,
      "c", 3,
      "d", 4
    )

    recode_values(x, from = lookup$from, to = lookup$to)

    # ---------------------------------------------------------------------------
    # Factors

    # The factor method of `recode()` can generally be replaced with
    # `forcats::fct_recode()`
    x <- factor(c("a", "b", "c"))
    recode(x, a = "Apple")
    # forcats::fct_recode(x, "Apple" = "a")

    # `recode_factor()` does not currently have a direct replacement, but we
    # plan to add one to forcats. In the meantime, use a lookup table that
    # recodes every case, and then convert the `to` column to a factor. If you
    # define your lookup table in your preferred level order, then the conversion
    # to factor is straightforward!
    y <- c(3, 4, 1, 2, 4, NA)

    recode_factor(
      y,
      `1` = "a",
      `2` = "b",
      `3` = "c",
      `4` = "d",
      .missing = "M"
    )

    lookup <- tribble(
      ~from, ~to,
      1, "a",
      2, "b",
      3, "c",
      4, "d",
      NA, "M"
    )

    # `factor()` generates levels by sorting the unique values of `to`, which we
    # don't want, so we supply `levels = to` directly.
    # Alternatively, use `forcats::fct(to)`, which generates levels in order of
    # appearance.
    lookup <- mutate(lookup, to = factor(to, levels = to))

    recode_values(y, from = lookup$from, to = lookup$to)

| Repository | Version | Published | First seen | Last seen | Docs |
|---|---|---|---|---|---|
| CRAN | 1.2.0 | 2026-02-03 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.1.4 | 2023-11-17 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.1.3 | 2023-09-03 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.1.2 | 2023-04-20 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.1.1 | 2023-03-22 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.1.0 | 2023-01-29 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.10 | 2022-09-01 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.9 | 2022-04-28 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.8 | 2022-02-08 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.7 | 2021-06-19 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.6 | 2021-05-05 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.5 | 2021-03-05 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.4 | 2021-02-02 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.3 | 2021-01-15 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.2 | 2020-08-18 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.1 | 2020-07-31 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.0.0 | 2020-05-29 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.8.5 | 2020-03-07 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.8.4 | 2020-01-31 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.8.3 | 2019-07-04 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.8.2 | 2019-06-29 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.8.1 | 2019-05-14 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.8.0.1 | 2019-02-15 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.8.0 | 2019-02-14 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.8 | 2018-11-10 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.7 | 2018-10-16 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.6 | 2018-06-29 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.5 | 2018-05-19 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.4 | 2017-09-28 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.3 | 2017-09-09 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.2 | 2017-07-21 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.1 | 2017-06-22 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.7.0 | 2017-06-09 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.5.0 | 2016-06-24 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.4.3 | 2015-09-01 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.4.2 | 2015-06-16 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.4.1 | 2015-01-14 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.4.0 | 2015-01-08 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.3.0.2 | 2014-10-11 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.3.0.1 | 2014-10-08 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.3 | 2014-10-04 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.2 | 2014-05-21 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.1.3 | 2014-03-15 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.1.2 | 2014-02-24 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.1.1 | 2014-01-29 | 2026-05-07 | 2026-05-07 | |
| CRAN | 0.1 | 2014-01-16 | 2026-05-07 | 2026-05-07 | |
| CRAN | 1.2.1 | 2026-05-29 | 2026-05-30 | | |
| R-universe | 1.2.1.9000 | 2026-05-29 | 2026-05-30 | | |
No OSV data to display.
No OpenAlex data to display.