Creating one variable from a list of variables in R?
I have a sequence of variables in a dataframe (over 100) and I would like to create an indicator variable for whether particular text patterns are present in any of the variables. Below is an example with three variables. One solution I've found is using tidyr::unite() followed by dplyr::mutate(), but I'm interested in a solution where I do not have to unite the variables.
c1<-c("T1", "X1", "T6", "R5")
c2<-c("R4", "C6", "C7", "X3")
c3<-c("C5", "C2", "X4", "T2")
df<-data.frame(c1, c2, c3)
c1 c2 c3
1 T1 R4 C5
2 X1 C6 C2
3 T6 C7 X4
4 R5 X3 T2
code.vec<-c("T1", "T2", "T3", "T4") #Text patterns of interest
code_regex<-paste(code.vec, collapse="|")
library(dplyr)
library(tidyr)
new<-df %>%
unite(all_c, c1:c3, remove=FALSE) %>%
mutate(indicator=if_else(grepl(code_regex, all_c), 1, 0)) %>%
select(-(all_c))
c1 c2 c3 indicator
1 T1 R4 C5 1
2 X1 C6 C2 0
3 T6 C7 X4 0
4 R5 X3 T2 1
The example above produces the desired result; however, I feel there should be a way of doing this in the tidyverse without having to unite the variables. This is something that SAS handles very easily using an ARRAY statement and a DO loop, and I'm hoping R has a good way of handling this.
The real dataframe has many additional variables besides the "c" fields to search, so a solution that searches every column would require first subsetting the dataframe to only the variables I want to search, and then joining the result back to the other variables.
r dplyr tidyverse mutate
asked 11 hours ago by patward5656 (edited 11 hours ago)
You said you don't want to use unite, but it's worth noting that passing the argument remove = FALSE has unite create a column of the united variables, leaving the others intact. Might be convenient in this case.
– camille
11 hours ago
Yes, it is convenient. And it does work. I just feel like there may be a simpler approach I'm missing that doesn't need to create a united variable.
– patward5656
11 hours ago
3 Answers
We can use the tidyverse
library(tidyverse)
df %>%
mutate_all(str_detect, pattern = code_regex) %>%
reduce(`+`) %>%
mutate(df, indicator = .)
# c1 c2 c3 indicator
#1 T1 R4 C5 1
#2 X1 C6 C2 0
#3 T6 C7 X4 0
#4 R5 X3 T2 1
Or using base R
Reduce(`+`, lapply(df, grepl, pattern = code_regex))
#[1] 1 0 0 1
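One thing worth noting about the `+` reductions above: they return the number of searched columns that match in each row, which equals a 0/1 indicator only when at most one column matches. A minimal base R sketch, assuming a hypothetical fifth row (`"T1"`, `"T2"`, `"C9"`) added purely for illustration, reduces with `|` instead so the result stays 0/1:

```r
# Example data from the question, plus one hypothetical row with two matches
df <- data.frame(
  c1 = c("T1", "X1", "T6", "R5", "T1"),
  c2 = c("R4", "C6", "C7", "X3", "T2"),
  c3 = c("C5", "C2", "X4", "T2", "C9"),
  stringsAsFactors = FALSE
)
code_regex <- paste(c("T1", "T2", "T3", "T4"), collapse = "|")

# One logical vector per column: TRUE where the pattern is found
hits <- lapply(df, grepl, pattern = code_regex)

match_count <- Reduce(`+`, hits)              # number of matching columns per row
indicator   <- as.integer(Reduce(`|`, hits))  # 1 if any column matches, else 0
```

With this data the hypothetical fifth row has a `match_count` of 2 but an `indicator` of 1, so the `|` form may be the safer choice when a row can match in several columns.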
answered 11 hours ago by akrun
This tidyverse solution seems to only work in the scenario where all of the columns are being searched. I have other variables in my real dataset, and when using it for that the output is all NA. Does this have something to do with the reduce function?
– patward5656
10 hours ago
@patward5656 That is an easy fix. df %>% mutate_at(vars(starts_with("c")), str_detect, pattern = code_regex) %>% reduce("+") %>% mutate(df, indicator = .)
– akrun
10 hours ago
c1<-c("T1", "X1", "T6", "R5")
c2<-c("R4", "C6", "C7", "X3")
c3<-c("C5", "C2", "X4", "T2")
z1<-c("C5", "C2", "X4", "T2")
df<-data.frame(c1, c2, c3, z1)
df %>% mutate_at(vars(starts_with("c")), str_detect, pattern = code_regex) %>% reduce(`+`) %>% mutate(df, indicator = .)
c1 c2 c3 z1 indicator
1 T1 R4 C5 C5 NA
2 X1 C6 C2 C2 NA
3 T6 C7 X4 X4 NA
4 R5 X3 T2 T2 NA
Warning message: In Ops.factor(.x, .y) : ‘+’ not meaningful for factors
This produced NAs, it seems.
– patward5656
10 hours ago
@patward5656 I would use transmute_at instead of mutate_at: df %>% transmute_at(vars(starts_with("c")), str_detect, pattern = code_regex) %>% reduce(`+`)
– akrun
10 hours ago
Thanks. I believe transmute_at() solves it perfectly.
– patward5656
10 hours ago
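For readers on a newer dplyr, the scoped verbs (`mutate_at()`, `transmute_at()`) are superseded, and the same column restriction can be written inside a single `mutate()`. A minimal sketch, assuming dplyr 1.0.4 or later (where `if_any()` is available) and the four-column data frame from the comment above:

```r
library(dplyr)

df <- data.frame(
  c1 = c("T1", "X1", "T6", "R5"),
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  z1 = c("C5", "C2", "X4", "T2"),  # extra column that must not be searched
  stringsAsFactors = FALSE
)
code_regex <- paste(c("T1", "T2", "T3", "T4"), collapse = "|")

# if_any() ORs the per-column results row by row; z1 is kept but never searched
new <- df %>%
  mutate(indicator = as.integer(if_any(starts_with("c"), ~ grepl(code_regex, .x))))
```

Because the selection happens inside `mutate()`, the other columns never reach the reduction step, which is what produced the NAs in the `mutate_at()` attempt.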
Using base R, we can use sapply with grepl to find the pattern in every column and assign 1 to rows with at least one match.
df$indicator <- as.integer(rowSums(sapply(df, grepl, pattern = code_regex)) > 0)
df
# c1 c2 c3 indicator
#1 T1 R4 C5 1
#2 X1 C6 C2 0
#3 T6 C7 X4 0
#4 R5 X3 T2 1
If there are other columns and we only want to search the columns whose names start with "c", we can use grep to select them.
cols <- grep("^c", names(df))
as.integer(rowSums(sapply(df[cols], grepl, pattern = code_regex)) > 0)
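Using dplyr, we can do
library(dplyr)
# Caveat: mutate_at() keeps every column, so rowSums() needs all other columns
# to be numeric (and will add them in); transmute_at() keeps only c1:c3.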
df$indicator <- as.integer(df %>%
mutate_at(vars(c1:c3), ~grepl(code_regex, .)) %>%
rowSums() > 0)
answered 11 hours ago by Ronak Shah (edited 11 hours ago)
This is a good solution, but in the real data there are additional variables that I do not want to pattern search, so this would require me to index the dataframe to include only the columns I want to search first. Will edit my original post to include this information.
– patward5656
11 hours ago
The purrr solution looks like what I was looking for--one line of code that doesn't involve uniting the variables.
– patward5656
11 hours ago
@patward5656 I think the purrr solution would not give you the expected output. I changed it to use mutate_at, which should work on a range of columns. Moreover, you can use column numbers directly in cols for sapply, say columns 3:5 or 1:3, to find the pattern in those columns.
– Ronak Shah
11 hours ago
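Because `grepl()` treats `code_regex` as a regular expression, a pattern such as `T1` would also match longer values like `T10`. If the codes are meant to be matched exactly (an assumption; the question only says "text patterns"), one hedged alternative is `%in%` on the selected columns. The `"T10"` value and the `id` column below are hypothetical, added only to show the difference:

```r
df <- data.frame(
  id = 1:4,                          # hypothetical column that is not searched
  c1 = c("T10", "X1", "T6", "R5"),   # "T10" is a hypothetical near-miss value
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  stringsAsFactors = FALSE
)
code.vec   <- c("T1", "T2", "T3", "T4")
code_regex <- paste(code.vec, collapse = "|")
cols <- grep("^c", names(df))        # positions of the columns to search

# Regex search: "T10" contains "T1", so row 1 is flagged
regex_ind <- as.integer(rowSums(sapply(df[cols], grepl, pattern = code_regex)) > 0)

# Exact match: only values equal to one of the codes are flagged
exact_ind <- as.integer(rowSums(sapply(df[cols], `%in%`, code.vec)) > 0)
```

If a regular expression is still preferred, anchoring it, for example `paste0("^(", code_regex, ")$")`, should give the same exact-match behaviour.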
Base R with apply
cols <- grep("^c", names(df))  # columns to search, as defined in the answer above
apply(df[cols], 1, function(x) sum(grepl(code_regex, x)))
# [1] 1 0 0 1
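For readers coming from SAS, the closest analogue of an ARRAY statement with a DO loop is probably an explicit loop over the column names. A minimal base R sketch, assuming the columns to search are those whose names start with "c" (the `id` column and the names `search_cols` and `found` are hypothetical):

```r
df <- data.frame(
  id = 1:4,                          # hypothetical column that is not searched
  c1 = c("T1", "X1", "T6", "R5"),
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  stringsAsFactors = FALSE
)
code_regex <- paste(c("T1", "T2", "T3", "T4"), collapse = "|")

# Names of the columns to search; plays the role of the SAS ARRAY
search_cols <- grep("^c", names(df), value = TRUE)

# Accumulate matches column by column; plays the role of the DO loop
found <- rep(FALSE, nrow(df))
for (col in search_cols) {
  found <- found | grepl(code_regex, df[[col]])
}
df$indicator <- as.integer(found)
```

The loop does the same work as the `sapply` and `apply` versions, but it never builds an intermediate matrix, which may matter with over 100 columns.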
answered 11 hours ago by nsinghs