Rank groups within a grouped sequence of TRUE/FALSE and NA
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
Thanks for the help!
r dplyr data.table rank
|
show 1 more comment
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
Thanks for the help!
r dplyr data.table rank
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
2 days ago
that is a really funny solution. Very good job!
– Humpelstielzchen
2 days ago
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
2 days ago
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
2 days ago
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
2 days ago
|
show 1 more comment
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
Thanks for the help!
r dplyr data.table rank
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
Thanks for the help!
r dplyr data.table rank
r dplyr data.table rank
edited 2 days ago
Humpelstielzchen
asked 2 days ago
HumpelstielzchenHumpelstielzchen
1,4251318
1,4251318
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
2 days ago
that is a really funny solution. Very good job!
– Humpelstielzchen
2 days ago
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
2 days ago
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
2 days ago
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
2 days ago
|
show 1 more comment
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
2 days ago
that is a really funny solution. Very good job!
– Humpelstielzchen
2 days ago
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
2 days ago
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
2 days ago
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
2 days ago
1
1
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
2 days ago
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
2 days ago
that is a really funny solution. Very good job!
– Humpelstielzchen
2 days ago
that is a really funny solution. Very good job!
– Humpelstielzchen
2 days ago
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
2 days ago
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
2 days ago
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
2 days ago
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
2 days ago
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity in group
.– smci
2 days ago
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity in group
.– smci
2 days ago
|
show 1 more comment
4 Answers
4
active
oldest
votes
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
1
Tried withrleid
but didn't get it to work. (+1)
– markus
2 days ago
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
2 days ago
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x) {
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
}
do.call(rbind,
lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
i}))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal := {
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
}, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
2 days ago
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
2 days ago
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
2 days ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
1
Tried withrleid
but didn't get it to work. (+1)
– markus
2 days ago
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
2 days ago
add a comment |
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
1
Tried withrleid
but didn't get it to work. (+1)
– markus
2 days ago
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
2 days ago
add a comment |
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
answered 2 days ago
chinsoon12chinsoon12
9,93611420
9,93611420
1
Tried withrleid
but didn't get it to work. (+1)
– markus
2 days ago
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
2 days ago
add a comment |
1
Tried withrleid
but didn't get it to work. (+1)
– markus
2 days ago
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
2 days ago
1
1
Tried with
rleid
but didn't get it to work. (+1)– markus
2 days ago
Tried with
rleid
but didn't get it to work. (+1)– markus
2 days ago
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
2 days ago
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
2 days ago
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
answered 2 days ago
Ronak ShahRonak Shah
46.7k104269
46.7k104269
add a comment |
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x) {
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
}
do.call(rbind,
lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
i}))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x) {
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
}
do.call(rbind,
lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
i}))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x) {
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
}
do.call(rbind,
lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
i}))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x) {
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
}
do.call(rbind,
lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
i}))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
edited 2 days ago
answered 2 days ago
SotosSotos
31.5k51741
31.5k51741
add a comment |
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal := {
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
}, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
2 days ago
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
2 days ago
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
2 days ago
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal := {
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
}, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
2 days ago
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
2 days ago
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
2 days ago
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal := {
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
}, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal := {
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
}, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
edited 2 days ago
answered 2 days ago
markusmarkus
15.5k11336
15.5k11336
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
2 days ago
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
2 days ago
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
2 days ago
add a comment |
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
2 days ago
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
2 days ago
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
2 days ago
Wow, impressive. Thanks for introducing me to
rle
and inverse.rle
. Gruß nach Leipzig.– Humpelstielzchen
2 days ago
Wow, impressive. Thanks for introducing me to
rle
and inverse.rle
. Gruß nach Leipzig.– Humpelstielzchen
2 days ago
1
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
2 days ago
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
2 days ago
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
2 days ago
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
2 days ago
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
2 days ago
that is a really funny solution. Very good job!
– Humpelstielzchen
2 days ago
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
2 days ago
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
2 days ago
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity ingroup
.– smci
2 days ago