Rank groups within a grouped sequence of TRUE/FALSE and NA

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I have a little nut to crack.

I have a data.frame like this:

   group criterium

1      A        NA

2      A      TRUE

3      A      TRUE

4      A      TRUE

5      A     FALSE

6      A     FALSE

7      A      TRUE

8      A      TRUE

9      A     FALSE

10     A      TRUE

11     A      TRUE

12     A      TRUE

13     B        NA

14     B     FALSE

15     B      TRUE

16     B      TRUE

17     B      TRUE

18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.

So the result should look like:

    group criterium goal

1      A        NA   NA

2      A      TRUE    1

3      A      TRUE    1

4      A      TRUE    1

5      A     FALSE   NA

6      A     FALSE   NA

7      A      TRUE    2

8      A      TRUE    2

9      A     FALSE   NA

10     A      TRUE    3

11     A      TRUE    3

12     A      TRUE    3

13     B        NA   NA

14     B     FALSE   NA

15     B      TRUE    1

16     B      TRUE    1

17     B      TRUE    1

18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

edited 2 days ago

asked 2 days ago

Humpelstielzchen

1,4251318

1

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
2 days ago

that is a really funny solution. Very good job!

– Humpelstielzchen
2 days ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
2 days ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
2 days ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
2 days ago

|
show 1 more comment

I have a little nut to crack.

I have a data.frame like this:

   group criterium

1      A        NA

2      A      TRUE

3      A      TRUE

4      A      TRUE

5      A     FALSE

6      A     FALSE

7      A      TRUE

8      A      TRUE

9      A     FALSE

10     A      TRUE

11     A      TRUE

12     A      TRUE

13     B        NA

14     B     FALSE

15     B      TRUE

16     B      TRUE

17     B      TRUE

18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.

So the result should look like:

    group criterium goal

1      A        NA   NA

2      A      TRUE    1

3      A      TRUE    1

4      A      TRUE    1

5      A     FALSE   NA

6      A     FALSE   NA

7      A      TRUE    2

8      A      TRUE    2

9      A     FALSE   NA

10     A      TRUE    3

11     A      TRUE    3

12     A      TRUE    3

13     B        NA   NA

14     B     FALSE   NA

15     B      TRUE    1

16     B      TRUE    1

17     B      TRUE    1

18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

edited 2 days ago

asked 2 days ago

Humpelstielzchen

1,4251318

1

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
2 days ago

that is a really funny solution. Very good job!

– Humpelstielzchen
2 days ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
2 days ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
2 days ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
2 days ago

|
show 1 more comment

I have a little nut to crack.

I have a data.frame like this:

   group criterium

1      A        NA

2      A      TRUE

3      A      TRUE

4      A      TRUE

5      A     FALSE

6      A     FALSE

7      A      TRUE

8      A      TRUE

9      A     FALSE

10     A      TRUE

11     A      TRUE

12     A      TRUE

13     B        NA

14     B     FALSE

15     B      TRUE

16     B      TRUE

17     B      TRUE

18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.

So the result should look like:

    group criterium goal

1      A        NA   NA

2      A      TRUE    1

3      A      TRUE    1

4      A      TRUE    1

5      A     FALSE   NA

6      A     FALSE   NA

7      A      TRUE    2

8      A      TRUE    2

9      A     FALSE   NA

10     A      TRUE    3

11     A      TRUE    3

12     A      TRUE    3

13     B        NA   NA

14     B     FALSE   NA

15     B      TRUE    1

16     B      TRUE    1

17     B      TRUE    1

18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

edited 2 days ago

asked 2 days ago

Humpelstielzchen

1,4251318

I have a little nut to crack.

I have a data.frame like this:

   group criterium

1      A        NA

2      A      TRUE

3      A      TRUE

4      A      TRUE

5      A     FALSE

6      A     FALSE

7      A      TRUE

8      A      TRUE

9      A     FALSE

10     A      TRUE

11     A      TRUE

12     A      TRUE

13     B        NA

14     B     FALSE

15     B      TRUE

16     B      TRUE

17     B      TRUE

18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.

So the result should look like:

    group criterium goal

1      A        NA   NA

2      A      TRUE    1

3      A      TRUE    1

4      A      TRUE    1

5      A     FALSE   NA

6      A     FALSE   NA

7      A      TRUE    2

8      A      TRUE    2

9      A     FALSE   NA

10     A      TRUE    3

11     A      TRUE    3

12     A      TRUE    3

13     B        NA   NA

14     B     FALSE   NA

15     B      TRUE    1

16     B      TRUE    1

17     B      TRUE    1

18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

r dplyr data.table rank

edited 2 days ago

asked 2 days ago

Humpelstielzchen

1,4251318

edited 2 days ago

asked 2 days ago

Humpelstielzchen

1,4251318

edited 2 days ago

asked 2 days ago

Humpelstielzchen

1,4251318

asked 2 days ago

Humpelstielzchen

1,4251318

asked 2 days ago

Humpelstielzchen

1,4251318

1

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
2 days ago

that is a really funny solution. Very good job!

– Humpelstielzchen
2 days ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
2 days ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
2 days ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
2 days ago

|
show 1 more comment

1

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
2 days ago

that is a really funny solution. Very good job!

– Humpelstielzchen
2 days ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
2 days ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
2 days ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
2 days ago

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
2 days ago

that is a really funny solution. Very good job!

– Humpelstielzchen
2 days ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
2 days ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
2 days ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
2 days ago

|
show 1 more comment

4 Answers
4

active

oldest

votes

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 2 days ago

chinsoon12

9,93611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
2 days ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
2 days ago

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.

answered 2 days ago

Ronak Shah

46.7k104269

add a comment |

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 2 days ago

answered 2 days ago

Sotos

31.5k51741

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 2 days ago

answered 2 days ago

markus

15.5k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
2 days ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
2 days ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
2 days ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 2 days ago

chinsoon12

9,93611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
2 days ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
2 days ago

add a comment |

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 2 days ago

chinsoon12

9,93611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
2 days ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
2 days ago

add a comment |

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 2 days ago

chinsoon12

9,93611420

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 2 days ago

chinsoon12

9,93611420

answered 2 days ago

chinsoon12

9,93611420

answered 2 days ago

chinsoon12

9,93611420

answered 2 days ago

chinsoon12

9,93611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
2 days ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
2 days ago

add a comment |

1

Tried with rleid but didn't get it to work. (+1)

– markus
2 days ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
2 days ago

Tried with rleid but didn't get it to work. (+1)

– markus
2 days ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
2 days ago

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 2 days ago

Ronak Shah

46.7k104269

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 2 days ago

Ronak Shah

46.7k104269

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 2 days ago

Ronak Shah

46.7k104269

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 2 days ago

Ronak Shah

46.7k104269

answered 2 days ago

Ronak Shah

46.7k104269

answered 2 days ago

Ronak Shah

46.7k104269

answered 2 days ago

Ronak Shah

46.7k104269

add a comment |

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 2 days ago

answered 2 days ago

Sotos

31.5k51741

add a comment |

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 2 days ago

answered 2 days ago

Sotos

31.5k51741

add a comment |

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 2 days ago

answered 2 days ago

Sotos

31.5k51741

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 2 days ago

answered 2 days ago

Sotos

31.5k51741

edited 2 days ago

answered 2 days ago

Sotos

31.5k51741

answered 2 days ago

Sotos

31.5k51741

answered 2 days ago

Sotos

31.5k51741

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 2 days ago

answered 2 days ago

markus

15.5k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
2 days ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
2 days ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
2 days ago

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 2 days ago

answered 2 days ago

markus

15.5k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
2 days ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
2 days ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
2 days ago

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 2 days ago

answered 2 days ago

markus

15.5k11336

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 2 days ago

answered 2 days ago

markus

15.5k11336

edited 2 days ago

answered 2 days ago

markus

15.5k11336

answered 2 days ago

markus

15.5k11336

answered 2 days ago

markus

15.5k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
2 days ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
2 days ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
2 days ago

add a comment |

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
2 days ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
2 days ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
2 days ago

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
2 days ago

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
2 days ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
2 days ago

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfxtrjtrk