Rank groups within a grouped sequence of TRUE/FALSE and NA





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







10















I have a little nut to crack.



I have a data.frame like this:



   group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))



And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.



So the result should look like:



    group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA



I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.



Thanks for the help!










share|improve this question




















  • 1





    you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

    – user20650
    2 days ago













  • that is a really funny solution. Very good job!

    – Humpelstielzchen
    2 days ago











  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

    – smci
    2 days ago











  • No, when group A stops so stops the sequence for group A.

    – Humpelstielzchen
    2 days ago













  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

    – smci
    2 days ago




















10















I have a little nut to crack.



I have a data.frame like this:



   group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))



And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.



So the result should look like:



    group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA



I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.



Thanks for the help!










share|improve this question




















  • 1





    you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

    – user20650
    2 days ago













  • that is a really funny solution. Very good job!

    – Humpelstielzchen
    2 days ago











  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

    – smci
    2 days ago











  • No, when group A stops so stops the sequence for group A.

    – Humpelstielzchen
    2 days ago













  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

    – smci
    2 days ago
















10












10








10








I have a little nut to crack.



I have a data.frame like this:



   group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))



And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.



So the result should look like:



    group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA



I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.



Thanks for the help!










share|improve this question
















I have a little nut to crack.



I have a data.frame like this:



   group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))



And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.



So the result should look like:



    group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA



I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.



Thanks for the help!







r dplyr data.table rank






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 days ago







Humpelstielzchen

















asked 2 days ago









HumpelstielzchenHumpelstielzchen

1,4251318




1,4251318








  • 1





    you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

    – user20650
    2 days ago













  • that is a really funny solution. Very good job!

    – Humpelstielzchen
    2 days ago











  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

    – smci
    2 days ago











  • No, when group A stops so stops the sequence for group A.

    – Humpelstielzchen
    2 days ago













  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

    – smci
    2 days ago
















  • 1





    you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

    – user20650
    2 days ago













  • that is a really funny solution. Very good job!

    – Humpelstielzchen
    2 days ago











  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

    – smci
    2 days ago











  • No, when group A stops so stops the sequence for group A.

    – Humpelstielzchen
    2 days ago













  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

    – smci
    2 days ago










1




1





you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
2 days ago







you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
2 days ago















that is a really funny solution. Very good job!

– Humpelstielzchen
2 days ago





that is a really funny solution. Very good job!

– Humpelstielzchen
2 days ago













In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
2 days ago





In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
2 days ago













No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
2 days ago







No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
2 days ago















But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
2 days ago







But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
2 days ago














4 Answers
4






active

oldest

votes


















7














Another data.table approach:



library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]





share|improve this answer



















  • 1





    Tried with rleid but didn't get it to work. (+1)

    – markus
    2 days ago











  • works for me. And seems to be the most elegant answer.

    – Humpelstielzchen
    2 days ago



















6














Maybe I have over-complicated this but one way with dplyr is



library(dplyr)

df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)

# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA


We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.






share|improve this answer































    4














    A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.



    f1 <- function(x) {
    x[is.na(x)] <- FALSE
    rle1 <- rle(x)
    y <- rle1$values
    rle1$values[!y] <- 0
    rle1$values[y] <- cumsum(rle1$values[y])
    return(inverse.rle(rle1))
    }


    do.call(rbind,
    lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
    i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
    i}))


    Of course, If you want you can apply it via dplyr, i.e.



    library(dplyr)

    df %>%
    group_by(group) %>%
    mutate(goal = f1(criterium),
    goal = replace(goal, is.na(criterium)|!criterium, NA))


    which gives,




    # A tibble: 18 x 3
    # Groups: group [2]
    group criterium goal
    <fct> <lgl> <dbl>
    1 A NA NA
    2 A TRUE 1
    3 A TRUE 1
    4 A TRUE 1
    5 A FALSE NA
    6 A FALSE NA
    7 A TRUE 2
    8 A TRUE 2
    9 A FALSE NA
    10 A TRUE 3
    11 A TRUE 3
    12 A TRUE 3
    13 B NA NA
    14 B FALSE NA
    15 B TRUE 1
    16 B TRUE 1
    17 B TRUE 1
    18 B FALSE NA






    share|improve this answer

































      4














      A data.table option using rle



      library(data.table)
      DT <- as.data.table(dat)
      DT[, goal := {
      r <- rle(replace(criterium, is.na(criterium), FALSE))
      r$values <- with(r, cumsum(values) * values)
      out <- inverse.rle(r)
      replace(out, out == 0, NA)
      }, by = group]
      DT
      # group criterium goal
      # 1: A NA NA
      # 2: A TRUE 1
      # 3: A TRUE 1
      # 4: A TRUE 1
      # 5: A FALSE NA
      # 6: A FALSE NA
      # 7: A TRUE 2
      # 8: A TRUE 2
      # 9: A FALSE NA
      #10: A TRUE 3
      #11: A TRUE 3
      #12: A TRUE 3
      #13: B NA NA
      #14: B FALSE NA
      #15: B TRUE 1
      #16: B TRUE 1
      #17: B TRUE 1
      #18: B FALSE NA


      step by step



      When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



      r
      #Run Length Encoding
      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
      # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


      We manipulate the values compenent in the following way



      r$values <- with(r, cumsum(values) * values)
      r
      #Run Length Encoding
      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
      # values : int [1:9] 0 1 0 2 0 3 0 4 0


      That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times



      out <- inverse.rle(r)
      out
      # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


      This is almost what OP wants but we need to replace the 0s with NA



      replace(out, out == 0, NA)


      This is done for each group.



      data



      dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
      "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
      FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
      TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
      -18L))





      share|improve this answer


























      • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

        – Humpelstielzchen
        2 days ago






      • 1





        @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

        – markus
        2 days ago













      • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

        – Humpelstielzchen
        2 days ago












      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      7














      Another data.table approach:



      library(data.table)
      setDT(dt)
      dt[, cr := rleid(criterium)][
      (criterium), goal := rleid(cr), by=.(group)]





      share|improve this answer



















      • 1





        Tried with rleid but didn't get it to work. (+1)

        – markus
        2 days ago











      • works for me. And seems to be the most elegant answer.

        – Humpelstielzchen
        2 days ago
















      7














      Another data.table approach:



      library(data.table)
      setDT(dt)
      dt[, cr := rleid(criterium)][
      (criterium), goal := rleid(cr), by=.(group)]





      share|improve this answer



















      • 1





        Tried with rleid but didn't get it to work. (+1)

        – markus
        2 days ago











      • works for me. And seems to be the most elegant answer.

        – Humpelstielzchen
        2 days ago














      7












      7








      7







      Another data.table approach:



      library(data.table)
      setDT(dt)
      dt[, cr := rleid(criterium)][
      (criterium), goal := rleid(cr), by=.(group)]





      share|improve this answer













      Another data.table approach:



      library(data.table)
      setDT(dt)
      dt[, cr := rleid(criterium)][
      (criterium), goal := rleid(cr), by=.(group)]






      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered 2 days ago









      chinsoon12chinsoon12

      9,93611420




      9,93611420








      • 1





        Tried with rleid but didn't get it to work. (+1)

        – markus
        2 days ago











      • works for me. And seems to be the most elegant answer.

        – Humpelstielzchen
        2 days ago














      • 1





        Tried with rleid but didn't get it to work. (+1)

        – markus
        2 days ago











      • works for me. And seems to be the most elegant answer.

        – Humpelstielzchen
        2 days ago








      1




      1





      Tried with rleid but didn't get it to work. (+1)

      – markus
      2 days ago





      Tried with rleid but didn't get it to work. (+1)

      – markus
      2 days ago













      works for me. And seems to be the most elegant answer.

      – Humpelstielzchen
      2 days ago





      works for me. And seems to be the most elegant answer.

      – Humpelstielzchen
      2 days ago













      6














      Maybe I have over-complicated this but one way with dplyr is



      library(dplyr)

      df %>%
      mutate(temp = replace(criterium, is.na(criterium), FALSE),
      temp1 = cumsum(!temp)) %>%
      group_by(temp1) %>%
      mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
      group_by(group) %>%
      mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
      select(-temp, -temp1)

      # group criterium goal
      # <fct> <lgl> <int>
      # 1 A NA NA
      # 2 A TRUE 1
      # 3 A TRUE 1
      # 4 A TRUE 1
      # 5 A FALSE NA
      # 6 A FALSE NA
      # 7 A TRUE 2
      # 8 A TRUE 2
      # 9 A FALSE NA
      #10 A TRUE 3
      #11 A TRUE 3
      #12 A TRUE 3
      #13 B NA NA
      #14 B FALSE NA
      #15 B TRUE 1
      #16 B TRUE 1
      #17 B TRUE 1
      #18 B FALSE NA


      We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.






      share|improve this answer




























        6














        Maybe I have over-complicated this but one way with dplyr is



        library(dplyr)

        df %>%
        mutate(temp = replace(criterium, is.na(criterium), FALSE),
        temp1 = cumsum(!temp)) %>%
        group_by(temp1) %>%
        mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
        group_by(group) %>%
        mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
        select(-temp, -temp1)

        # group criterium goal
        # <fct> <lgl> <int>
        # 1 A NA NA
        # 2 A TRUE 1
        # 3 A TRUE 1
        # 4 A TRUE 1
        # 5 A FALSE NA
        # 6 A FALSE NA
        # 7 A TRUE 2
        # 8 A TRUE 2
        # 9 A FALSE NA
        #10 A TRUE 3
        #11 A TRUE 3
        #12 A TRUE 3
        #13 B NA NA
        #14 B FALSE NA
        #15 B TRUE 1
        #16 B TRUE 1
        #17 B TRUE 1
        #18 B FALSE NA


        We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.






        share|improve this answer


























          6












          6








          6







          Maybe I have over-complicated this but one way with dplyr is



          library(dplyr)

          df %>%
          mutate(temp = replace(criterium, is.na(criterium), FALSE),
          temp1 = cumsum(!temp)) %>%
          group_by(temp1) %>%
          mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
          group_by(group) %>%
          mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
          select(-temp, -temp1)

          # group criterium goal
          # <fct> <lgl> <int>
          # 1 A NA NA
          # 2 A TRUE 1
          # 3 A TRUE 1
          # 4 A TRUE 1
          # 5 A FALSE NA
          # 6 A FALSE NA
          # 7 A TRUE 2
          # 8 A TRUE 2
          # 9 A FALSE NA
          #10 A TRUE 3
          #11 A TRUE 3
          #12 A TRUE 3
          #13 B NA NA
          #14 B FALSE NA
          #15 B TRUE 1
          #16 B TRUE 1
          #17 B TRUE 1
          #18 B FALSE NA


          We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.






          share|improve this answer













          Maybe I have over-complicated this but one way with dplyr is



          library(dplyr)

          df %>%
          mutate(temp = replace(criterium, is.na(criterium), FALSE),
          temp1 = cumsum(!temp)) %>%
          group_by(temp1) %>%
          mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
          group_by(group) %>%
          mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
          select(-temp, -temp1)

          # group criterium goal
          # <fct> <lgl> <int>
          # 1 A NA NA
          # 2 A TRUE 1
          # 3 A TRUE 1
          # 4 A TRUE 1
          # 5 A FALSE NA
          # 6 A FALSE NA
          # 7 A TRUE 2
          # 8 A TRUE 2
          # 9 A FALSE NA
          #10 A TRUE 3
          #11 A TRUE 3
          #12 A TRUE 3
          #13 B NA NA
          #14 B FALSE NA
          #15 B TRUE 1
          #16 B TRUE 1
          #17 B TRUE 1
          #18 B FALSE NA


          We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 2 days ago









          Ronak ShahRonak Shah

          46.7k104269




          46.7k104269























              4














              A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.



              f1 <- function(x) {
              x[is.na(x)] <- FALSE
              rle1 <- rle(x)
              y <- rle1$values
              rle1$values[!y] <- 0
              rle1$values[y] <- cumsum(rle1$values[y])
              return(inverse.rle(rle1))
              }


              do.call(rbind,
              lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
              i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
              i}))


              Of course, If you want you can apply it via dplyr, i.e.



              library(dplyr)

              df %>%
              group_by(group) %>%
              mutate(goal = f1(criterium),
              goal = replace(goal, is.na(criterium)|!criterium, NA))


              which gives,




              # A tibble: 18 x 3
              # Groups: group [2]
              group criterium goal
              <fct> <lgl> <dbl>
              1 A NA NA
              2 A TRUE 1
              3 A TRUE 1
              4 A TRUE 1
              5 A FALSE NA
              6 A FALSE NA
              7 A TRUE 2
              8 A TRUE 2
              9 A FALSE NA
              10 A TRUE 3
              11 A TRUE 3
              12 A TRUE 3
              13 B NA NA
              14 B FALSE NA
              15 B TRUE 1
              16 B TRUE 1
              17 B TRUE 1
              18 B FALSE NA






              share|improve this answer






























                4














                A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.



                f1 <- function(x) {
                x[is.na(x)] <- FALSE
                rle1 <- rle(x)
                y <- rle1$values
                rle1$values[!y] <- 0
                rle1$values[y] <- cumsum(rle1$values[y])
                return(inverse.rle(rle1))
                }


                do.call(rbind,
                lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
                i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
                i}))


                Of course, If you want you can apply it via dplyr, i.e.



                library(dplyr)

                df %>%
                group_by(group) %>%
                mutate(goal = f1(criterium),
                goal = replace(goal, is.na(criterium)|!criterium, NA))


                which gives,




                # A tibble: 18 x 3
                # Groups: group [2]
                group criterium goal
                <fct> <lgl> <dbl>
                1 A NA NA
                2 A TRUE 1
                3 A TRUE 1
                4 A TRUE 1
                5 A FALSE NA
                6 A FALSE NA
                7 A TRUE 2
                8 A TRUE 2
                9 A FALSE NA
                10 A TRUE 3
                11 A TRUE 3
                12 A TRUE 3
                13 B NA NA
                14 B FALSE NA
                15 B TRUE 1
                16 B TRUE 1
                17 B TRUE 1
                18 B FALSE NA






                share|improve this answer




























                  4












                  4








                  4







                  A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.



                  f1 <- function(x) {
                  x[is.na(x)] <- FALSE
                  rle1 <- rle(x)
                  y <- rle1$values
                  rle1$values[!y] <- 0
                  rle1$values[y] <- cumsum(rle1$values[y])
                  return(inverse.rle(rle1))
                  }


                  do.call(rbind,
                  lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
                  i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
                  i}))


                  Of course, If you want you can apply it via dplyr, i.e.



                  library(dplyr)

                  df %>%
                  group_by(group) %>%
                  mutate(goal = f1(criterium),
                  goal = replace(goal, is.na(criterium)|!criterium, NA))


                  which gives,




                  # A tibble: 18 x 3
                  # Groups: group [2]
                  group criterium goal
                  <fct> <lgl> <dbl>
                  1 A NA NA
                  2 A TRUE 1
                  3 A TRUE 1
                  4 A TRUE 1
                  5 A FALSE NA
                  6 A FALSE NA
                  7 A TRUE 2
                  8 A TRUE 2
                  9 A FALSE NA
                  10 A TRUE 3
                  11 A TRUE 3
                  12 A TRUE 3
                  13 B NA NA
                  14 B FALSE NA
                  15 B TRUE 1
                  16 B TRUE 1
                  17 B TRUE 1
                  18 B FALSE NA






                  share|improve this answer















                  A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.



                  f1 <- function(x) {
                  x[is.na(x)] <- FALSE
                  rle1 <- rle(x)
                  y <- rle1$values
                  rle1$values[!y] <- 0
                  rle1$values[y] <- cumsum(rle1$values[y])
                  return(inverse.rle(rle1))
                  }


                  do.call(rbind,
                  lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium);
                  i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA);
                  i}))


                  Of course, If you want you can apply it via dplyr, i.e.



                  library(dplyr)

                  df %>%
                  group_by(group) %>%
                  mutate(goal = f1(criterium),
                  goal = replace(goal, is.na(criterium)|!criterium, NA))


                  which gives,




                  # A tibble: 18 x 3
                  # Groups: group [2]
                  group criterium goal
                  <fct> <lgl> <dbl>
                  1 A NA NA
                  2 A TRUE 1
                  3 A TRUE 1
                  4 A TRUE 1
                  5 A FALSE NA
                  6 A FALSE NA
                  7 A TRUE 2
                  8 A TRUE 2
                  9 A FALSE NA
                  10 A TRUE 3
                  11 A TRUE 3
                  12 A TRUE 3
                  13 B NA NA
                  14 B FALSE NA
                  15 B TRUE 1
                  16 B TRUE 1
                  17 B TRUE 1
                  18 B FALSE NA







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited 2 days ago

























                  answered 2 days ago









                  SotosSotos

                  31.5k51741




                  31.5k51741























                      4














                      A data.table option using rle



                      library(data.table)
                      DT <- as.data.table(dat)
                      DT[, goal := {
                      r <- rle(replace(criterium, is.na(criterium), FALSE))
                      r$values <- with(r, cumsum(values) * values)
                      out <- inverse.rle(r)
                      replace(out, out == 0, NA)
                      }, by = group]
                      DT
                      # group criterium goal
                      # 1: A NA NA
                      # 2: A TRUE 1
                      # 3: A TRUE 1
                      # 4: A TRUE 1
                      # 5: A FALSE NA
                      # 6: A FALSE NA
                      # 7: A TRUE 2
                      # 8: A TRUE 2
                      # 9: A FALSE NA
                      #10: A TRUE 3
                      #11: A TRUE 3
                      #12: A TRUE 3
                      #13: B NA NA
                      #14: B FALSE NA
                      #15: B TRUE 1
                      #16: B TRUE 1
                      #17: B TRUE 1
                      #18: B FALSE NA


                      step by step



                      When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



                      r
                      #Run Length Encoding
                      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
                      # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


                      We manipulate the values compenent in the following way



                      r$values <- with(r, cumsum(values) * values)
                      r
                      #Run Length Encoding
                      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
                      # values : int [1:9] 0 1 0 2 0 3 0 4 0


                      That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times



                      out <- inverse.rle(r)
                      out
                      # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


                      This is almost what OP wants but we need to replace the 0s with NA



                      replace(out, out == 0, NA)


                      This is done for each group.



                      data



                      dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                      1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
                      "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
                      FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
                      TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
                      -18L))





                      share|improve this answer


























                      • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                        – Humpelstielzchen
                        2 days ago






                      • 1





                        @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                        – markus
                        2 days ago













                      • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                        – Humpelstielzchen
                        2 days ago
















                      4














                      A data.table option using rle



                      library(data.table)
                      DT <- as.data.table(dat)
                      DT[, goal := {
                      r <- rle(replace(criterium, is.na(criterium), FALSE))
                      r$values <- with(r, cumsum(values) * values)
                      out <- inverse.rle(r)
                      replace(out, out == 0, NA)
                      }, by = group]
                      DT
                      # group criterium goal
                      # 1: A NA NA
                      # 2: A TRUE 1
                      # 3: A TRUE 1
                      # 4: A TRUE 1
                      # 5: A FALSE NA
                      # 6: A FALSE NA
                      # 7: A TRUE 2
                      # 8: A TRUE 2
                      # 9: A FALSE NA
                      #10: A TRUE 3
                      #11: A TRUE 3
                      #12: A TRUE 3
                      #13: B NA NA
                      #14: B FALSE NA
                      #15: B TRUE 1
                      #16: B TRUE 1
                      #17: B TRUE 1
                      #18: B FALSE NA


                      step by step



                      When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



                      r
                      #Run Length Encoding
                      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
                      # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


                      We manipulate the values compenent in the following way



                      r$values <- with(r, cumsum(values) * values)
                      r
                      #Run Length Encoding
                      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
                      # values : int [1:9] 0 1 0 2 0 3 0 4 0


                      That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times



                      out <- inverse.rle(r)
                      out
                      # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


                      This is almost what OP wants but we need to replace the 0s with NA



                      replace(out, out == 0, NA)


                      This is done for each group.



                      data



                      dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                      1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
                      "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
                      FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
                      TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
                      -18L))





                      share|improve this answer


























                      • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                        – Humpelstielzchen
                        2 days ago






                      • 1





                        @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                        – markus
                        2 days ago













                      • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                        – Humpelstielzchen
                        2 days ago














                      4












                      4








                      4







                      A data.table option using rle



                      library(data.table)
                      DT <- as.data.table(dat)
                      DT[, goal := {
                      r <- rle(replace(criterium, is.na(criterium), FALSE))
                      r$values <- with(r, cumsum(values) * values)
                      out <- inverse.rle(r)
                      replace(out, out == 0, NA)
                      }, by = group]
                      DT
                      # group criterium goal
                      # 1: A NA NA
                      # 2: A TRUE 1
                      # 3: A TRUE 1
                      # 4: A TRUE 1
                      # 5: A FALSE NA
                      # 6: A FALSE NA
                      # 7: A TRUE 2
                      # 8: A TRUE 2
                      # 9: A FALSE NA
                      #10: A TRUE 3
                      #11: A TRUE 3
                      #12: A TRUE 3
                      #13: B NA NA
                      #14: B FALSE NA
                      #15: B TRUE 1
                      #16: B TRUE 1
                      #17: B TRUE 1
                      #18: B FALSE NA


                      step by step



                      When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



                      r
                      #Run Length Encoding
                      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
                      # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


                      We manipulate the values compenent in the following way



                      r$values <- with(r, cumsum(values) * values)
                      r
                      #Run Length Encoding
                      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
                      # values : int [1:9] 0 1 0 2 0 3 0 4 0


                      That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times



                      out <- inverse.rle(r)
                      out
                      # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


                      This is almost what OP wants but we need to replace the 0s with NA



                      replace(out, out == 0, NA)


                      This is done for each group.



                      data



                      dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                      1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
                      "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
                      FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
                      TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
                      -18L))





                      share|improve this answer















                      A data.table option using rle



                      library(data.table)
                      DT <- as.data.table(dat)
                      DT[, goal := {
                      r <- rle(replace(criterium, is.na(criterium), FALSE))
                      r$values <- with(r, cumsum(values) * values)
                      out <- inverse.rle(r)
                      replace(out, out == 0, NA)
                      }, by = group]
                      DT
                      # group criterium goal
                      # 1: A NA NA
                      # 2: A TRUE 1
                      # 3: A TRUE 1
                      # 4: A TRUE 1
                      # 5: A FALSE NA
                      # 6: A FALSE NA
                      # 7: A TRUE 2
                      # 8: A TRUE 2
                      # 9: A FALSE NA
                      #10: A TRUE 3
                      #11: A TRUE 3
                      #12: A TRUE 3
                      #13: B NA NA
                      #14: B FALSE NA
                      #15: B TRUE 1
                      #16: B TRUE 1
                      #17: B TRUE 1
                      #18: B FALSE NA


                      step by step



                      When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle



                      r
                      #Run Length Encoding
                      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
                      # values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...


                      We manipulate the values compenent in the following way



                      r$values <- with(r, cumsum(values) * values)
                      r
                      #Run Length Encoding
                      # lengths: int [1:9] 1 3 2 2 1 3 2 3 1
                      # values : int [1:9] 0 1 0 2 0 3 0 4 0


                      That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times



                      out <- inverse.rle(r)
                      out
                      # [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0


                      This is almost what OP wants but we need to replace the 0s with NA



                      replace(out, out == 0, NA)


                      This is done for each group.



                      data



                      dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                      1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
                      "B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
                      FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
                      TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
                      -18L))






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited 2 days ago

























                      answered 2 days ago









                      markusmarkus

                      15.5k11336




                      15.5k11336













                      • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                        – Humpelstielzchen
                        2 days ago






                      • 1





                        @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                        – markus
                        2 days ago













                      • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                        – Humpelstielzchen
                        2 days ago



















                      • Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                        – Humpelstielzchen
                        2 days ago






                      • 1





                        @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                        – markus
                        2 days ago













                      • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                        – Humpelstielzchen
                        2 days ago

















                      Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                      – Humpelstielzchen
                      2 days ago





                      Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

                      – Humpelstielzchen
                      2 days ago




                      1




                      1





                      @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                      – markus
                      2 days ago







                      @Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

                      – markus
                      2 days ago















                      Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                      – Humpelstielzchen
                      2 days ago





                      Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

                      – Humpelstielzchen
                      2 days ago


















                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      How did Captain America manage to do this?

                      迪纳利

                      南乌拉尔铁路局