Pandas dataframe: Remove secondary upcoming same value























I have a dataframe:



col1  col2
a 0
b 1
c 1
d 0
c 1
d 0


In 'col2' I want to keep only the first 1 from the top and replace every later 1 with a 0, so that the output is:



col1  col2
a 0
b 1
c 0
d 0
c 0
d 0


Thank you very much.










Tags: python pandas dataframe

asked Dec 6 at 15:33 by s900n, edited Dec 6 at 15:46 by timgeb
























          8 Answers
          Accepted answer:










          You can find the index label of the first 1 and set every other 1 to 0:



          mask = df['col2'].eq(1)
          df.loc[mask & (df.index != mask.idxmax()), 'col2'] = 0


          For better performance, see "Efficiently return the index of the first value satisfying condition in array".
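For anyone who wants to run the mask-and-idxmax idea end to end, here is a minimal sketch. The sample frame is rebuilt from the question; the `mask.any()` guard and the name `first` are my additions, and the comparison against `df.index` assumes the index labels are unique.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})

mask = df["col2"].eq(1)          # True wherever col2 is 1
if mask.any():                   # with no 1 at all, idxmax() would just return the first label
    first = mask.idxmax()        # index label of the first True
    # Zero every 1 except the one sitting at the first label
    # (assumes index labels are unique).
    df.loc[mask & (df.index != first), "col2"] = 0

print(df["col2"].tolist())  # [0, 1, 0, 0, 0, 0]
```

Values other than 1 are left alone, since only rows where `mask` is True are written.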






          • Can you think of a good solution for the case when the index is arbitrary, like Index(['u', 'v', 'w', 'x', 'y', 'z']), and col2 could be something like [2, 0, 0, 1, 3, 1]?
            – timgeb
            Dec 6 at 16:18












          • @timgeb, To adapt this solution, I think you can use positional indexing (instead of index labels). Something like df.loc[mask & (np.arange(df.shape[0]) != np.where(mask)[0][0]), 'col2'] = 0. But I'm sure there are more Pythonic ways.
            – jpp
            Dec 6 at 16:28












          • Ah, I thought of using numpy, too. Just a bit differently. See my case 3. ;)
            – timgeb
            Dec 6 at 16:30
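One possible way to handle the arbitrary-index case raised in the comments, sketched with a running count instead of positions. This is not taken from any of the answers; the variable name `repeats` is mine, and the data is the example from the first comment.

```python
import pandas as pd

# Index and values from the comment: arbitrary labels, values other than 0/1.
df = pd.DataFrame(
    {"col1": list("abcdcd"), "col2": [2, 0, 0, 1, 3, 1]},
    index=list("uvwxyz"),
)

mask = df["col2"].eq(1)
# cumsum() counts the 1s seen so far; a 1 whose running count is above 1 is a repeat.
repeats = mask & (mask.cumsum() > 1)
df.loc[repeats, "col2"] = 0

print(df["col2"].tolist())  # [2, 0, 0, 1, 3, 0]
```

Because the boolean mask carries the same index as the frame, the index labels never have to be integers.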




















          np.flatnonzero



          Because I thought we needed more answers



          df.loc[df.index[np.flatnonzero(df.col2)[1:]], 'col2'] -= 1
          df

          col1 col2
          0 a 0
          1 b 1
          2 c 0
          3 d 0
          4 c 0
          5 d 0




          Same thing but a little more sneaky.



          df.col2.values[np.flatnonzero(df.col2.values)[1:]] -= 1
          df

          col1 col2
          0 a 0
          1 b 1
          2 c 0
          3 d 0
          4 c 0
          5 d 0
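The `-= 1` trick above appears to rely on col2 holding only 0 and 1: np.flatnonzero picks up every non-zero value, and subtracting 1 only turns a 1 into 0. A hedged variation that compares against 1 and assigns 0 explicitly could look like this (the names `ones` and `col` are mine):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})

# Positions of every 1 in col2; dropping the first position leaves the repeats.
ones = np.flatnonzero(df["col2"].to_numpy() == 1)
col = df.columns.get_loc("col2")   # integer position of the column
df.iloc[ones[1:], col] = 0         # positional write, independent of index labels

print(df["col2"].tolist())  # [0, 1, 0, 0, 0, 0]
```

Writing through `df.iloc` rather than through `.values` avoids depending on whether the underlying array is writable.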





            Case 1: df has only ones and zeros in col2 and integer indexes.



            >>> df
            col1 col2
            0 a 0
            1 b 1
            2 c 1
            3 d 0
            4 c 1
            5 d 0


            You can use:



            >>> df.loc[df['col2'].idxmax() + 1:, 'col2'] = 0
            >>> df
            col1 col2
            0 a 0
            1 b 1
            2 c 0
            3 d 0
            4 c 0
            5 d 0




            Case 2: df can have all kinds of values in col2 and has integer indexes.



            >>> df # demo dataframe
            col1 col2
            0 a 0
            1 b 1
            2 c 2
            3 d 2
            4 c 3
            5 d 3


            You can use:



            >>> df.loc[(df['col2'] == 1).idxmax() + 1:, 'col2'] = 0
            >>> df
            col1 col2
            0 a 0
            1 b 1
            2 c 0
            3 d 0
            4 c 0
            5 d 0




            Case 3: df can have all kinds of values in col2 and has an arbitrary index.



            >>> df
            col1 col2
            u a -1
            v b 1
            w c 2
            x d 2
            y c 3
            z d 3


            You can use:



            >>> df['col2'].iloc[(df['col2'].values == 1).argmax() + 1:] = 0
            >>> df
            col1 col2
            u a -1
            v b 1
            w c 0
            x d 0
            y c 0
            z d 0
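A small sketch of Case 3 written as a single `.iloc` assignment, with a guard for the situation where col2 contains no 1 at all (argmax() returns 0 on an all-False array, which would otherwise zero almost the whole column). The guard and the names `hits`, `pos` and `col` are my additions, not part of the answer.

```python
import pandas as pd

# Frame from Case 3: arbitrary index, arbitrary values.
df = pd.DataFrame(
    {"col1": list("abcdcd"), "col2": [-1, 1, 2, 2, 3, 3]},
    index=list("uvwxyz"),
)

hits = df["col2"].to_numpy() == 1
if hits.any():                       # argmax() returns 0 when nothing matches
    pos = hits.argmax()              # position of the first 1
    col = df.columns.get_loc("col2")
    df.iloc[pos + 1:, col] = 0       # one indexing step, no chained assignment

print(df["col2"].tolist())  # [-1, 1, 0, 0, 0, 0]
```

As in the cases above, everything below the first 1 is set to 0, not only the later 1s.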





              Using drop_duplicates with reindex



              df.col2=df.col2.drop_duplicates().reindex(df.index,fill_value=0)
              df
              Out[1078]:
              col1 col2
              0 a 0
              1 b 1
              2 c 0
              3 d 0
              4 c 0
              5 d 0
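As far as I can tell, the drop_duplicates one-liner assumes col2 holds only 0 and 1, because the later occurrences of any other value would be filled with 0 as well. A related sketch that touches only repeated 1s uses `Series.duplicated()` instead; this is my variation, not the answerer's code.

```python
import pandas as pd

df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})

# duplicated() marks every occurrence of a value after its first one,
# so combining it with eq(1) selects exactly the repeated 1s.
repeat_ones = df["col2"].eq(1) & df["col2"].duplicated()
df.loc[repeat_ones, "col2"] = 0

print(df["col2"].tolist())  # [0, 1, 0, 0, 0, 0]
```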





                You can use numpy for an efficient solution:



                a = df.col2.values
                b = np.zeros_like(a)
                b[a.argmax()] = 1
                df.assign(col2=b)




                  col1  col2
                0 a 0
                1 b 1
                2 c 0
                3 d 0
                4 c 0
                5 d 0
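Two things are worth keeping in mind with the snippet above: `a.argmax()` finds the first maximum, so it assumes 1 is the largest value in col2 and that at least one 1 exists, and `df.assign` returns a new frame rather than changing df. A minimal sketch under those observations, with an explicit comparison and a guard that are my additions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})

a = df["col2"].to_numpy()
b = np.zeros_like(a)             # all zeros, same shape and dtype as a
hits = a == 1
if hits.any():                   # skip when there is no 1, since argmax() would return 0
    b[hits.argmax()] = 1         # mark only the first 1
out = df.assign(col2=b)          # assign() returns a new frame; df itself is unchanged

print(out["col2"].tolist())  # [0, 1, 0, 0, 0, 0]
```

Like the original, this rebuilds the whole column, so any value other than the first 1 ends up as 0.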





                  I like this too:



                  data['col2'][np.where(data['col2'] == 1)[0][0]+1:] = 0





                  • Chained indexing is not recommended.
                    – jpp
                    Dec 6 at 16:41












                  • Thanks for the update..
                    – iamklaus
                    Dec 7 at 8:43


















                  Sooo many options, here's mine... almost the same as timgeb's answer (found independently), but still different ;)

                  Find the index of col2 that has the first occurrence of a 1, and change all row values after that index to 0:



                  df['col2'].iloc[df.col2.idxmax()+1:] = 0





                  • Be careful, this sets all values to 0 after the specified index, not just the ones equal to 1. Though that's the same with some other answers too.
                    – jpp
                    Dec 6 at 16:42












                  • Totally agree. Your solution is more general.
                    – Sander van den Oord
                    Dec 6 at 17:42
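To make the difference raised in the comments concrete, here is a rough sketch that zeroes only the 1s below the first one and leaves other values untouched. The sample values are made up for illustration, and the names `is_one`, `pos` and `after` are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical data with values other than 0 and 1 below the first 1.
df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 2, 0, 1, 3]})

is_one = df["col2"].to_numpy() == 1
if is_one.any():
    pos = is_one.argmax()                    # position of the first 1
    after = np.arange(len(df)) > pos         # rows strictly below it
    df.loc[after & is_one, "col2"] = 0       # only the later 1s are overwritten

print(df["col2"].tolist())  # [0, 1, 2, 0, 0, 3]
```

A plain slice assignment from `pos + 1` onward would instead give [0, 1, 0, 0, 0, 0] for this data.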


















                  id = list(df["col2"]).index(1)
                  df.iloc[id+1:]["col2"].replace(1,0,inplace=True)





                  • While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.
                    – Nic3500
                    Dec 6 at 16:00










                  • Chained indexing is not recommended.
                    – jpp
                    Dec 6 at 16:41
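Following up on the chained-indexing comment: calling `replace(..., inplace=True)` on `df.iloc[...]["col2"]` may act on a temporary copy, so the change is not guaranteed to reach df. A sketch of the same idea with the result written back in one `.iloc` step (the name `pos` replaces `id` to avoid shadowing the builtin; `col` is mine):

```python
import pandas as pd

df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})

pos = df["col2"].tolist().index(1)   # position of the first 1; raises ValueError if there is none
col = df.columns.get_loc("col2")
# Replace on the slice, then write the result back in a single .iloc step
# instead of calling replace(inplace=True) on a chained selection.
df.iloc[pos + 1:, col] = df.iloc[pos + 1:, col].replace(1, 0).to_numpy()

print(df["col2"].tolist())  # [0, 1, 0, 0, 0, 0]
```

Since `replace(1, 0)` only touches 1s, other values below the first 1 would be kept.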











                  Accepted answer (mask and idxmax): answered Dec 6 at 15:37 by jpp












                      np.flatnonzero answer: answered Dec 6 at 15:51 by piRSquared, edited Dec 6 at 15:56






















                          up vote
                          4
                          down vote













                          Case 1: df has only ones and zeros in col2 and integer indexes.



                          >>> df
                          col1 col2
                          0 a 0
                          1 b 1
                          2 c 1
                          3 d 0
                          4 c 1
                          5 d 0


                          You can use:



                          >>> df.loc[df['col2'].idxmax() + 1:, 'col2'] = 0
                          >>> df
                          col1 col2
                          0 a 0
                          1 b 1
                          2 c 0
                          3 d 0
                          4 c 0
                          5 d 0




                          Case2: df can have all kinds of values in col2 and has integer indexes.



                          >>> df # demo dataframe
                          col1 col2
                          0 a 0
                          1 b 1
                          2 c 2
                          3 d 2
                          4 c 3
                          5 d 3


                          You can use:



                          >>> df.loc[(df['col2'] == 1).idxmax() + 1:, 'col2'] = 0
                          >>> df
                          col1 col2
                          0 a 0
                          1 b 1
                          2 c 0
                          3 d 0
                          4 c 0
                          5 d 0




                          Case 3: df can have all kinds of values in col2 and has an arbitrary index.



                          >>> df
                          col1 col2
                          u a -1
                          v b 1
                          w c 2
                          x d 2
                          y c 3
                          z d 3


                          You can use:



                          >>> df['col2'].iloc[(df['col2'].values == 1).argmax() + 1:] = 0
                          >>> df
                          col1 col2
                          u a -1
                          v b 1
                          w c 0
                          x d 0
                          y c 0
                          z d 0





                          share|improve this answer



























                            up vote
                            4
                            down vote













                            Case 1: df has only ones and zeros in col2 and integer indexes.



                            >>> df
                            col1 col2
                            0 a 0
                            1 b 1
                            2 c 1
                            3 d 0
                            4 c 1
                            5 d 0


                            You can use:



                            >>> df.loc[df['col2'].idxmax() + 1:, 'col2'] = 0
                            >>> df
                            col1 col2
                            0 a 0
                            1 b 1
                            2 c 0
                            3 d 0
                            4 c 0
                            5 d 0




                            Case2: df can have all kinds of values in col2 and has integer indexes.



                            >>> df # demo dataframe
                            col1 col2
                            0 a 0
                            1 b 1
                            2 c 2
                            3 d 2
                            4 c 3
                            5 d 3


                            You can use:



                            >>> df.loc[(df['col2'] == 1).idxmax() + 1:, 'col2'] = 0
                            >>> df
                            col1 col2
                            0 a 0
                            1 b 1
                            2 c 0
                            3 d 0
                            4 c 0
                            5 d 0




                            Case 3: df can have all kinds of values in col2 and has an arbitrary index.



                            >>> df
                            col1 col2
                            u a -1
                            v b 1
                            w c 2
                            x d 2
                            y c 3
                            z d 3


                            You can use:



                            >>> df['col2'].iloc[(df['col2'].values == 1).argmax() + 1:] = 0
                            >>> df
                            col1 col2
                            u a -1
                            v b 1
                            w c 0
                            x d 0
                            y c 0
                            z d 0





                            share|improve this answer

























                              up vote
                              4
                              down vote










                              up vote
                              4
                              down vote









                              Case 1: df has only ones and zeros in col2 and integer indexes.



                              >>> df
                              col1 col2
                              0 a 0
                              1 b 1
                              2 c 1
                              3 d 0
                              4 c 1
                              5 d 0


                              You can use:



                              >>> df.loc[df['col2'].idxmax() + 1:, 'col2'] = 0
                              >>> df
                              col1 col2
                              0 a 0
                              1 b 1
                              2 c 0
                              3 d 0
                              4 c 0
                              5 d 0




                              Case2: df can have all kinds of values in col2 and has integer indexes.



                              >>> df # demo dataframe
                              col1 col2
                              0 a 0
                              1 b 1
                              2 c 2
                              3 d 2
                              4 c 3
                              5 d 3


                              You can use:



                              >>> df.loc[(df['col2'] == 1).idxmax() + 1:, 'col2'] = 0
                              >>> df
                              col1 col2
                              0 a 0
                              1 b 1
                              2 c 0
                              3 d 0
                              4 c 0
                              5 d 0




                              Case 3: df can have all kinds of values in col2 and has an arbitrary index.



                              >>> df
                              col1 col2
                              u a -1
                              v b 1
                              w c 2
                              x d 2
                              y c 3
                              z d 3


                              You can use:



                              >>> df['col2'].iloc[(df['col2'].values == 1).argmax() + 1:] = 0
                              >>> df
                              col1 col2
                              u a -1
                              v b 1
                              w c 0
                              x d 0
                              y c 0
                              z d 0





                              share|improve this answer














                              Case 1: df has only ones and zeros in col2 and integer indexes.



                              >>> df
                              col1 col2
                              0 a 0
                              1 b 1
                              2 c 1
                              3 d 0
                              4 c 1
                              5 d 0


                              You can use:



                              >>> df.loc[df['col2'].idxmax() + 1:, 'col2'] = 0
                              >>> df
                              col1 col2
                              0 a 0
                              1 b 1
                              2 c 0
                              3 d 0
                              4 c 0
                              5 d 0




                              Case2: df can have all kinds of values in col2 and has integer indexes.



                              >>> df # demo dataframe
                              col1 col2
                              0 a 0
                              1 b 1
                              2 c 2
                              3 d 2
                              4 c 3
                              5 d 3


                              You can use:



                              >>> df.loc[(df['col2'] == 1).idxmax() + 1:, 'col2'] = 0
                              >>> df
                              col1 col2
                              0 a 0
                              1 b 1
                              2 c 0
                              3 d 0
                              4 c 0
                              5 d 0




                              Case 3: df can have all kinds of values in col2 and has an arbitrary index.



                              >>> df
                              col1 col2
                              u a -1
                              v b 1
                              w c 2
                              x d 2
                              y c 3
                              z d 3


                              You can use:



                              >>> df.iloc[(df['col2'].values == 1).argmax() + 1:, df.columns.get_loc('col2')] = 0
                              >>> df
                              col1 col2
                              u a -1
                              v b 1
                              w c 0
                              x d 0
                              y c 0
                              z d 0
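The three cases can be folded into one position-based helper. The sketch below is illustrative and not part of the answer above: the function name zero_after_first and its parameters are made up for this example, and it assumes the column labels are unique. It works on a copy, assigns through a single iloc call, and leaves the data unchanged when the target value never occurs (argmax on an all-False mask returns 0, which would otherwise zero almost the whole column).

```python
import pandas as pd


def zero_after_first(df, column="col2", target=1, fill=0):
    """Return a copy of df with every row after the first `target` in `column` set to `fill`.

    Hypothetical helper for illustration; not a pandas API.
    """
    out = df.copy()
    hits = (out[column] == target).to_numpy()
    if not hits.any():
        # No target value at all: nothing to change.
        return out
    first = hits.argmax()  # position (not label) of the first match
    out.iloc[first + 1:, out.columns.get_loc(column)] = fill
    return out


df = pd.DataFrame(
    {"col1": list("abcdcd"), "col2": [-1, 1, 2, 2, 3, 3]},
    index=list("uvwxyz"),
)
result = zero_after_first(df)
print(result["col2"].tolist())  # [-1, 1, 0, 0, 0, 0]
```

Because the lookup is positional, the same call should behave identically for a default integer index and for an arbitrary one.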














                              edited Dec 6 at 16:29

























                              answered Dec 6 at 15:38









                              timgeb




































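                                  Using drop_duplicates with reindex. This keeps the first occurrence of each distinct value and fills every other row with 0, so it gives the desired output when col2 contains only 0 and 1:



                                  df['col2'] = df['col2'].drop_duplicates().reindex(df.index, fill_value=0)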
                                  df
                                  Out[1078]:
                                  col1 col2
                                  0 a 0
                                  1 b 1
                                  2 c 0
                                  3 d 0
                                  4 c 0
                                  5 d 0
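To see why this works, it can help to look at the intermediate result. The following sketch uses standalone Series with made-up variable names (binary, mixed, kept) rather than the dataframe from the question; it shows what drop_duplicates keeps and how the same one-liner behaves once the column holds values other than 0 and 1.

```python
import pandas as pd

# Column with only 0 and 1, as in the question.
binary = pd.Series([0, 1, 1, 0, 1, 0])
kept = binary.drop_duplicates()  # first 0 and first 1 only
binary_result = kept.reindex(binary.index, fill_value=0)

# Column with other values: the first occurrence of every value survives.
mixed = pd.Series([0, 1, 2, 2, 3, 3])
mixed_result = mixed.drop_duplicates().reindex(mixed.index, fill_value=0)

print(kept.index.tolist())     # [0, 1]
print(binary_result.tolist())  # [0, 1, 0, 0, 0, 0]
print(mixed_result.tolist())   # [0, 1, 2, 0, 3, 0]
```

In the mixed case the first 2 and the first 3 are preserved as well, which may or may not be what is wanted.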















                                      answered Dec 6 at 15:41









                                      W-B




































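                                          You can use numpy for an efficient solution (this assumes col2 holds only 0 and 1, so that argmax returns the position of the first 1):



                                          import numpy as np

                                          a = df.col2.values
                                          b = np.zeros_like(a)   # all zeros, same shape and dtype as col2
                                          b[a.argmax()] = 1      # a single 1 at the position of the first maximum
                                          df.assign(col2=b)      # returns a new dataframe; df itself is unchanged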




                                            col1  col2
                                          0 a 0
                                          1 b 1
                                          2 c 0
                                          3 d 0
                                          4 c 0
                                          5 d 0
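The argmax approach depends on 1 being the largest value in the column and on at least one 1 being present. A slightly more defensive variant is sketched below; the helper name first_one_only is an assumption made for this example, not something from the answer. It searches a boolean mask instead of the raw values and skips the assignment when no 1 exists.

```python
import numpy as np
import pandas as pd


def first_one_only(values):
    """Return an array of zeros with a single 1 at the position of the first 1 in `values`.

    Hypothetical helper for illustration.
    """
    values = np.asarray(values)
    out = np.zeros_like(values)
    is_one = values == 1
    if is_one.any():
        # argmax on a boolean array gives the position of the first True.
        out[is_one.argmax()] = 1
    return out


df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})
result = df.assign(col2=first_one_only(df["col2"].to_numpy()))
print(result["col2"].tolist())            # [0, 1, 0, 0, 0, 0]
print(first_one_only([0, 0, 0]).tolist())  # [0, 0, 0]
print(first_one_only([2, 0, 1, 1]).tolist())  # [0, 0, 1, 0]
```

With plain a.argmax(), the all-zero input would receive a spurious 1 in the first row, and the [2, 0, 1, 1] input would be marked at the position of the 2.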













                                              edited Dec 6 at 15:53

























                                              answered Dec 6 at 15:39









                                              user3483203




































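                                                  I like this one too. np.where gives the positions of all 1s; everything after the first of them is set to 0 through a single iloc call:



                                                  import numpy as np

                                                  data.iloc[np.where(data['col2'] == 1)[0][0] + 1:, data.columns.get_loc('col2')] = 0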















                                                  answered Dec 6 at 15:42









                                                  iamklaus













                                                  • Chained indexing is not recommended.
                                                    – jpp
                                                    Dec 6 at 16:41












                                                  • Thanks for the update..
                                                    – iamklaus
                                                    Dec 7 at 8:43









































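                                                  So many options; here's mine. It is almost the same as timgeb's answer (found independently), but still slightly different ;)



                                                  Find the index of the first occurrence of a 1 in col2, and set every value after that row to 0. This assumes col2 holds only 0 and 1 and that df has the default integer index, so the label returned by idxmax is also a position:



                                                  df.iloc[df['col2'].idxmax() + 1:, df.columns.get_loc('col2')] = 0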















                                                  answered Dec 6 at 15:55









                                                  Sander van den Oord













                                                  • Be careful, this sets all values to 0 after the specified index, not just the ones equal to 1. Though that's the same with some other answers too.
                                                    – jpp
                                                    Dec 6 at 16:42












                                                  • Totally agree. Your solution is more general.
                                                    – Sander van den Oord
                                                    Dec 6 at 17:42
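The caveat raised in the comments can be made concrete. The sketch below is one possible way to zero only the later 1s while leaving every other value alone; it is not necessarily the solution the comments refer to, and the sample data and the name later_ones are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 2, 1, 0]})

col = df["col2"]
# True only for rows that hold a 1 and are not the first row holding that value.
later_ones = (col == 1) & col.duplicated()
df["col2"] = col.mask(later_ones, 0)

print(df["col2"].tolist())  # [0, 1, 0, 2, 0, 0]
```

The 2 after the first 1 survives here, whereas the slice-based answers would overwrite it with 0.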









































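                                                  first = list(df["col2"]).index(1)   # position of the first 1 (ValueError if there is none)
                                                  col = df.columns.get_loc("col2")    # integer position of col2
                                                  df.iloc[first + 1:, col] = df.iloc[first + 1:, col].replace(1, 0)   # zero only the later 1s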















                                                  answered Dec 6 at 15:43









                                                  shyamrag cp









                                                  • While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.
                                                    – Nic3500
                                                    Dec 6 at 16:00










                                                  • Chained indexing is not recommended.
                                                    – jpp
                                                    Dec 6 at 16:41































