Delete multiple columns using awk or sed












I have a database with 6037 space-separated columns and 450 rows like the one below:



1807 1452 1598 1 6.655713  A B A B ... 0 
1808 1452 1763 1 9.362033 0 0 A B ... A
1809 1452 1527 2 6.728534 A B A A ... B
1810 1452 1367 2 9.4055 A B A A B ... A
... ... ... ... ... ... ... ... ... ...
1812 1452 1258 1 6.363032 0 0 A B ... B


I want to get a new database with only the first 676 columns.



Preferably with a command that uses awk or sed.

      text-processing sed awk






      asked 2 days ago by andrec; edited yesterday by dessert


          3 Answers

                  If the column delimiter in your file is a single character, e.g. a space, cut can do that easily:



                  cut -d' ' -f-676 <in >out


                  This prints only the space-separated columns from the first to the 676th.
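One caveat is easy to check on a small sample: cut treats every single space as a delimiter, so a run of two spaces produces an empty column. The sketch below uses a made-up four-value line (the variable names are only for illustration) and shows one possible workaround, squeezing repeated spaces with tr -s before cutting.

```shell
# Hypothetical sample line: note the two spaces after the first value.
sample='1807  1452 1598 1'

# cut counts the empty string between the two spaces as column 2,
# so asking for three columns returns only two real values.
plain=$(printf '%s\n' "$sample" | cut -d' ' -f-3)

# Squeezing runs of spaces with tr -s first gives the expected columns.
squeezed=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f-3)

printf 'plain:    [%s]\n' "$plain"
printf 'squeezed: [%s]\n' "$squeezed"
```

If the real file never contains repeated spaces, the plain cut command is enough.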



                  If you need e.g. every whitespace character to count as a delimiter, a sed solution is:



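                  sed -r 's/\s+\S+//676g' <in >out


                  This deletes every match of "at least one whitespace character followed by at least one non-whitespace character" from the 676th match onward. Assuming the lines do not start with whitespace, the first column is never matched, so the 676th match is the 677th column and only the first 676 columns remain. This needs GNU sed for -r, \s and \S. With bracket expressions you can specify any set of delimiters you need, e.g. for “4”, “#” and “K”:



                  sed -r 's/[4#K]+[^4#K]+//676g' <in >out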
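Which number to put in front of the g flag is easy to get wrong by one, so it may help to try the command on a short line first. The sketch below assumes GNU sed and a made-up five-column line without leading whitespace; it suggests that a flag of Ng leaves N columns in place.

```shell
# Hypothetical five-column line without leading whitespace.
line='c1 c2 c3 c4 c5'

# Each match of \s+\S+ is "separator plus column", so match 1 is column 2.
# With GNU sed, the Ng flag skips matches 1..N-1 and deletes the Nth and
# every later match, which leaves N columns in total.
kept3=$(printf '%s\n' "$line" | sed -r 's/\s+\S+//3g')
kept4=$(printf '%s\n' "$line" | sed -r 's/\s+\S+//4g')

printf '3g keeps: %s\n' "$kept3"
printf '4g keeps: %s\n' "$kept4"
```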


                  For a reasonable awk approach kindly refer to steeldriver’s answer, but here is another one that loops over the columns and prints only those numbered 1 to 676, separated by FS. Passing the data through a "%s" format keeps any % characters in the columns from being interpreted by printf:



                  awk '{for (i=1;i<=676;i++) {printf "%s%s", (i==1?"":FS), $i}; print ""}' <in >out


                  With a bracket expression as the field separator, FS is a regular expression rather than a literal string, so you have to choose the output separator yourself, e.g. for [4#K] and "sep":




                  awk -F'[4#K]' '{for (i=1;i<=676;i++) {printf "%s%s", (i==1?"":"sep"), $i}; print ""}' <in >out
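A loop like this can also be written so that it copes with lines that have fewer columns than requested. The following is only a sketch of that variant: the variable n, the helper name first_cols and the sample lines are assumptions made for the illustration, not part of the question.

```shell
# Print at most the first n columns of each input line.
# first_cols is a hypothetical helper name used only in this sketch.
first_cols() {
    awk -v n="$1" '{
        m = (NF < n) ? NF : n            # never loop past the last real column
        for (i = 1; i <= m; i++)
            printf "%s%s", (i == 1 ? "" : OFS), $i
        print ""                         # finish the line
    }'
}

# Columns containing % are printed literally because they go through "%s".
percent=$(printf '%s\n' '10% 20% 30% 40% 50%' | first_cols 3)

# A line shorter than n is printed without trailing separators.
short=$(printf '%s\n' 'x y' | first_cols 3)

printf '[%s]\n[%s]\n' "$percent" "$short"
```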






                  answered yesterday by dessert; edited yesterday



                      For a single-character delimiter (such as space or comma) I would recommend using the cut command over either awk or sed.



                      However, since you asked about awk specifically, I think a reasonable way to do it would be to decrement the field count:



                      awk -v last=676 '{while(NF>last) NF--} 1' datafile


                      Tested in GNU Awk (gawk) and mawk.
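To see the effect on something smaller than 6037 columns, the same command can be run with a lower value of last. The two-line sample below is made up for the illustration, and the behaviour shown is what gawk and mawk are expected to do.

```shell
# Hypothetical two-row sample with five columns each.
data='1 2 3 4 5
6 7 8 9 10'

# Decrement NF until only the first three columns are left; the trailing 1
# is an always-true pattern that prints the rebuilt record.
trimmed=$(printf '%s\n' "$data" | awk -v last=3 '{while (NF > last) NF--} 1')

printf '%s\n' "$trimmed"
```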







                      answered yesterday by steeldriver; edited yesterday


                      • Why not just { NF = last; print } instead of the loop? – wchargin, yesterday

                      • @wchargin Doh! Yes, that's much better. Wanna post it as an answer? – steeldriver, yesterday
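The two forms discussed in these comments should give the same result whenever every line has at least last columns, as in the question. They may differ on shorter lines, though: assigning a larger value to NF creates empty columns, while the loop leaves the line alone. A small sketch with a made-up two-column line, assuming gawk or mawk:

```shell
# Hypothetical line with fewer columns than requested.
short='a b'

# Assigning NF pads the record with empty fields joined by OFS.
assigned=$(printf '%s\n' "$short" | awk -v last=4 '{NF = last; print}')

# The loop does nothing when NF is already small enough.
looped=$(printf '%s\n' "$short" | awk -v last=4 '{while (NF > last) NF--} 1')

# Brackets make the trailing separators visible.
printf 'assigned: [%s]\n' "$assigned"
printf 'looped:   [%s]\n' "$looped"
```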















                          You could use Miller (mlr) to normalize the separators and cut to select the columns:



                          mlr --nidx --fs ' ' --repifs cat inputFile.csv | cut -d ' ' -f-2


                          Here mlr (https://github.com/johnkerl/miller/releases/tag/5.4.0) takes care of the field separators (a run of several spaces becomes a single space between fields), and cut then extracts the first two fields in this example; use -f-676 for the first 676.



                          From



                          1807   1452 1598  1 6.655713  A B A B
                          1808 1452 1763 1 9.362033 0 0 A B
                          1809 1452 1527 2 6.728534 A B A A
                          1810 1452 1367 2 9.4055 A B A A B


                          to



                          1807 1452
                          1808 1452
                          1809 1452
                          1810 1452


                          Some notes about Miller options:





                          • --nidx sets the format: a generic table whose fields are numbered by position (the first field is 1, the second is 2, etc.);


                          • --fs sets the field separator (here a space);


                          • --repifs makes multiple successive occurrences of the field separator count as one;


                          • cat passes input records directly to output.







                          answered yesterday by aborruso; edited yesterday

