Comparing multiple data files




















I have several files containing text data. I need each file to contain only lines that appear in none of the other files. For example, textfile1 contains the line "foobar", but so does textfile15. What is the best way to check each line for uniqueness across multiple files?










  • And what do you want to be done with lines that are found in several files? Only display them? Remove them from the files?
    – Byte Commander, Sep 30 '15 at 15:39

  • If both file1 and file15 have foobar, which one should be kept? Is keeping the order of the lines important? Please edit your question, show us example input and your desired output.
    – terdon, Sep 30 '15 at 16:49


















scripts






edited Mar 24 at 8:50 by Sergiy Kolodyazhnyy
asked Sep 30 '15 at 15:25 by mrnr1








1 Answer

To check whether every line is unique across all the text files, compare the total line count with the count of distinct lines:



cat *.txt | wc -l ; cat *.txt | sort -u | wc -l


If the two counts match, every line is unique.
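
The same comparison can be wrapped in a small function so that a script can act on the result. This is only a sketch: the name all_lines_unique is made up for illustration, and it assumes every file ends with a newline (otherwise cat joins the last line of one file to the first line of the next).

```shell
# all_lines_unique FILE...: succeed (exit status 0) only if no line is
# repeated anywhere in the given files. The function name is hypothetical.
all_lines_unique() {
    total=$(cat -- "$@" | wc -l)               # every line, duplicates included
    distinct=$(cat -- "$@" | sort -u | wc -l)  # each distinct line counted once
    [ "$total" -eq "$distinct" ]
}

# Example use:
#   if all_lines_unique *.txt; then echo "no duplicates"; else echo "duplicates found"; fi
```

Note that this also reports a failure when a line is repeated inside a single file, just like the one-liner it is based on.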



To list the lines that occur more than once:



cat *.txt | sort | uniq -d 
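
The pipeline above shows which lines repeat, but not where, and it also flags a line repeated inside a single file. If you need to know which files share a line, an awk pass along the following lines may help. It is a sketch rather than a polished tool; the function name dup_report and the output layout are made up for illustration.

```shell
# dup_report FILE...: for every line that occurs in more than one of the
# given files, print the line followed by the files that contain it.
# The function name and the "line: file file" layout are hypothetical.
dup_report() {
    awk '
        # Handle each (line, file) pair only once, so repeats inside a
        # single file do not count as cross-file duplicates.
        !seen[$0, FILENAME]++ {
            count[$0]++
            files[$0] = files[$0] " " FILENAME
        }
        END {
            for (line in count)
                if (count[line] > 1)
                    print line ":" files[line]
        }
    ' "$@" | sort
}

# Example use:
#   dup_report textfile*
```

The whole input is held in memory as awk arrays, so this is best suited to files of moderate size.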


Here is a more complete scenario.
I have a collection of downloaded songs, and when I download new ones I want to make sure I am not repeating any.
First I build a catalog of what I already have:



find . -name '*.txt' | sort -u > catalog.music


Now suppose I find a playlist that I want to download later, and its entries are listed in downloadNew.txt.
I would run:



grep -F -f downloadNew.txt catalog.music 


If grep prints nothing, none of the entries in downloadNew.txt is already in the catalog. Any line it does print is a catalog entry that matches, so duplicates are present.
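
If what you actually want is the list of entries that are not yet in the catalog, grep can invert the match. The sketch below assumes that both files hold one entry per line in exactly the same form, since -x only accepts whole-line matches; new_entries is a hypothetical name.

```shell
# new_entries CATALOG CANDIDATES: print the lines of CANDIDATES that do
# not appear as whole lines in CATALOG. The function name is hypothetical.
new_entries() {
    # -F  treat patterns as fixed strings, not regular expressions
    # -x  a pattern must match the whole line
    # -v  print the lines that do NOT match
    # -f  read the patterns from the catalog file
    grep -Fxv -f "$1" -- "$2"
}

# Example use:
#   new_entries catalog.music downloadNew.txt > reallyNew.txt
```

Without -x, a short entry would also be treated as present whenever it is a substring of a longer catalog line.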






answered Oct 2 '15 at 10:04 by Amit





























