Comparing multiple data files
I have several files containing text data. I need each individual file to have only unique lines against all the other text files. For example, textfile1 has a line entry called "foobar", but so does textfile15. What is the best way to perform a comparison for uniqueness of the individual rows against multiple files?
scripts
1
And what do you want to be done with lines that are found in several files? Only display them? Remove them from the files?
– Byte Commander
Sep 30 '15 at 15:39
1
If both file1 and file15 have foobar, which one should be kept? Is keeping the order of the lines important? Please edit your question, show us example input and your desired output.
– terdon♦
Sep 30 '15 at 16:49
edited Mar 24 at 8:50
Sergiy Kolodyazhnyy
asked Sep 30 '15 at 15:25
mrnr1
1 Answer
To check whether every line across all the text files is unique:
cat *.txt | wc -l ; cat *.txt | sort -u | wc -l
If the two counts match, no line is repeated anywhere in the files.
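The comparison of the two counts can be scripted so that the result is an exit status instead of two numbers to compare by eye. This is only a minimal sketch for a POSIX shell; the function name all_lines_unique is made up for illustration, and the files are assumed to end with a newline.

```shell
# Hypothetical helper: succeeds (exit status 0) when no line is repeated
# anywhere in the files given as arguments.
all_lines_unique() {
    # $(( ... )) normalises the output of wc, which some systems pad with spaces.
    total=$(( $(cat -- "$@" | wc -l) ))
    unique=$(( $(cat -- "$@" | sort -u | wc -l) ))
    [ "$total" -eq "$unique" ]
}
```

It could then be called as: all_lines_unique *.txt && echo "no repeated lines". Like the one-liner, it also counts a line repeated inside a single file as a duplicate.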
To find what duplicates are present:
cat *.txt | sort | uniq -d
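The uniq -d pipeline shows which lines are duplicated but not where they live, and it also reports a line repeated within one file. If the goal is to see which files share a line, something along these lines might help. It is a sketch, not a tested tool: the function name dups_with_files is invented here, and file names are assumed to contain no spaces or "=" characters.

```shell
# Hypothetical helper: for every line that occurs in more than one of the
# given files, print the line followed by the names of those files.
dups_with_files() {
    awk '
        # Count each (line, file) pair only once, so a line repeated inside
        # a single file is not reported as a cross-file duplicate.
        !seen[$0, FILENAME]++ {
            count[$0]++
            files[$0] = files[$0] " " FILENAME
        }
        END {
            for (line in count)
                if (count[line] > 1)
                    print line ":" files[line]
        }
    ' "$@" | sort
}
```

For the example in the question, dups_with_files textfile1 textfile15 would list foobar together with both file names.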
Here is a more complete scenario:
I have a collection of downloaded songs, and when I download new ones I want to make sure I am not repeating any.
First I build a catalog of what I already have:
find . -name '*.txt' | sort -u > catalog.music
Now suppose I find a playlist that I want to download later, and its entries are listed in downloadNew.txt.
I would run
grep -F -f downloadNew.txt catalog.music
If grep prints nothing, every entry in downloadNew.txt is new; otherwise the lines it prints are catalog entries that match something in the playlist.
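Because grep reports success or failure through its exit status, the check can be wrapped in a small function for use in scripts. The following is a sketch under that assumption; check_playlist is a made-up name, and the two arguments stand for the playlist file and the catalog file.

```shell
# Hypothetical helper: report whether any entry of the new playlist
# already appears in the catalog.
#   $1 = file with the new entries (used as fixed-string patterns)
#   $2 = catalog file
check_playlist() {
    grep -q -F -f "$1" "$2"
    case $? in
        0) echo "duplicates present" ;;
        1) echo "unique" ;;
        *) echo "grep failed" >&2; return 2 ;;
    esac
}
```

Example: check_playlist downloadNew.txt catalog.music. Note that an empty line in the playlist file matches every catalog line, so blank lines should be removed first. Adding -x would restrict matches to whole lines, which may or may not fit here, since a catalog built with find holds paths rather than bare names.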
answered Oct 2 '15 at 10:04
Amit