How to count in linear time worst-case?
This question and this question got me thinking a little bit. To sort an array of length $n$ with $k$ unique elements in $O(n + k \log k)$ time, we need to be able to store counts of the values in the array. There are some suggestions, but I'm looking for a way to do this in worst-case linear time. More specifically:

Given a list $A$ of $n$ elements, $k$ of them distinct, determine a list of tuples $U = \{(x_i, c_i)\}_{i=1}^{k}$ of all unique elements $x_i \in A$ such that $c_i$ is the count of element $x_i$ in $A$.

Here are some (failed) ideas I've had, and some that have been suggested:

Balanced binary search tree - It takes $O(\log k)$ to insert into the tree and increment a count. After the inserts we can do a tree traversal in $O(k)$. Thus the total time comes out to $O(n \log k)$, which is too slow.

Hash map - We get $O(1)$ expected inserts and thus $O(n)$ expected time (a minimal sketch appears after this list). However, this is still not $O(n)$ worst case.

Empty space mapping - Find the minimum and maximum elements of $A$. Allocate (but do not initialize) enough memory to cover that range, and use it essentially as a hash map, storing a random tag in each slot so that we never mistake uninitialized memory for valid entries. This strategy presents issues: (1) it's probabilistic, with a very, very low probability of failing, but still not guaranteed; (2) using memory like this restricts us to integer or floating-point keys.

Associative arrays - There are many other associative arrays, similar to hash maps and BSTs, but I have not found any that match these constraints.

Maybe there is some obvious method I am missing, but I also think it may simply be impossible. What are your thoughts?
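For reference, a minimal sketch of the hash-map approach (Python's `collections.Counter` is a hash map underneath); each insert is $O(1)$ expected, so the whole pass is $O(n)$ expected but not worst case:

```python
from collections import Counter

def count_elements(A):
    """Return the k tuples (x_i, c_i) for the distinct values of A."""
    counts = Counter()               # hash map: value -> count
    for x in A:                      # n inserts, O(1) expected each
        counts[x] += 1
    return list(counts.items())

print(count_elements([3, 1, 3, 2, 1, 3]))   # [(3, 3), (1, 2), (2, 1)]
```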
algorithms search-trees hash-tables

asked 8 hours ago by ryan
It cannot be done in the comparison model, since the problem of element distinctness has a decision-tree complexity lower bound of $\Omega(n \log n)$. – Apass.Jack, 7 hours ago

@Apass.Jack, oh right, that's correct. A trivial reduction I did not consider. If you write it up as a quick blurb answer, I'll accept. – ryan, 7 hours ago

Why is the HashMap not assured amortized $O(n)$? – javadba, 2 hours ago

@javadba For example, suppose all elements are hashed to the same value. – Apass.Jack, 2 hours ago

Ah, OK, so it's the imperfect-hashing case. – javadba, 1 hour ago
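To make the degenerate case from the comments concrete: with a hypothetical key class `BadHash` whose instances all share one hash value, every insert scans the colliding entries already in the table, so $n$ inserts cost $\Theta(n^2)$ in total:

```python
from collections import Counter
import time

class BadHash:
    """Key whose instances all collide in a hash table."""
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return 42                    # every key lands in the same bucket
    def __eq__(self, other):
        return self.x == other.x

for n in (1000, 2000, 4000):
    items = [BadHash(i) for i in range(n)]
    start = time.perf_counter()
    Counter(items)                   # each insert compares against all prior keys
    print(n, time.perf_counter() - start)  # time roughly quadruples as n doubles
```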
2 Answers
This is a nice question.

In the comparison model or, more generally, the algebraic decision-tree model, the problem of element distinctness has a worst-case time-complexity lower bound of $\Theta(n \log n)$, as stated in this Wikipedia article. So there is no algorithm that counts the distinct elements in linear time in the worst case, even without reporting the multiplicities.

However, it is not clear whether it can be done in another computational model. It seems unlikely in any reasonable deterministic computational model.
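To spell out the reduction: any routine that produces the list $U$ also decides element distinctness, because the $n$ elements of $A$ are pairwise distinct exactly when $U$ has $n$ entries. A minimal sketch, where `count_elements` stands for any counting routine:

```python
from collections import Counter

def all_distinct(A, count_elements):
    """The n elements of A are pairwise distinct iff the list of
    (value, count) tuples returned by count_elements has n entries."""
    return len(count_elements(A)) == len(A)

hash_map_counter = lambda A: list(Counter(A).items())
print(all_distinct([1, 2, 3], hash_map_counter))  # True
print(all_distinct([1, 2, 2], hash_map_counter))  # False
```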
answered 6 hours ago by Apass.Jack

Is this really an instance of the element distinctness problem? Just generating the tuples doesn't require a check for distinctness. Not disagreeing, just curious. – mascoj, 2 hours ago

What I am saying is: if you can produce that tuple of distinct elements, then you can also solve the problem of element distinctness by checking whether the size of the tuple is $n$. – Apass.Jack, 2 hours ago

Good call. Thanks. – mascoj, 2 hours ago
There exist randomized algorithms whose expected running time is $O(n)$, and for which the probability that the running time exceeds $cn$ is exponentially small in $c$.

In particular, randomly choose a 2-universal hash function, then use it to hash all of the elements of the array. This achieves the stated running times if you choose the output length of the 2-universal hash appropriately.

As another example, you can build a randomized algorithm whose worst-case running time is $O(n)$ (it always runs in linear time, no matter what) and whose probability of error is at most $1/2^{100}$. (How? Run the above algorithm, and terminate it if it runs for more than $cn$ steps for some appropriately chosen $c$.) In practice that's good enough, as the probability that your computer outputs the wrong answer due to a cosmic ray is already much higher than $1/2^{100}$.
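A minimal sketch of the first approach for nonnegative integer keys, using the Carter–Wegman family $h(x) = ((ax + b) \bmod p) \bmod m$; the prime and table size below are illustrative choices:

```python
import random

def count_via_universal_hash(A):
    """Count distinct values using a randomly drawn 2-universal hash.
    Expected O(n) time; only the expectation is guaranteed."""
    p = 2**61 - 1                    # a Mersenne prime, assumed > max(A)
    m = max(1, len(A))               # table size ~ n keeps expected chains O(1)
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    table = [[] for _ in range(m)]   # chained buckets of [value, count] pairs
    for x in A:
        bucket = table[((a * x + b) % p) % m]
        for entry in bucket:
            if entry[0] == x:
                entry[1] += 1
                break
        else:
            bucket.append([x, 1])
    return [(x, c) for bucket in table for x, c in bucket]

print(count_via_universal_hash([3, 1, 3, 2, 1, 3]))
```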
answered 6 hours ago by D.W.♦