In Bayesian inference, why are some terms dropped from the posterior predictive?
In Kevin Murphy's Conjugate Bayesian analysis of the Gaussian distribution, he writes that the posterior predictive distribution is
$$
p(x \mid D) = \int p(x \mid \theta) \, p(\theta \mid D) \, d\theta
$$
where $D$ is the data on which the model is fit and $x$ is unseen data. What I don't understand is why the dependence on $D$ disappears in the first term in the integral. Using basic rules of probability, I would have expected:
$$
\begin{align}
p(a) &= \int p(a \mid c) \, p(c) \, dc
\\
p(a \mid b) &= \int p(a \mid c, b) \, p(c \mid b) \, dc
\\
&\downarrow
\\
p(x \mid D) &= \int \overbrace{p(x \mid \theta, D)}^{\star} \, p(\theta \mid D) \, d\theta
\end{align}
$$
Question: Why does the dependence on $D$ in term $\star$ disappear?
For what it's worth, I've seen this kind of formulation (dropping variables in conditionals) in other places. For example, in Ryan Adams's Bayesian Online Changepoint Detection, he writes the posterior predictive as
$$
p(x_{t+1} \mid r_t) = \int p(x_{t+1} \mid \theta) \, p(\theta \mid r_t, x_t) \, d\theta
$$
where again, since $D = \{x_t, r_t\}$, I would have expected
$$
p(x_{t+1} \mid x_t, r_t) = \int p(x_{t+1} \mid \theta, x_t, r_t) \, p(\theta \mid r_t, x_t) \, d\theta
$$
bayesian predictive-models inference posterior
asked Apr 2 at 16:04 by gwg
2 Answers
This is based on the assumption that $x$ is conditionally independent of $D$ given $\theta$. This is a reasonable assumption in many cases, because all it says is that the training and testing data ($D$ and $x$, respectively) are independently generated from the same set of unknown parameters $\theta$. Given this independence assumption, $p(x \mid \theta, D) = p(x \mid \theta)$, and so $D$ drops out of the more general form that you expected.
In your second example, a similar independence assumption is being applied, but now explicitly across time. These assumptions may be stated elsewhere in the text, or they may be implicitly clear to anyone sufficiently familiar with the context of the problem (although that doesn't necessarily mean that in your particular examples, which I'm not familiar with, the authors were right to assume this familiarity).
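For concreteness, here is a minimal numerical sketch (my own illustration, not from either paper) of what the conditional-independence assumption buys you. Assuming a Normal likelihood with known variance and a conjugate Normal prior on the mean $\theta$, sampling $\theta \sim p(\theta \mid D)$ and then $x \sim p(x \mid \theta)$, so that $D$ enters only through $\theta$, recovers the exact posterior predictive moments:
```python
# Minimal sketch. Assumed model (for illustration only): x_i ~ N(theta, sigma^2)
# with sigma known, and conjugate prior theta ~ N(mu0, tau0^2).
import numpy as np

rng = np.random.default_rng(0)
sigma, mu0, tau0 = 1.0, 0.0, 2.0          # known noise sd; prior mean and sd
D = rng.normal(1.5, sigma, size=20)       # observed data

# Conjugate posterior p(theta | D) = N(mu_n, tau_n^2).
n, xbar = len(D), D.mean()
tau_n2 = 1.0 / (1.0 / tau0**2 + n / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + n * xbar / sigma**2)

# Posterior predictive by ancestral sampling: theta ~ p(theta | D), then
# x ~ p(x | theta). D influences x only through theta.
theta = rng.normal(mu_n, np.sqrt(tau_n2), size=200_000)
x_new = rng.normal(theta, sigma)

print(x_new.mean(), x_new.var())          # approx. mu_n and sigma^2 + tau_n2
print(mu_n, sigma**2 + tau_n2)            # exact predictive moments
```
Without the assumption, the inner draw would have to come from $p(x \mid \theta, D)$, which in general requires keeping $D$ around even after the posterior over $\theta$ has been computed.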
answered Apr 2 at 16:26, edited Apr 2 at 17:27 by Ruben van Bergen
It's because $x$ is assumed to be independent of $D$ given $\theta$. In other words, all of the data are assumed to be i.i.d. draws from a normal distribution with parameters $\theta$. Once $\theta$ has been inferred from $D$, the data carry no additional information about a new data point $x$. Therefore $p(x \mid \theta, D) = p(x \mid \theta)$.
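As a concrete check, the standard conjugate result makes this explicit. Assuming known variance $\sigma^2$, a prior $\theta \sim \mathcal{N}(\mu_0, \tau_0^2)$, and $n$ observations with mean $\bar{x}$ (this worked case is my addition, not part of the original answer):
$$
\begin{align}
p(\theta \mid D) &= \mathcal{N}(\theta; \mu_n, \tau_n^2),
\qquad
\tau_n^2 = \left( \frac{1}{\tau_0^2} + \frac{n}{\sigma^2} \right)^{-1},
\qquad
\mu_n = \tau_n^2 \left( \frac{\mu_0}{\tau_0^2} + \frac{n \bar{x}}{\sigma^2} \right)
\\
p(x \mid D) &= \int \mathcal{N}(x; \theta, \sigma^2) \, \mathcal{N}(\theta; \mu_n, \tau_n^2) \, d\theta
= \mathcal{N}(x; \mu_n, \sigma^2 + \tau_n^2)
\end{align}
$$
Note that $D$ enters the predictive only through $\mu_n$ and $\tau_n^2$, that is, only through the posterior over $\theta$, which is exactly what $p(x \mid \theta, D) = p(x \mid \theta)$ delivers.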
answered Apr 2 at 16:26, edited Apr 2 at 16:55 by JP Trawinski