For a persistent perceptual experience, why is video able to have a lower frame rate than audio?














In film, images are typically shown at around 24 frames per second, but modern audio files often have 44,100 or 48,000 samples per second.
Above a threshold of roughly 12 fps, we perceive successive frames as unified motion instead of individual pictures (cf. the phi phenomenon, persistence of vision, and beta movement). But to get this unified experience in the auditory domain, we need a much higher "frame rate". Why is this?

















      perception psychophysics sensory-perception
















asked 6 hours ago by RECURSIVE FARTS






















2 Answers



















Sound consists of pressure waves; young humans can hear (that is, detect pressure waves) up to about 20 kHz. To reproduce these high-frequency waves through a speaker from a time-domain signal, the sampling rate must be more than twice the highest frequency to be represented. A 44.1 kHz rate can therefore represent frequencies up to 22.05 kHz, which covers the audible range; in practice, music contains little energy near that upper limit and speech contains even less, so ~44 kHz is sufficient. Inside the cochlea, the basilar membrane is structured so that different positions along its length vibrate at different frequencies. At higher frequencies, neurons do not fire on every cycle of the wave; they respond to its envelope, so we can respond to frequencies much higher than the rate at which neurons can fire.
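To make the 2x rule concrete, here is a minimal Python sketch (the function names are hypothetical, for illustration only). It computes the highest frequency each common sampling rate can represent, and shows aliasing: a tone above that limit produces exactly the same samples as a lower tone, so it cannot be reproduced faithfully. The 20 kHz hearing limit is the approximate figure from the paragraph above.

```python
import math

HEARING_LIMIT_HZ = 20_000  # approximate upper limit of hearing in young listeners

def nyquist_limit(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent: half the sampling rate."""
    return sample_rate_hz / 2

def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency a pure tone appears to have after sampling (folded into 0..fs/2)."""
    f = tone_hz % sample_rate_hz          # f and f + k*fs give identical samples
    return min(f, sample_rate_hz - f)     # f and fs - f also give identical samples

def sample_cosine(freq_hz: float, sample_rate_hz: float, n: int) -> list[float]:
    """First n samples of cos(2*pi*f*t) taken at t = k / fs."""
    return [math.cos(2 * math.pi * freq_hz * k / sample_rate_hz) for k in range(n)]

for fs in (44_100, 48_000):
    limit = nyquist_limit(fs)
    print(f"{fs} Hz sampling -> up to {limit:.0f} Hz; covers hearing: {limit > HEARING_LIMIT_HZ}")

fs = 44_100
alias = alias_frequency(30_000, fs)       # a 30 kHz tone is above the 22.05 kHz limit
same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(sample_cosine(30_000, fs, 1000), sample_cosine(alias, fs, 1000)))
print(f"30 kHz sampled at {fs} Hz looks like {alias:.0f} Hz: {same}")
```

The point for the question is that each audio sample is not a "frame" of anything; the sample rate is set by the highest frequency the ear can detect, not by how fast percepts fuse.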



Vision depends on the detection of photons. A photon hits a photosensitive molecule in a photoreceptor in the retina, causing a chemical change. The activated molecule then triggers a cascade of protein interactions that ultimately changes the photoreceptor's release of neurotransmitter. Vision is slow: the cascade triggered by a single photon lasts on the order of hundreds of milliseconds. We can detect somewhat faster events because the visual system responds to change, so the slope of the response is itself informative, but overall this slow process low-pass filters light information. As long as the frame rate is sufficiently high relative to this filter, the difference between a frame-by-frame signal and a smooth one mostly goes unnoticed. However, 24 frames per second is not a hard limit. Modern monitors often run much faster, at 60 to 144 Hz, because higher frame rates matter for perceiving fast motion as smooth. Lower frame rates are sufficient when the change between frames is small.
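The low-pass argument can be illustrated with a toy model. In this minimal Python sketch, a first-order low-pass filter with a 50 ms time constant stands in for the slow photoreceptor response; both the filter type and the 50 ms value are assumptions chosen for illustration, not measured properties of the retina. It compares a smoothly moving quantity with a frame-by-frame (sample-and-hold) version of it and reports how much of the frame-to-frame stepping survives the filter. The helper names are hypothetical.

```python
import math

def low_pass(signal: list[float], dt: float, tau: float) -> list[float]:
    """First-order low-pass filter (dy/dt = (x - y) / tau), integrated with Euler steps."""
    y = signal[0]
    out = []
    for x in signal:
        y += (dt / tau) * (x - y)
        out.append(y)
    return out

def ripple_after_filtering(fps: float, tau: float = 0.05,
                           duration: float = 1.0, dt: float = 1e-4) -> float:
    """Peak-to-peak difference between a smooth ramp and its frame-by-frame version,
    after both pass through the same low-pass filter.
    tau = 0.05 s is an illustrative assumption, not a physiological measurement."""
    n = int(duration / dt)
    t = [i * dt for i in range(n)]
    smooth = t                                           # position changing continuously
    framed = [math.floor(ti * fps) / fps for ti in t]    # position updated once per frame
    diff = [a - b for a, b in zip(low_pass(smooth, dt, tau), low_pass(framed, dt, tau))]
    steady = diff[n // 2:]                               # ignore the start-up transient
    return max(steady) - min(steady)

for fps in (12, 24, 60, 144):
    jump = 1 / fps                                       # size of each step before filtering
    print(f"{fps:>3} fps: jump {jump:.4f}, ripple after filtering {ripple_after_filtering(fps):.5f}")
```

In this toy model the residual ripple shrinks as the frame rate rises, which is the sense in which a frame rate well above the filter's cutoff looks smooth. Because the ramp's slope scales the size of each per-frame jump, faster motion leaves a larger ripple at the same frame rate, consistent with the point that fast motion benefits from higher frame rates.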



In nature, many things vibrate at high frequencies, into the thousands of hertz, so there are good evolutionary reasons to detect high-frequency sounds. However, very few things move at comparable rates, and those that do are typically not behaviorally relevant (e.g., you don't need to see every beat of an insect's wings to recognize it as an insect).






answered 4 hours ago by Bryan Krause (edited 3 hours ago)













• Do you have any references for your claims?
  – Chris Rogers, 3 hours ago






• @ChrisRogers I don't generally provide references for intro textbook-level knowledge. Everything here is available on Wikipedia for people who like to use Wikipedia, or in any introductory neuroscience textbook.
  – Bryan Krause, 3 hours ago








• @ChrisRogers I think that policy makes sense where it makes sense and not where it doesn't. If you have a link to Meta explaining that, I'll have a look, but I'm not aware of such a policy that applies to knowledge like "sound is pressure waves" or "vision is the detection of photons." I don't see value in adding links to Wikipedia or adding an arbitrary textbook source that most people aren't likely to have access to anyway.
  – Bryan Krause, 3 hours ago








• @ChrisRogers I could provide references to those, but those examples are exactly why I feel referencing everything is a bit silly, because neither of those numbers is in any way relevant to this answer. 2x the highest frequency is the Nyquist rate, a basic concept in physics. The human hearing range is also not important, as long as it's well above the 24 Hz rate for movie video.
  – Bryan Krause, 3 hours ago






• @ChrisRogers I agree with Bryan, there is nothing in this answer that is not covered in a general reference like Wikipedia (not that I think Wikipedia is an authoritative source, but if it is in Wikipedia, it should be considered common knowledge). As the question can be answered with common knowledge, maybe it is not a good fit for the site, as it is not an advanced question in psychology & neuroscience...
  – StrongBad, 2 hours ago




















I don't have a full answer, but this might get things started.



You are mixing up two concepts: frame rate and sampling rate. In a video presented at 24 fps, each frame can potentially contain a wide range of spatial frequencies. The spatial frequencies are typically limited by the number of pixels, but you can low-pass filter each frame to reduce them (you will end up with a blurry picture). This spatial filtering has nothing to do with the frame rate.
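A toy illustration of that point, as a minimal Python sketch: frames are plain nested lists, and the hypothetical helpers below apply a simple box blur to each row of each frame. Blurring lowers the contrast of the finest stripes (removing high spatial frequencies) while leaving the number of frames per second exactly as it was. A real blur would be 2-D and would typically use an image library; this 1-D version is a simplification.

```python
def box_blur_row(row: list[float], radius: int = 1) -> list[float]:
    """Replace each pixel by the mean of itself and its neighbours within `radius`
    (near the edges, only the pixels that exist are averaged)."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def blur_frame(frame: list[list[float]], radius: int = 1) -> list[list[float]]:
    """Spatially low-pass one frame, row by row (a 1-D simplification of a 2-D blur)."""
    return [box_blur_row(row, radius) for row in frame]

def blur_video(frames: list[list[list[float]]], radius: int = 1) -> list[list[list[float]]]:
    """Blur every frame independently; the number of frames, hence the frame rate, is untouched."""
    return [blur_frame(frame, radius) for frame in frames]

# One second of a 24 fps "video": every frame shows the finest possible stripes (0, 1, 0, 1, ...).
stripes = [[float(x % 2) for x in range(8)] for _ in range(4)]
video = [stripes] * 24
blurred = blur_video(video)

row = blurred[0][0]
print(len(video), len(blurred))              # frame count before and after
print(max(row[1:-1]) - min(row[1:-1]))       # interior stripe contrast, down from 1.0
```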



The 44.1 kHz sampling rate of an audio signal is more akin to the pixel count of a picture/frame, which limits the spatial frequencies it can contain, than to the frame rate of a video. An audio analogue of video frames would be something like decomposing the audio signal into a series of slices with the short-time Fourier transform (STFT), setting each slice to have a constant spectrum (and phase?), and reconstructing. Reconstructing a signal from a modified STFT is non-trivial (cf. Griffin and Lim, 1984). Given the difficulties in the process and the lack of an application, I am not sure anyone has really investigated how the duration of the slices affects perception.
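For a sense of scale, this minimal Python sketch (the helper name is hypothetical) computes how large such an audio "frame" would be if each slice lasted as long as one 24 fps video frame. Each slice would span roughly 42 ms and 1,837.5 to 2,000 samples, and an STFT window of that length has a frequency resolution of about 24 Hz, so a single audio "frame" already carries a whole spectrum rather than one sample. Rounding the window to a whole number of samples is a choice made here, not something the answer specifies.

```python
from fractions import Fraction

def audio_frame_stats(sample_rate_hz: int, fps: int) -> dict:
    """Describe a hypothetical audio 'frame' that lasts as long as one video frame."""
    samples = Fraction(sample_rate_hz, fps)   # exact samples per frame; may be fractional
    window_len = round(samples)               # an STFT window needs a whole number of samples
    return {
        "frame_ms": 1000 / fps,               # duration of one frame in milliseconds
        "samples_per_frame": samples,
        "window_len": window_len,
        # Bin spacing of an STFT with this window is fs / N, i.e. about fps hertz.
        "bin_spacing_hz": sample_rate_hz / window_len,
    }

for fs in (44_100, 48_000):
    stats = audio_frame_stats(fs, 24)
    print(fs, {k: float(v) for k, v in stats.items()})
```

This covers only the bookkeeping; actually holding each slice's spectrum constant and resynthesizing would still require an inverse STFT and, if phase is discarded, a phase-reconstruction method such as Griffin and Lim's.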



















• Lack of an application? Audio compression (MP3, etc.) typically uses a Fourier transform (or some sort of wavelet). Pretty much all of the dimensions of compression have been investigated to find the most efficient encoding that limits perceived degradation, since compression is lossy.
  – Bryan Krause, 1 hour ago












• @BryanKrause Yes, but I don't see the link between those types of frames and the idea of a frame as a constant segment of signal. Maybe there is one and I am missing the relevant literature; if so, I would love to see an answer with more references. As I said, I don't have a full answer.
  – StrongBad, 1 hour ago











          • $begingroup$
            Do you have any references for your claims?
            $endgroup$
            – Chris Rogers
            3 hours ago






          • 2




            $begingroup$
            @ChrisRogers I don't generally provide references for intro textbook-level knowledge. Everything here is available on Wikipedia for people who like to use Wikipedia, or any introductory neuroscience textbook.
            $endgroup$
            – Bryan Krause
            3 hours ago








          • 2




            $begingroup$
            @ChrisRogers I think that policy makes sense where it makes sense and not where it doesn't. If you have a link to Meta explaining that, I'll have a look, but I'm not aware of such a policy that applies to knowledge like "sound is pressure waves" or "vision is the detection of photons." I don't see value in adding links to Wikipedia or adding an arbitrary textbook source that most people aren't likely to have access to anyways.
            $endgroup$
            – Bryan Krause
            3 hours ago








          • 3




            $begingroup$
            @ChrisRogers I could provide references to those, but those examples are exactly why I feel referencing everything is a bit silly, because neither of those numbers is in any way relevant to this answer. 2x the highest frequency is the Nyquist rate, a basic concept in physics. The human hearing range is also not important, as long as it's well above the 24 Hz rate for movie video.
            $endgroup$
            – Bryan Krause
            3 hours ago






          • 1




            $begingroup$
            @ChrisRogers I agree with Bryan, there is nothing in this answer that is not covered in a general reference like Wikipedia (not that I think Wikipedia is an authoritative source, but if it is in wikipedia, it should be considered common knowledge). As the question can be answered with common knowledge, maybe it is not a good fit for the site as it is not an advanced questions in psychology & neuroscience ...
            $endgroup$
            – StrongBad
            2 hours ago


















          • $begingroup$
            Do you have any references for your claims?
            $endgroup$
            – Chris Rogers
            3 hours ago






          • 2




            $begingroup$
            @ChrisRogers I don't generally provide references for intro textbook-level knowledge. Everything here is available on Wikipedia for people who like to use Wikipedia, or any introductory neuroscience textbook.
            $endgroup$
            – Bryan Krause
            3 hours ago








          • 2




            $begingroup$
            @ChrisRogers I think that policy makes sense where it makes sense and not where it doesn't. If you have a link to Meta explaining that, I'll have a look, but I'm not aware of such a policy that applies to knowledge like "sound is pressure waves" or "vision is the detection of photons." I don't see value in adding links to Wikipedia or adding an arbitrary textbook source that most people aren't likely to have access to anyways.
            $endgroup$
            – Bryan Krause
            3 hours ago








          • 3




            $begingroup$
            @ChrisRogers I could provide references to those, but those examples are exactly why I feel referencing everything is a bit silly, because neither of those numbers is in any way relevant to this answer. 2x the highest frequency is the Nyquist rate, a basic concept in physics. The human hearing range is also not important, as long as it's well above the 24 Hz rate for movie video.
            $endgroup$
            – Bryan Krause
            3 hours ago






          • 1




            $begingroup$
            @ChrisRogers I agree with Bryan, there is nothing in this answer that is not covered in a general reference like Wikipedia (not that I think Wikipedia is an authoritative source, but if it is in wikipedia, it should be considered common knowledge). As the question can be answered with common knowledge, maybe it is not a good fit for the site as it is not an advanced questions in psychology & neuroscience ...
            $endgroup$
            – StrongBad
            2 hours ago
















          $begingroup$
          Do you have any references for your claims?
          $endgroup$
          – Chris Rogers
          3 hours ago




          $begingroup$
          Do you have any references for your claims?
          $endgroup$
          – Chris Rogers
          3 hours ago




          2




          2




          $begingroup$
          @ChrisRogers I don't generally provide references for intro textbook-level knowledge. Everything here is available on Wikipedia for people who like to use Wikipedia, or any introductory neuroscience textbook.
          $endgroup$
          – Bryan Krause
          3 hours ago






          $begingroup$
          @ChrisRogers I don't generally provide references for intro textbook-level knowledge. Everything here is available on Wikipedia for people who like to use Wikipedia, or any introductory neuroscience textbook.
          $endgroup$
          – Bryan Krause
          3 hours ago






          2




          2




          $begingroup$
          @ChrisRogers I think that policy makes sense where it makes sense and not where it doesn't. If you have a link to Meta explaining that, I'll have a look, but I'm not aware of such a policy that applies to knowledge like "sound is pressure waves" or "vision is the detection of photons." I don't see value in adding links to Wikipedia or adding an arbitrary textbook source that most people aren't likely to have access to anyways.
          $endgroup$
          – Bryan Krause
          3 hours ago










          @ChrisRogers I could provide references for those, but those examples are exactly why I feel referencing everything is a bit silly: neither of those numbers is in any way relevant to this answer. Twice the highest frequency is the Nyquist rate, a basic concept in signal processing. The human hearing range is also not important, as long as it is well above the 24 Hz rate of movie video.
          – Bryan Krause
          3 hours ago
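As a rough illustration of Bryan's point (a minimal sketch, not taken from the thread): the Nyquist rate is twice the highest frequency you need to capture, and a tone above half the sampling rate folds down to a lower, wrong frequency. The ~20 kHz upper limit of hearing is an assumed textbook figure, and the helper names `nyquist_rate` and `alias_frequency` are hypothetical.

```python
def nyquist_rate(highest_freq_hz: float) -> float:
    """Minimum sampling rate needed to represent content up to highest_freq_hz."""
    return 2.0 * highest_freq_hz


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at sample_rate_hz.

    The spectrum folds around multiples of the sampling rate, so the tone
    appears at its distance from the nearest multiple of sample_rate_hz
    (always between 0 and sample_rate_hz / 2).
    """
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Assumed ~20 kHz upper limit of human hearing (a common textbook figure).
print(nyquist_rate(20_000))        # 44.1 kHz and 48 kHz both exceed this
# A 1 kHz tone "sampled" at a film-like 24 Hz is hopelessly aliased.
print(alias_frequency(1_000, 24))
# A 30 Hz rotation filmed at 24 fps (the wagon-wheel effect).
print(alias_frequency(30, 24))
```

At 24 samples per second every audible tone would fold onto something at or below 12 Hz, which is why audio needs sampling rates in the tens of kilohertz regardless of how vision integrates frames.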








          @ChrisRogers I agree with Bryan: there is nothing in this answer that is not covered in a general reference like Wikipedia (not that I think Wikipedia is an authoritative source, but if it is in Wikipedia, it should be considered common knowledge). As the question can be answered with common knowledge, maybe it is not a good fit for the site, as it is not an advanced question in psychology & neuroscience...
          – StrongBad
          2 hours ago
















          I don't have a full answer, but it might get things started...



          You are mixing up two concepts: frame rate and sampling rate. In a video presented at 24 fps, each frame can potentially contain a wide range of spatial frequencies. The spatial frequencies are typically limited by the number of pixels, but you can low-pass filter each frame to reduce them (you will end up with a blurry picture). This spatial filtering has nothing to do with the frame rate.
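A minimal sketch of the point above, assuming frames are plain 2-D lists of brightness values and using a simple box blur as the spatial low-pass filter (the answer does not name a specific filter; `box_blur`, `checkerboard` and `contrast` are hypothetical helpers). Blurring each frame removes fine spatial detail but leaves the number of frames, and therefore the frame rate, untouched.

```python
Frame = list[list[float]]


def box_blur(frame: Frame, radius: int = 1) -> Frame:
    """Spatial low-pass filter: replace each pixel by the mean of its neighbourhood.

    Neighbourhoods are clipped at the frame edges.
    """
    height, width = len(frame), len(frame[0])
    out: Frame = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [
                frame[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(values) / len(values))
        out.append(row)
    return out


def checkerboard(size: int) -> Frame:
    """Highest spatial frequency a pixel grid can hold: alternating 0/1 pixels."""
    return [[float((x + y) % 2) for x in range(size)] for y in range(size)]


def contrast(frame: Frame) -> float:
    """Brightness range of a frame, a crude measure of remaining fine detail."""
    pixels = [p for row in frame for p in row]
    return max(pixels) - min(pixels)


# A hypothetical one-second "clip" of 24 identical checkerboard frames.
clip = [checkerboard(8) for _ in range(24)]
blurred = [box_blur(f) for f in clip]

print(len(clip), len(blurred))                  # frame count is unchanged
print(contrast(clip[0]), contrast(blurred[0]))  # fine detail is strongly reduced
```

After one pass of the 3x3 blur the checkerboard's contrast drops from 1.0 to about 0.11, while the clip still has 24 frames per second.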



          The 44.1 kHz sampling rate of an audio signal is more akin to the pixel resolution of a picture/frame, which limits its spatial frequencies, than to the frame rate of a video. An example of audio "frames" would be something like decomposing the audio signal into a series of slices with the short-time Fourier transform (STFT), setting each slice to have a constant spectrum (and phase?), and reconstructing. Reconstructing a signal from a modified STFT is non-trivial (cf. Griffin and Lim 1984). Given the difficulties in the process and the lack of an application, I am not sure anyone has really investigated how the duration of the slices affects perception.
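The slicing idea above can be sketched under simplifying assumptions: a rectangular, non-overlapping window (real STFT analysis normally uses overlapping tapered windows), a naive O(N^2) DFT, and a toy 480 Hz sampling rate so it runs quickly. All helper names are hypothetical. At real rates the arithmetic is the same: 24 "frames" per second at 48 kHz means slices of 48000 / 24 = 2000 samples (about 41.7 ms), and each slice's spectrum has bins spaced fs / N = 24 Hz apart. Keeping each slice's full complex spectrum gives a lossless round trip; keeping only magnitudes (zero phase) does not, which is the kind of gap iterative methods such as Griffin and Lim's try to close. This sketch does not implement Griffin-Lim.

```python
import cmath
import math


def dft(x: list[float]) -> list[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: list[complex]) -> list[float]:
    """Inverse DFT, keeping only the real part (the spectra here come from real signals)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def stft_slices(signal: list[float], slice_len: int) -> list[list[complex]]:
    """STFT with a rectangular, non-overlapping window: one spectrum per slice."""
    return [dft(signal[i:i + slice_len])
            for i in range(0, len(signal) - slice_len + 1, slice_len)]


def resynthesize(slices: list[list[complex]], keep_phase: bool = True) -> list[float]:
    """Rebuild the signal slice by slice; optionally discard phase (magnitude only)."""
    out: list[float] = []
    for spectrum in slices:
        if not keep_phase:
            spectrum = [complex(abs(c), 0.0) for c in spectrum]
        out.extend(idft(spectrum))
    return out


# Hypothetical toy numbers: one second of 480 Hz-sampled audio cut into 24 "frames".
fs, fps = 480, 24
slice_len = fs // fps  # 20 samples per slice
signal = [math.sin(2 * math.pi * 60 * t / fs + 0.7) for t in range(fs)]

slices = stft_slices(signal, slice_len)
exact = resynthesize(slices, keep_phase=True)
no_phase = resynthesize(slices, keep_phase=False)

print(len(slices))                                         # one spectrum per "frame"
print(max(abs(a - b) for a, b in zip(signal, exact)))      # tiny: lossless round trip
print(max(abs(a - b) for a, b in zip(signal, no_phase)))   # large: phase mattered
```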

















          answered 2 hours ago









          StrongBad













          • Lack of an application? Audio compression (MP3, etc.) typically uses a Fourier transform (or some sort of wavelet). Pretty much every dimension of compression has been investigated to find the most efficient encoding that limits perceived degradation, since compression is lossy.
            – Bryan Krause
            1 hour ago












          • @BryanKrause Yes, but I don't see the link between those types of frames and the idea of a frame as a constant segment of signal. Maybe there is one and I am missing the relevant literature; if so, I would love to see an answer with more references... as I said, I don't have a full answer.
            – StrongBad
            1 hour ago
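For a sense of scale on the compression example in the comments (a sketch; the 1152-samples-per-frame figure is the MPEG-1 Layer III frame size, and the helper names are hypothetical): MP3 encodes audio in frames whose rate lands in the same range as video frame rates. These coding frames are not the constant-spectrum slices described in the answer, since each frame is split into two granules of 576 samples and passed through a hybrid filterbank with overlapping MDCT windows, so decoded audio does not step from one frozen spectrum to the next.

```python
# MPEG-1 Layer III (MP3) frames hold 1152 samples per channel.
SAMPLES_PER_MP3_FRAME = 1152


def frames_per_second(sample_rate_hz: int, samples_per_frame: int) -> float:
    """How many coding frames are needed per second of audio."""
    return sample_rate_hz / samples_per_frame


def frame_duration_ms(sample_rate_hz: int, samples_per_frame: int) -> float:
    """Duration of one coding frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


for fs in (44_100, 48_000):
    fps = frames_per_second(fs, SAMPLES_PER_MP3_FRAME)
    ms = frame_duration_ms(fs, SAMPLES_PER_MP3_FRAME)
    print(f"{fs} Hz: {fps:.2f} frames/s, {ms:.2f} ms per frame")
```

That works out to roughly 38 frames/s (about 26 ms each) at 44.1 kHz and about 42 frames/s (24 ms each) at 48 kHz.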




































          Thanks for contributing an answer to Psychology & Neuroscience Stack Exchange!

