How can I fix broken shift-JIS filenames?












I've got some files whose Shift-JIS filenames were decoded as ANSI, e.g.



home_03@Â‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è


when they should be read as Shift-JIS, like



home_03@青いトランク開いた、ファイル有り


This happens because the archive extractor I'm using doesn't support Shift-JIS, which can't really be helped. Is there a way to fix the names of the files I've already extracted?



Edit: another example



Ší‹ï‘ä@ƒXƒpƒi


should be



器具台@スパナ









Tags: windows-8.1, filenames, shift-jis






asked Dec 23 at 16:12 by Hiccup, edited Dec 23 at 18:13






















          1 Answer

          Since you're using Windows, PowerShell is probably the easiest method.



          Now, PowerShell internally uses UTF-16 for its strings, so a conversion would involve four steps:




          1. Read the incorrect filename from the filesystem into PS (represented internally as a UTF-16 string).

          2. Tell PS to convert the string into a raw byte array as if the string were <incorrect encoding>. We can't use the bytes of the PS string directly, as those are UTF-16.

          3. Tell PS to convert the byte array back to a string, interpreting it as <correct encoding>. This gives us a UTF-16 string of the raw bytes interpreted as Shift-JIS.

          4. Rename the file.


          Let's start by defining the encodings. In your case, I'm guessing your source is Windows-1252 (default non-Unicode codepage for Western/English Windows).



          $srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
          $destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")


          You could also use [System.Text.Encoding]::Default to get the current system codepage but I prefer to be explicit.



          Then we apply the conversion steps:



          $newName = $destEnc.GetString($srcEnc.GetBytes($oldName))


          In your example, home_03@Â‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è becomes home_03@ﾂいトランク開いたAファイル有り. This differs from your expected result (see the notes at the bottom), but it matches what I get from the Windows-1252 => Shift-JIS conversion at http://string-functions.com/encodedecode.aspx. If the result is wrong for your files, you may have to experiment until you find the correct source and destination encodings.
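One way to do that experimenting is to print the result for several encoding pairs side by side and pick the row that reads correctly. This is only a sketch: `Show-NameCandidates` is a hypothetical helper name, and the candidate code page lists are my own guesses at likely suspects (1252, ISO-8859-1, and two OEM code pages on the source side; Shift-JIS and EUC-JP on the destination side), not something derived from your archive.

```powershell
# Hypothetical helper: re-decode one garbled name under several encoding pairs.
function Show-NameCandidates {
    param(
        [Parameter(Mandatory = $true)] [string] $Name,
        # Assumed candidates for the encoding the extractor wrongly applied
        [int[]] $SourceCodePages = @(1252, 28591, 437, 850),
        # Assumed candidates for the encoding the archive really used
        [int[]] $DestCodePages = @(932, 51932)
    )
    foreach ($src in $SourceCodePages) {
        foreach ($dest in $DestCodePages) {
            $srcEnc  = [System.Text.Encoding]::GetEncoding($src)
            $destEnc = [System.Text.Encoding]::GetEncoding($dest)
            # One output row per (source, destination) pair
            [pscustomobject]@{
                Source = $srcEnc.WebName
                Dest   = $destEnc.WebName
                Result = $destEnc.GetString($srcEnc.GetBytes($Name))
            }
        }
    }
}

# Usage: take a name straight from disk so no garbled literal has to be typed
Get-ChildItem | Select-Object -First 1 |
    ForEach-Object { Show-NameCandidates -Name $_.Name } |
    Format-Table -AutoSize
```

Reading the name from disk rather than pasting it avoids any extra damage from the clipboard or the script file's own encoding.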



          Putting it together with a standard loop:



          $srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
          $destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
          # Rename every item in the current directory to its re-decoded name
          Get-ChildItem | ForEach-Object { Rename-Item -LiteralPath $_.FullName -NewName ($destEnc.GetString($srcEnc.GetBytes($_.Name))) }


          Or if you prefer to recurse into subdirectories:



          $srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
          $destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
          # Same as above, but also descend into subdirectories
          Get-ChildItem -Recurse | ForEach-Object { Rename-Item -LiteralPath $_.FullName -NewName ($destEnc.GetString($srcEnc.GetBytes($_.Name))) }


          Add -File to Get-ChildItem if you want to avoid renaming directories.
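Before renaming anything for real, it may be worth previewing the changes. The sketch below is a more cautious variant of the recursive loop, not a replacement for it. It makes three additions of my own: it skips names that the conversion leaves unchanged, it handles the deepest paths first so that renaming a directory cannot invalidate the path of something inside it, and it uses `-WhatIf` so that nothing is touched until you remove that switch.

```powershell
$srcEnc  = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")

# Collect everything first, longest full path first. A child's full path is
# always longer than its parent's, so children are processed before parents.
$items = Get-ChildItem -Recurse | Sort-Object { $_.FullName.Length } -Descending

foreach ($item in $items) {
    $newName = $destEnc.GetString($srcEnc.GetBytes($item.Name))
    # Skip names that are already fine (for example, pure ASCII names)
    if ($newName -cne $item.Name) {
        # -WhatIf only reports what would happen; remove it to actually rename
        Rename-Item -LiteralPath $item.FullName -NewName $newName -WhatIf
    }
}
```

Once the preview looks right, deleting `-WhatIf` from the last command performs the renames.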





          Looks like your example included two characters that were invalid in Windows-1252 and were likely dropped when you posted the question (based on reversing the process using your example output). There's a 144 (0x90) between the first @ and Â, and a 129 (0x81) between the ½ and A. For the convenience of anyone else looking to test, here's a base64-encoded version of the raw bytes: aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A==.
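For anyone who wants to try the conversion on those bytes, here is a minimal sketch that starts from the base64 string above. It assumes the Windows-1252 implementation in use keeps the undefined bytes 0x81 and 0x90 intact on the round trip; I believe .NET does, but treat that as something to check rather than a guarantee.

```powershell
# Raw bytes of the first example name, taken from the base64 string above
$raw = [System.Convert]::FromBase64String("aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A==")

$srcEnc  = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")

# What an extractor decoding as Windows-1252 would have produced
$mangled = $srcEnc.GetString($raw)

# The repair: back to bytes via the wrong encoding, then decode as Shift-JIS
$fixed = $destEnc.GetString($srcEnc.GetBytes($mangled))

# Direct decode of the raw bytes, for comparison
$expected = $destEnc.GetString($raw)

$fixed                  # the repaired name
$fixed -ceq $expected   # True only if the round trip lost nothing
```

If the comparison comes back False, some byte did not survive the trip through the source encoding, which is the situation described in the next note.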





          Also note that this will not work when either the source or the destination filename contains characters that Windows considers invalid. This matters most for the source filename, because your extraction tool has probably mangled the name irrecoverably during extraction (by dropping the bytes that correspond to invalid characters, such as ?, in the wrong encoding). The only thing you can do in those cases is to use an alternative extraction tool that avoids the problem entirely.
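A rough way to find the names that fall into this category is to run the conversion with strict encodings and list whatever fails. The sketch below is an assumption-laden check, not a guarantee: it asks .NET to throw instead of silently substituting characters, and it treats any converted name containing a character from `GetInvalidFileNameChars()` as suspect. A name that passes can still be wrong if bytes were dropped in a way that happens to leave valid Shift-JIS behind.

```powershell
# Strict variants of the two encodings: throw instead of substituting characters
$encFail = [System.Text.EncoderFallback]::ExceptionFallback
$decFail = [System.Text.DecoderFallback]::ExceptionFallback
$strictSrc  = [System.Text.Encoding]::GetEncoding("Windows-1252", $encFail, $decFail)
$strictDest = [System.Text.Encoding]::GetEncoding("Shift-JIS", $encFail, $decFail)

# Characters Windows does not allow in a file name
$invalid = [System.IO.Path]::GetInvalidFileNameChars()

Get-ChildItem -Recurse | ForEach-Object {
    try {
        $newName = $strictDest.GetString($strictSrc.GetBytes($_.Name))
        $ok = $newName.IndexOfAny($invalid) -lt 0
    } catch {
        # A character or byte sequence could not be converted at all
        $ok = $false
    }
    # Emit the paths that probably need a different extraction tool
    if (-not $ok) { $_.FullName }
}
```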






answered Dec 23 at 16:48 by Bob, edited Dec 24 at 13:50












          • Thanks a lot. This fixes most of the names but for the rest I'll just have to find a working extraction tool for the archive (psarc format) that the files came from.
            – Hiccup
            Dec 24 at 16:23

















