How can I fix broken shift-JIS filenames?
I've got some files with shift-jis filenames in ANSI.
e.g.
home_03@Â‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è
when they should be in shift-jis like
home_03@青いトランク開いた、ファイル有り
This is because the archive extractor I'm using doesn't support shift-jis. That can't really be helped. But is there a way to fix the filenames of the files I've extracted?
edit:
another example
Ší‹ï‘ä@ƒXƒpƒi
should be
器具台@スパナ
windows-8.1 filenames shift-jis
edited Dec 23 at 18:13
asked Dec 23 at 16:12
Hiccup
1 Answer
Since you're using Windows, PowerShell is probably the easiest method.
Now, PowerShell internally uses UTF-16 for its strings, so a conversion would involve four steps:
- Read the incorrect filename from the filesystem into PS (represented internally as a UTF-16 string)
- Tell PS to convert the string into a raw byte array as if the string were <incorrect encoding>. We can't use the PS string directly (as it's UTF-16).
- Tell PS to convert the byte array back to a string, interpreting it as <correct encoding>. This will give us a UTF-16 string of the raw bytes interpreted as Shift-JIS.
- Rename the file
Let's start by defining the encodings. In your case, I'm guessing your source is Windows-1252 (default non-Unicode codepage for Western/English Windows).
$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
You could also use [System.Text.Encoding]::Default to get the current system code page (this holds for Windows PowerShell; in PowerShell 6 and later it returns UTF-8 instead), but I prefer to be explicit.
Then we apply the conversion steps:
$newName = $destEnc.GetString($srcEnc.GetBytes($oldName))
In your example, home_03@Â‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è becomes home_03@ツいトランク開いたAファイル有り. While this is different from your example result (see notes at bottom), it matches what I get from the Windows-1252 => Shift-JIS conversion at http://string-functions.com/encodedecode.aspx. If this is incorrect, you may have to play around until you find the correct source and destination encodings.
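One way to "play around" is to run the same conversion over a short list of candidate source code pages and compare the results by eye. The sketch below is only an illustration: the helper name Convert-GarbledName and the candidate list (1252, 28591, 437, 850) are assumptions, not something from the answer. It uses numeric code pages (932 is the Windows code page for Shift-JIS) and rebuilds the second garbled example from its bytes, so it does not depend on how the script file itself is saved.

```powershell
# Hypothetical helper (not part of the answer): reinterpret a garbled name by
# encoding it with one code page and decoding the bytes with another.
function Convert-GarbledName {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [int]    $SourceCodePage,
        [int] $TargetCodePage = 932     # 932 = Shift-JIS on Windows
    )
    $src  = [System.Text.Encoding]::GetEncoding($SourceCodePage)
    $dest = [System.Text.Encoding]::GetEncoding($TargetCodePage)
    $dest.GetString($src.GetBytes($Name))
}

# Bytes of the second example from the question, decoded as Windows-1252 to
# reproduce the garbled name exactly as it appears on disk.
[byte[]] $bytes = 0x8A,0xED,0x8B,0xEF,0x91,0xE4,0x40,0x83,0x58,0x83,0x70,0x83,0x69
$garbled = [System.Text.Encoding]::GetEncoding(1252).GetString($bytes)

# Assumed candidate list: 1252 = Western European, 28591 = ISO-8859-1,
# 437 and 850 = OEM/DOS code pages. Extend it as needed.
foreach ($cp in 1252, 28591, 437, 850) {
    '{0,5}  {1}' -f $cp, (Convert-GarbledName -Name $garbled -SourceCodePage $cp)
}
```

The row whose output reads as sensible Japanese tells you which source code page the extractor most likely used.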
Putting it together with a standard loop:
$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
Get-ChildItem | ForEach-Object { Rename-Item -LiteralPath $_.FullName -NewName ($destEnc.GetString($srcEnc.GetBytes($_.Name))) }
Or if you prefer to recurse into subdirectories:
$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
Get-ChildItem -Recurse | Sort-Object { $_.FullName.Length } -Descending | ForEach-Object { Rename-Item -LiteralPath $_.FullName -NewName ($destEnc.GetString($srcEnc.GetBytes($_.Name))) }
Add -File to Get-ChildItem if you want to avoid renaming directories.
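If you would rather preview the renames first, a more defensive variant might look like the sketch below. The function name Rename-GarbledItem and its parameters are hypothetical. It collects all items before touching anything, processes the longest paths first so that every item is renamed before its parent directory, skips names the conversion leaves unchanged, and supports -WhatIf.

```powershell
# Hypothetical wrapper around the same conversion; not part of the answer.
function Rename-GarbledItem {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [string] $Path = '.',
        [int] $SourceCodePage = 1252,   # assumed source: Windows-1252
        [int] $TargetCodePage = 932     # 932 = Shift-JIS on Windows
    )
    $src  = [System.Text.Encoding]::GetEncoding($SourceCodePage)
    $dest = [System.Text.Encoding]::GetEncoding($TargetCodePage)

    # Enumerate everything up front, then sort longest path first. A child's
    # full path is always longer than its parent's, so children are renamed
    # before the directory that contains them.
    $items = @(Get-ChildItem -LiteralPath $Path -Recurse) |
        Sort-Object { $_.FullName.Length } -Descending

    foreach ($item in $items) {
        $newName = $dest.GetString($src.GetBytes($item.Name))
        if ($newName -ceq $item.Name) { continue }   # nothing to fix
        # -WhatIf / -Confirm passed to this function also apply here.
        Rename-Item -LiteralPath $item.FullName -NewName $newName
    }
}

# Preview only; drop -WhatIf to actually rename.
Rename-GarbledItem -Path . -WhatIf
```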
Looks like your example included two characters that were invalid in Windows-1252 and were likely dropped when you posted the question (based on reversing the process using your example output). There's a 144 (0x90) between the first @ and Â, and a 129 (0x81) between the ½ and A. For the convenience of anyone else looking to test, here's a base64-encoded version of the raw bytes: aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A==.
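For anyone who wants to test with those bytes, a minimal sketch follows. It decodes the base64 string, prints a hex dump so the 0x90 and 0x81 bytes are visible, and checks that the garbled form converts back to the intended name. The variable names are illustrative, and the round trip relies on the .NET Windows-1252 encoding preserving bytes such as 0x90 and 0x81 as control characters.

```powershell
# Raw filename bytes, taken from the base64 string quoted in the answer.
$raw = [System.Convert]::FromBase64String(
    'aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A==')

$cp1252 = [System.Text.Encoding]::GetEncoding(1252)   # Windows-1252
$sjis   = [System.Text.Encoding]::GetEncoding(932)    # Shift-JIS

$garbled  = $cp1252.GetString($raw)   # the name as a Windows-1252 extractor sees it
$intended = $sjis.GetString($raw)     # the same bytes read as Shift-JIS

# Hex dump of the bytes, to spot 0x90 and 0x81.
($raw | ForEach-Object { $_.ToString('X2') }) -join ' '

# Round trip: garbled string -> Windows-1252 bytes -> Shift-JIS string.
$sjis.GetString($cp1252.GetBytes($garbled)) -ceq $intended
```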
Also note that this will not work when there are characters Windows considers invalid in either your source or destination filenames. This applies especially to the source filename, as your extraction tool probably would have irrecoverably mangled the name on extraction (by dropping the bytes corresponding to the invalid characters like ? or in the wrong encoding). The only thing you can do in those cases is use an alternative extraction tool that avoids this problem entirely.
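To get a rough idea of which names fall into this category before renaming anything, you could run a check along these lines. This is a sketch under assumptions: Test-GarbledName is a hypothetical helper, and it only catches some kinds of damage (characters that do not exist in the source code page, byte sequences that are not valid Shift-JIS, and converted names containing characters the current system disallows). A name with dropped bytes can still decode without error into the wrong text.

```powershell
# Hypothetical pre-flight check; names and parameters are illustrative.
function Test-GarbledName {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [int] $SourceCodePage = 1252,
        [int] $TargetCodePage = 932
    )
    # Exception fallbacks make a lossy conversion throw instead of silently
    # substituting '?' or a best-fit character.
    $encFail = [System.Text.EncoderFallback]::ExceptionFallback
    $decFail = [System.Text.DecoderFallback]::ExceptionFallback
    $src  = [System.Text.Encoding]::GetEncoding($SourceCodePage, $encFail, $decFail)
    $dest = [System.Text.Encoding]::GetEncoding($TargetCodePage, $encFail, $decFail)

    $newName = $null
    $problem = $null
    try {
        $newName = $dest.GetString($src.GetBytes($Name))
        # The converted name must itself be a legal file name on this system.
        if ($newName.IndexOfAny([System.IO.Path]::GetInvalidFileNameChars()) -ge 0) {
            $problem = 'converted name contains characters not allowed in file names'
        }
    }
    catch {
        $problem = 'name does not convert cleanly with the chosen code pages'
    }
    [pscustomobject]@{ Name = $Name; NewName = $newName; Problem = $problem }
}

# List the names in the current directory that would need manual attention.
Get-ChildItem | ForEach-Object { Test-GarbledName -Name $_.Name } |
    Where-Object { $_.Problem }
```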
edited Dec 24 at 13:50
answered Dec 23 at 16:48
Bob
Thanks a lot. This fixes most of the names but for the rest I'll just have to find a working extraction tool for the archive (psarc format) that the files came from.
– Hiccup
Dec 24 at 16:23