How can I fix broken shift-JIS filenames?

I've got some files whose Shift-JIS filenames were decoded as ANSI, e.g.

home_03@Â‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è

when they should be in shift-jis like

home_03@青いトランク開いた、ファイル有り

This is because the archive extractor I'm using doesn't support shift-jis. That can't really be helped. But is there a way to fix the filenames of the files I've extracted?

edit:

another example

Ší‹ï‘ä@ƒXƒpƒi

should be

器具台@スパナ

edited Dec 23 at 18:13

asked Dec 23 at 16:12

Hiccup

1478

windows-8.1 filenames shift-jis


1 Answer

Since you're using Windows, PowerShell is probably the easiest method.

Now, PowerShell internally uses UTF-16 for its strings, so a conversion would involve four steps:

1. Read the incorrect filename from the filesystem into PS (represented internally as a UTF-16 string).

2. Tell PS to convert the string into a raw byte array as if the string were <incorrect encoding>. We can't use the PS string directly (as it's UTF-16).

3. Tell PS to convert the byte array back to a string, interpreting it as <correct encoding>. This will give us a UTF-16 string of the raw bytes interpreted as Shift-JIS.

4. Rename the file.

Let's start by defining the encodings. In your case, I'm guessing your source is Windows-1252 (default non-Unicode codepage for Western/English Windows).

$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")

$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")

You could also use [System.Text.Encoding]::Default to get the current system codepage, but I prefer to be explicit. (That holds for Windows PowerShell; on PowerShell 6 and later, Default is UTF-8 instead.)

Then we apply the conversion steps:

$newName = $destEnc.GetString($srcEnc.GetBytes($oldName))

In your example, home_03@Â‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è becomes home_03@ﾂいトランク開いたAファイル有り. While this is different from your example result (see notes at bottom), it matches what I get from http://string-functions.com/encodedecode.aspx's Windows-1252 => Shift-JIS. If this is incorrect, you may have to play around until you find the correct source and destination encodings.
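If Windows-1252 turns out to be the wrong guess, a small helper can show what several candidate source encodings would produce. This is only a sketch: the function name Get-DecodeCandidates and the list of code pages (1252, 437, 850, 28591) are assumptions for illustration, not something taken from the question, so adjust them to whatever your system might have used.

```powershell
# Hypothetical helper: shows how one name decodes under several guessed source code pages.
function Get-DecodeCandidates {
    param(
        [string]$Name,
        # Assumed candidates: Windows-1252, OEM US, OEM Western European, ISO-8859-1
        [int[]]$SourceCodePages = @(1252, 437, 850, 28591),
        [string]$DestName = "Shift-JIS"
    )
    $destEnc = [System.Text.Encoding]::GetEncoding($DestName)
    foreach ($cp in $SourceCodePages) {
        $srcEnc = [System.Text.Encoding]::GetEncoding($cp)
        [pscustomobject]@{
            Name     = $Name
            CodePage = $cp
            Source   = $srcEnc.WebName
            # Same two-step conversion as above: string -> bytes -> string
            Result   = $destEnc.GetString($srcEnc.GetBytes($Name))
        }
    }
}

# Try the first few names in the current directory and compare the results by eye.
Get-ChildItem | Select-Object -First 3 |
    ForEach-Object { Get-DecodeCandidates $_.Name } |
    Format-Table -AutoSize
```

Whichever row shows readable Japanese is the source encoding to use in $srcEnc.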

Putting it together with a standard loop:

$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")

$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")

Get-ChildItem | ForEach-Object { Rename-Item -LiteralPath $_.FullName -NewName ($destEnc.GetString($srcEnc.GetBytes($_.Name))) }

Or if you prefer to recurse into subdirectories:

$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")

$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")

Get-ChildItem -Recurse | ForEach-Object { Rename-Item -LiteralPath $_.FullName -NewName ($destEnc.GetString($srcEnc.GetBytes($_.Name))) }

Add -File to Get-ChildItem if you want to avoid renaming directories.
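Before renaming anything for real, it may be worth previewing the result. The sketch below is one way to do that, assuming the same Windows-1252 to Shift-JIS guess as above: it skips names the conversion leaves unchanged and passes -WhatIf to Rename-Item, so nothing is actually renamed.

```powershell
$srcEnc  = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")

Get-ChildItem -File | ForEach-Object {
    $newName = $destEnc.GetString($srcEnc.GetBytes($_.Name))
    # Plain ASCII names come out identical, so there is nothing to rename.
    if ($newName -cne $_.Name) {
        # -WhatIf only reports the intended rename; remove it to apply the change.
        Rename-Item -LiteralPath $_.FullName -NewName $newName -WhatIf
    }
}
```

Once the preview looks right, drop -WhatIf and run it again.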

Looks like your example included two characters that were invalid in Windows-1252 and were likely dropped when you posted the question (based on reversing the process using your example output). There's a 144 (0x90) between the first @ and Â, and a 129 (0x81) between the ½ and A. For the convenience of anyone else looking to test, here's a base64-encoded version of the raw bytes: aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A==.
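For anyone who wants to check the base64 string above, the following sketch decodes it back to raw bytes and interprets them as Shift-JIS. If the bytes are as described, $decoded should be the filename the question expected. The hex dump is only there to make the 0x90 and 0x81 bytes visible.

```powershell
$b64   = "aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A=="
$bytes = [System.Convert]::FromBase64String($b64)

$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
$decoded = $destEnc.GetString($bytes)
$decoded

# Space-separated hex dump of the raw bytes
($bytes | ForEach-Object { $_.ToString("X2") }) -join " "
```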

Also note that this will not work when there are characters Windows considers invalid in either your source or destination filenames. This is especially true of the source filename, as your extraction tool has probably mangled the name irrecoverably on extraction (by dropping the bytes that correspond to invalid filename characters, such as ?, when read in the wrong encoding). The only thing you can do in those cases is use an alternative extraction tool that avoids this problem entirely.
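A rough way to spot some of the names that cannot be repaired is to check whether the converted name contains characters the filesystem rejects. This is only a heuristic sketch: Test-NameRecoverable is a hypothetical helper name, and a name that lost bytes during extraction can still decode to valid-looking but wrong text, which this check will not catch.

```powershell
$srcEnc  = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
$invalid = [System.IO.Path]::GetInvalidFileNameChars()

# Hypothetical helper: returns $true if the converted name has no invalid filename characters.
function Test-NameRecoverable {
    param([string]$Name)
    $newName = $destEnc.GetString($srcEnc.GetBytes($Name))
    # IndexOfAny returns -1 when none of the invalid characters occur.
    return ($newName.IndexOfAny($invalid) -lt 0)
}

# List the files whose converted names would be rejected by Rename-Item.
Get-ChildItem -File |
    Where-Object { -not (Test-NameRecoverable $_.Name) } |
    Select-Object -ExpandProperty Name
```

Anything this lists is a candidate for re-extracting with a different tool.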

edited Dec 24 at 13:50

answered Dec 23 at 16:48

Bob

45.4k20137172

Thanks a lot. This fixes most of the names but for the rest I'll just have to find a working extraction tool for the archive (psarc format) that the files came from.
– Hiccup
Dec 24 at 16:23
