Decoding assembly instructions in a Game Boy disassembler
$begingroup$
I am coding a game boy disassembler in C++. The program (simplified here) converts a vector of byte into an assembly instruction and print it.
To do so, it iterates through the vector of bytes (char) with an iterator. I noticed that with the it!=char_vect.end()
condition, there is the risk that the iterator goes out from the vector. So I have to check that I do not get out of the vector at each iteration and if it occurs, I throw an error.
#include <iostream>
#include <vector>
int disassemble(std::vector<char>::iterator& it){
int opcode = *it;
int opbytes = 1; //number of bytes used by the operator
switch(opcode){
case 0x76 : std::cout<<"HALT "<<std::endl; opbytes = 1; break;
case 0x10 : std::cout<<"STOP $"<<(int)*(it+1)<<std::endl; opbytes =2; break;
//... About 250 different cases
default : std::cout<<"Instruction not implemented yet"<<std::endl;
}
return opbytes;
}
int main(){
std::vector<char> char_vect = {0x76, 0x10, 0x20}; // Fill vector of char
std::vector<char>::iterator it = char_vect.begin();
// First I only did that
//while (it!=char_vect.end()){
// it+=disassemble(it);
//}
// But I have to check that I do not go out of the char_vect
while (it!=char_vect.end()){
try{
it+=disassemble(it);
//check that it do not overpass end
if (size_t(it - char_vect.begin())>char_vect.size()){
throw "Try to disassemble instruction outside of memory bounds";
}
}
catch(const char * c){
std::cerr << "Fatal error: " << c << std::endl;
return 1;
}
}
return 1;
}
This code works fine. Now I wonder, I have a possible exception case which is the iterator going further than vect_char.end()
. Is a try / catch
an appropriate way to handle this case knowing that:
The reason for the iterator to go out of bounds could be either:
- A non valid input byte sequence, for example {0x76, 0x20, 0x10} since the instruction 0x10 expects an argument. This should not append.
- A mistake in the disassemble function. For exemple with the valid input {0x76, 0x10, 0x20}, if I code by mistake that 0x10 uses 3 opbytes instead of 2, the iterator will go out of bounds. This should not append either.
If I iterated through the
vect_char
with an index, I would not have this issue and the code would be more compact
I never really used try/catch before and I do not know if it is intented for such situations, so I would like to know, how would you have handled this unexpected case?
c++ error-handling iterator assembly serialization
New contributor
$endgroup$
add a comment |
$begingroup$
I am coding a game boy disassembler in C++. The program (simplified here) converts a vector of byte into an assembly instruction and print it.
To do so, it iterates through the vector of bytes (char) with an iterator. I noticed that with the it!=char_vect.end()
condition, there is the risk that the iterator goes out from the vector. So I have to check that I do not get out of the vector at each iteration and if it occurs, I throw an error.
#include <iostream>
#include <vector>
int disassemble(std::vector<char>::iterator& it){
int opcode = *it;
int opbytes = 1; //number of bytes used by the operator
switch(opcode){
case 0x76 : std::cout<<"HALT "<<std::endl; opbytes = 1; break;
case 0x10 : std::cout<<"STOP $"<<(int)*(it+1)<<std::endl; opbytes =2; break;
//... About 250 different cases
default : std::cout<<"Instruction not implemented yet"<<std::endl;
}
return opbytes;
}
int main(){
std::vector<char> char_vect = {0x76, 0x10, 0x20}; // Fill vector of char
std::vector<char>::iterator it = char_vect.begin();
// First I only did that
//while (it!=char_vect.end()){
// it+=disassemble(it);
//}
// But I have to check that I do not go out of the char_vect
while (it!=char_vect.end()){
try{
it+=disassemble(it);
//check that it do not overpass end
if (size_t(it - char_vect.begin())>char_vect.size()){
throw "Try to disassemble instruction outside of memory bounds";
}
}
catch(const char * c){
std::cerr << "Fatal error: " << c << std::endl;
return 1;
}
}
return 1;
}
This code works fine. Now I wonder, I have a possible exception case which is the iterator going further than vect_char.end()
. Is a try / catch
an appropriate way to handle this case knowing that:
The reason for the iterator to go out of bounds could be either:
- A non valid input byte sequence, for example {0x76, 0x20, 0x10} since the instruction 0x10 expects an argument. This should not append.
- A mistake in the disassemble function. For exemple with the valid input {0x76, 0x10, 0x20}, if I code by mistake that 0x10 uses 3 opbytes instead of 2, the iterator will go out of bounds. This should not append either.
If I iterated through the
vect_char
with an index, I would not have this issue and the code would be more compact
I never really used try/catch before and I do not know if it is intented for such situations, so I would like to know, how would you have handled this unexpected case?
c++ error-handling iterator assembly serialization
New contributor
$endgroup$
3
$begingroup$
This question might be more on-topic over at SoftwareEngineering.StackExchange.com . However, before posting please follow their tour and read "How do I ask a good question?", "What topics can I ask about here?" and "What types of questions should I avoid asking?".
$endgroup$
– BCdotWEB
15 hours ago
$begingroup$
Does or doesn't the current code work as intended? I assume you've tested it?
$endgroup$
– Mast
14 hours ago
1
$begingroup$
@Mast Yes it does work as intended, I just tested it.
$endgroup$
– Mai Kar
14 hours ago
2
$begingroup$
Your error is simply passing insufficient Information todisassemble()
.
$endgroup$
– Deduplicator
13 hours ago
3
$begingroup$
I reformulated my demand because I think it was not clear enough. This question is mainly about validating my design choices, i.e. the use of iterators and try/catch.
$endgroup$
– Mai Kar
12 hours ago
add a comment |
$begingroup$
I am coding a game boy disassembler in C++. The program (simplified here) converts a vector of byte into an assembly instruction and print it.
To do so, it iterates through the vector of bytes (char) with an iterator. I noticed that with the it!=char_vect.end()
condition, there is the risk that the iterator goes out from the vector. So I have to check that I do not get out of the vector at each iteration and if it occurs, I throw an error.
#include <iostream>
#include <vector>
int disassemble(std::vector<char>::iterator& it){
int opcode = *it;
int opbytes = 1; //number of bytes used by the operator
switch(opcode){
case 0x76 : std::cout<<"HALT "<<std::endl; opbytes = 1; break;
case 0x10 : std::cout<<"STOP $"<<(int)*(it+1)<<std::endl; opbytes =2; break;
//... About 250 different cases
default : std::cout<<"Instruction not implemented yet"<<std::endl;
}
return opbytes;
}
int main(){
std::vector<char> char_vect = {0x76, 0x10, 0x20}; // Fill vector of char
std::vector<char>::iterator it = char_vect.begin();
// First I only did that
//while (it!=char_vect.end()){
// it+=disassemble(it);
//}
// But I have to check that I do not go out of the char_vect
while (it!=char_vect.end()){
try{
it+=disassemble(it);
//check that it do not overpass end
if (size_t(it - char_vect.begin())>char_vect.size()){
throw "Try to disassemble instruction outside of memory bounds";
}
}
catch(const char * c){
std::cerr << "Fatal error: " << c << std::endl;
return 1;
}
}
return 1;
}
This code works fine. Now I wonder, I have a possible exception case which is the iterator going further than vect_char.end()
. Is a try / catch
an appropriate way to handle this case knowing that:
The reason for the iterator to go out of bounds could be either:
- A non valid input byte sequence, for example {0x76, 0x20, 0x10} since the instruction 0x10 expects an argument. This should not append.
- A mistake in the disassemble function. For exemple with the valid input {0x76, 0x10, 0x20}, if I code by mistake that 0x10 uses 3 opbytes instead of 2, the iterator will go out of bounds. This should not append either.
If I iterated through the
vect_char
with an index, I would not have this issue and the code would be more compact
I never really used try/catch before and I do not know if it is intented for such situations, so I would like to know, how would you have handled this unexpected case?
c++ error-handling iterator assembly serialization
New contributor
$endgroup$
I am coding a game boy disassembler in C++. The program (simplified here) converts a vector of byte into an assembly instruction and print it.
To do so, it iterates through the vector of bytes (char) with an iterator. I noticed that with the it!=char_vect.end()
condition, there is the risk that the iterator goes out from the vector. So I have to check that I do not get out of the vector at each iteration and if it occurs, I throw an error.
#include <iostream>
#include <vector>
int disassemble(std::vector<char>::iterator& it){
int opcode = *it;
int opbytes = 1; //number of bytes used by the operator
switch(opcode){
case 0x76 : std::cout<<"HALT "<<std::endl; opbytes = 1; break;
case 0x10 : std::cout<<"STOP $"<<(int)*(it+1)<<std::endl; opbytes =2; break;
//... About 250 different cases
default : std::cout<<"Instruction not implemented yet"<<std::endl;
}
return opbytes;
}
int main(){
std::vector<char> char_vect = {0x76, 0x10, 0x20}; // Fill vector of char
std::vector<char>::iterator it = char_vect.begin();
// First I only did that
//while (it!=char_vect.end()){
// it+=disassemble(it);
//}
// But I have to check that I do not go out of the char_vect
while (it!=char_vect.end()){
try{
it+=disassemble(it);
//check that it do not overpass end
if (size_t(it - char_vect.begin())>char_vect.size()){
throw "Try to disassemble instruction outside of memory bounds";
}
}
catch(const char * c){
std::cerr << "Fatal error: " << c << std::endl;
return 1;
}
}
return 1;
}
This code works fine. Now I wonder, I have a possible exception case which is the iterator going further than vect_char.end()
. Is a try / catch
an appropriate way to handle this case knowing that:
The reason for the iterator to go out of bounds could be either:
- A non valid input byte sequence, for example {0x76, 0x20, 0x10} since the instruction 0x10 expects an argument. This should not append.
- A mistake in the disassemble function. For exemple with the valid input {0x76, 0x10, 0x20}, if I code by mistake that 0x10 uses 3 opbytes instead of 2, the iterator will go out of bounds. This should not append either.
If I iterated through the
vect_char
with an index, I would not have this issue and the code would be more compact
I never really used try/catch before and I do not know if it is intented for such situations, so I would like to know, how would you have handled this unexpected case?
c++ error-handling iterator assembly serialization
c++ error-handling iterator assembly serialization
New contributor
New contributor
edited 12 hours ago
Mai Kar
New contributor
asked 15 hours ago
Mai KarMai Kar
465
465
New contributor
New contributor
3
$begingroup$
This question might be more on-topic over at SoftwareEngineering.StackExchange.com . However, before posting please follow their tour and read "How do I ask a good question?", "What topics can I ask about here?" and "What types of questions should I avoid asking?".
$endgroup$
– BCdotWEB
15 hours ago
$begingroup$
Does or doesn't the current code work as intended? I assume you've tested it?
$endgroup$
– Mast
14 hours ago
1
$begingroup$
@Mast Yes it does work as intended, I just tested it.
$endgroup$
– Mai Kar
14 hours ago
2
$begingroup$
Your error is simply passing insufficient Information todisassemble()
.
$endgroup$
– Deduplicator
13 hours ago
3
$begingroup$
I reformulated my demand because I think it was not clear enough. This question is mainly about validating my design choices, i.e. the use of iterators and try/catch.
$endgroup$
– Mai Kar
12 hours ago
add a comment |
3
$begingroup$
This question might be more on-topic over at SoftwareEngineering.StackExchange.com . However, before posting please follow their tour and read "How do I ask a good question?", "What topics can I ask about here?" and "What types of questions should I avoid asking?".
$endgroup$
– BCdotWEB
15 hours ago
$begingroup$
Does or doesn't the current code work as intended? I assume you've tested it?
$endgroup$
– Mast
14 hours ago
1
$begingroup$
@Mast Yes it does work as intended, I just tested it.
$endgroup$
– Mai Kar
14 hours ago
2
$begingroup$
Your error is simply passing insufficient Information todisassemble()
.
$endgroup$
– Deduplicator
13 hours ago
3
$begingroup$
I reformulated my demand because I think it was not clear enough. This question is mainly about validating my design choices, i.e. the use of iterators and try/catch.
$endgroup$
– Mai Kar
12 hours ago
3
3
$begingroup$
This question might be more on-topic over at SoftwareEngineering.StackExchange.com . However, before posting please follow their tour and read "How do I ask a good question?", "What topics can I ask about here?" and "What types of questions should I avoid asking?".
$endgroup$
– BCdotWEB
15 hours ago
$begingroup$
This question might be more on-topic over at SoftwareEngineering.StackExchange.com . However, before posting please follow their tour and read "How do I ask a good question?", "What topics can I ask about here?" and "What types of questions should I avoid asking?".
$endgroup$
– BCdotWEB
15 hours ago
$begingroup$
Does or doesn't the current code work as intended? I assume you've tested it?
$endgroup$
– Mast
14 hours ago
$begingroup$
Does or doesn't the current code work as intended? I assume you've tested it?
$endgroup$
– Mast
14 hours ago
1
1
$begingroup$
@Mast Yes it does work as intended, I just tested it.
$endgroup$
– Mai Kar
14 hours ago
$begingroup$
@Mast Yes it does work as intended, I just tested it.
$endgroup$
– Mai Kar
14 hours ago
2
2
$begingroup$
Your error is simply passing insufficient Information to
disassemble()
.$endgroup$
– Deduplicator
13 hours ago
$begingroup$
Your error is simply passing insufficient Information to
disassemble()
.$endgroup$
– Deduplicator
13 hours ago
3
3
$begingroup$
I reformulated my demand because I think it was not clear enough. This question is mainly about validating my design choices, i.e. the use of iterators and try/catch.
$endgroup$
– Mai Kar
12 hours ago
$begingroup$
I reformulated my demand because I think it was not clear enough. This question is mainly about validating my design choices, i.e. the use of iterators and try/catch.
$endgroup$
– Mai Kar
12 hours ago
add a comment |
3 Answers
3
active
oldest
votes
$begingroup$
I have some ideas about how you might be able to improve your program.
Avoid problems
Rather than trying to deal with the problem for every instruction, one approach is avoiding it entirely. One way to do that is to simply append a number of bytes to the end of the vector
. If the maximum bytes for an instruction is $n$, then append $n-1$ bytes to the end and stop when you've advanced into the padded area.
Check before advancing
One could also pass the number of remaining bytes to the disassemble
function. However, the mechanism I'd suggest would be to pass a range, e.g.:
int diassembleOne(std::vector<char>::iterator& it, std::vector<char>::iterator& end) {
// figure out number of bytes for this opcode
if (std::distance(it, end) > opbytes) {
it = end;
// throw or return 0
}
// disassemble the instruction thoroughly
std::advance(it, opbytes);
return opbytes;
}
Use const
iterators
If all the code is doing is disassembling, then it shouldn't alter the underlying vector
. For that reason, I'd recommend passing a std::vector<char>::const_iterator &
.
Use classes
I'd suggest using an Opcode
class like this:
class Opcode {
char code;
short int bytes;
std::string_view name;
bool operator==(char opcode) const { return code == opcode; }
int decode(std::vector<char>::const_iterator& it, std::ostream& out=std::cout) const {
out << name;
++it;
for (int i{bytes-1}; i; --i) {
out << static_cast<unsigned>(*it++);
}
out << 'n';
return bytes;
}
};
constexpr std::array<Opcode,2> instructions {
{ 0x10, 2, "STOP $" },
{ 0x76, 2, "HALT " },
};
Pass a pair of iterators to the dissemble function
As mentioned before, you can pass a pair of iterators to the disassemble
function. Using that plus the class above:
int disassembleOne(std::vector<char>::const_iterator& it, std::vector<char>::const_iterator& end){
auto op{std::find(instructions.begin(), instructions.end(), *it)};
if (op == instructions.end()) {
std::cout << "Instruction not foundn";
it = end;
return 0; // instruction not found or off the end of the passed array
}
if (std::distance(it, end) < op->bytes) {
std::cout << "Not enough bytes left to decode " << op->name << 'n';
it = end;
return 0; // instruction not found or off the end of the passed array
}
return op->decode(it);
}
Now main
becomes very simple:
int main(){
const std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x10}; // Fill vector of char
auto end{char_vect.cend()};
for (auto it{char_vect.cbegin()}; it != end; disassembleOne(it, end)) {
}
}
Another way to do this would be to put more of the processing in the Opcode
itself -- it would make sense that each opcode would know how to decode itself.
Be clear about caller expectations
This code, as with the original code, expects that the it
being passed is a valid iterator that is not the end. It is good to document that in comments in the code.
$endgroup$
$begingroup$
The loop inmain
doesn't use the result of the call todisassembleOne
.
$endgroup$
– Roland Illig
9 hours ago
$begingroup$
@RolandIllig It doesn't need to use the return value because the iterator, which is passed by reference, is updated within the function.
$endgroup$
– Edward
8 hours ago
add a comment |
$begingroup$
I think the big switch
is a problem. Consider a more data-driven approach where each opcode is described by a struct:
struct OpCode
{
unsigned char command_byte;
unsigned char mask; // if only a subset of bits determine command
unsigned char length;
// other members as needed - e.g. a pointer to the "print" function
};
Now, the code that reads instructions can determine whether the command is unterminated, without needing to repeat the logic for every multi-byte opcode.
I've included the mask
so that we don't encode every single instruction (e.g. ld R1, R2
encodes the source and destination registers in the bits of a single-byte command; it would be tedious and error-prone to write them all separately here).
The simple length
value I've shown isn't quite enough, given that the Game Boy's LR35902 processor supports the 0xCB prefix for extended Z80 instructions - we might want to handle that outside of the normal flow, and use it to switch between different instruction tables.
We get away with a simple length
value here, because the only prefix instruction supported by the Game Boy's LR35902 processor is 0xCB, which is always followed by a single-byte instruction. If we were decoding Z80 instructions (with ED prefix), then we'd need something a little more sophisticated.
$endgroup$
add a comment |
$begingroup$
Your code's division into functions isn't very natural. The objective is to have each function performing one specific task, which should also be as orthogonal as possible to the tasks performed by the other functions. For instance, your disassemble
function performs three different functions: it reads from the instruction stream, it interprets assembly code and returns the number of bytes that should be skipped to get to the next instruction. That's a not-so-coherent mix of responsibilities.
There's also a bug in your code, because it+=disassemble(it);
could point beyond char_vect.end()
which is in itself undefined behavior, even if you don't dereference the iterator.
I would build my disassembler around an iterator, or better even a range (if you don't mind external libraries or anticipating the next standard):
#include <algorithm>
#include <vector>
#include <iostream>
#include <range/v3/view.hpp>
using iterator = std::vector<char>::const_iterator;
struct truncated_instruction {};
// one function to interpret the instruction
void interpret(char instruction) {
if (instruction == 'x' throw bad_instruction();
std::cout << (int) instruction << ' ';
}
// one function to get the number of bytes to skip
int nb_bytes(char c) { return 1; }
class disassembled
: public ranges::view_facade<disassembled>
{
friend ranges::range_access;
iterator first, last;
char const & read() const { return *first; }
bool equal(ranges::default_sentinel) const { return first == last; }
void next() {
// one function to get to the next instruction
auto bytes_to_skip = nb_bytes(*first);
if (std::distance(first, last) < bytes_to_skip) throw truncated_instruction();
// check if there's enough space left to advance before advancing
std::advance(first, bytes_to_skip);
}
public:
disassembled() = default;
explicit disassembled(const std::vector<char>& v) : first(v.begin()), last(v.end()) {}
};
int main() {
std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x30};
try {
for (auto instruction : disassembled(char_vect)) interpret(instruction);
}
catch // ...
}
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Mai Kar is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f215349%2fdecoding-assembly-instructions-in-a-game-boy-disassembler%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
I have some ideas about how you might be able to improve your program.
Avoid problems
Rather than trying to deal with the problem for every instruction, one approach is avoiding it entirely. One way to do that is to simply append a number of bytes to the end of the vector
. If the maximum bytes for an instruction is $n$, then append $n-1$ bytes to the end and stop when you've advanced into the padded area.
Check before advancing
One could also pass the number of remaining bytes to the disassemble
function. However, the mechanism I'd suggest would be to pass a range, e.g.:
int diassembleOne(std::vector<char>::iterator& it, std::vector<char>::iterator& end) {
// figure out number of bytes for this opcode
if (std::distance(it, end) > opbytes) {
it = end;
// throw or return 0
}
// disassemble the instruction thoroughly
std::advance(it, opbytes);
return opbytes;
}
Use const
iterators
If all the code is doing is disassembling, then it shouldn't alter the underlying vector
. For that reason, I'd recommend passing a std::vector<char>::const_iterator &
.
Use classes
I'd suggest using an Opcode
class like this:
class Opcode {
char code;
short int bytes;
std::string_view name;
bool operator==(char opcode) const { return code == opcode; }
int decode(std::vector<char>::const_iterator& it, std::ostream& out=std::cout) const {
out << name;
++it;
for (int i{bytes-1}; i; --i) {
out << static_cast<unsigned>(*it++);
}
out << 'n';
return bytes;
}
};
constexpr std::array<Opcode,2> instructions {
{ 0x10, 2, "STOP $" },
{ 0x76, 2, "HALT " },
};
Pass a pair of iterators to the dissemble function
As mentioned before, you can pass a pair of iterators to the disassemble
function. Using that plus the class above:
int disassembleOne(std::vector<char>::const_iterator& it, std::vector<char>::const_iterator& end){
auto op{std::find(instructions.begin(), instructions.end(), *it)};
if (op == instructions.end()) {
std::cout << "Instruction not foundn";
it = end;
return 0; // instruction not found or off the end of the passed array
}
if (std::distance(it, end) < op->bytes) {
std::cout << "Not enough bytes left to decode " << op->name << 'n';
it = end;
return 0; // instruction not found or off the end of the passed array
}
return op->decode(it);
}
Now main
becomes very simple:
int main(){
const std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x10}; // Fill vector of char
auto end{char_vect.cend()};
for (auto it{char_vect.cbegin()}; it != end; disassembleOne(it, end)) {
}
}
Another way to do this would be to put more of the processing in the Opcode
itself -- it would make sense that each opcode would know how to decode itself.
Be clear about caller expectations
This code, as with the original code, expects that the it
being passed is a valid iterator that is not the end. It is good to document that in comments in the code.
$endgroup$
$begingroup$
The loop inmain
doesn't use the result of the call todisassembleOne
.
$endgroup$
– Roland Illig
9 hours ago
$begingroup$
@RolandIllig It doesn't need to use the return value because the iterator, which is passed by reference, is updated within the function.
$endgroup$
– Edward
8 hours ago
add a comment |
$begingroup$
I have some ideas about how you might be able to improve your program.
Avoid problems
Rather than trying to deal with the problem for every instruction, one approach is avoiding it entirely. One way to do that is to simply append a number of bytes to the end of the vector
. If the maximum bytes for an instruction is $n$, then append $n-1$ bytes to the end and stop when you've advanced into the padded area.
Check before advancing
One could also pass the number of remaining bytes to the disassemble
function. However, the mechanism I'd suggest would be to pass a range, e.g.:
int diassembleOne(std::vector<char>::iterator& it, std::vector<char>::iterator& end) {
// figure out number of bytes for this opcode
if (std::distance(it, end) > opbytes) {
it = end;
// throw or return 0
}
// disassemble the instruction thoroughly
std::advance(it, opbytes);
return opbytes;
}
Use const
iterators
If all the code is doing is disassembling, then it shouldn't alter the underlying vector
. For that reason, I'd recommend passing a std::vector<char>::const_iterator &
.
Use classes
I'd suggest using an Opcode
class like this:
class Opcode {
char code;
short int bytes;
std::string_view name;
bool operator==(char opcode) const { return code == opcode; }
int decode(std::vector<char>::const_iterator& it, std::ostream& out=std::cout) const {
out << name;
++it;
for (int i{bytes-1}; i; --i) {
out << static_cast<unsigned>(*it++);
}
out << 'n';
return bytes;
}
};
constexpr std::array<Opcode,2> instructions {
{ 0x10, 2, "STOP $" },
{ 0x76, 2, "HALT " },
};
Pass a pair of iterators to the dissemble function
As mentioned before, you can pass a pair of iterators to the disassemble
function. Using that plus the class above:
int disassembleOne(std::vector<char>::const_iterator& it, std::vector<char>::const_iterator& end){
auto op{std::find(instructions.begin(), instructions.end(), *it)};
if (op == instructions.end()) {
std::cout << "Instruction not foundn";
it = end;
return 0; // instruction not found or off the end of the passed array
}
if (std::distance(it, end) < op->bytes) {
std::cout << "Not enough bytes left to decode " << op->name << 'n';
it = end;
return 0; // instruction not found or off the end of the passed array
}
return op->decode(it);
}
Now main
becomes very simple:
int main(){
const std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x10}; // Fill vector of char
auto end{char_vect.cend()};
for (auto it{char_vect.cbegin()}; it != end; disassembleOne(it, end)) {
}
}
Another way to do this would be to put more of the processing in the Opcode
itself -- it would make sense that each opcode would know how to decode itself.
Be clear about caller expectations
This code, as with the original code, expects that the it
being passed is a valid iterator that is not the end. It is good to document that in comments in the code.
$endgroup$
$begingroup$
The loop inmain
doesn't use the result of the call todisassembleOne
.
$endgroup$
– Roland Illig
9 hours ago
$begingroup$
@RolandIllig It doesn't need to use the return value because the iterator, which is passed by reference, is updated within the function.
$endgroup$
– Edward
8 hours ago
add a comment |
$begingroup$
I have some ideas about how you might be able to improve your program.
Avoid problems
Rather than trying to deal with the problem for every instruction, one approach is avoiding it entirely. One way to do that is to simply append a number of bytes to the end of the vector
. If the maximum bytes for an instruction is $n$, then append $n-1$ bytes to the end and stop when you've advanced into the padded area.
Check before advancing
One could also pass the number of remaining bytes to the disassemble
function. However, the mechanism I'd suggest would be to pass a range, e.g.:
int diassembleOne(std::vector<char>::iterator& it, std::vector<char>::iterator& end) {
// figure out number of bytes for this opcode
if (std::distance(it, end) > opbytes) {
it = end;
// throw or return 0
}
// disassemble the instruction thoroughly
std::advance(it, opbytes);
return opbytes;
}
Use const
iterators
If all the code is doing is disassembling, then it shouldn't alter the underlying vector
. For that reason, I'd recommend passing a std::vector<char>::const_iterator &
.
Use classes
I'd suggest using an Opcode
class like this:
class Opcode {
char code;
short int bytes;
std::string_view name;
bool operator==(char opcode) const { return code == opcode; }
int decode(std::vector<char>::const_iterator& it, std::ostream& out=std::cout) const {
out << name;
++it;
for (int i{bytes-1}; i; --i) {
out << static_cast<unsigned>(*it++);
}
out << 'n';
return bytes;
}
};
constexpr std::array<Opcode,2> instructions {
{ 0x10, 2, "STOP $" },
{ 0x76, 2, "HALT " },
};
Pass a pair of iterators to the dissemble function
As mentioned before, you can pass a pair of iterators to the disassemble
function. Using that plus the class above:
int disassembleOne(std::vector<char>::const_iterator& it, std::vector<char>::const_iterator& end){
auto op{std::find(instructions.begin(), instructions.end(), *it)};
if (op == instructions.end()) {
std::cout << "Instruction not foundn";
it = end;
return 0; // instruction not found or off the end of the passed array
}
if (std::distance(it, end) < op->bytes) {
std::cout << "Not enough bytes left to decode " << op->name << 'n';
it = end;
return 0; // instruction not found or off the end of the passed array
}
return op->decode(it);
}
Now main
becomes very simple:
int main(){
const std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x10}; // Fill vector of char
auto end{char_vect.cend()};
for (auto it{char_vect.cbegin()}; it != end; disassembleOne(it, end)) {
}
}
Another way to do this would be to put more of the processing in the Opcode
itself -- it would make sense that each opcode would know how to decode itself.
Be clear about caller expectations
This code, as with the original code, expects that the it
being passed is a valid iterator that is not the end. It is good to document that in comments in the code.
$endgroup$
I have some ideas about how you might be able to improve your program.
Avoid problems
Rather than trying to deal with the problem for every instruction, one approach is avoiding it entirely. One way to do that is to simply append a number of bytes to the end of the vector
. If the maximum bytes for an instruction is $n$, then append $n-1$ bytes to the end and stop when you've advanced into the padded area.
Check before advancing
One could also pass the number of remaining bytes to the disassemble
function. However, the mechanism I'd suggest would be to pass a range, e.g.:
int diassembleOne(std::vector<char>::iterator& it, std::vector<char>::iterator& end) {
// figure out number of bytes for this opcode
if (std::distance(it, end) > opbytes) {
it = end;
// throw or return 0
}
// disassemble the instruction thoroughly
std::advance(it, opbytes);
return opbytes;
}
Use const
iterators
If all the code is doing is disassembling, then it shouldn't alter the underlying vector
. For that reason, I'd recommend passing a std::vector<char>::const_iterator &
.
Use classes
I'd suggest using an Opcode
class like this:
class Opcode {
char code;
short int bytes;
std::string_view name;
bool operator==(char opcode) const { return code == opcode; }
int decode(std::vector<char>::const_iterator& it, std::ostream& out=std::cout) const {
out << name;
++it;
for (int i{bytes-1}; i; --i) {
out << static_cast<unsigned>(*it++);
}
out << 'n';
return bytes;
}
};
constexpr std::array<Opcode,2> instructions {
{ 0x10, 2, "STOP $" },
{ 0x76, 2, "HALT " },
};
Pass a pair of iterators to the dissemble function
As mentioned before, you can pass a pair of iterators to the disassemble
function. Using that plus the class above:
int disassembleOne(std::vector<char>::const_iterator& it, std::vector<char>::const_iterator& end){
auto op{std::find(instructions.begin(), instructions.end(), *it)};
if (op == instructions.end()) {
std::cout << "Instruction not foundn";
it = end;
return 0; // instruction not found or off the end of the passed array
}
if (std::distance(it, end) < op->bytes) {
std::cout << "Not enough bytes left to decode " << op->name << 'n';
it = end;
return 0; // instruction not found or off the end of the passed array
}
return op->decode(it);
}
Now main
becomes very simple:
int main(){
const std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x10}; // Fill vector of char
auto end{char_vect.cend()};
for (auto it{char_vect.cbegin()}; it != end; disassembleOne(it, end)) {
}
}
Another way to do this would be to put more of the processing in the Opcode
itself -- it would make sense that each opcode would know how to decode itself.
Be clear about caller expectations
This code, as with the original code, expects that the it
being passed is a valid iterator that is not the end. It is good to document that in comments in the code.
edited 12 hours ago
answered 12 hours ago
EdwardEdward
47.3k378212
47.3k378212
$begingroup$
The loop inmain
doesn't use the result of the call todisassembleOne
.
$endgroup$
– Roland Illig
9 hours ago
$begingroup$
@RolandIllig It doesn't need to use the return value because the iterator, which is passed by reference, is updated within the function.
$endgroup$
– Edward
8 hours ago
add a comment |
$begingroup$
The loop inmain
doesn't use the result of the call todisassembleOne
.
$endgroup$
– Roland Illig
9 hours ago
$begingroup$
@RolandIllig It doesn't need to use the return value because the iterator, which is passed by reference, is updated within the function.
$endgroup$
– Edward
8 hours ago
$begingroup$
The loop in
main
doesn't use the result of the call to disassembleOne
.$endgroup$
– Roland Illig
9 hours ago
$begingroup$
The loop in
main
doesn't use the result of the call to disassembleOne
.$endgroup$
– Roland Illig
9 hours ago
$begingroup$
@RolandIllig It doesn't need to use the return value because the iterator, which is passed by reference, is updated within the function.
$endgroup$
– Edward
8 hours ago
$begingroup$
@RolandIllig It doesn't need to use the return value because the iterator, which is passed by reference, is updated within the function.
$endgroup$
– Edward
8 hours ago
add a comment |
$begingroup$
I think the big switch
is a problem. Consider a more data-driven approach where each opcode is described by a struct:
struct OpCode
{
unsigned char command_byte;
unsigned char mask; // if only a subset of bits determine command
unsigned char length;
// other members as needed - e.g. a pointer to the "print" function
};
Now, the code that reads instructions can determine whether the command is unterminated, without needing to repeat the logic for every multi-byte opcode.
I've included the mask
so that we don't encode every single instruction (e.g. ld R1, R2
encodes the source and destination registers in the bits of a single-byte command; it would be tedious and error-prone to write them all separately here).
The simple length
value I've shown isn't quite enough, given that the Game Boy's LR35902 processor supports the 0xCB prefix for extended Z80 instructions - we might want to handle that outside of the normal flow, and use it to switch between different instruction tables.
We get away with a simple length
value here, because the only prefix instruction supported by the Game Boy's LR35902 processor is 0xCB, which is always followed by a single-byte instruction. If we were decoding Z80 instructions (with ED prefix), then we'd need something a little more sophisticated.
$endgroup$
add a comment |
$begingroup$
I think the big switch
is a problem. Consider a more data-driven approach where each opcode is described by a struct:
struct OpCode
{
unsigned char command_byte;
unsigned char mask; // if only a subset of bits determine command
unsigned char length;
// other members as needed - e.g. a pointer to the "print" function
};
Now, the code that reads instructions can determine whether the command is unterminated, without needing to repeat the logic for every multi-byte opcode.
I've included the mask
so that we don't encode every single instruction (e.g. ld R1, R2
encodes the source and destination registers in the bits of a single-byte command; it would be tedious and error-prone to write them all separately here).
The simple length
value I've shown isn't quite enough, given that the Game Boy's LR35902 processor supports the 0xCB prefix for extended Z80 instructions - we might want to handle that outside of the normal flow, and use it to switch between different instruction tables.
We get away with a simple length
value here, because the only prefix instruction supported by the Game Boy's LR35902 processor is 0xCB, which is always followed by a single-byte instruction. If we were decoding Z80 instructions (with ED prefix), then we'd need something a little more sophisticated.
$endgroup$
add a comment |
$begingroup$
I think the big switch
is a problem. Consider a more data-driven approach where each opcode is described by a struct:
struct OpCode
{
unsigned char command_byte;
unsigned char mask; // if only a subset of bits determine command
unsigned char length;
// other members as needed - e.g. a pointer to the "print" function
};
Now, the code that reads instructions can determine whether the command is unterminated, without needing to repeat the logic for every multi-byte opcode.
I've included the mask
so that we don't encode every single instruction (e.g. ld R1, R2
encodes the source and destination registers in the bits of a single-byte command; it would be tedious and error-prone to write them all separately here).
The simple length
value I've shown isn't quite enough, given that the Game Boy's LR35902 processor supports the 0xCB prefix for extended Z80 instructions - we might want to handle that outside of the normal flow, and use it to switch between different instruction tables.
We get away with a simple length
value here, because the only prefix instruction supported by the Game Boy's LR35902 processor is 0xCB, which is always followed by a single-byte instruction. If we were decoding Z80 instructions (with ED prefix), then we'd need something a little more sophisticated.
$endgroup$
I think the big switch
is a problem. Consider a more data-driven approach where each opcode is described by a struct:
struct OpCode
{
unsigned char command_byte;
unsigned char mask; // if only a subset of bits determine command
unsigned char length;
// other members as needed - e.g. a pointer to the "print" function
};
Now, the code that reads instructions can determine whether the command is unterminated, without needing to repeat the logic for every multi-byte opcode.
I've included the mask
so that we don't encode every single instruction (e.g. ld R1, R2
encodes the source and destination registers in the bits of a single-byte command; it would be tedious and error-prone to write them all separately here).
The simple length
value I've shown isn't quite enough, given that the Game Boy's LR35902 processor supports the 0xCB prefix for extended Z80 instructions - we might want to handle that outside of the normal flow, and use it to switch between different instruction tables.
We get away with a simple length
value here, because the only prefix instruction supported by the Game Boy's LR35902 processor is 0xCB, which is always followed by a single-byte instruction. If we were decoding Z80 instructions (with ED prefix), then we'd need something a little more sophisticated.
answered 11 hours ago
Toby SpeightToby Speight
26.2k742118
26.2k742118
add a comment |
add a comment |
$begingroup$
Your code's division into functions isn't very natural. The objective is to have each function performing one specific task, which should also be as orthogonal as possible to the tasks performed by the other functions. For instance, your disassemble
function performs three different functions: it reads from the instruction stream, it interprets assembly code and returns the number of bytes that should be skipped to get to the next instruction. That's a not-so-coherent mix of responsibilities.
There's also a bug in your code, because it+=disassemble(it);
could point beyond char_vect.end()
which is in itself undefined behavior, even if you don't dereference the iterator.
I would build my disassembler around an iterator, or better even a range (if you don't mind external libraries or anticipating the next standard):
#include <algorithm>
#include <vector>
#include <iostream>
#include <range/v3/view.hpp>
using iterator = std::vector<char>::const_iterator;
struct truncated_instruction {};
// one function to interpret the instruction
void interpret(char instruction) {
if (instruction == 'x' throw bad_instruction();
std::cout << (int) instruction << ' ';
}
// one function to get the number of bytes to skip
int nb_bytes(char c) { return 1; }
class disassembled
: public ranges::view_facade<disassembled>
{
friend ranges::range_access;
iterator first, last;
char const & read() const { return *first; }
bool equal(ranges::default_sentinel) const { return first == last; }
void next() {
// one function to get to the next instruction
auto bytes_to_skip = nb_bytes(*first);
if (std::distance(first, last) < bytes_to_skip) throw truncated_instruction();
// check if there's enough space left to advance before advancing
std::advance(first, bytes_to_skip);
}
public:
disassembled() = default;
explicit disassembled(const std::vector<char>& v) : first(v.begin()), last(v.end()) {}
};
int main() {
std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x30};
try {
for (auto instruction : disassembled(char_vect)) interpret(instruction);
}
catch // ...
}
$endgroup$
add a comment |
$begingroup$
Your code's division into functions isn't very natural. The objective is to have each function performing one specific task, which should also be as orthogonal as possible to the tasks performed by the other functions. For instance, your disassemble
function performs three different functions: it reads from the instruction stream, it interprets assembly code and returns the number of bytes that should be skipped to get to the next instruction. That's a not-so-coherent mix of responsibilities.
There's also a bug in your code, because it+=disassemble(it);
could point beyond char_vect.end()
which is in itself undefined behavior, even if you don't dereference the iterator.
I would build my disassembler around an iterator, or better even a range (if you don't mind external libraries or anticipating the next standard):
#include <algorithm>
#include <vector>
#include <iostream>
#include <range/v3/view.hpp>
using iterator = std::vector<char>::const_iterator;
struct truncated_instruction {};
// one function to interpret the instruction
void interpret(char instruction) {
if (instruction == 'x' throw bad_instruction();
std::cout << (int) instruction << ' ';
}
// one function to get the number of bytes to skip
int nb_bytes(char c) { return 1; }
class disassembled
: public ranges::view_facade<disassembled>
{
friend ranges::range_access;
iterator first, last;
char const & read() const { return *first; }
bool equal(ranges::default_sentinel) const { return first == last; }
void next() {
// one function to get to the next instruction
auto bytes_to_skip = nb_bytes(*first);
if (std::distance(first, last) < bytes_to_skip) throw truncated_instruction();
// check if there's enough space left to advance before advancing
std::advance(first, bytes_to_skip);
}
public:
disassembled() = default;
explicit disassembled(const std::vector<char>& v) : first(v.begin()), last(v.end()) {}
};
int main() {
std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x30};
try {
for (auto instruction : disassembled(char_vect)) interpret(instruction);
}
catch // ...
}
$endgroup$
add a comment |
$begingroup$
Your code's division into functions isn't very natural. The objective is to have each function performing one specific task, which should also be as orthogonal as possible to the tasks performed by the other functions. For instance, your disassemble
function performs three different functions: it reads from the instruction stream, it interprets assembly code and returns the number of bytes that should be skipped to get to the next instruction. That's a not-so-coherent mix of responsibilities.
There's also a bug in your code, because it+=disassemble(it);
could point beyond char_vect.end()
which is in itself undefined behavior, even if you don't dereference the iterator.
I would build my disassembler around an iterator, or better even a range (if you don't mind external libraries or anticipating the next standard):
#include <algorithm>
#include <vector>
#include <iostream>
#include <range/v3/view.hpp>
using iterator = std::vector<char>::const_iterator;
struct truncated_instruction {};
// one function to interpret the instruction
void interpret(char instruction) {
if (instruction == 'x' throw bad_instruction();
std::cout << (int) instruction << ' ';
}
// one function to get the number of bytes to skip
int nb_bytes(char c) { return 1; }
class disassembled
: public ranges::view_facade<disassembled>
{
friend ranges::range_access;
iterator first, last;
char const & read() const { return *first; }
bool equal(ranges::default_sentinel) const { return first == last; }
void next() {
// one function to get to the next instruction
auto bytes_to_skip = nb_bytes(*first);
if (std::distance(first, last) < bytes_to_skip) throw truncated_instruction();
// check if there's enough space left to advance before advancing
std::advance(first, bytes_to_skip);
}
public:
disassembled() = default;
explicit disassembled(const std::vector<char>& v) : first(v.begin()), last(v.end()) {}
};
int main() {
std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x30};
try {
for (auto instruction : disassembled(char_vect)) interpret(instruction);
}
catch // ...
}
$endgroup$
Your code's division into functions isn't very natural. The objective is to have each function performing one specific task, which should also be as orthogonal as possible to the tasks performed by the other functions. For instance, your disassemble
function performs three different functions: it reads from the instruction stream, it interprets assembly code and returns the number of bytes that should be skipped to get to the next instruction. That's a not-so-coherent mix of responsibilities.
There's also a bug in your code, because it+=disassemble(it);
could point beyond char_vect.end()
which is in itself undefined behavior, even if you don't dereference the iterator.
I would build my disassembler around an iterator, or better even a range (if you don't mind external libraries or anticipating the next standard):
#include <algorithm>
#include <vector>
#include <iostream>
#include <range/v3/view.hpp>
using iterator = std::vector<char>::const_iterator;
struct truncated_instruction {};
// one function to interpret the instruction
void interpret(char instruction) {
if (instruction == 'x' throw bad_instruction();
std::cout << (int) instruction << ' ';
}
// one function to get the number of bytes to skip
int nb_bytes(char c) { return 1; }
class disassembled
: public ranges::view_facade<disassembled>
{
friend ranges::range_access;
iterator first, last;
char const & read() const { return *first; }
bool equal(ranges::default_sentinel) const { return first == last; }
void next() {
// one function to get to the next instruction
auto bytes_to_skip = nb_bytes(*first);
if (std::distance(first, last) < bytes_to_skip) throw truncated_instruction();
// check if there's enough space left to advance before advancing
std::advance(first, bytes_to_skip);
}
public:
disassembled() = default;
explicit disassembled(const std::vector<char>& v) : first(v.begin()), last(v.end()) {}
};
int main() {
std::vector<char> char_vect = {0x76, 0x10, 0x20, 0x30};
try {
for (auto instruction : disassembled(char_vect)) interpret(instruction);
}
catch // ...
}
answered 12 hours ago
papagagapapagaga
4,547321
4,547321
add a comment |
add a comment |
Mai Kar is a new contributor. Be nice, and check out our Code of Conduct.
Mai Kar is a new contributor. Be nice, and check out our Code of Conduct.
Mai Kar is a new contributor. Be nice, and check out our Code of Conduct.
Mai Kar is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f215349%2fdecoding-assembly-instructions-in-a-game-boy-disassembler%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
$begingroup$
This question might be more on-topic over at SoftwareEngineering.StackExchange.com . However, before posting please follow their tour and read "How do I ask a good question?", "What topics can I ask about here?" and "What types of questions should I avoid asking?".
$endgroup$
– BCdotWEB
15 hours ago
$begingroup$
Does or doesn't the current code work as intended? I assume you've tested it?
$endgroup$
– Mast
14 hours ago
1
$begingroup$
@Mast Yes it does work as intended, I just tested it.
$endgroup$
– Mai Kar
14 hours ago
2
$begingroup$
Your error is simply passing insufficient Information to
disassemble()
.$endgroup$
– Deduplicator
13 hours ago
3
$begingroup$
I reformulated my demand because I think it was not clear enough. This question is mainly about validating my design choices, i.e. the use of iterators and try/catch.
$endgroup$
– Mai Kar
12 hours ago