Writing a disassembler

[c++ with libbfd from binutils-dev package]

Hello_Friend,
today is a nice day, so let us build something.
I assume many of you have heard names like OllyDbg or IDA before, so let's try building something like that today.

Don't get ahead of yourself: the functionality will be nothing compared to these tools, and we will work only on the console, but that's fine as long as we learn something from it, right?

Introduction

To build our own custom debugging tools, we will need to power up our include folder first.
We are going to install the binutils-dev package via apt-get.
You should also refresh your C++ skills and read up on the general structure of a binary program.
Well, that last part is not all too important since I am going to write some basic tutorials about that soon...

So, folks, fire up your favorite programming tools and let's get down to work...

Addendum:

To compile the program, use g++ like this (the library comes after the source files, because the linker resolves symbols from left to right):

g++ loader.cc loader_main.cc -o programname -lbfd

Use the program like this:
./programname targetbinary targetsection

Basic structure of our program

We are going to create a header file with 3 classes, an implementation file, and a main program.
A binary should be represented by a fitting data structure, so the first class we will create in the header file is called Binary.
In a binary file, there is a multitude of different sections and links to just about anything, and I understand if this is confusing at first. The sections are represented by their own class, called Section. The links are not links in the classic sense (like in a browser); think of them as symbols that tell the linker and interpreter where to look. Thus, this class is called Symbol.

Next is our main worker bee, the implementation of the program, the place you could consider the heart of our code. I'll just call this file loader.cc, but go on and be creative..
The loader has the sole purpose of loading the binary into its data structure and loading the sections and symbols into their class data structures. In retrospect, this sounds much easier than it is. We will also provide a means of unloading the binary so it won't stay in memory forever.

Finally, the main program is responsible for calling everything in the right order, printing stuff to the user in the right format, and cleaning up afterwards.

The header file

As mentioned, we need 3 classes for our program, so we will declare them here.
We will also declare some free functions which we will be calling later in the worker part.

Here is the code of the header file:


#ifndef LOADER_H
#define LOADER_H

//imports
#include <stdint.h>
#include <string>
#include <vector>
#include <bfd.h>

class Binary;
class Section;
class Symbol;

//symbols (functions, locals, globals, section markers)
class Symbol
{
 public:
  enum symbol_type
  {
   symbol_unknown  = 0,
   symbol_function = 1,
   symbol_local = 2,
   symbol_global = 3,
   symbol_section = 4
  };

  //instance variables
  symbol_type  type;
  std::string  name;
  uint64_t   adresse;
  bool    weak;

  //constructor
  Symbol() : type(symbol_unknown), name(), adresse(0),weak(false) {}

};  //class symbol

class Section
{
 public:
  enum section_type
  {
   section_none = 0,
   section_data = 1,
   section_code = 2
  };

  //instance variables
  Binary    *binary; //objects are on heap anyway
  std::string  name;
  section_type  type;
  uint64_t   starting_adr;
  uint64_t   size;
  uint8_t   *bytes;  //for raw hexdump, allocated on heap (with malloc)

  //check whether a specific address belongs to this section
  bool contains(uint64_t adresse) 
  {
   //next level boolean
   return (adresse >= starting_adr) && (adresse - starting_adr < size);
  }

  //constructor
  Section() : binary(NULL), type(section_none), starting_adr(0), size(0), bytes(NULL){}

}; //class Section

//this represents the binary as a whole and is our main class 
//for this api, also it loads other stuff from the input file
class Binary
{
 public:
  enum binary_type
  {
   type_auto  = 0,
   type_elf  = 1,
   type_pe  = 2
  };

  //target processor type
  //the distinction btw. 32 and 64bit
  //is made by the unsigned binarybits var
  enum binary_arch
  {
   bin_NONE = 0,
   bin_x86_32 = 1,
   bin_x86_64 = 2
  };

  //instance variables
  std::string    progname;
  std::string    progType;  //type string
  std::string    arch;   //architecture string (x86-64 usually)
  unsigned    binarybits;  //implying int if no type specified
  uint64_t    entrypoint;
  binary_arch    targetArch;  //architecture number
  binary_type    type;   //type number
  std::vector<Section> sectionList;
  std::vector<Symbol>  symbolList;

  //function to get some hot sections fast
  //can be improved by a proper argument list

  Section *get_Section_Text()
  {
   for (auto &s : sectionList) if(s.name == ".text") return &s;
   return NULL; //no section with that name
  }

  Section *get_Section_Stringtable()
  {
   for (auto &s : sectionList) if(s.name == ".strtab") return &s;
   return NULL;
  }

  Section *get_Section_Data()
  {
   for (auto &s : sectionList) if(s.name == ".data") return &s;
   return NULL;
  }

  //constructor (initializers in the same order as the declarations above)
  Binary() : binarybits(0), entrypoint(0), targetArch(bin_NONE), type(type_auto) {}


}; //class Binary

//these are the two main functions of this api:
int load_binary(std::string &progname, Binary *bin_file, Binary::binary_type bin_type);
void unload_binary(Binary *bin_file);

//function declarations
int open_bfd(std::string &fname, Binary *bin, Binary::binary_type bin_type);
int load_symbols(bfd *b, Binary *bin);
int load_sections(bfd *b, Binary *bin);


#endif // LOADER_H

Okay, let's quickly dive into this.
Most parts should be obvious due to my comments there anyway...

At the top, after the include guard and the includes, we first forward-declare the classes so the compiler knows the names before they are used (Section holds a pointer to Binary, which is only defined further down). Also, you should try to leave the order as it is, because g++ is not the smartest compiler in this regard and starts complaining about missing definitions pretty fast.
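To see why the forward declaration matters, here is a minimal sketch that is separate from the loader. The names Box and Item are made up for illustration. Item stores a pointer to Box before Box is defined, which only compiles because Box was declared first; a pointer needs just the name, while a full member object would need the complete definition.

```cpp
#include <cassert>
#include <vector>

class Box;              // forward declaration: "Box is a class, details follow"

// hypothetical element type, plays the role of Section
class Item
{
 public:
  Box *owner;           // fine: a pointer only needs the name, not the definition
  Item() : owner(nullptr) {}
};

// hypothetical container type, plays the role of Binary
class Box
{
 public:
  std::vector<Item> items;   // needs the complete Item type, so Item comes first

  // add an item and record this box as its owner
  Item *add()
  {
   items.push_back(Item());
   items.back().owner = this;
   return &items.back();
  }
};

// returns true if the item's back-pointer points to the given box
bool owned_by(const Item &item, const Box &box)
{
 return item.owner == &box;
}
```

Swapping the two class definitions would break the build, since std::vector<Item> cannot be used as a member before Item is complete.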

At 16, you'll find the Symbol class definition. This class represents symbols used by the linker and the interpreter. Because a binary contains both normal and dynamic symbols, we try to keep the class as generic as possible.

Every symbol that we create needs a type, as there is a multitude of different symbols. To keep track of which kind of symbol an instance is, we define an enum with the different types.
As you can see, there are currently 5 different symbol types present, of which symbol_function is the most interesting one for us since it points to the start of a function in the target binary.
Also worth mentioning is the bool weak, which will mark weak symbols later, but more about these details in future blog posts.
The constructor sets some default values: we assume a symbol to be of type symbol_unknown until it is identified, and we assume symbols not to be weak by default.

At 39 begins the Section class, which is built much like the Symbol class, so nothing new here. Note how we defined a member function here to check whether a specific address lies within the range of the section instance. We are not using it in the final program, though, but if you want to expand the program later, feel free to use it (and write more of those functions!)
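The containment check is worth a closer look. The sketch below uses a trimmed-down stand-in for Section (called Range here, a made-up name) with only the two fields the check needs. It also includes a naive variant for comparison, to show why the listing subtracts first instead of computing start + size, which can wrap around for ranges near the top of the address space.

```cpp
#include <cassert>
#include <cstdint>

// trimmed-down stand-in for the Section class (hypothetical name)
struct Range
{
 uint64_t start;
 uint64_t size;

 // same test as Section::contains: subtracting first cannot overflow
 // because addr >= start is checked before
 bool contains(uint64_t addr) const
 {
  return (addr >= start) && (addr - start < size);
 }

 // naive variant for comparison: start + size can wrap around to a small number
 bool contains_naive(uint64_t addr) const
 {
  return (addr >= start) && (addr < start + size);
 }
};
```

For a range that starts at 0x401000 with size 0x200, both variants agree. They only differ when start + size exceeds the largest 64-bit value, in which case the naive variant rejects addresses that do belong to the range.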

At 71 we finally get to our Binary class, the main data structure we are going to use.
The enum at the top of the class will later distinguish between Linux (ELF) and Windows (PE) binaries. The enum after that is rather self-explanatory and names the target processor type. I don't know about ARM support yet, but you should find out by looking into the libbfd header, which you can find under /usr/include. In case you're wondering about the lone unsigned keyword there, C++ assumes unsigned int when you don't specify anything else.
Again, the functions beneath the instance variables aren't used in the final program, but they are nice to have and a good exercise for you to expand.
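The three getters differ only in the name they compare against, so one way to tackle the "proper argument list" exercise is a single lookup that takes the name as a parameter. This is only a sketch with trimmed-down stand-ins; MiniSection, MiniBinary and get_section are made-up names and not part of loader.h.

```cpp
#include <cassert>
#include <string>
#include <vector>

// trimmed-down stand-in for Section (hypothetical name)
struct MiniSection
{
 std::string name;
};

// trimmed-down stand-in for Binary (hypothetical name)
struct MiniBinary
{
 std::vector<MiniSection> sectionList;

 // return a pointer to the first section with the given name,
 // or nullptr if no section matches
 MiniSection *get_section(const std::string &wanted)
 {
  for (auto &s : sectionList)
   if (s.name == wanted)
    return &s;
  return nullptr;
 }
};
```

A call like bin.get_section(".text") then replaces get_Section_Text(), and any other section name works without writing a new function.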

At 130 starts the declaration of the free functions in the main worker. If a function isn't static, you should always declare it in its header file.

So much for the header file, take your time and read through it so you understand the basic concept of what we are building here.

Loader.cc - the main worker bee

We declared some functions in the header file, so let's now define them in a C++ source file.

Here is the code for loader.cc:


#include <stdio.h>
#include <stdlib.h>
#include <bfd.h>
#include "loader.h"

void unload_binary(Binary *bin)
{
 size_t i;
 Section *section;

 for (i = 0; i < bin->sectionList.size(); i++)
 {
  section = &bin->sectionList[i];

  //just free the only malloced class variable (free(NULL) is a harmless no-op)
  free(section->bytes);
  section->bytes = NULL; //so a second call cannot free the same block twice
 }
}

//all libbfds functions work on file handles to binarys, so we need one of these
int open_bfd(std::string &fname, Binary *bin, Binary::binary_type type)
{
 /*
  return values:
  1  -> ok
  0  -> error
  -1 -> failed to open
  -2 -> wrong file type
  -3 -> unknown binary format
  -4 -> tried opening MS-DOS file (lol)
  -5 -> unknown or unsupported type
  -6 -> unsupported binary arch
  -7 -> section loading failed (unlikely at this point)
 */

 bfd_init(); 
 const bfd_arch_info_type *targetarch;
 
 bfd *bfd_h; //handle to binary

 //get the name of the to be opened binary
 bin->progname = std::string(fname);

 bfd_h = bfd_openr(fname.c_str(), NULL); //second param is type, but autocasting here

 //check handle working, bfd_get_error gives number to check against in the library
 if(!bfd_h)
 {
  fprintf(stderr, "failed to open binary '%s' (%s)\n", fname.c_str(), bfd_errmsg(bfd_get_error()));
  return -1;
 }

 //can be bfd_object, bfd_archive or bfd_core
 if(!bfd_check_format(bfd_h, bfd_object))
 {
  //an "bfd_object" is an executable, relocatable object or a shared library
  fprintf(stderr, "file '%s' not an executable (%s)\n", fname.c_str(), bfd_errmsg(bfd_get_error()));
  return -2;
 }

 //because lib is bad, we need to manually set errors to "none" again
 bfd_set_error(bfd_error_no_error);

 /* check the type of binary 
  bfd_target_msdos_flavour
  bfd_target_coff_flavour
  bfd_target_elf_flavour
  bfd_target_unknown_flavour --> break on this one
 */

 if(bfd_get_flavour(bfd_h) == bfd_target_unknown_flavour)
 {
  fprintf(stderr, "Unrecognised format of binary '%s' (%s)\n", fname.c_str(), bfd_errmsg(bfd_get_error()));
  return -3;
 }

 //get the entry point of the binary
 bin->entrypoint = bfd_get_start_address(bfd_h);

 //get the binary type (as string)
 bin->progType = std::string(bfd_h->xvec->name);

 //get binary flavour
 switch(bfd_h->xvec->flavour)
 {
  case bfd_target_elf_flavour:
   bin->type = Binary::type_elf;
  break;
  case bfd_target_coff_flavour:
   bin->type = Binary::type_pe;
  break;
  case bfd_target_msdos_flavour:
   fprintf(stderr, "Are u really trying to open '%s' , a MS-DOS file?(%s)\n", fname.c_str(), bfd_errmsg(bfd_get_error()));
   return -4;
  break;
  case bfd_target_unknown_flavour:
   return -5;
  break;
  default:
   fprintf(stderr, "Unsupported file type of '%s' (%s)\n", fname.c_str(), bfd_errmsg(bfd_get_error()));
   return -5;
 }

 targetarch = bfd_get_arch_info(bfd_h);
 bin->arch = std::string(targetarch->printable_name);

 switch(targetarch->mach) // "machine" in libbfd
 {
  case bfd_mach_i386_i386:
   bin->targetArch = Binary::bin_x86_32;
   bin->binarybits = 32;
  break;
  case bfd_mach_x86_64:
   bin->targetArch = Binary::bin_x86_64;
   bin->binarybits = 64;
  break;
  default:
   fprintf(stderr, "Unsupported architecture of '%s' (%s)\n", fname.c_str(), bfd_errmsg(bfd_get_error()));
   return -6;
 }
 printf("\nREADING SYMBOLS AND DYNAMIC SYMBOLS\n\n");

 //try to handle symbols (they may be stripped)
 load_symbols(bfd_h,bin);

 printf("\nREADING SECTION LIST\n");

 //load binary's sections
 if(load_sections(bfd_h,bin) < 0)
  return -7;

 //close the handle and finish function :-)
 bfd_close(bfd_h);

 fprintf(stdout, "\nLOADING BINARY FINISHED SUCCESSFULLY! RESULTS BELOW:\n");

 //everything worked fine
 return 1;

}

//load symbols from binary if it is not stripped
int load_symbols(bfd *bfd_h, Binary *bin)
{
 long symbol_size, dynsym_size;
 long symbol_count,dynsym_count;

 //2 arrays for symbols
 bfd_symbol **symbols, **dyn_symbols;

 Symbol sym;

 symbols = NULL;
 dyn_symbols = NULL;

 //get size of symbol tables, to malloc afterwards
 symbol_size = bfd_get_symtab_upper_bound(bfd_h);  //this remains a mystery (g++ had problems here)
 dynsym_size = bfd_get_dynamic_symtab_upper_bound(bfd_h);//just copy func name from library and it works

 if (symbol_size < 0 || dynsym_size < 0) //a negative size means the table could not be read
 {
  fprintf(stderr, "An error occured while reading the symbol table: '%s'\n", bfd_errmsg(bfd_get_error()));
  return 0;
 }

 //allocate some space for the symbol arrays
 symbols = (bfd_symbol**)malloc(symbol_size);
 dyn_symbols = (bfd_symbol**)malloc(dynsym_size);

 //this should not happen at all in modern times
 if(!symbols || !dyn_symbols)
 {
  fprintf(stderr, "Out of memory\n");
  return -1;
 }

 //populate the tables and return the number of symbols
 symbol_count = bfd_canonicalize_symtab(bfd_h, symbols);
 dynsym_count = bfd_canonicalize_dynamic_symtab(bfd_h, dyn_symbols);


 fprintf(stdout, "symbol count is %ld entries long\n" , symbol_count);
 fprintf(stdout, "dyn_symbol count is %ld entries long\n" , dynsym_count);

 //collect all function_symbols from the symbol table
 if(symbol_count>0)
  for(long i=0; i<symbol_count; i++)
  {

   sym = Symbol();

   //check flags from bfd.h -- (flags & BSF_x) is nonzero when the flag is set
   if(symbols[i]-> flags & BSF_FUNCTION)
    sym.type = Symbol::symbol_function;
   else if(symbols[i]-> flags & BSF_LOCAL)
    sym.type = Symbol::symbol_local;
   else if(symbols[i]-> flags & BSF_GLOBAL)
    sym.type = Symbol::symbol_global;
   else if(symbols[i]-> flags & BSF_SECTION_SYM)
    sym.type = Symbol::symbol_section;
    
   //check whether symbol is weak
   if(symbols[i]-> flags & BSF_WEAK)
    sym.weak = 1;

   //get rest of symbol information
   sym.name = std::string(symbols[i]->name);
   sym.adresse = bfd_asymbol_value(symbols[i]);

   //pushback object
   bin->symbolList.push_back(sym);

  }
 else
  fprintf(stderr, "\nSYMBOL COUNT VALUE -> 0, ERROR MSG: '%s'\n", bfd_errmsg(bfd_get_error()));

 //place a dummy Symbol in the list to mark the beginning of the dynamic symbols
 sym = Symbol();
 sym.type   = Symbol::symbol_unknown;
 sym.name   = std::string("######### DYNAMIC SYMBOLS BELOW ########");
 sym.adresse   = -1;
 sym.weak   = 0;
 bin-> symbolList.push_back(sym);

 //same as above, just now for dynamic symbols
 //i have this strange feeling that dynamic symbols ALWAYS R localsymbols....
 if(dynsym_count>0)
  for(long i=0; i<dynsym_count; i++)
  {
    
   sym = Symbol();

   //check flags from bfd.h -- (flags & BSF_x) is nonzero when the flag is set
   if(dyn_symbols[i]-> flags & BSF_FUNCTION)
    sym.type = Symbol::symbol_function;
   else if(dyn_symbols[i]-> flags & BSF_LOCAL)
    sym.type = Symbol::symbol_local;
   else if(dyn_symbols[i]-> flags & BSF_GLOBAL)
    sym.type = Symbol::symbol_global;
   else if(dyn_symbols[i]-> flags & BSF_SECTION_SYM)
    sym.type = Symbol::symbol_section;
    
   //check whether symbol is weak
   if(dyn_symbols[i]-> flags & BSF_WEAK)
    sym.weak = 1;

   //get rest of symbol information
   sym.name = std::string(dyn_symbols[i]->name);
   sym.adresse = bfd_asymbol_value(dyn_symbols[i]);

   //pushback object
   bin->symbolList.push_back(sym);

  }
 else
  fprintf(stderr, "\nDYNSYM COUNT 0, ERROR MSG: '%s'\n", bfd_errmsg(bfd_get_error()));

 if(symbols)
  free(symbols);

 if(dyn_symbols)
  free(dyn_symbols);
 
 return 1;
}

int load_sections(bfd *bfd_h, Binary *bin)
{
 int bfd_flags;
 uint64_t start_address, size;
 const char *section_name;
 bfd_section* bfd_section;
 Section *section;
 Section::section_type type ;

 
 //read the section's flags and set type (thus, call by reference)
 for (bfd_section = bfd_h->sections; bfd_section; bfd_section = bfd_section->next) //stops on null pointer
 {
  bfd_flags = bfd_get_section_flags(bfd_h, bfd_section);

  if(bfd_flags & SEC_CODE)
   type = Section::section_code;
  else if(bfd_flags & SEC_DATA)
   type = Section::section_data;
  else
  {
   type = Section::section_none;
   continue;
  }

  //fprintf(stdout, "found a section with type: %d\n", bfd_flags);

  // vma = virtual memory address (start address of the section)
  start_address  = bfd_section_vma(bfd_h, bfd_section);
  size    = bfd_section_size(bfd_h, bfd_section);
  section_name = bfd_section_name(bfd_h, bfd_section);

  if(!section_name) section_name = "";

  //this is actually pretty smart way of getting reference
  bin -> sectionList.push_back(Section());
  section = &bin->sectionList.back();

  section->binary   = bin;
  section->name    = std::string(section_name);
  section->type    = type;
  section->starting_adr  = start_address;
  section->size    = size;
  section->bytes    = (uint8_t*)malloc(size);

  if(!section->bytes)
  {
   fprintf(stderr, "Out of memory!\n");
   return -1;
  }

  if(!bfd_get_section_contents(bfd_h,bfd_section, section->bytes, 0 ,size))
  {
   fprintf(stderr, "Failed to read section '%s'(%s)\n", 
    section_name, bfd_errmsg(bfd_get_error()));

   return -2;
  }
  
 }

 return 0;

}

Yes, this is a pretty long file as it is our main working program.
Don't worry, it's no arcane magic spell, it's only C++ ( there SHOULD be a difference at least )

At 7, the first function, unload_binary, frees the section bytes again once we are finished with our work, so it is rather simple.

At 23, we start our work with open_bfd (not the best name I could have picked, in retrospect). We pass a file path to our function, and the first thing we do inside the function is call bfd_init(). This call is so important that I almost wrote a separate function just for it. Following along, we create a bfd handle and gather some basic information about the binary. This should be rather self-explanatory.

At 86-104, we determine the type of binary file. As you might have already guessed, there are some major differences between PE-files (for Microsoft windows) and ELF-files (for UNIX-style systems). I will write about this in the future, but for now just remember that these are different binary files that are basically the same but with other naming and slightly different structure. Yes, Microsoft is making money by selling you what you could get basically for free, but that is a different story.

At 109-122, we gather information about the target system architecture. You have probably heard the terms 32bit and 64bit systems, although you might read this at a point in the future where no 32bit executables exist anymore. We do not really intend to load any 8- and 16bit binaries, so we ignore the fact that these exist.

At 126 and 131, we jump to the working functions for loading symbols and sections (we'll get to that in a minute). You might wonder why I decided to return an error code at 131, when loading the sections, but not at 126, when loading symbols. The reason is that a binary may be completely stripped of symbols, as is normal practice when releasing a program. The reason for this is capitalism, as it makes work harder for crackers, especially in the case of video games and the like. Such a stripped binary does not have a symbol table or any symbols at all, but this is no reason for us to return an error code, since 0 symbols does not automatically mean that our operation failed.
At 135, we close our handle, since we don't want to leave any open handles lying around in memory.
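The main program only checks whether open_bfd returned something negative. If you want friendlier output, a small helper could translate the return codes listed in the comment at the top of open_bfd into text. This is only a sketch: describe_open_result is a made-up name and not part of the listing, and the wording of the messages is taken from that comment.

```cpp
#include <cassert>
#include <cstring>

// translate the return codes documented in open_bfd into short messages
// (hypothetical helper, not part of loader.cc)
const char *describe_open_result(int code)
{
 switch (code)
 {
  case  1: return "ok";
  case -1: return "failed to open";
  case -2: return "wrong file type";
  case -3: return "unknown binary format";
  case -4: return "tried opening MS-DOS file";
  case -5: return "unknown or unsupported type";
  case -6: return "unsupported binary arch";
  case -7: return "section loading failed";
  default: return "error";   // 0 and anything undocumented
 }
}
```

Since every case returns directly, there is no break to forget, and an undocumented code still produces a sensible fallback.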

At 145, you will find the beginning of the load_symbols function.
At 159-160, we get the upper bounds of the symbol table sizes so we can allocate enough space on the heap to hold all symbols. This might never be a concern, but you never know how big the programs you will load in the future are, so this is good practice at this point. We allocate the space for the symbols at lines 169-170.

At 188, we start working with the normal symbols of the binary. If there are any present, we loop over all of them and start by determining the type of each symbol. If you are unfamiliar with this kind of if-statement, look into bitwise AND. Basically, if a specific flag is set, the result must be true, because 1&1 = 1, right? Well....
Since we are not immediately outputting this list, we place a dummy symbol in the list at 220, which we'll later use to tell the user that this entry marks the start of the dynamic symbols. Speaking of which, these start at 229 and are handled pretty much the same way as the normal symbols. To be honest, I just copy pasted that part and renamed the loop variable. Finally we free the symbol arrays, if they were present to begin with.
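About that "Well....": the result of flags & MASK is not necessarily 1. It is whatever bits of the mask survive, and the if-statement works because every nonzero value counts as true. A minimal sketch, using made-up flag values (the real BSF_* constants are defined in bfd.h and have different values):

```cpp
#include <cassert>
#include <cstdint>

// made-up flag bits for illustration; the real BSF_* values live in bfd.h
const uint32_t FLAG_LOCAL    = 1u << 0;
const uint32_t FLAG_GLOBAL   = 1u << 1;
const uint32_t FLAG_FUNCTION = 1u << 3;
const uint32_t FLAG_WEAK     = 1u << 7;

// true if at least one bit of mask is set in flags
bool has_flag(uint32_t flags, uint32_t mask)
{
 return (flags & mask) != 0;
}
```

With flags = FLAG_GLOBAL | FLAG_FUNCTION, the expression flags & FLAG_FUNCTION evaluates to 8 here, not 1, yet it still passes the if. Also note that the if/else-if chain in the listing stops at the first match, so a symbol that is both global and a function ends up as symbol_function.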

At 269 you'll find the entrance to the function that finds the sections for us, namely load_sections.
We loop over the section list and extract information about each section. You should already be familiar with this, since it is not that different from the way we gathered symbol information before. I especially want to mention the 2 lines 304 and 305 since they blew my mind when I first encountered them (I took them from a book about C++). Also, at line 312 you will find an allocation for the bytes field of our data structure. The reason is that we are saving the program bytes of each section so we can output them later, as you can see at line 320. One thing to watch out for: in newer binutils releases (2.34 and later), the section accessors take only the section as argument, for example bfd_section_vma(bfd_section), and bfd_get_section_flags is called bfd_section_flags.
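The two mind-blowing lines first append an empty Section to the vector and then take the address of the element that now lives inside it, so all fields can be filled in place without making another copy. Here is that pattern in isolation, with a made-up element type Entry standing in for Section. One caveat from the C++ standard library: a later push_back may reallocate the vector, so the pointer should only be used until the next insertion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// hypothetical element type standing in for Section
struct Entry
{
 std::string name;
 unsigned long size;
 Entry() : name(), size(0) {}
};

// append a default-constructed Entry and fill it in place through a pointer
Entry *append_entry(std::vector<Entry> &list, const std::string &name, unsigned long size)
{
 list.push_back(Entry());    // 1. create the element inside the vector
 Entry *e = &list.back();    // 2. point at it (valid until the vector grows again)
 e->name = name;
 e->size = size;
 return e;
}
```

This is the reason load_sections fetches a fresh pointer with back() in every loop iteration instead of keeping one from an earlier round.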

Yikes, that was much more than I anticipated. Although the code isn't that hard to understand, it is above the novice level and definitely a lot of lines at once. But the good thing is, that was already the hardest part of our program. The final part will be shorter and easier again, I promise..

loader_main.cc - our main program

Sheesh, there is still more code to come, but this will be the last part, after that it is finally time to compile and test our construct.

Straightforward, here is the code:


#include <stdio.h>
#include <stdint.h>
#include <string>
#include "loader.h"

/*
 created by clockw0rk
 shokwaVe_sec
*/

int main(int argc, char *argv[])
{
 size_t i;
 Binary bin;
 Section *sec;
 Symbol *sym;
 std::string filename;
 std::string dump_name;

 if(argc < 2)
 {
  printf("BAKA! usage: %s targetbinary [targetsection]\n", argv[0]);
  return 1;
 }

 //assign filename, print msg to user
 filename.assign(argv[1]);
 printf("\nSHOKWAVE_BINARY_LOADER : INITIALIZING\n[use terminal at least 100 characters wide pl0x]\n\n");
 fprintf(stdout, "FILENAME: %s\n", filename.c_str());

 //user wants to print section xy
 if(argc>2)
 {
  dump_name.assign(argv[2]);
  fprintf(stdout, "ARGUMENT: %s, trying to dump section at the end\n", dump_name.c_str());
 }


 //##################################################################################################
 //begin work loop
 if(open_bfd(filename, &bin, Binary::type_auto) <0)
  return 1;


 //output to user to clarify binary values
 fprintf(stdout,"\nBinary Name: '%s' \nBinary Arch: %s/%s ( %u bits )\nEntry Point: 0x%016jx\n", 
  bin.progname.c_str(),
  bin.progType.c_str(),
  bin.arch.c_str(),
  bin.binarybits,
  bin.entrypoint);

 //##################################################################################################
 fprintf(stdout, "\nSECTION LIST\n\n  VMA       SIZE     NAME      TYPE\n\n");

 //print sections to screen
 for(i=0;i<bin.sectionList.size();i++)
 {
  sec = &bin.sectionList[i];

  fprintf(stdout,"  0x%016jx %-8ju %-20s %s\n", 
   sec-> starting_adr,
   sec-> size,
   sec-> name.c_str(),
   sec-> type == Section::section_code ? "CODE" : "DATA");
 }

 //##################################################################################################
 //print symbols to screen, if any
 if(bin.symbolList.size() > 0)
 {

  fprintf(stdout, "\n\nSYMBOL TABLE(s):\n\n");
  for(i=0; i<bin.symbolList.size(); i++)
  {
   sym = &bin.symbolList[i];

   //print the correct symbol type, from enum list
   std::string symType = "";
   switch(sym->type)
   {
    case Symbol::symbol_function:
     symType = "FUNCTION ";
    break;

    case Symbol::symbol_local:
     symType = "LOCALSYM ";
    break;

    case Symbol::symbol_global:
     symType = "GLOBALSYM";
    break;

    case Symbol::symbol_section:
     symType = "SECTION  ";
    break;
    case Symbol::symbol_unknown:
     symType = "UNKNOWN  ";
   }

   if(sym->weak)
    symType.append("       WEAK");


   //print the symbol informations, use else when name gets too long
   if(sym-> name.length() < 80)
    //format symbol string and print
    fprintf(stdout," %-80s           0x%016jx   %s\n",
     sym->name.c_str(),
     sym-> adresse,
     symType.c_str());
   else
    fprintf(stdout," %s\n%-92s0x%016jx   %s\n",
     sym->name.c_str(),
     "",
     sym-> adresse,
     symType.c_str());
  }
 }

 //##################################################################################################
 //try dumping the contents of section x
 if(argc>2)
 {
  fprintf(stdout, "\n\nTrying to dump section %s\n", dump_name.c_str());
  for(auto &s : bin.sectionList)
   if(s.name == dump_name)
   {
    fprintf(stdout, "\nRAW HEXDUMP 4 SECTION %s; size:%ju byte:\n\n", dump_name.c_str(),(uintmax_t)s.size);

    //tracker for formatting the dump
    int counter = 0;
    int linebreaker = 0;

    for (i = 0; i < s.size; i++)
    {
       unsigned char c = ((uint8_t*)s.bytes)[i] ;
       if(static_cast<int>(c) < 16)
        fprintf (stdout,"0%x", static_cast<int>(c)) ;
       else
        fprintf (stdout,"%x", static_cast<int>(c)) ;
       
       counter++;
       if(counter == 4)   //rework this so it is not so ugly anymore!!!
       {
        fprintf(stdout,"  ");
        counter = 0;
        linebreaker++;
        if(linebreaker == 8)
        {
         linebreaker = 0;
         fprintf(stdout,"\n");
        }
       }
    }


   }
 }

 //##################################################################################################
 fprintf(stdout,"\n\nEnd of operations, freeing memory. Sayonara\n");
 unload_binary(&bin);

 return 0;
}

This file starts straightforwardly with the main function of our program on line 12.
After declaring some local variables and checking whether the user called the program with enough parameters, we first assign the target binary at line 28.

Also, if the user wants to output a specific section in hex format, we save that parameter after line 33.
If you are curious, this is not complete yet since the user might want to output more than just one section at once, so feel free to expand this procedure so it can take more than one parameter. For now, one section is enough to output.
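One possible starting point for that exercise is to collect everything after the target binary into a list of section names. This is only a sketch; collect_section_names is a made-up helper and not part of loader_main.cc.

```cpp
#include <cassert>
#include <string>
#include <vector>

// collect every argument after the target binary as a section name to dump
// argv[0] is the program name, argv[1] the binary, argv[2..] the sections
// (hypothetical helper, not part of the listing)
std::vector<std::string> collect_section_names(int argc, const char *const argv[])
{
 std::vector<std::string> names;
 for (int i = 2; i < argc; i++)
  names.push_back(argv[i]);
 return names;
}
```

In main you would call it as collect_section_names(argc, argv) and then run the dump loop once per name. With only the binary given, the list is simply empty and nothing is dumped.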

Line 42 starts the whole work-cycle we defined in the second file, so the call to open_bfd will start a whole bunch of work before it eventually returns.
If this should ever fail, we will break the program flow at this point with error code 1.

At line 47, the whole work-cycle is already finished, and all that is left to do is output our results to the user. We start by printing the most basic information here.

At lines 58-67, we are printing the section information since it is really valuable information. To be honest, this format string looks pretty intimidating, and it is not really intuitive, but it will output all sections in a nice, structured way, as shown in the picture below.

At line 71, we start printing the symbols to the user. First, we check which kind of symbol we got and prepare an output string which we perhaps expand with the word weak in line 102. Then we output the symbol to the screen, using 2 different versions of formatted output because the screen looks ugly if the symbol name is too long, as shown in lines 107-118.
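If you prefer to keep main shorter, the type-to-label mapping can live in its own function. The sketch below repeats the enumerators from loader.h so it stands alone; base_label and type_label are made-up names. Returning from each case has the nice side effect that there is no break to forget.

```cpp
#include <cassert>
#include <string>

// same enumerators as Symbol::symbol_type in loader.h, repeated so the sketch stands alone
enum symbol_type
{
 symbol_unknown  = 0,
 symbol_function = 1,
 symbol_local    = 2,
 symbol_global   = 3,
 symbol_section  = 4
};

// fixed-width label for each symbol type (hypothetical helper)
const char *base_label(symbol_type type)
{
 switch (type)
 {
  case symbol_function: return "FUNCTION ";
  case symbol_local:    return "LOCALSYM ";
  case symbol_global:   return "GLOBALSYM";
  case symbol_section:  return "SECTION  ";
  default:              return "UNKNOWN  ";
 }
}

// label plus the WEAK marker, as printed in the symbol table
std::string type_label(symbol_type type, bool weak)
{
 std::string label = base_label(type);
 if (weak)
  label.append("       WEAK");
 return label;
}
```

The printing loop would then pass type_label(...).c_str() to fprintf instead of building symType by hand.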

The output of these operations should look something like this:

Note how each section also has a symbol pointing to it, as seen in the upper part of the picture.

At line 124, if a second parameter naming the section to be hex-dumped is given, we start doing this. We compare the name of each section against the given parameter, so the user has to give the section name exactly, like .text (with a dot).
Following this is some black string magic which I refuse to explain here since formatting strings is not the point of this tutorial. If everything went smoothly, the output will look something like this:

In this case there are only 32 bytes present since the test program only contains the string "Let the fire consume you".
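For the curious, the formatting can be written more compactly. The sketch below builds the dump as a string, using the same layout as the loop in main: two hex digits per byte, a gap after every 4 bytes and a line break after 8 groups. hexdump is a made-up helper name. The %02x format pads values below 16 with a leading zero, which replaces the manual if/else.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// format a byte buffer as hex: groups of 4 bytes separated by two spaces,
// 8 groups (32 bytes) per line (hypothetical helper, not part of the listing)
std::string hexdump(const uint8_t *bytes, size_t size)
{
 std::string out;
 char buf[3];   // two hex digits plus the terminating zero

 for (size_t i = 0; i < size; i++)
 {
  // %02x pads single-digit values with a leading zero
  std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(bytes[i]));
  out += buf;

  if ((i + 1) % 4 == 0)
   out += "  ";      // gap after every 4 bytes
  if ((i + 1) % 32 == 0)
   out += "\n";      // line break after 8 groups
 }
 return out;
}
```

To print a section, something like fputs(hexdump(s.bytes, s.size).c_str(), stdout) would do. Building the whole string first is fine for small sections; for big ones, printing byte by byte as in the listing avoids the extra memory.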

At last, in line 164, we unload the binary, thus freeing heap memory before closing the program.

Conclusion

This is a pretty big program, I admit. Take your time and read everything carefully, and try to understand each part.

Soon we will start finding out what parts of the binary are really important, for example the mysterious .got section, but this is stuff for some other tutorials.

I hope you had fun and learned something. Use your skills wisely, and try to expand the code if you can. Oh, and let me know if you achieved something, or just write me your thoughts; after all, I do all this stuff without earning any money from it.

Build a system that even a monkey can use, and only monkeys will use it.

- numb.3rs
