ELF file structure

ELF file format

[an in-depth look at the Linux binary format]


Ohayo_gozaimasu,

As promised in the past, we are going to have a deeper look into the "Executable and Linkable Format" for Linux, called ELF.
Yeah, ELF, like the mystic creatures we know from various fantasy settings. If you're curious, there is also a format called DWARF, for debugging purposes, but these are tales for another time, little HUMAN.
Understanding the structure of this format is crucial if you want to go on to debug, disassemble and instrument binaries in the future.

Introduction: how a binary is made

The ELF format is a widely used executable format, found for example in:

  • Unix, Solaris, BSD, BeOS and Android
  • PSP, PlayStation 2-4, Dreamcast, GameCube and Wii
  • Operating systems made by Nokia, Samsung and Ericsson
  • Microcontrollers from Atmel and Texas Instruments
So, as you can see, this format is everywhere, but what is it actually good for?
You see, an ELF file is kind of an arcane structure holding all the information the operating system needs to execute a program, for example on your PC. To better understand this, let us first look at how a program is made:


  1. A programmer writes code in a high-level language, let's assume C
  2. After adding and implementing each file and specifying each header, the compilation process starts with the preprocessor
    1. The preprocessor expands all header files into one big source file
    2. At this point, all macros are expanded in said file as well
  3. Now the compiler and assembler get to work, in this case gcc
    1. The compiler reads the code and checks it for errors, as it is nothing more than a giant automaton
    2. The compiler outputs assembly code based on the input
    3. The assembler then assembles this code into machine instructions, which we usually display in hexadecimal form
  4. This operation results in the output of one or more .o files (object files), to be specific relocatable object files

This is as far as the compiler is involved. Many people don't know that there is a second program besides the compiler that is necessary for the creation of programs: the linker.

  1. The linker first resolves all relocation symbols, called symbolic references
  2. Then it links together all object files and external libraries into one big blob of data
  3. The linker furthermore creates a basic structure in the file that we will look at soon
  4. The final file, called a.out (if not given a different name), contains all the information the loader needs to start and execute the process
The code in an object file contains many unresolved symbolic addresses. The linker resolves these and turns them into valid addresses.


The on-disk representation of a binary

So, let's investigate the file just created with gcc (or basically any other program, as they are in general terms pretty much identical).
Try opening it and see -- YIKES, it's all rubbish and numbers.
This is of course because we are looking at a binary file which contains machine instructions meant to be read by a processor (or let's say, an interpreter at least).
Trying to read this is like looking at Japanese kanji without speaking a single word of Japanese.
So what can we do? Well, one option would be to get an x86 assembly reference sheet, look up each hexadecimal number (called opcode) and resolve it to the respective instruction (called mnemonic).
This will of course take weeks, although I believe you would get used to some of the instructions really fast. But another problem you would have to face is that not every hex number in the file is really an instruction for the processor; many numbers are addresses, data or structural information, so this would lead nowhere since you couldn't tell them apart from each other.

No, the correct way is of course to use or write tools that can help you understand the contents, for example my disassembler or readelf.
We will look at how to do this for PE files (Windows) in the future, but for now let's stick with ELFes here.

Let's take a look at how an ELF file is structured in general:

Before diving deeper into each section, let's get a high-level overview first.

The ELF header is a structure that is present in each and every ELF file. Without this header the operating system would refuse to even load the program into memory. It contains various information, for example the entry point into the binary, and it always starts with the magic bytes 7f 45 4c 46. These are the byte 0x7f followed by the ASCII representation of the letters E, L and F. So each binary starts with 7f "ELF", which sounds close enough to "seven eleven" to be easy to remember.
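Checking for the magic bytes takes only a few lines. The following is a minimal sketch in Python; the names ELF_MAGIC, is_elf and is_elf_file are made up for this example and are not part of any library.

```python
ELF_MAGIC = b"\x7fELF"  # the bytes 7f 45 4c 46


def is_elf(data: bytes) -> bool:
    """Return True if the buffer starts with the ELF magic bytes."""
    return data[:4] == ELF_MAGIC


def is_elf_file(path: str) -> bool:
    """Read the first four bytes of a file and compare them to the magic."""
    with open(path, "rb") as f:
        return is_elf(f.read(4))
```

On a typical Linux system, is_elf_file("/bin/ls") should return True, while a shell script or a Windows .exe would fail the check.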

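The ELF header is followed by the program header table, which is optional for relocatable object files but needed for anything that is supposed to be loaded and run. This table stores information about segments (the chunks of the file that get mapped into memory at run-time).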

After this come the sections of the binary, which we will look into shortly. Some important sections are .text and .rodata. Each section has various flags set, describing whether the section is executable (thus containing explicit code) or data. Some sections like .rodata are read-only, meaning the process is unable to modify them at run-time to alter its behaviour. This, of course, does not mean you can't fiddle around with these sections before they are loaded.

At the very end of the binary (usually, at least) is the so-called section header table. Each section in the binary is described by a section header, which holds its properties and contains information like the starting address, the in-file offset and the size of the section. The section headers for every section in the binary are contained in the section header table. There is also a string table referenced from there, but more about this later.

If you are curious about the structures of ELF files, you can look up the structs at /usr/include/elf.h on basically every Linux machine.

So, after getting a brief overview of the general structure of a binary, let's descend a bit deeper into the rabbit hole.

The ELF-header

As already noted, each ELF binary starts with an ELF-header.
For 32-bit binaries, the ELF header is always 52 bytes long; for 64-bit binaries, it is always 64 bytes long.
I will now list each field and give a brief explanation of its contents. The sizes are given for 64-bit files, and fields that are smaller in 32-bit files are marked:


  1. e_ident [16 bytes]
    1. magic: contains the values 7f 45 4c 46 [4 bytes]
    2. class: 1 [32-bit] or 2 [64-bit]
    3. data: 1 [little endian] or 2 [big endian]
    4. version: ignore this, it is always 1
    5. osabi: default value for this is 0 [System V ABI]
    6. abiversion: default value is 0
    7. padding: [7 bytes] reserved for future changes that might or might not come
  2. e_type [2 bytes]
    1. type description of binary, usually
      1. ET_REL for relocatable
      2. ET_EXEC for executable
      3. ET_DYN for shared objects (dynamic libraries)
  3. e_machine [2 bytes]
    1. EM_386 (32-bit x86), EM_X86_64 (64-bit x86) or EM_ARM
  4. e_version [4 bytes]
    1. describes the ELF version at build-time, the same as the version field in e_ident and always 1 as well
  5. e_entry [8 bytes, 4 bytes in 32-bit files]
    1. This is the (virtual) address at which execution will start later. As we will see in the future, it is a common attack vector for code injection (well-known address attack). It is also a good point to start disassembling a binary.
  6. e_phoff [8 bytes, 4 bytes in 32-bit files]
    1. In-file (not virtual) offset to the program header table. May be 0 if there is no program header table, since it is optional
  7. e_shoff [8 bytes, 4 bytes in 32-bit files]
    1. Similarly, an in-file offset to the section header table, or 0.
  8. e_flags [4 bytes]
    1. Flags needed for a specific architecture. Usually this is important for ARM binaries, while it is set to 0 for x86 and x86_64 binaries.
  9. e_ehsize [2 bytes]
    1. Size of the ELF header itself: 52 bytes in 32-bit files, 64 bytes in 64-bit files.
  10. e_*entsize and e_*num [4 fields, 2 bytes each]
    1. e_phentsize [2 bytes] : size of one entry in the program header table
    2. e_phnum [2 bytes] : number of program headers
    3. e_shentsize [2 bytes] : size of one entry in the section header table
    4. e_shnum [2 bytes] : number of section headers
  11. e_shstrndx [2 bytes]
    1. The index of the section header table entry that belongs to the section name string table. This section, called .shstrtab, contains the name of each section present inside the binary, in ASCII format. It is not loaded at run-time. I believe it needs to be patched when you inject a new section, but feel free to fiddle around with it.
That's it for the ELF header. If you need to know which of the two header sizes applies, check whether the binary is 32- or 64-bit by looking at the class byte in e_ident (or at the e_machine field).
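The field list above maps almost one-to-one onto a format string for Python's struct module. The following is a minimal sketch under that assumption, not a complete parser: ElfHeader and parse_elf_header are names invented for this example, and only the class and data bytes of e_ident are interpreted.

```python
import struct
from collections import namedtuple

# Field names follow elf.h; ElfHeader itself is a made-up helper type.
ElfHeader = namedtuple(
    "ElfHeader",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff "
    "e_flags e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def parse_elf_header(data: bytes) -> ElfHeader:
    """Decode the ELF header found at the start of data."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown class or data encoding")
    endian = "<" if ei_data == 1 else ">"
    # e_entry, e_phoff and e_shoff are 4 bytes wide in 32-bit files
    # and 8 bytes wide in 64-bit files; all other fields are identical.
    addr = "I" if ei_class == 1 else "Q"
    fmt = endian + "16sHHI" + 3 * addr + "I" + 6 * "H"
    return ElfHeader._make(struct.unpack_from(fmt, data))
```

To try it on a real binary, something like parse_elf_header(open("/bin/ls", "rb").read(64)) should do, and the result can be compared with the output of readelf -h.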

Section headers


Following up, let's check out sections. There are plenty of sections, but one step at a time.
Sections are listed in the section header table, which is usually located at the very end of the binary. You can get the offset to each section and its size from this table, so it is extremely useful when writing a disassembler or any form of injector or the like. Sections only matter at link-time. When the operating system loads the binary into a process, it works with so-called segments instead, each of which groups one or more sections.

Let's first take a look at the section header description defined in elf.h:


  1. sh_name [4 bytes]
    1. Offset of the section's name inside the section name string table, useful to grab for a disassembler
  2. sh_type [4 bytes]
    1. Section type, if this equals SHT_PROGBITS, the section contains program data
    2. Special types of sections are:
      1. SHT_DYNSYM: symbol table, required for the dynamic linker
      2. SHT_SYMTAB: static symbol table
      3. SHT_STRTAB: string table
      4. SHT_REL: relocation entries without explicit addends
      5. SHT_RELA: relocation entries with explicit addends
      6. SHT_DYNAMIC: information for dynamic linking
  3. sh_flags [8 bytes, 4 bytes in 32-bit files]
    1. flags containing various information, here are some noteworthy ones:
      1. SHF_WRITE: section is writable during execution
      2. SHF_ALLOC: content is loaded into virtual memory at run-time
      3. SHF_EXECINSTR: section contains executable code
  4. sh_addr [8 bytes, 4 bytes in 32-bit files]
    1. virtual address at which this section begins, useful for disassembling
  5. sh_offset [8 bytes, 4 bytes in 32-bit files]
    1. in-file offset to the beginning of the section. Sometimes the linker needs to know about this, other than that it is useful for manually calculating offsets
  6. sh_size [8 bytes, 4 bytes in 32-bit files]
    1. size of the section in bytes. This is useful when calculating how big each section is and whether there is free space between sections (handy for code injection), since the linker places some sections at an address that is divisible by sh_addralign.
  7. sh_link [4 bytes]
    1. index of a related section in the section header table. The meaning depends on the section type, for a symbol table it is the index of the associated string table. I believe this can be used for a kind of injection detection, since newly injected sections may not set this value correctly
  8. sh_info [4 bytes]
    1. additional information whose meaning depends on the section type, for example the index of the section a relocation section applies to.
  9. sh_addralign [8 bytes, 4 bytes in 32-bit files]
    1. If this section needs to be mapped to an aligned location, the required alignment is contained in this field. For example, it might contain 16, so the section must be mapped to an address divisible by 16 (1600, for example). A value of 0 or 1 means there is no alignment constraint.
  10. sh_entsize [8 bytes, 4 bytes in 32-bit files]
    1. size of one entry for sections that hold a table of fixed-size entries (a symbol table, for example), otherwise 0. Not that important
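Walking the section header table boils down to reading e_shnum records of e_shentsize bytes, starting at e_shoff. Here is a minimal sketch that assumes a little-endian 64-bit file; SectionHeader, read_section_headers and describe_flags are hypothetical helper names for this example, while the flag values are the ones defined in elf.h.

```python
import struct
from collections import namedtuple

# Field names follow elf.h; the helper names are made up for this sketch.
SectionHeader = namedtuple(
    "SectionHeader",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)
SHDR64 = struct.Struct("<IIQQQQIIQQ")  # 64 bytes, little-endian ELF64 only

SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4


def read_section_headers(data, e_shoff, e_shentsize, e_shnum):
    """Return every entry of the section header table as a SectionHeader."""
    headers = []
    for i in range(e_shnum):
        start = e_shoff + i * e_shentsize  # entry i starts here in the file
        headers.append(SectionHeader._make(SHDR64.unpack_from(data, start)))
    return headers


def describe_flags(sh_flags):
    """Render the three flags discussed above as letters (W, A, X)."""
    letters = ""
    if sh_flags & SHF_WRITE:
        letters += "W"
    if sh_flags & SHF_ALLOC:
        letters += "A"
    if sh_flags & SHF_EXECINSTR:
        letters += "X"
    return letters
```

The three arguments after data are exactly the e_shoff, e_shentsize and e_shnum fields from the ELF header, so the two structures fit together naturally.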

That's it for the section header description, now let's examine the sections...

The sections

In a "standard" ELF file, there should be around 27 different sections, depending on compilation. Sometimes there are a few less, and of course sections can be created by the programmer, too. Some sections have very similar purposes, and some aren't that important. I will focus on the more important ones here, and I will give you a brief hint on how to modify or even hijack a program with the help of some of these sections.


  • NULL
    • This section is always the first one, and it has all fields zeroed out. It does not really have a purpose, but you can join our Stackoverflow discussion about it...
  • .interp
    • This is a section holding the path to the interpreter, for example /lib64/ld-linux-x86-64.so.2 on x86_64 Linux. The interpreter is the dynamic linker, the program that handles turning the binary into a real executable process later.
  • .note.ABI-tag and .note.gnu.build-id
    • These sections are really funny. The section header table and the program header table both point to them, but in reality nobody really needs them. They contain auxiliary information about the binary, like the expected kernel version and the fact that this file is a GNU/Linux binary. If they are missing in the binary, the loader will just assume that it's a native binary, so it is safe to just overwrite them. This can be used to write mods for a program, or even to write a "parasite", as I will show you in future posts.
  • .gnu.hash
    • This section contains hashes (o'rly ^^) that allow a fast lookup of symbols. It's used by the dynamic linker and not that interesting.
  • .dynsym, .dynstr, .strtab, .symtab, .shstrtab
    • The .symtab section contains a symbol table which associates a symbolic name with a piece of code or data elsewhere, think of it as kind of a "roadmap" to the binary. The actual strings containing the symbolic names are found inside the .strtab section. In practice, binaries are often stripped of symbols, meaning these sections are removed to make disassembling harder
    • .dynsym and .dynstr are equivalent to the above sections, except they are needed for dynamic linking rather than static linking. These sections cannot be stripped since the information stored inside them is crucial for dynamic linking.
    • The .shstrtab is really just a big array containing the name of every section in the binary. Tools like readelf can use it to find section names for analysis.
  • .gnu-version and .gnu-version_r
    • These sections contain version definitions and requirements. Since they are uninteresting and marked read-only, just ignore them. 
  • .rela.dyn and .rela.plt
    • These sections contain tables of relocation entries used by the linker. Each entry describes an address where a relocation needs to be applied and a description about how the particular value needs to be inserted.
  • .init and .fini
    • These sections contain code that needs to run before and after the main program runs. Think of these sections as a kind of constructor and destructor, respectively. Note that the entry point of the binary does not point to .init itself but to _start in the .text section, which takes care of running the .init code before main. Still, the entry point is an obvious target: if you want to inject code into the binary and don't mind that the manipulation is easy to spot, you can overwrite the entry point so it points to your injected section and jump back to the original entry point afterwards. This method is useful for writing mods, for example.
  • .text
    • This is the place where the main code of your whole program resides. It is the main target of most reverse engineering attempts and usually the biggest section by far. This section is marked executable but not writable, since a writable code section would be a giant security risk. In fact, sections that contain executable code should never be marked writable. Besides your custom code, the section contains standard functions added by the compiler, such as _start, register_tm_clones and frame_dummy. Now _start is an interesting function, since every programmer learns that a program will start at the beginning of the main function. In reality, a program will start at _start, which goes through the .plt section to call the external library function __libc_start_main, which in turn calls the corresponding main function inside your program. You can change the default startup function by setting the flag -e, for example gcc -e main myprog.c, but I strongly encourage you not to do so unless you know how to set up a process.
  • .rodata, .data, .bss
    • As mentioned above, code sections should not be marked writable. But if the program works with variables, they have to be stored somewhere, which is why these 3 sections exist. You might wonder why there are 3 separate sections for working with variables. The reason for this is that there are different types of variables.
    • The .rodata section stands for read-only data, which means constants, and thus this section is marked read-only.
    • The values of initialized variables are stored inside the .data section, which is writable.
    • Finally, the .bss section (originally for "block started by symbol") is the place for uninitialized (that is, zero-initialized) global and static variables. This section has type SHT_NOBITS and occupies no space in the on-disk representation of the binary. The reason is that there is nothing to store before run-time: its sole purpose is to tell the loader to set up an appropriately sized, zeroed block of memory when the program is loaded.
  • .got , .plt, .got.plt, .plt.got
    • These sections are used for so-called "lazy binding", and because they have a somewhat special purpose, we will look at them below in their own chapter. For now, think of them as "jumpgates" to external functions such as an API call. They can be overwritten by you to replace existing calls. For example, you could overwrite calls to "printf" with your own implementation of printf, which is a handy trick to have up your sleeve.
    • The .plt.got section will never show up unless you use the -z now flag, and therefore I will completely ignore it here.
  • .init_array and .fini_array
    • These 2 sections contain arrays of pointers to functions that serve as constructors and destructors, pretty much like .init and .fini. Unlike .init, which holds a single "global" startup routine, .init_array can hold as many functions to use as constructors as you want. The programmer can in fact set up their own functions to use as constructors by decorating them with __attribute__((constructor)) in a C source file. These pointers can easily be hijacked to hook into the binary. Older versions of gcc might call these sections .ctors and .dtors (constructors and destructors)
    • One pointer contained in .init_array points to the beginning of the frame_dummy function, which performs some run-time setup before main runs (among other things related to exception handling).
There are some additional sections like .comment (for versioning), but let's not go overboard here. Re-read every section description carefully and try to get a feeling for how the linker has set up the binary for use by the interpreter.
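To make the role of .shstrtab a bit more concrete: a section's sh_name field is simply a byte offset into that table, and the name runs from there to the next NUL byte. The following is a minimal sketch; section_name is a made-up helper, and the table contents in the usage note are a hand-written sample rather than the dump of a real binary.

```python
def section_name(shstrtab: bytes, sh_name: int) -> str:
    """Return the NUL-terminated name that starts at offset sh_name."""
    end = shstrtab.index(b"\x00", sh_name)  # names are NUL-terminated
    return shstrtab[sh_name:end].decode("ascii")
```

With shstrtab = b"\x00.text\x00.rodata\x00.shstrtab\x00", the call section_name(shstrtab, 7) yields ".rodata". In a real file you would first read the section header at index e_shstrndx and then slice the table out of the file using its sh_offset and sh_size.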

Lazy binding and .plt stubs

Phew, let's focus one last time to get an overview of a special way of resolving external function calls. In ancient times, when ELFs were still young, there was only static linking (I guess lol). This means that all external function modules are copied into the final executable. While this method may or may not be somewhat faster at run-time, it is redundancy of doom and also goes against the principles of shared libraries (which are used to save RAM, among other things).

But with dynamic linking came the need for "last-minute relocation", meaning the compiler can never know the exact address of a function from a shared library, so the interpreter (or dynamic linker) has to resolve function addresses at run-time. But since the interpreter is a lazy soab, these relocations usually are deferred until the very last minute. This is called lazy binding. Think of it as an IT student who needs to do a presentation in 2 weeks. What will happen is that he will smoke weed for one and a half weeks, then slowly start to think about collecting information, and on the very day before the presentation, he will finally put everything together ( ͡° ͜ʖ ͡°) .

Lazy binding ensures the interpreter will never needlessly waste time relocating stuff that might get called once after the program has already been running for 48 hours straight. However, if you happen to require real-time performance in your program, you might consider turning this functionality off by exporting an environment variable called LD_BIND_NOW.

So, how does lazy binding work in ELF files, you might ask. On Linux, it is implemented using the 2 special sections called .plt (procedure linkage table) and .got (global offset table). Besides that, there exists a separate section called .got.plt that is writable at run-time, so that the executable .plt section does not have to be, for security reasons.
Remember how I told you that executable sections should never be writable? That's why there is this special section nowadays.

What is this all about? First, consider the following picture:


Well, my drawing skills might not be that good, but I think this will serve our need to understand function stubs in ELF files. As I find AT&T syntax plain ugly, I am using Intel syntax here.


  1. At some point in your program, you make a call to puts for whatever reason. If this call is never made, for example because the user did not choose the code path containing it, the dynamic linker saves time by never resolving the stub.
  2. In the .plt section, each stub is listed after the default stub. If you look at the picture, the puts stub pushes 0x0 as its second instruction. If we had more stubs, the number would increment with each stub as a kind of identifier. The first instruction of the puts stub is an indirect jump through [rip+0x200c12]. This memory location is the puts entry inside .got.plt on the right side, which (initially) holds the address of the next instruction after this jump (push 0x0). That's a pretty messed up way to calculate an address, but you will soon understand why it is implemented like that.
  3. So, we jump through [rip+0x200c12], which at this point simply leads us to the instruction push 0x0.
  4. After we pushed 0 onto the stack, we jump to the default stub. With 0 on the stack, the dynamic linker will know which stub asked for the external call. The default stub pushes a second value taken from .got.plt, which tells the dynamic linker which binary the request comes from, and then jumps to the dynamic linker (whose address happens to sit inside .got.plt as well).
  5. The dynamic linker pops these 2 values off the stack (at some point at least) and realizes that a relocation of the puts function is requested by the first stub (0x0). After resolving the address inside the libc library, the dynamic linker patches the address entry in the .got.plt section (which is writable, as we discussed earlier) and then, finally, transfers control to puts.
  6. As the .got.plt section is patched now, the <address> is pointing to the puts function directly rather than to the push instruction in the .plt section. Any following call will now be forwarded to puts directly.
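The patch-on-first-use idea from the steps above can be modelled in a few lines of Python. This is only a toy simulation under loud assumptions: LazyBinder and all its members are invented for illustration, a list of callables stands in for the .got.plt slots, and a dictionary stands in for libc. It says nothing about how the real dynamic linker is implemented.

```python
class LazyBinder:
    """Toy model of lazy binding; not how the real dynamic linker works."""

    def __init__(self, library):
        self.library = library   # name -> callable, stands in for libc
        self.symbols = []        # stub index -> symbol name
        self.got = []            # stands in for the .got.plt slots
        self.resolve_count = 0   # how often the slow path was taken

    def add_stub(self, name):
        """Create a stub whose slot initially leads to the resolver."""
        index = len(self.symbols)
        self.symbols.append(name)
        # Like "push index; jmp default stub": the slot first points
        # back into the resolving path, carrying the stub's identifier.
        self.got.append(lambda *args, _i=index: self._resolve(_i, *args))
        return index

    def _resolve(self, index, *args):
        """Look the symbol up, patch the slot, then perform the call."""
        self.resolve_count += 1
        target = self.library[self.symbols[index]]
        self.got[index] = target  # later calls skip the resolver
        return target(*args)

    def call(self, index, *args):
        """The stub itself: an indirect jump through the slot."""
        return self.got[index](*args)
```

Calling the same stub twice shows the effect: the first call goes through _resolve and bumps resolve_count, the second one lands in the library function directly.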

What can we conclude from this?
Well, first of all we can understand why the first call to an external library function is always the slowest.
Also, if the call is never made because it is a really exotic call, possibly only used when a specific error occurs, we saved precious processor time (not so precious nowadays, but anyway).

But there are 2 other important discoveries we just made:
  1. If we want to use a specific external call to hook into a binary, we can overwrite the address a .plt stub jumps to (its .got.plt entry) with the address of a function we injected into the binary. This will of course render the original function call useless as it will never be made, but it is a solid way to mod any program. I will elaborate on this at some point in the near future.
  2. If we need to augment a specific library call, we can overwrite its .got.plt entry so the call ends up in a different external function. This will of course require serious planning or brute force, but it is a very strong technique. Or you can simply force the interpreter to preload a library by exporting the environment variable LD_PRELOAD. If you write LD_PRELOAD=`pwd`/myLib.so ./myprogram, the library will be loaded first. We will look into this technique in the future, but for now just remember that it is a mighty tool in your box.

Conclusion

That sure was a lot of input from me, but if you take your time and read it 2 or 3 times, it should not be too hard to understand. It can also help to fire up readelf and look at a real-world binary on your system.
If you just want to dump all the information we just went through, use
readelf -e programname

Also, if you are curious about how a program looks at run-time, here is a picture demonstrating this. Basically, all sections will get mapped together into so called segments, and these segments are addressed by the program header table now. (Some sections will get dropped because they are unimportant at run-time). I know that this representation might not be the most accurate, but let's call it a day here. I will show you some in-memory binaries in the future, I promise ;)

That's it. I hope you are not scared of reverse engineering fundamentals, since it is a very fun and challenging topic. Also, if you ever want to be considered a professional, you'd have to at least remember the basic structure of an ELF file. GAMBARE!

It's better to die for yourself than to live for someone else.
- numb.3rs
