Understanding Windows Portable Executable headers

Dissecting binaries and their structure for malware analysis

Introduction

Understanding the structure of Windows Portable Executable (PE) files is vital to reverse engineering malware. The PE format is derived from the Common Object File Format (COFF) and is used by executables, DLLs, drivers and other Windows binaries. Knowing how these files are put together gives a clearer picture of the many ways malware can abuse them.

We shall go through each of the main headers within a PE file, dissect them, and see how they fit together to form a complete portable executable.

IMAGE_DOS_HEADER

The first 64 bytes of a PE file make up this header, and its first two bytes are 4D 5A. In ASCII these read 'MZ', the initials of Mark Zbikowski, who designed the MS-DOS executable format. Every PE file begins with MZ, although on its own it only marks a DOS-style executable; the PE signature described below confirms a true PE file.

These two bytes form the header's first field, e_magic. The first bytes of a file are known as 'magic bytes', as they act as a signature of the file type. The header's last field, e_lfanew (at offset 0x3C), holds the file offset at which the next header, IMAGE_NT_HEADERS, begins.
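
As a minimal sketch using only Python's standard struct module, the two fields above can be read as follows. The function name parse_dos_header and the hand-built demo header are illustrative assumptions, not part of any existing tool; the code only checks e_magic and reads e_lfanew.

```python
import struct

DOS_HEADER_SIZE = 64    # IMAGE_DOS_HEADER is always 64 bytes long
E_LFANEW_OFFSET = 0x3C  # e_lfanew is the header's last field (bytes 60-63)


def parse_dos_header(data: bytes) -> int:
    """Check e_magic and return e_lfanew, the file offset of IMAGE_NT_HEADERS."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("file is too small to contain an IMAGE_DOS_HEADER")
    # e_magic: the first two bytes must be 4D 5A ('MZ')
    if data[:2] != b"MZ":
        raise ValueError("missing 'MZ' signature")
    # e_lfanew: little-endian 32-bit value at offset 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew


if __name__ == "__main__":
    # Hypothetical, hand-built header used only for demonstration
    header = bytearray(DOS_HEADER_SIZE)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, E_LFANEW_OFFSET, 0x80)
    print(hex(parse_dos_header(bytes(header))))  # prints 0x80
```

For a file on disk you would pass its raw bytes, for example parse_dos_header(open("sample.exe", "rb").read()), where sample.exe is a placeholder name.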

DOS_STUB

The DOS header is followed by the DOS stub, a tiny MS-DOS program that prints the message This program cannot be run in DOS mode, which is another indicator of a PE file. Analysis tools usually also report the stub's size in bytes and its entropy (the randomness of its data). The stub's only purpose is to print the message above when the executable is run in an incompatible environment, such as MS-DOS.
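
A rough sketch of locating the stub: it occupies the bytes between the end of the 64-byte DOS header and the offset stored in e_lfanew. The function name dos_stub_info is hypothetical, and the message check assumes the usual wording emitted by Microsoft's linker; other toolchains may use different text. On MSVC-built files this region also contains the undocumented Rich header, so the size reported here includes it.

```python
import struct

DOS_HEADER_SIZE = 64
# Usual message in stubs produced by Microsoft's linker (other toolchains may differ)
STUB_MESSAGE = b"This program cannot be run in DOS mode"


def dos_stub_info(data: bytes) -> dict:
    """Return the offset and size of the DOS stub and whether it holds the usual message."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The stub sits between the end of IMAGE_DOS_HEADER and IMAGE_NT_HEADERS
    stub = data[DOS_HEADER_SIZE:e_lfanew]
    return {
        "offset": DOS_HEADER_SIZE,
        "size": len(stub),
        "has_message": STUB_MESSAGE in stub,
    }
```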

More generally, the randomness of a binary's data may point to whether it has been packed. Compressed and encrypted data look close to random, so typically the greater the randomness, the greater the likelihood that the binary has been packed.
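
Entropy in this context usually means Shannon entropy measured in bits per byte, which ranges from 0.0 (every byte identical) to 8.0 (all 256 byte values equally frequent). A minimal sketch follows; the function name is an assumption, and it can be applied equally to the whole file, a single section or the DOS stub.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # H = sum(p * log2(1/p)), equivalently -sum(p * log2(p)),
    # where p is the frequency of each distinct byte value
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


if __name__ == "__main__":
    print(shannon_entropy(b"A" * 1024))        # 0.0: a single repeated byte
    print(shannon_entropy(bytes(range(256))))  # 8.0: every byte value exactly once
```

Values close to 8.0 mean the data looks random, as compressed or encrypted content does; ordinary machine code and text usually score noticeably lower.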

IMAGE_NT_HEADERS

  • NT_HEADER

    The first four bytes of this header are 50 45 00 00, which is 'PE' followed by two null bytes in ASCII. This Signature field marks the start of IMAGE_NT_HEADERS, which also contains the FILE_HEADER and the OPTIONAL_HEADER.

  • FILE_HEADER

    This header contains some very important fields, such as:

    • Machine: The target CPU architecture, such as 0x014C for i386 (32-bit x86) and 0x8664 for AMD64 (64-bit x86-64).

    • NumberOfSections: A PE file contains sections where different types of data may be stored. This field holds the number of those sections.

    • TimeDateStamp: The date and time the file was compiled (strictly, linked), stored as seconds since 1 January 1970 (UTC).

    • Characteristics: Flags describing the binary, such as whether it is an executable image or a DLL, and whether relocation information or line numbers have been stripped.

  • OPTIONAL_HEADER

    Some important fields are:

    • Magic: Identifies the image type: 0x010B for 32-bit (PE32) and 0x020B for 64-bit (PE32+).

    • AddressOfEntryPoint: The address of the first instruction Windows executes once the image is loaded. It is a Relative Virtual Address (RVA), meaning an offset relative to the base address of the image in memory.

    • ImageBase: The preferred loading address of the image, generally 0x00400000 for 32-bit executables.

    • Subsystem: The subsystem required to run the binary, such as the Windows GUI or the console (CUI). The Windows GUI subsystem is represented by the value 0x0002 and the console subsystem by 0x0003.

    • DataDirectory: An array of entries, each an RVA and a size, that locate tables such as the import and export tables. These show which DLLs and functions the program uses, thus suggesting possible behaviour of the file.

  • IMAGE_SECTION_HEADER

    Strictly speaking, the section headers (one IMAGE_SECTION_HEADER per section) follow IMAGE_NT_HEADERS rather than belonging to it. Each one describes a section of the file, and the sections hold the data the binary needs to perform its functions, such as executable code, images, icons and so on. Common sections are:

    • .text: Contains executable code for the binary.

    • .data: Contains initialised data of the binary.

    • .rdata/.idata: .rdata holds read-only data and often the import information; .idata, when present, holds the import tables for functions from the Windows API or other DLLs.

    • .rsrc: Contains icons, images or data of any type. This section is particularly interesting as it can even hold an embedded .exe file!
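
The FILE_HEADER and OPTIONAL_HEADER fields listed above sit at fixed offsets after the PE signature, so they can be read with Python's struct module. The sketch below is illustrative: the function and dictionary names are hypothetical, only a subset of the IMAGE_FILE_* flags is decoded, and it assumes a well-formed file (for example, that the DataDirectory array has at least two entries). It also uses SizeOfOptionalHeader, a FILE_HEADER field not covered above, to find where the section headers start.

```python
import struct
from datetime import datetime, timezone

MACHINES = {0x014C: "i386", 0x8664: "AMD64"}
SUBSYSTEMS = {2: "Windows GUI", 3: "Windows CUI (console)"}
FILE_FLAGS = {  # a subset of the IMAGE_FILE_* Characteristics flags
    0x0001: "RELOCS_STRIPPED",
    0x0002: "EXECUTABLE_IMAGE",
    0x0004: "LINE_NUMS_STRIPPED",
    0x0100: "32BIT_MACHINE",
    0x2000: "DLL",
}


def parse_nt_headers(data: bytes) -> dict:
    """Read selected FILE_HEADER and OPTIONAL_HEADER fields from a PE file's bytes."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # from IMAGE_DOS_HEADER
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    # IMAGE_FILE_HEADER: 20 bytes straight after the 4-byte signature
    fh = e_lfanew + 4
    (machine, num_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, fh)

    # IMAGE_OPTIONAL_HEADER: its layout depends on the Magic value
    oh = fh + 20
    (magic,) = struct.unpack_from("<H", data, oh)
    (entry_rva,) = struct.unpack_from("<I", data, oh + 16)
    if magic == 0x10B:    # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, oh + 28)
        data_dirs = oh + 96
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, oh + 24)
        data_dirs = oh + 112
    else:
        raise ValueError(f"unexpected optional header magic {magic:#06x}")
    (subsystem,) = struct.unpack_from("<H", data, oh + 68)  # same offset in both layouts
    # DataDirectory[1] is the import table, stored as (RVA, size)
    import_rva, import_size = struct.unpack_from("<II", data, data_dirs + 8)

    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "number_of_sections": num_sections,
        "linked_at": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "characteristics": [n for bit, n in FILE_FLAGS.items() if characteristics & bit],
        "pe32_plus": magic == 0x20B,
        "entry_point_rva": entry_rva,
        "entry_point_va": image_base + entry_rva,  # address at the preferred ImageBase
        "image_base": image_base,
        "subsystem": SUBSYSTEMS.get(subsystem, subsystem),
        "import_directory": (import_rva, import_size),
        "section_table_offset": oh + opt_size,  # section headers follow the optional header
    }
```

Note that entry_point_va assumes the image loads at its preferred ImageBase; if the loader relocates it (for example under ASLR), the real address differs.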

Each section header records some vital information, such as:

  • VirtualAddress: the Relative Virtual Address of the section once loaded into memory

  • VirtualSize: the size of the section once loaded into memory

  • SizeOfRawData: the size of the section on disk, before it is loaded into memory

  • Characteristics: flags describing the section's contents and permissions, such as READ, WRITE or EXECUTE.
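
Each IMAGE_SECTION_HEADER is a 40-byte record, and the headers are stored back to back right after the optional header. The sketch below (hypothetical names; it assumes the table's file offset and NumberOfSections are already known from the headers above) decodes the fields just listed. It also reads PointerToRawData, a field not covered above that gives the section's file offset, and uses it to map an RVA such as AddressOfEntryPoint to a position in the file.

```python
import struct
from typing import List, NamedTuple, Optional

# IMAGE_SECTION_HEADER: Name[8], VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics
SECTION_FORMAT = "<8sIIIIIIHHI"
SECTION_SIZE = struct.calcsize(SECTION_FORMAT)  # 40 bytes

SCN_MEM_EXECUTE = 0x20000000
SCN_MEM_READ = 0x40000000
SCN_MEM_WRITE = 0x80000000


class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    characteristics: int

    def permissions(self) -> str:
        """Render the memory permissions as an 'RWX'-style string."""
        c = self.characteristics
        return (("R" if c & SCN_MEM_READ else "-")
                + ("W" if c & SCN_MEM_WRITE else "-")
                + ("X" if c & SCN_MEM_EXECUTE else "-"))


def parse_sections(data: bytes, table_offset: int, count: int) -> List[Section]:
    """Decode `count` consecutive section headers starting at `table_offset`."""
    sections = []
    for i in range(count):
        fields = struct.unpack_from(SECTION_FORMAT, data, table_offset + i * SECTION_SIZE)
        name = fields[0].rstrip(b"\x00").decode("ascii", errors="replace")
        # fields[1:5]: VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        sections.append(Section(name, *fields[1:5], fields[9]))
    return sections


def rva_to_offset(sections: List[Section], rva: int) -> Optional[int]:
    """Map an RVA to a file offset, or None if no section's on-disk data covers it."""
    for s in sections:
        delta = rva - s.virtual_address
        # Only the first SizeOfRawData bytes of a section are backed by the file
        if 0 <= delta < s.size_of_raw_data:
            return s.pointer_to_raw_data + delta
    return None
```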

Comparing VirtualSize with SizeOfRawData is particularly interesting, as it may indicate whether a binary has been packed (compressed). A large difference between the two, especially a section that is much larger in memory than on disk, may indicate that the binary was processed with a packer that unpacks its contents into that space at runtime. Packing makes static analysis more difficult because the packed file exposes less information, such as readable strings and imports, until it is unpacked.
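
A simple version of this comparison is sketched below. The 2.0 ratio is an arbitrary, hypothetical threshold chosen for illustration, and the input is plain (name, VirtualSize, SizeOfRawData) tuples so the function stands alone. Small differences are normal, because SizeOfRawData is rounded up to the file alignment, and legitimate uninitialised-data sections (such as .bss) can have no raw data at all, so a flagged section is a cue for closer inspection rather than proof of packing. The UPX0 entry in the demo mimics the empty-on-disk section that the UPX packer typically creates.

```python
from typing import Iterable, List, Tuple


def flag_size_mismatches(sections: Iterable[Tuple[str, int, int]],
                         ratio: float = 2.0) -> List[str]:
    """Return the names of sections that are much larger in memory than on disk.

    `sections` holds (name, VirtualSize, SizeOfRawData) tuples; `ratio` is a
    hypothetical threshold, not an established standard.
    """
    flagged = []
    for name, virtual_size, raw_size in sections:
        if raw_size == 0:
            # Occupies memory but nothing on disk: space that is filled at runtime
            if virtual_size > 0:
                flagged.append(name)
        elif virtual_size / raw_size > ratio:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Illustrative values only, not taken from a real sample
    sample = [(".text", 0x1500, 0x1600), ("UPX0", 0x10000, 0), (".data", 0x5000, 0x200)]
    print(flag_size_mismatches(sample))  # ['UPX0', '.data']
```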

IMAGE_IMPORT_DESCRIPTOR

A binary holds one IMAGE_IMPORT_DESCRIPTOR for each DLL it imports from, and each descriptor leads to the list of functions the binary uses from that DLL. This means the binary does not include all the necessary code itself but calls into the Windows API to perform tasks such as creating a file or a process. Studying these imports may allow an analyst to understand what the binary is trying to accomplish.
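
Each IMAGE_IMPORT_DESCRIPTOR is a 20-byte record of five 32-bit fields (OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name and FirstThunk), and the array ends with an all-zero entry. The walk below is a simplified sketch under one loud assumption: image holds the file laid out at its virtual addresses but with imports not yet resolved, so every RVA can be used directly as an index. For a file read straight from disk, each RVA would first need translating to a file offset through the section table. Once Windows resolves imports, the FirstThunk entries are overwritten with function addresses, which is why the sketch prefers OriginalFirstThunk. The function names are hypothetical, and the import table's RVA would come from entry 1 of the OPTIONAL_HEADER's DataDirectory.

```python
import struct
from typing import Dict, List

DESCRIPTOR_FORMAT = "<IIIII"  # OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
DESCRIPTOR_SIZE = struct.calcsize(DESCRIPTOR_FORMAT)  # 20 bytes


def read_cstring(image: bytes, offset: int) -> str:
    """Read a NUL-terminated ASCII string starting at `offset`."""
    end = image.index(b"\x00", offset)
    return image[offset:end].decode("ascii", errors="replace")


def list_imports(image: bytes, import_rva: int, pe32_plus: bool = False) -> Dict[str, List[str]]:
    """Map each imported DLL name to the functions imported from it.

    Assumes `image` is laid out as in memory (imports unresolved), so RVAs index it directly.
    """
    # PE32 thunks are 32-bit, PE32+ thunks are 64-bit; the top bit marks an ordinal import
    thunk_format, ordinal_flag = ("<Q", 1 << 63) if pe32_plus else ("<I", 1 << 31)
    thunk_size = struct.calcsize(thunk_format)

    imports: Dict[str, List[str]] = {}
    offset = import_rva
    while True:
        fields = struct.unpack_from(DESCRIPTOR_FORMAT, image, offset)
        if not any(fields):  # an all-zero descriptor terminates the array
            break
        original_first_thunk, _, _, name_rva, first_thunk = fields
        functions = []
        # Prefer the import lookup table; fall back to FirstThunk if it is absent
        thunk = original_first_thunk or first_thunk
        while True:
            (value,) = struct.unpack_from(thunk_format, image, thunk)
            if value == 0:  # a zero thunk ends this DLL's list
                break
            if value & ordinal_flag:
                functions.append(f"ordinal #{value & 0xFFFF}")
            else:
                # IMAGE_IMPORT_BY_NAME: a 2-byte Hint, then the NUL-terminated name
                functions.append(read_cstring(image, value + 2))
            thunk += thunk_size
        imports[read_cstring(image, name_rva)] = functions
        offset += DESCRIPTOR_SIZE
    return imports
```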

Conclusion

One should now have a fairly good understanding of the PE file format, and of how each field within each header provides value to an analyst, especially during static analysis.

Together, these fields provide cues to the behaviour of the binary and are valuable in understanding its purpose and how it functions.

 
