hacktricks/macos-hardening/macos-security-and-privilege-escalation/mac-os-architecture/universal-binaries-and-mach-o-format.md

Universal binaries & Mach-O Format

☁️ HackTricks Cloud ☁️ -🐦 Twitter 🐦 - 🎙️ Twitch 🎙️ - 🎥 Youtube 🎥

Basic Information

macOS binaries are usually compiled as universal binaries. A universal binary can support multiple architectures in the same file.

These binaries follow the Mach-O structure, which is basically composed of:

  • Header
  • Load Commands
  • Data

Fat Header

Search for the file with: mdfind fat.h | grep -i mach-o | grep -E "fat.h$"

#define FAT_MAGIC	0xcafebabe
#define FAT_CIGAM	0xbebafeca	/* NXSwapLong(FAT_MAGIC) */

struct fat_header {
	uint32_t	magic;		/* FAT_MAGIC or FAT_MAGIC_64 */
	uint32_t	nfat_arch;	/* number of structs that follow */
};

struct fat_arch {
	cpu_type_t	cputype;	/* cpu specifier (int) */
	cpu_subtype_t	cpusubtype;	/* machine specifier (int) */
	uint32_t	offset;		/* file offset to this object file */
	uint32_t	size;		/* size of this object file */
	uint32_t	align;		/* alignment as a power of 2 */
};

The header starts with the magic bytes, followed by the number of architectures the file contains (nfat_arch). One fat_arch struct follows for each architecture.

Check it with:

% file /bin/ls
/bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/bin/ls (for architecture x86_64):	Mach-O 64-bit executable x86_64
/bin/ls (for architecture arm64e):	Mach-O 64-bit executable arm64e

% otool -f -v /bin/ls
Fat headers
fat_magic FAT_MAGIC
nfat_arch 2
architecture x86_64
    cputype CPU_TYPE_X86_64
    cpusubtype CPU_SUBTYPE_X86_64_ALL
    capabilities 0x0
    offset 16384
    size 72896
    align 2^14 (16384)
architecture arm64e
    cputype CPU_TYPE_ARM64
    cpusubtype CPU_SUBTYPE_ARM64E
    capabilities PTR_AUTH_VERSION USERSPACE 0
    offset 98304
    size 88816
    align 2^14 (16384)

or using the Mach-O View tool:

As you might expect, a universal binary compiled for 2 architectures is usually about twice the size of one compiled for just 1 architecture.
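
The layout described above can be sketched in a few lines of Python. This is only an illustration: `parse_fat_header` and `SAMPLE_FAT` are hypothetical names, and the sample bytes are rebuilt by hand from the `otool -f` output shown earlier rather than read from a real file. It assumes the 32-bit `fat_arch` layout and relies on the fact that the fat header is stored big-endian on disk.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # stored big-endian on disk

def parse_fat_header(data: bytes):
    """Return one dict per fat_arch entry, or None if data is not a fat file."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        return None
    archs = []
    for i in range(nfat_arch):
        # fat_arch is five 32-bit fields (20 bytes), right after the 8-byte header
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">5I", data, 8 + i * 20
        )
        archs.append({
            "cputype": cputype,
            "cpusubtype": cpusubtype,
            "offset": offset,
            "size": size,
            "align": 1 << align,  # the field stores a power of 2
        })
    return archs

# Synthetic header rebuilt from the otool output above (not read from disk)
SAMPLE_FAT = struct.pack(">II", FAT_MAGIC, 2)
SAMPLE_FAT += struct.pack(">5I", 0x01000007, 0x00000003, 16384, 72896, 14)  # x86_64
SAMPLE_FAT += struct.pack(">5I", 0x0100000C, 0x80000002, 98304, 88816, 14)  # arm64e

for arch in parse_fat_header(SAMPLE_FAT):
    print(hex(arch["cputype"]), arch["offset"], arch["size"], arch["align"])
```

On a Mac, the same function could be pointed at the bytes of a real universal binary such as /bin/ls instead of the synthetic sample.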

Mach-O Header

The header contains basic information about the file, such as magic bytes to identify it as a Mach-O file and information about the target architecture. You can find it in: mdfind loader.h | grep -i mach-o | grep -E "loader.h$"

#define	MH_MAGIC	0xfeedface	/* the mach magic number */
#define MH_CIGAM	0xcefaedfe	/* NXSwapInt(MH_MAGIC) */
struct mach_header {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier (e.g. I386) */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file (usage and alignment for the file) */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
};

#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
struct mach_header_64 {
	uint32_t	magic;		/* mach magic number identifier */
	int32_t		cputype;	/* cpu specifier */
	int32_t		cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
	uint32_t	reserved;	/* reserved */
};

Filetypes:

  • MH_EXECUTE (0x2): Standard Mach-O executable
  • MH_DYLIB (0x6): A Mach-O dynamic linked library (i.e. .dylib)
  • MH_BUNDLE (0x8): A Mach-O bundle (i.e. .bundle)

# Checking the Mach header of a binary
otool -arch arm64e -hv /bin/ls
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64    ARM64          E USR00     EXECUTE    19       1728   NOUNDEFS DYLDLINK TWOLEVEL PIE
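
As a rough illustration of how those fields are decoded, the sketch below unpacks a `mach_header_64` from raw bytes. The names `parse_mach_header_64`, `FILETYPES`, `FLAGS` and `SAMPLE_HEADER` are hypothetical, the sample bytes are synthetic values chosen to mirror the otool output above, and only the file types and flags that appear in this page are mapped to names.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
FILETYPES = {0x2: "MH_EXECUTE", 0x6: "MH_DYLIB", 0x8: "MH_BUNDLE"}
# Only the four flags shown in the otool output above
FLAGS = {0x1: "MH_NOUNDEFS", 0x4: "MH_DYLDLINK", 0x80: "MH_TWOLEVEL", 0x200000: "MH_PIE"}

def parse_mach_header_64(data: bytes, offset: int = 0) -> dict:
    """Decode a little-endian mach_header_64 (eight 32-bit fields, 32 bytes)."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = struct.unpack_from("<8I", data, offset)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O header")
    return {
        "cputype": cputype,
        "cpusubtype": cpusubtype,
        "filetype": FILETYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": [name for bit, name in FLAGS.items() if flags & bit],
    }

# Synthetic arm64e header mirroring the otool output (not read from disk)
SAMPLE_HEADER = struct.pack(
    "<8I", MH_MAGIC_64, 0x0100000C, 0x80000002, 0x2, 19, 1728, 0x00200085, 0
)
print(parse_mach_header_64(SAMPLE_HEADER))
```

For a universal binary, `offset` would be the slice offset taken from the matching fat_arch entry.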

Or using Mach-O View:

Mach-O Load commands

The load commands specify the layout of the file in memory. They contain the location of the symbol table, the main thread context at the beginning of execution, and which shared libraries are required.
In short, the commands tell the dynamic loader (dyld) how to load the binary into memory.

Load commands all begin with a load_command structure, defined in the previously mentioned loader.h:

struct load_command {
        uint32_t cmd;           /* type of load command */
        uint32_t cmdsize;       /* total size of command in bytes */
};

There are about 50 different types of load commands that the system handles differently. The most common ones are: LC_SEGMENT_64, LC_LOAD_DYLINKER, LC_MAIN, LC_LOAD_DYLIB, and LC_CODE_SIGNATURE.
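
Because every load command starts with `cmd` and `cmdsize`, a parser can walk the whole list without understanding each type. The sketch below shows that idea on a synthetic buffer; `iter_load_commands`, `fake_command` and `SAMPLE_IMAGE` are hypothetical names, and the payloads are zero-filled placeholders rather than real command bodies.

```python
import struct

LC_REQ_DYLD = 0x80000000
LC_NAMES = {
    0x0C: "LC_LOAD_DYLIB",
    0x0E: "LC_LOAD_DYLINKER",
    0x19: "LC_SEGMENT_64",
    0x1D: "LC_CODE_SIGNATURE",
    0x28 | LC_REQ_DYLD: "LC_MAIN",
}
MACH_HEADER_64_SIZE = 32  # load commands start right after the header

def iter_load_commands(data: bytes, ncmds: int, start: int = MACH_HEADER_64_SIZE):
    """Yield (name, offset, cmdsize) for each load command."""
    offset = start
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("cmdsize smaller than the load_command struct")
        yield LC_NAMES.get(cmd, hex(cmd)), offset, cmdsize
        offset += cmdsize  # cmdsize covers the 8-byte prefix plus the payload

def fake_command(cmd: int, payload_len: int) -> bytes:
    """Build a placeholder command: real prefix, zero-filled payload."""
    return struct.pack("<II", cmd, 8 + payload_len) + bytes(payload_len)

SAMPLE_IMAGE = (
    bytes(MACH_HEADER_64_SIZE)      # zeroed stand-in for mach_header_64
    + fake_command(0x0E, 24)
    + fake_command(0x80000028, 16)
    + fake_command(0x1D, 8)
)

for name, offset, cmdsize in iter_load_commands(SAMPLE_IMAGE, ncmds=3):
    print(name, offset, cmdsize)
```

In a real parser, `ncmds` would come from the Mach-O header and the walk should never read past `sizeofcmds` bytes.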

LC_SEGMENT/LC_SEGMENT_64

{% hint style="success" %} Basically, this type of load command defines how to load the sections that are stored in the Data region of the file when the binary is executed. {% endhint %}

These commands define segments that are mapped into the virtual memory space of a process when it is executed.

There are different types of segments, such as the __TEXT segment, which holds the executable code of a program, and the __DATA segment, which contains data used by the process. These segments are located in the data section of the Mach-O file.

Each segment can be further divided into multiple sections. The load command structure contains information about these sections within the respective segment.

The load command begins with the segment header:

struct segment_command_64 { /* for 64-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT_64 */
	uint32_t	cmdsize;	/* includes sizeof section_64 structs */
	char		segname[16];	/* segment name */
	uint64_t	vmaddr;		/* memory address of this segment */
	uint64_t	vmsize;		/* memory size of this segment */
	uint64_t	fileoff;	/* file offset of this segment */
	uint64_t	filesize;	/* amount to map from the file */
	int32_t		maxprot;	/* maximum VM protection */
	int32_t		initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};

Example of segment header:

This header defines the number of sections whose headers appear after it:

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};
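
To make the relationship between the two structs concrete, the following sketch unpacks one `segment_command_64` followed by its `section_64` headers. It is a minimal illustration: `parse_segment_64`, `c_name` and `SAMPLE_SEGMENT` are hypothetical names, and the sample values (addresses, sizes, protections) are invented for the example, apart from the 0x37DC section offset used on this page.

```python
import struct

SEGMENT_64 = struct.Struct("<II16sQQQQiiII")     # 72 bytes
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")  # 80 bytes

def c_name(raw: bytes) -> str:
    """Names are fixed 16-byte fields padded with NUL bytes."""
    return raw.split(b"\0", 1)[0].decode("ascii")

def parse_segment_64(data: bytes, offset: int = 0):
    """Return (segment name, cmdsize, [(section name, addr, size, file offset)])."""
    (_cmd, cmdsize, segname, _vmaddr, _vmsize, _fileoff, _filesize,
     _maxprot, _initprot, nsects, _flags) = SEGMENT_64.unpack_from(data, offset)
    sections = []
    pos = offset + SEGMENT_64.size  # section headers follow the segment header
    for _ in range(nsects):
        sectname, _seg, addr, size, sect_off, *_rest = SECTION_64.unpack_from(data, pos)
        sections.append((c_name(sectname), addr, size, sect_off))
        pos += SECTION_64.size
    return c_name(segname), cmdsize, sections

# Synthetic __TEXT segment with a single __text section (illustrative values)
SAMPLE_SEGMENT = SEGMENT_64.pack(
    0x19, SEGMENT_64.size + SECTION_64.size, b"__TEXT",
    0x100000000, 0x4000, 0, 0x4000, 5, 5, 1, 0,
)
SAMPLE_SEGMENT += SECTION_64.pack(
    b"__text", b"__TEXT", 0x1000037DC, 0x100, 0x37DC, 2, 0, 0, 0, 0, 0, 0
)

print(parse_segment_64(SAMPLE_SEGMENT))
```

Note that `cmdsize` (152 here) covers the segment header plus all of its section headers, which is why the generic load command walk can skip over them in one step.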

Example of section header:

Adding the section offset (0x37DC) to the offset where the architecture slice starts (0x18000 in this case) gives the absolute position of the section inside the universal binary: 0x37DC + 0x18000 = 0x1B7DC
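
The same arithmetic can be expressed as a small helper that slices a section out of a universal binary. This is a sketch on a synthetic buffer: `section_slice` is a hypothetical name, and the 4-byte marker stands in for real section contents.

```python
def section_slice(fat_data: bytes, arch_offset: int, section_offset: int, section_size: int):
    """Section offsets are relative to the slice, so add the slice offset first."""
    start = arch_offset + section_offset
    return start, fat_data[start:start + section_size]

ARCH_OFFSET = 0x18000     # 98304, where the arm64e slice starts in /bin/ls
SECTION_OFFSET = 0x37DC   # section offset taken from the section header

# Synthetic buffer: zeros everywhere, with a marker where the section would start
buf = bytearray(ARCH_OFFSET + SECTION_OFFSET + 4)
buf[-4:] = b"CODE"

start, chunk = section_slice(bytes(buf), ARCH_OFFSET, SECTION_OFFSET, 4)
print(hex(start), chunk)  # 0x1b7dc b'CODE'
```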

It's also possible to get headers information from the command line with:

otool -lv /bin/ls

Common segments loaded by this cmd:

  • __PAGEZERO: It instructs the kernel to map the address zero so it cannot be read from, written to, or executed. The maxprot and initprot fields in the structure are set to zero to indicate there are no read-write-execute rights on this page.
    • This allocation is important to mitigate NULL pointer dereference vulnerabilities.
  • __TEXT: Contains executable code and data that is read-only. Common sections of this segment:
    • __text: Compiled binary code
    • __const: Constant data
    • __cstring: String constants
    • __stubs and __stub_helper: Involved during the dynamic library loading process
  • __DATA: Contains data that is writable.
    • __data: Global variables (that have been initialized)
    • __bss: Static variables (that have not been initialized)
    • __objc_* (__objc_classlist, __objc_protolist, etc): Information used by the Objective-C runtime
  • __LINKEDIT: Contains information for the linker (dyld), such as "symbol, string, and relocation table entries."
  • __OBJC: Contains information used by the Objective-C runtime. This information might also be found in the __DATA segment, within the various __objc_* sections.
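
The maxprot and initprot fields are bit masks, so they can be rendered in the familiar rwx form. The sketch below assumes the standard Mach VM protection bits (read = 1, write = 2, execute = 4); `prot_string` is a hypothetical helper, and apart from the all-zero __PAGEZERO value described above, the per-segment values are illustrative assumptions rather than output from a real binary.

```python
VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE = 0x1, 0x2, 0x4

def prot_string(prot: int) -> str:
    """Render a VM protection mask such as 5 as 'r-x'."""
    pairs = ((VM_PROT_READ, "r"), (VM_PROT_WRITE, "w"), (VM_PROT_EXECUTE, "x"))
    return "".join(ch if prot & bit else "-" for bit, ch in pairs)

# Illustrative initprot values; check a real binary with otool -lv
ASSUMED_INITPROT = {"__PAGEZERO": 0, "__TEXT": 5, "__DATA": 3, "__LINKEDIT": 1}

for segname, prot in ASSUMED_INITPROT.items():
    print(f"{segname:<12}{prot_string(prot)}")
```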

LC_MAIN

Contains the entry point in the entryoff attribute. At load time, dyld simply adds this value to the (in-memory) base of the binary, then jumps to this instruction to start execution of the binary's code.
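
A minimal sketch of that computation, assuming the `entry_point_command` layout from loader.h (cmd, cmdsize, a 64-bit entryoff and a 64-bit stacksize). The name `entry_address`, the entryoff value 0x3B70 and the base address 0x100000000 are all hypothetical values picked for the example; a real base also depends on the ASLR slide.

```python
import struct

LC_MAIN = 0x80000028  # 0x28 | LC_REQ_DYLD

def entry_address(cmd_bytes: bytes, image_base: int) -> int:
    """Add entryoff from an LC_MAIN command to the in-memory base of the image."""
    cmd, _cmdsize, entryoff, _stacksize = struct.unpack_from("<IIQQ", cmd_bytes)
    if cmd != LC_MAIN:
        raise ValueError("not an LC_MAIN command")
    return image_base + entryoff

# Synthetic LC_MAIN command with an invented entryoff
SAMPLE_LC_MAIN = struct.pack("<IIQQ", LC_MAIN, 24, 0x3B70, 0)
ASSUMED_BASE = 0x100000000  # hypothetical load address, ignoring any ASLR slide

print(hex(entry_address(SAMPLE_LC_MAIN, ASSUMED_BASE)))
```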

LC_CODE_SIGNATURE

Contains information about the code signature of the Mach-O file. It only contains an offset (and a size) pointing to the signature blob, which is typically at the very end of the file.
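
The claim about the blob's position can be checked mechanically. The sketch below assumes the command uses the `linkedit_data_command` layout from loader.h (cmd, cmdsize, dataoff, datasize, all 32-bit); `signature_range` is a hypothetical helper, and the dataoff and datasize values are invented so that the blob ends exactly at the 88816-byte arm64e slice size shown earlier.

```python
import struct

LC_CODE_SIGNATURE = 0x1D

def signature_range(cmd_bytes: bytes):
    """Return (start, end) of the signature blob, relative to the slice start."""
    cmd, _cmdsize, dataoff, datasize = struct.unpack_from("<4I", cmd_bytes)
    if cmd != LC_CODE_SIGNATURE:
        raise ValueError("not an LC_CODE_SIGNATURE command")
    return dataoff, dataoff + datasize

SLICE_SIZE = 88816  # size of the arm64e slice reported by otool -f
# Invented dataoff/datasize chosen so the blob ends at the end of the slice
SAMPLE_CODESIG = struct.pack("<4I", LC_CODE_SIGNATURE, 16, 83008, 5808)

start, end = signature_range(SAMPLE_CODESIG)
print(start, end, end == SLICE_SIZE)
```

In a universal binary the offsets are relative to the start of the architecture slice, not to the start of the fat file.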

LC_LOAD_DYLINKER

Contains the path to the dynamic linker executable that maps shared libraries into the process address space. The value is always set to /usr/lib/dyld. It's important to note that in macOS, dylib mapping happens in user mode, not in kernel mode.
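
The path is stored as a NUL-terminated string inside the command itself, located by an offset counted from the start of the command. The sketch below reads it from a synthetic command; `dylinker_path` and `SAMPLE_DYLINKER` are hypothetical names, and the layout assumed is the `dylinker_command` struct from loader.h (cmd, cmdsize, then a 32-bit name offset).

```python
import struct

LC_LOAD_DYLINKER = 0x0E

def dylinker_path(cmd_bytes: bytes) -> str:
    """Extract the linker path from an LC_LOAD_DYLINKER command."""
    cmd, cmdsize, name_offset = struct.unpack_from("<III", cmd_bytes)
    if cmd != LC_LOAD_DYLINKER:
        raise ValueError("not an LC_LOAD_DYLINKER command")
    # The string runs from name_offset up to the first NUL inside the command
    return cmd_bytes[name_offset:cmdsize].split(b"\0", 1)[0].decode("ascii")

# Synthetic command: 12-byte fixed part, then the path padded to 8 bytes
path = b"/usr/lib/dyld\0"
body = path + bytes(-(12 + len(path)) % 8)
SAMPLE_DYLINKER = struct.pack("<III", LC_LOAD_DYLINKER, 12 + len(body), 12) + body

print(dylinker_path(SAMPLE_DYLINKER), len(SAMPLE_DYLINKER))
```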

LC_LOAD_DYLIB

This load command describes a dynamic library dependency, instructing the loader (dyld) to load and link that library. There is one LC_LOAD_DYLIB load command for each library that the Mach-O binary requires.

  • This load command is a structure of type dylib_command (which contains a struct dylib, describing the actual dependent dynamic library):
struct dylib_command {
        uint32_t        cmd;            /* LC_LOAD_{,WEAK_}DYLIB */
        uint32_t        cmdsize;        /* includes pathname string */
        struct dylib    dylib;          /* the library identification */ 
};

struct dylib {
    union lc_str  name;                 /* library's path name */
    uint32_t timestamp;                 /* library's build time stamp */
    uint32_t current_version;           /* library's current version number */
    uint32_t compatibility_version;     /* library's compatibility vers number*/
};

You could also get this info from the cli with:

otool -L /bin/ls
/bin/ls:
	/usr/lib/libutil.dylib (compatibility version 1.0.0, current version 1.0.0)
	/usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)
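
The version strings printed by otool come from the packed 32-bit fields of `struct dylib`, where the major version takes the upper 16 bits and the minor and patch versions take 8 bits each. The sketch below decodes a synthetic command built to match the libSystem line above; `parse_dylib_command` and `decode_version` are hypothetical names, and the timestamp value is an arbitrary placeholder.

```python
import struct

LC_LOAD_DYLIB = 0x0C

def decode_version(value: int) -> str:
    """Unpack the X.Y.Z encoding: 16 bits major, 8 bits minor, 8 bits patch."""
    return f"{value >> 16}.{(value >> 8) & 0xFF}.{value & 0xFF}"

def parse_dylib_command(cmd_bytes: bytes):
    """Return (path, compatibility version, current version)."""
    cmd, cmdsize, name_off, _timestamp, current, compat = struct.unpack_from(
        "<6I", cmd_bytes
    )
    if cmd != LC_LOAD_DYLIB:
        raise ValueError("not an LC_LOAD_DYLIB command")
    name = cmd_bytes[name_off:cmdsize].split(b"\0", 1)[0].decode("ascii")
    return name, decode_version(compat), decode_version(current)

# Synthetic command: 24-byte fixed part, then the path padded to 8 bytes
path = b"/usr/lib/libSystem.B.dylib\0"
body = path + bytes(-(24 + len(path)) % 8)
SAMPLE_DYLIB = struct.pack(
    "<6I", LC_LOAD_DYLIB, 24 + len(body), 24, 2, 1319 << 16, 1 << 16
) + body

print(parse_dylib_command(SAMPLE_DYLIB))
```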

Some libraries that may indicate malware-related functionality are:

  • DiskArbitration: Monitoring USB drives
  • AVFoundation: Capturing audio and video
  • CoreWLAN: Wi-Fi scans

{% hint style="info" %} A Mach-O binary can contain one or more constructors, which are executed before the address specified in LC_MAIN.
The offsets of any constructors are held in the __mod_init_func section of the __DATA_CONST segment. {% endhint %}

Mach-O Data

The heart of the file is the final region, the data, which consists of a number of segments as laid out in the load-commands region. Each segment can contain a number of data sections. Each of these sections contains code or data of one particular type.

{% hint style="success" %} The data is basically the part containing all the information loaded by the LC_SEGMENT_64 load commands {% endhint %}

This includes:

  • Function table: Holds information about the program's functions.
  • Symbol table: Contains information about the external functions used by the binary.
  • It can also contain internal function names, variable names, and more.

To check it you could use the Mach-O View tool:

Or from the cli:

size -m /bin/ls