Aether: memory forensics and threat hunting tool

Mr.Z June 2, 2026 18 min read


Introduction

I started building this tool a few months ago purely out of curiosity just to see if I could hunt for specific patterns in memory for Phantom ASP.Net Loader. But one feature led to another, and it quickly evolved into something much more substantial. It’s become a core future project for me, and I’m excited to finally share the first public beta release.

Introducing Aether, a memory forensics tool written in Zig and built specifically to help analysts inspect, hunt for, and detect malicious behaviour. It is written with the mindset of an attacker who wants to help defenders, gluing together the missing pieces to build good, fast, and reliable software.

Why Zig? I believe the specific language you use matters less than the craftsmanship you put into it. As long as you’re enjoying the process and building something that is high quality, stable, and easy to build and use, that’s what truly counts.

The current release is v0.8, and it still requires development to address limitations, community feedback, and unknown cases. You can always keep an eye on the development plans and upcoming features here.

The problem with “Modified Code” detection

Every modern memory scanner in this space relies on a version of the same core idea: compare what’s in memory against what should be in memory. For executable images (DLLs/EXEs), the kernel tracks which pages have been written to since they were mapped from disk. These are called private pages – they’ve been copy-on-written away from the shared file-backed image. If an executable code page has gone private, something has modified it at runtime. This is a strong signal, but in practice it is still nearly useless on its own.
Here’s what creates private pages in a normal, non-compromised Windows process:

Relocation: if a DLL doesn’t land at its preferred ImageBase (which is nearly always the case, thanks to ASLR), the loader patches relocation targets directly into the .text section. Those pages go private.
IAT binding: every imported function address gets written into the Import Address Table (IAT). The IAT usually lives in .rdata, and every LdrpSnapModule call during process init makes those pages private.
.NET JIT: if the process hosts the CLR, tiered compilation writes executable code into CLR.dll, clrjit.dll and every NGEN/R2R image. Hundreds of pages go private during normal managed execution.
CFG bitmap updates: Control Flow Guard (CFG) metadata writes touch pages in the image regions on module load.
EDR hooks: every EDR/XDR solution installs inline hooks, and a legitimate hook looks identical to a malicious detour.

What does Aether do to solve this?

Instead of emitting a finding for every private page and leaving triage to the analyst, Aether runs a pipeline of five sequential filters (L1-L5) over every candidate before deciding whether to report it. Each layer eliminates a specific class of false positive. A candidate has to survive all five to produce a finding. (In practice, this has proven to reduce false positives by over 80%.)

L1 – structural filter

Only executable sub-regions of MEM_IMAGE allocations are considered. This single gate drops about 80% of false positives immediately, because the dominant noise source is .data and .rdata sections going private during IAT binding and global initialization. Those sections don’t have the PAGE_EXECUTE_* protection flag. They’re not interesting for code-injection detection, and L1 stops them from ever entering the pipeline.
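
The L1 gate can be pictured as a single predicate over the region attributes that VirtualQueryEx returns. The sketch below is illustrative Python rather than Aether’s Zig source; the function name is an assumption, while the constants are the documented Windows values for MEM_IMAGE and the PAGE_EXECUTE_* protection flags.

```python
# Documented Windows constants (winnt.h).
MEM_IMAGE = 0x1000000
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
PAGE_EXECUTE_MASK = (
    PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY
)


def passes_l1(region_type: int, protect: int) -> bool:
    """Hypothetical L1 gate: keep only executable sub-regions of image mappings."""
    return region_type == MEM_IMAGE and bool(protect & PAGE_EXECUTE_MASK)
```

A .rdata page (PAGE_READONLY or PAGE_READWRITE) never passes this check, which is why IAT-binding noise is dropped before any later layer runs.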

L2 – quantitative grading

Not all private code pages are equally suspicious. One modified page in a 240-page DLL could be a single EDR hook. Sixty modified pages in a 240-page DLL means someone replaced a quarter of the code section. L2 counts private pages via a batched K32QueryWorkingSetEx call (one syscall per region, not one per 4 KB page – roughly 100x faster than the naive approach) and computes the ratio. Candidates get graded LOW, MEDIUM, or HIGH based on thresholds of 5% and 25% private-page ratio, with absolute counts as a secondary gate.
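
As a rough illustration of the grading step, the sketch below turns a private-page count into a grade using the 5% and 25% ratio thresholds mentioned above. It is Python for readability, not Aether’s code: the function name, the "NONE" result, and the `min_pages` parameter (a stand-in for the absolute-count gate, whose real values are not stated here) are all assumptions.

```python
def grade_private_ratio(private_pages: int, total_pages: int, min_pages: int = 2) -> str:
    """Grade a candidate by the share of its executable pages that went private.

    The 5% / 25% thresholds come from the text; min_pages is a hypothetical
    secondary gate on the absolute number of private pages.
    """
    if total_pages <= 0 or private_pages <= 0:
        return "NONE"
    ratio = private_pages / total_pages
    if ratio >= 0.25 and private_pages >= min_pages:
        return "HIGH"
    if ratio >= 0.05 and private_pages >= min_pages:
        return "MEDIUM"
    return "LOW"
```

With these assumptions, one private page out of 240 grades LOW, while sixty out of 240 grades HIGH.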

L3 – corroboration

A candidate is only promoted to a reported finding if at least one independent signal agrees on the same AllocationBase. That independent signal can be a signature hit from the pattern scanner, a MISSING_PEB finding (the module isn’t registered with the PEB – strong DLL hollowing indicator), a PRIVATE_RWX allocation overlapping the same base, a hook-prologue byte pattern, or a confirmed on-disk diff. If a candidate sits alone with no corroboration from any other detection pass, it gets dropped entirely. No output. No noise.
This is a deliberate trade-off. It means Aether will miss a perfectly clean code-cave injection that modifies three bytes in a legitimate DLL and has no other IOCs. That’s a real attack technique. But in practice, implants that modify code almost always leave other traces – they allocate RWX memory nearby, they create threads, they show up in pattern scans, or the modified module’s PEB entry is inconsistent. Requiring corroboration eliminates hundreds of false positives per scan while losing almost nothing in real-world detection coverage.
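
The corroboration rule boils down to a join on AllocationBase. The following Python sketch shows one way to express it; the signal names mirror the ones listed above, but the data layout and function name are assumptions made for illustration.

```python
from collections import defaultdict

# Signal names follow the text; how Aether represents them internally is not shown there.
CORROBORATING = {"SIGNATURE_HIT", "MISSING_PEB", "PRIVATE_RWX", "HOOK_PROLOGUE", "DISK_DIFF"}


def corroborate(candidates, signals):
    """Keep only candidates whose AllocationBase has at least one independent signal.

    candidates: iterable of allocation base addresses (ints)
    signals:    iterable of (allocation_base, signal_name) tuples from other passes
    Returns {allocation_base: sorted list of corroborating signal names}.
    """
    by_base = defaultdict(set)
    for base, name in signals:
        if name in CORROBORATING:
            by_base[base].add(name)
    # Candidates with no entry in by_base are dropped silently: no output, no noise.
    return {base: sorted(by_base[base]) for base in candidates if by_base.get(base)}
```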

L4 – CLR-aware per-module suppression

If the target process hosts the .NET runtime, earlier tools either skip the entire process (missing real implants in .NET hosts) or scan everything (drowning in JIT noise). Aether takes a middle path: it suppresses modified-code findings only for modules whose basenames match a JIT target allowlist (.ni.dll, mscor, clr, coreclr, system.private.corelib). Everything else in the same process is scanned normally. If an attacker injects shellcode into KERNEL32.dll inside a .NET worker process, L4 doesn’t protect them – KERNEL32 isn’t on the allowlist.
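
A per-module allowlist check could look like the Python sketch below. The allowlist entries are the ones quoted above; the matching rule (suffix match for entries that start with a dot, prefix match otherwise) is an assumption, since the exact comparison Aether performs is not spelled out.

```python
import ntpath

JIT_ALLOWLIST = (".ni.dll", "mscor", "clr", "coreclr", "system.private.corelib")


def is_jit_target(module_path: str) -> bool:
    """Return True if the module basename matches the JIT allowlist (assumed matching rule)."""
    name = ntpath.basename(module_path).lower()
    for entry in JIT_ALLOWLIST:
        if entry.startswith("."):
            if name.endswith(entry):  # e.g. native images such as System.ni.dll
                return True
        elif name.startswith(entry):  # e.g. clr.dll, clrjit.dll, mscorlib.dll
            return True
    return False
```

Only modules for which this returns True would have their modified-code findings suppressed; KERNEL32.dll in the same process is still scanned.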

L5 – on-disk diff

For each surviving candidate, Aether maps the original module file from disk using CreateFileMappingW with SEC_IMAGE_NO_EXECUTE – the kernel applies the same section layout it would for a real load, so RVAs match. Then it compares the first 16 bytes of each private executable page in memory against the corresponding RVA on disk. If they differ, the modification is real – not loader noise, not a relocation, not an IAT write. This is the kill shot for distinguishing legitimate private pages from actual code tampering. A confirmed disk diff immediately promotes the finding to MODIFIED_CODE_HIGH regardless of the ratio from L2.
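
The comparison itself is simple once both views are addressable by RVA. The Python sketch below assumes the in-memory image and the on-disk section mapping have already been read into byte buffers laid out by RVA (that part requires the Windows APIs named above and is not shown); the function and parameter names are hypothetical.

```python
PROBE_LEN = 16  # the text compares the first 16 bytes of each private page


def diff_private_pages(mem_image: bytes, disk_image: bytes, private_page_rvas):
    """Return the RVAs of private pages whose first 16 bytes differ from disk."""
    changed = []
    for rva in private_page_rvas:
        in_mem = mem_image[rva:rva + PROBE_LEN]
        on_disk = disk_image[rva:rva + PROBE_LEN]
        if in_mem != on_disk:
            changed.append(rva)
    return changed
```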

There’s also a hook-prologue probe inside the candidate loop. For each private code page, Aether reads the first 16 bytes and matches against five common x86/x64 trampoline patterns:
E9 ?? ?? ?? ??                            JMP rel32
FF 25 ?? ?? ?? ??                         JMP [rip+disp32]
68 ?? ?? ?? ?? C3                         PUSH imm32 ; RET
48 B8 ?? ?? ?? ?? ?? ?? ?? ?? FF E0       MOV RAX, imm64 ; JMP RAX
49 BB ?? ?? ?? ?? ?? ?? ?? ?? 41 FF E3    MOV R11, imm64 ; JMP R11

The last two are Detours-style trampolines. A hook prologue by itself lands as MEDIUM severity – EDR products use these exact patterns for legitimate hooks. But when a hook prologue appears on the same allocation base as a disk diff, or a missing PEB entry, or a pattern-scan hit, the corroboration in L3 promotes it. The hook is no longer “probably EDR” – it’s “probably part of the same implant that caused three other findings at this base address.”
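
Matching those five trampoline patterns is a wildcard byte comparison against the start of each page. A minimal Python sketch follows, with `??` treated as "any byte"; the helper names are assumptions, and the patterns are the ones listed above.

```python
def _pat(spec: str):
    """Parse 'E9 ?? ?? ?? ??' into a list where None marks a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in spec.split()]


HOOK_PATTERNS = {
    "JMP rel32": _pat("E9 ?? ?? ?? ??"),
    "JMP [rip+disp32]": _pat("FF 25 ?? ?? ?? ??"),
    "PUSH imm32; RET": _pat("68 ?? ?? ?? ?? C3"),
    "MOV RAX, imm64; JMP RAX": _pat("48 B8 ?? ?? ?? ?? ?? ?? ?? ?? FF E0"),
    "MOV R11, imm64; JMP R11": _pat("49 BB ?? ?? ?? ?? ?? ?? ?? ?? 41 FF E3"),
}


def match_hook_prologue(first_bytes: bytes):
    """Return the name of the first trampoline pattern that matches, or None."""
    for name, pat in HOOK_PATTERNS.items():
        if len(first_bytes) >= len(pat) and all(
            p is None or p == b for p, b in zip(pat, first_bytes)
        ):
            return name
    return None
```

A match on its own only says "something detoured this function"; it is the L3 corroboration that decides whether it is reported as more than a probable EDR hook.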

Catching the threads, not just the code

Finding modified code is half the problem. The other half is finding who is running it.
For example, Moneta has a thread check that reads Win32StartAddress from NtQueryInformationThread (info class 9), then checks whether it points inside a loaded module. If it doesn’t, the thread is flagged. This catches the introductory-level attack: VirtualAllocEx a page, write shellcode, CreateRemoteThread pointing at it. The start address is in MEM_PRIVATE, clearly not in any module, and the detection fires.

As an offensive practitioner, I know this check is trivial to beat. Here are five common techniques that bypass it completely:
Module stomping
Overwrite the .text section of a legitimate DLL with your payload. The thread’s start address points into ntdll.dll or kernel32.dll or wherever you stomped. It’s “inside a module.” Check passes.

DLL hollowing.
Map a fresh copy of a DLL, hollow out the code, replace it with yours. The memory type is still MEM_IMAGE. It looks like a module. But it was never registered with the PEB module list, because it is loaded with NtCreateSection + NtMapViewOfSection, not LdrLoadDll.

Win32StartAddress spoofing.
After thread creation, call NtSetInformationThread to overwrite the Win32StartAddress field. Point it at kernel32!LoadLibraryW or ntdll!RtlExitUserThread. Now every tool that reads this field sees a benign system function as the start address. The thread is actually running shellcode.

Thread context hijacking.
Create a thread suspended, pointing at a legitimate address. Then SetThreadContext to overwrite Rip to your shellcode. Resume the thread. Win32StartAddress still shows the original legitimate address. The thread is executing somewhere else entirely.

Staged loading.
Allocate MEM_PRIVATE with PAGE_READWRITE (not executable). Write the payload. Create a thread pointing at it. Later, VirtualProtect flips it to RX. At thread creation time, the page wasn’t even executable – the naive check might not flag a RW region.

Aether’s Thread Start-Address Validation (TSAV) catches all five. Here’s how:

For every thread, TSAV queries the Win32StartAddress, then runs VirtualQueryEx on that address to get the actual memory region characteristics (type, protection, allocation base).

Then it cross-correlates. TSAV runs after the L1–L5 pipeline completes, so it has a set of bad_alloc_bases – every allocation base address that triggered a modified-code finding, a missing PEB entry, a private RWX flag, a disk diff, or a hook prologue. If a thread’s start address falls inside one of those bases, it’s not just “in a module.” It’s in a module that Aether already flagged as tampered. That gets a TSAV_MODIFIED_HOST verdict at HIGH severity.

For suspended threads, TSAV goes further. It opens the thread with THREAD_GET_CONTEXT (falling back gracefully through an access-rights ladder if denied) and reads the actual Rip or Eip via GetThreadContext / Wow64GetThreadContext. The Win32StartAddress field is process-writable – any user-mode code can rewrite it. But the live instruction pointer of a suspended thread is captured at suspension time. If Rip points to a suspicious region while Win32StartAddress looks clean, that’s a spoofing IOC. TSAV_SUSPENDED_RIP, CRITICAL severity.
There’s also a spoof-trampoline denylist. TSAV resolves addresses for commonly abused API functions – LoadLibraryA, LoadLibraryW, RtlExitUserThread, WinExec, CreateProcessA, VirtualAlloc, and others – and checks if any thread’s Win32StartAddress matches one of them. These are the functions that loaders use as spoof targets because they’re always present and always look legitimate. A match gets TSAV_SPOOF_TRAMPOLINE at MEDIUM severity.

Verdict	Severity	What it catches
TSAV_SHELLCODE_PRIVATE	CRITICAL	Thread in MEM_PRIVATE + PAGE_EXECUTE_*. Classic shellcode.
TSAV_SUSPENDED_RIP	CRITICAL	Rip disagrees with Win32StartAddress, points to suspicious memory. Spoofing or hijack.
TSAV_HOLLOWED_HOST	HIGH	Thread in MEM_IMAGE not in the PEB module list. Hollowed DLL.
TSAV_MODIFIED_HOST	HIGH	Thread in an allocation already flagged by L1–L5. Cross-correlation.
TSAV_STAGED_PRIVATE_RW	HIGH	Thread in MEM_PRIVATE + PAGE_READWRITE. Pre-VirtualProtect staging.
TSAV_MAPPED_NONPE	MEDIUM	Thread in MEM_MAPPED without PE headers. sRDI-style loader.
TSAV_SPOOF_TRAMPOLINE	MEDIUM	Start address matches a known spoof target function.
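
The table above can be read as a decision function over what VirtualQueryEx and the PEB walk report for a thread’s start address. The Python sketch below is one possible encoding of it; the `ThreadView` fields and, in particular, the order in which the checks are tried are assumptions, and TSAV_SUSPENDED_RIP is left out because it needs the live thread context.

```python
from dataclasses import dataclass


@dataclass
class ThreadView:
    """Hypothetical summary of the region containing a thread's start address."""
    mem_type: str                       # "MEM_PRIVATE", "MEM_IMAGE" or "MEM_MAPPED"
    executable: bool                    # any PAGE_EXECUTE_* protection
    writable: bool = False              # PAGE_READWRITE
    in_peb: bool = True                 # module is registered in the PEB module list
    alloc_flagged: bool = False         # allocation base is in bad_alloc_bases
    has_pe_header: bool = True
    start_is_spoof_target: bool = False


def tsav_verdict(t: ThreadView):
    """Map region attributes to a (verdict, severity) pair, or None if clean."""
    if t.mem_type == "MEM_PRIVATE" and t.executable:
        return ("TSAV_SHELLCODE_PRIVATE", "CRITICAL")
    if t.mem_type == "MEM_IMAGE" and not t.in_peb:
        return ("TSAV_HOLLOWED_HOST", "HIGH")
    if t.alloc_flagged:
        return ("TSAV_MODIFIED_HOST", "HIGH")
    if t.mem_type == "MEM_PRIVATE" and t.writable:
        return ("TSAV_STAGED_PRIVATE_RW", "HIGH")
    if t.mem_type == "MEM_MAPPED" and not t.has_pe_header:
        return ("TSAV_MAPPED_NONPE", "MEDIUM")
    if t.start_is_spoof_target:
        return ("TSAV_SPOOF_TRAMPOLINE", "MEDIUM")
    return None
```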

The following PoC demonstrates Aether’s ability to detect hollowing and APC injection.

Entropy analysis & Shellcode heuristics

Scanning for a plain-text MZ magic and PE headers in MEM_PRIVATE regions is the easy part. The harder case, and the challenging puzzle, is when loaders adopt encryption. C2 frameworks such as Cobalt Strike, Havoc, and their equivalents use sleep masks: when the beacon sleeps, the shellcode is encrypted in place. That means scanning an encrypted region while the beacon is sleeping observes no structural pattern, and sometimes the memory region is set as non-executable during the sleep.

I am developing this feature slowly so I can approach it properly, starting with XOR patterns. Aether does not brute-force keys; it derives them with a known-plaintext approach. At each offset in the memory buffer, assume a PE header starts there. Compute what the XOR key must be if offset 0x4E (relative to the PE start) contains the encrypted DOS stub (the 39-byte string “This program cannot be run in DOS mode.”):

key[(0x4E + i) % key_len] = buf[offset + 0x4E + i] ^ dos_stub[i]

For a 4-byte key, you only need four strategically chosen positions within the 39-byte stub to recover all four key bytes. Positions are selected so that (0x4E + p) mod key_len covers every key slot exactly once. Once the candidate key is derived, verify it against the full stub. If the entire 39-byte string decrypts correctly, you almost certainly have the right key.

But Aether doesn’t stop at the stub. A successful candidate has to pass four consistency checks:

XOR-decrypting offset 0 with the derived key produces MZ (the DOS signature).
The full 39-byte DOS stub at offset 0x4E decrypts correctly.
The e_lfanew field at offset 0x3C (decrypted) yields a value in [0x40, 0x1000] – a sane PE header offset.
XOR-decrypting four bytes at the decrypted e_lfanew offset produces PE\0\0.
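
The derivation and the four consistency checks fit in a few lines. The Python sketch below is an illustration, not Aether’s implementation: it assumes a repeating key aligned to the start of the PE (so key slots are indexed by position relative to the PE start, as the slot-selection rule above implies), and the function name is hypothetical.

```python
import struct

DOS_STUB = b"This program cannot be run in DOS mode."  # 39 bytes
STUB_OFF = 0x4E


def derive_xor_key(buf: bytes, offset: int, key_len: int = 4):
    """Assume a XOR-encrypted PE starts at `offset`; return the key, or None."""
    if offset + STUB_OFF + len(DOS_STUB) > len(buf):
        return None
    key = bytearray(key_len)
    # Known-plaintext recovery: key_len consecutive stub bytes cover every key slot once.
    for i in range(key_len):
        key[(STUB_OFF + i) % key_len] = buf[offset + STUB_OFF + i] ^ DOS_STUB[i]

    def dec(rel: int, n: int) -> bytes:
        chunk = buf[offset + rel:offset + rel + n]
        return bytes(b ^ key[(rel + j) % key_len] for j, b in enumerate(chunk))

    if dec(0, 2) != b"MZ":                              # check 1: DOS signature
        return None
    if dec(STUB_OFF, len(DOS_STUB)) != DOS_STUB:        # check 2: full stub
        return None
    e_lfanew = struct.unpack("<I", dec(0x3C, 4))[0]     # check 3: sane header offset
    if not 0x40 <= e_lfanew <= 0x1000:
        return None
    if dec(e_lfanew, 4) != b"PE\0\0":                   # check 4: PE signature
        return None
    return bytes(key)
```

A scanner would call this for every candidate offset in a region. Note that an unencrypted PE also passes, with an all-zero key, so a caller would want to treat that case separately.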

The XOR-PE detector handles one class of encrypted payloads: PE files under a rolling XOR key with the DOS stub intact. But plenty of payloads don’t look like PE at all. Raw shellcode staged by Cobalt Strike’s execute-assembly, Donut-wrapped .NET payloads during the sleep phase, custom loaders using AES or RC4 – these sit in memory as high-entropy blobs with no structural PE markers to anchor on.

The obvious approach is Shannon entropy: measure the randomness of the byte values in a region and flag anything above a threshold. This is what most write-ups suggest and what I implemented first. It doesn’t work well enough.

In fact, high-entropy private memory is everywhere. Browser engines are the worst offenders: Edge’s V8 JIT allocates hundreds of MEM_PRIVATE regions for JavaScript object heaps, compressed bytecode caches, serialised data structures, and string interning tables. These routinely hit entropy values of 7.3, and Edge alone produces at least 20 false positives from a single process.
To reduce the number of false positives, Aether takes the same approach as for the modified-code problem: don’t rely on a single signal, but layer multiple independent indicators and require corroboration, as follows:

Sliding-window entropy instead of whole-region averaging: A 512-byte encrypted shellcode blob inside a 64 KB region produces a whole-region entropy of maybe 5.5 – below any useful threshold. Aether slides a 256-byte window across the region in 64-byte strides and tracks the peak. A small encrypted payload inside a larger allocation still produces a peak window near 7.9, even if the surrounding bytes are structured code. The window size is chosen because 256 bytes gives log2(256) = 8.0 as the theoretical maximum, and encrypted data consistently lands between 7.8 and 7.99 at this scale.
Chi-squared uniformity test on the peak window. Shannon entropy tells you bytes are spread across many values. Chi-squared tells you whether they’re spread uniformly. Encrypted data (AES, RC4, strong XOR) produces a nearly perfect uniform distribution – chi-squared values near 256.0 against the expected uniform. Compressed data (zlib, LZMA) has high entropy but non-uniform distribution – chi-squared in the 400-800 range. JIT code is higher still. This single metric separates “encrypted” from “compressed” better than entropy alone.
Null byte ratio: x86/x64 code is full of null bytes – they appear in immediate operands, displacement fields, padding, and instruction encoding. A typical code region has 5-15% null bytes. Encrypted data has about 1/256 = 0.39%. If a high-entropy window has near-zero null bias, that’s a strong encryption indicator.
Shellcode prelude detection: instead of looking for generic single-byte opcodes (which match everything), Aether checks the first 64 bytes for multi-byte instruction sequences that are genuinely specific to shellcode, e.g. FC E8 (CLD; CALL rel32), the block_api pattern used by nearly every Metasploit payload.
Entropy gradient detection: encoded shellcode (shikata_ga_nai, custom XOR stubs, LZNT1-compressed stagers) has a distinctive structure: a short plaintext decoder loop at the front, followed by an abrupt entropy jump where the encoded payload begins. The decoder stub is structured code (entropy 4.0-6.0). The payload is encrypted noise (entropy 7.0+). This transition pattern – low header, high body, gap of at least 1.5 – doesn’t occur in compressed resources (which are dense from byte 0) or in JIT code (which has relatively uniform entropy throughout). When Aether detects this gradient, it fires ENTROPY_SHELLCODE regardless of whether it found a prelude match.
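
The first three indicators are small, independent statistics over a byte window. The Python sketch below shows how they could be computed; the window and stride defaults follow the text (256 and 64 bytes), while the function names are assumptions and no thresholds are hard-coded.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def peak_window_entropy(region: bytes, window: int = 256, stride: int = 64):
    """Slide a window across the region and return (peak_entropy, offset)."""
    best, best_off = 0.0, 0
    for off in range(0, max(len(region) - window, 0) + 1, stride):
        h = shannon_entropy(region[off:off + window])
        if h > best:
            best, best_off = h, off
    return best, best_off


def chi_squared(data: bytes) -> float:
    """Chi-squared statistic of the byte histogram against a uniform distribution."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def null_ratio(data: bytes) -> float:
    """Fraction of bytes in the window that are 0x00."""
    return data.count(0) / len(data) if data else 0.0
```

One thing worth checking when calibrating thresholds: a 256-byte window of truly random bytes cannot contain every byte value evenly, so its measured entropy in this sketch lands closer to 7.2 bits than to 8.0. Whatever cut-off is used has to be tuned against the window size actually scanned.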

High entropy alone, no matter how high, does not return a finding. In other words, there is no standalone “this region has high entropy” verdict. I did a lot of testing and there are still a number of FPs, especially in browsers, Electron apps, etc. However, this will definitely help in other parts of the analysis if you combine it with TSAV, for example, or with the beaconing hunt mode.

Dump memory region

Aether allows you to extract memory regions with a custom base address/offset and a desired size. The output is clear enough to help the analyst identify the suspicious area. This feature might need to scale better when switching between different addresses and sizes, but its purpose is to be used after the tool suggests which region is likely malicious or suspicious.

Use Aether to dump memory region by the base address and size

Signature scanning

Beyond structural analysis, Aether runs a traditional pattern scanner with a first-byte index that bins signatures into 256 buckets. Each byte position in the buffer only checks patterns whose first byte matches – a simple optimization that brings the inner loop from O(bytes * patterns) down to O(bytes * avg_bucket_size), which is 50–100x faster with hundreds of signatures loaded.

Patterns are scanned in both ASCII and UTF-16LE encoding. This matters for .NET processes. The CLR stores all managed strings as UTF-16LE internally, so a webshell string like msxsl:script appears in memory as 6D 00 73 00 78 00 73 00 6C 00 3A 00 73 00 63 00 72 00 69 00 70 00 74 00. Missing UTF-16LE means missing the majority of webshell indicators in managed processes. I wrote this feature to address the Phantom ASP.Net Loader I released a couple of months ago.
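
To make the first-byte index and the dual-encoding scan concrete, here is a small Python sketch. It is not Aether’s scanner: the function names and the shape of the `patterns` dictionary are assumptions, and real rule packs carry more metadata than a name and a string.

```python
from collections import defaultdict


def build_index(patterns):
    """Bin every pattern, in ASCII and UTF-16LE form, by its first byte."""
    buckets = defaultdict(list)
    for name, text in patterns.items():
        for enc in ("ascii", "utf-16-le"):
            raw = text.encode(enc)
            buckets[raw[0]].append((name, enc, raw))
    return buckets


def scan(buf: bytes, buckets):
    """At each position, test only the patterns whose first byte matches."""
    hits = []
    for pos, byte in enumerate(buf):
        for name, enc, raw in buckets.get(byte, ()):
            if buf.startswith(raw, pos):
                hits.append((pos, name, enc))
    return hits
```

For example, `scan(buf, build_index({"xslt_script": "msxsl:script"}))` reports the string whether the process holds it as a managed (UTF-16LE) string or as plain ASCII.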

Signatures are loaded from JSON rule packs dropped into a rules/ directory. No recompilation needed. The repo ships some rules for Cobalt Strike, Meterpreter, Sliver, Brute Ratel, Havoc, Mimikatz, PowerShell cradles, .NET offensive tools (Rubeus, Seatbelt, SharpHound), Phantom Loader and generic webshell patterns. Each signature carries a weight, a category, and a MITRE ATT&CK mapping. Categories aggregate into a score – a process that hits four XSLT-related patterns with a combined weight of 18 is reported differently than a process with one generic string match at weight 2.

Scan a process for malicious pattern in memory based on rules-engine

Beaconing detection, timing analysis

The --hunt mode is separate from scanning and from the --network option. It monitors a process’s outbound connections over time to detect C2 beacon patterns.

The first version of this was a simple hit counter (you can view the diff in the following commit): in a loop, count how many times a remote endpoint appears and flag anything above a threshold. I removed that feature because a process might hit the same CDN endpoint 50 times in 30 seconds just because it’s lazy-loading web fonts :).

Meanwhile, a C2 beacon with a 60-second sleep interval only shows up three times in three minutes. So the rewrite focuses on what actually distinguishes a beacon from normal traffic: the regularity of its timing.

Here’s how it works now:

Connection event tracking: each endpoint gets a 64-entry ring buffer of timestamps. When a connection newly appears in the TCP table, that’s a new connection event and the current tick gets recorded. A persistent TCP session that stays ESTABLISHED across multiple polls only counts once. This is critical, because without it a single long-lived WebSocket connection would accumulate “hits” every poll cycle and look like periodic beaconing.
Previous-set differencing. Every poll cycle, the monitor tracks which endpoint keys were present. On the next cycle, only endpoints that are in the current table but weren’t in the previous table count as new arrivals. This is what separates “a new short-lived connection to the same server” from “the same connection still open from last time.”
Coefficient of Variation (CoV) scoring. Once an endpoint has three or more recorded events, the monitor computes inter-arrival intervals between consecutive timestamps. From those intervals it derives the mean, standard deviation, and Coefficient of Variation (stddev / mean). CoV is a dimensionless measure of how regular the intervals are:

1. CoV < 0.20 (HIGH confidence beacon). Intervals are within 20% of the mean. This is a tight callback loop with low jitter. Cobalt Strike at 10% jitter lands around CoV 0.06-0.12. Metasploit’s default reverse_https with no jitter lands below 0.05.

2. CoV 0.20-0.40 (MEDIUM confidence). More jitter, but still periodic. Cobalt Strike with 30-40% jitter, or Havoc’s default sleep configuration.

3. CoV > 0.40 (LOW). Human-driven or bursty traffic. Not flagged.
Endpoint filtering. Loopback (127.), link-local (169.254.), and null addresses are excluded before tracking. These are never C2 destinations and would otherwise dominate the results on machines running local services.
Time-based monitoring. The CLI accepts --hunt PID SLEEP_MS DURATION_S. The monitor polls every SLEEP_MS milliseconds for DURATION_S seconds total. Default is 2-second polls for 120 seconds. Longer durations catch slower beacons – a 60-second sleep interval needs at least 3 minutes of monitoring to accumulate enough events for scoring. The old version ran a fixed 15 iterations regardless of timing, which meant it was simultaneously too short for slow beacons and too long for fast ones.
Persist detection. If, while being monitored, an endpoint is present in more than 50% of polls for at least 10 seconds, it is scored as a persistent long-lived session. A presence ratio of 80% or more gets a PERSIST verdict.
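
The two core ideas in that list, previous-set differencing and CoV scoring, can be sketched in a few lines of Python. This is illustrative only: the function names are assumptions, and the use of the population standard deviation is a guess, since the text only says "stddev / mean".

```python
import statistics


def new_arrivals(previous: set, current: set) -> set:
    """Endpoints present in this poll but absent from the previous one."""
    return current - previous


def cov_verdict(timestamps_ms):
    """Score one endpoint's connection-event timestamps; returns (level, cov) or None."""
    if len(timestamps_ms) < 3:
        return None  # fewer than three events: not enough intervals to score
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    mean = statistics.mean(intervals)
    if mean <= 0:
        return None
    cov = statistics.pstdev(intervals) / mean  # population stddev is an assumption
    if cov < 0.20:
        return ("HIGH", cov)
    if cov <= 0.40:
        return ("MEDIUM", cov)
    return ("LOW", cov)
```

A beacon calling back every 60 seconds with no jitter yields identical intervals and a CoV of 0, while a burst of human-driven requests to the same endpoint spreads the intervals out and pushes the CoV well past 0.40.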

Please note: for a sleep-0 beacon you can use --hunt PID 500 60 (poll every 500 ms for 60 s). If you select 2 ms, as shown in the proof of concept below, it works but hammers the TCP table, so 300-500 ms is more than fast enough and lighter.

The following screenshot shows Aether detecting beaconing across different sleep/jitter configurations.

Putting it together

A typical use case during an incident investigation:

# find the PID of the process you care about
aether.exe --lookup "runtime.exe"

# scan the suspicious one by process id 
aether.exe --scan --pid 4820 --verbose

# scan the suspicious one by process name
aether.exe --scan --lookup runtime.exe --verbose

# export for the SIEM or the report
aether.exe --pid 4820 --json > findings.json

# sweep every process on the host
aether.exe --scan-all --json > host-sweep.json

# dump a flagged region for offline analysis
aether.exe --dump 4820 0x20F4A310000 4096

# check for beaconing while you investigate
aether.exe --hunt 4820 2000 20

# check for any networking activity 
aether.exe --network 4820 

# perform a signature scan with a named rule pack (use "all" in place of <NAME> to load every rule)
aether.exe --scan --pid 4820 --rules <NAME>

Limitations and what’s next

Here is a list of the tool’s limitations, whether by design or by current capability:

Aether is a user-mode-only tool and does not have a kernel driver, therefore there is no rootkit detection. However, it supports running in an elevated context to access administrative processes.
IPv6 is not yet supported in networking.
The on-disk diff silently skips modules whose files are locked or deleted.
The TSAV RIP probe only works on suspended threads. A Win32StartAddress rewrite on a running thread can only be caught if the thread happens to be parked in a wait at scan time.
The beacon detector’s CoV scoring needs at least three connection events to produce a verdict, so beacons with sleep intervals longer than DURATION_S / 3 won’t score – extend the duration accordingly.

On the roadmap:

Adding --dump-compare for diffing a region between two scans to catch staged payloads.
API hash constant detection
UDP beaconing tracking
Fixing JSON syntax
FQDN resolve mechanism