Muzan Writeup

Introduction

Alright, let’s go through this reverse engineering challenge. I’m going to walk you through what I saw, why it looked that way, how I de-obfuscated the obfuscator (Alcatraz), and how I turned the scrambled verification logic into a clean, solvable equation. how to verify that every step is actually works. No hand-waving. We’ll go from absolutely nothing to a working, verifiable flag solver.


First look: opening the binary and getting a feel for it

I opened the sample in IDA x64 and jumped into the function that prints “Enter The Flag:” and “Correct Flag :)”. That’s our target. At a glance, the control flow graph is a mess, lots of tiny basic blocks, opaque jumps, and a dispatcher vibe. This is textbook control-flow flattening. It’s what Alcatraz does: bulldozing the natural CFG and replacing it with a state machine driven by a jump table or a computed indirect jump.

At this point I took two screenshots in IDA to capture the baseline:

  • screenshot 1: the graph view of the target function before any cleanup

  • screenshot 2: a disassembly close-up showing those small gadgets with pushf/pop rax/popf and some rotate (rol/ror) noise around what looks like a constant setup of registers

// positive sp value has been detected, the output may be wrong!
int __fastcall main_0(int argc, const char **argv, const char **envp)
{
  unsigned __int8 *v7; // rax
  __int64 n8; // rbx
  unsigned __int8 *v9; // rbp
  __int64 n0xA; // rdi
  __int64 v11; // rsi
  unsigned __int8 *v12; // r9
  int v13; // r10d
  int v14; // r11d
  __int64 v15; // r12
  __int64 v16; // r13
  __int64 v17; // r14
  __int64 v18; // r15
  __m128i v19; // xmm2
  __m128i v20; // xmm3
  __m128i si128; // xmm4
  int n4; // eax
  __m128i v28; // xmm3
  __m128i v29; // xmm3
  int v30; // eax
  int v31; // ecx
  int v32; // eax
  int v33; // ecx
  int v34; // eax
  int v35; // ecx
  int v36; // eax
  int v37; // ecx
  int v38; // eax
  int v39; // ecx
  int v40; // eax
  int v41; // ecx
  int v42; // eax
  int v43; // ecx
  int v44; // eax
  int v45; // ecx
  int v46; // eax
  int v47; // ecx
  unsigned __int8 *v48; // rax
  int v49; // ecx
  unsigned __int8 *v50; // rax
  int v55; // eax
  unsigned __int64 v56; // rdx
  int v57; // r8d
  int v58; // ecx
  int v59; // r8d
  unsigned __int64 v60; // rcx
  __int64 v70; // [rsp-78h] [rbp-D8h]
  __int64 v71; // [rsp-70h] [rbp-D0h]
  __int64 v72; // [rsp-68h] [rbp-C8h]
  __int64 v73; // [rsp-60h] [rbp-C0h]
  __int64 v74; // [rsp-58h] [rbp-B8h]
  int v75; // [rsp-50h] [rbp-B0h]
  int v76; // [rsp-4Ch] [rbp-ACh]
  int v77; // [rsp-48h] [rbp-A8h]
  int v78; // [rsp-44h] [rbp-A4h]
  int v79; // [rsp-40h] [rbp-A0h]
  int n219540062; // [rsp-3Ch] [rbp-9Ch]
  int v81; // [rsp-38h] [rbp-98h]
  int v82; // [rsp-34h] [rbp-94h]
  _BYTE v83[32]; // [rsp-30h] [rbp-90h] BYREF
  __int64 v84; // [rsp-10h] [rbp-70h]
  char v85; // [rsp-8h] [rbp-68h]
  __int64 v86; // [rsp+10h] [rbp-50h]
  __int64 v87; // [rsp+18h] [rbp-48h]
  __int64 v88; // [rsp+20h] [rbp-40h]
  __int64 v89; // [rsp+28h] [rbp-38h]
  unsigned __int8 *v90; // [rsp+30h] [rbp-30h]
  __int64 n8_1; // [rsp+40h] [rbp-20h]
  __int64 v92; // [rsp+48h] [rbp-18h]
  __int64 n0xA_1; // [rsp+50h] [rbp-10h]

  v90 = v7;
  __asm { pushf }
  __asm { pushf }
  n4 = __ROL4__((-__ROL4__((-__ROL4__(-14014273, 85) - 1483494869) ^ 0xF5BA43AB, 58) - 762076986) ^ 0x9CAAFAA9, 61);
  __asm { popf }
  while ( 1 )
  {
    while ( 1 )
    {
      while ( 1 )
      {
        while ( 1 )
        {
          while ( n4 == 4 )
          {
            __asm { popf }
            v17 = *(unsigned int *)(v9 - 13);
            v12 = v9 - 121;
            v13 = *(_DWORD *)(v9 - 17);
            v18 = *(unsigned int *)(v9 - 21);
            v14 = *(_DWORD *)(v9 - 25);
            v15 = *(unsigned int *)(v9 - 29);
            n8 = *(unsigned int *)(v9 - 33);
            v16 = *(unsigned int *)(v9 - 37);
            v11 = *(unsigned int *)(v9 - 41);
            __asm { pushf }
            __asm { pushf }
            n4 = __ROL4__((-__ROL4__(-1278099038, 35) - 1836776992) ^ 0xF3F6FECB, 247);
            __asm { popf }
          }
          if ( !n4 )
          {
            __asm { popf }
            goto LABEL_20;
          }
          if ( n4 != 3 )
            break;
          __asm { popf }
          LODWORD(v50) = rand();
          *(_DWORD *)&v9[4 * n8++ - 41] ^= (unsigned int)v50;
          _CF = (unsigned __int64)n8 < 8;
          _OF = __OFSUB__(n8, 8);
          _ZF = n8 == 8;
          _SF = n8 - 8 < 0;
          v90 = v50;
          if ( n8 < 8 )
          {
            __asm { pushf }
            __asm { pushf }
            n4 = __ROL4__((-__ROL4__(-626440728, 186) - 1961474012) ^ 0xE4ABA30D, 40);
            __asm { popf }
          }
          else
          {
            __asm { pushf }
            __asm { pushf }
            n4 = __ROL4__(
                   (-__ROL4__((-__ROL4__(-788719454, 80) - 1077230303) ^ 0xC08CF1EC, 146) - 947773240) ^ 0xB070821C,
                   78);
            __asm { popf }
          }
        }
        if ( n4 != 1 )
          break;
        __asm { popf }
        v19 = _mm_add_epi32(
                v19,
                _mm_xor_si128(
                  _mm_unpacklo_epi16(
                    _mm_unpacklo_epi8(_mm_cvtsi32_si128(*(_DWORD *)&v90[(_QWORD)v9 - 9]), (__m128i)0LL),
                    (__m128i)0LL),
                  si128));
        v20 = _mm_add_epi32(
                v20,
                _mm_xor_si128(
                  _mm_unpacklo_epi16(
                    _mm_unpacklo_epi8(_mm_cvtsi32_si128(*(_DWORD *)&v90[(_QWORD)v9 - 5]), (__m128i)0LL),
                    (__m128i)0LL),
                  si128));
        _CF = (unsigned __int64)(v90 + 8) < 0x28;
        _OF = __OFSUB__(v90 + 8, 40);
        _ZF = v90 + 8 == (unsigned __int8 *)40;
        _SF = (__int64)(v90 - 32) < 0;
        if ( (__int64)(v90 + 8) < 40 )
        {
          v90 += 8;
          __asm { pushf }
          __asm { pushf }
          n4 = __ROL4__((-__ROL4__(-1627802732, 100) - 968880555) ^ 0xD624D50C, 137);
          __asm { popf }
        }
        else
        {
          v90 += 8;
          __asm { pushf }
          __asm { pushf }
          n4 = __ROL4__((-__ROL4__(754593784, 129) - 543262696) ^ 0x85AA1838, 93);
          __asm { popf }
        }
      }
      if ( n4 != 5 )
        break;
      __asm { popf }
      v55 = v11 ^ *((_DWORD *)v12 + 1) ^ *(_DWORD *)v12;
      v56 = (unsigned __int64)(unsigned int)n0xA >> 1;
      v57 = v11 ^ v16 ^ *((_DWORD *)v12 + 1) ^ v55;
      v58 = n8 ^ v57;
      v59 = v14 ^ n8 ^ v15 ^ v55 ^ n8 ^ v57;
      v60 = v13 ^ (unsigned int)v17 ^ v59 | ((unsigned __int64)(v13 ^ v59 ^ v14 ^ (unsigned int)v18 ^ v58) << 32);
      _CF = v60 < *(_QWORD *)&v9[8 * v56 - 81];
      _OF = __OFSUB__(v60, *(_QWORD *)&v9[8 * v56 - 81]);
      _ZF = v60 == *(_QWORD *)&v9[8 * v56 - 81];
      _SF = (__int64)(v60 - *(_QWORD *)&v9[8 * v56 - 81]) < 0;
      if ( v60 == *(_QWORD *)&v9[8 * v56 - 81] )
      {
        v90 = (unsigned __int8 *)(v13 ^ (unsigned int)v17 ^ v59);
        __asm { pushf }
        __asm { pushf }
        n4 = __ROL4__(
               (-__ROL4__(
                   (-__ROL4__((-__ROL4__(1718938224, 128) - 2024341848) ^ 0x8BE47D95, 99) - 2010330144) ^ 0x914189A9,
                   78)
              - 856877530)
             ^ 0xEEF66678,
               228);
        __asm { popf }
      }
      else
      {
        v90 = (unsigned __int8 *)(v13 ^ (unsigned int)v17 ^ v59);
        __asm { pushf }
        __asm { pushf }
        n4 = __ROL4__(
               (-__ROL4__(
                   (-__ROL4__((-__ROL4__(-1018769451, 147) - 120432411) ^ 0xC33F9DF2, 33) - 757820733) ^ 0xC0732295,
                   212)
              - 1770517206)
             ^ 0xAC9DFC44,
               90);
        __asm { popf }
      }
    }
    if ( n4 == 7 )
      break;
    switch ( n4 )
    {
      case 6:
        __asm { popf }
        n0xA = (unsigned int)(n0xA + 2);
        v12 += 8;
        _CF = (unsigned int)n0xA < 0xA;
        _OF = __OFSUB__((_DWORD)n0xA, 10);
        _ZF = (_DWORD)n0xA == 10;
        _SF = (int)n0xA - 10 < 0;
        if ( (int)n0xA < 10 )
        {
          __asm { pushf }
          __asm { pushf }
          n4 = __ROL4__(
                 (-__ROL4__(
                     (-__ROL4__((-__ROL4__(-76654901, 182) - 537288865) ^ 0xEA90545E, 179) - 1255342175) ^ 0xE967F231,
                     57)
                - 803521327)
               ^ 0xD7F7DA55,
                 19);
          __asm { popf }
        }
        else
        {
          __asm { pushf }
          __asm { pushf }
          n4 = __ROL4__((-__ROL4__(316099821, 231) - 1678384859) ^ 0xB04F6A9F, 1);
          __asm { popf }
        }
        break;
      case 2:
        __asm { popf }
        v28 = _mm_add_epi32(v20, v19);
        v29 = _mm_add_epi32(v28, _mm_srli_si128(v28, 8));
        v20 = _mm_add_epi32(v29, _mm_srli_si128(v29, 4));
        srand(_mm_cvtsi128_si32(v20));
        v30 = *(v9 - 4);
        *(_DWORD *)(v9 - 121) = *(v9 - 6) | ((*(v9 - 7) | ((*(v9 - 8) | (*(v9 - 9) << 8)) << 8)) << 8);
        v31 = *(v9 - 2) | ((*(v9 - 3) | ((v30 | (*(v9 - 5) << 8)) << 8)) << 8);
        v32 = *v9;
        *(_DWORD *)(v9 - 117) = v31;
        v33 = v9[2] | ((v9[1] | ((v32 | (*(v9 - 1) << 8)) << 8)) << 8);
        v34 = v9[4];
        *(_DWORD *)(v9 - 113) = v33;
        v35 = v9[6] | ((v9[5] | ((v34 | (v9[3] << 8)) << 8)) << 8);
        v36 = v9[8];
        *(_DWORD *)(v9 - 109) = v35;
        v37 = v9[10] | ((v9[9] | ((v36 | (v9[7] << 8)) << 8)) << 8);
        v38 = v9[12];
        *(_DWORD *)(v9 - 105) = v37;
        v39 = v9[14] | ((v9[13] | ((v38 | (v9[11] << 8)) << 8)) << 8);
        v40 = v9[16];
        *(_DWORD *)(v9 - 101) = v39;
        n8 = n0xA;
        v41 = v9[18] | ((v9[17] | ((v40 | (v9[15] << 8)) << 8)) << 8);
        v42 = v9[20];
        *(_DWORD *)(v9 - 97) = v41;
        v43 = v9[22] | ((v9[21] | ((v42 | (v9[19] << 8)) << 8)) << 8);
        v44 = v9[24];
        *(_DWORD *)(v9 - 93) = v43;
        v45 = v9[26] | ((v9[25] | ((v44 | (v9[23] << 8)) << 8)) << 8);
        v46 = v9[28];
        *(_DWORD *)(v9 - 89) = v45;
        v47 = v9[29] | ((v46 | (v9[27] << 8)) << 8);
        v48 = (unsigned __int8 *)v9[30];
        v49 = (unsigned int)v48 | (v47 << 8);
        _CF = 0;
        _OF = 0;
        _ZF = v49 == 0;
        _SF = v49 < 0;
        *(_DWORD *)(v9 - 85) = v49;
        v90 = v48;
        __asm { pushf }
        __asm { pushf }
        n4 = __ROL4__((-__ROL4__(1333729645, 147) - 1309078879) ^ 0xE68E8EAF, 63);
        __asm { popf }
        break;
      case 8:
        __asm { popf }
        sub_140001020("Wrong Flag :(\n"); // printf
        exit(0);
      default:
LABEL_20:
        n8_1 = n8;
        v92 = v11;
        n0xA_1 = n0xA;
        v90 = v9;
        v89 = v15;
        v88 = v16;
        v87 = v17;
        v86 = v18;
        v9 = &v83[9];
        v75 = -559038737;
        v84 = 0;
        v85 = 0;
        v76 = -1163005939;
        v70 = __ROL8__((1283329420LL - __ROL8__(0x1D95F89EC94C687ALL, 108)) ^ 0x497C0EEB, 249);
        v71 = __ROL8__(
                (2008158680LL
               - __ROL8__(
                   (1547021613LL - __ROL8__((1172812282LL - __ROL8__(0xEB645AA5F7B0D2E7uLL, 154)) ^ 0x7912A431, 141))
                 ^ 0x7493360E,
                   96))
              ^ 0x75428303,
                255);
        v72 = __ROL8__((1338722600LL - __ROL8__(0x5E7024965AB4CD3ELL, 181)) ^ 0x56D2A6C6, 136);
        v73 = __ROL8__(0x31AFBF7651170D39LL, 120);
        v74 = __ROL8__(
                (1636449877LL
               - __ROL8__(
                   (1670977784LL - __ROL8__((1166723768LL - __ROL8__(0x3898B033E9195B23LL, 10)) ^ 0x5080DF90, 59))
                 ^ 0x702B0F49,
                   32))
              ^ 0x4CB808E7,
                98);
        memset(v83, 0, sizeof(v83));
        v77 = -17958194;
        v78 = -889275714;
        v79 = -559039810;
        n219540062 = 219540062;
        v81 = -556882451;
        v82 = -1163023331;
        sub_140001020("Enter The Flag:\n"); //printf
        sub_140001080("%s"); // scanf
        si128 = _mm_load_si128((const __m128i *)&xmmword_1400032A0);
        _CF = 0;
        _OF = 0;
        _ZF = 1;
        _SF = 0;
        n0xA = 0;
        v19 = 0;
        v20 = 0;
        __asm { pushf }
        __asm { pushf }
        n4 = __ROL4__(
               (-__ROL4__((-__ROL4__(1504726340, 137) - 244906837) ^ 0x83DFF658, 206) - 496977522) ^ 0xFEFAB4CA,
               207);
        __asm { popf }
        break;
    }
  }
  __asm { popf }
  sub_140001020("Correct Flag :)\n"); //printf
  return 0;
}

Those little sequences are important. They’re the “constant-setting gadgets” the obfuscator uses to hide simple mov instructions. Instead of mov rax, 0x1234..., you’ll see “do some nonsense, preserve flags, rotate, restore flags”, so constant propagation and pattern matchers don’t help you.


What Alcatraz did, plainly

Alcatraz applied control-flow flattening and instruction-level junking:

  • It flattened the original function into a dispatcher that sets a “state” and then jumps somewhere computed. This transforms the readable graph into spaghetti.

  • It replaced direct loads like mov reg, imm with tiny wrapped sequences: mov/movabs + pushf + rol + popf and similar. The register ends up with the same value, but only at the very end of that gadget and only at runtime.

  • It shields edges with indirect or computed branches so IDA can’t reliably recover “next block”, which collapses decompilation quality.

The good news is that stuff still executes. If we give ourselves a controlled little sandbox, we can run just those tiny slices and read the end state, which is the whole point of using Unicorn.


The plan to take our function back

I don’t want to run the whole program. I want micro-emulation. That’s why I use the toolchain is Capstone + Unicorn + Keystone + IDA:

import sys
from unicorn import *
from unicorn.x86_const import *
import capstone
import keystone
import ida_kernwin
import ida_funcs
import idaapi
import ida_nalt


class X86Deflattener:
    def __init__(self, filename, addr_hex, size_hex):
        self.filename = filename
        self.start = int(addr_hex, 16)
        self.size = int(size_hex, 16)
        self.ADDRESS = 0x0
        self.cp = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
        self.cp1 = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
        self.reg_dict = {"invalid": 0, "ah": 1, "al": 2, "ax": 3, "bh": 4, "bl": 5, "bp": 6, "bpl": 7, "bx": 8, "ch": 9, "cl": 10,

                         "cs": 11, "cx": 12, "dh": 13, "di": 14, "dil": 15, "dl": 16, "ds": 17, "dx": 18, "eax": 19, "ebp": 20,

                         "ebx": 21, "ecx": 22, "edi": 23, "edx": 24, "eflags": 25, "eip": 26, "es": 28, "esi": 29, "esp": 30,

                         "fpsw": 31, "fs": 32, "gs": 33, "ip": 34, "rax": 35, "rbp": 36, "rbx": 37, "rcx": 38, "rdi": 39, "rdx": 40,

                         "rip": 41, "rsi": 43, "rsp": 44, "si": 45, "sil": 46, "sp": 47, "spl": 48, "ss": 49, "cr0": 50, "cr1": 51,

                         "cr2": 52, "cr3": 53, "cr4": 54, "cr8": 58, "dr0": 66, "dr1": 67, "dr2": 68, "dr3": 69, "dr4": 70,

                         "dr5": 71, "dr6": 72, "dr7": 73, "fp0": 82, "fp1": 83, "fp2": 84, "fp3": 85, "fp4": 86, "fp5": 87,

                         "fp6": 88, "fp7": 89, "k0": 90, "k1": 91, "k2": 92, "k3": 93, "k4": 94, "k5": 95, "k6": 96, "k7": 97,

                         "mm0": 98, "mm1": 99, "mm2": 100, "mm3": 101, "mm4": 102, "mm5": 103, "mm6": 104, "mm7": 105, "r8": 106,

                         "r9": 107, "r10": 108, "r11": 109, "r12": 110, "r13": 111, "r14": 112, "r15": 113, "st0": 114, "st1": 115,

                         "st2": 116, "st3": 117, "st4": 118, "st5": 119, "st6": 120, "st7": 121, "xmm0": 122, "xmm1": 123,

                         "xmm2": 124, "xmm3": 125, "xmm4": 126, "xmm5": 127, "xmm6": 128, "xmm7": 129, "xmm8": 130, "xmm9": 131,

                         "xmm10": 132, "xmm11": 133, "xmm12": 134, "xmm13": 135, "xmm14": 136, "xmm15": 137, "xmm16": 138,

                         "xmm17": 139, "xmm18": 140, "xmm19": 141, "xmm20": 142, "xmm21": 143, "xmm22": 144, "xmm23": 145,

                         "xmm24": 146, "xmm25": 147, "xmm26": 148, "xmm27": 149, "xmm28": 150, "xmm29": 151, "xmm30": 152,

                         "xmm31": 153, "ymm0": 154, "ymm1": 155, "ymm2": 156, "ymm3": 157, "ymm4": 158, "ymm5": 159, "ymm6": 160,

                         "ymm7": 161, "ymm8": 162, "ymm9": 163, "ymm10": 164, "ymm11": 165, "ymm12": 166, "ymm13": 167, "ymm14": 168,

                         "ymm15": 169, "ymm16": 170, "ymm17": 171, "ymm18": 172, "ymm19": 173, "ymm20": 174, "ymm21": 175,

                         "ymm22": 176, "ymm23": 177, "ymm24": 178, "ymm25": 179, "ymm26": 180, "ymm27": 181, "ymm28": 182,

                         "ymm29": 183, "ymm30": 184, "ymm31": 185, "zmm0": 186, "zmm1": 187, "zmm2": 188, "zmm3": 189, "zmm4": 190,

                         "zmm5": 191, "zmm6": 192, "zmm7": 193, "zmm8": 194, "zmm9": 195, "zmm10": 196, "zmm11": 197, "zmm12": 198,

                         "zmm13": 199, "zmm14": 200, "zmm15": 201, "zmm16": 202, "zmm17": 203, "zmm18": 204, "zmm19": 205,

                         "zmm20": 206, "zmm21": 207, "zmm22": 208, "zmm23": 209, "zmm24": 210, "zmm25": 211, "zmm26": 212,

                         "zmm27": 213, "zmm28": 214, "zmm29": 215, "zmm30": 216, "zmm31": 217, "r8b": 218, "r9b": 219, "r10b": 220,

                         "r11b": 221, "r12b": 222, "r13b": 223, "r14b": 224, "r15b": 225, "r8d": 226, "r9d": 227, "r10d": 228,

                         "r11d": 229, "r12d": 230, "r13d": 231, "r14d": 232, "r15d": 233, "r8w": 234, "r9w": 235, "r10w": 236,

                         "r11w": 237, "r12w": 238, "r13w": 239, "r14w": 240, "r15w": 241, "idtr": 242, "gdtr": 243, "ldtr": 244,

                         "tr": 245, "fpcw": 246, "fptag": 247, "msr": 248, "mxcsr": 249, "fs_base": 250, "gs_base": 251,

                         "flags": 252, "rflags": 253, "fip": 254, "fcs": 255, "fdp": 256, "fds": 257, "fop": 258, "ending": 259}
        with open(self.filename, "rb") as f:
            data = f.read()
        self.code = data[self.start:self.start + self.size]
        self.start_addr = []
        self.end_addr = []
        self.start_dest = {}
        self.dest_dict = {}
        self.mov_start = {}
        self.mov_end = {}
        self.mov_result = {}
        self.tmp_dest = None
        self.tmp_emu_start = None

    def gen_jmp(self, dst_hex, cur):
        if cur == -1:
            cur = 0
        dst = int(dst_hex, 16)
        dstb = (dst - cur + 0xFFFFFFFF - 4) & 0xFFFFFFFF
        a = (dstb >> 24) & 0xff
        b = (dstb >> 16) & 0xff
        c = (dstb >> 8) & 0xff
        d = dstb & 0xFF
        fb = bytes([0xe9, d, c, b, a])
        for i in self.cp1.disasm(fb, 0, len(fb)):
            print(i.mnemonic + " " + i.op_str)
        return [0xe9, d, c, b, a]

  
    @staticmethod
    def _hook_cf(uc, address, size, self_obj):
        for _ in self_obj.cp1.disasm(uc.mem_read(address, size), 0, size):
            print(">>> Tracing instruction at 0x%x, instruction size = 0x%x" % (address, size))
        for i in self_obj.cp1.disasm(uc.mem_read(address, size), 0, size):
            if i.mnemonic == "call":
                uc.reg_write(UC_X86_REG_EIP, address + size)
        if address in self_obj.start_addr and address != 1:
            self_obj.tmp_dest = address
            print("DESTINATION FOUND")
            uc.emu_stop()
  
    @staticmethod
    def _hook_mov(uc, address, size, self_obj):
        print("demov: ins at 0x%x, instruction size = 0x%x , end at 0x%x" % (address, size, self_obj.mov_end[self_obj.tmp_emu_start]))
        if address == self_obj.mov_end[self_obj.tmp_emu_start]:
            op1 = None
            reg_num = None
            if self_obj.mov_start.get(self_obj.tmp_emu_start) is not None:
                op1 = self_obj.mov_start.get(self_obj.tmp_emu_start)
                reg_num = self_obj.reg_dict.get(op1)
            if reg_num is not None:
                print("mov result :" + str(hex(uc.reg_read(reg_num))))
                if self_obj.mov_result.get(self_obj.tmp_emu_start) is not None:
                    self_obj.mov_result[self_obj.tmp_emu_start].extend([uc.reg_read(reg_num), op1])
                else:
                    self_obj.mov_result[self_obj.tmp_emu_start] = [uc.reg_read(reg_num), op1]
            uc.emu_stop()

  
    def scan(self):
        flag1 = False
        prev = ""
        pre_prev = ""
        op = ""
        for i in self.cp.disasm(self.code, 0, len(self.code)):
            print("[0x%x]:%s %s" % (i.address, i.mnemonic, i.op_str))
            if i.mnemonic + " " + i.op_str == "pop rax" and prev == "popf ":
                self.start_addr.append(self.ADDRESS + i.address - 2)
                pre_prev = prev
                prev = i.mnemonic + " " + i.op_str
                flag1 = True
            elif i.mnemonic + " " + i.op_str == "pushf " and prev == "push rax":
                pre_prev = prev
                prev = i.mnemonic + " " + i.op_str
                self.end_addr.append(self.ADDRESS + i.address - 1)
            elif flag1 and i.mnemonic == "jmp":
                self.start_dest[self.ADDRESS + i.address - 3] = i.op_str
                pre_prev = prev
                prev = i.mnemonic + " " + i.op_str
                flag1 = False
            elif i.mnemonic + " " + i.op_str == "pushf " and (prev[0:4] == "mov " or prev[0:6] == "movabs"):
                if prev[0:4] == "mov ":
                    op = prev.split(",")[0][4:]
                    self.mov_start[i.address - 5] = op
                elif prev[0:6] == "movabs":
                    op = prev.split(",")[0][7:]
                    self.mov_start[i.address - 10] = op
                    self.mov_result[i.address - 10] = ["abs"]
                pre_prev = prev
                prev = i.mnemonic + " " + i.op_str
            elif i.mnemonic != "pushf" and prev[0:4] == "popf" and pre_prev[0:3] == "rol":
                assert pre_prev[4:4 + len(op)] == op
                self.mov_end[list(self.mov_start.keys())[-1]] = i.address - 2
                pre_prev = prev
                prev = i.mnemonic + " " + i.op_str
            else:
                pre_prev = prev
                prev = i.mnemonic + " " + i.op_str
  

    def emulate_cf(self):
        try:
            mu = Uc(UC_ARCH_X86, UC_MODE_64)
            mu.mem_map(self.ADDRESS, 10 * 1024 * 1024)
            mu.mem_write(self.ADDRESS, self.code)
            mu.reg_write(UC_X86_REG_RSP, self.ADDRESS + 0x20000)
            for addr in self.end_addr:
                mu.hook_add(UC_HOOK_CODE, X86Deflattener._hook_cf, self, addr, len(self.code) - addr)
                mu.emu_start(addr, self.ADDRESS + len(self.code))
                if self.tmp_dest is not None:
                    self.dest_dict[addr] = self.tmp_dest
                    self.tmp_dest = None
                else:
                    raise RuntimeError("Couldn't find the successor")
        except UcError as e:
            print("ERROR: %s" % e)
  
    def emulate_movs(self):
        try:
            mu1 = Uc(UC_ARCH_X86, UC_MODE_64)
            mu1.mem_map(self.ADDRESS, 10 * 1024 * 1024)
            mu1.mem_write(self.ADDRESS, self.code)
            mu1.reg_write(UC_X86_REG_RSP, self.ADDRESS + 0x20000)
            for addr in self.mov_start:
                self.tmp_emu_start = addr
                mu1.hook_add(UC_HOOK_CODE, X86Deflattener._hook_mov, self, addr, len(self.code) - addr)
                mu1.emu_start(addr, self.ADDRESS + len(self.code))
        except UcError as e:
            print("ERROR: %s" % e)

    def patch(self):
        with open(self.filename, "rb") as f:
            org = bytearray(f.read())
        ks = keystone.Ks(keystone.KS_ARCH_X86, keystone.KS_MODE_64)
        for movstart in self.mov_result:
            end = self.mov_end[movstart]
            r = self.mov_result.get(movstart)
            if len(r) == 3:
                insn = f"movabs {r[2]}, {hex(r[1])}"
            else:
                insn = f"mov {r[1]}, {hex(r[0])}"
            encoding, count = ks.asm(insn.encode("utf-8"))
            encoding.extend([0x90] * (end + 1 - movstart - len(encoding)))
            for i in range(movstart, end + 2):
                if i - movstart < len(encoding):
                    org[self.start + i] = encoding[i - movstart]
                else:
                    org[self.start + i] = 0x90
                print(f"[+] patching {hex(i)}")
        for end in self.dest_dict:
            dst = self.start_dest[self.dest_dict[end]]
            jmp = self.gen_jmp(dst, end)
            jmp.extend([0x90] * 8)
            for i in range(0, 13):
                org[self.start + end + i] = jmp[i]
        out = self.filename + "_deobf.exe"
        with open(out, "wb") as f:
            f.write(bytes(org))
        print("writing patch to:" + out)
  
    def run(self):
        for i in self.cp.disasm(self.code, 0, len(self.code)):
            print("[0x%x]:%s %s" % (i.address, i.mnemonic, i.op_str))
        self.scan()
        try:
            self.emulate_cf()
        except RuntimeError:
            pass
        self.emulate_movs()
        print("[+] >>> Emulation done. Start Patching")
        self.patch()

def ida_get_span():
    ea_str = ida_kernwin.ask_str("0x140009000", 0, "Enter any EA inside the function (hex)")
    if not ea_str:
        print("[x] Cancelled")
        return None
    try:
        ea = int(ea_str, 16)
    except ValueError:
        print("[x] Invalid hex.")
        return None
    func = ida_funcs.get_func(ea)
    if not func:
        print(f"[x] No function at 0x{ea:X}")
        return None
    first_tail_start = None
    it = ida_funcs.func_tail_iterator_t(func)
    ok = it.first()
    while ok:
        t = it.chunk()
        if t.start_ea != func.start_ea:
            if first_tail_start is None or t.start_ea < first_tail_start:
                first_tail_start = t.start_ea
        ok = it.next()
    if first_tail_start:
        contiguous_end = first_tail_start - 1
    else:
        contiguous_end = func.end_ea - 1
    start = func.start_ea
    size = contiguous_end - start + 1
    rva_start = start - idaapi.get_imagebase()
    print("=" * 60)
    print(f"Function Name : {ida_funcs.get_func_name(ea)}")
    print(f"Start EA      : 0x{start:X}")
    print(f"Contiguous End: 0x{contiguous_end:X}  (before first tail)")
    print(f"Contiguous Size: 0x{size:X} ({size} bytes)")
    print("-" * 60)
    print("=" * 60)
    return ida_nalt.get_input_file_path(), hex(rva_start), hex(size)

def main():
    params = ida_get_span()
    if not params:
        return
    filename, addr_hex, size_hex = params
    X86Deflattener(filename, addr_hex, size_hex).run()

if __name__ == "__main__":
    main()
  • Capstone will scan and tag the patterns we care about (dispatcher edges and constant gadgets).

  • Unicorn will execute just enough instructions to resolve what the obfuscator is hiding:

    • “If I start here and run until there, which basic block do I land on next?” → true successor

    • “If I execute this weird mov+pushf+rol+popf gadget until the end, what value is actually in the destination register?” → true constant

  • Keystone will reassemble clean replacements:

    • replace the whole gadget with a single mov or movabs,

    • replace the indirect/dispatcher transfer with a single jmp.

  • IDA gives me the file path, imagebase, and a clean contiguous slice of the function body to patch without accidentally touching a tail chunk. The goal isn’t to recompile the entire binary. It’s to surgically rewrite the function body back into something that looks like the original author wrote it.


Running the de-flattener and what to expect on screen

I loaded the script you saw into IDA’s Python console. The first thing it does is ask me for any address inside the function. I paste an EA from the body, and it computes a contiguous span from the function start up to before the first tail chunk. most Alcatraz’d functions have cold tails, so we don’t smash those while patching the hot block.

This is what it prints (yours will differ in addresses):

Function Name : main_0
Start EA      : 0x140009000
Contiguous End: 0x1400098CD  (before first tail)
Contiguous Size: 0x8CE (2254 bytes)

Now the fun part. The script does three steps:

  1. scan(): with Capstone it walks through the bytes and recognizes the obfuscator’s signatures:

    • finds places where a block ends and does an encoded jump via dispatcher

    • finds places where a mov-gadget starts, and where it ends, and which register it targets

  2. emulate_cf(): for each obfuscated edge it drops into Unicorn, maps memory, sets a fake stack, installs a code hook, and executes from the block-end until the hook detects we just entered a known block-start. That address is the true successor we’ll later hard-patch into a direct jmp.

  3. emulate_movs(): for each gadget it again uses Unicorn, runs from gadget start to gadget end, and at the end reads the destination register to capture the true constant. Some gadgets are movabs (64-bit immediate), those are handled too.

Once both emulations are done, the patch() step assembles two types of fixes with Keystone:

  • replaces each gadget with a single mov or movabs of the exact value we observed, and NOP the leftover bytes.

  • replaces each obfuscated transfer site with a single jmp rel32 to the true successor and pad with NOPs.

It writes a similar file named something like yourbinary_deobf.exe. Finally I re-open that in IDA and regenerate the function graph. It looks like a normal function again:

  • screenshot 3: the graph view of the same function after patching

  • screenshot 4: that same gadget location now shows a clean movabs rax, 0xdead... instead of pushf/rol/popf noise.

This is after we removed the flattening and constant cloaking, and the code is readable again.


Let’s read the real logic now

With the noise gone, IDA’s pseudocode makes sense. The function:

__int64 main_0_0()
{
  __m128i si128; // xmm4
  int n10; // edi
  __int64 n40; // rax
  __m128i v3; // xmm2
  __m128i v4; // xmm3
  __m128i v5; // xmm1
  __m128i v6; // xmm3
  __m128i v7; // xmm3
  __int64 n8; // rbx
  _DWORD *v9; // r9
  int v10; // eax
  int v11; // r8d
  int v12; // ecx
  int v13; // r8d
  _DWORD v15[10]; // [rsp+20h] [rbp-79h] BYREF
  _QWORD v16[5]; // [rsp+48h] [rbp-51h]
  int v17; // [rsp+70h] [rbp-29h]
  int v18; // [rsp+74h] [rbp-25h]
  int v19; // [rsp+78h] [rbp-21h]
  int v20; // [rsp+7Ch] [rbp-1Dh]
  int v21; // [rsp+80h] [rbp-19h]
  int n219540062; // [rsp+84h] [rbp-15h]
  int v23; // [rsp+88h] [rbp-11h]
  int v24; // [rsp+8Ch] [rbp-Dh]
  __int128 v25; // [rsp+90h] [rbp-9h] BYREF
  __int128 v26; // [rsp+A0h] [rbp+7h]
  __int64 v27; // [rsp+B0h] [rbp+17h]
  char v28; // [rsp+B8h] [rbp+1Fh]

  v17 = -559038737;
  v27 = 0;
  v28 = 0;
  v18 = -1163005939;
  v16[0] = 0x2672F0BC4D4B105CLL;
  v16[1] = 0x6066A4EC7505175FLL;
  v16[2] = 0x3431FBEA2D544958LL;
  v16[3] = 0x3931AFBF7651170DLL;
  v16[4] = 0x6B30FCFC27034543LL;
  v25 = 0;
  v19 = -17958194;
  v26 = 0;
  v20 = -889275714;
  v21 = -559039810;
  n219540062 = 219540062;
  v23 = -556882451;
  v24 = -1163023331;
  printf("Enter The Flag:\n");
  scanf("%s", &v25);
  si128 = _mm_load_si128((const __m128i *)&xmmword_7FF76E9C32A0); 
  // .rdata:00007FF76E9C32A0 xmmword_7FF76E9C32A0 xmmword 99000000990000009900000099h
  n10 = 0;
  n40 = 0;
  v3 = 0;
  v4 = 0;
  do
  {
    v3 = _mm_add_epi32(
           v3,
           _mm_xor_si128(
             _mm_unpacklo_epi16(
               _mm_unpacklo_epi8(_mm_cvtsi32_si128(*(_DWORD *)((char *)&v25 + n40)), (__m128i)0LL),
               (__m128i)0LL),
             si128));
    v5 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(*(_DWORD *)((char *)&v25 + n40 + 4)), (__m128i)0LL);
    n40 += 8;
    v4 = _mm_add_epi32(v4, _mm_xor_si128(_mm_unpacklo_epi16(v5, (__m128i)0LL), si128));
  }
  while ( n40 < 40 );
  v6 = _mm_add_epi32(v4, v3);
  v7 = _mm_add_epi32(v6, _mm_srli_si128(v6, 8));
  srand(_mm_cvtsi128_si32(_mm_add_epi32(v7, _mm_srli_si128(v7, 4))));
  v15[0] = BYTE3(v25) | ((BYTE2(v25) | ((BYTE1(v25) | ((unsigned __int8)v25 << 8)) << 8)) << 8);
  v15[1] = BYTE7(v25) | ((BYTE6(v25) | ((BYTE5(v25) | (BYTE4(v25) << 8)) << 8)) << 8);
  v15[2] = BYTE11(v25) | ((BYTE10(v25) | ((BYTE9(v25) | (BYTE8(v25) << 8)) << 8)) << 8);
  v15[3] = HIBYTE(v25) | ((BYTE14(v25) | ((BYTE13(v25) | (BYTE12(v25) << 8)) << 8)) << 8);
  v15[4] = BYTE3(v26) | ((BYTE2(v26) | ((BYTE1(v26) | ((unsigned __int8)v26 << 8)) << 8)) << 8);
  v15[5] = BYTE7(v26) | ((BYTE6(v26) | ((BYTE5(v26) | (BYTE4(v26) << 8)) << 8)) << 8);
  n8 = 0;
  v15[6] = BYTE11(v26) | ((BYTE10(v26) | ((BYTE9(v26) | (BYTE8(v26) << 8)) << 8)) << 8);
  v15[7] = HIBYTE(v26) | ((BYTE14(v26) | ((BYTE13(v26) | (BYTE12(v26) << 8)) << 8)) << 8);
  v15[8] = BYTE3(v27) | ((BYTE2(v27) | ((BYTE1(v27) | ((unsigned __int8)v27 << 8)) << 8)) << 8);
  v15[9] = HIBYTE(v27) | ((BYTE6(v27) | ((BYTE5(v27) | (BYTE4(v27) << 8)) << 8)) << 8);
  do
    *(&v17 + n8++) ^= rand();
  while ( n8 < 8 );
  v9 = v15;
  do
  {
    v10 = v17 ^ v9[1] ^ *v9;
    v11 = v17 ^ v18 ^ v9[1] ^ v10;
    v12 = v19 ^ v11;
    v13 = v21 ^ v19 ^ v20 ^ v10 ^ v19 ^ v11;
    if ( (v23 ^ v24 ^ (unsigned int)v13 | ((unsigned __int64)(v23 ^ v13 ^ v21 ^ n219540062 ^ (unsigned int)v12) << 32)) != v16[(unsigned __int64)(unsigned int)n10 >> 1] )
    {
      printf("Wrong Flag :(\n");
      exit(0);
    }
    n10 += 2;
    v9 += 2;
  }
  while ( n10 < 10 );
  printf("Correct Flag :)\n");
  return 0;
}
  • prints “Enter The Flag:”

  • reads a string into a 40-byte buffer

  • runs an SSE routine that mixes the 40 bytes against a 16-byte constant

  • uses the result to seed srand

  • calls rand() eight times and XORs those numbers into internal 32-bit constants

  • splits the 40-byte input into ten big-endian 32-bit words and checks five 64-bit constants block by block

  • prints “Correct Flag :)” if all five match

I grabbed two close-ups:

  • screenshot 5: the SSE sequence where _mm_load_si128(&xmmword_...) pulls 99 00 00 00 repeated and feeds the srand(...) path

  • screenshot 6: the verification loop that builds v10, v11, v12, v13 and finally compares (high<<32)|low with one of the five 64-bit targets

This is where we stop looking and start thinking. How do we turn this into math we can solve on paper?


The seed and the PRNG: be precise or you’ll chase ghosts

The seed is not hardcoded; it’s computed from the 40 input bytes and that XMM constant. The constant is four lanes of 0x00000099. The code expands bytes to 32-bit lanes, XORs with 0x99, sums across lanes, shifts by 8 bytes, adds again, then shifts by 4 bytes and adds, and finally takes the low 32 bits. For this build, that reduction equals

seed = (sum over j in 0..39 of (flag[j] ^ 0x99)) mod 2^32

If you ever port this write-up script to a different build and nothing matches, the lane reduction is the first thing I would validate in a debugger (break before srand, log the seed for a dummy input).

The PRNG is Windows’ MSVCRT rand():

state = (state * 214013 + 2531011) mod 2^32
return (state >> 16) & 0x7fff

The function calls it exactly eight times, and maps those returns like this:

R17 = v17 ^ r0
R18 = v18 ^ r1
R19 = v19 ^ r2
R20 = v20 ^ r3
R21 = v21 ^ r4
R22 = n219540062 ^ r5   ← yes, this one too
R23 = v23 ^ r6
R24 = v24 ^ r7

Don’t forget R22: is used in the high-dword formula. If you leave it constant, you’ll get “no solution” forever.


The algebra that turns the whole problem into two 32-bit masks

The function splits your 40 bytes into ten big-endian 32-bit words:

A0, B0, A1, B1, A2, B2, A3, B3, A4, B4

For each i from 0 to 4, it computes:

v10 = R17 ^ Ai ^ Bi
v11 = R18 ^ Ai
v12 = R19 ^ v11                 = R19 ^ R18 ^ Ai
v13 = R21 ^ R20 ^ v10 ^ v11     = R21 ^ R20 ^ R17 ^ R18 ^ Bi

low32  = R23 ^ R24 ^ v13
high32 = R23 ^ v13 ^ R21 ^ R22 ^ v12

and checks that (high32 << 32) | low32 equals one of the five 64-bit constants embedded in the binary.

Now for the punchline. If I define

K' = R23 ^ R24 ^ R21 ^ R20 ^ R17 ^ R18
M  = R23 ^ R20 ^ R17 ^ R22 ^ R19

then the two equations collapse to simple, direct expressions for the unknowns:

Bi = L[i] ^ K'
Ai = H[i] ^ M ^ Bi

where L[i] and H[i] are the low/high dwords of the i-th 64-bit target. That means the entire 40-byte flag depends only on those two 32-bit masks K' and M.

That’s the moment the problem stops being “reverse a binary” and becomes “solve two XOR masks under a couple of formatting constraints.”


Pinning down the two masks from the flag format

I know the flag format starts with FlagY{ and ends with }. That pins two things:

  • A0 is "Flag" (4 bytes, big-endian)

  • B0 begins with "Y{"

Because B0 = L[0] ^ K', those first two bytes of B0 constrain the first two bytes of K' immediately. Once K' is fixed, I compute B0, then compute

M = A0 ^ H[0] ^ B0

and that gives me every Ai and Bi for i=0..4:

Bi = L[i] ^ K'
Ai = H[i] ^ M ^ Bi

I reassemble the 40 bytes in order, enforce the ending } if I want, then compute the seed from the bytes (the ^0x99 sum), generate the eight rand() values, build R17..R24, and replay the five 64-bit checks. If they all pass, that’s the exact flag the program expects.

That validation is important: it proves we didn’t make a seed mistake, a lane mistake, or an endian mistake.


A minimal verifier you can paste next to your write-up

This is the smallest possible Python skeleton that matches the program’s math. Fill V16 and the eight 32-bit constants from your binary (the ones you saw in the decompiled snippet).

class MSVCRTRand:
    def __init__(self, seed): self.state = seed & 0xFFFFFFFF
    def rand(self):
        self.state = (self.state * 214013 + 2531011) & 0xFFFFFFFF
        return (self.state >> 16) & 0x7FFF

def seed_from_flag(flag40: bytes) -> int:
    return sum((b ^ 0x99) for b in flag40[:40]) & 0xFFFFFFFF

def verify(flag40: bytes, V16, v17,v18,v19,v20,v21,n219,v23,v24) -> bool:
    rng = MSVCRTRand(seed_from_flag(flag40))
    r = [rng.rand() for _ in range(8)]
    R17 = v17 ^ r[0]; R18 = v18 ^ r[1]; R19 = v19 ^ r[2]; R20 = v20 ^ r[3]
    R21 = v21 ^ r[4]; R22 = n219 ^ r[5]; R23 = v23 ^ r[6]; R24 = v24 ^ r[7]
    words = [int.from_bytes(flag40[i:i+4], 'big') for i in range(0,40,4)]
    for i in range(5):
        A = words[2*i]; B = words[2*i+1]
        v10 = (R17 ^ A ^ B) & 0xFFFFFFFF
        v11 = (R18 ^ A) & 0xFFFFFFFF
        v12 = (R19 ^ v11) & 0xFFFFFFFF
        v13 = (R21 ^ R20 ^ v10 ^ v11) & 0xFFFFFFFF
        low  = (R23 ^ R24 ^ v13) & 0xFFFFFFFF
        high = (R23 ^ v13 ^ R21 ^ R22 ^ v12) & 0xFFFFFFFF
        want = V16[i]
        if ((high << 32) | low) != want:
            return False
    return True

If you want to actually solve for K' and M automatically, constrain B0 to start with "Y{" and loop over the small byte-space left for K' until you get a candidate that verifies. Because we’re not brute-forcing 40 bytes, it’s instant.


Automating the process to get the flag

The following script will automate the process of getting the flag

#!/usr/bin/env python3

from itertools import product


class MSVCRTRand:
    def __init__(self, seed): self.state = seed & 0xFFFFFFFF
    def rand(self):
        self.state = (self.state * 214013 + 2531011) & 0xFFFFFFFF
        return (self.state >> 16) & 0x7FFF
V16 = [
    0x2672F0BC4D4B105C,
    0x6066A4EC7505175F,
    0x3431FBEA2D544958,
    0x3931AFBF7651170D,
    0x6B30FCFC27034543,
]

def u32(x): return x & 0xFFFFFFFF
v17 = u32(-559038737)
v18 = u32(-1163005939)
v19 = u32(-17958194)
v20 = u32(-889275714)
v21 = u32(-559039810)
n219 = u32(219540062)
v23 = u32(-556882451)
v24 = u32(-1163023331)
  
PREFIX = b"FlagY{"
  
def be_u32(b): return int.from_bytes(b, "big")
def be4(x): return (x & 0xFFFFFFFF).to_bytes(4, "big")
L = [x & 0xFFFFFFFF for x in V16]
H = [(x >> 32) & 0xFFFFFFFF for x in V16]
A0 = be_u32(PREFIX[:4])
B0_first2 = PREFIX[4:6]
allowed = []
for pos in range(4):
    s = set(range(256))
    for i in range(5):
        b = L[i].to_bytes(4, "big")[pos]
        s = {k for k in s if 32 <= (b ^ k) < 127}
    allowed.append(sorted(s))
  
cands = []
for kb in product(*allowed):
    Kp = be_u32(bytes(kb))
    B0 = u32(L[0] ^ Kp)
    if be4(B0)[:2] != B0_first2:
        continue
    M = u32(A0 ^ H[0] ^ B0)
    flag = bytearray()
    ok = True
    for i in range(5):
        Bi = u32(L[i] ^ Kp)
        Ai = u32(H[i] ^ M ^ Bi)
        flag += be4(Ai) + be4(Bi)
    if not flag.startswith(PREFIX):
        continue
    if flag[-1] != ord('}'):
        continue
    cands.append(bytes(flag))
  
def seed_from_flag(flag_bytes):
    s = 0
    for b in flag_bytes[:40]:
        s = (s + (b ^ 0x99)) & 0xFFFFFFFF
    return s
  
def verify(flag):
    seed = seed_from_flag(flag)
    rng = MSVCRTRand(seed)
    r = [rng.rand() for _ in range(8)]
    R17 = v17 ^ r[0]
    R18 = v18 ^ r[1]
    R19 = v19 ^ r[2]
    R20 = v20 ^ r[3]
    R21 = v21 ^ r[4]
    R22 = n219 ^ r[5]
    R23 = v23 ^ r[6]
    R24 = v24 ^ r[7]
    words = [int.from_bytes(flag[i:i+4], "big") for i in range(0, 40, 4)]
    idx = 0
    for j in range(5):
        A = words[idx]; B = words[idx+1]; idx += 2
        v10 = u32(R17 ^ B ^ A)
        v11 = u32(R18 ^ A)
        v12 = u32(R19 ^ v11)
        v13 = u32(R21 ^ R20 ^ v10 ^ v11)
        low  = u32(R23 ^ R24 ^ v13)
        high = u32(R23 ^ v13 ^ R21 ^ R22 ^ v12)
        if ((high << 32) | low) != V16[j]:
            return None
    return seed

solutions = []
for f in cands:
    sd = verify(f)
    if sd is not None:
        solutions.append((f.decode('ascii', 'ignore'), sd))

if not solutions:
    print("no solution")
else:
    for flag, seed in solutions:
        print(flag)
        print(f"seed={seed} (0x{seed:08X})")

FLAG: FlagY{ab8624a5fa40359d8fb595baf3af88334}

SEED: seed=8248 (0x00002038)


Before deobfuscation vs after deobfuscation: how the code feels different

Before: the function feels like an opaque system. You scroll up and down, but no single line means what it says. Jumps don’t go to where they point, movs don’t hold what they load until a dozen instructions later, and the graph is a swirl. You can’t even trust what IDA believes.

After: the function reads like ordinary code. A movabs is a movabs. A jmp goes where it literally says it goes. IDA can propagate constants, decompile properly, and you are back in home territory. This is where you can do the real reversing: the math above becomes obvious in the pseudocode once your eyes aren’t fighting the obfuscator.

  • Before – flattened CFG

  • After – clean CFG


Pitfalls I hit and how I kept myself honest

  • MSVCRT rand() vs glibc rand(): they are not the same. Use the Windows one. I always drop three test seeds in a scratch script (seed=1, seed=0, seed=0x12345678) and compare output to known vectors to avoid embarrassment.

  • The sixth mask (R22 = n219540062 ^ r5) has to be in the high-dword expression. Early on I left n219540062 as a constant and nothing matched.

  • Big-endian word building from the input bytes. The code literally uses BYTE3/BYTE2/... pattern. I stick to int.from_bytes(..., 'big') and .to_bytes(..., 'big') consistently to avoid accidental LE/BE flips.

  • The XMM lane reduction. In this binary it’s “sum all bytes with ^0x99”. If I ever see a mismatch with a different build, I’ll test the 8 easy variants: pick byte index 0..3 within each 4-byte group, with or without the ^0x99. One of those is what the compiler emitted.


Conclusion

This is how I like to structure a CTF reversing write-up so it’s useful to the next person: show the problem, show what the obfuscator actually does to you, build a simple tool that gives you your function back, and then do the normal reverse engineering you’d do on clean code. If you swap Alcatraz for a different flattener, the workflow barely changes: scan → micro-emulate → patch → read clean code → solve.

References:

Alcatraz-Deobfuscator https://github.com/weak1337/Alcatraz http://keowu.re/posts/Analyzing-Mutation-Coded-VM-Protect-and-Alcatraz-English/

Last updated