We have intercepted an encrypted message with critical information, and also managed to recover the machine that is able to decrypt it, with a copy of the source program it should run to decrypt the message. The crazy scientist that built this machine was accidentally killed during the extraction. It’s a very elaborate mechanical machine with tons of pipes and valves but we managed to reverse-engineer its logic and build a simulation out of it, but now we need to convert the source of the program into something that the machine is able to understand and execute! The encrypted message is already loaded into the simulation.

Category: hardware

Solver: rgw, nh1729

Flag: HTB{f1rm_15_b3tw33n_h4rd_4nd_50ft}

Writeup

Circuit Analyis

In this challenge, we get four files: A Logisim circuit file cpu.circ, an example assembly file example.asm with associated binary example.data as well as an assembly file program.asm. This is what example.asm looks like:

:start
        movl ax, 10
:sub1
        movl bx, 1
        sub  bx
        cmp  ax,  ax
        movl bx,  :sub1
        jnz
        rst

This is what example.data looks like:

v2.0 raw
10000a 100101 10100 130000 100101 e0000 40000

When we open the circuit file in Logisim Evolution ([1]), we see a CPU circuit which is divided into subcomponents:

cpu_screenshot.png

The CPU contains some RAM, registers and a terminal device. The task is now to assemble program.asm into the binary instructions that the CPU reads from RAM. There is a “network device” with some read-only memory. This memory will be decrypted by the assembled program.

The first step was to figure out the instruction format. Upon inspecting the RAM, we found a word size of 21 bits. When looking at the instruction register (IR) and operand registers (RA, RB), we noticed that the opcode with is 5 bit and each operand has a width of 8 bit:

instruction_format_screenshot.png

Therefore, we can calculate instruction = (ir<<16) | (ra<<8) | rb.

Next, we inspected the assembly code. There, we saw that commands can take integer constants or one of four general-purpose-registers ax, bx, cx or dx. By looking at the circuit, we could associate each instruction with an opcode. These were either visible in the components themselves or via the wiring to a multiplexer:

instruction_codes_screenshot_1.png

instruction_codes_screenshot_2.png

Since the components were all labeled, we could extract a complete opcode table from the circuit file. The next step was to write an assembler that converts our input program into the respective opcodes.

Writing an assembler

We decided to write our assembler in python. To parse the asm files, we first defined dictionaries with all required symbols:

OPCODES = {
	'add'	: 0b00000,
	'sub'	: 0b00001,
	'mul'	: 0b00010,
	'clr'	: 0b00011,
	'rst'	: 0b00100,
	'jmp'	: 0b00101,
	'ljmp'	: 0b00110,
	'jlp'	: 0b00111,
	'jg'	: 0b01000,
	'jge'	: 0b01001,
	'jl'	: 0b01010,
	'jle'	: 0b01011,
	'je'	: 0b01100,
	'jz'	: 0b01101,
	'jnz'	: 0b01110,
	'mov'	: 0b01111,
	'movl'	: 0b10000,
	'call'	: 0b10001,
	'ret'	: 0b10010,
	'cmp'	: 0b10011,
	'push'	: 0b10100,
	'pop'	: 0b10101,
	'div'	: 0b10110,
	'mmiv'	: 0b10111,
	'mmov'	: 0b11000,
	'msk'	: 0b11010,
	'mskb'	: 0b11011,
}

REGISTERS = {
	'ax': 0,
	'bx': 1,
	'cx': 2,
	'dx': 3
}

Then, we build our main assembler function. For assembling, we have to keep track of all labels in the code. We decided to collect all labels using a virtual instruction pointer before starting to build the output so labels could be referenced globally in the assembler code.

def compile(s):
	lines = s.split('\n')
	output = []
	labels = {}
	
	# scan labels
	ip = 0
	for line in lines:
		if line.strip().startswith(':'):
			labels[line] = ip
		else:
			ip += 1

In the main loop over all instructions, we skipped labels and empty lines, and then combined the up to two operands together with the opcode into the full instruction.

for line in lines:
	line = line.strip()

	if len(line) == 0:
		continue

	tokens = [l for l in line.split() if len(l) > 0]

	if tokens[0].startswith(':'):
		continue

	opcode = OPCODES[tokens[0]]
	ra = 0
	rb = 0

	if len(tokens) > 1:
		ra = resolve_symbol(tokens[1].replace(',',''),labels)
	if len(tokens) > 2:
		rb = resolve_symbol(tokens[2].replace(',',''),labels)

	instruction = (opcode << 16) | (ra << 8) | rb

	output.append(instruction)

To resolve registers, constants and symbols as operands correctly, we extracted a simple function for that:

def resolve_symbol(s,labels):
	if s.startswith(':'):
		if '+' in s:
			l,o = s.split('+') 
			return labels[l] + int(o,0)
		return labels[s]
	if s in REGISTERS:
		return REGISTERS[s]
	return int(s,0)

As a side note, in line 53 in the program.asm file, a label is referenced without a colon.

movl bx,  sub5

Instead of writing more robust code, we fixed this by hand in the asm file. Finally, we only had to call our function and print the output in a format that can be pasted into the RAM component in logisim.

f = open('program.asm')
s = f.read()
f.close()

for a in compile(s):
	print(hex(a)[2:], end=' ')
print()

This produced the final machine code.

10020a 30000 100101 100300 170003 10012e 110100 180000 100137 110100 100101 100300 140300 100137 110100 100101 10100 130000 10010a e0000 10012e 110100 100100 170101 170201 150000 100101 100300 100137 110100 180101 1a0000 170101 180102 1b0000 170201 1000ff 130100 100119 a0000 100100 180301 100201 100200 100116 50100 100005 100137 110100 100101 10100 130000 10012f e0000 120000 100204 100200 120000

After inserting the code into the virtual processor, we started the simulation. First, we thought we made a mistake as the processor ran without output on the display element. Inspecting the memory, we saw the process indeed writing bytes in the range beginning at 0xfe000000. After one minute and twenty seconds on maximum simulation frequency, the flag was printed slowly but surely onto the display:

flag_display_screenshot.png

In total, it took more than 3 minutes to print the flag.

Solver

OPCODES = {
	'add'	: 0b00000,
	'sub'	: 0b00001,
	'mul'	: 0b00010,
	'clr'	: 0b00011,
	'rst'	: 0b00100,
	'jmp'	: 0b00101,
	'ljmp'	: 0b00110,
	'jlp'	: 0b00111,
	'jg'	: 0b01000,
	'jge'	: 0b01001,
	'jl'	: 0b01010,
	'jle'	: 0b01011,
	'je'	: 0b01100,
	'jz'	: 0b01101,
	'jnz'	: 0b01110,
	'mov'	: 0b01111,
	'movl'	: 0b10000,
	'call'	: 0b10001,
	'ret'	: 0b10010,
	'cmp'	: 0b10011,
	'push'	: 0b10100,
	'pop'	: 0b10101,
	'div'	: 0b10110,
	'mmiv'	: 0b10111,
	'mmov'	: 0b11000,
	'msk'	: 0b11010,
	'mskb'	: 0b11011,
}

REGISTERS = {
	'ax': 0,
	'bx': 1,
	'cx': 2,
	'dx': 3
}

def compile(s):
	lines = s.split('\n')
	output = []
	labels = {}
	
	# scan labels
	ip = 0
	for line in lines:
		if line.strip().startswith(':'):
			labels[line] = ip
		else:
			ip += 1
	
	for line in lines:
		line = line.strip()
		
		if len(line) == 0:
			continue
		
		tokens = [l for l in line.split() if len(l) > 0]
		
		if tokens[0].startswith(':'):
			continue
			
		opcode = OPCODES[tokens[0]]
		ra = 0
		rb = 0
		
		if len(tokens) > 1:
			ra = resolve_symbol(tokens[1].replace(',',''),labels)
		if len(tokens) > 2:
			rb = resolve_symbol(tokens[2].replace(',',''),labels)
			
		instruction = (opcode << 16) | (ra << 8) | rb
		
		output.append(instruction)
	return output

def resolve_symbol(s,labels):
	if s.startswith(':'):
		if '+' in s:
			l,o = s.split('+') 
			return labels[l] + int(o,0)
		return labels[s]
	if s in REGISTERS:
		return REGISTERS[s]
	return int(s,0)
	

f = open('program.asm')
s = f.read()
f.close()

for a in compile(s):
	print(hex(a)[2:], end=' ')
print()

Other resources

[1] https://github.com/logisim-evolution/logisim-evolution