
To be a bug hunter with Binary Ninja in IoT

Binary Ninja is an easy-to-use binary analysis platform with a rich API that helps security researchers automate their analysis.


Recently I have been doing security research on various IoT devices, mainly routers, NAS, NVRs, IP cameras, and similar products. I usually start by downloading a product's firmware from the Internet and unpacking it with binwalk. Most of these devices run embedded Linux, so once I have the rootfs I statically analyze the ELF programs inside it.
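As a minimal sketch of the step that follows unpacking, the snippet below walks an extracted rootfs and lists the regular files that begin with the ELF magic bytes. The helper name find_elf_files is hypothetical, and the directory passed to it is assumed to be wherever the rootfs was extracted.

```python
import os

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def find_elf_files(rootfs):
    """Return sorted paths (relative to rootfs) of regular files with the ELF magic."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(rootfs):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks: links inside a rootfs often point outside the extraction dir.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    if fh.read(4) == ELF_MAGIC:
                        found.append(os.path.relpath(path, rootfs))
            except OSError:
                # Unreadable files are ignored rather than aborting the scan.
                continue
    return sorted(found)
```

The resulting list can then be narrowed by hand to the vendor's own daemons and CGI programs before opening them in the analysis tool.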

Based on experience, I focus on the code the manufacturer wrote to implement specific protocols and services, such as HTTP, CGI, UPnP, netatalk, and SSL VPN. Below I list the prototypes of the dangerous functions that I audit during static analysis. Although these functions should be banned in modern software development, for historical reasons many IoT devices still ship this legacy code, which gives attackers an opportunity to exploit it.

[Figure: list of the dangerous function prototypes audited during static analysis]

With the targets sorted out, vulnerability hunting becomes repetitive work. The most time-consuming task is auditing the context of every reference to each function and checking:

  1. Whether the call is unsafe (in particular, whether a buffer overflow is possible)
  2. Whether attacker-controlled input can reach this path

If both conditions hold, the call is likely an exploitable vulnerability. However, in a complex ELF these high-risk functions are called from a very large number of places, and auditing them all by hand becomes laborious. For example, vendor developers like to call system directly in CGI programs to perform tasks such as rebooting the device or checking for updates. Most of the arguments passed to system are outside the attacker's control. How can we quickly find the calls whose arguments the attacker can control, and which therefore allow command injection?

This is a static analysis problem. I could run a relatively complete whole-program analysis over the control flow and data flow, but that requires enormous computing power, and the analysis time is likely to grow exponentially. As a vulnerability hunter, I care more about finding exploitable vulnerabilities than about having the computer find bugs fully automatically, so I can accept false positives from automated analysis. As long as it is more convenient than pressing the X key in IDA and scanning every cross-reference by hand, it counts as automated analysis. Reducing the false positive rate is a problem to consider once the tool exists.

After some investigation, I tried angr, IDAPython, and Ghidra, but the results were not satisfactory (perhaps I was not familiar enough with these tools). In the end, I used Binary Ninja to build a simple automated analysis tool:

import os
import sys
from binaryninja import *
from enum import Enum, auto

class Error(Enum):
    FORMAT_UNCONSTANT = auto()
    FORMAT_OVERFLOW = auto()
    STACKOVERFLOW = auto()
    COMMANDINJECT = auto()

def check_system(bv, symbol="system"):
    # Report every call to `symbol` whose first argument is not a constant.
    addr = get_function_addr(bv, symbol)
    if addr is None:
        return []
    refs = bv.get_code_refs(addr)
    ret = []
    for ref in refs:
        func = ref.function
        cmd = func.get_parameter_at(ref.address, None, 0)
        if is_constant(cmd):
            continue
        ret.append((symbol, func.name, ref.address, Error.COMMANDINJECT))
    return ret

def get_function_addr(bv, symbol):
    # Resolve the imported symbol, also trying the underscore-prefixed name.
    syms = []
    if symbol in bv.symbols:
        syms = bv.symbols[symbol]
    if "_%s" % symbol in bv.symbols:
        syms = bv.symbols["_%s" % symbol]
    for i in syms:
        if bv.arch.name in ("mips32", "mipsel32"):
            # On MIPS the import is matched by its ImportAddressSymbol.
            if i.type == SymbolType.ImportAddressSymbol:
                return i.address
        else:
            if i.type == SymbolType.ImportedFunctionSymbol:
                return i.address
    return None

def is_constant(a):
    # True if the value is a known constant or a pointer to constant data.
    return a.type == RegisterValueType.ConstantPointerValue or a.type == RegisterValueType.ConstantValue

if __name__ == "__main__":
    input_file = sys.argv[1]
    # Reuse a saved analysis database if one exists; otherwise analyze and save it.
    if os.path.exists(input_file + ".bndb"):
        bv = open_view(input_file + ".bndb")
    else:
        bv = open_view(input_file)
        settings = SaveSettings()
        bv.create_database(input_file + ".bndb", None, settings)
    ret = check_system(bv)
    for i in ret:
        print("%-15s function: %-20s addr: 0x%x %s" % (i[0], i[1], i[2], i[3]))

The main logic of the code above is in the check_system function: it keeps only the calls to system whose argument is not a constant, and the script prints them. This greatly reduces the number of calls to system that I need to review. Similarly, Binary Ninja's API can be used to check whether a call to sprintf risks writing past the end of its buffer.

def check_sprintf(bv, symbol="sprintf"):
    # Flag sprintf calls with a non-constant format string, or with a %s
    # conversion whose argument is not a constant.
    addr = get_function_addr(bv, symbol)
    if addr is None:
        return []
    refs = bv.get_code_refs(addr)
    ret = []
    for ref in refs:
        func = ref.function
        fmt = func.get_parameter_at(ref.address, None, 1)
        if not is_constant(fmt):
            ret.append((symbol, func.name, ref.address, Error.FORMAT_UNCONSTANT))
            continue
        asc = bv.get_ascii_string_at(fmt.value, min_length=2)
        if asc is None:
            continue
        fmt_value = asc.value
        cidx = 0
        arg_idx = 1  # index of the format string; variadic arguments start at 2
        while True:
            idx = fmt_value.find("%", cidx)
            if idx < 0:
                break
            if fmt_value[idx:].startswith("%%"):
                # "%%" is a literal percent sign and consumes no argument.
                cidx = idx + 2
                continue
            arg_idx += 1
            cidx = idx + 1
            if fmt_value[idx:].startswith("%s"):
                arg = func.get_parameter_at(ref.address, None, arg_idx)
                if not is_constant(arg):
                    ret.append((symbol, func.name, ref.address, Error.FORMAT_OVERFLOW))
    return ret
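The format-string scan in check_sprintf only recognizes a bare "%s". As a hedged sketch of a slightly more thorough scan, the helper below uses a regular expression to walk printf-style conversions and returns the argument indices of %s conversions that carry no precision, since a precision such as "%.32s" bounds the number of bytes copied while a width such as "%-20s" does not. The name unbounded_string_args and the first_vararg parameter are hypothetical, and the pattern covers only the common conversions rather than every libc extension.

```python
import re

# One printf conversion: %[flags][width][.precision][length]conversion
_CONVERSION = re.compile(
    r"%([-+ #0]*)(\*|\d+)?(?:\.(\*|\d+))?(hh|h|ll|l|L|q|j|z|t)?([diouxXeEfFgGaAcspn%])"
)


def unbounded_string_args(fmt, first_vararg=2):
    """Return argument indices of %s conversions that have no precision bound.

    first_vararg is the index of the first variadic argument:
    2 for sprintf(dst, fmt, ...), 3 for snprintf(dst, size, fmt, ...).
    """
    indices = []
    arg_idx = first_vararg
    for match in _CONVERSION.finditer(fmt):
        _flags, width, precision, _length, conv = match.groups()
        if conv == "%":
            continue  # "%%" is a literal percent sign and consumes no argument
        if width == "*":
            arg_idx += 1  # a "*" width is read from an extra int argument
        if precision == "*":
            arg_idx += 1  # a "*" precision is read from an extra int argument
        if conv == "s" and precision is None:
            indices.append(arg_idx)
        arg_idx += 1
    return indices
```

The returned indices could be passed to func.get_parameter_at in place of the hand-maintained arg_idx counter, so that only unbounded string arguments are checked for constness.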

The false positive rate of the checks above is still fairly high. Based on experience, I often add extra constraints, even though they are not strictly correct:

  • strcpy, strcat
    Require the dst parameter to point to stack memory (a stack overflow is more likely)

  • system
    Require a call to snprintf or sprintf alongside the call to system (command injection is more likely)

  • sscanf
    Require that sscanf is called without a call to fopen (developers often use sscanf to parse the contents of configuration files)
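The three constraints above can be sketched as a filter over already-collected call sites. The Finding record and its fields are hypothetical stand-ins for whatever the checkers collect, not Binary Ninja types, and "alongside" is interpreted here as "in the same calling function", which is one assumption among several possible readings.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One flagged call site (hypothetical record, not a Binary Ninja type)."""
    callee: str                              # dangerous function being called
    caller: str                              # function that contains the call
    address: int                             # address of the call instruction
    dst_on_stack: bool = False               # strcpy/strcat only: dst is a stack buffer
    caller_callees: frozenset = frozenset()  # names of all functions the caller calls


def keep(finding):
    """Apply the heuristic constraints; True means the finding is worth auditing."""
    if finding.callee in ("strcpy", "strcat"):
        # Keep only copies into stack buffers.
        return finding.dst_on_stack
    if finding.callee == "system":
        # Keep only callers that also format a string, which suggests a built command.
        return bool(finding.caller_callees & {"sprintf", "snprintf"})
    if finding.callee == "sscanf":
        # Drop callers that open a file, which suggests configuration parsing.
        return "fopen" not in finding.caller_callees
    return True
```

Because these rules trade recall for a shorter list, findings rejected by keep are best written to a separate log rather than discarded.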

With these constraints, the output of the automated analysis is generally useful as a reference for manual static analysis, but batch firmware analysis needs further optimization. I am currently researching how to use Binary Ninja's intermediate language for a more complete data flow analysis that further reduces the false positive rate. I will post any updates here. Interested readers are welcome to contact me.
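For running several checkers in one pass, one simple arrangement is a driver that groups the findings by calling function, so that each function is audited once. The sketch below assumes checkers shaped like check_system and check_sprintf (each takes bv and returns (symbol, caller, address, error) tuples); run_checks is a hypothetical helper and not part of Binary Ninja's API.

```python
from collections import defaultdict


def run_checks(bv, checks):
    """Run every checker on bv and group the findings by calling function."""
    grouped = defaultdict(list)
    for check in checks:
        for callee, caller, address, error in check(bv):
            grouped[caller].append((address, callee, error))
    # Callers are ordered by name and each caller's findings by address,
    # so the report follows the disassembly within a function.
    return {
        caller: sorted(items, key=lambda item: item[0])
        for caller, items in sorted(grouped.items())
    }
```

A call such as run_checks(bv, [check_system, check_sprintf]) would then yield one entry per calling function, which can be printed or stored per firmware image.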

References

  1. https://www.zerodayinitiative.com/blog/2022/2/14/static-taint-analysis-using-binary-ninja-a-case-study-of-mysql-cluster-vulnerabilities
  2. https://blog.trailofbits.com/2017/01/31/breaking-down-binary-ninjas-low-level-il/
  3. https://blog.trailofbits.com/2018/04/04/vulnerability-modeling-with-binary-ninja/