Title Software based Remote Attestation: measuring integrity of user applications and kernels Authors

Program analysis and code obfuscation

Download 135.21 Kb.

Page	2/3
Date	18.10.2016
Size	135.21 Kb.
	#1441

1 2 3

2.6 Program analysis and code obfuscation

Program Analysis requires disassembly of code and the control flow graph (CFG) generation. The linux tool ‘objdump’ is one of the simplest linear sweep tools that perform disassembly. It moves through the entire code once, disassembling each instruction as and when encountered. This method suffers from a weakness that it misinterprets data embedded inside instructions hence carefully constructed branch statements induce errors [20]. Linear sweep is also susceptible to insertion of dummy instructions and self modifying code. Recursive Traversal involves decoding executable code at the target of a branch before analyzing the next executable code in the current location. This technique can also be defeated by opaque predicates [21]where one target of a branch contains complex instructions which never execute [22].

CFG generation involves identifying blocks of code such that they have one entry point and only one branch instruction with target addresses. Once blocks are identified, branch targets are identified to create a CFG. Compiler optimization techniques such as executing instructions in the delay slot of a branch cause issues to the CGF and require iterative procedures to generate an accurate CFG. The execution time of these algorithms is non-linear (n²) [23].

2.7 Kernel integrity measurement schemes

An attacker can compromise any measurements taken by a user level program by installing a kernel level rootkit. The kernel provides file system, memory management and system calls for user applications. The remote attestation scheme as implemented in this work requires kernel support. This section describes prior work done in implementing kernel integrity measurement.

Co-processor schemes that are installed on the PCI slot of the PC have been used to measure the integrity of the kernel as mentioned in section 2.1. One scheme [13] computes the integrity of the kernel at installation time and stores this value for future comparisons. The core of the system lies in a co-processor (SecCore) that performs integrity measurement of a kernel module during system boot. The kernel interrupt service routine (SecISR) performs integrity checks on a kernel checker and a user application checker. The kernel checker proceeds with attesting the entire kernel .TEXT section and modules. The system determines that during installation for the machine used for building the prototype, the .TEXT section began at virtual address 0xC0100000 which corresponded to the physical address 0x00100000, and begin measurements at this address.
Another work focuses on developing a framework for classifying rootkits [24]. The authors state that there are three classes of rootkits, those that modify system call table, those that modify targets of system calls, and those that redirect references to the system call table by redirecting to a different location. A kernel level rootkit may perform these actions by using /dev/kmem device file, an example of such a rootkit is the knark rootkit [25]. The rootkit detector keeps a copy of the original System.map file and compares the current system call table’s addresses with the original values. A difference between the two tables indicates system call table modification. This system of detecting changes to system call table detected the presence of knark rootkit that modifies 8 system calls. The framework also detects rootkits like SucKIT [26] which overwrite kernel memory to create a fake system call table. Any user access to the system calls re directs to the new table. The rootkit checker determines if the current system call table starts at a location different that the original address, in which case a compromise is detected.
LKIM [27] obtains hashes and contextual measurements to determine the integrity of the platform. In addition to taking hash measurements on kernel Text section, system call table, LKIM also takes measurements on some other descriptors such as inodes, executable file format handlers, Linux security model hooks and so on. The measurements taken are defined by a set of measurement instructions. The paper states that there is no silver bullet to prevent the Linux OS from forging results, hence propose a hypervisor based scheme instead of a native OS scheme. The hypervisor scheme involves changing Xen’s domain U to host the LKIM infrastructure. The domain hosting LKIM is provided Domain 0 privileges.

Threat model and Assumptions

We assume that Mallory an attacker has complete control over software residing on Alice’s machine and Mallory possesses the power to start a clean copy of Alice’s installed program P to execute it in a controlled environment to return results to Trent. Mallory can also attempt to re-direct the challenge to another machine which runs a clean copy of P. We assume that Mallory will not perform transient attacks like patching P with malicious code at any given time t and then at any time t + ∆ replace the old instructions back and remove any modifications. This behavior can be classified as rootkit like behavior which will not be determined by the application level remote attestation. However, a rootkit like this would get detected in the kernel level remote attestation as described in section 7.
We assume that Alice will trust the code provided by Trent and allow it to execute on the machine to be verified, and that Alice has means such as certificates and digital signatures to verify that the verification code (C) has been generated by Trent. We also assume that Alice is not running M_Alice behind a NAT and that the machine has only one network interface. The reason to make these assumptions is that C takes measurements on M_Alice to determine if it is the same machine that contacted Trent. If M_Alice is behind a NAT then Trent would see the request coming from a router and measurements from M_Alice. This work focuses on the general client platform where only one network interface is installed, and each network interface has only one IP address associated with it. In the case that there are many addresses configured on the same network interface, the code can be altered to populate all possible IP addresses that it reads from the interface and send them to Trent. Trent can parse through the result to find the matching IP address.
For the user application attestation part, this work does not assume a compromised kernel. The verification code C relies on the kernel to handle the system calls executed through interrupts, and to read the file structure containing the open connections on the system. There are many system call routines in the Linux kernel and monitoring and duplicating the results of each of these may be a difficult task for malware. Reading the port file structure also requires support from the operating system. We will assume that the OS provides correct results when the contents of a directory and file are read out. Without this assumption, Remote Attestation cannot be performed entirely without kernel support.
For the kernel attestation part, we assume that the kernel is compromised; system call tables may be corrupted, and a malware may have changed the interrupt descriptors. Runtime code injection is performed on a kernel module to measure the integrity of the kernel. It is assumed that Alice has means such as digital certificates to determine that the code being injected is generated by a trusted server. It is also assumed that the trusted server is the OS vendor or a corporate network administrator who has knowledge of the OS mappings for the client.

Overview of operations to be performed on Client end

If Alice could download the entire copy of P every time the program had to be executed then Remote Attestation would not be required. However, since P is an installed application, Alice must have customized certain profile options, saved some data which will be cumbersome to create ever time.
Alice uses P to contact Trent for a service, Trent returns to P: a challenge which is executable code (C). P must inject C in its virtual memory and execute it at a location specified by Trent. C computes certain measurements and communicates integrity measurement value M₁ directly to Trent. This process is depicted in Fig. 1. Trent has a local copy of P on which the same sets of tests are executed as above to produce a value M₀. Trent compares M₁ and M₀; if the two values are the same then Alice is informed that P has not been tampered. This raises the issue of verifiable code execution, in which Trent wants to be certain that C took its measurements on P residing inside M_Alice. To provide this guarantee C executes some more tests on M_Alice and returns their results to Trent. These checks ensure that C was not bounced to another machine, and that it was not executed in a sandbox environment inside a dummy P process within M_Alice.
There are many ways in which Mallory may tamper with the execution of C. Mallory may substitute values of M₁ being sent to Trent such that there is no evidence of any modification to P having taken place. It is also possible that Mallory may have loaded another copy of P which has not been tampered inside a sandbox, execute C within it, and provide the results back to Trent. Mallory may have also redirected the challenge to another machine on the network making it compute and send the responses back to Trent. Without addressing these issues, it is not possible for Trent to correctly determine whether the measurements accurately reflect the state of P on M_Alice. If Trent can determine that C executed on M_Alice, and C was not executed in a sandbox then Trent can produce code whose results are difficult to guess and the results can indicate the correct state of P. Achieving these guarantees require that C provides Trent with a machine identifier and a process identifier.
Trent can retain a sense of certainty that the results are genuine by producing code that makes it difficult for Mallory to pre-compute results. Once these factors are satisfied, Trent can determine whether P on M_Alice has been tampered. The entire process of Remote Attestation is shown in Fig. 2.
4.1 Determining checksum and MD5 on P

C computes a MD5 hash of P to determine if the code section has been tampered. Downloading MD5 code is an expensive operation as the code size is fairly large, and MD5 code cannot be randomized as it may lose its properties. Due to these reasons, the MD5 code permanently resides on P. To prevent Mallory from exploiting this aspect, a two phase hash protocol is implemented. Trent places a mathematical checksum inside C which computes the checksum on the region of P containing the MD5 executable code along with some other selected regions. Trent computes the results of the checksum locally and verifies if C is returning the expected value. C proceeds with the rest of the protocol if Trent responds in affirmative.
Trent changes the operations of the checksum in every instance so that Mallory cannot use prior knowledge to predict the results of the mathematical operations. C does not take the checksums over fixed sized regions; instead Trent divides the entire area over which checksum is taken into multiple overlapping sub-regions, the boundaries of the sub-regions are defined inside C by Trent by moving the data pointer back by a random number that is generated during compilation of the C source code. For the prototype implementation, the method used to generate the random numbers was the ‘rand’ call, since rand call may not me truly random, we used the ‘srand’ call and used the current stack pointer of the source code generating program as the seed to the random number. The stack of all processes is randomized using Address Space Layout Randomization (ASLR) [28]. It can be noted that this is not as secure as using a cryptographically secure random number generator. In real world applications, Trent can use the Linux ‘/dev/random’ file [29] to read random numbers.
The individual checksums are then combined and sent to Trent. This is depicted in Fig. 3. C performs MD5 hash on overlapping sub-regions of P defined in a similar fashion as above. A degree of obfuscation is added by following the procedure in Fig. 4. C initially takes the MD5 hash of the first sub-region (H₁). It then obtains the MD5 hash of the next sub-region (H₂). It then concatenates the two values to produce H₁H₂. Then a MD5 Hash of H₁H₂ is taken to produce H₁₂. H₁₂ is then concatenated with H₃ to produce H₁₂H₃. H₁₂H₃is hashed again to produce H₂₃ and so on. This process is followed for all the sub-regions and sent to Trent.
Drawing inferences from executable code is considered difficult as discussed in section 2. Randomizing the boundary overlaps between the sub-regions makes it difficult to predict the hash values being generated. Mallory has to execute the code to observe the computation being performed. The checksums are taken on overlapping sub regions to make the prediction of results more difficult for Mallory. This creates multiple levels of indeterminacy for an attack to take place. Mallory has to not only predict the boundaries of the sub-regions, but has to also deal with the overlap among the sub-regions. Overlapping checksums also ensures that if by accident the sub-regions are defined identically in two different versions of C, the results of computation produced by C are still different. This also ensures that some random sections of P are present more than once in the checksum, making it more difficult for Mallory to hide any modifications to such regions.
MD5 checksum has been used in this prototype, it has been discovered that it has collisions. However, MD5 can be substituted easily with a different hashing algorithm in a software based attestation scheme, the same cannot be done easily in a TPM or hardware based attestation scheme.
4.2 Determining process identifiers.

C determines whether it was executed inside a fake process or the correct P process by obtaining some identifiers. C determines the number of processes having an open connection to Trent on M_Alice. This is obtained by determining the remote address and remote port combinations on each of the port descriptors in the system. C communicates to Trent using the descriptor provided by P and does not create a new connection. This implies that in an ideal situation there must be only one such descriptor on the entire system, and the process utilizing it must be the process under which C is executing. The passing of socket descriptor from P to C also addresses the issue of redirection of challenge to another machine partially. The only way for such a connection to exist on a machine is if Trent accepts the incoming request, otherwise the machine will not have a socket descriptor with the ability to communicate with Trent.
If there is more than one process having such a connection then an error message is sent to Trent. If there is only one such process, C computes its own process id and compares the two values. If they match an affirmative message is sent to Trent. If the values do not match then it reports an error with an appropriate message to Trent.
4.3 Determining the Identifier for M_Alice

C has to provide Trent the guarantee that it was not re-directed to another machine and that it was not executed in a sandbox environment or pasted on another clean copy of P within M_Alice. The first is achieved by obtaining any particular unique machine identifier. In this case the IP address of the machine can serve as an identifier. Trent has received a request from Alice and has access to the IP address of M_Alice. If C returns the IP address of the machine it is executing on Trent can determine if both are the same machine or not. It can be argued that IP addresses are dynamic however there is little possibility that any machine will change its IP address in the small time window between a request by Alice to measurements being taken and provided to Trent. C determines the IP address of M_Alice using System Interrupts. Mallory will also find it hard to tamper with the results of an Interrupt. The interrupt ensures that the address present on the Network interface is correctly reported to Trent. It can again be noted that Mallory may have changed the address of the network interface to match that of M_Alice, but as these machines are not behind a NAT it would be quite difficult for Mallory to provide the identical address to another machine on an external network and communicate with that machine. On receiving the results of the four tests, Trent knows that P has not been tampered from the time of installation to the time of request of verification being sent from M_Alice.

Design of Checksum code produced by Trent

Trent has to prevent Mallory from analyzing the operations performed by C. Trent places a series of obfuscations inside the generated code along with a time threshold (T) by which the response from M_Alice is expected. If C does not respond back in a stipulated period of time (allowing for network delays), Trent will know that something went wrong at M_Alice. This includes denial of service based attacks where Trent will inform Alice that C is not communicating back.
Fig. 5 shows a sample snippet of the C mathematical checksum code. The send function used in the checksum snippet is implemented using inline ASM. It is evident that in order to forge any results, Mallory must determine the value of checksum2 being returned to Trent. This requires that Mallory identifies all the instructions modifying checksum2 and the locations on stack that it uses for computation. To prevent Mallory from analyzing the injected code, certain obfuscations are placed in C as discussed below:

Changing execution flow and locations of variables on stack

To prevent Mallory from utilizing knowledge about a previous instance of Cin the current test, Trent changes the checksum operations performed by selecting mathematical operations on memory blocks from a pool of possible operations and also changes the order of the instructions. The results of these operations are stored temporarily in the stack. Trent changes the pointers on the stack for all the local variables inside C for every instance. These steps prevent Mallory from successfully launching an attack similar to those used for HD-DVD key stealing [30, 31].

Inserting Dummy Instructions

Program Analysis is a non linear operation as discussed in section 2. An increase in the number of instructions that Mallory has to analyze decreases the time window available to forge the results of these operations. Trent inserts instructions that never execute and also inserts operations that are performed on M_Alicebut not included as part of the results sent back to Trent. These additions to the code make it difficult for Mallory to correctly analyze C within a reasonable period of time.

Changing instructions during execution

Mallory may perform static analysis on the executable code C sent by Trent. A good disassembler can provide significant information on the instructions being executed, and allow Mallory to determine when system calls are made and when function calls are made. In addition it may also allow Mallory to see the area of code which reads memory recursively. If these tools do not have access to the code to be executed before it actually executes, then Mallory cannot determine the operations performed by C. Trent removes some instructions in C while sending the code to M_Alice and places code inside C with data offsets such that during execution, this section in C changes the modified instructions to the correct values. This way without executing C it is difficult for Mallory to determine the exact contents of C.

Implementation of user application attestation

In this section the implementation of the techniques proposed in this paper are described. All the coding was done using the C language on Intel x86 architecture machines on Linux kernel using the gcc compiler.

Generation of C by Trent

Trent generates C for every instance of verification request. If Trent sent out the same copy of the verification code, then Mallory can gain significant knowledge on the individual checks performed by C, by generating new code for every instance of verification Trent mitigates this possibility. Trent also places obfuscations inside the code to prevent static analysis of the executable code. The operations performed by Trent to obfuscate the operations performed during verification are discussed below.

Changing execution flow and locations of variables on stack

Changing execution flow and locations of stack serves to prevent the program analysis on C. The source code of C was divided into four blocks which are independent of each other. Trent assigns randomly generated sequence numbers to the four blocks and places them accordingly inside C source code.
The checksum block is randomized by creating a pool of mathematical operations that can be performed on every memory location and selecting from the pool of operations. The pool of operations is created by replacing the mathematical operation with other mathematical operation on the exact same location.
Once the mathematical operations are selected in the C source code, Trent changes the sub-regions for the checksum code and the MD5 calling procedure. This is done by replacing the numbers defining the sub-regions. Chas sub–regions defined in its un-compiled code. To randomize the sub-regions, a pre-processor is executed on the un-compiled C such that it changes the numbers defining the sub-regions. The numbers are generated such that the sub-regions overlap by a random value.
C allocates space on the local stack to store computational values. Instead of utilizing fixed locations on the stack, Trent replaces all variables inside C with pointers to locations on the stack. To allocate space on the stack Trent declares a large array of type ‘char’ of size N, which has enough space to hold contents of all the other variables simultaneously. Trent executes a pre-processor which assigns locations to the pointers. The pre-processor maintains a counter which starts at 0 and ends at N-1. It randomly picks a pointer to be assigned a location and assigns it to the value on the counter and increments the counter using the size of the corresponding variable in question. This continues until all the pointers are assigned a location on the stack. Trent compiles C source code to produce the executable after placing these obfuscations.

Download 135.21 Kb.

Share with your friends:

1 2 3