How deep of analysis can a SOC analyst actually provide?

Jared Anderson
How accurate of a story can an analyst present without having everything in front of them? (In this case, everything being a completely reverse engineered piece of malware along with supporting network traffic.)

What inspired this blog post was a talk at DerbyCon 2016 by Brandon Young. In his talk, “Reverse engineering all the malware…and why you should stop.”, Brandon mentions that during analysis, the majority of the time reverse engineering is frankly not needed. In many cases, somebody else has already done the work or the process is automated. Malware authors are lazy. More often than not, a new variation of malware contains a lot of the same code, with some additional features added. Maybe it’s packed or obfuscated in a different way. However, does any of that really matter? When it comes down to it, do we actually care about a few code changes or a different packing method? Well, maybe, but I would argue that indicators of compromise (IOCs) are the main thing that analysts really care about. After all, our main job is to determine if the host is infected or compromised. But, can we go further than that? Can we provide a deeper analysis?

For me, analyzing network traffic for known IOCs and determining compromised or not compromised is not satisfying enough. I want to be able to build a story around what happened. If the machine does get infected/compromised, can I identify the malware family? Beyond that, rather than just saying “the machine is infected with insert malware name here,” can I give information on capabilities, how it works, etc.?

The issue here is that a typical security analyst usually lacks the capability to reverse engineer a piece of malware. I currently fall into that category. If the initial attack vector is present and visible in the network traffic, I can tell where the malware came from, if it was successfully downloaded, and any outbound traffic generated (assuming I have IOCs or manage to catch it myself). But what happens when there are no conclusive signatures and I can’t determine what the malware is that I’m looking at? By automating the reverse engineering process and effectively using the tools in my toolbox, I might be able to determine this without the help of my reverse engineers. This allows me to provide deeper analysis and saves crucial time for the reverse engineers on my team.

No reverse engineering? How can I possibly determine this information?

Analysts have sandboxing tools like Cuckoo/Malwr/Hybrid-Analysis, file carvers/extractors, open source platforms like VirusTotal/Threatcrowd and the most magical tool of all: Google. These tools all aid our analysis, but again, the importance of automating the reverse engineering process cannot be overstated. Be it writing additional signatures for Cuckoo, writing a decoder to grab C2s from a piece of malware (which I’ve done with Angler EK), creating file carving/extracting software, incorporating open source intel into tools, etc., automation of reverse engineering is critical to this process.

Below is an example of some recent analysis I did that shows how an analyst can dive deeper using the aforementioned tools to automate/bypass the reverse engineering process.

Please keep in mind that none of this analysis is new or groundbreaking. This post is meant to show that by automating the reverse engineering process and providing the right tools to analysts, we (analysts) can provide much deeper and more accurate analysis without the need for reverse engineering.

Analysis

The following file was an email attachment caught by several H1N1 YARA signatures:

Filename: copy_84563754.doc 
SHA256: 5a092be487f8c0ef6f44f002698a133317f0760fc4ff96370bf06e09d41a4925

If you toss the above hash into VirusTotal (VT), you’ll find it has not yet been submitted. No surprise there, as these attachments are likely dynamically generated, so each one will have a different hash value. So, no cheating this time; we’ll have to do some analysis on this one. To Cuckoo we go!
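As a rough illustration of the hash lookup step, here is a minimal Python sketch that computes a file's SHA256 and checks it against a set of hashes you have already seen. The function names and the idea of a local "known hashes" set are assumptions for illustration, not part of any tool mentioned in this post.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the lowercase hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large samples do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_hash(path, known_hashes):
    """True if the file's SHA256 appears in a set of already-seen hashes.

    `known_hashes` is a hypothetical local collection (for example, hashes
    exported from earlier cases); it is not a VirusTotal lookup.
    """
    return sha256_of_file(path) in {h.lower() for h in known_hashes}
```

Because each generated attachment hashes differently, a miss here says very little, which is why the sample goes to the sandbox next.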

After leaving the doc alone in the sandbox to play for a few minutes, Cuckoo lets me know what has been going on:

Figure 1: Looks like our doc attachment was misbehaving and needs a timeout.

Figure 2: Uh oh, looks like the VBA (macros) drops out an exe!

Let’s find out a little bit more about our new friend jj814.exe.

Filename: jj814.exe 
SHA256: 0d03fade2b60a0581d688e2631be65f77e40f4353b43d9e49bb751d780f6678b

It currently is scoring a 36/57 on VT, so it is almost certainly malicious. That being said, what is it? Vendors on VT identify this as a wide variety of things from Cerber (ransomware) to Sality. Oh look, Cuckoo is smart and identifies it as H1N1.

Cuckoo also lets us know about some HTTP requests that were made. We have two GETs to two separate domains for an exe and a DLL, as well as a POST that follows the pattern of /h/gate.php (an indicator of H1N1). We now have three domains (IOCs) that we can look for in network traffic. Please see Figure 5 for domain names.
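To show how that request pattern could be turned into something reusable, here is a small Python sketch that tags POSTs whose path matches /h/gate.php and collects the distinct hosts contacted. The (method, url) input shape, the function names and the placeholder domains in the usage note are assumptions; a real sandbox report would need its own parsing.

```python
from urllib.parse import urlsplit

# Path patterns treated as family indicators. "/h/gate.php" comes from the
# analysis above; extend the mapping with patterns from your own intel.
GATE_PATTERNS = {"/h/gate.php": "H1N1"}


def tag_requests(requests):
    """Given (method, url) pairs, return (host, family) for matching POSTs."""
    hits = []
    for method, url in requests:
        parts = urlsplit(url)
        family = GATE_PATTERNS.get(parts.path.lower())
        if family and method.upper() == "POST":
            hits.append((parts.hostname, family))
    return hits


def unique_hosts(requests):
    """Collect every distinct host contacted, in first-seen order."""
    seen = []
    for _, url in requests:
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen
```

For example, with made-up domains, tag_requests([("POST", "http://c.example/h/gate.php")]) flags c.example as H1N1, and unique_hosts gives you the domain list to hunt for in network traffic.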

Typically, H1N1 downloads a Pony DLL and a Vawtrak executable. Let’s wget them and see if we can verify that.

Filename: inst.exe 
SHA256: fedf01e29e30dfdc56aa2acf706b51c43759e9077f4816f56484418e6c0c0538
Filename: pm.dll 
SHA256: ba796b422cb9c2ad35d009167137a4e669468648d78ffd60386b45ab7a8326f1

If you submit both of these hashes to VT, you will see that vendors now have a high detection rate for both of these files (Vawtrak + Pony). Here we have a couple choices: 1. We’ve determined the malware families. We’re done! 2. We’ve determined the malware families! Could we spend a little more time and see if we find out anything else? (Hint: I’m choosing option two!)

Again, keep in mind this is only an example. While this analysis is certainly not groundbreaking, it does show that with the right tools, a SOC analyst is able to provide fairly deep analysis when it comes to malware, even without a reverse engineer.

File Carving

Let’s put these files through our file carving framework to get some additional information about each file.

Initially, inst.exe appears to be legitimate software from MAGIX AG. The first thing I noticed was the creation date of 2014-04-21, which made me think it could be an old, vulnerable application. In fact, given that the other downloaded file is a DLL, my working theory was that a sideloading vulnerability would allow pm.dll to be executed and evade detection. While some of the imports from kernel32.dll and user32.dll were suspicious, there was nothing explicitly screaming malicious.

However, Cuckoo mentioned that this file likely contains encrypted or compressed data, and is compressed using UPX (commonly used by malware). Also, why would H1N1 download a legitimate executable? Although I cannot prove it, several signs are pointing to this being malware. Given what we’ve previously seen from H1N1, I’m willing to bet it is Vawtrak.
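One common way to flag "encrypted or compressed data" is byte entropy, and UPX-packed files usually carry recognizable markers. The Python sketch below combines the two. It is not how Cuckoo implements its check; the 7.0 threshold and the function names are assumptions, and a hit is only a hint, since plenty of legitimate software is packed too.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data, threshold=7.0):
    """Heuristic: high entropy or UPX markers suggest a packed file.

    The threshold is an assumed cut-off for illustration. "UPX0" is a
    section name UPX typically writes and "UPX!" is its magic string.
    """
    has_upx_marker = b"UPX0" in data or b"UPX!" in data
    return has_upx_marker or shannon_entropy(data) >= threshold
```

Calling looks_packed(open("inst.exe", "rb").read()) would be the typical usage against a sample on disk.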

Figure 3: This exe at first appears to be an old but legitimate program created by MAGIX AG. But is it really...

pm.dll looks a bit more interesting and reveals more useful information. The creation date was 2016-09-27 (I got it on 2016-09-28), meaning it was compiled the night before, which is a bit suspicious. It imports wsock32.dll, wininet.dll and urlmon.dll, giving it networking capabilities, as well as several other suspicious imports.
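The creation date a file carver reports for a PE file typically comes from the TimeDateStamp field in the COFF header. Here is a minimal sketch of reading it with only the Python standard library; the function name is an assumption and this is not the file carving framework used above. Keep in mind that the field is set at link time and can be forged, so treat it as supporting evidence only.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data):
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew (offset of the PE header) is a little-endian uint32 at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The COFF header follows the 4-byte signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then the 4-byte TimeDateStamp.
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

Typical usage would be pe_compile_time(open("pm.dll", "rb").read()), then comparing the result with the date the sample arrived.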

 Figure 4: pm.dll wsock32.dll import gives pm.dll networking capabilities. It also has several other suspicious imports (not pictured).

At this point, we can surely determine that this is the H1N1 downloader, which downloads both Vawtrak and Pony. In fact, a simple Google search shows that the Vawtrak and Pony filenames haven’t changed for the past month! Also, all of our assumptions are backed up by the reverse engineering + analysis by Josh Reynolds at Cisco. However, even though we have this information, let’s continue on and pretend it wasn’t available. After all, this is meant to show the process that an analyst can follow!

Let’s check out the strings in this pm.dll to see if we can grab any other useful information out of it: 

strings pm.dll > stringspmdll.txt

Voila! We get three C2s (same domains, different paths: now /zapoy/gate.php, a known indicator of Pony). We see various Windows functions, as well as names and paths of FTP programs, stored browser data, email account data, etc. We also see a hardcoded POST request, indicating the data will be sent out in a binary-encoded POST.
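This step is easy to automate. The sketch below is a rough Python stand-in for the strings command (printable ASCII runs of at least four characters) followed by a filter for URLs ending in /gate.php. The function names and the URL regex are assumptions, and it only works when the C2s are stored in plain text, as they were here.

```python
import re

# Loose URL matcher for illustration; it stops at whitespace and quotes.
URL = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_strings(data, min_len=4):
    """Return printable ASCII runs of at least `min_len` bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def find_gate_urls(strings, suffix="/gate.php"):
    """Return distinct URLs ending in `suffix`, in first-seen order."""
    urls = []
    for text in strings:
        for url in URL.findall(text):
            if url.lower().endswith(suffix) and url not in urls:
                urls.append(url)
    return urls
```

Running find_gate_urls(extract_strings(open("pm.dll", "rb").read())) would return the candidate C2 URLs, ready to feed into whatever tooling tracks IOCs.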

 Figure 5: Hard coded C2s that follow patterns of Pony, imports, etc.

Figure 6: Hard coded FTP Client and Web Browser names and paths

Figure 7: Hardcoded HTTP POST, likely the POST pattern for data exfiltration

The Result

So, without doing any reverse engineering and using the tools we have available we can tell a detailed, robust story with moderate to high confidence:

High Confidence: Email comes in with an attachment with macros that produces a PE. The PE makes three HTTP requests: a GET for an exe, a GET for a DLL and a POST to domain.com/h/gate.php (known indicator of H1N1 downloader).

Moderate Confidence: The downloaded exe (inst.exe) is Vawtrak. It is identified as Vawtrak by several vendors, it has imports similar to Vawtrak’s and it is packed using UPX (commonly seen with malware including Vawtrak). Also, H1N1 is known for downloading both Pony (DLL) and Vawtrak (EXE), so this matches known intel.

High Confidence: The downloaded DLL (pm.dll) exhibits malicious characteristics; antivirus vendors identify it as related to Pony. The DLL was compiled yesterday evening (2016-09-27) and has host enumeration and networking capability, reinforcing its attribution as malicious.

High Confidence: pm.dll is Pony. Running strings on the dll reveals domains that match patterns of Pony malware. It also reveals file paths for popular FTP clients, web browsers and email clients, presumably for stealing credentials and private information. Additionally, we see a hard-coded HTTP POST that reveals exfil patterns and encoding, etc.

Great, so what’s the point?

Just as Brandon Young mentioned in his talk at DerbyCon, let’s not reinvent the wheel. If the malware in question is something new and interesting or isn’t caught by existing signatures, then it may be worth tearing apart. If the reverse engineering work has already been done (be it in-house or by someone else, for example Josh Reynolds’ H1N1 write-up), don’t waste your time repeating it. (I know, I know. I broke my own cardinal rule by redoing existing work in this writeup. Again, I just wanted to show the analytical process!) Finally, automate the reverse engineering process. Create a decoder to pull C2s out of a piece of malware, create new sandbox signatures to detect a piece of malware or to defeat anti-VM measures, incorporate open source intel/signatures/decoders into automated tools and make these tools available and accessible to analysts.

The biggest takeaway here is that by automating the reverse engineering process, analysts are empowered to provide deeper, more detailed analysis while also saving crucial reverse engineering time.

Am I saying that we can eventually phase out reverse engineers? Of course not! We need reverse engineers tearing apart new malware, writing new signatures and tools to aid us with our analysis. But by automating the process and providing analysts with these tools/capabilities, we allow more time for reverse engineers to work on more important things rather than reinventing the wheel. And that crucial time that they’ll get back means more tools, more signatures, more capability for us. It’s a winning formula for all involved.

 

Note: This was originally posted on MuziSec.com.