Tracking & Killing Bot Networks

In a previous blog I discussed how one of the more enjoyable parts of my day-to-day malware rituals is the tracking and killing of command and control bot networks. Recently I have begun automating this process a bit. I have created a series of scripts that extract IRC servers, port numbers and channels from malware as it comes in and then check whether each IRC server is still online. A custom bot then logs into the server, queries the active channels and determines how many zombies are active on the network. If an IRC server is found to be online with zombies actively connected, the server is reported to the abuse address listed in the whois information for the server’s IP address.
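The first two steps of that pipeline (pulling connection details out of a script and checking whether the server still answers) can be sketched roughly as follows. This is only an illustration and not the actual scripts: the regular expressions assume hypothetical variable names such as server, port and channels, and real bot agents name these settings in many different ways.

```python
import re
import socket

# Hypothetical patterns: real bot scripts vary widely in how they name these settings.
SERVER_RE = re.compile(r"""server\w*\s*=\s*['"]([A-Za-z0-9.-]+)['"]""", re.I)
PORT_RE = re.compile(r"""port\w*\s*=\s*['"]?(\d{1,5})['"]?""", re.I)
CHANNEL_RE = re.compile(r"""['"](#[^\s'",]+)['"]""")


def extract_irc_targets(script_text):
    """Return (servers, ports, channels) found in the text of a malware script."""
    servers = sorted(set(SERVER_RE.findall(script_text)))
    ports = sorted({int(p) for p in PORT_RE.findall(script_text)})
    channels = sorted(set(CHANNEL_RE.findall(script_text)))
    return servers, ports, channels


def is_online(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Made-up sample in the style of a Perl bot configuration block.
sample = """
my $server = 'irc.example.org';
my $port = '6667';
my @channels = ("#example");
"""

servers, ports, channels = extract_irc_targets(sample)
print(servers, ports, channels)
```

A real run would call is_online() for each extracted server/port pair before handing the live ones to the bot that joins the channels and counts connected clients.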

The automation of this process is something I have had on my todo list for a little while, but I finally stopped procrastinating and got it done. The real advantage of it being automated is that I can easily generate a tangible set of information that shows how many bot networks are present in the malware I process daily, weekly and monthly, how many of those networks are still active and, more importantly, how many of those networks have active zombies still connected. Likewise, as I’ve discussed previously, I am working on a threat portal, and having the IRC C&C data processing automated will make it easier to put that information on the threat portal and integrate it into the aggregate threat feed that the portal will offer for route/firewall/DNSBL drops.

Here are some statistics on IRC command and control networks as seen in the malware processed by me in the last 30 days:
Total Processed Malware (30d): 607
Total IRC C&C Servers: 251
Total Online IRC C&C Servers (as of 08/17/10): 118
Total Online IRC C&C Servers with Active Zombie Hosts: 30
Total Zombies Observed on Online IRC C&C Servers: 1,679 (about 56 average per server with active zombies)
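For anyone who wants the ratios behind these totals, the short calculation below derives them from the figures listed above. The function and key names are just illustrative; the per-server average is taken over the 30 servers that had active zombies and works out to just under 56.

```python
def summarize(total_servers, online, with_zombies, zombies):
    """Derive simple ratios from the 30-day IRC C&C totals."""
    return {
        "offline": total_servers - online,
        "online_rate": online / total_servers,
        "zombie_host_rate": with_zombies / online,
        "zombies_per_active_server": zombies / with_zombies,
    }


stats = summarize(total_servers=251, online=118, with_zombies=30, zombies=1679)

print(f"offline servers: {stats['offline']}")                                  # 133
print(f"still online: {stats['online_rate']:.1%}")                             # 47.0%
print(f"online servers with zombies: {stats['zombie_host_rate']:.1%}")         # 25.4%
print(f"zombies per active server: {stats['zombies_per_active_server']:.1f}")  # 56.0
```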

There are some notable observations. Out of the total of 251 noted IRC C&C servers, only 118 are still online. Of those 118, 64 utilize free DNS naming services and/or dynamic DNS services, while the other 54 create C&C channels on established public IRC networks or use the DNS name of compromised hosts running an IRC server. Almost every one of the 133 now inactive IRC servers used IP addresses within the host malware script; a small minority used DNS names of compromised hosts.

It goes without saying that public DNS services / dynamic DNS services give attackers the flexibility to quickly recover a C&C server and its participating zombies in the event of the host server being shut down. Further, a number of the more mature IRC C&C bots will keep attempting to reconnect periodically when disconnected from the host C&C server, which increases the chance of the attacker fully recovering the zombie network.

PHP is also becoming increasingly common as a language of choice for C&C bot agents, though Perl agents are still vastly more popular. The LMD project has currently classified 44 unique C&C bot agents comprising 286 agent scripts/binaries: 14 classes (38 scripts) are PHP based, 21 classes (213 scripts) are Perl based and 9 classes (35 scripts/binaries) are Other (C/Ruby/Java).

Currently an average of 6 bot networks are abuse reported per day, and of those only about 2-3 per day ever receive any form of followup and/or shutdown of the host running the network. That is a rate of less than 50% on average, which is abysmal to say the least. When the threat management portal goes up in the coming weeks, these networks will find themselves at the top of the threat feed and planted squarely on the front page of the portal. We might not be able to shut them down but we sure can filter them off our networks.

Understanding Signatures

The signature naming scheme for LMD is a little confusing and something I’ve received more than a few questions about, especially about what the *.unclassed signatures mean. The naming scheme (to me) is straightforward and breaks down as follows:

{SIG_FORMAT}lang/vector.type.name.ID#

The ‘SIG_FORMAT’ is either HEX or MD5, reflecting the internal format of the signature; ‘lang/vector’ is the language or attack vector of the malware; ‘type’ is a short descriptive field for what the malware does (e.g. ircbot, mailer, injection etc.); ‘name’ is a short descriptive name unique to the piece of malware; and ‘ID#’ is the internal signature ID number.
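To make the breakdown concrete, here is a small sketch that splits a signature name into those five fields. It is not part of LMD; the Signature tuple, the parse_signature function and the regular expression are illustrative, and the pattern assumes the format, vector and type fields never contain a dot.

```python
import re
from typing import NamedTuple


class Signature(NamedTuple):
    sig_format: str  # HEX or MD5
    vector: str      # language or attack vector
    sig_type: str    # what the malware does (ircbot, mailer, ...)
    name: str        # short name, or "unclassed"
    sig_id: int      # internal signature ID number


# {SIG_FORMAT}lang/vector.type.name.ID#
SIG_RE = re.compile(r"^\{(HEX|MD5)\}([^.]+)\.([^.]+)\.(.+)\.(\d+)$")


def parse_signature(sig):
    """Split a signature name into its fields, or raise ValueError."""
    match = SIG_RE.match(sig)
    if match is None:
        raise ValueError(f"unrecognized signature name: {sig!r}")
    sig_format, vector, sig_type, name, sig_id = match.groups()
    return Signature(sig_format, vector, sig_type, name, int(sig_id))


parsed = parse_signature("{HEX}base64.inject.unclassed.7")
print(parsed)
print(parsed.name == "unclassed")  # True
```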

What some people appear confused about is signatures such as ‘{HEX}base64.inject.unclassed.7’ that use the term “unclassed” for the name field. Essentially, unclassed signatures represent a group of malware files that are not individually distinguished from each other but that follow the same attack vector, such as base64 encoded scripts. There are hundreds of these scripts and in encoded form it doesn’t really matter what they do; we are detecting the encoded format, not the decoded content, so they get lumped together. In other instances, I will throw some malware into an unclassed group when it is very new and I have not yet had time to process it into its own classification. For example, web.malware.unclassed is a dumping ground for a lot of newly submitted malware that I have reviewed and confirmed IS MALWARE but have not yet classified or determined to be a variant of an existing malware classification.

It needs to be understood that the processing of malware is mostly a manual task. Though some elements of it are automated, the actual review of each malware file is done by hand to remove the chance of false positives, keeping LMD accurate and reliable. As such, not all malware makes it into a classification group right away; the important part is that malware is reviewed, verified and has signatures generated for it in a timely fashion. I process malware daily from the network edge IPS system at work, from user submitted files and from various malware news groups / web sites, and the priority is getting the signatures up for in-the-wild threats. The signature name/classification serves informative purposes; yes, it is important, but not as important as the actual verification and signature generation.

ATF v2: Weighted Threats

When I first introduced you all to the Aggregate Threat Feed back in May, it was a much smaller feed with very simple ambitions: pulling together threat data at work from our network edge and host based firewalls and aggregating the data into a usable feed. The intention was that as attackers expose themselves more on the network through invasive scans and attacks, they quickly climb up the threat feed and end up banned proactively. Though this did and still does happen in a way, a problem was introduced when more and more data started to come in from the network edge and it quickly outweighed data from the hosts.

The old way the threat feed was sorted was by number of events. For the network edge IPS, the events correlated with actual signature events on the network edge, so these could number from 50 events for an SNMP community scan to thousands of events for an SSH scan. Then you have the host based firewall events (mostly brute force attacks). These are correlated into the feed by the occurrence of an attacker’s address across unique servers, so if 1.2.1.2 made a brute force attempt against 11 servers it would show in the feed as 11 events.

The problem that developed here is that the network edge IPS is far noisier on an exposure level than the host based firewalls. You would end up with hundreds of IPs from the network edge with thousands of events each, while the host based firewalls, even though they also represent hundreds of attacking IPs, had FAR lower event counts because those counts reflect only the unique servers each IP attacked. This meant that often the top 50 or 100 items in the threat feed were all IPS events. Though those were quite valid events, the host based events had more threat significance than some of the IPS events. The host events were simply being washed out of the top 100 on the list by the sheer volume of IPS events (who really wants to import 300 addresses from a threat feed, let alone even 100?).

So, what I decided on doing was adding a weighted field into the database that is based on unique targets for each attacking IP. This weighted field is the new sort method for the feed and it works something like this: if the IPS picks up an attacker hammering five servers with an SQL injection exploit, that attacking IP ends up in the threat feed with a weight of 5; if we then have an attacker that runs brute force attacks on 30 servers, that attacking IP ends up in the threat feed with a weight of 30. The end result is that the threat feed is populated with the highest weighted attackers at the top, so those attackers who are more aggressive across unique targets quickly end up at the top of the list. This allows the feed to better protect the devices/hosts it is being used on from a developing attack before the attacker reaches that device/host on the network.
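The difference between the two sort orders can be shown with a small sketch. It is not the feed's actual database logic: the function names, the in-memory event list and the tie-break on event count are all assumptions, and the addresses come from the reserved documentation ranges.

```python
from collections import defaultdict


def summarize_attackers(events):
    """events: iterable of (attacker_ip, target) pairs, one per logged event."""
    event_counts = defaultdict(int)
    targets = defaultdict(set)
    for attacker, target in events:
        event_counts[attacker] += 1
        targets[attacker].add(target)
    # weight = number of unique targets the attacker touched
    return [
        {"ip": ip, "events": event_counts[ip], "weight": len(targets[ip])}
        for ip in event_counts
    ]


def sort_by_events(rows):
    """Old ordering: noisiest attacker first."""
    return sorted(rows, key=lambda r: r["events"], reverse=True)


def sort_by_weight(rows):
    """New ordering: most unique targets first, event count as a tie-break."""
    return sorted(rows, key=lambda r: (r["weight"], r["events"]), reverse=True)


# One noisy attacker hammering 5 servers, one quieter attacker touching 30.
events = [("198.51.100.7", f"web{n % 5}") for n in range(5000)]
events += [("203.0.113.9", f"host{n}") for n in range(30)]

rows = summarize_attackers(events)
print(sort_by_events(rows)[0]["ip"])  # 198.51.100.7
print(sort_by_weight(rows)[0]["ip"])  # 203.0.113.9
```

Sorting by raw events rewards whoever trips the most signatures; sorting by weight rewards breadth, which is what matters when the goal is to block an attacker before they reach the next host.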

Drop Format:
http://asonoc.com/api/atf.php?top=50

List Format (fields: IP | SERVICE | EVENTS | WEIGHT):
http://asonoc.com/api/atf.php?top=50&fmt=list
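
For anyone consuming the list format from a script, a parser along these lines could be a starting point. It assumes each row is a single line with the four fields separated by a pipe character, which is a guess based on the field listing above and not a documented format; the sample rows are made up and use reserved documentation addresses.

```python
def parse_list_feed(text, min_weight=0):
    """Parse 'IP | SERVICE | EVENTS | WEIGHT' rows, keeping weight >= min_weight."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4:
            continue  # blank or malformed line
        ip, service, events, weight = parts
        if not (events.isdigit() and weight.isdigit()):
            continue  # header row or non-numeric counts
        if int(weight) >= min_weight:
            rows.append(
                {"ip": ip, "service": service,
                 "events": int(events), "weight": int(weight)}
            )
    return rows


# Made-up sample data in the assumed layout.
sample_feed = """\
IP | SERVICE | EVENTS | WEIGHT
203.0.113.9 | sshd | 30 | 30
198.51.100.7 | http | 5000 | 5
"""

for row in parse_list_feed(sample_feed, min_weight=10):
    print(row["ip"], row["weight"])  # 203.0.113.9 30
```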

Signature Updates: Month In Review

Since I will be busy this coming week with other priorities, I am posting an early month in review blog on signature updates.

In the last 3 weeks we have not seen a whole lot of action on in-the-wild malware; most of what is propagating at the moment is variants of already detected content. That is not to say, however, that no new signatures have been extracted. A lot of this month’s signatures have come from account level compromises on vulnerable e107, WordPress and Joomla installations, along with user submissions. There are not a whole lot of ground breaking malware threats; it is more of the usual, such as mass mailers, perl/php command shells, IRC bots and php socket flooding tools.

In total, in the 3 weeks ending Sat July 24th, there have been 128 new signatures in 54 classifications, with 65 signatures added in the last 7 days. This brings us to a total of 2,588 (1,002 MD5 / 1,586 HEX) signatures, an increase of 117 signatures over the last blog post on signature updates. For those paying attention, there is a discrepancy of -11 signatures between the 128 new signatures and the +117 change since the last update; this is because 11 signatures have also been removed for poor performance/false positives.

As always, new signatures are automatically updated daily or can be manually updated with the -u|--update command line options. The 128 new signatures fall into the following classification groups:

base64.inject.unclassed    exp.linux.unclassed
perl.cmdshell.n0va         perl.ircbot.Arabhack
perl.ircbot.BaMbY          perl.ircbot.devil
perl.ircbot.fx29           perl.ircbot.genol
perl.ircbot.karawan        perl.ircbot.oldwolf
perl.ircbot.plasa          perl.ircbot.putr4XtReme
perl.ircbot.rafflesia      perl.ircbot.UberCracker
perl.md5browser.avi        perl.shell.cgitelnet
php.cmdshell.antichat      php.cmdshell.avi
php.cmdshell.aZRaiL        php.cmdshell.c100
php.cmdshell.DxShell       php.cmdshell.h4ntu
php.cmdshell.hackru        php.cmdshell.KAdot
php.cmdshell.lama          php.cmdshell.Macker
php.cmdshell.mic22         php.cmdshell.myshell
php.cmdshell.NCC           php.cmdshell.r3v3ng4ns
php.cmdshell.r57           php.cmdshell.s72
php.cmdshell.Safe0ver      php.cmdshell.SimShell
php.cmdshell.SRCrew        php.cmdshell.Storm7
php.cmdshell.unclassed     php.cmdshell.winx
php.cmdshell.wls           php.cmdshell.xakep
php.cmdshell.ZaCo          php.cpcrack.Aria
php.exe.globals            php.include.remote
php.ircbot.NewLive         php.mailer.DALLAS
php.mailer.unclassed       php.mailer.YoUngEST
php.nested.base64          php.pktflood.unclassed
php.rshell.0wned           web.malware.unclassed

The other side: who uses rfxn.com projects?

In one of my usual A.D.D. moments I decided to aggregate some data on project downloads and daily update queries to the rfxn.com server, to get a picture of who exactly is using the projects. Although this information is not terribly important, I do find it interesting. I need to stress that none of the listed organizations, agencies or businesses in any way endorse, sponsor or represent the opinions expressed on this site; they are simply users of my projects. That said, let’s have a look at who uses the projects.

The basics:
1,808 Unique Networks across 117 Countries

Top 10 Usage Networks:
GNAX – Global Net Access
Hetzner Online
Waveform Technology
LEASEWEB
OVH
LIGHTPOINT COLOCATION & HOSTING
SoftLayer Technologies
MZIMA – Mzima Networks
CORPCOLO – Corporate Colocation
ThePlanet Internet Services

Top 10 Institutions of Higher Learning:
Columbia University
University of California at Berkeley
University of Maryland
Stanford University
York University
Washington University
University of Iowa
University of Puerto Rico
University of Alaska
University of Western Australia

Top Federal & Governmental Agencies:
State of Minnesota
Lafayette Consolidated Government
United States Coast Guard
Federal Aviation Administration

Top Corporations:
Yahoo (Bangalore Network Monitoring Center)
Yahoo (China Datacenter)
Microsoft Corp
Sun Microsystems
Google Inc
Cisco Systems
Bell Canada
Internap Network Services
IBM New Zealand

Top 15 Countries:
United States
Brazil
United Kingdom
Russian Federation
Netherlands
Canada
Germany
Australia
Turkey
Poland
Thailand
Romania
France
Japan
Switzerland