Know Your Malware Part Two – Hacky Obfuscation Techniques
In the first post in this series, we covered common PHP encoding techniques and how they’re used by malware to hide from security analysts and scanners. In today’s post, we’re going to dive a little bit deeper into other obfuscation techniques that make use of other features available in PHP.
Obfuscation Redux
In the first post in this series, we defined obfuscation as the process of concealing the purpose or functionality of code or data so that it evades detection and is more difficult for a human or security software to analyze, but still fulfills its intended purpose. One of the main contributing factors to the popularity of PHP is its ease of use, but the same functionality that makes it easy to use also makes it easy to abuse, often in ways that were never intended.
The techniques covered in this post are often simpler and “hackier” than the ones listed in the previous article. Individually, most of them are less reliable indicators of malicious activity, since several of them typically need to be combined to achieve sufficient obfuscation. They are often easier for a human analyst to spot, but harder to detect with scanning tools because of the wide variety of permutations available. Such simpler obfuscation methods can also be creatively combined with encoding techniques, granting malware authors a formidable array of tactics to avoid detection.
While it is not practical to cover every possible technique in active use, this article will detail the more commonly found methods, and help illustrate the wide range of possibilities when decoding obfuscated malware. Several of the methods we will cover today, such as comment abuse, can be combined into almost infinite variations with minute changes, thus rendering them completely undetectable to traditional hash-based malware scanning and even partially slowing down regular expression-based scanning of the type used by Wordfence.
Fortunately, while these methods do make analysis more difficult, and can slow down scanning, their presence in certain combinations is a strong signal of malicious activity, and the malware detection signatures used by the Wordfence plugin and Wordfence CLI are tuned to detect these combinations with astoundingly few false positives. Wordfence CLI in particular is useful in these cases, as it is highly performant and can run multithreaded jobs, compensating for any speed penalties imposed by these techniques.
Comment Abuse
PHP has several methods of adding code comments that you may already be familiar with. Well-commented code is considered a best practice, as it makes it much easier to maintain software and pay off technical debt, but comments can also be used for illicit purposes.
PHP uses three styles of comments:
// denotes a single-line comment that runs to the end of the current line.
# likewise denotes a single-line comment that runs to the end of the current line, though it is less common than //.
/* begins a multiline comment, which can only be closed with */.
Multiline comments are particularly useful to malware authors because they are ignored by PHP, and do not have to extend over multiple lines. This means that an attacker can “break up” their code to evade scanners using comments. For instance, the following code block prints “Hello, World!”:
    <?php
    echo /*blah */
    "Hello, World!" /*b lah*/
    ;
While this is a very basic example, more complicated examples can be found in real malware, such as the following snippet, which makes use of several additional obfuscation techniques, including octal escape sequences and invisible null bytes:
    ,<?php function /*ti*/ ed_ixpn(){ echo /* o_lpl*/ 20508; }
    $disdcrxh_ /* ohgvr*/ = 'disdcrxh_' /* _jnsm */ ^ '' ;
    $zggkgqda = "\146" . "\151" . $disdcrxh_ (361-253) . /* qts */ "e" . "_" . $disdcrxh_ (564-452) /* rxw*/ .
        $disdcrxh_ (1006-889) . "t" . $disdcrxh_ (952-857) . /* w */ "c" . $disdcrxh_ (111) . /*fcup */ "n" .
        $disdcrxh_ (162-46) /* djtrl */ . /* pwdn */ "e" . $disdcrxh_ (407-297) . $disdcrxh_ (854-738) . $disdcrxh_ (115);
While we’re not going to fully analyze this malware today, it already presents problems for many scanners. For instance, a scanner searching for the very first line of code, function ed_ixpn(), would fail to find it because of the comments. Regular-expression-based detection, such as that used by the Wordfence Plugin scanner and Wordfence CLI, is capable of detecting malware of this type, but the enormous number of possible variations still imposes a performance penalty on detection.
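One way to neutralize comment abuse before matching is to strip comments and collapse whitespace. The following is a minimal sketch built on PHP's tokenizer extension (token_get_all). The helper name strip_php_comments is hypothetical, and this is only an illustration, not a description of how the Wordfence scanner works.

```php
<?php
// Hypothetical helper: drops comments and collapses whitespace runs so that
// comment-split code can be matched as a contiguous string.
// Assumes the tokenizer extension is enabled (it is by default).
function strip_php_comments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ";" or "(" arrive as plain strings.
            $out .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue; // discard the comment entirely
        }
        if ($id === T_WHITESPACE) {
            if (substr($out, -1) === ' ') {
                continue; // avoid double spaces where a comment was removed
            }
            $text = ' ';
        }
        $out .= $text;
    }
    return $out;
}

$sample = '<?php function /*ti*/ ed_ixpn(){ echo /* o_lpl*/ 20508; }';
// After normalization, "function ed_ixpn()" appears contiguously again.
echo strip_php_comments($sample), PHP_EOL;
```

Normalizing first and matching second trades an extra tokenization pass for much simpler signatures, which is one reason a sketch like this can be attractive for manual analysis.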
Concatenation Catastrophe
PHP makes string concatenation very simple via the dot (.) operator. This allows programmers to join two separate strings with minimal hassle. For instance, the following code outputs “Hello, World!”:
    <?php echo "He" . "llo," . " Wor" . "ld!";
There are a large number of legitimate use cases for string concatenation, so it’s generally only an indicator of malicious activity when combined with several other obfuscation techniques. The malware sample we shared earlier provides a good example of this, with octal encoding concatenated with the return values of various functions, which we’ll get to in a later section.
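To see why split strings defeat a naive literal search, and how an analyst might undo the split, here is a rough sketch that folds adjacent single-quoted literals back together. The function name fold_literals is hypothetical, and the regular expression deliberately ignores escapes and double-quoted strings, so treat it as an illustration rather than a PHP parser.

```php
<?php
// Hypothetical helper: repeatedly merges 'a' . 'b' into 'ab'.
// Limitation (by design): only single-quoted literals without backslashes.
function fold_literals(string $source): string
{
    $pattern = "~'([^'\\\\]*)'\\s*\\.\\s*'([^'\\\\]*)'~";
    do {
        $source = preg_replace($pattern, "'\${1}\${2}'", $source, -1, $count);
    } while ($count > 0);
    return $source;
}

$line = "\$hello = 'pri' . 'ntf';";

// A literal search for the function name fails on the raw line...
var_dump(strpos($line, 'printf') !== false);

// ...but succeeds once the literals have been folded.
var_dump(strpos(fold_literals($line), 'printf') !== false);
```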
Index Fun
PHP, like most languages, stores text strings as arrays of characters, each with a defined position or index. This makes it possible to assemble arbitrary commands and data from a string containing the required characters, using the array index of each character and the concatenation operator. For instance, the following code prints “Hello, World!”:
    <?php
    $string = "Wow, what a cool Helpful research device!";
    echo $string[17] . $string[18] . $string[19] . $string[19] . $string[1] . $string[3] . $string[4] .
        $string[0] . $string[1] . $string[25] . $string[15] . $string[34] . $string[40];
PHP string indexes start at 0, meaning that $string[0] in the example above would be “W”, the first letter of “Wow, what a cool Helpful research device!”. By concatenating letters from different parts of that text string, it’s possible to assemble an entirely different text string.
This method can be very helpful for hiding the underlying text from human researchers and security scanning tools alike. It does have occasional legitimate uses in selecting chunks of text, but when used extensively it is a strong indicator of malicious activity. To do anything useful, it typically needs to be combined with additional techniques such as evaluating the resulting string or passing it to a function.
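When the carrier string and the indexes are both literal, the hidden text can be recovered without running the suspicious code at all. The sketch below assumes exactly that situation; resolve_indexes is a hypothetical helper name, and indexes computed at runtime are outside its scope.

```php
<?php
// Hypothetical helper: resolves literal numeric indexes into a known carrier
// string by reading the source text, without executing it.
function resolve_indexes(string $carrier, string $expression, string $varName): string
{
    $pattern = '/\$' . preg_quote($varName, '/') . '\s*\[\s*(\d+)\s*\]/';
    preg_match_all($pattern, $expression, $matches);

    $decoded = '';
    foreach ($matches[1] as $index) {
        // Out-of-range indexes are shown as "?" instead of raising a warning.
        $decoded .= $carrier[(int) $index] ?? '?';
    }
    return $decoded;
}

$carrier = "Wow, what a cool Helpful research device!";
$expression = 'echo $string [17]. $string [18]. $string [19]. $string [19]. $string [1]. '
    . '$string [3]. $string [4]. $string [0]. $string [1]. $string [25]. $string [15]. '
    . '$string [34]. $string [40];';

echo resolve_indexes($carrier, $expression, 'string'), PHP_EOL; // Hello, World!
```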
Math, Not Even Once
PHP allows mathematical operations to be used inside other expressions. One of the interesting features in the malware snippet, $disdcrxh_(564-452), demonstrates this: it evaluates to $disdcrxh_(112), because 452 is subtracted from 564 inside the parentheses before the call is made. This functionality can likewise be combined with the string index technique mentioned above. For example, the following code prints out “Hello, World!”:
    <?php
    $string = "Wow, what a cool Helpful research device!";
    echo $string[(15+2)] . $string[(20-2)] . $string[(10+9)] . $string[(29-10)] . $string[(5-4)] .
        $string[(1+2)] . $string[(2+2)] . $string[(5-5)] . $string[(12-11)] . $string[(5*5)] .
        $string[(5*3)] . $string[34] . $string[(160/4)];
This adds an additional obfuscation layer that can make it even more difficult to determine the code’s functionality without executing it. However, it is incredibly rare for this type of code to be used legitimately, so the presence of this technique is typically an indicator of malicious activity.
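The same arithmetic can be worked through for the malware snippet shown earlier. The sketch below rests on one assumption that the snippet as printed does not confirm: that $disdcrxh_ ends up holding the name of PHP's built-in chr function, which the arithmetic results (all in the printable ASCII range) strongly suggest. The comments have been removed for readability.

```php
<?php
// Assumption: in the original sample this variable resolves to "chr".
$disdcrxh_ = 'chr';

$zggkgqda = "\146" . "\151"          // octal escapes for "f" and "i"
    . $disdcrxh_(361-253)            // chr(108) = "l"
    . "e" . "_"
    . $disdcrxh_(564-452)            // chr(112) = "p"
    . $disdcrxh_(1006-889)           // chr(117) = "u"
    . "t"
    . $disdcrxh_(952-857)            // chr(95)  = "_"
    . "c"
    . $disdcrxh_(111)                // chr(111) = "o"
    . "n"
    . $disdcrxh_(162-46)             // chr(116) = "t"
    . "e"
    . $disdcrxh_(407-297)            // chr(110) = "n"
    . $disdcrxh_(854-738)            // chr(116) = "t"
    . $disdcrxh_(115);               // chr(115) = "s"

echo $zggkgqda, PHP_EOL; // file_put_contents
```

Under that assumption, the variable holds the name of a file-writing function, assembled without the name ever appearing in the source.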
String Reversals
One of the most basic functions in PHP’s text string manipulation libraries is strrev, which is used to reverse strings of text. For instance, the following code snippet prints out “Hello, World!”:
    <?php echo strrev("!dlroW ,olleH");
While not particularly effective at obfuscation on its own, it can be combined with the techniques in this article as well as nearly all of the techniques in our previous article on encoding to make it even more difficult to decode malicious functionality. While it has a number of legitimate use cases, the presence of strrev alongside two or more additional encoding or obfuscation techniques is often a reliable indicator of compromise.
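As a small illustration of such layering, the sketch below reverses a function name and uses it to decode a payload. Base64 is used here purely as a familiar stand-in for an encoding layer, and the variable names are hypothetical.

```php
<?php
// The reversed literal hides the function name from a plain-text search.
$decoder = strrev('edoced_46esab');   // "base64_decode"

// Base64 encoding of "Hello, World!".
$payload = 'SGVsbG8sIFdvcmxkIQ==';

// The variable is then invoked like a function.
echo $decoder($payload), PHP_EOL;     // Hello, World!
```

Neither the function name nor the output text appears anywhere in the source, even though each individual step is trivial to undo by hand.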
Variable, Dynamic, and Anonymous Functions
PHP can store a function name in a variable and then invoke that function through the variable. This is widely used by legitimate software, but can also be combined with several other techniques, such as string concatenation, in which case it is often an indicator of malicious activity. For instance, the following code snippet prints out “Hello, World!”:
    <?php
    $hello = 'pri' . 'ntf';
    $string = 'Hello, World!';
    $hello($string);
This can also be combined with dynamic function invocation using methods such as call_user_func, which accepts a callable as its first parameter and any arguments to be passed to that callable in subsequent parameters. As with variable function names, this is widely used in legitimate code, but it can still make analysis more difficult, especially for automated tools looking primarily for more basic function call syntax. For example, the following code snippet prints out “Hello, World!”:
    <?php
    $hello = 'pri' . 'ntf';
    $string = 'Hello, World!';
    $call = 'call_user_func';
    $call($hello, $string);
Finally, PHP also allows for anonymous functions, which are exactly what they sound like – functions without a name. These can be combined with variable assignment as shown:
    <?php
    $hello = function () {
        printf("Hello, World!");
    };
    $hello();
While anonymous functions are widely used in legitimate code, they can be combined with other features to make it more difficult for automated scanning tools or human analysts to keep track of code flow, which makes them useful for obfuscation.
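A tool that only looks for name( call syntax will miss calls made through variables. As a rough sketch of how such calls could be surfaced, the following uses PHP's tokenizer to list variables that are immediately invoked. The helper name find_variable_calls is hypothetical, and a hit is only a lead for further review, since this pattern is common in legitimate code.

```php
<?php
// Hypothetical helper: returns the names of variables that are invoked like
// functions, e.g. $hello($string). Whitespace and comments between the
// variable and the opening parenthesis are skipped.
function find_variable_calls(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $found = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_VARIABLE) {
            continue;
        }
        $j = $i + 1;
        while (
            $j < $count
            && is_array($tokens[$j])
            && in_array($tokens[$j][0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)
        ) {
            $j++;
        }
        if ($j < $count && $tokens[$j] === '(') {
            $found[] = $tokens[$i][1];
        }
    }
    return $found;
}

$source = <<<'PHP'
<?php
$hello = 'pri' . 'ntf';
$string = 'Hello, World!';
$hello /* noise */ ($string);
PHP;

print_r(find_variable_calls($source));
```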
We’ve begun to combine obfuscation layers in our examples to provide a better picture of the type of obfuscation often found in the wild, and there’s still more to come.
GOTO Labels
One of the oldest and most basic code functions is the goto statement. While some legitimate software still uses GOTO statements, the functionality is considered poor coding practice and is not widely used, though it reflects how the code operates at a fundamental level far more accurately than more modern syntax. Its primary use in obfuscation is similar to comment abuse in that it breaks up the code so that it is more difficult to determine the control flow.
For example, the following code snippet prints out “Hello, World!” if and only if $_GET['input'] is present and set to ‘hello’, otherwise it prints “Sorry”:
    <?php
    $hello = 'pri' . 'ntf';
    $string = 'Hello, World!';
    if (isset($_GET['input']) && $_GET['input'] == 'hello') {
        goto printyes;
    } else
        goto printno;
    printyes:
    echo "Hello, World!";
    goto end;
    printno:
    echo "Sorry";
    end:
    ?>
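For comparison, here is how the same logic might look once the jumps are untangled. This is a hand-written equivalent under the assumption that only the printed output matters; the function name greet is hypothetical, and the unused $hello and $string assignments from the example are dropped.

```php
<?php
// Hypothetical structured rewrite of the goto example above.
function greet(array $query): string
{
    if (isset($query['input']) && $query['input'] == 'hello') {
        return "Hello, World!";
    }
    return "Sorry";
}

echo greet($_GET), PHP_EOL;
```

Rewriting jumps into ordinary branches like this is often the first step an analyst takes, because it makes the conditions that gate each piece of behavior visible at a glance.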
Include/Require of non-PHP files
PHP uses the include and require statements to include and execute code located in a separate file. These are almost universally used, and occasionally the .inc extension is used instead of .php for files to be included. However, one particular feature that is ripe for abuse is that PHP will include files with any extension and execute them as code. This allows attackers to upload the bulk of their malicious code as a file with an allowed extension, often an image extension such as .ico or .png, and then simply include that file from a loader file with a PHP extension. Inclusion of files without a .php or .inc extension is thus almost always an indicator of malicious activity.
For instance, take the following set of files:
loader.php:
    <?php include('hello.ico');
hello.ico:
    <?php echo "Hello, World!";
This will print out “Hello, World!” when loader.php is executed, even though hello.ico does not have a PHP extension and would not run as PHP if accessed directly.
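A simple check for this pattern is to look at the extension of every literal include or require target. The sketch below does this with a regular expression. The helper name suspicious_includes is hypothetical, the allowed extension list (.php and .inc) is taken from the discussion above, and paths that are built dynamically are not covered.

```php
<?php
// Hypothetical helper: reports literal include/require targets whose
// extension is neither .php nor .inc.
function suspicious_includes(string $source): array
{
    $pattern = '/\b(?:include|require)(?:_once)?\s*\(?\s*([\'"])(.+?)\1/i';
    preg_match_all($pattern, $source, $matches);

    $flagged = [];
    foreach ($matches[2] as $path) {
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (!in_array($extension, ['php', 'inc'], true)) {
            $flagged[] = $path;
        }
    }
    return $flagged;
}

$loader = "<?php include ( 'hello.ico' ); require 'lib/helpers.php';";
print_r(suspicious_includes($loader));
```

Here only hello.ico is reported, while the ordinary lib/helpers.php include (a made-up path for contrast) passes.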
Putting it All Together
Here’s an example that makes use of everything we’ve learned today apart from including files:
    <?php
    $string = /*blah*/ "Wow, what a cool Helpful research device!" /*blah*/;
    $mashed = $string[(160/4)] . /*blah*/ $string[34] . /*blah*/
        $string[(5*3)] /*blah*/ . $string[(5*5)] /*blah*/ .
        $string[(12-11)] . /*blah*/ $string[(5-5)] . /*blah*/
        /*blah*/ $string[(2+2)] . /*blah*/ $string[(1+2)] . /*blah*/
        $string[(5-4)] /*blah*/ . $string[(29-10)] . /*blah*/
        $string[(10+9)] . /*blah*/ $string[(20-2)] /*blah*/ . $string[(15+2)];
    function /*blah*/ echostring(/*blah*/ $str /*blah*/) {
        echo /*blah*/ $str;
        return /*blah*/;
    }
    $rev /*blah*/ = /*blah*/ function ($str) {
        return /*blah*/ strrev($str);
    };
    goto /*blah*/ dostuff;
    echo /*blah*/ "That didn't work!";
    dostuff /*blah*/:
    call_user_func(/*blah*/ 'echostring', /*blah*/ $rev(/*blah*/ $mashed));
It begins with comments breaking up the code as well as the concatenation and string indexing techniques we covered earlier, which assign “Hello, World!” in reverse, or “!dlroW ,olleH”, to the $mashed variable.
A quick glance at the code might lead you to believe that it outputs “That didn’t work!” but thanks to the goto statement that line of code is skipped. Such misleading uses are par for the course with malware that uses goto statements.
In the dostuff section, we use call_user_func to call the echostring function, which really just does the same thing as echo but serves as an additional layer of obfuscation to untangle, especially if the function were to be given a less friendly name. The echostring function is fed the output of the anonymous function assigned to the $rev variable, which again simply performs a strrev on the input. The result is that $mashed is reversed and echoed out as “Hello, World!”. While we have kept the function and variable names relatively relevant for this example, there’s nothing preventing a malware author from naming these functions whatever they want, and indeed, misleading or nonsensical function names are more common than meaningful or useful function names in PHP malware.
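Peeling the layers off by hand leaves very little behind. The following is our own simplified reading of the combined example, not the output of any tool, with the resolved value of $mashed written out as a literal.

```php
<?php
// Result of the indexed, comment-split concatenation in the example above.
$mashed = "!dlroW ,olleH";

// call_user_func('echostring', $rev($mashed)) boils down to a reversed echo.
echo strrev($mashed), PHP_EOL; // Hello, World!
```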
Conclusion
In today’s post, we covered a number of the more creative, or “hacky” malware obfuscation techniques in widespread use, and showed examples of how they can be combined to make it difficult to analyze code functionality. All of these techniques can also be combined with the techniques in our previous post on malware obfuscation to make life even more difficult for analysts and security scanners. These two posts cover the most popular obfuscation methods used by PHP malware, but there are even more advanced and sophisticated techniques, including genuine encryption, which we will cover in our next article, alongside less commonly-used functionality.
PHP malware is constantly evolving, and our malware analysts release dozens of detection signatures every month, which can be used by the Wordfence scanner as well as by Wordfence CLI. While the vast majority of new signatures will only be made available to Wordfence Premium, Wordfence Care, Wordfence Response, and the Paid Wordfence CLI Tiers, the free version of Wordfence and Wordfence CLI still offer excellent detection capabilities, and include our broadest signature set, which in our testing detects at least one indicator of compromise on more than 90% of infected sites. We also plan to periodically update our free signature set with signatures that detect the most widespread malware from our full signature set.
Once again, we encourage readers who want to learn more about this to experiment with the various code snippets we have presented. As always, be sure to be careful with any actual malware samples you find and only execute them in a hardened virtual environment, as even PHP malware can be used for local privilege escalation on vulnerable machines.
For security researchers looking to disclose vulnerabilities responsibly and obtain a CVE ID, you can submit your findings to Wordfence Intelligence and potentially earn a spot on our leaderboard.
This article was written by Ramuel Gall, a former Wordfence Senior Security Researcher.
Comments
4:32 am
Invoking functions using a variable seems such a stupid idea. Like if they designed the language to facilitate hacking or exploits. No wonder everyone is thinking that PHP is inherently insecure.
6:23 am
Hi Shocked,
While this functionality is useful for malicious actors, it's also found in a number of other languages that are often considered secure, such as Python and Go. PHP does tend to be more forgiving than most other languages, however, which does make it easier to abuse built-in functionality.