Preface
This article assumes a preliminary understanding of Abstract Syntax Tree (AST) structure and BabelJS. If you need a refresher, read my introductory article on the usage of Babel first.
What is String Concealing?
In JavaScript, string concealing is an obfuscation technique that transforms code in a way that disguises references to string literals. After doing so, the code becomes much less readable to a human at first glance. This can be done in multiple different ways, including but not limited to:
- Encoding the string as a hexadecimal/Unicode representation,
- Splitting a single string into multiple substrings, then concatenating them,
- Storing all string literals in a single array and referencing an element in the array when a string value is required,
- Using an algorithm to encrypt strings, then calling a corresponding decryption algorithm on the encrypted value whenever its value needs to be read.
In the following sections, I will provide some examples of these techniques in action and discuss how to reverse them.
Examples
Example #1: Hexadecimal/Unicode Escape Sequence Representations
Rather than storing a string as a plain literal, an author may choose to write it as a sequence of escape codes. The JavaScript engine resolves each escape sequence to its actual character when it parses the string, so the program behaves identically. To a human reader, however, the escaped form is virtually unreadable. Below is a sample obfuscated using this technique.
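To make the equivalence concrete, here is a minimal sketch showing that escaped and plain literals are the same string at runtime. The helper toEscapedSource is a hypothetical name introduced purely for illustration; it is not taken from any particular obfuscator.

```javascript
// Escape sequences are resolved when the source is parsed,
// so all three literals below are the same string at runtime.
const plain = "Hello";
const hexEscaped = "\x48\x65\x6c\x6c\x6f";
const unicodeEscaped = "\u0048\u0065\u006c\u006c\u006f";

console.log(plain === hexEscaped); // true
console.log(plain === unicodeEscaped); // true

// Hypothetical helper (an assumption for illustration): turn a string into
// escaped source text, using \xNN for code points up to 0xFF and \u{...} otherwise.
function toEscapedSource(str) {
  let body = "";
  for (const ch of str) {
    const codePoint = ch.codePointAt(0);
    body +=
      codePoint <= 0xff
        ? "\\x" + codePoint.toString(16).padStart(2, "0")
        : "\\u{" + codePoint.toString(16).toUpperCase() + "}";
  }
  return '"' + body + '"';
}

console.log(toEscapedSource("Hi")); // "\x48\x69"
```

The output of the helper is source text, not a string value: pasting it into a script yields the original string again.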
Original Source Code
Javascript
/**
 * "Input.js"
 * Original, unobfuscated code.
 *
 */

class Person {
  constructor(name, school, emoji) {
    this.name = name;
    this.school = school;
    this.favEmoji = emoji;
  }
  sayHello() {
    let helloStatement =
      "Hello, my name is " +
      this.name +
      ". I go to " +
      this.school +
      " and my favourite emoji is " +
      this.favEmoji;
    console.log(helloStatement);
  }
}

const examplePerson = new Person("David", "University of Obfuscation", "🤪");

examplePerson.sayHello();
Post-Obfuscation Code
Javascript
/**
 * "stringEscapeObfuscated.js"
 * This is the resulting code after obfuscation.
 *
 */

class Person {
  constructor(name, school, emoji) {
    this.name = name;
    this.school = school;
    this.favEmoji = emoji;
  }
  sayHello() {
    let helloStatement =
      "\x48\x65\x6c\x6c\x6f\x2c\x20\x6d\x79\x20\x6e\x61\x6d\x65\x20\x69\x73\x20" + // Hexadecimal Escape Sequence
      this["\x6e\x61\x6d\x65"] + // Hexadecimal encoding of member expression property
      "\u002e\u0020\u0049\u0020\u0067\u006f\u0020\u0074\u006f\u0020" + // Unicode Escape Sequence
      this["\u0073\u0063\u0068\u006f\u006f\u006c"] + // Unicode encoding of member expression property
      "\x20\x61\x6e\x64\x20\x6d\x79\x20\x66\x61\x76\x6f\x75\x72\x69\x74\x65\u0020\u0065\u006d\u006f\u006a\u0069\u0020\u0069\u0073\u0020" + // Hexadecimal and Unicode Mix Escape Sequence
      this["\x66\x61\x76\u0045\u006d\u006f\u006a\u0069"]; // Hexadecimal and Unicode encoding of member expression property
    console.log(helloStatement);
  }
}

const examplePerson = new Person(
  "\u0044\u0061\u0076\u0069\u0064", // Unicode Escape Sequence
  "\x55\x6e\x69\x76\x65\x72\x73\x69\x74\x79\x20\x6f\x66\x20\x4f\x62\x66\x75\x73\x63\x61\x74\x69\x6f\x6e", // Hexadecimal Escape Sequence
  "\u{1F92A}" // Curly Bracket Unicode Escape Sequence
);

examplePerson.sayHello();
Analysis Methodology
Despite appearing daunting at first glance, this obfuscation technique is relatively trivial to reverse. To begin, let’s copy and paste the obfuscated sample into AST Explorer.
Our targets of interest here are the obfuscated strings, which are of type StringLiteral. Let’s take a closer look at one of these nodes:
We can deduce two things from analyzing the structure of these nodes:
- The actual, unobfuscated value has been parsed by Babel and is stored in the value property.
- All nodes containing escaped text sequences have a property, extra, which stores the actual value and the encoded text in the extra.rawValue and extra.raw properties, respectively.
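As a rough illustration of these two observations, the sketch below uses a hand-written object that approximates the shape of such a node. The printStringLiteral function is a hypothetical, heavily simplified stand-in for a code generator, not Babel’s actual implementation; it only shows why the raw text wins while extra is present and why value is used once extra is gone.

```javascript
// Hand-written approximation of the StringLiteral node for "\x48\x65\x6c\x6c\x6f".
// Only the fields discussed above are included; a real Babel node also
// carries position information (start, end, loc).
const node = {
  type: "StringLiteral",
  value: "Hello", // the parsed, readable value
  extra: {
    rawValue: "Hello",
    raw: '"\\x48\\x65\\x6c\\x6c\\x6f"', // the text exactly as written in the source
  },
};

// Simplified stand-in for a code generator (an assumption for illustration):
// prefer the original source text when available, otherwise print `value`.
function printStringLiteral(n) {
  return n.extra && n.extra.raw !== undefined
    ? n.extra.raw
    : JSON.stringify(n.value);
}

const before = printStringLiteral(node);
delete node.extra;
const after = printStringLiteral(node);

console.log(before); // "\x48\x65\x6c\x6c\x6f"
console.log(after); // "Hello"
```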
Since the parsed value is already stored in the value property, we can safely delete the extra property, causing Babel to default to the value property when generating the code and thereby restoring the original strings. To do this, we create a visitor that iterates through all StringLiteral nodes and deletes the extra property if it exists. After that, we can generate code from the resulting AST to get the deobfuscated result. The Babel implementation is shown below:
Babel Deobfuscation Script
Javascript
/**
 * Deobfuscator.js
 * The babel script used to deobfuscate the target file
 *
 */
const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const t = require("@babel/types");
const generate = require("@babel/generator").default;
const beautify = require("js-beautify");
const { readFileSync, writeFile } = require("fs");

/**
 * Main function to deobfuscate the code.
 * @param source The source code of the file to be deobfuscated
 *
 */
function deobfuscate(source) {
  /**
   * Visitor for removing encoding.
   */
  const deobfuscateEncodedStringVisitor = {
    StringLiteral(path) {
      if (path.node.extra) delete path.node.extra;
    },
  };

  // Parse AST of Source Code
  const ast = parser.parse(source);

  // Execute the visitor
  traverse(ast, deobfuscateEncodedStringVisitor);

  // Code Beautification
  let deobfCode = generate(ast, { comments: false }).code;
  deobfCode = beautify(deobfCode, {
    indent_size: 2,
    space_in_empty_paren: true,
  });
  // Output the deobfuscated result
  writeCodeToFile(deobfCode);
}
/**
 * Writes the deobfuscated code to output.js
 * @param code The deobfuscated code
 */
function writeCodeToFile(code) {
  let outputPath = "output.js";
  writeFile(outputPath, code, (err) => {
    if (err) {
      console.log("Error writing file", err);
    } else {
      console.log(`Wrote file to ${outputPath}`);
    }
  });
}

deobfuscate(readFileSync("./stringEscapeObfuscated.js", "utf8"));
After processing the obfuscated script with the Babel script above, we get the following result:
Post-Deobfuscation Result
Javascript
class Person {
  constructor(name, school, emoji) {
    this.name = name;
    this.school = school;
    this.favEmoji = emoji;
  }

  sayHello() {
    let helloStatement = "Hello, my name is " + this["name"] + ". I go to " + this["school"] + " and my favourite emoji is " + this["favEmoji"];
    console.log(helloStatement);
  }

}

const examplePerson = new Person("David", "University of Obfuscation", "\uD83E\uDD2A"); // Babel won't generate the actual representation of non-ascii characters
examplePerson.sayHello();
The strings are now deobfuscated, and the code becomes much easier to read.
Example #2: String-Array Map Obfuscation
This type of obfuscation removes references to string literals and places them in a special array. Whenever a value must be accessed, the obfuscated script will reference the original string’s position in the string array. This technique is often combined with the previously discussed technique of storing strings as hexadecimal/unicode escape sequences. To isolate the point in this example, I’ve chosen not to include additional encoding. Below is an example of this obfuscation technique in practice:
Original Source Code
Javascript
/**
 * "Input.js"
 * Original, unobfuscated code.
 *
 */

class Person {
  constructor(name, school, animal) {
    this.name = name;
    this.school = school;
    this.favAnimal = animal;
  }
  sayHello() {
    let helloStatement =
      "Hello, my name is " +
      this.name +
      ". I go to " +
      this.school +
      " and my favourite animal is a " +
      this.favAnimal;
    console.log(helloStatement);
  }
}

const examplePerson = new Person("David", "University of Obfuscation", "Penguin");

examplePerson.sayHello();
Post-Obfuscation Code
Javascript
/**
 * "stringArrayObfuscated.js"
 * This is the resulting code after obfuscation.
 *
 */

// This is the string array lookup table.
var _0xcd45 = [
  "name",
  "school",
  "favAnimal",
  "Hello, my name is ",
  ". I go to ",
  " and my favourite animal is a ",
  "log",
  "David",
  "University of Obfuscation",
  "Penguin",
  "sayHello",
];
class Person {
  constructor(name, school, animal) {
    // Member expression properties obfuscated using this technique
    this[_0xcd45[0]] = name;
    this[_0xcd45[1]] = school;
    this[_0xcd45[2]] = animal;
  }
  sayHello() {
    let helloStatement =
      _0xcd45[3] +
      this[_0xcd45[0]] +
      _0xcd45[4] +
      this[_0xcd45[1]] +
      _0xcd45[5] +
      this[_0xcd45[2]];
    console[_0xcd45[6]](helloStatement);
  }
}
const examplePerson = new Person(_0xcd45[7], _0xcd45[8], _0xcd45[9]); // Obfuscation of string arguments using this technique
examplePerson[_0xcd45[10]](); // Member expression property obfuscated using this technique
Analysis Methodology
Similar to the first example, this obfuscation technique is mostly for show and very trivial to undo. To begin, let’s copy and paste the obfuscated sample into AST Explorer.
Our targets of interest here are the master array, _0xcd45, and its references. These references are of type MemberExpression. Let’s take a closer look at one of the MemberExpression nodes of interest.
We can notice that, unlike the first example, Babel does not compute the actual value of these member expressions for us. However, it does store the name of the array they are referencing and the index being accessed.
Let’s now expand the VariableDeclaration node that holds the string array.
We can observe that the name of the string array, _0xcd45, is held in path.node.declarations[0].id.name. We can also see that path.node.declarations[0].init.elements is an array of nodes, one for each string literal declared in the string array. Finally, the string array is the first VariableDeclaration with an init value of type ArrayExpression encountered at the top of the file.
[Note: Traditionally, JavaScript obfuscators put the string array at the top of the file/code block. However, this is not always the case (e.g. other string-containing arrays are declared first, or the string array is reassigned). You may need to make a slight modification to this step in that case.]
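To make the node shapes described above easier to picture, here is a small sketch built from hand-written objects that approximate what AST Explorer shows (position information omitted, array shortened to two entries). The resolveReference helper is a hypothetical name used only for this illustration, and it matches the array by name alone, which is a deliberate simplification.

```javascript
// Approximate node for: var _0xcd45 = ["name", "school"];
const declaration = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "_0xcd45" },
      init: {
        type: "ArrayExpression",
        elements: [
          { type: "StringLiteral", value: "name" },
          { type: "StringLiteral", value: "school" },
        ],
      },
    },
  ],
};

// Approximate node for: _0xcd45[1]
const reference = {
  type: "MemberExpression",
  object: { type: "Identifier", name: "_0xcd45" },
  property: { type: "NumericLiteral", value: 1 },
  computed: true,
};

// Hypothetical helper: look up the element a reference points to,
// or return null when the reference cannot be resolved statically.
function resolveReference(decl, ref) {
  const declarator = decl.declarations[0];
  if (ref.object.name !== declarator.id.name) return null; // a different array
  if (!ref.computed || ref.property.type !== "NumericLiteral") return null;
  return declarator.init.elements[ref.property.value] || null;
}

console.log(resolveReference(declaration, reference).value); // school
```

Matching on the identifier name is fragile when the same name is declared in more than one scope, which is one reason to rely on scope information instead.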
Using those observations, we can come up with the following logic to restore the code:
- Traverse the AST to search for the variable declaration of the string array. To check if it is the string array’s declaration, it must meet the following criteria:
  - The VariableDeclaration node must declare only ONE variable.
  - Its corresponding VariableDeclarator node must have an init property of type ArrayExpression.
  - ALL of the elements of the ArrayExpression must be of type StringLiteral.
- After finding the declaration, we can:
  - Store the string array’s name in a variable, stringArrayName.
  - Store a copy of all its elements in a variable, stringArrayElements.
- Find all references to the string array. One of the most powerful features of Babel is its support for scopes. From the Babel Plugin Handbook: “References all belong to a particular scope; this relationship is known as a binding.” We’ll take advantage of this feature by doing the following:
  - To ensure that we are getting the references to the correct identifier, we will get the path of the id property and store it in a variable, idPath.
  - We will then get the binding of the string array using idPath.scope.getBinding(stringArrayName) and store it in a variable, binding.
  - If the binding does not exist, we will skip this variable declarator by returning early.
  - The constant property of binding is a boolean determining if the variable is constant. If the value of constant is false (i.e., it is reassigned/modified), replacing the references will be unsafe. In that case, we will return early.
  - The referencePaths property of binding is an array containing every NodePath that references the string array. We’ll extract this to its own variable.
- We will create a variable, shouldRemove, which will be a flag dictating whether or not we can remove the original VariableDeclaration. By default, we’ll initialize it to true. More on this below.
- We will loop through each individual referencePath of the referencePaths array, and check if it meets all of the following criteria:
  - The parent NodePath of the current referencePath must be a MemberExpression. The reason we are checking the parent node is that the referencePath refers to the actual referenced identifier (in our example, _0xcd45), which would be contained in a MemberExpression parent node (such as _0xcd45[0]).
  - The parent NodePath’s object field must be the current referencePath’s node (that is, it must be the string array’s identifier).
  - The parent NodePath’s computed field must be true. This means that bracket notation is being used for member access (e.g. _0xcd45[0]).
  - The parent NodePath’s property field must be of type NumericLiteral, so we can use its value to access the corresponding node by index.
- If all of these criteria are met, we can look up the corresponding node in our stringArrayElements array using the value stored in the parent NodePath’s property field, and safely replace the referencePath’s parent path with it (that is, replace the entire MemberExpression with the actual string).
- If at least one of these conditions is not met for the current referencePath, we will be unable to replace the referencePath. In this case, removing the original VariableDeclarator of the string array would be unsafe, since these references to it would remain in the final code. Therefore, we should set our shouldRemove flag to false. We’ll then skip to the next iteration of the for loop.
- After we have finished iterating over all the referencePaths, we will use the value of our shouldRemove flag to determine if it is safe to remove the original VariableDeclaration.
  - If shouldRemove still has the default value of true, that means all referencePaths have been successfully replaced, and the original declaration of the string array is no longer needed, so we can remove it.
  - If shouldRemove is equal to false, we encountered a referencePath that we could not replace. It is then unsafe to remove the original declaration of the string array, so we don’t remove it.
The Babel implementation is shown below:
Babel Deobfuscation Script
Javascript
/**
 * Deobfuscator.js
 * The babel script used to deobfuscate the target file
 *
 */

const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const t = require("@babel/types");
const generate = require("@babel/generator").default;
const beautify = require("js-beautify");
const { readFileSync, writeFile } = require("fs");

/**
 * Main function to deobfuscate the code.
 * @param source The source code of the file to be deobfuscated
 *
 */
function deobfuscate(source) {
  /**
   * Visitor for replacing references to the string array with their actual values.
   */

  const deobfuscateStringArrayVisitor = {
    VariableDeclaration(path) {
      const { declarations } = path.node;
      if (
        // The VariableDeclaration node must declare only ONE variable.
        declarations.length !== 1 ||
        // Its corresponding VariableDeclarator node must have an init property of type ArrayExpression
        !t.isArrayExpression(declarations[0].init)
      )
        return; // skip

      const stringArrayElements = [];
      for (const elementNode of declarations[0].init.elements) {
        // ALL of the elements of the ArrayExpression must be of type StringLiteral
        if (!t.isStringLiteral(elementNode)) return;
        else {
          // Store a copy of all its elements in a variable
          stringArrayElements.push(elementNode);
        }
      }
      // Store the string array's name in a variable
      const stringArrayName = declarations[0].id.name;
      // Get the path of the identifier. By using this path, we ensure we will ALWAYS correctly refer to the scope of the array
      const idPath = path.get("declarations.0.id");
      // Get the binding of the array.
      const binding = idPath.scope.getBinding(stringArrayName);

      if (!binding) return;

      const { constant, referencePaths } = binding;

      // This wouldn't be safe if the array was not constant.
      if (!constant) return;
      // This decides if we can remove the array or not.
      // If there are any references to the array that cannot be replaced, it is unsafe to remove the original VariableDeclaration.
      let shouldRemove = true;

      for (const referencePath of referencePaths) {
        const { parentPath: refParentPath } = referencePath;
        const { object, computed, property } = refParentPath.node;
        // Criteria to be a valid path for replacement:
        // The refParent must be of type MemberExpression
        // The "object" field of the refParent must be a reference to the array (the original referencePath)
        // The "computed" field of the refParent must be true (indicating use of bracket notation)
        // The "property" field of the refParent must be a numeric literal, so we can access the corresponding element of the array by index.
        if (
          !(
            t.isMemberExpression(refParentPath.node) &&
            object == referencePath.node &&
            computed == true &&
            t.isNumericLiteral(property)
          )
        ) {
          // If the above conditions aren't met, we've run into a reference that can't be replaced.
          // Therefore, it'd be unsafe to remove the original variable declaration, since it will still be referenced after our transformation has completed.
          shouldRemove = false;
          continue;
        }

        // If the above conditions are met:

        // Replace the parentPath of the referencePath (the actual MemberExpression) with its actual value.

        refParentPath.replaceWith(stringArrayElements[property.value]);
      }

      if (shouldRemove) path.remove();
    },
  };

  // Parse AST of Source Code
  const ast = parser.parse(source);

  // Execute the visitor
  traverse(ast, deobfuscateStringArrayVisitor);

  // Code Beautification
  let deobfCode = generate(ast, { comments: false }).code;
  deobfCode = beautify(deobfCode, {
    indent_size: 2,
    space_in_empty_paren: true,
  });
  // Output the deobfuscated result
  writeCodeToFile(deobfCode);
}
/**
 * Writes the deobfuscated code to output.js
 * @param code The deobfuscated code
 */
function writeCodeToFile(code) {
  let outputPath = "output.js";
  writeFile(outputPath, code, (err) => {
    if (err) {
      console.log("Error writing file", err);
    } else {
      console.log(`Wrote file to ${outputPath}`);
    }
  });
}

deobfuscate(readFileSync("./stringArrayObfuscated.js", "utf8"));
After processing the obfuscated script with the Babel script above, we get the following result:
Post-Deobfuscation Result
Javascript
class Person {
  constructor(name, school, animal) {
    this["name"] = name;
    this["school"] = school;
    this["favAnimal"] = animal;
  }

  sayHello() {
    let helloStatement = "Hello, my name is " + this["name"] + ". I go to " + this["school"] + " and my favourite animal is a " + this["favAnimal"];
    console["log"](helloStatement);
  }

}

const examplePerson = new Person("David", "University of Obfuscation", "Penguin");
examplePerson["sayHello"]();
The strings are now deobfuscated, and the code becomes much easier to read.
Example #3: String Concatenation
This type of obfuscation, in its most basic form, takes a string such as the following:
Javascript
let myString = "Hello World";
And splits it into multiple parts:
Javascript
let myString = "He" + "l" + "l" + "o W" + "o" + "rl" + "d"; // => Hello World
You might be thinking, “Hey, the obfuscated version doesn’t look that bad”, and you’d be right. However, keep in mind that a file will typically have a lot more obfuscation layered on top. An example using the techniques already covered above could look something like this (or likely more advanced):
Javascript
var _0xba8a = ["\x48\x65", "\x6C", "\x6F\x20\x57", "\x6F", "\x72\x6C", "\x64"]; // Encoded string array
let myString =
  _0xba8a[0] +
  _0xba8a[1] +
  _0xba8a[1] +
  _0xba8a[2] +
  _0xba8a[3] +
  _0xba8a[4] +
  _0xba8a[5]; // string concatenation
The following analysis will only cover the most basic case from the first example I showed you. Traditionally, a file’s obfuscation layers are peeled back one at a time. Your goal as a reverse engineer would be to make transformations to the code such that it looks like the basic case and only then apply this analysis.
Original Source Code
Javascript
/**
 * "Input.js"
 * Original, unobfuscated code.
 *
 */

class Person {
  constructor(name, school, animal) {
    this.name = name;
    this.school = school;
    this.favAnimal = animal;
  }
  sayHello() {
    let helloStatement =
      "Hello, my name is " +
      this.name +
      ". I go to " +
      this.school +
      " and my favourite animal is a " +
      this.favAnimal;
    console.log(helloStatement);
  }
}

const examplePerson = new Person("David", "University of Obfuscation", "DOGGO");

examplePerson.sayHello();
Post-Obfuscation Code
Javascript
/**
 * "stringConcatenationObfuscated.js"
 * This is the resulting code after obfuscation.
 *
 */

class Person {
  constructor(name, school, emoji) {
    this.name = name;
    this.school = school;
    this.favAnimal = emoji;
  }
  sayHello() {
    let helloStatement =
      "Hello, my name is " +
      this.name +
      ". I g" + "o t"+ "o " +
      this.school +
      " an" + "d "+ "m"+"y"+ " fa"+"vo"+"ur"+"ite" +" ani"+"ma"+"l" +" is" + " a "+
      this.favAnimal;
    console.log(helloStatement);
  }
}

const examplePerson = new Person("D"+"a"+"vi"+"d", "Un"+"ive"+"rsi"+"ty"+ " o"+"f " + "Ob"+"fus"+"cat"+"ion", "D"+"O"+"G"+"G"+"O");

examplePerson.sayHello();
Analysis Methodology
Let’s paste our obfuscated code into AST Explorer.
Our targets of interest here are all of the strings being concatenated. Let’s click on one of them to take a closer look at one of the nodes of interest.
We can make the following observations from the AST structure:
- We can see that each individual substring is of type StringLiteral.
- More importantly, the string literals seem to be contained in multiple nested BinaryExpressions.
So how could we go about solving this?
There are a few ways to do this. One would be to work up recursively from the right-most StringLiteral node in the binary expression and manually concatenate the string at each step. However, there’s a much simpler way to accomplish the same thing using Babel’s inbuilt path.evaluate() function. The steps for coding the deobfuscator are included below:
- Traverse through the AST to search for BinaryExpressions.
- If a BinaryExpression is encountered, try to evaluate it using path.evaluate().
- If path.evaluate() does not return confident: true, or if the evaluated value is not a string, return.
- Otherwise, replace the BinaryExpression node with a StringLiteral built from the computed value, which is stored in value.
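For comparison, the manual recursive approach mentioned earlier might look roughly like the following sketch. It operates on hand-built plain objects that imitate Babel’s node shapes rather than on a real AST, and str, plus, and foldStringConcat are hypothetical helper names introduced for this illustration. It only handles the basic case of a subtree made up entirely of string literals.

```javascript
// Build plain-object nodes that mimic Babel's node shapes (position info omitted).
const str = (value) => ({ type: "StringLiteral", value });
const plus = (left, right) => ({
  type: "BinaryExpression",
  operator: "+",
  left,
  right,
});

// Recursively fold a node: returns a new StringLiteral when the whole subtree
// consists of string literals joined by "+", otherwise returns the node unchanged.
function foldStringConcat(node) {
  if (node.type !== "BinaryExpression" || node.operator !== "+") return node;
  const left = foldStringConcat(node.left);
  const right = foldStringConcat(node.right);
  if (left.type === "StringLiteral" && right.type === "StringLiteral") {
    return str(left.value + right.value);
  }
  return node;
}

// "D" + "a" + "vi" + "d" is parsed as ((("D" + "a") + "vi") + "d")
const ast = plus(plus(plus(str("D"), str("a")), str("vi")), str("d"));
console.log(foldStringConcat(ast).value); // David
```

Compared with this hand-rolled version, path.evaluate() also reports how confident it is in the result, which is why the steps above check confident before replacing anything.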
The babel implementation is shown below:
Babel Deobfuscation Script
Javascript
/**
 * Deobfuscator.js
 * The babel script used to deobfuscate the target file
 *
 */
const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const t = require("@babel/types");
const generate = require("@babel/generator").default;
const beautify = require("js-beautify");
const { readFileSync, writeFile } = require("fs");

/**
 * Main function to deobfuscate the code.
 * @param source The source code of the file to be deobfuscated
 *
 */
function deobfuscate(source) {
  const deobfuscateStringConcatVisitor = {
    BinaryExpression(path) {
      let { confident, value } = path.evaluate(); // Evaluate the binary expression
      if (!confident) return; // Skip if not confident
      if (typeof value == "string") {
        path.replaceWith(t.stringLiteral(value)); // Substitute the simplified value
      }
    },
  };

  // Parse AST of Source Code
  const ast = parser.parse(source);

  // Execute the visitor
  traverse(ast, deobfuscateStringConcatVisitor);

  // Code Beautification
  let deobfCode = generate(ast, { comments: false }).code;
  deobfCode = beautify(deobfCode, {
    indent_size: 2,
    space_in_empty_paren: true,
  });
  // Output the deobfuscated result
  writeCodeToFile(deobfCode);
}
/**
 * Writes the deobfuscated code to output.js
 * @param code The deobfuscated code
 */
function writeCodeToFile(code) {
  let outputPath = "output.js";
  writeFile(outputPath, code, (err) => {
    if (err) {
      console.log("Error writing file", err);
    } else {
      console.log(`Wrote file to ${outputPath}`);
    }
  });
}

deobfuscate(readFileSync("./stringConcatenationObfuscated.js", "utf8"));
After processing the obfuscated script with the Babel script above, we get the following result:
Post-Deobfuscation Result
Javascript
class Person {
  constructor(name, school, emoji) {
    this.name = name;
    this.school = school;
    this.favAnimal = emoji;
  }

  sayHello() {
    let helloStatement = "Hello, my name is " + this.name + ". I g" + "o t" + "o " + this.school + " an" + "d " + "m" + "y" + " fa" + "vo" + "ur" + "ite" + " ani" + "ma" + "l" + " is" + " a " + this.favAnimal;
    console.log(helloStatement);
  }

}

const examplePerson = new Person("David", "University of Obfuscation", "DOGGO");
examplePerson.sayHello();
But hold on, that looks only partly deobfuscated!
A Minor Complication
Okay, I may have lied to you a bit. The example I gave you actually contains two cases. The simplest case with ONLY string literals:
Javascript
const examplePerson = new Person(
  "D" + "a" + "vi" + "d",
  "Un" + "ive" + "rsi" + "ty" + " o" + "f " + "Ob" + "fus" + "cat" + "ion",
  "D" + "O" + "G" + "G" + "O"
);
And a slightly more advanced case, where string literals are mixed with non-string literals (in this case, variables):
Javascript
let helloStatement =
  "Hello, my name is " +
  this.name +
  ". I g" +
  "o t" +
  "o " +
  this.school +
  " an" +
  "d " +
  "m" +
  "y" +
  " fa" +
  "vo" +
  "ur" +
  "ite" +
  " ani" +
  "ma" +
  "l" +
  " is" +
  " a " +
  this.favAnimal;
The above algorithm will not work for the second case as is. However, there’s a simple remedy. Simply edit the obfuscated file to wrap consecutive strings in brackets like so:
Javascript
let helloStatement =
  "Hello, my name is " +
  this.name +
  (". I g" + "o t" + "o ") +
  this.school +
  (" an" +
    "d " +
    "m" +
    "y" +
    " fa" +
    "vo" +
    "ur" +
    "ite" +
    " ani" +
    "ma" +
    "l" +
    " is" +
    " a ") +
  this.favAnimal;
And our deobfuscator will output our desired result:
Javascript
let helloStatement =
  "Hello, my name is " +
  this.name +
  ". I go to " +
  this.school +
  " and my favourite animal is a " +
  this.favAnimal;
I’m sure some of you might be wondering why the algorithm doesn’t work without manually adding the brackets. This is outside of the scope of this article. However, if you’re interested in the reason for this intricacy and an algorithm that simplifies it without needing to manually add the brackets, check out my article about Constant Folding. But for now, I’ll move on to another example.
Example #4: String Encryption
First and foremost, string encryption IS NOT the same as encoding strings as hexadecimal or unicode. Whereas the JavaScript interpreter will automatically interpret "\x48\x65\x6c\x6c\x6f" as "Hello", encrypted strings must be passed through a decryption function and evaluated before they become useful to the JavaScript engine (or representable as a StringLiteral by Babel).
For example, even though Base64 is a type of encoding, in the context of string concealing it falls under string encryption, since console.log("SGVsbG8=") prints SGVsbG8=, but console.log(atob("SGVsbG8=")) prints Hello. In this example, atob() is the decoding function.
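The difference can be checked with a few lines of Node.js. This is only a small sketch; it uses Buffer rather than atob() so that it also runs on Node.js versions that do not expose atob() as a global, which is a substitution on my part and not something the obfuscated code would necessarily do.

```javascript
const encoded = "SGVsbG8=";

// Printing the Base64 text as-is shows the encoded form...
console.log(encoded); // SGVsbG8=

// ...and it only becomes readable after an explicit decoding call.
const decoded = Buffer.from(encoded, "base64").toString("utf8");
console.log(decoded); // Hello
```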
Most obfuscators will define custom functions for encrypting and decrypting strings. Sometimes, the string may need to go through multiple decryption functions. Therefore, there is no universal solution for deobfuscating string encryption. Most of the time, you’ll need to manually analyze the code to find the string decryption function, hard-code it into your deobfuscator, then evaluate it for each CallExpression that references it. The example below covers a single case that uses an XOR cipher from this repository for obfuscating the strings.
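As background, an XOR cipher is symmetric: applying the same key twice returns the original text. The sketch below is a generic illustration of that property and is not the code from the repository mentioned above; xorCipher and the key value are assumptions chosen for this example.

```javascript
// Generic XOR cipher (hypothetical helper for illustration).
// The same function both encrypts and decrypts, because (c ^ k) ^ k === c.
function xorCipher(text, key) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) ^ key);
  }
  return out;
}

const key = 0x5e11; // arbitrary 16-bit key chosen for this illustration
const encrypted = xorCipher("David", key);
const decrypted = xorCipher(encrypted, key);

console.log(encrypted === "David"); // false
console.log(decrypted); // David
```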
Original Source Code
Javascript
/**
 * "Input.js"
 * Original, unobfuscated code.
 *
 */

class Person {
  constructor(name, school, animal) {
    this.name = name;
    this.school = school;
    this.favAnimal = animal;
  }
  sayHello() {
    let helloStatement =
      "Hello, my name is " +
      this.name +
      ". I go to " +
      this.school +
      " and my favourite animal is a " +
      this.favAnimal;
    console.log(helloStatement);
  }
}

const examplePerson = new Person("David", "University of Obfuscation", "DOGGO");

examplePerson.sayHello();
Post-Obfuscation Code
Javascript
/**
 * "stringEncryptionObfuscated.js"
 * This is the resulting code after obfuscation.
 *
 */

/**
 * The decryption function
 * A simple implementation of an XOR cipher.
 * @param _0xed68x1 The string to be decrypted
 * @param _0xed68x2 The decryption key
 */
function _0x2720d7(_0xed68x1, _0xed68x2) {
  var _0xed68x3 = "";
  if (!_0xed68x2) {
    _0xed68x2 = 6;
  }
  for (var _0xed68x4 = 0; _0xed68x4 < _0xed68x1["length"]; ++_0xed68x4) {
    _0xed68x3 += String["fromCharCode"](
      _0xed68x2 ^ _0xed68x1["charCodeAt"](_0xed68x4)
    );
  }
  return _0xed68x3;
}
class Person {
  constructor(name, school, animal) {
    this[_0x2720d7("댎댁댍댅", 438971164636e3)] = name;
    this[_0x2720d7("敷敧敬敫敫敨", 298471289414916)] = school;
    this[_0x2720d7("옞옙옎옹옖옑옕옙옔", 834789504173688)] = animal;
  }
  sayHello() {
    let helloStatement =
      _0x2720d7("ᵅᵨᵡᵡᵢᴡᴭᵠᵴᴭᵣᵬᵠᵨᴭᵤᵾᴭ", 12786957) +
      this[_0x2720d7("のちねづ", 468128861335552)] +
      _0x2720d7("໌ໄໟໄ", 88739499) +
      this[_0x2720d7("噥噵噾噹噹噺", 327790472222230)] +
      _0x2720d7(
        "汚氛气氞汚気氃汚氜氛氌氕氏氈氓氎氟汚氛气氓気氛氖汚氓氉汚氛汚",
        38694010
      ) +
      this[_0x2720d7("녠녧녰녇녨녯녫녧녪", 148377547550982)];
    console[_0x2720d7("㐠㐣㐫", 21889598764108)](helloStatement);
  }
}
const examplePerson = new Person(
  _0x2720d7("幕幰幧幸幵", 33775121),
  _0x2720d7("﹪﹑﹖﹉﹚﹍﹌﹖﹋﹆﹐﹙ﹰ﹝﹙﹊﹌﹜﹞﹋﹖﹐﹑", 46595647),
  _0x2720d7("Ⳑⳛⳓⳓⳛ", 85339284)
);
examplePerson[_0x2720d7("릪릸릠릑림릵릵릶", 803843901012441)]();
Analysis Methodology
Let’s paste our obfuscated code into AST Explorer.
Our targets of interest here are the cryptic calls to the _0x2720d7 function. Let’s take a closer look at one of them.
We can observe that the nodes of interest are of type CallExpression. Each call expression takes two arguments. The first is a StringLiteral, which holds the encrypted string. The second is a NumericLiteral, which is used as the decryption key.
There are two ways we can deobfuscate this script, the second of which I personally prefer since it looks cleaner.
Method #1: The Copy-Paste Technique
The first method involves the following steps:
- Find the decryption function in the obfuscated script.
- Paste the decryption function, _0x2720d7, into our deobfuscator.
- Traverse the AST in search of the FunctionDeclaration of the decryption function (in this case, _0x2720d7). Once found, remove the path, as it is no longer necessary.
- Traverse the AST in search of CallExpressions where the callee is the decryption function (in this case, _0x2720d7). Once found:
  - Assign each argument of path.node.arguments to a variable, e.g. stringToDecrypt and decryptionKey respectively.
  - Create a variable, result.
  - Evaluate _0x2720d7(stringToDecrypt, decryptionKey) and assign the resulting value to result.
  - Replace the CallExpression path with the actual value: path.replaceWith(t.valueToNode(result))
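The core of the CallExpression step can be sketched as follows. This is only an illustration under two loud assumptions: standInDecrypt is a toy XOR routine standing in for the pasted decryption function (it is not the sample’s real _0x2720d7), and the node is a hand-written object rather than a real Babel path, so the sketch runs without Babel. In an actual visitor, the returned value would be passed to path.replaceWith(t.valueToNode(result)).

```javascript
// ASSUMPTION: toy stand-in for the pasted decryption function, NOT the real _0x2720d7.
// It XORs every UTF-16 code unit with the low byte of the key.
function standInDecrypt(str, key) {
  return str
    .split("")
    .map((ch) => String.fromCharCode(ch.charCodeAt(0) ^ (key & 0xff)))
    .join("");
}

// Core of the CallExpression handler: extract both arguments, evaluate the call,
// and hand back the plaintext that would replace the node.
function decryptCallNode(node) {
  const [stringToDecrypt, decryptionKey] = node.arguments.map((arg) => arg.value);
  const result = standInDecrypt(stringToDecrypt, decryptionKey);
  return result;
}

// Hand-written node shaped like the ones in the obfuscated script (values are made up).
const callNode = {
  type: "CallExpression",
  callee: { type: "Identifier", name: "_0x2720d7" },
  arguments: [
    { type: "StringLiteral", value: "ifjb" },
    { type: "NumericLiteral", value: 7 },
  ],
};

console.log(decryptCallNode(callNode)); // name
```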
One of the reasons I don’t like to use this method is that the code for the deobfuscator can become quite long and messy if:
- The decryption function contains many lines of code, or
- There are many parameters to parse from the CallExpression
A cleaner approach in my opinion is the next method, which evaluates the decryption function and its calls in a virtual machine.
Method #2: Using the NodeJS VM module
Whenever possible, I prefer to use this method because of its cleanliness. Why? Well,
- It doesn’t require me to copy-paste the entire decryption function into my deobfuscator
- I don’t need to manually parse any of the arguments of CallExpressions before execution.
The only downside is that it requires two separate visitors and therefore two traversals, whereas you can probably implement the first method in a single traversal.
Here are the steps to implement it:
- Create a variable, decryptFuncCtx, and assign it an empty context using vm.createContext().
- Traverse the AST in search of the FunctionDeclaration of the decryption function (in this case, _0x2720d7). Once found:
  - Use @babel/generator to generate the function’s source code from the node and assign it to a variable, decryptFuncCode.
  - Add the decryption function to the VM’s context using vm.runInContext(decryptFuncCode, decryptFuncCtx).
  - Delete the FunctionDeclaration node with path.remove(), as it’s now useless, and stop traversing with path.stop().
- Traverse the AST in search of CallExpressions where the callee is the decryption function (in this case, _0x2720d7). Once found:
  - Use @babel/generator to generate the CallExpression’s source code from the node and assign it to a variable, expressionCode.
  - Evaluate the function call in the context of decryptFuncCtx using vm.runInContext(expressionCode, decryptFuncCtx).
  - Optionally assign the result to a variable, value.
  - Replace the CallExpression node with the computed value to restore the unobfuscated string literal.
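Stripped of the Babel parts, the VM side of these steps looks roughly like the sketch below. The source strings are assumptions made for illustration: in the deobfuscator they would come from @babel/generator, and the function body here is a toy XOR routine, not the sample’s real decryption logic.

```javascript
const vm = require("vm");

// ASSUMPTION: stand-in source for the decryption function's declaration.
// The body is a toy XOR routine, not the real _0x2720d7 from the sample.
const decryptFuncCode = `
function _0x2720d7(str, key) {
  return str
    .split("")
    .map((ch) => String.fromCharCode(ch.charCodeAt(0) ^ (key & 0xff)))
    .join("");
}`;

// Step 1: create an empty context and define the function inside it.
const decryptFuncCtx = vm.createContext();
vm.runInContext(decryptFuncCode, decryptFuncCtx);

// Step 2: evaluate a call expression (as source text) in that same context.
const expressionCode = '_0x2720d7("ifjb", 7)';
const value = vm.runInContext(expressionCode, decryptFuncCtx);

console.log(value); // name
```

Keep in mind that the Node.js documentation states the vm module is not a security mechanism, so a script you don’t trust is best analyzed inside an isolated environment.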
Note: for both of these methods, you should ideally detect the decryption function dynamically (for example, by analyzing the structure of the function node or the number of calls to it) in case the script is morphing. You should also pay attention to the scope of the function and check whether it’s ever redefined later in the script. For this example, though, I will skip that and hardcode the name for simplicity.
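As a rough idea of what call-count detection could look like, the sketch below counts how often each function is called with a (string, number) argument pair and picks the most frequent one. Both walk and guessDecryptFuncName are hypothetical helpers, the heuristic is an assumption rather than a rule that holds for every obfuscator, and the tiny AST is hand-written so the example runs without Babel. With Babel you would do the counting inside a traverse visitor instead.

```javascript
// Recursively visit every typed node in an AST-like structure of plain objects.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
  } else if (node && typeof node === "object") {
    if (typeof node.type === "string") visit(node);
    Object.values(node).forEach((child) => walk(child, visit));
  }
}

// Hypothetical heuristic: the function most often called as f("<string>", <number>)
// is probably the decryption function. Returns null when no such call exists.
function guessDecryptFuncName(ast) {
  const counts = new Map();
  walk(ast, (node) => {
    if (
      node.type === "CallExpression" &&
      node.callee.type === "Identifier" &&
      node.arguments.length === 2 &&
      node.arguments[0].type === "StringLiteral" &&
      node.arguments[1].type === "NumericLiteral"
    ) {
      counts.set(node.callee.name, (counts.get(node.callee.name) || 0) + 1);
    }
  });
  let best = null;
  for (const [name, count] of counts) {
    if (best === null || count > counts.get(best)) best = name;
  }
  return best;
}

// Hand-written miniature AST: two calls to _0x2720d7 and one to an unrelated helper.
const call = (name, str, num) => ({
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: { type: "Identifier", name },
    arguments: [
      { type: "StringLiteral", value: str },
      { type: "NumericLiteral", value: num },
    ],
  },
});
const tinyAst = {
  type: "Program",
  body: [call("_0x2720d7", "a", 1), call("helper", "b", 2), call("_0x2720d7", "c", 3)],
};

console.log(guessDecryptFuncName(tinyAst)); // _0x2720d7
```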
The babel implementation for the second method is shown below:
Babel Deobfuscation Script
Javascript
/**
 * Deobfuscator.js
 * The babel script used to deobfuscate the target file
 *
 */
const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const t = require("@babel/types");
const generate = require("@babel/generator").default;
const beautify = require("js-beautify");
const { readFileSync, writeFile } = require("fs");
const vm = require("vm");

/**
 * Main function to deobfuscate the code.
 * @param source The source code of the file to be deobfuscated
 *
 */
function deobfuscate(source) {
  // Parse AST of Source Code
  const ast = parser.parse(source);

  // Empty VM context in which the decryption function will be defined
  const decryptFuncCtx = vm.createContext();
  // Visitor for populating the VM context
  const createDecryptFuncCtxVisitor = {
    FunctionDeclaration(path) {
      const node = path.node;
      // Hard-coded decryption function name for simplification
      if (node.id && node.id.name === "_0x2720d7") {
        const decryptFuncCode = generate(node).code; // Generate the code to execute in context
        vm.runInContext(decryptFuncCode, decryptFuncCtx); // Execute the decryption function declaration in VM context
        path.remove(); // Remove the decryption function since it has served its use
        path.stop(); // Stop traversing once the decryption function has been added to the context
      }
    },
  };

  // Visitor for decrypting the strings
  const deobfuscateEncryptedStringsVisitor = {
    CallExpression(path) {
      const node = path.node;
      // Hard-coded decryption function name for simplification
      if (node.callee.name === "_0x2720d7") {
        const expressionCode = generate(node).code; // Convert the CallExpression to code
        const value = vm.runInContext(expressionCode, decryptFuncCtx); // Evaluate the code
        path.replaceWith(t.valueToNode(value)); // Replace the node with the resulting value
      }
    },
  };
  // Create the context
  traverse(ast, createDecryptFuncCtxVisitor);
  // Decrypt all strings
  traverse(ast, deobfuscateEncryptedStringsVisitor);

  // Code Beautification
  let deobfCode = generate(ast, { comments: false }).code;
  deobfCode = beautify(deobfCode, {
    indent_size: 2,
    space_in_empty_paren: true,
  });
  // Output the deobfuscated result
  writeCodeToFile(deobfCode);
}
/**
 * Writes the deobfuscated code to output.js
 * @param code The deobfuscated code
 */
function writeCodeToFile(code) {
  let outputPath = "output.js";
  writeFile(outputPath, code, (err) => {
    if (err) {
      console.log("Error writing file", err);
    } else {
      console.log(`Wrote file to ${outputPath}`);
    }
  });
}

deobfuscate(readFileSync("./stringEncryptionObfuscated.js", "utf8"));
After processing the obfuscated script with the Babel script above, we get the following result:
Post-Deobfuscation Result
Javascript
class Person {
  constructor(name, school, animal) {
    this["name"] = name;
    this["school"] = school;
    this["favAnimal"] = animal;
  }

  sayHello() {
    let helloStatement = "Hello, my name is " + this["name"] + ". I go to " + this["school"] + " and my favourite animal is a " + this["favAnimal"];
    console["log"](helloStatement);
  }

}

const examplePerson = new Person("David", "University of Obfuscation", "DOGGO");
examplePerson["sayHello"]();
The strings are now deobfuscated, and the code becomes much easier to read.
Conclusion
Phew, that was quite the long segment! That about sums up the majority of string concealing techniques you’ll find in the wild and how to reverse them.
Before I go, I want to address one thing (as a bonus of sorts):
After deobfuscating the strings, we can see that they’re restored to:
Javascript
this["name"] = name;
this["school"] = school;
this["favAnimal"] = animal;
But someone familiar with Javascript knows that the convention is to write it like this:
Javascript
this.name = name;
this.school = school;
this.favAnimal = animal;
The good news is, you can also use Babel to restore the traditional dot operator formatting in MemberExpressions. Read my article about it here!
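To give a rough idea of what that transformation involves, here is a minimal sketch on a plain MemberExpression-shaped object. It is not the code from the linked article: toDotNotation is a hypothetical helper, and the ASCII-only name check is a simplifying assumption. In a Babel visitor you would apply the same change to path.node instead.

```javascript
// Hypothetical helper: turn obj["prop"] into obj.prop when "prop" is a plain ASCII name.
function toDotNotation(node) {
  // ASSUMPTION: only simple ASCII names are converted; anything else is left alone.
  const isSimpleName = (s) => /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(s);
  if (
    node.type === "MemberExpression" &&
    node.computed &&
    node.property.type === "StringLiteral" &&
    isSimpleName(node.property.value)
  ) {
    return {
      ...node,
      computed: false, // dot access instead of bracket access
      property: { type: "Identifier", name: node.property.value },
    };
  }
  return node; // e.g. obj["my-key"] must stay in bracket form
}

// Hand-written node for: this["favAnimal"]
const bracketAccess = {
  type: "MemberExpression",
  object: { type: "ThisExpression" },
  property: { type: "StringLiteral", value: "favAnimal" },
  computed: true,
};

console.log(toDotNotation(bracketAccess).property); // { type: 'Identifier', name: 'favAnimal' }
```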
If you’re interested, you can find the source code for all the examples in this repository.
I hope that this article helped you learn something new. Thanks for reading, and happy reversing!