Using ChatGPT for Virus Analysis: Understanding Obfuscated Code with AI

Exploring the Potential of GPT-4 and ChatGPT in Analyzing Obfuscated Code for Virus Analysis.

Apr 2, 2023 · 4 min read


Code obfuscation is a technique malware authors use to make their code difficult to analyze and its functionality difficult to identify. This makes it challenging for virus analysts to understand how a piece of malware works and to devise countermeasures against it. However, large language models like ChatGPT and GPT-4 can often help decode obfuscated code and explain it in a form that is easier to read and understand.

In this blog post, we'll explore how ChatGPT can be used for virus analysis and how it can help in understanding obfuscated code.

Let's consider the following obfuscated code in Python:

import base64
exec(base64.b64decode('aW1wb3J0IHNvY2tldCxzdHJpbmcKaWYgKHNvcnQucHJpbnQoJzEwLjAuMC4wJykgJiYgc29ydC5wcmludCgnMTI3LjAuMC4wJykgJiYgc29ydC5wcmludCgnMTI3LjAuMC4xJykpOwpzPXNvcnQucHJpbnQoJzEyNy4wLjAuMCc7'))

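This code is a simple example of obfuscation: the whole payload is base64-encoded and passed straight to exec(), so nothing meaningful is visible in the source. Decoded, the payload is a three-line snippet that imports the socket and string modules and then calls an undefined sort.print with the strings '10.0.0.0', '127.0.0.0' and '127.0.0.1'. Despite the socket import, it never opens a connection and specifies no port. In fact it is not valid Python at all (it uses &&, its if statement has no colon, and its last call is missing a closing parenthesis), so exec() would raise a SyntaxError before anything runs.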
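An analyst would normally decode a payload like this rather than run it. The following is a minimal sketch using only the Python standard library: it decodes the string copied from the sample with base64.b64decode and then uses ast.parse to check, without executing anything, whether exec() could even compile the result. The helper names decode_payload and is_valid_python are illustrative, not part of any existing tool.

```python
import ast
import base64

# Payload copied verbatim from the sample above (split only for line length).
PAYLOAD = (
    'aW1wb3J0IHNvY2tldCxzdHJpbmcKaWYgKHNvcnQucHJpbnQoJzEwLjAuMC4wJykgJiYg'
    'c29ydC5wcmludCgnMTI3LjAuMC4wJykgJiYgc29ydC5wcmludCgnMTI3LjAuMC4xJykp'
    'OwpzPXNvcnQucHJpbnQoJzEyNy4wLjAuMCc7'
)


def decode_payload(encoded: str) -> str:
    """Decode the base64 payload to text without executing it."""
    return base64.b64decode(encoded).decode('utf-8')


def is_valid_python(source: str) -> bool:
    """Parse (never run) the source to see whether exec() could compile it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


decoded = decode_payload(PAYLOAD)
print(decoded)
print('valid Python:', is_valid_python(decoded))
```

Running it prints the decoded snippet followed by "valid Python: False", and none of the payload is ever executed.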

We can use ChatGPT to decode this obfuscated code and understand its functionality.

We can input the code into ChatGPT and ask it to decode the base64-encoded string:

"Can you decode the base64-encoded string in this code?"

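And ChatGPT might respond:

"Sure. The base64-encoded string decodes to a short Python snippet that imports the socket and string modules and then calls sort.print with the strings '10.0.0.0', '127.0.0.0' and '127.0.0.1'. No port number appears anywhere, and the snippet is not valid Python (it uses && and leaves its last parenthesis unclosed), so exec() would raise a SyntaxError instead of opening a connection."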

By using ChatGPT, we can quickly see what the obfuscated code actually does, and because the payload is only base64-encoded, ChatGPT's answer is easy to confirm by decoding the string ourselves.

ChatGPT can also be used to decode obfuscated code written in other languages like C. Here's an example of obfuscated code in C:

#include <stdio.h>
#define p printf
#define s scanf
#define f(x) int main(void){char t[100];if(s("%99s",t)!=1)return 1;x;p("%s\n",t);return 0;}
f(for(int i=0;t[i];i++)t[i]+=i;)

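This code hides its logic behind macros: p and s are short aliases for printf and scanf, and f(x) expands into an entire main function that reads a word from the user, runs the code passed in as x, and prints the result. The code passed in adds each character's index to that character, so the first character is unchanged, the second is shifted by 1, the third by 2, and so on. Spread across macros like this, even a short program is difficult to follow.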
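The transformation itself is easy to reproduce outside C. The Python sketch below mirrors the loop on a byte string, assuming 8-bit characters that wrap around at 256, and adds the inverse an analyst would use to recover the original input. The function names shift_by_index and unshift_by_index are hypothetical.

```python
def shift_by_index(data: bytes) -> bytes:
    """Mirror the C loop t[i] += i: add each byte's index, wrapping at 256 like an 8-bit char."""
    return bytes((b + i) % 256 for i, b in enumerate(data))


def unshift_by_index(data: bytes) -> bytes:
    """Undo the transformation by subtracting each byte's index (also modulo 256)."""
    return bytes((b - i) % 256 for i, b in enumerate(data))


encoded = shift_by_index(b'hello')
print(encoded)                    # b'hfnos'
print(unshift_by_index(encoded))  # b'hello'
```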

We can input this code into ChatGPT and ask it to decode it:

"Can you explain what this obfuscated C code does?"

And ChatGPT might respond:

"Sure, this C code reads in a string from the user and adds a different integer value to each character in the string. The for loop iterates over each character in the string and adds the current index value to the character's ASCII value. This makes it difficult to read and understand the code."

The applications of ChatGPT in virus analysis are not limited to decoding obfuscated code. Another use case is analyzing malware's command-and-control (C&C) communication, which is often obfuscated to avoid detection by security systems. ChatGPT can be prompted to recognize patterns in obfuscated C&C code and translate them into a readable form, enabling analysts to better understand the behavior of the malware and take appropriate measures.

As an example, consider the following obfuscated C code snippet that establishes a connection with a C&C server:

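#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// XOR-encrypted strings; each byte is decrypted with the repeating key below
char server[] = { 0x3d, 0x3e, 0x3c, 0x3f, 0x24, 0x23, 0x26, 0x28, 0x2b, 0x2a, 0x2d, 0x2e };
char port[] = { 0x25, 0x2f, 0x2c, 0x22, 0x29, 0x21 };
char command[] = { 0x3b, 0x3a, 0x32, 0x37, 0x35, 0x34, 0x38, 0x31, 0x36, 0x39, 0x30, 0x33 };
const char *key = "password";

int connect_to_server(void) {
    size_t key_len = strlen(key);
    char decrypted_server[sizeof server + 1];
    char decrypted_port[sizeof port + 1];
    char decrypted_command[sizeof command + 1];

    // Decrypt server address
    for (size_t i = 0; i < sizeof server; i++) {
        decrypted_server[i] = server[i] ^ key[i % key_len];
    }
    decrypted_server[sizeof server] = '\0';

    // Decrypt port number (every byte, using the same key schedule)
    for (size_t i = 0; i < sizeof port; i++) {
        decrypted_port[i] = port[i] ^ key[i % key_len];
    }
    decrypted_port[sizeof port] = '\0';

    // Decrypt command
    for (size_t i = 0; i < sizeof command; i++) {
        decrypted_command[i] = command[i] ^ key[i % key_len];
    }
    decrypted_command[sizeof command] = '\0';

    // Build the server address, zeroing unused fields first
    struct sockaddr_in serv_addr;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(atoi(decrypted_port));
    if (inet_pton(AF_INET, decrypted_server, &serv_addr.sin_addr) != 1) {
        printf("Invalid server address.\n");
        return -1;
    }

    // Connect to server
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        printf("Error creating socket.\n");
        return -1;
    }
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("Error connecting to server.\n");
        close(sockfd);
        return -1;
    }

    // Send command to server
    if (write(sockfd, decrypted_command, strlen(decrypted_command)) < 0) {
        printf("Error sending command.\n");
        close(sockfd);
        return -1;
    }

    return sockfd;
}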

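This code hides the C&C server address, port number, and command as XOR-encrypted byte arrays, so none of them appear as readable strings in the compiled binary. Because the key ("password") ships alongside the data, ChatGPT can point out the repeating-key XOR scheme, and analysts can recover all three values by applying the same XOR.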
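Decoding the arrays takes only a few lines. Below is a minimal Python sketch of repeating-key XOR that applies key[i % len(key)] to every byte of each array, the scheme the C loops implement. With the exact bytes shown, the results are printable but meaningless strings (the server decodes to M_OLSLTL[K^]) rather than a dotted-quad address or numeric port, so the arrays in this example appear to be placeholders; with a real sample, the same approach would recover the actual values. The helper name xor_decrypt is illustrative.

```python
from itertools import cycle

# Byte arrays and key copied from the C snippet above.
SERVER = bytes([0x3d, 0x3e, 0x3c, 0x3f, 0x24, 0x23, 0x26, 0x28, 0x2b, 0x2a, 0x2d, 0x2e])
PORT = bytes([0x25, 0x2f, 0x2c, 0x22, 0x29, 0x21])
COMMAND = bytes([0x3b, 0x3a, 0x32, 0x37, 0x35, 0x34, 0x38, 0x31, 0x36, 0x39, 0x30, 0x33])
KEY = b'password'


def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: byte i of data is XORed with key[i % len(key)]."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


for name, blob in (('server', SERVER), ('port', PORT), ('command', COMMAND)):
    print(name, xor_decrypt(blob, KEY))

# XOR is its own inverse, so the same function also encrypts.
assert xor_decrypt(xor_decrypt(b'127.0.0.1', KEY), KEY) == b'127.0.0.1'
```

Because the key travels with the ciphertext, repeating-key XOR like this hides strings from a casual scan of the binary but offers no real protection once an analyst finds the key.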

In Conclusion:

ChatGPT can be used to decode obfuscated code and analyze the behavior of malware, making it a useful tool for virus analysts. By leveraging machine learning models, analysts can better understand malware and devise effective countermeasures.

Have you used ChatGPT or GPT-4 in virus analysis? Share your experiences and thoughts in the comments below!
