Beyond static code analysis - understanding the semantics

István Szénási
May 17, 2023

Using static code analyzers in the development phase improves your security posture. As a regular user of several, I can 100% agree with this. There are, however, limitations. While they can highlight most of the "direct" issues in your code, they lack "understanding": they can't see the big picture. Let's see how LLMs come to our rescue. With an LLM like GPT-4, we should be able to uncover semantic issues in our codebase.

Let’s take a simple webshop

Our theoretical webshop allows users to make purchases using a web form. It uses a simple payment provider integration, such as PayPal Express Checkout.

The checkout flow would be like this:

  1. The user creates the order on the webshop
  2. The user submits the form on our webpage, and then they are redirected to a payment provider
  3. The payment is made on the payment provider side between the provider and the user
  4. After a successful payment, the user is redirected to the webshop, which can verify the payment with PayPal and fulfill the order

The code

We’ll use Python with the Flask framework. For the sake of readability, I've included only the relevant parts.

The client side form

<form id="purchaseForm" action="https://paymentprovider.com/checkout" method="POST">
  Item name & short description
  <input type="hidden" name="item_id" value="{{ item.id }}" />
  <input type="hidden" name="price" value="{{ item.price }}" />
  <input type="hidden" name="return_url" value="https://webshop.com/validate_payment" />
  <input type="submit" value="Purchase" />
</form>

The server side

from flask import Flask, request

app = Flask(__name__)

@app.route("/validate_payment", methods=["GET"])
def validate_payment():
  transaction_id = request.args.get("transaction_id")
  transaction_status = verify_payment_with_provider(transaction_id)

  if transaction_status == "completed": 
    item_id = request.args.get("item_id")
    mark_order_as_complete(item_id)
    return "Payment successful"
  else:
    return "Payment failed"

def verify_payment_with_provider(transaction_id):
  # Simulated verification with the payment provider
  return "completed"

def mark_order_as_complete(item_id):
  # Update order status in the database
  pass

if __name__ == "__main__":
  app.run()

The analysis

1. Preprocessing the code

To analyze the code, we want to give the LLM the best chance to understand the semantics. This may involve tokenization, i.e., converting the code into something more amenable to natural language processing. Here, we'll simply extract the relevant parts and send them to the LLM, as in the sketch below.
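
As a rough illustration, the preprocessing step can be as small as this sketch (the file paths and the snippet boundaries are assumptions, not part of the original project):

from pathlib import Path

def load_snippet(path, start=0, end=None):
    # Read a source file and keep only the lines we consider relevant.
    lines = Path(path).read_text().splitlines()
    return "\n".join(lines[start:end])

# Hypothetical file locations for our webshop example.
html_snippet = load_snippet("templates/purchase_form.html")
python_snippet = load_snippet("app.py")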

2. Feed the code to the LLM

Once the code is pre-processed, we can feed it into the LLM for analysis. We build a context for the LLM by specifying the prompt and any relevant parameters or settings.

Now we'll use OpenAI's GPT-4 model to analyze the code with this simple prompt:

The prompt

Analyze the following code snippets from a webshop application. The code consists of
an HTML form for purchases that is submitted to the payment provider. The server-side 
Python code handles the redirection and validates the transaction ID with the payment
provider. Identify any potential security vulnerabilities or semantic issues in the
code. Provide the results in a table with the following columns: id, source code line
number, short summary, description, suggested mitigation, and severity.

HTML form, rendered with Jinja:
<code_snippet>

Server-side code (Python with Flask):
<code_snippet>

I used OpenAI's ChatGPT interface and requested a table as a response, but it seems easy to integrate with an IDE or a CI system and ask ChatGPT to format the output accordingly.
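
For reference, such an integration could look roughly like this sketch with the OpenAI Python client (prompt_text stands for the instructions above, the snippet variables come from the preprocessing step, and the API key is a placeholder):

import openai

openai.api_key = "sk-..."  # placeholder; load from a secret store in practice

# prompt_text holds the instruction text shown above; the snippets were
# prepared during preprocessing.
prompt = (
    prompt_text
    + "\n\nHTML form, rendered with Jinja:\n" + html_snippet
    + "\n\nServer-side code (Python with Flask):\n" + python_snippet
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)

print(response.choices[0].message.content)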

3. Analyzing the LLM response

After processing the input, the LLM generates a response that may contain potential vulnerabilities, suggestions for code improvements, or other insights.

These results need to be taken with a grain of salt (just like static analyzer results). Treat them as "findings", not "vulnerabilities", and act accordingly.
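
If you wire this into a CI system and ask for machine-readable output instead of a table (say, JSON), a first gate over the findings could be as simple as this sketch (the field names are assumptions about the requested format):

import json
import sys

# llm_response_text is the raw model output, assumed here to be a JSON list
# of findings with "severity" and "short summary" fields.
findings = json.loads(llm_response_text)
high_findings = [f for f in findings if f.get("severity") == "High"]

for finding in high_findings:
    print("HIGH:", finding.get("short summary"))

if high_findings:
    sys.exit(1)  # fail the pipeline so a human reviews the findings first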

The generated response:

  1. Price tampering (HTML: 3, severity: High)
     Description: The price value is easily modifiable by the user, which can lead to incorrect payment amounts.
     Suggested mitigation: Move price handling to the server-side and not include it in the form.
  2. Insecure return URL (HTML: 4, severity: Medium)
     Description: The return URL is hardcoded and could be manipulated by a user to redirect to a different site.
     Suggested mitigation: Move the return URL to the server-side configuration and do not include it in the form.
  3. GET method usage (Python: 8, severity: Medium)
     Description: The validate_payment() function uses the GET method, which can lead to sensitive data exposure.
     Suggested mitigation: Change the request method to POST and use request.form.get() instead of request.args.get().
  4. Insufficient validation (Python: 9, 12, severity: High)
     Description: The server-side code lacks proper validation of the transaction_id, item_id, and transaction_status values.
     Suggested mitigation: Add validation checks for transaction_id, item_id, and transaction_status.
  5. Mock payment verification (Python: 18, severity: High)
     Description: The verify_payment_with_provider() function always returns "completed", which is insecure.
     Suggested mitigation: Replace the mock implementation with an actual call to the payment provider's API.

This is already valuable. Try it out, and see how you like it.

Let's take a look at the findings.

The findings

  1. Price tampering:
    Score! This is what I was looking for! It “understood” the price concept and created a finding based on it. I don't think a static analyzer would easily find this.
    Interestingly, if we calculate the CVSS base score of this vulnerability, it comes out at 6.5, which is usually considered medium; in our case, though, I find the LLM’s high severity justified.
  2. Insecure return URL:
    True, but this should also be caught by static analysis. See our previous post.
  3. GET method usage:
    Fail! It understood the dangers of using GET here, but judging by the suggested mitigation, it performed no better than a static analyzer: it failed to understand the flow with the payment provider.
  4. Insufficient validation:
    True, but this should also be found by static analysis. Also, in this form, the high severity is debatable.
  5. Mock payment verification:
    Score! ChatGPT understood that this is a mock implementation. Obviously, not checking the transactions at all justifies the high severity.

The mitigation results

The suggested mitigations look reasonable. Let’s implement them!

Act as an experienced Python web developer.
Modify the provided code snippets by applying the suggested fixes.

LLM Response

I've applied the suggested fixes to the code snippets below.

HTML form, rendered with Jinja:

<form id="purchaseForm" action="https://webshop.com/submit_purchase" method="POST">
  <input type="hidden" name="item_id" value="{{ item.id }}" />
  <input type="submit" value="Purchase" />
</form>

Server-side code (Python with Flask):

from flask import Flask, request, redirect

app = Flask(__name__)

@app.route("/submit_purchase", methods=["POST"])
def submit_purchase():
    item_id = request.form.get("item_id")
    item = get_item_by_id(item_id)  # You should implement this function
    if not item:
        return "Invalid item ID", 400

    return_url = "https://webshop.com/validate_payment"
    price = item.price
    data = {
        "item_id": item_id,
        "price": price,
        "return_url": return_url
    }

    response = submit_payment_to_provider(data)  # You should implement this function
    return redirect(response.url)  # This should redirect to the payment provider's URL

@app.route("/validate_payment", methods=["POST"])
def validate_payment():
    transaction_id = request.form.get("transaction_id")
    item_id = request.form.get("item_id")
    transaction_status = verify_payment_with_provider(transaction_id)

    if not transaction_id or not item_id or transaction_status not in ["completed", "failed"]:
        return "Invalid request parameters", 400

    if transaction_status == "completed":
        mark_order_as_complete(item_id)
        return "Payment successful"
    else:
        return "Payment failed"

def verify_payment_with_provider(transaction_id):
    # Replace this function with an actual call to the payment provider's API
    pass

def get_item_by_id(item_id):
    # Retrieve item details from the database based on the item_id
    pass

def submit_payment_to_provider(data):
    # Submit the payment data to the payment provider and return their response
    pass

def mark_order_as_complete(item_id):
    # Update order status in the database
    pass

if __name__ == "__main__":
    app.run()

This updated code addresses the identified vulnerabilities and semantic issues. The price and return URLs are now handled server-side, the validate_payment function uses the POST method, and additional validation checks are in place. The verify_payment_with_provider function should be replaced with an actual call to the payment provider's API.

In keeping with our topic, let’s focus on the semantic issues. It fixed the issue by implementing a different payment flow without us mentioning it. (It makes the server submit the payment, which may or may not be supported by our provider. In reality, it should be supported. 🙂)

4. Refine & Iterate

The initial analysis provided by the LLM is usually not perfect, and further refinement may be necessary. We can refine the input and the parameters of the LLM or provide additional context to improve the analysis, such as documentation and more (or less?) code.

This is an iterative process that helps the LLM gain a more accurate and comprehensive understanding of the semantics of the code.

For our example, a possible improvement is to provide more context about the payment provider API and the order processing logic, or to explain the "dummy" functions. We can also add context by specifying the desired security requirements or standards (e.g., PCI DSS, FIPS 140, ...).
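
For instance, a follow-up message along these lines (the wording is only illustrative) could steer the next iteration:

The functions verify_payment_with_provider, get_item_by_id, submit_payment_to_provider,
and mark_order_as_complete are placeholders; assume they are implemented correctly.
The payment provider is PayPal Express Checkout. Re-evaluate your findings with the
PCI DSS requirements in mind and adjust the severities accordingly.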

5. Conclusion

LLMs can be a very powerful tool for code analysis. However, their effectiveness depends greatly on the context and how they are used. Mastering them is more like learning software development than learning to use a new tool. We need to learn new skills, such as crafting precise queries, providing sufficient context, and refining the responses.

As with all code scanners, an LLM's findings should be viewed as potential problems, not confirmed vulnerabilities. We must carefully analyze the suggestions and the potential impact of the proposed changes; blindly applying them does more harm than good.

In summary, incorporating LLMs into the SDLC has the potential to change the way we approach code analysis and security. However, it is critical that we develop the skills necessary to use them properly to achieve the best results. If we do, we'll have a potent tool to help us build better (and more secure) software.
