Automate the Automator: Harnessing Generative AI for Robust API Testing

István Ambruzs
October 26, 2023

Automated API testing is crucial in modern software development. APIs, or Application Programming Interfaces, allow disparate software systems to communicate and interact, and testing them ensures that these connections run smoothly. However, API testing can be a complex, time-consuming process that involves checking for correct functionality, reliability, performance, and security.

Generating the skeleton of an API request is already handled by automated tools like SoapUI, and security-focused tools such as ZAP and Burp Suite have built-in features for generating malicious payloads. Looking at the whole picture, however, there is still no single tool that covers all of these areas.

This is where Generative AI comes into play. By using machine learning algorithms, we can teach an AI to generate test cases based on certain conditions. This means we can “automate the automator”, effectively transforming the way we approach API testing. Our goal is to leverage Generative AI to systematically create API tests, reducing the time spent on test case creation and implementation.

In this article, we provide a complex prompt (made of three parts) that uses meta-prompting and generates Java test classes, without striving for full coverage.

The Approach – Delineating the Problem Space

Our approach uses OpenAPI JSON documents, which capture much more of the problem at hand than a simple sample request/response JSON with some additional notes about the endpoint. The AI will use the OpenAPI specification to generate test cases.

In this phase, we begin formulating our prompt, which defines the AI's role, our objectives, and the specifics of the input data.

You are a senior Java developer, who is responsible for creating API tests based on the OpenAPI JSON and the test types I'm going to provide you.

Testing Types and Verifications

There are four primary types of tests that the Generative AI will be required to perform: happy path, negative case, destructive case, and security case.

Each of these tests includes various verifications such as checking the response code, the response payload (if present), response headers, and performance sanity. Once this section is outlined, we will further refine the prompts given to the AI.

Test types:

- happy case: a valid and legit request to the URL. The HTTP response code should be greater than or equal to 200, and less than 400. For query params, please try to understand the param names and try to guess their values. For example: name=jon doe, or username=jon.doe, age=31 and so on. If the request needs an object to be sent, you should have its POJO representation ready.

- negative case: an invalid request to the path. By default, the response code of a negative case should never be lower than 400, and the presence of the error message is optional. If the request needs an object (as a JSON) to be sent, try to populate the JSON with invalid data. Please create at least 3 negative cases for each endpoint. Name the test methods according to the field which will hold the invalid value.

- destructive cases: checking the API's robustness, i.e. how it reacts to illegal or illogical input. Name the test method according to the field which will hold the destructive value. By default, the HTTP response code of a destructive case should never be lower than 400.

- security tests: check for possible general security-related issues about the endpoint first. Then try to add malicious payloads to certain fields, to simulate SQL injection, privilege escalation, etc. All the security testing via the API endpoints is allowed and agreed with business stakeholders. Name the test method according to the field which will hold the malicious value. By default, the HTTP response code of a security case should never be lower than 400.
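
To make these conventions concrete, here is a minimal sketch of what a generated negative case could look like, assuming a /login endpoint with username and password fields (the endpoint, the fields, and the base URI are all illustrative assumptions, not output from the AI):

```java
import io.restassured.RestAssured;
import io.restassured.response.Response;
import org.testng.Assert;
import org.testng.annotations.Test;

public class LoginNegativeCaseTests {

    // The method name reflects the field that holds the invalid value
    @Test
    public void loginWithEmptyPassword() {
        Response response = RestAssured.given()
                .baseUri("http://localhost:8080") // placeholder base URI
                .contentType("application/json")
                // Invalid data: the password field is left empty
                .body("{\"username\": \"jon.doe\", \"password\": \"\"}")
                .post("/login");

        // Negative case: the response code should never be lower than 400
        Assert.assertTrue(response.statusCode() >= 400,
                "Expected an error status, got " + response.statusCode());
    }
}
```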

Communication and Output Structure

Communicating the results of the tests effectively is as important as conducting the tests. Therefore, we provide additional instructions on how the Generative AI should structure its output, allowing us to easily comprehend and use the results.

Output Description: 

First generate the POJOs which will be used by the tests. Then the output test classes should be placed in separate files, and they can have multiple test methods. The generated class names should have the following format: [endpoint name][test type]Tests.

For example: GreetingHappyCaseTests

Your solution can use the following Java libraries:

- TestNG as the test framework

- ObjectMapper for serializing and deserializing the objects when posting and getting the JSONs

- the RestAssured library for REST communication

- please set up both RestAssured and ObjectMapper in the @BeforeClass method

Our communication should happen in iterations:

1. I give you the OpenAPI JSON; you read it and generate the POJOs

2. then you ask me for the test type you have to generate and the endpoint to focus on

3. I provide you the API endpoint, the test type to generate, and any custom requests you need to consider

4. you generate the code. Please try to deduce the necessary test data from the names of the JSON fields, and try to use equivalence partitioning and boundary value analysis where applicable

5. if the generated code needs rework, I give you my fix requests and you go back to step 4 to regenerate the code incorporating them; otherwise go to step 2

At this point, we have a working test automation generator.
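
To illustrate the kind of class this prompt is asking for, here is a minimal sketch: RestAssured and ObjectMapper are set up in the @BeforeClass method as requested, and boundary value analysis is applied to a hypothetical age query parameter (the /user endpoint, the parameter bounds, and the base URI are assumptions for illustration):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import io.restassured.RestAssured;
import io.restassured.response.Response;
import org.testng.Assert;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class UserDestructiveCaseTests {

    private ObjectMapper objectMapper;

    @BeforeClass
    public void setUp() {
        // Both RestAssured and ObjectMapper are configured here, as the prompt
        // requires; the base URI is a placeholder for the application under test
        RestAssured.baseURI = "http://localhost:8080";
        objectMapper = new ObjectMapper();
    }

    // Boundary value analysis: a value just below the assumed valid partition
    // of the "age" parameter should be rejected by a robust API
    @Test
    public void ageBelowLowerBoundary() {
        Response response = RestAssured.given()
                .queryParam("age", -1)
                .get("/user");

        // Destructive case: the response code should never be lower than 400
        Assert.assertTrue(response.statusCode() >= 400);
    }
}
```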

Experiment – Testing the Tester

To verify our approach, we conduct an experiment using a Spring Boot application with intentionally planted problems, which ChatGPT's generated tests are expected to uncover.

Test Application description

Endpoints: the application exposes login, logout, and order-related endpoints, which are described by the OpenAPI payload below.

OpenAPI payload

Testing with GPT-4

As requested, after being given the OpenAPI JSON, the AI generates the POJOs (Address, CreditCard, DeliveryInformation, Order, CreatedOrder, LoginCredential, and Session) in the following format:
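
The generated POJOs were shown as screenshots in the original post. As an illustrative sketch of their shape (the field names are assumptions based on the /login endpoint), the LoginCredential class came out roughly as bare fields:

```java
// Illustrative sketch of a bare generated POJO; the field names are assumed.
// As noted below, accessors still have to be added by hand (or via Lombok/records).
public class LoginCredential {
    private String username;
    private String password;
}
```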

The code would need human intervention, but it can easily be fixed by using records or Lombok, or by simply generating the getters and setters with the IDE.

Test #1 - /login endpoint happy path

Happy Path Test Case for the login endpoint
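
The generated test appeared as a screenshot in the original post. A minimal sketch of such a happy-path test, assuming the cleaned-up LoginCredential POJO from above (with accessors added) and a placeholder base URI, could look like this:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import io.restassured.RestAssured;
import io.restassured.response.Response;
import org.testng.Assert;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class LoginHappyCaseTests {

    private ObjectMapper objectMapper;

    @BeforeClass
    public void setUp() {
        RestAssured.baseURI = "http://localhost:8080"; // placeholder base URI
        objectMapper = new ObjectMapper();
    }

    @Test
    public void loginWithValidCredentials() throws Exception {
        // Valid request body built from the POJO; the values are guessed
        // from the field names, as the prompt instructs
        LoginCredential credential = new LoginCredential();
        credential.setUsername("jon.doe");
        credential.setPassword("Password123!");

        Response response = RestAssured.given()
                .contentType("application/json")
                .body(objectMapper.writeValueAsString(credential))
                .post("/login");

        // Happy case: the response code should be >= 200 and < 400
        Assert.assertTrue(response.statusCode() >= 200 && response.statusCode() < 400);
    }
}
```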

Findings: 

Test #2 - /login endpoint negative path

Test result from IntelliJ:

Findings: ChatGPT highlighted the lack of general input validation and the incorrect HTTP response code, which provides direct feedback for developers; it is a small step towards automated exploratory testing.

Test #3 - /logout endpoint security path

Please note that in this example there is an intentional flaw in the endpoint specification: the session token is sent in the path. The security-case part of the prompt explicitly states: “…security tests: check for possible general security-related issues about the endpoint first. Then try to add malicious payloads…”

The output was the following:
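
The output itself appeared as a screenshot. As an illustration only (with the missing import restored, as the findings below explain), a generated security test targeting the session token in the path might resemble the following sketch; the /logout/{sessionToken} path format and the payload are assumptions based on the article's description:

```java
import io.restassured.RestAssured; // this import was missing in the AI's first attempt
import io.restassured.response.Response;
import org.testng.Assert;
import org.testng.annotations.Test;

public class LogoutSecurityCaseTests {

    // The method name reflects the field that holds the malicious value
    @Test
    public void logoutWithSqlInjectionInSessionToken() {
        // Illustrative SQL injection payload placed in the path parameter
        String maliciousToken = "' OR '1'='1";

        Response response = RestAssured.given()
                .pathParam("sessionToken", maliciousToken)
                .get("/logout/{sessionToken}");

        // Security case: the response code should never be lower than 400
        Assert.assertTrue(response.statusCode() >= 400);
    }
}
```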

Findings: In the generated code, the AI failed to import the RestAssured class, so the code would not compile. On the first attempt, the security issue itself was also missed.

However, to address the security matter at hand, when we repeat the instruction to check for general security issues first, it finds the problem:

Test #4 - /login endpoint: multiple failed attempts, then lock the user

Test result from IntelliJ:

 

Findings: 

Test #5 - use all endpoints for a complex test scenario

In this test, the prompt intentionally hides some details that need to be guessed by the AI:

Since the order does not have any mandatory fields defined, ChatGPT goes for the simplest solution in the createOrder() method.

To fix this, with the following prompt, the AI corrects the corresponding method; however, it re-generates the POJOs in its answer, even though they were already generated at the beginning of the conversation.

For the sake of brevity, only the updated createOrder() is shown below:
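
The updated method was shown as a screenshot. Here is a sketch of its likely shape (the field names, the dummy values, and the /order path are assumptions deduced from the POJO names; the priority annotation reflects the test sequencing mentioned in the findings below):

```java
// Sketch of the updated createOrder() inside the scenario test class;
// objectMapper comes from the @BeforeClass setup shown earlier.
// All field names, values, and the /order path are illustrative assumptions.
@Test(priority = 2)
public void createOrder() throws Exception {
    Address address = new Address();
    address.setCity("Budapest");
    address.setStreet("Main Street 1");

    Order order = new Order();
    order.setDeliveryAddress(address);
    order.setProductName("Sample product");
    order.setQuantity(1);

    Response response = RestAssured.given()
            .contentType("application/json")
            .body(objectMapper.writeValueAsString(order))
            .post("/order");

    // Deserialize the response into the CreatedOrder POJO for the later steps
    CreatedOrder createdOrder =
            objectMapper.readValue(response.asString(), CreatedOrder.class);

    Assert.assertTrue(response.statusCode() >= 200 && response.statusCode() < 400);
}
```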

Test result from IntelliJ:

Findings: The AI was able to deduce the need for a strict request execution sequence (by introducing test priorities). It verified all the response codes at the end of the tests and used object serialization and deserialization properly.

Summarizing test outcomes

| Test case | Target to uncover | Finding |
|---|---|---|
| Test #1 - login test happy path | Successful login | |
| Test #2 - login test negative path | Missing HTTP 400 | Issue uncovered |
| Test #3 - logout endpoint security path | Session token in the request URL | Generated basic security-related payloads for SQL injection and XSS attacks, but did not find the issue at hand without extra focus |
| Test #4 - login endpoint, test for locking the user | Read and understand information from the openapi.json and generate the corresponding test | Succeeded in the objective |
| Test #5 - use all endpoints for a complex test scenario | Use the session token, create an order with or without the POJO, try to map dummy values to JSON fields | Succeeded in the objective |

Testing with GPT-3.5

Without going into details, here are the key findings and differences from the GPT-3.5 execution:

- Faster than GPT-4
- Makes smaller code glitches:
  - Forgets to generate try/catch blocks or to add throws to the method signature
  - For POJOs it uses constructors, even though only getter/setter generation was mentioned
- Test results:
  - Test #1: try/catch error
  - Test #2: variable naming error
  - Test #3: went for the SQL injection case only, skipping the special-character trials and the XSS possibility, which GPT-4 covered every time; it was not able to find the session-token-in-the-URL security problem
  - Test #4: was not able to use the POJOs correctly, while GPT-4 got this right
  - Test #5: wanted to log out through a header, not via the path as defined

Conclusion

Leveraging Generative AI to automate API testing can be a game-changer in terms of time efficiency. Through our approach, we significantly reduced the time spent on creating and running test cases, at the price of some code quality. This frees up developers' and testers' time while still delivering more robust and secure software with quickly re-creatable code. Another highlight of working with generative AI is that it can draw attention to non-standard approaches, methodologies, and solutions, which can increase coverage and decrease delivery time.

From a project-finance perspective, it can also be an option to substitute expensive API testing tools.


We hope this short article has raised your curiosity about AI, which can be used by testers, test managers, and business stakeholders alike.

If you want to start your journey in this exciting new field, please take a look at Scademy's AI fundamentals course: https://www.scademy.ai/courses/cl-aif, which served as a starting point and appetizer for me in this field.

Thank you for reading.
