AI-Powered Testing: Generate Unit Tests, Find Bugs, and Automate QA with LLMs
Use AI to write unit tests, generate test cases from requirements, and find bugs before users do. Complete tutorial with Python examples for pytest, coverage analysis, and CI integration.
Testing is the part of software development that everyone agrees is important and nobody wants to do. Writing tests is tedious. Maintaining tests is worse. And the coverage gap — the code paths you didn’t test because you didn’t think of them — is where production bugs hide.
AI can help with all three problems. It can generate unit tests from source code, create test cases from requirements documents, identify untested edge cases, and even find bugs by reasoning about code logic. It won’t replace a QA engineer, but it will make every developer’s testing faster and more thorough.
Here’s how to integrate AI into your testing workflow.
What AI Testing Can (and Can’t) Do
Can do:
- Generate unit tests from function signatures and docstrings
- Identify edge cases humans miss (null inputs, boundary values, race conditions)
- Create test data fixtures
- Convert requirements into test cases
- Explain what existing tests do (and don’t) cover
- Suggest improvements to existing test suites
Can’t do (reliably):
- Replace integration testing (AI doesn’t know your infrastructure)
- Guarantee 100% correctness (generated tests may have bugs too)
- Understand business logic without context
- Replace human judgment on test priorities
Setup
pip install pytest pytest-cov anthropic python-dotenv
Step 1: Generate Unit Tests from Code
# test_generator.py
"""Generate unit tests from source code using Claude."""
import os
import json
import anthropic
from dotenv import load_dotenv
load_dotenv()
class TestGenerator:
def __init__(self):
self.client = anthropic.Anthropic(
api_key=os.getenv('ANTHROPIC_API_KEY')
)
def generate_tests(
self,
source_code: str,
test_framework: str = "pytest",
coverage_targets: list[str] = None
) -> str:
"""
Generate unit tests for given source code.
Args:
source_code: The Python code to test
test_framework: Testing framework (pytest, unittest)
coverage_targets: Specific functions/methods to focus on
Returns:
Generated test code as a string
"""
targets = ""
if coverage_targets:
targets = f"\nFocus on testing these functions: {', '.join(coverage_targets)}"
prompt = f"""Generate comprehensive {test_framework} unit tests for this code:
```python
{source_code}
{targets}
Requirements:
- Test all public functions and methods
- Include tests for:
- Normal/happy path cases
- Edge cases (empty inputs, None, zero, negative numbers)
- Boundary values
- Error handling (expected exceptions)
- Type validation
- Use descriptive test names that explain what’s being tested
- Include docstrings for each test explaining intent
- Use {test_framework} fixtures where appropriate
- Add parametrize decorators for similar test cases
- Mock external dependencies if present
Return ONLY the test code. No explanation. Include all necessary imports."""
response = self.client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system=(
"You are a senior test engineer. Generate thorough, "
"production-quality unit tests. Every test should have "
"a clear purpose and test exactly one behavior."
),
messages=[{"role": "user", "content": prompt}]
)
test_code = response.content[0].text
# Extract code from markdown if present
if "```python" in test_code:
test_code = test_code.split("```python")[1].split("```")[0]
elif "```" in test_code:
test_code = test_code.split("```")[1].split("```")[0]
return test_code.strip()
def find_missing_coverage(
self,
source_code: str,
existing_tests: str
) -> str:
"""Identify untested code paths and generate additional tests."""
prompt = f"""Analyze this source code and its existing tests.
Identify code paths that are NOT covered by the existing tests. Generate additional tests for the uncovered paths.
Source code:
{source_code}
Existing tests:
{existing_tests}
Return:
- A comment block listing the uncovered code paths
- Additional pytest tests covering those paths
Return ONLY Python code (comments + tests). No prose."""
response = self.client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system="You are a test coverage analyst. Find and fill testing gaps.",
messages=[{"role": "user", "content": prompt}]
)
result = response.content[0].text
if "```python" in result:
result = result.split("```python")[1].split("```")[0]
return result.strip()
---
## Step 2: Example — Generating Tests
Given this source code:
```python
# calculator.py
"""A calculator module with various operations."""
from typing import Union
Number = Union[int, float]
class Calculator:
def __init__(self, precision: int = 2):
self.precision = precision
self.history: list[str] = []
def add(self, a: Number, b: Number) -> Number:
"""Add two numbers."""
result = round(a + b, self.precision)
self.history.append(f"{a} + {b} = {result}")
return result
def divide(self, a: Number, b: Number) -> float:
"""Divide a by b. Raises ZeroDivisionError if b is 0."""
if b == 0:
raise ZeroDivisionError("Cannot divide by zero")
result = round(a / b, self.precision)
self.history.append(f"{a} / {b} = {result}")
return result
def percentage(self, value: Number, total: Number) -> float:
"""Calculate what percentage value is of total."""
if total == 0:
raise ValueError("Total cannot be zero")
result = round((value / total) * 100, self.precision)
return result
def get_history(self) -> list[str]:
"""Return operation history."""
return self.history.copy()
def clear_history(self) -> None:
"""Clear operation history."""
self.history.clear()
The AI generates:
# test_calculator.py (AI-generated)
import pytest
from calculator import Calculator
@pytest.fixture
def calc():
"""Create a Calculator instance for testing."""
return Calculator(precision=2)
class TestAdd:
"""Tests for Calculator.add method."""
def test_add_positive_integers(self, calc):
"""Adding two positive integers returns their sum."""
assert calc.add(2, 3) == 5
def test_add_negative_numbers(self, calc):
"""Adding negative numbers works correctly."""
assert calc.add(-2, -3) == -5
def test_add_mixed_sign(self, calc):
"""Adding positive and negative numbers works."""
assert calc.add(5, -3) == 2
def test_add_floats(self, calc):
"""Adding floats respects precision setting."""
assert calc.add(1.111, 2.222) == 3.33
def test_add_zero(self, calc):
"""Adding zero returns the other number."""
assert calc.add(5, 0) == 5
assert calc.add(0, 5) == 5
@pytest.mark.parametrize("a,b,expected", [
(0, 0, 0),
(1e10, 1e10, 2e10),
(-1e10, 1e10, 0),
(0.1, 0.2, 0.3),
])
def test_add_parametrized(self, calc, a, b, expected):
"""Test add with various input combinations."""
assert calc.add(a, b) == expected
def test_add_records_history(self, calc):
"""Adding records the operation in history."""
calc.add(2, 3)
assert len(calc.get_history()) == 1
assert "2 + 3 = 5" in calc.get_history()[0]
class TestDivide:
"""Tests for Calculator.divide method."""
def test_divide_normal(self, calc):
"""Normal division returns correct result."""
assert calc.divide(10, 2) == 5.0
def test_divide_by_zero_raises(self, calc):
"""Dividing by zero raises ZeroDivisionError."""
with pytest.raises(ZeroDivisionError, match="Cannot divide by zero"):
calc.divide(10, 0)
def test_divide_zero_numerator(self, calc):
"""Dividing zero by any non-zero number returns 0."""
assert calc.divide(0, 5) == 0.0
def test_divide_float_result(self, calc):
"""Division with non-integer result respects precision."""
assert calc.divide(1, 3) == 0.33
def test_divide_negative(self, calc):
"""Division with negative numbers has correct sign."""
assert calc.divide(-10, 2) == -5.0
assert calc.divide(10, -2) == -5.0
assert calc.divide(-10, -2) == 5.0
class TestPercentage:
"""Tests for Calculator.percentage method."""
def test_percentage_normal(self, calc):
"""Normal percentage calculation."""
assert calc.percentage(25, 100) == 25.0
def test_percentage_zero_total_raises(self, calc):
"""Zero total raises ValueError."""
with pytest.raises(ValueError, match="Total cannot be zero"):
calc.percentage(10, 0)
def test_percentage_over_100(self, calc):
"""Value greater than total gives percentage over 100."""
assert calc.percentage(150, 100) == 150.0
def test_percentage_zero_value(self, calc):
"""Zero value gives 0 percentage."""
assert calc.percentage(0, 100) == 0.0
class TestHistory:
"""Tests for history tracking."""
def test_empty_history(self, calc):
"""New calculator has empty history."""
assert calc.get_history() == []
def test_history_records_operations(self, calc):
"""Multiple operations are recorded in order."""
calc.add(1, 2)
calc.divide(10, 5)
history = calc.get_history()
assert len(history) == 2
def test_clear_history(self, calc):
"""Clear history empties the list."""
calc.add(1, 2)
calc.clear_history()
assert calc.get_history() == []
def test_history_returns_copy(self, calc):
"""get_history returns a copy, not a reference."""
calc.add(1, 2)
history = calc.get_history()
history.clear()
assert len(calc.get_history()) == 1
class TestPrecision:
"""Tests for precision configuration."""
def test_custom_precision(self):
"""Custom precision affects rounding."""
calc = Calculator(precision=4)
assert calc.add(1.11111, 2.22222) == 3.3333
def test_zero_precision(self):
"""Zero precision rounds to integers."""
calc = Calculator(precision=0)
assert calc.add(1.5, 2.5) == 4
Step 3: Run and Measure Coverage
# Run tests with coverage
pytest test_calculator.py -v --cov=calculator --cov-report=term-missing
# Output:
# Name Stmts Miss Cover Missing
# -----------------------------------------------
# calculator.py 25 0 100%
# -----------------------------------------------
# TOTAL 25 0 100%
Step 4: Integrate into CI/CD
# .github/workflows/ai-tests.yml
name: AI-Assisted Testing
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dependencies
run: pip install pytest pytest-cov
- name: Run tests
run: pytest --cov --cov-report=xml -v
- name: Check coverage threshold
run: |
coverage report --fail-under=80
Step 5: Bug Finding with AI
def find_bugs(self, source_code: str) -> list[dict]:
"""Analyze code for potential bugs."""
prompt = f"""Analyze this Python code for potential bugs,
edge cases, and logic errors.
```python
{source_code}
For each issue found, return JSON: [ {{ “severity”: “critical|high|medium|low”, “line”: “approximate line number or function name”, “issue”: “description of the problem”, “example”: “input that triggers the bug”, “fix”: “suggested fix” }} ]
Look for:
- Off-by-one errors
- Null/None handling gaps
- Type coercion issues
- Race conditions
- Resource leaks
- Integer overflow
- Incorrect error handling
- Missing input validation
Return ONLY the JSON array."""
response = self.client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=2048,
system="You are a senior code reviewer specializing in bug detection.",
messages=[{"role": "user", "content": prompt}]
)
result_text = response.content[0].text
if "```" in result_text:
result_text = result_text.split("```")[1]
if result_text.startswith("json"):
result_text = result_text[4:]
result_text = result_text.split("```")[0]
try:
return json.loads(result_text.strip())
except json.JSONDecodeError:
return [{"error": "Failed to parse bug analysis"}]
---
## Best Practices
1. **Always review generated tests.** AI-generated tests can contain errors — wrong assertions, incorrect test logic, or tests that pass for the wrong reasons.
2. **Use AI for the first draft, humans for the final version.** Let AI generate 80% of your tests, then manually add the business-logic-specific cases that require domain knowledge.
3. **Run coverage analysis after AI test generation.** Use the coverage report to identify gaps, then ask the AI to fill them specifically.
4. **Don't trust AI for security testing.** AI can find basic input validation issues but misses complex security vulnerabilities. Use dedicated security tools (Bandit, safety, SAST tools) for that.
5. **Version control your test generation prompts.** As you refine your prompts to produce better tests, track those changes just like code.
The goal isn't AI-generated tests that replace human testing. It's AI-generated tests that give you a solid baseline, so human testing time is spent on the complex, creative test scenarios that actually catch the bugs that matter.
Write the boring tests with AI. Save your brain for the interesting ones. Sources
> Want more like this?
Get the best AI insights delivered weekly.
> Related Articles
Web Scraping with AI: Build a Smart Data Extraction Pipeline
Traditional web scraping breaks when websites change layouts. AI-powered scraping understands page structure and extracts data intelligently. Here's how to build one using Python, Beautiful Soup, and Claude.
Create an AI Art Portfolio: From Generation to Gallery in One Weekend
Build a professional AI art portfolio website with curated collections, consistent style, and proper attribution. Covers prompt engineering, style consistency, curation, and deployment.
Build an AI Chrome Extension: Add Claude to Any Webpage in 60 Minutes
Build a Chrome extension that summarizes web pages, answers questions about content, and rewrites selected text — all powered by Claude. Full source code and step-by-step instructions included.
Tags
> Stay in the loop
Weekly AI tools & insights.