Writing a Custom Nuclei Template to Detect Excessive Data Exposure in API Endpoints

Detecting excessive data exposure in API endpoints requires a targeted approach, and custom Nuclei templates offer the flexibility to pinpoint specific sensitive data leakage patterns. This isn't about generic error messages; it's about explicitly looking for fields that an API should never return to unauthorized or even typically authorized clients, such as internal identifiers, password hashes, or private keys, often due to misconfigurations or overly permissive default serialization settings. The goal is to build a reliable signature that flags these critical exposures without unnecessary noise.

Understanding Excessive Data Exposure in APIs

Excessive data exposure is a common API security vulnerability. It was listed as API3:2019 in the OWASP API Security Top 10 and folded into API3:2023, Broken Object Property Level Authorization, in the 2023 edition. It occurs when an API returns more data than a client needs for its legitimate function. This over-fetching can expose sensitive information, ranging from user PII (Personally Identifiable Information) like email addresses and phone numbers to system-critical values such as API keys, database connection strings, or internal infrastructure details. A standard user profile endpoint, for instance, might return a password_hash or jwt_secret field that the client-side application never uses or displays. Finding these cases by hand across many endpoints is slow and prone to oversight, which is why automating the check with a tool like Nuclei pays off.
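
To make the definition concrete, here is a minimal Python sketch that compares the keys an endpoint returns against the keys the client actually needs. The allowlist and the sample response are assumptions made up for illustration; they do not come from any real API.

```python
import json

# Hypothetical allowlist: the fields a profile screen actually needs.
ALLOWED_FIELDS = {"id", "display_name", "avatar_url"}

def excess_fields(response_body: str, allowed: set) -> set:
    """Return top-level JSON keys that are not on the allowlist."""
    data = json.loads(response_body)
    return set(data) - allowed

# Made-up response that leaks two fields the client never uses.
sample = '{"id": 7, "display_name": "ana", "password_hash": "x", "jwt_secret": "y"}'
print(sorted(excess_fields(sample, ALLOWED_FIELDS)))  # ['jwt_secret', 'password_hash']
```

Anything this reports is a candidate field name to encode in a template matcher.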

Crafting the Nuclei Request

The foundation of any Nuclei template is the HTTP request definition. When targeting API endpoints for excessive data exposure, we typically focus on common data retrieval methods, primarily GET requests, but POST requests that return data (e.g., login or data submission responses) are also relevant. For an API, the path often follows predictable patterns like /api/v1/users/{id} or /data/reports. We need to construct a request that simulates a legitimate interaction, possibly including authentication headers if the endpoint is protected.

Consider a scenario where we suspect an endpoint /api/v2/user/profile might expose internal user IDs or other sensitive data. A basic Nuclei HTTP request block would look like this:


id: api-excessive-data-exposure
info:
  name: API User Profile Excessive Data Exposure
  author: your-pentester-handle
  severity: high
  description: Detects excessive data exposure in /api/v2/user/profile endpoint.
  reference:
    - https://owasp.org/API-Security/editions/2019/en/0xa3-excessive-data-exposure/
  tags: api,data-exposure,pii

http:
  - method: GET
    path:
      - "{{BaseURL}}/api/v2/user/profile"
    headers:
      # Common headers for API requests. Adjust as needed.
      User-Agent: "Nuclei-Scanner/1.0"
      Accept: "application/json"
      # If authentication is required, add a bearer token or API key here.
      # Authorization: "Bearer {{token}}"
    # A template needs at least one matcher or extractor to be valid.
    # This placeholder only confirms the endpoint answers; content matchers come later.
    matchers:
      - type: status
        status:
          - 200

Here, {{BaseURL}} is a dynamic variable supplied by Nuclei, representing the target host. The headers section is crucial for mimicking a real client, and if the API requires authentication, you'd insert a valid token. For initial reconnaissance and discovering publicly exposed services, tools like Zondex can be instrumental in identifying potential API endpoints that might be ripe for this kind of testing.
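
Before writing matchers, it helps to look at the raw response yourself. The following Python sketch builds, but does not send, the same GET request the template describes. The function name build_profile_request is hypothetical, and the host is a placeholder.

```python
from typing import Optional
from urllib.request import Request

def build_profile_request(base_url: str, token: Optional[str] = None) -> Request:
    """Build (but do not send) the GET request the template describes."""
    headers = {
        "User-Agent": "Nuclei-Scanner/1.0",
        "Accept": "application/json",
    }
    if token:
        # Only added when the endpoint requires authentication.
        headers["Authorization"] = f"Bearer {token}"
    url = base_url.rstrip("/") + "/api/v2/user/profile"
    return Request(url, headers=headers, method="GET")

req = build_profile_request("https://api.example.com")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; do that only against systems you are authorized to test.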

Handling Authentication for Protected Endpoints

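Many API endpoints are protected. To test them, your Nuclei template needs to include valid authentication. This could involve an Authorization header with a bearer token, an API key, or cookie-based authentication. For bearer tokens, you can define the value in the template's top-level variables block, or leave it out of the file and supply it at run time with Nuclei's -var flag (for example, -var token=...) so the secret is not committed alongside the template.


# variables is a top-level key, a sibling of info and http.
variables:
  token: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." # Replace with a valid JWT token

http:
  - method: GET
    path:
      - "{{BaseURL}}/api/v2/user/profile"
    headers:
      Authorization: "Bearer {{token}}"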

Remember that tokens can expire. For more dynamic scenarios, you might need a prior request to obtain a fresh token, though that adds complexity beyond a simple data exposure check for a known endpoint.
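
Since an expired token produces 401 responses that look like "nothing found", it can be worth checking the token before a scan. This is a minimal Python sketch that reads the exp claim from a JWT payload. It assumes a standard three-part JWT, does not verify the signature, and uses a hand-built demo token rather than a real one.

```python
import base64
import json
import time
from typing import Optional

def jwt_expiry(token: str) -> Optional[int]:
    """Return the 'exp' claim (Unix seconds) from a JWT payload, unverified."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp")

def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True if the token carries an 'exp' claim that is in the past."""
    exp = jwt_expiry(token)
    current = time.time() if now is None else now
    return exp is not None and exp <= current

# Demo with a hand-built, unsigned token whose payload is {"exp": 1000}.
payload = base64.urlsafe_b64encode(json.dumps({"exp": 1000}).encode()).rstrip(b"=").decode()
demo_token = f"e30.{payload}.sig"
print(is_expired(demo_token, now=2000))  # True
print(is_expired(demo_token, now=500))   # False
```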

Defining Matchers for Sensitive Data

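The core of detecting excessive data exposure lies in Nuclei's matchers. We need to define patterns that specifically look for sensitive fields within the API's response body. Nuclei offers several matcher types (status, word, regex, size, dsl, and others), but there is no json matcher. For JSON API responses, regex matchers are the most direct fit, and json extractors complement them by querying specific paths in the response.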

Using Regex Matchers

Regular expressions are powerful for identifying specific string patterns that denote sensitive information. Common fields to look for might include variations of 'password', 'secret', 'key', 'ssn', 'private', 'admin', or 'internal_id'.


    matchers:
      - type: regex
        name: "Sensitive Field Found"
        part: body
        regex:
          - '(?i)"(password_hash|jwt_secret|private_key|api_key|ssn|social_security_number|db_connection_string|internal_user_id)"\s*:'
        condition: or # Match if any regex in the list matches

The (?i) flag makes the regex case-insensitive. We're looking for quoted field names followed by a colon, which is how keys appear in JSON. The part: body setting applies the regex to the HTTP response body. The alternation inside the pattern is what triggers on *any* of the listed field names; condition: or governs how multiple entries in the regex list are combined, and it only matters once you add a second pattern. Matchers do not carry their own severity: a finding's severity always comes from the template's info block.
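
A pattern like this is easy to test offline before it goes into a template. The Python sketch below runs the same expression against two made-up response bodies. Nuclei is written in Go and uses Go's regexp engine, so Python's re module is only a stand-in here; for a simple pattern like this one the two should behave the same, but that is an assumption to verify for anything more exotic.

```python
import re

# Same pattern as the regex matcher, split across two lines for readability.
SENSITIVE_KEY = re.compile(
    r'(?i)"(password_hash|jwt_secret|private_key|api_key|ssn|social_security_number'
    r'|db_connection_string|internal_user_id)"\s*:'
)

def sensitive_keys(body: str) -> list:
    """Return every sensitive key name found in a raw JSON body."""
    return [m.group(1) for m in SENSITIVE_KEY.finditer(body)]

# Made-up bodies: one leaks two fields, one merely mentions a field name in a value.
leaky = '{"name": "ana", "Password_Hash": "$2b$12$abc", "api_key" : "k-123"}'
clean = '{"name": "ana", "bio": "I lost my api_key once"}'

print(sensitive_keys(leaky))  # ['Password_Hash', 'api_key']
print(sensitive_keys(clean))  # []
```

Requiring the surrounding quotes and the trailing colon is what keeps the second body from matching, even though it contains the text api_key.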

Utilizing JSON Extractors

Nuclei handles structured JSON queries through extractors, not matchers. An extractor of type json evaluates jq-style expressions (for example .user.passwordHash) against the response body, much as XPath queries an XML document. This is particularly useful if you know the exact or approximate location where sensitive data might appear.


    extractors:
      - type: json
        name: "Specific JSON Path Match"
        part: body
        json:
          - '.user.passwordHash'
          - '.user.privateKey'
          - '.settings.internalDbConfig'

Here, .user.passwordHash targets a passwordHash field nested under a user object at the root level. Extracted values are attached to the finding, so pairing a regex matcher with json extractors both flags the exposure and shows what leaked.
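
Path lookups can also be prototyped offline against a saved response. The Python sketch below resolves the three paths from the example above, written here as plain dotted paths without any query-language prefix. The helper names and the sample body are assumptions for illustration; this is not how Nuclei evaluates paths internally.

```python
import json
from typing import Any

_MISSING = object()  # sentinel: distinguishes "absent" from an explicit null

def resolve(doc: Any, path: str) -> Any:
    """Follow a dotted path like 'user.passwordHash'; return _MISSING if absent."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node

def exposed_paths(body: str, paths: list) -> list:
    """Return the subset of paths that exist in the JSON body."""
    doc = json.loads(body)
    return [p for p in paths if resolve(doc, p) is not _MISSING]

PATHS = ["user.passwordHash", "user.privateKey", "settings.internalDbConfig"]
body = '{"user": {"name": "ana", "passwordHash": "abc"}, "settings": {}}'
print(exposed_paths(body, PATHS))  # ['user.passwordHash']
```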

Status Code and Response Size Checks (for Context)

While not directly for data exposure, it's good practice to ensure your request was successful before applying sensitive data matchers. A status matcher can confirm a successful response (e.g., 200 OK).

      - type: status
        status:
          - 200
        name: "HTTP Status 200"

Nuclei has no word-count or line-count matcher, but a dsl matcher with an expression such as len(body) compared against a threshold you choose can flag unusually large responses that might hint at excessive data. Treat this as a weak signal: it is far less precise than a regex or a JSON path aimed at specific content.
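
The way these checks combine can be sketched in a few lines of Python: a status gate and a content check must both pass, while response size is kept as a separate, weaker hint. The Response class, the shortened pattern, and the 2048-byte threshold are all assumptions for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Response:
    status: int
    body: str

PATTERN = re.compile(r'(?i)"(password_hash|api_key)"\s*:')

def checks(resp: Response, size_hint: int = 2048) -> dict:
    """Evaluate each check independently so results can be inspected."""
    return {
        "status_ok": resp.status == 200,
        "sensitive_key": bool(PATTERN.search(resp.body)),
        "oversized": len(resp.body.encode()) > size_hint,  # weak hint only
    }

def is_finding(resp: Response) -> bool:
    """Mirror an 'and' combination: status gate AND content match."""
    result = checks(resp)
    return result["status_ok"] and result["sensitive_key"]

leak = Response(200, '{"name": "ana", "api_key": "k-123"}')
denied = Response(403, '{"error": "missing field", "api_key": null}')
print(is_finding(leak), is_finding(denied))  # True False
```

The 403 body contains the key name but is not reported, which is the reason for gating content checks on a successful status.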

Building a Full Custom Nuclei Template

Let's consolidate these ideas into a complete Nuclei template designed to detect excessive data exposure on a hypothetical user profile endpoint. This template will send a GET request to /api/v2/profile and check the response body for several common sensitive field names.


id: api-excessive-data-exposure-v2-profile
info:
  name: API Profile Endpoint - Excessive Data Exposure
  author: hacker-x
  severity: high # Severity is set once here; matchers cannot override it
  description: |
    Checks for excessive data exposure in the /api/v2/profile endpoint.
    This template looks for sensitive fields such as password hashes,
    private keys, API keys, or internal identifiers that should not be
    returned to a standard client.
  reference:
    - https://owasp.org/API-Security/editions/2019/en/0xa3-excessive-data-exposure/
    - https://docs.projectdiscovery.io/nuclei/get-started/
  tags: api,data-exposure,pii,sensitive

http:
  - method: GET
    path:
      - "{{BaseURL}}/api/v2/profile" # Target user profile endpoint
    headers:
      User-Agent: "Nuclei-Scanner/1.0 (Excessive Data Exposure Template)"
      Accept: "application/json"
      # Authorization: "Bearer YOUR_VALID_AUTH_TOKEN_HERE" # Uncomment and replace if auth needed

    matchers-condition: and # All matchers must pass for a match
    matchers:
      # Matcher 1: Ensure we get a successful HTTP response
      - type: status
        name: "HTTP Status OK"
        status:
          - 200

      # Matcher 2: Detect sensitive field names using regex
      - type: regex
        name: "Sensitive Field Regex Match"
        part: body
        regex:
          - '(?i)"(password_hash|jwt_secret|private_key|api_key|ssn|social_security_number|db_connection_string|internal_id|internal_user_id|admin_flag|is_admin|secret_token)"\s*:'
          - '(?i)"(credit_card_number|cvv|bank_account_number|routing_number)"\s*:'
        condition: or # Pass if either regex matches

    # Extractors pull values out of a matching response for reporting.
    # JSON queries belong here (jq-style syntax); Nuclei has no json matcher.
    extractors:
      - type: json
        name: "Specific Sensitive JSON Path"
        part: body
        json:
          - '.user.security.passwordHash'
          - '.user.internalData.privateKey'
          - '.adminDetails.apiSecret'
          - '.settings.dbCredentials.username'
          - '.settings.dbCredentials.password'

      # Optional: also capture the matched key/value pairs with a regex extractor
      # - type: regex
      #   name: "ExposedSensitiveData"
      #   part: body
      #   regex:
      #     - '(?i)"(password_hash|jwt_secret|private_key|api_key|ssn|db_connection_string|internal_user_id)"\s*:\s*("[^"]+")'

Executing the Template and Interpreting Results

Once your custom template (e.g., excessive-data-profile.yaml) is ready, you can run Nuclei against your target. You can test against a single URL or a list of targets. For routing traffic through proxies, especially for internal or controlled environments, GProxy can be integrated with Nuclei's proxy flags.

To run the template against a single target:


nuclei -u https://api.example.com -t excessive-data-profile.yaml -silent

Example successful detection output:


[api-excessive-data-exposure-v2-profile] [http] [high] https://api.example.com/api/v2/profile

The exact layout varies between Nuclei versions, but a finding line carries the template ID, the protocol, the severity taken from the template's info block, and the URL that matched. A line like this tells a pentester that the response from that URL contained at least one of the sensitive field names. A line prefixed with [FTL], by contrast, is a fatal error from Nuclei itself (for example, a template that failed to load), not a finding.

For broader scanning or integration into a continuous testing pipeline, you might use an input file:


nuclei -l targets.txt -t excessive-data-profile.yaml -o results.txt

Where targets.txt contains a list of URLs, one per line. The -o results.txt flag saves the output to a file for later analysis. When looking to automate web security testing across an entire application or suite of APIs, platforms like Secably often incorporate custom scanning logic, similar to Nuclei templates, to ensure continuous coverage against these types of vulnerabilities.
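
For pipeline use, machine-readable output is easier to post-process than plain text. The Python sketch below groups findings by template, assuming the scan was run with Nuclei's JSON Lines output (the -jsonl flag in recent versions) and that each event carries template-id and matched-at fields. Both the flag and the field names should be checked against your installed version, and the sample lines are fabricated for illustration, not real scan output.

```python
import json
from collections import defaultdict

def group_findings(jsonl_text: str) -> dict:
    """Map each template ID to the list of URLs it matched."""
    grouped = defaultdict(list)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the results file
        event = json.loads(line)
        grouped[event.get("template-id", "unknown")].append(event.get("matched-at", ""))
    return dict(grouped)

# Fabricated sample events (assumed field names).
sample = "\n".join([
    '{"template-id": "api-excessive-data-exposure-v2-profile", "info": {"severity": "high"}, "matched-at": "https://api.example.com/api/v2/profile"}',
    '{"template-id": "api-excessive-data-exposure-v2-profile", "info": {"severity": "high"}, "matched-at": "https://staging.example.com/api/v2/profile"}',
])

for template_id, urls in group_findings(sample).items():
    print(template_id, len(urls))
```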

Advanced Considerations and Best Practices

When developing templates for excessive data exposure, consider the following:

Contextualizing Requests: Some sensitive data might only appear under specific conditions, like after a user logs in or performs a certain action. Your template might need to mimic these workflows, potentially requiring chained requests if the API is complex.
False Positives: Be precise with your regex and JSON paths to minimize false positives. Generic terms like "id" are likely to appear legitimately; focus on "internal_id", "admin_id", or other more specific identifiers.
Fuzzing Parameters: For endpoints that take an ID (e.g., /api/v2/user?id=123), consider iterating through different IDs or using common default values (id=1, id=admin) to see if more data is exposed for certain users.
Payload Diversity: If the API accepts different content types (XML, form data), ensure your template's Accept and Content-Type headers are appropriate and test against those variants.
Rate Limiting: APIs often have rate limits. Nuclei's -rate-limit flag can help prevent your scans from being blocked.
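
The false-positive point above can be illustrated with a short Python sketch that walks a parsed response, matches key names exactly (so a plain "id" is ignored), and skips keys whose values are empty. The key set and the sample document are assumptions for illustration.

```python
import json

# Hypothetical set of exact key names worth reporting; plain "id" is deliberately absent.
SENSITIVE = {"internal_id", "admin_id", "password_hash", "api_key"}

def find_sensitive(node, path="$"):
    """Return paths of sensitive keys that hold a non-empty value, at any depth."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.lower() in SENSITIVE and value not in (None, "", [], {}):
                hits.append(child)
            hits.extend(find_sensitive(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_sensitive(item, f"{path}[{index}]"))
    return hits

doc = json.loads(
    '{"id": 1, "users": [{"id": 2, "internal_id": "u-9"}, {"id": 3, "api_key": null}]}'
)
print(find_sensitive(doc))  # ['$.users[0].internal_id']
```

The null api_key is skipped here as lower priority, though the presence of the key alone may still be worth a manual look.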

Developing effective custom Nuclei templates for excessive data exposure is an iterative process. It combines a solid understanding of potential API vulnerabilities with practical regex and JSON querying skills. Regular refinement of your templates based on new findings and API changes ensures your detection capabilities remain robust.