# Redpine Connect -- Full Documentation

> AI-powered document search API with hybrid retrieval.

Redpine Connect lets you search across uploaded document collections using a hybrid retrieval pipeline that combines dense vector search with sparse keyword matching. Upload PDFs, ingest web content, or push structured data -- then query it all through a single REST API.

---

# Getting Started

The Redpine Search API lets you query your document collections using hybrid retrieval (dense + sparse vectors). Get from zero to your first search result in under a minute.

## 1. Get an API key

Create an API key from your API Keys page in the dashboard. Keys are scoped to your organization and optionally to specific collections.

```
Authorization: Bearer sk_live_YOUR_API_KEY
```

## 2. Make your first request

Send a POST request to the search endpoint with your collection name and query text. That's all you need.

```bash
curl -X POST "https://api-staging.redpine.ai/api/v1/search/query" \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"collection": "my-collection", "query": "your search query"}'
```

## 3. Explore the response

Results are ranked by relevance. Each result includes the matched text and any metadata from the original document.

```json
{
  "results": [
    {
      "id": "abc123",
      "text": "Matching document text...",
      "metadata": { "title": "Document Title" }
    }
  ],
  "queryId": "qry_a1b2c3d4e5f6",
  "latencyMs": 42
}
```

## What's next?

- **Authentication** -- API key management and scoping
- **Search API** -- Full endpoint reference and examples
- **Error Reference** -- Status codes and troubleshooting

---

# Authentication

All API requests require authentication via an API key passed as a Bearer token in the Authorization header.

## API Keys

Create and manage API keys from your API Keys page in the dashboard. Keys are prefixed with `sk_live_` and are tied to your organization.

## Header Format

Include your API key in every request using the `Authorization` header:

```
Authorization: Bearer sk_live_YOUR_API_KEY
```

## Example Request

```bash
curl -X POST "https://api-staging.redpine.ai/api/v1/search/query" \
  -H "Authorization: Bearer sk_live_abc123def456" \
  -H "Content-Type: application/json" \
  -d '{"collection": "my-collection", "query": "search text"}'
```

## Key Scoping

API keys can be scoped to control access:

- **Organization-wide** -- access all collections in your organization.
- **Collection-specific** -- restrict access to one or more named collections. Requests to other collections return `403`.

## Authentication Errors

If the API key is missing or invalid, the API returns `401 Unauthorized`:

```json
{
  "error": {
    "code": "AUTHENTICATION_REQUIRED",
    "message": "Invalid or missing API key",
    "request_id": "req_a1b2c3d4"
  }
}
```

If the key is valid but lacks access to the requested collection, the API returns `403 Forbidden`.

## Security Best Practices

- Never expose API keys in client-side code or public repositories.
- Use collection-scoped keys when possible to limit blast radius.
- Rotate keys periodically and revoke unused keys.
- Store keys in environment variables or a secrets manager.
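
As one way to follow the last point, the sketch below reads the key from an environment variable and builds the documented headers. The variable name `REDPINE_API_KEY` and both helper functions are assumptions made for illustration; the docs do not prescribe them.

```python
import os

# Hypothetical variable name; choose whatever fits your deployment.
API_KEY_ENV_VAR = "REDPINE_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is absent or malformed."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    if not key.startswith("sk_live_"):
        # Documented keys carry the sk_live_ prefix.
        raise RuntimeError(f"{API_KEY_ENV_VAR} does not look like a Redpine key")
    return key


def auth_headers(api_key):
    """Build the headers used by the documented search requests."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing fast on a missing key keeps a misconfigured deployment from sending unauthenticated requests and receiving `401` responses at runtime.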

---

# Search API

Search documents in your collections using hybrid retrieval.

## Search Documents

`POST /api/v1/search/query`

Search documents in a collection using hybrid retrieval (dense + sparse vectors). Returns ranked results. The search mode and reranking settings are determined by the collection's configuration.

### Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| `collection` | string | Yes | Collection name to search |
| `query` | string | Yes | Search query text (max 1000 characters) |
| `limit` | integer | No | Max results to return (default 10, max 30) |
| `filters` | object \| null | No | Metadata filters on indexed fields |
| `include_metadata` | boolean | No | Include metadata in results (default true) |
| `include_images` | boolean | No | Include figure images as base64 strings in `metadata.figures[].image_data` (default false; enabling it adds latency) |
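
The limits in this table can be checked on the client before a request is sent, which avoids a round trip that would end in `422`. The helper below is a minimal sketch: `build_search_payload` is a hypothetical name, and the lower bound of 1 for `limit` is taken from the troubleshooting notes in the Error Reference.

```python
MAX_QUERY_CHARS = 1000  # documented maximum query length
MAX_LIMIT = 30          # documented maximum for `limit`


def build_search_payload(collection, query, limit=10, filters=None,
                         include_metadata=True, include_images=False):
    """Validate arguments against the documented limits and return the request body."""
    if not collection:
        raise ValueError("collection is required")
    if not query:
        raise ValueError("query is required")
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"query exceeds {MAX_QUERY_CHARS} characters")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")

    payload = {
        "collection": collection,
        "query": query,
        "limit": limit,
        "include_metadata": include_metadata,
        "include_images": include_images,
    }
    if filters is not None:
        # Omit the key entirely when no filter is wanted.
        payload["filters"] = filters
    return payload
```

The returned dictionary can be passed as the JSON body of `POST /api/v1/search/query`.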

### Response Fields

| Field | Type | Description |
|---|---|---|
| `results` | array | Array of search results with id, text, and metadata |
| `queryId` | string | Unique query identifier. Use it to re-fetch results at no charge within 7 days via `GET /api/v1/search/results/{queryId}` |
| `latencyMs` | integer | Search latency in milliseconds |

### Example Request (cURL)

```bash
curl -X POST "https://api-staging.redpine.ai/api/v1/search/query" \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "my-collection",
    "query": "What are the symptoms of diabetes?",
    "limit": 10,
    "include_metadata": true,
    "filters": {
      "and": [
        {"field": "year", "gte": 2020},
        {"field": "category", "eq": "research"}
      ]
    }
  }'
```

### Example Request (Python)

```python
import requests

response = requests.post(
    "https://api-staging.redpine.ai/api/v1/search/query",
    headers={"Authorization": "Bearer sk_live_YOUR_API_KEY"},
    json={
        "collection": "my-collection",
        "query": "What are the symptoms of diabetes?",
        "limit": 10,
        "include_metadata": True,
        "filters": {
            "and": [
                {"field": "year", "gte": 2020},
                {"field": "category", "eq": "research"},
            ]
        },
    },
)

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = response.json()
for result in data["results"]:
    print(result["text"][:200])
```

### Example Request (TypeScript)

```typescript
const response = await fetch(
  "https://api-staging.redpine.ai/api/v1/search/query",
  {
    method: "POST",
    headers: {
      Authorization: "Bearer sk_live_YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      collection: "my-collection",
      query: "What are the symptoms of diabetes?",
      limit: 10,
      include_metadata: true,
      filters: {
        and: [
          { field: "year", gte: 2020 },
          { field: "category", eq: "research" },
        ],
      },
    }),
  }
);

if (!response.ok) {
  throw new Error(`Search failed with status ${response.status}`);
}

const data = await response.json();
for (const result of data.results) {
  console.log(result.text.slice(0, 200));
}
```

### Example Response

```json
{
  "results": [
    {
      "id": "abc123",
      "text": "Type 2 diabetes symptoms include increased thirst, frequent urination...",
      "metadata": {
        "title": "Diabetes Overview",
        "figures": [
          {
            "id": "fig1",
            "label": "Figure 1",
            "caption": "Glucose metabolism pathway",
            "image_data": "<base64>"
          }
        ]
      }
    }
  ],
  "queryId": "qry_a1b2c3d4e5f6",
  "latencyMs": 42
}
```
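
When `include_images` is true, the figure payloads arrive as base64 text and usually need decoding before use. The sketch below assumes `image_data` is plain standard base64 with no data-URI prefix, which the docs do not state explicitly; `decode_figures` is a hypothetical helper name.

```python
import base64


def decode_figures(result):
    """Return (label, image_bytes) pairs for the figures of one search result."""
    # metadata may be missing when include_metadata is false.
    figures = (result.get("metadata") or {}).get("figures") or []
    decoded = []
    for figure in figures:
        data = figure.get("image_data")
        if not data:
            continue  # skip figures that carry no image payload
        label = figure.get("label") or figure.get("id")
        decoded.append((label, base64.b64decode(data)))
    return decoded
```

Each decoded byte string can then be written to a file or handed to an image library.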

## Re-fetch Results

Every search response includes a `queryId`. Use it to retrieve the same results again without being charged, for up to 7 days after the original search.

`GET /api/v1/search/results/{queryId}`

The request must use the same API key that performed the original search.

**Response:** Identical to the original search response, with `latencyMs: 0`. Includes `X-Cache: hit` and `X-Cache-Expires` headers.

**Errors:**
- `404` -- Query ID not found or belongs to a different API key
- `410` -- Cached result has expired (past 7-day window)

### Example (cURL)

```bash
curl "https://api-staging.redpine.ai/api/v1/search/results/qry_a1b2c3d4e5f6" \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY"
```
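
A Python version of the same call might look like the sketch below, written with the standard library only. Treating both `404` and `410` as "no cached result" is a design assumption of this example, not documented behaviour, and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://api-staging.redpine.ai"


def results_url(query_id):
    """Build the re-fetch URL, escaping the query ID for use in a path."""
    return f"{BASE_URL}/api/v1/search/results/{urllib.parse.quote(query_id, safe='')}"


def refetch_results(query_id, api_key):
    """Return the cached response, or None when it is unavailable (404 or 410)."""
    request = urllib.request.Request(
        results_url(query_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code in (404, 410):
            return None  # unknown ID, different key, or expired cache entry
        raise
```

A caller that gets `None` back can fall back to running the original search again.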

## Filtering

Use the `filters` parameter to narrow search results by metadata fields. Two formats are supported and auto-detected based on the top-level keys.

### Operators

| Operator | Description | Example value |
|---|---|---|
| `eq` | Equals | `"research"` |
| `ne` | Not equals | `"deleted"` |
| `in` | Matches any in list | `["tech", "science"]` |
| `not_in` | Excludes values in list | `["spam", "junk"]` |
| `gt` | Greater than | `2020` |
| `gte` | Greater than or equal | `2020` |
| `lt` | Less than | `2025` |
| `lte` | Less than or equal | `2025` |
| `between` | Range (inclusive) | `[2020, 2025]` |

### Simple format

Key-value pairs where each key is a metadata field. Supports exact match, lists, ranges, and negation.

```json
// Exact match
{"category": "research"}

// Range
{"year": {"gte": 2020, "lte": 2025}}

// Any-of list
{"category": ["tech", "science"]}

// Negation
{"status": {"not": "deleted"}}
```

### Structured DSL

Boolean combinators (`and`, `or`, `not`) with explicit field conditions. Supports arbitrary nesting.

```json
{
  "and": [
    {"field": "year", "gte": 2020},
    {"field": "year", "lte": 2025},
    {
      "or": [
        {"field": "category", "in": ["tech", "science"]},
        {"field": "status", "eq": "published"}
      ]
    }
  ]
}
```

### Date filtering

ISO date strings (`YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SS`) are automatically detected and used for datetime range queries.

```json
{"field": "published_date", "between": ["2024-01-01", "2024-12-31"]}
```

---

# Filtering

Narrow search results by metadata fields using the `filters` parameter. Two formats are supported: a simple key-value format and a structured DSL with boolean combinators. The API auto-detects the format based on top-level keys.

## Operators

| Operator | Description | Example value |
|---|---|---|
| `eq` | Equals | `"research"` |
| `ne` | Not equals | `"deleted"` |
| `in` | Matches any in list | `["tech", "science"]` |
| `not_in` | Excludes values in list | `["spam", "junk"]` |
| `gt` | Greater than | `2020` |
| `gte` | Greater than or equal | `2020` |
| `lt` | Less than | `2025` |
| `lte` | Less than or equal | `2025` |
| `between` | Range (inclusive) | `[2020, 2025]` |
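
To make the table concrete, the sketch below evaluates each operator against a local metadata dictionary. It is only an illustration of how the operators read, not the server's implementation; in particular, treating a missing field as a non-match is an assumption.

```python
# Each operator maps to a predicate over (actual field value, filter value).
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "ne": lambda actual, expected: actual != expected,
    "in": lambda actual, expected: actual in expected,
    "not_in": lambda actual, expected: actual not in expected,
    "gt": lambda actual, expected: actual > expected,
    "gte": lambda actual, expected: actual >= expected,
    "lt": lambda actual, expected: actual < expected,
    "lte": lambda actual, expected: actual <= expected,
    # Both ends of the range are included.
    "between": lambda actual, expected: expected[0] <= actual <= expected[1],
}


def matches(metadata, field, operator, value):
    """Check one condition against a document's metadata."""
    if field not in metadata:
        return False  # assumption: absent fields never match
    return OPERATORS[operator](metadata[field], value)
```

For example, `matches({"year": 2020}, "year", "between", [2020, 2025])` is true because the range is inclusive.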

## Simple Format

Key-value pairs where each key is a metadata field name. Supports exact match, list membership, range operators, and negation.

### Exact match

```json
{
  "filters": {
    "category": "research"
  }
}
```

### Range

```json
{
  "filters": {
    "year": { "gte": 2020, "lte": 2025 }
  }
}
```

### Any-of list

Pass an array to match any value in the list (equivalent to `in`).

```json
{
  "filters": {
    "category": ["tech", "science"]
  }
}
```

### Negation

```json
{
  "filters": {
    "status": { "not": "deleted" }
  }
}
```

## Structured DSL

Use boolean combinators (`and`, `or`, `not`) with explicit field conditions for complex filtering logic. Supports arbitrary nesting.

### Basic AND condition

```json
{
  "filters": {
    "and": [
      { "field": "year", "gte": 2020 },
      { "field": "category", "eq": "research" }
    ]
  }
}
```

### OR condition

```json
{
  "filters": {
    "or": [
      { "field": "category", "eq": "tech" },
      { "field": "category", "eq": "science" }
    ]
  }
}
```

### Nested combinators

Combinators can be nested to build complex expressions. This example matches documents dated 2020 through 2025 (inclusive) that are either in the tech or science categories or have "published" status.

```json
{
  "filters": {
    "and": [
      { "field": "year", "gte": 2020 },
      { "field": "year", "lte": 2025 },
      {
        "or": [
          { "field": "category", "in": ["tech", "science"] },
          { "field": "status", "eq": "published" }
        ]
      }
    ]
  }
}
```

### NOT combinator

Use `not` to exclude results matching a condition.

```json
{
  "filters": {
    "and": [
      { "field": "category", "eq": "research" },
      {
        "not": {
          "field": "status", "eq": "retracted"
        }
      }
    ]
  }
}
```
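
Nested filters are easy to mistype by hand, so it can help to assemble them with small helper functions. The builders below are hypothetical names introduced for this sketch; they only produce the dictionary shapes shown in the examples above.

```python
def condition(field, operator, value):
    """A single field condition, e.g. {"field": "year", "gte": 2020}."""
    return {"field": field, operator: value}


def all_of(*clauses):
    """Combine clauses with `and`."""
    return {"and": list(clauses)}


def any_of(*clauses):
    """Combine clauses with `or`."""
    return {"or": list(clauses)}


def negate(clause):
    """Wrap a clause in `not`."""
    return {"not": clause}


# Rebuilds the "NOT combinator" example from this section.
filters = all_of(
    condition("category", "eq", "research"),
    negate(condition("status", "eq", "retracted")),
)
```

The resulting `filters` value goes into the request body unchanged.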

## Date Filtering

ISO date strings (`YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SS`) are automatically detected and used for datetime range queries. No special syntax is needed -- just pass the date string as a value.

### Date range with gte/lte

```json
{
  "filters": {
    "and": [
      { "field": "published_date", "gte": "2024-01-01" },
      { "field": "published_date", "lte": "2024-12-31" }
    ]
  }
}
```

### Date range with between

```json
{
  "filters": {
    "field": "published_date",
    "between": ["2024-01-01", "2024-12-31"]
  }
}
```
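
If dates live in your code as `datetime.date` objects, they can be converted to the accepted `YYYY-MM-DD` strings with `isoformat()`. The helper below is a small sketch with a hypothetical name; it produces the same shape as the `between` example above.

```python
from datetime import date


def date_between(field, start, end):
    """Build an inclusive date-range condition from two date objects."""
    if start > end:
        raise ValueError("start must not be after end")
    # date.isoformat() yields YYYY-MM-DD, one of the documented formats.
    return {"field": field, "between": [start.isoformat(), end.isoformat()]}


filters = date_between("published_date", date(2024, 1, 1), date(2024, 12, 31))
```

Checking the order of the two dates on the client catches an empty range before the request is sent.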

---

The `filters` parameter is part of the Search API request body. See the full endpoint reference for more details.

---

# Rate Limits

Rate limits protect the API from excessive usage and ensure fair access for all consumers.

## Current Limits

| Endpoint | Limit |
|---|---|
| `POST /api/v1/search/query` | 60 requests / minute |
| Other endpoints | Varies by endpoint |

Limits are enforced per IP address.

## Response Headers

Every response includes rate limit headers so you can track your usage:

```
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 57
X-RateLimit-Reset: 1709000000
```

- `X-RateLimit-Limit` -- Maximum requests allowed per window.
- `X-RateLimit-Remaining` -- Requests remaining in the current window.
- `X-RateLimit-Reset` -- Unix timestamp when the window resets.
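
These headers can be read into a small structure so that calling code can decide when to slow down. The sketch below assumes the headers are available as a mapping of names to strings; `RateLimitStatus` and `parse_rate_limit` are hypothetical names.

```python
import time
from dataclasses import dataclass


@dataclass
class RateLimitStatus:
    limit: int       # X-RateLimit-Limit
    remaining: int   # X-RateLimit-Remaining
    reset_at: int    # X-RateLimit-Reset, a Unix timestamp

    def seconds_until_reset(self, now=None):
        """Seconds left in the current window, never negative."""
        now = time.time() if now is None else now
        return max(0.0, float(self.reset_at - now))


def parse_rate_limit(headers):
    """Parse the three rate limit headers, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )
```

A client could pause for `seconds_until_reset()` once `remaining` reaches zero instead of waiting for a `429`.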

## Exceeding the Limit

When you exceed the rate limit, the API returns `429 Too Many Requests` with a `Retry-After` header:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 30

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Try again in 30 seconds.",
    "request_id": "req_a1b2c3d4"
  }
}
```

## Best Practices

- **Respect Retry-After** -- wait the indicated time before retrying.
- **Use exponential backoff** -- on repeated 429s, double your wait time between retries.
- **Cache results** -- avoid redundant requests for the same query.
- **Monitor your usage** -- check `X-RateLimit-Remaining` to proactively throttle before hitting limits.
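
The first two practices can be combined in one retry loop, sketched below. It assumes `Retry-After` carries a number of seconds, as in the example above, and that `send` is a caller-supplied function returning `(status, headers, body)`; both function names are hypothetical.

```python
import time


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    return min(cap, base * 2 ** attempt)  # otherwise double each time, up to cap


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call `send` until it returns something other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        sleep(retry_delay(attempt, headers.get("Retry-After")))
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.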

---

# Error Reference

The API uses standard HTTP status codes and returns structured error responses to help you diagnose issues.

## Error Response Format

All error responses follow this structure:

```json
{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description of what went wrong",
    "request_id": "req_a1b2c3d4"
  }
}
```

- `code` -- Machine-readable error code. Use this for programmatic error handling.
- `message` -- Human-readable description. Safe to display to end users.
- `request_id` -- Unique request identifier. Include this when contacting support.
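
For programmatic handling, the error body can be turned into an exception that keeps all three fields. `RedpineAPIError` and `parse_error` below are hypothetical names for this sketch, and the fallback code `UNKNOWN` for unparseable bodies is likewise an assumption.

```python
import json


class RedpineAPIError(Exception):
    """Carries the HTTP status plus the fields of the documented error object."""

    def __init__(self, status, code, message, request_id):
        super().__init__(f"{status} {code}: {message} (request_id={request_id})")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id


def parse_error(status, body):
    """Build a RedpineAPIError from a response status and raw body text."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = {}
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        error = {}  # body did not follow the documented structure
    return RedpineAPIError(
        status,
        error.get("code", "UNKNOWN"),
        error.get("message", ""),
        error.get("request_id"),
    )
```

Callers can branch on `err.code` and log `err.request_id` for support requests.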

## Error Codes

| Status | Code | Meaning |
|---|---|---|
| `200` | -- | Success |
| `401` | AUTHENTICATION_REQUIRED | Invalid or missing API key |
| `403` | INSUFFICIENT_PERMISSIONS | No access to requested collection |
| `404` | NOT_FOUND | Collection not found |
| `422` | VALIDATION_ERROR | Invalid request body |
| `429` | RATE_LIMITED | Rate limit exceeded |
| `500` | INTERNAL_ERROR | Internal server error |

## Troubleshooting

- **401 AUTHENTICATION_REQUIRED** -- Check that the `Authorization` header is present and the key is valid. Keys start with `sk_live_`.
- **403 INSUFFICIENT_PERMISSIONS** -- The API key doesn't have access to this collection. Use an org-wide key or add the collection to the key's scope.
- **404 NOT_FOUND** -- Verify the collection name exists. Names are case-sensitive.
- **422 VALIDATION_ERROR** -- Check that the required fields (`collection`, `query`) are present. Ensure `query` is at most 1000 characters and `limit` is between 1 and 30.
- **429 RATE_LIMITED** -- Wait for the `Retry-After` period and implement exponential backoff. See Rate Limits for details.
- **500 INTERNAL_ERROR** -- An unexpected error occurred. Retry the request. If it persists, contact support with the `request_id`.

---

Usage is tracked per API key. View statistics on your Usage page in the dashboard.
