
Storage Strategies: Split & Shield Pattern

To maximize performance and minimize token costs for AI Agents, PipeAgent uses a three-tier storage architecture, so that Agents process only the data they actually need.

The Problem: HTML Bloat & Token Burn

Traditional scraping returns raw HTML or large, unoptimized JSON. When an Agent reads a 100KB HTML page just to find one price, it "burns" thousands of tokens on layout markup rather than on reasoning.
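To put a rough number on that cost, the sketch below estimates token counts from payload size. The ratio of about 4 characters per token is a common rule of thumb, not a PipeAgent figure, and real tokenizers vary (markup-heavy text often tokenizes less efficiently); `estimate_tokens` is a hypothetical helper.

```python
# Rough token estimate from payload size.
# ASSUMPTION: ~4 characters per token is only a rule of thumb; actual
# tokenizers differ, and HTML markup often costs more tokens per byte.
CHARS_PER_TOKEN = 4


def estimate_tokens(num_chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Return an approximate token count for a payload of num_chars characters."""
    return num_chars // chars_per_token


# A 100KB HTML page versus the single field the Agent actually wanted.
html_page_tokens = estimate_tokens(100 * 1024)
price_field_tokens = estimate_tokens(len('{"price": 19.99}'))

print(f"full page: ~{html_page_tokens} tokens")
print(f"price only: ~{price_field_tokens} tokens")
```

Under this assumption the page costs several orders of magnitude more tokens than the one field the Agent needed.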

The Solution: 3-Tier Slicing

PipeAgent providers choose a strategy during data ingestion (Push API).

1. Snapshot Strategy

Best for: Single JSON objects, configuration files, or atomic status updates.

How it works: The entire payload is stored as a single blob.

Agent Payload: Minimal (1:1 with source).

2. Collection Strategy

Best for: Simple lists (e.g., Top 10 News, Recent Trades).

How it works: Data is stored as an array. Supports server-side pagination (limit, offset) and JSONPath projection.

Agent Payload: Partial (Only requested items/fields).
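As an illustration of what limit/offset pagination and field projection do to a Collection, here is a minimal local sketch. It does not call PipeAgent; `paginate` and `project` are hypothetical helpers, and `project` only picks top-level fields as a simplified stand-in for JSONPath projection.

```python
from typing import Any


def paginate(items: list[Any], limit: int = 10, offset: int = 0) -> list[Any]:
    """Return at most `limit` items, skipping the first `offset` items."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    return items[offset:offset + limit]


def project(items: list[dict[str, Any]], fields: list[str]) -> list[dict[str, Any]]:
    """Keep only the named top-level fields of each item.

    Simplified stand-in for JSONPath projection (assumption, not the real engine).
    """
    return [{f: item[f] for f in fields if f in item} for item in items]


# Ten news items, each carrying a large body the Agent does not need yet.
news = [
    {"rank": i, "title": f"Story {i}", "body": "lorem ipsum " * 50}
    for i in range(1, 11)
]

# Skip the first two items, take the next three, and drop the heavy body field.
page = project(paginate(news, limit=3, offset=2), ["rank", "title"])
print(page)
```

The Agent receives three small objects instead of ten full ones, which is the "Partial" payload described above.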

3. Relational (Split & Shield) Strategy

Best for: Large datasets (e.g., Product catalogs, Real-estate listings).

The "Split": During ingestion, we separate data into a Light List (ID + Name) and Heavy Details (Full specs).

The "Shield": Agents first fetch the Light List to "scout" for relevance. Only once a specific item has been identified do they request its Heavy Details, using details mode (mode=details).

Agent Payload: Optimized (Up to 98% reduction in initial payload size).
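The split step can be sketched as follows. This is an illustration under assumptions, not PipeAgent's ingestion code: `split_records` is a hypothetical helper, and the field names `id` and `name` are assumed keys for the "ID + Name" Light List.

```python
import json
from typing import Any


def split_records(
    records: list[dict[str, Any]],
    light_fields: tuple[str, ...] = ("id", "name"),  # assumed key names
) -> tuple[list[dict[str, Any]], dict[str, dict[str, Any]]]:
    """Split full records into a Light List and a Heavy Details map keyed by ID."""
    light_list = [{k: r[k] for k in light_fields} for r in records]
    heavy_details = {r["id"]: r for r in records}
    return light_list, heavy_details


catalog = [
    {"id": "prod_01", "name": "Desk Lamp", "specs": {"watts": 9},
     "description": "Long marketing copy. " * 40},
    {"id": "prod_02", "name": "Office Chair", "specs": {"max_load_kg": 120},
     "description": "Long marketing copy. " * 40},
]

light, heavy = split_records(catalog)

# Compare the serialized size of the initial (light) payload with the full catalog.
full_size = len(json.dumps(catalog))
light_size = len(json.dumps(light))
reduction = 1 - light_size / full_size
print(f"initial payload reduced by {reduction:.0%}")
```

The reduction printed here depends entirely on the sample data; the actual saving for a real feed depends on how heavy its detail fields are.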

---

High Availability Persistence

While data is served with millisecond latency from Redis, PipeAgent automatically snapshots every update to Supabase Storage.

Audit Trail: Every data push is archived with a timestamp.

Fault Tolerance: If Redis caches are cleared, the system can automatically re-hydrate from the latest snapshot.
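The archive-then-rehydrate behavior described above can be modeled with a small in-memory sketch. Plain dictionaries stand in for Redis and Supabase Storage here, and the `FeedStore` class and its method names are hypothetical; the real system's internals are not shown in this document.

```python
from datetime import datetime, timezone
from typing import Any


class FeedStore:
    """In-memory model: a fast cache in front of a timestamped snapshot archive."""

    def __init__(self) -> None:
        self.cache: dict[str, Any] = {}  # stand-in for Redis
        # stand-in for snapshot storage: feed_id -> [(timestamp, payload), ...]
        self.archive: dict[str, list[tuple[datetime, Any]]] = {}

    def push(self, feed_id: str, payload: Any) -> None:
        """Archive the update with a timestamp, then refresh the cache."""
        stamp = datetime.now(timezone.utc)
        self.archive.setdefault(feed_id, []).append((stamp, payload))
        self.cache[feed_id] = payload

    def read(self, feed_id: str) -> Any:
        """Serve from cache; on a miss, re-hydrate from the latest snapshot."""
        if feed_id in self.cache:
            return self.cache[feed_id]
        snapshots = self.archive.get(feed_id)
        if not snapshots:
            raise KeyError(feed_id)
        _, payload = snapshots[-1]  # snapshots are appended in push order
        self.cache[feed_id] = payload
        return payload


store = FeedStore()
store.push("prices", {"btc": 1})
store.push("prices", {"btc": 2})

store.cache.clear()              # simulate the cache being wiped
restored = store.read("prices")  # falls back to the most recent snapshot
print(restored)
```

After the fallback read, the cache holds the payload again, so later reads no longer touch the archive.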

---

Consumer Usage

1. Basic Fetch (Default)

Returns the Snapshot, the full Collection, or the Light List (for Relational feeds).

```bash
GET /api/v1/feed/{feed_id}
```

2. Advanced JSONPath Projection

Filter, slice, and project JSON on the server to protect your Agent's context window.

```bash
# Conditional Filtering
GET /api/v1/feed/{feed_id}?jsonpath=$[?(@.price < 100)]

# Array Slicing (Get top 5)
GET /api/v1/feed/{feed_id}?jsonpath=$[0:5]

# Deep Field Extraction
GET /api/v1/feed/{feed_id}?jsonpath=$[*].metadata.tags
```
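JSONPath expressions contain characters such as spaces, `?`, `[` and `<` that are not safe in a raw query string, so most HTTP clients need them percent-encoded. The sketch below builds the request path with Python's standard library. `feed_path` is a hypothetical helper, and whether the server also accepts unencoded expressions is not stated here.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def feed_path(feed_id: str, jsonpath: Optional[str] = None) -> str:
    """Build the feed request path, percent-encoding the JSONPath expression."""
    path = f"/api/v1/feed/{quote(feed_id, safe='')}"
    if jsonpath is None:
        return path
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"jsonpath": jsonpath}, quote_via=quote)
    return f"{path}?{query}"


expression = "$[?(@.price < 100)]"
encoded = feed_path("my_feed", expression)
print(encoded)

# Round trip: decoding the query string recovers the original expression.
decoded = parse_qs(urlsplit(encoded).query)["jsonpath"][0]
```

Decoding the query string recovers the original expression, which shows the encoding is lossless.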

3. Relational Details Mode

Retrieve heavy data for specific IDs.

```bash
GET /api/v1/feed/{feed_id}?mode=details&ids=prod_01,prod_02
```
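A scout-then-fetch flow on the consumer side might look like the sketch below. It only builds the request path and performs no network call. `scout` and `details_path` are hypothetical helpers, and the `id` and `name` keys of the Light List items are assumed.

```python
from urllib.parse import quote, urlencode


def scout(light_list: list[dict[str, str]], keyword: str) -> list[str]:
    """Pick the IDs whose name mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [item["id"] for item in light_list if needle in item["name"].lower()]


def details_path(feed_id: str, ids: list[str]) -> str:
    """Build the details-mode path for the selected IDs."""
    # safe="," keeps the comma-separated ID list readable, as in the example above.
    query = urlencode({"mode": "details", "ids": ",".join(ids)}, safe=",")
    return f"/api/v1/feed/{quote(feed_id, safe='')}?{query}"


# Step 1: the Light List, as a basic fetch of a Relational feed would return it.
light_list = [
    {"id": "prod_01", "name": "Desk Lamp"},
    {"id": "prod_02", "name": "Office Chair"},
    {"id": "prod_03", "name": "Floor Lamp"},
]

# Step 2: scout for relevant items, then request Heavy Details only for those.
wanted = scout(light_list, "lamp")
path = details_path("catalog", wanted)
print(path)
```

Only the two matching products are requested in details mode; the chair's heavy record never enters the Agent's context.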

Performance Tips

Use Collection for any list over 20 items.

Always implement Relational for datasets where item descriptions are large.

Combine any strategy with JSONPath projection to shield your LLM context from irrelevant fields.

Version 1.0.4 - Premium Infrastructure
