Legal Disclaimer

PipeAgent is a data distribution gateway. We do not own, verify, or endorse the data provided by third-party creators. Use at your own discretion.

Storage Strategies: Split & Shield Pattern

To maximize performance and minimize token costs for AI Agents, PipeAgent uses a three-tier storage architecture (Snapshot, Collection, and Relational). This ensures that Agents process only the data they actually need.

The Problem: HTML Bloat & Token Burn

Traditional scraping returns raw HTML or large, unoptimized JSON. When an Agent reads a 100KB HTML page just to find one price, it "burns" thousands of tokens on layout markup rather than on reasoning.
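
To put a rough number on that cost, the sketch below estimates token counts under the assumption of about 4 characters per token. That ratio is a common rule of thumb for English text, not a PipeAgent figure, and real tokenizers (especially on HTML) will differ. The sample page and the `estimate_tokens` helper are made up for illustration.

```python
# ASSUMPTION: ~4 characters per token. This is only a rule of thumb;
# actual tokenizers will produce different counts, especially on markup.
CHARS_PER_TOKEN = 4


def estimate_tokens(payload: str) -> int:
    """Approximate the token count of a text payload."""
    return len(payload) // CHARS_PER_TOKEN


# Synthetic page: roughly 100 KB of layout markup wrapped around one price.
html_page = "<div class='row'>" * 6000 + "<span>$19.99</span>"
# The same information as a minimal JSON payload.
price_only = '{"price": 19.99}'

html_tokens = estimate_tokens(html_page)
json_tokens = estimate_tokens(price_only)

print(f"HTML page: {len(html_page)} chars, about {html_tokens} tokens")
print(f"JSON only: {len(price_only)} chars, about {json_tokens} tokens")
```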

The Solution: 3-Tier Slicing

PipeAgent providers choose a strategy during data ingestion (Push API).

1. Snapshot Strategy

  • Best for: Single JSON objects, configuration files, or atomic status updates.
  • How it works: The entire payload is stored as a single blob.
  • Agent Payload: Minimal (1:1 with source).

2. Collection Strategy

  • Best for: Simple lists (e.g., Top 10 News, Recent Trades).
  • How it works: Data is stored as an array. Supports server-side pagination (limit, offset) and JSONPath projection.
  • Agent Payload: Partial (Only requested items/fields).
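
As a local illustration of what `limit`/`offset` pagination plus field projection does to a list, here is a minimal sketch. It only mimics the effect in plain Python; the `paginate` and `project` helpers and the sample data are hypothetical, and the field selection is a simplified stand-in for JSONPath projection, not PipeAgent's server-side implementation.

```python
from typing import Any


def paginate(items: list[dict[str, Any]], limit: int, offset: int = 0) -> list[dict[str, Any]]:
    """Return at most `limit` items, starting at index `offset`."""
    return items[offset:offset + limit]


def project(items: list[dict[str, Any]], fields: list[str]) -> list[dict[str, Any]]:
    """Keep only the named fields of each item (simplified projection)."""
    return [{key: item[key] for key in fields if key in item} for item in items]


# Hypothetical "Top 10 News" collection with a bulky body field.
news = [{"id": i, "title": f"Story {i}", "body": "x" * 500} for i in range(10)]

# Three items starting at the third one, without the bulky body.
page = project(paginate(news, limit=3, offset=2), ["id", "title"])
print(page)
```

The Agent receives three small objects instead of ten full ones, which is the "Partial" payload described above.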

3. Relational (Split & Shield) Strategy

  • Best for: Large datasets (e.g., Product catalogs, Real-estate listings).
  • The "Split": During ingestion, we separate data into a Light List (ID + Name) and Heavy Details (Full specs).
  • The "Shield": Agents first fetch the Light List to "scout" for relevance. Only when a specific item is identified do they request the Heavy Details using the details mode.
  • Agent Payload: Optimized (Up to 98% reduction in initial payload size).
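
The "Split" step can be sketched as follows. This is an illustration under stated assumptions rather than PipeAgent's ingestion code: the `split_records` helper, the `id`/`name` field names, the choice to key Heavy Details by ID, and the synthetic catalog are all hypothetical. The measured reduction depends entirely on how heavy the records are.

```python
import json
from typing import Any


def split_records(
    records: list[dict[str, Any]],
    light_fields: tuple[str, ...] = ("id", "name"),
) -> tuple[list[dict[str, Any]], dict[str, dict[str, Any]]]:
    """Split records into a Light List and a Heavy Details lookup keyed by ID."""
    light_list = []
    heavy_details = {}
    for record in records:
        # Light entry: only the fields an Agent needs to judge relevance.
        light_list.append({field: record[field] for field in light_fields})
        # Heavy entry: the full record, fetched later by ID.
        heavy_details[record["id"]] = record
    return light_list, heavy_details


# Synthetic catalog: 50 products, each with a long specs field.
catalog = [
    {"id": f"prod_{i:02d}", "name": f"Widget {i}", "specs": "lorem ipsum " * 200}
    for i in range(1, 51)
]

light, heavy = split_records(catalog)

full_size = len(json.dumps(catalog))
light_size = len(json.dumps(light))
reduction = 1 - light_size / full_size
print(f"Full: {full_size} bytes, light list: {light_size} bytes ({reduction:.0%} smaller)")
```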

---

High Availability Persistence

While data is served with millisecond latency from Redis, PipeAgent automatically snapshots every update to Supabase Storage.

  • Audit Trail: Every data push is archived with a timestamp.
  • Fault Tolerance: If Redis caches are cleared, the system can automatically re-hydrate from the latest snapshot.
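
The re-hydration behavior can be pictured with a small in-memory model. This is a sketch only: two plain dictionaries stand in for Redis and for the snapshot archive, and the `push` and `read` functions are hypothetical names, not PipeAgent internals.

```python
from typing import Any, Optional

cache: dict[str, Any] = {}                          # stand-in for Redis
snapshots: dict[str, list[tuple[float, Any]]] = {}  # stand-in for the snapshot archive


def push(feed_id: str, payload: Any, timestamp: float) -> None:
    """Serve the new payload from cache and archive it with its timestamp."""
    cache[feed_id] = payload
    snapshots.setdefault(feed_id, []).append((timestamp, payload))


def read(feed_id: str) -> Optional[Any]:
    """Read from cache; on a miss, re-hydrate from the latest snapshot."""
    if feed_id in cache:
        return cache[feed_id]
    history = snapshots.get(feed_id)
    if not history:
        return None
    _, latest = max(history, key=lambda entry: entry[0])
    cache[feed_id] = latest  # warm the cache again for later reads
    return latest


push("feed_a", {"price": 10}, timestamp=1.0)
push("feed_a", {"price": 12}, timestamp=2.0)

cache.clear()              # simulate the cache being wiped
restored = read("feed_a")  # falls back to the newest archived snapshot
print(restored)
```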

---

Consumer Usage

1. Basic Fetch (Default)

Returns the Snapshot, the full Collection, or the Light List (for Relational feeds).

    bash
    GET /api/v1/feed/{feed_id}

2. Advanced JSONPath Projection

Filter, slice, and project JSON on the server to protect your Agent's context window.

    bash
    # Conditional Filtering
    GET /api/v1/feed/{feed_id}?jsonpath=$[?(@.price < 100)]
    
    # Array Slicing (Get top 5)
    GET /api/v1/feed/{feed_id}?jsonpath=$[0:5]
    
    # Deep Field Extraction
    GET /api/v1/feed/{feed_id}?jsonpath=$[*].metadata.tags
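
JSONPath expressions contain characters such as spaces, `$`, `[`, `?`, and `<` that generally need percent-encoding before they go into a query string. The sketch below builds an encoded URL with Python's standard library and decodes it again to confirm the round trip. `BASE_URL` and `feed_url` are placeholders for illustration, and whether the PipeAgent server also accepts unencoded expressions is not stated in this document.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "https://api.example.com"  # placeholder host, not a real PipeAgent URL


def feed_url(feed_id: str, jsonpath: Optional[str] = None) -> str:
    """Build a feed URL, percent-encoding the JSONPath expression if given."""
    url = f"{BASE_URL}/api/v1/feed/{quote(feed_id, safe='')}"
    if jsonpath is not None:
        # quote_via=quote encodes spaces as %20 instead of "+".
        url += "?" + urlencode({"jsonpath": jsonpath}, quote_via=quote)
    return url


expression = "$[?(@.price < 100)]"
filter_url = feed_url("my_feed", expression)
print(filter_url)

# Round trip: decoding the query string recovers the original expression.
decoded = parse_qs(urlsplit(filter_url).query)["jsonpath"][0]
print(decoded)
```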

3. Relational Details Mode

Retrieve heavy data for specific IDs.

    bash
    GET /api/v1/feed/{feed_id}?mode=details&ids=prod_01,prod_02
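
A typical "scout, then fetch" flow might look like the sketch below: pick relevant IDs from the Light List, then build the details request for just those IDs. The `details_path` helper, the sample Light List, and the keyword filter are hypothetical. Only the `mode=details` and comma-separated `ids` parameters come from the request shown above.

```python
from urllib.parse import urlencode


def details_path(feed_id: str, ids: list[str]) -> str:
    """Build the details-mode path for a set of item IDs."""
    # safe="," keeps the ID separator readable instead of encoding it as %2C.
    query = urlencode({"mode": "details", "ids": ",".join(ids)}, safe=",")
    return f"/api/v1/feed/{feed_id}?{query}"


# Hypothetical Light List returned by a basic fetch on a Relational feed.
light_list = [
    {"id": "prod_01", "name": "Laptop Pro"},
    {"id": "prod_02", "name": "Desk Lamp"},
    {"id": "prod_03", "name": "Laptop Air"},
]

# Scout: keep only the items that look relevant to the task.
relevant_ids = [item["id"] for item in light_list if "laptop" in item["name"].lower()]

# Fetch: request Heavy Details for those items only.
path = details_path("catalog", relevant_ids)
print(path)  # /api/v1/feed/catalog?mode=details&ids=prod_01,prod_03
```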

Performance Tips

  • Use Collection for any list over 20 items.
  • Always implement Relational for datasets where item descriptions are large.
  • Combine with JSONPath to shield your LLM contexts from irrelevant fields.

Version 1.0.4 - Premium Infrastructure