Ingesting memories - XTrace Memory

Ingest is the write path. You send conversation messages; the server runs LLM-based extraction to pull out facts (and, when relevant, artifacts and episodes), embeds each one, and stores them in your org’s vector index.

The mental model

Ingest is asynchronous by default. Extraction is LLM-bound (typically 3–10 seconds), so the API returns a job immediately and does the work in the background. Your code then either polls the job until it reaches a terminal status or opts into sync mode.

```text
┌──────────┐                              ┌───────────────┐
│  Client  │  POST /v1/memories  ──────►  │   Memory API  │
│          │  ◄────  IngestJob (pending)  │  (returns 1s) │
└──────────┘                              └───────────────┘
                                                  │
                                                  │  extraction (3–10s)
                                                  ▼
                                          status: succeeded
                                          result.memories_created: [...]
```
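The client side of this flow is a polling loop. The SDK's `pollUntilDone` is described on this page as using exponential backoff from 500ms up to 5s with a timeout; the sketch below illustrates that schedule in plain TypeScript. It is not the SDK's implementation: the `JobSnapshot` shape, the `fetchJob` callback, and the 60-second default timeout are assumptions made for the example.

```typescript
// Minimal sketch of exponential-backoff polling for an ingest job.
// Assumptions: the JobSnapshot shape, the fetchJob callback, and the 60s default timeout.
// The SDK's pollUntilDone already does this; this only illustrates the schedule.

type JobStatus = 'pending' | 'running' | 'succeeded' | 'failed';

interface JobSnapshot {
  id: string;
  status: JobStatus;
}

interface PollOptions {
  initialDelayMs?: number; // first wait between polls (500ms, per this page)
  maxDelayMs?: number; // ceiling on the wait (5s, per this page)
  timeoutMs?: number; // hypothetical overall budget
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollWithBackoff<J extends JobSnapshot>(
  fetchJob: () => Promise<J>,
  { initialDelayMs = 500, maxDelayMs = 5_000, timeoutMs = 60_000 }: PollOptions = {},
): Promise<J> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  for (;;) {
    const job = await fetchJob();
    // succeeded and failed are terminal; pending and running mean "not done yet".
    if (job.status === 'succeeded' || job.status === 'failed') return job;
    if (Date.now() + delay > deadline) {
      throw new Error(`Timed out waiting for job ${job.id} (last status: ${job.status})`);
    }
    await sleep(delay);
    // Double the wait each round, capped: 500 -> 1000 -> 2000 -> 4000 -> 5000 -> 5000 ...
    delay = Math.min(delay * 2, maxDelayMs);
  }
}
```

Doubling with a cap keeps early polls responsive for fast extractions while limiting request volume when extraction runs toward the slow end of the 3–10 second range.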

Required fields

Every ingest needs:

- messages: array of { role, content }. An empty array returns 400.
- user_id: keys the per-user session namespace.
- conv_id: anchors every extracted memory to a conversation (for replay, export, and bulk retract).

Optional: agent_id, app_id, metadata (arbitrary key/value pairs that become filterable on search), and extract_artifacts: true (opts into the more expensive artifact-extraction stage).
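The fields above can be written down as a request type, which also makes it easy to catch the documented 400 (empty `messages`) and missing IDs before a network round trip. This is a minimal sketch: the interface names, the hypothetical `validateIngestRequest` helper, and typing `role` as a plain `string` are assumptions; the SDK's own exported types may differ.

```typescript
// Minimal sketch of the ingest request shape plus a client-side pre-flight check.
// Field names come from this page; the interfaces and validateIngestRequest are
// illustrative assumptions, not the SDK's exported types.

interface IngestMessage {
  role: string; // e.g. 'user' or 'assistant'
  content: string;
}

interface IngestRequest {
  messages: IngestMessage[];
  user_id: string; // keys the per-user session namespace
  conv_id: string; // anchors extracted memories to a conversation
  agent_id?: string;
  app_id?: string;
  metadata?: Record<string, unknown>; // keys become filterable on search
  extract_artifacts?: boolean; // opts into artifact extraction
}

// Returns a list of problems; an empty list means the request looks well-formed.
function validateIngestRequest(req: IngestRequest): string[] {
  const problems: string[] = [];
  const messages = Array.isArray(req.messages) ? req.messages : [];
  if (messages.length === 0) {
    problems.push('messages must be a non-empty array (the API returns 400 otherwise)');
  }
  messages.forEach((m, i) => {
    if (!m.role || !m.content) problems.push(`messages[${i}] needs both role and content`);
  });
  if (!req.user_id) problems.push('user_id is required');
  if (!req.conv_id) problems.push('conv_id is required');
  return problems;
}
```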

Async ingest (default)

```typescript
// Assumes `client` is an already-initialized XTrace Memory SDK client.
const job = await client.memories.ingest({
  messages: [
    { role: 'user', content: 'My favorite food is pad see ew.' },
    { role: 'assistant', content: 'Noted: Thai food.' },
  ],
  user_id: 'alice',
  conv_id: 'conv_2026_05_16',
});

// pollUntilDone handles exponential backoff (500ms → 5s) and timeout.
const done = await client.memories.jobs.pollUntilDone(job.id);

if (done.status === 'failed') {
  throw new Error(`Ingest failed: ${done.error?.message}`);
}

console.log('Created', done.result?.memories_created.length, 'memories');
```

Sync ingest (`wait: true`)

Useful for demos, one-shot scripts, or any code where you want the result inline:

```typescript
const job = await client.memories.ingest(
  {
    messages: [{ role: 'user', content: 'I am vegetarian.' }],
    user_id: 'alice',
    conv_id: 'conv_2026_05_16',
  },
  { wait: true },
);

if (job.status === 'succeeded') {
  console.log('Inline result:', job.result?.memories_created);
} else if (job.status === 'failed') {
  console.error('Extraction failed:', job.error);
} else {
  // Sync budget (30s) elapsed, so the job fell back to async; poll job.id as above.
  console.log('Polling required:', job.id);
}
```

The server holds the connection for up to 30 seconds. If extraction finishes within that window, the response is terminal (succeeded or failed). If the budget elapses, you get a pending or running job back and must poll it, exactly as in async mode.
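Because a `wait: true` call can return either a terminal job or a still-running one, it can help to funnel both cases through one helper. A minimal sketch, assuming a simplified `Job` type; `settle` is a hypothetical name, and its `poll` parameter stands in for `client.memories.jobs.pollUntilDone`.

```typescript
// Minimal sketch: normalize a `wait: true` response into a terminal job.
// Assumptions: the simplified Job type and the `poll` callback, which stands in
// for client.memories.jobs.pollUntilDone.

type Status = 'pending' | 'running' | 'succeeded' | 'failed';

interface Job {
  id: string;
  status: Status;
}

function isTerminal(job: Job): boolean {
  return job.status === 'succeeded' || job.status === 'failed';
}

async function settle<J extends Job>(job: J, poll: (id: string) => Promise<J>): Promise<J> {
  // Extraction finished inside the 30s sync window: the response is already final.
  if (isTerminal(job)) return job;
  // Budget elapsed and the job is still pending or running: poll by id, as in async mode.
  return poll(job.id);
}
```

In application code this would be called as `await settle(job, (id) => client.memories.jobs.pollUntilDone(id))`, so the code after it handles a single shape of result regardless of whether the sync window was enough.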

Use sync mode for interactive demos and CLI tools; use async mode for production agent loops where you want to dispatch ingest and continue working.

What gets extracted

You pass messages; you don’t pre-decide what’s a fact vs an artifact vs an episode. The server’s extraction pipeline decides:

| Type | Triggered when |
|---|---|
| Fact | The default. A semantic claim in a turn (“User likes X”, “User works at Y”). |
| Artifact | The conversation references a structured object (a doc, code snippet, or summary) that’s worth storing standalone. Requires `extract_artifacts: true` on ingest. |
| Episode | A stretch of turns gets summarized into a session-level memory. Server-driven; no client knob. |

The result.memories_created array tells you what landed; each entry is a thin reference ({id, type, text}). For the full row, call client.memories.get(id).
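To see at a glance what an ingest produced, you can tally `memories_created` by type. A minimal sketch: the `{ id, type, text }` entry shape comes from this page, but the literal type strings (`'fact'`, `'artifact'`, `'episode'`) are assumed, and `countByType` is a hypothetical helper.

```typescript
// Minimal sketch: summarize result.memories_created by memory type.
// The entry shape is from this page; the exact `type` string values are assumed.

interface MemoryRef {
  id: string;
  type: string; // assumed values: 'fact' | 'artifact' | 'episode'
  text: string;
}

function countByType(refs: MemoryRef[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ref of refs) {
    // Types that never appear are simply absent from the result.
    counts[ref.type] = (counts[ref.type] ?? 0) + 1;
  }
  return counts;
}
```

When you need more than the thin reference, map the ids through `client.memories.get(id)` to fetch the full rows.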

What’s in `metadata`

Anything you put in metadata is stored verbatim on every memory extracted from this ingest, and each key becomes an indexed payload field filterable on search:

```typescript
await client.memories.ingest({
  messages: [/* ... */],
  user_id: 'alice',
  conv_id: 'conv_2026_05_16',
  metadata: {
    project:  'atlas',
    channel:  'support',
    priority: 'high',
  },
});

// Later:
await client.memories.search({
  query: 'thai food',
  filters: { user_id: 'alice', project: 'atlas' },
});
```

Reserved internal keys (tag1–tag5, kb_type, org_id, etc.) are stripped silently.
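Because reserved keys are dropped silently, you may want to detect them before ingest rather than discover missing filters later. A minimal client-side sketch that mirrors the stripping for the keys named above only; the server's complete list is not documented here ("etc."), so `KNOWN_RESERVED_KEYS` is deliberately incomplete, and `partitionMetadata` is a hypothetical helper.

```typescript
// Minimal sketch: split metadata into keys that will be kept and known-reserved keys
// that the server strips. Only the keys named on this page are listed; the real
// reserved set is larger ("etc."), so this check can miss some.

const KNOWN_RESERVED_KEYS = new Set(['tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'kb_type', 'org_id']);

function partitionMetadata(metadata: Record<string, unknown>): {
  kept: Record<string, unknown>;
  dropped: string[];
} {
  const kept: Record<string, unknown> = {};
  const dropped: string[] = [];
  for (const [key, value] of Object.entries(metadata)) {
    if (KNOWN_RESERVED_KEYS.has(key)) dropped.push(key);
    else kept[key] = value;
  }
  return { kept, dropped };
}
```

A non-empty `dropped` list is a good moment to log a warning, since those keys will not be available as search filters.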

Failure modes

Extraction can fail for several reasons: an upstream LLM error, content that yields no extractable facts, or rate limits. The job then lands in status: "failed" with an error.code and error.message. Retry by submitting the same body again; we don’t auto-retry server-side. Common failure codes:

| Code | Meaning |
|---|---|
| `ingest_failed` | Generic extraction error; check `error.message` |
| `rate_limit_exceeded` | Org quota hit; wait and retry |
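Since the server does not retry, retry logic lives in your code. A minimal sketch of resubmitting the same body after a `failed` job, waiting longer on `rate_limit_exceeded` than on a generic `ingest_failed`. The `submit` callback (an ingest followed by waiting for a terminal status, for example via `pollUntilDone`), the attempt limit, and the delay multipliers are illustrative assumptions, not documented values.

```typescript
// Minimal sketch of client-side retry for failed ingest jobs.
// Assumptions: `submit` performs an ingest and resolves with a terminal job
// (for example ingest followed by pollUntilDone); maxAttempts and the delay
// multipliers are illustrative, not documented values.

interface JobError {
  code: string; // e.g. 'ingest_failed' or 'rate_limit_exceeded'
  message: string;
}

interface TerminalJob {
  id: string;
  status: 'succeeded' | 'failed';
  error?: JobError;
}

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function ingestWithRetry<B>(
  submit: (body: B) => Promise<TerminalJob>,
  body: B,
  { maxAttempts = 3, baseDelayMs = 1_000 } = {},
): Promise<TerminalJob> {
  let job = await submit(body);
  for (let attempt = 2; attempt <= maxAttempts && job.status === 'failed'; attempt++) {
    // Org quota hit: wait noticeably longer. Generic failure: short linear backoff.
    const factor = job.error?.code === 'rate_limit_exceeded' ? 4 : 1;
    await pause(baseDelayMs * factor * (attempt - 1));
    job = await submit(body); // resubmit the identical body
  }
  return job;
}
```

If every attempt fails, the last failed job is returned so the caller can still inspect `error.code` and `error.message`.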
