Connected Context and Persistent Memory: Neo4j Providers for the Microsoft Agent Framework

Standard RAG retrieves document chunks by semantic similarity. Ask about Apple’s risk exposure in SEC 10-K filings, and vector search returns the right paragraphs but misses the connections between them. A chunk mentioning competitive pricing surfaces separately from product categories, separately from geographic dependencies. The retriever can’t traverse from a filing excerpt to the company that filed it, to the risk factors it faces, and to the products it sells. A knowledge graph solves this by storing connections between unstructured text and the structured entities around it, so retrieval follows relationships rather than relying solely on similarity.

A second gap compounds the first. Without persistent memory, every conversation starts from zero. An agent has no record of what the user explored in prior sessions, what preferences they expressed, or which entities already surfaced. Continuity across sessions doesn’t exist.

These are two distinct problems, one of retrieval and one of memory, and they call for different architectural responses.

The Microsoft Agent Framework is an open-source SDK and runtime for building AI agents in Python and .NET. Agents invoke external tools through a standardized interface, whether those tools are local functions, REST APIs, or MCP servers. They form workflows in which multiple specialized agents collaborate on complex tasks, using a graph-based architecture that routes data along typed edges between components. The framework runs locally for development and integrates with Microsoft Foundry for production deployment with tracing and metrics.

The framework provides two complementary building blocks for data access: tools and context providers. Tools let an agent take explicit actions during a conversation turn by calling APIs, querying databases, or executing code. Context providers operate around the turn. They inject knowledge before the model runs and persist information after it responds, without the agent needing to request either. Neo4j addresses both gaps through two context providers built on this interface, one for knowledge graph retrieval and one for agent memory.

Two Neo4j Context Providers

What makes a graph database practical for these agent workloads is that graph traversal and semantic search are combined into a single operation. Neo4j includes built-in vector search, so a single query can find the most relevant text chunks based on embedding similarity, then expand through graph relationships to collect structured context such as products, risk factors, and geographic exposure, without a separate retrieval step for each. The pattern applies wherever relationships carry meaning: financial filings linking companies to risks, supply chains connecting parts through assemblies, compliance networks mapping regulations to dependencies.

The Neo4j Context Provider (a knowledge graph retriever) addresses the first gap in the opening scenario: accessing the risk factors and products that lie beyond the top-k chunks. It searches a Neo4j database and traverses the graph to return structured company data, including products, risk factors, and filing metadata, alongside the text chunks that vector search found. This provider is stateless. It reads from the graph but doesn’t write to it. The knowledge it surfaces comes from data that was loaded independently: SEC filings, product catalogs, maintenance records, whatever the graph contains.

The Neo4j Agent Memory provider addresses the second gap by ensuring that session twelve builds on sessions one through eleven. It stores conversation history, extracts entities and relationships from messages, records user preferences, and logs reasoning traces. On each turn, it injects relevant memories from prior conversations alongside the current context. Unlike the knowledge retriever, the memory provider writes to the graph on every interaction. The graph grows as the agent converses, building a personalized knowledge base that compounds over time.

Either context provider can be used independently, or both can be attached to the same agent simultaneously. The knowledge retriever brings domain expertise from a curated knowledge graph. Agent memory brings continuity and personalization from the agent’s interaction history. Together, they give the agent access to what it needs to know and what it has already learned.

How the Knowledge Graph Context Provider Works

The knowledge retriever delegates all searches to the neo4j-graphrag Python library, which provides tested components for vector, full-text, and hybrid search. The provider acts as an adapter between that library and the Microsoft Agent Framework’s (MAF) context provider interface.

When a user sends a message, the provider executes a five-step sequence:

  1. Filter messages. Keep only the most recent user and assistant messages from the conversation, typically the last 10 turns. System messages contain instructions, not searchable content.
  2. Build a query. Concatenate the filtered text into a single search string. Including conversational context helps the search stay relevant when the current message references something mentioned earlier.
  3. Execute the search. Run the query against a configured Neo4j index. For vector search, the provider embeds the query text and finds nodes with similar embeddings ranked by cosine similarity. For full-text search, the query passes to Neo4j’s BM25 scoring algorithm. The hybrid mode runs both and combines the results.
  4. Traverse the graph. If a retrieval_query is configured, execute it against each matched node. This Cypher query follows relationships from matched nodes to related entities and returns structured metadata alongside the original text. Without a retrieval query, the provider returns raw search results, which works well for simpler use cases where graph traversal isn’t needed.
  5. Format and inject. Package the results as messages that the framework injects into the conversation. Each result includes its relevance score, metadata fields from the retrieval query, and the text content.
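The five steps above can be sketched in a few lines; every name here is illustrative, not the provider’s real API:

```python
def build_context(messages, retriever, history_window=10):
    """Illustrative sketch of the provider's per-turn sequence."""
    # 1. Filter: keep recent user/assistant turns, drop system messages
    recent = [m for m in messages if m["role"] in ("user", "assistant")]
    recent = recent[-history_window:]
    # 2. Build a single search string from the filtered turns
    query = "\n".join(m["content"] for m in recent)
    # 3-4. Execute the search; a configured retrieval_query would run
    #      inside the retriever against each matched node
    results = retriever(query)
    # 5. Format results as context the framework injects before the model runs
    lines = [f"[score={r['score']:.2f}] {r['content']}" for r in results]
    return "\n".join(lines)
```

The sketch compresses steps 3 and 4 into a single retriever call, which mirrors how the real provider delegates both search and traversal to neo4j-graphrag in one operation.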

The model receives the user’s question alongside formatted search results and uses them to respond. It doesn’t know the results came from Neo4j.

The context provider offers multiple retrieval patterns, all based on the neo4j-graphrag Python library. These include:

  • VectorRetriever — semantic similarity search using embeddings
  • VectorCypherRetriever — vector search followed by a Cypher graph traversal that collects structured metadata from connected entities
  • HybridRetriever — combines vector and full-text (BM25) search
  • HybridCypherRetriever — hybrid search followed by a Cypher graph traversal
  • FulltextRetriever — keyword-based BM25 search

The Cypher variants add a graph traversal step after the initial search, following relationships from matched nodes to related entities. This design means graph enrichment is an upgrade path, not a commitment. Start with basic vector search and add a retrieval query later without changing agent code.
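As a configuration sketch of that upgrade path with the neo4j-graphrag library: the connection details, the chunk_embeddings index name, and the HAS_CHUNK relationship below are assumptions for illustration, not values the provider requires.

```python
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorRetriever, VectorCypherRetriever

# Assumed connection and index names, for illustration only
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
embedder = OpenAIEmbeddings(model="text-embedding-3-small")

# Start simple: plain semantic search over the chunk vector index
retriever = VectorRetriever(driver, index_name="chunk_embeddings", embedder=embedder)

# Upgrade later: same index, plus a Cypher traversal over matched chunks
retriever = VectorCypherRetriever(
    driver,
    index_name="chunk_embeddings",
    retrieval_query=(
        "MATCH (node)<-[:HAS_CHUNK]-(d:Document) "
        "RETURN node.text AS text, score, d.title AS title"
    ),
    embedder=embedder,
)

results = retriever.search(query_text="Apple supply chain risks", top_k=5)
```

Swapping the retriever class and adding a retrieval_query is the entire change; the agent code that consumes the results stays the same.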

Configuring Graph-Enriched Retrieval

Graph-enriched retrieval is where the true power of GraphRAG lies: semantic search finds relevant text chunks, and graph traversal surfaces the structured context around them. This is configured with a retrieval query, a Cypher query that runs after the vector search and defines which relationships to traverse, what metadata to collect, and how to structure the results the agent receives.

The example below shows how this works with a knowledge graph built from SEC filings, where document chunks link to documents, documents link to companies, and companies link to products and risk factors. The retrieval query would then be part of the context provider configuration.

The query receives two variables from the vector search: node (the matched chunk) and score (its similarity ranking). From there, it walks the graph. The first MATCH follows the chain from the chunk to its parent document and on to the company that filed it. Two OPTIONAL MATCH clauses then collect the related risk factors and products in separate passes, avoiding the cross-product duplication that would occur if both were matched in a single clause. Each collection is capped at five items. The WHERE score IS NOT NULL filter removes any rows that lost their score during optional matching. The final RETURN assembles a flat result with the original text, the similarity score, and the structured metadata the agent will use.
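A sketch of such a retrieval query follows. The labels and relationship types (Document, Company, RiskFactor, Product, PART_OF, FILED, FACES_RISK, SELLS) are assumptions about the graph model, not a published schema; node and score are supplied by the preceding vector search.

```python
# Hypothetical retrieval query for the SEC-filings graph described above.
RETRIEVAL_QUERY = """
MATCH (node)-[:PART_OF]->(doc:Document)<-[:FILED]-(company:Company)
OPTIONAL MATCH (company)-[:FACES_RISK]->(risk:RiskFactor)
WITH node, score, doc, company,
     collect(DISTINCT risk.name)[..5] AS risks
OPTIONAL MATCH (company)-[:SELLS]->(product:Product)
WITH node, score, doc, company, risks,
     collect(DISTINCT product.name)[..5] AS products
WHERE score IS NOT NULL
RETURN node.text AS text,
       score,
       company.name AS company,
       doc.form AS form,
       risks,
       products
"""
```

The two WITH clauses separate the risk and product collections so neither multiplies the other’s rows, and the `[..5]` slices cap each list at five items.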

The provider configuration points at a Neo4j vector index and passes the retrieval query. The key architectural choices are index_type, which selects the search strategy; retrieval_query, which triggers graph traversal after search; and top_k, which controls how many chunks the initial search returns before the Cypher traversal runs against each one.
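As a configuration sketch, with key names mirroring the description above (the provider’s actual keyword arguments may differ):

```python
# Hypothetical configuration for the knowledge graph context provider.
provider_config = {
    "index_name": "chunk_embeddings",  # Neo4j vector index over chunk text
    "index_type": "vector",            # search strategy: vector, fulltext, or hybrid
    "top_k": 5,                        # chunks returned before graph traversal
    # Cypher executed against each matched chunk (abbreviated here):
    "retrieval_query": (
        "MATCH (node)-[:PART_OF]->(:Document)<-[:FILED]-(c:Company) "
        "RETURN node.text AS text, score, c.name AS company"
    ),
}
```

Leaving retrieval_query out of this configuration falls back to raw search results, which is the upgrade path described earlier.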

How Graph Traversal Changes What the Agent Sees

The configuration above points at a vector index and adds a retrieval query for graph traversal. To see what that retrieval query changes, consider the same question run both ways: first with vector search alone, then with the graph traversal applied after it. The underlying search is identical. The difference is what the agent receives.

Vector Search Only

The retriever returns text chunks ranked by cosine similarity. The paragraphs are relevant, but they arrive as disconnected fragments.

The agent responds with what it can piece together from those chunks.

The answer is partial. The agent found products in chunk 1 and mentions competition from chunk 2, but specific risk factors like geography, short product life cycles, and evolving industry standards weren’t in the top-k chunks, so they’re missing from the response entirely.

Graph-Enriched Retrieval

The same vector search finds the same chunks. But the retrieval query then traverses the graph, following relationships from chunks to the company node and on to connected products and risk factors.

The agent now has structured context and responds comprehensively.

Same vector search, same top-k chunks. The difference is what happens after the search. The retrieval query traverses the graph and surfaces a structured context that the agent can reason over.

Graph-enriched retrieval addresses the first gap identified in the opening: reaching entities beyond the top-k chunks. But the second gap remains. The agent still has no memory of prior sessions, no record of what the user has already explored, and no accumulated preferences. Each conversation starts from zero.

How Neo4j Agent Memory Works

Neo4j Agent Memory closes this second gap. Where the knowledge retriever gives an agent access to a curated knowledge base, the memory provider enables it to learn from its own conversations.

The Neo4j Agent Memory provider implements MAF’s context-provider interface, with both before_run and after_run hooks. Before the model runs, it gathers relevant memories and injects them as context. After the model responds, it persists the new messages, extracts entities and relationships, and optionally records reasoning traces. The graph grows with every conversation.

On each turn, the before_run hook assembles context from three memory types. It pulls recent messages from the current session along with semantically similar messages from past sessions. It retrieves user preferences and relevant entities from long-term memory. It finds similar past tasks from the reasoning trace store. All of this is formatted and injected into the agent’s context window alongside whatever the knowledge retriever contributed.

The after_run hook handles persistence. It saves the new user and assistant messages, along with their embeddings, for future semantic search. It runs entity extraction over the conversation text, identifying people, organizations, locations, and other entities, and writes them to the graph with relationships linking them back to the messages that mentioned them. Entity extraction can run asynchronously, so it doesn’t block the response.
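A minimal sketch of the two-hook shape, assuming nothing about MAF’s actual base classes: the class, the keyword-overlap recall, and the title-case entity extraction below are toy stand-ins for the Neo4j-backed implementation.

```python
class MemoryProviderSketch:
    """Toy illustration of the before_run / after_run lifecycle."""

    def __init__(self):
        self.messages = []      # stands in for Message nodes in Neo4j
        self.entities = set()   # stands in for extracted entity nodes

    def before_run(self, user_message: str) -> str:
        # Gather prior messages related to the new one (real provider:
        # embedding similarity over the stored conversation graph)
        words = set(user_message.lower().split())
        hits = [m for m in self.messages if words & set(m.lower().split())]
        return "\n".join(hits[-3:])  # inject a few relevant memories

    def after_run(self, user_message: str, assistant_message: str) -> None:
        # Persist the turn, then run a crude stand-in for entity extraction
        self.messages += [user_message, assistant_message]
        for word in (user_message + " " + assistant_message).split():
            if word.istitle():
                self.entities.add(word.strip(".,?"))
```

The real provider persists to Neo4j and can run extraction asynchronously; the point here is only the shape of the turn lifecycle: read before the model runs, write after it responds.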

What Neo4j Agent Memory Stores

The memory provider organizes knowledge into three layers, each serving a different temporal and structural purpose.

Short-term memory captures the conversation itself. When the analyst asks about Apple’s supply chain exposure in session twelve, the provider surfaces a relevant exchange from session three about semiconductor sourcing, even though the two conversations used different terminology. Messages are stored as nodes linked in sequence by NEXT_MESSAGE relationships, grouped under a Conversation node for the session. Each message carries an embedding vector, enabling semantic search across the full conversation history.
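In Cypher terms, the short-term layer might look like the sketch below. The Conversation and Message labels and the NEXT_MESSAGE relationship come from the description above; the property names, HAS_MESSAGE relationship, and message_embeddings index name are assumptions.

```python
# Hypothetical Cypher for storing one turn in short-term memory.
STORE_TURN = """
MATCH (c:Conversation {session_id: $session_id})
MATCH (prev:Message {id: $last_message_id})
CREATE (m:Message {id: $id, role: $role, text: $text, embedding: $embedding})
MERGE (c)-[:HAS_MESSAGE]->(m)
MERGE (prev)-[:NEXT_MESSAGE]->(m)
"""

# Semantic recall over the full history via Neo4j's vector index.
RECALL = """
CALL db.index.vector.queryNodes('message_embeddings', $k, $query_embedding)
YIELD node, score
RETURN node.text AS text, score
ORDER BY score DESC
"""
```

Because every Message carries an embedding, the recall query reaches exchanges from any past session regardless of the terminology used.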

Long-term memory structures the knowledge that accumulates across conversations. The system knows this analyst focuses on risk exposure rather than dividend yield, and that “Apple” and “Apple Inc.” refer to the same entity. It stores four types of information:

  • Entities follow the POLE+O classification (Person, Organization, Location, Event, and Object), a taxonomy that provides consistent entity typing across extraction methods. The extraction pipeline also supports domain-specific schemas beyond POLE+O for specialized use cases, including scientific, medical, legal, and business contexts. Entities are extracted from conversations, deduplicated using a combination of embedding similarity and fuzzy string matching, and connected through typed relationships.
  • Preferences capture what the user cares about, categorized by topic.
  • Facts represent subject-predicate-object triples with temporal validity, recording, for example, that a company appointed a new CEO effective on a specific date.
  • Relationships between entities are first-class objects, linking a company to its products, a person to their role, or a risk factor to the geography it affects.

Reasoning memory records how the agent has worked, not just what it discussed. Each task execution is stored as a ReasoningTrace containing the individual steps the agent took, their arguments and results, and the outcome. These traces carry embeddings of the task description, so when a similar request arrives in a future session, say another company-risk analysis, the provider can surface the prior approach: the tools it called, the way it structured the work, and whether it succeeded.

The deduplication system warrants closer inspection. When extraction identifies “Apple” in one message and “Apple Inc.” in another, the resolution pipeline compares them using exact matching, fuzzy string matching, and embedding similarity. If the confidence exceeds the threshold, the two nodes are linked by a SAME_AS relationship, so the graph treats them as one entity. Without this step, the graph would fragment into disconnected mentions of the same entity, defeating the purpose of graph-based memory.
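A toy version of that resolution check, with an illustrative threshold and an even 50/50 blend of the two signals (the provider’s actual weights, matchers, and thresholds are not specified here):

```python
from difflib import SequenceMatcher

def same_entity(name_a, name_b, emb_a, emb_b, threshold=0.8):
    """Toy entity-resolution check: exact, fuzzy, and embedding signals."""
    if name_a.lower() == name_b.lower():
        return True  # exact match short-circuits
    # Fuzzy string similarity on the normalized names
    fuzzy = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    # Cosine similarity of the two embedding vectors
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm = (sum(x * x for x in emb_a) ** 0.5) * (sum(y * y for y in emb_b) ** 0.5)
    cosine = dot / norm if norm else 0.0
    # Above the blended threshold, the nodes would be linked SAME_AS
    return 0.5 * fuzzy + 0.5 * cosine >= threshold
```

With these toy weights, “Apple” and “Apple Inc.” pass on the strength of near-identical embeddings even though the strings only partially match.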

How Persistent Memory Changes What the Agent Remembers

The knowledge retriever’s value is visible in a single turn. Agent memory’s value emerges across sessions. The memory provider described above stores conversations, extracts entities, and records reasoning traces on every turn. To see what that persistence changes, consider the same question with and without the memory provider attached. The underlying SEC filing data is identical. The difference is whether the agent can draw on what it learned in prior sessions.

Without Memory

The analyst asks: “How does Apple’s supply chain risk compare to what we discussed last week?” The agent has no prior context. It searches SEC filing data and returns whatever the current top-k chunks contain about supply chains. It says nothing about last week’s conversation, the semiconductor sourcing discussion from session three, or the analyst’s established focus on geographic risk. Every session starts at zero.

With Memory

The same question triggers the memory provider. Short-term memory surfaces the session-three exchange about semiconductor sourcing, matched by embedding similarity even though the analyst used different terminology. Long-term memory contributes the analyst’s recorded preference for geographic risk analysis and the deduplicated entity graph linking Apple to its suppliers. Reasoning memory finds a similar company-risk analysis that the agent ran in session seven and surfaces the approach it used.

The agent synthesizes current SEC data with prior conversational context.

The knowledge retriever contributed the SEC filing data. The memory provider contributed the conversational continuity that made the response coherent across sessions.

Combining Both Providers

Both providers are attached to a single agent via MAF’s context provider list. The following example shows how to configure both providers on a single agent so that it benefits from graph-enriched retrieval and persistent memory simultaneously. The configuration determines what each provider contributes to the agent’s context window on every turn.
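Sketched as configuration, with hypothetical provider and parameter names (MAF’s actual constructor arguments differ):

```python
# Hypothetical wiring of both providers on one agent; all names here
# are illustrative stand-ins, not the framework's real API.
knowledge = {
    "provider": "neo4j_knowledge",
    "index_name": "chunk_embeddings",
    "index_type": "vector",
    "top_k": 5,
}
memory = {
    "provider": "neo4j_memory",
    "user_id": "analyst-42",
    "extract_entities": True,
}
agent_config = {
    "model": "gpt-4o",
    "instructions": "You are a financial research assistant.",
    # Both context providers run around every turn: the first injects
    # graph search results, the second injects and persists memories.
    "context_providers": [knowledge, memory],
}
```

Because both providers implement the same interface, attaching or removing one is a configuration change, not an agent-code change.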

On each turn, the knowledge retriever searches the SEC filings graph and injects structured company data. The memory provider injects relevant past conversations, known preferences, and similar prior analyses. The agent sees both: the domain knowledge it needs and the conversational history that makes its responses coherent across sessions.

Context That Compounds

In the first sessions, the knowledge retriever carries most of the weight. The memory graph is sparse, and the agent answers from SEC filing data alone. It surfaces risk factors, products, and geographic exposure because the retrieval query traverses those relationships. Still, it has no sense of what the analyst has already covered or what patterns they care about.

By session ten, the balance shifts. The memory graph holds dozens of entity nodes extracted from prior conversations, a record of which risk categories the analyst returns to most often, and reasoning traces from completed analyses. When the analyst asks about supply chain exposure, the memory provider surfaces the semiconductor sourcing discussion from session three and the preference for geographic risk. The knowledge retriever still searches the same SEC filings graph, but the memory provider narrows what matters. The two providers start reinforcing each other.

By session fifty, the entity graph is dense with deduplicated nodes linking companies, people, risk factors, and products across months of analysis. Reasoning traces from prior analyses provide reusable patterns for structuring new responses. A question about Apple’s risk profile no longer returns a generic summary. It lands in a context shaped by every company the analyst has compared, every risk category they have prioritized, and every analytical approach that succeeded before. The curated knowledge hasn’t changed. What changed is everything the agent learned along the way.

Deploy and Integrate

Ready to move from concept to code? Configure the knowledge retriever against your own graph, add a retrieval query as your schema matures, attach the memory provider, and deploy the agent to Azure with Microsoft Foundry for production tracing and metrics.
