FileSystem

The FileSystem middleware injects a set of file system operation tools (ls, read_file, write_file, edit_file, glob, grep) and an optional command execution tool (execute) into the Agent, enabling the Agent to interact with local or remote file systems.

import "github.com/cloudwego/eino/adk/middlewares/filesystem"

Quick Start

import (
    "context"

    "github.com/cloudwego/eino/adk"
    "github.com/cloudwego/eino/adk/middlewares/filesystem"
)

ctx := context.Background()

// 1. Create the middleware
middleware, err := filesystem.New(ctx, &filesystem.MiddlewareConfig{
    Backend: myBackend, // Implements the Backend interface from github.com/cloudwego/eino/adk/filesystem
})
if err != nil {
    // handle the error
}

// 2. Inject it into the Agent
agent, err := adk.NewChatModelAgent(ctx, &adk.ChatModelAgentConfig{
    // ...
    Middlewares: []adk.ChatModelAgentMiddleware{middleware},
})
if err != nil {
    // handle the error
}

Constructors

| Function Signature | Description |
| --- | --- |
| New(ctx, *MiddlewareConfig) (ChatModelAgentMiddleware, error) | Recommended. Returns ChatModelAgentMiddleware, which supports dynamically modifying Instruction and Tools through the BeforeAgent hook. |
| NewTyped[M MessageType](ctx, *MiddlewareConfig) (TypedChatModelAgentMiddleware[M], error) | Generic version. The type parameter M supports *schema.Message and *schema.AgenticMessage. New is equivalent to NewTyped[*schema.Message]. |

💡 Deprecated: NewMiddleware(ctx, *Config) (AgentMiddleware, error) is the legacy constructor; new code should use New. NewMiddleware returns the struct AgentMiddleware, which lacks the flexibility of the BeforeAgent hook; additionally, it enables the “large result offloading” feature by default (see below), which has been removed in the New path.


MiddlewareConfig

MiddlewareConfig is the configuration struct used by New / NewTyped.

Core Fields

| Field | Type | Description |
| --- | --- | --- |
| Backend | filesystem.Backend | Required. Provides file system operation capabilities, powering the 6 tools: ls, read_file, write_file, edit_file, glob, grep. The interface is defined in the github.com/cloudwego/eino/adk/filesystem package. |
| Shell | filesystem.Shell | Optional. Provides command execution capability; when set, registers the execute tool. Mutually exclusive with StreamingShell. |
| StreamingShell | filesystem.StreamingShell | Optional. Provides streaming command execution capability; when set, registers the streaming execute tool. Mutually exclusive with Shell. |
| UseMultiModalRead | bool | Optional, defaults to false. When enabled, the read_file tool becomes an EnhancedInvokableTool, supporting multi-modal content such as images/PDFs. Requires the Backend to also implement the filesystem.MultiModalReader interface. |
| CustomSystemPrompt | *string | Optional. Overrides the system prompt appended to the Agent Instruction. If nil, no system prompt is appended. |

Tool Configuration Fields

Each tool has a corresponding *ToolConfig field for customizing the tool name, description, replacing the implementation, or disabling it:

| Field | Corresponding Tool |
| --- | --- |
| LsToolConfig | ls |
| ReadFileToolConfig | read_file |
| WriteFileToolConfig | write_file |
| EditFileToolConfig | edit_file |
| GlobToolConfig | glob |
| GrepToolConfig | grep |

The execute tool currently does not support customization via ToolConfig; its registration is controlled solely by whether Shell / StreamingShell is set.


ToolConfig

type ToolConfig struct {
    Name       string         // Override tool name, empty string uses default
    Desc       *string        // Override tool description, nil uses default
    CustomTool tool.BaseTool  // Custom tool implementation, replaces Backend default when set
    Disable    bool           // Set to true to not register this tool
}

Priority: Disable=true > CustomTool > Backend default implementation.
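The priority rule above can be sketched as a small resolution function. This is a minimal illustration, not the middleware's actual code: toolConfig, resolveImpl, and the string stand-in for tool.BaseTool are hypothetical names introduced so the example runs without any eino imports.

```go
package main

import "fmt"

// toolConfig mirrors the shape of ToolConfig for illustration only.
// CustomTool is reduced to a string stand-in for tool.BaseTool (an assumption).
type toolConfig struct {
	CustomTool string // "" means no custom tool is set
	Disable    bool
}

// resolveImpl applies the priority Disable=true > CustomTool > Backend default.
// It returns which implementation would be used and whether the tool is registered.
func resolveImpl(cfg *toolConfig) (impl string, registered bool) {
	if cfg == nil {
		return "backend", true // no config: Backend default
	}
	if cfg.Disable {
		return "", false // Disable wins even if CustomTool is set
	}
	if cfg.CustomTool != "" {
		return cfg.CustomTool, true // CustomTool replaces the Backend default
	}
	return "backend", true
}

func main() {
	impl, ok := resolveImpl(&toolConfig{CustomTool: "ripgrep", Disable: true})
	fmt.Println(impl, ok)
	impl, ok = resolveImpl(&toolConfig{CustomTool: "ripgrep"})
	fmt.Println(impl, ok)
}
```

Checking Disable first means a disabled tool is never registered, regardless of whether a custom implementation was also supplied.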


Tool Name Constants

const (
    ToolNameLs        = "ls"
    ToolNameReadFile  = "read_file"
    ToolNameWriteFile = "write_file"
    ToolNameEditFile  = "edit_file"
    ToolNameGlob      = "glob"
    ToolNameGrep      = "grep"
    ToolNameExecute   = "execute"
)

Injected Tools

| Tool | Default Name | Registration Condition | Description |
| --- | --- | --- | --- |
| ls | ls | Backend ≠ nil | List files and subdirectories in a directory |
| read_file | read_file | Backend ≠ nil | Read file content, supports offset/limit pagination. When UseMultiModalRead is enabled, can read images and PDFs |
| write_file | write_file | Backend ≠ nil | Create or overwrite a file |
| edit_file | edit_file | Backend ≠ nil | Precise string replacement editing, supports replace_all |
| glob | glob | Backend ≠ nil | Match file paths by glob pattern |
| grep | grep | Backend ≠ nil | Regex search of file content, supports multiple output modes and pagination |
| execute | execute | Shell ≠ nil or StreamingShell ≠ nil | Execute shell commands |

Backend Interface

Backend is defined in the github.com/cloudwego/eino/adk/filesystem package. The middleware package re-exports request/response types via type aliases (e.g., ReadRequest, FileContent), but the Backend interface itself needs to be referenced from the adk/filesystem package.

type Backend interface {
    LsInfo(ctx context.Context, req *LsInfoRequest) ([]FileInfo, error)
    Read(ctx context.Context, req *ReadRequest) (*FileContent, error)
    GrepRaw(ctx context.Context, req *GrepRequest) ([]GrepMatch, error)
    GlobInfo(ctx context.Context, req *GlobInfoRequest) ([]FileInfo, error)
    Write(ctx context.Context, req *WriteRequest) error
    Edit(ctx context.Context, req *EditRequest) error
}
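As a rough illustration of the storage logic a custom Backend might wrap, the sketch below keeps files in a map and serves line-based reads with an offset and a limit. The memStore type, its method signatures, and the zero-based offset semantics are assumptions made for this example; the real request and response types (ReadRequest, FileContent, WriteRequest) are not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// memStore is a hypothetical in-memory file store; it is not part of eino.
type memStore struct {
	mu    sync.RWMutex
	files map[string]string
}

func newMemStore() *memStore {
	return &memStore{files: map[string]string{}}
}

// write creates the file or overwrites its previous content.
func (s *memStore) write(path, content string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.files[path] = content
}

// read returns up to limit lines starting at the zero-based line offset.
// In this sketch a limit <= 0 means "no limit".
func (s *memStore) read(path string, offset, limit int) (string, error) {
	s.mu.RLock()
	content, ok := s.files[path]
	s.mu.RUnlock()
	if !ok {
		return "", errors.New("file not found: " + path)
	}
	lines := strings.Split(content, "\n")
	if offset < 0 {
		offset = 0
	}
	if offset >= len(lines) {
		return "", nil // reading past the end yields an empty page
	}
	end := len(lines)
	if limit > 0 && offset+limit < end {
		end = offset + limit
	}
	return strings.Join(lines[offset:end], "\n"), nil
}

func main() {
	s := newMemStore()
	s.write("/notes.txt", "a\nb\nc\nd")
	page, err := s.read("/notes.txt", 1, 2)
	fmt.Printf("%q %v\n", page, err)
}
```

A real Backend would adapt logic like this to the six interface methods and translate between these plain arguments and the library's request structs.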

Shell and StreamingShell

type Shell interface {
    Execute(ctx context.Context, input *ExecuteRequest) (*ExecuteResponse, error)
}

type StreamingShell interface {
    ExecuteStreaming(ctx context.Context, input *ExecuteRequest) (*schema.StreamReader[*ExecuteResponse], error)
}

Shell and StreamingShell are mutually exclusive: only one of them can be set. StreamingShell streams its output, which suits long-running commands.
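One way to picture the exclusivity rule is a validation step that decides which variant of the execute tool to register. The sketch below is an assumption about how such a check could look; the config struct, the executeMode function, and the choice to return an error when both fields are set are hypothetical and not taken from the library.

```go
package main

import "fmt"

// shell and streamingShell are empty stand-ins for the two interfaces above.
type shell interface{}
type streamingShell interface{}

// config holds only the two fields relevant to the execute tool.
type config struct {
	Shell          shell
	StreamingShell streamingShell
}

// executeMode reports which execute variant a config would register:
// "none", "invoke", or "stream". Setting both fields is rejected.
func executeMode(c config) (string, error) {
	switch {
	case c.Shell != nil && c.StreamingShell != nil:
		return "", fmt.Errorf("only one of Shell and StreamingShell may be set")
	case c.Shell != nil:
		return "invoke", nil
	case c.StreamingShell != nil:
		return "stream", nil
	default:
		return "none", nil // no execute tool is registered
	}
}

func main() {
	mode, err := executeMode(config{StreamingShell: struct{}{}})
	fmt.Println(mode, err)
	_, err = executeMode(config{Shell: struct{}{}, StreamingShell: struct{}{}})
	fmt.Println(err)
}
```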


MultiModalReader Extension Interface

When UseMultiModalRead = true, the Backend needs to additionally implement the MultiModalReader interface:

type MultiModalReader interface {
    MultiModalRead(ctx context.Context, req *MultiModalReadRequest) (*MultiFileContent, error)
}

Behavior:

  • The read_file tool is upgraded from InvokableTool to EnhancedInvokableTool, returning multi-modal results via schema.ToolResult.Parts
  • The default implementation supports reading image files (PNG, JPG, etc.) and PDF files (supports the pages parameter to specify page ranges, up to 20 pages at a time)
  • The tool description automatically appends a multi-modal capability suffix; if the description is customized via ReadFileToolConfig.Desc, no suffix is appended
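For a custom MultiModalReader, the pages parameter has to be interpreted somewhere. The following is a minimal sketch of parsing a page range such as "3" or "1-5" with a 20-page cap. parsePages and maxPages are hypothetical names, and returning an error for oversized ranges (rather than truncating) is an assumption; the default implementation may behave differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxPages mirrors the 20-page limit described above.
const maxPages = 20

// parsePages parses a 1-based page range such as "3" or "1-5" and returns
// the inclusive first and last page numbers.
func parsePages(s string) (first, last int, err error) {
	lo, hi, isRange := strings.Cut(strings.TrimSpace(s), "-")
	first, err = strconv.Atoi(strings.TrimSpace(lo))
	if err != nil {
		return 0, 0, fmt.Errorf("invalid page %q: %w", lo, err)
	}
	last = first // a single page is a range of length one
	if isRange {
		last, err = strconv.Atoi(strings.TrimSpace(hi))
		if err != nil {
			return 0, 0, fmt.Errorf("invalid page %q: %w", hi, err)
		}
	}
	if first < 1 || last < first {
		return 0, 0, fmt.Errorf("invalid page range %q", s)
	}
	if last-first+1 > maxPages {
		return 0, 0, fmt.Errorf("page range %q exceeds %d pages", s, maxPages)
	}
	return first, last, nil
}

func main() {
	for _, in := range []string{"3", "1-5", "10-20", "1-30"} {
		first, last, err := parsePages(in)
		fmt.Println(in, first, last, err)
	}
}
```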

💡 When using ChatModelAgentMiddleware, you need to implement the WrapEnhancedInvokableToolCall method for the multi-modal read_file tool to work.

// MultiModalReadRequest extends ReadRequest
type MultiModalReadRequest struct {
    ReadRequest
    Pages string  // PDF page range, e.g., "1-5", "3", "10-20"
}

// MultiFileContent return result
type MultiFileContent struct {
    *FileContent            // Plain text result
    Parts []FileContentPart // Multi-modal result (mutually exclusive with FileContent; FileContent is ignored when Parts is non-empty)
}

type FileContentPart struct {
    Type     FileContentPartType // "image" or "pdf"
    MIMEType string              // e.g., "image/png", "application/pdf"
    Data     []byte              // Raw binary data
}
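The precedence between Parts and the embedded FileContent can be sketched as follows. The types here are local stand-ins shaped like the structs above; the Content field on fileContent and the describe helper are assumptions introduced for illustration.

```go
package main

import "fmt"

// fileContent stands in for FileContent; the Content field name is an assumption.
type fileContent struct {
	Content string
}

// fileContentPart stands in for FileContentPart.
type fileContentPart struct {
	Type     string // "image" or "pdf"
	MIMEType string
	Data     []byte
}

// multiFileContent stands in for MultiFileContent.
type multiFileContent struct {
	*fileContent
	Parts []fileContentPart
}

// describe reports which half of the result a consumer would use:
// non-empty Parts win, and the plain-text content is only a fallback.
func describe(r *multiFileContent) string {
	if r == nil {
		return "empty"
	}
	if len(r.Parts) > 0 {
		return fmt.Sprintf("%d part(s), first is %s", len(r.Parts), r.Parts[0].MIMEType)
	}
	if r.fileContent != nil {
		return "text: " + r.Content
	}
	return "empty"
}

func main() {
	both := &multiFileContent{
		fileContent: &fileContent{Content: "ignored"},
		Parts:       []fileContentPart{{Type: "image", MIMEType: "image/png", Data: []byte{0x89}}},
	}
	fmt.Println(describe(both))
	fmt.Println(describe(&multiFileContent{fileContent: &fileContent{Content: "hello"}}))
}
```

Checking the embedded pointer for nil before reading the promoted field avoids a nil dereference when a backend returns only Parts.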

Deprecated: Legacy Config and Large Result Offloading

💡 The following content only applies to the NewMiddleware + Config legacy path. The New / NewTyped path does not include the large result offloading feature.

The legacy Config provides an additional “Large Tool Result Offloading” mechanism on top of MiddlewareConfig:

| Field | Description |
| --- | --- |
| WithoutLargeToolResultOffloading bool | Set to true to disable offloading; defaults to false (enabled). |
| LargeToolResultOffloadingTokenLimit int | Token threshold; defaults to 20000. |
| LargeToolResultOffloadingPathGen func(ctx, *compose.ToolInput) (string, error) | Offloading path generation function; defaults to /large_tool_result/{ToolCallID}. |

Trigger condition: Offloading is triggered when the character count of the tool’s return result exceeds LargeToolResultOffloadingTokenLimit × 4 (80,000 characters with the default limit of 20000).

Offloading behavior: The complete result is written to a file via Backend.Write, and the original return is replaced with a summary (first 10 lines + file path hint). The Agent can read the full result via read_file with pagination.
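The trigger condition and summary behavior described above can be approximated in a few lines. This is a sketch under stated assumptions, not the legacy middleware's code: shouldOffload, defaultPath, and summarize are hypothetical helpers, characters are counted as runes, and the wording of the path hint is invented for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

const (
	defaultTokenLimit = 20000 // default LargeToolResultOffloadingTokenLimit
	charsPerToken     = 4     // multiplier from the trigger condition
	summaryLines      = 10    // number of leading lines kept in the summary
)

// shouldOffload reports whether the result exceeds limit × 4 characters.
// Counting runes rather than bytes is an assumption of this sketch.
func shouldOffload(result string, limit int) bool {
	return utf8.RuneCountInString(result) > limit*charsPerToken
}

// defaultPath follows the default pattern /large_tool_result/{ToolCallID}.
func defaultPath(toolCallID string) string {
	return "/large_tool_result/" + toolCallID
}

// summarize keeps the first 10 lines and appends a hint pointing at path.
func summarize(result, path string) string {
	lines := strings.SplitN(result, "\n", summaryLines+1)
	if len(lines) > summaryLines {
		lines = lines[:summaryLines]
	}
	return strings.Join(lines, "\n") + "\n... full result saved to " + path
}

func main() {
	result := strings.Repeat("row\n", 30000) // 120,000 characters
	if shouldOffload(result, defaultTokenLimit) {
		path := defaultPath("call_1")
		// A real implementation would persist the full result here via Backend.Write.
		fmt.Println(summarize(result, path))
	}
}
```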