AgentTool should support passing data files to sub-agents with optimize_data_file code executors #4488

@vinismarques

Description

Required Information

Is your feature request related to a specific problem?

When using AgentTool to call a sub-agent that has a code executor with optimize_data_file=True, there is no straightforward way to pass data files (e.g. CSV) to the sub-agent. AgentTool.run_async always wraps args["request"] into a text-only types.Content:

# AgentTool.run_async (lines 211-215)
content = types.Content(
    role='user',
    parts=[types.Part.from_text(text=args['request'])],
)

This means the optimize_data_file pre-processor (_extract_and_replace_inline_files) never finds any inline_data Parts to process, even though the sub-agent's code executor is fully configured to handle them.
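To make the gap concrete, here is a minimal sketch of why a text-only Content gives the pre-processor nothing to work with. The Blob, Part, and Content classes below are simplified stand-ins for the google.genai types, and find_inline_files is a hypothetical helper that only approximates the kind of scan _extract_and_replace_inline_files performs; none of this is ADK code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Blob:
    """Stand-in for types.Blob: raw bytes plus a MIME type."""
    mime_type: str
    data: bytes


@dataclass
class Part:
    """Stand-in for types.Part: carries either text or inline_data."""
    text: Optional[str] = None
    inline_data: Optional[Blob] = None


@dataclass
class Content:
    """Stand-in for types.Content."""
    role: str
    parts: list[Part] = field(default_factory=list)


def find_inline_files(content: Content) -> list[Blob]:
    """Hypothetical helper: collect the blobs a file pre-processor could act on."""
    return [p.inline_data for p in content.parts if p.inline_data is not None]


# What AgentTool builds today: a single text part.
text_only = Content(role="user", parts=[Part(text="Analyze this dataset")])

# What the sub-agent would need to receive for the file to be picked up.
with_file = Content(
    role="user",
    parts=[
        Part(text="Analyze this dataset"),
        Part(inline_data=Blob(mime_type="text/csv", data=b"a,b\n1,2\n")),
    ],
)

print(len(find_inline_files(text_only)))  # 0
print(len(find_inline_files(with_file)))  # 1
```

With only a text part present, the scan returns an empty list, so there is nothing to replace with a file reference or upload to the sandbox.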

The only options today are:

  1. Embed the data as a massive text string in args["request"], which defeats the purpose of optimize_data_file (all data goes through the LLM context window)
  2. Bypass AgentTool entirely and reimplement its Runner setup to construct custom types.Content with inline_data Parts -- which means duplicating ~60 lines of internal logic (state forwarding, ForwardingArtifactService, session creation, cleanup, etc.)

Describe the Solution You'd Like

Allow AgentTool to accept additional types.Part objects (particularly inline_data parts) that get included in the types.Content sent to the sub-agent. This could be as simple as an optional extra_parts parameter:

# Option A: extra_parts parameter on run_async
agent_tool = AgentTool(agent=analytics_agent)
csv_part = types.Part(inline_data=types.Blob(mime_type="text/csv", data=csv_bytes))
result = await agent_tool.run_async(
    args={"request": "Analyze this dataset"},
    tool_context=tool_context,
    extra_parts=[csv_part],  # These get appended to the Content's parts list
)

Or alternatively, allow passing a pre-built types.Content directly:

# Option B: accept Content directly
content = types.Content(
    role="user",
    parts=[
        types.Part.from_text(text="Analyze this dataset"),
        types.Part(inline_data=types.Blob(mime_type="text/csv", data=csv_bytes)),
    ],
)
result = await agent_tool.run_async(
    args={"request": "Analyze this dataset"},
    tool_context=tool_context,
    content_override=content,
)

Either approach would let optimize_data_file work naturally: the pre-processor would find the inline_data Part, replace it with a file reference, upload it to the sandbox, and auto-run explore_df().

Impact on your work

We have a multi-agent system where a root agent delegates to a BigQuery agent (to fetch data) and then to an analytics agent (to analyze it with code execution). The analytics agent uses VertexAiCodeExecutor(optimize_data_file=True, stateful=True).

Today we have to choose between:

  • Injecting thousands of rows as text in the prompt (expensive, slow, wastes tokens)
  • Pre-populating the internal _code_executor_input_files state key before the AgentTool call (works but relies on an internal implementation detail)
  • Reimplementing AgentTool.run_async from scratch (fragile, hard to maintain)

A first-class way to pass data files through AgentTool would make the optimize_data_file feature usable in multi-agent systems, which seems like the intended use case.

Not time-critical -- we have a working workaround via state pre-injection.

Willingness to contribute

Yes -- happy to submit a PR if you agree on the approach.


Recommended Information

Describe Alternatives You've Considered

1. Text injection (current): Dump all data as a string in args["request"]. Works, but the LLM processes every row in its context window. For 5K rows this caused 863K input tokens and a 5+ minute LLM call.

2. Bypass AgentTool with manual Runner: Reimplement the runner setup from AgentTool.run_async to control types.Content construction. Works, but couples to internal APIs (_invocation_context, ForwardingArtifactService, InMemorySessionService, InMemoryMemoryService, credential_service, plugin_manager). Any AgentTool improvements must be manually ported.

3. State-based file pre-injection (current workaround): Pre-populate _code_executor_input_files in session state before the AgentTool call. AgentTool forwards this state to the child session, where the optimize_data_file pre-processor auto-discovers and processes the file. This works well (~20 lines of code) but relies on an internal state key.
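For reference, the state pre-injection workaround might look roughly like the sketch below. Only the key name _code_executor_input_files comes from this issue. The entry layout (name, base64-encoded content, mime_type), the preinject_file helper, and the use of a plain dict in place of the tool context's state are assumptions made for illustration and should be checked against the ADK version in use.

```python
import base64

# Key name taken from this issue; it is an internal ADK detail and may change.
INPUT_FILES_KEY = "_code_executor_input_files"


def preinject_file(state: dict, name: str, data: bytes, mime_type: str) -> None:
    """Hypothetical helper: queue a file in session state before the AgentTool call.

    The entry schema below is an assumption, not a documented contract.
    """
    entry = {
        "name": name,
        "content": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
    }
    state.setdefault(INPUT_FILES_KEY, []).append(entry)


# A plain dict stands in for the parent session state that AgentTool forwards.
state = {}
preinject_file(state, "sales.csv", b"region,total\nEMEA,42\n", "text/csv")
print(len(state[INPUT_FILES_KEY]))
```

Because this depends on a private key and an undocumented entry format, it can break silently on upgrade, which is the main argument for a supported API.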

Proposed API / Implementation

Minimal change to AgentTool.run_async -- append extra parts to the constructed Content:

# In AgentTool.run_async, after building `content`:
if extra_parts := kwargs.get("extra_parts"):
    content.parts.extend(extra_parts)

This would require either:

  • Adding extra_parts as a parameter to run_async (which changes the signature inherited from BaseTool)
  • Or storing it as an instance attribute set before calling run_async
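The merge logic itself is small. The sketch below isolates it in a hypothetical build_content function, using simplified stand-in Part and Content dataclasses instead of the google.genai types so it runs on its own. It illustrates the proposed behavior only and is not a patch against AgentTool.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Part:
    """Stand-in for types.Part (simplified: real parts wrap bytes in a Blob)."""
    text: Optional[str] = None
    inline_data: Optional[bytes] = None


@dataclass
class Content:
    """Stand-in for types.Content."""
    role: str
    parts: list[Part] = field(default_factory=list)


def build_content(
    request: str, extra_parts: Optional[Sequence[Part]] = None
) -> Content:
    """Build the user message: the text request first, then any extra parts."""
    content = Content(role="user", parts=[Part(text=request)])
    if extra_parts:
        # Extend a fresh list so the caller's sequence is never mutated.
        content.parts.extend(extra_parts)
    return content


csv_part = Part(inline_data=b"a,b\n1,2\n")
plain = build_content("Analyze this dataset")
with_csv = build_content("Analyze this dataset", extra_parts=[csv_part])
print(len(plain.parts), len(with_csv.parts))
```

Keeping the text part first preserves today's behavior when extra_parts is omitted, so existing callers would be unaffected.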

Additional Context

  • ADK version: 1.25.0

Metadata

Labels: core [Component] (this issue is related to the core interface and implementation)
