AgentTool should support passing data files to sub-agents with optimize_data_file code executors #4488
Description
Required Information
Is your feature request related to a specific problem?
When using AgentTool to call a sub-agent that has a code executor with optimize_data_file=True, there is no straightforward way to pass data files (e.g. CSV) to the sub-agent. AgentTool.run_async always wraps args["request"] into a text-only types.Content:
# AgentTool.run_async (lines 211-215)
content = types.Content(
    role='user',
    parts=[types.Part.from_text(text=args['request'])],
)
This means the optimize_data_file pre-processor (_extract_and_replace_inline_files) never finds any inline_data Parts to process, even though the sub-agent's code executor is fully configured to handle them.
The only options today are:
- Embed the data as a massive text string in args["request"], which defeats the purpose of optimize_data_file (all data goes through the LLM context window)
- Bypass AgentTool entirely and reimplement its Runner setup to construct custom types.Content with inline_data Parts -- which means duplicating ~60 lines of internal logic (state forwarding, ForwardingArtifactService, session creation, cleanup, etc.)
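To make the limitation concrete, here is a minimal sketch. The `Blob`, `Part`, and `Content` classes below are simplified stand-ins written for this example, not the real `google.genai.types` classes, and `find_inline_files` is a hypothetical helper that only mimics what a pre-processor scanning for `inline_data` parts would see.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Blob:
    """Stand-in for an inline file payload (hypothetical, for illustration)."""
    mime_type: str
    data: bytes


@dataclass
class Part:
    """Stand-in for a message part: either text or inline data."""
    text: Optional[str] = None
    inline_data: Optional[Blob] = None


@dataclass
class Content:
    """Stand-in for a message made of parts."""
    role: str
    parts: list[Part] = field(default_factory=list)


def find_inline_files(content: Content) -> list[Blob]:
    """Return every inline_data payload found in the content's parts."""
    return [p.inline_data for p in content.parts if p.inline_data is not None]


csv_bytes = b"id,value\n1,10\n2,20\n"

# What a text-only wrapper produces: a single text part.
text_only = Content(role="user", parts=[Part(text="Analyze this dataset")])

# What the sub-agent would need to receive for file handling to trigger.
with_file = Content(
    role="user",
    parts=[
        Part(text="Analyze this dataset"),
        Part(inline_data=Blob(mime_type="text/csv", data=csv_bytes)),
    ],
)

print(len(find_inline_files(text_only)))  # 0
print(len(find_inline_files(with_file)))  # 1
```

With a text-only message there is simply nothing for a file-oriented pre-processor to pick up, regardless of how the code executor is configured.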
Describe the Solution You'd Like
Allow AgentTool to accept additional types.Part objects (particularly inline_data parts) that get included in the types.Content sent to the sub-agent. This could be as simple as an optional extra_parts parameter:
# Option A: extra_parts parameter on run_async
agent_tool = AgentTool(agent=analytics_agent)
csv_part = types.Part(inline_data=types.Blob(mime_type="text/csv", data=csv_bytes))
result = await agent_tool.run_async(
    args={"request": "Analyze this dataset"},
    tool_context=tool_context,
    extra_parts=[csv_part],  # These get appended to the Content's parts list
)
Or alternatively, allow passing a pre-built types.Content directly:
# Option B: accept Content directly
content = types.Content(
    role="user",
    parts=[
        types.Part.from_text(text="Analyze this dataset"),
        types.Part(inline_data=types.Blob(mime_type="text/csv", data=csv_bytes)),
    ],
)
result = await agent_tool.run_async(
    args={"request": "Analyze this dataset"},
    tool_context=tool_context,
    content_override=content,
)
Either approach would let optimize_data_file work naturally: the pre-processor would find the inline_data Part, replace it with a file reference, upload it to the sandbox, and auto-run explore_df().
Impact on your work
We have a multi-agent system where a root agent delegates to a BigQuery agent (to fetch data) and then to an analytics agent (to analyze it with code execution). The analytics agent uses VertexAiCodeExecutor(optimize_data_file=True, stateful=True).
Today we have to choose between:
- Injecting thousands of rows as text in the prompt (expensive, slow, wastes tokens)
- Pre-populating the internal _code_executor_input_files state key before the AgentTool call (works but relies on an internal implementation detail)
- Reimplementing AgentTool.run_async from scratch (fragile, hard to maintain)
A first-class way to pass data files through AgentTool would make the optimize_data_file feature usable in multi-agent systems, which seems like the intended use case.
Not time-critical -- we have a working workaround via state pre-injection.
Willingness to contribute
Yes -- happy to submit a PR if you agree on the approach.
Recommended Information
Describe Alternatives You've Considered
1. Text injection (current): Dump all data as a string in args["request"]. Works, but the LLM processes every row in its context window. For 5K rows this caused 863K input tokens and a 5+ minute LLM call.
2. Bypass AgentTool with manual Runner: Reimplement the runner setup from AgentTool.run_async to control types.Content construction. Works, but couples to internal APIs (_invocation_context, ForwardingArtifactService, InMemorySessionService, InMemoryMemoryService, credential_service, plugin_manager). Any AgentTool improvements must be manually ported.
3. State-based file pre-injection (current workaround): Pre-populate _code_executor_input_files in session state before the AgentTool call. AgentTool forwards this state to the child session, where the optimize_data_file pre-processor auto-discovers and processes the file. This works well (~20 lines of code) but relies on an internal state key.
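For reference, the state pre-injection workaround (alternative 3) can be sketched roughly as follows. This is an illustration under assumptions, not the confirmed internal format: only the `_code_executor_input_files` key name comes from this issue, while the entry layout (`name`, base64-encoded `content`, `mime_type`), the `pre_inject_file` helper, and the use of a plain `dict` in place of the real session state object are all assumptions made for the example.

```python
import base64

# Internal key named in this issue; it may change between ADK releases.
INPUT_FILES_KEY = "_code_executor_input_files"


def pre_inject_file(state: dict, name: str, data: bytes, mime_type: str) -> None:
    """Append one file entry to the state under the internal key.

    The entry layout below is an assumption for illustration only.
    """
    entry = {
        "name": name,
        # Assumed: content kept as base64 text so the state stays serializable.
        "content": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
    }
    state.setdefault(INPUT_FILES_KEY, []).append(entry)


# A plain dict stands in for the session state that gets forwarded
# to the child session.
state = {}
pre_inject_file(state, "sales.csv", b"id,value\n1,10\n", "text/csv")
print(state[INPUT_FILES_KEY][0]["name"])  # sales.csv
```

The fragility is visible in the sketch itself: both the key name and the entry layout are private details, so any change to them would break the workaround without warning.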
Proposed API / Implementation
Minimal change to AgentTool.run_async -- append extra parts to the constructed Content:
# In AgentTool.run_async, after building `content`:
if extra_parts := kwargs.get("extra_parts"):
    content.parts.extend(extra_parts)
This would require either:
- Adding extra_parts as a parameter to run_async (breaking the BaseTool interface slightly)
- Or storing it as an instance attribute set before calling run_async
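The content-building step of the proposal can be sketched in isolation as below. `build_content` is a hypothetical helper, and `SimpleNamespace` objects stand in for the real `types.Content` and `types.Part`; the sketch only shows that an optional `extra_parts` argument defaulting to `None` leaves the existing text-only behavior unchanged.

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence


def build_content(
    request: str, extra_parts: Optional[Sequence[Any]] = None
) -> SimpleNamespace:
    """Build a user message: the text request first, then any extra parts."""
    parts = [SimpleNamespace(text=request, inline_data=None)]
    if extra_parts:
        # Extra parts are appended after the text part, in the order given.
        parts.extend(extra_parts)
    return SimpleNamespace(role="user", parts=parts)


# Stand-in for an inline CSV part.
csv_part = SimpleNamespace(
    text=None,
    inline_data=SimpleNamespace(mime_type="text/csv", data=b"id,value\n1,10\n"),
)

default_content = build_content("Analyze this dataset")
extended_content = build_content("Analyze this dataset", extra_parts=[csv_part])

print(len(default_content.parts))   # 1
print(len(extended_content.parts))  # 2
```

Because callers that pass nothing get exactly one text part, as today, the change would be backward compatible at the content level; the open question is only where `extra_parts` enters (method parameter or instance attribute).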
Additional Context
- ADK version: 1.25.0