{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multi-Agent Document Analysis with AG2 and Docling\n", "\n", "| Step | Tech | Execution |\n", "| --- | --- | --- |\n", "| Document conversion | [Docling](https://docling-project.github.io/docling/) | 💻 Local |\n", "| Multi-agent orchestration | [AG2](https://docs.ag2.ai/) | 🌐 Remote (LLM) |\n", "\n", "This example demonstrates how to combine **Docling** for document conversion with **AG2** for\n", "multi-agent analysis. Docling converts PDF, DOCX, HTML, and other formats into structured\n", "Markdown and tables. AG2 agents then collaborate to analyze the extracted content.\n", "\n", "The pipeline:\n", "1. A **Document Processor** agent uses Docling tools to convert documents and extract tables.\n", "2. An **Analyst** agent synthesizes the extracted content into a structured summary.\n", "3. A **UserProxy** agent starts the conversation and executes the tool calls inside a GroupChat." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "- 👉 For best conversion speed, use GPU acceleration whenever available; e.g. if running on Colab, use a GPU-enabled runtime.\n", "- Requires an OpenAI API key set as the `OPENAI_API_KEY` environment variable.\n", "- The first run downloads ML models (~1–2 GB). Subsequent runs use cached models." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -q --progress-bar off --no-warn-conflicts docling \"ag2[openai]>=0.11.4,<1.0\" pandas" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import os\n", "\n", "from autogen import (\n", "    AssistantAgent,\n", "    GroupChat,\n", "    GroupChatManager,\n", "    LLMConfig,\n", "    UserProxyAgent,\n", ")\n", "\n", "from docling.datamodel.base_models import ConversionStatus\n", "from docling.document_converter import DocumentConverter\n", "\n", "# Set your OpenAI API key (or configure via .env / Colab secrets)\n", "# os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Document Conversion with Docling\n", "\n", "First, let's convert a sample document and inspect the output. We use the\n", "[Docling Technical Report](https://arxiv.org/pdf/2408.09869) as the demo document." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DOC_SOURCE = \"https://arxiv.org/pdf/2408.09869\"\n", "\n", "converter = DocumentConverter()\n", "result = converter.convert(DOC_SOURCE)\n", "\n", "print(f\"Status: {result.status}\")\n", "print(f\"Pages: {len(result.document.pages)}\")\n", "print()\n", "\n", "# Preview the first 2000 characters of extracted Markdown\n", "markdown = result.document.export_to_markdown()\n", "print(f\"Markdown length: {len(markdown):,} characters\")\n", "print(\"---\")\n", "print(markdown[:2000])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Table Extraction\n", "\n", "Docling automatically detects and extracts tables. Let's inspect them." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "tables = list(result.document.tables)\nprint(f\"Found {len(tables)} table(s)\")\n\nfor i, table in enumerate(tables):\n    table_df = table.export_to_dataframe(doc=result.document)\n    print(f\"\\n### Table {i + 1} (shape: {table_df.shape})\")\n    print(table_df.to_markdown())" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AG2 Multi-Agent Setup\n", "\n", "Now we set up AG2 agents that use Docling as their document processing backend.\n", "\n", "**Architecture:**\n", "- `document_processor`: calls Docling tools to convert documents and extract tables\n", "- `analyst`: analyzes the extracted content and produces a structured summary\n", "- `user_proxy`: orchestrates the conversation, executes tool calls\n", "\n", "The agents communicate via a `GroupChat` managed by a `GroupChatManager`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "llm_config = LLMConfig(\n", "    {\n", "        \"model\": \"gpt-4o-mini\",\n", "        \"api_key\": os.environ.get(\"OPENAI_API_KEY\"),\n", "        \"api_type\": \"openai\",\n", "    }\n", ")\n", "\n", "MAX_CONTENT_CHARS = 15000  # Truncation limit to stay within LLM context\n", "\n", "\n", "def is_termination_msg(msg):\n", "    content = msg.get(\"content\", \"\") or \"\"\n", "    return \"TERMINATE\" in content\n", "\n", "\n", "proxy = UserProxyAgent(\n", "    name=\"user_proxy\",\n", "    human_input_mode=\"NEVER\",\n", "    max_consecutive_auto_reply=10,\n", "    code_execution_config=False,\n", "    is_termination_msg=is_termination_msg,\n", ")\n", "\n", "processor = AssistantAgent(\n", "    name=\"document_processor\",\n", "    system_message=(\n", "        \"You are a document processing agent. Use the convert_document tool to \"\n", "        \"extract text from a document, and extract_tables to get structured table \"\n", "        \"data. 
Always call convert_document first, then extract_tables if the user \"\n", "        \"asks about tables or data.\"\n", "    ),\n", "    llm_config=llm_config,\n", ")\n", "\n", "analyst = AssistantAgent(\n", "    name=\"analyst\",\n", "    system_message=(\n", "        \"You are a document analyst. Based on the content extracted by the \"\n", "        \"document_processor, provide a clear and structured analysis including:\\n\"\n", "        \"- A concise summary of the document\\n\"\n", "        \"- Key findings or contributions\\n\"\n", "        \"- Notable data from any tables\\n\\n\"\n", "        \"When your analysis is complete, end your message with TERMINATE.\"\n", "    ),\n", "    llm_config=llm_config,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tool Registration\n", "\n", "We register Docling operations as AG2 tools. The `converter` instance created earlier\n", "is reused, so the models it has already loaded are not initialized again." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "@proxy.register_for_execution()\n@processor.register_for_llm(\n    description=\"Convert a document (PDF, DOCX, HTML, or URL) to markdown text\"\n)\ndef convert_document(source: str) -> str:\n    \"\"\"Convert a document to markdown using Docling.\"\"\"\n    conv_result = converter.convert(source)\n    if conv_result.status == ConversionStatus.FAILURE:\n        return f\"Error: Document conversion failed for {source}\"\n    md = conv_result.document.export_to_markdown()\n    # Cap the tool output so it fits in the LLM context window\n    if len(md) > MAX_CONTENT_CHARS:\n        return (\n            md[:MAX_CONTENT_CHARS]\n            + f\"\\n\\n[Truncated: showing first {MAX_CONTENT_CHARS:,} of {len(md):,} characters]\"\n        )\n    return md\n\n\n@proxy.register_for_execution()\n@processor.register_for_llm(\n    description=\"Extract tables from a document as JSON. Returns a list of tables, each as a list of row records.\"\n)\ndef extract_tables(source: str) -> str:\n    \"\"\"Extract tables from a document using Docling.\"\"\"\n    conv_result = converter.convert(source)\n    if conv_result.status == ConversionStatus.FAILURE:\n        return f\"Error: Document conversion failed for {source}\"\n    tables = list(conv_result.document.tables)\n    if not tables:\n        return \"No tables found in the document.\"\n    table_data = []\n    for i, table in enumerate(tables):\n        # Each table becomes a DataFrame, then a list of row dicts\n        table_df = table.export_to_dataframe(doc=conv_result.document)\n        table_data.append(\n            {\n                \"table_index\": i + 1,\n                \"rows\": table_df.shape[0],\n                \"columns\": table_df.shape[1],\n                \"data\": table_df.to_dict(orient=\"records\"),\n            }\n        )\n    return json.dumps(table_data, indent=2)\n\n\nprint(f\"Tools registered on proxy: {list(proxy.function_map.keys())}\")" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run the Multi-Agent Analysis\n", "\n", "The `user_proxy` sends a task to the group chat. The `document_processor` will use\n", "Docling tools to extract content, and the `analyst` will synthesize the findings." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "group_chat = GroupChat(\n", "    agents=[proxy, processor, analyst],\n", "    messages=[],\n", "    max_round=10,\n", ")\n", "\n", "manager = GroupChatManager(\n", "    groupchat=group_chat,\n", "    llm_config=llm_config,\n", "    is_termination_msg=is_termination_msg,\n", ")\n", "\n", "# Use a new name so the Docling `result` from the conversion cell is not overwritten\n", "response = proxy.run(\n", "    manager,\n", "    message=(\n", "        f\"Analyze the document at {DOC_SOURCE}: \"\n", "        \"summarize its key findings and extract any tables.\"\n", "    ),\n", ")\n", "response.process()  # Process and print the conversation events" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Reading\n", "\n", "- [Docling documentation](https://docling-project.github.io/docling/)\n", "- [AG2 documentation](https://docs.ag2.ai/)\n", "- [Docling examples](https://docling-project.github.io/docling/examples/)\n", "- [AG2 tool use guide](https://docs.ag2.ai/user-guide/agentchat-user-guide/tutorial/tool-use)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbformat_minor": 2, "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 4 }