Page-Agent: Alibaba's Open Source AI Web Copilot

Niels, Co-founder
Published March 10, 2026

At Emelia, we build a B2B prospecting SaaS that relies on artificial intelligence every day. Bridgers, our digital and AI agency, helps companies design intelligent solutions. And with Maylee, our AI-native email client, we constantly explore new ways AI can simplify web interactions. When Alibaba open-sources an agent capable of controlling any web page through natural language with a single line of code, it is exactly the kind of tool that catches our attention. Here is everything you need to know about page-agent.

What is Alibaba's page-agent?

Page-agent is an open source JavaScript library developed by Alibaba. The concept is straightforward: you add a script to your web page, and an AI agent takes control of the interface through natural language commands. There is no backend of your own to run, no Python, and no headless browser: apart from the calls to the language model's API, everything runs client-side, directly in the user's browser.

In practical terms, page-agent turns any website into an AI-controllable application. You type "Fill in the contact form with Acme Corp's information," and the agent executes. You say "Click the login button," and it does. The agent analyzes the page's DOM (HTML structure), identifies interactive elements, and performs the requested actions.

The project is hosted on GitHub under the MIT license and has already accumulated over 2,900 stars. The current version (v1.5.4, released March 9, 2026) is the result of 683 commits across 18 releases. It trended on Hacker News (77 points, 37 comments) and was picked up on daily.dev and by the Japanese tech community.

How does page-agent work? The technical architecture

What sets page-agent apart from most web automation tools is where it runs. Solutions like browser-use, Playwright, or Selenium control the browser from the outside (via a server, a Python script, or a separate process); page-agent lives inside the web page itself.

DOM manipulation without vision models

Page-agent works exclusively through text-based DOM manipulation. No screenshots, no OCR, no multimodal language model required. The agent parses the page's HTML structure, identifies buttons, form fields, links, and other interactive elements, then generates the appropriate actions.

This approach has several major advantages. First, it is usually much cheaper in LLM tokens than a vision-based approach, since sending screenshots to a multimodal model is expensive. Second, it tends to be faster, because the model processes a compact text representation instead of analyzing an image. Third, it requires no special browser permissions, such as screen capture.
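The text-only approach described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not page-agent's actual serializer (which derives from browser-use): it walks a simplified, hypothetical element tree made of plain objects with `tag`, `attrs`, `text`, and `children` fields (so it runs outside a browser), keeps the elements that are interactive by tag name or ARIA `role`, and emits one numbered line per element. The function names, the tag and role lists, and the attribute whitelist are all illustrative choices.

```javascript
// Hypothetical, simplified element tree standing in for the real DOM.
// Each node: { tag, attrs, text, children }.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea']);
const INTERACTIVE_ROLES = new Set(['button', 'link', 'checkbox', 'tab', 'menuitem']);

function isInteractive(node) {
  const attrs = node.attrs || {};
  return INTERACTIVE_TAGS.has(node.tag) || INTERACTIVE_ROLES.has(attrs.role);
}

// Depth-first walk that collects interactive elements in document order.
function collectInteractive(node, out = []) {
  if (isInteractive(node)) out.push(node);
  for (const child of node.children || []) collectInteractive(child, out);
  return out;
}

// Serialize each element as one compact line: [index] <tag attr="..."> text
function serializeForLLM(root) {
  return collectInteractive(root).map((node, index) => {
    const attrs = Object.entries(node.attrs || {})
      .filter(([key]) => ['type', 'name', 'placeholder', 'aria-label', 'role'].includes(key))
      .map(([key, value]) => ` ${key}="${value}"`)
      .join('');
    const text = (node.text || '').trim();
    return `[${index}] <${node.tag}${attrs}>${text ? ' ' + text : ''}`;
  });
}

// Example page: a login form plus a div acting as a link.
const page = {
  tag: 'div',
  children: [
    { tag: 'h1', text: 'Welcome' },
    {
      tag: 'form',
      children: [
        { tag: 'input', attrs: { type: 'email', name: 'email', placeholder: 'Email' } },
        { tag: 'input', attrs: { type: 'password', name: 'password' } },
        { tag: 'button', attrs: { type: 'submit' }, text: 'Log in' },
      ],
    },
    { tag: 'div', attrs: { role: 'link' }, text: 'Forgot password?' },
  ],
};

console.log(serializeForLLM(page).join('\n'));
```

In a browser, the same walk would start from `document.body` and read each element's tag name, attributes, and text content. Numbering the elements lets the model answer with a short reference such as "click [2]" instead of repeating HTML, which keeps replies small.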

BYOLLM: Bring your own language model

Page-agent adopts a "Bring Your Own LLM" philosophy. You connect the model of your choice: GPT-4, Claude, Qwen, Mistral, or any other model compatible with the OpenAI API format. The DOM processing layer derives from browser-use (MIT-licensed), but the decision-making intelligence relies on the LLM you provide.

This means you retain full control over costs, data privacy, and response quality. You can even use a local model, as long as it is served through an OpenAI-compatible endpoint.
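Because the model is reached through the OpenAI API format, switching providers mostly comes down to a base URL, a model name, and a key. The sketch below builds (without sending) a Chat Completions request for two configurations: the DashScope settings from the npm example in this article, and a local model served by Ollama, which exposes an OpenAI-compatible API under `/v1` on port 11434 by default. The `buildChatRequest` helper, the system prompt, and the local model name `qwen2.5` are illustrative assumptions; page-agent constructs its own requests internally.

```javascript
// Build an OpenAI-compatible Chat Completions request without sending it.
// Any provider exposing POST {baseURL}/chat/completions accepts this shape.
function buildChatRequest({ baseURL, apiKey, model }, messages) {
  const url = baseURL.replace(/\/+$/, '') + '/chat/completions'; // avoid a double slash
  return {
    url,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Hosted provider (DashScope settings from the npm example).
const hosted = {
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_API_KEY',
  model: 'qwen3.5-plus',
};

// Local model through Ollama's OpenAI-compatible endpoint. The model name is an
// example; Ollama ignores the key, but OpenAI-style clients usually send one.
const local = {
  baseURL: 'http://localhost:11434/v1/',
  apiKey: 'ollama',
  model: 'qwen2.5',
};

// Hypothetical prompt, for illustration only.
const messages = [
  { role: 'system', content: 'You control a web page. Reply with one JSON action.' },
  { role: 'user', content: 'Click the login button' },
];

for (const config of [hosted, local]) {
  const { url, init } = buildChatRequest(config, messages);
  console.log(url);
  // To send it: const res = await fetch(url, init); const data = await res.json();
}
```

Since the request format is identical, the only thing that changes between a hosted and a local model is the configuration object, which is what makes the BYOLLM approach practical.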

How to install and integrate page-agent

Integrating page-agent is remarkably simple. Two methods are available.

Method 1: The script tag (a single line of code)

The simplest approach is adding a single script tag to your HTML:

```html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.5.4/dist/iife/page-agent.demo.js" crossorigin="true"></script>
```

That is it. One line of code and your page has a working AI agent with a built-in user interface. This demo version uses a test LLM provided by Alibaba, ideal for evaluating the tool before deploying to production.

Method 2: NPM installation (for production)

For production use, install the package via npm:

```bash
npm install page-agent
```

Then initialize the agent with your own LLM configuration:

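```javascript
import { PageAgent } from 'page-agent'

// Any OpenAI-compatible endpoint works; this example points at Qwen on Alibaba Cloud (DashScope)
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_API_KEY', // placeholder: a key embedded in browser code is visible to every visitor
  language: 'en-US', // language of the agent's built-in UI
})

// Top-level await needs an ES module context (a bundler or <script type="module">)
await agent.execute('Click the login button')
```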

Note the simplicity of the API: a configuration object, a call to execute(), and you are up and running. The language parameter sets the language of the agent's user interface.
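Behind a single `execute()` call, an agent of this kind typically runs a loop: read the page, ask the LLM for the next action, perform it, and repeat until the model reports that the task is done. The sketch below shows such a loop with every dependency injected, and it is not page-agent's internal code: `observe`, `decide` (standing in for the LLM call), and `act` are hypothetical functions backed by a fake page and a scripted list of decisions, and the `{ action, index, value }` shape is an assumption.

```javascript
// Minimal observe -> decide -> act loop. Nothing here is page-agent's API;
// it only illustrates how a DOM layer and an LLM can be decoupled.
async function runAgent(task, { observe, decide, act, maxSteps = 10 }) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const pageText = observe(); // text snapshot of interactive elements
    const decision = await decide({ task, pageText, history }); // the LLM's job
    if (decision.action === 'done') return { ok: true, history };
    act(decision); // e.g. type into element 0, or click element 1
    history.push(decision);
  }
  return { ok: false, history }; // gave up after maxSteps
}

// Fake page state so the loop can run without a browser.
const fields = { email: '' };
let submitted = false;
const observe = () =>
  `[0] <input name="email" value="${fields.email}">\n[1] <button> Log in`;
const act = ({ action, index, value }) => {
  if (action === 'type' && index === 0) fields.email = value;
  if (action === 'click' && index === 1) submitted = true;
};

// Scripted stand-in for the LLM: returns the next decision on each call.
const script = [
  { action: 'type', index: 0, value: 'jane@example.com' },
  { action: 'click', index: 1 },
  { action: 'done' },
];
const decide = async ({ history }) => script[history.length];

runAgent('Log in as jane@example.com', { observe, decide, act }).then((result) => {
  console.log(result.ok, fields.email, submitted); // true jane@example.com true
});
```

The `maxSteps` cap is worth keeping in any real loop: it bounds how many LLM calls, and therefore how much cost, a single command can trigger if the model never reports completion.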

Key features of page-agent

User interface with human-in-the-loop validation

Page-agent does not operate blindly. It includes an elegant user interface that appears directly in the web page. Before each critical action, the user can see what the agent is about to do and approve or reject the operation. This is a fundamental design choice: AI assists, but the human stays in control.

This human validation mechanism is essential for production environments. Imagine an agent automatically filling out and submitting an order form without confirmation. The human-in-the-loop design greatly reduces that risk.
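One way to picture this mechanism is a gate between the LLM's decision and its execution: actions considered critical are described to the user first and only run if approved. The sketch below is hypothetical and does not reproduce page-agent's interface: the `CRITICAL_ACTIONS` list, the `withApproval` wrapper, the action shape, and the `confirm` callback (which in a page might be a modal dialog) are assumptions.

```javascript
// Action types that change data or leave the page. This list is an assumption.
const CRITICAL_ACTIONS = new Set(['submit', 'delete', 'navigate']);

// Wrap an executor so that critical actions wait for the user's approval.
function withApproval(execute, confirm) {
  return async function guardedExecute(action) {
    if (CRITICAL_ACTIONS.has(action.action)) {
      const approved = await confirm(`The agent wants to ${action.action}: ${action.description}`);
      if (!approved) return { status: 'rejected', action };
    }
    return { status: 'done', result: await execute(action) };
  };
}

// Demo: a fake executor and a user who rejects every critical action.
const log = [];
const execute = async (action) => {
  log.push(action.action);
  return 'ok';
};
const rejectAll = async (message) => {
  console.log('Asking user:', message);
  return false;
};

const guarded = withApproval(execute, rejectAll);
(async () => {
  await guarded({ action: 'type', description: 'fill the Company field' });
  await guarded({ action: 'submit', description: 'send the order form' });
  console.log(log); // [ 'type' ]: the submit never reached the executor
})();
```

Because the gate sits outside the model, an LLM that misreads the page still cannot submit or delete anything without the user's explicit approval.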

Chrome extension for multi-page tasks

By default, page-agent operates within a single web page. But Alibaba offers an optional Chrome extension that extends the agent's capabilities across multiple browser tabs. This enables complex workflows: opening a page, extracting information from it, navigating to another page, and inserting that data.

Multilingual support

Page-agent supports multiple languages for its user interface, making deployment in international contexts straightforward. The language parameter in the configuration lets you switch between available languages.

Page-agent vs browser-use vs Playwright vs Selenium: the comparison

To understand what page-agent brings to the table, it helps to compare it with existing alternatives. Here is a summary:

| Criterion | page-agent | browser-use | Playwright | Selenium |
|---|---|---|---|---|
| Runs inside the web page | Yes | No | No | No |
| Backend required | No | Yes (Python) | Yes | Yes |
| Vision models needed | No | Optional | N/A | N/A |
| Integration effort | 1 line of code | Significant | Significant | Significant |
| Human-in-the-loop | Built-in | No | No | No |
| Multi-page support | Chrome extension | Native | Native | Native |
| Language | JavaScript/TypeScript | Python | Multi-language | Multi-language |
| License | MIT | MIT | Apache 2.0 | Apache 2.0 |
| Primary use case | In-page copilot | Server automation | E2E testing | E2E testing |

The fundamental difference is one of positioning. Playwright and Selenium are testing and automation tools that control the browser from the outside. Browser-use adds an AI layer on top of that server paradigm. Page-agent flips the logic: the agent lives in the page, alongside the user.

This positioning creates an entirely new use case. It is no longer about automating tasks in the background, but about offering an AI copilot to the end user, directly within their working interface.

Concrete use cases for page-agent

Turn your SaaS into an AI product with a few lines of code

This is arguably page-agent's most impactful application. Today, companies like Notion, Salesforce, and HubSpot charge between $20 and $30 per month for their AI copilot features. These copilots do essentially the same thing: they understand the interface, execute actions on user request, and provide contextual assistance.

With page-agent, any SaaS vendor can integrate a similar AI copilot with a few lines of JavaScript. No backend rewrite, no new infrastructure. You add the script, connect an LLM, and your users can control your application through natural language.

For a startup on a tight budget, this means being able to offer a premium AI feature without the months of development it would normally require.

Automate complex form filling (ERP, CRM, back-office)

If you have ever worked with an ERP like SAP or a CRM like Salesforce, you know the pain of 30-field forms. Page-agent can transform these 20-click workflows into a single sentence: "Create a new contact for John Smith, Sales Director at Acme Corp, email john@acme.com, phone 555-0123."

For sales, administrative, and accounting teams spending hours on data entry, the productivity gain is immediate.
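Turning that sentence into form input has two halves: the LLM extracts the values, then each value has to land in the right field. The sketch below covers only the second, deterministic half, under the assumption that the model has already returned a `{ label: value }` object. The CRM form, the label-normalization rule, and the `planFormFill` helper are hypothetical, and the extra `lead source` key is included only to show what happens to a value with no matching field.

```javascript
// Normalize labels so that "E-mail" and "email" compare equal.
const normalize = (s) => s.toLowerCase().replace(/[^a-z0-9]/g, '');

// Turn extracted values into one "type" action per matching form field.
function planFormFill(formFields, extracted) {
  const byLabel = new Map(formFields.map((f) => [normalize(f.label), f]));
  const actions = [];
  const unmatched = [];
  for (const [label, value] of Object.entries(extracted)) {
    const field = byLabel.get(normalize(label));
    if (field) actions.push({ action: 'type', fieldId: field.id, value });
    else unmatched.push(label); // surface it to the user rather than guess
  }
  return { actions, unmatched };
}

// Hypothetical CRM "new contact" form.
const formFields = [
  { id: 'first_name', label: 'First name' },
  { id: 'last_name', label: 'Last name' },
  { id: 'job_title', label: 'Job title' },
  { id: 'company', label: 'Company' },
  { id: 'email', label: 'E-mail' },
  { id: 'phone', label: 'Phone' },
];

// What an LLM might extract from the contact-creation sentence above.
const extracted = {
  'first name': 'John',
  'last name': 'Smith',
  'job title': 'Sales Director',
  company: 'Acme Corp',
  email: 'john@acme.com',
  phone: '555-0123',
  'lead source': 'Referral', // illustrative extra key with no matching field
};

const plan = planFormFill(formFields, extracted);
console.log(plan.actions.length, plan.unmatched); // 6 [ 'lead source' ]
```

Reporting unmatched values instead of guessing fits the human-in-the-loop design described earlier: the user sees what could not be placed before anything is submitted.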

Improve web application accessibility

Web accessibility remains a major challenge. Page-agent opens an interesting path: allowing users to control complex interfaces through voice commands or screen readers, in natural language. Instead of navigating with a keyboard through dozens of menus, a visually impaired user could simply say "Open my notifications" or "Send a message to the marketing team."

This is not a complete accessibility solution, but it is an assistive layer that can significantly improve the user experience for people with disabilities.

Create automated tests in natural language

QA teams spend considerable time writing and maintaining test scripts. With page-agent, it becomes possible to write tests in natural language: "Go to the registration page, fill the form with test data, click Submit, and verify the confirmation message appears."

This approach lowers the barrier to automated testing and makes test scenarios understandable by non-developers, facilitating collaboration between product and engineering teams.
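A natural-language test still needs a deterministic pass/fail criterion. The sketch below pairs the registration scenario's steps, sent one at a time to an agent's `execute()` method, with a plain `verify` function. The stub agent, the fake page state, and the `runNaturalLanguageTest` helper are hypothetical; with a real `PageAgent` instance you would pass it in place of the stub and have `verify` inspect the actual DOM.

```javascript
// Run each natural-language step through the agent, then check the result
// with ordinary code. `agent` is anything with an async execute(command).
async function runNaturalLanguageTest(agent, { name, steps, verify }) {
  for (const step of steps) {
    await agent.execute(step);
  }
  const passed = await verify();
  return { name, passed };
}

// Stub agent that records commands and pretends Submit shows a confirmation.
const page = { confirmationVisible: false };
const stubAgent = {
  commands: [],
  async execute(command) {
    this.commands.push(command);
    if (/click submit/i.test(command)) page.confirmationVisible = true;
  },
};

const registrationTest = {
  name: 'registration shows a confirmation',
  steps: [
    'Go to the registration page',
    'Fill the form with test data',
    'Click Submit',
  ],
  // In a browser: () => document.querySelector('.confirmation') !== null
  // (the selector is a hypothetical example)
  verify: () => page.confirmationVisible,
};

runNaturalLanguageTest(stubAgent, registrationTest).then((result) => {
  console.log(result.name, result.passed); // registration shows a confirmation true
});
```

Keeping `verify` outside the agent avoids asking the model to grade its own work, so a test fails when the confirmation is missing even if the agent believes it succeeded.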

Guide users through customer success workflows

User onboarding is a critical challenge for any SaaS product. Instead of creating video tutorials or PDF guides that nobody reads, you could integrate page-agent as an interactive onboarding assistant. The user says "Show me how to create my first campaign," and the agent walks them through the interface step by step, executing or demonstrating each action.

For customer success teams, this could significantly reduce the number of support tickets and accelerate time-to-value for new users.

Real-world scenarios: imagine page-agent in your daily work

To make things more tangible, here are a few concrete scenarios.

Your CRM, controlled by voice. You are on a phone call with a prospect. Instead of frantically navigating your CRM to find their information, you type (or say, via a speech recognition module): "Show me the contact record for Sarah Johnson at TechCorp." The agent locates the search field, enters the name, clicks the correct result, and displays the record. You never lose focus on your conversation.

ERP form filling. You receive a purchase order by email. Instead of manually copying 15 fields into your ERP's entry form, you copy the information and ask the agent: "Create a new supplier order with this information: Supplier ABC Industries, reference PO-2026-0342, 500 units of Product X at $12.50 per unit, expected delivery April 15." The agent fills the form and waits for your approval before submitting.

Interactive client onboarding. A new customer just subscribed to your platform. Instead of a welcome email with a 20-page PDF, they find an AI assistant directly in the interface that says: "Welcome! Would you like me to show you how to set up your first project?" The customer says yes, and the agent guides them action by action.

Page-agent limitations: what you need to know

Like any tool, page-agent is not perfect. Understanding its limitations before adoption is important.

Client-side execution only

Page-agent runs exclusively in the user's browser. This means it cannot perform background tasks, schedule automated runs, or function without a user present. For classic server-side automation (data extraction, overnight workflows, API integrations), you will still need tools like Playwright or browser-use.

LLM call costs

Every agent action requires a language model call. For simple workflows (a click, a field to fill), the cost is negligible. But for complex scenarios involving many steps, LLM tokens add up. Choosing a model with a good quality-to-price ratio and monitoring consumption is important.
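A back-of-the-envelope model makes this concrete. Each step sends the serialized page plus the history of earlier steps as input and receives a short action as output, so input grows with every step. All token counts and per-million-token prices below are placeholder assumptions, not any provider's real rates; substitute your model's pricing (the result is in the same currency as the prices).

```javascript
// Rough per-task cost model. Every number passed in is a placeholder assumption.
// Input per step = serialized page + the history of earlier steps, which is resent.
function estimateTaskCost({
  steps,
  pageTokens, // serialized DOM sent at every step
  historyTokensPerStep, // extra input added by each earlier step
  outputTokensPerStep, // the model's short action reply
  inputPricePerM, // price per 1M input tokens
  outputPricePerM, // price per 1M output tokens
}) {
  let inputTokens = 0;
  for (let i = 0; i < steps; i++) {
    inputTokens += pageTokens + i * historyTokensPerStep;
  }
  const outputTokens = steps * outputTokensPerStep;
  const cost = (inputTokens * inputPricePerM + outputTokens * outputPricePerM) / 1e6;
  return { inputTokens, outputTokens, cost };
}

// Placeholder assumptions, not any provider's real rates.
const assumptions = {
  pageTokens: 2000,
  historyTokensPerStep: 100,
  outputTokensPerStep: 50,
  inputPricePerM: 0.5,
  outputPricePerM: 1.5,
};

const simple = estimateTaskCost({ ...assumptions, steps: 1 }); // a single click
const complex = estimateTaskCost({ ...assumptions, steps: 20 }); // a long form workflow

console.log(simple.inputTokens, complex.inputTokens); // 2000 59000
console.log(simple.cost.toFixed(4), complex.cost.toFixed(4)); // 0.0011 0.0310
```

With these placeholder numbers, the 20-step task costs roughly 29 times the single-step one rather than 20 times, because the history is resent at every step.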

DOM complexity and dynamic pages

Modern web pages use sophisticated JavaScript frameworks (React, Vue, Angular) that generate complex DOM structures, with nested components, virtual elements, and dynamic rendering. Page-agent may struggle with certain highly complex interfaces or with elements that are not represented in standard DOM patterns, such as content drawn on a canvas or custom widgets built from generic div elements without semantic roles.
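The dynamic-rendering problem is easy to reproduce: if the page is read before a framework has finished rendering, the element the agent needs does not exist yet. The generic polling helper below, which is not part of page-agent, re-checks until the element appears or a timeout expires. The `waitFor` name, the default timings, and the simulated render are assumptions; in a real browser, a `MutationObserver` is the event-driven alternative to polling.

```javascript
// Generic polling helper (not part of page-agent): resolve with check()'s value
// once it is truthy, or reject when the timeout expires.
function waitFor(check, { timeout = 3000, interval = 50 } = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const tick = () => {
      const value = check();
      if (value) return resolve(value);
      if (Date.now() - started >= timeout) {
        return reject(new Error(`Timed out after ${timeout} ms`));
      }
      setTimeout(tick, interval);
    };
    tick();
  });
}

// Simulated framework render: the Save button only appears after 200 ms.
const fakeDom = { buttons: [] };
setTimeout(() => fakeDom.buttons.push({ text: 'Save' }), 200);

// In a browser, check() could be () => document.querySelector('button.save'),
// where the selector is a hypothetical example.
waitFor(() => fakeDom.buttons.find((b) => b.text === 'Save'))
  .then((button) => console.log('found', button.text)) // found Save
  .catch((err) => console.error(err.message));
```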

Limited multi-page without extension

Without the Chrome extension, page-agent is limited to the current page. Workflows requiring navigation between multiple sites or tabs need the extension installed, adding a deployment step.

Project maturity

With 2,900 stars and a still-young community (9 contributors), page-agent remains a relatively recent project. The documentation, while functional, is not as comprehensive as Playwright's or Selenium's. For mission-critical production deployments, this maturity factor should be considered.

How to get started with page-agent: step by step

1. Test the one-line demo. Add the CDN script tag to any HTML page to see the agent in action with the demo LLM.

2. Install via NPM. Run npm install page-agent in your project.

3. Configure your LLM. Choose your model (GPT-4, Claude, Qwen) and pass its name, base URL, and your API key to the PageAgent constructor.

4. Test simple commands. Start with basic actions: "Click this button," "Fill this field with this value."

5. Explore complex workflows. Chain multiple actions, test form navigation, try more elaborate natural language commands.

6. Install the Chrome extension (optional). If you need multi-page workflows, install the extension to extend the agent's capabilities.

7. Deploy to production. Switch from the demo LLM to your own model, adjust language settings, and integrate the agent into your application.

Who should use page-agent?

Page-agent is for you if:

  • You are a SaaS vendor and want to add an AI copilot to your product without rewriting your backend.

  • You manage complex internal tools (ERP, CRM, back-office) and want to simplify the user experience.

  • You are working on web application accessibility.

  • You are looking for an alternative to traditional test scripts for your QA team.

  • You are an agency and want to quickly prototype AI experiences for your clients.

Page-agent is probably not for you if:

  • You need server-side background automation (scheduled workflows, data collection).

  • You are looking for a mature solution with an extensive ecosystem and large community.

  • You work on native mobile or desktop applications (page-agent is web-only).

  • You need to control remote browsers in the cloud (look at browser-use or Playwright for that).

What page-agent reveals about the future of web interfaces

Beyond the tool itself, page-agent illustrates a deeper trend: the democratization of the AI copilot layer. Companies charging premium subscriptions today for AI assistants embedded in their software now face an open source tool that can replicate much of this functionality with a few lines of code.

This does not mean proprietary copilots will disappear. They often offer deeper integration, product-specific features, and dedicated support. But page-agent drastically lowers the barrier to entry. For the thousands of SaaS products, internal tools, and web applications that would never have had the resources to develop an AI copilot, this opens a door.

The fact that Alibaba is releasing this tool as open source, under the permissive MIT license, sends a strong signal. After the race to build language models (Qwen, LLaMA, Mistral), it is the AI application layer that is opening up. Page-agent is one of the first tools to make this vision concrete: AI as a universal interaction layer for the web, accessible to any developer with a code editor and an API key.

Copyright © 2026 Emelia All Rights Reserved