Niels Co-founder

Published on March 10, 2026. Updated on May 27, 2026.

Page-Agent: Alibaba's Open Source AI Web Copilot


At Emelia, we build a B2B prospecting SaaS that relies on artificial intelligence every day. Bridgers, our digital and AI agency, helps companies design intelligent solutions. And with Maylee, our AI-native email client, we constantly explore new ways AI can simplify web interactions. When Alibaba open-sources an agent capable of controlling any web page through natural language with a single line of code, it is exactly the kind of tool that catches our attention. Here is everything you need to know about page-agent.

What is Alibaba's page-agent?

Page-agent is an open source JavaScript library developed by Alibaba. The concept is straightforward: you add a script to your web page, and an AI agent takes control of the interface through natural language commands. No backend of your own, no Python, no headless browser: apart from the calls to the language model, everything runs client-side, directly in the user's browser.

In practical terms, page-agent turns any website into an AI-controllable application. You type "Fill in the contact form with Acme Corp's information," and the agent executes. You say "Click the login button," and it does. The agent analyzes the page's DOM (HTML structure), identifies interactive elements, and performs the requested actions.

The project is hosted on GitHub under the MIT license and, at the time of writing, has already accumulated over 2,900 stars. The current version (v1.5.4, released March 9, 2026) is the result of 683 commits across 18 releases. It trended on Hacker News (77 points, 37 comments) and was picked up on daily.dev and by the Japanese tech community.

How does page-agent work? The technical architecture

What sets page-agent apart from most web automation tools is its fundamentally different approach. Where solutions like browser-use, Playwright, or Selenium control the browser from the outside (via a server, a Python script, or a separate process), page-agent lives inside the web page itself.

DOM manipulation without vision models

Page-agent works exclusively through text-based DOM manipulation. No screenshots, no OCR, no multimodal language model required. The agent parses the page's HTML structure, identifies buttons, form fields, links, and other interactive elements, then generates the appropriate actions.
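The general technique behind DOM-based agents can be sketched in a few lines: walk the page tree, keep only interactive elements, and give each one a numeric index the model can refer to in its answer. This is a minimal illustration under stated assumptions, not page-agent's actual serializer: the plain-object node shape, the `INTERACTIVE_TAGS` set, and the `[index] <tag> label` output format are all hypothetical, chosen so the example runs outside a browser.

```javascript
// Plain-object stand-in for DOM nodes: { tag, attrs, text, children }.
// In a browser the same walk would start from document.body.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

function isInteractive(node) {
  return INTERACTIVE_TAGS.has(node.tag) || node.attrs?.role === "button";
}

// Depth-first walk in document order; each interactive node gets an index
// that the model can cite in its reply (e.g. "click element 1").
function serializeInteractive(root) {
  const elements = []; // index -> node, kept so a chosen action can be resolved
  const lines = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (isInteractive(node)) {
      const label = node.text || node.attrs?.placeholder || node.attrs?.name || "";
      lines.push(`[${elements.length}] <${node.tag}> ${label}`.trim());
      elements.push(node);
    }
    const children = node.children ?? [];
    // Push children in reverse so the first child is visited first.
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return { text: lines.join("\n"), elements };
}

const page = {
  tag: "body",
  children: [
    { tag: "h1", text: "Welcome" },
    {
      tag: "form",
      children: [
        { tag: "input", attrs: { name: "email", placeholder: "Email" } },
        { tag: "button", text: "Log in" },
      ],
    },
    { tag: "a", text: "Forgot password?", attrs: { href: "/reset" } },
  ],
};

console.log(serializeInteractive(page).text);
// [0] <input> Email
// [1] <button> Log in
// [2] <a> Forgot password?
```

A compact list like this, rather than a screenshot, is the kind of input a DOM-based agent sends to the model; non-interactive content such as the heading is left out.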

This approach has several major advantages. First, it is usually much cheaper in LLM tokens than a vision-based approach, because sending screenshots to a multimodal model is expensive. Second, it is faster, since the model reads text instead of analyzing images. Third, it requires no special browser permissions.
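The model's answer then has to be turned back into an action on the page. A minimal sketch, assuming the model replies with a small JSON object such as `{"action":"click","index":1}` (a hypothetical format, not page-agent's documented protocol), could validate the reply and apply it like this. `makeElement`, `parseAction`, and `applyAction` are illustrative names, and the element handles are plain objects so the code runs without a browser.

```javascript
// Hypothetical element handles. In a browser these would be the DOM elements
// collected during serialization; here they just record what happened to them.
function makeElement(tag) {
  return { tag, value: "", clicks: 0 };
}

// Parse and validate the model's reply before anything touches the page.
function parseAction(reply, elementCount) {
  let action;
  try {
    action = JSON.parse(reply);
  } catch {
    throw new Error("Model reply is not valid JSON");
  }
  if (action === null || typeof action !== "object") {
    throw new Error("Model reply is not an action object");
  }
  if (action.action !== "click" && action.action !== "input") {
    throw new Error(`Unsupported action: ${action.action}`);
  }
  if (!Number.isInteger(action.index) || action.index < 0 || action.index >= elementCount) {
    throw new Error(`Unknown element index: ${action.index}`);
  }
  if (action.action === "input" && typeof action.text !== "string") {
    throw new Error("An input action needs a text value");
  }
  return action;
}

// Apply a validated action. In a browser, "click" would call element.click(),
// and "input" would set the field's value and dispatch an "input" event so
// listeners see the change (frameworks such as React need extra care here).
function applyAction(action, elements) {
  const el = elements[action.index];
  if (action.action === "click") el.clicks += 1;
  else el.value = action.text;
  return el;
}

const elements = [makeElement("input"), makeElement("button")];
applyAction(parseAction('{"action":"input","index":0,"text":"john@acme.com"}', elements.length), elements);
applyAction(parseAction('{"action":"click","index":1}', elements.length), elements);
console.log(elements[0].value, elements[1].clicks); // john@acme.com 1
```

Validating before acting matters because a model can return malformed JSON or point at an element index that does not exist.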

BYOLLM: Bring your own language model

Page-agent adopts a "Bring Your Own LLM" philosophy. You connect the model of your choice: GPT-4, Claude, Qwen, Mistral, or any other model compatible with the OpenAI API format. The DOM processing layer derives from browser-use (MIT-licensed), but the decision-making intelligence relies on the LLM you provide.

This means you retain full control over costs, data privacy, and response quality. You can even use a local model if you prefer.
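Because the model only has to speak the OpenAI-compatible chat format, switching providers mostly means changing a base URL, a model name, and a key. The sketch below builds, but does not send, such a request. `buildChatRequest` is a hypothetical helper, not part of page-agent; the DashScope URL and model come from this article's npm example, while `http://localhost:11434/v1` is the OpenAI-compatible endpoint that Ollama exposes by default, and the local model name is only illustrative.

```javascript
// Build an OpenAI-style chat completions request without sending it.
// Any provider exposing this format can be targeted by changing the config.
function buildChatRequest({ baseURL, model, apiKey }, messages) {
  const url = baseURL.replace(/\/+$/, "") + "/chat/completions";
  const headers = { "Content-Type": "application/json" };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return {
    url,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}

// Two example configurations: a hosted model and a local one.
const hosted = {
  baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1",
  model: "qwen3.5-plus",
  apiKey: "YOUR_API_KEY",
};
const local = { baseURL: "http://localhost:11434/v1/", model: "llama3.1" }; // illustrative model name

const messages = [{ role: "user", content: "Click the login button" }];
const req = buildChatRequest(local, messages);
console.log(req.url); // http://localhost:11434/v1/chat/completions
// In a browser or Node 18+, the request could be sent with fetch(req.url, req.init).
```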

How to install and integrate page-agent

Integrating page-agent is remarkably simple. Two methods are available.

Method 1: The script tag (a single line of code)

The simplest approach is adding a single script tag to your HTML:

```html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.5.4/dist/iife/page-agent.demo.js" crossorigin="anonymous"></script>
```

That is it. One line of code and your page has a working AI agent with a built-in user interface. This demo version uses a test LLM provided by Alibaba, ideal for evaluating the tool before deploying to production.

Method 2: NPM installation (for production)

For production use, install the package via npm:

```shell
npm install page-agent
```

Then initialize the agent with your own LLM configuration:

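```javascript
import { PageAgent } from 'page-agent'

// Point the agent at any OpenAI-compatible endpoint (here: Alibaba's DashScope).
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  // Note: a key embedded in front-end code is visible to anyone who opens the page.
  apiKey: 'YOUR_API_KEY',
  language: 'en-US', // language of the agent's built-in UI
})

// Top-level await requires an ES module context (e.g. a bundler or <script type="module">).
await agent.execute('Click the login button')
```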

Note the simplicity of the API: a configuration object, a call to execute(), and you are up and running. The language parameter localizes the agent's user interface.

Key features of page-agent

User interface with human-in-the-loop validation

Page-agent does not operate blindly. It includes an elegant user interface that appears directly in the web page. Before each critical action, the user can see what the agent is about to do and approve or reject the operation. This is a fundamental design choice: AI assists, but the human stays in control.

This human validation mechanism is essential for production environments. Imagine an agent submitting an order form without confirmation. The human-in-the-loop design substantially reduces that risk.
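The approve-or-reject pattern is easy to picture in code. The following is a generic sketch, not page-agent's implementation: steps whose type appears in a hypothetical `CRITICAL_ACTIONS` set only run if an `approve` callback returns true, and a rejection stops the run. In a real page, `approve` would be a confirmation dialog; here it is a plain function so the example runs anywhere. The plan reuses the purchase-order example from later in this article.

```javascript
// Hypothetical set of action types that must never run without confirmation.
const CRITICAL_ACTIONS = new Set(["submit", "delete", "pay"]);

// Execute planned steps in order, asking a human before any critical one.
// approve(step) returns true or false; perform(step) does the actual work.
function runWithApproval(steps, approve, perform) {
  const completed = [];
  for (const step of steps) {
    if (CRITICAL_ACTIONS.has(step.type) && !approve(step)) {
      // Rejected: stop here instead of carrying on with a partial plan.
      return { completed, rejected: step };
    }
    perform(step);
    completed.push(step.description);
  }
  return { completed, rejected: null };
}

const plan = [
  { type: "input", description: "Supplier: ABC Industries" },
  { type: "input", description: "Reference: PO-2026-0342" },
  { type: "submit", description: "Submit the order form" },
];

// The reviewer declines the submission, so only the two input steps run.
const result = runWithApproval(plan, () => false, () => {});
console.log(result.completed.length, result.rejected.description); // 2 Submit the order form
```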

Chrome extension for multi-page tasks

By default, page-agent operates within a single web page. But Alibaba offers an optional Chrome extension that extends the agent's capabilities across multiple browser tabs. This enables complex workflows: opening a page, extracting information from it, navigating to another page, and inserting that data.

Multilingual support

Page-agent supports multiple languages for its user interface, making deployment in international contexts straightforward. The language parameter in the configuration lets you switch between available languages.

Page-agent vs browser-use vs Playwright vs Selenium: the comparison

To understand what page-agent brings to the table, comparing it with existing alternatives is essential. Here is a summary:

| Criteria | page-agent | browser-use | Playwright | Selenium |
|---|---|---|---|---|
| Runs inside the web page | Yes | No | No | No |
| Backend required | No | Yes (Python) | Yes | Yes |
| Vision models needed | No | Optional | N/A | N/A |
| Integration effort | 1 line of code | Significant | Significant | Significant |
| Human-in-the-loop | Built-in | No | No | No |
| Multi-page support | Via Chrome extension | Native | Native | Native |
| Language | JavaScript/TypeScript | Python | Multi-language | Multi-language |
| License | MIT | MIT | Apache 2.0 | Apache 2.0 |
| Primary use case | In-page copilot | Server-side automation | E2E testing | E2E testing |

The fundamental difference is one of positioning. Playwright and Selenium are testing and automation tools that control the browser from the outside. Browser-use adds an AI layer on top of that server paradigm. Page-agent flips the logic: the agent lives in the page, alongside the user.

This positioning creates an entirely new use case. It is no longer about automating tasks in the background, but about offering an AI copilot to the end user, directly within their working interface.

Concrete use cases for page-agent

Turn your SaaS into an AI product with a few lines of code

This is arguably page-agent's most impactful application. Today, companies like Notion, Salesforce, and HubSpot charge between $20 and $30 per month for their AI copilot features. These copilots do essentially the same thing: they understand the interface, execute actions on user request, and provide contextual assistance.

With page-agent, any SaaS vendor can integrate a similar AI copilot with a few lines of JavaScript. No backend rewrite, no new infrastructure. You add the script, connect an LLM, and your users can control your application through natural language.

For a startup on a tight budget, this means being able to offer a premium AI feature without the months of development it would normally require.

Automate complex form filling (ERP, CRM, back-office)

If you have ever worked with an ERP like SAP or a CRM like Salesforce, you know the pain of 30-field forms. Page-agent can transform these 20-click workflows into a single sentence: "Create a new contact for John Smith, Sales Director at Acme Corp, email john@acme.com, phone 555-0123."

For sales, administrative, and accounting teams spending hours on data entry, the productivity gain is immediate.

Improve web application accessibility

Web accessibility remains a major challenge. Page-agent opens an interesting path: allowing users to control complex interfaces in natural language, typed or spoken through voice commands, as a complement to screen readers. Instead of navigating with a keyboard through dozens of menus, a visually impaired user could simply say "Open my notifications" or "Send a message to the marketing team."

This is not a complete accessibility solution, but it is an assistive layer that can significantly improve the user experience for people with disabilities.

Create automated tests in natural language

QA teams spend considerable time writing and maintaining test scripts. With page-agent, it becomes possible to write tests in natural language: "Go to the registration page, fill the form with test data, click Submit, and verify the confirmation message appears."

This approach lowers the barrier to automated testing and makes test scenarios understandable by non-developers, facilitating collaboration between product and engineering teams.
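One way to wire this up is a small runner that sends each step to the agent in order and stops at the first failure. This sketch assumes an agent object exposing the `execute()` method from the npm example above, and it further assumes that a failed step rejects the returned promise, which this article does not document; check what `execute()` actually returns before relying on it. `runScenario` and the stub agent are hypothetical, and the stub lets the runner be exercised without a browser or an LLM.

```javascript
// Send natural-language steps to an agent one at a time; stop at the first failure.
async function runScenario(agent, steps) {
  const results = [];
  for (const step of steps) {
    try {
      await agent.execute(step);
      results.push({ step, ok: true });
    } catch (err) {
      results.push({ step, ok: false, error: err instanceof Error ? err.message : String(err) });
      break; // later steps usually depend on earlier ones
    }
  }
  return results;
}

// Stub agent: pretends every step succeeds except the one mentioning "Submit".
const stubAgent = {
  async execute(step) {
    if (step.includes("Submit")) throw new Error("Submit button not found");
  },
};

const steps = [
  "Go to the registration page",
  "Fill the form with test data",
  "Click Submit",
  "Verify the confirmation message appears",
];

runScenario(stubAgent, steps).then((results) => {
  for (const r of results) console.log(r.ok ? "PASS" : "FAIL", r.step);
});
// PASS Go to the registration page
// PASS Fill the form with test data
// FAIL Click Submit
```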

Guide users through customer success workflows

User onboarding is a critical challenge for any SaaS product. Instead of creating video tutorials or PDF guides that nobody reads, you could integrate page-agent as an interactive onboarding assistant. The user says "Show me how to create my first campaign," and the agent walks them through the interface step by step, executing or demonstrating each action.

For customer success teams, this could significantly reduce the number of support tickets and accelerate time-to-value for new users.

Real-world scenarios: imagine page-agent in your daily work

To make things more tangible, here are a few concrete scenarios.

Your CRM, controlled by voice. You are on a phone call with a prospect. Instead of frantically navigating your CRM to find their information, you type (or say, via a speech recognition module): "Show me the contact record for Sarah Johnson at TechCorp." The agent locates the search field, enters the name, clicks the correct result, and displays the record. You never lose focus on your conversation.

ERP form filling. You receive a purchase order by email. Instead of manually copying 15 fields into your ERP's entry form, you copy the information and ask the agent: "Create a new supplier order with this information: Supplier ABC Industries, reference PO-2026-0342, 500 units of Product X at $12.50 per unit, expected delivery April 15." The agent fills the form and waits for your approval before submitting.

Interactive client onboarding. A new customer just subscribed to your platform. Instead of a welcome email with a 20-page PDF, they find an AI assistant directly in the interface that says: "Welcome! Would you like me to show you how to set up your first project?" The customer says yes, and the agent guides them action by action.

Page-agent limitations: what you need to know

Like any tool, page-agent is not perfect. Understanding its limitations before adoption is important.

Client-side execution only

Page-agent runs exclusively in the user's browser. This means it cannot perform background tasks, schedule automated runs, or function without a user present. For classic server-side automation (data extraction, overnight workflows, API integrations), you will still need tools like Playwright or browser-use.

LLM call costs

Every step the agent takes requires at least one language model call. For simple workflows (a click, a field to fill), the cost is negligible. But for complex scenarios involving many steps, LLM tokens add up. Choosing a model with a good quality-to-price ratio and monitoring consumption is important.
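To see how tokens add up, here is back-of-the-envelope arithmetic in code. Every number is a placeholder assumption (steps per task, tokens per step, prices per million tokens), not a measurement of page-agent or any provider's pricing; the sketch also assumes the page snapshot is resent at every step. `estimateTaskCost` is a hypothetical helper.

```javascript
// Rough cost of one agent task: each step sends a prompt (page snapshot + history)
// and receives a short reply. Prices are expressed per million tokens.
function estimateTaskCost({ steps, promptTokensPerStep, completionTokensPerStep, inputPricePerM, outputPricePerM }) {
  const inputTokens = steps * promptTokensPerStep;
  const outputTokens = steps * completionTokensPerStep;
  const cost = (inputTokens * inputPricePerM + outputTokens * outputPricePerM) / 1_000_000;
  return { inputTokens, outputTokens, cost };
}

// Placeholder numbers: a 12-step form workflow, 4,000 prompt tokens and
// 200 completion tokens per step, $0.50 input / $1.50 output per million tokens.
const task = estimateTaskCost({
  steps: 12,
  promptTokensPerStep: 4000,
  completionTokensPerStep: 200,
  inputPricePerM: 0.5,
  outputPricePerM: 1.5,
});
console.log(task.inputTokens, task.outputTokens, task.cost.toFixed(4)); // 48000 2400 0.0276
```

Under these assumptions a single task costs under three cents, but 1,000 such tasks a day come to about $27.60 a day, and the prompt side dominates because the page is resent at every step.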

DOM complexity and dynamic pages

Modern web pages built with JavaScript frameworks such as React, Vue, or Angular generate complex DOM structures, with deeply nested components and content that is rendered, re-rendered, or removed on the fly. Page-agent may struggle with certain highly complex interfaces, or with content that is not exposed through the regular DOM at all, such as canvas-rendered widgets or cross-origin iframes, which a script running in the page cannot inspect.
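A common mitigation for re-rendering, in DOM-based agents generally, is to avoid holding on to element positions between steps and instead re-resolve the target from a fresh snapshot using stable attributes. The sketch below is a generic illustration of that idea, not a description of how page-agent handles it; the snapshot shape, `resolveTarget`, and the matching rules are assumptions.

```javascript
// Find the element matching a description in a *fresh* snapshot of the page,
// preferring stable attributes over position, since positions shift on re-render.
function resolveTarget(snapshot, target) {
  const byId = target.id && snapshot.find((el) => el.id === target.id);
  if (byId) return byId;
  const wanted = target.label.trim().toLowerCase();
  const matches = snapshot.filter(
    (el) => el.tag === target.tag && el.label.trim().toLowerCase() === wanted
  );
  if (matches.length === 1) return matches[0];
  // Zero or several matches: safer to re-plan (or ask the user) than to guess.
  return null;
}

// Before and after a re-render: a banner was inserted, so positions moved.
const before = [
  { tag: "button", label: "Save", id: null },
  { tag: "button", label: "Cancel", id: null },
];
const after = [
  { tag: "button", label: "Dismiss banner", id: "banner-close" },
  { tag: "button", label: "Save", id: null },
  { tag: "button", label: "Cancel", id: null },
];
const target = { tag: "button", label: "Save", id: null };
console.log(before.indexOf(resolveTarget(before, target))); // 0
console.log(after.indexOf(resolveTarget(after, target))); // 1
```

A cached position 0 would now hit the banner's close button; resolving by label still finds "Save".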

Limited multi-page without extension

Without the Chrome extension, page-agent is limited to the current page. Workflows requiring navigation between multiple sites or tabs need the extension installed, adding a deployment step.

Project maturity

With 2,900 stars and a still-young community (9 contributors), page-agent remains a relatively recent project. The documentation, while functional, is not as comprehensive as Playwright's or Selenium's. For mission-critical production deployments, this maturity factor should be considered.

How to get started with page-agent: step by step

1. Test the one-line demo. Add the CDN script tag to any HTML page to see the agent in action with the demo LLM.

2. Install via NPM. Run npm install page-agent in your project.

3. Configure your LLM. Choose your model (GPT-4, Claude, Qwen) and configure the PageAgent object with your API keys.

4. Test simple commands. Start with basic actions: "Click this button," "Fill this field with this value."

5. Explore complex workflows. Chain multiple actions, test form navigation, try more elaborate natural language commands.

6. Install the Chrome extension (optional). If you need multi-page workflows, install the extension to extend the agent's capabilities.

7. Deploy to production. Switch from the demo LLM to your own model, adjust language settings, and integrate the agent into your application.

Who should use page-agent?

Page-agent is for you if:

You are a SaaS vendor and want to add an AI copilot to your product without rewriting your backend.
You manage complex internal tools (ERP, CRM, back-office) and want to simplify the user experience.
You are working on web application accessibility.
You are looking for an alternative to traditional test scripts for your QA team.
You are an agency and want to quickly prototype AI experiences for your clients.

Page-agent is probably not for you if:

You need server-side background automation (scheduled workflows, data collection).
You are looking for a mature solution with an extensive ecosystem and large community.
You work on native mobile or desktop applications (page-agent is web-only).
You need to control remote browsers in the cloud (look at browser-use or Playwright for that).

What page-agent reveals about the future of web interfaces

Beyond the tool itself, page-agent illustrates a deeper trend: the democratization of the AI copilot layer. Companies charging premium subscriptions today for AI assistants embedded in their software are seeing an open source tool arrive that can replicate this functionality with a few lines of code.

This does not mean proprietary copilots will disappear. They often offer deeper integration, product-specific features, and dedicated support. But page-agent drastically lowers the barrier to entry. For the thousands of SaaS products, internal tools, and web applications that would never have had the resources to develop an AI copilot, this opens a door.

The fact that Alibaba is releasing this tool as open source, under the permissive MIT license, sends a strong signal. After the race to build language models (Qwen, Llama, Mistral), it is the AI application layer that is opening up. Page-agent is one of the first tools to make this vision concrete: AI as a universal interaction layer for the web, accessible to any developer with a code editor and an API key.


