What is Claude 3.5 Sonnet Computer Use? How to use？

Can you imagine? One day, you open your computer and tell your AI assistant, “Hey, fill out this application form for me.”

And it not only replies, “Sure thing, boss,” but also zips around your computer, getting it done without you having to lift a finger!

Early this morning, the globally renowned large model platform Anthropic released the upgraded Claude 3.5 Sonnet.

They also introduced a revolutionary feature called “Computer Use.” Through an API, developers can enable Claude to use a computer like a human, controlling the mouse and keyboard, viewing the screen, moving the cursor, clicking buttons, and entering text.

For example, users can have Claude search for information on the web, fill in data in spreadsheets, or open software to perform specific tasks. It can also assist developers with repetitive tasks, test code, and more.

Throughout the entire process, Claude will automatically execute the corresponding operations based on the given instructions.

Demo of Claude 3.5 Sonnet Computer use

Auto Search

In this demo, an Anthropic researcher gave Claude a highly challenging task:

“My friend is coming to San Francisco and l want to watch the sunrise with him at the Golden Gate Bridge tomorrow morning. We’ll be coming from Pacific Heights. Could you find us a great viewing spot, check the drive time and sunrise time, then set up a calendar event that gives us enough time to get there?”

Claude independently opened Google and started searching:

How far is the Golden Gate Bridge from the user’s location? Claude opened a map to find the distance:

After gathering the necessary information, it opened the calendar and scheduled the appointment for its owner.

Automated Coding for Website Creation

A developer demonstrated how Claude smoothly controlled his laptop to complete a website programming task.

First, Claude navigated to Claude.ai in the developer’s Chrome browser and created a 90s-themed personal homepage.

It entered the URL, typed in prompts, and sent requests to another Claude:

Claude AI returned some code, and the rendered page looked pretty good.

However, the developer wanted to make some local modifications on his computer.

So, he asked Claude to download the files and open them in VS Code. Claude successfully completed these instructions.

Next, the developer asked Claude to start a server so he could view the file in the browser.

Claude opened the VS Code terminal and tried to start a server, but encountered an error: Python wasn’t installed on the machine:

However, there was an error in the terminal output, and a file icon was missing at the top.

The developer asked Claude to identify and fix the error in the file.

Impressively, Claude found the problematic line in VS Code, deleted it, saved the file, and reran the website:

Auto Data Retrieval and Form Filling

Imagine we need to fill out a supplier request form from “Ant Equipment Company,” but the required data is scattered across the computer. Can Claude help us complete this task?

Claude started by taking screenshots of the developer’s screen and quickly discovered that “Ant Equipment Company” was not listed in the form.

At this point, it immediately switched to the CRM system to search for the company. After finding it, it scrolled through the pages to gather all the necessary information for the form and then submitted it.

This means that many of the tedious tasks we have to do at work can be delegated to Claude!

Now, this feature is available in the API. Well-known companies like Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company are exploring Claude’s new capabilities to execute complex tasks involving dozens or even hundreds of steps.

For example, Replit is leveraging Claude 3.5 Sonnet’s computer usage and UI navigation abilities to develop features for Replit Agent, enabling real-time evaluation during the application development process.

Far Below Human Performance, But Promising Future

How does the newly upgraded Claude 3.5 Sonnet perform in computer usage?

In the OSWorld test, it scored 14.9% in tasks based solely on screenshots, significantly outperforming the second-ranked AI system (7.8%).

When more operational steps were allowed to complete tasks, Claude’s score increased to 22.0%.

This indicates that multiple interactions between the model and its environment can optimize task performance.

Although this result is a substantial improvement from before, it is still far below the human performance of 72.36%.

This suggests that Claude 3.5 Sonnet has considerable room for future improvement.

How Claude’s Computer Use Works?

Currently, computer use primarily relies on APIs to drive automated instructions.

How Claude Understands Commands?

When developers send commands to Claude via the API, Claude utilizes its natural language processing capabilities to interpret these instructions.

Its internal language model performs lexical, syntactic, and semantic analysis on the command text.

For example, for a command like “fill out an online form using data from the computer,” Claude identifies the key actions as filling out the form and sourcing data from the computer.

Based on the language patterns and knowledge learned during pre-training, Claude maps the intent of the command to corresponding computer operation concepts.

The pre-trained knowledge includes common computer operation terms and software function descriptions to accurately execute specific actions.

How Claude Controls the Computer?

Claude again uses the API to control the underlying frameworks in systems like Windows and MacOS, including the mouse, keyboard, buttons, and text boxes.

Once Claude determines the computer operation to be performed, it begins executing the specific actions.

For example, for cursor movement, the API sends the corresponding command to the operating system, which then passes it to the mouse driver to move the cursor.

For button clicks, the API first locates the button’s position on the screen and then simulates a mouse click event sent to the operating system.

When entering text, it simulates keyboard input to type the text into the target text box character by character or by phrases.

How to Use the Claude Computer Use?

Performance of New Claude 3.5 Sonnet

The upgraded Claude 3.5 Sonnet has achieved significant performance improvements across various industry benchmarks. Notably, it has made remarkable strides in agent coding and tool usage tasks.

In terms of coding capabilities, its performance in the SWE-bench Verified test has surged from 33.4% to 49.0%. This surpasses all publicly available models, including inference models like OpenAI’s o1-preview and specialized systems designed for agent coding.

Additionally, Claude 3.5 Sonnet excelled in the TAU-bench, a benchmark evaluating agent tool usage capabilities. Its score in the retail sector increased from 62.6% to 69.2%, and in the more challenging aviation sector, it jumped from 36.0% to 46.0%.

The table below shows that in the GPQA (Diamond) reasoning test benchmark, the new Claude 3.5 Sonnet significantly outperformed GPT-4o.

Claude 3.5 Sonnet has set new industry standards in visual QA, mathematical reasoning, document visual QA, chart QA, and scientific table benchmarks.

Applications of new Claude 3.5 Sonnet Model

Claude 3.5 Sonnet can understand nuanced instructions and context, identify and correct its own errors, and generate in-depth analysis and insights from complex data. Combining advanced coding, visual recognition, and writing capabilities, Claude 3.5 Sonnet can be applied in various scenarios.

Simulating Human Computer Operations

By integrating Claude via API, developers can guide Claude to use a computer like a human—observing the screen, moving the mouse, clicking buttons, and typing text. Claude 3.5 Sonnet is the first cutting-edge AI model capable of reliably using a computer in this manner. Although it is still experimental in public testing, its capabilities will continue to improve over time.

Automated Code Generation

Claude 3.5 Sonnet can assist throughout the entire software development lifecycle—from initial design to bug fixing, system maintenance, and performance optimization. It can be directly integrated into products or used as an intelligent coding assistant via the Claude.ai platform.

Intelligent Conversational Systems

With enhanced reasoning abilities and a friendly, natural tone, Claude 3.5 Sonnet is ideal for developing intelligent conversational systems that need to connect data across systems and perform actions.

Smart Knowledge Q&A

Claude 3.5 Sonnet’s large-scale context processing capabilities and very low hallucination rate make it an ideal choice for handling Q&A tasks involving large knowledge bases, documents, and code repositories.

Visual Information Extraction

Claude 3.5 Sonnet can easily extract information from charts, graphs, and complex diagrams, making it an ideal AI model for data analysis and data science tasks.

Process Automation

Claude 3.5 Sonnet can automate repetitive tasks or processes. It has industry-leading command execution capabilities, able to handle complex workflows and operations.

Pricing

Notably, the upgraded Claude 3.5 Sonnet achieves significant performance breakthroughs while maintaining the same price and speed as its predecessor.

Feedback from early testers further confirms that the upgraded Claude 3.5 Sonnet represents a qualitative leap in AI-driven coding.

API pricing starts at $3 per million input tokens and $15 per million output tokens.

By using smart caching technology, costs can be reduced by up to 90%, and using the batch processing API can save 50% in costs.

The Way to Furture

Until now, LLM developers have been striving to adapt tools to fit the model, creating special environments for AI to use specially designed tools to complete various tasks.

Now, Anthropic is taking a different approach—they are choosing to adapt the model to the tools. This means Claude can integrate into our daily computer environment and directly use existing software, just like a human.

Although Claude has reached the highest current level, its operations are still relatively slow and prone to errors.

Many of the actions we perform daily on computers, such as dragging and zooming, are still beyond Claude’s capabilities.

However, Claude’s current performance makes us optimistic about the future: the ability of AI to operate computers will improve rapidly, and one day, even beginners in software development will be able to use it with ease.

Author
Recent Posts

Shawn Banks

Shawn Banks is a seasoned blog writer with a deep focus on AI-generated content (AIGC). With extensive experience in exploring the intersection of artificial intelligence and digital media, Shawn banks provides insights into the latest innovations, tools, and trends shaping the future of content creation. Passionate about making complex AI concepts accessible, Shawn Banks writes user-centric articles that not only educate but also inspire content creators, developers, and businesses to leverage AI technologies for transformative results. Whether discussing advanced AI models, exploring creative applications of generative AI, or reviewing cutting-edge tools, Shawn Banks is dedicated to helping readers stay ahead in the rapidly evolving world of AIGC.

Try It online For Free

Get Started