Building with AI Agents: A Practical Guide Inspired by Spotify and Anthropic

Overview

Artificial intelligence agents are redefining how we approach software development. Instead of following static scripts or manually orchestrating every microservice, developers now entrust autonomous agents with tasks like code generation, testing, and even architectural decisions. This shift, often called agentic development, was a central theme in the live collaboration between Spotify and Anthropic. In this guide, we’ll distill their conversation into actionable steps, showing you how to build your own agent-powered workflows. By the end, you’ll understand when and how to integrate AI agents into your development pipeline—without losing control.

Source: engineering.atspotify.com

Prerequisites

Before diving in, make sure you have:

An Anthropic API key (used via the `ANTHROPIC_API_KEY` environment variable).
Node.js or Python 3 installed.
A GitHub personal access token (used for posting PR comments in step 4).
Optional but helpful: experience with prompt engineering and basic agent patterns (ReAct, plan-and-execute).

Step-by-Step Instructions

1. Define Your Agent’s Goal and Scope

Spotify’s developer workflows revolve around rapid iteration and safe deployment. Their agents aren’t general-purpose; they’re specialized. Start by answering:
What repetitive task do I want to automate?
Examples: generating unit tests, refactoring legacy code, or drafting documentation.

Write a clear objective statement. For our tutorial, let’s build an agent that reviews pull requests (PRs) and suggests improvements.

2. Set Up the Development Environment

Create a new project folder:

mkdir agentic-dev-spotify
cd agentic-dev-spotify
npm init -y   # For Node.js, or python -m venv venv for Python

Install the Anthropic SDK:

npm install @anthropic-ai/sdk  # Node.js
# or pip install anthropic        # Python

Store your API key safely (use environment variables):

export ANTHROPIC_API_KEY="sk-ant-..."  # Linux/macOS
set ANTHROPIC_API_KEY=sk-ant-...       # Windows (cmd); in PowerShell use $env:ANTHROPIC_API_KEY="sk-ant-..."

3. Build the Core Agent Logic

An agent typically follows a loop: observe → think → act → evaluate. We’ll implement a simple version tailored to PR review.

import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def review_code(diff_text: str) -> str:
    prompt = f"""You are a senior code reviewer. Analyze this git diff and list:
1. Potential bugs
2. Style issues
3. Security concerns
4. Suggested improvements

Diff:
{diff_text}
"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

This is the think phase. Next, we need to let the agent act – e.g., post comments on the PR via GitHub API.

4. Wire Up the Feedback Loop

Spotify's agents often work asynchronously. First, add a helper that posts review comments through the GitHub API:

import requests

def post_comment(repo, pr_number, comment):
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    headers = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}
    data = {"body": comment}
    response = requests.post(url, json=data, headers=headers)
    response.raise_for_status()  # surface API errors instead of failing silently

Now combine into a minimal agent loop:

import time

while True:
    new_diffs = fetch_new_diffs()  # implement by polling your Git host for open PRs
    for diff in new_diffs:
        feedback = review_code(diff["content"])
        post_comment(diff["repo"], diff["pr_number"], feedback)
    time.sleep(60)  # pause between polls instead of hammering the API

For production, use asynchronous queues to avoid blocking.
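Such a queue can be sketched with only the standard library. The example below decouples polling from reviewing with `queue.Queue` and a worker thread; `review_code` and `post_comment` are stubbed here so the sketch runs on its own (swap in the real functions from steps 3 and 4, and back the queue with Redis or SQS in production):

```python
import queue
import threading

results = []

# Stand-ins for the tutorial's review_code/post_comment so this runs alone.
def review_code(diff_text):
    return f"reviewed: {diff_text}"

def post_comment(repo, pr_number, comment):
    results.append((repo, pr_number, comment))

diff_queue = queue.Queue()

def worker():
    # Consume diffs one at a time; a slow model call never blocks polling.
    while True:
        diff = diff_queue.get()
        if diff is None:  # sentinel: shut down cleanly
            diff_queue.task_done()
            break
        feedback = review_code(diff["content"])
        post_comment(diff["repo"], diff["pr_number"], feedback)
        diff_queue.task_done()

t = threading.Thread(target=worker)
t.start()
diff_queue.put({"repo": "org/app", "pr_number": 1, "content": "+x = 1"})
diff_queue.put(None)
t.join()
print(results)
```

In a real deployment the producer (polling) and consumer (reviewing) would run as separate processes, which is exactly what a Redis or SQS queue buys you.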

5. Add Guardrails and Human-in-the-Loop

Anthropic's agents prioritize safety, and Spotify adds human approval before merging. Extend your agent:

def escalate_if_critical(feedback, diff_info):
    # Route anything flagged as critical to a human instead of auto-posting.
    if "critical" in feedback.lower():
        assign_human_reviewer(diff_info)  # your own escalation/routing logic
        return False
    return True
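A quick self-contained sketch of how this guardrail might gate the posting step; `assign_human_reviewer` and `handle_review` are hypothetical helpers standing in for your own routing logic:

```python
escalations = []

def assign_human_reviewer(diff_info):
    # Hypothetical: in practice, ping a reviewer via Slack, GitHub, etc.
    escalations.append(diff_info)

def escalate_if_critical(feedback, diff_info):
    if "critical" in feedback.lower():
        assign_human_reviewer(diff_info)
        return False  # block the automated comment; a human takes over
    return True

def handle_review(feedback, diff_info):
    if escalate_if_critical(feedback, diff_info):
        return "posted automatically"
    return "escalated to human"

print(handle_review("Critical: SQL injection risk in query builder", {"pr": 42}))
```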

6. Deploy and Observe

Deploy as a GitHub Action, background service, or serverless function. Monitor key metrics, such as review latency, API error rates, and how often suggestions are accepted.

Use logging and dashboards (Datadog, Grafana).

Common Mistakes

Treating Agents as Black Boxes

Don't assume the AI never errs. Always log raw outputs and have a fallback plan. Spotify engineers never bypass manual review for production-critical changes.

Overloading the Prompt

Agents work best with focused instructions. Including huge codebases in one prompt leads to hallucinations. Chunk inputs (e.g., diff of only changed functions).
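One simple way to chunk is splitting a unified git diff into per-file pieces before review. A minimal sketch (real diffs may warrant finer splitting, e.g., per hunk or per function):

```python
def split_diff_by_file(diff_text):
    # Each file's changes in a unified diff start with a "diff --git" header.
    chunks = []
    for block in diff_text.split("diff --git "):
        if block.strip():
            chunks.append("diff --git " + block)
    return chunks

sample = (
    "diff --git a/app.py b/app.py\n+print('hi')\n"
    "diff --git a/util.py b/util.py\n-unused = 1\n"
)
chunks = split_diff_by_file(sample)
print(len(chunks))  # 2
```

Each chunk can then be passed to `review_code` separately, keeping every prompt small and focused.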

Ignoring Rate Limits

Anthropic's API has rate limits. Implement exponential backoff, or use a queue (Redis, SQS) to handle spikes gracefully.
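A minimal backoff sketch. `RateLimitError` below is a local stand-in so the example runs standalone (the real SDK raises `anthropic.RateLimitError`), and `flaky` simulates a call that hits the limit twice before succeeding:

```python
import random
import time

class RateLimitError(Exception):
    """Local stand-in for anthropic.RateLimitError so the sketch runs alone."""

def with_backoff(call, max_retries=5, base_delay=0.01):
    # Retry on rate-limit errors, doubling the delay each attempt plus jitter.
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller handle it
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)

attempts = {"n": 0}

def flaky():
    # Simulates two rate-limit errors before succeeding.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(flaky))  # ok
```

In production you would use a larger `base_delay` (e.g., one second) and wrap the `client.messages.create` call from step 3.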

Skipping Safety Layers

Agents can generate malicious code if prompted maliciously. Validate agent outputs before execution – e.g., never pipe agent-generated commands directly into eval().
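One way to validate before execution is an allowlist check on any agent-suggested shell command; a minimal sketch (the allowlist and metacharacter set here are illustrative, not exhaustive):

```python
import shlex

# Only commands whose first token appears here may run; everything else is rejected.
ALLOWED_COMMANDS = {"git", "npm", "pytest"}

def is_safe_command(command_line):
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return False  # unparseable input, e.g., unbalanced quotes
    if not tokens:
        return False
    # Reject shell metacharacters that could chain or substitute extra commands.
    if any(ch in command_line for ch in [";", "&&", "|", "`", "$("]):
        return False
    return tokens[0] in ALLOWED_COMMANDS

print(is_safe_command("git diff --stat"))        # True
print(is_safe_command("rm -rf / && echo done"))  # False
```

The same principle applies to generated code: run it in a sandbox with review, never via `eval()` in your own process.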

Summary

Agentic development, as demonstrated by Spotify and Anthropic, moves beyond simple automation. By building a specialized agent with clear goals, feedback loops, and safety guards, you can offload repetitive tasks while retaining control. The key takeaway: design for collaboration, not replacement. Use the steps above to start small. Your first agent might just review a few PRs, but the pattern scales to much larger workflows.
