Why Backup Tools Should Not Connect to Your Databases Directly

Most backup tools work the same way: they reach into your database, authenticate with stored credentials, open a connection, and extract data. It's the obvious approach, and it is also one of the most consequential architectural decisions to get wrong.

The pattern is so widespread that most teams never question it. But the model has a fundamental flaw baked into its design. It requires your backup infrastructure to hold privileged access to your most sensitive systems, permanently, at scale. That is not a backup problem. That is an attack surface problem.

At Portabase, we built our backup system around the opposite model: an agent that runs close to the database, initiates the backup itself, and pushes data outward. The backup platform never needs inbound credentials. This architectural inversion is not a technical preference — it is a security posture, a compliance strategy, and a scalability decision rolled into one.

The Pull Model and the Credential Sprawl Problem

In a pull-based backup model, the backup tool actively connects to each database, using stored credentials (users, passwords, connection strings, or service accounts) to extract data and send it to storage.

This is simple in theory, but it means the backup system effectively holds access to every database it protects. If the backup tool is compromised, an attacker can often reach the same data as a database administrator across the entire environment.

It also creates network friction: firewalls must allow the backup tool to reach every database. In dynamic cloud setups, this leads to either overly permissive rules or constant maintenance.

[Figure: Backup tool accessing databases through firewall]

Over time, this approach naturally leads to credential sprawl. Each database requires its own service account, and in larger setups this quickly scales to dozens or hundreds of credentials across production, staging, and DR environments.

The problem isn’t just scale; it is also control. Permissions tend to drift as systems evolve, and backup service accounts often accumulate broader access than intended. Security teams frequently discover that these credentials form a powerful, lightly monitored access layer across the entire database estate.

If those credentials are ever exposed, the impact is immediate: whoever has them can potentially access every connected database.
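To make the difference in exposure concrete, here is a minimal sketch that counts how many databases become reachable when a single component is compromised. The class names, the account naming scheme, and the figure of 40 databases per environment are illustrative assumptions, not a description of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class PullBackupTool:
    """Hypothetical central backup tool storing one credential per database."""
    stored_credentials: dict[str, str] = field(default_factory=dict)

    def reachable_if_compromised(self) -> set[str]:
        # Every database whose credential is stored centrally is exposed.
        return set(self.stored_credentials)


@dataclass
class LocalAgent:
    """Hypothetical agent holding access to a single database host."""
    database: str

    def reachable_if_compromised(self) -> set[str]:
        # Only the one local database is exposed.
        return {self.database}


# Assumed estate: 40 databases in each of three environments.
databases = [f"{env}-db-{i}" for env in ("prod", "staging", "dr") for i in range(1, 41)]

pull_tool = PullBackupTool({db: f"svc_backup_{db}" for db in databases})
agents = [LocalAgent(db) for db in databases]

pull_exposure = len(pull_tool.reachable_if_compromised())
agent_exposure = max(len(agent.reachable_if_compromised()) for agent in agents)
print(pull_exposure, agent_exposure)  # 120 1
```

The model is deliberately crude: it ignores permission drift and shared accounts, both of which would tend to make the central tool's exposure worse rather than better.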

The Agent Model: Inverting the Control Plane

An agent-based architecture eliminates the need for the backup platform to hold database credentials or maintain inbound network access. Instead, a lightweight agent process runs on or near the database host — inside the same network trust boundary. The agent is configured locally with the access it needs, and it initiates all communication outward to the backup platform.

This inversion of control is architecturally significant. The backup platform receives data; it never asks for it by reaching in. No database credentials travel outside the database's trust boundary. No firewall exceptions are required for inbound connections. If the backup platform is compromised, the attacker gains access to backup data, which is serious in its own right, but they do not automatically gain the ability to connect to production databases, because the platform never held that access.

The principle at work here is least privilege in its most meaningful sense. The backup platform is granted exactly the capabilities it needs to receive and store data. Nothing more. The agent, which does hold local database access, is scoped to a single host and can be audited, rotated, and revoked independently.
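The push flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Portabase's actual agent: the endpoint URL, the bearer token used to identify the agent to the platform, the use of PUT, and the function names are all hypothetical. The point it shows is that the database credential stays in the local dump command, while the outbound request carries only the backup payload and an agent identity.

```python
import subprocess
import urllib.request
from typing import Sequence


def run_local_dump(command: Sequence[str]) -> bytes:
    """Run a dump command on the database host and return its stdout.

    The command (for example a pg_dump invocation) is configured locally,
    so database access never leaves this host.
    """
    result = subprocess.run(command, capture_output=True, check=True)
    return result.stdout


def build_push_request(platform_url: str, agent_token: str,
                       payload: bytes) -> urllib.request.Request:
    """Build the outbound upload request; it contains no database credential."""
    if not platform_url.startswith("https://"):
        raise ValueError("the agent only pushes over HTTPS")
    return urllib.request.Request(
        platform_url,
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {agent_token}",  # hypothetical agent identity
            "Content-Type": "application/octet-stream",
        },
    )


def backup_once(command: Sequence[str], platform_url: str, agent_token: str,
                send=urllib.request.urlopen):
    """Dump locally, then push outward. The agent always initiates the connection."""
    payload = run_local_dump(command)
    return send(build_push_request(platform_url, agent_token, payload))


# Building a request does not touch the network; the URL is a placeholder.
example = build_push_request(
    "https://backup.example.invalid/v1/upload", "agent-token", b"dump-bytes"
)
print(example.get_method(), example.host)
```

A real agent would stream large dumps rather than buffer them in memory, and would likely encrypt the payload before upload; both are omitted here to keep the direction of the connection in focus.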

Why This Architecture Scales Across Complex Environments

The operational advantages of the agent model compound as infrastructure complexity grows. In a single-environment deployment, the differences are real but manageable. In a multi-cloud, hybrid, or air-gapped environment, the pull model begins to break down in ways that are expensive to work around.

Multi-cloud deployments typically mean databases running in VPCs or VNets with private addressing, isolated by design from external access. Getting a pull-based backup tool to reach these databases requires VPN tunnels, private link configurations, or peering arrangements, all of which add cost and maintenance overhead. An agent deployed inside each cloud environment simply pushes outward over HTTPS, a connection type that most egress-only network policies already permit.

Air-gapped environments are a more extreme version of the same problem. Databases in regulated or classified environments are deliberately isolated from external networks. A pull-based tool hosted outside that boundary cannot reach them without compromising the air gap. An agent-based model can be configured to push to an intermediary endpoint within the same isolation boundary, maintaining security without sacrificing backup coverage.

Hybrid environments — where on-premises databases coexist with cloud deployments — present similar topology challenges. The agent model treats each environment uniformly. An agent deployed on an on-premises host behaves identically to one deployed in a cloud VM. The backup platform does not need to know or care about network topology.
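One way to picture this uniformity is an agent configuration whose shape is the same everywhere, with the push endpoint as the only value that reflects topology. The sketch below is hypothetical: the `AgentConfig` fields, the environment names, and the URLs are assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical agent settings; identical in shape for every environment."""
    environment: str
    push_endpoint: str  # the only value that reflects network topology

    def validate(self) -> None:
        parsed = urlparse(self.push_endpoint)
        if parsed.scheme != "https" or not parsed.hostname:
            raise ValueError(f"{self.environment}: push endpoint must be an https URL")


configs = [
    AgentConfig("cloud-a", "https://backup.example.invalid/ingest"),
    AgentConfig("cloud-b", "https://backup.example.invalid/ingest"),
    AgentConfig("on-prem", "https://backup.example.invalid/ingest"),
    # Air-gapped hosts push to an intermediary inside the isolation boundary.
    AgentConfig("air-gapped", "https://relay.isolated.example.invalid/ingest"),
]

# The same validation applies to every environment.
for config in configs:
    config.validate()
```

Nothing in this configuration tells the platform how to reach a database, because the platform never needs to.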

Compliance Is Not Checkbox Work — It Is Architecture

European regulatory frameworks are increasingly specific about what constitutes adequate technical controls for data protection. ISO 27001, NIS2, and GDPR each have implications for how backup access is structured — and the agent model aligns meaningfully with each of them.

With ISO 27001, the key idea is least privilege. Pull-based backup tools often need broad, long-lived database credentials across many systems, which are hard to tightly control and review. An agent-based model keeps database access local and scoped, while the backup platform itself holds no database credentials at all.

NIS2 places explicit emphasis on supply chain risk, among its other risk-management requirements. In a pull model, a compromise of the backup platform can expose every database it can access. With agents, the blast radius is smaller: the platform never has direct database access in the first place.

GDPR emphasizes data minimization and accountability, which in practice means being able to trace who accessed personal data and how. The agent model creates a simple, auditable flow (database → local agent → backup platform) where every step can be logged. In pull-based systems, an external system initiates access into the database, so audit and control have to span both environments and become more complex.

[Figure: Compliance of agent-based architecture with ISO 27001, NIS2 and GDPR]

Auditability as a First-Class Feature

Security and compliance teams are concerned not only with preventing incidents, but also with being able to reconstruct them afterward. Auditability, meaning the ability to answer "who accessed what, when, and from where", is increasingly a hard requirement in regulated industries.

Pull-based backup tools create audit ambiguity. When a database access log shows a connection from the backup tool's service account, it is not always possible to distinguish a legitimate scheduled backup from an unauthorized query. The connection pattern looks the same. If the backup tool's service account is used maliciously — either by an insider at the vendor or by an attacker who has compromised the tool — the resulting database access log entries are indistinguishable from routine backup activity.

Agent-based systems produce a different audit profile. The agent's activity is local and logged locally. The backup platform's activity is limited to receiving inbound data and can be logged at the platform level. The two logs can be cross-referenced. Any deviation — data received at the platform without a corresponding agent-side backup event — is detectable. This is not an accident of implementation; it is the natural consequence of an architecture where each component has a defined and bounded role.
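The cross-referencing step can be sketched as a simple set comparison. The record layout here (a `backup_id` shared between agent-side events and platform-side receipts) is an assumption for illustration; real log formats will differ.

```python
def unmatched_receipts(agent_events: list[dict], platform_receipts: list[dict]) -> list[dict]:
    """Return platform receipts that have no corresponding agent-side backup event."""
    agent_ids = {event["backup_id"] for event in agent_events}
    return [r for r in platform_receipts if r["backup_id"] not in agent_ids]


# Hypothetical agent-side log: backups the agents report having started.
agent_events = [
    {"backup_id": "b-001", "host": "db-prod-1"},
    {"backup_id": "b-002", "host": "db-prod-2"},
]

# Hypothetical platform-side log: uploads the platform actually received.
platform_receipts = [
    {"backup_id": "b-001", "bytes": 10_240},
    {"backup_id": "b-002", "bytes": 20_480},
    {"backup_id": "b-999", "bytes": 512},  # no agent reported this one
]

suspicious = unmatched_receipts(agent_events, platform_receipts)
print([r["backup_id"] for r in suspicious])  # ['b-999']
```

The same comparison run in the other direction (agent events with no receipt) would flag backups that were started but never arrived, which is useful for coverage monitoring as well as security.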

The Architectural Choice Is the Security Decision

Backup architecture is often treated as an operational concern — something to be solved with the right vendor and the right SLA. The decision about how a backup tool accesses your databases is, in reality, a security decision with compliance implications and operational consequences that compound over time.

The pull model made sense when databases were few, environments were simple, and regulatory requirements were less specific. It does not hold up under modern infrastructure complexity, or under the scrutiny of frameworks like ISO 27001, NIS2, and GDPR. Every database credential stored by a backup tool is a liability. Every firewall exception created for inbound backup access is an attack surface. Every environment where a pull-based tool cannot reach without network redesign is a gap in your backup coverage.

The agent model resolves these problems at the architectural level. It does not require compensating controls, inbound firewall rules, or central credential governance workarounds, because it does not create the underlying problems in the first place. The backup platform receives data. It does not reach in and take it. That distinction, simple as it sounds, is the difference between a backup system that strengthens your security posture and one that quietly erodes it.

If you are evaluating backup tooling — or reconsidering infrastructure you already have in place — the first question to ask is not about pricing, retention policies, or storage integrations. It is this: does your backup tool connect to your databases, or do your databases connect to it? The answer tells you everything about the risk model you are accepting.