Problem Statement
We started observing connection pressure on Amazon DocumentDB as traffic increased.
Our workloads (ECS services and microservices) were opening a large number of database connections. Over time, this created serious risks:
- Connection exhaustion
- Performance degradation
- Increasing latency
- Unpredictable failures under load
- Pressure to upgrade the DocumentDB instance size
In containerized environments:
- Each service maintains its own connection pool
- Horizontal scaling multiplies connection counts
- Even idle containers hold open connections
For example:
- 10 ECS tasks
- Each maintaining 100 connections
That results in 1,000 active connections.
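The multiplication is simple but easy to underestimate, because it grows with every scale-out event. A quick sketch (the function name is ours; the numbers are the ones from the example):

```python
# Back-of-the-envelope math for connection multiplication under
# horizontal scaling: every task carries its own full pool.
def total_connections(tasks: int, pool_size_per_task: int) -> int:
    return tasks * pool_size_per_task

print(total_connections(10, 100))   # 1000, the example above
print(total_connections(30, 100))   # 3000 after scaling out 3x
```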
DocumentDB enforces connection limits based on instance class. As we approached those limits:
- New connections failed
- Latency increased
- Failures became inconsistent
The immediate solution would have been simple:
Increase the DocumentDB instance size.
But that would increase cost without solving the underlying architectural issue. The real problem was connection management.
What Was Tried
We evaluated the following approaches:
- Increasing the DocumentDB instance size
- Tuning connection pool size in application code
- Reducing container scaling limits
- Introducing a connection pooling layer
Scaling the database was the easiest option but not the most sustainable one. It would temporarily increase limits but not address connection multiplication caused by horizontal scaling.
We decided to introduce MongoBetween, a lightweight TCP proxy and connection pooler for MongoDB-compatible databases.
What Worked
We deployed MongoBetween as a dedicated connection management layer:
Architecture Before
App → DocumentDB
Every container connected directly to DocumentDB.
Architecture After
App → MongoBetween → DocumentDB
Now:
- Applications connect to MongoBetween
- MongoBetween maintains pooled connections to DocumentDB
- Total active database connections dropped significantly
- Connection reuse improved
- Connection management became centralized
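The core idea behind the proxy-side pool is that many client sessions can share a small, fixed set of upstream connections. A toy model of that behavior (this is an illustration of the pooling concept, not MongoBetween's actual implementation):

```python
import queue

class UpstreamPool:
    """Toy model of proxy-side pooling: many client requests borrow from
    a small, fixed set of upstream connections (plain strings here)."""
    def __init__(self, size: int):
        self._conns = queue.Queue()
        for i in range(size):
            self._conns.put(f"upstream-{i}")

    def run(self, request: str) -> str:
        conn = self._conns.get()        # borrow (blocks while all are busy)
        try:
            return f"{request} via {conn}"
        finally:
            self._conns.put(conn)       # return the connection for reuse

pool = UpstreamPool(size=5)
# 100 client requests are served, yet only 5 upstream connections ever exist.
results = [pool.run(f"req-{n}") for n in range(100)]
```

The database only ever sees the pool's five connections, no matter how many containers sit in front of the proxy, which is exactly why the total connection count drops.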
Deployment Architecture
MongoBetween was deployed using:
- ECS Fargate
- An internal Network Load Balancer (NLB)
- Private subnets
- CloudWatch logging
- IAM roles for task execution
Key configuration:
- Exposed port: 27016
- Network mode: awsvpc
- CPU: 512 (0.5 vCPU)
- Memory: 1024 MiB
The application connection string was updated from:
mongodb://documentdb-endpoint:27017
to:
mongodb://mongobetween-nlb:27016
No application logic changes required.
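Because the endpoint lives in configuration, the cutover touches no query or driver code. A minimal sketch, assuming the application reads its URI from environment variables (the variable names and helper are illustrative, not from our codebase):

```python
import os

def build_mongo_uri(default_host: str = "mongobetween-nlb",
                    default_port: str = "27016") -> str:
    # The endpoint is plain configuration, so switching from direct
    # DocumentDB access to MongoBetween is a config-only change.
    host = os.environ.get("MONGO_HOST", default_host)
    port = os.environ.get("MONGO_PORT", default_port)
    return f"mongodb://{host}:{port}"

# client = MongoClient(build_mongo_uri())  # application code is unchanged
```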
What Didn't Work / Challenges
- Building an initial understanding of connection behavior required detailed analysis of metrics.
- Connection pool misconfiguration can still cause pressure if limits are not tuned properly.
- MongoBetween becomes part of the data path, so it must be monitored carefully.
- Security groups must be tightly restricted (the proxy should never be exposed publicly).
- High-availability considerations must be evaluated (single-task vs. multi-task deployment).
It adds a layer of complexity, but a controlled one.
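One way to keep pool limits honest is to budget them against the instance's connection limit rather than tuning them ad hoc. A rough sketch (the function, the head-room fraction, and the example limit are our own illustration, not MongoBetween configuration):

```python
def per_task_pool_budget(instance_conn_limit: int,
                         max_tasks: int,
                         headroom: float = 0.8) -> int:
    """Largest per-task pool size that keeps peak usage inside budget.

    Reserves (1 - headroom) of the limit for admin sessions, monitoring,
    and deployment overlap while old and new tasks run side by side.
    """
    return int(instance_conn_limit * headroom) // max_tasks

# e.g. an instance allowing 1700 connections, scaling up to 10 tasks:
print(per_task_pool_budget(1700, 10))   # 136
```

Running this check whenever scaling limits change catches the "horizontal scaling multiplied our connections" failure mode before it reaches production.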
Final Outcome / Learning
The root cause of our issue was not database capacity; it was connection explosion caused by horizontal scaling.
Instead of scaling vertically (which increases cost), we improved the architecture by introducing a centralized connection pooling layer.
Key learnings:
- Database scaling is not always the right first move.
- Containerized workloads can unintentionally exhaust DB connections.
- Connection pooling architecture matters at scale.
- Solving architectural inefficiencies reduces long-term cost.
- Observability of DB connection metrics is critical.
Production systems do not always need bigger instances. Sometimes they need better design.
If you know of alternative approaches to managing DocumentDB connection limits, please feel free to share your thoughts.
Weβre always open to refining the architecture further and learning from other production patterns.
Reference:
https://github.com/coinbase/mongobetween