Building Secure Site-to-Site VPN Architecture for Multi-Operator Telecom Integration
Overview
This post explores how we built a scalable and secure network architecture to integrate with multiple Mobile Network Operators (MNOs) for real-time telecom recharge processing. The solution uses Site-to-Site VPN connectivity to establish private, secure communication channels between cloud infrastructure and MNO networks.
Use Case: Financial technology platforms that need to integrate with multiple telecom operators for services like mobile recharges, balance inquiries, or value-added services often face the challenge of establishing secure, reliable connectivity with each operator’s private network infrastructure.
Problem Statement
When building a telecom recharge platform that integrates with multiple MNOs, we encountered several critical challenges:
1. Private Network Requirement
MNOs do not expose their recharge APIs to the public internet for security and compliance reasons. All API endpoints are hosted on private IP addresses within their internal networks.
2. IP Whitelisting Constraints
Each MNO has strict security policies requiring:
- Requests must originate from pre-approved IP addresses or CIDR ranges
- Static IP requirements vary by operator
- No public internet exposure of their API infrastructure
3. Multiple Integration Complexity
- Each MNO has unique network configurations, firewall rules, and VPN requirements
- Different operators have different technical capabilities and security standards
- Need to support 5+ MNOs simultaneously with isolated network segments
4. Security and Compliance
- Transaction data is highly sensitive (financial and personal information)
- Need end-to-end encryption for all recharge requests
- Audit trail requirements for all network communications
5. Scalability Concerns
- Must handle variable traffic loads across different operators
- Some MNOs accept flexible IP ranges, others require single static IPs
- Solution must support adding new MNOs without architectural changes
Initial Approaches Tried
Approach: Public API Endpoints with Authentication
What we tried: We initially explored whether MNOs could expose public HTTPS endpoints secured with API keys and OAuth tokens.
Why it didn’t work:
- MNOs refused to expose internal recharge systems to public internet
- Compliance and security policies mandated private network connectivity
- Concerns about DDoS attacks and unauthorized access attempts
Why Site-to-Site VPN Architecture
After evaluating the constraints and failed approaches, we chose Site-to-Site VPN as the foundation of our MNO integration architecture. Here’s why:
1. Private Network Extension
Site-to-Site VPN creates secure tunnels that extend our cloud VPC into MNO private networks, making our infrastructure appear as if it’s part of their internal network.
2. Meets MNO Security Requirements
- Provides encrypted communication channels that satisfy compliance requirements
- Allows MNOs to maintain their private IP addressing schemes
- Enables mutual trust establishment through VPN configuration
3. Scalable Network Isolation
- Each MNO gets a dedicated subnet within our VPC
- Network traffic is isolated per operator using routing rules
- Can support additional MNOs by creating more subnets and VPN tunnels, limited only by VPC address space and AWS service quotas
4. Flexible IP Management
- Supports both static IP requirements (single EC2 instances)
- Supports CIDR range requirements (auto-scaling container workloads)
- MNO-specific routing ensures traffic originates from approved network segments
5. Operational Simplicity
- Once VPN tunnel is established, it’s transparent to applications
- Applications make standard HTTP/SOAP API calls as if MNO is on local network
- No special VPN client software needed in application code
6. Cloud-Native Integration
- Leverages cloud provider’s managed VPN services
- Integrates seamlessly with existing VPC, subnets, and security groups
- Supports cross-account resource sharing for development/testing environments
Implementation Details
Architecture Overview
The architecture consists of three main layers in front of the MNO networks:
[User Layer] → [Application Layer] → [Network Layer] → [MNO Networks]
1. Network Foundation Layer
VPC Setup:
Production VPC (10.0.0.0/16)
├── Public Subnet (Load Balancers)
├── Private Subnet (Application Workloads)
└── MNO-Specific Subnets:
    ├── Operator-A Subnet (10.0.1.0/24)
    ├── Operator-B Subnet (10.0.2.0/24)
    └── Operator-C Subnet (10.0.3.0/24)
Site-to-Site VPN Configuration:
- Each MNO requires a dedicated VPN connection
- VPN tunnels are configured based on MNO-provided specifications (network requirements, firewall rules, IP ranges)
- Routing tables ensure traffic destined for MNO APIs goes through correct VPN tunnel
- Each MNO has unique configuration requirements that must be manually configured by DevOps
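VPC route tables pick the most specific matching route, which is what keeps each MNO's traffic on its own tunnel. The following is a minimal sketch of that lookup, not our actual routing configuration: the route targets (`vpn-operator-a`, `vpn-operator-b`, `nat-gateway`) and the MNO address ranges are hypothetical.

```javascript
// Convert a dotted IPv4 string to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// True when `ip` falls inside `cidr` (for example "172.16.1.0/24").
function cidrContains(cidr, ip) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  // A shift by 32 is a no-op in JavaScript, so /0 is handled explicitly.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

// Return the target of the longest-prefix route covering the destination.
function selectRoute(routes, destinationIp) {
  let best = null;
  for (const route of routes) {
    if (!cidrContains(route.cidr, destinationIp)) continue;
    const prefixLength = Number(route.cidr.split('/')[1]);
    if (best === null || prefixLength > best.prefixLength) {
      best = { target: route.target, prefixLength };
    }
  }
  return best ? best.target : null;
}

// Hypothetical route table for one MNO-specific subnet.
const routes = [
  { cidr: '10.0.0.0/16', target: 'local' },
  { cidr: '172.16.1.0/24', target: 'vpn-operator-a' },
  { cidr: '172.20.0.0/16', target: 'vpn-operator-b' },
  { cidr: '0.0.0.0/0', target: 'nat-gateway' },
];
```

With this table, a call such as `selectRoute(routes, '172.16.1.10')` resolves to the Operator-A tunnel, while anything not covered by a specific route falls back to the default route.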
Configuration Management:
- Network configurations are stored as infrastructure parameters
- Subnet IDs are stored using naming convention: /mno/subnets/{OPERATOR_NAME}_SUBNET_ID
- This allows application layer to programmatically discover available MNO networks
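The naming convention can be captured in two small helpers so that nobody assembles parameter names by hand. This is a sketch under one assumption that is ours, not part of the convention itself: operator names are upper-case with digits and underscores, as in `OPERATOR_A`. The function names are hypothetical.

```javascript
const SUBNET_PARAMETER_PREFIX = '/mno/subnets/';

// Build the parameter name for one operator, e.g. OPERATOR_A.
function subnetParameterName(operatorName) {
  // Assumption: operator names look like OPERATOR_A (upper case, digits, underscores).
  if (!/^[A-Z][A-Z0-9_]*$/.test(operatorName)) {
    throw new Error(`Invalid operator name: ${operatorName}`);
  }
  return `${SUBNET_PARAMETER_PREFIX}${operatorName}_SUBNET_ID`;
}

// Recover the operator name from a parameter name, or null if it does not follow the convention.
function operatorFromParameterName(parameterName) {
  const match = /^\/mno\/subnets\/([A-Z][A-Z0-9_]*)_SUBNET_ID$/.exec(parameterName);
  return match ? match[1] : null;
}

// Given all parameter names found under the prefix, list the operators that have a subnet.
function discoverOperators(parameterNames) {
  return parameterNames
    .map(operatorFromParameterName)
    .filter((name) => name !== null);
}
```

In a real deployment the list passed to `discoverOperators` would come from the parameter store; that lookup is left out here to keep the example self-contained.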
2. Hybrid Deployment Strategy
The architecture supports two deployment patterns based on MNO requirements:
Pattern A: Container-Based (ECS Fargate)
- When to use: MNO accepts CIDR range (e.g., 10.0.1.0/24)
- Benefits: Auto-scaling, cost-effective, serverless
- Implementation:
- Containerized applications deployed as Fargate tasks
- Tasks run in MNO-specific subnet
- Application Load Balancer with path-based routing (/operator-a, /operator-b)
- IP addresses are dynamic but always within whitelisted CIDR range
Pattern B: Instance-Based (EC2)
- When to use: MNO requires single static IP whitelisting
- Benefits: Meets strict IP requirements
- Implementation:
- Dedicated EC2 instance in MNO-specific subnet
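- Elastic IP assigned and shared with MNO for whitelisting (traffic that crosses the VPN tunnel keeps the instance's private IP as its source unless NAT is applied, so that private address must stay fixed too)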
- Manual deployment and management required
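The choice between the two patterns comes down to what the MNO is willing to whitelist. A minimal sketch of that decision, assuming the MNO's answer arrives as a string that is either a CIDR range or a single address; the function name is hypothetical and the return values mirror the `'ECS'` / `'EC2'` labels used in our configuration.

```javascript
// Decide the deployment pattern from what the MNO agrees to whitelist.
// "10.0.1.0/24" -> a range, so auto-scaling tasks are fine (Pattern A).
// "10.0.4.15" or "10.0.4.15/32" -> a single address, so a fixed instance is needed (Pattern B).
function chooseDeploymentResource(whitelistEntry) {
  const parts = whitelistEntry.trim().split('/');
  const prefixLength = parts.length === 1 ? 32 : Number(parts[1]);
  if (parts.length > 2 || !Number.isInteger(prefixLength) || prefixLength < 0 || prefixLength > 32) {
    throw new Error(`Not a valid IPv4 address or CIDR range: ${whitelistEntry}`);
  }
  return prefixLength === 32 ? 'EC2' : 'ECS';
}
```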
3. Application Architecture
Configuration-Driven Design:
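// Example MNO configuration (all values are illustrative)
const operatorAConfig = {
  name: 'OPERATOR_A',
  subnet: 'OPERATOR_A_SUBNET_ID', // parameter name suffix under /mno/subnets/
  api: {
    ip: '172.16.1.10', // MNO private IP, reachable only through the VPN tunnel
    port: 8080,
    type: 'SOAP' // or 'REST'
  },
  phoneNumberPattern: /^\+9370/, // anchored so the prefix must start the number
  deployedResource: 'ECS', // or 'EC2'
  rechargeHandler: 'OperatorARechargeHandler'
};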
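To show how such a configuration might drive routing, here is a small sketch that picks the operator for a phone number. The second operator, its prefix, and the function name are hypothetical; the patterns are anchored with `^` so a prefix only matches at the start of the number.

```javascript
// Two illustrative operator configs; OPERATOR_B and its prefix are made up for the example.
const mnoConfigs = [
  { name: 'OPERATOR_A', phoneNumberPattern: /^\+9370/, deployedResource: 'ECS' },
  { name: 'OPERATOR_B', phoneNumberPattern: /^\+9379/, deployedResource: 'EC2' },
];

// Return the first config whose pattern matches the number, or null when no operator claims it.
function findOperatorForNumber(configs, phoneNumber) {
  // Strip spaces and dashes so "+93 70 123 4567" and "+93701234567" behave the same.
  const normalized = phoneNumber.replace(/[\s-]/g, '');
  return configs.find((config) => config.phoneNumberPattern.test(normalized)) || null;
}
```

Returning `null` for an unknown prefix lets the caller reject the recharge early instead of sending it to the wrong operator.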
Request Flow:
1. User initiates recharge request
↓
2. API Gateway authenticates and routes request
↓
3. Load Balancer routes to MNO-specific path
↓
4. Application (ECS/EC2) running in MNO subnet receives request
↓
5. Application makes API call to MNO private IP
↓
6. Request travels through Site-to-Site VPN tunnel
↓
7. MNO receives request from whitelisted IP/subnet
↓
8. MNO processes recharge and returns response
↓
9. Response flows back through same path
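Step 5 of the flow is an ordinary HTTP call; nothing in the code knows about the VPN. A rough sketch using Node's built-in `http` module, assuming a config shaped like the example above. The `api.path` field, the 10-second timeout, and both function names are assumptions made for illustration, not part of our published configuration.

```javascript
const http = require('http');

// Translate an MNO config into options for http.request.
function buildMnoRequestOptions(config, bodyLength) {
  const isSoap = config.api.type === 'SOAP';
  return {
    host: config.api.ip, // private IP, routed through the VPN by the subnet's route table
    port: config.api.port,
    method: 'POST',
    path: config.api.path || '/', // hypothetical field
    headers: {
      'Content-Type': isSoap ? 'text/xml; charset=utf-8' : 'application/json',
      'Content-Length': bodyLength,
    },
    timeout: 10000, // assumed value; tune per MNO
  };
}

// Send a prepared request body to the MNO and resolve with status and response text.
function sendToMno(config, body) {
  return new Promise((resolve, reject) => {
    const options = buildMnoRequestOptions(config, Buffer.byteLength(body));
    const req = http.request(options, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        resolve({ status: res.statusCode, body: Buffer.concat(chunks).toString('utf8') });
      });
    });
    // The timeout option only emits an event; the request must be destroyed explicitly.
    req.on('timeout', () => req.destroy(new Error('MNO request timed out')));
    req.on('error', reject);
    req.end(body);
  });
}
```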
4. Security Implementation
Network Security:
- Security groups restrict traffic between subnets
- Each MNO subnet has isolated egress rules
- Network ACLs provide additional layer of defense
Credential Management:
- MNO API credentials (username, password, account IDs) stored in cloud secrets manager
- Applications retrieve credentials at runtime using IAM role-based access
- Credentials are never hardcoded or stored in application code
- Each deployment environment (dev, staging, prod) has separate credentials
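One way to keep environments apart is to derive the secret identifier from the environment name and to validate the secret's shape as soon as it is read. The sketch below covers only those two steps; the actual fetch from the secrets manager is omitted. The identifier layout (`{env}/mno/{operator}/credentials`), the JSON field names, and the function names are all assumptions made for this example.

```javascript
const ENVIRONMENTS = ['dev', 'staging', 'prod'];

// Build a per-environment secret identifier (hypothetical layout).
function secretIdFor(environment, operatorName) {
  if (!ENVIRONMENTS.includes(environment)) {
    throw new Error(`Unknown environment: ${environment}`);
  }
  return `${environment}/mno/${operatorName}/credentials`;
}

// Parse the secret string returned by the secrets manager and check the required fields.
function parseMnoCredentials(secretString) {
  const parsed = JSON.parse(secretString);
  for (const field of ['username', 'password', 'accountId']) {
    if (typeof parsed[field] !== 'string' || parsed[field].length === 0) {
      // The field name is safe to log; the value never is.
      throw new Error(`Secret is missing required field: ${field}`);
    }
  }
  return { username: parsed.username, password: parsed.password, accountId: parsed.accountId };
}
```

Failing fast on a malformed secret surfaces a misconfigured environment at startup rather than on the first recharge request.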
Access Control:
- Least privilege IAM policies for all resources
- VPN endpoints secured with authentication
- Application-level authentication using MNO-provided credentials
5. Cross-Account Architecture
To support multiple environments (development, staging, production):
VPC Sharing Strategy:
- Production VPC subnets are shared with non-production AWS accounts
- Developers can test integrations from correct network segments
- Maintains network isolation while enabling full development workflow
- Cost optimization by sharing VPN infrastructure
6. Technology Stack
Infrastructure:
- Cloud Provider: AWS
- VPN: Site-to-Site VPN connections (manually configured per MNO)
- Compute: ECS Fargate for auto-scaling workloads, EC2 for static IP requirements
- Load Balancing: Application Load Balancer with path-based routing
- Configuration: Systems Manager Parameter Store
Application:
- Runtime: Node.js with Express framework
- Container: Docker images deployed to ECS
- Protocol Support: SOAP and REST API integrations
- Secrets: AWS Secrets Manager for credential storage
Outcome & Benefits
1. Security Compliance
- All MNO security requirements met
- End-to-end encrypted communication channels
- No exposure of sensitive APIs to public internet
- Audit trail for all network communications
2. Scalability Achieved
- Successfully integrated 5+ MNOs with isolated network segments
- Auto-scaling for container-based deployments handles traffic spikes
- Can add new MNOs without architectural changes
- Cross-account sharing enables parallel development
3. Cost Optimization
- Container-based deployments: pay only for active processing time
- Shared VPN infrastructure across environments
- Eliminated need for multiple VPN client licenses
- Reduced operational overhead through automation
4. Operational Excellence
- Configuration-driven approach simplifies MNO onboarding
- Standardized deployment patterns across all operators
- Clear separation between manual network setup and automated application deployment
- Automated deployment for container-based integrations
5. Reliability
- Isolated failure domains per MNO
- Load balancer health checks and automatic failover
- Multiple availability zones for high availability
- VPN tunnels provide stable, consistent connectivity
Lessons Learned & Pitfalls
1. VPN Configuration is Manual Work
Learning: Each MNO has unique network requirements that cannot be fully automated.
- VPN tunnel configuration requires manual DevOps involvement
- Network firewall rules vary significantly across operators
- Testing connectivity takes time and coordination with MNO network teams
Recommendation: Budget sufficient time for network setup phase (2-4 weeks per MNO)
2. Standardize Configuration Naming
Learning: Inconsistent naming between DevOps infrastructure and application configuration causes deployment failures.
Solution: Established strict naming convention for subnet parameters:
- DevOps creates: /mno/subnets/OPERATOR_NAME_SUBNET_ID
- Application reads this exact parameter name
- Prevents spelling mistakes and ensures consistency
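A cheap guard against naming drift is a pre-deployment check that every subnet referenced by the application config actually exists as a parameter. A minimal sketch, assuming each config carries a `subnet` field such as `OPERATOR_A_SUBNET_ID` and that the list of existing parameter names has already been fetched; the function name is hypothetical.

```javascript
// Return the parameter names the application expects but DevOps has not created.
function findMissingSubnetParameters(configs, existingParameterNames) {
  const existing = new Set(existingParameterNames);
  return configs
    .map((config) => `/mno/subnets/${config.subnet}`)
    .filter((parameterName) => !existing.has(parameterName));
}

// Example: the second config contains a typo (OPERATR_B), so its parameter is reported as missing.
const missing = findMissingSubnetParameters(
  [{ subnet: 'OPERATOR_A_SUBNET_ID' }, { subnet: 'OPERATR_B_SUBNET_ID' }],
  ['/mno/subnets/OPERATOR_A_SUBNET_ID', '/mno/subnets/OPERATOR_B_SUBNET_ID']
);
```

Running a check like this in the deployment pipeline turns a spelling mistake into a failed build instead of a failed recharge.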
3. Handle Both Static and Dynamic IP Requirements
Learning: Not all MNOs have the same IP whitelisting capabilities.
Solution: Built hybrid architecture supporting both:
- ECS Fargate for MNOs accepting CIDR ranges (majority of cases)
- EC2 instances for MNOs requiring single static IPs (rare but necessary)
4. Cross-Account Testing is Critical
Learning: Production VPN connectivity must be testable from non-production environments.
Solution: Share production VPC subnets with dev/staging accounts using resource sharing. This allows developers to test actual network connectivity without deploying to production.
5. Configuration-Driven Design is Key
Learning: Hardcoding MNO-specific logic leads to unmaintainable code.
Solution: Externalize all MNO-specific configurations:
- API endpoints, ports, protocols
- Phone number patterns for routing
- Deployment resource type (ECS vs EC2)
- Custom request/response handlers
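The custom handlers can be wired in through a registry keyed by the `rechargeHandler` string from the configuration, so adding an operator means adding one class and one config entry. This is only a sketch: the handler's `buildRequest` method and its field names are hypothetical, not an MNO's real request format.

```javascript
// Illustrative handler; the real request shape depends on each MNO's API.
class OperatorARechargeHandler {
  constructor(config) {
    this.config = config;
  }

  buildRequest({ phoneNumber, amount }) {
    return { msisdn: phoneNumber, amount };
  }
}

// Registry mapping the name used in configuration to the implementing class.
const handlerRegistry = new Map([
  ['OperatorARechargeHandler', OperatorARechargeHandler],
]);

// Instantiate the handler named in the config, failing loudly on unknown names.
function createHandler(config) {
  const HandlerClass = handlerRegistry.get(config.rechargeHandler);
  if (!HandlerClass) {
    throw new Error(`No handler registered for: ${config.rechargeHandler}`);
  }
  return new HandlerClass(config);
}
```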
6. Secrets Management from Day One
Learning: Never store credentials in code or configuration files.
Solution: Use managed secrets service from the beginning:
- Store credentials in secrets manager
- Use IAM roles for access control
- Rotate credentials regularly
- Separate secrets per environment
7. Network Troubleshooting is Complex
Pitfall: When API calls fail, determining whether the cause is the network, the VPN tunnel, a firewall, or the application is challenging.
Recommendation:
- Implement comprehensive logging at each layer
- Set up monitoring for VPN tunnel status
- Document troubleshooting runbooks for each MNO
- Maintain contact information for MNO network teams
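A first triage step can be automated by looking at how a call failed. The sketch below maps Node.js socket error codes and HTTP status codes to a likely layer. It is a heuristic, not a diagnosis: the category names are made up for this example, and a timeout in particular can come from routing, a down tunnel, or a firewall silently dropping packets.

```javascript
// Error codes that usually mean packets never reached the MNO host.
const PATH_ERROR_CODES = new Set(['ETIMEDOUT', 'EHOSTUNREACH', 'ENETUNREACH']);

// Map a failure ({ code } from a socket error or { statusCode } from a response) to a likely layer.
function classifyFailure(failure) {
  if (failure.code && PATH_ERROR_CODES.has(failure.code)) {
    return 'network-path'; // check routes, VPN tunnel status, firewalls that drop silently
  }
  if (failure.code === 'ECONNREFUSED' || failure.code === 'ECONNRESET') {
    return 'remote-port'; // host answered, but the port is closed or the connection was rejected
  }
  if (typeof failure.statusCode === 'number' && failure.statusCode >= 500) {
    return 'mno-application';
  }
  if (typeof failure.statusCode === 'number' && failure.statusCode >= 400) {
    return 'request-or-credentials';
  }
  return 'unknown';
}
```

Logging this category next to each failed request gives the runbook a starting point before anyone contacts the MNO network team.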
Future Improvements
1. VPN Automation
Explore Infrastructure-as-Code approaches for VPN configuration where possible:
- Automated VPN tunnel provisioning for MNOs with standardized requirements
- Configuration templates for common MNO network patterns
- Reduce manual setup time from weeks to days
2. Advanced Load Balancing
Implement intelligent traffic routing:
- Health-based routing to automatically bypass degraded MNO connections
- Geographic routing for MNOs with multiple regional endpoints
- Weighted routing for A/B testing new MNO integrations
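Health-based and weighted routing combine naturally: drop unhealthy endpoints first, then pick among the rest in proportion to their weights. A minimal sketch of that idea, with hypothetical endpoint objects (`id`, `weight`, `healthy`); the random source is injectable so the behaviour can be tested deterministically.

```javascript
// Pick one endpoint: unhealthy ones are skipped, the rest are chosen in proportion to weight.
function pickEndpoint(endpoints, random = Math.random) {
  const healthy = endpoints.filter((endpoint) => endpoint.healthy && endpoint.weight > 0);
  const totalWeight = healthy.reduce((sum, endpoint) => sum + endpoint.weight, 0);
  if (totalWeight === 0) return null; // nothing usable

  let threshold = random() * totalWeight;
  for (const endpoint of healthy) {
    threshold -= endpoint.weight;
    if (threshold < 0) return endpoint;
  }
  return healthy[healthy.length - 1]; // guard against floating-point edge cases
}

// Hypothetical setup: 90% of traffic to the current integration, 10% to a new one under test.
const endpoints = [
  { id: 'primary', weight: 90, healthy: true },
  { id: 'canary', weight: 10, healthy: true },
  { id: 'degraded', weight: 100, healthy: false },
];
```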
3. Enhanced Monitoring
Build comprehensive observability:
- Real-time VPN tunnel health dashboards
- Per-MNO latency and success rate metrics
- Automated alerting for connectivity issues
- Network flow analysis for troubleshooting
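Per-MNO latency and success rate can be derived from the request logs the application already produces. A small sketch, assuming each sample records the operator name, the round-trip latency in milliseconds, and whether the call succeeded; the sample shape and function name are hypothetical.

```javascript
// Aggregate request samples into per-operator request count, success rate, and average latency.
function summarizeByOperator(samples) {
  const totals = new Map();
  for (const { operator, latencyMs, success } of samples) {
    const entry = totals.get(operator) || { count: 0, successes: 0, totalLatencyMs: 0 };
    entry.count += 1;
    entry.successes += success ? 1 : 0;
    entry.totalLatencyMs += latencyMs;
    totals.set(operator, entry);
  }

  const summary = {};
  for (const [operator, entry] of totals) {
    summary[operator] = {
      requests: entry.count,
      successRate: entry.successes / entry.count,
      averageLatencyMs: entry.totalLatencyMs / entry.count,
    };
  }
  return summary;
}
```

A sudden drop in one operator's success rate while the others stay flat points at that operator's tunnel or API rather than at the shared application code.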
4. Transit Gateway Architecture
For organizations with 10+ MNO integrations, consider:
- AWS Transit Gateway for centralized VPN management
- Hub-and-spoke network topology
- Simplified routing across multiple VPCs and VPN connections
- Better cost optimization at scale
5. Disaster Recovery
Implement resilience improvements:
- Secondary VPN tunnels for critical MNO connections
- Automated failover mechanisms
- Connection quality monitoring and automatic rerouting
- Regular DR testing procedures
6. Developer Experience
Improve development workflow:
- Local development environment with MNO API mocks
- Automated integration testing framework
- Self-service MNO onboarding portal for admins
- Better configuration validation tools
Conclusion
Building a Site-to-Site VPN architecture for multi-MNO integration requires a careful balance between security, scalability, and operational complexity. By combining secure network connectivity with a hybrid deployment strategy, we created a system that:
- Meets stringent MNO security requirements
- Scales automatically based on traffic patterns
- Supports diverse technical requirements across operators
- Maintains operational efficiency through automation where possible
The key to success is embracing the hybrid nature of the problem: standardize and automate where possible (application deployment), but recognize when manual configuration is necessary (network setup). This pragmatic approach allows teams to move quickly while maintaining security and reliability standards.
Whether you’re building a fintech platform, IoT integration, or any system requiring secure connectivity to multiple private networks, Site-to-Site VPN architecture provides a proven pattern for success.
Tags: #VPN #NetworkArchitecture #CloudComputing #TelecomIntegration #SecurityArchitecture #AWS #Microservices #FinTech
