Building Secure Site-to-Site VPN Architecture for Multi-Operator Telecom Integration

Overview

This post explores how we built a scalable and secure network architecture to integrate with multiple Mobile Network Operators (MNOs) for real-time telecom recharge processing. The solution uses Site-to-Site VPN connectivity to establish private, secure communication channels between cloud infrastructure and MNO networks.

Use Case: Financial technology platforms that need to integrate with multiple telecom operators for services like mobile recharges, balance inquiries, or value-added services often face the challenge of establishing secure, reliable connectivity with each operator’s private network infrastructure.

Problem Statement

When building a telecom recharge platform that integrates with multiple MNOs, we encountered several critical challenges:

1. Private Network Requirement

MNOs do not expose their recharge APIs to the public internet for security and compliance reasons. All API endpoints are hosted on private IP addresses within their internal networks.

2. IP Whitelisting Constraints

Each MNO enforces strict security policies:

  • Requests must originate from pre-approved IP addresses or CIDR ranges
  • Some operators whitelist a CIDR range, while others accept only a single static IP
  • Their API infrastructure is never exposed to the public internet
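
To make the whitelisting rule concrete, here is a minimal sketch of a check that a source address falls inside an approved CIDR range. It is an illustration only, not the production implementation; the function names ipv4ToInt and isInCidr are hypothetical, and the addresses follow the subnet layout described later in this post.

```javascript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const octets = ip.split('.');
  if (octets.length !== 4 || !octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  // Multiply instead of bit-shifting so the result never goes negative.
  return octets.reduce((acc, o) => acc * 256 + Number(o), 0);
}

// Return true when `ip` lies inside `cidr` (for example '10.0.1.0/24').
function isInCidr(ip, cidr) {
  const [base, prefixText] = cidr.split('/');
  if (!/^\d{1,2}$/.test(prefixText || '') || Number(prefixText) > 32) {
    throw new Error(`Invalid CIDR range: ${cidr}`);
  }
  // Two addresses share a block when they agree after dropping the host bits.
  const blockSize = 2 ** (32 - Number(prefixText));
  return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(ipv4ToInt(base) / blockSize);
}

console.log(isInCidr('10.0.1.57', '10.0.1.0/24')); // true: inside the approved range
console.log(isInCidr('10.0.2.57', '10.0.1.0/24')); // false: different subnet
```

A single static IP is just the /32 case of the same check, which is why both whitelisting styles can be treated uniformly.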

3. Multiple Integration Complexity

  • Each MNO has unique network configurations, firewall rules, and VPN requirements
  • Different operators have different technical capabilities and security standards
  • Need to support 5+ MNOs simultaneously with isolated network segments

4. Security and Compliance

  • Transaction data is highly sensitive (financial and personal information)
  • Need end-to-end encryption for all recharge requests
  • Audit trail requirements for all network communications

5. Scalability Concerns

  • Must handle variable traffic loads across different operators
  • Some MNOs accept flexible IP ranges, others require single static IPs
  • Solution must support adding new MNOs without architectural changes

Initial Approaches Tried

Approach: Public API Endpoints with Authentication

What we tried: We initially explored whether MNOs could expose public HTTPS endpoints secured with API keys and OAuth tokens.

Why it didn’t work:

  • MNOs refused to expose internal recharge systems to the public internet
  • Compliance and security policies mandated private network connectivity
  • MNOs were concerned about DDoS attacks and unauthorized access attempts

Why Site-to-Site VPN Architecture

After evaluating these constraints and ruling out public endpoints, we chose Site-to-Site VPN as the foundation of our MNO integration architecture. Here’s why:

1. Private Network Extension

Site-to-Site VPN creates encrypted IPsec tunnels between our cloud VPC and each MNO’s private network, so our workloads can reach MNO endpoints on private IP addresses as if they were part of the operator’s internal network.

2. Meets MNO Security Requirements

  • Provides encrypted communication channels that satisfy compliance requirements
  • Allows MNOs to maintain their private IP addressing schemes
  • Enables mutual trust establishment through VPN configuration

3. Scalable Network Isolation

  • Each MNO gets a dedicated subnet within our VPC
  • Network traffic is isolated per operator using routing rules
  • Can support additional MNOs by creating more subnets and VPN connections, limited only by the VPC address space and the cloud provider’s VPN quotas

4. Flexible IP Management

  • Supports single static IP requirements (dedicated EC2 instances)
  • Supports CIDR range requirements (auto-scaling container workloads)
  • MNO-specific routing ensures traffic originates from approved network segments

5. Operational Simplicity

  • Once a VPN tunnel is established, it is transparent to applications
  • Applications make standard HTTP/SOAP API calls as if the MNO were on the local network
  • No special VPN client software is needed in application code

6. Cloud-Native Integration

  • Leverages cloud provider’s managed VPN services
  • Integrates seamlessly with existing VPC, subnets, and security groups
  • Supports cross-account resource sharing for development/testing environments

Implementation Details

Architecture Overview

The architecture consists of three main layers:

[User Layer] → [Application Layer] → [Network Layer] → [MNO Networks]

1. Network Foundation Layer

VPC Setup:

Production VPC (10.0.0.0/16)
├── Public Subnet (Load Balancers)
├── Private Subnet (Application Workloads)
└── MNO-Specific Subnets:
    ├── Operator-A Subnet (10.0.1.0/24)
    ├── Operator-B Subnet (10.0.2.0/24)
    └── Operator-C Subnet (10.0.3.0/24)
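
The per-operator address plan above can be derived mechanically. The following sketch carves the n-th /24 out of the VPC range; nthSubnet and its helpers are hypothetical names, and it assumes the VPC base address is aligned to its prefix, as 10.0.0.0/16 is.

```javascript
// Convert between dotted-quad strings and unsigned 32-bit integers.
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function intToIpv4(n) {
  return [
    Math.floor(n / 2 ** 24) % 256,
    Math.floor(n / 2 ** 16) % 256,
    Math.floor(n / 256) % 256,
    n % 256,
  ].join('.');
}

// Return the index-th subnet of size /newPrefix inside vpcCidr.
function nthSubnet(vpcCidr, newPrefix, index) {
  const [base, prefixText] = vpcCidr.split('/');
  const vpcPrefix = Number(prefixText);
  if (newPrefix < vpcPrefix || newPrefix > 32) {
    throw new RangeError(`/${newPrefix} does not fit inside ${vpcCidr}`);
  }
  const subnetCount = 2 ** (newPrefix - vpcPrefix); // 256 for /24 inside /16
  if (!Number.isInteger(index) || index < 0 || index >= subnetCount) {
    throw new RangeError(`Subnet index ${index} is outside 0..${subnetCount - 1}`);
  }
  const subnetSize = 2 ** (32 - newPrefix);
  return `${intToIpv4(ipv4ToInt(base) + index * subnetSize)}/${newPrefix}`;
}

console.log(nthSubnet('10.0.0.0/16', 24, 1)); // 10.0.1.0/24 (Operator-A)
console.log(nthSubnet('10.0.0.0/16', 24, 3)); // 10.0.3.0/24 (Operator-C)
```

The range check also shows the ceiling of this layout: a /16 VPC holds at most 256 subnets of size /24, including the public and private ones.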

Site-to-Site VPN Configuration:

  • Each MNO requires a dedicated VPN connection
  • VPN tunnels are configured from MNO-provided specifications (network requirements, firewall rules, IP ranges)
  • Route tables send traffic destined for each MNO’s API through the correct VPN tunnel
  • Because every MNO’s requirements differ, the DevOps team configures each connection manually
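
The routing rule can be pictured as a longest-prefix match: the most specific route that contains the destination wins. The sketch below simulates that lookup. The MNO-side ranges (172.16.1.0/24, 172.16.2.0/24) and the target labels are assumptions for illustration, not real AWS identifiers.

```javascript
// Hypothetical route table: destination CIDR -> where the traffic is sent.
const exampleRoutes = [
  { destination: '10.0.0.0/16', target: 'local' },
  { destination: '172.16.1.0/24', target: 'vpn-operator-a' },
  { destination: '172.16.2.0/24', target: 'vpn-operator-b' },
];

function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True when `ip` and the CIDR base agree after dropping the host bits.
function cidrContains(cidr, ip) {
  const [base, prefixText] = cidr.split('/');
  const blockSize = 2 ** (32 - Number(prefixText));
  return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(ipv4ToInt(base) / blockSize);
}

// Pick the matching route with the longest prefix; null means "no route".
function resolveTarget(routeTable, destinationIp) {
  let best = null;
  for (const route of routeTable) {
    const prefix = Number(route.destination.split('/')[1]);
    if (cidrContains(route.destination, destinationIp) && (best === null || prefix > best.prefix)) {
      best = { prefix, target: route.target };
    }
  }
  return best ? best.target : null;
}

console.log(resolveTarget(exampleRoutes, '172.16.1.10')); // vpn-operator-a
console.log(resolveTarget(exampleRoutes, '8.8.8.8'));     // null
```

A null result is worth alerting on: it means a request for an MNO address would have nowhere to go, which usually points to a missing route rather than an application bug.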

Configuration Management:

  • Network configurations are stored as infrastructure parameters
  • Subnet IDs are stored using naming convention: /mno/subnets/{OPERATOR_NAME}_SUBNET_ID
  • This allows application layer to programmatically discover available MNO networks
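
As a sketch of how the naming convention enables discovery, the helpers below build and parse parameter names. The function names are hypothetical, and the restriction to upper-case operator names is an assumption based on the OPERATOR_A style used in this post. In a real deployment the list of names would come from Parameter Store (for example via its GetParametersByPath operation) rather than a hard-coded array.

```javascript
const SUBNET_PARAMETER_PREFIX = '/mno/subnets/';
const SUBNET_PARAMETER_PATTERN = /^\/mno\/subnets\/([A-Z][A-Z0-9_]*)_SUBNET_ID$/;

// Build the parameter name for one operator, e.g. OPERATOR_A.
function subnetParameterName(operatorName) {
  if (!/^[A-Z][A-Z0-9_]*$/.test(operatorName)) {
    throw new Error(`Unexpected operator name: ${operatorName}`);
  }
  return `${SUBNET_PARAMETER_PREFIX}${operatorName}_SUBNET_ID`;
}

// Recover the operator name from a parameter name, or null if it does not match.
function operatorFromParameterName(parameterName) {
  const match = SUBNET_PARAMETER_PATTERN.exec(parameterName);
  return match ? match[1] : null;
}

// Given all parameter names, list the operators that have a subnet configured.
function discoverOperators(parameterNames) {
  return parameterNames.map(operatorFromParameterName).filter((name) => name !== null);
}

console.log(subnetParameterName('OPERATOR_A'));
// /mno/subnets/OPERATOR_A_SUBNET_ID
console.log(discoverOperators(['/mno/subnets/OPERATOR_B_SUBNET_ID', '/app/log-level']));
// [ 'OPERATOR_B' ]
```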

2. Hybrid Deployment Strategy

The architecture supports two deployment patterns based on MNO requirements:

Pattern A: Container-Based (ECS Fargate)

  • When to use: MNO accepts CIDR range (e.g., 10.0.1.0/24)
  • Benefits: Auto-scaling, cost-effective, serverless
  • Implementation:
    • Containerized applications deployed as Fargate tasks
    • Tasks run in MNO-specific subnet
    • Application Load Balancer with path-based routing (/operator-a, /operator-b)
    • IP addresses are dynamic but always within whitelisted CIDR range

Pattern B: Instance-Based (EC2)

  • When to use: MNO requires single static IP whitelisting
  • Benefits: Meets strict IP requirements
  • Implementation:
    • Dedicated EC2 instance in MNO-specific subnet
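    • Instance keeps a fixed address that is shared with the MNO for whitelisting (an Elastic IP in our setup; note that traffic routed through the VPN tunnel is sourced from the instance’s private IP, so that address must stay fixed as well)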
    • Manual deployment and management required
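
The choice between the two patterns reduces to what the MNO is willing to whitelist. A minimal sketch of that decision, assuming the whitelist entry is given either as a CIDR range or as a bare IPv4 address (chooseDeployment is a hypothetical helper, and the return values mirror the 'ECS' and 'EC2' labels used in the configuration later in this post):

```javascript
// Decide the deployment pattern from what the MNO agreed to whitelist.
// A range such as '10.0.1.0/24' allows dynamic task IPs (Pattern A);
// a single address ('10.0.1.25' or '10.0.1.25/32') needs a fixed host (Pattern B).
function chooseDeployment(whitelistEntry) {
  const [, prefixText] = whitelistEntry.split('/');
  const prefix = prefixText === undefined ? 32 : Number(prefixText);
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid whitelist entry: ${whitelistEntry}`);
  }
  return prefix === 32 ? 'EC2' : 'ECS';
}

console.log(chooseDeployment('10.0.1.0/24')); // ECS
console.log(chooseDeployment('10.0.1.25'));   // EC2
```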

3. Application Architecture

Configuration-Driven Design:

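// Example MNO configuration (illustrative values)
const operatorAConfig = {
  name: 'OPERATOR_A',
  subnet: 'OPERATOR_A_SUBNET_ID',   // parameter key under /mno/subnets/
  api: {
    ip: '172.16.1.10',              // MNO private IP, reachable only through the VPN tunnel
    port: 8080,
    type: 'SOAP'                    // or 'REST'
  },
  phoneNumberPattern: /^\+9370/,    // anchored so the prefix must start the number
  deployedResource: 'ECS',          // or 'EC2'
  rechargeHandler: 'OperatorARechargeHandler'
};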
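
A configuration like this lets the platform pick the operator from the phone number alone. The sketch below shows one way to do that lookup; findOperator is a hypothetical helper, the OPERATOR_B prefix is invented for the example, and the patterns are anchored with ^ so a prefix only matches at the start of the number.

```javascript
// Trimmed-down configs: only the fields needed for routing by phone number.
const mnoConfigs = [
  { name: 'OPERATOR_A', phoneNumberPattern: /^\+9370/ },
  { name: 'OPERATOR_B', phoneNumberPattern: /^\+9378/ }, // hypothetical prefix
];

// Return the first config whose pattern matches, or null if no operator claims the number.
function findOperator(configs, phoneNumber) {
  const normalized = phoneNumber.replace(/[\s-]/g, ''); // drop spaces and dashes
  return configs.find((config) => config.phoneNumberPattern.test(normalized)) || null;
}

console.log(findOperator(mnoConfigs, '+93 70 123 4567').name); // OPERATOR_A
console.log(findOperator(mnoConfigs, '+15551234567'));         // null
```

Returning null instead of throwing leaves it to the caller to decide whether an unknown prefix is a validation error or a signal that a new operator needs onboarding.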

Request Flow:

1. User initiates recharge request
   ↓
2. API Gateway authenticates and routes request
   ↓
3. Load Balancer routes to MNO-specific path
   ↓
4. Application (ECS/EC2) running in MNO subnet receives request
   ↓
5. Application makes API call to MNO private IP
   ↓
6. Request travels through Site-to-Site VPN tunnel
   ↓
7. MNO receives request from whitelisted IP/subnet
   ↓
8. MNO processes recharge and returns response
   ↓
9. Response flows back through same path
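
Steps 3 to 5 of this flow can be sketched as a lookup from the load balancer path to the MNO endpoint. This is a simplified illustration: the registry contents, the resolveMnoTarget name, and the use of plain HTTP with a root path inside the tunnel are assumptions, and the real handlers would go on to build the SOAP or REST payload.

```javascript
// Hypothetical registry keyed by the first path segment used by the load balancer.
const operatorsByPath = new Map([
  ['operator-a', { name: 'OPERATOR_A', api: { ip: '172.16.1.10', port: 8080, type: 'SOAP' } }],
]);

// Map an incoming path such as '/operator-a/recharge' to the MNO endpoint to call.
function resolveMnoTarget(requestPath) {
  const [, segment] = requestPath.split('/'); // ['', 'operator-a', 'recharge']
  const operator = operatorsByPath.get(segment);
  if (!operator) {
    return null; // no MNO is registered for this path
  }
  const contentType = operator.api.type === 'SOAP'
    ? 'text/xml; charset=utf-8' // SOAP 1.1 payloads are XML
    : 'application/json';
  return {
    operator: operator.name,
    // Private address: only routable from the MNO subnet through the VPN tunnel.
    url: `http://${operator.api.ip}:${operator.api.port}/`,
    contentType,
  };
}

console.log(resolveMnoTarget('/operator-a/recharge'));
console.log(resolveMnoTarget('/unknown/recharge')); // null
```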

4. Security Implementation

Network Security:

  • Security groups restrict traffic between subnets
  • Each MNO subnet has isolated egress rules
  • Network ACLs provide additional layer of defense
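
One way to express an isolated egress rule is to allow outbound traffic only to the MNO API address and port. The sketch below generates such a rule from the MNO configuration. The object shape follows the IpPermission structure used by the EC2 security group APIs; buildEgressRule is a hypothetical helper and the values are the illustrative ones from this post.

```javascript
// Build one egress permission that only allows TCP to the MNO API endpoint.
function buildEgressRule(mnoConfig) {
  return {
    IpProtocol: 'tcp',
    FromPort: mnoConfig.api.port,
    ToPort: mnoConfig.api.port,
    IpRanges: [
      { CidrIp: `${mnoConfig.api.ip}/32`, Description: `${mnoConfig.name} recharge API` },
    ],
  };
}

const operatorARule = buildEgressRule({
  name: 'OPERATOR_A',
  api: { ip: '172.16.1.10', port: 8080 },
});
console.log(JSON.stringify(operatorARule, null, 2));
```

Generating the rule from the same configuration the application reads keeps the firewall and the code from drifting apart when an MNO changes its endpoint.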

Credential Management:

  • MNO API credentials (username, password, account IDs) stored in cloud secrets manager
  • Applications retrieve credentials at runtime using IAM role-based access
  • Credentials are never hardcoded or stored in application code
  • Each deployment environment (dev, staging, prod) has separate credentials
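
A minimal sketch of runtime credential retrieval with a short-lived cache follows. The secret naming scheme, the five-minute TTL, and the function names are assumptions for illustration. The loadSecret function is injected so the example runs without the AWS SDK; in a real deployment it would wrap a Secrets Manager GetSecretValue call made under the task or instance IAM role.

```javascript
const ENVIRONMENTS = ['dev', 'staging', 'prod'];

// Hypothetical naming scheme: one secret per environment and operator.
function secretIdFor(environment, operatorName) {
  if (!ENVIRONMENTS.includes(environment)) {
    throw new Error(`Unknown environment: ${environment}`);
  }
  return `${environment}/mno/${operatorName}/credentials`;
}

// `loadSecret(id)` must resolve to the secret's JSON string.
function createCredentialCache(loadSecret, ttlMs = 5 * 60 * 1000) {
  const cache = new Map();
  return async function getCredentials(environment, operatorName) {
    const id = secretIdFor(environment, operatorName);
    const cached = cache.get(id);
    if (cached && cached.expiresAt > Date.now()) {
      return cached.value; // still fresh: skip the remote call
    }
    const value = JSON.parse(await loadSecret(id));
    cache.set(id, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}

// Usage with a stand-in loader; a real loader would call the secrets service.
const getCredentials = createCredentialCache(async (id) => {
  console.log(`loading ${id}`);
  return JSON.stringify({ username: 'example-user', password: 'example-pass' });
});
getCredentials('dev', 'OPERATOR_A').then((creds) => console.log(creds.username));
```

Keeping the TTL short means a rotated credential is picked up within minutes without redeploying the application.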

Access Control:

  • Least privilege IAM policies for all resources
  • VPN tunnels authenticate both endpoints before any traffic flows (for example, with pre-shared keys)
  • Application-level authentication using MNO-provided credentials

5. Cross-Account Architecture

To support multiple environments (development, staging, production):

VPC Sharing Strategy:

  • Production VPC subnets are shared with non-production AWS accounts
  • Developers can test integrations from correct network segments
  • Maintains network isolation while enabling full development workflow
  • Cost optimization by sharing VPN infrastructure

6. Technology Stack

Infrastructure:

  • Cloud Provider: AWS
  • VPN: Site-to-Site VPN connections (manually configured per MNO)
  • Compute: ECS Fargate for auto-scaling workloads, EC2 for static IP requirements
  • Load Balancing: Application Load Balancer with path-based routing
  • Configuration: Systems Manager Parameter Store

Application:

  • Runtime: Node.js with Express framework
  • Container: Docker images deployed to ECS
  • Protocol Support: SOAP and REST API integrations
  • Secrets: AWS Secrets Manager for credential storage

Outcome & Benefits

1. Security Compliance

  • ✅ All MNO security requirements met
  • ✅ Encrypted communication channels (IPsec VPN tunnels) between our VPC and every MNO
  • ✅ No exposure of sensitive APIs to the public internet
  • ✅ Audit trail for all network communications

2. Scalability Achieved

  • ✅ Successfully integrated 5+ MNOs with isolated network segments
  • ✅ Auto-scaling for container-based deployments handles traffic spikes
  • ✅ Can add new MNOs without architectural changes
  • ✅ Cross-account sharing enables parallel development

3. Cost Optimization

  • ✅ Container-based deployments: pay only for active processing time
  • ✅ Shared VPN infrastructure across environments
  • ✅ Eliminated need for multiple VPN client licenses
  • ✅ Reduced operational overhead through automation

4. Operational Excellence

  • ✅ Configuration-driven approach simplifies MNO onboarding
  • ✅ Standardized deployment patterns across all operators
  • ✅ Clear separation between manual network setup and automated application deployment
  • ✅ Automated deployment for container-based integrations

5. Reliability

  • ✅ Isolated failure domains per MNO
  • ✅ Load balancer health checks and automatic failover
  • ✅ Multiple availability zones for high availability
  • ✅ VPN tunnels provide stable, consistent connectivity

Lessons Learned & Pitfalls

1. VPN Configuration is Manual Work

Learning: Each MNO has unique network requirements that cannot be fully automated.

  • VPN tunnel configuration requires manual DevOps involvement
  • Network firewall rules vary significantly across operators
  • Testing connectivity takes time and coordination with MNO network teams

Recommendation: Budget sufficient time for the network setup phase (2-4 weeks per MNO).

2. Standardize Configuration Naming

Learning: Inconsistent naming between DevOps infrastructure and application configuration causes deployment failures.

Solution: Established strict naming convention for subnet parameters:

  • DevOps creates: /mno/subnets/{OPERATOR_NAME}_SUBNET_ID
  • Application reads this exact parameter name
  • Prevents spelling mistakes and ensures consistency

3. Handle Both Static and Dynamic IP Requirements

Learning: Not all MNOs have the same IP whitelisting capabilities.

Solution: Built hybrid architecture supporting both:

  • ECS Fargate for MNOs accepting CIDR ranges (majority of cases)
  • EC2 instances for MNOs requiring single static IPs (rare but necessary)

4. Cross-Account Testing is Critical

Learning: Production VPN connectivity must be testable from non-production environments.

Solution: Share production VPC subnets with dev/staging accounts using AWS Resource Access Manager (RAM). This allows developers to test actual network connectivity without deploying to production.

5. Configuration-Driven Design is Key

Learning: Hardcoding MNO-specific logic leads to unmaintainable code.

Solution: Externalize all MNO-specific configurations:

  • API endpoints, ports, protocols
  • Phone number patterns for routing
  • Deployment resource type (ECS vs EC2)
  • Custom request/response handlers
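
Externalized configuration is only safe if it is checked before use. The sketch below validates one MNO entry against the fields listed above and returns a list of problems instead of throwing, so all mistakes surface at once. validateMnoConfig is a hypothetical helper, and the allowed values ('SOAP'/'REST', 'ECS'/'EC2') are taken from this post.

```javascript
// Return a list of human-readable problems; an empty list means the config is usable.
function validateMnoConfig(config) {
  const errors = [];
  const api = config.api || {};
  if (typeof config.name !== 'string' || config.name === '') {
    errors.push('name must be a non-empty string');
  }
  if (typeof api.ip !== 'string' || api.ip === '') {
    errors.push('api.ip must be a non-empty string');
  }
  if (!Number.isInteger(api.port) || api.port < 1 || api.port > 65535) {
    errors.push('api.port must be an integer between 1 and 65535');
  }
  if (!['SOAP', 'REST'].includes(api.type)) {
    errors.push("api.type must be 'SOAP' or 'REST'");
  }
  if (!(config.phoneNumberPattern instanceof RegExp)) {
    errors.push('phoneNumberPattern must be a RegExp');
  }
  if (!['ECS', 'EC2'].includes(config.deployedResource)) {
    errors.push("deployedResource must be 'ECS' or 'EC2'");
  }
  if (typeof config.rechargeHandler !== 'string' || config.rechargeHandler === '') {
    errors.push('rechargeHandler must name a handler');
  }
  return errors;
}

console.log(validateMnoConfig({ name: 'OPERATOR_X', deployedResource: 'LAMBDA' }));
```

Running a check like this at startup, or in CI, turns a misconfigured operator into a failed deployment rather than a failed recharge.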

6. Secrets Management from Day One

Learning: Never store credentials in code or configuration files.

Solution: Use managed secrets service from the beginning:

  • Store credentials in secrets manager
  • Use IAM roles for access control
  • Rotate credentials regularly
  • Separate secrets per environment

7. Network Troubleshooting is Complex

Pitfall: When API calls fail, determining whether the cause is a network, VPN, firewall, or application issue is challenging.

Recommendation:

  • Implement comprehensive logging at each layer
  • Set up monitoring for VPN tunnel status
  • Document troubleshooting runbooks for each MNO
  • Maintain contact information for MNO network teams
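
As a starting point for such runbooks, failures can be bucketed by the error the HTTP client reports. The mapping below is a rough heuristic and an assumption on our part, not a diagnosis: the codes are standard Node.js system error codes, while classifyFailure and the bucket names are hypothetical.

```javascript
// Give a first guess at which layer to investigate for a failed MNO call.
function classifyFailure({ code, httpStatus } = {}) {
  // Packets vanish or no route exists: tunnel down, missing route, or a silent firewall drop.
  if (['ETIMEDOUT', 'EHOSTUNREACH', 'ENETUNREACH'].includes(code)) {
    return 'network-or-vpn';
  }
  // The remote side answered but rejected or reset the connection, so the tunnel itself works.
  if (['ECONNREFUSED', 'ECONNRESET'].includes(code)) {
    return 'firewall-or-mno-service';
  }
  // A full HTTP exchange happened: look at credentials, payload, or MNO business rules.
  if (typeof httpStatus === 'number' && httpStatus >= 400) {
    return 'application';
  }
  return 'unknown';
}

console.log(classifyFailure({ code: 'ETIMEDOUT' }));  // network-or-vpn
console.log(classifyFailure({ httpStatus: 500 }));    // application
```

Logging this bucket next to the operator name makes it easier to see whether failures cluster on one tunnel or are spread across MNOs.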

Future Improvements

1. VPN Automation

Explore Infrastructure-as-Code approaches for VPN configuration where possible:

  • Automated VPN tunnel provisioning for MNOs with standardized requirements
  • Configuration templates for common MNO network patterns
  • Reduce manual setup time from weeks to days

2. Advanced Load Balancing

Implement intelligent traffic routing:

  • Health-based routing to automatically bypass degraded MNO connections
  • Geographic routing for MNOs with multiple regional endpoints
  • Weighted routing for A/B testing new MNO integrations

3. Enhanced Monitoring

Build comprehensive observability:

  • Real-time VPN tunnel health dashboards
  • Per-MNO latency and success rate metrics
  • Automated alerting for connectivity issues
  • Network flow analysis for troubleshooting
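
The per-MNO metrics could start as simply as the in-memory accumulator sketched below. This is an assumption about one possible shape, not an existing component; createMnoMetrics and its method names are hypothetical, and a real system would export these numbers to a monitoring service instead of keeping them in process memory.

```javascript
// Track request count, success rate, and average latency per operator.
function createMnoMetrics() {
  const totals = new Map();
  return {
    record(operator, succeeded, latencyMs) {
      const entry = totals.get(operator) || { requests: 0, successes: 0, latencyMs: 0 };
      entry.requests += 1;
      if (succeeded) {
        entry.successes += 1;
      }
      entry.latencyMs += latencyMs;
      totals.set(operator, entry);
    },
    summary(operator) {
      const entry = totals.get(operator);
      if (!entry) {
        return null; // nothing recorded for this operator yet
      }
      return {
        requests: entry.requests,
        successRate: entry.successes / entry.requests,
        averageLatencyMs: entry.latencyMs / entry.requests,
      };
    },
  };
}

const metrics = createMnoMetrics();
metrics.record('OPERATOR_A', true, 100);
metrics.record('OPERATOR_A', false, 300);
console.log(metrics.summary('OPERATOR_A'));
// { requests: 2, successRate: 0.5, averageLatencyMs: 200 }
```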

4. Transit Gateway Architecture

For organizations with 10+ MNO integrations, consider:

  • AWS Transit Gateway for centralized VPN management
  • Hub-and-spoke network topology
  • Simplified routing across multiple VPCs and VPN connections
  • Better cost optimization at scale

5. Disaster Recovery

Implement resilience improvements:

  • Redundant VPN connections for critical MNOs, beyond the two tunnels each AWS Site-to-Site VPN connection already provides
  • Automated failover mechanisms
  • Connection quality monitoring and automatic rerouting
  • Regular DR testing procedures

6. Developer Experience

Improve development workflow:

  • Local development environment with MNO API mocks
  • Automated integration testing framework
  • Self-service MNO onboarding portal for admins
  • Better configuration validation tools

Conclusion

Building a Site-to-Site VPN architecture for multi-MNO integration requires a careful balance between security, scalability, and operational complexity. By combining secure network connectivity with a hybrid deployment strategy, we created a system that:

  • Meets stringent MNO security requirements
  • Scales automatically based on traffic patterns
  • Supports diverse technical requirements across operators
  • Maintains operational efficiency through automation where possible

The key to success is embracing the hybrid nature of the problem: standardize and automate where possible (application deployment), but recognize when manual configuration is necessary (network setup). This pragmatic approach allows teams to move quickly while maintaining security and reliability standards.

Whether you’re building a fintech platform, IoT integration, or any system requiring secure connectivity to multiple private networks, Site-to-Site VPN architecture provides a proven pattern for success.


Tags: #VPN #NetworkArchitecture #CloudComputing #TelecomIntegration #SecurityArchitecture #AWS #Microservices #FinTech
