Problem Statement
In production, we were facing user-reported issues that we could not reproduce locally or in staging.
From a monitoring perspective:
- Logs showed APIs were functioning.
- APM showed normal response times.
- Error monitoring showed no crashes.
- Infrastructure metrics looked healthy.
Yet users were still struggling. During SEV1 incidents, this created a major challenge: we were guessing instead of observing.
The biggest gap was visibility into what the user was actually experiencing in the browser. Traditional monitoring tools did not capture real user interaction or frontend behavior.
What Was Tried
Initially, we relied on:
- Application logs
- APM tracing
- Error tracking tools
- Infrastructure monitoring
However, these tools primarily provided backend visibility.
They could not show:
- What the user clicked
- Where they hesitated
- Whether a button was unresponsive
- Whether a UI element silently failed
- Whether a frontend state mismatch occurred
To address this gap, we implemented New Relic Session Replay in production.
What Worked
We integrated Session Replay into our React (SPA) application using the New Relic browser snippet.
Configuration included:
- Enabled only in Production
- 100% sampling rate
- Session Replay capture enabled
- Distributed tracing enabled for correlation
The snippet was added globally to ensure all routes were monitored.
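The settings listed above could be expressed roughly as follows. This is a minimal sketch, not the exact snippet we deployed: `buildReplayInit`, `ReplayInit`, and `Environment` are hypothetical names, and the field names are assumptions modeled on the browser agent's `init` settings, so verify them against the current New Relic documentation before use.

```typescript
// Hypothetical config shape. Field names are assumptions modeled on the
// New Relic browser agent's init settings; verify before relying on them.
interface ReplayInit {
  session_replay: {
    enabled: boolean;      // Session Replay capture on/off
    sampling_rate: number; // percentage of sessions to record (0 to 100)
  };
  distributed_tracing: { enabled: boolean };
}

type Environment = "production" | "staging" | "development";

// Build the replay settings so that recording only happens in production.
function buildReplayInit(env: Environment): ReplayInit {
  const isProd = env === "production";
  return {
    session_replay: {
      enabled: isProd,
      sampling_rate: isProd ? 100 : 0, // 100% sampling, production only
    },
    // Tracing stays on so user actions can be tied to backend traces.
    distributed_tracing: { enabled: true },
  };
}
```

Keeping the environment check in one function makes it harder to enable recording in staging or local builds by accident.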
Impact
Session Replay allowed us to:
- Watch real user session recordings
- Observe full navigation paths
- Correlate user actions with backend traces
- Identify frontend state issues
- Reproduce issues that were previously non-deterministic
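To illustrate what correlating user actions with backend traces means, here is a small conceptual sketch. New Relic performs this correlation itself; the code below is not its implementation. The types `UserAction` and `BackendTrace`, the function `lastActionBefore`, and the 5-second window are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical, simplified records. Real replay and trace data carry far more fields.
interface UserAction {
  timestampMs: number;
  label: string; // e.g. "click #checkout"
}

interface BackendTrace {
  traceId: string;
  startMs: number;
}

// Find the most recent user action that happened at most `windowMs`
// before the trace started. Returns undefined when nothing qualifies.
function lastActionBefore(
  actions: UserAction[],
  trace: BackendTrace,
  windowMs: number = 5000, // assumed window, not a New Relic setting
): UserAction | undefined {
  let best: UserAction | undefined;
  for (const action of actions) {
    const delta = trace.startMs - action.timestampMs;
    const inWindow = delta >= 0 && delta <= windowMs;
    if (inWindow && (best === undefined || action.timestampMs > best.timestampMs)) {
      best = action;
    }
  }
  return best;
}
```

A trace with no matching action in the window is itself a useful signal: the request was probably triggered by something other than a direct click, such as a timer or a retry.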
Privacy & Security Controls
Production monitoring must never compromise user trust.
We configured:
- Masking of password fields
- Masking of credit card information
- Automatic redaction of sensitive inputs
All sensitive user data was masked before capture.
Observability without privacy controls is reckless. We ensured responsible implementation.
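The masking rule can be pictured with a short sketch. Session Replay applies masking inside the agent, so this is only an illustration of the idea, not the agent's code. `FieldDescriptor`, `shouldMask`, `maskValue`, and `captureValue` are hypothetical names, and the name pattern is an assumed heuristic.

```typescript
// Hypothetical description of a form field as a recorder might see it.
interface FieldDescriptor {
  type: string;          // input type, e.g. "password" or "text"
  name: string;          // field name or id
  autocomplete?: string; // HTML autocomplete token, e.g. "cc-number"
}

// Standard HTML autocomplete tokens that indicate sensitive data.
const SENSITIVE_AUTOCOMPLETE = new Set([
  "cc-number",
  "cc-csc",
  "cc-exp",
  "current-password",
  "new-password",
]);

// Assumed heuristic for field names that suggest sensitive content.
const SENSITIVE_NAME = /(pass(word)?|card|cvv|cvc|ssn)/i;

function shouldMask(field: FieldDescriptor): boolean {
  if (field.type.toLowerCase() === "password") return true;
  if (
    field.autocomplete !== undefined &&
    SENSITIVE_AUTOCOMPLETE.has(field.autocomplete.toLowerCase())
  ) {
    return true;
  }
  return SENSITIVE_NAME.test(field.name);
}

// Replace every character so the raw value never leaves the browser.
function maskValue(value: string): string {
  return "*".repeat(value.length);
}

// Masking happens before capture: only the returned value is recorded.
function captureValue(field: FieldDescriptor, value: string): string {
  return shouldMask(field) ? maskValue(value) : value;
}
```

The important property is the order of operations: the decision to mask is made before anything is stored or transmitted, not applied afterwards during playback.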
What Didn’t Work / Challenges
- 100% sampling increases storage usage and cost.
- Some user sessions were not captured in certain cases.
- Session recordings have a maximum duration limit of 4 hours.
- Reviewing long sessions can be time-consuming during investigations.
- A new session is created after 30 minutes of inactivity.
- Debugging may require reviewing multiple recordings for the same user.
- Session retention is limited to 8 days unless a higher plan is purchased.
- There is a 100GB storage limit for session data.
- Sessions remain anonymous until the user logs in.
- Replay analysis requires a disciplined review process during incidents.
- Session Replay did not work reliably for our Next.js application.
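The 30-minute inactivity rule and the 4-hour duration limit explain why one user's activity can end up spread across several recordings. The sketch below applies those two limits to a list of activity timestamps. It is a simplified model under our own assumptions, not New Relic's session logic, and `splitIntoSessions` is a hypothetical helper.

```typescript
const INACTIVITY_LIMIT_MS = 30 * 60 * 1000;  // new session after 30 minutes idle
const MAX_DURATION_MS = 4 * 60 * 60 * 1000;  // recordings capped at 4 hours

// Group one user's activity timestamps (in ms) into sessions using the two limits.
function splitIntoSessions(timestamps: number[]): number[][] {
  const sorted = [...timestamps].sort((a, b) => a - b);
  const sessions: number[][] = [];
  let current: number[] = [];
  let sessionStart = 0;
  let lastSeen = 0;

  for (const ts of sorted) {
    const idleTooLong = ts - lastSeen > INACTIVITY_LIMIT_MS;
    const runningTooLong = ts - sessionStart > MAX_DURATION_MS;
    // Close the open session when either limit is exceeded.
    if (current.length > 0 && (idleTooLong || runningTooLong)) {
      sessions.push(current);
      current = [];
    }
    if (current.length === 0) sessionStart = ts;
    current.push(ts);
    lastSeen = ts;
  }
  if (current.length > 0) sessions.push(current);
  return sessions;
}
```

During an investigation, the practical consequence is that a single user journey with a lunch break or a long working session has to be reviewed as two or more separate recordings.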
We’re open to exploring alternative tools that address limitations such as retention caps, storage limits, multi-session tracking, and better Next.js support. If there are more scalable or cost-effective frontend observability solutions that solve these gaps more efficiently, we’d be interested in evaluating them.
Reference: Session Replay | New Relic