Job Description
Top Required Skills:
- System Administration (Linux/Windows) – Managing and troubleshooting servers.
- “Eyes on Glass” Monitoring Experience – Real-time monitoring of system alerts.
- AWS Experience – Cloud infrastructure monitoring.
- Enterprise Monitoring Tools (Dynatrace is a HUGE plus!) – Experience with one or more of:
  - Dynatrace, Datadog, AppDynamics, BigPanda, SCOM, LogicMonitor
- ServiceNow – Experience using it for ticketing is a plus.
Job Responsibilities:
- Proactive Monitoring:
  - Monitor company, mobile channels, and production environments.
  - Track application health, performance, and availability using enterprise tools.
- Incident Response:
  - Quickly assess and resolve system alerts from monitoring tools.
  - Escalate critical issues to appropriate teams (Server Ops, Network Ops, DevOps, etc.).
- System Troubleshooting:
  - Work with Linux/Windows, VMware/Hyper-V, AWS, middleware (WebLogic, WebSphere, DataPower), databases, and storage.
  - Write scripts (Shell, Python) for automation (a plus).
- ITIL Processes:
  - Follow Incident, Problem, and Change Management best practices.
- Communication & Collaboration:
  - Work with on-shore/off-shore teams, IT Command Center, and business stakeholders.
  - Provide clear, timely updates during outages.