Posted Mar 31, 2026

Datadog Platform Expert

This is a remote position.

We are seeking a high-level Datadog Expert to audit and optimise our leading client’s primary observability platform. This is not a "user" role; we need an expert capable of re-engineering data flows for maximum efficiency.

Datadog Platform Expertise (Must Have)

- At least 4 years of hands-on experience with the Datadog platform in production environments.
- Deep expertise across Datadog’s core product suite: Infrastructure Monitoring, APM (Application Performance Monitoring), Log Management, Synthetics, Network Monitoring, and Real User Monitoring (RUM).
- Proven experience in Datadog cost optimisation, including data ingestion reduction, licence right-sizing, and metric cardinality management.
- Expert-level knowledge of Datadog Agent deployment, configuration, and troubleshooting across bare-metal, VM, and containerised environments (Docker, Kubernetes).
- Strong experience with Datadog’s tagging strategy, service catalogue, and custom metrics (DogStatsD, custom checks).
- Experience with the Datadog API and programmatic management of monitors, dashboards, and SLOs.
- Familiarity with Datadog’s pricing model and the ability to forecast and optimise costs based on usage patterns.

Cloud Infrastructure (Must Have)

- Strong AWS experience (at least 3 years), including EC2, ECS/EKS, Lambda, RDS, S3, CloudWatch, and VPC networking.
- Experience monitoring AWS cost drivers and correlating infrastructure changes with observability cost impact.
- Familiarity with Infrastructure-as-Code (Terraform, CloudFormation) for managing Datadog resources programmatically.
- Understanding of Kubernetes monitoring patterns: DaemonSets, sidecar injection, cluster-level metrics, and container log collection.

Service Management and Automation (Must Have)

- Experience integrating Datadog with Jira Service Management, including webhook-based alert forwarding and bidirectional status sync.
- Knowledge of incident management workflows: escalation policies, runbook automation, and post-incident review processes.
- Experience with PagerDuty, OpsGenie, or similar on-call management tools and their integration with Datadog.
- Ability to design and implement automated remediation workflows triggered by Datadog alerts.

Data Quality and Analytics (Must Have)

- Experience auditing and improving data quality in observability pipelines (metrics, logs, traces).
- Strong analytical skills with the ability to identify patterns, anomalies, and data integrity issues in large-scale telemetry data.
- Experience designing custom dashboards and reports for engineering leadership, focusing on actionable insights.

Preferred and Bonus Skills

- Datadog Fundamentals Certification, Log Management Certification, or APM Certification (highly preferred).
- Datadog Cloud SIEM for AWS Fundamentals certification.
- Experience with FinOps frameworks and cloud cost management tools (AWS Cost Explorer, Trusted Advisor, CloudHealth, Kubecost).
- Experience in financial services or banking environments, particularly with regulatory compliance for data handling and retention.
- Familiarity with Thought Machine (core banking platform) or similar modern banking technology stacks.
- Experience with AI/ML-driven observability features: anomaly detection, forecasting, Watchdog, and intelligent alerting.
- Contributions to or experience with Datadog’s open-source ecosystem (datadog-agent, dd-trace libraries, integrations).
- Experience with log parsing, pipeline processing, and log-to-metric conversion strategies in Datadog.