When it comes to managing a healthy alerting system for your security operations center (SOC), tuning false positives is only half the battle. An often overlooked aspect of a healthy alerting system is making sure that critical detections which rarely fire haven’t silently broken without anybody noticing.

At GitLab, the Signals Engineering team tests detections by simulating real malicious behavior on infrastructure we own. This validates that our detections fire end-to-end: from the log source, through ingestion, into the SIEM, and all the way through our security orchestration, automation, and response (SOAR) alert routing. Commercial Breach and Attack Simulation (BAS) tools take the same approach, but they are expensive, generic, and not tailored to our specific detection stack. So we built our own fully automated framework, which we named Weekly Attack Testing for Continuous Health (WATCH).

In this article, you'll learn why we developed this framework, how it works, and how to use it in your environment.

A gap in detection validation

With log schema changes, SIEM updates, pipeline misconfigurations, and more, there are a million ways for your detections to fail silently and only one way for them to fire as expected. Faced with these odds, the conclusion is obvious: “Let’s trigger some old detections!” That, however, raises two follow-up questions: “How exactly does one trigger detections?” and “How often?”

One option is synthetic. You reintroduce logs that simulate malicious behavior into your SIEM, then wait to see whether your detection rule catches the fake activity and raises an alert. Besides failing to prove that the detection works in a real-world scenario, this approach skips one of the most error-prone stages of the alert lifecycle: log ingestion (i.e., the path from log source to SIEM).

We previously wrote about two parts of our GitLab Universal Automated Response and Detection (GUARD) system. It automates detection creation and deployment through a detections as code (DaC) pipeline, and it routes and triages alerts through our SOAR. Our DaC pipelines validate that a detection can deploy without errors, but they don't tell us whether that detection will actually fire when the behavior it targets occurs in the wild.

WATCH closes that gap. It's the continuous validation layer that gives us confidence that our detections are working.

How WATCH works

At a high level, WATCH executes scripted attack simulations in our staging environment. It then verifies that the expected alerts propagate through our entire security monitoring stack: our SIEM for detection rules, our SOAR for alert routing, and ultimately the dashboards our team uses to monitor detection health.

The lifecycle of a WATCH test looks like this:

1. Scheduling: Every week, a scheduled GitLab CI/CD pipeline discovers all active tests and distributes them into randomized time slots across the week. Randomization matters because tests firing at predictable times would make test activity too easy to distinguish from real threats. Predictable timing could also mask timing-sensitive issues with our detections.
2. Heads-up notification: Before a test runs, WATCH notifies our SOAR via a dedicated "WATCH Heads Up" story, registering the detections it expects to trigger. This creates trackable records so our SOAR knows what's coming.
3. Execution: The test performs its simulated malicious behavior, for example, resetting an admin account password or making suspicious API calls against the staging environment.
4. Detection: The SIEM processes the activity logs from staging and (hopefully) fires the corresponding detection rules.
5. Correlation: As alerts arrive in our SOAR, an "Is this a WATCH Test?" check determines whether each alert corresponds to a registered test. It matches on three factors: the time window between the test run and the alert, the actor identity (IP address or username), and the rule ID of the detection that fired. This check stops WATCH-generated alerts from being escalated to our Security Incident Response Team (SIRT) as real incidents, while still validating the full pipeline.
6. Verification: A follow-up pipeline stage checks whether all expected detections fired, updates the detection status metadata, and deploys the updated results to our GitLab Pages dashboard. If any detection fails to fire, a notification is sent to our team's Slack channel.

Using WATCH with GitLab CI/CD

WATCH uses GitLab CI/CD as its orchestration backbone, split across three pipeline stages.

The schedule_pipelines stage runs weekly and handles test distribution. It discovers all active tests, bins them into groups, and creates scheduled pipelines set to run at random times throughout the week. Each scheduled pipeline receives a TESTS_TO_RUN variable specifying which tests it should execute.

The run_tests stage is where the actual attack simulation happens. It executes the tests assigned to that pipeline run, saves execution statistics to detection_status.json, and records SOAR record IDs so that alert correlation can happen downstream.

The pages stage handles verification and reporting. It queries our SOAR to confirm that alerts were generated and properly routed. It then updates detection metadata with the verification results and deploys the GitLab Pages dashboard with the latest test outcomes.

Below is a template .gitlab-ci.yml configuration file for the WATCH pipeline:
spec:
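The weekly scheduling and binning performed by the schedule_pipelines stage can be illustrated with a short Python sketch. This is a hypothetical illustration, not WATCH's actual code. The function names, the default group size, and the choice of a uniformly random minute within the week are all assumptions. The sketch shuffles the active tests, bins them into groups, and gives each group a random start time. Each group's test names are joined into a comma-separated string that could be passed as a TESTS_TO_RUN variable.

```python
import random
from datetime import datetime, timedelta

# Hypothetical sketch of the schedule_pipelines logic; the names and the
# default group size are illustrative assumptions, not WATCH's code.
WEEK = timedelta(days=7)


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def schedule_week(tests, week_start, group_size=3, seed=None):
    """Bin tests into groups and give each group a random start time in the week.

    Returns (run_at, tests_to_run) tuples sorted by run_at, where tests_to_run
    is a comma-separated string meant for a TESTS_TO_RUN variable.
    """
    rng = random.Random(seed)
    shuffled = list(tests)
    rng.shuffle(shuffled)  # randomize which tests share a pipeline
    minutes_in_week = int(WEEK.total_seconds() // 60)
    schedule = []
    for group in chunk(shuffled, group_size):
        # Uniformly random minute offset within the week, so start times are unpredictable.
        run_at = week_start + timedelta(minutes=rng.randrange(minutes_in_week))
        schedule.append((run_at, ",".join(sorted(group))))
    schedule.sort(key=lambda slot: slot[0])
    return schedule
```

Each returned slot would then become one scheduled pipeline. Passing a fixed `seed` makes a week's schedule reproducible when debugging. Leaving it as `None` keeps start times unpredictable.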
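In the run_tests stage, something has to turn the TESTS_TO_RUN variable into actual simulations and record execution statistics. The following Python sketch shows one way that could look. It assumes that TESTS_TO_RUN is a comma-separated list of names and that tests live in a decorator-based registry. The registry, the decorator, and the placeholder test are hypothetical, and the placeholder does nothing against real infrastructure.

```python
import os
import time

# Hypothetical registry: test name -> function that performs the simulation.
TESTS = {}


def watch_test(name):
    """Decorator that registers a simulation function under `name`."""
    def register(fn):
        TESTS[name] = fn
        return fn
    return register


@watch_test("reset_admin_password")
def reset_admin_password():
    # Placeholder only: a real test would act against staging infrastructure
    # and return whatever the SOAR heads-up needs (e.g., the actor identity).
    return {"actor": "watch-test-user"}


def run_selected(env=None):
    """Run the tests named in TESTS_TO_RUN and return per-test execution stats."""
    env = os.environ if env is None else env
    names = [n.strip() for n in env.get("TESTS_TO_RUN", "").split(",") if n.strip()]
    stats = {}
    for name in names:
        started = time.time()
        try:
            stats[name] = {"ok": True, "result": TESTS[name]()}
        except Exception as exc:  # record unknown or failing tests without aborting the run
            stats[name] = {"ok": False, "error": repr(exc)}
        stats[name]["started_at"] = started
        stats[name]["duration_s"] = time.time() - started
    return stats
```

The returned dictionary is the kind of data the stage could serialize into detection_status.json. Recording failures per test, instead of stopping at the first exception, means that one broken simulation doesn't hide the results of the others.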
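The "Is this a WATCH Test?" correlation check matches alerts to heads-up registrations on three factors: time window, actor identity, and rule ID. The following Python sketch illustrates that matching. The record shapes (`HeadsUp`, `Alert`) and the two-hour window are assumptions made for illustration. They are not the schema or settings our SOAR actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class HeadsUp:
    """What a test registered with the SOAR before running (hypothetical schema)."""
    rule_ids: frozenset  # detections the test expects to trigger
    actor: str           # IP address or username the test acts as
    started_at: datetime


@dataclass(frozen=True)
class Alert:
    """An incoming alert as seen by the SOAR (hypothetical schema)."""
    rule_id: str
    actor: str
    fired_at: datetime


def match_watch_test(alert, registrations, window=timedelta(hours=2)):
    """Return the registration matching the alert on all three factors, else None.

    The two-hour window is an assumed default, not WATCH's actual setting.
    """
    for reg in registrations:
        in_window = reg.started_at <= alert.fired_at <= reg.started_at + window
        if in_window and alert.actor == reg.actor and alert.rule_id in reg.rule_ids:
            return reg
    return None
```

An alert that matches no registration would continue through normal routing toward SIRT. Requiring all three factors to agree keeps a real attacker from hiding behind a WATCH run just by acting during the same time window.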
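The verification step compares the detections each test expected against the alerts the SOAR actually correlated, updates status metadata, and reports failures. The following Python sketch shows that comparison. The status layout, the notification wording, and the helper names are hypothetical. The only detail taken from the workflow described above is the detection_status.json filename.

```python
import json


def verify(expected, fired, status):
    """Record whether each expected rule fired; return missing rules per test.

    expected: test name -> set of rule IDs the test should trigger
    fired: set of rule IDs the SOAR correlated to WATCH runs
    status: detection status dict (e.g., loaded from detection_status.json), updated in place
    """
    missing = {}
    for test, rules in expected.items():
        for rule in sorted(rules):
            fired_ok = rule in fired
            status[rule] = {"test": test, "fired": fired_ok}
            if not fired_ok:
                missing.setdefault(test, []).append(rule)
    return missing


def failure_message(missing):
    """Format a Slack-style notification (the wording is a hypothetical example)."""
    lines = [f"- {test}: {', '.join(rules)}" for test, rules in sorted(missing.items())]
    return "WATCH: expected detections did not fire\n" + "\n".join(lines)


def save_status(status, path="detection_status.json"):
    """Persist the updated status so a Pages dashboard could render it."""
    with open(path, "w") as f:
        json.dump(status, f, indent=2, sort_keys=True)
```

A non-empty result from `verify` is the signal to send the Slack notification. The saved status file is what a GitLab Pages job could publish as the dashboard.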
Automate detection testing with GitLab CI/CD and Duo
Learn how GitLab's Signals Engineering team built the WATCH framework to continuously validate our security monitoring pipeline.








