Chapter 4.2 Quiz - SIEM, SOAR & Detection Engineering

Quiz Mode - All answers are hidden under collapsible sections. Attempt each question before revealing the answer.

Question 1

A Sigma rule fires 300 times per day in your environment, but after analyst review, only 3 of those alerts are true positives. Calculate the false positive rate, explain the operational impact, and describe the correct tuning approach without eliminating true positive coverage.

Reveal Answer

Answer:

Calculation:

Total alerts:       300/day
True positives:     3/day
False positives:    297/day
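False positive rate (share of alerts that are false) = FP / Total = 297/300 = 99%
Precision = TP / (TP + FP) = 3/300 = 0.01 (1%)
Note: the strict statistical FPR is FP / (FP + TN) and TPR is TP / (TP + FN). Neither can be computed from alert counts alone, because true negatives and false negatives are unknown.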

Operational impact: An analyst triaging 300 alerts/day at roughly 1-2 minutes per alert spends 5-10 hours/day on this single rule. With 99% of alerts false, analysts will start ignoring the rule, and alert fatigue means the 1% of real detections are likely to be missed. This can be worse than having no rule, because it also degrades SOC responsiveness to other alerts.
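The arithmetic above can be reproduced with a small helper. This is a minimal sketch: the function name `alert_quality` and its parameters are illustrative, and the 1-2 minutes per alert is the same rough triage estimate used in the answer.

```python
def alert_quality(total_alerts: int, true_positives: int, minutes_per_alert: float) -> dict:
    """Summarize alert quality for one rule from daily alert counts."""
    false_positives = total_alerts - true_positives
    return {
        # Share of fired alerts that turned out to be false
        "false_positive_share": false_positives / total_alerts,
        # Share of fired alerts that were real (precision)
        "precision": true_positives / total_alerts,
        # Analyst time consumed by this rule per day
        "analyst_hours_per_day": total_alerts * minutes_per_alert / 60,
    }


best_case = alert_quality(300, 3, minutes_per_alert=1.0)
worst_case = alert_quality(300, 3, minutes_per_alert=2.0)
print(best_case)
print(worst_case)
```

Tracking these three numbers per rule over time gives an objective trigger for tuning, rather than waiting for analysts to report fatigue.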

Correct tuning approach - preserve TP coverage while reducing FP volume:

# Step 1: Analyze FP pattern - what do all 297 FPs have in common?
# Example finding: all FPs are from svc-deploy account running SCCM scripts

# Step 2: Add targeted filter in Sigma rule
detection:
  selection_powershell:
      Image|endswith: '\powershell.exe'
  selection_download:
      CommandLine|contains:
          - 'DownloadString'
          - 'WebClient'
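  # Sigma ANDs the fields within one map and ORs the values in each list,
  # so this filter only matches a listed account running a listed script
  filter_known_good: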
      User|contains:
          - 'svc-deploy'           # SCCM deployment account
          - 'svc-wsus'             # WSUS update account
      CommandLine|contains:
          - 'WindowsUpdate'        # Known script path
          - 'SCCMInstall'
  condition: selection_powershell and selection_download
             and not filter_known_good

# Step 3: Validate - run the modified rule against 30 days of historical data
# Count FPs and TPs - confirm reduction without losing TP coverage

# Step 4: If FPR still high, consider converting to risk-based scoring
# Instead of alert, add risk score to the entity
# Alert only when cumulative risk score exceeds threshold (combines multiple weaker signals)
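Step 4 can be illustrated with a small accumulator. This is a sketch under stated assumptions, not any SIEM's actual risk engine: the class name `RiskLedger`, the threshold of 100, and the per-rule scores are all hypothetical, and a real implementation would also expire scores after a time window.

```python
from collections import defaultdict

RISK_THRESHOLD = 100  # assumed value; tune per environment


class RiskLedger:
    """Accumulate risk per entity and alert once when the threshold is crossed."""

    def __init__(self, threshold: int = RISK_THRESHOLD):
        self.threshold = threshold
        self.scores = defaultdict(int)
        self.evidence = defaultdict(list)
        self.alerted = set()

    def add_signal(self, entity: str, rule: str, score: int) -> bool:
        """Record a weak signal; return True only when this one triggers the alert."""
        self.scores[entity] += score
        self.evidence[entity].append(rule)
        if self.scores[entity] >= self.threshold and entity not in self.alerted:
            self.alerted.add(entity)
            return True
        return False


ledger = RiskLedger()
first = ledger.add_signal("host-17", "powershell_download", 40)
second = ledger.add_signal("host-17", "new_scheduled_task", 30)
third = ledger.add_signal("host-17", "rare_outbound_domain", 40)
print(first, second, third, ledger.evidence["host-17"])
```

The noisy rule still contributes evidence, but an analyst is paged only when several independent signals agree on the same entity.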

Key principle: Never suppress broadly (e.g., exclude all svc-* accounts). Suppress specifically and document every suppression with an expiry date for review.

Question 2

You write an EQL sequence detection for a "PowerShell → cmd.exe → net.exe" process chain. It fires zero times in 30 days despite evidence from threat intel that this technique is used in your industry. What are four possible reasons for zero detections, and how do you diagnose each?

Reveal Answer

Answer:

Reason 1 - Process creation events not being collected

The most common failure. Event ID 4688 (process creation) requires an audit policy change that is not enabled by default.

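# Diagnose: check if 4688 events exist in your index
# Elastic KQL (Discover) - KQL has no pipe operator, so read the hit count:
event.code:4688
# Or aggregate with ES|QL, if your stack version supports it:
FROM logs-* | WHERE event.code == "4688" | STATS count = COUNT(*)
# If zero - audit policy not configured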

# Fix: enable via GPO
# Computer Config - Windows Settings - Security Settings -
# Advanced Audit Policy - Detailed Tracking - Audit Process Creation - Success
# Or from an elevated prompt:
auditpol /set /subcategory:"Process Creation" /success:enable
# AND enable command line capture (the value must be a DWORD):
reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\Audit" /v ProcessCreationIncludeCmdLine_Enabled /t REG_DWORD /d 1 /f

Reason 2 - Command line logging disabled

Event 4688 may exist but the process.command_line field is empty because command line auditing requires a separate registry key.

# Diagnose: find 4688 events with empty CommandLine
event.code:4688 AND NOT process.command_line:*
# If many results - command line logging disabled
# Fix: registry key shown above + Sysmon as alternative source

Reason 3 - Attackers breaking the expected parent-child chain (process hollowing, injection, COM)

If the attacker injects into an existing process or launches the child through COM object instantiation, the process chain no longer matches powershell.exe → cmd.exe → net.exe. The attack still occurs, but the recorded parent-child relationship is different.

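# Diagnose: search for net.exe with unexpected parents
# KQL filter (Discover), then view the top values of process.parent.name:
event.code:4688 AND process.name:net.exe
# Or aggregate with ES|QL, if your stack version supports it:
FROM logs-* | WHERE event.code == "4688" AND process.name == "net.exe" | STATS count = COUNT(*) BY process.parent.name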
# If net.exe spawned by svchost, explorer, etc. - injection bypassing chain
# Fix: add additional detection for net.exe spawned by any unusual parent

Reason 4 - EQL maxspan too short or field name mismatch

The EQL maxspan window may be smaller than the actual attack timing, or field names may not match your ECS mapping.

# Diagnose: run each stage independently (EQL)
# Stage 1 - does this match anything?
process where process.name == "powershell.exe"
# Stage 2 - does the parent field link the stages?
process where process.name == "cmd.exe" and process.parent.name == "powershell.exe"

# If stage 1 matches but stage 2 does not - likely a parent field naming issue
# Check actual field names in your index:
GET logs-*/_mapping/field/process.parent.name
# May need: winlog.event_data.ParentProcessName instead of process.parent.name

# Increase maxspan and test (was 2m - extend for slow attackers):
sequence with maxspan=10m

Operational lesson: Test every detection rule with Atomic Red Team immediately after writing it. A rule that has never been validated against real execution has unknown coverage.
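The maxspan failure mode from Reason 4 is easy to reproduce offline. The sketch below is a simplified stand-in for EQL sequence matching, not Elastic's implementation: it ignores `by` join keys and process IDs, and the event dictionaries (`ts`, `name`, `parent`) are a hypothetical shape chosen for illustration.

```python
from datetime import datetime, timedelta

# One predicate per stage of the powershell.exe -> cmd.exe -> net.exe chain
STAGES = [
    lambda e: e["name"] == "powershell.exe",
    lambda e: e["name"] == "cmd.exe" and e["parent"] == "powershell.exe",
    lambda e: e["name"] == "net.exe" and e["parent"] == "cmd.exe",
]


def chain_matches(events, maxspan: timedelta) -> bool:
    """Return True if all stages occur in order within maxspan of the first stage."""
    events = sorted(events, key=lambda e: e["ts"])
    for i, first in enumerate(events):
        if not STAGES[0](first):
            continue
        pos, last, matched = i, first, 1
        for stage in STAGES[1:]:
            # Earliest later event that satisfies the next stage
            nxt = next((j for j in range(pos + 1, len(events)) if stage(events[j])), None)
            if nxt is None:
                break
            pos, last, matched = nxt, events[nxt], matched + 1
        if matched == len(STAGES) and last["ts"] - first["ts"] <= maxspan:
            return True
    return False


t0 = datetime(2024, 1, 1, 9, 0, 0)
events = [
    {"ts": t0, "name": "powershell.exe", "parent": "explorer.exe"},
    {"ts": t0 + timedelta(minutes=1), "name": "cmd.exe", "parent": "powershell.exe"},
    {"ts": t0 + timedelta(minutes=6), "name": "net.exe", "parent": "cmd.exe"},
]
short_window = chain_matches(events, timedelta(minutes=2))
long_window = chain_matches(events, timedelta(minutes=10))
print(short_window, long_window)
```

The same three events are invisible to a 2-minute window and detected by a 10-minute one, which is why replaying a known-good test sequence is the quickest way to validate the window.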

Question 3

Describe the complete SOAR playbook that should automatically execute when a Kerberoasting alert fires. List each step, what data is collected, what automated actions are taken, and at what point human escalation is required.

Reveal Answer

Answer:

Trigger: Event ID 4769 with TicketEncryptionType 0x17 (RC4-HMAC) requested by a non-computer account, with tickets for more than 3 distinct service accounts in 5 minutes from the same source IP.
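The trigger condition can be expressed as a sliding-window count. This is a minimal sketch, assuming 4769 events have already been parsed into dictionaries with the hypothetical keys `ts`, `source_ip`, `account`, `service`, and `enc_type`; real field names depend on your ingest pipeline.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
MIN_DISTINCT_SERVICES = 4  # "more than 3" distinct service accounts


def kerberoast_sources(events) -> set:
    """Return source IPs requesting RC4 tickets for many services within WINDOW."""
    recent = defaultdict(deque)  # source_ip -> deque of (ts, service)
    flagged = set()
    for e in sorted(events, key=lambda e: e["ts"]):
        if e["enc_type"].lower() != "0x17":
            continue  # only RC4-HMAC ticket requests
        if e["account"].split("@")[0].endswith("$"):
            continue  # skip computer accounts such as HOST$@DOMAIN
        window = recent[e["source_ip"]]
        window.append((e["ts"], e["service"]))
        while window and e["ts"] - window[0][0] > WINDOW:
            window.popleft()  # drop requests older than the window
        if len({svc for _, svc in window}) >= MIN_DISTINCT_SERVICES:
            flagged.add(e["source_ip"])
    return flagged


t0 = datetime(2024, 1, 1, 12, 0, 0)
burst = [
    {"ts": t0 + timedelta(seconds=30 * i), "source_ip": "10.0.0.5",
     "account": "jdoe@CORP.LOCAL", "service": f"svc-{i}", "enc_type": "0x17"}
    for i in range(4)
]
slow = [
    {"ts": t0 + timedelta(minutes=7 * i), "source_ip": "10.0.0.9",
     "account": "asmith@CORP.LOCAL", "service": f"svc-{i}", "enc_type": "0x17"}
    for i in range(4)
]
flagged = kerberoast_sources(burst + slow)
print(flagged)
```

Note that the slow requester evades this threshold entirely, which is one reason the playbook's hunting step looks beyond the alert window.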

Automated Playbook Steps:

Step 1 - Enrich the source identity (automated, ~30 seconds)
─────────────────────────────────────────────────────────────
INPUTS:  source_ip, source_user from the alert
ACTIONS:
  - Resolve IP to hostname via DHCP logs
  - Query AD: get full account details (groups, last password change, MFA enrolled)
  - Query EDR (CrowdStrike/Defender): get process that made the Kerberos request
    - Was it klist.exe, Rubeus.exe, GetUserSPNs.py, or a legitimate app?
  - Check asset inventory: is this a managed workstation or unmanaged device?
  - Threat intel lookup: has this IP appeared in prior incidents?
OUTPUT:  Enriched source profile attached to case

Step 2 - Assess the targeted service accounts (automated, ~60 seconds)
───────────────────────────────────────────────────────────────────────
INPUTS:  list of ServiceName values from 4769 events
ACTIONS:
  - Query AD for each targeted SPN account:
    - Is it a gMSA? (240-byte random password - cracking infeasible - lower urgency)
    - When was password last reset?
    - What systems does this account have access to?
    - Is it a high-privilege account (DA, EA, admin)?
  - Cross-reference: has this account's password been flagged by breached-password auditing (for example, checks against the HIBP Pwned Passwords corpus)?
OUTPUT:  Risk rating per targeted account (CRITICAL if DA/EA, HIGH otherwise)

Step 3 - Containment decision (automated with conditions)
──────────────────────────────────────────────────────────
IF process = known malicious tool (Rubeus.exe, GetUserSPNs.py):
  - AUTOMATED: Isolate host via EDR (network isolate, not full shutdown)
  - AUTOMATED: Disable source user account (temporary, 4-hour auto-re-enable)
  - ESCALATE IMMEDIATELY to Tier 2

IF process = unknown/suspicious but not confirmed malicious:
  - AUTOMATED: Block source IP at internal firewall
  - ESCALATE to Tier 1 for review within 15 minutes

IF process = known legitimate (SQL Server, IIS service requesting its own SPN):
  - LOG and CLOSE as false positive
  - Add to rule exception list
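The three branches of Step 3 map naturally onto a small decision function. This is a sketch only: the action names are placeholders rather than calls into any real SOAR or EDR API, and the allowlist entries (`sqlservr.exe`, `w3wp.exe`) are assumed examples of the SQL Server and IIS processes mentioned above.

```python
KNOWN_MALICIOUS = {"rubeus.exe", "getuserspns.py"}
KNOWN_LEGITIMATE = {"sqlservr.exe", "w3wp.exe"}  # assumed allowlist, maintain per environment


def containment_plan(process_name: str) -> dict:
    """Map the requesting process to automated actions and an escalation target."""
    name = process_name.lower()
    if name in KNOWN_MALICIOUS:
        return {
            "actions": ["isolate_host", "disable_user_4h"],
            "escalate_to": "tier2",
            "sla_minutes": 0,  # escalate immediately
        }
    if name in KNOWN_LEGITIMATE:
        return {
            "actions": ["log_and_close", "add_rule_exception"],
            "escalate_to": None,
            "sla_minutes": None,
        }
    # Unknown or suspicious: contain lightly and ask a human
    return {
        "actions": ["block_source_ip"],
        "escalate_to": "tier1",
        "sla_minutes": 15,
    }


for proc in ("Rubeus.exe", "sqlservr.exe", "updater.exe"):
    print(proc, containment_plan(proc))
```

Matching on process name alone is easy to evade by renaming the binary, so in practice this decision would also weigh file hashes or the EDR's own verdict.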

Step 4 - Force targeted account password resets (human required)
─────────────────────────────────────────────────────────────────
AUTOMATED PREP:
  - Generate password reset ticket in ServiceNow/Jira
  - Attach list of targeted accounts with current password age
  - Tag accounts that need priority reset (high privilege, old password)

HUMAN ACTION REQUIRED:
  - Verify reset will not break dependent services before executing
  - Reset targeted account passwords (a password cracked from the captured RC4 ticket then no longer works)
  - For gMSA: trigger rotation (automated, but human must approve)
  - Document which accounts were targeted in the IR record

Step 5 - Hunting pivot (automated, runs in background)
───────────────────────────────────────────────────────
ACTIONS:
  - Query SIEM: did this source IP/user perform any successful lateral movement
    in the 24 hours following the Kerberoasting attempt?
    - Look for 4624 logon type 3 from source IP to new hosts
    - Look for new service ticket requests (4769) for high-value SPNs
  - Query EDR: any new processes, network connections, or file writes
    from the potentially-compromised host?
  - If lateral movement evidence found - escalate immediately
OUTPUT:  Timeline of attacker activity attached to case

Human escalation triggers:

Confirmed malicious tool identified
DA/EA account was targeted
Evidence of lateral movement detected
Source account is privileged (IT staff, service account with admin rights)
Multiple hosts affected simultaneously (worm-like behavior)
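These triggers can be evaluated as a checklist over the enriched case. The sketch below assumes a hypothetical case dictionary populated by the earlier playbook steps; the key names are illustrative, not a real SOAR schema.

```python
PRIVILEGED_GROUPS = {"Domain Admins", "Enterprise Admins"}

# Each trigger from the list above, as a predicate over the case record
ESCALATION_CHECKS = {
    "confirmed malicious tool": lambda c: c.get("malicious_tool", False),
    "DA/EA account targeted": lambda c: bool(set(c.get("targeted_groups", [])) & PRIVILEGED_GROUPS),
    "lateral movement evidence": lambda c: c.get("lateral_movement", False),
    "privileged source account": lambda c: c.get("source_privileged", False),
    "multiple hosts affected": lambda c: len(c.get("affected_hosts", [])) > 1,
}


def escalation_reasons(case: dict) -> list:
    """Return every escalation trigger the case satisfies, in checklist order."""
    return [reason for reason, check in ESCALATION_CHECKS.items() if check(case)]


case = {
    "targeted_groups": ["Domain Admins", "SQL Operators"],
    "affected_hosts": ["ws-101", "ws-102"],
}
reasons = escalation_reasons(case)
print(reasons)
```

Returning the full list of reasons, rather than a single boolean, gives the on-call analyst the context for why the case was escalated.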

MITRE ATT&CK: T1558.003 (Kerberoasting), T1078 (Valid Accounts), T1021 (Remote Services)

Question 4

You run Atomic Red Team test T1003.006 (DCSync) against a lab DC and check your Elastic SIEM. No alert fires. Walking through the detection chain, identify the three most likely points of failure and the diagnostic command for each.

Reveal Answer

Answer:

Failure Point 1 - Windows Advanced Audit Policy: DS Access not enabled

DCSync generates Event ID 4662 (Directory Service Access) only when the "Directory Service Access" audit subcategory is enabled and the accessed object's SACL audits the operation. Do not assume either is in place, even on DCs; verify the effective policy.

# Diagnose: check current audit policy on the DC
auditpol /get /subcategory:"Directory Service Access"
# If "No Auditing" - events are not generated regardless of SIEM config

# Fix:
auditpol /set /subcategory:"Directory Service Access" /success:enable /failure:enable
# Or via GPO: Computer Config - Advanced Audit Policy - DS Access - Audit Directory Service Access

# Verify: re-run Atomic test, then check Windows Event Viewer
Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4662} -MaxEvents 10

Failure Point 2 - Elastic Agent / Winlogbeat not collecting the Security log reliably

Even with audit policy enabled, the collector may be dropping events due to buffer overflow, rate limiting, or incorrect channel configuration.

# Diagnose: check if ANY 4662 events exist in Elastic (independent of the rule)
# Kibana Dev Tools:
GET logs-system.security-*/_count
{
  "query": {
    "term": { "winlog.event_id": "4662" }
  }
}
# If count = 0 - collection problem

# Check Elastic Agent logs on the DC:
# Windows: C:\Program Files\Elastic\Agent\data\elastic-agent-*\logs\
# Look for: "event dropped", "queue full", "failed to publish"

# Check Windows Security log size and retention - if too small, events get
# overwritten before collection:
Get-WinEvent -ListLog Security | Select-Object MaximumSizeInBytes, FileSize, RecordCount, LogMode

# Fix: increase the Security log maximum size to 1 GB (value is in bytes)
wevtutil sl Security /ms:1073741824

Failure Point 3 - Sigma/EQL rule field mapping mismatch

The detection rule references winlog.event_data.ObjectType but the actual field in the Elastic index may be winlog.event_data.Properties or formatted differently by the ingest pipeline.

# Diagnose: find the actual field names for a known 4662 event
# First confirm a 4662 exists (from manual test):
Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4662} -MaxEvents 1 |
  Format-List *

# Then in Kibana - find the raw document and inspect field names:
GET logs-system.security-*/_search
{
  "query": {"term": {"winlog.event_id": "4662"}},
  "size": 1,
  "_source": true
}
# Inspect the returned document - compare actual field names against your rule

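# Common mismatch: rule checks winlog.event_data.ObjectType contains "domainDNS"
# but the event may carry a schema GUID of the form %{<guid>} there instead of the class name,
# or your pipeline may store the class elsewhere, e.g. winlog.event_data.ObjectClass = "domainDNS"
# Or the replication right GUID is in Properties, not ObjectType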

# Fix: update the Sigma rule to match actual field names in your environment
# Add field mapping to sigma pipeline:
# fieldmappings:
#   ObjectType: winlog.event_data.ObjectClass
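Comparing a rule's fields against a sample document can be automated once the raw 4662 document has been retrieved. This is a minimal sketch: the sample document is a hypothetical, heavily trimmed `_source`, and `missing_rule_fields` is an illustrative helper rather than part of any Sigma or Elastic tooling.

```python
def flatten(doc: dict, prefix: str = "") -> set:
    """Collect dotted field paths from a nested _source document."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= flatten(value, path + ".")
        else:
            paths.add(path)
    return paths


def missing_rule_fields(rule_fields, sample_doc: dict) -> list:
    """Return the fields a rule references that the sample document lacks."""
    present = flatten(sample_doc)
    return sorted(f for f in rule_fields if f not in present)


# Hypothetical sample: a trimmed 4662 document as returned by the _search call
sample_doc = {
    "event": {"code": "4662"},
    "winlog": {
        "event_id": "4662",
        "event_data": {"Properties": "placeholder", "SubjectUserName": "jdoe"},
    },
}
rule_fields = ["winlog.event_id", "winlog.event_data.ObjectType"]
missing = missing_rule_fields(rule_fields, sample_doc)
print(missing)
```

Any field reported as missing is a candidate for a pipeline mapping or a rule change. A field that is present can still hold a differently formatted value, so value checks remain a manual step.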

Summary table:

| Failure Point | Symptom | Diagnostic Command |
|---|---|---|
| Audit policy not enabled | 4662 events absent from Event Viewer | `auditpol /get /subcategory:"Directory Service Access"` |
| Collection not working | 4662 absent from SIEM despite being in Event Viewer | Elastic count query for event_id:4662 + agent logs |
| Field name mismatch | 4662 in SIEM but rule never fires | Raw document inspection in Kibana Dev Tools |
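The table reads as a decision procedure: check each layer in order and stop at the first one where the event is missing. A minimal sketch, with the three boolean inputs standing in for the manual checks described above:

```python
def dcsync_failure_point(in_event_viewer: bool, in_siem: bool, rule_fired: bool) -> str:
    """Locate the first broken layer in the 4662 detection chain."""
    if rule_fired:
        return "detection working"
    if not in_event_viewer:
        return "audit policy not enabled"
    if not in_siem:
        return "collection not working"
    return "field name mismatch"


# Event is visible on the DC but never reached the SIEM
result = dcsync_failure_point(in_event_viewer=True, in_siem=False, rule_fired=False)
print(result)
```

Working from the source outward avoids tuning the rule while the underlying events are not being generated or shipped at all.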

MITRE ATT&CK: T1003.006 (DCSync) - this exercise validates that your detection for credential dumping actually works end-to-end.

End of Quiz 4.2 - SIEM, SOAR & Detection Engineering