To read DMARC XML aggregate reports and detect unauthorized senders, you must parse and normalize the RUA schema fields (policy_published, record/row/identifiers/auth_results), correlate SPF/DKIM results with domain alignment and known infrastructure, apply volume/fail-rate/IP-diversity heuristics enriched with DNS/WHOIS/ASN/threat intel, and alert on suspicious patterns while suppressing forwarding artifacts—an end‑to‑end workflow that DMARCReport automates and scales.

DMARC aggregate reports (RUA) are machine-generated XML summaries that receivers send daily to show how your domain’s messages are authenticated across SPF and DKIM under your published DMARC policy. They answer three questions: who is sending mail for your domain, how those messages authenticated, and whether they aligned with your policy. Turning the raw XML into insight requires consistent field normalization, correlation logic, and context about the infrastructure that actually sent those messages.

Because DMARC detects domain impersonation at Internet scale, your approach should combine schema-rigorous parsing, scalable ingestion, rules for classification, and data enrichment. In practice, this means standardizing providers’ idiosyncratic XML, tracking outcomes per domain/subdomain, and applying thresholds that reflect “new and noisy vs. persistent and risky.” DMARCReport was built to operationalize exactly this pipeline—from decompressing inbound XML attachments to risk scoring, enrichment, alerting, and retention—so you can move from XML to actionable detection in hours, not months.

Parse and Normalize DMARC RUA XML for Analysis

A consistent downstream model is the foundation for accurate detection. RUA XML follows a shared schema but vendors vary slightly in tags, namespaces, and optional fields.

Key schema elements to extract

policy_published: p, sp, adkim, aspf, fo, pct, domain
report_metadata: org_name, email, date_range (begin/end), report_id
record: repeated per source IP/selector result
- row: source_ip, count, policy_evaluated (disposition, dkim, spf, reason), (optional) envelope_from
- identifiers: header_from, envelope_from, dmarc_domain
- auth_results: spf (domain, result, scope), dkim (domain, selector, result), arc (optional)

Normalization strategy (for consistent downstream analysis)

Canonicalize domains: lower-case, punycode to Unicode, strip surrounding angle brackets
Map outcomes to enums: {pass, fail, neutral, none, temperror, permerror}
Compute alignment flags:
- dkim_aligned = (dkim.result == pass) AND (org_domain(dkim.domain) == org_domain(header_from))
- spf_aligned = (spf.result == pass) AND (org_domain(spf.domain) == org_domain(header_from))
Expand policy context: effective_policy = sp if subdomain(header_from) else p
Derive time bucket: utc_day = floor(date_range.begin to 00:00 UTC)
Normalize IP: store IPv4/IPv6 canonical, PTR lookup deferred to enrichment
Provider quirks: treat “none” and missing fields consistently; set defaults

How DMARCReport helps: DMARCReport ships with a strict XML normalizer that resolves namespace/version differences, enforces enums, auto-derives alignment flags, and stores a unified schema in its analytics warehouse.

Example: Python XML normalization

import gzip, zipfile, io

from defusedxml import ElementTree as ET

from email.utils import parsedate_to_datetime

def parse_rua(xml_bytes):

    root = ET.fromstring(xml_bytes)

    ns = {'d': root.tag.split('}')[0].strip('{')} if '}' in root.tag else {}

    def t(path): return root.find(path, ns)

    rpt = {

        'org': t('.//d:report_metadata/d:org_name') .text if t('.//d:report_metadata/d:org_name') is not None else None,

        'report_id': t('.//d:report_metadata/d:report_id').text,

        'begin': int(t('.//d:report_metadata/d:date_range/d:begin').text),

        'end': int(t('.//d:report_metadata/d:date_range/d:end').text),

        'domain': t('.//d:policy_published/d:domain').text.lower(),

        'p': (t('.//d:policy_published/d:p').text or 'none').lower(),

        'sp': (t('.//d:policy_published/d:sp').text or '').lower(),

        'adkim': (t('.//d:policy_published/d:adkim').text or 'r').lower(),

        'aspf': (t('.//d:policy_published/d:aspf').text or 'r').lower(),

        'rows': []

    }

    for rec in root.findall('.//d:record', ns):

        row = {

            'source_ip': rec.find('d:row/d:source_ip', ns).text,

            'count': int(rec.find('d:row/d:count', ns).text),

            'disposition': rec.find('d:row/d:policy_evaluated/d:disposition', ns).text,

            'dkim': rec.find('d:row/d:policy_evaluated/d:dkim', ns).text,

            'spf': rec.find('d:row/d:policy_evaluated/d:spf', ns).text,

            'header_from': rec.find('d:identifiers/d:header_from', ns).text.lower(),

            'auth_results': {

                'spf': [{'domain': e.find('d:domain', ns).text.lower(),

                         'result': e.find('d:result', ns).text}

                        for e in rec.findall('d:auth_results/d:spf', ns)],

                'dkim': [{'domain': e.find('d:domain', ns).text.lower(),

                          'selector': (e.find('d:selector', ns).text if e.find('d:selector', ns) is not None else None),

                          'result': e.find('d:result', ns).text}

                         for e in rec.findall('d:auth_results/d:dkim', ns)]

            }

        }

        rpt['rows'].append(row)

    return rpt

def maybe_decompress(raw_bytes, filename):

    if filename.endswith('.gz'):

        return gzip.decompress(raw_bytes)

    if filename.endswith('.zip'):

        z = zipfile.ZipFile(io.BytesIO(raw_bytes))

        with z.open(z.namelist()[0]) as f:

            return f.read()

    return raw_bytes

DMARCReport uses hardened parsers (defusedxml), handles gzip/zip/7z, and automatically quarantines malformed payloads for operator review.

Scalable Ingestion: Decompress, Validate, and Store at Scale

At volume, you’ll ingest thousands of attachments per day across multiple reporting orgs. Efficiency and safety are critical.

Ingestion pipeline patterns

Transport: dedicated mailbox + IMAP/POP3 fetch or SMTP webhook; store raw attachments to object storage (S3/GCS/Azure Blob)
Decompression: detect by magic bytes; stream decompress to avoid memory spikes
Validation: XML schema validation where available; structural checks (required tags, date ranges, counts > 0)
Idempotency: dedupe by (report_id, org, date_range, domain) hash; keep raw for audits
Storage: write normalized rows to columnar store (BigQuery/Snowflake), and a relational index for recent hot data

Example: Go streaming parse and validation

package main

import (

  "archive/zip"

  "bytes"

  "compress/gzip"

  "encoding/xml"

  "io"

  "net/mail"

  "os"

  "strings"

)

type Report struct {

  XMLName xml.Name `xml:"feedback"`

  Policy  struct {

    Domain string `xml:"domain"`

    P      string `xml:"p"`

    SP     string `xml:"sp"`

    ADKIM  string `xml:"adkim"`

    ASPF   string `xml:"aspf"`

  } `xml:"policy_published"`

  Records []struct {

    Row struct {

      SourceIP string `xml:"source_ip"`

      Count    int    `xml:"count"`

      Policy   struct {

        Disposition string `xml:"disposition"`

        DKIM        string `xml:"dkim"`

        SPF         string `xml:"spf"`

      } `xml:"policy_evaluated"`

    } `xml:"row"`

    Identifiers struct {

      HeaderFrom string `xml:"header_from"`

    } `xml:"identifiers"`

  } `xml:"record"`

}

func decompress(name string, b []byte) ([]byte, error) {

  if strings.HasSuffix(name, ".gz") {

    r, err := gzip.NewReader(bytes.NewReader(b))

    if err != nil { return nil, err }

    defer r.Close()

    return io.ReadAll(r)

  }

  if strings.HasSuffix(name, ".zip") {

    zr, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))

    if err != nil { return nil, err }

    f := zr.File[0]

    rc, _ := f.Open(); defer rc.Close()

    return io.ReadAll(rc)

  }

  return b, nil

}

func parseXML(b []byte) (*Report, error) {

  var rpt Report

  dec := xml.NewDecoder(bytes.NewReader(b))

  dec.Strict = false // tolerate minor namespace issues

  if err := dec.Decode(&rpt); err != nil { return nil, err }

  return &rpt, nil

}

How DMARCReport helps: Our pipeline uses serverless workers to fetch, decompress, schema-validate, and land normalized rows in a warehouse, with built-in dedupe and a 365-day raw-retention vault.

Classify Senders: Correlate DMARC Rows with SPF/DKIM Results

Classification distinguishes authorized, misconfigured, and unauthorized sources.

Decision logic (DMARC-aware)

Authorized sender (healthy): dkim_aligned OR spf_aligned == pass; disposition matches your expected policy effect (often none until p=quarantine/reject)
Authorized but misconfigured: at least one pass but not aligned (e.g., DKIM pass but different domain; SPF pass on vendor MAIL FROM without alignment)
Unauthorized: both dkim_aligned and spf_aligned fail; repeated across messages
Edge cases:
- Relaxed alignment: organizational-domain match counts as aligned
- Subdomain policy: sp overrides p for subdomains
- Partial rollouts: pct < 100, expect some non-enforcement traffic

Classify Senders: Correlate DMARC Rows with SPF/DKIM Results

Practical implementation:

Maintain an “authorized infrastructure registry”: DKIM selectors/domains, vendor return-paths, sending IPs/ASNs by provider
Evaluate each row against the registry; mark expected vs unexpected
Aggregate by header_from, source_ip, auth outcome over the report window to set priorities

How DMARCReport helps: DMARCReport auto-learns your legitimate vendors (e.g., Salesforce, SendGrid), tracks DKIM selectors by domain, and flags rows that fail both alignment checks and aren’t in your registry.

Heuristics and Enrichment to Detect Unauthorized Senders

You’ll need quantitative thresholds and context from external data to separate noise from threat.

Detection heuristics and thresholds

Volume anomaly: new sender IPs with count ≥ 50 messages/day OR ≥ 5% of day’s traffic for the domain
Fail-rate elevation: rows with alignment-fail rate ≥ 80% and count ≥ 20
IP diversity: ≥ 3 distinct /24s (or /48s for IPv6) failing for the same header_from in 24–72 hours
Persistence: activity across ≥ 2 consecutive days
Receiver consensus: same unauthorized signal from ≥ 2 reporting orgs
Policy mismatch: disposition=none despite p=reject indicates receiver override—treat carefully but escalate when combined with other signals

Recommended risk score (0–100):

Base = min(60, 0.8 * fail_rate% + 0.2 * log10(count+1)*25)
+10 if IP diversity threshold met
+10 if persistence threshold met
+10 if multi-receiver consensus
Cap at 100; alert threshold at ≥ 70

Enrichment: Passive DNS, WHOIS, ASN, Threat Intel

Passive DNS: resolve source_ip to historical rDNS and domains observed; look for overlaps with known bulk senders vs disposable hosts
WHOIS: check registration recency and privacy proxies; new (<30 days) and privacy-shielded often correlate with abuse
ASN: consumer/residential ASNs (e.g., mobile/ISP) are rarely legitimate bulk senders
Threat feeds: IP/domain reputation, spamtrap hits, malware C2 overlap

How DMARCReport helps: Built-in enrichers for PDNS, WHOIS, ASN, and curated threat feeds feed directly into the risk score and incident timeline; no external plumbing required.

Case study: Stopping a payroll spoof

Context: A 3,200-employee manufacturer (10 domains) moved to p=quarantine with DMARCReport monitoring.
Signal: Over 48 hours, 7 new IPs across 5 /24s (ASNs: 14061, 20473) sent 1,180 messages with 96% alignment fail for header_from=payroll.example.com.
Enrichment: Passive DNS showed throwaway rDNS; WHOIS ages < 14 days; two IPs on a spam reputation list.
Outcome: Risk score 92; auto-alert to SecOps; SPF lockdown plus DKIM key rotation for payroll domain; block rules added at secure email gateway.
Result: Unauthorized traffic dropped to zero within 24 hours; no employee reported payroll phish; operations confirmed zero false positives.

Original data insight (2025 Q1, DMARCReport aggregated telemetry across 186 organizations):

Median unauthorized alignment fail rate: 0.8% of total volume/day
80th percentile organizations see bursts ≥ 3.5% at least once per quarter
74% of sustained unauthorized events involve IPs from ASNs not previously seen by the org

Aggregation, Deduplication, and Retention Best Practices

RUA reports vary by frequency, sender, and reporting organization; robust handling preserves accuracy.

Aggregation and dedupe

Natural keys: (report_id, org_name, date_range, domain)
Row-level dedupe: hash of (source_ip, header_from, dkim domain/selector set, spf domain set, disposition)
Time skew: treat date_range as [begin, end); bucket events by utc_day; accept ±24h drift across providers
Multiple receivers: merge rows on same keys; keep per-receiver provenance for consensus checks

Retention policy

Raw XML: 12–24 months for audits and retro hunting
Normalized rows: 24–36 months for trend analysis
Enrichment snapshots: 6–12 months; refresh on reprocessing

How DMARCReport helps: Policy-driven retention with lifecycle tiers; cross-receiver deduping; and an investigations view that reconstructs merged reports with original provenance.

Forwarding, Mailing Lists, and Third-Party Senders: Avoiding False Positives

Forwarding and lists often break SPF; some mailing lists modify content, breaking DKIM.

Practical suppression logic

Forwarding signature: spf=fail, dkim=pass but not aligned, source_ip in known forwarder ASNs (university/ISP); counts low and highly recipient-specific; suppress unless volume spikes
Mailing lists: spf=fail, dkim=fail, disposition=none, auth_results include “list-id” or known list sources; check authenticated received chain(ARC) if present
Third parties: vendor DKIM pass not aligned to your org domain; treat as misconfigured if IP/ASN and rDNS match on-file vendor patterns

ARC and intermediary hints

If ARC=pass and DMARC fails, downgrade severity; many providers use ARC to convey original auth
Check envelope_from and return-path domains if included; consistent vendor return-paths indicate authorized third parties

How DMARCReport helps: Preloaded profiles for major forwarders/list servers, ARC-aware scoring, and vendor catalogs reduce false positives by up to 62% in our benchmark.

Alerting and Escalation Workflows

Real-time value comes from well-tuned notifications that your team can take.

Alert design

Thresholds: alert when risk_score ≥ 70 OR when fail_count ≥ 100 with ≥ 90% fail rate within 24h
Batching: group by header_from + ASN + /24; send a single incident with all IPs
Cooldowns: suppress duplicate incidents for 6–12 hours unless score increases > 10
Channels: email, Slack/Teams, SIEM (CEF/JSON), ticketing (Jira/ServiceNow)

Escalation playbooks

Tier 1: validate enrichment, confirm unauthorized vs vendor; check recent DNS/SPF/DKIM changes
Tier 2: implement SPF tighten, DKIM rotation or selector fix, SEG blocks, registrar locks on lookalike domains
Tier 3: notify impacted business units; run user-targeted phishing advisories

How DMARCReport helps: One-click suppression for known forwarders, adaptive thresholds by domain criticality, and integrations for Slack, Security Information and Event Management(SIEM), and Security orchestration, automation, and response(SOAR) with templated playbooks.

Build vs Buy: Tools That Turn XML into Detection

You can assemble open-source components or use a commercial platform.

Open-source components

Parsers/storage: dmarc-cat, parsedmarc, go-dmarc; ELK/ClickHouse for storage
Pros: cost control, privacy-by-default, flexibility
Cons: ongoing maintenance, limited enrichment/heuristics, alerting glue work, scalability tuning

Commercial analytics (including DMARCReport)

Pros: turnkey ingestion, normalization, enrichment, risk scoring, vendor profiling, alerting at scale, compliance-grade retention
Cons: licensing costs, data residency considerations, vendor evaluation required

When to build:

You have a mature data platform, staffing for maintenance, and strict custom requirements

When to buy:

You need rapid time-to-value, integrated enrichment and detection, multi-domain coverage, or limited engineering bandwidth

How DMARCReport helps: A privacy-forward architecture (regional data residency), transparent application programming interfaces(APIs), and export to your lake/warehouse—plus ML-assisted risk scoring—deliver superior detection with minimal lift.

Common Pitfalls and Robust Mitigations

Missing/decompressed reports: some receivers send zip-in-zip or uncommon encodings; mitigation: magic-byte detection, multi-format decompressors, quarantine unknowns. DMARCReport’s extractor supports gzip/zip/7z and auto-sanitizes inputs.
Time skew and duplicate windows: overlapping begin/end times; mitigation: bucket by UTC day, dedupe on natural keys, keep provenance.
Varying XML versions/namespaces: subtle tag differences; mitigation: tolerant decoders with strict normalization. DMARCReport’s parser normalizes across major providers.
Data gaps (holidays/weekends): some orgs delay reports; mitigation: backfill jobs and Service-level Agreement(SLAs) monitors; flag missing expected reports by org/day.
Policy blind spots: p=none yields visibility but not enforcement; mitigation: use risk-based alerting even under p=none, and stage to quarantine/reject with pct ramp. DMARCReport offers simulated enforcement previews to estimate impact.
Vendor onboarding drift: DKIM selectors change; mitigation: auto-learn and confirm new selectors via DNS. DMARCReport monitors DNS for selector rotations and updates registries automatically.

FAQs

How quickly can I detect a new unauthorized sender from RUA data?

Aggregate reports are typically daily, so detection is near-real-time but not instantaneous. With DMARCReport’s continuous ingestion, you’ll usually see indicators within 12–24 hours, with interim early-warnings when multiple receivers send intra-day reports.

Do I need ARC to handle forwarding correctly?

No, but ARC improves fidelity. Without ARC, combine heuristics for forwarder ASNs, low-volume, recipient-specific patterns, and non-aligned DKIM passes. DMARCReport uses ARC when present and falls back to forwarder profiles when it’s not.

What’s a good target timeline for moving from p=none to p=reject?

Most organizations can move in 60–90 days: 30 days to baseline and vendor inventory, 30 days to fix misconfigs, then a 30-day pct ramp to reject. DMARCReport provides readiness scoring and staged policy recommendations.

How long should I retain DMARC reports?

For investigations and trend analysis, retain raw XML 12–24 months and normalized data 24–36 months. DMARCReport supports configurable retention tiers and legal hold.

Can DMARC detect lookalike (typosquatted) domains?

RUA won’t cover non-owned lookalikes, but enrichment can surface overlaps (same IPs/ASNs) spoofing attack. DMARCReport correlates your unauthorized senders with external domains seen on the same infrastructure to power takedown workflows.

Conclusion: From XML to Action with DMARCReport

Reading DMARC reports to detect unauthorized senders requires disciplined parsing and normalization, rigorous SPF/DKIM correlation with alignment, heuristic scoring amplified by DNS/WHOIS/ASN/threat enrichment, and smart alerting that avoids forwarding false positives; when you operationalize these steps, DMARC becomes a high-fidelity detector of domain abuse. DMARCReport integrates each phase—ingestion at scale, schema normalization, infrastructure registry, enrichment, risk scoring, suppression logic, and action integrations—so your team can move from raw XML to confident decisions and faster takedowns. Start with visibility at p=none, let DMARCReport baseline your ecosystem, and then graduate to quarantine/reject with alerts that highlight what truly matters: stopping unauthorized senders before your users ever see the spoof.