Using parsedmarc
CLI help
usage: parsedmarc [-h] [-c CONFIG_FILE] [--strip-attachment-payloads] [-o OUTPUT]
[--aggregate-json-filename AGGREGATE_JSON_FILENAME]
[--forensic-json-filename FORENSIC_JSON_FILENAME]
[--aggregate-csv-filename AGGREGATE_CSV_FILENAME]
[--forensic-csv-filename FORENSIC_CSV_FILENAME]
[-n NAMESERVERS [NAMESERVERS ...]] [-t DNS_TIMEOUT] [--offline]
[-s] [--verbose] [--debug] [--log-file LOG_FILE] [-v]
[file_path ...]
Parses DMARC reports
positional arguments:
file_path one or more paths to aggregate or forensic report
files, emails, or mbox files'
optional arguments:
-h, --help show this help message and exit
-c CONFIG_FILE, --config-file CONFIG_FILE
a path to a configuration file (--silent implied)
--strip-attachment-payloads
remove attachment payloads from forensic report output
-o OUTPUT, --output OUTPUT
write output files to the given directory
--aggregate-json-filename AGGREGATE_JSON_FILENAME
filename for the aggregate JSON output file
--forensic-json-filename FORENSIC_JSON_FILENAME
filename for the forensic JSON output file
--aggregate-csv-filename AGGREGATE_CSV_FILENAME
filename for the aggregate CSV output file
--forensic-csv-filename FORENSIC_CSV_FILENAME
filename for the forensic CSV output file
-n NAMESERVERS [NAMESERVERS ...], --nameservers NAMESERVERS [NAMESERVERS ...]
nameservers to query
-t DNS_TIMEOUT, --dns_timeout DNS_TIMEOUT
number of seconds to wait for an answer from DNS
(default: 2.0)
--offline do not make online queries for geolocation or DNS
-s, --silent only print errors and warnings
--verbose more verbose output
--debug print debugging information
--log-file LOG_FILE output logging to a file
-v, --version show program's version number and exit
Note
Starting in parsedmarc 6.0.0, most CLI options were moved to a configuration file, described below.
Configuration file
parsedmarc can be configured by supplying the path to an INI file:
parsedmarc -c /etc/parsedmarc.ini
For example:
# This is an example comment
[general]
save_aggregate = True
save_forensic = True
[imap]
host = imap.example.com
user = dmarcreports@example.com
password = $uperSecure
[mailbox]
watch = True
delete = False
[elasticsearch]
hosts = 127.0.0.1:9200
ssl = False
[opensearch]
hosts = https://admin:admin@127.0.0.1:9200
ssl = True
[splunk_hec]
url = https://splunkhec.example.com
token = HECTokenGoesHere
index = email
[s3]
bucket = my-bucket
path = parsedmarc
[syslog]
server = localhost
port = 514
[gelf]
host = logger
port = 12201
mode = tcp
[webhook]
aggregate_url = https://aggregate_url.example.com
forensic_url = https://forensic_url.example.com
smtp_tls_url = https://smtp_tls_url.example.com
timeout = 60
The full set of configuration options is:
general
- save_aggregate - bool: Save aggregate report data to Elasticsearch, Splunk and/or S3
- save_forensic - bool: Save forensic report data to Elasticsearch, Splunk and/or S3
- save_smtp_tls - bool: Save SMTP TLS report data to Elasticsearch, Splunk and/or S3
- strip_attachment_payloads - bool: Remove attachment payloads from results
- output - str: Directory to place JSON and CSV files in. This is required if you set either of the JSON output file options.
- aggregate_json_filename - str: Filename for the aggregate JSON output file
- forensic_json_filename - str: Filename for the forensic JSON output file
- ip_db_path - str: An optional custom path to an MMDB file from MaxMind or DBIP
- offline - bool: Do not use online queries for geolocation or DNS
- always_use_local_files - Disables the download of the reverse DNS map
- local_reverse_dns_map_path - Overrides the default local file path to use for the reverse DNS map
- reverse_dns_map_url - Overrides the default download URL for the reverse DNS map
- nameservers - str: A comma separated list of DNS resolvers (Default: Cloudflare's public resolvers)
- dns_test_address - str: A dummy address used for DNS pre-flight checks (Default: 1.1.1.1)
- dns_timeout - float: DNS timeout period in seconds
- debug - bool: Print debugging messages
- silent - bool: Only print errors (Default: True)
- log_file - str: Write log messages to a file at this path
- n_procs - int: Number of processes to run in parallel when parsing in CLI mode (Default: 1)
  Note
  Setting this to a number larger than one can improve performance when processing thousands of files
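The rule that output must be set whenever a JSON filename option is used can be checked before starting parsedmarc. The following is a minimal sketch, assuming the file can be read with Python's standard configparser module; missing_output_dir is a hypothetical helper for illustration and is not part of parsedmarc.

```python
from configparser import ConfigParser

# Options that, per the list above, require `output` to be set as well.
JSON_FILENAME_OPTIONS = ("aggregate_json_filename", "forensic_json_filename")


def missing_output_dir(config_text: str) -> bool:
    """Return True if a JSON filename option is set without `output`.

    Hypothetical pre-flight check; parsedmarc itself may validate differently.
    """
    parser = ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("general"):
        return False
    general = parser["general"]
    wants_json = any(option in general for option in JSON_FILENAME_OPTIONS)
    return wants_json and "output" not in general
```

Running a check like this in a deployment script surfaces the misconfiguration early instead of at parse time.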
mailbox
- reports_folder - str: The mailbox folder (or label for Gmail) where the incoming reports can be found (Default: INBOX)
- archive_folder - str: The mailbox folder (or label for Gmail) to sort processed emails into (Default: Archive)
- watch - bool: Use the IMAP IDLE command to process messages as they arrive, or poll MS Graph for new messages
- delete - bool: Delete messages after processing them, instead of archiving them
- test - bool: Do not move or delete messages
- batch_size - int: Number of messages to read and process before saving (Default: 10). Use 0 for no limit.
- check_timeout - int: Number of seconds to wait for an IMAP IDLE response, or the number of seconds until the next mail check (Default: 30)
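The batch_size semantics described above (process a fixed number of messages, then save, with 0 meaning no limit) can be illustrated with a short sketch. This only models the documented behavior; iter_batches and the integer message IDs are hypothetical and do not reflect parsedmarc's internal implementation.

```python
from typing import Iterator, List, Sequence


def iter_batches(message_ids: Sequence[int], batch_size: int = 10) -> Iterator[List[int]]:
    """Yield groups of message IDs; a batch_size of 0 yields everything at once."""
    if batch_size <= 0:
        # "No limit": all messages are handled in a single batch.
        if message_ids:
            yield list(message_ids)
        return
    for start in range(0, len(message_ids), batch_size):
        yield list(message_ids[start:start + batch_size])


# 25 pending messages with the default batch size give batches of 10, 10 and 5.
batch_sizes = [len(batch) for batch in iter_batches(list(range(25)))]
```

Smaller batches mean results are saved more often, at the cost of more save operations per run.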
imap
- host - str: The IMAP server hostname or IP address
- port - int: The IMAP server port (Default: 993)
  Note
  % characters must be escaped with another % character, so use %% wherever a % character is used.
  Note
  Starting in version 8.0.0, most options from the imap section have been moved to the mailbox section.
  Note
  If your host recommends another port, still try 993
- ssl - bool: Use an encrypted SSL/TLS connection (Default: True)
- skip_certificate_verification - bool: Skip certificate verification (not recommended)
- user - str: The IMAP user
- password - str: The IMAP password
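The %% escaping rule is the same one enforced by the default interpolation in Python's configparser module. The sketch below assumes the INI file is read that way (an assumption consistent with the note above, not something this section states) and shows what happens to an escaped and an unescaped password.

```python
from configparser import ConfigParser, InterpolationSyntaxError

# Escaped: "%%" in the file is read back as a single "%".
escaped = ConfigParser()
escaped.read_string("[imap]\npassword = 50%%off\n")
password = escaped["imap"]["password"]

# Unescaped: a lone "%" is rejected when the value is read.
unescaped = ConfigParser()
unescaped.read_string("[imap]\npassword = 50%off\n")
try:
    unescaped["imap"]["password"]
    error = None
except InterpolationSyntaxError as exc:
    error = exc
```

Under this assumption, the file itself loads without complaint and the failure only appears when the option is accessed, which is why an unescaped % can be easy to miss.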
msgraph
- auth_method - str: Authentication method, valid types are UsernamePassword, DeviceCode, or ClientSecret (Default: UsernamePassword).
- user - str: The M365 user, required when the auth method is UsernamePassword
- password - str: The user password, required when the auth method is UsernamePassword
- client_id - str: The app registration’s client ID
- client_secret - str: The app registration’s secret
- tenant_id - str: The Azure AD tenant ID. This is required for all auth methods except UsernamePassword.
- mailbox - str: The mailbox name. This defaults to the current user if using the UsernamePassword auth method, but could be a shared mailbox if the user has access to the mailbox
- token_file - str: Path to save the token file (Default: .token)
- allow_unencrypted_storage - bool: Allows the Azure Identity module to fall back to unencrypted token cache (Default: False). Even if enabled, the cache will always try encrypted storage first.
  Note
  You must create an app registration in Azure AD and have an admin grant the Microsoft Graph Mail.ReadWrite (delegated) permission to the app. If you are using UsernamePassword auth and the mailbox is different from the username, you must grant the app Mail.ReadWrite.Shared.
  Warning
  If you are using the ClientSecret auth method, you need to grant the Mail.ReadWrite (application) permission to the app. You must also restrict the application’s access to a specific mailbox since it allows all mailboxes by default. Use the New-ApplicationAccessPolicy command in the Exchange PowerShell module. If you need to scope the policy to shared mailboxes, you can add them to a mail enabled security group and use that as the group id.
  New-ApplicationAccessPolicy -AccessRight RestrictAccess -AppId "<CLIENT_ID>" -PolicyScopeGroupId "<MAILBOX>" -Description "Restrict access to dmarc reports mailbox."
elasticsearch
- hosts - str: A comma separated list of hostnames and ports or URLs (e.g. 127.0.0.1:9200 or https://user:secret@localhost)
  Note
  Special characters in the username or password must be URL encoded.
- user - str: Basic auth username
- password - str: Basic auth password
- apiKey - str: API key
- ssl - bool: Use an encrypted SSL/TLS connection (Default: True)
- timeout - float: Timeout in seconds (Default: 60)
- cert_path - str: Path to trusted certificates
- index_suffix - str: A suffix to apply to the index names
- index_prefix - str: A prefix to apply to the index names
- monthly_indexes - bool: Use monthly indexes instead of daily indexes
- number_of_shards - int: The number of shards to use when creating the index (Default: 1)
- number_of_replicas - int: The number of replicas to use when creating the index (Default: 0)
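URL encoding of credentials for a hosts entry can be done with Python's standard library. This is a minimal sketch; host_url is a hypothetical helper and the credentials are made up for illustration.

```python
from urllib.parse import quote


def host_url(user: str, password: str, host: str, scheme: str = "https") -> str:
    """Build a URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including "/" and ":".
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}"


url = host_url("user", "p@ss:w/rd", "localhost")
# url == "https://user:p%40ss%3Aw%2Frd@localhost"
```

If the configuration file is read with % interpolation enabled, as the %% note in the imap section suggests, each % in the encoded value would presumably also need to be doubled before it is written to the INI file (for example with url.replace("%", "%%")). Treat that as an assumption to verify against your installed version.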
opensearch
- hosts - str: A comma separated list of hostnames and ports or URLs (e.g. 127.0.0.1:9200 or https://user:secret@localhost)
  Note
  Special characters in the username or password must be URL encoded.
- user - str: Basic auth username
- password - str: Basic auth password
- apiKey - str: API key
- ssl - bool: Use an encrypted SSL/TLS connection (Default: True)
- timeout - float: Timeout in seconds (Default: 60)
- cert_path - str: Path to trusted certificates
- index_suffix - str: A suffix to apply to the index names
- index_prefix - str: A prefix to apply to the index names
- monthly_indexes - bool: Use monthly indexes instead of daily indexes
- number_of_shards - int: The number of shards to use when creating the index (Default: 1)
- number_of_replicas - int: The number of replicas to use when creating the index (Default: 0)
splunk_hec
- url - str: The URL of the Splunk HTTP Events Collector (HEC)
- token - str: The HEC token
- index - str: The Splunk index to use
- skip_certificate_verification - bool: Skip certificate verification (not recommended)
kafka
- hosts - str: A comma separated list of Kafka hosts
- user - str: The Kafka user
- password - str: The Kafka password
- ssl - bool: Use an encrypted SSL/TLS connection (Default: True)
- skip_certificate_verification - bool: Skip certificate verification (not recommended)
- aggregate_topic - str: The Kafka topic for aggregate reports
- forensic_topic - str: The Kafka topic for forensic reports
smtp
- host - str: The SMTP hostname
- port - int: The SMTP port (Default: 25)
- ssl - bool: Require SSL/TLS instead of using STARTTLS
- skip_certificate_verification - bool: Skip certificate verification (not recommended)
- user - str: The SMTP username
- password - str: The SMTP password
- from - str: The From header to use in the email
- to - list: A list of email addresses to send to
- subject - str: The Subject header to use in the email (Default: parsedmarc report)
- attachment - str: The ZIP attachment filename
- message - str: The email message (Default: Please see the attached parsedmarc report.)
  Note
  % characters must be escaped with another % character, so use %% wherever a % character is used.
s3
- bucket - str: The S3 bucket name
- path - str: The path to upload reports to (Default: /)
- region_name - str: The region name (Optional)
- endpoint_url - str: The endpoint URL (Optional)
- access_key_id - str: The access key id (Optional)
- secret_access_key - str: The secret access key (Optional)
syslog
- server - str: The Syslog server name or IP address
- port - int: The UDP port to use (Default: 514)
gmail_api
- credentials_file - str: Path to file containing the credentials, None to disable (Default: None)
- token_file - str: Path to save the token file (Default: .token)
  Note
  credentials_file and token_file can be obtained with the quickstart. Please change the scope to https://www.googleapis.com/auth/gmail.modify.
- include_spam_trash - bool: Include messages in Spam and Trash when searching reports (Default: False)
- scopes - str: Comma separated list of scopes to use when acquiring credentials (Default: https://www.googleapis.com/auth/gmail.modify)
- oauth2_port - int: The TCP port for the local server to listen on for the OAuth2 response (Default: 8080)
- paginate_messages - bool: When True, fetch all applicable Gmail messages. When False, only fetch up to 100 new messages per run (Default: True)
log_analytics
- client_id - str: The app registration’s client ID
- client_secret - str: The app registration’s client secret
- tenant_id - str: The tenant id where the app registration resides
- dce - str: The Data Collection Endpoint (DCE). Example: https://{DCE-NAME}.{REGION}.ingest.monitor.azure.com.
- dcr_immutable_id - str: The immutable ID of the Data Collection Rule (DCR)
- dcr_aggregate_stream - str: The stream name for aggregate reports in the DCR
- dcr_forensic_stream - str: The stream name for the forensic reports in the DCR
- dcr_smtp_tls_stream - str: The stream name for the SMTP TLS reports in the DCR
  Note
  Information regarding the setup of the Data Collection Rule can be found here.
gelf
- host - str: The GELF server name or IP address
- port - int: The port to use
- mode - str: The GELF transport type to use. Valid modes: tcp, udp, tls
maildir
- reports_folder - str: Full path to the maildir location (Default: INBOX)
- maildir_create - bool: Create maildir if not present (Default: False)
webhook
Post the individual reports to a webhook URL with the report as the JSON body
- aggregate_url - str: URL of the webhook which should receive the aggregate reports
- forensic_url - str: URL of the webhook which should receive the forensic reports
- smtp_tls_url - str: URL of the webhook which should receive the SMTP TLS reports
- timeout - int: Time after which the webhook call should time out
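Because each report is posted as the JSON body of the request, a receiving endpoint only needs to read the body and decode it. The following is a minimal receiver sketch using Python's standard library. The port, the in-memory received list, and the assumption that each body is a single JSON object are illustrative choices, not details taken from parsedmarc.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # reports kept in memory, for illustration only


def decode_report(body: bytes) -> dict:
    """Decode a request body that is assumed to hold one JSON object."""
    report = json.loads(body.decode("utf-8"))
    if not isinstance(report, dict):
        raise ValueError("expected a JSON object")
    return report


class ReportHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            received.append(decode_report(self.rfile.read(length)))
            self.send_response(204)  # accepted, no response body
        except ValueError:
            # Covers malformed JSON, bad UTF-8, and non-object payloads.
            self.send_response(400)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (hypothetical port)."""
    HTTPServer(("127.0.0.1", port), ReportHandler).serve_forever()
```

Calling serve() and pointing aggregate_url at http://127.0.0.1:8000/ would be one way to inspect what is actually sent before wiring up a real consumer.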
Warning
It is strongly recommended to not use the nameservers setting. By default, parsedmarc uses Cloudflare’s public resolvers, which are much faster and more reliable than Google, Cisco OpenDNS, or even most local resolvers.
The nameservers option should only be used if your network blocks DNS requests to outside resolvers.
Note
save_aggregate and save_forensic are separate options because you may not want to save forensic reports (also known as failure reports) to your Elasticsearch instance, particularly if you are in a highly-regulated industry that handles sensitive data, such as healthcare or finance. If your legitimate outgoing email fails DMARC, it is possible that email may appear later in a forensic report.
Forensic reports contain the original headers of an email that failed a DMARC check, and sometimes may also include the full message body, depending on the policy of the reporting organization.
Most reporting organizations do not send forensic reports of any kind for privacy reasons. While aggregate DMARC reports are sent at least daily, it is normal to receive very few forensic reports.
An alternative approach is to still collect forensic/failure/ruf reports in your DMARC inbox, but run parsedmarc with save_forensic = True manually on a separate IMAP folder (using the reports_folder option), after you have manually moved known samples you want to save to that folder (e.g. malicious samples and non-sensitive legitimate samples).
Warning
Elasticsearch 8 changed the shard limit policy, restricting each node to 1000 shards by default. parsedmarc uses one shard per analyzed day. If you have more than ~3 years of data, you will need to raise this limit. Check current usage (from Management -> Dev Tools -> Console):
GET /_cluster/health?pretty
{
...
"active_primary_shards": 932,
"active_shards": 932,
...
}
Update the limit, for example to 2000:
PUT _cluster/settings
{
"persistent" : {
"cluster.max_shards_per_node" : 2000
}
}
Increasing this value increases resource usage.
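The "~3 years" figure follows from 1000 shards divided by one new shard per day (1000 / 365 is about 2.7 years). The sketch below estimates how many days remain before the limit is reached. It assumes a single node and one new shard per day, as the warning describes; days_until_shard_limit is a hypothetical helper, and real usage may grow faster if several indexes are created per day.

```python
def days_until_shard_limit(active_shards: int,
                           max_shards_per_node: int = 1000,
                           nodes: int = 1,
                           new_shards_per_day: int = 1) -> int:
    """Estimate the days left before the cluster-wide shard limit is hit."""
    if new_shards_per_day <= 0:
        raise ValueError("new_shards_per_day must be positive")
    capacity = max_shards_per_node * nodes
    remaining = max(capacity - active_shards, 0)
    return remaining // new_shards_per_day


# Using the active_shards value from the health output above.
days_left_default = days_until_shard_limit(932)
days_left_raised = days_until_shard_limit(932, max_shards_per_node=2000)
```

With 932 active shards, the default limit leaves 68 days of headroom, and raising the limit to 2000 leaves 1068.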
Running parsedmarc as a systemd service
Use systemd to run parsedmarc as a service and process reports as they arrive.
Protect the parsedmarc configuration file from prying eyes:
sudo chown root:parsedmarc /etc/parsedmarc.ini
sudo chmod u=rw,g=r,o= /etc/parsedmarc.ini
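The mode u=rw,g=r,o= corresponds to octal 640: read and write for the owner, read for the group, nothing for others. A small check like the following can confirm the result on a POSIX system. is_private_config is a hypothetical helper, shown only as a sketch.

```python
import os
import stat


def is_private_config(path: str) -> bool:
    """Return True when the file mode is exactly u=rw,g=r,o= (0o640)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o640
```

For example, is_private_config("/etc/parsedmarc.ini") should return True after the chmod command above has been applied.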
Create the service configuration file
sudo nano /etc/systemd/system/parsedmarc.service
[Unit]
Description=parsedmarc mailbox watcher
Documentation=https://domainaware.github.io/parsedmarc/
Wants=network-online.target
After=network.target network-online.target elasticsearch.service
[Service]
ExecStart=/opt/parsedmarc/venv/bin/parsedmarc -c /etc/parsedmarc.ini
User=parsedmarc
Group=parsedmarc
Restart=always
RestartSec=5m
[Install]
WantedBy=multi-user.target
Then, enable the service
sudo systemctl daemon-reload
sudo systemctl enable parsedmarc.service
sudo service parsedmarc restart
Note
You must also run the above commands whenever you edit parsedmarc.service.
Warning
Always restart the service every time you upgrade to a new version of parsedmarc:
sudo service parsedmarc restart
To check the status of the service, run:
service parsedmarc status
Note
In the event of a crash, systemd will restart the service after 5 minutes (the RestartSec value in the unit file above), but the service parsedmarc status command will only show the logs for the current process. To view the logs for previous runs as well as the current process (newest to oldest), run:
journalctl -u parsedmarc.service -r