In enterprise infrastructure, “inventory” is usually a static spreadsheet or a CMDB record. Production, however, is dynamic: package versions move, kernel parameters get tweaked, new users appear, and unexpected ports get opened. That’s why I prefer to design inventory not as a “list” but as a queryable, living source of truth.
osquery makes this possible: you query the operating system as if it were a database. FleetDM then lets you deploy and manage osquery centrally across hundreds or thousands of hosts.
Architecture: a “working” install with the minimum components
The practical minimum architecture for FleetDM:
- Fleet server
- MySQL (state)
- Redis (queue / cache)
- osquery agent (on hosts)
Two production tips:
- Run Fleet inside a separate management segment (do not let it behave like a “general service” reachable from anywhere)
- Standardize TLS (via a reverse proxy or directly on Fleet)
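The component list and the two tips above can be turned into a small pre-flight check. This is a minimal sketch, assuming a hypothetical deployment description: the dictionary keys, the `MGMT_NET` range, and `check_deployment` are illustrative names, not Fleet configuration.

```python
import ipaddress

# Hypothetical management segment; replace with your own range.
MGMT_NET = ipaddress.ip_network("10.50.0.0/24")
# Server-side components from the minimum architecture.
REQUIRED = {"fleet", "mysql", "redis"}


def check_deployment(deploy: dict) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []

    missing = REQUIRED - set(deploy.get("components", []))
    if missing:
        problems.append("missing components: " + ", ".join(sorted(missing)))

    # Fleet should bind inside the management segment, not on every interface.
    listen = ipaddress.ip_address(deploy.get("fleet_listen", "0.0.0.0"))
    if listen not in MGMT_NET:
        problems.append(f"fleet listens on {listen}, outside {MGMT_NET}")

    if not deploy.get("tls", False):
        problems.append("TLS is not enabled")

    return problems
```

A check like this can run in CI against whatever file describes the deployment, so a change that exposes Fleet outside the management segment is caught before rollout.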
Agent rollout: “install” is not enough; “keep it alive” is the goal
On the agent side the targets are:
- Automated install (Ansible, Salt, cloud-init, etc.)
- Enrollment secret management (with a rotation plan)
- “Offline host” signal (is the agent missing, or is the host?)
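The “offline host” question needs a second, independent liveness signal to answer. A minimal sketch, assuming you have the agent's last check-in time plus some other timestamp for the host itself (for example from monitoring or the hypervisor); `classify_host` and the 30-minute default are hypothetical choices:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_host(
    agent_seen: Optional[datetime],
    host_seen: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(minutes=30),
) -> str:
    """Separate 'agent is missing' from 'host is missing'."""

    def fresh(ts: Optional[datetime]) -> bool:
        return ts is not None and now - ts <= max_age

    if fresh(agent_seen):
        return "healthy"
    if fresh(host_seen):
        # The host answers through another channel, so the agent is the problem.
        return "agent_down"
    return "host_down"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(classify_host(now - timedelta(hours=2), now - timedelta(minutes=1), now))
```

The two non-healthy outcomes usually go to different teams: `agent_down` is a rollout problem, `host_down` is an infrastructure problem.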
A practical approach:
- Install the agent via package manager or a signed binary
- Provide the enrollment secret from a file or a secret store
- Run it as a systemd service (with restart policy and resource limits)
- Set up labels / segments on day one (prod / pilot / dev, etc.)
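The steps above can be sketched as two small renderers that a configuration-management tool could call. This assumes a plain osquery agent enrolling over TLS; the endpoint paths, the secret file location, and the limit values are assumptions to verify against your Fleet and systemd versions, and `render_flagfile` / `render_dropin` are hypothetical helper names.

```python
def render_flagfile(fleet_host: str, secret_path: str) -> str:
    """Render an osquery flagfile that reads the enroll secret from a file."""
    flags = {
        "tls_hostname": fleet_host,
        "enroll_secret_path": secret_path,
        # Assumed endpoint paths; check the documentation for your Fleet version.
        "enroll_tls_endpoint": "/api/v1/osquery/enroll",
        "config_plugin": "tls",
        "config_tls_endpoint": "/api/v1/osquery/config",
        "host_identifier": "uuid",
    }
    return "\n".join(f"--{key}={value}" for key, value in flags.items()) + "\n"


def render_dropin(cpu_quota_pct: int, memory_max_mb: int) -> str:
    """Render a systemd drop-in with a restart policy and resource limits."""
    lines = [
        "[Service]",
        "Restart=on-failure",
        "RestartSec=10",
        f"CPUQuota={cpu_quota_pct}%",
        # MemoryMax applies on cgroup v2 hosts.
        f"MemoryMax={memory_max_mb}M",
    ]
    return "\n".join(lines) + "\n"


print(render_flagfile("fleet.example.internal:443", "/etc/osquery/enroll_secret"))
print(render_dropin(cpu_quota_pct=20, memory_max_mb=512))
```

Keeping the secret in a file referenced by `--enroll_secret_path` means rotation only touches that file, not the unit or the flagfile.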
Query pack design: not “everything,” but “what drives a decision”
Querying everything is tempting at the start, but it is not sustainable. I split packs into three:
- Inventory (once a day): kernel, OS version, package list, disk / mount
- Security baseline (once an hour): SSH config, sudoers changes, hashes of critical files
- Hunt / incident (temporary): specific IOCs, processes, ports, persistence
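The split above maps naturally onto the osquery pack format, where each query carries its own `interval` in seconds. A minimal sketch, with illustrative query names and a hypothetical `make_pack` helper; the table names (`os_version`, `kernel_info`, `hash`) are standard osquery tables, but the selection is only an example:

```python
import json

DAY = 86400   # inventory cadence: once a day
HOUR = 3600   # security baseline cadence: once an hour


def make_pack(queries: dict[str, str], interval: int) -> dict:
    """Build an osquery-style pack where every query shares one interval."""
    return {
        "queries": {
            name: {"query": sql, "interval": interval}
            for name, sql in queries.items()
        }
    }


inventory = make_pack(
    {
        "os_version": "SELECT name, version FROM os_version;",
        "kernel": "SELECT version FROM kernel_info;",
    },
    DAY,
)
baseline = make_pack(
    {"sudoers_hash": "SELECT path, sha256 FROM hash WHERE path = '/etc/sudoers';"},
    HOUR,
)

print(json.dumps(inventory, indent=2))
```

Keeping the cadence a property of the pack rather than of each query makes it harder for a heavy query to slip into the hourly set unnoticed.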
Example: critical SSH signals
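-- Server-side sshd settings, read from sshd_config through the augeas table
-- (assumes the augeas lenses are available on the host).
SELECT
  label,
  value
FROM augeas
WHERE path = '/etc/ssh/sshd_config'
  AND label IN ('PermitRootLogin', 'PasswordAuthentication', 'PubkeyAuthentication');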
Example: unexpected listening ports
SELECT
  p.pid,
  p.name,
  l.address,
  l.port,
  l.protocol
FROM listening_ports l
JOIN processes p ON l.pid = p.pid
WHERE l.port NOT IN (22, 80, 443);
Operational flow: where will I route the signal?
On their own, Fleet and osquery only produce “records.” For those records to produce value, route the signal to these three places:
- A log pipeline (e.g. Vector / OTel into search and archive)
- Alerting (thresholds / statistics: new user, new port, baseline drift)
- Runbooks (the first 5 steps when an alarm fires)
A rule that has worked for me in the field: a query without the full trio of query + alarm + runbook tends to be forgotten over time.
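That rule is easy to enforce mechanically. A minimal sketch, assuming a hypothetical registry in which every scheduled query names its alert rule and its runbook; the `Signal` class, the field names, and the example URL are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    query: str
    alert: Optional[str] = None     # name of the alert rule, if any
    runbook: Optional[str] = None   # link or path to the runbook, if any


def incomplete(signals: list[Signal]) -> list[str]:
    """Return the queries that are missing an alert, a runbook, or both."""
    return [s.query for s in signals if not (s.alert and s.runbook)]


signals = [
    Signal("unexpected_listening_ports", "new-port", "https://wiki.example/runbooks/ports"),
    Signal("sudoers_hash", alert="baseline-drift"),
    Signal("os_version"),
]
print(incomplete(signals))
```

Running this as a review gate keeps "orphan" queries from accumulating: a query is only scheduled once someone has decided who gets paged and what they do first.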
Performance and cost: keep the agent “quiet”
Common mistakes:
- Polling heavy tables like processes once a minute
- Running the same query on every host at the same cadence
- Streaming raw query output into the SIEM and blowing up costs
Solutions:
- Lower the frequency (inventory daily)
- Differentiate by labels (prod more frequently, dev less so)
- Summarize (send deltas, not the full list)
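To illustrate the last two points, here is a minimal sketch of label-based cadence and delta computation. osquery's differential logging already reports added and removed rows for scheduled queries, so in practice this logic usually lives in the agent; the code only shows the idea for a pipeline that receives full snapshots. `INTERVALS`, `interval_for`, and `delta` are hypothetical names.

```python
# Hypothetical cadence per label, in seconds: prod more often, dev less so.
INTERVALS = {"prod": 3600, "pilot": 7200, "dev": 21600}


def interval_for(label: str, default: int = 86400) -> int:
    """Pick a polling interval from the host's label."""
    return INTERVALS.get(label, default)


def delta(previous: list[dict], current: list[dict]) -> dict:
    """Compare two result snapshots and keep only what changed."""
    prev = {tuple(sorted(row.items())) for row in previous}
    curr = {tuple(sorted(row.items())) for row in current}
    return {
        "added": [dict(row) for row in sorted(curr - prev)],
        "removed": [dict(row) for row in sorted(prev - curr)],
    }


yesterday = [{"name": "sshd", "port": "22"}]
today = [{"name": "sshd", "port": "22"}, {"name": "python3", "port": "8080"}]
print(delta(yesterday, today))
```

Forwarding only the `added` and `removed` rows to the SIEM keeps volume proportional to change rather than to fleet size.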
Use during incidents: a “hunt pack” for controlled hunting
osquery is highly useful in incidents, but the rule is that a hunt pack is temporary. Typical contents:
- IOC hashes
- Suspicious process names
- Persistence points (cron, systemd, rc scripts)
When the incident closes, retire the pack. As long as “incident queries” linger, both cost and risk pile up.
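One way to make retirement hard to forget is to give every hunt pack an expiry date when it is created. A minimal sketch, assuming a hypothetical registry kept next to the pack definitions; the `expires` field and the incident IDs are illustrative metadata, not something Fleet or osquery provides:

```python
from datetime import date

# Hypothetical registry: pack name -> incident reference and expiry date.
hunt_packs = {
    "hunt-suspicious-cron": {"incident": "INC-0001", "expires": date(2024, 3, 1)},
    "hunt-ioc-hashes": {"incident": "INC-0002", "expires": date(2024, 6, 1)},
}


def expired_packs(packs: dict, today: date) -> list[str]:
    """Return the names of hunt packs whose expiry date has passed."""
    return sorted(name for name, meta in packs.items() if meta["expires"] < today)


print(expired_packs(hunt_packs, date(2024, 4, 15)))
```

A scheduled job can run this check and open a ticket, or fail a pipeline, for every pack it returns.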
Conclusion
FleetDM + osquery turns enterprise inventory from a “document” into a shared query layer for operations and security. Set up well, it surfaces drift, unexpected ports, baseline deviations, and incident signals quickly. Set up poorly, it produces noise. So start small and grow with packs that actually drive decisions.