Overview

We help customers analyze computer-generated log files. While logs from a single server can often be analyzed with simple text-based tools (e.g. grep), this approach does not scale well as the number of log sources and the total log volume increase. As infrastructure grows, it is common to forward all logs to a central location for easier access and to enable global analysis.

While storing terabytes of logs is not much of a problem, analyzing large volumes of logs requires significant compute power. At the same time, this compute sits idle whenever no analysis is taking place, which is most of the time. Much better utilization can be achieved by pooling logs from different organizations, which is why log analysis is often outsourced to third-party providers.

Many log analysis solutions pre-index (often structured) logs and thereby trade away flexibility for query speed. In contrast, we provide large-scale log analysis based on a massively parallelized regular expression search across the original logs.
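
The idea of a parallel regex scan over unindexed logs can be illustrated with a minimal sketch. It is not our actual implementation: the function names, the chunking, and the use of a thread pool are assumptions made for illustration only.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(pattern, lines):
    """Return the lines of one chunk that match the compiled pattern."""
    return [line for line in lines if pattern.search(line)]


def parallel_grep(regex, chunks, workers=4):
    """Scan every chunk independently and concatenate matches in chunk order.

    `chunks` is a list of lists of raw log lines (hypothetical layout).
    """
    pattern = re.compile(regex)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the input chunks.
        results = pool.map(lambda chunk: scan_chunk(pattern, chunk), chunks)
        return [line for matches in results for line in matches]


chunks = [
    ["GET /index 200", "GET /missing 404"],
    ["POST /login 500"],
    [],
]
errors = parallel_grep(r" (4|5)\d\d$", chunks)
```

Because each chunk is scanned independently of the others, the work can be spread over as many workers as there are chunks, and no index has to be built or kept in sync with the logs.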

Ingestion

To be analyzed, logs must first reach us. This is done by installing log forwarding agents on the machines generating the logs (or on dedicated forwarding hosts) within your infrastructure. These forwarding agents handle authentication to our ingestion endpoints and seamless failover between them. Please see our setup guides for the various forwarding agent implementations we currently offer.

Incoming logs are assigned to a physical log stream, which is identified by up to four textual dimensions. We call these dimensions "prefixes" because queries and access controls match on prefixes of their values. Typical setups use prefix0 for the hostname of the generating machine and prefix1 for the log file name or service type. This allows queries to target either all logs of a machine (by specifying a prefix0) or all logs of a certain service (by specifying a prefix1). Certain properties (e.g. retention time) are configurable per physical log stream.
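
As a rough sketch of this model, a log stream can be thought of as a record with four string dimensions, and a query as a set of prefixes to match against them. The class, the function, and the `retention_days` field below are hypothetical names chosen for illustration and are not part of our API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogStream:
    prefix0: str = ""  # typically the hostname
    prefix1: str = ""  # typically the log file name or service type
    prefix2: str = ""
    prefix3: str = ""
    retention_days: int = 30  # hypothetical per-stream property

    def dimensions(self):
        return (self.prefix0, self.prefix1, self.prefix2, self.prefix3)


def select_streams(streams, prefix0="", prefix1="", prefix2="", prefix3=""):
    """Return the streams whose dimensions start with the given prefixes.

    An empty prefix matches every value of that dimension.
    """
    wanted = (prefix0, prefix1, prefix2, prefix3)
    return [
        s for s in streams
        if all(dim.startswith(w) for dim, w in zip(s.dimensions(), wanted))
    ]


streams = [
    LogStream("web1.example.com", "nginx"),
    LogStream("web1.example.com", "sshd"),
    LogStream("db1.example.com", "sshd"),
]
all_of_web1 = select_streams(streams, prefix0="web1")
all_sshd = select_streams(streams, prefix1="sshd")
```

Here `all_of_web1` selects every stream of one machine, while `all_sshd` selects one service across all machines.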

Analysis

Log analysis happens either in our web UI or, for advanced cases, via API access to our search infrastructure. All log queries are ultimately answered by a full scan across the stored logs, pre-filtered by time range and physical log stream. We cache results transparently and can reuse partial results from earlier scans for similar queries. Please see the querying guide for details on query types, supported regex syntax and result aggregations. For automated or very complex analysis, please review the API documentation.
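
One way to picture a pre-filtered scan with reusable partial results is the sketch below. It assumes that results are cached per fixed time bucket, so that two queries with overlapping time ranges share the buckets they have in common. The bucket size, the cache key, and all names are assumptions for illustration; the actual caching strategy is not described here.

```python
import re

BUCKET_SECONDS = 3600  # assumed cache granularity


class Scanner:
    def __init__(self, records):
        # records: list of (unix_timestamp, stream_name, line) tuples
        self.records = records
        self.cache = {}  # (regex, stream_prefix, bucket) -> [(ts, line), ...]
        self.scans = 0   # number of buckets that were actually scanned

    def _scan_bucket(self, regex, stream_prefix, bucket):
        key = (regex, stream_prefix, bucket)
        if key not in self.cache:
            self.scans += 1
            pattern = re.compile(regex)
            lo = bucket * BUCKET_SECONDS
            hi = lo + BUCKET_SECONDS
            # Pre-filter by time range and stream, then apply the regex.
            self.cache[key] = [
                (ts, line)
                for ts, stream, line in self.records
                if lo <= ts < hi
                and stream.startswith(stream_prefix)
                and pattern.search(line)
            ]
        return self.cache[key]

    def query(self, regex, stream_prefix, start, end):
        """Return matching lines with start <= timestamp < end."""
        first = start // BUCKET_SECONDS
        last = (end - 1) // BUCKET_SECONDS
        lines = []
        for bucket in range(first, last + 1):
            for ts, line in self._scan_bucket(regex, stream_prefix, bucket):
                if start <= ts < end:
                    lines.append(line)
        return lines


records = [
    (10, "web.nginx", "GET / 200"),
    (3700, "web.nginx", "GET /x 500"),
    (7300, "web.nginx", "GET /y 500"),
    (3800, "db.pg", "error 500"),
]
scanner = Scanner(records)
first_result = scanner.query(r"500", "web", 0, 7200)
scans_after_first = scanner.scans
second_result = scanner.query(r"500", "web", 3600, 10800)
scans_after_second = scanner.scans
```

The second query overlaps the first by one bucket, so only one additional bucket has to be scanned.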

Access Controls

We currently define three access roles: Writer (i.e. the permission to forward logs), Reader (i.e. the permission to query logs), and Admin (i.e. the permission to change configuration and permissions).

Permission to forward logs into a specific log stream is granted via write tokens, which encode both the authentication and the target log stream.

Permission to query logs is granted on prefixes along the four dimensions defining the log stream. A user with read access on

prefix0: "com.example"
prefix1: ""
prefix2: ""
prefix3: ""

has access to any log stream where prefix0 starts with "com.example" (e.g. "com.example.www", "com.example.test") and can query along prefix1 through prefix3 freely. A single user can hold multiple access permissions for different combinations of prefixes.
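
The read check described above can be sketched as a per-dimension prefix match. The names `Permission`, `permits` and `can_read` are hypothetical and only illustrate the rule; they are not part of our API.

```python
from typing import NamedTuple


class Permission(NamedTuple):
    prefix0: str = ""
    prefix1: str = ""
    prefix2: str = ""
    prefix3: str = ""


def permits(permission, stream):
    """True if each stream dimension starts with the permission's prefix.

    `stream` is a 4-tuple (prefix0, prefix1, prefix2, prefix3).
    An empty permission prefix matches any value.
    """
    return all(dim.startswith(pre) for pre, dim in zip(permission, stream))


def can_read(permissions, stream):
    """A user may read a stream if any one of their permissions covers it."""
    return any(permits(p, stream) for p in permissions)


user_permissions = [Permission(prefix0="com.example")]
allowed = can_read(user_permissions, ("com.example.www", "nginx", "", ""))
denied = can_read(user_permissions, ("org.example", "nginx", "", ""))
```

In this sketch the match is a plain string prefix test, so the example permission covers every stream whose prefix0 begins with the characters "com.example", whatever follows them.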

Admin permissions follow the same model as query permissions. An admin permission with prefix0 set to "com.example" can only configure log streams which have prefix0 starting with "com.example" and can only grant and revoke permissions which have prefix0 starting with "com.example".
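
The restriction on granting can be sketched as a containment test: an admin may hand out a permission only if that permission is at least as narrow as the admin's own. The function name is hypothetical, and applying the rule to all four dimensions (rather than just prefix0) is an assumption based on admin permissions following the same model as query permissions.

```python
def covers(admin, granted):
    """True if every prefix in `granted` starts with the admin's prefix.

    Both arguments are 4-tuples (prefix0, prefix1, prefix2, prefix3).
    """
    return all(g.startswith(a) for a, g in zip(admin, granted))


admin = ("com.example", "", "", "")
narrower = covers(admin, ("com.example.www", "nginx", "", ""))
broader = covers(admin, ("com", "", "", ""))
```

Here `narrower` is accepted, while `broader` is rejected because the prefix "com" would reach streams outside "com.example".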