Designing a Scalable, Event-Driven Log Entry Service
After establishing why you’ll eventually build your own logging system, this post dives into the actual architecture: event-driven design, CQRS, and strategic use of Kafka, Cassandra, and Postgres.
This is a simple system built from well-understood patterns to deliver forensic-grade logging where every line matters.
Domain Model: Workflows, Not Web Servers
The domain model reflects the forensic logging context:
Tenant: A system generating logs (e.g., an ETL pipeline system, a CI/CD platform)
Log: Owned by a domain aggregate within that system. Could be a specific job execution, pipeline run, or workflow instance. Identified by (tenant_id, log_id)
LogEntry: An individual line within that log, containing:
- The actual log payload (stdout, stderr, structured data)
- A timestamp (when the event occurred)
- A receive_timestamp (when the logging service received it)
- A system-assigned sequential log_entry_id within that log
This model captures the reality: systems have domain aggregates (job runs, pipeline executions) that own their logs, and order within each log is sacred.
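This domain model can be sketched as plain data types. Field names here mirror the text; the `LogRef` type and `sort_key` helper are illustrative conveniences, not part of the service's API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Union

@dataclass(frozen=True)
class LogRef:
    """A log is identified by (tenant_id, log_id)."""
    tenant_id: str
    log_id: str

@dataclass
class LogEntry:
    """One line within a log; order inside a log follows log_entry_id."""
    log: LogRef
    payload: Union[str, dict]      # stdout/stderr text or structured data
    timestamp: datetime            # when the event occurred
    receive_timestamp: datetime    # when the logging service received it
    log_entry_id: int              # system-assigned, sequential per log

def sort_key(entry: LogEntry):
    """Stable ordering within a log: the sequential id, not wall-clock time."""
    return (entry.log.tenant_id, entry.log.log_id, entry.log_entry_id)
```

Note that ordering keys on the system-assigned `log_entry_id` rather than on timestamps, which may collide or arrive skewed.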
The Requirements: No Line Left Behind
Building on the forensic logging needs outlined earlier, the system must:
- Capture every log entry from ETL jobs, CI pipelines, and workflow engines (no sampling, no dropping)
- Maintain strict ordering within each job/pipeline execution (crucial for debugging stateful workflows)
- Assign sequential identifiers to each entry for precise references and redaction requirements
- Support efficient browsing of logs by timestamp windows
- Scale horizontally to handle both burst loads and steady growth
- Guarantee durability (a log written is a log preserved, even through cascading failures)
These aren’t the requirements of typical observability tools. This is about building an immutable audit trail.
Architecture
API Sketch
Protobuf/gRPC was chosen over JSON/HTTP for its compact binary encoding, enforced schemas, and first-class streaming support.
syntax = "proto3";
service LoggingCommandService {
rpc CreateLog(CreateLogRequest) returns (CreateLogResponse);
rpc DeleteLog(DeleteLogRequest) returns (DeleteLogResponse);
rpc WriteLogEntries(WriteLogEntriesRequest) returns (WriteLogEntriesResponse);
rpc RevokeLogEntries(RevokeLogEntriesRequest) returns (RevokeLogEntriesResponse);
}
service LoggingQueryService {
rpc ListLogEntries(ListLogEntriesRequest) returns (ListLogEntriesResponse);
}
message CreateLogRequest {}
message CreateLogResponse {
Log log = 1;
}
message Log {
string id = 1; // UUID
}
message LogEntry {
string log_id = 1;
string insert_id = 2; // for idempotency
string entry_id = 20; // for revocation and stable sorting
google.protobuf.Timestamp time = 21;
google.protobuf.Timestamp insert_time = 22;
map<string, string> labels = 23;
oneof payload {
string text_payload = 30;
google.protobuf.Struct json_payload = 31;
}
bool redacted = 40;
// Severity leaves gaps for future extension.
enum Severity {
UNSPECIFIED = 0;
TRACE = 100;
DEBUG = 200;
INFO = 300;
NOTICE = 400;
WARNING = 500;
ERROR = 600;
CRITICAL = 700;
ALERT = 800;
EMERGENCY = 900;
}
Severity severity = 41;
}
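The `insert_id` field exists so that retried writes deduplicate rather than duplicate. One way a client could produce it (a sketch, not something the API prescribes) is a deterministic UUIDv5 over the producer's identity and the entry's position in its stream, so retrying the same batch regenerates the same ids; the namespace UUID below is a hypothetical choice for this sketch:

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID works,
# as long as all producers agree on the scheme.
INSERT_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "log-entry-service.insert-id")

def make_insert_id(producer_id: str, log_id: str, stream_position: int) -> str:
    """Deterministic: retrying the same write yields the same insert_id."""
    name = f"{producer_id}/{log_id}/{stream_position}"
    return str(uuid.uuid5(INSERT_ID_NAMESPACE, name))
```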
Data Store Selection
These choices define the guarantees and shape the paths:
Kafka — Durable Buffer & Ordered Ingress
Role: Immediate durability and ordered processing.
Config sketch:
- Retention: ~7 days (buffer for processing delays)
- Replication factor: 3 (tolerates two node failures)
- Partitioning key: (tenant_id, log_id) to enforce partial ordering within a log
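To see why this key yields per-log ordering, here is a sketch of hash-based partition assignment. Standard Kafka clients hash keys with murmur2; SHA-256 is used here only to keep the illustration dependency-free:

```python
import hashlib

def partition_for(tenant_id: str, log_id: str, num_partitions: int) -> int:
    """All entries of one log hash to one partition, hence one ordered stream."""
    key = f"{tenant_id}|{log_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because the mapping is a pure function of the key, every entry of a given log lands on the same partition regardless of which producer sends it.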
Cassandra — Permanent Archive & Counter State
Role: Long-term, append-heavy store for raw entries and counters.
Why: Write-optimized, horizontally scalable, and predictable under burst.
Schema sketch:
-- Source of truth for entries (protobuf payloads)
CREATE TABLE log_entries (
    tenant_id text,
    log_id text,
    log_entry_id bigint,
    timestamp timestamp,
    payload blob, -- protobuf bytes
    PRIMARY KEY ((tenant_id, log_id), log_entry_id)
);

-- Idempotency / dedup by client-supplied insert_id
CREATE TABLE log_entries_by_insert_id (
    tenant_id text,
    log_id text,
    insert_id text,
    log_entry_id bigint,
    PRIMARY KEY ((tenant_id, log_id), insert_id)
);
Postgres — Read Model for Investigations
Role: Fast browsing for investigations.
Optimizations:
- Index for time-window scans: (tenant_id, log_id, ts, log_entry_id)
Schema sketch:
CREATE TABLE log_entries (
tenant_id UUID NOT NULL,
log_id UUID NOT NULL,
log_entry_id BIGINT NOT NULL,
ts TIMESTAMPTZ NOT NULL,
payload bytea NOT NULL,
PRIMARY KEY (tenant_id, log_id, ts, log_entry_id)
);
Write Path: Guaranteed Capture (overview)
- Command API accepts log streams from running jobs.
- Durability first: append to Kafka “pending” topic.
- LogEntry Service consumes per-execution partitions, assigns sequential IDs.
- Persist to Cassandra (source of truth) and emit to “committed” topic.
Read Path: Forensic Analysis (overview)
- Projection Consumers hydrate queryable views in Postgres from the committed topic.
- Query API serves browsing requests with full historical access.
- No aggregation, no sampling — every line remains accessible.
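The projection step can be sketched end to end with in-memory stand-ins — a list plays the committed topic and another list plays the Postgres table; in the real system these are Kafka and the read-model database:

```python
def project(committed_events, view_rows):
    """Hydrate the queryable view from committed events, idempotently."""
    seen = {(r["tenant_id"], r["log_id"], r["log_entry_id"]) for r in view_rows}
    for ev in committed_events:
        key = (ev["tenant_id"], ev["log_id"], ev["log_entry_id"])
        if key in seen:  # replays after a consumer restart are harmless
            continue
        seen.add(key)
        view_rows.append(dict(ev))
    return view_rows
```

Keying the view on (tenant_id, log_id, log_entry_id) makes replaying the committed topic safe, which is what lets the projection be rebuilt from scratch at any time.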
Why Event-Driven for Forensic Logging?
Ordering Without Coordination
Kafka partitions keyed by (tenant_id, log_id) ensure all entries from a single job execution flow through the same partition. No distributed locks, no split-brain scenarios. Deterministic ordering per execution.
Horizontal Scaling Built-In
- Need more throughput? Add partitions and LogEntry Service workers.
- Partitions process independently with no shared state.
- Linear scaling: more partitions → more throughput.
The Sequential ID Challenge
Sequential IDs within each execution are essential for:
- Precise references in bug reports (“error occurs after entry 15234”)
- Redaction of sensitive data without reordering
- Efficient range operations
Implementation: Batched Counter Management
The LogEntry Service maintains an in-memory counter per (tenant_id, log_id) and processes in batches:
While consuming from a Kafka partition:
1) Poll messages
2) For each message:
- If not a LogWrittenEvent, pass-through
- Else:
- Extract (tenant_id, log_id)
- If counter missing, load from Cassandra
- Assign incrementing log_entry_id
- Stage for bulk write
3) Bulk write entries to Cassandra
4) Bulk update counters in Cassandra
5) Emit to log_events_v1_committed
6) Commit Kafka offsets
Batched writes minimize round-trips; the cache amortizes counter lookups.
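The loop above can be sketched with in-memory stand-ins for Kafka and Cassandra; the `stored_counters` dict plays the Cassandra counter table, and a real consumer would also handle rebalances and failures:

```python
class LogEntryAssigner:
    """Assigns sequential log_entry_ids per (tenant_id, log_id), batch-wise."""

    def __init__(self, stored_counters=None):
        # Counters persisted in Cassandra; a dict stands in for the table here.
        self.stored_counters = dict(stored_counters or {})
        self.cache = {}  # in-memory counter cache, loaded lazily

    def process_batch(self, messages):
        """messages: iterable of (tenant_id, log_id, payload) tuples."""
        staged = []
        for tenant_id, log_id, payload in messages:
            key = (tenant_id, log_id)
            if key not in self.cache:  # cold counter: load from the store
                self.cache[key] = self.stored_counters.get(key, 0)
            self.cache[key] += 1
            staged.append((tenant_id, log_id, self.cache[key], payload))
        # Bulk-persist counters; the caller then emits events and commits offsets.
        for tenant_id, log_id, entry_id, _ in staged:
            self.stored_counters[(tenant_id, log_id)] = entry_id
        return staged
```

Because each (tenant_id, log_id) lives on one Kafka partition, one worker owns each counter and no cross-worker coordination is needed.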
Correctness Through Idempotency
Two Cassandra tables cooperate to deliver exactly-once processing:
-- Deduplication keyed by client-supplied insert_id
CREATE TABLE log_entries_by_insert_id (
tenant_id text,
log_id text,
insert_id text,
log_entry_id bigint,
PRIMARY KEY ((tenant_id, log_id), insert_id)
);
-- Authoritative entries (also usable for existence checks)
CREATE TABLE log_entries (
tenant_id text,
log_id text,
log_entry_id bigint,
timestamp timestamp,
payload blob, -- protobuf bytes
PRIMARY KEY ((tenant_id, log_id), log_entry_id)
);
Processing steps:
- Check log_entries_by_insert_id for the insert_id.
- If absent, write to both tables atomically (LWT or application-level guard).
log_entries remains the store of record.
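A sketch of the check-then-write step, with a dict standing in for each Cassandra table — a real implementation would use a lightweight transaction (`INSERT ... IF NOT EXISTS`) on the dedup table rather than this single-threaded guard:

```python
def write_entry_idempotent(dedup_table, entries_table, tenant_id, log_id,
                           insert_id, log_entry_id, payload):
    """Returns the log_entry_id actually stored; repeated deliveries are no-ops."""
    dedup_key = (tenant_id, log_id, insert_id)
    if dedup_key in dedup_table:            # duplicate delivery: skip the write
        return dedup_table[dedup_key]
    dedup_table[dedup_key] = log_entry_id   # in CQL: INSERT ... IF NOT EXISTS
    entries_table[(tenant_id, log_id, log_entry_id)] = payload
    return log_entry_id
```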
Postgres Projection: Friendly for Investigations
Optimized for timestamp-window browsing and human-scale search:
-- "Show me logs from this execution window"
SELECT *
FROM log_entries
WHERE tenant_id = 'etl-customer-sync'
AND log_id = '2024-01-15-run-001'
AND ts BETWEEN '2024-01-15 10:00:00' AND '2024-01-15 10:30:00'
ORDER BY ts, log_entry_id;
Partition by month; index (tenant_id, log_id, ts, log_entry_id); add full-text indexes on a text column (e.g., message_text) to search for error strings.
What’s Next in the Series
- Real-time tailing: polling for new entries with efficient range queries
- Retention strategies: balancing compliance needs with storage costs
- Search optimization: Elasticsearch projections for complex log analysis
- Multi-region deployment: ensuring logs survive datacenter failures
The Bigger Picture
When every line matters and order is sacred, standard tools fall short. This architecture shows a clean, simple path to fill that gap: event-driven, strongly ordered, and uncompromisingly complete.
It’s not complex—just the right primitives assembled well. The result handles millions of events per second while ensuring that months later you can still find that one critical log line that explains why the pipeline failed at 3 AM.