
dbbasic-logs Specification

Version: 1.0 | Status: Specification | Author: DBBasic Project | Date: October 2025

Links: - PyPI: https://pypi.org/project/dbbasic-logs/ - GitHub: https://github.com/askrobots/dbbasic-logs - Specification: http://dbbasic.com/logs-spec


Philosophy

"Log everything. Query anything. Compress the rest."

Logs are the foundation that everything else depends on. Every module needs logging. Make it simple, structured, and Unix-native.

Design Principles

  1. Foundational: All other modules use this
  2. Simple: One-line logging, no setup
  3. Structured: TSV format, queryable
  4. Unix-Native: Plain text, grep-able, compressible
  5. Zero Config: Works out of the box

Critical Insight: Logs Are Infrastructure

Every module needs logging:

# dbbasic-queue
log.info("Processing job", job_id="abc123", type="send_email")
log.error("Job failed after 3 attempts", job_id="abc123")

# dbbasic-sessions
log.info("User logged in", user_id=42, ip="192.168.1.1")
log.warning("Invalid session token")

# dbbasic-email
log.info("Email queued", to="user@example.com")
log.error("SMTP connection failed", error="Connection refused")

# dbbasic-accounts
log.info("User registered", user_id=42, email="user@example.com")
log.warning("Failed login attempt", email="hacker@bad.com")

This means dbbasic-logs must be: - Simple (no complex setup) - Standalone (no dependencies on other dbbasic modules) - Always available (import and use) - Reliable (can't break other modules)

This is infrastructure, not a feature.


What Needs to Be Logged?

1. Application Logs (Your Code)

log.info("User updated profile", user_id=42)
log.warning("API rate limit approaching", user_id=42, count=95)
log.error("Database write failed", error="Connection lost")
log.debug("Cache miss", key="user:42")

Why: - Debug business logic - Audit trail - Performance insights

2. Exceptions (Uncaught Errors)

try:
    process_payment(order)
except Exception as e:
    log.exception("Payment processing failed", order_id=order.id)
    # Captures stack trace automatically

Why: - See what's breaking - Stack traces for debugging - Group similar errors - Replaces Sentry for 90% of use cases

3. Access Logs (HTTP Requests)

192.168.1.1 - GET /api/users 200 0.05s
192.168.1.2 - POST /login 401 0.02s

Why: - Traffic patterns - Debug 404s/500s - Spot attacks - Performance monitoring

Note: Gunicorn already provides this. dbbasic-logs can either use gunicorn's or provide its own.

4. Module Logs (Internal Operations)

# dbbasic-queue internal logging
log.debug("Found 5 pending jobs")
log.info("Job completed", job_id="abc123", duration=2.3)

Why: - Debug module behavior - Performance tracking - Operational visibility


Storage Format

Multiple TSV Files by Type

data/logs/
  app/
    2025-10-09.tsv         (today - active)
    2025-10-08.tsv.gz      (yesterday - compressed)
    2025-10-07.tsv.gz      (older - compressed)
  errors/
    2025-10-09.tsv
    2025-10-08.tsv.gz
  access/
    2025-10-09.tsv
    2025-10-08.tsv.gz

Why separate files: - Different retention policies (keep errors longer) - Different query patterns (search errors often, access rarely) - Smaller files (faster grep) - Can delete access logs but keep errors

TSV Format (All Logs)

Columns:

timestamp   level   message context

Example (app logs):

1696886400  INFO    User logged in  {"user_id":42,"ip":"192.168.1.1"}
1696886401  ERROR   Payment failed  {"order_id":123,"error":"Timeout"}
1696886402  WARNING Rate limit  {"user_id":42,"endpoint":"/api"}

Example (error logs with stack trace):

1696886400  ERROR   Division by zero    {"file":"calc.py","line":42,"trace":"Traceback..."}

Example (access logs):

1696886400  INFO    GET /api/users  {"ip":"192.168.1.1","status":200,"duration":0.05}
1696886401  INFO    POST /login {"ip":"192.168.1.2","status":401,"duration":0.02}

Why TSV: - Structured (easy to query) - Plain text (grep/zgrep) - Compressible (10:1 ratio) - Standard format across all log types
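
Because the columns are fixed, one split and one json.loads recover the full structure; a minimal sketch of parsing a line in the format above:

import json

line = '1696886400\tINFO\tUser logged in\t{"user_id":42,"ip":"192.168.1.1"}\n'

timestamp, level, message, context = line.rstrip('\n').split('\t')
entry = {
    'timestamp': int(timestamp),      # Unix epoch seconds
    'level': level,
    'message': message,
    'context': json.loads(context),   # structured fields live in the JSON column
}
print(entry['context']['user_id'])    # -> 42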


API Specification

Function: log.info(message, **context)

Purpose: Log informational message

Parameters: - message (str): Log message - **context (kwargs): Additional context (JSON serialized)

Behavior: 1. Get current timestamp 2. Serialize context to JSON 3. Append to data/logs/app/{today}.tsv

Example:

from dbbasic_logs import log

log.info("User logged in", user_id=42, ip=request.remote_addr)

# Writes to data/logs/app/2025-10-09.tsv:
# 1696886400    INFO    User logged in  {"user_id":42,"ip":"192.168.1.1"}

Function: log.error(message, **context)

Purpose: Log error message

Parameters: - message (str): Error message - **context (kwargs): Additional context

Behavior: 1. Same as info() but level=ERROR 2. Writes to data/logs/app/{today}.tsv

Example:

log.error("Payment processing failed", order_id=123, error="Gateway timeout")

Function: log.exception(message, **context)

Purpose: Log exception with automatic stack trace capture

Parameters: - message (str): Error description - **context (kwargs): Additional context

Behavior: 1. Capture current exception and stack trace 2. Serialize trace to context 3. Append to data/logs/errors/{today}.tsv

Example:

try:
    result = process_payment(order)
except Exception as e:
    log.exception("Payment failed", order_id=order.id, amount=order.total)
    raise

# Writes to data/logs/errors/2025-10-09.tsv with full stack trace

Function: log.access(method, path, status, duration, **context)

Purpose: Log HTTP access (alternative to gunicorn logs)

Parameters: - method (str): HTTP method (GET, POST, etc.) - path (str): Request path - status (int): Response status code - duration (float): Request duration in seconds - **context (kwargs): IP, user_id, etc.

Behavior: 1. Format access log entry 2. Append to data/logs/access/{today}.tsv

Example:

log.access(
    method="GET",
    path="/api/users",
    status=200,
    duration=0.05,
    ip=request.remote_addr,
    user_id=current_user.id
)

Note: Optional - can use gunicorn's access logs instead.


Function: log.search(pattern, log_type='app', days=7)

Purpose: Search logs across compressed and uncompressed files

Parameters: - pattern (str): Regex pattern to search for - log_type (str): Log type ('app', 'errors', 'access', 'all') - days (int): How many days back to search

Returns: - List of matching log entries

Behavior: 1. Find all log files in date range (including .gz) 2. Use grep/zgrep to search 3. Parse TSV results 4. Return structured data

Example:

# Find all errors in last 7 days
errors = log.search("ERROR", log_type='errors', days=7)

# Find logs for specific user
user_logs = log.search("user_id.*42", log_type='all', days=30)

# Find slow requests
slow = log.search(r"duration.*[5-9]\.", log_type='access', days=1)

Function: log.tail(log_type='app', lines=100)

Purpose: Get most recent log entries (like tail -f)

Parameters: - log_type (str): Which log to tail - lines (int): Number of lines

Returns: - List of recent log entries

Example:

# Get last 100 app logs
recent = log.tail('app', lines=100)

# Watch for errors
recent_errors = log.tail('errors', lines=50)

Implementation

Core Implementation (under 100 lines)

import os
import time
import json
import subprocess
import traceback
from datetime import datetime, timedelta

LOG_DIR = os.getenv('LOG_DIR', 'data/logs')

class DBBasicLogger:
    def __init__(self):
        os.makedirs(f'{LOG_DIR}/app', exist_ok=True)
        os.makedirs(f'{LOG_DIR}/errors', exist_ok=True)
        os.makedirs(f'{LOG_DIR}/access', exist_ok=True)

    def _write(self, log_type, level, message, context):
        """Write one TSV row to the appropriate log file"""
        today = time.strftime('%Y-%m-%d')
        log_file = f'{LOG_DIR}/{log_type}/{today}.tsv'

        timestamp = int(time.time())
        # Keep the row on one line: tabs/newlines in the message would break the TSV
        message = str(message).replace('\t', ' ').replace('\n', ' ')
        # Compact separators so output matches the documented format, e.g. {"user_id":42}
        context_json = json.dumps(context, separators=(',', ':')) if context else '{}'

        with open(log_file, 'a') as f:
            f.write(f'{timestamp}\t{level}\t{message}\t{context_json}\n')

    def info(self, message, **context):
        """Log info message"""
        self._write('app', 'INFO', message, context)

    def warning(self, message, **context):
        """Log warning message"""
        self._write('app', 'WARNING', message, context)

    def error(self, message, **context):
        """Log error message"""
        self._write('app', 'ERROR', message, context)

    def debug(self, message, **context):
        """Log debug message"""
        self._write('app', 'DEBUG', message, context)

    def exception(self, message, **context):
        """Log exception with stack trace"""
        context['trace'] = traceback.format_exc()
        self._write('errors', 'ERROR', message, context)

    def access(self, method, path, status, duration, **context):
        """Log HTTP access"""
        msg = f'{method} {path} {status}'
        context['duration'] = duration
        self._write('access', 'INFO', msg, context)

    def search(self, pattern, log_type='app', days=7):
        """Search logs using grep/zgrep; log_type='all' covers every log type"""
        log_types = ['app', 'errors', 'access'] if log_type == 'all' else [log_type]

        results = []
        for i in range(days):
            date = (datetime.now() - timedelta(days=i)).strftime('%Y-%m-%d')

            for lt in log_types:
                # Try uncompressed first
                log_file = f'{LOG_DIR}/{lt}/{date}.tsv'
                if os.path.exists(log_file):
                    cmd = ['grep', pattern, log_file]
                else:
                    # Fall back to the compressed archive
                    log_file_gz = f'{log_file}.gz'
                    if not os.path.exists(log_file_gz):
                        continue
                    cmd = ['zgrep', pattern, log_file_gz]

                try:
                    output = subprocess.check_output(cmd, text=True)
                except subprocess.CalledProcessError:
                    continue  # no matches in this file

                for line in output.strip().split('\n'):
                    if not line:
                        continue
                    parts = line.split('\t')
                    results.append({
                        'timestamp': int(parts[0]),
                        'level': parts[1],
                        'message': parts[2],
                        'context': json.loads(parts[3]) if len(parts) > 3 else {}
                    })

        return results

    def tail(self, log_type='app', lines=100):
        """Get recent log entries"""
        today = time.strftime('%Y-%m-%d')
        log_file = f'{LOG_DIR}/{log_type}/{today}.tsv'

        if not os.path.exists(log_file):
            return []

        # Read last N lines
        with open(log_file) as f:
            all_lines = f.readlines()
            recent = all_lines[-lines:] if len(all_lines) > lines else all_lines

        results = []
        for line in recent:
            parts = line.strip().split('\t')
            results.append({
                'timestamp': int(parts[0]),
                'level': parts[1],
                'message': parts[2],
                'context': json.loads(parts[3]) if len(parts) > 3 else {}
            })
        return results

# Global instance
log = DBBasicLogger()

That's it. Under 100 lines for complete logging infrastructure.


Dependencies

No external packages. No services. No setup.


Usage Examples

Application Logging

from dbbasic_logs import log

@app.route('/api/users/<user_id>')
def get_user(user_id):
    log.info("Fetching user", user_id=user_id, ip=request.remote_addr)

    user = User.get(user_id)
    if not user:
        log.warning("User not found", user_id=user_id)
        return jsonify({'error': 'User not found'}), 404

    log.debug("User data retrieved", user_id=user_id, fields=len(user.__dict__))
    return jsonify(user)

Exception Logging

from dbbasic_logs import log

@app.route('/api/process-payment', methods=['POST'])
def process_payment():
    try:
        order = Order.get(request.json['order_id'])
        result = charge_card(order)
        log.info("Payment processed", order_id=order.id, amount=order.total)
        return jsonify(result)
    except PaymentError as e:
        log.exception("Payment failed", order_id=order.id, amount=order.total)
        return jsonify({'error': str(e)}), 400

Stack trace automatically captured in errors/2025-10-09.tsv

Module Integration (dbbasic-queue)

# Inside dbbasic-queue
from dbbasic_logs import log
import time

def process_jobs(handlers):
    jobs = get_pending_jobs()
    log.info("Processing jobs", count=len(jobs))

    for job in jobs:
        try:
            log.info("Starting job", job_id=job['id'], type=job['type'])
            started = time.time()
            result = handlers[job['type']](job['payload'])
            log.info("Job completed", job_id=job['id'], duration=round(time.time() - started, 3))
        except Exception:
            log.exception("Job failed", job_id=job['id'], type=job['type'])

Other modules just import and use. No setup needed.

Access Logging Middleware

from dbbasic_logs import log
import time

@app.before_request
def log_request_start():
    request.start_time = time.time()

@app.after_request
def log_request_end(response):
    duration = time.time() - request.start_time

    log.access(
        method=request.method,
        path=request.path,
        status=response.status_code,
        duration=duration,
        ip=request.remote_addr,
        user_agent=request.user_agent.string
    )

    return response

Log Compression & Rotation

Automatic Daily Rotation

#!/bin/bash
# /etc/cron.daily/dbbasic-logs-rotate
# Runs daily at midnight

LOG_DIR="data/logs"
YESTERDAY=$(date -d yesterday +%Y-%m-%d)

# Compress yesterday's logs
for log_type in app errors access; do
    if [ -f "${LOG_DIR}/${log_type}/${YESTERDAY}.tsv" ]; then
        gzip "${LOG_DIR}/${log_type}/${YESTERDAY}.tsv"
    fi
done

# Delete logs older than 30 days (app and access)
find ${LOG_DIR}/app -name "*.tsv.gz" -mtime +30 -delete
find ${LOG_DIR}/access -name "*.tsv.gz" -mtime +30 -delete

# Keep errors longer (90 days)
find ${LOG_DIR}/errors -name "*.tsv.gz" -mtime +90 -delete

Why this works: - Yesterday's logs compressed (10:1 ratio) - Today's log stays uncompressed (fast writes) - Old logs auto-deleted - Different retention per type
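
If cron is unavailable, the same rotation can run from Python (e.g. as the rotate.py script in the package layout); a minimal sketch using only the stdlib, with retention read from the LOG_RETENTION_DAYS / LOG_ERROR_RETENTION_DAYS variables described under Configuration:

import glob
import gzip
import os
import shutil
import time

LOG_DIR = os.getenv('LOG_DIR', 'data/logs')
RETENTION = {
    'app': int(os.getenv('LOG_RETENTION_DAYS', 30)),
    'access': int(os.getenv('LOG_RETENTION_DAYS', 30)),
    'errors': int(os.getenv('LOG_ERROR_RETENTION_DAYS', 90)),
}

def rotate():
    """Compress every non-current .tsv and delete archives past retention."""
    today = time.strftime('%Y-%m-%d')
    for log_type, keep_days in RETENTION.items():
        # Compress everything except today's active file
        for tsv in glob.glob(f'{LOG_DIR}/{log_type}/*.tsv'):
            if os.path.basename(tsv) == f'{today}.tsv':
                continue
            with open(tsv, 'rb') as src, gzip.open(f'{tsv}.gz', 'wb') as dst:
                shutil.copyfileobj(src, dst)
            os.remove(tsv)
        # Drop archives older than the retention window
        cutoff = time.time() - keep_days * 86400
        for gz in glob.glob(f'{LOG_DIR}/{log_type}/*.tsv.gz'):
            if os.path.getmtime(gz) < cutoff:
                os.remove(gz)

if __name__ == '__main__':
    rotate()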

Compression Benefits

Typical log compression:

Original: 10MB/day
Compressed: 1MB/day (gzip)

30 days uncompressed: 300MB
30 days compressed: 30MB

10x space savings

Unix tools work on compressed:

# Search compressed logs
zgrep "ERROR" data/logs/errors/2025-10-08.tsv.gz

# View compressed logs
zless data/logs/app/2025-10-08.tsv.gz

# Count errors in compressed log
zcat data/logs/errors/2025-10-08.tsv.gz | wc -l

No special tools needed - Unix handles it.
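
Python's stdlib gzip module reads the archives directly as well; a small sketch (the path is just an example date):

import gzip
import json

# Stream a compressed day of error logs without shelling out to zgrep
with gzip.open('data/logs/errors/2025-10-08.tsv.gz', 'rt') as f:
    for line in f:
        timestamp, level, message, context = line.rstrip('\n').split('\t', 3)
        ctx = json.loads(context)
        print(timestamp, message, ctx.get('file', ''), ctx.get('line', ''))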


Querying Logs

Command Line (Unix Way)

# All errors today
grep ERROR data/logs/app/2025-10-09.tsv

# All errors last 7 days (compressed + uncompressed)
zgrep ERROR data/logs/app/2025-10-*.tsv*

# Specific user activity
zgrep 'user_id.*42' data/logs/app/*.tsv*

# Count 500 errors
grep "status.*500" data/logs/access/2025-10-09.tsv | wc -l

# Slow requests (> 1 second)
grep "duration.*[1-9]\." data/logs/access/2025-10-09.tsv

# Failed jobs
zgrep "Job failed" data/logs/app/*.tsv*

Python API (Programmatic)

from dbbasic_logs import log

# Search last 7 days for errors
errors = log.search("ERROR", log_type='app', days=7)
for error in errors:
    print(f"{error['timestamp']}: {error['message']}")
    print(f"  Context: {error['context']}")

# Get recent activity
recent = log.tail('app', lines=100)

# Find user activity
user_activity = log.search(f"user_id.*{user_id}", log_type='all', days=30)

Web Dashboard

from dbbasic_logs import log

@app.route('/admin/logs')
def logs_dashboard():
    recent_errors = log.tail('errors', lines=50)
    stats = {
        'errors_today': len(log.search('ERROR', days=1)),
        'warnings_today': len(log.search('WARNING', days=1)),
    }
    return render('logs', errors=recent_errors, stats=stats)

Performance Characteristics

Benchmarks

| Operation         | Time   | Notes                |
|-------------------|--------|----------------------|
| Write log         | 0.1ms  | Append to file       |
| Search today      | 0.5s   | grep 10MB file       |
| Search compressed | 2s     | zgrep 1MB compressed |
| Tail recent       | 0.01s  | Read last N lines    |
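
These numbers vary with disk and log size; the write cost is easy to re-measure locally (a sketch using timeit, assuming dbbasic-logs is installed):

import timeit
from dbbasic_logs import log

# Time 1,000 appends and report the per-call cost in milliseconds
total = timeit.timeit(lambda: log.info("benchmark entry", user_id=42), number=1000)
per_call_ms = total / 1000 * 1000   # 1,000 calls, converted to milliseconds
print(f"write: {per_call_ms:.3f} ms per log call")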

Storage

| Traffic level  | Daily (uncompressed) | Daily (compressed) | 30-day total (compressed) |
|----------------|----------------------|--------------------|---------------------------|
| Low traffic    | 1MB                  | 100KB              | 3MB                       |
| Medium traffic | 10MB                 | 1MB                | 30MB                      |
| High traffic   | 100MB                | 10MB               | 300MB                     |

Even high-traffic sites: < 1GB for 30 days of logs

Compare to Sentry: Starts at $29/month, limited events


Comparison to Alternatives

Sentry (Error Tracking SaaS)

Sentry:

Setup: SDK integration, API keys
Cost: $29-$299/month
Features: Grouping, alerts, dashboards
Storage: Cloud (they control it)
Search: Web UI
Privacy: Sends errors to third-party

dbbasic-logs:

Setup: Import and use
Cost: $0
Features: TSV storage, grep search
Storage: Your server
Search: grep/zgrep + Python API
Privacy: All local

When to use Sentry: - Team needs web UI - Want fancy grouping/trending - Don't mind paying monthly - Already using it

When to use dbbasic-logs: - Want simplicity and control - grep is good enough - $0 budget - Privacy concerns

Python logging + RotatingFileHandler

Python stdlib:

import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler('app.log', maxBytes=10000, backupCount=5)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger = logging.getLogger('app')
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Use it
logger.info("User logged in", extra={'user_id': 42})

Problems: - Complex setup (8 lines just for config) - Not structured (free-form text) - Context in extra={} (awkward) - Hard to query programmatically

dbbasic-logs:

from dbbasic_logs import log

log.info("User logged in", user_id=42)

Benefits: - One line setup (import) - Structured (TSV) - Context in kwargs (natural) - Easy to query (Python API or grep)

ELK Stack (Elasticsearch + Logstash + Kibana)

ELK:

Setup: Docker compose with 3 services
Memory: 4GB+ RAM
Complexity: High
Search: Very fast, powerful
UI: Excellent dashboards

dbbasic-logs:

Setup: Import
Memory: 0 (just files)
Complexity: Low
Search: grep (good enough)
UI: Build your own (or just use grep)

When to use ELK: - Multiple servers - Massive log volume - Need advanced analytics

When to use dbbasic-logs: - Single server - Reasonable log volume - grep is sufficient


Architectural Decisions

Why Multiple Log Files (by Type)?

Single file approach:

data/logs/2025-10-09.tsv (all logs mixed)

Problems: - Different retention needs (keep errors longer) - Different query patterns - Larger files (slower grep)

Multiple files:

app/2025-10-09.tsv
errors/2025-10-09.tsv
access/2025-10-09.tsv

Benefits: - Separate retention (errors: 90 days, access: 7 days) - Faster search (smaller files) - Clear organization - Can delete access logs but keep errors

Why TSV Instead of JSON Lines?

JSON Lines:

{"timestamp":1696886400,"level":"INFO","message":"User logged in","user_id":42}

TSV:

1696886400  INFO    User logged in  {"user_id":42}

TSV wins: - Consistent with dbbasic ecosystem - Simpler parsing - Better compression (fewer repeated keys) - grep-friendly (fixed columns)

Why Date-Based Files Instead of Rotation?

RotatingFileHandler:

app.log       (active)
app.log.1     (rotated)
app.log.2     (rotated)

Problems: - Number suffixes meaningless - Complex rotation logic - Hard to find "logs from last Tuesday"

Date-based:

2025-10-09.tsv
2025-10-08.tsv.gz
2025-10-07.tsv.gz

Benefits: - Filename = date (instantly know what it is) - No rotation logic (new day = new file) - Easy to find specific date - Simple compression (gzip yesterday)

Why gzip Compression?

Alternatives: - bzip2 (better compression, slower) - xz (best compression, slowest) - lz4 (faster, worse compression)

gzip wins: - Good compression (~10:1 for logs) - Fast decompression - Universal (every Unix system has zgrep) - Standard (everyone knows it)

Why Not Send to Syslog?

Could integrate with syslog:

import syslog
syslog.syslog(syslog.LOG_INFO, "User logged in")

Problems: - Not structured - Mixed with system logs - Hard to query programmatically - Less control

dbbasic-logs approach: - Own files (full control) - Structured TSV (query easily) - Separate from system logs - Simple to understand


Security & Privacy

Log Sanitization

Don't log sensitive data:

# BAD
log.info("User logged in", password=user_password)

# GOOD
log.info("User logged in", user_id=user.id)

Sanitize automatically:

def sanitize_context(context):
    """Remove sensitive fields"""
    sensitive = ['password', 'credit_card', 'ssn', 'api_key']
    return {k: v for k, v in context.items() if k not in sensitive}

# Use in _write()
context = sanitize_context(context)

File Permissions

# Restrict log access
chmod 600 data/logs/*/*.tsv*
chown -R www-data:www-data data/logs

GDPR Considerations

Logs may contain personal data: - IP addresses - User IDs - Email addresses

Retention policy:

# Delete old logs (GDPR compliance)
# Keep errors 90 days, access 7 days

User data deletion:

import glob

def delete_user_logs(user_id):
    """Remove a user's rows from uncompressed logs (GDPR right to deletion)"""
    needle = f'"user_id":{user_id}'
    for log_file in glob.glob('data/logs/**/*.tsv', recursive=True):
        with open(log_file) as f:
            lines = f.readlines()
        with open(log_file, 'w') as f:
            f.writelines(line for line in lines if needle not in line)
    # Compressed (.tsv.gz) archives need the same treatment after decompressing

Integration with Other Modules

dbbasic-queue Integration

# Inside dbbasic-queue
from dbbasic_logs import log
import time

def process_jobs(handlers):
    jobs = get_pending_jobs()
    log.info("Queue worker started", pending_count=len(jobs))

    for job in jobs:
        log.info("Processing job", job_id=job['id'], type=job['type'])
        started = time.time()

        try:
            result = handlers[job['type']](job['payload'])
            log.info("Job completed",
                job_id=job['id'],
                type=job['type'],
                duration=round(time.time() - started, 3),
                result=result
            )
        except Exception:
            log.exception("Job failed",
                job_id=job['id'],
                type=job['type'],
                attempts=job['attempts']
            )

dbbasic-accounts Integration

# Inside dbbasic-accounts
from dbbasic_logs import log

def register(email, password):
    log.info("User registration started", email=email)

    if User.exists(email):
        log.warning("Registration failed - email exists", email=email)
        raise ValueError("Email already registered")

    user = User.create(email, hash_password(password))
    log.info("User registered", user_id=user.id, email=email)
    return user

def authenticate(email, password):
    user = User.get_by_email(email)

    if not user or not verify_password(password, user.password_hash):
        log.warning("Failed login attempt", email=email, ip=request.remote_addr)
        return None

    log.info("User logged in", user_id=user.id, email=email)
    return user

Security benefit: Failed login attempts logged automatically.
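
Because those warnings are structured, a brute-force check is a few lines of log.search; a sketch (the threshold and per-IP grouping are illustrative assumptions, not part of dbbasic-accounts):

from collections import Counter
from dbbasic_logs import log

# Count failed logins per source IP over the last 24 hours
attempts = log.search("Failed login attempt", log_type='app', days=1)
by_ip = Counter(entry['context'].get('ip', 'unknown') for entry in attempts)

for ip, count in by_ip.most_common():
    if count > 20:  # hypothetical threshold
        print(f"Possible brute force from {ip}: {count} failed logins")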

dbbasic-email Integration

# Inside dbbasic-email
from dbbasic_logs import log

def send_email(to, subject, body):
    log.info("Sending email", to=to, subject=subject)

    try:
        smtp.sendmail(from_addr, to, msg)
        log.info("Email sent", to=to, subject=subject)
    except SMTPException as e:
        log.error("Email failed", to=to, error=str(e))
        raise

Monitoring & Alerting

Simple Error Monitoring

# Check for recent errors
errors = log.search("ERROR|EXCEPTION", log_type='errors', days=1)

if len(errors) > 10:
    # Alert: Too many errors today
    send_alert(f"{len(errors)} errors in last 24 hours")

Failed Job Monitoring

# Check for stuck jobs
failed_jobs = log.search("Job failed", log_type='app', days=1)

# Group by job type (search() already returns context as a parsed dict)
from collections import Counter
failures_by_type = Counter(
    e['context']['type']
    for e in failed_jobs
)

if failures_by_type['send_email'] > 5:
    send_alert("Email sending is failing repeatedly")

Performance Monitoring

# Find slow requests
slow = log.search(r"duration.*[5-9]\.", log_type='access', days=1)

# Average response time
durations = [e['context']['duration'] for e in log.tail('access', lines=1000)]
avg = sum(durations) / len(durations)

if avg > 1.0:
    send_alert(f"Average response time: {avg}s")

Testing Requirements

Unit Tests

from dbbasic_logs import log
import os
import time

def test_info_logging():
    log.info("Test message", foo="bar")

    today = time.strftime('%Y-%m-%d')
    log_file = f'data/logs/app/{today}.tsv'

    assert os.path.exists(log_file)

    with open(log_file) as f:
        last_line = f.readlines()[-1]
        assert 'INFO' in last_line
        assert 'Test message' in last_line
        assert '"foo":"bar"' in last_line

def test_exception_logging():
    try:
        raise ValueError("Test error")
    except:
        log.exception("Test exception", context="test")

    today = time.strftime('%Y-%m-%d')
    log_file = f'data/logs/errors/{today}.tsv'

    with open(log_file) as f:
        last_line = f.readlines()[-1]
        assert 'ValueError' in last_line
        assert 'Traceback' in last_line

def test_search():
    log.info("Searchable message", user_id=42)
    results = log.search("Searchable", days=1)

    assert len(results) > 0
    assert results[0]['message'] == "Searchable message"
    assert results[0]['context']['user_id'] == 42

def test_tail():
    for i in range(10):
        log.info(f"Message {i}")

    recent = log.tail('app', lines=5)
    assert len(recent) == 5
    assert "Message 9" in recent[-1]['message']

Deployment

Production Setup

1. Create log directories:

mkdir -p data/logs/{app,errors,access}

2. Set up rotation cron:

# /etc/cron.daily/dbbasic-logs
cat > /etc/cron.daily/dbbasic-logs << 'EOF'
#!/bin/bash
YESTERDAY=$(date -d yesterday +%Y-%m-%d)
for type in app errors access; do
    [ -f "data/logs/${type}/${YESTERDAY}.tsv" ] && gzip "data/logs/${type}/${YESTERDAY}.tsv"
done
find data/logs/app -name "*.tsv.gz" -mtime +30 -delete
find data/logs/access -name "*.tsv.gz" -mtime +30 -delete
find data/logs/errors -name "*.tsv.gz" -mtime +90 -delete
EOF

chmod +x /etc/cron.daily/dbbasic-logs

3. Use in app:

from dbbasic_logs import log

log.info("Application started")

That's it. No services, no configuration.

Docker Integration

FROM python:3.11-slim

RUN pip install dbbasic-logs

# Create log volume
VOLUME /app/data/logs

COPY app.py .
CMD ["python", "app.py"]

docker-compose.yml:

services:
  app:
    volumes:
      - app_logs:/app/data/logs

volumes:
  app_logs:

Logs persist across container restarts.


Configuration

Environment Variables

LOG_DIR=data/logs              # Base log directory
LOG_LEVEL=INFO                 # Minimum level to log
LOG_RETENTION_DAYS=30          # How long to keep logs
LOG_ERROR_RETENTION_DAYS=90    # Errors kept longer

Defaults work for 99% of cases.
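
Note that the reference implementation above writes every entry regardless of level; honoring LOG_LEVEL would be a small extension, sketched here (should_log and the numeric mapping are illustrative, not part of the spec):

import os

LEVELS = {'DEBUG': 10, 'INFO': 20, 'WARNING': 30, 'ERROR': 40}

def should_log(level):
    """True if `level` meets the LOG_LEVEL threshold (defaults to INFO)."""
    threshold = LEVELS.get(os.getenv('LOG_LEVEL', 'INFO'), 20)
    return LEVELS.get(level, 20) >= threshold

# In DBBasicLogger._write(), bail out early:
#     if not should_log(level):
#         return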


Common Questions

Q: What about log levels?

A: Standard levels supported: - DEBUG (verbose) - INFO (normal) - WARNING (potential issues) - ERROR (failures)

Filter by level in search:

errors = log.search("ERROR", log_type='app')

Q: How do I aggregate logs across servers?

A: Use log shipping:

# Ship logs to central server
rsync -az data/logs/ central:/logs/server1/

Or mount shared NFS:

mount -t nfs central:/logs /app/data/logs

Or graduate to ELK/Datadog when you actually need it.

Q: What about structured logging?

A: TSV IS structured logging. Context field is JSON.

Query it programmatically:

results = log.search("Payment", days=7)
for r in results:
    print(r['context']['order_id'])

Q: Can I log to stdout for Docker?

A: The reference implementation writes only to files, but mirroring each entry to stdout (which Docker captures) is a small extension; see the sketch below.
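
A minimal sketch of that extension, assuming a hypothetical LOG_STDOUT environment variable and subclassing the logger from the implementation section:

import os
import sys

from dbbasic_logs import DBBasicLogger  # the class defined in the implementation above

class StdoutMirrorLogger(DBBasicLogger):
    """Illustrative extension: also echo each entry to stdout for Docker."""
    def _write(self, log_type, level, message, context):
        super()._write(log_type, level, message, context)
        if os.getenv('LOG_STDOUT') == '1':            # hypothetical opt-in flag
            print(f'{level}\t{message}\t{context}', file=sys.stdout, flush=True)

log = StdoutMirrorLogger()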

Q: What about log aggregation?

A: For single server, not needed. For multi-server:

Option 1: rsync logs to a central server
Option 2: Ship to S3/object storage
Option 3: Graduate to proper log aggregation (ELK, Datadog)


Package Structure

dbbasic-logs/
├── dbbasic_logs/
│   ├── __init__.py          # Main implementation (<100 lines)
│   └── rotate.py            # Rotation script (cron)
├── tests/
│   ├── test_logging.py
│   ├── test_search.py
│   └── test_rotation.py
├── setup.py
├── README.md
├── LICENSE
└── CHANGELOG.md

Success Criteria

This implementation is successful if:

  1. Simple: < 100 lines of code
  2. Structured: TSV format, queryable
  3. Foundational: All modules can use it
  4. Zero Config: Import and use
  5. Searchable: grep/zgrep + Python API
  6. Compressed: Auto-compress old logs
  7. Unix-Native: Plain text, standard tools
  8. Replaces Sentry: For 90% of use cases

Comparison Summary

| Feature     | Sentry     | ELK                | Python logging | dbbasic-logs  |
|-------------|------------|--------------------|----------------|---------------|
| Setup       | API key    | Docker, 3 services | 8 lines config | Import        |
| Cost        | $29-299/mo | $0 (self-host)     | $0             | $0            |
| Storage     | Cloud      | Elasticsearch      | Log files      | TSV files     |
| Search      | Web UI     | Kibana             | grep           | grep + Python |
| Structure   | Yes        | Yes                | No             | Yes (TSV)     |
| Compression | N/A        | Yes                | Manual         | Auto (gzip)   |
| Privacy     | Cloud      | Local              | Local          | Local         |
| Query API   | REST       | REST               | No             | Python        |
| Dependency  | SDK        | 3 services         | stdlib         | stdlib        |

dbbasic-logs: Structured like Sentry, simple like stdlib, cheap like free.


Summary

dbbasic-logs is foundational infrastructure:

It works because: - TSV = structured + plain text - Date-based files = simple rotation - gzip = automatic compression - Unix tools (grep/zgrep) = powerful search - No services, no setup, no cost

Use it when: - Building any app (it's foundational) - Want structured logs without Sentry - grep is good enough for search - Local storage preferred

Graduate to ELK/Datadog when: - Multiple servers need aggregation - Massive log volume - Team needs web dashboards

Until then, use TSV. It's structured, searchable, and simple.


Next Steps: Implement, test, deploy, use everywhere.

No Sentry. No ELK. No services. Just TSV files.

Under 100 lines of code that every other module depends on.