---
title: "Apache vs Nginx Log Formats: The Complete Analysis Guide"
description: "Complete comparison of Apache and Nginx log formats. Learn Combined Log Format, custom directives, JSON logging, and parsing techniques for effective log analysis."
category: "Tutorials"
date: "2025-02-12"
author: "GetBeast"
tags: ["tutorials", "apache", "nginx", "log-formats", "server-logs", "log-analysis", "parsing"]
url: "https://getbeast.io/blog/log-formats/"
reading_time: "15 min"
---

# Apache vs Nginx Log Formats: The Complete Analysis Guide

Complete comparison of Apache and Nginx log formats. Learn Combined Log Format, custom directives, JSON logging, and parsing techniques for effective log analysis.

## Table of Contents

1. [Why Log Format Matters](#why-format-matters)
2. [Apache Common Log Format (CLF)](#apache-clf)
3. [Apache Combined Log Format](#apache-combined)
4. [Nginx Default Log Format](#nginx-default)
5. [Custom Log Formats: Apache vs Nginx Directives](#custom-formats)
6. [JSON Structured Logging](#json-logging)
7. [Log Rotation and Management](#log-rotation)
8. [Parsing Log Files](#parsing)
9. [Which Format to Choose](#which-format)
10. [Conclusion](#conclusion)

## Why Log Format Matters

Every HTTP request that hits your web server generates a log entry. The format of that entry determines what you can analyze, how quickly you can parse it, and whether your monitoring pipeline can ingest it efficiently. Choosing the right log format directly impacts your ability to debug production issues, detect security threats, optimize performance, and understand traffic patterns.

Apache HTTP Server and Nginx are the two most widely deployed web servers and together serve a majority of websites on the internet. Despite serving the same fundamental purpose, they use different syntax for log format configuration, different default field orders, and different variable naming conventions. Understanding both is essential for any operations engineer, SRE, or developer working with web infrastructure.

> **Key Insight:** The default log formats for both Apache and Nginx derive from the NCSA Common Log Format defined in the early 1990s. Despite being over 30 years old, CLF remains the foundation that most log analysis tools expect.

## Apache Common Log Format (CLF)

The Common Log Format is the most basic standardized log format. Apache defines it with the following `LogFormat` directive:

```
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog /var/log/apache2/access.log common
```

A typical CLF entry looks like this:

```
203.0.113.50 - frank [10/Feb/2025:13:55:36 -0700] "GET /api/v2/users HTTP/1.1" 200 2326
```

### Field-by-Field Breakdown

| Field | Directive | Example Value | Description |
|-------|-----------|---------------|-------------|
| Remote Host | `%h` | `203.0.113.50` | Client IP address. Uses DNS hostname if `HostnameLookups On` |
| Identity | `%l` | `-` | RFC 1413 identity. Almost always a hyphen. Requires `mod_ident` |
| User | `%u` | `frank` | Authenticated username. Hyphen if no auth |
| Timestamp | `%t` | `[10/Feb/2025:13:55:36 -0700]` | Time the request was received, as `[day/month/year:hour:minute:second zone]`. Use `%{format}t` for a custom strftime layout |
| Request Line | `%r` | `GET /api/v2/users HTTP/1.1` | Full first line of request: method, URI, protocol |
| Status Code | `%>s` | `200` | Final HTTP status code (after internal redirects) |
| Bytes Sent | `%b` | `2326` | Response body size in bytes. Hyphen for zero bytes |

> **Warning:** The `%h` directive will perform a DNS reverse lookup if `HostnameLookups` is enabled, which can significantly slow your server under load. Always keep `HostnameLookups Off` in production.
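
The field breakdown above maps directly onto a regular expression with one group per field. The following is a minimal sketch rather than a complete parser: the names `CLF_RE` and `parse_clf` are illustrative, and it assumes the default `%t` layout and an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# One named group per CLF field, in the order of the table above.
CLF_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_clf(line):
    """Return a dict of typed CLF fields, or None if the line does not match."""
    m = CLF_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    # Default %t layout: day/month/year:hour:minute:second zone
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # %b logs "-" instead of 0 when no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # A hyphen in %u means the request was not authenticated.
    entry["user"] = None if entry["user"] == "-" else entry["user"]
    return entry

sample = '203.0.113.50 - frank [10/Feb/2025:13:55:36 -0700] "GET /api/v2/users HTTP/1.1" 200 2326'
entry = parse_clf(sample)
print(entry["host"], entry["user"], entry["status"], entry["size"], entry["time"].isoformat())
```

Converting the timestamp to a timezone-aware `datetime` early makes it straightforward to compare entries from servers in different time zones.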

### Status Code Nuance: %s vs %>s

Apache distinguishes between the original status code (`%s`) and the final status code (`%>s`). This matters when internal redirects occur:

```
# Request is internally redirected (for example to an ErrorDocument)
# %s  = status of the original request
# %>s = status of the final request, as sent to the client
```

Always use `%>s` in production log formats unless you specifically need to track pre-redirect status codes.

## Apache Combined Log Format

The Combined Log Format extends CLF with two critical fields: Referer and User-Agent.

```
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/apache2/access.log combined
```

Example output:

```
203.0.113.50 - frank [10/Feb/2025:13:55:36 -0700] "GET /api/v2/users HTTP/1.1" 200 2326 "https://example.com/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
```

### The Two Extra Fields

| Field | Directive | Description |
|-------|-----------|-------------|
| Referer | `%{Referer}i` | The URL the client came from. Hyphen if direct/bookmarked |
| User-Agent | `%{User-Agent}i` | Browser or bot identification string |

### Custom LogFormat Directives

Apache's `mod_log_config` supports extensive customization:

```
# Add response time in microseconds
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" combined_with_time

# Add SSL protocol and cipher
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{SSL_PROTOCOL}x %{SSL_CIPHER}x" combined_ssl

# Add X-Forwarded-For for reverse proxy setups
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy_combined

# Add virtual host and server port
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
```

| Directive | Description | Example Value |
|-----------|-------------|---------------|
| `%D` | Request processing time in microseconds | `234567` |
| `%T` | Request processing time in seconds | `0` |
| `%{ms}T` | Request processing time in milliseconds | `234` |
| `%I` | Bytes received (requires `mod_logio`) | `4872` |
| `%O` | Bytes sent including headers (requires `mod_logio`) | `23456` |
| `%v` | Canonical server name | `www.example.com` |
| `%p` | Server port | `443` |
| `%X` | Connection status: `X`=aborted, `+`=keepalive, `-`=closed | `+` |
| `%{VARNAME}e` | Environment variable | Varies |
| `%{Header}i` | Request header value | Varies |
| `%{Header}o` | Response header value | Varies |
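
Once `%D` is appended to the format, latency percentiles can be computed straight from the last field of each line. The sketch below assumes the `combined_with_time` layout shown earlier, where `%D` is the final field; the helper names and the sample lines are illustrative, not real traffic.

```python
import math

def durations_ms(lines):
    """Yield request durations in milliseconds from lines whose last field is %D."""
    for line in lines:
        fields = line.split()
        if fields and fields[-1].isdigit():
            yield int(fields[-1]) / 1000  # %D is in microseconds

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative lines in the combined_with_time format (made-up values).
prefix = '203.0.113.50 - - [10/Feb/2025:13:55:36 -0700] "GET / HTTP/1.1" 200 512 "-" "curl/8.5.0" '
sample_lines = [prefix + d for d in ("1200", "3400", "234567", "800", "5600")]

times = list(durations_ms(sample_lines))
print(f"p50={percentile(times, 50)} ms  p95={percentile(times, 95)} ms")
```

Nearest-rank is used here because it always returns an observed value; an interpolating method such as `statistics.quantiles` is an equally reasonable choice.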

### Conditional Logging

Apache supports conditional logging based on environment variables:

```
# Don't log health check requests
SetEnvIf Request_URI "^/health$" dontlog
SetEnvIf Request_URI "^/readyz$" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog

# Log bots to a separate file
SetEnvIf User-Agent "Googlebot" is_bot
SetEnvIf User-Agent "bingbot" is_bot
CustomLog /var/log/apache2/bot_access.log combined env=is_bot
CustomLog /var/log/apache2/human_access.log combined env=!is_bot
```

> **Best Practice:** Separating bot traffic into its own log file makes analysis significantly faster. [LogBeast](https://getbeast.io/logbeast/) can automatically detect and categorize bot traffic from any log format.
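
If the logs were already written to a single file, the same split can be applied after the fact. This is a rough sketch that mirrors the two `SetEnvIf` patterns above; `BOT_RE`, `user_agent`, and `split_by_bot` are hypothetical names, and the quote-based field extraction assumes the User-Agent is the last quoted field and contains no escaped quotes.

```python
import re

# Same case-sensitive substrings as the SetEnvIf rules above.
BOT_RE = re.compile(r"Googlebot|bingbot")

def user_agent(line):
    """Return the last double-quoted field of a Combined line, or '' if absent."""
    parts = line.rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""

def split_by_bot(lines):
    """Partition log lines into (bot_lines, human_lines)."""
    bots, humans = [], []
    for line in lines:
        (bots if BOT_RE.search(user_agent(line)) else humans).append(line)
    return bots, humans

# Illustrative input lines.
sample_lines = [
    '66.249.66.1 - - [10/Feb/2025:13:55:36 -0700] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.50 - - [10/Feb/2025:13:55:37 -0700] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
bots, humans = split_by_bot(sample_lines)
print(len(bots), "bot line(s),", len(humans), "human line(s)")
```

Matching on the User-Agent alone trusts whatever the client sends, so treat the result as a first-pass classification.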

## Nginx Default Log Format

Nginx defines its default log format using the `log_format` directive. The built-in format is called `combined`:

```
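# Predefined by Nginx and shown here for reference only. Declaring your own
# log_format named "combined" fails with a duplicate-name error.
# log_format combined '$remote_addr - $remote_user [$time_local] '
#                     '"$request" $status $body_bytes_sent '
#                     '"$http_referer" "$http_user_agent"';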

access_log /var/log/nginx/access.log combined;
error_log /var/log/nginx/error.log warn;
```

### Nginx Variable Reference

| Nginx Variable | Apache Equivalent | Description |
|----------------|-------------------|-------------|
| `$remote_addr` | `%h` / `%a` | Client IP address |
| `$remote_user` | `%u` | Authenticated username |
| `$time_local` | `%t` | Local time in CLF format |
| `$time_iso8601` | `%{%Y-%m-%dT%H:%M:%S%z}t` | ISO 8601 timestamp |
| `$request` | `%r` | Full request line |
| `$status` | `%>s` | Response status code |
| `$body_bytes_sent` | `%b` | Body bytes sent (excludes headers) |
| `$bytes_sent` | `%O` | Total bytes sent (includes headers) |
| `$http_referer` | `%{Referer}i` | Referer header |
| `$http_user_agent` | `%{User-Agent}i` | User-Agent header |
| `$request_time` | `%D` (different unit) | Request processing time in seconds with ms resolution |
| `$upstream_response_time` | N/A | Time spent waiting for upstream |
| `$connection` | N/A | Connection serial number |
| `$connection_requests` | N/A | Number of requests on this connection |
| `$msec` | N/A | Unix time at log write, in seconds with ms resolution |
| `$pipe` | N/A | `p` if the request was pipelined, `.` otherwise |

### Nginx Error Log Configuration

Unlike access logs, Nginx error logs have a fixed format that cannot be customized. You can only control the severity level:

```
# Error log levels: debug, info, notice, warn, error, crit, alert, emerg
error_log /var/log/nginx/error.log warn;

# Per-server block error logs
server {
    listen 443 ssl;
    server_name example.com;
    error_log /var/log/nginx/example.com.error.log error;
}
```

> **Warning:** Setting `error_log` to `debug` level in production generates enormous volumes of output and measurably impacts performance. Use `warn` or `error` for production.
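
Because the error log layout is fixed, a single pattern is usually enough to parse it. The sketch below assumes the commonly seen layout `date time [level] pid#tid: *connection message`; the sample line is illustrative, and `parse_error_line` and `at_least` are hypothetical helper names.

```python
import re

# Severity levels in increasing order, as listed in the config comment above.
LEVELS = ["debug", "info", "notice", "warn", "error", "crit", "alert", "emerg"]

ERROR_RE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[a-z]+)\] (?P<pid>\d+)#(?P<tid>\d+): "
    r"(?:\*(?P<connection>\d+) )?(?P<message>.*)$"
)
CLIENT_RE = re.compile(r"client: (?P<client>[^,\s]+)")

def parse_error_line(line):
    """Return a dict of error log fields, or None if the line does not match."""
    m = ERROR_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    client = CLIENT_RE.search(entry["message"])
    entry["client"] = client.group("client") if client else None
    return entry

def at_least(entry, threshold):
    """True if the entry's level is at or above the given severity."""
    return LEVELS.index(entry["level"]) >= LEVELS.index(threshold)

# Illustrative error log line.
sample = ('2025/02/10 13:55:36 [error] 12345#12345: *67 open() "/var/www/html/missing.txt" '
          'failed (2: No such file or directory), client: 203.0.113.50, server: example.com, '
          'request: "GET /missing.txt HTTP/1.1", host: "example.com"')
entry = parse_error_line(sample)
print(entry["level"], entry["client"], at_least(entry, "warn"))
```

Messages that are not tied to a connection (startup notices, for example) have no `*connection` prefix, which is why that group is optional.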

## Custom Log Formats: Apache vs Nginx Directives

### Complete Directive Comparison Table

| Data Point | Apache Directive | Nginx Variable |
|------------|-----------------|----------------|
| Client IP | `%a` | `$remote_addr` |
| Client IP (behind proxy) | `%{X-Forwarded-For}i` | `$http_x_forwarded_for` |
| Real Client IP (proxy-aware) | `%a` (with `mod_remoteip`) | `$remote_addr` (with `ngx_http_realip_module`; `$realip_remote_addr` keeps the proxy's address) |
| Server hostname | `%v` | `$server_name` |
| Server port | `%p` | `$server_port` |
| Request method | `%m` | `$request_method` |
| Request URI | `%U` | `$uri` |
| Request URI (original) | `%U%q` | `$request_uri` |
| Query string | `%q` | `$args` |
| Protocol | `%H` | `$server_protocol` |
| Request time (seconds) | `%T` | `$request_time` |
| Request time (microseconds) | `%D` | N/A |
| Request time (milliseconds) | `%{ms}T` | N/A |
| Bytes received | `%I` | `$request_length` |
| Bytes sent (body only) | `%b` | `$body_bytes_sent` |
| Bytes sent (total) | `%O` | `$bytes_sent` |
| SSL protocol | `%{SSL_PROTOCOL}x` | `$ssl_protocol` |
| SSL cipher | `%{SSL_CIPHER}x` | `$ssl_cipher` |
| Upstream response time | N/A (`%D` covers the whole request) | `$upstream_response_time` |
| Upstream status | N/A | `$upstream_status` |
| Upstream address | N/A | `$upstream_addr` |
| GeoIP country | `%{GEOIP_COUNTRY_CODE}e` | `$geoip_country_code` |
| Any request header | `%{Header-Name}i` | `$http_header_name` (lowercase, dashes become underscores) |
| Any response header | `%{Header-Name}o` | `$sent_http_header_name` (same naming rule) |
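
For simple formats, the table above can be applied mechanically. The following sketch translates an Apache `LogFormat` string into an Nginx-style format string. It covers only directives with a direct equivalent in the table and raises an error for anything else; `SIMPLE`, `TOKEN_RE`, and `apache_to_nginx` are illustrative names, not part of either server.

```python
import re

# Direct equivalents from the comparison table. %l has no Nginx counterpart,
# so it becomes a literal hyphen; %t carries its own square brackets.
SIMPLE = {
    "h": "$remote_addr", "a": "$remote_addr", "l": "-", "u": "$remote_user",
    "t": "[$time_local]", "r": "$request", "s": "$status",
    "b": "$body_bytes_sent", "O": "$bytes_sent", "I": "$request_length",
    "m": "$request_method", "H": "$server_protocol", "q": "$args",
    "v": "$server_name", "p": "$server_port", "T": "$request_time",
}

# Matches %h, %>s, %{Referer}i and similar tokens.
TOKEN_RE = re.compile(r"%[<>]?(?:\{(?P<arg>[^}]+)\})?(?P<letter>[A-Za-z])")

def header_var(prefix, name):
    """Build an Nginx header variable: lowercase, dashes become underscores."""
    return prefix + name.lower().replace("-", "_")

def apache_to_nginx(fmt):
    def convert(match):
        arg, letter = match.group("arg"), match.group("letter")
        if arg and letter == "i":
            return header_var("$http_", arg)
        if arg and letter == "o":
            return header_var("$sent_http_", arg)
        if not arg and letter in SIMPLE:
            return SIMPLE[letter]
        raise ValueError(f"no direct Nginx equivalent for {match.group(0)}")
    # Apache escapes quotes inside its double-quoted format string; an Nginx
    # format is normally single-quoted, so the backslashes are dropped.
    return TOKEN_RE.sub(convert, fmt.replace('\\"', '"'))

apache_combined = r'%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
print(apache_to_nginx(apache_combined))
```

The output is equivalent in layout, not in every value: `%b` logs a hyphen for empty bodies where `$body_bytes_sent` logs `0`, and `%T` is whole seconds where `$request_time` has millisecond resolution.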

### Production-Ready Custom Formats

```
# Apache - Extended production format
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D \"%{X-Forwarded-For}i\" %v %{SSL_PROTOCOL}x %X" production
CustomLog /var/log/apache2/access.log production
```

```
# Nginx - Extended production format
log_format production '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      '$request_time "$http_x_forwarded_for" '
                      '$server_name $ssl_protocol '
                      '$upstream_response_time $upstream_status';

access_log /var/log/nginx/access.log production;
```

> **Tip:** When adding custom fields, always append them to the end of the Combined format. This ensures backward compatibility with existing log parsers.

## JSON Structured Logging

Modern observability stacks (Elasticsearch, Splunk, Datadog, Loki) work best with structured data. JSON log formats eliminate parsing ambiguity and support schema evolution.

### Apache JSON Configuration

```
LogFormat "{\"timestamp\":\"%{%Y-%m-%dT%H:%M:%S%z}t\",\"remote_addr\":\"%a\",\"remote_user\":\"%u\",\"request_method\":\"%m\",\"request_uri\":\"%U%q\",\"protocol\":\"%H\",\"status\":%>s,\"body_bytes_sent\":%B,\"http_referer\":\"%{Referer}i\",\"http_user_agent\":\"%{User-Agent}i\",\"request_time_us\":%D,\"ssl_protocol\":\"%{SSL_PROTOCOL}x\",\"vhost\":\"%v\"}" json
CustomLog /var/log/apache2/access.json.log json
```

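> **Warning:** Apache's manual JSON construction is fragile. `mod_log_config` escapes double quotes and backslashes in logged header values with a backslash, which JSON accepts, but it writes control characters and other non-printable bytes as `\xhh`, which is not a valid JSON escape. A single request with such a byte in its User-Agent produces a line that strict JSON parsers reject.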
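
One way to consume such logs anyway is to repair the escapes before parsing. Apache writes non-printable bytes in logged values as `\xhh`, which JSON does not accept, while `\u00hh` is accepted. The sketch below rewrites one into the other; `HEX_ESCAPE_RE` and `load_apache_json` are hypothetical names, and the approach assumes the only invalid escapes present are of the `\xhh` form.

```python
import json
import re

# Matches \xhh only when it is preceded by an even number of backslashes.
# Apache writes a literal backslash as \\, so \\x41 is plain text, not an escape.
HEX_ESCAPE_RE = re.compile(r'(?<!\\)((?:\\\\)*)\\x([0-9a-fA-F]{2})')

def load_apache_json(line):
    """Parse one JSON-formatted Apache log line after repairing hex escapes."""
    repaired = HEX_ESCAPE_RE.sub(r'\1\\u00\2', line)
    return json.loads(repaired)

# Illustrative line: a control byte and escaped quotes in the User-Agent,
# plus an escaped backslash that must be left untouched.
line = r'{"ua":"bad\x01agent \"q\"","path":"C:\\x41"}'
entry = load_apache_json(line)
print(repr(entry["ua"]), repr(entry["path"]))
```

Each `\xhh` is mapped to the code point with the same number, so multi-byte UTF-8 sequences come out as separate characters and would need re-decoding. Other C-style escapes that JSON lacks, such as `\v`, would need the same treatment if they occur.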

### Nginx JSON Configuration

```
log_format json_log escape=json
    '{'
        '"timestamp":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"remote_user":"$remote_user",'
        '"request_method":"$request_method",'
        '"request_uri":"$request_uri",'
        '"protocol":"$server_protocol",'
        '"status":$status,'
        '"body_bytes_sent":$body_bytes_sent,'
        '"request_length":$request_length,'
        '"http_referer":"$http_referer",'
        '"http_user_agent":"$http_user_agent",'
        '"request_time":$request_time,'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_status":"$upstream_status",'
        '"ssl_protocol":"$ssl_protocol",'
        '"server_name":"$server_name",'
        '"connection":$connection,'
        '"connection_requests":$connection_requests'
    '}';

access_log /var/log/nginx/access.json.log json_log;
```

> **Best Practice:** Always use `escape=json` in Nginx JSON log formats. Without it, special characters in user-agent strings and URLs will produce invalid JSON.
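
The format above keeps `upstream_response_time` as a quoted string for a reason: the value can be empty or a hyphen when no upstream was contacted, and it holds several times separated by commas or colons when more than one upstream was tried. A small sketch of normalizing it after `json.loads` follows; `upstream_times` is a hypothetical helper and the sample line is illustrative.

```python
import json
import re

def upstream_times(value):
    """Split an $upstream_response_time string into a list of seconds."""
    times = []
    for part in re.split(r"[,:]", value):
        part = part.strip()
        # Skip placeholders logged when no upstream time is available.
        if part and part != "-":
            times.append(float(part))
    return times

# Illustrative line in the json_log format defined above (fields trimmed).
line = ('{"status":200,"request_time":1.204,'
        '"upstream_response_time":"1.001, 0.203","upstream_status":"502, 200"}')
entry = json.loads(line)
attempts = upstream_times(entry["upstream_response_time"])
print(len(attempts), "upstream attempt(s), total", round(sum(attempts), 3), "s")
```

Summing the attempts gives the time spent waiting on upstreams, which can then be compared with `request_time` to see how much latency Nginx itself and the client connection added.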

### JSON Format Comparison

| Feature | Apache JSON | Nginx JSON |
|---------|-------------|------------|
| Native JSON support | No | No |
| Auto-escaping | No (manual) | Yes (`escape=json`) |
| Numeric types | Manual (omit quotes) | Manual (omit quotes) |
| Nested objects | Not supported | Not supported |
| ISO 8601 timestamps | `%{%Y-%m-%dT%H:%M:%S%z}t` | `$time_iso8601` |
| Upstream metrics | Limited | Comprehensive |
| Broken JSON risk | High | Low (with escape=json) |

## Log Rotation and Management

Without rotation, web server logs grow unbounded. A busy site generating 10,000 requests per minute (about 14.4 million per day) produces roughly 5 GB of Combined format logs per day at ~350 bytes per line.

### Apache Logrotate Configuration

```
# /etc/logrotate.d/apache2
/var/log/apache2/*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 640 root adm
    sharedscripts
    postrotate
        if invoke-rc.d apache2 status > /dev/null 2>&1; then
            invoke-rc.d apache2 reload > /dev/null
        fi
    endscript
}
```

### Nginx Logrotate Configuration

```
# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}
```

> **Key Difference:** Apache reopens its log files only on a restart, so logrotate triggers a `reload` (graceful restart): in-flight requests finish, but worker processes are replaced. Nginx reopens its log files in place when the master process receives `USR1` (what the `rotate` action sends), without restarting workers.
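
With the configurations above, rotated files carry numeric suffixes where a higher number means older data, and `delaycompress` leaves the most recent rotation uncompressed. An analysis script that wants to read everything chronologically has to sort accordingly. This sketch assumes logrotate's default numeric naming (no `dateext`); `rotation_index` and `oldest_first` are illustrative names.

```python
import re

# access.log -> 0, access.log.1 -> 1, access.log.2.gz -> 2, ...
ROTATED_RE = re.compile(r"\.log(?:\.(?P<n>\d+))?(?:\.gz)?$")

def rotation_index(name):
    """Return the rotation number of a log file name (0 for the live file)."""
    m = ROTATED_RE.search(name)
    if m is None:
        raise ValueError(f"not a recognized log file name: {name}")
    return int(m.group("n") or 0)

def oldest_first(names):
    """Order log file names from the oldest rotation to the live file."""
    return sorted(names, key=rotation_index, reverse=True)

names = ["access.log", "access.log.10.gz", "access.log.2.gz", "access.log.1"]
print(oldest_first(names))
```

Sorting the names as plain strings would place `access.log.10.gz` before `access.log.2.gz`, which is why the suffix is compared as an integer.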

### Disk Space Estimation

| Requests/day | CLF (~150 bytes/line) | Combined (~350 bytes/line) | JSON (~600 bytes/line) |
|--------------|-----------------------|---------------------------|----------------------|
| 100,000 | ~14 MB | ~33 MB | ~57 MB |
| 1,000,000 | ~143 MB | ~333 MB | ~572 MB |
| 10,000,000 | ~1.4 GB | ~3.3 GB | ~5.7 GB |
| 100,000,000 | ~14 GB | ~33 GB | ~57 GB |

With gzip compression, expect 85-95% size reduction.
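
The table values follow from a single multiplication, which is easy to rerun with your own traffic numbers. The sketch below reproduces two rows; the per-line sizes are the averages assumed in the table header, units are binary (1 MB = 1024 × 1024 bytes) to match the table, and `daily_log_size` and `human` are illustrative names.

```python
def daily_log_size(requests_per_day, bytes_per_line, compression_ratio=0.0):
    """Estimated bytes written per day.

    compression_ratio is the fraction of the size removed by compression,
    for example 0.9 for a 90% reduction.
    """
    return requests_per_day * bytes_per_line * (1 - compression_ratio)

def human(num_bytes):
    """Format a byte count using binary units, as in the table above."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1024 or unit == "TB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024

print(human(daily_log_size(100_000, 150)))         # CLF row, 100K requests/day
print(human(daily_log_size(10_000_000, 350)))      # Combined row, 10M requests/day
print(human(daily_log_size(10_000_000, 350, 0.9))) # same row after 90% compression
```

Multiplying the compressed figure by the number of retained rotations gives a first estimate of the disk space a retention policy needs.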

## Parsing Log Files

### Regex Patterns

```
# Combined Log Format regex (PCRE)
^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"
```
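
The pattern above has two blind spots worth knowing about. It requires the request line to contain exactly three space-separated tokens, so malformed requests (logged as `"-"` or as binary noise) do not match, and `[^"]*` stops at the first quote, which truncates Apache values containing an escaped `\"`. The following is a more tolerant variant, offered as a sketch with hypothetical names (`quoted`, `TOLERANT_RE`, `split_request`) rather than a drop-in replacement.

```python
import re

def quoted(name):
    """Regex for a double-quoted field that may contain backslash escapes."""
    return rf'"(?P<{name}>(?:[^"\\]|\\.)*)"'

# The request line is captured whole and split afterwards, so malformed
# requests still produce a match.
TOLERANT_RE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    + quoted("request")
    + r' (?P<status>\d{3}) (?P<bytes>\S+) '
    + quoted("referer") + " " + quoted("useragent")
)

def split_request(request):
    """Return (method, path, protocol), or three Nones for a malformed request."""
    parts = request.split(" ")
    return tuple(parts) if len(parts) == 3 else (None, None, None)

# Illustrative lines: an escaped quote in the User-Agent, then an empty request.
lines = [
    r'203.0.113.50 - - [10/Feb/2025:13:55:36 -0700] "GET /a HTTP/1.1" 200 12 "-" "evil \"quoted\" agent"',
    r'198.51.100.7 - - [10/Feb/2025:13:55:37 -0700] "-" 408 0 "-" "-"',
]
for line in lines:
    m = TOLERANT_RE.match(line)
    print(split_request(m.group("request")), m.group("useragent"))
```

Captured values keep their backslash escapes, so unescape them if you need the original header text.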

### AWK One-Liners for Quick Analysis

```bash
# Top 20 IP addresses
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20

# Top 20 requested URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20

# Status code distribution
awk '{print $9}' access.log | sort | uniq -c | sort -rn

# Requests per hour
awk '{print substr($4,2,14)}' access.log | sort | uniq -c

# Total bandwidth in MB
awk '{sum+=$10} END {printf "%.2f MB\n", sum/1024/1024}' access.log

# Find all Googlebot requests
awk -F'"' '$6 ~ /Googlebot/ {print $2}' access.log | sort | uniq -c | sort -rn | head -20

# 5xx errors with full details
awk '$9 ~ /^5/ {print $0}' access.log | tail -50
```

### Python Log Parser

```python
#!/usr/bin/env python3
"""Production-grade log parser for Apache/Nginx Combined Log Format."""

import re
import gzip
import sys
from collections import Counter

COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<useragent>[^"]*)"'
)

def parse_line(line):
    match = COMBINED_RE.match(line)
    if match:
        d = match.groupdict()
        d['bytes'] = 0 if d['bytes'] == '-' else int(d['bytes'])
        d['status'] = int(d['status'])
        return d
    return None

def analyze_log(filepath):
    stats = {'total': 0, 'parsed': 0, 'status_codes': Counter(), 'top_ips': Counter()}
    opener = gzip.open if filepath.endswith('.gz') else open
    with opener(filepath, 'rt', encoding='utf-8', errors='replace') as f:
        for line in f:
            stats['total'] += 1
            entry = parse_line(line.strip())
            if entry:
                stats['parsed'] += 1
                stats['status_codes'][entry['status']] += 1
                stats['top_ips'][entry['ip']] += 1

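    print(f"Total: {stats['total']:,} | Parsed: {stats['parsed']:,}")
    for code, count in stats['status_codes'].most_common():
        print(f"  {code}: {count:,}")
    # Report the busiest clients, which are counted above but otherwise unused.
    for ip, count in stats['top_ips'].most_common(10):
        print(f"  {ip}: {count:,}")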

if __name__ == '__main__':
    for filepath in sys.argv[1:]:
        analyze_log(filepath)
```

### Parsing JSON Logs

```bash
# jq - extract all 5xx errors
cat access.json.log | jq -r 'select(.status >= 500) | "\(.timestamp) \(.status) \(.request_uri)"'

# jq - top IPs by request count
cat access.json.log | jq -r '.remote_addr' | sort | uniq -c | sort -rn | head -20

# jq - average response time per endpoint
cat access.json.log | jq -r '"\(.request_uri) \(.request_time)"' | \
    awk '{sum[$1]+=$2; count[$1]++} END {for (u in sum) printf "%s %.3f (%d reqs)\n", u, sum[u]/count[u], count[u]}' | \
    sort -k2 -rn | head -20
```

> **Pro Tip:** JSON logs eliminate the need for complex regex parsing. The disk space overhead (roughly 70% larger than Combined, based on the per-line estimates above) is usually worth the parsing simplicity. [LogBeast](https://getbeast.io/logbeast/) natively supports both Combined and JSON formats with automatic detection.
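
The per-endpoint average can also be computed without `jq` or `awk`. The sketch below assumes the field names from the Nginx `json_log` format shown earlier (`request_uri`, `request_time`); it additionally strips the query string so that `/api/users?page=1` and `/api/users?page=2` count as one endpoint. The function name and the sample lines are illustrative.

```python
import json
from collections import defaultdict

def average_time_by_path(lines):
    """Return {path: (average_seconds, request_count)} from JSON log lines."""
    totals = defaultdict(lambda: [0.0, 0])  # path -> [sum of seconds, count]
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        uri, seconds = entry.get("request_uri"), entry.get("request_time")
        if uri is None or seconds is None:
            continue
        path = uri.split("?", 1)[0]  # group by path, ignoring the query string
        totals[path][0] += float(seconds)
        totals[path][1] += 1
    return {path: (total / count, count) for path, (total, count) in totals.items()}

# Illustrative lines with only the two fields this function reads.
sample_lines = [
    '{"request_uri":"/api/users?page=1","request_time":0.25}',
    '{"request_uri":"/api/users?page=2","request_time":0.75}',
    '{"request_uri":"/health","request_time":0.002}',
    'not json',
]
for path, (avg, count) in sorted(average_time_by_path(sample_lines).items()):
    print(f"{path} {avg:.3f} ({count} reqs)")
```

Reading line by line keeps memory use proportional to the number of distinct paths rather than to the size of the log file.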

## Which Format to Choose

### Decision Matrix

| Criteria | CLF | Combined | Custom Extended | JSON |
|----------|-----|----------|-----------------|------|
| Disk usage | Lowest | Medium | Medium-High | Highest |
| Parse complexity | Simple regex | Moderate regex | Complex regex | Trivial (native JSON) |
| Tool compatibility | Universal | Universal | Custom parsers | Modern tools only |
| Bot/SEO analysis | No (no UA) | Yes | Yes | Yes |
| Performance debugging | No (no timing) | No (no timing) | Yes | Yes |
| ELK/Splunk/Datadog | Supported | Supported | Grok patterns needed | Native ingest |
| Schema evolution | Rigid | Rigid | Version carefully | Add fields freely |
| Human readability | Good | Good | Moderate | Verbose but clear |

### Recommendations by Use Case

- **Small sites (< 100K requests/day):** Use Combined format for the best balance of information and simplicity.
- **Medium sites (100K - 10M requests/day):** Use Custom Extended format with request timing.
- **Large sites (> 10M requests/day):** Use JSON format piped directly to your observability platform.
- **Microservices / Kubernetes:** Use JSON format exclusively for native integration with Fluentd and log collectors.
- **SEO-focused analysis:** Use Combined or Custom Extended with User-Agent field for bot detection and crawl analysis.

**Hybrid Approach:** Write two log files simultaneously, Combined for backward compatibility and JSON for pipeline ingestion:

```
# Apache - Dual logging
CustomLog /var/log/apache2/access.log combined
CustomLog /var/log/apache2/access.json.log json

# Nginx - Dual logging
access_log /var/log/nginx/access.log combined;
access_log /var/log/nginx/access.json.log json_log;
```
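
With two formats on disk, analysis scripts benefit from checking which one they were handed before parsing. The heuristic below is a rough sketch of one way to do that by inspecting a single line; it is not how any particular tool detects formats, and `detect_format` and both patterns are hypothetical.

```python
import json
import re

# Status and size followed by two quoted fields: Combined, possibly extended.
COMBINED_TAIL_RE = re.compile(r'" \d{3} (?:\d+|-) "[^"]*" "[^"]*"')
# Status and size at the very end of the line: plain CLF.
CLF_TAIL_RE = re.compile(r'" \d{3} (?:\d+|-)$')

def detect_format(line):
    """Classify a log line as 'json', 'combined', 'clf', or 'unknown'."""
    line = line.strip()
    if line.startswith("{"):
        try:
            json.loads(line)
            return "json"
        except json.JSONDecodeError:
            return "unknown"
    if COMBINED_TAIL_RE.search(line):
        return "combined"
    if CLF_TAIL_RE.search(line):
        return "clf"
    return "unknown"

clf = '203.0.113.50 - frank [10/Feb/2025:13:55:36 -0700] "GET /api/v2/users HTTP/1.1" 200 2326'
combined = clf + ' "https://example.com/dashboard" "Mozilla/5.0"'
print(detect_format(clf), detect_format(combined), detect_format('{"status":200}'))
```

Checking the first few lines of a file and taking the most common answer is more robust than trusting a single line.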

## Conclusion

Apache and Nginx share a common heritage in the NCSA Common Log Format, but their configuration syntax and available variables differ significantly. Key takeaways:

- **CLF and Combined formats** remain the universal standard. Start here unless you have a specific reason not to.
- **Apache uses `%` directives** while **Nginx uses `$` variables**. The mapping is well-defined but not always one-to-one.
- **JSON logging** eliminates parsing ambiguity at the cost of disk space. Use `escape=json` in Nginx.
- **Request timing** (`%D` in Apache, `$request_time` in Nginx) is the single most valuable custom field you can add.
- **Log rotation** is non-negotiable in production. Nginx has a slight advantage with its zero-downtime `USR1` signal.
- **Dual logging** (Combined + JSON) gives you the best of both worlds at the cost of double disk usage.

Whatever format you choose, the most important step is to actually analyze your logs regularly.

> **Next Step:** Ready to analyze your Apache and Nginx logs without writing parsers? [LogBeast](https://getbeast.io/logbeast/) automatically detects CLF, Combined, Custom, and JSON log formats from both servers. Import your logs and get instant insights into traffic patterns, bot behavior, and performance metrics.

---

*Published by [GetBeast](https://getbeast.io) on February 12, 2025*