# Target Specification

This document specifies requirements for installation targets for the
integration of Vector.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be
interpreted as described in [RFC 2119].

Other words, such as "agent", "aggregator", "node", and "service", are to be
interpreted as described in the [terminology document][terminology_document].

- [1. Introduction](#1-introduction)
- [2. Installation Targets](#2-installation-targets)
- [3. Deployment Architectures](#3-deployment-architectures)
  - [4. Agent Architecture](#4-agent-architecture)
  - [5. Aggregator Architecture](#5-aggregator-architecture)
  - [6. Unified Architecture](#6-unified-architecture)
- [7. Hardening](#7-hardening)

## 1. Introduction

In its simplest form, installing Vector consists of downloading the binary and
making it executable. That approach leaves much to be desired for users looking to
integrate Vector into real-world production environments. To adhere to Vector's
["reduce decisions" design principle][reduce_decisions], Vector must also be opinionated about how
it's deployed, providing easy facilities for adopting Vector's
[reference architectures][reference_architectures],
[achieving high availability][high_availability], and [hardening][hardening]
Vector.

## 2. Installation Targets

Vector supports a number of installation targets that can be categorized into:

- Virtual/Physical Machine
- Orchestration Platform

The primary differentiator between the two is that Virtual/Physical Machines
provide a single node as the deployment target, whereas Orchestration Platforms
allow a scheduler to deploy Vector across a number of nodes. Each category
has its own requirements for each
[Deployment Architecture](#3-deployment-architectures).

Examples of Virtual/Physical Machine targets include, but are not limited to:

- Debian
- Docker
- RHEL
- Windows

Examples of Orchestration Platform targets include, but are not limited to:

- Kubernetes

## 3. Deployment Architectures

Vector must support each target through the following deployment
architectures:

- Targets MUST support the [agent architecture][agent_architecture] by
  providing a single command that deploys Vector and achieves the
  [agent architecture requirements](#4-agent-architecture).
- Targets SHOULD support the [aggregator architecture][aggregator_architecture] by
  providing a single command that deploys Vector and achieves the
  [aggregator architecture requirements](#5-aggregator-architecture).
- Targets MAY support the [unified architecture][unified_architecture] by
  providing a single command that deploys Vector and achieves the
  [unified architecture requirements](#6-unified-architecture).

### 4. Agent Architecture

The [agent architecture][agent_architecture] deploys Vector on each individual
node for distributed data collection and processing. Along with general
[hardening](#7-hardening) requirements, the following requirements define support
for this architecture:

- Architecture
  - MUST deploy as a daemon on existing nodes, one Vector process per node.
  - MUST deploy with Vector's [default agent configuration][default_agent_configuration].
- Sizing
  - MUST deploy as a good infrastructure citizen, giving resource priority to
    other services on the same node.
  - SHOULD be limited to 1 vCPU by default, MUST be overridable by the user.
  - SHOULD be limited to 2 GiB of memory per vCPU by default, MUST be
    overridable by the user.
  - SHOULD be limited to 1 GiB of disk space, MUST be overridable by the user.
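
The agent sizing defaults above can be sketched as a small resolver. This is only an illustration, not part of Vector: the names `AgentLimits` and `agent_limits` are hypothetical, and the sketch assumes that the memory default scales with a user-overridden vCPU count.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3  # bytes in one GiB


@dataclass(frozen=True)
class AgentLimits:
    """Resolved resource ceilings for one agent process (hypothetical type)."""
    vcpus: float
    memory_bytes: int
    disk_bytes: int


def agent_limits(
    vcpus: Optional[float] = None,
    memory_gib: Optional[float] = None,
    disk_gib: Optional[float] = None,
) -> AgentLimits:
    """Apply the spec defaults, letting the user override each value."""
    cpus = 1.0 if vcpus is None else vcpus                   # default: 1 vCPU
    mem = 2.0 * cpus if memory_gib is None else memory_gib   # default: 2 GiB per vCPU
    disk = 1.0 if disk_gib is None else disk_gib             # default: 1 GiB of disk
    return AgentLimits(cpus, int(mem * GIB), int(disk * GIB))
```

Calling `agent_limits()` yields the defaults, while `agent_limits(vcpus=2)` raises the memory ceiling to 4 GiB unless `memory_gib` is also given.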

### 5. Aggregator Architecture

The [aggregator architecture][aggregator_architecture] deploys Vector onto
dedicated nodes for data aggregation. Along with general [hardening](#7-hardening)
requirements, the following requirements define support for this architecture:

- Architecture
  - MUST deploy as a service with reserved/dedicated resources.
  - SHOULD deploy with a persistent disk that is available between deployments by default,
    MUST be overridable by the user if they do not want a persistent disk.
  - MUST deploy with Vector's [default aggregator configuration][default_aggregator_configuration].
  - Configured Vector ports, including non-default user configured ports,
    SHOULD be automatically accessible within the Cluster or VPC.
  - Configured Vector sources, including non-default user configured sources,
    SHOULD be automatically discoverable via target service discovery
    mechanisms.
- Sizing
  - MUST have dedicated/reserved resources that cannot be stolen by other services, preventing
    the "noisy neighbor" problem to the degree possible.
  - The Vector service SHOULD NOT be artificially limited with resource
    limiters such as cgroups.
  - SHOULD require 8 vCPUs by default, MUST be overridable by the user.
  - SHOULD require 2 GiB of memory per vCPU (16 GiB in this case) by default,
    MUST be overridable by the user.
  - SHOULD request 36 GiB of disk space per vCPU by default (288 GiB in this case),
    MUST be overridable by the user.
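
The per-vCPU arithmetic above (16 GiB of memory and 288 GiB of disk at 8 vCPUs) can be checked with a short sketch. The names `AggregatorRequests` and `aggregator_requests` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_VCPUS = 8
MEMORY_GIB_PER_VCPU = 2
DISK_GIB_PER_VCPU = 36


@dataclass(frozen=True)
class AggregatorRequests:
    """Resources reserved for one aggregator instance (hypothetical type)."""
    vcpus: int
    memory_gib: int
    disk_gib: int


def aggregator_requests(
    vcpus: int = DEFAULT_VCPUS,
    memory_gib: Optional[int] = None,
    disk_gib: Optional[int] = None,
) -> AggregatorRequests:
    """Derive memory and disk from the vCPU count unless the user overrides them."""
    return AggregatorRequests(
        vcpus=vcpus,
        memory_gib=MEMORY_GIB_PER_VCPU * vcpus if memory_gib is None else memory_gib,
        disk_gib=DISK_GIB_PER_VCPU * vcpus if disk_gib is None else disk_gib,
    )


if __name__ == "__main__":
    print(aggregator_requests())
    # AggregatorRequests(vcpus=8, memory_gib=16, disk_gib=288)
```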

The following are additional requirements for Orchestration Platform installation
targets:

- High Availability
  - SHOULD deploy across multiple nodes by default, MUST be overridable by the user.
  - SHOULD deploy across multiple availability zones by default, MUST be overridable by the user.
- Scaling
  - SHOULD provide facilities for provisioning a load balancer to enable horizontal scaling
    out of the box. MUST be overridable by the user.
    - Cloud-managed load balancers (e.g., AWS NLB) SHOULD be supported in addition to
      self-managed load balancers (e.g., HAProxy).
    - Cloud-managed load balancers SHOULD be prioritized by default over self-managed
      load balancers.
    - Network load balancers (layer-4) SHOULD be prioritized over HTTP load balancers (layer-7).
  - Autoscaling SHOULD be enabled by default, targeting an average CPU
    utilization of 85% with a stabilization period of 5 minutes.
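
To make the autoscaling defaults concrete, here is a minimal sketch of how an 85% CPU target and a 5-minute stabilization period could interact. It borrows the replica formula used by the Kubernetes Horizontal Pod Autoscaler and assumes the stabilization period applies to scale-down only; this document does not state either, and the `Autoscaler` class is hypothetical.

```python
import math
from collections import deque
from typing import Deque, Tuple

TARGET_CPU_PERCENT = 85       # average CPU utilization target
STABILIZATION_SECONDS = 300   # 5-minute stabilization period


def desired_replicas(current_replicas: int, avg_cpu_percent: float) -> int:
    """Scale the replica count in proportion to the utilization/target ratio."""
    return max(1, math.ceil(current_replicas * avg_cpu_percent / TARGET_CPU_PERCENT))


class Autoscaler:
    """Scales up immediately, but scales down only after the load has stayed
    low for the whole stabilization period (assumed behavior)."""

    def __init__(self, replicas: int) -> None:
        self.replicas = replicas
        self._history: Deque[Tuple[float, int]] = deque()  # (timestamp, recommendation)

    def observe(self, now: float, avg_cpu_percent: float) -> int:
        self._history.append((now, desired_replicas(self.replicas, avg_cpu_percent)))
        # Forget recommendations older than the stabilization period.
        while now - self._history[0][0] > STABILIZATION_SECONDS:
            self._history.popleft()
        # Use the highest recent recommendation to avoid flapping on scale-down.
        self.replicas = max(rec for _, rec in self._history)
        return self.replicas
```

Starting from 4 replicas, an observation of 95% CPU scales up to 5 at once, while a later drop to 40% only takes effect after the higher recommendation has aged out of the 5-minute window.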

### 6. Unified Architecture

The [unified architecture][unified_architecture] deploys Vector on each
individual node as an agent and as a separate service as an aggregator.
The requirements for both the [agent](#4-agent-architecture) and the
[aggregator](#5-aggregator-architecture) apply to this architecture.
This architecture SHOULD NOT be installed on Virtual/Physical Machine
targets as there is little added benefit.
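
The support levels from sections 3 and 6 can be expressed as a simple conformance check. This is an illustrative sketch only; `Architecture`, `TargetKind`, and `check_target` are hypothetical names and not part of any Vector tooling.

```python
from enum import Enum
from typing import List, Set, Tuple


class Architecture(Enum):
    AGENT = "agent"
    AGGREGATOR = "aggregator"
    UNIFIED = "unified"


class TargetKind(Enum):
    MACHINE = "virtual/physical machine"
    ORCHESTRATION_PLATFORM = "orchestration platform"


def check_target(
    kind: TargetKind, supported: Set[Architecture]
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): MUST violations are errors,
    SHOULD / SHOULD NOT deviations are warnings."""
    errors: List[str] = []
    warnings: List[str] = []
    if Architecture.AGENT not in supported:
        errors.append("targets MUST support the agent architecture")
    if Architecture.AGGREGATOR not in supported:
        warnings.append("targets SHOULD support the aggregator architecture")
    # Supporting the unified architecture is optional (MAY), so its absence
    # is never reported; its presence on a machine target is discouraged.
    if Architecture.UNIFIED in supported and kind is TargetKind.MACHINE:
        warnings.append(
            "the unified architecture SHOULD NOT be installed on "
            "Virtual/Physical Machine targets"
        )
    return errors, warnings
```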

## 7. Hardening

- Setup
  - An unprivileged Vector service account SHOULD be created upon installation
    for running the Vector process.
- Data hardening
  - Swap SHOULD be disabled to prevent in-flight data from leaking to disk.
    Swap would also make Vector prohibitively slow.
  - Vector's data directory SHOULD be read and write restricted to Vector's
    dedicated service account.
  - Core dumps SHOULD be prevented for the Vector process to prevent in-flight
    data from leaking to disk.
- Process hardening
  - Vector's artifacts
    - All communication during the setup process, such as downloading Vector
      artifacts, MUST use encrypted channels.
    - Downloaded Vector artifacts MUST be verified against the provided
      checksum.
    - The latest Vector version SHOULD be downloaded unless otherwise specified
      by the user.
  - Vector's configuration
    - Vector's configuration directory SHOULD be read restricted to Vector's
      service account.
  - Vector's runtime
    - Vector SHOULD be run under an unprivileged, dedicated service account.
    - Vector's service account SHOULD NOT have the ability to overwrite Vector's
      binary or configuration files. The only directory the Vector service
      account should write to is Vector's data directory.
- Network hardening
  - Configured sources and sinks SHOULD use encrypted channels by default.
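
As an illustration of the artifact verification requirement, the following sketch checks a downloaded file against a published checksum. It assumes the checksum is a hex-encoded SHA-256 digest, which this document does not specify; the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large artifacts are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the artifact does not match the expected checksum."""
    actual = sha256_of(path)
    # Assumption: the published checksum is hex-encoded SHA-256.
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

An installer would call `verify_artifact` after the download completes and before the artifact is unpacked or executed, aborting the installation on `ValueError`.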

[agent_architecture]: https://vector.dev/docs/setup/going-to-prod/arch/agent/
[aggregator_architecture]: https://vector.dev/docs/setup/going-to-prod/arch/aggregator/
[default_agent_configuration]: https://github.com/vectordotdev/vector/blob/master/config/agent/vector.yaml
[default_aggregator_configuration]: https://github.com/vectordotdev/vector/blob/master/config/aggregator/vector.yaml
[hardening]: https://vector.dev/docs/setup/going-to-prod/hardening/
[high_availability]: https://vector.dev/docs/setup/going-to-prod/high-availability/
[reduce_decisions]: https://github.com/vectordotdev/vector/blob/master/docs/USER_EXPERIENCE_DESIGN.md#be-opinionated--reduce-decisions
[reference_architectures]: https://vector.dev/docs/setup/going-to-prod/arch/
[rfc 2119]: https://datatracker.ietf.org/doc/html/rfc2119
[terminology_document]: https://vector.dev/docs/reference/glossary/
[unified_architecture]: https://vector.dev/docs/setup/going-to-prod/arch/unified/