[{"content":"Introduction OpenCode is an open-source, terminal-native AI coding agent built in Go. Unlike proprietary tools such as Cursor or Claude Code, it is entirely provider-agnostic: you bring your own API keys, run models locally, or subscribe to managed plans, and OpenCode handles the orchestration. With support for 75+ LLM providers, native LSP integration, MCP extensibility, and a thriving plugin ecosystem, it has become a prominent open-source option for agentic coding.\nThis guide takes you from installation to production workflows. We begin with core concepts (agents, subagents, LSP, and MCP), then introduce Subagent-Driven Development (SDD), the methodology that ties orchestration, planning, and execution into a coherent workflow. From there, we map SDD to real-world domains: full-stack development, sysadmin operations, and DevOps infrastructure, drawing on documented use cases from developers in the field.\nWhat is OpenCode? OpenCode is an open-source AI coding agent (MIT license) designed to run in your terminal, desktop, or IDE. It treats agents as a runtime system, not loose prompts. Agents are defined in code or loaded from Markdown, merged into a shared registry, and executed through a unified prompt, permission, and session pipeline.\nKey Features Provider-agnostic: Claude, OpenAI, Google Gemini, Groq, Fireworks, Together AI, OpenRouter, Azure, AWS Bedrock, and local models via Ollama. Native Terminal UI: Built with Bubble Tea (Go) for a smooth TUI experience. Multi-session support: Run multiple agents in parallel, each with its own context. LSP integration: Automatically loads Language Server Protocol servers for code intelligence. MCP support: Extend capabilities via the Model Context Protocol. Plugin system: TypeScript/JavaScript plugins with 25+ lifecycle hooks. Client/Server architecture: Run the server headlessly and connect from multiple clients. Privacy-first: Does not store your code or context data. 
Installation\n# Via install script\ncurl -fsSL https://opencode.ai/install | bash\n# Via npm\nnpm install -g opencode-ai\n# Start OpenCode\nopencode\nOn first launch, OpenCode detects your project structure, initializes a .opencode/ directory, and prompts you to configure a model provider.\nOpenCode Go: Low-Cost Model Access OpenCode Go is a low-cost subscription ($5 for the first month, then $10/month) that provides reliable access to curated open-source coding models. It is designed for developers who want generous limits and stable global access without managing multiple API keys.\nWhat You Get Price: $5 first month, then $10/month. Cancel anytime. Models included: GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro, DeepSeek V4 Flash. Hosting: US, EU, and Singapore for stable global access. Privacy: Zero-retention policy; providers do not use your data for model training. Usage Limits Limits are defined in dollar value rather than fixed request counts: $12 per 5 hours, $30 per week, and $60 per month.\nCheaper models stretch further. DeepSeek V4 Flash allows ~31,650 requests per 5 hours, while GLM-5.1 allows ~880.\nSetup Subscribe at opencode.ai/go. Copy your API key. In the TUI, run /connect, select OpenCode Go, and paste your key. Run /models to see available models. Model IDs use the format opencode-go/\u0026lt;model-id\u0026gt;, for example:\n{ \u0026#34;model\u0026#34;: \u0026#34;opencode-go/kimi-k2.6\u0026#34; } Core Concepts Before diving into workflows, it is essential to understand the building blocks of OpenCode: agents, subagents, LSP, MCP, and the project rules system.\nAgents and Subagents OpenCode has two types of agents:\nPrimary agents: The main assistants you interact with directly. Cycle through them with the Tab key. Built-in primary agents include Build (full tools) and Plan (read-only, cannot modify files). 
Subagents: Specialized assistants that primary agents invoke for specific tasks. You can also invoke them manually with @mention. Built-in subagents:\nGeneral: Multi-step research and execution. Tools: full access except todo.\nExplore: Fast, read-only codebase exploration. Tools: read and search only.\nWhen one agent delegates work, it does not simply append a prompt. It creates a child session with fresh context, passes a scoped instruction, and receives a structured result back. This makes delegation session-based, resumable, and inspectable.\nLSP (Language Server Protocol) LSP integration gives OpenCode deep code intelligence. The AI can see type information, function signatures, import paths, and diagnostics, not just raw text.\nSupported operations: goToDefinition, findReferences, hover, documentSymbol, workspaceSymbol, goToImplementation, prepareCallHierarchy, incomingCalls, outgoingCalls.\nThe lsp tool is available when OPENCODE_EXPERIMENTAL_LSP_TOOL=true is set. OpenCode includes pre-configured LSP servers for 30+ languages.\nMCP (Model Context Protocol) MCP servers extend OpenCode with external tools and services. Built-in MCP-like tools include:\nwebsearch: Performs web searches via Exa AI (no API key required with OpenCode provider). webfetch: Fetches and reads specific URLs. lsp: Interacts with configured Language Servers. Custom MCP servers are configured in opencode.json:\n{ \u0026#34;mcpServers\u0026#34;: { \u0026#34;github\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;stdio\u0026#34;, \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;@modelcontextprotocol/server-github\u0026#34;], \u0026#34;env\u0026#34;: [\u0026#34;GITHUB_PERSONAL_ACCESS_TOKEN=ghp_...\u0026#34;] } } } Be selective: MCP servers add tool definitions to your context window. 
The GitHub MCP alone can consume significant tokens.\nAGENTS.md and Project Rules Run /init to generate an AGENTS.md file in your project root. This file teaches OpenCode about your project structure, conventions, and coding patterns. It is similar to Cursor\u0026rsquo;s rules and improves the quality of generated code.\nExample from a production TypeScript monorepo:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 # Project: Payment Processing API This is a TypeScript monorepo using Bun workspaces. ## Structure - `packages/core/` - Shared business logic - `packages/api/` - Express API handlers - `packages/workers/` - Background job processors ## Conventions - Use Zod for all input validation - All database queries go through the repository pattern - Prefer composition over inheritance - Test files live next to source files (*.test.ts) ## Commands - `bun test` - Run all tests - `bun run lint` - Run ESLint and Prettier - `bun run build` - TypeScript compilation Permissions OpenCode filters tools before the model sees them, then checks permissions again at execution time. This makes orchestration bounded by policy, not trust.\n1 2 3 4 5 6 7 { \u0026#34;permission\u0026#34;: { \u0026#34;bash\u0026#34;: \u0026#34;ask\u0026#34;, \u0026#34;edit\u0026#34;: \u0026#34;ask\u0026#34;, \u0026#34;write\u0026#34;: \u0026#34;allow\u0026#34; } } Values: allow, deny, ask. For production or sensitive environments, set bash and edit to ask.\nAgent Orchestration with oh-my-opencode-slim oh-my-opencode-slim is an agent orchestration plugin for OpenCode. 
Instead of forcing a single model to handle every task, it routes jobs to specialized subagents, balancing quality, speed, and cost.\nThe Pantheon Agent Role Default Model Orchestrator Master delegator and strategic coordinator openai/gpt-5.4 Explorer Fast codebase search and pattern matching openai/gpt-5.4-mini Librarian External documentation and library research openai/gpt-5.4-mini Oracle Strategic technical advisor, code reviewer, simplifier openai/gpt-5.4 Designer UI/UX specialist for visual polish openai/gpt-5.4-mini Fixer Fast implementation specialist for bounded tasks openai/gpt-5.4-mini Observer Read-only visual analysis (images, PDFs, diagrams) Disabled by default Installation 1 bunx oh-my-opencode-slim@latest install The installer generates a default OpenAI configuration. Edit ~/.config/opencode/oh-my-opencode-slim.json to use Kimi, GitHub Copilot, or other providers. The configuration supports JSONC and includes an official JSON Schema for autocompletion.\nKey Features Council: Run multiple models in parallel and synthesise a single answer with @council. Multiplexer Integration: Watch agents work live in Tmux or Zellij panes. Session Management: Reuse recent child-agent sessions with short aliases. Auto-continue: Automatically resume orchestrator sessions with cooldowns and safety checks. Preset Switching: Switch agent model presets at runtime with /preset. Cartography Skill: Generate hierarchical codemaps to understand large codebases faster. Interview: Turn rough ideas into structured markdown specs via a browser-based Q\u0026amp;A flow. Subagent-Driven Development (SDD) Subagent-Driven Development is the methodology that makes agent orchestration practical. It is not a single tool but a workflow pattern: decompose work into independent tasks, dispatch a fresh subagent for each task, and enforce review before completion. 
SDD prevents context pollution, controls costs, and maintains quality.\nThere are two complementary interpretations of SDD in the OpenCode ecosystem:\nSpec-Driven Development: Requirements → Design → Tasks → Implementation. Specs are temporary scaffolding; code is the source of truth. Delete specs after implementation. Subagent-Driven Development: Each independent task gets a fresh subagent with isolated context, followed by automatic two-stage review. In practice, these merge: you write a spec, decompose it into tasks, and dispatch subagents for each task with review gates.\nWhen to Use SDD Use SDD when:\nYou have a detailed implementation plan. Tasks are mostly independent with weak dependencies. You want to complete all tasks in one session without switching contexts. Quality gates (spec compliance + code review) are non-negotiable. Avoid SDD for tiny one-file changes where the overhead of subagent dispatch exceeds the benefit. In those cases, use a single Build agent directly.\nThe SDD Workflow Phase 1: Exploration and Spec Writing The Orchestrator or Planner agent analyses the request and creates a specification.\n1 2 3 4 5 # Initialize SDD scaffolding (if using sdd-flow or Agent Teams Lite) /sdd-init # Or manually: switch to Plan mode and ask for a spec \u0026#34;Plan a user authentication system with JWT tokens, refresh token rotation, and role-based access control. Write the spec to specs/auth.md\u0026#34; Phase 2: Task Decomposition Break the spec into independent tasks. Each task should have a single, bounded scope.\nExample tasks for an auth system:\nImplement JWT token generation and validation utilities. Create login and refresh endpoints. Add middleware for role-based access control. Write unit tests for token utilities. Phase 3: Subagent Dispatch For each task, the Orchestrator spawns a fresh subagent via the Task tool. Each subagent starts with zero context from previous tasks, preventing pollution. 
The Orchestrator injects only the relevant spec section and project standards.\n1 Orchestrator → Task(subagent=\u0026#34;sdd-apply\u0026#34;, prompt=\u0026#34;Implement task 1: JWT utilities. Read specs/auth.md section 3.1. Follow TDD. Run tests before returning.\u0026#34;) Phase 4: Two-Stage Review After a subagent completes its task, SDD enforces two review stages before marking the task complete:\nSpec Compliance Review: A reviewer subagent checks whether the implementation matches the spec exactly. Did it implement what was asked? Are the interfaces correct? Code Quality Review: A second reviewer checks security, performance, maintainability, and test coverage. If either review fails, the task is sent back to the implementer subagent with feedback. This creates automatic checkpoints.\nPhase 5: Integration and Validation Once all tasks pass review, the Orchestrator integrates the work, runs the full test suite, and validates end-to-end behaviour.\nSDD Tools and Plugins Several community projects implement SDD workflows for OpenCode:\ncc-sdd (Spec-Driven Development) cc-sdd brings structured Spec-Driven Development to OpenCode with slash commands:\n1 npx cc-sdd@latest --opencode-skills Commands include:\n/kiro-spec-init: Start a new feature spec. /kiro-spec-requirements: Write requirements. /kiro-spec-design: Create architecture design. /kiro-spec-tasks: Generate task checklist. /kiro-impl: Autonomous implementation with per-task subagents, TDD, and independent review. Each task gets a fresh implementer running TDD (RED → GREEN), an independent reviewer, and an auto-debug pass if blocked.\nsdd-flow sdd-flow is a plugin that embeds SDD directly into your repository:\n1 2 3 4 5 6 7 8 9 # One-time bootstrap /sdd-init # Switch to the \u0026#34;Spec Driven\u0026#34; planning agent # Describe your feature in natural language # Approve each phase before it advances # Execute the plan /implement Assets are repo-local: .opencode/skills/, .specify/, specs/, and AGENTS.md. 
This means the workflow travels with the code, not just the developer\u0026rsquo;s local config.\nAgent Teams Lite (Gentleman Programming) Agent Teams Lite provides a complete SDD orchestrator with 10 specialised sub-agents and slash commands:\nCommand Purpose /sdd-init Initialise SDD context /sdd-explore Investigate an idea /sdd-new Start a new change /sdd-apply Implement tasks /sdd-verify Validate implementation /sdd-archive Archive completed change The system uses a skill registry: .atl/skill-registry.md captures project conventions, and the orchestrator injects compact rules into each subagent prompt as ## Project Standards (auto-resolved).\nSDD Profiles OpenCode SDD Profiles allow you to create named model configurations and switch between them with Tab inside OpenCode. Each profile generates its own orchestrator plus 10 sub-agents in opencode.json:\n1 2 3 gentle-ai sync \\ --profile cheap:anthropic/claude-haiku-3.5-20241022 \\ --profile-phase cheap:sdd-apply:anthropic/claude-sonnet-4-20250514 This creates a \u0026ldquo;cheap\u0026rdquo; profile where everything runs on Haiku except sdd-apply, which uses Sonnet. Press Tab to cycle between sdd-orchestrator, sdd-orchestrator-cheap, and sdd-orchestrator-premium.\nSubagent-to-Subagent Delegation OpenCode PR #7756 introduced subagent-to-subagent delegation with configurable task_budget and depth limits to prevent infinite loops. By default, only primary agents can task subagents. 
To enable subagent delegation, set a budget:\n{ \u0026#34;agent\u0026#34;: { \u0026#34;sdd-apply\u0026#34;: { \u0026#34;task_budget\u0026#34;: 3, \u0026#34;permission\u0026#34;: { \u0026#34;task\u0026#34;: { \u0026#34;sdd-debug\u0026#34;: \u0026#34;allow\u0026#34; } } } } } This allows an implementer subagent to spawn a debugger subagent up to 3 times if it gets stuck.\nModel Configuration and Presets High-Performance Setup { \u0026#34;$schema\u0026#34;: \u0026#34;https://opencode.ai/config.json\u0026#34;, \u0026#34;agents\u0026#34;: { \u0026#34;coder\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;anthropic/claude-sonnet-4-5-20250929\u0026#34;, \u0026#34;maxTokens\u0026#34;: 8000, \u0026#34;reasoningEffort\u0026#34;: \u0026#34;medium\u0026#34; }, \u0026#34;task\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;openai/gpt-4.1-mini\u0026#34;, \u0026#34;maxTokens\u0026#34;: 5000 }, \u0026#34;title\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;openai/gpt-4.1-mini\u0026#34; } } } Budget Setup (~$30/month) { \u0026#34;agents\u0026#34;: { \u0026#34;coder\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;opencode-go/kimi-k2.6\u0026#34;, \u0026#34;maxTokens\u0026#34;: 8000 }, \u0026#34;task\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;opencode-go/deepseek-v4-flash\u0026#34;, \u0026#34;maxTokens\u0026#34;: 5000 }, \u0026#34;title\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;copilot/gpt-4o-mini\u0026#34; } } } Free Setup { \u0026#34;provider\u0026#34;: { \u0026#34;ollama\u0026#34;: { \u0026#34;api_url\u0026#34;: \u0026#34;http://localhost:11434/v1\u0026#34; } }, \u0026#34;model\u0026#34;: { \u0026#34;default\u0026#34;: \u0026#34;opencode/minimax-m2.5-free\u0026#34;, \u0026#34;fast\u0026#34;: \u0026#34;ollama/qwen3.6-35b-a3b\u0026#34; } } DevOps-Focused Configuration 
{ \u0026#34;$schema\u0026#34;: \u0026#34;https://opencode.ai/config.json\u0026#34;, \u0026#34;model\u0026#34;: \u0026#34;opencode-go/qwen3.6-plus\u0026#34;, \u0026#34;permission\u0026#34;: { \u0026#34;bash\u0026#34;: \u0026#34;ask\u0026#34;, \u0026#34;edit\u0026#34;: \u0026#34;ask\u0026#34;, \u0026#34;write\u0026#34;: \u0026#34;allow\u0026#34; }, \u0026#34;agents\u0026#34;: { \u0026#34;devops-engineer\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;opencode-go/glm-5\u0026#34;, \u0026#34;maxTokens\u0026#34;: 8000, \u0026#34;instructions\u0026#34;: \u0026#34;You are a DevOps specialist. Confirm before applying Kubernetes manifests. Prefer idempotent patterns.\u0026#34; } }, \u0026#34;lsp\u0026#34;: { \u0026#34;yaml\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;yaml-language-server\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;--stdio\u0026#34;] }, \u0026#34;terraform\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;terraform-ls\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;serve\u0026#34;] } } } Model Variants and Profiles Many models support reasoning variants. 
Use variant_cycle to switch between low, medium, high, and xhigh.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 { \u0026#34;provider\u0026#34;: { \u0026#34;anthropic\u0026#34;: { \u0026#34;models\u0026#34;: { \u0026#34;claude-sonnet-4-5-20250929\u0026#34;: { \u0026#34;options\u0026#34;: { \u0026#34;thinking\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;enabled\u0026#34;, \u0026#34;budgetTokens\u0026#34;: 16000 } } } } } } } Multi-provider fallbacks prevent session failure:\n1 2 3 4 5 6 7 8 { \u0026#34;agents\u0026#34;: { \u0026#34;coder\u0026#34;: { \u0026#34;model\u0026#34;: \u0026#34;opencode-go/kimi-k2.6\u0026#34;, \u0026#34;fallback_models\u0026#34;: [\u0026#34;opencode-go/glm-5\u0026#34;, \u0026#34;anthropic/claude-sonnet-4-5\u0026#34;] } } } Domain-Specific Applications The following sections map the SDD methodology to real-world use cases documented by developers, sysadmins, and DevOps engineers.\nFull-Stack Development Greenfield Application with SDD In a project-based tutorial by ZBuild, a developer built a full-stack bookmark manager in ~30 minutes using OpenCode. The stack included Express.js/TypeScript, SQLite with FTS5, and a vanilla frontend.\nMapped to SDD phases:\nSpec: Plan mode outlined the data model, API routes, and database schema. The spec was saved to .opencode/plans/bookmark-manager.md. Tasks: Decomposed into scaffolding, data model, API endpoints, search, tags, frontend, and tests. Dispatch: Build mode implemented each task sequentially, with the agent running curl to verify API behaviour after each endpoint. Review: The agent ran the test suite and fixed failures before proceeding. 
This Plan/Build cycle prevents the AI from making large structural decisions blindly.\nMulti-Agent Refactoring with Orchestration A Vercel engineer documented using OpenCode with the Vercel AI Gateway to migrate an authentication module from session-based auth to JWT tokens using the ulw (ultrawork) keyword.\nThe SDD-style workflow:\nOrchestrator (Claude Opus 4.6) received the request. It spawned two background subagents in parallel: Explore (GPT-5 Mini): Mapped 12 files in the auth flow. Librarian (Claude Sonnet 4.6): Found the framework\u0026rsquo;s recommended JWT pattern. After receiving findings, the Orchestrator delegated subtasks: Cryptographic logic → GPT-5.4 worker. Middleware updates → Claude Haiku 4.5 worker. Agent-browser verified the login flow in a real browser. Result: ~70% cost reduction compared to a single large model, with no manual model switching.\nMulti-Lens Code Review as SDD Verification JP Caparas built a multi-agent review system that mirrors the SDD two-stage review pattern. A lead reviewer agent analyses the diff, then spawns specialist reviewers in parallel:\nreview-frontend: .tsx, .jsx, .vue, .css, .scss files. review-backend: .py, .go, .ts in api/ or services/. review-devops: Dockerfile, *.yaml, *.tf, .github/workflows/*. The lead agent synthesises findings into a unified report: LGTM, NEEDS CHANGES, or DISCUSS. This is the verification stage of SDD applied to existing code rather than new implementation.\nE2B Sandbox for Teams A Reddit user documented running OpenCode inside E2B cloud sandboxes for non-technical users. The workflow uses:\nA tailored AGENTS.md with persona rules and anti-hallucination guardrails. A three-file context system: PROJECT.md (spec), MEMORY.md (build notes), and a slim conversation log. Automated verification after each build (schema checks, API wiring validation). Auto-commit every 5 minutes. 
This shows SDD principles applied in a managed environment: spec first, execution second, verification always.\nSysAdmin and Operations Natural-Language Remote Server Management The argv.cloud blog introduced \u0026ldquo;Agentic Sysadmin\u0026rdquo; using OpenCode with a custom remote.ts tool that executes commands over SSH.\nSetup:\nStandard .ssh/config with host aliases. Custom tool wrapping execa to run ssh -o BatchMode=yes \u0026lt;host\u0026gt; \u0026lt;command\u0026gt;. Optional sudo via OC_SSH env variable injected into stdin (the LLM never sees the password). SDD mapping:\nSpec: \u0026ldquo;Audit test-server with Lynis and generate a Markdown summary.\u0026rdquo; Task: Run Lynis quietly, then read the report. Dispatch: The agent runs the command remotely and fetches the report. Review: The agent synthesises the raw output into structured Markdown locally. Example prompts:\n1 2 opencode ask \u0026#34;Update system packages on server-33 and check for leftover services\u0026#34; opencode ask \u0026#34;Create a crontab on big-pc that runs Lynis weekly and saves the report to /var/log/lynis-weekly.log\u0026#34; There is no YAML, no Ansible playbook, and no Puppet manifest. The LLM reasons about the current state and generates appropriate commands on the fly.\nServer Documentation and Migration Pedro Serey mapped a \u0026ldquo;black box\u0026rdquo; server into structured documentation using the Explore agent. Discovery commands (lsblk, ip, systemctl list-units) were synthesised into:\nstorage.md: Drives, mount points, filesystems. services/README.md: Containers, Docker Compose files, environment variables. network.md: Interfaces, routes, firewall rules. After mapping, the documentation served as the spec for a Proxmox migration plan. 
The agent generated a disciplined backup routine with proper database dumps, preventing data corruption.\nCritical lesson: After the agent staged unintended git commits during migration, the author pivoted to a human-in-the-loop workflow: the Plan agent proposes commands, a human executes them in tmux, and the output is pasted back. This is Plan mode used as an SDD spec/review gate before execution.\nAI-Powered Shell Command Generation The zsh-ask-opencode plugin integrates OpenCode into ZSH. Press Ctrl+O to transform natural language into optimised shell commands:\n$ list all files modified in the last 24 hours\n# ^O generates: find . -mtime -1 -type f\n$ compress all jpg files in this directory to 50% size\n# ^O generates: convert *.jpg -quality 50% compressed.jpg\nThe plugin ranks three options by speed, safety, and reliability. You review before execution, which acts as a manual review gate for one-line tasks. That review matters here: -quality 50% sets the JPEG quality level rather than halving the file size.\nInteractive PTY Management The opencode-pty plugin gives OpenCode control over pseudo-terminals. Unlike the synchronous bash tool, pty_spawn allows:\nBackground processes (e.g., tail -f /var/log/syslog). Interactive input (Ctrl+C, arrow keys). Terminal snapshots without ANSI noise. Waiting until screen content matches a regex. This is essential for sysadmin tasks involving long-running processes, such as monitoring deployments or stepping through interactive installers.\nDevOps and Infrastructure Infrastructure-as-Code Generation ComputingForGeeks documented using OpenCode for DevOps workflows. The agent excels at generating boilerplate:\nTerraform: Variables, outputs, resource scaffolding. Ansible: Playbooks with SELinux awareness and handlers. Kubernetes: Deployment, Service, Ingress, HPA manifests. Example prompt:\nopencode run \u0026#34;Create Kubernetes manifests for a Python Flask app: a Deployment with 3 replicas, resource limits, health checks, and a non-root security context. 
Add a ClusterIP Service and an Ingress with TLS.\u0026#34; Honest assessment: AI agents consistently get provider version pinning and complex state dependencies wrong. Treat AI-generated infrastructure like a pull request from a junior engineer: always run terraform plan and test in staging.\nSDD mapping for IaC:\nSpec: \u0026ldquo;We need a three-tier AWS architecture with VPC, private subnets, RDS, and an EKS cluster.\u0026rdquo; Tasks: VPC module, subnet module, RDS module, EKS module, outputs. Dispatch: Each module gets a subagent. Terraform LSP validates syntax. Review: terraform plan is the spec compliance check. A second reviewer checks for security anti-patterns (open security groups, hardcoded secrets). Multi-Agent CI/CD Pipeline Creation Using ultrawork mode:\n1 opencode run \u0026#34;@ultrawork Set up a complete CI/CD pipeline with GitHub Actions that builds a Docker image, pushes to ECR, runs Trivy security scan, deploys to EKS staging with Helm, runs integration tests, and promotes to production on approval.\u0026#34; The planner agent ensures cohesion:\n.github/workflows/deploy.yml references exact Helm values files. The Docker image tag propagates through every step. scripts/integration-test.sh hits the correct staging URL. 
This is SDD at scale: one spec becomes dozens of coordinated files, with the planner acting as the Orchestrator ensuring cross-file consistency.\nProject-Specific DevOps Agents The jon23d/opencode-configs repository demonstrates a mature agent hierarchy:\ndockerfile-best-practices/SKILL.md, deployment-planning/SKILL.md, and kubernetes-manifests/SKILL.md. The pipeline models a full software delivery lifecycle:\nUser request → build (clarify) → architect (plan) → build (review plan) → backend/frontend engineers (implement) → code-reviewer → security-reviewer → observability-reviewer → qa (E2E + OpenAPI) → devops-engineer (infra change) → developer-advocate (docs, docker-compose) → build (report)\nThis is SDD with specialised subagents for each review stage.\nKubernetes-Native Agents with KubeOpenCode KubeOpenCode runs OpenCode agents as Kubernetes CRDs:\napiVersion: kubeopencode.io/v1alpha1\nkind: Agent\nmetadata:\n  name: dev-agent\nspec:\n  profile: \u0026#34;Interactive development agent\u0026#34;\n  workspaceDir: /workspace\n  persistence:\n    sessions:\n      size: \u0026#34;2Gi\u0026#34;\n  standby:\n    idleTimeout: \u0026#34;30m\u0026#34;\nAttach from your terminal:\nkubeoc agent attach default -n kubeopencode-system\nThis is ideal for CI/CD pipelines and team-shared agents. It uses a two-container pattern: an init container copies the OpenCode binary, and the worker container executes tasks.\nDocker and Local Model Runners Community projects provide ready-made Docker setups:\nnimbleflux/opencode-docker: Docker image, Compose, and Helm chart. utek/opencode-docker: Lightweight environment with Node.js, Python, Git, and GitHub CLI. Docker Model Runner: Connect OpenCode to locally served models. Security, Safety and Best Practices Default Permissions Are Permissive By default, OpenCode enables most tools without approval. A Reddit PSA highlighted that the agent can generate a Python script and immediately execute it. 
Lock this down:\n{ \u0026#34;permission\u0026#34;: { \u0026#34;bash\u0026#34;: \u0026#34;ask\u0026#34;, \u0026#34;edit\u0026#34;: \u0026#34;ask\u0026#34;, \u0026#34;write\u0026#34;: \u0026#34;ask\u0026#34; } } Use Plan Mode for Audits For sysadmins and SREs, Plan mode is essential for auditing scripts and infrastructure-as-code without accidentally running anything.\nHuman-in-the-Loop for Production Even with good intentions, an agent with SSH access can stage unintended changes. The recommended workflow:\nAgent proposes the command in Plan mode. Human reviews and executes manually (e.g., in a tmux pane). Human pastes the output back to the agent. Sandboxing Containers/VMs: Run OpenCode inside Docker or a VM. OS-level sandboxing: nono uses Landlock (Linux) and Seatbelt (macOS) for default-deny access.\nnono run --allow ./my_project_dir -- opencode\nContext Window Management Developers have reported that context compaction can occur multiple times during large tasks. Mitigations:\nEnable autoCompact: true. Use the Cartography skill to generate codemaps. Break large tasks into smaller, scoped sessions. In SDD, fresh subagents per task naturally limit context sprawl. Advanced Configuration Custom Agents Create project-specific agents by adding markdown files to .opencode/agents/:\n---\ndescription: \u0026#34;Reviews code for best practices and potential issues\u0026#34;\nmode: subagent\nmodel: anthropic/claude-sonnet-4-20250514\npermission:\n  edit: deny\n---\nYou are a code reviewer. Focus on security, performance, and maintainability. Do not make changes; only provide feedback.\nContext Paths Include additional context files:\n{ \u0026#34;contextPaths\u0026#34;: [ \u0026#34;.github/copilot-instructions.md\u0026#34;, \u0026#34;.cursorrules\u0026#34;, \u0026#34;opencode.md\u0026#34; ] } Auto-Compact { \u0026#34;autoCompact\u0026#34;: true } Session Management Use multi-session to work on multiple features in parallel. 
Use /undo and /redo to revert changes. Share session links with teammates. Conclusion OpenCode represents a new paradigm in AI-assisted development: open-source, provider-agnostic, and deeply integrated into the tools developers already use. But the tool alone is not enough. Subagent-Driven Development provides the methodology that makes agent orchestration coherent, safe, and scalable.\nThe SDD workflow — explore, spec, decompose, dispatch, review, integrate — maps naturally to full-stack development, sysadmin operations, and DevOps infrastructure. Real-world practitioners have shown that this pattern scales from a 30-minute bookmark manager to multi-agent CI/CD pipelines, from natural-language server management to Kubernetes-native agent platforms.\nKey takeaways:\nStart with a spec. Use Plan mode, AGENTS.md, and the Cartography skill to build context before writing code. Decompose into independent tasks. Each task gets a fresh subagent with isolated context. Enforce two-stage review. Spec compliance first, code quality second. No task passes without both. Configure permissions carefully. Set bash and edit to ask in production. Use sandboxes. Leverage specialised agents. Let the Orchestrator route work; do not force one model to do everything. Use OpenCode Go for reliable, low-cost access to curated open-source models. Keep a human in the loop for production operations, especially when the agent has shell or SSH access. 
Happy agentic coding!\nReferences OpenCode Official Site OpenCode Documentation OpenCode Go OpenCode GitHub Repository OpenCode Agents Documentation OpenCode Rules Documentation oh-my-opencode-slim GitHub Oh My OpenCode (Original Plugin) cc-sdd: Spec-Driven Development sdd-flow: Spec-Driven Development for OpenCode Agent Teams Lite Gentleman AI: SDD Profiles Subagent-Driven Development Tutorial Agentic Coding: Spec-Driven Development DEV Community: Agent Orchestration in OpenCode Vercel KB: OpenCode with AI Gateway ZBuild: Full-Stack Bookmark Manager Tutorial JP Caparas: Multi-Agent Code Review Graphwiz: Vibe Coding with OpenCode AI for You: OpenCode Practical Examples argv.cloud: Agentic Sysadmin Pedro Serey: Using OpenCode as a SysAdmin zsh-ask-opencode GitHub opencode-pty GitHub ComputingForGeeks: AI Agents for DevOps jon23d/opencode-configs GitHub KubeOpenCode Docker Docs: OpenCode with Docker Model Runner Sidekick Agent Hub (VS Code Extension) Reddit: E2B Cloud Sandboxes Reddit: Context Window Management Reddit: Security and Permissions ","date":"2026-05-02T10:00:00+01:00","permalink":"/en/p/opencode-agent-orchestration-and-subagent-driven-development-a-complete-guide/","title":"OpenCode, Agent Orchestration and Subagent-Driven Development: A Complete Guide"},{"content":"Where We Stand In the first two posts of the series, we set up OPNsense from scratch and took it to a configuration that is no longer trivial. 
Let\u0026rsquo;s do a quick recap before continuing.\nLayer What Was Configured Post Hardware Mini PC with Intel N100/N200, Intel i226-V NICs First Connectivity PPPoE on WAN, bridge on LAN, wireless interface First Detection Suricata IDS/IPS with ET Open, Abuse.ch, Feodo rulesets First Shared Intelligence CrowdSec with firewall bouncer and community collections First VPN WireGuard with configured peers First Firewall Rules Basic rules for LAN, WireGuard, and WAN First Offloading Checksum, TSO, TCP buffer tuning First Users Dedicated admin with OTP, restricted root, protected WebUI First Backups AES-256-CBC encrypted, automatic, external storage Second DPI Zenarmor with per-interface and category policies Second Segmentation 5 VLANs (Main, Guests, IoT, Servers, Management) with inter-VLAN rules Second SSH ed25519 keys only, non-standard port, IP-restricted access Second DNS Unbound with DNS over TLS to Quad9 and Cloudflare Second Logging Remote syslog with TCP/TLS to Grafana+Loki or ELK Second It\u0026rsquo;s a solid configuration. But so far everything has been done manually, through the web interface, without a formal review process or a way to reproduce the configuration if something breaks beyond the XML backup. 
This post covers what\u0026rsquo;s missing: advanced practices, automation, a serious auditing framework, and a comparison with alternatives to make informed decisions.\nSecurity Posture Review Defense in Depth: What We Have If we look at what we\u0026rsquo;ve configured as defense layers, the architecture has some depth:\nDefense Layer OPNsense Component Current State Perimeter WAN deny-all rules + incoming WireGuard Functional Intrusion Detection Suricata IPS with ET Open and Abuse.ch Functional, default tuning Log Analysis CrowdSec bouncer + community intelligence Functional Application Inspection Zenarmor DPI with per-VLAN policies Functional Segmentation 5 VLANs with deny-all inter-VLAN rules by default Functional Access Control Dedicated admin, OTP, SSH with ed25519 keys Functional DNS Encryption Unbound with DoT to Quad9/Cloudflare Functional Backups Encrypted, automatic, external storage Functional Five layers of detection and prevention, real segmentation, and reasonable access control. For a homelab or small office, it\u0026rsquo;s more than most have. But there are gaps.\nRemaining Gaps Being honest, there are several things that haven\u0026rsquo;t been addressed that matter:\nNo geolocation filtering. The entire planet can try to connect to exposed ports on WAN. Most automated attacks come from IP ranges with which you have no legitimate relationship. No reverse proxy. If self-hosted services are exposed (Nextcloud, Jellyfin), they go directly via NAT without TLS termination or application-level protection. Suricata is at default configuration. Rules are enabled, but performance parameters haven\u0026rsquo;t been tuned and irrelevant categories haven\u0026rsquo;t been removed. Everything has been configured manually. If tomorrow OPNsense needs to be rebuilt from scratch, the XML backup is the only option. There are no playbooks, no version control of the configuration, no way to review what changed and when without opening the web interface history. 
No formal audit cadence. The second post mentioned a weekly/monthly/quarterly routine, but without a framework behind it, it\u0026rsquo;s easy for it to remain good intentions. No automated threat feeds beyond Suricata rule updates. IP blocklists don\u0026rsquo;t update themselves. The following sections cover these gaps.\nAdvanced Practices GeoIP Blocking with MaxMind The idea is simple: if there\u0026rsquo;s no legitimate reason for a connection to come from certain countries, blocking those IP ranges reduces noise. It\u0026rsquo;s not a real security measure, because any attacker with a VPN bypasses it, but it does eliminate a significant number of automated scans and brute force attacks.\nOPNsense uses MaxMind\u0026rsquo;s free GeoLite2 databases, which require an account.\nRegister a free account at MaxMind and generate a license key. In Firewall \u0026gt; Aliases \u0026gt; GeoIP settings, enter the license key. Create a GeoIP type alias in Firewall \u0026gt; Aliases: Name: GeoIP_Block Type: GeoIP Content: select the countries to block. In Firewall \u0026gt; Rules \u0026gt; WAN, add a rule at the top: Action: Block Source: GeoIP_Block Destination: * Description: Geographic blocking One warning: keep this list updated. MaxMind updates the databases weekly. Configure automatic update in the GeoIP settings so they don\u0026rsquo;t become obsolete.\nAdvanced Suricata Tuning OPNsense 26.1 includes Suricata 8, which improves multi-thread performance and adds support for new protocols. But the default configuration is conservative. If the hardware has headroom, tuning these parameters makes a difference.\nIn Services \u0026gt; Intrusion Detection \u0026gt; Administration, the advanced section allows configuring parameters not in the standard interface. 
The most relevant:\n# /usr/local/opnsense/service/conf/actions.d/conf.d/ # Adjust based on available RAM (these values are for 8 GB) # Maximum memory for flow tracking flow: memcap: 256mb hash-size: 65536 # Maximum memory for TCP stream reconstruction stream: memcap: 512mb reassembly.memcap: 256mb # Ring buffer size for af-packet af-packet: - interface: igb0 ring-size: 30000 cluster-type: cluster_flow Beyond memory tuning, review the active rule categories. If there are no SCADA servers, exposed databases, or SMTP services on the network, disable those categories. Each active rule consumes CPU on every inspected packet.\nTo identify the rules generating the most noise, analyze the EVE JSON log:\n# Top 10 alerts by SID in the last 24 hours cat /var/log/suricata/eve.json | \\ jq -r \u0026#39;select(.event_type==\u0026#34;alert\u0026#34;) | .alert.signature_id\u0026#39; | \\ sort | uniq -c | sort -rn | head -10 If a rule generates hundreds of daily alerts with none being true positives, disable it or adjust its threshold.\nReverse Proxy with HAProxy and ACME If self-hosted services are exposed to the internet, doing it directly with port forwarding is functional but insecure. A reverse proxy allows terminating TLS with valid certificates, applying rate limiting, and having a centralized control point.\nInstall the os-haproxy and os-acme-client plugins from System \u0026gt; Firmware \u0026gt; Plugins.\nCertificates with Let\u0026rsquo;s Encrypt (ACME):\nIn Services \u0026gt; ACME Client \u0026gt; Accounts, create an ACME account. In Services \u0026gt; ACME Client \u0026gt; Challenge Types, configure a DNS-01 challenge. It\u0026rsquo;s preferred over HTTP-01 because it doesn\u0026rsquo;t require opening port 80 and supports wildcards. In Services \u0026gt; ACME Client \u0026gt; Certificates, create certificates for each service. 
HAProxy Configuration:\nIn Services \u0026gt; HAProxy \u0026gt; Real Servers, create a backend for each internal service (Nextcloud at 192.168.40.10:443, Jellyfin at 192.168.40.11:8096, etc.). In Services \u0026gt; HAProxy \u0026gt; Rules \u0026amp; Checks \u0026gt; Conditions, create conditions based on SNI or hostname. In Services \u0026gt; HAProxy \u0026gt; Virtual Services \u0026gt; Public Services, create a frontend that listens on port 443, binds the ACME certificates, and routes to backends based on conditions. Enable HSTS in HTTP headers to force HTTPS. Configure rate limiting per IP to mitigate brute force attacks against login forms. Threat Feeds and Scheduled Aliases IP blocklists lose value if they aren\u0026rsquo;t updated. OPNsense allows creating URL Table type aliases that are downloaded automatically.\nIn Firewall \u0026gt; Aliases, create aliases with these sources:\nAlias URL Frequency Description Spamhaus_DROP https://www.spamhaus.org/drop/drop.txt Daily Hijacked ranges or used for spam Spamhaus_EDROP https://www.spamhaus.org/drop/edrop.txt Daily Extension of DROP Abusech_Feodo https://feodotracker.abuse.ch/downloads/ipblocklist.txt Every 6h Banking botnet IPs Abusech_SSLBL https://sslbl.abuse.ch/blacklist/sslipblacklist.txt Every 6h IPs with malicious certificates Apply these aliases as source in blocking rules on WAN. 
URL Table type aliases update automatically according to the configured frequency.\nZFS Snapshots for Rollback If ZFS was chosen as the filesystem during installation (instead of UFS), one of its best features can be used: instant snapshots with rollback.\nBefore any important change (firmware update, massive rule change, plugin installation), create a snapshot:\n# Create a snapshot before updating zfs snapshot zroot/ROOT/default@pre-update-$(date +%Y%m%d) # List existing snapshots zfs list -t snapshot # If something goes wrong, rollback to the previous snapshot zfs rollback zroot/ROOT/default@pre-update-20260413 This returns the entire filesystem to the snapshot state in seconds. It\u0026rsquo;s insurance against failed updates that complements XML configuration backups. The snapshot recovers everything; the XML backup only recovers the configuration.\nInfrastructure as Code for OPNsense Doing everything from the web interface works, but it has known problems: there\u0026rsquo;s no readable change history, there\u0026rsquo;s no way to review what was modified before applying it, and rebuilding the configuration requires following a step-by-step guide or restoring an opaque backup.\nOPNsense REST API OPNsense exposes a fairly complete REST API. The documentation is in /api/ and covers most web interface functionalities: aliases, firewall rules, interface configuration, IDS, CrowdSec, Unbound, HAProxy, and more.\nTo use the API, create a key/secret pair in System \u0026gt; Access \u0026gt; Users, select the administrator user, and generate an API key. 
OPNsense generates a file with the key and secret.\n# Query firewall aliases curl -k -u \u0026#34;API_KEY:API_SECRET\u0026#34; \\ https://192.168.1.1/api/firewall/alias/searchItem # Create a new alias curl -k -u \u0026#34;API_KEY:API_SECRET\u0026#34; \\ -X POST \\ -H \u0026#34;Content-Type: application/json\u0026#34; \\ -d \u0026#39;{\u0026#34;alias\u0026#34;:{\u0026#34;name\u0026#34;:\u0026#34;test_alias\u0026#34;,\u0026#34;type\u0026#34;:\u0026#34;host\u0026#34;,\u0026#34;content\u0026#34;:\u0026#34;10.0.0.1\u0026#34;}}\u0026#39; \\ https://192.168.1.1/api/firewall/alias/addItem # Apply pending changes to the firewall curl -k -u \u0026#34;API_KEY:API_SECRET\u0026#34; \\ -X POST \\ https://192.168.1.1/api/firewall/alias/reconfigure The API doesn\u0026rsquo;t cover 100% of the web interface. Some plugin functions (like parts of Zenarmor) aren\u0026rsquo;t exposed. But for firewall management, aliases, rules, interfaces, and most core services, it\u0026rsquo;s sufficient.\nAnsible: ansibleguy/collection_opnsense The ansibleguy.opnsense collection is the most mature option for managing OPNsense as code in 2026. 
It wraps the REST API in idempotent Ansible modules.\n# Install the collection ansible-galaxy collection install ansibleguy.opnsense An example playbook managing aliases and firewall rules:\n--- - name: Configure OPNsense firewall hosts: opnsense connection: httpapi vars: ansible_httpapi_port: 443 ansible_httpapi_use_ssl: true ansible_httpapi_validate_certs: false tasks: - name: Create RFC1918 alias ansibleguy.opnsense.alias: name: RFC1918 type: network content: - \u0026#34;10.0.0.0/8\u0026#34; - \u0026#34;172.16.0.0/12\u0026#34; - \u0026#34;192.168.0.0/16\u0026#34; description: \u0026#34;RFC1918 private networks\u0026#34; - name: Create admin alias ansibleguy.opnsense.alias: name: Admin_IPs type: host content: - \u0026#34;192.168.50.10\u0026#34; - \u0026#34;192.168.50.11\u0026#34; description: \u0026#34;Admin IPs\u0026#34; - name: Block IoT to internal networks ansibleguy.opnsense.rule: interface: \u0026#34;opt3\u0026#34; # VLAN 30 IoT action: block source_net: \u0026#34;IoT net\u0026#34; destination_net: \u0026#34;RFC1918\u0026#34; description: \u0026#34;IoT without access to internal networks\u0026#34; Ansible supports check mode (--check) to see what would change without applying anything. This is especially useful for reviewing firewall changes before executing them, something the web interface doesn\u0026rsquo;t allow.\nThe main limitation is that not all OPNsense modules are covered by the collection. Services like Zenarmor, CrowdSec, or advanced Suricata configurations may require direct API calls using Ansible\u0026rsquo;s uri module.\nVersion Control of config.xml The most straightforward approach to have configuration under version control: export the config.xml, encrypt it, and store it in a private Git repository. 
It\u0026rsquo;s what\u0026rsquo;s already done with the backups from the second post, but integrated into a Git workflow.\n#!/bin/bash # export_config.sh - Run from cron or manually before changes BACKUP_DIR=\u0026#34;/root/config-backups\u0026#34; REPO_DIR=\u0026#34;/root/opnsense-config\u0026#34; DATE=$(date +%Y%m%d_%H%M) # Export current configuration cp /conf/config.xml \u0026#34;${BACKUP_DIR}/config_${DATE}.xml\u0026#34; # Encrypt with age (simpler than OpenSSL for this use) age -r age1publickey... \\ -o \u0026#34;${REPO_DIR}/config_${DATE}.xml.age\u0026#34; \\ \u0026#34;${BACKUP_DIR}/config_${DATE}.xml\u0026#34; # Clean up unencrypted file rm \u0026#34;${BACKUP_DIR}/config_${DATE}.xml\u0026#34; # Commit to repository cd \u0026#34;${REPO_DIR}\u0026#34; git add . git commit -m \u0026#34;config: backup ${DATE}\u0026#34; git push origin main With this, you have a timestamped change history of the configuration. Because the stored files are ciphertext, git diff between encrypted versions is not meaningful; to see what changed, decrypt two versions and compare them with diff. It\u0026rsquo;s not as clean as the VyOS model where configuration is plain text, but it works.\nCI/CD Pipeline for Validation Taking automation one step further: validate configuration changes before applying them. A lightweight pipeline in GitLab CI or GitHub Actions that:\nDecrypts the config.xml in an ephemeral environment. Validates the XML structure with xmllint. Checks anti-patterns with custom rules (rules with source=any destination=any action=pass, users without OTP, unnecessary services enabled). Runs the Ansible playbook in check mode against a staging environment (if it exists) or just validates the syntax. 
This doesn\u0026rsquo;t replace manual testing, but it catches obvious errors before they reach production.\nIn terms of IaC maturity, OPNsense is at an intermediate point:\nAspect OPNsense VyOS OpenWrt REST API Complete Complete Limited (ubus/JSON-RPC) Terraform No mature provider Official provider (Foltik/vyos) No provider Ansible ansibleguy.opnsense vyos.vyos (official) Community roles Config format XML (config.xml) Plain text CLI UCI (plain text) Git workflow Manual or scripted export Native (text config) Manual export VyOS wins at IaC by design: its configuration is plain text from day one, with atomic commits and native rollback. OPNsense compensates with a solid REST API and the Ansible collection, but requires more effort to reach the same level of automation.\nAuditing Framework: ISO 27001 and ENS Why It Matters Even for a Homelab It\u0026rsquo;s tempting to think that a formal auditing framework is only for companies. But the reality is more pragmatic: an auditing framework is a checklist that people with more experience than you have already validated. Following it avoids the \u0026ldquo;I forgot to review X\u0026rdquo; problem that appears when the security routine depends only on memory.\nThere\u0026rsquo;s an additional reason if you\u0026rsquo;re in Spain: the National Security Scheme (ENS, Real Decreto 311/2022) is mandatory for the public sector and its technology providers1. This increasingly includes freelancers and small companies providing services to public administrations. Having the infrastructure aligned with the ENS, even at a basic level, isn\u0026rsquo;t just good practice—it\u0026rsquo;s a potential requirement.\nRelevant ISO 27001:2022 Controls ISO 27001:2022 organizes security controls in Annex A2. 
The ones that apply directly to what was configured in this series:\nControl Description OPNsense Implementation A.8.20 Networks security WAN deny-all firewall, IPS, CrowdSec, GeoIP A.8.22 Networks segregation 5 VLANs with restrictive inter-VLAN rules A.8.23 Web filtering Zenarmor DPI with per-category policies A.8.15 Logging Remote syslog with TLS, Suricata EVE JSON A.8.16 Monitoring activities Suricata alerts, CrowdSec dashboards, Zenarmor A.8.17 Clock synchronization NTP configured in System \u0026gt; General (essential for log correlation) A.5.15 Access control Least-privilege policy, dedicated admin A.5.17 Authentication information 16+ character passwords, OTP enabled A.5.18 Access rights Periodic review of users and permissions A.8.2 Privileged access Restricted root, separate admin, SSH keys only The standard doesn\u0026rsquo;t prescribe exact review frequencies, but auditors expect: firewall rule review at least quarterly, privileged access review quarterly, and continuous log monitoring with weekly manual review.\nApplicable ENS Measures The ENS (RD 311/2022) classifies systems into three levels: basic, medium, and high. The relevant measures for a firewall:\nENS Measure Description Minimum Level Implementation mp.com.1 Secure perimeter Medium WAN firewall, GeoIP, deny-all by default mp.com.2 Confidentiality protection Medium WireGuard VPN, DoT for DNS mp.com.4 Networks segregation Medium VLANs with inter-VLAN rules op.exp.2 Security configuration Basic SSH hardening, disabled services op.exp.8 Activity logging Basic Remote syslog (2-year retention for medium/high) op.acc.1 Identification Basic Unique users, no shared accounts op.acc.5 Authentication mechanism Basic Strong passwords; OTP for medium/high op.acc.7 Remote access Basic WireGuard VPN mandatory for remote admin op.mon.1 Intrusion detection Medium Suricata IPS + CrowdSec The CCN-STIC guides from the National Cryptology Center provide detailed implementation instructions. 
Specifically, CCN-STIC-811 for system interconnection and CCN-STIC-408 for perimeter security are the most relevant for this context3.\nAn important ENS detail: log retention for medium and high levels is a minimum of two years. If the remote syslog doesn\u0026rsquo;t have capacity for that, it needs to be planned. With Loki and compression, firewall logs from a homelab don\u0026rsquo;t take up much space, but it needs to be considered from the design phase.\nAudit Calendar Combining ISO 27001 and ENS recommendations with what\u0026rsquo;s realistic for a small environment:\nFrequency Task Type Daily Automated review of IDS/IPS alerts and CrowdSec decisions Automated Weekly Manual log review: blocked events, failed login attempts, anomalous traffic Manual Monthly Firewall rule review and alias update. Verify threat feeds are updating Manual Quarterly Complete rule review: identify rules with no hits, overly permissive rules, forgotten temporary rules Manual Quarterly Access review: active users, permissions, valid SSH keys, API tokens Manual Semi-annual Basic exposure test: nmap from outside the network against the public IP Manual Annual Complete security posture review. 
Compare with previous year\u0026rsquo;s state Manual For automated daily review, a basic script that runs with cron:\n#!/bin/bash # /root/scripts/daily_audit.sh # Run with: crontab -e -\u0026gt; 0 7 * * * /root/scripts/daily_audit.sh LOG=\u0026#34;/var/log/daily_audit_$(date +%Y%m%d).log\u0026#34; echo \u0026#34;=== Daily audit $(date) ===\u0026#34; \u0026gt; \u0026#34;$LOG\u0026#34; # Most recent critical Suricata alerts (last 50 entries) echo -e \u0026#34;\\n--- Suricata alerts (severity 1) ---\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; cat /var/log/suricata/eve.json | \\ jq -r \u0026#39;select(.event_type==\u0026#34;alert\u0026#34; and .alert.severity==1) | \u0026#34;\\(.timestamp) \\(.alert.signature) src=\\(.src_ip) dst=\\(.dest_ip)\u0026#34;\u0026#39; | \\ tail -50 \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; # Active CrowdSec decisions echo -e \u0026#34;\\n--- CrowdSec active decisions ---\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; cscli decisions list -o raw 2\u0026gt;/dev/null | wc -l \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; # Age of last backup echo -e \u0026#34;\\n--- Last backup ---\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; LAST_BACKUP=$(ls -t /conf/backup/*.xml 2\u0026gt;/dev/null | head -1) if [ -n \u0026#34;$LAST_BACKUP\u0026#34; ]; then stat -f \u0026#34;%Sm\u0026#34; \u0026#34;$LAST_BACKUP\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; else echo \u0026#34;No backups found\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; fi # System status echo -e \u0026#34;\\n--- System resources ---\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; echo \u0026#34;CPU temp: $(sysctl -n dev.cpu.0.temperature 2\u0026gt;/dev/null || echo \u0026#39;N/A\u0026#39;)\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; echo \u0026#34;Free RAM pages: $(sysctl -n vm.stats.vm.v_free_count)\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; echo 
\u0026#34;States: $(pfctl -si 2\u0026gt;/dev/null | grep \u0026#39;current entries\u0026#39;)\u0026#34; \u0026gt;\u0026gt; \u0026#34;$LOG\u0026#34; # Send via email or copy to syslog logger -t daily_audit \u0026#34;Audit completed. See $LOG\u0026#34; This script isn\u0026rsquo;t meant to be a SIEM. It\u0026rsquo;s a first line of automated review that flags if there\u0026rsquo;s something requiring attention. For serious log analysis, the Grafana + Loki combination or a dedicated Wazuh is appropriate.\nOPNsense vs. Alternatives Before continuing to invest time in OPNsense, it\u0026rsquo;s worth comparing with the other serious open source firewall/router options. Not to migrate now, but to know what\u0026rsquo;s out there and make informed decisions if needs change.\nVyOS VyOS is a network operating system based on Debian with a CLI configuration model inspired by JunOS. It has no web interface.\nWhat it does better than OPNsense:\nPlain text configuration. VyOS config is a readable text file, with atomic commits and native rollback. It\u0026rsquo;s the IaC dream: git diff works directly on the configuration. Official Terraform provider (Foltik/vyos). Real declarative network infrastructure. Advanced routing. BGP, OSPF, IS-IS at scale. Supports routing tables with more than one million BGP prefixes. Performance. The VPP dataplane enables 10 Gbps+ throughput with appropriate hardware. Cloud-native. Direct deployment on AWS, Azure, and GCP with official support. What it does worse:\nNo GUI. Everything is CLI or API. The learning curve is steep if you don\u0026rsquo;t come from enterprise networking. Limited integrated security. There\u0026rsquo;s no equivalent to Suricata with GUI, no DPI like Zenarmor, no integrated CrowdSec. Suricata can be installed manually, but without the integration OPNsense offers. Paid LTS. Since 2024, stable LTS images require a subscription. Rolling releases are free but without stability guarantee. 
OpenWrt OpenWrt is a Linux-based router operating system. Its strength is embedded hardware support.\nWhat it does better than OPNsense:\nMassive hardware support. Runs on more than 500 commercial router models, plus x86. Native WiFi. Directly manages wireless interfaces, with excellent driver support and advanced AP configuration. Lightweight. Can run on 128 MB RAM and 16 MB flash on embedded hardware. Package ecosystem. More than 27,000 packages available. What it does worse:\nBasic security. nftables firewall without integrated IDS/IPS. Suricata and Snort packages are community-maintained, poorly maintained, and have performance problems on limited hardware. No DPI. There\u0026rsquo;s no equivalent to Zenarmor. Limited IaC. UCI is scriptable but there\u0026rsquo;s no Terraform provider or mature REST API. Complicated updates. On x86, updating OpenWrt means reinstalling and restoring configuration. OPNsense updates with one click. When to Choose Each Criterion OPNsense VyOS OpenWrt Integrated security Excellent (Suricata, CrowdSec, Zenarmor) Basic Minimal IaC and automation Good (API + Ansible) Excellent (Terraform + text config) Limited (UCI) 10G+ performance Moderate Excellent (VPP) Not applicable Learning curve Moderate (GUI) Steep (CLI) Moderate Cost Free Paid LTS, free rolling Free Ideal use case Perimeter firewall/UTM Enterprise or cloud edge router WiFi AP, lightweight gateway For a homelab or small office where security is the priority, OPNsense remains the best option. The combination of Suricata, CrowdSec, and Zenarmor with a usable web interface has no equivalent among the alternatives.\nVyOS makes sense if the environment grows toward complex routing (multiple uplinks with BGP, SD-WAN) or if infrastructure is managed exclusively with Terraform.\nOpenWrt makes sense as a complement: a WiFi AP running OpenWrt behind an OPNsense is a solid combination. 
But as a perimeter firewall for security, it falls short.\nConclusions and Next Steps In three posts we\u0026rsquo;ve gone from a mini PC without an operating system to a segmented network with five VLANs, three detection layers (Suricata, CrowdSec, Zenarmor), VPN with WireGuard, encrypted DNS, automatic backups, automation with Ansible, and an auditing framework based on real standards.\nIt\u0026rsquo;s not perfect. OPNsense\u0026rsquo;s IaC doesn\u0026rsquo;t reach VyOS\u0026rsquo;s level. Zenarmor has a licensing model that limits advanced features in the free version. And maintaining an audit routine requires discipline that\u0026rsquo;s easy to set aside when everything seems to work well.\nBut it\u0026rsquo;s a solid foundation to keep building on. There are topics that were deliberately left out of this series because they deserve their own depth:\nWazuh SIEM integration: correlation of Suricata, CrowdSec, and system logs in one place, with centralized alerts and dashboards. It\u0026rsquo;s the natural step for anyone wanting real security monitoring. Multi-WAN and failover: configure two internet connections with load balancing and automatic failover. Relevant when availability matters. Advanced HAProxy: mutual TLS (mTLS) with client certificates, certificate authentication for internal services, OAuth2 as an authentication layer against self-hosted applications. Network monitoring with NetFlow/Insight: detailed traffic analysis by protocol, host, and port to detect anomalies that IDS doesn\u0026rsquo;t catch. Complete automation with Terraform: if OPNsense gets a mature Terraform provider (or if there\u0026rsquo;s a partial migration to VyOS for routing), declarative management of the entire network. If there\u0026rsquo;s interest, a fourth post can cover Wazuh integration and advanced monitoring. 
That\u0026rsquo;s where the jump from \u0026ldquo;secure homelab\u0026rdquo; to \u0026ldquo;infrastructure with real visibility\u0026rdquo; becomes most evident.\nReal Decreto 311/2022, de 3 de mayo, por el que se regula el Esquema Nacional de Seguridad. BOE-A-2022-7191.\nISO/IEC 27001:2022 — Information security, cybersecurity and privacy protection — Information security management systems — Requirements.\nThe CCN-STIC guides from the National Cryptology Center provide detailed instructions for ENS implementation. Specifically, CCN-STIC-811 covers system interconnection and CCN-STIC-408 covers perimeter security.\n","date":"2026-04-11T00:00:00Z","permalink":"/en/p/opnsense-auditing-automation-and-advanced-practices/","title":"OPNsense: Auditing, Automation, and Advanced Practices"},{"content":"Native encrypted backups Losing the OPNsense configuration after hours of tweaking is the kind of disaster that only happens once. After that incident, automatic backups are configured.\nOPNsense has a native backup system that exports all configuration to an XML file. Since version 24.1, these backups can be encrypted directly from the web interface.\nEncrypted backup configuration In System \u0026gt; Configuration \u0026gt; Backups:\nGo to the Google Drive / Nextcloud section if you want remote backup, or stick with local backup. Check the Encrypt backup box and enter an encryption password. This password is independent of system credentials. Store it in a password manager, because without it the backup is unrecoverable. In the automatic backup section (Scheduled), configure the frequency. A daily backup is reasonable for most environments. Manual backup from the interface In System \u0026gt; Configuration \u0026gt; Backups, the Download configuration button generates an XML with the current configuration. 
If encryption was selected, the downloaded file will be encrypted with AES-256-CBC.\nBackup from the console To automate backups from the command line:\n# Export the configuration cp /conf/config.xml /root/backup_$(date +%Y%m%d).xml # Encrypt with OpenSSL openssl enc -aes-256-cbc -salt -pbkdf2 \\ -in /root/backup_$(date +%Y%m%d).xml \\ -out /root/backup_$(date +%Y%m%d).xml.enc # Remove the unencrypted file rm /root/backup_$(date +%Y%m%d).xml It is recommended to copy encrypted backups to external storage: a NAS, an S3 bucket, or even a private Git repository (the encrypted XML is small, a few hundred KB).\nRestoration To restore, go to System \u0026gt; Configuration \u0026gt; Backups, upload the file and, if encrypted, enter the password. OPNsense applies the configuration and restarts the affected services.\nStatic IP assignment There are devices that need to always have the same IP: servers, NAS, printers, surveillance cameras. This can be done in two ways: configuring a static IP on the device itself or, which is cleaner, assigning DHCP reservations in OPNsense.\nDHCP reservations (the recommended way) In Services \u0026gt; DHCPv4 \u0026gt; [interface]:\nGo to the DHCP Static Mappings section. Add a new entry with: MAC Address: the MAC address of the device. IP Address: the IP you want to always assign. Hostname: a descriptive name. The advantage of doing it this way is that management is centralized in OPNsense. If you change routers tomorrow, devices don\u0026rsquo;t need to be reconfigured.\nRange convention A convention that works well for organizing the network:\nRange Usage .1 Gateway (OPNsense) .2 - .19 Infrastructure (switches, APs, NAS) .20 - .49 Servers and services .50 - .99 Fixed IP devices (printers, cameras) .100 - .254 Dynamic DHCP pool This means that just by seeing a device\u0026rsquo;s IP, you already know which category it falls into.\nZenarmor (Sensei) Zenarmor is a deep packet inspection (DPI) plugin for OPNsense. 
It goes beyond what Suricata or CrowdSec do because it inspects traffic at the application level: it can distinguish between Netflix and YouTube, between Telegram and WhatsApp, between legitimate traffic and potentially dangerous applications.\nWhat it does exactly Application-based traffic classification: identifies more than 300 applications and protocols. Content filtering by categories: allows blocking entire categories (gambling, malware, adult content) without needing to maintain lists manually. Encrypted traffic analysis (TLS): Zenarmor can classify HTTPS traffic without decrypting it, using metadata like SNI, JA3 fingerprints, and connection patterns. Detailed reporting: dashboards with consumption by device, application, and category. Installation In System \u0026gt; Firmware \u0026gt; Plugins, search for os-sunnyvalley and install. After installation, Zenarmor appears in the main menu.\nWhen starting Zenarmor for the first time, a configuration wizard runs:\nDeployment mode: choose Routed Mode to inspect all traffic passing through OPNsense. Bridge mode is for specific cases. Database engine: Zenarmor uses a local database for logs. For modest hardware (N100), select SQLite. For more powerful hardware, Elasticsearch gives better query performance. Interfaces to protect: select the LAN interfaces you want to inspect. Default policy: start with a permissive policy (monitor only) and adjust after seeing actual traffic. Policy configuration In Zenarmor \u0026gt; Policies:\nPolicies are applied by interface or by device group. A reasonable configuration:\nGeneral policy (main LAN):\nBlock categories: Malware, Phishing, Cryptomining, C2 (Command \u0026amp; Control). Monitor but allow: Streaming, Social Media, Gaming. Allow everything else. Policy for IoT (we\u0026rsquo;ll create this with VLANs later):\nBlock everything except the necessary domains for each device. IoT devices should not be able to access the internet freely. 
Policy for guests:\nBlock: P2P, Tor, VPN (to avoid filter bypass). Limit bandwidth per device. Difference with Suricata and CrowdSec Feature Suricata (IDS/IPS) CrowdSec Zenarmor Inspection Packets and signatures Logs and patterns Application (DPI) What it detects Exploits, malware, C2 Brute force, scans Applications, categories Blocking By signature/rule By IP (temp ban) By application/category Main resource CPU (high) CPU (low) CPU (medium) + RAM Complementary Yes Yes Yes All three complement each other. Suricata looks for known threats in traffic. CrowdSec detects malicious behavior in logs and shares intelligence. Zenarmor classifies and filters at the application level. Using them together gives security coverage that is hard to beat on home equipment.\nDisable insecure SSH SSH is the usual way to access the OPNsense console remotely. But the default configuration has aspects that should be changed.\nSecure SSH configuration In System \u0026gt; Settings \u0026gt; Administration, SSH section:\nDisable root login via SSH. Create a specific user with SSH access and sudo permissions. Change the default port. Port 22 is the first one bots scan. Change to a high port (for example, 2222 or something less predictable). Disable password authentication. Use exclusively SSH keys: 1 2 3 4 5 # On your local machine, generate a key pair if you don\u0026#39;t have one ssh-keygen -t ed25519 -C \u0026#34;opnsense-admin\u0026#34; # Copy the public key cat ~/.ssh/id_ed25519.pub Paste the public key in the user\u0026rsquo;s profile at System \u0026gt; Access \u0026gt; Users \u0026gt; [user] \u0026gt; Authorized Keys. In the SSH configuration, uncheck Permit Password Login. 
SSH access restriction by firewall Create a firewall rule that only allows SSH from specific IPs:\nIn Firewall \u0026gt; Rules \u0026gt; LAN, create a rule:\nAction: Pass Protocol: TCP Source: alias with admin IPs Destination: This Firewall Destination port: the configured SSH port And another rule that blocks SSH from any other source.\nVLANs: network segmentation Segmentation with VLANs is probably the most important change you can make to home network security. Without segmentation, a compromised WiFi bulb has direct access to the NAS with family photos. With VLANs, each type of device lives in its own isolated segment.\nVLAN design VLAN ID Name Subnet Purpose 10 Main 192.168.10.0/24 Trusted devices: laptops, desktops, personal mobiles 20 Guests 192.168.20.0/24 Guest devices, no access to internal network 30 IoT 192.168.30.0/24 IoT devices: cameras, sensors, bulbs, vacuums 40 Servers 192.168.40.0/24 Servers, NAS, self-hosted services 50 Management 192.168.50.0/24 Infrastructure management: switches, APs, OPNsense itself Creating VLANs in OPNsense For each VLAN:\nGo to Interfaces \u0026gt; Other Types \u0026gt; VLAN. Create a new VLAN: Parent interface: the physical interface connected to the managed switch (for example, igb1). VLAN tag: the ID from the table above (10, 20, 30, 40, 50). Description: the name of the VLAN. Go to Interfaces \u0026gt; Assignments and assign each VLAN as a new interface. Configure each interface: Enable the interface. Assign a static IP: the gateway IP for that subnet (for example, 192.168.10.1/24 for VLAN 10). Configure DHCP in Services \u0026gt; DHCPv4 for each VLAN with its corresponding range. Switch configuration The managed switch needs to be configured to understand VLANs:\nThe trunk port going to OPNsense must be tagged for all VLANs (10, 20, 30, 40, 50). Access ports are configured as untagged in the corresponding VLAN. For example, the port where a guest AP is connected is set to untagged on VLAN 20. 
If the AP supports multiple SSIDs with VLANs (like Ubiquiti or TP-Link Omada), you can create one SSID per VLAN and the AP handles tagging the traffic. Firewall rules between VLANs This is the point where segmentation is really defined. Without firewall rules, VLANs share the same router and can communicate with each other. Explicit rules need to be created.\nGeneral principle: deny everything between VLANs by default and allow only what\u0026rsquo;s necessary.\nVLAN 10 (Main):\nAction Source Destination Port Description Pass Main net * * Full internet access Pass Main net Servers net * Access to internal services Block Main net Management net * No direct access to management Block Main net IoT net * IoT isolation VLAN 20 (Guests):\nAction Source Destination Port Description Pass Guests net * 80, 443, 53 Only web browsing and DNS Block Guests net RFC1918 * No access to internal networks VLAN 30 (IoT):\nAction Source Destination Port Description Pass IoT net * 443, 8883 Only HTTPS and MQTT for cloud Pass IoT net IoT gateway 53 DNS Block IoT net RFC1918 * No access to internal networks VLAN 40 (Servers):\nAction Source Destination Port Description Pass Servers net * 80, 443, 53 Internet access for updates Pass Servers net Servers net * Inter-service communication Block Servers net Main net * Servers don\u0026rsquo;t initiate connections to Main VLAN 50 (Management):\nAction Source Destination Port Description Pass Management net * * Full access (admins only) To implement the RFC1918 network blocking rule (which covers all private subnets), create an alias in Firewall \u0026gt; Aliases:\nName: RFC1918 Type: Network Content: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 This alias is used as the destination in blocking rules to prevent VLANs like Guests or IoT from accessing any internal network.\nHardening and best practices Updates The first and most basic thing: keep OPNsense updated. 
Security updates are published regularly and patches are applied quickly.\nConfigure email update notifications. Apply updates during low-usage times. Before updating, make a backup (it\u0026rsquo;s already automated if you followed the first section). DNS over TLS (DoT) Configure Unbound (OPNsense\u0026rsquo;s DNS resolver) to use DNS over TLS:\nIn Services \u0026gt; Unbound DNS \u0026gt; General:\nEnable DNS over TLS. In Custom forwarding, add DNS servers that support DoT: 1 2 3 4 5 6 7 # Quad9 (includes malware filtering) 9.9.9.9@853#dns.quad9.net 149.112.112.112@853#dns.quad9.net # Cloudflare 1.1.1.1@853#cloudflare-dns.com 1.0.0.1@853#cloudflare-dns.com This encrypts DNS queries between OPNsense and the resolver, preventing the ISP from seeing which domains each device queries.\nDisable unnecessary services In System \u0026gt; Settings \u0026gt; Administration:\nDisable UPnP unless strictly necessary (and even then, limit it to specific interfaces). Disable SNMP if it\u0026rsquo;s not used for monitoring. Review installed plugins and uninstall those not in use. Centralized logging OPNsense can send logs to an external syslog server. If you have a monitoring stack (Grafana + Loki, or ELK), configure sending in System \u0026gt; Settings \u0026gt; Logging \u0026gt; Remote:\nServer: the syslog server IP. Protocol: TCP with TLS if possible. Facility: select which logs to send (firewall, system, IDS). Regular audit Establish a review routine:\nWeekly: review IDS/IPS and CrowdSec logs. Look for recurring patterns. Monthly: review firewall rules. Are there rules that no longer make sense? Has any new device been added that needs specific rules? Quarterly: review users and permissions. Is each user still necessary? Are SSH keys still valid? 
In the third and final post of the series, we\u0026rsquo;ll review everything configured, check the overall security posture, and look at advanced practices.\n","date":"2026-04-03T00:00:00Z","permalink":"/en/p/opnsense-network-segmentation-vlans-and-hardening/","title":"OPNsense: network segmentation, VLANs and hardening"},{"content":"What hardware OPNsense needs Before opening the installer, it\u0026rsquo;s worth knowing what OPNsense requires and what makes sense to buy. The official documentation distinguishes between minimum and recommended, but in practice there are nuances that matter quite a bit.\nOfficial minimum requirements Resource Minimum Recommended CPU 64-bit x86-64, 1 GHz Recent multi-core with AES-NI RAM 2 GB 8 GB Storage 40 GB SSD 120 GB SSD NICs 2 network interfaces 2+ Intel interfaces AES-NI has been mandatory since OPNsense 24.1. Without this instruction the installer won\u0026rsquo;t even boot. Any Intel processor from sixth generation onward includes it, and it\u0026rsquo;s been present in AMD since Ryzen.\nBudget options that work The cheapest option I\u0026rsquo;ve had good experience with is a mini PC with an Intel N100 or N200 processor. They\u0026rsquo;re designed for low power consumption and have AES-NI, which covers the main requirement. They can be found with four Intel i226-V Ethernet ports for under 150 euros.\nSome specific models that work:\nTopton N100/N200 with 4x i226-V: between 120 and 170 euros depending on configuration. They come without RAM or SSD, which are purchased separately. An 8 GB DDR5 module and a 128 GB SSD add about 30 euros more. Protectli VP2420 or VP2410: more expensive (around 300 euros), but with official support and an aluminum case that dissipates heat well. A good option if you prefer something with serious warranty. Recycled hardware with two NICs: a Dell OptiPlex or Lenovo ThinkCentre with a dual-port Intel PCIe card works perfectly. They can be found for 50-80 euros in second-hand markets. 
What really matters: make sure the network interfaces are Intel. Realtek works, but they cause problems with offloading and performance under load. It\u0026rsquo;s worth paying the difference.\nInstallation Installing OPNsense is straightforward. Download the ISO image from the official website, write it to a USB with dd or Rufus on Windows, and boot from the USB.\n1 2 # Write the image to a USB (be careful to select the correct device) dd if=OPNsense-24.7-dvd-amd64.iso of=/dev/sdX bs=4M status=progress On boot, a live environment appears with the option to try before installing. The default user is installer with password opnsense.\nDuring installation:\nSelect the destination disk for installation (the internal SSD, not the USB). Choose the filesystem. UFS is the simple and stable option. ZFS has advantages (snapshots, compression), but for a firewall with a single disk, UFS is sufficient. Define the root password. Remove the USB and reboot. After reboot, OPNsense boots directly and presents a text console with a basic menu. From there you can assign interfaces and configure the LAN IP address to access the web interface.\nPPPoE configuration on WAN If your ISP uses PPPoE (as is the case with many fiber connections in Spain and Latin America), it needs to be configured on the WAN interface.\nIn Interfaces \u0026gt; WAN:\nChange the IPv4 configuration type to PPPoE. Enter the username and password provided by the ISP. With most providers you don\u0026rsquo;t need to touch the MTU, but if you notice fragmentation issues, adjust it to 1492 (the standard for PPPoE over Ethernet with MTU 1500). Save and apply. If the connection doesn\u0026rsquo;t come up, verify that the ONT cable goes directly to the port assigned as WAN. 
Some ISP ONTs need to be in bridge mode for OPNsense to negotiate the PPPoE session directly.\nLAN in bridge mode There are situations where it\u0026rsquo;s useful to group multiple physical ports into the same network segment, for example when the mini PC has four ports and we want three of them to work as a switch without additional hardware.\nTo configure a bridge in OPNsense:\nGo to Interfaces \u0026gt; Other Types \u0026gt; Bridge. Create a new bridge and add the interfaces you want to group (for example, igb1, igb2, igb3). Go to Interfaces \u0026gt; Assignments and assign the newly created bridge as the LAN interface. Configure the static LAN IP on the bridge interface (for example, 192.168.1.1/24). Enable the DHCP server in Services \u0026gt; DHCPv4 pointing to the bridge interface. With this, all three ports share the same network segment and DHCP serves addresses for all of them.\nWireless interface and firmware OPNsense supports some WiFi cards, but the support is limited compared to Linux. Atheros cards work best, but many need additional firmware that isn\u0026rsquo;t included by default.\nTo install the required firmware:\n1 2 3 4 # From the OPNsense console (option 8 from the menu for shell) pkg install wifi-firmware-atheros # Or for Intel cards: pkg install wifi-firmware-intel After installing the firmware, restart the system. The wireless interface should appear in Interfaces \u0026gt; Assignments.\nTo configure the access point:\nGo to Interfaces \u0026gt; Wireless and create a new device in Access Point mode. Select the standard (802.11ac/ax if the card supports it). Configure the SSID and WPA2/WPA3 security. Assign the wireless interface and give it an IP in a different range from the wired LAN, or add it to the existing bridge if you want it on the same segment. An honest warning: integrated WiFi in OPNsense works, but don\u0026rsquo;t expect the performance or stability of a dedicated access point. 
For serious use, it\u0026rsquo;s better to use an external AP (Ubiquiti, TP-Link Omada) and let OPNsense handle only routing and firewall.\nIDS and IPS: intrusion detection and prevention OPNsense includes Suricata as the IDS/IPS engine. The difference between the two modes is simple: IDS detects and logs, IPS detects and blocks.\nInitial configuration In Services \u0026gt; Intrusion Detection:\nEnable IDS: check the activation box. IPS mode: if you want it to block traffic, change the mode to IPS. This requires Suricata to run in inline mode, which is the default behavior in OPNsense. Interfaces: select WAN at minimum. If you want to inspect internal traffic as well, add LAN, but this consumes quite a bit more CPU. Pattern matcher: select Hyperscan if the hardware supports it (processors with SSSE3). It\u0026rsquo;s significantly faster than the default matcher. Recommended rule sets in 2026 It\u0026rsquo;s not about activating all available rules. That consumes resources and generates false positives. A reasonable selection:\nRule set What it\u0026rsquo;s for Recommendation ET Open (Emerging Threats) Known threats, malware, C2 Activate. It\u0026rsquo;s the foundation Abuse.ch SSL Blacklist Certificates associated with malware Activate Abuse.ch URLhaus Malware distribution URLs Activate ET Open Compromised Known compromised IPs Activate Feodo Tracker Banking botnets Activate ET Open Tor Tor traffic Only if you want to block Tor Snort VRT Snort commercial rules Requires subscription, not essential Best practices for IDS/IPS in 2026 Don\u0026rsquo;t activate all rules. Select the ones that make sense for your environment. A home network doesn\u0026rsquo;t need SCADA or SQL server rules. Automatic rule updates: configure scheduled rule downloads. In Schedule within IDS, set a daily update. Review logs before switching to IPS mode. Leave the system in IDS mode for at least a week to identify false positives. 
If something legitimate triggers alerts, create an exception before starting to block. Monitor CPU usage. Suricata can consume a lot on modest hardware. If the processor stays above 80% usage with IPS active, reduce the number of rules or limit inspection to the WAN interface. Use EVE JSON logging to export events to a SIEM or analysis tool. The JSON format facilitates integration with Elasticsearch, Grafana, or Wazuh. Don\u0026rsquo;t rely solely on IDS/IPS. It\u0026rsquo;s one more layer of defense. It doesn\u0026rsquo;t replace good firewall rules, network segmentation, or regular updates. CrowdSec CrowdSec complements Suricata with a different approach: log analysis and shared decisions with the community. While Suricata inspects packets in real time, CrowdSec analyzes service logs and applies bans based on behavior patterns.\nInstallation CrowdSec has an official plugin for OPNsense:\nGo to System \u0026gt; Firmware \u0026gt; Plugins. Search for os-crowdsec and install it. After installation, it appears in Services \u0026gt; CrowdSec. Recommended configuration and collections (Optional) After installation, register the instance in the CrowdSec central console:\n1 2 # From the OPNsense shell cscli console enroll \u0026lt;your-enrollment-key\u0026gt; The collections define what attack patterns CrowdSec detects. 
Recommended for a home or small office firewall:\n# Base collection for firewalls (usually already installed) cscli collections install crowdsecurity/freebsd # Detection of port scanning and SSH brute force cscli collections install crowdsecurity/sshd cscli collections install crowdsecurity/iptables # HTTP protection if you expose web services cscli collections install crowdsecurity/nginx cscli collections install crowdsecurity/base-http-scenarios # Detection of exploitation attempts for known HTTP CVEs cscli collections install crowdsecurity/http-cve # Whitelist of known good actors (search engine crawlers, CDNs) to reduce false positives cscli collections install crowdsecurity/whitelist-good-actors Enable the firewall bouncer so CrowdSec can create blocking rules directly in the OPNsense firewall:\ncscli bouncers add opnsense-firewall The generated token is entered in the plugin configuration in the web interface, in Services \u0026gt; CrowdSec \u0026gt; Bouncer.\nThe real advantage of CrowdSec is shared intelligence. When a community member detects an attacking IP, that information is distributed to everyone else. It\u0026rsquo;s like having a collaborative IP reputation system.\nWireGuard WireGuard is the cleanest option for VPN in 2026. It is faster and simpler than OpenVPN or IPsec, and uses more modern cryptography.\nServer configuration In VPN \u0026gt; WireGuard:\nCreate an instance: go to the Instances tab (or Local in earlier versions) and add a new one.\nGenerate a key pair (done automatically when creating the instance). Listen port: 51820 (or whatever you prefer). Tunnel address: 10.10.10.1/24. Add a peer: in the Peers tab, create a new peer.\nClient\u0026rsquo;s public key (generated on the client device). Allowed IPs: 10.10.10.2/32 (the IP the client will have inside the tunnel). 
Assign the interface: go to Interfaces \u0026gt; Assignments, assign the WireGuard interface (wg0 or wg1), enable it, and don\u0026rsquo;t touch the IP configuration (it\u0026rsquo;s already defined in WireGuard).\nClient configuration On the client (mobile, laptop), the configuration is a simple file:\n1 2 3 4 5 6 7 8 9 10 [Interface] PrivateKey = \u0026lt;client-private-key\u0026gt; Address = 10.10.10.2/24 DNS = 10.10.10.1 [Peer] PublicKey = \u0026lt;server-public-key\u0026gt; Endpoint = \u0026lt;public-ip-or-ddns\u0026gt;:51820 AllowedIPs = 0.0.0.0/0 PersistentKeepalive = 25 With AllowedIPs = 0.0.0.0/0, all client traffic goes through the tunnel. If you only want to access the local network, change to AllowedIPs = 192.168.1.0/24, 10.10.10.0/24.\nFirewall rules It\u0026rsquo;s no use configuring services if the firewall rules don\u0026rsquo;t allow the correct traffic. OPNsense blocks everything by default on WAN, which is correct. The work is in allowing what\u0026rsquo;s necessary on LAN and WireGuard.\nRules for LAN In Firewall \u0026gt; Rules \u0026gt; LAN:\nAction Protocol Source Destination Destination Port Description Pass IPv4+6 LAN net * * Allow LAN outbound Pass IPv4 LAN net LAN address 53 DNS to firewall Pass IPv4 LAN net LAN address 443 WebUI access The first rule is the most permissive and allows LAN to go out to the internet. 
In the second post of the series we\u0026rsquo;ll see how to restrict this with VLANs.\nRules for WireGuard In Firewall \u0026gt; Rules \u0026gt; WireGuard (or the assigned interface):\nAction Protocol Source Destination Destination Port Description Pass IPv4 WireGuard net LAN net * LAN access from VPN Pass IPv4 WireGuard net * * Internet outbound from VPN Rule on WAN for WireGuard In Firewall \u0026gt; Rules \u0026gt; WAN, add a rule to allow incoming connection to the WireGuard port:\nAction Protocol Source Destination Destination Port Description Pass UDP * WAN address 51820 Incoming WireGuard Offloading and hardware tuning When everything is working, it\u0026rsquo;s time to squeeze out the performance. OPNsense runs on FreeBSD, and it has several offloading options that can make a noticeable difference.\nWhat offloading is Offloading means delegating certain network operations to the network card hardware instead of processing them in software on the CPU. This frees up CPU cycles for other tasks (like Suricata or CrowdSec) and reduces latency.\nAvailable options In Interfaces \u0026gt; Settings:\nHardware CRC (Checksum Offloading): delegates TCP/UDP/IP checksum calculation to the network card. Enable if the NIC supports it (Intel i210/i225/i226 do). Measurably reduces CPU load. Hardware TSO (TCP Segmentation Offloading): the network card handles splitting large TCP packets into smaller segments. Improves throughput in large transfers. Can cause problems with some IPS configurations, so test and verify. Hardware LRO (Large Receive Offloading): groups small incoming packets into larger blocks before passing them to the CPU. Reduces interrupts. Do not enable if using IPS in inline mode, as it interferes with packet inspection. VLAN Hardware Filtering: if VLANs are used (we\u0026rsquo;ll cover this in the second post), let the NIC filter by VLAN ID in hardware. 
Additional system tuning In System \u0026gt; Settings \u0026gt; Tunables, some useful adjustments:\n# Increase network buffers net.inet.tcp.recvspace=65536 net.inet.tcp.sendspace=65536 # Use the BBR TCP stack by default (FreeBSD 14+; requires the tcp_bbr kernel module to be loaded) net.inet.tcp.functions_default=bbr # Adjust the number of NIC queues hw.igb.num_queues=4 Don\u0026rsquo;t obsess over tuning. On most home connections (up to 1 Gbps symmetric), an N100 with the default settings already saturates the link. Tuning starts to matter when you want to get the most out of connections of 2.5 Gbps or more, or when Suricata consumes too much CPU.\nUser management and web interface security Using root for day-to-day web interface access is bad practice. If someone compromises those credentials, they have full control.\nCreate a new administrator user In System \u0026gt; Access \u0026gt; Users:\nCreate a new user with a descriptive name (not admin, something less predictable). Assign a strong password. Minimum 16 characters, generated with a password manager. In Effective Privileges, assign the admins group. Enable OTP authentication if possible. OPNsense supports native TOTP, so it can be used with any authenticator app. Change the root password In System \u0026gt; Access \u0026gt; Users, select root and change the password to something long and random. Store this password in a safe place (password manager) and don\u0026rsquo;t use it for daily access.\nA more radical option: disable root login on the web interface. This can be done by removing web access privileges from the root user, leaving only physical console access as a last resort.\nRestrict web interface access By default, the web interface is accessible from the entire LAN. This can be restricted:\nChange the HTTPS port: in System \u0026gt; Settings \u0026gt; Administration, change the port from 443 to another non-standard one (for example, 8443). 
Restrict by source IP: create an alias in Firewall \u0026gt; Aliases with the IPs from which web interface access is allowed. Then, in the LAN firewall rules, create a rule that allows access to the WebUI port only from that alias, and a blocking rule for everything else. Enable HTTPS with your own certificate: in System \u0026gt; Trust \u0026gt; Certificates, generate a self-signed certificate or import one from Let\u0026rsquo;s Encrypt. This eliminates browser warnings and ensures the connection is properly encrypted. Brute force protection: in System \u0026gt; Settings \u0026gt; Administration, configure the maximum number of failed attempts and the lockout time. Disable HTTP: make sure only HTTPS is enabled. Unencrypted HTTP access to a firewall\u0026rsquo;s admin interface makes no sense. In the next post in the series, we\u0026rsquo;ll take security further with encrypted backups, VLANs, Zenarmor, and system hardening.\n","date":"2026-04-01T00:00:00Z","permalink":"/en/p/opnsense-from-hardware-to-a-working-firewall/","title":"OPNsense: from hardware to a working firewall"},{"content":"Why these two tools and not others The security tooling landscape for pipelines is huge, and it is easy to end up with a pipeline that spends twenty minutes just on scans. After trying several combinations, I stick with Trivy and Semgrep for a simple reason: they cover two distinct attack surfaces with minimal friction.\nSemgrep analyzes your source code looking for dangerous patterns — SQL injections, insecure deserialization, hardcoded secrets. It does this fast and without needing to compile anything. Trivy, on the other hand, takes care of everything that is not your code: dependencies with known CVEs, outdated base images, problematic IaC configurations. Between the two you cover your own code and third-party code.\nBoth are open-source, run without an external server, and produce JSON output that you can easily parse in CI. 
No licenses or third-party dashboards needed to get started.\nPipeline structure The idea is that security should not be an isolated stage at the end, but something that runs early, alongside the rest of your checks. Here is the general layout:\nstages: - test - security - build - deploy variables: TRIVY_SEVERITY: \u0026#34;HIGH,CRITICAL\u0026#34; SEMGREP_RULES: \u0026#34;p/owasp-top-ten p/security-audit\u0026#34; The security stage runs right after test and before build. If any scan fails, the pipeline stops before building the image or deploying anything.\nSetting up Semgrep Basic job semgrep: stage: security image: semgrep/semgrep:latest script: - semgrep ci --config \u0026#34;$SEMGREP_RULES\u0026#34; --json --output semgrep-results.json artifacts: paths: - semgrep-results.json when: always rules: - if: $CI_PIPELINE_SOURCE == \u0026#34;merge_request_event\u0026#34; - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH This runs Semgrep on every merge request and on every push to the main branch. Results are always saved as an artifact, even if the pipeline fails. You will want to review them later.\nCustom rules Generic rulesets are fine to start with, but as soon as you have project-specific patterns you want to catch, you will need custom rules. Create a .semgrep/ directory at the project root:\n# .semgrep/no-env-secrets.yml rules: - id: no-os-environ-secrets patterns: - pattern: os.environ[$KEY] - metavariable-regex: metavariable: $KEY regex: \u0026#34;.*(SECRET|PASSWORD|TOKEN|KEY).*\u0026#34; message: \u0026#34;Direct access to secrets from environment variables. Use the secrets manager.\u0026#34; languages: [python] severity: WARNING Then reference it in the pipeline:\nvariables: SEMGREP_RULES: \u0026#34;p/owasp-top-ten p/security-audit .semgrep/\u0026#34; Handling false positives There will be false positives. That is unavoidable. What matters is how you handle them. 
The worst reaction is to disable the entire rule or slap allow_failure: true on the job. Instead, use inline annotations:\n# nosemgrep: python.lang.security.audit.hardcoded-password TEST_PASSWORD = \u0026#34;dummy\u0026#34; # test fixture, not used in production Every suppression should have a comment explaining why it is safe to ignore. No exceptions. If you cannot justify it, do not suppress it.\nFor broader suppressions, use .semgrepignore:\n# Exclude test fixtures tests/fixtures/ # Exclude auto-generated code *_generated.py Setting up Trivy Dependency scanning trivy-fs: stage: security image: name: aquasec/trivy:latest entrypoint: [\u0026#34;\u0026#34;] script: - trivy fs --severity \u0026#34;$TRIVY_SEVERITY\u0026#34; --exit-code 1 --format json --output trivy-fs.json . artifacts: paths: - trivy-fs.json when: always rules: - if: $CI_PIPELINE_SOURCE == \u0026#34;merge_request_event\u0026#34; - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH Trivy examines the project\u0026rsquo;s dependency manifests and lockfiles (package-lock.json, requirements.txt, go.mod, etc.) and cross-references versions against vulnerability databases. The --exit-code 1 flag makes the job fail if it finds anything HIGH or CRITICAL.\nContainer image scanning If you build Docker images, scan the built image before it is deployed:\ntrivy-image: stage: build image: name: aquasec/trivy:latest entrypoint: [\u0026#34;\u0026#34;] needs: - job: build-image artifacts: true script: - trivy image --severity \u0026#34;$TRIVY_SEVERITY\u0026#34; --exit-code 1 --format json --output trivy-image.json \u0026#34;$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA\u0026#34; artifacts: paths: - trivy-image.json when: always rules: - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH This job depends on the image build (via needs), so it lives in the build stage: GitLab only lets needs point at jobs in the same or an earlier stage. It only runs on the main branch, not on every MR. 
Scanning images is slower than scanning the filesystem, so reserve that for what actually gets deployed.\nIaC scanning One often overlooked advantage of Trivy is that it also analyzes infrastructure configurations:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 trivy-iac: stage: security image: name: aquasec/trivy:latest entrypoint: [\u0026#34;\u0026#34;] script: - trivy config --severity \u0026#34;$TRIVY_SEVERITY\u0026#34; --exit-code 1 --format json --output trivy-iac.json . artifacts: paths: - trivy-iac.json when: always rules: - if: $CI_PIPELINE_SOURCE == \u0026#34;merge_request_event\u0026#34; changes: - \u0026#34;**/*.tf\u0026#34; - \u0026#34;**/Dockerfile\u0026#34; - \u0026#34;**/*.yml\u0026#34; - \u0026#34;**/*.yaml\u0026#34; It catches Dockerfiles running as root, Terraform files with overly open security groups, and Kubernetes configs without resource limits. The changes block ensures it only runs when relevant files are modified, avoiding unnecessary scans.\nBlocking policies Do not start blocking everything from day one. That breeds frustration, creative workarounds, and eventually someone puts allow_failure: true on every security job.\nBetter to do it in phases:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 # Phase 1: Report only semgrep: allow_failure: true # Phase 2: Block critical only semgrep: script: - semgrep ci --config \u0026#34;$SEMGREP_RULES\u0026#34; --json --output semgrep-results.json - | CRITICAL=$(jq \u0026#39;[.results[] | select(.extra.severity == \u0026#34;ERROR\u0026#34;)] | length\u0026#39; semgrep-results.json) if [ \u0026#34;$CRITICAL\u0026#34; -gt 0 ]; then echo \u0026#34;Blocked: $CRITICAL critical findings\u0026#34; exit 1 fi allow_failure: false # Phase 3: Block high + critical # ... adjust the jq filter Same goes for Trivy. The --severity flag already lets you control which levels block the pipeline. 
Start with CRITICAL only, and once the team has adapted, add HIGH.\nThe complete pipeline Putting it all together:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 stages: - test - security - build - deploy variables: TRIVY_SEVERITY: \u0026#34;HIGH,CRITICAL\u0026#34; SEMGREP_RULES: \u0026#34;p/owasp-top-ten p/security-audit\u0026#34; semgrep: stage: security image: semgrep/semgrep:latest script: - semgrep ci --config \u0026#34;$SEMGREP_RULES\u0026#34; --json --output semgrep-results.json - | CRITICAL=$(jq \u0026#39;[.results[] | select(.extra.severity == \u0026#34;ERROR\u0026#34;)] | length\u0026#39; semgrep-results.json) echo \u0026#34;Critical findings: $CRITICAL\u0026#34; if [ \u0026#34;$CRITICAL\u0026#34; -gt 0 ]; then echo \u0026#34;Pipeline blocked\u0026#34; exit 1 fi artifacts: paths: - semgrep-results.json when: always rules: - if: $CI_PIPELINE_SOURCE == \u0026#34;merge_request_event\u0026#34; - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH trivy-fs: stage: security image: name: aquasec/trivy:latest entrypoint: [\u0026#34;\u0026#34;] script: - trivy fs --severity \u0026#34;$TRIVY_SEVERITY\u0026#34; --exit-code 1 --format json --output trivy-fs.json . artifacts: paths: - trivy-fs.json when: always rules: - if: $CI_PIPELINE_SOURCE == \u0026#34;merge_request_event\u0026#34; - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH trivy-config: stage: security image: name: aquasec/trivy:latest entrypoint: [\u0026#34;\u0026#34;] script: - trivy config --severity \u0026#34;$TRIVY_SEVERITY\u0026#34; --exit-code 1 --format json --output trivy-iac.json . 
artifacts: paths: - trivy-iac.json when: always rules: - if: $CI_PIPELINE_SOURCE == \u0026#34;merge_request_event\u0026#34; changes: - \u0026#34;**/*.tf\u0026#34; - \u0026#34;**/Dockerfile\u0026#34; - \u0026#34;**/*.yml\u0026#34; trivy-image: stage: security image: name: aquasec/trivy:latest entrypoint: [\u0026#34;\u0026#34;] needs: - job: build artifacts: true script: - trivy image --severity \u0026#34;$TRIVY_SEVERITY\u0026#34; --exit-code 1 --format json --output trivy-image.json \u0026#34;$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA\u0026#34; artifacts: paths: - trivy-image.json when: always rules: - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH Semgrep and the Trivy scans run in parallel within the security stage. If any of them fails, the pipeline stops before building or deploying.\nSurfacing results in merge requests JSON reports are fine for auditing, but developers need feedback visible directly in the MR. GitLab supports native security reports if you use official templates, but with external tools you can parse the JSON and comment on the MR:\ncomment-results: stage: .post image: alpine:latest script: - apk add --no-cache curl jq - | SEMGREP_COUNT=$(jq \u0026#39;.results | length\u0026#39; semgrep-results.json 2\u0026gt;/dev/null || echo \u0026#34;0\u0026#34;) TRIVY_COUNT=$(jq \u0026#39;[.Results[]?.Vulnerabilities // [] | length] | add // 0\u0026#39; trivy-fs.json 2\u0026gt;/dev/null || echo \u0026#34;0\u0026#34;) BODY=\u0026#34;### Security summary\\n- Semgrep: ${SEMGREP_COUNT} findings\\n- Trivy: ${TRIVY_COUNT} vulnerabilities\u0026#34; curl --request POST \\ --header \u0026#34;PRIVATE-TOKEN: ${GITLAB_API_TOKEN}\u0026#34; \\ --header \u0026#34;Content-Type: application/json\u0026#34; \\ --data \u0026#34;{\\\u0026#34;body\\\u0026#34;: \\\u0026#34;$BODY\\\u0026#34;}\u0026#34; \\ \u0026#34;${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}/notes\u0026#34; rules: - if: $CI_PIPELINE_SOURCE == 
\u0026#34;merge_request_event\u0026#34; allow_failure: true Not the most elegant solution, but it works. If you are on GitLab Ultimate you get integrated security reports. If not, this gives enough visibility.\nCache and performance Scans add time to the pipeline, and if that time is excessive the team will end up disabling them. A few things that help:\n1 2 3 4 5 6 7 8 trivy-fs: variables: TRIVY_CACHE_DIR: \u0026#34;.trivycache/\u0026#34; cache: key: trivy-db paths: - .trivycache/ # ...rest of the job Caching Trivy\u0026rsquo;s vulnerability database avoids downloading it on every run. It is roughly 40MB downloaded from GitHub, and on shared runners that download can take a while.\nFor Semgrep, the binary itself is already fast, but if your repository is large, limit the paths:\n1 2 3 semgrep: script: - semgrep ci --config \u0026#34;$SEMGREP_RULES\u0026#34; --include=\u0026#34;src/\u0026#34; --include=\u0026#34;app/\u0026#34; --json --output semgrep-results.json No point scanning node_modules/, vendor/, or asset directories.\nWhat I learned running this in practice I have been using this setup in production for a while now, and some things only become clear with actual use.\nTrivy flags a lot of CVEs in base images that have no fix available. If you do not filter with --ignore-unfixed, you will have constant noise. Better to add that flag and focus on what you can actually fix:\n1 trivy image --ignore-unfixed --severity HIGH,CRITICAL my-image:latest With Semgrep, community rulesets are a good starting point, but the rules that deliver the most value are the ones you write yourself, tailored to your project\u0026rsquo;s patterns. A single rule that catches unparameterized ORM queries is worth more than a hundred generic rules.\nAnd the most important thing: do not try to plug every hole at once. Start with scans in report-only mode, review what comes up, tune the rules, and only then start blocking. 
If you block everything on day one, someone will have put allow_failure: true on every job by day two.\n","date":"2026-03-16T00:00:00Z","permalink":"/en/p/shift-left-security-integrating-trivy-and-semgrep-in-gitlab-ci/","title":"Shift-left security: integrating Trivy and Semgrep in GitLab CI"},{"content":"sdm is a Raspberry Pi SD card image manager that lets you script and repeat everything you usually do by hand: users, SSH keys, packages, encryption, and more. Instead of configuring each card manually, you customize a base image once and then burn as many identical cards as you want.\nIn this post I show how to install and use sdm to create an encrypted RaspiOS image that you can unlock over SSH, and then burn it to an SD card.\nRequirements To run sdm you need:\nThe latest version of systemd-nspawn. The RaspiOS image you want to customize, for example: 2025-12-04-raspios-trixie-arm64-lite.img. Installation Typical installation looks like this:\n# 1. Clone the sdm repository git clone https://github.com/gitbls/sdm.git cd sdm # 2. 
Optionally put sdm in your PATH sudo cp sdm /usr/local/bin/ sudo chmod +x /usr/local/bin/sdm Customize image After reading the documentation on disk encryption and plugins, I settled on the following customization:\nsudo sdm --customize --host rpi-1 --plugin-debug --expand-root --regen-ssh-host-keys \\ --plugin swap:\u0026#34;filesize=2048|zramsize=1024\u0026#34; \\ --plugin user:\u0026#34;deluser=pi|adduser=user|uid=1234|password=12345|addgroup=sudo\u0026#34; \\ --plugin copyfile:\u0026#34;from=/home/user/.ssh/rpi-1.user.ed25519.pub|to=/home/user/.ssh/authorized_keys|chown=user:user|chmod=600|mkdirif\u0026#34; \\ --plugin sshd:\u0026#34;password-authentication=yes|port=22222\u0026#34; \\ --plugin cryptroot:\u0026#34;ssh|authkeys=/home/user/.ssh/initramfs_authorized_keys|crypto=xchacha|ipaddr=192.168.1.20|gateway=192.168.1.1|netmask=255.255.255.0|dns=1.1.1.1\u0026#34; \\ --plugin network:\u0026#34;ifname=eth0|ipv4-static-ip=192.168.1.20|ipv4-static-gateway=192.168.1.1\u0026#34; \\ --plugin disables:piwiz \\ --restart \\ 2025-12-04-raspios-trixie-arm64-lite.img This sdm --customize command customizes a RasPiOS Trixie ARM64 lite image by:\nExpanding the root filesystem and regenerating SSH host keys Configuring swap, creating a user, deploying an SSH key, and customizing SSH daemon settings Setting up full root filesystem encryption with initramfs SSH unlock and static networking Disabling piwiz and restarting after customization Detailed Breakdown Global switches:\n--customize: Customize the specified image file. --host rpi-1: Set the hostname to rpi-1. --plugin-debug: Enable plugin debugging output. --expand-root: Expand the root filesystem on first boot. --regen-ssh-host-keys: Regenerate SSH host keys during first boot to ensure each device has unique keys. --restart: Restart the image after customization completes. Target image: 2025-12-04-raspios-trixie-arm64-lite.img. 
Plugins swap:\u0026quot;filesize=2048|zramsize=1024\u0026quot;: Configure a 2 GB swap file and a 1 GB zram device via the swap plugin. user:\u0026quot;deluser=pi|adduser=user|uid=1234|password=12345|addgroup=sudo\u0026quot;: Delete the default pi user, create a new user named user with UID 1234, set its password, and add it to the sudo group. copyfile:\u0026quot;from=...|to=...|chown=user:user|chmod=600|mkdirif\u0026quot;: Copy an SSH public key into the image as authorized_keys for the user account, creating the .ssh directory if needed and setting ownership and permissions. sshd:\u0026quot;password-authentication=yes|port=22222\u0026quot;: Configure the SSH daemon to allow password authentication and listen on port 22222. cryptroot:\u0026quot;ssh|authkeys=...|crypto=xchacha|ipaddr=...|gateway=...|netmask=...|dns=...\u0026quot;: Set up full root filesystem encryption with LUKS, enable initramfs SSH unlock, use xchacha encryption, and configure static networking for the initramfs. network:\u0026quot;ifname=eth0|ipv4-static-ip=...|ipv4-static-gateway=...\u0026quot;: Configure a static IP address for eth0 in the running system. disables:piwiz: Disable the piwiz first-boot setup wizard so the system does not prompt for keyboard and user settings at first boot. Encrypt the SD card (rootfs) Because the cryptroot plugin was already invoked during customization, all that remains is to burn the image to the SD card, here /dev/sdb:\nsudo sdm --burn /dev/sdb 2025-12-04-raspios-trixie-arm64-lite.img First boot: sdm-auto-encrypt runs and reboots\nThe service runs sdm-cryptconfig --sdm \u0026hellip; --reboot, which updates initramfs, cmdline.txt, fstab, and crypttab. Second boot: initramfs prompt; run sdmcryptfs\nThe system drops to the (initramfs) prompt. Connect a scratch disk larger than the used space on the rootfs and run: (initramfs) sdmcryptfs /dev/mmcblk0 /dev/sdY Replace /dev/mmcblk0 with your system disk and /dev/sdY with the scratch disk. Follow the prompts to encrypt and unlock the rootfs. Then type exit to continue booting. 
Final boot: encrypted rootfs active\nThe system now prompts for the passphrase at each boot. If SSH was enabled, you can unlock remotely by SSHing as root to the initramfs IP during boot.\n","date":"2025-12-21T00:00:00Z","permalink":"/en/p/customizing-an-encrypted-raspberry-pi-os-with-sdm/","title":"Customizing an encrypted Raspberry Pi OS with sdm"},{"content":"DevSecOps gets thrown around in job descriptions and conference talks a lot. But behind the buzzword are real lessons that only come from doing the work. From building pipelines that break when you add security gates, to watching teams ignore the tools you spent months deploying, to finally finding what actually works.\nThese are lessons we learned the hard way. They\u0026rsquo;re opinionated, practical, shaped by experience.\nSecurity is everyone\u0026rsquo;s responsibility Sounds like a break room poster, but it\u0026rsquo;s the most important lesson here. If security is only the security team\u0026rsquo;s job, you\u0026rsquo;ve lost.\nDevelopers make security decisions every time they write code, whether they know it or not. How they validate input. How they handle secrets. How they configure network access. Every PR is a security event.\nWhat works: make security part of the normal development workflow, not a gate at the end. Developers learn when they get fast feedback on security issues in their PR. They resent finding out three weeks later from an auditor.\nWe\u0026rsquo;ve seen this repeatedly: teams that treat security as shared responsibility find fewer critical vulnerabilities in production. Teams that silo it find them in the news.\nAutomate everything you can Manual security processes do not scale. Period. 
If your security review is a human reading a checklist, it will be skipped under deadline pressure, inconsistently applied, and resented by everyone involved.\nAutomate the things that can be automated:\nDependency scanning in every CI build (Dependabot, Snyk, Trivy) Static analysis on every pull request (Semgrep, SonarQube) Secret detection as a pre-commit hook and CI check (gitleaks, detect-secrets) Container image scanning before deployment (Trivy, Grype) Infrastructure as Code scanning (tfsec, Checkov, KICS) Compliance as Code for runtime policy enforcement (OPA, Kyverno) The goal is not to catch everything automatically. The goal is to catch the easy stuff automatically so that human reviewers can focus on the hard stuff: business logic flaws, design-level security issues, threat modeling.\nStart Small One of the biggest mistakes we have made is trying to secure everything at once. You roll out SAST, DAST, SCA, container scanning, IaC scanning, and runtime protection in one quarter. The result? Alert fatigue, developer rebellion, and a wall of unresolved findings that nobody looks at.\nStart with one tool, one pipeline, one team. Get it working well. Get developers comfortable with it. Resolve the false positives. Tune the rules. Then expand.\nA practical progression:\nMonth 1: Secret detection in pre-commit hooks and CI. This is uncontroversial and catches real issues. Month 2: Dependency scanning with automated PR creation for updates. Developers see the value immediately. Month 3: Container image scanning blocking deployments of critical/high vulnerabilities. Month 4+: Static analysis, gradually expanding rule sets. Each step should be stable before moving to the next. 
Rushing creates noise, and noise teaches people to ignore alerts.\nBlameless culture matters When a security incident happens because someone pushed a secret to a public repo, or because a vulnerability was not patched in time, the response matters more than the incident itself.\nIf people get blamed, they hide things. They do not report near-misses. They cover up mistakes. And the next incident will be worse because nobody shared the lessons from the last one.\nBlameless postmortems are not about letting people off the hook. They are about understanding systemic failures. Why was it possible to push a secret? Why was there no scanning? Why was the patching process slow? Fix the system, not the person.\nWe have found that teams with genuinely blameless cultures have significantly better security postures. People report suspicious things. They ask for help early. They flag risks before they become incidents.\nTooling is not enough without culture change We once deployed a comprehensive security scanning pipeline with beautiful dashboards, Slack notifications, Jira ticket creation, the works. Six months later, there were 3,000 unresolved findings and the Slack channel was muted by every developer.\nThe tools were fine. The culture was not ready.\nBefore you deploy tooling, invest in:\nTraining: Developers need to understand why the tool exists and how to act on its findings. Ownership: Someone needs to own the backlog of findings and triage them. If nobody owns it, nobody does it. SLAs: Define clear timelines for remediating findings by severity. Critical gets 48 hours. High gets a week. Medium gets a sprint. Low gets a quarter. Feedback loops: When a tool produces a false positive, there must be an easy way to report it and get the rule tuned. Otherwise, developers learn to ignore everything. Invest in developer experience for security tools If your security tool makes developers\u0026rsquo; lives harder, they will find a way around it. This is not a character flaw. 
It is human nature and good engineering instinct: remove obstacles to shipping.\nThe security tools that get adopted are the ones that:\nRun fast: A SAST scan that takes 20 minutes will be bypassed. One that takes 30 seconds will be tolerated. Integrate natively: Show results in the PR, not in a separate portal. Nobody wants to log into another dashboard. Have low false positive rates: Every false positive erodes trust. Invest time in tuning. Provide actionable guidance: \u0026ldquo;SQL injection vulnerability on line 42\u0026rdquo; is useless without \u0026ldquo;here is how to fix it.\u0026rdquo; Fail gracefully: If the scanner is down, the pipeline should warn, not block. Availability of the development pipeline is non-negotiable. We think of it this way: if a developer has to change their workflow to accommodate a security tool, the tool has failed. The best security tooling is invisible.\nMonitoring and observability are non-negotiable You cannot secure what you cannot see. Security monitoring is not optional, and it is not something you bolt on after the fact.\nWhat this means in practice:\nCentralized logging: All application, infrastructure, and security tool logs in one place. If you have to SSH into a box to read logs, you are already behind. Audit trails: Who did what, when, and from where. Every deployment, every config change, every access request. Alerting on anomalies: Not just \u0026ldquo;is the service up?\u0026rdquo; but \u0026ldquo;is this access pattern normal?\u0026rdquo; Unusual API call volumes, access from new locations, privilege escalations. Runtime security: Tools like Falco for container runtime monitoring. Know when something unexpected happens in production. Monitoring is also how you prove to auditors and customers that your security controls are working. \u0026ldquo;Trust us\u0026rdquo; is not a compliance strategy.\nOpen source is your ally Some of the best security tools available are open source. 
Trivy, Falco, OPA, Semgrep, gitleaks, cosign, KICS, Checkov. The ecosystem is rich and maturing fast.\nBenefits of open source security tooling:\nTransparency: You can read the rules and understand exactly what is being checked. Community: Thousands of contributors finding edge cases and adding detection rules. No vendor lock-in: You can switch tools without renegotiating a contract. Cost: Start for free, scale as needed. This does not mean commercial tools have no place. Some provide valuable aggregation, management, and support. But you can build a very solid security pipeline with open source tools alone, and we think every team should start there.\nContinuous learning is essential The threat landscape changes constantly. The tools change. The best practices evolve. What was considered secure two years ago might have a CVE today.\nWhat we do to stay current:\nDedicate time for learning: At least a few hours per sprint for the team to read about new vulnerabilities, tools, and techniques. This is not a nice-to-have. It is a professional requirement. Run internal CTFs and tabletop exercises: Nothing teaches security like trying to break things. Regular exercises keep skills sharp and reveal gaps in your defenses. Participate in the community: Attend meetups, contribute to open source, read advisories. The security community is generous with knowledge. Take advantage of it. Review and update: Quarterly reviews of your security tooling, policies, and incident response procedures. What worked last quarter may not work next quarter. Final Thoughts DevSecOps isn\u0026rsquo;t a destination. There\u0026rsquo;s no point where you say \u0026ldquo;we\u0026rsquo;re done, we\u0026rsquo;re secure.\u0026rdquo; It\u0026rsquo;s a continuous practice of reducing risk, improving visibility, building a culture where security is as natural as writing tests.\nThe most important lesson: perfect is the enemy of good. 
A basic security pipeline that developers actually use beats a comprehensive one they bypass. Start where you are, improve iteratively, never stop.\n","date":"2025-10-20T00:00:00Z","permalink":"/en/p/lessons-learned-in-devsecops/","title":"Lessons learned in DevSecOps"},{"content":"LLMs have moved beyond chatbots. They\u0026rsquo;re now embedded in engineering workflows where they automate tedious tasks, speed incident response, and boost developer productivity. But deploying an LLM into a production DevOps pipeline is fundamentally different from using ChatGPT in a browser.\nThis guide covers what LLMOps means in practice, where LLMs fit into DevOps, architecture patterns that work, and pitfalls to avoid.\nWhat is LLMOps? LLMOps is the practices, tools, and infrastructure needed to operationalize LLMs. It extends MLOps but addresses challenges unique to language models:\nModel selection vs. model training: Most teams consume pre-trained models (via APIs or self-hosted inference) rather than training from scratch. The operational focus shifts to prompt engineering, fine-tuning, and retrieval-augmented generation (RAG). Cost management: LLM inference is expensive. Token-based pricing means costs scale with usage in ways that are harder to predict than traditional compute. Non-determinism: LLMs produce variable outputs for the same input, which complicates testing, validation, and reproducibility. Latency: Response times of seconds (not milliseconds) require different architectural patterns than traditional microservices. LLMOps is not a separate discipline. 
It is an extension of your existing DevOps and MLOps practices, adapted for the specific operational characteristics of language models.\nPractical use cases in DevOps Here is where LLMs are delivering real value in DevOps workflows today:\nAutomated code review LLMs can provide a first-pass review of pull requests, catching common issues like missing error handling, security anti-patterns, inconsistent naming, or missing tests. They do not replace human reviewers but reduce the burden of repetitive feedback.\nIncident summarization When an incident fires at 3 AM, the on-call engineer needs context fast. An LLM can ingest alert data, recent deployment logs, related runbooks, and previous incident reports to produce a concise summary of what is likely going wrong and what was done last time.\nLog analysis LLMs are surprisingly effective at pattern recognition in unstructured log data. Feed them a block of error logs and they can identify the root cause faster than manual grep sessions, especially for unfamiliar systems.\nDocumentation generation Generating draft documentation from code, API schemas, or Terraform modules. The output needs human review, but it eliminates the blank-page problem and keeps docs closer to current state.\nInfrastructure as Code generation Given a natural language description of desired infrastructure, LLMs can generate Terraform, Ansible, or Kubernetes manifests as a starting point. Useful for scaffolding, not for production-ready code without review.\nArchitecture patterns for LLM integration Pattern 1: API gateway to external LLM The simplest approach. Your application calls an external LLM API (OpenAI, Anthropic, etc.) 
through a centralized gateway that handles authentication, rate limiting, logging, and cost tracking.\n1 2 3 4 5 [CI/CD Pipeline] --\u0026gt; [API Gateway] --\u0026gt; [External LLM API] | [Logging \u0026amp; Metrics] | [Cost Tracking] Pros: No infrastructure to manage, access to the most capable models, fast to implement. Cons: Data leaves your network, vendor lock-in, variable latency, ongoing API costs.\nPattern 2: Self-hosted inference Run open-weight models (Llama, Mistral, etc.) on your own infrastructure using inference servers like vLLM or Ollama.\n1 2 3 [CI/CD Pipeline] --\u0026gt; [Load Balancer] --\u0026gt; [vLLM / Ollama Instance(s)] | [GPU Node Pool] Pros: Data stays internal, predictable costs at scale, no vendor dependency, full control over model versions. Cons: Requires GPU infrastructure, operational overhead, smaller models may be less capable.\nPattern 3: RAG-enhanced pipeline Combine an LLM with a retrieval system that provides relevant context from your own knowledge base (runbooks, documentation, past incidents). This dramatically improves response quality for domain-specific tasks.\n1 2 3 4 [Query] --\u0026gt; [Embedding Model] --\u0026gt; [Vector DB Search] --\u0026gt; [Context + Query] --\u0026gt; [LLM] --\u0026gt; [Response] | [Your Knowledge Base] (runbooks, docs, etc.) This pattern is particularly powerful for incident response and documentation tasks where the LLM needs your organization\u0026rsquo;s specific context.\nKey considerations Cost LLM API costs can be surprising. A code review pipeline that processes 50 PRs per day with large diffs can easily run hundreds of dollars per month. Strategies to control costs:\nSet token limits per request Cache common queries and responses Use smaller models for simpler tasks (triage with a small model, escalate to a larger one) Monitor token usage per pipeline and set alerts Latency LLM responses take seconds, not milliseconds. 
Design your integrations as asynchronous processes:\nPost code review comments after the fact, do not block the PR Process incident data in the background, push results to a Slack channel Use streaming responses where possible to improve perceived performance Hallucinations LLMs will confidently generate plausible-sounding but incorrect information. This is a critical concern for DevOps tasks where bad advice can cause outages.\nMitigations:\nAlways present LLM output as suggestions, never as authoritative actions Require human approval before any LLM-generated change is applied Use RAG to ground responses in verified documentation Implement output validation (e.g., lint generated IaC before presenting it) Security Data exposure: Anything you send to an external LLM API may be used for training or stored. Never send secrets, credentials, or sensitive customer data. Prompt injection: Malicious content in code, logs, or user input can manipulate LLM behavior. Sanitize inputs and validate outputs. Supply chain: LLM-generated code may introduce vulnerabilities. Run all generated code through your existing security scanning pipeline. Tools and platforms LangChain A framework for building LLM-powered applications. Useful for orchestrating multi-step chains (e.g., retrieve context, format prompt, call LLM, parse output). Supports many LLM providers and has good tooling for RAG pipelines.\n# Requires the langchain-openai and langchain-core packages from langchain_openai import ChatOpenAI from langchain_core.prompts import ChatPromptTemplate prompt = ChatPromptTemplate.from_template( \u0026#34;Review this code diff for security issues and suggest fixes:\\n\\n{diff}\u0026#34; ) chain = prompt | ChatOpenAI(model=\u0026#34;gpt-4o\u0026#34;, temperature=0) result = chain.invoke({\u0026#34;diff\u0026#34;: code_diff}) vLLM A high-throughput inference engine for self-hosted models. 
Supports PagedAttention for efficient memory management and continuous batching for high throughput.\n1 2 3 4 # Start a vLLM server python -m vllm.entrypoints.openai.api_server \\ --model mistralai/Mistral-7B-Instruct-v0.2 \\ --port 8000 Exposes an OpenAI-compatible API, so you can swap between self-hosted and external APIs with minimal code changes.\nOllama The easiest way to run LLMs locally for development and testing. Great for prototyping pipelines before committing to infrastructure.\n1 2 3 4 5 6 7 # Pull and run a model ollama pull llama3 ollama run llama3 \u0026#34;Summarize this error log: [paste log]\u0026#34; # Serve as an API ollama serve # Then call http://localhost:11434/api/generate Example: Automated PR review pipeline Here is a conceptual pipeline for automated PR review using an LLM:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 # .github/workflows/llm-review.yml name: LLM Code Review on: pull_request: types: [opened, synchronize] jobs: llm-review: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v4 with: fetch-depth: 0 - name: Get diff id: diff run: | git diff origin/${{ github.base_ref }}...HEAD \u0026gt; diff.txt - name: Run LLM review env: LLM_API_KEY: ${{ secrets.LLM_API_KEY }} run: | python scripts/llm_review.py \\ --diff diff.txt \\ --model gpt-4o \\ --max-tokens 2000 \\ --output review.json - name: Post review comments uses: actions/github-script@v7 with: script: | const review = require(\u0026#39;./review.json\u0026#39;); await github.rest.pulls.createReview({ owner: context.repo.owner, repo: context.repo.repo, pull_number: context.issue.number, body: review.summary, event: \u0026#39;COMMENT\u0026#39;, comments: review.line_comments }); The review script would:\nRead the diff Split large diffs into chunks that fit within the model\u0026rsquo;s context window For each chunk, construct a prompt asking for security issues, bugs, and style 
problems Aggregate results and format as GitHub review comments Include confidence scores and always mark output as AI-generated Guardrails and responsible use Label all LLM output clearly as AI-generated. Engineers should know when they are reading machine output. Never auto-merge or auto-apply LLM suggestions. Keep a human in the loop for all changes. Log all prompts and responses for debugging and audit purposes. Set spending limits and alerts on LLM API usage. Review prompt templates regularly to ensure they do not leak sensitive information. Test for bias and errors with representative samples before deploying to production workflows. Getting started recommendations Pick one use case - Don\u0026rsquo;t try to LLM-enable everything at once. Start low-risk: documentation drafts, commit message suggestions. Start with an external API - Don\u0026rsquo;t invest in GPU infrastructure until you\u0026rsquo;ve validated the use case. Use OpenAI or Anthropic to prototype. Measure everything - Track cost per invocation, latency, user satisfaction, error rates from day one. Build an evaluation framework - Create a test suite of known-good inputs and expected outputs. Run it against every prompt change or model update. Plan your data strategy - Decide early what data you\u0026rsquo;ll and won\u0026rsquo;t send to external APIs. Document clearly. Iterate on prompts - Prompt engineering is iterative. Version control prompts, treat as code. LLMs are a powerful tool for DevOps automation, but they\u0026rsquo;re exactly that: a tool. They work best when thoughtfully integrated into existing workflows, with clear boundaries on what they can and cannot do autonomously.\n","date":"2025-06-15T00:00:00Z","permalink":"/en/p/llmops-integrating-llms-into-devops-workflows/","title":"LLMOps: integrating LLMs into DevOps workflows"},{"content":"Supply chain attacks are no longer theoretical. 
The 2020 SolarWinds breach showed how one compromised build pipeline could hit thousands of organizations, including government agencies. Then Log4Shell in 2021 proved that a vulnerability deep in a transitive dependency could put a huge share of Java applications at risk overnight. The message was clear: we need visibility into what\u0026rsquo;s in our software and stronger integrity guarantees.\nThis guide covers the practical tools: SBOMs, Sigstore, and the SLSA framework.\nWhy Supply Chain Security Matters Traditional security focuses on your own code: static analysis, dependency scanning, penetration testing. Supply chain security extends that perimeter to everything your software depends on and every step in the process of building and delivering it.\nThe attack surface includes:\nSource code repositories: compromised developer accounts, malicious commits Dependencies: typosquatting, dependency confusion, compromised upstream packages Build systems: tampered CI/CD pipelines, injected build steps Artifact registries: replaced binaries, unsigned packages Deployment pipelines: modified manifests, man-in-the-middle attacks A single weak link in any of these stages can compromise the entire chain.\nSoftware Bill of Materials (SBOM) An SBOM is a formal, machine-readable inventory of all components in a piece of software. Think of it as an ingredients list for your application. It includes direct dependencies, transitive dependencies, their versions, licenses, and relationships.\nWhy You Need One Vulnerability response: When a new CVE drops (like Log4Shell), you can instantly check if any of your applications are affected. License compliance: Know exactly what licenses you are shipping. Regulatory requirements: US Executive Order 14028 and the EU Cyber Resilience Act both push toward mandatory SBOMs. Transparency: Customers and partners can verify what they are running. 
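To make the vulnerability-response case concrete, here is a minimal sketch of looking up a component in a CycloneDX JSON SBOM. It assumes only that components sit under the top-level 'components' key, as CycloneDX JSON does; the function names and the sample data are hypothetical and written for illustration, and nested components are deliberately ignored.

```python
import json


def load_sbom(path):
    # Read a CycloneDX JSON document from disk.
    with open(path, encoding='utf-8') as handle:
        return json.load(handle)


def find_component(sbom, name):
    # CycloneDX JSON lists packages under the top-level 'components' key.
    # Returns (name, version) pairs for every exact name match.
    # Assumption: nested components are not searched in this sketch.
    return [
        (c['name'], c.get('version', 'unknown'))
        for c in sbom.get('components', [])
        if c.get('name') == name
    ]


# Hypothetical, hand-written fragment used only for illustration.
sample_sbom = {
    'bomFormat': 'CycloneDX',
    'specVersion': '1.5',
    'components': [
        {'type': 'library', 'name': 'log4j-core', 'version': '2.14.1'},
        {'type': 'library', 'name': 'jackson-databind', 'version': '2.15.2'},
    ],
}

print(find_component(sample_sbom, 'log4j-core'))
```

In practice you would call load_sbom on each stored SBOM file and run the same lookup across all of them. A dedicated scanner remains the better choice for real CVE matching, since it also compares version ranges.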
SBOM Formats Two main formats dominate:\nSPDX (Software Package Data Exchange): An ISO standard (ISO/IEC 5962:2021), originally focused on license compliance, now comprehensive. Supports JSON, RDF, YAML, and tag-value formats. CycloneDX: An OWASP project, designed from the ground up for security use cases. Supports JSON and XML. Lighter weight and more opinionated. Both are solid choices. CycloneDX tends to be easier to work with for security-focused workflows. SPDX has broader adoption in compliance-heavy industries.\nGenerating SBOMs with Syft Syft from Anchore is one of the best tools for generating SBOMs. It supports container images, filesystems, and archives.\nInstall syft:\n1 curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin Generate an SBOM from a container image:\n1 2 3 4 5 # CycloneDX format (JSON) syft packages registry.example.com/myapp:v1.2.3 -o cyclonedx-json \u0026gt; sbom.cdx.json # SPDX format (JSON) syft packages registry.example.com/myapp:v1.2.3 -o spdx-json \u0026gt; sbom.spdx.json Generate an SBOM from a local directory:\n1 syft packages dir:/path/to/project -o cyclonedx-json \u0026gt; sbom.cdx.json You can then scan the SBOM for vulnerabilities using Grype:\n1 grype sbom:sbom.cdx.json The Sigstore Ecosystem Sigstore is an open-source project that makes cryptographic signing and verification accessible. It eliminates the need to manage long-lived signing keys, which has historically been the main barrier to adoption of artifact signing.\nThe ecosystem has three core components:\ncosign: Signs and verifies container images and other OCI artifacts. Fulcio: A certificate authority that issues short-lived certificates based on OIDC identity (your existing identity provider). Rekor: A transparency log that creates an immutable, tamper-resistant record of signing events. How It Works You authenticate with an OIDC provider (GitHub, Google, Microsoft, etc.). 
Fulcio issues a short-lived certificate tied to your identity. cosign uses that certificate to sign your artifact. The signing event is recorded in Rekor\u0026rsquo;s transparency log. Anyone can verify the signature using the transparency log, without needing your public key. This is called \u0026ldquo;keyless\u0026rdquo; signing. No keys to rotate, no secrets to manage, no PKI infrastructure to maintain.\nSigning Container Images with Cosign Install cosign:\n# Using Go go install github.com/sigstore/cosign/v2/cmd/cosign@latest # Or download a release curl -sSfL https://github.com/sigstore/cosign/releases/latest/download/cosign-linux-amd64 -o /usr/local/bin/cosign chmod +x /usr/local/bin/cosign Sign an image (keyless mode):\ncosign sign registry.example.com/myapp:v1.2.3 This will open a browser for OIDC authentication. In CI, you can use workload identity (e.g., GitHub Actions OIDC token) for non-interactive signing.\nVerify an image:\ncosign verify registry.example.com/myapp:v1.2.3 \\ --certificate-identity=user@example.com \\ --certificate-oidc-issuer=https://accounts.google.com Attach an SBOM to an image and sign it:\n# Attach the SBOM cosign attach sbom --sbom sbom.cdx.json registry.example.com/myapp:v1.2.3 # Sign the SBOM attachment cosign sign --attachment sbom registry.example.com/myapp:v1.2.3 The SLSA Framework SLSA (Supply-chain Levels for Software Artifacts, pronounced \u0026ldquo;salsa\u0026rdquo;) is a framework that defines increasing levels of supply chain integrity guarantees.\nSLSA Levels Level 0: No guarantees. This is where most projects start. Level 1: The build process is documented and produces provenance (metadata about how an artifact was built). Level 2: The build runs on a hosted build service that generates authenticated provenance. Level 3: The build platform provides hardened builds with tamper-resistant provenance. The build environment is isolated and ephemeral. Each level builds on the previous one. 
The goal is not to jump to Level 3 immediately but to incrementally improve your posture.\nSLSA Provenance Provenance answers the critical questions: Who built this? What source was used? What build process was followed? Was the build environment tamper-proof?\nSLSA provenance is a signed attestation in the in-toto format that captures this information.\nCI/CD Integration: GitHub Actions Example Here is a practical GitHub Actions workflow that builds an image, signs it, generates an SBOM, attaches and signs that SBOM, and verifies the image signature. It does not generate SLSA provenance; that is a later step in the adoption roadmap:\nname: Build, Sign, and Attest on: push: tags: - \u0026#39;v*\u0026#39; permissions: contents: read packages: write id-token: write # Required for keyless signing env: REGISTRY: ghcr.io IMAGE_NAME: ${{ github.repository }} jobs: build-sign-attest: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v4 - name: Log in to GHCR uses: docker/login-action@v3 with: registry: ${{ env.REGISTRY }} username: ${{ github.actor }} password: ${{ secrets.GITHUB_TOKEN }} - name: Build and push image id: build uses: docker/build-push-action@v5 with: push: true tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }} - name: Install cosign uses: sigstore/cosign-installer@v3 - name: Install syft uses: anchore/sbom-action/download-syft@v0 - name: Sign the image run: | cosign sign --yes \\ ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}@${{ steps.build.outputs.digest }} - name: Generate SBOM run: | syft packages \\ ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}@${{ steps.build.outputs.digest }} \\ -o cyclonedx-json \u0026gt; sbom.cdx.json - name: Attach and sign SBOM run: | cosign attach sbom --sbom sbom.cdx.json \\ ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}@${{ steps.build.outputs.digest 
}} cosign sign --yes --attachment sbom \\ ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}@${{ steps.build.outputs.digest }} - name: Verify signature run: | cosign verify \\ --certificate-identity-regexp=\u0026#34;https://github.com/${{ github.repository }}/*\u0026#34; \\ --certificate-oidc-issuer=https://token.actions.githubusercontent.com \\ ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}@${{ steps.build.outputs.digest }} The id-token: write permission is what enables keyless signing in GitHub Actions. The GitHub OIDC token is automatically used by cosign without any manual key management.\nPractical Adoption Roadmap You do not need to do everything at once. Here is a sensible progression:\nWeek 1-2: Visibility\nStart generating SBOMs for your most critical applications using syft. Integrate Grype into your CI pipeline for vulnerability scanning against the SBOM. Week 3-4: Signing\nSet up cosign keyless signing in your CI/CD pipelines. Sign your container images on every build. Month 2: Verification\nEnforce signature verification in your deployment pipelines (e.g., Kubernetes admission controllers like Kyverno or Sigstore policy-controller). Attach SBOMs to images and sign them. Month 3+: SLSA Provenance\nAdd SLSA provenance generation using slsa-github-generator. Work toward SLSA Level 2, then Level 3. Automate provenance verification in your deployment tooling. Key Takeaways SBOMs give visibility - You can\u0026rsquo;t secure what you can\u0026rsquo;t see. Generate SBOMs for every artifact. Sigstore removes the excuse - Keyless signing kills key management overhead. No good reason not to sign. SLSA is a maturity model - Use it to improve supply chain integrity incrementally, not as all-or-nothing. Automate everything - These tools are built for CI/CD integration. Manual doesn\u0026rsquo;t scale. The supply chain security ecosystem is maturing fast. 
Tools are production-ready, standards are solidifying, and regulatory pressure keeps rising. The best time to start was yesterday. The second-best is now.\n","date":"2025-02-10T00:00:00Z","permalink":"/en/p/supply-chain-security-sbom-and-sigstore/","title":"Supply chain security: SBOM and Sigstore"},{"content":"Kubernetes Attack Surface Out-of-the-box Kubernetes isn\u0026rsquo;t secure. The cluster exposes multiple attack vectors that need systematic attention:\nAPI Server - Central control point; unauthorized access = full cluster compromise etcd - Stores all cluster state including secrets in base64 (not encrypted by default) Kubelet - Node agent; exposed kubelet API leaks pod info and allows command execution Container runtime - Breakout vulnerabilities give host access Network - Pods can communicate freely with any other pod by default Supply chain - Malicious or vulnerable images introduce backdoors This guide covers the key hardening measures organized by attack vector.\nAPI Server Hardening The API server is the most critical component. Secure it first:\nDisable anonymous authentication: set --anonymous-auth=false Enable audit logging: track who did what (covered in detail below) Restrict access: use firewall rules or security groups to limit who can reach the API server Enable admission controllers: PodSecurity, NodeRestriction, ResourceQuota, and LimitRanger should all be active Use OIDC for authentication: integrate with your identity provider instead of relying on client certificates 1 2 # Check current API server flags (on kubeadm clusters) kubectl -n kube-system get pod kube-apiserver-\u0026lt;node\u0026gt; -o jsonpath=\u0026#39;{.spec.containers[0].command}\u0026#39; | tr \u0026#39;,\u0026#39; \u0026#39;\\n\u0026#39; RBAC Best Practices RBAC is your primary authorization mechanism. 
Enforce least privilege rigorously.\nKey Rules Never use cluster-admin for workloads \u0026ndash; it grants unlimited access Use Roles (namespaced) over ClusterRoles whenever possible Avoid wildcards in resource or verb specifications Bind to ServiceAccounts, not users for automated workloads Review bindings regularly \u0026ndash; permissions accumulate over time Example: Restricted Deployment Role 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: namespace: production name: deployment-manager rules: # Can manage deployments - apiGroups: [\u0026#34;apps\u0026#34;] resources: [\u0026#34;deployments\u0026#34;] verbs: [\u0026#34;get\u0026#34;, \u0026#34;list\u0026#34;, \u0026#34;watch\u0026#34;, \u0026#34;create\u0026#34;, \u0026#34;update\u0026#34;, \u0026#34;patch\u0026#34;] # Can view pods and logs - apiGroups: [\u0026#34;\u0026#34;] resources: [\u0026#34;pods\u0026#34;, \u0026#34;pods/log\u0026#34;] verbs: [\u0026#34;get\u0026#34;, \u0026#34;list\u0026#34;, \u0026#34;watch\u0026#34;] # Can view services - apiGroups: [\u0026#34;\u0026#34;] resources: [\u0026#34;services\u0026#34;] verbs: [\u0026#34;get\u0026#34;, \u0026#34;list\u0026#34;, \u0026#34;watch\u0026#34;] # Explicitly NO access to secrets, configmaps/write, or exec --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: namespace: production name: deployment-manager-binding subjects: - kind: ServiceAccount name: ci-deployer namespace: production roleRef: kind: Role name: deployment-manager apiGroup: rbac.authorization.k8s.io Audit existing RBAC with:\n1 2 3 4 5 6 # List all cluster-admin bindings kubectl get clusterrolebindings -o json | \\ jq \u0026#39;.items[] | select(.roleRef.name == \u0026#34;cluster-admin\u0026#34;) | {name: .metadata.name, subjects: .subjects}\u0026#39; # Check what a specific service account can do kubectl auth can-i --list 
--as=system:serviceaccount:production:ci-deployer -n production Pod Security Standards Kubernetes Pod Security Standards (PSS) define three security restriction levels. Since Kubernetes 1.25, Pod Security Admission (PSA) has been the built-in enforcement mechanism, replacing PodSecurityPolicy, which was removed in that release.\nThe Three Levels Privileged: No restrictions. Use case: system-level workloads (CNI, storage drivers). Baseline: Blocks known privilege escalations. Use case: general-purpose workloads. Restricted: Maximum hardening. Use case: sensitive workloads, multi-tenant clusters. Enforcing Pod Security Standards Apply standards at the namespace level using labels:\napiVersion: v1 kind: Namespace metadata: name: production labels: # Enforce restricted: reject pods that violate pod-security.kubernetes.io/enforce: restricted pod-security.kubernetes.io/enforce-version: latest # Warn on restricted violations (shows a warning but still admits the pod) pod-security.kubernetes.io/warn: restricted pod-security.kubernetes.io/warn-version: latest # Audit: log violations pod-security.kubernetes.io/audit: restricted pod-security.kubernetes.io/audit-version: latest Compliant Pod Example A pod that passes the restricted level:\napiVersion: v1 kind: Pod metadata: name: secure-app namespace: production spec: securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault containers: - name: app image: myregistry.com/app:v1.2.3@sha256:abc123... 
securityContext: allowPrivilegeEscalation: false readOnlyRootFilesystem: true runAsUser: 1000 runAsGroup: 1000 capabilities: drop: - ALL resources: limits: memory: \u0026#34;256Mi\u0026#34; cpu: \u0026#34;500m\u0026#34; requests: memory: \u0026#34;128Mi\u0026#34; cpu: \u0026#34;250m\u0026#34; Key security settings to always include:\nrunAsNonRoot: true \u0026ndash; never run containers as root readOnlyRootFilesystem: true \u0026ndash; prevent filesystem modifications allowPrivilegeEscalation: false \u0026ndash; block setuid/setgid capabilities.drop: [\u0026quot;ALL\u0026quot;] \u0026ndash; remove all Linux capabilities seccompProfile.type: RuntimeDefault \u0026ndash; apply default seccomp profile Always specify resource limits to prevent DoS NetworkPolicies By default, all pods can communicate with all other pods in a cluster. NetworkPolicies restrict this to only the traffic that is necessary.\nDefault Deny All Start every namespace with a default deny policy:\n1 2 3 4 5 6 7 8 9 10 apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: default-deny-all namespace: production spec: podSelector: {} policyTypes: - Ingress - Egress Allow Specific Traffic Then explicitly allow only what is needed:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-api-to-db namespace: production spec: podSelector: matchLabels: app: database policyTypes: - Ingress ingress: - from: - podSelector: matchLabels: app: api-server ports: - protocol: TCP port: 5432 --- apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: api-server-egress namespace: production spec: podSelector: matchLabels: app: api-server policyTypes: - Egress egress: # Allow DNS resolution - to: [] ports: - protocol: UDP port: 53 - protocol: TCP port: 53 # Allow connection to database - to: - podSelector: 
matchLabels: app: database ports: - protocol: TCP port: 5432 # Allow external HTTPS - to: - ipBlock: cidr: 0.0.0.0/0 except: - 10.0.0.0/8 - 172.16.0.0/12 - 192.168.0.0/16 ports: - protocol: TCP port: 443 Note: NetworkPolicies require a CNI plugin that supports them (Calico, Cilium, Weave Net). The default kubenet does not enforce NetworkPolicies.\nSecrets Management Kubernetes Secrets are base64-encoded, not encrypted. Anyone with read access to Secrets in a namespace can decode them trivially. Proper secrets management requires additional tooling.\nOption 1: Sealed Secrets Sealed Secrets (by Bitnami) encrypts secrets client-side so they can be safely stored in Git:\n1 2 3 4 5 6 7 8 9 10 # Install kubeseal CLI # Encrypt a secret kubectl create secret generic db-creds \\ --from-literal=password=supersecret \\ --dry-run=client -o yaml | \\ kubeseal --format yaml \u0026gt; sealed-db-creds.yaml # The sealed secret can be committed to Git # Only the cluster\u0026#39;s controller can decrypt it kubectl apply -f sealed-db-creds.yaml Option 2: External Secrets Operator External Secrets Operator syncs secrets from external providers (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager) into Kubernetes:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 apiVersion: external-secrets.io/v1beta1 kind: ExternalSecret metadata: name: db-credentials namespace: production spec: refreshInterval: 1h secretStoreRef: name: aws-secrets-manager kind: ClusterSecretStore target: name: db-credentials data: - secretKey: password remoteRef: key: production/database property: password Enable Encryption at Rest Ensure etcd encrypts secrets at rest:\n1 2 3 4 5 6 7 8 9 10 11 12 # /etc/kubernetes/encryption-config.yaml apiVersion: apiserver.config.k8s.io/v1 kind: EncryptionConfiguration resources: - resources: - secrets providers: - aescbc: keys: - name: key1 secret: \u0026lt;base64-encoded-32-byte-key\u0026gt; - identity: {} Audit Logging Kubernetes audit logs record every request to the API server, 
providing a detailed trail of who did what and when.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 # audit-policy.yaml apiVersion: audit.k8s.io/v1 kind: Policy rules: # Log all requests to secrets at the Metadata level - level: Metadata resources: - group: \u0026#34;\u0026#34; resources: [\u0026#34;secrets\u0026#34;] # Log pod exec/attach at RequestResponse level - level: RequestResponse resources: - group: \u0026#34;\u0026#34; resources: [\u0026#34;pods/exec\u0026#34;, \u0026#34;pods/attach\u0026#34;] # Log all write operations at Request level - level: Request verbs: [\u0026#34;create\u0026#34;, \u0026#34;update\u0026#34;, \u0026#34;patch\u0026#34;, \u0026#34;delete\u0026#34;] # Log everything else at Metadata level - level: Metadata omitStages: - RequestReceived Enable in the API server with:\n1 2 3 4 5 --audit-policy-file=/etc/kubernetes/audit-policy.yaml --audit-log-path=/var/log/kubernetes/audit.log --audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100 Ship audit logs to your SIEM or log aggregation system (Loki, Elasticsearch) for analysis and alerting.\nImage Policies with Kyverno Kyverno is a policy engine for Kubernetes that validates, mutates, and generates resources based on policies. It is simpler to adopt than OPA Gatekeeper because policies are written as Kubernetes resources rather than Rego.\nRequire Image Digest and Trusted Registry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 apiVersion: kyverno.io/v1 kind: ClusterPolicy metadata: name: require-image-digest spec: validationFailureAction: Enforce rules: - name: require-digest match: any: - resources: kinds: - Pod validate: message: \u0026#34;Images must use a digest (@sha256:...) 
instead of a tag\u0026#34; pattern: spec: containers: - image: \u0026#34;*@sha256:*\u0026#34; - name: require-trusted-registry match: any: - resources: kinds: - Pod validate: message: \u0026#34;Images must come from the trusted registry\u0026#34; pattern: spec: containers: - image: \u0026#34;myregistry.com/*\u0026#34; This policy ensures that only images from your trusted registry with pinned digests are deployed, preventing both supply chain attacks and tag mutability issues.\nCIS Kubernetes Benchmark The CIS Kubernetes Benchmark provides a comprehensive set of security recommendations. Run automated checks with kube-bench:\n1 2 3 4 5 # Run CIS benchmark checks kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml # View results kubectl logs job/kube-bench Address findings by priority: critical items first (API server auth, etcd encryption), then high (RBAC, network policies), then medium and low.\nHardening Checklist Use this checklist as a starting point for securing your cluster:\nAPI server: anonymous auth disabled, audit logging enabled etcd: encrypted at rest, access restricted to API server only RBAC: no unnecessary cluster-admin bindings, least privilege enforced Pod Security: restricted PSS enforced on production namespaces NetworkPolicies: default deny in all namespaces, explicit allow rules Secrets: external secrets manager or sealed secrets, encryption at rest Images: signed images, trusted registry enforcement, digest pinning Nodes: automatic security updates, CIS-hardened OS Audit: API server audit logs shipped to SIEM Monitoring: alerts on RBAC changes, privileged pod creation, exec into pods Supply chain: vulnerability scanning in CI/CD, admission-time image scanning Network: TLS between all components, service mesh for mTLS between pods Security hardening is not a one-time activity. 
Schedule quarterly reviews to reassess your posture, run CIS benchmarks, review RBAC bindings, and update policies as your cluster evolves.\n","date":"2024-11-03T00:00:00Z","permalink":"/en/p/kubernetes-security-hardening-guide/","title":"Kubernetes security hardening guide"},{"content":"The Three Pillars of Observability Observability is the ability to understand what is happening inside your systems by examining their external outputs. It rests on three pillars:\nMetrics are numeric measurements collected over time: CPU usage, request latency, error rates, queue depths. They\u0026rsquo;re cheap to store, fast to query, and perfect for dashboards and alerts. Prometheus is the dominant tool here.\nLogs are timestamped text records of discrete events: application errors, access logs, audit trails. They provide detailed context that metrics can\u0026rsquo;t. Loki, Elasticsearch, and Fluentd handle log aggregation.\nTraces follow a single request as it moves through multiple services, showing latency at each hop. Jaeger and Tempo are the main open-source options. Traces are essential for debugging distributed systems, but they\u0026rsquo;re the most complex to instrument.\nThis guide focuses on metrics and logs using the Prometheus + Grafana + Loki stack, which covers the majority of observability needs for most teams.\nPrometheus Architecture Prometheus uses a pull-based model: instead of applications pushing metrics to a central collector, Prometheus scrapes HTTP endpoints at regular intervals. 
This design has some nice advantages:\nServices don\u0026rsquo;t need to know about the monitoring system Prometheus controls the scrape rate and detects when targets go down You can easily run it locally against any service that exposes a /metrics endpoint Core Components Prometheus Server: scrapes targets, stores time-series data, evaluates alert rules Exporters: translate metrics from third-party systems (node_exporter for Linux, mysqld_exporter for MySQL) Pushgateway: accepts metrics pushed by short-lived batch jobs Alertmanager: receives alerts from Prometheus and routes them to your notification channels Scrape Configuration A basic prometheus.yml configuration:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 global: scrape_interval: 15s evaluation_interval: 15s rule_files: - \u0026#34;alert_rules.yml\u0026#34; alerting: alertmanagers: - static_configs: - targets: - \u0026#34;alertmanager:9093\u0026#34; scrape_configs: - job_name: \u0026#34;prometheus\u0026#34; static_configs: - targets: [\u0026#34;localhost:9090\u0026#34;] - job_name: \u0026#34;node-exporter\u0026#34; static_configs: - targets: [\u0026#34;node-exporter:9100\u0026#34;] - job_name: \u0026#34;application\u0026#34; metrics_path: \u0026#34;/metrics\u0026#34; static_configs: - targets: [\u0026#34;app:8080\u0026#34;] # Kubernetes service discovery - job_name: \u0026#34;kubernetes-pods\u0026#34; kubernetes_sd_configs: - role: pod relabel_configs: - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] action: keep regex: true - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] action: replace target_label: __metrics_path__ regex: (.+) In Kubernetes, service discovery automatically finds pods annotated with prometheus.io/scrape: \u0026quot;true\u0026quot;. You don\u0026rsquo;t have to manually list every target anymore.\nPromQL Basics PromQL is Prometheus\u0026rsquo;s query language. 
Here are the most useful patterns:\nInstant Vectors and Rate # Cumulative idle CPU seconds per core (a raw counter; wrap it in rate() to derive usage) node_cpu_seconds_total{mode=\u0026#34;idle\u0026#34;} # Per-second rate of HTTP requests over the last 5 minutes rate(http_requests_total[5m]) # Request rate by status code sum(rate(http_requests_total[5m])) by (status_code) Latency Percentiles with Histograms # 95th percentile request latency histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) # 99th percentile by endpoint histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, handler)) Error Rates # Error rate as percentage sum(rate(http_requests_total{status_code=~\u0026#34;5..\u0026#34;}[5m])) / sum(rate(http_requests_total[5m])) * 100 # Availability (1 minus the error ratio) 1 - ( sum(rate(http_requests_total{status_code=~\u0026#34;5..\u0026#34;}[5m])) / sum(rate(http_requests_total[5m])) ) Resource Utilization # Memory usage percentage (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 # Disk usage percentage (1 - node_filesystem_avail_bytes{mountpoint=\u0026#34;/\u0026#34;} / node_filesystem_size_bytes{mountpoint=\u0026#34;/\u0026#34;}) * 100 Grafana Dashboards Grafana connects to Prometheus as a data source and lets you build dashboards with panels for graphs, tables, gauges, and heatmaps.\nSetup Add Prometheus as a data source in Grafana either through the UI or via provisioning:\n# grafana/provisioning/datasources/prometheus.yml apiVersion: 1 datasources: - name: Prometheus type: prometheus access: proxy url: http://prometheus:9090 isDefault: true - name: Loki type: loki access: proxy url: http://loki:3100 Dashboard Provisioning Instead of creating dashboards manually in the UI, store them as JSON files and provision them automatically:\n# grafana/provisioning/dashboards/default.yml apiVersion: 1 providers: - name: 
\u0026#34;Default\u0026#34; orgId: 1 folder: \u0026#34;\u0026#34; type: file disableDeletion: false updateIntervalSeconds: 30 options: path: /var/lib/grafana/dashboards foldersFromFilesStructure: true Place your dashboard JSON files in the mounted directory. Export existing dashboards from the Grafana UI using the share/export feature and commit them to Git. This gives you version-controlled, reproducible dashboards.\nPro tip: when exporting dashboards for provisioning, replace hardcoded datasource UIDs with the variable ${DS_PROMETHEUS} so they work across environments.\nLoki for Log Aggregation Loki is Grafana\u0026rsquo;s log aggregation system. It\u0026rsquo;s designed to be cost-effective by indexing only metadata (labels) rather than full log content. It pairs naturally with Grafana, letting you correlate logs and metrics in the same dashboard.\nArchitecture Loki uses the same label-based approach as Prometheus. Logs are tagged with labels like {job=\u0026quot;myapp\u0026quot;, namespace=\u0026quot;production\u0026quot;} and queried using LogQL:\n1 2 3 4 5 6 7 8 # All error logs from the payment service {job=\u0026#34;payment-service\u0026#34;} |= \u0026#34;error\u0026#34; # JSON-structured logs, filter by level and extract fields {job=\u0026#34;api\u0026#34;} | json | level=\u0026#34;error\u0026#34; | line_format \u0026#34;{{.msg}}\u0026#34; # Count errors per service over time sum(count_over_time({job=~\u0026#34;.+\u0026#34;} |= \u0026#34;error\u0026#34; [5m])) by (job) Log Collection with Promtail Promtail is the agent that ships logs to Loki. 
A basic configuration:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 server: http_listen_port: 9080 positions: filename: /tmp/positions.yaml clients: - url: http://loki:3100/loki/api/v1/push scrape_configs: - job_name: containers static_configs: - targets: - localhost labels: job: containers __path__: /var/log/containers/*.log In Kubernetes, deploy Promtail as a DaemonSet to collect logs from all nodes automatically.\nAlerting with Alertmanager Alertmanager handles alert routing, grouping, deduplication, and silencing. Prometheus evaluates alert rules and fires alerts to Alertmanager, which then delivers notifications.\nAlert Rules Define alert rules in a file referenced by prometheus.yml:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 # alert_rules.yml groups: - name: application rules: - alert: HighErrorRate expr: | sum(rate(http_requests_total{status_code=~\u0026#34;5..\u0026#34;}[5m])) / sum(rate(http_requests_total[5m])) \u0026gt; 0.05 for: 5m labels: severity: critical annotations: summary: \u0026#34;High error rate detected\u0026#34; description: \u0026#34;Error rate is {{ $value | humanizePercentage }} over the last 5 minutes\u0026#34; - alert: HighLatency expr: | histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) \u0026gt; 1.0 for: 10m labels: severity: warning annotations: summary: \u0026#34;High p95 latency\u0026#34; description: \u0026#34;95th percentile latency is {{ $value }}s\u0026#34; - alert: DiskSpaceLow expr: | (1 - node_filesystem_avail_bytes{mountpoint=\u0026#34;/\u0026#34;} / node_filesystem_size_bytes{mountpoint=\u0026#34;/\u0026#34;}) * 100 \u0026gt; 85 for: 15m labels: severity: warning annotations: summary: \u0026#34;Disk space above 85%\u0026#34; description: \u0026#34;Disk usage on {{ $labels.instance }} is {{ $value }}%\u0026#34; Alertmanager Configuration 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 
35 # alertmanager.yml global: resolve_timeout: 5m route: group_by: [\u0026#34;alertname\u0026#34;, \u0026#34;severity\u0026#34;] group_wait: 30s group_interval: 5m repeat_interval: 4h receiver: \u0026#34;default\u0026#34; routes: - match: severity: critical receiver: \u0026#34;pagerduty-critical\u0026#34; repeat_interval: 1h - match: severity: warning receiver: \u0026#34;slack-warnings\u0026#34; receivers: - name: \u0026#34;default\u0026#34; slack_configs: - api_url: \u0026#34;https://hooks.slack.com/services/XXX/YYY/ZZZ\u0026#34; channel: \u0026#34;#alerts\u0026#34; title: \u0026#39;{{ .GroupLabels.alertname }}\u0026#39; text: \u0026#39;{{ range .Alerts }}{{ .Annotations.description }}{{ end }}\u0026#39; - name: \u0026#34;pagerduty-critical\u0026#34; pagerduty_configs: - service_key: \u0026#34;\u0026lt;pagerduty-service-key\u0026gt;\u0026#34; - name: \u0026#34;slack-warnings\u0026#34; slack_configs: - api_url: \u0026#34;https://hooks.slack.com/services/XXX/YYY/ZZZ\u0026#34; channel: \u0026#34;#warnings\u0026#34; Docker Compose Deployment Here\u0026rsquo;s a complete docker-compose.yml to spin up the full observability stack locally:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 version: \u0026#34;3.8\u0026#34; services: prometheus: image: prom/prometheus:latest volumes: - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml - ./prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml - prometheus_data:/prometheus ports: - \u0026#34;9090:9090\u0026#34; command: - \u0026#34;--config.file=/etc/prometheus/prometheus.yml\u0026#34; - \u0026#34;--storage.tsdb.retention.time=30d\u0026#34; alertmanager: image: prom/alertmanager:latest volumes: - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml ports: - \u0026#34;9093:9093\u0026#34; grafana: image: grafana/grafana:latest volumes: - 
./grafana/provisioning:/etc/grafana/provisioning - ./grafana/dashboards:/var/lib/grafana/dashboards - grafana_data:/var/lib/grafana ports: - \u0026#34;3000:3000\u0026#34; environment: - GF_SECURITY_ADMIN_PASSWORD=changeme loki: image: grafana/loki:latest volumes: - loki_data:/loki ports: - \u0026#34;3100:3100\u0026#34; promtail: image: grafana/promtail:latest volumes: - ./promtail/config.yml:/etc/promtail/config.yml - /var/log:/var/log:ro command: -config.file=/etc/promtail/config.yml node-exporter: image: prom/node-exporter:latest ports: - \u0026#34;9100:9100\u0026#34; volumes: - /proc:/host/proc:ro - /sys:/host/sys:ro - /:/rootfs:ro command: - \u0026#34;--path.procfs=/host/proc\u0026#34; - \u0026#34;--path.sysfs=/host/sys\u0026#34; - \u0026#34;--path.rootfs=/rootfs\u0026#34; volumes: prometheus_data: grafana_data: loki_data: Start the stack with:\n1 docker-compose up -d Access Grafana at http://localhost:3000, Prometheus at http://localhost:9090, and Alertmanager at http://localhost:9093.\nProduction Tips Use recording rules for expensive PromQL queries that dashboards run frequently. Pre-compute and store the result as a new metric to reduce query load.\nSet retention based on resolution: keep high-resolution data (15s intervals) for 15 days, then downsample to 5m for 90 days using Thanos or Cortex for long-term storage.\nLabel cardinality matters: avoid labels with unbounded values (user IDs, request IDs). High cardinality labels will blow up Prometheus memory usage.\nUse Grafana folders and teams to organize dashboards by service or team. Skip the mega-dashboard that tries to show everything.\nAlert on symptoms, not causes: alert on \u0026ldquo;error rate is high\u0026rdquo; rather than \u0026ldquo;Pod restarted.\u0026rdquo; Users care about the impact, not the internal mechanism.\nImplement alert runbooks: every alert should link to a runbook describing what to check and how to mitigate. 
Add the link in the alert annotation.\nTest your alerts: use promtool check rules alert_rules.yml to validate rule syntax, and promtool test rules to unit-test complex PromQL expressions.\nSecure your stack: put Grafana behind SSO/OAuth, restrict Prometheus access to internal networks, and enable TLS between components in production.\nThe Prometheus + Grafana + Loki stack provides a solid observability foundation that scales well for most organizations. Start with metrics and alerting, add log aggregation when you need to correlate events, and introduce tracing when debugging cross-service latency becomes something you do regularly.\n","date":"2024-08-25T00:00:00Z","permalink":"/en/p/observability-stack-prometheus-grafana-and-alerting/","title":"Observability stack: Prometheus, Grafana, and alerting"},{"content":"SAST vs DAST: Understanding the Difference Application security testing has two fundamental approaches, and knowing when to use each matters for your security posture.\nSAST (Static Application Security Testing) reads source code, bytecode, or binaries without running the app. It looks for patterns that match known vulnerability classes (SQL injection, XSS, hardcoded credentials, insecure deserialization). Think of it as a security linter that flags dangerous patterns.\nDAST (Dynamic Application Security Testing) tests a running application by sending crafted requests and analyzing responses. It simulates what an attacker would do: probe endpoints, test authentication, look for misconfigurations. 
DAST sees the app from outside, like a real attacker would.\nAspect SAST DAST When it runs Build time, on source code Runtime, against deployed app What it finds Code flaws, hardcoded secrets, insecure patterns Runtime vulnerabilities, misconfigurations, auth issues False positive rate Higher (no runtime context) Lower (tests real behavior) Language dependency Yes (needs language-specific rules) No (tests HTTP/API layer) Coverage All code paths (including dead code) Only reachable endpoints Speed Fast (seconds to minutes) Slower (minutes to hours) Best stage Every commit/PR Staging or pre-production Short answer: use both. SAST catches problems early and cheap. DAST catches problems that only show up at runtime. They complement each other.\nTool comparison SAST tools Tool Languages Strengths License Semgrep 30+ languages Fast, custom rules, great CI integration OSS + Commercial SonarQube 25+ languages Broad ecosystem, quality gates, dashboard Community + Commercial Bandit Python only Python-specific, lightweight, easy setup OSS (Apache 2.0) CodeQL 10+ languages Deep semantic analysis, GitHub native Free for OSS DAST tools Tool Type Strengths License OWASP ZAP Proxy-based scanner Full-featured, scriptable, community rules OSS (Apache 2.0) Burp Suite Proxy-based scanner Best-in-class manual + automated testing Commercial Nuclei Template-based scanner Fast, huge template library, CI-friendly OSS (MIT) For most teams, Semgrep + OWASP ZAP provides an excellent open-source foundation that covers both SAST and DAST without licensing costs.\nGitLab CI pipeline integration Here is a complete .gitlab-ci.yml example that integrates both SAST and DAST into a pipeline with proper gating:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 
101 102 stages: - build - test - sast - deploy-staging - dast - deploy-production variables: SEMGREP_RULES: \u0026#34;p/owasp-top-ten p/security-audit\u0026#34; ZAP_TARGET_URL: \u0026#34;https://staging.example.com\u0026#34; # --- SAST Stage --- semgrep-scan: stage: sast image: semgrep/semgrep:latest script: - semgrep ci --config \u0026#34;$SEMGREP_RULES\u0026#34; --json --output semgrep-results.json - | # Fail pipeline if high/critical findings exceed threshold HIGH_COUNT=$(cat semgrep-results.json | jq \u0026#39;[.results[] | select(.extra.severity == \u0026#34;ERROR\u0026#34;)] | length\u0026#39;) echo \u0026#34;High severity findings: $HIGH_COUNT\u0026#34; if [ \u0026#34;$HIGH_COUNT\u0026#34; -gt 0 ]; then echo \u0026#34;Pipeline blocked: $HIGH_COUNT high severity findings\u0026#34; exit 1 fi artifacts: paths: - semgrep-results.json when: always allow_failure: false bandit-scan: stage: sast image: python:3.11-slim script: - pip install bandit - bandit -r src/ -f json -o bandit-results.json --severity-level medium || true - | HIGH_COUNT=$(cat bandit-results.json | jq \u0026#39;.results | map(select(.issue_severity == \u0026#34;HIGH\u0026#34;)) | length\u0026#39;) echo \u0026#34;Bandit high severity findings: $HIGH_COUNT\u0026#34; if [ \u0026#34;$HIGH_COUNT\u0026#34; -gt 3 ]; then exit 1 fi artifacts: paths: - bandit-results.json when: always only: - merge_requests - main # --- Deploy Staging --- deploy-staging: stage: deploy-staging script: - echo \u0026#34;Deploying to staging...\u0026#34; - ./deploy.sh staging environment: name: staging url: https://staging.example.com # --- DAST Stage --- owasp-zap-scan: stage: dast image: ghcr.io/zaproxy/zaproxy:stable script: - mkdir -p /zap/wrk - | zap-full-scan.py \\ -t \u0026#34;$ZAP_TARGET_URL\u0026#34; \\ -r zap-report.html \\ -J zap-results.json \\ -l WARN \\ -d - | # Parse results and enforce policy HIGH_ALERTS=$(cat zap-results.json | jq \u0026#39;[.site[].alerts[] | select(.riskcode == \u0026#34;3\u0026#34;)] | 
length\u0026#39;) echo \u0026#34;High risk alerts: $HIGH_ALERTS\u0026#34; if [ \u0026#34;$HIGH_ALERTS\u0026#34; -gt 0 ]; then echo \u0026#34;Pipeline blocked: $HIGH_ALERTS high risk vulnerabilities found\u0026#34; exit 1 fi artifacts: paths: - zap-report.html - zap-results.json when: always dependencies: - deploy-staging # --- Deploy Production --- deploy-production: stage: deploy-production script: - echo \u0026#34;Deploying to production...\u0026#34; - ./deploy.sh production environment: name: production when: manual only: - main The key design decisions in this pipeline:\nSAST runs on every merge request, catching issues before code merges DAST runs against staging, testing the deployed application Thresholds are configurable: Semgrep blocks on any ERROR-severity finding, while Bandit tolerates up to three high-severity findings before failing the job Reports are always saved as artifacts (when: always), even when a scan job fails Production deploy is manual, gated behind both SAST and DAST stages Handling False Positives False positives are the number one reason teams abandon security scanning. 
Handle them systematically:\nFor Semgrep Create a .semgrepignore file or use inline annotations:\n# nosemgrep: python.lang.security.audit.hardcoded-password DEFAULT_TEST_PASSWORD = \u0026#34;test123\u0026#34; # Only used in test fixtures For persistent false positives in test code, scope the rule itself with paths.exclude so it never runs on those directories, for example in a semgrep-exclusions.yml (the message, languages, and severity values below are placeholders):\nrules: - id: ignore-test-passwords pattern: $X = \u0026#34;...\u0026#34; message: Possible hardcoded credential languages: [python] severity: WARNING paths: exclude: - tests/ - fixtures/ For OWASP ZAP Use a context file to exclude known false positives:\n\u0026lt;alertFilter\u0026gt; \u0026lt;ruleId\u0026gt;10038\u0026lt;/ruleId\u0026gt; \u0026lt;url\u0026gt;https://staging.example.com/api/health\u0026lt;/url\u0026gt; \u0026lt;urlIsRegex\u0026gt;false\u0026lt;/urlIsRegex\u0026gt; \u0026lt;enabled\u0026gt;true\u0026lt;/enabled\u0026gt; \u0026lt;newRisk\u0026gt;-1\u0026lt;/newRisk\u0026gt; \u0026lt;!-- -1 = False Positive --\u0026gt; \u0026lt;/alertFilter\u0026gt; The golden rule: every suppression must include a justification comment explaining why it is safe to ignore. Review suppressions quarterly.\nThreshold and Gate Policies Define clear policies per environment and severity:\nSeverity PR/MR Gate Main Branch Gate Pre-Production Critical Block Block Block High Block Block Warn Medium Warn Warn Info Low Info Info Info Implement these as exit codes in your CI scripts. 
Start permissive and tighten over time \u0026ndash; blocking on everything from day one will create frustration and workarounds.\nReporting and Dashboards Aggregate results from both SAST and DAST into a centralized view:\nDefectDojo: open-source vulnerability management platform that ingests reports from Semgrep, ZAP, Bandit, and dozens of other tools GitLab Security Dashboard: native integration if you are on GitLab Ultimate Custom dashboards: push scan results to Elasticsearch and visualize in Grafana Track these metrics over time:\nMean time to remediate (MTTR) per severity Total open vulnerabilities by age False positive rate (suppressions vs total findings) Scan coverage (percentage of repos with security scanning enabled) IAST as a Complement IAST (Interactive Application Security Testing) combines elements of both SAST and DAST by instrumenting the application at runtime. An agent runs inside the application during testing, correlating HTTP requests with code execution paths.\nIAST tools like Contrast Security and Datadog Application Security provide:\nLower false positive rates than SAST Code-level context that DAST lacks Real-time feedback during QA testing Consider IAST as a third layer once SAST and DAST are mature in your pipeline.\nPractical Rollout Strategy Deploying security scanning organization-wide requires phases:\nPhase 1: Visibility (Weeks 1-4) Enable SAST in CI with allow_failure: true (don\u0026rsquo;t block) Generate reports, keep as artifacts Find top vulnerability categories Establish baseline metrics Phase 2: Selective Gating (Weeks 5-8) Block only critical/high SAST findings Add DAST scans to staging for key apps Create suppression and triage workflows Train developers how to interpret and fix findings Phase 3: Full Integration (Weeks 9-12) Enforce SAST gates everywhere Enforce DAST gates on all web apps Integrate with vulnerability management platform Auto-create tickets for findings Phase 4: Continuous Improvement (Ongoing) Write 
custom Semgrep rules for your patterns Tune ZAP policies to reduce scan time Track MTTR, set improvement targets Quarterly reviews of false positives Biggest mistake: jumping straight to Phase 3. Start with visibility, build trust, tighten gradually. Scanning should help, not block.\n","date":"2024-05-12T00:00:00Z","permalink":"/en/p/sast/dast-integration-in-ci/cd-pipelines/","title":"SAST/DAST integration in CI/CD pipelines"},{"content":"What Is DataOps? DataOps applies DevOps principles (automation, continuous integration, monitoring, collaboration) to data pipelines and analytics. While DevOps ships software reliably, DataOps ships data reliably.\nThe difference: what flows through the pipeline. DevOps builds, tests, deploys code. DataOps builds, tests, deploys data transformations. The goal is ensuring data arriving at dashboards, ML models, and downstream systems is correct, fresh, and trustworthy.\nIf you\u0026rsquo;ve ever had a broken dashboard Monday morning because a schema changed over the weekend, you know why DataOps matters.\nCore principles 1. Automation first Every step in your data pipeline \u0026ndash; extraction, transformation, loading, testing, and deployment \u0026ndash; should be automated. Manual SQL scripts run from someone\u0026rsquo;s laptop are a liability. Codify everything, version it in Git, and let orchestrators handle execution.\n2. Continuous testing Data testing is not optional. You should validate data at every stage:\nSchema tests: column types, nullability constraints Volume tests: row counts within expected ranges Freshness tests: data arrived on schedule Business rule tests: revenue is never negative, dates are not in the future 3. Monitoring and observability You need to know when something breaks before your stakeholders do. Instrument your pipelines with metrics on latency, row counts, error rates, and data quality scores. Set up alerts that fire when anomalies are detected.\n4. 
Collaboration and version control Data pipelines are code. Treat them that way. Use pull requests, code reviews, and CI/CD for your transformation logic. Every change to a pipeline should be reviewable, testable, and reversible.\nPipeline architecture: ETL vs ELT The two dominant patterns for data pipelines are ETL and ELT. The choice depends on your infrastructure and use case.\nETL (Extract, Transform, Load) Data is extracted from sources, transformed in a processing engine (Spark, Python scripts), and then loaded into the target system. This pattern makes sense when:\nYou need to reduce data volume before loading (cost control) Transformations require heavy computation not suited for your warehouse You have strict data governance requiring transformation before storage ELT (Extract, Load, Transform) Data is extracted and loaded raw into a data warehouse (BigQuery, Snowflake, Redshift), then transformed in place using SQL. This is the modern default because:\nCloud warehouses have massive compute capacity SQL-based transformations are easier to review and test Raw data is preserved, enabling reprocessing when logic changes Tools like dbt make SQL-based transformations first-class citizens For most teams starting today, ELT is the recommended approach unless you have a specific reason to transform before loading.\nKey tools Apache Airflow \u0026ndash; orchestration Airflow is the most widely adopted open-source orchestrator for data pipelines. 
It lets you define workflows as Directed Acyclic Graphs (DAGs) in Python, with built-in scheduling, retries, dependency management, and a web UI for monitoring.\nHere is a practical example of a DAG that orchestrates an ELT pipeline:\nfrom datetime import datetime, timedelta from airflow import DAG from airflow.operators.python import PythonOperator from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator default_args = { \u0026#34;owner\u0026#34;: \u0026#34;data-team\u0026#34;, \u0026#34;retries\u0026#34;: 2, \u0026#34;retry_delay\u0026#34;: timedelta(minutes=5), \u0026#34;email_on_failure\u0026#34;: True, \u0026#34;email\u0026#34;: [\u0026#34;data-alerts@company.com\u0026#34;], } with DAG( dag_id=\u0026#34;elt_sales_pipeline\u0026#34;, default_args=default_args, schedule=\u0026#34;@daily\u0026#34;, # Airflow 2.4+; older releases use schedule_interval start_date=datetime(2024, 1, 1), # fixed start date; days_ago() is deprecated catchup=False, tags=[\u0026#34;elt\u0026#34;, \u0026#34;sales\u0026#34;], ) as dag: extract_load = PythonOperator( task_id=\u0026#34;extract_and_load_raw\u0026#34;, python_callable=extract_and_load_sales_data, # your extraction function ) transform = SQLExecuteQueryOperator( task_id=\u0026#34;transform_sales\u0026#34;, conn_id=\u0026#34;warehouse_conn\u0026#34;, sql=\u0026#34;sql/transform_sales.sql\u0026#34;, ) run_quality_checks = PythonOperator( task_id=\u0026#34;data_quality_checks\u0026#34;, python_callable=run_great_expectations_suite, # your validation function ) extract_load \u0026gt;\u0026gt; transform \u0026gt;\u0026gt; run_quality_checks Key patterns to follow in Airflow:\nIdempotent tasks: running the same task twice should produce the same result Atomic writes: use staging tables and swap on success Parameterized dates: use {{ ds }} template variables for date partitioning Small tasks: each task should do one thing, making failures easy to diagnose dbt \u0026ndash; transformation dbt (data 
build tool) is the standard for managing SQL-based transformations in an ELT pipeline. It provides:\nModular SQL: break complex transformations into referenceable models Built-in testing: schema tests, custom tests, and data freshness checks Documentation: auto-generated docs from your model descriptions Lineage: visual DAG showing how models depend on each other A typical dbt project structure looks like:\n1 2 3 4 5 6 7 8 models/ staging/ stg_sales.sql -- clean raw data stg_customers.sql marts/ fct_daily_revenue.sql -- business-level aggregations dim_customers.sql schema.yml -- tests and documentation Great Expectations \u0026ndash; data quality Great Expectations is a Python framework for defining, running, and documenting data quality checks. It goes beyond simple assertions by generating human-readable data documentation.\nHere is an example of setting up expectations for a sales table:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 import great_expectations as gx context = gx.get_context() # Connect to your data source datasource = context.sources.add_or_update_pandas(\u0026#34;sales_source\u0026#34;) data_asset = datasource.add_csv_asset(\u0026#34;daily_sales\u0026#34;, filepath_or_buffer=\u0026#34;data/daily_sales.csv\u0026#34;) batch_request = data_asset.build_batch_request() # Create an expectation suite suite = context.add_or_update_expectation_suite(\u0026#34;sales_quality_suite\u0026#34;) validator = context.get_validator( batch_request=batch_request, expectation_suite_name=\u0026#34;sales_quality_suite\u0026#34;, ) # Define expectations validator.expect_column_to_exist(\u0026#34;order_id\u0026#34;) validator.expect_column_values_to_be_unique(\u0026#34;order_id\u0026#34;) validator.expect_column_values_to_not_be_null(\u0026#34;customer_id\u0026#34;) validator.expect_column_values_to_be_between(\u0026#34;amount\u0026#34;, min_value=0, max_value=100000) 
validator.expect_column_values_to_be_in_set(\u0026#34;status\u0026#34;, [\u0026#34;pending\u0026#34;, \u0026#34;completed\u0026#34;, \u0026#34;refunded\u0026#34;]) # Run validation results = validator.validate() validator.save_expectation_suite(discard_failed_expectations=False) if not results.success: raise Exception(f\u0026#34;Data quality checks failed: {results.statistics}\u0026#34;) Integrate this into your Airflow DAG so that quality gates run after every transformation step. If checks fail, the pipeline stops and alerts fire.\nMonitoring and observability A production data pipeline needs observability across several dimensions:\nDimension What to Track Tools Pipeline health Task success/failure rates, duration trends Airflow metrics, Prometheus Data freshness Time since last successful load dbt source freshness, custom checks Data volume Row counts per table per run Great Expectations, custom SQL Data quality Test pass/fail rates, anomaly scores Great Expectations, Monte Carlo Cost Warehouse compute usage, storage growth Cloud provider dashboards Set up alerts for:\nAny pipeline task failure Data freshness exceeding SLA thresholds Row count deviations beyond 2 standard deviations from the rolling average Data quality test failures Push Airflow metrics to Prometheus and build Grafana dashboards that give your team a single pane of glass for pipeline health.\nBest practices Treat pipelines as code: all SQL, DAG definitions, and configuration live in Git Use environments: dev, staging, production \u0026ndash; just like application code Implement CI/CD: run dbt tests and linting on every pull request Design for failure: every task should be retryable and idempotent Document data contracts: define and publish schemas that upstream and downstream teams agree on Start with testing: add data quality checks before adding new features Alert on SLAs, not just failures: a pipeline that succeeds but runs 3x slower than usual is still a problem Keep raw data immutable: 
never modify source data; transform into separate tables DataOps isn\u0026rsquo;t a tool. It\u0026rsquo;s a set of practices that make your data infrastructure reliable, testable, and maintainable. Start with orchestration and testing, then add monitoring and quality checks as you mature.\n","date":"2024-02-18T00:00:00Z","permalink":"/en/p/dataops-building-reliable-data-pipelines/","title":"DataOps: building reliable data pipelines"},{"content":"What triggered this A couple of months ago I spent time hardening the K3s cluster. I went through an entire weekend changing configurations, tuning kernel parameters, swapping Flannel for Cilium, writing network policies. By the end the cluster was in much better shape.\nBut I had done all of it by hand.\nIf I need to recreate that node from scratch tomorrow, how long does it take? Probably two or three days digging through my own notes scattered across text files, terminal history, and chat messages. And I would still miss things. That is not sustainable.\nSo I decided to build a proper infrastructure repository. Not a demo, not a proof of concept — the repository where the definition of everything I run at home lives, sanitized enough to publish.\nWhat the homelab looks like Before talking about the repository structure, it helps to explain what needs to be managed. The setup is:\nA main server running Proxmox where several VMs live Two additional physical nodes forming the K3s cluster A router running OpenWrt A NAS running TrueNAS It is not a huge environment, but it has enough variety that managing everything by hand is a real problem. 
Especially because Proxmox, the VMs, the cluster, and the NAS have configurations that interact with each other: IPs, internal DNS, certificates, users.\nRepository structure After a few experiments, this is the layout that works for me:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 homelab-infra/ ├── terraform/ │ ├── modules/ │ │ ├── proxmox-vm/ │ │ ├── dns-record/ │ │ └── network-vlan/ │ └── environments/ │ ├── main.tf │ ├── variables.tf │ └── terraform.tfvars.example ├── ansible/ │ ├── inventory/ │ │ ├── hosts.yml │ │ └── group_vars/ │ ├── playbooks/ │ │ ├── bootstrap.yml │ │ ├── k3s-server.yml │ │ ├── k3s-agent.yml │ │ └── hardening.yml │ └── roles/ │ ├── common/ │ ├── cis-level1/ │ └── k3s/ ├── kubernetes/ │ ├── base/ │ ├── apps/ │ │ ├── monitoring/ │ │ ├── storage/ │ │ └── networking/ │ └── policies/ │ ├── gatekeeper/ │ └── seccomp/ ├── .gitlab-ci.yml ├── .sops.yaml └── README.md Three clearly separated layers: provisioning (Terraform), node configuration (Ansible), and Kubernetes workloads. Security policies live inside kubernetes/policies/ because they are Kubernetes resources, but I treat them as a conceptually distinct layer.\nTerraform: provisioning with Proxmox The Proxmox provider for Terraform is the Telmate one (telmate/proxmox). 
It is not official, but it is the most widely used and works reasonably well.\nThe proxmox-vm module wraps VM creation with the parameters I use regularly:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 # terraform/modules/proxmox-vm/main.tf resource \u0026#34;proxmox_vm_qemu\u0026#34; \u0026#34;vm\u0026#34; { name = var.name target_node = var.target_node clone = var.template full_clone = true cores = var.cores memory = var.memory sockets = 1 disk { size = var.disk_size type = \u0026#34;virtio\u0026#34; storage = var.storage_pool discard = \u0026#34;on\u0026#34; } network { model = \u0026#34;virtio\u0026#34; bridge = var.network_bridge tag = var.vlan_tag } ipconfig0 = \u0026#34;ip=${var.ip_address}/24,gw=${var.gateway}\u0026#34; nameserver = var.nameserver searchdomain = var.searchdomain ciuser = var.ssh_user sshkeys = var.ssh_public_key lifecycle { ignore_changes = [ network, ] } } Sensitive variables — the Proxmox API token, SSH keys — are not in the repository in plaintext. I use SOPS to encrypt the terraform.tfvars file with age:\n1 2 3 4 5 6 # .sops.yaml creation_rules: - path_regex: .*\\.tfvars$ age: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - path_regex: ansible/inventory/group_vars/.*\\.yml$ age: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx The encrypted file (terraform.tfvars) is committed to Git. Decrypting it requires the age private key, which lives on the CI server and my local machine — never in the repository.\nAnsible: node configuration Once Terraform provisions the VMs, Ansible configures them. 
The bootstrap.yml playbook does the minimum needed to get a freshly created node into working shape:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 # ansible/playbooks/bootstrap.yml --- - name: Bootstrap new nodes hosts: all become: true roles: - common - cis-level1 - name: Configure K3s server nodes hosts: k3s_servers become: true roles: - k3s The common role installs base packages, configures NTP, hardens SSH, sets sysctl parameters, and creates system users. The cis-level1 role applies CIS Benchmark Level 1 recommendations for Debian.\nI did not write the CIS role from scratch. I started from the community role dev-sec/ansible-collection-hardening and adapted it. There are quite a few tasks the default role applies that do not fit a homelab — things designed for production servers with strict audit requirements. I went through each task, understood what it did, and decided if it applied to my case.\nSome things I disabled:\n1 2 3 4 5 # ansible/roles/cis-level1/defaults/main.yml os_auth_pam_pwquality_enable: false # No local users with passwords os_security_users_allow: [\u0026#34;vagrant\u0026#34;] # Dev environment only os_filesystem_whitelist: - vfat # Required for UEFI boot And some things I added specifically for K3s:\n1 2 3 4 5 6 7 8 # Kernel parameters required by K3s with protect-kernel-defaults kernel_parameters: - { name: \u0026#34;kernel.panic\u0026#34;, value: \u0026#34;10\u0026#34; } - { name: \u0026#34;kernel.panic_on_oops\u0026#34;, value: \u0026#34;1\u0026#34; } - { name: \u0026#34;vm.overcommit_memory\u0026#34;, value: \u0026#34;1\u0026#34; } - { name: \u0026#34;vm.panic_on_oom\u0026#34;, value: \u0026#34;0\u0026#34; } - { name: \u0026#34;fs.inotify.max_user_watches\u0026#34;, value: \u0026#34;524288\u0026#34; } - { name: \u0026#34;fs.inotify.max_user_instances\u0026#34;, value: \u0026#34;512\u0026#34; } Kubernetes: manifests with Kustomize For Kubernetes manifests I use Kustomize over Helm where possible. 
Helm is more powerful for complex things, but for my own applications Kustomize is sufficient and produces readable YAML.\nThe basic Kustomize structure:\n1 2 3 4 5 6 7 8 9 10 11 kubernetes/apps/monitoring/ ├── base/ │ ├── kustomization.yaml │ ├── namespace.yaml │ ├── prometheus-deployment.yaml │ └── grafana-deployment.yaml └── overlays/ └── homelab/ ├── kustomization.yaml └── patches/ └── resource-limits.yaml The homelab overlay adds environment-specific settings without modifying the base manifests:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 # kubernetes/apps/monitoring/overlays/homelab/patches/resource-limits.yaml apiVersion: apps/v1 kind: Deployment metadata: name: prometheus namespace: monitoring spec: template: spec: containers: - name: prometheus resources: requests: memory: \u0026#34;256Mi\u0026#34; cpu: \u0026#34;100m\u0026#34; limits: memory: \u0026#34;512Mi\u0026#34; cpu: \u0026#34;500m\u0026#34; OPA/Gatekeeper: policies as code Gatekeeper is a Kubernetes admission controller that uses OPA (Open Policy Agent) to evaluate policies written in Rego. Instead of letting any pod deploy with any configuration, policies reject manifests that do not meet security requirements.\nActive policies:\nNo containers running as root 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 # kubernetes/policies/gatekeeper/no-root-containers.yaml apiVersion: constraints.gatekeeper.sh/v1beta1 kind: K8sPSPAllowedUsers metadata: name: psp-pods-allowed-user-ranges spec: match: kinds: - apiGroups: [\u0026#34;\u0026#34;] kinds: [\u0026#34;Pod\u0026#34;] excludedNamespaces: - kube-system - falco parameters: runAsUser: rule: MustRunAsNonRoot runAsGroup: rule: MustRunAs ranges: - min: 1000 max: 65535 Required resource limits Without resource limits, a pod can consume all the node\u0026rsquo;s memory and bring down the cluster. 
This policy prevents that:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 # kubernetes/policies/gatekeeper/require-resource-limits.yaml apiVersion: templates.gatekeeper.sh/v1 kind: ConstraintTemplate metadata: name: k8srequiredresources spec: crd: spec: names: kind: K8sRequiredResources targets: - target: admission.k8s.gatekeeper.sh rego: | package k8srequiredresources violation[{\u0026#34;msg\u0026#34;: msg}] { container := input.review.object.spec.containers[_] not container.resources.limits.memory msg := sprintf(\u0026#34;Container \u0026#39;%v\u0026#39; has no memory limit defined\u0026#34;, [container.name]) } violation[{\u0026#34;msg\u0026#34;: msg}] { container := input.review.object.spec.containers[_] not container.resources.limits.cpu msg := sprintf(\u0026#34;Container \u0026#39;%v\u0026#39; has no CPU limit defined\u0026#34;, [container.name]) } --- apiVersion: constraints.gatekeeper.sh/v1beta1 kind: K8sRequiredResources metadata: name: require-resource-limits spec: match: kinds: - apiGroups: [\u0026#34;\u0026#34;] kinds: [\u0026#34;Pod\u0026#34;] excludedNamespaces: - kube-system Trusted image registries 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 # kubernetes/policies/gatekeeper/allowed-registries.yaml apiVersion: templates.gatekeeper.sh/v1 kind: ConstraintTemplate metadata: name: k8sallowedrepos spec: crd: spec: names: kind: K8sAllowedRepos validation: openAPIV3Schema: type: object properties: repos: type: array items: type: string targets: - target: admission.k8s.gatekeeper.sh rego: | package k8sallowedrepos violation[{\u0026#34;msg\u0026#34;: msg}] { container := input.review.object.spec.containers[_] not any_repo_matches(container.image) msg := sprintf(\u0026#34;Image \u0026#39;%v\u0026#39; does not come from an allowed registry\u0026#34;, [container.image]) } any_repo_matches(image) { repo := 
input.parameters.repos[_] startswith(image, repo) } --- apiVersion: constraints.gatekeeper.sh/v1beta1 kind: K8sAllowedRepos metadata: name: allowed-registries spec: match: kinds: - apiGroups: [\u0026#34;\u0026#34;] kinds: [\u0026#34;Pod\u0026#34;] excludedNamespaces: - kube-system parameters: repos: - \u0026#34;registry.homelab.internal/\u0026#34; - \u0026#34;ghcr.io/my-user/\u0026#34; - \u0026#34;quay.io/prometheus/\u0026#34; - \u0026#34;grafana/\u0026#34; Seccomp profiles Seccomp profiles restrict which system calls a container can make. Kubernetes has a default profile (RuntimeDefault) that is already reasonable, but for applications I know well I define tighter profiles.\nThe profiles live in the repository and are deployed as ConfigMaps or directly to the nodes:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 // kubernetes/policies/seccomp/web-app-profile.json { \u0026#34;defaultAction\u0026#34;: \u0026#34;SCMP_ACT_ERRNO\u0026#34;, \u0026#34;architectures\u0026#34;: [\u0026#34;SCMP_ARCH_X86_64\u0026#34;], \u0026#34;syscalls\u0026#34;: [ { \u0026#34;names\u0026#34;: [ \u0026#34;accept4\u0026#34;, \u0026#34;bind\u0026#34;, \u0026#34;brk\u0026#34;, \u0026#34;clone\u0026#34;, \u0026#34;close\u0026#34;, \u0026#34;connect\u0026#34;, \u0026#34;epoll_create1\u0026#34;, \u0026#34;epoll_ctl\u0026#34;, \u0026#34;epoll_wait\u0026#34;, \u0026#34;execve\u0026#34;, \u0026#34;exit_group\u0026#34;, \u0026#34;fcntl\u0026#34;, \u0026#34;fstat\u0026#34;, \u0026#34;futex\u0026#34;, \u0026#34;getdents64\u0026#34;, \u0026#34;getpid\u0026#34;, \u0026#34;getsockname\u0026#34;, \u0026#34;getsockopt\u0026#34;, \u0026#34;listen\u0026#34;, \u0026#34;lstat\u0026#34;, \u0026#34;mmap\u0026#34;, \u0026#34;mprotect\u0026#34;, \u0026#34;munmap\u0026#34;, \u0026#34;nanosleep\u0026#34;, \u0026#34;newfstatat\u0026#34;, \u0026#34;openat\u0026#34;, \u0026#34;poll\u0026#34;, \u0026#34;prctl\u0026#34;, \u0026#34;read\u0026#34;, \u0026#34;recvfrom\u0026#34;, \u0026#34;rt_sigaction\u0026#34;, 
\u0026#34;rt_sigprocmask\u0026#34;, \u0026#34;rt_sigreturn\u0026#34;, \u0026#34;sendto\u0026#34;, \u0026#34;set_robust_list\u0026#34;, \u0026#34;setsockopt\u0026#34;, \u0026#34;sigaltstack\u0026#34;, \u0026#34;socket\u0026#34;, \u0026#34;stat\u0026#34;, \u0026#34;write\u0026#34; ], \u0026#34;action\u0026#34;: \u0026#34;SCMP_ACT_ALLOW\u0026#34; } ] } Referenced from the pod spec:\n1 2 3 4 5 spec: securityContext: seccompProfile: type: Localhost localhostProfile: \u0026#34;web-app-profile.json\u0026#34; Building a seccomp profile from scratch is tedious. My approach is to start with RuntimeDefault, use strace to see what syscalls the application actually makes, and then build a tighter profile for workloads I want to restrict further.\nCI/CD pipeline The repository has a GitLab CI pipeline that automates applying changes. The flow is:\nOn merge requests: terraform plan and ansible-lint to catch problems before merging On merge to main: terraform apply and, if Ansible files changed, the corresponding playbook 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 # .gitlab-ci.yml (excerpt) stages: - validate - plan - apply variables: TF_ROOT: \u0026#34;${CI_PROJECT_DIR}/terraform/environments\u0026#34; ANSIBLE_CONFIG: \u0026#34;${CI_PROJECT_DIR}/ansible/ansible.cfg\u0026#34; terraform-validate: stage: validate image: hashicorp/terraform:1.6 script: - cd \u0026#34;$TF_ROOT\u0026#34; - terraform init -backend=false - terraform validate rules: - changes: - terraform/**/* terraform-plan: stage: plan image: hashicorp/terraform:1.6 script: - cd \u0026#34;$TF_ROOT\u0026#34; - terraform init - terraform plan -out=tfplan artifacts: paths: - \u0026#34;${TF_ROOT}/tfplan\u0026#34; expire_in: 1 week rules: - if: $CI_PIPELINE_SOURCE == \u0026#34;merge_request_event\u0026#34; changes: - terraform/**/* terraform-apply: stage: apply image: hashicorp/terraform:1.6 script: - 
cd \u0026#34;$TF_ROOT\u0026#34; - terraform init - terraform apply -auto-approve rules: - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH changes: - terraform/**/* when: manual ansible-lint: stage: validate image: python:3.11-slim script: - pip install ansible ansible-lint - ansible-lint ansible/ rules: - changes: - ansible/**/* The terraform apply is manual — I do not want infrastructure changing automatically without my approval. The plan runs automatically on the MR for visibility.\nWhat I sanitized Publishing the repository required reviewing what should not be there:\nInternal IPs: replaced with example ranges (192.168.1.x) Domain names: the internal homelab domain (homelab.internal in the repo, something different in production) Usernames: real usernames are not in the repository SSH public keys: replaced with placeholders Password hashes: removed from the Ansible inventory Application secrets: encrypted with SOPS or removed, with an .example file alongside The rule I followed: if someone with access to my local network could use that information to attack something, it does not go into the repository in plaintext. Everything else can be there.\nWhat is still pending There are still things I manage by hand that should be codified:\nOpenWrt: the router configuration is the hardest to bring into IaC. There is a Terraform module for OpenWrt that is not very well maintained. For now I manage it with an Ansible script that backs up the configuration and another that restores it. Not idempotent, but it works.\nTrueNAS: it has a fairly complete REST API. There is a Terraform provider in development. It is on my radar for the next iteration.\nBackups: I have backups, but the process is not in the repository. 
It lives in another loose script that will eventually end up here.\nThe repository is never \u0026ldquo;finished.\u0026rdquo; What matters is that the current state of the homelab is represented in it, and that any change goes through Git.\n","date":"2023-11-20T00:00:00Z","permalink":"/en/p/building-an-iac-repository-for-the-homelab/","title":"Building an IaC repository for the homelab"},{"content":"Why a Maturity Model Helps Most teams know they should \u0026ldquo;shift security left,\u0026rdquo; but knowing where to start is the hard part. A maturity model gives you a structured way to assess your current state, identify gaps, and plan a realistic roadmap for improvement.\nWithout a model, security improvements tend to be reactive (triggered by incidents or audit findings rather than deliberate planning). A maturity model turns security from a fire drill into an engineering discipline with measurable progress.\nThe model described here has five levels. The goal is not to rush to the highest level but to make steady, sustainable progress. Each level builds on the previous one.\nThe Five Maturity Levels Level 1: Ad-Hoc At this level, security is an afterthought. There are no formal processes, and security activities happen sporadically if at all.\nWhat it looks like:\nNo security testing in CI/CD pipelines. Vulnerabilities discovered in production or by external parties. No dedicated security tooling. Developers have little to no security training. Incident response is improvised. Compliance is addressed manually before audits. Typical tools: None specifically for security. Maybe a firewall and antivirus.\nLevel 2: Reactive Security is recognized as important, but the approach is reactive. The team responds to vulnerabilities and incidents but doesn\u0026rsquo;t proactively prevent them.\nWhat it looks like:\nBasic static analysis (SAST) runs occasionally, but findings are not always addressed. Dependency scanning is done manually or on an ad-hoc basis. 
There\u0026rsquo;s some security documentation, but it\u0026rsquo;s outdated. Incident response exists as a documented process, though it\u0026rsquo;s rarely practiced. Security reviews happen late in the development cycle (right before release). Typical tools: SonarQube (basic rules), OWASP Dependency-Check, manual penetration testing.\nLevel 3: Proactive Security is integrated into the development workflow. The team actively seeks to prevent vulnerabilities rather than just reacting to them.\nWhat it looks like:\nSAST and DAST run automatically in CI/CD pipelines. Dependency scanning with automated alerts for known vulnerabilities. Container image scanning before deployment (Trivy, Grype). Infrastructure as Code is scanned for misconfigurations (Checkov, tfsec). Threat modeling is performed for new features and architecture changes. Security champions exist within development teams. Blameless postmortems are conducted after security incidents. Regular security training for developers. Typical tools: Semgrep, Trivy, Checkov, OWASP ZAP, HashiCorp Vault, Falco.\nLevel 4: Optimized Security is deeply embedded in every stage of the software lifecycle. Metrics drive decisions, and the team continuously improves based on data.\nWhat it looks like:\nSecurity gates in pipelines that block deployment if critical issues are found. Mean time to remediate (MTTR) is tracked and continuously reduced. Software Bill of Materials (SBOM) generated for every release. Signed artifacts and verified supply chain. Automated compliance checks mapped to frameworks (SOC2, ISO 27001, PCI-DSS). Runtime security monitoring with automated response (Falco + custom rules). Regular red team exercises and chaos engineering for security. Security metrics are part of engineering dashboards. Typical tools: Sigstore/cosign, OPA/Gatekeeper, Kyverno, SIEM integration, automated compliance platforms.\nLevel 5: Innovative Security is a competitive advantage. 
The team contributes to the broader security community and pushes the state of the art.\nWhat it looks like:\nBug bounty programs actively managed. Custom security tooling developed for organization-specific risks. Machine learning applied to anomaly detection and threat hunting. Security is a feature sold to customers (certifications, transparency reports). Active participation in open-source security projects. Zero-trust architecture fully implemented. Policy as code governs all infrastructure and application security. Typical tools: Custom-built platforms, eBPF-based security tools, advanced SIEM with ML, zero-trust service mesh.\nKey Dimensions A maturity model isn\u0026rsquo;t one-dimensional. Assess your organization across these dimensions, as progress is rarely uniform:\nCode Security Level Practices Ad-Hoc No code scanning Reactive Occasional SAST, manual code reviews for security Proactive Automated SAST/DAST in CI, security-focused code review guidelines Optimized Custom rules for organization-specific patterns, MTTR tracked Innovative AI-assisted code review, automatic fix suggestions Infrastructure Security Level Practices Ad-Hoc Manual server configuration, no hardening standards Reactive Basic hardening checklists, occasional audits Proactive IaC scanning, automated hardening, CIS benchmarks Optimized Policy as code (OPA), drift detection, automated remediation Innovative Self-healing infrastructure, zero-trust networking Monitoring and Detection Level Practices Ad-Hoc No security monitoring Reactive Basic log collection, manual review after incidents Proactive Centralized logging, alerting on known patterns, runtime monitoring Optimized SIEM with correlation rules, automated response playbooks Innovative ML-based anomaly detection, threat hunting programs Incident Response Level Practices Ad-Hoc No process, ad-hoc response Reactive Documented runbooks, rarely tested Proactive Regular tabletop exercises, blameless postmortems, on-call rotation 
Optimized Automated incident classification, SLA-driven response times Innovative Chaos engineering for security, automated containment Compliance Level Practices Ad-Hoc Manual evidence collection before audits Reactive Spreadsheet-based tracking, periodic reviews Proactive Automated evidence collection, continuous monitoring Optimized Compliance as code, real-time dashboards, automated reporting Innovative Continuous certification, public transparency reports Self-Assessment Checklist Rate your organization on each item (Yes / Partial / No):\nBuild Phase:\nSAST runs automatically on every pull request. Dependency scanning alerts on known CVEs. Container images are scanned before being pushed to a registry. IaC templates are scanned for misconfigurations. Secrets detection prevents credentials from being committed. Deploy Phase:\nSecurity gates can block deployment for critical findings. Artifacts are signed and signatures are verified. SBOM is generated for every release. Infrastructure changes go through policy-as-code validation. Run Phase:\nRuntime security monitoring is active (Falco, Sysdig, etc.). Centralized logging with security-relevant alerts. Network segmentation limits blast radius. Secrets are managed through a dedicated vault. Culture and Process:\nDevelopers receive regular security training. Security champions are embedded in development teams. Blameless postmortems are conducted after incidents. Threat modeling is part of the design process for new features. Security metrics are tracked and reviewed regularly. Roadmap for Progression Moving up the maturity levels doesn\u0026rsquo;t happen overnight. Here\u0026rsquo;s a practical roadmap:\nFrom Ad-Hoc to Reactive (3-6 months) Add a SAST tool to your CI pipeline (start with Semgrep - it has good defaults and is fast). Enable dependency scanning (GitHub Dependabot, or trivy fs in CI). Document your incident response process, even if it\u0026rsquo;s simple. 
Run a single security training session for the team. From Reactive to Proactive (6-12 months) Add container image scanning and IaC scanning to pipelines. Implement secrets detection in pre-commit hooks (gitleaks, detect-secrets). Appoint security champions in each team. Start threat modeling for major features. Conduct your first blameless postmortem after an incident. Deploy runtime monitoring (Falco). From Proactive to Optimized (12-18 months) Implement security gates that can block deployments. Track MTTR and set reduction targets. Generate SBOMs and sign artifacts. Implement policy-as-code for infrastructure (OPA/Gatekeeper). Map automated checks to compliance frameworks. Integrate security metrics into engineering dashboards. From Optimized to Innovative (18+ months) Launch a bug bounty program. Build custom security tooling for organization-specific risks. Implement zero-trust architecture. Run regular red team exercises. Contribute to open-source security projects. Cultural Aspects Tools and processes are necessary but insufficient. Culture determines whether security practices actually stick.\nBlameless Postmortems When a security incident occurs, the instinct is often to find someone to blame. This drives people to hide mistakes and cover up near-misses. Blameless postmortems flip this around: they focus on systemic failures and process improvements rather than individual fault. The question changes from \u0026ldquo;who made this mistake?\u0026rdquo; to \u0026ldquo;what allowed this mistake to happen, and how do we prevent it?\u0026rdquo;\nSecurity Champions A security champion is a developer who takes on extra responsibility for security within their team. They are not full-time security engineers \u0026mdash; they are developers who act as a bridge between the security team and the development team. Their role includes:\nReviewing security-relevant pull requests. Staying current on security topics and sharing knowledge. 
Participating in threat modeling sessions. Being the first point of contact for security questions. This model scales far better than having a central security team review everything.\nMaking Security Easy If security practices are painful, people will find workarounds. The goal is to make security the easiest path:\nProvide secure templates and starter projects. Automate as much as possible so developers don\u0026rsquo;t have to remember manual steps. Give fast feedback. A SAST scan that takes 30 minutes will be ignored; one that takes 30 seconds will be used. Celebrate security improvements just as you celebrate feature delivery. Conclusion A DevSecOps maturity model is a compass, not a destination. The value comes from honest self-assessment, setting realistic goals, and making steady progress. Start where you are, pick the dimension where improvement will have the most impact, and build from there. Security is a team sport. The best security cultures are built incrementally, one practice at a time.\n","date":"2023-10-08T10:00:00+01:00","permalink":"/en/p/devsecops-maturity-model/","title":"DevSecOps maturity model"},{"content":"Starting point I have been running K3s on a couple of physical machines at home for a while. I originally set it up to learn Kubernetes without deploying something heavy, and over time I started using it for small projects: a few web apps, monitoring services, things like that. The setup worked fine but I had never sat down to think carefully about security.\nA few months ago I decided to do that exercise. No rush, no generic checklist from the internet. Just look at what I had, understand what was wrong, and fix it.\nWhat I found was not catastrophic, but there were quite a few things I did not like. 
I am writing this down because K3s has its own quirks compared to a standard Kubernetes cluster, and most hardening guides cover kubeadm or managed clusters, not home setups running on a single binary.\nWhat makes K3s different Before getting into what I did, it is worth understanding what makes K3s distinct from a security standpoint.\nK3s packages everything in a single binary: API server, scheduler, controller manager, kubelet, kube-proxy, and containerd. Instead of etcd it uses SQLite by default. The default CNI is Flannel. The default ingress controller is Traefik. All of this makes the installation very simple, but it also means there are active components you might not need, and some design decisions favor ease of use over security.\nThe main configuration file lives at /etc/rancher/k3s/config.yaml. That is where most of what I am going to touch ends up.\nThe node before the cluster Before touching anything Kubernetes-related, the operating system. The cluster runs on Debian, so I started there.\nSSH I had password authentication enabled. That is the first thing to turn off:\n1 2 3 4 # /etc/ssh/sshd_config PasswordAuthentication no PermitRootLogin no AllowUsers myuser I also changed the default port. It does not stop someone determined from finding you, but it eliminates a lot of noise in the logs.\nFirewall with nftables The machine had its network interfaces completely open. With a home K3s cluster, the API server (port 6443) should not be reachable from outside your local network. 
My basic rules with nftables:\n# List active ruleset nft list ruleset # Create the table first; the chain and rules below live in it nft add table inet filter # Restrictive default policy (takes effect immediately, so run this from a console or load the whole ruleset atomically with nft -f) nft add chain inet filter input { type filter hook input priority 0 \\; policy drop \\; } # Allow loopback and established connections nft add rule inet filter input iifname lo accept nft add rule inet filter input ct state established,related accept # SSH only from local network nft add rule inet filter input ip saddr 192.168.1.0/24 tcp dport 22 accept # K3s API server only from local network nft add rule inet filter input ip saddr 192.168.1.0/24 tcp dport 6443 accept # Kubernetes node ports nft add rule inet filter input tcp dport 30000-32767 accept # Traefik (if used as ingress) nft add rule inet filter input tcp dport { 80, 443 } accept # ICMP nft add rule inet filter input icmp type echo-request accept What surprised me when reviewing this: the API server was listening on all interfaces and was reachable directly from the WAN, because a port forward on the router meant for another service was letting some of that traffic through. Small scare, nothing exploited, but not something I wanted to leave.\nKernel parameters K3s with the --protect-kernel-defaults flag verifies that certain kernel parameters are set correctly. If they are not, startup fails with a clear message. 
Better to configure them beforehand:\n1 2 3 4 5 6 7 # /etc/sysctl.d/90-k3s-hardening.conf kernel.panic = 10 kernel.panic_on_oops = 1 vm.overcommit_memory = 1 vm.panic_on_oom = 0 fs.inotify.max_user_watches = 524288 fs.inotify.max_user_instances = 512 1 sysctl --system K3s configuration With the node in order, onto the cluster.\nBase configuration file 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 # /etc/rancher/k3s/config.yaml write-kubeconfig-mode: \u0026#34;0600\u0026#34; protect-kernel-defaults: true secrets-encryption: true kube-apiserver-arg: - \u0026#34;anonymous-auth=false\u0026#34; - \u0026#34;audit-log-path=/var/log/k3s/audit.log\u0026#34; - \u0026#34;audit-log-maxage=30\u0026#34; - \u0026#34;audit-policy-file=/etc/rancher/k3s/audit-policy.yaml\u0026#34; kube-controller-manager-arg: - \u0026#34;terminated-pod-gc-threshold=10\u0026#34; kubelet-arg: - \u0026#34;streaming-connection-idle-timeout=5m\u0026#34; - \u0026#34;protect-kernel-defaults=true\u0026#34; - \u0026#34;make-iptables-util-chains=true\u0026#34; The three most important lines here:\nsecrets-encryption: true enables encryption at rest for Kubernetes secrets. In K3s this is a first-class flag, no need to configure EncryptionConfiguration manually like in kubeadm. anonymous-auth=false removes anonymous access to the API server. write-kubeconfig-mode: \u0026quot;0600\u0026quot; enforces restrictive permissions on the kubeconfig that K3s generates at /etc/rancher/k3s/k3s.yaml. Audit policy Without audit logs you have no idea what is happening in your cluster. 
This is a reasonable minimum:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 # /etc/rancher/k3s/audit-policy.yaml apiVersion: audit.k8s.io/v1 kind: Policy rules: - level: Metadata resources: - group: \u0026#34;\u0026#34; resources: [\u0026#34;secrets\u0026#34;] - level: RequestResponse resources: - group: \u0026#34;\u0026#34; resources: [\u0026#34;pods/exec\u0026#34;, \u0026#34;pods/attach\u0026#34;] - level: Request verbs: [\u0026#34;create\u0026#34;, \u0026#34;update\u0026#34;, \u0026#34;patch\u0026#34;, \u0026#34;delete\u0026#34;] - level: None users: [\u0026#34;system:kube-proxy\u0026#34;] verbs: [\u0026#34;watch\u0026#34;] resources: - group: \u0026#34;\u0026#34; resources: [\u0026#34;endpoints\u0026#34;, \u0026#34;services\u0026#34;, \u0026#34;services/status\u0026#34;] - level: Metadata omitStages: - RequestReceived The logs go to /var/log/k3s/audit.log. In my case I collect them with Promtail and send them to Loki, so I can run queries from Grafana when I need to review something.\nKubeconfig The kubeconfig K3s generates at /etc/rancher/k3s/k3s.yaml has admin credentials. A mistake I spotted in my own setup: I had that file copied to ~/.kube/config with 644 permissions. Any process running under my user could read it.\n1 2 # Correct permissions chmod 600 ~/.kube/config If you have additional users who need cluster access, create ServiceAccounts or use client certificates with limited RBAC. Do not distribute the admin kubeconfig.\nCNI: from Flannel to Cilium Flannel is the K3s default. It works, but it does not support NetworkPolicies out of the box. That means all pods can communicate with each other without restriction.\nI switched to Cilium. 
The process with K3s requires disabling Flannel first:\n# /etc/rancher/k3s/config.yaml (add) flannel-backend: none disable-network-policy: true Then install Cilium with Helm:\nhelm repo add cilium https://helm.cilium.io/ helm install cilium cilium/cilium \\ --namespace kube-system \\ --set operator.replicas=1 \\ --set ipam.mode=kubernetes \\ --set kubeProxyReplacement=strict \\ --set k8sServiceHost=192.168.1.10 \\ --set k8sServicePort=6443 With kubeProxyReplacement=strict, Cilium also replaces kube-proxy, which gives better performance with eBPF and better network traffic visibility. For Cilium to be the only service proxy, K3s\u0026rsquo;s bundled kube-proxy also has to be switched off (disable-kube-proxy: true in the same config file).\nAfter the switch I started applying NetworkPolicies. The first and most important one: default deny in all application namespaces:\napiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: default-deny-all namespace: apps spec: podSelector: {} policyTypes: - Ingress - Egress From there, I explicitly allow only what is necessary.\nSecrets: what to do when it is just K3s In a corporate environment you would use Vault or External Secrets Operator backed by AWS Secrets Manager or similar. In a homelab that is overkill. My solution was simpler.\nK3s\u0026rsquo;s encryption at rest already protects secrets in the database. For manifests in Git I use age to encrypt sensitive values before committing them:\n# Encrypt a value echo \u0026#34;my-real-password\u0026#34; | age -r \u0026#34;$(cat ~/.config/age/recipient.txt)\u0026#34; | base64 The encrypted value goes into the repository. Deploying it requires a manual decryption step. Less convenient than an automatic operator, but for a homelab it is good enough and does not add operational complexity.\nFalco for runtime detection Everything above is prevention. 
Falco is detection: it monitors system calls and generates alerts when something suspicious happens inside a container.\nI installed it with Helm:\nhelm repo add falcosecurity https://falcosecurity.github.io/charts helm install falco falcosecurity/falco \\ --namespace falco \\ --create-namespace \\ --set driver.kind=ebpf The default rules already cover quite a bit: shell execution in containers, writes to system directories, privilege changes, unexpected network connections. I added some custom rules for my specific use case.\nAlerts go to Loki as well. That way I have the API server audit logs and Falco alerts in the same place.\nPod Security Standards With K3s \u0026gt;= 1.25 you can use Pod Security Admission directly. I enforced the baseline level on application namespaces and set restricted as a warning, so I could see what would break before enforcing it:\napiVersion: v1 kind: Namespace metadata: name: apps labels: pod-security.kubernetes.io/enforce: baseline pod-security.kubernetes.io/warn: restricted The restricted level is strict: containers must run as non-root, privilege escalation must be disabled, all Linux capabilities must be dropped (only NET_BIND_SERVICE may be added back), and a seccomp profile must be set. Some of the workloads I had deployed did not comply. I fixed them one by one before raising the enforce level.\nWhat I left for later Not everything is perfect. There are things I know I should do that I have not gotten to yet.\nRootless K3s: you can run K3s entirely without root, which reduces the impact of a potential privilege escalation. I have looked into it and it has some limitations with certain network features, but for my use case it should work.\nImage verification with cosign: signing and verifying images before deploying them. I have a private registry for my own images and the third-party ones I use, but I am not verifying signatures yet.\nCIS Benchmark: K3s has official documentation for the CIS K3s benchmark. 
I have reviewed it partially but have not run kube-bench systematically to see what is still pending.\nWhat I learned The default K3s installation is surprisingly reasonable for what it is: a distribution designed to be easy to deploy. But \u0026ldquo;reasonable\u0026rdquo; is not the same as \u0026ldquo;hardened.\u0026rdquo;\nWhat surprised me most was how easy most of the changes turned out to be. K3s has first-class flags for many things that require manual configuration in kubeadm. The secrets-encryption, protect-kernel-defaults, the audit policy directly in the config file. It is not that hard if you sit down and read the documentation.\nThe biggest risk in a homelab is not that someone on the internet attacks you directly. It is that you have a bunch of services running on the same network as your personal life, and if one gets compromised, the blast radius can be larger than it seems. Worth taking seriously even if it is a toy environment.\n","date":"2023-09-14T00:00:00Z","permalink":"/en/p/how-i-hardened-my-k3s-homelab/","title":"How I hardened my K3s homelab"},{"content":"The Attack Surface of Containers Containers give you process isolation, not security isolation. A misconfigured container can expose the host kernel, leak secrets, or become a pivot point for lateral movement. Understand the attack surface first:\nContainer images - Vulnerable libraries, hardcoded credentials, unnecessary tools Container runtime - Vulnerabilities that allow breakouts Orchestrator misconfigurations (Kubernetes RBAC, network policies) - Expose services, grant excessive permissions Supply chain - Compromised base images or dependencies Secrets - Baked into images or environment variables that any process can access Security requires defense in depth across the entire lifecycle: build, ship, run.\nImage security Use minimal base images Every additional package in a container image is a potential vulnerability. 
Prefer minimal base images:\n1 2 3 4 # Prefer distroless or Alpine over full distributions FROM gcr.io/distroless/static-debian11 # or FROM alpine:3.18 Distroless images contain only your application and its runtime dependencies \u0026mdash; no shell, no package manager, no unnecessary binaries. This dramatically reduces the attack surface.\nMulti-stage builds Multi-stage builds let you compile in one stage and copy only the final artifact to a minimal runtime image:\n1 2 3 4 5 6 7 8 9 10 11 12 13 # Build stage FROM golang:1.21 AS builder WORKDIR /app COPY go.mod go.sum ./ RUN go mod download COPY . . RUN CGO_ENABLED=0 go build -o /app/server # Runtime stage FROM gcr.io/distroless/static-debian11 COPY --from=builder /app/server /server USER nonroot:nonroot ENTRYPOINT [\u0026#34;/server\u0026#34;] The build tools, source code, and intermediate files never make it into the final image.\nNever run as root Running containers as root means a container escape gives the attacker root on the host. Always specify a non-root user:\n1 2 3 # Create a non-root user RUN addgroup -S appgroup \u0026amp;\u0026amp; adduser -S appuser -G appgroup USER appuser In Kubernetes, enforce this with Pod Security Standards (more on this below).\nSecure vs. insecure Dockerfile comparison Here is a side-by-side comparison of common mistakes versus secure practices:\n1 2 3 4 5 6 7 8 # INSECURE - Do NOT do this FROM ubuntu:latest RUN apt-get update \u0026amp;\u0026amp; apt-get install -y curl wget vim COPY . /app RUN echo \u0026#34;DB_PASSWORD=supersecret\u0026#34; \u0026gt; /app/.env EXPOSE 22 80 443 USER root CMD [\u0026#34;python3\u0026#34;, \u0026#34;/app/main.py\u0026#34;] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 # SECURE - Follow this pattern FROM python:3.11-slim AS builder WORKDIR /app COPY requirements.txt . 
RUN pip install --no-cache-dir --user -r requirements.txt FROM python:3.11-slim RUN groupadd -r appgroup \u0026amp;\u0026amp; useradd -r -g appgroup appuser WORKDIR /app COPY --from=builder /root/.local /home/appuser/.local COPY --chown=appuser:appgroup . . ENV PATH=/home/appuser/.local/bin:$PATH EXPOSE 8080 USER appuser # python:3.11-slim ships without curl, so the check uses the standard library HEALTHCHECK --interval=30s --timeout=3s CMD python3 -c \u0026#34;import urllib.request; urllib.request.urlopen(\u0026#39;http://localhost:8080/health\u0026#39;, timeout=2)\u0026#34; || exit 1 CMD [\u0026#34;python3\u0026#34;, \u0026#34;main.py\u0026#34;] Key differences: minimal base image, multi-stage build, no secrets in the image, non-root user, only necessary ports exposed, health check included.\nImage scanning with Trivy Trivy is an open-source vulnerability scanner that checks container images, filesystems, and IaC configurations. Integrate it into your CI pipeline to catch vulnerabilities before deployment.\nBasic image scan # Scan an image for vulnerabilities trivy image python:3.11-slim # Scan with severity filter trivy image --severity HIGH,CRITICAL myapp:latest # Fail CI if critical vulnerabilities are found trivy image --exit-code 1 --severity CRITICAL myapp:latest Scanning in CI/CD Add Trivy to your pipeline so that no image with critical or high-severity vulnerabilities reaches production:\n# Example GitHub Actions step - name: Run Trivy vulnerability scanner uses: aquasecurity/trivy-action@master with: image-ref: \u0026#39;myapp:${{ github.sha }}\u0026#39; format: \u0026#39;table\u0026#39; exit-code: \u0026#39;1\u0026#39; severity: \u0026#39;CRITICAL,HIGH\u0026#39; Scanning IaC and filesystems Trivy goes beyond container images:\n# Scan Kubernetes manifests for misconfigurations trivy config ./k8s-manifests/ # Scan a filesystem for secrets and vulnerabilities trivy fs --scanners vuln,secret ./ Runtime security with Falco While scanning catches vulnerabilities at build time, Falco monitors containers at runtime. 
It uses kernel-level instrumentation to detect suspicious behavior:\nUnexpected shell spawns inside containers. Processes reading sensitive files (/etc/shadow, /etc/passwd). Outbound network connections to unexpected destinations. Privilege escalation attempts. Example Falco rules 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 - rule: Terminal shell in container desc: Detect a shell being spawned in a container condition: \u0026gt; spawned_process and container and proc.name in (bash, sh, zsh, dash) output: \u0026gt; Shell spawned in container (user=%user.name container=%container.name shell=%proc.name parent=%proc.pname) priority: WARNING - rule: Read sensitive file in container desc: Detect reads of sensitive files condition: \u0026gt; open_read and container and fd.name in (/etc/shadow, /etc/passwd) output: \u0026gt; Sensitive file read in container (user=%user.name file=%fd.name container=%container.name) priority: ERROR Deploy Falco as a DaemonSet in your Kubernetes cluster to get visibility into runtime behavior across all nodes.\nKubernetes security Pod Security Standards Kubernetes Pod Security Standards define three levels of restriction:\nPrivileged \u0026mdash; Unrestricted (for system-level workloads only). Baseline \u0026mdash; Prevents known privilege escalations. Restricted \u0026mdash; Heavily restricted, following security best practices. Apply them at the namespace level:\n1 2 3 4 5 6 7 8 apiVersion: v1 kind: Namespace metadata: name: production labels: pod-security.kubernetes.io/enforce: restricted pod-security.kubernetes.io/audit: restricted pod-security.kubernetes.io/warn: restricted NetworkPolicies By default, all pods in Kubernetes can communicate with each other. 
NetworkPolicies let you restrict traffic:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: api-allow namespace: production spec: podSelector: matchLabels: app: api policyTypes: - Ingress - Egress ingress: - from: - podSelector: matchLabels: app: frontend ports: - protocol: TCP port: 8080 egress: - to: - podSelector: matchLabels: app: database ports: - protocol: TCP port: 5432 This policy ensures the API pod only receives traffic from the frontend and only sends traffic to the database.\nRBAC Follow the principle of least privilege. Avoid giving cluster-admin to service accounts. Create specific roles:\n1 2 3 4 5 6 7 8 9 apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: namespace: production name: deployment-manager rules: - apiGroups: [\u0026#34;apps\u0026#34;] resources: [\u0026#34;deployments\u0026#34;] verbs: [\u0026#34;get\u0026#34;, \u0026#34;list\u0026#34;, \u0026#34;watch\u0026#34;, \u0026#34;update\u0026#34;, \u0026#34;patch\u0026#34;] Regularly audit RBAC bindings with tools like kubectl-who-can or rbac-tool.\nSecrets management Never store secrets in container images, environment variables in plain Dockerfiles, or version control. Instead:\nUse Kubernetes Secrets (encrypted at rest with KMS). Use external secrets managers: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault. Use the External Secrets Operator to sync secrets from external providers into Kubernetes. Rotate secrets regularly and audit access. 
Supply chain security Signed images Sign your container images with cosign (part of the Sigstore project) and verify signatures before deployment:\n# Sign an image cosign sign --key cosign.key myregistry/myapp:v1.0 # Verify a signature cosign verify --key cosign.pub myregistry/myapp:v1.0 SBOM (Software Bill of Materials) Generate an SBOM for every image so you can quickly check whether a newly disclosed CVE affects your running containers:\n# Generate SBOM with Trivy trivy image --format spdx-json --output sbom.json myapp:latest # Or use syft syft myapp:latest -o spdx-json \u0026gt; sbom.json Container security checklist Use this checklist to assess your container security posture:\nBase images are minimal (distroless or Alpine). Multi-stage builds are used to exclude build tools. Containers run as non-root users. Images are scanned for vulnerabilities in CI/CD. No secrets are stored in images or plain environment variables. Kubernetes Pod Security Standards are enforced. NetworkPolicies restrict pod-to-pod communication. RBAC follows least privilege. Runtime security monitoring is in place (Falco or equivalent). Images are signed and signatures are verified before deployment. SBOMs are generated and stored for all production images. Secrets are managed through an external secrets manager. Image pull policies are set to Always for mutable tags. Regular security audits and penetration tests are conducted. Conclusion Container security is an ongoing practice, not a one-time task. Start with basics: minimal images, non-root users, scanning. Then layer on runtime monitoring, network policies, supply chain verification, and automated enforcement. 
Each layer shrinks the blast radius and moves you closer to actual security.\n","date":"2023-07-14T10:00:00+01:00","permalink":"/en/p/container-security-best-practices/","title":"Container security best practices"},{"content":"What Is MLOps and Why It Matters MLOps is about deploying and maintaining ML models reliably and efficiently in production. It bridges data science experiments and production engineering. Without it, you hit the same problems repeatedly: models that work in notebooks but fail in production, no way to reproduce results, painful handoffs between teams, and zero visibility into how models perform once deployed.\nThe idea is simple: treat ML systems with the same rigor as software. Use version control, automated testing, continuous delivery, and monitoring. Just acknowledge that data and models introduce unique challenges.\nThe ML Lifecycle Before diving into patterns, understand the stages every ML system goes through:\nData ingestion and validation - Collect, clean, and validate input data Feature engineering - Transform raw data into features the model can use Model training - Run experiments, tune hyperparameters, pick algorithms Model evaluation - Test model quality against held-out data and business metrics Model deployment - Serve predictions in production (batch or real-time) Monitoring and feedback - Track performance, detect drift, retrain when needed Each stage has failure modes, and the patterns below help prevent them.\nKey design patterns Feature store A feature store is a centralized repository for storing, sharing, and serving ML features. Instead of each team recomputing features from scratch, a feature store provides:\nConsistency between training and serving (avoiding training-serving skew). Reusability across teams and models. Point-in-time correctness for historical feature values. Tools like Feast, Tecton, and Hopsworks implement this pattern. 
If you find multiple teams duplicating feature pipelines, a feature store is likely worth the investment.\nModel registry A model registry acts as a versioned catalog for trained models. It stores model artifacts, metadata (hyperparameters, metrics, training data version), and lifecycle stage (staging, production, archived).\nMLflow Model Registry is one of the most widely adopted solutions. It lets you promote models through stages with approval workflows and track lineage from experiment to production.\nCT/CI/CD for ML Traditional CI/CD pipelines build and deploy code. ML pipelines need three loops:\nContinuous Training (CT) \u0026mdash; Automatically retrain models when data changes or performance degrades. Continuous Integration (CI) \u0026mdash; Validate not just code but also data schemas, feature expectations, and model quality thresholds. Continuous Delivery (CD) \u0026mdash; Deploy validated models to serving infrastructure automatically. A typical pipeline trigger might be: new data lands in the data lake, CT kicks off retraining, CI runs validation tests, and CD pushes the model to production if all checks pass.\nA/B testing A/B testing for models means routing a percentage of traffic to a new model while the rest continues hitting the current production model. You measure business metrics (conversion rate, click-through, revenue) rather than just ML metrics (accuracy, F1). This pattern is essential because a model that scores well offline can still perform poorly in production due to feedback loops, latency, or distribution differences.\nShadow deployment In shadow mode, the new model receives production traffic and generates predictions, but those predictions are not served to users. Instead, they are logged alongside the current model\u0026rsquo;s predictions for offline comparison. 
This is a low-risk way to validate a model on real traffic before exposing it to users.\nCanary releases for models Similar to canary deployments in software, you roll out a new model to a small fraction of traffic (say 5%), monitor key metrics, and gradually increase traffic if everything looks healthy. If metrics degrade, you roll back automatically. This combines well with A/B testing but focuses more on risk mitigation than experimentation.\nTooling overview Tool Primary Use Key Strength MLflow Experiment tracking, model registry Flexible, vendor-neutral Kubeflow End-to-end ML pipelines on Kubernetes Scalable, cloud-native DVC Data and model versioning Git-like workflow for data Weights \u0026amp; Biases Experiment tracking, visualization Excellent UI and collaboration Feast Feature store Open-source, production-ready Seldon Core Model serving on Kubernetes Advanced deployment strategies There is no single tool that covers everything. Most production setups combine several of these, choosing based on team expertise and infrastructure constraints.\nExample: MLflow experiment tracking Here is a minimal example of tracking an experiment with MLflow:\nimport mlflow import mlflow.sklearn from sklearn.datasets import load_iris from sklearn.ensemble import RandomForestClassifier from sklearn.metrics import accuracy_score, f1_score from sklearn.model_selection import train_test_split # Example data (an assumption for illustration; substitute your own X and y) X, y = load_iris(return_X_y=True) # Start an MLflow run with mlflow.start_run(run_name=\u0026#34;rf-baseline\u0026#34;): # Log parameters n_estimators = 100 max_depth = 10 mlflow.log_param(\u0026#34;n_estimators\u0026#34;, n_estimators) mlflow.log_param(\u0026#34;max_depth\u0026#34;, max_depth) # Train model model = RandomForestClassifier( n_estimators=n_estimators, max_depth=max_depth, random_state=42 ) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) model.fit(X_train, y_train) # Log metrics predictions = model.predict(X_test) 
mlflow.log_metric(\u0026#34;accuracy\u0026#34;, accuracy_score(y_test, predictions)) mlflow.log_metric(\u0026#34;f1_score\u0026#34;, f1_score(y_test, predictions, average=\u0026#34;weighted\u0026#34;)) # Log model artifact mlflow.sklearn.log_model(model, \u0026#34;random-forest-model\u0026#34;) Every run is tracked with its parameters, metrics, and artifacts, making it straightforward to compare experiments and reproduce results.\nAnti-patterns to avoid No versioning of data or models. If you cannot reproduce a training run from six months ago, you have a problem. Version everything: code, data, configuration, and model artifacts.\nTraining-serving skew. When the feature computation logic differs between training and serving, predictions silently degrade. A feature store or shared feature computation library helps eliminate this.\nManual deployment. Copy-pasting model files to a server is a recipe for incidents. Automate deployment through pipelines with proper validation gates.\nIgnoring model monitoring. Models degrade over time as input distributions shift. Without monitoring, you only discover this when a user complains or a business metric drops. Set up alerts for prediction distribution changes, latency, and data quality.\nMonolithic pipelines. A single pipeline that does everything from data ingestion to model serving is fragile and hard to debug. Break pipelines into modular, independently testable stages.\nOver-engineering too early. Not every ML project needs Kubeflow and a feature store on day one. Start simple, identify bottlenecks, and adopt patterns as the complexity of your system grows.\nMLOps maturity levels Organizations typically progress through several maturity levels:\nLevel 0: manual Models trained in notebooks. Manual deployment (file copy, manual API restart). No experiment tracking. No monitoring. Level 1: ML pipeline automation Automated training pipelines. Experiment tracking with tools like MLflow. 
Basic model validation before deployment. Some monitoring of model predictions. Level 2: CI/CD for ML Automated testing of data, features, and model quality. Continuous training triggered by data changes or schedule. Automated deployment with canary or shadow releases. Comprehensive monitoring with alerting and automated rollback. Level 3: Full MLOps Feature store for consistent feature management. Model registry with governance and approval workflows. A/B testing integrated into the deployment process. Data and model lineage tracked end-to-end. Self-healing pipelines that detect and respond to drift automatically. Most teams are somewhere between Level 0 and Level 1. The goal is not to jump to Level 3 immediately but to progress incrementally, addressing the most painful bottlenecks first.\nConclusion MLOps is about applying engineering patterns to ML\u0026rsquo;s unique challenges. Start with experiment tracking and basic automation, then add feature stores, model registries, and advanced deployment strategies as you scale. The key: treat models like first-class production artifacts. Version them, test them, monitor them, improve them.\n","date":"2023-03-22T10:00:00+01:00","permalink":"/en/p/mlops-pipeline-design-patterns/","title":"MLOps pipeline design patterns"},{"content":"What is AIOps? AIOps (Artificial Intelligence for IT Operations) applies machine learning and data analytics to operational data (logs, metrics, events, traces) to automate and improve workflows. Gartner coined the term in 2017, but the idea is simple: use algorithms to handle the volume and complexity that humans can\u0026rsquo;t manage manually.\nIn practical terms, AIOps platforms ingest data from monitoring tools, APM systems, log aggregators, and event sources. They apply ML models to detect anomalies, correlate events, identify root causes, and in some cases trigger automated remediation. 
The goal is to reduce mean time to detection (MTTD) and mean time to resolution (MTTR) while freeing operations teams from alert fatigue.\nWhy traditional monitoring falls short Monitoring used to work fine. You had a few servers, a handful of apps, and a limited set of metrics to watch. A static CPU threshold or log regex was enough.\nModern infrastructure broke that model:\nScale: A medium Kubernetes cluster generates millions of metrics and logs per minute. You can\u0026rsquo;t humanly watch dashboards at that scale. Complexity: Microservices create tangled dependency graphs. One user request might touch dozens of services. Finding what caused a latency spike means correlating data across all of them. Dynamic environments: Auto-scaling, ephemeral containers, and serverless functions mean baselines constantly shift. Static thresholds explode with false positives. Alert fatigue: Teams get buried in alerts. When 90% is noise, that critical 10% disappears. Engineers start ignoring everything. AIOps doesn\u0026rsquo;t replace monitoring. It layers on top of what you already have and makes it smarter.\nKey capabilities 1. Anomaly detection Instead of static thresholds, AIOps uses ML models (often time-series analysis, clustering, or autoencoders) to learn what \u0026ldquo;normal\u0026rdquo; looks like for each metric and service. When behavior deviates significantly from the learned baseline, an anomaly is flagged.\nThis handles the dynamic baseline problem. If your application normally sees a traffic spike every Monday at 9 AM, the model learns that pattern and does not alert on it. But an unexpected spike at 3 AM on a Wednesday gets flagged.\n2. Event correlation A single infrastructure issue can generate hundreds or thousands of related alerts across different monitoring tools. 
AIOps correlates these events — grouping them by time, topology, and causal relationships — to present a single incident instead of a wall of alerts.\nFor example, a network switch failure might trigger alerts on: the switch itself, all connected servers (connectivity lost), all applications on those servers (health check failures), and downstream services (timeout errors). An AIOps platform correlates all of these into one incident: \u0026ldquo;Network switch X failed.\u0026rdquo;\n3. Root cause analysis Beyond correlation, AIOps attempts to identify the root cause of an incident. By understanding the topology of your infrastructure and the causal chain of events, it can suggest that the network switch failure is the root cause, rather than presenting the application timeout as an independent issue.\nThis is where the value becomes tangible. Instead of an on-call engineer spending 30 minutes tracing through dashboards and logs, the platform surfaces the probable root cause immediately.\n4. Auto-remediation The most mature AIOps implementations close the loop by triggering automated remediation actions. If a known pattern is detected (disk filling up, a pod in CrashLoopBackOff, a runaway process consuming memory), the platform can execute predefined runbooks automatically.\nExamples:\nRestart a crashed pod or service. Scale up a deployment when anomalous load is detected. Clear a log directory when disk usage exceeds a dynamic threshold. Trigger a failover when a primary database becomes unresponsive. Auto-remediation requires careful design. 
Start with low-risk actions and expand as confidence grows.\nCommon platforms and tools The AIOps landscape includes both commercial platforms and open-source building blocks:\nCommercial platforms Platform Strengths Dynatrace Strong auto-discovery, AI engine (Davis), full-stack observability Datadog Unified monitoring + ML-powered alerting, Watchdog anomaly detection Splunk ITSI Powerful log analytics + ML toolkit, good for event correlation Moogsoft Pioneered AIOps space, strong event correlation and noise reduction BigPanda Event correlation and automation focused, integrates with existing tools PagerDuty Incident management with ML-driven noise reduction and smart grouping Open-source building blocks You can assemble an AIOps-like stack from open-source components:\nData collection: Prometheus, Grafana Agent, OpenTelemetry Collector, Fluentd/Fluent Bit. Data storage: Prometheus (metrics), Elasticsearch/OpenSearch (logs), Jaeger/Tempo (traces). Anomaly detection: Facebook Prophet, Isolation Forest (scikit-learn), luminol, Grafana ML. Event correlation: Custom logic on top of event streams, or StackStorm for event-driven automation. Alerting and automation: Alertmanager, Grafana OnCall, StackStorm, Rundeck. Building a custom AIOps stack is significantly more work than using a commercial platform, but it gives you full control and avoids vendor lock-in. A reasonable middle ground is using a commercial platform for core AIOps capabilities while keeping your data pipeline open-source.\nPractical use cases Noise reduction in alert management A team receiving 500+ alerts per day implements AIOps event correlation. Related alerts are grouped into incidents, duplicates are suppressed, and flapping alerts are silenced. Alert volume drops by 80%, and the on-call engineer can focus on actual incidents.\nProactive capacity planning AIOps models analyze historical resource usage trends and predict when capacity limits will be reached. 
Instead of reacting to a disk-full alert at 2 AM, the platform predicts the issue two weeks in advance and creates a ticket for the team to address during business hours.\nFaster incident response During a production outage, the AIOps platform correlates alerts across the monitoring stack, identifies the root cause (a recent deployment that introduced a memory leak), and surfaces the relevant deployment commit. MTTR drops from 45 minutes to 10 minutes.\nAutomated scaling The platform detects anomalous traffic patterns that deviate from the learned baseline. Instead of waiting for CPU to hit 80% (the static threshold), it triggers a scale-up action based on the rate of change, ensuring capacity is ready before users experience degradation.\nHow AIOps fits into DevOps workflows AIOps is not a replacement for DevOps practices. It is an enhancement layer:\nCode ──\u0026gt; CI/CD Pipeline ──\u0026gt; Deploy ──\u0026gt; Observe ──\u0026gt; AIOps Layer ──\u0026gt; Act. The Observe stage is the monitoring stack (metrics, logs, traces, events); the AIOps layer is the ML models (anomaly detection, correlation, RCA). Developers benefit from faster root cause identification when their code causes issues in production. Operations teams benefit from noise reduction, automated remediation, and proactive alerting. SRE teams benefit from data-driven SLO tracking and error budget burn rate analysis. AIOps works best when your observability foundation is solid. If you are not collecting good data (structured logs, meaningful metrics, distributed traces), ML models will not produce meaningful insights. Fix your observability first, then layer AIOps on top.\nGetting started: A pragmatic path If AIOps sounds useful, here\u0026rsquo;s a practical approach:\nAudit your current observability stack. What data are you collecting? Do you have structured logs? Consistently labeled metrics? Traces across services? AIOps can only work with good data.\nStart with noise reduction. This is the lowest-hanging fruit. 
Implement alert grouping and deduplication. Even basic rules-based correlation (before any ML) will reduce alert fatigue significantly.\nAdd anomaly detection to key metrics. Pick 3-5 critical business and infrastructure metrics. Apply a time-series anomaly detection model. Facebook Prophet or Prometheus recording rules with seasonal adjustments are good starting points.\nImplement automated remediation for known issues. Identify the top 5 recurring incidents. Write runbooks for them. Automate the runbooks using StackStorm, Rundeck, or your platform\u0026rsquo;s automation engine.\nEvaluate a commercial platform when complexity demands it. If you have hundreds of services, multiple monitoring tools, and a growing operations team, the investment in a commercial AIOps platform may be justified by the reduction in MTTR alone.\nMeasure the impact. Track MTTD, MTTR, alert-to-incident ratio, and false positive rate. Without metrics, you can\u0026rsquo;t prove AIOps is worth the investment.\nAIOps isn\u0026rsquo;t magic. It\u0026rsquo;s a set of techniques that, applied to solid operational data, can reduce the burden on ops teams and improve reliability. Start small, measure everything, and scale what actually works.\n","date":"2022-12-05T00:00:00Z","permalink":"/en/p/introduction-to-aiops-intelligent-it-operations/","title":"Introduction to AIOps: intelligent IT operations"},{"content":"Why infrastructure as code matters Managing infrastructure manually through web consoles or ad-hoc scripts creates problems that pile up over time: inconsistent environments, undocumented changes, impossible rollbacks, and the classic \u0026ldquo;it works on my machine\u0026rdquo; extended to entire servers.\nInfrastructure as Code (IaC) fixes this by treating infrastructure like application code: it\u0026rsquo;s written, versioned, reviewed, tested, and applied through automated workflows. 
The benefits show up right away:\nReproducibility: Spin up identical environments in minutes, not days. Version control: Every infrastructure change goes through a PR with code review. Documentation by default: The code is the documentation of what your infrastructure looks like. Disaster recovery: Rebuild everything from code if a region goes down. Cost visibility: Review infrastructure changes before they are applied (and before they start costing money). Terraform vs other tools Several IaC tools exist. Here\u0026rsquo;s how Terraform compares to the main alternatives:\nFeature Terraform Pulumi CloudFormation Ansible Language HCL (declarative) Python, TypeScript, Go, etc. JSON/YAML YAML (procedural) Cloud support Multi-cloud Multi-cloud AWS only Multi-cloud (via modules) State management Explicit state file Managed by Pulumi service Managed by AWS Stateless Learning curve Moderate Varies by language Moderate Low Ecosystem Huge provider ecosystem Growing AWS-only but deep Huge role ecosystem Best for Multi-cloud infra Teams that prefer general-purpose languages AWS-only shops Configuration management Terraform\u0026rsquo;s sweet spot is multi-cloud infrastructure provisioning with a declarative approach. If you\u0026rsquo;re on AWS only and want tight integration, CloudFormation is reasonable. If your team prefers writing Python over HCL, Pulumi deserves a look. But for most teams managing infrastructure across providers, Terraform is the pragmatic choice.\nCore concepts Providers Providers are plugins that let Terraform interact with APIs: AWS, Azure, GCP, Kubernetes, GitHub, Cloudflare, and hundreds more.\nterraform { required_providers { aws = { source = \u0026#34;hashicorp/aws\u0026#34; version = \u0026#34;~\u0026gt; 5.0\u0026#34; } } } provider \u0026#34;aws\u0026#34; { region = \u0026#34;eu-west-1\u0026#34; } Resources Resources are the fundamental building blocks. 
Each resource block describes one infrastructure object.\nresource \u0026#34;aws_instance\u0026#34; \u0026#34;web\u0026#34; { ami = \u0026#34;ami-0c55b159cbfafe1f0\u0026#34; instance_type = \u0026#34;t3.micro\u0026#34; tags = { Name = \u0026#34;web-server\u0026#34; } } State Terraform maintains a state file that maps your configuration to real-world resources. This is how Terraform knows what exists, what needs to change, and what to destroy. The state file is critical. Losing it means Terraform loses track of your infrastructure.\nModules Modules are reusable packages of Terraform configuration. Think of them as functions: they take inputs (variables), create resources, and produce outputs.\nmodule \u0026#34;vpc\u0026#34; { source = \u0026#34;terraform-aws-modules/vpc/aws\u0026#34; version = \u0026#34;5.1.0\u0026#34; name = \u0026#34;my-vpc\u0026#34; cidr = \u0026#34;10.0.0.0/16\u0026#34; azs = [\u0026#34;eu-west-1a\u0026#34;, \u0026#34;eu-west-1b\u0026#34;] private_subnets = [\u0026#34;10.0.1.0/24\u0026#34;, \u0026#34;10.0.2.0/24\u0026#34;] public_subnets = [\u0026#34;10.0.101.0/24\u0026#34;, \u0026#34;10.0.102.0/24\u0026#34;] enable_nat_gateway = true } Practical example: VPC + EC2 Here\u0026rsquo;s a complete example that provisions a VPC with a public subnet and an EC2 instance:\nterraform { required_version = \u0026#34;\u0026gt;= 1.5.0\u0026#34; required_providers { aws = { source = \u0026#34;hashicorp/aws\u0026#34; version = \u0026#34;~\u0026gt; 5.0\u0026#34; } } } provider \u0026#34;aws\u0026#34; { region = \u0026#34;eu-west-1\u0026#34; } # --- Networking --- resource 
\u0026#34;aws_vpc\u0026#34; \u0026#34;main\u0026#34; { cidr_block = \u0026#34;10.0.0.0/16\u0026#34; enable_dns_support = true enable_dns_hostnames = true tags = { Name = \u0026#34;main-vpc\u0026#34; } } resource \u0026#34;aws_subnet\u0026#34; \u0026#34;public\u0026#34; { vpc_id = aws_vpc.main.id cidr_block = \u0026#34;10.0.1.0/24\u0026#34; availability_zone = \u0026#34;eu-west-1a\u0026#34; map_public_ip_on_launch = true tags = { Name = \u0026#34;public-subnet\u0026#34; } } resource \u0026#34;aws_internet_gateway\u0026#34; \u0026#34;gw\u0026#34; { vpc_id = aws_vpc.main.id tags = { Name = \u0026#34;main-igw\u0026#34; } } resource \u0026#34;aws_route_table\u0026#34; \u0026#34;public\u0026#34; { vpc_id = aws_vpc.main.id route { cidr_block = \u0026#34;0.0.0.0/0\u0026#34; gateway_id = aws_internet_gateway.gw.id } tags = { Name = \u0026#34;public-rt\u0026#34; } } resource \u0026#34;aws_route_table_association\u0026#34; \u0026#34;public\u0026#34; { subnet_id = aws_subnet.public.id route_table_id = aws_route_table.public.id } # --- Security Group --- resource \u0026#34;aws_security_group\u0026#34; \u0026#34;web\u0026#34; { name = \u0026#34;web-sg\u0026#34; description = \u0026#34;Allow HTTP and SSH\u0026#34; vpc_id = aws_vpc.main.id ingress { from_port = 80 to_port = 80 protocol = \u0026#34;tcp\u0026#34; cidr_blocks = [\u0026#34;0.0.0.0/0\u0026#34;] } ingress { from_port = 22 to_port = 22 protocol = \u0026#34;tcp\u0026#34; cidr_blocks = [\u0026#34;YOUR_IP/32\u0026#34;] # Restrict to your IP } egress { from_port = 0 to_port = 0 protocol = \u0026#34;-1\u0026#34; cidr_blocks = [\u0026#34;0.0.0.0/0\u0026#34;] } } # --- EC2 Instance --- resource \u0026#34;aws_instance\u0026#34; \u0026#34;web\u0026#34; { ami = \u0026#34;ami-0c55b159cbfafe1f0\u0026#34; instance_type = \u0026#34;t3.micro\u0026#34; subnet_id = aws_subnet.public.id vpc_security_group_ids = [aws_security_group.web.id] tags = { Name = \u0026#34;web-server\u0026#34; } } # --- Outputs --- output 
\u0026#34;instance_public_ip\u0026#34; { value = aws_instance.web.public_ip } output \u0026#34;vpc_id\u0026#34; { value = aws_vpc.main.id } The plan/apply workflow Terraform follows a predictable workflow:\n# 1. Initialize - download providers and modules terraform init # 2. Format - ensure consistent code style terraform fmt # 3. Validate - check syntax and configuration terraform validate # 4. Plan - preview what will change (critical step!) terraform plan -out=tfplan # 5. Apply - execute the plan terraform apply tfplan # 6. Destroy - tear down all resources (when needed) terraform destroy The terraform plan step is the most important. Never skip it. Always review the plan output before applying, especially in production. The plan shows you exactly what will be created, modified, or destroyed.\n# Example plan output Plan: 7 to add, 0 to change, 0 to destroy. In CI/CD pipelines, save the plan to a file (-out=tfplan) and apply that exact plan. This prevents race conditions where infrastructure changes between the plan and apply steps.\nState management best practices State management is where most Terraform problems originate. Follow these practices:\nUse a remote backend Never store state locally or in Git. Use a remote backend with encryption and locking:\nterraform { backend \u0026#34;s3\u0026#34; { bucket = \u0026#34;my-terraform-state\u0026#34; key = \u0026#34;prod/networking/terraform.tfstate\u0026#34; region = \u0026#34;eu-west-1\u0026#34; encrypt = true dynamodb_table = \u0026#34;terraform-locks\u0026#34; } } The DynamoDB table provides state locking. This prevents two people or pipelines from modifying the same infrastructure at the same time.\nOrganize state by component Don\u0026rsquo;t put all your infrastructure in one state file. 
Split by component or team:\nenvironments/ ├── prod/ │ ├── networking/ # VPC, subnets, routes │ ├── compute/ # EC2, ASGs, load balancers │ ├── database/ # RDS instances │ └── monitoring/ # CloudWatch, alerts └── staging/ ├── networking/ ├── compute/ └── database/ Smaller state files mean faster plans, smaller blast radius, and fewer teams competing for locks.\nUse terraform_remote_state sparingly You can reference outputs from other state files, but use it carefully. Over-reliance on remote state creates tight coupling between components. Prefer passing values through variables or a parameter store.\nTips for production use Pin provider versions. Use ~\u0026gt; constraints to allow minor and patch updates within a major version while blocking breaking major-version upgrades: version = \u0026quot;~\u0026gt; 5.0\u0026quot; accepts any 5.x release. To allow patch updates only, pin the minor version as well, for example \u0026quot;~\u0026gt; 5.0.0\u0026quot;.\nUse workspaces carefully. Workspaces are useful for simple environment separation but get confusing at scale. Separate directories per environment are usually clearer.\nImplement a CI/CD pipeline for Terraform. Run terraform plan on PRs and post the output as a PR comment. Run terraform apply only after merge and approval.\nUse prevent_destroy for critical resources. This lifecycle rule stops accidental destruction of databases or persistent storage:\nresource \u0026#34;aws_db_instance\u0026#34; \u0026#34;main\u0026#34; { # ... lifecycle { prevent_destroy = true } } Tag everything. Use a default_tags block in the provider to ensure every resource gets standard tags (environment, team, project).\nUse tflint and checkov. Lint your Terraform code and scan for security misconfigurations before applying.\ntflint --init tflint checkov -d . Import existing resources. If you have manually created infrastructure, use terraform import to bring it under management instead of recreating it.\nReview the plan diff carefully. A resource showing \u0026ldquo;destroy and recreate\u0026rdquo; might cause downtime. 
Understand which changes are in-place versus destructive.\nTerraform is one of those tools that rewards discipline. The more consistently you follow these practices, the more confidently your team manages infrastructure at scale.\n","date":"2022-09-10T00:00:00Z","permalink":"/en/p/infrastructure-as-code-with-terraform-a-practical-guide/","title":"Infrastructure as code with Terraform: a practical guide"},{"content":"What is shift-left security? Shift-left security means moving security practices earlier in the software development lifecycle. Rather than treating security as a final gate (or worse, an afterthought) before production, you embed security checks directly into your CI/CD pipelines. This catches vulnerabilities when they\u0026rsquo;re cheapest to fix: at development time.\nThe traditional \u0026ldquo;build it, then audit it\u0026rdquo; model doesn\u0026rsquo;t scale. Modern applications ship dozens of times a day. If your security review is a manual quarterly process, you\u0026rsquo;re deploying vulnerable code most of the time.\nThreat vectors in CI/CD pipelines Before hardening your pipeline, understand what you\u0026rsquo;re defending against. CI/CD systems are high-value targets because they have access to production credentials, source code, and build artifacts.\nCommon threat vectors include:\nCompromised dependencies: A malicious package in your dependency tree (supply chain attack). Leaked secrets: API keys, tokens, or passwords committed to source code or exposed in build logs. Code vulnerabilities: SQL injection, XSS, insecure deserialization, and other OWASP Top 10 issues introduced in your application code. Insecure pipeline configuration: Overly permissive runner access, unprotected branch rules, or missing approval gates. Container image vulnerabilities: Base images with known CVEs that end up in production. Build artifact tampering: Unsigned or unverified artifacts that attackers can replace. 
Integrating SAST into your pipeline Static Application Security Testing (SAST) analyzes source code without executing it. It catches vulnerabilities early, before code even runs.\nRecommended tools Semgrep: Fast, flexible, and supports custom rules. Works across many languages. My top recommendation for most teams. Bandit: Python-specific. Excellent for Python projects because it understands Python-specific security patterns. SonarQube: Broader scope (code quality and security combined). Heavier to set up but useful if you want a single dashboard. Running Semgrep locally 1 2 3 4 5 6 7 8 # Install pip install semgrep # Run with default rulesets semgrep --config auto . # Run with OWASP Top 10 rules specifically semgrep --config \u0026#34;p/owasp-top-ten\u0026#34; . Running Bandit for Python projects 1 2 pip install bandit bandit -r ./src -f json -o bandit-report.json The key is to run these tools on every pull request, not just on the main branch. Developers should see findings before code is merged.\nIntegrating DAST into your pipeline Dynamic Application Security Testing (DAST) tests the running application from the outside, simulating an attacker. It catches issues that SAST misses (misconfigurations, authentication flaws, and runtime vulnerabilities).\nOWASP ZAP OWASP ZAP is the standard open-source DAST tool. You can run it in your pipeline against a staging deployment:\n1 2 3 4 5 6 7 # Pull the ZAP Docker image docker pull ghcr.io/zaproxy/zaproxy:stable # Run a baseline scan against a target URL docker run -t ghcr.io/zaproxy/zaproxy:stable zap-baseline.py \\ -t https://staging.example.com \\ -r zap-report.html DAST is typically slower than SAST. Run it after deployment to a staging environment rather than on every commit. A nightly or per-release cadence works well for most teams.\nSecret scanning Leaked secrets are one of the most common and damaging security failures. 
Once a secret makes it into Git history, consider it compromised, even if you remove it in a later commit.\nTools gitleaks: Scans Git repositories for secrets using regex and entropy-based detection. trufflehog: Searches through Git history for high-entropy strings and known secret patterns. Running gitleaks 1 2 3 4 5 6 7 8 # Install brew install gitleaks # or download binary from GitHub releases # Scan the current repo gitleaks detect --source . --report-path gitleaks-report.json # Scan in CI (detect only new leaks in the PR) gitleaks detect --source . --log-opts=\u0026#34;origin/main..HEAD\u0026#34; Pre-commit hook Block secrets before they even reach the remote:\n1 2 3 4 5 6 # .pre-commit-config.yaml repos: - repo: https://github.com/gitleaks/gitleaks rev: v8.16.1 hooks: - id: gitleaks Dependency scanning Your application is only as secure as its weakest dependency. Supply chain attacks are increasing, so you need to scan your dependency tree regularly.\nTools Trivy: Scans container images, filesystems, and Git repos for vulnerabilities. Fast and comprehensive. Snyk: Commercial option with a generous free tier. Good developer experience. OWASP Dependency-Check: Mature, open-source, supports multiple ecosystems. 1 2 3 4 5 # Scan a container image with Trivy trivy image my-app:latest # Scan the filesystem for vulnerable dependencies trivy fs --severity HIGH,CRITICAL . 
GitHub Actions workflow with security stages Here is a practical GitHub Actions workflow that integrates all the security stages discussed above:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 name: Secure CI/CD Pipeline on: pull_request: branches: [main] push: branches: [main] jobs: secret-scan: name: Secret Scanning runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 with: fetch-depth: 0 - name: Run gitleaks uses: gitleaks/gitleaks-action@v2 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} sast: name: Static Analysis runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Run Semgrep uses: returntocorp/semgrep-action@v1 with: config: \u0026gt;- p/security-audit p/owasp-top-ten - name: Run Bandit (Python) run: | pip install bandit bandit -r ./src -f json -o bandit-report.json || true - name: Upload SAST reports uses: actions/upload-artifact@v4 with: name: sast-reports path: \u0026#34;*.json\u0026#34; dependency-scan: name: Dependency Scanning runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Run Trivy filesystem scan uses: aquasecurity/trivy-action@master with: scan-type: fs severity: HIGH,CRITICAL exit-code: 1 build-and-scan-image: name: Build \u0026amp; Scan Image needs: [secret-scan, sast, dependency-scan] runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Build Docker image run: docker build -t my-app:${{ github.sha }} . 
- name: Scan image with Trivy uses: aquasecurity/trivy-action@master with: image-ref: my-app:${{ github.sha }} severity: HIGH,CRITICAL exit-code: 1 deploy-staging: name: Deploy to Staging needs: [build-and-scan-image] runs-on: ubuntu-latest environment: staging steps: - name: Deploy to staging run: echo \u0026#34;Deploy to staging environment\u0026#34; dast: name: Dynamic Analysis needs: [deploy-staging] runs-on: ubuntu-latest steps: - name: OWASP ZAP Baseline Scan uses: zaproxy/action-baseline@v0.9.0 with: target: \u0026#34;https://staging.example.com\u0026#34; Notice the deliberate ordering: secret scanning, SAST, and dependency scanning run in parallel (fast feedback). Image scanning runs after build. DAST runs after staging deployment.\nBest practices checklist Here is a checklist to assess the security maturity of your CI/CD pipeline:\nSecret scanning runs on every PR and blocks merge on findings. Pre-commit hooks prevent secrets from being committed locally. SAST runs on every PR with rules covering OWASP Top 10. Dependency scanning runs on every PR and flags HIGH/CRITICAL CVEs. Container image scanning runs before pushing images to a registry. DAST runs against staging on every release (or nightly at minimum). Branch protection requires passing security checks before merge. Least privilege is enforced on CI runner permissions and secrets access. Signed commits and signed artifacts are enforced for production releases. Build logs are reviewed to ensure secrets are not leaked in output. Dependency pinning is used (exact versions, not ranges) to prevent supply chain attacks. Regular rotation of all CI/CD secrets and service account keys. Final thoughts Shift-left security isn\u0026rsquo;t about buying a tool, it\u0026rsquo;s about changing the culture. Security checks in CI/CD should be fast, automated, and non-negotiable. 
Start with secret scanning (highest ROI), add SAST, then layer in dependency scanning and DAST.\nThe goal isn\u0026rsquo;t to catch everything in the pipeline. The goal is to raise the bar high enough that obvious issues never reach production, freeing your security team to focus on the subtle, high-impact threats that require human judgment.\n","date":"2022-06-20T00:00:00Z","permalink":"/en/p/ci/cd-pipeline-security-a-shift-left-approach/","title":"CI/CD pipeline security: a shift-left approach"},{"content":"This post presents several free and open-source Android applications that can be downloaded from F-Droid.\nF-Droid AntennaPod\nAurora Store\nFiles\nCalendar\nContacts\nF-Droid: repositories\nF-Droid Archive\nF-Droid\nGuardian Project Archive\nGuardian Project Official Releases\nPartido Interdimensional Pirata: https://fdroid.partidopirata.com.ar/fdroid/repo?fingerprint=3DF6969EA3A2186D8A5DB00884B3F42F164931E8CFAFD7CC48263CAD1361A1D5\narchive.newpipe.net/fdroid/repo: https://archive.newpipe.net/fdroid/repo?fingerprint=E2402C78F9B97C6C89E97DB914A2751FDA1D02FE2039CC0897A462BDB57E7501\nFedilab\nGallery\nGitFox\nGitNex\nRecorder\nHacker\u0026rsquo;s Keyboard\nIceCatMobile\nJami\nK-9 Mail\nLibreOffice Viewer\nLibrera\nMarkor\nMoneyBuster\nMusic\nNewPipe\nNextcloud\nOpenKeychain\nOpenTracks\nOSM Dashboard\nOsmAnd~\nProtonMail\nRiseUpVPN\nStandard Notes\nTutanota\nUntrackMe\nVLC\n","date":"2022-06-03T00:00:00Z","permalink":"/en/p/mobile-applications/","title":"Mobile Applications"},{"content":"What is GitOps? GitOps is an operational framework that takes DevOps best practices from application development (version control, collaboration, CI/CD) and applies them to infrastructure automation. At its core, GitOps treats Git as the single source of truth for both infrastructure and application configuration.\nRather than manually applying changes to your clusters or infrastructure, you declare the desired state in Git repositories. 
Automated agents then ensure your systems converge to that declared state. If something drifts, the system corrects itself automatically.\nThis isn\u0026rsquo;t just a trendy buzzword. GitOps fundamentally changes how teams operate Kubernetes environments, bringing real predictability, auditability, and deployment speed.\nCore Principles GitOps rests on four foundational principles. If your workflow does not honor all four, you are probably doing \u0026ldquo;Git-flavored ops\u0026rdquo; rather than true GitOps.\n1. Declarative Configuration The entire desired state of your system must be described declaratively. For Kubernetes, that means YAML manifests, Helm charts, or Kustomize overlays stored in Git. No imperative kubectl apply commands run from someone\u0026rsquo;s laptop.\n2. Versioned and Immutable Since Git is your source of truth, every change is versioned. You get a complete audit trail automatically (who changed what, when, and why). Rolling back is just reverting a commit. No more guessing about what the previous production state was.\n3. Pulled Automatically Approved changes are automatically pulled and applied by agents running inside the cluster. This is the crucial difference: instead of a CI pipeline pushing changes into the cluster (which means giving CI credentials to the cluster), the agent inside the cluster pulls changes from Git. This pull-based model significantly improves your security.\n4. Continuously Reconciled (Self-Healing) The agent continuously compares the desired state in Git with the actual state in the cluster. If someone manually modifies a resource (or if a node fails and resources are rescheduled), the agent detects the drift and reconciles it. 
The system is self-healing by design.\nGitOps Workflow Here\u0026rsquo;s a simplified text-based diagram of a typical GitOps workflow:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Developer ──\u0026gt; Git Push ──\u0026gt; Application Repo │ ▼ CI Pipeline (build, test, push image) │ ▼ Config Repo (update image tag in manifests) │ ▼ GitOps Agent (ArgoCD / Flux) watches Config Repo │ ▼ Kubernetes Cluster (desired state applied) │ ▼ Continuous Reconciliation (drift detection + self-healing) Key points:\nThe Application Repo contains your source code. CI builds and pushes container images. The Config Repo (sometimes called the \u0026ldquo;environment repo\u0026rdquo;) holds your Kubernetes manifests. CI or an automated process updates the image tag here after each successful build. The GitOps Agent watches the Config Repo and applies changes to the cluster. Separating application code from deployment configuration is a best practice because it keeps concerns isolated and allows independent versioning.\nArgoCD vs Flux: A Practical Comparison The two dominant GitOps tools for Kubernetes are ArgoCD and Flux. Both are CNCF projects and both are production-ready. Here\u0026rsquo;s an honest comparison based on hands-on experience.\nFeature ArgoCD Flux UI Rich web UI with visualization No built-in UI (use Weave GitOps or similar) Architecture Centralized server with API Decentralized, controller-based Multi-tenancy AppProjects with RBAC Namespace-scoped controllers Helm support Native rendering HelmRelease CRD with HelmController Kustomize Native Native Notifications Built-in notification controller Separate notification-controller Image automation Not built-in (use Argo Image Updater) Built-in image automation controllers Learning curve Lower (UI helps) Slightly steeper (CLI/CRD-first) GitOps Toolkit Monolithic Modular (pick what you need) When to choose ArgoCD Your team values a visual dashboard for cluster state. 
You need strong multi-tenancy with role-based access control. You want a lower barrier to entry for team members new to GitOps. When to choose Flux You prefer a modular, composable architecture. You need built-in image automation (automatically updating image tags in Git). You favor a CLI-first, infrastructure-as-code approach without relying on a UI. Honestly, both are solid choices. I lean toward ArgoCD for teams that need visibility and toward Flux for platform teams that prefer composability.\nArgoCD Application Manifest Example Here\u0026rsquo;s a minimal ArgoCD Application resource that deploys an application from a Git repository:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 apiVersion: argoproj.io/v1alpha1 kind: Application metadata: name: my-app namespace: argocd spec: project: default source: repoURL: https://github.com/my-org/my-app-config.git targetRevision: main path: overlays/production destination: server: https://kubernetes.default.svc namespace: my-app syncPolicy: automated: prune: true selfHeal: true syncOptions: - CreateNamespace=true Here\u0026rsquo;s what each important field does:\nsource.repoURL: Points to the Git repository containing your manifests. source.targetRevision: The branch or tag to track. source.path: The directory within the repo containing the manifests (useful with Kustomize overlays). destination.server: The target Kubernetes cluster API server. syncPolicy.automated: Enables automatic syncing. prune: true removes resources that are no longer in Git. selfHeal: true re-applies desired state when drift is detected. To apply this manifest after installing ArgoCD:\n1 kubectl apply -f application.yaml ArgoCD will immediately begin syncing the declared state to the cluster.\nBenefits for Teams Adopting GitOps isn\u0026rsquo;t just a technical improvement. It changes how your team actually works:\nFaster onboarding: New team members can understand the entire system state by reading the config repo. No tribal knowledge needed. 
Reliable rollbacks: Revert a Git commit, and the cluster follows. No need to remember the exact kubectl commands that were run. Improved security posture: Developers never need direct kubectl access to production. All changes go through Git PRs with code review. Audit compliance: Every change is a Git commit with author, timestamp, and rationale. This satisfies many compliance requirements out of the box. Reduced cognitive load: Developers focus on writing code and updating manifests. The GitOps agent handles the rest. Disaster recovery: If a cluster is destroyed, you can recreate the entire state from Git. The config repo is your backup. Getting Started If you\u0026rsquo;re new to GitOps, here\u0026rsquo;s a pragmatic path forward:\nStart with a non-critical environment. Set up ArgoCD or Flux in a staging cluster first. Migrate one application at a time. Don\u0026rsquo;t try to convert everything overnight. Establish a config repo convention early. Decide on directory structure, naming, and branching strategy before scaling. Use Kustomize or Helm for environment differences. Avoid copying manifests between staging/ and production/ directories. Set up notifications. Connect ArgoCD or Flux to Slack or your preferred channel so the team sees sync events. GitOps is one of those practices where the investment pays off quickly. Once your team experiences the confidence of knowing that Git reflects reality, there\u0026rsquo;s no going back.\n","date":"2022-03-15T00:00:00Z","permalink":"/en/p/gitops-principles-and-workflow-a-practical-guide/","title":"GitOps principles and workflow: a practical guide"},{"content":"This post covers installing and configuring a dual boot setup with multiple GNU/Linux distributions (Debian and ParrotOS) using rEFInd. If you use GRUB to boot multiple distributions and have never heard of rEFInd, then this post is for you. 
We followed the instructions from this article.\nImportant note: the installation described below was performed on a clean storage disk. If you have data you want to keep, it is strongly recommended to back up all data before installing on the disk.\nRequirements Clean storage disk. USB flashed with Debian and later with Parrot Review the basic concepts Basic Concepts Debian GNU/Linux is a free operating system, developed by thousands of volunteers from around the world who collaborate via the Internet.\nDebian\u0026rsquo;s dedication to free software, its volunteer base, its non-commercial nature, and its open development model distinguish it from other GNU operating system distributions1.\nLVM is an implementation of a logical volume manager for the Linux kernel. LVM includes many of the features expected from a volume manager, including:\nResizing of logical groups Resizing of logical volumes Read-only snapshots (LVM2 offers read and write) RAID0 of logical volumes. LVM does not implement RAID1 or RAID5, so it is recommended to use dedicated RAID software for these operations, placing the LVs on top of the RAID2. RAID will not be used in this configuration.\nLUKS is a disk encryption specification created by Clemens Fruhwirth, originally intended for Linux. While most disk encryption software implements different and incompatible undocumented formats, LUKS specifies a standard on-disk format, platform-independent, for use with various tools. This not only facilitates compatibility and interoperability between different programs, but also ensures that they all implement password management in a secure and documented manner. The reference implementation runs on Linux and is based on an enhanced version of cryptsetup, using dm-crypt as the disk encryption interface3.\nA boot loader loads an operating system kernel into memory and executes it. A boot manager hands over control to another boot program. GRUB is both a boot loader and a boot manager. 
rEFInd is only a boot manager.\nAnother fundamental concept is understanding the difference between EFI/UEFI and BIOS.\nIn the Partition Table, the ext4 format is used for partitions because it improves I/O speed and uses less CPU than the ext3 and ext2 formats. 
The following minimum values are recommended:\nPartition | Recommended Size | Debian Allocation | Custom Allocation | Contains\n/ | \u0026gt;= 750MB | 22GB | 64GB | /etc, /bin, /sbin, /lib, /dev, /usr\n/usr | \u0026gt;= 4-6GB | 0 | 0 | User programs, libs and docs\n/var | \u0026gt;= 2-3GB | 32GB | 112GB | Variable data such as emails\n/tmp | \u0026gt;= 100MB | 16GB | 32GB | Web pages, package cache, temporary data\n/home | \u0026gt;= 100MB | 200GB | 288GB | Directory with Documents, Downloads, \u0026hellip;\n/boot | \u0026gt;= 256MB | 500MB | 512MB | Primary Partition, ext4 or ext2, encryption not recommended\n/boot/efi | \u0026gt;= 100MB | 250MB | 0 | Encryption not recommended and bootable flag: on\n/swap | \u0026gt;= 8GB | 16GB | 16GB | Swap area\nGParted Using GParted from a live USB, delete all partitions and create a new GPT (GUID Partition Table) partition table. GPT is a format used by EFI systems and is a modern alternative to the MBR (Master Boot Record) format used by BIOS systems.\nUEFI requires that each boot disk have a special partition called the EFI System Partition (ESP). The ESP is a simple FAT16 or FAT32 partition with the boot and esp partition flags. The ESP stores EFI executable files; although they are smaller than 100MB, some operating systems require the partition to have a capacity of 500MB. Therefore, create a 550MB primary partition in FAT32 format, with the label ESP and efi as the partition name. Apply the changes and, in the manage flags option, select boot and esp.\nThe next step is to create two ext4 partitions for the Debian and ParrotOS operating systems. For example, you can allocate 250GB to Debian, 150GB to Parrot, and the rest can be a shared data partition. It is recommended to assign a corresponding label to each partition.\nDebian Installation Access the expert installation mode with the graphical interface and proceed to disk and partition detection. Create the following partitions:\n500MB for a shared EFI partition. 500MB for the Debian boot partition. 500MB for the ParrotOS boot partition. 
The remainder in a partition where the different operating systems will reside. First, create an encrypted volume on the partition labeled all-Operative-Systems, specifying that the partition should not be formatted or erased. Then create an LVM volume group and the following logical volumes:\nDebian Volumes: 8GB for SWAP. 250GB for root. 100GB for home. ParrotOS Volumes: 8GB for SWAP. 250GB for root. 100GB for home. Shared Data Volumes: The remainder for shared data. Assign the corresponding mount point to each Debian logical volume and finish the installation by choosing your preferred desktop environment.\nParrotOS Installation Since ParrotOS is a Debian-based distribution, the installation is the same as in the previous section. Access the expert graphical installation mode and proceed to the disk detection step. Since the operating systems partition is encrypted, you need to decrypt it and detect the LVM volume group. To do this, exit the disk detection section and go to the section for opening a terminal or shell. First, decrypt the encrypted partition with:\nNote: the /dev/sdaX partition must correspond to the encrypted one, and the name must be the one assigned as the label.\n1 cryptsetup luksOpen /dev/sdaX all-Operative-Systems Then detect the LVM volume group with:\n1 vgchange -a y Note: the following steps may not work on the first attempt and may need to be completed with the next section. You could skip the end of this section and install GRUB directly.\nOnce the above commands have been executed, continue with the installation until the GRUB installation. Open a terminal again and identify the UUID of the encrypted partition with:\n1 blkid /dev/sdaX Next, edit the /etc/crypttab file:\n1 nano /etc/crypttab Add the following content, where the UUID is the one obtained from the blkid command:\n1 all-Operative-Systems UUID=524c1ad6-fabe-4f32-9bb0-c8db1286b262 none luks Finish the installation and reboot the operating system. 
If everything works correctly, you are done. Most commonly, when trying to boot the operating system, it will not be able to open the encrypted partition or the encrypted volumes. In that case, it will drop to an initramfs terminal and you will need to follow the steps below.\nThe Encrypted Partition Does Not Open and an initramfs Terminal Appears If you get an initramfs terminal, you will need to repeat the steps to decrypt the partition and open the encrypted volume group as described above.\nOpen the encrypted partition with:\n1 cryptsetup luksOpen /dev/sdaX all-Operative-Systems Then detect the LVM volume group with:\n1 vgchange -a y To boot the system, simply use the following command:\n1 exit This will take you to the login screen; enter the credentials created during installation. Once the operating system has started, open a terminal and detect the UUID of the encrypted partition. The X in sdaX corresponds to the number of the encrypted partition; if you do not know it, simply use the blkid command.\n1 blkid /dev/sdaX Edit the /etc/crypttab file with nano:\n1 sudo nano /etc/crypttab Add the following:\n1 all-Operative-Systems UUID=524c1ad6-fabe-4f32-9bb0-c8db1286b262 none luks Once finished, use the following command to update initramfs:\n1 sudo update-initramfs -u Reboot the operating system with:\n1 sudo reboot rEFInd Installation Install rEFInd with the following command:\n1 sudo apt install refind Wikipedia, Debian\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nWikipedia, Logical Volume Manager (Linux)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nWikipedia, LUKS (Linux Unified Key Setup)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2022-01-06T00:00:00Z","permalink":"/en/p/installing-debian-and-parrotos-with-dual-boot-on-a-luks-encrypted-partition-with-lvm-volumes-and-refind/","title":"Installing Debian and ParrotOS with Dual Boot on a LUKS-encrypted Partition with LVM Volumes and 
rEFInd"},{"content":"This post describes the steps to configure Plymouth on Debian when all partitions are encrypted with LUKS, except for the /boot partition. The plymouth-themes repository is used as a reference.\nWhat is Plymouth? Plymouth is an application that starts very early in the boot process (even before the filesystem is mounted) and provides a graphical boot animation while the startup process runs in the background.\nIt is designed to work on systems that have DRM modesetting drivers. The idea is that very early in the boot process, native modesetting is configured, and Plymouth uses this mode. This mode must be maintained throughout the entire boot process, even after starting the X graphical server. The main purpose is to avoid flickering during the startup process 1.\nPlymouth Installation 1 sudo apt install plymouth GRUB Modification Edit the default GRUB configuration file:\n1 sudo nano /etc/default/grub Modify line 9 by adding splash:\n1 GRUB_CMDLINE_LINUX_DEFAULT=\u0026#34;quiet splash\u0026#34; Plymouth Themes Clone the themes repository:\n1 git clone https://github.com/adi1090x/plymouth-themes.git Navigate into one of the directories named pack_X, for example pack_3:\n1 cd plymouth-themes/pack_3 Select the theme you want, for example lone:\n1 sudo cp -r lone /usr/share/plymouth/themes/ Plymouth - Debian\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2021-11-25T13:21:56+01:00","permalink":"/en/p/plymouth-configuration/","title":"Plymouth Configuration"},{"content":"This post outlines the steps to install and configure a Raspberry Pi for deploying a series of services.\n1. OS Installation 1.1. Ubuntu, RaspberryPiOS or LibreElec Using the Raspberry Pi Imager tool, you can flash the image of Ubuntu, RaspberryPiOS, LibreElec, or whichever you prefer, onto any micro SD card or USB drive 1. 
For example, we will choose the Raspbian OS image with desktop and flash it onto a 64GB USB drive.\nAnother way to flash an image onto an SD card or a USB drive is by using the following commands:\n# List the partitions\nlsblk\n# Unmount the USB partition\numount /dev/sdc1\n# Format as vFAT\nmkfs.vfat -F 32 /dev/sdc -I\n# Flash the ISO onto the USB drive\ndd status=progress if=NAME.iso of=/dev/sdc\n1.2. Arch To install Arch, the steps from the AArch64 version of the Arch installation 2 were followed. We reproduced those steps for AArch64.\nStart fdisk to partition the SD card:\nfdisk /dev/sdX\nAt the fdisk prompt, delete old partitions and create new ones:\nType o. This will clear out any partitions on the drive.\nType p to list partitions. There should be no partitions left.\nType n, then p for primary, 1 for the first partition on the drive, press ENTER to accept the default first sector, then type +200M for the last sector.\nType t, then c to set the first partition to type W95 FAT32 (LBA).\nType n, then p for primary, 2 for the second partition on the drive, and then press ENTER twice to accept the default first and last sector.\nWrite the partition table and exit by typing w. 
Create and mount the FAT filesystem:\nmkfs.vfat /dev/sdX1\nmkdir boot\nmount /dev/sdX1 boot\nCreate and mount the ext4 filesystem:\nmkfs.ext4 /dev/sdX2\nmkdir root\nmount /dev/sdX2 root\nDownload and extract the root filesystem (as root, not via sudo):\nwget http://os.archlinuxarm.org/os/ArchLinuxARM-rpi-aarch64-latest.tar.gz\nbsdtar -xpf ArchLinuxARM-rpi-aarch64-latest.tar.gz -C root\nsync\nMove boot files to the first partition:\nmv root/boot/* boot\nBefore unmounting the partitions, update /etc/fstab for the different SD block device compared to the Raspberry Pi 3:\nsed -i \u0026#39;s/mmcblk0/mmcblk1/g\u0026#39; root/etc/fstab\nUnmount the two partitions:\numount boot root\nInsert the SD card into the Raspberry Pi, connect Ethernet, and apply 5V power.\nUse the serial console or SSH to the IP address given to the board by your router.\nLog in as the default user alarm with the password alarm. The default root password is root.\nInitialize the pacman keyring and populate the Arch Linux ARM package signing keys:\npacman-key --init\npacman-key --populate archlinuxarm\n2. Basic Configuration Once the operating system is installed, there are several options to configure the Raspberry Pi. The simplest approach is to run a host-discovery scan with sudo nmap -sn 192.168.10.0/24 (adjust the range to your own network) to identify the IP address assigned to the Raspberry Pi by the router. Then connect via ssh pi@192.168.10.250 (for Raspbian) or ssh alarm@192.168.10.250 (for Arch). You can also set it up headlessly, without an extra monitor, keyboard, and mouse, as explained in a previous post, or by using these three external peripherals. In this case, we will use an external monitor, keyboard, and mouse to simplify the walkthrough. 
Once the Raspberry Pi is powered on with the USB drive or SD card connected, a dialog will appear to configure the language, Wi-Fi, and a password (for example, use KeePassXC to generate a random password).\n2.1 System Update Once we have a console with a non-root user, we will open a new console as the root user:\nsu -\nThe default password is usually root or similar.\nDebian-based:\n# Update and upgrade system packages\napt-get update -y \u0026amp;\u0026amp; apt-get upgrade -y\nArch-based:\n# Update and upgrade system packages\npacman -Syu\n2.2 Time Configuration We follow this guide on first steps after installing Arch 3 or any minimalist distribution such as HypriotOS. We set the timezone to the appropriate one:\ntimedatectl set-timezone Europe/London\nWe synchronize the clock with the internet:\ntimedatectl set-ntp true\n2.3 Locale Configuration Uncomment the desired locale in the locale.gen file (for example: en_US.UTF-8):\nnano /etc/locale.gen\nRun:\nlocale-gen\nAnd run:\nlocalectl set-locale LANG=en_US.UTF-8\n2.4 Hostname Change Change the hostname with:\nhostnamectl set-hostname \u0026lt;name\u0026gt;\nAdd an alias for the hostname in the /etc/hosts file of the computer you are using for the configuration. 
Use nano /etc/hosts:\n127.0.0.1\tlocalhost.localdomain\t\u0026lt;name\u0026gt;\tlocalhost\n::1\tlocalhost.localdomain\t\u0026lt;name\u0026gt;\tlocalhost\n(Optional) Enable colored output in pacman If using Arch, run:\nsed -i \u0026#39;s/#Color/Color/\u0026#39; /etc/pacman.conf\n(Optional) Add 8GB of SWAP memory If using Arch, run:\nfallocate -l 8192M /swapfile\nThis only allocates the file. To activate it as swap, restrict its permissions with chmod 600 /swapfile, then run mkswap /swapfile and swapon /swapfile.\n(Optional) New user with sudo permissions If using Arch, we will now use the visudo utility to edit group permissions for running administrative commands with sudo.\npacman -S sudo\nEDITOR=nano visudo\nUncomment the following line:\n## Uncomment to allow members of group sudo to execute any command\n%sudo ALL=(ALL:ALL) ALL\nCreate a new sudo group with:\nsudo groupadd sudo\nCreate a new user with:\nuseradd -m -G sudo username\nSet a password for the new user:\npasswd username\nOnce the new user is created, delete the alarm user.\nuserdel alarm\nOn Debian or derivatives, modify the permissions of the user created during installation and add them to the sudo group with the following commands:\nsu -\nusermod -aG sudo username\nReboot the system:\nreboot\n2.5 SSH The next step is to copy the public key to the ~/.ssh/authorized_keys file on the Raspberry Pi. Use the following command:\nssh-copy-id -i \u0026lt;identity.pub\u0026gt; pi@\u0026lt;raspberry ip or node-1\u0026gt;\nSSH will now prompt for the key passphrase, and we can connect to the Raspberry Pi via:\nssh pi@node-1\n(Optional) Wi-Fi Configuration Reproduced from the guide: first steps after installing Arch 3\nkarog, on the Arch Linux ARM forums, provided a simple way to connect to Wi-Fi. 
As root, do the following steps:\nnano /etc/systemd/network/wlan0.network to configure the wlan0 interface: Add the following contents to the file: [Match] Name=wlan0 [Network] DHCP=yes wpa_passphrase \u0026quot;\u0026lt;SSID\u0026gt;\u0026quot; \u0026quot;\u0026lt;PASSWORD\u0026gt;\u0026quot; \u0026gt; /etc/wpa_supplicant/wpa_supplicant-wlan0.conf. Replace \u0026lt;SSID\u0026gt; and \u0026lt;PASSWORD\u0026gt; with your Wi-Fi network name and password. systemctl enable wpa_supplicant@wlan0 to enable Wi-Fi at boot. systemctl start wpa_supplicant@wlan0 to connect to Wi-Fi. You\u0026rsquo;re good to go!\nIf you ever want to remove the Wi-Fi connection (e.g. when you want the machine to connect only through ethernet):\nsystemctl stop wpa_supplicant@wlan0 systemctl disable wpa_supplicant@wlan0 rm /etc/wpa_supplicant/wpa_supplicant-wlan0.conf rm /etc/systemd/network/wlan0.network 3. Software Installation 3.1. Basic Packages Debian: sudo apt install -y software-properties-common git wget Arch: sudo pacman -S --noconfirm git wget 3.2. Docker and Docker Compose Installing Docker in rootless mode is recommended.\n3.2.1. Installation on Debian Install Docker using the installation script:\ncurl -fsSL https://get.docker.com -o get-docker.sh sudo sh get-docker.sh For docker-compose, install it through Python 3:\nsudo apt-get install libffi-dev libssl-dev sudo apt install python3-dev sudo apt-get install -y python3 python3-pip sudo pip3 install docker-compose Add the user to the docker group:\nsudo usermod -aG docker ${USER} 3.2.2. 
Installation on Arch Install Docker and docker-compose from the official repositories:\nsudo pacman -Sy docker sudo pacman -Sy docker-compose Start the service with:\nsudo systemctl start docker.service Enable it on every boot:\nsudo systemctl enable docker.service Add the user to the docker group:\nsudo usermod -aG docker ${USER} 3.2.3 Docker Rootless Arch: sudo pacman -S shadow sudo pacman -S fuse-overlayfs Add kernel.unprivileged_userns_clone=1 to /etc/sysctl.conf and reload the settings:\nsudo nano /etc/sysctl.conf sudo sysctl --system Common: sudo touch /etc/subuid \u0026amp;\u0026amp; sudo touch /etc/subgid su - echo \u0026#34;pi:100000:65536\u0026#34; \u0026gt;\u0026gt; /etc/subgid echo \u0026#34;pi:100000:65536\u0026#34; \u0026gt;\u0026gt; /etc/subuid exit sudo systemctl disable --now docker.service docker.socket curl -fsSL https://get.docker.com/rootless | sh systemctl --user start docker systemctl --user enable docker sudo loginctl enable-linger $(whoami) export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock Test it with docker run -d -p 8080:80 nginx.\nIf you want it to start at boot, run the following commands:\nsudo systemctl enable docker.service sudo systemctl enable containerd.service 3.3. 
Ansible Debian 1 sudo apt install -y ansible Arch: 1 sudo pacman -Sy ansible 3.3.1 SSH Hardening with Ansible Using the Ansible collections devsec.hardening, we apply security mechanisms and verify them with the following commands.\nInstallation: 1 ansible-galaxy install dev-sec.ssh-hardening Create a playbook for each Ansible role called ansible-ssh-hardening.yaml.\nRun these playbooks with the following commands:\n1 ansible-playbook ansible-ssh-hardening.yaml --ask-become-pass You can also create an Ansible playbook with any modules you wish to include.\nAdd your SSH key to an ssh-agent using zsh (or bash): 1 2 ssh-agent zsh ssh-add ~/.ssh/id_ed25519 Run the Ansible playbook with the required sudo password for the commands: 1 ansible-playbook playbook.yaml --ask-become-pass 3.4. Installing zsh and Oh My Zsh Debian4: 1 2 sudo apt install zsh sh -c \u0026#34;$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)\u0026#34; Arch: 1 2 sudo pacman -Sy zsh zsh-completions sh -c \u0026#34;$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)\u0026#34; 3.4.1. Installing Powerlevel10k Download and place the 4 Meslo Nerd .ttf fonts in /usr/local/share/fonts. 
They must have permissions 644 (-rw-r--r--).5\nMesloLGS NF Regular.ttf MesloLGS NF Bold.ttf MesloLGS NF Italic.ttf MesloLGS NF Bold Italic.ttf Create the /usr/local/share/fonts directory: sudo mkdir -p /usr/local/share/fonts cd /usr/local/share/fonts Download the fonts:\nsudo wget https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Regular.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Bold.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Italic.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Bold%20Italic.ttf Clone the powerlevel10k project:\ngit clone --depth=1 https://github.com/romkatv/powerlevel10k.git ~/powerlevel10k echo \u0026#39;source ~/powerlevel10k/powerlevel10k.zsh-theme\u0026#39; \u0026gt;\u0026gt;~/.zshrc exec zsh Replace the following value in ~/.zshrc:\nZSH_THEME=\u0026#34;powerlevel10k/powerlevel10k\u0026#34; 3.4.2. Installing zsh Plugins To add a series of useful plugins, edit the ~/.zshrc file with nano ~/.zshrc and modify the following:\n#plugins=(git) plugins=(git git-extras history ansible zsh-autosuggestions zsh-syntax-highlighting docker-helpers docker docker-compose kubectl kubectx colorize nmap pip ssh-agent sudo pipenv fzf fzf-docker) We need to install the zsh-autosuggestions, zsh-syntax-highlighting, docker-helpers, fzf, and fzf-docker plugins.\nTo install zsh-autosuggestions:\ngit clone https://github.com/zsh-users/zsh-autosuggestions ~/.oh-my-zsh/plugins/zsh-autosuggestions To install zsh-syntax-highlighting:\ngit clone https://github.com/zsh-users/zsh-syntax-highlighting.git ~/.oh-my-zsh/plugins/zsh-syntax-highlighting To install fzf, use the official repositories:\nsudo pacman -Sy fzf To install fzf-docker:\ngit clone https://github.com/pierpo/fzf-docker ~/.oh-my-zsh/plugins/fzf-docker To install docker-helpers:\ngit clone https://github.com/unixorn/docker-helpers.zshplugin 
~/.oh-my-zsh/plugins/docker-helpers Customize to your liking and reload the ~/.zshrc file:\nsource ~/.zshrc Snapd Installation Install the snapd and core packages:\nsudo apt install -y snapd core Add the Snap Executables Path to bash and Zsh PATH Add the snap executables path to PATH (single quotes keep $PATH from being expanded when the line is written):\necho \u0026#39;export PATH=$PATH:/snap/bin\u0026#39; \u0026gt;\u0026gt; ~/.bashrc source ~/.bashrc echo \u0026#39;export PATH=$PATH:/snap/bin\u0026#39; \u0026gt;\u0026gt; ~/.zshrc source ~/.zshrc Verify the path was added correctly:\necho $PATH Add Launchers to the Applications Menu Create a symbolic link from the directory that stores snap launchers (/var/lib/snapd/desktop/applications) to the system applications directory (/usr/share/applications/):\nsudo ln -s /var/lib/snapd/desktop/applications /usr/share/applications/snapd Flatpak Installation From the official Flatpak documentation, follow these steps:\nInstall Flatpak sudo apt install flatpak -y Add the Flatpak repository flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo Reboot the system to apply the changes. Add Launchers to the Menu Create a symbolic link from the directory that stores flatpak launchers (/var/lib/flatpak/exports/share/applications/) to the system applications directory (/usr/share/applications/):\nsudo ln -s /var/lib/flatpak/exports/share/applications/ /usr/share/applications/flatpak KeePassXC Note: installed via Snap because the official repositories have an outdated version.\nInstall via Snap: sudo snap install keepassxc Download the browser extension.\nConfigure the browser extension using an official KeePassXC script. Save the script and run it:\nwget https://raw.githubusercontent.com/keepassxreboot/keepassxc/master/utils/keepassxc-snap-helper.sh zsh keepassxc-snap-helper.sh If you get the error Could not find keepassxc.proxy! Ensure the keepassxc snap is installed properly., this is because the snap executables path is missing from PATH. 
Add it with:\necho \u0026#39;export PATH=$PATH:/snap/bin\u0026#39; \u0026gt;\u0026gt; ~/.zshrc source ~/.zshrc echo \u0026#39;export PATH=$PATH:/snap/bin\u0026#39; \u0026gt;\u0026gt; ~/.bashrc source ~/.bashrc Run the script again:\nbash keepassxc-snap-helper.sh VSCodium 4 Add the repository GPG key to the keyring that the repository entry references: wget -qO - https://gitlab.com/paulcarroty/vscodium-deb-rpm-repo/raw/master/pub.gpg | gpg --dearmor | sudo dd of=/usr/share/keyrings/vscodium-archive-keyring.gpg Add the repository: echo \u0026#39;deb [ signed-by=/usr/share/keyrings/vscodium-archive-keyring.gpg ] https://paulcarroty.gitlab.io/vscodium-deb-rpm-repo/debs vscodium main\u0026#39; | sudo tee /etc/apt/sources.list.d/vscodium.list Update repositories and install VSCodium: sudo apt update \u0026amp;\u0026amp; sudo apt install codium Installing Ubuntu on the Raspberry - Atareao\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nArchLinux, Arch\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nTheZoc, Raspberry Pi Setup Guide\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDebian Documentation, Fonts\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nVSCodium Documentation, Installation\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2021-10-21T00:00:00Z","permalink":"/en/p/installing-and-configuring-debian-ubuntu-raspberrypios-libreelec-or-arch-on-a-raspberry-pi/","title":"Installing and Configuring Debian, Ubuntu, RaspberryPiOS, LibreElec or Arch on a Raspberry Pi"},{"content":"What is LineageOS? Installing LineageOS on Xiaomi Mi A1 Requirements Materials used in this scenario:\nXiaomi Mi A1 LineageOS 16 (official, nightly development build) LineageOS 17 (unofficial) TWRP Recovery Downloads Download and install the Android SDK platform tools, adb and fastboot (in my case: minimal_adb_fastboot_v1.4.3).\nEnable OEM Unlocking Enable OEM unlocking and USB debugging in the phone settings.\nBootloader Unlock Use the following commands to unlock the bootloader. 
Run them from the adb and fastboot folder:\nadb devices # Reboot the phone into fastboot mode adb reboot bootloader # Check the device status fastboot oem device-info # Unlock the bootloader fastboot oem unlock # Reboot the phone fastboot reboot Verify that it has been unlocked successfully with:\nadb devices adb reboot bootloader fastboot oem device-info Recovery Mode Installation After enabling USB debugging and unlocking the bootloader, install the recovery using the .img file placed in the same folder as adb and fastboot:\nfastboot flash recovery twrp-installer-3.3.1-0-tissot.img The following error occurs:\ntarget reported max download size of 534773760 bytes sending \u0026#39;recovery\u0026#39; (29348 KB)... OKAY [ 0.683s] writing \u0026#39;recovery\u0026#39;... FAILED (remote: partition table doesn\u0026#39;t exist) finished. total time: 0.699s The solution is to flash the image to the boot partition instead:\nfastboot flash boot twrp-3.3.1-0-tissot.img The following result is obtained:\nRESULT: fastboot flash boot twrp-3.3.1-0-tissot.img target reported max download size of 534773760 bytes sending \u0026#39;boot\u0026#39; (29348 KB)... OKAY [ 0.679s] writing \u0026#39;boot\u0026#39;... OKAY [ 0.181s] finished. total time: 0.860s Image Installation Reboot into recovery mode with power button + volume up and follow these steps:\nWipe -\u0026gt; Format Data Advanced Wipe -\u0026gt; Dalvik, system and data Install -\u0026gt; lineage-1-\u0026hellip;-tissot.zip Wipe Dalvik and reboot system. Once completed, LineageOS is installed. By default, Google Play is not included. If you encounter any errors, you can install the minimal version of GApps, for example open_gapps-arm-9.0-pico-20200408.zip.\n","date":"2021-10-21T00:00:00Z","permalink":"/en/p/lineageos-installation/","title":"LineageOS Installation"},{"content":"This post outlines the steps to harden the OS and SSH on a Raspberry Pi, so you can later deploy a set of secured services.\n1. 
SSH Configuration The next step requires copying your public key to the ~/.ssh/authorized_keys file on the Raspberry Pi. Use the following command:\n1 ssh-copy-id -i \u0026lt;identity.pub\u0026gt; pi@\u0026lt;raspberry ip or node-1\u0026gt; It will prompt for your SSH key password. Then connect to the Raspberry Pi with:\n1 ssh pi@node-1 2. Installing Ansible Debian 1 sudo apt install -y ansible Arch: 1 sudo pacman -Sy ansible 3. OS and SSH Configuration Using Ansible 3.1. Using a collection Installation: 1 2 ansible-galaxy install dev-sec.os-hardening ansible-galaxy install dev-sec.ssh-hardening Create a playbook for each Ansible role named ansible-os-hardening.yaml and ansible-ssh-hardening.yaml.\nRun these playbooks with the following commands:\n1 2 ansible-playbook ansible-os-hardening.yaml --ask-become-pass ansible-playbook ansible-ssh-hardening.yaml --ask-become-pass 3.2. Using a basic playbook Add your SSH key to an ssh-agent using zsh (or bash): 1 2 ssh-agent zsh ssh-add ~/.ssh/id_ed25519 Run the Ansible playbook with the sudo password required for the commands: 1 ansible-playbook playbook.yaml --ask-become-pass ","date":"2021-10-21T00:00:00Z","permalink":"/en/p/os-and-ssh-hardening/","title":"OS and SSH Hardening"},{"content":"This post covers several useful commands for system administration.\nQuery network information 1 ip a Monitor TCP traffic 1 sudo tcpdump -i \u0026lt;device\u0026gt; Delete the default route 1 sudo ip route del default via 192.168.1.1 ","date":"2021-10-21T00:00:00Z","permalink":"/en/p/useful-commands-for-gnu/linux-system-administration/","title":"Useful commands for GNU/Linux system administration"},{"content":"Hacking Awesome hacking Awesome Hacking Resources System Administration Snowflake SSH Client is a Swiss army knife for sysadmins Network Analysis Subdomains Subdomain enumeration tools comparison:\nAmass: slower but finds more valid subdomains Findomain: faster with a good number of discovered subdomains Subfinder Sublist3r 
DNSRecon dnssearch Knock SubBrute Secure Programming Code Review Paid: Checkmarx Veracode Fortify AppScan-IBM Kiuwan Dependency Review: Dependency-Check Kiuwan Sonatype SourceClear BlackDuck Snyk Hardcoded Secrets (passwords in the code) Gitleaks ","date":"2021-05-15T00:00:00Z","permalink":"/en/p/security-tools-collection/","title":"Security Tools Collection"},{"content":"Purge all Docker resources docker system prune --all docker system prune --volumes Remove all images docker rmi -f $(docker images -a -q) ","date":"2021-05-15T00:00:00Z","permalink":"/en/p/useful-docker-commands/","title":"Useful Docker commands"},{"content":"This post covers the configuration of the Arch operating system.\n1. Arch Installation 1.1. What is Arch? Arch Linux is an independently developed, general-purpose x86-64 GNU/Linux distribution that strives to provide the latest stable versions of most software by following a rolling-release model. The default installation is a minimal base system, configured by the user to only add what is purposely required 1.\n1.2. Basic Concepts LVM is an implementation of a logical volume manager for the Linux kernel. LVM includes many of the features expected from a volume manager, including:\nResizing of volume groups Resizing of logical volumes Read-only snapshots (LVM2 offers read and write) RAID0 of logical volumes. LVM does not implement RAID1 or RAID5, so it is recommended to use dedicated RAID software for these operations, placing the LVs on top of the RAID 2. RAID will not be used in this configuration.\nLUKS is a disk encryption specification created by Clemens Fruhwirth, originally intended for Linux. While most disk encryption software implements different and incompatible undocumented formats, LUKS specifies a standard on-disk format, platform-independent, for use with various tools. 
This not only facilitates compatibility and interoperability between different programs, but also ensures that they all implement password management in a secure and documented manner. The reference implementation runs on Linux and is based on an enhanced version of cryptsetup, using dm-crypt as the disk encryption interface 3.\nA boot loader loads an operating system kernel into memory and executes it. A boot manager hands over control to another boot program. GRUB is both a boot loader and a boot manager. rEFInd is only a boot manager.\nAnother fundamental concept is understanding the difference between EFI/UEFI and BIOS.\nIn the Partition Table, the ext4 format is used for partitions because it improves I/O speed and uses less CPU than the ext3 and ext2 formats. 
The following minimum values are recommended:\nPartition Recommended Size Debian Allocation Custom Allocation Contains / \u0026gt;= 750MB 22GB 64GB /etc, /bin, /sbin, /lib, /dev, /usr /usr \u0026gt;= 4-6GB 0 0 User programs, libs and docs /var \u0026gt;= 2-3GB 32GB 112GB Variable data such as emails /tmp \u0026gt;= 100MB 16GB 32GB Web pages, package cache, temporary data /home \u0026gt;= 100MB 200GB 288GB Directory with Documents, Downloads, \u0026hellip; /boot \u0026gt;= 256MB 500MB 512GB Primary Partition, ext4 or ext2, encryption not recommended /boot/efi \u0026gt;= 100MB 250MB 0 Encryption not recommended and bootable flag: on /swap \u0026gt;= 8GB 16GB 16GB Swap area 1.3. Flashing the Arch Image The steps from the Arch installation guide were followed 4.\nOne way to flash an image is with the dd command as shown below:\n1 2 # See the partitions lsblk -f 1 2 # Umount the USB partition umount /dev/sda1 1 2 # Flash the ISO into USB sudo dd bs=4M if=archlinux-2022.05.01-x86_64.iso of=/dev/sda conv=fsync oflag=direct status=progress 1.4. Booting Arch Connect the USB and ethernet cable, then boot Arch Linux from the USB via the BIOS.\n2. Initial Configuration 2.1. Set the Keyboard Layout in the Live Environment Switch the keyboard to Spanish:\n1 loadkeys es 2.2. 
Configure Wi-Fi If the iwd package is not installed, install it using an ethernet connection:\nsudo pacman -Sy iwd Configure the Wi-Fi interface with:\n\u0026lt;device\u0026gt;: wlan0 \u0026lt;SSID\u0026gt;: Wi-Fi network name \u0026lt;passphrase\u0026gt;: password To find the device name, run iwctl device list; iwctl device \u0026lt;device\u0026gt; show prints its details.\niwctl --passphrase \u0026lt;passphrase\u0026gt; station \u0026lt;device\u0026gt; connect \u0026lt;SSID\u0026gt; To see available Wi-Fi networks, run the following commands:\niwctl station list iwctl station wlan0 scan iwctl station wlan0 get-networks Verify that you have an IP address with:\nip a Now, if you change the root user password with the passwd command, you can connect to the machine with:\nssh root@\u0026lt;machine-IP\u0026gt; 2.3. Time Update We follow this guide with the first steps after installing Arch 5. Set the timezone to the appropriate one with:\ntimedatectl set-timezone Europe/Madrid Synchronize the clock with the internet:\ntimedatectl set-ntp true 2.4. Unlock LUKS-encrypted Partition Configured with LVM Logical Volumes Decrypt the partition with:\ncryptsetup luksOpen /dev/sdaX all-Operative-Systems Then detect the LVM volume group with:\nvgchange -a y Note: the following steps may not work on the first attempt and may need to be completed with the next section. You could skip the end of this section and install GRUB directly.\nOnce the commands above have been executed, continue with the installation until the GRUB installation. Open a terminal again, identify the UUID of the encrypted partition, and append it to /etc/crypttab with:\nblkid /dev/sdaX \u0026gt;\u0026gt; /etc/crypttab Next, edit the /etc/crypttab file:\nnano /etc/crypttab Replace the appended blkid output with the following line, where the UUID is the one obtained from the blkid command:\nall-Operative-Systems UUID=524c1ad6-1111-2222-0000-c8db1286b262 none luks This configuration may need to be repeated later.\n2.5. 
Mount the Partitions According to the Arch documentation for creating filesystems and mounting volumes, format the previously created volumes and mount the following partitions:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 # Format and mount the root partition mkfs.ext4 -L arch-root /dev/lvm-all-OS/lvm-arch-root mount /dev/lvm-all-OS/lvm-arch-root /mnt # Format and mount the home partition mkfs.ext4 -L arch-home /dev/lvm-all-OS/lvm-arch-home mount --mkdir /dev/lvm-all-OS/lvm-arch-home /mnt/home # Format and mount the EFI (ESP) and BOOT partitions # If the boot partition is not formatted as FAT32, use the following commented command # mkfs.fat -F 32 -n boot-arch /dev/sda4 mkfs.ext4 -L boot-arch /dev/sda4 mount --mkdir /dev/sda4 /mnt/boot mount --mkdir /dev/sda1 /mnt/boot/efi 2.6. Installation of Essential and Recommended Packages Sources:\n[https://denovatoanovato.net/instalar-arch-linux/#uefi] [https://linuxhint.com/setup-luks-encryption-on-arch-linux/] Try following with encrypt instead of lvm2 or using both Essential packages:\n1 pacstrap /mnt base base-devel linux linux-firmware lvm2 nano vim intel-ucode iwd Recommended packages (some may produce errors):\n1 pacstrap /mnt grub networkmanager dhcpcd efibootmgr gvfs gvfs-mtp netctl wpa_supplicant dialog nano initramfs To enable the touchpad, install the xf86-input-synaptics package.\nOther additional packages could include os-probes (may produce errors).\nThen generate the fstab file, which contains the system\u0026rsquo;s partition table.\n1 genfstab -pU /mnt \u0026gt;\u0026gt; /mnt/etc/fstab 2.7. Enter the Base System It is time to enter the installed base system to continue configuring it from within. To access the system in chroot, run:\n1 arch-chroot /mnt 2.8 Update Hostname 1 echo hostname \u0026gt; /etc/hostname 2.9. Update Timezone 1 ln -sf /usr/share/zoneinfo/Europe/Madrid /etc/localtime 2.10. Set the Clock 1 hwclock --systohc 2.11. 
Configure Keyboard Layout 1 echo KEYMAP=es \u0026gt;\u0026gt; /etc/vconsole.conf 2.12. Configure mkinitcpio Sources:\n[https://www.linuxserver.io/blog/2014-01-18-installing-arch-linux-with-root-on-an-lvm] [https://wiki.archlinux.org/title/LVM_(Espa%C3%B1ol)#Crear_sistemas_de_archivos_y_montar_los_vol%C3%BAmenes_l%C3%B3gicos] 1 nano /etc/mkinitcpio.conf Edit the HOOKS line and add the following:\n1 HOOKS=(base udev autodetect keyboard keymap consolefont modconf block encrypt lvm2 filesystems fsck) Then run:\n1 mkinitcpio -p linux At this point, you could unmount the partition and reboot the operating system with the following commands if you did not install Arch on an encrypted partition. If you installed it on an encrypted partition, you must configure GRUB to indicate that it is encrypted (step 2.13).\n1 2 umount -R /mnt sudo reboot 2.13. Configure GRUB Install GRUB with the following commands:\n1 grub-install --boot-directory=/boot --efi-directory=/boot/efi --target=x86_64-efi --recheck /dev/sda4 In /etc/default/grub, edit the GRUB_CMDLINE_LINUX line to:\n1 GRUB_CMDLINE_LINUX=\u0026#34;cryptdevice=/dev/sda2:luks:allow-discards\u0026#34; [Tip] To automatically detect other operating systems on your computer, install os-prober (pacman -S os-prober) before running the following command.\nFinally, configure GRUB with:\n1 2 grub-mkconfig -o /boot/grub/grub.cfg grub-mkconfig -o /boot/efi/EFI/arch/grub.cfg 2.14. Configuration for LUKS-encrypted Partition Note: the following steps are the same as those in section 2.4, so check carefully whether they already worked in that step or need to be repeated.\nDetect the UUID of the encrypted partition. 
The X in sdaX corresponds to the number of the encrypted partition; if you do not know it, run blkid without arguments to list all partitions.\nblkid /dev/sdaX \u0026gt;\u0026gt; /etc/crypttab Edit the /etc/crypttab file with nano:\nsudo nano /etc/crypttab Add the following:\nall-Operative-Systems UUID=524c1ad6-1111-2222-0000-c8db1286b262 none luks Regenerate the initramfs so that it picks up the change. Arch does this with mkinitcpio rather than with Debian\u0026rsquo;s update-initramfs:\nmkinitcpio -P 2.15 (Optional) Initialize the Pacman Keyring Install the keyring:\npacman -Sy archlinux-keyring Initialize the pacman keyring and populate the Arch Linux ARM package signing keys (if using a Raspberry Pi):\npacman-key --init pacman-key --populate archlinuxarm 2.16 Unmount the Partitions Unmount everything mounted under /mnt:\numount -R /mnt Reboot the operating system with:\nsudo reboot 3. Advanced Configuration Connect via SSH to the machine again and follow the steps below.\n3.1. System Update Once you have a console with a non-root user, open a new console as the root user:\nsu - The default password is usually root or the one previously configured.\nUpdate Arch: pacman -Syu 3.2. Language Update If not previously configured, uncomment the desired language in the locale.gen file (e.g., en_US.UTF-8):\nnano /etc/locale.gen Run:\nlocale-gen Then run:\nlocalectl set-locale LANG=en_US.UTF-8 3.3. Hostname Change Change the hostname with:\nhostnamectl set-hostname \u0026lt;name\u0026gt; Add a hostname alias in the /etc/hosts file of the computer you are using for the configuration. Use nano /etc/hosts:\n127.0.0.1\tlocalhost.localdomain\t\u0026lt;name\u0026gt;\tlocalhost ::1\tlocalhost.localdomain\t\u0026lt;name\u0026gt;\tlocalhost 3.4. (Optional) Enable Color Output in Pacman If using Arch, run:\nsed -i \u0026#39;s/#Color/Color/\u0026#39; /etc/pacman.conf 3.5. (Optional) Add 8GB of SWAP Memory If using Arch, run:\nfallocate -l 8192M /swapfile 3.6. 
(Optional) New User with Sudo Privileges If using Arch, we will now use the visudo utility to edit group permissions for running administrative commands with sudo.\npacman -S sudo EDITOR=nano visudo Uncomment the following line:\n## Uncomment to allow members of group sudo to execute any command %sudo ALL=(ALL:ALL) ALL Create a new sudo group with:\nsudo groupadd sudo Create a new user with:\nuseradd -m -G sudo username Set a password for the new user:\npasswd username Once you have the new user, delete the alarm user or the default installation user (if one exists).\nuserdel alarm Alternatively, to keep the user created during installation, add that user to the sudo group with the following commands:\nsu - usermod -aG sudo username Reboot the system:\nreboot 3.7. SSH Keys In the next step, you need to copy the public key to the ~/.ssh/authorized_keys file on the machine. To do this, use the following command:\nssh-copy-id -i \u0026lt;identity.pub\u0026gt; pi@\u0026lt;machine IP\u0026gt; Now it will ask for your SSH key password, and you can connect to the machine with:\nssh pi@\u0026lt;machine IP\u0026gt; 3.8. (Optional) Wi-Fi Configuration Reproduced from the guide: first steps after installing Arch 5. karog, on the Arch Linux ARM forums, provides a very simple way to connect to Wi-Fi. As root, follow these steps:\nnano /etc/systemd/network/wlan0.network to configure the wlan0 interface: Add the following content to the file: [Match] Name=wlan0 [Network] DHCP=yes wpa_passphrase \u0026quot;\u0026lt;SSID\u0026gt;\u0026quot; \u0026quot;\u0026lt;PASSWORD\u0026gt;\u0026quot; \u0026gt; /etc/wpa_supplicant/wpa_supplicant-wlan0.conf. Replace \u0026lt;SSID\u0026gt; and \u0026lt;PASSWORD\u0026gt; with your Wi-Fi network name and password. systemctl enable wpa_supplicant@wlan0 to enable Wi-Fi on boot. systemctl start wpa_supplicant@wlan0 to connect to Wi-Fi. Everything is now set up. 
However, if you ever want to remove the Wi-Fi connection (for example, when you want the machine to connect only via ethernet):\nsystemctl stop wpa_supplicant@wlan0 systemctl disable wpa_supplicant@wlan0 rm /etc/wpa_supplicant/wpa_supplicant-wlan0.conf rm /etc/systemd/network/wlan0.network 4. Package and Software Installation 4.1. Basic Packages git and wget: sudo pacman -S --noconfirm git wget yay: The most commonly used AUR helpers in Arch Linux used to be Yaourt and Packer, which handled package management tasks such as installing and updating packages. Both have been discontinued in favor of yay, short for Yet Another Yaourt. Yay is a modern AUR helper written in Go. It has very few dependencies and supports AUR tab completion so you don\u0026rsquo;t have to type out full commands.\nWe install it with the following commands in the /opt directory, which is the designated folder for storing third-party programs. makepkg must be run as a regular user from inside the cloned directory.\ncd /opt sudo git clone https://aur.archlinux.org/yay.git sudo chown -R $USER:$USER ./yay cd yay makepkg -si Update the system and AUR packages with (yay asks for sudo itself, so do not run it as root):\nyay -Syu 4.2 Install zsh, Oh My Zsh and Powerlevel10k Install zsh and oh-my-zsh with the following commands:\nsudo pacman -S zsh sh -c \u0026#34;$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)\u0026#34; To install Powerlevel10k, we need to install the required fonts with the command:\nyay -Sy --noconfirm ttf-meslo-nerd-font-powerlevel10k sudo pacman -S powerline-common awesome-terminal-fonts Alternatively, you can do it manually by downloading and placing the 4 .ttf fonts from Meslo Nerd in /usr/local/share/fonts. 
They must have permissions 644 (-rw-r--r--)6.\nMesloLGS NF Regular.ttf MesloLGS NF Bold.ttf MesloLGS NF Italic.ttf MesloLGS NF Bold Italic.ttf Create the /usr/local/share/fonts directory:\nsudo mkdir -p /usr/local/share/fonts cd /usr/local/share/fonts Download the fonts:\nsudo wget https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Regular.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Bold.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Italic.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Bold%20Italic.ttf Install and configure Powerlevel10k with:\nyay -S --noconfirm zsh-theme-powerlevel10k-git echo \u0026#39;source /usr/share/zsh-theme-powerlevel10k/powerlevel10k.zsh-theme\u0026#39; \u0026gt;\u0026gt;~/.zshrc 4.2. Docker and Docker Compose It is recommended to install Docker rootless (4.2.2.) but it may cause issues with some Docker containers. If you do not want to deal with those issues or you are configuring a production server that requires security, follow the next section (4.2.1.).\n4.2.1. 
Installation on Arch Install Docker from the official repositories:\n1 sudo pacman -Sy docker docker-compose Add the user to the docker group:\n1 sudo usermod -aG docker ${USER} Enable the Docker daemon:\n1 2 sudo systemctl enable docker.service sudo systemctl enable docker.socket If you get an error, reboot the machine.\n4.2.2 Docker Rootless Arch: 1 2 sudo pacman -S shadow sudo pacman -S fuse-overlayfs Add kernel.unprivileged_userns_clone=1 in /etc/sysctl.conf:\n1 2 sudo nano /etc/sysctl.conf sudo sysctl --system 1 2 3 4 5 6 7 8 9 10 11 12 13 14 sudo touch /etc/subuid \u0026amp;\u0026amp; sudo touch /etc/subgid su - echo \u0026#34;pi:100000:65536\u0026#34; \u0026gt;\u0026gt; /etc/subgid echo \u0026#34;pi:100000:65536\u0026#34; \u0026gt;\u0026gt; /etc/subuid exit sudo systemctl disable --now docker.service docker.socket curl -fsSL https://get.docker.com/rootless | sh systemctl --user start docker systemctl --user enable docker sudo loginctl enable-linger $(whoami) export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock Test with docker run -d -p 8080:80 nginx.\nIf you want it to start at boot, run the following commands:\n1 2 sudo systemctl enable docker.service sudo systemctl enable containerd.service 4.3 i3wm Window Manager Following the steps from Low Orbit Flux - Arch Linux How to Install i3 Gaps, install the xorg-xinit package which installs xinit. The xinit program allows the user to manually start an Xorg display server. 
Install the X window server xorg and xterm.\n1 sudo pacman -S xorg xorg-xinit xterm Install some optional extras:\n1 sudo pacman -S xorg-xeyes xorg-xclock Install the entire i3 group, prioritizing i3-gaps over i3-wm since the former is a fork of the latter.\n1 sudo pacman -S i3 Install the necessary drivers for your machine.\n1 2 3 sudo pacman -S nvidia nvidia-utils # NVIDIA sudo pacman -S xf86-video-amdgpu mesa # AMD sudo pacman -S xf86-video-intel mesa # Intel Follow the steps from Low Orbit Flux - Arch Linux How to Install i3 Gaps to completion.\n5. References ArchLinux, Arch\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nWikipedia, Logical Volume Manager (Linux)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nWikipedia, LUKS (Linux Unified Key Setup)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nArchLinux, Installation Guide\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nGitHub - TheZoc/Setting up guide for ArchLinux on Raspberry Pi.md\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDebian Documentation, Fonts\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2021-04-02T00:00:00Z","permalink":"/en/p/arch-installation-and-configuration/","title":"Arch Installation and Configuration"},{"content":"This post covers the configuration of the Debian operating system.\nWhat is Debian? 
Debian GNU/Linux is a free operating system, developed by thousands of volunteers from around the world who collaborate via the Internet.\nDebian\u0026rsquo;s dedication to free software, its volunteer base, its non-commercial nature, and its open development model distinguish it from other GNU operating system distributions1.\nAdd Wi-Fi and NVIDIA Drivers Running the following commands will log in as root and add the repositories needed to install drivers not included in the fully free repositories:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 # Log in as root su - # Install vim apt install -y vim # Add contrib and non-free reopositories ## Edit /etc/apt/sources.list vim /etc/apt/sources.list ## Add contrib and non-free at the end deb http://mirror.librelabucm.org/debian/ buster main contrib non-free deb http://security.debian.org/debian-security buster/updates main contrib non-free deb http://mirror.librelabucm.org/debian/ buster-updates main contrib non-free # Add non-free drivers for WiFi apt install -y firmware-iwlwifi firmware-atheros firmware-misc-nonfree firmware-intelwimax firmware-realtek firmware-linux firmware-linux-nonfree Verify that the packages have been installed:\n1 sudo apt list --installed | grep firmware Add a User to the Sudo Group 1 2 su - usermod -aG sudo username Verify Sudo Group Membership 1 getent group sudo Log In with the Sudo Group User 1 su - username Install git and wget 1 sudo apt install git wget -y Install zsh and Oh My Zsh2 1 2 sudo apt install zsh sh -c \u0026#34;$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)\u0026#34; Install Powerlevel10k Download and place the 4 .ttf fonts from Meslo Nerd in /usr/local/share/fonts. 
They must have permissions 644 (-rw-r\u0026ndash;r\u0026ndash;).3\nMesloLGS NF Regular.ttf MesloLGS NF Bold.ttf MesloLGS NF Italic.ttf MesloLGS NF Bold Italic.ttf Create the /usr/local/share/fonts directory:\n1 2 sudo mkdir /usr/local/share/fonts cd /usr/local/share/fonts Download the fonts:\n1 sudo wget https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Regular.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Bold.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Italic.ttf https://github.com/romkatv/powerlevel10k-media/raw/master/MesloLGS%20NF%20Bold%20Italic.ttf Clone the powerlevel10k project:\n1 git clone --depth=1 https://github.com/romkatv/powerlevel10k.git ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/themes/powerlevel10k Replace the following value in ~/.zshrc:\n1 ZSH_THEME=\u0026#34;powerlevel10k/powerlevel10k\u0026#34; Configure to your liking and reload the ~/.zshrc file:\n1 source ~/.zshrc Add Launchers to the Menu Use the following command to emulate the applications in /etc/profile within zsh.\n1 emulate sh -c \u0026#39;source /etc/profile\u0026#39; Recommended Software Installation Snapd Installation Install the snapd and core packages:\n1 2 sudo apt install snapd sudo snap install core Add Snap Executables Path to bash and Zsh PATH Add the snap executables path to the PATH:\n1 2 3 4 echo \u0026#34;export PATH=$PATH:/snap/bin\u0026#34; \u0026gt;\u0026gt; ~/.bashrc source ~/.bashrc echo \u0026#34;export PATH=$PATH:/snap/bin\u0026#34; \u0026gt;\u0026gt; ~/.zshrc source ~/.zshrc Verify that the path has been added correctly:\n1 echo $PATH Add Launchers to the Application Menu Create a symbolic link from the directory that stores snap launchers (/var/lib/snapd/desktop/applications) to the system applications directory (usr/share/applications/)\n1 sudo ln -s /var/lib/snapd/desktop/applications /usr/share/applications/snapd Flatpak Installation From the official Flatpak 
documentation, follow these steps:\nInstall Flatpak 1 sudo apt install flatpak -y Add the Flatpak repository 1 flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo Reboot the system to apply the changes. Add Launchers to the Menu Create a symbolic link from the directory that stores flatpak launchers (/var/lib/flatpak/exports/share/applications/) to the system applications directory (/usr/share/applications/):\n1 sudo ln -s /var/lib/flatpak/exports/share/applications/ /usr/share/applications/flatpak Aptitude 1 sudo apt install aptitude Nextcloud Sync Client Download the AppImage file from Nextcloud, grant execution permissions to the user, and run it with:\n1 2 chmod u+x Nextcloud-3.3.5-x86_64.AppImage ./Nextcloud-3.3.5-x86_64.AppImage Sync the folders.\nKeePassXC Note: installed via Snap because the official repositories have an outdated version.\nInstall via snap: 1 sudo snap install keepassxc Download the browser extension.\nConfigure the browser extension using an official KeePassXC script. Save the script and run:\n1 2 wget https://raw.githubusercontent.com/keepassxreboot/keepassxc/master/utils/keepassxc-snap-helper.sh zsh keepassxc-snap-helper.sh If you get the error Could not find keepassxc.proxy! 
Ensure the keepassxc snap is installed properly., this is because the snap executables path needs to be added to the PATH:\n1 2 3 4 echo \u0026#34;export PATH=$PATH:/snap/bin\u0026#34; \u0026gt;\u0026gt; ~/.zshrc source ~/.zshrc echo \u0026#34;export PATH=$PATH:/snap/bin\u0026#34; \u0026gt;\u0026gt; ~/.bashrc source ~/.bashrc Run the script again:\n1 bash keepassxc-snap-helper.sh VSCodium 4 Add the repository GPG key, saving it to the keyring path that the repository entry references in signed-by: 1 wget -qO - https://gitlab.com/paulcarroty/vscodium-deb-rpm-repo/raw/master/pub.gpg | gpg --dearmor | sudo dd of=/usr/share/keyrings/vscodium-archive-keyring.gpg Add the repository: 1 2 3 4 echo \u0026#39;deb [ signed-by=/usr/share/keyrings/vscodium-archive-keyring.gpg ] https://paulcarroty.gitlab.io/vscodium-deb-rpm-repo/debs vscodium main\u0026#39; | sudo tee /etc/apt/sources.list.d/vscodium.list Update repositories and install VSCodium: 1 sudo apt update \u0026amp;\u0026amp; sudo apt install codium Using LaTeX with VSCodium In settings, search for word wrap and enable it so that long lines wrap instead of running off the screen. Install the LaTeX distribution TeX Live (recommended by the VSCodium LaTeX Workshop extension), ChkTeX for LaTeX semantic checking, and texlive-extra-utils for extensions like latexindent. 
1 apt-get install -y texlive texlive-latex-extra texlive-extra-utils chktex latexmk texlive-fonts-recommended texlive-fonts-extra texlive-science texlive-latex-base-doc Add the path 1 2 3 4 echo \u0026#39;export PATH=$PATH:/usr/share\u0026#39; \u0026gt;\u0026gt; ~/.bashrc echo \u0026#39;export PATH=$PATH:/usr/share\u0026#39; \u0026gt;\u0026gt; ~/.zshrc source ~/.bashrc source ~/.zshrc Inkscape Install via the Flatpak repositories.\n1 flatpak install org.inkscape.Inkscape Mattermost-Desktop According to the official Mattermost documentation, for Debian-based operating systems, the steps to follow are:\nDownload the latest version of Mattermost (use the official documentation page): 64-bit systems mattermost-desktop-4.6.2-linux-amd64.deb Zotero The reference steps are from the Debian wiki for installing Zotero.\nInstall Zotero via Flatpak:\n1 flatpak install flathub org.zotero.Zotero Add Zotero to the PATH:\n1 echo \u0026#39;export PATH=$PATH:/var/lib/flatpak/exports/bin\u0026#39; \u0026gt;\u0026gt; ~/.bashrc Run Zotero:\n1 flatpak run org.zotero.Zotero Sync the library and install the BetterBibTex plugin. To install the BetterBibTex plugin, follow its documentation.\nOnce installed, add the following script to include the keywords when exporting with\nOwnCloud Follow the installation guide for Debian.\nOnce installed, sync the folders.\nThunderbird Copy and paste the .thunderbird folder for a complete migration. 
Install with:\n1 sudo apt install thunderbird Pip 1 sudo apt install python3-pip Node and npm 1 2 sudo curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo bash - sudo apt-get install -y nodejs Kubernetes kubectl Install using native package management\nHDMI Audio Configuration According to this post, add the following to /etc/pulse/default.pa:\n1 2 3 load-module module-alsa-sink device=hdmi:0 load-module module-combine-sink sink_name=combined set-default-sink combined XFCE Customization on Debian Theme Download themes from xfce-look, filtering by rating. Some recommended ones are Qogir-dark, Ultimate-dark, or Nordic. Extract them and copy them to the .themes folder, located at /home/username/.themes.\nGo to Appearance -\u0026gt; Themes -\u0026gt; Qogir-dark.\nIcons Add the Qogir-dark icons. Download them from xfce-look, extract them, and copy them to the .icons folder located at /home/username/.icons.\nGo to Appearance -\u0026gt; Icons -\u0026gt; Qogir-dark\nDock Install Plank:\n1 sudo apt-get install plank Window Manager Install emerald:\n1 sudo apt install emerald Run the emerald-theme-manager program and choose a theme:\n1 emerald-theme-manager Run in the background:\n1 emerald --replace \u0026amp; Plymouth Steps followed from the official Debian wiki.\ni3wm Window Manager 1 sudo apt install i3 i3status Wikipedia, Debian\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nOh My Zsh Documentation, Install oh-my-zsh\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDebian Documentation, Fonts\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nVSCodium Documentation, Installation\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2021-04-02T00:00:00Z","permalink":"/en/p/debian-configuration/","title":"Debian Configuration"},{"content":"This post covers installing and configuring the Debian operating system with LUKS-encrypted LVM volumes.\nWhat is Debian? 
Debian GNU/Linux is a free operating system, developed by thousands of volunteers from around the world who collaborate via the Internet.\nDebian\u0026rsquo;s dedication to free software, its volunteer base, its non-commercial nature, and its open development model distinguish it from other GNU operating system distributions1.\nWhat is LVM (Logical Volume Manager)? LVM is an implementation of a logical volume manager for the Linux kernel. LVM includes many of the features expected from a volume manager, including:\nResizing of logical groups Resizing of logical volumes Read-only snapshots (LVM2 offers read and write) RAID0 of logical volumes. LVM does not implement RAID1 or RAID5, so it is recommended to use dedicated RAID software for these operations, placing the LVs on top of the RAID2. RAID will not be used in this configuration.\nWhat is LUKS (Linux Unified Key Setup)? LUKS is a disk encryption specification created by Clemens Fruhwirth, originally intended for Linux. While most disk encryption software implements different and incompatible undocumented formats, LUKS specifies a standard on-disk format, platform-independent, for use with various tools. This not only facilitates compatibility and interoperability between different programs, but also ensures that they all implement password management in a secure and documented manner. The reference implementation runs on Linux and is based on an enhanced version of cryptsetup, using dm-crypt as the disk encryption interface3.\nPartition Table The ext4 format is used for partitions because it improves I/O speed and uses less CPU than the ext3 and ext2 formats. 
The following minimum values are recommended:\nPartition Recommended Size Debian Allocation Custom Allocation Contains / \u0026gt;= 750MB 22GB 64GB /etc, /bin, /sbin, /lib, /dev, /usr /usr \u0026gt;= 4-6GB 0 0 User programs, libs and docs /var \u0026gt;= 2-3GB 32GB 112GB Variable data such as emails /tmp \u0026gt;= 100MB 16GB 32GB Web pages, package cache, temporary data /home \u0026gt;= 100MB 200GB 288GB Directory with Documents, Downloads, \u0026hellip; /boot \u0026gt;= 256MB 500MB 512MB Primary Partition, ext4 or ext2, encryption not recommended /boot/efi \u0026gt;= 100MB 250MB 0 Encryption not recommended and bootable flag: on /swap \u0026gt;= 8GB 16GB 16GB Swap area Steps Followed It is recommended to connect the machine via Ethernet so the system can download updates during installation.\nConfigure the language, region, keyboard, etc. (Skip this step) Create manual partitions, specifically 3: one for /boot, another for /boot/efi, and another for the remaining partitions which will be encrypted with LUKS. Encrypt with LUKS and choose a password of more than 20 characters. Create an LVM volume group and then create a logical volume for each partition. Assign the labels and finish configuring the partitions. Set a hostname and create the root user and a non-privileged user. 
Recommended References Arch Wiki, dm-crypt/Encrypting an entire system Debian Wiki, LVM Youtube, How to install Debian GNU/Linux with LUKS encrypted LVM Wikipedia, Debian\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nWikipedia, Logical Volume Manager (Linux)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nWikipedia, LUKS (Linux Unified Key Setup)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2021-04-02T00:00:00Z","permalink":"/en/p/installing-debian-with-luks-encrypted-lvm-volumes/","title":"Installing Debian with LUKS-encrypted LVM Volumes"},{"content":"Create a group Create the shared group: 1 sudo groupadd shared Add a user Add user devops to the shared group: 1 sudo usermod -a -G shared devops For the changes to take effect, you need to log out and log back in as the devops user. You can verify with the command:\n1 groups Change the group of a directory and its files Change the group of the wiki directory to shared: 1 sudo chgrp -R shared wiki Add write permissions for a group on a directory and its files Add write w permissions for the user group on the wiki directory.\n1 sudo chmod -R g+w wiki ","date":"2021-04-02T00:00:00Z","permalink":"/en/p/permissions-in-gnu/linux/","title":"Permissions in GNU/Linux"},{"content":"Cloud Cloud Training Resources Kubernetes Kube academy\nKatakoda\nA visual guide on troubleshooting Kubernetes deployments\nTécnicas avanzadas de Scheduling\n","date":"2021-03-05T10:46:34+01:00","permalink":"/en/p/devops-training-resources/","title":"DevOps Training resources"},{"content":"Configure VSCode as a LaTeX IDE Install the following packages on Debian based:\ntexlive-full latexmk 1 sudo apt install -y texlive-full latexmk Add some extensions:\nLatex Workshop LTex Todo Tree Select word-wrap to see the text in unique windows: -\u0026gt; Settings -\u0026gt; Search= wrap -\u0026gt; Editor: Word Wrap -\u0026gt; on ","date":"2021-02-17T13:58:35+01:00","permalink":"/en/p/latex-configuration/","title":"LaTeX Configuration"},{"content":"Guide for 
Authors\nSurveys, Review or Literature Review The terms ‘Review’, ‘Literature Review’, and ‘Survey’ are interchangeable in this context.\nWhat makes a good survey?\nA good survey typically has the following elements: A length of approximately 50 printed pages (i.e. 100 typed pages) A good descriptive title A concise abstract A table of contents that establishes the structure of the survey Introduction (including motivation and historical remarks) Outline of the Survey Basic concepts, examples and results (with sketches of the proofs) Comments on the relevance of the results, relations to other results and applications Open problems Critical review of the relevant literature Comprehensive bibliography ","date":"2021-02-17T11:25:44+01:00","permalink":"/en/p/academic-papers-structure/","title":"Academic papers structure"},{"content":"Create a researcher profile It\u0026rsquo;s recommended to register on the following platforms:\nORCID\nResearcherID\nScopus\nWeb of Science\nPublons\nGoogle Scholar Citations\nHow to measure the impact of authors and their papers? h-index Science Citation Index (SCI) Impact of scientific journals ","date":"2021-02-17T11:25:44+01:00","permalink":"/en/p/academic-researcher-profile/","title":"Academic Researcher Profile"},{"content":"How to start researching a topic? Go to a paper search engine or database and run a specific search, either with keywords in quotation marks (\u0026quot;\u0026quot;) or with the advanced search. Look for surveys or literature reviews of the research topic. Skim the abstract, body, and conclusion of each paper and write a short note about it. Select the papers that are closest to your research topic. 
","date":"2021-02-17T09:13:29+01:00","permalink":"/en/p/first-steps-in-academic-research/","title":"First Steps in academic research"},{"content":"Requirements Pipenv 1 2 3 4 5 6 7 8 9 10 11 # Install pip install pipenv # Install a packages for the project pipenv install \u0026lt;package\u0026gt; # Activate Virtual Env pipenv shell # Run a script in the virtual env pipenv run python \u0026lt;script.py\u0026gt; Generate requirements.txt:\n1 pipenv lock -r \u0026gt;\u0026gt; requirements.txt ","date":"2021-02-17T09:13:29+01:00","permalink":"/en/p/python-basics/","title":"Python basics"},{"content":"Zotero Zotero “is a free and open-source reference management software to manage bibliographic data and related research materials” 1. It is recommended to use File Syncing with WebDAV to have more free space.\nThere are many useful plugins for Zotero. Some recommended plugins are:\nLatex BetterBitTex. It is highly recommended add auto-export of the bibliography and use git. Manage files Zotfile \u0026ldquo;plugin to manage your attachments: automatically rename, move, and attach PDFs (or other files) to Zotero items, sync PDFs from your Zotero library to your (mobile) PDF reader (e.g. an iPad, Android tablet, etc.) and extract annotations from PDF files\u0026rdquo; 2. Citations Counters Zotero Citation Counts Manager Zotero Scholar Citation - DEPRECATED Wikipedia, Zotero\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nGitHub, Zotfile\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2021-02-17T09:12:29+01:00","permalink":"/en/p/bibliography/","title":"Bibliography"},{"content":"How to deploy in GitLab pages? Set up CI/CD adding a file .gitlab-ci.yml with Hugo template. Change the master branch to main in the template. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 # This file is a template, and might need editing before it works on your project. 
--- # All available Hugo versions are listed here: # https://gitlab.com/pages/hugo/container_registry image: registry.gitlab.com/pages/hugo:latest variables: GIT_SUBMODULE_STRATEGY: recursive test: script: - hugo except: - main pages: script: - hugo artifacts: paths: - public only: - main Set the baseURL parameter using this structure: baseURL = \u0026quot;https://\u0026lt;gitlab-user\u0026gt;.gitlab.io/\u0026lt;project-name\u0026gt;/\u0026quot;.\nSet the Pages access control so that the site is visible to everyone. Navigate to your project’s Settings \u0026gt; General and expand Visibility, project features, permissions \u0026gt; Pages \u0026gt; Everyone.\n","date":"2021-02-15T13:00:14+01:00","permalink":"/en/p/gitlab-pages/","title":"Gitlab pages"},{"content":"MLOps with GitHub Video of MLOps Workflow With Github Actions - O\u0026rsquo;Reilly ","date":"2021-02-15T13:00:14+01:00","permalink":"/en/p/mlops-training-resources/","title":"MLOps Training resources"},{"content":"\u0026ldquo;Hugo-theme-learn is a theme for Hugo, a fast and modern static website engine written in Go. Where Hugo is often used for blogs, this multilingual-ready theme is fully designed for documentation\u0026rdquo; 1. The GitHub repo of Learn is released under the MIT license.\nCreate a new chapter with: 1 hugo new --kind chapter hugo/_index.md Create a new entry. 1 hugo new hugo/quick_start.md Hugo-theme-learn, Documentation\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2021-02-13T12:45:35+01:00","permalink":"/en/p/hugo-learn-theme/","title":"Hugo Learn theme"},{"content":"How to create a new Hugo site? Install Hugo following the official docs.\nChoose a theme here. For example, the theme of this website is Learn.\nCreate a new hugo project with:\n1 hugo new site \u0026lt;name-project\u0026gt; Initialize a Git repository. 1 git init Place the chosen theme in a new themes folder, either by extracting its zip there, by cloning it, or by adding it as a submodule. 
1 git clone https://github.com/matcornic/hugo-theme-learn themes/learn Or add the submodule:\n1 git submodule add https://github.com/gesquive/slate themes/slate Edit the configuration file config.toml and add: 1 2 baseURL = \u0026#34;http://localhost:1313/\u0026#34; theme = \u0026#34;learn\u0026#34; Run the website; it will be available at http://localhost:1313/. 1 hugo server ","date":"2021-02-13T12:19:30+01:00","permalink":"/en/p/hugo-quick-start/","title":"Hugo quick start"},{"content":"Where to look for peer-reviewed academic literature? Scopus Web of Science Google Scholar Institute of Electrical and Electronics Engineers (IEEE) Xplore Association for Computing Machinery (ACM) Digital Library ScienceDirect dblp Computer Science Bibliography Springer Link Wiley Online Library The collection of Computer Science Bibliography Open Access Button Open-access pre-print repositories ArXiv.org Archive ouverte HAL Where to look for books? Open Libra How to visualize the connections between academic papers? Connected papers What industry electronic databases exist? Gartner OVUM ","date":"2021-02-12T14:06:49+01:00","permalink":"/en/p/search-engines-for-academic-papers/","title":"Search engines for academic papers"},{"content":"Preparing Files for USB Memory Stick Booting Download the image 1 wget \u0026lt;url-iso\u0026gt; Flash the ISO onto the USB drive The first step is to identify the partition of the connected USB drive. We use the lsblk command and then unmount the partition. 
After that, we format it as vFAT and, finally, flash the ISO onto the USB drive.\nOne way to burn an image is with the dd command, as shown below. The output target must be the whole USB device reported by lsblk (here /dev/sdc), not one of its partitions:\n1 2 3 4 5 6 7 8 # See the partitions lsblk # Unmount the USB partition sudo umount /dev/sdc1 # Flash the ISO onto the USB device sudo dd bs=4M if=image.iso of=/dev/sdc conv=fsync oflag=direct status=progress ","date":"2021-01-11T00:00:00Z","permalink":"/en/p/creating-a-bootable-usb-flashdrive/","title":"Creating a Bootable USB Flashdrive"},{"content":"This post provides an overview of the basic kubectl CLI commands that can be applied to Kubernetes objects. Some examples of Kubernetes objects include Pods, ReplicaSets, Deployments, Namespaces, etc.\nNamespaces Namespaces are used in Kubernetes to organize cluster objects. Essentially, a namespace represents a folder containing a set of objects. By default, kubectl interacts with the default namespace. To use a different namespace, the --namespace flag is required, for example --namespace=example. To interact with all namespaces, use the --all-namespaces flag 1.\nContexts If you want to change the default namespace permanently, you can use a context. When used, it is recorded in the kubectl configuration file, stored at $HOME/.kube/config. To create a context with a new default namespace and then switch to it, run 1:\n1 kubectl config set-context my-context --namespace=new-default-namespace 1 kubectl config use-context my-context Kubernetes API objects Every Kubernetes object is represented by a RESTful resource and exists at a unique HTTP path in the Kubernetes API. Resources are represented as JSON or YAML files. Through the kubectl command, you can access these objects. 
For example, using kubectl get you can list all resources of a given type in the default namespace 1:\n1 kubectl get \u0026lt;resource-name\u0026gt; To get a specific object:\n1 kubectl get \u0026lt;resource-name\u0026gt; \u0026lt;object-name\u0026gt; To get more information about the object in JSON or YAML format, you can add the -o json or -o yaml flags respectively. This output is not very human-readable.\nAnother option to get human-readable details about an object is to use the kubectl describe command:\n1 kubectl describe \u0026lt;resource-name\u0026gt; \u0026lt;object-name\u0026gt; Creating, updating, or deleting Kubernetes objects As mentioned earlier, Kubernetes objects or resources are represented by JSON or YAML files. To create, update, or delete these objects, such files are used. For example, to create or update an object stored in example.yaml, run:\n1 kubectl apply -f example.yaml If you prefer to make interactive edits instead of modifying the local file, you can use the kubectl edit command to download the latest version and launch an editor. After you save the file, it is uploaded and the object is updated automatically.\n1 kubectl edit \u0026lt;resource-name\u0026gt; \u0026lt;object-name\u0026gt; The kubectl apply command also records the last applied configuration in an annotation on the object. You can work with this record using the edit-last-applied, set-last-applied, and view-last-applied subcommands, which come directly after apply.\n1 kubectl apply view-last-applied -f myobj.yaml To delete an object, simply run:\n1 kubectl delete -f example.yaml Debugging kubectl also has a set of commands for debugging your containers. To view the logs of a running container, run:\n1 kubectl logs \u0026lt;pod-name\u0026gt; If there are multiple containers in the pod, you can choose the container to inspect with the -c flag.\nBy default, kubectl logs lists the current logs and exits. 
If you want to continuously stream the logs to the terminal instead, you can add the -f (follow) flag to the command line.\nYou can also use the exec command to run a command in a running container:\n1 kubectl exec -it \u0026lt;pod-name\u0026gt; -- bash This will provide an interactive console inside the running container for more detailed debugging.\nO\u0026rsquo;Reilly, Kubernetes: Up and Running\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2020-10-29T00:00:00Z","permalink":"/en/p/basic-kubernetes-commands/","title":"Basic Kubernetes Commands"},{"content":"This post provides a description of the basic Kubernetes objects. Some examples of Kubernetes objects include Pods, ReplicaSets, Deployments, Namespaces, etc.\nBasic objects According to the Kubernetes documentation, Kubernetes objects are persistent entities. Kubernetes uses these entities to represent the state of the cluster. Each Kubernetes object is represented by a RESTful resource and exists at a unique HTTP path. Specifically, objects can describe 1:\nWhich containerized applications are running (and on which nodes). The resources available to those applications. The behavioral policies for those applications, such as restart policies, upgrades, and fault tolerance. Almost all Kubernetes objects include two fields that configure them: the spec or desired state of the object specification, and the status or actual/current state of the object. In the spec section, the intent or desired state of the object is declared. The control plane is responsible for attempting to match the actual state of the object with the desired state.\nTo create an object, the Kubernetes API must receive the object information in JSON format. Most of the time, the information is sent through kubectl in a .yaml file that will be converted to JSON format. 
An example of a .yaml file is as follows 1:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2 kind: Deployment metadata: name: nginx-deployment spec: selector: matchLabels: app: nginx replicas: 2 # tells deployment to run 2 pods matching the template template: metadata: labels: app: nginx spec: containers: - name: nginx image: nginx:1.14.2 ports: - containerPort: 80 Required fields The following mandatory fields must be set to create an object 1:\napiVersion: which version of the Kubernetes API is being used to create the object. kind: what type of object you want to create. metadata: data that serves to uniquely identify the object, including a name, a UID, and an optional namespace. spec: the desired state for the object. Labels Labels are key-value pairs attached to Kubernetes objects. They are used to organize and select subsets of objects based on predefined requirements. Many objects can share the same label, so labels do not provide uniqueness to objects 2.\nFor example, to add the label color=green to a Pod called plant, you can run the following 3:\n1 kubectl label pods plant color=green The above command will not overwrite an existing label; to change the value of a label that is already set, add the --overwrite flag. To remove the color label, append a dash to the label key:\n1 kubectl label pods plant color- Label selectors Controllers use label selectors to select a subset of objects. Kubernetes supports two types of selectors 4:\nEquality-based selectors Allow filtering objects based on label keys and values. Matching is achieved using the operators:\nEquals = or == (there is no difference between the operators) Not equals != Set-based selectors Allow filtering objects based on a set of values. You can use in, notin operators for label values, and the exists operator for label keys.\nObject types Pods A pod is the smallest scheduling unit in Kubernetes. 
It is a logical collection of one or more containers that 2:\nAre scheduled on the same host. Share the same network namespace, and therefore share a single IP address assigned to the Pod. Have access to mount the same external storage (volumes). Pods are ephemeral in nature and do not have self-healing capabilities. That is why controllers are used to manage Pod replication, fault tolerance, self-healing, etc. Some of these controllers include Deployments, ReplicaSets, etc.\nReplicaSets Deployments Kubernetes Documentation, Understanding Kubernetes Objects\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nKubernetes Documentation, Labels and Selectors\u0026#160;\u0026#x21a9;\u0026#xfe0e;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nO\u0026rsquo;Reilly, Kubernetes: Up and Running\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nKubernetes Documentation, Label selectors\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2020-10-29T00:00:00Z","permalink":"/en/p/basic-kubernetes-objects/","title":"Basic Kubernetes Objects"},{"content":"Local Kubernetes installation There are several tools that can be used to deploy Kubernetes on one or many clusters. Some of them include:\nMinikube Kind Docker Desktop MicroK8s K3S Minikube is the easiest and preferred method for setting up Kubernetes locally. It is used to manage a single-node cluster, although there is already an experimental feature that supports multi-node clusters.\nMinikube The minikube project is a local Kubernetes cluster implementation for Linux, macOS, and Windows. Its goal is to be the best tool for local Kubernetes application development 1.\nThe first steps with minikube can be found in the official documentation and are as follows 2:\nRequirements 2 CPUs or more 2GB of RAM 20GB of disk space Internet connection Container or virtual machine manager, such as: Docker, Hyperkit, Hyper-V, KVM, Parallels, Podman, VirtualBox, or VMware. 
Installing minikube For Linux there are three options:\nBinary package: 1 2 curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 sudo install minikube-linux-amd64 /usr/local/bin/minikube Debian package: 1 2 curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb sudo dpkg -i minikube_latest_amd64.deb RPM package: 1 2 curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-latest.x86_64.rpm sudo rpm -ivh minikube-latest.x86_64.rpm To start minikube, run:\n1 minikube start To stop minikube safely, run:\n1 minikube stop Installing Kubernetes Kubernetes can be installed locally on virtual machines or directly on the operating system. Tools such as Ansible or kubeadm can be used to automate the installation.\nThe CLI tool kubectl can be used to manage, deploy, and configure the resources and applications of the Minikube cluster, and can be installed with the following commands:\n1 2 3 4 5 sudo apt-get update \u0026amp;\u0026amp; sudo apt-get install -y apt-transport-https gnupg2 curl curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - echo \u0026#34;deb https://apt.kubernetes.io/ kubernetes-xenial main\u0026#34; | sudo tee -a /etc/apt/sources.list.d/kubernetes.list sudo apt-get update sudo apt-get install -y kubectl For details on kubectl commands, you can refer to the kubectl book, the official Kubernetes documentation, or its GitHub repository.\nA common step after installation is to configure and enable kubectl command autocompletion:\n1 2 3 4 sudo apt install -y bash-completion source /usr/share/bash-completion/bash_completion source \u0026lt;(kubectl completion bash) echo \u0026#39;source \u0026lt;(kubectl completion bash)\u0026#39; \u0026gt;\u0026gt;~/.bashrc Other relevant packages to install include:\nkubeadm: used for managing or automating the installation kubelet: an agent that runs on each node and communicates with the control plane 
components kubernetes-cni: allows configuring network elements sudo apt-get install kubelet kubeadm kubernetes-cni GitHub, minikube\nMinikube, Getting Started\n","date":"2020-10-29T00:00:00Z","permalink":"/en/p/getting-started-with-minikube/","title":"Getting Started with Minikube"},{"content":"This post covers the components of the Kubernetes architecture. The main sources of information are the Introduction to Kubernetes course by The Linux Foundation on edX, authored by Chris Pokorni and Neependra Khare, and the official Kubernetes documentation.\nComponents of the Kubernetes architecture A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node. At a very high level of abstraction, Kubernetes has the following main components:\nOne or more master nodes, on the control plane side. One or more worker nodes. The following figure shows the architecture of the components of a Kubernetes cluster 1:\nThe master node provides a runtime environment for the control plane responsible for managing the state of a Kubernetes cluster and is the brain behind all operations within the cluster. The control plane components are agents with very distinct roles in cluster management. To communicate with the Kubernetes cluster, users send requests to the control plane through a command-line interface (CLI) tool, a web user interface dashboard, or an application programming interface (API) 2.\nIt is essential to keep the control plane running at all costs. Losing the control plane can cause downtime, resulting in service disruption to clients, with a potential loss of business. To ensure fault tolerance of the control plane, master node replicas can be added to the cluster, configured in high availability mode. 
While only one of the master nodes is dedicated to actively managing the cluster, the control plane components remain synchronized across the master node replicas. This type of configuration adds resilience to the cluster\u0026rsquo;s control plane, in case the active master node fails 2.\nTo preserve the state of the Kubernetes cluster, all cluster configuration data is saved in etcd. etcd is a distributed key-value store that only holds data related to the cluster state, not client workload data. etcd can be configured on the master node (stacked topology) or on its dedicated host (external topology) to help reduce the chances of data store loss by decoupling it from the other control plane agents 2.\nWith the stacked etcd topology, high availability master node replicas also ensure the resilience of the etcd data store. However, that is not the case with the external etcd topology, where etcd hosts must be replicated separately for high availability, a configuration that introduces the need for additional hardware.\nA master node runs the following control plane components 1:\nkube-apiserver or API server kube-scheduler or scheduler kube-controller-manager or controller manager etcd or data store While a worker node has the following components:\nContainer Runtime kubelet or node agent kube-proxy or proxy Addons for DNS, dashboard, cluster-level monitoring, and logging Master node kube-apiserver All administrative tasks are coordinated by kube-apiserver, a central control plane component that runs on the master node. The API server receives RESTful requests from users, operators, and external agents, then validates and processes them. During processing, the API server reads the current state of the Kubernetes cluster from the etcd data store, and after the execution of a call, the resulting state of the Kubernetes cluster is saved in the distributed key-value data store for persistence. 
The API server is the only control plane component that communicates with the etcd data store, both for reading and saving Kubernetes cluster state information, acting as an intermediary interface for any other control plane agent querying the cluster state.\nThe API server is highly configurable and customizable. It can scale horizontally, and it also supports adding custom secondary API servers, a configuration that turns the primary API server into a proxy for all custom secondary API servers and routes all incoming RESTful calls to them based on custom-defined rules 2.\nkube-scheduler The role of the kube-scheduler is to assign new workload objects, such as pods, to nodes. During the scheduling process, decisions are made based on the current state of the Kubernetes cluster and the requirements of the new object. The scheduler obtains from the etcd data store, through the API server, the resource usage data for each worker node in the cluster. The scheduler also receives from the API server the requirements of the new object that are part of its configuration data. Requirements may include constraints set by users and operators, such as scheduling work on a node labeled with disk == ssd as a key-value pair. The scheduler also takes into account Quality of Service (QoS) requirements, data locality, affinity, anti-affinity, dependent data location, taints, cluster topology, etc. Once all the cluster data is available, the scheduling algorithm filters the nodes with predicates to isolate potential candidate nodes, which are then scored with priorities to select the node that satisfies all the requirements for the new workload. The result of the decision process is communicated to the API server, which then delegates the workload deployment to other control plane agents.\nThe scheduler is highly configurable and customizable through scheduling policies, plugins, and profiles. Additional custom schedulers are also supported. 
A scheduler is extremely important and complex in a multi-node Kubernetes cluster 2.\nkube-controller-manager A control plane component that runs the controllers to regulate the state of the Kubernetes cluster. Controllers are watch loops that run continuously and compare the desired state of the cluster (provided by the configuration data of objects) with its current state (obtained from the etcd data store through the API server). In case of a discrepancy, corrective actions are taken in the cluster until its current state matches the desired state. It runs controllers responsible for acting when nodes become unavailable, for ensuring the expected number of pods, for creating endpoints, service accounts, and API access tokens 2. Logically, each controller is an independent process, but to reduce complexity, they are all compiled into a single binary and run in a single process. These controllers include 1:\nNode controller: responsible for detecting and responding when a node goes down Replication controller: responsible for maintaining the correct number of pods for each replication controller in the system Endpoints controller: builds the Endpoints object, i.e., joins Services and Pods Service account and token controllers: create default accounts and API access tokens for new Namespaces. etcd A persistent, consistent, and distributed key-value data store used to store all Kubernetes cluster information 1. New data is appended to the data store, never replaced. Obsolete data is periodically compacted to minimize the size of the data store.\nOf all the control plane components, only the API server can communicate with the etcd data store.\nThe etcd CLI management tool, etcdctl, provides options for backups, snapshots, and restores. These are especially useful for a single-instance etcd Kubernetes cluster, common in development and learning environments. 
However, in Staging and Production environments, it is extremely important to replicate data stores in high availability mode.\nSome Kubernetes cluster bootstrapping tools, such as kubeadm, provision stacked etcd master nodes, where the data store runs alongside the other control plane components on the same master node and shares resources with them 2.\nFor data store isolation from the control plane components, the bootstrapping process can be configured for an external etcd topology. The data store is deployed on a separate dedicated host from the control plane, thus reducing the chances of an etcd failure 2.\nBoth stacked and external etcd topologies support high availability configurations. etcd is based on the Raft consensus protocol, which allows a set of machines to survive the failure of some of them, including master node failures. At any given time, one of the nodes in the group will be the leader and the rest will be followers 2.\netcd is written in the Go programming language. In Kubernetes, besides storing the cluster state, etcd is also used to store configuration details such as subnets, ConfigMaps, Secrets, etc.\nWorker node A worker node provides a runtime environment for client applications. Although they are containerized microservices, these applications are encapsulated in pods, controlled by the cluster\u0026rsquo;s control plane agents running on the master node. Pods are scheduled on worker nodes, where they find the necessary compute, memory, and storage resources to run, and networking to communicate with each other and the outside world. A pod is the smallest scheduling unit in Kubernetes. 
It is a logical collection of one or more containers scheduled together, and the collection can be started, stopped, or rescheduled as a single unit of work.\nAdditionally, in a multi-worker Kubernetes cluster, network traffic between client users and the containerized applications deployed in Pods is handled directly by the worker nodes and is not routed through the master node 2.\nContainer Runtime Although Kubernetes is described as a \u0026ldquo;container orchestration engine\u0026rdquo;, it does not have the ability to handle containers directly. To manage the lifecycle of a container, Kubernetes requires a container runtime on the node where a Pod and its containers will be scheduled. Kubernetes supports many container runtimes 2:\nDocker: although it is a container platform that uses containerd as its container runtime, it is the most popular option used with Kubernetes CRI-O: a lightweight container runtime for Kubernetes that also supports Docker image registries containerd: a simple, portable container runtime that provides robustness frakti: a hypervisor-based container runtime for Kubernetes kubelet The kubelet is an agent that runs on every node and communicates with the control plane components on the master node. It receives pod definitions, primarily from the API server, and interacts with the container runtime on the node to run containers associated with the pod. It also monitors the health and resources of the containers running in pods. The kubelet agent takes a set of Pod specifications, called PodSpecs, that have been created by Kubernetes and ensures that the containers described in them are running and healthy.\nThe kubelet connects to container runtimes through a plugin based on the Container Runtime Interface (CRI). The CRI consists of protocol buffers, gRPC APIs, libraries, and additional specifications and tools that are currently under development. 
To connect to interchangeable container runtimes, kubelet uses a shim application that provides a clear abstraction layer between kubelet and the container runtime.\nFrom blog.kubernetes.io\nAs shown above, the kubelet acting as a gRPC client connects to the CRI shim, which in turn acts as a gRPC server to perform container and image operations. The CRI implements two services: ImageService and RuntimeService. ImageService is responsible for all image-related operations, while RuntimeService is responsible for all pod and container-related operations 2.\nkube-proxy kube-proxy is the network agent that runs on every node, responsible for dynamic updates and maintenance of all network rules on the node. It extracts Pod network details and forwards connection requests to Pods.\nThe kube-proxy is responsible for TCP, UDP, and SCTP stream forwarding or round-robin forwarding across a set of pod backends, and it implements forwarding rules defined by users through Service API objects 2.\nAddons Addons are cluster features and functionalities not yet available in Kubernetes, so they are implemented through third-party pods and services 2.\nDNS: the cluster DNS is a DNS server required to assign DNS records to Kubernetes objects and resources\nDashboard: a general-purpose web-based user interface for cluster management\nMonitoring: collects cluster-level container metrics and stores them in a central data store\nLogging: collects cluster-level container logs and stores them in a central log store for analysis.\nNetworking challenges Decoupled microservices-based applications rely heavily on networking to mimic the tight coupling that was once available in the monolithic era. Networking, in general, is not the easiest to understand and implement. 
Kubernetes is no exception: as an orchestrator of containerized microservices, it must address several distinct networking challenges 2:\nContainer-to-container communication within pods Pod-to-pod communication on the same node and across all cluster nodes Pod-to-Service communication within the same namespace and across cluster namespaces External-to-Service communication so that clients can access applications in a cluster. Container to container within pods By leveraging the virtualization features of the underlying host OS kernel, a container runtime creates an isolated network space for each container it starts. On Linux, this isolated network space is called a network namespace. A network namespace can be shared between containers or with the host operating system.\nWhen a pod is started, the Container Runtime initializes a special pause container with the sole purpose of creating a network namespace for the pod. All additional containers, created through user requests, running within the Pod will share the Pause container\u0026rsquo;s network namespace so they can all communicate with each other via localhost.\nPod to pod across nodes In a Kubernetes cluster, pods are scheduled on nodes in a nearly unpredictable manner. Regardless of their host node, pods are expected to be able to communicate with all other pods in the cluster, all without the implementation of Network Address Translation (NAT). This is a fundamental requirement of any Kubernetes networking implementation.\nThe Kubernetes networking model aims to reduce complexity and treats Pods as VMs on a network, where each VM is equipped with a network interface, so each Pod receives a unique IP address. This model is called \u0026ldquo;IP-per-Pod\u0026rdquo; and ensures pod-to-pod communication, just as virtual machines can communicate with each other on the same network.\nHowever, let us not forget about containers. 
They share the Pod\u0026rsquo;s network namespace and must coordinate port assignments within the Pod just as applications would on a VM, while being able to communicate with each other on localhost within the Pod. However, containers are integrated with the overall Kubernetes networking model through the use of Container Network Interface (CNI)-compatible CNI plugins. CNI is a set of specifications and libraries that allow plugins to configure networking for containers. While there are some core plugins, most CNI plugins are third-party Software-Defined Networking (SDN) solutions that implement the Kubernetes networking model. In addition to addressing the fundamental networking model requirement, some networking solutions offer support for network policies. Flannel, Weave, and Calico are just a few of the SDN solutions available for Kubernetes clusters.\nPod to the outside world A successfully deployed containerized application running in pods within a Kubernetes cluster may require accessibility from the outside world. Kubernetes enables external accessibility through Services, complex encapsulations of routing rule definitions stored in iptables on cluster nodes and implemented by kube-proxy agents. 
By exposing services to the external world with the help of kube-proxy, applications become accessible from outside the cluster through a virtual IP address.\nKubernetes Documentation, Kubernetes Components\nedX, Introduction to Kubernetes - The Linux Foundation\n","date":"2020-10-28T00:00:00Z","permalink":"/en/p/kubernetes-architecture-components/","title":"Kubernetes Architecture Components"},{"content":"This post covers the basic introductory concepts of Kubernetes. The main sources of information are the Introduction to Kubernetes course by The Linux Foundation on edX, authored by Chris Pokorni and Neependra Khare, and the official Kubernetes documentation.\nWhat is Kubernetes and why is it used? Kubernetes or \u0026ldquo;K8s\u0026rdquo; is a portable, extensible, open-source platform, licensed under Apache 2.0, for managing workloads and services. It is used for automating the deployment, scaling, and management of containerized applications and was originally designed by Google and donated to the Cloud Native Computing Foundation (part of the Linux Foundation). It supports different container runtime environments, including Docker 1.\nCurrently, there is a trend toward running processes in what is known as the \u0026ldquo;cloud\u0026rdquo;. 
However, this was preceded by a model known as monolithic, which relied on outdated software architecture principles, with large components written in legacy programming languages and the entire system deployed on expensive hardware that was costly to manage.\nTherefore, the current trend is to separate and simplify each software component to turn them into distributed components, described by their specific characteristics. This creates microservices that can be coupled together and are easy to replace or relocate. The microservices architecture is aligned with the principles of Event-Driven Architecture (EDA) and Service-Oriented Architecture (SOA).\nEach microservice is developed in a modern programming language, selected as the most suitable for the type of service and its function. This offers great flexibility in combining microservices with specific hardware when necessary, enabling deployments on low-cost commodity hardware. Although the distributed nature of microservices adds complexity to the architecture, one of the greatest benefits of microservices is scalability. With the overall application becoming modular, each microservice can be scaled individually, either manually or automatically through demand-based autoscaling. Additionally, there is practically no downtime or service disruption for clients because updates are rolled out seamlessly, one service at a time, instead of having to recompile, rebuild, and restart an entire monolithic application 2.\nIn summary, these microservices are deployed in containers and Kubernetes is a container orchestrator. Therefore, to understand what Kubernetes is, it is necessary to review the basic concepts of containers and container orchestrators.\nWhat are containers? Containers are OS-level virtual spaces that bundle application code with associated libraries and configuration files, along with the dependencies needed for the application to run. 
They provide scalability and high performance to applications on any infrastructure of your choice. They are best suited for delivering microservices by providing portable and isolated virtual environments so applications can run without interference from other running applications.\nBenefits of using containers The benefits of using containers include 3:\nAgile application creation and deployment: Greater ease and efficiency in creating container images instead of virtual machines. Continuous development, integration, and deployment: Allows container images to be built and deployed frequently and reliably, facilitating rollbacks since the image is immutable. Dev and Ops separation of concerns: You can create container images at build time rather than at deployment time, decoupling the application from the infrastructure. Observability: Surfaces not only OS-level information and metrics, but also application health and other signals. Consistency across development, testing, and production environments: The application runs the same on a laptop as it does in the cloud. Cloud and OS distribution portability: Runs on Ubuntu, RHEL, CoreOS, your physical datacenter, Google Kubernetes Engine, and everything else. Application-centric management: Raises the level of abstraction from the OS and virtualized hardware to the application running on a system with logical resources. Loosely coupled, distributed, elastic, liberated microservices: Applications are broken into smaller, independent pieces that can be deployed and managed dynamically, rather than as a monolithic application running on a single high-capacity machine. Resource isolation: Makes application performance more predictable. Resource utilization: Enables greater efficiency and density. What are container orchestrators? In development environments, running containers on a single host for application development and testing can be a viable option. 
However, when migrating to quality assurance (QA) and production (Prod) environments, it is no longer viable because applications and services must meet specific requirements 2:\nFault tolerance. On-demand scalability. Optimal resource usage. Auto-discovery to automatically discover and communicate between components. Accessibility from the outside world. Seamless security updates or patches with zero downtime. Container orchestrators are tools that group systems to form clusters where container deployment and management are automated at scale, meeting the requirements listed above.\nThere are several container orchestrator solutions, and some of the available ones are:\nAmazon Elastic Container Service (ECS). Azure Container Instances. Azure Service Fabric. Kubernetes. Marathon. Nomad. Docker Swarm. While it is feasible to manage a few containers manually, orchestrators greatly simplify administration for operators, especially when dealing with hundreds or thousands of containers. Most container orchestrators can perform the following actions 2:\nGroup hosts while creating a cluster. Schedule containers to run on cluster hosts based on resource availability. Allow containers in a cluster to communicate with each other regardless of the host they are deployed on in the cluster. Bind containers and storage resources. Group sets of similar containers and bind them to load-balancing constructs to simplify access to containerized applications, creating a level of abstraction between containers and the user. Manage and optimize resource usage. Allow the implementation of policies to secure access to applications running inside containers. With all these configurable yet flexible features, container orchestrators are a great choice when it comes to managing containerized applications at scale.\nKubernetes features Kubernetes offers a very broad set of features for container orchestration. 
Its main features are 2:\nAutomatic bin packing: Kubernetes automatically schedules containers based on resource needs and constraints, to maximize utilization without sacrificing availability.\nSelf-healing: Kubernetes automatically replaces and reschedules containers from failed nodes. It kills and restarts containers that do not respond to health checks, based on existing rules or policies. It also prevents traffic from being routed to unresponsive containers.\nHorizontal scaling: With Kubernetes, applications scale manually or automatically based on CPU usage or custom metrics.\nService discovery and load balancing: Containers receive their own IP addresses from Kubernetes, while Kubernetes assigns a single Domain Name System (DNS) name to a set of containers to help load balance requests across all containers in the set.\nAutomated rollouts and rollbacks: Kubernetes seamlessly rolls out and rolls back application updates and configuration changes, constantly monitoring application health to prevent any downtime.\nSecret and configuration management: Kubernetes manages sensitive data and application configuration details separately from the container image, to avoid rebuilding the respective image. Secrets consist of sensitive or confidential information passed to the application without revealing the sensitive content to the code configuration.\nStorage orchestration: Kubernetes automatically mounts Software-Defined Storage (SDS) solutions to containers from local storage, external cloud providers, distributed storage, or network storage systems.\nBatch execution: Kubernetes supports batch execution, long-running jobs, and replaces failed containers.\nThe architecture of Kubernetes is modular and pluggable. It not only orchestrates microservice-type applications as decoupled modules, but its own architecture also follows decoupled microservice patterns. 
Kubernetes functionality can be extended by writing custom resources, operators, custom APIs, scheduling rules, or plugins.\nWikipedia, Kubernetes\nedX, Introduction to Kubernetes - The Linux Foundation\nKubernetes Documentation, What is Kubernetes?\n","date":"2020-10-27T00:00:00Z","permalink":"/en/p/introduction-to-kubernetes/","title":"Introduction to Kubernetes"},{"content":"This article offers a sample of basic Markdown syntax that can be used in Hugo content files. It also shows whether basic HTML elements are decorated with CSS in a Hugo theme.\nHeadings The following HTML \u0026lt;h1\u0026gt;—\u0026lt;h6\u0026gt; elements represent six levels of section headings. \u0026lt;h1\u0026gt; is the highest section level while \u0026lt;h6\u0026gt; is the lowest.\nH1 H2 H3 H4 H5 H6 Paragraph Xerum, quo qui aut unt expliquam qui dolut labo. Aque venitatiusda cum, voluptionse latur sitiae dolessi aut parist aut dollo enim qui voluptate ma dolestendit peritin re plis aut quas inctum laceat est volestemque commosa as cus endigna tectur, offic to cor sequas etum rerum idem sintibus eiur? Quianimin porecus evelectur, cum que nis nust voloribus ratem aut omnimi, sitatur? Quiatem. Nam, omnis sum am facea corem alique molestrunt et eos evelece arcillit ut aut eos eos nus, sin conecerem erum fuga. Ri oditatquam, ad quibus unda veliamenimin cusam et facea ipsamus es exerum sitate dolores editium rerore eost, temped molorro ratiae volorro te reribus dolorer sperchicium faceata tiustia prat.\nItatur? 
Quiatae cullecum rem ent aut odis in re eossequodi nonsequ idebis ne sapicia is sinveli squiatum, core et que aut hariosam ex eat.\nBlockquotes The blockquote element represents content that is quoted from another source, optionally with a citation which must be within a footer or cite element, and optionally with in-line changes such as annotations and abbreviations.\nBlockquote without attribution Tiam, ad mint andaepu dandae nostion secatur sequo quae. Note that you can use Markdown syntax within a blockquote.\nBlockquote with attribution Don\u0026rsquo;t communicate by sharing memory, share memory by communicating.\n— Rob Pike1\nTables Tables aren\u0026rsquo;t part of the core Markdown spec, but Hugo supports them out-of-the-box.\nName Age Bob 27 Alice 23 Inline Markdown within tables Italics Bold Code italics bold code Code Blocks Code block with backticks \u0026lt;!doctype html\u0026gt; \u0026lt;html lang=\u0026#34;en\u0026#34;\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026#34;utf-8\u0026#34;\u0026gt; \u0026lt;title\u0026gt;Example HTML5 Document\u0026lt;/title\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;p\u0026gt;Test\u0026lt;/p\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; Code block indented with four spaces \u0026lt;!doctype html\u0026gt; \u0026lt;html lang=\u0026quot;en\u0026quot;\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026quot;utf-8\u0026quot;\u0026gt; \u0026lt;title\u0026gt;Example HTML5 Document\u0026lt;/title\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;p\u0026gt;Test\u0026lt;/p\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; Code block with 
Hugo\u0026rsquo;s internal highlight shortcode \u0026lt;!doctype html\u0026gt; \u0026lt;html lang=\u0026#34;en\u0026#34;\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026#34;utf-8\u0026#34;\u0026gt; \u0026lt;title\u0026gt;Example HTML5 Document\u0026lt;/title\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;p\u0026gt;Test\u0026lt;/p\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; List Types Ordered List First item Second item Third item Unordered List List item Another item And another item Nested list Fruit Apple Orange Banana Dairy Milk Cheese Other Elements — abbr, sub, sup, kbd, mark GIF is a bitmap image format.\nH2O\nXn + Yn = Zn\nPress CTRL+ALT+Delete to end the session.\nMost salamanders are nocturnal, and hunt for insects, worms, and other small creatures.\nThe above quote is excerpted from Rob Pike\u0026rsquo;s talk during Gopherfest, November 18, 2015.\n","date":"2020-09-26T00:00:00Z","permalink":"/en/p/markdown-syntax-guide/","title":"Markdown Syntax Guide"}]