
How to Fix Kubernetes CrashLoopBackOff: A Practical Guide



It’s the most famous (and frustrating) status in the Kubernetes world. You run kubectl get pods, and there it is: 0/1 CrashLoopBackOff.

Despite the scary name, CrashLoopBackOff isn’t actually the error—it’s Kubernetes telling you: "I tried to start your app, it died, I waited, and I’m about to try again."

Here is the "Triple Post" finale to get your cluster healthy before the weekend.


1. The "First 3" Commands

Before you start guessing, run these three commands in order. They tell you 90% of what you need to know.

  • kubectl describe pod <name>: Look at the Events section at the bottom. It often says why it failed (e.g., OOMKilled).

  • kubectl logs <name> --previous: Crucial. This shows the logs from the failed instance before it restarted.

  • kubectl get events --sort-by=.metadata.creationTimestamp: Shows a timeline of cluster-wide issues (like Node pressure).

2. The Usual Suspects

If the logs are empty (a common headache!), the issue is likely happening before the app even starts.

  • OOMKilled: Your container exceeded its memory limit.

    • Fix: Increase resources.limits.memory (see the sample container spec after this list).

  • Config Errors: You referenced a Secret or ConfigMap that doesn't exist, or has a typo.

    • Fix: Check the describe pod output for "MountVolume.SetUp failed".

  • Permissions: Your app is trying to write to a directory it doesn't own (standard in hardened images).

    • Fix: Check your securityContext or Dockerfile USER permissions.

  • Liveness Probe Failure: Your app is actually running fine, but the probe is checking the wrong port.

    • Fix: Double-check livenessProbe.httpGet.port.
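
For reference, here is a minimal container snippet (it sits under spec.template.spec in a Deployment) showing where the memory limit and the probe port live. The container name, image, port, and values are placeholders to adapt to your own app:

containers:
- name: my-app                 # placeholder container name
  image: my-app:1.0            # placeholder image
  resources:
    limits:
      memory: "512Mi"          # raise this if describe pod shows OOMKilled
  livenessProbe:
    httpGet:
      path: /healthz           # must be a real endpoint in your app
      port: 8080               # must match the port the app actually listens on
    initialDelaySeconds: 10    # give the app time to boot before probing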


3. The Pro-Tip: The "Sleeper" Debug

If you still can't find the bug because the container crashes too fast to inspect, override the entrypoint.

Update your deployment YAML to just run a sleep loop:

command: ["/bin/sh", "-c", "while true; do sleep 30; done;"]

Now the pod will stay "Running," and you can kubectl exec -it <pod> -- /bin/sh to poke around the environment manually!
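
In context, that override goes on the container entry in your Deployment. Everything here except the command line is a placeholder for your existing spec, and remember to remove the override once you're done, or the pod will never actually run your application:

containers:
- name: my-app            # your existing container name
  image: my-app:1.0       # your existing image
  # Temporary debug override: keeps the container alive instead of starting the app
  command: ["/bin/sh", "-c", "while true; do sleep 30; done;"]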



Fixing Docker Error: "conflict: unable to remove repository reference"



Have you ever tried to clean up your local machine by deleting old Docker images, only to be met with this frustrating message?

Error response from daemon: conflict: unable to remove repository reference
"my-image" (must force) - container <ID> is using its referenced image <ID>

This error happens because Docker is protective. It won't let you delete an image if there is a container—even a stopped one—that was created from it.

Step 1: Identify the "Zombie" Containers

The error message usually gives you a container ID. You can see all containers (running and stopped) that are blocking your deletion by running:

docker ps -a

Look for any container that is using the image you are trying to delete.
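
If the list is long, you can ask Docker to show only the containers created from the image in question (my-image here is a placeholder for the image you are trying to delete):

docker ps -a --filter ancestor=my-image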

Step 2: Remove the Container First

Before you can delete the image, you must remove the container. If the container is still running, you’ll need to stop it first:

# Stop the container
docker stop <container_id>

# Remove the container
docker rm <container_id>

Step 3: Delete the Image

Now that the dependency is gone, you can safely remove the image:

docker rmi <image_name_or_id>

The "Shortcut" (Force Delete)

If you don't care about the containers and just want the image gone immediately, you can use the -f (force) flag.

Warning: If a stopped container still references the image, the force flag only removes the tag; the image layers stay on disk as a "dangling" image until that container is removed.

docker rmi -f <image_id>

Pro Tip: The Bulk Cleanup

If your machine is cluttered with dozens of these conflicts, don't fix them one by one. Use the prune command to safely remove all stopped containers, unused networks, and dangling images in one go:

docker system prune

(Add the -a flag if you also want to remove unused images, not just "dangling" ones.)
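
For example, to also reclaim the space held by images that no container (running or stopped) references anymore:

docker system prune -a

Docker will list what it is about to delete and ask for confirmation before doing anything.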


How to Fix PostgreSQL Error: "FATAL: sorry, too many clients already"



 If you are seeing the error FATAL: sorry, too many clients already or FATAL: too many connections for role "username", your PostgreSQL instance has hit its limit of concurrent connections.

This usually happens when:

  • Your application isn't closing database connections properly.

  • You have a sudden spike in traffic.

  • A connection pooler (like PgBouncer) isn't configured.

Step 1: Check Current Connection Usage

Before changing any settings, you need to see who is using the connections. Run this query to get a breakdown of active vs. idle sessions:

SELECT count(*), state
FROM pg_stat_activity
GROUP BY state;

If you see a high number of "idle" connections, your application is likely "leaking" connections (opening them but never closing them).
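
To see which user, application, or host is hoarding those idle sessions, group by the client details as well (these are all standard pg_stat_activity columns):

SELECT usename, application_name, client_addr, count(*)
FROM pg_stat_activity
WHERE state = 'idle'
GROUP BY usename, application_name, client_addr
ORDER BY count(*) DESC;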

Step 2: Emergency Fix (Kill Idle Connections)

If your production site is down because of this error, you can manually terminate idle sessions to free up slots immediately:

-- This kills all idle connections older than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND state_change < current_timestamp - interval '5 minutes';

Step 3: Increase max_connections (The Configuration Fix)

The default limit in PostgreSQL is often 100. If your hardware has enough RAM, you can increase this.

  1. Find your config file: SHOW config_file;

  2. Open postgresql.conf and find the max_connections setting.

  3. Change it to a higher value (e.g., 200 or 500).

  4. Restart PostgreSQL for changes to take effect.

Warning: Every connection consumes memory (roughly 5-10MB). If you set this too high, you might run the entire server out of RAM (OOM).
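
If you'd rather not edit postgresql.conf by hand, PostgreSQL 9.4+ can persist the change from a superuser session instead; a restart is still required before max_connections actually changes:

-- Check the current limit
SHOW max_connections;

-- Persist a higher limit (written to postgresql.auto.conf), then restart PostgreSQL
ALTER SYSTEM SET max_connections = 200;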

Step 4: The Professional Solution (Connection Pooling)

Increasing max_connections is a temporary fix. For a production-grade setup, you should use PgBouncer.

Instead of your application connecting directly to Postgres, it connects to PgBouncer. PgBouncer keeps a small pool of real connections open to the database and rotates them among hundreds of incoming requests.

Sample pgbouncer.ini configuration:

[databases]
mydatabase = host=127.0.0.1 port=5432 dbname=mydatabase

[pgbouncer]
listen_port = 6432
auth_type = md5
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
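
With this in place, the application connects to PgBouncer's port instead of Postgres directly. The user, password, and DATABASE_URL variable name below are placeholders; only the port (6432) and database name come from the config above:

DATABASE_URL=postgresql://app_user:app_password@127.0.0.1:6432/mydatabase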

Summary Checklist

  • Audit your code: Ensure every db.connect() has a corresponding db.close().

  • Monitor: Set up alerts for when connections exceed 80% of max_connections (see the query after this list).

  • Scale: Use a connection pooler like PgBouncer or Pgpool-II if you routinely need more than ~100 concurrent connections.
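
A quick way to check how close you are to the ceiling (the 80% figure above is just a common rule of thumb):

-- Percentage of max_connections currently in use
SELECT count(*) * 100.0 / current_setting('max_connections')::int AS pct_in_use
FROM pg_stat_activity;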



