Publishing Services on Bare Metal Kubernetes with MetalLB

Teams running bare metal Kubernetes often hit the same reality: the cluster side matures, ingress is established, observability settles in, but publishing services to the outside world is still done by hand. In a cloud environment, a LoadBalancer service usually solves this in a few minutes. In your own data center or a hybrid setup, the same need spreads across the network team, firewall rules, VIP management, and hand-maintained IP tables. MetalLB fills that gap, but its real value isn’t merely handing out IPs; it’s connecting Kubernetes and network operations in a more predictable way.

Figure: Service publishing flow with MetalLB on a bare metal Kubernetes cluster. MetalLB brings service publishing on bare metal closer to a cloud-like experience, but the IP pool, the choice of L2 or BGP mode, and the operational boundaries need to be clear from the start.

The problem isn’t really about handing out IPs

Many teams position MetalLB as just “the tool that hands external IPs to Kubernetes.” That’s an incomplete framing. The real problems are:

Which services should get external access isn’t clearly defined.
Ownership between IP pools and network segmentation is disconnected.
Failover behavior changes depending on the L2 topology or routing design.
Application teams want service exposure to be easy, while the platform team doesn’t want to lose risk control.

MetalLB is straightforward to install, but using it successfully requires the network intent to be just as clear.

L2 or BGP?

This is the most critical decision up front. L2 mode is faster to start with. It works well in single-data-center setups with a limited node count, where the nodes share one broadcast domain. But in L2 mode a single node answers ARP (or NDP for IPv6) for each VIP, so when VIP ownership moves to another node, failover depends on how quickly switches and clients pick up the new MAC address, and behavior becomes sensitive to topology.

BGP mode, on the other hand, offers a more scalable model that suits enterprise networks:

Routes are exchanged explicitly with network devices.
It behaves more consistently across multi-rack or multi-segment scenarios.
The access path becomes more predictable when a node fails.

My recommendation: in labs or small production environments with two switches, a single room, and a limited number of services, L2 is acceptable; in enterprise clusters, no design should become permanent without BGP having been considered.
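For comparison with L2 mode, here is a minimal sketch of what the BGP side might look like. It assumes an IPAddressPool named prod-external already exists. The ASNs (64512, 64513), the peer address 10.40.0.1, and the peer name tor-a are placeholders that would come from the network team; they are not values from a real environment.

```yaml
# Hypothetical peering: MetalLB speakers (AS 64512) peer with a
# top-of-rack router (AS 64513). Replace every value with the ones
# agreed with the network team.
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: tor-a
  namespace: metallb-system
spec:
  myASN: 64512            # ASN used by the MetalLB speakers
  peerASN: 64513          # ASN of the upstream router
  peerAddress: 10.40.0.1  # address of the upstream router
---
# Announce addresses from the named pool over the BGP sessions above.
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: prod-external
  namespace: metallb-system
spec:
  ipAddressPools:
    - prod-external
```

The router side needs a matching neighbor configuration before any routes are accepted, which is exactly the part that stays with the network team.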

Why does IP pool design matter?

Once MetalLB is installed, the most common mistake is to define a single broad IP pool and let every LoadBalancer service draw from it. That model is convenient in the short term, but it makes auditing and operations harder, because an address alone no longer tells you what kind of service sits behind it. A better approach is to split pools by intent:

North-south production traffic
Services accessed from the internal network
Temporary test or migration services
Management plane dependencies

This separation makes it visible which IP range belongs to which risk class. The firewall team, the security team, and the platform team can all look at the same table.
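As a rough illustration of splitting by intent, the sketch below defines two additional pools. The pool names, address ranges, and the migration-sandbox namespace are hypothetical. The autoAssign and serviceAllocation fields are part of the IPAddressPool spec, but check them against the MetalLB version you run.

```yaml
# Pool for services reached only from the internal network.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: internal-services
  namespace: metallb-system
spec:
  addresses:
    - 10.40.30.10-10.40.30.49   # hypothetical range
---
# Pool for temporary test or migration services.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: temp-migration
  namespace: metallb-system
spec:
  addresses:
    - 10.40.40.10-10.40.40.19   # hypothetical range
  autoAssign: false             # never picked automatically; must be requested
  serviceAllocation:
    namespaces:
      - migration-sandbox       # hypothetical namespace allowed to use this pool
```

Each pool still needs an L2Advertisement or BGPAdvertisement that covers it before its addresses are announced. Keeping one range per risk class is what lets the firewall, security, and platform teams read the same table.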

A simple starting definition

The example below shows a small but manageable MetalLB setup for a bare metal cluster:

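```yaml
# Pool of addresses MetalLB may assign to LoadBalancer services.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: prod-external
  namespace: metallb-system
spec:
  addresses:
    - 10.40.20.120-10.40.20.139
---
# Announce addresses from the pool in L2 mode (ARP/NDP).
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: prod-external
  namespace: metallb-system
spec:
  ipAddressPools:
    - prod-external
---
apiVersion: v1
kind: Service
metadata:
  name: edge-api
  annotations:
    # Request an address from a specific pool. Recent MetalLB releases
    # document this key as metallb.io/address-pool; check your version.
    metallb.universe.tf/address-pool: prod-external
spec:
  type: LoadBalancer
  selector:
    app: edge-api
  ports:
    - port: 443
      targetPort: 8443
```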

The example looks small, but if the service definition, pool ownership, and segment intent stay consistent, moving to BGP mode later becomes much easier.

Which controls are essential on the operations side?

Managing MetalLB at the Kubernetes manifest level alone isn’t enough. I recommend setting up these controls from day one:

Tracking the count of allocated and free IPs
An inventory of services sharing the same pool
An audit trail for VIP change events
An approval model for services receiving an external IP
A clear separation between ingress, gateway, and direct LoadBalancer use

In particular, in enterprise setups where different teams share the same cluster, “who allocated this external IP” is a governance question before it’s a technical one.
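The first control, tracking allocated and free IPs, could be sketched as an alert on pool usage. This assumes the Prometheus Operator is installed (the PrometheusRule kind comes from it) and that MetalLB metrics are being scraped. The metric names are the allocator metrics MetalLB exposes per pool; verify them against your version. The 80% threshold, the rule name, and the alert name are arbitrary choices for illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: metallb-pool-usage        # hypothetical name
  namespace: metallb-system
spec:
  groups:
    - name: metallb-pools
      rules:
        - alert: MetalLBPoolNearlyExhausted
          # Ratio of addresses in use to total addresses, per pool.
          expr: |
            metallb_allocator_addresses_in_use_total
              / metallb_allocator_addresses_total > 0.8
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "MetalLB pool {{ $labels.pool }} is over 80% allocated"
```

Depending on how your Prometheus instance selects rules, the object may also need labels that match its rule selector. The same two metrics can feed a dashboard that shows free addresses per pool.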

The boundary between network team and platform team

Successful MetalLB use draws this boundary very clearly:

Network team: defines which VLANs, subnets, BGP peers, and north-south flows are accepted.
Platform team: operates Kubernetes objects and pool policies within those boundaries.
Application teams: request services through a self-service experience but don’t step outside the enterprise guardrails.

Without that separation, either the platform team starts doing network design or the network team becomes a bottleneck on every service change.
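One way this boundary might be expressed inside the cluster is through RBAC on the MetalLB resources. The sketch below is an assumption about how the split could be mapped, not a prescribed layout: the group name platform-team and the role names are hypothetical, and it gives the platform team full control of pools and advertisements but read-only access to BGP peers.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: metallb-pool-operator     # hypothetical name
  namespace: metallb-system
rules:
  # Pools and advertisements are operated by the platform team.
  - apiGroups: ["metallb.io"]
    resources: ["ipaddresspools", "l2advertisements", "bgpadvertisements"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # BGP peers are read-only here: peering is defined by the network team.
  - apiGroups: ["metallb.io"]
    resources: ["bgppeers"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: platform-team-metallb     # hypothetical name
  namespace: metallb-system
subjects:
  - kind: Group
    name: platform-team           # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: metallb-pool-operator
  apiGroup: rbac.authorization.k8s.io
```

Application teams would then hold no rights in metallb-system at all and would only create Service objects in their own namespaces.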

Where should you be more careful?

MetalLB doesn’t solve every problem. The architectural decision needs more care in these situations:

Older networks where ARP behavior is unpredictable in wide L2 domains
IP pollution in test clusters that change frequently
Legacy physical load balancers consuming the same IP space
An external surface that’s unmanageable due to ad hoc LoadBalancer services

In those environments, simplifying service publishing principles first and then deploying MetalLB tends to be the better order.

Conclusion

Publishing services on bare metal Kubernetes with MetalLB may look like “bringing a cloud feature back to the data center,” but it’s actually an opportunity to build a healthier contract between the platform and network teams. If you split IP pools by intent, choose between L2 and BGP based on topology, and make external IP usage visible, your bare metal Kubernetes environment becomes far more manageable. MetalLB’s strength isn’t installation simplicity; it’s its ability to standardize service publishing.
