Posted on

Despite offering a similar feature set on the surface, the two projects have substantially different architectures, most notably Cilium’s use of eBPF and WireGuard for processing and encrypting L4 traffic in the kernel, contrasted with Istio’s ztunnel component for L4 in user space. These differences have resulted in substantial speculation about how Istio will perform at scale compared to Cilium.

While many comparisons have been made about tenancy models, security protocols and basic performance of the two projects, there has not yet been a full evaluation published at enterprise scale. Rather than emphasizing theoretical performance, we put Istio’s ambient mode and Cilium through their paces, focusing on key metrics like latency, throughput, and resource consumption. We cranked up the pressure with realistic load scenarios, simulating a bustling Kubernetes environment. Finally, we pushed the size of our AKS cluster up to 1,000 nodes on 11,000 cores, to understand how these projects perform at scale. Our results show areas where each can improve, but also indicate that Istio is the clear winner.

Behind the Scenes: Why the Difference?

The key to understanding these performance differences lies in the architecture and design of each tool.

  • Cilium’s Control Plane Conundrum: Cilium runs a control plane instance on each node, leading to API server strain and configuration overhead as your cluster expands. This frequently caused our API server to crash, followed by Cilium becoming unready, and the entire cluster becoming unresponsive.
  • Istio’s Efficiency Edge: Istio, with its centralized control plane and identity-based approach, streamlines configuration and reduces the burden on your API server and nodes, directing critical resources to processing and securing your traffic, rather than processing configuration. Istio takes further advantage of the resources not used in the control plane by running as many Envoy instances as a workload needs, while Cilium is limited to one shared Envoy instance per node.

Digging Deeper

Hidden Costs

While Istio operates entirely in user space, Cilium’s L4 dataplane runs in the Linux kernel using eBPF. Prometheus metrics for resource consumption only measure user space resources, meaning that all kernel resources used by Cilium are not accounted for in this test.

Recommendations: Choosing the Right Tool for the Job

So, what’s the verdict? Well, it depends on your specific needs and priorities. For small clusters with pure L3/L4 use cases and no requirement for encryption, Cilium offers a cost-effective and performant solution. However, for larger clusters and a focus on stability, scalability, and advanced features, Istio’s ambient mode, along with an alternate NetworkPolicy implementation, is the way to go. Many customers choose to combine the L3/L4 features of Cilium with the L4/L7 and encryption features of Istio for a defense-in-depth strategy.

Remember, the world of cloud-native networking is constantly evolving. Keep an eye on developments in both Istio and Cilium, as they continue to improve and address these challenges.

Link