Understanding Cartesian Explosion in EF Core and How to Avoid It

Entity Framework Core (EF Core) is a powerful ORM (Object-Relational Mapper) for .NET applications. However, it comes with its own set of challenges. One such issue is Cartesian explosion, which can significantly impact performance if not addressed properly. In this blog post, we’ll explore what Cartesian explosion is, how it happens in EF Core, and practical ways to avoid it.


What is Cartesian Explosion?

A Cartesian explosion occurs when querying related data in EF Core results in an unexpectedly large result set due to the nature of SQL joins. This typically happens when you use multiple Include or ThenInclude statements to eagerly load several collection navigations in a single query.

SQL joins repeat the main entity's columns once for every related row in a one-to-many or many-to-many relationship. When several collections are joined at the same level, the row counts multiply: the result contains one row for every combination of related records, which is a Cartesian product. This increases memory usage, data transfer size, and query execution time, potentially slowing down your application.

Example Scenario: How Cartesian Explosion Happens

Let’s consider a simple example with the following EF Core entities:

public class Order
{
    public int Id { get; set; }
    public string OrderNumber { get; set; }
    public List<OrderItem> OrderItems { get; set; }
    public Customer Customer { get; set; }
}

public class OrderItem
{
    public int Id { get; set; }
    public string ProductName { get; set; }
    public int Quantity { get; set; }
    public Order Order { get; set; }
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Order> Orders { get; set; }
}

Suppose you write a query to fetch orders, along with their related order items and customer details:

var orders = dbContext.Orders
    .Include(o => o.OrderItems)
    .Include(o => o.Customer)
    .ToList();

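SQL Query Generated by EF Core

This query generates SQL roughly equivalent to the following (simplified here; EF Core lists the columns explicitly and adds an ORDER BY clause):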

SELECT o.*, oi.*, c.*
FROM Orders o
LEFT JOIN OrderItems oi ON o.Id = oi.OrderId
LEFT JOIN Customers c ON o.CustomerId = c.Id;

Problematic Result Set

If an order has:

  • 5 OrderItems
  • 1 Customer

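The query will return 5 rows, with the Order and Customer columns repeated in each row. On its own this is only data duplication, because a reference navigation such as Customer never adds rows. The explosion starts when an entity has more than one collection loaded at the same level, since each collection multiplies the row count. For example:

  • If a single order has 3 collection navigations with 5, 10, and 20 records respectively, the result will have 5 × 10 × 20 = 1000 rows for that one order.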

This is what we call Cartesian explosion.
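
The multiplication above can be checked with a few lines of plain C#. This is a minimal sketch that only models the row arithmetic; it does not use EF Core, and the names JoinRowMath and SingleQueryRows are made up for illustration.

```csharp
using System;
using System.Linq;

public static class JoinRowMath
{
    // Rows a single joined query returns for ONE parent row when several
    // sibling collections are joined at the same level: the product of
    // the collection sizes. An empty collection still yields one row
    // under LEFT JOIN, hence Math.Max(size, 1).
    public static long SingleQueryRows(params int[] collectionSizes) =>
        collectionSizes.Aggregate(1L, (rows, size) => rows * Math.Max(size, 1));
}
```

For the example above, JoinRowMath.SingleQueryRows(5, 10, 20) evaluates to 1000, while a single collection of 5 items gives only 5 rows.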


How to Mitigate Cartesian Explosion

Fortunately, there are several ways to avoid Cartesian explosion in EF Core. Below are some strategies to optimize your queries and improve performance:

1. Use Split Queries

EF Core 5.0 introduced split queries, which divide a single query into multiple smaller queries. Each included collection is fetched by its own query, avoiding the large result sets caused by joining several collections together.

var orders = dbContext.Orders
    .Include(o => o.OrderItems)
    .Include(o => o.Customer)
    .AsSplitQuery()
    .ToList();

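How it works:

  • The first query fetches Orders together with their Customers. A reference navigation such as Customer does not multiply rows, so EF Core keeps it as a join in the same query.
  • The second query fetches OrderItems for those orders.

This removes the row multiplication between collections. The trade-off is extra database roundtrips, and the separate queries are not guaranteed to see consistent data if rows change between them.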
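
If most queries in an application benefit from splitting, the behavior can also be set once on the context rather than per query. The sketch below assumes the SQL Server provider (the Microsoft.EntityFrameworkCore.SqlServer package); the context name ShopDbContext and the connection string are placeholders, not part of the original example.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical context name; replace the placeholder connection string.
public class ShopDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer(
            "<your-connection-string>",
            // Make split queries the default for every query on this context.
            sqlOptions => sqlOptions.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery));
    }
}
```

With this default in place, an individual query can still opt back into a single joined query by calling AsSingleQuery().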

2. Apply Filters in Includes

When using Include statements, you can filter the related data to load only what is needed (filtered includes are available since EF Core 5.0). This reduces the size of the result set and the amount of duplicated data.

var orders = dbContext.Orders
    .Include(o => o.OrderItems.Where(oi => oi.Quantity > 5))
    .Include(o => o.Customer)
    .ToList();

In this example, only OrderItems with a Quantity greater than 5 are included, reducing the number of rows.

3. Use Projections to Shape the Result

Instead of loading entire entities, use LINQ Select to fetch only the data you need. EF Core still joins the related tables, but each row carries only the selected columns, and you control the shape of the result.

var orders = dbContext.Orders
    .Select(o => new
    {
        o.Id,
        o.OrderNumber,
        Customer = new { o.Customer.Id, o.Customer.Name },
        OrderItems = o.OrderItems.Select(oi => new { oi.ProductName, oi.Quantity })
    })
    .ToList();

Projections reduce data duplication and allow fine-grained control over what is loaded.

4. Use Lazy Loading

Lazy loading defers the loading of related data until a navigation property is first accessed. This can be useful when you don’t always need the related data.

To enable lazy loading with proxies, install the Microsoft.EntityFrameworkCore.Proxies package and turn proxies on in your context:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseLazyLoadingProxies();
}

Ensure navigation properties are marked as virtual and that the entity classes can be inherited from (not sealed):

public virtual List<OrderItem> OrderItems { get; set; }
public virtual Customer Customer { get; set; }

Lazy loading avoids loading all related data upfront, preventing unnecessary joins.

5. Optimize Relationships

Reevaluate your entity relationships to reduce the number of navigation properties being eagerly loaded. Avoid including large or deeply nested relationships unless absolutely necessary.


When to Choose Each Approach

| Strategy | When to Use |
| --- | --- |
| Split Queries | When fetching multiple related entities and you want to avoid large result sets caused by joins. |
| Filtered Includes | When you only need a subset of related data. |
| Projections | When you want to customize the shape of the result and load only the required data fields. |
| Lazy Loading | When related data is not always needed or accessed. Be cautious, as excessive lazy loading can lead to the "N+1 query problem." |
| Optimize Entities | When your data model contains unnecessary relationships that lead to excessive joins. |

Conclusion

Cartesian explosion is a common problem when working with complex relationships in EF Core, but it can be managed effectively by using split queries, projections, lazy loading, filtered includes, and optimized relationships. By understanding how EF Core generates queries and tailoring your approach to suit your application’s needs, you can ensure optimal performance and scalability.

Take control of your queries and avoid the performance pitfalls of Cartesian explosion!

*This blog was optimized and refined with the assistance of ChatGPT