The N+1 Selects Problem

The N+1 selects problem can occur wherever a Domain object has a multi-valued property. Client code triggers a select query on the database to get hold of a first object, then a further 'N' selects to get hold of each member of the multi-valued property.

In the case of the Starter Application, most of the problem is taken care of in the object factories. The factory that creates the first object sets its multi-valued properties to collection proxies (in doAddRelations). In the diagram below for example, the Foo factory sets a Foo::Bars property to a Bar collection proxy. If/when the multi-valued property is accessed, data for the entire collection is retrieved in a single query.



Potential for N+1 Select Problems

This works fine for simple two-level, 'parent/child' relationships, however the situation becomes more complex when additional levels are introduced. Where three levels of objects are accessed, the factory that creates the first object must use special collection proxies for multi-valued properties, proxies which not only load child data, but also 'eagerly load' data for objects related to those children.

Eager loading can become unwieldy in complex object graphs. If you are regularly working more than three levels deep it may be time to consider an ORM or alternative database access patterns.

Eager Loading

In the case of the diagram above, if Foo is loaded in order to access Baz or the Bats, the N+1 problem will occur. To avoid this, client code must forewarn the Foo factory that it intends to access Baz and Bat, and the Foo factory must then attach a Bar collection proxy that eagerly loads Baz and Bat data. I use 'flavours' to let the factory know when it should eagerly load data.

The possible flavours of a factory are stored as class constants in the factory itself, where the value of each constant is a bitmask. The actual flavour of a factory instance is stored as a bit field in $factory->flavour, and can be modified via the fluent methods setFlavour and addFlavour. (Note: if, for some reason, your service locator returns a new instance of a factory each time it is called, you will need to use a static factory::flavour.)

Suitable flavours for the Foo factory would be:

const BARS_WITH_BAZ  = 1;
const BARS_WITH_BATS = 2;

Client code, such as the Service Layer, sets the flavour as follows.

$this->serviceLocator->get('Core\Domain\Factory\Foo')
    ->setFlavour(Factory\Foo::BAR_WITH_BAZ)
    ->addFlavour(Factory\Foo::BAR_WITH_BATS);

The Foo factory needs to be updated with conditional code that sets the correct proxies based on the current factory flavour. The changes are limited to doAddRelations, and the updated method is as follows:

protected function doAddRelations(array &$data)
{
    if (empty($data['bars'])) {

        if (($this->flavour & self::BAR_WITH_BAZ) &&
            ($this->flavour & self::BAR_WITH_BATS)) {

            $data['bars'] = $this->getCollectionProxy(
                'Bar', 'findByFooWithBazAndBats', array($data['id']));

        } elseif ($this->flavour & self::BAR_WITH_BAZ) {

            $data['bars'] = $this->getCollectionProxy(
                'Bar', 'findByFooWithBaz', array($data['id']));

        } elseif ($this->flavour & self::BAR_WITH_BATS) {

            $data['bars'] = $this->getCollectionProxy(
                'Bar', 'findByFooWithBats', array($data['id']));

        } else {

            $data['bars'] = $this->getCollectionProxy(
                'Bar', 'findByFoo', array($data['id']));
        }
    }
}

Of course this will have a knock-on effect on the Bar mapper which must be updated with three new finder methods: findByFooWithBazAndBats, findByFooWithBaz, findByFooWithBats.

These new methods are more complex than other mapper methods. You will need to decide whether to join tables and bring data back in one query, or run multiple queries, and you will need to manipulate arrays and sub-arrays. It will help if you have mastered PHP array functions. Closures are often useful here too.

Note that BarFactory::doTypeConversion is already set up to handle arrays as inputs for the Baz and Bats properties, and is therefore ready to work with the multi-dimensional arrays produced by the new Bar mapper methods. I have found that it is a good policy to build factories that handle arrays as inputs for all properties that are objects or collections. In most cases they will eventually come in handy.

If you are comfortable with the eager loading approach, but not the proliferation of Mapper methods, you could consider a solution that uses a single mapper method to load the Bar collection with all direct associates, something like BarMapper::findByFooWithAssociates.

A Note on Collections

I started this section by saying that the N+1 problem can occur wherever a Domain object has a multi-valued property, but there is a second form of the problem too. The second form occurs when client code, usually the Service Layer, calls a mapper method which returns a collection, and each member of the collection has a proxied property. If the client iterates the collection, accessing the proxied property along the way, the result is N trips to the database.

The obvious solution to this is to add mapper methods that eagerly load data for properties which were proxied, and let the Service Layer call these instead of the more simple finders. In my experience, these additional mapper methods are often one-and-the-same as the methods required for eager loading in the first form of the problem, which is convenient.

Tags: