Only update changed fields or properties for an entity in Drupal

Difficulty: 
Let's Rock

When you save an entity (specifically, when you update an existing one), Drupal does a massive amount of work:

  • Retrieve an unchanged copy of the original entity (entity_load_unchanged)
  • Update the entity and entity revision tables.
  • Issue an update to bind the revision id in the main entity table (even if unchanged)
  • Update all fields, with their revisions.
  • Invalidate the cache for the entity.
  • Trigger many, many hooks along the way

Our sample entity for this article has 12 fields, so the steps above add up to roughly the following (estimate):

  • Loading the unchanged entity, about 13 select statements (for fields + entity table)
  • Updating the entity and the revision, 2 update statements.
  • Updating the fields (with revisions disabled): 12 update statements.
  • Additional modules whose hooks are triggered on an entity update such as the metatag, search api or print module.

We profiled a call to entity_save() for the code example you will find below in this article, with the following results (real):

  • Total of 191 statements issued against the database. Really? This entity only has 12 fields....
  • 35 of them for transaction management (SAVE TRANSACTION SAVEPOINT)
  • 36 of them SELECT 1 - I'd like to find out where these are coming from
  • This leaves us with only 120 "real" statements, still too many for a 12 field entity.
  • About 20 as a result of calling the entity_update hook and performed by third-party modules (metatag, print, search api, node block and views_content_cache)
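
The arithmetic behind those figures can be checked with a few lines of PHP. This is only a tally of the numbers quoted above. The last subtraction assumes that the roughly 20 third-party statements are counted among the 120 "real" ones, which the trace summary does not state explicitly.

```php
<?php
// Figures copied from the profiling results above.
$total = 191;
$transaction_management = 35;
$select_one = 36;
// "About 20" in the article, so treat this one as approximate.
$third_party_hooks = 20;

// Statements that actually read or write data.
$real = $total - $transaction_management - $select_one;

// Assumption: the third-party statements are part of the "real" ones.
$without_third_party = $real - $third_party_hooks;

printf("real: %d, without third-party hooks: ~%d\n", $real, $without_third_party);
```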

[UPDATE] This will become obsolete after this issue gets into core, where field SQL storage will compare current values with the original entity.

The complete statement trace of the Database Engine is provided as an attachment to this article to prove I'm not making this up and that they are real numbers.

This means that you cannot rely on using default entity storage techniques if you plan to produce maintainable, fast and scalable code.

With a simpler, more straightforward database design, all of this could be done in a single insert statement.

The first thing to consider is installing and setting up the Field SQL No revisions module. In 99.99% of use cases, field revisions are not going to be used. By installing and configuring this module you override the default Field SQL storage engine with one that does not store revision data for fields.

What if you just wanted to update one or two fields - or properties - from an entity?

We scouted the internet and found solutions that did not live up to our expectations, such as this one:

// Get the id of your field
$field_name = 'name_of_your_field';
$info = field_info_field($field_name);
$fields = array($info['id']);

// Execute the storage function
field_sql_storage_field_storage_write('model', $entity, 'update', $fields);

Or this one:

$field = new stdClass();
$field->type = 'story_cover'; // content type name
$field->nid = $node->nid; // node id
$field->field_number_of_pages[LANGUAGE_NONE][0]['value'] = 'YOUR_VALUE'; // field name
field_attach_update('node', $field);

You can clearly see (it is not worth discussing) that these approaches are unmaintainable, unintuitive, messy and error-prone. They remind me of the PHP language itself, or of a big chunk of the code in the PHP ecosystem: a fractal of bad design.

What we need is a solution that:

  • Is transparent, consistent and easy to use for the developer.
  • Perfectly integrates with current Entity manipulation tools.
  • Is as lightweight as possible, yet still does not completely drill through the different abstraction APIs.
  • Ideally, will only update the information that needs to be updated - only update what has changed.

Consider the following piece of code:

// We are going to manipulate the inscription.
$inscription = UtilsEntity::entity_metadata_wrapper('node', (int) $inscription_id);
// The inscription itself must be linked to the order through
// a reference field.
$vinculado = false;
foreach ($inscription->field_referencia_pedidos as $pedido) {
  if ($pedido->getIdentifier() == $order->order_id) {
    $vinculado = true;
    break;
  }
}
if (!$vinculado) {
  // Link it if it was not linked already.
  $inscription->field_referencia_pedidos[NULL] = $order->order_id;
}
if (in_array($order->order_status, array('completed', 'payment_received'))) {
  if ($inscription->field_estado_insc->value() != 'paid') {
    $inscription->field_estado_insc = 'paid';
  }
}
elseif (in_array($order->order_status, array('canceled', 'pending', 'abandoned', 'in_checkout'))) {
  if ($inscription->field_estado_insc->value() == 'paid') {
    $inscription->field_estado_insc = 'pending';
  }
}
// Sleek save
$inscription->save();

This is more or less good practice code regarding entity manipulation in Drupal. But no matter what happens in that code, when save() is called the whole entity save process is triggered, issuing a mind blowing number of database statements.

Before going any further, if we wanted to only update what has changed we would need to either:

  • Keep track of what has changed, and only update that.
  • Compare the current entity with the original one and detect all potential changes.

Each one of these has different implications:

Keeping track of changes: requires using a consistent way of manipulating the entity (possibly a wrapper) and any change done to the entity outside the wrapping mechanism will not be detected.
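
As a rough illustration of this trade-off, here is a minimal sketch in plain PHP with no Drupal dependencies. TrackingWrapper is a hypothetical stand-in, not the Entity API wrapper: it records every write made through it, and a write made directly on the wrapped object goes unnoticed.

```php
<?php

/**
 * Hypothetical stand-in for a wrapper that records which keys were written.
 * It is not part of Drupal or the Entity API.
 */
class TrackingWrapper {
  private $data;
  private $changed = array();

  public function __construct($data) {
    $this->data = $data;
  }

  public function __get($name) {
    return $this->data->{$name};
  }

  public function __set($name, $value) {
    // Every write through the wrapper is recorded, even if the value is equal.
    $this->changed[$name] = TRUE;
    $this->data->{$name} = $value;
  }

  public function changedKeys() {
    return array_keys($this->changed);
  }
}

$node = new stdClass();
$node->title = 'Inscription';
$node->status = 'pending';

$wrapper = new TrackingWrapper($node);
// Written through the wrapper: tracked.
$wrapper->status = 'paid';
// Written directly on the object: the wrapper never sees it.
$node->title = 'Edited directly';

$tracked = $wrapper->changedKeys();
print implode(', ', $tracked) . "\n";
```

Only status is reported as changed, even though title was modified as well. That is exactly the blind spot described above.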

Compare to original: requires either detaching (cloning) the entity instance before manipulating it, or retrieving a fresh entity (entity_load_unchanged) from the database before the comparison. From a performance point of view the first option is the most feasible, but it requires the coder to always make sure that entities are detached (cloned) before being tampered with, because that is the only way to retrieve the original entity from cache without having to fully reload it from storage.
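
The clone-and-compare variant can be sketched in plain PHP as follows. The changed_keys() helper is hypothetical and only compares top-level properties. The 'und' key is the value of LANGUAGE_NONE in Drupal 7. Note that clone is shallow: field values stored as arrays are copied, but nested objects would still be shared with the original.

```php
<?php

/**
 * Hypothetical helper: returns the names of the top-level properties whose
 * values differ between a snapshot and the current object.
 */
function changed_keys($original, $current) {
  $changed = array();
  $before = get_object_vars($original);
  $after = get_object_vars($current);
  foreach ($after as $name => $value) {
    // Loose comparison, so '1' and 1 count as equal. Arrays are compared
    // by value, including nested arrays.
    if (!array_key_exists($name, $before) || $before[$name] != $value) {
      $changed[] = $name;
    }
  }
  // Properties that were removed also count as changes.
  foreach (array_keys(array_diff_key($before, $after)) as $name) {
    $changed[] = $name;
  }
  return $changed;
}

$entity = new stdClass();
$entity->title = 'Inscription';
$entity->field_estado_insc = array('und' => array(array('value' => 'pending')));

// Detach a snapshot before touching the entity.
$original = clone $entity;

$entity->field_estado_insc['und'][0]['value'] = 'paid';

$changed = changed_keys($original, $entity);
print implode(', ', $changed) . "\n";
```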

We implemented sample approaches for each of these, and finally concluded that the most natural and maintainable way to get sleek entity updates was to extend the EntityMetadataWrapper to track property and field changes, and to update only the data that has changed.

With this approach, the sample code above needs (nearly) no changes at all. The metadata wrapper detects what has changed and updates only what is required. You can see the complete implementation at the end of the article (it is a trimmed-down version of the actual code, which is part of Fdf - FastDevelopmentFramework).

We had to:

  • Implement a custom version of entity_metadata_wrapper() that returns an instance of our derived version of EntityDrupalWrapper
  • Implement a derived class of EntityDrupalWrapper (FdfEntityMetadataWrapper) that keeps track of changes and overrides the default save() function to only update what is needed.
  • Not in the example: override EntityListWrapper, EntityStructureWrapper and other methods so that our FdfEntityMetadataWrapper is used consistently when interacting with the wrapper.

Don't worry, all this is just about 75 lines of code.

What are the advantages of this strategy?

  • Very little disruption to actual code (if already using the EntityDrupalWrapper)
  • The coder does not need to think about how storage is managed, but must be aware of the internal behaviour of selective updates.
  • The entity and revision table are only updated if any of the properties of the entity have been set through the wrapper, and only the changed table fields (properties) will be updated.
  • Only fields that have been set will be updated
  • We are not drilling through the abstraction layer and into the storage engine like the approaches that use field_sql_storage_field_storage_write
  • The entity is manipulated consistently as a whole, making the update transactional

Remember that at the start of the article we traced a total of 191 statements - only 120 of them real - when performing the save() call, even when no data has changed in the entity.

Swapping the EntityDrupalWrapper for our custom wrapper with field/property change detection led to:

  • 0 statements if the entity was not manipulated.
  • 9 statements (4 real) if 1 single value field was updated.
  • 23 statements (15 real) if 2 fields were updated (one of them is a multivalue field with 4 values at the time of the insert)
  • 2 statements (2 real) if we changed any (or all) of the properties of the entity such as title, timestamp, etc.

With the change, the logic above has moved from always issuing 191 statements to regularly issuing 0 (the logic is called on every order update, but usually no changes are made to the $inscription entity), and to issuing at most 23 statements in the worst case, where both fields present in the snippet have been tampered with.

Of course depending on your priorities you could make different changes such as updating the entity's timestamp when fields are updated, or triggering some of the hooks that have been omitted in the implementation. We decided to keep as many hooks as possible away from here because if any of them relies on changing anything in the entity the changes will not be persisted as they will not have been detected by the FdfEntityMetadataWrapper.
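
One of those variations, touching the entity's timestamp whenever a field changes, could be expressed as a small policy step that runs before the update. The sketch below is plain PHP and the properties_to_write() helper is hypothetical, not part of Fdf or Drupal. It only decides which property names get written. It assumes the timestamp property is called changed, as it is for nodes.

```php
<?php

/**
 * Hypothetical policy helper: decides which base-table properties to write.
 *
 * @param array $changed_fields
 *   Names of the fields that were set through the wrapper.
 * @param array $changed_properties
 *   Names of the properties that were set through the wrapper.
 * @param bool $touch_on_field_change
 *   Whether a field change should also update the 'changed' timestamp.
 *
 * @return array
 *   The property names to write.
 */
function properties_to_write(array $changed_fields, array $changed_properties, $touch_on_field_change) {
  if ($touch_on_field_change && !empty($changed_fields) && !in_array('changed', $changed_properties, TRUE)) {
    $changed_properties[] = 'changed';
  }
  return $changed_properties;
}

// A field was changed and the policy is on: the timestamp is written too.
$with_policy = properties_to_write(array('field_estado_insc'), array(), TRUE);
// Same change with the policy off: no property is written.
$without_policy = properties_to_write(array('field_estado_insc'), array(), FALSE);
```

The price of enabling such a policy is that a field-only update also writes to the entity table, which adds statements to the field-only counts listed above.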

<?php

namespace Drupal\fdf\Entity;

/**
 * Extends EntityDrupalWrapper to provide property and field
 * change tracking.
 */
class FdfEntityMetadataWrapper extends \EntityDrupalWrapper {

  // @var $changed_fields string[]
  private $changed_fields = array();
  // @var $changed_properties string[]
  private $changed_properties = array();
  
  /**
   * Permanently save the wrapped entity.
   *
   * @throws \EntityMetadataWrapperException
   *   If the entity type does not support saving.
   *
   * @return \EntityDrupalWrapper
   */
  public function save($fast = TRUE) {
    // Only save if fields or properties have changed.
    if (empty($this->changed_fields) && empty($this->changed_properties)) {
      return $this;
    }
    if ($this->data) {
      if (!entity_type_supports($this->type, 'save')) {
        throw new \EntityMetadataWrapperException("There is no information about how to save entities of type " . check_plain($this->type) . '.');
      }
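      // New entities (no identifier yet) and explicit full saves go through
      // entity_save(). The call is negated directly because empty() on a
      // function call is a fatal error before PHP 5.5.
      if (!$this->getIdentifier() || !$fast) {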
        entity_save($this->type, $this->data);
      }
      else {
        static::UpdateEntityFast($this->type, $this->data, array_keys($this->changed_fields), array_keys($this->changed_properties));
      }
      // On insert, update the identifier afterwards.
      if (!$this->id) {
        list($this->id, , ) = entity_extract_ids($this->type, $this->data);
      }
    }
    // If the entity hasn't been loaded yet, don't bother saving it.
    return $this;
  }
  
  /**
   * Magic method: Set a property.
   */
  protected function setProperty($name, $value) {
    $info = $this->getPropertyInfo($name);
    if (isset($info['field']) && $info['field'] == TRUE) {
      $this->changed_fields[$name] = TRUE;
    }
    else {
      $this->changed_properties[$name] = TRUE;
    }
    parent::setProperty($name, $value);
  }
  
  /**
   * Update only the specified fields and properties for the entity
   * without triggering hooks and events.
   *
   * @param string $entity_type
   * @param mixed $entity
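   * @param array $fields
   *   Names of the fields to write.
   * @param array $properties
   *   Names of the base-table properties to write.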
   */
  public static function UpdateEntityFast($entity_type, $entity, $fields, $properties) {
    $transaction = db_transaction();
    global $user;
    try {
      $info = entity_get_info($entity_type);
      $id = entity_id($entity_type, $entity);
      if (empty($id)) {
        throw new \Exception("This method can only be used to update entities.");
      }
      // Extract the ID
      $key_name = $info['entity keys']['id'];
      $key_revision = $info['entity keys']['revision'];
      if (!empty($fields)) {
        // Instance and type.
        $update_entity = new \stdClass();
        $update_entity->type = $entity->type;
        // Set the ID.
        $update_entity->{$key_name} = $id;
        // Copy the fields that we want to attach.
        foreach ($fields as $field) {
          $update_entity->{$field} = $entity->{$field};
        }
        // Update the field.
        field_attach_presave($entity_type, $update_entity);
        field_attach_update($entity_type, $update_entity);
      }
      if (!empty($properties)) {
        // Update the main record.
        $record = new \stdClass();
        $record->changed = REQUEST_TIME;
        $record->timestamp = REQUEST_TIME;
        $record->{$key_name} = $id;
        foreach ($properties as $property) {
          $record->{$property} = $entity->{$property};
        }
        drupal_write_record($info['base table'], $record, $key_name);
        // Update the revision.
        $record->uid = $user->uid;
        $record->{$key_revision} = $entity->{$key_revision};
        drupal_write_record($info['revision table'], $record, $key_revision);
      }
      // Invalidate this cache entity.
      entity_get_controller($entity_type)->resetCache(array($id));
    }
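    // Fully qualified: an unqualified Exception would resolve inside this
    // namespace and never match.
    catch (\Exception $e) {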
      $transaction->rollback();
      watchdog_exception('Fdf', $e);
      throw $e;
    }
  }
}
<?php

namespace Drupal\fdf\Utilities;

use \Drupal\fdf\FdfCore;
use \Drupal\fdf\Entity\FdfEntityMetadataWrapper;

class UtilsEntity {
  
  /**
   * Drop-in replacement for entity_metadata_wrapper() that returns an
   * FdfEntityMetadataWrapper when wrapping an entity.
   *
   * @param mixed $type 
   * @param mixed $data 
   * @param array $info 
   * @return \EntityDrupalWrapper|\EntityListWrapper|\EntityStructureWrapper|\EntityValueWrapper
   */
  public static function entity_metadata_wrapper($type, $data = NULL, array $info = array()) {
    if ($type == 'entity' || (($entity_info = entity_get_info()) && isset($entity_info[$type]))) {
      // If the passed entity is the global $user, we load the user object by only
      // passing on the user id. The global user is not a fully loaded entity.
      if ($type == 'user' && is_object($data) && $data == $GLOBALS['user']) {
        $data = $data->uid;
      }
      return new FdfEntityMetadataWrapper($type, $data, $info);
    }
    elseif ($type == 'list' || entity_property_list_extract_type($type)) {
      return new \EntityListWrapper($type, $data, $info);
    }
    elseif (isset($info['property info'])) {
      return new \EntityStructureWrapper($type, $data, $info);
    }
    else {
      return new \EntityValueWrapper($type, $data, $info);
    }
  }
}

 

Comments

This provides great performance info as well as the smart coding solution. To me, the difference between update complexity of a Field update versus an Entity Property update is very insightful. While Fields provide some great advantages, Entity properties are clearly something that should be considered when working with complex data items (i.e. ones with more than 3 fields/properties). Not only are properties faster to update, scale without impact on multiple but they also are much faster for queries in Views.

A man after my own heart... I am going to try to make this work in my modules. I hate all the excessive overhead in Drupal. Thank you.

By: david_garcia Wednesday, April 22, 2015 - 07:00