Add new comment

Using LINQ (Language Integrated Queries) in Drupal or how to write queries x5 faster

Difficulty: 
Come Get Some

Introduction

In this article I will show you how to use an enterprise ready technology (LINQ) to write your Drupal queries x5 faster, with improved readabilty, usability and maintainability.

LINQ is basically Queries in Code - not queries in strings in code - The advantages of having queries integrated in the language are, among others:

  • Strongly typed database schema. That means no more mistakes while writing plain SQL. If you improperly write the name of a table or a field the query simply won't compile at design time. 
  • No magic strings.
  • Intellisense (AKA Autocomplete). Explore the database tables and fields and get hints while writing the query.
  • Faster development.
  • Reusable abstract expressions (lambdas and functions)

One of the big problems with Drupal is that as your requirements and complexity grow both the database abstraction layer and the query builder start to shine for their lack of productivity features (it's not just Drupal but the whole PHP ecosystem). But in Drupal this problem is worse because the data model is extremely decoupled, loose and above all complex. That was the price we had to pay to get revisions, translations and multi value fields all over an abstracted layer that can provide compatibility for multiple database engines.

Before we go any further here is what our sample use case for this article looks like in Drupal VS LINQ. This is a very simple query, but it already looks a mess in Drupal (imagine what a really complex query looks like, you are better off writing plain SQL).

// Nodos del tipo inscripción.
$query = db_select('node','inscription');
$query->condition('inscription.type','inscripcion','=');
// Nos traemos el campo congreso
$query->join('field_data_field_congreso','field_congreso','field_congreso.entity_id = inscription.nid');
// Nos traemos la actividad congreso
$query->join('node','actividad','field_congreso.field_congreso_nid = actividad.nid AND actividad.type = :node_type_congreso', array(':node_type_congreso' => 'congreso'));
// Nos traemos el campo mail 
$query->join('field_data_field_email','field_email','field_email.entity_id = inscription.nid');
// Nos traemos al propietario
$query->join('users', 'inscription_owner', 'inscription_owner.uid = inscription.uid');
// Aplicamos criterio autor y actividad
$or = db_or();
$or->condition('inscription.uid', '', 'IS NULL');
$or->condition('inscription.uid', 1, '=');
$query->condition($or);
// El autor debe tener el mail indicado
$query->condition('field_email.field_email_email', $mail, 'LIKE');
// Nos traemos el ID de inscripción
$query->addField('inscription', 'nid');
$result = $query->execute()->fetchAll();

And this is what that same query looks like in LINQ (it's a GIF animation to show off the power of autocomplete):

No more typing errors, runtime malformed queries or having to look up every 5 seconds what the over engineered Drupal database schema looks like.

This is all cross platform compatible:

  • The queries are run on EF (Entity Framework) that can plug on top of SQL Server, MySQL, Oracle and others.
  • This all runs on top of the .Net framework, that is already working on Linux thanks to Mono and will be natively supported soon thanks to Microsoft.
  • You can interoperate with the LINQ queries from PHP either by means of the NetPHP libraries, or with Phalanger (both of those links point to Github repositories).

Setting up the project and folder structure

To support this from Drupal we will assume that any additional .Net code is going to be customer specific, so we are going to place it into the customer's customization module.

We are going to create a new "net" folder with two subfolders "bin" and "src". The bin is going to be the place where we will be deploying the compiled binaries, and the "src" will contain the Visual Studio project's and source code.

Because we want to keep the data model binaries separate from the actual code we will be creating 3 projects:

  • One containing the Entity Framework / Linq-To-Sql data clases and any custom database acces related logic
  • One containing all our custom code and queries
  • One containing .Net tests

Your solution should end up looking something like this:

Communicating .Net and PHP

The first thing we want to do is to bring the NetPhp technology into our project. This is a bridge that uses PHP's com_dotnet extension so bring into PHP the best of .Net.

We are using the great Composer Manager module to manage module dependencies on external libraries.

Add a composer.json to your module's directory that brings in the NetPhp php libraries:

{
    "require": {
        "drupalonwindows/netphp": "dev-master"
    }
}

Regenerate the global composer.json file either from the UI or running the drush command "composer-json-rebuild".

Navigate into sites/default/composer (or the location you have configured in composer manager) and run a composer install command.

Download the NetPhp binary library, and drop it inside your PHP installation directory (where the php.exe and php-fastcgi.exe files are located).

For this demo project we are going to put all our queries into the DrupalCustom.DrupalQueries class library project in C#. We need a way to retrieve an instance of that class, and make it aware of the database connection settings.

So we came up with this manager to alows us to retrieve the DrupalCustom.DrupalQueries instance whenever we need it and populated with the database connection options.

namespace Drupal\mymodule;

use \NetPhp\Core\NetManager;

class QueryManager {
  // @var NetManager $manager
  private static $manager = NULL;
  private static $DrupalQueries = NULL;

  public static function GetManager() {
    global $databases;
    if (static::$DrupalQueries== null) {
      static::$manager = new NetManager();
      $dll = DRUPAL_ROOT . '\\' . str_replace('/', '\\', drupal_get_path('module', 'mymodule') . '\\net\\src\\DrupalCustom\\bin\\Debug\\DrupalCustom.dll');
      static::$manager->RegisterAssembly($dll, 'DrupalCustom');
      static::$DrupalQueries = static::$manager->Create('DrupalCustom', 'DrupalCustom.DrupalQueries')->Instantiate();
      $conn = $databases['default']['default'];
      static::$DrupalQueries->InitDataContext('6470B', $conn['database'], $conn['username'], $conn['password']);
    }
    return static::$DrupalQueries;
  }
}

This is just a singleton like pattern that will retrieve an instantiated and database aware COM instance of our DrupalCustom.DrupalQueries class.

Let's see how it works with more detail.

To instantiate a .Net class from PHP you need to use a copy of the NetManager class. NetManager takes care of managing assembly and class namespaces:

static::$manager = new NetManager();

Then we must register the location of our binary file in the NetManager and give it an alias. Because we are on a development environment we will be using the /bin/Debug/DrupalCustom.dll file, but on production you might want to change this to another location. Using this location allows us to simply rebuild the .Net project and have everything updated real time.

$dll = DRUPAL_ROOT . '\\' . str_replace('/', '\\', drupal_get_path('module', 'mymodule') . '\\net\\src\\DrupalCustom\\bin\\Debug\\DrupalCustom.dll');
// Register the binary with DrupalCustom alias
static::$manager->RegisterAssembly($dll, 'DrupalCustom');

We are now ready to make an instance of our .Net class by specifying the assembly alias we registered before, and the Full Qualified Name of the class:

static::$DrupalQueries = static::$manager->Create('DrupalCustom', 'DrupalCustom.DrupalQueries')->Instantiate();

The next step is to make this query manager aware of what database is it going to use to run the queries on. We've implemented an InitDataContext that stores internally the database information into the query management class:

$conn = $databases['default']['default'];
static::$DrupalQueries->InitDataContext($conn['host'], $conn['database'], $conn['username'], $conn['password']);

That's it. From now on whenever you want to call a query that is stored in our .Net DrupalQueries class you simply need to call GetManager():

$result = QueryManager::GetManager()->MyCustomQueryOrMethod(param1, param2, param3);

The productivity edge

You might be thinking that the 5 fold claim in improved query development performance is an exaggeration. That is far from true.

Let's see what kind of Drupal specific magic we can leverage from LINQ besides having strongly typed queries, type hinting, autocomplete and design time validation of the query.

One of LINQ's most powerful feature is the ability to reuse expressions.

Let's say for example that we have the node entity, with a bundle named "inscripcion" and that this bundle has a text based field called "field_cargo". Writing the query in C# looks like this:

var query = (from inscripton in dcs.CurrentContext.node.AsExpandable()
             join cargo in dcs.CurrentContext.field_data_field_cargo on inscripton.nid equals cargo.entity_id
             select cargo.field_cargo_value);

But we know that the data structure of field cargo and the join needed to bring it into the query is going to be always the same. So we are going to encapsulate this into its own reusable expression:

public static Expression<Func<node, IQueryable<field_data_field_cargo>>> load_cargo = (n) =>
    DCScope.Current.field_data_field_cargo.Where((i) => i.entity_id == n.nid).DefaultIfEmpty();

Then the original query can reuse this expression for improved readability and maintainablity:

var query = (from inscripton in dcs.CurrentContext.node.AsExpandable()
             from cargo in NodeFields.load_cargo.Invoke(inscripton)
             select cargo.field_cargo_value);

Even better, you can use this same expression to bring into the query the "field_cargo" as many times as you want and with as many bundles of the same entity you would like to. This was just an example, you can bundle into a single expression as much logic as you wish, such as bringing in a field for an entity reference field or any custom conditions you can come up with.

There are some approaches in Drupal to facilitate querying data, such as the EntityFieldQuery, but it is so limited that it is barely useful for anything but for making your code a little bit more readable.

Performance

We are already leveraging deployment and development gains (design-time) but what are the run-time implications of this change. Is there a performance penalty (or gain) for having to use an additional layer of COM + LINQ instead of drupal's query builder?

We are going to benchmark a sample query under several different scenarios.

This is the query in Drupal:

//nodos del tipo inscripción.
$query = db_select('node','inscription');
$query->condition('inscription.type','inscripcion','=');
//unimos con las actividades (congresos)
$query->join('field_data_field_congreso','field_congreso','field_congreso.entity_id = inscription.nid');
$query->join('node','actividad','field_congreso.field_congreso_nid = actividad.nid AND actividad.type = :node_type_congreso', array(':node_type_congreso' => 'congreso'));
//aplicamos criterio autor y actividad
$query->condition('inscription.uid', $uid, '=');
$query->condition('actividad.nid', $nid, '=');
$query->fields('inscription', array('nid'));
$result = $query->execute()->fetchAll();

And this is the query in C#:

var result = (from inscription in dcs.CurrentContext.node.AsExpandable()
              join field_congreso in dcs.CurrentContext.field_data_field_congreso on inscription.nid equals field_congreso.entity_id
              join congreso in dcs.CurrentContext.node on field_congreso.field_congreso_nid equals congreso.nid
              where inscription.type == "inscripcion"
              && inscription.uid == uid
              && congreso.nid == nid
              select inscription.nid);

return result.FirstOrDefault();

We have verified that both queries output a similar SQL statement and that they are 100% equivalent.

We tested running the query 200 times in a tight loop in 3 different scenarios with the following results:

  Maximum (s) Average(s)
Drupal Query Builder 0.61 0.57
LINQ over Entity Framework 6 1.23 1.12
LINQ over Linq-To-Sql 0.57 0.499

Nothing really surprising, the entity framework is a big bloated piece of an ORM, and even with compiled queries (that come by default with EF6) they are 100% slower than Drupal's query builder or Linq-To-Sql. This being Microsoft Flagship ORM at the moment, we don't understand this results and there is a chance that there is some sort of misconfiguration, but we were not able to find it.

We are also aware that EF6 first compilation is slower (it then stores statically the compiled query in the process), and being PHP a state-less architecture looping 200 times over a query is not a real world scenario. EF6 is already out of this game.

We are going to do an ApacheBench test over a page that bootstraps drupal and runs a single query without concurrency and 1000 times. This is a more real life friendly scenario because the COM object will be instantiated on every request, instead of being reused in a loop.

define('DRUPAL_ROOT', getcwd());
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
// ----------- TEST 1 LINQ
$queries = Drupal\prevencionintegral\QueryManager::GetManager();
$uid1 = $queries->InscripcionDeUsuario(48577, 24778);
// ----------- TEST 2 DRUPAL QUERY BUILDER
$uid1 = Drupal\prevencionintegral\Managers\ManagerCongresoInscriptions::InscripcionDeUsuario(48577, 24778);

The results showed that there was no appreciable difference in throughput, even taking into account that the LINQ version has much more overhead (instantiating a COM object, communicating  through COM, etc.) as compared to Drupal's query builder.

Our conclusion is that from a performance standpoint there is no penalty when moving queries out of Drupal and into LINQ, actually you get a slight performance edge.