Add new comment

Importing Excel data with PHP, OpenXML and SpreadsheetLight: a Drupal example

Difficulty: 
Let's Rock

In this article you will learn how to manipulate Excel files from PHP using the OpenXML specification and the SpreadsheetLight library.

OpenXML is an open source specification that is used to build all Office 2007 and beyond document formats including Word and Excel. SpreadsheetLight builds on top of the standard OpenXML library to make Excel manipulation a piece of cake.

I will be using a Drupal based example in which an application was required to import data that administrators had to upload from Excel files into custom entities.

Introductory Example

Before we dive into the Drupal based example, let's see a quick visual demonstration on how to use the SpreadsheetLight library to work on some Excel files.

Here is the code sample from the previous animation:

    // SpreadsheetLight works on the idea of a currently selected worksheet.
    // If no worksheet name is provided on opening an existing spreadsheet,
    // the first available worksheet is selected.
    $sl = \ms\SpreadsheetLight\netSLDocument::SLDocument_Constructor("MyFile.xlsx", "Sheet1");

    $sl->SetCellValue("E6", "Let's party!!!!");

    $sl->SelectWorksheet("Sheet3");
    $sl->SetCellValue("E5", "Ready!!");

    $sl->AddWorksheet("DanceFloor");
    $sl->SetCellValue("B4", "aaaa");
    $sl->SetCellValueNumeric("B3", 25);

    $sl->SaveAs("D:\\MYNEWFILE.xlsx");

To get IDE integration and autocomplete for the SpreadhseetLight binary we rely on a PHP class map dumped with NetPhp.

Use case using Drupal

The first thing we need to do is create a form for admins to be able to upload their Excel files:

[..] // Code removed for demonstration purposes

class FormMassInscriptionUpload extends FormGeneric implements FormInterface {

  [..] // Code removed for demonstration purposes

  public static function generateForm($form, &$form_state) {

    global $user;
    $congresos = array();

    // Obtener lista de congresos
    $query = new \EntityFieldQuery();
    $query->entityCondition('entity_type', 'node')
      ->entityCondition('bundle', 'congreso')
      ->addMetaData('account', user_load(1));

    $result = $query->execute();
    $options = array();
    if (isset($result['node'])) {
      $congresos = array_keys($result['node']);
      foreach(entity_load('node', $congresos) as $nid => $congreso) {
        $entity = UtilsEntity::entity_metadata_wrapper('node', $congreso);
        $options[$entity->getIdentifier()] = $entity->title->value();
      }
    }

    $form['congreso'] = array(
      "#type" => "select", 
      "#title" => t("Seleccione el congreso"), 
      "#default_value" => NULL,
      "#options" => $options,
      "#description" => 'Congreso al que desea inscribir a los usuarios.',
      '#required' => true
    );
    
    $form['plantilla'] = array(
           '#type' => 'managed_file',
            '#title' => 'Fichero Excel con las inscripciones.',
            '#description' => 'Fichero con los datos de las inscripciones. Excel 2007 o superior.',
            '#upload_location' => 'temporary://',
            '#upload_validators' => array(
                'file_validate_extensions' => array('xlsx'),
                // Pass the maximum file size in bytes
                'file_validate_size' => array(file_upload_max_size()*1024*1024),
             ));

    $form['actions']['submit'] = array(
                                       '#type' => 'submit',
                                       '#value' => 'Procesar',
                                   );
    
    // Funciones de envío y validación.
    $form['#submit'][] = static::GetStaticMethodCallback('submit');
    $form['#validate'][] = static::GetStaticMethodCallback('validate');

    return $form;
  }
  
  [..] // Code removed for demonstration purposes

  public static function validate($form, &$form_state) {
    $file = file_load($form_state['values']['plantilla']);
    $controller = new ControllerMassInscriptions();
    $errors = $controller->ValidateTemplate($file->fid);
    if (!empty($errors)) {
      foreach($errors as $error) {
        form_set_error('plantilla', $error);
      }
      
    }
  }

  public static function submit($form, &$form_state) {
    $file = file_load($form_state['values']['plantilla']);
    $controller = new ControllerMassInscriptions();
    $controller->ImportTemplate($file->fid, $form_state['values']['congreso']);
  }
}

Nothing special here. We are just creating a Form plugin to gather input from the user that will validate that the Excel file has the required format and that will invoke the Excel processing once the form is succesfully submitted. Notice that all the Excel magic happens inside the ControllerMassInscriptions class:

use \ms\SpreadsheetLight\netSLDocument;

// [..] Code removed for demonstration purposes

/* Controller class for Mass Inscription Upload
 */
class ControllerMassInscriptions {

  /**
   * Mapeo de columnas del excel a propiedades/fields de la entidad.
   * 
   * @var mixed
   */
  public static $columns = array(
    'NOMBRE' => array(
        'field' => 'field_nombre',
        'container' => 'NOloMBRE',
        'required' => TRUE,
        ),
    // [..] Code removed for demonstration purposes
    );

  /**
   * Validate the contents of an excel file.
   * 
   */
  public function ValidateTemplate($fid) {
    // Cargar el excel y verificar que tiene todas las columnas que debe tener,
    // y que todas las columnas "básicas" tengan un valor correcto.
    $file = file_load($fid);
    $doc = netSLDocument::Object_Constructor(drupal_realpath($file->uri));

    // Scan the first row to tell what data is in what column.
    $cols = static::$columns;
    $this->ScanHeaders($doc, $cols);

    $errors = array();

    // Si alguna columna no tiene datos, avisar!
    foreach($cols as $name => $col) {
      if (!isset($col['column'])&& $col['required'] == TRUE) {
        $errors[] = "No se ha encontrado la columna {$name}";
      }
    }

    // Load the contents of the Excel.
    $contents = $this->LoadContents($doc, $cols);

    // Make sure all e-mails are valid
    foreach($contents as $row) {
      if (!valid_email_address($row->MAIL)) {
        $errors[] = "La dirección de correo {$row->MAIL} es inválida.";
      }
    }

    return $errors;
  }

  /**
   * Prepare a batch process using Drupal's batch API
   */
  public function ImportTemplate($fid, $congreso) {

    // El callback de finish necesita truco...
    $finish_callback = FdfCore::RegisterGlobalFunctionMap(UtilsFdf::GetStaticMethodCallback(static::class, 'ImportFinishCallback'),UtilsNamespace::ModuleNameFromFullName(static::class), FALSE);

    $controller = new ControllerMassInscriptions();

    // Cargar el excel y verificar que tiene todas las columnas que debe tener,
    // y que todas las columnas "básicas" tengan un valor correcto.
    $file = file_load($fid);
    $doc = netSLDocument::Object_Constructor(drupal_realpath($file->uri));

    // Scan the first row to tell what data is in what column.
    $cols = static::$columns;
    $controller->ScanHeaders($doc, $cols);

    // Load the contents of the Excel.
    $contents = $controller->LoadContents($doc, $cols);

    $batch = array(
      'title' => t('Importing'),
      'operations' => array(
        array(UtilsFdf::GetStaticMethodCallback(static::class, 'ImportProcessCallback'), array($fid, $congreso, $contents)),
      ),
      'finished' => $finish_callback,
    );

    batch_set($batch);
  }

  /**
   * This is the callback for the batch process.
   */
  public static function ImportProcessCallback($fid, $congreso, $contents, &$context) {

    if (empty($context['sandbox'])) {
      $context['sandbox']['progress'] = 0;
      $context['sandbox']['current_row'] = 0;
      $context['sandbox']['results'] = array();
      $context['sandbox']['max'] = count($contents);
    }

    // Uno par por request.
    for ($x = 0; $x < 3; $x++) {
      $item = array_values($contents)[$context['sandbox']['current_row']];
      if (empty($item)) {
        break;
      }
      $context['results'][] = static::ImportItem($item, $congreso);

      $context['sandbox']['current_row']++;
      $context['sandbox']['progress']++;
    }

    if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
      $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
    }
  }


  /**
   * Importa un solo ítem!
   */
  public static function ImportItem(RowContainerMassInscription $item, $congreso) {
    $result = array();
    $columns = static::$columns;

    try {

      // Vemos si el usuario ya existe en el sistema
      $inscripciones = ManagerCongresoInscriptions::InscripcionDeMail($item->MAIL, $congreso);
      if (!empty($inscripciones)) {
        $result[] = array('type' => 'warning', 'message' => "El usuario con el e-mail $item->MAIL ya tiene una inscripción al congreso.");
      }
      else {
        // Creamos la inscripción.
        $inscription = entity_create('node', array('type' => 'inscripcion'));
        $inscription = UtilsEntity::entity_metadata_wrapper('node', $inscription);
        foreach ($columns as $col) {
          $inscription->{$col['field']} = $item->{$col['container']};
        }
        // El propietario es el root.
        $inscription->author = 1;
        $inscription->field_congreso = (int) $congreso;
        $inscription->save();
        // Aquí ni se generan facturas ni ostias, si quieren facturas que las hagan desde otro sitio.
        // Lo mismo con la suscripción a boletines.
      }
    }
    catch(\Exception $e) {
      $result[] = array('type' => 'error', 'message' => "Error importando el usuario con e-mail {$item->MAIL}: {$e->getMessage()}");
    }

    return $result;
  }

  /**
   * Process results. 
   */
  public static function ImportFinishCallback($success, $results, $operations) {
    // The 'success' parameter means no fatal PHP errors were detected. All
    // other error management should be handled using 'results'.
    if ($success) {
      $message = format_plural(count($results), 'One post processed.', '@count posts processed.');
    }
    else {
      $message = t('Finished with an error.');
    }

    drupal_set_message($message);

    // Providing data for the redirected page is done through $_SESSION.
    foreach ($results as $result) {
      foreach($result as $message) {
        drupal_set_message($message['message'], $message['type']);
      }
    }
    
  }

  /**
   * Scan the file headers for column matching.
   */
  private function ScanHeaders($doc, &$cols) {
    $searchMax = 50;
    $searchCurrent = 0;
    while(TRUE && $searchCurrent < $searchMax) {
      $content = $doc->GetCellValueAsString(1, $searchCurrent);
      if (!empty($content)) {
        $content = trim($content);
        if (isset($cols[$content])) {
          $cols[$content]['column'] = $searchCurrent;
        }
      }
      $searchCurrent++;
    }
  }

  /**
   * Load a list of container from the excel file.
   */
  private function LoadContents($doc, array $cols) {
    $result = array();
    $row = 2;
    while(TRUE) {
      $r = new RowContainerMassInscription();
      foreach($cols as $col) {
        if (isset($col['column'])) {
          $value = trim($doc->GetCellValueAsString($row, $col['column']), "\t\n\r\0\x0B ");
          if(mb_detect_encoding($value, 'UTF-8', TRUE) == FALSE) {
            $value = utf8_encode($value);
          }
          $r->{$col['container']} = $value;
        }
        if (empty($r->{$col['container']}) && $col['required'] == TRUE) {
          break 2;
        }
      }
      $result[] =  $r;
      $row++;
    }
    return $result;
  }
}

Although the code is self explanatory I'll comment on some key parts of it.

The ValidateTemplate() method is the validation method being called from the form plugin. It loads the Excel file using the SLDocument object from the SpreadsheetLight library (you can find many excel manipulation examples in C# that you can straight port to PHP).

The ScanHeaders() method searches for a predefined set of Headers by their "string" name (such as 'Nombre', 'Apellidos', etc...) in the Excel file and asigns to each of them a column numbers so that data in the Excel file can be mapped to the actual data we are expecting. So the column order does not matter as long as the columns in the Excel file have the "titles" we expect them to have:

Using the header matching data the LoadContents() method scans the Excel file and extracts all the information so that it can be later processed by our Batch process. To do so it uses the SLDocument->GetCellValueAsString() to retrieve the cell values for ever row. There were some encoding issues with the input from the Excel file so some encoding checks and conversions where required.

        if (isset($col['column'])) {
          $value = trim($doc->GetCellValueAsString($row, $col['column']), "\t\n\r\0\x0B ");
          if(mb_detect_encoding($value, 'UTF-8', TRUE) == FALSE) {
            $value = utf8_encode($value);
          }
          $r->{$col['container']} = $value;
        }

The ImportTemplate() method is the method being called from the form submit and the one in charge of setting up the Batch API process that will be in charge of importing the Excel data. This is how it goes:

  • ImportTemplate() calls ScanHeaders() and LoadContents() to scan the Excel file and retrieve all the data into a PHP friendly data sturcture. 
  • ImportTemplate() sets up a Batch API process that processes 3 rows per request using the ImportProcessCallback() callback.
  • ImportProcessCallback() manages the Batch process and sends row data to ImportItem()
  • ImportItem() actually dumps the data into a Drupal Entity and saves it
  • When the Batch process is over, ImportFinishCallback() is called by the Batch API where we send some messages to the user.