Cache Design in Drupal


for Developers



Jimmy Huang at
PHPConf 2013, Taipei, Taiwan

Why Drupal?


  • Drupal is slow without Cache

  • Drupal's flexibility heavily depends on cache design

  • I love Drupal - I have been in love with it for over 8 years

  • Many developers have verified these designs

Without Cache  → Cached

        350ms → 170ms

Drupal 7 clean install

Without Cache  → Cached

5,977ms  →  627 ms

  • Over 10 custom fields
  • Custom Categories, Tag
  • Maps, Location info
  • Author Statistics
  • Related Articles




Best open source choice of my life


- Jimmy Huang   

Drupaltaiwan.org (2006)


Drupalcamp Taipei (2012)


Founder of Drupal "Dries" on the screen

Drupalcamp Hackathon 2013


for open street map tw / ubuntu taiwan

Become "Real" Drupaler

 


Multi-Language Sites

  

One-to-many sub site



Forum / Community Website


EC/Shopping Cart


Stack Overflow-like Website


News Portal / Videos Portal


Enterprise Information Portal


Crowdfunding (Kickstarter-like)


Food traceability system


CRM / Event Register (With CiviCRM)


Congress data info


Mobile Content Backend

 

Most Important: Developers

2001 ~ 2013 commit log visualization of Drupal





If WordPress is what Web designers choose, 
Drupal is what Web Developers choose.
- Andrew Oliver    



source

Views make me lazy




Content Types make me bored





Flexibility is perfect, but...



Slow... without tuning / caching


Drupal here!!








Everything  Cached
in modern web app

Cache design highlight in Drupal

  • Consider Static Resources
    • CSS, Javascript, Images, CDN support
    • Static HTML (third-party)

  • Different bins for different uses or modules
    • Core: Page cache, Config cache, object cache, routing ...
    • Custom Module: Rendered content cache ...

  • Swappable Cache engine
    • Database (core)
    • Memcache, APC, File based, AWS (third-party)





    Static Resources





    304 Not Modified

    when visiting a cached page

    1. Tell the browser to take control of the cache
    2. Calculate the cache lifetime for the browser
       • check cookie / session time
       • check "Max Age" settings in Drupal
       • Tell the browser the actual "Max-Age"


  • Cache-Control

    Header
     Cache-Control: public, max-age=21600
    Code
    // If the client sent a session cookie, a cached copy will only be served
    // to that one particular client due to Vary: Cookie. Thus, do not set
    // max-age > 0, allowing the page to be cached by external proxies, when a
    // session cookie is present unless the Vary header has been replaced or
    // unset in hook_boot().
    $max_age = !isset($_COOKIE[session_name()]) || isset($hook_boot_headers['vary']) ? variable_get('page_cache_maximum_age', 0) : 0;
    $default_headers['Cache-Control'] = 'public, max-age=' . $max_age;

    Etag, Last-Modified, Expires


    • Always send an Expires header set to a date in the past
      • Let ETag and Last-Modified control expiry

    • Assign the cache creation time to Last-Modified
      • that is the correct meaning of Last-Modified

    • Assign the cache creation time to ETag
      • Cheap unique id for this page
      • In sync with Last-Modified
    Last-modified
    $default_headers['Last-Modified'] = gmdate(DATE_RFC1123, $cache->created);

    ETag
    $etag = '"' . $cache->created . '-' . intval($return_compressed) . '"';

    Expires
    // HTTP/1.0 proxies does not support the Vary header, so prevent any caching
    // by sending an Expires date in the past. HTTP/1.1 clients ignores the
    // Expires header if a Cache-Control: max-age= directive is specified (see RFC
    // 2616, section 14.9.3).
    $default_headers['Expires'] = 'Sun, 19 Nov 1978 05:00:00 GMT';
    see also: drupal_serve_page_from_cache
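
The validators above are what make a 304 reply possible. A minimal sketch of that check, assuming PHP 7.1+; the helper name `example_is_not_modified` and its parameters are hypothetical and not part of Drupal's API:

```php
<?php
/**
 * Decide whether a cached page can be answered with "304 Not Modified".
 * Hypothetical helper; $created is the cache creation timestamp.
 */
function example_is_not_modified(array $request_headers, int $created, bool $compressed): bool {
  // Rebuild the ETag exactly as it was sent with the cached page.
  $etag = '"' . $created . '-' . intval($compressed) . '"';

  $if_none_match = $request_headers['If-None-Match'] ?? null;
  $if_modified_since = $request_headers['If-Modified-Since'] ?? null;

  // Require both validators to be present and to match the cached copy.
  return $if_none_match !== null
    && $if_modified_since !== null
    && $if_none_match === $etag
    && strtotime($if_modified_since) === $created;
}

// Usage: the browser sends back the validators it received earlier.
$created = 1380443978;
$headers = array(
  'If-None-Match' => '"1380443978-0"',
  'If-Modified-Since' => gmdate(DATE_RFC1123, $created),
);
$not_modified = example_is_not_modified($headers, $created, false);
```

When the check passes, only headers need to be sent and the cached body stays on the server.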



    CSS / JS aggregation


    cross modules

    JS / CSS in modules

    Situation

    • Different modules have their own css / js
    • Different modules live in different directories
    • Some modules use jQuery plugins, some do not
    • Themes have their own css and javascript
    • IE has a 30 css file limitation

    Solution

    • Put css / js into a static array
    • Output when all the modules / themes are done
    • Aggregate files to reduce front-end connections
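
The solution above can be sketched roughly as follows, assuming PHP 7.1+. The functions `example_add_css` and `example_build_css_cache` are hypothetical stand-ins that only mirror the idea of collecting files in a static array and writing one aggregate; they are not Drupal's implementation:

```php
<?php
/**
 * Collect stylesheets in a static array and return everything added so far.
 */
function example_add_css(?string $file = null): array {
  static $files = array();
  if ($file !== null) {
    // Keyed by path so a file added by two modules is output once.
    $files[$file] = $file;
  }
  return array_values($files);
}

/**
 * Concatenate the collected files into one aggregate, named after a hash
 * of its contents so an unchanged set reuses the same file.
 */
function example_build_css_cache(array $files, string $dir): string {
  $data = '';
  foreach ($files as $file) {
    $data .= file_get_contents($file) . "\n";
  }
  $target = $dir . '/css_' . hash('sha256', $data) . '.css';
  if (!file_exists($target)) {
    file_put_contents($target, $data);
  }
  return $target;
}

// Usage: two "modules" register CSS, the page outputs one aggregate.
$dir = sys_get_temp_dir();
$a = tempnam($dir, 'css');
$b = tempnam($dir, 'css');
file_put_contents($a, 'a { color: red; }');
file_put_contents($b, 'b { color: blue; }');
example_add_css($a);
example_add_css($b);
example_add_css($a); // duplicate, ignored
$aggregate = example_build_css_cache(example_add_css(), $dir);
```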

    After Aggregation

    13 files → 4 files

    Add Resource / Build Cache

    drupal_add_js (50 calls)
    drupal_add_css (42 calls)

    drupal_build_js_cache
    drupal_build_css_cache

    Template: just a single variable


     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php print $language->language; ?>" version="XHTML+RDFa 1.0" dir="<?php print $language->dir; ?>"<?php print $rdf_namespaces; ?>>
    
    <head profile="<?php print $grddl_profile; ?>">
      <?php print $head; ?>
      <title><?php print $head_title; ?></title>
      <?php print $styles; ?>
      <?php print $scripts; ?>
    </head>
    <body 





    Image Cache

    Custom Image Size and Style


    Cache resized image based on layout



    "Cache" Image by style

    1. Check if the resized image exists

    2. If not, generate the image based on the style

    3. Cache the generated image

    4. Next time, serve the static image directly
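
The four steps above can be sketched as below, assuming PHP 7.1+. The function `example_image_style_path` and the `$generate` callback are hypothetical; the callback returns placeholder bytes instead of doing a real resize:

```php
<?php
/**
 * Return the path of a derivative image, generating it only on a miss.
 */
function example_image_style_path(string $source, string $style, string $cache_dir, callable $generate): string {
  $target = $cache_dir . '/' . $style . '/' . basename($source);
  // 1. Check whether the resized image already exists.
  if (!file_exists($target)) {
    // 2. If not, generate it according to the style.
    if (!is_dir(dirname($target))) {
      mkdir(dirname($target), 0775, true);
    }
    // 3. Cache the generated image on disk.
    file_put_contents($target, $generate($source, $style));
  }
  // 4. Later requests find the file and skip generation.
  return $target;
}

// Usage: the second call does not invoke the generator again.
$calls = 0;
$resize = function (string $source, string $style) use (&$calls): string {
  $calls++;
  return "resized($style):" . $source; // placeholder bytes, not a real image
};
$cache_dir = sys_get_temp_dir() . '/styles_' . uniqid();
$first = example_image_style_path('photo.jpg', 'thumbnail', $cache_dir, $resize);
$second = example_image_style_path('photo.jpg', 'thumbnail', $cache_dir, $resize);
```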

    Image cache: test for the file, then pass to PHP

    Apache: mod_rewrite config

    # Pass all requests not referring directly to files
    # in the filesystem to
    # index.php. Clean URLs are handled in
    # drupal_environment_initialize().
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^ index.php [L]

    Nginx: try_files config

    location ~ ^/sites/.*/files/styles/ {
      try_files $uri @rewrite;
    }
    location @rewrite {
      rewrite ^ /index.php;
    }

    Save PHP requests

    • 1,500ms → 600ms
    • Use keep-alive to serve multiple images over 1 HTTP connection





    Static HTML Cache

    (third-party module)

    Poor man's high performance cache

    • If the file exists, serve the cached html
      • Same as image cache
    • Anonymous visitors only
      • Drupal delivers the whole page cache to anonymous visitors only
    • Check the cookie in the web server to detect anonymous visitors
      • Apache:
        RewriteCond %{HTTP_COOKIE} SESS
      • Nginx:
        map $http_cookie $no_cache {
          default 0;
          ~SESS 1; # PHP session cookie
        }

    See also:  Boost module


    Bootstrap cache
    settings, translations, modules, autoload ...

    Bootstrap in Drupal

    Drupal bootstraps on every visit

           When visiting the URL "http://example.com/node/123":
    • Page cache exists ? return cache
      • Load global settings (variables)
      • Load sessions, user object
      • Load translations
      • Check routing and permission of this URL ? return 403, 404
        • Load system modules
        • Prepare autoload
          →  Execute page result




    Page Cache

    Save printed HTML to cache

    1. After rendering all the elements (drupal_deliver_html_page)
    2. Before the end, call ob_get_clean to save all the HTML
    3. On the next visit, deliver the whole page
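
A minimal sketch of the three steps above using output buffering, assuming PHP 7.1+. The function `example_deliver_page` is hypothetical and a plain array stands in for the cache bin:

```php
<?php
/**
 * Serve a page from cache, or render it once and store the HTML.
 */
function example_deliver_page(string $url, callable $render, array &$cache): string {
  if (isset($cache[$url])) {
    // Next visit: deliver the whole stored page.
    return $cache[$url];
  }
  ob_start();
  $render($url);           // prints the HTML
  $html = ob_get_clean();  // capture everything that was printed
  $cache[$url] = $html;
  return $html;
}

// Usage: the render callback runs only for the first request.
$cache = array();
$renders = 0;
$render = function (string $url) use (&$renders) {
  $renders++;
  print '<html><body>Content of ' . htmlspecialchars($url) . '</body></html>';
};
$first = example_deliver_page('/node/123', $render, $cache);
$second = example_deliver_page('/node/123', $render, $cache);
```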

    Feature

    • delivered to anonymous users only
    • supports any type of cache bin (varnish, memcached)
    • stores a gzipped version to save both CPU and bandwidth



    Performance compare sheet


    Cache Type                  Average over 100 pages
    None                        414ms per page
    Database (Drupal default)   53ms per page
    Memcached                   32ms per page
    Boost                       0.264ms per page

    Tested in Linode 1024, Nginx + PHP 5.3




    Settings

    Every module can save config easily

    Save config - variable_set

    $custom_var = array(
      'test1' => 1,
      'test2' => 2,
    );
    variable_set('mymodulename_custom_var', $custom_var);


    Retrieve config - variable_get

    $var = variable_get('mymodulename_custom_var', array()); 

    500+ serialized blob records in the DB

    All the settings are saved to a database table.

    Cache all config in a single record


    • Clear the cache when new config is saved
    • Retrieve the cache on every bootstrap
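
The single-record idea can be sketched as follows, assuming PHP 7.1+. The functions `example_variable_initialize` and `example_variable_set` are hypothetical simplifications; plain arrays stand in for the database table and the cache bin:

```php
<?php
/**
 * Load all settings, preferring a single cached record over many rows.
 * $db_rows maps variable names to serialized values.
 */
function example_variable_initialize(array $db_rows, array &$cache): array {
  if (isset($cache['variables'])) {
    return $cache['variables'];
  }
  // Cache miss: unserialize every row once, then store the whole set.
  $variables = array_map('unserialize', $db_rows);
  $cache['variables'] = $variables;
  return $variables;
}

/** Save one setting and clear the cached record so it is rebuilt. */
function example_variable_set(string $name, $value, array &$db_rows, array &$cache): void {
  $db_rows[$name] = serialize($value);
  unset($cache['variables']);
}

// Usage: saving a new value invalidates the cache, the next load rebuilds it.
$db_rows = array('site_name' => serialize('Example'));
$cache = array();
$conf = example_variable_initialize($db_rows, $cache);
example_variable_set('mymodulename_custom_var', array('test1' => 1), $db_rows, $cache);
$conf = example_variable_initialize($db_rows, $cache);
```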

    With / without variable cache

    saves 30-50ms on a large site


    see also: _drupal_bootstrap_variables, variable_initialize




    Translations

    5000+ strings saved in the database

    • Every module can use t() to translate strings
    • 1000+ calls of t() per page
    • Cache the strings in 1 record to save DB overhead
    • Use a static array to avoid loading duplicate strings
    • Clear the cache when a string is translated
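
A rough sketch of the static-array part, assuming PHP 7.1+. The function `example_t` and the `$loader` callback are hypothetical; the callback stands in for reading the single cached record:

```php
<?php
/**
 * Translate a string, loading all translations once per request.
 */
function example_t(string $string, callable $loader): string {
  static $translations = null;
  if ($translations === null) {
    // One load per request instead of one lookup per call.
    $translations = $loader();
  }
  // Fall back to the source string when no translation exists.
  return $translations[$string] ?? $string;
}

// Usage: three calls, one load.
$loads = 0;
$loader = function () use (&$loads): array {
  $loads++;
  return array('Home' => '首頁', 'Login' => '登入');
};
$a = example_t('Home', $loader);
$b = example_t('Login', $loader);
$c = example_t('Untranslated', $loader);
```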





    Modules

    &

    Autoload

    Drupal is a module-based system

    • 100~200 modules for a modern website
    • 500+ calls per request - modules are looked up frequently
    • Scanning the whole directory - I/O overhead

    Module List Cache


    • Cache is refreshed when a module is enabled / disabled
    • 1 record saves a whole directory scan

    see also: system_list

    Autoload

    • Register classes into the database when a module is enabled
    • Autoload the class's file when it exists
    • Save the database class mapping into the cache
    • Autoload from the cached list without DB overhead
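
A minimal sketch of autoloading from a cached class map, assuming PHP 7.1+. The class `ExampleNodeController` and the `$registry_cache` array are hypothetical; the array stands in for the mapping read from the cache:

```php
<?php
// Write a hypothetical class file so the example is self-contained.
$file = sys_get_temp_dir() . '/ExampleNodeController_' . uniqid() . '.php';
file_put_contents($file, "<?php\nclass ExampleNodeController { public \$loaded = true; }\n");

// Class-to-file map, as it would be read from the cache (assumption).
$registry_cache = array('ExampleNodeController' => $file);

// Resolve classes from the cached map, without a database lookup.
spl_autoload_register(function (string $class) use ($registry_cache) {
  if (isset($registry_cache[$class])) {
    require_once $registry_cache[$class];
  }
});

// Usage: the file is included on first use of the class.
$controller = new ExampleNodeController();
```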

    see also: _registry_check_code, registry_update


    Overhead of working together

    • Modules work together through hooks, even though they are independent
    • Every hook call loops over all modules with function_exists
    • 200+ calls of module_implements with 100 modules enabled
      → 200*100 loops to check function_exists

    hook_form_alter
    example:
    module "profile" modify form element made by module "user"
    function profile_form_alter(&$form, &$form_state, $form_id) {
      if (($form_id == 'user_register_form' || $form_id == 'user_profile_form')) {
        // modify form element here...
      }
    }

    Cache on Module hooks


    • Run all the loops the first time
    • Cache the result as a hook-indexed array after page load
    • Next time, just loop over the cached hooks
    • When invoking a specific hook, only loop over array[hook]
      • cached    : 1000+ function_exists calls (3ms)
      • no cache: 20000+ function_exists calls (70ms)
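
The hook-indexed cache can be sketched like this, assuming PHP 7.1+. The function `example_module_implements` is a hypothetical simplification; the two module functions exist only for the demonstration:

```php
<?php
// Two hypothetical modules; only "profile" implements hook_form_alter.
function profile_form_alter() { return 'profile'; }
function user_menu() { return 'user'; }

/**
 * List modules implementing $hook, scanning with function_exists() only
 * on the first request for that hook. $cache stands in for a cache bin.
 */
function example_module_implements(string $hook, array $modules, array &$cache): array {
  if (!isset($cache[$hook])) {
    $cache[$hook] = array();
    foreach ($modules as $module) {
      if (function_exists($module . '_' . $hook)) {
        $cache[$hook][] = $module;
      }
    }
  }
  // Cached: callers loop only over the modules that implement the hook.
  return $cache[$hook];
}

// Usage: the second call is answered from the hook-indexed array.
$modules = array('user', 'profile', 'node');
$cache = array();
$first = example_module_implements('form_alter', $modules, $cache);
$second = example_module_implements('form_alter', $modules, $cache);
```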

    see also: module_implements, module_implements_write_cache





    Content Cache





    Article

    Add Article fields in a few clicks in Drupal

    but ... every field is stored in its own table

    When visiting an Article, we need to...

    • Join (1 + number of fields) tables - DB overhead
    • Parse text to a specific format - PHP overhead
      • parse wiki syntax, bbcode, remove insecure html ..
    • Prepare the Article object for use

    prepare Article (node) object


    • Loaded 10+ times on a node page
      • check permissions
      • lookup language
      • check revisions
      • ....
    • Includes all info of an article
      • this info is rendered to html

    see also: node_load

    Static array for loaded objects

    to speed up multiple loads per page
    without cache → with static cache: 48ms → 7ms

        // Try to load entities from the static cache, if entity type supports
        // static caching.
        if ($this->cache && !$revision_id) {
          $entities += $this->cacheGet($ids, $conditions);
          // If any entities were loaded
          // remove them from the ids still to load.
          if ($passed_ids) {
            $ids = array_keys(array_diff_key($passed_ids, $entities));
          }
        }
    source: DrupalDefaultEntityController::load 

    Field cache: refreshed on every insert / update

    • Serialized array is put into a cache bin
    • Cached structure can be used by other modules
     a:5:{s:4:"body";a:1:{s:3:"und";a:1:{i:0;a:5:{s:5:"value";s:834:"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam ac pellentesque tellus. Sed ullamcorper, tellus euismod luctus .... faucibus.";s:7:"summary";s:0:"";s:6:"format";s:9:"full_html";s:10:"safe_value";s:846:"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam ac pellentesque tellus. Sed ullamcorper, tellus euismod luc ... bus.</p>\n";s:12:"safe_summary";s:0:"";}}}s:10:"field_tags";a:1:{s:3:"und";a:3:{i:0;a:1:{s:3:"tid";s:1:"1";}i:1;a:1:{s:3:"tid";s:1:"2";}i:2;a:1:{s:3:"tid";s:1:"3";}}}s:11:"field_image";a:1:{s:3:"und";a:1:{i:0;a:13:{s:3:"fid";s:2:"14";s:3:"alt";s:0:"";s:5:"title";s:0:"";s:5:"width";s:3:"960";s:6:"height";s:3:"720";s:3:"uid";s:1:"1";s:8:"filename";s:36:"25919_4928590944976_1048974114_n.jpg";s:3:"uri";s:57:"public://field/image/25919_4928590944976_1048974114_n.jpg";s:8:"filemime";s:10:"image/jpeg";s:8:"filesize";s:5:"80014";s:6:"status";s:1:"1";s:9:"timestamp";s:10:"1380443978";s:11:"rdf_mapping";a:0:{}}}}s:14:"field_category";a:0:{}s:12:"field_rating";a:0:{}}






    Form

    Features of the native form builder

    • Security
      • XSS attack prevention - per-submission form id
    • Centralized form generation process
    • Can be changed by any other module
    • Other modules can add new elements through inheritance
      • date picker form inherited from textfield
      • image field inherited from file field
      • date picker can be used by any other module
      $form['my_other_field_need_date_picker'] = array(
        '#type' => 'date_popup',
        '#title' => t('My Date'),
        // ....
      );


    Form elements types

    checkbox, checkboxes, date, fieldset, file, machine_name, managed_file, password, password_confirm, radio, radios, select, tableselect, text_format, textarea, textfield, vertical_tabs, weight

    But .... very expensive when

    • calling hooks across modules
    • altering forms across modules
    • gathering all element types
      • to render the form html
      • to fill default values

    Why Drupal Caches Forms

    • Cache for multi-step submission
    • Cache for validating submitted values
    • Cache when errors appear (so the form doesn't need regenerating)
    • Generate the whole form in the first step
      • no regeneration in the next step
      • save the submitted values in the state cache
      • check submitted values to detect invalid input
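
The multi-step idea above can be sketched as follows, assuming PHP 7.1+. The functions `example_get_form` and `example_save_step` and the array layout are hypothetical; a plain array stands in for the form cache:

```php
<?php
/**
 * Build a form once and keep it, with its state, under a build id.
 */
function example_get_form(?string $build_id, callable $builder, array &$cache): array {
  if ($build_id !== null && isset($cache[$build_id])) {
    // Later steps reuse the stored form instead of rebuilding it.
    return $cache[$build_id];
  }
  $build_id = 'form-' . bin2hex(random_bytes(8));
  $cache[$build_id] = array(
    'build_id' => $build_id,
    'form' => $builder(),
    'state' => array(),
  );
  return $cache[$build_id];
}

/** Store the submitted values of one step in the cached state. */
function example_save_step(string $build_id, array $values, array &$cache): void {
  $cache[$build_id]['state'] = $values + $cache[$build_id]['state'];
}

// Usage: step 2 finds the form and the values from step 1 in the cache.
$builds = 0;
$builder = function () use (&$builds): array {
  $builds++; // stands in for the expensive cross-module build
  return array('name' => array('#type' => 'textfield'));
};
$cache = array();
$step1 = example_get_form(null, $builder, $cache);
example_save_step($step1['build_id'], array('name' => 'Jimmy'), $cache);
$step2 = example_get_form($step1['build_id'], $builder, $cache);
```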



    Navigation

    (Drupal Menu)

    Navigation is complex

    • Permission based
      • Some links only for logged-in users
      • Some links for administrators, special routing
    • Navigation can be placed in many places
      •  Header, footer, developer, account, article
    • Parent-child trails, e.g. Gallery > Jimmy > Photo

    Navigation cache

    • Cache the calculated parent-child relationships
    • Cache the navigation tree indexed by permission
    • Menu/Routing cache saves 50-80ms per page
      • Cache by user / by permissions
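
A sketch of caching a navigation tree per permission set rather than per user, assuming PHP 7.1+. The functions `example_menu_cid` and `example_menu_tree`, the cache id format, and the role names are all hypothetical:

```php
<?php
/**
 * Build a cache id shared by all users holding the same set of roles.
 */
function example_menu_cid(string $menu, array $roles): string {
  sort($roles); // role order must not change the id
  return 'menu:' . $menu . ':' . implode(',', $roles);
}

/** Return the menu for a role set, building it only on a cache miss. */
function example_menu_tree(string $menu, array $roles, callable $build, array &$cache): array {
  $cid = example_menu_cid($menu, $roles);
  if (!isset($cache[$cid])) {
    $cache[$cid] = $build($menu, $roles);
  }
  return $cache[$cid];
}

// Usage: two users with the same roles share one cached tree.
$links = array(
  array('title' => 'Home', 'role' => 'anonymous'),
  array('title' => 'My account', 'role' => 'authenticated'),
  array('title' => 'Administration', 'role' => 'administrator'),
);
$builds = 0;
$build = function (string $menu, array $roles) use ($links, &$builds): array {
  $builds++;
  // Keep only links whose required role the visitor holds.
  $visible = array_filter($links, function (array $link) use ($roles): bool {
    return in_array($link['role'], $roles, true);
  });
  return array_column($visible, 'title');
};
$cache = array();
$alice = example_menu_tree('main', array('authenticated', 'anonymous'), $build, $cache);
$bob = example_menu_tree('main', array('anonymous', 'authenticated'), $build, $cache);
```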





    Cache Design

     

    Design the Cache

    Generate Component


    Panels handling blocks

    • Third-party module
    • Design the layout yourself
    • Add Drupal blocks into the layout
    • Set the cache for each block

    Each block cache set


    Cache Method

    • Time based cache (simple cache) 
    • Page based cache (cache by url) 
    • Rules based cache (cache by condition)
    • Custom cache programming
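
Of the methods above, the time-based one is the simplest to sketch. This is a minimal illustration assuming PHP 7.1+; `example_cache_set` and `example_cache_get` are hypothetical helpers over an array-backed bin, with the current time passed in explicitly so the behaviour is easy to follow:

```php
<?php
/** Store a value with an expiry timestamp in an array-backed bin. */
function example_cache_set(array &$bin, string $cid, $data, int $lifetime, int $now): void {
  $bin[$cid] = array('data' => $data, 'expire' => $now + $lifetime);
}

/** Return cached data, or null once the entry has expired. */
function example_cache_get(array &$bin, string $cid, int $now) {
  if (!isset($bin[$cid]) || $bin[$cid]['expire'] <= $now) {
    unset($bin[$cid]); // drop the stale entry so it is rebuilt
    return null;
  }
  return $bin[$cid]['data'];
}

// Usage: a block cached for 60 seconds.
$bin = array();
$now = 1000;
example_cache_set($bin, 'block:latest', '<ul>...</ul>', 60, $now);
$fresh = example_cache_get($bin, 'block:latest', $now + 30);
$stale = example_cache_get($bin, 'block:latest', $now + 61);
```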

    Cache Ripper, Spider, Engine




    Question
