Archives
Visitors
  • 56579This month:
  • 648Today:
  • 21Currently online:



LeaseWeb CDN

POC: Flexible PHP Output Caching

I started my pet-project almost a year ago, developed it in my free time and it is time to write an article about it. I also released this project under Apache 2.0 license.(http://github.com/tothimre/POC)

Last year at the Symfony conference in Paris I have heard a really good quote:

“There are only two hard things in Computer Science: cache invalidation and naming things” — Phil Karlton. I agree with it and it gave me a boost to keep evolving the concept.

Addressed Audience

This article is for software developers and people who can tweak code. It is also for people who had or will have performance issues. This software will help you to solve these issues with minimal effort.

I don’t want you to get bored, so I will lead you to the examples, if you want to know more technical details you can find it after the following topic.

Examples

Unfortunately I cannot cover all the features framework provides in detail here but I can give you a brief introduction to their usage through a few examples. Let’s take the first functional tests in the framework as the first example. Please be aware though that this is a quickly evolving project and the examples seen/given here might not work with these parameters in the near future as the parameters are also subject to change. So please always refer to the documentation and examples shipped with the framework.

The functional tests do not cover all the implemented features but can be used to prove that the main functionality (caching) is working on the webserver. The tests can be found in the root of the project at the “/functionaltests” folder. The following example code does not contain inline comments because I am going to explain the code in detail.

Let’s have a look at cache_filecache.php


use POC\Poc;
use POC\cache\cacheimplementation\FileCache;

include("../framework/autoload.php");

$poc  = new Poc(array(Poc::PARAM_CACHE => new FileCache(), Poc::PARAM_DEBUG => true));

$poc->start();

include('lib/text_generator.php');

The first thing you notice looking at the code snippet above is the way parameters are given to the constructor of the Poc class. Because many parameters can be changed in the classes shipped with the project, I decided to build a more flexible way of handling parameters than the one used in PHP. The user has to pass an array to these objects where the index of the array is the name of the parameter and the value of course is the value of the parameter. If a parameter is not defined in the array the framework will use the builtin default value for that parameter. As the default cache engine is FileCache we could have omitted the PARAM_CACHE parameter in the previous example and define the $poc variable like this:

$poc  = new Poc(array(Poc::PARAM_DEBUG => true));

This is a really easy scenario where your application is mocked by the “/lib/text_generator.php” file. This is at the moment a Lorem Ipsum generator. We cache its contents – simply by creating the Poc object – as we can see in the example. We store the generated caches for 5 seconds – it is the default value – and we also want to see some performance information at the end of the cached text so we turned on debugging. We achieve this by adding the last parameter with a “true” value. The Hasher class is an important part of the concept. Let me describe it in the following example.

In this example we used the FileCache engine for caching, but by changing only a few characters we can use  “MemcachedCache”, “RediskaCache”, “MongoCache”, etc. So, it is really easy to implement new caching engines to the project.

More complex Example

Let’s get a closer look at an everyday example. That’s why I took the Symfony Jobeet tutorial for describing the usage more closely. I have copied my framework to the lib/vendors folder and also created the poc.php file in the config folder. This file needs to be included at the very beginning of the application. For now we put it at the first line of the file:  config/ProjectConfiguration.class.php

Let me reveal to you the contents of the file poc.php.

require(dirname(__FILE__).'/../lib/vendor/poc/framework/autoload.php');
use POC\cache\filtering\Hasher;
use POC\cache\filtering\Filter;
use POC\Poc;
use POC\cache\cacheimplementation\FileCache;

$hasher = new Hasher();
$hasher->addDistinguishVariable($_GET);
$hasher->addDistinguishVariable($_SERVER["REQUEST_URI"]);

$conditions = new Filter();
$conditions->addBlackListCondition($_POST);
$conditions->addBlackListCondition(strpos($_SERVER['REQUEST_URI'],'backend'));
$conditions->addBlackListCondition(strpos($_SERVER['REQUEST_URI'],'job'));
$conditions->addBlackListCondition(strpos($_SERVER['REQUEST_URI'],'affiliate'));

$cache = new FileCache(array(FileCache::PARAM_TTL=>300,
FileCache::PARAM_FILTER=>$conditions,
FileCache::PARAM_HASHER=>$hasher));

$poc  = new Poc(array(Poc::PARAM_CACHE=>$cache, Poc::PARAM_DEBUG=>true));
$poc->start();

As you can see, the basics of this case are really similar to the previous example. But here we utilize some other features as well.

By calling the addDistinguishVariable function we will make our cache unique. Those variables get stored into an array that you add to it as a parameter, then at one point the array will be serialized and a hashkey will be generated from it. This hash will identify your cache. You have to find the variables that can identify the state of your application and add those to this function, it is that simple! As I examined the Jobeet application the $_GET, and the Request URI values define the state of the application on the level we want to use the cache. This means that authenticated pages are not cached in this case.

now we have reached the next piece of code that calls the addBlacklistCondition function. This stores logical values regarding the current state of your application. If any of those are true the page will not be involved in the caching process. Here we have defined if there is a POST action or if the URL contains any of the “backend”,” job”, “affiliate” words the caching is not applied.

Easy, right? There is no need for more explanation; it just works out of box. Maybe I have not covered all possible blacklist states in this example, but you know the basic software.

Performance

The Engine is really small; it does not contain more than 2400 lines of code (excluding unittests and vendors), and it does not do any “black magic”. The total overhead should be less than one millisecond. If there is a cache hit, it is likely to get the output to your machine within a few milliseconds. If you measure the process before the moment the output is pushed to the client, you can see that the cache engine in most cases gets you the page under 1 millisecond. (Note that the actual performance might depend on the cache engine you employ and the environment you run the software on.)The performance is really promising, way much faster as I thought it will be when I started the project, but for now what I can say is that in a next article you will see proper benchmarks as well, so stay tuned!

Constraints and concepts

I wanted to use the latest programming methods so I decided to support only PHP 5.3 and above. Some of the several concepts and methodologies the project uses are the following:

  • namespaces
  • Continuous integration (Jenkins)
  • Dependency injection

Of course it had to be fast so I didn’t want to rely on external frameworks but used the built in PHP functionality where it was possible.

Features

With this framework you can customize the caching settings to a great extent. Let me list some of these options:

  • Output caching based on user defined criteria
  • Cache invalidation by TTL
  • Blacklisting / cache invalidation by application state
  • Blacklisting by output content
  • For caching it utilizes many interfaces, such as:
    • Memcached
    • Redis
    • MongoDb
    • Its own filesystem based engine.
    • APC (experimental, performs and works well on a webserver, but unfortunately the CLI interface does not behave like it should and it cannot be unit tested properly so I don’t include it in the master branch)
  • For cache tagging it utilizes MySQL but more database engines will be added
  • Cache Invalidation by tags
  • Minimal overhead on the performance
  • Easy to turn on/off
  • Controls the headers

Planned features

As the framework is still in an early state, many new features will be implemented in the future. These include the following:

  • Edge side includes
  • Cache templating with Twig
  • statistics stored in database
  • And many more

If you have any questions and or suggestions, feel free to leave a comment!

8 Responses to “POC: Flexible PHP Output Caching”

  • Jon Williams:

    Is there a reason that you used Class constants as the keys for the options array? I find that using simple strings is nicer. Otherwise, looks good so far!

  • Maurits van der Schee:

    Looks good.. Why/how is it better than Varnish? Just curious ;-)

  • Hello,

    Thanks for your comment.

    You can use the strings as well. But as the author of the framework I cannot use magic strings.
    It is a littlebit more typing if you don’t use any IDE, but it helps to keeps your code cleaner,and I suppose more understandable as well. Because the framework builds on dependency injection the quantity of this parameters will increase in the future. All paramerter indexer constants starts wth PARAM prefix, so it will help you to find the possibilities for one class easier in the future, and is more convinient than to remember all strings. But anyone can use strings freely. Just it is not officially promoted :-)

  • Vic D'Elfant:

    Looks like a very promising pet-project, will be giving it a try soon! Getting a page in under 1ms is a great result, but then again I assume that Leaseweb has sufficient “decent hardware” to run this on.

    Jon: You can easily make a typo in a string… it’s harder to do that with a class constant if you use an IDE with autocomplete and/or error checking as you type :)

  • Maurits: Varnish is another layer, as the frameworks handles headers it can help your reverse proxy on caching, also for varnish you can write some modules easily and if a cache is by tags is invalidated POC can call you varnish to invalidate the caches it stores. AS POC builds on Dependency Injection this can be done easily. In other words, you can think for POC as an application built in reverse proxy that can provide really relevant information (because it can see the inner state of the application as well) for the external reverse proxies.

    Vic: Early adopters receive premium support for free :) so go for it!
    Yes Leaseweb of course has got really nice hardware just see them on http://www.leaseweb.com . But even if you have a “decent hardware” caching helps you to keep your customers even more satisfied, because speed matters. Also Recently I have implemented a feature called CIA Protection (is already on the master branch), but an article will come on that topic soon as well. Even if you don’t want to use caching at all that feature can be useful for anyone.

  • […] der Leaseweb Labs-Blog gibt es einem kürzlich erschienenen Beitrag Blick auf Verwendung des POC Rahmen , um mit flexiblen Ausgabezwischenspeicherung zu arbeiten. Das […]

  • Rogier Mulders:

    Hey Imre! Nice to hear from you and your project! Interesting as I just had a project with varnish and exactly the invalidating of the cache is still an issue

  • Rogier: There of course multiple strategies to invalidate the cache, I have some experience on the topic, if you need some suggestion, just contact me in private, or here we can talk over it.

Leave a Reply