Posts Tagged ‘php’

Buffered Nginx log reading using PHP and logrotate

The problem: We want to continuously read the Nginx access logs on our high-traffic web nodes for real-time analytic purposes.

Logrotate every 10 seconds

Not possible! The minimum interval that logrotate allows is hourly, and apart from that I do not think rotating every 10 seconds would be very practical. I would still prefer to keep the log files hourly, for easy disk management. But an hour of delay on my statistics is too much; I would prefer something like one minute before the data reaches the MySQL database.

Using a named pipe with mkfifo

The idea is that you can make a pipe with “mkfifo” and configure Nginx to log to that file. Sounds good, until you realize that the buffer size of the pipe is quite limited (a few kB) and that it will silently drop log lines on buffer overflow. Also, Igor Sysoev advised against it, and since he knows what he is talking about, I suggest you stay away from it. :-)
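
For completeness, the (discouraged) setup would look roughly like this; the pipe path is just an example:

mkfifo /var/log/nginx/access.pipe
# and in the nginx configuration:
# access_log /var/log/nginx/access.pipe;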

Watch the log file with inotify

The idea is to install a watcher on the log file that triggers an event whenever the file is written to. This is a good idea, but the number of events on a busy server may be just a little too high for this to be really efficient. We would rather not be notified every time the filesystem sees a change on the Nginx access log file. EDIT: This is actually not a bad idea as long as you enable buffered Nginx access logging, see the next post.
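
For reference, a minimal sketch of this watcher approach, assuming the PECL “inotify” extension is installed:

<?php
// requires the PECL "inotify" extension
$fd = inotify_init();
inotify_add_watch($fd, "/var/log/nginx/access.log", IN_MODIFY);
while (true) {
  $events = inotify_read($fd); // blocks until the file changes
  foreach ($events as $event) {
    // on a busy server this fires for (almost) every request
    echo "access log modified\n";
  }
}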

Open the file and read until EOF every x seconds

Open the file and keep it open. Sleep for x seconds, read the file until the end, send that data to your statistics cluster, and return to the sleep. This works great until the file gets rotated. Once the file gets rotated, you first need to finish reading the old file before you start reading the new one. Fortunately, logrotate moves the file by default, and an open moved file can still be read using the original file descriptor. When we find that the file was moved y seconds ago and has not been written to for z seconds, we can decide to close the rotated file and open the new current file. This way we can ensure that we do not lose any log lines. Obviously x, y, and z need values found by trial and error; I think five, three, and one seconds should be a good starting point.

Configure logrotate to rotate hourly for 48 hours

This can be done using the following logrotate script:

/var/log/nginx/*.log {
        hourly
        missingok
        rotate 48
        compress
        delaycompress
        notifempty
        create 0640 www-data adm
        sharedscripts
        prerotate
                if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
                        run-parts /etc/logrotate.d/httpd-prerotate; \
                fi \
        endscript
        postrotate
                [ -s /run/nginx.pid ] && kill -USR1 `cat /run/nginx.pid`
        endscript
}

Make sure you also move the logrotate cron job from daily to hourly using:

sudo mv /etc/cron.daily/logrotate /etc/cron.hourly/logrotate

Read the log files

The following PHP code demonstrates how this log reader works. Note that Python and Golang may be more suitable languages for implementing these kinds of long-running programs. This script just prints the data it reads; streaming the log lines to your big data analytics cluster is left as an exercise for the reader. ;-)

<?php
$logfile = "/var/log/nginx/access.log";
$var_x = 5; // seconds to sleep between reads
$var_y = 3; // minimum seconds since the file was moved (rename updates ctime)
$var_z = 1; // minimum seconds since the last write (mtime)
$f = fopen($logfile, "r");
fseek($f, 0, SEEK_END); // start at the end: only read new log lines
while (true) {
  $data = stream_get_contents($f); // read until EOF
  echo $data; // or send the data somewhere
  clearstatcache(); // stat() results are cached, so clear the cache first
  $s1 = fstat($f);      // stat of the (possibly moved) open file descriptor
  $s2 = stat($logfile); // stat of the file currently at the log path
  $renamed = $s1["dev"] != $s2["dev"] || $s1["ino"] != $s2["ino"];
  if ($renamed && time() - $s1["ctime"] > $var_y && time() - $s1["mtime"] > $var_z) {
    echo "renamed\n";
    while (!file_exists($logfile)) sleep(1); // wait for logrotate to create the new file
    fclose($f); // close the fully read rotated file
    $f = fopen($logfile, "r"); // and switch to the new current file
  } else {
    sleep($var_x);
  }
}
fclose($f); // never reached

While running the script above, you can make sure the access log is actually being written to by running siege in another terminal:

siege -c 100 http://localhost/

Now you can force the rotation to be executed using:

sudo logrotate -f /etc/logrotate.d/nginx

And see how the script gracefully handles it.

Command line access from your browser using shell.php

[Screenshot: shell.php giving command line access from the browser]

Sometimes you want shell access from the browser. It can be achieved using PHP if the security settings allow it. I implemented this functionality in shell.php (available on Github). In the above screenshot you see how shell access from a browser works. The script allows you to upload, download, view, edit and remove a file, zip and unzip a directory, and traverse the directories on the server using the mouse, but you can also type custom commands using the keyboard.
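
To illustrate the core idea only (a stripped-down, made-up sketch, not the actual shell.php):

<?php
// minimal sketch of a web shell; never deploy this on a machine reachable from the Internet
if (isset($_POST['cmd'])) {
  header('Content-Type: text/plain');
  passthru($_POST['cmd']); // stream the command's output directly to the browser
}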

Security warning and disclaimer

Please run this script only on machines you own (or during an authorized pentest). Also make sure the machine is properly firewalled (port 80 should not be reachable from the Internet). Do not use it for malicious purposes! Read more on abuse of shell scripts here.

Known issues

If the script does not work, it may be because the PHP “passthru” function on which it relies is disabled. To list the disabled PHP functions, execute the following PHP code:

var_dump(ini_get('safe_mode'));
var_dump(explode(',',ini_get('disable_functions')));
var_dump(explode(',',ini_get('suhosin.executor.func.blacklist')));

On an out-of-the-box Ubuntu 14.04 that will output:

bool(false)
Array
(
    [0] => pcntl_alarm
    [1] => pcntl_fork
    [2] => pcntl_waitpid
    [3] => pcntl_wait
    [4] => pcntl_wifexited
    [5] => pcntl_wifstopped
    [6] => pcntl_wifsignaled
    [7] => pcntl_wexitstatus
    [8] => pcntl_wtermsig
    [9] => pcntl_wstopsig
    [10] => pcntl_signal
    [11] => pcntl_signal_dispatch
    [12] => pcntl_get_last_error
    [13] => pcntl_strerror
    [14] => pcntl_sigprocmask
    [15] => pcntl_sigwaitinfo
    [16] => pcntl_sigtimedwait
    [17] => pcntl_exec
    [18] => pcntl_getpriority
    [19] => pcntl_setpriority
    [20] =>
)
Array
(
    [0] =>
)

PHP shell execution commands

If the script does not run using passthru(), it will try a few other functions. The following functions are similar (see the sketch after this list):

  • exec() returns the last line of the command's output
  • passthru() passes the command's output directly to the browser
  • system() passes the command's output directly to the browser and returns the last line
  • shell_exec() returns the command's output
  • popen() opens a read or write pipe to the process of a command
  • proc_open() is similar to popen(), but offers a greater degree of control
  • pcntl_exec() executes a program in the current process space
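
A small (made-up) comparison sketch of these functions:

<?php
$cmd = "ls /tmp"; // example command

$last = exec($cmd, $lines);  // $lines collects all output lines, $last is the final one
$all  = shell_exec($cmd);    // $all contains the complete output as one string
$last = system($cmd);        // prints the output and returns the last line
passthru($cmd);              // prints the raw output directly

$p = popen($cmd, "r");       // read the output through a pipe
echo fread($p, 8192);
pclose($p);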

Hardening your server with open_basedir

If the above script seems scary to you, then you may want to prevent it from executing on your server. You can do this by enabling safe mode (deprecated), using the “disable_functions” php.ini variable and/or the Suhosin function execution blacklist.

I have found a well-written post on securing your PHP installation, check it out! Apart from limiting the executable functions, they also recommend the “open_basedir” php.ini config variable. It limits the files that can be accessed by PHP to the specified directory tree. I believe this is a powerful tool.
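
As a sketch, such hardening could look like this in php.ini (the path and the function list below are examples; adjust them to your application):

; limit PHP file access to the specified directory trees
open_basedir = "/var/www/:/tmp/"
; disable the shell execution functions discussed in this post
disable_functions = "exec,passthru,system,shell_exec,popen,proc_open,pcntl_exec"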

Also, it could be a good idea to secure your “/tmp” directory with the “nodev”, “nosuid” and “noexec” flags, as described here.
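
Such a mount could look like this in “/etc/fstab” (a sketch; options may vary per system):

tmpfs /tmp tmpfs defaults,nodev,nosuid,noexec 0 0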

Cannot find Suhosin?

Note that the “php5-suhosin” package (a PHP security extension) is no longer installed or available on Debian-based systems. Some of its security improvements have been incorporated into the latest PHP versions (5.4 and 5.5). If you want to install Suhosin (from Github) on Ubuntu 14.04 (PHP 5.5.9), you can follow this tutorial.

You can read more about the controversy around removing Suhosin on LWN.net.

Privilege separation

If there are multiple users on the system, “privilege separation” is a MUST. This means running the PHP code in the user's context (e.g. as user “maurits”) and not as user “www-data”. I have found a great article explaining how this can be achieved. The easiest solution is to run:

sudo apt-get install libapache2-mpm-itk

And then adding the “AssignUserID” directive to every “VirtualHost” configuration, as shown in the sketch below. Note that this may not be the safest solution, but it performs well and is easy to set up.
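
A minimal “VirtualHost” sketch (the user and group “maurits” are just examples):

<VirtualHost *:80>
    ServerName example.dev
    DocumentRoot /home/maurits/public_html
    # run all PHP code in this virtual host as this user and group instead of www-data
    AssignUserID maurits maurits
</VirtualHost>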

Conclusion

You should always update and patch PHP to the latest version to prevent exploitation of known security holes. Tools like “disable_functions”, “open_basedir”, Suhosin and filesystem flags reduce the attack surface and prevent exploitation of unknown security holes. You can use them to implement a layered security strategy. Also, do not forget about privilege separation.

Install Adminer manually on Ubuntu 14.04

As I wrote over two years ago, Adminer is a very good alternative to PHPMyAdmin. I often find myself looking up that old post, because I frequently install, recommend or update Adminer. After using this software for several years, I am now convinced that it has become much better than PHPMyAdmin, especially since the new user interface of PHPMyAdmin has become worse. Adminer has progressed a lot and is at version 4.1.0 today. I simply love version 4 and I use it almost daily. My top 3 reasons to choose it are:
  1. Very clear and consistent user interface
  2. It automatically adds foreign keys
  3. You can easily reorder columns in a table

I think that once you give it a (serious) try, you will never want to use PHPMyAdmin (or any other database management tool) again… ever.

Install Adminer from the repository

It is also great that Adminer is now part of the standard Ubuntu repositories. This means that you can install it with “sudo apt-get install adminer”. However, I do not recommend this: the version of Adminer in the repository is 3.3.3-1, and it is a very active project with great improvements in every version. Upgrading does not hurt either, since it handles its dependencies very flexibly. In my experience you can run the latest version on any recent Linux without compatibility issues.

Install Adminer manually

Here are the commands you need for installation on a Debian-based system that runs Apache 2.4, like Ubuntu 14.04:
sudo mkdir /usr/share/adminer
sudo wget "http://www.adminer.org/latest.php" -O /usr/share/adminer/latest.php
sudo ln -s /usr/share/adminer/latest.php /usr/share/adminer/adminer.php
echo "Alias /adminer.php /usr/share/adminer/adminer.php" | sudo tee /etc/apache2/conf-available/adminer.conf
sudo a2enconf adminer.conf
sudo service apache2 restart

Updating and uninstalling

This is the one-liner for updating Adminer:

sudo wget "http://www.adminer.org/latest.php" -O /usr/share/adminer/latest.php

And these are the commands needed for uninstallation:

sudo a2disconf adminer.conf
sudo service apache2 restart
sudo rm /etc/apache2/conf-available/adminer.conf
sudo rm -Rf /usr/share/adminer

If you know of any tool that is as good as Adminer, then let us know in the comments.

How to use the “yield” keyword in PHP 5.5 and up

The “yield” keyword is new in PHP 5.5. This keyword allows you to program “generators”. Wikipedia explains generators accurately:

A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.
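
To show the mechanics, here is a trivial (made-up) generator that counts down:

<?php
function countdown($from) {
  while ($from > 0) {
    yield $from--; // execution pauses here until the caller asks for the next value
  }
}

foreach (countdown(3) as $number) {
  echo $number . "\n"; // prints 3, 2 and 1
}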

The concept of generators is not new; the “yield” keyword exists in other programming languages as well. As far as I know, C#, Ruby, Python, and JavaScript have this keyword. The first use that comes to mind for me is reading a big text file line by line (for instance, a log file). Instead of reading the whole text file into RAM, you can use an iterator and still have a simple program flow containing a “foreach” loop that iterates over all the lines. I wrote a small script in PHP that shows how to do this (efficiently) using the “yield” keyword:

<?php
class File {

  private $file;
  private $buffer;

  function __construct($filename, $mode) {
    $this->file = fopen($filename, $mode);
    $this->buffer = false;
  }

  public function chunks() {
    while (true) {
      $chunk = fread($this->file, 8192); // read the file in chunks of 8 kB
      if (strlen($chunk)) yield $chunk;
      elseif (feof($this->file)) break;
    }
  }

  public function lines() {
    foreach ($this->chunks() as $chunk) {
      // prepend the incomplete line left over from the previous chunk
      $lines = explode("\n", $this->buffer.$chunk);
      // the last element may be an incomplete line, so buffer it for later
      $this->buffer = array_pop($lines);
      foreach ($lines as $line) yield $line;
    }
    if ($this->buffer !== false) {
      yield $this->buffer; // yield the last line of the file
    }
  }

  // ... more methods ...
}

$f = new File("data.txt", "r");
foreach ($f->lines() as $line) {
  echo memory_get_usage(true)."|$line\n";
}

One of my colleagues asked me why I used “fread” and did not simply call PHP's “fgets” function (which reads a line from a file). I assumed that he was right and that it would be faster. To my surprise, the above implementation is (on my machine) actually faster than the “fgets” variant shown below:

<?php
class File {

  private $file;

  function __construct($filename, $mode) {
    $this->file = fopen($filename, $mode);
  }

  public function lines() {
    while (($line = fgets($this->file)) !== false) {
      yield $line; // fgets() keeps the trailing newline
    }
  }

  // ... more methods ...
}

$f = new File("data.txt","r");
foreach ($f->lines() as $line) {
  echo memory_get_usage(true)."|$line";
}

I played around with the two implementations above and found that the execution speed and memory usage of the first implementation depend on the number of bytes read by “fread”. So I made a benchmark script:

<?php
class File {

  private $file;
  private $buffer;
  private $size;

  function __construct($filename, $mode, $size = 8192) {
    $this->file = fopen($filename, $mode);
    $this->buffer = false;
    $this->size = $size;
  }

  public function chunks() {
    while (true) {
      $chunk = fread($this->file,$this->size);
      if (strlen($chunk)) yield $chunk;
      elseif (feof($this->file)) break;
    }
  }

  function lines() {
    foreach ($this->chunks() as $chunk) {
      $lines = explode("\n",$this->buffer.$chunk);
      $this->buffer = array_pop($lines);
      foreach ($lines as $line) yield $line;
    }
    if ($this->buffer!==false) { 
      yield $this->buffer;
    }
  }
}

echo "size;memory;time\n";
for ($i=6;$i<20;$i++) {
  $size = ceil(pow(2,$i));
  // "data.txt" is a text file of 897MB holding 40 million lines
  $f = new File("data.txt","r", $size);
  $time = microtime(true);
  foreach ($f->lines() as $line) {
    $line .= ''; // consume the line without doing real work
  }
  echo $size.";".(memory_get_usage(true)/1000000).";".(microtime(true)-$time)."\n";
}

You can generate the “data.txt” file yourself. The first step is to take the above benchmark script and save it as “yield.php”. After that, save the following bash code in a file and run it:

#!/bin/bash
# create data_s.txt: 1000 concatenated copies of yield.php
cp /dev/null data_s.txt
for i in {1..1000}
do
 cat yield.php >> data_s.txt
done
# create data.txt: 1000 concatenated copies of data_s.txt
cp /dev/null data.txt
for i in {1..1000}
do
 cat data_s.txt >> data.txt
done
rm data_s.txt

I executed the benchmark script on my workstation and loaded its output into a spreadsheet so I could plot the graph below.

[Graph: fread size in bytes versus memory usage (MB) and iteration time (seconds)]

As you can see, the best score is for the 16384-byte (16 kB) fread size. With that fread size, the 40 million lines from the 897 MB text file were iterated in 11.88 seconds using less than 1 MB of RAM. I do not understand why the performance graph looks like it does. I can reason that reading small chunks of data is not efficient, since it requires many I/O operations that each have overhead. But why is reading large chunks also inefficient? It is a mystery to me, but maybe you know why? If you do, then please use the comments and enlighten me (and the other readers).

Symfony on HHVM 3 and Nginx 1.4 vs PHP 5.5 and Apache 2.4

Installing Symfony

From symfony.com/download we get the latest (2.4.4) version of Symfony. I have unpacked it and put it in the directory “/home/maurits/public_html”. In “app/AppKernel.php” I moved the “AcmeDemoBundle” to the production section, and in “routing.yml” I added the “_acme_demo” route that was originally in “routing_dev.yml”.
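
For reference, these two changes look roughly like this in the Symfony Standard Edition (a sketch; your bundle list may differ). In “app/AppKernel.php” the demo bundle moves out of the dev/test-only block into the main bundle array:

$bundles = array(
    // ... framework and vendor bundles ...
    new Acme\DemoBundle\AcmeDemoBundle(), // moved here from the dev/test section
);

And in “app/config/routing.yml” the route from “routing_dev.yml” is added:

_acme_demo:
    resource: "@AcmeDemoBundle/Resources/config/routing.yml"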

Test environment

I tested on my i5-2300 CPU with 16 GB of RAM and an SSD. To run the benchmark, I installed both Apache 2.4 with PHP 5.5 and Nginx 1.4 with HHVM 3 on my Ubuntu 14.04 machine. I used Apache Bench (ab) to test the “/demo” path on both web servers. For the Apache/PHP setup, I disabled XDebug and enabled Zend OPcache.

Install and configure HHVM 3 with Nginx 1.4 for Symfony

For HHVM, we find pre-built (64-bit) packages listed on Github. This is how you install them on Ubuntu 14.04:

wget -O - http://dl.hhvm.com/conf/hhvm.gpg.key | sudo apt-key add -
echo deb http://dl.hhvm.com/ubuntu trusty main | sudo tee /etc/apt/sources.list.d/hhvm.list
sudo apt-get update
sudo apt-get install hhvm
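
HHVM ships with a built-in FastCGI server that by default listens on port 9000, which is what the Nginx configuration below connects to. If you need to change this, the relevant settings live in “/etc/hhvm/server.ini” (a sketch; the defaults should be fine):

hhvm.server.type = fastcgi
hhvm.server.port = 9000

After editing it, apply the change with “sudo service hhvm restart”.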

Next, we install Nginx from the Ubuntu 14.04 repository using:

sudo apt-get install nginx

Now we configure Nginx using the “normal” FastCGI configuration:

server {
    listen             8080;
    server_name        sf2testproject.dev;

    root /home/maurits/public_html/web;

    location / {
        # try to serve file directly, fallback to rewrite
        try_files $uri @rewriteapp;
    }

    location @rewriteapp {
        # rewrite all to app.php
        rewrite ^(.*)$ /app.php/$1 last;
    }

    location ~ ^/(app|app_dev|config)\.php(/|$) {
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_split_path_info ^(.+\.php)(/.*)$;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param HTTPS off;
    }
}

Single request

When doing a single request and analyzing the response times using the Firebug “Net” panel, there is no noticeable difference. This is probably because the threads do not have to compete for CPU. So let's skip this and do some load testing.

Apache Bench results (Symfony 2.4 / Apache 2.4 / PHP 5.5)

maurits@nuc:~$ ab -c 10 -n 2000 http://sf2testproject.dev/demo/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking sf2testproject.dev (be patient)
Completed 200 requests
Completed 400 requests
Completed 600 requests
Completed 800 requests
Completed 1000 requests
Completed 1200 requests
Completed 1400 requests
Completed 1600 requests
Completed 1800 requests
Completed 2000 requests
Finished 2000 requests

Server Software:        Apache/2.4.7
Server Hostname:        sf2testproject.dev
Server Port:            80

Document Path:          /demo/
Document Length:        4658 bytes

Concurrency Level:      10
Time taken for tests:   9.784 seconds
Complete requests:      2000
Failed requests:        0
Total transferred:      9982000 bytes
HTML transferred:       9316000 bytes
Requests per second:    204.42 [#/sec] (mean)
Time per request:       48.918 [ms] (mean)
Time per request:       4.892 [ms] (mean, across all concurrent requests)
Transfer rate:          996.36 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       3
Processing:    18   49  10.9     50      90
Waiting:       15   41  10.1     41      78
Total:         18   49  10.9     50      91

Percentage of the requests served within a certain time (ms)
  50%     50
  66%     54
  75%     56
  80%     58
  90%     62
  95%     65
  98%     69
  99%     73
 100%     91 (longest request)

Apache Bench results (Symfony 2.4 / Nginx 1.4 / HHVM 3)

maurits@nuc:~$ ab -c 10 -n 2000 http://sf2testproject.dev:8080/demo/
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking sf2testproject.dev (be patient)
Completed 200 requests
Completed 400 requests
Completed 600 requests
Completed 800 requests
Completed 1000 requests
Completed 1200 requests
Completed 1400 requests
Completed 1600 requests
Completed 1800 requests
Completed 2000 requests
Finished 2000 requests

Server Software:        nginx/1.4.6
Server Hostname:        sf2testproject.dev
Server Port:            8080

Document Path:          /demo/
Document Length:        4658 bytes

Concurrency Level:      10
Time taken for tests:   4.678 seconds
Complete requests:      2000
Failed requests:        0
Total transferred:      9900000 bytes
HTML transferred:       9316000 bytes
Requests per second:    427.50 [#/sec] (mean)
Time per request:       23.392 [ms] (mean)
Time per request:       2.339 [ms] (mean, across all concurrent requests)
Transfer rate:          2066.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     8   23  11.5     21      84
Waiting:        8   22  11.3     20      84
Total:          8   23  11.5     21      84

Percentage of the requests served within a certain time (ms)
  50%     21
  66%     26
  75%     30
  80%     32
  90%     39
  95%     46
  98%     54
  99%     58
 100%     84 (longest request)

Conclusion

On both setups, I did not do any optimization; I just installed everything and ran the benchmark. It seems that Symfony 2.4 on HHVM is about twice as fast as on PHP 5.5. In a real-life setup this means you need half the machines. This seems like a good cost reduction at first, but it may not be as good as it looks. I believe that in reality most Symfony applications do database and/or API calls. These will not be faster when using HHVM, since HHVM only speeds up PHP execution. This is why I think the difference will be smaller than a factor of 2 for a real-life Symfony application. What do you think?