Why speed is more important than scalability

Software developers creating web applications like to talk about scalability as if it were totally unrelated to computing efficiency. They also like to argue that abstractions are important. They will tell you that their DBAL, Router, View Engine, Dependency Injection and even ORM do not have that much overhead (only a little). They will argue that the learning curve pays off and that the performance loss is not that bad. In this post I’ll argue that page loading speed is more important than scalability (or pretty abstractions).

Orders of magnitude of speed on the web

Just to get an idea of speed, I tried to find a typical web action for every order of magnitude:

  • 0.1 ms – A simple database lookup from memory
  • 1 ms – Serving small static content from RAM
  • 10 ms – A very simple API that only does a DB lookup
  • 100 ms – A complex page load that calls multiple APIs
  • 1000 ms – Nothing should take this long… :-)

But I’m sure that your website has pages that take a full second or more (even this site does). How can that be?

CPUs and web farms

Most servers have 1 to 4 CPUs. Each CPU has 2 to 32 cores. If you serve a single web request, then you are using (at most) a single core of a single CPU. Maybe that is why people say that page loading speed is irrelevant. If you have more visitors, then they will use other cores or even other CPUs. This is true, but what if you have more concurrent requests than cores? You can simply add machines and configure a web farm, as most people do.

At some point you may have 16 servers running your popular web application with page loads that average 300 ms. If you could bring the page load time down to 20 ms, that is 15x less CPU time per request, so you could run this on a single box! Imagine how much simpler that is! The “one big box” strategy is also called the “big iron” strategy. Often programmers are not very careful with resources. This is because software developers tend to aim for beautiful abstractions and not for fast software.

Programming languages matter

Only when hardware enthusiasts and software developers work together do you get efficient software. Nice frameworks may have to be removed. Toys like Dependency Injection that mainly bring theoretical value may have to be sacrificed. Languages also need to be chosen for execution speed. Languages that typically score well are: C, C# (Mono), Go and even Java (OpenJDK). Languages that typically score very badly are: PHP, Python, Ruby and Perl (source: benchmarksgame). This is understandable, as these are all interpreted languages. But it should be considered when building a web application.

Stop using web application frameworks

But even if you use an interpreted language (which may be 10x slower), you can still have good performance. Unless, of course, you build applications consisting of tens of thousands of lines of code that all need to be loaded. Normally people would not do that, as it would take too much time to write. (warning: sarcasm) In order to be able to fail, against all odds, people have created frameworks. These are tens of thousands of lines of code that you can install on your server. With one in place, your small and lean application will still be slow (/sarcasm). I think it is fair to say that frameworks typically make your web application 10-100x slower.

And that is exactly the approach that the industry regards as “best practice”. Unbelievable, right?

5 reasons your application is slow

If you are on shared hardware (a VM or shared web hosting), then you need to fix that first. I’m sure switching to dedicated hardware will give you better (and more consistent) performance. Still, each of the following problems may cost you an order of magnitude of speed:

  1. Not enough RAM
  2. No SSD disks (or wrong controllers)
  3. Using an interpreted programming language
  4. A bloated web framework
  5. Not using Memcache where possible
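
To make reason 5 concrete, below is a minimal cache-aside sketch using PHP’s Memcached extension. It is not taken from any real application: the server address, key name, TTL and the fake database function are assumptions for illustration.

<?php
// Cache-aside sketch using PHP's Memcached extension (reason 5 above).
// Assumptions: a memcached daemon on 127.0.0.1:11211, a made-up key name and TTL.

function loadProductsFromDatabase()
{
    // Stand-in for your real (slow) database query.
    usleep(100000); // simulate a 100 ms query
    return array('product A', 'product B');
}

$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key = 'products:v1';                 // hypothetical cache key
$products = $cache->get($key);
if ($products === false) {            // cache miss: fall back to the database
    $products = loadProductsFromDatabase();
    $cache->set($key, $products, 60); // keep it for 60 seconds
}
print_r($products);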

How many of these apply to you? Let me know in the comments.

Finally

This post will probably be considered offensive by programmers that like VMs, frameworks and interpreted languages. Please, don’t use that negative energy in the comments. Use it to “Go” and try Gorilla, I dare you! It is not a framework and it is fast, very fast! Not only will it be interesting and a lot of fun, it may also change your mind about this article.

NB: Go is almost as fast as the highly optimized C code of Nginx (about 10-100x faster than PHP).

Fluid web forms using AJAX

Sometimes you want a web form that changes based on the input. The web form can give the user feedback while it is being filled in: we often see web forms put green check marks behind (server-side) validated fields. Some fields may turn red after you fill them in incorrectly. This validation can either be done client side (in JavaScript) or server side (using AJAX).

I’ve created a small application (available on Github) that looks like this:

(Screenshot of the fluid web form demo)

When you fill in a field and it loses focus, the entire web form is submitted and updated. This includes field validation on all fields and marking them red when validation fails. Note that only after you have submitted the form do you want to validate that the required fields are filled in. So you need to differentiate between validation before and after submit. The form has a hidden “submitted” field to facilitate that.

This demo is using the following components:

  1. Bootstrap 3 (requires jQuery)
  2. jQuery-populate (plugin by Dave Stewart)

All validation and calculation is done server side.

Why? Well, some validation simply cannot be done client side. Promo codes in your web shop’s shopping cart are a good example (for security reasons). Another reason may be that sending all your validation data in a JSON object may in some cases overload your client, for instance when you have hundreds of products with lots of product-specific options. This may lead to slow rendering due to excessive JavaScript usage, which is especially problematic on less powerful clients, like mobile phones.

There is a downside: In order to allow for real-time server side validation you need to be able to sustain way more page loads. If the form is submitted after every changed input field, you need the web form handling to be light-weight. I managed to get the initial page load and the submit and validate to be 1 (one!) millisecond in vanilla PHP on my local machine (see screenshot). I guess that if your page loads are this cheap, you may consider using a little more server side technology. :-)
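
To illustrate how the hidden “submitted” field can drive the server-side validation, here is a minimal sketch in vanilla PHP. It is not the actual code from the repository; the field names (“email”, “submitted”), the messages and the JSON response shape are made up for illustration.

<?php
// Sketch of server-side validation driven by a hidden "submitted" flag.
// Not the repository code: field names, messages and response shape are assumptions.

$submitted = isset($_POST['submitted']) && $_POST['submitted'] === '1';
$email     = isset($_POST['email']) ? trim($_POST['email']) : '';
$errors    = array();

// Mark values that are filled in but invalid (the field turns red on blur).
if ($email !== '' && !filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $errors['email'] = 'This is not a valid e-mail address';
}
// Only complain about empty required fields after the form was actually submitted.
if ($submitted && $email === '') {
    $errors['email'] = 'This field is required';
}

header('Content-Type: application/json');
echo json_encode(array('values' => array('email' => $email), 'errors' => $errors));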

Play with it, feel it!

git clone https://github.com/mevdschee/bootstrap-fluid-form.git
cd bootstrap-fluid-form
php -S localhost:8888

Point your browser to http://localhost:8888 and test the application. Pay special attention to the speed and the usability. Does your application resemble this? If not, then you should ask yourself… why not?

Data-driven API design

We live in a world of APIs. LeaseWeb is an infrastructure provider that has taken an API-first approach. This means that we first build an API for our products. Only after the API is finished do we build the UI for the product, and we do this on top of our own API. This “eat your own dog food” principle ensures a high-quality API in which bugs are found before the customer encounters them.

Our “bare metal” (formerly: dedicated server) team has an API for switch port management. Opening a switch port may (depending on the switch brand) take several seconds. This is because (for instance) an SSH connection has to be made to the switch and an XML command and answer have to be sent back and forth. Also, some switches may wait a few seconds before responding with a status after the port is opened, as they are trying to identify the link status.

But what if a customer requests a “faster” API? We can’t make opening or closing a switch port go faster. What to do? If every API call only modified a single record in the database, then the execution time would be predictable (and always low). To achieve this there are two possible API designs: “asynchronous” and “data-driven”. In this post we will explore both and show you why a data-driven API is the better choice for consistent APIs.

Synchronous API design

This is the naive implementation. It allows you to open (enable) the switch port attached to the server with a synchronous call:

POST /bareMetals/{bareMetalId}/switchPort/open

Parameters:

bareMetalId (integer) - The id of the bare metal server

Example request:

POST /v1/bareMetals/1234/switchPort/open HTTP/1.1
Host: api.leaseweb.com
Accept: application/json
X-Lsw-Auth: 4e7d4f2d-e683-4192-a113-61dad6ac9a15

Example response:

HTTP/1.1 200 OK
Content-Type: application/json

{
	"switchPort": {
		"status": "opened",
		"switchNode": 1,
		"serverId": 1234,
		"serverName": "My Server"
	}
}

Note that this call may take several seconds to complete and does not follow the RESTful resource naming scheme.

Asynchronous API design

If the call is asynchronous then the request is the same:

POST /bareMetals/{bareMetalId}/switchPort/open

Parameters:

bareMetalId (integer) - The id of the bare metal server

Example request:

POST /v1/bareMetals/1234/switchPort/open HTTP/1.1
Host: api.leaseweb.com
Accept: application/json
X-Lsw-Auth: 4e7d4f2d-e683-4192-a113-61dad6ac9a15

Example response:

HTTP/1.1 200 OK
Content-Type: application/json

{
	"jobId": "f5825d99-62ed-433b-930b-1b9561fe4a6e",
	"createdAt": "2015-01-14T15:09:10+01:00"
}

The response contains a job identifier that then has to be polled for status, like this:

GET /bareMetals/{bareMetalId}/jobs/{jobId}

Parameters:

bareMetalId (integer) - The id of the bare metal server
jobId (string) - The id of a job

Example request:

GET /v1/bareMetals/1234/jobs/f5825d99-62ed-433b-930b-1b9561fe4a6e HTTP/1.1
Host: api.leaseweb.com
Accept: application/json
X-Lsw-Auth: 4e7d4f2d-e683-4192-a113-61dad6ac9a15

Example response:

HTTP/1.1 200 OK
Content-Type: application/json

{
	"id": "f5825d99-62ed-433b-930b-1b9561fe4a6e",
	"status": 3,
	"statusDescription": "Failed",
	"createdAt": "2015-01-14T15:09:10+01:00",
	"updatedAt": "2015-01-14T15:09:12+01:00"
}

All responses can be sent really fast, as they are simple database lookups. It is important to realize that you need some sort of job manager in the background that actually executes the jobs and updates the job status accordingly. Note that the commands do not follow the RESTful resource naming scheme.

Data-driven API design

When following a data-driven API design you simply model the database tables to facilitate the functionality. This allows for fast replies and also follows the RESTful resource naming scheme. You may get the status of the switch port attached to the server with a call like this:

GET /bareMetals/{bareMetalId}/switchPorts

Parameters:

bareMetalId (integer) - The id of the bare metal server

Example request:

GET /v1/bareMetals/1234/switchPorts HTTP/1.1
Host: api.leaseweb.com
Accept: application/json
X-Lsw-Auth: 4e7d4f2d-e683-4192-a113-61dad6ac9a15

Example response:

HTTP/1.1 200 OK
Content-Type: application/json

{
	"switchPorts": [
		{
			"status": "opened",
			"switchNode": 1,
			"serverId": 1234,
			"serverName": "My Server"
		}
	]
}

When you open the switch port you are actually inserting a record into the switchPortOpenRequests table with a standard POST, as prescribed by the RESTful resource naming scheme, like this:

POST /bareMetals/{bareMetalId}/switchPortOpenRequests

Parameters:

bareMetalId (integer) - The id of the bare metal server

Example request:

POST /v1/bareMetals/1234/switchPortOpenRequests HTTP/1.1
Host: api.leaseweb.com
Accept: application/json
X-Lsw-Auth: 4e7d4f2d-e683-4192-a113-61dad6ac9a15

Example response:

HTTP/1.1 200 OK
Content-Type: application/json

{
	"id": "f5825d99-62ed-433b-930b-1b9561fe4a6e",
	"status": 1,
	"statusDescription": "Pending",
	"createdAt": "2015-01-14T15:09:10+01:00",
	"updatedAt": "2015-01-14T15:09:10+01:00"
}

Just like with the asynchronous model you can poll for status. Note that there is no job identifier, just a normal identifier, as the request is a normal resource that follows the RESTful resource naming scheme. You do still need a background process that watches the switchPortOpenRequests table and executes these requests when they are in the “pending” state. This background process is responsible for updating both the switchPortOpenRequests and the switchPorts tables with the correct status information (a sketch of such a worker follows at the end of this section).

GET /bareMetals/{bareMetalId}/switchPortOpenRequests/{id}

Parameters:

bareMetalId (integer) - The id of the bare metal server
id (string) - The id of the request

Example request:

GET /v1/bareMetals/1234/switchPortOpenRequests/f5825d99-62ed-433b-930b-1b9561fe4a6e HTTP/1.1
Host: api.leaseweb.com
Accept: application/json
X-Lsw-Auth: 4e7d4f2d-e683-4192-a113-61dad6ac9a15

Example response:

HTTP/1.1 200 OK
Content-Type: application/json

{
	"id": "f5825d99-62ed-433b-930b-1b9561fe4a6e",
	"status": 3,
	"statusDescription": "Failed",
	"createdAt": "2015-01-14T15:09:10+01:00",
	"updatedAt": "2015-01-14T15:09:12+01:00"
}

As you can see, this last way of modeling not only reduces every API call to a database lookup, it also allows you to follow the (predictable) RESTful resource naming scheme.
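
To make the background process tangible, here is a sketch of such a worker in PHP with PDO. This is not LeaseWeb’s actual implementation: the DSN, credentials, table and column names, the numeric status codes and the openSwitchPort() helper are assumptions based on the examples above.

<?php
// Sketch of a background worker for the data-driven design (not LeaseWeb's actual code).
// DSN, credentials, table/column names and status codes are assumptions based on the examples above.

// Hypothetical helper: the slow SSH/XML conversation with the actual switch.
function openSwitchPort($serverId)
{
    sleep(3); // stand-in for the real multi-second switch interaction
}

$pdo = new PDO('mysql:host=localhost;dbname=api', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

while (true) {
    // Pick up one pending request (status 1 = "Pending", as in the example responses).
    $stmt = $pdo->query('SELECT id, bareMetalId FROM switchPortOpenRequests WHERE status = 1 LIMIT 1');
    $request = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($request === false) {
        sleep(1); // nothing to do, poll again in a second
        continue;
    }
    try {
        openSwitchPort($request['bareMetalId']);
        $newStatus = 2; // assumed: 2 = "Completed"
        $pdo->prepare("UPDATE switchPorts SET status = 'opened' WHERE serverId = ?")
            ->execute(array($request['bareMetalId']));
    } catch (Exception $e) {
        $newStatus = 3; // 3 = "Failed", as in the example responses
    }
    $pdo->prepare('UPDATE switchPortOpenRequests SET status = ?, updatedAt = NOW() WHERE id = ?')
        ->execute(array($newStatus, $request['id']));
}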

Generating a data-driven API

The advantage of following an API naming scheme is that the API becomes predictable and can be generated. The MySQL-CRUD-API project facilitates creating a full-featured API by uploading a single PHP file. This API will follow the resource naming scheme for your MySQL or SQL Server tables. With a data-driven API design this should not limit you in creating any API functionality. Try it!
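
As an illustration of what a client call against such a generated API could look like, here is a small sketch using PHP’s built-in HTTP stream wrapper. Note that the /api.php/{table} URL layout is an assumption on my part; consult the project’s README for the exact scheme.

<?php
// Hypothetical client call against a generated MySQL-CRUD-API endpoint.
// The /api.php/{table} URL layout is an assumption; see the project README for the real scheme.

$payload = json_encode(array('bareMetalId' => 1234, 'status' => 1));

$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/json\r\n",
        'content' => $payload,
    ),
));

// Insert a new row into switchPortOpenRequests through the generated API.
$response = file_get_contents('http://localhost/api.php/switchPortOpenRequests', false, $context);
var_dump($response);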

David Beazley’s concurrency demo on PyCon 2015

David Beazley did an amazing live demo on Python concurrency at PyCon 2015 (Montreal, last month). Lucky for us it is available on YouTube! I must say that once I started watching the video I was hooked! I simply could not stop watching for the entire 45 minutes.

Python concurrency methods and their performance characteristics

At the beginning of the video David types in a Fibonacci socket server and shows the audience how socket programming works. That is already quite impressive (as he talks and types at the same time). Then he evaluates several approaches to Python concurrency: threads, processes, co-routines combined with the select call, and finally co-routines combined with the select call and threads.

The performance characteristics of these approaches are shown by running a concurrency test and then a “requests per second” and a “response time in seconds” performance test. David clearly explains what role the GIL plays in the concurrency of co-routines and also where you see the (operating system) overhead caused by threads and processes.

During the questions he says that you should follow the best practices on concurrency, and he specifically names “don’t have shared state” and “use functions”. See the talk in the video below:

(Embedded YouTube video of the talk)
Thank you, PyCon 2015, for sharing this video with us!

What’s up with FizzBuzz post commenters?

Somehow FizzBuzz is still a topic today and that’s why I recently wrote a (satirical) post about it. Today I’ll describe the history of FizzBuzz and provide you with links to the original articles. For anyone who does not know what FizzBuzz is, let me explain it first. FizzBuzz is a programming test of a trivial algorithm that is supposed to be written with pen and paper (without using a computer). Many programmers fail this test during interviews. What does that mean? No idea… really, I don’t.

What’s wrong with FizzBuzz?

I believe in programming tests in the interview process, but not in the “trivial algorithm with pen and paper” type. In my opinion the following is wrong with the FizzBuzz test:

  1. No Google or other tools
  2. Being watched while you write
  3. No way to run the code (or check the syntax)
  4. No text editing (insertion/modification)
  5. Algorithm is trivial

To me, these differences from programming in the office make FizzBuzz a test that has nothing to do with the actual creation of software. Although I have no data to back it up, my feeling is that FizzBuzz success DOES correlate with programming experience. But programmers who are good at memorizing syntax are far less valuable than those who can solve non-trivial algorithms. Since FizzBuzz rewards only the former, I think it is flawed. So it is a bad test, so what?

Those comments… why?

Well, before we get to the history of FizzBuzz, I have one question about these posts that I want you to help me answer:

“Why do people post answers (code) to FizzBuzz articles in the comments?”

Aren’t you disqualifying yourself by posting the answer to FizzBuzz? I feel you are saying “the answer is interesting” or “look, I can solve it!”. What’s up with that? Some people post answers in obscure languages, which I also do not understand. What is that supposed to mean? Does it mean “this obscure language can be used to solve this trivial problem” or “I know this obscure language well enough to solve a trivial problem”? Or, more understandably I guess, are all these comments just trolling, meant to provoke a reaction? If that is the case, then what is the desired reaction?

One commenter (EdwardM) writes:

And, just because I’m a geek, and can’t help myself, here’s what I whipped up in a couple of minutes. I’m sure it could be better, but that’s when you start up an interesting conversation about refactoring in the interview, right? 😉

He knows it is wrong to post the answer, but still he does it. Is it the urge to share? The urge for recognition? Maybe that is all not true. Maybe I am too negative and it is a positive thing. Maybe it is fueled by the “joy of programming”; can that be it?

History of FizzBuzz

Okay, now, as promised, the first three posts from 2007 that started the whole FizzBuzz discussion:

  1. Imran Ghory: Using FizzBuzz to Find Developers who Grok Coding
  2. Reginald Braithwaite – Don’t Overthink FizzBuzz
  3. Jeff Atwood – Why Can’t Programmers.. Program?

Other interesting links on the topic:

Read them, but remember: beware of the comments! 😉