Thanks again ! It was definitely a life-changing event for me !
While discussing with some of Dalibo's employees, one of them made a remark that triggered my internal challenge instinct.
He said that postgresql in Linux in a VM in Windows is faster than PostgreSQL on the same Windows.
As I'm new in the postgresql/linux world, I was baffled by this information but when I requested numbers for it, he had none.
I understood that it's was just a boutade (I'm quick to understand when one explains it multiple times) and that he just meant that PostgreSQL on Linux was faster than on Windows.
Linux architecture vs Windows architecture.To understand his speed claim, one has to understand the main difference regarding architectural design between Windows and Linux.
Linux can fork while Windows can't !
But WTF is a fork ?
With a quick summary, a fork is a system call that allows a process to create child processes of itself and both new processes keep on their businesses.
They can both share their memory and communicate together.
This is a standard development technique in the Unix/Linux environment, but this is not standard at all for windows... as fork doesn't exist in windows.
It doesn't exist and you shouldn't try to do it, this is not standard architecture for windows application.
If you're building an application with this kind of architecture on windows, it should be based on thread instead, or even better, if you're building it in .NET, you should build it with async !
But if fork doesn't exist on windows but the postgresql code is the same for linux and windows, and that it forks in linux, what does it do ?
Well the brilliant PostgreSQL developer created a system for windows that emulates the fork ... with threads !
Kudos to them for that, because now we have postgresql on Windows !
But let's come back to the boutade, it is supposed that because of this custom "fork", PostgreSQL on Windows is slower than on Linux.
As with the popular "Pics or it didn't happen", I was craving for numbers to have a proof for that.
The only benchmark I could find is one from RedHat, found on the developpez.net forum.
Some may say that it's not entirely impartial due to the fact that it was made by RedHat.
The Situation.The client I'm currently working for as a FULL Windows infrastructure and they are planning to have more and more PostgreSQL because .. $ORACLE$ !
So before going further with PostgreSQL on Windows for our new apps, I wanted to be sure that it was the best option because :
- It has to withstand time ! (We are currently working with a system that was written in 1993, the year the mosaic browser and the first Pentium cpu were released !Therefore, with a little extrapolation, the new apps we're building now will still be working in 2037).
- It has to be reliable.
- It has to be efficient.
If we had to move to Linux, maybe the best time is at the start, not the end...
So, for each point, I came to the conclusion that :
- Windows and Linux where there in 1993 and are still here, so that's fine for me. (Windows NT and Slackware both started in 1993)
- PostgreSQL started in 1995 and a lot of users are using it now without issues and recently Gartner has put them in the leader quadrant, so this is also fine for me.
- That's the missing link, I don't know for sure that it's better on Linux or Windows.
So what can I do to find the missing link ??? Do a benchmark myself !
The benchmarkFor this benchmark, I want :
- A very simple scenario
- To isolate the benchmarked components.
- To see if PostgreSQL on Linux is faster/slower than on Windows with the same client. (Because I want to test the server !)
- To be the closest possible to a "real life" scenario.
- To be in the cloud. Why ? Because a lot of future apps will be built in the cloud... and I don't have a sufficient infrastructure at home to test it.
"The database"Remember KISS ? (Not that KISS)
create table table_a( thread_id int, a_value int);
create index on table_a(thread_id);
insert into table_a(thread_id,a_value)
select * from generate_series(1,10) as thread_id ,generate_series(1,100000) as a_value;
"The Client "Windows 2012 R2 server on amazon, m4.xlarge type, with all the default settings.
The client "application" is composed of 3 console apps that launches 5000 tasks each, available on github https://github.com/jmguazzo/TestPostgresqlPerf.
Each task does 1 select and 1 update of a random record on the table_a.
The output of the application is composed of the following resultsApplicationId RunningTasks TaskDuration EndTime
7fef1...c1 31 9530868 03:46:01
The tasks duration value is in Ticks but I took the liberty to convert it to seconds while analyzing the results.
The console application is in C# and uses NPGSQL 3.0.3.
"The Windows Postgresql Server " aka WSWindows 2012 R2 server on amazon, m4.xlarge type, with all the default settings.
Postgresql 9.4.5 installed with the wizard.
I changed the max_connections to 300, listen_addresses to * and the required changes to pg_hba.conf for the connection to work.
"The Linux Postgresql Server " aka LSAmazon Linux AMI, m4.xlarge type, with all the default settings.
Postgresql 9.4.5 installed with yum.(I had to google that part a little as this was quite new for me!)
I made the same changes for postgresql.conf and pg_hba.conf as the one I made for Windows.
ResultsFor a simple "mise en bouche", here are the execution time for the creation of the table
Query returned successfully: 1000000 rows affected, 14624 ms execution time.
Query returned successfully: 1000000 rows affected, 9374 ms execution time.
Query returned successfully: 1000000 rows affected, 3859 ms execution time.
From there, I'm baffled, Linux is definitely faster.
(And now that we know that my laptop's quite slow, I won't include it in other results...)
As explained before, the test consist of 15.000 executions of 1 SELECT and 1 UPDATE.
While running the test on LS, I had 8 "Timeout while getting a connection from pool" errors.
And to be honest, I expected to have more errors with Windows than Linux, so I guess that's another flabbergasting based on assumptions...
When an error occurs during an execution, my projects output a -1 for duration as shown hereunder :
7fef1...c1 31 541229671 03:47:09
7fef1...c1 31 -1 00:00:00
Therefore, I have to remove those 8 executions from my statistical analysis, and I will remove 8 executions from the WS results.
I was quite puzzled for deciding which one to remove so I removed them around the median value.
So what can we assume from this ?
On the Minimum duration, WS was almost 15% faster but on the Maximum, it was 10% slower than LS.
For the average duration, WS was 1% slower than LS. But as the saying goes, if my head is in the oven and my feet in the fridge, on average I'm good !
Due to the median being at the half of the results and the fact that we have 14992 valid results (15000 - 8 errors), the median is at result n° 7496.
WS is 10% faster if we look at the time only; but if we look at the full results, I had to get to the WS execution n° 8543 to have a duration of 0,93746 sec.
So more than 1000 executions on WS ran faster than the median on LS.
Response Time vs ThroughputI didn't want to analyze traffic/throughput because most of the time, it's useless, especially regarding the application I made and the fact that I'm using the cloud.
I find it always difficult to measure hardware precisely as, for me, analyzing it while using it is close to Schrödinger's Cat !
I was more obsessed with response time as my memory was always ringing this research from Jakob Nielsen that give the following limits for response time :
- 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
- 1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
- 10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.
I took the hypothesis that if your DB + your Web Server must respond in less than 10 sec, I'd give half the time to the DB and therefore :
|Quantity of execution faster than 5 sec
There's only a 0.001% difference between WS executions and LS.
And for both systems, more than 99% of executions took less than 5 sec.
Let's put it all together
|Number of executions
|Number of errors
|Quantity of execution faster than 5 sec
The infrastructure I used was on Amazon and my results may have been impacted by some actions outside of my control... which, by the way, will always be the case when you're using the cloud.
Moreover, the differences aren't statistically significant IMHO, but you could always debate.
With those results, I won't say that PostgreSQL on one OS is faster than on the other.
And, to answer my own request, I won't advise my client to migrate to Linux because of PostgreSQL performance. (You could argue that there are more tools for PostgreSQL on Linux than on Windows but that was not the point here)
For me, PostgreSQL performance on Windows is not better nor worst, it's at par with Linux !
All the results are available on my github.