London Stock Exchange switches to Linux

The London Stock Exchange is switching to Linux. The details are limited, and claims of improved speed feature in the reports. Is the switchover succeeding? Is it really Linux providing the speed? When we dig beneath the marketing hype and the wild claims, what is really happening?

Stock exchanges use a variety of computers and offer trades at several speeds. The London Stock Exchange, LSE, used Windows-based computers somewhere in their system. Their computers reportedly take hundreds of milliseconds to complete a transaction and that is apparently slow compared to other exchanges. The LSE talked about 10 ms trade transactions. The new system, using Linux, is somewhere in the middle, more than 100 ms, but less than the current system.

10 milliseconds

A 10 ms transaction is impossible unless you cheat. The first cheat is to measure the time at your computer instead of at the customer's computer. The 200 ms transaction time at the customer's desk might be 10 ms of server time and 190 ms of network time. You can report the 10 ms instead of the 200 ms. This type of cheat is only used by application installation companies to demand payment for meeting specifications. You make your customers angry by misrepresenting the service they get.
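
The arithmetic behind that first cheat can be sketched in a few lines of Python. The figures are the illustrative ones from the paragraph above, not measurements from the LSE, and the function name is made up for this sketch.

```python
def customer_latency_ms(server_ms, network_ms):
    """Latency seen at the customer's desk: server time plus network time."""
    return server_ms + network_ms


server_ms = 10     # time measured inside the exchange's own computer
network_ms = 190   # time spent on the network to and from the customer

measured_at_server = server_ms
measured_at_customer = customer_latency_ms(server_ms, network_ms)

print(f"reported by the installer: {measured_at_server} ms")
print(f"experienced by the customer: {measured_at_customer} ms")
```

Both numbers are true. They just answer different questions, and only one of them matters to the person placing the trade.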

You can also speed up applications by removing reliability features and things as simple as completing transactions before reporting them as complete. Yes, instead of writing something to the database, waiting for the database to respond, then reporting the result, you can guess what the result might be then send the guess back and hope the real transaction eventually completes.
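
A minimal sketch of that guess-first approach, using a toy class invented for illustration (nothing here is taken from the LSE software): the desk answers "complete" straight away and only runs the real transaction later, when some of the guesses turn out to be wrong.

```python
class OptimisticDesk:
    """Toy trading desk that answers before the database has been asked."""

    def __init__(self, shares_available):
        self.shares_available = shares_available
        self.queue = []      # orders already reported as complete
        self.rejected = []   # orders the real write later refused

    def sell(self, quantity):
        # Guess that the trade will succeed and answer immediately.
        self.queue.append(quantity)
        return "complete"

    def settle(self):
        # The real transactions run later; some guesses turn out wrong.
        for quantity in self.queue:
            if quantity <= self.shares_available:
                self.shares_available -= quantity
            else:
                self.rejected.append(quantity)
        self.queue.clear()


desk = OptimisticDesk(shares_available=100)
replies = [desk.sell(80), desk.sell(50)]   # both reported "complete"
desk.settle()
print(replies, desk.rejected)
```

The customer saw two completed trades. The system only had enough shares for one of them, and somebody now has to unwind the second.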

One online post says the old LSE system completes transactions in only 1.4 ms when measured at the firewall. That is impressive if the transaction is complete and permanent.

Network upgrade

The LSE has a sort of monopoly on a market but they also have limited customers and the global financial market lets rich multinational companies move their trades to other cities. They eventually have to satisfy someone. Today the LSE upgrade is on hold pending other changes including a network upgrade.

The floor traders use computers connected directly to the LSE servers using networks completely controlled by the LSE. There is no excuse for a slow network but the LSE are upgrading their network, which implies they were lagging behind. I see endless cases where management demand speed then pay for 90 percent of an upgrade but leave little bits untouched, despite those little bits restricting major parts of the network to the slow old speed. The LSE example sounds like the LSE network was not systematically upgraded when new hardware became available. They should have hired me to keep it up to date.

Most likely the network upgrade would help their old system. The Move to Linux is actually a move to Linux + a fast network.

Database replacement

The original application is reported as using IIS on Windows but there is no mention of a database. The new system uses Oracle but not conventional Oracle, this is Oracle in memory. Databases in memory disappear when you switch off the computer. You have to add systems to transfer the data to permanent storage. In effect, the completed transaction in memory is just a temporary transaction, the transaction is not permanent until written to disk. In this type of cheat, you report the transaction as complete when the temporary transaction is finished, not when the transaction is permanently recorded. If the whole system crashes and you restore from disk, some transactions will be missing.
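
The gap between the temporary transaction and the permanent one can be shown with a toy in-memory store. This is a sketch under stated assumptions: the class, its methods, and the JSON file are all invented for illustration and bear no relation to how Oracle's in-memory product actually persists data.

```python
import json
import os
import tempfile


class InMemoryLedger:
    """Toy in-memory store with an explicit snapshot to disk."""

    def __init__(self, path):
        self.path = path
        self.trades = []

    def record(self, trade):
        # Fast path: memory only. The caller is told "complete" here.
        self.trades.append(trade)
        return "complete"

    def snapshot(self):
        # Slow path: copy everything in memory to permanent storage.
        with open(self.path, "w") as f:
            json.dump(self.trades, f)
            f.flush()
            os.fsync(f.fileno())

    @classmethod
    def restore(cls, path):
        # After a crash, only what reached the disk comes back.
        ledger = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                ledger.trades = json.load(f)
        return ledger


path = os.path.join(tempfile.mkdtemp(), "ledger.json")
ledger = InMemoryLedger(path)
ledger.record("trade 1")
ledger.record("trade 2")
ledger.snapshot()
ledger.record("trade 3")   # reported complete, never written to disk

del ledger                 # simulate the crash: memory is gone
recovered = InMemoryLedger.restore(path)
print(recovered.trades)
```

Trade 3 was reported as complete and does not exist after the restore. Shrinking the interval between snapshots shrinks the loss but brings back the disk wait the in-memory design was meant to avoid.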

There are lots of things you can do to replicate the temporary transactions to permanent databases. Oracle and the open source PostgreSQL have all the right features to make the system work. The LSE decision to replace Windows with Linux then to use an expensive proprietary database on top of Linux is a rejection of open source software.

The LSE claims to be reducing costs but Oracle costs more than Windows and IIS. If there is a reduction in cost, it is more likely to be from a consolidation of software licences through using new fast hardware. You see a lot of similar cost justifications and in most cases, if they started again with their old development using new hardware and competitive software licensing, their old system would be equally cheaper.

The Move to Linux is actually a move to:

  • Oracle
  • Fast network
  • Oh and something called Linux

Tests on a small scale

The magic improvements in speed are reported on tests from a small trading pool, not the full LSE. When you start using all the performance shortcuts, including a database in memory, you hit limits of size and everything suddenly becomes massively slow. The LSE full trading pool might fit the new system or it might die. The fact that the new system is on hold pending upgrades tells us the new system has problems needing additional hardware or additional software or additional development work.

Most of the performance shortcuts have additional problems not mentioned when people put forward the shortcuts. You move problems from one part of the system to another. Think about the highly tuned engines used in Formula 1 racing. The average F1 race is shorter than a typical cross city commute in Sydney. An F1 race with twenty cars has at least one breaking down with engine failure. Think of the million commuters on Sydney roads and extrapolate. If they all drove F1 cars, there would be 50 thousand cars breaking down during the morning commute to work then another 50 thousand breaking down during the afternoon commute.
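
The extrapolation is simple enough to check. The sketch below uses only the round numbers from the paragraph above, and the variable names are made up for the illustration.

```python
cars_in_race = 20
failures_per_race = 1    # "at least one" in the text, so a lower bound
commuters = 1_000_000

# Integer arithmetic: one failure in every twenty cars, scaled up.
breakdowns_per_commute = commuters * failures_per_race // cars_in_race
breakdowns_per_day = 2 * breakdowns_per_commute   # morning plus afternoon

print(breakdowns_per_commute, breakdowns_per_day)
```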

The new LSE system, using Linux, has failed at least once in the short time since implementation and there is a report saying it failed twice. The old system failed only once in several years of use. The Linux promoters are quick to say the breakdowns are not the fault of Linux but those same people keep mentioning Windows when they highlight the one failure of the old system.

The new LSE system is highly specialised and tuned. It was always going to fail during volume testing. Reproducing that volume of trades for months on end is difficult during testing. Open source software is well tested for Web servers and some other applications. There are not enough stock exchanges in the world to test all the components of the LSE system in the same combination with the same type of activity.

People tell me they always have to reboot Windows and never have to reboot Linux. I point out that I have used Linux, Solaris, Unix, Windows, and various Apple operating systems side by side for years. They all need reboots. When you run the same applications on Linux and Windows, they need the same number and frequency of reboots. The problems are usually the applications, not the operating system.

My current Windows XP 64 workstation runs many more applications than my Linux equivalent. They both have to be rebooted due to applications locking up. The Windows XP 64 machine does not need rebooting due to operating system problems. The Linux system, using Ubuntu, needed regular reboots for Ubuntu 8, 9, and 10.04. The latest Ubuntu 10.10 seems to work reliably but still needs a reboot each week during the system update.

Stock exchanges typically work for only a few hours each day then shut down giving you time to add more memory, replace failed disks, and reboot. Even if Linux never needed a reboot, stock exchanges can reboot every day.

C#

One report blames the programming language C# for the failure of the previous system. Programmers like to blame the languages they do not like, even if they have never used the languages. From my experience of trying to program in a Microsoft environment, the biggest problem is not the language, it is the endless modules brought in from external libraries by people wanting to save coding time. Those modules are closed source and not properly tested. When you build a high reliability system, you have to replace those unknown external modules with your own code so you can test every aspect of the code.

Some of the Microsoft languages are pigs to use because there are so few people with solid experience of using them. A common alternative is Java and Java is a pig to learn, creating the same problems as the Microsoft languages. If the LSE wanted a low cost reliable system, they should have used common Web based technologies because Web related technologies are the most heavily tested programming technologies in the world. They could then use commonly known ways to tune their Web based system.

The LSE new system was written in native code, meaning either assembler or a hardware specific version of C. The developers worked with Intel to use all the hardware features of a specific Intel processor. The previous system was effectively hardware independent, giving the owners the chance to beat down the cost of their hardware. The new system limits hardware savings in return for performance.

The Move to Linux is actually a move to:

  • Hand coded application
  • Hardware specific tuning
  • Oracle
  • Fast network
  • Oh and something called Linux

.NET

The LSE previous system used .NET. .NET creates problems everywhere I see it in use. People are blaming Windows when Windows can run six months error free if you do not use .NET or Internet Explorer. .NET crashed every computer where I tried to use .NET or allowed Microsoft Update to install a .NET update. .NET is the reason I always keep a Linux computer handy when updating a Windows machine. I can understand the LSE doing anything to remove .NET.

Linux?

The new LSE system is the third LSE system, not the second. The first ran on Tandem computers because reliability was the number one priority at the time. If you look far enough back to see when the second LSE system was started, the question would not be Windows or Linux. Back then commercial application developers were choosing between Windows and Solaris. HP Unix was just turning the corner from popular to obsolete. NetBSD and FreeBSD were as popular as Linux outside of Web development. People were still choosing Novell Unix over Linux for corporate applications. The decision to make the second system compatible with the first had the twin effects of making a Web based solution unlikely and of downgrading the chance to use one of the operating systems popular for Web applications.

Microsoft eventually ran an advertising campaign saying Windows beat Linux but Linux was not a big consideration for that type of application at the time the decision was made. It was Windows against a long list of alternatives with Linux perhaps fifth on the list.

If the LSE used Solaris for their second system and the system crashed, would the Linux people be publicly complaining about Solaris the way they complain about Windows? Why do they complain about the LSE system crashing on Windows just once in several years but ignore the Linux based replacement crashing twice in just a few months while processing far less data?

Apple have a lot of experience with operating systems. They wrote several and threw them all away, creating great expense for Apple computer owners who were forever upgrading and having to replace lots of expensive applications repeatedly. Apple eventually settled on a BSD licensed Unix so they can make a bigger profit than they could if they used Linux. Many Linux distributions are less reliable than the Apple OSX labelled Unix because the Linux distributions do not have a massively big rich company focused on stamping out bugs. Well, not really focused on that, Apple are focused on advertising on television, but Apple do put more resources into their pet Unix than most Linux distributions receive. The Apple OSX Unix still fails.

Linux has some support from IBM. The biggest Linux distribution is Ubuntu and the latest Ubuntu is great for the desktop. Back when the LSE had to make their decision about their second system, the Linux choice was Debian Linux against Red Hat Linux and neither had good support for the latest hardware, nowhere near as good as today. Back then Sun SPARC sales were based largely on Sun Solaris working on Sun SPARC without extensive work installing special drivers. Using Sun Solaris on Sun SPARC was more expensive than using Windows on Intel/AMD based hardware from HP/Dell/IBM. Linux and Apache was the number one choice for the Web but not everywhere else.

Linux is now the obvious replacement for Windows. Ubuntu and Fedora give you up to date hardware support. Google uses Linux for Android. You can still choose the wrong way to develop applications on top of Linux, resulting in crashes, and you still have to make choices about reliability versus performance. Until you can eliminate Oracle, your system is not truly free or open.

Non stop replacement

The first LSE system was built on the famous Tandem NonStop system. The NonStops had no single point of failure and were used by many stock exchanges and banks. The Tandem NonStop database technology was the base for Microsoft's database clustering system. If you had to replace a Tandem system, the Microsoft clustering technology could look like the easiest option. Oracle brought out an equivalent but the conversion was difficult because Oracle used to have a lot of weird historical peculiarities.

If you talked, back then, with database experts who have a lot of experience with different databases, most of them would advise against a conversion to Oracle without a lot of money and resources. Some would suggest the open source PostgreSQL ahead of Oracle. DB2 and Sybase would be mentioned. The Microsoft SQL server is based on Sybase, cost half the price of Oracle, and was closer to the SQL standard.

Conclusion

There are so many differences between the old LSE system and the new LSE system that the operating system is irrelevant, and is the least important factor in the change. The two LSE crashes on Linux and the one on Windows would occur on Solaris, OS/2, HP-UX, NetBSD, anything.