In 2003 I released a piece of software I called LTS, which today is called “Cloud Optimized Storage”. Originally I was trying to implement the functionality of EMC’s Centera, which I did, but I went way past it. I realized that I wanted finer control over where the data was stored. LTS was part of an email archiving system, and it held some 2 million email messages at one time!
I also wanted to control how an object was named and how it was protected. After all, I thought, why in the heck would I want to take a hash of an object, use it as the name and store it? It made no sense as a way to protect a document! Didn’t we all learn in college that you don’t want to take one thing and make it do something else? So to me a name was just a name, and if I wanted to protect an object I ought to protect it directly.
Having been in the network world since before the advent of TCP/IP (now who doesn’t remember UUCP?), I decided that I ought to be able to talk to this device from anywhere in the world, and that if I could put a front end on LTS I could spread instances out everywhere. Hence LTS started to look like what EMC Atmos is today, and it was done in 2003!
Sure, I stumbled on some things. Coming from the telecom space, I thought Linux HA was great (after all, I used it in the design and implementation of Vonage’s first voicemail system). Only when I started to do the numbers (it would take 500 servers, each with 4 disk drives of 500 GB, to store 1 petabyte) did I realize that conventional HA wouldn’t work. So I needed to find some other approach.
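The arithmetic behind that petabyte figure is easy to check. Here is a minimal sketch using only the numbers quoted above; the variable names are mine, and it counts raw capacity in decimal units with no allowance for replication or file system overhead.

```python
# Figures taken from the text: 500 servers, 4 drives each, 500 GB per drive.
SERVERS = 500
DRIVES_PER_SERVER = 4
DRIVE_GB = 500

# Decimal units (1 PB = 1,000,000 GB), the way drive vendors count capacity.
GB_PER_PB = 1_000_000

total_drives = SERVERS * DRIVES_PER_SERVER
raw_capacity_pb = total_drives * DRIVE_GB / GB_PER_PB

print(total_drives)     # 2000
print(raw_capacity_pb)  # 1.0
```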
I also started to realize that all those 2000 disk drives would act differently than the one I had in my house. After all, 2000 of anything is a large population, and something will always be failing. Even 500 servers is a large number, and one or more of them would fail. So I needed to deal with that issue too.
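To put a rough number on "something will always be failing", here is a small sketch. The 3% annual failure rate is purely an assumed, illustrative figure (it is not a measurement from LTS or Twisted Storage), and the model assumes failures are independent and evenly spread over the year. The function names are hypothetical.

```python
def expected_failures_per_year(units, annual_failure_rate):
    """Average number of units expected to fail in one year."""
    return units * annual_failure_rate


def prob_any_failure(units, annual_failure_rate, days):
    """Probability that at least one unit fails within `days` days.

    Assumes independent failures and a constant failure rate over the year.
    """
    # Chance that a single unit survives the whole window.
    p_survive_unit = (1.0 - annual_failure_rate) ** (days / 365.0)
    # Chance that every unit survives, subtracted from 1.
    return 1.0 - p_survive_unit ** units


ASSUMED_AFR = 0.03  # assumption for illustration only
DRIVES = 2000       # from the text

drive_failures = expected_failures_per_year(DRIVES, ASSUMED_AFR)
p_month = prob_any_failure(DRIVES, ASSUMED_AFR, days=30)

print(f"expected drive failures per year: {drive_failures:.0f}")
print(f"chance of at least one drive failure in 30 days: {p_month:.1%}")
```

Under that assumed rate, 2000 drives work out to about 60 failures a year, more than one a week, so a failed drive is the normal operating state rather than an exception.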
Since I had spent so much of my career dealing with network and server management (I was part of the IETF effort that worked on specifying SNMP and part of the Compaq team that put CPQ servers in the enterprise), I started to realize that configuring those 500 servers or 2000 drives was going to be a challenge, as was watching them operate.
All this led me to the 2005/2006 release of Twisted Storage. Well, that and I wanted to do it in Python. Since that initial release I have decided that I didn’t quite hit the mark on some of the things I wanted in the system, and I have been working to solve them ever since.
Today I’ve started to release the newest version of Twisted Storage. It is a complete redesign of the 2005 release (in fact there was one between 2005 and today that didn’t see the light of day!). This is the fourth version of the system! It is as different from the 2005 version as that version was from LTS.
Version 4 is based on some very different design goals, and over the next few weeks I will start to discuss them. Here is the list:
- Content always available
- No backup, recovery or restore
- Incremental, linear scalability
- Policy driven storage
- Workflow-enabled processing
- Tunable “knobs” to trade off durability and performance
- No special hardware; support old hardware
- Minimal administrative overhead
- Totally distributed, loosely coupled, asynchronous design
- Support external, legacy storage systems and linkage
Today I think I got darn close to meeting all those objectives. I’ve left some of the harder ones off the list. After all, you have to have something to work toward. I’ve always wanted a system that could apply common sense reasoning and fix itself (doesn’t that sound very HAL-like?). In that same vein, I want the system to be able to dynamically reconfigure itself: not just take nodes and storage out of use, but actually change how the program operates! Yes, I know, really out there, but I love challenges.
Over the next month I will be releasing each of the new components of Twisted Storage. Yes, it is slow but I want to make sure there is documentation in place (I hate projects that assume you are going to read code or look at generated lists of calls! – That is not the way to do it!).
The first part has been released and I call it TSnosql. It is a Python implementation of a rather sophisticated key-value system. Check it out and look at the documentation. It is the heart of Twisted Storage’s management and configuration system.