Poor man's cluster for Windows (DRBD lookalike)


6 replies to this topic

#1 Dr. Net!

  • Members
  • 4 posts
    Austria

Posted 06 December 2007 - 08:33 AM

Hi all,

First of all: Thank you Olof - you give me hope ...

Please excuse a longish first post!

A very common usage scenario I have encountered over and over for quite some years now is "semi high availability": when a server dies, it is acceptable for users to be "thrown out" of a server-based application, but after a maximum of 2 minutes or so they need to be able to log in again and continue working, even if the dead server has developed a hardware fault or won't come up again for any other reason.

This forces some failover model between a pair of servers. Switching over between two Windows servers is not trivial but manageable; the remaining issue is data. There is no way to make sure a standby server has up-to-date data on its storage without really serious storage devices (say, Fibre Channel or NetApp), which are more often than not prohibitively expensive.

In Linux there exists DRBD, which is essentially RAID 1 over TCP: the disks of two servers are kept in sync at the block level via the network, with at most one of the machines being the master at any point in time; only the master can actually use the disk. Reads are satisfied from the master's local disk, while writes are handed both to the master's disk and, via TCP, to the slave's disk. When the master dies, the slave can become the new master and now has an up-to-date local disk to use when going from standby to active state.
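The DRBD behaviour described above (reads served from the master's local disk, writes applied to both disks, slave promoted on failure) can be illustrated with a small in-memory model. This is a toy sketch, not how DRBD is implemented: `Node` and `MirroredPair` are hypothetical names, and the "network copy" is simply a second bytearray.

```python
class Node:
    """One server with its own local disk (here: an in-memory bytearray)."""

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.disk = bytearray(size)


class MirroredPair:
    """Toy RAID 1 across two nodes; in DRBD the second copy travels over TCP."""

    def __init__(self, size: int) -> None:
        self.nodes = [Node("a", size), Node("b", size)]
        self.master_index = 0  # exactly one master at any point in time

    def master(self) -> Node:
        return self.nodes[self.master_index]

    def slave(self) -> Node:
        return self.nodes[1 - self.master_index]

    def read(self, offset: int, length: int) -> bytes:
        # Reads are satisfied from the master's local disk only.
        return bytes(self.master().disk[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self.master().disk):
            raise ValueError("write beyond end of device")
        # Writes go to the master's disk and (over the network) to the slave's.
        self.master().disk[offset:end] = data
        self.slave().disk[offset:end] = data

    def fail_over(self) -> None:
        # The master died: promote the slave, whose disk is already up to date.
        self.master_index = 1 - self.master_index
```

After `pair.write(4, b"data")` followed by `pair.fail_over()`, `pair.read(4, 4)` still returns `b"data"`, now served from node `b`.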

This eliminates the need for expensive shared storage, cutting costs for a typical installation by 50% or so.

Most of DRBD is about optimization, control logic, authentication, etc.: things that can more often than not be handled by scripting if the failover window is in the 2-minute rather than the 2-second range.

Our current solution is to set up pairs of Linux servers with DRBD, then run VMware with a single Windows guest instance, with the DRBD volumes presented to Windows as virtual disks. Essentially, Linux handles the "miniclustering" and virtualized Windows handles the workload.

Enter ImDisk with the proxy mode!

A quick look at the source makes me quite sure that, besides using a file backend OR establishing TCP connections to devio, it should be easily feasible to read from the file backend while writing to the file AND to devio/TCP. A minimal addition to the initialization would have to ensure that the file backend and the devio connection agree on the backend size. A connection timeout or socket error should trigger a retry or two, then simply drop the TCP part and run as a "normal" file backend.
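A minimal sketch of the proposed mode, assuming a local image file object and a hypothetical `remote_write(offset, data)` callable that stands in for the devio/TCP connection and raises `OSError` on a timeout or socket error. This is not ImDisk or devio code; `MirrorBackend`, `remote_size` and `retries` are illustrative names.

```python
import os
from typing import BinaryIO, Callable, Optional

# Hypothetical: sends one write to the remote peer, raises OSError on failure.
RemoteWrite = Callable[[int, bytes], None]


class MirrorBackend:
    """Read from the local image; write to the local image AND the peer.
    After repeated peer failures, drop the peer and act as a file backend."""

    def __init__(self, local: BinaryIO, remote: Optional[RemoteWrite],
                 remote_size: int, retries: int = 2) -> None:
        local.seek(0, os.SEEK_END)
        self.size = local.tell()
        # Initialization check: both backends must agree on the size.
        if remote is not None and remote_size != self.size:
            raise ValueError(f"size mismatch: local {self.size}, remote {remote_size}")
        self.local = local
        self.remote = remote
        self.retries = retries

    @property
    def degraded(self) -> bool:
        return self.remote is None

    def read(self, offset: int, length: int) -> bytes:
        # Reads never touch the network.
        self.local.seek(offset)
        return self.local.read(length)

    def write(self, offset: int, data: bytes) -> None:
        if offset + len(data) > self.size:
            raise ValueError("write beyond end of device")
        self.local.seek(offset)
        self.local.write(data)
        if self.remote is None:
            return
        for _ in range(1 + self.retries):  # first attempt plus retries
            try:
                self.remote(offset, data)
                return
            except OSError:
                continue
        # Give up on the peer: run as a "normal" file backend from now on.
        self.remote = None
```

Dropping the peer silently leaves the slave's copy stale, so a real tool would need to record that the mirror is degraded (the `degraded` property here) and resynchronize before the slave is ever allowed to take over.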

Would you consider adding such a mode?

I would be happy to provide the integration - i.e. a complete set of scripts and hacks, maybe consolidated into a service:
On the master side it would wait for a slave to appear (honoring a timeout), then set up ImDisk as a master and signal the application to start. On the slave side it would heartbeat the master, and - when the master stops responding - become a master itself and signal the application to now run here.
If the application used a different IP address (or better, a different NIC) than the heartbeat, a "netsh interface ip name ip" command could handle failover of this "payload address".
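The slave-side logic can be sketched as a small heartbeat monitor. This assumes some hypothetical transport calls `on_heartbeat()` whenever a heartbeat from the master arrives and that `check()` is polled periodically; `promote` is a placeholder for whatever takes over the disk, moves the payload address and starts the application. The clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Slave side: promote this node once the master has been silent
    for longer than `timeout` seconds."""

    def __init__(self, timeout: float, promote: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.promote = promote  # hypothetical takeover hook
        self.clock = clock
        self.last_seen = clock()
        self.is_master = False

    def on_heartbeat(self) -> None:
        # Called whenever a heartbeat from the master arrives.
        self.last_seen = self.clock()

    def check(self) -> bool:
        """Poll periodically; returns True once this node is the master."""
        if not self.is_master and self.clock() - self.last_seen > self.timeout:
            self.is_master = True
            self.promote()  # runs exactly once
        return self.is_master
```

With a 2-minute window, a timeout of around 30 seconds leaves time for the takeover itself. If only the heartbeat link fails, both nodes may consider themselves master (what DRBD calls "split brain"), which is why a redundant heartbeat path is worth considering.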

Does this make any sense?

Best regards & thanks again,

Eugen

#2 Olof Lagerkvist

    Gold Member

  • Developer
  • 1448 posts
  • Location:Borås, Sweden
    Sweden

Posted 08 December 2007 - 09:25 PM

I must say this was interesting to read.

One thought that came to mind when I read it was: how do you keep the virtual disk cache synchronized? If there is data written by the application but not yet completely written to disk, the filesystem could be in a state where it needs a chkdsk before it can be re-mounted from another system.

Anyway, what you are asking for should be really easy to do as a simple user-mode proxy application for imdisk. Essentially, all it would need to do is respond to I/O requests and pass them to an image file, just like the current devio does, but also queue the write operations to a devio instance listening on another machine. That is, a simple modification to the current devio.
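One way to read "queue the write operations" is an asynchronous FIFO drained by a background thread, sketched below under that assumption. `send` is a hypothetical callable standing in for the transfer to the remote devio, and `QueuedReplicator` is an illustrative name, not part of devio.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

Write = Tuple[int, bytes]


class QueuedReplicator:
    """Writes are queued and returned immediately; a background thread
    hands them to `send` (the remote side) in submission order."""

    def __init__(self, send: Callable[[int, bytes], None]) -> None:
        self.send = send
        self.q: "queue.Queue[Optional[Write]]" = queue.Queue()
        self.errors: List[Exception] = []
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def submit(self, offset: int, data: bytes) -> None:
        self.q.put((offset, data))  # does not wait for the network

    def _drain(self) -> None:
        while True:
            item = self.q.get()
            try:
                if item is None:  # shutdown marker
                    return
                self.send(*item)
            except Exception as exc:  # keep draining; a real tool would resync
                self.errors.append(exc)
            finally:
                self.q.task_done()

    def flush(self) -> None:
        # Block until every queued write has been handed to `send`.
        self.q.join()

    def close(self) -> None:
        self.q.put(None)
        self.worker.join()
```

The trade-off is exactly the concern raised in the question above: writes still sitting in the queue when the master dies never reach the slave. Calling `flush()` before acknowledging a write (or at filesystem flush points) narrows that window at the cost of latency.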

#3 Dr. Net!

Posted 23 January 2008 - 11:10 PM

Hi Olof,

sorry for the long delay - quite busy at the moment!

> One thought that came to mind when I read it was: how do you keep the virtual disk cache synchronized? If there is data written by the application but not yet completely written to disk, the filesystem could be in a state where it needs a chkdsk before it can be re-mounted from another system.


There are 2 answers to this:

The simple version: we are talking about the block device layer, so whatever sits above it is of no concern to us.

The correct(er) version:
A) Whatever client consumes a high-availability storage service is likely to use cache bypass. I am not a Windows expert, but I am sure there exists something like the Linux O_DIRECT on Windows (in fact, I know MSSQL Server uses it).

B) The need for chkdsk could easily be handled by the scripting part that glues the whole concept together. This might lengthen the recovery time a bit, but it is easily doable. Since the original slave actively switches to master because it detects a master failure, the slave definitely knows about the need for chkdsk, so it might as well run it.
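The takeover sequence from point B could look like the following sketch, assuming the image is re-attached with the ImDisk command-line tool and the application runs as a Windows service. The image path, drive letter and service name are placeholders, and `takeover_plan`/`run_plan` are hypothetical helpers. chkdsk's exit codes 1 (errors found and fixed) and 2 (cleanup performed) do not indicate failure, so the runner does not treat every non-zero code as an error.

```python
import subprocess
from typing import List


def takeover_plan(image: str, drive: str, service: str,
                  unclean: bool = True) -> List[List[str]]:
    """Commands a former slave runs, in order, when promoting itself."""
    steps = [
        # 1. Attach the (now authoritative) local image file as a drive.
        ["imdisk", "-a", "-t", "file", "-f", image, "-m", drive],
    ]
    if unclean:
        # 2. The old master died mid-flight: check and repair the filesystem.
        #    /x forces a dismount if needed and implies /f.
        steps.append(["chkdsk", drive, "/f", "/x"])
    # 3. Only then start the application (assumed to be a Windows service).
    steps.append(["net", "start", service])
    return steps


def run_plan(steps: List[List[str]]) -> None:
    """Run the commands (on Windows), stopping at the first real failure."""
    for cmd in steps:
        rc = subprocess.run(cmd).returncode
        # chkdsk: 0 = clean, 1 = errors fixed, 2 = cleanup done; 3+ = trouble.
        ok = rc <= 2 if cmd[0] == "chkdsk" else rc == 0
        if not ok:
            raise RuntimeError(f"{cmd!r} failed with exit code {rc}")
```

On the promoted node this would be invoked as something like `run_plan(takeover_plan(r"D:\mirror\data.img", "X:", "MyAppService"))`; moving the payload IP address would slot in before the service is started.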

> Anyway, what you are asking for should be really easy to do as a simple user-mode proxy application for imdisk. Essentially, all it would need to do is respond to I/O requests and pass them to an image file, just like the current devio does, but also queue the write operations to a devio instance listening on another machine. That is, a simple modification to the current devio.


There is one additional issue (good-to-have, not must-have): both instances must agree on the disk size and block size.
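A minimal sketch of that agreement check, assuming a hypothetical wire format in which each side first sends its disk size and block size as two unsigned 64-bit big-endian integers over an already-connected socket and aborts if the peer's values differ. `GEOMETRY`, `handshake` and `recv_exact` are illustrative names, not part of devio's protocol.

```python
import socket
import struct
from typing import Tuple

# Hypothetical wire format: disk size and block size as two unsigned
# 64-bit big-endian integers (16 bytes).
GEOMETRY = struct.Struct(">QQ")


def encode_geometry(disk_size: int, block_size: int) -> bytes:
    return GEOMETRY.pack(disk_size, block_size)


def check_peer(local: Tuple[int, int], peer_msg: bytes) -> None:
    """Raise ValueError unless the peer reports the same geometry."""
    peer = GEOMETRY.unpack(peer_msg)
    if peer != tuple(local):
        raise ValueError(f"geometry mismatch: local {local}, peer {peer}")


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until we have n.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection during handshake")
        buf += chunk
    return buf


def handshake(sock: socket.socket, disk_size: int, block_size: int) -> None:
    # Send our geometry first; 16 bytes fit in the send buffer, so both
    # sides can do this at the same time without deadlocking.
    sock.sendall(encode_geometry(disk_size, block_size))
    check_peer((disk_size, block_size), recv_exact(sock, GEOMETRY.size))
```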

So I think for a proof of concept we only need a modification to allow local storage AND proxy mode to run in parallel for writes, with local storage only for reads.

I would create some batch files for the failover handling and set up a testbed, seriously thrashing it.

If it looks promising, further work can be put into making it a self-contained, production-ready tool.

Would you give it a shot?

Best regards,

Eugen

#4 Olof Lagerkvist

Posted 25 January 2008 - 03:32 PM

> ...
>
> So I think for a proof of concept we only need a modification to allow local storage AND proxy mode to run in parallel for writes, with local storage only for reads.
>
> I would create some batch files for the failover handling and set up a testbed, seriously thrashing it.
>
> If it looks promising, further work can be put into making it a self-contained, production-ready tool.
>
> Would you give it a shot?


Yes definitely, this sounds interesting and feels like a "must-try/must-do" thing :thumbsup:

Problem is that I am (extremely) busy, but I will give it a try as soon as I get some time! :D

#5 Dr. Net!

Posted 12 February 2008 - 11:51 PM

> Problem is that I am (extremely) busy, but I will give it a try as soon as I get some time! :thumbsup:


Thanks a lot, I appreciate it!

Best regards, Eugen

#6 Dr. Net!

Posted 18 September 2009 - 09:41 PM

Hi Olof et al.,

> Anyway, what you are asking for should be really easy to do as a simple user-mode proxy application for imdisk. Essentially, all it would need to do is respond to I/O requests and pass them to an image file, just like the current devio does, but also queue the write operations to a devio instance listening on another machine. That is, a simple modification to the current devio.


Sorry to ask, but has there been any progress on this? The other "poor man's cluster" tools have matured quite a bit in the meantime and work well with iSCSI storage.

Now this is the last missing bit to get this running without specialised hardware, and I am simply not up to the job: I don't speak C fluently enough.

Best regards,

Eugen

#7 Olof Lagerkvist

Posted 20 September 2009 - 08:27 PM

> Sorry to ask, but has there been any progress on this?


Sorry, not any that I have heard of. There have been a few e-mail reports from people who have built things similar to this, but not exactly the same. I would say interest in this particular feature was greater a few years ago, and people who were interested in it have in many cases moved on to other projects, such as filesystem-filter-based ones.



