
Multi Session MultiCast Imaging


2 replies to this topic

#1 GeekToMe

GeekToMe

    Member

  • Members
  • 39 posts
  •  
    Canada

Posted 17 September 2010 - 06:31 PM

Hey guys,

I've posted most of this information over on Symantec's forums, but I had some non Ghost related issues that I need some insight on, and figured I might as well give as much information as possible.

I got some awesome help from a few boot-land.net members over the last year when this project was in its infancy and I've taken that and built a pretty nice setup. Now it just needs some fine tuning.

Most of this info is related to my Ghost issues, but if anyone has advice that would be great. OK, so here goes:

We are currently in the process of developing a large imaging infrastructure for our refurbishing department. This project has been going on for just over a year and has evolved several levels since its inception. Right now the system is functional, although we think the performance could be much better.

I'll start by explaining the physical setup of the system, the hardware we're using, and the basic process.

We currently have a large shelving area set up to support about 180 un-boxed laptops, with AC adapters plugged in and connected to the network with Cat5e. 95% of the laptops only have 10/100 Ethernet adapters. These are connected to two Dell PowerConnect 6248 48-port managed Gigabit switches (two more are on the way to fully connect all the units). The switches are currently connected to each other with just Cat5e as well, and one of them is connected to our new imaging server. The server is a quad-core Q8300 with 6 GB of RAM and two 2TB RAID1 storage arrays (4 Western Digital 2TB SATA II 5900RPM low-power drives) to store all the images, running Windows 7. There is also a D-Link DIR-655 Gigabit router that is set to help with DHCP addressing.

On the software side of things, I have a PXE/TFTP boot system set up using TFTPd32 and PXELinux, with menu options for Darik's Boot and Nuke, which we use to perform DoD 5220.22-M 7-pass encrypted drive scrubbing, and a custom ISO of Symantec's Ghost WinPE.

We just recently upgraded to the new image server from an older Dell Xeon based system with Windows Server 2003 which was getting bogged down by multiple multicast sessions with Ghost Server. Earlier on in the project we had large batches of similar units, 30-40 of the same laptops, but now we have up to 15 different models, in batches of usually 3-10 of the same system.

We upgraded to the managed switches after getting abysmal performance from regular Gigabit switches, but even still with all the new hardware, most batches only restore at about 250MB/min if lucky, and usually sink down to 120-150MB/min. And with some of the images exceeding 25GB, this takes a very long time to process the systems (3-5 hours). We are not necessarily running 15 multicast sessions all the time, usually just 4-5, but ideally we would like to be able to image a full load of 180 systems in the run of the day.
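To put those restore rates in perspective, here is a rough sketch of the arithmetic for a 25GB image. It assumes 1 GB = 1024 MB and a constant transfer rate for the whole restore, neither of which Ghost guarantees; the function name is just for illustration.

```python
def restore_minutes(image_gb: float, rate_mb_per_min: float) -> float:
    """Minutes needed to restore an image at a constant rate.

    Assumption: 1 GB = 1024 MB, and the rate never changes mid-session.
    """
    return image_gb * 1024 / rate_mb_per_min


# Rates quoted above for multicast restores, in MB/min.
for rate in (120, 150, 250):
    minutes = restore_minutes(25, rate)
    print(f"{rate} MB/min -> {minutes:.0f} min ({minutes / 60:.1f} hours)")
```

At the low end this works out to roughly three and a half hours per batch, which is in line with what we are seeing.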

The systems are un-boxed, set up by our warehouse staff and physically connected to the network, then our technician boots them to PXE and starts the scrubbing process, which we let run overnight. The next morning they get reloaded into the WinPE-512 environment, which offers a menu to connect to one of 15 restore sessions or 3 create sessions, and to run Speccy to verify system specifications during the restore.

At this point, like I said, the system is functional, just slow. We have the Dell switches set up for IGMP snooping with help from Dell on the configuration, and according to several monitoring tools, the imaging server seems to send out data at around 20-40MB/s with 4-5 concurrent sessions. Unicast create sessions seem to run fine, usually around 450-600MB/min on the 10/100 units, and I tested it with my HP 6930p and got 1400-1600MB/min on a create session. Internal transfer speeds between the RAID drives are around 100-125MB/s, and Windows-based file transfers between my 6930p and the image server over the Gigabit connection were doing about 70MB/s. When a single session of 2-3 systems is restoring by itself, the transfer speeds are usually around 300-500MB/min.

I'm kind of stumped at the moment. Could this be a limitation of Ghost that is causing the slow speeds? Are we trying to push it too far? I think the Gigabit connection should be able to support at least 7-9 sessions running at 300-500MB/min, leaving some room for packet overhead. The server seems to be plenty powerful, not exceeding 50% CPU utilization, even with 15 sessions running. I have added the correct drivers whenever we run into a new system that the Ghost WinPE doesn't support, so I don't believe it is a driver issue on the laptops.
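The 7-9 session estimate can be checked with a quick unit conversion. This is only a back-of-the-envelope sketch: it assumes 1 MB = 10^6 bytes, ignores Ethernet/IP/UDP overhead, and treats each multicast session as one stream leaving the server's Gigabit port. The function names are made up for the example.

```python
GIGABIT_LINK_MBIT = 1000.0      # server uplink
FAST_ETHERNET_MBIT = 100.0      # most of the laptops are 10/100


def mb_per_min_to_mbit_per_s(rate_mb_per_min: float) -> float:
    """Convert a Ghost-style MB/min figure to Mbit/s (1 MB = 10**6 bytes assumed)."""
    return rate_mb_per_min * 8 / 60


def sessions_load_mbit(sessions: int, rate_mb_per_min: float) -> float:
    """Total load on the server uplink if every session runs at the same rate."""
    return sessions * mb_per_min_to_mbit_per_s(rate_mb_per_min)


for sessions, rate in ((7, 300), (9, 500)):
    load = sessions_load_mbit(sessions, rate)
    print(f"{sessions} sessions at {rate} MB/min: {load:.0f} Mbit/s "
          f"({load / GIGABIT_LINK_MBIT:.0%} of the Gigabit link)")

# A single 500 MB/min stream as a share of one 10/100 client port.
share = mb_per_min_to_mbit_per_s(500) / FAST_ETHERNET_MBIT
print(f"500 MB/min uses {share:.0%} of a 100 Mbit client port")
```

On paper, even 9 sessions at 500MB/min stay under the Gigabit uplink, so raw link capacity on the server side does not look like the limit.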

Any suggestions would be greatly appreciated. Below I've included the configuration file that Dell recommended, so if there is anyone with PowerConnect experience they might be able to point out an issue.

console#show run
!Current Configuration:
!System Description "Dell 24 Port Gigabit Ethernet, 2.0.0.12, VxWorks5.5.1"
!System Software Version 2.0.0.12
!
configure
!
vlan database
vlan  700
ip igmp snooping 1
ip igmp snooping querier 1
exit
!
ip address 192.168.2.1 255.255.255.0
ip address vlan 700
ip routing
bridge multicast filtering
ip igmp snooping
ip igmp snooping querier
!
interface vlan 1
routing
ip address  192.168.20.1  255.255.255.0
exit
!
interface range ethernet 1/g1-1/g48
ip igmp snooping
spanning-tree portfast
exit
exit

Ok. So here are some questions that I have for Boot-Land.net members:

1. With PXELinux, is there a way to show transfer progress with XXX% rather than the little dots that stream across the console? The WinPE image is 180MB and it would be nice to see actual progress rather than just a screen full of periods. Not a huge priority, mostly cosmetic.

Regarding my current PXELinux setup, I am using pxelinux.0 and vesamenu.c32 from version 3.86 of Syslinux, and memdisk from version 4.03-pre2. I haven't tried running fully from 4.03-pre2 on the new server, but when we set up the old Dell server 2 weeks ago we had major transfer issues with the v4 files. After playing around with it I found the current setup to work best, but I will take a look at using full v4 this afternoon.

2. I'm pretty sure this is an issue with the actual PXE or TFTP protocols, but when there are multiple units transferring via TFTP, and some multicast sessions are operating, many units are not able to receive the PXE boot files and loading the ISOs slows to a crawl. I think I remember reading at one point that because TFTP is a pretty simple protocol, it doesn't have any kind of quality of service or prioritization mechanism. This causes problems, because if we add a new unit when there is a lot of network activity, we cannot get it loaded into the PXE or Ghost environment to start a fresh session.

3. I've solved a previous issue of lengthy selection of the boot file with the Option 209 and a direct path to the pxelinux.cfg/default and it seems to work in both PXE Compatibility mode and Option Negotiation, so I was wondering if there was any benefit to using one over the other.

4. We have had some slight issues with DHCP addressing between TFTPd32 and the DLink router, and after doing some research I believe I need to set up TFTPd32 as a DHCP Relay, so that it just performs the TFTP duties instead of both DHCP and TFTP. Currently we have an address pool of 192.168.20.10-90 assigned by TFTPd32, and the DLink assigns 192.168.20.91-254. The theory behind that was to have TFTPd32 assign addresses for the initial PXE boot and TFTP transfer, and then have the DLink assign addresses once the Ghost WinPE booted. When we removed the DLink router from the network, TFTPd32 seemed to have problems assigning and maintaining addresses, and performance slowed as well. I found an app called Serva, which incorporates TFTPd32 and a few other programs to provide more features, but its DHCP Relay didn't really seem to work, and it had all the same options for the TFTPd32 module as the standalone. I'm not sure if the Serva project is being developed with any kind of association with Boot-Land.net, but I noticed that TFTPd32 hasn't been updated since last year.
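As a sanity check on the split address pools in question 4, here is a small sketch that confirms the two ranges do not overlap and both sit inside the same subnet. The /24 mask is taken from the 255.255.255.0 in the switch configuration; the helper names are invented for the example.

```python
import ipaddress

SUBNET = ipaddress.ip_network("192.168.20.0/24")


def pool(first: str, last: str) -> tuple[int, int]:
    """Represent an inclusive DHCP pool as a pair of integer addresses."""
    lo, hi = ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    if lo not in SUBNET or hi not in SUBNET:
        raise ValueError(f"{first}-{last} is outside {SUBNET}")
    if lo > hi:
        raise ValueError("pool start is above pool end")
    return int(lo), int(hi)


def pools_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if the two inclusive ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


tftpd32_pool = pool("192.168.20.10", "192.168.20.90")
dlink_pool = pool("192.168.20.91", "192.168.20.254")
print("pools overlap:", pools_overlap(tftpd32_pool, dlink_pool))
```

Non-overlapping pools only rule out duplicate leases. Both servers still sit on the same broadcast domain and both can answer a client's request, so which one a given laptop ends up using is not determined by the pool split.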

Here are the settings that seem to give the best performance in TFTPd32 so far.

DHCP & TFTP Servers enabled
DHCP is bound to the proper network, but Ping before assignation and persistent leases are disabled.
TFTP Security set to None
Timeout set to 3 seconds
Max Retransmit 6
TFTP Port 69 (Firewalls are also disabled on the server and the router)
Option Negotiation enabled
Translate Unix File Names
TFTP is also bound to the proper network
Anticipation window is set to 4096 - This has seemed to give the fastest transfer speeds, but I switch between 512, 1024, 2048, and sometimes 8192. nVidia network adapters do not like the anticipation window modification and I usually have to turn it off or down to 128/256.

Another issue with TFTP is that when the transfer initiates, 4-5 copies of each file appear in the TFTP Server page progress list, then after a few seconds all but one error out and the transfer begins. I have fiddled with the different settings but they don't seem to have any effect. I also get a lot of "Ack block xxxxx ignored [recieved twice]" entries in the log. Timeouts also show up under heavy network load.

I think there may be some modifications to be made to the Dell switch to give the PXE and TFTP packets a higher priority. We were also going to try adding a second network card to run DHCP and TFTP on, leaving the Ghost multicasting on the existing Gigabit adapter.
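For a sense of scale on the TFTP side, the sketch below counts how many data blocks the 180MB WinPE image needs. The facts it relies on are from the TFTP RFCs: the default block size is 512 bytes, the block number is a 16-bit field, and a transfer ends with a block shorter than the block size. The assumptions are mine: 1 MB = 2^20 bytes, and 1468 bytes as an example negotiated block size (the largest payload that fits a 1500-byte MTU after 20 bytes of IP, 8 of UDP and 4 of TFTP header). Whether any of this is connected to the log entries above is not established.

```python
MAX_BLOCK_NUMBER = 65535          # block number is a 16-bit field
IMAGE_BYTES = 180 * 2**20         # assumption: "180MB" means 180 * 2**20 bytes


def tftp_data_blocks(file_bytes: int, block_size: int = 512) -> int:
    """Number of DATA packets needed to send a file.

    The last packet must be shorter than block_size, so a file that is an
    exact multiple of block_size needs one extra, empty packet.
    """
    return file_bytes // block_size + 1


def block_number_wraps(blocks: int) -> int:
    """How many times a 16-bit block counter rolls over during the transfer."""
    return blocks // (MAX_BLOCK_NUMBER + 1)


for size in (512, 1468):          # 1468 is an illustrative value, not a measured one
    blocks = tftp_data_blocks(IMAGE_BYTES, size)
    print(f"block size {size}: {blocks} blocks, "
          f"counter wraps {block_number_wraps(blocks)} time(s)")
```

Either way the image needs more blocks than a 16-bit counter can number without wrapping, so both the server and the PXE client have to handle rollover for the transfer to complete.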

Well I think this post is long enough for now, might add some more later, but I think this should be enough to get a start on.

Really looking forward to some input. Any help is appreciated!

Thanks,

Graham

#2 v_h

v_h

    Newbie

  • Members
  • 21 posts

Posted 29 September 2010 - 11:30 PM

Since no one has answered you, I'll take a shot, but my experience with Ghost is nowhere near the scale you are working at.

I have a couple of suggestions. The first is to put the old Ghost server back in service. Leave the 2 Dell switches unconnected from each other and run them in parallel.

Your disk setup is pretty slow. I would forget about the RAID 1 setup and grab a couple of large (1 or 2 TB) Caviar Black drives instead. For recovery, you can make a Ghost image of the Ghost server and back up the image files to those green drives periodically, depending on how often your image files change. Split the Ghost images between the two main drives and try to split the load between them. You can also sync the image folders so that both drives are identical and alternate the drives for concurrent sessions.

I would drop TFTPd32 altogether. I use TFTPd32 for on-demand use but found that it is not reliable when running continuously. Server 2003 probably has the DHCP Server service, and you need to add the option (67, IIRC) to point to pxelinux.0. I don't remember whether Server 2003 has a TFTP server built in, but there are plenty of free TFTP servers. The one usually mentioned in the Symantec Ghost forum is the SolarWinds TFTP server. On the Windows 7 machine, you have the DLink handing out DHCP addresses, so you need a PXE and TFTP server. I just use the 3Com Boot Services that used to come with Ghost. I don't know if it is still available. If you don't have access to the 3Com software, I think the Ghost CD comes with Altiris Deployment Center or something like that. I remember installing part of it to get WinImage. That WinImage is an old version but very useful for modifying boot floppy images. I have also used Microsoft Remote Boot Service and the TFTP service from the Windows Embedded CD.

If you think that TFTP is slow, you may want to switch to gPXE so that TFTP is only needed for the gpxelinux.0 file and the menu. The boot image can be sent by HTTP. If all your Ghost sessions use Ghost WinPE, you may want to check out the old DOS boot floppy image. It transfers very quickly and boots to Ghost in seconds. I use the universal packet driver v2 and it works with at least 90 percent of our computers, but ours is a pretty homogeneous shop: nearly 100% of the desktops are Dell, and the laptops are a mix of Dells, Panasonics and GD/Itronix. Just remove the mouse driver; it locks up some computers so Ghost won't start. The DOS version is also easier to automate. For those where the DOS disk won't work, I use LiveXP and ghost32. It is much smaller and boots faster than the WinPE disk. I have two: one is a stripped-down BootSDI build, compressed and booted with startrom.com, for low-memory systems, and the other is a more full-featured general-purpose Wimboot one.
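For anyone unsure what "adding the option" amounts to on the wire, here is a minimal sketch of how DHCP options are encoded: one byte of option code, one byte of length, then the value. Option 67 is the standard bootfile-name option, and option 209 is the PXELINUX config-file option mentioned in the first post. The helper function is hypothetical and only builds the bytes; it is not how any particular DHCP server is configured.

```python
BOOTFILE_NAME = 67           # standard DHCP bootfile-name option
PXELINUX_CONFIG_FILE = 209   # PXELINUX-specific config-file option


def encode_dhcp_option(code: int, value: str) -> bytes:
    """Encode one DHCP option as code, length, value (ASCII text value)."""
    data = value.encode("ascii")
    if not 0 < code < 255:
        raise ValueError("option code must be 1-254 (0 is pad, 255 is end)")
    if len(data) > 255:
        raise ValueError("a single option can carry at most 255 bytes")
    return bytes([code, len(data)]) + data


options = (
    encode_dhcp_option(BOOTFILE_NAME, "pxelinux.0")
    + encode_dhcp_option(PXELINUX_CONFIG_FILE, "pxelinux.cfg/default")
)
print(options.hex(" "))
```

The file names are the ones already used in this thread; a real server would send these alongside its other options rather than on their own.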

I hope that some of the suggestions help with your problem.

#3 patpat

patpat

    Member

  • Banned
  • 48 posts
  •  
    United States

Posted 14 October 2010 - 04:58 PM

Having TFTPd32 and the DLink router assigning IPs at the same time is not good... You can let the DLink assign IPs and use Serva32 as a "proxy" DHCP.
When used as a proxy DHCP, Serva's DHCP only answers PXE clients, with a special "complementary" DHCP packet containing the name of the file to boot from (pxelinux.0) and the IP of the TFTP server (Serva's own IP).
Serva's IP can be dynamically assigned by the DLink. In this case, remember not to bind the proxy DHCP and TFTP services to any particular address. You can also assign a fixed IP to Serva's interface (coordinated with the DLink pool) and bind Serva's proxy DHCP and TFTP services to it.
Serva32 has a general log where the activity of all the protocols can be traced (including PDHCP & TFTP). If you can post it here I might be able to help you a bit more with this.

I'm using Serva on a daily basis without problems, but not on the scale of your project. Serva has no association of any kind with Boot-Land.net.



