I've posted most of this information over on Symantec's forums, but I had some non-Ghost-related issues that I need some insight on, and figured I might as well give as much information as possible.
I got some awesome help from a few boot-land.net members over the last year when this project was in its infancy and I've taken that and built a pretty nice setup. Now it just needs some fine tuning.
Most of this info is related to my Ghost issues, but if anyone has advice that would be great. OK, so here goes:
We are currently in the process of developing a large imaging infrastructure for our refurbishing department. This project has been going on for just over a year and has evolved through several stages since its inception. Right now the system is functional, although we think the performance could be much better.
I'll start by explaining the physical setup of the system, the hardware we're using, and the basic process.
We currently have a large shelving area set up to support about 180 un-boxed laptops, with AC adapters plugged in and connected to the network with Cat5e. 95% of the laptops only have 10/100 Ethernet adapters. These are connected to two Dell PowerConnect 6248 48-port managed Gigabit switches (two more are on the way to fully connect all the units). The switches are currently connected to each other with just Cat5e as well, and one of them is connected to our new imaging server. The server is a quad-core Q8300 with 6 GB of RAM and two 2TB RAID1 storage arrays (4 Western Digital 2TB SATAII 5900RPM Low Power drives) to store all the images, running Windows 7. There is also a D-Link DIR-655 Gigabit router that is set to help with DHCP addressing.
On the software side of things, I have a PXE/TFTP boot system set up using TFTPd32 and PXELinux, with menu options for Darik's Boot and Nuke, which we use to perform DoD 5220.22-M 7-pass encrypted drive scrubbing, and a custom ISO of Symantec's Ghost WinPE.
We just recently upgraded to the new image server from an older Dell Xeon based system with Windows Server 2003 which was getting bogged down by multiple multicast sessions with Ghost Server. Earlier on in the project we had large batches of similar units, 30-40 of the same laptops, but now we have up to 15 different models, in batches of usually 3-10 of the same system.
We upgraded to the managed switches after getting abysmal performance from regular Gigabit switches, but even with all the new hardware, most batches only restore at about 250MB/min if we're lucky, and usually sink down to 120-150MB/min. With some of the images exceeding 25GB, it takes a very long time to process the systems (3-5 hours). We are not necessarily running 15 multicast sessions all the time, usually just 4-5, but ideally we would like to be able to image a full load of 180 systems in a single day.
The systems are un-boxed, set up by our warehouse staff and physically connected to the network, then our technician boots them to PXE and starts the scrubbing process, which we let run overnight. The next morning they get reloaded into the WinPE-512 environment, which offers a menu to connect to one of 15 restore sessions or 3 create sessions, and to run Speccy to verify system specifications during the restore.
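To put the rates above in terms of wall-clock time, here is a rough Python sketch. It assumes Ghost's MB/min figure means megabytes per minute, that 1 GB = 1024 MB, and that the rate stays constant for the whole restore; the function name is just for illustration.

```python
# Rough restore-time estimate for a single image.
# Assumptions: 1 GB = 1024 MB, constant transfer rate, no retries or stalls.

def restore_hours(image_gb: float, rate_mb_per_min: float) -> float:
    """Hours needed to restore an image of image_gb at rate_mb_per_min."""
    return image_gb * 1024 / rate_mb_per_min / 60


if __name__ == "__main__":
    for rate in (120, 150, 250, 500):
        print(f"25 GB at {rate} MB/min: {restore_hours(25, rate):.1f} h")
```

By this estimate a 25GB image needs roughly three hours at 150MB/min, so anything larger or slower lands in the 3-5 hour range mentioned above.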
At this point, like I said, the system is functional, just slow. We have the Dell switches set up for IGMP snooping with help from Dell on the configuration, and according to several monitoring tools, the imaging server seems to send out data at around 20-40MB/s with 4-5 concurrent sessions. Unicast create sessions seem to run fine, usually around 450-600MB/min on the 10/100 units, and I tested it with my HP 6930p and got 1400-1600MB/min on a create session. Internal transfer speeds between the RAID drives are around 100-125MB/s, and Windows-based file transfers between my 6930p and the image server over the Gigabit connection were doing about 70MB/s. When a single session of 2-3 systems is restoring by itself, transfer speeds are usually around 300-500MB/min.
I'm kind of stumped at the moment. Could this be a limitation of Ghost that is causing the slow speeds? Are we trying to push it too far? I think the Gigabit connection should be able to support at least 7-9 sessions running at 300-500MB/min, leaving some room for packet overhead. The server seems to be plenty powerful, not exceeding 50% CPU utilization even with 15 sessions running. I have added the correct drivers whenever we run into a new system that the Ghost WinPE doesn't support, so I don't believe it is a driver issue on the laptops.
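The Gigabit estimate can be checked with some quick arithmetic. The Python sketch below converts the MB/min figures to Mbit/s and compares them against the server link and a single 10/100 laptop port. It assumes MB/min means megabytes per minute and ignores protocol overhead. The `flooded` case is purely hypothetical: it models what a laptop port would see if IGMP snooping were not pruning traffic and every session's stream reached every port. Nothing measured so far establishes that this is happening.

```python
# Back-of-the-envelope link budget for the figures quoted in this post.
# Assumes "MB/min" means megabytes per minute; protocol overhead is ignored.

GIGABIT_MBIT = 1000        # server NIC and switch uplink
FAST_ETHERNET_MBIT = 100   # most of the laptop ports (10/100)


def mbit_per_s(rate_mb_per_min: float) -> float:
    """Convert a Ghost-style MB/min figure to Mbit/s."""
    return rate_mb_per_min * 8 / 60


def server_load(sessions: int, rate_mb_per_min: float) -> float:
    """Total Mbit/s leaving the server: one multicast stream per session."""
    return sessions * mbit_per_s(rate_mb_per_min)


def port_load(sessions: int, rate_mb_per_min: float, flooded: bool) -> float:
    """Mbit/s arriving at one laptop port.

    flooded=True is the hypothetical case where multicast is not pruned,
    so every port receives every session's stream.
    """
    if flooded:
        return server_load(sessions, rate_mb_per_min)
    return mbit_per_s(rate_mb_per_min)


if __name__ == "__main__":
    for sessions, rate in ((5, 300), (9, 500)):
        total = server_load(sessions, rate)
        flooded = port_load(sessions, rate, flooded=True)
        print(f"{sessions} sessions at {rate} MB/min: "
              f"server {total:.0f}/{GIGABIT_MBIT} Mbit/s, "
              f"flooded port {flooded:.0f}/{FAST_ETHERNET_MBIT} Mbit/s")
```

Nine sessions at 500MB/min come to about 600 Mbit/s, which fits on the Gigabit link as estimated. In the flooded case, though, even five sessions at 300MB/min would add up to about 200 Mbit/s at each laptop port, twice what a 10/100 adapter can accept.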
Any suggestions would be greatly appreciated. Below I've included the configuration file that Dell recommended, so if there is anyone with PowerConnect experience they might be able to point out an issue.
console#show run
!Current Configuration:
!System Description "Dell 24 Port Gigabit Ethernet, 2.0.0.12, VxWorks5.5.1"
!System Software Version 2.0.0.12
!
configure
!
vlan database
vlan 700
ip igmp snooping 1
ip igmp snooping querier 1
exit
!
ip address 192.168.2.1 255.255.255.0
ip address vlan 700
ip routing
bridge multicast filtering
ip igmp snooping
ip igmp snooping querier
!
interface vlan 1
routing
ip address 192.168.20.1 255.255.255.0
exit
!
interface range ethernet 1/g1-1/g48
ip igmp snooping
spanning-tree portfast
exit
exit
Ok. So here are some questions that I have for Boot-Land.net members:
1. With PXELinux, is there a way to show transfer progress with XXX% rather than the little dots that stream across the console? The WinPE image is 180MB and it would be nice to see actual progress rather than just a screen full of periods. Not a huge priority, mostly cosmetic.
Regarding my current PXELinux setup, I am using pxelinux.0 and vesamenu.c32 from version 3.86 of Syslinux, and memdisk from version 4.03-pre2. I haven't tried running entirely from 4.03-pre2 on the new server, but when we set up the old Dell server 2 weeks ago we had major transfer issues with the v4 files. After playing around with it I found the current setup to work best, but I will take a look at using full v4 this afternoon.
2. I'm pretty sure this is an issue with the actual PXE or TFTP protocols, but when there are multiple units transferring via TFTP, and some multicast sessions are operating, many units are not able to receive the PXE boot files and loading the ISOs slows to a crawl. I think I remember reading at one point that because TFTP is a pretty simple protocol, it doesn't have any kind of quality of service or prioritization mechanism. This causes problems, because if we add a new unit when there is a lot of network activity, we cannot get it loaded into the PXE or Ghost environment to start a fresh session.
3. I've solved a previous issue of lengthy selection of the boot file with the Option 209 and a direct path to the pxelinux.cfg/default and it seems to work in both PXE Compatibility mode and Option Negotiation, so I was wondering if there was any benefit to using one over the other.
4. We have had some slight issues with DHCP addressing between TFTPd32 and the D-Link router, and after doing some research I believe I need to set up TFTPd32 as a DHCP Relay, so that it just performs the TFTP duties instead of both DHCP and TFTP. Currently we have an address pool of 192.168.20.10-90 assigned by TFTPd32, and the D-Link handles 192.168.20.91-254. The theory behind that was to have TFTPd32 assign addresses for the initial PXE boot and TFTP transfer, and then have the D-Link assign addresses once the Ghost WinPE booted. When we removed the D-Link router from the network, TFTPd32 seemed to have problems assigning and maintaining addresses, and performance slowed as well. I found an app called Serva, which incorporates TFTPd32 and a few other programs to provide more features, but its DHCP Relay didn't really seem to work, and it had all the same options for the TFTPd32 module as the standalone. I'm not sure if that Serva project is being developed with any kind of association with Boot-Land.net, but I noticed that TFTPd32 hasn't been updated since last year.
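As a sanity check on the address split in question 4, here is a small Python sketch that counts the addresses in each pool and confirms the two ranges do not overlap. The pool boundaries and the figure of 180 units come from this post; the helper names are just for illustration, and the sketch says nothing about lease times, which also affect how quickly addresses are reused.

```python
import ipaddress

# Pool boundaries as described above (inclusive).
TFTPD32_POOL = ("192.168.20.10", "192.168.20.90")
DLINK_POOL = ("192.168.20.91", "192.168.20.254")
LAPTOPS = 180  # full shelf load


def pool_size(first: str, last: str) -> int:
    """Number of addresses in an inclusive first..last range."""
    return int(ipaddress.IPv4Address(last)) - int(ipaddress.IPv4Address(first)) + 1


def pools_overlap(a: tuple, b: tuple) -> bool:
    """True if two inclusive address ranges share at least one address."""
    a_first, a_last = (int(ipaddress.IPv4Address(x)) for x in a)
    b_first, b_last = (int(ipaddress.IPv4Address(x)) for x in b)
    return a_first <= b_last and b_first <= a_last


if __name__ == "__main__":
    print("TFTPd32 pool:", pool_size(*TFTPD32_POOL), "addresses")
    print("D-Link pool: ", pool_size(*DLINK_POOL), "addresses")
    print("Overlap:     ", pools_overlap(TFTPD32_POOL, DLINK_POOL))
    print("Laptops:     ", LAPTOPS)
```

The ranges do not overlap, but the TFTPd32 pool holds only 81 addresses and the D-Link pool 164, so neither server on its own can hand a lease to all 180 units at once. Whether that matters in practice depends on lease times and on how many units boot at the same moment.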
Here are the settings that seem to give the best performance in TFTPd32 so far.
DHCP & TFTP Servers enabled
DHCP is bound to the proper network, but Ping before assignation and persistent leases are disabled.
TFTP Security set to None
Timeout set to 3 seconds
Max Retransmit 6
TFTP Port 69 (Firewalls are also disabled on the server and the router)
Option Negotiation enabled
Translate Unix File Names
TFTP is also bound to the proper network
Anticipation window is set to 4096 - This has seemed to give the fastest transfer speeds, but I switch between 512, 1024, 2048, and sometimes 8192. nVidia network adapters do not like the anticipation window modification and I usually have to turn it off or down to 128/256.
Another issue with TFTP is that when the transfer initiates, 4-5 copies of each file appear in the TFTP Server page progress list, then after a few seconds all but one error out and the transfer begins. I have fiddled with the different settings but they don't seem to have any effect. I also get a lot of "Ack block xxxxx ignored [recieved twice]" entries in the log. Timeouts also show up under heavy network load.
I think there may be some modifications to be made to the Dell switch to give the PXE and TFTP packets a higher priority. We were also going to try adding a second network card to run DHCP and TFTP on, and leave the Ghost multicasting on the existing Gigabit adapter.
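For context on why TFTP is so sensitive to network load: in basic TFTP (RFC 1350) each data block has to be acknowledged before the next one is sent, so the transfer time is roughly the number of blocks multiplied by the round-trip time. The Python sketch below works that out for the 180MB WinPE image. The round-trip times are made-up illustration values, not measurements, and the window model (send `window_bytes` worth of blocks per round trip) is a simplification that assumes the TFTPd32 anticipation window is expressed in bytes.

```python
# Simplified model of TFTP transfer time for the 180 MB WinPE image.
# Assumptions: RTT values are hypothetical; the anticipation window is
# treated as "bytes allowed in flight per round trip".

IMAGE_BYTES = 180 * 1024 * 1024
MAX_BLOCK_NUMBER = 65535  # TFTP block numbers are 16 bits wide


def block_count(size_bytes: int, blksize: int = 512) -> int:
    """Data blocks needed; the final block is always shorter than blksize."""
    return size_bytes // blksize + 1


def transfer_seconds(size_bytes: int, blksize: int, rtt_s: float,
                     window_bytes: int = 0) -> float:
    """Estimated time when each round trip carries one window of blocks."""
    blocks = block_count(size_bytes, blksize)
    per_round_trip = max(1, window_bytes // blksize)
    round_trips = -(-blocks // per_round_trip)  # ceiling division
    return round_trips * rtt_s


if __name__ == "__main__":
    blocks = block_count(IMAGE_BYTES)
    print(f"512-byte blocks needed: {blocks}")
    print(f"Block counter must wrap: {blocks > MAX_BLOCK_NUMBER}")
    for rtt_ms in (1, 10):
        for window in (0, 4096):
            secs = transfer_seconds(IMAGE_BYTES, 512, rtt_ms / 1000, window)
            print(f"RTT {rtt_ms} ms, window {window} bytes: {secs / 60:.1f} min")
```

Under this model a tenfold increase in round-trip time means a tenfold increase in load time, which would be consistent with ISO loading slowing to a crawl while multicast sessions are running. The image also needs far more than 65535 blocks at 512 bytes each, so the block counter has to wrap several times during one transfer.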
Well I think this post is long enough for now, might add some more later, but I think this should be enough to get a start on.
Really looking forward to some input. Any help is appreciated!
Thanks,
Graham