
gPXE not handing off iSCSI connection to Windows installer.

iscsi gpxe windows

39 replies to this topic

#1 sfinktah

sfinktah

    Frequent Member

  • Advanced user
  • 217 posts
  • Location:Der Äther
  • Interests:/(C(++|#)|P(HP|XE)|(OS|Linu)X|8051)/
  •  
    Australia

Posted 05 December 2011 - 07:38 PM

Tested with Windows 7, Windows 7 SP1, Windows 2K8, Windows 2K8 SP1 for DVD install. (Including Win PE via TFTP methods).
Tested with openfiler and Microsoft iSCSI Software Target.
Tested with VMware E1000 virtual environment, and two physical machines (Gigabyte Z68 / i7 2600k).
Tested with gPXE 1.0.1


This part works fine:

Posted Image



IP minint-cd1j4jq.lan.1024 > win-2008-dev.lan.iscsi-target: Flags [P.], seq 49:154, ack 1, win 4096, options [nop,nop,TS val 407500 ecr 669092], length 105
   0x0000:  4500 009d 0308 0000 4006 f2ff c0a8 0173  E.......@......s
   0x0010:  c0a8 0190 0400 0cbc 4bdf 37a0 1376 a597  ........K.7..v..
   0x0020:  8018 1000 09f4 0000 0101 080a 0006 37cc  ..............7.
   0x0030:  000a 35a4 496e 6974 6961 746f 724e 616d  ..5.InitiatorNam
   0x0040:  653d 6971 6e2e 3230 3030 2d30 392e 6f72  e=iqn.2000-09.or
   0x0050:  672e 6574 6865 7262 6f6f 743a 554e 4b4e  g.etherboot:UNKN
   0x0060:  4f57 4e00 5461 7267 6574 4e61 6d65 3d69  OWN.TargetName=i
   0x0070:  716e 3a74 6172 6765 7400 5365 7373 696f  qn:target.Sessio
   0x0080:  6e54 7970 653d 4e6f 726d 616c 0041 7574  nType=Normal.Aut
   0x0090:  684d 6574 686f 643d 4e6f 6e65 00         hMethod=None.


and so forth, and so on.

However, when the operating system being installed attempts to take over, it enters into a SYN/RST loop.


Last of the gPXE packets (note originating port 1024)

IP minint-cd1j4jq.lan.1024 > win-2008-dev.lan.iscsi-target: Flags [P.], ack 4481, win 4096, options [nop,nop,TS val 408816 ecr 676838], length



First packets from Windows Installation (note originating port 49158)

IP minint-cd1j4jq.lan.49158 > win-2008-dev.lan.iscsi-target: Flags [S], seq 1493702989, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length
IP win-2008-dev.lan.iscsi-target > minint-cd1j4jq.lan.49158: Flags [R.], seq 0, ack 1493702990, win 0, length 0


This just repeats. In fact, after writing all this, I turned my sniffer back on to check:


IP minint-f4151q2.lan.49335 > win-2008-dev.lan.iscsi-target: Flags [S], seq 1797995141, win 8192, options [mss 1460,nop,nop,sackOK], length 0
IP win-2008-dev.lan.iscsi-target > minint-f4151q2.lan.49335: Flags [R.], seq 0, ack 1797995142, win 0, length 0
IP minint-f4151q2.lan.49336 > win-2008-dev.lan.iscsi-target: Flags [S], seq 4194978989, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
IP win-2008-dev.lan.iscsi-target > minint-f4151q2.lan.49336: Flags [R.], seq 0, ack 1, win 0, length 0



At no point is the original connection from gPXE (port 1024) released... and at this point I do not have the relevant information to determine whether it should be. But I know that two simultaneous connections are just not going to happen.

To see if I could remedy the situation, I enabled timeouts in the Microsoft iSCSI target, and it obligingly dropped the original gPXE connection from port 1024. However, the SYN/RST loop continues, and a new connection is not established.
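
For anyone wanting to spot this pattern in a longer capture, here is a minimal Python sketch that scans tcpdump summary lines like the ones above and reports every SYN that was answered with a RST. The function name `refused_syns` and the regular expression are illustrative assumptions written for this output format, not part of any tool.

```python
import re

# Matches tcpdump summary lines of the form used in this thread:
#   "IP src.port > dst.port: Flags [S], ..."
LINE_RE = re.compile(r"^IP (\S+) > (\S+): Flags \[([^\]]*)\]")


def refused_syns(lines):
    """Return (client, server) endpoint pairs whose SYN was answered by a RST."""
    pending = set()  # (src, dst) of SYNs that have not been answered yet
    refused = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # hex dump lines and anything else are ignored
        src, dst, flags = m.groups()
        if flags == "S":
            pending.add((src, dst))
        elif "R" in flags and (dst, src) in pending:
            # The reply travels in the opposite direction of the SYN.
            pending.discard((dst, src))
            refused.append((dst, src))
    return refused


capture = [
    "IP minint-f4151q2.lan.49335 > win-2008-dev.lan.iscsi-target: Flags [S], seq 1797995141, win 8192, options [mss 1460,nop,nop,sackOK], length 0",
    "IP win-2008-dev.lan.iscsi-target > minint-f4151q2.lan.49335: Flags [R.], seq 0, ack 1797995142, win 0, length 0",
]

for client, server in refused_syns(capture):
    print(f"{client} -> {server}: SYN refused with RST")
```

A capture that shows only such pairs and never a SYN/ACK means the connection is being refused before any iSCSI login is attempted.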

Edited by sfinktah, 05 December 2011 - 07:42 PM.


#2 erwan.l

erwan.l

    Platinum Member

  • Developer
  • 3041 posts
  • Location:Nantes - France
  •  
    France

Posted 05 December 2011 - 07:48 PM


I did install windows 2008 over iscsi a few times with gpxe.
Did you try previous versions or ipxe?

I was myself using starwind as an iscsi target.

/Erwan

#3 erwan.l

Posted 05 December 2011 - 08:18 PM

also, a while ago I had written this : how to install windows 2008 thru gpxe + iscsi.
maybe it can help?
the keep-san option was the trick for me back then.

/erwan

edit : forgot the link http://erwan.l.free....si/install.html

#4 sambul61

sambul61

    Gold Member

  • Advanced user
  • 1568 posts
  •  
    American Samoa

Posted 06 December 2011 - 02:03 AM

also, a while ago I had written this : how to install windows 2008 thru gpxe + iscsi.

Hi Erwan,

Could you provide a link to this material?

#5 sfinktah

Posted 06 December 2011 - 02:10 AM

Hahaha, oh erwan... if only you knew.... I recognised your name from your iscsi guide before I even got to read the post. I had actually just looked at it again today, as I was looking to see what versions of gPXE were "known working". Your site came up tops in the images search, and I noticed you were using 1.01+ too.

I will certainly try StarWind as a target... mostly the trouble with these guides (and yours is by far the most recent), is that you can't really repeat them exactly...

there is definitely something weird happening at a network level though. even though it's trying to reconnect a second time, the iSCSI server should still be accepting the connection to see what target it wants.

the other weird thing i noticed was that pinging the iscsi server from a cmd window during setup was returning crazy destination unreachable messages. definitely something screwy in the networking department i say.

starwind it is then... and maybe a 0.96 vintage of gPXE or so, i'm not sure which was a good year :P

#6 sfinktah

Posted 06 December 2011 - 02:23 AM

sambul, lol... you pasted me his link yesterday :P

anyway, it's not a question of "knowing how to do it", i already know how.

i could write 4 different guides without consulting any reference materials.

i know how to do it, the problem is it doesn't work. lol. no matter which method i use, they all do the same thing... they sit for about 60 seconds or more at two points during the install... (which i now know is because it's repeatedly trying to re-connect to the iscsi server)... then they give up.

if for any reason i don't get the iscsi right (say, if i forgot the keep-san option), then setup zips right past those points.

ironically, i can install windows xp over iscsi without any issue at all.

believe me, i've tried a lot of ways. i've booted gpxe from floppy, from usb, from pxe, i've even tried to get it burned into my bios (had to settle for a vmware bios replacement instead). i've chain loaded winpe and tried to setup from there, i've run setup from dvd, pxe, and over netbios....

i've even written a moa241 iscsi plugin pack that gives you the complete iscsi targeting system in the PE environment. and it works - you can view all the iscsi targets you like - you just can't install to them because they don't have iBFTs.

i think the only thing i haven't tried is using a microsoft tftp server. but i have tried with no tftp server at all... so i can't see that being the issue. still, i'll try anything.

Edited by sfinktah, 06 December 2011 - 02:30 AM.


#7 erwan.l

Posted 06 December 2011 - 05:19 AM

Hi Erwan,

Could you provide a link to this material?


post edited

#8 erwan.l

Posted 06 December 2011 - 05:23 AM

I will certainly try StarWind as a target... mostly the trouble with these guides (and yours is by far the most recent), is that you can't really repeat them exactly...


What do you mean by "can't repeat them exactly"?

About the SYN/RST loop, from a basic network point of view, it looks like the client (initiator) is trying to connect to the server (target) but the target is not accepting it (the SYN is answered with a RST instead of a SYN/ACK).
Have you checked the network rules on the target side?

/erwan

#9 erwan.l

Posted 06 December 2011 - 05:26 AM

the other weird thing i noticed was that pinging the iscsi server from a cmd window during setup was returning crazy destination unreachable messages. definitely something screwy in the networking department i say.


there is definitely something wrong at your network level or at your target level : ip conflict / mac conflict during the process?

also, for now i would stick to physical machines to exclude any issues around virtualisation. (i know you reported it fails in both worlds, but you could have several issues mingled)

finally, looking at your first screenshot it looks ok to me :
-you did connect to your iscsi target (so it works?) but got a boot failed because your disk is not installed yet
-next step is to move on with the cd installer where you should see your blank disk : look at step 5 in my procedure (link above), we basically have the same screenshot just before booting to CD.

/erwan

#10 sfinktah

Posted 07 December 2011 - 10:36 AM

reboot.nt4.com/Make_PE3_Plugin.iscsi_initiator.32_64.7z



#11 sfinktah

Posted 07 December 2011 - 11:14 AM

When I say you can't repeat the guides exactly, I'm referring to the age of the software (often preview releases or beta candidates), and also non-specificity of information regarding the windows installation.

e.g. I can see from my logs that the first test I did the other night was to install Windows 7 x64 Ultimate SP1 (6.1.7601.17514, win7sp1_rtm.101119-1850), using a winpe.wim created using the non-SP1 WAIK (7600.win7_rtm.090713-1255).

Here's what it looked like:


IBS    CallBack_LanguagePack_ReadLangIni:Successfully gathered language list from lang.ini.
IBS    Callback_BootstrapApplyWpeSettings: Detected iBFT; setup will initialize networking support for iSCSI
IBS    Computer is already named and no new name is specified, returning success.
IBS    Acquired profiling mutex
IBS    Install MS_MSCLIENT: 0x00000000
IBS    Install MS_NETBIOS: 0x00000000
IBS    Install MS_SMB: 0x00000000
IBS    Install MS_TCPIP6: 0x00000000
IBS    Install MS_TCPIP: 0x00000000
IBS    Service dhcp start: 0x00000000
IBS    Service lmhosts start: 0x00000000
IBS    Service ikeext start: 0x00000000
IBS    Service mpssvc start: 0x00000000
IBS    Released profiling mutex
IBS    Spent 141ms installing network components
IBS    Spent 1731ms installing network drivers
IBS    Spent 60841ms confirming network initialization; status 0x003d0002
IBS    Callback_BootstrapApplyWpeSettings: Successfully initialized iSCSI support
IBS    Callback_SetWinPEAndOSImageInfoOnBB: Cannot set image info as source path is not yet set.


Note the 60841ms delay. Now if someone were to post a "setupact.log" of an install that worked, I could check for differences...
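
To compare a working and a failing log quickly, a small Python sketch like the following pulls out the "Spent Nms ..." entries and flags the slow ones. The function name `slow_steps` and the 10-second threshold are arbitrary assumptions for illustration; the line format is taken from the excerpt above.

```python
import re

# "Spent <N>ms <description>" with an optional "; status ..." suffix,
# as seen in the setupact.log excerpt above.
SPENT_RE = re.compile(r"Spent (\d+)ms (.+?)(?:;|$)")


def slow_steps(log_lines, threshold_ms=10000):
    """Return (milliseconds, description) for every step at or above the threshold."""
    slow = []
    for line in log_lines:
        m = SPENT_RE.search(line.strip())
        if m and int(m.group(1)) >= threshold_ms:
            slow.append((int(m.group(1)), m.group(2).strip()))
    return slow


log = [
    "IBS    Spent 141ms installing network components",
    "IBS    Spent 1731ms installing network drivers",
    "IBS    Spent 60841ms confirming network initialization; status 0x003d0002",
]

for ms, step in slow_steps(log):
    print(f"{ms / 1000:.1f}s {step}")
```

Run against both logs, any step that appears for only one of them is a candidate for where the install diverges.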

I thoroughly agree with the real machine factor, it was really hard to be sure the packet sniffs I was seeing were 100% authentic, as both machines were being hosted on the same hypervisor - and I couldn't get a proper packet sniff because OpenFiler doesn't have TCPDUMP, and I'm fairly sure packet sniffing the Windows end is out of the question (plus it would likely affect the results).

It occurred to me last night after reading your last post, that I should attempt to try iSCSI installation of something else - e.g. a Linux distro - and compare the packet dumps to see what is *meant* to be happening.

I'm just setting up a StarWind target under vSphere 5.0, to do an install of a real PC. And then I'll swap, using a real iSCSI target, and a virtualized PC. But all my previous VM installs have been on VMware Fusion, so it will be something different at least.

#12 sfinktah

Posted 07 December 2011 - 12:41 PM

Well, ran a few tests... found a few interesting things... all those SYN/RST packets do not reach (or come from) the iSCSI target.

Ran across a new guide, written a few weeks ago, ... which somehow doesn't manage to cover anything that isn't in 100 other guides. Except for the fact that he used iPXE... which looks just like gPXE to me. But I found this tidbit on their website:

Posted Image

Which I think would be a real good idea. :)

#13 erwan.l

Posted 07 December 2011 - 07:36 PM

in my first post I was suggesting ipxe : do you actually get the same issue with ipxe?

also, note that you have all gpxe versions here : http://rom-o-matic.net/ .

this week end I will execute the procedure I wrote using latest gpxe and latest ipxe to see if it works fine on my side.

/Erwan

edit : also check that your iscsi target accepts anonymous connections and several connections at the same time.
to do so, use an iscsi initiator to test a successful connection to your target.

#14 erwan.l

Posted 07 December 2011 - 10:02 PM

update :

I downloaded the latest gpxe (1.0.1), installed starwind (an old 3.5.3 still on my hard drive), booted win 7 and all went fine.

Side note : I had to add a rule to my firewall to allow outgoing traffic for starwind.

So to me the gpxe part is fine.

/erwan

Attached Thumbnails

  • iscsi.png


#15 sfinktah

Posted 08 December 2011 - 05:26 AM

Ahh, crap. I just hit back and deleted my big post. Too lazy to write it all again.

Thanks a lot for running that test for me. I have actually managed to get iSCSI install to work.

It turns out, that something was causing Windows to use the MAC address of my DHCP/DNS/TFTP/GATEWAY server to attempt to contact the iSCSI target.

I also noticed that Windows does its own DHCP request, and doesn't request the extra options for iSCSI.

So I took the DHCP server out of the equation and wrote a script:

script.txt:

#!gpxe
ifopen net0
set net0/ip 192.168.1.115
set net0/netmask 255.255.255.0
set net0/gateway 192.168.1.138
set net0/dns 192.168.1.238
set keep-san 1
sanboot iscsi:192.168.1.8::::nas:iscsi
prompt --key 0x02 --timeout 2000 Press Ctrl-B for the iPXE command line... && shell || exit


and compiled into iPXE as an option rom for the virtual e1000 vmware nic:

make bin/8086100f.mrom DEBUG=scsi:3,iscsi:3 EMBED=script.txt

and inserted the required lines into the .vmx file for the virtual machine I was using:


ethernet0.opromsize = 262144
e1000bios.filename = "8086100f.mrom"


Set the boot order to DVD first, placed the official 7601.17514.101119-1850_x64fre_server_2008_r2_standard_enterprise_datacenter_and_web_with_sp1_x64_dvd_617601.iso (md5: 8dcde01d0da526100869e2457aafb7ca), and hit escape at boot to force it to load iPXE first.

That makes it drop nicely back to the DVD after the iSCSI initialization, and everything worked a charm...

Posted Image

Well... still haven't finished, but I can see my iscsi drive in diskpart now.

Posted Image

The moral of the story, is probably that I read somewhere in a Microsoft document that you should never use DHCP for iSCSI. I'm sure that isn't true entirely, but I think it would work a lot smoother if your DHCP or TFTP server were on the same IP as your iSCSI.

#16 bwiese

bwiese

    Newbie

  • Members
  • 12 posts
  •  
    United States

Posted 08 December 2011 - 06:11 AM

The Microsoft iSCSI Initiator has a poorly documented "feature" that I spent many hours fighting with just as you have.

When supplied with a gateway address, the Microsoft iSCSI Initiator always adds a route to your iSCSI target IP address (as provided in the handoff) through the gateway IP address. It does this regardless of the fact that they may be on the same subnet and that the route is completely unnecessary and in fact entirely wrong. The assumption on the part of the programmers was that the iSCSI net would be separate from the operational net, as the recommended best practice for a high performance iSCSI setup would dictate (isolate iSCSI traffic from the workload traffic). The conclusion on their part is that if you supply a gateway address, you must need it to reach the iSCSI Target.

Thus, once the handoff to the Microsoft iSCSI Initiator is made, 1) you can't ping your iSCSI target, but you can ping every other IP address on your network, 2) the iSCSI drive dies and won't reconnect, with netstat showing an inability to connect (as confirmed by ping). Doing a route print will reveal the damage. Your comment (below) describes this exactly.
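
The on-link test being described can be sketched in a few lines of Python with the standard `ipaddress` module. This only illustrates the subnet check and is not a description of how the Microsoft initiator is implemented; the function name `gateway_needed` is made up, and the addresses are the ones from the static script earlier in the thread.

```python
import ipaddress


def gateway_needed(client_ip, netmask, target_ip):
    """Return True only if the target is outside the client's own subnet."""
    # strict=False lets us pass a host address and derive its network.
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(target_ip) not in network


# Client and target from the static script posted above: same /24, so no
# gateway is required to reach the iSCSI target.
print(gateway_needed("192.168.1.115", "255.255.255.0", "192.168.1.8"))

# A target on another subnet would genuinely need the gateway.
print(gateway_needed("192.168.1.115", "255.255.255.0", "10.0.0.8"))
```

When this check says no gateway is needed, leaving the gateway out of what gPXE/iPXE hands over is the safe choice.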

<quote>It turns out, that something was causing Windows to use the MAC address of my DHCP/DNS/TFTP/GATEWAY server to attempt to contact the iSCSI target.</quote>

The interesting thing is that if you remove the gateway during gPXE/iPXE (either by script, or by DHCP option override), Windows will install fine, and then you can re-add the gateway manually from within Windows and regain upstream access post-install.

I have several dozen diskless computers operating with various versions and editions of Windows 7 (including Embedded Standard), all with Microsoft DHCP, Microsoft iSCSI Initiator, Microsoft iSCSI Software Target, and iPXE. Nothing more is required. I use Windows Deployment Services to install each OS, so I never even need a DVD (in fact none of my machines have DVD or floppy drives either).

My iPXE script is as follows --

#!ipxe
dhcp
clear net0.dhcp/gateway:ipv4
set initiator-iqn iqn.2010-04.org.ipxe:${net0/mac:hexhyp}
set root-path iscsi:${next-server}::::iqn.1991-05.com.microsoft:thin-${net0/mac:hexhyp}-target
set keep-san 1
sanboot ${root-path} ||
chain boot\x86\wdsnbp.com ||

With this arrangement, I merely need to name my iSCSI Targets according to the MAC address of each machine, add devices (VHD's), and I'm done. I don't even need to use DHCP reservations to provide unique iSCSI Targets to each machine.
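
As a rough aid for naming targets on the server side, this Python sketch builds the target name that the script above would request for a given MAC address. It assumes that `${net0/mac:hexhyp}` expands to lowercase hex bytes joined by hyphens; the helper names `hexhyp` and `target_iqn` are illustrative only.

```python
def hexhyp(mac):
    """Format a MAC as hyphen-separated lowercase hex bytes (assumed hexhyp format)."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    return "-".join(digits[i:i + 2] for i in range(0, 12, 2))


def target_iqn(mac):
    """Build the per-machine target name used in the iPXE script above."""
    return f"iqn.1991-05.com.microsoft:thin-{hexhyp(mac)}-target"


# MAC address taken from a DHCP capture elsewhere in this thread.
print(target_iqn("00:0c:29:8d:fa:c4"))
```

Check the result against what iPXE actually sends (for example in a packet capture of the login) before creating targets in bulk.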

Good luck to you! Let us know if this helps.

Edited by bwiese, 08 December 2011 - 06:19 AM.


#17 erwan.l

Posted 08 December 2011 - 07:24 AM

Note that you need a dhcp server that will send an empty boot filename when the dhcp user-class request contains "gPXE".
This way, at the second dhcp request (the one coming from gPXE), gPXE will then use the root path pointing to your iscsi target.
If you don't have this setup, gPXE will boot in a loop.

Your DHCP server also needs to pass the following option : keep-san.
If you don't have this option, the windows boot will not see your iscsi target.

Could it be that the issue was that your dhcp server was not set up properly? Not all dhcp servers can handle the 2 constraints above.

Alternative is indeed to use a manual script (gPXE or iPXE) like explained in here : http://erwan.l.free....si/install.html .

#!gpxe
echo "Greetings! Hit Ctrl-C to bail out."
sleep 5
echo "Going to DHCP on primary network adapter"
ifopen net0
dhcp net0
set keep-san 1
sanboot iscsi:192.168.1.100::::w2k8
boot

/Erwan

#18 bwiese

Posted 08 December 2011 - 06:49 PM

http://support.microsoft.com/kb/960104

If you start a Windows Server 2003 or Windows Server 2008 system using iSCSI Boot with the Microsoft iSCSI Boot Initiator, the gateway settings specified in the iSCSI Boot solution will always be used by Windows to reach the iSCSI Target hosting the boot drive, even if the iSCSI Target is directly reachable (on-link) and is not on a network disjoint from the iSCSI Boot NIC.

While starting, the Microsoft iSCSI Boot Initiator creates a static route to the iSCSI Target which contains the gateway specified in the iSCSI Boot solution. This route is a system-critical route that cannot be removed. Because the gateway from the iSCSI Boot solution is specified in the route, the gateway will always be used in communication with the iSCSI Target, even if the iSCSI Target is directly reachable (on-link) and a gateway is not required to communicate with the iSCSI Target.

In an iSCSI Boot environment, the optimal configuration is to have a NIC dedicated to iSCSI traffic and a separate NIC or NICs used for network communication with other servers or workstations. In an iSCSI Boot environment, the iSCSI Boot NIC being used to communicate with the iSCSI Target should only be used for communication with the iSCSI Target.

The NIC being used to communicate with the iSCSI Target should be configured to communicate with the iSCSI Target using the most efficient network route possible. For example, if a gateway is not needed to reach the iSCSI Target, then one should not be specified in the iSCSI Boot solution. This will prevent network traffic to the iSCSI Target from being unnecessarily routed through a gateway.


Note that most routers will never send a packet back out the same interface from which it originated. Thus, the packet dies at the gateway.

Edited by bwiese, 08 December 2011 - 06:52 PM.


#19 sfinktah

Posted 08 December 2011 - 10:07 PM

@bwiese: thanks for taking the time to sign up and reply. Your information will be of great help documenting a more robust procedure. I doubt I would have had the energy to nail down the exact origin of such a tricky problem, and I hadn't even considered that it might be adding routes.

When supplied with a gateway address, the Microsoft iSCSI Initiator always adds a route to your iSCSI target IP address (as provided in the handoff) through the gateway IP address. It does this regardless of the fact that they may be on the same subnet and that the route is completely unnecessary and in fact entirely wrong.


You hit the nail on the head. It also explains why there was no entry in the ARP table for the target, and why adding a static one made no difference. If I'd just issued one netstat -rn during the hundred netstat -an commands I ran, it would have been obvious.

Note that most routers will never send a packet back out the same interface from which it originated. Thus, the packet dies at the gateway.


Correct. Worse in my case, since it actually responded with a RST using the target's IP, instead of an ICMP from its own IP. It did reject pings with the correct ICMP replies, although I'm not sure Microsoft's implementation of PING actually showed this.

The interesting thing is that if you remove the gateway during gPXE/iPXE (either by script, or by DHCP option override), Windows will install fine, and then you can re-add the gateway manually from within Windows and regain upstream access post-install.


Interesting - I noticed that Windows sends a DHCP request of its own at the beginning of the "real" setup:


	   Hostname Option 12, length 14: "minint-gt1dkem"

	   Vendor-Class Option 60, length 8: "MSFT 5.0"


You can differentiate this from the PXE-based requests via the Vendor-Class, and theoretically from post-setup DHCP requests based on Hostname != minint-*

So it should be possible to remove the requirement to manually intervene.
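
The classification idea can be sketched as a small Python function. It is only a model of the decision, not a plugin for any particular DHCP server: the function name and the returned labels are made up for illustration, and it assumes PXE firmware announces itself with a vendor class starting with "PXEClient" while Windows Setup uses "MSFT 5.0" with a "minint-" hostname, as in the capture above.

```python
def classify_dhcp_client(vendor_class, hostname):
    """Guess which boot stage sent a DHCP request from options 60 and 12."""
    vendor_class = vendor_class or ""
    hostname = (hostname or "").lower()
    if vendor_class.startswith("PXEClient"):
        return "pxe"  # firmware or gPXE/iPXE stage
    if vendor_class == "MSFT 5.0":
        # Windows Setup / WinPE picks a temporary "minint-..." hostname.
        if hostname.startswith("minint-"):
            return "windows-setup"
        return "windows-installed"
    return "other"


print(classify_dhcp_client("MSFT 5.0", "minint-gt1dkem"))
```

A DHCP server that can branch on these two options could then, in principle, hand the gateway only to the stages that are safe to receive it.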

I haven't looked at WDS yet, my interests are rather academic once I have removed the requirement for those damnable DVDs and unlabelled flash drives.

I have a (respectable) number of bootable ISOs, live linux operating systems, recovery/backup tools, and a WinPE (it is somewhat difficult to have multiple WinPE/PXE setups when they all insist on using /Boot and / on the tftp server). Perhaps we should swap notes. I use pxelinux (syslinux) to navigate them, your setup may not need or require such.

I based my setup on the "FOG open source cloning solution", although all that remains of it is the artwork, which I like. Since all the code runs on the client, a simple copy of the tftp root duplicates it anywhere. There are a few pages of junk, and I try to get everything booting correctly from PXE rather than just loading ISOs into memory.

Posted Image

#20 sfinktah

Posted 08 December 2011 - 10:39 PM

Note that you need a dhcp server that will send an empty boot filename when the dhcp user-class request contains "gPXE".
This way, at the second dhcp request (the one coming from gPXE), gPXE will then use the root path pointing to your iscsi target.
If you don't have this setup, gPXE will boot in a loop.


I already had the gPXE chaining set up, so worst case is that it loads PXELINUX. From there, you hit escape, and can issue gpxe commands (after a fashion) as long as you have the full syslinux setup:

Posted Image

I also have a few PXELINUX menu entries specifically for booting various iscsi targets, so it's not that big of a problem.

Your DHCP server also needs to pass the following option : keep-san.
If you don't have this option, the windows boot will not see your iscsi target.

Could it be that the issue was that your dhcp server was not set up properly? Not all dhcp servers can handle the 2 constraints above.





    192.168.1.238.bootps > minint-iq89h8s.lan.bootpc:
         Your-IP minint-iq89h8s.lan
         Server-IP nas.lan
         Client-Ethernet-Address 00:0c:29:8d:fa:c4 (oui Unknown)
         sname "nas"
         file "undionly.kpxe"
         Vendor-rfc1048 Extensions
           Magic Cookie 0x63825363
           DHCP-Message Option 53, length 1: ACK
           Server-ID Option 54, length 4: 192.168.1.238
           Lease-Time Option 51, length 4: 259200
           RN Option 58, length 4: 129600
           RB Option 59, length 4: 226800
           Subnet-Mask Option 1, length 4: 255.255.255.0
           BR Option 28, length 4: 192.168.1.255
           Default-Gateway Option 3, length 4: 192.168.1.238
           Domain-Name-Server Option 6, length 4: 192.168.1.238
           Domain-Name Option 15, length 3: "lan"
           FQDN Option 81, length 21: [SO] 255/255 "minint-gt1dkem.lan"
           RP Option 17, length 22: "iscsi:nas::::nas:iscsi"
           T175 Option 175, length 3: 8.1.1


I think I have that part nailed down :)
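
For reference, the root path in option 17 follows the `iscsi:<server>:<protocol>:<port>:<LUN>:<targetname>` layout from RFC 4173, with empty fields falling back to defaults (TCP, port 3260, LUN 0). A minimal Python sketch of a parser follows; the function name `parse_root_path` is an assumption, and bracketed IPv6 server literals are deliberately not handled.

```python
def parse_root_path(root_path):
    """Split an iSCSI root path into its fields, applying RFC 4173 defaults."""
    # maxsplit=5 keeps any colons inside the target name intact.
    parts = root_path.split(":", 5)
    if len(parts) != 6 or parts[0] != "iscsi":
        raise ValueError(f"not an iSCSI root path: {root_path!r}")
    _, server, protocol, port, lun, target = parts
    return {
        "server": server,
        "protocol": protocol or "6",  # 6 = TCP
        "port": int(port) if port else 3260,
        "lun": lun or "0",
        "target": target,
    }


print(parse_root_path("iscsi:nas::::nas:iscsi"))
```

Here the server is `nas`, every optional field is left at its default, and the target name is `nas:iscsi`.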

Alternative is indeed to use a manual script (gPXE or iPXE) like explained in here : http://erwan.l.free....si/install.html .


Indeed, that's exactly what I ended up doing.

#!gpxe
echo "Greetings! Hit Ctrl-C to bail out."
sleep 5
echo "Going to DHCP on primary network adapter"
ifopen net0
dhcp net0
set keep-san 1
sanboot iscsi:192.168.1.100::::w2k8
boot


Hehe, I remember copying that script back when I started. I had to remove the sleep command because they don't compile in support for that unless you know what checkbox to select. Also, that last "boot" is somewhat "undefined" since you haven't loaded an image to boot from. You should probably be using "chain" or just "exit" depending on what you want to achieve.

chain combines three operations in sequence to fetch, load, then execute...

imgfetch
imgload
imgexec

boot is just an alias for imgexec. if it hasn't loaded an image, it's just going to give you this:
Posted Image

#21 bwiese

Posted 08 December 2011 - 10:48 PM

I banged my head against that problem for an embarrassingly long time before I stumbled across a forum/blog post explaining the Microsoft iSCSI Initiator behavior, after which everything perfectly made sense. In my case I had a VMware based test environment where everything worked fine, but would not work in production, and of course I was intentionally isolating my VMware environment -- which had no gateway.

I knew when I read your post that you had exactly the same problem. I hope we can save others the same experience.

<rant>I have to admit that I was more than a little irritated at the adding of the route when clearly the iSCSI Target IP address is on the local subnet. It would be a trivial amount of programming to check for and skip that route intelligently, and I know of NO routers that will send back a packet on the same interface like the technet article seems to suggest can happen. But, it's exactly that sort of lazy programming from Microsoft that left us with buffer overflows for 15 years.</rant>

I use iPXE's undionly.kpxe, and of course Microsoft's TFTP server (hence WDS). In my environment all of the iSCSI boot OS are zero maintenance thin clients, that auto logon and launch an RDP session to our terminal server. No patches, no antivirus, no local desktop access, and so I don't miss the gateway at all.

That's a really interesting observation about the DHCP request with the "MSFT 5.0" vendor class. That should allow the application of the gateway without passing it through the iBFT. Let me know if you test that and how it works.

I don't know if this is useful to you, but I didn't want to use the /boot directory of Microsoft's TFTP server for my own purposes, so I exposed another directory, called /thin. You can do this from

HKLM\System\CurrentControlSet\Services\WDSServer\Providers\WDSTFTP

There you will find ReadFilter which normally reads

\boot\*
\tmp\*
boot\*
tmp\*

In my case it now reads


\boot\*
\tmp\*
boot\*
tmp\*
\thin\*
thin\*

Thus we can extend the exposed directory space available to TFTP clients.
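
As a rough sketch of that edit, the new list can be built as below. This assumes each exposed directory gets two patterns, one with a leading backslash and one without, which is how the boot and tmp entries appear to be paired. The helper name `add_tftp_dir` is hypothetical, and the sketch only builds the list; writing it back to the ReadFilter value is left out.

```python
def add_tftp_dir(read_filter, directory):
    """Return a copy of the ReadFilter list with patterns for one more directory."""
    # Assumed pattern shapes: "\dir\*" and "dir\*"
    new_patterns = ["\\" + directory + "\\*", directory + "\\*"]
    result = list(read_filter)
    for pattern in new_patterns:
        if pattern not in result:  # avoid duplicates if run twice
            result.append(pattern)
    return result


default_filter = ["\\boot\\*", "\\tmp\\*", "boot\\*", "tmp\\*"]
for line in add_tftp_dir(default_filter, "thin"):
    print(line)
```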

Good luck!

Edited by bwiese, 08 December 2011 - 10:52 PM.


#22 bwiese

Posted 08 December 2011 - 11:06 PM

Hmm this is interesting. In the sanbootconf.git log, Michael has the following comment.

https://git.ipxe.org...ace7673741bf192

[driver] Try to prevent spurious routing entries for iSCSI

If a gateway is specified in the iBFT, the Microsoft iSCSI initiator will create a static route to the iSCSI target via that gateway. (See http://support.microsoft.com/kb/960104 for details.) If the target is in the same subnet then this is undesirable, since it will mean duplicating every outbound packet on the network.

Try to prevent this undesirable behaviour by adjusting the gateway address stored in the iBFT.


Looks like he might (haven't inspected the actual code) just be removing the gateway from the iBFT as a preemptive workaround for Microsoft's behavior. I wonder why he does that for sanbootconf but not iPXE. I wonder how many people have tried and given up because of this problem without ever figuring it out.
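
For what it's worth, the kind of check being discussed can be sketched in a few lines of Python. This is not Michael's code (I haven't read it either), just an illustration of "skip the gateway when the target is on the local subnet". The function name `gateway_for_ibft` and the client address 192.168.1.50 are made up; the netmask, gateway and target address are the ones that appear earlier in this thread.

```python
import ipaddress


def gateway_for_ibft(client_ip, netmask, gateway, target_ip):
    """Return the gateway to hand off, or None when the target is on-link."""
    # strict=False lets us pass a host address together with its netmask
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(target_ip) in network:
        # Target is on the local subnet: a route via the gateway is pointless
        return None
    return gateway


# Target on the same /24 as the client: no gateway should be handed off
print(gateway_for_ibft("192.168.1.50", "255.255.255.0",
                       "192.168.1.238", "192.168.1.100"))
```

Returning `None` here stands in for "leave the gateway out of the handoff"; how that is actually represented in the iBFT is not something this sketch tries to answer.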

Edited by bwiese, 08 December 2011 - 11:06 PM.


#23 erwan.l

erwan.l

    Platinum Member

  • Developer
  • 3041 posts
  • Location:Nantes - France
  •  
    France

Posted 09 December 2011 - 12:03 AM

When supplied with a gateway address, the Microsoft iSCSI Initiator always adds a route to your iSCSI target IP address (as provided in the handoff) through the gateway IP address. It does this regardless of the fact that they may be on the same subnet and that the route is completely unnecessary and in fact entirely wrong.



Note that most routers will never send packet back out the same interface from which it originated. Thus, the packet dies at the gateway.


The interesting thing is that if you remove the gateway during gPXE/iPXE (either by script, or by DHCP option override), Windows will install fine, and then you can re-add the gateway manually from within Windows and regain upstream access post-install.


@bwiese: excellent finding! This is most valuable information for the iSCSI geeks :) Thanks for sharing.

Interestingly I was always lucky to "miss" that issue as it seems my (crappy) router actually sends the packets back.

And I had completely overlooked that there are actually 3 DHCP steps: the first at computer boot to fetch gPXE, the second from gPXE itself, and the third from the MS iSCSI initiator during the early Windows install stage.

However I am going to find a way to reproduce it in my home-lab and include this in my procedure.

Thx again for this very instructive discussion,
Erwan

#24 sfinktah

Posted 09 December 2011 - 04:42 AM

Hmm this is interesting. In the sanbootconf.git log, Michael has the following comment.

Looks like he might (haven't inspected the actual code) just be removing the gateway from the iBFT as a preemptive workaround for Microsoft's behavior. I wonder why he does that for sanbootconf but not iPXE. I wonder how many people have tried and given up because of this problem without ever figuring it out.


That is extremely interesting. It also explains why my Windows XP iSCSI installation worked without a hitch. I had assumed it was a difference between the iSCSI initiators. I have the source trees for iPXE and gPXE, so I will check out the change.

What you said about TFTP is not in itself applicable to non-Microsoft environments, but the fact that you made it work suggests to me that Microsoft will honor alternative paths. With the exception of bootmgr.exe (I think?) which it always insists on getting from the root. I had read an article that suggested this was fixable by specifying the path as Boot or Boot or /Boot or /Boot/ or something highly specific, but I never really followed up since it works fine for me.

Your thin desktop environment sounds very interesting. I was looking at doing something similar, for the purposes of being able to boot to a vSphere Client. In my case, vSphere would replace Remote Desktop (oh god, I can nearly remember its real name still.... MSTSC.EXE /console?)

I've tried remoting to the vSphere Client, but if you have to open a Console from there, it just becomes unbearable.

Are you using XPe for RDP? I looked briefly into XPe this week, since I had to run off a handful of identical XP VMs to do load testing. I ended up using this seemingly shoddy (but actually quite reliable) product from Singapore called "CCBoot." I won't waste time praising its amazing ability to clone a live machine to iSCSI with a few mouse clicks; the interesting part is how it's designed to use a single iSCSI drive for all clients (the whole thing is designed to run net cafes).

As each new computer boots to PXE for the first time, you set its name (or go with the default PC101, etc). It shares the same single iSCSI virtual drive to each machine, and stores changes in a "writeback file," which by default is promptly deleted at the end of each session. :)

You can select individual machines from the inventory and specify an alternate iSCSI image, or set a machine to save its own writeback file (so it doesn't forget changes on reboot), and also merge the writeback file into the main image to effect global updates.
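
The shared-image-plus-writeback idea is essentially a copy-on-write overlay. Here is a minimal Python sketch of the concept, with blocks held in dictionaries. I have no idea how CCBoot implements it internally; the class name `WritebackDisk` and its methods are invented for illustration only.

```python
class WritebackDisk:
    """Shared base image plus a per-client overlay of changed blocks."""

    def __init__(self, base):
        self.base = base      # dict: block number -> bytes, shared by all clients
        self.writeback = {}   # only this client's changed blocks

    def read(self, block):
        # Prefer the client's own change, else fall through to the shared image
        if block in self.writeback:
            return self.writeback[block]
        return self.base.get(block, b"")

    def write(self, block, data):
        # Writes never touch the shared image
        self.writeback[block] = data

    def discard(self):
        # Default behaviour described above: forget changes at end of session
        self.writeback.clear()

    def merge(self):
        # Push this client's changes into the shared image for everyone
        self.base.update(self.writeback)
        self.writeback.clear()
```

Two clients built on the same `base` dictionary see each other's data only after one of them calls `merge()`, which matches the "global update" behaviour described above.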

I suspect it's somehow related to SanDeploy, which is another Singaporean company with its own iSCSI booting solution, but I haven't looked into it further.

#25 sfinktah

Posted 09 December 2011 - 05:04 AM

You don't have to be a programmer to see a quick fix, here is the code from iPXE that sets up iBFT:

Posted Image

The full text of his changes to sanbootconf is below, but it's not really necessary to implement all that checking if you just want to make a boot image without an iBFT gateway. It would take the work of a minute to change that line... but I'm just not sure I can bear another iSCSI install to test it :)

Spoiler




