Rich Jordan (07-10-19, 09:47 PM)
Customer is skittish about firmware updates due to the occasional tale of bricked Integrity boxes from a failed firmware update. I did one update on our in-house 2620 some time ago with no issues, but that's our sum total of experience; nearly all my upgrade work over the years was with VAXen and Alphas.

We are finally moving an RX3600 from V8.3 to V8.4 with current patches; literally the first downtime we've been allowed to preschedule in several years. I'm going to be pre-staging on a local RX2620 to check for any gotchas that aren't model-specific.

The machine and VMS are under HPE support (I know...) but the downtime window is going to break if there's a brick event and field service or hardware replacement is needed. We did call HPE service to see about any 'pre-call' notification so they're ready to go and were told 'no'.

I'm waiting on access to the support account so I can look at all the 'entitled' downloads and, apparently, release notes. I have the firmware kits for our 2620 and don't see anything called out about specific VMS versions or 'must install this intermediate version', so I'm hopeful that if we do the firmware it will be a one-shot.

By chance, is there a single source for what minimum firmware release is required for VMS V8.4, and which firmware updates can be installed without intermediate firmware updates?

Thanks
Stephen Hoffman (07-10-19, 11:48 PM)
On 2019-07-10 19:47:47 +0000, Rich Jordan said:

> Customer is skittish about firmware updates due to the occasional tale
> of bricked Integrity boxes from a failed firmware update....
> By chance is there a single source of: what minimum firmware release
> is required for VMS V8.4 and what firmware updates can be installed
> without intermediate firmware updates?


Ask HPE. They're the source of the firmware, and they'll also be able
to tell you whether there's an intermediate upgrade required for this
case.
I'd expect at least 4.32A (22 Sep 2014) here, but there might possibly
be something newer—and I don't have any rx3600 boxes handy.
If there's enough concern over getting bricked, stage a spare server.
Or better still, just have a spare server, as servers can and do fail.
As for options for spares, an Integrity rx3600 is ancient and slow.
Most any available Integrity i2 or later should be able to outrun it.
I prefer to use the disk image (ISO) path for Integrity firmware
upgrades, given the choice. That does require a (working) optical drive.
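If it helps with the pre-staging, the currently installed revisions can
usually be read from the EFI Shell or from the MP before touching
anything; a rough sketch from memory, so the exact command names and
output will vary by model and MP/iLO firmware revision:

   Shell> info fw
      (EFI Shell: lists the installed system firmware and
       management-processor revisions)

   MP:CM> sysrev
      (MP Command Menu: shows the firmware revisions as the
       management processor reports them)

Comparing that against whatever the entitled release notes call out
should show fairly quickly whether an intermediate step is required.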
Robert A. Brooks (07-11-19, 01:19 AM)
On 7/10/2019 3:47 PM, Rich Jordan wrote:
> Customer is skittish about firmware updates due to the occasional tale of
> bricked Integrity boxes from a failed firmware update.


Tell the customer not to believe everything they read on the internet.

I'm not doubting that it has happened a handful of times, and those handful of
times multiply in the telling over the years, but as someone who has upgraded iLO
and system firmware on HP(E) systems over 1,000* times with no brick, I'm
confident that, short of someone willfully powering off the system during the
upgrade, it will be OK.

*that guess is likely on the low side, as I used to develop firmware at HP,
after working in VMS Engineering, and did a lot of firmware upgrading. That
number includes upgrading to firmware that was quite buggy, but didn't brick,
and allowed a subsequent upgrade.
Simon Clubley (07-11-19, 02:17 PM)
On 2019-07-10, Robert A. Brooks <FIRST.LAST> wrote:
> *that guess is likely on the low side, as I used to develop firmware at HP,
> after working in VMS Engineering, and did a lot of firmware upgrading. That
> number includes upgrading to firmware that was quite buggy, but didn't brick,
> and allowed a subsequent upgrade.


Serious question: What happens on Itanium if, on a rare occasion,
a firmware update, or a power failure at exactly the wrong point,
_does_ well and truly brick the machine ?

Is there anything like a JTAG port which a HPE engineer can use to
install a working version of the firmware ?

Or would the existing onboard recovery process be guaranteed to always
allow the user to recover back to a working firmware version ?

Simon.
Robert A. Brooks (07-11-19, 03:17 PM)
On 7/11/2019 8:17 AM, Simon Clubley wrote:
> On 2019-07-10, Robert A. Brooks <FIRST.LAST> wrote:
> Serious question: What happens on Itanium if, on a rare occasion,
> a firmware update, or a power failure at exactly the wrong point,
> _does_ well and truly brick the machine ?


I don't know, because I never saw a bricked system, nor do I remember
any of my firmware-writing colleagues experiencing that problem.

> Is there anything like a JTAG port which a HPE engineer can use to
> install a working version of the firmware ?


For certain platforms, yes. I wrote firmware for the IA64 i2 blades
and rx2800 iLO, and for the Superdome X enclosure.

The Superdome X definitely had a JTAG interface; I'm not sure about the iLO.

I never actually *saw* the hardware; I was working remotely in Massachusetts,
and the systems were in Texas and California.

> Or would the existing onboard recovery process be guaranteed to always
> allow the user to recover back to a working firmware version ?


The Superdome X did have a recovery mechanism to fall back to the last
known working firmware; I don't think most of the lower-end platforms did,
but I'm not sure.
Martin Vorländer (07-11-19, 07:44 PM)
Simon Clubley <clubley> wrote:
> Serious question: What happens on Itanium if, on a rare occasion,
> a firmware update, or a power failure at exactly the wrong point,
> _does_ well and truly brick the machine ?
> Is there anything like a JTAG port which a HPE engineer can use to
> install a working version of the firmware ?
> Or would the existing onboard recovery process be guaranteed to always
> allow the user to recover back to a working firmware version ?


AFAIK the Integrity has *two* firmware images stored. In theory, it should
be able to boot from the intact ROM when it detects corruption in the
half-updated one.

cu,
Martin
Stephen Hoffman (07-12-19, 12:38 AM)
On 2019-07-11 13:17:24 +0000, Robert A. Brooks said:

> On 7/11/2019 8:17 AM, Simon Clubley wrote:
> I don't know, because I never saw a bricked system, nor do I remember
> any of my firmware-writing colleagues experiencing that problem.


Without the benefit of a failsafe loader, the design was a trade-off
against depot repair. I've seen one or maybe two Integrity servers
bricked by failed firmware or possibly by a contemporaneous hardware
failure, but it's very rare. That's out of a whole lot of firmware
upgrades, too. Per then-HP, "It should be noted that problems due to
firmware are relatively rare, and you should look for other problem
causes first." The bricked server I was dealing with was replaced, as
that was more expeditious than dealing with the depot repair. Swap the
boards and the storage, recreate the boot aliases, and off we went. I
don't know if depot repair is still typical here, but I would not be
surprised. The firmware for the Integrity rx3600 series has been around
for quite a while and has been entirely unchanged, too.
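Recreating the boot aliases after that sort of swap is usually just the
menu-driven boot options procedure run from OpenVMS; a minimal sketch,
with the device name as a placeholder for whatever the system disk
actually is:

   $ @SYS$MANAGER:BOOT_OPTIONS.COM
      (menu-driven; add or validate an EFI boot entry for the
       system disk, e.g. DKA0:, with the desired boot flags)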

>> Is there anything like a JTAG port which a HPE engineer can use to
>> install a working version of the firmware ?

> For certain platforms, yes.


AFAIK, all of the Integrity boxes had manufacturing access of some sort.

The bigger issue lurking here has little to do with the firmware
upgrades and the (very low) risk of bricking, and more centrally with a
critical dependency on a server design that was announced Sept. 7,
2006—~thirteen years ago—and is well past its end-of-service date of
January 31, 2011—~eight years ago.

Servers this old can tend to fail for reasons other than failing
firmware upgrades.

I'd also be concerned that any OpenVMS server that's been up for
months or years might not reboot correctly, due to previously-untested
startup modifications or some other unrecognized issue, too.

Uptime being a measure of the interval since system and security
patches were last applied, and since the startups were last tested.

And a server that's running OpenVMS without recent patches is already
running with known vulnerabilities.
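
Both of those are quick to confirm from DCL ahead of the downtime
window; a small sketch, nothing model-specific assumed:

   $ SHOW SYSTEM /NOPROCESS
      (header only: node name, OpenVMS version, and current uptime)

   $ PRODUCT SHOW HISTORY
      (the PCSI history: which kits and patches were installed, and when)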

A redundant server or a cluster is the typical approach toward keeping
the apps—the apps, and not necessarily a specific server—available, too.

And HPE OpenVMS I64 V8.4 was released over nine years ago, and it's
falling off of all new-patch support in less than 18 months.

Firmware-assisted server bricking... is probably not the biggest
consideration lurking here. What's the path to VSI OpenVMS and new
patches, and to an Integrity i4 or i6-class server—or a pair of i4 or
i6 servers and maybe an external storage shelf—or are there plans to
lay in stocks of spare servers and parts while awaiting the completion
of the VSI OpenVMS x86-64 port and the associated local application port?