
Caps Zuma WS2012 R2 with Mellanox fiber card reboots every 30 minutes!!



So I have been away from my system for 6+ weeks (unfortunate health issues starting with a kidney stone). As I slowly got back into it I noticed, via RDC, that my headless i7 Windows Server 2012 R2 machine (with AO) was on its main screen, not the HQPlayer screen I always leave on. Weird. It seemed like it may have rebooted, and I first suspected some sort of Windows update (which I very specifically turned off). Nope.

 

This machine has run HQP, Roon (for 2 channel) and sometimes Minimserver (for testing) for the past couple of years with ZERO issues. AO has always been updated to the latest beta (I am an alpha customer). Zero issues.

 

The machine is powered, via a 19V pico, by a hearty Hynes SR7EHD (set to 19V), and the SSD is powered by a Barrows-modified (for 5V) SOtM dual battery unit. Last year I added a Mellanox ConnectX2 fiber PCIe card to complete my music room fiber project (see Optical Networking Configurations thread for all the detail).

 

So...a couple of days ago, I logged into the machine and voila, it showed me the main screen. Ouch! After some investigating I took a screenshot of the Event log (see pic below), which shows that about twice per hour my machine reboots itself (Kernel Power error):

 

"The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."

 

which follows a couple Mellanox warnings about firmware:

 

"The firmware version that is burned on the Mellanox ConnectX-2 Ethernet Adapter device does not support Network Direct functionality. This may affect the File Transfer (SMB) performance. The current firmware version is 2.9.1200 while we recommend using firmware version 2.9.8350 or higher. Please burn a newer firmware and restart the Mellanox ConnectX device. For more details about firmware burning process please refer to Support information on http://mellanox.com"

 

"Mellanox ConnectX-2 Ethernet Adapter device reports that the "QOS (ETS) capability is missing". The current firmware does not support the QOS (ETS) capability. Please burn the latest firmware and restart your machine. (The issue is reported in Function SetHardwareAssistAttributes)."

 

and dozens of errors about:

"Failed to schedule Software Protection service for re-start at 2016-04-23T00:25:53Z. Error Code: 0x80040154".

 

When the system comes back up I get an error:

"The UAC File Virtualization service failed to start due to the following error: This driver has been blocked from loading"

 

Caps reboot issues.jpg

 

???

 

I have a much smaller WS2012 R2 Caps Carbon (with the same AO) that acts as my HQP NAA and has ZERO events logged.

 

Any help is much appreciated. I have reviews to do, and miss my music. :)

 

Thanks

Ted

Link to comment

Ted, I got the sense that a Mellanox card I was using was overheating. You could try remounting the heat sink with new paste. You could also try measuring temps with an IR thermometer (gun type).

Custom room treatments for headphone users.

Link to comment

Jabbr, it seems cool to the touch and nothing has changed in over 6 months.

 

However, in trying to retrace the steps I've taken in the past weeks, it became obvious to me that one important step occurred last weekend! I finally activated Windows (I had tried numerous times, as the "Please Activate" notice sat like a watermark in my lower right corner, but Windows would always say "activation not working at this time, try later"). This time all went smoothly (via online) and I got back a successful activation message and a normal product key filled in, with the wallpaper watermark now gone. I can't imagine, though, that this would have any bearing. But in the spirit of transparency I thought I would mention it.

Link to comment

If it's cool to touch that's fine -- mine was quite hot. Try touching the backside of the board (on the other side of the chip) just to humor me -- if your paste has failed/cracked etc then the heatsink doesn't see the heat.

Link to comment

My bad. I was just touching the connector area. The heatsinks are very hot, as is the top edge of the card. Ouch.

 

What is a no-brainer card to replace this with? LR Link? Intel X520s are so damn expensive, and I don't think I need 10Gb. I guess in the meantime I can try simply using the onboard copper ethernet port and see if the rebooting goes away.

Link to comment

Ted - I recently switched from Mellanox to an Intel X520-DA1 (one port) because I kept having minor problems with the Mellanox. So far no issues with the X520 over the past month or so. One reason may be that the drivers are included in Linux (Ubuntu Studio), so no driver installation is required. I know you are a Win guy so that is probably not relevant to you.

 

In any case, I picked up a refurbished X520 on eBay for $60. I also had to get an Intel SFP module, also off eBay, for $40.

Link to comment

Update: I removed the Mellanox card and am using the mobo ethernet port. The pc still rebooted after about 30 minutes!! It's not the Mellanox. Argh!! Same event log line items as before (except no Mellanox errors of course).
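One way to put a number on the "about 30 minutes" observation is to export the Kernel-Power event times and look at the gaps between them. The sketch below is only an illustration: the timestamps are made up (they are not from Ted's log), and the helper names and the 2-minute tolerance are assumptions. Very regular gaps would point more toward something timed than toward a randomly failing power supply.

```python
from datetime import datetime
from statistics import pstdev

def reboot_intervals_minutes(timestamps):
    """Return the gaps, in minutes, between consecutive reboot timestamps."""
    times = sorted(datetime.strptime(t, "%Y-%m-%dT%H:%M:%S") for t in timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]

def looks_periodic(intervals, tolerance_minutes=2.0):
    """Treat the reboots as periodic if the gaps barely vary (assumed threshold)."""
    return len(intervals) >= 2 and pstdev(intervals) <= tolerance_minutes

# Hypothetical timestamps, for illustration only (not taken from the event log).
sample = [
    "2016-04-23T00:02:10",
    "2016-04-23T00:32:40",
    "2016-04-23T01:02:55",
    "2016-04-23T01:33:20",
]

intervals = reboot_intervals_minutes(sample)
print("gaps in minutes:", [round(i, 2) for i in intervals])
print("periodic:", looks_periodic(intervals))
```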

Link to comment
Hi Ted

 

What are the details of the Kernel-Power warning in the event log? What event ID, etc.?

 

Phil, Hi. As per the pic above, it is event id 41. Details:

 

System

  - Provider
      [ Name] Microsoft-Windows-Kernel-Power
      [ Guid] {331C3B3A-2005-44C2-AC5E-77220C37D6B4}
    EventID 41
    Version 3
    Level 1
    Task 63
    Opcode 0
    Keywords 0x8000000000000002
  - TimeCreated
      [ SystemTime] 2016-03-31T17:40:12.971872100Z
    EventRecordID 248253
    Correlation
  - Execution
      [ ProcessID] 4
      [ ThreadID] 8
    Channel System
    Computer CAPS-R2
  - Security
      [ UserID] S-1-5-18

  - EventData
    BugcheckCode 0
    BugcheckParameter1 0x0
    BugcheckParameter2 0x0
    BugcheckParameter3 0x0
    BugcheckParameter4 0x0
    SleepInProgress 0
    PowerButtonTimestamp 0
    BootAppStatus 0

 

When I investigate this, the web says that with zero parameters above this is possibly a power supply issue, as per your email to me. I am not hugely technical but I guess I'll have to put a multimeter on the Hynes ps and make sure it still shows 19V.
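For anyone following along, here is a rough sketch of the triage usually applied to an Event ID 41 record: a non-zero BugcheckCode means a stop error was recorded, a non-zero PowerButtonTimestamp means the power button was held down, and all zeros means the machine hung hard or lost power without Windows recording anything. The function name and the wording of the returned strings are my own; only the field names and values come from the event details above.

```python
def classify_kernel_power_41(event_data):
    """Rough triage of a Kernel-Power Event ID 41 record (illustrative only)."""
    def as_int(value):
        # Accepts plain decimals ("0") and hex strings ("0x0") alike.
        return int(str(value), 0)

    bugcheck = as_int(event_data.get("BugcheckCode", 0))
    power_button = as_int(event_data.get("PowerButtonTimestamp", 0))

    if bugcheck != 0:
        return "stop error (bugcheck 0x%X): look for a crash dump" % bugcheck
    if power_button != 0:
        return "power button was held down"
    return "hard hang or power loss: no bugcheck recorded"

# EventData values as posted above.
posted_event = {
    "BugcheckCode": "0",
    "BugcheckParameter1": "0x0",
    "BugcheckParameter2": "0x0",
    "BugcheckParameter3": "0x0",
    "BugcheckParameter4": "0x0",
    "SleepInProgress": "0",
    "PowerButtonTimestamp": "0",
    "BootAppStatus": "0",
}

result = classify_kernel_power_41(posted_event)
print(result)
```

This only narrows things down. An all-zero record is consistent with a power supply problem, but also with overheating or any other hard hang, so it does not prove the PSU is at fault.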

Link to comment

Do you have "any" other psu just for testing? Does not need to be the final solution, but would certainly help to narrow down the issue


Link to comment

I tested my $900 Hynes SR7EHD and it was outputting 18.6V so I dialed it back up to 19.05V and put it back in. Same result!

 

I would probably blame my $50 19V pico psu (harness) over my $900 Hynes at this point, but no idea how to test it. And, why? Nothing changed except activating Windows. This every 30 minutes thing feels like something else, not a bad ps. But what do I know. All I know is that my music server is DOA right now.

 

Phil,

I do not have another 19V ps. I will scour the house for maybe a laptop style one.

Link to comment

I can't believe how this has deteriorated! I could not immediately find a 19V ps alternative so I replaced the 16-24V pico dc-dc with my standard 12V one, and tried a 12V brick. Nothing. Now I am no longer even getting the power button to work. I put the 16-24 pico back in and found a 20V laptop brick. It, like the 12V before it, lights up the mobo led and the pico but the power button does nothing. Nothing boots. I even shorted the power + and - pins to force a power on, but nothing. Argh!!!

Link to comment
