Gavin1977

Server and endpoint, or dual cpu server?


I thought it would be useful to start a new thread for this.  What provides the best audible benefit... server and endpoint, or a dual cpu server?

 

Also, does anyone have any guidance as to how you set up and optimise a dual CPU server?  For example, Roon ROCK running on one CPU with dedicated hard disk, and end point/renderer running from the other CPU (Euphony perhaps)?

 

Thanks


I guess this answers my question to a certain extent...

 

https://taikoaudio.com/product/sgm-extreme/
 

Custom Windows 10 Enterprise LTSC 2019 OS, Roon and Jplay playback software.  "Windows 10 components designed for industrial controller applications with SOTA process scheduler".

 

So provided the process scheduler is optimised, this is probably the preferred solution, as no network bridge is involved between two different systems in this configuration.

 

My next thought is how a dual CPU server would compare with a virtualised system on a multicore processor, for example Roon ROCK on one core and Euphony on another core.  Not quite the same technical performance as a dual CPU server, as a single CPU has to share cache, motherboard controller resources and a single power supply (a dual CPU board has its own PSU rail per CPU) - but it might also sound good depending upon the configuration.

 

Interestingly, Intel VT-d does allow direct assignment of resources to virtual machines, so it should be possible to assign one bank of RAM to one core, and so forth.

https://software.intel.com/en-us/articles/intel-virtualization-technology-for-directed-io-vt-d-enhancing-intel-platforms-for-efficient-virtualization-of-io-devices

 

Would be an interesting experiment.


On the software side, things could get quite interesting when the size of the entire Windows folder alone is getting all the way down to less than 60MB

 

http://www.tirnahifi.org/forum/viewtopic.php?t=3759&start=30

 

Stripping Windows as well as Roon way down with a much lower number of processes / threads / handles

 

http://www.headphoneclub.com/thread-732005-1-1.html

 

There really are plenty of fun things to play around with memory and cache allocation

 

https://software.intel.com/en-us/articles/introduction-to-memory-bandwidth-allocation

https://01.org/intel-rdt-linux/blogs/fyu1/2017/resource-allocation-intel®-resource-director-technology

https://software.intel.com/en-us/articles/cache-allocation-technology-telco-nfv-noisy-neighbor-experiments

 

There's also NUMA once we've got a pair of processors

 

https://frankdenneman.nl/2016/07/06/introduction-2016-numa-deep-dive-series/

https://frankdenneman.nl/2016/07/07/numa-deep-dive-part-1-uma-numa/

https://frankdenneman.nl/2016/07/08/numa-deep-dive-part-2-system-architecture/

https://frankdenneman.nl/2016/07/11/numa-deep-dive-part-3-cache-coherency/

https://frankdenneman.nl/2016/07/13/numa-deep-dive-4-local-memory-optimization/

https://frankdenneman.nl/2016/08/22/numa-deep-dive-part-5-esxi-vmkernel-numa-constructs/

 

Take a look at some block diagrams of Xeon motherboards from Supermicro. Interestingly, we'll find an example like this

 

https://www.provantage.com/supermicro-mbd-x11dpl-i~7SUPM5QH.htm

 

The CPU on the right might seem to be getting the least amount of distractions while her big sis on the left should be put under the spotlight. With the right software settings we could direct all the heavy lifting to the left hand while the right hand could take it easy. Now that "extra special" PCI-E X16 SLOT 3 is obviously good for stuff like Pink Faun USB bridge with Ultra OCXO etc.

 

https://www.supermicro.com/manuals/motherboard/C620/MNL-1946.pdf#page=17
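
For anyone wanting to try the "heavy lifting on one CPU" idea on Linux, here is a minimal sketch. It assumes the kernel exposes NUMA nodes under /sys/devices/system/node and that each socket shows up as one node. The helper names (parse_cpulist, cpus_of_node, pin_to_node) are made up for illustration, and nothing here comes from Supermicro, Roon or Euphony.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_of_node(node, root="/sys/devices/system/node"):
    """Return the CPU ids of one NUMA node, or [] if the node is not exposed."""
    path = Path(root) / f"node{node}" / "cpulist"
    if not path.exists():
        return []
    return parse_cpulist(path.read_text())


def pin_to_node(pid, node):
    """Restrict a process to the CPUs of one NUMA node (Linux only).

    Hypothetical helper: it does nothing if the node is not found or the
    platform lacks os.sched_setaffinity.
    """
    cpus = cpus_of_node(node)
    if cpus and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(pid, cpus)
    return cpus


if __name__ == "__main__":
    # Only report the layout here; pinning is left to the caller.
    for node in (0, 1):
        print(f"node{node}: CPUs {cpus_of_node(node)}")
```

Something like pin_to_node(server_pid, 0) would then keep a server process on the first socket, where server_pid stands for whatever PID the playback software happens to have. Whether that is audible is a separate question.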

 

Here's a little something for handling CPU with a relatively high TDP

 

https://www.anandtech.com/show/14486/noctua-shows-off-concept-fanless-cpu-cooler-up-to-120w-of-cooling-performance

 

A relatively affordable CPU that's ready for Optane DCPMM

 

https://www.provantage.com/intel-cd8069504212701~7ITEP6HP.htm

https://ark.intel.com/content/www/us/en/ark/products/193389/intel-xeon-silver-4215-processor-11m-cache-2-50-ghz.html

On 11/1/2019 at 4:46 PM, Gavin1977 said:

I thought it would be useful to start a new thread for this.  What provides the best audible benefit... server and endpoint, or a dual cpu server?

 

This is such a broad question. It's like what speakers provide the best audible benefit... single driver speakers, horns, full range 3-way, bookshelf 2-way, with or without subs, with high sensitivity or low sensitivity drivers, etc. 

 

I don't think it's that important which direction you go, but how well you tweak your system. I look at the digital source as one complex system from the file you play all the way to the analog output from your DAC. Everything matters - the software you are using, the hardware it goes through, how the software interacts with the hardware, cables, interface with the DAC, etc. etc. You just have to pay really good attention to every detail and you get paid nice dividends at the end. 

 

Clean power is the single most important thing in digital audio. When I was using decent power supplies (sBooster, HDPlex, Uptone, etc.) I preferred a two box solution. I also liked it better when I limited the current the devices drew, the CPU frequency, etc. 

But when I got into higher end power supplies and separate rails for everything possible, I started to prefer a single box computer with best power supplies and tweaked for the highest performance possible. 

That makes me think that the dual box solution was helping me to mask some of the power noise... maybe, maybe not. 

 

Anyhow, I could write forever on the topic but I'm planning to stop here. 


Thanks for the feedback.  I do think there is a technical approach that can be taken to this and you've highlighted some of your findings which are useful.

 

I was hoping that someone would have undertaken a comparison - of course it would not be absolute - just an indication. 

 

Lots of useful information on these forums but it takes some time to root it out - for example Marcin on the 'main thread' has clearly stated that ECC Wide Temperature Apacer memory has made a difference. Some people have likened the effect to the improvement you get using RAM boot... so it gives you an idea of the improvement and where to try and focus resources.

 

What I haven't found though is someone who's upgraded to ECC memory, but also knows what magnitude of improvement you can have from an endpoint / server config as well for comparative purposes.  Perhaps you're also right that it becomes irrelevant once you have a really high quality power supply... this would make sense, but I was just hoping for some more feedback based on people's experience.  Cheers


Also, many thanks to Seeteeyou - particularly for the link to the block diagram for the Supermicro motherboard. 

 

5 hours ago, seeteeyou said:

 

The point of interest here is that on a dual CPU motherboard the second CPU only has a high bandwidth UPI link to deal with. What this says to me is that 'technically' a dual CPU motherboard such as this one might well have the potential to be 'better' than a single CPU solution, as all of the heavy lifting is done by one of the CPUs, leaving one CPU as an endpoint as Seeteeyou suggests.

 

Using two separate single processor computers as server/endpoint would not achieve this, as each CPU would need to talk to the PCH. Quality of bridge and power supplies all have an impact of course.

 

I feel satisfied that some substance has been found on the topic 😀


I'd better point out that Wide Temperature ECC options from Apacer are only available as UDIMM at the moment, but none of the Xeon motherboards from Supermicro support UDIMM.

 

They do have non-WT ones if we're looking for RDIMM and LRDIMM but neither one would seem to be 2019 models

 

https://industrial.apacer.com/en-ww/DRAM/DDR4-RDIMM

https://industrial.apacer.com/en-ww/DRAM/DDR4-LRDIMM

 

There are also other ways to move the USB card itself further away from the rest of the bunch. Bandwidth is limited to only 250Mbps with this option, though

 

https://www.provantage.com/startech-pex2pcie4l~7STR917N.htm

https://www.amazon.ca/StarTech-com-PEX2PCIE4L-Express-External-Expansion/dp/B002I9SK5S

 

DVI-D cables are required and we could find some copper + optical hybrid options like this one below

 

http://vitextech.com/ddi-dvi-fiber-optic-cable-2/

http://www.arp.com/medias/56015b7b154fe215aaa3c0ac.pdf#page=4

Quote

1 - 3 ft. DVI-D Cable

 

Here's another one for $69

 

http://www.fibbrtech.com/producthelp.php?id=70

https://www.amazon.com/FIBBR-Fiber-Resolution-Digital-32-81ft/dp/B07PGD5Q66

8 hours ago, seeteeyou said:

There's also NUMA once we've got a pair of processors

 

You also have NUMA with a number of modern single-socket CPUs, such as many of the latest AMD processors that have two "chiplets" in the same package. So you may have multi-layer NUMA, for example with AMD EPYC CPUs.

 

On 11/5/2019 at 11:07 AM, Gavin1977 said:

My next thought is how a dual CPU server would compare with a virtualised system on a multicore processor

 

Both carry some extra overhead, especially virtual machines, but dual-socket has double the memory bandwidth compared to single socket. Virtual machines of course only add extra overhead, but don't create any additional benefit in terms of performance.

 

On 11/1/2019 at 11:46 PM, Gavin1977 said:

I thought it would be useful to start a new thread for this.  What provides the best audible benefit... server and endpoint, or a dual cpu server?

 

Not related at all. Dual-socket (or quad-socket or whatever you like) just adds more CPUs and memory bandwidth to the computer, with some overhead also baked in due to NUMA.
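
The NUMA overhead can at least be inspected on Linux. A rough sketch, assuming the kernel publishes per-node distance files under /sys/devices/system/node (the function names are hypothetical). The values are relative hints reported by firmware, with 10 conventionally meaning local access, so they are not measured latencies.

```python
from pathlib import Path


def parse_distances(text):
    """Turn a sysfs distance line such as '10 21' into a list of ints."""
    return [int(value) for value in text.split()]


def numa_distance_matrix(root="/sys/devices/system/node"):
    """Map node name -> distance row; empty dict if NUMA info is not exposed."""
    base = Path(root)
    matrix = {}
    if not base.is_dir():
        return matrix
    for node_dir in sorted(base.glob("node[0-9]*")):
        distance_file = node_dir / "distance"
        if distance_file.exists():
            matrix[node_dir.name] = parse_distances(distance_file.read_text())
    return matrix


def relative_cost(row, local_index):
    """Express each distance as a multiple of the local-access distance."""
    local = row[local_index]
    return [distance / local for distance in row]


if __name__ == "__main__":
    for name, row in numa_distance_matrix().items():
        print(name, row)
```

On a single-socket machine this typically shows one node with a lone local entry; a dual-socket board would show a larger value for the remote socket.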

 

The advantage of a separate networked endpoint is that it allows locating big loud servers outside of listening spaces, and also the use of a very low power system with minimal activity as the endpoint.

 

2 hours ago, Gavin1977 said:

dual CPU motherboard such as this one might well have the potential to be 'better' than a single CPU solution as all of the heavy lifting is done by one of the CPUs leaving one CPU as an endpoint as Seeteeyou suggests

 

Can you explain why it would be any better? The other CPU in this kind of case would be still a big beast anyway.

 

8 hours ago, seeteeyou said:

On the software side, things could get quite interesting when the size of the entire Windows folder alone is getting all the way down to less than 60MB

 

On the software side, things get interesting when you get rid of Windows and go for a custom Linux where you have all the source code available and can modify all the aspects of the operating system....

 


Signalyst - Developer of HQPlayer

Pulse & Fidelity - Software Defined Amplifiers


Very interesting and many thanks for the feedback.

 

Epyc processors do indeed have separate 'Zeppelin' dies in one CPU package which are connected via AMD Infinity Fabric, so different NUMA configurations are possible and hence potential optimisation of resources.   This is probably why the SGM server has all of its RAM banks populated, as there must be some sort of audible benefit for this (I assume SGM are operating on a similar basis, although this is an Intel based system).

 

Shame Epyc processors have such high TDP. I have looked at AMD's new Embedded V1605B - but it's just a single die.  So no opportunity for NUMA / dual CPU style optimisations there.  An Epyc system would be reliant on a HDPlex or Paul Hynes SR7 to satisfy these levels of power consumption - I'd like to keep within a lower power budget.

 

I think Intel vPro CPUs or Xeon CPUs with VT-x and VT-d capabilities are quite interesting to me now since you can assign resources - I think I'll try installing Windows 10 Enterprise LTSC 2019 OS, Roon and Jplay playback software on my Intel DNHE NUC with the i7-8650U processor and also play with the VT-x & VT-d capabilities as a little experiment.  I will try setting up Euphony as a virtual endpoint with its own dedicated disk (the main OS and Roon can reside on the M.2 SSD).

 

14 hours ago, seeteeyou said:

I'd better point out that Wide Temperature ECC options from Apacer are only available as UDIMM at the moment, but none of the Xeon motherboards from Supermicro support UDIMM.

 

 

I am not sure what you are saying but the Supermicro Xeon motherboards DO support UDIMM.  I just built a server using X11SCM-F and bought UDIMM from Apacer.

 


Speakers: Vandersteen Model 7s, 4 M&K ST-150Ts, 1 VCC-5; Amplification: 2 Vandersteen M7-HPAs, CI Audio D200 MKII, Ayre V-6xe; Preamp: Doshi Audio Line Stage v3.0; Phono Pre: Doshi Audio Phono Pre; Analog: Wave Kinetics NVS with Durand Telos composite arm; SME 3012R arm, Clearaudio Goldfinger Statement v2; Reel to Reel:  Technics RS-1500; Doshi Tape Pre-Amp; Studer A810, Studer A812, Tascam BR-20; Multi-channel: Bryston SP-3; Digital: Custom PC (WS 2016/AO/HQPlayer/Roon)> Lampizator Big 7 DAC


We're focusing on SGM Extreme above, and I just happened to forget to add the word Scalable after Xeon while I was looking at the link below. For some reason UDIMM never appeared here

 

https://www.supermicro.com/en/products/x11/motherboards

 

"Regular" Xeon would be much closer to an i7 or i9 since all of them are sharing the same socket type, of course UDIMM must be compatible with those somewhat "lesser" boards

On 11/8/2019 at 12:53 PM, Gavin1977 said:

Shame Epyc processors have such high TDP. I have looked at AMD's new Embedded V1605B - but it's just a single die.  So no opportunity for NUMA / dual CPU style optimisations there.  An Epyc system would be reliant on a HDPlex or Paul Hynes SR7 to satisfy these levels of power consumption - I'd like to keep within a lower power budget.

 

There's not much point in making a NUMA system with small CPUs, because the overhead would still be there but you wouldn't get the benefits. When you have enough processors to starve single socket memory channels, you need to add more memory channels by using NUMA. Another, more efficient alternative is to do as GPUs do, where they just scale the memory bus width to 512 bits and up. So you don't get the overhead cost of NUMA, but you get NUMA worth of memory bandwidth.

 

Biggest GPUs use 4096-bit wide memory bus with HBM2 memory.
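
As a back-of-envelope check on the bus-width point, peak bandwidth is roughly the bus width in bytes times the transfer rate. The transfer rates below (2666 MT/s for a DDR4 channel, 2 GT/s per pin for HBM2) and the six-channel socket are assumed for illustration and are not figures from this thread.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_second, channels=1):
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second * channels / 1e9


# Assumed example: one 64-bit DDR4 channel at 2666 MT/s.
ddr4_channel = peak_bandwidth_gbs(64, 2666e6)

# Assumed example: a socket with six such channels, then two sockets (NUMA).
one_socket = peak_bandwidth_gbs(64, 2666e6, channels=6)
two_sockets = 2 * one_socket

# Assumed example: a 4096-bit HBM2 bus at 2 GT/s per pin.
hbm2 = peak_bandwidth_gbs(4096, 2e9)

if __name__ == "__main__":
    print(f"DDR4 channel : {ddr4_channel:.1f} GB/s")
    print(f"One socket   : {one_socket:.1f} GB/s")
    print(f"Two sockets  : {two_sockets:.1f} GB/s")
    print(f"4096-bit HBM2: {hbm2:.1f} GB/s")
```

These are theoretical ceilings only; real throughput is lower, and on a dual-socket system the doubled figure only applies when each CPU mostly uses its own local memory.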

 

So instead of using two sockets, I've been opting to use CPU + GPU. Which is another kind of NUMA architecture, both having their own RAM but the interconnect between the two mapped memory regions is 16x PCIe. Plus depending on case, a separate network endpoint.

 



6 hours ago, Miska said:

 

There's not much point in making a NUMA system with small CPUs, because the overhead would still be there but you wouldn't get the benefits. When you have enough processors to starve single socket memory channels, you need to add more memory channels by using NUMA. Another, more efficient alternative is to do as GPUs do, where they just scale the memory bus width to 512 bits and up. So you don't get the overhead cost of NUMA, but you get NUMA worth of memory bandwidth.

 

Biggest GPUs use 4096-bit wide memory bus with HBM2 memory.

 

So instead of using two sockets, I've been opting to use CPU + GPU. Which is another kind of NUMA architecture, both having their own RAM but the interconnect between the two mapped memory regions is 16x PCIe. Plus depending on case, a separate network endpoint.

 

You've opened up another can of worms :-)  Shame GPU cannot be used as a USB endpoint as well (HDMI I2S perhaps though).

 

You've got me thinking about PICMG 1.3 / SHB Express combinations to interconnect two computers.  Anyone have experience with this?

 

I tried Windows 10 Enterprise LTSC 2019 with JPLAY on my NUC and it was a 'pleasing' clean sound, but was quite a bit behind Euphony.  I tried to set up a Linux virtual PC (using Hyper-V and also VirtualBox) and tried to get it to boot from my copy of Euphony on USB (because Euphony needs a dedicated hard drive), but no luck getting this configuration to work at the moment; the virtual PC won't boot.  If I were successful in getting this to boot then I'd try to run Roon core on Windows 10 Enterprise LTSC 2019 and Euphony on the virtualised PC (with a CPU core dedicated to it) - would be interesting, but it's looking like it might not be technically feasible.

 

Am also interested in the Network Audio Adapter approach, so will look at that.

52 minutes ago, Gavin1977 said:

You've opened up another can of worms 🙂 Shame GPU cannot be used as a USB endpoint as well (HDMI I2S perhaps though).

 

There wouldn't be any point in using GPU as USB endpoint. I2S shouldn't be used to connect devices at all, it is only for chip-to-chip interconnect.

 

53 minutes ago, Gavin1977 said:

You've got me thinking about PICMG 1.3 / SHB Express combinations to interconnect two computers.  Anyone have experience with this?

 

I fail to see the point of that; it would be the same as having two computers on the same motherboard.

 

I'd much rather use Ethernet.

 

56 minutes ago, Gavin1977 said:

I tried to set up a Linux virtual PC (using Hyper-V and also VirtualBox) and tried to get it to boot from my copy of Euphony on USB (because Euphony needs a dedicated hard drive), but no luck getting this configuration to work at the moment; the virtual PC won't boot.

 

Using a virtual machine for audio is an extremely bad idea because of the overheads; you just have massively increased latencies, etc.

 

If you want two machines, you are much better off with two real machines.

 



On 11/1/2019 at 5:46 PM, Gavin1977 said:

I thought it would be useful to start a new thread for this.  What provides the best audible benefit... server and endpoint, or a dual cpu server?

 

Also, does anyone have any guidance as to how you set up and optimise a dual CPU server?  For example, Roon ROCK running on one CPU with dedicated hard disk, and end point/renderer running from the other CPU (Euphony perhaps)?

 

Thanks

I was running a server/endpoint solution with 2 NUCs because of USB harshness issues direct from the server. Using an Uptone Audio JS-2 power supply eliminated any need to "outsource" USB connectivity from the media server.


Regards,

Dave

 

Audio system


I have been evaluating two newly set up NUCs for my source:

 

RoonServer: NUC8i7BEH/16GB RAM on 19V rail, running Audiolinux 2.0 headless RAMboot + 32GB Optane for Roon DB

RoonBridge: NUC7i5DNHE/8GB RAM on 12V rail, running Audiolinux 2.0 headless RAMboot, WiFi.

 

 

Both boxes are being powered from a Keces P8 with dual rails (20V/4A + 12V/4A). Neither is in a fanless case but I've disabled TurboBoost and am using standard Boot mode. Network is all Ubiquiti switches and access points, no LPSU.

 

I'm trying to decide if a 1-box or 2-box setup is better for me. So far my tests favour using the 7i5 as RoonBridge, even though this runs on WiFi and on the 12V rail (which, when testing the 7i5 alone, is slightly inferior to the 20V). I can't precisely put my finger on the difference but I've gone back and forth a few times and each time the 2-box with 7i5 as bridge comes across as more relaxed and natural.

 

One very strange observation is the 1-box setup is drawing more watts than the 2-box. When running 2 boxes, the 8i7 draws 0.3-0.4A @ 20V (say, 7w) and the 7i5 another 0.3A @ 12V (so, 6w) for a total of 13w. On the other hand, when I run only the 8i7 directly connected to the DAC, it pulls 1.0-1.3A @ 20V (>20w) consistently! What gives?
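
A quick sanity check of those numbers using P = V × I (a sketch; 0.35 A is an assumed midpoint for "0.3-0.4A"). One thing it shows is that 0.3 A at 12 V comes to about 3.6 W rather than 6 W, so the 2-box total is nearer 10.6 W and the gap to the 1-box figure is, if anything, larger.

```python
def power_watts(volts, amps):
    """DC power in watts from rail voltage and measured current."""
    return volts * amps


# 2-box setup: server on the 20V rail, bridge on the 12V rail.
server_two_box = power_watts(20, 0.35)   # assumed midpoint of 0.3-0.4 A
bridge_two_box = power_watts(12, 0.3)
two_box_total = server_two_box + bridge_two_box

# 1-box setup: the 8i7 alone, directly connected to the DAC.
one_box_low = power_watts(20, 1.0)
one_box_high = power_watts(20, 1.3)

if __name__ == "__main__":
    print(f"2-box: {server_two_box:.1f} W + {bridge_two_box:.1f} W "
          f"= {two_box_total:.1f} W")
    print(f"1-box: {one_box_low:.1f} W to {one_box_high:.1f} W")
```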

 

On another note, has anyone tried a Thunderbolt 3 -> GbE adapter with Audiolinux? Would it work out of the box? Then I can try network bridging to the endpoint.


It seems that under certain circumstances (e.g. a very good linear power supply and good software such as Euphony), an optimised single box might be the preferred option at the moment. bobfa has also experienced this.  davide256 also confirms above that an upgraded power supply removed the need for a separate endpoint.

 

 
