RAID0?

I need to build a storage array that will be used almost exclusively for writing large files (mostly long videos at 1080p or higher resolutions), but will also be read by more than four editing workstations.

Basically, there is one video+audio input capture workstation that saves all of the video and audio onto the storage array in 10-minute chunks.
While the files are being saved onto the array, two workstations are going to read the completed chunks and compress them with a low-bitrate VP9 codec. Two or more workstations are going to use those same files for editing and combining the audio stream with the video before exporting to VP9 at a near-lossless bitrate.

What sort of solution should I be looking at? PCIe SSDs in JBOD, SATA SSDs in RAID0, or SATA/SAS hard drives in a RAID0?
>inb4 no redundancy
All the files will be backed up on different workstations (depending on which one is accessing the raw files) and on the video/audio capture machine. There will be multiple copies of the raw video+audio track in case of failure or corruption.

Consider that you probably don't have 10GbE or better, so it really doesn't matter. A single modern HDD has been able to saturate a gigabit link for years.
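Quick sanity check on that claim; the ~200 MB/s sequential figure below is an assumed ballpark for a modern 7200 RPM drive, not a measurement:

# rough check: can one HDD saturate gigabit Ethernet?
gbe_mb_s = 1e9 / 8 / 1e6   # a 1 Gbps link is ~125 MB/s of payload, ignoring protocol overhead
hdd_mb_s = 200             # assumed sequential throughput of a typical modern 7200 RPM HDD
print(gbe_mb_s, hdd_mb_s, hdd_mb_s > gbe_mb_s)   # 125.0 200 True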

this

and
figure out whether you need latency or bandwidth. Is your editing purely linear or is it random-access?

>Consider that you probably don't have 10GbE or better
Actually I do. I have two 10GbE cards on each machine so that they can connect to each other directly if need be in case the storage array goes down. They are all hooked up to a 10GbE switch I bought used for $500, so I don't have to have five 10GbE NICs on the NAS taking up all of the PCIe slots. I don't think PA is necessary on the workstations (yet), so it's one 10GbE SFP+ connector per machine and two on the NAS itself.

source plz

Purely linear, which makes going with an HDD array a bit more palatable. But if I am going to RAID a bunch of platters, I want to know what sort of speeds I should expect and whether I can make them fast enough to write files at rates in excess of 1.0GB/s.
And I'll be using hardware RAID solutions for either a SAS or SATA array, not the onboard RAID on the motherboards, since those peak at around 1.1GB/s write in my benchmarks. And that's with only one workstation writing to the disk, not multiple workstations reading and writing on the array at the same time.
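For a rough idea of how wide the array would need to be, here's a quick sketch; the per-drive rate and the RAID0 scaling efficiency are assumptions, not numbers from my hardware:

import math

target_mb_s = 1000        # sustained writes in excess of 1.0 GB/s
per_drive_mb_s = 180      # assumed sequential write rate of one 7200 RPM SATA/SAS HDD
efficiency = 0.85         # assumed striping efficiency; real arrays rarely scale perfectly

drives_needed = math.ceil(target_mb_s / (per_drive_mb_s * efficiency))
print(drives_needed)      # 7 drives under these assumptions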

>They are all hooked up to a 10GbE switch I bought used for $500
which make/model? a quanta?

You still haven't defined your budget or what "large" files are.

> I want to know what sort of speeds I should expect
pic related is 8x HGST UltraStar 7K4000 SATA disks in a RAID 6 behind an Areca 1833ix-24 with 8GB of cache

>1833ix-24
1883ix-24

pic related is 8x 480GB Seagate 600 Pros on the same controller

Better RAID cards like the 1883ix and others also support flash tiering, where one or more SSDs are used as a write cache for the HDDs, which would work well in your scenario.

>which make/model?
Give me a sec, need to make a trip downstairs to check the box
>You still haven't defined your budget
There's no limit to how much I can spend on the NAS, as far as I know.
>what "large" files are.
Like I said, mostly raw and encoded (to VP9) video + audio files. Resolution and framerate depends on what the source recording camera is. Same with audio quality, since the people who we work with usually bring their own audio equipment and somehow expect everything to "just werk". I need to take into account the worst case scenario of recording raw 2160p 16bit or higher color depth video so we don't have any problems down the line until the next time we need an equipment upgrade.

Questions:
These benchmarks are only for a single thread read/write workload on the array, correct?
What does it look like when multiple concurrent read/write operations are done on the array?
Can the 1883ix-24 handle more than four active read/write operations at a time?
Should I opt for SAS rather than SATA with this sort of workload?
What sort of SAS or SATA controller do you think can handle what I need?

>Better RAID cards like the 1883ix and others also support flash tiering, where one or more SSDs are used as a write cache for the HDDs, which would work well in your scenario.
As a write-through cache?
Would it matter how large the cache is compared to the actual video file sizes?
That might make SATA arrays more manageable for the expected workload.

I'd love to answer what switch it is, but my fucktarded client took the damn thing home with him. He's the one who picked the switch up from the seller, so I have no idea if he bought a legit 10GbE switch or some chinkshit hub with "10GbE" taped on its front.

I'm going to assume that it's a legit 10GbE switch for the sake of my sanity. The only thing I know about it is that it's a Dell-branded rackmountable 10GbE switch with 8 SFP+ ports and two bridged ports.


>Actually I do. I have two 10GbE cards on each machine so that they can connect to each other directly if need be in case the storage array goes down.
Why would the storage array going down matter if they're connected via a switch?

Just bond the NICs together for doubled bandwidth (20 Gbps). Or if you're worried about downtime, get a second switch and use them redundantly.

>hardware RAID
don't waste your money, just use mdraid or zfs. They won't be your bottleneck, and they will prevent you from shooting yourself in the foot

ZFS also supports this, as does linux with mdraid+bcache

still no reason to waste your money on hardware RAID controllers

>2160p 16-bit raw video
You didn't specify the framerate or subsampling so I'll assume it's 60 fps 4:4:4 (worst case scenario).

At that configuration you'll be pushing 23 Gbps of traffic if you want realtime playback

>These benchmarks are only for a single thread read/write workload on the array, correct?
yes

>What does it look like when multiple concurrent read/write operations are done on the array?
The box it is on runs ESXi now so I can't really benchmark it easily without taking down 30+ VMs.

>Can the 1883ix-24 handle more than four active read/write operations at a time?
Yes, it's based on an LSI 3108, pretty much the top-end RAID chip right now.

>Should I opt for SAS rather than SATA with this sort of workload?
More a question of your budget than anything, but ideally yes. SATA managed to fuck things up in ways you wouldn't think possible.

>What sort of SAS or SATA controller do you think can handle what I need?
Again, more a question of your budget.

>As a write-through cache?
yes

>Would it matter how large the cache is compared to the actual video file sizes?
Yes, when the cache gets full things slow down. I don't bother with tiering; I just dump all the VMs on the SSD RAID and keep a large shared virtual disk for my NAS on the HDD RAID.

>I need to take into account the worst case scenario of recording raw 2160p 16bit or higher color depth video so we don't have any problems down the line until the next time we need an equipment upgrade.
How many TB of storage do you need? What do the file sizes look like?

>At that configuration you'll be pushing 23 Gbps of traffic if you want realtime playback
Unless you really do have an unlimited budget, it sounds like you'll probably be doing tiering.

>Just bond the NICs together for doubled bandwidth (20 Gbps).
LACP doesn't improve bandwidth if there is a single data stream which sounds like the case for the workstations.

>memefs

>still no reason to waste your money on hardware RAID controllers
offloading disk operations from the CPU is why RAID cards still exist.

>At that configuration you'll be pushing 23 Gbps of traffic if you want realtime playback
Then OP is fucked and his low end 10GbE switch is useless.

>get a second switch
I didn't consider that since my boss might be more than frustrated at spending another $1500 on an "unnecessary" switch. I'm not sure I can make a strong enough argument for it when he knows it's cheaper to just replace the switch outright if it fails and spend the money on a new one then.

>You didn't specify the framerate or subsampling so I'll assume it's 60 fps 4:4:4
Think worst-case, so probably 60fps (even though I've never seen any of our clients use anything other than 25fps 4K cameras). They all use 4:4:4 subsampling, so yes.
>At that configuration you'll be pushing 23 Gbps of traffic if you want realtime playback
Are you sure? I thought the total bandwidth wouldn't push more than 10Gbps even with my worst-case scenario.
Is there any way to team 2 10GbE ports with 4 1GbE ports? Is that even possible?
I should be good for 2160p 16bit 4:4:4 at 25fps, right? Last I checked, the most I needed was about 12Gbps total bandwidth for that.

>don't waste your money, just use mdraid or zfs
>still no reason to waste your money on hardware RAID controllers
Dude, the NAS array is powered by a puny dual-core Pentium G4400. There's no way in hell such a weak processor can manage that many near-saturated 10GbE connections AND the RAID arrays at once without something slowing down.
I wouldn't have gone for the expensive options if the cheap option could do the job.

>wouldn't push more than 10Gbps even with my worst-case scenario
Fuck, I meant 19Gbps

>LACP doesn't improve bandwidth if there is a single data stream which sounds like the case for the workstations.
Depends on the bonding mode. You can round-robin them on linux etc.

>offloading disk operations from the CPU is why RAID cards still exist.
Even during RAID6 recovery there is virtually no CPU load. A single core of my 3.3 GHz SB-E processor has 10 GB/s throughput with SSE2/SSSE3 instructions, basically bound only by memory.
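To illustrate why the CPU cost is so small, here's a toy sketch of the P-parity math that software RAID does per stripe (numpy and the 4-drive, 256 KiB-chunk layout are just assumptions for the example; RAID6's second Q parity is GF(2^8) arithmetic, which is presumably what those SSE2/SSSE3 figures refer to):

# toy illustration: RAID5/6-style P parity is just XOR across the data chunks in a stripe
import numpy as np

chunk_size = 256 * 1024                           # assumed 256 KiB chunk per drive
rng = np.random.default_rng(0)
data_chunks = [rng.integers(0, 256, chunk_size, dtype=np.uint8) for _ in range(4)]

parity = np.zeros(chunk_size, dtype=np.uint8)
for chunk in data_chunks:
    parity ^= chunk                               # accumulate the P parity

# if any one chunk is lost, XORing the parity with the survivors rebuilds it
rebuilt = parity.copy()
for chunk in data_chunks[1:]:
    rebuilt ^= chunk
assert np.array_equal(rebuilt, data_chunks[0])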

>I didn't consider that since my boss might be more than frustrated at spending another $1500 on an "unnecessary" switch. I'm not sure I can make a strong enough argument for it when he knows it's cheaper to just replace the switch outright if it fails and spend the money on a new one then.
I thought your budget was unlimited?

Anyway, a second switch is only needed if you want fault tolerance against a switch failure

>What sort of solution should I be looking at? PCIe SSDs in JBOD, SATA SSDs in RAID0, or SATA/SAS hard drives in a RAID0?

10GbE and lots of NLSAS HDDs will be the most effective solution. SSDs are great for low latency random IO, but no better than SAS HDDs for large sequential writes (no, not even PCIe ones).

>Are you sure? I thought the total bandwidth wouldn't push more than 10Gbps even with my worst-case scenario.
16 bits/sample
3840*2160 samples/plane
3 planes/frame
60 frames/second

16*3840*2160*3*60 ≈ 23.9e9 bits/second

for ( drive speed not fast enough )
{ drive++ }

Also add Ethernet/IP/UDP overhead
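Same arithmetic as a small script, with the 25 fps case from earlier in the thread for comparison; the ~5% overhead figure is just an assumed ballpark, not a measured number:

# raw (uncompressed, 4:4:4 so three full planes) video bitrate in Gbps
def raw_gbps(width=3840, height=2160, bit_depth=16, planes=3, fps=60):
    return width * height * bit_depth * planes * fps / 1e9

print(raw_gbps(fps=60))         # ~23.9 Gbps, the worst case above
print(raw_gbps(fps=25))         # ~10.0 Gbps for the 25 fps cameras mentioned earlier
print(raw_gbps(fps=60) * 1.05)  # ~25 Gbps with an assumed ~5% Ethernet/IP/UDP overhead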

>The box it is on runs ESXi now so I can't really benchmark it easily without taking down 30+ VMs.
If you're running 30+ VMs off of it, surely you have more than 1 thread of requests going to the array at once. I just need pure sequential writes/reads, not random small file writes.
>LSI 3108
That's the RAID card I was looking at, but that's SAS2/SATA3 only, right?
Wouldn't a SAS3 drive theoretically yield much higher sustained read/write operations with multiple requests? Or does that not matter with SAS?
> SATA managed to fuck things up in ways you wouldn't think possible.
How so? I'm guessing I should go SAS only on the RAID card, but does it matter for the drives?
>unless you really do have an unlimited budget it sounds like you'll probably be doing tiering
So which RAID solutions should I be looking at if I need tiering?
>LACP doesn't improve bandwidth if there is a single data stream which sounds like the case for the workstations
In my case, there will be multiple data streams, but if what you're saying is correct, then I may need more than 2 teamed 10GbE ports?
Fuck me, is that even supported on current 10GbE+ NICs?

>Dude, the NAS array is powered by a puny dual-core Pentium G4400.
Nevermind. I didn't see that. You're right, trying to build a server as powerful as this on a dual-core pentium is a joke.

Get a system with at least 128 GB of RAM and 8-16 cores, OP, if you truly want to serve that many clients at that bandwidth.

>Wouldn't a SAS3 drive theoretically yield much higher sustained read/write operations with multiple requests? Or does that not matter with SAS?
The bottleneck is the physical drive, not the SAS cable. Your HDDs won't hit 3 Gbps

>>get a second switch
>I didn't consider that since my boss might be more than frustrated at spending another $1500 on a "unnecessary" switch
This won't help at all unless you have multiple data transfers going at the same time. If that user was right about 23Gbps, you're looking at 40GbE, which is going to be a lot more than $1500.

>Is there any way to team 2 10GbE ports with 4 1GbE ports?
I haven't tried it before and am not motivated enough to do it on my switch, but teaming ports requires multiple data streams to increase bandwidth. It would help on the file server but be useless for the workstations.

>the most I needed was about 12Gbps total bandwidth for that.
Then you need 40GbE

>Dude, the NAS array is powered by a puny dual-core Pentium G4400.
You're fucked.

>Fuck, I meant 19Gbps
You need 40GbE

>That's the RAID card I was looking at, but that's SAS2/SATA3 only, right?
It's an 8-port SAS3 (12Gb/s) chip; my card has a 28-port expander on board.

>How so? I'm guessing I should go SAS only on the RAID card, but does it matter for the drives?
I had an SSD start to die: commands would time out and the failure lights on every disk in the array would come on, which made it impossible to figure out which disk was dying until it actually did die and the controller figured it out. If you Google around, there are other people reporting similar things where a single disk can cause every disk on the controller to in effect fail. As another user said, look at nearline SAS if you can afford it.

>So which RAID solutions should I be looking at if I need tiering?
LSI 3108s support it, though IIRC it needs a license. Rebranded cards may include support for it, like my Areca does.

>In my case, there will be multiple data streams, but if what you're saying is correct, then I may need more than 2 teamed 10GbE ports?
You need 40GbE which is going to be expensive as fuck.

>Fuck me, is that even supported on current 10GbE+ NICs?
Yes.

>I thought your budget was unlimited?
It is in the sense of the NAS array itself, but my boss would rather not have me spend money on shit he can physically see (like adding another switch on top of the one we already have). As long as he doesn't see the money at work, it's less of an annoyance to him.
>a second switch is only needed if you want fault tolerance against a switch failure
I'd like that. Not sure if my boss will approve of that until switch failure during production costs him money.
I thought that if I offloaded most of the CPU load onto the RAID cards and CPU-efficient NICs, I wouldn't need anything more than a Pentium. The solution we have right now is an Ivy Bridge Core i3 and I've never seen CPU usage higher than 20% even with both RAID and NIC cards being saturated by our capture machine. I assumed that half the threads with almost twice the CPU frequency would be enough.
But I think you're right. I already got the C236 motherboard, but I'll switch the CPU to a Xeon E3 v5 when I review the order.

>I've never seen CPU usage higher than 20% even with both RAID and NIC cards being saturated by our capture machine.
How many gbps was it pushing? Do the NICs have RDMA support?

>How many gbps was it pushing?
I think that time it was pushing three ~900 Mbps streams on the NIC and an additional ~600 Mbps doing a backup for our old audio-only workstation. I'm not sure about the RAID drive, but that was the only storage we had on that machine.

>Do the NICs have RDMA support?
No
ark.intel.com/products/49186/Intel-Ethernet-Server-Adapter-I340-T4

>three ~900 Mbps streams on the NIC and an additional ~600 Mbps
You had 20% CPU load with only ~3.3 Gbps of traffic?

You're basically saying that my most expensive parts will be the network cards and switch rather than the RAID array, which I was afraid of. We already bought the switch (although I'm not sure if it's legit since I haven't had a chance to see it yet), so I may already be fucked there for wasting money.
>It would help on the file server but be useless for the workstations.
Would it be better to install RAID cards in all of the workstations and use the new NAS as a non-live storage then? That would cost production time, but I can see what you're saying about the network itself.

That was for the Ivy Bridge system. I'm using these for the new Skylake-based NAS.
ark.intel.com/products/83964/Intel-Ethernet-Converged-Network-Adapter-X710-DA2

>You're basically saying that my most expensive parts will be the network cards and switch rather than the RAID array
Well, a 12-port 40GbE Cisco Nexus 5624Q is going to run around $20k plus all the NICs and QSFPs, and that's assuming you only need a layer 2 switch; if you want full layer 3 you get to pay even more for licenses.

>Would it be better to install RAID cards in all of the workstations and use the new NAS as a non-live storage then?
It will be a lot cheaper.

You can make your life many times easier by giving each workstation its own local storage that it can access at full speed.

Plus, if you let all workstations share data with each other directly instead of using a central bottleneck, you will get O(N^2) times the potential bandwidth.

>$20k plus all the NICs and QSFPs
Yeah, that's not happening. Our NAS and network upgrade to 10GbE is potentially costing us almost $15k.

>you will get O(N^2) times the potential bandwidth.
Sorry, brain fart. That's only true for a fully connected graph, not for a switch. You will only get an O(N) improvement.

>You can make your life many times easier by giving each workstation its own local storage that it can access at full speed.
I think you're right. The original plan was to save money by not putting RAID cards and extra drives in every station and centralizing everything instead, but if the network can't handle even our current 4K workload, and upgrading to one that can costs more than the expected price of the NAS alone, then it's not worth it.

Throw a couple of PCI-e SSDs into the workstations and use soft RAID? I assume they have capable CPUs

>I assume they have capable CPUs
They're all Xeon E5 v3s. We already have SATA RAID cards in them because the built-in RAID solution on those motherboards capped out at around 1.1GB/s. Adding more drives didn't help, I'm guessing because it either fully saturated the DMI 2.0 link from the southbridge on those boards or the SATA controller itself. PCIe SSDs in JBOD seem like an ideal solution, but they're still too small for most of our recordings. We fill up 4TB drives within 3 hours during 4K recording sessions.
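For what it's worth, filling 4 TB in roughly 3 hours implies this kind of sustained rate (rough arithmetic, assuming the drive is being written more or less continuously):

# sustained write rate implied by filling a 4 TB drive in ~3 hours
capacity_bytes = 4e12                    # 4 TB, decimal
session_seconds = 3 * 3600               # ~3 hour recording session
mb_per_s = capacity_bytes / session_seconds / 1e6
print(round(mb_per_s), round(mb_per_s * 8 / 1000, 1))   # ~370 MB/s, ~3.0 Gbps sustained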

Have you tried using lossless video instead of raw video? FFV1 can typically give you a 60%-70% reduction in bitrate over raw video, which would (in principle) cut down on your costs by the same amount.
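Rough math on what that would mean for the 4 TB-per-3-hours figure above (the 60%-70% range is taken from this post, not measured on your footage):

# assumed: ~4 TB of raw capture per 3-hour 4K session, per the post above
raw_tb_per_session = 4.0
for reduction in (0.60, 0.70):
    print(reduction, round(raw_tb_per_session * (1 - reduction), 1))   # 1.6 TB and 1.2 TB per session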

>Have you tried using lossless video instead of raw video?
Our capture cards are very picky about compressed output: other than raw, they can only do a few proprietary lossy codecs. I'd love to swap the cards out, but then we'd have to get new drivers and firmware installed that don't play well with some of our cameras, audio equipment, etc.

That webm rustled my jimmies