
Oracle Lifts The Veil On Big Memory Sparc M6-32 System 

The word on the street ahead of the Oracle OpenWorld extravaganza, which opened on Sunday evening in San Francisco, was that CEO Larry Ellison would take the stage and talk about the company's biggest and baddest server to date along with in-memory extensions for the Oracle 12c database. Those rumors turned out to be spot on.

The precise feeds and speeds of the "Big Memory Machine," as Ellison called the Sparc M6-32 system, were not available as EnterpriseTech went to press, but we will hunt them down. In the meantime, let's go over what we know from Ellison's keynote address.

The Sparc M6-32 is based on the twelve-core Sparc M6 processor, the second of the high-end Sparc processors that Oracle has launched since it took over Sun Microsystems in early 2010. The M6 has twice as many cores as the M5 processor it replaces; both chips are based on the "S3" core design that has also been used in the prior two generations of the Sparc T series chips aimed at entry and midrange servers.

The T4 and T5 have more cores and less L3 cache memory, while the M5 and M6 have fewer cores, a lot more L3 cache (48 MB in this case), and are aimed at massively scaled, shared-memory systems rather than bitty boxes. Each M6 core has eight threads, which means a single socket in the box has 96 individual threads for processing. On Java, database, and other parallel workloads that like threads, these chips will offer decent performance. It is not clear how well they will do on single-threaded jobs. Oracle has not yet divulged the clock speeds on the Sparc M6 processors, but the M5 chips, which were implemented in the same 28 nanometer chip-making process as the M6, ran at 3 GHz.

The M6-32 system has a whopping 1,024 memory sticks in its chassis. That works out to 32 TB of main memory in a single system image with 384 cores and 3,072 threads. Ellison said that the 384-port interconnect, code-named "Bixby" and invented by Oracle, has 3 TB/sec of bandwidth linking the system together in a 32-way setup. The M6 chips are linked in groups of four into a single image using interconnect ports on the chips (much like a symmetric multiprocessing server), with the Bixby interconnect then hooking together multiple four-socket nodes into a single system with non-uniform memory access (NUMA) clustering. The M6-32 has 32 sockets, and the Bixby interconnect can scale all the way up to 96 sockets. Ellison said that the M6-32 had 1.4 TB/sec of memory bandwidth and 1 TB/sec of I/O bandwidth.
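
Those headline numbers hang together if you multiply out the per-socket figures. The quick Python sketch below walks through the arithmetic; note that the 32 GB-per-DIMM figure is our inference from 1,024 sticks adding up to 32 TB, not a number Oracle has confirmed.

    # Working out the headline M6-32 figures from the per-socket numbers.
    # The 32 GB-per-DIMM value is inferred from 1,024 sticks totaling 32 TB,
    # not a spec Oracle has published.
    SOCKETS          = 32
    CORES_PER_SOCKET = 12      # Sparc M6
    THREADS_PER_CORE = 8
    DIMMS            = 1024
    GB_PER_DIMM      = 32      # inferred: 32 TB spread over 1,024 DIMMs

    cores   = SOCKETS * CORES_PER_SOCKET        # 384 cores
    threads = cores * THREADS_PER_CORE          # 3,072 threads
    mem_tb  = DIMMS * GB_PER_DIMM / 1024        # 32 TB

    print(f"{cores} cores, {threads} threads, {mem_tb:.0f} TB of main memory")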

"This thing moves data very fast and it processes data very fast," bragged Ellison.

As he often does when he launches new systems, Ellison made a point of comparing his box to the largest IBM Power Systems machine, in this case the 32-socket, 256-core Power 795 server.

Presumably these are like-for-like comparisons, but without the details, it is hard to say how fair the comparison Ellison made above is.

In addition to being offered as a standalone server, the M6-32 will also be available as a so-called SuperCluster configuration, with Exadata storage servers (for speeding up database processing) hooking into the machine through a 40 Gb/sec InfiniBand network.

The Sparc M6-32 is available now and runs the Solaris 11 version of Unix. The chart above suggests that a 32-socket, 32 TB configuration should cost around $3 million.

While this machine is not limited to database processing, given the large memory footprint and the number of threads it can bring to bear, the M6-32 was certainly designed with database work in mind. Specifically, it is well matched to the new in-memory extensions to the Oracle 12c database that Ellison also previewed during his OpenWorld keynote.

These new in-memory database extensions created by Oracle are clever. Traditional databases store data in rows and index them for fast searching and online transaction processing. Newer databases (including the Exadata storage servers) use a columnar data store plus compression to radically speed up analytics pre-processing that is then fed up to the database.

With the new in-memory extensions, Oracle is now storing data in both row and column format – at the same time – and updating each to maintain transactional consistency across the formats. This puts overhead on the database, as Ellison explained, but because analytical indexes no longer need to be maintained and the columnar store runs in memory, there is less work for the database to do as information changes. (It does not have to update all of the OLTP and analytical indexes.) And thus, rows can be inserted somewhere between three and four times faster, transaction processing rates increase by a factor of two, and queries against the native columnar store residing in main memory run at least 100 times as fast as they do against an Oracle 12c database in row format working off disk drives.
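
To make the dual-format idea concrete, here is a minimal Python sketch of the concept: every insert is applied to a row-oriented structure (good for point lookups) and to a column-oriented structure (good for scans) in the same operation, so the two views stay consistent without a separate analytical index. This is purely illustrative and is not how Oracle 12c actually implements its in-memory option.

    # Toy dual-format table: one logical insert lands in both a row store
    # (OLTP-style point reads) and a column store (analytic scans).
    # Conceptual sketch only – not Oracle's implementation.
    from collections import defaultdict

    class DualFormatTable:
        def __init__(self, columns):
            self.columns = columns
            self.rows = {}                    # row_id -> tuple (row format)
            self.cols = defaultdict(list)     # column name -> values (columnar)

        def insert(self, row_id, values):
            """Apply one write to both representations in the same operation."""
            self.rows[row_id] = values
            for name, value in zip(self.columns, values):
                self.cols[name].append(value)

        def lookup(self, row_id):
            """Transactional point read, served from the row format."""
            return self.rows[row_id]

        def column_sum(self, name):
            """Analytic scan, served from the in-memory columnar format."""
            return sum(self.cols[name])

    orders = DualFormatTable(["customer", "amount"])
    orders.insert(1, ("acme", 120))
    orders.insert(2, ("globex", 75))
    print(orders.lookup(2), orders.column_sum("amount"))   # ('globex', 75) 195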

23 Responses to Oracle Lifts The Veil On Big Memory Sparc M6-32 System

  1. Brett Murphy says:

    The best way to describe the Oracle M6-32 is an “Unbalanced Mess”. They took 4 x T5-8 servers, reduced the cores from 16 to 12, and glued them together. This is supported by the fact that there are just 4 Dynamic System Domains and not 32, which one would prefer for granularity of resources.

    I’m picturing a latex glove blown up like a balloon. It is grossly misshapen. So many cores, so much memory, yet only 64 PCIe slots, only 32 internal drives divisible into how many groups? Only 4 DSDs yet 512 LDOMs….barely enough to cover all the physical cores and still not enough to support all of the threads.

    Let’s see an M6-8, M6-16, and M6-24 along with the M6-32, all showing some benchmarks. I bet we would see the performance drop off significantly from each tier. Can’t imagine how poor the performance would be for customers actually using dozens of LDOMs on this server. Oracle is simply about software. They want lots of cores, with features that prevent you from limiting them. Big servers mean big dollars to Uncle Larry. I’ll stick with my IBM Power servers, where I can control my Oracle software licensing. And I don’t need a 780 like Oracle likes to say. Where Oracle says 64 cores, I use 16 IBM Power cores and still license just the cores I need for Oracle.

    Btw, don’t necessarily be fooled by the claims of performance increases. Just moving the db into memory will yield significant results. They are letting readers think the server has more to do with that than simply putting the database into RAM.

    • Daniel says:

      @ Brett … how about using DB2 and having inside control of the db server(s), multi-threading|tasking, etc.; since you are on the IBM trust list you could even talk to the developers to get special builds for your sales; the other option is of course open hardware & software, with the sources in your developers’ hands

      … ideas coming from just a ‘poor’ architect!

      P.S. just curious: what happened with ‘As Is Arizona’?

      • Brett Murphy says:

        I agree. I understand loyalty to a brand….but to a point. DB2 offers customers so many more features for less money: Compression, Partitioning, Tools, Replication, Data Protection, BLU, and PureScale, and it includes the first year of maintenance on new licenses, where Oracle charges 22% above the license cost. I did an analysis for an AIX / Power customer between the two, and Oracle was over twice the cost of DB2 (for Linux, Unix & Windows) ASE edition, which is equivalent to Oracle’s Enterprise Edition offerings.

        • Daniel says:

          ‘Oracle is simply about software’ !?!

          Unfortunately you completely missed my point of complete irony … an IBM vendor with apparently not so much technical background (just reading from your discourse/critique) in either development or system/network engineering, commenting on the competition … IBM has a real competitor in its main business, hardware, now with Sun’s resources!

          Sun had already quite a large market share so …

          I have background in all the above things … I used to be, among other things, in my ‘youth’, in the Linux kernel development gang, hacking for a better future 😉

          Just curious … have you ever tested M6-32s and tried to build architectures with those hosts? If yes, would you be kind enough to share the configurations with us, just to have a baseline for your critique?

          • Brett Murphy says:

            Interesting that you question my place to comment on technology while you assert that you are qualified…well, let me step back and defer to you sir – the floor is yours as you have clearly made your case…..not!

            Larry has stated he wants to be like Big Blue, and just because you say the M6-32 is a competitor doesn’t make it so. And where is the rest of the M6 family? Who wants to buy only a 384-core server with 32 or 16 TB of RAM? Where is the 16, 32, 64, 128? I guess a customer could buy an M5-32, the 6-core-per-socket version of the M6. My guess is that this was Oracle’s attempt to take the chips that wouldn’t pass with 12 cores at 3.6 GHz and make them 6 cores – nothing wrong with this, as everybody does it. What this shows is that there is no real scalability with either the M5 or M6. Both are unbalanced messes….Let’s face it, Oracle is a software company that makes money from selling software licenses. If you could give away the M6-32 and get customers to buy Oracle licenses on them, you would, just like HP giving away printers to sell ink. Customers see through this.

            Sun’s market share was quite large. I worked there for 10 years – it was a wonderful company, but they lost their way with Schwartz, and they were never very good at evolving a product, as they tended to take a good thing and drive it until they wore it out. I can give you chapter and verse.

            I haven’t tested an M6-32….have you actually tested a Power7+ server using AIX and PowerVM? I have the benefit and perspective of having worked at both Sun and IBM (I no longer work for either) and have experience with both. I have also read the datasheets, reviews, whitepapers, and docs for the M6 and M5-32 servers. The servers’ weaknesses and limitations speak for themselves. Sounds like you work for Oracle, so I guess you have to love your ugly baby – if Oracle is anything, it is an outstanding marketing company. Oracle could do more than have their staff refute negative comments about their servers; they could publish competitive benchmarks using valid benchmarks with current servers in the results, instead of the cherry-picking Oracle is well known for. Let’s see, I’ll publish a benchmark against a Power6 server from 5 years ago, then claim dominance – woohoo, you are Da MAN! Or, Oracle does a benchmark for 1 TB, then extrapolates and says how it would destroy IBM Power7. Wait, I’m on a roll, so I’m going to say more. Or, you will compare your result against the high-end 795 server so you can try to show better price/performance. Or, or…or, you will use Oracle support that is “per incident”, or Oracle database licenses that are not perpetual (unlike what most customers buy), to manipulate the results.

            If you would like to hear more I would be happy to dissect more of the M6/M5-32’s features so the rest of the internet can continue to learn more.

            I’ll take any Power7+ server with 1/4th the number of cores of any SPARC or Exa server. I’ll use an IBM FlashSystems RamSan 820, which is a 1U, 300 watt, 35 lb, 24 TB flash array generating over 500K IOPS with latency as low as 25 microseconds, preferably with DB2 v10.5; but since my AIX on Power server runs Oracle better than your SPARC box, either one will beat your server in performance and overall TCA and TCO. Unlike the vendor games used for benchmarks, we would use “real” customer discounts, licensing, and prices. If Larry offers “Oracle will pay customers $10 million if a Sun-Oracle configuration isn’t at least twice as fast as a comparable IBM solution,” I’m sure he isn’t afraid of this challenge.

  2. […] round out the feeds and speeds of the M6-32 that debuted earlier this week, which were missing at the time. The M6 chip has one cryptographic unit and one floating point unit […]

  3. Daniel says:

    Again a long diatribe of sales talk:

    First, Larry ‘might sue’ you for calling him a ‘liar’ and will not give me anything from the ‘proceeds’ since I am not defending/working for anybody but myself!

    Second, and returning to the focus of my position – I am not a talker (neither sales nor management) but just a builder – systems based on custom Linux kernels (since I/we have the source) optimized to do the right/best thing from the client perspective – balancing processing, memory and if needed network bandwidth for the job(s) at hand … Exadata, InfiniBand, etc.

    Third, Timothy and Gary (Combs) are invited to step in as referees!

    (http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/o13-066-sparc-m6-32-architecture-2016053.pdf)

    @ Timothy – M6 – 3.6 GHz – official
    + what Larry bragged, I see it as true in reality!

    P.S. Even Larry could come in here and ‘brag’ some more 🙂

    • Brett Murphy says:

      Hmmm, you used the word ‘liar’, not me. Clearly we disagree on whether or not the SPARC M6-32 (and M5-32 for that matter) are unbalanced messes. The docs, including the PDF link you provided, show it. So you know, I have no issues with Solaris or Linux. RedHat and SuSE are great. Oracle’s UBL is another example of Oracle trying to lock customers into their technology stack. None of the above offer the features that AIX does – but that is a Ford vs Chevy discussion. What is undeniable is the M6 architecture: its limited IO, limited to just 4 domains, down from 32 with the M9000. Barely enough LDOMs to cover the number of physical cores. The continued inefficient thread dispatch of 2 threads per clock cycle – what does that do to LDOMs? Maybe that is why they limit the number to just above the number of physical cores, so you don’t give customers too much rope. Compare that to 4 threads executing every clock cycle on Power. 1,000 DIMMs? Why, just because you can? So Oracle can say they have a 32 TB server? Well, the cost is that they hurt the MTBF by increasing the number of components. There was a time when Sun tried to reduce the number of components in servers, as they recognized that contributes to failure. Also, by having so many DIMMs, the M6/M5 architecture puts 4 DIMMs on each channel, which is 16 DIMMs per memory controller – yes, great for capacity but not so good for performance. Power servers are about balanced computing, not showing off just for the sake of showing off. The enterprise servers like the 770, 780, and 795 provide 1 DIMM per memory channel and 4 memory channels per memory controller.
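
      To put rough numbers on that DIMM layout (taking the article’s 1,024-DIMM total and my 4-per-channel / 16-per-controller figures at face value; the real M6 memory topology may differ):

          # Arithmetic check only; all inputs are the figures quoted above.
          TOTAL_DIMMS       = 1024   # from the article
          SOCKETS           = 32
          DIMMS_PER_CHANNEL = 4      # claimed above
          CHANNELS_PER_CTRL = 4      # implied by 16 DIMMs per controller

          dimms_per_socket = TOTAL_DIMMS // SOCKETS                  # 32
          dimms_per_ctrl   = DIMMS_PER_CHANNEL * CHANNELS_PER_CTRL   # 16
          ctrls_per_socket = dimms_per_socket // dimms_per_ctrl      # 2

          print(f"{dimms_per_socket} DIMMs per socket, {dimms_per_ctrl} per "
                f"controller, {ctrls_per_socket} controllers per socket")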

      If I could award the M6/M5 a virtual Grand Prize ribbon for “Unbalanced Mess” I would. Btw, you mention ExaData – another IBM wannabe, trying to be like IBM’s integrated IBM i (i.e. OS/400) server. ExaData is hugely proprietary, single-purpose, restrictive, and expensive. Unlike ExaData, the 25-year-old IBM i environment can run side by side with AIX and Linux on Power servers, and all 3 environments are more open, flexible, and inexpensive than ever. ExaData is whitebox x86 in an inefficient design, plus flash and lots of software to overcome all of the hardware, scalability, and performance limitations. Any Power server with IBM FlashSystems running Oracle or DB2 will smoke ExaData. I know, I know. You are smarter than I am and you know what you are talking about and I don’t…ok, uncle. Seriously though, if you were in the midwest I would buy you a beer and shake your hand. I’m passionate but don’t mean to make it personal.

      The IBM Power7+ 780 @ 4.42 GHz (http://www-03.ibm.com/systems/power/hardware/780/specs.html) outperforms, is more efficient than, and has a significantly lower TCO than SPARC and x86 running Oracle! I see it and do it every day.

  4. Daniel says:

    Timothy started with Ellison and you continued with Larry … what is more personal than that … sorry if you feel too personalized in this ‘mess’!

    In our labs I see the systems coming out pretty balanced … repeating myself … for the job(s) at hand! And then of course on clients’ grounds, not much of a ‘mess’!

    At the beginning of this millennium, IBM invested around 2 billion dollars and then Caldera came after them; after the noise I did not see much Linux involvement on developerWorks; is AIX resurrected and maybe prepared to be open-sourced? (full kernel and all!)

    I am not necessarily smart, but I feel like a hacker in the original MIT semantics, not to mention Berkeley! Eric would help me on that:

    http://catb.org/esr/writings/hacker-history/hacker-history.html

  5. Jody says:

    Very interesting conversation, but I think the biggest point is way overlooked here. ORACLE LIES!!!!! and a lot. Not my personal opinion; just look at the fact that the FTC has had to smack their hands time and time again. Sorry, but I would not trust them any more than any two-bit used car salesman trying to tell me that a Hummer is more fuel efficient than a Toyota Prius.

    The facts really speak for themselves.

    http://www.wired.com/wiredenterprise/2012/07/oracle-advertising-slapped/

    http://www.eweek.com/c/a/IT-Management/Oracle-Rapped-for-Misleading-Advertising-826091/

    http://techcrunch.com/2012/07/25/oracle-pulls-ads-after-national-advertising-group-says-it-made-false-claims-against-ibm/

    http://www.itbusinessedge.com/cm/blogs/enderle/oracles-alleged-false-advertising-showcases-deeper-problems/?cs=50853

    http://www.businessinsider.com/oracle-attack-ads-cause-trouble-2013-8

    • Daniel says:

      Jody dear, you just made my point. I don’t care about corporate lies; I care about toys good enough to help in my projects; the focus here, at least mine, is on computing, not business!

    • kebabbert says:

      Oracle lies? Have you read the complaints? In one case Oracle said “20x faster than IBM” referring to one specific case. That is unfair, complained IBM, and Oracle had to stop using that.

      Have you heard about when IBM claimed that one z10 Mainframe with 64 cpus can replace 1,500 x86 servers? Well, the fact is that the z10 cpu was much slower than a decent x86 server cpu back then. The z10 cpu is on par with an old Celeron cpu. So how could 64 weak cpus replace 1,500 x86 servers? After digging around a bit, I found that all the x86 servers idle, and the IBM Mainframe is loaded 100%! And also, the x86 servers were old and had something like 256 MB RAM or so. What would happen if some 100-ish x86 servers started to do some work? The IBM Mainframe would never be able to catch up. So why was IBM not forced to pull back that advertisement? It is possible to emulate an IBM Mainframe on a laptop, using the open source “TurboHercules”. If I boot up three Mainframes on my laptop, can I claim that my laptop can replace three IBM Mainframes? Would you consider that a lie?

      Even the newest z12 IBM Mainframe cpu at 5.26GHz is way slower than a decent x86 cpu.

      “Here is a source from Microsoft about how slow the IBM Mainframe cpus are:
      http://www.microsoft.com/presspass/features/2003/sep03/09-15LinuxStudies.mspx?
      “we found that each [z9] mainframe CPU performed 14 percent less work than one [single core] 900 MHz Intel Xeon processor running Windows Server 2003.” The z10 is 50% faster than the z9, and the z196 is 50% faster than the z10, which means a z196 is 1.5 x 1.5 = 2.25 times faster than a z9. This means a z196 corresponds to a 2.25 x 900MHz = 2 GHz Intel Xeon. But today’s modern server x86 cpus have 8 cores, which means they have in total 8 cores x 2 GHz = 16 GHz. We see that x86 at 16GHz is more than z196 at 2GHz. This shows how slow the z196 cpu is.

      Here is another source from a famous Linux expert that ported Linux to IBM Mainframe, who says 1MIPS == 4MHz x86.
      http://www.mail-archive.com/[email protected]/msg18587.html
      This shows that a z196 with 1400 MIPS corresponds to a 5.6GHz x86. But a modern x86 has 8 cores, which means it has in total 16GHz, which is 3x faster than 5.6GHz. Again, we see that the Mainframe is not suited for number crunching.

      Here is another link where the cofounder of the Mainframe emulator TurboHercules says that an 8-way Nehalem-EX gives 3,200 MIPS using software emulation: http://en.wikipedia.org/wiki/TurboHercules#Performance
      But software emulation is 5-10x slower. This means an 8-way Nehalem-EX running native code should be 5-10x faster, that is, 16,000 – 32,000 MIPS. This big MIPS number matches a fully equipped z196 mainframe with 24 cpus.”

      Ergo, you can replace the biggest IBM Mainframe, sporting 64 cpus, with an 8-16 cpu x86 server.

      Also, the latest zEC12 IBM Mainframe cpu is just 30% faster than the previous z10, so the zEC12 is not that much faster than the z10 cpu.

  6. Brett Murphy says:

    No need for IBM to open source AIX. Power is about flexibility for customers. For maximum security, features, scalability, and stability in a commercial Unix OS, customers can choose AIX. Customers who require integrated business apps choose IBM i. Those who want open source on the most scalable, reliable, and flexible platform can choose Linux (RedHat & SuSE) on Power. IBM just announced a $1B investment in its PowerLinux initiatives, not to mention the OpenPower Consortium. https://www.enterpriseai.news/2013/09/18/ibm-invests-1-billion-linux-revive-power-systems/
    http://www.itjungle.com/bns/bns080613-story01.html

  7. Steve Thomas says:

    I hope it’s not too late to ask a question / make a comment here.

    I’m looking at an extreme high-scale workload that looks like a big Java program running entirely (for all intents and purposes) out of main memory. I will be accessing an in-memory data store in a randomly accessed way (get 0.1 KB of memory, process, get 0.1 KB more memory, process, etc.).

    Now, I like the idea of having main memory in the terabytes, and I can (eventually) use all of it.

    The problem sticking in my mind is this: 48 MB of cache is puny to the point of meaningless in the face of 32 TB of RAM. That means that my app is basically going to be making long-distance memory requests about every 5-10 instructions, say.

    My benchmarks show me that random access to RAM when you go past the cache size is about 20-30ns on my test system. Maybe faster on a better processor? Sure. Call it 15ns?

    15 ns = a 66 MHz processor. Remember those?

    Like a system in the old days that was IO constrained, I am now more interested in the overall max memory bandwidth (requests per second) than I am in the aggregate processing power (which is basically “infinity” now).

    So the real question for evaluating a high-scale system, for me, is: what is the max number of main memory requests per second for the system? That is going to be my limiting factor, and I suspect my app is not that different from a lot of others (in-memory Oracle is probably going to have a similar profile).

    And the corollary question is, given a (call it) 1:10 ratio of processing instructions to memory fetches/stores, what is the *actual* performance of Sun’s offerings vs. others?

    If you think about it, a 3 GHz processor is executing an instruction every ~1/3 ns and thus nopping ~45 times while main memory is fetched. With my processing estimate of a 1:10 ratio, that means that I’m really getting about 1/4 of the actual clock speed. (Hence hyperthreads are very real to my app.)

    Now, take the stated overall system bandwidth of 1.4 TB/sec. I’m going to guess that is derived from, say, 1 KB packets (I might be completely wrong in my approach here?). If that were the case, though, that means 1.4 billion memory fetches per second. For my app profile, a total of 14 billion instructions per second.
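
    Working those assumptions through as a quick sketch (every figure below is one of my guesses above, not a measured M6-32 number):

        # Back-of-envelope: 3 GHz clock, 15 ns random-access latency, one
        # memory reference per 10 instructions, 1.4 TB/sec aggregate
        # bandwidth, and a guessed 1 KB per transfer.
        CLOCK_HZ        = 3.0e9
        MEM_LATENCY_S   = 15e-9
        INSTR_PER_FETCH = 10
        BANDWIDTH_BPS   = 1.4e12
        BYTES_PER_FETCH = 1024

        # Latency side: how fast can a single thread actually run?
        batch_time = INSTR_PER_FETCH / CLOCK_HZ + MEM_LATENCY_S
        per_thread = INSTR_PER_FETCH / batch_time
        print(f"per thread: {per_thread / 1e9:.2f} G instr/sec "
              f"({per_thread / CLOCK_HZ:.0%} of the clock)")

        # Bandwidth side: how many random fetches can the whole box serve?
        fetches = BANDWIDTH_BPS / BYTES_PER_FETCH
        print(f"system-wide: {fetches / 1e9:.1f} G fetches/sec, "
              f"~{fetches * INSTR_PER_FETCH / 1e9:.0f} G instr/sec at this ratio")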

    In short, my app would hit peak performance on an M6-32 system with ONE SOCKET.

    Ergo: WE SHOULD START RATING HIGH-END SYSTEMS BY AGGREGATE MAIN-MEMORY ACCESSES PER SECOND.

    So my question for experts here is, what am I missing? I’m doing a lot of hand-waving in my assumptions. I’d be curious about what people thought.

    Thanks!

    • kebabbert says:

      @Steve Thomas,
      Have you tried to redo the calculations using x86 or IBM POWER? You will get worse results. The SPARC T5 is the world’s fastest cpu today, with many world records, for instance 240% faster than IBM POWER7 on TPC-H. Here are some world records:
      https://blogs.oracle.com/BestPerf/entry/20130326_sparc_t5_speccpu2006_rate

    • kebabbert says:

      Actually, it is an old well known problem you bring forth. But in essence, you are correct.

      Studies from Intel show that a server x86 cpu, under full load and maximum usage, idles 50% of the time because it waits for data. The pipeline stalls, so it must wait for data. A server workload serves many thousands of clients, so all that data can never fit into the cpu cache – so the server must always go out to the slow RAM.

      A desktop cpu typically can fit all work data into a small cache, so that is ok.

      A server x86 might run at 2.5GHz, and RAM runs at a lower speed. IBM POWER6 runs at 5GHz, and RAM is even slower relative to the cpu, which means cache misses hit POWER6 even harder. It might wait for data maybe 60% of the time under full load. Or even more.

      Thus an ideal server cpu must not be sensitive to cache misses. Desktop cpus are very sensitive.

      This problem plagues all cpus on the market, there is no way to avoid cache misses. Oracle/Sun has now solved this problem in a unique way.

      The SPARC T1 was revolutionary in that it did not try to avoid cache misses (it had a very small cache); instead, when a cache miss occurred, it switched threads in one clock cycle and continued doing work in another thread. Normally it takes hundreds or thousands of cycles to switch threads. But SPARC Niagara cpus can switch threads very, very fast, so Niagara cpus (T1, T2, T3, T4, T5) never idle under full load; they always have work to do in another thread. Studies show that SPARC Niagara cpus idled something like 10% under full load – which is unique and revolutionary. This is the reason a 1.6GHz Niagara could be many times faster than a 5GHz IBM POWER6 – on large server workloads where data never fits into cache. Because SPARC never got a stalled pipeline.
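
      To see why that matters, here is a toy utilization model (the miss rate, miss penalty, and switch cost are made-up illustrative numbers, not real T1 characteristics):

          # A core stalls on every cache miss unless it can switch to another
          # hardware thread almost instantly (the Niagara approach).
          MISS_RATE    = 0.05   # fraction of instructions that miss the cache
          MISS_PENALTY = 100    # cycles spent waiting on DRAM per miss
          SWITCH_COST  = 1      # cycles to switch hardware threads

          def utilization(hw_threads):
              """Rough fraction of cycles spent doing useful work."""
              single = 1.0 / (1.0 + MISS_RATE * MISS_PENALTY)   # stalls exposed
              if hw_threads == 1:
                  return single
              # With fast switching, stalls overlap other threads' work; the
              # ceiling is paying only the switch cost on each miss.
              ceiling = 1.0 / (1.0 + MISS_RATE * SWITCH_COST)
              return min(ceiling, hw_threads * single)

          for n in (1, 2, 4, 8):
              print(f"{n} hardware thread(s): ~{utilization(n):.0%} of cycles busy")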

      Back in those days, the SPARC T1 had 8 cores and 4 threads in each – which was crazy back then. Normally you had single or max dual core back then.

      In 2015, Oracle will release a SPARC server with 16,384 threads and 64TB RAM – which sounds crazy today, but I guess everyone will try to catch up with SPARC in 2020.

  8. […] Sparc M6-32, which was announced last fall, is the great-great grandchild of the Starfire system, and it is also one of the most scalable […]

  9. […] least at some customers. Oracle did not say anything about how the Sparc M6-32 massive NUMA boxes, announced last fall, were doing in the market, which bring 32 sockets and 32 TB of main memory to bear on a single […]

  10. […] shows that Oracle was planning to get the M6 processors in the field in early 2014, and in fact it pulled these in early and got them out the door in September last year in the M6-32 systems. In early 2012, Oracle pulled the Sparc T5 chips into late 2012, and then changed its mind and […]

  11. […] previewed the 12c in-memory database technologies last September during its OpenWorld conference, along with the launch of its top-end Sparc M6-32 systems, which have 32 sockets and support 32 TB […]

  12. […] similar sized boxes as well as the high-end Sparc M5 servers had chips based on the S3 cores. The Sparc M6-32 system announced last year and nicknamed the Big Memory Machine was based on the Sparc M6 chip, which also used the S3 cores. […]

  13. […] interconnect another proprietary interconnect is geared more towards enterprise applications (e.g. a 384-port interconnect, that has 3 TB/sec of bandwidth linking the system together in a 32-way […]
