As HANA can now be run on non-appliance hardware (for non-production use), I thought it would be interesting to find out what impact reducing the clock speed and core count of a system has on query performance.  Unfortunately, I don’t have access to hundreds of hardware configurations, so true comparisons are impossible.  I do, however, have access to a Westmere-EX HANA server.  This is an IBM appliance with 512GiB of memory, part number 7143H2G.  It is a standard appliance running SLES SP3.  It has:

  • 4 x 10 Core E7-8870  @ 2.40GHz
  • 32 x 16GiB DDR3 DIMMs
  • 8 x 600GB 10K SAS Hard Disks
  • 1 x 1.2TB FusionIO SSD PCIe card (for log area)

The rest of the specs can be found here

The testing I conducted used a single HANA instance running on revision 91.  I created a schema called “TESTING” and a table called “BIGTABLE”.  BIGTABLE was created with the following SQL:

CREATE COLUMN TABLE "TESTING"."BIGTABLE" ( COL01 DOUBLE, COL02 DOUBLE, COL03 DOUBLE, COL04 DOUBLE, COL05 DOUBLE, COL06 DOUBLE, COL07 DOUBLE, COL08 DOUBLE, COL09 DOUBLE, COL10 DOUBLE )

It’s a dumb table, but all I wanted to do was to test raw query performance, so a single table with lots of data was good enough.

I filled the table using SAP HANA’s RAND function:

INSERT INTO "TESTING"."BIGTABLE" ( SELECT TOP 10000000 RAND(), RAND(), RAND(), RAND(), RAND(), RAND(), RAND(), RAND(), RAND(), RAND() FROM OBJECTS CROSS JOIN OBJECTS )

I inserted 10,000,000 rows at a time until the table had 500,000,000 rows.  I then concocted a nasty query to keep the cores busy for a little while, which is not always easy with HANA.
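The repeated batch insert can be scripted rather than run by hand. The following is a minimal sketch, assuming a Python DB-API cursor already connected to the HANA instance; `build_insert` and `fill_table` are hypothetical helper names, and committing is left to the connection's settings.

```python
def build_insert(batch_rows=10_000_000, columns=10):
    """Build the batch INSERT statement used above, with one RAND() per column."""
    rands = ", ".join(["RAND()"] * columns)
    return (
        'INSERT INTO "TESTING"."BIGTABLE" '
        f"( SELECT TOP {batch_rows} {rands} FROM OBJECTS CROSS JOIN OBJECTS )"
    )


def fill_table(cursor, target_rows=500_000_000, batch_rows=10_000_000):
    """Execute the batch INSERT until target_rows have been requested.

    `cursor` is assumed to be any DB-API cursor; returns the number of batches run.
    """
    statement = build_insert(batch_rows)
    batches = target_rows // batch_rows
    for _ in range(batches):
        cursor.execute(statement)
    return batches
```

With the defaults this issues 50 inserts of 10,000,000 rows each.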

SELECT COUNT(COL01) AS DCOL01, COUNT(COL02) AS DCOL02, COUNT(COL03) AS DCOL03, COUNT(COL04) AS DCOL04, COUNT(COL05) AS DCOL05, COUNT(COL06) AS DCOL06, COUNT(COL07) AS DCOL07, COUNT(COL08) AS DCOL08, COUNT(COL09) AS DCOL09, COUNT(COL10) AS DCOL10 FROM ( SELECT DISTINCT COL01, COL02, COL03, COL04, COL05, COL06, COL07, COL08, COL09, COL10 FROM "TESTING"."BIGTABLE" )

With the system running on all cores at full speed (2.39GHz), the query took 13.1 seconds on average over five runs. This was after running the query once to populate the SQL plan cache.
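The measurement procedure (one untimed warm-up run, then the average of five timed runs) can be sketched as a small harness. This is an illustration only: `average_runtime` is a hypothetical helper, and `run_query` stands for whatever callable executes the query and fetches its result.

```python
import time
from statistics import mean


def average_runtime(run_query, warmups=1, runs=5, clock=time.perf_counter):
    """Call run_query `warmups` times untimed, then return the mean of `runs` timed calls."""
    for _ in range(warmups):
        run_query()  # warm-up, e.g. to populate the SQL plan cache
    timings = []
    for _ in range(runs):
        start = clock()
        run_query()
        timings.append(clock() - start)
    return mean(timings)
```

The `clock` parameter exists only so the harness can be exercised with a fake timer.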

This is the baseline for the testing.

Reducing the clock speed

A HANA appliance will usually have the Linux CPU governor set to ‘performance’, which instructs the CPUs to run at their full clock speed at all times.  Changing the CPU governor can lead to lower performance (see SAP Note 1890444, “Slow HANA system due to CPU power save mode”).  However, this is exactly what I wanted: setting the CPU governor to ‘userspace’ allows an application or the user to request specific clock speeds.

More information can be found here

I re-ran the query at the following clock speeds: 2.26GHz, 1.99GHz, 1.72GHz, 1.46GHz, 1.19GHz, 1.06GHz (the lowest available).
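One way to apply the ‘userspace’ governor and a fixed clock speed is through the Linux cpufreq sysfs files. The sketch below is an assumption about how this could be done, not a record of the commands used for these tests. It needs root, the cpufreq driver must offer the ‘userspace’ governor, and the frequency is given in kHz and should be one of the values listed in `scaling_available_frequencies`. The helper name `set_userspace_speed` is hypothetical.

```python
from pathlib import Path

# Standard location of the per-CPU cpufreq directories on Linux.
CPU_ROOT = Path("/sys/devices/system/cpu")


def set_userspace_speed(khz, cpu_root=CPU_ROOT):
    """Select the 'userspace' governor on every CPU and request `khz`.

    Returns the names of the CPUs that were changed (e.g. ['cpu0', 'cpu1']).
    """
    changed = []
    for cpufreq in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq")):
        (cpufreq / "scaling_governor").write_text("userspace\n")
        (cpufreq / "scaling_setspeed").write_text(f"{khz}\n")
        changed.append(cpufreq.parent.name)
    return changed
```

The `cpu_root` parameter lets the function be tried against a scratch directory before it is pointed at the real sysfs tree.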

This graph shows clock speed against query performance.  The clearest way to represent the results is to show both CPU speed and query performance as percentages.  For clock speed, 100% represents the maximum clock speed from the baseline test (2.39GHz).  For query performance, 100% represents the time taken to complete the query in the baseline test (110% represents the query completing 1/10th faster, 90% represents the query taking 1/10th longer).
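The two percentages can be written down as a short calculation. This sketch takes a linear reading of the definition above (each 1/10th of extra runtime costs 10 points). That reading is an assumption; a ratio of baseline time to measured time would be another reasonable interpretation and gives slightly different numbers. The function names are hypothetical.

```python
def clock_percent(clock_ghz, max_clock_ghz=2.39):
    """Clock speed as a percentage of the maximum clock speed."""
    return 100.0 * clock_ghz / max_clock_ghz


def performance_percent(runtime_s, baseline_s):
    """Query performance as a percentage of the baseline (linear reading).

    A runtime 1/10th longer than the baseline gives 90; 1/10th shorter gives 110.
    """
    return 100.0 * (1.0 - (runtime_s - baseline_s) / baseline_s)
```

For example, `clock_percent(1.06)` places the lowest available clock speed at roughly 44% of the maximum.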

40CorePerformance

The data shows that as clock speed decreases so does query performance.  However, query performance seems to decrease at a lesser rate.

Reducing Core Count

After running the clock speed test on all 40 cores, I proceeded to reduce the core count.  This is done by disabling cores in the UEFI menu.

Along with the original 40 core tests I also ran tests with 32, 24 & 16 cores.

The results are graphed using the same technique as before, except that 100% performance is relative to the baseline for the number of cores under test rather than the original 40-core baseline.  The results are below.

32CorePerformance

24CorePerformance

16CorePerformance

As with 40 cores, we see a similar pattern: as clock speed decreases, so does query performance.  It also appears that when fewer cores are available, we see a closer correlation between clock speed and query performance.

Comparing reduced core count only

With the available data it is also possible to consider how using fewer cores at the same clock speed affects query performance.

This graph shows the data for reduced core counts at 2.39GHz.

CoreCountPerformance

Similar to lowering the clock speed, as core count decreases so does query performance, but the correlation is not direct.  Query performance does not drop by as much as the core count, but it may drop more rapidly if the core count were to fall below 16 cores.

Conclusion

The tests performed here cannot be seen as a direct comparison between types of processors.  They try to emulate processors with differing clock speeds and core counts, so the results are approximate.  These tests were conducted against Westmere-EX; I will test against Ivy Bridge at some time in the future.

The data across all tests shows, unsurprisingly, that the best performance is achieved with a higher number of cores at the greatest available clock speed.  Query performance is reduced when lowering the CPU clock speed.  The percentage of performance reduction generally correlates with the reduction in clock speed.  Performance also decreases when lowering the core count; however, the decrease observed is proportionately smaller than the decrease in cores.  It may be the case, though, that once the core count drops below a certain threshold, query performance is more severely affected.

The data suggest that users can, with some level of confidence, predict the performance of a lower-cost non-production system.  For example, a development system with half the number of cores of a production system will perform roughly half as quickly.  A system using cores that are 20% slower than an appliance’s will be roughly 20% slower.  It is also possible to calculate the expected speed of systems that use lower clock speeds and fewer cores.
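As a rough illustration of that calculation, the sketch below combines the two rules of thumb by multiplying them. The multiplicative combination and the function names are assumptions, not something measured in these tests. Since the measured drop from fewer cores was proportionately smaller than the reduction in cores, the estimate is probably on the pessimistic side.

```python
def relative_performance(cores, clock_ghz, base_cores=40, base_clock_ghz=2.39):
    """Estimated performance relative to the baseline system (1.0 = same speed).

    Assumes performance scales proportionally with both core count and clock speed.
    """
    return (cores / base_cores) * (clock_ghz / base_clock_ghz)


def estimated_runtime(baseline_s, cores, clock_ghz, base_cores=40, base_clock_ghz=2.39):
    """Estimated query runtime in seconds on the smaller system."""
    return baseline_s / relative_performance(cores, clock_ghz, base_cores, base_clock_ghz)
```

Under these assumptions, a 20-core system at the same 2.39GHz would be expected to take about twice the 13.1-second baseline for the test query.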

We will be using this data to work with our customer to define non-production systems that meet their performance needs while lowering costs.

The next post in this series will look at how the IO subsystem for a non-production HANA system affects the performance of data acquisition and start-up time.