Timbo Site


Choosing an Entry-Level Instance on Alibaba Cloud

Background

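Early this year we made major changes to our architecture and a good number of optimizations in the application. Together they cut our CPU utilization and network throughput sharply even as the business kept growing. Most instances could drop one size, halving their cost, and a few could drop two sizes, cutting the cost to a quarter.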

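We plan to make another major architectural change after the current reserved-instance term ends and before the next one begins, in the hope of getting better value for money.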

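I have been planning that next change recently. AWS is simple: we would basically only choose between the M7g and T4g instance families. Alibaba Cloud has changed more. The entry-level lineup has gained an economy instance family, and the enterprise-level compute lineup has gained a universal instance family. We had been using burstable instances all along, and suddenly there were two more options. Alibaba Cloud's documentation is not clear about them: the page for the burstable t6 instance openly tells you it uses Cascade Lake processors, which are not the latest generation, but the pages for the economy and universal instances are vague. With the three priced close together, I wavered painfully for an afternoon and in the end decided to run benchmarks, which produced the table below.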

| 2C4G instance | single core, crc16 | all cores, crc16 | single core, matrixprod | all cores, matrixprod |
|---|---|---|---|---|
| e-c1m2.large | 1340.82 ops/s | 1983.60 ops/s | 1801.47 ops/s | 1711.04 ops/s |
| u1-c1m2.large | 1340.16 ops/s | 1978.24 ops/s | 1812.71 ops/s | 1709.14 ops/s |
| t6-c1m2.large | 1308.88 ops/s | 2508.38 ops/s | 1756.67 ops/s | 3303.01 ops/s |

Conclusion

The table above settled it for me: on Alibaba Cloud we go with the burstable t6 instance. That is also the whole conclusion of this post.

Everything below is detail you can safely skip.

Details

Test items

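Our business flow relies on checksums and encryption/decryption to keep user information secure, so I used stress-ng to stress-test the instances. I picked two CPU methods, crc16 and matrixprod: the former runs checksum computation for one minute, the latter exercises cache, memory, and floating-point performance together for one minute. Each was run both on a single core and on all cores.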

Instance configuration

All tests used a 2-vCPU, 4 GB configuration:

  • Economy instance: ecs.e-c1m2.large
  • Universal instance: ecs.u1-c1m2.large
  • Burstable instance: ecs.t6-c1m2.large

All instances ran Ubuntu 22.04:

PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

The kernel was updated to the latest version available in the current stable release:

Linux instance 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Test commands

Single-core crc16

stress-ng --cpu 1 --cpu-method crc16 -t 1m --times --metrics-brief

All-core crc16

stress-ng --cpu 0 --cpu-method crc16 -t 1m --times --metrics-brief

Single-core matrixprod

stress-ng --cpu 1 --cpu-method matrixprod -t 1m --times --metrics-brief

All-core matrixprod

stress-ng --cpu 0 --cpu-method matrixprod -t 1m --times --metrics-brief
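
To repeat the four runs identically on every instance, the commands above can be wrapped in a small script. The following is a minimal sketch rather than the script used for this post: the helper names build_command and run_all are made up for illustration, and the all-core runs assume --cpu 0, which stress-ng interprets as one worker per online CPU.

```python
import subprocess

# The two stress-ng CPU methods used in this post.
METHODS = ("crc16", "matrixprod")


def build_command(method: str, workers: int) -> list[str]:
    """Build the stress-ng argument list for one run.

    workers=1 is the single-core run; workers=0 asks stress-ng to start
    one cpu stressor per online CPU (assumed here for the all-core run).
    """
    return [
        "stress-ng",
        "--cpu", str(workers),
        "--cpu-method", method,
        "-t", "1m",
        "--times",
        "--metrics-brief",
    ]


def run_all() -> dict[tuple[str, int], str]:
    """Run the four benchmarks back to back and return their raw logs."""
    outputs = {}
    for method in METHODS:
        for workers in (1, 0):
            result = subprocess.run(
                build_command(method, workers),
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # keep the log whichever stream is used
                text=True,
                check=True,
            )
            outputs[(method, workers)] = result.stdout
    return outputs
```

Calling run_all() takes about four minutes per instance, since each of the four runs lasts one minute.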

Test results

Economy instance results

Single-core crc16

stress-ng: info:  [4496] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [4496] dispatching hogs: 1 cpu
stress-ng: metrc: [4496] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [4496]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [4496] cpu               80452     60.00     60.00      0.00      1340.82        1340.97
stress-ng: info:  [4496] for a 60.00s run time:
stress-ng: info:  [4496]     120.01s available CPU time
stress-ng: info:  [4496]      59.99s user time   ( 49.99%)
stress-ng: info:  [4496]       0.00s system time (  0.00%)
stress-ng: info:  [4496]      59.99s total time  ( 49.99%)
stress-ng: info:  [4496] load average: 0.78 0.33 0.16
stress-ng: info:  [4496] skipped: 0
stress-ng: info:  [4496] passed: 1: cpu (1)
stress-ng: info:  [4496] failed: 0
stress-ng: info:  [4496] metrics untrustworthy: 0
stress-ng: info:  [4496] successful run completed in 1 min, 0.00 secs

All-core crc16

stress-ng: info:  [4499] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [4499] dispatching hogs: 2 cpu
stress-ng: metrc: [4499] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [4499]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [4499] cpu              119018     60.00    119.58      0.02      1983.60         995.20
stress-ng: info:  [4499] for a 60.00s run time:
stress-ng: info:  [4499]     120.00s available CPU time
stress-ng: info:  [4499]     119.57s user time   ( 99.64%)
stress-ng: info:  [4499]       0.01s system time (  0.01%)
stress-ng: info:  [4499]     119.58s total time  ( 99.65%)
stress-ng: info:  [4499] load average: 1.49 0.63 0.28
stress-ng: info:  [4499] skipped: 0
stress-ng: info:  [4499] passed: 2: cpu (2)
stress-ng: info:  [4499] failed: 0
stress-ng: info:  [4499] metrics untrustworthy: 0
stress-ng: info:  [4499] successful run completed in 1 min, 0.00 secs

Single-core matrixprod

stress-ng: info:  [4507] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [4507] dispatching hogs: 1 cpu
stress-ng: metrc: [4507] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [4507]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [4507] cpu              108091     60.00     59.99      0.00      1801.47        1801.75
stress-ng: info:  [4507] for a 60.00s run time:
stress-ng: info:  [4507]     120.01s available CPU time
stress-ng: info:  [4507]      59.98s user time   ( 49.98%)
stress-ng: info:  [4507]       0.00s system time (  0.00%)
stress-ng: info:  [4507]      59.98s total time  ( 49.98%)
stress-ng: info:  [4507] load average: 0.66 0.46 0.28
stress-ng: info:  [4507] skipped: 0
stress-ng: info:  [4507] passed: 1: cpu (1)
stress-ng: info:  [4507] failed: 0
stress-ng: info:  [4507] metrics untrustworthy: 0
stress-ng: info:  [4507] successful run completed in 1 min, 0.00 secs

All-core matrixprod

stress-ng: info:  [4512] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [4512] dispatching hogs: 2 cpu
stress-ng: metrc: [4512] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [4512]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [4512] cpu              102666     60.00    119.43      0.06      1711.04         859.23
stress-ng: info:  [4512] for a 60.00s run time:
stress-ng: info:  [4512]     120.01s available CPU time
stress-ng: info:  [4512]     119.42s user time   ( 99.51%)
stress-ng: info:  [4512]       0.05s system time (  0.04%)
stress-ng: info:  [4512]     119.47s total time  ( 99.55%)
stress-ng: info:  [4512] load average: 1.40 0.70 0.38
stress-ng: info:  [4512] skipped: 0
stress-ng: info:  [4512] passed: 2: cpu (2)
stress-ng: info:  [4512] failed: 0
stress-ng: info:  [4512] metrics untrustworthy: 0
stress-ng: info:  [4512] successful run completed in 1 min, 0.00 secs

Universal instance results

Single-core crc16

stress-ng: info:  [3127] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [3127] dispatching hogs: 1 cpu
stress-ng: metrc: [3127] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [3127]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [3127] cpu               80411     60.00     59.99      0.00      1340.16        1340.31
stress-ng: info:  [3127] for a 60.00s run time:
stress-ng: info:  [3127]     120.00s available CPU time
stress-ng: info:  [3127]      59.99s user time   ( 49.99%)
stress-ng: info:  [3127]       0.00s system time (  0.00%)
stress-ng: info:  [3127]      59.99s total time  ( 49.99%)
stress-ng: info:  [3127] load average: 0.69 0.35 0.15
stress-ng: info:  [3127] skipped: 0
stress-ng: info:  [3127] passed: 1: cpu (1)
stress-ng: info:  [3127] failed: 0
stress-ng: info:  [3127] metrics untrustworthy: 0
stress-ng: info:  [3127] successful run completed in 1 min, 0.00 secs

All-core crc16

stress-ng: info:  [3134] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [3134] dispatching hogs: 2 cpu
stress-ng: metrc: [3134] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [3134]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [3134] cpu              118697     60.00    119.46      0.02      1978.24         993.47
stress-ng: info:  [3134] for a 60.00s run time:
stress-ng: info:  [3134]     120.01s available CPU time
stress-ng: info:  [3134]     119.45s user time   ( 99.54%)
stress-ng: info:  [3134]       0.01s system time (  0.01%)
stress-ng: info:  [3134]     119.46s total time  ( 99.54%)
stress-ng: info:  [3134] load average: 1.37 0.61 0.26
stress-ng: info:  [3134] skipped: 0
stress-ng: info:  [3134] passed: 2: cpu (2)
stress-ng: info:  [3134] failed: 0
stress-ng: info:  [3134] metrics untrustworthy: 0
stress-ng: info:  [3134] successful run completed in 1 min, 0.00 secs

Single-core matrixprod

stress-ng: info:  [3141] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [3141] dispatching hogs: 1 cpu
stress-ng: metrc: [3141] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [3141]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [3141] cpu              108766     60.00     60.00      0.00      1812.71        1812.90
stress-ng: info:  [3141] for a 60.00s run time:
stress-ng: info:  [3141]     120.01s available CPU time
stress-ng: info:  [3141]      59.99s user time   ( 49.99%)
stress-ng: info:  [3141]       0.00s system time (  0.00%)
stress-ng: info:  [3141]      59.99s total time  ( 49.99%)
stress-ng: info:  [3141] load average: 0.96 0.64 0.30
stress-ng: info:  [3141] skipped: 0
stress-ng: info:  [3141] passed: 1: cpu (1)
stress-ng: info:  [3141] failed: 0
stress-ng: info:  [3141] metrics untrustworthy: 0
stress-ng: info:  [3141] successful run completed in 1 min, 0.00 secs

All-core matrixprod

stress-ng: info:  [3143] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [3143] dispatching hogs: 2 cpu
stress-ng: metrc: [3143] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [3143]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [3143] cpu              102553     60.00    119.31      0.11      1709.14         858.77
stress-ng: info:  [3143] for a 60.00s run time:
stress-ng: info:  [3143]     120.01s available CPU time
stress-ng: info:  [3143]     119.31s user time   ( 99.42%)
stress-ng: info:  [3143]       0.10s system time (  0.08%)
stress-ng: info:  [3143]     119.41s total time  ( 99.50%)
stress-ng: info:  [3143] load average: 1.55 0.87 0.41
stress-ng: info:  [3143] skipped: 0
stress-ng: info:  [3143] passed: 2: cpu (2)
stress-ng: info:  [3143] failed: 0
stress-ng: info:  [3143] metrics untrustworthy: 0
stress-ng: info:  [3143] successful run completed in 1 min, 0.00 secs

Burstable instance results

Single-core crc16

stress-ng: info:  [3248] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [3248] dispatching hogs: 1 cpu
stress-ng: metrc: [3248] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [3248]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [3248] cpu               78534     60.00     59.98      0.00      1308.88        1309.17
stress-ng: info:  [3248] for a 60.00s run time:
stress-ng: info:  [3248]     120.00s available CPU time
stress-ng: info:  [3248]      59.98s user time   ( 49.98%)
stress-ng: info:  [3248]       0.00s system time (  0.00%)
stress-ng: info:  [3248]      59.98s total time  ( 49.98%)
stress-ng: info:  [3248] load average: 0.82 0.35 0.14
stress-ng: info:  [3248] skipped: 0
stress-ng: info:  [3248] passed: 1: cpu (1)
stress-ng: info:  [3248] failed: 0
stress-ng: info:  [3248] metrics untrustworthy: 0
stress-ng: info:  [3248] successful run completed in 1 min, 0.00 secs

All-core crc16

stress-ng: info:  [3250] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [3250] dispatching hogs: 2 cpu
stress-ng: metrc: [3250] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [3250]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [3250] cpu              150510     60.00    118.91      0.16      2508.38        1264.10
stress-ng: info:  [3250] for a 60.01s run time:
stress-ng: info:  [3250]     120.01s available CPU time
stress-ng: info:  [3250]     118.90s user time   ( 99.07%)
stress-ng: info:  [3250]       0.15s system time (  0.12%)
stress-ng: info:  [3250]     119.05s total time  ( 99.20%)
stress-ng: info:  [3250] load average: 1.50 0.65 0.26
stress-ng: info:  [3250] skipped: 0
stress-ng: info:  [3250] passed: 2: cpu (2)
stress-ng: info:  [3250] failed: 0
stress-ng: info:  [3250] metrics untrustworthy: 0
stress-ng: info:  [3250] successful run completed in 1 min, 0.01 secs

Single-core matrixprod

stress-ng: info:  [3263] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [3263] dispatching hogs: 1 cpu
stress-ng: metrc: [3263] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [3263]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [3263] cpu              105401     60.00     59.99      0.00      1756.67        1756.95
stress-ng: info:  [3263] for a 60.00s run time:
stress-ng: info:  [3263]     120.00s available CPU time
stress-ng: info:  [3263]      59.98s user time   ( 49.98%)
stress-ng: info:  [3263]       0.00s system time (  0.00%)
stress-ng: info:  [3263]      59.98s total time  ( 49.98%)
stress-ng: info:  [3263] load average: 0.92 0.65 0.30
stress-ng: info:  [3263] skipped: 0
stress-ng: info:  [3263] passed: 1: cpu (1)
stress-ng: info:  [3263] failed: 0
stress-ng: info:  [3263] metrics untrustworthy: 0
stress-ng: info:  [3263] successful run completed in 1 min, 0.00 secs

All-core matrixprod

stress-ng: info:  [3268] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [3268] dispatching hogs: 2 cpu
stress-ng: metrc: [3268] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [3268]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [3268] cpu              198188     60.00    118.97      0.17      3303.01        1663.55
stress-ng: info:  [3268] for a 60.00s run time:
stress-ng: info:  [3268]     120.01s available CPU time
stress-ng: info:  [3268]     118.96s user time   ( 99.13%)
stress-ng: info:  [3268]       0.16s system time (  0.13%)
stress-ng: info:  [3268]     119.12s total time  ( 99.26%)
stress-ng: info:  [3268] load average: 1.62 0.89 0.41
stress-ng: info:  [3268] skipped: 0
stress-ng: info:  [3268] passed: 2: cpu (2)
stress-ng: info:  [3268] failed: 0
stress-ng: info:  [3268] metrics untrustworthy: 0
stress-ng: info:  [3268] successful run completed in 1 min, 0.00 secs
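
The figures in the summary tables are the "bogo ops/s (real time)" column of the data row in each log above. As a rough sketch of how that column could be pulled out of a saved log instead of copied by hand (the function name parse_metrics is invented for this example, and the pattern only assumes the --metrics-brief layout shown in these logs):

```python
import re

# Matches the data row of the --metrics-brief table, for example:
# "stress-ng: metrc: [4496] cpu  80452  60.00  60.00  0.00  1340.82  1340.97"
# The two header rows do not match because they lack the numeric columns.
METRIC_ROW = re.compile(
    r"stress-ng:\s+metrc:\s+\[\d+\]\s+(?P<stressor>\S+)\s+"
    r"(?P<bogo_ops>\d+)\s+(?P<real>[\d.]+)\s+(?P<usr>[\d.]+)\s+"
    r"(?P<sys>[\d.]+)\s+(?P<ops_real>[\d.]+)\s+(?P<ops_cpu>[\d.]+)\s*$"
)


def parse_metrics(log: str) -> dict[str, float]:
    """Return {stressor: bogo ops/s (real time)} for each data row in the log."""
    results = {}
    for line in log.splitlines():
        match = METRIC_ROW.search(line)
        if match:
            results[match["stressor"]] = float(match["ops_real"])
    return results
```

The real-time rate is used rather than the usr+sys rate because the latter divides by CPU time summed over all workers, so it roughly halves in the two-worker runs.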

CPU architecture

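Alibaba Cloud's documentation is a little odd. It splits instances into a great many types; for some it proudly states the processor model, while for others it is so perfunctory that it seems they would rather users did not know which model is inside.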

Both cat /proc/cpuinfo and lscpu print a large amount of detail about the CPU. The more important fields are:

| Field | Economy instance | Universal instance | Burstable instance |
|---|---|---|---|
| cpu family | 6 | 6 | 6 |
| cpu MHz | 2500.002 | 2500.002 | 2500.000 |
| cache size | 33792 KB | 33792 KB | 36608 KB |
| model name | Intel(R) Xeon(R) Platinum | Intel(R) Xeon(R) Platinum | Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz |

After installing cpuid and running it, the output indicates that the economy and universal instances both run on Skylake-architecture Xeon Platinum processors.

On this point, the Cascade Lake processor in the burstable instance, Skylake's successor, is somewhat better in process and efficiency.
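
For anyone repeating the comparison on several machines, the same fields can be collected programmatically. This is a minimal sketch, assuming the x86 /proc/cpuinfo layout of "key : value" lines; the function names are illustrative, and ARM instances expose different field names.

```python
# Fields from /proc/cpuinfo that are compared in this section.
FIELDS = ("model name", "cpu family", "cpu MHz", "cache size")


def summarize_cpuinfo(text: str) -> dict[str, str]:
    """Pick the compared fields out of /proc/cpuinfo-style text.

    Only the first logical CPU's values are kept, which is enough to
    identify the processor model.
    """
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in FIELDS and key not in summary:
            summary[key] = value.strip()
    return summary


def read_cpuinfo(path: str = "/proc/cpuinfo") -> dict[str, str]:
    """Read and summarize the local machine's CPU information."""
    with open(path, encoding="utf-8") as handle:
        return summarize_cpuinfo(handle.read())
```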

Comparison with AWS instance types

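After all that, this is the part that matters. I have archived the raw logs of every test, so there is no need for another long section here. The comparison results: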

| 2C4G instance | single core, crc16 | all cores, crc16 | single core, matrixprod | all cores, matrixprod |
|---|---|---|---|---|
| e-c1m2.large | 1340.82 ops/s | 1983.60 ops/s | 1801.47 ops/s | 1711.04 ops/s |
| u1-c1m2.large | 1340.16 ops/s | 1978.24 ops/s | 1812.71 ops/s | 1709.14 ops/s |
| t6-c1m2.large | 1308.88 ops/s | 2508.38 ops/s | 1756.67 ops/s | 3303.01 ops/s |
| t4g.medium | 828.99 ops/s | 1660.24 ops/s | 44.13 ops/s | 89.49 ops/s |
| t3a.medium | 1046.94 ops/s | 1682.28 ops/s | 675.70 ops/s | 818.42 ops/s |
| t3.medium | 1058.33 ops/s | 1848.89 ops/s | 1364.36 ops/s | 1600.27 ops/s |

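The numbers measured on the AWS T4g instance look implausible and far from its real-world performance. It is the only ARM instance here, all the others being x86, and my experience with Graviton 3 processors has been very good, so I do not think this result is right.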
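
One way to read the comparison table is to divide each all-core figure by the matching single-core figure, which shows how much a second worker actually adds. The sketch below does that with the numbers copied from the table; the names RESULTS, scaling, and scaling_table are made up for this example.

```python
# bogo ops/s (real time) copied from the table: (single-core, all-core)
RESULTS = {
    "e-c1m2.large":  {"crc16": (1340.82, 1983.60), "matrixprod": (1801.47, 1711.04)},
    "u1-c1m2.large": {"crc16": (1340.16, 1978.24), "matrixprod": (1812.71, 1709.14)},
    "t6-c1m2.large": {"crc16": (1308.88, 2508.38), "matrixprod": (1756.67, 3303.01)},
    "t4g.medium":    {"crc16": (828.99, 1660.24),  "matrixprod": (44.13, 89.49)},
    "t3a.medium":    {"crc16": (1046.94, 1682.28), "matrixprod": (675.70, 818.42)},
    "t3.medium":     {"crc16": (1058.33, 1848.89), "matrixprod": (1364.36, 1600.27)},
}


def scaling(single: float, all_cores: float) -> float:
    """All-core throughput as a multiple of single-core throughput."""
    return round(all_cores / single, 2)


def scaling_table(results=RESULTS) -> dict[str, dict[str, float]]:
    """Compute the scaling factor for every instance and CPU method."""
    return {
        name: {method: scaling(*pair) for method, pair in methods.items()}
        for name, methods in results.items()
    }
```

For t6-c1m2.large this gives about 1.92 for crc16 and 1.88 for matrixprod, while e-c1m2.large comes out at about 1.48 and 0.95. Since the single-core figures of the three Alibaba Cloud instances are nearly identical, this scaling is where t6 pulls ahead.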

Value for money

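AWS instance prices within a given region rarely change, dropping a little only when a new generation arrives. Alibaba Cloud purchase prices float instead. For example, last week a 3-year purchase of ecs.e-c1m2.large in China East 1 (Hangzhou), zone K, was priced at 23% of list price; today the same purchase is at 33%. When an Alibaba Cloud instance can be bought at a very low price, the value is excellent. But when a configuration is scarce, and sometimes priced at 99% of list, it feels as though they do not want you to buy it and anyone who does is being fleeced. At that point there is no value for money to speak of.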
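
Alibaba Cloud quotes these discounts in the Chinese 折 convention, where N 折 means paying N tenths of the list price, so 2.3 折 is 23% of list and 9.9 折 is 99%. A small sketch of that arithmetic follows; the function names are invented for illustration, and any list price passed in is a placeholder, since this post does not give actual prices.

```python
def price_after_zhe(list_price: float, zhe: float) -> float:
    """Price actually paid under an 'N 折' discount: N tenths of list price."""
    if not 0 < zhe <= 10:
        raise ValueError("zhe must be in the range (0, 10]")
    return list_price * zhe / 10


def cost_per_kops(price: float, bogo_ops_per_s: float) -> float:
    """Price paid per 1000 bogo ops/s of measured throughput."""
    if bogo_ops_per_s <= 0:
        raise ValueError("throughput must be positive")
    return price / (bogo_ops_per_s / 1000)
```

With a placeholder list price of 1000, price_after_zhe(1000, 2.3) is 230 and price_after_zhe(1000, 3.3) is 330, so the same instance costs roughly 43% more this week than last. Dividing the price paid by a measured throughput figure gives a rough way to compare value across instance types.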

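That said, Alibaba Cloud does offer some unusual configurations, such as ecs.u1-c1m1.4xlarge with 16 vCPUs and 16 GB of memory, which is very good value when you need a lot of CPU and not much memory.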

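When GCP's sales team pitched to us, they said GCP lets you customize an instance's CPU count and memory size, but when I went to look for myself I could not find the option. Otherwise there would have been a lot to play with~~, and not long ago they even told us that when we wanted to use Bard they would let us use 1M tokens~~.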

Conclusion

For now, the t6 instance on Alibaba Cloud ECS suits our workload very well and offers the best value for money.