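Choosing an Entry-Level Instance on Alibaba Cloud
Background
Early this year we made major architectural changes and a good deal of application-level optimization. As a result, CPU utilization and network throughput dropped sharply even as the business kept growing. Most instances could drop one size, halving their cost, and a few could drop two sizes, cutting their cost to a quarter.
We plan to make another major architectural change between the end of the current reserved-instance term and the start of the next one, to improve price-performance further.
I have been planning that next change lately. AWS is the simple case: the choice is essentially between the M7g and T4g families. Alibaba Cloud has changed more. The entry-level lineup has gained an economy instance family, and the enterprise-level compute lineup has gained a universal instance family. We had been using burstable instances all along, and suddenly there were two more options. Alibaba Cloud's documentation does not help much: the page for the burstable t6 instance states openly that it runs on Cascade Lake processors, which are not the latest generation, while the pages for the economy and universal instances are vague about the CPU. With the three priced close together, I went back and forth for a whole afternoon before deciding to simply run benchmarks, which produced the table below.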
2C4G Instance Perf | single core - crc16 | all cores - crc16 | single core - matrixprod | all cores - matrixprod |
---|---|---|---|---|
e-c1m2.large | 1340.82 ops/s | 1983.60 ops/s | 1801.47 ops/s | 1711.04 ops/s |
u1-c1m2.large | 1340.16 ops/s | 1978.24 ops/s | 1812.71 ops/s | 1709.14 ops/s |
t6-c1m2.large | 1308.88 ops/s | 2508.38 ops/s | 1756.67 ops/s | 3303.01 ops/s |
Conclusion
The table above settled it: on Alibaba Cloud we go with the burstable t6 instance. That is also the whole conclusion of this post.
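One way to read the table is to ask how much throughput each instance gains when going from one worker to two. The following is a minimal Python sketch that uses only the numbers from the table above; the dictionary layout and the function name are my own, not part of any tool.

```python
# bogo ops/s copied from the summary table: (single core, all cores)
RESULTS = {
    "e-c1m2.large":  {"crc16": (1340.82, 1983.60), "matrixprod": (1801.47, 1711.04)},
    "u1-c1m2.large": {"crc16": (1340.16, 1978.24), "matrixprod": (1812.71, 1709.14)},
    "t6-c1m2.large": {"crc16": (1308.88, 2508.38), "matrixprod": (1756.67, 3303.01)},
}

def scaling(single, all_cores):
    """Ratio of all-core to single-core throughput; 2.0 would be perfect scaling on 2 vCPUs."""
    return all_cores / single

for instance, methods in RESULTS.items():
    for method, (single, all_cores) in methods.items():
        print(f"{instance:14s} {method:10s} x{scaling(single, all_cores):.2f}")
```

On these numbers t6 scales by roughly 1.9x on both methods, while the economy and universal instances reach about 1.5x on crc16 and fall slightly below 1x on matrixprod.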
Everything below is non-essential detail.
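Details
Test items
Our business flow relies on checksums and encryption/decryption to protect user information, so I stress-tested the instances with stress-ng. I picked two CPU methods: crc16, which exercises checksum computation, and matrixprod, which exercises cache, memory, and floating-point performance together. Each ran for one minute, once on a single core and once on all cores (both vCPUs).
Instance configuration
All tests used the 2 vCPU, 4 GB configuration:
- Economy instance: ecs.e-c1m2.large
- Universal instance: ecs.u1-c1m2.large
- Burstable instance: ecs.t6-c1m2.large
All instances ran Ubuntu 22.04: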
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
The kernel was updated to the latest stable release available at the time:
Linux instance 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Test commands
Single core crc16
stress-ng --cpu 1 --cpu-method crc16 -t 1m --times --metrics-brief
All cores crc16
stress-ng --cpu 2 --cpu-method crc16 -t 1m --times --metrics-brief
Single core matrixprod
stress-ng --cpu 1 --cpu-method matrixprod -t 1m --times --metrics-brief
All cores matrixprod
stress-ng --cpu 2 --cpu-method matrixprod -t 1m --times --metrics-brief
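The four runs can also be scripted so that every instance gets exactly the same sequence. The following is a minimal Python sketch, assuming stress-ng is installed and on PATH. Passing an explicit worker count of 2 for the all-core runs is my assumption, chosen to match the "dispatching hogs: 2 cpu" lines in the results; the function names are hypothetical.

```python
import subprocess

METHODS = ("crc16", "matrixprod")

def build_command(method, workers, duration="1m"):
    """Return the stress-ng argv for one CPU-stressor run."""
    return [
        "stress-ng",
        "--cpu", str(workers),      # number of CPU stressor workers
        "--cpu-method", method,     # crc16 or matrixprod
        "-t", duration,             # run time per stressor
        "--times",
        "--metrics-brief",
    ]

def build_plan(vcpus=2):
    """Single-worker and all-vCPU runs for each method, in the order used in this post."""
    return [build_command(m, n) for m in METHODS for n in (1, vcpus)]

def run_plan(vcpus=2):
    """Run every command in sequence; requires stress-ng to be installed."""
    for argv in build_plan(vcpus):
        subprocess.run(argv, check=True)
```

Calling run_plan() on each instance takes about four minutes and prints the same kind of output shown in the results section.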
Test results
Economy instance results
Single core crc16
stress-ng: info: [4496] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [4496] dispatching hogs: 1 cpu
stress-ng: metrc: [4496] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [4496] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [4496] cpu 80452 60.00 60.00 0.00 1340.82 1340.97
stress-ng: info: [4496] for a 60.00s run time:
stress-ng: info: [4496] 120.01s available CPU time
stress-ng: info: [4496] 59.99s user time ( 49.99%)
stress-ng: info: [4496] 0.00s system time ( 0.00%)
stress-ng: info: [4496] 59.99s total time ( 49.99%)
stress-ng: info: [4496] load average: 0.78 0.33 0.16
stress-ng: info: [4496] skipped: 0
stress-ng: info: [4496] passed: 1: cpu (1)
stress-ng: info: [4496] failed: 0
stress-ng: info: [4496] metrics untrustworthy: 0
stress-ng: info: [4496] successful run completed in 1 min, 0.00 secs
All cores crc16
stress-ng: info: [4499] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [4499] dispatching hogs: 2 cpu
stress-ng: metrc: [4499] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [4499] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [4499] cpu 119018 60.00 119.58 0.02 1983.60 995.20
stress-ng: info: [4499] for a 60.00s run time:
stress-ng: info: [4499] 120.00s available CPU time
stress-ng: info: [4499] 119.57s user time ( 99.64%)
stress-ng: info: [4499] 0.01s system time ( 0.01%)
stress-ng: info: [4499] 119.58s total time ( 99.65%)
stress-ng: info: [4499] load average: 1.49 0.63 0.28
stress-ng: info: [4499] skipped: 0
stress-ng: info: [4499] passed: 2: cpu (2)
stress-ng: info: [4499] failed: 0
stress-ng: info: [4499] metrics untrustworthy: 0
stress-ng: info: [4499] successful run completed in 1 min, 0.00 secs
Single core matrixprod
stress-ng: info: [4507] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [4507] dispatching hogs: 1 cpu
stress-ng: metrc: [4507] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [4507] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [4507] cpu 108091 60.00 59.99 0.00 1801.47 1801.75
stress-ng: info: [4507] for a 60.00s run time:
stress-ng: info: [4507] 120.01s available CPU time
stress-ng: info: [4507] 59.98s user time ( 49.98%)
stress-ng: info: [4507] 0.00s system time ( 0.00%)
stress-ng: info: [4507] 59.98s total time ( 49.98%)
stress-ng: info: [4507] load average: 0.66 0.46 0.28
stress-ng: info: [4507] skipped: 0
stress-ng: info: [4507] passed: 1: cpu (1)
stress-ng: info: [4507] failed: 0
stress-ng: info: [4507] metrics untrustworthy: 0
stress-ng: info: [4507] successful run completed in 1 min, 0.00 secs
All cores matrixprod
stress-ng: info: [4512] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [4512] dispatching hogs: 2 cpu
stress-ng: metrc: [4512] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [4512] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [4512] cpu 102666 60.00 119.43 0.06 1711.04 859.23
stress-ng: info: [4512] for a 60.00s run time:
stress-ng: info: [4512] 120.01s available CPU time
stress-ng: info: [4512] 119.42s user time ( 99.51%)
stress-ng: info: [4512] 0.05s system time ( 0.04%)
stress-ng: info: [4512] 119.47s total time ( 99.55%)
stress-ng: info: [4512] load average: 1.40 0.70 0.38
stress-ng: info: [4512] skipped: 0
stress-ng: info: [4512] passed: 2: cpu (2)
stress-ng: info: [4512] failed: 0
stress-ng: info: [4512] metrics untrustworthy: 0
stress-ng: info: [4512] successful run completed in 1 min, 0.00 secs
Universal instance results
Single core crc16
stress-ng: info: [3127] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [3127] dispatching hogs: 1 cpu
stress-ng: metrc: [3127] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [3127] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [3127] cpu 80411 60.00 59.99 0.00 1340.16 1340.31
stress-ng: info: [3127] for a 60.00s run time:
stress-ng: info: [3127] 120.00s available CPU time
stress-ng: info: [3127] 59.99s user time ( 49.99%)
stress-ng: info: [3127] 0.00s system time ( 0.00%)
stress-ng: info: [3127] 59.99s total time ( 49.99%)
stress-ng: info: [3127] load average: 0.69 0.35 0.15
stress-ng: info: [3127] skipped: 0
stress-ng: info: [3127] passed: 1: cpu (1)
stress-ng: info: [3127] failed: 0
stress-ng: info: [3127] metrics untrustworthy: 0
stress-ng: info: [3127] successful run completed in 1 min, 0.00 secs
All cores crc16
stress-ng: info: [3134] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [3134] dispatching hogs: 2 cpu
stress-ng: metrc: [3134] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [3134] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [3134] cpu 118697 60.00 119.46 0.02 1978.24 993.47
stress-ng: info: [3134] for a 60.00s run time:
stress-ng: info: [3134] 120.01s available CPU time
stress-ng: info: [3134] 119.45s user time ( 99.54%)
stress-ng: info: [3134] 0.01s system time ( 0.01%)
stress-ng: info: [3134] 119.46s total time ( 99.54%)
stress-ng: info: [3134] load average: 1.37 0.61 0.26
stress-ng: info: [3134] skipped: 0
stress-ng: info: [3134] passed: 2: cpu (2)
stress-ng: info: [3134] failed: 0
stress-ng: info: [3134] metrics untrustworthy: 0
stress-ng: info: [3134] successful run completed in 1 min, 0.00 secs
Single core matrixprod
stress-ng: info: [3141] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [3141] dispatching hogs: 1 cpu
stress-ng: metrc: [3141] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [3141] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [3141] cpu 108766 60.00 60.00 0.00 1812.71 1812.90
stress-ng: info: [3141] for a 60.00s run time:
stress-ng: info: [3141] 120.01s available CPU time
stress-ng: info: [3141] 59.99s user time ( 49.99%)
stress-ng: info: [3141] 0.00s system time ( 0.00%)
stress-ng: info: [3141] 59.99s total time ( 49.99%)
stress-ng: info: [3141] load average: 0.96 0.64 0.30
stress-ng: info: [3141] skipped: 0
stress-ng: info: [3141] passed: 1: cpu (1)
stress-ng: info: [3141] failed: 0
stress-ng: info: [3141] metrics untrustworthy: 0
stress-ng: info: [3141] successful run completed in 1 min, 0.00 secs
All cores matrixprod
stress-ng: info: [3143] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [3143] dispatching hogs: 2 cpu
stress-ng: metrc: [3143] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [3143] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [3143] cpu 102553 60.00 119.31 0.11 1709.14 858.77
stress-ng: info: [3143] for a 60.00s run time:
stress-ng: info: [3143] 120.01s available CPU time
stress-ng: info: [3143] 119.31s user time ( 99.42%)
stress-ng: info: [3143] 0.10s system time ( 0.08%)
stress-ng: info: [3143] 119.41s total time ( 99.50%)
stress-ng: info: [3143] load average: 1.55 0.87 0.41
stress-ng: info: [3143] skipped: 0
stress-ng: info: [3143] passed: 2: cpu (2)
stress-ng: info: [3143] failed: 0
stress-ng: info: [3143] metrics untrustworthy: 0
stress-ng: info: [3143] successful run completed in 1 min, 0.00 secs
Burstable instance results
Single core crc16
stress-ng: info: [3248] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [3248] dispatching hogs: 1 cpu
stress-ng: metrc: [3248] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [3248] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [3248] cpu 78534 60.00 59.98 0.00 1308.88 1309.17
stress-ng: info: [3248] for a 60.00s run time:
stress-ng: info: [3248] 120.00s available CPU time
stress-ng: info: [3248] 59.98s user time ( 49.98%)
stress-ng: info: [3248] 0.00s system time ( 0.00%)
stress-ng: info: [3248] 59.98s total time ( 49.98%)
stress-ng: info: [3248] load average: 0.82 0.35 0.14
stress-ng: info: [3248] skipped: 0
stress-ng: info: [3248] passed: 1: cpu (1)
stress-ng: info: [3248] failed: 0
stress-ng: info: [3248] metrics untrustworthy: 0
stress-ng: info: [3248] successful run completed in 1 min, 0.00 secs
All cores crc16
stress-ng: info: [3250] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [3250] dispatching hogs: 2 cpu
stress-ng: metrc: [3250] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [3250] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [3250] cpu 150510 60.00 118.91 0.16 2508.38 1264.10
stress-ng: info: [3250] for a 60.01s run time:
stress-ng: info: [3250] 120.01s available CPU time
stress-ng: info: [3250] 118.90s user time ( 99.07%)
stress-ng: info: [3250] 0.15s system time ( 0.12%)
stress-ng: info: [3250] 119.05s total time ( 99.20%)
stress-ng: info: [3250] load average: 1.50 0.65 0.26
stress-ng: info: [3250] skipped: 0
stress-ng: info: [3250] passed: 2: cpu (2)
stress-ng: info: [3250] failed: 0
stress-ng: info: [3250] metrics untrustworthy: 0
stress-ng: info: [3250] successful run completed in 1 min, 0.01 secs
Single core matrixprod
stress-ng: info: [3263] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [3263] dispatching hogs: 1 cpu
stress-ng: metrc: [3263] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [3263] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [3263] cpu 105401 60.00 59.99 0.00 1756.67 1756.95
stress-ng: info: [3263] for a 60.00s run time:
stress-ng: info: [3263] 120.00s available CPU time
stress-ng: info: [3263] 59.98s user time ( 49.98%)
stress-ng: info: [3263] 0.00s system time ( 0.00%)
stress-ng: info: [3263] 59.98s total time ( 49.98%)
stress-ng: info: [3263] load average: 0.92 0.65 0.30
stress-ng: info: [3263] skipped: 0
stress-ng: info: [3263] passed: 1: cpu (1)
stress-ng: info: [3263] failed: 0
stress-ng: info: [3263] metrics untrustworthy: 0
stress-ng: info: [3263] successful run completed in 1 min, 0.00 secs
All cores matrixprod
stress-ng: info: [3268] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [3268] dispatching hogs: 2 cpu
stress-ng: metrc: [3268] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: metrc: [3268] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: metrc: [3268] cpu 198188 60.00 118.97 0.17 3303.01 1663.55
stress-ng: info: [3268] for a 60.00s run time:
stress-ng: info: [3268] 120.01s available CPU time
stress-ng: info: [3268] 118.96s user time ( 99.13%)
stress-ng: info: [3268] 0.16s system time ( 0.13%)
stress-ng: info: [3268] 119.12s total time ( 99.26%)
stress-ng: info: [3268] load average: 1.62 0.89 0.41
stress-ng: info: [3268] skipped: 0
stress-ng: info: [3268] passed: 2: cpu (2)
stress-ng: info: [3268] failed: 0
stress-ng: info: [3268] metrics untrustworthy: 0
stress-ng: info: [3268] successful run completed in 1 min, 0.00 secs
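The summary table is just the "bogo ops/s (real time)" column lifted out of each run. The following is a minimal Python sketch of extracting it from captured output, assuming the metrics row carries the metrc: tag and column order seen in the logs above; other stress-ng versions may format this row differently, and the function and field names are my own.

```python
import re

# Matches the per-stressor metrics row, for example:
# "stress-ng: metrc: [4496] cpu 80452 60.00 60.00 0.00 1340.82 1340.97"
METRIC_ROW = re.compile(
    r"metrc:\s+\[\d+\]\s+(?P<stressor>[\w-]+)\s+(?P<bogo_ops>\d+)\s+"
    r"(?P<real>[\d.]+)\s+(?P<usr>[\d.]+)\s+(?P<sys>[\d.]+)\s+"
    r"(?P<ops_real>[\d.]+)\s+(?P<ops_cpu>[\d.]+)\s*$"
)

def parse_metrics(output):
    """Return one dict per stressor metrics row found in stress-ng output."""
    rows = []
    for line in output.splitlines():
        match = METRIC_ROW.search(line)
        if match is None:
            continue  # header rows and info lines do not match
        rows.append({
            "stressor": match["stressor"],
            "bogo_ops": int(match["bogo_ops"]),
            "real_secs": float(match["real"]),
            "ops_real": float(match["ops_real"]),  # bogo ops/s (real time)
            "ops_cpu": float(match["ops_cpu"]),    # bogo ops/s (usr+sys time)
        })
    return rows
```

The ops_real field is the figure used in the comparison tables; ops_cpu divides by consumed CPU time instead, which is why it roughly halves in the two-worker runs.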
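CPU architecture
Alibaba Cloud's documentation is a bit odd. It splits instances into a great many types; some pages proudly name the processor model, while others are perfunctory, as if they would rather users not know it.
Both cat /proc/cpuinfo and lscpu print a large amount of CPU detail. The more important fields are: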
Field | Economy instance | Universal instance | Burstable instance |
---|---|---|---|
cpu family | 6 | 6 | 6 |
cpu MHz | 2500.002 | 2500.002 | 2500.000 |
cache size | 33792 KB | 33792 KB | 36608 KB |
model name | Intel(R) Xeon(R) Platinum | Intel(R) Xeon(R) Platinum | Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz |
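After installing cpuid and running it, the output indicates that the economy and universal instances both use Skylake-based Xeon Platinum processors.
On this point, the Cascade Lake processor in the burstable instance is somewhat ahead in process and performance.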
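The fields in the table can also be pulled out of /proc/cpuinfo programmatically, which is handy when comparing many instances. The following is a minimal Python sketch, assuming the Linux x86 layout of "key : value" lines with a blank line between logical CPUs; the function names are my own.

```python
FIELDS = ("cpu family", "model name", "cpu MHz", "cache size")

def parse_cpuinfo(text, fields=FIELDS):
    """Return the selected fields of the first logical CPU in /proc/cpuinfo text."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break      # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() in fields:
            info[key.strip()] = value.strip()
    return info

def read_cpuinfo(path="/proc/cpuinfo"):
    """Read and parse the local machine's cpuinfo (Linux only)."""
    with open(path) as handle:
        return parse_cpuinfo(handle.read())
```

Only the first processor block is read, since all vCPUs on these instances report the same model.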
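Comparison with AWS instance types
After all that, this is the important part. I have archived the raw logs for every test, so there is no need to reproduce them at length here. The comparison: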
2C4G Instance Perf | single core - crc16 | all cores - crc16 | single core - matrixprod | all cores - matrixprod |
---|---|---|---|---|
e-c1m2.large | 1340.82 ops/s | 1983.60 ops/s | 1801.47 ops/s | 1711.04 ops/s |
u1-c1m2.large | 1340.16 ops/s | 1978.24 ops/s | 1812.71 ops/s | 1709.14 ops/s |
t6-c1m2.large | 1308.88 ops/s | 2508.38 ops/s | 1756.67 ops/s | 3303.01 ops/s |
t4g.medium | 828.99 ops/s | 1660.24 ops/s | 44.13 ops/s | 89.49 ops/s |
t3a.medium | 1046.94 ops/s | 1682.28 ops/s | 675.70 ops/s | 818.42 ops/s |
t3.medium | 1058.33 ops/s | 1848.89 ops/s | 1364.36 ops/s | 1600.27 ops/s |
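The AWS T4g result looks absurd, particularly on matrixprod, and is far from the real-world performance gap. It is the only ARM instance in the table; everything else is x86. In our experience Graviton processors are very capable (T4g itself runs on Graviton2, not Graviton3), so I do not think this result is right.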
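A quick ratio check makes the anomaly concrete. The following is a minimal Python sketch using the single-core figures for t4g.medium and t3.medium from the table; the names are my own.

```python
# Single-core bogo ops/s from the comparison table
T4G = {"crc16": 828.99, "matrixprod": 44.13}
T3 = {"crc16": 1058.33, "matrixprod": 1364.36}

def gap(reference, candidate):
    """How many times faster the reference is than the candidate."""
    return reference / candidate

for method in T4G:
    print(f"{method}: t3.medium is {gap(T3[method], T4G[method]):.1f}x t4g.medium")
```

The crc16 gap is about 1.3x while the matrixprod gap is about 31x, so the two methods disagree by more than an order of magnitude about the same pair of machines.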
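Price-performance
AWS instance prices in a given region rarely change; they only drop a little when a new generation arrives. Alibaba Cloud purchase prices float. For example, last week a 3-year purchase of ecs.e-c1m2.large in China East 1 (Hangzhou), zone K, was discounted to 23% of list price; today it is 33%. When an instance can be bought that cheaply on Alibaba Cloud, price-performance is very high. Scarce configurations, however, are sometimes offered at 99% of list price, which feels as if they would rather you not buy and that anyone who does is being taken for a ride. At that point there is no price-performance to speak of.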
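To see how much the floating discount matters, note that a discount quoted as 2.3折 means paying 23% of list price. The following is a minimal Python sketch of the arithmetic. The list price is a placeholder I made up for illustration, not an Alibaba Cloud quote; only the two discount levels and the benchmark figure come from this post.

```python
def effective_price(list_price, zhe):
    """Price paid under an 'N折' discount, where N折 means N/10 of list price."""
    return list_price * zhe / 10

def ops_per_unit_cost(ops_per_sec, price):
    """Benchmark throughput bought per unit of currency."""
    return ops_per_sec / price

# Hypothetical list price for illustration only, not an actual quote.
LIST_PRICE = 1000.0

last_week = effective_price(LIST_PRICE, 2.3)   # 2.3折
today = effective_price(LIST_PRICE, 3.3)       # 3.3折
increase = today / last_week - 1

print(f"paid last week: {last_week:.0f}, today: {today:.0f}, increase: {increase:.0%}")

# All-core crc16 throughput of e-c1m2.large from the benchmark table.
for label, price in (("last week", last_week), ("today", today)):
    print(f"{label}: {ops_per_unit_cost(1983.60, price):.2f} ops/s per currency unit")
```

Whatever the real list price is, moving from 2.3折 to 3.3折 raises the amount paid by about 43%, and the throughput bought per unit of currency falls accordingly.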
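That said, Alibaba Cloud does offer some unusually shaped instances, such as ecs.u1-c1m1.4xlarge with 16 vCPUs and 16 GB of memory. When a workload needs a lot of CPU and little memory, that is very good value.
When GCP's sales team pitched to us, they said GCP lets you customize an instance's vCPU count and memory size, but I could not find that option when I looked myself. Otherwise there would be a lot of room to play~~, and not long ago they also told us that when we use Bard we should use the 1M tokens~~.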
Conclusion
For now, the t6 instance on Alibaba Cloud ECS fits our workload very well and ranks first on price-performance.