Heapster scalability testing #5880

Closed
vishh opened this issue Mar 24, 2015 · 18 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@vishh
Contributor

vishh commented Mar 24, 2015

Heapster must be tested to ensure that it meets our v1.0 scalability goals: 100-node clusters (#3876) with each node running 30-50 pods (#4188). A soak test might also be very helpful.
Some of the interesting signals to track include:

  • Heapster uptime
  • Kubelet stats API latency
  • API server watch errors
  • Monitoring Backend (GCM) write errors, QPS
  • Total number of metrics being handled
  • Heapster housekeeping latency

Heapster needs to expose some metrics to aid in scalability testing.

cc @vmarmol
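
To make the last point concrete, here is a minimal sketch of what exposing a few of the signals listed above could look like using Go's standard expvar package. The metric names, port, and housekeeping loop are illustrative assumptions, not Heapster's actual implementation.

```go
package main

import (
	"expvar"
	"log"
	"net/http"
	"time"
)

var (
	startTime = time.Now()

	// Hypothetical metric names, purely for illustration.
	housekeepLatencyMs = expvar.NewFloat("heapster_housekeeping_latency_ms")
	metricsHandled     = expvar.NewInt("heapster_metrics_handled_total")
	gcmWriteErrors     = expvar.NewInt("heapster_gcm_write_errors_total")
)

func main() {
	// Uptime is computed on demand each time /debug/vars is scraped.
	expvar.Publish("heapster_uptime_seconds", expvar.Func(func() interface{} {
		return time.Since(startTime).Seconds()
	}))

	// Pretend housekeeping loop: record how long each pass takes and how
	// many metrics it handled.
	go func() {
		for {
			start := time.Now()
			// ... scrape kubelets, push to the monitoring backend ...
			housekeepLatencyMs.Set(float64(time.Since(start)) / float64(time.Millisecond))
			metricsHandled.Add(1)
			time.Sleep(10 * time.Second)
		}
	}()

	// expvar registers itself on /debug/vars of the default mux;
	// :8082 is just an example port.
	log.Fatal(http.ListenAndServe(":8082", nil))
}
```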

@vishh vishh added priority/backlog Higher priority than priority/awaiting-more-evidence. team/cluster labels Mar 24, 2015
@vishh vishh added this to the v1.0 milestone Mar 24, 2015
@brendandburns brendandburns modified the milestones: v1.0, v1.0-post Apr 28, 2015
@vishh
Contributor Author

vishh commented May 11, 2015

cc @dchen1107

@vishh vishh added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels May 11, 2015
@dchen1107 dchen1107 modified the milestones: v1.0-candidate, v1.0-post May 13, 2015
@roberthbailey roberthbailey modified the milestones: v1.0, v1.0-candidate May 19, 2015
@vishh
Contributor Author

vishh commented May 27, 2015

I made some changes to heapster to make it scale to 100 nodes without any load.
Next steps:

  1. Measure resource usage of heapster in a 100 node cluster with and without load.
  2. Soak test for heapster to quantify its reliability.

cc @saad-ali @dchen1107

@thockin
Member

thockin commented Jun 5, 2015

@vishh Plans for this? Is there someone available to take this right now?

@vishh
Contributor Author

vishh commented Jun 5, 2015

@saad-ali: Will you be able to look at heapster scalability?

@saad-ali
Member

Abhi (@ArtfulCoder) and I will work on this together.

Ideally we'd like to measure the following metrics:

  • Heapster to API Server
    • QPS
    • Latency
    • Read/write error rate
    • Total number of events being pulled
  • Heapster to Kubelet(s)
    • QPS
    • Latency
    • Read/write error rate
    • Total number of metrics being pulled
  • Heapster to backends (GCM, GCL, InfluxDB)
    • QPS
    • Latency
    • Read/write error rate
    • Total number of events/metrics being pushed

These internal metrics require implementing a Heapster instrumentation infrastructure that doesn't yet exist, so we'll treat this as lower priority and likely a post-v1 task.

Instead we'll focus on getting a baseline of the following basic process metrics for Heapster:

  • Uptime
  • Memory usage
  • CPU usage
  • Network bandwidth usage

These are available today because Heapster is run in a container.

For a first stab at this, we plan on doing the following:

  1. On a cluster of 4-5 nodes, start many (100s?) of containers that essentially do nothing (but will still generate metrics).
  2. Make sure Heapster is running on the cluster, pulling metrics and events, and pushing them to InfluxDB and GCM.
  3. Pull the basic container metrics (CPU, memory, uptime, network usage) for Heapster via a custom script periodically (every 30 sec? 1 min? 5 min? 15 min?); see the sketch after this list.
  4. Let the cluster run for 24-72 hours.
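
A rough sketch of the collection script from step 3, assuming a Docker CLI new enough to support `docker stats --no-stream`; the container name, output path, and polling interval are placeholders:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

func main() {
	const (
		container = "heapster"                // placeholder: the Heapster container name or ID
		outPath   = "/tmp/heapster_stats.log" // where snapshots accumulate
		interval  = 30 * time.Second          // 30s, 1m, 5m, 15m, ...
	)

	out, err := os.OpenFile(outPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		panic(err)
	}
	defer out.Close()

	for {
		// One non-streaming snapshot of CPU %, memory, and network I/O.
		snap, err := exec.Command("docker", "stats", "--no-stream", container).CombinedOutput()
		if err != nil {
			fmt.Fprintf(out, "%s error: %v\n", time.Now().Format(time.RFC3339), err)
		} else {
			fmt.Fprintf(out, "%s\n%s", time.Now().Format(time.RFC3339), snap)
		}
		time.Sleep(interval)
	}
}
```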

@vishh
Contributor Author

vishh commented Jun 11, 2015

Thanks for picking this up!
Step 4 can be tackled by using a real monitoring backend like GCM.

@dchen1107
Member

@saad-ali The plan SGTM. Thanks!

@saad-ali
Member

Abhi and I set up a GCE cluster with 4 nodes yesterday. We scheduled 275 pods (1 container each) on the cluster. Within an hour Heapster stopped sending data to GCM because we hit quota limits:

W0612 02:21:55.875194       1 driver.go:388] [GCM] Push attempt 2 failed: request &{Method:POST URL:https://www.googleapis.com/cloudmonitoring/v2beta2/projects/saads-vms2/timeseries:write Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Content-Type:[application/json] Authorization:[Bearer ya29.kAFTeh9ov-tLTYj7_5tq1JRyx86qTRlLNRwDEykK0ZvjKSZRX6pBRxDm-B3XpP68TuzZteCrxGLUDQ]] Body:{Reader:0x4c209d730e0} ContentLength:13935 TransferEncoding:[] Close:false Host:www.googleapis.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil>} failed with status "403 Forbidden" and response: &{Status:403 Forbidden StatusCode:403 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Vary:[Origin X-Origin] Date:[Fri, 12 Jun 2015 02:21:55 GMT] Expires:[Fri, 12 Jun 2015 02:21:55 GMT] Cache-Control:[private, max-age=0] X-Frame-Options:[SAMEORIGIN] Alternate-Protocol:[443:quic,p=1] Content-Type:[application/json; charset=UTF-8] X-Content-Type-Options:[nosniff] X-Xss-Protection:[1; mode=block] Server:[GSE]] Body:0x4c209126780 ContentLength:-1 TransferEncoding:[chunked] Close:false Trailer:map[] Request:0x4c209d36d00 TLS:0x4c208217980}, Body: "{
 "error": {
  "errors": [
   {
    "domain": "usageLimits",
    "reason": 
"quotaExceeded",
    "message": "Request would exceed timeseries quota of 20000"
   }
  ],
  "code": 403,
  "message": "Request would exceed timeseries quota of 20000"
 }
}
"

By morning the error had switched to:

 failed: request &{Method:POST URL:https://www.googleapis.com/cloudmonitoring/v2beta2/projects/saads-vms2/timeseries:write Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Content-Type:[application/json] Authorization:[Bearer ya29.kAEbkiec73bWCBbuKTM7XPJ7mofV40aDG_7uK196GIR2WKEJsvw-Kef3yq100fHSBR_qoYIOvQQl-A]] Body:{Reader:0x4c20a4db830} ContentLength:3012 TransferEncoding:[] Close:false Host:www.googleapis.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil>} failed with status "403 Forbidden" and response: &{Status:403 Forbidden StatusCode:403 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[X-Content-Type-Options:[nosniff] Vary:[Origin X-Origin] Content-Type:[application/json; charset=UTF-8] Date:[Fri, 12 Jun 2015 17:27:18 GMT] Expires:[Fri, 12 Jun 2015 17:27:18 GMT] Cache-Control:[private, max-age=0] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[1; mode=block] Server:[GSE] Alternate-Protocol:[443:quic,p=1]] Body:0x4c209ce58c0 ContentLength:-1 TransferEncoding:[chunked] Close:false Trailer:map[] Request:0x4c209d37380 TLS:0x4c209336a80}, Body:
 "{
 "error": {
  "errors": [
   {
    "domain": "usageLimits",
    "reason": "dailyLimitExceeded",
    "message": "Daily Limit Exceeded"

   }
  ],
  "code": 403,
  "message": "Daily Limit Exceeded"
 }
}
"

To bypass this I will try to get the quota increased; in the meantime, I'll set up a script to scrape docker stats off the machine directly.

CC @erickhan @dchen1107

@vishh
Contributor Author

vishh commented Jun 12, 2015

The quota issue is expected.


@dchen1107
Member

We are filing a request to ask for more quota. Is there an easy way to get that quota, and do we have an estimate of how much quota is needed for a given cluster size (number of nodes, number of pods, etc.)? cc @roberthbailey @a-robinson too.
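
For a rough sense of scale (not an official estimate), the number of distinct timeseries grows roughly as nodes x pods per node x containers per pod x metrics per container; the per-container metric count below is an assumption:

```go
package main

import "fmt"

func main() {
	// All of these are assumptions for a back-of-envelope estimate, not
	// measured values.
	nodes := 100              // v1.0 scalability target
	podsPerNode := 50         // upper end of the 30-50 pods/node goal
	containersPerPod := 1     // plus system containers in practice
	metricsPerContainer := 10 // CPU, memory, network, filesystem, ...
	quota := 20000            // limit from the error message above

	timeseries := nodes * podsPerNode * containersPerPod * metricsPerContainer
	fmt.Printf("estimated distinct timeseries: %d (current quota: %d)\n", timeseries, quota)
}
```

Even with conservative assumptions, a 100-node cluster at the 30-50 pods/node target lands well above the 20,000 timeseries limit from the error above, so any quota request probably needs to scale with cluster size.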

@saad-ali
Member

Here is the initial set of results.

Test Setup

  • Cluster characteristics:
    • 4 nodes
    • n1-standard-1 (1 vCPU, 3.75 GB memory)
  • Test duration: 20 hours total
    • 17 hours under "no load" to get a baseline
      • No additional pods/containers other than default
    • 3 hours under "high load"
      • 135 additional pods, with 10 containers each (1350 extra containers)
      • The container used was a small, statically-linked C program that just sleeps for 28 days, with a container size of 877.6 kB (a rough stand-in is sketched below).
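
A rough Go stand-in for that sleeper workload (the actual test used a statically-linked C binary, which keeps the image far smaller):

```go
package main

import "time"

func main() {
	// Do nothing: just exist as a running container that produces metrics.
	time.Sleep(28 * 24 * time.Hour)
}
```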

Results

Memory

Heapster and InfluxDB memory usage grows proportionally with the number of pods/containers.

Heapster Memory Usage

  • At "no load" memory usage slowly creeped up to 50 MB.
  • At "high load" spiked to around 230MB.
    heapster_mem_1350containers

InfluxDB Memory Usage

  • At "no load" creeped up from 25MB to 50MB.
  • At "high load" spiked to around 354MB.
    influxdb_mem_1350containers

Fluentd/ElasticSearch Memory Usage

  • At "no load" stable around 47MB
  • At "high load" stable around 51MB
    fluentdelasticsearch_mem_1350containers

CPU

The reported CPU usage is cumulative and thus increases over time, so the rate of increase is what matters. The most interesting bit is the relative rate of use over time: in particular, InfluxDB appears to use almost an order of magnitude more CPU than ElasticSearch and even Heapster (I saw InfluxDB consuming, at times, 80% of the CPU on the machine it was on).
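
To make the cumulative-vs-rate point concrete, here is a small sketch of turning two cumulative CPU samples (as reported by cgroups/cAdvisor) into an average utilization over the interval; the sample values are made up:

```go
package main

import (
	"fmt"
	"time"
)

// cpuSample is one reading of a container's cumulative CPU time (total CPU
// consumed since the container started), as exposed by cgroups/cAdvisor.
type cpuSample struct {
	at    time.Time
	total time.Duration
}

// cpuPercent converts two cumulative samples into average utilization over
// the interval, i.e. the "rate of increase". Assumes a single core
// (n1-standard-1); divide by the core count otherwise.
func cpuPercent(prev, cur cpuSample) float64 {
	wall := cur.at.Sub(prev.at)
	if wall <= 0 {
		return 0
	}
	return 100 * float64(cur.total-prev.total) / float64(wall)
}

func main() {
	prev := cpuSample{at: time.Unix(0, 0), total: 90 * time.Second}
	cur := cpuSample{at: time.Unix(60, 0), total: 138 * time.Second}
	fmt.Printf("average CPU over the interval: %.1f%%\n", cpuPercent(prev, cur)) // 80.0%
}
```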

Heapster CPU Usage
heapster_cpu_1350containers

InfluxDB CPU Usage
influxdb_cpu_1350containers

Fluentd/ElasticSearch CPU Usage
fluentdelasticsearch_cpu_1350contianers

Restarts

The Heapster, ElasticSearch, InfluxDB, and Grafana containers never restarted during the test.

@dchen1107
Member

@saad-ali After you restart Heapster, does the memory usage go down? It looks like there may be some memory leak issues.

Edit: I misread @saad-ali's comment above. More load was added later, which is when the spike happened.

@saad-ali
Member

@dchen1107 Yes, once the extra containers are removed, Heapster memory usage drops back down:
heapster_mem_0extracontainers

As does InfluxDB memory usage (though not nearly back to the original levels):
influxdb_mem_0extracontainers

@saad-ali
Member

Here are the results from a 100-node, 5000-pod run.

Test Setup

  • Cluster characteristics:
    • 100 nodes
    • n1-standard-1 (1 vCPU, 3.75 GB memory)
  • Test duration: 20 hours total
    • 3 hours under "no load" to get a baseline
      • No additional pods/containers other than default
    • 17 hours under "high load"
      • 5000 additional pods, with 2 containers each
        • Only 4983 pods could be scheduled (17 pods remained in a waiting state)
      • The container used was a small, statically-linked C program that just sleeps for 28 days, with a container size of 877.6 kB.

Results

Memory

Reconfirmed that Heapster and InfluxDB memory usage grows proportionally with the number of nodes/pods/containers.

Heapster Memory Usage

  • At "no load" memory was around 600 MB.
  • At "high load" memory was stable around 2.4 GB.
    heapster_mem_usage_100nodes_4983pods

InfluxDB Memory Usage

  • At "no load" creeped up to 200 MB.
  • At "high load" up to 1.9 GB (I noticed spikes as high as 3.2 GB while removing pods).
    influxdb_mem_usage_100nodes_4983pods

Fluentd/ElasticSearch Memory Usage

  • At "no load" stable around 45 MB
  • At "high load" between 50-60 MB
    fluentd_mem_usage_100nodes_4983pods

CPU

The CPU usage here is the derivative of the cumulative usage over time and thus shows the rate of change. InfluxDB was consistently high (pegged around 80-95%). Heapster would spike up and down (as high as 99%).

Heapster CPU Usage
heapstercpuusage_100nodes_4983pods

InfluxDB CPU Usage
influxdb_cpu_usage_100nodes_4983pods

Fluentd/ElasticSearch CPU Usage
fluentd_cpu_usage_100nodes_4983pods

Restarts

The Heapster, ElasticSearch, InfluxDB, and Grafana containers never restarted during the test because of crashes, but Heapster did appear to get rescheduled a couple of times (onto the same machine).

@dchen1107
Member

Nice work, @saad-ali. I am closing this one since @vishh filed a separate issue for configuring those addon pods: #10256

dchen1107 added a commit to dchen1107/kubernetes-1 that referenced this issue Jun 23, 2015
dchen1107 added a commit to dchen1107/kubernetes-1 that referenced this issue Jun 25, 2015
spiffxp added a commit to spiffxp/kraken-services that referenced this issue Jun 30, 2015
These appear to be the numbers google is using to hit their 100-node 1.0
goal, per perf testing done under kubernetes/kubernetes#5880

This looks like 12X less data, and I've been finding influx unresponsive
somewhere between 10-20 nodes, so maybe this is all the breathing room we
need.
spiffxp added a commit to spiffxp/kraken-services that referenced this issue Jul 1, 2015
These appear to be the numbers google is using to hit their 100-node 1.0
goal, per perf testing done under kubernetes/kubernetes#5880

The defaults are 10s poll interval, 5s resolution, so this should back
off load by about an order of magnitude.

TODO:
- drop the verbose flag once finished debugging
spiffxp added a commit to spiffxp/kraken-services that referenced this issue Jul 1, 2015
These appear to be the numbers google is using to hit their 100-node 1.0
goal, per perf testing done under kubernetes/kubernetes#5880

The defaults are 10s poll interval, 5s resolution, so this should back
off load by about an order of magnitude.
spiffxp added a commit to spiffxp/kraken-services that referenced this issue Jul 1, 2015
These appear to be the numbers google is using to hit their 100-node 1.0
goal, per perf testing done under kubernetes/kubernetes#5880

The defaults are 10s poll interval, 5s resolution, so this should back
off load by about an order of magnitude.

We're using `avoidColumns=true` to force heapster to avoid additional
columns and instead append all metadata into the series names.  It makes
the series name ugly and hard to aggregate on the grafana side, but it
wildly reduces CPU load.  I guess that's why influxdb docs recommend
more series with fewer points over fewer series with more points.

Grafana's kraken dashboard updated to use the new series.
@jeremyeder

@saad-ali did you ever push this further in total pod count? We're seeing failures after 12k...

@saad-ali
Member

saad-ali commented Nov 4, 2016

This is pretty old. Check out http://blog.kubernetes.io/2016/07/kubernetes-updates-to-performance-and-scalability-in-1.3.html

Check out https://github.com/kubernetes/community/blob/master/sig-scalability/README.md; the folks there should be able to give you the current information and plans, and address any issues you are having with the published numbers.

@jeremyeder

Found out last week that Heapster is being deprecated in favor of a metrics server and other components.
