kvm: Agent should not check if remaining memory on host is sufficient #2766

Merged: yadvr merged 1 commit into apache:master from wido:kvm-agent-allocated-memory on Aug 8, 2018

@wido wido commented Jul 24, 2018

Contributor

When an Instance is (attempted to be) started on a KVM Host, the Agent
should not worry about the allocated memory on this host.

To make a proper judgement we need to take more into account:

  • Memory Overcommit ratio
  • Host reserved memory
  • Host overcommit memory

The Management Server has all the information, and the DeploymentPlanner
has to decide whether an Instance should and can be started on a
Host, not the host itself.

Signed-off-by: Wido den Hollander <wido@widodh.nl>

@wido wido added this to the 4.12.0.0 milestone Jul 24, 2018
@rafaelweingartner

Member

So, you are removing this validation from the KVM agent, right?
Can the management server deal with concurrency? I mean two or more management servers deploying VMs on the same KVM host.

@DaanHoogland DaanHoogland left a comment

Contributor

As a code cleanup it looks good, but how do we deal with insufficient memory if we don't check beforehand?

@wido

wido commented Jul 24, 2018

Contributor Author

@rafaelweingartner Yes, it is being removed from the Agent.

The old logic would sum the memory of all running VMs and then check whether the remaining memory was enough. However, we have a memory overcommit ratio on the Management Server side, which is not taken into account here.

If the Management Server thinks (which is leading) that the Instance can start there, it should be started.
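To make the mismatch concrete, here is a minimal Python sketch. It is not CloudStack's actual code: the `Host` fields, both function names, the sample numbers and the capacity formula (total minus reserved, multiplied by the overcommit ratio) are assumptions chosen only to illustrate how an agent-side check that ignores the overcommit ratio can reject an Instance that the planner would accept.

```python
from dataclasses import dataclass
from typing import List

MIB = 1024 * 1024


@dataclass
class Host:
    # All fields are hypothetical and exist only for this illustration.
    total_memory: int             # physical RAM in bytes
    reserved_memory: int          # bytes kept back for the host itself
    running_vm_memory: List[int]  # memory of each running VM in bytes


def agent_side_check(host: Host, requested: int) -> bool:
    """Old-style check: sum the running VMs and compare against what is left."""
    remaining = host.total_memory - sum(host.running_vm_memory)
    return requested <= remaining


def planner_side_check(host: Host, requested: int, overcommit_ratio: float) -> bool:
    """Assumed management-server view: apply reserved memory and overcommit."""
    usable = (host.total_memory - host.reserved_memory) * overcommit_ratio
    allocated = sum(host.running_vm_memory)
    return allocated + requested <= usable


host = Host(
    total_memory=16384 * MIB,
    reserved_memory=1024 * MIB,
    running_vm_memory=[4096 * MIB] * 3,  # 12 GiB already allocated
)
request = 8192 * MIB

agent_ok = agent_side_check(host, request)
planner_ok = planner_side_check(host, request, overcommit_ratio=2.0)
print(agent_ok, planner_ok)  # False True
```

With an overcommit ratio of 2.0 the planner still has headroom, while the agent-side sum sees only 4 GiB left and refuses the 8 GiB Instance.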

As for concurrency, I'm not sure. The mgmt server locks its tables, right? So I'm assuming you won't have two deployments racing on the same host.

@rafaelweingartner

Member

Understood.

Well, the MS should address concurrency, there are a few locking mechanisms in place, but sometimes I have the feeling that concurrency is not being addressed properly.
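The concern is a check-then-act race: two deployments could both see enough free capacity and both proceed. The sketch below shows the general remedy of doing the check and the reservation under one lock. The thread does not describe the Management Server's locking in detail, so the `HostCapacity` class, `try_reserve`, and the use of an in-process `threading.Lock` are stand-ins for illustration only, not CloudStack's implementation.

```python
import threading


class HostCapacity:
    """Toy capacity record for one host (hypothetical, not CloudStack's model)."""

    def __init__(self, usable_memory: int) -> None:
        self.usable_memory = usable_memory
        self.allocated = 0
        self._lock = threading.Lock()

    def try_reserve(self, requested: int) -> bool:
        # The check and the update happen under the same lock, so two
        # callers cannot both pass the check against the same old value.
        with self._lock:
            if self.allocated + requested > self.usable_memory:
                return False
            self.allocated += requested
            return True


capacity = HostCapacity(usable_memory=8192)
results = []
results_lock = threading.Lock()


def deploy(requested: int) -> None:
    ok = capacity.try_reserve(requested)
    with results_lock:
        results.append(ok)


# Two concurrent deployments of 6144 each; only one fits in 8192.
threads = [threading.Thread(target=deploy, args=(6144,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [False, True]
```

Whichever thread takes the lock first gets the reservation; the other is refused instead of over-allocating the host.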

Anyways, thanks for the PR.

@borisstoyanov

Contributor

@blueorangutan package

@blueorangutan

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✖centos7 ✔debian. JID-2207

@wido wido force-pushed the kvm-agent-allocated-memory branch from a4adb66 to 101c190 Compare July 25, 2018 13:02
@yadvr

yadvr commented Jul 26, 2018

Member

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@yadvr yadvr left a comment

Member

I'm okay with the change, but I'm not sure whether it may introduce a regression.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2209

@yadvr

yadvr commented Jul 26, 2018

Member

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@wido

wido commented Jul 26, 2018

Contributor Author

@rhtyd Thanks! As I stated, the Agent shouldn't worry about memory allocation, as that's up to the Management Server and the deployment planners there.

@wido

wido commented Jul 26, 2018

Contributor Author

@rafaelweingartner Are you OK with me merging this PR as we have the approvals?

@rafaelweingartner

Member

Yes, I approved it already two days ago.

Shouldn't we wait for the test results though?

@wido

wido commented Jul 26, 2018

Contributor Author

@rafaelweingartner Ah, yes, I didn't notice the tests yet :) Just wondering when we merge PRs into master right now.

But indeed, let's wait for the tests.

@borisstoyanov

Contributor

@blueorangutan test

@blueorangutan

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@borisstoyanov

Contributor

@blueorangutan package

@blueorangutan

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2216

@borisstoyanov

Contributor

@blueorangutan test

@blueorangutan

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-2898)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 39848 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2766-t2898-kvm-centos7.zip
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_public_ip_range.py
Intermitten failure detected: /marvin/tests/smoke/test_templates.py
Intermitten failure detected: /marvin/tests/smoke/test_usage.py
Intermitten failure detected: /marvin/tests/smoke/test_volumes.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermitten failure detected: /marvin/tests/smoke/test_host_maintenance.py
Smoke tests completed. 64 look OK, 5 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 179.98 | test_privategw_acl.py |
| test_04_extract_template | Failure | 128.32 | test_templates.py |
| ContextSuite context=TestISOUsage>:setup | Error | 0.00 | test_usage.py |
| test_06_download_detached_volume | Failure | 138.56 | test_volumes.py |
| test_02_cancel_host_maintenace_with_migration_jobs | Error | 3.26 | test_host_maintenance.py |

@borisstoyanov borisstoyanov left a comment

Contributor

LGTM

7 participants