Skip to content

improve 6 of misra analysis performance#2329

Merged
danmar merged 1 commit into
cppcheck-opensource:masterfrom
elfel19:master
Jan 8, 2020
Merged

improve 6 of misra analysis performance#2329
danmar merged 1 commit into
cppcheck-opensource:masterfrom
elfel19:master

Conversation

@elfel19

@elfel19 elfel19 commented Nov 5, 2019

Copy link
Copy Markdown
Contributor

improve misra analysis performance:

  • misra_5_3
  • misra_5_4
  • misra_5_5
  • misra_12_4
  • misra_20_1
  • misra_20_2

Hello.

Our group used cppcheck and misra.py analyzer. But, sometime we handled huge size of dump file.(maximum 13GB..)
It's very critical issue for us. Because, resources are limited and response should be fast.
So, we improved six of misra rules. (we are already used it.)

Core idea is join fewer token list, and get faster value of tokens.
Especially effective at misra_5_3 and misra_5_4.
These two rules sometimes loop in very deeply. So, bring up to slows down analysis performance and consumes a lot of memory.

We've already tested it on several projects, but I hope it's officially released in review here.

Comment thread addons/cppcheckdata.py Outdated
@jubnzv

jubnzv commented Nov 5, 2019

Copy link
Copy Markdown
Contributor

These changes generally look fine.

But if you're talking about performance improvement, could you tell us what method did you use for performance measurement and provide some benchmark results? How they'll affect the memory consumption and execution time of the misra.py?

@stephankim935

Copy link
Copy Markdown

Evaluation

   To evaluate the performance of our new misra.py script, we compared vanilla version of misra.py(Vanilla) and our misra.py(Ours) execution time and memory usage.

Table 1. shows the size of the sample files we used. Each sample files are dumpfile generated from cppcheck with --dump option. Original source codes of each dumpfile are part of our product.

File Name Size(MB)
FileA 1.4
FileB 29.8
FileC 350.3
FileD 1235.9

Table 1. Size of sample files.

   For the one of the samples, we ran the Vanilla first then ran the Ours consequently. To ensure that OS page cache does not make an unfair effect to comparison, we flushed the buff/cache whenever it finishes the execution of misra.py. We used time command of Linux to measure the execution time of misra.py and used python Memory-profiler package to measure usage of heap memory every 0.1 seconds.
   We performed evaluation under the environment as bellow.

  Specification
CPU 8 core, 16 thread, 2.17GHz AMD Ryzen 7 1800X Eight-Core Processor
RAM 64 GB
Python 3.6.8
OS GNU/Linux Debian 4.19.37-3

Table 2. Evaluation environment.

   We confirmed that the numbers of issues detected by Ours and Vanilla are matched except for misra-c2012-5.3 of FileC. In FileC, Vanilla and Ours detected 81480 and 115514 "misra-c2012-5.3" issues respectively. And we found out Vanilla detected 17017 and 17017 "misra-c2012-5.3" issues in only two different locations (as a File_name : Line_number pair) while Ours only detected 3 and 3 issues in those two locations by avoiding redundant detection.

Exccution
Figure 1. Comparison of execution time between Vanilla and Ours

  Figure 1. shows that Ours is 45.0×, 11.5× and 3.7× faster than Vanilla for the FileB, FileC and FileD respectively. Even for the FileA whose size is relatively small, Ours is 1.4× faster than Vanilla. Ours use Hash table(Dictionary in python) instead of multi-level loop to look up certain directive, scope, enum-token and macro item. These changes reduce many iterations whenever a misra rule need to check those items especially on misra-c2012-5.3 misra-c2012-5.4 and misra-c2012-5.5.

heap_trend_vanilla
Figure 2. Heap usage of Vanilla for FileD over time

heap_trend_ours
Figure 3. Heap usage of Ours for FileD over time

   Figure 2 and Figure 3 show that heap usage of both misra.py processes is not largely changed after it reaches its peak. This trend is shown for every sample files. So, we defined the maximum heap usage of misra.py process as heap usage of misra.py.

heap_memory
Figure 4. Comparison of memory usage between Vanilla and Ours. Memory usage of Vanilla is normalized to 1.0.

Figure 4. shows that Ours is effective when it analyzes a large dumpfile. Ours uses 33.6% lesser heap memory than Vanilla on average for FileB, FileC and FileD.

   For FileA, Ours does not reduce heap memory usage significantly, but also not make any negative effect on heap memory usage either.
   Ours loads configuration item whenever it is requested instead of holding every configuration items at first. In Ours, every configuration item instance lose its reference count (counted by only one variable) at the end of an iteration of configuration node list(see parse_dump() function). So, configuration item does not remain in heap memory after it is analyzed because python runtime retrieve memory of instance whose reference count is zero. For the large file, size of a configuration item is also large. So, purging useless configuration item on time is crucial to reduce memory pressure of a system when multiple misra.py process analyze large dumpfiles simultaneously.

@stephankim935

Copy link
Copy Markdown

Hello.
I'm working on the same team with elfel19.
The report above is the execution time and memory(heap) usage measurement report of our code.
I hope that can help you understand our work.

@jubnzv

jubnzv commented Nov 12, 2019

Copy link
Copy Markdown
Contributor

Thanks for detailed representation. It makes this PR much more clear.

@elfel19 elfel19 marked this pull request as ready for review November 12, 2019 10:00
@elfel19

elfel19 commented Nov 21, 2019

Copy link
Copy Markdown
Contributor Author

Hello.

I want to discuss current test-case failure.

We found that the code we modified to improve memory utilization is being used in various addons(naming.py, y2038.py, etc.).
We didn't think about this modify because we only use misra.py among the addons and we apologize for this.

So, I suggest two methods and try to revise them through comments.

  1. modify all addons as follows.

ex) y2038.py (line no.: 168)

for cfg in data.configurations:
   cfg = data.Configuration(cfg)

ex) naming.py (line no.: 45)

for cfg in data. configurations:
   cfg = data.Configuration(cfg)

Way of cfg is used has changed, so fixing these parts will work. But we did not measure its side-effects.

  1. Will be reverted to cfg use.

cppcheckdata.py

self.configurations.append(Configuration(cfgnode))

misra.py

- cfg = data.Configuration(cfg)

Restoring modified cfg load from cppcheckdata and misra will only reflect the misra our modified.
In this case, memory utilization of the improved codes is restored, and only the speed due to misra is improved.

If give us to your feedback, we'll make your changes and test it.

P.s. I'm sorry, if ignored the pull request policy or sequence from here. Because this is my first open source contribution, there was some immaturity.

@versat

versat commented Nov 21, 2019

Copy link
Copy Markdown
Collaborator

I vote for solution 1. I do not see any issues with changing that for all addons. Delaying the call of the constructor of Configuration makes sense to reduce memory usage. I do not see any speed drawbacks from that when looking at the code.
Maybe these changes can be done in an own pull request and only the Misra changes in this one. What do you think?

There were also some changes/fixes to the addon scripts. Can you try to merge them or rebase your branch onto Head?

@elfel19

elfel19 commented Nov 21, 2019

Copy link
Copy Markdown
Contributor Author

Thanks to your comment. :)
I'll try to work tomorrow.
It's too late here.

Comment thread addons/misra.py Outdated
@elfel19

elfel19 commented Dec 11, 2019

Copy link
Copy Markdown
Contributor Author

We checked that 5.3 is removed and modified this, and also checked the recent memory consumption #2395
If the pull request for memory consumption is merged, we may need to restore our modified configuration.
Please give feedback if our pull request is valid.

@danmar

danmar commented Dec 11, 2019

Copy link
Copy Markdown
Collaborator

Thanks for looking at this! I will release cppcheck-1.90 in a few weeks. And this change feels a bit dangerous to me, so I want to merge it after the release.

@danmar danmar added the merge-after-next-release Wait with merging this PR until after the next Release label Dec 11, 2019

@danmar danmar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look ~2 minutes and it looks very reasonable to me.

@versat

versat commented Dec 27, 2019

Copy link
Copy Markdown
Collaborator

@elfel19 Can you please rebase on HEAD or merge it into your branch. There were some improvements in another PR. Can you have a look what can be still improved/is still valid in this PR?

@elfel19

elfel19 commented Dec 28, 2019

Copy link
Copy Markdown
Contributor Author

Sorry, I was late for check. I'll get to work today.

@elfel19 elfel19 force-pushed the master branch 2 times, most recently from 2d310f9 to 669e025 Compare January 2, 2020 04:55
@elfel19

elfel19 commented Jan 2, 2020

Copy link
Copy Markdown
Contributor Author

HAPPY NEW YEAR!

I rebased my HEAD onto the Verification; printing debug output on std::cout commit(e32c01b).
Perhaps I think, improvements to this work are still valid.

Here is the final code conclusion.
5_3: Deleted.
5_4: improvement
5_5: improvement
12_4: already applied (#2410 )
20_1: Improvement
20_2: incorrectly logged(No change)

@elfel19

elfel19 commented Jan 3, 2020

Copy link
Copy Markdown
Contributor Author

Here is simple memory usage and analysis speed comparison of each version.
image
X-axis: time spent(sec)
Y-axis: memory usage(MB)

test dump file size: 351MB

legends description(From left):

  • 1.9_custom: our custom version of misra.py and cppcheckdata.py(customizing call of the constructor of Configuration)
  • 1.89_custom: a custom version of misra.py and cppcheckdata.py(same as above). we are now using this version.
  • 1.9_origin: the vanilla version of misra.py from 1.90 tag
  • master_HEAD: the vanilla version of misra.py from master HEAD(from 0e369ed)
  • master_HEAD_custom: our custom version of only modified three of rules misra.py from master HEAD(from 03369edd)
    └ It removed our call of Configurator function of cppcheckdata.py, depends on the pull request memory consumption #2395

@danmar

danmar commented Jan 3, 2020

Copy link
Copy Markdown
Collaborator

Travis says:

5.05s$ pylint --rcfile=pylintrc_travis addons/*.py
************* Module addons.misra
addons/misra.py:1014:0: W0311: Bad indentation. Found 24 spaces, expected 16 (bad-indentation)
addons/misra.py:1017:0: W0311: Bad indentation. Found 20 spaces, expected 16 (bad-indentation)
-----------------------------------
Your code has been rated at 9.99/10
The command "pylint --rcfile=pylintrc_travis addons/*.py" exited with 4.

@elfel19

elfel19 commented Jan 7, 2020

Copy link
Copy Markdown
Contributor Author

Oh, I missed it. Fixed wrong indent.

@danmar danmar merged commit a288ea3 into cppcheck-opensource:master Jan 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-after-next-release Wait with merging this PR until after the next Release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants