Understand, Visualize & Improve Continuous Integration
Alexander Reelsen, Community Advocate
alex@elastic.co | @spinscale
Slide 2
How do you act on your CI data?
Slide 3
Everything is a search problem!
Slide 4
Ecommerce
Slide 5
Social networks
Slide 6
File Search
Slide 7
Location Search
Slide 8
Observability
Slide 9
Slide 10
Elasticsearch in one minute
- Search Engine (FTS, Analytics, Geo), near realtime
- Distributed, scalable, highly available, resilient
- Interface: HTTP & JSON
- Heart of the Elastic Stack (Kibana, Logstash, Beats)
Slide 11
Kibana in one minute
- The window into the Elastic Stack
- Visualization & Dashboarding
- Management
- Monitoring
Slide 12
Slide 13
Slide 14
Slide 15
Slide 16
Slide 17
Slide 18
Survey time!
Slide 19
CI Tooling
- Jenkins, Jenkins X, TeamCity, Bamboo
- Travis, CircleCI, GitLab CI
- Azure Pipelines, AWS CI/CD, Google CI/CD
- Others?
Slide 20
CI infrastructure
- Self-hosted on own infrastructure
- Self-hosted in the cloud
- CI-as-a-service
Slide 21
CI as a role
- CI is part of ops role
- CI is part of dev role
- Infrastructure Engineer
- Build Engineer
Slide 22
Test test test
- dev branches (master, 7.x)
- release branches (7.8, 6.8)
- feature branches
- PRs
- BWC (backwards compatibility)
- Benchmarking
- Packaging tests
- JVM versions
- Garbage collectors
- Operating systems
Slide 23
Do you act on CI results? If so, how?
Slide 24
What is CI data?
- Time series
- Recency
- Locality
- Fragmentation
Slide 25
CI output
- unstructured
- huge
- needle in the haystack (TRACE loglevel)
- requires postprocessing
- security
Slide 26
Analyzing CI results
- Detect seemingly random bugs
- Centralised lookup ability for the team
- Emails are not a good CI status medium
- Long term trends (failures, test count, coverage)
Slide 27
Requirements
- metadata enrichment: per branch, per test run, per test method run, failures only
- data model should be based on search requirements (query optimized)
- define search ability: by timestamp, by branch, by class, by test method, by success/failure
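A minimal sketch of what a query-optimized document could look like, assuming one document per test method run. The class name `CiTestResult` and every field name are illustrative assumptions, not the schema used in the talk; the sample values are taken from the Java test output shown in this deck, except the branch, which is made up.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CiTestResult:
    """One document per test method run (hypothetical field names)."""
    timestamp: str           # ISO 8601 time of the test suite run
    branch: str              # searchable "by branch"
    classname: str           # searchable "by class"
    test_method: str         # searchable "by test method"
    success: bool            # searchable "by success/failure"
    duration_seconds: float
    os: str = "unknown"      # run-specific metadata added during enrichment


def to_json(doc: CiTestResult) -> str:
    """Serialize the document so it can be sent to a search engine."""
    return json.dumps(asdict(doc), sort_keys=True)


doc = CiTestResult(
    timestamp="2020-07-14T11:36:54",
    branch="master",  # assumed, the XML report itself carries no branch
    classname="co.elastic.community.AdminServiceTests",
    test_method="testDefaultAdminService()",
    success=False,
    duration_seconds=0.021,
)
print(to_json(doc))
```

Each searchable dimension from the requirements maps to one flat field, so filters never need to parse log text.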
Slide 28
Ask your data
Did this test fail earlier this month?
Slide 29
Ask your data
Show me all failed test runs of this class in this branch in the last month
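One way this question could translate into an Elasticsearch query body, sketched as a Python dict. The field names (`classname`, `branch`, `success`, `timestamp`) are assumptions about how the test results were indexed, with the string fields mapped as keywords; `now-1M` is Elasticsearch date math for "one month ago".

```python
import json


def failed_runs_query(classname: str, branch: str, since: str = "now-1M") -> dict:
    """Build a search body for failed runs of one class on one branch."""
    return {
        "query": {
            "bool": {
                # filter clauses do not score, they only include or exclude
                "filter": [
                    {"term": {"classname": classname}},
                    {"term": {"branch": branch}},
                    {"term": {"success": False}},
                    {"range": {"timestamp": {"gte": since}}},
                ]
            }
        },
        # newest failures first
        "sort": [{"timestamp": {"order": "desc"}}],
    }


body = failed_runs_query("co.elastic.community.AdminServiceTests", "master")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent to the `_search` endpoint of whichever index holds the test results.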
Slide 30
Ask your data
Is this test failing only under this OS?
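A hedged sketch of how a terms aggregation could answer this: count the failures of one test per operating system. The field names (`classname`, `test_method`, `success`, `os`) are assumed, and the response below is hand-written in the shape of a terms aggregation result, not real data.

```python
def failures_by_os_query(classname: str, test_method: str) -> dict:
    """Search body that buckets failures of one test by operating system."""
    return {
        "size": 0,  # only the aggregation is needed, no individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"classname": classname}},
                    {"term": {"test_method": test_method}},
                    {"term": {"success": False}},
                ]
            }
        },
        "aggs": {"by_os": {"terms": {"field": "os"}}},
    }


def failure_counts(response: dict) -> dict:
    """Turn the aggregation buckets into an {os: failure_count} mapping."""
    buckets = response["aggregations"]["by_os"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written example response, for illustration only.
fake_response = {
    "aggregations": {"by_os": {"buckets": [{"key": "windows-2019", "doc_count": 7}]}}
}
print(failure_counts(fake_response))
```

A single bucket only shows where the failures happened. To conclude the test fails only under that OS, the same aggregation over successful runs is needed to confirm the test ran elsewhere.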
Slide 31
Ask your data
Is our test count increasing compared to our SLOC count?
Slide 32
Ask your data
Are our successful test runs decreasing since we doubled the team size?
Slide 33
Ask your data
Do holidays/OOO hours have an impact on our CI? Can we reduce costs?
Slide 34
crystal spec
Slide 35
Structure test output
- xUnit
- Test Anything Protocol
Slide 36
TAP
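For readers unfamiliar with the format: TAP output is a plan line (`1..N`) followed by one `ok` / `not ok` line per test. The sketch below parses that core subset only and ignores directives such as `# SKIP` and diagnostics. The sample input is made up for illustration, reusing test names from the spec suite in this deck.

```python
import re

PLAN = re.compile(r"^1\.\.(\d+)$")
RESULT = re.compile(r"^(not ok|ok)\b(?:\s+(\d+))?(?:\s+-)?\s*(.*)$")

# Made-up TAP output, not produced by a real run.
SAMPLE = """1..3
ok 1 - CustomLogHandler should not log access to '/' endpoint
not ok 2 - CustomLogHandler should log access to any other endpoint
ok 3 - Objects Event can retrieve region
"""


def parse_tap(text: str) -> dict:
    """Parse the plan line and the ok/not ok lines of a TAP stream."""
    planned = None
    results = []
    for raw in text.splitlines():
        line = raw.strip()
        plan = PLAN.match(line)
        if plan:
            planned = int(plan.group(1))
            continue
        match = RESULT.match(line)
        if match:
            status, number, description = match.groups()
            results.append({
                # test numbers are optional in TAP, fall back to position
                "number": int(number) if number else len(results) + 1,
                "ok": status == "ok",
                "description": description,
            })
    return {"planned": planned, "results": results}


parsed = parse_tap(SAMPLE)
failed = [r["description"] for r in parsed["results"] if not r["ok"]]
print(parsed["planned"], failed)
```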
Slide 37
xUnit output
<?xml version="1.0"?>
<testsuite tests="62" skipped="0" errors="0" failures="0" time="0.261869582" timestamp="2020-07-21T12:43:50Z" hostname="rhincodon">
  <testcase file="/Users/alr/devel/elastic/community/slack-command-community/spec/custom_log_handler_spec.cr" classname="spec.custom_log_handler_spec" name="CustomLogHandler should not log access to '/' endpoint" time="3.6394e-5"/>
  <testcase file="/Users/alr/devel/elastic/community/slack-command-community/spec/custom_log_handler_spec.cr" classname="spec.custom_log_handler_spec" name="CustomLogHandler should log access to any other endpoint" time="0.000250577"/>
  <testcase file="/Users/alr/devel/elastic/community/slack-command-community/spec/objects_spec.cr" classname="spec.objects_spec" name="Objects Event can retrieve html_url" time="7.589e-6"/>
  <testcase file="/Users/alr/devel/elastic/community/slack-command-community/spec/objects_spec.cr" classname="spec.objects_spec" name="Objects Event can retrieve region" time="7.438e-6"/>
Slide 38
That’s it - right?
- Map standard output/error to tests
- Map logger output to tests
- Add run specific meta data (branch, OS, JVM)
- Indexing/Storing strategy
- Preaggregate data (test count, success/failure, failed tests)
- Code coverage metrics
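Two of the steps above, adding run-specific metadata and preaggregating, can be sketched together as one summary document per CI run. The function name, field names and sample values are assumptions for illustration.

```python
def summarize_run(results: list, metadata: dict) -> dict:
    """Build one preaggregated summary document for a whole test run."""
    failed = [r["name"] for r in results if not r["success"]]
    summary = dict(metadata)  # copy so the caller's dict is not modified
    summary.update({
        "test_count": len(results),
        "failure_count": len(failed),
        "success": not failed,   # the run succeeds only if nothing failed
        "failed_tests": failed,
    })
    return summary


# Illustrative per-test results and run metadata.
results = [
    {"name": "CustomLogHandler should not log access to '/' endpoint", "success": True},
    {"name": "CustomLogHandler should log access to any other endpoint", "success": False},
]
metadata = {"branch": "master", "os": "macOS"}

print(summarize_run(results, metadata))
```

Storing such a summary next to the per-test documents keeps long-term trend queries (test count, failure rate) cheap, because they read one small document per run.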
Slide 39
Index data into Elasticsearch
- Massage output data into proper JSON
- Logstash / xml2json / self-written tool?
- Integration with CI
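If the self-written route is chosen, the documents can be sent in one request through the Elasticsearch bulk API, which expects newline-delimited JSON: one action line followed by one document line, with a trailing newline. The sketch below only builds that request body; the index name `ci-test-results` and the document fields are assumptions.

```python
import json


def bulk_body(index: str, docs: list) -> str:
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document line
    # the bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


docs = [
    {"classname": "spec.objects_spec", "name": "Objects Event can retrieve region", "success": True},
    {"classname": "spec.custom_log_handler_spec", "name": "CustomLogHandler should log access to any other endpoint", "success": False},
]
print(bulk_body("ci-test-results", docs), end="")
```

The body would be sent as a POST to `/_bulk` with the `Content-Type: application/x-ndjson` header, for example as the last step of a CI job.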
Java Output (test failure)
<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="co.elastic.community.AdminServiceTests" tests="2" skipped="0" failures="1" errors="0" timestamp="2020-07-14T11:36:54" hostname="rhincodon" time="0.022">
  <properties/>
  <testcase name="testDefaultAdminService()" classname="co.elastic.community.AdminServiceTests" time="0.021">
    <failure message="org.opentest4j.AssertionFailedError: Expecting: &lt;true&gt; to be equal to: &lt;false&gt; but was not." type="org.opentest4j.AssertionFailedError">org.opentest4j.AssertionFailedError: Expecting: &lt;true&gt; to be equal to: &lt;false&gt; but was not.
      at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
      at co.elastic.community.AdminServiceTests.testDefaultAdminService(AdminServiceTests.java:13)
      …
Slide 42
Crystal Output (single XML file)
<?xml version="1.0"?>
<testsuite tests="62" skipped="0" errors="0" failures="1" time="0.258457318" timestamp="2020-07-14T11:26:23Z" hostname="rhincodon">
  <testcase file="/Users/alr/devel/elastic/community/slack-command-community/spec/custom_log_handler_spec.cr" classname="spec.custom_log_handler_spec" name="CustomLogHandler should not log access to '/' endpoint" time="3.1523e-5"/>
  <testcase file="/Users/alr/devel/elastic/community/slack-command-community/spec/custom_log_handler_spec.cr" classname="spec.custom_log_handler_spec" name="CustomLogHandler should log access to any other endpoint" time="0.000282161">
    <failure message="Expected: 2 got: 1">../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/spec/methods.cr:76:5 in 'fail'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/spec/expectations.cr:447:9 in 'should'
      spec/custom_log_handler_spec.cr:33:5 in '->'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/primitives.cr:255:3 in 'internal_run'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/spec/example.cr:33:16 in 'run'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/spec/context.cr:18:23 in 'internal_run'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/spec/context.cr:330:7 in 'run'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/spec/context.cr:18:23 in 'internal_run'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/spec/context.cr:147:7 in 'run'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/spec/dsl.cr:270:7 in '->'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/primitives.cr:255:3 in 'run'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/crystal/main.cr:45:14 in 'main'
      ../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/crystal/main.cr:114:3 in 'main'</failure>
  </testcase>
  …
</testsuite>
Slide 43
Demo
- Elastic Build Stats
- Elasticsearch CI
Slide 44
Slide 45
Slide 46
Slide 47
Summary
- Analytics use-case
- Not needed for operational tasks
- ML: Time series anomaly detection (network outage)
- Test triage!
Slide 48
Test triage
- Daily single person owner for test failures
- Check failure
- Assign team
- Disable tests to stabilize
Slide 49
The future[tm]
- Delivery pipeline as a service (K8s: Tekton)
- Much more automated canary deployments
- Smarter test execution (run the test cases affected by a code change first)
- More specialized roles around CI/CD
- Analytics built into pipelines
Slide 50
Thanks for listening
Q&A
Alexander Reelsen, Community Advocate
alex@elastic.co | @spinscale
Slide 51
Elastic Cloud
Slide 52
Elastic Support Subscriptions
Slide 53
Getting more help
Slide 54
Discuss Forum https://discuss.elastic.co
Slide 55
Community & Meetups https://community.elastic.co
Slide 56
Official Elastic Training https://training.elastic.co