Reliable machine learning

An engineering perspective on ML software

Arseny Kravchenko

Senior ML Engineer
Worked at Instrumental ⚙️, WANNABY 👞, Gett 🚕, Yandex 🔎, Wargaming 🎮

Contact me:

Not that cool story

Two years ago I introduced a bug in a logger (!) that led to an outage on several assembly lines and a potential loss of $100k+.

Better testing practices could have prevented it.

What we will talk about

  • Non-ML specific tests
  • ML specific tests
  • Best practices for testing
  • Runtime checks
  • Software Engineer in Test mindset

What we will NOT talk about

  • ML model evaluation
  • ML metrics
  • Validation
  • Manual testing

ML and Software relationship

ML is built on top of software, so your projects should be tested from both perspectives.

Defensive programming

“The idea is based on defensive driving. In defensive driving, you adopt the mind-set that you're never sure what the other drivers are going to do. That way, you make sure that if they do something dangerous you won't be hurt. You take responsibility for protecting yourself even when it might be the other driver's fault. In defensive programming, the main idea is that if a routine is passed bad data, it won't be hurt, even if the bad data is another routine's fault.”

Steve McConnell. “Code Complete”

Reliable software requires (automatic) testing

Types of testing:

  • UI tests
  • Integration tests
  • End-to-end tests
  • Unit tests
  • Mutation tests
  • Many more

What are we avoiding with tests?

  • Explicit code errors (mostly for complicated logic)
  • Unintended changes in logic (e.g. during refactoring)

Reliable software requires different kinds of tests

Write tests with different granularity.
The more high-level you get, the fewer tests you should have.


Example: microservice for oriented object detection

POST /detect

@app.route('/detect', methods=['POST'])
def process_image(request):
    image = storage.get_image(request.get('image_path'))
    coords, angle = detector(image)
    cropped = crop_image(image, coords)
    rotated = rotate_image(cropped, angle)
    result_path = storage.save_image(rotated)
    return {"success": True, "result": result_path}

Example: microservice for oriented object detection

class ImageStorage:
    @abstractmethod
    def get_image(self, path: str) -> np.ndarray:
        pass

    @abstractmethod
    def save_image(self, image: np.ndarray) -> str:
        pass

Example: microservice for oriented object detection

class Detector:
    @abstractmethod
    def __call__(self, img: np.ndarray) -> Tuple[Tuple[int, int, int, int], float]:
        """
        Returns object coords (x, y, w, h) and angle in degrees.
        """
        pass

Example: microservice for oriented object detection

def crop_image(img: np.ndarray, coords: Tuple[int, int, int, int]) -> np.ndarray:
    pass

def rotate_image(img: np.ndarray, angle: float) -> np.ndarray:
    pass

Unit tests

Fast, simple, isolated tests for a single method/function.
Required tools: pytest

Unit tests

def test_crop_image():
    # generate an image
    img = np.zeros((100, 100, 3))
    img[10:20, 10:20, :] = 1
    
    cropped = crop_image(img, (10, 10, 10, 10))
    assert cropped.mean() == 1

Unit tests

Should be as small and atomic as possible

What's wrong with this example?

def test_detect_rotated_image():
    img = cv2.imread('fixtures/car.jpg')

    for angle in (30, 45):
        rotated = rotate_image(img, angle=angle)
        detector = Detector()
        coords, detected_angle = detector(rotated)
        assert detected_angle == angle

Unit tests

Should be as small and atomic as possible

See @parametrize in pytest

@pytest.mark.parametrize('angle', [30, 45])
def test_detect_rotated_image(angle):
    img = cv2.imread('fixtures/car.jpg')

    rotated = rotate_image(img, angle=angle)
    detector = Detector()
    coords, detected_angle = detector(rotated)
    assert detected_angle == angle

Fixtures

Fixtures are fixed inputs and expected outputs used in tests

def test_crop_image_with_fixture():
    img = cv2.imread('fixtures/full_img.jpg')
    cropped = crop_image(img, (10, 10, 10, 10))
    # np.testing.assert_equal raises on mismatch, so no extra assert is needed
    np.testing.assert_equal(cropped, cv2.imread('fixtures/cropped_img.jpg'))

Integration tests

Slower, affect several components, require more maintenance work
Required tools: pytest (may require unittest for mocks, patches etc.)

Integration tests

def test_save_image(client):
    request_data = {'image_path': '/path/to/image.jpg'}
    response = client.post('/detect', request_data)
    assert response['success'] is True
    assert response['result'].endswith('.jpg')

Mocks and patches

from unittest import mock

def get_angle():
    return 42

new_mock = mock.Mock(return_value=0)
with mock.patch('__main__.get_angle', new_mock):
    print(get_angle())
    print(new_mock.call_count)

Mocks and patches

In [1]: from unittest import mock
   ...:
   ...: def get_angle():
   ...:     return 42
   ...:
   ...: new_mock = mock.Mock(return_value=0)
   ...: with mock.patch('__main__.get_angle', new_mock):
   ...:     print(get_angle())
   ...:     print(new_mock.call_count)
   ...:
0
1

Mocks and patches

Useful for integration-like tests; for example, one can mock (see the sketch below):

  • an external service;
  • a long-running function (e.g. a deep learning model!);
  • messy IO operations
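
A minimal sketch of such a test, assuming the microservice from the earlier slides lives in a module named app (the module path, the client fixture, and the mocked return value are illustrative assumptions):

from unittest import mock

def test_detect_endpoint_with_mocked_detector(client):
    # replace the heavy deep learning detector with a fast, deterministic stub
    fake_detector = mock.Mock(return_value=((10, 10, 50, 50), 30.0))
    with mock.patch('app.detector', fake_detector):
        response = client.post('/detect', {'image_path': '/path/to/image.jpg'})

    assert response['success'] is True
    assert fake_detector.call_count == 1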

More complicated pytest stuff

You can check that an exception is raised and even make sure the proper logs are written.

See 5 Advanced Pytest Tricks

More complicated pytest stuff

def test_rotate_with_incorrect_parameters():
    with pytest.raises(RotationParametersError):
        rotate_image(image, angle='OMG NOT A NUMBER')

See negative tests later in slides.

More complicated pytest stuff

def test_some_logs(caplog):
    do_magic()
    pattern = 'Doing magic'
    logs = [x for x in caplog.messages if pattern in x]
    assert logs == ['Doing magic for the greater good!']

Warning: most likely, the desire to build such a test means the software design is far from perfect.

I have a legacy code and no tests, what do I do?

  • Don’t panic 🧘
  • Gradually improve the codebase
  • Start with higher-level smoke tests and add new, lower-level tests
  • Always run your tests in CI. Tests that are not runnable on a regular basis tend to rot even faster than the main codebase.

Smoke tests

Just scratch the surface with very high-level integration tests.
For scripting languages like Python, it can be a sanity check comparable to “it compiles!”.
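
A minimal smoke test sketch, assuming the service code lives in a hypothetical package named app: simply importing the modules already catches syntax errors and broken imports.

def test_smoke_imports():
    # for Python, a successful import is roughly "it compiles!"
    import app
    import app.detector
    import app.storage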

Exotics: mutation tests

“Who will watch the watcher?”
Change the codebase (“mutate”) in a random way (e.g. replace + with -)
Run your tests
If tests are still ✅, it may be an indicator that tests are not great

Mutation Testing with Python
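
A conceptual sketch of what a mutation testing tool does (the functions here are purely illustrative):

def total_angle(a, b):
    return a + b      # original code

def total_angle_mutant(a, b):
    return a - b      # mutated code: + replaced with -

# This weak test passes for both the original and the mutant (b == 0 hides the
# change) — exactly the kind of gap mutation testing is designed to expose.
def test_total_angle_weak():
    assert total_angle(42, 0) == 42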

Exotics: property-based tests

Instead of testing particular values, let’s test the behavior (function properties).

Tests with hardcoded values => tests with randomized inputs => property-based tests.

Hypothesis is a Python library for property-based testing

📹 Property-based testing (talk in Russian)
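
A sketch of a property-based test with Hypothesis, assuming the rotate_image helper from the earlier slides: instead of hardcoding angles, we let Hypothesis generate them and check properties that should hold for any angle.

import numpy as np
from hypothesis import given, strategies as st

@given(st.floats(min_value=-180, max_value=180, allow_nan=False))
def test_rotate_preserves_dtype_and_black_image(angle):
    img = np.zeros((64, 64, 3), dtype=np.uint8)
    rotated = rotate_image(img, angle=angle)
    # properties: dtype is preserved, and an all-black image stays all-black
    assert rotated.dtype == img.dtype
    assert rotated.sum() == 0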

Best practices: coverage

Coverage is a metric reflecting what % of the codebase (usually line-wise) is exercised when the full test suite is run.

test coverage = (number of lines covered by tests) / (total number of lines of code)

While coverage seems like a good metric to start with, it's not perfect: one can easily build a test that covers the full codebase but doesn't actually check the result.
Thus coverage is useful as an informative metric, but not that useful for automatically deciding whether a code change is tested well enough.

Best practices: add tests when bug happens

New tests should be introduced after a defect has been found.
Each new production defect highlights a gap in the existing tests, so we add a test to verify that the defect is fixed and will not happen again.

Best practices: CI

If your tests are not constantly rerun (usually on CI like GitHub Actions, CircleCI, etc.), they rot quickly as the code changes.

Positive and negative tests

While positive tests check that everything works in the optimistic scenario, negative tests check various (mostly graceful) expected failures.
The more experienced the engineer, the more negative tests they usually write.
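
A short sketch of a positive/negative pair for the rotate_image helper from the earlier slides (RotationParametersError is the exception used in a previous example):

import numpy as np
import pytest

def test_rotate_image_positive():
    # optimistic scenario: valid input rotates without errors
    img = np.zeros((10, 10, 3), dtype=np.uint8)
    assert rotate_image(img, angle=90.0).shape[2] == 3

def test_rotate_image_negative():
    # dirty scenario: bad input fails loudly and predictably
    img = np.zeros((10, 10, 3), dtype=np.uint8)
    with pytest.raises(RotationParametersError):
        rotate_image(img, angle='not a number')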

Positive and negative tests

“Developer tests tend to be "clean tests". Developers tend to test for whether the code works (clean tests) rather than test for all the ways the code breaks (dirty tests). Immature testing organizations tend to have about five clean tests for every dirty test. Mature testing organizations tend to have five dirty tests for every clean test. This ratio is not reversed by reducing the clean tests; it's done by creating 25 times as many dirty tests (Boris Beizer in Johnson 1994).”

Steve McConnell. “Code Complete”.

Flaky tests

Sometimes tests fail or pass from run to run without any code changes. These tests are called flaky.
Popular reasons:

  • Using randomness without a fixed seed;
  • Race conditions;
  • Incorrect design of IO operations with temporary data

Flaky tests

def test_randomly_cropped_image_with_fixture():
    img = cv2.imread('fixtures/big_img.jpg')
    detector = Detector()
    crop = random_crop(img, size=(224, 224))
    coords, angle = detector(crop)
    assert angle == 42    

Flaky tests

@pytest.mark.parametrize('angle', [0, 42])
def test_detect_image_from_file(angle: int):
    img = get_image_with_rotated_object(angle)
    image_path = '/tmp/image.png'
    cv2.imwrite(image_path, img)
    request = {'image_path': image_path}
    coords, detected_angle = detect_from_path(request)
    assert detected_angle == angle
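
A sketch of a less flaky variant of the test above, under the same assumed helpers: fix the random seed and use pytest's tmp_path fixture instead of a shared hardcoded /tmp path.

@pytest.mark.parametrize('angle', [0, 42])
def test_detect_image_from_file_stable(angle: int, tmp_path):
    np.random.seed(0)  # helps only if the helpers rely on numpy randomness
    img = get_image_with_rotated_object(angle)
    image_path = str(tmp_path / 'image.png')  # unique temporary directory per test
    cv2.imwrite(image_path, img)
    coords, detected_angle = detect_from_path({'image_path': image_path})
    assert detected_angle == angle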

ML specific software quality

In "regular" software, bugs tend to be more explicit.
In the ML world, defects are easier to hide.

Is there a bug?

def extract_angle_from_warp_matrix(rotation_matrix: np.ndarray) -> float:
    # Extracts rotation angle (in degrees) from warp matrix
    return np.rad2deg(np.arctan2(-rotation_matrix[1, 0], rotation_matrix[1, 1]))

ML specific common bugs

  • Different preprocessing or feature engineering for train/test or research/production components;
  • Numerical problems (e.g. overflow or nan after division by zero);
  • Incorrect axis and unexpected broadcasting;
  • Image processing: mixing up x and y (or width and height);
  • Mismatch between libraries.

See also: 8 Deep Learning / Computer Vision Bugs And How I Could Have Avoided Them

Where is the bug?

from skimage.transform import AffineTransform
from skimage.data import astronaut

import cv2 
import numpy as np

image = astronaut()[:300, :, :]
transform = AffineTransform(rotation=np.deg2rad(45))

width, height, channels = image.shape
aligned = cv2.warpPerspective(image, transform.params, dsize=(width, height))

assert aligned.shape == image.shape, f"{aligned.shape} != {image.shape}"

Where is the bug?

from skimage.transform import AffineTransform
from skimage.data import astronaut

import cv2 
import numpy as np

image = astronaut()[:300, :, :]
transform = AffineTransform(rotation=np.deg2rad(45))

width, height, channels = image.shape
aligned = cv2.warpPerspective(image, transform.params, dsize=(width, height))

assert aligned.shape == image.shape, f"{aligned.shape} != {image.shape}"

AssertionError: (512, 300, 3) != (300, 512, 3)

ML specific: consistency tests

In data preprocessing, we often use functions to convert things back and forth.

E.g. np.rad2deg and np.deg2rad.

Thus we may require consistency like
x = inverse_fn(fn(x))
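
A sketch of a consistency (round-trip) test for this pair of functions:

import numpy as np
import pytest

@pytest.mark.parametrize('angle', [0.0, 30.0, 42.5, 180.0])
def test_deg_rad_round_trip(angle):
    # converting back and forth should return (approximately) the original value
    assert np.rad2deg(np.deg2rad(angle)) == pytest.approx(angle)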

ML specific: invariance tests

One often needs a function to be invariant to input changes. E.g. we expect our detector to keep working after we apply gamma correction to the image.
f(x) ≈ f(aug(x)) (up to some eps)
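
A sketch of such a test, assuming the Detector from the earlier slides; the detector and image_fixture fixtures and the eps value are illustrative assumptions.

def test_detector_invariant_to_gamma(detector, image_fixture):
    img = cv2.imread(image_fixture)
    # apply a mild gamma correction as the augmentation
    gamma_corrected = np.clip(((img / 255.0) ** 0.8) * 255.0, 0, 255).astype(np.uint8)

    _, angle = detector(img)
    _, angle_aug = detector(gamma_corrected)

    assert abs(angle - angle_aug) < 1.0  # eps chosen for illustration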

ML specific: invariance tests

For an image recognition model, we might expect the model to be invariant to:

  • image rotation,
  • partial occlusion,
  • perspective shift,
  • lighting conditions,
  • weather artifacts (rain, snow, fog),
  • camera artifacts (ISO noise, motion blur),
  • ...

ML specific: invariance tests

For a sentiment analysis model on the following two sentences:

  • Mark was a great instructor.
  • Samantha was a great instructor.

We would expect that simply changing the name of the subject doesn't affect the model predictions.

ML specific: negation tests

We expect some changes to invert the result.

  • “I like the show” => positive sentiment
  • “I don’t like the show” => negative sentiment.

Thus we can artificially add negation to a subset of samples and make sure the labels are inverted as well.
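
A minimal sketch, assuming a hypothetical sentiment_model fixture with a predict method that returns a label:

def test_negation_flips_sentiment(sentiment_model):
    # adding a negation should invert the predicted label
    assert sentiment_model.predict("I like the show") == "positive"
    assert sentiment_model.predict("I don't like the show") == "negative"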

ML specific: compare your environments

For example, your model runs on mobile and in the cloud using different engines: coreml and onnxruntime.

It would be great to make sure both pipelines produce equal results:

def test_pipelines_match(image_fixture):
    image = cv2.imread(image_fixture)
    coreml_model = CoreMLModel()
    onnx_model = OnnxModel()
    coreml_bbox, coreml_angle = coreml_model.predict(image)
    onnx_bbox, onnx_angle = onnx_model.predict(image)

    np.testing.assert_allclose(coreml_bbox, onnx_bbox, rtol=0, atol=1e-4)
    np.testing.assert_allclose(coreml_angle, onnx_angle, rtol=0, atol=1e-4)

ML specific: unit tests with fixtures

We may want an exact or approximately equal result in different scenarios.

from sklearn.datasets import load_iris

def test_exact_values():
    data = load_iris()
    model = train_model(data)

    # probability of the first class for the first sample
    score = model.predict_proba(data.data[:1])[0, 0]
    assert score == 0.42

It validates that we didn't change train_model unintentionally.

ML specific: directional expectation tests

Allow us to define a set of perturbations to the input which should have a predictable effect on the model output.

For example, if we had a housing price prediction model, we might assert (sketched after this list):

  • Increasing the number of bathrooms (holding all other features constant) should not cause a drop in price.
  • Lowering the square footage of the house (holding all other features constant) should not cause an increase in price.
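
A sketch of such directional tests; the price_model and base_house fixtures and the feature names are illustrative assumptions.

def test_price_directional_expectations(price_model, base_house):
    # perturb one feature at a time, holding the others constant
    more_bathrooms = {**base_house, 'bathrooms': base_house['bathrooms'] + 1}
    smaller_house = {**base_house, 'sqft': base_house['sqft'] * 0.8}

    base_price = price_model.predict(base_house)
    assert price_model.predict(more_bathrooms) >= base_price
    assert price_model.predict(smaller_house) <= base_price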

ML specific: shallow network as fixture

Instead of using a big network, one can have a dummy replacement with similar properties (e.g. input and output dimensions) and use it in integration tests.

E.g. this dummy Resnet50 backbone takes a 224x224x3 image as input and returns a 1x2048 feature vector as output.

class DummyResnet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Sequential(torch.nn.AdaptiveAvgPool2d(1),
                                         torch.nn.Conv2d(3, 2048, 1))

    def forward(self, x):
        return self.model(x).squeeze(-1).squeeze(-1)
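
A quick sanity check sketch that pins the shape contract of the dummy backbone:

import torch

def test_dummy_resnet_output_shape():
    dummy = DummyResnet()
    features = dummy(torch.zeros(1, 3, 224, 224))
    assert features.shape == (1, 2048)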

ML specific: test end-to-end pipeline

import yaml

from segmentation import train

def test_segmentation_end2end(some_params):
    # this test doesn't check values much; it checks that the pipeline runs smoothly on a micro batch
    with open(get_relative_path(__file__, '../config.yaml')) as fd:
        config = yaml.load(fd, Loader=yaml.SafeLoader)

    config['datasets_base_dir'] = get_relative_path(__file__, 'fixtures/')
    config['max_train_samples'] = 2
    config['max_val_samples'] = 1
    config['epochs'] = 5

    result = train(config)
    assert result['train_loss'] < .5
    assert result['val_loss'] < 1

Samples as tests

See Andrej Karpathy’s talk about using specific hard samples as tests for your ML models (e.g. make sure your detector doesn't miss a partially occluded object).

Assertions and runtime checks

While assertions are great for testing, one should prefer conditional checks over assertions at runtime.

Why?

  • Assertions are stripped under optimization (python -O script.py) and by some exotic interpreters
  • Explicit checks allow verbose error messages
  • The Zen of Python: explicit is better than implicit

Assertions and runtime checks

if input_data.shape != (100, 100, 3):
    logger.error(f"Got data shaped {input_data.shape} as input, expected (100, 100, 3)")
    raise SpecificRuntimeError("Some proper message")

or

assert input_data.shape == (100, 100, 3) 

Assertions and runtime checks

assert 1 == 0
print("OK!")
➜  /tmp python -O check.py
OK!
➜  /tmp python check.py
Traceback (most recent call last):
  File "check.py", line 1, in <module>
    assert 1 == 0
AssertionError

What should we check in runtime?

  • Input data format (images: number of channels, BCHW vs BHWC vs HWC …)
  • Input data type (e.g. float32 vs uint8)
  • Input data range (e.g. 0..255 or 0..1 for images)
  • Any other domain-specific constants and invariants (see the sketch below)
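
A sketch of such runtime input validation; the exact constraints and the error type are illustrative assumptions.

import numpy as np

def validate_image(image: np.ndarray) -> None:
    # format: HWC with 3 channels
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError(f"Expected an HWC image with 3 channels, got shape {image.shape}")
    # dtype: uint8 (which also implies the 0..255 value range)
    if image.dtype != np.uint8:
        raise ValueError(f"Expected a uint8 image, got {image.dtype}")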

What should we check in runtime?

  • Some property-like checks: e.g. an image rotation function may check that the rotation matrix determinant equals 1
def rotate_image(image, rotation_matrix):
    det = np.linalg.det(rotation_matrix)
    if np.abs(det - 1) > 1e-6:
        raise ValueError(f'Rotation matrix {rotation_matrix.tolist()} is scaled')
    ...

Various languages require various levels of runtime checks

Strong static typing catches many errors that can be missed in a dynamically typed environment.

Python type hinting can slightly reduce the need for runtime checks.

Side effect: improves code understanding for other engineers.

Various languages require various levels of runtime checks

➜  /tmp cat type_error.py
from typing import List

def sum_odds(numbers: List[int]):
    return sum([x for x in numbers if x % 2])

print(sum_odds([1, 2., 3, 4]))
➜  /tmp python type_error.py
4
➜  /tmp mypy type_error.py
type_error.py:6: error: List item 1 has incompatible type "float"; expected "int"
Found 1 error in 1 file (checked 1 source file)

Robustness vs correctness

“As the video game and x-ray examples show us, the style of error processing that is most appropriate depends on the kind of software the error occurs in. These examples also illustrate that error processing generally favors more correctness or more robustness. Developers tend to use these terms informally, but, strictly speaking, these terms are at opposite ends of the scale from each other. Correctness means never returning an inaccurate result; returning no result is better than returning an inaccurate result. Robustness means always trying to do something that will allow the software to keep operating, even if that leads to results that are inaccurate sometimes.”

Steve McConnell. “Code Complete”.

Robustness vs correctness

“Safety-critical applications tend to favor correctness to robustness. It is better to return no result than to return a wrong result. The radiation machine is a good example of this principle.
Consumer applications tend to favor robustness to correctness. Any result whatsoever is usually better than the software shutting down. The word processor I'm using occasionally displays a fraction of a line of text at the bottom of the screen. If it detects that condition, do I want the word processor to shut down?”

Steve McConnell. “Code Complete”.

Debugging

Your code will inevitably be defective sometimes and will require maintenance. So be ready for debugging and defect investigation.

Debugging: mental exercise

Imagine you're training a deep network on a big dataset. The training takes 2 hours per epoch.
Every run fails with Out Of Memory after 40–60 epochs.

How do you solve it?

Debugging: way to go

  1. Gather related data (logs, problematic samples...)
  2. Try to reproduce the problem locally
  3. Do a binary search to localize the problem
  4. Iterate over hypotheses about what could go wrong

Logging

Logging is one part of a monitoring strategy. Good monitoring enables you to:

  1. Be alerted when things break
  2. Learn what's broken and why
  3. Inspect trends over long time frames
  4. Compare system behavior across different versions and/or experimental groups (e.g. AB testing)

Logging

  • If you're not sure whether you should log or not, log

You can always remove logs that you later realize are unnecessary, but you can't retroactively add them back.

  • Don't log if it doesn't add new context

So try to inject specific details into the log message.

❌ logger.info("Rotated the image")
✅ logger.info(f"Rotated the image loaded from {image_path} for angle {angle}")

Monitoring

In regular software engineering, one tends to monitor that the software is working: no errors, good response times, etc.

In ML engineering, one should also monitor the quality of the models and pipelines.

Monitoring

Regular software (say, a CRM system) rarely breaks without code changes or significant input data changes.

ML software can be really sensitive to minor distribution changes (seasonality, trends, new cameras and microphones for visual/audio data...)

Monitoring

Two conclusions:

  1. ML engineers should monitor input/output data distributions, model confidence, etc. (a simple sketch follows below).
  2. ML engineers should prefer robust (strongly regularized) models to overfitted ones when possible.

See also: Best Practices for Dealing with Concept Drift
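
A minimal drift-monitoring sketch for point 1; the feature statistics, the threshold, and the logger setup are illustrative assumptions.

import numpy as np

def check_feature_drift(live_values: np.ndarray, train_mean: float,
                        train_std: float, n_sigmas: float = 3.0) -> bool:
    # compare the live feature mean against training-time statistics
    drifted = abs(live_values.mean() - train_mean) > n_sigmas * train_std
    if drifted:
        logger.warning(f"Feature drift detected: live mean {live_values.mean():.3f} "
                       f"vs train mean {train_mean:.3f}")
    return drifted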

Links for further reading (generic software)

📃 The Practical Test Pyramid
📃 The other side of the coin, or on the drawbacks of unit testing (article in Russian)
📚 Steve McConnell, “Code Complete”

Links for further reading (ML specific)

📃 PyTest for Machine Learning — a simple example-based tutorial
📃 Effective testing for machine learning systems
📃 Machine Learning Testing: Survey, Landscapes and Horizons
📹 Unit Testing for Data Scientists
📃 Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
📃 TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing

Thanks!

Shameless self-promo

📚 Valerii Babushkin and I are writing the book "Principles of ML System Design".
More info: https://arseny.info/ml_design_book