https://martinfowler.com/articles/nonDeterminism.html
Eradicating Non-Determinism in Tests
non-deterministic tests - tests that sometimes pass and sometimes fail. Left uncontrolled, non-deterministic tests can completely destroy the value of an automated regression suite. In this article I outline how to deal with non-deterministic tests. Initially quarantine helps to reduce their damage to other tests, but you still have to fix them soon. Therefore I discuss treatments for the common causes for non-determinism: lack of isolation, asynchronous behavior, remote services, time, and resource leaks.
Footnotes
1: Yes, I know many advocates of TDD consider that a primary virtue of testing is the way it drives requirements and design. I agree that this is a big benefit, but I consider the regression suite to be the single biggest benefit that automated tests give us. Even without TDD, tests are worth the cost for that.
2: Sometimes, of course, a test failure is due to a change in what the code is supposed to do, but the test hasn't been updated to reflect the new behavior. This is essentially a bug in the tests, but is equally easy to fix if it's caught right away.
3: There is a useful role for non-deterministic tests. Tests seeded from a randomizer can help hunt out edge cases. Performance tests will always come back with different values. But these kinds of tests are quite different from automated regression tests, which are my focus here.
4: This works well for the Mingle team as they are skillful enough to find and fix non-deterministic tests quickly and disciplined enough to ensure they do it quickly. If your build remains broken for long due to your quarantine tests failing you will lose the value of continuous integration. So for most teams I'd advise keeping the quarantined tests out of the main pipeline.
5: There's no hard-and-fast definitions here, but I'm using the early Extreme Programming terminology of using "unit test" to mean something fine-grained and "functional test" as a test that's more end-to-end and feature related.
6: One trick is to create the initial database and copy it using file system commands before opening it for each test run. File system copies are often faster than loading data using the database commands.
7: Of course this trick only works when you can conduct the test without committing any transactions.
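The file-system copy trick from footnote 6 can be sketched in Python, assuming a SQLite database file (the fixture contents and function names are my own, for illustration): build the expensive fixture once, then hand each test run a cheap file-level copy.

```python
import os
import shutil
import sqlite3
import tempfile

def make_template_db(path):
    """Build the initial fixture once; this is the slow part we want to avoid repeating."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO parts (name) VALUES ('tire'), ('brake pad')")
    conn.commit()
    conn.close()

def fresh_db_for_test(template_path):
    """Copy the template with a file-system operation instead of re-running the SQL."""
    fd, test_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    shutil.copyfile(template_path, test_path)
    return test_path
```

Each test opens its own copy, so nothing a test does can leak into the template or into another test's database.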
8: Although you'll still need a timeout in case you never get a reply - and that timeout is subject to the same danger when you move to a different environment. Fortunately you can set that timeout to be pretty high, which minimizes the chances of that biting you.
9: In that case, however, the tests will run very slowly. You may want to consider aborting the whole test suite if you reach the wait limit.
10: If your asynchronous behavior is triggered from the UI, it's often a good UI choice to have some indicator to show an asynchronous operation is in progress. Having this be part of the UI also helps testing as the hooks required to stop this indicator can be the same hooks as detecting when to progress the test logic.
11: There are other advantages to using a test double in these circumstances, even if the remote system is deterministic. Often response time is too slow to use a remote system. If you can only talk to a live system, then your tests can generate significant, and unappreciated, load on that system.
12: You could reseed your datastore for each test based on the current time. But that's a lot of work, and fraught with potential timing errors.
13: In this case the clock stub is a common way to break isolation; each test that uses it should ensure it's properly re-initialized.
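A minimal clock-stub sketch in Python (the class and method names are hypothetical, not from the article); the important part is that tear-down restores the real clock so the stub cannot leak into the next test:

```python
import datetime

class Clock:
    """Indirection point: production code asks Clock.now() instead of datetime.datetime.now()."""
    _frozen = None

    @classmethod
    def now(cls):
        # Return the frozen time if a test set one, otherwise the real time.
        return cls._frozen or datetime.datetime.now()

    @classmethod
    def freeze(cls, fake_now):
        cls._frozen = fake_now

    @classmethod
    def reset(cls):
        """Call this in every test's tear-down, so the stub never leaks to the next test."""
        cls._frozen = None
```

A test would call `Clock.freeze(...)`, run its assertions, then call `Clock.reset()` in a `finally` block or tear-down hook.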
14: One of my colleagues likes to force a test run just before and after midnight in order to catch tests that use the current time and assume it's the same day an hour or two later. This is especially good at times like the last day of the month.
15: Although, of course, this isn't always a non-determinism bug, but one that's due to a change in environment. Depending on how close the clock ticks are to the id allocation, it could result in non-deterministic behavior.
Lack of Isolation
Asynchronous Behavior
Remote services
Time
Resource Leaks
Testing a game app (LibGDX). Game apps have some special characteristics.
https://github.com/libgdx/libgdx/issues/5995
Lack of Isolation
In order to get tests to run reliably, you must have clear control over the environment in which they run, so you have a well-known state at the beginning of the test. If one test creates some data in the database and leaves it lying around, it can corrupt the run of another test which may rely on a different database state.
Case study: a website selling auto parts, mainly tires and brakes.
We have DEV, Staging, and Live environments, plus the local machine of each developer (3 devs + 1 leader).
And we have some child/subsidiary sites: one for the Canadian market, one for tire-only sales, one for the Shopify channel, etc.
Only some of the main sites have Staging/Dev environments; the others exist only as a Live site or a Dev site.
And we have a blog too (WordPress, obviously) embedded in a PHP framework (a common but not top-popular one, and outdated: PHP 5 rather than 8, for example).
And we have two teams in two different countries, plus a Backend team (the real backend?) using .NET.
=> So each time we write a test, think about isolation.
APIs
Order
Orders via PayPal, PayFlow, Amazon, Affirm
Orders are stored in many tables: order, order_detail, order_history, order_log (maybe one table per payment method), and some methods log to the file system.
And a table stores the API response (from talking to the other Backend/Warehouse .NET system, for example).
Therefore I find it's really important to focus on keeping tests isolated. Properly isolated tests can be run in any sequence. As you get to larger operational scope of functional tests, it gets progressively harder to keep tests isolated. When you are tracking down a non-determinism, lack of isolation is a common and frustrating cause.
Keep your tests isolated from each other, so that execution of one test will not affect any others.
One trick that's handy when you're using databases, is to conduct your tests inside a transaction, and then to rollback the transaction at the end of the test. That way the transaction manager cleans up for you, reducing the chance of errors.
https://www.sqlshack.com/rollback-sql-rolling-back-transactions-via-the-rollback-sql-query/
- So if we have 3 INSERT queries in a row against 3 different tables, can we roll back all the inserted rows after the test is done?
Between these INSERTs there are SELECT queries that fetch the data just inserted; for example, we insert an order record, then query it to grab its order_id for order_history...
Do transactions work here? In other words, does the SELECT query see the data?
A transaction means an atomic update (?), ...
This sounds impossible, at least if we look at AUTO_INCREMENT: if a transaction can be rolled back, can AUTO_INCREMENT be rolled back too? That seems practical, but not logical.
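To answer the questions above: yes, within one uncommitted transaction your SELECTs do see the rows you just inserted, and a single ROLLBACK undoes all the inserts at once. AUTO_INCREMENT counters, however, are generally not rolled back (in MySQL/InnoDB the consumed ids are simply skipped), so tests must not assert on specific id values. A small sketch with Python's built-in sqlite3 module, using table names from the notes above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # sqlite3 implicitly opens a transaction before the first INSERT
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("CREATE TABLE order_history (id INTEGER PRIMARY KEY, order_id INTEGER, note TEXT)")
conn.commit()  # the fixture is committed; everything after this belongs to the test's transaction

cur = conn.cursor()
cur.execute("INSERT INTO orders (item) VALUES ('tire')")
order_id = cur.lastrowid  # usable inside the transaction, but don't assert on its value

# A SELECT inside the same, still-uncommitted transaction sees the new row.
seen = cur.execute("SELECT item FROM orders WHERE id = ?", (order_id,)).fetchone()

cur.execute("INSERT INTO order_history (order_id, note) VALUES (?, 'created')", (order_id,))

conn.rollback()  # tear-down: every insert vanishes, the committed schema stays
leftover = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the rollback, `leftover` is 0: the database is back in its well-known starting state for the next test.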
Another approach is to do a single build of a mostly-immutable starting fixture before running a group of tests. Then ensure that the tests don't change that initial state (or if they do, they reverse the changes in tear-down). This tactic is more error-prone than rebuilding the fixture for each test, but it may be worthwhile iff it takes too long to build the fixture each time.
Although databases are a common cause for isolation problems, there are plenty of times you can get these in-memory too. In particular, beware of static data and singletons. A good example for this kind of problem is contextual environment, such as the currently logged in user.
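The logged-in-user example can be sketched like this (the class and function names are hypothetical): a module-level singleton set by one test silently changes what the next test sees, so every test that touches it must reset it in tear-down.

```python
class Session:
    """Module-level contextual state: a classic in-memory isolation hazard."""
    current_user = None

def greeting():
    user = Session.current_user
    return f"Hello, {user}" if user else "Hello, guest"

# Test A logs someone in and forgets to clean up...
Session.current_user = "alice"
a = greeting()

# ...so test B, which assumes a fresh state, now sees Alice too.
b_without_reset = greeting()

# The fix: reset the singleton in tear-down so each test starts from a well-known state.
Session.current_user = None
b_with_reset = greeting()
```

Run in a different order, or in parallel, tests that share this singleton without resetting it will pass or fail non-deterministically.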
Some people prefer to put less emphasis on isolation and more on defining clear dependencies to force tests to run in a specified order. I prefer isolation because it gives you more flexibility in running subsets of tests and parallelizing tests.
https://github.com/libgdx/libgdx/issues/5995
It sucks, but I don't see unit testing catching on in game development (outside of situations where the logic can be tested independently) any time soon.
@tommyettinger You make a good point about not breaking backwards compatibility. I disagree with you on the rest of what you said. Testing a game presents all of the exact same challenges you face testing anything with a graphical user interface. I've unit tested games before. The scenarios you mentioned were exactly the kind of things I did test.
To make it backwards compatible, we just need to add a configuration option. We could call it startOnInitialize, which would default to true. Then you just have to set that to false in your tests.
https://gamesfromwithin.com/when-is-it-ok-not-to-tdd
Never use bare sleeps to wait for asynchronous responses: use a callback or polling.
// pseudo-code
makeAsyncCall
startTime = Time.now
while (!responseReceived) {
    if (Time.now - startTime > waitLimit)
        throw new TestTimeoutException
    sleep(pollingInterval)
}
readResponse
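The same loop as runnable Python (the function and exception names are mine, mirroring the pseudo-code):

```python
import time

class TestTimeoutException(Exception):
    pass

def await_response(response_received, wait_limit=5.0, polling_interval=0.05):
    """Poll response_received() until it returns True; fail the test after wait_limit seconds."""
    start = time.monotonic()
    while not response_received():
        if time.monotonic() - start > wait_limit:
            raise TestTimeoutException("no response within wait limit")
        time.sleep(polling_interval)

# Usage sketch: a fake async call that "responds" shortly after being made.
deadline = time.monotonic() + 0.2  # stands in for makeAsyncCall
await_response(lambda: time.monotonic() >= deadline)
```

The wait limit can be set generously high (footnote 8): it only bites when something is genuinely broken, while the polling interval keeps the happy path fast.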
No response: in some crawl scripts I have used a mechanism that looks like polling; I wait a bit of time after an unsuccessful call and repeat 2-3 times in a nested condition.