Yesterday I submitted my first pull request to the VOC project. The project aims to build a transpiler from Python bytecode to JVM bytecode.
While adding some extra test cases, I noticed that the test suite was ignoring the @expectedFailure decorator I had used to mark a test as expected to fail. I decided to investigate.
After implementing builtins.sum in Java, I wanted to add some extra test cases. One of them summed a list of integers and floating-point numbers. It all seemed so simple:
def test_sum_mix_floats_and_ints(self):
    self.assertCodeExecution("""
        print(sum([1, 1.414, 2, 3.14159]))
        """)
Little did I know:
FAIL: test_sum_mix_floats_and_ints (tests.builtins.test_sum.BuiltinSumFunctionTests)
...
TypeError: unsupported operand type(s) for +: 'float' and 'float'
It turned out that adding floats together was not yet implemented. Fair enough. I didn't want the test to go to waste, but I knew that, for the time being, it was normal for it to fail. Let's mark it as such.
from unittest import expectedFailure

...

    @expectedFailure
    def test_sum_mix_floats_and_ints(self):
        self.assertCodeExecution("""
            print(sum([1, 1.414, 2, 3.14159]))
            """)
And that should be it, right? We have told the system to expect a failure, and more importantly, we now expect it to shut up about it.
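For reference, this is how @expectedFailure behaves in a plain unittest setup (a standalone example of mine, nothing VOC-specific): a failing test decorated with it is reported as an expected failure instead of a FAIL.

import unittest

class PlainExpectedFailure(unittest.TestCase):
    @unittest.expectedFailure
    def test_not_ready(self):
        self.assertEqual(1 + 1, 3)   # wrong on purpose, so the test fails

if __name__ == '__main__':
    unittest.main(verbosity=2)
    # test_not_ready ... expected failure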
Re-running the test suite now gives us
FAIL: test_sum_mix_floats_and_ints (tests.builtins.test_sum.BuiltinSumFunctionTests)
...
TypeError: unsupported operand type(s) for +: 'float' and 'float'
Uhm. What? That was totally unexpected. Uncalled for, even. Why is the computer not obeying me? Is the world ending? Has the singularity finally occurred? (Probably yes, but the AI is smart enough to not let us know).
Putting all superstition aside, I went to look for a fix (besides removing the test method). In this same test class, there was already another mechanism for marking methods as expected failures (or not).
class BuiltinSumFunctionTests(BuiltinFunctionTestCase, TranspileTestCase):
    functions = ["sum"]

    not_implemented = [
        'test_bytearray',
        'test_bytes',
        'test_class',
        'test_complex',
        'test_dict',
        'test_frozenset',
        'test_set',
        'test_str',
    ]

    ...

    @expectedFailure  # + not defined on float/float yet.
    def test_sum_mix_floats_and_ints(self):
        self.assertCodeExecution("""
            print(sum([1, 1.414, 2, 3.14159]))
            """)
Adding test_sum_mix_floats_and_ints to the not_implemented list did the trick. The test was now reported as an 'expected failure' in the output. The world was right again.
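In other words, the fix amounted to one extra entry in the list (paraphrasing my edit):

    not_implemented = [
        'test_bytearray',
        'test_bytes',
        'test_class',
        'test_complex',
        'test_dict',
        'test_frozenset',
        'test_set',
        'test_str',
        'test_sum_mix_floats_and_ints',
    ]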
But a nagging feeling remained: why did "the only obvious way to do it" fail on me? It did not sit right with me, so I decided to investigate further. The not_implemented list gave me a good clue as to what I was looking for. Grepping around the code base pointed me to where I wanted to look: tests/utils.py contained the only reference to not_implemented outside of the lists themselves. It was in a method called run, which overrides unittest.TestCase.run.
class BuiltinFunctionTestCase:
    format = ''

    def run(self, result=None):
        # Override the run method to inject the "expectingFailure" marker
        # when the test case runs.
        for test_name in dir(self):
            if test_name.startswith('test_'):
                getattr(self, test_name).__dict__['__unittest_expecting_failure__'] = test_name in self.not_implemented
        return super().run(result=result)

    def assertBuiltinFunction(self, **kwargs):
        substitutions = kwargs.pop('substitutions')
        self.assertCodeExecution(
            """
            f = %(f)s
            x = %(x)s
            print(%(format)s%(operation)s)
            """ % kwargs,
            "Error running %(operation)s with f=%(f)s, x=%(x)s" % kwargs,
            substitutions=substitutions
        )

    for datatype, examples in SAMPLE_DATA.items():
        vars()['test_%s' % datatype] = _builtin_test('test_%s' % datatype, 'f(x)', examples)
Here we see the following: the class dynamically creates a set of tests based on SAMPLE_DATA. This is a really nice way to generate a lot of test cases with very little code. Thanks to how it was used, there were immediately test cases for sum(...) on a variety of built-in data types. Most of them nonsensical, of course (what is the sum of None?), but still.
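To make the trick concrete, here is a minimal, self-contained sketch of how such test generation works. The SAMPLE_DATA contents, the trivial assertion, and the class name below are all made up for illustration; the real suite builds the methods inside the class body via vars() and runs transpiled code for each example.

import unittest

# Made-up sample data: one list of example literals per data type.
SAMPLE_DATA = {
    'int': ['1', '2', '3'],
    'str': ["'a'", "'b'"],
    'none': ['None'],
}

def _make_test(examples):
    def test(self):
        # Stand-in body; the real tests would transpile and execute code
        # for each example and compare the output with CPython's.
        self.assertTrue(len(examples) > 0)
    return test

class GeneratedTests(unittest.TestCase):
    pass

# One test method per entry in SAMPLE_DATA: test_int, test_none, test_str.
for datatype, examples in SAMPLE_DATA.items():
    setattr(GeneratedTests, 'test_%s' % datatype, _make_test(examples))

if __name__ == '__main__':
    unittest.main(verbosity=2)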
Now, what was not_implemented about? It was a way of telling the test runner that a certain case was not fully handled yet. For instance, 'test_list' was marked as not implemented before I provided a basic implementation of sum.
So let us take a closer look at where the magic happens, to take away the illusion of magic. The interesting lines are as follows:
for test_name in dir(self):
    if test_name.startswith('test_'):
        getattr(self, test_name).__dict__['__unittest_expecting_failure__'] = test_name in self.not_implemented
Let's look at it more closely. First, dir(self) asks for the names of 'all interesting things' on self, and these names are then iterated over under the name test_name.
Second, it checks whether test_name starts with the string 'test_'. If it doesn't, we leave it alone.
The last line does so many things, it is mind-boggling. Again, dissecting helps here. First, it gets the attribute of self whose name is stored in test_name, then it grabs the __dict__ of that object. Finally, it writes the key '__unittest_expecting_failure__' into that dictionary, with the value of the expression test_name in self.not_implemented.
Now, I'm sorry to say this, but I'm not quite sure that the previous paragraph helped anyone in understanding what was going on.
Anyhow, let's rewrite it a bit more sequentially (less golfed).
expected_failure = test_name in self.not_implemented
test_method = getattr(self, test_name)
test_method.__dict__['__unittest_expecting_failure__'] = expected_failure
The first two lines look quite innocent. The last line explained why my original method ended up being marked as not expecting failure: since test_sum_mix_floats_and_ints was not in not_implemented, the assignment wrote False into the method's __dict__, overwriting the marker that @expectedFailure had set. The only way to get the behavior I wanted was to register the test explicitly in not_implemented.
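You can see the clobbering with a couple of lines in the interactive interpreter (my own minimal example, nothing VOC-specific). On the Python 3.4 series this post deals with, the decorator simply sets an attribute on the function, and the run() override happily overwrites it:

>>> from unittest import expectedFailure
>>> @expectedFailure
... def test_something(self):
...     raise AssertionError("not implemented yet")
...
>>> test_something.__unittest_expecting_failure__
True
>>> # What the run() override effectively does for a test that is
>>> # not listed in not_implemented:
>>> test_something.__dict__['__unittest_expecting_failure__'] = False
>>> test_something.__unittest_expecting_failure__
False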
Then, I started experimenting with different ways of "solving" this "problem" (the cognitive dissonance that happened to me).
First, I tried using
test_method.__dict__.setdefault('__unittest_expecting_failure__', expected_failure)
hoping that would work. Well, it partially did. There was another problem, though: the value seemed to stick around for other unit tests, causing really weird results.
Then I tried setting the value using setdefault only when the failure was to be expected. That helped a bit, with the downside of a lot of "unexpected success" messages.
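That attempt looked something along these lines (reconstructed from memory, not the exact diff):

for test_name in dir(self):
    if test_name.startswith('test_') and test_name in self.not_implemented:
        # Only ever mark tests as expecting failure; never write False.
        getattr(self, test_name).__dict__.setdefault(
            '__unittest_expecting_failure__', True)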
It turns out that the __dict__ of a bound method is the same object as the __dict__ of the underlying function.
>>> class K(object):
...     def foo(self):
...         pass
...
>>> K.foo
<function K.foo at 0x1059e8400>
>>> k = K()
>>> k.foo
<bound method K.foo of <__main__.K object at 0x1059eb160>>
>>> k.foo.__dict__ is K.foo.__dict__
True
Since the generated test methods all live on the same base class, every run of a subclass caused more and more methods to be marked as expecting failure, so the tests became more and more lenient. Definitely not what I wanted.
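To see the leak in isolation, here is a tiny self-contained demonstration (my own toy classes, not VOC code): a marker set through one subclass is visible through the other, because both bound methods share the underlying function's __dict__.

class Base:
    def test_x(self):
        pass

class A(Base):
    pass

class B(Base):
    pass

# Mark the test as expecting failure "for A only"...
A().test_x.__dict__.setdefault('__unittest_expecting_failure__', True)

# ...and the marker shows up on B as well, because it lives on Base.test_x.
print(B().test_x.__unittest_expecting_failure__)   # True: the marker leaked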
Having had no luck so far, I decided to look into the __unittest_expecting_failure__ marker itself. Specifically, where it was read rather than where it was set. In Python 3.4.4, the interesting region was on lines 566 to 570 of unittest/case.py:
expecting_failure_method = getattr(testMethod,
                                   "__unittest_expecting_failure__", False)
expecting_failure_class = getattr(self,
                                  "__unittest_expecting_failure__", False)
expecting_failure = expecting_failure_class or expecting_failure_method
So whether a failure was expected was calculated from both the method and the class (or instance). Reading a bit further gave me another piece of information I needed: the name of the test method is stored in an attribute called _testMethodName.
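The attribute is easy to observe from inside any test (a throwaway snippet of mine, not project code):

import unittest

class Peek(unittest.TestCase):
    def test_show_name(self):
        # unittest stores the name of the currently running test on the instance.
        self.assertEqual(self._testMethodName, 'test_show_name')

if __name__ == '__main__':
    unittest.main()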
Using this new piece of information, I could rewrite the run method as follows:
def run(self, result=None):
    # Override the run method to inject the "expectingFailure" marker
    # when the test case runs.
    self.__unittest_expecting_failure__ = self._testMethodName in self.not_implemented
    return super().run(result=result)
And all was right in the world. So I did a git push, and awaited the results from Travis-CI.
Oh, while we are waiting: do you know the nice thing about the attribute named _testMethodName? It's guaranteed 100% not to clash with the name of an attribute in a subclass. Why? Because _testMethodName is not PEP 8 compliant, and everybody writes PEP 8 compliant code nowadays.
Back on topic. Travis-CI got back with the results. Failure! To be honest, Travis-CI has been complaining about my work on this aspect for so long, I myself was starting to feel like a failure.
However, I did make progress. The test case now behaved correctly on Python 3.4.4, but not on Python 3.4.2. Looking at unittest/case.py from 3.4.2, it was obvious that the check on the class attribute was missing there. Still, seeing the progress I had made, I refused to give up. I knew the solution was close.
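For completeness, the corresponding region in 3.4.2 boiled down to just the method check, roughly like this (paraphrased from memory, not a verbatim quote of the stdlib):

# unittest/case.py, Python 3.4.2 (paraphrased): only the test method is
# consulted, so a class- or instance-level marker is silently ignored.
expecting_failure = getattr(testMethod, "__unittest_expecting_failure__", False)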
I had one last trick up my sleeve, a trick I was really hoping to not use: replacing the test method with a wrapper.
def run(self, result=None):
    # Override the run method to inject the "expectingFailure" marker
    # when the test case runs.
    if self._testMethodName in self.not_implemented:
        method = getattr(self, self._testMethodName)
        wrapper = lambda *args, **kwargs: method(*args, **kwargs)
        wrapper.__unittest_expecting_failure__ = True
        setattr(self, self._testMethodName, wrapper)
    return super().run(result=result)
I did another quick run, pushed the code back to GitHub, and awaited the results from Travis-CI. It worked! Success! Party-time!
There were a couple more complications I did not go into: I tried to reduce duplicate code by creating a mixin class. However, that did not initially go so well, because self.not_implemented was not always defined. In the end I got it working, though.
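For the curious, a mixin along these lines captures the idea. This is a sketch of mine, not the exact code that landed in VOC: it assumes a not_implemented class attribute with an empty default, which also sidesteps the "not always defined" problem.

import unittest

class NotImplementedMarkerMixin:
    # Subclasses list the names of tests that are known not to work yet.
    not_implemented = []

    def run(self, result=None):
        # Wrap the test method so unittest sees the expecting-failure marker,
        # regardless of whether the class-level check exists (3.4.2 vs 3.4.4).
        if self._testMethodName in self.not_implemented:
            method = getattr(self, self._testMethodName)
            wrapper = lambda *args, **kwargs: method(*args, **kwargs)
            wrapper.__unittest_expecting_failure__ = True
            setattr(self, self._testMethodName, wrapper)
        return super().run(result=result)

class DemoTests(NotImplementedMarkerMixin, unittest.TestCase):
    not_implemented = ['test_broken']

    def test_broken(self):
        self.fail("not implemented yet")     # reported as an expected failure

    def test_fine(self):
        self.assertEqual(2 + 2, 4)

if __name__ == '__main__':
    unittest.main(verbosity=2)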
One thing I would like to note in particular. Even though I found something in the test suite of VOC that annoyed me, I am not annoyed by (the test suite of) VOC itself. I think it is a really interesting project, and it could have potential for bringing Python to a Java world (Android, in particular).
The parts of the codebase I have seen so far also make real sense to me. From what I understand, they welcome any contributors. So if you are interested in getting a Python-to-Java transpiler working, why not help out the VOC project!
Also, the "trick" that was used to auto-generate a lot of test cases is really nice. I've seen it used before (albeit in a different way) in an internal project I was once working on. It has the advantage of generating a lot of test cases with very little effort. The drawback is that it generates a lot of test cases.
In the end, by understanding the problem space, the wiggle room for solutions, and sometimes even digging into the internals of a tool you are using, you can find an elegant solution to a problem.