It's slightly slower, but I think the safety is worth it.
```
name            old time/op  new time/op  delta
Next-8          30.0ns ± 2%  31.0ns ± 3%   +3.56%  (p=0.002 n=7+8)
NextParallel-8  79.4ns ± 1%  92.5ns ± 1%  +16.58%  (p=0.000 n=8+8)
```
Use atomics rather than mutexes to synchronize state between calls.
```
name            old time/op  new time/op  delta
Next-8           244ns ± 0%    30ns ± 2%  -87.70%  (p=0.000 n=8+7)
NextParallel-8   215µs ±60%     0µs ± 1%  -99.96%  (p=0.000 n=8+8)
```
The new NextParallel time is roughly 80ns/op, but benchstat prints
that row in µs, so it shows up as 0µs in the output.
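
For readers unfamiliar with the pattern, here is a minimal sketch of what swapping a mutex for an atomic can look like. The actual type behind `Next` is not shown in this change, so `Sequence` and `drain` are hypothetical names used only for illustration, and the real state may be more than a single counter.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Sequence is a hypothetical stand-in for the type whose Next method is
// benchmarked above; the real type is not part of this message.
type Sequence struct {
	n uint64 // accessed only through sync/atomic
}

// Next returns the next value, starting at 1. One atomic add replaces the
// lock/increment/unlock sequence a mutex-based version would need.
func (s *Sequence) Next() uint64 {
	return atomic.AddUint64(&s.n, 1)
}

// drain calls Next from several goroutines at once and returns the counter
// value after all of them have finished.
func drain(s *Sequence, workers, perWorker int) uint64 {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.Next()
			}
		}()
	}
	wg.Wait()
	return atomic.LoadUint64(&s.n)
}
```

Because each call is a single atomic operation, concurrent callers do not block one another, which is consistent with the large NextParallel improvement above. Running the sketch under `go test -race` is a reasonable way to check that no unsynchronized access remains.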