based on my experiments these biases are indeed not needed: the code runs faster and gives identical results. keeping the option only because dropping the biases deviates from the gpt-2 setup
Andrej Karpathy committed
0e90ee9d48613b37d26264ee6b3e8fa0f6e75525
Parent: 001c1e7