Inverse design of grating couplers using the policy gradient method from reinforcement learning