What is Cross-Modal Generalization?