Evaluating the Robustness of Analogical Reasoning in Large Language Models