even image representation with the help of language without off-the-shelf vision-language pairs. As an easy-to-use plug-and-play mechanism, we evaluate its universality and robustness with multiple ...