Evaluation of educational programs has accelerated dramatically in the past quarter century. With this expansion has come clear methodological improvement, including randomized control studies and other approaches for establishing causation that considerably strengthen the internal validity of evaluations. Such studies are, however, conducted within individual countries, each with its own institutional structure of schools and its own national labor market, and they are seldom replicated either within or across countries. A natural question is whether the results of an individual high-quality educational evaluation in one country can reasonably be applied in other countries. This paper focuses on existing research into differences across countries that, while generally impossible to incorporate into program evaluations, potentially have direct effects on key elements of policy and on the outcomes that can be expected. In particular, available cross-national studies on a variety of topics suggest caution when generalizing evaluation results across countries, because student results are likely to vary systematically with a number of fundamental country-level institutional characteristics that are not explicitly considered in within-country evaluation analyses. Unfortunately, there is currently too little replication of basic research studies to provide explicit guidance on when and where cross-national generalizations are possible.