Engineering the Science of Learning

Author: Justin Reich

Developing a Continuous Research Infrastructure for MOOCs to Catalyze Learning Research

Four years since the creation of Coursera and edX, there is much to celebrate in the accomplishments of research into massive open online courses (MOOCs) and other forms of open online learning.

Millions of students have participated in thousands of courses, leaving a trail of billions of log events recording their behavior, efforts, failures, and successes. From these massive records, researchers have documented the demographic characteristics of learners and patterns of course participation (Ho et al. 2015; Perna et al. 2014), and they have begun exploring methods for supporting greater persistence and completion in courses (Baker et al. 2014).

But, for all the very useful, policy-relevant findings, MOOC research has led to few new insights about how people learn and how best to support their learning. Beyond basic observations that more active students earn higher grades than less active students, there are few new theories or design principles for how students learn best or how best to teach them.

To take advantage of the opportunities for research provided by the millions of students across thousands of MOOCs, an ambitious new approach to online learning research is needed.

How to Enhance Online Learning Research

Large-scale online learning environments should be engineered so that both the learning platform and courses are continuously improving. Internet services such as Google, Facebook, and Netflix regularly test new features and experiences with users, and this steady stream of experimentation provides evidence and insights that allow these services to iteratively improve.

Three changes, already under way in pockets, are necessary in MOOC platforms and course development processes to support constant development.

First, cycles of course offerings should be dramatically shortened by shifting from session-based courses that run once or twice a year on specific dates to on-demand courses that allow new cohorts of students to start every week or even every day. Several of the world’s largest courses, each with over a million registrants (Harvard’s CS50x: Introduction to Computer Science, Stanford’s Machine Learning on Coursera, and UC San Diego’s Learning How to Learn on Coursera), have adopted this approach, and Coursera has been refactoring its platform to support these kinds of on-demand courses. The shift to more flexible availability will allow insights from experimental interventions to be incorporated more rapidly and regularly, accelerating the pace of learning research as courses iteratively improve.

Second, MOOCs need to incorporate an expanded set of experiments, so that each course tests a variety of instructional approaches—from the overall syllabus design to the specific wording of questions and prompts. Every course should be designed and implemented in such a way that it collects evidence to make the next run of the course better. edX has recently introduced authoring tools that make it very simple for any course developer to include randomized experiments (sometimes called A/B tests) that test the efficacy of different pieces of course content.

Third, to take advantage of these platform changes, course teams will need to include greater multidisciplinary expertise and to more tightly couple the work of course designers and researchers. Rather than having faculty and course teams create courses to be evaluated afterward by separate research teams, research considerations should be built into courses from the earliest phases of design. In the days before MOOCs, the Open Learning Initiative (OLI) (formerly at Carnegie Mellon and now at Stanford) pioneered this kind of joint development among researchers, software developers, and content experts (EDUCAUSE Learning Initiative 2006).

With these changes to platform and processes, MOOCs would support a continuous research infrastructure. It would be continuous in two ways.

  1. Learning research would be an important consideration throughout every phase of course design, deployment, and refinement. At present, course design and education research are pursued separately, and research occurs as a phase in the course life cycle after the development of the course, usually by a separate team. In a continuous research infrastructure, research considerations would be attended to from the earliest days of proposing a course through its final archival as static open courseware.
  2. Courses—with their embedded experiments—would run on an always-on, on-demand basis rather than once per year. At present, annual cycles of course offerings mean that iterative improvements based on the results of instructional experiments require years to come to fruition. In a continuous research infrastructure, on-demand courses would allow for tighter (shorter) cycles of iterative improvement, guided by evidence from constant experimentation.

With these major shifts, MOOCs will be better positioned to improve in quality and advance the science of learning as they make learning opportunities available to the public.

In the following sections I review limitations in current MOOC research, and then discuss the three shifts—on-demand courses, expanded experimentation, and multidisciplinary design teams—necessary to realize a more ambitious future for large-scale online learning research.

A continuous research infrastructure for MOOCs will support shorter and more integrated cycles of instructional experiments, analysis of student learning data, and refinement of existing courses, and these shorter cycles of experiment and iteration will enable more rapid improvements in online teaching and learning.

Limitations in Current MOOC Research

MOOC research is currently defined by the separation of course development from learning research. In typical, nearly universal practice, course developers and researchers are two separate teams engaged in two different, overlapping initiatives: faculty and instructional designers make courses, and researchers study the results afterward.

“Fishing in the Exhaust”

I have characterized the early years of MOOC research as “fishing in the exhaust,” a strategy in which instructional teams create MOOCs without regard to research objectives or questions, platforms log data from student behavior in those courses, and data analysts examine the tracking log data, the “exhaust,” afterward (Reich 2014b).

There was optimism in the basic outlines of this strategy. An early hope of MOOC enthusiasts was that the reams of data collected by online learning platforms would enable an almost purely inductive approach to learning research (Nihalani 2013; Norvig 2015). Once educational researchers and their computer science partners had “the data,” data mining tools and predictive algorithms would bring educational research into the modern era. Algorithms would unlock the secrets to human learning hidden in these data.

But in these first years of MOOC research the strategy has primarily yielded commonplace findings about MOOC student behavior. In a paper about MITx’s first course, 6.002x (Circuits and Electronics), DeBoer and colleagues (2014) examined 20 activity and outcome measures and found that each form of activity was positively correlated with every other form of activity. Thus, the number of lectures watched is positively correlated with the number of forum postings, both of which are positively correlated with the numbers of hours on site and of weeks active, and all of these are positively correlated with grades and certificate attainment.

Dozens of studies along the same lines followed, many of them reaching similar conclusions. I have summarized this observation as Reich’s Law: “People who do stuff do more stuff, and people who do stuff do better than people who don’t do stuff” (Reich 2014a). A substantial portion of the MOOC research literature to date can be summarized thus.

Data, Quality, and Assessment

The research has been limited because the systems generating the data are limited. MOOC instructional materials are not designed to test multiple instructional approaches, beyond perhaps presenting some material in multiple media, such as text and video.

They may be of low instructional quality as well. In a recent study of MOOCs’ alignment with generally accepted principles of instructional design, the scores of 76 MOOCs ranged from 0 to 28 on a 72-point scale (Margaryan et al. 2015).

Moreover, assessment materials are plagued by technical limitations: virtually no MOOCs attempt automated assessment of sophisticated human performance like essay writing or design exercises. There are a few interesting trials of peer assessment of complex performance, as in Scott Klemmer’s Human Computer Interaction course (e.g., Kulkarni et al. 2015), but most MOOCs include only multiple choice or quantitative response questions.

The assessment infrastructure in many MOOCs does not support robust inferences about what students do and do not understand. If outcome measures don’t provide adequate evidence of what students have learned, and if input materials provide a limited range of instructional supports, prediction algorithms will at best find a local minimum. Future systems with more sophisticated instructional approaches and richer measures of learning will yield more useful results.

Separation of Course Development and Experimental Design

Post hoc observational research has been accompanied by a smaller set of experimental studies (Anderson et al. 2014; Kizilcec et al. 2014; Lamb et al. 2015). They have been very important in pointing the way toward a future paradigm of experimentation in MOOC research, but their limitations are also instructive.

As mentioned above, most MOOC experiments have been designed and implemented separately from course development teams: one common model is that a team of social psychologists or behavioral economists develops an intervention that can be dropped into any course for testing. Because the researchers have no content expertise, few published experiments have been based on discipline-specific educational research or attempted to improve subject-specific instruction.¹

In addition, few lines of experimental research have iteratively tested refined intervention designs over multiple cycles, in part because the course cycles are so long: If courses run only once a year, then every iteration of an intervention requires a year of delay to be implemented and evaluated. MOOC experiments have yet to take advantage of the massive numbers of learners or the full potential of a shift to digital learning infrastructure.

Fishing for correlations in course tracking logs and implementing simple one-off experiments were both sensible initial approaches to exploratory MOOC research. Creating separate course creation and research teams worked neatly for most university organizational structures, and the platforms offered minimal support for in situ or iterative experimentation.

For the field to advance, however, early efforts must not rut out into path dependencies (Reich 2015). A shift to continuous research infrastructures would address many of the limitations of the current paradigm. The course development process needs to attend to research continuously throughout the creation of new courses, and platforms need to support continuous, iterative experimental research.

From Session-Based to On-Demand MOOCs

Borrowing from the heritage of residential education, the earliest MOOCs typically had defined start and end dates, often mirroring the traditional semester calendar. But this approach is inconvenient for the busy and complicated lives of online learners who may not be full-time students.

Redefining Course Cohorts

MOOC platforms need to enable courses that can be run continuously with on-demand signup. Coursera has made substantial progress toward this shift. Many of its most popular courses now have sessions that begin every two or three weeks rather than once per year.²

When courses start every few weeks, multiple instances of a course are running in a staggered fashion, allowing 10–25 new cohorts per year instead of only one or two. Course interventions can go through multiple iterative cycles each year, and if research proceeds quickly enough it could even be possible to introduce effective interventions to existing cohorts, so that students more rapidly receive the benefits of their participation in experimental activities.

It may ultimately be possible to allow for course cohorts to be segmented not by when students start but by where they have progressed in the course, so cohorts are defined by each student’s last activity rather than the time of their first. Evidence from student tracking logs suggests that heterogeneity in course completion pathways is high (Mullaney 2014) and in some courses very few students maintain the recommended schedule (Mullaney and Reich 2015).

Drawing from Online Games

One effective model of progress-based cohorting occurs in massively multiplayer online games that provide tools for characters of different levels and abilities to find appropriate synchronous challenges on an on-demand basis. Players use “Dungeon Finders” that help algorithmically assemble groups of similarly leveled, currently online players ready to take on the same challenge (Dabbish et al. 2012).

In MOOCs, problem set discussions and synchronous debates would replace raids and dungeons, but the mechanisms for cohorting students on demand for particular kinds of challenges might be similar (see Ferschke et al. 2015 for two prototypes). With these kinds of tools in place, it might be possible to target an intervention at a particular point in a course (a topic, discussion, or assignment) and gather data from all students progressing through that section over a period of days rather than months. This kind of model would look much more like the ongoing experimentation conducted by Google and Facebook than the annual research cycles more typical of classroom educational research.
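
As a rough illustration of progress-based cohorting, the sketch below groups learners who are currently online and working on the same unit, in the spirit of a game's group finder. The `Learner` record, its field names, and the fixed group size are assumptions made for illustration; they do not describe any existing MOOC platform or the prototypes cited above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Learner:
    """Hypothetical record; field names are illustrative only."""
    learner_id: str
    current_unit: int  # unit of the learner's most recent activity
    online: bool       # whether the learner is active right now


def form_groups(learners, group_size=4):
    """Group currently online learners who are working on the same unit."""
    by_unit = defaultdict(list)
    for learner in learners:
        if learner.online:
            by_unit[learner.current_unit].append(learner.learner_id)

    groups = []
    for unit in sorted(by_unit):
        ids = by_unit[unit]
        # Emit only full groups; leftover learners wait for more arrivals.
        for start in range(0, len(ids) - group_size + 1, group_size):
            groups.append((unit, ids[start:start + group_size]))
    return groups
```

Because groups are keyed on a learner's current unit rather than on an enrollment date, an intervention attached to one unit would reach everyone passing through it, whenever they started the course.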

Supporting Sophisticated Experimentation

For continuously running courses, platforms need to develop the infrastructure to support sophisticated experimentation. edX offers basic tools for authoring A/B testing of content blocks in courses, and Coursera plans to release similar functionality. These tools allow nonprogrammers to implement randomized controlled trials of course content with the same ease with which they would add a reading, video, or assignment.

Student “Self-Check”

In one recent experiment, HarvardX course developers and researchers tested a voluntary “discussion self-check,” a noncredit question on quizzes that asked if people had contributed to the forum (Lamb et al. 2015). The goal was to encourage greater forum participation. With each weekly quiz for the first six weeks of the course, one random half of students received the normal three-question quiz, and the other received the three questions plus the self-check question (this kind of design might be better understood as an A/Not A test than as an A/B test). Findings showed that the group that got the self-check question contributed new forum posts at about twice the rate of the control group. As a result, discussion self-check questions are an increasingly common feature in HarvardX courses.

The edX experimentation toolkit is a great start but has a limited palette of options for experimental design. Researchers can set up A/B or A/B/…/N tests at the content block level but not at the chapter or unit level. It is also not possible to experiment with the overall structure of the course or the interface of the platform. All students are randomly assigned in equal proportions to all groups, whereas more sophisticated assignments through matched pairs (King et al. 2011) or dynamic assignments through multi-armed bandits (Scott 2010) would allow for a more robust range of experimental designs.
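
To make the idea of dynamic assignment concrete, the following sketch simulates Thompson sampling, one widely used multi-armed bandit strategy, for a binary outcome such as whether a student posts in the forum. The response rates, function names, and the uniform Beta(1, 1) prior are illustrative assumptions, not features of the edX toolkit.

```python
import random


def thompson_choose(successes, failures, rng):
    """Pick an arm by sampling each arm's Beta posterior and taking the max."""
    draws = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) uniform prior (assumed)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate(true_rates, n_students, seed=0):
    """Assign simulated students to arms; return per-arm assignment counts."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_students):
        arm = thompson_choose(successes, failures, rng)
        # Simulated outcome: success with the arm's (hypothetical) true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]
```

In a run such as `simulate([0.1, 0.5], 2000)`, most simulated students end up in the better-performing arm, which is the practical appeal of bandit assignment over fixed equal proportions. The trade-off is that unequal, outcome-dependent assignment complicates conventional significance testing.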

With a more diverse toolkit, researchers will be able to expand the number and kind of experimental interventions in each course. In a continuous research infrastructure with large numbers of students, researchers should lean toward including larger numbers of multifactorial experiments in MOOCs. For instance, in the discussion self-check experiment, only one version of the intervention was tested with nearly 10,000 students in each treatment and control group. A substantial amount of statistical power was, in a sense, wasted. Researchers could have tested different question stems, response anchors, question placements, and other features of the self-check to better understand the mechanism of the intervention and to optimize its effect.
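
A full factorial design over a few features of the self-check could be enumerated as follows. The factor names and levels are hypothetical placeholders, and the figure of 20,000 students is a round approximation of the two groups of nearly 10,000 described above.

```python
from itertools import product

# Hypothetical factor levels for a self-check experiment (placeholder labels).
factors = {
    "question_stem": ["stem_a", "stem_b", "stem_c"],
    "response_anchors": ["yes_no", "frequency_scale"],
    "placement": ["start_of_quiz", "end_of_quiz"],
}


def factorial_conditions(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def students_per_condition(n_students, n_conditions):
    """Students available per cell under equal random assignment."""
    return n_students // n_conditions


conditions = factorial_conditions(factors)
per_cell = students_per_condition(20_000, len(conditions))
print(len(conditions), per_cell)
```

With these placeholder factors there are 12 conditions and about 1,666 students in each, still far more than many classroom studies can assign to a single condition.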

Power Calculations

In typical classroom education research, power calculations are used to determine whether a study has sufficient numbers of subjects to make reasonable claims; in MOOC research, power calculations should be used to estimate how many different experiments can be fit into a course. In contrast to the “correlate everything with everything else” approach to post hoc observational research and the “try one thing and modify every year” approach to typical experimental research, researchers in a continuous research infrastructure should pursue a targeted set of theory-informed experiments, with large numbers of content and sequential variations, continuously updated from one weekly cohort to the next as new findings emerge (Williams and Heffernan 2015).
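
One way to sketch this calculation is with the standard normal approximation for comparing two group means, in which the sample size per group is roughly 2(z_alpha + z_power)^2 / d^2, where d is the standardized effect size. The function names, the default significance level and power, and the assumption that each experiment uses its own disjoint sample are choices made for this illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means: n = 2 * (z_alpha + z_power)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def experiments_that_fit(n_students, effect_size, arms=2, **kwargs):
    """Number of separate experiments, each with its own disjoint sample,
    that a cohort of n_students can hold (a conservative assumption)."""
    return n_students // (arms * n_per_group(effect_size, **kwargs))
```

Under these assumptions, a cohort of 20,000 students could hold 39 two-arm experiments powered to detect an effect of .25, but only 6 powered to detect an effect of .10. Factorial designs that reuse the same students across experiments would fit more.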

Each modification of a particular intervention might result in trivial gains in learning outcomes; a slight improvement in explanatory text, for example, might result in an effect size gain of only 1/100 of a standard deviation. In the typical approach to educational reform, such gains are seen as far below the threshold of useful interventions to pursue, as defined for instance by the .25 effect size threshold for inclusion in the Institute of Education Sciences (2014) What Works Clearinghouse. Such high thresholds for worthwhile interventions are appropriate for reform efforts that require substantial time and commitment to implement.

In a continuous research infrastructure, it may be possible to string together 25 interventions, each with an effect size of .01, to produce meaningful gains in student learning, if at least some of the 25 effects are additive. If each intervention requires only a few minutes of programming to implement and can then be delivered with perfect fidelity to every subsequent student, then the cumulative effect of small interventions on student learning could eventually be quite substantial. Rather than seeking a single “home run” intervention, continuous research infrastructures seek to codify large numbers of small improvements that can build on one another.
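
The arithmetic behind this claim can be written out under the strong simplifying assumption that all k effects are fully additive on the standardized scale; the symbols d_i and k are introduced here for illustration.

```latex
\[
d_{\mathrm{total}} = \sum_{i=1}^{k} d_i = k \times 0.01,
\qquad
k = 25 \;\Rightarrow\; d_{\mathrm{total}} = 0.25
\]
```

With all 25 effects additive, the total would equal the .25 threshold mentioned above; if only some of the effects add, the total would be correspondingly smaller.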

Course Instructional Teams

To take full advantage of these more sophisticated platforms, courses will need to be designed from the beginning to answer specific research questions and course teams will need to include a greater range of multidisciplinary expertise. The current research model depends on a kind of serendipitous insight—that across dozens or hundreds of courses there are good ideas being baked into the instructional design of courses that predictive algorithms will be able to identify post hoc. A more promising approach is to design courses to answer the most important educational questions in a discipline.

Need for Multidisciplinary Teams

Diverse expertise is required to develop courses as spaces for systematic research. Course teams should include content experts and discipline-based education researchers who can identify relevant dilemmas in disciplinary instruction and propose competing approaches to resolve them.

Instructional designers are needed to incorporate effective pedagogical approaches in the online platform. Experts in causal research methods and experimentation should ensure best practices in randomization, experimental design, and data collection for analysis. Others with experience in assessment design need to develop an assessment infrastructure that measures student learning outcomes that are relevant to course goals and research questions.

Teams may require software developers who can implement new approaches to assessments or instruction. As the course progresses through daily or weekly cohorts, data scientists need to identify promising instructional approaches and areas for modifying and improving interventions.

The Open Learning Initiative has used a version of this multidisciplinary model for years in its course development (EDUCAUSE Learning Initiative 2006). One of the signature advantages of this approach to course design is that content experts often suffer from a “curse of expertise” (Hinds 1999)—they lose the ability to empathize with new learners and to predict the kinds of learning scaffolds that novices need in a new subject. Working in diverse teams ensures that course faculty have design partners who can represent the voice of content novices throughout the design of the course.

Organizational and Cultural Shift

While to some extent team experts may be able to work in a temporal sequence, a major organizational shift would enable close collaboration among the experts during the course design phase. Ideally, as universities evaluate faculty and courses for funding, courses with clear research goals would be given both priority in the approval process and additional funding.

“Research-focused” courses should be developed using a new, separate process that ensures that both learning goals and research goals are well defined from the beginning of course development and that various course elements align with those goals. Overall, universities might concentrate their resources to produce fewer and better courses.

These changes call for faculty to accept a profoundly different role as teachers. Rather than the current “one-man band” approach, where faculty serve as designers, presenters, assessors, and evaluators, faculty would join a team to carry out these roles. In exchange for diminished autonomy over every aspect of the course, faculty would get access to greater resources in creating a course and the opportunity to participate in systematic investigations into teaching and learning.

Some future MOOCs might be led by a new kind of faculty member whose primary purpose is leading these kinds of complex endeavors, or senior faculty might be released from teaching and service responsibilities to focus on a single course for an extended time. It is unclear how many current faculty would be eager to accept such a trade, but given the scale and cost of building MOOCs in a continuous research infrastructure, only a relatively small number of pioneers would be needed.

Funding and Incentivizing a Continuous Research Infrastructure

The proposed additional effort and expertise will make the development of MOOCs more expensive than it is currently. Current expense levels, however, typically allow for substantial investment only in a “first draft” of MOOC materials, after which course teams are required to turn their attention to a new project. These first drafts are of mixed quality and do not improve much between runs of a course, and many do not appear to be sustainable to produce over the long run (Hollands and Tirthali 2014).

While there may be value in dozens of new first-draft courses each year, to identify which topics will garner an audience, this rapid generation should be complemented by a smaller set of courses with much greater investment and improvement over a longer period of time. Developing a continuous research infrastructure for online learning will require substantial new funding to support platform upgrades and larger teams working together for longer periods of time, all of which will ideally be justified by more valuable research findings, better learning experiences for students, and a greater return on investment for universities.

Justifying Investment

There are at least two potential approaches for justifying the additional level of expense. One would be for universities and MOOC providers to lavish this level of attention on their highest-revenue-generating courses, in the expectation that greater learning, satisfaction, persistence, and rates of completion would raise the bottom line of profitable courses and provide a monetary justification for the higher expenses. The intersection of research and financial interests could prove to be quite powerful; the model appears to be the approach motivating Coursera’s Course Success Team (Riddell 2015).

A second approach would focus on “public good,” with the same level of attention brought to bear on the most commonly taken courses. Such a model would borrow inspiration from Rice University’s OpenStax initiative (Baraniuk 2013), which has created open source textbooks for the 20 college courses with the highest cumulative enrollment (mostly introductory courses in math, physics, biology, chemistry, history, and social sciences).

Philanthropists could justify investment in a continuous research infrastructure for these courses not only because they have the potential to benefit the large number of MOOC students but also because the research insights from careful analysis of these courses would benefit all introductory instruction in the discipline.

Incentivizing Faculty

Any discussion of efforts to improve teaching in higher education inevitably comes around to the challenges of incentivizing faculty to devote their scarce time and energy to teaching and discipline-based education research. Dedicated streams of funding can be one mechanism to unlock faculty time for such research to improve instruction.

Some science and engineering departments are also creating new positions, such as professors of practice, to retain faculty more committed to teaching than research. These teaching-focused faculty may be the ideal leaders for new courses based on a continuous research infrastructure.

In many disciplines and fields, education research is growing as a legitimized domain for scholarly inquiry—such as the physics education research community—with specialized conferences, journals, and recognition in tenure and promotion decisions. Nurturing the growth and prestige of these scholarly communities will, in the long run, be among the best ways to support faculty committed to advancing teaching and learning research in STEM fields.


Conclusion

The elements of a continuous research infrastructure for MOOCs already exist across platforms and organizations. The Open Learning Initiative offers a model multidisciplinary design process, Coursera has pointed the way to large-scale, on-demand MOOCs, and edX has implemented the most user-friendly interface for nonprogrammers to implement A/B testing in a learning management system.

The challenge is to weave these pieces together—in terms of both the technological platform and the social organization of course design—to create courses that are engines of research, learning, and improvement. Internet companies in the private sector have demonstrated the capacity for iterative development in online services through constant experimentation and refinement. Universities and their MOOC platform partners should apply these insights to the emerging platforms of online learning.

Rapid and opportunistic expansion of MOOC research has made clear the tremendous potential of the enterprise, and the next era should be characterized by a targeted focus on building the infrastructure that will allow MOOCs to more effectively advance the science of learning.


References

Anderson A, Huttenlocher D, Kleinberg J, Leskovec J. 2014. Engaging with massive online courses. Proceedings of the 2014 International World Wide Web Conference, April 7–11, Seoul. pp. 687–698.

Baker R, Evans B, Greenberg E, Dee T. 2014. Understanding persistence in MOOCs (massive open online courses): Descriptive and experimental evidence. EMOOCs: European MOOCs Stakeholder Summit, February 10–12, Lausanne. pp. 5–10.

Baraniuk RG. 2013. Opening education. The Bridge 43(2):41–47.

Dabbish L, Kraut R, Patton J. 2012. Communication and commitment in an online game team. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, May 5–10, Austin. pp. 879–888.

DeBoer J, Ho AD, Stump GS, Breslow L. 2014. Changing “course”: Reconceptualizing educational variables for massive open online courses. Educational Researcher 43(2):74–84.

EDUCAUSE Learning Initiative. 2006. Open Learning Initiative. Washington.

Ferschke O, Howley I, Tomar G, Yang D, Rosé C. 2015. Fostering discussion across communication media in massive open online courses. Proceedings of the 11th International Conference on Computer Supported Collaborative Learning, June 7–11, Gothenburg, Sweden. Vol 1, pp. 459–466.

Fisher WW. 2014. CopyrightX. HarvardX Working Paper Series No. 5.

Hinds PJ. 1999. The curse of expertise: The effects of expertise and debiasing methods on prediction of novice performance. Journal of Experimental Psychology: Applied 5(2):205–221.

Ho AD, Chuang I, Reich J, Coleman C, Whitehill J, Northcutt CG, Williams JJ, Hansen JD, Lopez G, Petersen R. 2015. HarvardX and MITx: Two years of open online courses, fall 2012–summer 2014. HarvardX Working Paper No. 10. Online at doi:10.2139/ssrn.2586847.

Hollands FM, Tirthali D. 2014. Resource requirements and costs of developing and delivering MOOCs. International Review of Research in Open and Distributed Learning 15(5):113–133.

Institute of Education Sciences. 2014. What Works Clearinghouse: Procedures and Standards Handbook (version 3.0). Washington.

King G, Nielsen R, Coberley C, Pope JE, Wells A. 2011. Comparative effectiveness of matching methods for causal inference. Unpublished working paper.

Kizilcec RF, Schneider E, Cohen G, McFarland D. 2014. Encouraging forum participation in online courses with collectivist, individualist, and neutral motivational framings. eLearning Papers 37:13–22.

Kulkarni CE, Bernstein MS, Klemmer SR. 2015. PeerStudio: Rapid peer feedback emphasizes revision and improves performance. Proceedings of the Second ACM Conference on Learning @ Scale, March 14–18, Vancouver. pp. 75–84.

Lamb A, Smilack J, Ho AD, Reich J. 2015. Addressing common analytic challenges to randomized experiments in MOOCs: Attrition and zero-inflation. Proceedings of the Second ACM Conference on Learning @ Scale, March 14–18, Vancouver. pp. 21–30.

Margaryan A, Bianco M, Littlejohn A. 2015. Instructional quality of massive open online courses (MOOCs). Computers and Education 80:77–83.

Mullaney T. 2014. Making sense of MOOCs: A reconceptualization of HarvardX courses and their students. Unpublished undergraduate thesis.

Mullaney T, Reich J. 2015. Staggered versus all-at-once content release in massive open online courses: Evaluating a natural experiment. Proceedings of the Second ACM Conference on Learning @ Scale, March 14–18, Vancouver. pp. 185–194.

Nihalani R. 2013. Video is great, but unlocking MOOC data is the game changer. SkilledUp Insights blog, June 18.

Norvig P. 2015. Machine learning for learning at scale. Proceedings of the Second ACM Conference on Learning @ Scale, March 14–18, Vancouver.

Perna LW, Ruby A, Boruch RF, Wang N, Scull J, Ahmad S, Evans C. 2014. Moving through MOOCs: Understanding the progression of users in massive open online courses. Educational Researcher 43(9):421–432.

Reich J. 2014a. Big data MOOC research breakthrough: Learning activities lead to achievement. Education Week EdTech Researcher Blog, March 30.

Reich J. 2014b. Four types of MOOC research. Learning with MOOCs: A Practitioner’s Workshop, August 12–13, Cambridge, MA.

Reich J. 2015. Rebooting MOOC research. Science 347(6217):34–35.

Riddell R. 2015. Coursera’s Stiglitz: MOOC revolution is just beginning. Education Dive, March 13.

Scott SL. 2010. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry 26(6):639–658.

Williams JJ, Heffernan N. 2015. A methodology for discovering how to adaptively personalize to users using experimental comparisons.


Notes

1 An experiment in the first version of CopyrightX in 2013 stands out as an exception: it tested two different curricula, one based on US law and one based on international examples (Fisher 2014).

2 edX does not provide platform support for on-demand courses, so CS50x (Introduction to Computer Science) runs from January 1 through December 31 each year, and as the year progresses the recommended schedule of deadlines automatically updates, albeit for all students rather than in a personalized way.

About the Author: Justin Reich is executive director of the MIT Teaching Systems Lab.