I've been wanting to post about my experiences using R and RStudio Server for teaching. I've got a list of big picture things, but today I'm thinking about a small, but for me crucial, one.

In my classes and workshops I often use an rmarkdown template to help students get started. You get the option to use a template when you create a new rmarkdown file. Rmarkdown comes with two default templates: one for a github document and one for a package vignette. Both are wordy, but less so than the default markdown file, which my students find confusing at first and then annoying because they always have to blank it out.

From the user's point of view, a template is an RMarkdown file which contains some amount of predefined structure. I made that intentionally vague because you can have a template that is a blank file if you want. But more often in my teaching I use templates to scaffold student work as they are learning to use R and RMarkdown. Scaffolding is a term used in writing pedagogy. Around Lehman most of the time it is used to mean giving students a process of brainstorming, drafting, revision and polishing (and more depending on the course). It is also used to mean providing students with a structure for getting from point A to point B. In my case point B is understanding specific concepts in statistics and R.

I use templates for both of these kinds of scaffolding when I teach with R. Early exposure to R can be very overwhelming for students. With a template I can structure experiences that are simple, but highly rewarding. For example, they might just have to add a title to a graph, enter variable names into an analysis, and write some text. For students who are further along, a template might provide the outline for a research report. I also have a blank template that gets around the "first delete everything but the stuff at the top" step.

Creating a template is easy. It is just a markdown file. For example, this is a somewhat complex file I created for a two-hour introduction to data science workshop this summer. The students had never used R before, and most of them had never taken any sociology, so I wanted to make a structure for the online part of the workshop that would ensure success. This is the almost blank template file that I'm considering making even more blank. This is an outline for a project report.

You could, of course, just share these as gists, but I like to put them into an R package so they can be opened directly from RStudio. Once the package is installed, its templates are added to the list of options when creating a new file. If you don't have a package you can use any of the package making tools to create a skeleton package.

The templates rely on the somewhat magical inst folder in R packages. That directory holds "installed files," which are basically all the other files you might need in your package. They get copied to the top level when your package is installed, using whatever file structure you have given them.

These templates always require two files, and they go into a subfolder in the inst/rmarkdown/templates folder. So the blank template would be in inst/rmarkdown/templates/blank. At that level you add a file called template.yaml. It is extremely important that you make sure there is an "a" in the yaml file extension, or RStudio will not recognize that file when it scans for templates to list. Be careful, because when saving and copying RStudio defaults to yml.

The template.yaml file should contain two items, name and description.

name: blank
description: >
    A blank template

The other file should be called skeleton.Rmd and placed in a subfolder called skeleton.
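The layout described above can be sketched in base R. This is only a minimal sketch, not the way any particular package tool does it: the package name mytemplates is hypothetical, the package root is a temporary directory so that nothing real is touched, and the contents of skeleton.Rmd are just an assumed minimal header.

```r
# Hypothetical package source directory; a temp dir keeps the sketch self-contained.
pkg_dir <- file.path(tempdir(), "mytemplates")

# Each template lives in its own subfolder of inst/rmarkdown/templates.
template_dir <- file.path(pkg_dir, "inst", "rmarkdown", "templates", "blank")
skeleton_dir <- file.path(template_dir, "skeleton")
dir.create(skeleton_dir, recursive = TRUE, showWarnings = FALSE)

# template.yaml: note the "a" in the extension.
writeLines(
  c("name: blank",
    "description: >",
    "    A blank template"),
  file.path(template_dir, "template.yaml")
)

# skeleton.Rmd: the file students actually open; here only a YAML header
# (an assumed minimal header, adjust to taste).
writeLines(
  c("---",
    "title: \"Untitled\"",
    "output: html_document",
    "---",
    ""),
  file.path(skeleton_dir, "skeleton.Rmd")
)

# List what was created, relative to the package root.
created <- list.files(pkg_dir, recursive = TRUE)
print(created)
```

After a package with this layout is installed, the template should also be usable without the RStudio menu, for example with rmarkdown::draft("report.Rmd", template = "blank", package = "mytemplates", edit = FALSE), where the file and package names are again only placeholders.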

The biggest problems I have with this system have to do with file extensions. If you have yml instead of yaml or rmd instead of Rmd, the templates may not show up on the list of templates. The case issue is (from what I can tell) specific to Linux, which is what most people probably have their instances of RStudio Server installed on.
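Since these extension problems are easy to miss by eye, a small check can help before installing the package on a server. The function below is a hypothetical helper (check_templates is my own name, not part of rmarkdown or RStudio). It compares against the file names actually on disk rather than using file.exists(), because file.exists() can ignore case on Windows and on macOS with its default file system, which would hide exactly the problem that shows up on Linux.

```r
# Return a character vector describing templates whose required files are
# missing or misnamed (exact, case-sensitive match on the names on disk).
check_templates <- function(pkg_dir) {
  root <- file.path(pkg_dir, "inst", "rmarkdown", "templates")
  problems <- character(0)
  for (tpl in list.dirs(root, recursive = FALSE)) {
    top_files <- list.files(tpl)
    skel_files <- list.files(file.path(tpl, "skeleton"))
    if (!"template.yaml" %in% top_files) {
      problems <- c(problems, paste0(
        basename(tpl), ": no template.yaml (found: ",
        paste(top_files, collapse = ", "), ")"
      ))
    }
    if (!"skeleton.Rmd" %in% skel_files) {
      problems <- c(problems, paste0(
        basename(tpl), ": no skeleton/skeleton.Rmd (found: ",
        paste(skel_files, collapse = ", "), ")"
      ))
    }
  }
  problems
}

# Demo on a deliberately broken template in a temporary directory:
# yml instead of yaml, and rmd instead of Rmd.
demo_pkg <- file.path(tempdir(), "brokenpkg")
demo_tpl <- file.path(demo_pkg, "inst", "rmarkdown", "templates", "report")
dir.create(file.path(demo_tpl, "skeleton"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_tpl, "template.yml"))
file.create(file.path(demo_tpl, "skeleton", "skeleton.rmd"))

problems <- check_templates(demo_pkg)
print(problems)
```

An empty result means every template folder has both files with the expected names. Anything else lists the template and what was found instead.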

Another small issue is that different operating systems sort file names differently, so not all students see the same list in the same order.

Also I want to note that there is a whole different category of things called templates that are related to the templates package. It's not the only time that the same term is used in the context of unrelated things in R.

Updated to add information about the Rmd file extension.

In 1939 Edwin Sutherland gave a presidential address to the American Sociological Society, and it made page 12 of the New York Times.  Nine full paragraphs summarized his talk, reporting that "Dr. Sutherland described present day white collar criminals as 'more suave and deceptive' than last century's 'robber barons' and asserted that 'in many periods more important crime news may be found on the financial pages of the newspapers than the front pages.'"

That speech, on December 27, is usually considered the moment at which the term "white collar crime" was invented. Of course crimes by elites, crimes of deception, financial crimes, and crimes in business had been written about at least since Leviticus, but Sutherland pulled all these concepts together with one incredibly evocative phrase. I've often wondered if this was really as clear cut as the usual story makes out. I took a look at the Google books n-gram data for "white collar crime" and "white-collar crime" (full size graph).

"White Collar Crime" Usage Over Time, 1920-2008

The n-gram data certainly seem to be consistent with the story. The data are far from perfect (we know Google scanned a lot of books but not how they chose them; it seems certain that it was not random; also sometimes they include modern additions or annotations) but the data are always interesting to consider.

I was also curious about whether the term white-collar crime displaced anything else, but that seems not to have been the case, at least in any obvious way. Still, it is interesting to see the rise of the term "financial crime" since the mid-1970s. The persistence of "robber barons" (with its own history) is also fascinating since it is also such an evocative term. (full size graph)

Use of Five Related Terms, 1920-2008

Sutherland's book White Collar Crime was published in 1948 and it is probably fair to attribute to it what is essentially a doubling of the use of the term (if you combine uses with and without the hyphen) at that point. But even before then his work shook things up in the world of sociology and particularly the sociological study of crime. I've written before about how Robert Merton revised his "Social Structure and Anomie" paper in response to Sutherland, and how that revision made the version that appeared in Social Theory and Social Structure so much more powerful. The fact that criminology students are often assigned the 1938 version is maddening to me.

I am really interested in the peak that happened in 1980, which is right around the time that all the federal money that paid part of my way through graduate school and funded the work that led to Crimes of the Middle Classes was awarded. That is around the same time as the Conyers report and the founding of the National White Collar Crime Center.

Overall, at least for now, the story of Sutherland's invention of white collar crime as both a phrase and a form of classification seems to be true.

Steven Weber's The Success of Open Source is a book I read when I first joined the OSM board. There are a lot of books about open source, but Weber's is the one that makes the most serious effort to think about open source from a social science perspective. Which is to say, it makes a serious effort to use somewhat systematic empirical data and to apply a number of theoretical concepts from political economy. In other words, it is very much from my world, and I'd guess almost no one in the open source world has actually read it all the way through. This is just like the fact that I've never read, I don't know, Knuth. I can and do read about code and algorithms and so on (I have no problem reading and understanding most of the mass market books on how to write PHP), but let's be real. There are some books that are for people who have taken the computer science classes that are for computer scientists, and there are some that are for people like me who are code curious. So that's why people who write software read books like The Cathedral and the Bazaar or Dreaming in Code (both important books) that give an anthropology-lite treatment of the open source world, but they aren't really reading serious social science. If nothing else you can tell by the bibliographies.

So I've decided to start rereading Weber's book three years later. When I first read it there were some things I thought he got wrong and some things that I thought he got right.

I'm very interested in the general issue of how people organize themselves to get things done. Sometimes this is done in an intentional and self-conscious manner, but at other times, and much more often, it is shaped by social and institutional forces. For example, early in my time in graduate school there was an important article by two faculty members in my department (Paul DiMaggio and Woody Powell) called "The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields," which helped to clarify this. The short version of the idea is that organizations, especially those in the same general environment or field, often end up very similar to each other. The question is why? Is there just some "natural" way for organizations to form? Or can we understand this as a consequence of social and institutional factors? Of course, answer number two is correct. People in organizations think those organizations just happen or happen solely because of conscious decisions they make, but that kind of thinking represents two extremes of the same wrong approach. The first has no room for people to make decisions. The second overstates the level of autonomy that people have when organizing. So, this line of work helps us understand why open source projects tend to follow one of a few patterns.

Weber identifies four key strategies that open source projects use to "manage complexity among a geographically dispersed community not subject to hierarchical control."  These are "technical design, sanctioning mechanisms, the license as explicit social structure, and formal governance institutions." (172)

Each one of these deserves a careful look, so I'm going to do separate discussions. Taken together they help explain a great deal.