Stata: Using generate to create new variables
06 Jul 2011Generating New Variables
The primary method for creating new variables in Stata is the generate
command. Load the auto dataset.
clear
sysuse auto
describe
New Variable from Existing Variables
Let’s create a new variable that is the sum of weight and length (ignore for the moment that summing weights and lengths doesn’t make a ton of sense). Simple with generate
. The syntax of generate
is:
generate nameOfNewVariable=whateverTheNewVariableIsEqualTo
So to create a new variable called weightlength that is the sum of weight and length we type:
generate weightlength = weight+length
Now we have new variable called weightlength.
Suppose now that we want to create a new variable that is the square of weight.
generate weight2 = weight^2
New Variable that is a Constant
Suppose we want to create a new variable that is a constant value (this isn’t necessarily a good idea and you can use macros to store constants but using a variable can be pretty convenient too). Let’s make a new variable x that is equal to 100.
generate x = 100
Let’s create a new variable that is equal to the mean of weight – we’ll call it meanweight.
summarize weight
generate meanweight = 3019.459
You can also use the results of the summarize
command to create a mean.
summarize weight
generate meanweight = r(mean)
You can use the _N
operator to create a new variable that is equal to the number of observations in a dataset.
generate obs = _N
If you combine this with by
you can create a new variable that will be equal to the number of observations within the levels of the by
variable. For example, we can type:
by foreign: generate obs = _N
This will create a variable that is a constant within the levels of foreign. That is, we are going to get the number of foreign cars and the number of domestic cars. If a line in the data is associated with foreign cars the new obs variable will have a value of 22 and domestic cars will have a value of 52. Give it a try and see how it works.
New Variable that is a Random Draw from a Distribution
We can create a new variable that is a random draw from a distribution. Let’s create a new variable whose values will be random draws from a normal distribution with a mean of 0 and a standard deviation of 1. The random normal generator command is rnormal()
(it defaults to a mean of 0 and standard deviation of 1 and it will draw as many values as there are observations in the dataset).
generate random = rnormal()
Create a New Variable that Indexes the Observations
You can use the _n
operator to create a variable that indexes the observation number.
generate index = _n
This will create a new variable that runs from 1 to 74. You can combine this with by
to create an index within another variable.
by foreign: index = _n
This will create a new variable that runs from 1 to 52 for domestic cars and 1 to 22 for foreign cars.
Conclusion
I’ve just touched on the ways you can create new variables. You can also use the egen
command to create new variables. Try new ways to create variables and be sure to read the help files.