======================================
===== Álvaro A. Gutiérrez-Vargas =====
======================================

Are my labels there? Searching among label variables values on Stata

Stata ENG

The problem

This brief post answer the question my orignal google search How to check if a value is contained into a label variable on stata?

Imagine we have a problem where we need to check if a string is contained into the labels of another variable. Let’s start with a minimal example using sysuse auto and the variable foreing which is the only one with labels already in place.

clear all
sysuse auto, clear
tab foreign 
tab foreign ,nola

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

The solution

The simple answer to the question is contained in the following lines. For example, if we search for an existing label like Domestic. We can see that 52 observations present such a label.

sum foreign if foreign =="Domestic":`: value label foreign' 

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     foreign |         52           0           0          0          0

However, if we seach for a label that is not present we obtain the following.

sum foreign if foreign =="Borges":`: value label foreign' 
(value label dereference "Borges":origin not found)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     foreign |          0

Up to here the question is already answered, it was just a combination of summarize and a clever way to search among labels.

The explanation

Well, let’s start breaking down the syntax used above in small pieces. Take a look on the following lines. Fist, label list will list all the existent labels, where the only existent one is called origin. Therefore, when we repeate the summarize operation replacing `: value label foreign' by origin we, indeed, obtain the same results, meaning that learn that the the expression among quotes is no more than a way to call the name of the label attached to the variable foreign. To know more about such expressions that calculates results on the fly (N.Cox 2008)

label list
origin:
           0 Domestic
           1 Foreign

. sum foreign  if foreign =="Domestic":origin

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     foreign |         52           0           0          0          0

The fancy solution

Now let’s write a program that can check the existence of a string among the labels of a variable for us. We will use a combination of summarize, capture and confirm.

The following program (check_labels) will check if the string that we type on the option , label() is present or not in the labels of a variable.

syntax varlist(max=1) [if] , label(string) is declaring that the program will just allows one variable and one mandatory option called label() that is going to be read as a string by the program.

After, we use quietly sum calling the options from syntax in the same fashion as the first example, and adding the option ,nomean to speed up the process because it won’t compute the variance (probably a better name could be novariance tho).

Finally, we use confirm to check if the value was or nor present into the label. We will get a 7 when it’s not present and a 0 when the label is actually present storage in _rc. See help confirm. After we can storage into a scalar called present a value of 1 when the the string was present and 0 if it wasn’t.

capture drop program check_labels

program define check_labels ,rclass
    version 12.0
    syntax varlist(max=1) [if] , label(string)              

    quietly sum `varlist' if  `varlist' == "`label'": `: value label `varlist'' ,nomean
    
    local mean_altern = r(mean)
    capture confirm number `mean_altern'
    if _rc ==7 {
        di in red " `label' isn't present on labels of var `varlist' "
        return scalar present = 0
    }
    else if _rc ==0 {
        di in red " `label' is present on labels of var `varlist' "
        return scalar present = 1
    } 
end

To conclude, using can invoke the program check_labels as follows.

check_labels foreign , label(Domestic)
 Domestic is present on labels of var foreign 

ret list

scalars:
            r(present) =  1
check_labels foreign , label(Borges)
 Borges isn't present on labels of var foreign 

ret list
scalars:
            r(present) =  0

Resourses that inspired me to post this.