======================================
===== Álvaro A. Gutiérrez-Vargas =====
======================================

Write your own command to fit a likelihood using a Mata-evaluator with ml command on Stata

Discrete Choice Models Mata Stata ml

If you are reading this you should be quite desperate to get an answer about how to deal with the impossible [holy] trinity of put together a (1) d-family Mata evaluator, within an (2) ado file (a.k.a your own new command) that performs a maximum likelihood estimation using (3) -ml- invoking your Mata evaluator. Let me say that you are probably (?) in the write place.

Indeed, this post is an improved version of a later post where I fitted a conditional logit using -ml- with Mata-based likelihood function. Now, we are a little bit futher and we will write our own command -MyClogit.ado- to perform the same task. In principle, this could sound pretty repetititve and useless, but I can tell you that it does not.

Based on a early error that I faced in the past, maily explained and solved here, I came up with the following solution.

Also, as a warning, this post assumes that you are an advanced user of Stata and Mata, as well as you know how to declare programs and parse using (syntax)[https://www.stata.com/manuals13/psyntax.pdf]. Just for the sake of concreteness, I am not trying to write a replacement for the the excellent book Gould, W., Pitblado, J., & Sribney, W. (2006), nevertheless this is just small cavear that was uncovered on it. Which, to be concrete, is the coexistence between a Mata d-family evaluator within a ado file that uses -ml- to fit the model, and as you can see is a extremely specific topic…

That’s being said, the explanations here are kept the very minumum and restricted only to the relavant part, aka, the creation of mata libraries.

We will use 3 main archives:

Given that I am assuming that you are an advanced user, all the files should be self-evident for you, but a couple of commets are worth to mention:

  • First, within MyClogit.ado I create a mata library using the line qui capture do MyLikelihood_LL.mata this is the whole glue of the three achives. This line, indeed, create a mata library that we will storage our mata evaluator called MyLikelihood_LL to be called later using ml in the line ml model d0 MyLikelihood_LL(). The parenteses is the way to tell -ml- that we are fitting using a Mata evaluator.

  • Second. When I said that qui capture do MyLikelihood_LL.mata creates a Mata library it actually happens at the very bottom of the file MyLikelihood_LL.mata. I create a library called lMyClogit which have inside our Mata evaluator called MyLikelihood_LL().. This is storaged in a file called lMyClogit.mlib on your folder.

  • Third and finally. All the three files need to be places in the same directory and you need to declare that you are actually in such a directory when running MyClogit. Therefore, in MyClogit_test.do I first set my work enviroment, then create the data and at the very bottom invoke the function obtaining the same results as in my previous post.

MyClogit.ado

program MyClogit 
    version 12

    if replay() {
    if ("`e(cmd)'" != "MyClogit") error 301
    Replay `0'
    }
    else Estimate `0'
end


program Estimate, eclass sortpreserve 
    syntax varlist(fv) [if] [in] ,      ///
            GRoup(varname)              ///
            [vce(passthru)              ///
            noLOg                       /// 
            Level(cilevel)              ///
            TRace                       ///
            GRADient                    ///
            HESSian                     ///
            SHOWSTEP                    ///
            ITERate(passthru)           ///
            TOLerance(passthru)         ///
            LTOLerance(passthru)        ///
            GTOLerance(passthru)        ///
            NRTOLerance(passthru)       ///
            CONSTraints(passthru)       ///
            TECHnique(passthru)         ///
            DIFficult                   ///
            FRom(string)                /// 
    ]

    
    
    di "`constraints'"
    
    local mlopts `trace' `gradient' `hessian' `showstep' `iterate' `tolerance' ///
    `ltolerance' `gtolerance' `nrtolerance' `constraints' `technique' `difficult' `from'
    
    if ("`technique'" == "technique(bhhh)") {
    di in red "technique(bhhh) is not allowed."
    exit 498
    }

    capture confirm numeric var `group'
    if _rc != 0 {
        di in r "The group variable must be numeric"
        exit 498
    }

    // check syntax
    gettoken lhs rhs : varlist
    
    // mark the estimation sample
    marksample touse
    markout `touse' `group' 
    
    // id of individuals
    global MY_panel = "`group'"
    
    // qui create mlibrary with all LL
    qui capture do MyLikelihood_LL.mata 
    
    // Title
    loc title "MyClogit"

    loc LL  "(MyClogit: `lhs' = `rhs', nocons) " 
    
    ml model d0 MyLikelihood_LL()           ///
            `LL'                                          ///   
            if `touse',                             ///
            `vce'                                       ///
            missing                                     ///
            first                                       ///
            `log'                                         ///
            `mlopts'                                    ///     
            `init'                                      /// 
            title(`title')                          ///
            maximize                
            
        // Show models
        ereturn local cmd MyClogit
        Replay , level(`level') 
        ereturn local  cmdline `"`0'"'  
    
end


 program Replay
    syntax [, Level(cilevel) ]
    ml display , level(`level')  
 end

MyLikelihood_LL.mata

mata:
    void MyLikelihood_LL(transmorphic scalar M, real scalar todo,
    real rowvector b, real scalar lnf,
    real rowvector g, real matrix H)
 {

    // variables declaration    
    real matrix panvar  
    real matrix paninfo 
    real scalar npanels 

    real scalar n 
    real matrix Y 
    real matrix X 
    
    real matrix x_n 
    real matrix y_n 
    
    
    
    // regressors and explained variable
    Y = moptimize_util_depvar(M, 1)     
    X= moptimize_init_eq_indepvars(M,1) 
    
    // betas
    id_beta_eq=moptimize_util_eq_indices(M,1)
    betas= b[|id_beta_eq|]
    
 
    st_view(panvar = ., ., st_global("MY_panel"))
    paninfo = panelsetup(panvar, 1)     
    npanels = panelstats(paninfo)[1] 
    lnfj = J(npanels, 1, 0) 

    for(n=1; n <= npanels; ++n) {

        x_n = panelsubmatrix(X, n, paninfo) 
        y_n = panelsubmatrix(Y, n, paninfo) 
        
        // Linear utility
        util_n =betas :* x_n
        util_sum =rowsum(util_n) 
        U_exp = exp(util_sum)
        // Probability of each alternative
        p_i =     (colsum(U_exp:* y_n)) / (colsum(U_exp))
        // Add contribution to the likelihood
        lnfj[n] = ln(p_i)
    
    }
    
    lnf = moptimize_util_sum(M, lnfj)
 }
    
end
    
    
// Creating a matalibrary that is available EVERYWHERE FOR EVER
mata:

mata mlib create lMyClogit, replace
mata mlib add    lMyClogit MyLikelihood_LL()
mata mlib index

end

MyClogit_test.do


clear all
set more off

cd "C:\Users\u0133260\Dropbox\blog_with_smol\content\post\Do_Files\mata_ml_in_ado_form"

version 12
drop _all
set obs 1000
gen id = _n
local n_choices =3
expand `n_choices'
bys id : gen alternative = _n

// Regressors
gen x1 =  runiform(-2,2)
gen x2 =  runiform(-2,2)
matrix betas_st = (0.5,2)

mata: 
void data_gen() 
{
        betas =st_matrix("betas_st")

        st_view(X = ., ., "x*")
        st_view(panvar = ., ., "id")
        paninfo = panelsetup(panvar, 1)     
        npanels = panelstats(paninfo)[1] 
        


        for(n=1; n <= npanels; ++n) { 

            x_n = panelsubmatrix(X, n, paninfo) 
            // Linear utility
            util_n =betas :* x_n
            util_sum =rowsum(util_n) 
            U_exp = exp(util_sum)
            // Probability of each alternative
            p_i =     U_exp :/ colsum(U_exp)
                
            cum_p_i =runningsum(p_i)    
            rand_draws = J(rows(x_n),1,uniform(1,1)) 
            pbb_balance = rand_draws:<cum_p_i
            cum_pbb_balance = runningsum(pbb_balance)
            choice_n = (cum_pbb_balance:== J(rows(x_n),1,1))

            
            if (n==1)     Y =choice_n   
            else Y=Y \ choice_n 
            
         }
    
        resindex = st_addvar("byte","choice")
        st_store((1,rows(Y)),resindex,Y)    
}
end


// Calling generating data
mata: data_gen()

// Fitting the model
MyClogit choice  x* , group(id) 

The results

. MyClogit choice  x* , group(id) 


initial:       log likelihood = -1098.6123
alternative:   log likelihood = -1096.4835
rescale:       log likelihood = -1039.8137
Iteration 0:   log likelihood = -1039.8137  
Iteration 1:   log likelihood = -559.44484  
Iteration 2:   log likelihood = -535.63127  
Iteration 3:   log likelihood = -535.23192  
Iteration 4:   log likelihood = -535.23099  
Iteration 5:   log likelihood = -535.23099  

MyClogit                                        Number of obs     =      3,000
                                                Wald chi2(2)      =     407.96
Log likelihood = -535.23099                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .4915744   .0523072     9.40   0.000     .3890543    .5940946
          x2 |   1.951068   .0970573    20.10   0.000     1.760839    2.141297
------------------------------------------------------------------------------