Write your own command to fit a likelihood using a Mata-evaluator with ml command on Stata
Discrete Choice Models Mata Stata mlIf you are reading this you should be quite desperate to get an answer about how to deal with the impossible [holy] trinity of put together a (1) d-family Mata evaluator, within an (2) ado file (a.k.a your own new command) that performs a maximum likelihood estimation using (3) -ml- invoking your Mata evaluator. Let me say that you are probably (?) in the write place.
Indeed, this post is an improved version of a later post where I fitted a conditional logit using -ml- with Mata-based likelihood function. Now, we are a little bit futher and we will write our own command -MyClogit.ado- to perform the same task. In principle, this could sound pretty repetititve and useless, but I can tell you that it does not.
Based on a early error that I faced in the past, maily explained and solved here, I came up with the following solution.
Also, as a warning, this post assumes that you are an advanced user of Stata and Mata, as well as you know how to declare programs and parse using (syntax)[https://www.stata.com/manuals13/psyntax.pdf]. Just for the sake of concreteness, I am not trying to write a replacement for the the excellent book Gould, W., Pitblado, J., & Sribney, W. (2006), nevertheless this is just small cavear that was uncovered on it. Which, to be concrete, is the coexistence between a Mata d-family evaluator within a ado file that uses -ml- to fit the model, and as you can see is a extremely specific topic…
That’s being said, the explanations here are kept the very minumum and restricted only to the relavant part, aka, the creation of mata libraries.
We will use 3 main archives:
Given that I am assuming that you are an advanced user, all the files should be self-evident for you, but a couple of commets are worth to mention:
First, within MyClogit.ado I create a mata library using the line
qui capture do MyLikelihood_LL.mata
this is the whole glue of the three achives. This line, indeed, create a mata library that we will storage our mata evaluator calledMyLikelihood_LL
to be called later using ml in the lineml model d0 MyLikelihood_LL()
. The parenteses is the way to tell -ml- that we are fitting using a Mata evaluator.Second. When I said that
qui capture do MyLikelihood_LL.mata
creates a Mata library it actually happens at the very bottom of the file MyLikelihood_LL.mata. I create a library calledlMyClogit
which have inside our Mata evaluator calledMyLikelihood_LL().
. This is storaged in a file calledlMyClogit.mlib
on your folder.Third and finally. All the three files need to be places in the same directory and you need to declare that you are actually in such a directory when running
MyClogit
. Therefore, in MyClogit_test.do I first set my work enviroment, then create the data and at the very bottom invoke the function obtaining the same results as in my previous post.
MyClogit.ado
program MyClogit
version 12
if replay() {
if ("`e(cmd)'" != "MyClogit") error 301
Replay `0'
}
else Estimate `0'
end
program Estimate, eclass sortpreserve
syntax varlist(fv) [if] [in] , ///
GRoup(varname) ///
[vce(passthru) ///
noLOg ///
Level(cilevel) ///
TRace ///
GRADient ///
HESSian ///
SHOWSTEP ///
ITERate(passthru) ///
TOLerance(passthru) ///
LTOLerance(passthru) ///
GTOLerance(passthru) ///
NRTOLerance(passthru) ///
CONSTraints(passthru) ///
TECHnique(passthru) ///
DIFficult ///
FRom(string) ///
]
di "`constraints'"
local mlopts `trace' `gradient' `hessian' `showstep' `iterate' `tolerance' ///
`ltolerance' `gtolerance' `nrtolerance' `constraints' `technique' `difficult' `from'
if ("`technique'" == "technique(bhhh)") {
di in red "technique(bhhh) is not allowed."
exit 498
}
capture confirm numeric var `group'
if _rc != 0 {
di in r "The group variable must be numeric"
exit 498
}
// check syntax
gettoken lhs rhs : varlist
// mark the estimation sample
marksample touse
markout `touse' `group'
// id of individuals
global MY_panel = "`group'"
// qui create mlibrary with all LL
qui capture do MyLikelihood_LL.mata
// Title
loc title "MyClogit"
loc LL "(MyClogit: `lhs' = `rhs', nocons) "
ml model d0 MyLikelihood_LL() ///
`LL' ///
if `touse', ///
`vce' ///
missing ///
first ///
`log' ///
`mlopts' ///
`init' ///
title(`title') ///
maximize
// Show models
ereturn local cmd MyClogit
Replay , level(`level')
ereturn local cmdline `"`0'"'
end
program Replay
syntax [, Level(cilevel) ]
ml display , level(`level')
end
MyLikelihood_LL.mata
mata:
void MyLikelihood_LL(transmorphic scalar M, real scalar todo,
real rowvector b, real scalar lnf,
real rowvector g, real matrix H)
{
// variables declaration
real matrix panvar
real matrix paninfo
real scalar npanels
real scalar n
real matrix Y
real matrix X
real matrix x_n
real matrix y_n
// regressors and explained variable
Y = moptimize_util_depvar(M, 1)
X= moptimize_init_eq_indepvars(M,1)
// betas
id_beta_eq=moptimize_util_eq_indices(M,1)
betas= b[|id_beta_eq|]
st_view(panvar = ., ., st_global("MY_panel"))
paninfo = panelsetup(panvar, 1)
npanels = panelstats(paninfo)[1]
lnfj = J(npanels, 1, 0)
for(n=1; n <= npanels; ++n) {
x_n = panelsubmatrix(X, n, paninfo)
y_n = panelsubmatrix(Y, n, paninfo)
// Linear utility
util_n =betas :* x_n
util_sum =rowsum(util_n)
U_exp = exp(util_sum)
// Probability of each alternative
p_i = (colsum(U_exp:* y_n)) / (colsum(U_exp))
// Add contribution to the likelihood
lnfj[n] = ln(p_i)
}
lnf = moptimize_util_sum(M, lnfj)
}
end
// Creating a matalibrary that is available EVERYWHERE FOR EVER
mata:
mata mlib create lMyClogit, replace
mata mlib add lMyClogit MyLikelihood_LL()
mata mlib index
end
MyClogit_test.do
clear all
set more off
cd "C:\Users\u0133260\Dropbox\blog_with_smol\content\post\Do_Files\mata_ml_in_ado_form"
version 12
drop _all
set obs 1000
gen id = _n
local n_choices =3
expand `n_choices'
bys id : gen alternative = _n
// Regressors
gen x1 = runiform(-2,2)
gen x2 = runiform(-2,2)
matrix betas_st = (0.5,2)
mata:
void data_gen()
{
betas =st_matrix("betas_st")
st_view(X = ., ., "x*")
st_view(panvar = ., ., "id")
paninfo = panelsetup(panvar, 1)
npanels = panelstats(paninfo)[1]
for(n=1; n <= npanels; ++n) {
x_n = panelsubmatrix(X, n, paninfo)
// Linear utility
util_n =betas :* x_n
util_sum =rowsum(util_n)
U_exp = exp(util_sum)
// Probability of each alternative
p_i = U_exp :/ colsum(U_exp)
cum_p_i =runningsum(p_i)
rand_draws = J(rows(x_n),1,uniform(1,1))
pbb_balance = rand_draws:<cum_p_i
cum_pbb_balance = runningsum(pbb_balance)
choice_n = (cum_pbb_balance:== J(rows(x_n),1,1))
if (n==1) Y =choice_n
else Y=Y \ choice_n
}
resindex = st_addvar("byte","choice")
st_store((1,rows(Y)),resindex,Y)
}
end
// Calling generating data
mata: data_gen()
// Fitting the model
MyClogit choice x* , group(id)
The results
. MyClogit choice x* , group(id)
initial: log likelihood = -1098.6123
alternative: log likelihood = -1096.4835
rescale: log likelihood = -1039.8137
Iteration 0: log likelihood = -1039.8137
Iteration 1: log likelihood = -559.44484
Iteration 2: log likelihood = -535.63127
Iteration 3: log likelihood = -535.23192
Iteration 4: log likelihood = -535.23099
Iteration 5: log likelihood = -535.23099
MyClogit Number of obs = 3,000
Wald chi2(2) = 407.96
Log likelihood = -535.23099 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
choice | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .4915744 .0523072 9.40 0.000 .3890543 .5940946
x2 | 1.951068 .0970573 20.10 0.000 1.760839 2.141297
------------------------------------------------------------------------------
References
- Gould, W., Pitblado, J., & Sribney, W. (2006).. Maximum likelihood estimation with Stata. Stata press.
Here are some deadends that I faced when I was trying to solve this problem. They are also really helpful for further discussion and reading.