RooFit - Creating & Fitting Models - Working with Workspaces

Statistics can be very particular about its use of language. We begin with some definitions for a few basic concepts.

  • observable - something you measure in an experiment, for example a particle's momentum. Often, a function of measured quantities, for example an invariant mass of several particles
  • global observable or auxiliary observable - an observable from another measurement, for example, the integrated luminosity
  • model - a set of probability density functions (PDFs), which describe distributions of the observables or functions of observables
  • model parameter - any variable in your PDF expression, which is not an observable
  • parameter of interest (POI) - a model parameter that you study, for example a cross section of your signal process
  • nuisance parameter - every other model parameter, which is not your parameter of interest
  • data or dataset - a set of values of observables, either measured in an experiment, or simulated
  • likelihood - a model computed for a particular dataset
  • hypothesis - a particular model, with specified observables, POI, nuisance parameters, and prior PDFs (in case of Bayesian inference)
  • prior PDF - a probability density for an observable or a model parameter, which is known a priori, i.e. before a measurement is considered. This is a Bayesian concept exclusively. Prior has no meaning or place in a frequentist type of inference
  • Bayesian - a type of statistical inference that usually produces probability of the hypothesis given the data. Requires a prior.
  • frequentist - a type of statistical inference that usually produces probability of the data given the hypothesis.
In [1]:
import ROOT
%jsroot on
Welcome to JupyROOT 6.07/07

Creating and fitting a model

The aim of the RooFit exercises is to learn how to build a RooFit model, how to fit the model on a set of data that we will generate and how to plot the result. We will also learn how to save the full RooFit workspace containing the model and data in a file which we can read back later, for example when we will do the RooStats exercises. We will learn also how to generate the data from the model. The same generated data we will then use to fit the model

In previous exercises we bypassed the workspace functionality. This allows us to save the statistical model and use it later. This is useful for us in case we wish to use it later. (in the next notebook)

In [2]:
w = ROOT.RooWorkspace("w")
w.factory("Gaussian:pdf(x[-10,10],mu[1,-1000,1000],sigma[2,0,1000])")
Out[2]:
<ROOT.RooGaussian object ("pdf") at 0x609a140>
RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

We extract some parameters from this workspace and generate some data. Remember that RooStats has trouble overloading the generate function so we need to explicitly define x as a RooArgSet.

In [3]:
pdf = w.pdf("pdf")
x = w.var("x")
n = 1000
data = pdf.generate(ROOT.RooArgSet(x),n)
data.SetName("data")

Already we arrive at the second of two major problems with RooStats/RooFit in python. The first (above) is function overloading. The last is seen here when we would like to use a simple Title definition that is usually obtained from the RooFit name space. (odd that it's RooFit and not RooStats especially when I would like to demonstrate the differences between the two libraries)

In [4]:
pl = x.frame(ROOT.RooFit.Title("Gaussian Fit"))

This is the first time we use the canvas. Never the less it's useful to make a new TCanvas each time we plot.

In [5]:
c = ROOT.TCanvas()
data.plotOn(pl)
pl.Draw()
c.Draw()
In [6]:
pdf.fitTo(data, ROOT.RooFit.Minimizer("Minutit2","Migrad"))
Out[6]:
<ROOT.RooFitResult object at 0x(nil)>
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
 **********
 **    1 **SET PRINT           1
 **********
 **********
 **    2 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 mu           1.00000e+00  2.00000e+02   -1.00000e+03  1.00000e+03
     2 sigma        2.00000e+00  1.00000e+00    0.00000e+00  1.00000e+03
 **********
 **    3 **SET ERR         0.5
 **********
 **********
 **    4 **SET PRINT           1
 **********
 **********
 **    5 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **    6 **MIGRAD        1000           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=2151.01 FROM MIGRAD    STATUS=INITIATE       10 CALLS          11 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  mu           1.00000e+00   2.00000e+02   2.01358e-01  -1.34087e+04
   2  sigma        2.00000e+00   1.00000e+00   2.31716e-02  -1.74083e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=2149.23 FROM MIGRAD    STATUS=CONVERGED      35 CALLS          36 TOTAL
                     EDM=1.62584e-07    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  mu           1.05370e+00   6.56499e-02   2.09766e-06   5.75979e+00
   2  sigma        2.07587e+00   4.64543e-02   3.25621e-05  -1.37633e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  2    ERR DEF=0.5
  4.310e-03  3.785e-06 
  3.785e-06  2.158e-03 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2
        1  0.00124   1.000  0.001
        2  0.00124   0.001  1.000
 **********
 **    7 **SET PRINT           1
 **********
 **********
 **    8 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 mu           1.05370e+00  6.56499e-02   -1.00000e+03  1.00000e+03
     2 sigma        2.07587e+00  4.64543e-02    0.00000e+00  1.00000e+03
 **********
 **    9 **SET ERR         0.5
 **********
 **********
 **   10 **SET PRINT           1
 **********
 **********
 **   11 **HESSE        1000
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=2149.23 FROM HESSE     STATUS=OK             12 CALLS          13 TOTAL
                     EDM=1.62475e-07    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  mu           1.05370e+00   6.56501e-02   6.56500e-06   1.05370e-03
   2  sigma        2.07587e+00   4.64544e-02   1.02072e-04  -1.47964e+00
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  2    ERR DEF=0.5
  4.310e-03  8.434e-06 
  8.434e-06  2.158e-03 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2
        1  0.00277   1.000  0.003
        2  0.00277   0.003  1.000
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
Warning in <ROOT::Math::FitConfig::CreateMinimizer>: Could not create the Minutit2 minimizer. Try using the minimizer Minuit
Warning in <ROOT::Math::FitConfig::CreateMinimizer>: Could not create the Minutit2 minimizer. Try using the minimizer Minuit
In [7]:
c = ROOT.TCanvas()
data.plotOn(pl)
pdf.plotOn(pl)
pdf.paramOn(pl, ROOT.RooFit.Layout(0.6,0.9,0.85))
pl.Draw()
c.Draw()

Now we mustn't forget to save this data to the workspace! However since import is a protected keyword it cannot be used as a method. Luckily we have a workaround.

In [8]:
getattr(w,'import')(data)
Out[8]:
False
[#1] INFO:ObjectHandling -- RooWorkspace::import(w) importing dataset data

This first workspace only has two parameters mu and sigma. We can easily add a third normalisation parameter for the number of events. Note that jupyter stores the workspace object in memory. So we can should define the workspace ' fresh'

This requires us to move from a likelihood to an extended likelihood formalism as described in the SignalAndBackgroundModels notebook.

In [9]:
w = ROOT.RooWorkspace("w")
w.factory("Gaussian:gaus(x[-10,10],mu[1,-1000,1000],sigma[2,0,1000])")
w.factory("ExtendPdf:pdf(gaus, nevt[100,0,100000])")
w.Print()
RooWorkspace(w) w contents

variables
---------
(mu,nevt,sigma,x)

p.d.f.s
-------
RooGaussian::gaus[ x=x mean=mu sigma=sigma ] = 0.882497
RooExtendPdf::pdf[ pdf=gaus n=nevt ] = 0.882497

In [10]:
pdf = w.pdf("pdf")
x = w.var("x")
In [11]:
pdf.fitTo(data, ROOT.RooFit.Minimizer("Minutit2","Migrad"))
Out[11]:
<ROOT.RooFitResult object at 0x(nil)>
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
 **********
 **    1 **SET PRINT           1
 **********
 **********
 **    2 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 mu           1.00000e+00  2.00000e+02   -1.00000e+03  1.00000e+03
     2 nevt         1.00000e+02  5.00000e+01    0.00000e+00  1.00000e+05
     3 sigma        2.00000e+00  1.00000e+00    0.00000e+00  1.00000e+03
 **********
 **    3 **SET ERR         0.5
 **********
 **********
 **    4 **SET PRINT           1
 **********
 **********
 **    5 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **    6 **MIGRAD        1500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-2354.16 FROM MIGRAD    STATUS=INITIATE       14 CALLS          15 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  mu           1.00000e+00   2.00000e+02   2.01358e-01  -1.34087e+04
   2  nevt         1.00000e+02   5.00000e+01   1.63770e-02  -2.84463e+04
   3  sigma        2.00000e+00   1.00000e+00   2.31716e-02  -1.74083e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-3758.52 FROM MIGRAD    STATUS=CONVERGED      69 CALLS          70 TOTAL
                     EDM=9.78851e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  mu           1.05418e+00   6.56411e-02   2.78374e-06   1.16746e+02
   2  nevt         1.00002e+03   3.16232e+01   1.34409e-04   2.11506e-01
   3  sigma        2.07558e+00   4.64401e-02   4.33295e-05  -6.11053e+00
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  4.309e-03  5.273e-10  5.488e-06 
  5.273e-10  1.000e+03  3.726e-10 
  5.488e-06  3.726e-10  2.157e-03 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00180   1.000  0.000  0.002
        2  0.00000   0.000  1.000  0.000
        3  0.00180   0.002  0.000  1.000
 **********
 **    7 **SET PRINT           1
 **********
 **********
 **    8 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 mu           1.05418e+00  6.56411e-02   -1.00000e+03  1.00000e+03
     2 nevt         1.00002e+03  3.16232e+01    0.00000e+00  1.00000e+05
     3 sigma        2.07558e+00  4.64401e-02    0.00000e+00  1.00000e+03
 **********
 **    9 **SET ERR         0.5
 **********
 **********
 **   10 **SET PRINT           1
 **********
 **********
 **   11 **HESSE        1500
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-3758.52 FROM HESSE     STATUS=OK             16 CALLS          17 TOTAL
                     EDM=9.78147e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  mu           1.05418e+00   6.56413e-02   6.56411e-06   1.05418e-03
   2  nevt         1.00002e+03   3.16232e+01   3.17862e-04  -1.37046e+00
   3  sigma        2.07558e+00   4.64401e-02   1.02047e-04  -1.47965e+00
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  4.309e-03  2.916e-13  9.418e-06 
  2.916e-13  1.000e+03  6.677e-11 
  9.418e-06  6.677e-11  2.157e-03 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00309   1.000  0.000  0.003
        2  0.00000   0.000  1.000  0.000
        3  0.00309   0.003  0.000  1.000
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
Warning in <ROOT::Math::FitConfig::CreateMinimizer>: Could not create the Minutit2 minimizer. Try using the minimizer Minuit
Warning in <ROOT::Math::FitConfig::CreateMinimizer>: Could not create the Minutit2 minimizer. Try using the minimizer Minuit
In [12]:
c = ROOT.TCanvas()
data.plotOn(pl)
pdf.plotOn(pl)
pdf.paramOn(pl, ROOT.RooFit.Layout(0.6,0.9,0.85))
pl.Draw()
c.Draw()

This however is an unbinned likelihood fit. Usually our data is binned.

In [13]:
binneddata = pdf.generateBinned(ROOT.RooArgSet(x),n)
binneddata.SetName("binneddata")
In [14]:
pdf.fitTo(binneddata, ROOT.RooFit.Minimizer("Minutit2","Migrad"))
Out[14]:
<ROOT.RooFitResult object at 0x(nil)>
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
 **********
 **   12 **SET PRINT           1
 **********
 **********
 **   13 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 mu           1.05418e+00  6.56413e-02   -1.00000e+03  1.00000e+03
     2 nevt         1.00002e+03  3.16232e+01    0.00000e+00  1.00000e+05
     3 sigma        2.07558e+00  4.64401e-02    0.00000e+00  1.00000e+03
 **********
 **   14 **SET ERR         0.5
 **********
 **********
 **   15 **SET PRINT           1
 **********
 **********
 **   16 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **   17 **MIGRAD        1500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-3765.72 FROM MIGRAD    STATUS=INITIATE        6 CALLS          24 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  mu           1.05418e+00   6.56413e-02   6.56413e-05  -9.30683e+03
   2  nevt         1.00002e+03   3.16232e+01   3.17861e-03   2.10032e-01
   3  sigma        2.07558e+00   4.64401e-02   1.02047e-03   3.09346e+02
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-3765.96 FROM MIGRAD    STATUS=CONVERGED      31 CALLS          49 TOTAL
                     EDM=1.74039e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  mu           1.09428e+00   6.51695e-02   2.78180e-06   2.49079e+00
   2  nevt         1.00000e+03   3.16227e+01   1.34690e-04   1.53096e-03
   3  sigma        2.06068e+00   4.61184e-02   4.36387e-05   4.09876e+00
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  4.247e-03 -5.181e-10  4.359e-06 
 -5.181e-10  1.000e+03 -5.317e-13 
  4.359e-06 -5.317e-13  2.127e-03 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00145   1.000 -0.000  0.001
        2  0.00000  -0.000  1.000 -0.000
        3  0.00145   0.001 -0.000  1.000
 **********
 **   18 **SET PRINT           1
 **********
 **********
 **   19 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 mu           1.09428e+00  6.51695e-02   -1.00000e+03  1.00000e+03
     2 nevt         1.00000e+03  3.16227e+01    0.00000e+00  1.00000e+05
     3 sigma        2.06068e+00  4.61184e-02    0.00000e+00  1.00000e+03
 **********
 **   20 **SET ERR         0.5
 **********
 **********
 **   21 **SET PRINT           1
 **********
 **********
 **   22 **HESSE        1500
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-3765.96 FROM HESSE     STATUS=OK             18 CALLS          19 TOTAL
                     EDM=1.73698e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  mu           1.09428e+00   6.51697e-02   6.51695e-06   1.09428e-03
   2  nevt         1.00000e+03   3.16227e+01   3.17860e-04  -1.37046e+00
   3  sigma        2.06068e+00   4.61184e-02   1.01705e-04  -1.47998e+00
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  3    ERR DEF=0.5
  4.247e-03  9.396e-11  8.216e-06 
  9.396e-11  1.000e+03  6.649e-11 
  8.216e-06  6.649e-11  2.127e-03 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3
        1  0.00273   1.000  0.000  0.003
        2  0.00000   0.000  1.000  0.000
        3  0.00273   0.003  0.000  1.000
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
Warning in <ROOT::Math::FitConfig::CreateMinimizer>: Could not create the Minutit2 minimizer. Try using the minimizer Minuit
Warning in <ROOT::Math::FitConfig::CreateMinimizer>: Could not create the Minutit2 minimizer. Try using the minimizer Minuit
In [15]:
pl = x.frame(ROOT.RooFit.Title("Binned Data"))
c = ROOT.TCanvas()
binneddata.plotOn(pl)
pdf.plotOn(pl)
pdf.paramOn(pl, ROOT.RooFit.Layout(0.6,0.9,0.85))
pl.Draw()
c.Draw()

Fitting Signal Over Exponential Background

This section recaps the excecises seen in BuildCompositeModel. Here we make two p.d.f, one representing a signal and one a background distributions. We want to determine the number of signal events. For this we need to perform an extended maximum likelihood fit, where the signal events is one of the fit parameter. We will create the composite model formed by a Gaussian signal over a falling exponential background. We will have ~ 100 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. We assume that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2.

In [16]:
nsig = 100
nbkg = 1000

This workspace we will use later so it's best to create it in full now.
Here the model is defined as being an extended pdf formed from the exponential and gaussian pdfs

In [17]:
w = ROOT.RooWorkspace("w")
w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-2,-0.2])")
w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])")
w.factory("SUM:model(nsig[0,10000]*sig_pdf, nbkg[0,10000]*bkg_pdf)")
Out[17]:
<ROOT.RooAddPdf object ("model") at 0x70a2740>

Once again we retrieve our variables from the workspace.
'x' here is the observable that we're interested in.

In [18]:
pdf = w.pdf("model")
x = w.var("x")

We set the desired value of the signal and background events.

In [19]:
w.var("nsig").setVal(nsig)
w.var("nbkg").setVal(nbkg)
w.Print()
RooWorkspace(w) w contents

variables
---------
(a,mass,nbkg,nsig,sigma,x)

p.d.f.s
-------
RooExponential::bkg_pdf[ x=x c=a ] = 0.082085
RooAddPdf::model[ nsig * sig_pdf + nbkg * bkg_pdf ] = 0.0746227
RooGaussian::sig_pdf[ x=x mean=mass sigma=sigma ] = 1.92875e-22

Let's generate us some data!

If not defined generate will generate according to the total S+B events

(NB RooStats has a random number generator in it's core functions. This can have a random seed set. This isn't implemented in python yet but also isn't hugely important unless you want to generate the same 'random' numbers every time.)

In [20]:
x.setBins(50)
data = pdf.generate(ROOT.RooArgSet(x))
data.SetName("data")
getattr(w,'import')(data)
data.Print()
[#1] INFO:ObjectHandling -- RooWorkspace::import(w) importing dataset data
RooDataSet::data[x] = 1100 entries
In [21]:
c = ROOT.TCanvas()
plot = x.frame(ROOT.RooFit.Title("Gaussian Signal over Exponential Background"))
data.plotOn(plot)
plot.Draw()
c.Draw()

Time to fit. This time we assign the fit result to a variable that we can access later.

In [22]:
r = pdf.fitTo(data, ROOT.RooFit.Save(True), ROOT.RooFit.Minimizer("Minuit2","Migrad"))
r.Print()
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (sig_pdf)
[#1] INFO:Minization --  The following expressions will be evaluated in cache-and-track mode: (bkg_pdf)
Minuit2Minimizer: Minimize with max-calls 1500 convergence for edm < 1 strategy 1
MnSeedGenerator: for initial parameters FCN = -4800.709155313
MnSeedGenerator: Initial state:   - FCN =  -4800.709155313 Edm =     0.157376 NCalls =     13
VariableMetric: start iterating until Edm is < 0.001
VariableMetric: Initial state   - FCN =  -4800.709155313 Edm =     0.157376 NCalls =     13
VariableMetric: Iteration #   0 - FCN =  -4800.709155313 Edm =     0.157376 NCalls =     13
VariableMetric: Iteration #   1 - FCN =  -4800.894381229 Edm =    0.0246325 NCalls =     21
VariableMetric: Iteration #   2 - FCN =  -4800.920688773 Edm =  0.000295129 NCalls =     28
VariableMetric: After Hessian   - FCN =  -4800.920688773 Edm =  0.000309086 NCalls =     44
VariableMetric: Iteration #   3 - FCN =  -4800.920688773 Edm =  0.000309086 NCalls =     44
Minuit2Minimizer : Valid minimum - status = 0
FVAL  = -4800.9206887733335
Edm   = 0.000309086173302367386
Nfcn  = 44
a	  = -0.502626	 +/-  0.0173349	(limited)
nbkg	  = 1011.71	 +/-  35.7239	(limited)
nsig	  = 88.2408	 +/-  18.7975	(limited)
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization

  RooFitResult: minimized FCN value: -4800.92, estimated distance to minimum: 0.000309153
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter    FinalValue +/-  Error   
  --------------------  --------------------------
                     a   -5.0263e-01 +/-  1.73e-02
                  nbkg    1.0117e+03 +/-  3.57e+01
                  nsig    8.8241e+01 +/-  1.88e+01

Info in <Minuit2>: Minuit2Minimizer::Hesse : Hesse is valid - matrix is accurate
In [23]:
c = ROOT.TCanvas()
plot = x.frame(ROOT.RooFit.Title("Fit Over Composite"))
data.plotOn(plot)
pdf.plotOn(plot)
pdf.plotOn(plot, ROOT.RooFit.Components("bkg_pdf"), ROOT.RooFit.LineStyle(2) )
pdf.plotOn(plot, ROOT.RooFit.Components("sig_pdf"), ROOT.RooFit.LineColor(2), ROOT.RooFit.LineStyle(2) )

pdf.paramOn(plot,ROOT.RooFit.Layout(0.5,0.9,0.85))

plot.Draw()
c.Draw()
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (bkg_pdf)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: ()
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (sig_pdf)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: ()

Now in addition to the data and pdfs in the Workspace we should also include a map to what these things correspond to. This is done via a ModelConfig.

This might seem pointless but it's really really important when trying to retrieve information from these workspaces.

In [24]:
mc = ROOT.RooStats.ModelConfig("ModelConfig",w)
mc.SetPdf(pdf)
mc.SetParametersOfInterest(ROOT.RooArgSet(w.var("nsig")))
mc.SetSnapshot(ROOT.RooArgSet(w.var("nsig")))
mc.SetObservables(ROOT.RooArgSet(w.var("x")))
w.defineSet("nuisParams","a,nbkg")
nuis = getattr(w,'set')("nuisParams")
mc.SetNuisanceParameters(nuis)
getattr(w,'import')(mc)
Out[24]:
False

Now that this is added to the workspace we can save it to file.

In [25]:
w.writeToFile("output/GausExpModel.root",True)
Out[25]:
False

Simultaneous Fitting

Often we have to fit to two or more channels simultaneously. We extend our RooSimultaneous p.d.f. using a combined data set, which can be fit for one (or more) common parameters.

  • The first channel will be similar to the one of the previous exercise. A Gaussian signal over an Exponential decay background. We assume as before that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2. We assume we have 1000 background events.

  • The second channel will be similar to the first one, we will have the same signal model, but a different background model. We assume we have less and a flatter background. We have an exponential decay of 4 and 100 background events in total.

  • We assume the number of signal events in the first channel is 50 and in the second channel is 30. We introduce a common rate parameter mu expressing the signal strength.

We will follow the same steps as before:

  • Create the models for the two channels
  • Create a combined model using simultaneous pdf (class RooSimultaneous).
  • Generate a combined data set, fit and plot the result
In [26]:
w = ROOT.RooWorkspace("w")

We add each model to the workspace. Starting with the first extended model.

In [27]:
w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])")
w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])")
w.factory("prod:nsig1(mu[1,0,5],xsec1[50])")
w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)")
Out[27]:
<ROOT.RooAddPdf object ("model1") at 0x7c960a0>

and now the second extended model

In [28]:
w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])")
w.factory("prod:nsig2(mu,xsec2[30])")
w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)")
Out[28]:
<ROOT.RooAddPdf object ("model2") at 0x7d5c040>

and now we join them using a 'jointModel'

In [29]:
w.factory("index[channel1,channel2]")
w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)")
Out[29]:
<ROOT.RooSimultaneous object ("jointModel") at 0x7de59d0>

We now extract our objects to work on.

x is once again the observable index is now the category (or channel)

the index also must be added to the fit as an observable later...

In [30]:
pdf = w.pdf("jointModel")
x = w.var("x")
index = w.cat("index")

now we generate some binned data. 100 bins is the default.

In [31]:
x.setBins(50)
data = pdf.generate(ROOT.RooArgSet(x,index))
data.SetName("data")
getattr(w,'import')(data)
data.Print()
[#1] INFO:ObjectHandling -- RooWorkspace::import(w) importing dataset data
RooDataSet::data[x,index] = 1180 entries

Soon we will fit to the data. First let's look at the data we are fitting.

In [32]:
c = ROOT.TCanvas()
plot1 = x.frame(ROOT.RooFit.Title("Channel 1"))
plot2 = x.frame(ROOT.RooFit.Title("Channel 2"))
data.plotOn(plot1,ROOT.RooFit.Cut("index==index::channel1"))
data.plotOn(plot2,ROOT.RooFit.Cut("index==index::channel2"))
c.Divide(1,2)
c.cd(1)
plot1.Draw()
c.cd(2)
plot2.Draw()
c.Draw()
[#1] INFO:Plotting -- RooTreeData::plotOn: plotting 1053 events out of 1180 total events
[#1] INFO:Plotting -- RooTreeData::plotOn: plotting 127 events out of 1180 total events

And the fit. We save it again and print the save.

In [33]:
r = pdf.fitTo(data, ROOT.RooFit.Save(True), ROOT.RooFit.Minimizer("Minuit2","Migrad"))
r.Print()
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
RooAbsTestStatistic::initSimMode: creating slave calculator #0 for state channel1 (1053 dataset entries)
RooAbsTestStatistic::initSimMode: creating slave calculator #1 for state channel2 (127 dataset entries)
[#1] INFO:Fitting -- RooAbsTestStatistic::initSimMode: created 2 slave calculators.
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (sig_pdf)
[#1] INFO:Minization --  The following expressions will be evaluated in cache-and-track mode: (bkg1_pdf)
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (sig_pdf)
[#1] INFO:Minization --  The following expressions will be evaluated in cache-and-track mode: (bkg2_pdf)
Minuit2Minimizer: Minimize with max-calls 2500 convergence for edm < 1 strategy 1
MnSeedGenerator: for initial parameters FCN = -3980.572707488
MnSeedGenerator: Initial state:   - FCN =  -3980.572707488 Edm =      3.64239 NCalls =     21
VariableMetric: start iterating until Edm is < 0.001
VariableMetric: Initial state   - FCN =  -3980.572707488 Edm =      3.64239 NCalls =     21
VariableMetric: Iteration #   0 - FCN =  -3980.572707488 Edm =      3.64239 NCalls =     21
VariableMetric: Iteration #   1 - FCN =  -3982.048070012 Edm =     0.682633 NCalls =     35
VariableMetric: Iteration #   2 - FCN =  -3982.645283192 Edm =    0.0961719 NCalls =     47
VariableMetric: Iteration #   3 - FCN =  -3982.713439224 Edm =   0.00590045 NCalls =     59
VariableMetric: Iteration #   4 - FCN =  -3982.718522102 Edm =  0.000164098 NCalls =     71
VariableMetric: After Hessian   - FCN =  -3982.718522102 Edm =   0.00014359 NCalls =    104
VariableMetric: Iteration #   5 - FCN =  -3982.718522102 Edm =   0.00014359 NCalls =    104
Minuit2Minimizer : Valid minimum - status = 0
FVAL  = -3982.71852210233692
Edm   = 0.000143589756360758324
Nfcn  = 104
a1	  = -0.497784	 +/-  0.0174227	(limited)
a2	  = -0.308357	 +/-  0.0456454	(limited)
mu	  = 1.26228	 +/-  0.212139	(limited)
nbkg1	  = 989.145	 +/-  33.3914	(limited)
nbkg2	  = 89.7669	 +/-  10.4851	(limited)
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization

  RooFitResult: minimized FCN value: -3982.72, estimated distance to minimum: 0.000143531
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Floating Parameter    FinalValue +/-  Error   
  --------------------  --------------------------
                    a1   -4.9778e-01 +/-  1.74e-02
                    a2   -3.0836e-01 +/-  4.56e-02
                    mu    1.2623e+00 +/-  2.12e-01
                 nbkg1    9.8914e+02 +/-  3.34e+01
                 nbkg2    8.9767e+01 +/-  1.05e+01

Info in <Minuit2>: Minuit2Minimizer::Hesse : Hesse is valid - matrix is accurate
In [34]:
c = ROOT.TCanvas()
plot1 = x.frame(ROOT.RooFit.Title("Channel 1"))
plot2 = x.frame(ROOT.RooFit.Title("Channel 2"))
data.plotOn(plot1,ROOT.RooFit.Cut("index==index::channel1"))
data.plotOn(plot2,ROOT.RooFit.Cut("index==index::channel2"))

pdf.plotOn(plot1,ROOT.RooFit.ProjWData(data),ROOT.RooFit.Slice(w.cat("index"),"channel1"))
pdf.plotOn(plot2,ROOT.RooFit.ProjWData(data),ROOT.RooFit.Slice(w.cat("index"),"channel2"))

pdf.paramOn(plot1,ROOT.RooFit.Layout(0.65,0.85,0.85),ROOT.RooFit.Parameters(ROOT.RooArgSet(w.var("a1"),w.var("nbkg1"))))
pdf.paramOn(plot2,ROOT.RooFit.Layout(0.65,0.85,0.85),ROOT.RooFit.Parameters(ROOT.RooArgSet(w.var("a2"),w.var("nbkg2"))))
pdf.paramOn(plot2,ROOT.RooFit.Layout(0.65,0.85,0.7),ROOT.RooFit.Parameters(ROOT.RooArgSet(w.var("mu"))))

c.Divide(1,2)
c.cd(1)
plot1.Draw()
c.cd(2)
plot2.Draw()
c.Draw()
[#1] INFO:Plotting -- RooTreeData::plotOn: plotting 1053 events out of 1180 total events
[#1] INFO:Plotting -- RooTreeData::plotOn: plotting 127 events out of 1180 total events
[#1] INFO:Plotting -- RooSimultaneous::plotOn(jointModel) plot on x represents a slice in the index category (index)
[#1] INFO:Plotting -- RooAbsReal::plotOn(model1) slice variable index was not projected anyway
[#1] INFO:Plotting -- RooSimultaneous::plotOn(jointModel) plot on x represents a slice in the index category (index)
[#1] INFO:Plotting -- RooAbsReal::plotOn(model2) slice variable index was not projected anyway
In [35]:
ROOT.RooStats.ModelConfig("ModelConfig",w)
mc.SetPdf(pdf)
mc.SetParametersOfInterest(ROOT.RooArgSet(w.var("mu")))
mc.SetObservables(ROOT.RooArgSet(w.var("x"),w.cat("index")))
w.defineSet("nuisParams","a1,nbkg1,a2,nbkg2")
mc.SetNuisanceParameters(w.set("nuisParams"))

At this point we should save a snapshot and import it all into the workspace

In [36]:
mc.SetSnapshot(ROOT.RooArgSet(w.var("mu")))
getattr(w,'import')(mc)
Out[36]:
False
[#1] INFO:ObjectHandling -- RooWorkspace::saveSnaphot(w) replacing previous snapshot with name ModelConfig__snapshot

And save!

In [37]:
w.writeToFile("output/CombinedModel.root",True)
Out[37]:
False
In [ ]: