In [1]:
import ROOT
ROOT.RooMsgService.instance().setGlobalKillBelow(5)
%jsroot on

from matplotlib import pyplot
%matplotlib inline
Welcome to JupyROOT 6.07/07

Reading Workspaces and Recreating Models

First we read in the signal region model from the workspace building notebook.

In [2]:
infile = ROOT.TFile.Open("output/mH_BinModel.root")
w = infile.Get("w")
mc = w.obj("ModelConfig")
data = w.data("data")
mc.LoadSnapshot()
x = w.var("x")

Compute the significance (Hypothesis Test)

The aim of this exercise is to compute the significance of observing a signal in the Gaussian plus Exponential model. It is better to re-generate the model using a lower value for the number of signal events (e.g. 500), in order not to have a too high significance, which will then be difficult to estimate if we use pseudo-experiments. To estimate the significance, we need to perform an hypothesis test. We want to disprove the null model, i.e the background only model against the alternate model, the background plus the signal. In RooStats, we do this by defining two two ModelConfig objects, one for the null model (the background only model in this case) and one for the alternate model (the signal plus background).

We have defined in the workspace saved in the ROOT file only the signal plus background model. We can create the background model from the signal plus background model by setting the value of nsig=0. We do this by cloning the S+B model and by defining a snapshot in the ModelConfig for nsig with a different value:

In [3]:
sbModel = w.obj("ModelConfig")
poi = sbModel.GetParametersOfInterest().first()
poi.setVal(500)
sbModel.SetSnapshot(ROOT.RooArgSet(poi))
In [4]:
bModel = sbModel.Clone()
bModel.SetName("B_only_model")
poi.setVal(0)
bModel.SetSnapshot(ROOT.RooArgSet(poi))

Using the given models plus the observed data set, we can create a RooStats hypothesis test calculator to compute the significance. We have the choice between

  • Frequentist calculator
  • Hybrid Calculator
  • Asymptotic calculator

The first two require generate pseudo-experiment, while the Asymptotic calculator, requires only a fit to the two models. For the Frequentist and the Hybrid calculators we need to choose also the test statistics. We have various choices in RooStats. Since it is faster, we will use for the exercise the Asymptotic calculator. The Asymptotic calculator it is based on the Profile Likelihood test statistics, the one used at LHC (see slides for its definition). For testing the significance, we must use the one-side test statistic (we need to configure this explicitly in the class). Summarising, we will do:

  • create the AsymptoticCalculator class using the two models and the data set;
  • run the test of hypothesis using the GetHypoTest function.
  • Look at the result obtained as a HypoTestResult object
In [5]:
infile = ROOT.TFile.Open("output/SRselection.root")
w = infile.Get("w")
data = w.data("data")
sbModel = w.obj("ModelConfig")
sbModel.SetName("S+B Model")
poi = sbModel.GetParametersOfInterest().first()
poi.setVal(500)
sbModel.SetSnapshot(ROOT.RooArgSet(poi))
bModel = sbModel.Clone()
bModel.SetName("B Model")
poi.setVal(0)
bModel.SetSnapshot(ROOT.RooArgSet(poi))
In [6]:
w.Print()
RooWorkspace(w) w contents

variables
---------
(n_bkg,n_sig,x)

p.d.f.s
-------
RooHistPdf::bkgpdf[ pdfObs=(x) ] = 30.1693
RooAddPdf::model[ n_sig * sigpdf + n_bkg * bkgpdf ] = 29.7313
RooHistPdf::sigpdf[ pdfObs=(x) ] = 10.8357

datasets
--------
RooDataSet::data(x)

embedded datasets (in pdfs and functions)
-----------------------------------------
RooDataHist::sighist(x)
RooDataHist::bkghist(x)

parameter snapshots
-------------------
ModelConfig__snapshot = (n_sig=421.877)
S+B Model__snapshot = (n_sig=500)
B Model__snapshot = (n_sig=42.1877)

named sets
----------
B Model__snapshot:(n_sig)
ModelConfig_NuisParams:(n_bkg)
ModelConfig_Observables:(x)
ModelConfig_POI:(n_sig)
ModelConfig__snapshot:(n_sig)
S+B Model__snapshot:(n_sig)
nuisParams:(n_bkg)

generic objects
---------------
RooStats::ModelConfig::S+B Model

In [7]:
ac = ROOT.RooStats.AsymptoticCalculator(data, sbModel, bModel)
ac.SetOneSidedDiscovery(True)
AsymptoticCalculator::EvaluateNLL  ........ using Minuit / Migrad with strategy  1 and tolerance 1
 **********
 **    1 **SET PRINT           0
 **********
 **********
 **    2 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 n_bkg        1.81996e+03  8.18984e+02    1.81996e+02  1.81996e+04
     2 n_sig        4.21877e+01  4.17658e+02    4.21877e+01  4.21877e+03
 MINUIT WARNING IN PARAM DEF
 ============== STARTING VALUE IS AT LIMIT.
 MINUIT WARNING IN PARAMETR
 ============== VARIABLE2 IS AT ITS LOWER ALLOWED LIMIT.
 MINUIT WARNING IN PARAMETR
 ============== VARIABLE2 BROUGHT BACK INSIDE LIMITS.
 **********
 **    3 **SET ERR         0.5
 **********
 **********
 **    4 **SET PRINT           0
 **********
 **********
 **    5 **SET STR           1
 **********
 **********
 **    6 **MIGRAD        1000           1
 **********
 MINUIT WARNING IN MIGrad    
 ============== VARIABLE2 IS AT ITS LOWER ALLOWED LIMIT.
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 FCN=-6392.89 FROM MIGRAD    STATUS=CONVERGED      62 CALLS          63 TOTAL
                     EDM=3.69409e-08    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  n_bkg        1.72455e+03   1.48962e+02   5.16875e-04   1.75626e-02
   2  n_sig        5.17457e+02   1.44647e+02   1.91545e-03   5.41459e-03
                               ERR DEF= 0.5
AsymptoticCalculator::EvaluateNLL -  value = -6392.89	fit time : Real time 0:00:00, CP time 0.060
MakeAsimov: Setting poi n_sig to a constant value = 500
MakeAsimov: doing a conditional fit for finding best nuisance values 
 **********
 **    1 **SET PRINT           0
 **********
 **********
 **    2 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 n_bkg        1.72455e+03  1.48962e+02    1.81996e+02  1.81996e+04
 **********
 **    3 **SET ERR         0.5
 **********
 **********
 **    4 **SET PRINT           0
 **********
 **********
 **    5 **SET STR           1
 **********
 **********
 **    6 **MIGRAD         500           1
 **********
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 FCN=-6392.89 FROM MIGRAD    STATUS=CONVERGED      12 CALLS          13 TOTAL
                     EDM=7.31554e-09    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  n_bkg        1.74159e+03   4.72248e+01   5.14549e-04  -9.17573e-03
                               ERR DEF= 0.5
fit time Real time 0:00:00, CP time 0.000
RooDataSet::AsimovData[x,weight:binWeightAsimov] = 5 entries (2241.59 weighted)
Generated Asimov data for observables RooArgSet:: = (x)
AsymptoticCalculator::EvaluateNLL  ........ using Minuit / Migrad with strategy  1 and tolerance 1
 **********
 **    7 **SET PRINT           0
 **********
 **********
 **    8 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 n_bkg        1.74159e+03  4.72248e+01    1.81996e+02  1.81996e+04
 **********
 **    9 **SET ERR         0.5
 **********
 **********
 **   10 **SET PRINT           0
 **********
 **********
 **   11 **SET STR           1
 **********
 **********
 **   12 **MIGRAD         500           1
 **********
 MIGRAD MINIMIZATION HAS CONVERGED.
 FCN=-6384.83 FROM MIGRAD    STATUS=CONVERGED      11 CALLS          12 TOTAL
                     EDM=2.12159e-18    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  n_bkg        1.74159e+03   4.72197e+01   0.00000e+00  -2.21009e-07
                               ERR DEF= 0.5
AsymptoticCalculator::EvaluateNLL -  value = -6384.83 for poi fixed at = 500	fit time : Real time 0:00:00, CP time 0.000
In [8]:
asResult = ac.GetHypoTest()
asResult.Print()
AsymptoticCalculator::EvaluateNLL  ........ using Minuit / Migrad with strategy  1 and tolerance 1
 **********
 **   13 **SET PRINT           0
 **********
 **********
 **   14 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 n_bkg        1.72455e+03  1.48962e+02    1.81996e+02  1.81996e+04
 **********
 **   15 **SET ERR         0.5
 **********
 **********
 **   16 **SET PRINT           0
 **********
 **********
 **   17 **SET STR           1
 **********
 **********
 **   18 **MIGRAD         500           1
 **********
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 FCN=-6387.54 FROM MIGRAD    STATUS=CONVERGED      16 CALLS          17 TOTAL
                     EDM=6.37688e-07    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  n_bkg        2.19882e+03   4.73477e+01   4.58409e-04  -9.58088e-02
                               ERR DEF= 0.5
AsymptoticCalculator::EvaluateNLL -  value = -6387.54 for poi fixed at = 42.1877	fit time : Real time 0:00:00, CP time 0.000
AsymptoticCalculator::EvaluateNLL  ........ using Minuit / Migrad with strategy  1 and tolerance 1
 **********
 **   19 **SET PRINT           0
 **********
 **********
 **   20 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 n_bkg        2.19882e+03  4.73477e+01    1.81996e+02  1.81996e+04
 **********
 **   21 **SET ERR         0.5
 **********
 **********
 **   22 **SET PRINT           0
 **********
 **********
 **   23 **SET STR           1
 **********
 **********
 **   24 **MIGRAD         500           1
 **********
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 FCN=-6379.83 FROM MIGRAD    STATUS=CONVERGED      12 CALLS          13 TOTAL
                     EDM=3.0201e-11    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  n_bkg        2.19847e+03   4.73439e+01   4.59750e-04   6.59346e-04
                               ERR DEF= 0.5
AsymptoticCalculator::EvaluateNLL -  value = -6379.83 for poi fixed at = 42.1877	fit time : Real time 0:00:00, CP time 0.000

Results HypoTestAsymptotic_result: 
 - Null p-value = 0.000535342
 - Significance = 3.27126
 - CL_b: 0.000535342
 - CL_s+b: 0.456887
 - CL_s: 853.45

And the frequentist appproach.

This needs test statistics.

In [9]:
fc = ROOT.RooStats.FrequentistCalculator(data, sbModel, bModel)
fc.SetToys(2000,500)
profll = ROOT.RooStats.ProfileLikelihoodTestStat(sbModel.GetPdf())
profll.SetOneSidedDiscovery(True)

toymcs = ROOT.RooStats.ToyMCSampler(fc.GetTestStatSampler())
toymcs.SetTestStatistic(profll)
In [10]:
if not sbModel.GetPdf().canBeExtended():
    toymcs.SetNEventsPerToy(1)
    print 'adjusting for non-extended formalism'

Run the test!

In [11]:
fqResult = fc.GetHypoTest()
fqResult.Print()
    ----> Doing a re-scan first
    ----> Doing a re-scan first
    ----> Doing a re-scan first
    ----> Doing a re-scan first
    ----> trying with improve
    ----> Doing a re-scan first
    ----> Doing a re-scan first
    ----> trying with improve
    ----> Doing a re-scan first
    ----> Doing a re-scan first
    ----> Doing a re-scan first
    ----> trying with improve
    ----> Doing a re-scan first
    ----> Doing a re-scan first
    ----> Doing a re-scan first
    ----> trying with improve

Results HypoTestCalculator_result: 
 - Null p-value = 0.0005 +/- 0.000499875
 - Significance = 3.29053 +/- 0.281273 sigma
 - Number of Alt toys: 500
 - Number of Null toys: 2000
 - Test statistic evaluated on data: 5.34344
 - CL_b: 0.0005 +/- 0.000499875
 - CL_s+b: 0.448 +/- 0.0222394
 - CL_s: 896 +/- 896.88

and plot!

In [12]:
c = ROOT.TCanvas()
plot = ROOT.RooStats.HypoTestPlot(fqResult)
plot.SetLogYaxis(True)
plot.Draw()
c.Draw()
In [ ]: