import ROOT
%jsroot on
First we read in the combined model from the previous RooStats notebook.
infile = ROOT.TFile.Open("output/GausExpModel.root")
w = infile.Get("w")
mc = w.obj("ModelConfig")
data = w.data("data")
mc.LoadSnapshot()
Now define a ProfileLikelihoodCalculator and ask for the 68.3% interval.
pl = ROOT.RooStats.ProfileLikelihoodCalculator(data,mc)
pl.SetConfidenceLevel(0.683)
Load the snapshot and compute the interval.
mc.LoadSnapshot()
interval = pl.GetInterval()
Find the interval on the first parameter of interest.
firstPOI = mc.GetParametersOfInterest().first()
lowerLimit = interval.LowerLimit(firstPOI)
upperLimit = interval.UpperLimit(firstPOI)
print("68.3% interval on {0} is [{1},{2}]".format(firstPOI.GetName(), lowerLimit, upperLimit))
We can also plot this.
plot = ROOT.RooStats.LikelihoodIntervalPlot(interval)
plot.SetRange(0,150)
#plot.SetNPoints(50)
c = ROOT.TCanvas()
ROOT.gPad.SetLogy(True)
plot.Draw()
c.Draw()
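What the ProfileLikelihoodCalculator does can be illustrated without ROOT: for a simple Poisson counting measurement, the 68.3% interval is where the log-likelihood drops by 0.5 from its maximum. This is a minimal pure-Python sketch of that idea; the count of 100 is an illustrative number, not taken from the model above.

```python
import math

def loglik(s, n):
    """Poisson log-likelihood (up to an s-independent constant) for expected s, observed n."""
    return n * math.log(s) - s

def pl_interval(n, delta=0.5):
    """Find the two points where the log-likelihood drops by `delta`
    from its maximum (at s = n), by bisection on each side."""
    lmax = loglik(n, n)

    def crossing(lo, hi):
        # `lo` starts on the side where the drop exceeds `delta`, `hi` at the maximum
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if lmax - loglik(mid, n) > delta:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lower = crossing(1e-6, float(n))     # left branch
    upper = crossing(3.0 * n, float(n))  # right branch
    return lower, upper

lo, hi = pl_interval(100)
print("68.3%% interval: [%.1f, %.1f]" % (lo, hi))  # roughly [90, 110]
```

For large counts this reproduces the familiar sqrt(n) behaviour, with a slight asymmetry from the Poisson shape.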
RooStats has several built in Bayesian calculators. We have the choice between using the MCMC calculator or the numerical one. We will start using the MCMC and optionally in the next exercise we will show how to use the numerical Bayesian calculator. We will use the same model used before for the Profile Likelihood calculator and repeat a similar process.
Once the data and model are extracted from the workspace, we also need the number of signal events.
To help the calculator, we restrict it to a range of plausible values (the true value used to generate the data was 100 events).
nsig = w.var("nsig")
nsig.setRange(0,200)
Now we define our MCMC calculator. Again looking at 68.3% confidence level.
mcmc = ROOT.RooStats.MCMCCalculator(data,mc)
mcmc.SetConfidenceLevel(0.683)
MCMC methods come in various implementations and approaches. Here we must supply a proposal function; the sequential proposal used below is usually fairly robust.
sp = ROOT.RooStats.SequentialProposal(0.1)
mcmc.SetProposalFunction(sp)
If you're interested in the details, they are worth looking up; here we just set a few parameters. SetNumIters sets the number of iterations of the Metropolis-Hastings algorithm. SetNumBurnInSteps sets how many initial iterations are discarded (the burn-in), so that the chain forgets its starting point. We can construct either a one-sided Bayesian interval (SetLeftSideTailFraction(0)) or a central Bayesian interval (SetLeftSideTailFraction(0.5)).
mcmc.SetNumIters(100000)
mcmc.SetNumBurnInSteps(1000)
mcmc.SetLeftSideTailFraction(0.5)
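To make these settings concrete, here is a minimal pure-Python random-walk Metropolis sampler. It is an illustration of the algorithm only, not the RooStats implementation: the uniform step plays the role of the proposal function, and the first `burn_in` draws are discarded just as SetNumBurnInSteps does. The toy posterior (a Gaussian with mean 100) is purely illustrative.

```python
import math
import random

def metropolis(logpdf, x0, n_iters, burn_in, step):
    """Random-walk Metropolis: propose x' = x + U(-step, step),
    accept with probability min(1, p(x')/p(x)); discard the burn-in."""
    random.seed(42)  # fixed seed so this sketch is reproducible
    x, chain = x0, []
    lp = logpdf(x)
    for i in range(n_iters):
        xp = x + random.uniform(-step, step)
        lpp = logpdf(xp)
        if math.log(random.random()) < lpp - lp:  # accept/reject step
            x, lp = xp, lpp
        if i >= burn_in:
            chain.append(x)
    return chain

# Toy posterior: Gaussian with mean 100, width 10 (a stand-in for nsig)
chain = metropolis(lambda s: -0.5 * ((s - 100.0) / 10.0) ** 2,
                   x0=0.0, n_iters=20000, burn_in=1000, step=5.0)
mean = sum(chain) / len(chain)
print("posterior mean ~ %.1f" % mean)
```

Starting the chain at 0 shows why the burn-in matters: the early steps climb towards the mode and would bias the estimate if kept.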
Now we run the calculator! (may take a while depending on the number of steps)
interval = mcmc.GetInterval()
Now we plot and print!
firstPOI = mc.GetParametersOfInterest().first()
print("68.3% interval on {0} is : [{1},{2}]".format(firstPOI.GetName(), interval.LowerLimit(firstPOI), interval.UpperLimit(firstPOI)))
c = ROOT.TCanvas()
plot = ROOT.RooStats.MCMCIntervalPlot(interval)
plot.Draw()
c.Draw()
We can now also try the numerical Bayesian calculator on the same model. As before, we create the class and configure it, setting for example the integration type and the number of iterations. Since the numerical integration can be tricky, it is better in this case to first set a reasonable range for the nuisance parameters, which will be integrated over to obtain the marginalised posterior function. A sensible choice is an interval of +/- 10 sigma around the best-fit values of the nuisance parameters.
Note that we have not defined a prior distribution for the parameter of interest (the number of signal events). If a prior is not defined in the model, it is assumed that the prior distribution for the parameter of interest is the uniform distribution in the parameter range.
Define a reasonable range for the nuisance parameters (+/- 10 sigma around their fitted values) to facilitate the integration.
mc.Print()
Iterating over a RooArgSet is a little clumsy in Python, so we use the classic iterator pattern (newer ROOT versions also allow direct iteration, `for var in nuisPar:`).
nuisPar = mc.GetNuisanceParameters()
nuis_iter = nuisPar.createIterator()
var = nuis_iter.Next()
while var:
    var.setRange(var.getVal()-10*var.getError(), var.getVal()+10*var.getError())
    var = nuis_iter.Next()
As before, we define our calculator at 68.3% confidence level.
Again the type of interval can be chosen; central is the default here. SetLeftSideTailFraction(0.5) gives a central interval and SetLeftSideTailFraction(0.) an upper limit, while SetShortestInterval() selects the shortest interval.
The integration method can be chosen with SetIntegrationType. The default is ADAPTIVE, but "VEGAS", "MISER", "PLAIN" and "TOYMC" are also available; TOYMC requires some extra configuration not shown here.
Now we create and configure the calculator.
bayesianCalc = ROOT.RooStats.BayesianCalculator(data,mc)
bayesianCalc.SetConfidenceLevel(0.683)
bayesianCalc.SetLeftSideTailFraction(0.5)
bayesianCalc.SetNumIters(1000)
bayesianCalc.SetIntegrationType("")
bayesianCalc.SetScanOfPosterior(50)
Now we compute interval by scanning the posterior function.
firstPOI = mc.GetParametersOfInterest().first()
interval = bayesianCalc.GetInterval()
lowerLimit = interval.LowerLimit()
upperLimit = interval.UpperLimit()
print("68.3% interval on {0} is : [{1},{2}]".format(firstPOI.GetName(), lowerLimit, upperLimit))
c = ROOT.TCanvas()
plot = bayesianCalc.GetPosteriorPlot()
plot.Draw()
c.Draw()
We now create a counting model based on Poisson statistics. Suppose we observe nobs events when we expect nexp, where nexp = s+b (s is the number of signal events and b the number of background events). The expected distribution of nobs is a Poisson: Poisson( nobs | s+b ). s is the parameter we want to estimate or set a limit on (the parameter of interest), while b is a nuisance parameter. b is not known exactly but has uncertainty sigmab; its nominal value is b0. To express this uncertainty we add a Gaussian constraint to the model. We can interpret this term as an auxiliary measurement b0 with uncertainty sigmab, i.e. a likelihood Gaussian( b0 | b, sigmab ). In Bayesian statistics we can interpret the constraint as prior knowledge about the parameter b.
Here we define nobs, the number of observed events; b, the nominal number of background events; and sigmab, the relative uncertainty on the number of background events.
nobs = 3
b = 1
sigmab = 0.2
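Before building this in RooFit, the likelihood itself is easy to write down directly. The following is a pure-Python sketch of Poisson(nobs | s+b) x Gaussian(b0 | b, sigmab), using the numbers above; with b fixed at its nominal value, the maximum sits at s = nobs - b = 2 as expected.

```python
import math

def counting_likelihood(s, b, nobs=3, b0=1.0, sigma=0.2):
    """Poisson(nobs | s+b) times the Gaussian constraint Gauss(b0 | b, sigma)."""
    nexp = s + b
    poisson = math.exp(-nexp) * nexp ** nobs / math.factorial(nobs)
    gauss = math.exp(-0.5 * ((b0 - b) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return poisson * gauss

# Scan s with b fixed at its nominal value
grid = [i * 0.01 for i in range(1501)]  # s in [0, 15]
best = max(grid, key=lambda s: counting_likelihood(s, b=1.0))
print("maximum-likelihood s = %.2f" % best)
```

The full calculators below do more than this, of course: they profile or marginalise b rather than fixing it.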
Now we can define a new Poisson model with a Gaussian constraint. The Poisson term is Poisson( nobs | s+b ).
w = ROOT.RooWorkspace("w")
w.factory("sum:nexp(s[3,0,15],b[1,0,10])")
w.factory("Poisson:pdf(nobs[0,50],nexp)")
w.factory("Gaussian:constraint(b0[0,10],b,sigmab[1])")
w.factory("PROD:model(pdf,constraint)")
Now set the values. b0 must be set constant in order to be treated as a global observable.
w.var("b0").setVal(b)
w.var("b0").setConstant(True)
w.var("sigmab").setVal(sigmab*b)
We create the model config, and make sure we make a snapshot (we'll need this later for doing hypothesis testing)
mc = ROOT.RooStats.ModelConfig("ModelConfig",w)
mc.SetPdf(w.pdf("model"))
mc.SetParametersOfInterest(ROOT.RooArgSet(w.var("s")))
mc.SetObservables(ROOT.RooArgSet(w.var("nobs")))
mc.SetNuisanceParameters(ROOT.RooArgSet(w.var("b")))
mc.SetSnapshot(ROOT.RooArgSet(w.var("s")))
mc.SetGlobalObservables(ROOT.RooArgSet(w.var("b0")))
mc.Print()
getattr(w,'import')(mc)
Make a data set with the number of observed events.
data = ROOT.RooDataSet("data","", ROOT.RooArgSet(w.var("nobs")))
w.var("nobs").setVal(3)
data.add(ROOT.RooArgSet(w.var("nobs") ))
getattr(w,'import')(data)
w.Print()
And write to file!
w.writeToFile("output/CountingModel.root", True)
pl = ROOT.RooStats.ProfileLikelihoodCalculator(data,mc)
pl.SetConfidenceLevel(0.95)
interval = pl.GetInterval()
firstPOI = mc.GetParametersOfInterest().first()
lowerLimit = interval.LowerLimit(firstPOI)
upperLimit = interval.UpperLimit(firstPOI)
print("95% interval on {0} is [{1},{2}]".format(firstPOI.GetName(), lowerLimit, upperLimit))
nuisPar = mc.GetNuisanceParameters()
nuis_iter = nuisPar.createIterator()
var = nuis_iter.Next()
while var:
    var.setRange(var.getVal()-10*var.getError(), var.getVal()+10*var.getError())
    var = nuis_iter.Next()
bayesianCalc = ROOT.RooStats.BayesianCalculator(data,mc)
bayesianCalc.SetConfidenceLevel(0.95)
bayesianCalc.SetLeftSideTailFraction(0.) # compute the upper limit
bayesianCalc.SetNumIters(1000)
bayesianCalc.SetIntegrationType("")
bayesianCalc.SetScanOfPosterior(50)
interval = bayesianCalc.GetInterval()
firstPOI = mc.GetParametersOfInterest().first()
lowerLimit = interval.LowerLimit()
upperLimit = interval.UpperLimit()
print("95% interval on {0} is [{1},{2}]".format(firstPOI.GetName(), lowerLimit, upperLimit))
c = ROOT.TCanvas()
plot = bayesianCalc.GetPosteriorPlot()
plot.Draw()
c.Draw()
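For this simple counting model, what the BayesianCalculator computes can be reproduced in a few lines of plain Python: with a uniform prior on s, the posterior is proportional to the likelihood, and the 95% upper limit is where the normalised cumulative posterior reaches 0.95. For simplicity this sketch fixes b at its nominal value instead of marginalising over it, so the number differs slightly from the full calculation above.

```python
import math

def posterior(s, nobs=3, b=1.0):
    """Unnormalised posterior for s: Poisson likelihood times a uniform prior."""
    nexp = s + b
    return math.exp(-nexp) * nexp ** nobs / math.factorial(nobs)

# Build a numerical CDF on a grid over the allowed range s in [0, 15]
ds = 0.001
grid = [i * ds for i in range(int(15 / ds) + 1)]
weights = [posterior(s) for s in grid]
total = sum(weights) * ds

cdf, upper = 0.0, None
for s, w in zip(grid, weights):
    cdf += w * ds / total
    if cdf >= 0.95:  # first grid point where the CDF crosses 95%
        upper = s
        break
print("95%% Bayesian upper limit on s ~ %.2f" % upper)
```

Marginalising b over its Gaussian constraint (as the calculator does) smears the likelihood slightly, but with sigmab = 0.2 the effect on the limit is small.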
The aim of this exercise is to compute the significance of observing a signal in the Gaussian plus Exponential model. It is better to re-generate the model using a lower value for the number of signal events (e.g. 50), in order not to have too high a significance, which would be difficult to estimate using pseudo-experiments. To estimate the significance, we need to perform a hypothesis test: we want to disprove the null model, i.e. the background-only model, against the alternate model, the background plus the signal. In RooStats, we do this by defining two ModelConfig objects, one for the null model (the background-only model in this case) and one for the alternate model (the signal plus background).
We have defined in the workspace saved in the ROOT file only the signal plus background model. We can create the background model from the signal plus background model by setting the value of nsig=0. We do this by cloning the S+B model and by defining a snapshot in the ModelConfig for nsig with a different value:
sbModel = w.obj("ModelConfig")
poi = sbModel.GetParametersOfInterest().first()
poi.setVal(50)
sbModel.SetSnapshot(ROOT.RooArgSet(poi))
bModel = sbModel.Clone()
bModel.SetName("B_only_model")
poi.setVal(0)
bModel.SetSnapshot(ROOT.RooArgSet(poi))
Using the given models plus the observed data set, we can create a RooStats hypothesis test calculator to compute the significance. We have the choice between the Frequentist, the Hybrid and the Asymptotic calculators.
The first two require generating pseudo-experiments, while the Asymptotic calculator requires only a fit to the two models. For the Frequentist and the Hybrid calculators we also need to choose the test statistic; RooStats offers several. Since it is faster, we will use the Asymptotic calculator for this exercise. It is based on the profile likelihood test statistic, the one used at the LHC (see the slides for its definition). For testing the significance, we must use the one-sided test statistic (we need to configure this explicitly in the class). Summarising, we will do:
infile = ROOT.TFile.Open("output/GausExpModel.root")
w = infile.Get("w")
data = w.data("data")
sbModel = w.obj("ModelConfig")
sbModel.SetName("S+B Model")
poi = sbModel.GetParametersOfInterest().first()
poi.setVal(50)
sbModel.SetSnapshot(ROOT.RooArgSet(poi))
bModel = sbModel.Clone()
bModel.SetName("B Model")
poi.setVal(0)
bModel.SetSnapshot(ROOT.RooArgSet(poi))
w.Print()
ac = ROOT.RooStats.AsymptoticCalculator(data, sbModel, bModel)
ac.SetOneSidedDiscovery(True)
asResult = ac.GetHypoTest()
asResult.Print()
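The printed null p-value maps directly to a one-sided Gaussian significance Z (RooStats provides PValueToSignificance for this, and HypoTestResult also reports it). The conversion itself is just the standard normal quantile; a pure-Python version using the standard library:

```python
from statistics import NormalDist

def pvalue_to_significance(p):
    """One-sided Gaussian significance: the Z such that P(X > Z) = p
    for a standard normal X."""
    return NormalDist().inv_cdf(1.0 - p)

# The conventional 5-sigma discovery threshold corresponds to p0 ~ 2.87e-7
z = pvalue_to_significance(2.8665e-7)
print("Z = %.2f" % z)
```

Note that `statistics.NormalDist` requires Python 3.8 or later; for a ROOT-based workflow one would use `ROOT.RooStats.PValueToSignificance` instead.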
And now the frequentist approach.
This requires choosing a test statistic.
fc = ROOT.RooStats.FrequentistCalculator(data, sbModel, bModel)
fc.SetToys(2000,500)
profll = ROOT.RooStats.ProfileLikelihoodTestStat(sbModel.GetPdf())
profll.SetOneSidedDiscovery(True)
toymcs = ROOT.RooStats.ToyMCSampler(fc.GetTestStatSampler())
toymcs.SetTestStatistic(profll)
if not sbModel.GetPdf().canBeExtended():
    toymcs.SetNEventsPerToy(1)
    print("adjusting for non-extended formalism")
Run the test!
fqResult = fc.GetHypoTest()
fqResult.Print()
and plot!
c = ROOT.TCanvas()
plot = ROOT.RooStats.HypoTestPlot(fqResult)
plot.SetLogYaxis(True)
plot.Draw()
c.Draw()
As an optional exercise to learn more about computing significance in RooStats, we will study the significance (or equivalently the p-value) we obtain in our Gaussian signal plus exponential background model as a function of the signal mass hypothesis. To do this we perform the hypothesis test, using the AsymptoticCalculator, for several mass points. We can then plot the obtained p-value for the null hypothesis (p0). We will also estimate the expected significance for our given S+B model as a function of mass. The expected significance can be obtained using a dedicated function in the AsymptoticCalculator, AsymptoticCalculator::GetExpectedPValues, using as input the observed p-values for the null and the alternate models. See the Asymptotic formulae paper for the details.
massMin = 0.5
massMax = 8.5
masses = []
p0values = []
p0valuesExpected = []
stepsize = (massMax-massMin)/40.0
masslist = [massMin + i*stepsize for i in range(41)]
for mass in masslist:
    w.var("mass").setVal(mass)
    ac = ROOT.RooStats.AsymptoticCalculator(data, sbModel, bModel)
    ac.SetOneSidedDiscovery(True)
    ac.SetPrintLevel(-1)
    asymCalcResult = ac.GetHypoTest()
    masses.append(mass)
    p0values.append(asymCalcResult.NullPValue())
    expectedP0 = ROOT.RooStats.AsymptoticCalculator.GetExpectedPValues(
        asymCalcResult.NullPValue(), asymCalcResult.AlternatePValue(), 0, False)
    p0valuesExpected.append(expectedP0)
import numpy as np
c = ROOT.TCanvas()
graph1 = ROOT.TGraph(len(masses),np.asarray(masses),np.asarray(p0values))
graph2 = ROOT.TGraph(len(masses),np.asarray(masses),np.asarray(p0valuesExpected))
graph1.SetMarkerStyle(20)
graph1.Draw("APC")
graph2.SetLineStyle(2)
graph2.Draw("C")
graph1.GetXaxis().SetTitle("mass")
graph1.GetYaxis().SetTitle("p0 value")
graph1.SetTitle("Significance vs Mass")
graph1.SetMinimum(graph2.GetMinimum())
graph1.SetLineColor(4)
graph2.SetLineColor(2)
ROOT.gPad.SetLogy(True)
c.Draw()
In our last exercise we will compute an upper limit using the frequentist hypothesis test inversion method. We need to perform the hypothesis test for different values of the parameter of interest and compute the corresponding p-values. Since we are interested in computing a limit, the null hypothesis that we want to disprove is in this case the S+B model, while the alternate hypothesis is the B-only model. It is important to remember this when we construct the hypothesis test calculator.
For doing the inversion RooStats provides a helper class, HypoTestInverter, which is constructed from one of the HypoTest calculators (Frequentist, Hybrid or Asymptotic). To compute the limit we need to do:
infile = ROOT.TFile.Open("output/CountingModel.root")
w = infile.Get("w")
mc = w.obj("ModelConfig")
data = w.data("data")
sbModel = w.obj("ModelConfig")
poi = sbModel.GetParametersOfInterest().first()
bModel = sbModel.Clone()
bModel.SetName(sbModel.GetName()+"_with_poi_0")
poi.setVal(0)
bModel.SetSnapshot(ROOT.RooArgSet(poi))
# A toy-based FrequentistCalculator could be used here as an alternative:
fc = ROOT.RooStats.FrequentistCalculator(data, bModel, sbModel)
fc.SetToys(1000,500)
# For speed we use the AsymptoticCalculator instead
ac = ROOT.RooStats.AsymptoticCalculator(data, bModel, sbModel)
ac.SetOneSided(True)
ac.SetPrintLevel(-1)
calc = ROOT.RooStats.HypoTestInverter(ac)
calc.SetConfidenceLevel(0.95)
For CLs:
calc.UseCLs(True)
calc.SetVerbose(False)
toymcs = calc.GetHypoTestCalculator().GetTestStatSampler()
The profile likelihood test statistic is used one-sided for CLs, since we are setting a one-sided (upper) limit.
profll = ROOT.RooStats.ProfileLikelihoodTestStat(sbModel.GetPdf())
profll.SetOneSided(True)
toymcs.SetTestStatistic(profll)
if not sbModel.GetPdf().canBeExtended():
    toymcs.SetNEventsPerToy(1)
    print("adjusting for non-extended formalism")
Define the number of scan points and the range of the parameter of interest.
npoints = 10
poimin = poi.getMin()
poimax = poi.getMax()
calc.SetFixedScan(npoints,poimin,poimax)
r = calc.GetInterval()
upperLimit = r.UpperLimit()
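The inversion logic itself is simple to state: at each scanned value of the POI, CLs = CLs+b / CLb = p_{s+b} / (1 - p_b), and the 95% CL upper limit is where the CLs curve falls below 1 - CL = 0.05. A pure-Python sketch of that final step, using hypothetical scan values for illustration (the real p-values come from the calculator above):

```python
def cls_upper_limit(poi_values, cls_values, alpha=0.05):
    """Linearly interpolate the CLs curve to find where it crosses alpha."""
    points = list(zip(poi_values, cls_values))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= alpha > y1:  # the crossing lies inside this segment
            return x0 + (x1 - x0) * (y0 - alpha) / (y0 - y1)
    return None  # no crossing in the scanned range

# Hypothetical scan: CLs shrinks as the tested signal strength grows
poi = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
cls = [1.00, 0.60, 0.25, 0.08, 0.03, 0.01]
limit = cls_upper_limit(poi, cls)
print("95%% CL upper limit ~ %.2f" % limit)
```

HypoTestInverter does essentially this interpolation for us (with proper treatment of the p-value uncertainties) when we call UpperLimit().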
plot = ROOT.RooStats.HypoTestInverterPlot("HTI_Result_Plot","HypoTest Scan Result",r)
c = ROOT.TCanvas("HypoTestInverter Scan")
c.SetLogy(False)
plot.Draw("CLb 2CL")
c.Draw()