Statistics can be very particular about its use of language. We begin with some definitions for a few basic concepts.
import ROOT
%jsroot on
The aim of the RooFit exercises is to learn how to build a RooFit model, how to fit the model on a set of data that we will generate and how to plot the result. We will also learn how to save the full RooFit workspace containing the model and data in a file which we can read back later, for example when we will do the RooStats exercises. We will learn also how to generate the data from the model. The same generated data we will then use to fit the model
In previous exercises we bypassed the workspace functionality. This allows us to save the statistical model and use it later. This is useful for us in case we wish to use it later. (in the next notebook)
w = ROOT.RooWorkspace("w")
w.factory("Gaussian:pdf(x[-10,10],mu[1,-1000,1000],sigma[2,0,1000])")
We extract some parameters from this workspace and generate some data. Remember that RooStats has trouble overloading the generate function so we need to explicitly define x as a RooArgSet.
pdf = w.pdf("pdf")
x = w.var("x")
n = 1000
data = pdf.generate(ROOT.RooArgSet(x),n)
data.SetName("data")
Already we arrive at the second of two major problems with RooStats/RooFit in python. The first (above) is function overloading. The last is seen here when we would like to use a simple Title definition that is usually obtained from the RooFit name space. (odd that it's RooFit and not RooStats especially when I would like to demonstrate the differences between the two libraries)
pl = x.frame(ROOT.RooFit.Title("Gaussian Fit"))
This is the first time we use the canvas. Never the less it's useful to make a new TCanvas each time we plot.
c = ROOT.TCanvas()
data.plotOn(pl)
pl.Draw()
c.Draw()
pdf.fitTo(data, ROOT.RooFit.Minimizer("Minutit2","Migrad"))
c = ROOT.TCanvas()
data.plotOn(pl)
pdf.plotOn(pl)
pdf.paramOn(pl, ROOT.RooFit.Layout(0.6,0.9,0.85))
pl.Draw()
c.Draw()
Now we mustn't forget to save this data to the workspace! However since import is a protected keyword it cannot be used as a method. Luckily we have a workaround.
getattr(w,'import')(data)
This first workspace only has two parameters mu and sigma. We can easily add a third normalisation parameter for the number of events. Note that jupyter stores the workspace object in memory. So we can should define the workspace ' fresh'
This requires us to move from a likelihood to an extended likelihood formalism as described in the SignalAndBackgroundModels notebook.
w = ROOT.RooWorkspace("w")
w.factory("Gaussian:gaus(x[-10,10],mu[1,-1000,1000],sigma[2,0,1000])")
w.factory("ExtendPdf:pdf(gaus, nevt[100,0,100000])")
w.Print()
pdf = w.pdf("pdf")
x = w.var("x")
pdf.fitTo(data, ROOT.RooFit.Minimizer("Minutit2","Migrad"))
c = ROOT.TCanvas()
data.plotOn(pl)
pdf.plotOn(pl)
pdf.paramOn(pl, ROOT.RooFit.Layout(0.6,0.9,0.85))
pl.Draw()
c.Draw()
This however is an unbinned likelihood fit. Usually our data is binned.
binneddata = pdf.generateBinned(ROOT.RooArgSet(x),n)
binneddata.SetName("binneddata")
pdf.fitTo(binneddata, ROOT.RooFit.Minimizer("Minutit2","Migrad"))
pl = x.frame(ROOT.RooFit.Title("Binned Data"))
c = ROOT.TCanvas()
binneddata.plotOn(pl)
pdf.plotOn(pl)
pdf.paramOn(pl, ROOT.RooFit.Layout(0.6,0.9,0.85))
pl.Draw()
c.Draw()
This section recaps the excecises seen in BuildCompositeModel. Here we make two p.d.f, one representing a signal and one a background distributions. We want to determine the number of signal events. For this we need to perform an extended maximum likelihood fit, where the signal events is one of the fit parameter. We will create the composite model formed by a Gaussian signal over a falling exponential background. We will have ~ 100 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. We assume that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2.
nsig = 100
nbkg = 1000
This workspace we will use later so it's best to create it in full now.
Here the model is defined as being an extended pdf formed from the exponential and gaussian pdfs
w = ROOT.RooWorkspace("w")
w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-2,-0.2])")
w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])")
w.factory("SUM:model(nsig[0,10000]*sig_pdf, nbkg[0,10000]*bkg_pdf)")
Once again we retrieve our variables from the workspace.
'x' here is the observable that we're interested in.
pdf = w.pdf("model")
x = w.var("x")
We set the desired value of the signal and background events.
w.var("nsig").setVal(nsig)
w.var("nbkg").setVal(nbkg)
w.Print()
Let's generate us some data!
If not defined generate will generate according to the total S+B events
(NB RooStats has a random number generator in it's core functions. This can have a random seed set. This isn't implemented in python yet but also isn't hugely important unless you want to generate the same 'random' numbers every time.)
x.setBins(50)
data = pdf.generate(ROOT.RooArgSet(x))
data.SetName("data")
getattr(w,'import')(data)
data.Print()
c = ROOT.TCanvas()
plot = x.frame(ROOT.RooFit.Title("Gaussian Signal over Exponential Background"))
data.plotOn(plot)
plot.Draw()
c.Draw()
Time to fit. This time we assign the fit result to a variable that we can access later.
r = pdf.fitTo(data, ROOT.RooFit.Save(True), ROOT.RooFit.Minimizer("Minuit2","Migrad"))
r.Print()
c = ROOT.TCanvas()
plot = x.frame(ROOT.RooFit.Title("Fit Over Composite"))
data.plotOn(plot)
pdf.plotOn(plot)
pdf.plotOn(plot, ROOT.RooFit.Components("bkg_pdf"), ROOT.RooFit.LineStyle(2) )
pdf.plotOn(plot, ROOT.RooFit.Components("sig_pdf"), ROOT.RooFit.LineColor(2), ROOT.RooFit.LineStyle(2) )
pdf.paramOn(plot,ROOT.RooFit.Layout(0.5,0.9,0.85))
plot.Draw()
c.Draw()
Now in addition to the data and pdfs in the Workspace we should also include a map to what these things correspond to. This is done via a ModelConfig.
This might seem pointless but it's really really important when trying to retrieve information from these workspaces.
mc = ROOT.RooStats.ModelConfig("ModelConfig",w)
mc.SetPdf(pdf)
mc.SetParametersOfInterest(ROOT.RooArgSet(w.var("nsig")))
mc.SetSnapshot(ROOT.RooArgSet(w.var("nsig")))
mc.SetObservables(ROOT.RooArgSet(w.var("x")))
w.defineSet("nuisParams","a,nbkg")
nuis = getattr(w,'set')("nuisParams")
mc.SetNuisanceParameters(nuis)
getattr(w,'import')(mc)
Now that this is added to the workspace we can save it to file.
w.writeToFile("output/GausExpModel.root",True)
Often we have to fit to two or more channels simultaneously. We extend our RooSimultaneous p.d.f. using a combined data set, which can be fit for one (or more) common parameters.
The first channel will be similar to the one of the previous exercise. A Gaussian signal over an Exponential decay background. We assume as before that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2. We assume we have 1000 background events.
The second channel will be similar to the first one, we will have the same signal model, but a different background model. We assume we have less and a flatter background. We have an exponential decay of 4 and 100 background events in total.
We assume the number of signal events in the first channel is 50 and in the second channel is 30. We introduce a common rate parameter mu expressing the signal strength.
We will follow the same steps as before:
w = ROOT.RooWorkspace("w")
We add each model to the workspace. Starting with the first extended model.
w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])")
w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])")
w.factory("prod:nsig1(mu[1,0,5],xsec1[50])")
w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)")
and now the second extended model
w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])")
w.factory("prod:nsig2(mu,xsec2[30])")
w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)")
and now we join them using a 'jointModel'
w.factory("index[channel1,channel2]")
w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)")
We now extract our objects to work on.
x is once again the observable index is now the category (or channel)
the index also must be added to the fit as an observable later...
pdf = w.pdf("jointModel")
x = w.var("x")
index = w.cat("index")
now we generate some binned data. 100 bins is the default.
x.setBins(50)
data = pdf.generate(ROOT.RooArgSet(x,index))
data.SetName("data")
getattr(w,'import')(data)
data.Print()
Soon we will fit to the data. First let's look at the data we are fitting.
c = ROOT.TCanvas()
plot1 = x.frame(ROOT.RooFit.Title("Channel 1"))
plot2 = x.frame(ROOT.RooFit.Title("Channel 2"))
data.plotOn(plot1,ROOT.RooFit.Cut("index==index::channel1"))
data.plotOn(plot2,ROOT.RooFit.Cut("index==index::channel2"))
c.Divide(1,2)
c.cd(1)
plot1.Draw()
c.cd(2)
plot2.Draw()
c.Draw()
And the fit. We save it again and print the save.
r = pdf.fitTo(data, ROOT.RooFit.Save(True), ROOT.RooFit.Minimizer("Minuit2","Migrad"))
r.Print()
c = ROOT.TCanvas()
plot1 = x.frame(ROOT.RooFit.Title("Channel 1"))
plot2 = x.frame(ROOT.RooFit.Title("Channel 2"))
data.plotOn(plot1,ROOT.RooFit.Cut("index==index::channel1"))
data.plotOn(plot2,ROOT.RooFit.Cut("index==index::channel2"))
pdf.plotOn(plot1,ROOT.RooFit.ProjWData(data),ROOT.RooFit.Slice(w.cat("index"),"channel1"))
pdf.plotOn(plot2,ROOT.RooFit.ProjWData(data),ROOT.RooFit.Slice(w.cat("index"),"channel2"))
pdf.paramOn(plot1,ROOT.RooFit.Layout(0.65,0.85,0.85),ROOT.RooFit.Parameters(ROOT.RooArgSet(w.var("a1"),w.var("nbkg1"))))
pdf.paramOn(plot2,ROOT.RooFit.Layout(0.65,0.85,0.85),ROOT.RooFit.Parameters(ROOT.RooArgSet(w.var("a2"),w.var("nbkg2"))))
pdf.paramOn(plot2,ROOT.RooFit.Layout(0.65,0.85,0.7),ROOT.RooFit.Parameters(ROOT.RooArgSet(w.var("mu"))))
c.Divide(1,2)
c.cd(1)
plot1.Draw()
c.cd(2)
plot2.Draw()
c.Draw()
ROOT.RooStats.ModelConfig("ModelConfig",w)
mc.SetPdf(pdf)
mc.SetParametersOfInterest(ROOT.RooArgSet(w.var("mu")))
mc.SetObservables(ROOT.RooArgSet(w.var("x"),w.cat("index")))
w.defineSet("nuisParams","a1,nbkg1,a2,nbkg2")
mc.SetNuisanceParameters(w.set("nuisParams"))
At this point we should save a snapshot and import it all into the workspace
mc.SetSnapshot(ROOT.RooArgSet(w.var("mu")))
getattr(w,'import')(mc)
And save!
w.writeToFile("output/CombinedModel.root",True)