Getting Started

A translation of the RooFit Users Manual seen on January 2016 into Python the SWAN service

Tags should include. pyROOT, python, RooFit, RooStats, iPython, jupyter, ROOT and more. This page specifically features, working with TTrees, numpy, RooDataHist, RooRealVar and I add these since the links that I found when googling these links prooved invaluable.

In [1]:
import ROOT
Welcome to JupyROOT 6.08/02

jsroot allows for cool interactive zoomable plots. It's very cool but can be intensive and taxing on display resources

In [2]:
%jsroot on

Building a Model

A key concept in RooFit is a mathematical correspondance with objects.

  • RooRealVar expresses a variable
  • RooAbsReal expresses a function
  • RooAbsPdf expresses a probability density function
In [3]:
x = ROOT.RooRealVar("x","x",-10,10)
mean = ROOT.RooRealVar("mean","Mean of Gaussian",-10,10)
sigma = ROOT.RooRealVar("sigma","Width of Gaussian",3,-10,10)
gauss = ROOT.RooGaussian("gauss","gauss(x,mean,sigma)",x,mean,sigma)
RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

Models are built from variables and pdfs. The mathematical coherance is key.

Visualising a Model

In statistics the angle at which you view your data is important. Roostats uses frames to do this. First a frame is created then a model is fitted onto the frame. This frame (with model) is then drawn on a ROOT TCanvas.

In [4]:
c = ROOT.TCanvas()
xframe = x.frame()
gauss.plotOn(xframe)
xframe.Draw()
c.Draw()

A frame contains a snapshot of the item as soon as it is plotted onto it. It can contain different plots of the same distribution.

In [5]:
newframe = x.frame()
gauss.plotOn(newframe)
sigma.setVal(2)
gauss.plotOn(newframe,ROOT.RooFit.LineColor(2))
newframe.Draw()
c.Draw()

Using notebooks makes it difficult to plot several frames, since the notebook must be rendered at the correct order.

Interestingly, real values must be set using the setVal() function not simply to a float as in C++

Importing Data

Data usually come either binned or unbinned.

In [6]:
h1 = ROOT.TH1D("h1","gaussian histogram",20,-10,10)
h1.FillRandom("gaus",10000)
h2 = ROOT.TH1D("h2","gaussian histogram",20,-10,10)
for i in range(10000): h2.Fill(ROOT.gRandom.Gaus(0,3))
In [7]:
x = ROOT.RooRealVar("x","x",-10,10)
l = ROOT.RooArgList(x)
data = ROOT.RooDataHist("data", "data set with x1", l, h1)
data2 = ROOT.RooDataHist("data2", "data set with x2", l, h2)
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(data): fit range of variable x expanded to nearest bin boundaries: [-10,10] --> [-10,10]
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(data2): fit range of variable x expanded to nearest bin boundaries: [-10,10] --> [-10,10]

For some reason, the RooDataHist wont take the RooRealVar as an argument. As such we add it to the RooArgList.

In [8]:
xframe = x.frame()
data.plotOn(xframe)
data2.plotOn(xframe,ROOT.RooFit.MarkerColor(2))
xframe.Draw()
h1.Draw("same")
c.Draw()
[#1] INFO:Plotting -- RooPlot::updateFitRangeNorm: New event count of 9989 will supercede previous event count of 10000 for normalization of PDF projections
In [9]:
import numpy as np
tree = ROOT.TTree("tree","tree")
x = np.zeros(1,dtype=float)
tree.Branch("x",x,'x/D')
for i in range(10000):
    x[0] = np.random.normal(0,3,1)
    tree.Fill()
In [10]:
x = ROOT.RooRealVar("x","x",-10,10)
data = ROOT.RooDataSet("data","dataset from tree",tree,ROOT.RooArgSet(x))
[#1] INFO:Eval -- RooTreeDataStore::loadValues(data) Ignored 6 out of range events

As can be seen the benefits of using python mean that we can interface ROOT and numpy. Both of which is And now to plot!

In [11]:
xframe = x.frame()
data.plotOn(xframe,ROOT.RooFit.Binning(25))
xframe.Draw()
c.Draw()

Generally RooFit can use binned and unbinned data interchangeably since they both (RooDataSet and RooDataHist) inherit from the same class.

Fitting a Model to data

Usually in data analysis we want to test how well the data conforms to our hypothesis, in real terms this means comparing the results of the two and testing how similar they look. In order to do this we must create a test statistic. the most common choices are $\chi^2 \text{ and }-\log(likelihood)$

The default fit in RooFit is the maximum likelihood fit that matches the data's binned or unbinned status. Since RooFit is a ROOT implementation the minimisation is done by MINUIT through is TMinuit implementation.

The easiest fit is performed by the fitTo() method of class RooAbsPdf. Which builds a -log(L) function from the gauss function and the given dataset, then MINUIT minimizes it and estimates the errors.

In [12]:
x = ROOT.RooRealVar("x","x",-10,10)
mean = ROOT.RooRealVar("mean","Mean of Gaussian",0,-10,10)
sigma = ROOT.RooRealVar("sigma","Width of Gaussian",3,-10,10)
gauss = ROOT.RooGaussian("gauss","gauss(x,mean,sigma)",x,mean,sigma)

data = gauss.generate(ROOT.RooArgSet(x),10000)

xframe = x.frame()
data.plotOn(xframe, ROOT.RooLinkedList())
gauss.plotOn(xframe)
xframe.Draw()
c.Draw()

In this example the generate function is used to generate a data set from the function provided.

In [13]:
result = gauss.fitTo(data,ROOT.RooFit.PrintLevel(-1))
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
In [14]:
mean.Print()
sigma.Print()
RooRealVar::mean = 0.0172338 +/- 0.0299542  L(-10 - 10) 
RooRealVar::sigma = 2.98095 +/- 0.0217309  L(-10 - 10) 

These can be set to constant

In [15]:
mean.setConstant(ROOT.kTRUE)
sigma.setConstant(ROOT.kFALSE)

And the ranges modified.

In [16]:
sigma.setRange(0.1,3)
result = gauss.fitTo(data,ROOT.RooFit.Minos(ROOT.kTRUE),ROOT.RooFit.PrintLevel(-1))
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
In [17]:
gauss.fitTo(data,ROOT.RooFit.Range(-5,5),ROOT.RooFit.PrintLevel(-1))
Out[17]:
<ROOT.RooFitResult object at 0x(nil)>
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'fit' created with bounds [-5,5]
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_gauss_gaussData) constructing test statistic for sub-range named fit
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'NormalizationRangeForfit' created with bounds [-10,10]
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'fit_nll_gauss_gaussData' created with bounds [-5,5]
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_gauss_gaussData) fixing interpretation of coefficients of any RooAddPdf to full domain of observables 
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
In [18]:
xframe = x.frame()
data.plotOn(xframe,ROOT.RooLinkedList())
gauss.plotOn(xframe)
xframe.Draw()
c.Draw()
[#1] INFO:Plotting -- RooAbsPdf::plotOn(gauss) p.d.f was fitted in range and no explicit plot,norm range was specified, using fit range as default
[#1] INFO:Plotting -- RooAbsPdf::plotOn(gauss) only plotting range 'fit_nll_gauss_gaussData'
[#1] INFO:Plotting -- RooAbsPdf::plotOn(gauss) p.d.f. curve is normalized using explicit choice of ranges 'fit_nll_gauss_gaussData'

Calculating integrals over models

In [19]:
intGaussX = gauss.createIntegral(ROOT.RooArgSet(x))
intGaussX.getVal()
Out[19]:
7.438427223538795

Since intGaussX is a RooAbsReal

In [20]:
cdf = gauss.createCdf(ROOT.RooArgSet(x))
xframe = x.frame()
cdf.plotOn(xframe)
xframe.Draw()
c.Draw()