class Statsample::Bivariate::Polychoric

Polychoric correlation.

The polychoric correlation is a measure of bivariate association arising when both observed variates are ordered, categorical variables that result from polychotomizing the two undelying continuous variables (Drasgow, 2006)

According to Drasgow(2006), there are tree methods to estimate the polychoric correlation: ML Joint estimation, ML two-step estimation and polycoric series estimate. You can select the estimation method with method attribute.

ML Joint Estimation

Requires gsl library and gsl gem. Joint estimation uses derivative based algorithm by default, based on Ollson(1979). There is available a derivative free algorithm available compute_one_step_mle_without_derivatives() , based loosely on J.Fox R package 'polycor' algorithm.

Two-step Estimation

Default method. Uses a no-derivative aproach, based on J.Fox R package 'polycor'.

Polychoric series estimate.

Warning: Result diverge a lot from Joint and two-step calculation.

Requires gsl library and gsl gem. Based on Martinson and Hamdam(1975) algorithm.

Use

You should enter a Matrix with ordered data. For example:

      -------------------
      | y=0 | y=1 | y=2 | 
      -------------------
x = 0 |  1  |  10 | 20  |
      -------------------
x = 1 |  20 |  20 | 50  |
      -------------------

The code will be

matrix=Matrix[[1,10,20],[20,20,50]]
poly=Statsample::Bivariate::Polychoric.new(matrix, :method=>:joint)
puts poly.r

See extensive documentation on Uebersax(2002) and Drasgow(2006)

Reference

Constants

EPSILON

Epsilon

MAX_ITERATIONS

Max number of iteratios

METHOD

Default method

MINIMIZER_TYPE_JOINT_DERIVATIVE

GSL multidimensional minimizer, derivative based

MINIMIZER_TYPE_JOINT_NO_DERIVATIVE

GSL multidimensional minimizer, non derivative based

MINIMIZER_TYPE_TWO_STEP

GSL unidimensional minimizer

Attributes

alpha[R]

Returns the rows thresholds

beta[R]

Returns the columns thresholds

debug[RW]

Debug algorithm (See iterations, for example)

epsilon[RW]

Absolute error for iteration.

failed[R]
iteration[R]

Number of iterations

log[R]

Log of algorithm

loglike_model[R]

Model ll

max_iterations[RW]

Max number of iterations used on iterative methods. Default to MAX_ITERATIONS

method[RW]

Method of calculation of polychoric series. :two_step used by default.

:two_step

two-step ML, based on code by J.Fox

:polychoric_series

polychoric series estimate, using algorithm AS87 by Martinson and Hamdan (1975).

:joint

one-step ML, usign derivatives by Olsson (1979)

minimizer_type_joint_derivative[RW]

Minimizer type for joint estimate, using derivative. Default “conjugate_pr”. See rb-gsl.rubyforge.org/min.html for reference.

minimizer_type_joint_no_derivative[RW]

Minimizer type for joint estimate, no derivative. Default “nmsimplex”. See rb-gsl.rubyforge.org/min.html for reference.

minimizer_type_two_step[RW]

Minimizer type for two step. Default “brent” See rb-gsl.rubyforge.org/min.html for reference.

name[RW]

Name of the analysis

r[R]

Returns the polychoric correlation

threshold_x[R]

Returns the rows thresholds

threshold_y[R]

Returns the columns thresholds

Public Class Methods

new(matrix, opts=Hash.new) click to toggle source

Params:

  • matrix: Contingence table

  • opts: Hash with options. Could be any

accessable attribute of object

# File lib/statsample/bivariate/polychoric.rb, line 168
def initialize(matrix, opts=Hash.new)
  @matrix=matrix
  @n=matrix.column_size
  @m=matrix.row_size
  raise "row size <1" if @m<=1
  raise "column size <1" if @n<=1
  
  @method=METHOD
  @name=_("Polychoric correlation")
  @max_iterations=MAX_ITERATIONS
  @epsilon=EPSILON
  @minimizer_type_two_step=MINIMIZER_TYPE_TWO_STEP
  @minimizer_type_joint_no_derivative=MINIMIZER_TYPE_JOINT_NO_DERIVATIVE
  @minimizer_type_joint_derivative=MINIMIZER_TYPE_JOINT_DERIVATIVE
  
  @debug=false
  @iteration=nil
  opts.each{|k,v|
    self.send("#{k}=",v) if self.respond_to? k
  }
  @r=nil
  @pd=nil
  @failed=false
  compute_basic_parameters
end
new_with_vectors(v1,v2) click to toggle source

Create a Polychoric object, based on two vectors

# File lib/statsample/bivariate/polychoric.rb, line 161
def self.new_with_vectors(v1,v2)
  Polychoric.new(Crosstab.new(v1,v2).to_matrix)
end

Public Instance Methods

compute() click to toggle source

Start the computation of polychoric correlation based on attribute method.

# File lib/statsample/bivariate/polychoric.rb, line 201
def compute
  if @method==:two_step
    compute_two_step_mle
  elsif @method==:joint
    compute_one_step_mle
  elsif @method==:polychoric_series
    compute_polychoric_series
  else
    raise "Not implemented"
  end
end

Estimation methods

↑ top

Public Instance Methods

compute_one_step_mle() click to toggle source

Compute joint ML estimation. Uses compute_one_step_mle_with_derivatives() by default.

# File lib/statsample/bivariate/polychoric.rb, line 382
def compute_one_step_mle
  compute_one_step_mle_with_derivatives
end
compute_one_step_mle_with_derivatives() click to toggle source

Compute Polychoric correlation with joint estimate, usign derivative based minimization method.

Much faster than method without derivatives.

# File lib/statsample/bivariate/polychoric.rb, line 390
def compute_one_step_mle_with_derivatives
  # Get initial values with two-step aproach
  compute_two_step_mle
  # Start iteration with past values
  rho=@r
  cut_alpha=@alpha
  cut_beta=@beta
  parameters=[rho]+cut_alpha+cut_beta
  np=@nc-1+@nr
  
  
  loglike_f = Proc.new { |v, params|
    new_rho=v[0]
    new_alpha=v[1, @nr-1]
    new_beta=v[@nr, @nc-1]
    pr=Processor.new(new_alpha,new_beta,new_rho,@matrix)
    pr.loglike
  }
  
  loglike_df = Proc.new {|v, params, df |
    compute_derivatives_vector(v,df)
  }
    
  
  my_func = GSL::MultiMin::Function_fdf.alloc(loglike_f,loglike_df, np)
  my_func.set_params(parameters)      # parameters
  
  x = GSL::Vector.alloc(parameters.dup)
  minimizer = GSL::MultiMin::FdfMinimizer.alloc(minimizer_type_joint_derivative,np)
  minimizer.set(my_func, x, 1, 1e-3)
  
  iter = 0
  message=""
  begin_time=Time.new
  begin
    iter += 1
    status = minimizer.iterate()
    #p minimizer.f
    #p minimizer.gradient
    status = minimizer.test_gradient(1e-3)
    if status == GSL::SUCCESS
      total_time=Time.new-begin_time
      message+="Joint MLE converged to minimum on %0.3f seconds at\n" % total_time
    end
    x = minimizer.x
    message+= sprintf("%5d iterations", iter)+"\n";
    message+= "args="
    for i in 0...np do
      message+=sprintf("%10.3e ", x[i])
    end
    message+=sprintf("f() = %7.3f\n"  , minimizer.f)+"\n";
  rescue => e
    @failed=true
  end while status == GSL::CONTINUE and iter < @max_iterations
  
  @iteration=iter
  @log+=message        
  @r=minimizer.x[0]
  @alpha=minimizer.x[1,@nr-1].to_a
  @beta=minimizer.x[@nr,@nc-1].to_a
  @loglike_model= -minimizer.minimum
  
  pr=Processor.new(@alpha,@beta,@r,@matrix)
  
end
compute_one_step_mle_without_derivatives() click to toggle source

Compute Polychoric correlation with joint estimate, usign derivative-less minimization method.

Rho and thresholds are estimated at same time. Code based on R package “polycor”, by J.Fox.

# File lib/statsample/bivariate/polychoric.rb, line 463
def compute_one_step_mle_without_derivatives
  # Get initial values with two-step aproach
  compute_two_step_mle
  # Start iteration with past values
  rho=@r
  cut_alpha=@alpha
  cut_beta=@beta
  parameters=[rho]+cut_alpha+cut_beta
  np=@nc-1+@nr

  minimization = Proc.new { |v, params|
    new_rho=v[0]
   new_alpha=v[1, @nr-1]
   new_beta=v[@nr, @nc-1]
   
   #puts "f'rho=#{fd_loglike_rho(alpha,beta,rho)}"
   #(@nr-1).times {|k|
   #  puts "f'a(#{k}) = #{fd_loglike_a(alpha,beta,rho,k)}"
   #  puts "f'a(#{k}) v2 = #{fd_loglike_a2(alpha,beta,rho,k)}"
   #
   #}
   #(@nc-1).times {|k|
   #  puts "f'b(#{k}) = #{fd_loglike_b(alpha,beta,rho,k)}"
   #}
   pr=Processor.new(new_alpha,new_beta,new_rho,@matrix)
   
   df=Array.new(np)
   #compute_derivatives_vector(v,df)
   pr.loglike
  }
  my_func = GSL::MultiMin::Function.alloc(minimization, np)
  my_func.set_params(parameters)      # parameters
  
  x = GSL::Vector.alloc(parameters.dup)
  
  ss = GSL::Vector.alloc(np)
  ss.set_all(1.0)
  
  minimizer = GSL::MultiMin::FMinimizer.alloc(minimizer_type_joint_no_derivative,np)
  minimizer.set(my_func, x, ss)
  
  iter = 0
  message=""
  begin_time=Time.new
  begin
    iter += 1
    status = minimizer.iterate()
    status = minimizer.test_size(@epsilon)
    if status == GSL::SUCCESS
      total_time=Time.new-begin_time
      message="Joint MLE converged to minimum on %0.3f seconds at\n" % total_time
    end
    x = minimizer.x
    message+= sprintf("%5d iterations", iter)+"\n";
    for i in 0...np do
      message+=sprintf("%10.3e ", x[i])
    end
    message+=sprintf("f() = %7.3f size = %.3f\n", minimizer.fval, minimizer.size)+"\n";
  end while status == GSL::CONTINUE and iter < @max_iterations
  @iteration=iter
  @log+=message        
  @r=minimizer.x[0]
  @alpha=minimizer.x[1,@nr-1].to_a
  @beta=minimizer.x[@nr,@nc-1].to_a
  @loglike_model= -minimizer.minimum
end
compute_polychoric_series() click to toggle source

Compute polychoric correlation using polychoric series. Algorithm: AS87, by Martinson and Hamdam(1975).

Warning: According to Drasgow(2006), this computation diverges greatly of joint and two-step methods.

# File lib/statsample/bivariate/polychoric.rb, line 577
def compute_polychoric_series 
  @nn=@n-1
  @mm=@m-1
  @nn7=7*@nn
  @mm7=7*@mm
  @mn=@n*@m
  @cont=[nil]
  @n.times {|j|
    @m.times {|i|
      @cont.push(@matrix[i,j])
    }
  }

  pcorl=0
  cont=@cont
  xmean=0.0
  sum=0.0
  row=[]
  colmn=[]
  (1..@m).each do |i|
    row[i]=0.0
    l=i
    (1..@n).each do |j|
      row[i]=row[i]+cont[l]
      l+=@m
    end
    raise "Should not be empty rows" if(row[i]==0.0)
    xmean=xmean+row[i]*i.to_f
    sum+=row[i]
  end
  xmean=xmean/sum.to_f
  ymean=0.0
  (1..@n).each do |j|
    colmn[j]=0.0
    l=(j-1)*@m
    (1..@m).each do |i|
      l=l+1
      colmn[j]=colmn[j]+cont[l] #12
    end
    raise "Should not be empty cols" if colmn[j]==0
    ymean=ymean+colmn[j]*j.to_f
  end
  ymean=ymean/sum.to_f
  covxy=0.0
  (1..@m).each do |i|
    l=i
    (1..@n).each do |j|
      conxy=covxy+cont[l]*(i.to_f-xmean)*(j.to_f-ymean)
      l=l+@m
    end
  end
  
  chisq=0.0
  (1..@m).each do |i|
    l=i
    (1..@n).each do |j|
      chisq=chisq+((cont[l]**2).quo(row[i]*colmn[j]))
      l=l+@m
    end
  end
  
  phisq=chisq-1.0-(@mm*@nn).to_f / sum.to_f
  phisq=0 if(phisq<0.0) 
  # Compute cumulative sum of columns and rows
  sumc=[]
  sumr=[]
  sumc[1]=colmn[1]
  sumr[1]=row[1]
  cum=0
  (1..@nn).each do |i| # goto 17 r20
    cum=cum+colmn[i]
    sumc[i]=cum
  end
  cum=0
  (1..@mm).each do |i| 
    cum=cum+row[i]
    sumr[i]=cum
  end
  alpha=[]
  beta=[]
  # Compute points of polytomy
  (1..@mm).each do |i| #do 21
    alpha[i]=Distribution::Normal.p_value(sumr[i] / sum.to_f)
  end # 21
  (1..@nn).each do |i| #do 22
    beta[i]=Distribution::Normal.p_value(sumc[i] / sum.to_f)
  end # 21
  @alpha=alpha[1,alpha.size] 
  @beta=beta[1,beta.size]
  @sumr=row[1,row.size]
  @sumc=colmn[1,colmn.size]
  @total=sum
  
  # Compute Fourier coefficients a and b. Verified
  h=hermit(alpha,@mm)
  hh=hermit(beta,@nn)
  a=[]
  b=[]
  if @m!=2 # goto 24
    mmm=@m-2
    (1..mmm).each do |i| #do 23
      a1=sum.quo(row[i+1] * sumr[i] * sumr[i+1])
      a2=sumr[i]   * xnorm(alpha[i+1])
      a3=sumr[i+1] * xnorm(alpha[i])
      l=i
      (1..7).each do |j| #do 23
        a[l]=Math::sqrt(a1.quo(j))*(h[l+1] * a2 - h[l] * a3)
        l=l+@mm
      end
    end #23
  end
  # 24
  
  
  if @n!=2 # goto 26
    nnn=@n-2
    (1..nnn).each do |i| #do 25
      a1=sum.quo(colmn[i+1] * sumc[i] * sumc[i+1])
      a2=sumc[i] * xnorm(beta[i+1])
      a3=sumc[i+1] * xnorm(beta[i])
      l=i
      (1..7).each do |j| #do 25
        b[l]=Math::sqrt(a1.quo(j))*(a2 * hh[l+1] - a3*hh[l])
        l=l+@nn
      end # 25
    end # 25
  end
  #26 r20
  l = @mm
  a1 = -sum * xnorm(alpha[@mm])
  a2 = row[@m] * sumr[@mm] 
  (1..7).each do |j| # do 27
    a[l]=a1 * h[l].quo(Math::sqrt(j*a2))
    l=l+@mm
  end # 27
  
  l = @nn
  a1 = -sum * xnorm(beta[@nn])
  a2 = colmn[@n] * sumc[@nn]

  (1..7).each do |j| # do 28
    b[l]=a1 * hh[l].quo(Math::sqrt(j*a2))
    l = l + @nn
  end # 28
  rcof=[]
  # compute coefficients rcof of polynomial of order 8
  rcof[1]=-phisq
  (2..9).each do |i| # do 30
    rcof[i]=0.0
  end #30
  m1=@mm
  (1..@mm).each do |i| # do 31
    m1=m1+1
    m2=m1+@mm
    m3=m2+@mm
    m4=m3+@mm
    m5=m4+@mm
    m6=m5+@mm
    n1=@nn
    (1..@nn).each do |j| # do 31
      n1=n1+1
      n2=n1+@nn
      n3=n2+@nn
      n4=n3+@nn
      n5=n4+@nn
      n6=n5+@nn
      
      rcof[3] = rcof[3] + a[i]**2 * b[j]**2
      
      rcof[4] = rcof[4] + 2.0 * a[i] * a[m1] * b[j] * b[n1]
      
      rcof[5] = rcof[5] + a[m1]**2 * b[n1]**2 +
        2.0 * a[i] * a[m2] * b[j] * b[n2]
      
      rcof[6] = rcof[6] + 2.0 * (a[i] * a[m3] * b[j] *
        b[n3] + a[m1] * a[m2] * b[n1] * b[n2])
      
      rcof[7] = rcof[7] + a[m2]**2 * b[n2]**2 +
        2.0 * (a[i] * a[m4] * b[j] * b[n4] + a[m1] * a[m3] *
          b[n1] * b[n3])
      
      rcof[8] = rcof[8] + 2.0 * (a[i] * a[m5] * b[j] * b[n5] +
        a[m1] * a[m4] * b[n1] * b[n4] + a[m2] *  a[m3] * b[n2] * b[n3])
      
      rcof[9] = rcof[9] + a[m3]**2 * b[n3]**2 +
        2.0 * (a[i] * a[m6] * b[j] * b[n6] + a[m1] * a[m5] * b[n1] *
        b[n5] + (a[m2] * a[m4] * b[n2] * b[n4]))
    end # 31
  end # 31

  rcof=rcof[1,rcof.size]
  poly = GSL::Poly.alloc(rcof)
  roots=poly.solve
  rootr=[nil]
  rooti=[nil]
  roots.each {|c|
    rootr.push(c.real)
    rooti.push(c.im)
  }
  @rootr=rootr
  @rooti=rooti
  
  norts=0
  (1..7).each do |i| # do 43
    
    next if rooti[i]!=0.0 
    if (covxy>=0.0)
      next if(rootr[i]<0.0 or rootr[i]>1.0)
      pcorl=rootr[i]
      norts=norts+1
    else
      if (rootr[i]>=-1.0 and rootr[i]<0.0)
        pcorl=rootr[i]
        norts=norts+1              
      end
    end
  end # 43
  raise "Error" if norts==0
  @r=pcorl
  pr=Processor.new(@alpha,@beta,@r,@matrix)
  @loglike_model=-pr.loglike
  
end
compute_two_step_mle() click to toggle source

Computation of polychoric correlation usign two-step ML estimation.

Two-step ML estimation “first estimates the thresholds from the one-way marginal frequencies, then estimates rho, conditional on these thresholds, via maximum likelihood” (Uebersax, 2006).

The algorithm is based on code by Gegenfurtner(1992).

References:

  • Gegenfurtner, K. (1992). PRAXIS: Brent's algorithm for function minimization. Behavior Research Methods, Instruments & Computers, 24(4), 560-564. Available on www.allpsych.uni-giessen.de/karl/pdf/03.praxis.pdf

  • Uebersax, J.S. (2006). The tetrachoric and polychoric correlation coefficients. Statistical Methods for Rater Agreement web site. 2006. Available at: john-uebersax.com/stat/tetra.htm . Accessed February, 11, 2010

# File lib/statsample/bivariate/polychoric.rb, line 287
def compute_two_step_mle
  if Statsample.has_gsl?
    compute_two_step_mle_gsl
  else
    compute_two_step_mle_ruby
  end
end
compute_two_step_mle_gsl() click to toggle source

Compute two step ML estimation using gsl.

# File lib/statsample/bivariate/polychoric.rb, line 319
def compute_two_step_mle_gsl 
  
fn1=GSL::Function.alloc {|rho|
  pr=Processor.new(@alpha,@beta, rho, @matrix)
  pr.loglike
}
@iteration = 0
max_iter = @max_iterations
m = 0             # initial guess
m_expected = 0
a=-0.9999
b=+0.9999
gmf = GSL::Min::FMinimizer.alloc(@minimizer_type_two_step)
gmf.set(fn1, m, a, b)
header=_("Two step minimization using %s method\n") % gmf.name
header+=sprintf("%5s [%9s, %9s] %9s %10s %9s\n", "iter", "lower", "upper", "min",
   "err", "err(est)")
  
header+=sprintf("%5d [%.7f, %.7f] %.7f %+.7f %.7f\n", @iteration, a, b, m, m - m_expected, b - a)
@log=header
puts header if @debug
begin
  @iteration += 1
  status = gmf.iterate
  status = gmf.test_interval(@epsilon, 0.0)
  
  if status == GSL::SUCCESS
    @log+="converged:"
    puts "converged:" if @debug
  end
  a = gmf.x_lower
  b = gmf.x_upper
  m = gmf.x_minimum
  message=sprintf("%5d [%.7f, %.7f] %.7f %+.7f %.7f\n",
    @iteration, a, b, m, m - m_expected, b - a);
  @log+=message
  puts message if @debug
end while status == GSL::CONTINUE and @iteration < @max_iterations
@r=gmf.x_minimum
@loglike_model=-gmf.f_minimum
end

LL methods

↑ top

Public Instance Methods

chi_square() click to toggle source

Chi Square of model

# File lib/statsample/bivariate/polychoric.rb, line 230
def chi_square
  if @loglike_model.nil?
    compute
  end
  -2*(@loglike_model-loglike_data)
end
chi_square_df() click to toggle source
# File lib/statsample/bivariate/polychoric.rb, line 237
def chi_square_df
  (@nr*@nc)-@nc-@nr
end
compute_basic_parameters() click to toggle source
# File lib/statsample/bivariate/polychoric.rb, line 244
def compute_basic_parameters
  @nr=@matrix.row_size
  @nc=@matrix.column_size
  @sumr=[0]*@matrix.row_size
  @sumrac=[0]*@matrix.row_size
  @sumc=[0]*@matrix.column_size
  @sumcac=[0]*@matrix.column_size
  @alpha=[0]*(@nr-1)
  @beta=[0]*(@nc-1)
  @total=0
  @nr.times do |i|
    @nc.times do |j|
      @sumr[i]+=@matrix[i,j]
      @sumc[j]+=@matrix[i,j]
      @total+=@matrix[i,j]
    end
  end
  ac=0
  (@nr-1).times do |i|
    @sumrac[i]=@sumr[i]+ac
    @alpha[i]=Distribution::Normal.p_value(@sumrac[i] / @total.to_f)
    ac=@sumrac[i]
  end
  ac=0
  (@nc-1).times do |i|
    @sumcac[i]=@sumc[i]+ac
    @beta[i]=Distribution::Normal.p_value(@sumcac[i] / @total.to_f)
    ac=@sumcac[i]
  end
end
loglike_data() click to toggle source
# File lib/statsample/bivariate/polychoric.rb, line 215
def loglike_data
  loglike=0
  @nr.times do |i|
    @nc.times do |j|
      res=@matrix[i,j].quo(@total)
      if (res==0)
        res=1e-16
      end
    loglike+= @matrix[i,j]  * Math::log(res )
    end
  end
  loglike
end