class Statsample::Bivariate::Polychoric
Polychoric correlation
The polychoric correlation is a measure of bivariate association arising when both observed variates are ordered, categorical variables that result from polychotomizing two underlying continuous variables (Drasgow, 2006).

According to Drasgow (2006), there are three methods to estimate the polychoric correlation: ML joint estimation, ML two-step estimation and the polychoric series estimate. You can select the estimation method with the method attribute.
ML Joint Estimation
Requires the gsl library and the gsl gem. Joint estimation uses a derivative-based algorithm by default, following Olsson (1979). A derivative-free algorithm is also available through compute_one_step_mle_without_derivatives(), based loosely on the algorithm in J. Fox's R package 'polycor'.
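A minimal sketch of forcing the derivative-free joint algorithm instead of the default derivative-based one; the matrix values come from the Use example below, and the exact call sequence is an assumption:

require 'statsample'

# Example contingency table from the "Use" section below.
matrix = Matrix[[1, 10, 20], [20, 20, 50]]

poly = Statsample::Bivariate::Polychoric.new(matrix)
poly.compute_one_step_mle_without_derivatives  # derivative-free joint MLE
puts poly.r                                    # estimated polychoric correlation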
Two-step Estimation
Default method. Uses a derivative-free approach, based on J. Fox's R package 'polycor'.
Polychoric series estimate
Warning: results diverge considerably from the joint and two-step estimates.

Requires the gsl library and the gsl gem. Based on the Martinson and Hamdan (1975) algorithm.
Use
You should enter a Matrix with ordered data. For example:
        -------------------
        | y=0 | y=1 | y=2 |
        -------------------
x = 0   |   1 |  10 |  20 |
        -------------------
x = 1   |  20 |  20 |  50 |
        -------------------
The code will be:

matrix = Matrix[[1,10,20],[20,20,50]]
poly   = Statsample::Bivariate::Polychoric.new(matrix, :method=>:joint)
puts poly.r
See the extensive documentation in Uebersax (2006) and Drasgow (2006).
References
- Drasgow, F. (2006). Polychoric and polyserial correlations. In Kotz, L., & Johnson, N.L. (Eds.), Encyclopedia of Statistical Sciences, Vol. 7 (pp. 69-74). New York: Wiley.
- Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443-460.
- Uebersax, J.S. (2006). The tetrachoric and polychoric correlation coefficients. Statistical Methods for Rater Agreement web site. Available at: john-uebersax.com/stat/tetra.htm. Accessed February 11, 2010.
Constants
- EPSILON
Epsilon
- MAX_ITERATIONS
Maximum number of iterations
- METHOD
Default method
- MINIMIZER_TYPE_JOINT_DERIVATIVE
GSL multidimensional minimizer, derivative based
- MINIMIZER_TYPE_JOINT_NO_DERIVATIVE
GSL multidimensional minimizer, non derivative based
- MINIMIZER_TYPE_TWO_STEP
GSL unidimensional minimizer
Attributes
Returns the row thresholds
Returns the column thresholds
Debug algorithm (See iterations, for example)
Absolute error for iteration.
Number of iterations
Log of algorithm
Model log likelihood
Maximum number of iterations used by iterative methods. Defaults to MAX_ITERATIONS.
Method of calculation of the polychoric correlation. :two_step is used by default (see the sketch after this list).

- :two_step: two-step ML, based on code by J. Fox.
- :polychoric_series: polychoric series estimate, using algorithm AS87 by Martinson and Hamdan (1975).
- :joint: one-step ML, using derivatives, by Olsson (1979).
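A minimal sketch of switching the estimation method at runtime and recomputing; it assumes the method writer implied by the text above and that compute() refreshes the stored correlation:

require 'statsample'

poly = Statsample::Bivariate::Polychoric.new(Matrix[[1, 10, 20], [20, 20, 50]])
puts poly.r                      # default :two_step estimate

poly.method = :polychoric_series # switch estimator (see warning above)
poly.compute                     # recompute with the new method
puts poly.r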
Minimizer type for the joint estimate using derivatives. Default "conjugate_pr". See rb-gsl.rubyforge.org/min.html for reference.
Minimizer type for the joint estimate without derivatives. Default "nmsimplex". See rb-gsl.rubyforge.org/min.html for reference.
Minimizer type for the two-step estimate. Default "brent". See rb-gsl.rubyforge.org/min.html for reference.
Name of the analysis
Returns the polychoric correlation
Returns the row thresholds
Returns the column thresholds
Public Class Methods
Params:

- matrix: contingency table
- opts: Hash with options. Could be any accessible attribute of the object.
# File lib/statsample/bivariate/polychoric.rb, line 168
def initialize(matrix, opts=Hash.new)
  @matrix=matrix
  @n=matrix.column_size
  @m=matrix.row_size
  raise "row size <1" if @m<=1
  raise "column size <1" if @n<=1
  @method=METHOD
  @name=_("Polychoric correlation")
  @max_iterations=MAX_ITERATIONS
  @epsilon=EPSILON
  @minimizer_type_two_step=MINIMIZER_TYPE_TWO_STEP
  @minimizer_type_joint_no_derivative=MINIMIZER_TYPE_JOINT_NO_DERIVATIVE
  @minimizer_type_joint_derivative=MINIMIZER_TYPE_JOINT_DERIVATIVE
  @debug=false
  @iteration=nil
  opts.each{|k,v|
    self.send("#{k}=",v) if self.respond_to? k
  }
  @r=nil
  @pd=nil
  @failed=false
  compute_basic_parameters
end
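A minimal sketch of passing options through opts; the option names below are inferred from the Attributes section and the constants above, and their exact writer availability is an assumption:

require 'statsample'

poly = Statsample::Bivariate::Polychoric.new(
  Matrix[[1, 10, 20], [20, 20, 50]],
  :method         => :two_step,  # estimation method
  :max_iterations => 200,        # cap for the iterative minimizers (assumed name)
  :epsilon        => 1e-6,       # absolute error for iteration (assumed name)
  :debug          => true        # print the iteration trace (assumed name)
)
puts poly.r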
Create a Polychoric object based on two vectors.
# File lib/statsample/bivariate/polychoric.rb, line 161
def self.new_with_vectors(v1,v2)
  Polychoric.new(Crosstab.new(v1,v2).to_matrix)
end
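A minimal sketch of building the object from two ordinal vectors; the Array#to_scale conversion is an assumption about the vector helpers shipped with this statsample version:

require 'statsample'

# Hypothetical ordinal data; to_scale is assumed to convert Arrays to
# Statsample vectors in the installed statsample version.
x = [0, 0, 1, 1, 1, 2, 2, 2].to_scale
y = [0, 1, 1, 1, 2, 2, 2, 2].to_scale

poly = Statsample::Bivariate::Polychoric.new_with_vectors(x, y)
puts poly.r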
Public Instance Methods
Start the computation of the polychoric correlation, based on the method attribute.
# File lib/statsample/bivariate/polychoric.rb, line 201
def compute
  if @method==:two_step
    compute_two_step_mle
  elsif @method==:joint
    compute_one_step_mle
  elsif @method==:polychoric_series
    compute_polychoric_series
  else
    raise "Not implemented"
  end
end
Estimation methods

Public Instance Methods
Compute the joint ML estimation. Uses compute_one_step_mle_with_derivatives() by default.
# File lib/statsample/bivariate/polychoric.rb, line 382
def compute_one_step_mle
  compute_one_step_mle_with_derivatives
end
Compute the polychoric correlation with the joint estimate, using a derivative-based minimization method.

Much faster than the method without derivatives.
# File lib/statsample/bivariate/polychoric.rb, line 390
def compute_one_step_mle_with_derivatives
  # Get initial values with two-step approach
  compute_two_step_mle
  # Start iteration with past values
  rho=@r
  cut_alpha=@alpha
  cut_beta=@beta
  parameters=[rho]+cut_alpha+cut_beta
  np=@nc-1+@nr

  loglike_f = Proc.new { |v, params|
    new_rho=v[0]
    new_alpha=v[1, @nr-1]
    new_beta=v[@nr, @nc-1]
    pr=Processor.new(new_alpha,new_beta,new_rho,@matrix)
    pr.loglike
  }

  loglike_df = Proc.new {|v, params, df|
    compute_derivatives_vector(v,df)
  }

  my_func = GSL::MultiMin::Function_fdf.alloc(loglike_f, loglike_df, np)
  my_func.set_params(parameters)      # parameters

  x = GSL::Vector.alloc(parameters.dup)

  minimizer = GSL::MultiMin::FdfMinimizer.alloc(minimizer_type_joint_derivative,np)
  minimizer.set(my_func, x, 1, 1e-3)

  iter = 0
  message=""
  begin_time=Time.new
  begin
    iter += 1
    status = minimizer.iterate()
    #p minimizer.f
    #p minimizer.gradient
    status = minimizer.test_gradient(1e-3)
    if status == GSL::SUCCESS
      total_time=Time.new-begin_time
      message+="Joint MLE converged to minimum on %0.3f seconds at\n" % total_time
    end
    x = minimizer.x
    message+= sprintf("%5d iterations", iter)+"\n";
    message+= "args="
    for i in 0...np do
      message+=sprintf("%10.3e ", x[i])
    end
    message+=sprintf("f() = %7.3f\n" , minimizer.f)+"\n";
  rescue => e
    @failed=true
  end while status == GSL::CONTINUE and iter < @max_iterations

  @iteration=iter
  @log+=message
  @r=minimizer.x[0]
  @alpha=minimizer.x[1,@nr-1].to_a
  @beta=minimizer.x[@nr,@nc-1].to_a
  @loglike_model= -minimizer.minimum

  pr=Processor.new(@alpha,@beta,@r,@matrix)
end
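A minimal sketch of swapping the derivative-based GSL minimizer before running the joint estimate; "vector_bfgs" is another rb-gsl FdfMinimizer type, and the writer name for this attribute is inferred from the Attributes section:

require 'statsample'

poly = Statsample::Bivariate::Polychoric.new(
  Matrix[[1, 10, 20], [20, 20, 50]],
  :method => :joint
)
# Assumption: the "Minimizer type for joint estimate, using derivatives"
# attribute is writable as minimizer_type_joint_derivative=.
poly.minimizer_type_joint_derivative = "vector_bfgs"
poly.compute
puts poly.r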
Compute the polychoric correlation with the joint estimate, using a derivative-free minimization method.

Rho and the thresholds are estimated at the same time. Code based on the R package "polycor", by J. Fox.
# File lib/statsample/bivariate/polychoric.rb, line 463
def compute_one_step_mle_without_derivatives
  # Get initial values with two-step approach
  compute_two_step_mle
  # Start iteration with past values
  rho=@r
  cut_alpha=@alpha
  cut_beta=@beta
  parameters=[rho]+cut_alpha+cut_beta
  np=@nc-1+@nr

  minimization = Proc.new { |v, params|
    new_rho=v[0]
    new_alpha=v[1, @nr-1]
    new_beta=v[@nr, @nc-1]

    #puts "f'rho=#{fd_loglike_rho(alpha,beta,rho)}"
    #(@nr-1).times {|k|
    #  puts "f'a(#{k}) = #{fd_loglike_a(alpha,beta,rho,k)}"
    #  puts "f'a(#{k}) v2 = #{fd_loglike_a2(alpha,beta,rho,k)}"
    #
    #}
    #(@nc-1).times {|k|
    #  puts "f'b(#{k}) = #{fd_loglike_b(alpha,beta,rho,k)}"
    #}
    pr=Processor.new(new_alpha,new_beta,new_rho,@matrix)
    df=Array.new(np)
    #compute_derivatives_vector(v,df)
    pr.loglike
  }
  my_func = GSL::MultiMin::Function.alloc(minimization, np)
  my_func.set_params(parameters)      # parameters

  x = GSL::Vector.alloc(parameters.dup)
  ss = GSL::Vector.alloc(np)
  ss.set_all(1.0)

  minimizer = GSL::MultiMin::FMinimizer.alloc(minimizer_type_joint_no_derivative,np)
  minimizer.set(my_func, x, ss)

  iter = 0
  message=""
  begin_time=Time.new
  begin
    iter += 1
    status = minimizer.iterate()
    status = minimizer.test_size(@epsilon)
    if status == GSL::SUCCESS
      total_time=Time.new-begin_time
      message="Joint MLE converged to minimum on %0.3f seconds at\n" % total_time
    end
    x = minimizer.x
    message+= sprintf("%5d iterations", iter)+"\n";
    for i in 0...np do
      message+=sprintf("%10.3e ", x[i])
    end
    message+=sprintf("f() = %7.3f size = %.3f\n", minimizer.fval, minimizer.size)+"\n";
  end while status == GSL::CONTINUE and iter < @max_iterations
  @iteration=iter
  @log+=message
  @r=minimizer.x[0]
  @alpha=minimizer.x[1,@nr-1].to_a
  @beta=minimizer.x[@nr,@nc-1].to_a
  @loglike_model= -minimizer.minimum
end
Compute the polychoric correlation using the polychoric series. Algorithm: AS87, by Martinson and Hamdan (1975).

Warning: according to Drasgow (2006), this computation diverges greatly from the joint and two-step methods.
# File lib/statsample/bivariate/polychoric.rb, line 577
def compute_polychoric_series
  @nn=@n-1
  @mm=@m-1
  @nn7=7*@nn
  @mm7=7*@mm
  @mn=@n*@m
  @cont=[nil]
  @n.times {|j|
    @m.times {|i|
      @cont.push(@matrix[i,j])
    }
  }

  pcorl=0
  cont=@cont
  xmean=0.0
  sum=0.0
  row=[]
  colmn=[]
  (1..@m).each do |i|
    row[i]=0.0
    l=i
    (1..@n).each do |j|
      row[i]=row[i]+cont[l]
      l+=@m
    end
    raise "Should not be empty rows" if(row[i]==0.0)
    xmean=xmean+row[i]*i.to_f
    sum+=row[i]
  end
  xmean=xmean/sum.to_f

  ymean=0.0
  (1..@n).each do |j|
    colmn[j]=0.0
    l=(j-1)*@m
    (1..@m).each do |i|
      l=l+1
      colmn[j]=colmn[j]+cont[l] #12
    end
    raise "Should not be empty cols" if colmn[j]==0
    ymean=ymean+colmn[j]*j.to_f
  end
  ymean=ymean/sum.to_f

  # Accumulate covariance; its sign is used later to select the root.
  covxy=0.0
  (1..@m).each do |i|
    l=i
    (1..@n).each do |j|
      covxy=covxy+cont[l]*(i.to_f-xmean)*(j.to_f-ymean)
      l=l+@m
    end
  end

  chisq=0.0
  (1..@m).each do |i|
    l=i
    (1..@n).each do |j|
      chisq=chisq+((cont[l]**2).quo(row[i]*colmn[j]))
      l=l+@m
    end
  end

  phisq=chisq-1.0-(@mm*@nn).to_f / sum.to_f
  phisq=0 if(phisq<0.0)
  # Compute cumulative sum of columns and rows
  sumc=[]
  sumr=[]
  sumc[1]=colmn[1]
  sumr[1]=row[1]
  cum=0
  (1..@nn).each do |i| # goto 17 r20
    cum=cum+colmn[i]
    sumc[i]=cum
  end
  cum=0
  (1..@mm).each do |i|
    cum=cum+row[i]
    sumr[i]=cum
  end
  alpha=[]
  beta=[]
  # Compute points of polytomy
  (1..@mm).each do |i| # do 21
    alpha[i]=Distribution::Normal.p_value(sumr[i] / sum.to_f)
  end # 21
  (1..@nn).each do |i| # do 22
    beta[i]=Distribution::Normal.p_value(sumc[i] / sum.to_f)
  end # 22
  @alpha=alpha[1,alpha.size]
  @beta=beta[1,beta.size]
  @sumr=row[1,row.size]
  @sumc=colmn[1,colmn.size]
  @total=sum

  # Compute Fourier coefficients a and b. Verified
  h=hermit(alpha,@mm)
  hh=hermit(beta,@nn)
  a=[]
  b=[]
  if @m!=2 # goto 24
    mmm=@m-2
    (1..mmm).each do |i| # do 23
      a1=sum.quo(row[i+1] * sumr[i] * sumr[i+1])
      a2=sumr[i] * xnorm(alpha[i+1])
      a3=sumr[i+1] * xnorm(alpha[i])
      l=i
      (1..7).each do |j| # do 23
        a[l]=Math::sqrt(a1.quo(j))*(h[l+1] * a2 - h[l] * a3)
        l=l+@mm
      end
    end # 23
  end # 24

  if @n!=2 # goto 26
    nnn=@n-2
    (1..nnn).each do |i| # do 25
      a1=sum.quo(colmn[i+1] * sumc[i] * sumc[i+1])
      a2=sumc[i] * xnorm(beta[i+1])
      a3=sumc[i+1] * xnorm(beta[i])
      l=i
      (1..7).each do |j| # do 25
        b[l]=Math::sqrt(a1.quo(j))*(a2 * hh[l+1] - a3*hh[l])
        l=l+@nn
      end # 25
    end # 25
  end # 26 r20

  l = @mm
  a1 = -sum * xnorm(alpha[@mm])
  a2 = row[@m] * sumr[@mm]
  (1..7).each do |j| # do 27
    a[l]=a1 * h[l].quo(Math::sqrt(j*a2))
    l=l+@mm
  end # 27

  l = @nn
  a1 = -sum * xnorm(beta[@nn])
  a2 = colmn[@n] * sumc[@nn]
  (1..7).each do |j| # do 28
    b[l]=a1 * hh[l].quo(Math::sqrt(j*a2))
    l = l + @nn
  end # 28

  rcof=[]
  # compute coefficients rcof of polynomial of order 8
  rcof[1]=-phisq
  (2..9).each do |i| # do 30
    rcof[i]=0.0
  end # 30
  m1=@mm
  (1..@mm).each do |i| # do 31
    m1=m1+1
    m2=m1+@mm
    m3=m2+@mm
    m4=m3+@mm
    m5=m4+@mm
    m6=m5+@mm
    n1=@nn
    (1..@nn).each do |j| # do 31
      n1=n1+1
      n2=n1+@nn
      n3=n2+@nn
      n4=n3+@nn
      n5=n4+@nn
      n6=n5+@nn
      rcof[3] = rcof[3] + a[i]**2 * b[j]**2
      rcof[4] = rcof[4] + 2.0 * a[i] * a[m1] * b[j] * b[n1]
      rcof[5] = rcof[5] + a[m1]**2 * b[n1]**2 + 2.0 * a[i] * a[m2] * b[j] * b[n2]
      rcof[6] = rcof[6] + 2.0 * (a[i] * a[m3] * b[j] * b[n3] + a[m1] * a[m2] * b[n1] * b[n2])
      rcof[7] = rcof[7] + a[m2]**2 * b[n2]**2 + 2.0 * (a[i] * a[m4] * b[j] * b[n4] + a[m1] * a[m3] * b[n1] * b[n3])
      rcof[8] = rcof[8] + 2.0 * (a[i] * a[m5] * b[j] * b[n5] + a[m1] * a[m4] * b[n1] * b[n4] + a[m2] * a[m3] * b[n2] * b[n3])
      rcof[9] = rcof[9] + a[m3]**2 * b[n3]**2 + 2.0 * (a[i] * a[m6] * b[j] * b[n6] + a[m1] * a[m5] * b[n1] * b[n5] + (a[m2] * a[m4] * b[n2] * b[n4]))
    end # 31
  end # 31

  rcof=rcof[1,rcof.size]
  poly = GSL::Poly.alloc(rcof)
  roots=poly.solve
  rootr=[nil]
  rooti=[nil]
  roots.each {|c|
    rootr.push(c.real)
    rooti.push(c.im)
  }
  @rootr=rootr
  @rooti=rooti

  norts=0
  (1..7).each do |i| # do 43
    next if rooti[i]!=0.0
    if (covxy>=0.0)
      next if(rootr[i]<0.0 or rootr[i]>1.0)
      pcorl=rootr[i]
      norts=norts+1
    else
      if (rootr[i]>=-1.0 and rootr[i]<0.0)
        pcorl=rootr[i]
        norts=norts+1
      end
    end
  end # 43
  raise "Error" if norts==0
  @r=pcorl

  pr=Processor.new(@alpha,@beta,@r,@matrix)
  @loglike_model=-pr.loglike
end
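A minimal sketch of requesting the polychoric series estimate explicitly (requires gsl; as noted above, expect the result to differ from :joint and :two_step):

require 'statsample'

poly = Statsample::Bivariate::Polychoric.new(
  Matrix[[1, 10, 20], [20, 20, 50]],
  :method => :polychoric_series
)
puts poly.r   # series-based estimate; compare against the :two_step value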
Computation of the polychoric correlation using two-step ML estimation.

Two-step ML estimation "first estimates the thresholds from the one-way marginal frequencies, then estimates rho, conditional on these thresholds, via maximum likelihood" (Uebersax, 2006).

The algorithm is based on code by Gegenfurtner (1992).
References:

- Gegenfurtner, K. (1992). PRAXIS: Brent's algorithm for function minimization. Behavior Research Methods, Instruments & Computers, 24(4), 560-564. Available at www.allpsych.uni-giessen.de/karl/pdf/03.praxis.pdf
- Uebersax, J.S. (2006). The tetrachoric and polychoric correlation coefficients. Statistical Methods for Rater Agreement web site. Available at: john-uebersax.com/stat/tetra.htm. Accessed February 11, 2010.
# File lib/statsample/bivariate/polychoric.rb, line 287
def compute_two_step_mle
  if Statsample.has_gsl?
    compute_two_step_mle_gsl
  else
    compute_two_step_mle_ruby
  end
end
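A minimal sketch of the default two-step path; Statsample.has_gsl? is the same check used in the dispatch above, so the GSL branch is selected automatically when the gsl gem is available:

require 'statsample'

poly = Statsample::Bivariate::Polychoric.new(Matrix[[1, 10, 20], [20, 20, 50]])
# :two_step is the default, so no :method option is needed.
backend = Statsample.has_gsl? ? "GSL (Brent) minimizer" : "pure-Ruby fallback"
puts "two-step backend: #{backend}"
puts poly.r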
Compute the two-step ML estimation using GSL.
# File lib/statsample/bivariate/polychoric.rb, line 319
def compute_two_step_mle_gsl
  fn1=GSL::Function.alloc {|rho|
    pr=Processor.new(@alpha,@beta, rho, @matrix)
    pr.loglike
  }
  @iteration = 0
  max_iter = @max_iterations
  m = 0             # initial guess
  m_expected = 0
  a=-0.9999
  b=+0.9999
  gmf = GSL::Min::FMinimizer.alloc(@minimizer_type_two_step)
  gmf.set(fn1, m, a, b)
  header=_("Two step minimization using %s method\n") % gmf.name
  header+=sprintf("%5s [%9s, %9s] %9s %10s %9s\n", "iter", "lower", "upper", "min", "err", "err(est)")
  header+=sprintf("%5d [%.7f, %.7f] %.7f %+.7f %.7f\n", @iteration, a, b, m, m - m_expected, b - a)
  @log=header
  puts header if @debug
  begin
    @iteration += 1
    status = gmf.iterate
    status = gmf.test_interval(@epsilon, 0.0)

    if status == GSL::SUCCESS
      @log+="converged:"
      puts "converged:" if @debug
    end
    a = gmf.x_lower
    b = gmf.x_upper
    m = gmf.x_minimum
    message=sprintf("%5d [%.7f, %.7f] %.7f %+.7f %.7f\n", @iteration, a, b, m, m - m_expected, b - a);
    @log+=message
    puts message if @debug
  end while status == GSL::CONTINUE and @iteration < @max_iterations
  @r=gmf.x_minimum
  @loglike_model=-gmf.f_minimum
end
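A minimal sketch of tracing the two-step minimization; the debug and log names are inferred from the Attributes section ("Debug algorithm", "Log of algorithm") and should be treated as assumptions:

require 'statsample'

# Assumption: :debug maps to the "Debug algorithm" attribute listed above.
poly = Statsample::Bivariate::Polychoric.new(
  Matrix[[1, 10, 20], [20, 20, 50]],
  :debug => true            # echoes each minimizer iteration to stdout
)
poly.compute_two_step_mle
puts poly.log               # assumption: reader for the "Log of algorithm" attribute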
LL methods

Public Instance Methods
Chi-square of the model
# File lib/statsample/bivariate/polychoric.rb, line 230
def chi_square
  if @loglike_model.nil?
    compute
  end
  -2*(@loglike_model-loglike_data)
end
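A minimal sketch of the likelihood-ratio chi-square against the saturated model, using chi_square and chi_square_df as defined here:

require 'statsample'

poly = Statsample::Bivariate::Polychoric.new(Matrix[[1, 10, 20], [20, 20, 50]])
chi = poly.chi_square      # computes the correlation first if needed
df  = poly.chi_square_df   # (rows * cols) - rows - cols
puts "Chi-square = %.3f on %d df" % [chi, df]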
# File lib/statsample/bivariate/polychoric.rb, line 237
def chi_square_df
  (@nr*@nc)-@nc-@nr
end
# File lib/statsample/bivariate/polychoric.rb, line 244
def compute_basic_parameters
  @nr=@matrix.row_size
  @nc=@matrix.column_size
  @sumr=[0]*@matrix.row_size
  @sumrac=[0]*@matrix.row_size
  @sumc=[0]*@matrix.column_size
  @sumcac=[0]*@matrix.column_size
  @alpha=[0]*(@nr-1)
  @beta=[0]*(@nc-1)
  @total=0
  @nr.times do |i|
    @nc.times do |j|
      @sumr[i]+=@matrix[i,j]
      @sumc[j]+=@matrix[i,j]
      @total+=@matrix[i,j]
    end
  end
  ac=0
  (@nr-1).times do |i|
    @sumrac[i]=@sumr[i]+ac
    @alpha[i]=Distribution::Normal.p_value(@sumrac[i] / @total.to_f)
    ac=@sumrac[i]
  end
  ac=0
  (@nc-1).times do |i|
    @sumcac[i]=@sumc[i]+ac
    @beta[i]=Distribution::Normal.p_value(@sumcac[i] / @total.to_f)
    ac=@sumcac[i]
  end
end
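A minimal sketch of the threshold idea used above: each threshold is the inverse-normal transform of a cumulative marginal proportion. Distribution::Normal.p_value is the quantile function already used in the source; the row totals below come from the Use example:

require 'statsample'   # provides the Distribution module used above

row_totals = [31, 90]                 # row sums of Matrix[[1,10,20],[20,20,50]]
total      = row_totals.inject(:+).to_f

# With two rows there is a single row threshold, placed at the cumulative
# proportion of the first row.
alpha = Distribution::Normal.p_value(row_totals[0] / total)
puts alpha   # matches the row threshold produced by compute_basic_parameters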
# File lib/statsample/bivariate/polychoric.rb, line 215
def loglike_data
  loglike=0
  @nr.times do |i|
    @nc.times do |j|
      res=@matrix[i,j].quo(@total)
      if (res==0)
        res=1e-16
      end
      loglike+= @matrix[i,j] * Math::log(res)
    end
  end
  loglike
end