class UkParliament::MemberListDocPipeline

Pipeline that processes a scraped member list document and extracts basic data for each member of the relevant house.

Public Class Methods

new(house_id, document)

Initialise the pipeline, passing house_id and document through to the parent class initialiser.

Calls superclass method UkParliament::DocPipeline::new
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 5
def initialize(house_id, document)
  super
end

Public Instance Methods

house_member_list(members)

Produce the list of members for the relevant house. Stores the supplied members list in @members, then calls execute to run the pipeline.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 10
def house_member_list(members)
  @members = members

  execute
end

Private Instance Methods

commons_constituency(member, node)

Extract a Commons member constituency from a document node.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 93
def commons_constituency(member, node)
  member['constituency'] = node.content
end

commons_members()

Process the House of Commons member list document: select each table row that contains a link to a Commons biography page, extract the member's name, profile URL, ID, party and constituency, and append the resulting record to the list of members.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 32
def commons_members
  table_rows = @document.xpath("//tr[descendant::a[starts-with(@href, 'http://www.parliament.uk/biographies/commons/')]]")

  table_rows.each do |row|
    member = {}

    name = row.at_xpath('./td[1]/a')
    first_cell_text = row.xpath('./td[1]//text()')
    constituency = row.at_xpath('./td[2]')

    member_name(member, name)
    member_profile_url(member, name)
    member_id(member, name)
    commons_party(member, first_cell_text)
    commons_constituency(member, constituency)

    @members << member
  end
end
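
commons_party(member, nodeset)

Extract a Commons member's party from the text nodes of the first table cell. The last text node is stripped of surrounding whitespace, and its first and last characters (presumably enclosing parentheses) are dropped.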

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 88
def commons_party(member, nodeset)
  member['party'] = nodeset.last.to_s.strip[1..-2]
end
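
As a rough illustration of the slicing in commons_party, assuming the last text node holds the party name wrapped in parentheses (the sample string is made up, not taken from the scraped page):

```ruby
# Hypothetical text node content; the real page markup may differ.
raw_text = "\n    (Example Party)\n"

# strip removes surrounding whitespace; [1..-2] drops the first and
# last characters, which here are the parentheses.
party = raw_text.strip[1..-2]

puts party
```

If the text were not wrapped in parentheses, the same slice would simply cut off the first and last letters of the party name.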

define_commons_tasks()

Define the tasks that will be performed for the Commons member list pipeline.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 20
def define_commons_tasks
  @commons_tasks = %w(commons_members)
end

define_lords_tasks()

Define the tasks that will be performed for the Lords member list pipeline.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 26
def define_lords_tasks
  @lords_tasks = %w(lords_members)
end
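
The task lists above hold method names as strings. How the parent class consumes them is not shown on this page; the following is only a sketch of one plausible approach, dispatching each name with send. SketchPipeline, run and collect_rows are hypothetical names, and the real UkParliament::DocPipeline may work differently.

```ruby
# Stand-in for the parent class; NOT the real UkParliament::DocPipeline.
class SketchPipeline
  def initialize(house_id, document)
    @house_id = house_id
    @document = document
    @members = []
    # Task names stored as strings, as in define_commons_tasks.
    @tasks = %w(collect_rows)
  end

  # Call each task method by name, in order, then return the members.
  def run
    @tasks.each { |task| send(task) }
    @members
  end

  private

  # Hypothetical task: here the "document" is just an array of names.
  def collect_rows
    @document.each { |row| @members << { 'alphabetical_name' => row } }
  end
end

members = SketchPipeline.new(:commons, ['Example, Ms A']).run
p members
```

Using send lets the task methods stay private while still being called by name.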

lords_members()

Process the House of Lords member list document: select each table row that contains a link to a Lords biography page, extract the member's name, profile URL, ID, and party or group, and append the resulting record to the list of members.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 54
def lords_members
  table_rows = @document.xpath("//tr[descendant::a[starts-with(@href, 'http://www.parliament.uk/biographies/lords/')]]")

  table_rows.each do |row|
    member = {}

    name = row.at_xpath('./td[1]/a')
    party = row.at_xpath('./td[2]')

    member_name(member, name)
    member_profile_url(member, name)
    member_id(member, name)
    lords_party(member, party)

    @members << member
  end
end
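
For comparison, a sketch of the record shape each house's method builds, based on the keys set by the extraction methods on this page. All values are invented placeholders.

```ruby
# Keys as set by member_name, member_profile_url, member_id,
# commons_party and commons_constituency. Values are hypothetical.
commons_member = {
  'alphabetical_name' => 'Example, Ms A',
  'url' => 'http://www.parliament.uk/biographies/commons/example-member/1234',
  'id' => 1234,
  'party' => 'Example Party',
  'constituency' => 'Example Constituency'
}

# Lords records use 'party_or_group' and have no constituency.
lords_member = {
  'alphabetical_name' => 'Example, Lord',
  'url' => 'http://www.parliament.uk/biographies/lords/lord-example/5678',
  'id' => 5678,
  'party_or_group' => 'Example Group'
}

# Keys present only on Commons records.
commons_only = commons_member.keys - lords_member.keys
p commons_only
```

Code that consumes both lists needs to account for the differing party key.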

lords_party(member, node)

Extract Lords member party or group from a document node.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 98
def lords_party(member, node)
  member['party_or_group'] = node.content
end

member_id(member, node)

Extract the member ID from a link node: the last path segment of its href, converted to an integer.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 83
def member_id(member, node)
  member['id'] = node['href'].split('/').last.to_i
end
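
A small sketch of the ID extraction on a plain string, assuming the profile URL ends in a numeric segment. The URL below is a made-up example in the biography link format matched by the pipeline.

```ruby
# Hypothetical profile URL; only the trailing segment matters here.
href = 'http://www.parliament.uk/biographies/commons/example-member/1234'

member = {}
# Same expression as member_id, applied to the string directly.
member['id'] = href.split('/').last.to_i

# A trailing segment with no leading digits converts to 0.
fallback_id = 'http://www.parliament.uk/biographies/commons/example-member'.split('/').last.to_i

p member
p fallback_id
```

Because String#to_i returns 0 rather than raising, a link without a numeric final segment would produce an ID of 0 instead of an error.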

member_name(member, node)

Extract the member's name from a document node, stored under the alphabetical_name key.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 73
def member_name(member, node)
  member['alphabetical_name'] = node.content
end

member_profile_url(member, node)

Extract the member's profile URL (the href of the link) from a document node.

# File lib/uk_parliament/member_list_doc_pipeline.rb, line 78
def member_profile_url(member, node)
  member['url'] = node['href']
end