class UkParliament::MemberListDocPipeline
Defines the processing pipeline for a scraped member list document.
Public Class Methods
Initialise the class by calling the parent class initialiser with the provided arguments.
Calls superclass method UkParliament::DocPipeline::new
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 5
def initialize(house_id, document)
  super
end
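The bare super above is Ruby's argument-forwarding form: without parentheses it re-sends exactly the arguments the method received. The sketch below shows that behaviour with a hypothetical BasePipeline standing in for UkParliament::DocPipeline, whose real initialiser is not documented here; the attribute readers are assumptions made only so the result can be inspected.

```ruby
# Hypothetical stand-in for UkParliament::DocPipeline: it simply records
# the arguments it receives so they can be checked afterwards.
class BasePipeline
  attr_reader :house_id, :document

  def initialize(house_id, document)
    @house_id = house_id
    @document = document
  end
end

class MemberListPipeline < BasePipeline
  # A bare `super` (no parentheses) forwards house_id and document unchanged.
  def initialize(house_id, document)
    super
  end
end

pipeline = MemberListPipeline.new(:commons, '<html></html>')
puts pipeline.house_id  # prints commons
```

Because the subclass initialiser only forwards its arguments, omitting it would behave the same; keeping it mainly documents the expected arguments.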
Public Instance Methods
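Produce the list of members for the relevant house. The supplied array is stored in @members and the pipeline is run with execute; the member-list tasks append one hash per member to that array, so the caller's array holds the results.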
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 10
def house_member_list(members)
  @members = members
  execute
end
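Since @members refers to the same Array object the caller passed in, anything the tasks append is visible to the caller without relying on the return value. A minimal sketch of that calling convention, assuming a hypothetical StubMemberListPipeline whose execute appends one fixed member in place of the real inherited tasks:

```ruby
# Hypothetical stand-in: mirrors house_member_list, but execute appends a
# fixed member instead of running the real scraping tasks.
class StubMemberListPipeline
  def house_member_list(members)
    @members = members
    execute
  end

  private

  # Assumption: the real execute is inherited and runs the house's tasks.
  def execute
    @members << { 'alphabetical_name' => 'Example, Ms Jane', 'id' => 101 }
  end
end

members = []
StubMemberListPipeline.new.house_member_list(members)
puts members.length  # prints 1
```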
Private Instance Methods
Extract a Commons member's constituency from a document node.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 93
def commons_constituency(member, node)
  member['constituency'] = node.content
end
Process House of Commons member list document data: for each table row that links to a Commons biography page, extract the member's basic data and append it to the list of members.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 32
def commons_members
  table_rows = @document.xpath("//tr[descendant::a[starts-with(@href, 'http://www.parliament.uk/biographies/commons/')]]")
  table_rows.each do |row|
    member = {}
    name = row.at_xpath('./td[1]/a')
    first_cell_text = row.xpath('./td[1]//text()')
    constituency = row.at_xpath('./td[2]')
    member_name(member, name)
    member_profile_url(member, name)
    member_id(member, name)
    commons_party(member, first_cell_text)
    commons_constituency(member, constituency)
    @members << member
  end
end
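The row filter keeps only table rows containing a link whose href starts with the Commons biography prefix, then reads fixed cells from each row. The calls above look like Nokogiri's (xpath, at_xpath, content), which is a third-party gem, so this sketch swaps in Ruby's REXML to stay self-contained. The sample markup, names and IDs are invented for illustration, and the party is read from the first cell's direct text children only, which is enough for this sample.

```ruby
require 'rexml/document'

# Invented, well-formed sample markup (REXML needs XML, not loose HTML).
HTML = <<~XML
  <table>
    <tr><th>Name</th><th>Constituency</th></tr>
    <tr>
      <td><a href="http://www.parliament.uk/biographies/commons/jane-example/101">Example, Ms Jane</a><br/>(Labour)</td>
      <td>Exampleton</td>
    </tr>
  </table>
XML

COMMONS_PREFIX = 'http://www.parliament.uk/biographies/commons/'

def commons_members(document)
  # Same XPath filter as the original: rows linking to a Commons biography.
  rows = REXML::XPath.match(document, "//tr[descendant::a[starts-with(@href, '#{COMMONS_PREFIX}')]]")
  rows.map do |row|
    link = REXML::XPath.first(row, './td[1]/a')
    href = link.attributes['href']
    party_text = REXML::XPath.match(row, './td[1]/text()').last
    {
      'alphabetical_name' => link.text,
      'url' => href,
      'id' => href.split('/').last.to_i,
      # "(Labour)" -> "Labour": strip, then drop the brackets.
      'party' => party_text.value.strip[1..-2],
      'constituency' => REXML::XPath.first(row, './td[2]').text
    }
  end
end

members = commons_members(REXML::Document.new(HTML))
p members.first
```

The header row contains no biography link, so the XPath predicate excludes it without any extra checks in the loop.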
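Extract a Commons member's party from a node set: the last text node is stripped of whitespace and its first and last characters (expected to be brackets) are removed.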
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 88
def commons_party(member, nodeset)
  member['party'] = nodeset.last.to_s.strip[1..-2]
end
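The slice [1..-2] removes one character from each end regardless of what those characters are. The sketch below applies the same expression to plain strings and, for comparison, a hypothetical stricter variant (not part of the library) that returns nil when the text is not bracketed.

```ruby
# Same expression as commons_party: strip whitespace, then drop the first
# and last characters, which are expected to be brackets.
def party_from_text(text)
  text.to_s.strip[1..-2]
end

# Hypothetical stricter variant: capture only text inside "( ... )".
def strict_party_from_text(text)
  text.to_s.strip[/\A\((.*)\)\z/, 1]
end

puts party_from_text("\n  (Labour)  ")  # prints Labour
puts party_from_text('Labour')          # prints abou
p strict_party_from_text('Labour')      # prints nil
```

If the page ever omitted the brackets, the original expression would silently truncate the party name, whereas the stricter form yields nil that a caller could detect.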
Define the tasks performed by the Commons member list pipeline.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 20
def define_commons_tasks
  @commons_tasks = %w(commons_members)
end
Define the tasks performed by the Lords member list pipeline.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 26
def define_lords_tasks
  @lords_tasks = %w(lords_members)
end
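Each task list is an array of method names, which suggests the parent class calls them by name. The sketch below shows one way that dispatch could work; TaskDispatchSketch, the :commons/:lords house symbols and its execute logic are all hypothetical, since UkParliament::DocPipeline#execute is not documented here.

```ruby
# Hypothetical sketch of dispatching the task lists by method name.
class TaskDispatchSketch
  attr_reader :log

  def initialize(house)
    @house = house
    @log = []
    define_commons_tasks
    define_lords_tasks
  end

  # Choose the house's task list and call each task by name.
  # `send` is used because the task methods are private.
  def execute
    tasks = @house == :commons ? @commons_tasks : @lords_tasks
    tasks.each { |task| send(task) }
  end

  private

  def define_commons_tasks
    @commons_tasks = %w(commons_members)
  end

  def define_lords_tasks
    @lords_tasks = %w(lords_members)
  end

  # Stand-ins for the real scraping tasks: they only record that they ran.
  def commons_members
    @log << 'commons_members'
  end

  def lords_members
    @log << 'lords_members'
  end
end

sketch = TaskDispatchSketch.new(:lords)
sketch.execute
p sketch.log  # prints ["lords_members"]
```

Storing tasks as data keeps each pipeline's steps declarative; public_send would not work here because it refuses to call private methods.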
Process House of Lords member list document data: for each table row that links to a Lords biography page, extract the member's basic data and append it to the list of members.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 54
def lords_members
  table_rows = @document.xpath("//tr[descendant::a[starts-with(@href, 'http://www.parliament.uk/biographies/lords/')]]")
  table_rows.each do |row|
    member = {}
    name = row.at_xpath('./td[1]/a')
    party = row.at_xpath('./td[2]')
    member_name(member, name)
    member_profile_url(member, name)
    member_id(member, name)
    lords_party(member, party)
    @members << member
  end
end
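The Lords variant differs from the Commons one in its href prefix and in reading a party or group from the second cell instead of a constituency. This sketch again uses REXML in place of the Nokogiri-style calls, with invented markup that mixes a Lords row and a Commons row to show the prefix filter discarding the latter.

```ruby
require 'rexml/document'

# Invented sample markup: one Lords row and one Commons row.
LORDS_HTML = <<~XML
  <table>
    <tr>
      <td><a href="http://www.parliament.uk/biographies/lords/lord-example/202">Example, Lord</a></td>
      <td>Crossbench</td>
    </tr>
    <tr>
      <td><a href="http://www.parliament.uk/biographies/commons/jane-example/101">Example, Ms Jane</a></td>
      <td>Exampleton</td>
    </tr>
  </table>
XML

LORDS_PREFIX = 'http://www.parliament.uk/biographies/lords/'

def lords_members(document)
  rows = REXML::XPath.match(document, "//tr[descendant::a[starts-with(@href, '#{LORDS_PREFIX}')]]")
  rows.map do |row|
    link = REXML::XPath.first(row, './td[1]/a')
    href = link.attributes['href']
    {
      'alphabetical_name' => link.text,
      'url' => href,
      'id' => href.split('/').last.to_i,
      # Second cell holds the party or group; Lords rows have no constituency.
      'party_or_group' => REXML::XPath.first(row, './td[2]').text
    }
  end
end

lords = lords_members(REXML::Document.new(LORDS_HTML))
p lords
```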
Extract a Lords member's party or group from a document node.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 98
def lords_party(member, node)
  member['party_or_group'] = node.content
end
Extract a member's numeric ID from the last path segment of a link node's href.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 83
def member_id(member, node)
  member['id'] = node['href'].split('/').last.to_i
end
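The ID expression relies on two String behaviours: split drops trailing empty fields, and to_i returns 0 when the text does not start with digits. The sketch applies the same expression to a few invented biography URLs to show those edge cases; the URL paths are illustrative only.

```ruby
# Same expression as member_id: last path segment converted with to_i.
def id_from_href(href)
  href.split('/').last.to_i
end

p id_from_href('http://www.parliament.uk/biographies/commons/jane-example/101')   # prints 101
p id_from_href('http://www.parliament.uk/biographies/commons/jane-example/101/')  # prints 101
p id_from_href('http://www.parliament.uk/biographies/commons/jane-example')       # prints 0
```

A trailing slash is harmless, but a link without a numeric final segment yields 0 rather than an error, so a caller may want to treat 0 as a missing ID.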
Extract a member's name, as shown in the alphabetical listing, from a link node's text.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 73
def member_name(member, node)
  member['alphabetical_name'] = node.content
end
Extract a member's profile (summary) URL from a link node's href attribute.
# File lib/uk_parliament/member_list_doc_pipeline.rb, line 78
def member_profile_url(member, node)
  member['url'] = node['href']
end
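The single-field extractors (member_name, member_profile_url, and likewise commons_constituency and lords_party) only call content and [] on the node, so they can be exercised without fetching or parsing a page. A minimal sketch, assuming a hypothetical StubLink struct in place of a parsed link node and top-level copies of the two extractors (the originals are private instance methods):

```ruby
# Hypothetical stand-in for a parsed <a> node: answers #content and #[].
StubLink = Struct.new(:content, :attrs) do
  def [](name)
    attrs[name]
  end
end

# Top-level copies of the documented extractors, for illustration.
def member_name(member, node)
  member['alphabetical_name'] = node.content
end

def member_profile_url(member, node)
  member['url'] = node['href']
end

link = StubLink.new('Example, Ms Jane',
                    { 'href' => 'http://www.parliament.uk/biographies/commons/jane-example/101' })
member = {}
member_name(member, link)
member_profile_url(member, link)
p member
```

Because each extractor writes one key into the shared hash, commons_members and lords_members can reuse the same helpers for their common fields.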