class RGeo::Shapefile::Reader
Represents a shapefile that is open for reading.
You can use this object to read a shapefile straight through, yielding the data in a block; or you can perform random access reads of indexed records.
You must close this object after you are done, in order to close the underlying files. Alternatively, you can pass a block to Reader::open, and the reader will be closed automatically for you at the end of the block.
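A minimal usage sketch of the block form, assuming the rgeo-shapefile gem is installed. The helper name `summarize_shapefile` is hypothetical, and the `Record#index` and `as_text` calls follow the gem's README rather than anything shown on this page. The gem is only loaded once the path is known to exist.

```ruby
# Hypothetical helper: print a short summary of every record in a shapefile.
# Returns nil if the path does not exist, otherwise the number of records.
def summarize_shapefile(path)
  return nil unless File.exist?(path)

  require "rgeo/shapefile" # provided by the rgeo-shapefile gem

  RGeo::Shapefile::Reader.open(path) do |file|
    puts "File contains #{file.num_records} records."
    file.each do |record|
      geometry = record.geometry # nil when the record holds a null shape
      puts "Record #{record.index}: #{geometry ? geometry.as_text : 'null shape'}"
      puts "  Attributes: #{record.attributes.inspect}"
    end
    file.num_records # the block's value is what the open call returns
  end
end
```

Because the reader is opened with a block, it is closed when the block exits, including when an exception is raised inside it.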
Dependencies
Attributes in shapefiles are stored in a “.dbf” (dBASE) format file. The “dbf” gem is required to read these files. If this gem is not installed, shapefile reading will still function, but attributes will not be available.
Correct interpretation of the polygon shape type requires some functionality that is available in the RGeo::Geos module. Hence, reading a polygon shapefile will generally fail if that module is not available or the GEOS library is not installed. It is possible to bypass this requirement by relaxing the polygon tests and making some assumptions about the file format. See the documentation for Reader::open for details.
Shapefile support
This class supports shapefiles formatted according to the 1998 “ESRI Shapefile Technical Description”. It converts shapefile data to RGeo geometry objects, as follows:
- Shapefile records are represented by the RGeo::Shapefile::Reader::Record class, which provides the geometry, the attributes, and the record number (0-based).
- Attribute reading is supported by the “dbf” gem, which provides the proper typecasting for numeric, string, boolean, and date/time column types. Data in unrecognized column types are returned as strings.
- All shape types documented in the 1998 publication are supported, including point, polyline, polygon, multipoint, and multipatch, along with Z and M versions.
- Null shapes are translated into nil geometry objects. That is, Record#geometry will return nil if that record has a null shape.
- The point shape type yields Point geometries.
- The multipoint shape type yields MultiPoint geometries.
- The polyline shape type yields MultiLineString geometries.
- The polygon shape type yields MultiPolygon geometries.
- The multipatch shape type yields GeometryCollection geometries. (See below for an explanation of why we do not return a MultiPolygon.)
Some special notes and limitations in our shapefile support:
- Our implementation assumes that shapefile data is in a Cartesian coordinate system when it performs certain computations, such as directionality of polygon rings. It also ignores the 180 degree longitude seam, so it may not correctly interpret objects whose coordinates are in lat/lon space and which span that seam.
- The ESRI polygon specification allows interior rings to touch their exterior ring in a finite number of points. This technically violates the OGC Polygon definition. However, such a structure remains a legal OGC MultiPolygon, and it is in principle possible to detect this case and transform the geometry type accordingly. We do not yet do this. Therefore, it is possible for a shapefile with polygon type to yield an illegal geometry.
- The ESRI polygon specification clearly specifies the winding order for inner and outer rings: outer rings are clockwise while inner rings are counterclockwise. We have heard it reported that there may be shapefiles out there that do not conform to this spec. Such shapefiles may not read correctly.
- The ESRI multipatch specification includes triangle strips and triangle fans as ways of constructing polygonal patches. We read in the aggregate polygonal patches, and do not preserve the individual triangles.
- The ESRI multipatch specification allows separate patch parts to share common boundaries, thus effectively becoming a single polygon. It is in principle possible to detect this case and merge the constituent polygons; however, such a data structure implies that the intent is for such polygons to remain distinct objects even though they share a common boundary. Therefore, we do not attempt to merge such polygons. However, this means it is possible for a multipatch to violate the OGC MultiPolygon assertions, which do not allow constituent polygons to share a common boundary. Therefore, when reading a multipatch, we return a GeometryCollection instead of a MultiPolygon.
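The ring directionality mentioned in the notes above can be illustrated with the shoelace formula, which is only meaningful in a Cartesian coordinate system. This is a sketch of the general technique, not the reader's actual implementation; the method names `signed_area` and `clockwise?` are hypothetical, and rings are assumed to be closed arrays of [x, y] pairs with the y axis pointing up.

```ruby
# Signed area of a closed ring via the shoelace formula.
# Positive means counterclockwise, negative means clockwise.
def signed_area(ring)
  sum = 0.0
  ring.each_cons(2) do |(x1, y1), (x2, y2)|
    sum += x1 * y2 - x2 * y1
  end
  sum / 2.0
end

# In the ESRI convention, a clockwise ring is an outer ring.
def clockwise?(ring)
  signed_area(ring) < 0
end

outer = [[0, 0], [0, 4], [4, 4], [4, 0], [0, 0]] # clockwise square
hole  = [[1, 1], [2, 1], [2, 2], [1, 2], [1, 1]] # counterclockwise square

puts clockwise?(outer) # true
puts clockwise?(hole)  # false
```

A ring whose coordinates straddle the 180 degree longitude seam would produce a misleading area under this calculation, which is the kind of limitation the first note describes.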
Constants
- NODATA_LIMIT
Values less than this value are considered “no value” in the shapefile format specification.
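As an illustration of how such a threshold might be applied, the sketch below maps "no data" measures to nil. The value -1e38 is an assumption taken from the ESRI description, which treats floating point numbers smaller than -10^38 as "no data"; `ASSUMED_NODATA_LIMIT` and `measure_or_nil` are hypothetical names, and code using the library should refer to NODATA_LIMIT itself.

```ruby
# Assumption: threshold based on the ESRI description, not read from the library.
ASSUMED_NODATA_LIMIT = -1e38

# Return the raw measure, or nil when it falls below the "no data" threshold.
def measure_or_nil(raw)
  raw < ASSUMED_NODATA_LIMIT ? nil : raw
end

p measure_or_nil(12.5)    # 12.5
p measure_or_nil(-1.0e39) # nil
```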
Public Class Methods
Create a new shapefile reader. You must pass the path for the main shapefile (e.g. “path/to/file.shp”). You may also omit the “.shp” extension from the path. All three files that make up the shapefile (“.shp”, “.shx”, and “.dbf”) must be present for successful opening of a shapefile.
You must also provide an RGeo::Feature::FactoryGenerator. It should understand the configuration options :has_z_coordinate and :has_m_coordinate. You may also pass a specific RGeo::Feature::Factory, or nil to specify the default Cartesian FactoryGenerator.
If you provide a block, the shapefile reader will be yielded to the block, and automatically closed at the end of the block. In this case, as with File.open, Reader::open returns the value of the block. If you do not provide a block, the shapefile reader will be returned from this call. It is then the caller’s responsibility to close the reader when it is done.
Options include:
:factory_generator
  An RGeo::Feature::FactoryGenerator that should return a factory based on the dimension settings in the input. It should understand the configuration options :has_z_coordinate and :has_m_coordinate. You may also pass a specific RGeo::Feature::Factory. If no factory generator is provided, the default Cartesian factory generator is used. This option can also be specified using the :factory key.
:srid
  If provided, this option is passed to the factory generator. This is useful because shapefiles do not contain a SRID.
:assume_inner_follows_outer
  If set to true, some assumptions are made about ring ordering in a polygon shapefile. See below for details. Default is false.
Ring ordering in polygon shapefiles
The ESRI polygon shape type specifies that the ordering of rings in the shapefile is not significant. That is, rings can be in any order, and inner rings need not necessarily follow the outer ring they are associated with. This specification causes some headache in the process of constructing polygons from a shapefile, because it becomes necessary to run some geometric analysis on the rings that are read in, in order to determine which inner rings should go with which outer rings.
RGeo’s shapefile reader uses GEOS to perform this analysis. However, this means that if GEOS is not available, the analysis will fail. It also means reading polygons may be slow, especially for polygon records with a large number of parts. Therefore, it is possible to turn off this analysis by setting the :assume_inner_follows_outer switch when creating a Reader. This causes the shapefile reader to assume that inner rings always follow their corresponding outer ring in the file. This is probably true for most well-behaved shapefiles out there, but since it is not part of the specification, this shortcutting is not turned on by default. However, if you are running RGeo on a platform without GEOS, you have no choice but to turn on this switch and make this assumption about your input shapefiles.
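The ordering assumption can be sketched as a single pass over the rings of one record. This is an illustration of the idea rather than the reader's own code: each ring is represented by a hash with a hypothetical :clockwise flag (assumed to have been computed already) and a :name label, and `group_rings_assuming_order` is an invented name.

```ruby
# Group rings into polygons, assuming every inner (counterclockwise) ring
# directly follows the outer (clockwise) ring it belongs to.
def group_rings_assuming_order(rings)
  polygons = []
  rings.each do |ring|
    if ring[:clockwise]
      # A clockwise ring starts a new polygon.
      polygons << { outer: ring, inners: [] }
    elsif polygons.empty?
      raise ArgumentError, "inner ring #{ring[:name]} appears before any outer ring"
    else
      # A counterclockwise ring is attached to the most recent outer ring.
      polygons.last[:inners] << ring
    end
  end
  polygons
end

rings = [
  { name: "A", clockwise: true },
  { name: "A-hole-1", clockwise: false },
  { name: "A-hole-2", clockwise: false },
  { name: "B", clockwise: true }
]
grouped = group_rings_assuming_order(rings)
grouped.each do |polygon|
  puts "#{polygon[:outer][:name]}: #{polygon[:inners].map { |r| r[:name] }.inspect}"
end
```

No containment test is needed in this pass, which is why it works without GEOS. If a file lists an inner ring after the wrong outer ring, this approach silently attaches it to the wrong polygon.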
# File lib/rgeo/shapefile/reader.rb, line 161
def self.open(path_, opts_ = {}, &block_)
  file_ = new(path_, opts_)
  if block_
    begin
      yield file_
    ensure
      file_.close
    end
  else
    file_
  end
end
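The open-with-block pattern used above can be tried in isolation. The sketch below uses `DemoResource`, a hypothetical stand-in class with no connection to the library, to show the two behaviours that matter: the block's value becomes the return value, and the resource is closed even if the block raises.

```ruby
# Hypothetical closeable resource, used only to demonstrate the pattern.
class DemoResource
  def initialize
    @opened = true
  end

  def open?
    @opened
  end

  def close
    @opened = false
  end

  # With a block: yield, always close, return the block's value.
  # Without a block: return the open resource to the caller.
  def self.open
    resource = new
    return resource unless block_given?

    begin
      yield resource
    ensure
      resource.close
    end
  end
end

leaked = nil
result = DemoResource.open do |resource|
  leaked = resource
  :block_value
end
manual = DemoResource.open

p result       # :block_value
p leaked.open? # false
p manual.open? # true
manual.close
```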
Public Instance Methods
Returns true if attributes are available. This may be false because there is no “.dbf” file or because the dbf gem is not available.
# File lib/rgeo/shapefile/reader.rb, line 270
def attributes_available?
  @opened ? (@attr_dbf ? true : false) : nil
end
Close the shapefile. You should not use this Reader after it has been closed. Most methods will return nil.
# File lib/rgeo/shapefile/reader.rb, line 251
def close
  return unless @opened
  @main_file.close
  @index_file.close
  @attr_dbf.close if @attr_dbf
  @opened = false
end
Returns the current file pointer as a record index (0-based). This is the record number that will be read when Reader#next is called.
# File lib/rgeo/shapefile/reader.rb, line 346
def cur_index
  @opened ? @cur_record_index : nil
end
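Yield the Reader::Record for every record in the file, starting from the first. The current record index is restored once iteration finishes, so repeated calls yield the same records. Returns an Enumerator if no block is given, and raises IOError if the reader is not open.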
# File lib/rgeo/shapefile/reader.rb, line 359
def each
  return to_enum(:each) { @num_records } unless block_given?
  raise IOError, "File was not open" unless @opened
  # Each needs to be idempotent, therefore we reset all the internal indexes to their original value
  current_record_index = @cur_record_index
  begin
    rewind
    yield _read_next_record while @cur_record_index < @num_records
  ensure
    seek_index(current_record_index)
  end
  self
end
Returns the factory used by this reader.
# File lib/rgeo/shapefile/reader.rb, line 276
def factory
  @opened ? @factory : nil
end
Get the given record number. Equivalent to seeking to that index and calling next.
# File lib/rgeo/shapefile/reader.rb, line 399
def get(index_)
  seek_index(index_) ? self.next : nil
end
Returns the maximum m, or nil if the shapefile does not contain m.
# File lib/rgeo/shapefile/reader.rb, line 338
def mmax
  @opened ? @mmax : nil
end
Returns the minimum m, or nil if the shapefile does not contain m.
# File lib/rgeo/shapefile/reader.rb, line 332
def mmin
  @opened ? @mmin : nil
end
Read and return the next record as a Reader::Record.
# File lib/rgeo/shapefile/reader.rb, line 352
def next
  @opened && @cur_record_index < @num_records ? _read_next_record : nil
end
Returns the number of records in the shapefile.
# File lib/rgeo/shapefile/reader.rb, line 282
def num_records
  @opened ? @num_records : nil
end
Returns true if this Reader is still open, or false if it has been closed.
# File lib/rgeo/shapefile/reader.rb, line 262
def open?
  @opened
end
Rewind to the beginning of the file. Equivalent to seek_index(0).
# File lib/rgeo/shapefile/reader.rb, line 392
def rewind
  seek_index(0)
end
Seek to the given record index.
# File lib/rgeo/shapefile/reader.rb, line 375
def seek_index(index_)
  if @opened && index_ >= 0 && index_ <= @num_records
    if index_ < @num_records && index_ != @cur_record_index
      @index_file.seek(100 + 8 * index_)
      offset_ = @index_file.read(4).unpack("N").first
      @main_file.seek(offset_ * 2)
    end
    @cur_record_index = index_
    true
  else
    false
  end
end
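The arithmetic in seek_index follows from the index file layout in the ESRI description: a 100-byte header, then one 8-byte entry per record holding the record's offset and content length as big-endian 32-bit integers, both counted in 16-bit words. The sketch below reproduces the lookup against an in-memory index built with StringIO. The helper names and sample offsets are made up for illustration.

```ruby
require "stringio"

SHX_HEADER_BYTES = 100 # fixed-size header at the start of the index file
SHX_ENTRY_BYTES  = 8   # 4-byte offset + 4-byte content length per record

# Build a fake index: a zeroed header followed by (offset, length) pairs,
# each a big-endian 32-bit integer measured in 16-bit words.
def build_fake_shx(entries)
  header = "\0".b * SHX_HEADER_BYTES
  body = entries.map { |offset_words, length_words| [offset_words, length_words].pack("NN") }.join
  StringIO.new(header + body)
end

# Look up where a record starts in the main file, in bytes.
def main_file_byte_offset(index_io, record_index)
  index_io.seek(SHX_HEADER_BYTES + SHX_ENTRY_BYTES * record_index)
  offset_words = index_io.read(4).unpack1("N")
  offset_words * 2 # words are 16 bits, so double to get bytes
end

# Record 0 starts right after the 100-byte main file header (50 words).
# Record 1 starts after record 0's 8-byte header and 20 bytes of content.
shx = build_fake_shx([[50, 10], [64, 24]])
puts main_file_byte_offset(shx, 0) # 100
puts main_file_byte_offset(shx, 1) # 128
```

Because every index entry has a fixed size, random access to any record costs one seek in the index file and one in the main file, regardless of how large the preceding records are.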
Returns the shape type code.
# File lib/rgeo/shapefile/reader.rb, line 290
def shape_type_code
  @opened ? @shape_type_code : nil
end
Returns the maximum x.
# File lib/rgeo/shapefile/reader.rb, line 302
def xmax
  @opened ? @xmax : nil
end
Returns the minimum x.
# File lib/rgeo/shapefile/reader.rb, line 296
def xmin
  @opened ? @xmin : nil
end
Returns the maximum y.
# File lib/rgeo/shapefile/reader.rb, line 314
def ymax
  @opened ? @ymax : nil
end
Returns the minimum y.
# File lib/rgeo/shapefile/reader.rb, line 308
def ymin
  @opened ? @ymin : nil
end
Returns the maximum z, or nil if the shapefile does not contain z.
# File lib/rgeo/shapefile/reader.rb, line 326
def zmax
  @opened ? @zmax : nil
end
Returns the minimum z, or nil if the shapefile does not contain z.
# File lib/rgeo/shapefile/reader.rb, line 320
def zmin
  @opened ? @zmin : nil
end