public class VPTree<E extends GeospatialPoint> extends Object implements GeospatialPointDatabase<E>
A geospatial database that uses a vantage point tree as its storage mechanism.
Vantage point trees (or "vp-trees") are a subclass of metric trees. Vantage point trees use binary space partitioning to recursively divide points among their nodes. Nodes in a vantage point tree have a center point and a distance threshold; points with a distance less than or equal to the threshold are assigned to the "left" child of a node and the others are assigned to the "right" child.
Queries in a vp-tree execute in O(log n) time in the best case, and tree construction takes O(n log n) time.
Vantage point trees may optionally be constructed with a
nodeCapacity
argument. The node capacity dictates the maximum number
of points that should be stored in any given leaf node, although
nodes may hold more than that many points if its contents can't be
partitioned (i.e. all of the points in the node are coincident).
When a search executed against a vp-tree reaches a leaf node, all of the points in that node are considered for inclusion in the result set. It's possible that a k-nearest-neighbor search will only need to visit one leaf node if that node contains all of the k nearest neighbors to the query point. A "good" node capacity for a vp-tree will be orders of magnitude smaller than the size of the whole dataset and slightly larger (on the same order of magnitude) than the "usual" number of nearest neighbors returned in a search; a well-chosen node capacity will make searches more efficient by minimizing the number of nodes that need to be visited while still minimizing the number of "bad" points considered.
The default node capacity for a vp-tree is 32 points.
Note that the VPTree
class is not thread-safe;
because they forego any kind of synchronization or locking, VPTree
instances can achieve slightly higher search throughput than their
thread-safe counterparts. This makes the VPTree
class a good choice
for single-threaded applications or applications where the tree will not be
modified after construction. For a thread-safe alternative, use the
LockingVPTree
class.
Modifier and Type | Field and Description |
---|---|
static int |
DEFAULT_BIN_SIZE
The default node capacity (32 points) for nodes in this tree.
|
Constructor and Description |
---|
VPTree()
Constructs a new, empty vp-tree with a default node capacity.
|
VPTree(Collection<E> points)
Constructs a new vp-tree that contains (and indexes) all of the points in
the given collection.
|
VPTree(Collection<E> points,
int nodeCapacity)
Constructs a new vp-tree that contains (and indexes) all of the points in
the given collection and has leaf nodes with the given point capacity.
|
VPTree(int nodeCapacity)
Constructs a new, empty vp-tree with the specified node capacity.
|
Modifier and Type | Method and Description |
---|---|
boolean |
add(E point)
Adds a single point to this vp-tree.
|
boolean |
addAll(Collection<? extends E> points)
Adds all of the points in the given collection to this vp-tree.
|
void |
clear()
Removes all points from this vp-tree.
|
boolean |
contains(Object o)
Tests whether this vp-tree contains the given point.
|
boolean |
containsAll(Collection<?> c)
Tests whether this vp-tree contains all of the points in the given
collection.
|
List<E> |
getAllNeighborsWithinDistance(GeospatialPoint queryPoint,
double maxDistance)
Returns a list of all points within a given distance to a query point.
|
List<E> |
getAllNeighborsWithinDistance(GeospatialPoint queryPoint,
double maxDistance,
SearchCriteria<E> searchCriteria)
Returns a list of all points within a given distance to a query point
that meet a set of search criteria.
|
List<E> |
getAllPointsInBoundingBox(double west,
double east,
double north,
double south)
Returns a list of all points in the database within the given bounding
"box." A point is considered to be inside the box if its latitude falls
between the given north and south limits (inclusive) and its longitude
falls between the east and west limits (inclusive).
|
List<E> |
getAllPointsInBoundingBox(double west,
double east,
double north,
double south,
GeospatialPoint orderingPoint)
Returns a list of all points in the database within the given bounding
"box." A point is considered to be inside the box if its latitude falls
between the given north and south limits (inclusive) and its longitude
falls between the east and west limits (inclusive).
|
List<E> |
getAllPointsInBoundingBox(double west,
double east,
double north,
double south,
SearchCriteria<E> otherCriteria)
Returns a list of all points in the database within the given bounding
"box" that also satisfy the given search criteria.
|
List<E> |
getAllPointsInBoundingBox(double west,
double east,
double north,
double south,
SearchCriteria<E> otherCriteria,
GeospatialPoint orderingPoint)
Returns a list of all points in the database within the given bounding
"box" that also satisfy the given search criteria.
|
int |
getBinSize()
Returns the maximum number of points any leaf node of this tree should
contain.
|
E |
getNearestNeighbor(GeospatialPoint queryPoint)
Returns the nearest neighbor to the given query point.
|
E |
getNearestNeighbor(GeospatialPoint queryPoint,
double maxDistance)
Returns the nearest neighbor to the given query point so long as the
nearest neighbor is within the given maximum distance.
|
E |
getNearestNeighbor(GeospatialPoint queryPoint,
double maxDistance,
SearchCriteria<E> searchCriteria)
Returns the nearest neighbor to the query point that satisfies the given
search criteria so long as that point falls within the given maximum
distance from the query point.
|
E |
getNearestNeighbor(GeospatialPoint queryPoint,
SearchCriteria<E> searchCriteria)
Returns the nearest neighbor to the given query point that satisfies the
given search criteria.
|
List<E> |
getNearestNeighbors(GeospatialPoint queryPoint,
int maxResults)
Returns a list of the nearest neighbors to a given query point.
|
List<E> |
getNearestNeighbors(GeospatialPoint queryPoint,
int maxResults,
double maxDistance)
Returns a list of the nearest neighbors to a given query point and
within a given maximum distance.
|
List<E> |
getNearestNeighbors(GeospatialPoint queryPoint,
int maxResults,
double maxDistance,
SearchCriteria<E> searchCriteria)
Returns a list of the nearest neighbors to a given query point and
within a given maximum distance.
|
List<E> |
getNearestNeighbors(GeospatialPoint queryPoint,
int maxResults,
SearchCriteria<E> searchCriteria)
Returns a list of the nearest neighbors to a given query point that
satisfy the given search criteria.
|
boolean |
isEmpty()
Tests whether this tree is empty.
|
Iterator<E> |
iterator()
Returns an
Iterator over all of the points contained in this
tree. |
boolean |
remove(Object o)
Removes a point from this tree.
|
boolean |
removeAll(Collection<?> c)
Removes all of the points in the given collection from this tree.
|
boolean |
retainAll(Collection<?> c)
Retains only the points in this vp-tree that are contained in the
specified collection.
|
int |
size()
Returns the total number of points stored in this vp-tree.
|
Object[] |
toArray()
Returns an array containing all of the points in this vp-tree.
|
<T> T[] |
toArray(T[] a)
Returns an array containing all of the points in this vp-tree; the
runtime type of the returned array is that of the specified array.
|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
equals, hashCode
public static final int DEFAULT_BIN_SIZE
public VPTree()
public VPTree(int nodeCapacity)
nodeCapacity
- the maximum number of points to store in a leaf node of the
treepublic VPTree(Collection<E> points)
points
- the points to use to populate this treepublic VPTree(Collection<E> points, int nodeCapacity)
points
- the points to use to populate this treenodeCapacity
- the largest number of points any leaf node of the tree should
containpublic int getBinSize()
public boolean add(E point)
add
in interface Collection<E extends GeospatialPoint>
point
- the point to add to this treetrue
if the tree was modified by the addition of this
point; vp-trees are always modified by adding points, so this
method always returns truepublic boolean addAll(Collection<? extends E> points)
addAll
in interface Collection<E extends GeospatialPoint>
points
- the points to add to this treetrue
if the tree was modified by the addition of the
points; vp-trees are always modified by adding points, so this
method always returns truepublic void clear()
clear
in interface Collection<E extends GeospatialPoint>
public boolean contains(Object o)
contains
in interface Collection<E extends GeospatialPoint>
o
- the object to test for membership in this treetrue
if this tree contains the given point or
false
otherwisepublic boolean containsAll(Collection<?> c)
containsAll
in interface Collection<E extends GeospatialPoint>
c
- the collection of points to test for membership in this treetrue
if this tree contains all of the members of the
given collection or false
otherwisepublic boolean isEmpty()
isEmpty
in interface Collection<E extends GeospatialPoint>
true
if this tree contains no points or false
otherwisepublic Iterator<E> iterator()
Iterator
over all of the points contained in this
tree. The order of iteration is not defined, and the Iterator
returned by this method does not support the optional remove
method. The behavior of the returned Iterator
is not defined if
the tree is modified after the Iterator
is returned.iterator
in interface Iterable<E extends GeospatialPoint>
iterator
in interface Collection<E extends GeospatialPoint>
Iterator
over the points contained in this treepublic boolean remove(Object o)
remove
in interface Collection<E extends GeospatialPoint>
o
- the point to removetrue
if the tree was modified by removing this point
(i.e. if the point was present in the tree) or false
otherwisepublic boolean removeAll(Collection<?> c)
removeAll
in interface Collection<E extends GeospatialPoint>
c
- the collection of points to remove from this truetrue
if the tree was modified by removing the given
points (i.e. if any of the points were present in the tree) or
false
otherwisepublic boolean retainAll(Collection<?> c)
retainAll
in interface Collection<E extends GeospatialPoint>
c
- collection containing elements to be retained in this treetrue
if this tree was modified by calling this method or
false
otherwisepublic int size()
size
in interface Collection<E extends GeospatialPoint>
public Object[] toArray()
toArray
in interface Collection<E extends GeospatialPoint>
public <T> T[] toArray(T[] a)
Returns an array containing all of the points in this vp-tree; the runtime type of the returned array is that of the specified array. If the collection fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this collection.
If all of the points in this tree fit in the specified array with room
to spare (i.e., the array has more elements than this vp-tree), the
element in the array immediately following the end of the collection is
set to null
.
toArray
in interface Collection<E extends GeospatialPoint>
a
- the array into which the elements of this tree are to be
stored, if it is big enough; otherwise, a new array of the
same runtime type is allocated for this purposepublic List<E> getNearestNeighbors(GeospatialPoint queryPoint, int maxResults)
GeospatialPointDatabase
Returns a list of the nearest neighbors to a given query point. The returned list is sorted by increasing distance from the query point.
This returned list will contain at most maxResults
elements
(and may contain fewer if maxResults
is larger than the number of
points in the database). If multiple points have the same distance from
the query point, the order in which they appear in the returned list is
undefined. By extension, if multiple points have the same distance from
the query point and those points would "straddle" the end of the returned
list, which points are included in the list and which are cut off is not
prescribed.
getNearestNeighbors
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find neighborsmaxResults
- the maximum length of the returned listpublic List<E> getNearestNeighbors(GeospatialPoint queryPoint, int maxResults, double maxDistance)
GeospatialPointDatabase
Returns a list of the nearest neighbors to a given query point and within a given maximum distance. The returned list is sorted by increasing distance from the query point.
This returned list will contain at most maxResults
elements
(and may contain fewer if maxResults
is larger than the number of
points in the database or if fewer than maxResults
points were
found within the given maximum distance). If multiple points have the
same distance from the query point, the order in which they appear in the
returned list is undefined. By extension, if multiple points have the
same distance from the query point and those points would "straddle" the
end of the returned list, which points are included in the list and which
are cut off is not prescribed.
getNearestNeighbors
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find neighborsmaxResults
- the maximum length of the returned listmaxDistance
- the maximum allowable distance, in meters, from the query
point; points farther away than maxDistance
will not
be included in the returned listpublic List<E> getNearestNeighbors(GeospatialPoint queryPoint, int maxResults, SearchCriteria<E> searchCriteria)
GeospatialPointDatabase
Returns a list of the nearest neighbors to a given query point that satisfy the given search criteria. The returned list is sorted by increasing distance from the query point.
This returned list will contain at most maxResults
elements
(and may contain fewer if maxResults
is larger than the number of
points in the database). If multiple points have the same distance from
the query point, the order in which they appear in the returned list is
undefined. By extension, if multiple points have the same distance from
the query point and those points would "straddle" the end of the returned
list, which points are included in the list and which are cut off is not
prescribed.
getNearestNeighbors
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find neighborsmaxResults
- the maximum length of the returned listsearchCriteria
- the search criteria to be met by all returned pointspublic List<E> getNearestNeighbors(GeospatialPoint queryPoint, int maxResults, double maxDistance, SearchCriteria<E> searchCriteria)
GeospatialPointDatabase
Returns a list of the nearest neighbors to a given query point and within a given maximum distance. The returned list is sorted by increasing distance from the query point.
This returned list will contain at most maxResults
elements
(and may contain fewer if maxResults
is larger than the number of
points in the database or if fewer than maxResults
points were
found within the given maximum distance). If multiple points have the
same distance from the query point, the order in which they appear in the
returned list is undefined. By extension, if multiple points have the
same distance from the query point and those points would "straddle" the
end of the returned list, which points are included in the list and which
are cut off is not prescribed.
getNearestNeighbors
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find neighborsmaxResults
- the maximum length of the returned listmaxDistance
- the maximum allowable distance, in meters, from the query
point; points farther away than maxDistance
will not
be included in the returned listsearchCriteria
- the search criteria to be met by all returned pointspublic List<E> getAllNeighborsWithinDistance(GeospatialPoint queryPoint, double maxDistance)
GeospatialPointDatabase
getAllNeighborsWithinDistance
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find neighborsmaxDistance
- the maximum allowable distance, in meters, from the query
point; points farther away than maxDistance
will not
be included in the returned listpublic List<E> getAllNeighborsWithinDistance(GeospatialPoint queryPoint, double maxDistance, SearchCriteria<E> searchCriteria)
GeospatialPointDatabase
getAllNeighborsWithinDistance
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find neighborsmaxDistance
- the maximum allowable distance, in meters, from the query
point; points farther away than maxDistance
will not
be included in the returned listsearchCriteria
- the search criteria to be met by all points in the returned
listpublic E getNearestNeighbor(GeospatialPoint queryPoint)
GeospatialPointDatabase
getNearestNeighbor
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find the nearest neighbornull
if the
database contains no pointspublic E getNearestNeighbor(GeospatialPoint queryPoint, double maxDistance)
GeospatialPointDatabase
getNearestNeighbor
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find the nearest neighbormaxDistance
- the maximum allowable distance from the query point in metersnull
if the
database contains no points within maxDistance
meters of
the query pointpublic E getNearestNeighbor(GeospatialPoint queryPoint, SearchCriteria<E> searchCriteria)
GeospatialPointDatabase
getNearestNeighbor
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find the nearest neighborsearchCriteria
- the criteria to apply to potential nearest neighborsnull
if the
database contains no points that satisfy the given search
criteriapublic E getNearestNeighbor(GeospatialPoint queryPoint, double maxDistance, SearchCriteria<E> searchCriteria)
GeospatialPointDatabase
getNearestNeighbor
in interface GeospatialPointDatabase<E extends GeospatialPoint>
queryPoint
- the point for which to find the nearest neighbormaxDistance
- the maximum allowable distance from the query point in meterssearchCriteria
- the criteria to apply to potential nearest neighborsnull
if no
points matching the given search criteria were found within
maxDistance
meters of the query pointpublic List<E> getAllPointsInBoundingBox(double west, double east, double north, double south)
GeospatialPointDatabase
getAllPointsInBoundingBox
in interface GeospatialPointDatabase<E extends GeospatialPoint>
west
- the western limit of the bounding box in degreeseast
- the eastern limit of the bounding box in degreesnorth
- the northern limit of the bounding box in degreessouth
- the southern limit of the bounding box in degreespublic List<E> getAllPointsInBoundingBox(double west, double east, double north, double south, GeospatialPoint orderingPoint)
GeospatialPointDatabase
getAllPointsInBoundingBox
in interface GeospatialPointDatabase<E extends GeospatialPoint>
west
- the western limit of the bounding box in degreeseast
- the eastern limit of the bounding box in degreesnorth
- the northern limit of the bounding box in degreessouth
- the southern limit of the bounding box in degreesorderingPoint
- a point to use for sorting the list of results by distance;
may be null
if no ordering is requiredorderingPoint
public List<E> getAllPointsInBoundingBox(double west, double east, double north, double south, SearchCriteria<E> otherCriteria)
GeospatialPointDatabase
getAllPointsInBoundingBox
in interface GeospatialPointDatabase<E extends GeospatialPoint>
west
- the western limit of the bounding box in degreeseast
- the eastern limit of the bounding box in degreesnorth
- the northern limit of the bounding box in degreessouth
- the southern limit of the bounding box in degreesotherCriteria
- a set of additional search criteria to apply to points that
lie within the bounding box; may be null
if no
additional criteria are to be appliedpublic List<E> getAllPointsInBoundingBox(double west, double east, double north, double south, SearchCriteria<E> otherCriteria, GeospatialPoint orderingPoint)
GeospatialPointDatabase
getAllPointsInBoundingBox
in interface GeospatialPointDatabase<E extends GeospatialPoint>
west
- the western limit of the bounding box in degreeseast
- the eastern limit of the bounding box in degreesnorth
- the northern limit of the bounding box in degreessouth
- the southern limit of the bounding box in degreesotherCriteria
- a set of additional search criteria to apply to points that
lie within the bounding box; may be null
if no
additional criteria are to be appliedorderingPoint
- a point to use for sorting the list of results by distance;
may be null
if no ordering is requiredorderingPoint
jeospatial is an open-source library hosted at https://github.com/jchambers/jeospatial.