import csv

import numpy as np
import matplotlib.pyplot as plt
from sklearn import manifold

# Distance file available from RMDS project:
#    https://github.com/cheind/rmds/blob/master/examples/european_city_distances.csv
with open("european_city_distances.csv", "r") as f:
    data = list(csv.reader(f, delimiter=';'))

dists = []
cities = []
for d in data:
    cities.append(d[0])  # first field is the city name
    # d[1:-1] skips the name and the final field of each row
    dists.append(list(map(float, d[1:-1])))

# Scale the distances so that the largest one is 1.0
adist = np.array(dists)
amax = np.amax(adist)
adist /= amax

# Metric MDS on the precomputed distance matrix
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(adist)
coords = results.embedding_

# Scatter plot of the embedding, with each point labelled by its city
plt.subplots_adjust(bottom=0.1)
plt.scatter(coords[:, 0], coords[:, 1], marker='o')
for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
    plt.annotate(
        label,
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))

plt.show()

Notes: If you don't specify a random_state, a slightly different embedding may be generated each time, with an arbitrary rotation in the 2D plane. If it's slow, you can use multiple CPUs via n_jobs=N.
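To see why the rotation is arbitrary, here is a minimal NumPy-only sketch. The four points and the helper names (pairwise_distances, rotate) are made up for illustration and have nothing to do with the city data. It shows that rotating a layout leaves every pairwise distance unchanged, so a distance matrix alone cannot pin down the orientation of the embedding.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return points @ rotation.T


# Hypothetical layout: the corners of a 3 x 4 rectangle
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
rotated = rotate(points, np.pi / 3)

# The coordinates differ, but the distance matrices agree
coords_differ = not np.allclose(points, rotated)
same_distances = np.allclose(pairwise_distances(points),
                             pairwise_distances(rotated))
print(coords_differ, same_distances)
```

Both layouts fit the distances equally well, which is why fixing random_state is the easy way to get the same picture on every run.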
Monday, 13 January 2014
Convert distance matrix to 2D projection with Python
In my continuing quest to never use R again, I've been trying to figure out how to embed points described by a distance matrix into 2D. This can be done with several manifold embeddings provided by scikit-learn. The diagram below was generated using metric multi-dimensional scaling based on a distance matrix of pairwise distances between European cities (docs here and here).
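For intuition about what embedding a distance matrix means, here is a rough sketch of classical (Torgerson) MDS written with NumPy alone. Note that this is a different, simpler technique than the iterative metric MDS in scikit-learn, and the function name classical_mds and the toy rectangle are my own inventions for illustration.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates from a distance matrix."""
    n = dist.shape[0]
    # Double-centre the squared distances to get a Gram (inner product) matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the eigenvectors with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before the square root
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy input: distances between the corners of a 3 x 4 rectangle
corners = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
diff = corners[:, None, :] - corners[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

coords = classical_mds(dist)

# The recovered points reproduce the original distances
rdiff = coords[:, None, :] - coords[None, :, :]
recovered = np.sqrt((rdiff ** 2).sum(axis=-1))
distances_match = np.allclose(recovered, dist)
print(coords.shape, distances_match)
```

Because the toy distances come from points that really do lie in a plane, the recovery is exact up to rotation and reflection. Real-world distances such as those between cities generally won't fit a plane perfectly, so any 2D embedding of them is an approximation.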