Search Results

Search found 4 results on 1 pages for 'pyml'.

Page 1/1 | 1 

  • Save PyML.classifiers.multi.OneAgainstRest(SVM()) object?

    - by Michael Aaron Safyan
    I'm using PYML to construct a multiclass linear support vector machine (SVM). After training the SVM, I would like to be able to save the classifier, so that on subsequent runs I can use the classifier right away without retraining. Unfortunately, the .save() function is not implemented for that classifier, and attempting to pickle it (both with standard pickle and cPickle) yield the following error message: pickle.PicklingError: Can't pickle : it's not found as __builtin__.PySwigObject Does anyone know of a way around this or of an alternative library without this problem? Thanks. Edit/Update I am now training and attempting to save the classifier with the following code: mc = multi.OneAgainstRest(SVM()); mc.train(dataset_pyml,saveSpace=False); for i, classifier in enumerate(mc.classifiers): filename=os.path.join(prefix,labels[i]+".svm"); classifier.save(filename); Notice that I am now saving with the PyML save mechanism rather than with pickling, and that I have passed "saveSpace=False" to the training function. However, I am still gettting an error: ValueError: in order to save a dataset you need to train as: s.train(data, saveSpace = False) However, I am passing saveSpace=False... so, how do I save the classifier(s)? P.S. The project I am using this in is pyimgattr, in case you would like a complete testable example... the program is run with "./pyimgattr.py train"... that will get you this error. Also, a note on version information: [michaelsafyan@codemage /Volumes/Storage/classes/cse559/pyimgattr]$ python Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. import PyML print PyML.__version__ 0.7.0

    Read the article

  • PyML 0.7.2 - How to prevent accuracy from dropping after storing/loading a classifier?

    - by Michael Aaron Safyan
    This is a followup from "Save PyML.classifiers.multi.OneAgainstRest(SVM()) object?". The solution to that question was close, but not quite right, (the SparseDataSet is broken, so attempting to save/load with that dataset container type will fail, no matter what. Also, PyML is inconsistent in terms of whether labels should be numbers or strings... it turns out that the oneAgainstRest function is actually not good enough, because the labels need to be strings and simultaneously convertible to floats, because there are places where it is assumed to be a string and elsewhere converted to float) and so after a great deal of hacking and such I was finally able to figure out a way to save and load my multi-class classifier without it blowing up with an error.... however, although it is no longer giving me an error message, it is still not quite right as the accuracy of the classifier drops significantly when it is saved and then reloaded (so I'm still missing a piece of the puzzle). I am currently using the following custom mutli-class classifier for training, saving, and loading: class SVM(object): def __init__(self,features_or_filename,labels=None,kernel=None): if isinstance(features_or_filename,str): filename=features_or_filename; if labels!=None: raise ValueError,"Labels must be None if loading from a file."; with open(os.path.join(filename,"uniquelabels.list"),"rb") as uniquelabelsfile: self.uniquelabels=sorted(list(set(pickle.load(uniquelabelsfile)))); self.labeltoindex={}; for idx,label in enumerate(self.uniquelabels): self.labeltoindex[label]=idx; self.classifiers=[]; for classidx, classname in enumerate(self.uniquelabels): self.classifiers.append(PyML.classifiers.svm.loadSVM(os.path.join(filename,str(classname)+".pyml.svm"),datasetClass = PyML.VectorDataSet)); else: features=features_or_filename; if labels==None: raise ValueError,"Labels must not be None when training."; self.uniquelabels=sorted(list(set(labels))); self.labeltoindex={}; for idx,label in enumerate(self.uniquelabels): self.labeltoindex[label]=idx; points = [[float(xij) for xij in xi] for xi in features]; self.classifiers=[PyML.SVM(kernel) for label in self.uniquelabels]; for i in xrange(len(self.uniquelabels)): currentlabel=self.uniquelabels[i]; currentlabels=['+1' if k==currentlabel else '-1' for k in labels]; currentdataset=PyML.VectorDataSet(points,L=currentlabels,positiveClass='+1'); self.classifiers[i].train(currentdataset,saveSpace=False); def accuracy(self,pts,labels): logger=logging.getLogger("ml"); correct=0; total=0; classindexes=[self.labeltoindex[label] for label in labels]; h=self.hypotheses(pts); for idx in xrange(len(pts)): if h[idx]==classindexes[idx]: logger.info("RIGHT: Actual \"%s\" == Predicted \"%s\"" %(self.uniquelabels[ classindexes[idx] ], self.uniquelabels[ h[idx] ])); correct+=1; else: logger.info("WRONG: Actual \"%s\" != Predicted \"%s\"" %(self.uniquelabels[ classindexes[idx] ], self.uniquelabels[ h[idx] ])) total+=1; return float(correct)/float(total); def prediction(self,pt): h=self.hypothesis(pt); if h!=None: return self.uniquelabels[h]; return h; def predictions(self,pts): h=self.hypotheses(self,pts); return [self.uniquelabels[x] if x!=None else None for x in h]; def hypothesis(self,pt): bestvalue=None; bestclass=None; dataset=PyML.VectorDataSet([pt]); for classidx, classifier in enumerate(self.classifiers): val=classifier.decisionFunc(dataset,0); if (bestvalue==None) or (val>bestvalue): bestvalue=val; bestclass=classidx; return bestclass; def hypotheses(self,pts): bestvalues=[None for pt in pts]; bestclasses=[None for pt in pts]; dataset=PyML.VectorDataSet(pts); for classidx, classifier in enumerate(self.classifiers): for ptidx in xrange(len(pts)): val=classifier.decisionFunc(dataset,ptidx); if (bestvalues[ptidx]==None) or (val>bestvalues[ptidx]): bestvalues[ptidx]=val; bestclasses[ptidx]=classidx; return bestclasses; def save(self,filename): if not os.path.exists(filename): os.makedirs(filename); with open(os.path.join(filename,"uniquelabels.list"),"wb") as uniquelabelsfile: pickle.dump(self.uniquelabels,uniquelabelsfile,pickle.HIGHEST_PROTOCOL); for classidx, classname in enumerate(self.uniquelabels): self.classifiers[classidx].save(os.path.join(filename,str(classname)+".pyml.svm")); I am using the latest version of PyML (0.7.2, although PyML.__version__ is 0.7.0). When I construct the classifier with a training dataset, the reported accuracy is ~0.87. When I then save it and reload it, the accuracy is less than 0.001. So, there is something here that I am clearly not persisting correctly, although what that may be is completely non-obvious to me. Would you happen to know what that is?

    Read the article

  • PyML 0.7.2 - How to prevent accuracy from dropping after stroing/loading a classifier?

    - by Michael Aaron Safyan
    This is a followup from "Save PyML.classifiers.multi.OneAgainstRest(SVM()) object?". The solution to that question was close, but not quite right, (the SparseDataSet is broken, so attempting to save/load with that dataset container type will fail, no matter what. Also, PyML is inconsistent in terms of whether labels should be numbers or strings... it turns out that the oneAgainstRest function is actually not good enough, because the labels need to be strings and simultaneously convertible to floats, because there are places where it is assumed to be a string and elsewhere converted to float) and so after a great deal of hacking and such I was finally able to figure out a way to save and load my multi-class classifier without it blowing up with an error.... however, although it is no longer giving me an error message, it is still not quite right as the accuracy of the classifier drops significantly when it is saved and then reloaded (so I'm still missing a piece of the puzzle). I am currently using the following custom mutli-class classifier for training, saving, and loading: class SVM(object): def __init__(self,features_or_filename,labels=None,kernel=None): if isinstance(features_or_filename,str): filename=features_or_filename; if labels!=None: raise ValueError,"Labels must be None if loading from a file."; with open(os.path.join(filename,"uniquelabels.list"),"rb") as uniquelabelsfile: self.uniquelabels=sorted(list(set(pickle.load(uniquelabelsfile)))); self.labeltoindex={}; for idx,label in enumerate(self.uniquelabels): self.labeltoindex[label]=idx; self.classifiers=[]; for classidx, classname in enumerate(self.uniquelabels): self.classifiers.append(PyML.classifiers.svm.loadSVM(os.path.join(filename,str(classname)+".pyml.svm"),datasetClass = PyML.VectorDataSet)); else: features=features_or_filename; if labels==None: raise ValueError,"Labels must not be None when training."; self.uniquelabels=sorted(list(set(labels))); self.labeltoindex={}; for idx,label in enumerate(self.uniquelabels): self.labeltoindex[label]=idx; points = [[float(xij) for xij in xi] for xi in features]; self.classifiers=[PyML.SVM(kernel) for label in self.uniquelabels]; for i in xrange(len(self.uniquelabels)): currentlabel=self.uniquelabels[i]; currentlabels=['+1' if k==currentlabel else '-1' for k in labels]; currentdataset=PyML.VectorDataSet(points,L=currentlabels,positiveClass='+1'); self.classifiers[i].train(currentdataset,saveSpace=False); def accuracy(self,pts,labels): logger=logging.getLogger("ml"); correct=0; total=0; classindexes=[self.labeltoindex[label] for label in labels]; h=self.hypotheses(pts); for idx in xrange(len(pts)): if h[idx]==classindexes[idx]: logger.info("RIGHT: Actual \"%s\" == Predicted \"%s\"" %(self.uniquelabels[ classindexes[idx] ], self.uniquelabels[ h[idx] ])); correct+=1; else: logger.info("WRONG: Actual \"%s\" != Predicted \"%s\"" %(self.uniquelabels[ classindexes[idx] ], self.uniquelabels[ h[idx] ])) total+=1; return float(correct)/float(total); def prediction(self,pt): h=self.hypothesis(pt); if h!=None: return self.uniquelabels[h]; return h; def predictions(self,pts): h=self.hypotheses(self,pts); return [self.uniquelabels[x] if x!=None else None for x in h]; def hypothesis(self,pt): bestvalue=None; bestclass=None; dataset=PyML.VectorDataSet([pt]); for classidx, classifier in enumerate(self.classifiers): val=classifier.decisionFunc(dataset,0); if (bestvalue==None) or (val>bestvalue): bestvalue=val; bestclass=classidx; return bestclass; def hypotheses(self,pts): bestvalues=[None for pt in pts]; bestclasses=[None for pt in pts]; dataset=PyML.VectorDataSet(pts); for classidx, classifier in enumerate(self.classifiers): for ptidx in xrange(len(pts)): val=classifier.decisionFunc(dataset,ptidx); if (bestvalues[ptidx]==None) or (val>bestvalues[ptidx]): bestvalues[ptidx]=val; bestclasses[ptidx]=classidx; return bestclasses; def save(self,filename): if not os.path.exists(filename): os.makedirs(filename); with open(os.path.join(filename,"uniquelabels.list"),"wb") as uniquelabelsfile: pickle.dump(self.uniquelabels,uniquelabelsfile,pickle.HIGHEST_PROTOCOL); for classidx, classname in enumerate(self.uniquelabels): self.classifiers[classidx].save(os.path.join(filename,str(classname)+".pyml.svm")); I am using the latest version of PyML (0.7.2, although PyML.__version__ is 0.7.0). When I construct the classifier with a training dataset, the reported accuracy is ~0.87. When I then save it and reload it, the accuracy is less than 0.001. So, there is something here that I am clearly not persisting correctly, although what that may be is completely non-obvious to me. Would you happen to know what that is?

    Read the article

  • Loading a PyML multiclass classifier... why isn't this working?

    - by Michael Aaron Safyan
    This is a followup from "Save PyML.classifiers.multi.OneAgainstRest(SVM()) object?". I am using PyML for a computer vision project (pyimgattr), and have been having trouble storing/loading a multiclass classifier. When attempting to load one of the SVMs in a composite classifier, with loadSVM, I am getting: ValueError: invalid literal for float(): rest Note that this does not happen with the first classifier that I load, only with the second. What is causing this error, and what can I do to get around this so that I can properly load the classifier? Details To better understand the trouble I'm running into, you may want to look at pyimgattr.py (currently revision 11). I am invoking the program with "./pyimgattr.py train" which trains the classifier (invokes train on line 571, which trains the classifier with trainmulticlassclassifier on line 490 and saves it with storemulticlassclassifier on line 529), and then invoking the program with "./pyimgattr.py test" which loads the classifier in order to test it with the testing dataset (invokes test on line 628, which invokes loadmulticlassclassifier on line 549). The multiclass classifier consists of several one-against-rest SVMs which are saved individually. The loadmulticlassclassifier function loads these individually by calling loadSVM() on several different files. It is in this call to loadSVM (done indirectly in loadclassifier on line 517) that I get an error. The first of the one-against-rest classifiers loads successfully, but the second one does not. A transcript is as follows: $ ./pyimgattr.py test [INFO] pyimgattr -- Loading attributes from "classifiers/attributes.lst"... [INFO] pyimgattr -- Loading classnames from "classifiers/classnames.lst"... [INFO] pyimgattr -- Loading dataset "attribute_data/apascal_test.txt"... [INFO] pyimgattr -- Loaded dataset "attribute_data/apascal_test.txt". [INFO] pyimgattr -- Loading multiclass classifier from "classifiers/classnames_from_attributes"... [INFO] pyimgattr -- Constructing object into which to store loaded data... [INFO] pyimgattr -- Loading manifest data... [INFO] pyimgattr -- Loading classifier from "classifiers/classnames_from_attributes/aeroplane.svm".... scanned 100 patterns scanned 200 patterns read 100 patterns read 200 patterns {'50': 38, '60': 45, '61': 46, '62': 47, '49': 37, '52': 39, '53': 40, '24': 16, '25': 17, '26': 18, '27': 19, '20': 12, '21': 13, '22': 14, '23': 15, '46': 34, '47': 35, '28': 20, '29': 21, '40': 32, '41': 33, '1': 1, '0': 0, '3': 3, '2': 2, '5': 5, '4': 4, '7': 7, '6': 6, '8': 8, '58': 44, '39': 31, '38': 30, '15': 9, '48': 36, '16': 10, '19': 11, '32': 24, '31': 23, '30': 22, '37': 29, '36': 28, '35': 27, '34': 26, '33': 25, '55': 42, '54': 41, '57': 43} read 250 patterns in LinearSparseSVModel done LinearSparseSVModel constructed model [INFO] pyimgattr -- Loaded classifier from "classifiers/classnames_from_attributes/aeroplane.svm". [INFO] pyimgattr -- Loading classifier from "classifiers/classnames_from_attributes/bicycle.svm".... label at None delimiter , Traceback (most recent call last): File "./pyimgattr.py", line 797, in sys.exit(main(sys.argv)); File "./pyimgattr.py", line 782, in main return test(attributes_file,classnames_file,testing_annotations_file,testing_dataset_path,classifiers_path,logger); File "./pyimgattr.py", line 635, in test multiclass_classnames_from_attributes_classifier = loadmulticlassclassifier(classnames_from_attributes_folder,logger); File "./pyimgattr.py", line 529, in loadmulticlassclassifier classifiers.append(loadclassifier(os.path.join(filename,label+".svm"),logger)); File "./pyimgattr.py", line 502, in loadclassifier result=loadSVM(filename,datasetClass = SparseDataSet); File "/Library/Python/2.6/site-packages/PyML/classifiers/svm.py", line 328, in loadSVM data = datasetClass(fileName, **args) File "/Library/Python/2.6/site-packages/PyML/containers/vectorDatasets.py", line 224, in __init__ BaseVectorDataSet.__init__(self, arg, **args) File "/Library/Python/2.6/site-packages/PyML/containers/baseDatasets.py", line 214, in __init__ self.constructFromFile(arg, **args) File "/Library/Python/2.6/site-packages/PyML/containers/baseDatasets.py", line 243, in constructFromFile for x in parser : File "/Library/Python/2.6/site-packages/PyML/containers/parsers.py", line 426, in next x = [float(token) for token in tokens[self._first:self._last]] ValueError: invalid literal for float(): rest

    Read the article

1