Class instance clustering in object reference graph for multi-entries serialization

Posted by Juh_ on Programmers
Published on 2013-08-02T13:09:15Z


My question is about the best way to cluster a graph of class instances (i.e. objects, the graph nodes) linked by object references (the directed edges of the graph) around specially marked objects. To explain my question better, let me describe my motivation:

I currently use a moderately complex system to serialize the data used in my projects:

"Marked" objects have a specific attribute that stores a "saving entry": the path to an associated file on disk (though this could work with any storage type that provides a suitable interface)
Those objects can then be serialized automatically (e.g. obj.save())
The serialization of a marked object 'a' implicitly contains every object 'b' that 'a' references, either directly, i.e. a.b = b, or indirectly, i.e. a.c.b = b for some object 'c'
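A minimal sketch of this scheme, assuming a hypothetical Marked base class whose storage_entry is a file path and whose save() simply pickles the object (Marked, Plain and load are illustrative names, not the author's code). Because pickle follows every attribute reference, the file implicitly contains everything reachable from the object, as described for a.b and a.c.b.

```python
import os
import pickle
import tempfile


class Marked:
    """Hypothetical marked object: storage_entry is the path of its file."""

    def __init__(self, storage_entry):
        self.storage_entry = storage_entry

    def save(self):
        # pickle follows every reference, so the file holds this object
        # plus everything reachable from it (a.b, a.c.b, ...)
        with open(self.storage_entry, "wb") as f:
            pickle.dump(self, f)


class Plain:
    """An ordinary, unmarked object."""


def load(storage_entry):
    """Hypothetical loader: read back one storage entry."""
    with open(storage_entry, "rb") as f:
        return pickle.load(f)
```

For example, after a.c = Plain(), a.c.b = Plain() and a.save(), load(a.storage_entry).c.b is a restored copy of b, even though b was never saved explicitly.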

This is very simple and basically assigns specific storage entries to specific objects. I also have "container" type objects that:

can be serialized similarly (in fact they are, or can be, "marked")
do not serialize the "marked" objects they reference directly into their own storage entries: if a and a.b are both marked, a.save() calls b.save() and stores a.b = storage_entry(b)
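The "store the entry instead of the object" rule can be expressed with the standard pickle hooks Pickler.persistent_id and Unpickler.persistent_load. This is a sketch under assumptions: Marked, EntryPickler, EntryUnpickler and load are hypothetical names, marked objects are assumed not to reference each other in a cycle (that would recurse forever), and unlike the container rule, this substitutes every marked object reachable from the root, not only direct attributes.

```python
import os
import pickle
import tempfile


class Marked:
    """Hypothetical marked object: storage_entry is the path of its file."""

    def __init__(self, storage_entry):
        self.storage_entry = storage_entry

    def save(self):
        with open(self.storage_entry, "wb") as f:
            EntryPickler(f, root=self).dump(self)


class EntryPickler(pickle.Pickler):
    """Writes every marked object other than the root as a reference."""

    def __init__(self, file, root):
        super().__init__(file)
        self.root = root

    def persistent_id(self, obj):
        if isinstance(obj, Marked) and obj is not self.root:
            obj.save()                # the equivalent of b.save()
            return obj.storage_entry  # stored in a's file in place of b
        return None                   # None means: pickle obj normally


class EntryUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # pid is the storage entry returned by persistent_id above;
        # no cache, so loading the same entry twice gives two objects
        return load(pid)


def load(storage_entry):
    with open(storage_entry, "rb") as f:
        return EntryUnpickler(f).load()
```

With a.b = b and both marked, a.save() writes b to its own file and a's file only contains b's path; load(a.storage_entry) reloads b from that path.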

So, if I serialize 'a', it will automatically serialize all objects that can be reached from 'a' through the object reference graph, possibly in multiple entries. That is what I want, and it usually provides the functionality I need. However, it is very ad hoc and there are some structural limitations to this approach:

multi-entry saving only works through direct connections in "container" objects, and
there are situations with undefined behavior, such as when two "marked" objects 'a' and 'b' both reference an unmarked object 'c'. In this case my system stores 'c' in both 'a' and 'b', making an implicit copy that not only doubles the storage size but also changes the object reference graph after reloading.
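The duplication can be reproduced with plain pickle: each dump memoizes shared objects only within itself, so an object shared by two independent entries is written twice and comes back as two distinct objects. Obj is a hypothetical stand-in class; marking plays no role here.

```python
import pickle


class Obj:
    """Plain stand-in object."""


c = Obj()
a, b = Obj(), Obj()
a.c = c
b.c = c  # 'a' and 'b' share the unmarked object 'c'

# Two independent entries: each one embeds its own copy of 'c'
entry_a = pickle.dumps(a)
entry_b = pickle.dumps(b)
a2 = pickle.loads(entry_a)
b2 = pickle.loads(entry_b)
print(a2.c is b2.c)  # False: the shared reference became two copies

# One entry holding both: the pickle memo writes 'c' only once
a3, b3 = pickle.loads(pickle.dumps((a, b)))
print(a3.c is b3.c)  # True
```

This is why sharing is preserved inside a single entry but lost across entries: identity is only tracked within one dump/load call.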

I am thinking of generalizing the process. Apart from the practical implementation questions (I am coding in Python and use pickle to serialize my objects), there is a general question about how to attach (cluster) unmarked objects to marked ones.

So, my questions are:

What are the important issues to consider? Basically, why not just use any graph traversal algorithm with "attach to the last marked node" behavior?
Is there any work done on this problem, practical or theoretical, that I should be aware of?
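One way to expose the core issue is to run a separate breadth-first traversal from each marked node, stopping whenever another marked node is reached, and record which marked nodes reach each unmarked node. In this sketch, cluster and split are hypothetical helpers and the graph is an abstract adjacency dict rather than live Python objects (building it from real objects, e.g. by walking their attributes, is left out). Nodes with exactly one owner attach unambiguously; nodes reached from several marked roots are the shared case, which a single traversal with a global visited set would silently attach to whichever root it happened to visit first.

```python
from collections import deque


def cluster(graph, marked):
    """Map each reachable unmarked node to the set of marked nodes owning it.

    graph:  dict mapping a node to the nodes it references (directed edges)
    marked: set of marked nodes (one storage entry each)
    A traversal from one marked node stops at any other marked node,
    since that node is saved in its own entry.
    """
    owners = {}
    for root in marked:
        seen = {root}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt in marked or nxt in seen:
                    continue  # separate entry, or already visited from root
                seen.add(nxt)
                owners.setdefault(nxt, set()).add(root)
                queue.append(nxt)
    return owners


def split(owners):
    """Separate single-owner nodes from shared (ambiguous) ones."""
    attached = {n: next(iter(o)) for n, o in owners.items() if len(o) == 1}
    shared = {n for n, o in owners.items() if len(o) > 1}
    return attached, shared
```

For graph = {"a": ["c", "x"], "b": ["c"], "c": ["d"]} with "a" and "b" marked, "x" attaches to "a", while "c" and "d" are shared; one possible policy is to promote such shared nodes to entries of their own instead of copying them.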

Note: I added the graph-database tag because I think the answer might come from that field, even if the question does not.

Filed under: python, graph, serialization, object, graph-databases
