Class instance clustering in object reference graph for multi-entries serialization

Posted by Juh_ on Programmers See other posts from Programmers or by Juh_
Published on 2013-08-02T13:09:15Z Indexed on 2013/08/02 16:02 UTC
Read the original article Hit count: 278

My question is on the best way to cluster a graph of class instances (i.e. objects, the graph nodes) linked by object references (the -directed- edges of the graph) around specifically marked objects. To explain better my question, let me explain my motivation:

I currently use a moderately complex system to serialize the data used in my projects:

  • "marked" objects have a specific attributes which stores a "saving entry": the path to an associated file on disc (but it could be done for any storage type providing the suitable interface)
  • Those object can then be serialized automatically (eg: obj.save())
  • The serialization of a marked object 'a' contains implicitly all objects 'b' for which 'a' has a reference to, directly s.t: a.b = b, or indirectly s.t.: a.c.b = b for some object 'c'

This is very simple and basically define specific storage entries to specific objects. I have then "container" type objects that:

  1. can be serialized similarly (in fact their are or can-be "marked")
  2. they don't serialize in their storage entries the "marked" objects (with direct reference): if a and a.b are both marked, a.save() calls b.save() and stores a.b = storage_entry(b)

So, if I serialize 'a', it will serialize automatically all objects that can be reached from 'a' through the object reference graph, possibly in multiples entries. That is what I want, and is usually provides the functionalities I need. However, it is very ad-hoc and there are some structural limitations to this approach:

  1. the multi-entry saving can only works through direct connections in "container" objects, and
  2. there are situations with undefined behavior such as if two "marked" objects 'a'and 'b' both have a reference to an unmarked object 'c'. In this case my system will stores 'c' in both 'a' and 'b' making an implicit copy which not only double the storage size, but also change the object reference graph after re-loading.

I am thinking of generalizing the process. Apart for the practical questions on implementation (I am coding in python, and use Pickle to serialize my objects), there is a general question on the way to attach (cluster) unmarked objects to marked ones.

So, my questions are:

  • What are the important issues that should be considered? Basically why not just use any graph parsing algorithm with the "attach to last marked node" behavior.
  • Is there any work done on this problem, practical or theoretical, that I should be aware of?

Note: I added the tag graph-database because I think the answer might come from that fields, even if the question is not.

© Programmers or respective owner

Related posts about python

Related posts about graph