Search Results

Search found 6 results on 1 pages for 'tjb1982'.

Page 1/1 | 1 

  • How does I/O work for large graph databases?

    - by tjb1982
    I should preface this by saying that I'm mostly a front end web developer, trained as a musician, but over the past few years I've been getting more and more into computer science. So one idea I have as a fun toy project to learn about data structures and C programming was to design and implement my own very simple database that would manage an adjacency list of posts. I don't want SQL (maybe I'll do my own query language? I'm just having fun). It should support ACID. It should be capable of storing 1TB let's say. So with that, I was trying to think of how a database even stores data, without regard to data structures necessarily. I'm working on linux, and I've read that in that world "everything is a file," including hardware (like /dev/*), so I think that that obviously has to apply to a database, too, and it clearly does--whether it's MySQL or PostgreSQL or Neo4j, the database itself is a collection of files you can see in the filesystem. That said, there would come a point in scale where loading the entire database into primary memory just wouldn't work, so it doesn't make sense to design it with that mindset (I assume). However, reading from secondary memory would be much slower and regardless some portion of the database has to be in primary memory in order for you to be able to do anything with it. I read this post: Why use a database instead of just saving your data to disk? And I found it difficult to understand how other databases, like SQLite or Neo4j, read and write from secondary memory and are still very fast (faster, it would seem, than simply writing files to the filesystem as the above question suggests). It seems the key is indexing. But even indexes need to be stored in secondary memory. They are inherently smaller than the database itself, but indexes in a very large database might be prohibitively large, too. So my question is how is I/O generally done with large databases like the one I described above that would be at least 1TB storing a big adjacency list? If indexing is more or less the answer, how exactly does indexing work--what data structures should be involved?

    Read the article

  • Persisting NLP parsed data

    - by tjb1982
    I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (postgres supports this and I've found it works really well). Something like this: Component ( id, POS, parent_id ) Word ( id, raw, lemma, POS, NER ) CW_Map ( component_id, word_id, position int ) But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

    Read the article

  • Understanding clojure keywords

    - by tjb1982
    I'm taking my first steps with Clojure. Otherwise, I'm somewhat competent with JavaScript, Python, Java, and a little C. I was reading this artical that describes destructuring vectors and maps. E.g. => (def point [0 0]) => (let [[x y] point] => (println "the coordinates are:" x y)) the coordinates are: 0 0 but I'm having a difficult time understanding keywords. At first glance, they seem really simple, as they just evaluate to themselves: => :test :test But they seem to be used is so many different ways and I don't understand how to think about them. E.g., you can also do stuff like this: => (defn full-name [& {first :first last :last}] => (println first last)) => (full-name :first "Tom" :last "Brennan") Tom Brennan nil This doesn't seem intuitive to me. I would have guessed the arguments should have been something more like: (full-name {:first "Tom" :last "Brennan"}) because it looks like in the function definition that you're saying "no required arguments, but a variable number of arguments comes in the form of a single map". But it seems more like you're saying "no required arguments, but a variable number of arguments comes which should be a list of alternating keywords and values... ?" I'm not really sure how to wrap my brain around this. Also, things like this confuse me too: => (def population {:humans 5 :zombies 1000}) => (:zombies population) 1000 => (population :zombies) 1000 How do maps and keywords suddenly become functions? If I could get some clarification on the use of keywords in these two examples, that would be really helpful. Update I've also seen http://stackoverflow.com/questions/3337888/clojure-named-arguments and while the accepted answer is a great demonstration of how to use keywords with destructuring and named arguments, I'm really looking more for understanding how to think about them--why the language is designed this way and how I can best internalize their use.

    Read the article

  • Persisting natural language processing parsed data

    - by tjb1982
    I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

    Read the article

  • 404s on password protected content

    - by tjb1982
    I'm new to WordPress and SEO, generally, but we've been running into problems with our site that don't seem to make sense to me. The problem is that our editor likes to schedule posts and/or mark them private until she is ready to make them public, but somehow Google is crawling these posts and getting 404s (because they are password protected). How does Google know they exist in the first place? I checked the sitemap.xml file and don't see a record of the post. One of the offending posts was marked public, but is scheduled for a future date. Could that have something to do with it? I've tried to Google the answer, and I came up with a good amount of reassurance that this won't hurt the site, but I'm still wondering how it's happening in the first place. It's hard because I don't know exactly what the editor's workflow is. Is it possible she's posting publicly first and then revising it to be private only after it's too late? Does anyone know how Google finds WordPress URLs it shouldn't have access to?

    Read the article

  • How are the conceptual pairs Abstract/Concrete, Generic/Specific, and Complex/Simple related to one another in software architecture?

    - by tjb1982
    (= 2 (+ 1 1)) take the above. The requirement of the '=' predicate is that its arguments be comparable. Any two structures are comparable in this case, and so the contract/requirement is pretty generic. The '+' predicate requires that its arguments be numbers. That's more specific. (socket domain type protocol) the arguments here are much more specific (even though the arguments are still just numbers and the function itself returns a file descriptor, which is itself an int), but the arguments are more abstract, and the implementation is built up from other functions whose abstractions are less abstract, which are themselves built from less and less abstract abstractions. To the point where the requirements are something like move from one location to another, observe whether the switch at that location is on or off, turn the switch on or off, or leave it the same, etc. But are functions also less and less complex the less abstract they are? And is there a relationship between the number and range of arguments of a function and the complexity of its implementation, as you go from more abstract to less abstract, and vice versa? (= 2 (+ 1 1) 2r10) the '=' predicate is more generic than the '+' predicate, and thus could be more complex in its implementation. The '+' predicate's contract is less generic, and so could be less complex in its implementation. Is this even a little correct? What about the 'socket' function? Each of those arguments is a number of some kind. What they represent, though, is something more elaborate. It also returns a number (just like the others do), which is also a representation of something conceptually much more elaborate than a number. To boil it down, I'm asking if there is a relationship between the following dimensions, and why: Abstract/Concrete Complex/Simple Generic/Specific And more specifically, do different configurations of these dimensions have a specific, measurable impact on the number and range of the arguments (i.e., the contract) of a function?

    Read the article

1