Search Results

Search found 69357 results on 2775 pages for 'data oriented design'.


  • What are the algorithms that are used for working with large data in popular web applications

    - by Moss Farmer
    I am looking for some well-known algorithms that can be considered when handling very large amounts of data. (Edit: by large amounts of data I mean records in a database, excluding blobs.) These algorithms, whether used in whole or in part, may be found in big web applications like Twitter, Last.fm, Amazon, etc. Specifically, I'm looking for names of, or links to, such algorithms. My primary interest lies in developing a deep understanding of working with large database records and writing efficient code to do so.

    Read the article

  • How can I keep directories in sync

    - by Guillaume Boudreau
    I have a directory, dirA, that users can work in: they can create, modify, rename and delete files & sub-directories in dirA. I want to keep another directory, dirB, in sync with dirA. What I'd like is a discussion on finding a working algorithm that would achieve the above, with the limitations listed below.

    Requirements:
    1. Something asynchronous - I don't want to stop file operations in dirA while I work in dirB.
    2. I can't assume that I can just blindly rsync dirA to dirB on a regular interval - dirA could contain millions of files & directories, and terabytes of data. Completely walking the dirA tree could take hours.

    Those two requirements make this really difficult. Having it asynchronous means that when I start working on a specific file from dirA, it might have moved a lot since it appeared. And the second limitation means that I really need to watch dirA, and work on the atomic file operations that I notice.

    Current (broken) implementation:
    1. Log all file & directory operations in dirA.
    2. Using a separate process, read that log, and 'repeat' all the logged operations in dirB.

    Why is it broken:

        echo 1 > dirA/file1
        # Allow the 'log reader' process to create dirB/file1:
        log = "write dirA/file1"; action = cp dirA/file1 dirB/file1; result = OK
        echo 1 > dirA/file2
        mv dirA/file1 dirA/file3
        mv dirA/file2 dirA/file1
        rm dirA/file3
        # End result: file1 contains '1'
        # 'log reader' process starts working on the 4 above file operations:
        log = "write file2";        action = cp dirA/file2 dirB/file2;  result = failed: there is no dirA/file2
        log = "rename file1 file3"; action = mv dirB/file1 dirB/file3;  result = OK
        log = "rename file2 file1"; action = mv dirB/file2 dirB/file1;  result = failed: there is no dirB/file2
        log = "delete file3";       action = rm dirB/file3;             result = OK
        # End result in dirB: no more files!

    Another broken example:

        echo 1 > dirA/dir1/file1
        mv dirA/dir1 dirA/dir2
        # 'log reader' process starts working on the 2 above file operations:
        log = "write file1";      action = cp dirA/dir1/file1 dirB/dir1/file1; result = failed: there is no dirA/dir1/file1
        log = "rename dir1 dir2"; action = mv dirB/dir1 dirB/dir2;             result = failed: there is no dirB/dir1
        # End result in dirB: nothing!
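    One way around the failures described above (a minimal sketch, not the poster's implementation; the paths, staging area and journal format are assumptions) is to snapshot each file's content the moment its event is observed, so the asynchronous replay into dirB never has to read dirA again. Note that Java's WatchService watches a single directory level, so a real version would register every sub-directory recursively.

        import java.io.IOException;
        import java.nio.file.*;
        import java.util.concurrent.atomic.AtomicLong;

        import static java.nio.file.StandardWatchEventKinds.*;

        // Sketch: log dirA events and copy file contents into a staging area at
        // the moment each event is seen, so the dirB replayer depends only on the
        // journal and the staging area, never on dirA's current (moving) state.
        public class SnapshotLogger {

            private final Path dirA = Paths.get("dirA");          // assumed location
            private final Path staging = Paths.get("staging");    // hypothetical staging area
            private final AtomicLong seq = new AtomicLong();

            public void watch() throws IOException, InterruptedException {
                WatchService watcher = FileSystems.getDefault().newWatchService();
                dirA.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);

                while (true) {
                    WatchKey key = watcher.take();
                    for (WatchEvent<?> event : key.pollEvents()) {
                        if (event.kind() == OVERFLOW) {
                            continue;                              // events were lost; a real version would rescan
                        }
                        Path rel = (Path) event.context();         // path relative to dirA
                        long n = seq.incrementAndGet();
                        if (event.kind() == ENTRY_DELETE) {
                            journal(n, "delete", rel, null);
                        } else {
                            // Copy the file NOW, before later renames/deletes can hide it.
                            Path snapshot = staging.resolve(Long.toString(n));
                            Files.copy(dirA.resolve(rel), snapshot,
                                       StandardCopyOption.REPLACE_EXISTING);
                            journal(n, "write", rel, snapshot);
                        }
                    }
                    if (!key.reset()) break;                       // dirA is no longer accessible
                }
            }

            // Placeholder: append one record to the operation log read by the dirB replayer.
            private void journal(long n, String op, Path rel, Path snapshot) {
                System.out.printf("%d %s %s %s%n", n, op, rel, snapshot);
            }
        }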

    Read the article

  • Interconnect nodes in a Java distributed infrastructure for tweet processing

    - by David Moreno García
    I'm working on a new version of an old project that I used to download and process user statuses from Twitter. The main problem of that project was its infrastructure. I used multiple instances of a Java application (trackers) to download from Twitter given a specific task (basically terms to search for), connected to a central node (a web application) that had to process all tweets once per day and generate a new task for each tracker once every 15 minutes. The central node also had to monitor all trackers and enable/disable them on user request. This, as I said, was too slow because I had multiple bottlenecks, so in this new version I want to improve the infrastructure and isolate all functionality in specific nodes. I also need a good notification system to receive notifications from any node. So, in the next diagram I show the components that I'll need in this new version. As you can see, there are more nodes. Here are some notes about them:

    Dashboard: Controls tracker statuses and sends a single task to each of them (on user request). The trackers will use this task until it is replaced with a new one (when that happens, not every 15 minutes like before).
    Search engine: I need to store all the tweets. They are first stored in a local database for each tracker, but after that I'm thinking of using something like Elasticsearch to be able to do fast searches.
    Tweet processor: Just an isolated component with its own database (maybe something like the search engine, to have fast access to info generated by the module). In the future more could be added.
    Application UI: A web application with a database shared with the Dashboard (mainly to store user information and preferences). Indeed, both could be merged into a single web application.

    The main difference with the previous version of the project is that now the nodes will be isolated and will only show information and send requests; I will not do any heavy tasks in them (like processing tweets as I did before). So, having these components, my main headache is how to structure everything so that I don't have to rewrite a lot of code every time I need to access any new data. Another headache is how I can interconnect nodes. I could use sockets but that is a pain in the ass. Maybe a REST layer? And finally, if all the nodes are isolated, how could I generate notifications for each user, whose info is only in the database used by the Application UI? I'm programming this using Java and Spring (at least I used them in the last version) but I have no problem with changing the language if I can take advantage of a tool/library/engine that makes my life easier and gives a better platform. Any comment will be appreciated.
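    On the "maybe a REST layer?" point, since the question already mentions Spring, the tracker-facing endpoint could be as small as the sketch below (the controller, route and DTO names are made-up assumptions, not part of the question): the Dashboard POSTs a task to a tracker, which keeps it until a new one arrives.

        import org.springframework.web.bind.annotation.PathVariable;
        import org.springframework.web.bind.annotation.PostMapping;
        import org.springframework.web.bind.annotation.RequestBody;
        import org.springframework.web.bind.annotation.RequestMapping;
        import org.springframework.web.bind.annotation.RestController;

        // Hypothetical payload: the search terms the Dashboard assigns to one tracker.
        class TaskDto {
            public String trackerId;
            public java.util.List<String> searchTerms;
        }

        @RestController
        @RequestMapping("/api/trackers")
        public class TrackerTaskController {

            // The Dashboard POSTs a new task; the tracker keeps using it until replaced.
            @PostMapping("/{trackerId}/task")
            public void assignTask(@PathVariable String trackerId, @RequestBody TaskDto task) {
                // Hand the task to whatever component polls Twitter for this tracker.
            }
        }

    For notifications that have to reach users whose data lives only in the Application UI's database, a common alternative to having every node call REST endpoints is a message broker (for example RabbitMQ via Spring AMQP), with the Application UI as the single consumer that maps incoming events to its users.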

    Read the article

  • Is it possible to use ASP.NET Dynamic Data and SubSonic 3?

    - by James
    Is it possible to use ASP.NET Dynamic Data with SubSonic 3 in place of LINQ to SQL classes or the Entity Framework? MetaModel.RegisterContext() throws an exception if you use the context class that SubSonic generates. I thought I remembered coming across a SubSonic/Dynamic Data example back before SubSonic 3 was released, but I can't find it now. Has anyone been able to get this to work?

    Read the article

  • Master Data Management Update

    Oracle's Master Data Management suite has seen remarkable development progress in the past year and a half. Leveraging out-of-the-box integration to applications provided by Application Integration Architecture, the cost, risk and time it takes to implement an MDM solution have been cut in half. Oracle Applications are now 'MDM Aware', Data Quality tools have reached state-of-the-art status, and new hubs are coming online. In this AppsCast, Pascal Laik, VP MDM Products, discusses this progress, what it means for Oracle customers, and where we are going from here.

    Read the article

  • How do I pass a string or data object between two view controllers?

    - by Jonathan
    In my last question I asked how best to send a string from one view controller to another, both of which were on a navigation stack: http://stackoverflow.com/questions/2898860/pass-string-from-tableviewcontroller-to-viewcontroller-in-navigation-stack However, I have just realised I have two options: I can either pass the path to the file in the app's documents folder, or, since the first view controller (the table view) has already accessed the data in the file, pass the data itself to the pushed view controller. Which should I do?

    Read the article

  • Web application design with distributed servers

    - by Bonn
    I want to build a web application/server with this structure:

    main-server
    sub-servers:
        transaction-server (create, update, delete)
        view-server (view, search)
        authentication-server
        documents-server
        reporting-server
        library-server
        e-learning-server

    The main-server acts as the host server for the sub-servers. I can add many sub-servers and connect them to the main-server (via a plug-and-play interface, maybe); each can then begin querying data from other sub-servers (which have been connected to the main-server). The sub-servers can be anywhere as long as they are connected to the internet. The main-server can manage all sub-servers which are connected to it (query data, set permissions between sub-servers, etc.). The purpose is simple: the web application will be huge as the company grows, so I want to distribute it into small connected pluggable servers. My question is, does the structure above already follow a standardized method? Or are there any different views? What technologies are needed? I need to do a lot of research before the execution plan begins. Thanks a lot.

    Read the article

  • What schema documentation tools exist for PostgreSQL

    - by Brad Koch
    MySQL has MySQL Workbench for designing and documenting your schema, and generates CREATE and ALTER scripts based on your design. We're looking at migrating to PostgreSQL in the near future, and we do need a practical way of documenting and modifying the schema structure. What similar tools exist for Postgres (that are OS X/Linux compatible)? Alternatively, what equivalent conventions would be followed for designing and documenting the structure of your Postgres database?

    Read the article

  • Using SOUNDEX and DIFFERENCE to Standardize Data in SQL Server

    My client wants to standardize address information for existing and future addresses collected for their customers, particularly the street suffixes. The application used to enter and collect address information has the street suffix separated from the address field, but it is a textbox instead of a drop-down list, therefore things are not standardized. I know there are some options out there to standardize data, but they would like a less expensive alternative. Are there any functions in SQL Server that I can use to standardize data?

    Read the article

  • Long Term Data Storage - Choosing A Media Type

    Choosing a long term data storage medium isn't as easy as you may think. You might imagine that the data could be burnt to CD, locked in a cupboard and that it would last forever, however unfortunatel... [Author: Chris Holgate - Computers and Internet - April 02, 2010]

    Read the article

  • Lookup for data sources in a query

    - by DAXShekhar
    // X++ helper: collects the names of all data sources in the supplied query
    // and presents them in a pick list; returns the data source the user selects.
    public static str lookupDatasourceOfQuery(Query _query)
    {
        Query                   query = _query;
        QueryBuildDataSource    qbds;
        int                     dsIterator;
        Map                     map = new Map(Types::String, Types::String);
        ;
        // Walk every data source of the query and add its name to the map.
        for (dsIterator = 1; dsIterator <= query.dataSourceCount(); dsIterator++)
        {
            qbds = query.dataSourceNo(dsIterator);
            map.insert(qbds.name(), qbds.name());
        }
        return pickList(map, "Data source", "Data sources");
    }

    Read the article

  • What's the best way to keep java app data stored redundantly in a file?

    - by Bijan
    If I have systems that are based on realtime data, how can I ensure that all the information that is current is redundantly stored in a file? So that when the program starts again, it uses this information to initialize itself back to where it was when it closed. I know of XStream and HSQLDB, but wasn't sure if these are the best options for data that needs to be a literal carbon copy.
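    A minimal sketch of the plain-JDK route (class and file names are assumptions; XStream or HSQLDB would replace the serialization part): write the state to a temporary file and atomically rename it over the previous copy, so a crash mid-write never leaves a half-written state file behind.

        import java.io.*;
        import java.nio.file.*;

        // Sketch: durable "carbon copy" of the current application state.
        // Serialize to a temp file, then atomically move it over the old copy.
        public final class StateStore {

            private final Path target;

            public StateStore(Path target) { this.target = target; }

            public void save(Serializable state) throws IOException {
                Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
                try (ObjectOutputStream out =
                         new ObjectOutputStream(Files.newOutputStream(tmp))) {
                    out.writeObject(state);
                }
                // Atomic on most local file systems; a fallback for file systems that
                // don't support ATOMIC_MOVE is not handled in this sketch.
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                           StandardCopyOption.ATOMIC_MOVE);
            }

            @SuppressWarnings("unchecked")
            public <T extends Serializable> T load(Class<T> type)
                    throws IOException, ClassNotFoundException {
                try (ObjectInputStream in =
                         new ObjectInputStream(Files.newInputStream(target))) {
                    return type.cast(in.readObject());
                }
            }
        }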

    Read the article

  • How do we provide valid time estimates during Sprint Planning without doing "too much" design?

    - by Michael Edenfield
    My team is getting up to speed with Scrum, but most of us are more familiar with non-agile or "pseudo-agile" methodologies. The part that is the biggest hurdle for us is running an efficient Sprint Planning meeting where we break our backlog items into tasks, and estimate hours. (I'm using the terminology from the VS2010 Scrum Template; apologies if I use the wrong word somewhere.) When we try to figure out how long a task is going to take, we often fall into the trap of designing the feature at the code level -- table layout, interfaces, etc. -- in order to figure out how long that's going to take. I'm pretty sure this is not the appropriate place to be doing that kind of design. We should be scheduling tasks for these design meetings during the sprint. However, we are having trouble figuring out how else to come up with meaningful estimates for the tasks. Are there any practical habits/techniques/etc. for making a judgement call about how long a feature is going to take, without knowing how you plan to implement it? If our time estimates are going to change significantly once the design has been completed, how can we properly budget our Sprint backlog ahead of time?

    EDIT: Just to clarify, since some of the comments/answers are very valid but I think are addressing the wrong question: we know that what we're doing is not right, and that we should be building time into the sprint for this design. Conceptually all of the developers understand that. We are also bringing in a team member with Scrum experience to keep us on track if we start going off into the weeds. The problem is that, without going through this design process, we are finding it difficult to provide concrete time estimates for anything. We are constantly saying things like "well, if we design it this way it might take 8 hours, but if we end up having to do it this other way instead it will take about 32, but it might not be as bad once we start trying to write it...". I also assume that this process will get better once we have some historical velocity to work from, but many of the technologies and architectural patterns we are using are new to us. But if potentially-wildly-wrong estimates are just a natural part of adapting to this process then we will just need to recondition ourselves to accept that :)

    Read the article

  • finding houses within a radius

    - by paul smith
    During an interview I was given the following: a real estate application that lists all houses that are currently on the market (i.e., for sale) within a given distance (say, for example, the user wants to find all houses within 20 miles). How would you design your application (both data structure and algorithm) to build this type of service? Any ideas? How would you implement it? I told him I didn't know because I've never done any geo-related stuff before.
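    For reference, the brute-force baseline is just a great-circle (haversine) distance check over every listing; the sketch below (class and field names are made up for illustration) shows that much in Java. A production service would typically put listings in a spatial index -- geohash buckets, a quadtree/R-tree, or a database with spatial support such as PostGIS -- and only compute exact distances for candidates in nearby cells.

        import java.util.List;
        import java.util.stream.Collectors;

        // Hypothetical listing record: id plus latitude/longitude in degrees.
        record House(long id, double lat, double lon) {}

        public final class RadiusSearch {

            private static final double EARTH_RADIUS_MILES = 3958.8;

            // Great-circle (haversine) distance between two lat/lon points, in miles.
            static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
                double dLat = Math.toRadians(lat2 - lat1);
                double dLon = Math.toRadians(lon2 - lon1);
                double a = Math.pow(Math.sin(dLat / 2), 2)
                         + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                           * Math.pow(Math.sin(dLon / 2), 2);
                return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
            }

            // O(n) filter: fine for small data sets, replaced by a spatial index at scale.
            static List<House> withinRadius(List<House> all, double lat, double lon, double radiusMiles) {
                return all.stream()
                          .filter(h -> haversineMiles(lat, lon, h.lat(), h.lon()) <= radiusMiles)
                          .collect(Collectors.toList());
            }
        }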

    Read the article

  • Google I/O 2012 - So You've Read the Design Guide; Now What?

    Google I/O 2012 - So You've Read the Design Guide; Now What? Daniel Lehmann, Tor Norbye, Richard Ngo The Android Design Guide describes how to design beautiful Android apps, but not how to build them. In this talk we'll give practical tips for how to apply fit & finish as you are implementing your design, we'll show you how to avoid some common pitfalls, we'll describe some useful patterns, and show you how tools can help. For all I/O 2012 sessions, go to developers.google.com From: GoogleDevelopers Time: 56:31

    Read the article

  • AWS EC2 Oracle RDB - Storing and managing my data

    - by llaszews
    When you create an Oracle Database on the Amazon cloud, you will need to store your database files somewhere on the EC2 cloud. There are basically three places where database files can be stored:

    1. Local drive - the local drive that is part of the virtual server EC2 instance.
    2. Elastic Block Storage (EBS) - network-attached storage that appears as a local drive.
    3. Simple Storage Service (S3) - 'Storage for the Internet'. S3 is not high speed and is intended for storing static document-type files. S3 can also be used for storing static web page files.

    Local drives are ephemeral, so not appropriate to be used as a database storage device. That leaves EBS, which is the best place to store database files. EBS volumes appear as local disk drives. They are actually network-attached to an Amazon EC2 instance. In addition, EBS persists independently from the running life of a single Amazon EC2 instance. If you use an EBS-backed instance for your database data, it will remain available after a reboot but not after a terminate. In many cases you would not need to terminate your instance but only stop it, which is the equivalent of a shutdown. In order to save your database data before you terminate an instance, you can snapshot the EBS to S3. Using EBS as a data store, you can move your Oracle data files from one instance to another. This allows you to move your database from one region or zone to another. Unfortunately, to scale out your Oracle RDS on AWS you cannot have read-only replicas. This is only possible with the other Oracle relational database - MySQL. The free micro instances use EBS as their storage. This is a very good white paper that has more details: AWS Storage Options. This white paper also discusses SQS, SimpleDB, and Amazon RDS in the context of storage devices. However, these are not storage devices you would use to store an Oracle database. This slide deck discusses a lot of the information that is in the white paper: AWS Storage Options slideshow

    Read the article
