Search Results

Search found 69357 results on 2775 pages for 'data oriented design'.

Page 5/2775 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

How To Deal With Terrible Design Decisions

- by splatto

I'm a consultant at one company. There is another consultant who is a year older than me and has been here 3 months longer than I have, and a full time developer. The full-time developer is great. My concern is that I see the consultant making absolutely terrible design decisions. For example, M:M relationships are being stored in the database as a comma-delimited string rather than using a conjunction table to hold the relationships. For example, consider two tables, Car and Property: Car records: Camry Volvo Mercedes Property records: Spare Tire Satellite Radio Ipod Support Standard Rather than making a table CarProperties to represent this, he has made a "Property" attribute on the Car table whose data looks like "1,3,7,13,19,25," I hate how this decision and others are affecting the quality of my code. We have butted heads over this design three times in the past two months since I've been here. He asked me why my suggestion was better, and I responded that our database would be eliminating redundant data by converting to a higher normal form. I explained that this design flaw in particular is discussed and discouraged in entry level college programs, and he responded with a shot at me saying that these comma-separated-value database properties are taught when you do your masters (which neither of us have). Needless to say, he became very upset and demanded I apologize for criticizing his work, which I did in the interest of not wanting to be the consultant to create office drama. Our project manager is focused on delivering a product ASAP and is a very strong personality - Suggesting to him at this point that we spend some time to do this right will set him off. There is a strong likelihood that both of our contracts will be extended to work on a second project coming up. How will I be able to exert dominant influence over the design of the system and the data model to ensure that such terrible mistakes are not repeated in the next project? A glimpse at the dynamics: I can be a strong personality if I don't measure myself. The other consultant is not a strong personality, is a poor communicator, is quite stubborn and thinks he is better than everyone else. The project manager is an extremely strong personality who is focused on releasing tomorrow's product yesterday. The full-time developer is very laid back and easy going, a very effective communicator, but is someone who will accept bad design if it means not rocking the boat. Code reviews or anything else that takes "time" will be out of the question - there is no way our PM will be sold on such a thing by anybody.

Read the article
Design suggestion for expression tree evaluation with time-series data

- by Lirik

I have a (C#) genetic program that uses financial time-series data and it's currently working but I want to re-design the architecture to be more robust. My main goals are: sequentially present the time-series data to the expression trees. allow expression trees to access previous data rows when needed. to optimize performance of the data access while evaluating the expression trees. keep a common interface so various types of data can be used. Here are the possible approaches I've thought about: I can evaluate the expression tree by passing in a data row into the root node and let each child node use the same data row. I can evaluate the expression tree by passing in the data row index and letting each node get the data row from a shared DataSet (currently I'm passing the row index and going to multiple synchronized arrays to get the data). Hybrid: an immutable data set is accessible by all of the expression trees and each expression tree is evaluated by passing in a data row. The benefit of the first approach is that the data row is being passed into the expression tree and there is no further query done on the data set (which should increase performance in a multithreaded environment). The drawback is that the expression tree does not have access to the rest of the data (in case some of the functions need to do calculations using previous data rows). The benefit of the second approach is that the expression trees can access any data up to the latest data row, but unless I specify what that row is, I'll have to iterate through the rows and figure out which one is the last one. The benefit of the hybrid is that it should generally perform better and still provide access to the earlier data. It supports two basic "views" of data: the latest row and the previous rows. Do you guys know of any design patterns or do you have any tips that can help me build this type of system? Should I use a DataSet to hold and present the data, or are there more efficient ways to present rows of data while maintaining a simple interface? FYI: All of my code is written in C#.

Read the article
Big Data – Basics of Big Data Analytics – Day 18 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the various components in Big Data Story. In this article we will understand what are the various analytics tasks we try to achieve with the Big Data and the list of the important tools in Big Data Story. When you have plenty of the data around you what is the first thing which comes to your mind? “What do all these data means?” Exactly – the same thought comes to my mind as well. I always wanted to know what all the data means and what meaningful information I can receive out of it. Most of the Big Data projects are built to retrieve various intelligence all this data contains within it. Let us take example of Facebook. When I look at my friends list of Facebook, I always want to ask many questions such as - On which date my maximum friends have a birthday? What is the most favorite film of my most of the friends so I can talk about it and engage them? What is the most liked placed to travel my friends? Which is the most disliked cousin for my friends in India and USA so when they travel, I do not take them there. There are many more questions I can think of. This illustrates that how important it is to have analysis of Big Data. Here are few of the kind of analysis listed which you can use with Big Data. Slicing and Dicing: This means breaking down your data into smaller set and understanding them one set at a time. This also helps to present various information in a variety of different user digestible ways. For example if you have data related to movies, you can use different slide and dice data in various formats like actors, movie length etc. Real Time Monitoring: This is very crucial in social media when there are any events happening and you wanted to measure the impact at the time when the event is happening. For example, if you are using twitter when there is a football match, you can watch what fans are talking about football match on twitter when the event is happening. Anomaly Predication and Modeling: If the business is running normal it is alright but if there are signs of trouble, everyone wants to know them early on the hand. Big Data analysis of various patterns can be very much helpful to predict future. Though it may not be always accurate but certain hints and signals can be very helpful. For example, lots of data can help conclude that if there is lots of rain it can increase the sell of umbrella. Text and Unstructured Data Analysis: unstructured data are now getting norm in the new world and they are a big part of the Big Data revolution. It is very important that we Extract, Transform and Load the unstructured data and make meaningful data out of it. For example, analysis of lots of images, one can predict that people like to use certain colors in certain months in their cloths. Big Data Analytics Solutions There are many different Big Data Analystics Solutions out in the market. It is impossible to list all of them so I will list a few of them over here. Tableau – This has to be one of the most popular visualization tools out in the big data market. SAS – A high performance analytics and infrastructure company IBM and Oracle – They have a range of tools for Big Data Analysis Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Data Scientist. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Object model design: collections on classes

- by Luke Puplett

Hi all, Consider Train.Passengers, what type would you use for Passengers where passengers are not supposed to be added or removed by the consuming code? I'm using .NET Framework, so this discussion would suit .NET, but it could apply to a number of modern languages/frameworks. In the .NET Framework, the List is not supposed to be publicly exposed. There's Collection and ICollection and guidance, which I tend to agree with, is to return the closest concrete type down the inheritance tree, so that'd be Collection since it is already an ICollection. But Collection has read/write semantics and so possibly it should be a ReadOnlyCollection, but its arguably common sense not to alter the contents of a collection that you don't have intimate knowledge about so is it necessary? And it requires extra work internally and can be a pain with (de)serialization. At the extreme ends I could just return Person[] (since LINQ now provides much of the benefits that previously would have been afforded by a more specified collection) or even build a strongly-typed PersonCollection or ReadOnlyPersonCollection! What do you do? Thanks for your time. Luke

Read the article
A design pattern for data binding an object (with subclasses) to asp.net user control

- by Rohith Nair

I have an abstract class called Address and I am deriving three classes ; HomeAddress, Work Address, NextOfKin address. My idea is to bind this to a usercontrol and based on the type of Address it should bind properly to the ASP.NET user control. My idea is the user control doesn't know which address it is going to present and based on the type it will parse accordingly. How can I design such a setup, based on the fact that, the user control can take any type of address and bind accordingly. I know of one method like :- Declare class objects for all the three types (Home,Work,NextOfKin). Declare an enum to hold these types and based on the type of this enum passed to user control, instantiate the appropriate object based on setter injection. As a part of my generic design, I just created a class structure like this :- I know I am missing a lot of pieces in design. Can anybody give me an idea of how to approach this in proper way.

Read the article
How to design database having multiple interrelated entities

- by Sharath Chandra

I am designing a new system which is more of a help system for core applications in banks or healthcare sector. Given the nature of the system this is not a heavy transaction oriented system but more of read intensive. Now within this application I have multiple entities which are related to each other. For e.g. Assume the following entities in the system User Training Regulations Now each of these entities have M:N Relationship with each other. Assuming the usage of a standard RDBMS, the design may involve many relationship tables each containing the relationships one other entity ("User_Training", "User_Regulations", "Training_Regulations"). This design is limiting since I have more than 3 entities in the system and maintaining the relationship graph is difficult this way. The most frequently used operation is "given an entity get me all the related entities" . I need to design the database where this operation is relatively inexpensive. What are the different recommendations for modelling this kind of database.

Read the article
Design for object with optional and modifiable attributtes?

- by Ikuzen

I've been using the Builder pattern to create objects with a large number of attributes, where most of them are optional. But up until now, I've defined them as final, as recommended by Joshua Block and other authors, and haven't needed to change their values. I am wondering what should I do though if I need a class with a substantial number of optional but non-final (mutable) attributes? My Builder pattern code looks like this: public class Example { //All possible parameters (optional or not) private final int param1; private final int param2; //Builder class public static class Builder { private final int param1; //Required parameters private int param2 = 0; //Optional parameters - initialized to default //Builder constructor public Builder (int param1) { this.param1 = param1; } //Setter-like methods for optional parameters public Builder param2(int value) { param2 = value; return this; } //build() method public Example build() { return new Example(this); } } //Private constructor private Example(Builder builder) { param1 = builder.param1; param2 = builder.param2; } } Can I just remove the final keyword from the declaration to be able to access the attributes externally (through normal setters, for example)? Or is there a creational pattern that allows optional but non-final attributes that would be better suited in this case?

Read the article
design a model for a system of dependent variables

- by dbaseman

I'm dealing with a modeling system (financial) that has dozens of variables. Some of the variables are independent, and function as inputs to the system; most of them are calculated from other variables (independent and calculated) in the system. What I'm looking for is a clean, elegant way to: define the function of each dependent variable in the system trigger a re-calculation, whenever a variable changes, of the variables that depend on it A naive way to do this would be to write a single class that implements INotifyPropertyChanged, and uses a massive case statement that lists out all the variable names x1, x2, ... xn on which others depend, and, whenever a variable xi changes, triggers a recalculation of each of that variable's dependencies. I feel that this naive approach is flawed, and that there must be a cleaner way. I started down the path of defining a CalculationManager<TModel> class, which would be used (in a simple example) something like as follows: public class Model : INotifyPropertyChanged { private CalculationManager<Model> _calculationManager = new CalculationManager<Model>(); // each setter triggers a "PropertyChanged" event public double? Height { get; set; } public double? Weight { get; set; } public double? BMI { get; set; } public Model() { _calculationManager.DefineDependency<double?>( forProperty: model => model.BMI, usingCalculation: (height, weight) => weight / Math.Pow(height, 2), withInputs: model => model.Height, model.Weight); } // INotifyPropertyChanged implementation here } I won't reproduce CalculationManager<TModel> here, but the basic idea is that it sets up a dependency map, listens for PropertyChanged events, and updates dependent properties as needed. I still feel that I'm missing something major here, and that this isn't the right approach: the (mis)use of INotifyPropertyChanged seems to me like a code smell the withInputs parameter is defined as params Expression<Func<TModel, T>>[] args, which means that the argument list of usingCalculation is not checked at compile time the argument list (weight, height) is redundantly defined in both usingCalculation and withInputs I am sure that this kind of system of dependent variables must be common in computational mathematics, physics, finance, and other fields. Does someone know of an established set of ideas that deal with what I'm grasping at here? Would this be a suitable application for a functional language like F#? Edit More context: The model currently exists in an Excel spreadsheet, and is being migrated to a C# application. It is run on-demand, and the variables can be modified by the user from the application's UI. Its purpose is to retrieve variables that the business is interested in, given current inputs from the markets, and model parameters set by the business.

Read the article
What OO Design to use ( is there a Design Pattern )?

- by Blundell

I have two objects that represent a 'Bar/Club' ( a place where you drink/socialise). In one scenario I need the bar name, address, distance, slogon In another scenario I need the bar name, address, website url, logo So I've got two objects representing the same thing but with different fields. I like to use immutable objects, so all the fields are set from the constructor. One option is to have two constructors and null the other fields i.e: class Bar { private final String name; private final Distance distance; private final Url url; public Bar(String name, Distance distance){ this.name = name; this.distance = distance; this.url = null; } public Bar(String name, Url url){ this.name = name; this.distance = null; this.url = url; } // getters } I don't like this as you would have to null check when you use the getters In my real example the first scenario has 3 fields and the second scenario has about 10, so it would be a real pain having two constructors, the amount of fields I would have to declare null and then when the object are in use you wouldn't know which Bar you where using and so what fields would be null and what wouldn't. What other options do I have? Two classes called BarPreview and Bar? Some type of inheritance / interface? Something else that is awesome?

Read the article
Defining formula through user interface in user form

- by BriskLabs Pakistan

I am a student and developing a simple assignment - windows form application in visual studio 2010. The application is suppose to construct formulas as per user requirement. The process: It has to pick data from columns of Microsoft Access database and the user should be able to pick the data by column name like we do in a drop down menu. and create reusable formulas in it ( configure it once and can change it again). followings are column titles from database that can be picked for example. e.g Col -1 : Marks in Maths Col -2 : Total Marks in Maths Col -3 : Marks in science Col -4 : Total marks in science Finally we should be able to construct any formula in the UI like (Col 1 + Col 3 ) / ( col 2 + col 4) = Formula 1 once this is formula is set saved and a name is assigned to it by user. he/she can use the formula and results shall appear in a window below. i.e He would be able to calculate his desired figures (formula) by only manipulating underlying data on the UI layer....choose the data for a period and apply the formula and get the answer Problem: It looks like I have to create an app where rules are set through UI....... this means no stored procedures are required in SQL.... please suggest the right approach.

Read the article
Big Data – What is Big Data – 3 Vs of Big Data – Volume, Velocity and Variety – Day 2 of 21

- by Pinal Dave

Data is forever. Think about it – it is indeed true. Are you using any application as it is which was built 10 years ago? Are you using any piece of hardware which was built 10 years ago? The answer is most certainly No. However, if I ask you – are you using any data which were captured 50 years ago, the answer is most certainly Yes. For example, look at the history of our nation. I am from India and we have documented history which goes back as over 1000s of year. Well, just look at our birthday data – atleast we are using it till today. Data never gets old and it is going to stay there forever. Application which interprets and analysis data got changed but the data remained in its purest format in most cases. As organizations have grown the data associated with them also grew exponentially and today there are lots of complexity to their data. Most of the big organizations have data in multiple applications and in different formats. The data is also spread out so much that it is hard to categorize with a single algorithm or logic. The mobile revolution which we are experimenting right now has completely changed how we capture the data and build intelligent systems. Big organizations are indeed facing challenges to keep all the data on a platform which give them a single consistent view of their data. This unique challenge to make sense of all the data coming in from different sources and deriving the useful actionable information out of is the revolution Big Data world is facing. Defining Big Data The 3Vs that define Big Data are Variety, Velocity and Volume. Volume We currently see the exponential growth in the data storage as the data is now more than text data. We can find data in the format of videos, musics and large images on our social media channels. It is very common to have Terabytes and Petabytes of the storage system for enterprises. As the database grows the applications and architecture built to support the data needs to be reevaluated quite often. Sometimes the same data is re-evaluated with multiple angles and even though the original data is the same the new found intelligence creates explosion of the data. The big volume indeed represents Big Data. Velocity The data growth and social media explosion have changed how we look at the data. There was a time when we used to believe that data of yesterday is recent. The matter of the fact newspapers is still following that logic. However, news channels and radios have changed how fast we receive the news. Today, people reply on social media to update them with the latest happening. On social media sometimes a few seconds old messages (a tweet, status updates etc.) is not something interests users. They often discard old messages and pay attention to recent updates. The data movement is now almost real time and the update window has reduced to fractions of the seconds. This high velocity data represent Big Data. Variety Data can be stored in multiple format. For example database, excel, csv, access or for the matter of the fact, it can be stored in a simple text file. Sometimes the data is not even in the traditional format as we assume, it may be in the form of video, SMS, pdf or something we might have not thought about it. It is the need of the organization to arrange it and make it meaningful. It will be easy to do so if we have data in the same format, however it is not the case most of the time. The real world have data in many different formats and that is the challenge we need to overcome with the Big Data. This variety of the data represent represent Big Data. Big Data in Simple Words Big Data is not just about lots of data, it is actually a concept providing an opportunity to find new insight into your existing data as well guidelines to capture and analysis your future data. It makes any business more agile and robust so it can adapt and overcome business challenges. Tomorrow In tomorrow’s blog post we will try to answer discuss Evolution of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

Read the article
Representing complex object dependencies

- by max

I have several classes with a reasonably complex (but acyclic) dependency graph. All the dependencies are of the form: class X instance contains an attribute of class Y. All such attributes are set during initialization and never changed again. Each class' constructor has just a couple parameters, and each object knows the proper parameters to pass to the constructors of the objects it contains. class Outer is at the top of the dependency hierarchy, i.e., no class depends on it. Currently, the UI layer only creates an Outer instance; the parameters for Outer constructor are derived from the user input. Of course, Outer in the process of initialization, creates the objects it needs, which in turn create the objects they need, and so on. The new development is that the a user who knows the dependency graph may want to reach deep into it, and set the values of some of the arguments passed to constructors of the inner classes (essentially overriding the values used currently). How should I change the design to support this? I could keep the current approach where all the inner classes are created by the classes that need them. In this case, the information about "user overrides" would need to be passed to Outer class' constructor in some complex user_overrides structure. Perhaps user_overrides could be the full logical representation of the dependency graph, with the overrides attached to the appropriate edges. Outer class would pass user_overrides to every object it creates, and they would do the same. Each object, before initializing lower level objects, will find its location in that graph and check if the user requested an override to any of the constructor arguments. Alternatively, I could rewrite all the objects' constructors to take as parameters the full objects they require. Thus, the creation of all the inner objects would be moved outside the whole hierarchy, into a new controller layer that lies between Outer and UI layer. The controller layer would essentially traverse the dependency graph from the bottom, creating all the objects as it goes. The controller layer would have to ask the higher-level objects for parameter values for the lower-level objects whenever the relevant parameter isn't provided by the user. Neither approach looks terribly simple. Is there any other approach? Has this problem come up enough in the past to have a pattern that I can read about? I'm using Python, but I don't think it matters much at the design level.

Read the article
Where, in an object oriented system should you, if at all, choose (C-style) structs over classes?

- by Anto

C and most likely many other languages provide a struct keyword for creating structures (or something in a similar fashion). These are (at least in C), from a simplified point of view like classes, but without polymorphism, inheritance, methods, and so on. Think of an object-oriented (or multi paradigm) language with C-style structs. Where would you choose them over classes? Now, I don't believe they are to be used with OOP as classes seem to replace their purposes, but I wonder if there are situations where they could be preferred over classes in otherwise object-oriented programs and in what kind of situations. Are there such situations?

Read the article
New design patterns/design strategies

- by steven

I've studied and implemented design patterns for a few years now, and I'm wondering. What are some of the newer design patterns (since the GOF)? Also, what should one, similar to myself, study [in the way of software design] next? Note: I've been using TDD, and UML for some time now. I'm curious about the newer paradigm shifts, and or newer design patterns.

Read the article
Object-Oriented equivalent of LISP's progn function?

- by Archer

I'm currently writing a LISP parser that iterates through some AutoLISP code and does its best to make it a little easier to read (changing prefix notation to infix notation, changing setq assignments to "=" assignments, etc.) for those that aren't used to LISP code/only learned object oriented programming. While writing commands that LISP uses to add to a "library" of LISP commands, I came across the LISP command "progn". The only problem is that it looks like progn is simply executing code in a specific order and sometimes (not usually) assigning the last value to a variable. Am I incorrect in assuming that for translating progn to object-oriented understanding that I can simply forgo the progn function and print the statements that it contains? If not, what would be a good equivalent for progn in an object-oriented language?

Read the article
Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

Read the article
Design Pattern Advice for Bluetooth App for Android

- by Aimee Jones

I’m looking for some advice on which patterns would apply to some of my work. I’m planning on doing a project as part of my college work and I need a bit of help. My main project is to make a basic Android bluetooth tracking system where the fixed locations of bluetooth dongles are mapped onto a map of a building. So my android app will regularly scan for nearby dongles and triangulate its location based on signal strength. The dongles location would be saved to a database along with their mac addresses to differentiate between them. The android phones location will then be sent to a server. This information will be used to show the phone’s location on a map of the building, or map of a route taken, on a website. My side project is to choose a suitable design pattern that could be implemented in this main project. I’m still a bit new to design patterns and am finding it hard to get my head around ones that may be suitable. I’ve heard maybe some that are aimed at web applications for the server side of things may be appropriate. My research so far is leading me to the following: Navigation Strategy Pattern Observer Pattern Command Pattern News Design Pattern Any advice would be a great help! Thanks

Read the article
Clean MVC design when there is viewer latency

- by Tony Suffolk 66

It isn't clear if this question has already been answered, so apologies in advance if this is a duplicate : I am implementing a game and trying to design around a clean MVC pattern - so my Control plane will implement the rules of the game (but not how the game is displayed), and the View plane implements how the game is displayed, and user iteraction - i.e. what game items or controls the user has activated. The challenge that I have is this : In my game the Control Plane can move game items more or less instaneously (The decision about what item to place where - and some of the initial consequences of that placement are reasonably trivial to calculate), but I want to design the Control Plane so that the View plane can display these movements either instaneously or using movement animations. The other complication is that player interaction must be locked out while those game items are moving (similar to chess - you can't attack an opposing piece as it moves past one of your pieces) So do I : Implement all the logic in the Control Plane asynchronously - and separate the descision making from the actions - so the Control plane decides piece 'A' needs to move to a given place - tells the view plane, and but does not implement the move in data until the view plane informs the control plane that the move/animation is complete. A lot of interlock points between the two layers. Implement all the control plane logic in one place - decisions and movement (keeping track of what moved where), and pass all the movements in one go to the View plane to do with what it will. Control Plane is almost fire and forget here. A hybrid of 1 & 2 - The control plane implements all the moves in a temporary data store - but maintains a second store which reflects what is actually visible to the viewer, based on calls and feedback from the View plane. All 3 are relatively easy to implement (target language is python), but having never done a clean MVC pattern with view latency before - I am not sure which design is best

Read the article
Authorization design-pattern / practice?

- by Lawtonfogle

On one end, you have users. On the other end, you have activities. I was wondering if there is a best practice to relate the two. The simplest way I can think of is to have every activity have a role, and assign every user every role they need. The problem is that this gets really messy in practice as soon as you go beyond a trivial system. A way I recently designed was to have users who have roles, and roles have privileges, and activities require some combinations of privileges. For the trivial case, this is more complex, but I think it will scale better. But after I implemented it, I felt like it was overkill for the system I had. Another option would be to have users, who have roles, and activities require you to have a certain role to perform with many activities sharing roles. A more complex variant of this would given activities many possible roles, which you only needed one of. And an even more complex variant would be to allow logical statements of role ownership to use an activity (i.e. Must have A and (B exclusive or C) and must not have D). I could continue to list more, but I think this already gives a picture. And many of these have trade offs. But in software design, there are oftentimes solutions, while perhaps not perfect in every possible case, are clearly top of the pack to an extent it isn't even considered opinion based (i.e. how to store passwords, plain text is worse, hashing better, hashing and salt even better, despite the increased complexity of each level) (i.e. 2, Smart UI designs for applications are bad, even if it is subjective as to what the best design is). So, is there a best practice for authorization design that is not purely opinion based/subjective?

Read the article
How does a search functionality fit in DDD with CQRS?

- by Songo

In Vaughn Vernon's book Implementing domain driven design and the accompanying sample application I found that he implemented a CQRS approach to the iddd_collaboration bounded context. He presents the following classes in the application service layer: CalendarApplicationService.java CalendarEntryApplicationService.java CalendarEntryQueryService.java CalendarQueryService.java I'm interested to know if an application will have a search page that feature numerous drop downs and check boxes with a smart text box to match different search patterns; How will you structure all that search logic? In a command service or a query service? Taking a look at the CalendarQueryService.java I can see that it has 2 methods for a huge query, but no logic at all to mix and match any search filters for example. I've heard that the application layer shouldn't have any business logic, so where will I construct my dynamic query? or maybe just clutter everything in the Query service?

Read the article
PHP class data implementation

- by Bakanyaka

I'm studying OOP PHP and have watched two tutorials that implement user login\registration system as an example. But implementation varies. Which way will be more correct one to work with data such as this? Load all data retrieved from database as array into a property called something like _data on class creation and further methods operate with this property Create separate properties for each field retrieved from database, on class creation load all data fields into respective properties and operate with that properties separately? Then let's say I want to create a method that returns a list of all users with their data. Which way is better? Method that returns just an array of userdata like this: Array([0]=>array([id] => 1, [username] => 'John', ...), [1]=>array([id] => 2, [username] => 'Jack', ...), ...) Method that creates a new instance of it's class for each user and returns an array of objects

Read the article
Big Data – Beginning Big Data Series Next Month in 21 Parts

- by Pinal Dave

Big Data is the next big thing. There was a time when we used to talk in terms of MB and GB of the data. However, the industry is changing and we are now moving to a conversation where we discuss about data in Petabyte, Exabyte and Zettabyte. It seems that the world is now talking about increased Volume of the data. In simple world we all think that Big Data is nothing but plenty of volume. In reality Big Data is much more than just a huge volume of the data. When talking about the data we need to understand about variety and volume along with volume. Though Big data look like a simple concept, it is extremely complex subject when we attempt to start learning the same. My Journey I have recently presented on Big Data in quite a few organizations and I have received quite a few questions during this roadshow event. I have collected all the questions which I have received and decided to post about them on the blog. In the month of October 2013, on every weekday we will be learning something new about Big Data. Every day I will share a concept/question and in the same blog post we will learn the answer of the same. Big Data – Plenty of Questions I received quite a few questions during my road trip. Here are few of the questions. I want to learn Big Data – where should I start? Do I need to know SQL to learn Big Data? What is Hadoop? There are so many organizations talking about Big Data, and every one has a different approach. How to start with big Data? Do I need to know Java to learn about Big Data? What is different between various NoSQL languages. I will attempt to answer most of the questions during the month long series in the next month. Big Data – Big Subject Big Data is a very big subject and I no way claim that I will be covering every single big data concept in this series. However, I promise that I will be indeed sharing lots of basic concepts which are revolving around Big Data. We will discuss from fundamentals about Big Data and continue further learning about it. I will attempt to cover the concept so simple that many of you might have wondered about it but afraid to ask. Your Role! During this series next month, I need your one help. Please keep on posting questions you might have related to big data as blog post comments and on Facebook Page. I will monitor them closely and will try to answer them as well during this series. Now make sure that you do not miss any single blog post in this series as every blog post will be linked to each other. You can subscribe to my feed or like my Facebook page or subscribe via email (by entering email in the blog post). Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Big Data, PostADay, SQL, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Handling Types for Real and Complex Matrices in a BLAS Wrapper

- by mga

I come from a C background and I'm now learning OOP with C++. As an exercise (so please don't just say "this already exists"), I want to implement a wrapper for BLAS that will let the user write matrix algebra in an intuitive way (e.g. similar to MATLAB) e.g.: A = B*C*D.Inverse() + E.Transpose(); My problem is how to go about dealing with real (R) and complex (C) matrices, because of C++'s "curse" of letting you do the same thing in N different ways. I do have a clear idea of what it should look like to the user: s/he should be able to define the two separately, but operations would return a type depending on the types of the operands (R*R = R, C*C = C, R*C = C*R = C). Additionally R can be cast into C and vice versa (just by setting the imaginary parts to 0). I have considered the following options: As a real number is a special case of a complex number, inherit CMatrix from RMatrix. I quickly dismissed this as the two would have to return different types for the same getter function. Inherit RMatrix and CMatrix from Matrix. However, I can't really think of any common code that would go into Matrix (because of the different return types). Templates. Declare Matrix<T> and declare the getter function as T Get(int i, int j), and operator functions as Matrix *(Matrix RHS). Then specialize Matrix<double> and Matrix<complex>, and overload the functions. Then I couldn't really see what I would gain with templates, so why not just define RMatrix and CMatrix separately from each other, and then overload functions as necessary? Although this last option makes sense to me, there's an annoying voice inside my head saying this is not elegant, because the two are clearly related. Perhaps I'm missing an appropriate design pattern? So I guess what I'm looking for is either absolution for doing this, or advice on how to do better.

Read the article
Best practice to collect information from child objects

- by Markus

I'm regularly seeing the following pattern: public abstract class BaseItem { BaseItem[] children; // ... public void DoSomethingWithStuff() { StuffCollection collection = new StuffCollection(); foreach(child c : children) c.AddRequiredStuff(collection); // do something with the collection ... } public abstract void AddRequiredStuff(StuffCollection collection); } public class ConcreteItem : BaseItem { // ... public override void AddRequiredStuff(StuffCollection collection) { Stuff stuff; // ... collection.Add(stuff); } } Where I would use something like this, for better information hiding: public abstract class BaseItem { BaseItem[] children; // ... public void DoSomethingWithStuff() { StuffCollection collection = new StuffCollection(); foreach(child c : children) collection.AddRange(c.RequiredStuff()); // do something with the collection ... } public abstract StuffCollection RequiredStuff(); } public class ConcreteItem : BaseItem { // ... public override StuffCollection RequiredStuff() { StuffCollection stuffCollection; Stuff stuff; // ... stuffCollection.Add(stuff); return stuffCollection; } } What are pros and cons of each solution? For me, giving the implementation access to parent's information is some how disconcerting. On the other hand, initializing a new list, just to collect the items is a useless overhead ... What is the better design? How would it change, if DoSomethingWithStuff wouldn't be part of BaseItem but a third class? PS: there might be missing semicolons, or typos; sorry for that! The above code is not meant to be executed, but just for illustration.

Read the article

Search Results

Search found 69357 results on 2775 pages for 'data oriented design'.

Page 5/2775 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by splatto

- by Lirik

- by Pinal Dave

- by Luke Puplett

- by Rohith Nair

- by Sharath Chandra

- by Ikuzen

- by dbaseman

- by Blundell

- by BriskLabs Pakistan

- by Pinal Dave

- by BuckWoody

- by max

- by Anto

- by steven

- by Archer

- by Dejan Sarka

- by Aimee Jones

- by Tony Suffolk 66

- by Lawtonfogle

- by Songo

- by Bakanyaka

- by Pinal Dave

- by mga

- by Markus

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >